01 How an LLM Writes Code ── It Is Only Guessing "the Next Word"
Let's clear up the most common misunderstanding first. An LLM does not write by understanding the meaning of a program. What it does is remarkably simple ── it predicts, one at a time, the word (token) most likely to come next as a continuation of the string so far, and lines those tokens up. This mechanism rests on a structure called the Transformer (= an architecture, published in 2017, that computes which words each word in a text should pay attention to).
To an LLM, code is just another kind of text. Write def, and a function name is likely to follow. Write for i in, and the target of the loop tends to come next. Because the model has learned from an enormous volume of public code, these "continuations" come out looking like surprisingly natural programs.
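The "continuation" idea can be made concrete with a toy sketch. The code below is not how a Transformer works internally; it is a deliberately crude bigram counter, and the corpus, function names, and tokenization are all assumptions invented for illustration. It only shows the shape of the loop: look at what has been written so far, append the most frequent follower, repeat.

```python
from collections import Counter, defaultdict

# Tiny, made-up "training corpus" of already-tokenized code lines.
# (Assumption for illustration: real models see vastly more data and use subword tokens.)
CORPUS = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "j", "in", "range", "(", "n", ")", ":"],
    ["for", "i", "in", "range", "(", "10", ")", ":"],
    ["for", "x", "in", "items", ":"],
    ["def", "main", "(", ")", ":"],
]


def train_bigrams(corpus):
    """Count, for each token, which tokens followed it in the corpus."""
    follows = defaultdict(Counter)
    for line in corpus:
        for current, nxt in zip(line, line[1:]):
            follows[current][nxt] += 1
    return follows


def continue_greedily(follows, prompt, max_new_tokens=8):
    """Repeatedly append the most frequent follower of the last token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # nothing ever followed this token in the corpus
        tokens.append(candidates.most_common(1)[0][0])
    return tokens


model = train_bigrams(CORPUS)
print(" ".join(continue_greedily(model, ["for", "i", "in"])))
# prints: for i in range ( n ) :
```

Note what the toy model never checks: whether `n` is defined anywhere. It emits `range ( n )` simply because that was the most common continuation it had seen.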
The crucial point is that this prediction optimizes not for "correctness" but for "plausibility." If a piece of code is grammatically natural and resembles a common way of writing, the LLM will output it with full confidence ── even when it does not run. In a single sentence, the principle is this: an LLM writes the most plausible continuation, not the verified-correct one.
02 Strengths and Weaknesses ── The Boundary Is Set by "How Much Was in the Training Data"
From the principle of "predicting the continuation," the boundary between strengths and weaknesses follows directly. Patterns that appear abundantly in the training data are strengths; rare patterns are weaknesses. That explains almost everything.
Strengths:
- Routine implementation ── API calls, data shaping, reading and writing CSV, regular expressions, standard algorithms. For code with countless precedents in the world, both accuracy and speed are high.
- Translation and conversion ── From Python to JavaScript, from pseudocode to real code, from an error message to a proposed fix. Moving one format into another is well suited to prediction.

Weaknesses:
- Highly novel design ── Specifications unique to one organization, in-house APIs, architectures that do not yet exist anywhere. With no model to imitate in the training data, it drifts easily into plausible fabrication (discussed below).
- Strict computation and counting ── Multi-digit arithmetic, boundary conditions, off-by-one errors (= a count or index that is wrong by exactly one). Probabilistic prediction tends to miss this kind of single, unique correct answer.
The evaluation study of Codex (= a GPT model fine-tuned on publicly available code), published by Chen et al. in 2021, put this boundary into numbers. On a test called HumanEval (= 164 hand-written coding problems), the model solved 28.8 percent of the problems when given a single attempt per problem. But when it was allowed 100 attempts at the same problem and counted as correct if even one attempt passed the tests, the success rate rose to 70.2 percent. "Misses on the first shot, but hits if it fires enough rounds" ── this is the raw nature of LLM code generation, and while later models improved the accuracy, the property itself has not changed.
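The "first shot versus many shots" measurement has a name in Chen et al.: pass@k, the probability that at least one of k samples passes the tests. The paper estimates it by generating n samples per problem, counting the c that pass, and computing 1 - C(n-c, k) / C(n, k). A minimal sketch of that estimator follows; the function name and the sample counts in the usage lines are assumptions for illustration, not figures from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c out of n generated samples passed the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, which yields 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers: 200 samples for one problem, 60 of them pass.
for k in (1, 10, 100):
    print(k, round(pass_at_k(200, 60, k), 3))
```

With k = 1 the estimate is simply c / n (0.3 here), and it climbs toward 1 as k grows. That is the same asymmetry the HumanEval figures show.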
03 The Hallucination Trap ── Calling Functions That Don't Exist, With Full Confidence
The most dangerous form of an LLM's limits is hallucination (= plausible fabrication). Nonexistent libraries, unimplemented functions, wrong argument order ── the LLM writes these with exactly the same confidence as correct code. As a rule, it will not tell you, "This is probably wrong."
Why does it happen? Go back to the principle and it is obvious. Because the LLM writes "the plausible continuation," "the function you wish existed" gets output as if it really existed. If the model decides "there is probably a function that converts a date to the Japanese imperial calendar," it will calmly write a call to a nonexistent to_wareki(). The code looks flawless, and it breaks only when you run it.
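One cheap defense is to ask the runtime whether a name exists before trusting a call to it. The sketch below uses only Python's standard `importlib`; the helper name `symbol_exists` is an assumption made up for this example. It checks existence only. It says nothing about whether the arguments or the behavior match what the generated code expects, and it imports the module in order to inspect it.

```python
import importlib
import importlib.util


def symbol_exists(module_name: str, dotted_attr: str) -> bool:
    """Return True only if the module can be found and the attribute path resolves."""
    try:
        if importlib.util.find_spec(module_name) is None:
            return False
    except (ImportError, ValueError):
        # Raised, for example, when a parent package of a dotted name is missing.
        return False
    obj = importlib.import_module(module_name)
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(symbol_exists("datetime", "date.today"))      # True: a real method
print(symbol_exists("datetime", "date.to_wareki"))  # False: the fabricated one from the text
```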
04 What Is the Human Role ── From Author to Verifier and Designer
Taking all of this together, the human role in AI-era programming comes into focus. The more AI takes on the "writing," the more human work shifts to "what to have it write" and "whether what was written is correct." The role moves from the person who types the code to the person who sets the frame and verifies the result.
Concretely, the work that remains for humans comes down to three things.
- Define the requirements ── What do you want to build, and what output should it return for what input? When this is vague, AI fills the gaps with "plausible" code, and the vagueness carries straight through into the result.
- Verify correctness ── Does the code that came out really run, does it break at boundary conditions, does it meet the requirements? In principle this cannot be left entirely to AI.
- Bear responsibility ── The decision to publish, deliver, or put into production. "Because AI wrote it" is never a defense, in any situation.
The idea that "if AI writes the code, people get an easier job" is only half right. The effort of writing goes down, but the responsibility for verification and design grows heavier instead. We will confirm this asymmetry again and again throughout the series.
05 Why Verification Is Mandatory ── "It Ran" Is Not "It's Correct"
You run AI-written code once, it produces the expected result, and you relax ── that is the most common pitfall. "It ran" is not "it's correct." It merely happened to work for the input you gave; any number of other inputs may still break it.
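A small, self-contained illustration of the gap between "it ran" and "it's correct." Both functions below are hypothetical examples written for this article. The first is the kind of one-liner that looks natural and passes a casual check; the second handles the boundary the first one misses.

```python
def last_n_plausible(items, n):
    """Looks right and runs: return the last n items."""
    return items[-n:]


def last_n_checked(items, n):
    """Return the last n items, including the n == 0 and n > len(items) cases."""
    if n < 0:
        raise ValueError("n must be non-negative")
    start = max(len(items) - n, 0)
    return items[start:]


data = [1, 2, 3]
print(last_n_plausible(data, 2))  # [2, 3]     the input you happened to try
print(last_n_plausible(data, 0))  # [1, 2, 3]  wrong: -0 is 0, so the slice is the whole list
print(last_n_checked(data, 0))    # []
```

The plausible version never raises an error. It quietly returns the wrong answer for one input, which is exactly the failure a single trial run cannot reveal.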
Verifying traditional code and verifying AI-generated code place their emphasis differently. Laid out, it looks like this.
| When a human writes it all | When AI writes the code |
|---|---|
| The author knows the intent | You first have to read it to confirm intent and implementation have not diverged |
| Bugs appear as "typos" | Bugs slip in as "plausible fabrications," in a form that looks correct |
| The existence of the libraries used is self-evident | You have to check, one by one, whether the libraries and functions actually exist |
| Testing is a finishing step | Testing becomes the central step that determines whether it can be trusted |
So in AI programming, testing (= a mechanism that lines up inputs and expected outputs and checks them automatically) is upgraded from "nice to have" to "mandatory." Verification you could skip with human-written code cannot be skipped for AI-generated code ── the faster you can write, the thicker you make the checking step. That is how you strike the balance.
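"Lining up inputs and expected outputs" can be as plain as a table and a loop. The sketch below is a minimal version of that idea under stated assumptions: the function names, the `run_cases` helper, and the choice of leap-year logic as the subject are all illustrative. In practice a test framework would take the place of the hand-written loop.

```python
def is_leap_year_plausible(year: int) -> bool:
    """The common-looking shortcut: divisible by 4."""
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The table: each row is (input, expected output), boundary years included.
CASES = [(2024, True), (2023, False), (1900, False), (2000, True)]


def run_cases(func, cases):
    """Run every case and return the ones that failed as (input, expected, actual)."""
    failures = []
    for given, expected in cases:
        actual = func(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


print(run_cases(is_leap_year_plausible, CASES))  # [(1900, False, True)]
print(run_cases(is_leap_year, CASES))            # []
```

The shortcut passes three of four rows. Only the boundary row, written down in advance by a person, exposes it.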
06 Cautions for Medical Software ── Domains Where "Can Generate" Doesn't Mean "Can Use As-Is"
For readers approaching this series from the pharmaceutical and medical fields, there is one point worth stressing especially hard. AI being able to "generate" code, and that code being permissible to "use" in a medical context, are entirely separate matters.
Software that handles patient data, calculates dosages, or bears on diagnostic or treatment decisions can qualify as medical device software, and when it does it falls under the international standard IEC 62304 (= the life-cycle standard for medical device software). This standard requires a record (traceability) of "who verified what, and how" at each stage of development. "We can't explain the basis for the contents because AI generated it" does not fly under this framework.
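To make "who verified what, and how" tangible, here is a sketch of what a single verification record might capture. This is an illustration only: IEC 62304 does not prescribe this format, and every field name, the `make_record` helper, the file name, and the sample values are assumptions invented for the example. The one design idea worth noting is the content hash, which ties the record to the exact code that was reviewed.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VerificationRecord:
    """One hypothetical traceability entry; not a format defined by any standard."""
    artifact: str      # what was verified
    code_sha256: str   # fingerprint of the exact code that was verified
    verified_by: str   # who verified it
    method: str        # how it was verified
    result: str        # outcome of the verification
    verified_at: str   # when, as an ISO 8601 timestamp


def make_record(artifact, source_code, verified_by, method, result, verified_at):
    """Build a record whose hash changes if the verified code changes."""
    digest = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
    return VerificationRecord(artifact, digest, verified_by, method, result, verified_at)


record = make_record(
    artifact="unit_conversion.py",  # hypothetical file name
    source_code="def mg_to_g(mg):\n    return mg / 1000\n",
    verified_by="reviewer@example.com",
    method="unit tests plus line-by-line review",
    result="pass",
    verified_at="2026-01-15T09:30:00+09:00",
)
print(json.dumps(asdict(record), indent=2))
```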
Further, when you build software that outputs information related to medicine and pharmaceuticals, the advertising regulations of the Pharmaceutical and Medical Devices Act (PMD Act) stand behind it. Under the Act, the prohibition of exaggerated advertising is in Article 66, the prohibition of advertising unapproved drugs is in Article 68, and the proper conduct of information provision in sales information provision activities is in Article 68-2. Even for explanatory text or reports generated by AI, these measures do not loosen in the slightest. Judgment is made not on "who wrote it" but on "what is written."
07 The Order of Adoption ── Start Where Risk Is Low, Verify, Then Expand
So how do you bring AI programming into the field? There is one principle ── start where a failure does little harm, build up the verification machinery, and expand little by little into more responsibility-heavy territory. The reverse order ── putting AI into critical production processing from the outset ── is the approach most to be avoided.
- Stage 1 ── throwaway work: one-off data shaping, log aggregation, draft generation. Begin in territory where a mistake just means doing it over.
- Stage 2 ── repeated internal work: automating routine reports, assisting in-house tools. Add tests and confirm it can be used repeatedly.
- Stage 3 ── production with verification: build it into business processes on the firm premise of always passing human review and tests.
- Stage 4 ── regulated targets: territory bearing on patient safety, medical devices, and regulated information. Only through frameworks such as IEC 62304 and formal review.
The fence common to every stage is the verification described in Section 5. The higher the stage, the thicker you make the checking step ── if you think of the purpose of raising speed as freeing up time to spend on verification, you will not get the order wrong.
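The "thicker at every stage" rule can be written down as a simple gate. The sketch below is one possible encoding, not a prescribed process: the stage names follow the list above, but the check names (`output_checked`, `automated_tests`, `human_review`, `formal_review`) and the helper functions are assumptions chosen for illustration. An organization would substitute its own required checks.

```python
from enum import IntEnum


class Stage(IntEnum):
    THROWAWAY = 1          # one-off work
    INTERNAL_REPEATED = 2  # repeated internal work
    PRODUCTION = 3         # production with verification
    REGULATED = 4          # regulated targets


# Hypothetical check names; each stage keeps the earlier checks and adds one.
REQUIRED_CHECKS = {
    Stage.THROWAWAY: {"output_checked"},
    Stage.INTERNAL_REPEATED: {"output_checked", "automated_tests"},
    Stage.PRODUCTION: {"output_checked", "automated_tests", "human_review"},
    Stage.REGULATED: {"output_checked", "automated_tests", "human_review", "formal_review"},
}


def missing_checks(stage, completed):
    """Return the required checks that have not been completed for this stage."""
    return REQUIRED_CHECKS[stage] - set(completed)


def may_proceed(stage, completed):
    """Allow use at a stage only when nothing required is missing."""
    return not missing_checks(stage, completed)


done = {"output_checked", "automated_tests"}
print(may_proceed(Stage.INTERNAL_REPEATED, done))      # True
print(sorted(missing_checks(Stage.PRODUCTION, done)))  # ['human_review']
```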
08 The Map of All Ten Installments
Here is the structure of the ten installments this series covers, mapped out in advance. Use it as your compass while reading.
- Vol. 01 The Foundations of Code Generation ── What an LLM Can and Cannot Write (this piece): The whole map of principles and limits; the starting point of the series
- Vol. 02 Using Copilot ── Practical Craft for Completion-Type AI: AI used inside the editor; writing without over-trusting completion
- Vol. 03 Conversational Coding ── Working With ChatGPT / Claude: How to give instructions, pass context, and design the back-and-forth
- Vol. 04 Prompt Design ── Writing Instructions That Get Through to AI: The technique of passing requirements as a frame; reducing ambiguity
- Vol. 05 Testing and Verification ── The Machinery for Trusting AI Code: Automated tests, boundary conditions, review patterns
- Vol. 06 Debugging ── Hunting the Cause Together With AI: How to read errors, how to hand them to AI, how to chase root cause
- Vol. 07 Agentic Development ── AI Running Multiple Steps on Its Own: The light and shadow of autonomous execution; designing the scope you delegate
- Vol. 08 Security and Confidentiality ── How to Protect Code and Information: Handling confidential information; vulnerabilities in generated output
- Vol. 09 Implementation in Pharma ── Using AI Under Regulation: IEC 62304, the PMD Act, and coexistence with internal review
- Vol. 10 Integration ── Making AI Programming Take Root in the Team: Operating rules, division of responsibility, design as an organization
09 Connections to Other Chapters ── Reading Alongside AI Marketing and Material Review
The AI Programming series connects to the other chapters on this site as follows. Reading them together makes your understanding of AI three-dimensional.
- AI Marketing Vol. 1 ── Marketing Redefined ── The whole map of an era in which content is mass-generated by AI. This series covers the "technology on the making side" of that.
- AI Marketing Vol. 5 ── Balancing Speed and Review ── How to review generated output with a human-in-the-loop (= a mechanism where a person intervenes partway). The verification philosophy is shared with Vol. 5 of this series.
- Material Review series ── The practice of review that receives generated output at the end. Whether code or promotional material, review stands between "can build" and "can use."
The era of AI writing code has genuinely arrived. But that "writing" is not done by understanding meaning; it is only lining up the most plausible continuation within the training data. That is precisely why it is astonishingly fast at routine work, and quietly wrong at unprecedented design and strict computation. It writes nonexistent functions with the same confidence as correct code ── this property will not disappear as models get smarter.
The conclusion this map points to is simple. Let AI write, and have humans verify. The faster you can write, the thicker you make the checking step. In the pharmaceutical and medical fields especially, between what can be generated and what can be used stand the walls of verification, recording, and regulation. Next time, map in hand, we move to the nearest entry point ── the practical craft of Copilot-type AI that completes lines inside the editor.
- An LLM does not write code by understanding meaning; it only predicts, by probability, "the word likely to come next in this context." That is why it is good at routine, precedent-rich code but poor at unprecedented design and strict computation, and why it can write nonexistent functions with the same confidence as correct code (hallucination).
- The more AI takes on the "writing," the more human work shifts to "what to have it write (requirements definition)" and "whether what was written is correct (verification)." "It ran" is not "it's correct," and testing is upgraded from optional to mandatory. Responsibility is not waived by "because AI wrote it."
- In pharma and medicine, being able to generate and being able to use are different things. Code that bears on patient safety or regulated information sits under the traceability of IEC 62304 and the measures of the PMD Act (exaggeration Art. 66 / unapproved Art. 68 / information provision Art. 68-2), and judgment is made not on "who built it" but on "what is written." Adopt from low-harm territory first, thickening verification as you expand.
- Chen, M. et al. Evaluating Large Language Models Trained on Code. arXiv:2107.03374, 2021. (The original source on the code-generation LLM "Codex" and HumanEval evaluation; shows the gap between first-try accuracy and repeated sampling.)
- Vaswani, A. et al. Attention Is All You Need. Advances in Neural Information Processing Systems 30 (NeurIPS), 2017. (The original paper on the Transformer architecture; the basis for next-word prediction.)
- Ji, Z. et al. Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, Vol. 55, No. 12, 2023. (A survey that systematically organizes hallucination in generative AI.)
- OpenAI. OpenAI Platform Documentation. OpenAI, accessed 2026. (Official documentation on the capabilities and constraints of each model.)
- Anthropic. Claude Documentation. Anthropic, accessed 2026. (Official documentation on how to use Claude and its constraints.)
- International Electrotechnical Commission. IEC 62304:2006 Medical device software — Software life cycle processes. IEC, 2006 (Amendment 1: 2015). (The life-cycle standard for medical device software; the primary source for traceability requirements.)
- Ministry of Health, Labour and Welfare. Act on Securing Quality, Efficacy and Safety of Products Including Pharmaceuticals and Medical Devices (PMD Act), Articles 66, 68, and 68-2. (The respective provisions on exaggerated advertising, unapproved advertising, and sales information provision activities.)