AI Programming 06 — Generating Documentation: The Limits of Turning Code into Prose | AI Programming | Pharmaceutical Advertising Regulation: Material Creation, Review & Use in Japan

Generating Documentation: The Limits of Turning Code into Prose

Accuracy and staleness in automatic generation, and the discipline of the ADR

Generative AI (= AI that writes text and code automatically) has become good at producing explanatory documents from code. A list of functions, the specification of an API (= the interface through which programs talk to each other), a set of usage instructions: things people used to write by hand now take shape in minutes. But "being able to produce it" and "producing a document that is correct and usable" are two different things. Automatic generation runs into three walls: accuracy, staleness, and the fact that "why it was built this way" leaves no trace in the code. This installment looks squarely at those limits, then shows how far the work can be carried in practice using tools such as the ADR (= architecture decision record) and Diátaxis (= a framework that sorts documents into four types). The discipline involved is the same one that governs records management in pharma.

01 What kinds of "documentation" are there in the first place?

Before talking about generating documentation, we should settle that "document" is not one thing. Software documentation splits into at least the following three layers. Who reads them, how much accuracy they demand, and how often they change all differ by layer.

Documents tied closely to the code — descriptions of functions and classes, the input/output specification of an API. They sit right beside the code and should change whenever the code changes.
Documents about how to use it — setup instructions, tutorials (= learning by doing), frequently asked questions. Read by someone using the tool for the first time.
Documents of judgment — why this design was chosen, which options were discarded. They answer the "why?" that you yourself, or whoever inherits the work, will ask months later.

What generative AI is good at is the first two layers. Given the code, it can produce the function descriptions and a first draft of the instructions. But the third layer, the record of "why we did it this way," is not written in the code. Discarded options and the constraints of the moment leave no trace there. This is the first limit of automatic generation.

02 Generation by AI: what it can and cannot do, and how far

What documents can generative AI actually produce from code? Let us look precisely. Using a tool well starts with neither overrating nor underrating it.

Doc 01

API reference

"can do"

Mechanically extracting arguments, return values, and types from the code and listing them. Close to transcribing facts, and the area AI produces most reliably.

Doc 02

Draft of usage docs

"draft only"

It can produce a rough draft of tutorials and how-to guides. But "where the reader will stumble" is hard for AI to see, and a human hand is required.

Doc 03

Summary / translation

"strong suit"

Condensing a long technical document, turning an English explanation into Japanese — this kind of restating is a central strength of AI.

Doc 04

The reason for a design

"cannot do"

"Why this approach was chosen" is information not written in the code. AI will compose a plausible-sounding reason, but it is not fact.

Line the four up and the boundary comes into view. Transcribing and restating facts written in the code is a strength; reconstructing information that is not in the code is a weakness. Ask AI to write the reason for a design and you get a passage that reads well but tends to be confabulation unrelated to the actual decision. Get this wrong and a mistaken reason ends up preserved as official documentation.
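The "can do" case above, mechanical extraction of arguments, return values, and types, can be sketched with Python's standard `inspect` module. This is a minimal illustration under stated assumptions, not a full reference generator: `describe_function` and the sample function `scale` are hypothetical names introduced only for this example.

```python
import inspect
from typing import get_type_hints


def describe_function(func) -> str:
    """Build a plain-text reference entry from a function's signature."""
    sig = inspect.signature(func)
    hints = get_type_hints(func)
    lines = [f"{func.__name__}{sig}"]
    for name, param in sig.parameters.items():
        # Unannotated parameters are reported as "unspecified".
        type_name = getattr(hints.get(name), "__name__", "unspecified")
        if param.default is inspect.Parameter.empty:
            default = ""
        else:
            default = f" (default: {param.default!r})"
        lines.append(f"  {name}: {type_name}{default}")
    return_type = getattr(hints.get("return"), "__name__", "unspecified")
    lines.append(f"  returns: {return_type}")
    return "\n".join(lines)


# Hypothetical function used only to illustrate the extraction.
def scale(value: float, factor: float = 2.0) -> float:
    return value * factor


print(describe_function(scale))
```

Everything the sketch reports is a fact present in the code. Nothing in the signature says why `factor` defaults to 2.0, which is exactly the kind of information the "cannot do" case refers to.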

03 Verifying accuracy: seeing through "plausible"

The most frightening thing about auto-generated documents is that errors hide inside natural-sounding prose. AI optimizes not for "correctness" but for "plausibility." So its errors, too, are readable and convincing. A smooth AI error slips past more easily than a person's rough sentence. This paradox has to be understood first.

The basis of verification comes down to checking the document against the code. Does the generated explanation match how the code actually behaves? Are the argument types correct; does it work as the written steps describe? Ideally, you automate this checking by machine. For example, put in place a mechanism that actually runs the code examples in the document and tests whether the result is what was written (= documentation testing), and you can mechanically catch gaps between explanation and code.
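One way to sketch documentation testing is with Python's standard `doctest` module, which runs the examples embedded in a docstring and compares the actual output with what was written. The functions `normalize_name` and `stale_example` and the helper `run_docstring_examples` are hypothetical names for this illustration; the second function deliberately carries an outdated example so the check has something to catch.

```python
import doctest


def normalize_name(raw: str) -> str:
    """Collapse whitespace and lowercase a name.

    >>> normalize_name("  Sample   Product ")
    'sample product'
    """
    return " ".join(raw.split()).lower()


def stale_example(x: int) -> int:
    """The example still claims doubling, but the code now triples.

    >>> stale_example(2)
    4
    """
    return x * 3


def run_docstring_examples(func) -> doctest.TestResults:
    """Run the docstring examples of one function and return the counts."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # The function itself is the only name the examples need to see.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


for func in (normalize_name, stale_example):
    result = run_docstring_examples(func)
    print(f"{func.__name__}: attempted={result.attempted} failed={result.failed}")
```

The first function passes; the second reports one failure because the written result no longer matches the code. A smooth, convincing explanation would not survive this check, which is the point of running it by machine.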

This is the same thinking as records management in pharma. Manufacturing and quality records for drugs are held to a yardstick the FDA sets out: ALCOA (= Attributable "who," Legible "readable," Contemporaneous "at the time," Original, Accurate; the principles of data integrity). The same questions bite in documentation generation. On whose judgment does this statement rest (Attributable)? Does it match the source (Accurate)? Does it capture the state at the moment it was written (Contemporaneous)? The smoother the prose AI produces, the more these questions must be applied by both machine and human.

04 The staleness problem: leave a document alone and it becomes a lie

Even if you produce it accurately, the next wall waits. The code keeps changing, but the document does not catch up on its own. An explanation correct today is left standing as an old explanation once the code changes next month. And the reader cannot tell the document is out of date. A wrong document is more dangerous than no document, because people believe "it is written down, so it is correct."

State	Effect on the reader	How to think about it
No document	Inconvenient, but they read the code directly, so misunderstanding is rare	Fill the gap with minimal generation
A correct document	Ideal. Faster understanding, fewer mistakes	Keep updating it together with the code
An old (stale) document	Most dangerous. They trust what is written and err	Make "as of when" explicit and automate generation

The best remedy for staleness is a mechanism that keeps documents in the same place as the code and regenerates them whenever the code changes. This idea is called docs-as-code (= managing documents with the same tools and procedures as code). Generative AI helps make this "regeneration" fast. Instead of a person rewriting everything, AI drafts the changed parts and a person checks them. But which documents may be regenerated by machine and which must be managed by a human depends on the type of document, a question taken up in sections 05 and 06.
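A minimal sketch of how "as of when" can be made checkable: store a fingerprint of the source alongside each generated document, and compare it with the current code. The names `GeneratedDoc`, `fingerprint`, and `is_stale`, and the sample date, are assumptions made for this illustration; real docs-as-code setups usually rely on version control and build tooling rather than a hand-rolled hash.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(source_text: str) -> str:
    """Short, stable hash of the source a document was generated from."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class GeneratedDoc:
    body: str
    source_fingerprint: str  # hash of the code at generation time
    generated_on: str        # the "as of when" shown to the reader


def is_stale(doc: GeneratedDoc, current_source: str) -> bool:
    """True when the code has changed since the document was generated."""
    return doc.source_fingerprint != fingerprint(current_source)


# Hypothetical before/after versions of the same function.
old_source = "def area(w, h):\n    return w * h\n"
new_source = "def area(w, h, scale=1):\n    return w * h * scale\n"

doc = GeneratedDoc(
    body="area(w, h): returns w * h",
    source_fingerprint=fingerprint(old_source),
    generated_on="2024-01-15",  # illustrative date only
)

print(is_stale(doc, old_source))  # False
print(is_stale(doc, new_source))  # True
```

A check like this does not say whether the document is wrong, only that it can no longer be assumed right. That is enough to trigger regeneration and a human review of the changed parts.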

05 The ADR: a discipline for recording "why we did it this way"

As we saw in sections 01 and 02, what generative AI is worst at is "the reason for a design." So how should that judgment be recorded? Here the ADR (= Architecture Decision Record) helps. It is a very lightweight format proposed by Michael Nygard in 2011 that sums up a single design decision in one short document.

The contents of an ADR are strikingly spare. Roughly, you write only these four things.

Context — in what situation, and what had to be decided.
Decision — what was chosen.
Rationale and discarded options — why it was chosen, and why the other candidates were rejected.
Consequences — what constraints or benefits the decision produces down the line.

The heart of an ADR is that information that leaves no trace in the code is recorded by the person who made the judgment, at the time. Discarded options and the constraints of the day cannot be reconstructed afterward. That is exactly why a human writes it rather than having AI compose it. Put the other way around: as long as an ADR exists, generative AI can read it and draft a summary or related documents. A human leaves "the judgment as fact," and AI fleshes out "the explanation around it" — this division of labor works for documents of judgment. Number them and arrange them in time order, and when a decision is changed, mark the old ADR as "superseded" and keep it. Stacking history rather than deleting it is the same thinking as change-control records in pharma.
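The four items and the "superseded, not deleted" rule can be sketched as a small data structure. This is an illustrative model only: the field names follow the list above, the `status` and `superseded_by` fields and the `supersede` helper are assumptions for this sketch, and the sample decisions are invented. In practice ADRs are usually short text files kept beside the code.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ADR:
    number: int
    title: str
    context: str       # the situation and what had to be decided
    decision: str      # what was chosen
    rationale: str     # why, including the discarded options
    consequences: str  # constraints and benefits that follow
    status: str = "accepted"
    superseded_by: Optional[int] = None


def supersede(log: list[ADR], old_number: int, new_adr: ADR) -> list[ADR]:
    """Return a new log: the old record is kept and marked, the new one appended."""
    updated = [
        replace(adr, status="superseded", superseded_by=new_adr.number)
        if adr.number == old_number
        else adr
        for adr in log
    ]
    return updated + [new_adr]


# Invented example decisions, written by the people who made them.
adr1 = ADR(1, "Store documents as plain text", "Docs lived in a wiki.",
           "Keep docs in the repository.", "Wiki rejected: no review step.",
           "Docs change through the same review as code.")
adr2 = ADR(2, "Generate reference pages automatically", "Manual pages drifted.",
           "Regenerate reference on every change.", "Manual upkeep rejected: too slow.",
           "Reference pages need an automated check.")

log = supersede([adr1], old_number=1, new_adr=adr2)
for adr in log:
    print(adr.number, adr.status, adr.superseded_by)
```

The old record stays in the log with its original rationale intact, so the history of the judgment remains readable after the decision changes.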

06 Diátaxis: sorting documents into four types

Another discipline that works in practice is Diátaxis (= a framework that sorts documents into four types). Organized by Daniele Procida, it divides any technical document into four quadrants by the reader's purpose. The two axes are "to learn" versus "to work," and "hands-on" versus "to understand."

Type	The reader's purpose	Fit with generative AI
Tutorial	A newcomer learns by doing	A draft is possible, but a human supplies the stumbling points
How-to	Carrying out a specific task by the steps	Good at extracting steps. Watch out for stating prerequisites
Reference	Looking up an exact specification on the spot	Most suited to auto-generation. Close to transcribing facts
Explanation	Understanding the background and the "why"	Needs human judgment such as an ADR. Watch out for confabulation

What Diátaxis teaches is that the four must not be mixed. Cram specification detail into a tutorial and the beginner gets lost; mix explanation into a reference and it becomes hard to look things up. And from the standpoint of auto-generation, this classification is a signpost too. Reference may be left to the machine; explanation must be managed by a human. Sort into the four types and the line between what to regenerate with AI and what a human guards comes into view directly. The "documents that may be regenerated by machine" from section 04 are mainly reference and how-to.
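The line drawn above can be written down as an explicit policy so that tooling and people apply it the same way. The sketch below is one possible encoding: `DocType`, `HANDLING`, and `may_auto_regenerate` are hypothetical names, and the policy strings simply restate the table in this section.

```python
from enum import Enum


class DocType(Enum):
    TUTORIAL = "tutorial"
    HOW_TO = "how-to"
    REFERENCE = "reference"
    EXPLANATION = "explanation"


# Handling per type, restating the table above (wording is illustrative).
HANDLING = {
    DocType.TUTORIAL: "AI drafts; a human supplies the stumbling points",
    DocType.HOW_TO: "AI extracts the steps; a human checks the prerequisites",
    DocType.REFERENCE: "regenerate by machine and verify against the code",
    DocType.EXPLANATION: "a human writes it, based on ADRs",
}


def may_auto_regenerate(doc_type: DocType) -> bool:
    """Reference and how-to are the types this article treats as regenerable."""
    return doc_type in {DocType.REFERENCE, DocType.HOW_TO}


for doc_type in DocType:
    print(f"{doc_type.value}: auto={may_auto_regenerate(doc_type)}; {HANDLING[doc_type]}")
```

Fixing the type first means the question "may AI regenerate this?" is answered by a lookup rather than decided case by case.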

07 Operation: keeping generation and verification running

How do you build all this into day-to-day development? Generate a document once and it goes stale as soon as the code changes. What matters is making generation, verification, and updating into a mechanism that keeps running. The following four are operating principles that work in practice.

Principle 01

Decide the type first

"sort, then build"

Decide up front, on the four Diátaxis quadrants, what kind of document this is. Once the type is fixed, whether it can be left to AI or must be written by a human is fixed.

Principle 02

Keep it beside the code

"docs-as-code"

Manage documents in the same place and by the same procedure as code. Make regenerating documents when the code changes part of the development flow.

Principle 03

Verify by running examples

"does it run as written"

Actually run the code examples in the document and check by machine whether the result matches what was written. Catch gaps between explanation and code automatically.

Principle 04

Humans record the judgment

"ADRs by hand"

Do not have AI compose the reason for a design; the person who made the judgment records it in an ADR. Judgment as fact is the one thing not left to the machine.

What the four share is the idea of separating "the part left to the machine" from "the part humans guard" in advance. Transcribing and verifying facts to the machine; judgment and covering the stumbling points to humans. With this division in place, generative AI greatly lightens the burden of making documents. Leave everything to AI without dividing the work and smooth errors and confabulation pile up, and trust in the documents collapses instead.

08 Connections to other chapters on this site

This installment deepens if you read it alongside the following chapters.

AI Programming Vol. 5 — automating tests. The mechanism for verifying a document's code examples is of a piece with the thinking behind testing.
AI Programming Vol. 7 — AI Security and Vulnerabilities — the next installment, which handles the risks of generated output. Staleness of documents, too, becomes a different kind of risk if left alone.
Material Review series — the accuracy of records and how to keep a change history share the same questions as ADRs and docs-as-code.

In closing

Generative AI has greatly lightened the work of turning code into documents. Transcribing an API reference, drafting instructions, summarizing and translating: these are areas you may leave to it. But three walls remain. The first is accuracy: because AI writes "plausible errors" in natural prose, verification against the code is required. The second is staleness: a document that is not regenerated and rechecked when the code changes is left standing as an old explanation that readers still trust. The third is that "why we did it this way," which is not written in the code, cannot be produced by AI. Here there is nothing for it but to have a human leave an ADR.

Draw the limits correctly and the tool works powerfully. Sort documents into the four Diátaxis types: reference to the machine, explanation to humans. With docs-as-code, keep documents beside the code and keep generation and verification running. Leave the recording of judgment to humans alone. Drawing this line is the foundation for producing documents quickly and keeping them from going stale. It is also the same discipline as records management in pharma: the practice of recording precisely who decided what, when, and on what basis. Next time, we move on to the risks that generated code itself carries: AI security and vulnerabilities.

Key Points — three to take away

Generative AI is good at transcribing and restating facts written in the code (API references, summaries, translations) but cannot produce "why we did it this way," which is not written in the code. Anything it writes on that point is plausible confabulation, so you must not have AI write the reason for a design.
The two big risks of auto-generation are errors hidden in smooth prose (accuracy) and staleness that fails to keep up when the code changes. The remedy is to keep documents beside the code (docs-as-code), run the code examples and verify by machine, and keep generation running.
Sort documents on the four Diátaxis quadrants (tutorial, how-to, reference, explanation) and the line between what to leave to the machine and what humans guard comes into view. Record the reason for a design as an ADR, written by the person who made the judgment, at the time — stacked as history rather than deleted.

Sources & References

Daniele Procida. Diátaxis documentation framework. diataxis.fr, 2017–. (Primary source for the framework that sorts technical documents into four types)
Michael Nygard. Documenting Architecture Decisions. 2011. (The original article proposing the ADR = architecture decision record)
Andrew Etter. Modern Technical Writing: An Introduction to the Discipline of Technical Writing. 2016. (An introduction that explains the practice of docs-as-code)
U.S. Food and Drug Administration. Data Integrity and Compliance With Drug CGMP: Questions and Answers — Guidance for Industry. FDA, 2018. (ALCOA and other principles for the accuracy and integrity of records)
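Director-General, Pharmaceutical Safety and Environmental Health Bureau, Ministry of Health, Labour and Welfare (厚生労働省医薬・生活衛生局長). Guidelines for Sales Information Provision Activities for Prescription Drugs (医療用医薬品の販売情報提供活動に関するガイドライン). Ministry of Health, Labour and Welfare, 2018. (Primary source for the sales information provision guidelines)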
