AI Material Review 07 — Review Records, the Audit Trail, and AI: All the Way to CAPA | AI Material Review | Pharmaceutical Advertising Regulation: Material Creation, Review & Use in Japan

Review Records, the Audit Trail, and AI ── All the Way to CAPA ── Toward a review you can trace and reproduce

In material review there are three acts: "we looked," "we corrected," and "we cleared it." Unless all three are captured in a record, they are treated as if they never happened. When someone later asks whether the review was done properly, the only thing you can lean on is the record, not memory. Being able to trace who judged what, when, against which basis, and what they had corrected is what we call an audit trail (= a continuous record of when, who, and what was changed). And once a deviation occurs, you do not merely fix it on the spot; you cut off its cause so the same shoot never sprouts again. This is corrective and preventive action, CAPA (= Corrective Action and Preventive Action, the mechanism of fixing and of preventing recurrence). This installment covers the design of records that make review something you can "trace and reproduce." Once you put AI into that picture, both the volume of records and the difficulty of reproducibility rise a notch. What does it take to preserve an AI's judgment in a form a human can verify? We build it up in order.

01 Why Records Matter ── The Record Is the Only Proof You "Did It"

A review record is not the tidying-up that follows the clerical work. It is part of the act of reviewing itself. The reason is simple: judgment is invisible. A reviewer reads the material, holds it against the scope of approval, notices that a passage reads as overstating efficacy, and has it corrected ── this whole sequence of mental motion vanishes unless it is written down. What has vanished cannot be verified afterward.

So a record does at least three jobs. First, accountability. "Why was this material cleared?" can be answered later, even without the people who were there. Second, material for preventing recurrence. Unless you can count what kinds of deviations occur and how often, you cannot devise countermeasures. Third, an educational asset. The accumulation of past judgments becomes teaching material through which a new reviewer learns "here is where this kind of expression turns dangerous."

Conversely, a review with weak records is fragile even when it looks like it is running smoothly. The moment the person in charge changes, the standard of judgment wavers, the same point gets missed on a different piece of material, and no one can explain why something was cleared. Records are the work of moving review from an individual's memory to an organization's asset.

The core: Without records, review can claim it was "done" but cannot prove it was "done right." Preserving the three-piece set of judgment, basis, and disposition ── this is the smallest unit of a traceable review.

02 The Audit Trail ── Who, When, and What Was Changed

Within records, the audit trail carries particular weight. Whereas an ordinary record preserves "here is how it finally ended up," the audit trail preserves each and every change that led there. A single sentence about an indication overstepped the scope of approval slightly in the first draft, drew a comment in review, was corrected in the second draft, and was finalized in the third ── every one of these round trips is traceable, complete with timestamps and the person in charge.

Why keep the history of changes as well? To prevent tampering and concealment. If you are shown only the final version, you cannot tell whether a dangerous expression was removed along the way or was never there to begin with. In the world of pharmaceutical quality records, this way of thinking has been organized under the yardstick of ALCOA+ (= a principle built from Attributable ⟨you can tell whose record it is⟩, Legible ⟨readable⟩, Contemporaneous ⟨recorded on the spot⟩, Original ⟨the original⟩, and Accurate ⟨correct⟩, plus Complete, Consistent, Enduring, and Available). For electronic records, the United States' 21 CFR Part 11 requires secure, computer-generated, time-stamped audit trails in which a change does not obscure what was previously recorded.
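One common way to make a quiet rewrite detectable is to chain each entry to the hash of the entry before it. The following is a minimal sketch of that idea, not a statement of what 21 CFR Part 11 prescribes; the field names and the `append_entry` and `verify_chain` helpers are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    # Serialize with sorted keys so identical content always hashes identically.
    payload = json.dumps(body, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, who: str, when: str, change: str) -> dict:
    """Append an entry that carries the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else ""
    body = {"who": who, "when": when, "change": change, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True if every entry still matches its own hash and its predecessor."""
    prev_hash = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical history of one sentence across three drafts.
log = []
append_entry(log, "author.a", "2024-04-01T09:00:00+09:00", "draft 1 created")
append_entry(log, "reviewer.b", "2024-04-02T14:30:00+09:00", "flagged indication wording")
append_entry(log, "author.a", "2024-04-03T10:15:00+09:00", "draft 2: wording corrected")

print(verify_chain(log))          # True: history is intact
log[1]["change"] = "no comment"   # simulate a rewrite after the fact
print(verify_chain(log))          # False: the edit breaks the chain
```

This sketch only detects edits, reordering, and removals from the middle of the history. Dropping entries from the tail goes unnoticed unless the latest hash is also kept somewhere the editor cannot reach.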

Brought into material review, the questions an audit trail must answer are fixed.

Question 01: Who ("Attributable")
Which reviewer, which author made that judgment or correction. Not an anonymous "someone," but responsibility identifiable by name.

Question 02: When ("Contemporaneous")
When the judgment was made and when the correction happened. Not written up together after the fact, but stamped with the time it was recorded on the spot.

Question 03: What ("Original")
Which expression was changed, and how. Both the before and the after remain, with nothing in between erased.

Question 04: Why ("Accurate")
The reason for the correction. Against which provision, which piece of approved information it was corrected. The basis for the judgment is linked to it.

Only when these four are in place can you say a review is "traceable." Conversely, if even one is missing, you may have a record but you cannot use it for verification. A comment with no name, a correction with no time, a deletion with no reason recorded ── the common holes in records almost always come from one of these four being absent.
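The four questions can be checked mechanically before an entry is accepted. Here is a minimal sketch, assuming a hypothetical `TrailEntry` record with one field per question; it catches only blank fields and says nothing about whether the content is any good.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrailEntry:
    # Hypothetical field names, one per question.
    who: str    # named reviewer or author
    when: str   # timestamp recorded at the time of the act
    what: str   # the wording before and after the change
    why: str    # provision or approved information the change rests on


def missing_fields(entry: TrailEntry) -> list:
    """List the questions this entry leaves unanswered (blank or whitespace only)."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


entry = TrailEntry(
    who="reviewer.b",
    when="2024-04-02T14:30:00+09:00",
    what="'Works for all patients' -> 'Indicated for adult patients with condition X'",
    why="",   # a correction with no reason recorded
)
print(missing_fields(entry))   # ['why']
```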

03 Corrective and Preventive Action (CAPA) ── Don't Stop at Fixing It on the Spot

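When a deviation is found, the response comes in two stages. First, fix the material in front of you. This is correction. Next, trace why that deviation arose, cut off the cause itself, and prevent recurrence. This is prevention. Together they are handled under CAPA, which ICH Q10, the guideline describing the pharmaceutical quality system, names as one of that system's core elements. (In ICH Q10's stricter vocabulary, corrective action removes the cause of a deviation that has already been detected, and preventive action removes the cause of a potential one. This installment folds both into "prevention" and uses "correction" for the on-the-spot fix.)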

Where the field stumbles is almost always on the preventive side. Correction is visible, so it always gets done. But prevention takes the effort of digging out the root cause, so it tends to be skipped. If you fix that one sentence in the material and call it done, the same type of deviation shows up again, in a different piece of material by a different person. When you have patched the surface three times and the same problem surfaces a fourth, that is a sign you are fixing the wrong place. That is the moment to stop patching and go look at the cause.

Correction (fix it on the spot)	Prevention (cut off the cause)
Revise this material's overstated sentence into an expression within the scope of approval	Trace why that expression arose (a template boilerplate? the author's insufficient understanding?)
For a flagged missing citation, supply the relevant passage of the package insert as the basis	Find the mechanism where the step of attaching citations tends to drop out, and build it into the checklist
The target is "this one material," this time	The target is "all the materials to come that share the same cause"
Fast. The effect is only right in front of you	Slow. The effect reaches every future case

What feeds this prevention is the record. Only when you count individual deviations and see which type keeps recurring can you finally know "fix this and the recurrence stops." The monitoring reports on sales information provision activities published by Japan's Ministry of Health, Labour and Welfare operate on exactly this idea, collecting anonymized deviation cases type by type. Bundle your own records the same way and you get an in-house map for preventing recurrence.
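Counting by type is simple once deviations are recorded with a consistent label. The sketch below assumes hypothetical records of the form (material id, deviation type) and an arbitrary cutoff of three; both the labels and the threshold are illustrative and would be set by the organization.

```python
from collections import Counter

# Hypothetical deviation records: (material id, deviation type).
deviations = [
    ("M-001", "overstated efficacy"),
    ("M-002", "missing citation"),
    ("M-003", "overstated efficacy"),
    ("M-004", "overstated efficacy"),
    ("M-005", "missing citation"),
    ("M-006", "off-label implication"),
]

RECURRENCE_THRESHOLD = 3  # assumed cutoff for "this keeps coming back"


def recurring_types(records, threshold):
    """Return (type, count) for types seen at least `threshold` times, most frequent first."""
    counts = Counter(dev_type for _, dev_type in records)
    return [(t, n) for t, n in counts.most_common() if n >= threshold]


print(recurring_types(deviations, RECURRENCE_THRESHOLD))
# [('overstated efficacy', 3)]
```

A type that crosses the threshold is a candidate for root-cause work, not proof of a single shared cause; that still has to be traced by a person.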

One boundary line, so you do not misjudge the scope of what to record. What material review records is, strictly, whether the substance of the information provided stays within the scope of approval and the regulations. What a medical representative (= MR, Medical Representative) handles is the provision of information about the drug; price, stock, delivery, ordering, and price negotiation are not their remit. Those belong to the transactions and logistics between the pharmaceutical wholesaler and hospital procurement ── a separate line from what material review records track. When you run CAPA, too, keeping them unmixed makes the isolation of causes more accurate.

04 Traceability ── Can You Trace From a Sentence Back to Its Basis?

Traceability (= the ability to trace) is the backbone that connects the audit trail and CAPA. The meaning is this ── a given sentence in a material can be traced bidirectionally to which piece of approved information, which citation it is linked to. From a sentence stating an indication, you can go down to the relevant passage of the package insert that is its basis (backward trace). Conversely, when approved information is revised, you can pull up every sentence of material affected by it (forward trace).

Where this pays off in material review is when approved information changes. The wording of an indication is revised, or safety information is updated. At that moment, if traceability is wired up, "the materials hanging off that information" come out in a list. If it is not wired up, your only option is to have people re-read every piece of material by hand. In an organization holding hundreds of materials, that difference becomes fatal.
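The two directions of trace amount to a mapping and its inverse. What follows is a minimal sketch, assuming hypothetical identifiers for materials, sentences, and package-insert passages; a real system would hold these links in a database rather than in a dictionary.

```python
from collections import defaultdict

# Hypothetical links: (material id, sentence id) -> ids of approved-information passages.
links = {
    ("M-001", "s3"): ["PI-4.1"],
    ("M-001", "s7"): ["PI-11.2"],
    ("M-002", "s1"): ["PI-4.1", "PI-5.3"],
}


def backward_trace(sentence):
    """From a sentence in a material to the passages that serve as its basis."""
    return links.get(sentence, [])


def build_forward_index(link_map):
    """Invert the links: passage id -> every sentence that depends on it."""
    index = defaultdict(list)
    for sentence, passages in link_map.items():
        for passage in passages:
            index[passage].append(sentence)
    return index


forward = build_forward_index(links)

print(backward_trace(("M-001", "s3")))   # ['PI-4.1']
print(forward["PI-4.1"])                 # [('M-001', 's3'), ('M-002', 's1')]
print(backward_trace(("M-003", "s2")))   # []: a sentence with no thread attached
```

When passage `PI-4.1` is revised, the forward index returns the affected sentences as a list, which is the difference between a lookup and re-reading every material by hand.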

The "missing citations and basis" we covered in Vol. 3 was precisely the state where traceability is broken. The number is there, but you cannot trace where that number came from. The claim is there, but it is not linked to which part of the scope of approval it corresponds to. Restated from the recordkeeping angle, to attach a citation is to string a single traceable thread. Review is also the work of confirming that all those threads are connected.

05 AI Logs and Reproducibility ── Turning Wobbly Judgment Into Something Traceable

Up to here we have been talking about records of human review. Once you build AI into review, there is one more thing to record ── the AI's own judgment. If an AI picked up "this sentence reads as overstating efficacy," then the way it picked that up also becomes a subject of record. Same reason as for preserving human judgment. Unless you can later verify "why did the AI flag this?", the AI's output is unusable.

The tricky part is that the judgment of a large language model (= an AI trained on massive amounts of text, hereafter LLM) wobbles. A conventional, deterministic program returns the same answer to the same input. An LLM samples each next word from a probability distribution, so showing it the same material twice can return slightly different comments. Reproducibility, meaning the same input yields the same judgment, is not guaranteed out of the box.
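The wobble can be measured before you decide how far to trust it. Below is a minimal sketch that runs the same material through a reviewer several times and reports how often the most common verdict came back. The two reviewer functions are stand-ins invented for this example; a real setup would call the AI service at that point.

```python
import random
from collections import Counter


def agreement_rate(review_fn, material: str, runs: int = 5) -> float:
    """Share of runs that returned the most common verdict (1.0 = fully stable)."""
    verdicts = [review_fn(material) for _ in range(runs)]
    most_common_count = Counter(verdicts).most_common(1)[0][1]
    return most_common_count / runs


# Stand-in for a deterministic check: same input, same verdict.
def stable_reviewer(material: str) -> str:
    return "flag" if "all patients" in material else "pass"


# Stand-in for a sampling-based model whose verdict can differ between runs.
_rng = random.Random(0)


def wobbly_reviewer(material: str) -> str:
    return _rng.choice(["flag", "pass"])


sample = "Works for all patients"
print(agreement_rate(stable_reviewer, sample))            # 1.0
print(agreement_rate(wobbly_reviewer, sample, runs=10))   # somewhere between 0.5 and 1.0
```

A rate below 1.0 does not say which verdict is right, only that the judgment is not yet reproducible for that input.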

So, to make AI hold up as a record of review, at minimum you preserve and fix the following.

Preserve 01: Model version (which AI)
The name and version of the model used. Since judgment can change when the version changes, record "which model, as of when."

Preserve 02: Input and instruction (what you fed it)
The version of the material put through review, and the prompt (= the instruction) given to the AI. Reproducing the same result requires a complete record of the input.

Preserve 03: The output itself (what it returned)
The full text of the comments the AI picked up. Not just the summary or conclusion but the raw output, so a human can re-read it later.

Fix 04: Suppressing the wobble (can you reproduce it)
Pin the settings that govern output variability (sampling temperature, for example) at low values, and do not bump the model version on a whim. Run it so the same input converges toward the same judgment.
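The four items above map naturally onto one log record per AI run. This is a minimal sketch with hypothetical field names; it assumes the variability setting is a sampling temperature and that the material text itself is archived elsewhere under its version, with a hash stored here so the exact input can be confirmed later.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AIReviewLog:
    # Hypothetical schema, one group of fields per item above.
    model_name: str         # Preserve 01: which AI
    model_version: str
    material_version: str   # Preserve 02: what was fed in
    material_sha256: str
    prompt: str
    raw_output: str         # Preserve 03: the full text returned
    temperature: float      # Fix 04: variability setting in effect (assumed name)
    recorded_at: str


def make_log(model_name, model_version, material_version, material_text,
             prompt, raw_output, temperature, recorded_at) -> AIReviewLog:
    """Build a log record, hashing the material so the input can be verified later."""
    digest = hashlib.sha256(material_text.encode("utf-8")).hexdigest()
    return AIReviewLog(model_name, model_version, material_version, digest,
                       prompt, raw_output, temperature, recorded_at)


record = make_log(
    model_name="example-model",          # placeholder, not a real product name
    model_version="2024-03-01",
    material_version="draft-2",
    material_text="Works for all patients.",
    prompt="Flag any sentence that overstates efficacy.",
    raw_output="Sentence 1 may overstate efficacy: 'all patients'.",
    temperature=0.0,
    recorded_at="2024-04-02T14:31:05+09:00",
)

# One JSON object per line is easy to append to and to search afterwards.
line = json.dumps(asdict(record), ensure_ascii=False)
print(line)
```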

This way of thinking overlaps with the recordkeeping and traceability mindset required by ISO/IEC 42001:2023, the international standard for AI management systems within an organization. In short, the yardstick for records does not slacken just because it is AI. If anything, because the judgment wobbles, you need to keep logs more carefully than for human review. And what must not be forgotten: an AI log is a record of "the AI said this," not a record of the disposition itself. The final judgment and record are shouldered by a human.

06 The Limits ── A Record Is No Substitute for Judgment

Thickening the records breeds a sense of security. That security is the next pitfall. Records have clear limits.

First, records easily become form over substance. Once filling in the boxes becomes the goal, thin "confirmed" entries just pile up. Even if the audit trail has every field complete, if the judgment itself is shallow, all you get is more shallow judgment preserved in a traceable form. The fullness of records and the quality of review are separate matters.

Second, AI logs drown you in volume. AI does not tire, so it puts out comments without limit. Record all of them and a vast pile of logs accumulates that no one reads. A record no one reads is nearly the same as none. What to keep and what not to keep ── this selection is itself a subject of design.

Third, even the audit trail is not absolute. Lay down a mechanism that cannot be rewritten, and room to fudge it in operation still remains: entering the time after the fact, filling in reasons with boilerplate, keeping inconvenient versions out of the history as "drafts." Mechanisms narrow the loopholes; they do not zero them out. What speaks last is a culture of keeping records honestly.

The boundary line: A record only creates a state in which "having reviewed correctly" can be verified afterward; it does not make the review itself correct. Traceability (being able to trace) and correctness (the quality of judgment) must be confirmed separately. Do not mistake thick records for evidence of safety.

07 Connections to Other Chapters ── Ties to Deviation Detection, Rule Design, and Standardization

This installment's recordkeeping story connects in a single line to the other chapters of the series. Read together, they make the whole picture of AI material review three-dimensional.

Vol. 3 ── Deviation Detection ── How to preserve, as records, the shoots of deviation the AI picked up. Detection and recordkeeping are two sides of one coin; picking something up but not preserving it means no material for CAPA.
Vol. 4 ── Rule Design ── When and how the fences (guardrails) given to the AI worked is also a subject of record. The effectiveness of a rule only becomes clear once you count the logs.
Vol. 5 ── Standardizing Review ── The common language that reduces individual dependence crystallizes as the format of records. Preserve it in the same shape and the scatter in judgment comes into view.
Vol. 8 ── Reviewers' AI Literacy ── The ability to read AI logs and see through the wobble. The mechanism that preserves records and the human skill to use it well are two wheels of one cart.

In Closing

Review is only half done when you look, correct, and clear. The remaining half is preserving that in a traceable form. Precisely because judgment is invisible, the record becomes the evidence of judgment. Preserve "who, when, what, and why" in the audit trail; when a deviation appears, fix what is in front of you with correction, and cut off the cause with prevention. Keep the thread traceable from approved information down to a sentence in the material ── this is the skeleton of a traceable review.

Put AI in, and a new record ── the AI's judgment ── is added to this skeleton. And because the AI's judgment wobbles, you must preserve version, input, and output, suppress the variability, and stack logs more carefully than for human review. But no matter how thick the records grow, all they do is create a state where "whether the review was done right" can be checked afterward. Correctness itself is judged by a human and shouldered by a human. Next time we move to that human side ── the literacy of reviewers who read and master AI logs.

Key Points ── 3 to Take Away

Because judgment is invisible, without records you cannot prove "the review was done right." The audit trail preserves "who, when, what, and why" with the change history attached (the ALCOA+ and 21 CFR Part 11 mindset), and if any of these four is missing, you have a record but cannot use it for verification.
The response to a deviation is two-staged. Correction, which fixes what is in front of you, and prevention, which cuts off the cause to stop recurrence, are together called CAPA (ICH Q10). What feeds the often-skipped prevention is the record; only by counting deviations of the same type do you finally see "fix this and it stops." What material review tracks is the substance of the information provided; price, stock, delivery, and transactions are outside the MR's remit and form a separate line.
If you use AI in review, record the AI's judgment too. Since an LLM's judgment wobbles by probability, preserve the model version, the input, and the raw output, and fix the variability to approach reproducibility (the ISO/IEC 42001 mindset). But thick records are not evidence of safety. Being traceable (traceability) and being correct (the quality of judgment) are separate, and the final judgment is shouldered by a human.

Sources & References

ICH (International Council for Harmonisation). ICH Q10 Pharmaceutical Quality System. 2008. (An internationally harmonized guideline positioning CAPA (corrective and preventive action) and continual improvement at the core of the quality system. In Japan, notified as PFSB/ELD Notification No. 0219-1.)
Ministry of Health, Labour and Welfare. Ministerial Ordinance on Standards for Manufacturing Control and Quality Control of Drugs and Quasi-drugs (GMP Ordinance, MHLW Ordinance No. 179). Enacted 2004, amended 2021. (The domestic primary norm setting deviation management, CAPA, and the creation and retention of records.)
U.S. Food and Drug Administration. 21 CFR Part 11 — Electronic Records; Electronic Signatures. 1997. (Reliability requirements for electronic records; the source text requiring the audit trail to be preserved in a form that cannot be altered after the fact.)
Medicines and Healthcare products Regulatory Agency (MHRA). 'GXP' Data Integrity Guidance and Definitions, Revision 1. 2018. (Definition and operational guidance on data integrity (record completeness) under the ALCOA+ principle.)
MHLW, Director-General of the Pharmaceutical Safety and Environmental Health Bureau. Guideline on Sales Information Provision Activities for Prescription Drugs. PSEHB Notification No. 0925-1, September 25, 2018. (The HanteiG. A notice requiring recording, monitoring, and a review structure for information provision activities.)
Ministry of Health, Labour and Welfare. Report on the Drug Advertising Activity Monitoring Project (Sales Information Provision Activity Monitoring Project). (Anonymized material-deviation cases tabulated by type. A working example of using records as material for preventing recurrence.)
ISO/IEC. ISO/IEC 42001:2023 Information technology — Artificial intelligence — Management system. 2023. (The international standard setting recordkeeping, tracking, and risk management for operating AI within an organization.)
