AI Material Review 05 — Standardizing Review with AI: A Shared Language That Reduces Reliance on Individuals | AI Material Review | Pharmaceutical Advertising Regulation: Material Creation, Review & Use in Japan

Standardizing Review with AI ── A Shared Language That Reduces Reliance on Individuals ── A yardstick that brings everyone closer to the same judgment

Show the same material to two reviewers and their verdicts can split: one passes it, the other rejects it. Neither is wrong, yet the results differ. In material review, this property, where the judgment changes depending on who is looking, is called reliance on the individual. Last time, we discussed giving AI the rules it must follow up front. This time we go one step further and turn those rules into a shared language that every reviewer can use in the same way. We put a veteran's "somehow this feels risky" into words, make AI a yardstick that everyone shares, and bring every reviewer closer to the same judgment. What exactly does standardization align? What can AI carry there, and what can it not? We take these in order.

01 The Problem of Individual Reliance ── The Same Material, a Split Verdict

First, let us break down why verdicts split. Reliance on the individual does not happen "because the reviewer is careless." Even with a diligent review, the result changes from person to person for the reasons below. In fact, telling people to "be more careful" without understanding the source will not reduce the variation.

Source 01

Difference in experience

They see different things

A veteran has internalized past cited cases and notices a dangerous sentence by reflex. To a newcomer, the same sentence looks like plain explanation. Even for the same material, the number of dangers each person sees differs.

Source 02

Range in interpreting the standard

They draw the line differently

Is the single word "effective" a reasonable expression within the approved scope, or one step short of exaggeration? Because the standard itself has range, where each person draws the line shifts.

Source 03

Condition on the day

Even one person wavers

Rushing before a deadline, or when tired, even the same reviewer misses more. Individual reliance shows up not only as differences between people but as day-to-day wavering within the same person.

Source 04

Order and context of viewing

The flow shifts the bar

After reviewing a strict material just before, the bar tightens; after a run of light materials, it loosens. The judgment is dragged not only by the material's content but by the surrounding flow.

Read the monitoring project reports on sales information provision activities that the Ministry of Health, Labour and Welfare publishes, and you find that most deviations arise not from flashy sales pitches but from "the single sentence that slipped past because no one caught it." Blatant violations anyone can stop. What is dangerous is the borderline expression that is visible to some reviewers and not to others. Reducing individual reliance means making this "visible-or-not" visible to everyone in the same way. The destination of standardization lies here.

02 Putting the Standard into Words ── Turning "Somehow" into a Yardstick

The first step of standardization is neither AI nor a mechanism. It is putting the judgment inside a veteran's head into words. "This expression somehow feels risky": this hunch is genuine knowledge, built from years of cases absorbed through experience. But because it is not in words, no one but that person can use it. So we first draw it out.

Merely listing dangerous words is not enough to draw it out. What matters is writing the conditions. Take the word "effective." Ban it across the board and you also reject sentences that use it correctly within the approved scope. Instead, put into words when it is a problem and when it is not: permitted if backed by the approved efficacy and effect, not permitted if it hints at superiority beyond that. This written-out set of conditions for drawing the line is called a scoring rubric (a table listing what to look at and how).
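
The difference between a word ban and a written condition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Claim`, `word_ban`, and `conditional_rule` are hypothetical names, and the two boolean fields stand in for judgments that a reviewer or an earlier step would have to supply. It does not describe any real review tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One sentence from a material, plus two judgments supplied upstream."""
    text: str
    backed_by_approved_indication: bool  # assumption: decided by a human or earlier step
    implies_superiority: bool            # assumption: decided by a human or earlier step


def word_ban(claim: Claim) -> bool:
    """Blanket ban: flag any sentence that contains the word at all."""
    return "effective" in claim.text.lower()


def conditional_rule(claim: Claim) -> bool:
    """Rubric row: flag the word only when the written condition for a problem holds."""
    if "effective" not in claim.text.lower():
        return False
    return (not claim.backed_by_approved_indication) or claim.implies_superiority


# Hypothetical example sentences.
within_scope = Claim("Drug X is effective for the approved indication.", True, False)
beyond_scope = Claim("Drug X is more effective than other treatments.", True, True)

for claim in (within_scope, beyond_scope):
    print(claim.text, "| ban:", word_ban(claim), "| rubric:", conditional_rule(claim))
```

The blanket ban flags both sentences, while the conditional rule flags only the one that hints at superiority. That is the over-rejection the paragraph above warns about.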

The very act of putting it into words has a large effect. Even before introducing AI, when reviewers discuss "what to do in this case" and write the conditions side by side, the range of interpretation narrows. A standard that cannot be written cannot be shared. Conversely, a standard that can be written works the same way for newcomers and veterans alike. Standardization begins, before any tool is introduced, with aligning the words first.

Do not erase tacit knowledge: In the process of putting things into words, some part of a veteran's hunch that cannot yet be worded will always remain. This is not "something unnecessary" but raw material to be developed into a standard from here. Collect the cases where verdicts split, discuss why they split, and add conditions little by little. The rubric is not finished in one pass; it grows by feeding on cases.
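
Collecting the cases where verdicts split can be as simple as scanning a review log. The sketch below assumes a hypothetical log shape (material ID, then reviewer, then verdict); `split_cases` and the sample data are illustrative only.

```python
def split_cases(verdicts: dict[str, dict[str, str]]) -> list[str]:
    """Return the IDs of materials on which reviewers did not all give the same verdict."""
    return sorted(
        material_id
        for material_id, by_reviewer in verdicts.items()
        if len(set(by_reviewer.values())) > 1
    )


# Hypothetical review log: material ID -> reviewer -> verdict.
log = {
    "M-001": {"reviewer_a": "pass", "reviewer_b": "pass"},
    "M-002": {"reviewer_a": "pass", "reviewer_b": "reject"},
    "M-003": {"reviewer_a": "reject", "reviewer_b": "reject"},
}

to_discuss = split_cases(log)
print(to_discuss)  # only M-002 has a split verdict
```

Each ID returned is a candidate for the discussion described above: why did the verdicts split, and what condition should be added to the rubric.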

03 Holding Consistency with AI ── Applying the Same Yardstick, Untired, Unwavering

Only once the standard is in words does AI come into play. AI's greatest strength is not cleverness but consistency. A human tires by the thousandth item; AI applies exactly the same yardstick to the first item and the thousandth. Neither that day's deadline nor the flow of materials seen just before shakes AI's judgment. In principle, AI does not have the wavering from "condition on the day" and "order of viewing" raised in Section 1.

Replace the yardstick that differed by reviewer with a single shared yardstick called AI. This is the substance of using AI for standardization. Even when the person changes, even when the day changes, the same material gets the same finding back. This reproducibility shaves away the trickiest part of individual reliance.

Human review alone	AI as the shared language
The granularity of findings differs by reviewer	Everyone gets findings from the same perspective by the same standard
Even the same person's misses waver by the day	The same yardstick is applied to the first item and the thousandth
A veteran's judgment can be used only by that veteran	A newcomer can use the worded standard in the same way
Why it passed or was rejected is hard to keep on record	Which point of which standard was touched is recorded every time

One thing must not be misunderstood here. AI's consistency is not a guarantee of correctness. AI merely applies the given standard the same way every time. If the standard itself is wrong, AI repeats the mistake consistently. Being uniformly wrong is, if anything, harder to notice than being wrong in scattered ways. So before making it the shared yardstick, a human must confirm that the yardstick is graduated correctly. What AI aligns is "the same judgment," not "the correct judgment."
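
One way to check the graduations before sharing the yardstick is to run a draft rule over a small set of sentences whose correct verdict reviewers have already agreed on. The sketch below is an assumption-laden illustration: `disagreements`, `draft_rule`, and the reference sentences are all hypothetical, and a keyword match stands in for whatever actually applies the standard.

```python
from typing import Callable


def disagreements(
    rule: Callable[[str], bool],
    reference_cases: list[tuple[str, bool]],
) -> list[str]:
    """Return the sentences where the rule's flag differs from the human-confirmed label."""
    return [text for text, should_flag in reference_cases if rule(text) != should_flag]


def draft_rule(text: str) -> bool:
    """Hypothetical draft rule: flag any sentence containing the word 'best'."""
    return "best" in text.lower()


# Hypothetical reference set: (sentence, whether reviewers agreed it should be flagged).
reference = [
    ("The best treatment available.", True),
    ("Best taken after meals.", False),
    ("Works for every patient without exception.", True),
]

wrong = disagreements(draft_rule, reference)
for sentence in wrong:
    print("rule and reviewers disagree on:", sentence)
```

Because the rule is deterministic, it would make these same two errors on every run. A reference set confirmed by humans is what exposes a uniformly wrong yardstick.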

04 Operating the Checklist ── Making the Aligned Standard Run on the Floor

When the worded standard is dropped into a form usable in daily review, in most cases it becomes a checklist. "Is it backed by the approved efficacy and effect?" "Are there superlative or absolute expressions?" "Are there words that suggest a transaction?" ── the conditions written out in Section 2, each turned into a confirmation item. AI fills this list automatically for each material and shows the reviewer the points that were flagged.

Two points matter in operation. One is that the checklist is not a tool for filling in but a frame for preventing misses. When ticking the items becomes the goal itself, review turns into an empty formality. A pilot reads out the confirmation list before takeoff not to fill in the list but to prevent misses that come from habit. The material-review checklist is the same: the goal is not "the ticks" but "zero misses."

The pitfall of hollowing out: Let your guard down and the checklist degrades into "filling it in counts as having reviewed." Even when every item is ticked, whether each one was really examined is another matter. The more you let AI fill it in automatically, the more this danger grows. So treat AI's filled-in result not as "reviewed" but as "an organized set of candidates for a human to confirm." The asymmetry stated last time ── screening-out is automatic, the pass judgment is human ── holds exactly here too.
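
The asymmetry can be made hard to bypass by encoding it in how a material's status is computed. The following is a minimal sketch of one possible reading, not a description of any actual system: `Status` and `review_status` are hypothetical names, and the rule that a human verdict overrides AI findings is an assumption made for illustration.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    SCREENED_OUT = "screened out by AI, returned for revision"
    AWAITING_HUMAN = "no AI findings, still needs a human pass judgment"
    PASSED = "passed by a human reviewer"
    REJECTED = "rejected by a human reviewer"


def review_status(ai_finding_count: int, human_verdict: Optional[bool]) -> Status:
    """Combine AI findings with an optional human verdict (True = pass, False = reject)."""
    if human_verdict is not None:
        # Assumption: a human decision is final, including overriding a false positive.
        return Status.PASSED if human_verdict else Status.REJECTED
    if ai_finding_count > 0:
        return Status.SCREENED_OUT
    # A clean AI result is only a candidate; it never becomes PASSED on its own.
    return Status.AWAITING_HUMAN


print(review_status(0, None).name)
print(review_status(3, None).name)
print(review_status(0, True).name)
```

No code path reaches `PASSED` without a human verdict, so a fully ticked, AI-filled checklist cannot by itself count as "reviewed."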

The other is that the checklist goes stale. As approval information changes and new patterns of violation are found, you add and remove items. Who updated it when, and which version was used for the review ── recording this is continuous with last time's guardrails. Put the standard into words, drop it into a checklist, stamp the version. Standardization is not made once and done; it is something kept by running it continuously.
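
Stamping the version can be sketched as follows. The record layout is hypothetical (`ChecklistVersion`, `ReviewRecord`, and `stamp` are illustrative names, and the labels and dates are made up); the only library call relied on is `hashlib.sha256` from the Python standard library, used here to fingerprint the checklist wording.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChecklistVersion:
    label: str               # human-assigned version label
    updated_by: str          # who updated it
    updated_on: date         # when it was updated
    items: tuple[str, ...]   # the checklist wording, in order

    @property
    def fingerprint(self) -> str:
        """Short hash of the item wording, so silent edits become detectable."""
        joined = "\n".join(self.items).encode("utf-8")
        return hashlib.sha256(joined).hexdigest()[:12]


@dataclass(frozen=True)
class ReviewRecord:
    material_id: str
    reviewed_on: date
    checklist_label: str
    checklist_fingerprint: str


def stamp(material_id: str, reviewed_on: date, checklist: ChecklistVersion) -> ReviewRecord:
    """Record which checklist version a review was run against."""
    return ReviewRecord(material_id, reviewed_on, checklist.label, checklist.fingerprint)


# Hypothetical versions: v2 adds one item to v1.
v1 = ChecklistVersion("v1", "reviewer_a", date(2024, 4, 1), (
    "Is it backed by the approved efficacy and effect?",
    "Are there superlative or absolute expressions?",
))
v2 = ChecklistVersion("v2", "reviewer_b", date(2024, 10, 1),
                      v1.items + ("Are there words that suggest a transaction?",))

record = stamp("M-001", date(2024, 5, 15), v1)
print(record.checklist_label, record.checklist_fingerprint)
```

Storing both the label and the fingerprint is a design choice: the label is readable by people, while the fingerprint changes whenever the wording changes, even if someone forgets to bump the label.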

05 Ripple Into Education ── The Standard Becomes Teaching Material for Newcomers

Standardization has an effect beyond aligning review. A worded standard serves directly as teaching material for newcomers. Newcomers once stood beside a veteran and, at each finding, had to pick up by observation why the veteran had stopped that expression. It took time, and what was conveyed differed by teacher. When the standard is in words, a newcomer can first learn the yardstick itself.

Put AI in between and this learning speeds up. A newcomer reviews a material, and AI returns findings from another perspective. Set the two side by side, and the newcomer learns on the spot "the perspective I missed." Even without a veteran sitting beside them item by item, AI keeps returning findings along the standard, so the number of practice reps builds up. Individual reliance tends to be handed down across generations, but with an articulated standard and AI findings, that chain can be cut.
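
Setting the two sets of findings side by side amounts to a set comparison. The sketch below assumes findings are recorded as checklist item IDs; `compare_findings` and the IDs are hypothetical.

```python
def compare_findings(newcomer: set[str], ai: set[str]) -> dict[str, set[str]]:
    """Split checklist item IDs into shared findings and findings unique to each side."""
    return {
        "both": newcomer & ai,
        "missed_by_newcomer": ai - newcomer,
        "only_newcomer": newcomer - ai,
    }


# Hypothetical findings on one material, recorded as checklist item IDs.
newcomer_findings = {"superlative", "unapproved_indication"}
ai_findings = {"superlative", "transaction_wording"}

result = compare_findings(newcomer_findings, ai_findings)
for group, items in result.items():
    print(group, sorted(items))
```

The `only_newcomer` group deserves as much attention as the misses. It may be an over-flag by the newcomer or something the standard does not yet cover, and either way it is material for discussing the reason behind the line.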

Teach the "why" as well: There is one strong caution here. Simply swallowing AI's "this expression is risky" makes the newcomer stop thinking about the reason. A reviewer who only memorizes findings cannot judge a new expression that is not in the rubric. Teach each standard with "why this line" ── from which article, from which intent it comes. Handing over not only the yardstick's graduations but the grounds for those graduations is the core of development.

06 The Limit ── Standardization Cannot Erase the Delicate Judgment

We have described the effects of standardization so far; let us honestly set down the limit. What can be fully written into a standard is only the part where the line is clear. The difficulty of material review lies mostly on the border ── just inside the approved scope, or one step outside. This delicate judgment is settled by context, hard to put into words, and exactly where AI is weakest. Standardization does not erase this border.

If anything, aligning too much brings another harm. Apply a standard swung to the safe side mechanically, and expressions used correctly within the approved scope get rejected in bulk. The floor gets buried in the work of pushing back "this is a false positive," and eventually stops believing the standard itself. Conversely, once reviewers start thinking "I only need to look at what is in the rubric," no one stops a new kind of deviation that is not in the table. Standardization is not a tool for making thinking unnecessary.

The right place for standardization: What should be aligned goes only as far as sharing "the clear line" among everyone. Stop the clearly bad thing the same way whoever looks ── firm that up with AI and the rubric, and pour the freed-up strength into the delicate judgment at the border. The aim of standardization is not to replace human judgment but to concentrate strength where humans truly need to use their heads. The final judgment, as in every previous volume, is made by a human.

07 Connections to Other Chapters ── The Shared Language Ties the Whole Series Together

The standardization assembled in this volume connects to the other volumes of this series, and to the sister series, as follows. A shared language is not a standalone technique but the ground that lets everyone use the tools of each volume.

AI Material Review Vol. 4 ── Giving Rules to AI ── This volume lifts the guardrails assembled last time, from a yardstick that differed by reviewer to a shared language held by everyone. Hold the fence together, and individual reliance drops.
AI Material Review Vol. 3 ── Efficacy and Effect Within the Approved Scope ── The center of the standard to be standardized is drawing the line of the approved scope. Drop the thinking confirmed in Vol. 3 into words that everyone can use the same way.
AI Material Review Vol. 6 ── The Real Power and Limits of AI Review-Support Tools ── How far the tool that actually runs the standard aligned in this volume can go, and where it should return to humans. Examined concretely next time.
AI Programming Vol. 1 ── The Basics of Code Generation ── The idea of applying the same standard the same way every time to hold consistency shares the same skeleton as verifying AI-written code uniformly with automated tests.

Conclusion

Individual reliance is not the reviewer's negligence. Difference in experience, range in interpreting the standard, condition on the day, order of viewing ── even with a diligent review, judgments vary. Standardization means, to reduce this variation, first putting a veteran's "somehow" into words, turning it into a conditional yardstick, making that a single shared yardstick called AI, and sharing it among everyone. AI's strength is not cleverness but a consistency that neither tires nor wavers. Yet that consistency is not a guarantee of correctness, so whether the yardstick is graduated correctly is something a human confirms first.

The aligned standard runs as a checklist and also becomes teaching material to raise newcomers. But what must not be forgotten: what standardization can firm up is only "the clear line," while the delicate judgment at the border remains. Align too much and you reject even correct expressions, and reviewers stop thinking. The aim of standardization is not to replace human judgment but to concentrate strength where humans truly need to use their heads. Next time, we examine head-on how far an AI review-support tool that actually runs this standard can go, and where it runs out of strength.

Key Points ── Three to Take Home

Individual reliance (the property of review results splitting by person) is not the reviewer's negligence; it arises from difference in experience, range in interpreting the standard, condition on the day, and order of viewing. What is dangerous is not the flashy violation but the borderline sentence visible to some reviewers and not others. Standardization means making this "visible-or-not" visible to everyone the same way.
The first step of standardization is not AI but wording a veteran's "somehow risky" into a conditional yardstick (a scoring rubric). Make it a single shared yardstick called AI, and its untired, unwavering consistency returns the same finding to everyone. But AI's consistency is not a guarantee of correctness; if the standard is wrong, it errs uniformly. A human confirms the yardstick's graduations first.
The aligned standard runs as a checklist (as a frame to prevent misses, not the work of filling in) and also becomes teaching material to develop newcomers. But what standardization can firm up is only "the clear line"; the delicate judgment at the border remains context-dependent. Aligning too much rejects even correct expressions and stops thought. Screening-out is automatic, the final judgment is human ── this asymmetry does not change.

Sources & References

Ministry of Health, Labour and Welfare, Pharmaceutical Safety and Environmental Health Bureau, Compliance and Narcotics Division (commissioned project). Report on the Monitoring Project for Sales Information Provision Activities of Prescription Drugs. Each fiscal year. (A public primary source recording actual deviation cases with company names anonymized; lets you confirm typical borderline expressions easily missed depending on the reviewer.)
Ministry of Health, Labour and Welfare, Director-General of the Pharmaceutical Safety and Environmental Health Bureau. Guidelines on Sales Information Provision Activities for Prescription Drugs. Yakusei-hatsu No. 0925-1, September 25, 2018 (applied April 1, 2019). (Primary source defining the target, method, and structure of information provision activities. The foundation of review standards.)
Ministry of Health, Labour and Welfare. Q&A on the Guidelines on Sales Information Provision Activities for Prescription Drugs. (Operational Q&A for the Sales Information Provision Guidelines. Concrete examples for applying the standard on the floor.)
Ministry of Health, Labour and Welfare, Director of the Compliance and Narcotics Division, Pharmaceutical Safety and Environmental Health Bureau. On the Revision of the Standards for Fair Advertising of Drugs and Other Products. Yakusei-kanma-hatsu No. 0929-5, September 29, 2017. (A notice that drops the advertising regulation of the Pharmaceuticals and Medical Devices Act into practical standards. The grounds for drawing the line of judgment.)
Ministry of Health, Labour and Welfare. Act on Securing Quality, Efficacy and Safety of Pharmaceuticals, Medical Devices, etc. (Pharmaceuticals and Medical Devices Act), Articles 66, 68, and 68-2. (Prohibition of exaggerated advertising, prohibition of advertising pre-approval drugs, and appropriate information provision in sales information provision activities, respectively. The apex of the standard.)
