01 Types of deviation ── what causes trouble in materials is largely fixed
The deviations that get caught in material review are not infinite. The types that recur as problems in prescription-drug materials are, to a fair degree, fixed. Sorting out these types first makes it clear what to have AI look for.
Exaggeration / Absolutes
Expressions such as "remarkable efficacy," "definitely," or "no worry of side effects" that make efficacy or safety appear greater than it is. These fall under the exaggerated advertising prohibited by Article 66 of the Pharmaceutical and Medical Device Act. The most frequent type.
Straying beyond the approval
Statements that go beyond the approved efficacy, effects, dosage, and administration. Suggesting a use that is not yet approved runs up against Article 68 (prohibition of advertising before approval).
Missing sources / evidence
Figures or graphs with no source, a mismatch between the cited work and the body text, or in-house estimates dressed up as clinical data. A type that the Guidelines on Sales Information Provision Activities for Prescription Drugs warn against strictly.
Deviation in comparison / superiority
Making the drug read as superior to others without the backing of a direct comparative trial, or lining up only the convenient data. An area that also overlaps with the Fair Competition Code.
What the four share is this: each is a gap between "the words written" and "the facts of approval and sourcing." A deviation is not a matter of the text being bad in isolation; it becomes a deviation only when held up against facts that lie outside the document (the approved content of the package insert, the original cited work). This is the crux that shapes the design of AI detection.
02 How AI detection works ── flag by rule, read by meaning
When you have AI search for deviations in materials, the methods fall broadly into two groups. In practice, you combine them.
One is rule-based. You build a list of watch words such as "remarkable efficacy," "always," or "world first," and flag them mechanically. It is fast, and why something was flagged is clear. But it is weak against paraphrase. Ban "remarkable efficacy" and "striking improvement" sails straight through.
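The rule-based side can be sketched in a few lines. This is a minimal illustration, not a production checker: the watch-word list and the function name `flag_by_rule` are assumptions made up for this example.

```python
import re

# Hypothetical watch-word list; a real one would be maintained by the review team.
WATCH_WORDS = ["remarkable efficacy", "always", "definitely", "world first"]

def flag_by_rule(text: str, watch_words=WATCH_WORDS) -> list[dict]:
    """Return one flag per watch-word hit, with its position and the reason."""
    flags = []
    for word in watch_words:
        # Whole-word, case-insensitive match on the literal watch word.
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            flags.append({
                "start": m.start(),
                "match": m.group(0),
                "reason": f"watch word: {word}",
            })
    return sorted(flags, key=lambda f: f["start"])

hits = flag_by_rule("Drug X shows remarkable efficacy and always works.")
misses = flag_by_rule("Drug X brings a striking improvement.")
print(len(hits), len(misses))
```

The first sentence produces two flags, each with an explicit reason. The second, a paraphrase of the same overstatement, produces none, which is exactly the weakness described above.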
The other is meaning-based matching by a large language model (= an AI trained on huge amounts of text; hereafter LLM). An LLM responds not to the individual words but to what a sentence is trying to say, so "striking improvement" and "remarkable efficacy" can both be caught as the same intent to overstate efficacy. But as earlier installments showed, an LLM does not understand meaning the way a reader does; it merely lays out, by probability, "the word likely to come next in this context." So it can be plausibly wrong.
| Rule-based detection | LLM-based detection |
|---|---|
| Flags on watch-word matches | Flags on the meaning / intent of the sentence |
| Weak against paraphrase and metaphor | Strong against paraphrase, but judgment wavers |
| Clear explanation of why it flagged | Gives reasoning, but you can't trust it without checking |
| Cannot match against approved content | Given the package insert, can flag statements outside the approved range |
In practice, a two-stage setup is realistic: first flag the obvious landmine words by rule, then flag paraphrased and context-dependent deviations with the LLM. And if you have the LLM do matching, always hand it the facts to check against. Ask "is this off-label?" without showing the approved content and the model will answer by pretending to know.
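The two-stage setup might look like the sketch below. Everything named here is an assumption for illustration: the watch-word list, the prompt wording, the `FLAG` / `NO_FLAG` answer format, and `fake_llm`, which stands in for a real model call so the example runs offline. The one design point taken from the text is that the prompt builder refuses to run without the approved content.

```python
from typing import Callable

WATCH_WORDS = ["remarkable efficacy", "definitely", "world first"]  # hypothetical

def build_prompt(sentence: str, approved_content: str) -> str:
    """Build the LLM prompt; refuse to ask without facts to check against."""
    if not approved_content.strip():
        raise ValueError("approved content is required; do not let the model guess")
    return (
        "Approved content (from the package insert):\n"
        f"{approved_content}\n\n"
        "Statement from the material:\n"
        f"{sentence}\n\n"
        "Answer FLAG if the statement overstates efficacy or safety, or goes "
        "beyond the approved content. Otherwise answer NO_FLAG. Add a one-line reason."
    )

def review(sentences, approved_content: str, ask_llm: Callable[[str], str]):
    """Stage 1: watch words. Stage 2: LLM, only for sentences stage 1 let through."""
    results = []
    for s in sentences:
        rule_hits = [w for w in WATCH_WORDS if w in s.lower()]
        if rule_hits:
            results.append({"sentence": s, "stage": "rule", "reason": ", ".join(rule_hits)})
            continue
        answer = ask_llm(build_prompt(s, approved_content))
        if answer.strip().upper().startswith("FLAG"):
            results.append({"sentence": s, "stage": "llm", "reason": answer})
    return results

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (an assumption, not a real API).
    statement = prompt.split("Statement from the material:\n")[1].split("\n\n")[0]
    return "FLAG: possible overstatement" if "striking" in statement.lower() else "NO_FLAG"

flags = review(
    [
        "Drug X definitely works.",
        "Drug X brings a striking improvement.",
        "Drug X is taken once daily.",
    ],
    approved_content="Indication: hypertension. Dosage: once daily.",
    ask_llm=fake_llm,
)
for f in flags:
    print(f["stage"], "|", f["sentence"])
```

Every item in `flags` is a candidate for a human to check, not a verdict. A sentence that produces no flag has not been cleared by this code; it has only not been flagged.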
03 Detecting exaggeration (Article 66) ── reading "too effective"
Article 66 of the Pharmaceutical and Medical Device Act prohibits advertising, describing, or disseminating false or exaggerated statements, whether explicit or implicit, about the efficacy, effects, or safety of drugs and the like. This is the most frequent deviation in materials. Exaggeration arises not only from outright lies but also from a pile-up of assertions and emphasis.
When you have AI flag exaggeration, the things to watch for are as follows.
- Strength of assertion ── does it flatly state "always," "definitely," "safe"? There are no absolutes with drugs, and a flat assertion tilts toward exaggeration on its own
- Understating safety ── does it make risk look lighter than it is, e.g., "almost no side effects," "can be used without worry"?
- Superlatives / uniqueness ── unbacked first-place expressions such as "most effective," "the only"
- Implication through testimonials or photos ── even if the text says nothing, does a figure or case photo showing dramatic change over-impress the effect?
An LLM is relatively good at reading this "strength of assertion" and "excess of impression." It can catch euphemistic exaggeration that a watch-word list alone would miss. On the other hand, it also tends to over-flag, catching even properly bounded statements as "maybe exaggeration." A design that flags hard and lets a human decide can tolerate this over-flagging. The reverse design, which flags sparingly and treats silence as a pass, cannot tolerate the misses it produces (see Section 06).
04 Detecting off-label efficacy (Article 68) ── checking against the package insert
Article 68 prohibits advertising the efficacy, effects, and so on of drugs and the like that have not received approval. Even for an approved drug, suggesting in a material a use that exceeds the range of approved efficacy and effects can, in substance, become advertising of an unapproved use. Suggestion of off-label use is one of the most nerve-wracking areas in material review.
This detection differs in nature from exaggeration detection. You cannot judge it by reading the text alone. It is the work of checking sentence by sentence against the approved content written in that drug's package insert. So if you leave it to AI, you must always give it the approved efficacy, effects, dosage, administration, and target patients as matching material.
The practical use is this. Have AI extract every statement in the material about efficacy and effects, and lay out which entry in the package insert each corresponds to. Any statement for which no corresponding approved entry is found is passed to a human as a suspected off-label item. AI takes over the drudge work of extraction and matching, and the reviewer concentrates on judging the suspicious spots.
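The extract-and-match step can be sketched as a correspondence table. The approved entries, the entry IDs, and the imaginary "Drug X" below are all made up for illustration, and plain keyword matching stands in for whatever extraction the AI actually performs.

```python
from dataclasses import dataclass

@dataclass
class ApprovedEntry:
    entry_id: str
    keywords: tuple[str, ...]

# Hypothetical approved content for an imaginary drug (not a real package insert).
APPROVED = [
    ApprovedEntry("indication-1", ("hypertension",)),
    ApprovedEntry("dosage-1", ("once daily",)),
]

def match_statements(statements, approved=APPROVED):
    """Map each statement to approved entries; unmatched ones go to a human."""
    table = []
    for s in statements:
        lowered = s.lower()
        matched = [e.entry_id for e in approved
                   if any(k in lowered for k in e.keywords)]
        table.append({
            "statement": s,
            "matched_entries": matched,
            "suspected_off_label": not matched,
        })
    return table

rows = match_statements([
    "Drug X lowers blood pressure in hypertension.",
    "Drug X is also expected to help with migraine prevention.",
])
for row in rows:
    print(row)
```

A matched row is not a cleared row. A sentence that mentions an approved indication and an unapproved one in the same breath would still match here, so the table narrows where the reviewer looks first rather than replacing the read-through.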
05 Missing sources / evidence ── look behind the numbers and graphs
The guidelines on sales information provision activities (= the SIP Guidelines; the Guidelines on Sales Information Provision Activities for Prescription Drugs, a 2018 notice from the Director-General of the Pharmaceutical Safety and Environmental Health Bureau, MHLW) strongly require that the information provided rest on scientific, objective evidence and clearly state its source. In material deviations, this missing evidence is easily overlooked. The wording may be moderate while the numbers behind it float free.
What AI can catch is, for example, omissions and inconsistencies like these.
- Figures / graphs with no source ── "efficacy rate 80%" with no citation attached
- Gap between body text and cited work ── the conclusion of the cited paper and the material's phrasing disagree (the original states only a limited effect, yet the material generalizes it)
- Deceptive presentation of in-house data ── in-house estimates or surveys shown as if they were peer-reviewed results
- Exaggeration through graph styling ── truncating an axis or manipulating the scale to make a difference look larger
What to be careful of here is AI's limit. That "a source is not written" is relatively detectable even by AI. But whether "the source that is written truly backs that content" cannot be known without actually reading the original cited work against it. Unless you hand AI the original, the model has no way to verify the citation's soundness and will wave it through as "probably fine," vouching for a consistency that may not exist. AI checks whether a source is present; a human checks whether the source is sound, against the original. Here too, the division of labor is clear.
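The "presence of a source" half is the part that lends itself to a mechanical check. The sketch below flags sentences that contain a percentage but no citation marker. The two citation formats it recognizes, `[1]` and `(Author, 2020)`, are assumptions; a real material would need patterns that fit its own citation style.

```python
import re

# A percentage figure such as "80%" or "12.5 %".
FIGURE = re.compile(r"\d+(?:\.\d+)?\s*%")
# Assumed citation styles: "[1]" or "(Sato et al., 2020)".
CITATION = re.compile(r"\[\d+\]|\(\s*[A-Z][A-Za-z]+(?: et al\.)?,?\s*\d{4}\s*\)")

def figures_without_source(text: str) -> list[str]:
    """Return sentences that state a percentage but carry no citation marker."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if FIGURE.search(s) and not CITATION.search(s)]

material = (
    "The efficacy rate was 80%. "
    "The response rate was 65% [1]. "
    "Adverse events occurred in 12.5% of patients (Sato et al., 2020)."
)
unsourced = figures_without_source(material)
print(unsourced)
```

This only answers whether a marker is attached. Whether reference [1] actually reports a 65% response rate is the soundness question, and it stays with the human who reads the original.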
06 False positives and false negatives ── which you fear decides the design
The quality of a detection system is measured by two kinds of error: false positives (= flagging as a deviation what is not) and false negatives (= missing a deviation that is real). These two are in a tug-of-war: reduce one and the other rises.
| False positive (over-detection) | False negative (a miss) |
|---|---|
| Flags a fine statement as "maybe a deviation" | Lets a real deviation slip through |
| Adds confirmation work for the reviewer | Exaggeration / off-label goes out into the world as is |
| A cost problem (time is eaten) | A harm problem (regulatory violation, impact on patients) |
| Bearable | Must not be borne |
The failure that is unforgivable in material review is clear. The false negative. Miss a deviation and let the material out, and real harm follows ── the transmission of exaggerated advertising or off-label efficacy. A false positive that adds human confirmation is a nuisance but not harm. So AI detection should, as a principle, err toward "flag too much rather than miss." Set sensitivity high and let a human knock down the candidates one by one. Loudly flagging and being overruled serves the purpose of review better than quietly waving things through.
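The tug-of-war is easy to see with numbers. In the sketch below, the scores and labels are invented for illustration: each pair is a hypothetical "deviation score" from a detector and the human-confirmed answer. A statement is flagged when its score is at or above the threshold, so lowering the threshold raises sensitivity.

```python
# Hypothetical (score, is_real_deviation) pairs; not real review data.
SAMPLES = [
    (0.95, True), (0.80, True), (0.55, True), (0.35, True),
    (0.60, False), (0.40, False), (0.20, False), (0.10, False),
]

def count_misses(samples, threshold):
    """Count false positives and false negatives at a given flagging threshold."""
    fp = sum(1 for score, is_dev in samples if score >= threshold and not is_dev)
    fn = sum(1 for score, is_dev in samples if score < threshold and is_dev)
    return fp, fn

for t in (0.7, 0.5, 0.3):
    fp, fn = count_misses(SAMPLES, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

On this toy data, a threshold of 0.7 gives no false positives but two missed deviations, while 0.3 gives two extra confirmations and no misses. Under the principle above, the lower threshold is the one to choose.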
07 The human's final confirmation ── AI does the prep, the human judges
Pulling it all together, the roles of AI and human split cleanly. What AI takes on is flagging broadly, quickly, and without tiring. What the human takes on is judging the flagged candidates against the facts and bearing responsibility.
The work that remains in the reviewer's hands narrows to three things.
- Check against the facts ── match the candidates AI flagged against the package insert, the original cited work, and related notices, and confirm whether it is truly a deviation. This is the work of a human going to the primary sources, and cannot be fully entrusted to AI
- Judge by context ── the same wording can be a deviation or not depending on the flow of the whole material and the intended reader. Reading the whole through is a human judgment
- Bear responsibility for the verdict and the record ── the decision to pass, fix, or stop, and the record of its basis. "The AI said no problem" is not a reason for passing a review
If this division is kept, bringing AI in makes review faster and cuts misses. Conversely, swallow AI's "no problem" whole and skip human confirmation, and it grows dangerous exactly to the degree it got faster. The time freed by speed should go into thickening confirmation ── the principle repeated in earlier installments holds here just the same. Even the MHLW's published monitoring-project reports on sales information provision activities show that most deviations are born from "a single sentence no one checked." The last checkpoint is guarded by a human.
08 Connections to other chapters ── from detection, onward
This installment's deviation detection connects to other installments of the AI Material Review series and to other chapters on this site. Read together, they make the whole picture of building AI into review three-dimensional.
- AI Material Review Vol. 4 ── Giving AI the Rules (Guardrail Design) ── this installment is about "flagging deviations." The next takes up how to give AI the frame (guardrails) that keeps deviations from being made in the first place. A design discussion that sits ahead of detection
- AI Programming Vol. 1 ── The Basics of Code Generation ── the principle that an LLM "lays out a likely continuation without understanding meaning." The foundation for why you cannot swallow AI detection whole in this installment
- AI Marketing Vol. 5 ── Regulation of Pharma Content × AI ── the balance between the speed of generation and the weight of review. Reading the creating side and the reviewing side against the same regulatory yardstick
The point of bringing AI into deviation detection is not "letting the machine decide pass or fail." It is that the machine flags the buds of deviation broadly and quickly, before a tired human eye lets them slip. Exaggeration (Article 66) is the strength of assertion; off-label (Article 68) is the check against the package insert; a missing source is behind the numbers. The knack differs by type, but what they share is a single point: a deviation is a gap with facts that lie outside the document.
So always hand AI the facts to check against. And the candidates it flags are judged by a human against the primary sources. Bear the false positives; forbid the false negatives. Even when AI says "no problem," that is not proof of a pass. Thicken the confirmation step exactly to the degree flagging got faster. The next installment moves one step ahead of this detection ── how to give AI the frame that keeps deviations from arising at all: guardrail design.
- Material deviations converge into "exaggeration (Article 66)," "off-label efficacy (Article 68)," and "missing sources or evidence (SIP Guidelines)," and each appears not from the text alone but as "a gap with facts outside the document ── the approved content, the original cited work." So if you have AI detect them, always hand over what to check against (the package insert, the original). Ask "is this off-label?" without handing it over, and the model pretends to know and is plausibly wrong.
- Use AI detection in one direction only ── "flag the suspicious" ── and not for a "pass verdict." The unforgivable failure in material review is the false negative (missing a deviation); the false positive (over-flagging) is only a cost. So swing sensitivity high and design it so a human knocks down the flagged candidates. Even when AI says "no problem," that is not proof there is no deviation.
- AI takes the prep of "flagging broadly, quickly, and without tiring"; the human takes the final confirmation of "judging against the facts and bearing responsibility." Keep this asymmetric division and review becomes fast and reliable; break it and it grows dangerous exactly to the degree it got faster. Note that what material review looks at is how efficacy and safety are described, not the terms of a transaction such as price, stock, or ordering (the domain of the wholesaler and hospital procurement).
- Ministry of Health, Labour and Welfare. Act on Securing Quality, Efficacy and Safety of Products Including Pharmaceuticals and Medical Devices (Pharmaceutical and Medical Device Act), Articles 66, 68, and 68-2. (Article 66 = prohibition of exaggerated advertising, etc.; Article 68 = prohibition of advertising drugs and the like before approval; Article 68-2 = provision of information for proper use. The basis articles for the three types of deviation.)
- Director-General, Pharmaceutical Safety and Environmental Health Bureau, MHLW. On the Revision of the Standards for Fair Advertising of Drugs and the Like. September 29, 2017, Yakusei-hatsu 0929 No. 4. (The notice that sets the Standards for Fair Advertising themselves.)
- Director, Compliance and Narcotics Division, Pharmaceutical Safety and Environmental Health Bureau, MHLW. On the Explanation of and Points of Attention for the Standards for Fair Advertising of Drugs and the Like. September 29, 2017, Yakusei-kanma-hatsu 0929 No. 5. (The operational commentary on the Standards for Fair Advertising. A primary source showing concrete criteria for exaggeration, comparison, and safety expressions.)
- Director-General, Pharmaceutical Safety and Environmental Health Bureau, MHLW. Guidelines on Sales Information Provision Activities for Prescription Drugs. September 25, 2018, Yakusei-hatsu 0925 No. 1. (The SIP Guidelines. Sets scientific evidence and clear sourcing, and the handling of off-label information.)
- Ministry of Health, Labour and Welfare. Monitoring Project Report on Sales Information Provision Activities for Prescription Drugs. Each fiscal year. (Records actual deviation cases with company names anonymized. Lets you confirm typical exaggeration, off-label, and missing-source cases.)
- Japan Pharmaceutical Manufacturers Association. Guidelines for Preparing Product Information Summaries for Prescription Drugs and the Like. JPMA. (The preparation guidelines for product information summaries, specialist-journal advertising, and the like. The practical standard for sourcing and data presentation in materials.)