AI Material Review 03 — Deviation Detection with AI: Exaggeration, Off-Label, and Missing Sources | AI Material Review | Pharmaceutical Advertising Regulation: Material Creation, Review & Use in Japan

Deviation Detection with AI ── Exaggeration, Off-Label, Missing Sources ── The machine flags the buds of deviation; the human confirms them

The thing to fear most in material review is the miss. A sentence that reads as "too effective," an efficacy claim that strays just past the approved range, a number with no source: these "buds of deviation" can slip into any explanatory document. The more the volume grows and the closer the deadline looms, the more the human eye lets through. That is why the pressure to bring AI into review is growing in the field. But the point is not to let AI "issue a pass/fail verdict." AI mechanically flags the buds of deviation, and a human confirms each one. That division of labor is the subject of this installment. For the three types that most often cause trouble in materials, namely exaggeration (Article 66), off-label efficacy (Article 68), and missing sources or evidence, we will look from first principles at how AI can detect them and where it must always be kept out.

01 Types of deviation ── what causes trouble in materials is largely fixed

The deviations that get caught in material review are not infinite. The types that recur as problems in prescription-drug materials are, to a fair degree, fixed. Sorting out these types first makes it clear what to have AI look for.

Type 01

Exaggeration / Absolutes

Reads as "too effective"

Expressions such as "remarkable efficacy," "definitely," or "no worry of side effects" that make efficacy or safety appear greater than it is. These fall under the exaggerated advertising prohibited by Article 66 of the Pharmaceutical and Medical Device Act. The most frequent type.

Type 02

Straying beyond the approval

States "out of range"

Statements that go beyond the approved efficacy, effects, dosage, and administration. Suggesting a not-yet-approved use touches Article 68 (prohibition of advertising before approval).

Type 03

Missing sources / evidence

Asserts "without basis"

Figures or graphs with no source, a mismatch between the cited work and the body text, or in-house estimates dressed up as clinical data. A type strictly warned against by the sales information provision activity guidelines.

Type 04

Deviation in comparison / superiority

Implies "we win"

Making the drug read as superior to others without the backing of a direct comparative trial, or lining up only the convenient data. An area that also overlaps with the Fair Competition Code.

What the four share is this: each is a gap between "the words written" and "the facts of approval and sourcing." A deviation is not the text being bad on its own; it becomes a deviation only when held up against facts that lie outside the document (the approved content of the package insert, the original cited work). This is the crux that shapes the design of AI detection.

Don't mistake what is under review: a material is, above all, a document for providing information. What an MR (= medical representative) handles is information provision, not price, stock, delivery, ordering, or price negotiation. Those belong to the transactions and logistics between the drug wholesaler and hospital procurement ── a different world from the deviations that material review looks at. What we have AI search for is "how efficacy and safety were described," not the terms of a transaction ── this line is fixed first.

02 How AI detection works ── flag by rule, read by meaning

When you have AI search for deviations in materials, the methods split broadly into two. Combining them is the practical form.

One is rule-based. You build a list of watch words such as "remarkable efficacy," "always," or "world first," and flag them mechanically. It is fast, and why something was flagged is clear. But it is weak against paraphrase. Ban "remarkable efficacy" and "striking improvement" sails straight through.
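The rule-based stage can be sketched in a few lines of Python. This is a minimal illustration, not a vetted tool: the watch-word list and the function name `flag_watch_words` are assumptions made for the example, and a real list would be curated by reviewers.

```python
import re

# Illustrative watch words only (taken from the examples in the text).
WATCH_WORDS = ["remarkable efficacy", "always", "world first"]

def flag_watch_words(text, watch_words=WATCH_WORDS):
    """Return (word, start_offset) for every case-insensitive whole-word match."""
    hits = []
    for word in watch_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((word, m.start()))
    # Sort by position so the reviewer sees flags in reading order.
    return sorted(hits, key=lambda h: h[1])

# The listed words are caught...
flagged = flag_watch_words("Drug X shows remarkable efficacy and always works.")
# ...but a paraphrase that is not on the list produces no flag at all.
missed = flag_watch_words("Drug X shows striking improvement.")
```

The second call returning nothing is the weakness described above: the rule is transparent about why it fired, and silent about everything it was never told to look for.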

The other is meaning-based matching by a large language model (= an AI trained on huge amounts of text; hereafter LLM). An LLM responds not to the exact words but to what a sentence appears to be saying, so "striking improvement" and "remarkable efficacy" can both be caught as the same intent to overstate efficacy. But as earlier installments showed, an LLM does not understand meaning the way a reviewer does; it merely lays out, by probability, "the word likely to come next in this context." So it can be plausibly wrong.

Rule-based detection	LLM-based detection
Flags on watch-word matches	Flags on the meaning / intent of the sentence
Weak against paraphrase and metaphor	Strong against paraphrase, but judgment wavers
Clear explanation of why it flagged	Gives reasoning, but you can't trust it without checking
Cannot match against approved content	Given the package insert, can point out out-of-range items

In practice, a two-stage setup is realistic: first flag the obvious landmine words by rule, then flag paraphrased and context-dependent deviations with the LLM. And if you have the LLM do matching, always hand it the facts to check against. Ask "is this off-label?" without showing the approved content and the model will answer by pretending to know.
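The two-stage setup might look like the following sketch. Everything named here is hypothetical: `ask_llm` stands in for whatever model call an organization actually uses (no real API is assumed), the landmine list is illustrative, and the `SUSPECT` / `NO_FLAG` answer format is an invented convention. The one design point it tries to show is that the approved content travels inside the prompt, and that the function refuses to run without it.

```python
from typing import Callable, List

LANDMINE_WORDS = ["remarkable efficacy", "always", "world first"]  # illustrative

def build_prompt(sentence: str, approved_content: str) -> str:
    """Put the facts to check against into the prompt itself."""
    return (
        "Approved content (from the package insert):\n"
        f"{approved_content}\n\n"
        "Sentence from the material:\n"
        f"{sentence}\n\n"
        "Answer SUSPECT or NO_FLAG, then give a one-line reason."
    )

def two_stage_flags(sentences: List[str], approved_content: str,
                    ask_llm: Callable[[str], str]) -> List[dict]:
    """Stage 1: watch words by rule. Stage 2: everything else goes to the LLM."""
    if not approved_content.strip():
        raise ValueError("approved content is required for LLM matching")
    flags = []
    for s in sentences:
        # Stage 1: plain substring match on the lowercased sentence.
        rule_hits = [w for w in LANDMINE_WORDS if w in s.lower()]
        if rule_hits:
            flags.append({"sentence": s, "stage": "rule",
                          "reason": ", ".join(rule_hits)})
            continue
        # Stage 2: anything other than an explicit NO_FLAG is kept as a flag,
        # so an unparseable answer errs toward human review.
        answer = ask_llm(build_prompt(s, approved_content)).strip()
        if not answer.upper().startswith("NO_FLAG"):
            flags.append({"sentence": s, "stage": "llm", "reason": answer})
    return flags
```

Every item in the returned list is a candidate for a human to confirm, and a sentence that does not appear in the list has not been cleared by anyone.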

03 Detecting exaggeration (Article 66) ── reading "too effective"

Article 66 of the Pharmaceutical and Medical Device Act prohibits advertising, describing, or spreading false or exaggerated statements, whether explicit or implicit, about the name, manufacturing method, efficacy, effects, or performance of drugs and the like. Overstated safety claims are treated as part of the same problem under the Standards for Fair Advertising. This is the most frequent deviation in materials. Exaggeration arises not only from outright lies but also from a pile-up of assertions and emphasis.

When you have AI flag exaggeration, the things to watch for are as follows.

Strength of assertion ── does it flatly state "always," "definitely," "safe"? There are no absolutes with drugs, and a flat assertion tilts toward exaggeration on its own
Understating safety ── does it make risk look lighter than it is, e.g., "almost no side effects," "can be used without worry"?
Superlatives / uniqueness ── unbacked first-place expressions such as "most effective," "the only"
Implication through testimonials or photos ── even if the text says nothing, does a figure or case photo showing dramatic change over-impress the effect?

An LLM is relatively good at reading this "strength of assertion" and "excess of impression." It can catch euphemistic exaggeration that a watch-word list alone would miss. On the other hand, it also tends to over-flag, catching even properly bounded statements as "maybe exaggeration." A design that flags hard and lets a human decide can tolerate this over-flagging. A design that flags sparingly and trusts the AI's silence cannot tolerate the misses that result (see Section 06).
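One way to make flags easier for a reviewer to act on is to tag each with the watch point it belongs to. The sketch below does this for the three text-based watch points listed above; the category names and patterns are assumptions for illustration, and the fourth watch point (implication through figures or photos) is deliberately left out because it cannot be caught by text patterns.

```python
import re

# Hypothetical patterns, grouped by the watch points in the list above.
CATEGORY_PATTERNS = {
    "strength_of_assertion": [r"\balways\b", r"\bdefinitely\b"],
    "understating_safety": [r"\b(almost )?no side effects\b", r"\bwithout worry\b"],
    "superlative_uniqueness": [r"\bmost effective\b", r"\bthe only\b"],
}

def categorize(sentence):
    """Return the sorted list of categories whose patterns match the sentence."""
    found = set()
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(p, sentence, re.IGNORECASE) for p in patterns):
            found.add(category)
    return sorted(found)

# One sentence can trip more than one category.
example = categorize("The only drug with almost no side effects.")
# A properly bounded statement trips none of these patterns.
bounded = categorize("Adverse reactions were reported in 12% of patients.")
```

An empty result means only that none of these patterns matched, which is where the LLM stage and the human read-through take over.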

04 Detecting off-label efficacy (Article 68) ── checking against the package insert

Article 68 prohibits advertising the efficacy, effects, and so on of drugs and the like that have not received approval. Even for an approved drug, suggesting in a material a use that exceeds the range of approved efficacy and effects can, in substance, become advertising of the unapproved. Suggestion of off-label use is one of the most nerve-wracking areas in material review.

This detection differs in nature from exaggeration detection. You cannot judge it by reading the text alone. It is the work of checking sentence by sentence against the approved content written in that drug's package insert. So if you leave it to AI, you must always give it the approved efficacy, effects, dosage, administration, and target patients as matching material.

Handing over what to check against is the premise. In judging the range of approval, asking an LLM "is this off-label?" without handing it the package insert is like asking for directions without showing the map. The model answers from its training-data memory, plausibly but not reliably. Keep the approved content, a primary source, at hand and check each statement in the material against it, item by item, to see whether it falls within the approved range. Use AI for the prep work of this matching; the final judgment of in-range or out-of-range is made by a human looking at the package insert.

The practical use is this. Have AI extract every statement in the material about efficacy and effects, and lay out which entry in the package insert each corresponds to. Any statement for which no corresponding approved entry is found is passed to a human as a suspected off-label item. AI takes over the drudge work of extraction and matching, and the reviewer concentrates on judging the suspicious spots.
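The extract-and-match workflow can be sketched as follows. This is a rough illustration under stated assumptions: the claims are taken as already extracted, the crude word-overlap score is a stand-in for whatever matching an LLM or search tool would really do, and the approved entry and claims are made up for the example rather than taken from any real package insert.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Match:
    claim: str
    approved_entry: Optional[str]  # None = pass to a human as suspected off-label

def tokens(text: str) -> set:
    """Lowercased words longer than three characters, punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if len(w) > 3}

def match_claims(claims: List[str], approved_entries: List[str],
                 min_overlap: int = 2) -> List[Match]:
    """Pair each claim with the approved entry it overlaps most, if any."""
    results = []
    for claim in claims:
        best, best_score = None, 0
        for entry in approved_entries:
            score = len(tokens(claim) & tokens(entry))
            if score > best_score:
                best, best_score = entry, score
        results.append(Match(claim, best if best_score >= min_overlap else None))
    return results

# Made-up approved entry and made-up claims, for illustration only.
approved = ["Treatment of essential hypertension in adults"]
claims = [
    "Indicated for essential hypertension in adults",
    "Also improves symptoms of chronic heart failure",
]
results = match_claims(claims, approved)
suspected = [m.claim for m in results if m.approved_entry is None]
```

A claim with no counterpart lands in `suspected`. A claim that does find a counterpart has only been given a pointer into the package insert; whether it is truly in range is still the reviewer's call, since a loose match (the same disease in a different patient group, say) would pair up just as readily.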

05 Missing sources / evidence ── look behind the numbers and graphs

The Guidelines on Sales Information Provision Activities for Prescription Drugs (hereafter the SIP Guidelines; a 2018 notice from the Director-General of the Pharmaceutical Safety and Environmental Health Bureau, MHLW) strongly require that the information provided rest on scientific, objective evidence and clearly state its source. In material deviations, this missing evidence is easily overlooked: the wording may be moderate while the numbers behind it have nothing to stand on.

What AI can catch is, for example, omissions and inconsistencies like these.

Figures / graphs with no source ── "efficacy rate 80%" with no citation attached
Gap between body text and cited work ── the conclusion of the cited paper and the material's phrasing disagree (the original states only a limited effect, yet the material generalizes it)
Deceptive presentation of in-house data ── in-house estimates or surveys shown as if they were peer-reviewed results
Exaggeration through graph styling ── truncating an axis or manipulating the scale to make a difference look larger

What to be careful of here is AI's limit. That "a source is not written" is relatively easy for AI to detect. But whether "the source that is written truly backs that content" cannot be known without reading the original cited work against it. Unless you hand AI the original, the model has no way to verify the citation and may wave it through as "probably fine" when the claimed consistency does not exist. AI checks whether a source is present; a human checks whether the source is sound, against the original. Here too, the division of labor is clear.

06 False positives and false negatives ── which you fear decides the design

The quality of a detection system is measured by two kinds of error: false positives (= flagging as a deviation what is not one) and false negatives (= missing a deviation that is real). For a given detector the two are in a tug-of-war: tune it to reduce one and the other tends to rise.

False positive (over-detection)	False negative (a miss)
Flags a fine statement as "maybe a deviation"	Lets a real deviation slip through
Adds confirmation work for the reviewer	Exaggeration / off-label goes out into the world as is
A cost problem (time is eaten)	A harm problem (regulatory violation, impact on patients)
Bearable	Must not be borne

The failure that is unforgivable in material review is clear. The false negative. Miss a deviation and let the material out, and real harm follows ── the transmission of exaggerated advertising or off-label efficacy. A false positive that adds human confirmation is a nuisance but not harm. So AI detection should, as a principle, err toward "flag too much rather than miss." Set sensitivity high and let a human knock down the candidates one by one. Loudly flagging and being overruled serves the purpose of review better than quietly waving things through.
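A small worked example makes the tug-of-war concrete. It assumes a detector that outputs a suspicion score per sentence and a set of past sentences already judged by a human; the scores and labels below are made up purely for illustration.

```python
# (detector score, is_real_deviation as judged by a human) -- made-up data
SAMPLES = [(0.95, True), (0.80, True), (0.60, False),
           (0.45, True), (0.30, False), (0.10, False)]

def counts_at(threshold, samples=SAMPLES):
    """Count both kinds of error when flagging every score >= threshold."""
    fp = sum(1 for score, real in samples if score >= threshold and not real)
    fn = sum(1 for score, real in samples if score < threshold and real)
    return {"false_positives": fp, "false_negatives": fn}

def highest_threshold_with_no_misses(samples=SAMPLES):
    """The highest threshold that still flags every known real deviation."""
    return min(score for score, real in samples if real)

strict = counts_at(0.7)    # quiet detector: nothing extra to check, one miss
sensitive = counts_at(0.4) # loud detector: one extra check, no miss
cutoff = highest_threshold_with_no_misses()
```

On this sample, moving the threshold from 0.7 down to 0.4 trades one miss for one extra confirmation, which is the trade this section argues for. Zero misses on labeled past data does not guarantee zero misses on new material, so the threshold is a starting point and the human read-through remains.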

The design fence: AI deviation detection must not be used for a "pass verdict." AI saying "no problem" is not proof that there is no deviation. It may be used in one direction only ── a human confirms the spots AI flagged as "suspicious." The pass is issued only by a human, held up against the facts. Break this asymmetry and false negatives ship as they are.
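The asymmetry can be built into the data model so that it is hard to break by accident. In the hypothetical sketch below, the AI side has no "pass" value at all, and release depends only on a verdict recorded by a named human with a stated basis. All class and field names are invented for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class AiResult(Enum):
    FLAGGED = "flagged"
    NOTHING_FLAGGED = "nothing_flagged"  # deliberately not called PASS

class HumanVerdict(Enum):
    PASS = "pass"
    FIX = "fix"
    STOP = "stop"

@dataclass
class Review:
    material_id: str
    ai_result: AiResult
    ai_flags: List[str] = field(default_factory=list)
    human_verdict: Optional[HumanVerdict] = None
    reviewer: Optional[str] = None
    basis: Optional[str] = None

    def record_verdict(self, verdict: HumanVerdict, reviewer: str, basis: str) -> None:
        """Only a named human with a stated basis can set the verdict."""
        if not reviewer.strip() or not basis.strip():
            raise ValueError("a verdict needs a named reviewer and a recorded basis")
        self.human_verdict = verdict
        self.reviewer = reviewer
        self.basis = basis

    def can_release(self) -> bool:
        """The AI result is never consulted here; only the human verdict counts."""
        return self.human_verdict is HumanVerdict.PASS

review = Review("M-001", AiResult.NOTHING_FLAGGED)
blocked = review.can_release()  # False: AI silence alone releases nothing
```

Because `can_release` never reads `ai_result`, a material on which the AI flagged nothing is still blocked until a person signs off.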

07 The human's final confirmation ── AI does the prep, the human judges

Pulling it all together, the roles of AI and human split cleanly. What AI takes on is flagging broadly, quickly, and without tiring. What the human takes on is judging the flagged candidates against the facts and bearing responsibility.

The work that remains in the reviewer's hands narrows to three things.

Check against the facts ── match the candidates AI flagged against the package insert, the original cited work, and related notices, and confirm whether it is truly a deviation. This is the work of a human going to the primary sources, and cannot be fully entrusted to AI
Judge by context ── the same wording can be a deviation or not depending on the flow of the whole material and the intended reader. Reading the whole through is a human judgment
Bear responsibility for the verdict and the record ── the decision to pass, fix, or stop, and the record of its basis. "The AI said no problem" is not a reason for passing a review

If this division is kept, bringing AI in makes review faster and cuts misses. Conversely, swallow AI's "no problem" whole and skip human confirmation, and review grows dangerous exactly to the degree it got faster. The time freed by speed should go into thickening confirmation; the principle repeated in earlier installments holds here just the same. The deviation cases recorded in the MHLW's published monitoring-project reports on sales information provision activities are a reminder of how much can turn on "a single sentence no one checked." The last checkpoint is guarded by a human.

08 Connections to other chapters ── from detection, onward

This installment's deviation detection connects to other installments of the AI Material Review series and to other chapters on this site. Read together, they make the whole picture of building AI into review three-dimensional.

AI Material Review Vol. 4 ── Giving AI the Rules (Guardrail Design) ── this installment is about "flagging deviations." The next takes up how to give AI the frame (guardrails) that keeps deviations from being made in the first place. A design discussion that sits ahead of detection
AI Programming Vol. 1 ── The Basics of Code Generation ── the principle that an LLM "lays out a likely continuation without understanding meaning." The foundation for why you cannot swallow AI detection whole in this installment
AI Marketing Vol. 5 ── Regulation of Pharma Content × AI ── the balance between the speed of generation and the weight of review. Reading the creating side and the reviewing side against the same regulatory yardstick

In closing

The point of bringing AI into deviation detection is not "letting the machine decide pass or fail." It is that the machine flags the buds of deviation broadly and quickly, before a tired human eye lets them slip. For exaggeration (Article 66), watch the strength of assertion; for off-label efficacy (Article 68), check against the package insert; for missing sources, look behind the numbers. The knack differs by type, but what they share is a single point: a deviation is a gap with facts that lie outside the document.

So always hand AI the facts to check against. And the candidates it flags are judged by a human against the primary sources. Bear the false positives; forbid the false negatives. Even when AI says "no problem," that is not proof of a pass. Thicken the confirmation step exactly to the degree flagging got faster. The next installment moves one step ahead of this detection ── how to give AI the frame that keeps deviations from arising at all: guardrail design.

Key Points ── three to take away

Material deviations converge into "exaggeration (Article 66)," "off-label efficacy (Article 68)," and "missing sources or evidence (SIP Guidelines)," and each appears not from the text alone but as "a gap with facts outside the document ── the approved content, the original cited work." So if you have AI detect them, always hand over what to check against (the package insert, the original). Ask "is this off-label?" without handing it over, and the model pretends to know and is plausibly wrong.
Use AI detection in one direction only ── "flag the suspicious" ── and not for a "pass verdict." The unforgivable failure in material review is the false negative (missing a deviation); the false positive (over-flagging) is only a cost. So swing sensitivity high and design it so a human knocks down the flagged candidates. Even when AI says "no problem," that is not proof there is no deviation.
AI takes the prep of "flagging broadly, quickly, and without tiring"; the human takes the final confirmation of "judging against the facts and bearing responsibility." Keep this asymmetric division and review becomes fast and reliable; break it and it grows dangerous exactly to the degree it got faster. Note that what material review looks at is how efficacy and safety are described, not the terms of a transaction such as price, stock, or ordering (the domain of the wholesaler and hospital procurement).

Sources · References

Ministry of Health, Labour and Welfare. Act on Securing Quality, Efficacy and Safety of Products Including Pharmaceuticals and Medical Devices (Pharmaceutical and Medical Device Act), Articles 66, 68, and 68-2. (Article 66 = prohibition of exaggerated advertising, etc.; Article 68 = prohibition of advertising drugs and the like before approval; Article 68-2 = provision of information for proper use. The basis articles for the three types of deviation.)
Director-General, Pharmaceutical Safety and Environmental Health Bureau, MHLW. On the Revision of the Standards for Fair Advertising of Drugs and the Like. September 29, 2017, Yakusei-hatsu 0929 No. 4. (The notice that sets the Standards for Fair Advertising themselves.)
Director, Compliance and Narcotics Division, Pharmaceutical Safety and Environmental Health Bureau, MHLW. On the Explanation of and Points of Attention for the Standards for Fair Advertising of Drugs and the Like. September 29, 2017, Yakusei-kanma-hatsu 0929 No. 5. (The operational commentary on the Standards for Fair Advertising. A primary source showing concrete criteria for exaggeration, comparison, and safety expressions.)
Director-General, Pharmaceutical Safety and Environmental Health Bureau, MHLW. Guidelines on Sales Information Provision Activities for Prescription Drugs. September 25, 2018, Yakusei-hatsu 0925 No. 1. (The SIP Guidelines. Sets scientific evidence and clear sourcing, and the handling of off-label information.)
Ministry of Health, Labour and Welfare. Monitoring Project Report on Sales Information Provision Activities for Prescription Drugs. Each fiscal year. (Records actual deviation cases with company names anonymized. Lets you confirm typical exaggeration, off-label, and missing-source cases.)
Japan Pharmaceutical Manufacturers Association. Guidelines for Preparing Product Information Summaries for Prescription Drugs and the Like. JPMA. (The preparation guidelines for product information summaries, specialist-journal advertising, and the like. The practical standard for sourcing and data presentation in materials.)
