A Parts List for the Review Engine ── The Hierarchy of Norms, a Fixed Verdict Vocabulary, and a Two-Track Inspection

The AI material-review system is not one clever brain but an assembly of parts, each with a fixed role. The parts settle four questions: which norm serves as the basis (L1–L5), what a verdict may be called (BLACK–WHITE, ALERT–CLEAR), how a finding is classified (R1–R8), and how much is pinned down in code before the LLM (= large language model) takes over. This article takes inventory of these parts one by one. The design thinking behind them is covered in the previous article (philosophy); the order in which they actually run is left to the next article (process).

01 The Hierarchy of Norms, L1–L5

The norms the system may cite are organized in advance into five layers. L1 is Articles 66, 67, and 68 of the Pharmaceuticals and Medical Devices Act (Article 66 = prohibition of false or exaggerated advertising; Article 67 = restrictions on advertising drugs for designated diseases; Article 68 = prohibition of advertising unapproved drugs). L2 is the Standards for Appropriate Advertising of Drugs. L3 is the Guidelines for Sales Information Provision Activities and their Q&A. L4 is the JPMA (Japan Pharmaceutical Manufacturers Association) Code and the drafting guidelines for product information overviews and related materials. L5 is the public eye: not a written norm, but society's critical gaze. The order of the layers is not a ranking of legal force. Only L1 is statute and legally binding. L2 and L3 are both notifications from the Ministry of Health, Labour and Welfare (MHLW) (= administrative guidance, not law itself), and L4 is industry self-regulation; L1 through L4 are ordered by the public character of the issuing body (state → industry), and L5 sits outside written norms altogether. Every finding is tied to one of these layers. A finding with no layer, one that is merely "somehow problematic," has no place in this framework.
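
To make the "every finding is tied to a layer" rule concrete, here is a minimal Python sketch. The enum name `NormLayer`, the helper `layer_of`, and the (rank, issuer, legally_binding) encoding are assumptions for illustration, not the system's actual code. The rank only records the state-to-industry ordering; legal force is kept as a separate flag so the two cannot be confused.

```python
from enum import Enum


class NormLayer(Enum):
    """Hypothetical encoding of the five norm layers as (rank, issuer, legally_binding)."""

    L1 = (1, "statute", True)                    # PMD Act, Articles 66-68
    L2 = (2, "MHLW notification", False)         # Standards for Appropriate Advertising of Drugs
    L3 = (3, "MHLW notification", False)         # Sales Information Provision Guidelines and Q&A
    L4 = (4, "industry self-regulation", False)  # JPMA Code, product information overview guidelines
    L5 = (5, "unwritten", False)                 # the public eye

    def __init__(self, rank: int, issuer: str, legally_binding: bool) -> None:
        # Tuple values are unpacked into __init__. The rank orders issuers from
        # state to industry; it is NOT a measure of legal force.
        self.rank = rank
        self.issuer = issuer
        self.legally_binding = legally_binding


def layer_of(label: str) -> NormLayer:
    """Resolve a layer label; a finding without a valid layer raises KeyError."""
    return NormLayer[label]


if __name__ == "__main__":
    print([layer.name for layer in NormLayer if layer.legally_binding])  # ['L1']
```

Each tuple includes its rank, so L2 and L3 stay distinct members even though they share an issuer type.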

02 The Verdict Vocabulary Is Fixed

Verdict labels are not free text either. Legal verdicts use a five-grade scale: BLACK (= clear violation) / GRAY3 / GRAY2 / GRAY1 (= gray zones; the higher the number, the stronger the suspicion of violation) / WHITE (= no problem). Public-eye verdicts use a four-grade scale: ALERT-3 / ALERT-2 / ALERT-1 (= warnings; the higher the number, the more serious) / CLEAR (= no concern). This vocabulary is fixed by type (= a defined data shape in the program), and a verdict outside this value range simply cannot be generated. If the LLM tries to write a vague verdict such as "somewhat concerning" or "mostly fine," the type refuses to accept it. The vocabulary is fixed so that humans can later sort, compare, and tally the findings.
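
One way to get the "type refuses it" behavior is a closed enumeration plus a parser that rejects everything else. A minimal Python sketch, with hypothetical names (`LegalVerdict`, `PublicEyeVerdict`, `parse_legal`, `parse_public_eye`); the numeric values simply encode the severity order described above.

```python
from enum import Enum


class LegalVerdict(Enum):
    """Legal lane: five grades, from no problem (WHITE) to clear violation (BLACK)."""

    WHITE = 0
    GRAY1 = 1
    GRAY2 = 2
    GRAY3 = 3
    BLACK = 4


class PublicEyeVerdict(Enum):
    """Public-eye lane: four grades. Names use '_' because '-' is not valid in an identifier."""

    CLEAR = 0
    ALERT_1 = 1
    ALERT_2 = 2
    ALERT_3 = 3


def parse_legal(label: str) -> LegalVerdict:
    """Accept only a label from the fixed legal vocabulary."""
    try:
        return LegalVerdict[label]
    except KeyError:
        raise ValueError(f"not a legal verdict: {label!r}") from None


def parse_public_eye(label: str) -> PublicEyeVerdict:
    """Accept 'ALERT-2'-style labels as written in reports, or 'CLEAR'."""
    try:
        return PublicEyeVerdict[label.replace("-", "_")]
    except KeyError:
        raise ValueError(f"not a public-eye verdict: {label!r}") from None


if __name__ == "__main__":
    print(parse_legal("GRAY2"))  # LegalVerdict.GRAY2
    try:
        parse_legal("somewhat concerning")
    except ValueError as err:
        print(err)  # not a legal verdict: 'somewhat concerning'
```

Because the label set is closed, later sorting and tallying work on enum members rather than on free strings.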

03 The Two-Lane System: Never Mixing Law and the Public Eye

Each finding belongs to exactly one lane: the legal lane (BLACK–WHITE) or the public-eye lane (ALERT–CLEAR). A finding that carries both, or neither, is rejected at generation time. Ranks are compared only within a single lane: asking whether GRAY2 or ALERT-2 is heavier is impossible by design, because legal violations and social concerns differ in both their grounds and their consequences. Handling diverges as well: ALERT-2 and above go to a human reviewer, and the most serious, ALERT-3, is also escalated to management. The public-eye lane, however, never directly blocks a material.
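
Both rules, the one-lane constraint and lane-local ordering, can be sketched in Python with hypothetical names. Ordering is defined only between members of the same enum, so comparing GRAY2 with ALERT_2 raises `TypeError` instead of returning an answer. The routing labels are assumptions that follow the thresholds in the text; what happens to ALERT-1 is not specified there and is shown as "recorded only" purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LaneVerdict(Enum):
    """Base for verdict enums: ordered by value, but only within the same lane (class)."""

    def __lt__(self, other):
        if type(other) is not type(self):
            return NotImplemented  # cross-lane comparison ends in TypeError
        return self.value < other.value


class LegalVerdict(LaneVerdict):
    WHITE = 0
    GRAY1 = 1
    GRAY2 = 2
    GRAY3 = 3
    BLACK = 4


class PublicEyeVerdict(LaneVerdict):
    CLEAR = 0
    ALERT_1 = 1
    ALERT_2 = 2
    ALERT_3 = 3


@dataclass(frozen=True)
class Finding:
    legal: Optional[LegalVerdict] = None
    public_eye: Optional[PublicEyeVerdict] = None

    def __post_init__(self) -> None:
        # Exactly one lane: carrying both verdicts, or neither, is rejected on construction.
        if (self.legal is None) == (self.public_eye is None):
            raise ValueError("a finding must belong to exactly one lane")


def route(finding: Finding) -> str:
    """Hypothetical routing labels; no public-eye outcome blocks a material."""
    if finding.public_eye is None:
        return "legal lane"  # legal-lane handling is not modeled in this sketch
    if finding.public_eye is PublicEyeVerdict.ALERT_3:
        return "human reviewer + management"
    if finding.public_eye is PublicEyeVerdict.ALERT_2:
        return "human reviewer"
    return "recorded only"
```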

04 Risk Categories R1–R8

The substance of a finding is classified into eight categories. R1 deviation from the approved scope; R2 minimization of safety concerns (= making risks such as side effects appear lighter than they are); R3 exaggeration (explicit); R4 exaggeration (implied); R5 disparaging comparison (= comparisons that denigrate other drugs); R6 evidence misuse (= improper use of supporting data); R7 missing required elements (= absence of mandatory statements); R8 social ethics. Where the layers (L1–L5) show "which norm was applied," the categories show "what was done wrong." Only with both axes together is a finding fully classified.
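
The two axes can be pictured as a grid on which findings are counted. In this Python sketch (the names `RiskCategory`, `classify`, and `tally`, and the plain-string form of the layers, are assumptions), a finding is counted only once it has both a valid layer and a valid category:

```python
from collections import Counter
from enum import Enum
from typing import Iterable

NORM_LAYERS = ("L1", "L2", "L3", "L4", "L5")


class RiskCategory(Enum):
    R1 = "deviation from the approved scope"
    R2 = "minimization of safety concerns"
    R3 = "exaggeration (explicit)"
    R4 = "exaggeration (implied)"
    R5 = "disparaging comparison"
    R6 = "evidence misuse"
    R7 = "missing required elements"
    R8 = "social ethics"


def classify(layer: str, category: str) -> tuple:
    """Both axes are mandatory: which norm was applied, and what was done wrong."""
    if layer not in NORM_LAYERS:
        raise ValueError(f"unknown norm layer: {layer!r}")
    return layer, RiskCategory[category]  # KeyError for anything outside R1-R8


def tally(pairs: Iterable[tuple]) -> Counter:
    """Count findings on the layer x category grid."""
    return Counter(classify(layer, category) for layer, category in pairs)
```

A grid like this is what makes the "sort, compare, and tally" step possible for a human reviewer.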

05 The Two-Track Approach: Deterministic Matching and LLM Semantic Judgment

Inspection runs on two tracks. Matching of words, numbers, and structure is done in code. Does the text contain a prohibited term? Does the graph's axis start at zero? Is any required statement missing? These are "deterministic matches": identical input always produces identical output. Judgments about implication, impression, and suggestion, on the other hand, are entrusted to the LLM alone. A question like "does this arrangement of otherwise accurate facts create a false impression?" cannot be written in code. Conversely, calculation and counting are never delegated to the LLM. The inspection inventory consists of 694 atomic items (= the smallest check items that can be inspected independently). About half are handled by deterministic matching and about half by LLM semantic judgment; the small remainder goes to image inspection and to persona inspection (the public eye, described below). It is a division of labor: settle the unambiguous parts with cheap code, and spend the expensive LLM only where judgment is required.
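
The deterministic side of this split can be sketched as plain, pure functions. Everything below is hypothetical: the term lists are placeholders, not the system's 694-item inventory, and `track_for` shows only the code-versus-LLM split (the image and persona tracks are left out).

```python
# Hypothetical term lists; the real inventory of atomic items is not reproduced here.
PROHIBITED_TERMS = ("completely safe", "no side effects", "guaranteed cure")
REQUIRED_STATEMENTS = ("contraindications", "precautions")

# Words, numbers, and structure stay in code; implication and impression go to the LLM.
TRACKS = {
    "term": "deterministic",
    "number": "deterministic",
    "structure": "deterministic",
    "implication": "llm",
    "impression": "llm",
    "suggestion": "llm",
}


def find_prohibited(text: str) -> list:
    """Return every prohibited term present (case-insensitive), in list order."""
    lowered = text.lower()
    return [term for term in PROHIBITED_TERMS if term in lowered]


def axis_starts_at_zero(y_axis_min: float) -> bool:
    """A non-zero baseline can exaggerate differences, so anything else is flagged."""
    return y_axis_min == 0


def missing_required(text: str) -> list:
    """Return the required statements that do not appear in the text."""
    lowered = text.lower()
    return [s for s in REQUIRED_STATEMENTS if s not in lowered]


def track_for(item_kind: str) -> str:
    """Look up which track an atomic item belongs to; unknown kinds raise KeyError."""
    return TRACKS[item_kind]
```

None of these functions depends on anything but its argument, which is what makes the "identical input, identical output" guarantee hold.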

06 Reference Documents: The Three-Piece Set of Approval Facts

Against what are a material's claims checked? There are three reference documents. First, the electronic package insert (= the drug's official descriptive document). It is the master reference for judging the approved scope, and a material may not exceed its wording by a single character. Second, the review report: the record of what the regulator evaluated at the time of approval, including what it did not accept. Here the check is whether the material implies efficacy for items the PMDA (= Pharmaceuticals and Medical Devices Agency, the authority that conducts approval reviews) did not accept. Third, the RMP (= Risk Management Plan, the list of risks requiring attention after launch); the check is whether those risks are reflected in the material. The three complement one another: the electronic package insert defines "what may be said," the review report "why it was approved that way," and the RMP "what to watch for after it is sold."
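
A rough Python sketch of how the three documents answer different questions. It is deliberately simplified and hypothetical: `ReferenceSet` and `check_against_references` are invented names, each document is reduced to a set of strings, and everything is matched by exact text. In the system described here, spotting an *implied* claim of unaccepted efficacy is semantic work for the LLM track; exact matching is used below only to show the data flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceSet:
    """Hypothetical, heavily simplified stand-ins for the three approval documents."""

    insert_wording: frozenset[str]  # indication wording from the electronic package insert
    not_accepted: frozenset[str]    # efficacy items the PMDA did not accept (review report)
    rmp_risks: frozenset[str]       # risks listed in the RMP


def check_against_references(
    claims: list[str], text: str, refs: ReferenceSet
) -> list[tuple[str, str]]:
    issues = []
    for claim in claims:
        # Exact string match: a claim may not go one character beyond the insert wording.
        if claim not in refs.insert_wording:
            issues.append(("exceeds package insert", claim))
        if claim in refs.not_accepted:
            issues.append(("efficacy not accepted at review", claim))
    lowered = text.lower()
    for risk in sorted(refs.rmp_risks):
        if risk.lower() not in lowered:
            issues.append(("RMP risk not reflected", risk))
    return issues
```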

07 Type-Enforced Verifiability: Ungrounded Findings Cannot Exist

Every finding must carry a reference to the original text of the norm and the ID of an atomic item. This is not an operating rule; it is enforcement built into the data type. A finding lacking its grounds errors out and disappears the instant it is generated. When a third party — regulator, lawyer, journalist — asks "why this verdict?", the answer can always be traced back to the norm's original text. I regard this single point as the minimum condition for using machine findings in human review. Where in the review flow this enforcement operates is covered in the next article (process).
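
A minimal Python sketch of "enforcement built into the data type", using a hypothetical `GroundedFinding` class and `materialize` helper. Validation runs in the constructor, so an object without grounds never comes into existence, and the loader drops anything that fails. The ID format "A-001" is invented for illustration, and silently dropping rejects is a simplification of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedFinding:
    atomic_item_id: str  # ID of the atomic check item (format is hypothetical)
    norm_text: str       # verbatim excerpt of the norm's original text
    verdict: str

    def __post_init__(self) -> None:
        # No grounds, no object: construction itself fails.
        if not self.atomic_item_id.strip() or not self.norm_text.strip():
            raise ValueError("finding rejected: missing atomic item ID or norm text")


def materialize(raw_findings: list[dict]) -> list[GroundedFinding]:
    """Build findings from raw records; ungrounded ones fail and are dropped."""
    kept = []
    for raw in raw_findings:
        try:
            kept.append(GroundedFinding(**raw))
        except (TypeError, ValueError):
            # TypeError: a required field is absent; ValueError: present but empty.
            continue
    return kept
```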

08 The Public Eye: 8 Personas

Separate from checks against written norms, there is an inspection that reads the entire material through eight viewpoints: patient, family, healthcare professional, media, social media, investor, regulator, and lawyer. From each standpoint, can this material withstand criticism? Issues of ethics, dignity, and patient rights that sit one step short of an explicit violation can be caught only from these viewpoints. The eight viewpoints do not strain to manufacture concerns; only what is actually found is placed on the public-eye lane as an ALERT.
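
The "do not manufacture concerns" rule amounts to emitting an ALERT only when a persona actually reports something. In this Python sketch the persona reader is a stub passed in as a function; its signature, the 1-3 level convention, and the name `read_through_personas` are assumptions, and how the real system implements each persona is not covered here.

```python
from typing import Callable, Optional

PERSONAS = (
    "patient", "family", "healthcare professional", "media",
    "social media", "investor", "regulator", "lawyer",
)

# Hypothetical reader: (persona, material) -> alert level 1-3, or None if nothing found.
PersonaReader = Callable[[str, str], Optional[int]]


def read_through_personas(material: str, read: PersonaReader) -> list[tuple[str, str]]:
    """Collect one ALERT per persona that actually found something; never pad."""
    alerts = []
    for persona in PERSONAS:
        level = read(persona, material)
        if level is None:
            continue  # nothing found: no concern is manufactured
        if level not in (1, 2, 3):
            raise ValueError(f"{persona}: alert level must be 1-3, got {level!r}")
        alerts.append((persona, f"ALERT-{level}"))
    return alerts
```

An empty result corresponds to CLEAR on the public-eye lane.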

09 Deliberation: A Virtual Review by 60 Members

Important verdicts are not left to a single LLM's discretion. A virtual review (= a simulated review in which the AI plays different expert roles) runs in two stages. First, a 30-member debate: three groups — reviewers with pharmaceutical-company review experience, package-insert specialists, and advertising-regulation lawyers (10 each, 30 in total) — must present both the arguments for revision and the counterarguments, and dissenting opinions are stated explicitly. Next, a 30-member majority vote: three groups modeled on regulatory authorities (again 10 each) vote on each point of contention, and a majority of the 30 votes yields a candidate verdict of "revision required." Sixty members in all. The maker (the engine that raised the findings) and the acceptor (the deliberation) are kept separate so that no finding is approved by its own author. The design distrusts conclusions that appear unanimous; not only contested points but seemingly unanimous ones are also sent to human review. The deliberation goes only as far as a candidate verdict on whether revision is needed — the final verdict and the responsibility remain with the human reviewer. Deliberation is a part that raises the accuracy of first-pass screening, not a replacement for humans.
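
The second stage's vote and the maker/acceptor separation can be sketched in Python. The names, the boolean vote encoding, and the handling of a 15-15 tie (treated as no majority) are assumptions; the first-stage debate is not modeled. The output is only a candidate, and unanimity is surfaced as a flag rather than treated as extra confidence.

```python
from dataclasses import dataclass

PANEL_SIZE = 30  # three regulator-modeled groups of 10


@dataclass(frozen=True)
class VoteResult:
    revise_votes: int
    candidate: str   # a candidate only; the final verdict belongs to the human reviewer
    unanimous: bool  # flagged for scrutiny, not treated as extra confidence


def tally_votes(votes: list[bool], maker_id: str, acceptor_id: str) -> VoteResult:
    """Majority vote of the 30-member second stage (the debate stage is not modeled)."""
    if maker_id == acceptor_id:
        raise ValueError("a finding may not be accepted by its own maker")
    if len(votes) != PANEL_SIZE:
        raise ValueError(f"expected {PANEL_SIZE} votes, got {len(votes)}")
    revise = sum(votes)  # each True counts as 1
    # Strict majority: 16 or more of 30. Treating a 15-15 tie as no majority is an assumption.
    candidate = "revision required" if revise > PANEL_SIZE // 2 else "no revision candidate"
    return VoteResult(revise, candidate, unanimous=revise in (0, PANEL_SIZE))
```

Whatever the tally, the result is passed on to a human reviewer; the sketch only decides what candidate and flags travel with it.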

Primary Norms

Pharmaceuticals and Medical Devices Act (Japan). "Articles 66 (false or exaggerated advertising), 67, and 68 (advertising of unapproved drugs)." (Norm layer L1)
Ministry of Health, Labour and Welfare. "Standards for Appropriate Advertising of Drugs." (Norm layer L2)
Ministry of Health, Labour and Welfare. "Guidelines for Sales Information Provision Activities and Q&A." 2018–. (Norm layer L3)
Japan Pharmaceutical Manufacturers Association. "JPMA Code of Practice and Guidelines for Preparing Product Information Overviews." (Norm layer L4)

About this section: Based on the design document and execution-process specification (July 2026) of the AI material-review system built by the operator of this site. The system is a first-pass screening; final judgment and responsibility remain with the human reviewer.
