When a standard is handed out only as words, each person reads it differently. So the one who decides must line up real examples of both a pass and a fail, so that everyone draws the same line. And the final call is made not by a machine or a vote, but by a person under their own name.

Line Up the Reference Cases Before Scoring — The Driving Examiner

Picture a driving-test examiner. If each examiner had a different pass line for "checking safety," the same driving would pass with one and fail with another. So examiners meet in advance, watch real footage of "this passes" and "this fails," and calibrate their judgment against it. Words on a scoring sheet alone are read differently by different people.

The judge who decides whether a creator of materials (the explanatory documents handed to doctors and patients) can work alone faces the same problem. Handing out the standard as text is not enough. You line up a real example of a pass and a real example of a fail as "reference cases (anchors)," and every judge looks at them before scoring. An anchor is the weight that holds a boat to one spot — a fixed reference point so judgment does not drift. Words wobble with interpretation; a real example does not.

Do not hand out the standard as words; hand it out as real examples. Only when both a real pass and a real fail are shown does the line align.

Build the Reference Cases from Real Deviations — The Casebook of Failures

Medical training has a "casebook of failures." It keeps not only the successes but also a record of why mistakes happened. Reference cases work the same way: the fail samples must not be invented. Use the deviation patterns actually flagged in the Ministry of Health, Labour and Welfare (MHLW) monitoring project on promotional information activities. A real pattern lands better with a creator than an imagined bad example.

The reported cases have clear patterns. One started a graph's vertical axis at 0 instead of the proper 0.8, making two drugs look indistinguishable. Another prepared no material on the primary endpoint (the most important result) and explained only a secondary endpoint (a side result) where a significant difference appeared. Another wrote in the product information summary that the drug "requires no screening or testing" when pre-dose screening (a test that sorts patients before dosing) was in fact mandatory. Another claimed efficacy with a graph of just 9 cases (4 versus 5) and no statistical analysis. Every one of these becomes a "fail reference case" the judge should line up.

What matters is that these deviations are not the work of "bad people." They grow from thought patterns that ordinary creators fall into under pressure. So the judge attaches to each reference case not only "which deviation" but also "which mindset it came from." Showing the pattern, the mindset, and "which capability should stop it" as one set, as in the table below, lets a creator notice the same thought pattern inside themselves.

| Fail reference case (real pattern) | Mindset behind it | Capability that should stop it |
| --- | --- | --- |
| Axis starting at 0 to erase a difference | Motivated reasoning (the desired conclusion comes first) | Misperception forecasting / self-review |
| Skipping the primary endpoint, explaining only the secondary | The sin of omission ("I just didn't say it") | Balance design |
| Writing "no screening needed" when it is mandatory | Local rationalization (emphasizing just this point) | Source grounding |
| "The professor said it's fine" when challenged | Externalizing responsibility (passing it to authority) | Self-review |

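The rows of the table above can also be kept as a small lookup, so a judge can pull up every fail reference case tied to one capability. The following is a minimal Python sketch, not a tool used in the monitoring project: the names `FailAnchor`, `FAIL_ANCHORS`, and `anchors_for` are hypothetical, and the entries simply restate the table.

```python
from typing import List, NamedTuple


class FailAnchor(NamedTuple):
    pattern: str      # the real deviation pattern
    mindset: str      # the mindset it came from
    capability: str   # the capability that should stop it


# Entries restate the table; the structure itself is an illustrative assumption.
FAIL_ANCHORS = [
    FailAnchor("Axis starting at 0 to erase a difference",
               "Motivated reasoning",
               "Misperception forecasting / self-review"),
    FailAnchor("Skipping the primary endpoint, explaining only the secondary",
               "The sin of omission",
               "Balance design"),
    FailAnchor('Writing "no screening needed" when it is mandatory',
               "Local rationalization",
               "Source grounding"),
    FailAnchor('"The professor said it\'s fine" when challenged',
               "Externalizing responsibility",
               "Self-review"),
]


def anchors_for(capability: str) -> List[FailAnchor]:
    """Return the fail anchors that the given capability is meant to stop."""
    needle = capability.lower()
    return [a for a in FAIL_ANCHORS if needle in a.capability.lower()]


for anchor in anchors_for("self-review"):
    print(anchor.pattern, "<-", anchor.mindset)
```

Keeping the mindset next to the pattern is the point of the design: a creator who looks up "self-review" sees not only what went wrong but which habit of thought produced it.
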
Do Not Draw the Pass Line by Total Score — The Airport Screening

Think of airport security. No matter how fine the bag looks, if a single blade is inside, it does not pass. Even with 99 other perfect points, if that one point breaks the floor (the non-negotiable minimum), it fails. This is the non-compensatory gate — a checkpoint that cannot be made up for elsewhere.

The judge's most common shortcut is letting skill in appeal (the power to communicate) fill a hole in source grounding (the ability to return to the facts). The presentation is smooth, the figures are clean, the explanation is skilled. Carried along by that impression, the judge overlooks the one point that cannot be traced back to a source. That lets through the most dangerous combination: "high design × low fidelity = persuasive misperception."

Skill does not fill the hole. Can the material be traced back to the source, is it balanced, does it avoid creating a misperception? If even one of these floors breaks, it fails no matter how high the total.

So the judge splits scoring into two stages. The first stage checks the floor (necessary conditions). Source grounding, balance, and misperception forecasting are each judged pass/fail, yes or no. One "no" decides a fail right there. The second stage is the total score (excellence), considered only for those who cleared the floor: "how much can be entrusted." Not mixing the floor with the total score is the judge's first discipline.
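
The two stages can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official scoring tool: the names `Assessment` and `judge`, the field names, and the 0 to 100 excellence scale are all hypothetical. What it shows is only the ordering: the total score is never read until every floor has answered yes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# The three floors named in the text. The identifiers are hypothetical.
FLOORS = ("source_grounding", "balance", "misperception_forecasting")


@dataclass
class Assessment:
    floors: Dict[str, bool]  # yes/no answer per floor
    excellence: float        # total score; the 0-100 scale is an assumption


def judge(a: Assessment) -> Tuple[str, Optional[float]]:
    # Stage 1: necessary conditions. One "no" (or a missing answer) is a fail,
    # and the total score is never consulted.
    for floor in FLOORS:
        if not a.floors.get(floor, False):
            return ("fail", None)
    # Stage 2: excellence, read only for those who cleared every floor.
    return ("pass", a.excellence)


polished = Assessment(
    floors={"source_grounding": False, "balance": True,
            "misperception_forecasting": True},
    excellence=99.0,
)
plain = Assessment(
    floors={"source_grounding": True, "balance": True,
            "misperception_forecasting": True},
    excellence=60.0,
)
print(judge(polished))  # ('fail', None)
print(judge(plain))     # ('pass', 60.0)
```

The polished creator with a score of 99 fails and the plain creator with 60 passes. A weighted average of the same inputs would have ranked them the other way round, which is exactly the shortcut the gate exists to block.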

A Person Makes the Final Call — The Health-Check Doctor

At a health check, machines measure blood pressure and numbers. But whether "a closer exam is needed" is decided in the end by a doctor. Numbers aid the judgment; they do not take it over. Material pass/fail is the same: a checklist or scoring tool is a device that aids judgment, not the judge.

Why a person at the end? So the judge does not commit "externalizing responsibility" themselves. In a reported case, no one disclosed COI (conflict of interest: the financial tie between a presenter and a product), and the reason given was "because we were not asked." Because no one took responsibility under their own name, the gap appeared. If a judge starts saying "the checklist passed it," the same structure reappears: nobody has taken responsibility under their own name.

So pass/fail is fixed only when the judge makes the final call, under their own name, on top of the tool's output. Be able to say, "I judged that this creator can be entrusted to work alone." Record at least one line of the grounds (which reference case it was compared against, which floor it met). When someone later asks "why did you pass it," you, not the tool, can answer. That is the core of the judge's responsibility.
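
As one way to picture "a line of the grounds," the sketch below is a record that cannot be created without the judge's name and at least one reference case. The class `Decision` and all of its fields are hypothetical, chosen only for illustration. A tool's output would sit beside such a record, not replace it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class Decision:
    creator: str
    judge_name: str                    # the person answering for the call
    passed: bool
    anchors_compared: Tuple[str, ...]  # reference cases it was checked against
    floors_met: Tuple[str, ...]        # floors the judge confirmed
    decided_on: date = field(default_factory=date.today)

    def __post_init__(self) -> None:
        # A decision with no name or no reference case is refused outright.
        if not self.judge_name.strip():
            raise ValueError("a decision must carry the judge's own name")
        if not self.anchors_compared:
            raise ValueError("record at least one reference case compared")

    def grounds(self) -> str:
        """The one line the judge can give when asked 'why did you pass it?'"""
        verdict = "pass" if self.passed else "fail"
        anchors = ", ".join(self.anchors_compared)
        floors = ", ".join(self.floors_met) if self.floors_met else "none"
        return (f"{self.judge_name} judged {self.creator}: {verdict}; "
                f"compared against: {anchors}; floors met: {floors}")


record = Decision(
    creator="Creator A",
    judge_name="Judge B",
    passed=True,
    anchors_compared=("fail anchor: axis starting at 0",),
    floors_met=("source grounding", "balance", "misperception forecasting"),
)
print(record.grounds())
```

Making the record immutable (`frozen=True`) is a design choice in the same spirit: once the call is made under a name, it stays on file as it was made.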

The Judge Is Also Judged — The Final Checker of the Proof Sheet

In printing a book, someone stamps approval on the proof sheet (the pre-print check copy) at the end. If that person misses something, the error is printed into tens of thousands of copies. So the final checker's judgment is also examined in hindsight. The judge is not only "the one who judges"; their own judgment is recorded and reviewed later too.

This is not to bind the judge but to protect the quality of judgment. Knowing your judgment will be reviewed later makes shortcuts less likely: being "swept along by skill," or having "decided by impression without checking the reference cases." The judge builds in, for themselves, a mechanism for self-monitoring. The point from Vol. 7, "a person who assumes their own material has no problem cannot be entrusted," applies to the judge too.

Everything this series has said comes down to one point. A creator fit to work alone is not chosen by a high average. It is the person who reliably meets the floor of being able to return to the facts and, on top of that, has the design skill to communicate. And the judge who draws that line carries the responsibility to line up reference cases as real examples, to refuse to compromise the floor, and to decide, in the end, under their own name.

Who Can Draft Unsupervised ── Map of all 10 episodes

  1. Vol. 1: How to Tell Who Can Build Materials Alone — Don't Judge by the Average of Skills ── Readiness to work alone is decided not by the average of eight skills, but by a floor: can the person return to the source.
  2. Vol. 2: Bending the Facts Is Far Heavier Than a Plain Mistake ── A persuasive piece that creates a false impression does far more harm than a dull but correct one.
  3. Vol. 3: The More Skilled the Communicator, the Less Their Misleading Slips Get Noticed ── Skilled presentation makes misleading framing look reasonable, so the most fluent creators are the ones whose errors slip through.
  4. Vol. 4: Fail One, Fail All ── Appeal Cannot Patch a Hole in the Facts ── Set the bar for independent work as a non-compensatory floor, not an average — strong appeal can never fill a gap in the source.
  5. Vol. 5: Demand One Thing Above All: Can They Return to the Source ── Tracing every claim back to its source is the floor that persuasion can never fill in for.
  6. Vol. 6: Making It Look Good and Making It Right Are Two Different Things ── Visual polish and fidelity to the source are different skills. Judging correctness by beauty lets the most dangerous errors slip through.
  7. Vol. 7: People Who Are Sure Their Own Material Is Fine Cannot Be Trusted Alone ── Measure the gap between self-assessment and actual skill as a separate, independent gate from persuasive ability.
  8. Vol. 8: Four Gates — Draft, Self-Review, Source-Check, Balance-Check ── Tell whether someone can release materials alone by whether they pass four gates in order.
  9. Vol. 9: Judging Three People by One Standard: The Persuader, the Precise Craftsperson, and the Quiet but Trustworthy ── A story that lines up three character types against one floor (can they return to the source) and decides pass or fail by a non-compensatory gate, not by an average.
  10. Vol. 10 (this episode): The Judge's Responsibility — Line Up the Reference Cases, and Let a Person Make the Final Call ── The person who decides pass or fail must anchor the standard with real reference cases, not just words, and make the final call under their own name rather than leaving it to a checklist or a machine.

In closing

The judge's job is not to circle items on a scoring sheet. Line up real examples of a pass and a fail, do not let skill in appeal fill a source hole, and declare a fail the moment the floor breaks. Then make the final call under your own name, not a tool's and not a vote's.

The judge's own judgment, too, will be reviewed later. So the judge imposes on themselves the same self-monitoring they ask of creators: did I check against the reference cases, did I avoid being swept along by impression, did I refuse to compromise the floor? The standard traced across these ten pieces, in the end, comes back to the judge.

Key Points ── Three to take with you
  1. Align the standard with real reference cases. A words-only standard gets read differently by each person. Only by showing real pass and fail examples (anchors) does everyone draw the same line.
  2. Do not fill the floor with a total score. Source grounding, balance, and misperception forecasting are pass/fail necessary conditions. If one breaks, it fails no matter how strong the appeal (non-compensatory gate).
  3. A person decides at the end, under their own name. A checklist aids judgment but is not the judge. To keep the judge from the "because we were not asked" escape, record the result with at least one line of grounds.

Sources & references
  1. Ministry of Health, Labour and Welfare, Compliance and Narcotics Division (commissioned project). Report on the Monitoring of Promotional Information Activities for Prescription Drugs (March 2024 and prior years). Flagged cases are published with company names anonymized; the deviation patterns cited here are generalized from this report.
  2. Ministry of Health, Labour and Welfare. Standards for Fair Advertising of Drugs (1980; revised 2017) and its commentary. Basis for fair, objective information and the ban on exaggerated or misleading expression.
  3. Japan Pharmaceutical Manufacturers Association. JPMA Code of Practice. Ethical standard for information activities and the approach to COI disclosure.
  4. Act on Pharmaceuticals and Medical Devices (Articles 66 and 68). Ban on exaggerated advertising (Art. 66) and on advertising unapproved drugs (Art. 68) — the legal floor underpinning materials.