The Rules That Keep Measurement Honest

If the ruler itself is bent, honest people score low and risky people score high. This fourth installment sets rules not for the person being measured but for the one doing the measuring: judge by behavioral fact, not impression; never lead the answer; make traceability to the source the floor. Together with three more, these six ground rules hold up trust in the evaluation itself.

First, check the ruler itself

If your bathroom scale always reads two kilos heavy, it does not matter how honestly you step on it: the number is a lie. Before measuring a person, check the measuring tool. That is the subject of this fourth installment. Through the third installment, we looked at how to read fidelity to fact and skill in communication from the materials a person actually made. But if the reader's own scale is bent, even good evidence comes out distorted. So this time we set six ground rules, not for the author being measured but for the one doing the measuring.

A ground rule is a shared agreement the measurer must not bend on a whim. Whoever measures, and whenever they measure, the same author should receive nearly the same conclusion. Here we put that foundation into words.

Rule 1 — Measure by behavioral fact

A driving examiner does not pass you because "this person seems like a safe driver." They score only the facts of behavior: did the tires actually stop at the stop line, did the eyes actually check the mirror. Measuring an author works the same way. Labels such as "enthusiastic" or "has good taste" are impressions, not facts. What we measure is what the person did inside the material they made, and what they checked before releasing it.

Recall a real reported case. One author did not even prepare materials for the primary endpoint — the single most important yardstick for judging a drug — and explained only a secondary item where a significant difference (a gap hard to explain by chance) appeared. Measured by impression, such a person might score high as "a good explainer." Measured by behavioral fact, the record reads: "did not present the most important result." Impression hides the deviation; behavioral fact preserves it.

Way of measuring	What it looks at	How it scores the same author
By impression	Mood, smooth talk, enthusiasm	Scores high as "good explainer" even when the primary item is hidden
By behavioral fact	What was put in and what was left out	Leaves the record: "did not show the primary item"

Rule 2 — Pin down who did it

When a restaurant causes food poisoning, "someone in the kitchen" prevents nothing. Only by identifying which step, which person, when, and what they did can it be fixed. Materials are the same: do not blur the subject of the evaluation. Not "the team worked hard," but who made that judgment, who chose that graph.

Here the psychology of externalizing responsibility shows up. In a reported case, an author explained that "a difference is showing" even though there was no significant difference in the Japanese subgroup and, when challenged, borrowed authority: "the professor says it is fine." Borrowing authority shifts the subject of one's own judgment onto someone else. If the measurer is swept along by "well, if the professor says so," the ruler itself gets pulled into the externalizing. So when measuring, keep returning the subject to the person: "what did you verify to make that judgment?"

Rule 3 — Tie it to evidence

When correcting a galley proof (the trial print checked before printing), a veteran does not mark something in red just because "it somehow feels off." They show which rule is broken at which point, then fix it. Measuring an author works the same way: tie each judgment to a basis. Record not "this material is good or bad" but "this claim can / cannot be traced back to this number in this figure of the source," in a traceable form.

To measure is not to state an impression. It is to show, in traceable form, whether one can return to the source.

A reported case used a graph of just nine cases (four versus five), with no statistical processing, to claim an effect. When measuring this, do not settle for "there is a graph, so there is a basis." Record the content of the evidence: "basis is nine cases, no statistical processing." Looking not at whether a basis exists but at how strong it is — that is Rule 3.
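As a rough illustration of recording the content of the evidence rather than its mere existence, the sketch below stores one judgment together with its basis. The class name, field names, and wording of the output are all hypothetical; the source describes the principle, not a data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceRecord:
    """One judgment tied to its basis. All field names are hypothetical."""
    claim: str                       # what the material asserts
    source_ref: Optional[str]        # where in the source it traces to, or None
    sample_size: Optional[int]       # number of cases behind the claim
    statistical_test: Optional[str]  # name of the test applied, or None

    def is_traceable(self) -> bool:
        # A judgment counts as traceable only if it points at a source location.
        return bool(self.source_ref)

    def describe_basis(self) -> str:
        # Record how strong the basis is, not merely that one exists.
        if not self.is_traceable():
            return "no traceable basis"
        if self.sample_size is not None:
            size = f"{self.sample_size} cases"
        else:
            size = "sample size not stated"
        stats = self.statistical_test or "no statistical processing"
        return f"basis is {size}, {stats}"


# The nine-case graph from the reported case (four versus five).
record = EvidenceRecord(
    claim="the drug shows an effect",
    source_ref="graph in the material (4 vs 5 cases)",
    sample_size=9,
    statistical_test=None,
)
print(record.describe_basis())  # basis is 9 cases, no statistical processing
```

A record like this keeps "there is a graph" from passing as a basis on its own: the sample size and the absence of a test stay visible to whoever reads the judgment later.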

Rule 4 — Do not lead the answer

If a survey asks "what do you think of this wonderful new feature," only the wanted answer comes back. The question used to measure must not pull the answer; this is what non-leading means. Ask an author "you dodged that nicely, right?" and they will say "yes." Instead ask a question free of evaluation, such as "how did you handle this side effect?", and let the behavior be told.

Motivated reasoning — the mind in which a desired conclusion comes first and the presentation bends toward it — happens to the measurer too. If the wish "I want to think this person is excellent" comes first, the question bends toward that answer. So fix the questions in advance and do not change them by person or mood. In a reported case, an author reframed a side effect that should be flagged (a certain component becoming excessive) as a strength: "you can supplement that component." If the measurer leads with "what a clever touch," they mistake reframing — restating things conveniently — for talent. Only neutral questions show a deviation as a deviation.
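One way to picture "fix the questions in advance" is a question set that cannot be edited per person, plus a crude screen for evaluative wording. This is only a sketch: the word list is a hypothetical placeholder, and simple substring matching will miss many leading questions that a human reader would catch.

```python
from typing import List, Tuple

# Hypothetical list of evaluative words; a real list would be agreed in advance.
EVALUATIVE_WORDS: Tuple[str, ...] = ("nicely", "wonderful", "clever", "excellent", "right?")

# Questions fixed before any interview; a tuple so they are not changed by person or mood.
FIXED_QUESTIONS: Tuple[str, ...] = (
    "How did you handle this side effect?",
    "What did you verify to make that judgment?",
)


def leading_words(question: str) -> List[str]:
    """Return the evaluative words found in a question (case-insensitive)."""
    lowered = question.lower()
    return [word for word in EVALUATIVE_WORDS if word in lowered]


# The leading question from the text is flagged; the fixed questions are not.
print(leading_words("You dodged that nicely, right?"))  # ['nicely', 'right?']
for question in FIXED_QUESTIONS:
    print(question, leading_words(question))
```

The screen is a reminder for the measurer, not a substitute for reading the question: its value is that the same check runs on every question regardless of who is being asked.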

Rules 5 and 6 — Consistency and separation

Measure twice with the same ruler and you should get the same length. Rule 5 is consistency: the same author and the same evidence should bring different measurers to nearly the same conclusion. For that, share Rules 1 through 4 in writing and do not add or subtract by personal taste.
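Whether different measurers really do reach "nearly the same conclusion" can be checked with a simple agreement figure. The sketch below computes the raw agreement rate and Cohen's kappa, which corrects for agreement expected by chance. Kappa is not named in the source; it is one common statistic chosen here for illustration, and the verdict labels are invented examples.

```python
from collections import Counter
from typing import Sequence


def agreement_rate(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of items on which two measurers recorded the same verdict."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both measurers must rate the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = agreement_rate(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both measurers used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical verdicts from two measurers on the same four materials.
measurer_1 = ["primary shown", "primary omitted", "primary omitted", "primary shown"]
measurer_2 = ["primary shown", "primary omitted", "primary shown", "primary shown"]

print(agreement_rate(measurer_1, measurer_2))  # 0.75
print(cohens_kappa(measurer_1, measurer_2))    # 0.5
```

A low figure does not say which measurer is wrong. It only signals that the written rules are being read differently and need to be tightened.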

Rule 6 is separation. Do not have one person both make and measure in the same single pass. If you measure your own material as "no problem," motivated reasoning passes straight through. In a reported case, an author flatly wrote "the risk of death does not increase" with no evidence, later explaining "since it is not currently clear, I wrote it this way." That is what happens when checking and judging mix inside the maker. Place the measuring role with a different eye, at a different time. The table below sums up the six rules and the deviations and psychology each one stops.

Rule	Example deviation it stops	Psychological driver behind it
1 Measure by fact	Explains only the secondary item, not the primary	The sin of omission
2 Pin the subject	"The professor says it is fine"	Externalizing responsibility
3 Tie to evidence	Claims effect on nine cases, no statistics	Local rationalization
4 Do not lead	Reframes a side effect as "can supplement"	Motivated reasoning
5 Be consistent	Conclusion sways with the measurer	Motivated reasoning (of the measurer)
6 Split make and measure	Self-approves "does not increase" with no basis	Motivated reasoning

Measuring Skill from Work and Behavior ── Map of all 10 episodes

Vol. 1: Measure by the Materials Actually Made, Not by Impressions or Self-Report ── A material maker's skill is measured from the actual deliverables and observable conduct, not from self-report or others' impressions.
Vol. 2: Tracing the Brief, the Choices, and the Result — In Order ── Read a creator's skill from evidence by walking through one real project in order: the brief, the thinking, the actions, and the result.
Vol. 3: Reading "Faithfulness to the Facts" and "Craft of Delivery" Out of the Work Itself ── This installment shows how to recode a finished piece into two axes — faithfulness to the facts and the craft of getting it across — by reading concrete clues, not impressions.
Vol. 4 (this episode): The Rules That Keep Measurement Honest ── Six ground rules that keep the evaluator from drifting when measuring an author's real skill.
Vol. 5: Three Rulers: Accuracy, Clarity, and Balance ── Defines three rulers for grading material-making skill and scores each on a four-step scale: accuracy as the floor, clarity as the reach, and balance as the adjustment between too much and too little.
Vol. 6: How to Decide the Level — Returning to the Source Sets the Ceiling ── Work that cannot be traced back to its source cannot earn a higher level, however polished it looks. Grounding sets the ceiling.
Vol. 7: What Deliverables Signal Which Level ── An anchor table that reads a creator's level (L1-L4) from visible deliverables and behavior patterns.
Vol. 8: How Far Can We Trust a Judgment? ── How sure a level judgment is depends on how visible the evidence is; less observable skills produce shakier judgments, so we attach a confidence to each verdict.
Vol. 9: Combine More Than Self-Assessment: Add the Reviewer's and Requester's View ── Layering four viewpoints — self, reviewer, requester, and AI — surfaces the deviations of omission that a single pair of eyes cannot see.
Vol. 10 (final): Connecting the Measurement to Pass/Fail and a Development Plan ── The finale links the score to the pass floor and a plan for what to grow next.

In closing

The six rules are not tools to bind the author but a fence that keeps the measurer's own ruler from bending. Judge by behavioral fact, return the subject to the person, record in a form traceable to the basis, do not lead the answer, stay steady across measurers, and split making from measuring. Drop any one and the danger remains of mistaking persuasive misdirection for talent.

What matters is that these six work just as well for the author's own self-monitoring. Before releasing, ask yourself "is this fact or impression," "am I offloading the subject onto someone," and "can I trace this to a basis." Deviations like the reported ones can then be caught before they go out. The rules of measuring are also the rules of making.

Key Points ── Three to take with you

Measure by behavioral fact, not impression. Not "enthusiastic" but what was put in and left out of the material. An omission that hides the primary item vanishes under impression and survives under fact.
Read borrowed authority as externalized responsibility. "The professor says it is fine" shifts the subject of judgment onto someone else. The measurer keeps returning the subject to the person.
Use non-leading questions and split making from measuring. Evaluative questions bend the answer, and self-approval lets motivated reasoning pass straight through. Neutral questions and a separate verifying pass show a deviation as a deviation.

Sources & references

Ministry of Health, Labour and Welfare, Compliance and Narcotics Division (commissioned project). Report on the Monitoring of Promotional Information Activities for Prescription Drugs (March 2024 and prior years). Flagged cases are published with company names anonymized; the deviation patterns cited here are generalized from this report.
Ministry of Health, Labour and Welfare. Guidelines on Promotional Information Activities for Prescription Drugs. Principles for fair presentation of primary endpoints, significance, and conflicts of interest.
Japan Pharmaceutical Manufacturers Association. JPMA Code of Practice. Principles of fidelity to evidence and fair provision of information.
Behavioral assessment methodology. Behavioral Event Interview (BEI) / the STAR method. General accounts of interview techniques that measure ability from behavioral fact rather than impression.
