"He seems capable" is not an evaluation. Look only at the materials left behind and the actions taken while making them. From that visible evidence, place the creator somewhere between L1 and L4. This article builds that mapping.
Think in driver's-license stages
Driving skill cannot be measured by talk. That is why a licensing test splits its stages by maneuvers actually performed, such as hill starts and parking. A learner's-permit holder, a fully licensed driver, and someone who has driven for years without an accident all say "I can drive," yet the visible behavior behind those words differs completely.
Making materials is the same. A self-report like "I check my sources properly" tells us nothing about whether the person is L1 or L4. The only basis for judgment is the deliverables that actually appeared and the actions taken along the way. What this article builds is a mapping from what was seen to a level: an anchor. An anchor is a marker that pins a vague impression to one fixed point.
There are four stages, L1-L4. L1 = finishes only the one assigned case as told. L2 = has learned the pattern and can reproduce it. L3 = understands why and can apply it elsewhere. L4 = can design the system itself and set a workplace standard. We translate these four into "what must be visible at each level" for each skill.
The same job splits by visible behavior
Picture a kitchen apprentice. Making one dish by the recipe is L1. Producing the same dish reliably with memorized amounts is L2. Understanding why the salt goes in first and applying it to other ingredients is L3. Writing the kitchen's procedure so a newcomer hits the same taste is L4. Not just the plate, but the person's movements tell you where they stand.
In material creation, take the most important floor, grounding: can the claim be traced back to its source? An L1 deliverable may omit the cited page number or mix in second-hand citations (someone's summary used in place of the original). At L2 the source column is complete and the original can be produced on request. At L3 the creator reads the source's limits (target patients, observation period, sample size) and adjusts the wording accordingly. At L4 the creator turns the source-checking procedure itself into a checklist so that the whole team stands on the same floor.
| Level | Visible deliverable / behavior | In a phrase |
|---|---|---|
| L1 | Finishes one assigned case. Missing page numbers, second-hand citations remain | One-off |
| L2 | Source column complete; produces the original on request. Stable pattern | Reproduces |
| L3 | Reads source limits (n, target, period) and tunes wording | Applies |
| L4 | Standardizes the check; makes others stand on the same floor | Builds the system |
Flip a deviation case and the level boundary appears
Recall a health checkup. The way abnormal values show up reveals where the body is under strain. Deviation cases work the same way: flip what was done, and you see the level the person was stuck at.
In one reported case, the creator prepared no material on the primary endpoint (the outcome the trial was mainly designed to test) and explained only a secondary result that showed significance. This is the stage of having learned the pattern without understanding why the primary endpoint comes first: an L2-level omission. A psychological motive, protecting oneself by staying silent, overlaps here.
In another case, the creator said "a difference is seen" for a small Japanese subgroup where none existed, and when challenged, deflected to authority: "the professor says it's fine too." This combines weak application with externalizing responsibility. To claim L3, one must explain the weakness of the subgroup numbers in one's own words. The table below maps cases, levels, and the skill that should stop them.
| Reported deviation | Level exposed | Skill that should stop it |
|---|---|---|
| Omits primary endpoint, explains only secondary | Stuck at L2 (omission) | Balance design (show the whole picture) |
| Calls a no-difference subgroup "different" | L2, short of L3 | Source grounding (read the numbers' limits) |
| Zooms part of the y-axis to enlarge the gap | Dangerous low-fidelity x high-design | Self-review (doubt yourself before release) |
| Labels a required screening "unnecessary" | Cracked grounding floor | Misreading prediction (anticipate reader error) |
The third case, the y-axis manipulation, deserves special caution. The design skill to make a graph stand out is present, but fidelity to fact is not. This combination, "high design x low fidelity = persuasive misreading," is what this section calls the most dangerous. Far from being high-level, it means the floor is cracked. Skill in appeal does not fill a hole in grounding.
Levels may differ by skill
When several people proofread a galley (a test print made before the final print run), one is strong on typos, one on phrasing, one on checking numbers. Even within one person, strengths and weaknesses diverge. The eight skills of material creation are the same: they almost never line up at one level.
For example, someone may be L3 in appeal design (making things easy to grasp) but L2 in source grounding. Looking at the total score alone, this looks "so-so." But in this section's pass logic, a floor like grounding is a non-compensable gate. Non-compensable means high scores elsewhere cannot fill the hole. If the floor is cracked, no amount of appeal skill earns a pass.
Lay out the visible deliverables by skill, and first confirm the floor (grounding) is not missing. If the floor is missing, do not use the height of other skills to raise the ranking. Only after the floor is met do you evaluate overall height as excellence.
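The order described above (gate first, excellence second) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `FLOOR_SKILLS`, `Verdict`, and `judge` are hypothetical, the threshold `floor_min` is something the evaluator must choose (this article does not fix a number), and using the average of the remaining skills as "overall height" is a placeholder rather than the series' scoring rule.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical: skills treated as non-compensable gates (the "floor").
FLOOR_SKILLS = {"grounding"}


@dataclass(frozen=True)
class Verdict:
    passed: bool
    reason: str
    excellence: Optional[float]  # computed only after the floor is met


def judge(levels: Dict[str, int], floor_min: int) -> Verdict:
    """Check the floor first; only then look at the other skills.

    `levels` maps a skill name to its observed level (1-4).
    `floor_min` is an assumed threshold chosen by the evaluator.
    """
    for skill in sorted(FLOOR_SKILLS):
        # A missing floor skill counts as level 0: no evidence, no pass.
        if levels.get(skill, 0) < floor_min:
            return Verdict(False, f"floor not met: {skill}", None)

    # Assumption: "overall height" is summarized as the mean of non-floor skills.
    others = [lv for s, lv in levels.items() if s not in FLOOR_SKILLS]
    excellence = sum(others) / len(others) if others else None
    return Verdict(True, "floor met", excellence)


# High appeal and balance do not rescue a cracked grounding floor.
cracked = judge({"grounding": 1, "appeal": 4, "balance": 4}, floor_min=2)
print(cracked.passed, cracked.reason)
```

The point of the early `return` is that the other skills are never read when the floor fails, so their height cannot leak into the result.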
So the evaluation sheet is not one number but a row of levels per skill. Shown as "grounding L2 / balance L3 / appeal L3 / self-review L2," the person can see where to grow to reach the next stage. This is not a table for condemnation. It is a map for the person to self-monitor which of their own circuits is weak.
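As a minimal sketch, the per-skill sheet can be held as an ordered mapping and printed as one row. The function names `format_sheet` and `weakest_skills` are hypothetical, and treating the lowest-level skills as the ones to grow next is an assumption made for illustration, not a rule stated in this series.

```python
from typing import Dict, List


def format_sheet(levels: Dict[str, int]) -> str:
    """Render per-skill levels as one row, keeping the given skill order."""
    return " / ".join(f"{skill} L{level}" for skill, level in levels.items())


def weakest_skills(levels: Dict[str, int]) -> List[str]:
    """Return the skills sitting at the lowest observed level."""
    lowest = min(levels.values())
    return [skill for skill, level in levels.items() if level == lowest]


sheet = {"grounding": 2, "balance": 3, "appeal": 3, "self-review": 2}
print(format_sheet(sheet))    # grounding L2 / balance L3 / appeal L3 / self-review L2
print(weakest_skills(sheet))  # ['grounding', 'self-review']
```

Keeping the row unaggregated is the design choice here: a single total would hide which circuit is weak.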
How to use the anchor table, and its pitfalls
At airport baggage screening, sample photos are posted - "stop if this shows up" - so judgment does not drift between officers. The anchor table plays the same role: a sample so judgment lines up even when the evaluator changes. But applying the sample mechanically can cause misjudgment instead.
There are three pitfalls. First, a good deliverable may just mean the person happened to get a good request and good material. So line up several jobs, not one, and check whether output is stable (the core of L2). Second, behavioral evidence may not remain. If a source check was done but unrecorded, a third party cannot confirm grounding. Keeping records is itself evidence of L3-L4. Third, an excellent device not on the table may be undervalued simply because it is not on the table. The anchor fixes the floor and the typical; it does not cap excellence.
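The first pitfall, judging from a single lucky job, can be guarded against with a small check like the one below. It is a sketch under assumptions: the function name `stable_level` is hypothetical, the minimum of three jobs is an arbitrary illustrative default, and taking the lowest level seen across jobs as the "stable" level is one conservative reading of stability, not a rule from the source.

```python
from typing import List, Optional


def stable_level(observed: List[int], min_jobs: int = 3) -> Optional[int]:
    """Return a level only when enough separate jobs support it.

    `observed` holds the level seen for one skill on each job.
    With fewer than `min_jobs` jobs the function withholds judgment (None).
    Assumption: the stable level is the lowest one seen, so a single
    strong deliverable cannot raise the verdict on its own.
    """
    if len(observed) < min_jobs:
        return None
    return min(observed)


print(stable_level([3]))        # None: one good job is not yet evidence
print(stable_level([2, 3, 2]))  # 2
```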
The next article, the eighth, asks how far we can trust a level assigned this way: whether a different person looking at the same deliverable lands on the same level. That is the question of judgment reliability.
Measuring Skill from Work and Behavior ── Map of all 10 episodes
- Vol. 1: Measure by the Materials Actually Made, Not by Impressions or Self-Report ── A material maker's skill is measured from the actual deliverables and observable conduct, not from self-report or others' impressions.
- Vol. 2: Tracing the Brief, the Choices, and the Result — In Order ── Read a creator's skill from evidence by walking through one real project in order: the brief, the thinking, the actions, and the result.
- Vol. 3: Reading "Faithfulness to the Facts" and "Craft of Delivery" Out of the Work Itself ── This installment shows how to recode a finished piece into two axes — faithfulness to the facts and the craft of getting it across — by reading concrete clues, not impressions.
- Vol. 4: The Rules That Keep Measurement Honest ── Six ground rules that keep the evaluator from drifting when measuring an author's real skill.
- Vol. 5: Three Rulers: Accuracy, Clarity, and Balance ── Defines three rulers for grading material-making skill and scores each on a four-step scale: accuracy as the floor, clarity as the reach, and balance as the adjustment between too much and too little.
- Vol. 6: How to Decide the Level — Returning to the Source Sets the Ceiling ── Work that cannot be traced back to its source cannot earn a higher level, however polished it looks. Grounding sets the ceiling.
- Vol. 7 (this episode): What Deliverables Signal Which Level ── An anchor table that reads a creator's level (L1-L4) from visible deliverables and behavior patterns.
- Vol. 8: How Far Can We Trust a Judgment? ── How sure a level judgment is depends on how visible the evidence is; less observable skills produce shakier judgments, so we attach a confidence to each verdict.
- Vol. 9: Combine More Than Self-Assessment: Add the Reviewer's and Requester's View ── Layering four viewpoints — self, reviewer, requester, and AI — surfaces the deviations of omission that a single pair of eyes cannot see.
- Vol. 10 (final): Connecting the Measurement to Pass/Fail and a Development Plan ── The finale links the score to the pass floor and a plan for what to grow next.
Level judgment draws only on the deliverables left behind and the actions taken while making them, not on intelligence or enthusiasm. The L1-L4 anchor table is a sample that keeps judgment aligned even when the evaluator changes, and at the same time a map that lets the person find their own weak circuit.
The order must not be forgotten. First confirm the grounding floor is not missing; if it is cracked, no amount of design skill earns a pass. Only after the floor is met do you lay out per-skill levels to measure excellence. The next article asks how far this very judgment can be trusted.
- The only material is deliverables and behavior. Place L1-L4 from the materials that actually appeared and the behavioral evidence left while making them, not from self-report or impression.
- The floor is non-compensable. If grounding (returning to the source) is missing, no height of appeal or design earns a pass. The most dangerous combination is high design x low fidelity.
- Lay out levels per skill. The eight skills almost never line up at one level. Listing levels per skill lets the person self-monitor which circuit to grow.
- Ministry of Health, Labour and Welfare, Compliance and Narcotics Division (commissioned project). Report on the Monitoring of Promotional Information Activities for Prescription Drugs (March 2024 and prior years). Flagged cases are published with company names anonymized; the deviation patterns cited here are generalized from this report.
- Ministry of Health, Labour and Welfare. Guidelines on Promotional Information Activities for Prescription Drugs. Referenced for general principles on handling primary and secondary endpoints and stating evidence limits.
- Japan Pharmaceutical Manufacturers Association. Code of Practice / Promotion Code. Referenced as principles of fair, accurate, objective information provision.
- General competency-assessment literature. Behavior-based stage evaluation (BEI/STAR method; the logic of Behaviorally Anchored Rating Scales, BARS) referenced as methodology for designing level anchors.