Before deciding a level, the evidence first has to be put into a measurable form. The previous two issues fixed a way of listening that admits only what the person actually did — not their motivation or personality — as evidence. The next question is this: what do we measure that behavior with? A health check makes it clear. We decide the items first — height, weight, blood pressure — and that is what lets us compare. This design works the same way, placing three rulers for measuring behavior: how deep the thinking behind the judgment is (abstraction α), how wide a range of situations the action reached (scope σ), and how firmly the account is backed by fact (grounding g). Only once behavior is translated into these three do L1–L4 become a matter of fit with a ruler rather than impression.
Why steps, not scores
It is tempting to turn abstraction or scope into a score like "0.7." But that only fakes a precision with no substance. Think of an earthquake's intensity scale: it is given in whole steps — 1, 2, 3 — never "intensity 2.7." Showing the strength of shaking in fine decimals would only ring false. This design works the same way: α and σ are treated as steps (bands).
When measuring in steps, one property matters. The up-down order is certain, but the gaps between steps are not equal. Rising from 0 to 1 and rising from 2 to 3, though both "one step," carry different weight. So calculations like adding and averaging are saved for one final move (next issue's projection); here the only question is "which step is this account nearest to." It is like a recipe saying "a pinch / a moderate amount / a lot" of salt: we don't measure the grams to a decimal, but we firmly decide which step. Why do it this way? To stop the evaluator from escaping into a vague "roughly the middle." Every account must land on exactly one step.
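The "ordered but not equally spaced" property can be sketched in code. The following is a minimal illustration, not part of the design itself: the class name `Band` and its member names are hypothetical. It models a step as a value that can be compared but cannot be added or averaged, and that rejects in-between values such as 2.7.

```python
from enum import Enum


class Band(Enum):
    """An ordinal step: the order is meaningful, the distance is not.

    Hypothetical illustration; the names are assumptions, not the source's.
    """
    B0 = 0
    B1 = 1
    B2 = 2
    B3 = 3

    # Only ordering is defined. A plain Enum has no __add__, so
    # Band.B1 + Band.B2 raises TypeError, which is the point here.
    def __lt__(self, other):
        if not isinstance(other, Band):
            return NotImplemented
        return self.value < other.value

    def __le__(self, other):
        if not isinstance(other, Band):
            return NotImplemented
        return self.value <= other.value


def land_on_step(step: int) -> Band:
    """Force an account onto exactly one step; 2.7 is rejected with ValueError."""
    return Band(step)
```

Comparisons such as `Band.B1 < Band.B2` work, while `Band.B1 + Band.B2` fails. That matches the rule that adding and averaging wait for the final projection.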
This way of measuring has a precedent. A well-known method called BARS (Behaviorally Anchored Rating Scales) fixed each step with "examples of observable, concrete behavior." This design follows suit, defining each ruler by a "type of action you can confirm with your eyes."
α — depth of judgment (how deep the thinking behind it goes)
α (alpha, abstraction) measures "why did you judge that way." It shows in STAR's "+Thought (the motive for why you did it)." Did the person just follow the written words, trace back to the rule's purpose, or create a new viewpoint no one had named? The step runs 0 to 3. In cooking terms, it is the difference between someone who follows the recipe's amounts exactly (shallow) and someone who understands why those amounts and can apply them to other dishes (deep).
- α0: Applied the literal text of the rule as-is. As in "it doesn't say 'superior,' so no problem," the judgment stops at the written words alone.
- α1: Read not one word but several conditions together. Matching against a similar pattern seen before is at work.
- α2: Traced back to the intent or principle behind the rule and judged the case in front of them from it. Set aside the surface label and judged by substance.
- α3: Put into words a perspective that had no name before, as a new judgment principle. Shows most clearly in "+Thought."
σ — breadth of action (how wide a range it reached)
σ (sigma, scope) measures "what concretely did you do." It shows in STAR's "Action A." Did the person just process one case they already knew, or carry the same insight into a different field, or into a case with no precedent? The step runs 0 to 3. In sports terms, it is the difference between a player who only scrimmages against the same opponent and one who has built movements that transfer to another sport.
- σ0: Handled one already-known type within its own range. No carrying over to another setting.
- σ1: Handled many cases, but all repetitions of the same type. Experience accumulates, but no field is crossed.
- σ2: Brought the same insight into a structurally different field. For example, a way of seeing learned on a chart-bearing material also works on a chart-free patient booklet: a carry-over.
- σ3: Reached across departments or fields, extending judgment to cases with no precedent. The widest reach.
σ has one important safeguard: no matter how many same-type cases (σ1) you handle, you do not rise to σ2. The source's rule requires "two or more fields of a different type" for σ to reach step 2 or above. Why? To stop "just doing it many times" from being faked as "breadth." A high count strengthens the "grounding in fact (g)" coming up next, but that is treated as separate from "breadth of scope." Making the same dish 100 times does not widen a cook's range — same logic.
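The safeguard can be written as a small rule. This is a sketch under stated assumptions: each handled case carries a field-type label, and the evaluator supplies a yes/no judgment on whether the action reached cross-department, unprecedented cases. The function name, its parameters, and the labels are hypothetical.

```python
def scope_band(case_field_types: list[str],
               reached_unprecedented: bool = False) -> int:
    """Return the sigma step (0-3) for a list of handled cases.

    case_field_types: one field-type label per handled case (assumed input).
    reached_unprecedented: evaluator's judgment that the action crossed
        departments or fields into cases with no precedent (assumed input).
    """
    if not case_field_types:
        raise ValueError("at least one handled case is required")

    distinct_fields = len(set(case_field_types))

    # Source rule: sigma 2 or above needs two or more fields of a different type.
    if distinct_fields >= 2:
        return 3 if reached_unprecedented else 2

    # Same type only: repetition caps at sigma 1, however high the count.
    return 1 if len(case_field_types) > 1 else 0
```

With this sketch, `scope_band(["chart material"] * 100)` stays at 1, while `scope_band(["chart material", "patient booklet"])` gives 2. The count of cases never enters the σ2 condition; only the number of distinct field types does.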
g — grounding in fact (did it actually happen)
g (gee, grounding, meaning "feet on the ground") measures "did it actually happen." It is guaranteed by STAR's "Result R." Where α and σ measure the "quality of the testimony," g measures "whether that testimony is backed by fact." The step runs only 0 to 2 (one shorter than α or σ). In health-check terms, it is whether there is an actual blood-test number (g1 or above) rather than the person's self-report (g0).
- g0: A bare claim with no concrete event, such as "I can do it" or "I am L4." Raises no step at all.
- g1: Who, when, and what are identifiable, and Situation→Task→Action→Result come together as one past event.
- g2: Not a one-off fluke: the same pattern appears across multiple events, or the backing held up even when doubted. Corresponds to the "reproducibility" the BEI interview method values.
A claim with no backing (g0) raises no step. This is the heart of the verdict rule. Not skill at telling a story but action backed by fact alone moves the level. One formula appears here, but the idea is simple: "once the backing for events meeting a given step adds up to 2, that step counts as backed by fact." Concretely, two g1 events, or one g2 event, will do. Placing a high step on a single story is risky, so it is like only accepting a call once two or more referees rule the same way. (In formula form it is "threshold τ_g, default 2," but the point is: do not grant a step until the backing reaches 2.)
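The threshold rule can be sketched as follows. One point is an assumption of this sketch rather than something the text states: "events meeting a given step" is read as events whose encoded step is at or above the step in question. The function name and the `(band, g)` pair layout are hypothetical; `tau_g` stands for the threshold τ_g with its default of 2.

```python
TAU_G = 2  # threshold tau_g, default 2 as given in the text


def step_is_backed(events: list[tuple[int, int]], step: int,
                   tau_g: int = TAU_G) -> bool:
    """Check whether `step` on one ruler counts as backed by fact.

    events: (band, g) pairs for one ruler, where band is the encoded step
        of the event and g is its grounding (0, 1, or 2).
    An event "meets" the step when band >= step (assumption of this sketch).
    """
    backing = sum(g for band, g in events if band >= step)
    return backing >= tau_g
```

Two g1 events at step 2 pass (`step_is_backed([(2, 1), (2, 1)], 2)` is `True`), as does one g2 event. A single g1 event does not, and no number of g0 claims ever adds up to the threshold.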
The three rulers at a glance
Laying the three rulers in one table shows what is shared and what differs. α and σ run 0–3, g runs 0–2. α is drawn from STAR's "+Thought," σ from "Action A," g from "Result R." Because each is drawn from a different place, they are independent of one another.
| Ruler | 0 | 1 | 2 | 3 | Part of STAR drawn from |
|---|---|---|---|---|---|
| α depth of judgment | by the written words/steps | combined several conditions | from the rule's purpose | created a new viewpoint | +Thought (motive) |
| σ breadth of action | just one known case | multiple but same type | applied to a different field | across depts, unprecedented cases | Action A |
| g grounding in fact | just saying it | event where STAR comes together | backing checked / repeated | (no step) | Result R |
As the right column shows, the three are independent rulers drawn from separate places. So "deep reasoning but only one setting (α3 with σ0)" and "broad insight but not made into a principle (α0 with σ2)" are both possible. Like the focus of a photo — where it lands differs from person to person — the two rulers can come out mismatched. The direction of this mismatch is handled next issue (the wing b = A_hat − S_hat). Here the task is simply to land each account correctly on the three rulers.
How encoding actually works — landing testimony on the three rulers
"Encoding" means the work of translating what you heard into steps of α, σ, and g. In practice you judge all three at once while listening, so what follows is less a fixed procedure than a set of cues for where to look, given in the order α, σ, g.
- α: Dig into "why did you judge that way?" A quoted clause leans α0; spoken purpose or principle suggests α2 or above. When abstract words like "handled it appropriately" appear, always return to action with "what specifically?"
- σ: Check with a carry-over question whether the same insight worked in another field: "Did the same hold on other types of material?" Mere repetition of the same type caps at σ1.
- g: Confirm the action actually happened, with who, when, and what all identifiable. Multiple events or checked backing give g2. If the subject stays "we" and their own action stays invisible, it sits near g0.
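One way to record the result of encoding a single account is a small record type that enforces the ranges of the three rulers. This is a hedged sketch: the class name `EncodedAccount` and its field names are hypothetical, and only the ranges (α and σ in 0 to 3, g in 0 to 2, whole steps only) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedAccount:
    """One account landed on the three rulers (hypothetical record type)."""
    alpha: int  # depth of judgment, steps 0-3
    sigma: int  # breadth of action, steps 0-3
    g: int      # grounding in fact, steps 0-2

    def __post_init__(self):
        # Each ruler accepts whole steps only, within its own range.
        for name, upper in (("alpha", 3), ("sigma", 3), ("g", 2)):
            value = getattr(self, name)
            is_whole_step = isinstance(value, int) and not isinstance(value, bool)
            if not is_whole_step or not 0 <= value <= upper:
                raise ValueError(f"{name} must be a whole step in 0..{upper}")
```

Because the three fields are validated separately, a mismatched combination such as `EncodedAccount(alpha=3, sigma=0, g=1)` is accepted. A decimal like `alpha=2.7` or an out-of-range `g=3` is rejected.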
The bundle of testimony, once landed on the three rulers, becomes the input to next issue's "ceiling of backing (grounding ceilings A_hat and S_hat)." Conversely, if the rulers are placed loosely here, the output stays muddy no matter how precise the later formulas are. The accuracy of measurement is set by how strictly you do this first translation.
Measurement Design ── Map of all 10 episodes
- Vol. 2: Listening Through STAR ── Situation, Task, Action, Result, Thought ── Pick just one thing that actually happened in the past and ask about it in five parts: the setting (Situation), what was assigned (Task), what the person did (Action), what came of it (Result), and why they decided as they did (Thought). Spend more than half the time on the Action, write down what they did as verbs, check through the Result that it really happened, and draw out the root of the judgment through the Thought.
- Vol. 3: Encoding to Two Axes ── Action Reveals Scope, Thought Reveals Abstraction ── Turning one "what they actually did" story heard in an interview into three readings — how widely they moved (scope sigma), what reasoning they used (abstraction alpha), and whether it really happened (grounding g) — worked through a concrete material-review example.
- Vol. 4: The Six BEI Principles ── Axioms That Keep the Measurement Clean ── What a person actually did, told through a four-point way of asking, gets converted into three rulers: depth of thinking, reach of action, and whether a real episode backs it up. The person's reading is then the highest level that the episodes actually support. This installment explains, with everyday examples, the six interview manners that keep that conversion from getting muddied.
- Vol. 5 (this episode): Three Bands ── The Scales of Abstraction α, Scope σ, and Grounding g ── Before any level verdict, this issue sets the three rulers for measuring the behavior we heard: how high the reasoning goes, how far the action reached, and how firmly it is backed by fact. Measured in steps, not scores.
- Vol. 6: How L Is Decided ── The Grounding Ceiling and Projection to the Diagonal ── Talk without backing does not raise the level. Take only the reach that real behavior confirms, even out the two measures, and read L.
- Vol. 7: The Behaviors That Separate Levels ── Eight-Dimension Anchors and Boundaries ── Using a sample book of "what they actually did" (the anchor table), we match a person's account to the closest sample to decide the level (L1 to L4). All eight abilities are measured by the same method.
- Vol. 8: Confidence and Observability ── How Far to Trust a Reading ── An episode about putting a number on how sure a rating is. Confidence C comes from how much evidence there is, whether the story holds together, and whether the rater could see it; observability o comes from being well placed and actually producing evidence; their product, weight w, feeds the final tally.
- Vol. 9: Multi-Party AI Dialogue ── Corroboration for Others' Level, Divergence for Calibration ── One pair of eyes cannot measure a person. The subject and several colleagues take the same structured interview (BEI); each vote is weighted by how well that person actually saw the scene, and only readings that other votes back up are bound into an outside view of the level. The gap from the subject's self-rating is kept in a separate column as "how accurately they see themselves," not as ability.
- Vol. 10 (final): From Integrated Output to the Qualifying Line ── The Record and the Operating Procedure ── The closing piece of Series 3 on measurement design. In plain terms it explains how the per-person, per-item score sheet hands each number to the right checkpoint in the pass/fail decision, and walks through the seven steps for actually running the measurement.
The three rulers each measure something different: α the depth of the judgment, σ the breadth of the action, g whether both actually happened. α and σ are measured in steps of 0–3, g in steps of 0–2, and the summing into a smooth score waits until the final projection. At the encoding stage there is just one question — which step is this testimony nearest to.
The next issue gathers this translated evidence into the "ceiling of backing (grounding ceilings A_hat and S_hat)," lays the two rulers over the main diagonal, and reads the level. The "direction of the mismatch," which arises precisely because the three are independent, gains meaning only there. If the "which step" fixed here is loose, the later formulas turn muddy. Measurement begins here.
- Three independent rulers: α from STAR's +Thought (depth of judgment), σ from Action A (breadth), g from Result R (grounding in fact). α and σ are steps of 0–3, g of 0–2.
- Measured in steps, not scores: the gaps between steps are not equal. Encoding asks only which step is nearest; adding and averaging are saved for the final projection.
- Backing governs the step: g0 (just saying it) raises nothing. Same-type cases cap at σ1; σ2 and above need two or more fields of a different type, so a high count cannot be faked as breadth.
- McClelland, D. C. Testing for Competence Rather Than for Intelligence. American Psychologist, 1973. (Origin of measuring capability through behavioral evidence.)
- Boyatzis, R. E. The Competent Manager: A Model for Effective Performance. Wiley, 1982. (Encoding behavioral events via BEI.)
- Smith, P. C., & Kendall, L. M. Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales (BARS). Journal of Applied Psychology, 1963. (Prototype of ordered bands fixed by observable behavior.)
- Spencer, L. M., & Spencer, S. M. Competence at Work: Models for Superior Performance. Wiley, 1993. (Graded definitions of competency levels.)