Listening Through STAR ── Situation, Task, Action, Result, Thought

Ask "are you skilled?" and what comes back is a self-image, not what the person actually did. STAR turns the question around: instead of asking whether someone is good, it asks what they did inside one event that really happened. Of the five things to listen for, the ones that actually move the rating are the Action and the Thinking, with the Result supplying the backing; the Setting (Situation) and the Role (Task) are only the ground that keeps you from misreading them. This installment describes, as the listener's craft, how to dig into one event until the marks on the two measuring sticks come into view.

Why narrow it to one event

What you measure is not the person, and not the resume. You measure one event that actually happened. Think of a health check-up. Ask someone "are you healthy?" and the answer is just their impression. So instead you take their blood pressure and draw blood: you look at the numbers from one concrete test. An interview works the same way. Ask "how are you overall?" and the person offers a flattering self-image, and the listener gets pulled along by how well it is told. Instead, pin down one event (who, when, what was done) and dig into that one event from five angles.

Why insist on one past event? The source holds that "what a person did in the past predicts future behavior better than anything else." The reason is simple: latent ability and motivation cannot be seen from outside, but behavior actually carried out is a fact you can check with your own eyes. Narrowing to one event fixes what you later put marks on. The marks (how high the thinking reaches, how wide the hand reaches, how strong the backing is) go on each individual piece of evidence pulled from this one event, not on the whole person. So the listener's first job is to bring a foggy overall impression down to one event with a clear outline.

How the five listening points map to the two sticks

A cooking recipe makes this clear. Even for the same "I made curry," you learn different things from the state of the kitchen (Setting), who it was for and how many servings (Role), how the hands moved (Action), whether it came out tasty (Result), and why those steps were chosen (Thinking). STAR's five points work the same way: each tells you something different. The Setting and the Role are background and do not move the rating itself. "How high the thinking reaches" shows in the Thinking, "how wide the hand reaches" shows in the Action, and the Result guarantees the backing. Used this way, the source says, STAR becomes a tool for reading both measuring sticks directly. The table below is the map.

Layer (STAR)	Mapping to the sticks / backing	What the listener extracts
Situation S, Task T	Background. A premise that prevents misreading; does not move the rating itself	The frame for interpreting the action: when, where, what was at stake
Action A (most important)	How wide the hand reaches — how far the hand got shows here	What was concretely done; whether it reached an out-of-field setting or an unprecedented case
Result R	Backing — proof that it actually happened	Outcome and impact; a result you can check with your eyes, not a claim
Thinking (motive)	How high the thinking reaches — the root of the judgment shows here	Why it was judged so: did they look only at the wording, or reason from principle

So the Action shows how wide the hand reaches, the Thinking shows how high the thinking reaches, and the Result guarantees the backing. The Setting and the Role, by themselves, neither raise nor lower the rating. You still confirm them first, because if you misread the background, the very meaning of the action shifts. Skip the prep work and the final dish never comes together.
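
The table can be read as a small data structure. The sketch below is one way to hold a single STAR event so that only the three rating-bearing layers feed a reading; the class `StarRecord`, its field names, the mapping `LAYER_TO_READING`, and the helper `rating_inputs` are hypothetical, not taken from the source, and the example event is made up.

```python
from dataclasses import dataclass

@dataclass
class StarRecord:
    """One event heard through STAR. Field names are illustrative."""
    situation: str  # background frame only; never moves the rating
    task: str       # background frame only; never moves the rating
    action: str     # read for sigma: how wide the hand reaches
    result: str     # read for g: backing that it actually happened
    thinking: str   # read for alpha: how high the thinking reaches

# Which reading each layer feeds; None marks a background-only layer.
LAYER_TO_READING = {
    "situation": None,
    "task": None,
    "action": "sigma",
    "result": "g",
    "thinking": "alpha",
}

def rating_inputs(record: StarRecord) -> dict:
    """Return the text that feeds each reading, keyed by reading name."""
    return {
        reading: getattr(record, layer)
        for layer, reading in LAYER_TO_READING.items()
        if reading is not None
    }

# A made-up example event from a material review.
record = StarRecord(
    situation="Reviewing a comparison chart before release",
    task="Decide whether the material implied a superiority it could not claim",
    action="Checked how the axis, arrows, and layout shaped the comparison",
    result="The flagged chart was revised before release",
    thinking="Layout can imply superiority even when the wording claims none",
)
print(rating_inputs(record))
```

The Situation and Task are still stored, because the listener needs them to interpret the Action, but they never reach the rating step.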

Allocating the dig — 50–60% on the Action

Do not split the time evenly across the five points. The source's method calls for spending 50–60% of the time on the Action when digging one event. Take the Setting and Role briefly, as just the frame; give the Result and Thinking only enough time to confirm the backing and how high the thinking reaches. Why does the Action lead? Because "how wide the hand reaches" shows up nowhere but inside the Action.
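
As a rough arithmetic sketch of that allocation, the function below gives the Action a share inside the 50–60% band. The source only says to keep the other points brief, so splitting the remainder evenly across the other four is an assumption made purely for illustration; the name `allocate_minutes` and the 0.55 default are likewise hypothetical.

```python
def allocate_minutes(total_minutes: float, action_share: float = 0.55) -> dict:
    """Split the time for digging one event across the five listening points."""
    if not 0.50 <= action_share <= 0.60:
        raise ValueError("the Action should get 50-60% of the time")
    action = total_minutes * action_share
    # Assumption: the remaining time is divided evenly among the other four points.
    others = (total_minutes - action) / 4
    return {
        "Situation": others,
        "Task": others,
        "Action": action,
        "Result": others,
        "Thinking": others,
    }

plan = allocate_minutes(30)
for point, minutes in plan.items():
    print(f"{point:<9} {minutes:4.1f} min")
```

For a 30-minute dig this puts the Action at about 16 to 17 minutes and leaves roughly 3 minutes for each of the other points.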

"How wide the hand reaches" is not how many same-type jobs the person handled, but whether the hand reached into an out-of-field setting with a completely different structure. That only becomes visible by lowering "what was done" one concrete step at a time. So when the person offers a vague phrase like "I handled it well," always pull them back to concrete action: "Concretely what? Then what next?" This pulling-back bites hardest while digging the Action. What you want is not a conclusion but the movement of the hands at that moment.

The order of digging:

1. Fix the Situation in one sentence: when, which material, what was at stake. Do not linger; take only the frame of the background.
2. Fix the Task in one sentence: what question the person had to solve there. This much is the footing.
3. Dig the Action in stages, stringing "first what" and "then what" together as verbs. Spend 50–60% of the time here. Draw out whether the hand reached an out-of-field setting by asking, "Did you do the same in a setting that looked similar but was structured differently?"
4. Confirm backing through the Result: what actually happened? Was the judgment overturned, or did it take hold? If the same ability showed across several events, the backing is even stronger.
5. Surface the root through the Thinking: why was it judged so? Distinguish a judgment that looked only at the wording from one reasoned out from principle.
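
The five-step order can be encoded as a question script. In this sketch (all names hypothetical, question wording adapted from the steps), the Action gets its opener, a configurable number of "concretely what / then what next" pull-back rounds, and one out-of-field probe, which is how it ends up taking the largest share of the questions.

```python
# Hypothetical question script for digging one event, in the order given above.
DIG_ORDER = [
    ("Situation", "In one sentence: when was it, which material, and what was at stake?"),
    ("Task", "In one sentence: what question did you have to solve there?"),
    ("Action", "What did you do first?"),
    ("Result", "What actually happened? Was the judgment overturned, or did it take hold?"),
    ("Thinking", "Why did you judge it that way?"),
]

# Pull-backs that turn vague phrases back into concrete verbs.
ACTION_PULL_BACKS = ["Concretely, what did you do?", "Then what did you do next?"]
OUT_OF_FIELD_PROBE = (
    "Did you do the same in a setting that looked similar but was structured differently?"
)

def question_sequence(action_rounds: int = 2) -> list:
    """Flatten the dig order into (layer, question) pairs."""
    sequence = []
    for layer, opener in DIG_ORDER:
        sequence.append((layer, opener))
        if layer == "Action":
            for _ in range(action_rounds):
                sequence.extend((layer, q) for q in ACTION_PULL_BACKS)
            sequence.append((layer, OUT_OF_FIELD_PROBE))
    return sequence

for layer, question in question_sequence():
    print(f"[{layer}] {question}")
```

In a live interview the number of pull-back rounds is set by the answers, not fixed in advance; the parameter only stands in for "keep asking until the hands are visible."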

Record in verbs — adjectives are not evidence

Write down what you heard in the person's own words, and in verbs rather than adjectives. "Was excellent" is a conclusion, not evidence. Following the source's example, keep it in verbs, as in "proposed X to Y and carried out Z," and tie the marks (how high, how wide, how backed) to this record. This is where the rule of the BEI (Behavioral Event Interview, an interview method that asks about actual behavior and its results) applies: record the concrete behavior that supports a conclusion, not the conclusion itself.

Adjectives are dangerous twice over. First, they invite the listener's preconception: "that's just the kind of person they are." Second, they fake the appearance of backing. "Always sharp" sounds endlessly repeatable, but without a single concrete event the backing is zero, and under the source's rule it does not raise the rating. Written in verbs, the presence or absence of backing is visible at a glance, on the spot. In photo terms, an adjective is out of focus; a verb is one sharp, in-focus shot.
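
One way to make "backing is visible at a glance" concrete is to store adjectives and verbs in separate fields, so adjectives are kept for context but can never count as backing. The `Note` class and its `has_backing` rule are a hypothetical sketch under that assumption: a note counts as backed only when it names a concrete event and records at least one action in verbs. The example event name is made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Note:
    event: Optional[str]  # the one concrete event, or None if none was named
    verbs: List[str] = field(default_factory=list)       # e.g. "proposed X to Y"
    adjectives: List[str] = field(default_factory=list)  # e.g. "always sharp"; never scored

    def has_backing(self) -> bool:
        # Adjectives are ignored here; backing needs an event plus recorded actions.
        return self.event is not None and len(self.verbs) > 0

def scorable(notes: List[Note]) -> List[Note]:
    """Keep only the notes that may receive marks."""
    return [n for n in notes if n.has_backing()]

notes = [
    Note(event=None, adjectives=["always sharp", "excellent"]),
    Note(event="Review of the spring leaflet", verbs=["proposed X to Y", "carried out Z"]),
]
print(len(scorable(notes)))  # only the verb-based note remains
```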

Backing decides the rating — the meshing of Action and Result

However high a reach is narrated in the Action, the rating band does not rise if the Result cannot back it up. It is like a refereed call that stands only when enough referees raise their hands: narration alone scores no points. Put plainly, the source's idea of the "grounding ceiling" (the cap set by backing) says that to place a reading in a given high band, the total "backing points" of evidence at or above that band must reach a set line (usually 2 points).
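
The grounding ceiling translates almost directly into code. The sketch below assumes each piece of evidence arrives as a (band, backing points) pair; the function name `grounded_band`, the default line of 2, and returning 0 when no band qualifies are assumptions for illustration. Only bands that actually appear in the evidence need checking, because the running total changes only at those bands.

```python
from typing import Iterable, Tuple

def grounded_band(evidence: Iterable[Tuple[int, int]], line: int = 2) -> int:
    """Highest band whose evidence at that band or above totals >= line backing points."""
    items = list(evidence)
    for band in sorted({b for b, _ in items}, reverse=True):
        total = sum(points for b, points in items if b >= band)
        if total >= line:
            return band
    return 0  # assumed floor when nothing reaches the line

# A band-3 claim with only 1 point cannot stand alone; band 2 is the highest grounded band.
print(grounded_band([(3, 1), (2, 1), (1, 2)]))  # 2
```

Note that the single band-3 point still counts toward band 2: evidence at a higher band backs every band below it.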

In formula form it reads: "Â (A-hat) selects the highest band whose evidence, at that band or above, totals at least 2 backing points." It looks hard, but the point is one thing: do not count mere claims; among the things actually backed as having happened, take the highest band. Skip the formula, and this one sentence is enough.

"How wide the hand reaches" has one more fence. No matter how many same-type jobs are stacked, the width caps at band 1; going higher requires "two or more different kinds of setting." This closes the loophole of scoring on experience alone. So when listening to the Result, confirm not the count but whether the kinds of setting differ. If the person caught the same trick both in a material with a chart and in a chart-free patient leaflet, that is the backing for band 2. Solve the same problem twice and it counts as one problem's worth; only solving a different kind of problem shows the real ability.

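The width fence can be sketched the same way. Here `capped_sigma` (a hypothetical name) takes the width a person's Action appears to reach and the list of settings it was shown in; repeats of the same kind of setting collapse to one, so stacking same-type jobs never lifts σ above band 1.

```python
from typing import Iterable

def capped_sigma(claimed_sigma: int, settings: Iterable[str]) -> int:
    """Cap the width at band 1 unless two or more different kinds of setting appear."""
    kinds = set(settings)  # repeats of the same kind count once
    if len(kinds) < 2:
        return min(claimed_sigma, 1)
    return claimed_sigma

print(capped_sigma(2, ["chart material"] * 5))                            # 1
print(capped_sigma(2, ["chart material", "chart-free patient leaflet"]))  # 2
```

Five chart reviews count as one kind of setting; one chart review plus one chart-free leaflet counts as two.
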
Level L (α, σ)	Utterance anchor (the testimony type heard via STAR)	Coding (α σ g)
L1 (0,0)	Judged "it does not say superior, so no problem." Looking only at the wording	α0 σ0 g1
L2 (1,1)	Noticed well-known emphasis patterns such as "golden-cross-grade superlative expression"	α1 σ1 g1
L3 (2,2)	Saw the implication in how the chart was built — "axis, arrow, and layout make a no-real-difference look superior" — and caught the same trick in a chart-free patient leaflet	α2 σ2 g1 (2 domains)
L4 (3,3)	Defined a new angle themselves — "even objective material can steer impressions depending on how it is shown" — and other reviewers now use that view	α3 σ3 g2 (third-party adoption)

α (alpha) stands for how high the thinking reaches, σ (sigma) for how wide the hand reaches, and g for how strong the backing is; the larger the number, the higher, wider, or stronger. The table shows how one viewpoint, the power to spot risk, becomes something an evaluator can decide without wavering. The evaluator matches the person's testimony to the closest of the sample utterances in the middle column (the anchors, reference samples for judging) and assigns the coding in the right-hand column. The boundary is set by which sample utterance the testimony is closest to, not left to the evaluator's taste. The remaining seven viewpoints hold sample utterances in the same form. It is the same logic as referees: when they share the same judging samples, their calls line up.
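
As data, the anchor table is a lookup from the sample an evaluator judges closest to its fixed coding. The sketch below (names such as `Coding`, `RISK_SPOTTING_ANCHORS`, and `code_testimony` are hypothetical) keeps the judging step human: the evaluator supplies the level of the closest anchor, and the code only returns the coding the table assigns, so two evaluators who pick the same anchor always record the same marks.

```python
from typing import NamedTuple

class Coding(NamedTuple):
    alpha: int  # how high the thinking reaches
    sigma: int  # how wide the hand reaches
    g: int      # how strong the backing is

# Anchors for the risk-spotting viewpoint, transcribed from the table above.
RISK_SPOTTING_ANCHORS = {
    1: ("Looked only at the wording", Coding(0, 0, 1)),
    2: ("Noticed well-known emphasis patterns", Coding(1, 1, 1)),
    3: ("Read the chart's implication; caught the same trick in a chart-free leaflet", Coding(2, 2, 1)),
    4: ("Defined a new angle that other reviewers now use", Coding(3, 3, 2)),
}

def code_testimony(closest_level: int) -> Coding:
    """Return the coding for the anchor the evaluator judged closest."""
    if closest_level not in RISK_SPOTTING_ANCHORS:
        raise ValueError(f"no anchor for level {closest_level}")
    return RISK_SPOTTING_ANCHORS[closest_level][1]

print(code_testimony(3))
```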

The listener's forbidden moves — do not dirty the measured value

While digging with STAR, the listener keeps the BEI rules. First, no hypothetical questions ("if you had..."). Ask only about the past that actually happened — this is the source of backing. Next, when the person says "we," re-ask as "you" to carve out that individual's contribution. And do not hint at the desired answer — the moment the questioner's expectation mixes in, the evidence from that one event is dirtied.

Leading is especially easy while digging the Action. When the person fumbles for words, the listener is tempted to throw a lifeline: "So you reasoned it from principle, right?" That is the listener raising "how high the thinking reaches" on the person's behalf — the same as fabricating high-level evidence that was never there. The correct move is to add nothing to the answer and simply return to verbs: "What exactly did you see at that moment, and what did you do next?"
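
A crude checklist can help a listener review their own transcript afterward. The patterns below are illustrative assumptions, not a reliable detector: they flag hypothetical questions and tag-question leads, and suggest a "you" re-ask when an answer uses "we". The names `check_question` and `reask_if_we` are hypothetical.

```python
import re
from typing import List, Optional

# Illustrative keyword patterns only; real leading questions take many more forms.
HYPOTHETICAL = re.compile(r"\b(if you had|what would you|suppose)\b", re.IGNORECASE)
LEADING = re.compile(r"(,\s*right\?|\bisn't it\?|\bdidn't you\?)\s*$", re.IGNORECASE)
WE = re.compile(r"\bwe\b", re.IGNORECASE)

def check_question(question: str) -> List[str]:
    """Flag forbidden moves in a listener's question."""
    problems = []
    if HYPOTHETICAL.search(question):
        problems.append("hypothetical: ask only about the past that actually happened")
    if LEADING.search(question):
        problems.append("leading: add nothing; return to verbs instead")
    return problems

def reask_if_we(answer: str) -> Optional[str]:
    """Suggest carving out the individual's contribution when the answer says 'we'."""
    if WE.search(answer):
        return "And what did you yourself do at that point?"
    return None

print(check_question("So you reasoned it from principle, right?"))
print(check_question("What exactly did you see at that moment, and what did you do next?"))
print(reask_if_we("We rebuilt the review checklist."))
```

A flag here is only a prompt to reread the exchange; whether the evidence was actually contaminated is still the listener's judgment.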

Measurement Design ── Map of all 10 episodes

Vol. 2 (this episode): Listening Through STAR ── Situation, Task, Action, Result, Thought ── Pick just one thing that actually happened in the past and ask about it in five parts: the setting (Situation), what was assigned (Task), what the person did (Action), what came of it (Result), and why they decided as they did (Thinking). Spend more than half the time on the Action, write down what they did as verbs, check through the Result that it really happened, and draw out the root of the judgment through the Thinking.
Vol. 3: Encoding to Two Axes ── Action Reveals Scope, Thought Reveals Abstraction ── Turning one "what they actually did" story heard in an interview into three readings — how widely they moved (scope sigma), what reasoning they used (abstraction alpha), and whether it really happened (grounding g) — worked through a concrete material-review example.
Vol. 4: The Six BEI Principles ── Axioms That Keep the Measurement Clean ── What a person actually did, told through a four-point way of asking, gets converted into three rulers: depth of thinking, reach of action, and whether a real episode backs it up. The person's reading is then the highest level that the episodes actually support. This installment explains, with everyday examples, the six interview manners that keep that conversion from getting muddied.
Vol. 5: Three Bands ── The Scales of Abstraction α, Scope σ, and Grounding g ── Before any level verdict, this issue sets the three rulers for measuring the behavior we heard: how high the reasoning goes, how far the action reached, and how firmly it is backed by fact. Measured in steps, not scores.
Vol. 6: How L Is Decided ── The Grounding Ceiling and Projection to the Diagonal ── Talk without backing does not raise the level. Take only the reach that real behavior confirms, even out the two measures, and read L.
Vol. 7: The Behaviors That Separate Levels ── Eight-Dimension Anchors and Boundaries ── Using a sample book of "what they actually did" (the anchor table), we match a person's account to the closest sample to decide the level (L1 to L4). All eight abilities are measured by the same method.
Vol. 8: Confidence and Observability ── How Far to Trust a Reading ── An episode about putting a number on how sure a rating is. Confidence C comes from how much evidence there is, whether the story holds together, and whether the rater could see it; observability o comes from being well placed and actually producing evidence; their product, weight w, feeds the final tally.
Vol. 9: Multi-Party AI Dialogue ── Corroboration for Others' Level, Divergence for Calibration ── One pair of eyes cannot measure a person. The subject and several colleagues take the same structured interview (BEI); each vote is weighted by how well that person actually saw the scene, and only readings that other votes back up are bound into an outside view of the level. The gap from the subject's self-rating is kept in a separate column as "how accurately they see themselves," not as ability.
Vol. 10 (final): From Integrated Output to the Qualifying Line ── The Record and the Operating Procedure ── The closing piece of Series 3 on measurement design. In plain terms it explains how the per-person, per-item score sheet hands each number to the right checkpoint in the pass/fail decision, and walks through the seven steps for actually running the measurement.

In closing

STAR is not a technique that treats the five listening points equally. Take footing from the Setting and Role, spend the majority of the dig on the Action, and confirm the backing through the Result and how high the thinking reaches through the Thinking. This weighting grounds the rating in behavior actually done, not in how well it is told. Returning adjectives to verbs and claims to events, every time, is the listener's discipline.

Next time, as the rater's craft, we make concrete how the behavior heard is translated into the three bands of "how high, how wide, how backed": handling them as steps, and applying the rule of not counting claims that have no backing.

Key Points ── Three to take with you

50–60% on the Action. Take only the frame from the Setting and Role; spend the majority of time digging the Action, where how wide the hand reaches shows up.
Record in verbs, not adjectives. "Was excellent" has zero backing. Only verbs of what was actually done get the marks.
Confirm backing through the Result. A reach narrated but not backed does not raise the rating; stacking same-type cases caps the width at band 1.

Sources & references

McClelland, D. C. Testing for Competence Rather Than for "Intelligence." American Psychologist, 1973. (Origin of measuring by behavior rather than aptitude tests)
Boyatzis, R. E. The Competent Manager: A Model for Effective Performance. Wiley, 1982. (Systematization of the Behavioral Event Interview)
Flanagan, J. C. The Critical Incident Technique. Psychological Bulletin, 1954. (Prototype of digging one critical event through behavior)
Smith, P. C., & Kendall, L. M. Retranslation of Expectations: An Approach to the Construction of Behaviorally Anchored Rating Scales. Journal of Applied Psychology, 1963. (Theory of fixing scales by utterance anchors)
Spencer, L. M., & Spencer, S. M. Competence at Work: Models for Superior Performance. Wiley, 1993. (Practice of coding and level judgment from BEI evidence)
