AI Programming 03 — Prompt Engineering: Designing Instructions That Reproduce | AI Programming | Pharmaceutical Advertising Regulation: Material Creation, Review & Use in Japan

Prompt Engineering: Designing Instructions That Reproduce
Structuring, few-shot, chain-of-thought, and using evaluation to keep them from breaking

Ask the same AI to do the same thing, and a small change in how you ask can swing the quality of the answer dramatically. Prompt engineering (= the design of the instructions you give an AI) is the discipline of not leaving that "how you ask" to chance, but designing it so that anyone, on any attempt, gets the same quality. This installment explains three pillars — structuring instructions, few-shot (= showing a handful of worked examples), and chain-of-thought (= having the model write out its reasoning) — grounded in the research behind them. It then goes further, into the mechanism that pharma marketing, development, and medical affairs cannot do without: evaluating outputs and detecting when they drift. A prompt is not something you write once and forget; it is something you keep protecting with tests.

01 What a Prompt Is: Defining "the Instruction" Precisely

A prompt (= the instruction text you hand to an AI) is the input text you give to a large language model (= an AI trained on vast amounts of text). Everything from a single line — "summarize this," "give me three headline options under the following conditions" — to a paragraph of several hundred words specifying role, premises, constraints, and output format, is a prompt.

Here is the first fact to grasp. The AI is not computing "what is correct"; it is computing "how plausible the next word is." That is why a slight difference in instruction moves the output. With the same intent, a vague phrasing returns a vague answer and a concrete phrasing returns a concrete one. Think of prompt engineering as turning this property to your advantage — steering the output toward what you want.

Translated to the pharma workplace, this touches every piece of daily writing work: drafting promotional materials, summarizing case reports, interpreting regulatory documents, drafting internal FAQs. When the way of asking is personal and undocumented, quality wobbles every time staff change. Put the way of asking into words as a design, and anyone on the team can run it and get a consistent level of quality.

02 Structuring the Instruction: Four Parts That Cut Ambiguity

A good prompt is not an off-the-cuff sentence; it is a combination of parts. The first thing that works in practice is to write these four explicitly and separately.

Part 01: Role (decide "as whom" it answers)

Give it a stance, such as "you are a promotional-material reviewer at a pharmaceutical company," and its vocabulary and viewpoint lean toward that role. This is where you set the degree of expertise and caution.

Part 02: Context (hand over "on what premises")

Bundle here the material the judgment needs: the target audience, the approved indications, the range of expressions permitted. Any premise you forget to hand over, the AI will fill in on its own.

Part 03: Instruction (narrow "what" to a single task)

Get greedy ("summarize it, and also write headlines") and precision drops. One task at a time. If there are several, the default is to split them into separate calls.

Part 04: Format (constrain "in what shape" it returns)

Three bullet points, a table, under 200 characters: decide the shape up front and the downstream work gets easier. If you want to process it mechanically, specify a structured format such as JSON.

Simply writing these four separately raises the stability of the output visibly. In pharma especially, forgetting to hand over the context is the single largest source of accidents. To keep it from writing an unapproved indication, you must hand over the frame ("these are the only indications you may use") as context, every single time you ask. Withhold the frame, and the AI is free to "inflate" the claims beyond what is approved.
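As a minimal sketch, the four parts can be kept as separate fields and joined only at the last moment. The `PromptSpec` class, its field names, the section labels, and the placeholder indication are all hypothetical and invented here for illustration; they do not come from any library. The one design choice worth noting is that the sketch refuses to render a prompt when the context is empty.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the four parts of a prompt."""
    role: str
    instruction: str
    output_format: str
    approved_indications: list[str] = field(default_factory=list)
    audience: str = ""

    def render(self) -> str:
        # Refuse to build a prompt without the frame: a missing premise
        # is something the model would otherwise fill in on its own.
        if not self.approved_indications:
            raise ValueError("context is missing: approved indications")
        if not self.audience:
            raise ValueError("context is missing: target audience")
        indications = "\n".join(f"- {item}" for item in self.approved_indications)
        return (
            f"# Role\n{self.role}\n\n"
            f"# Context\nTarget audience: {self.audience}\n"
            f"These are the only indications you may use:\n{indications}\n\n"
            f"# Instruction\n{self.instruction}\n\n"
            f"# Format\n{self.output_format}\n"
        )


spec = PromptSpec(
    role="You are a promotional-material reviewer at a pharmaceutical company.",
    instruction="Summarize the attached draft.",
    output_format="Three bullet points, under 200 characters in total.",
    approved_indications=["(placeholder) approved indication A"],
    audience="Healthcare professionals",
)
print(spec.render())
```

Because the context check lives in the code rather than in someone's memory, a call that forgets the approved indications fails loudly instead of producing a draft.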

03 Using few-shot: Show a Few Examples and Transfer the Pattern

Few-shot prompting (= showing a handful of worked examples before asking the real question) is one of the most cost-effective techniques. Instead of instructing in fine-grained words, you line up two to five "input → desired output" pairs and place the real input last. The AI reads the pattern and returns an answer in the same register.

That this works was shown at scale in the study reporting GPT-3 by Brown et al. (2020). Comparing zero-shot (= no examples at all), one-shot (just one), and few-shot (several examples), accuracy rose on many tasks as the number of examples increased. You can change behavior without retraining, just by putting examples in the prompt — that is the core of few-shot.
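The "pairs first, real input last" layout can be sketched as a small helper. The function name `build_few_shot_prompt`, the `Input:` / `Output:` labels, and the placeholder examples are assumptions made for this illustration; any consistent labeling would serve the same purpose.

```python
def build_few_shot_prompt(
    task: str, examples: list[tuple[str, str]], real_input: str
) -> str:
    """Line up input -> output pairs, then place the real input last."""
    if not 2 <= len(examples) <= 5:
        raise ValueError("expected two to five examples")
    lines = [task, ""]
    for source, desired in examples:
        lines += [f"Input: {source}", f"Output: {desired}", ""]
    # The final block has no output: the model is expected to continue the pattern.
    lines += [f"Input: {real_input}", "Output:"]
    return "\n".join(lines)


# Placeholder pairs; in practice these would be examples a person has already vetted.
examples = [
    ("(placeholder) draft heading 1", "(placeholder) approved heading 1"),
    ("(placeholder) draft heading 2", "(placeholder) approved heading 2"),
]
prompt = build_few_shot_prompt(
    "Rewrite each heading in the house style.",
    examples,
    "(placeholder) new draft heading",
)
print(prompt)
```

Keeping the examples in a plain list, separate from the template, makes it easier to review and replace them without touching the rest of the prompt.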

In practice, it earns its keep in situations like these.

When you want to standardize the format of the output (the tone of headings, sentence endings, a feel for length): showing an example conveys these more accurately than explaining them in words.
When you want to show the criterion by pattern: showing a few "this expression is acceptable / this one is not" pairs conveys the boundary more sharply than words do.
When you want to fix the handling of technical terms: showing in advance the translations or phrasings decided internally reduces variation.

Note, however, that the choice of examples matters. If the examples are biased, it imitates the bias along with everything else. Mix in a slightly exaggerated example and it returns exaggerated output; mix in an outdated expression and it returns outdated output. Because few-shot is a "lean toward what you showed" technique, a person must first confirm that the examples themselves comply with the approved content and the norms.

04 Chain-of-Thought: Have It Write the Process to Raise Accuracy

Chain-of-thought prompting (CoT, = having the model write not just the answer but the reasoning along the way) is a technique proposed by Wei et al. (2022). On arithmetic, commonsense, and symbolic reasoning problems, they showed that few-shot examples which spell out the intermediate reasoning steps, rather than only the final answer, raise the rate of correct answers in sufficiently large models. It resembles how a human writing out the intermediate steps makes fewer calculation errors.

Relatedly, Kojima et al. (2022) reported that simply appending the phrase "let's think step by step" can, in some cases, improve reasoning quality without any examples. Understand CoT as a method that, rather than forcing a complex judgment out in one breath, prevents leaps by having the grounds put into words along the way.

In pharma practice, CoT helps in ways like these.

Leave a record of the grounds for a regulatory check — asking "explain, in order and against the statute, whether this expression amounts to exaggeration, then give your conclusion" records the reasoning process and makes it easier for a person to verify.
Reduce omissions in a summary — when summarizing a long document, splitting it into stages ("first list the points, then judge their importance, then summarize") reduces the chance of dropping key points.
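A minimal sketch of the first use above: ask for the reasoning and the conclusion under separate headings, then store them separately so a reviewer can check one against the other. The template wording, the `Reasoning:` / `Conclusion:` headings, and the record fields are assumptions chosen for this example, and `sample_reply` is a hand-written stand-in rather than real model output.

```python
COT_TEMPLATE = (
    "Expression under review:\n{expression}\n\n"
    "Explain, in order, whether this expression amounts to exaggeration.\n"
    "Write your steps under the heading 'Reasoning:' and then give a\n"
    "one-line answer under the heading 'Conclusion:'."
)


def build_cot_prompt(expression: str) -> str:
    return COT_TEMPLATE.format(expression=expression)


def split_reply(reply: str) -> dict:
    """Separate the written process from the conclusion.

    The reasoning is stored as a clue for the reviewer, not as proof,
    so every record is marked as still needing a human decision.
    """
    # Split on the last 'Conclusion:' in case the steps mention the word earlier.
    reasoning, marker, conclusion = reply.rpartition("Conclusion:")
    if not marker:
        raise ValueError("reply has no 'Conclusion:' section")
    reasoning = reasoning.replace("Reasoning:", "", 1).strip()
    return {
        "reasoning": reasoning,
        "conclusion": conclusion.strip(),
        "needs_human_review": True,
    }


# Hand-written stand-in for a model reply, used only to exercise the parser.
sample_reply = (
    "Reasoning:\n1. (placeholder) step one\n2. (placeholder) step two\n"
    "Conclusion: (placeholder) verdict"
)
record = split_reply(sample_reply)
print(build_cot_prompt("(placeholder) draft sentence"))
print(record)
```

The `needs_human_review` flag is deliberately hard-coded to `True`: nothing in the parsed text is allowed to switch off the human check.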

"Show" the process, but don't swallow it whole: The "reasoning process" written under CoT is at best a plausible explanation, not a guarantee that the AI actually judged by that path. The process can look right while the conclusion is wrong. Use the process as a clue for a person to verify, and always keep the final judgment in human hands. Especially in areas such as applicability under the Pharmaceutical and Medical Devices Act, where an error carries heavy consequences, you must not publish something on the strength of the AI's explanation alone.

05 Evaluating the Output: Turn "Seems Good" Into a Measurable Standard

Refine the prompt and the output improves. But "it feels better" is not usable for work. The second half of prompt engineering is building a mechanism to evaluate output objectively. Measure by a yardstick decided in advance, not by feel.

You can organize the axes of evaluation by splitting them into at least these three.

Axis of evaluation	What it checks
Factuality (accuracy)	Whether figures, indications, and citations match the approved content and primary sources. Whether hallucination (= a plausible-sounding falsehood) has crept in
Norm compliance (staying within the rules)	Whether, against the Pharmaceutical and Medical Devices Act, the Guideline on Sales Information Provision Activities, and the Standards for Fair Advertising of Pharmaceuticals, any prohibited expression has entered
Form and consistency	Whether it keeps to the specified format, length, and tone. Whether the same instruction yields it without wobble every time
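The three axes can be sketched as checks against a yardstick that is fixed before any output exists. Everything here is an assumption made for illustration: the `Rubric` fields, the substring matching, and the placeholder phrases. Substring checks are only a crude first screen. They cannot confirm a fact against a primary source or judge nuance under the regulations, so they supplement human review rather than replace it.

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    """Hypothetical yardstick, decided before any output is generated."""
    required_facts: list[str]      # strings that must appear verbatim
    prohibited_phrases: list[str]  # strings that must never appear
    bullet_count: int
    max_chars: int


def evaluate(output: str, rubric: Rubric) -> dict[str, bool]:
    lowered = output.lower()
    bullets = [line for line in output.splitlines() if line.startswith("- ")]
    return {
        # Axis 1: crude proxy only; real fact-checking needs primary sources.
        "factuality": all(f.lower() in lowered for f in rubric.required_facts),
        # Axis 2: simple phrase screen; it cannot judge context or nuance.
        "norm_compliance": not any(
            p.lower() in lowered for p in rubric.prohibited_phrases
        ),
        # Axis 3: shape and length as specified in the prompt.
        "form": len(bullets) == rubric.bullet_count
        and len(output) <= rubric.max_chars,
    }


def consistent(outputs: list[str], rubric: Rubric) -> bool:
    """True only if every repeated run passes every axis."""
    return all(all(evaluate(o, rubric).values()) for o in outputs)


rubric = Rubric(
    required_facts=["(placeholder) approved indication A"],
    prohibited_phrases=["(placeholder) banned phrase"],
    bullet_count=3,
    max_chars=200,
)
# Hand-written sample outputs, not real model output.
good = "- (placeholder) approved indication A\n- point two\n- point three"
bad = "- (placeholder) banned phrase\n- point two"
print(evaluate(good, rubric))
print(evaluate(bad, rubric))
print(consistent([good, good], rubric))
```

Returning one boolean per axis, instead of a single score, keeps a formatting slip from hiding a compliance failure or the other way around.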

What matters is separating the person who builds the evaluation from the person who writes the prompt. When authors judge their own work "good," the standard tends to go easy. Have a different person, or a different vantage point, do the evaluation; this separation keeps quality ratings from being inflated. In addition to human evaluation, it also helps to prepare "expected answers" for a small set of representative cases and check against them every time you change the prompt.

06 Regression Testing: Keep the Prompt in a State Where "You Notice When It Breaks"

Even once a prompt is written well, it is not safe forever. When you tweak the wording a little, or when the AI model you use is upgraded to a new version, output that used to come out correctly can quietly break. Checking all of it by hand every time is not realistic. So you set up regression testing (= a mechanism that repeatedly confirms whether behavior that previously passed still passes).

The skeleton of the method is simple.

Assemble a few dozen pairs of representative inputs and expected outputs — prioritize cases where a problem arose in the past and cases where the boundary is delicate.
Every time you change the prompt or the AI, run those pairs in a batch — automatically check whether they match expectations and whether they have deviated.
If it drifts, stop — if even one output touches the norms, roll that change back before publishing.
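The three steps above can be sketched as a small harness. The case format (`must_contain` / `must_not_contain`), the `RegressionDrift` exception, and `stub_model` are all hypothetical. The stub stands in for the real prompt-plus-model call so that the sketch runs offline, and a real suite would hold a few dozen cases rather than two.

```python
from typing import Callable


class RegressionDrift(Exception):
    """Raised when a change must be rolled back before publishing."""


def run_regression(generate: Callable[[str], str], cases: list[dict]) -> list[str]:
    """Run every stored case and return the names of the ones that drifted."""
    failures = []
    for case in cases:
        output = generate(case["input"]).lower()
        has_required = all(s.lower() in output for s in case["must_contain"])
        has_forbidden = any(s.lower() in output for s in case["must_not_contain"])
        if not has_required or has_forbidden:
            failures.append(case["name"])
    return failures


def gate(failures: list[str]) -> None:
    """Stop the change if even one case failed."""
    if failures:
        raise RegressionDrift(f"{len(failures)} case(s) drifted: {failures}")


# Hypothetical cases: one past problem, one delicate boundary.
cases = [
    {"name": "keeps approved wording", "input": "case-1",
     "must_contain": ["(placeholder) approved wording"], "must_not_contain": []},
    {"name": "no unapproved indication", "input": "case-2",
     "must_contain": [], "must_not_contain": ["(placeholder) unapproved indication"]},
]


def stub_model(text: str) -> str:
    # Stand-in for the real prompt + model call, so the sketch runs offline.
    canned = {
        "case-1": "(placeholder) approved wording",
        "case-2": "a neutral sentence",
    }
    return canned[text]


failures = run_regression(stub_model, cases)
gate(failures)  # passes silently because the stub satisfies both cases
print("all cases passed")
```

Passing the generator in as a function means the same cases can be rerun unchanged after a prompt edit or a model upgrade; only the `generate` argument differs.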

In pharma especially, deliberately assembling test cases that confirm "does it avoid producing an exaggerated expression" and "does it avoid writing an unapproved indication" is what pays off. Throw dangerous inputs at it on purpose and confirm that the AI holds the line. This is the same idea as "test the fragile spots most heavily" in the world of code. It connects straight to the topic of test generation we take up next time (Vol. 4).

07 Common Pitfalls: Failures That Recur in the Field

The places people stumble in prompt design are mostly the same. Knowing them in advance lets you avoid them.

Getting greedy all at once — cram in "summarize it, also write headlines, and while you're at it proofread it" and each comes out half-baked. Split the tasks into separate calls.
Forgetting to hand over the premises — ask without writing the approved indications or the target audience, and the AI fills the blanks on its own. State the context explicitly every time.
Transferring the bias of the examples as is — if the few-shot examples are exaggerated or outdated, so is the output. Have a person inspect the examples first.
Mistaking the process explanation for proof — the CoT "reasoning process" is an explanation, not a proof. Verify the conclusion independently.
Never looking again once it works — output breaks with AI updates and small edits. Keep watch continuously with regression tests.
Writing confidential or personal information directly into the prompt — follow internal rules and contractual terms for handling case data and information restricted to internal use. Don't paste it casually.

Every one of these is less a matter of technique than of design discipline. Put the way of asking into words, state the premises, separate the evaluation, protect it with tests. More than flashy tricks, this plain discipline is what holds up quality.

08 Connections to Other Chapters on This Site

Prompt design connects to the other chapters of this site as follows. Read them together and the dots become a line.

AI Programming Vol. 4 — Writing Tests With AI — extends this installment's "regression testing" into the context of code test generation and TDD. A continuation of the idea of protecting the prompt.
AI Marketing Vol. 5 — Generated Content and Review — how the drafts made with few-shot and CoT are received at the "exit" of material review. The junction of design and review.
Ad Regulations 01 — Pharmaceutical Act §§66–68 — the foundation of the "norm compliance" axis of evaluation. The original text on the line between exaggeration, unapproved promotion, and information provision.
Material Review series — the practical work of the review that finally receives the AI-made drafts. It covers the craft of the human final judgment.

In Closing

Prompt engineering is not a hunt for a magic incantation. Split the instruction into four parts, show examples when needed, and have complex judgments write out the process — each is a plain effort to "cut ambiguity and prevent leaps." And then evaluate the crafted output by a decided yardstick rather than by feel, and keep protecting it with regression tests. Only when you go this far does a prompt turn from "a lucky one-off" into "a mechanism anyone can run and reproduce."

In the pharma workplace, this reproducibility translates directly into safety. The Pharmaceutical and Medical Devices Act does not slacken just because the AI can write fast — exaggeration is §66, unapproved promotion is §68, information provision is §68-2. What gets judged is not "who wrote it" but "what is written." Put the prompt into words as a design, and fence it in with evaluation and tests. That is the surest footing for handling generative AI inside the boundaries of regulation. Next time, we take this idea of "protecting with tests" forward into writing tests together with AI, and TDD.

Key Points — Three to Take Away

A prompt is stable when written as four parts — role, context, instruction, output format. Forgetting to hand over the context (the approved indications and the target audience) is the largest source of accidents in pharma; premises you don't supply, the AI fills in on its own.
Few-shot (showing a handful of examples) and chain-of-thought (having it write the process) are proven techniques for improving output without retraining. But the bias of the examples transfers as is, and the written "reasoning process" is an explanation, not a proof. A person verifies the conclusion independently.
A prompt is not written once and done. Evaluate it on three axes — factuality, norm compliance, and consistency — separate the person who writes from the person who evaluates, and use regression tests with representative cases and expected outputs to keep it in a state where "you notice when it breaks." The Pharmaceutical and Medical Devices Act does not slacken just because the AI got fast.

Sources & References

Wei, J., Wang, X., Schuurmans, D., et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS) 35, 2022 (arXiv:2201.11903). (The original chain-of-thought paper)
Brown, T. B., Mann, B., Ryder, N., et al. Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems (NeurIPS) 33, 2020 (arXiv:2005.14165). (GPT-3; large-scale validation of few-shot)
Kojima, T., Gu, S. S., Reid, M., et al. Large Language Models are Zero-Shot Reasoners. Advances in Neural Information Processing Systems (NeurIPS) 35, 2022 (arXiv:2205.11916). (A report that reasoning can be elicited even without examples)
Ministry of Health, Labour and Welfare. Act on Securing Quality, Efficacy and Safety of Products Including Pharmaceuticals and Medical Devices (Pharmaceutical and Medical Devices Act), Articles 66, 68, and 68-2. (Exaggerated advertising = Article 66; prohibition of advertising unapproved drugs = Article 68; obligation of information provision = Article 68-2)
Ministry of Health, Labour and Welfare, Pharmaceutical Safety and Environmental Health Bureau. Guideline on Sales Information Provision Activities for Prescription Drugs. Notification of September 25, 2018 (Yakusei-hatsu 0925 No. 1). (The norm for sales information provision activities)
Director, Compliance and Narcotics Division, Pharmaceutical Safety and Environmental Health Bureau, Ministry of Health, Labour and Welfare. Explanation of the Standards for Fair Advertising of Pharmaceuticals and Points to Note. Notification of September 29, 2017. (Operational rules for Article 66 of the Pharmaceutical and Medical Devices Act)
