01Vulnerabilities in Generated Code — Start From Empirical Data

Let's begin with evidence, not impressions. In 2022 a research team at New York University conducted a large-scale study of the security of code proposed by GitHub Copilot (Pearce et al., IEEE Symposium on Security and Privacy 2022). Building scenarios around representative items from MITRE CWE (= Common Weakness Enumeration, a common catalog that classifies software weaknesses) and having Copilot complete them, they reported that of the 1,689 programs generated, about 40% contained a known vulnerability.

What matters here is not the percentage itself but "why it happens." Generative AI does not guarantee correctness; it reproduces the "plausibility" of the ways things were written in its training data. That data mixes safe code with risky code. So the AI sometimes proposes, as is, the way of writing that is on average most commonly seen — which includes old, dangerous patterns.

Keep the following points in mind.

02Secret Leakage — Keys and Tokens Left in the Code

The next most common problem is leakage of secrets (= confidential values such as API keys and passwords that must not be made public). When you have AI write code, it will sometimes propose, as a working example, keys or tokens embedded directly in the source. Adopt that as is and push it to a repository, and the key stays in the history.

The leakage paths fall mainly into three types.

The basic principle is one thing: never write secrets into the code. Separate them into environment variables or a dedicated secret-management mechanism, and write only "where the key lives" into the code. And on the assumption that leakage will happen, prepare procedures for detection and revocation (= invalidating a key and reissuing it) in advance.

03Prompt Injection — Input That Turns Into a Command

Prompt injection (= an attack that hijacks the AI's behavior with instructions slipped into the input) is a risk specific to applications that embed generative AI. It is similar in conception to a classical vulnerability, SQL injection (= an attack that abuses input to illicitly manipulate a database). Input that should be treated as data ends up interpreted as a command — that is the shared weakness.

This attack is listed as the top risk in the OWASP Top 10 for LLM Applications (= a catalog of the ten most representative risks for applications that use large language models). There are two forms.

There is no silver bullet that prevents it completely. That is exactly why you build the surroundings on the premise that "AI output cannot be trusted." Don't give the AI strong privileges, restrict the paths through which it reads external data, and inspect output — by a human or a mechanism — before executing it. Catch it in layers.

A concrete example from the pharma setting: consider a workflow that uses AI to draft responses to inquiries from healthcare professionals. If a document pulled in from outside contains a planted instruction to "emphasize an unapproved indication," the AI could follow it and generate wording that runs afoul of the Pharmaceutical and Medical Device Act (PMD Act). Exaggerated advertising is prohibited under Article 66 of the PMD Act, and advertising of unapproved drugs under Article 68. Being AI-produced is no exemption. The output must always be checked by a person against primary sources and put through formal material review — keeping this checkpoint in place is the crux.

04The OWASP Lens — Old Threats Have Not Disappeared

It is easy to have your eyes drawn to new risks, but the classical weaknesses listed in the OWASP Top 10 (= a catalog of the ten most representative security risks for web applications, 2021 edition) appear as is in code that AI writes too. If anything, because AI reproduces past dangerous patterns, they demand more care.

OWASP Top 10 itemWhat tends to happen in AI-generated code
Broken access controlProposes a "just works" implementation that skips privilege checks
InjectionReproduces the old pattern of embedding input into a query via string concatenation
Insecure designFails to grasp the design intent of authentication and validation, returning locally optimal fragments
Vulnerable and outdated componentsRecommends libraries as of its training cutoff, or versions with known weaknesses
Identification and authentication failuresProduces examples with plaintext password storage or weak hashing

The key point is that introducing AI does not change the yardstick of inspection. An established catalog like the OWASP Top 10 functions as a checklist precisely in the AI era. Keep the habit of holding generated output up against each of these items, one by one.

05Dependency Risk — The Supply Chain as a Weak Point

Modern software is made up more of combining external components (= libraries, dependencies) than of the parts you write yourself. This is where the risks of the software supply chain (= the network that supplies components; the whole of the parts you use and their distribution routes) concentrate. AI also works in the direction of widening this hole.

There are two pillars of defense. One is to always know "what you are using" via an SBOM (= Software Bill of Materials, a list of the components in use). The other is to run a mechanism that automatically scans dependencies and alerts you when a known weakness appears. The more a component was recommended by AI, the more a person should confirm that the name actually exists and that the distribution source is trustworthy.

Layer 01

Inspection at the entrance

"confirm before taking it in"

Don't use AI-recommended components or code as is. Confirm existence, distribution source, version, and known weaknesses before pulling them in. The plausibility of a name is not evidence.

Layer 02

Separation of secrets

"pry the keys away from the code"

Move credentials to environment variables or secret management. Run secret detection before committing so nothing stays in the history. Decide the revocation procedure for a leak in advance.

Layer 03

Least privilege

"don't hand the AI strong power"

The more a part embeds generative AI, the more you narrow its privileges. Even if it is hijacked by prompt injection, confine it to a scope where the damage cannot spread.

Layer 04

Review at the exit

"a person looks before it goes out"

Inspect the output with static analysis and human review before releasing it to the world. In pharma, material review; in code, a checkpoint before promotion to production.

06Regulation in Healthcare — When Code Itself Becomes Regulated

In the pharma and medical context, the code itself becomes the object of regulation. Unlike general software development, "as long as it works" does not suffice. Systems involved in clinical trials, manufacturing, or quality require Computerized System Validation (CSV) — the practice of proving, with records kept, that the system works correctly as intended — grounded in GxP (= the collective term for the norms to be observed in drug development, manufacturing, quality control, and the like). As an international practical guideline, ISPE's GAMP 5 is widely referenced.

When electronic records and electronic signatures are handled, in the United States 21 CFR Part 11 (= the FDA's rule on electronic records and electronic signatures) requires audit trails and access management. Domestically, if patient information is handled, the Ministry of Health, Labour and Welfare's "Guidelines for the Safety Management of Medical Information Systems" become a prerequisite to observe. Here AI-generated code raises pointed questions.

The conclusion is simple. AI can be used as a draft or an aid, but the final proof of validity and the responsibility rest with people and the organization. Regulation does not loosen just because AI got faster.

07Designing the Countermeasures — Catch It With Structure, Not Individual Tricks

Assemble the risks up to this point as a design, not as individual fixes. At the foundation is the stance of "build the surroundings on the premise that AI output cannot be trusted." An established framework like the U.S. NIST SSDF (= Secure Software Development Framework; NIST SP 800-218) applies directly to development that uses AI.

What matters is building these in from the start, within the flow of development, rather than "adding them later." Use the power to build fast in service of protecting safety and trust — put the mechanism in place first, toward that end.

08Connections to the Other Chapters in This Series

The security of this installment connects to the other installments of the series as follows. Read together, they make your way of engaging with generated code three-dimensional.

Closing

Generative AI gave us the power to write code fast. But speed carries the old dangers — vulnerabilities, secret leakage, prompt injection, components of uncertain provenance — at a new scale. What the research shows is that the assumption that you may trust AI's proposals as they are does not hold. AI reproduces plausibility, not correctness. So rather than leaning on persuasive appearance, hold it up one item at a time against established checklists — OWASP, SSDF, CWE.

In the pharma setting, the code itself becomes regulated. The requirements of GxP, CSV, and electronic records do not budge just because AI got faster. The proof of validity and the final responsibility rest with people and the organization. Design generative AI not as a tool for skipping inspection but as a foundation that gains speed only after safety and trust have been protected all the way through — that is the core of facing the risks of generated code. Next time, we move on to how to choose a model suited to the purpose and how to keep costs down.

Key Points — Three to Take Away
  1. Code from generative AI "works" but is not necessarily "safe." In empirical research, about 40% of Copilot's output contained a known vulnerability. Because AI reproduces plausibility rather than correctness, it will readily propose old, dangerous patterns. Don't make persuasive appearance your basis for safety.
  2. Old and new risks overlap. On top of AI-specific threats such as prompt injection (the top of OWASP for LLM) and package hallucination, classical weaknesses — hard-coding secrets into code, recommending old components, injection — appear as they always did. Rather than crushing them individually, catch them in layers at the entrance, the inside, and the exit.
  3. In pharma and medicine the code itself is regulated. CSV grounded in GxP, 21 CFR Part 11, and the medical-information guidelines require accountability, traceability, and records. AI can be used for drafts, but the proof of validity and the final responsibility rest with people and the organization. Regulation does not loosen just because AI got faster.
Sources and References
  1. Pearce, H., Ahmad, B., Tan, B., Dolan-Gavitt, B., & Karri, R. Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions. Proceedings of the 2022 IEEE Symposium on Security and Privacy (S&P), 2022. The primary study that empirically measured vulnerabilities in generated code at scale (about 40% of output confirmed vulnerable).
  2. OWASP Foundation. OWASP Top 10:2021. 2021. A catalog of the ten most representative security risks for web applications. Also functions as a checklist for AI-generated code.
  3. OWASP Foundation. OWASP Top 10 for LLM Applications. 2025. A catalog of risks specific to applications that embed large language models. It places prompt injection at the top.
  4. MITRE. Common Weakness Enumeration (CWE). A common catalog that classifies software weaknesses. The foundation for identifying and sharing types of vulnerabilities.
  5. National Institute of Standards and Technology (NIST). Secure Software Development Framework (SSDF), Version 1.1. NIST Special Publication 800-218, 2022. A practical framework for secure software development.
  6. ISPE. GAMP 5: A Risk-Based Approach to Compliant GxP Computerized Systems (Second Edition). International Society for Pharmaceutical Engineering, 2022. The international practical guideline for Computerized System Validation (CSV).
  7. U.S. Food and Drug Administration. 21 CFR Part 11 — Electronic Records; Electronic Signatures. The rule that sets the requirements (audit trails, access management) for electronic records and electronic signatures.
  8. Ministry of Health, Labour and Welfare. Guidelines for the Safety Management of Medical Information Systems (Version 6.0). 2023. The premise for safe management of systems that handle medical information.
  9. Ministry of Health, Labour and Welfare. Act on Securing Quality, Efficacy and Safety of Products Including Pharmaceuticals and Medical Devices (PMD Act), Articles 66 and 68. The statutory basis for the prohibition of exaggerated advertising and of advertising unapproved drugs.