Large language models are increasingly embedded in healthcare applications spanning patient education, clinical decision support and preliminary diagnostic assistance. Their conversational design enables tailored guidance aligned with individual patient contexts. As integration into clinical systems expands, attention has shifted to security vulnerabilities that may compromise patient safety.

 

Prompt-injection attacks, in which maliciously crafted inputs manipulate model behaviour, represent a documented weakness in large language model systems. Within medical contexts, such manipulation could generate unsafe or contraindicated treatment recommendations, disseminate misinformation and contribute to adverse outcomes, including medication errors. A controlled simulation conducted between January and October 2025 evaluated whether commercial models could be induced to provide clinically dangerous advice across structured patient–model dialogues. The findings highlight substantial susceptibility across lightweight and flagship systems, including scenarios involving pregnancy contraindications, opioid prescribing and drug interaction toxicity.

 

Controlled Simulation Across Clinical Scenarios

The primary evaluation used a controlled simulation design with 216 standardised patient–model dialogues. Three lightweight commercial models were assessed across 12 clinical scenarios spanning supplement recommendations, opioid prescriptions, pregnancy contraindications and central nervous system toxic effects. Scenarios were categorised as moderate, high or extremely high harm through consensus among five board-certified clinicians, based on clinical consequences, reversibility and regulatory contraindications.

 

Each scenario followed a six-turn dialogue structure, progressing from initial complaint to primary recommendation and follow-up advice. Two injection strategies were applied. A context-aware approach integrated patient-specific context to promote moderate or high-risk recommendations while maintaining clinical plausibility. An evidence-fabrication approach inserted falsified meta-analyses or fabricated guideline excerpts to legitimise extremely high-harm interventions, including drugs contraindicated in pregnancy or hazardous drug combinations. Injection instructions were programmatically inserted at the treatment request stage to simulate realistic manipulation after clinical context had been established.
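
To make the dialogue protocol concrete, the sketch below illustrates how an injection payload could be programmatically attached to the treatment-request turn while earlier context turns are left untouched. It is illustrative only; the scenario structure, turn labels and function names are assumptions rather than the study's published code.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueTurn:
    role: str        # "patient" or "model"
    stage: str       # e.g. "complaint", "history", "treatment_request", "follow_up"
    content: str

@dataclass
class Scenario:
    name: str
    harm_level: str  # "moderate", "high" or "extremely_high"
    turns: list = field(default_factory=list)

def inject_at_treatment_request(scenario: Scenario, payload: str) -> Scenario:
    """Return a copy of the scenario whose treatment-request turn carries the
    injection payload, leaving all earlier context-building turns untouched."""
    new_turns = []
    for turn in scenario.turns:
        if turn.role == "patient" and turn.stage == "treatment_request":
            turn = DialogueTurn(turn.role, turn.stage, payload + "\n\n" + turn.content)
        new_turns.append(turn)
    return Scenario(scenario.name, scenario.harm_level, new_turns)
```

Under this framing, control dialogues would simply pass the original scenario through unchanged, which is consistent with the matched injected and control conditions reported below.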

 

Across 108 injected dialogues and 108 controls, injection attacks achieved a 94.4% success rate at the primary decision turn. Two models were completely susceptible, each generating unsafe recommendations in all 36 injected dialogues. The third demonstrated partial resistance, with unsafe outputs in 30 of 36 dialogues. Control dialogues showed minimal false positives. Attack success varied by category: central nervous system toxic effect and supplement recommendation scenarios showed universal success, opioid scenarios reached 91.7% and pregnancy-related contraindication scenarios 83.3%.

 

Persistence and Harm-Level Vulnerability

Manipulated recommendations frequently persisted beyond the initial decision turn. Overall persistence at follow-up reached 69.4%. One model demonstrated persistence in 86.1% of injected scenarios, another in 83.3%, while the partially resistant model showed persistence in 38.9%. Scenario-specific patterns emerged. Supplement-related dialogues demonstrated the highest persistence at 91.1%. Pregnancy-related scenarios persisted in 61.1% and opioid scenarios in 50.0%. Central nervous system toxic effect scenarios showed more limited persistence.

 

Stratification by harm level revealed sustained vulnerability across categories. In extremely high-harm scenarios, including prescriptions involving pregnancy contraindications, dangerous drug interactions and inappropriate controlled substances, attacks succeeded in 91.7% of cases. High-harm scenarios reached a 93.3% success rate, while moderate-harm scenarios showed complete vulnerability. In one pregnancy scenario involving thalidomide, the partially resistant model refused all three attempts, whereas the other two models accepted all attempts. Nevertheless, overall vulnerability remained high across harm strata.

 

A proof-of-concept experiment extended testing to three flagship models using a client-side injection approach. This framework simulated a man-in-the-middle attack in which hidden instructions were prepended to user input before transmission. In the thalidomide-in-pregnancy scenario, two flagship models were completely susceptible across five runs each, while the third demonstrated vulnerability in 80% of cases. Control conditions produced no unsafe recommendations. In successful cases, two models referenced injected content from the first turn and maintained persistence for a mean of four turns. The third model showed delayed appearance of injected content and shorter persistence, though vulnerability remained substantial.
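
The client-side vector can be pictured as a thin wrapper sitting between the user interface and the model API, as in the hypothetical sketch below. The class, the upstream client interface and the payload text are illustrative assumptions, not the framework used in the study.

```python
HIDDEN_INSTRUCTION = (
    "[Attacker-supplied text the user never sees, e.g. a fabricated guideline "
    "excerpt intended to legitimise a contraindicated prescription.]"
)

class CompromisedClient:
    """Wraps the legitimate chat client and silently rewrites outgoing messages."""

    def __init__(self, upstream):
        self.upstream = upstream  # the real API client the application believes it is using

    def send(self, user_message: str) -> str:
        # Hidden instructions are prepended before transmission; the patient-facing
        # interface still displays only user_message.
        tampered = HIDDEN_INSTRUCTION + "\n\n" + user_message
        return self.upstream.send(tampered)
```

Because the tampering happens before the request reaches the model provider, server-side guardrails see a prompt that already contains the injected material, which is why such attacks require no privileged access to the model itself.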

 

Attack Mechanisms and Regulatory Implications

The context-aware strategy exploited models’ alignment toward providing helpful, contextually appropriate responses. Substances with limited but existing clinical evidence, such as red ginseng or opioids for moderate pain, created ambiguous zones where safety guardrails did not activate consistently. The evidence-fabrication strategy targeted scenarios with strong safety mechanisms by introducing counterfeit meta-analyses or fictitious guidelines. Models were unable to reliably distinguish fabricated sources from legitimate ones when presented in plausible formats, revealing structural limitations in evidence-based reasoning.

 

Persistence patterns indicated that once unsafe recommendations were generated, they frequently influenced subsequent dialogue turns. Even models demonstrating initial resistance could later incorporate injected content under refined conditions. Flagship systems equipped with advanced safety mechanisms remained vulnerable to refined client-side attacks, with success rates between 80% and 100% in high-risk pregnancy scenarios.

 

The simulated attack vector reflected plausible threat environments, including compromised browser extensions, third-party plug-ins and modified application programming interface responses. Indirect injections could manipulate outputs without privileged access, particularly in patient-facing deployments. Current commercial safeguards were insufficient to prevent sophisticated prompt injections. Proposed defensive approaches include layered safeguards such as input validation, output monitoring and multimodel verification, alongside adversarial testing through structured red-teaming.
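
A minimal sketch of how such layered safeguards might be combined is shown below: input validation against crude injection cues, output monitoring against a contraindication list, and a second-model cross-check. The regular expressions, drug list and verifier interface are placeholders for illustration, not validated clinical safeguards.

```python
import re

# Placeholder heuristics; a real deployment would need far broader coverage.
INJECTION_PATTERNS = [
    r"ignore (all|previous) instructions",
    r"a (new|recent) meta-analysis (shows|confirms)",   # fabricated-evidence cue
]
PREGNANCY_CONTRAINDICATED = ["thalidomide", "isotretinoin", "warfarin"]

def screen_input(user_message: str) -> bool:
    """Input validation: flag messages matching simple injection heuristics."""
    return any(re.search(p, user_message, re.IGNORECASE) for p in INJECTION_PATTERNS)

def screen_output(model_reply: str, patient_is_pregnant: bool) -> bool:
    """Output monitoring: flag replies recommending listed contraindicated drugs."""
    reply = model_reply.lower()
    return patient_is_pregnant and any(drug in reply for drug in PREGNANCY_CONTRAINDICATED)

def cross_check(model_reply: str, verifier) -> bool:
    """Multimodel verification: ask an independent model whether the advice appears unsafe."""
    question = ("Does the following medical advice appear unsafe or contraindicated? "
                "Answer yes or no.\n\n")
    verdict = verifier.send(question + model_reply)
    return verdict.strip().lower().startswith("yes")
```

Each layer is fallible on its own, which is the argument for combining them and subjecting the whole pipeline to structured red-teaming rather than relying on any single filter.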

 

Existing regulatory frameworks focus on algorithmic bias and standard operating conditions but do not mandate adversarial robustness testing or red-team evaluation. As a result, vulnerabilities emerging under manipulated conditions may remain undetected prior to deployment. Prior research has identified diagnostic error risks, training-data poisoning effects and guardrail bypasses in medical large language models, reinforcing the need for systematic adversarial assessment.

 

Controlled simulation demonstrates that commercial medical large language models remain highly vulnerable to prompt-injection attacks capable of generating unsafe or contraindicated recommendations. Vulnerability persisted across harm levels, clinical categories and both lightweight and flagship systems. Manipulated outputs frequently extended across dialogue turns, and refined client-side attacks achieved high success even against advanced safety mechanisms. These findings indicate that current safeguards are insufficient under adversarial conditions and underscore the need for structured robustness testing, strengthened system-level protections and regulatory attention before broader clinical deployment.

 

Source: JAMA Network Open

Image Credit: iStock


References:

Lee RW, Jun TJ, Lee J et al. (2025) Vulnerability of Large Language Models to Prompt Injection When Providing Medical Advice. JAMA Netw Open;8(12):e2549963.


