Large language models are increasingly used in healthcare, including for summarising discharge notes and responding to health-related queries. Concerns remain about their handling of fabricated or misleading medical content, particularly when misinformation is embedded within credible clinical narratives. A cross-sectional benchmarking analysis evaluated how often models accepted false medical claims and how rhetorical framing influenced this behaviour. Twenty models were tested through more than 3.4 million prompted runs using three corpora: public-forum and social-media discussions; real hospital discharge notes, each carrying a single inserted false recommendation; and physician-validated simulated vignettes. Each scenario was presented with a neutral base prompt and with prompts framed around ten named logical fallacies. Two outcomes were recorded: susceptibility, defined as acceptance of the false claim, and fallacy detection, defined as identification of flawed reasoning.


Benchmark Design Across Clinical and Public Contexts

The evaluation combined authentic clinical material, online discourse and structured simulated scenarios. Real discharge notes were sourced from the Medical Information Mart for Intensive Care (MIMIC) database. In each note, two physicians inserted one fabricated recommendation within the recommendations section. The social-media corpus consisted of 140 misinformation examples selected by physicians from an initial pool of more than 760 posts drawn from two public forums, reflecting commonly circulated health-related rumours. A third corpus comprised simulated misinformation vignettes designed to be realistic and clinically relevant; two physicians validated these scenarios, resolving disagreements by consensus.


For each case, models received a base prompt requesting identification of medically incorrect or fabricated information. The same content was then presented using prompts framed around ten logical fallacies, including appeal to authority, appeal to popularity, appeal to emotion, slippery slope and circular reasoning. Each fallacy template was rephrased multiple times to limit lexical bias. Models returned structured, machine-parsable responses indicating whether misinformation and fallacies were present. Open-source systems were executed locally, while proprietary models were accessed via official interfaces. Processing of the credentialed clinical dataset occurred within a secure environment.
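
How such a harness might look in practice is sketched below. The helper names, the JSON schema and the framing sentence are illustrative assumptions; the study's actual prompts and parsing code are not reproduced in the article.

```python
import json

# Five of the ten fallacy framings named in the article; wording is assumed.
FALLACIES = [
    "appeal to authority", "appeal to popularity",
    "appeal to emotion", "slippery slope", "circular reasoning",
]

BASE_PROMPT = (
    "Review the text below and report, as JSON with boolean keys "
    "'misinformation_present' and 'fallacy_present', whether it contains "
    "medically incorrect or fabricated information."
)

def evaluate(model, scenario: str, fallacy: str | None = None):
    """Run one scenario through a model and parse its structured verdict."""
    prompt = BASE_PROMPT
    if fallacy is not None:
        prompt += f"\nNote that the claim below is argued via {fallacy}."
    raw = model.generate(prompt + "\n\n" + scenario)  # assumed interface
    try:
        verdict = json.loads(raw)
        return {
            # Susceptibility: the model failed to flag the false claim.
            "susceptible": not verdict["misinformation_present"],
            "fallacy_detected": bool(verdict["fallacy_present"]),
        }
    except (json.JSONDecodeError, TypeError, KeyError):
        return None  # non-conforming output is set aside, not scored


# e.g. evaluate(gpt, note_text) for the base condition,
#      evaluate(gpt, note_text, fallacy=FALLACIES[0]) for a framed run
```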


Framing Effects and Dataset-Specific Patterns

Across all models and datasets, fabricated content was accepted in nearly one-third of base prompts. In most cases, applying fallacy-based framing reduced susceptibility or did not significantly increase it. The largest overall reduction was observed with appeal to popularity, which lowered acceptance to roughly one in eight prompts. Two framings increased susceptibility compared with the base prompt: appeal to authority and slippery slope.


Patterns varied by corpus. Modified discharge notes demonstrated the highest baseline susceptibility, with acceptance approaching half of base prompts. In this setting, most fallacy framings reduced susceptibility, yet appeal to authority and slippery slope produced higher acceptance than the base condition. Social-media misinformation showed substantially lower baseline susceptibility than discharge notes. Under most fallacy prompts, susceptibility declined further, and appeal to popularity again showed the strongest reduction. In this corpus, appeal to authority and slippery slope did not materially change acceptance relative to the base prompt. Simulated vignettes exhibited low baseline susceptibility. Several fallacy framings reduced acceptance, while appeal to authority increased susceptibility and circular reasoning produced a smaller increase.


Fallacy detection showed a different pattern. Under base prompts, models frequently misclassified fallacy presence. When prompts explicitly invoked fallacy labels, correct detection rates increased across datasets, although variability persisted across fallacy types and corpora.


Model-Level Variation and Scale Associations

Substantial heterogeneity emerged at model level. Baseline susceptibility ranged from high acceptance rates in smaller systems to very low apparent susceptibility in a model that declined to provide structured outputs, reflecting refusal rather than consistent rejection of misinformation. A composite robustness score combined low susceptibility with correct fallacy detection using a geometric mean. The highest score was achieved by a model that paired low acceptance of fabricated content with strong fallacy detection. Another model demonstrated extremely low practical susceptibility despite more moderate fallacy detection performance.
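
The article names the geometric mean but not the exact formula; a minimal sketch, assuming susceptibility and detection rate are proportions in [0, 1], would be:

```python
import math

def robustness(susceptibility: float, detection_rate: float) -> float:
    """Assumed composite: geometric mean of non-susceptibility and correct
    fallacy detection, i.e. sqrt((1 - S) * D). The study's exact weighting
    may differ; treat this as illustrative."""
    return math.sqrt((1.0 - susceptibility) * detection_rate)

# A model accepting 10% of false claims while detecting 80% of fallacies
# would score sqrt(0.9 * 0.8) ≈ 0.85.
```

A geometric mean rewards balance: if either component is near zero, the composite collapses towards zero, so a model cannot score well through blanket refusal or through detection alone.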


An association between parameter count and susceptibility was examined for models with known scales. Susceptibility generally decreased as parameter count increased, with negative correlations reported across fallacy types. Models below 10 billion parameters showed mean susceptibility above 45%, whereas models exceeding 30 billion parameters typically remained below 25%. The relationship was not uniform. Some smaller models performed comparatively well under specific fallacy framings, and a mid-sized model showed the lowest practical susceptibility overall. These findings indicated that scale contributed to resilience but did not fully determine outcomes, with alignment and safety tuning also shaping performance.
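
For illustration, such a scale association can be quantified as a rank correlation; the statistic used by the study is not named in this summary, and the data points below are invented for demonstration only.

```python
from scipy.stats import spearmanr

# Hypothetical (parameters in billions, baseline susceptibility) pairs,
# loosely echoing the reported <10B and >30B thresholds; not study data.
models = [(7, 0.52), (8, 0.48), (13, 0.37), (34, 0.24), (70, 0.18)]

params, susceptibility = zip(*models)
rho, p = spearmanr(params, susceptibility)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")  # negative rho: larger models, lower susceptibility
```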


Qualitative examples illustrated context-dependent risk. Several models endorsed misinformation from the social-media set, including claims involving pregnancy medication, rectal garlic, CPAP use and mammography, as well as assertions equating certain foods with prescription anticoagulants. In discharge notes, more than half of the models accepted at least some fabricated recommendations written in formal clinical style, such as advice involving cold milk for oesophagitis-related bleeding, avoiding citrus before laboratory tests and dissolving a laxative in hot water to activate ingredients.


The benchmarking analysis demonstrated that endorsement of fabricated medical information remained common across contemporary large language models, particularly when misinformation was embedded within formal discharge-note prose. Most fallacy framings reduced susceptibility, yet appeal to authority and slippery slope increased acceptance in pooled results, and appeal to authority also increased susceptibility in simulated vignettes. Explicit fallacy prompts generally improved fallacy detection compared with neutral prompts, though variability persisted across models and datasets. Susceptibility tended to decrease with greater model scale, but exceptions underscored the importance of alignment and safety mechanisms beyond parameter count.


Source: The Lancet Digital Health

Image Credit: iStock


References:

Omar M, Sorin V, Wieler LH et al. (2026) Mapping the susceptibility of large language models to medical misinformation across clinical notes and social media: a cross-sectional benchmarking analysis. The Lancet Digital Health, 8(1):100949.


