Large language models (LLMs) are increasingly embedded in healthcare workflows, supporting activities ranging from clinical decision support and question answering to education and research. Their rapid uptake has been driven by evidence that, under controlled conditions, they can perform at levels comparable to clinicians on specific tasks. Yet reliability in real-world use depends not only on model architecture or training data, but also on how clinicians and other users frame their queries and what information they provide. Variations in tone, the perceived authority of external opinions and the completeness of clinical context can subtly but materially alter outputs. Understanding these user-driven influences is critical as LLMs move closer to routine clinical use, where inaccurate or biased responses may carry significant consequences for patient care and safety.
Misinformation Framing and Model Susceptibility
Systematic testing across multiple LLMs shows that all models are vulnerable to user-provided misinformation embedded within medical queries. Among several framing variables, the tone of an incorrect external opinion exerts the strongest influence. Definitive, confident assertions consistently lead to larger drops in diagnostic accuracy than hedged or tentative language. The expertise attributed to the source of misinformation also matters. Incorrect statements presented as coming from experienced clinicians are more likely to bias model outputs than those attributed to inexperienced sources.
Differences emerge between proprietary and open-source models. Proprietary systems demonstrate higher baseline accuracy under neutral conditions but show greater performance degradation when exposed to assertive or authoritative misinformation. Open-source models, while generally less accurate overall, are less consistently swayed by such framing. The assumed persona of the model also plays a role, particularly for proprietary systems. Prompts framing the model as a medical expert tend to yield slightly greater resistance to external bias than those presenting it as a medical assistant, although this effect is modest compared with tone and source authority.
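To illustrate how such framing conditions can be constructed, the sketch below crosses tone, source authority and persona into a grid of test prompts. It is a hypothetical reconstruction rather than the study's actual harness: every template string, variable name and the example vignette are assumptions.

```python
from itertools import product

# Hypothetical framing templates; the wording is an assumption, not the study's.
TONES = {
    "definitive": "It is definitely {opinion}.",
    "hedged": "I am not certain, but it could perhaps be {opinion}.",
}
SOURCES = {
    "expert": "An attending physician with 20 years of experience says:",
    "novice": "A first-year medical student guesses:",
}
PERSONAS = {
    "expert": "You are an experienced medical expert.",
    "assistant": "You are a helpful medical assistant.",
}

def build_prompt(question: str, wrong_opinion: str, tone: str, source: str, persona: str) -> str:
    """Compose one framed query: persona, then the misinformation, then the question."""
    opinion_line = f'{SOURCES[source]} "{TONES[tone].format(opinion=wrong_opinion)}"'
    return f"{PERSONAS[persona]}\n\n{opinion_line}\n\n{question}\nAnswer with the single best option."

# Enumerate every framing condition for one vignette; in a real evaluation each
# prompt would be sent to every model and accuracy compared with a neutral baseline.
for tone, source, persona in product(TONES, SOURCES, PERSONAS):
    prompt = build_prompt(
        question="A 58-year-old presents with crushing chest pain radiating to the left arm. Most likely diagnosis?",
        wrong_opinion="costochondritis",  # deliberately incorrect external opinion
        tone=tone,
        source=source,
        persona=persona,
    )
```

Crossing the three factors lets each one be varied while the others are held fixed, so accuracy under each condition can be compared with an opinion-free baseline one factor at a time.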
These findings indicate that LLMs often align with user opinions even when those opinions are incorrect. The tendency to accommodate confident user input appears especially pronounced in high-difficulty clinical questions, where ambiguity leaves more room for external cues to shape the response.
Effects of Missing Clinical Information
Beyond framing, the completeness of clinical information strongly influences model performance. When essential data are omitted, accuracy declines across all models. Physical examination findings and laboratory or diagnostic test results emerge as the most critical categories: their removal produces the largest performance drops, underscoring their central role in clinical reasoning. History taking and past medical history occupy a secondary tier of importance, while demographic details and miscellaneous contextual information generally have smaller effects on simpler queries.
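As a minimal sketch of this kind of ablation, assume each vignette is stored as labelled sections that can be dropped one category at a time. The section names, the stub model and the metric below are placeholders, not the study's implementation.

```python
def ablate(vignette: dict[str, str], removed: str) -> str:
    """Rebuild the case text with one information category omitted."""
    return "\n".join(text for category, text in vignette.items() if category != removed)

def accuracy_drop(vignettes, answers, query_model, removed: str) -> float:
    """Difference in fraction correct between full and ablated prompts for one category."""
    full = sum(query_model("\n".join(v.values())) == a for v, a in zip(vignettes, answers))
    ablated = sum(query_model(ablate(v, removed)) == a for v, a in zip(vignettes, answers))
    return (full - ablated) / len(vignettes)

# Example: a stub model that only answers correctly when laboratory data are present,
# mimicking the heavy reliance on test results described above.
vignette = {
    "history": "Two days of fever and productive cough.",
    "physical_exam": "Crackles over the right lower lobe.",
    "labs": "WBC 15,000; chest X-ray shows right lower lobe consolidation.",
    "demographics": "67-year-old man.",
}
stub = lambda prompt: "pneumonia" if "consolidation" in prompt else "bronchitis"
print(accuracy_drop([vignette], ["pneumonia"], stub, removed="labs"))  # 1.0
```

Repeating the same measurement for each category yields the hierarchy of information importance described above.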
Model type again shapes the pattern. Proprietary LLMs retain higher absolute accuracy but experience sharper declines when key data are missing, particularly laboratory results or physical findings. Open-source models show lower baseline performance yet follow a similar hierarchy of information importance. Larger parameter models outperform smaller ones, and medical fine-tuning yields incremental improvements, although it does not eliminate sensitivity to missing data.
Dataset complexity also matters. In more challenging clinical scenarios, history taking can rival or exceed laboratory and physical examination data in importance. These cases demand a broader contextual picture, suggesting that omissions tolerated in straightforward questions can become detrimental in complex clinical reasoning.
Consistency and Reasoning Patterns Across Models
Analysis of how models respond to information removal reveals important differences in reasoning consistency. When evaluated on structured clinical questions, models display highly similar patterns in how accuracy declines as different data categories are omitted. This consistency suggests shared clinical heuristics when the task is well defined. In contrast, more complex question sets reveal greater divergence. Proprietary models tend to remain aligned with each other, while open-source models vary more widely in which information they prioritise.
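One simple way to quantify such consistency, though not necessarily the study's own method, is to correlate models' accuracy-drop profiles across omitted categories. The figures below are illustrative only, not results from the study.

```python
from statistics import correlation  # Pearson r; available in Python 3.10+

# Illustrative accuracy drops per omitted category (history, past medical history,
# physical exam, labs, demographics, misc) for three hypothetical models.
drops = {
    "proprietary_a": [0.07, 0.05, 0.18, 0.16, 0.02, 0.01],
    "proprietary_b": [0.06, 0.05, 0.17, 0.15, 0.03, 0.01],
    "open_source_a": [0.09, 0.04, 0.10, 0.12, 0.05, 0.03],
}

for m1, m2 in [("proprietary_a", "proprietary_b"), ("proprietary_a", "open_source_a")]:
    r = correlation(drops[m1], drops[m2])
    print(f"{m1} vs {m2}: r = {r:.2f}")  # high r suggests a shared sensitivity profile
```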
Some open-source systems even show paradoxical accuracy gains under certain misleading conditions. Further examination indicates that these improvements do not reflect better reasoning but rather anchoring effects, where the model is drawn toward answer options explicitly mentioned in the prompt regardless of logical context. When misleading prompts are adjusted to remove this anchoring, the apparent gains disappear, reinforcing the interpretation that superficial linguistic cues, rather than robust reasoning, drive these effects.
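That anchoring check can be approximated by rewriting misleading opinions so they no longer name any answer option verbatim, as in the hedged sketch below; the paraphrase table and function name are hypothetical stand-ins for however the study reworded its prompts.

```python
import re

# Hypothetical paraphrases used to avoid naming an answer option verbatim.
PARAPHRASES = {
    "myocardial infarction": "a blockage-related injury to the heart muscle",
}

def de_anchor(opinion: str, options: list[str]) -> str:
    """Replace verbatim answer-option mentions in a misleading opinion, so any
    remaining effect cannot come from simple string matching against the options."""
    for option in options:
        pattern = re.compile(re.escape(option), re.IGNORECASE)
        if pattern.search(opinion):
            opinion = pattern.sub(PARAPHRASES.get(option.lower(), "another condition"), opinion)
    return opinion

options = ["myocardial infarction", "costochondritis", "pulmonary embolism", "GERD"]
print(de_anchor("I am certain the diagnosis is Myocardial Infarction.", options))
# -> "I am certain the diagnosis is a blockage-related injury to the heart muscle."
```

If an apparent accuracy gain disappears once option names are removed, the gain reflected anchoring to the prompt's surface text rather than improved reasoning.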
Medical fine-tuning improves performance within specific domains and can reduce vulnerability to authoritative misinformation. However, these benefits are context dependent. Fine-tuned models may underperform in linguistically complex or unfamiliar settings, suggesting a trade-off between specialisation and general reasoning capacity.
The reliability of medical AI is shaped as much by user behaviour as by model design. Confidently framed misinformation and authoritative external opinions can significantly bias LLM outputs, while incomplete clinical context, particularly missing laboratory or physical examination data, undermines accuracy. Proprietary models deliver stronger baseline performance but are more susceptible to these user-driven factors, whereas open-source models show lower accuracy with more variable reasoning patterns. For healthcare professionals, these findings highlight the importance of cautious query framing, avoiding definitive or authoritative phrasing of unverified opinions, and providing complete, relevant clinical detail. As LLMs continue to integrate into clinical practice, attention to how questions are posed will be essential to maximise benefit while minimising risk.
Source: Journal of Healthcare Informatics Research