A nationwide survey published in BMJ Health Care Informatics examined expectations for medical artificial intelligence performance among physicians and the general population in Sweden. The survey explored acceptable levels of sensitivity and specificity across common clinical scenarios, including triage decisions and ECG interpretation. Participants compared AI performance with defined human benchmarks and indicated how many errors would be acceptable. Response rates reached 45% among physicians and 31% in the general population. Findings show consistent expectations for AI to exceed human performance, alongside widespread but cautious use of AI tools. Moderate trust levels contrast with high accuracy demands, highlighting a gap between expectations and current practice.

 

Higher Sensitivity Expected Across Clinical Scenarios
Both physicians and the general population require AI systems to achieve higher sensitivity than human clinicians in all assessed scenarios. In chest pain triage, where a nurse correctly identifies 84 out of 100 emergencies, physicians expect AI to identify 95 cases, while the general population expects complete detection of all cases. Similar patterns appear in sore throat triage and ECG-based myocardial infarction detection. Across scenarios, most respondents set stricter thresholds for AI than for human performance, with only a minority accepting equivalent levels. Expectations rise further in high-stakes settings, where missing a critical diagnosis carries significant risk. The general population more frequently demands perfect sensitivity, reflecting a lower tolerance for missed cases.

 

These findings indicate a clear prioritisation of sensitivity over other performance measures. For a system with a given ability to discriminate between cases, tuning for higher sensitivity reduces the likelihood of missed diagnoses but tends to increase the number of unnecessary referrals. The survey responses show that, for many respondents, avoiding missed cases outweighs concerns about over-referral. At the same time, expectations often exceed the performance of existing systems, suggesting a misalignment between perceived and achievable accuracy levels.
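As a small worked example of the chest pain triage figures reported above, the sketch below converts each detection count into a sensitivity value and a number of missed emergencies per 100. The function name and the dictionary labels are illustrative assumptions, not terms taken from the survey.

```python
def sensitivity(true_positives, false_negatives):
    """Share of real emergencies that are correctly identified."""
    return true_positives / (true_positives + false_negatives)


EMERGENCIES = 100  # the scenario is framed per 100 true emergencies

# Detection counts as reported in the article; labels are illustrative.
detected_per_100 = {
    "nurse benchmark": 84,
    "physicians' expectation for AI": 95,
    "general population's expectation for AI": 100,
}

for label, detected in detected_per_100.items():
    missed = EMERGENCIES - detected
    value = sensitivity(detected, missed)
    print(f"{label}: sensitivity {value:.2f}, missed emergencies {missed}")
```

Read this way, moving from the nurse benchmark to the physicians' expectation cuts missed emergencies from 16 to 5 per 100, and the general population's expectation leaves none.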

 

Diverging Views on Specificity and Risk Tolerance
Expectations for specificity show greater variability than for sensitivity, particularly in triage contexts. In chest pain assessment, where unnecessary referrals are common in human decision-making, both physicians and the general population expect AI to reduce these errors. Median responses indicate a shift towards fewer unnecessary referrals, but the distribution of responses reveals strong divergence. Some respondents demand perfect specificity, eliminating all false positives, while others accept or even favour very low specificity.

 

This polarisation reflects differing attitudes towards risk. When a system is tuned for higher specificity, unnecessary interventions fall but the likelihood of missed diagnoses typically rises. Conversely, accepting lower specificity in exchange for higher sensitivity means fewer critical cases are missed but places greater strain on healthcare resources. The ECG interpretation scenario shows less variation, as human performance is already near optimal. In this case, both groups align AI expectations with existing human benchmarks rather than demanding improvement.

 

The variation in specificity requirements suggests that a single performance threshold may not satisfy all stakeholders. Preferences differ according to perceived risk, clinical context and individual tolerance for uncertainty. These findings underline the complexity of balancing sensitivity and specificity in clinical AI deployment, particularly in decision-making systems that directly affect patient pathways.
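The trade-off described in this section can be illustrated with a minimal sketch. It assumes a hypothetical triage model that outputs a risk score between 0 and 1 and refers every case at or above a chosen threshold. The scores, labels and thresholds below are invented for illustration and are not data from the survey.

```python
# Hypothetical (risk score, is_emergency) pairs; NOT survey data.
cases = [
    (0.95, True), (0.85, True), (0.70, True), (0.55, True), (0.30, True),
    (0.80, False), (0.60, False), (0.45, False), (0.35, False),
    (0.20, False), (0.15, False), (0.10, False),
]


def rates(cases, threshold):
    """Return (sensitivity, specificity) when cases scoring >= threshold are referred."""
    tp = sum(1 for score, emergency in cases if emergency and score >= threshold)
    fn = sum(1 for score, emergency in cases if emergency and score < threshold)
    tn = sum(1 for score, emergency in cases if not emergency and score < threshold)
    fp = sum(1 for score, emergency in cases if not emergency and score >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Sweep three illustrative referral thresholds.
for threshold in (0.25, 0.50, 0.75):
    sens, spec = rates(cases, threshold)
    print(f"threshold {threshold:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 lowers sensitivity from 1.00 to 0.40 while specificity rises from about 0.43 to about 0.86. A respondent who demands perfect sensitivity and one who demands perfect specificity are, in effect, asking for different thresholds on the same system.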

 

Growing Use of AI with Moderate Trust Levels
AI use is already established among both physicians and the general population. A majority of physicians report experience with chatbots, with a smaller proportion using them for clinical or administrative tasks. Some physicians report applying chatbot-generated information to real patient cases, including differential diagnosis and treatment considerations. Use of other AI systems is more widespread, particularly for ECG interpretation and speech-to-text transcription. Additional applications include radiology image analysis, clinical documentation tools and decision support systems.

 

Among the general population, a smaller proportion reports using chatbots for health-related advice. Despite increasing use, trust in AI-generated medical information remains moderate. Both physicians and the general population report similar levels of trust, with no respondents indicating complete confidence. Physicians express comparable trust in chatbots and established ECG interpretation software, suggesting that familiarity with technology does not necessarily translate into higher confidence.

 

The coexistence of moderate trust and high performance expectations highlights a notable gap. AI tools are already integrated into practice, yet users demand accuracy levels that exceed current capabilities. This gap raises challenges for implementation, particularly in ensuring that expectations align with real-world system performance. Transparent communication about capabilities and limitations becomes essential to maintain confidence while avoiding over-reliance.


Expectations for medical AI consistently exceed human performance across key clinical scenarios, particularly in sensitivity. Divergent views on specificity reflect differing attitudes towards risk and uncertainty. At the same time, AI tools are already in use, supported by moderate but not complete trust. These patterns indicate a need to align user expectations with achievable performance through early engagement and clear communication. Addressing the balance between sensitivity and specificity remains central to successful implementation, particularly in high-stakes clinical settings where both missed diagnoses and unnecessary interventions carry consequences.

 

Source: BMJ Health Care Informatics


References:

Arvidsson R, Widén J, Al-Naasan L et al. (2026) Acceptable accuracy for medical AI: a survey of physicians and the general population in Sweden. BMJ Health Care Inform; 33: e101899.


