Recent advances in medical Artificial Intelligence (AI), especially large language models (LLMs), have transformed human-computer interactions in healthcare by simulating human-like reasoning, which could fundamentally change medical learning and practice. While AI offers educational benefits such as enhanced learning and cognitive off-loading of routine tasks, it also poses risks, including overreliance (automation bias), the erosion or non-development of skills (deskilling, never-skilling), and reinforcement of errors (mis-skilling) driven by AI’s opaque and sometimes inaccurate outputs. This uncertainty affects even experienced clinicians, highlighting the need for adaptive practice supported by critical thinking.
A recent article proposes a framework to support critical thinking during clinical supervision in AI contexts, addressing a key gap in medical education.
While LLMs demonstrate expert-level performance in clinical reasoning tasks, they also inherit healthcare biases and can produce false or misleading information (confabulate). Thus, AI should be considered an aid rather than a replacement, with final clinical decisions remaining the responsibility of human clinicians. Given the high risks involved, AI use in clinical reasoning demands careful attention from educators and learners.
Using AI as a replacement for clinical reasoning rather than as a supportive tool risks harming skill development through deskilling, never-skilling, and mis-skilling. Overreliance on AI can cause learners to lose essential clinical reasoning and information recall abilities. Research shows frequent AI use correlates with decreased critical thinking due to increased cognitive off-loading, leading to less independent problem solving and analytic reasoning, especially among younger users. A trial found that blindly accepting AI outputs without critical evaluation worsened performance on complex tasks, particularly for lower-performing users, highlighting the dangers of unengaged reliance on AI over personal judgment.
Mis-skilling happens when learners blindly trust inaccurate or biased AI outputs, adopting erroneous clinical predictions. Studies show clinicians often accept AI biases, such as overestimating certain diagnoses, and that AI assistance can harm clinicians with lower baseline skills, sometimes leading to worse combined performance than AI alone. Attempts to explain AI decisions don’t always reduce these errors, indicating difficulty in recognising and correcting AI bias, which can reinforce mis-skilling. Conversely, ignoring correct AI advice reflects underuse and lost benefits. When clinicians have strong baseline skills, combining human and AI reasoning improves performance, suggesting that effective AI use depends on the user’s expertise.
In response to AI-related risks, medical education programmes have developed principles, competencies, and curricula for AI use in healthcare. However, educators still need strategies to foster adaptive practice during real-time AI interactions. Emphasising strong foundational knowledge is essential to maximising AI benefits and minimising risks. The growing presence of AI should be viewed as an educational opportunity to enhance both AI and clinical literacy.
An AI interaction is a moment when a computational tool provides an untraceable judgment that asks the user to take a leap of faith in trusting it. Because that judgment cannot be traced, AI outputs cannot be fully trusted without verification, so users must pause and critically evaluate AI recommendations before acting on them. Recognising such interactions offers educators a chance to foster critical thinking.
Building on the Socratic method and the existing DEFT framework (Diagnosis, Evidence, Feedback, Teaching), the authors propose an adapted “DEFT-AI” approach to support critical thinking and adaptive practice during AI-assisted clinical reasoning.
In the diagnosis step, the educator explores the learner’s clinical reasoning and AI use. This includes asking how the learner synthesised the clinical problem (data gathering and inductive reasoning) and developed the differential diagnosis (deductive reasoning and knowledge). The educator also asks about the AI tool used, the prompts given, how follow-up prompts tested the AI’s output, and whether the AI’s suggestions influenced, replaced, or supplemented the learner’s diagnostic process.
In the evidence step, the educator assesses the learner’s use of supporting and opposing evidence to evaluate both medical knowledge and AI understanding. This includes examining diagnostic reasoning, hypothesis testing, and adaptive expertise, as well as the learner’s grasp of pathobiology, clinical guidelines, and evidence-based medicine. At the same time, the educator encourages self-assessment of AI literacy by asking about the AI’s reasoning, limitations, purpose, and prompting strategies, and about the evidence supporting its use. Learners may be asked to present cases without AI to evaluate independent problem solving and detect potential overreliance on AI.
In the feedback step, guided self-reflection is key: the educator encourages the learner to reflect on growth opportunities related to the clinical case and their AI use, including missed diagnoses, gaps in medical knowledge, and areas for improvement in AI literacy and application. The educator then uses this self-reflection to offer tailored feedback on the learner’s clinical reasoning and AI use, reinforcing reasoning skills, encouraging evidence-based medicine and critical appraisal, and promoting AI literacy according to the learner’s needs.
In the teaching step, educators conclude the AI interaction by promoting both foundational clinical skills and AI literacy, encouraging learners to continue practising AI use under appropriate supervision and self-monitoring. Two common AI collaboration styles emerge: “centaur” users strategically delegate tasks between themselves and the AI, relying on clinical judgment for critical decisions, while “cyborg” users integrate the AI closely throughout a task, iteratively refining its outputs. Educators should guide learners to adopt a centaur approach for high-risk or uncertain tasks and a cyborg approach for well-defined, low-risk activities. Both styles require active, critical engagement with AI to avoid deskilling or overreliance. Teaching these frameworks helps learners develop adaptive, task-appropriate AI use that evolves with technology and clinical complexity.
AI literacy starts with recognising AI interactions as moments requiring critical pause due to the system’s opaque judgment. The DEFT-AI framework supports adaptive practice by helping learners shift between centaur, cyborg, and AI-independent modes based on the task. Educators should emphasise two key AI literacy skills: structured critical appraisal of AI tools and outputs, and effective prompt engineering to improve AI accuracy. Using evidence-based practice principles, such as Sackett’s five-step model, provides a structured approach for evaluating AI’s trustworthiness in clinical tasks.
Evaluating an AI tool’s trustworthiness begins by clearly defining the question guiding the evidence search. Next, educators and learners gather and critically appraise evidence such as peer-reviewed studies, AI scorecards, leaderboards, and regulatory information. While these resources help assess AI tools, they offer limited real-time value in educational settings, and comprehensive AI evaluations are generally beyond the scope of most educators and learners.
Rather than focusing on evaluating AI tools themselves, clinicians should critically assess AI-generated outputs by integrating their clinical skills, patient preferences, and research evidence. They compare AI suggestions with independent evidence, such as guidelines or expert opinions, to judge accuracy. While agreement between AI and clinician conclusions can build trust, human oversight remains essential. Developing these skills enables learners to reliably evaluate AI output alongside their own clinical reasoning in patient care.
Effective prompting is essential for getting accurate and relevant responses from LLMs in medical applications. Like consulting a human expert, clear, specific, and context-rich prompts yield better results, while vague or biased prompts can cause misleading or sycophantic answers. Using example cases and encouraging the AI to “think out loud” (chain-of-thought prompting) improves accuracy and transparency of reasoning. Engaging in follow-up prompts to clarify or revise AI responses fosters active learning, strengthens critical thinking, and enhances the educational value of AI tools.
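To make these prompting principles concrete, the sketch below contrasts a vague prompt with a context-rich, chain-of-thought prompt and a follow-up prompt that probes the model’s answer. It is a minimal illustration only, not part of the original article: the call_llm function is a hypothetical placeholder for whatever model interface is available, and the brief clinical vignette is invented for demonstration.

```python
# Minimal sketch contrasting prompting styles for an LLM used as a clinical
# reasoning aid. call_llm is a hypothetical placeholder: wire it to whatever
# model interface is available; here it only echoes a stub so the script runs.

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call; replace with a real model interface."""
    return f"[model response to a prompt of {len(prompt)} characters]"

# 1. Vague prompt: little context, invites a generic or misleading answer.
vague_prompt = "What does this patient have? Cough and fever."

# 2. Context-rich, chain-of-thought prompt: a stated role, structured clinical
#    data, and an explicit request to reason step by step before concluding.
structured_prompt = (
    "You are assisting a supervised medical trainee with a differential "
    "diagnosis. Patient: 58-year-old with 3 days of productive cough, fever "
    "of 38.9 C, and right-sided pleuritic chest pain; history of COPD; no "
    "recent travel.\n"
    "Think step by step: list the key findings, then a differential diagnosis "
    "ranked by likelihood, and state what evidence supports or argues against "
    "each item before giving a final answer."
)

# 3. Follow-up prompt: test the model's output rather than accepting it.
follow_up_prompt = (
    "You ranked community-acquired pneumonia first. Which findings in this "
    "case argue against that diagnosis, and what becomes more likely if the "
    "chest radiograph is normal?"
)

for name, prompt in [("vague", vague_prompt),
                     ("chain-of-thought", structured_prompt),
                     ("follow-up", follow_up_prompt)]:
    print(f"--- {name} ---")
    print(call_llm(prompt))
```

Whatever the prompting style, the output still requires the critical appraisal described above; better prompts narrow, but do not remove, the need for verification.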
Despite AI’s technical advances, its use still requires careful verification and cautious trust. As AI becomes integral to medical training and practice, educators must treat AI interactions as a permanent feature of both. Critical thinking is essential to prevent skill loss from overreliance on AI and to build adaptive practice and AI literacy in learners and educators alike. The DEFT-AI framework offers a structured way to promote critical thinking and validate AI outputs. Educators must foster a culture of verification through curricular redesign and collaboration with AI developers and healthcare systems, including systematic assessment of AI use. Without proper governance and monitoring, AI’s risks could outweigh its benefits and undermine medical education. Ultimately, adopting a “verify and trust” approach is key to making AI a valuable support to human expertise.
Source: NEJM
Image Credit: iStock
References:
Abdulnour R-E E, Gin B, Boscardin CK (2025) Educational Strategies for Clinical Supervision of Artificial Intelligence Use. NEJM. 393:786-797.