Understanding the nuances of injuries such as the Maisonneuve fracture emphasises the importance of looking beyond the obvious in radiology. This lesson, crucial for radiologists, parallels the current fascination and debates surrounding generative artificial intelligence (AI). As the world continues to be captivated by AI, especially in radiology, it is essential to look beyond what is visible in order to conduct more effective AI research.

 

The Limitations of Evaluating AI Like Humans

In March 2023, OpenAI's GPT-4 achieved impressive scores on various academic and professional examinations, including the U.S. Medical Licensing Examination (USMLE) and a radiology board-style examination. However, testing large language models (LLMs) as if they were human examinees can produce misleading results and misinterpretations of their capabilities. High scores might not reflect genuine comprehension but could instead result from statistical correlations or memorisation. Because GPT-4's training data are undisclosed, data contamination is a possibility: the model might have encountered the exam questions during training.

Moreover, LLMs are brittle; minor changes to a question or to the order of multiple-choice options can significantly affect their responses. This fragility highlights the need for dynamic evaluation methods. For instance, when a question was altered from "What is the absolute washout for this lesion?" to "What is the relative washout for this lesion?", GPT-4 applied the wrong formula. Such inconsistencies necessitate a shift in focus from test results alone to understanding the underlying mechanisms of LLM performance.
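To see why the washout question is a useful brittleness probe, note that the two formulas differ in only one term. A minimal sketch of the standard adrenal CT washout calculations (the Hounsfield unit values below are hypothetical, chosen only for illustration):

```python
def absolute_washout(unenhanced_hu, enhanced_hu, delayed_hu):
    """Absolute washout (%): requires all three attenuation measurements."""
    return 100 * (enhanced_hu - delayed_hu) / (enhanced_hu - unenhanced_hu)

def relative_washout(enhanced_hu, delayed_hu):
    """Relative washout (%): no unenhanced measurement needed."""
    return 100 * (enhanced_hu - delayed_hu) / enhanced_hu

# Hypothetical lesion: unenhanced 10 HU, enhanced 80 HU, delayed 40 HU.
print(round(absolute_washout(10, 80, 40), 1))  # 57.1
print(round(relative_washout(80, 40), 1))      # 50.0
```

Because the only difference is the denominator, a model that genuinely understood the concept should switch formulas when the word "absolute" becomes "relative"; a model pattern-matching on memorised text may not.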

 

The Challenges of Closed Models

Researching closed models like GPT-4, whose architecture and training data are undisclosed, poses significant challenges. Studies that rely on querying these models and reporting their responses risk being mere product reviews rather than rigorous scientific inquiries. It isn’t easy to attribute scientific credibility to their outputs without knowing what these models are trained on. For example, did GPT-4 genuinely figure out an answer or regurgitate memorised content?

 

LLM drift, where a model's behaviour changes over time, can also complicate research. These changes occur silently, altering how the model handles prompts and what it outputs, which can both guard against misuse and produce undesirable outcomes. Thus, while companies may justifiably protect their proprietary models, researchers must exercise caution in assigning scientific merit to systems they cannot fully examine.

 

The Illusion of Emergence

Emergence in LLMs refers to abilities or behaviours that were not explicitly programmed into a model and that appear only at larger scales. However, these so-called emergent abilities often stem from researchers' choice of evaluation metrics rather than from intrinsic capabilities of the models. Overreliance on this concept can oversimplify the complex workings of these models and discourage deeper investigation.

 

The Potential and Pitfalls of Synthetic Data

In the medical domain, generating synthetic data using generative AI can address privacy concerns and enhance data diversity. Generative adversarial networks and diffusion models have demonstrated the ability to create synthetic radiographs, CT images, and MRI scans. However, synthetic data can perpetuate hidden biases and lack the richness and complexity of real data. Over-reliance on it might also lead to "model collapse," in which models trained repeatedly on generated data degrade over successive generations.

 

The Necessity of Clinical Domain Expertise

Incorporating clinical domain expertise is crucial when developing radiology solutions. Studies lacking this expertise, such as the example of XrayGPT, can produce misleading results. Effective radiographic interpretation often requires multiple views, and generating reports from a single image can lead to significant errors. Engaging radiologists throughout the development process helps ensure the technology's safety and effectiveness.

 

Beyond Clinical Accuracy

Clinical adoption of AI is complex, requiring a balance between clinical accuracy and efficiency. AI solutions should enhance human-AI symbiosis, improving workflows without overburdening radiologists. Personalised AI-generated impressions, which cater to radiologists' narrative styles, highlight the importance of personalisation in clinical AI adoption.

 

Each person's limited perspective can lead to incomplete and divergent conclusions. It is vital to approach generative AI with humility and caution, appreciate others' perspectives, and continue collective exploration. Generative AI's potential impact on healthcare is significant and transformational, requiring an open mind and a comprehensive understanding of the unseen intricacies involved.

 

Source: RSNA Radiology


 



