Patient-reported outcomes (PROs) capture symptoms, function and quality of life directly from those receiving care, yet they remain marginal in clinical workflows and AI development. Despite advances such as electronic collection and computerised adaptive testing, most systems still depend on fixed questionnaires and unidimensional scoring, which struggle to reflect fluctuating, multidimensional lived experience. Generative artificial intelligence, particularly large language models, brings the capability to process unstructured language at scale and to conduct adaptive conversations. This shift opens two complementary directions: refining existing psychometrics with AI support, and developing language-native tools that synthesise patient narratives. Realising either pathway requires rigorous validation, attention to equity and bias, and a focus on clinical actionability and trust.
Why PROs Remain Underused
The psychometric credibility of PROs has strengthened over decades, with instruments demonstrating validity and precision. Digital innovations have eased collection through ePRO platforms and adaptive testing, lowering burden while maintaining accuracy. Yet persistent underuse reflects conceptual and structural limits. Traditional instruments grew from the need for standardisation, privileging predefined items and latent traits such as pain or fatigue. Even with multidimensional extensions, these models compress complex experience into a single score or a small set of scores, often detached from daily reality. Patients must fit their narratives into rigid categories, and clinicians receive numbers that are difficult to translate into tailored action. Meanwhile, AI models have been built on clinician-validated labels and structured datasets from imaging, multi-omics and administrative sources, leaving PROs infrequently captured, inconsistently integrated and absent at scale. The result is limited inclusion in predictive models and a risk of exacerbating disparities when digital exclusion, literacy barriers or impairments reduce access.
Language-Native Tools and Their Risks
Generative AI introduces three shifts with direct relevance for PROs. First, conversational agents enable interactive interviews rather than fixed questionnaires. Dialogues can adapt in real time to what a person says, and alternative interfaces such as voice or video may reduce barriers for those with literacy or motor challenges. Second, large language models support qualitative interpretation of free text, extracting health themes and producing concise clinical summaries rather than only numeric scores. Such summaries may help clinicians integrate patient perspectives into decisions by preserving nuance while reducing cognitive load. Third, language-native representations can move beyond reductionism. By capturing non-linear, multidimensional relationships in language, these systems can synthesise how symptoms, function and context interact in everyday life, approaching a holistic impression derived from patient narratives.
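To make the first shift concrete, the sketch below shows an interview loop in which each follow-up question is generated from the full conversation history rather than drawn from a fixed item bank. It is a minimal illustration under stated assumptions, not a validated instrument: call_llm is a hypothetical placeholder for any language-model client, stubbed here so the example runs offline.

```python
# Minimal sketch of an adaptive PRO interview loop. `call_llm` is a
# hypothetical stand-in for any large-language-model client and is
# stubbed so the example runs without external services.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real API client in practice."""
    return "Could you say more about how that affects your daily routine?"

def adaptive_interview(opening_question: str, patient_replies: list[str]) -> list[dict]:
    """Collect a short interview, generating one adaptive follow-up
    question after each patient reply."""
    transcript: list[dict] = []
    question = opening_question
    for reply in patient_replies:
        transcript.append({"question": question, "answer": reply})
        history = "\n".join(
            f"Q: {turn['question']}\nA: {turn['answer']}" for turn in transcript
        )
        question = call_llm(
            "You are conducting a supportive symptom interview.\n"
            f"Conversation so far:\n{history}\n"
            "Ask one brief, plain-language follow-up question."
        )
    return transcript

if __name__ == "__main__":
    replies = ["I have been very tired lately.", "Mostly in the afternoons."]
    for turn in adaptive_interview("How have you been feeling this week?", replies):
        print(f"Q: {turn['question']}\nA: {turn['answer']}")
```

Unlike computerised adaptive testing, which selects the next item from a calibrated bank, the follow-up here is generated from the patient's own words, which is precisely why the validation questions discussed below become necessary.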
These opportunities bring material risks. Generative systems can produce incorrect or incoherent outputs that interrupt interviews or mislead interpretation. Psychometric validity for measurement is not established, since current models were not designed as calibrated instruments. Ethical challenges include biased outputs, context drift, data security and threats to autonomy if systems overstep supportive roles. Emerging ethical frameworks emphasise non-maleficence, autonomy, equity, transparency and security, providing reference points but not yet resolving operational details for PROs. Any shift toward language-native tools must therefore be matched by safeguards that ensure reliability, fairness and acceptability for patients and clinicians.
Pathways to Validation and Actionability
Progress depends on validation strategies suited to generative tasks in PROs. Legacy metrics are not enough when systems perform summarisation, dialogue management and inference of structured information from free text. A staged approach can help: general validation for robustness and consistency, task-specific checks for summary fidelity or scoring accuracy, and clinical validation that links outputs to real-world outcomes or workflow impact. Explainability adapted to language models is also important, supporting clinician understanding without overstating certainty. Hybrid architectures that combine language models with traditional machine learning may improve interpretability and mitigate risk in constrained tasks, which is pertinent where semantic nuance must coexist with dependable measurement.
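As a minimal illustration of one task-specific check, the sketch below tests whether symptom terms in a generated summary are actually grounded in the patient's narrative, flagging potentially hallucinated content. The term list and scoring are illustrative assumptions; a real fidelity check would draw on clinical vocabularies or entailment models.

```python
# Sketch of a task-specific validation check: summary fidelity.
# Flags symptom terms that appear in a generated summary but not in
# the patient's own narrative. The term list is an illustrative
# assumption, not a clinical vocabulary.

SYMPTOM_TERMS = {"pain", "fatigue", "nausea", "insomnia", "anxiety"}

def fidelity_check(narrative: str, summary: str) -> tuple[float, set[str]]:
    """Return the fraction of summary symptom terms supported by the
    narrative, plus any unsupported terms."""
    narrative_terms = {t for t in SYMPTOM_TERMS if t in narrative.lower()}
    summary_terms = {t for t in SYMPTOM_TERMS if t in summary.lower()}
    if not summary_terms:
        return 1.0, set()
    unsupported = summary_terms - narrative_terms
    return 1 - len(unsupported) / len(summary_terms), unsupported

narrative = "I wake up exhausted; the fatigue never lifts, and my back pain is worse."
summary = "Patient reports persistent fatigue, back pain and nausea."
score, flagged = fidelity_check(narrative, summary)
print(f"fidelity={score:.2f}, unsupported={flagged}")
# fidelity=0.67, unsupported={'nausea'}
```

Checks of this kind slot into the task-specific stage of the pipeline; clinical validation would then ask whether summaries passing them actually improve decisions.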
Bias audits need to be proactive and ongoing, evaluating demographic, socioeconomic and cultural patterns in interpretation to promote equity. Crucially, patients should be involved in co-developing validation methods to prevent repeating expert-only design cycles that miss what matters most to those reporting outcomes.
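An ongoing audit can be as simple as stratifying a validation metric by group and flagging gaps that exceed a chosen tolerance. The sketch below assumes per-record fidelity scores, such as those from the check above, tagged with a demographic attribute; the field names and the 0.05 tolerance are illustrative assumptions.

```python
# Sketch of a recurring bias audit: stratify a validation metric by
# demographic group and flag gaps above a tolerance. Field names and
# the tolerance are illustrative assumptions.
from collections import defaultdict

def audit_by_group(records: list[dict], tolerance: float = 0.05):
    """records: dicts with a 'group' label and a 'score' in [0, 1]."""
    by_group: dict[str, list[float]] = defaultdict(list)
    for record in records:
        by_group[record["group"]].append(record["score"])
    means = {g: sum(s) / len(s) for g, s in by_group.items()}
    gap = max(means.values()) - min(means.values())
    return means, gap, gap > tolerance

records = [
    {"group": "under_65", "score": 0.92},
    {"group": "under_65", "score": 0.88},
    {"group": "over_65", "score": 0.74},
    {"group": "over_65", "score": 0.70},
]
means, gap, flagged = audit_by_group(records)
print(means, f"gap={gap:.2f}", "REVIEW" if flagged else "ok")
```

Run on a schedule rather than once, a check like this makes drift in subgroup performance visible before it hardens into inequitable care.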
Even with validated systems, value emerges only when insights become actionable in care. Generative tools may produce risk flags, targeted summaries or recommendations, but these must align with workflows, interoperable data flows and staffing realities. Implementation will require decision support that clinicians can trust and training that fits into routine practice. Broader organisational factors remain central, including fragmented workflows, interoperability gaps, misaligned incentives and clinician burden. Social acceptability also matters. Inclusion during design, transparency about use and mitigation of bias are essential to build trust among patients, professionals and the public.
Generative AI can help PROs become more accurate through personalisation, more meaningful by preserving narrative richness and more actionable through outputs that support decisions. Two complementary routes are visible: hybrid refinement, which augments established instruments with conversational capture and qualitative synthesis, and a paradigm shift toward language-native assessment grounded in open-ended narratives. Either path will require purpose-built validation, systematic bias auditing and implementation that prioritises workflow fit, interoperability and trust. If these scientific, systemic and social conditions are met, PROs can inform individual care and generate narrative datasets that also support population insights, strengthening the link between what patients report and how health systems deliver care.
Source: npj Digital Medicine
Image Credit: iStock