Bridging the Gap: Explainable Differential Diagnosis with Dual-Inference LLMs

The Evolving Role of AI in Clinical Diagnosis

The landscape of medical diagnostics is undergoing a profound transformation, driven in large part by the rapid advancements in artificial intelligence, particularly large language models (LLMs). While LLMs have demonstrated impressive capabilities in tasks ranging from natural language understanding to complex reasoning, their application in clinical settings, especially for differential diagnosis (DDx), presents unique challenges and opportunities. The ability of an LLM to not only suggest potential diagnoses but also to explain the reasoning behind these suggestions is paramount for clinical adoption and trust. This need for explainability is at the forefront of research aimed at enhancing clinical decision-making processes.

Introducing the Dual-Inf Framework for Explainable DDx

Addressing the critical gap in explainable differential diagnosis, researchers have introduced a novel framework named Dual-Inf. This approach is designed to harness the power of LLMs to generate not just accurate diagnostic suggestions but also high-quality, understandable explanations. The Dual-Inf framework is built on four key components:

1. A forward-inference module, powered by an LLM, generates initial diagnoses from patient-reported symptoms.
2. A backward-inference module, also an LLM, performs inverse reasoning by recalling representative symptoms associated with the initial diagnoses, moving from diagnoses back to symptoms. This bidirectional reasoning is crucial for a comprehensive view of the diagnostic possibilities.
3. An examination module, another LLM, scrutinizes the patient's notes alongside the outputs of the inference modules. It handles prediction assessment, such as completeness checks, and decision-making, including filtering out low-confidence diagnoses.
4. An iterative self-reflection mechanism refines the diagnoses: low-confidence candidates are fed back to the forward-inference module, prompting it to "think twice" and potentially arrive at a more accurate conclusion.
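The four components above can be sketched as a short control loop. This is a toy illustration, not the authors' implementation: the stub functions below simulate LLM calls with fixed symptom lists, and the function names, confidence threshold, and round limit are all assumptions.

```python
def forward_infer(note, feedback=None):
    """Stub: symptoms -> candidate diagnoses (a real system would prompt an LLM)."""
    base = ["migraine", "tension headache", "sinusitis"]
    # On self-reflection rounds, drop diagnoses previously flagged as low-confidence.
    return [d for d in base if not feedback or d not in feedback]

def backward_infer(diagnosis):
    """Stub: diagnosis -> representative symptoms it should explain."""
    symptoms = {
        "migraine": ["throbbing pain", "photophobia"],
        "tension headache": ["band-like pressure"],
        "sinusitis": ["facial pain", "nasal congestion"],
    }
    return symptoms.get(diagnosis, [])

def examine(note, diagnoses):
    """Stub: score each candidate by overlap of its recalled symptoms with the note."""
    return {
        d: sum(s in note for s in backward_infer(d)) / max(len(backward_infer(d)), 1)
        for d in diagnoses
    }

def dual_inf(note, max_rounds=3, threshold=0.5):
    """Iteratively refine a differential diagnosis with bidirectional inference."""
    feedback = None
    for _ in range(max_rounds):
        diagnoses = forward_infer(note, feedback)   # forward inference
        scored = examine(note, diagnoses)           # examination (uses backward inference)
        low_conf = [d for d, s in scored.items() if s < threshold]
        if not low_conf:
            return scored                           # all candidates pass the check
        feedback = low_conf                         # self-reflection: "think twice"
    return scored

print(dual_inf("Patient reports throbbing pain and photophobia for two days."))
```

In this sketch the backward-inference step supplies the symptoms that the examination module checks against the note, which is one plausible way to operationalize the "diagnoses back to symptoms" reasoning the paper describes.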

The Open-XDDx Dataset: A Foundation for Evaluation

A significant hurdle in developing and evaluating explainable DDx systems has been the lack of specialized datasets. To address this, the first publicly available DDx dataset, termed Open-XDDx, has been developed. This dataset comprises 570 clinical notes, each meticulously annotated with expert-derived explanations for differential diagnoses. The dataset is structured to facilitate a comprehensive evaluation of DDx explanations, covering a range of clinical specialties. Table 1 provides a statistical overview of the dataset's characteristics, including the number of notes, average note length, and the distribution of diagnoses and explanations per note and per diagnosis. Table 2 further breaks down the dataset by clinical specialty, highlighting the distribution across nine areas such as nervous system diseases, digestive system diseases, and cardiovascular diseases, underscoring the dataset's breadth and relevance to diverse medical fields.
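To make the dataset's structure concrete, here is one plausible way a single annotated note could be represented in code. The field names and schema are illustrative assumptions, not the actual release format of Open-XDDx.

```python
from dataclasses import dataclass, field

@dataclass
class XDDxNote:
    note_id: str
    specialty: str    # one of the nine clinical specialties
    text: str         # the clinical note itself
    # Each differential diagnosis maps to its expert-derived explanation.
    diagnoses: dict = field(default_factory=dict)

def avg_diagnoses_per_note(notes):
    """Average number of diagnoses per note, the kind of statistic Table 1 reports."""
    return sum(len(n.diagnoses) for n in notes) / len(notes)

example = XDDxNote(
    note_id="0001",
    specialty="nervous system diseases",
    text="Patient presents with...",
    diagnoses={"migraine": "Throbbing unilateral pain with photophobia supports this."},
)
print(avg_diagnoses_per_note([example]))  # one diagnosis in one note -> 1.0
```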

Performance and Efficacy of Dual-Inf

The efficacy of the Dual-Inf framework has been rigorously tested using prominent LLMs, including GPT-4 and GPT-4o, across the nine clinical specialties represented in the Open-XDDx dataset. Figure 1b illustrates differential diagnosis performance, showing results averaged over five runs with standard deviations. The results indicate that Dual-Inf significantly enhances DDx accuracy. Beyond diagnostic accuracy, the framework also improves the quality of the explanations that accompany its diagnoses.
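The reporting convention in Figure 1b (a mean with a standard deviation over five runs) amounts to a simple aggregation. The run values below are made-up placeholders, not results from the paper:

```python
from statistics import mean, stdev

# Hypothetical per-run accuracies for one model on one specialty (five runs).
runs = [0.71, 0.74, 0.69, 0.73, 0.72]

print(f"accuracy: {mean(runs):.3f} ± {stdev(runs):.3f}")
```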
