Advancing Radiology: A Two-Stage LLM Approach for Enhanced Entity and Relationship Mapping in Reports

Introduction: The Challenge of Radiology Report Analysis

Radiology reports are indispensable for accurate clinical assessment and prognosis, providing detailed insights derived from medical imaging modalities such as chest CT and brain MRI. However, the free-text nature of these reports presents a significant challenge for automated analysis. Extracting precise relational information and classifying entities accurately requires a deep understanding of complex medical narratives. Traditional methods often struggle with the linguistic variability, nuances, and implicit information present in these documents, leading to a critical gap in effectively structuring radiologic findings and tracking disease progression over time.

A Novel Two-Stage Pipeline for Enhanced Analysis

To address these challenges, a two-stage natural language processing (NLP) pipeline has been developed that combines Bidirectional Encoder Representations from Transformers (BERT) with a large language model (LLM). The approach aims to improve the accuracy of entity classification and relationship mapping in radiology reports. It targets two tasks: lesion-location mapping in chest CT reports and diagnosis-episode mapping in brain MRI reports, both of which matter for clinical decision-making and for understanding disease trajectories.

Stage 1: Entity Key Classification with BERT

The first stage of the pipeline is Entity Key Classification. A BERT-based model identifies clinically relevant entities in the radiology report and assigns each one a class. Because BERT represents each term in the context of the surrounding text, it is well suited to picking out mentions of lesions, diagnoses, and anatomical locations. This stage serves as the extraction layer: it captures the entities that the second stage will later relate to one another.
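
To make the role of this stage concrete, here is a minimal sketch of the kind of output it might hand to the next stage. Everything in it is an assumption for illustration: the Entity record, the three key names, and in particular the keyword lookup, which is only a placeholder standing in for the trained BERT classifier and is not the method described in the source.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical vocabulary standing in for a trained BERT classifier.
# The key names (lesion, location, diagnosis) follow the entity types
# mentioned in the text; the phrases themselves are illustrative.
KEYWORD_TO_KEY = {
    "nodule": "lesion",
    "consolidation": "lesion",
    "right upper lobe": "location",
    "left lower lobe": "location",
    "infarction": "diagnosis",
}


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the report
    key: str    # predicted entity key (class)
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


def classify_entities(report: str) -> list[Entity]:
    """Return entity mentions with their keys, ordered by position."""
    lowered = report.lower()
    entities = []
    for phrase, key in KEYWORD_TO_KEY.items():
        for match in re.finditer(re.escape(phrase), lowered):
            start, end = match.start(), match.end()
            entities.append(Entity(report[start:end], key, start, end))
    return sorted(entities, key=lambda e: e.start)


report = "A 6 mm nodule in the right upper lobe. No consolidation."
for entity in classify_entities(report):
    print(entity.key, "->", entity.text)
```

Note that the placeholder still reports "consolidation" even though the sentence negates it. In the pipeline described here, deciding whether an entity is actually present is left to the second stage.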

Stage 2: Relationship Mapping with LLMs

In the second stage, Relationship Mapping, the entities extracted by the BERT model are passed to an LLM, which uses its contextual and semantic understanding to infer how those entities relate to one another. This stage also accounts for whether each entity is actually present, which lets it recognize negations such as "no evidence of nodule". For chest CT reports, the pipeline maps lesion-location pairs; for brain MRI reports, it maps diagnosis-episode pairs. The result is a structured view of the radiologic findings that can also capture temporal patterns indicative of disease progression.
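
The hand-off between the two stages can be sketched as a prompt-building step plus a parser for the model's answer. This is a sketch under stated assumptions, not the authors' prompt or output format: the function names build_prompt and parse_pairs, the JSON fields "lesion", "location", and "present", and the canned response are all hypothetical, and no real LLM is called.

```python
import json


def build_prompt(report, entities):
    """Build a hypothetical Stage 2 prompt from (text, key) entity tuples."""
    lines = [f"- [{i}] {key}: {text}" for i, (text, key) in enumerate(entities)]
    return (
        "Report:\n" + report + "\n\n"
        "Entities:\n" + "\n".join(lines) + "\n\n"
        "Return a JSON list of objects with the fields "
        '"lesion", "location", and "present" (false if the lesion is negated).'
    )


def parse_pairs(llm_output):
    """Keep only lesion-location pairs the model marked as present."""
    pairs = json.loads(llm_output)
    return [(p["lesion"], p["location"]) for p in pairs if p.get("present", False)]


report = "A 6 mm nodule in the right upper lobe. No consolidation."
entities = [
    ("nodule", "lesion"),
    ("right upper lobe", "location"),
    ("consolidation", "lesion"),
]
print(build_prompt(report, entities))

# Canned response used in place of a real LLM call (assumed format).
canned_response = """[
  {"lesion": "nodule", "location": "right upper lobe", "present": true},
  {"lesion": "consolidation", "location": null, "present": false}
]"""
print(parse_pairs(canned_response))
```

Asking for an explicit presence flag is one simple way to keep negated findings out of the structured output; the source does not say how the actual pipeline encodes this.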

Dataset and Performance Metrics

The pipeline was trained and validated on a dataset of over 400,000 radiology reports from Asan Medical Center in Seoul. Its effectiveness was measured with macro F1-scores: 77.39 for chest CT reports and 70.58 for brain MRI reports. According to the reported results, the integrated BERT and LLM approach improves on either model used individually.
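
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The sketch below computes it on a toy label set; the labels and predictions are invented for illustration, and the source does not specify exactly which classes or pairs the reported scores were averaged over. The function returns a value between 0 and 1, whereas the figures above appear to be on a 0 to 100 scale.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example (invented data): one lesion is misclassified as a location.
y_true = ["lesion", "lesion", "location", "diagnosis"]
y_pred = ["lesion", "location", "location", "diagnosis"]
print(f"{macro_f1(y_true, y_pred):.4f}")  # 0.7778
```

Here the per-class F1 scores are 2/3 for lesion, 2/3 for location, and 1 for diagnosis, giving a macro average of 7/9.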

Key Contributions and Advantages

This research makes several key contributions to the field of medical NLP:

Integrated Approach: The pipeline effectively combines BERT