
This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Evaluation of large language models for diagnostic impression generation from brain MRI report findings: a multicenter benchmark and reader study

2026 · 0 citations · npj Digital Medicine · Open Access

Citations: 0 · Authors: 12 · Year: 2026

Abstract

Automatically deriving radiological diagnoses from brain MRI report findings is challenging because the task is complex and demands domain expertise. This study evaluated 10 large language models (LLMs) on generating diagnoses from brain MRI report findings, using 4293 reports (9973 diagnostic labels) covering 15 brain disease categories from three medical centers. DeepSeek-R1 achieved the highest performance among the evaluated models on the full dataset and across clinical scenarios and subgroups, particularly when provided with structured report findings and clinical information. A top-three differential-diagnosis prompting strategy performed best, with 97.6% patient-level accuracy versus 87.1% for single-diagnosis prompting. The diagnostic performance of six radiologists was assessed with and without DeepSeek-R1 assistance on 500 reports. Integration of DeepSeek-R1 significantly improved diagnostic accuracy (AUPRC: 0.774-0.893) and reduced reading time (from 61 to 53 s), with more pronounced benefits for junior radiologists. Our findings indicate that effective automated diagnostic impression generation in brain MRI reporting requires advanced large-scale LLMs such as DeepSeek-R1. With optimized prompting and input strategies, this framework may serve as a supportive tool for drafting brain MRI reports and may contribute to improved workflow efficiency in radiology practice.
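
The comparison between top-three and single-diagnosis prompting can be illustrated with a small sketch of top-k accuracy. The abstract does not state the exact matching rule, so this example assumes a case counts as correct when any of the first k predicted diagnoses matches at least one reference label. The function name `top_k_accuracy` and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Set


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Set[str]],
    k: int,
) -> float:
    """Fraction of cases where any of the first k predictions hits a reference label.

    Assumption (not from the paper): a single matching label is enough
    for the case to count as correct.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for ranked, truth in zip(predictions, references)
        if any(diagnosis in truth for diagnosis in ranked[:k])
    )
    return hits / len(predictions)


# Hypothetical toy data: ranked model outputs and reference labels per case.
predictions = [
    ["glioma", "metastasis", "abscess"],
    ["infarction", "demyelination", "glioma"],
    ["meningioma", "schwannoma", "metastasis"],
    ["demyelination", "small vessel disease", "infarction"],
]
references = [
    {"metastasis"},
    {"infarction"},
    {"meningioma"},
    {"small vessel disease", "infarction"},
]

single = top_k_accuracy(predictions, references, k=1)  # only the first guess counts
top3 = top_k_accuracy(predictions, references, k=3)    # any of three guesses counts
print(f"single-diagnosis: {single:.2f}, top-three: {top3:.2f}")
```

Under this matching rule, top-three accuracy can never be lower than single-diagnosis accuracy on the same cases, which is worth keeping in mind when comparing the two reported figures.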

Similar works