OpenAlex · Updated hourly · Last updated: 09.04.2026, 23:23

This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Comparative Evaluation of Large Language Models in Clinical Diagnostics for Real-World Medical Cases

2026 · 0 citations · Applied Sciences · Open Access

Citations: 0 · Authors: 7 · Year: 2026

Abstract

Background/Aim: Although large language models (LLMs) achieve >90% accuracy on medical multiple-choice questions, their actual diagnostic utility remains unproven. This study compared four approaches to general medical diagnostics using real clinical data: a standalone LLM, an LLM with Retrieval-Augmented Generation (RAG), an LLM with an expert system, and full integration of LLM, RAG, and expert system.

Methods: Twenty LLMs were tested on 1655 unpublished clinical cases (paediatrics and internal medicine) in Polish and English, yielding 264,800 diagnostic evaluations. TOP-1, TOP-3, and grouped International Classification of Diseases, 10th Revision (ICD-10) accuracy were measured.

Results: Standalone LLMs achieved only 16–20% TOP-1 accuracy. RAG added minimal benefit (+2–5 percentage points). The expert system improved performance 2.5-fold (55% in paediatrics, 39% in internal medicine). Unexpectedly, combining all components reduced accuracy compared with the expert system alone, revealing an “integration paradox”.

Conclusions: LLMs alone are insufficient for clinical diagnostics in practical applications with incomplete data. Developers of clinical decision support systems should not treat LLMs as standalone diagnostic engines. Expert systems based on machine learning algorithms provide better support and should serve as the primary component in hybrid architectures. Combining LLM, RAG, and expert systems without deliberate output weighting paradoxically reduces performance. Hybrid systems should therefore implement dynamic source selection or prediction-weighting mechanisms rather than simple integration.
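The abstract names TOP-1, TOP-3 and grouped ICD-10 accuracy but does not define them on this page. The Python sketch below shows one plausible way to compute such metrics; it is not the authors' evaluation code. It assumes each case has a single reference ICD-10 code and a ranked list of predicted codes, that TOP-k counts a hit when the reference code appears among the first k predictions, and that "grouped" accuracy (a hypothetical definition chosen for illustration) counts a hit when the 3-character ICD-10 category matches, e.g. J18.9 vs J18.0. The names `Case`, `normalize`, `icd10_category`, `top_k_accuracy`, `grouped_accuracy` and the example cases are all illustrative. The final line checks that 264,800 equals 20 models × 1655 cases × 2 languages × 4 approaches, a breakdown inferred from the abstract rather than stated in it.

```python
from dataclasses import dataclass


@dataclass
class Case:
    true_code: str        # reference ICD-10 code, e.g. "J18.9"
    predicted: list[str]  # model's ranked diagnoses, best first


def normalize(code: str) -> str:
    """Trim whitespace and upper-case a code so 'j18.9 ' matches 'J18.9'."""
    return code.strip().upper()


def icd10_category(code: str) -> str:
    """Hypothetical grouping: the 3-character ICD-10 category ('J18.9' -> 'J18')."""
    return normalize(code).replace(".", "")[:3]


def top_k_accuracy(cases: list[Case], k: int) -> float:
    """Share of cases whose exact reference code is among the first k predictions."""
    if not cases:
        raise ValueError("no cases to evaluate")
    hits = sum(
        normalize(c.true_code) in {normalize(p) for p in c.predicted[:k]}
        for c in cases
    )
    return hits / len(cases)


def grouped_accuracy(cases: list[Case], k: int = 1) -> float:
    """Like top_k_accuracy, but only the 3-character category must match."""
    if not cases:
        raise ValueError("no cases to evaluate")
    hits = sum(
        icd10_category(c.true_code) in {icd10_category(p) for p in c.predicted[:k]}
        for c in cases
    )
    return hits / len(cases)


# Illustrative, made-up cases (not data from the study).
cases = [
    Case("J18.9", ["J18.0", "J20.9", "J18.9"]),  # exact hit only at rank 3
    Case("K35.8", ["K35.8", "K37", "N10"]),      # exact hit at rank 1
    Case("E10.9", ["E11.9", "E16.2", "R73.0"]),  # different category: miss
    Case("A09", ["K52.9", "A08.4", "R19.7"]),    # miss
]

print(top_k_accuracy(cases, 1))    # 0.25
print(top_k_accuracy(cases, 3))    # 0.5
print(grouped_accuracy(cases, 1))  # 0.5

# Inferred (not stated) design: 20 models x 1655 cases x 2 languages x 4 approaches.
assert 20 * 1655 * 2 * 4 == 264_800
```

Under these assumptions, grouped accuracy is never lower than exact TOP-k accuracy for the same k, because every exact match also matches at the category level. A coarser grouping (for example by ICD-10 chapter) would raise it further.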

Similar works