This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Two-stage prompting framework with predefined verification steps for evaluating diagnostic reasoning tasks on two datasets
Citations: 0
Authors: 2
Year: 2025
Abstract
Despite their growing use in medicine, large language models (LLMs) demonstrate limited diagnostic reasoning. We evaluated a two-stage prompting framework with predefined verification steps (Initial Diagnosis → Verification → Final Diagnosis) on 589 MedQA-USMLE and 300 NEJM cases using GPT-4o and DeepSeek-V3. Each case was sampled five times and evaluated by blinded board-certified doctors. After verification of the initial diagnosis, the final diagnosis achieved up to 5.2% higher accuracy, 16.0% lower uncertainty, and 23.3% greater consistency. Of the three reasoning-error types assessed, incorrect medical knowledge showed the largest reduction in the reasoning behind the final diagnosis (63.0%). Compared with Chain-of-Thought, the framework yielded up to 4.0% higher accuracy, 4.9% lower uncertainty, and 11.0% greater consistency. These results suggest that the two-stage prompting framework with predefined verification steps may contribute to improved diagnostic reasoning, as observed on these two datasets under experimental conditions. Further datasets and models are needed to evaluate its performance more broadly.
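The Initial Diagnosis → Verification → Final Diagnosis flow and the five-fold sampling described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not give the prompt wording or the predefined verification steps, `ask` is a hypothetical callable standing in for a call to a model such as GPT-4o or DeepSeek-V3, and `majority_agreement` is a simple stand-in for consistency, not the metric used in the study (where outputs were rated by blinded physicians).

```python
from collections import Counter
from typing import Callable, List


def two_stage_diagnosis(case: str, ask: Callable[[str], str]) -> str:
    """Run Initial Diagnosis -> Verification -> Final Diagnosis for one case.

    `ask` is a hypothetical function that sends a prompt to an LLM and
    returns its text reply. The prompt wording below is illustrative only.
    """
    # Stage 1: initial diagnosis from the case description alone.
    initial = ask(f"Case:\n{case}\n\nGive the single most likely diagnosis.")

    # Verification: re-examine the initial diagnosis against the case.
    verification = ask(
        f"Case:\n{case}\n\nProposed diagnosis: {initial}\n\n"
        "Check this diagnosis against the case findings step by step."
    )

    # Stage 2: final diagnosis, informed by the verification notes.
    final = ask(
        f"Case:\n{case}\n\nProposed diagnosis: {initial}\n\n"
        f"Verification notes:\n{verification}\n\n"
        "State the final diagnosis."
    )
    return final.strip()


def sample_case(case: str, ask: Callable[[str], str], n_samples: int = 5) -> List[str]:
    """Repeat the pipeline n_samples times (five per case in the abstract)."""
    return [two_stage_diagnosis(case, ask) for _ in range(n_samples)]


def majority_agreement(answers: List[str]) -> float:
    """Share of samples matching the most common answer (assumed proxy)."""
    counts = Counter(a.lower() for a in answers)
    return counts.most_common(1)[0][1] / len(answers)
```

In practice `ask` would wrap whichever model client is in use; keeping it as a plain callable lets the pipeline be exercised with a stub before any model is attached.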