Two-stage prompting framework with predefined verification steps for evaluating diagnostic reasoning tasks on two datasets

2025 · 0 citations · npj Digital Medicine · Open Access

Abstract

Despite their growing use in medicine, large language models (LLMs) demonstrate limited diagnostic reasoning. We evaluated a two-stage prompting framework with predefined verification steps (Initial Diagnosis → Verification → Final Diagnosis) on 589 MedQA-USMLE cases and 300 NEJM cases using GPT-4o and DeepSeek-V3. Each case was sampled five times, and the outputs were evaluated by blinded board-certified physicians. After verification of the initial diagnosis, the final diagnosis achieved up to 5.2% higher accuracy, 16.0% lower uncertainty, and 23.3% greater consistency. Of the three reasoning-error types examined, incorrect medical knowledge showed the largest reduction (63.0%) in the reasoning behind the final diagnosis. Compared with Chain-of-Thought prompting, the framework improved accuracy by up to 4.0%, reduced uncertainty by up to 4.9%, and increased consistency by up to 11.0%. These results suggest that the two-stage prompting framework with predefined verification steps may improve diagnostic reasoning, as observed on these two datasets under experimental conditions. Further datasets and models are needed to evaluate its performance more broadly.
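
The three-step flow (Initial Diagnosis → Verification → Final Diagnosis, repeated five times per case) can be sketched as a small orchestration loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `ask_model` is a hypothetical callable wrapping whichever LLM is used (e.g. GPT-4o or DeepSeek-V3), the prompt templates are invented placeholders because the abstract does not give the actual wording, and `consistency` (the share of samples agreeing with the most common final diagnosis) is an assumed proxy, since the abstract does not define how consistency was measured. In the study itself, outputs were judged by blinded physicians rather than by string matching.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical prompt templates (assumption): the paper's actual wording is not given.
INITIAL_PROMPT = "Case:\n{case}\n\nGive your initial diagnosis."
VERIFY_PROMPT = (
    "Case:\n{case}\n\nInitial diagnosis: {initial}\n\n"
    "Verify the initial diagnosis step by step: check whether it explains "
    "the key findings and whether an alternative diagnosis fits better."
)
FINAL_PROMPT = (
    "Case:\n{case}\n\nInitial diagnosis: {initial}\n"
    "Verification: {verification}\n\nGive your final diagnosis."
)


def two_stage_diagnosis(case: str, ask_model: Callable[[str], str]) -> dict:
    """Run Initial Diagnosis -> Verification -> Final Diagnosis for one case."""
    initial = ask_model(INITIAL_PROMPT.format(case=case))
    # The verification step sees the initial diagnosis it is asked to check.
    verification = ask_model(VERIFY_PROMPT.format(case=case, initial=initial))
    # The final step sees both the initial diagnosis and its verification.
    final = ask_model(
        FINAL_PROMPT.format(case=case, initial=initial, verification=verification)
    )
    return {"initial": initial, "verification": verification, "final": final}


def sample_case(case: str, ask_model: Callable[[str], str], n: int = 5) -> List[dict]:
    """Repeat the whole pipeline n times (the study sampled each case five times)."""
    return [two_stage_diagnosis(case, ask_model) for _ in range(n)]


def consistency(answers: List[str]) -> float:
    """Assumed metric: fraction of samples that match the most common answer."""
    if not answers:
        return 0.0
    _, top_count = Counter(answers).most_common(1)[0]
    return top_count / len(answers)


# Usage with a deterministic stub standing in for a real LLM client.
def fake_model(prompt: str) -> str:
    return "Pulmonary embolism"


runs = sample_case("Hypothetical case description", fake_model)
print(consistency([r["final"] for r in runs]))  # 1.0
```

Keeping each stage as a separate model call means the verification output is an explicit, inspectable artifact that the final step must condition on, which is the structural difference from a single Chain-of-Thought prompt.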

Topics

Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills · Machine Learning in Healthcare
