This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Beyond Accuracy: Assessing LLMs' Ability to Recognize Their Limits in Medical Decision-Making
Citations: 0 · Authors: 4 · Year: 2025
Abstract
While Large Language Models (LLMs) demonstrate impressive medical capabilities through Retrieval-Augmented Generation (RAG) and domain optimization, a critical question remains: can LLMs autonomously recognize when to seek external help rather than provide independent medical recommendations? This metacognitive capability is essential for safe healthcare deployment. To address this gap, we introduce a novel evaluation framework assessing LLMs' autonomous help-seeking behavior through three workflows: Force-RAG (mandated external retrieval), No-RAG (internal knowledge only), and Auto-RAG (autonomous decision-making). Our comprehensive evaluation of 13 LLM configurations across six clinical departments using 954 real-world cases reveals three key insights: (1) larger models do not necessarily exhibit superior help-seeking calibration; (2) reasoning strategies significantly impact metacognitive performance across medical domains; (3) proprietary models demonstrate superior autonomy in balancing self-reliance with appropriate help-seeking. These findings challenge conventional scaling assumptions and establish help-seeking behavior as fundamental to medical AI reliability.
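The three workflows named in the abstract suggest a simple routing harness. The Python sketch below is purely illustrative of how the Force-RAG, No-RAG, and Auto-RAG conditions could be wired together; the `StubLLM` class, the `retrieve` function, and the uncertainty heuristic in `wants_retrieval` are hypothetical placeholders, not the authors' implementation.

```python
from dataclasses import dataclass


def retrieve(case: str) -> str:
    """Placeholder retriever; a real system would query an external medical corpus."""
    return f"[retrieved evidence for: {case}]"


@dataclass
class StubLLM:
    """Placeholder model interface standing in for any evaluated configuration."""
    name: str

    def wants_retrieval(self, case: str) -> bool:
        # In a real Auto-RAG setup the model itself would be prompted to
        # judge its uncertainty; this keyword check is a stand-in.
        return "rare" in case.lower()

    def answer(self, case: str, evidence: str | None) -> str:
        source = "with retrieved evidence" if evidence else "from internal knowledge"
        return f"{self.name} answers {source}."


def run_case(llm: StubLLM, case: str, workflow: str) -> str:
    if workflow == "force_rag":
        # Force-RAG: external retrieval is mandated for every case.
        return llm.answer(case, retrieve(case))
    if workflow == "no_rag":
        # No-RAG: the model must rely on internal knowledge only.
        return llm.answer(case, None)
    if workflow == "auto_rag":
        # Auto-RAG: the model first decides whether it needs help,
        # and retrieval happens only on request.
        evidence = retrieve(case) if llm.wants_retrieval(case) else None
        return llm.answer(case, evidence)
    raise ValueError(f"unknown workflow: {workflow}")


if __name__ == "__main__":
    llm = StubLLM(name="model-A")
    for wf in ("force_rag", "no_rag", "auto_rag"):
        print(wf, "->", run_case(llm, "Patient with a rare arrhythmia", wf))
```

Under this reading, comparing a model's accuracy and retrieval requests across the three conditions is what exposes its help-seeking calibration.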
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations