This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

MAIA: A Multidimensional Benchmark for Assessing Medical AI Agents

2026 · 0 citations

Abstract

Large language models show remarkable potential in medical scenarios, especially as autonomous agents for complex clinical reasoning. Rigorous evaluation is essential to ensure their reliability in real-world healthcare applications. However, existing medical benchmarks suffer from narrow task scopes, dependence on public datasets prone to data leakage, and limited coverage of diverse agent capabilities. To address these gaps, we introduce Medical AI Assessment (MAIA), a comprehensive benchmark evaluating medical agents along three dimensions: retrieval-based medical questions generated through biomedical APIs, multi-hop reasoning tasks derived from curated biomedical knowledge graphs, and clinical-pathway reasoning questions constructed from authoritative guidelines. MAIA leverages large language models for automatic question generation, reducing manual effort while maintaining clinical fidelity and reasoning depth. Experiments across base and reasoning models reveal both strengths and gaps, underscoring MAIA’s value for advancing medical agent evaluation. MAIA is publicly available at https://huggingface.co/datasets/DiligentDing/MAIA.

Topics

Artificial Intelligence in Healthcare and Education
Explainable Artificial Intelligence (XAI)
Machine Learning in Healthcare
