This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Artificial intelligence-driven virtual tumor board enhances precision care in myelodysplastic syndromes
0
Citations
20
Authors
2026
Year
Abstract
Background: Large language models (LLMs) perform well on standardized medical exam questions, but their reliability for complex hematology decision making is uncertain. We compared four general-purpose LLMs (GPT-4o, GPT-o3, Claude Sonnet 4, and DeepSeek-V3) with a Virtual MDS Panel (VMP), a coordinated multi-agent AI system in which domain-specialized, rule-bound software agents (WHO/ICC guidelines; IPSS-R/IPSS-M; NCCN) collaborate to generate tumor-board-level recommendations.

Methods: Each model generated diagnostic, prognostic, and treatment recommendations for 30 myelodysplastic syndrome cases. Nine international MDS experts from five institutions, blinded to model identity, completed 3,000 structured ratings using 5-point Likert scales for diagnosis, prognosis, and therapy, and classified errors by severity.

Results: General-purpose LLMs achieved modest expert ratings (overall mean scores: 3.7 for GPT-o3, 3.2 for GPT-4o, 3.1 for DeepSeek, and 3.0 for Claude) and contained major factual errors in at least 24% of responses. The VMP increased the proportion of outputs rated 4 or higher to 87% (vs. 34-66% for general-purpose models), improved mean scores to 4.3 overall (4.3 for diagnosis, 4.4 for prognosis, and 4.1 for therapy), and reduced major errors to 8%.

Conclusions: In this blinded evaluation of 30 complex MDS cases, general-purpose LLMs produced clinically important errors at rates that raise safety concerns for autonomous hematology decision making. The VMP, a rule-bound, multi-agent architecture, approached expert-level accuracy, supporting its potential role as an effective decision-support tool for MDS in the future.
Related Work
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations
Authors
- David M Swoboda
- Amy E DeZern
- James T. England
- Sangeetha Venugopal
- Thomas J. Kehoe
- Brandon J. Aubrey
- Marco Gabriele Raddi
- Angela Consagra
- Jiasheng Wang
- Jonathan Andreadakis
- Gustavo Rivero
- Maximilian Stahl
- Amer M Zeidan
- Torsten Haferlach
- Andrew M. Brunner
- Rena Buckstein
- Valeria Santini
- Matteo Giovanni Della Porta
- Mikkael A Sekeres
- Aziz Nazha
Institutions
- Tampa General Hospital (US)
- Johns Hopkins University (US)
- Sidney Kimmel Comprehensive Cancer Center (US)
- Sunnybrook Health Science Centre (CA)
- Health Sciences Centre (CA)
- Sylvester Comprehensive Cancer Center (US)
- University of South Florida (US)
- Massachusetts General Hospital (US)
- University of Siena (IT)
- Azienda Ospedaliero-Universitaria Careggi (IT)
- The Ohio State University (US)
- Yale Cancer Center (US)
- Munich Leukemia Laboratory (DE)
- Humanitas University (IT)
- IRCCS Humanitas Research Hospital (IT)
- Sidney Kimmel Cancer Center (US)