This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative performance of Chinese and international large language models on the Chinese radiology attending physician qualification examination
Citations: 1
Authors: 8
Year: 2025
Abstract
This study evaluates the accuracy and reliability of six large language models (LLMs) in radiology: three Chinese (Doubao, Kimi, DeepSeek) and three international (ChatGPT-4o, Gemini 2.0 Pro, Grok3), using simulated questions from the 2025 Chinese Radiology Attending Physician Qualification Examination (CRAPQE). The analysis covered 400 CRAPQE-simulated questions spanning several formats (A1, A2-A4, B, and C-type) and modalities (text-only and image-based). Expert radiologists scored responses against official answer keys. Performance comparisons within and between the Chinese and international LLM groups assessed overall, unit-specific, question-type-specific, and modality-specific accuracy. All LLMs passed the CRAPQE simulation, demonstrating proficiency comparable to a radiology attending physician. Chinese LLMs achieved a higher mean accuracy (87.2%) than international LLMs (80.4%, P < 0.05), excelling on text-only and A1-type questions (P < 0.05). DeepSeek (91.6%) and Doubao (89.5%) outperformed Kimi (80.5%, P < 0.0167), whereas the international LLMs showed no significant differences among themselves (P > 0.05). All models surpassed the passing threshold on image-based questions but performed worse than on text questions, with no difference between groups (P > 0.05). This pioneering comparison highlights the potential of LLMs in radiology. Chinese models outperformed their international counterparts, likely owing to localized training, providing evidence to guide the development of medical AI.
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations