This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Diagnostic Performance of ChatGPT‐4o and DeepSeek‐3 Differential Diagnosis of Complex Oral Lesions: A Multimodal Imaging and Case Difficulty Analysis
Citations: 22
Authors: 7
Year: 2025
Abstract
Background: AI models such as ChatGPT‐4o and DeepSeek‐3 show diagnostic promise, but their reliability for complex, image‐based oral lesions remains unclear. This study aimed to evaluate and compare the diagnostic accuracy of ChatGPT‐4o and DeepSeek‐3, despite their differing modalities, against oral medicine (OM) experts across varied lesion types and case difficulty levels.

Methods: Eighty standardized clinical vignettes derived from real‐world oral disease cases, including clinical images/radiographs, were evaluated. Differential diagnoses were generated by ChatGPT‐4o, DeepSeek‐3, and four board‐certified OM specialists, with accuracy assessed at the Top‐1, Top‐3, and Top‐5 levels.

Results: OM specialists consistently achieved the highest diagnostic accuracy. However, DeepSeek‐3 significantly outperformed ChatGPT‐4o at the Top‐3 level (p = 0.0153) and showed greater robustness in high‐difficulty and inflammatory cases despite its text‐only modality. Multimodal imaging enhanced diagnostic accuracy. Regression analysis identified lesion type and imaging modality as positive predictors, while diagnostic difficulty negatively affected Top‐1 performance.

Conclusions: Remarkably, the text‐only DeepSeek‐3 model exceeded the diagnostic performance of the multimodal ChatGPT‐4o model for complex oral lesions, highlighting its structured reasoning capabilities and reduced hallucination rate. These findings underscore the potential of non‐vision LLMs in diagnostic support while emphasizing the critical need for expert oversight in complex scenarios.
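The abstract reports accuracy at the Top‐1, Top‐3, and Top‐5 levels. As a rough illustration of what such a metric measures, the sketch below counts a case as correct when the reference diagnosis appears among the first k entries of a ranked differential. The function name `top_k_accuracy`, the case-insensitive string matching, and the toy data are all assumptions made for illustration. The abstract does not describe how the study scored matches, and none of the values below come from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k entries.

    Matching is a plain case-insensitive string comparison, which is an
    assumption of this sketch, not the scoring procedure used in the study.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for candidates, ref in zip(ranked, truth)
        if ref.lower() in (c.lower() for c in candidates[:k])
    )
    return hits / len(truth)


# Hypothetical toy data: four cases, each with a ranked differential.
differentials = [
    ["oral lichen planus", "leukoplakia", "candidiasis"],
    ["pyogenic granuloma", "peripheral giant cell granuloma", "fibroma"],
    ["ameloblastoma", "odontogenic keratocyst", "dentigerous cyst"],
    ["aphthous ulcer", "herpetic ulcer", "traumatic ulcer"],
]
reference = [
    "leukoplakia",
    "pyogenic granuloma",
    "dentigerous cyst",
    "squamous cell carcinoma",
]

for k in (1, 3):
    print(f"Top-{k} accuracy: {top_k_accuracy(differentials, reference, k):.2f}")
```

On this toy data, only the second case is correct at Top‐1, while the first three cases are correct at Top‐3, so accuracy can only stay the same or rise as k grows.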