This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Evaluating the performance of generative AI in assisting the differential diagnosis of weight loss
Citations: 0
Authors: 4
Year: 2026
Abstract
OBJECTIVES: To systematically evaluate the performance of generative artificial intelligence (GenAI) models, DeepSeek-V3 and the Qwen3 series, in the differential diagnosis of weight loss.

METHODS: between January 1, 2012 and June 2, 2025, containing the term "weight loss" in the title or abstract. Two senior general practitioners independently reviewed each case to determine whether it met predefined diagnostic criteria for weight loss (emaciation). Cases that did not meet these criteria, had incomplete information, or involved clearly defined specialty-specific diagnoses and treatments were excluded. The remaining cases were then compiled into standardized clinical case summaries. These summaries were presented to DeepSeek-V3 and the Qwen3 series models (Qwen3-235B-A22B, Qwen3-30B-A3B, and Qwen3-32B) to generate ranked lists of the top 10 differential diagnoses. The models were not specifically fine-tuned for this task. Sensitivity, precision, and F1-score were used to evaluate performance. Intergroup comparisons were performed using McNemar's test and Cochran's Q test.

RESULTS: >0.05).

CONCLUSIONS: Domestic GenAI models exhibit a characteristic of "breadth over precision" in the differential diagnosis of weight loss, with DeepSeek-V3 performing better at key diagnostic levels. Although the sensitivity and precision for the top-ranked diagnosis require improvement, these models have the potential to serve as effective clinical decision support tools, broadening the diagnostic perspectives of general practitioners.
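The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not state how hits were counted, so the sketch assumes that each case has a set of reference diagnoses and that a listed diagnosis counts as correct only on an exact string match. The function names, the cutoff parameter `k`, and the example cases are hypothetical. The second function is an exact (binomial) form of McNemar's test for paired hit/miss outcomes of two models on the same cases; whether the study used the exact or the chi-square form is not stated.

```python
from math import comb


def top_k_metrics(ranked_lists, references, k):
    """Pooled sensitivity, precision and F1 over all cases at cutoff k.

    Assumption: a diagnosis is a true positive only if its string
    exactly matches one of the case's reference diagnoses.
    """
    tp = fp = fn = 0
    for ranked, ref in zip(ranked_lists, references):
        top = set(ranked[:k])
        tp += len(top & ref)   # reference diagnoses that were listed
        fp += len(top - ref)   # listed diagnoses that are not in the reference
        fn += len(ref - top)   # reference diagnoses that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, precision, f1


def mcnemar_exact(hits_a, hits_b):
    """Two-sided exact McNemar p-value for paired boolean outcomes."""
    b = sum(1 for a, h in zip(hits_a, hits_b) if a and not h)  # only A correct
    c = sum(1 for a, h in zip(hits_a, hits_b) if h and not a)  # only B correct
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical example cases, not data from the study.
ranked_lists = [
    ["hyperthyroidism", "diabetes mellitus", "gastric cancer"],
    ["depression", "celiac disease", "tuberculosis"],
]
references = [{"diabetes mellitus"}, {"tuberculosis"}]

for k in (1, 3):
    print(k, top_k_metrics(ranked_lists, references, k))
```

In this toy example, widening the cutoff from 1 to 3 raises sensitivity while every extra listed diagnosis that is not in the reference lowers precision, which is one way to read the "breadth over precision" pattern described in the conclusions.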