This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Comparative evaluation of ChatGPT-4o and DeepSeek-V3 in head and neck oncology
Citations: 2
Authors: 6
Year: 2025
Abstract
BACKGROUND: Large language models (LLMs) are increasingly used in clinical decision-making and patient education, including in complex specialties such as head and neck cancer (HNC). OBJECTIVE: To evaluate the performance of ChatGPT-4o and DeepSeek-V3 in answering HNC-related clinical questions. METHODS: A set of 154 questions across six clinical categories was submitted twice to both models. Responses were independently graded by head and neck surgeons using a four-point accuracy scale. Accuracy, reproducibility, and inter-model agreement were assessed. RESULTS: [...] = .08); however, these differences did not reach statistical significance. Reproducibility was high for both models (ChatGPT-4o: 96.1%; DeepSeek-V3: 96.8%). CONCLUSIONS: Both models demonstrated strong accuracy and consistency in HNC-related queries. SIGNIFICANCE: LLMs hold promise as reliable tools in clinical decision-making and patient education within HNCs when used with careful consideration of their inherent limitations.
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,646 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,554 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,071 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,851 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations