This is an overview page with metadata about this scientific paper. The full article is available from the publisher.
Evaluating the accuracy of ChatGPT model versions for giving care-seeking advice
Citations: 0
Authors: 3
Year: 2026
Abstract
BACKGROUND: Artificial intelligence tools such as ChatGPT are increasingly used by laypeople to support their care-seeking decisions, although the accuracy of newer models remains unclear. We aimed to evaluate the accuracy of the care-seeking advice generated by all currently available ChatGPT models. METHODS: We evaluated 22 ChatGPT models using 45 validated vignettes, each prompted ten times (9,900 total assessments). Each model classified the vignettes as requiring emergency care, non-emergency care, or self-care. We evaluated accuracy against each case's gold standard solution (determined by two physicians), examined the variability across trials, and tested algorithms that aggregate multiple recommendations to improve accuracy. RESULTS: o1-mini achieved the highest accuracy (74%), but we observed no overall improvement with newer models, although reasoning models (e.g., o4-mini) improved their accuracy in identifying self-care cases. Selecting the lowest urgency level across multiple trials improved accuracy by 4 percentage points. CONCLUSIONS: Although newer models increasingly provide self-care advice, their accuracy remains insufficient for standalone use. However, exploiting output variability with aggregation algorithms can improve the performance of existing models.
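The aggregation idea reported in the abstract (taking the lowest urgency level across repeated trials of the same vignette) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the label strings, and the example trials are hypothetical, and the ordering self-care < non-emergency care < emergency care is an assumption based on the three categories named in the abstract.

```python
from typing import Sequence

# Assumed ordering of the three categories, from least to most urgent.
URGENCY_ORDER = ["self-care", "non-emergency care", "emergency care"]
RANK = {level: i for i, level in enumerate(URGENCY_ORDER)}


def aggregate_lowest_urgency(trials: Sequence[str]) -> str:
    """Return the least urgent recommendation among repeated trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    return min(trials, key=lambda level: RANK[level])


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Share of predictions that match the gold standard labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one case is required")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


# Study size as stated in the abstract: 22 models x 45 vignettes x 10 trials.
TOTAL_ASSESSMENTS = 22 * 45 * 10

# Hypothetical trials for two vignettes (illustrative only, not study data).
example_trials = [
    ["non-emergency care", "self-care", "non-emergency care"],
    ["emergency care", "emergency care", "non-emergency care"],
]
example_gold = ["self-care", "emergency care"]

aggregated = [aggregate_lowest_urgency(t) for t in example_trials]
example_accuracy = accuracy(aggregated, example_gold)
```

In this made-up example, the rule corrects the first vignette (one self-care answer outweighs two non-emergency answers) but downgrades the second, which shows why such a rule has to be evaluated against the gold standard rather than assumed to help in every case.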
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,652 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,567 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,083 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,856 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations