This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Evaluating the accuracy of ChatGPT model versions for giving care-seeking advice

2026 · 0 citations · Communications Medicine · Open Access


Abstract

BACKGROUND: Artificial intelligence tools such as ChatGPT are increasingly used by laypeople to support their care-seeking decisions, although the accuracy of newer models remains unclear. We aimed to evaluate the accuracy of care-seeking advice generated by all currently available ChatGPT models.

METHODS: We evaluated 22 ChatGPT models using 45 validated vignettes, each prompted ten times per model (22 × 45 × 10 = 9,900 assessments in total). Each model classified every vignette as requiring emergency care, non-emergency care, or self-care. We evaluated accuracy against each case's gold-standard solution (determined by two physicians), examined the variability across trials, and tested algorithms that aggregate multiple recommendations to improve accuracy.

RESULTS: We show that o1-mini achieves the highest accuracy (74%), but we observe no overall improvement with newer models, although reasoning models (e.g., o4-mini) identify self-care cases more accurately. Selecting the lowest urgency level across multiple trials improves accuracy by 4 percentage points.

CONCLUSIONS: Although newer models increasingly provide self-care advice, their accuracy remains insufficient for standalone use. However, exploiting output variability with aggregation algorithms can improve the performance of existing models.
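The abstract names only one aggregation rule explicitly: selecting the lowest urgency level across repeated trials. The Python sketch below illustrates that rule and, for comparison, a simple majority vote, then scores the aggregated recommendations against gold-standard labels. All names (`Urgency`, `aggregate_lowest`, `aggregate_majority`, `accuracy`), the majority-vote tie-break, and the toy data are hypothetical illustrations, not the authors' implementation or results.

```python
from collections import Counter
from enum import IntEnum


class Urgency(IntEnum):
    """Hypothetical encoding of the three triage levels, ordered least to most urgent."""
    SELF_CARE = 0
    NON_EMERGENCY = 1
    EMERGENCY = 2


def aggregate_lowest(recommendations: list[Urgency]) -> Urgency:
    """Return the least urgent level recommended in any of the repeated trials."""
    if not recommendations:
        raise ValueError("need at least one recommendation")
    return min(recommendations)


def aggregate_majority(recommendations: list[Urgency]) -> Urgency:
    """Return the most frequent level; ties go to the more urgent level (an assumption)."""
    if not recommendations:
        raise ValueError("need at least one recommendation")
    counts = Counter(recommendations)
    return max(counts, key=lambda level: (counts[level], level))


def accuracy(predictions: list[Urgency], gold: list[Urgency]) -> float:
    """Share of vignettes whose aggregated prediction matches the gold-standard level."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical toy data: repeated recommendations for three vignettes (not study data).
    trials = [
        [Urgency.NON_EMERGENCY, Urgency.SELF_CARE, Urgency.NON_EMERGENCY],
        [Urgency.EMERGENCY, Urgency.EMERGENCY, Urgency.EMERGENCY],
        [Urgency.NON_EMERGENCY, Urgency.NON_EMERGENCY, Urgency.SELF_CARE],
    ]
    gold = [Urgency.SELF_CARE, Urgency.EMERGENCY, Urgency.NON_EMERGENCY]

    lowest = [aggregate_lowest(t) for t in trials]
    majority = [aggregate_majority(t) for t in trials]
    print("lowest-urgency accuracy:", accuracy(lowest, gold))
    print("majority-vote accuracy:", accuracy(majority, gold))
```

Encoding the levels as an `IntEnum` makes "lowest urgency" a plain `min` over the trials; whether this rule helps in practice depends on how a model's errors are distributed across urgency levels, which the abstract reports only in aggregate.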


Institutions

Technische Universität Berlin (DE)

Topics

Artificial Intelligence in Healthcare and Education
Clinical Reasoning and Diagnostic Skills
Digital Mental Health Interventions
