This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Accuracy of a Commercial Large Language Model (ChatGPT) to Predict the Diagnosis for Pre-Hospital Patients Suitable for Ambulance Dispatch: Diagnostic Accuracy Study
Citations: 0
Authors: 6
Year: 2026
Abstract
Introduction: While ambulance dispatch guided by artificial intelligence (AI) could be useful, little is known about the accuracy of AI in making patient diagnoses based on the pre-hospital patient care report (PCR). The primary objective of this study was to assess the accuracy of ChatGPT (OpenAI, Inc., San Francisco, CA, USA) in predicting a patient's diagnosis from the PCR, comparing it to a reference standard assigned by experienced paramedics. The secondary objective was to classify cases where the AI diagnosis did not agree with the reference standard as paramedic correct, ChatGPT correct, or equally correct.

Methods: In this diagnostic accuracy study, a convenience sample of PCRs from paramedic students was analyzed by ChatGPT-4 to determine the most likely diagnosis. A reference standard was provided by an experienced paramedic, who reviewed each PCR and gave a three-item differential diagnosis. A trained pre-hospital professional assessed the ChatGPT diagnosis as concordant or non-concordant with one of the three paramedic diagnoses. If non-concordant, two board-certified emergency physicians independently decided whether the ChatGPT or the paramedic diagnosis was more likely to be correct.

Results: ChatGPT-4 triaged 78/104 (75.0%) PCRs correctly (95% confidence interval 65.3% to 82.7%). Among the 26 cases of disagreement, the emergency physicians judged that in 6/26 (23.1%) the paramedic diagnosis was more likely to be correct. There was only one case of the 104 (0.96%) where dispatch decisions based on the AI-guided diagnosis would have been potentially dangerous to the patient (under-triage).

Conclusion: In this study, the overall accuracy of ChatGPT in diagnosing patients based on their emergency medical services PCR was 75.0%. In cases where the ChatGPT diagnosis was considered less likely than the paramedic diagnosis, the AI diagnosis was most commonly more critical than the paramedic diagnosis, potentially leading to over-triage. The under-triage rate was low, at less than 1%.
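The abstract does not state which method was used to compute the 95% confidence interval around the 78/104 proportion. As a rough check, a Wilson score interval (a common choice for binomial proportions, assumed here for illustration) gives approximately 65.9% to 82.3%, close to but not identical to the reported 65.3% to 82.7%, which suggests the authors may have used another method such as Clopper-Pearson:

```python
import math

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% confidence interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - margin) / denom, (center + margin) / denom

lo, hi = wilson_ci(78, 104)
print(f"{78/104:.1%} (95% CI {lo:.1%} to {hi:.1%})")  # 75.0% (95% CI 65.9% to 82.3%)
```

This is only a plausibility check on the reported figures, not a reproduction of the study's statistical analysis.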
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,439 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,315 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,756 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,526 citations