This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the medical reasoning skills of GPT-4 in complex ophthalmology cases
Citations: 50
Authors: 13
Year: 2024
Abstract
BACKGROUND/AIMS: This study assesses the proficiency of Generative Pre-trained Transformer (GPT)-4 in answering questions about complex clinical ophthalmology cases.
METHODS: Using cases from Ophthalmology Clinical Challenges, we prompted the model to determine the diagnosis (open-ended question) and identify the next step (multiple-choice question). We generated responses using two zero-shot prompting strategies, including zero-shot plan-and-solve+ (PS+), to improve the reasoning of the model. We compared the best-performing model to human graders in a benchmarking effort.
RESULTS: Using PS+ prompting, GPT-4 achieved mean accuracies of 48.0% (95% CI 43.1% to 52.9%) and 63.0% (95% CI 58.2% to 67.6%) in diagnosis and next step, respectively. Next-step accuracy did not significantly differ by subspecialty (p=0.44). However, diagnostic accuracy in pathology and tumours was significantly higher than in uveitis (p=0.027). When the diagnosis was accurate, 75.2% (95% CI 68.6% to 80.9%) of the next steps were correct. Conversely, when the diagnosis was incorrect, 50.2% (95% CI 43.8% to 56.6%) of the next steps were accurate. The next step was three times more likely to be accurate when the initial diagnosis was correct (p<0.001). No significant differences were observed in diagnostic accuracy and decision-making between board-certified ophthalmologists and GPT-4. Among trainees, senior residents outperformed GPT-4 in diagnostic accuracy (p≤0.001 and 0.049) and in accuracy of next step (p=0.002 and 0.020).
CONCLUSION: Improved prompting enhances GPT-4's performance in complex clinical situations, although it does not surpass ophthalmology trainees in our context. Specialised large language models hold promise for future assistance in medical decision-making and diagnosis.
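The "three times more likely" figure in the results can be checked against the two conditional accuracies reported alongside it. The sketch below assumes the statement refers to an odds ratio, which the abstract does not say explicitly. The function and variable names are illustrative and do not come from the paper.

```python
def odds(p: float) -> float:
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a: float, p_b: float) -> float:
    """Ratio of the odds of an event in group A to the odds in group B."""
    return odds(p_a) / odds(p_b)


# Next-step accuracy reported in the abstract, split by diagnostic outcome.
p_next_given_correct_dx = 0.752    # diagnosis was accurate
p_next_given_incorrect_dx = 0.502  # diagnosis was incorrect

# Assumption: "three times more likely" is read as an odds ratio.
ratio = odds_ratio(p_next_given_correct_dx, p_next_given_incorrect_dx)

# For comparison, the plain ratio of the two proportions (a risk ratio).
risk_ratio = p_next_given_correct_dx / p_next_given_incorrect_dx

print(f"odds ratio: {ratio:.2f}")
print(f"risk ratio: {risk_ratio:.2f}")
```

Under this reading the odds ratio comes out close to 3, while the simple ratio of the two proportions is about 1.5. This suggests the abstract's statement is an odds-based comparison and not a doubling or tripling of the raw accuracy.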
Similar works
The Strengths and Difficulties Questionnaire: A Research Note
1997 · 14,711 citations
Making sense of Cronbach's alpha
2011 · 14,115 citations
QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies
2011 · 13,830 citations
A method for estimating the probability of adverse drug reactions
1981 · 11,551 citations
Clarifying Confusion: The Confusion Assessment Method
1990 · 5,253 citations
Authors
Institutions
- Hôpital Maisonneuve-Rosemont (CA)
- Université de Montréal (CA)
- Centre Hospitalier de l’Université de Montréal (CA)
- University College London (GB)
- University of Waterloo (CA)
- McGill University (CA)
- University of Toronto (CA)
- Institut Universitaire en Santé Mentale de Québec (CA)
- Institut universitaire en santé mentale de Montréal (CA)
- Hôpital du Sacré-Cœur de Montréal (CA)
- Renault (France) (FR)