This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Assessment of ChatGPT-4 in Family Medicine Board Examinations: An Observational Study Using Advanced AI Learning and Analytical Methods (Preprint)
Citations: 1
Authors: 9
Year: 2024
Abstract
<sec> <title>BACKGROUND</title> This research explores the capabilities of ChatGPT-4 in passing the American Board of Family Medicine (ABFM) Certification Examination. Addressing a gap in existing literature, where earlier Artificial Intelligence (AI) models showed limitations in medical board exams, this study evaluates the enhanced features and potential of ChatGPT-4, especially in document analysis and information synthesis. </sec> <sec> <title>OBJECTIVE</title> The primary goal is to assess whether ChatGPT-4, when provided with extensive preparation resources and using sophisticated data analysis, can achieve a score equal to or above the passing threshold for the Family Medicine Board Examinations. </sec> <sec> <title>METHODS</title> In this study, ChatGPT-4 was embedded in a specialized subenvironment, "AI Family Medicine Board Exam Taker," designed to closely mimic the conditions of the ABFM Certification Examination. This subenvironment enabled the AI to access and analyze a range of relevant study materials, including a primary medical textbook and supplementary online resources. The AI was presented with a series of past ABFM exam questions, reflecting the breadth and complexity typical of the exam. Emphasis was placed on assessing the AI's ability to interpret and respond to these questions accurately, leveraging its advanced data processing and analysis capabilities within this controlled subenvironment. </sec> <sec> <title>RESULTS</title> In our study, ChatGPT-4's performance was quantitatively assessed on 300 practice ABFM exam questions. The AI achieved a correct response rate of 88.67% (95% CI: 85.08% to 92.25%) for the Custom Robot version and 87.33% (95% CI: 83.57% to 91.10%) for the Regular version. Statistical analysis, including the McNemar test (P-Value: 0.4533), indicated no significant difference in accuracy between the two versions. 
Additionally, the Chi-square test for error type distribution (P-Value: 0.3163) revealed no significant variation in the pattern of errors across versions. These results highlight ChatGPT-4's capacity for high-level performance and consistency in responding to complex medical examination questions under controlled conditions. </sec> <sec> <title>CONCLUSIONS</title> The study demonstrates that ChatGPT-4, particularly when equipped with specialized preparation and operating in a tailored subenvironment, shows promising potential in handling the intricacies of medical board examinations. While its performance is comparable to the expected standards for passing the ABFM Certification Examination, further enhancements in AI technology and tailored training methods could push these capabilities to new heights. This exploration opens avenues for integrating AI tools like ChatGPT-4 in medical education and assessment, emphasizing the importance of continuous advancement and specialized training in AI applications in healthcare. </sec>
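The 95% confidence intervals reported in the RESULTS section are consistent with a normal-approximation (Wald) interval applied to the correct-answer counts implied by the stated percentages (88.67% of 300 ≈ 266; 87.33% of 300 ≈ 262). A minimal sketch reproducing those intervals, assuming the Wald method was used (the abstract does not name the interval type):

```python
import math

def wald_ci_95(correct: int, total: int) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a proportion, in percent."""
    p = correct / total
    half_width = 1.96 * math.sqrt(p * (1 - p) / total)
    return (100 * (p - half_width), 100 * (p + half_width))

# Correct-answer counts implied by the reported percentages on 300 questions:
# 88.67% -> 266/300 (Custom Robot), 87.33% -> 262/300 (Regular).
for label, correct in [("Custom Robot", 266), ("Regular", 262)]:
    lo, hi = wald_ci_95(correct, 300)
    print(f"{label}: {correct / 300:.2%} (95% CI: {lo:.2f}% to {hi:.2f}%)")
# → Custom Robot: 88.67% (95% CI: 85.08% to 92.25%)
# → Regular: 87.33% (95% CI: 83.57% to 91.10%)
```

Note that the McNemar test reported in the abstract (P = .4533) additionally requires the per-question paired outcomes of the two versions, which the abstract does not provide, so it cannot be reproduced from these summary figures alone.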
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,456 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,332 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,779 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,533 citations