Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.
Evaluating the Evolution of ChatGPT as an Information Resource in Shoulder and Elbow Surgery
5
Zitationen
6
Autoren
2025
Jahr
Abstract
Background The purpose of this study was to evaluate the performance and evolution of Chat Generative Pre-Trained Transformer (ChatGPT; OpenAI) as a resource for shoulder and elbow surgery information by assessing its accuracy on the American Academy of Orthopaedic Surgeons shoulder-elbow self-assessment questions. We hypothesized that both ChatGPT models would demonstrate proficiency and that there would be significant improvement with progressive iterations. Materials and Methods A total of 200 questions were selected from the 2019 and 2021 American Academy of Orthopaedic Surgeons shoulder-elbow self-assessment questions. ChatGPT 3.5 and 4 were used to evaluate all questions. Questions with non-text data were excluded (114 questions). Remaining questions were input into ChatGPT and categorized as follows: anatomy, arthroplasty, basic science, instability, miscellaneous, nonoperative, and trauma. ChatGPT's performances were quantified and compared across categories with chi-square tests. The continuing medical education credit threshold of 50% was used to determine proficiency. Statistical significance was set at P <.05. Results ChatGPT 3.5 and 4 answered 52.3% and 73.3% of the questions correctly, respectively ( P =.003). ChatGPT 3.5 performed significantly better in the instability category ( P =.037). ChatGPT 4's performance did not significantly differ across categories ( P =.841). ChatGPT 4 performed significantly better than ChatGPT 3.5 in all categories except instability and miscellaneous. Conclusion ChatGPT 3.5 and 4 exceeded the proficiency threshold. ChatGPT 4 performed better than ChatGPT 3.5, showing an increased capability to correctly answer shoulder and elbow-focused questions. Further refinement of ChatGPT's training may improve its performance and utility as a resource. Currently, ChatGPT remains unable to answer questions at a high enough accuracy to replace clinical decision-making. [ Orthopedics . 2025;48(2):e69–e74.]
Ähnliche Arbeiten
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8.707 Zit.
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8.613 Zit.
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8.159 Zit.
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6.875 Zit.
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5.781 Zit.