OpenAlex · Updated hourly · Last updated: 13.05.2026, 01:27

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

ChatGPT Achieves Only Fair Agreement with ACFAS Expert Panelist Clinical Consensus Statements

2025 · 0 citations · Foot & Ankle Specialist

Citations: 0 · Authors: 2 · Year: 2025

Abstract

INTRODUCTION: As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly, from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates an AI language model's alignment with clinical consensus statements in foot and ankle surgery. METHODS: Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS; 2015-2022) were collected and rated by ChatGPT-o1 as inappropriate, neither appropriate nor inappropriate, or appropriate. Ten repetitions of the statements were entered into ChatGPT-o1 in random order, and the model was prompted to assign a corresponding rating. The AI-generated scores were compared with the expert panel's ratings, and intra-rater analysis was performed. RESULTS: The analysis of 9 clinical consensus documents and 129 statements revealed an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair alignment between expert panelists and ChatGPT. Overall, ankle arthritis and heel pain showed the highest concordance at 100%, while flatfoot exhibited the lowest agreement at 25%, reflecting variability between ChatGPT and expert panelists. Among the ChatGPT ratings, Cohen's kappa values ranged from 0.41 to 0.92, highlighting variability in internal reliability across topics. CONCLUSION: ChatGPT achieved overall fair agreement and demonstrated variable consistency when repeatedly rating ACFAS expert panel clinical practice guidelines representing a variety of topics. These data reflect the need for further study of the causes, impacts, and solutions for this disparity between artificial intelligence and human intelligence. LEVEL OF EVIDENCE: Level IV: Retrospective cohort study.
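
For readers unfamiliar with the agreement statistic reported in the abstract, the following is a minimal sketch of how an unweighted Cohen's kappa can be computed for two raters using the three rating categories named in the methods. It is not the authors' analysis code: the function name `cohens_kappa` and the six example ratings are hypothetical, and the abstract does not state whether a weighted or unweighted kappa was used. The "fair" label is consistent with the commonly cited Landis and Koch bands, in which values from 0.21 to 0.40 are described as fair agreement.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of ratings")
    n = len(rater_a)

    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters' marginal
    # proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; kappa is
        # undefined (0/0), so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy ratings, NOT data from the study.
expert_ratings = [
    "appropriate", "appropriate", "inappropriate",
    "neither", "appropriate", "inappropriate",
]
model_ratings = [
    "appropriate", "neither", "inappropriate",
    "neither", "appropriate", "appropriate",
]

kappa = cohens_kappa(expert_ratings, model_ratings)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why a topic can show high percent concordance while the overall kappa remains modest.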

Topics

Artificial Intelligence in Healthcare and Education
Clinical Reasoning and Diagnostic Skills
Radiomics and Machine Learning in Medical Imaging