This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Evaluating the progression of artificial intelligence and large language models in medicine through comparative analysis of ChatGPT-3.5 and ChatGPT-4 in generating vascular surgery recommendations
Citations: 10
Authors: 4
Year: 2023
Abstract
Objective
Artificial intelligence (AI) continues to become increasingly integrated with clinical medicine. Generative AI, and particularly large language models (LLMs) such as ChatGPT-3.5 and ChatGPT-4, have shown promise in generating human-like text, providing a potential tool for augmenting clinical care. These online AI chatbots have already demonstrated remarkable clinical potential, having passed the USMLE, for example. Evaluation of these LLMs in the surgical literature, especially as it applies to judgement and decision-making, is sparse. This study aimed to 1) evaluate the efficacy of ChatGPT-4 in providing clinician-level vascular surgery recommendations and 2) compare its performance with that of its predecessor, ChatGPT-3.5, to gauge the progression of the clinical competencies of LLMs.

Methods
A set of forty clinician-level questions spanning four domains of vascular surgery (carotid artery disease, visceral artery aneurysms, abdominal aortic aneurysms, and chronic limb-threatening ischemia) was generated by clinical experts. These domains were chosen based on the availability of updated guidelines published before September 2021, the cut-off date for the LLMs' training dataset. The questions, without additional context or prompts, were entered into ChatGPT-3.5 and ChatGPT-4 between March 20 and March 25, 2023. Responses were independently evaluated by two blinded reviewers using a 5-point Likert scale assessing comprehensiveness, accuracy, and consistency with guidelines. The Flesch-Kincaid Grade Level of each response was also determined. The independent-samples t-test and Fisher's exact test were employed for comparative analysis.

Results
ChatGPT-4 significantly outperformed ChatGPT-3.5, providing appropriate recommendations for 38 of 40 questions (95%) compared with 13 of 40 (32.5%) for ChatGPT-3.5 (Fisher's exact test, p < 0.001). Despite longer responses (ChatGPT-4 mean 317 ± 58 words vs. ChatGPT-3.5 mean 265 ± 74 words, p < 0.001), the reading ease of both models remained similar, corresponding to college-graduate-level texts.

Conclusion
ChatGPT-4 can consistently respond accurately to complex clinician-level vascular surgery questions. This also represents a substantial advancement in performance compared with its predecessor, which was released only a few months prior, highlighting the rapid progress of LLMs in clinical medicine. Several limitations persist with the use of LLMs, including hallucinations, data privacy issues, and the black-box problem. However, these findings suggest that with further refinement, LLMs like ChatGPT-4 have the potential to become indispensable tools in clinical decision-making, marking an exciting frontier in the fusion of AI with clinical medicine and vascular surgery.
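The headline comparison (38/40 vs. 13/40 appropriate recommendations) can be reproduced with Fisher's exact test on the corresponding 2×2 contingency table. A minimal standard-library sketch is shown below; the function name and the two-sided p-value convention (summing hypergeometric probabilities no larger than that of the observed table) are illustrative assumptions, not taken from the paper:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def prob(x):
        # Probability of x "successes" in row 1 under fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible count in cell a
    hi = min(row1, col1)       # largest feasible count in cell a
    # Small tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-12))

# Reported contingency: ChatGPT-4 appropriate in 38/40, ChatGPT-3.5 in 13/40
p_value = fisher_exact_two_sided(38, 2, 13, 27)
print(p_value < 0.001)  # consistent with the reported p < 0.001
```

For tables this lopsided the exact p-value is far below the 0.001 threshold reported in the abstract; in practice `scipy.stats.fisher_exact` would give the same result without hand-rolling the hypergeometric sum.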
Similar works
Classification of Surgical Complications
2004 · 30,360 citations
2013 ESH/ESC Guidelines for the management of arterial hypertension
2013 · 13,658 citations
CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials
2010 · 13,467 citations
Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure
2003 · 13,245 citations
2013 ACCF/AHA Guideline for the Management of Heart Failure
2013 · 12,597 citations