
This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Accuracy and consistency of publicly available Large Language Models as clinical decision support tools for the management of colon cancer

2024 · 20 citations · Journal of Surgical Oncology · Open Access

Authors: 9

Abstract

Background: Large Language Models (LLMs; e.g., ChatGPT) may be used to assist clinicians and could form the basis of future clinical decision support (CDS) for colon cancer. The objectives of this study were to (1) evaluate the response accuracy of two LLM‐powered interfaces in identifying guideline‐based care in simulated clinical scenarios and (2) define response variation between and within LLMs.

Methods: Clinical scenarios with "next steps in management" queries were developed based on National Comprehensive Cancer Network guidelines. Prompts were entered into OpenAI ChatGPT and Microsoft Copilot in independent sessions, yielding four responses per scenario. Responses were compared with clinician‐developed responses and assessed for accuracy, consistency, and verbosity.

Results: Across 108 responses to 27 prompts, both platforms yielded completely correct responses to 36% of scenarios (n = 39). For ChatGPT, 39% (n = 21) of responses were missing information and 24% (n = 14) contained inaccurate or misleading information. Copilot performed similarly, with 37% (n = 20) missing information and 28% (n = 15) containing inaccurate or misleading information (p = 0.96). Clinician responses were significantly shorter (34 ± 15.5 words) than those of both ChatGPT (251 ± 86 words) and Copilot (271 ± 67 words; both p < 0.01).

Conclusions: Publicly available LLM applications often provide verbose responses with vague or inaccurate information regarding colon cancer management. Significant optimization is required before they can be used in formal CDS.
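The grading and verbosity measures described in the Methods and Results can be illustrated with a minimal Python sketch. It assumes each response has already been assigned one of three hypothetical labels (`correct`, `missing`, `inaccurate`, loosely mirroring the "completely correct", "missing information" and "inaccurate/misleading" categories) and treats those labels as mutually exclusive, which is an assumption. The function name `summarize` and the toy data are illustrative only; they are not the authors' analysis code or the study's data.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical, mutually exclusive grading labels (an assumption; the study's rubric may differ).
CATEGORIES = ("correct", "missing", "inaccurate")


def summarize(grades, word_counts):
    """Return category percentages, mean word count and sample SD for one platform."""
    counts = Counter(grades)
    n = len(grades)
    # Percentage of responses in each category, rounded to whole percent.
    pct = {c: round(100 * counts[c] / n) for c in CATEGORIES}
    # Verbosity expressed as mean ± sample standard deviation of word counts.
    return pct, mean(word_counts), stdev(word_counts)


if __name__ == "__main__":
    # Toy example data, NOT taken from the study.
    grades = ["correct", "missing", "missing", "inaccurate"]
    words = [200, 250, 300, 250]
    pct, m, sd = summarize(grades, words)
    print(pct, m, round(sd, 1))
```

With real data, the same summary would be computed separately for each platform and for the clinician reference responses before any statistical comparison.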
