OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 12.05.2026, 20:52

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

A Comparative Study of Five Large Language Models’ Response for Liver Cancer Comprehensive Treatment

2025·0 Zitationen·Journal of Hepatocellular CarcinomaOpen Access
Volltext beim Verlag öffnen

0

Zitationen

10

Autoren

2025

Jahr

Abstract

Introduction: Large language models (LLMs) are increasingly used in healthcare, yet their reliability in specialized clinical fields remains uncertain. Liver cancer, as a complex and high-burden disease, poses unique challenges for AI-based tools. This study aimed to evaluate the comprehensibility and clinical applicability of five mainstream LLMs in addressing liver cancer-related clinical questions. Methods: We developed 90 standardized questions covering multiple aspects of liver cancer management. Five LLMs-GPT-4, Gemini, Copilot, Kimi, and Ernie Bot-were evaluated in a blinded fashion by three independent hepatobiliary experts. Responses were scored using predefined criteria for comprehensibility and clinical applicability. Overall group comparisons were conducted using the Fisher-Freeman-Halton test (for categorical data) and the Kruskal-Wallis test (for ordinal scores), followed by Dunn's post-hoc test or Fisher's exact test with Bonferroni correction. Inter-rater reliability was assessed using Fleiss' kappa. Results: Kimi and GPT-4 achieved the highest proportions of fully applicable responses (68% and 62%, respectively), while Ernie Bot and Copilot showed the lowest. Comprehensibility was generally high, with Kimi and Ernie Bot scoring over 98%. However, none of the LLMs consistently provided guideline-concordant answers to all questions. Performance on professional-level questions was significantly lower than on common-sense ones, highlighting deficiencies in complex clinical reasoning. Conclusion: LLMs demonstrate varied performance in liver cancer-related queries. While GPT-4 and Kimi show promise in clinical applicability, limitations in accuracy and consistency-particularly for complex medical decisions-underscore the need for domain-specific optimization before clinical integration. Trial Registration: Not applicable.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationRadiomics and Machine Learning in Medical ImagingAI in cancer detection
Volltext beim Verlag öffnen