This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Assessing the Role of Large Language Learning Models and Artificial Intelligence in Improving Oncology Survivorship Care: A Comparative Study of ChatGPT and Gemini (Preprint)
0
Citations
6
Authors
2025
Year
Abstract
<sec> <title>BACKGROUND</title> As the population of cancer survivors continues to grow, the demand for accurate, accessible, and guideline-consistent survivorship care information has increased. Survivorship care encompasses a range of complex domains, including physical activity, nutrition, mental health, fertility, and the management of long-term treatment effects. However, gaps in patient education and limited healthcare resources create challenges in delivering comprehensive care. Artificial intelligence (AI)-powered large language models (LLMs) have emerged as potential tools to bridge these gaps, yet their performance in survivorship-specific contexts has not been systematically evaluated. </sec> <sec> <title>OBJECTIVE</title> This study aimed to evaluate and compare the performance of three commonly used LLMs (ChatGPT-4, ChatGPT Strawberry, and Gemini) in providing evidence-based responses across multiple survivorship care domains, with a focus on alignment with established clinical guidelines. </sec> <sec> <title>METHODS</title> A set of predefined questions on survivorship care was submitted to each LLM. Two independent evaluators assessed responses using five quality metrics: clarity, coherence, completeness, factual accuracy, and relevance. Inter-rater reliability was calculated using Cohen’s kappa. Differences in AI performance were examined using descriptive statistics, paired t-tests, and mixed-effects models to assess variations by model, topic domain, and question complexity. </sec> <sec> <title>RESULTS</title> Inter-rater agreement was high (κ = 0.83), with the highest agreement in coherence (κ = 0.98) and the lowest in factual accuracy (κ = 0.70). ChatGPT Strawberry consistently outperformed both ChatGPT-4 and Gemini across most domains, especially in exercise, nutrition, mental health, and hormone-related symptoms (p < 0.001). ChatGPT-4 performed comparably in fertility and sexual health but lagged in exercise and nutrition.
Gemini demonstrated the lowest scores across all metrics. Notably, higher-complexity questions yielded stronger factual accuracy and completeness scores compared to medium-complexity items (p < 0.001). </sec> <sec> <title>CONCLUSIONS</title> LLMs show potential as tools for survivorship care education, though accuracy and completeness remain limitations. ChatGPT Strawberry demonstrated the most consistent and high-quality performance. Future work should focus on refining AI models to better handle complex medical queries, integrating real-world patient interactions, and ensuring equity in AI-driven healthcare solutions. AI models may supplement survivorship care by offering timely, guideline-consistent information. However, given current limitations, these tools should complement clinician guidance, with ongoing validation to ensure their safe integration into cancer care. </sec> <sec> <title>CLINICALTRIAL</title> None </sec>
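The inter-rater reliability statistic reported in the abstract (Cohen's kappa) compares observed agreement between two raters against the agreement expected by chance. A minimal sketch of the computation is below; the two rating vectors are purely illustrative and are not the study's data.

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's label frequencies.
    """
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    labels = sorted(set(r1) | set(r2))
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement under independence of the two raters.
    p_e = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 quality scores from two evaluators (not study data).
rater_a = [5, 4, 4, 3, 5, 2, 4, 5]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4]
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints 0.652
```

For ordinal scales like these quality ratings, a weighted kappa (penalizing large disagreements more than near-misses) is a common alternative; the unweighted form above treats any disagreement equally.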
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,626 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,843 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations