This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Assessing the Role of Large Language Learning Models and Artificial Intelligence in Improving Oncology Survivorship Care: A Comparative Study of ChatGPT and Gemini (Preprint)
0
Citations
6
Authors
2025
Year
Abstract
<sec> <title>BACKGROUND</title> As the population of cancer survivors continues to grow, the demand for accurate, accessible, and guideline-consistent survivorship care information has increased. Survivorship care encompasses a range of complex domains, including physical activity, nutrition, mental health, fertility, and the management of long-term treatment effects. However, gaps in patient education and limited healthcare resources create challenges in delivering comprehensive care. Artificial intelligence (AI)-powered large language models (LLMs) have emerged as potential tools to bridge these gaps, yet their performance in survivorship-specific contexts has not been systematically evaluated. </sec> <sec> <title>OBJECTIVE</title> This study aimed to evaluate and compare the performance of three commonly used LLMs (ChatGPT-4, ChatGPT Strawberry, and Gemini) in providing evidence-based responses across multiple survivorship care domains, with a focus on alignment with established clinical guidelines. </sec> <sec> <title>METHODS</title> A set of predefined questions on survivorship care was submitted to each LLM. Two independent evaluators assessed responses using five quality metrics: clarity, coherence, completeness, factual accuracy, and relevance. Inter-rater reliability was calculated using Cohen’s kappa. Differences in AI performance were examined using descriptive statistics, paired t-tests, and mixed-effects models to assess variations by model, topic domain, and question complexity. </sec> <sec> <title>RESULTS</title> Inter-rater agreement was high (κ = 0.83), with the highest agreement in coherence (κ = 0.98) and the lowest in factual accuracy (κ = 0.70). ChatGPT Strawberry consistently outperformed both ChatGPT-4 and Gemini across most domains, especially in exercise, nutrition, mental health, and hormone-related symptoms (p < 0.001). ChatGPT-4 performed comparably in fertility and sexual health but lagged in exercise and nutrition.
Gemini demonstrated the lowest scores across all metrics. Notably, higher-complexity questions yielded stronger factual accuracy and completeness scores compared to medium-complexity items (p < 0.001). </sec> <sec> <title>CONCLUSIONS</title> LLMs show potential as tools for survivorship care education, though accuracy and completeness remain limitations. ChatGPT Strawberry demonstrated the most consistent and high-quality performance. Future work should focus on refining AI models to better handle complex medical queries, integrating real-world patient interactions, and ensuring equity in AI-driven healthcare solutions. AI models may supplement survivorship care by offering timely, guideline-consistent information. However, given current limitations, these tools should complement clinician guidance, with ongoing validation to ensure their safe integration into cancer care. </sec> <sec> <title>CLINICALTRIAL</title> None </sec>
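The inter-rater reliability statistic reported in the abstract (Cohen's kappa) compares observed agreement between two raters against the agreement expected by chance. A minimal sketch of the computation is below; the two rating vectors are purely illustrative and are not the study's data.

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters scoring the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's label frequencies.
    """
    assert len(r1) == len(r2) and len(r1) > 0
    n = len(r1)
    labels = sorted(set(r1) | set(r2))
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement under independence of the two raters.
    p_e = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 quality scores from two evaluators (not study data).
rater_a = [5, 4, 4, 3, 5, 2, 4, 5]
rater_b = [5, 4, 3, 3, 5, 2, 4, 4]
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints 0.652
```

For ordinal scales like these quality ratings, a weighted kappa (penalizing large disagreements more than near-misses) is a common alternative; the unweighted form above treats any disagreement equally.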
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,626 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,532 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,046 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,843 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations