OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 21.05.2026, 09:21

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

A comparative analysis of large language models on clinical questions for autoimmune diseases

2025·2 Zitationen·Frontiers in Digital HealthOpen Access
Volltext beim Verlag öffnen

2

Zitationen

11

Autoren

2025

Jahr

Abstract

Background Artificial intelligence (AI) has made great strides. To explore the potential of Large Language Models (LLMs) in providing medical services to patients and assisting physicians in clinical practice, our study evaluated the performance in delivering clinical questions related to autoimmune diseases. Methods 46 questions related to autoimmune diseases were input into ChatGPT 3.5, ChatGPT 4.0, and Gemini. The responses were then evaluated by rheumatologists based on five quality dimensions: relevance, correctness, completeness, helpfulness, and safety. Simultaneously, the responses were assessed by laboratory specialists across six medical fields: concept, clinical features, report interpretation, diagnosis, prevention and treatment, and prognosis. Finally, statistical analysis and comparisons were performed on the performance of the three chatbots in the five quality dimensions and six medical fields. Results ChatGPT 4.0 outperformed both ChatGPT 3.5 and Gemini across all five quality dimensions, with an average score of 199.8 ± 10.4, significantly higher than ChatGPT 3.5 (175.7 ± 16.6) and Gemini (179.1 ± 11.8) ( p = 0.009 and p = 0.001, respectively). The average performance differences between ChatGPT 3.5 and Gemini across these five dimensions were not statistically significant. Specifically, ChatGPT 4.0 demonstrated superior performance in relevance ( p < 0.0001, p < 0.0001), completeness ( p < 0.0001, p = 0.0006), correctness ( p = 0.0001, p = 0.0002), helpfulness ( p < 0.0001, p < 0.0001), and safety ( p < 0.0001, p = 0.0025) compared to both ChatGPT 3.5 and Gemini. Furthermore, ChatGPT 4.0 scored significantly higher than both ChatGPT 3.5 and Gemini in medical fields such as report interpretation ( p < 0.0001, p = 0.0025), prevention and treatment ( p < 0.0001, p = 0.0103), prognosis ( p = 0.0458, p = 0.0458). Conclusions This study demonstrates that ChatGPT 4.0 significantly outperforms ChatGPT 3.5 and Gemini in addressing clinical questions related to autoimmune diseases, showing notable advantages across all five quality dimensions and six clinical domains. These findings further highlight the potential of large language models in enhancing healthcare services.

Ähnliche Arbeiten