OpenAlex · Updated hourly · Last updated: 09.04.2026, 06:52

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Comparative evaluation of large language models in generating clinical insights for HIV associated oral Kaposi sarcoma

2026 · 0 citations · Discover Artificial Intelligence · Open Access

0 citations · 10 authors · Year: 2026

Abstract

Objectives: The objective of this study was to quantitatively evaluate and compare the performance of three advanced generative AI models, ChatGPT (v4.0), Gemini (v2.0 Advanced), and Meta AI (Llama 3.2), in providing accurate information about AIDS-associated oral Kaposi sarcoma (OKS).

Methods: This was a cross-sectional analytical study comparing three advanced large language models (LLMs) against a gold standard (oral pathologists). A structured questionnaire was adapted from the WHO Oral Health Survey and modified WHO guidelines for the treatment of skin and oral HIV-associated lesions. Data collection was conducted within a 24-hour window using the same protocol for all models. A second round of testing introduced engineered prompts using the CARE framework (Context, Ask, Rule, Example) to examine whether they improved response accuracy. Responses were rated on a 5-point Likert scale (strongly agree, agree, neutral, disagree, strongly disagree) and then collapsed into a binary scale, where agreement between two or more pathologists served as the correct score. Descriptive statistics, including means and standard deviations, were used to summarize the results. Comparative analyses employed ANOVA to evaluate differences in accuracy scores across the AI models and the gold standard. Statistical significance was set at p < 0.05.

Results: Before prompting, both ChatGPT and Gemini achieved an accuracy score of 81.48%, while Meta AI lagged behind at 66.67%. After prompting, Gemini exhibited the greatest improvement, reaching an accuracy of 85.18%. Meta AI also improved, to 81.48%, while ChatGPT's accuracy declined slightly to 77.78%. The pathologists achieved an accuracy score of 85.19%, indicating that the best-performing AI (Gemini after prompting) approached expert-level accuracy. Analysis of variance (ANOVA) revealed no statistically significant differences in mean accuracy scores between the AI models and the pathologists (F = 0.64, p = 0.6996).

Conclusion: AI models have the potential to provide comprehensive information on the presentation, examination, and follow-up of HIV-associated oral KS. However, models may struggle with more complex clinical aspects, such as investigations and treatment recommendations.
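The scoring pipeline described in the Methods (collapsing 5-point Likert ratings to binary correctness, computing per-rater accuracy, then comparing groups with a one-way ANOVA) can be sketched as follows. This is a minimal illustration only: the exact Likert collapse rule and all response data below are invented placeholders, not the study's actual data or results.

```python
def collapse_likert(rating):
    """Map a 5-point Likert rating to 1 (correct/agree) or 0 (otherwise).
    The exact collapse rule is an assumption for illustration."""
    return 1 if rating in ("strongly agree", "agree") else 0

def accuracy(binary_scores):
    """Proportion of items scored correct."""
    return sum(binary_scores) / len(binary_scores)

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: MS_between / MS_within."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Invented binary correctness scores per question (placeholders).
model_a = [1, 1, 0, 1, 1, 1, 0, 1]
model_b = [1, 0, 1, 1, 0, 1, 1, 1]
expert  = [1, 1, 1, 1, 0, 1, 1, 1]

f_stat = one_way_anova_f([model_a, model_b, expert])
```

A p-value would then be read off the F distribution with (k − 1, n − k) degrees of freedom; the study reports F = 0.64 and p = 0.6996 on its own data.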
