This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparative evaluation of large language models in generating clinical insights for HIV associated oral Kaposi sarcoma
Citations: 0
Authors: 10
Year: 2026
Abstract
Objectives: The objective of this study was to quantitatively evaluate and compare the performance of three advanced generative AI models, ChatGPT (v4.0), Gemini (v2.0 Advanced), and Meta AI (Llama 3.2), in providing accurate information on AIDS-associated oral Kaposi sarcoma (OKS).

Methods: This was a cross-sectional analytical study testing three advanced large language models (LLMs) against a gold standard (oral pathologists). A structured questionnaire was adapted from the WHO Oral Health Survey and the modified WHO guidelines for the treatment of skin and oral HIV-associated lesions. Data collection was conducted within a 24-hour window using the same protocol for all models. A second round of testing introduced engineered prompts following the CARE framework (Context, Ask, Rule, Example) to examine whether prompt engineering improved response accuracy. Responses were rated on a 5-point Likert scale (strongly agree, agree, neutral, disagree, strongly disagree) and then collapsed into a binary scale, with agreement between two or more pathologists serving as the correct score. Descriptive statistics, including means and standard deviations, were used to summarize the results. Comparative analyses employed ANOVA to evaluate differences in accuracy scores across the AI models and the gold standard. Statistical significance was set at p < 0.05.

Results: Before prompting, both ChatGPT and Gemini achieved an accuracy score of 81.48%, while Meta AI lagged at 66.67%. After prompting, Gemini showed the greatest improvement, reaching an accuracy of 85.18%. Meta AI also improved, to 81.48%, while ChatGPT's accuracy declined slightly to 77.78%. The pathologists achieved an accuracy score of 85.19%, indicating that the best-performing model (Gemini after prompting) approached expert-level accuracy. Analysis of variance (ANOVA) revealed no statistically significant differences in mean accuracy scores between the AI models and the pathologists (F = 0.64, p = 0.6996).

Conclusion: AI models have the potential to provide comprehensive information on the presentation, examination, and follow-up of HIV-associated oral KS. However, models may struggle with more complex clinical aspects, such as investigations and treatment recommendations.
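The scoring pipeline described in the abstract can be sketched in a few lines: Likert responses are collapsed to a binary agree/not-agree scale, an item is scored correct when it matches the answer that two or more pathologists agree on, and a one-way ANOVA F statistic compares mean accuracy across groups. All data and function names below are hypothetical illustrations, not the study's actual materials.

```python
# Hypothetical sketch of the abstract's scoring approach: Likert collapse,
# majority-vote gold standard, per-model accuracy, and one-way ANOVA.

LIKERT_TO_BINARY = {
    "strongly agree": 1, "agree": 1,
    "neutral": 0, "disagree": 0, "strongly disagree": 0,
}

def collapse(responses):
    """Collapse 5-point Likert labels into the binary agreement scale."""
    return [LIKERT_TO_BINARY[r] for r in responses]

def gold_standard(panel):
    """Per item, the correct score is the answer shared by >= 2 pathologists.

    `panel` is a list of per-pathologist binary rating lists (hypothetical).
    """
    return [1 if sum(item) >= 2 else 0 for item in zip(*panel)]

def accuracy(model_scores, gold):
    """Percentage of items on which the model matches the gold standard."""
    return 100.0 * sum(m == g for m, g in zip(model_scores, gold)) / len(gold)

def anova_f(groups):
    """One-way ANOVA F statistic, computed from first principles."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy usage: three pathologists rate three items; one model answers them.
panel = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]          # hypothetical ratings
gold = gold_standard(panel)                          # -> [1, 1, 0]
model = collapse(["agree", "neutral", "disagree"])   # -> [1, 0, 0]
model_accuracy = accuracy(model, gold)
```

A non-significant F (as the study's F = 0.64, p = 0.6996) would indicate that the group means are statistically indistinguishable at the chosen threshold.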