This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparing Physicians’ Assessments of Context-specific AI-powered clinical reasoning assistant with General-Purpose AI agent: A Prospective Multi-Site Physician Evaluation of VITA versus ChatGPT in India and Bangladesh
Citations: 0
Authors: 10
Year: 2026
Abstract
Background: Healthcare providers in low- and middle-income countries (LMICs) are increasingly relying on Artificial Intelligence (AI) tools, yet most available AI assistants are general-purpose systems not designed for the specific clinical, epidemiological, and resource contexts of these settings. There is no evidence from physicians' assessments on whether clinical reasoning support from purpose-built, context-specific, retrieval-augmented AI tools can outperform general-purpose AI agents.

Methods: We conducted a prospective multi-site validation study enrolling 37 physicians across India and Bangladesh. Each physician evaluated two AI tools on six hypothetical clinical case vignettes (three predefined, three physician-selected): (a) VITA (Validated Intelligence for Treatment and Assessment), a purpose-built, context-specific, retrieval-augmented clinical reasoning AI assistant trained on India-specific guidelines, antimicrobial resistance patterns, and formulary constraints; and (b) ChatGPT Plus (version 5.2), a leading general-purpose AI assistant. Evaluations were scored across six dimensions (differential diagnosis, clinical workup, treatment recommendation, dosing, clinical decision-making, and evidence quality) on a 1–5 Likert scale, yielding 444 observations. Analyses included paired t-tests, Wilcoxon signed-rank tests, and multivariate regressions with robust standard errors.

Results: VITA scored significantly higher than ChatGPT across all six evaluation dimensions. The mean composite score (sum of all dimensions, maximum = 30) was 25.4 for VITA versus 22.3 for ChatGPT (difference = +3.1 points, t = 8.31, p < 0.001). The largest advantage was in evidence quality (VITA: 4.46 vs. ChatGPT: 3.14, a 42% relative gap). VITA's advantage was consistent across both predefined and physician-selected hypothetical cases and was robust to controls for physician demographics, case type, and evaluation order in multivariate regression (coefficient = +3.08, p < 0.001).

Conclusions: In this first systematic head-to-head physician evaluation of a purpose-built clinical reasoning AI assistant versus a general-purpose AI in an LMIC setting, physicians consistently rated the context-specific tool as superior. These findings suggest that contextual relevance, including local guidelines, formulary constraints, and resistance patterns, matters for clinical AI adoption and quality in resource-limited settings.
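The paired design described in the Methods (37 physicians × 6 cases × 2 tools = 444 ratings) lends itself to a straightforward paired analysis. The sketch below illustrates, on simulated data, the kind of tests the abstract names (paired t-test, Wilcoxon signed-rank test, and an OLS regression with robust standard errors). The data, variable names, and model specification are hypothetical illustrations, not the study's actual dataset or analysis code, and the sketch is not intended to reproduce the reported statistics.

```python
# Illustrative sketch only: a paired comparison of composite scores (sum of six
# 1-5 dimensions, maximum 30) for two tools, on simulated data.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_physicians, n_cases = 37, 6            # 37 physicians x 6 cases = 222 paired ratings per tool

# Hypothetical composite scores for each tool (means loosely follow the abstract)
vita = np.clip(rng.normal(25.4, 2.5, n_physicians * n_cases), 6, 30)
chatgpt = np.clip(rng.normal(22.3, 2.5, n_physicians * n_cases), 6, 30)

# Paired tests on the per-case composite scores
t_stat, t_p = stats.ttest_rel(vita, chatgpt)
w_stat, w_p = stats.wilcoxon(vita, chatgpt)

# Long-format data for a regression with heteroskedasticity-robust standard errors;
# clustering by physician (cov_type="cluster") would be a natural alternative.
df = pd.DataFrame({
    "score": np.concatenate([vita, chatgpt]),
    "tool": ["VITA"] * len(vita) + ["ChatGPT"] * len(chatgpt),
    "physician": np.tile(np.repeat(np.arange(n_physicians), n_cases), 2),
})
model = smf.ols("score ~ C(tool, Treatment('ChatGPT'))", data=df).fit(cov_type="HC1")

print(f"paired t-test: t={t_stat:.2f}, p={t_p:.3g}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.3g}")
print(model.summary().tables[1])
```

The regression coefficient on the tool indicator corresponds to the kind of adjusted VITA-versus-ChatGPT difference reported in the abstract; in the actual study it would additionally control for physician demographics, case type, and evaluation order.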
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,652 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,567 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,083 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,856 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Institutions
- Cooper Hospital (IN)
- Institute of Mental Health and Hospital (IN)
- All India Institute of Medical Sciences Bhopal (IN)
- All India Institute of Medical Sciences, Deoghar (IN)
- Netaji Subhash Chandra Bose Medical College (IN)
- Pt. Jawahar Lal Nehru Memorial Medical College (IN)
- National Institute of Cardiovascular Diseases (BD)
- Gleneagles Hospital (SG)
- Mahatma Gandhi Memorial Medical College (IN)