This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Comparing Physicians’ Assessments of Context-specific AI-powered clinical reasoning assistant with General-Purpose AI agent: A Prospective Multi-Site Physician Evaluation of VITA versus ChatGPT in India and Bangladesh
Citations: 0
Authors: 10
Year: 2026
Abstract
Background: Healthcare providers in low- and middle-income countries (LMICs) are increasingly relying on Artificial Intelligence (AI) tools, yet most available AI assistants are general-purpose systems not designed for the specific clinical, epidemiological, and resource contexts of these settings. There is no evidence from physicians' assessments on whether clinical reasoning support from purpose-built, context-specific, retrieval-augmented AI tools can outperform general-purpose AI agents.

Methods: We conducted a prospective multi-site validation study enrolling 37 physicians across India and Bangladesh. Each physician evaluated two AI tools on six hypothetical clinical case vignettes (three predefined, three physician-selected): (a) VITA (Validated Intelligence for Treatment and Assessment), a purpose-built, context-specific, retrieval-augmented clinical reasoning AI assistant trained on India-specific guidelines, antimicrobial resistance patterns, and formulary constraints; and (b) ChatGPT Plus (version 5.2), a leading general-purpose AI assistant. Evaluations were scored across six dimensions (differential diagnosis, clinical workup, treatment recommendation, dosing, clinical decision-making, and evidence quality) on a 1–5 Likert scale, yielding 444 observations. Analyses included paired t-tests, Wilcoxon signed-rank tests, and multivariate regressions with robust standard errors.

Results: VITA scored significantly higher than ChatGPT across all six evaluation dimensions. The mean composite score (sum of all dimensions, maximum = 30) was 25.4 for VITA versus 22.3 for ChatGPT (difference = +3.1 points, t = 8.31, p < 0.001). The largest advantage was in evidence quality (VITA: 4.46 vs. ChatGPT: 3.14, a 42% relative gap). VITA's advantage was consistent across both predefined and physician-selected hypothetical cases and was robust to controls for physician demographics, case type, and evaluation order in multivariate regression (coefficient = +3.08, p < 0.001).

Conclusions: In this first systematic head-to-head physician evaluation of a purpose-built clinical reasoning AI assistant versus a general-purpose AI in an LMIC setting, physicians consistently rated the context-specific tool as superior. These findings suggest that contextual relevance, including local guidelines, formulary constraints, and resistance patterns, matters for clinical AI adoption and quality in resource-limited settings.
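The paired design described in the Methods (37 physicians × 6 cases × 2 tools = 444 ratings) lends itself to a straightforward paired analysis. The sketch below illustrates, on simulated data, the kind of tests the abstract names (paired t-test, Wilcoxon signed-rank test, and an OLS regression with robust standard errors). The data, variable names, and model specification are hypothetical illustrations, not the study's actual dataset or analysis code, and the sketch is not intended to reproduce the reported statistics.

```python
# Illustrative sketch only: a paired comparison of composite scores (sum of six
# 1-5 dimensions, maximum 30) for two tools, on simulated data.
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_physicians, n_cases = 37, 6            # 37 physicians x 6 cases = 222 paired ratings per tool

# Hypothetical composite scores for each tool (means loosely follow the abstract)
vita = np.clip(rng.normal(25.4, 2.5, n_physicians * n_cases), 6, 30)
chatgpt = np.clip(rng.normal(22.3, 2.5, n_physicians * n_cases), 6, 30)

# Paired tests on the per-case composite scores
t_stat, t_p = stats.ttest_rel(vita, chatgpt)
w_stat, w_p = stats.wilcoxon(vita, chatgpt)

# Long-format data for a regression with heteroskedasticity-robust standard errors;
# clustering by physician (cov_type="cluster") would be a natural alternative.
df = pd.DataFrame({
    "score": np.concatenate([vita, chatgpt]),
    "tool": ["VITA"] * len(vita) + ["ChatGPT"] * len(chatgpt),
    "physician": np.tile(np.repeat(np.arange(n_physicians), n_cases), 2),
})
model = smf.ols("score ~ C(tool, Treatment('ChatGPT'))", data=df).fit(cov_type="HC1")

print(f"paired t-test: t={t_stat:.2f}, p={t_p:.3g}")
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.3g}")
print(model.summary().tables[1])
```

The regression coefficient on the tool indicator corresponds to the kind of adjusted VITA-versus-ChatGPT difference reported in the abstract; in the actual study it would additionally control for physician demographics, case type, and evaluation order.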
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,652 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,567 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,083 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,856 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Institutions
- Cooper Hospital (IN)
- Institute of Mental Health and Hospital (IN)
- All India Institute of Medical Sciences Bhopal (IN)
- All India Institute of Medical Sciences, Deoghar (IN)
- Netaji Subhash Chandra Bose Medical College (IN)
- Pt. Jawahar Lal Nehru Memorial Medical College (IN)
- National Institute of Cardiovascular Diseases (BD)
- Gleneagles Hospital (SG)
- Mahatma Gandhi Memorial Medical College (IN)