This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Implementation of Human-in-the-Loop ChatGPT-based Patient Screening Across Multiple Diverse Clinical Trials
Citations: 0
Authors: 8
Year: 2026
Abstract
Purpose: Manual screening for trial eligibility is inefficient and costly. We prospectively evaluated a large language model (LLM)-assisted prescreening workflow across multiple active trials.

Methods: We deployed a retrieval-augmented generation (RAG) LLM pipeline across multiple trials at an academic medical center. The LLM used structured electronic health record data and free-text notes to classify each criterion as met, likely met, likely not met, not met, uncertain, or no documentation found, with an accompanying rationale. Coordinators received a patient list sorted by LLM-assessed eligibility, reviewed each criterion, and recorded a final prescreening status (success vs. failure). Criterion-level performance (accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score) was calculated and tracked over time. Patient prescreening status was evaluated as a function of the percentage of AI-labeled criteria met (60–80% and ≥80%).

Results: From 10/2024 to 9/2025, 39,182 patients were prescreened by the LLM workflow across 26 studies (21 oncology, 5 non-oncology) with 112 distinct criteria; 914 patients with a high likelihood of eligibility underwent coordinator review (5,096 criteria in total). Aggregated criterion-level performance: accuracy 0.94 (95% CI, 0.92–0.96), sensitivity 0.98 (0.97–0.99), specificity 0.81 (0.71–0.88), PPV 0.95 (0.92–0.97), NPV 0.93 (0.90–0.95), F1 0.97 (0.95–0.97). Twenty-seven criterion prompts across 14 of 26 trials were automatically updated based on coordinator feedback. Patients were more likely to be reviewed by coordinators (372/397, 93.7% vs. 544/987, 55.1%) and more likely to be labeled a prescreen success (162/372, 43.5% vs. 104/544, 19.1%) when ≥80% of AI-labeled criteria were met or likely met vs. 60–80%. Cost averaged $0.12 per patient.
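The coordinator prioritization step described in the Methods (ranking patients by the share of criteria the LLM labeled met or likely met, then bucketing at 60–80% and ≥80%) could be sketched as follows. This is a minimal illustration; the data shapes, function names, and thresholds are assumptions, not the study's implementation:

```python
# Hypothetical label set, mirroring the categories named in the abstract.
POSITIVE_LABELS = {"met", "likely met"}

def percent_positive(criteria: dict[str, str]) -> float:
    """Fraction of a patient's criteria the LLM labeled 'met' or 'likely met'."""
    if not criteria:
        return 0.0
    hits = sum(1 for label in criteria.values() if label in POSITIVE_LABELS)
    return hits / len(criteria)

def prioritize(patients: dict[str, dict[str, str]]) -> list[tuple[str, float]]:
    """Sort patient IDs by descending share of positively labeled criteria."""
    scored = [(pid, percent_positive(crit)) for pid, crit in patients.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def bucket(score: float) -> str:
    """Assign the abstract's review buckets: >=80% vs. 60-80% (else below)."""
    if score >= 0.8:
        return ">=80%"
    if score >= 0.6:
        return "60-80%"
    return "<60%"
```

Coordinators would then work the sorted list top-down, which is consistent with the abstract's finding that patients in the ≥80% bucket were reviewed more often.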
Conclusion: An LLM-assisted, human-in-the-loop prescreening workflow achieved high criterion-level performance at low cost across a variety of actively enrolling clinical trials. Structured coordinator feedback supported an automated learning workflow, enhancing screening operations while maintaining human oversight.
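The six criterion-level metrics reported above are standard functions of a confusion matrix (coordinator judgment as reference, LLM label as prediction). A minimal sketch, assuming raw true/false positive/negative counts are available; this is illustrative, not the authors' code:

```python
def criterion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute the criterion-level metrics named in the abstract from
    confusion-matrix counts (assumes every denominator is nonzero)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall: positives correctly flagged
    specificity = tn / (tn + fp)   # negatives correctly flagged
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }
```

Computing these per criterion and aggregating (with bootstrap or similar resampling for the confidence intervals) would yield summary figures of the kind reported in the Results.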
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations