This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Implementation of Human-in-the-Loop ChatGPT-based Patient Screening Across Multiple Diverse Clinical Trials
Citations: 0
Authors: 8
Year: 2026
Abstract
Purpose: Manual screening for trial eligibility is inefficient and costly. We prospectively evaluated a large language model (LLM)-assisted prescreening workflow across multiple active trials.

Methods: We deployed a retrieval-augmented generation (RAG) LLM pipeline across multiple trials at an academic medical center. The LLM used structured electronic health record data and free-text notes to classify each criterion as met, likely met, likely not met, not met, uncertain, or no documentation found, with an accompanying rationale. Coordinators received a patient list sorted by LLM-assessed eligibility, reviewed each criterion, and recorded a final prescreening status (success vs. failure). Criterion-level performance (accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score) was calculated and tracked over time. Patient prescreening status was evaluated as a function of the percentage of AI-labeled criteria met (60–80% and ≥80%).

Results: From 10/2024 to 9/2025, 39,182 patients were prescreened by the LLM workflow across 26 studies (21 oncology, 5 non-oncology) with 112 distinct criteria; 914 patients with a high likelihood of eligibility underwent coordinator review (5,096 criteria in total). Aggregated criterion-level performance: accuracy 0.94 (95% CI, 0.92–0.96), sensitivity 0.98 (0.97–0.99), specificity 0.81 (0.71–0.88), PPV 0.95 (0.92–0.97), NPV 0.93 (0.90–0.95), F1 0.97 (0.95–0.97). Twenty-seven criterion prompts across 14 of 26 trials were automatically updated based on coordinator feedback. Patients were more likely to be reviewed by coordinators (372/397, 93.7% vs. 544/987, 55.1%) and more likely to be labeled a prescreen success (162/372, 43.5% vs. 104/544, 19.1%) when ≥80% of AI-labeled criteria were met or likely met vs. 60–80%. Cost averaged $0.12 per patient.
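The coordinator prioritization step described in the Methods (ranking patients by the share of criteria the LLM labeled met or likely met, then bucketing at 60–80% and ≥80%) could be sketched as follows. This is a minimal illustration; the data shapes, function names, and thresholds are assumptions, not the study's implementation:

```python
# Hypothetical label set, mirroring the categories named in the abstract.
POSITIVE_LABELS = {"met", "likely met"}

def percent_positive(criteria: dict[str, str]) -> float:
    """Fraction of a patient's criteria the LLM labeled 'met' or 'likely met'."""
    if not criteria:
        return 0.0
    hits = sum(1 for label in criteria.values() if label in POSITIVE_LABELS)
    return hits / len(criteria)

def prioritize(patients: dict[str, dict[str, str]]) -> list[tuple[str, float]]:
    """Sort patient IDs by descending share of positively labeled criteria."""
    scored = [(pid, percent_positive(crit)) for pid, crit in patients.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def bucket(score: float) -> str:
    """Assign the abstract's review buckets: >=80% vs. 60-80% (else below)."""
    if score >= 0.8:
        return ">=80%"
    if score >= 0.6:
        return "60-80%"
    return "<60%"
```

Coordinators would then work the sorted list top-down, which is consistent with the abstract's finding that patients in the ≥80% bucket were reviewed more often.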
Conclusion: An LLM-assisted, human-in-the-loop prescreening workflow achieved high criterion-level performance at low cost across a variety of actively enrolling clinical trials. Structured coordinator feedback supported an automated learning workflow, enhancing screening operations while maintaining human oversight.
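The six criterion-level metrics reported above are standard functions of a confusion matrix (coordinator judgment as reference, LLM label as prediction). A minimal sketch, assuming raw true/false positive/negative counts are available; this is illustrative, not the authors' code:

```python
def criterion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute the criterion-level metrics named in the abstract from
    confusion-matrix counts (assumes every denominator is nonzero)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall: positives correctly flagged
    specificity = tn / (tn + fp)   # negatives correctly flagged
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }
```

Computing these per criterion and aggregating (with bootstrap or similar resampling for the confidence intervals) would yield summary figures of the kind reported in the Results.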
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,402 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,270 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,702 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,507 citations