This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Performance of Large Language Models in Automated Medical Literature Screening: A Systematic Review and Meta-analysis
Citations: 0
Authors: 7
Year: 2026
Abstract
Objective: To systematically evaluate the diagnostic performance of large language models (LLMs) in automated medical literature screening and to determine their potential role in supporting evidence synthesis workflows.

Methods: A systematic review and meta-analysis was conducted according to PRISMA-DTA guidance. PubMed, Web of Science, Embase, the Cochrane Library, and Google Scholar were searched from 1 January 2022 to 17 November 2025. Studies assessing LLMs for automated title-and-abstract screening or full-text eligibility assessment in medical literature were included. Diagnostic accuracy metrics were extracted and pooled using a bivariate random-effects model and hierarchical summary receiver operating characteristic (HSROC) analysis. Subgroup analyses and meta-regression were performed to explore sources of heterogeneity.

Results: Eighteen studies published between 2023 and 2025 were included. In title-and-abstract screening, the pooled sensitivity was 0.92 and the pooled specificity was 0.94; the SROC area under the curve (AUC) reached 0.98. In full-text screening, pooled sensitivity and specificity both reached 0.99, with an AUC of 0.99. Prompt strategies incorporating examples or chain-of-thought reasoning significantly improved sensitivity. Across studies, most models were deployed without task-specific fine-tuning and still achieved strong performance. Subgroup analyses and meta-regression did not identify significant sources of heterogeneity. Many studies also reported substantial efficiency gains, including large reductions in screening workload, time, and cost.

Conclusion: LLMs demonstrate high diagnostic accuracy for automated medical literature screening, particularly in full-text assessment. These models show strong potential as high-sensitivity assistive tools that can substantially reduce manual screening burden while supporting evidence synthesis. Further methodological optimization and validation in large-scale real-world settings are required to establish their long-term role in evidence-based medicine.
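To make the pooled metrics concrete, here is a minimal sketch (not from the paper; all counts are hypothetical) of how sensitivity and specificity are computed from a single study's 2x2 screening table, where "positive" means a record that should be included:

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Diagnostic accuracy of one screening study.

    tp: relevant records the LLM correctly included
    fp: irrelevant records the LLM wrongly included
    fn: relevant records the LLM wrongly excluded
    tn: irrelevant records the LLM correctly excluded
    """
    sensitivity = tp / (tp + fn)  # share of relevant records caught
    specificity = tn / (tn + fp)  # share of irrelevant records rejected
    return sensitivity, specificity

# Hypothetical counts chosen so the result mirrors the pooled
# title-and-abstract figures reported above (0.92 / 0.94):
sens, spec = screening_metrics(tp=92, fp=6, fn=8, tn=94)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# → sensitivity=0.92, specificity=0.94
```

The meta-analysis itself pools such per-study pairs with a bivariate random-effects model, which accounts for the correlation between sensitivity and specificity across studies; this sketch only shows the per-study inputs to that model.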
Related Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,418 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,288 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,726 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,516 citations