This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Automated Full-text screening and accelerated reviews using large language models with Context-Aware Agents: An exploratory analysis in biomarker research
Citations: 0
Authors: 12
Year: 2026
Abstract
Aim: Artificial intelligence (AI) tools built on large language models (LLMs) can accelerate scientific literature reviews by automating title, abstract, and full-text screening for relevant patient populations and biomarkers. We developed an AI-based tool that uses LLMs to automate and improve full-text screening by accurately identifying publications that meet complex criteria.
Methods: We conducted a literature review using the PICO framework to define inclusion and exclusion criteria, focusing on biomarkers in heart failure with reduced ejection fraction (HFrEF). We built an AI-based full-text screening tool to process 5405 selected publications. The tool combines multi-level, task-oriented retrieval-augmented generation (RAG) with agent-based methods, and ground-truth standards were established to evaluate performance metrics for both the tool and human reviewers. Intra-LLM reliability was assessed by rerunning screenings on a batch of publications.
Results: Among the public- and private-domain models compared, LLaMA 3.3 70B was selected for its superior accuracy (82%), precision (71%), and recall (100%) in screening 49 manuscripts. During the training phase, based on several hundred manuscripts, performance metrics improved significantly. Validation showed a sensitivity of 91.4%, a specificity of 53.2%, a false positive rate of 46.8%, and a false negative rate of 8.6%. The LLM outperformed human reviewers in F1 score and inter-rater reliability, achieving 100% consistency across multiple runs, with each run consisting of multiple LLMs on 1000 documents.
Conclusion: Our study demonstrated that an AI tool can reduce labor-intensive effort while maintaining accuracy in literature reviews, with greater inter-rater agreement than human reviewers.
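The screening metrics reported in the abstract are all derived from a confusion matrix of include/exclude decisions against a ground truth. The following is a minimal Python sketch of those definitions, not the authors' implementation. The names `ConfusionCounts`, `screening_metrics`, and `run_consistency` are hypothetical, and the counts in the usage example are invented for illustration and are not taken from the study. Note that the false positive rate is the complement of specificity and the false negative rate is the complement of sensitivity, which matches the reported pairs (53.2% and 46.8%, 91.4% and 8.6%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of screening decisions against a ground truth (hypothetical type)."""
    tp: int  # relevant publications correctly included
    fp: int  # irrelevant publications wrongly included
    tn: int  # irrelevant publications correctly excluded
    fn: int  # relevant publications wrongly excluded


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a class is empty.
    return numerator / denominator if denominator else float("nan")


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary-classification metrics for a screening run."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # also called recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),  # 1 - specificity
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),  # 1 - sensitivity
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


def run_consistency(runs: list[list[bool]]) -> float:
    """Fraction of documents that received the same decision in every run.

    Each inner list holds one run's include/exclude decisions, in the same
    document order. This is one simple way to express intra-model
    reliability; the study may have used a different measure.
    """
    if not runs or not runs[0]:
        return float("nan")
    per_document = list(zip(*runs))
    unanimous = sum(1 for decisions in per_document if len(set(decisions)) == 1)
    return unanimous / len(per_document)


if __name__ == "__main__":
    # Invented counts, chosen only to make the arithmetic easy to follow.
    example = ConfusionCounts(tp=90, fp=50, tn=50, fn=10)
    for name, value in screening_metrics(example).items():
        print(f"{name}: {value:.3f}")
    print("consistency:", run_consistency([[True, False], [True, True]]))
```

A screening tool tuned for high sensitivity at the cost of specificity, as in the validation figures above, rarely drops a relevant publication but passes many irrelevant ones on to human review.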
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,646 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,554 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,071 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,851 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations