
This is an overview page with metadata for this scientific work. The full article is available from the publisher.

Automated Full-text screening and accelerated reviews using large language models with Context-Aware Agents: An exploratory analysis in biomarker research

2026 · 0 citations · European Heart Journal - Digital Health · Open Access

0 citations · 12 authors · 2026

Abstract

Aim: Artificial intelligence (AI) tools built on large language models (LLMs) can accelerate scientific literature reviews by automating title, abstract, and full-text screening for relevant patient populations and biomarkers. We developed an AI-based tool that uses LLMs to automate and improve full-text screening and to accurately identify publications that meet complex criteria.

Methods: We conducted a literature review using the PICO framework to define inclusion and exclusion criteria, focusing on biomarkers in heart failure with reduced ejection fraction (HFrEF). An AI-based full-text screening tool was created to process 5405 selected publications. It combined multi-level, task-oriented retrieval-augmented generation (RAG) with agent-based methods, and ground truth standards were established to evaluate performance metrics for both the tool and human reviewers. Intra-LLM reliability was assessed by rerunning screenings on a batch of publications.

Results: Among the public and private domain models, LLaMA 3.3 70B was selected for its superior accuracy (82%), precision (71%), and recall (100%) in screening 49 manuscripts. During the training phase, based on several hundred manuscripts, performance metrics improved significantly. Validation results showed a sensitivity of 91.4%, a specificity of 53.2%, a false positive rate of 46.8%, and a false negative rate of 8.6%. The LLM outperformed human reviewers in F1 score and inter-rater reliability, achieving 100% consistency across multiple runs, with each run consisting of multiple LLMs on 1000 documents.

Conclusion: Our study demonstrated that an AI tool can reduce labor-intensive effort while maintaining accuracy in literature reviews, with greater inter-rater agreement than human reviewers.
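The screening metrics named in the abstract (sensitivity, specificity, false positive and false negative rates, precision, accuracy, F1) all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions and is not the authors' evaluation code. The names `ConfusionCounts`, `screening_metrics`, and `run_consistency` are hypothetical, the counts are made up for illustration and are not taken from the paper, and the consistency measure (share of documents with identical decisions in every run) is only one plausible reading of "consistency across multiple runs".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Screening decisions compared against a ground truth (include = positive)."""
    tp: int  # relevant papers correctly included
    fp: int  # irrelevant papers wrongly included
    tn: int  # irrelevant papers correctly excluded
    fn: int  # relevant papers wrongly excluded


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard binary-classification metrics; assumes no denominator is zero."""
    sensitivity = c.tp / (c.tp + c.fn)   # also called recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
        "false_negative_rate": 1.0 - sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def run_consistency(runs: list[list[bool]]) -> float:
    """Fraction of documents that received the same decision in every run.

    Each inner list holds one run's include/exclude decisions, in the same
    document order. This definition is an assumption, not the paper's.
    """
    per_document = list(zip(*runs))
    identical = sum(1 for decisions in per_document if len(set(decisions)) == 1)
    return identical / len(per_document)


# Illustrative counts only (NOT from the paper).
example = ConfusionCounts(tp=90, fp=50, tn=50, fn=10)
metrics = screening_metrics(example)

# Three hypothetical reruns over four documents; the last document flips once.
consistency = run_consistency([
    [True, False, True, True],
    [True, False, True, False],
    [True, False, True, True],
])
```

The false positive rate is the complement of specificity, and the false negative rate is the complement of sensitivity. This matches the reported validation pairs: 53.2% and 46.8% sum to 100%, as do 91.4% and 8.6%.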

Similar works