This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study
Citations: 0
Authors: 4
Year: 2026
Abstract
Risk-of-bias assessment is a crucial step in evidence synthesis. The traditionally adopted tool, however, is complex, resource-intensive, and unreliable. While prior investigations have focused on whether Large Language Models (LLMs) could perform assessments with RoB 2, this study is the first to evaluate the reliability of ROBUST-RCT, a novel risk-of-bias tool, as applied by humans and LLMs. Reviewers working independently used ROBUST-RCT to assess different aspects of a sample of RCTs and then reached a consensus through discussion. A chain-of-thought prompt instructed four LLMs on how to apply ROBUST-RCT. The primary analysis used Gwet’s AC2 to assess inter-rater reliability based on all the final ratings (i.e., the ratings in the second step of the tool) for all the core items of ROBUST-RCT. For each LLM, a sample of 56 assessments derived from 9 studies was compared against human consensus. In the primary analysis, Gwet’s AC2 inter-rater reliability varied across the LLMs. DeepSeek-R1, the lowest performer, yielded an AC2 of 0.46 (95% CI: 0.24 to 0.69). At the other end, Gemini 2.5 Pro Preview, the model most consistent with human consensus, yielded an AC2 of 0.69 (95% CI: 0.54 to 0.84). With 95% confidence, three of the four tested LLMs achieved ‘moderate’ or higher reliability based on benchmarking. LLMs could be helpful in the risk-of-bias assessment of systematic reviews using the ROBUST-RCT tool.
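To make the reliability statistic named in the abstract more concrete, the following is a minimal sketch of Gwet's AC2 for two raters (for example, one LLM versus human consensus). It is not the authors' analysis code. The linear weighting scheme, the function name `gwet_ac2`, and the category labels in the usage note are assumptions chosen for illustration; the abstract does not state which weights were used, and the sketch does not compute confidence intervals or the benchmarking step.

```python
def gwet_ac2(ratings_a, ratings_b, categories):
    """Gwet's AC2 for two raters over ordered categories.

    Assumption: linear weights w[k][l] = 1 - |k - l| / (q - 1).
    `categories` lists the rating options in their natural order.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Linear agreement weights: full credit on the diagonal, partial credit nearby.
    w = [[1 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]

    # Joint proportion table p[k][l]: rater A chose k, rater B chose l.
    p = [[0.0] * q for _ in range(q)]
    for a, b in zip(ratings_a, ratings_b):
        p[index[a]][index[b]] += 1 / n

    # Weighted observed agreement.
    p_a = sum(w[k][l] * p[k][l] for k in range(q) for l in range(q))

    # Average marginal proportion of each category across the two raters.
    pi = [(sum(p[k]) + sum(p[r][k] for r in range(q))) / 2 for k in range(q)]

    # Chance agreement: (T_w / (q (q - 1))) * sum_k pi_k (1 - pi_k).
    t_w = sum(sum(row) for row in w)
    p_e = t_w / (q * (q - 1)) * sum(x * (1 - x) for x in pi)

    return (p_a - p_e) / (1 - p_e)
```

A hypothetical call would pass two equally long lists of ratings plus the ordered category list, e.g. `gwet_ac2(llm_ratings, consensus_ratings, ["low", "unclear", "high"])`; these labels are placeholders, not the response options of ROBUST-RCT. With identity weights in place of the linear ones, the same formula reduces to Gwet's AC1.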
Related works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 87,395 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 82,932 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 77,384 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 63,130 citations
Measuring inconsistency in meta-analyses
2003 · 61,808 citations