This is an overview page with metadata about this scholarly article. The full article is available from the publisher.

Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study

2026 · 0 citations · Scientific Reports · Open Access

Abstract

Risk-of-bias assessment is a crucial step in evidence synthesis. The traditionally adopted tool, however, is complex, resource-intensive, and unreliable. While prior investigations have focused on whether large language models (LLMs) can perform assessments with RoB 2, this study is the first to evaluate the reliability of ROBUST-RCT, a novel risk-of-bias tool, as applied by humans and by LLMs. Reviewers working independently used ROBUST-RCT to assess different aspects of a sample of randomized controlled trials (RCTs) and then reached consensus through discussion. A chain-of-thought prompt instructed four LLMs on how to apply ROBUST-RCT. The primary analysis used Gwet’s AC2 to assess inter-rater reliability based on all final ratings (i.e., the ratings from the tool’s second step) for all core items of ROBUST-RCT. For each LLM, 56 assessments derived from 9 studies were compared against the human consensus. In the primary analysis, inter-rater reliability (Gwet’s AC2) varied across the LLMs. DeepSeek-R1, the lowest performer, yielded an AC2 of 0.46 (95% CI: 0.24 to 0.69). By contrast, Gemini 2.5 Pro Preview, the model most consistent with the human consensus, yielded an AC2 of 0.69 (95% CI: 0.54 to 0.84). With 95% confidence, three of the four tested LLMs achieved ‘moderate’ or higher reliability based on benchmarking. LLMs could be helpful for risk-of-bias assessment in systematic reviews that use the ROBUST-RCT tool.
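
The abstract names Gwet’s AC2 but does not say how it is computed. As a rough illustration only, the following Python sketch implements the standard two-rater form of Gwet’s statistic, AC2 = (p_a - p_e) / (1 - p_e), where p_a is the weighted observed agreement and p_e = T_w / (q(q - 1)) · Σ_k π_k(1 - π_k), with π_k the average of the two raters’ marginal proportions for category k and T_w the sum of all agreement weights. It assumes the ordinal ratings are coded as integers 0 to q - 1 and uses quadratic weights; the weighting scheme used in the study is not stated here, and the names `gwet_ac2` and `quadratic_weights` are hypothetical, not the authors’ analysis code.

```python
from typing import List, Optional


def quadratic_weights(q: int) -> List[List[float]]:
    """Hypothetical helper: quadratic agreement weights w_kl = 1 - (k - l)^2 / (q - 1)^2."""
    return [[1.0 - (k - l) ** 2 / (q - 1) ** 2 for l in range(q)] for k in range(q)]


def gwet_ac2(rater_a: List[int], rater_b: List[int], q: int,
             weights: Optional[List[List[float]]] = None) -> float:
    """Gwet's AC2 for two raters; ratings are integer category codes 0..q-1."""
    if q < 2:
        raise ValueError("need at least two categories")
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of items")
    if any(not 0 <= r < q for r in rater_a + rater_b):
        raise ValueError("ratings must be coded 0..q-1")
    if weights is None:
        weights = quadratic_weights(q)  # assumption: quadratic weighting

    n = len(rater_a)
    # Joint distribution p[k][l]: share of items rated k by rater A and l by rater B.
    p = [[0.0] * q for _ in range(q)]
    for a, b in zip(rater_a, rater_b):
        p[a][b] += 1.0 / n

    # Weighted observed agreement p_a.
    p_a = sum(weights[k][l] * p[k][l] for k in range(q) for l in range(q))

    # pi_k: average of rater A's row marginal and rater B's column marginal.
    pi = [(sum(p[k]) + sum(p[l][k] for l in range(q))) / 2 for k in range(q)]

    # Chance agreement p_e = T_w / (q(q-1)) * sum_k pi_k (1 - pi_k).
    t_w = sum(sum(row) for row in weights)
    p_e = t_w / (q * (q - 1)) * sum(x * (1 - x) for x in pi)

    return (p_a - p_e) / (1 - p_e)
```

For example, under these assumptions `gwet_ac2([0, 1, 2], [0, 2, 2], q=3)` evaluates to 11/14 ≈ 0.786. With only two categories, quadratic weights reduce to identity weights and the statistic equals Gwet’s AC1. The confidence intervals reported in the abstract would additionally require a variance estimate, which this sketch omits.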

Topics

Meta-analysis and systematic reviews; Artificial Intelligence in Healthcare and Education; Reliability and Agreement in Measurement
