This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Human-led and artificial intelligence-automated critical appraisal of systematic reviews: Comparative evaluation
Citations: 1
Authors: 4
Year: 2025
Abstract
AIM: To evaluate and compare human-led and artificial intelligence-automated critical appraisal of evidence.
BACKGROUND: Critical appraisal is essential in evidence-based practice, yet many nurses lack the skills to perform it. Large language models offer potential support, but their role in critical appraisal remains underexplored.
DESIGN: We conducted a comparative study to evaluate the performance of five commonly used large language models versus two human reviewers in appraising four systematic reviews on interventions to reduce medication administration errors.
METHODS: We compared large language models and two human reviewers in independently appraising four systematic reviews using the JBI Critical Appraisal Checklist. The models were Perplexity Sonar (Pro), Claude 3.7 Sonnet, Gemini 2.0 Flash, GPT-4.5 and Grok-2. All models received identical full texts and standardized prompts. Responses were analyzed descriptively, and agreement was assessed using Cohen's Kappa.
RESULTS: Large language models showed full agreement with human reviewers on five of 11 JBI items. Most disagreements occurred in appraising search strategy, inclusion criteria and publication bias. Agreement between human reviewers and large language models ranged from slight to moderate. The highest agreement was observed with Claude (κ = 0.732), the lowest with Gemini (κ = 0.394).
CONCLUSION: Large language models can support aspects of the critical appraisal of evidence but lack the contextual reasoning and methodological insight required for complex judgments. While Claude 3.7 Sonnet aligned most closely with human reviewers, human oversight remains essential. Large language models should serve as adjuncts to, not substitutes for, evidence-based practice.
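The agreement statistic reported above (Cohen's Kappa) corrects observed rater agreement for agreement expected by chance. A minimal sketch of that calculation in pure Python follows; the example ratings are hypothetical and not taken from the study, merely shaped like its setting (11 JBI checklist items rated Yes/No/Unclear by a human reviewer and a model):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's label frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label probabilities, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings for 11 JBI items: Y = yes, N = no, U = unclear.
human = ["Y", "Y", "Y", "N", "Y", "U", "Y", "Y", "N", "Y", "Y"]
model = ["Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", "U", "Y", "Y"]
print(round(cohen_kappa(human, model), 3))  # → 0.522
```

With 9 of 11 raw agreements (p_o ≈ 0.818) but heavily Yes-skewed marginals (p_e ≈ 0.620), kappa drops to about 0.52, i.e. "moderate" on the conventional Landis–Koch scale cited in work like this.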
Similar works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 90,290 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 83,068 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 77,957 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 63,529 citations
Measuring inconsistency in meta-analyses
2003 · 62,162 citations