This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Human-led and artificial intelligence-automated critical appraisal of systematic reviews: Comparative evaluation
Citations: 1
Authors: 4
Year: 2025
Abstract
AIM: To evaluate and compare human-led and artificial intelligence-automated critical appraisal of evidence.
BACKGROUND: Critical appraisal is essential in evidence-based practice, yet many nurses lack the skills to perform it. Large language models offer potential support, but their role in critical appraisal remains underexplored.
DESIGN: We conducted a comparative study to evaluate the performance of five commonly used large language models versus two human reviewers in appraising four systematic reviews on interventions to reduce medication administration errors.
METHODS: We compared large language models and two human reviewers in independently appraising four systematic reviews using the JBI Critical Appraisal Checklist. The models were Perplexity Sonar (Pro), Claude 3.7 Sonnet, Gemini 2.0 Flash, GPT-4.5 and Grok-2. All models received identical full texts and standardized prompts. Responses were analyzed descriptively, and agreement was assessed using Cohen's Kappa.
RESULTS: Large language models showed full agreement with human reviewers on five of 11 JBI items. Most disagreements occurred in appraising search strategy, inclusion criteria and publication bias. Agreement between human reviewers and large language models ranged from slight to moderate. The highest agreement was observed with Claude (κ = 0.732), the lowest with Gemini (κ = 0.394).
CONCLUSION: Large language models can support aspects of the critical appraisal of evidence but lack the contextual reasoning and methodological insight required for complex judgments. While Claude 3.7 Sonnet aligned most closely with human reviewers, human oversight remains essential. Large language models should serve as adjuncts to, not substitutes for, evidence-based practice.
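The agreement statistic reported above (Cohen's Kappa) corrects observed rater agreement for agreement expected by chance. A minimal sketch of that calculation in pure Python follows; the example ratings are hypothetical and not taken from the study, merely shaped like its setting (11 JBI checklist items rated Yes/No/Unclear by a human reviewer and a model):

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from each rater's label frequencies."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal label probabilities, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings for 11 JBI items: Y = yes, N = no, U = unclear.
human = ["Y", "Y", "Y", "N", "Y", "U", "Y", "Y", "N", "Y", "Y"]
model = ["Y", "Y", "Y", "N", "Y", "Y", "Y", "Y", "U", "Y", "Y"]
print(round(cohen_kappa(human, model), 3))  # → 0.522
```

With 9 of 11 raw agreements (p_o ≈ 0.818) but heavily Yes-skewed marginals (p_e ≈ 0.620), kappa drops to about 0.52, i.e. "moderate" on the conventional Landis–Koch scale cited in work like this.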
Similar works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 90,290 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 83,068 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 77,957 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 63,529 citations
Measuring inconsistency in meta-analyses
2003 · 62,162 citations