This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Collaborating with large language models in literature screening for a systematic review of college students’ GenAI literacy

2026 · 0 citations · Information Research an international electronic journal · Open Access

Abstract

Introduction. Literature screening is among the most time- and labour-intensive phases of systematic literature reviews (SLRs). Although large language models (LLMs) have been explored for screening, prior work has focused on medical and environmental domains and mainly benchmarked LLMs against human coders, offering limited guidance on collaborative integration in SLRs.

Method. This study evaluated 12 GPT model–prompt configurations in an SLR of 1,616 publications. Two human coders screened a 10% sample (n = 162) to create a gold standard for model comparison. Performance was assessed using balanced accuracy, recall for inclusion, time efficiency, and cost efficiency. Disagreements between humans and models were analysed.

Analysis. Bootstrap tests compared performance across configurations. Open coding identified error types in human–model disagreements. Group discussion resolved discrepancies.

Results. GPT-5 Zeroshot and GPT-4o-mini Fewshot achieved the highest performance (accuracy = 0.990 and 0.946; recall = 1.000 and 0.933). GPT-4o-mini was faster and cheaper but more prone to overly rigid rule application. Error analysis identified 10 mismatches, leading to two corrections of human miscoding.

Conclusion. LLM-assisted screening can reduce workload, improve efficiency, and correct human errors in SLR. Practical guidelines for prompt design and confidence thresholds can position LLMs as collaborative tools in SLRs.
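
The metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels, and the choice of a paired percentile bootstrap are assumptions made for illustration, since the abstract does not specify which bootstrap procedure was used. Labels are coded 1 for "include" and 0 for "exclude".

```python
import random

def recall_inclusion(gold, pred):
    """Share of gold-standard 'include' records (1) that the model also included."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return tp / (tp + fn)

def specificity(gold, pred):
    """Share of gold-standard 'exclude' records (0) that the model also excluded."""
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    return tn / (tn + fp)

def balanced_accuracy(gold, pred):
    """Mean of recall on the 'include' class and recall on the 'exclude' class."""
    return (recall_inclusion(gold, pred) + specificity(gold, pred)) / 2

def bootstrap_diff(gold, pred_a, pred_b, n_boot=2000, seed=0):
    """Paired percentile bootstrap (an assumed procedure): 95% interval for
    balanced accuracy of configuration A minus configuration B."""
    rng = random.Random(seed)
    n = len(gold)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        if len(set(g)) < 2:
            # Skip resamples that lack one class; the metric is undefined there.
            continue
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(balanced_accuracy(g, a) - balanced_accuracy(g, b))
    diffs.sort()
    lower = diffs[int(0.025 * len(diffs))]
    upper = diffs[int(0.975 * len(diffs)) - 1]
    return lower, upper

# Hypothetical toy labels, not data from the study.
gold   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
pred_b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print(balanced_accuracy(gold, pred_a), recall_inclusion(gold, pred_a))
print(bootstrap_diff(gold, pred_a, pred_b))
```

Balanced accuracy is a natural choice for screening because included records are usually a small minority, so plain accuracy would reward a model that excludes everything. An interval for the difference that excludes zero would suggest that one configuration outperforms the other on the sampled records.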

Topics

Meta-analysis and systematic reviews · Mental Health via Writing · Artificial Intelligence in Healthcare and Education
