This is an overview page with metadata for this scholarly article. The full article is available from the publisher.
Collaborating with large language models in literature screening for a systematic review of college students’ GenAI literacy
Citations: 0
Authors: 5
Year: 2026
Abstract
Introduction. Literature screening is among the most time- and labour-intensive phases of systematic literature reviews (SLRs). Although large language models (LLMs) have been explored for screening, prior work has focused on medical and environmental domains and has mainly benchmarked LLMs against human coders, offering limited guidance on integrating them collaboratively into SLRs.
Method. This study evaluated 12 GPT model–prompt configurations in an SLR of 1,616 publications. Two human coders screened a 10% sample (n = 162) to create a gold standard for model comparison. Performance was assessed using balanced accuracy, recall for inclusion, time efficiency, and cost efficiency. Disagreements between humans and models were analysed.
Analysis. Bootstrap tests compared performance across configurations. Open coding identified error types in human–model disagreements. Group discussion resolved discrepancies.
Results. GPT-5 Zeroshot and GPT-4o-mini Fewshot achieved the highest performance (balanced accuracy = 0.990 and 0.946; recall = 1.000 and 0.933). GPT-4o-mini was faster and cheaper but more prone to overly rigid rule application. Error analysis identified 10 mismatches, which led to two corrections of human miscoding.
Conclusion. LLM-assisted screening can reduce workload, improve efficiency, and correct human errors in SLRs. Practical guidelines for prompt design and confidence thresholds can position LLMs as collaborative tools in SLRs.
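The two screening metrics named in the abstract (balanced accuracy and recall for inclusion), together with a bootstrap comparison of two configurations against the human gold standard, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `include`/`exclude` labels, the toy labels, and the choice of a paired percentile bootstrap are all assumptions, since the abstract does not say which bootstrap test was used.

```python
import random


def balanced_accuracy(gold, pred):
    """Mean per-class recall over the classes present in the gold labels.

    For binary include/exclude screening this equals (sensitivity + specificity) / 2.
    """
    recalls = []
    for cls in set(gold):
        idx = [i for i, g in enumerate(gold) if g == cls]
        recalls.append(sum(pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def recall_for_inclusion(gold, pred, positive="include"):
    """Share of gold-standard inclusions that the model also included."""
    idx = [i for i, g in enumerate(gold) if g == positive]
    if not idx:
        return float("nan")
    return sum(pred[i] == positive for i in idx) / len(idx)


def paired_bootstrap(gold, pred_a, pred_b, n_boot=2000, seed=0):
    """Paired percentile bootstrap for the difference in balanced accuracy (A - B).

    Records are resampled with replacement, and both configurations are scored
    on the same resample. Returns the observed difference and a 95% interval.
    (Assumed procedure; the paper's exact bootstrap test may differ.)
    """
    rng = random.Random(seed)
    n = len(gold)
    observed = balanced_accuracy(gold, pred_a) - balanced_accuracy(gold, pred_b)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gold[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(balanced_accuracy(g, a) - balanced_accuracy(g, b))
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[int(0.975 * n_boot) - 1]
    return observed, (lo, hi)


# Toy labels for illustration only (not data from the study).
gold = ["include"] * 3 + ["exclude"] * 7
pred_a = list(gold)  # a hypothetical configuration that agrees with the coders
pred_b = ["include", "include", "exclude"] + ["exclude"] * 6 + ["include"]

print(round(balanced_accuracy(gold, pred_b), 3))     # → 0.762
print(round(recall_for_inclusion(gold, pred_b), 3))  # → 0.667
observed, ci = paired_bootstrap(gold, pred_a, pred_b)
print(round(observed, 3), ci)
```

Balanced accuracy is used here because inclusions are usually a small minority in screening; plain accuracy would reward a model that simply excludes everything, while recall for inclusion directly tracks the cost of missing a relevant study.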
Similar works
The PRISMA 2020 statement: an updated guideline for reporting systematic reviews
2021 · 88,007 citations
Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement
2009 · 82,969 citations
The Measurement of Observer Agreement for Categorical Data
1977 · 77,527 citations
Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement
2009 · 63,221 citations
Measuring inconsistency in meta-analyses
2003 · 61,896 citations