OpenAlex · Updated hourly · Last updated: 12.05.2026, 16:17

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

AutoReporter: development of an artificial intelligence tool for automated assessment of research reporting guideline adherence

2025 · 0 citations · Journal of the American Medical Informatics Association

Citations: 0 · Authors: 9 · Year: 2025

Abstract

OBJECTIVES: To develop AutoReporter, a large language model (LLM) system that automates evaluation of adherence to research reporting guidelines. MATERIALS AND METHODS: Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews. RESULTS: AutoReporter, a zero-shot, no-retrieval prompt coupled with the o3-mini reasoning LLM, demonstrated strong accuracy (CONSORT: 90.09%; SPIRIT: 92.07%), substantial agreement with humans (CONSORT: Cohen's κ = 0.70; SPIRIT: Cohen's κ = 0.77), fast runtime (CONSORT: 617.26 s; SPIRIT: 544.51 s), and low cost (CONSORT: 0.68 USD; SPIRIT: 0.65 USD). AutoReporter achieved a mean accuracy of 91.8% and substantial agreement (Cohen's κ > 0.6) with expert ratings from the BenchReport benchmark. DISCUSSION: Structured prompting alone can match or exceed fine-tuned domain models while forgoing manually annotated corpora and computationally intensive training. CONCLUSION: Large language models can feasibly automate reporting guideline adherence assessments for scalable quality control in scientific research reporting. AutoReporter is publicly accessible at https://autoreporter.streamlit.app.
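The abstract quantifies human–model agreement with Cohen's κ, where values above 0.6 are conventionally read as substantial agreement. As a minimal illustration of that statistic (not code from the paper), the sketch below computes κ for two raters' binary adherent/not-adherent calls on checklist items; the label sequences are invented for the example.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical model vs. expert calls on ten reporting-checklist items
# (1 = item adequately reported, 0 = not reported).
model = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
expert = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(model, expert), 2))  # → 0.52
```

Here raw agreement is 80%, but κ discounts the agreement expected by chance given each rater's label frequencies, which is why the reported κ values (0.70, 0.77) are a stricter measure than accuracy alone.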
