Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

GPT for RCTs? Using AI to determine adherence to clinical trial reporting guidelines

2025·12 Zitationen·BMJ OpenOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

OBJECTIVES: Adherence to established reporting guidelines can improve clinical trial reporting standards, but attempts to improve adherence have produced mixed results. This exploratory study aimed to determine how accurate a large language model generative artificial intelligence system (AI-LLM) was for determining reporting guideline compliance in a sample of sports medicine clinical trial reports. DESIGN: This study was an exploratory retrospective data analysis. OpenAI GPT-4 and Meta Llama 2 AI-LLM were evaluated for their ability to determine reporting guideline adherence in a sample of sports medicine and exercise science clinical trial reports. SETTING: Academic research institution. PARTICIPANTS: The study sample included 113 published sports medicine and exercise science clinical trial papers. For each paper, the GPT-4 Turbo and Llama 2 70B models were prompted to answer a series of nine reporting guideline questions about the text of the article. The GPT-4 Vision model was prompted to answer two additional reporting guideline questions about the participant flow diagram in a subset of articles. The dataset was randomly split (80/20) into a TRAIN and TEST dataset. Hyperparameter and fine-tuning were performed using the TRAIN dataset. The Llama 2 model was fine-tuned using the data from the GPT-4 Turbo analysis of the TRAIN dataset. PRIMARY AND SECONDARY OUTCOME MEASURES: The primary outcome was the F1-score, a measure of model performance on the TEST dataset. The secondary outcome was the model's classification accuracy (%). RESULTS: Across all questions about the article text, the GPT-4 Turbo AI-LLM demonstrated acceptable performance (F1-score=0.89, accuracy (95% CI) = 90% (85% to 94%)). Accuracy for all reporting guidelines was >80%. The Llama 2 model accuracy was initially poor (F1-score=0.63, accuracy (95% CI) = 64% (57% to 71%)) and improved with fine-tuning (F1-score=0.84, accuracy (95% CI) = 83% (77% to 88%)). The GPT-4 Vision model accurately identified all participant flow diagrams (accuracy (95% CI) = 100% (89% to 100%)) but was less accurate at identifying when details were missing from the flow diagram (accuracy (95% CI) = 57% (39% to 73%)). CONCLUSIONS: Both the GPT-4 and fine-tuned Llama 2 AI-LLMs showed promise as tools for assessing reporting guideline compliance. Next steps should include developing an efficient, open-source AI-LLM and exploring methods to improve model accuracy.

Autoren

Institutionen

Themen

Artificial Intelligence in Healthcare and EducationEthics in Clinical ResearchMeta-analysis and systematic reviews

Volltext beim Verlag öffnen

GPT for RCTs? Using AI to determine adherence to clinical trial reporting guidelines

Abstract

Ähnliche Arbeiten

Autoren

Institutionen

Themen