This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Benchmarking large language models for predictive modeling in biomedical research with a focus on reproductive health
Citations: 2 · Authors: 12 · Year: 2026
Abstract
Large language models (LLMs) are increasingly used for code generation and data analysis. This study assesses LLM performance on four predictive tasks drawn from three DREAM challenges: gestational age regression from transcriptomics, gestational age regression from DNA methylation, and classification of preterm birth and of early preterm birth from microbiome data. We prompt LLMs with task descriptions, data locations, and target outcomes, then run the LLM-generated code to fit prediction models and measure their accuracy on the test sets. Of the eight LLMs tested, four (o3-mini-high, 4o, DeepseekR1, and Gemini 2.0) complete at least one task. R code generation is more successful (14/16) than Python (7/16). OpenAI's o3-mini-high outperforms the others, completing 7/8 tasks. On the test sets, the top LLM-generated models match or exceed the median participating team for all four tasks and surpass the top-performing team for one task (p = 0.02). These findings underscore the potential of LLMs to democratize predictive modeling in omics and to increase research output.
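The abstract states that each LLM-generated model is scored on a test set. As a minimal sketch, assuming root-mean-square error for the gestational age regression tasks and area under the ROC curve (AUROC) for the preterm birth classification tasks (both metric choices are assumptions here, not taken from the study), the two scores can be computed in plain Python. The helpers `rmse` and `auroc` are hypothetical names used only for illustration.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, e.g. for gestational age in weeks (assumed metric)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney formulation (assumed metric): the probability
    that a randomly chosen positive scores higher than a randomly chosen
    negative, with ties counted as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    print(rmse([38.0, 40.0], [37.0, 41.0]))              # 1.0
    print(auroc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))     # 0.75
```

The pairwise AUROC loop is quadratic in the number of samples, which keeps the logic transparent for a sketch; a rank-based or library implementation would be preferable for large test sets.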
Similar Works
The “Golden Age” of Probiotics: A Systematic Review and Meta-Analysis of Randomized and Observational Studies in Preterm Infants
2017 · 24,490 citations
Epidemiology and causes of preterm birth
2008 · 7,739 citations
National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications
2012 · 4,636 citations
Global, regional, and national estimates of levels of preterm birth in 2014: a systematic review and modelling analysis
2018 · 3,126 citations
Antenatal corticosteroids for accelerating fetal lung maturation for women at risk of preterm birth
2017 · 2,770 citations