This is an overview page with metadata for this scientific article. The full article is available from the publisher.

Auto-METRICS: LLM-assisted scientific quality control for radiomics research

2025 · 3 citations · European Journal of Radiology · Open Access

Abstract

PURPOSE: The quality of radiomics research is critical for reliable clinical translation, yet methodological flaws remain prevalent. This study evaluates whether large language models (LLMs) can reliably assess the methodological quality of radiomics studies using the METhodological RadiomICs Score (METRICS).

METHODS: We compared METRICS assessments for 46 articles produced by a commercial cloud-based LLM (Gemini Flash 2.0) with those of radiologists from two reproducibility studies (ADA2025 and K2025, with 6 radiologist groups and 3 radiologists, respectively, with varying degrees of experience). Cohen's kappa (κ), Pearson's correlation (PC) of METRICS scores, and error rates between LLMs and human raters were evaluated. Prompt clarifications to METRICS were suggested to improve human-LLM agreement. Twenty-four privacy-preserving open LLMs were compared with Gemini Flash 2.0.

RESULTS: In ADA2025, the commercial LLM achieved inter-rater agreement with human raters comparable to that between human raters (average κ = 0.48 vs. average κ = 0.48, respectively, Wilcoxon rank-sum test p = 0.41), leading to similar correlation values in METRICS scoring (average PC = 0.62 vs. average PC = 0.56, Wilcoxon rank-sum test p = 0.11). This was confirmed with K2025 (mean human-LLM κ = 0.58 vs. human-human κ = 0.57, Wilcoxon rank-sum test p = 0.28), with no evidence of a difference in correlation (PC = 0.68 vs. PC = 0.51, respectively, Wilcoxon rank-sum test p = 0.55). Phi4-Reasoning, an open model that can be run locally, performed comparably to Gemini Flash 2.0 (median ranking = 1 vs. median ranking = 3, respectively, across all raters).

CONCLUSION: LLMs can assist in standardized radiomics quality assessment. Open privacy-preserving models can offer comparable performance to commercial cloud-based LLMs, suggesting their utility in supporting human raters for evaluating radiomics research integrity.
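The two agreement measures named in the abstract can be illustrated with a minimal sketch. It assumes that each METRICS item is recorded as a categorical label per rater and that each article receives one numeric METRICS score per rater. The function names `cohen_kappa` and `pearson_r` and all input data are hypothetical and are not taken from the study. The sketch uses the textbook definitions, which may differ in detail from the authors' analysis pipeline.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined (0/0), so this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical item-level ratings (1 = criterion met, 0 = not met).
    llm_items = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
    human_items = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    print("kappa:", round(cohen_kappa(llm_items, human_items), 3))

    # Hypothetical per-article METRICS scores from the two raters.
    llm_scores = [55.0, 62.5, 40.0, 71.0, 48.5]
    human_scores = [50.0, 65.0, 45.0, 68.0, 52.0]
    print("pearson:", round(pearson_r(llm_scores, human_scores), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is generally preferred to simple percent agreement when label frequencies are unbalanced.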

Topics

Radiomics and Machine Learning in Medical Imaging · Artificial Intelligence in Healthcare and Education · Radiology practices and education
