This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Explainable machine learning for long-term cardiovascular disease risk prediction in Chinese middle-aged and older adults: a 9-year longitudinal cohort study with web-based risk calculator
Citations: 0
Authors: 5
Year: 2026
Abstract
Cardiovascular disease is the leading cause of mortality in China, accounting for over 40% of all deaths. Existing risk prediction models are derived predominantly from Western populations and are therefore suboptimally calibrated for middle-aged and older Chinese adults. Conventional statistical approaches inadequately capture non-linear associations in high-dimensional data, whereas machine learning models, despite superior performance, lack interpretability. This study used a nationally representative cohort to develop an interpretable, machine learning-based tool for long-term cardiovascular risk prediction tailored to the Chinese population.

The objectives were to compare the predictive performance of ten machine learning algorithms using data from the China Health and Retirement Longitudinal Study (CHARLS), identify the optimal model, interpret it transparently with SHapley Additive exPlanations (SHAP), and develop an individualized cardiovascular risk assessment tool for Chinese residents aged 45 years and above.

The study enrolled 8,080 participants aged ≥ 45 years without cardiovascular disease at baseline from the CHARLS 2011-2020 longitudinal dataset, with 9 years of follow-up. The primary outcome was incident cardiovascular disease. From 77 candidate variables, logistic regression analysis identified 11 predictors: geographical region, hypertension, dyslipidaemia, liver disease, asthma, depression score, age, sleep duration, triglycerides, high-density lipoprotein cholesterol, and waist circumference. The cohort was randomly partitioned into training (n = 5,657, 70%) and validation (n = 2,423, 30%) sets. Ten predictive models were constructed, including random forest, gradient boosting machine, and extreme gradient boosting. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), calibration plots, and decision curve analysis.
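Two of the evaluation criteria named above can be sketched in a few lines. The following is a minimal illustration, not the study's code: it computes AUROC through its rank (Mann-Whitney) interpretation, the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, and the decision-curve net benefit using the standard formula NB = TP/n - FP/n * pt/(1 - pt) at threshold probability pt. The function names and the toy outcomes and risk scores are hypothetical.

```python
"""Hypothetical sketch: AUROC and decision-curve net benefit for a binary risk model.

Assumptions (not from the study): outcomes are 0/1 lists, predicted risks are
floats in [0, 1], and the toy data in the usage block are invented.
"""


def auroc(y_true, y_score):
    """AUROC as P(score of a random case > score of a random non-case); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0, 1]                # invented outcomes
    s = [0.9, 0.2, 0.6, 0.6, 0.1, 0.4]    # invented predicted risks
    print(f"{auroc(y, s):.3f}")  # 0.833
    for pt in (0.1, 0.25, 0.5):
        # Compare the model with a "treat everyone" strategy (all risks set to 1).
        print(pt, f"{net_benefit(y, s, pt):.3f}", f"{net_benefit(y, [1.0] * len(y), pt):.3f}")
```

In a decision curve, the model's net benefit is plotted against pt alongside the treat-all curve and the treat-none line (net benefit 0); a model adds clinical value over the threshold range where its curve lies above both.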
Feature contributions were quantified using SHAP values.

Incident cardiovascular disease occurred in 1,246 participants (22.0%) in the training cohort. Multivariable analysis identified hypertension (adjusted OR 1.80), waist circumference (adjusted OR 1.05 per 1-cm increment), dyslipidaemia (adjusted OR 1.42), and liver disease (adjusted OR 1.60) as the principal independent predictors. Among the ten algorithms evaluated, random forest performed best, with a validation-set AUROC of 0.829 (95% CI 0.809-0.848), accuracy of 0.770, sensitivity of 0.681, and specificity of 0.795. The model exhibited excellent calibration and yielded maximal net clinical benefit across risk thresholds from 10% to 85%. SHAP analysis identified waist circumference as the predominant contributor, followed by triglycerides, age, and hypertension. Psychobehavioural factors (depression, sleep duration) showed independent predictive value. A web-based risk calculator was developed that provides real-time estimates of an individual's 9-year probability of cardiovascular disease.

The random forest model accurately predicts cardiovascular disease risk in middle-aged and older Chinese adults, with waist circumference emerging as the most important predictor. Implemented as an online assessment tool, the model facilitates community-based screening and individualized prevention, offering a pragmatic risk stratification approach for resource-constrained settings.
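Parts of these results can be made concrete with a short sketch. SHAP attributions are Shapley values: each feature's contribution is its weighted average marginal effect over all subsets of the other features, and the contributions sum to the difference between the prediction for an individual and the prediction for a reference input. The code below computes them exactly for a hypothetical three-feature score (waist circumference, age, hypertension) with made-up weights and a made-up reference person; it is not the study's random forest, and filling absent features with reference values is only one of several conventions used by SHAP implementations. It also runs two arithmetic checks on the reported figures. Accuracy should equal prevalence * sensitivity + (1 - prevalence) * specificity; this assumes the validation prevalence is close to the 22.0% observed in the training cohort, since the validation prevalence is not stated here. Under the usual log-linear logistic model, an adjusted OR of 1.05 per 1 cm corresponds to roughly 1.05^10 ≈ 1.63 per 10 cm, an approximation because 1.05 is itself rounded.

```python
"""Hypothetical sketch: exact Shapley values for a toy risk score, plus two
arithmetic checks on figures quoted in the abstract.

Assumptions: toy_score, its weights, and the reference values are invented and
are NOT the study's model. Absent features take reference ("baseline") values.
"""
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature, enumerating every coalition."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for coalition in combinations(others, size):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in coalition or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_score(features):
    """Hypothetical additive score with one hypertension-by-age interaction."""
    waist_cm, age, hypertension = features
    return 0.03 * waist_cm + 0.02 * age + 0.5 * hypertension + 0.01 * hypertension * age


def expected_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity and outcome prevalence."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


def scale_odds_ratio(or_per_unit, units):
    """Odds ratio for a `units`-sized increment under a log-linear logistic model."""
    return or_per_unit ** units


if __name__ == "__main__":
    x = [95.0, 60.0, 1.0]          # hypothetical individual
    baseline = [85.0, 55.0, 0.0]   # hypothetical reference person
    phi = shapley_values(toy_score, x, baseline)
    print([round(v, 3) for v in phi])  # [0.3, 0.125, 1.075]
    # Local accuracy: attributions sum to f(x) - f(baseline).
    print(round(sum(phi), 3), round(toy_score(x) - toy_score(baseline), 3))
    print(round(expected_accuracy(0.681, 0.795, 0.220), 3))  # 0.77
    print(round(scale_odds_ratio(1.05, 10), 3))  # 1.629
```

The interaction term splits its effect between age and hypertension, which is why hypertension receives more than its main-effect weight. Exact enumeration needs a number of model evaluations that grows exponentially with the number of features, so it is only practical for a handful of them; tree-specific algorithms such as TreeSHAP exist for ensembles like random forests.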
Similar works
"Why Should I Trust You?" (2016, 14,432 citations)
A Comprehensive Survey on Graph Neural Networks (2020, 8,749 citations)
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead (2019, 8,288 citations)
High-performance medicine: the convergence of human and artificial intelligence (2018, 7,726 citations)
Artificial intelligence in healthcare: past, present and future (2017, 4,449 citations)