Cardiovascular Disease Prediction Using Supervised Ensemble Machine Learning and Shapley Values

(1) Milpitas High School, (2) University of Oxford

https://doi.org/10.59720/23-257

Cardiovascular disease (CVD) is the leading cause of death globally. Lack of awareness of the symptoms of coronary heart disease (CHD), a type of CVD, can increase vulnerability to a heart attack or cardiac arrest, making early diagnosis and treatment of CHD imperative. Predictive modeling of clinical data has seen exponential growth over the past decade. Enhancing traditional prognostic capacity with predictive modeling presents a promising and viable approach for doctors to predict the risk of CVD. This research evaluated multiple machine learning and deep learning algorithms for predicting the onset of CVD. We hypothesized that supervised machine learning models with feature interpretability and ensemble learning could be deployed using clinical diagnosis data for reasonably accurate cardiovascular disease prediction. We observed that the smaller CVD dataset had a class imbalance problem, which we mitigated by applying the adaptive synthetic (ADASYN) sampling technique to improve model performance. This study demonstrated that boosting algorithms can be deployed efficiently on small or large clinical datasets to predict diseases more accurately. The results indicated that while deep learning performs better on larger unstructured datasets, it is less efficient on tabular data; ensemble boosting models outperformed the other supervised machine learning and deep learning models, achieving 74% prediction accuracy. Shapley values were used to identify the risk factors that contributed most to the XGBoost classification decisions, demonstrating the high impact of systolic blood pressure and age on CVD, which aligned with findings from clinical research.
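
To illustrate how Shapley values attribute a prediction to individual risk factors, the following is a minimal sketch that computes exact Shapley values for a single patient using only the Python standard library. It is not the study's pipeline: the `toy_risk` scoring function, its coefficients, the feature names, and the patient and baseline values are all hypothetical stand-ins for a trained model such as XGBoost, chosen only so the arithmetic is easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features, baseline):
    """Exact Shapley values for one sample, measured against a baseline sample."""
    names = list(features)
    n = len(names)

    def evaluate(subset):
        # Features in `subset` take the sample's values; the rest stay at baseline.
        mixed = {k: (features[k] if k in subset else baseline[k]) for k in names}
        return value_fn(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of `name` when added to coalition `s`.
                total += weight * (evaluate(s | {name}) - evaluate(s))
        phi[name] = total
    return phi


def toy_risk(x):
    """Hypothetical risk score (an assumption, not the paper's model)."""
    score = 0.02 * (x["systolic_bp"] - 120) + 0.03 * (x["age"] - 50)
    score += 0.001 * (x["cholesterol"] - 200)
    # Interaction term: extra risk when both blood pressure and age are high.
    if x["systolic_bp"] > 140 and x["age"] > 60:
        score += 0.5
    return score


# Hypothetical patient and reference ("average") patient.
patient = {"systolic_bp": 160, "age": 65, "cholesterol": 220}
baseline = {"systolic_bp": 120, "age": 50, "cholesterol": 200}

phi = shapley_values(toy_risk, patient, baseline)
for feature, contribution in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{feature}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that makes Shapley values useful for ranking risk factors. Exact enumeration grows exponentially with the number of features, so for real models an approximation (for example, the tree-specific algorithms in the `shap` package) would normally be used instead.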
