Abstract 11985: Prediction of Atherosclerotic Cardiovascular Disease Risk Using Machine Learning and Electronic Health Record Data
Background: Risk assessment is the cornerstone for guiding atherosclerotic cardiovascular disease (ASCVD) treatment decisions. Machine learning methods and electronic health record (EHR) data offer great promise in the development of novel risk prediction models.
Methods: We used EHR data from a community-based, outpatient healthcare system in Northern California from 2006 to 2018. Patients were free of prior ASCVD, were not on statins, and had at least 5 years of follow-up. We used random forests (RF) and gradient boosting models (GBM) to develop tree-based classifiers to predict ASCVD risk. We calculated risk with the pooled cohort equations (PCE) for patients whose data were complete and within the pre-specified ranges. We then gave our tree-based models access to additional patients with missing variables and to additional variables from the EHR (e.g., weight, education, medications), and compared model performance.
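The split between patients with complete, in-range PCE inputs and the remaining patients can be illustrated with a minimal sketch. The field names and the numeric ranges below are assumptions made for illustration (the ranges are those commonly cited for the PCE; the abstract does not list them), and only the continuous inputs are checked.

```python
# Assumed validity ranges for the continuous PCE inputs (not stated in the abstract).
PCE_RANGES = {
    "age": (40, 79),           # years
    "total_chol": (130, 320),  # mg/dL
    "hdl": (20, 100),          # mg/dL
    "sbp": (90, 200),          # mmHg
}


def pce_eligible(patient):
    """Return True if every checked input is present and inside its range."""
    for name, (low, high) in PCE_RANGES.items():
        value = patient.get(name)
        if value is None or not (low <= value <= high):
            return False
    return True


def split_cohort(patients):
    """Split patients into (PCE-eligible, all others)."""
    complete = [p for p in patients if pce_eligible(p)]
    rest = [p for p in patients if not pce_eligible(p)]
    return complete, rest


# Hypothetical toy records, not study data.
patients = [
    {"age": 55, "total_chol": 210, "hdl": 50, "sbp": 130},
    {"age": 62, "total_chol": None, "hdl": 45, "sbp": 140},  # missing value
    {"age": 35, "total_chol": 190, "hdl": 60, "sbp": 118},   # age out of range
]
complete, rest = split_cohort(patients)
print(len(complete), len(rest))
```

In this framing, the PCE comparison uses only the first group, while models that tolerate missing values can also be trained on the second.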
Results: Our study cohort consisted of 292,758 eligible patients with ≥5 years of follow-up; 2,022 had ASCVD events. Among the 148,141 patients with complete data for the PCE, the C-statistics of GBM and RF were statistically significantly higher than that of the PCE (Figure). When we trained on the complete patient cohort but restricted to variables used in the PCE, the C-statistics of recalibrated logistic regression, GBM, and RF all improved significantly to values between 0.812 and 0.817. With inclusion of additional EHR variables, GBM (C-statistic: 0.818 ± 0.016) and RF (C-statistic: 0.811 ± 0.016) had significantly better performance than that of recalibrated logistic regression (C-statistic: 0.774 ± 0.016).
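For readers unfamiliar with the metric, the C-statistic for a binary outcome is the probability that a randomly chosen patient with an event received a higher predicted risk than a randomly chosen patient without one. The sketch below computes it by direct pair counting; it assumes a binary event indicator and does not handle censoring, and the abstract does not say how the authors computed the statistic or its ± interval.

```python
def c_statistic(risks, events):
    """Concordance over all (event, non-event) pairs; ties count as 0.5."""
    cases = [r for r, e in zip(risks, events) if e]
    controls = [r for r, e in zip(risks, events) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted risks and outcomes, not study data.
risks = [0.02, 0.10, 0.05, 0.30, 0.12]
events = [0, 1, 0, 1, 0]
print(round(c_statistic(risks, events), 3))  # 0.833
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of events from non-events. The quadratic pair loop is fine for illustration but would be replaced by a rank-based computation on a cohort of this size.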
Conclusion: Cardiovascular risk prediction improved significantly when more patients and more variables were included and when machine learning algorithms were used. Machine learning models developed from such expanded EHR data may improve ASCVD risk assessment.