TY - JOUR
T1 - Machine Learning-Based Prediction of Mortality and Risk Factors in Patients With Chronic Kidney Disease Developed With Data From 10000 Patients Over 11 Years
AU - Ibeas, Jose
AU - Galles, Oscar
AU - Monill, Nuria
AU - Macias, Edwar
AU - Morell, Antoni
AU - Serrano, Javier
AU - Rexachs, Dolores
AU - Vicario, Jose
AU - Cokas, Jordi
AU - Martinez, Elisenda
PY - 2022/5/1
Y1 - 2022/5/1
N2 - Around the globe, over 850 million patients suffer from chronic kidney disease (CKD). CKD is associated with high mortality rates, in particular in patients undergoing renal replacement therapies (RRT) such as dialysis, reaching up to 10% […] and these patients are therefore considered to be of fragile status. CKD is also associated with cardiovascular complications, and the two conditions can aggravate each other. Available clinical guidelines identify certain risk factors and predictive models, but these have not been successfully tested and validated for renal patients; consequently, there is an unmet need for the identification of predictive factors and the prediction of mortality. This stems from the limitations of current methodologies and statistics: current models simplify complex relationships by assuming a linear relation between risk factors and certain events, so a new approach is needed. In recent years, the rise of Artificial Intelligence and Machine Learning (ML) has presented an alternative for the first time. This project aimed to study the performance of different ML algorithms for the prediction of mortality and the identification of risk factors in CKD patients.
Design: Retrospective analysis of a historical cohort from the Register of Renal Patients of Catalonia (RMRC) and the Catalan Agency for Health Quality and Evaluation. Group of 10,473 patients with CKD, from stage 1 to RRT. Follow-up of 11 years, from January 2010 to December 2020. Inclusion criteria: >18 years. Training of an Extreme Gradient Boosting model, and comparison with other algorithms for the prediction of mortality at different times, using different follow-up periods for each patient.
Methodology: Variables: i) age, gender, body mass index, time to death (9); ii) diagnoses (ICD-9/10) (26); iii) laboratory variables (37); iv) all pharmacological treatments (46). For all executions, data were balanced using the SMOTETomek technique.
Analysis: The patient sample presented a mean age of 68.2 ± 12.9 years and 65.8% […] 4.2% […] follow-up and time windows were tested, and the best results were obtained when using a 2-year follow-up period and a 4-year mortality prediction. The Area Under the Curve (AUC) values obtained for each model were: XGBClassifier (0.89), LGBMClassifier (0.90), CatBoostClassifier (0.91). The 10 most relevant variables according to the XGBClassifier (54.65% […]1 variables), in this order, are: cardiopathy, advanced chronic kidney disease, vasculopathy, age, neoplasia, transplant, digestive pathology, estimated glomerular filtration rate, high blood pressure. The results presented in the Figure and Table correspond to the mean obtained over the 5 folds of the cross-validation.
Table 1. Metrics by algorithm.
Algorithm           Accuracy  Sensitivity  Specificity  PPV    NPV    AUC
XGBClassifier       0.805     0.817        0.793        0.818  0.791  0.886
LGBMClassifier      0.815     0.822        0.810        0.825  0.807  0.898
CatBoostClassifier  0.820     0.809        0.833        0.838  0.803  0.905
Machine Learning techniques offer an alternative to classical statistical methods, with a high predictive capacity for mortality. Generating algorithms from real-world data can allow the mortality risk, as well as the predictive factors, to be individualized.
U2 - 10.1093/ndt/gfac070.077
DO - 10.1093/ndt/gfac070.077
M3 - Article
SN - 0931-0509
VL - 37
SP - i332
JO - Nephrology Dialysis Transplantation
JF - Nephrology Dialysis Transplantation
IS - Supplement_3
ER -