PREDICTING DIABETES MELLITUS USING CATBOOST CLASSIFIER AND SHAPLEY ADDITIVE EXPLANATION (SHAP) APPROACH

  • Novia Permatasari BPS-Statistics Indonesia
  • Shafiyah Asy Syahidah BPS-Statistics Indonesia
  • Aldo Leofiro Irfiansyah Statistics Sula Islands District
  • M. Ghozy Al-Haqqoni BPS-Statistics Indonesia
Keywords: Machine Learning, Classification, CatBoost, SHAP Value, Diabetes Mellitus

Abstract

Diabetes mellitus, a metabolic disease characterized by hyperglycemia, can be dangerous if it is not managed properly. Early detection of existing symptoms can reduce the impact of delayed treatment. This study aims to detect patients with diabetes mellitus early, using a machine learning approach applied to data from MIT’s GOSSIS (Global Open Source Severity of Illness Score). Using Shapley Additive Explanation (SHAP), which ranks features by how strongly they drive the classification, this study finds that 14 features contribute significantly to the CatBoost classifier. The feature ‘d1_glucose_max’, the patient's highest serum or plasma glucose concentration during the first 24 hours of their unit stay, has the highest impact on classifying diabetes mellitus patients, followed by age and the APACHE glucose value. A classifier built on the selected features achieves a validation AUC of 86.86%.
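The two quantities the abstract relies on, per-feature Shapley attributions and the AUC, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: a hypothetical hand-written scoring function (`toy_score`, with invented coefficients) stands in for the trained CatBoost model, the Shapley values are computed by brute-force subset enumeration rather than by the SHAP library, and the four patients and their labels are made up. Only the feature names echo the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def toy_score(features):
    """Hypothetical risk score; the coefficients are invented for illustration."""
    glucose_max, age = features
    return 0.01 * glucose_max + 0.005 * age + 0.00002 * glucose_max * age


# Made-up patients as (d1_glucose_max, age), with made-up diabetes labels.
patients = [(250, 60), (110, 35), (130, 45), (200, 30)]
labels = [1, 0, 1, 0]

baseline = [120, 40]  # assumed reference patient
phi = shapley_values(toy_score, list(patients[0]), baseline)
score_auc = auc(labels, [toy_score(p) for p in patients])

print("Shapley values (d1_glucose_max, age):", phi)
print("AUC on the toy data:", score_auc)
```

The attributions for a sample sum to the difference between that sample's score and the baseline score, which is the additivity property that lets SHAP rank features by their contribution. Exact enumeration grows exponentially with the number of features, so real tree models use a dedicated algorithm instead.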

Published
2022-06-01
How to Cite
[1]
N. Permatasari, S. Asy Syahidah, A. Leofiro Irfiansyah, and M. G. Al-Haqqoni, “PREDICTING DIABETES MELLITUS USING CATBOOST CLASSIFIER AND SHAPLEY ADDITIVE EXPLANATION (SHAP) APPROACH”, BAREKENG: J. Math. & App., vol. 16, no. 2, pp. 615-624, Jun. 2022.