INTEGRATION OF SVM AND SMOTE-NC FOR CLASSIFICATION OF HEART FAILURE PATIENTS

Dina Tri Utari

doi:10.30598/barekengvol17iss4pp2263-2272

Dina Tri Utari, Department of Statistics, Faculty of Mathematics and Natural Sciences, Indonesia Islamic University, Indonesia


Keywords: Imbalanced data, SMOTE-NC, SVM, Classification, Heart failure

Abstract

SMOTE-NC (SMOTE for Nominal and Continuous features) is a variation of the original SMOTE (Synthetic Minority Over-sampling Technique) algorithm, designed to handle imbalanced datasets that contain both continuous and nominal features. The primary difference between the two lies in how synthetic examples for the minority class are generated: the original SMOTE assumes continuous features, whereas SMOTE-NC also handles nominal ones. We employed a dataset of heart failure patients comprising continuous and nominal features. The distribution of patients' statuses, deceased or alive, was imbalanced. To address this, we balanced the data using SMOTE-NC before conducting the classification analysis with SVM. The combination of SVM and SMOTE-NC gave better results than SVM alone, as shown by higher accuracy and a higher F1 score. The F1 score is less sensitive to class imbalance than accuracy. When the number of instances differs significantly between classes, the F1 score can therefore be a more informative metric for evaluating a classifier's performance, especially when the minority class is of interest.
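The point about accuracy and the F1 score can be illustrated with a small worked example. The sketch below uses hypothetical counts (100 patients, 10 of them deceased) that are not taken from the study, and the function names are illustrative only. It compares a classifier that always predicts the majority class with one that detects most of the minority class at the cost of some false alarms.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred, positive=1):
    """Fraction of all predictions that are correct."""
    tp, _, _, tn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 1 = deceased (minority), 0 = alive (majority).
y_true = [1] * 10 + [0] * 90

# Classifier A: always predicts the majority class.
pred_majority = [0] * 100

# Classifier B: finds 8 of the 10 minority cases, with 12 false alarms.
pred_balanced = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

for name, pred in [("A (majority only)", pred_majority),
                   ("B (detects minority)", pred_balanced)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}, "
          f"F1={f1_score(y_true, pred):.2f}")
```

In this hypothetical setting, classifier A reaches an accuracy of 0.90 while its F1 score for the minority class is 0.00, and classifier B has a lower accuracy of 0.86 but an F1 score of about 0.53. Accuracy alone would favor the classifier that never identifies a deceased patient.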


