INTEGRATION OF SVM AND SMOTE-NC FOR CLASSIFICATION OF HEART FAILURE PATIENTS
Abstract
SMOTE (Synthetic Minority Over-sampling Technique) balances an imbalanced dataset by generating synthetic examples of the minority class, but it assumes continuous features. SMOTE-NC (SMOTE for Nominal and Continuous features) is a variant designed for datasets that mix continuous and nominal features. We used a dataset of heart failure patients comprising both continuous and nominal features, in which the distribution of patient status (deceased or alive) was imbalanced. To address this, we balanced the data with SMOTE-NC before classifying with a support vector machine (SVM). The combination of SVM and SMOTE-NC gave better results than SVM alone, as shown by higher accuracy and F1 score. The F1 score is less sensitive to class imbalance than accuracy; when the number of instances differs significantly between classes, it can be a more informative measure of a classifier's performance, especially when the minority class is of interest.
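The point about F1 and accuracy under class imbalance can be illustrated with a small worked example. This is only a sketch: the confusion-matrix counts below are hypothetical and are not results from the study, and the function names are chosen here for illustration. The minority class ("deceased") is treated as the positive class.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class.

    Uses the equivalent form 2*TP / (2*TP + FP + FN) and returns 0.0
    when no positive is predicted or present.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical cohort: 90 alive (negative), 10 deceased (positive).

# Classifier A labels every patient "alive": it never finds a deceased patient.
a = {"tp": 0, "fp": 0, "fn": 10, "tn": 90}

# Classifier B finds 8 of the 10 deceased patients at the cost of 12 false alarms.
b = {"tp": 8, "fp": 12, "fn": 2, "tn": 78}

for name, cm in (("A", a), ("B", b)):
    acc = accuracy(cm["tp"], cm["fp"], cm["fn"], cm["tn"])
    f1 = f1_score(cm["tp"], cm["fp"], cm["fn"])
    print(f"classifier {name}: accuracy={acc:.2f}, F1={f1:.2f}")
```

With these assumed counts, classifier A has the higher accuracy even though it never identifies a minority-class patient, while its F1 score is zero; classifier B has lower accuracy but a clearly higher F1 score.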
Copyright (c) 2023 Dina Tri Utari
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.