ENSEMBLE RESAMPLING SUPPORT VECTOR MACHINE, MULTINOMIAL REGRESSION TO MULTICLASS IMBALANCED DATA
Abstract
Imbalanced data is a common problem in classification analysis. It causes prediction errors that reduce sensitivity, particularly for the minority class. Resampling techniques can mitigate class imbalance, and ensemble approaches can further improve classification performance. This study assesses the bagging ensemble approach combined with ADASYN resampling as a way of addressing the problem. Three datasets are used: imbalanced Glass Identification data, imbalanced Iris data, and imbalanced synthetic data. The study focuses on Support Vector Machines (SVM), with parameters tuned by repeated cross-validation (k = 10), and on multinomial regression. The ensemble technique and multinomial regression are compared before and after resampling, using accuracy, sensitivity, and specificity as evaluation metrics. Across the three datasets, the results suggest that ensemble SVM and multinomial regression applied to resampled data outperform the same methods applied to non-resampled data. Resampling was observed to improve sensitivity, particularly for the minority class.
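The three evaluation metrics named above can be illustrated with a small worked example. The sketch below is not the authors' code: it assumes that per-class sensitivity and specificity are computed one-vs-rest, which the abstract does not state, and the function name `per_class_metrics` and the toy labels are hypothetical. It shows how overall accuracy can look acceptable while sensitivity for the minority class stays low.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and one-vs-rest sensitivity/specificity per class."""
    labels = sorted(set(y_true) | set(y_pred))
    # Count every (true label, predicted label) pair; missing pairs count as 0.
    pairs = Counter(zip(y_true, y_pred))
    n = len(y_true)
    accuracy = sum(pairs[(c, c)] for c in labels) / n

    report = {}
    for c in labels:
        tp = pairs[(c, c)]                                  # class c predicted as c
        fn = sum(pairs[(c, p)] for p in labels if p != c)   # class c predicted as another class
        fp = sum(pairs[(t, c)] for t in labels if t != c)   # another class predicted as c
        tn = n - tp - fn - fp                               # everything else
        report[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return accuracy, report


# Hypothetical toy labels: class "c" is the minority class (2 of 10 samples).
y_true = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
y_pred = ["a", "a", "a", "a", "b", "b", "b", "a", "a", "c"]

accuracy, report = per_class_metrics(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

In this toy case the classifier is right on 7 of 10 samples, yet it recovers only one of the two minority-class samples, so sensitivity for class "c" is 0.5 while its specificity is 1.0. This is the kind of gap that resampling is intended to narrow.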
Copyright (c) 2024 Laila Qadrini, Hikmah Hikmah, Elviani Tande, Ignasius Presda, Aulia Atika Maghfirah, Nilawati Nilawati, Handayani Handayani
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution license that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g. with an acknowledgement of its initial publication in this journal).
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.