ENSEMBLE RESAMPLING SUPPORT VECTOR MACHINE, MULTINOMIAL REGRESSION TO MULTICLASS IMBALANCED DATA

  • Laila Qadrini, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Hikmah Hikmah, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Elviani Tande, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Ignasius Presda, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Aulia Atika Maghfirah, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Nilawati Nilawati, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Handayani Handayani, Statistics Department, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
Keywords: Bagging, ADASYN, SVM, Multinomial Regression

Abstract

Imbalanced data is a common problem in classification analysis. It leads to prediction errors that reduce sensitivity, particularly for the minority class. Resampling techniques can be used to mitigate class imbalance, and ensemble approaches can be used in the classification procedure to improve classification performance. This study assesses the bagging ensemble approach combined with ADASYN (Adaptive Synthetic sampling) as a way of addressing this problem. The data used are an imbalanced Glass Identification dataset, an imbalanced Iris dataset, and an imbalanced synthetic dataset. The study focuses on Support Vector Machines (SVM), with parameters optimized by repeated k-fold cross-validation (k = 10), and on multinomial regression. Classification results of the ensemble technique and of multinomial regression are compared before and after resampling, using accuracy, sensitivity, and specificity as evaluation metrics. Across the three datasets, the results suggest that the ensemble resampling SVM and multinomial regression approaches outperform the ensemble SVM and multinomial regression approaches applied to non-resampled data. Resampling the data was observed to improve sensitivity, particularly for the minority class.
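The abstract names ADASYN as the resampling step and accuracy, sensitivity, and specificity as the evaluation metrics, but gives no implementation detail and does not say which software was used. The NumPy sketch below is an illustration under assumptions, not the authors' code. It implements ADASYN in the one-vs-rest form often used for multiclass data: every class smaller than the largest one is oversampled up to the largest class size (beta = 1), and more synthetic points are generated around samples whose k nearest neighbours belong to other classes. It also computes overall accuracy plus one-vs-rest sensitivity and specificity for each class. The function names `adasyn`, `_knn` and `per_class_metrics`, the default k = 5, the fixed seed, and the largest-remainder rounding of the per-sample counts are all assumptions of this sketch.

```python
import numpy as np


def _knn(query, pool, k, self_idx=None):
    """Indices of the k nearest rows of `pool` for each row of `query` (Euclidean)."""
    d = np.linalg.norm(query[:, None, :] - pool[None, :, :], axis=2)
    if self_idx is not None:
        # Exclude each query point from its own neighbour list.
        d[np.arange(len(query)), self_idx] = np.inf
    return np.argsort(d, axis=1)[:, :k]


def adasyn(X, y, k=5, seed=0):
    """One-vs-rest ADASYN sketch: grow every class to the size of the largest class."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n_c in zip(classes, counts):
        G = int(target - n_c)  # number of synthetic samples needed for class c
        if G == 0 or n_c < 2:  # already largest, or too few points to interpolate
            continue
        idx_c = np.flatnonzero(y == c)
        Xc = X[idx_c]
        # r_i: share of the k nearest neighbours (whole data set) that are NOT class c,
        # i.e. how "hard to learn" the region around x_i is.
        nn_all = _knn(Xc, X, min(k, len(X) - 1), self_idx=idx_c)
        r = (y[nn_all] != c).mean(axis=1)
        r = r / r.sum() if r.sum() > 0 else np.full(n_c, 1.0 / n_c)
        # g_i = r_i * G, rounded by largest remainder so the g_i sum exactly to G
        # (an assumption of this sketch; other implementations simply round).
        raw = r * G
        g = np.floor(raw).astype(int)
        g[np.argsort(raw - g)[::-1][: G - g.sum()]] += 1
        # Interpolate from x_i toward randomly chosen same-class neighbours.
        nn_c = _knn(Xc, Xc, min(k, n_c - 1), self_idx=np.arange(n_c))
        for i in np.flatnonzero(g):
            j = rng.choice(nn_c[i], size=g[i])
            lam = rng.random((g[i], 1))
            X_parts.append(Xc[i] + lam * (Xc[j] - Xc[i]))
            y_parts.append(np.full(g[i], c))
    return np.vstack(X_parts), np.concatenate(y_parts)


def per_class_metrics(y_true, y_pred):
    """Overall accuracy plus one-vs-rest (sensitivity, specificity) for each class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    per_class = {}
    for c in np.unique(y_true):
        pos, hit = y_true == c, y_pred == c
        tp, fn = np.sum(pos & hit), np.sum(pos & ~hit)
        tn, fp = np.sum(~pos & ~hit), np.sum(~pos & hit)
        per_class[c.item()] = (float(tp / (tp + fn)), float(tn / (tn + fp)))
    return accuracy, per_class


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical toy data with 60 / 15 / 6 samples per class (not the study's data).
    X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2, 1, (15, 2)),
                   rng.normal(-2, 1, (6, 2))])
    y = np.repeat([0, 1, 2], [60, 15, 6])
    X_res, y_res = adasyn(X, y)
    print(np.unique(y_res, return_counts=True))
```

In a pipeline like the one the abstract describes, the resampled training data would then be passed to a bagged SVM and to multinomial logistic regression. In Python these roles are commonly filled by scikit-learn's `BaggingClassifier` wrapping `SVC`, `LogisticRegression`, and `RepeatedStratifiedKFold` for repeated 10-fold tuning, and the imbalanced-learn package provides a tested `ADASYN`. Resampling should be applied to the training folds only, so that synthetic points never appear in the data used for evaluation.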

Published: 2024-03-01

How to Cite:
L. Qadrini et al., “ENSEMBLE RESAMPLING SUPPORT VECTOR MACHINE, MULTINOMIAL REGRESSION TO MULTICLASS IMBALANCED DATA”, BAREKENG: J. Math. & App., vol. 18, no. 1, pp. 269-280, Mar. 2024.