EVALUATION OF MULTIVARIATE ADAPTIVE REGRESSION SPLINES ON IMBALANCED DATASET FOR POVERTY CLASSIFICATION IN BENGKULU PROVINCE

  • Idhia Sriliana, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of Bengkulu, Indonesia. ORCID: https://orcid.org/0000-0003-3926-4129
  • Sigit Nugroho, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of Bengkulu, Indonesia. ORCID: https://orcid.org/0000-0003-4535-2045
  • Winalia Agwil, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of Bengkulu, Indonesia. ORCID: https://orcid.org/0009-0005-5893-3879
  • Esther Damayanti Sihombing, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of Bengkulu, Indonesia. ORCID: https://orcid.org/0009-0004-9301-3683
Keywords: Classification, Class Imbalance, MARS, Poverty

Abstract

Classification is a statistical method for predicting the class of an object whose class label is unknown. Multivariate Adaptive Regression Splines (MARS) classification is a model built from several basis functions of the influential predictor variables. The MARS classification model is generally effective in classifying imbalanced data, including poverty data. In this study, the response variable is household poverty status (poor or non-poor), and the predictor variables are several poverty indicators. A problem that often arises in classification is class imbalance in the response variable. Because poverty status is a class-imbalanced response, the Bootstrap Aggregating (Bagging) and Synthetic Minority Over-sampling Technique (SMOTE) approaches are used to improve the classification accuracy of the MARS model. Bagging works by drawing bootstrap replicates of the data to stabilize classification accuracy, while SMOTE works by synthesizing new observations from the minority class. The evaluation showed that, for poverty classification in Bengkulu Province, the SMOTE-MARS method gave the best performance based on sensitivity: 85.36%, compared with 25.81% for MARS and 32.26% for Bagging-MARS.
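
The two ideas the abstract relies on, SMOTE oversampling and sensitivity as the evaluation measure, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `smote` and `sensitivity`, the neighbour count `k`, the toy points, and the coding of poor households as class 1 are all assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority point, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def sensitivity(y_true, y_pred, positive=1):
    """Sensitivity (true positive rate): TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


# Hypothetical minority-class (poor household) observations, two indicators each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)

# Hypothetical labels (assumed coding: 1 = poor, 0 = non-poor).
# Two of the three poor households are detected.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
rate = sensitivity(y_true, y_pred)
```

Sensitivity counts only the minority (poor) class, so a model that labels almost every household as non-poor can have high overall accuracy and still score very low on it. That is why it is a natural measure for comparing methods on class-imbalanced data.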

Published
2025-04-01
How to Cite
[1]
I. Sriliana, S. Nugroho, W. Agwil, and E. D. Sihombing, “EVALUATION OF MULTIVARIATE ADAPTIVE REGRESSION SPLINES ON IMBALANCED DATASET FOR POVERTY CLASSIFICATION IN BENGKULU PROVINCE”, BAREKENG: J. Math. & App., vol. 19, no. 2, pp. 1143-1156, Apr. 2025.