Classification of Poverty in Maluku Province using SMOTE-Random Forest Algorithm

Ferina L Damamain; Lexy Janzen Sinay; Sanlly J Latupeirissa; Lusye Bakarbessy

doi:10.30598/pijmathvol4iss1pp17-28

Ferina L Damamain Universitas Pattimura
Lexy Janzen Sinay Universitas Pattimura http://orcid.org/0000-0001-6311-8354
Sanlly J Latupeirissa Universitas Pattimura
Lusye Bakarbessy Universitas Pattimura

Keywords: Classification, Maluku Province, Poverty, Random Forest, SMOTE

Abstract

Poverty is a complex issue. According to BPS publications, the poverty rate in Indonesia reached 9.57% in 2023. Maluku is one of the provinces with a high poverty rate, at 16.23%. This research aims to classify poverty status in Maluku Province using the SMOTE-random forest algorithm. It uses SUSENAS 2022 data, in which the classes are imbalanced; SMOTE is applied to address this imbalance. The best model achieves an accuracy of 85.8%. It was obtained with 75% of the data used for training and 25% for testing, and with parameters m=4 and r=100. The most important factor influencing poverty status in Maluku Province is the number of households.
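The oversampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `smote` and `balance`, the neighbour count `k`, and the toy household features are assumptions introduced here for illustration only. The sketch follows the basic SMOTE idea of creating each synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature lists (minority class only).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class to match the majority count (binary case)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy training data (made up): 6 "non-poor" (0) and 3 "poor" (1) households.
X_train = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [1.1, 5.3], [1.3, 4.9], [1.0, 5.2],
           [3.0, 2.0], [3.2, 2.4], [2.8, 1.8]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = balance(X_train, y_train, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

In a workflow like the one described, oversampling would normally be applied only to the training portion (here, the 75% split), so that the 25% test set keeps the original class distribution. The balanced training set would then be passed to a random forest; reading m as the number of candidate variables per split and r as the number of trees is an assumption, since the abstract does not define these symbols.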

References

International Labour Organization, “World Employment Social Outlook,” 2019.

BPS, “Indonesia Poverty Profile in March 2023,” 2023.

Statistics Indonesia, Maluku Province, “Poverty Profile in Maluku September 2022,” Ambon, 2023.

BPS, “Indonesia Poverty Profile in September 2022,” 2023.

BPS, “Maluku Province in Figures 2019,” 2020.

Ministry of Health, “Indonesia Health Profile 2017,” Jakarta, 2018.

Statistics Indonesia, Maluku Province, “Maluku Province in Figures 2019,” Ambon, 2020.

J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Waltham, MA: Morgan Kaufmann Publishers, 2012.

L. Breiman, “Random Forests,” Machine Learning, vol. 45, pp. 5–32, 2001.

R. Supriyadi, W. Gata, N. Maulidah, and A. Fauzi, “Penerapan Algoritma Random Forest Untuk Menentukan Kualitas Anggur Merah,” E-Bisnis: Jurnal Ilmiah Ekonomi dan Bisnis, vol. 13, no. 2, pp. 67–75, 2020, doi: 10.51903/e-bisnis.v13i2.247.

A. Liaw and M. Wiener, “Classification and regression by randomForest,” R News, vol. 2, no. 3, pp. 18–22, 2002, [Online]. Available: http://www.stat.berkeley.edu/

F. Izzati, “Perbandingan Metode CHAID dan Random Forest,” Skripsi, 2022.

M. I. Putra, “Sistem Rekomendasi Kelayakan Kredit,” Skripsi, 2019.

L. Fadilah, Klasifikasi Random Forest Pada Data Imbalanced. 2018.

R. O’Brien and H. Ishwaran, “A Random Forests Quantile Classifier for Class Imbalanced Data,” Pattern Recognition, vol. 90, pp. 232–249, 2019.

N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.

S. Wang, Y. Dai, J. Shen, and J. Xuan, “Research on Expansion and Classification of Imbalanced Data Based on SMOTE Algorithm,” Scientific Reports, vol. 11, no. 24039, 2021.

D. L. Weller, T. M. T. Love, and M. Wiedmann, “Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models,” Frontiers in Environmental Science, vol. 9, p. 701288, 2021.

E. Ismail, W. Gad, and M. Hashem, “A Hybrid Stacking-SMOTE Model for Optimizing the Prediction of Autistic Genes,” BMC Bioinformatics, vol. 24, no. 379, 2023.

S. Fayz, M. A. Rizka, and F. Maghraby, “Cervical Cancer Diagnosis Using Random Forest Classifier with SMOTE and Feature Reduction Techniques,” IEEE Access, vol. 6, pp. 59475–59485, 2018.

C. Zhang, C. Liu, X. Zhang, and G. Almpanidis, “An up-to-date comparison of state-of-the-art classification algorithms,” Expert Systems with Applications, vol. 82, pp. 128–150, 2017.

J. Hayton, “Predictive Modeling Based on Random Forests,” in Predictive Modeling of Drug Sensitivity. Academic Press, 2017.

T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2009.

G. Biau and E. Scornet, “A random forest guided tour,” TEST, vol. 25, no. 2, pp. 197–227, 2016.

A. Cutler, D. R. Cutler, and J. R. Stevens, “Random Forest,” in Ensemble Machine Learning: Methods and Applications, New York: Springer, 2012, pp. 157–175.

M. Sandri and P. Zuccolotto, “Variable Selection Using Random Forests,” in Data Analysis, Classification and the Forward Search, Springer, 2006.

F. Gorunescu, Data Mining: Concepts, Models and Techniques. Berlin Heidelberg: Springer-Verlag, 2011.

P. Zhao, X. Su, T. Ge, and J. Fan, “Propensity Score and Proximity Matching Using Random Forest,” Contemporary Clinical Trials, 2016.

P. Probst, M. Wright, and A.-L. Boulesteix, “Hyperparameters and Tuning

A. Gholamy, V. Kreinovich, and O. Kosheleva, “Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation,” El Paso, 2018.

L. O. F. and N. Agustina, “Analisis Faktor-Faktor yang Memengaruhi Status Kemiskinan Ekstrem,” Seminar Nasional Official Statistics, 2023.

Suryana and K. Swarniati, “Eradicating Poverty And Human Capital Development In Indonesia: An Approach with Multilevel Logistic Regression Model,” Welfare: Jurnal Ilmu Kesejahteraan Sosial, vol. 10, no. 2, pp. 107–121, 2021.