Classification of Poverty in Maluku Province using SMOTE-Random Forest Algorithm
Abstract
Poverty is a complex issue. According to 2023 BPS publications, the poverty rate in Indonesia stood at 9.57%. Maluku is one of the provinces with a high poverty rate, at 16.23%. This research aims to classify poverty status in Maluku Province using the SMOTE-random forest algorithm. It uses SUSENAS 2022 data, in which the poverty-status classes are imbalanced; SMOTE is applied to address this imbalance. The best model, obtained with a 75% training and 25% testing split and parameters m=4 and r=100, achieves an accuracy of 85.8%. The most important factor influencing poverty status in Maluku Province is the number of households.
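To illustrate the oversampling step mentioned in the abstract, the following is a minimal pure-Python sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' implementation. The function names `smote` and `balance`, the toy data, and the choice k=2 are assumptions made for illustration only. The random forest step is not shown; reading m=4 as the number of candidate variables per split and r=100 as the number of trees is likewise an assumption, since the abstract does not define them. In common practice the oversampling is applied to the training portion only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class of a two-class problem to equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (made up): 8 "non-poor" (0) and 3 "poor" (1) households, 2 features.
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [2.5, 2.1],
     [3.0, 1.9], [3.5, 2.4], [4.0, 2.0], [4.5, 2.3],
     [6.0, 6.0], [6.5, 5.5], [7.0, 6.5]]
y = [0] * 8 + [1] * 3

X_bal, y_bal = balance(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

After balancing, both classes have the same number of samples, and every synthetic point lies between existing minority samples rather than duplicating one of them.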
References
International Labour Organization, “World Employment and Social Outlook,” 2019.
BPS, “Indonesia Poverty Profile in March 2023,” 2023.
Statistics Indonesia, Maluku Province, “Poverty Profile in Maluku September 2022,” Ambon, 2023.
BPS, “Indonesia Poverty Profile in September 2022,” 2023.
BPS, “Maluku Province in Figures 2019,” 2020.
Ministry of Health, “Indonesia Health Profile 2017,” Jakarta, 2018.
Statistics Indonesia, Maluku Province, “Maluku Province in Figures 2019,” Ambon, 2020.
J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Waltham, MA: Morgan Kaufmann Publishers, 2012.
L. Breiman, “Random Forests,” Machine Learning, vol. 45, pp. 5–32, 2001.
R. Supriyadi, W. Gata, N. Maulidah, and A. Fauzi, “Penerapan Algoritma Random Forest Untuk Menentukan Kualitas Anggur Merah,” E-Bisnis : Jurnal Ilmiah Ekonomi dan Bisnis, vol. 13, no. 2, pp. 67–75, 2020, doi: 10.51903/e-bisnis.v13i2.247.
A. Liaw and M. Wiener, “Classification and regression by randomForest,” R News, vol. 2, no. 3, pp. 18–22, 2002, [Online]. Available: http://www.stat.berkeley.edu/
F. Izzati, “Perbandingan Metode CHAID dan Random Forest,” Skripsi, 2022.
M. I. Putra, “Sistem Rekomendasi Kelayakan Kredit,” Skripsi, 2019.
L. Fadilah, Klasifikasi Random Forest Pada Data Imbalanced. 2018.
R. O’Brien and H. Ishwaran, “A Random Forests Quantile Classifier for Class Imbalanced Data,” Pattern Recognition, vol. 90, pp. 232–249, 2019.
N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
S. Wang, Y. Dai, J. Shen, and J. Xuan, “Research on Expansion and Classification of Imbalanced Data Based on SMOTE Algorithm,” Scientific Reports, vol. 11, no. 24039, 2021.
D. L. Weller, T. M. T. Love, and M. Wiedmann, “Comparison of Resampling Algorithms to Address Class Imbalance when Developing Machine Learning Models,” Frontiers in Environmental Science, vol. 9, p. 701288, 2021.
E. Ismail, W. Gad, and M. Hashem, “A Hybrid Stacking-SMOTE Model for Optimizing the Prediction of Autistic Genes,” BMC Bioinformatics, vol. 24, no. 379, 2023.
S. Fayz, M. A. Rizka, and F. Maghraby, “Cervical Cancer Diagnosis Using Random Forest Classifier with SMOTE and Feature Reduction Techniques,” IEEE Access, vol. 6, pp. 59475–59485, 2018.
C. Zhang, C. Liu, X. Zhang, and G. Almpanidis, “An up-to-date comparison of state-of-the-art classification algorithms,” Expert Systems with Applications, vol. 82, pp. 128–150, 2017.
J. Hayton, “Predictive Modeling Based on Random Forests,” in Predictive Modeling of Drug Sensitivity, Ranadip: Academic Press, 2017.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2009.
G. Biau and E. Scornet, “A random forest guided tour,” TEST, vol. 25, no. 2, pp. 197–227, 2016.
A. Cutler, D. R. Cutler, and J. R. Stevens, “Random Forests,” in Ensemble Machine Learning: Methods and Applications, New York: Springer, 2012, pp. 157–175.
M. Sandri and P. Zuccolotto, “Variable Selection Using Random Forests,” in Data Analysis, Classification and the Forward Search, Springer, 2006.
F. Gorunescu, Data Mining: Concepts, Models and Techniques. Berlin Heidelberg: Springer-Verlag, 2011.
P. Zhao, X. Su, T. Ge, and J. Fan, “Propensity Score and Proximity Matching Using Random Forest,” Contemporary Clinical Trials, 2016.
P. Probst, M. Wright, and A.-L. Boulesteix, “Hyperparameters and Tuning
A. Gholamy, V. Kreinovich, and O. Kosheleva, “Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation,” El Paso, 2018.
L. O. F. and N. Agustina, “Analisis Faktor-Faktor yang Memengaruhi Status Kemiskinan Ekstrem,” Seminar Nasional Official Statistics, 2023.
Suryana and K. Swarniati, “Eradicating Poverty And Human Capital Development In Indonesia: An Approach with Multilevel Logistic Regression Model,” Welfare: Jurnal Ilmu Kesejahteraan Sosial, vol. 10, no. 2, pp. 107–121, 2021.
Copyright (c) 2025 Ferina L Damamain, Lexy Janzen Sinay, Sanlly J Latupeirissa, Lusye Bakarbessy

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The author(s) hold the copyright of the published article without restriction. This policy means that the journal allows the author(s) to hold and retain publishing rights without restrictions. The journal editors are given the right to publish the article in accordance with the agreement signed by the author(s), which also includes a statement of originality of the article.