PERFORMANCE ANALYSIS OF RANDOM FOREST CLASSIFICATION ON UNEMPLOYMENT RATE IN MALUKU PROVINCE BASED ON DATA BALANCING METHOD

Mahdayani Putri Yunizar; Lexy Janzen Sinay; Yudistira Yudistira

doi:10.30598/variancevol7iss1page31-38

Mahdayani Putri Yunizar, Statistics Study Program, Faculty of Science and Technology, Pattimura University
Lexy Janzen Sinay, Statistics Study Program, Faculty of Science and Technology, Pattimura University
Yudistira Yudistira, Statistics Study Program, Faculty of Science and Technology, Pattimura University

Keywords: ADASYN, Classification, Random Forest, ROSE, SMOTE, Unemployment

Abstract

In 2023, the number of unemployed people in Maluku reached 59,800, or 6.08% of the labor force. Reducing unemployment in Maluku requires a prompt understanding of how socioeconomic factors relate to the unemployment status of the Moluccan population. Classification methods such as random forest suit this task, but the data should be balanced to obtain accurate results. In Maluku, however, unemployed residents are far fewer than employed residents, and this class imbalance lowers the accuracy of the classification results. This study therefore applies three data balancing methods: Random Over-Sampling Examples (ROSE), the Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The data come from the February 2023 National Labor Force Survey (SAKERNAS) conducted by the Central Statistics Agency (BPS) of Maluku. The results show that the random forest classification model with SMOTE performs best, using 90% of the data for training and 10% for testing, and achieves a higher AUC than the other methods. Age is the most important variable in the model.
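SMOTE performed best in this study, so a small example may help show what it does. SMOTE creates new minority-class records (here, unemployed residents) by picking a minority record, choosing one of its k nearest minority neighbours, and placing a synthetic point at a random position on the line segment between them. By contrast, ADASYN generates more synthetic points around minority records surrounded by many majority records, and ROSE draws new records from a smoothed bootstrap of the data. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The function names (`k_nearest`, `smote`, `balance_with_smote`), the two toy features (age and years of schooling), the label coding (1 = unemployed, 0 = employed), the binary target and all data values are hypothetical.

```python
import math
import random


def k_nearest(points, i, k):
    """Return indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating each chosen
    sample towards one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for s in range(n_new):
        i = s % len(minority)          # cycle through the minority samples in turn
        j = rng.choice(neighbours[i])  # pick one of its k nearest neighbours at random
        gap = rng.random()             # interpolation weight drawn from [0, 1)
        x, nn = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def balance_with_smote(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have the same count.
    Assumes a binary target; X is a list of numeric feature tuples, y a list of labels."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Hypothetical toy data: (age, years of schooling); 1 = unemployed, 0 = employed.
X = [(22, 12), (25, 16), (45, 9), (50, 12), (38, 12), (60, 6), (30, 16), (41, 9)]
y = [1, 1, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_with_smote(X, y, minority_label=1, k=1)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

This sketch cycles through the minority records in turn instead of sampling them at random, which keeps the output reproducible and spreads synthetic points evenly. In practice, oversampling is usually applied only to the training portion (here, the 90% training split) so that the testing data and the AUC computed on them contain no synthetic records. Python's imbalanced-learn package offers tested `SMOTE` and `ADASYN` classes, and ROSE is available as an R package.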