PERFORMANCE ANALYSIS OF RANDOM FOREST CLASSIFICATION ON UNEMPLOYMENT RATE IN MALUKU PROVINCE BASED ON DATA BALANCING METHOD
Abstract
In 2023, the number of unemployed people in Maluku reached 59,800, or 6.08% of the total population. To reduce unemployment in Maluku, it is essential to understand promptly how the unemployment status of the Moluccan population relates to socioeconomic factors. Classification methods such as random forest are well suited to this task, but they give reliable results only when the classes are reasonably balanced. In Maluku, the unemployed are far fewer than the employed, and this class imbalance affects the accuracy of the classification results. A data balancing step is therefore needed, using, among others, the Random Over-Sampling Examples (ROSE), Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN) methods. This study uses data from the 2023 National Labor Force Survey (SAKERNAS) conducted in February by the Central Statistics Agency (BPS) of Maluku. The results show that the random forest classification model with SMOTE performs best, with a split of 90% training data and 10% testing data and a higher AUC value than the other methods, and that age is the most important variable in the model.
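To make the balancing step concrete, the core idea of SMOTE can be sketched in a few lines of Python: each synthetic minority record is placed at a random point on the segment between an existing minority record and one of its k nearest minority neighbors. This is only an illustrative sketch, not the implementation used in the study. The function name `smote_oversample`, the toy feature values, and the choice of k are all assumptions, and the sketch handles numeric features only.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority record as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority records by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbors at random.
        neighbor = minority[rng.choice(others[:k])]
        # Place the synthetic point somewhere on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical toy data with two scaled numeric features (not SAKERNAS data).
employed = [(0.1 * i, 0.5 + 0.05 * i) for i in range(20)]
unemployed = [(0.2, 0.9), (0.3, 1.0), (0.25, 0.8), (0.4, 1.1)]

# Generate enough synthetic unemployed records to match the employed class size.
new_points = smote_oversample(unemployed, n_new=len(employed) - len(unemployed), k=3)
balanced_unemployed = unemployed + new_points
```

In a setup like the one described above, oversampling would normally be applied to the training portion only, so that the held-out testing data keep the original class proportions when AUC is computed.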
Copyright (c) 2025 VARIANCE: Journal of Statistics and Its Applications

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.





