COMPARATIVE STUDY OF LIGHTGBM, CATBOOST, AND RANDOM FOREST IN MODELING PUBLIC COMPLAINTS CLASSIFICATION
Abstract
Public complaint data on maladministration in Indonesia contain high-cardinality categorical variables and imbalanced class distributions, which pose significant challenges for conventional machine learning algorithms. To address this issue, this study evaluates and compares the performance of three widely used classification algorithms (LightGBM, CatBoost, and Random Forest) on real public complaint data that have not previously been analysed with machine learning methods. Hyperparameter tuning was applied to obtain optimal configurations and ensure robust performance. The analysis was based on 30 repeated simulations, with accuracy and sensitivity as the primary metrics. Analysis of variance (ANOVA) followed by Tukey's honestly significant difference (HSD) test was used to determine whether model performance differed at the 95% confidence level. The results show that LightGBM performed best, with an accuracy of 74.50% and a sensitivity of 76.70%, followed by CatBoost with an accuracy of 74.12% and a sensitivity of 75.54%, while Random Forest performed considerably worse. Statistical tests confirmed significant performance differences among the three models. This study has several limitations: only three classification algorithms were evaluated, encoding strategies were not systematically compared, and the hyperparameter search space was restricted, so a broader exploration of models may yield better performance. Nonetheless, the study is original in that it is the first empirical application of machine learning to Indonesian public complaint data on maladministration, and it demonstrates how algorithm selection directly affects predictive performance when handling complex categorical structures. The findings offer practical insights for government agencies, highlighting how data-driven models can support policy design, strengthen transparency, and improve the quality of public services.
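The evaluation protocol summarized above (per-repetition accuracy and sensitivity, then ANOVA followed by Tukey HSD at the 95% confidence level) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not say how sensitivity is averaged across complaint categories, so macro-averaged recall is assumed; each model is assumed to yield one score per repetition (30 values per model); and the helper names `accuracy`, `macro_sensitivity`, and `compare_models` are hypothetical. It relies on SciPy's `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` (SciPy 1.8 or later), with α = 0.05 corresponding to the 95% confidence level.

```python
import numpy as np
from scipy import stats


def accuracy(y_true, y_pred):
    """Fraction of complaints assigned to the correct category."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def macro_sensitivity(y_true, y_pred):
    """Unweighted mean of per-class recall, TP / (TP + FN).

    Assumption: macro averaging is used for illustration; the abstract
    does not state how multiclass sensitivity was averaged. Only classes
    present in y_true are scored, so no class has an empty denominator.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for c in np.unique(y_true):
        mask = y_true == c
        recalls.append(np.mean(y_pred[mask] == c))
    return float(np.mean(recalls))


def compare_models(scores, alpha=0.05):
    """One-way ANOVA across models, then Tukey HSD for each model pair.

    scores: dict mapping a model name to a 1-D array of per-repetition
    metric values (e.g. 30 accuracies from 30 repeated simulations).
    Returns the ANOVA p-value and a dict {(name_i, name_j): (p, significant)}.
    """
    names = list(scores)
    samples = [np.asarray(scores[n], dtype=float) for n in names]
    anova = stats.f_oneway(*samples)
    tukey = stats.tukey_hsd(*samples)
    pairs = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            p = float(tukey.pvalue[i, j])
            pairs[(names[i], names[j])] = (p, p < alpha)
    return float(anova.pvalue), pairs


if __name__ == "__main__":
    # Synthetic scores for illustration only; these are NOT the study's results.
    rng = np.random.default_rng(42)
    demo = {
        "model_A": rng.normal(0.75, 0.01, size=30),
        "model_B": rng.normal(0.74, 0.01, size=30),
        "model_C": rng.normal(0.65, 0.01, size=30),
    }
    p_anova, pairwise = compare_models(demo)
    print(f"ANOVA p-value: {p_anova:.3g}")
    for (m1, m2), (p, sig) in pairwise.items():
        print(f"{m1} vs {m2}: Tukey p = {p:.3g}, significant = {sig}")
```

Macro averaging gives every complaint category equal weight, so on imbalanced data it can diverge from accuracy: with `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 0, 0, 1]`, accuracy is 5/6 while macro sensitivity is 0.75.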
Copyright (c) 2026 Oktaviyani Daswati, Hari Wijayanto, Farit Mochamad Afendi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., with an acknowledgement of its initial publication in this journal).
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their websites) prior to and during the submission process, as this can lead to productive exchanges, as well as earlier and greater citation of published work.






