LEVERAGING XGBOOST, LIGHTGBM, AND CATBOOST FOR ENHANCED CUSTOMER SEGMENTATION IN THE AUTOMOTIVE INDUSTRY
Abstract
This study evaluates three gradient boosting algorithms, XGBoost, LightGBM, and CatBoost, for customer segmentation in the automotive industry. Using a dataset of 8,068 training and 2,627 testing observations with 11 demographic and behavioral variables, the models classify customers into four segments. The methodology comprises preprocessing (handling missing values and encoding), hyperparameter tuning with randomized search cross-validation, and evaluation by ROC AUC. XGBoost performs best, achieving an AUC of 0.5837 on the testing data using the significant variables, while LightGBM and CatBoost score 0.5834 and 0.5759, respectively. Feature selection proves important, with Age, Profession, and Spending Score the most influential predictors. The study concludes that XGBoost is the most robust of the three models for this segmentation task, although all models struggle to distinguish overlapping classes. These insights can guide data-driven marketing strategies in the automotive and related sectors.
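The abstract reports ROC AUC for a four-segment problem but does not say how the multiclass score was aggregated. The sketch below is one plausible reading, assuming one-vs-rest AUC with unweighted (macro) averaging. The function names, segment labels, and toy probabilities are illustrative assumptions, not the study's code or data.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_ovr_auc(y_true, proba, classes) -> float:
    """Unweighted mean of the one-vs-rest AUC of each class (assumed averaging)."""
    per_class = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]  # class k vs. the rest
        scores = [row[k] for row in proba]               # predicted P(class k)
        per_class.append(binary_auc(labels, scores))
    return sum(per_class) / len(per_class)


# Hypothetical toy data: four segments, six customers, one probability row each.
segments = ["A", "B", "C", "D"]
y_true = ["A", "B", "C", "D", "A", "B"]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.3, 0.4, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
]
score = macro_ovr_auc(y_true, proba, segments)
print(f"macro one-vs-rest AUC: {score:.4f}")
```

A value of 0.5 corresponds to random ranking, which puts the reported test AUCs of roughly 0.58 in context. In practice, scikit-learn's `roc_auc_score` with `multi_class="ovr"` computes a comparable quantity from predicted class probabilities.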
Copyright (c) 2026 Novri Suhermi, Rahida Rihhadatul Aisy, Aulia Afifatur Rohmah, Anis Alif Nurhayati, Agnes Nathania Pramesty, Aura Lovi Ardanika, Fauziyah Nurul Isnaini

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.






