LEVERAGING XGBOOST, LIGHTGBM, AND CATBOOST FOR ENHANCED CUSTOMER SEGMENTATION IN THE AUTOMOTIVE INDUSTRY

  • Novri Suhermi, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0000-0002-8016-5803
  • Rahida Rihhadatul Aisy, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0009-0006-5379-8185
  • Aulia Afifatur Rohmah, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0009-0004-6946-8872
  • Anis Alif Nurhayati, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0009-0004-7759-8212
  • Agnes Nathania Pramesty, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0009-0000-4136-2188
  • Aura Lovi Ardanika, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0009-0003-8588-8132
  • Fauziyah Nurul Isnaini, Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Indonesia, https://orcid.org/0009-0003-1970-5596
Keywords: CatBoost, Automotive industry, Customer segmentation, XGBoost, LightGBM

Abstract

This study compares three gradient boosting algorithms, XGBoost, LightGBM, and CatBoost, for customer segmentation in the automotive industry. Using a dataset of 8,068 training and 2,627 testing observations with 11 demographic and behavioral variables, it classifies customers into four segments. The methodology covers preprocessing (missing-value handling and encoding), hyperparameter tuning via Randomized Search Cross-Validation, and evaluation using ROC AUC. On the testing data, using the significant variables, XGBoost achieves the highest AUC (0.5837), followed by LightGBM (0.5834) and CatBoost (0.5759). Feature selection proves important: Age, Profession, and Spending Score are the most influential predictors. The study concludes that XGBoost is the most robust of the three models for this segmentation task, although all models struggle to distinguish overlapping classes. These insights can guide data-driven marketing strategies in the automotive and related sectors.
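
The abstract reports ROC AUC for a four-segment problem but does not state how the per-class scores are combined. As a minimal sketch, assuming an unweighted (macro) one-vs-rest average, the metric can be computed as below. The function names, the segment labels A to D, and the toy probabilities are all hypothetical and are not taken from the study.

```python
from itertools import product


def binary_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, proba, classes):
    """Unweighted mean of the one-vs-rest AUC over all classes.
    (Assumption: the study's averaging scheme is not given in the abstract.)"""
    aucs = []
    for k, cls in enumerate(classes):
        labels = [1 if y == cls else 0 for y in y_true]  # this class vs. rest
        scores = [row[k] for row in proba]               # predicted P(class k)
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: four segments and made-up predicted probabilities.
classes = ["A", "B", "C", "D"]
y_true = ["A", "B", "C", "D", "A", "B", "C", "D"]
proba = [
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.3, 0.4, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.1, 0.3, 0.3, 0.3],
    [0.2, 0.2, 0.3, 0.3],
]

score = macro_ovr_auc(y_true, proba, classes)
print(round(score, 4))
```

Under this definition, a value of 0.5 corresponds to ranking customers no better than chance for each segment, and 1.0 to perfect separation, which gives a scale for reading the reported test AUCs.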

Published
2026-04-08
How to Cite
[1] N. Suhermi, “LEVERAGING XGBOOST, LIGHTGBM, AND CATBOOST FOR ENHANCED CUSTOMER SEGMENTATION IN THE AUTOMOTIVE INDUSTRY”, BAREKENG: J. Math. & App., vol. 20, no. 3, pp. 2281–2298, Apr. 2026.