COMPARISON OF XGBOOST AND RANDOM FOREST METHODS IN PREDICTING AIR POLLUTION LEVELS
Abstract
Air is one of the elements that living things, including humans, need to survive, and the air quality in an area affects the health and quality of life of people and their surrounding environment. However, the growing number and mobility of people degrade air quality through the pollutants they produce, and in the longer term poor air quality can reduce human life expectancy. Large Indonesian cities such as Surabaya face the same problem, aggravated by a lack of public awareness of air pollution. The largest contributors to air pollution are motor vehicles and industrial activities, which emit carbon monoxide (CO), nitrogen oxides (NO), ozone (O3), and particulate matter (PM10). The Surabaya City Government has responded by installing air quality monitoring devices at locations considered prone to pollution. These devices measure urban air conditions daily and provide data that can be used to inform strategic policy. Using these data, this research implemented two machine learning prediction methods, XGBoost and Random Forest, and compared their accuracy in predicting air pollution levels in Surabaya based on ozone (O3) concentrations from January 1, 2020, to December 31, 2020. Both are tree-ensemble methods, which makes them well suited to non-linear data. XGBoost achieved a best error value of 0.0510, while Random Forest achieved a best error value of 0.0468, the lower of the two.
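The comparison described in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' pipeline. Everything here is an assumption: the data are a synthetic stand-in for the 366 daily O3 readings of 2020 (a leap year), and the lag-1 to lag-3 features, the 80/20 chronological split, RMSE as the error metric, and all hyperparameters are illustrative choices. Both ensembles are written from scratch in plain Python rather than with the xgboost or scikit-learn libraries, and the helper names (`build_tree`, `random_forest`, `xgb_style_boosting`, `make_series`, `lag_features`) are hypothetical. The random forest averages depth-limited regression trees grown on bootstrap samples, drawing a random feature subset at each split. The boosting model is only "XGBoost-style": it fits trees one after another to the residuals of the squared-error loss and uses XGBoost's L2-regularised split gain and leaf weight (sum of residuals divided by count plus lambda), but it omits the gamma split penalty, row and column subsampling, and the library's systems optimisations.

```python
import math
import random


def build_tree(X, y, depth, min_leaf, n_feats, lam, rng):
    """Grow a regression tree as nested tuples (feature, threshold, left, right).

    Leaves hold sum(y) / (len(y) + lam): the mean when lam == 0 (CART), or the
    L2-regularised leaf weight XGBoost uses for squared error when lam > 0.
    """
    total = sum(y)
    leaf = total / (len(y) + lam)
    if depth == 0 or len(y) < 2 * min_leaf:
        return leaf
    parent_score = total ** 2 / (len(y) + lam)
    best = None  # (gain, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset
        order = sorted(range(len(y)), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, len(order)):
            left_sum += y[order[k - 1]]
            a, b = X[order[k - 1]][f], X[order[k]][f]
            if k < min_leaf or len(order) - k < min_leaf or a == b:
                continue
            right_sum = total - left_sum
            # Squared-error reduction (lam == 0) or XGBoost-style gain (lam > 0).
            gain = (left_sum ** 2 / (k + lam)
                    + right_sum ** 2 / (len(order) - k + lam) - parent_score)
            if best is None or gain > best[0]:
                best = (gain, f, (a + b) / 2)
    if best is None or best[0] <= 0:
        return leaf
    _, f, thr = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth - 1, min_leaf, n_feats, lam, rng)

    left = [i for i in range(len(y)) if X[i][f] <= thr]
    right = [i for i in range(len(y)) if X[i][f] > thr]
    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves floats."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def random_forest(X, y, n_trees=30, depth=6, min_leaf=3, n_feats=2, seed=0):
    """Bagged, feature-subsampled trees; the prediction is their average."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                depth, min_leaf, n_feats, 0.0, rng))
    return lambda x: sum(predict_tree(t, x) for t in trees) / len(trees)


def xgb_style_boosting(X, y, n_rounds=60, depth=3, eta=0.1, lam=1.0, min_leaf=3, seed=0):
    """Sequential trees fitted to residuals, each shrunk by the learning rate eta."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        resid = [t - p for t, p in zip(y, pred)]  # negative gradient of squared error
        tree = build_tree(X, resid, depth, min_leaf, len(X[0]), lam, rng)
        trees.append(tree)
        pred = [p + eta * predict_tree(tree, x) for p, x in zip(pred, X)]
    return lambda x: base + eta * sum(predict_tree(t, x) for t in trees)


def make_series(n_days=366, seed=1):
    """Synthetic non-linear daily series standing in for O3 (NOT measured data)."""
    rng = random.Random(seed)
    return [0.05 + 0.02 * math.sin(2 * math.pi * t / 30)
            + 0.01 * math.sin(t / 7) ** 2 + rng.gauss(0, 0.004)
            for t in range(n_days)]


def lag_features(series, n_lags=3):
    """Predict each day from the previous n_lags days."""
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    return X, series[n_lags:]


def rmse(model, X, y):
    return math.sqrt(sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y))


X, y = lag_features(make_series())
split = int(0.8 * len(y))  # chronological split: train on earlier days only
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
rf_rmse = rmse(random_forest(X_tr, y_tr), X_te, y_te)
xgb_rmse = rmse(xgb_style_boosting(X_tr, y_tr), X_te, y_te)
print(f"Random Forest RMSE:       {rf_rmse:.4f}")
print(f"XGBoost-style boost RMSE: {xgb_rmse:.4f}")
```

Whichever model scores lower on this synthetic series says nothing about the Surabaya results reported in the abstract. The design choice worth carrying over to real data is the chronological split: shuffling daily readings before splitting would let the models train on days that come after the ones they are tested on.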
Copyright (c) 2025 Akas Yekti Pulih Asih, Firman Yudianto, Puguh Triwinanto, Rachman Sinatriya Marjianto, Teguh Herlambang, Hamzah Arof

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.