Prediksi Konsentrasi Karbon Monoksida (CO) pada Stasiun Kualitas Udara DKI1 Jakarta Menggunakan Random Forest
Prediction of Carbon Monoxide (CO) Concentration at the DKI1 Jakarta Air Quality Station Using Random Forest
Abstract
This study develops a prediction model for carbon monoxide (CO) concentration in Jakarta using the Random Forest Regressor algorithm with Grid Search-based hyperparameter optimization. The dataset consists of 1,540 daily observations from the DKI1 Air Quality Monitoring Station (January 2017 to March 2021), including meteorological variables and lagged CO values. Feature engineering produces 12 predictors through lag features, rolling means, and rolling standard deviations. The optimal model, configured with n_estimators=300, max_depth=10, min_samples_leaf=4, min_samples_split=10, and max_features='sqrt', achieves a testing RMSE of 4.3216 μg/m³ and a coefficient of determination R² = 0.3741. Feature importance analysis shows that temporal features dominate: the 3-day rolling mean (17.87%), lag-1 CO (17.62%), and 7-day rolling mean (17.34%) are the top three predictors and together account for 52.83% of total importance. Although the model tracks the overall trend, it systematically underpredicts extreme values (errors as large as -25 μg/m³), which suggests that a hybrid approach with quantile regression or gradient boosting is needed to better capture tail behavior. The findings support the use of temporal features as the primary anchor in short-term CO forecasting.
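The pipeline summarized in the abstract (lag and rolling-window features, then a grid-searched Random Forest) can be illustrated with a minimal sketch. This is not the authors' code. The data below is synthetic, the column names (`co`, `temp`, `co_lag1`, and so on), the 7-day window for the rolling standard deviation, the chronological 80/20 split, and the use of `TimeSeriesSplit` for cross-validation are all assumptions made for illustration. Only the lag-1, 3-day, and 7-day rolling-mean features and the hyperparameter names come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def add_temporal_features(df, target="co"):
    """Add lag and rolling-window predictors built only from past values."""
    out = df.copy()
    past = out[target].shift(1)  # shift first so no window sees the current day
    out["co_lag1"] = past
    out["co_roll_mean3"] = past.rolling(3).mean()
    out["co_roll_mean7"] = past.rolling(7).mean()
    out["co_roll_std7"] = past.rolling(7).std()  # window length is an assumption
    return out.dropna()


# Synthetic stand-in for the station data (NOT the DKI1 measurements).
rng = np.random.default_rng(0)
n = 400
dates = pd.date_range("2017-01-01", periods=n, freq="D")
co = 20 + 5 * np.sin(np.arange(n) / 30) + rng.normal(0, 3, n)
raw = pd.DataFrame({"co": co, "temp": rng.normal(28, 2, n)}, index=dates)

data = add_temporal_features(raw)
X = data.drop(columns="co")
y = data["co"]

# Chronological split (assumed 80/20) so the test period follows training.
split = int(len(data) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Small illustrative grid that contains the configuration reported above.
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [5, 10],
    "min_samples_leaf": [4],
    "min_samples_split": [10],
    "max_features": ["sqrt"],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),  # assumed; keeps validation folds in the future
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

best = search.best_estimator_
pred = best.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)
importances = pd.Series(best.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(search.best_params_)
print(f"RMSE={rmse:.4f}  R2={r2:.4f}")
print(importances)
```

Shifting the series by one day before computing the rolling statistics keeps every predictor strictly in the past relative to the target, which avoids leaking the value being forecast. The numbers printed by this sketch come from synthetic data and are not comparable to the results reported above.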
