Improving Land Use Classification Accuracy Using Zonal Statistics and Supervised Machine Learning
Abstract
This study aims to improve land use classification accuracy by integrating zonal statistics with supervised machine learning on Sentinel-2 imagery. Two classification models were developed: Model A, based on single-pixel values, and Model B, based on zonal statistics aggregated over polygons from shapefile data. Two algorithms, Random Forest and Classification and Regression Trees (CART), were implemented and evaluated with 5-fold cross-validation. Model B consistently outperformed Model A; the best performance was achieved by Random Forest with Model B, which reached an overall accuracy of 73.74% and a kappa coefficient of 0.5999. Class-wise evaluation based on F1-score showed strong performance for dominant classes such as Residential, Water Bodies, and Rice Fields, whereas underrepresented classes such as Cropland and Shrubland were harder to classify because of class imbalance. These findings highlight the effectiveness of zonal statistics in producing more representative training features and in improving model stability and accuracy in land use classification tasks.
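The two ideas the abstract relies on, aggregating pixel values per polygon and scoring predictions with overall accuracy and the kappa coefficient, can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not describe in detail. The function names (`zonal_statistics`, `overall_accuracy`, `cohen_kappa`), the choice of mean, standard deviation, minimum, and maximum as zonal features, and the toy band and zone grids are all assumptions made for illustration. Polygons are assumed to be already rasterized into a grid of zone ids.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zonal_statistics(band, zones):
    """Aggregate one band's pixel values per zone id.

    `band` and `zones` are equally sized 2-D lists; a zone id of None
    marks a pixel that lies outside every polygon (assumed convention).
    """
    values = defaultdict(list)
    for band_row, zone_row in zip(band, zones):
        for value, zone_id in zip(band_row, zone_row):
            if zone_id is not None:
                values[zone_id].append(value)
    # One feature row per polygon instead of one row per pixel.
    return {
        zone_id: {"mean": mean(v), "std": pstdev(v), "min": min(v), "max": max(v)}
        for zone_id, v in values.items()
    }


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)


# Toy 3x3 reflectance band and a matching grid of polygon (zone) ids.
band = [
    [0.10, 0.12, 0.50],
    [0.11, 0.13, 0.52],
    [0.90, 0.88, 0.00],
]
zones = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 3, None],
]
stats = zonal_statistics(band, zones)

# Toy reference and predicted labels for four polygons.
y_true = ["rice", "rice", "water", "water"]
y_pred = ["rice", "water", "water", "water"]
accuracy = overall_accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
```

In a pixel-based setup in the spirit of Model A, each pixel value would be a training sample; in a zonal setup in the spirit of Model B, each entry of `stats` would supply one feature vector per polygon, which smooths out within-polygon noise before a classifier such as Random Forest or CART is trained.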
Copyright (c) 2026 Gede Awantara, Kusman Sadik, Agus Mohamad Soleh, Cici Suhaeni

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.