Improving Land Use Classification Accuracy Using Zonal Statistics And Supervised Machine Learning

  • Gede Awantara Department of Statistics and Data Science, IPB University
  • Kusman Sadik
  • Agus Mohamad Soleh Department of Statistics and Data Science, IPB University
  • Cici Suhaeni Department of Statistics and Data Science, IPB University
Keywords: geospatial analysis, land use classification, Sentinel-2, supervised machine learning, zonal statistics

Abstract

This study aims to improve land use classification accuracy by integrating zonal statistics with supervised machine learning using Sentinel-2 imagery. Two classification models were developed: Model A, which uses single-pixel values, and Model B, which uses zonal statistics aggregated over the polygons of a shapefile. Two algorithms, Random Forest and Classification and Regression Trees (CART), were implemented and evaluated through 5-fold cross-validation. The results show that Model B consistently outperformed Model A, with the best performance achieved by Random Forest on Model B, which reached an overall accuracy of 73.74% and a kappa coefficient of 0.5999. Class-wise F1-scores revealed strong performance in dominant classes such as Residential, Water Bodies, and Rice Fields, while underrepresented classes such as Cropland and Shrubland were more difficult to classify because of class imbalance. These findings highlight the effectiveness of zonal statistics in producing more representative training features and in improving model stability and accuracy in land use classification tasks.
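To make the Model A / Model B distinction and the reported metrics concrete, the following is a minimal pure-Python sketch under stated assumptions; it is not the authors' pipeline. It assumes each pixel is a tuple of band values (for example integer Sentinel-2 reflectance numbers) tagged with the id of the polygon it falls in. The helper names (`zonal_statistics`, `k_fold_indices`, `overall_accuracy`, `cohen_kappa`, `f1_per_class`) are hypothetical, and the choice of mean, standard deviation, minimum and maximum as zonal features is an assumption, since the abstract does not say which statistics were used. The classifiers themselves are omitted; in practice Random Forest and a CART-style tree could come from a library such as scikit-learn (`RandomForestClassifier`, `DecisionTreeClassifier`).

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def zonal_statistics(pixel_values, zone_ids):
    """Aggregate per-pixel band tuples into one feature row per polygon (Model B style).

    Hypothetical helper: the feature set (mean, population std, min, max per band)
    is an assumption, not taken from the paper.
    """
    groups = defaultdict(list)
    for values, zone in zip(pixel_values, zone_ids):
        groups[zone].append(values)
    features = {}
    for zone, rows in groups.items():
        row = []
        for band in zip(*rows):  # transpose: one sequence of values per band
            row.extend([mean(band), pstdev(band), min(band), max(band)])
        features[zone] = row
    return features


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement estimated from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:  # both label lists consist of one identical class
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def f1_per_class(y_true, y_pred):
    """Per-class F1 = 2TP / (2TP + FP + FN); 0.0 when the class never appears."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Toy data: four pixels with two bands, falling in two polygons.
    pixels = [(100, 300), (200, 500), (600, 200), (800, 400)]
    zones = ["p1", "p1", "p2", "p2"]
    print(zonal_statistics(pixels, zones))

    # Toy reference vs predicted labels for one cross-validation fold.
    y_true = ["Residential", "Residential", "Water Bodies", "Rice Fields", "Rice Fields", "Water Bodies"]
    y_pred = ["Residential", "Water Bodies", "Water Bodies", "Rice Fields", "Rice Fields", "Water Bodies"]
    print(overall_accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
    print(f1_per_class(y_true, y_pred))
```

Under Model A, each pixel's band tuple would be a training row on its own; under Model B, each polygon contributes a single row from `zonal_statistics`, so noise from individual pixels inside a polygon is averaged out, which is one plausible reason aggregated features give more stable results. Per-class F1 is also sensitive to class imbalance: a rare class with only a few reference samples can swing sharply with a handful of errors.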

Published
2026-06-01