GEOGRAPHICALLY WEIGHTED MACHINE LEARNING MODEL FOR ADDRESSING SPATIAL HETEROGENEITY OF PUBLIC HEALTH DEVELOPMENT INDEX IN JAVA ISLAND
Abstract
Random Forest (RF) has emerged as a prominent machine learning algorithm that addresses problems arising from the use of single decision trees, such as overfitting and instability. However, conventional RF is a global model and may fail to capture spatial variation. Analysis of public health development suggests that the relationship between the level of health development and its risk factors can vary spatially. To address this challenge, we use a modified RF algorithm called Geographically Weighted Random Forest (GW-RF). As a tree-based, non-parametric machine learning model, GW-RF can help explore and visualize relationships between the Public Health Development Index (PHDI), as the response variable, and district-level indicators. We compare the GW-RF output with that of a global RF model on 2018 data, using six independent variables: the percentage of the population with access to clean/decent water (X1), consumption of eggs and milk per capita per week (X2), number of healthcare facilities per 1000 people (X3), number of doctors per 1000 people (X4), female-to-male ratio of the pure (net) participation rate (X5), and percentage of households that have hand washing facilities with soap and water (X6). Our results show that the non-parametric GW-RF model has high potential for explaining spatial heterogeneity and predicting PHDI, compared with the global model, when these six major risk factors are included. However, some of these predictions carry little meaning. The spatial heterogeneity revealed by GW-RF shows the need to consider local factors in efforts to increase PHDI values. Spatial analysis of PHDI provides valuable information for determining geographic targets, namely areas whose PHDI values need to be improved.
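The local step that distinguishes a geographically weighted model from a global one can be illustrated with a minimal Python sketch. It assumes an adaptive neighbourhood made of the k nearest districts and a bi-square distance kernel; neither choice is stated in the abstract. The function name `local_neighbourhood`, the toy coordinates, and the value of `k` are hypothetical and serve only as illustration.

```python
import math


def local_neighbourhood(coords, target, k):
    """Return (indices, weights) for the k observations nearest to coords[target].

    Assumptions of this sketch (not taken from the paper):
    - coords are projected (x, y) pairs, so Euclidean distance is meaningful;
    - the bandwidth is the distance to the k-th nearest observation;
    - weights follow a bi-square kernel, so the farthest neighbour gets weight 0.
    """
    tx, ty = coords[target]
    # Distance from the target to every observation, paired with its index.
    dists = sorted(
        (math.hypot(x - tx, y - ty), i) for i, (x, y) in enumerate(coords)
    )
    nearest = dists[:k]  # includes the target itself at distance 0
    bandwidth = nearest[-1][0]
    indices = [i for _, i in nearest]
    if bandwidth == 0:
        # All selected points coincide with the target: weight them equally.
        weights = [1.0] * len(nearest)
    else:
        weights = [(1 - (d / bandwidth) ** 2) ** 2 for d, _ in nearest]
    return indices, weights


# Toy example with five hypothetical district centroids.
centroids = [(0, 0), (1, 0), (0, 2), (3, 0), (5, 5)]
indices, weights = local_neighbourhood(centroids, target=0, k=3)
print(indices, weights)
```

In a full GW-RF workflow, each district's subset would train that district's own forest, for example by passing the selected rows (and optionally the weights as `sample_weight`) to scikit-learn's `RandomForestRegressor.fit`. The global RF, by contrast, is fitted once on all districts.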
Copyright (c) 2024 Muhammad Azis Suprayogi, Bagus Sartono, Khairil Anwar Notodiputro
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.