MIXED-EFFECTS MODELS WITH GENERALIZED RANDOM FOREST: IMPROVED FOOD INSECURITY ANALYSIS

  • Herlin Fransiska, Statistics and Data Science Study Program, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia. ORCID: https://orcid.org/0000-0002-7983-5590
  • Agus Mohamad Soleh, Statistics and Data Science Study Program, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia. ORCID: https://orcid.org/0000-0002-2732-1985
  • Khairil Anwar Notodiputro, Statistics and Data Science Study Program, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia. ORCID: https://orcid.org/0000-0003-2892-4689
  • Erfiani Erfiani, Statistics and Data Science Study Program, School of Data Science, Mathematics, and Informatics, IPB University, Indonesia. ORCID: https://orcid.org/0000-0001-5502-7321
Keywords: Food insecurity, Generalized random forest, Mixed-effects models, Prediction

Abstract

Food insecurity is a complex problem, and effective interventions depend on understanding its influencing factors and predicting it accurately. Machine learning is well suited to the large, complex datasets of the big data era, but most machine learning methods do not account for hierarchical or clustered data structures, which makes such data difficult to model. Mixed-effects models do accommodate hierarchical structures. This study introduces a novel approach to predicting food insecurity that integrates mixed-effects models with a generalized random forest. The mixed-effects component captures variation in hierarchical or clustered data, such as differences between regions, while the generalized random forest, an extension of the traditional random forest, models the fixed effects and improves prediction accuracy. The empirical data are food insecurity data from 2021 for West Java, Indonesia. The results show that mixed-effects models with a generalized random forest significantly improve prediction accuracy compared with the other models. The averaged performance measures indicate that the proposed model (GMEGRF) performs well, with a balanced accuracy of 0.6789709, the highest among the methods compared. This methodological advance offers a new, robust model for understanding and potentially mitigating food insecurity, ultimately informing efforts towards SDG 2 (Zero Hunger).
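
For readers unfamiliar with the balanced accuracy measure reported in the abstract, the following is a minimal sketch of how it is usually computed for a binary outcome: the mean of sensitivity and specificity. The function names, the 0/1 label coding (1 = food insecure), and the toy labels are assumptions made for illustration only; they are not the study's data or code.

```python
def accuracy(y_true, y_pred):
    """Share of observations classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary 0/1 labels.

    Assumed coding (illustrative): 1 = food insecure, 0 = food secure.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Toy, made-up labels with an imbalanced outcome (2 positives, 8 negatives).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(accuracy(y_true, y_pred))           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.6875
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.6875, because only half of the minority class is recovered. This is why balanced accuracy is often preferred when the outcome classes are imbalanced.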


Published
2026-01-26
How to Cite
[1]
H. Fransiska, A. M. Soleh, K. A. Notodiputro, and E. Erfiani, “MIXED-EFFECTS MODELS WITH GENERALIZED RANDOM FOREST: IMPROVED FOOD INSECURITY ANALYSIS”, BAREKENG: J. Math. & App., vol. 20, no. 2, pp. 1111–1124, Jan. 2026.