ASSOCIATION RULES IN RANDOM FOREST FOR THE MOST INTERPRETABLE MODEL

Hafizah Ilma; Khairil Anwar Notodiputro; Bagus Sartono

doi:10.30598/barekengvol17iss1pp0185-0196

All authors: Department of Statistics, IPB University, Indonesia


Keywords: Interpretable model, random forest, rule extraction, association rule

Abstract

Random forest is one of the most popular ensemble methods and has many advantages. However, random forest is a "black-box" model, which makes it difficult to interpret. This study interprets random forest with the association rules technique, applied to the rules extracted from each decision tree in the random forest model. The analysis uses both simulated and empirical data; the empirical application identifies the factors that affect the poverty status of households in Tasikmalaya. The empirical data were sourced from Badan Pusat Statistik (BPS), namely the 2019 National Socio-Economic Survey (SUSENAS) data for West Java Province. Results on the simulated data show that the association rules technique can extract the set of rules that characterizes the target variable. Applying the interpretable random forest to the empirical data shows that the rules that best distinguish the poverty status of households in Tasikmalaya involve house wall material and main source of drinking water, house wall material and cooking fuel, and house wall material and motorcycle ownership.
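To make the "extract rules from each tree, then mine association rules" idea concrete, here is a minimal, standard-library-only Python sketch. It is not the authors' implementation. It assumes each tree's root-to-leaf paths have already been extracted as rules (a set of conditions plus a predicted class). The rules and condition names below (for example `wall=bamboo` and `water=unprotected_well`) are hypothetical placeholders, not SUSENAS fields or results from the study, and `mine_class_rules` is an illustrative helper. Each extracted rule is treated as one transaction, and association rules of the form {conditions} => class are scored by support and confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rules, as if extracted from root-to-leaf paths of several trees.
# Each rule is (set of conditions, predicted class); names are illustrative only.
extracted_rules = [
    (frozenset({"wall=bamboo", "water=unprotected_well"}), "poor"),
    (frozenset({"wall=bamboo", "fuel=firewood"}), "poor"),
    (frozenset({"wall=bamboo", "water=unprotected_well", "motorcycle=no"}), "poor"),
    (frozenset({"wall=brick", "motorcycle=yes"}), "non_poor"),
    (frozenset({"wall=brick", "fuel=lpg"}), "non_poor"),
    (frozenset({"wall=bamboo", "motorcycle=no"}), "poor"),
]


def mine_class_rules(rules, max_len=2, min_support=0.2, min_confidence=0.8):
    """Treat each extracted rule as a transaction {conditions..., class} and
    return association rules {conditions} => class that pass both thresholds."""
    n = len(rules)
    lhs_counts = Counter()    # how many transactions contain each condition set
    joint_counts = Counter()  # how many contain the condition set AND the class
    for conditions, label in rules:
        for k in range(1, max_len + 1):
            for lhs in combinations(sorted(conditions), k):
                lhs_counts[lhs] += 1
                joint_counts[(lhs, label)] += 1
    mined = []
    for (lhs, label), joint in joint_counts.items():
        support = joint / n                    # share of all extracted rules
        confidence = joint / lhs_counts[lhs]   # P(class | condition set)
        if support >= min_support and confidence >= min_confidence:
            mined.append((lhs, label, support, confidence))
    # Strongest first: higher support, then higher confidence, then name order.
    mined.sort(key=lambda r: (-r[2], -r[3], r[0]))
    return mined


if __name__ == "__main__":
    for lhs, label, sup, conf in mine_class_rules(extracted_rules):
        print(f"{' & '.join(lhs)} => {label}  support={sup:.2f} confidence={conf:.2f}")
```

Condition sets that recur across many trees and consistently point to one class rise to the top of the list, which is what makes the mined rules readable as an interpretation of the forest. R tools such as the inTrees package offer a comparable workflow on fitted tree ensembles; the thresholds here are arbitrary choices for the toy data.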
