ASSOCIATION RULES IN RANDOM FOREST FOR THE MOST INTERPRETABLE MODEL
Abstract
Random forest is one of the most popular ensemble methods and offers many advantages. However, it is a "black-box" model and is therefore difficult to interpret. This study discusses the interpretation of random forest using an association rules technique applied to the rules extracted from each decision tree in the random forest model. The analysis uses both simulated and empirical data; the empirical analysis aims to determine the factors that affect the poverty status of households in Tasikmalaya. The empirical data were obtained from Badan Pusat Statistik (BPS), specifically the 2019 National Socio-Economic Survey (SUSENAS) data for West Java Province. Results on the simulated data show that the association rules technique can extract the set of rules that characterize the target variable. Applying the interpretable random forest to the empirical data shows that the rules that most distinguish the poverty status of households in Tasikmalaya combine house wall material with the main source of drinking water, house wall material with cooking fuel, and house wall material with motorcycle ownership.
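To make the idea of scoring extracted rules concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a rule has already been read off a decision tree as a list of (feature, value) conditions with a predicted class, and it computes common association-rule measures for that rule. The function names (`satisfies`, `rule_metrics`), the feature names, and the toy rows are all hypothetical and invented for illustration; they are not SUSENAS records. The definitions used here are also assumptions: coverage is the share of rows meeting the condition, support is the share of rows meeting the condition and having the predicted class, and confidence is support divided by coverage.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Condition = Tuple[str, object]  # (feature name, required value)

# Toy rows invented for illustration only; these are NOT SUSENAS records.
ROWS: List[Row] = [
    {"wall": "bamboo", "water": "well", "poor": True},
    {"wall": "bamboo", "water": "well", "poor": True},
    {"wall": "bamboo", "water": "piped", "poor": False},
    {"wall": "brick", "water": "piped", "poor": False},
    {"wall": "brick", "water": "well", "poor": False},
    {"wall": "brick", "water": "piped", "poor": False},
    {"wall": "bamboo", "water": "well", "poor": False},
    {"wall": "brick", "water": "well", "poor": True},
]


def satisfies(row: Row, conditions: List[Condition]) -> bool:
    """Return True when the row meets every (feature, value) condition."""
    return all(row[feature] == value for feature, value in conditions)


def rule_metrics(
    rows: List[Row],
    conditions: List[Condition],
    target: str,
    predicted: object,
) -> Dict[str, float]:
    """Score one rule of the form 'conditions => target == predicted'."""
    covered = [row for row in rows if satisfies(row, conditions)]
    hits = sum(1 for row in covered if row[target] == predicted)
    return {
        "length": len(conditions),            # number of conditions in the rule
        "coverage": len(covered) / len(rows),  # share of rows meeting the condition
        "support": hits / len(rows),           # share meeting condition and class
        "confidence": hits / len(covered) if covered else 0.0,
    }


# Hypothetical rule: wall == bamboo AND water == well  =>  poor == True
rule = [("wall", "bamboo"), ("water", "well")]
metrics = rule_metrics(ROWS, rule, target="poor", predicted=True)
print(metrics)
```

In a full workflow, a rule like this would be generated for every root-to-leaf path of every tree in the forest, and the resulting measures would be used to rank or filter the rules. That extraction step is omitted here.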
Copyright (c) 2023 Hafizah Ilma, Khairil Anwar Notodiputro, Bagus Sartono
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.