APPLICATION AND PERFORMANCE COMPARISON OF MULTI-OUTPUT MACHINE LEARNING FOR NUMERICAL-NUMERICAL AND NUMERICAL-CATEGORICAL OUTPUTS

Karin Joan; Robyn Irawan; Benny Yong

doi:10.30598/barekengvol19iss2pp1421-1432

Karin Joan, Center for Mathematics and Society, Faculty of Science, Parahyangan Catholic University, Indonesia. ORCID: https://orcid.org/0009-0007-5898-9236
Robyn Irawan, Center for Mathematics and Society, Faculty of Science, Parahyangan Catholic University, Indonesia. ORCID: https://orcid.org/0000-0003-0970-9174
Benny Yong, Center for Mathematics and Society, Faculty of Science, Parahyangan Catholic University, Indonesia. ORCID: https://orcid.org/0000-0003-0567-7304

Keywords: Logistic Regression, Multi-Output Machine Learning, Multivariate Regression Tree, Multivariate Random Forest, Multi-Output Neural Network

Abstract

Multi-Output Machine Learning extends traditional machine learning to predict multiple output variables simultaneously while taking the relationships among those outputs into account. It is valuable as a decision support tool because decisions in many problems depend on several factors at once. Compared with conventional single-output machine learning, Multi-Output Machine Learning offers advantages in time efficiency, in coping with limited data, and in ease of maintenance. These benefits can lead to significant cost savings for industries that use Big Data. The models used in this research are the Multivariate Regression Tree, the Multivariate Random Forest, and the Multi-Output Neural Network. The Multivariate Regression Tree and Multivariate Random Forest are obtained by modifying the splitting function to use the Mahalanobis distance, while the key development of the Multi-Output Neural Network is a change in topology that introduces shared and private hidden layers. Compared with their single-output counterparts, the Multivariate Regression Tree and Multivariate Random Forest showed a trade-off in error between the two output variables, whereas the Multi-Output Neural Network improved the predictions for both output variables. This research also introduces Mixed Multi-Output Machine Learning, which predicts numerical and categorical output variables together. The Mixed Multi-Output Machine Learning model uses the logit values from a Logistic Regression model to extend the range of the predictions beyond the 0 to 1 interval. The Multi-Output Neural Network is the only model that produces predictions with both relatively small errors and high accuracy.
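The abstract combines two mechanisms: a tree split driven by the Mahalanobis distance, and logit values that turn a categorical output into an unbounded numeric one. The sketch below shows one plausible way they fit together, under assumptions that are not taken from the paper. The categorical output is binary and is replaced by the logit of a logistic regression's fitted probability. The node cost is the sum of squared Mahalanobis distances from each output vector to the node mean, with the covariance estimated once from all training outputs. The tree is a single split on one feature, a leaf's mean logit is turned back into a class with the sigmoid and a 0.5 threshold, and all data values and helper names (`node_cost`, `best_split`, `predict`) are hypothetical. It is written in pure Python for exactly two outputs, and it is not the authors' implementation.

```python
import math

def logit(p):
    """Map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of logit: map a real value back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mean2(rows):
    """Component-wise mean of a list of 2-element output vectors."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def inverse_cov2(rows):
    """Inverse of the 2x2 sample covariance matrix of the two outputs."""
    n = len(rows)
    m0, m1 = mean2(rows)
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    det = s00 * s11 - s01 * s01  # assumed non-zero (outputs not collinear)
    return ((s11 / det, -s01 / det), (-s01 / det, s00 / det))

def node_cost(rows, inv):
    """Sum of squared Mahalanobis distances from each output vector to the node mean."""
    if not rows:
        return 0.0
    m0, m1 = mean2(rows)
    total = 0.0
    for y0, y1 in rows:
        d0, d1 = y0 - m0, y1 - m1
        total += d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)
    return total

def best_split(x, targets):
    """Threshold on one feature minimising the summed Mahalanobis cost of both children."""
    inv = inverse_cov2(targets)  # covariance estimated once from all training outputs
    best = None
    # Candidate thresholds; the largest value is skipped because it would leave the right child empty.
    for t in sorted(set(x))[:-1]:
        left = [y for xi, y in zip(x, targets) if xi <= t]
        right = [y for xi, y in zip(x, targets) if xi > t]
        cost = node_cost(left, inv) + node_cost(right, inv)
        if best is None or cost < best[1]:
            best = (t, cost, mean2(left), mean2(right))
    return best

# Hypothetical data: one feature, one numerical output, and fitted probabilities
# of a binary categorical output taken from a logistic regression.
x = [1, 2, 3, 4, 5, 6]
y_num = [10.0, 11.0, 12.0, 30.0, 31.0, 33.0]
p_hat = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]

# Replace each probability by its logit so both targets are unbounded numbers.
targets = [(y, logit(p)) for y, p in zip(y_num, p_hat)]
threshold, split_cost, left_mean, right_mean = best_split(x, targets)

def predict(xi):
    """Depth-one tree: leaf mean of the numerical output and a class from the leaf logit."""
    num, z = left_mean if xi <= threshold else right_mean
    return num, int(sigmoid(z) >= 0.5)

print(threshold, predict(2), predict(5))
```

On this toy data the chosen threshold separates the first three observations from the last three. Estimating the covariance once keeps the same metric for every candidate split. A Multivariate Random Forest would typically grow many such trees on bootstrap samples and average their leaf predictions before applying the sigmoid.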
