COMPARATIVE STUDY OF SURVIVAL SUPPORT VECTOR MACHINE AND RANDOM SURVIVAL FOREST IN SURVIVAL DATA
Survival analysis is a statistical procedure for analyzing data in which the response variable is the time until an event occurs (time-to-event). In recent years, many classification approaches have been developed in machine learning, but only a few account for a time-to-event response. Random Survival Forest and Survival Support Vector Machine are nonparametric machine learning approaches suited to large data with a survival-time response. Random Survival Forest is a tree-based method that uses bootstrapping, whereas Survival Support Vector Machine uses a hybrid approach that combines regression and ranking constraints. The data used in this study are simulated right-censored survival data, analyzed with the randomForestSRC and survivalsvm packages in R. This study aims to compare the performance of the Survival Support Vector Machine and Random Survival Forest methods through simulation studies. In the scenario with binary predictor variables, the Survival Support Vector Machine (SSVM) with a Radial Basis Function (RBF) kernel performs best on small datasets, whereas on larger datasets the SSVM with an additive kernel performs best. In the scenario with mixed predictor variables, by contrast, Random Survival Forest performs best under all conditions. The method, the proportion of censored data, and the sample size are all factors that affect model performance.
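The simulation workflow summarized above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' design. The abstract does not give the data-generating distributions or the evaluation metric, so the code assumes exponential event times with hazard exp(beta * x) for a single binary predictor x, independent exponential censoring, and Harrell's concordance index (C-index) as the performance measure. The function names `simulate_right_censored` and `harrell_c_index` are hypothetical. The sketch does not reproduce the randomForestSRC or survivalsvm models; in a full study, the risk scores passed to the C-index would come from fitted RSF or SSVM models.

```python
import math
import random


def simulate_right_censored(n, beta, censor_rate, seed=0):
    """Simulate right-censored survival data with one binary predictor.

    Assumed (hypothetical) design: event time T ~ Exponential(rate = exp(beta * x)),
    censoring time C ~ Exponential(rate = censor_rate), independent of T.
    Returns a list of (x, observed_time, status) with status 1 = event, 0 = censored.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)                           # binary predictor, 0 or 1
        event_time = rng.expovariate(math.exp(beta * x))
        censor_time = rng.expovariate(censor_rate)
        observed = min(event_time, censor_time)         # right censoring: we see the earlier time
        status = 1 if event_time <= censor_time else 0  # event observed only if it came first
        data.append((x, observed, status))
    return data


def harrell_c_index(times, statuses, risks):
    """Harrell's C-index: share of comparable pairs where higher risk fails earlier.

    A pair (i, j) is comparable when times[i] < times[j] and subject i had an event.
    Concordant pairs count 1, tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if statuses[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    data = simulate_right_censored(n=500, beta=1.0, censor_rate=0.5, seed=42)
    xs, times, statuses = zip(*data)
    print("censored proportion:", 1 - sum(statuses) / len(statuses))
    # Using the true predictor as the risk score gives a reference C-index.
    print("C-index of true risk score:", harrell_c_index(times, statuses, xs))
```

Under these assumptions, `censor_rate` controls the proportion of censored data: with independent exponential times, an observation is censored with probability c / (λ(x) + c), where c is the censoring rate and λ(x) = exp(beta * x). Raising c therefore raises the censored share, one of the factors the study reports as affecting performance.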
Copyright (c) 2023 Ni Gusti Ayu Putu Puteri Suantari, Anwar Fitrianto, Bagus Sartono
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.