COMPARATIVE STUDY OF SURVIVAL SUPPORT VECTOR MACHINE AND RANDOM SURVIVAL FOREST IN SURVIVAL DATA

  • Ni Gusti Ayu Putu Puteri Suantari, Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Indonesia
  • Anwar Fitrianto, Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Indonesia
  • Bagus Sartono, Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Indonesia
Keywords: Survival Support Vector Machine, Random Survival Forest, Survival Analysis, Machine Learning

Abstract

Survival analysis is a set of statistical procedures for data in which the response variable is the time until an event occurs (time-to-event). In recent years, many classification approaches have been developed in machine learning, but only a few account for a time-to-event response. Random Survival Forest and Survival Support Vector Machine are nonparametric machine learning methods suited to large data sets with a survival-time response. Random Survival Forest is a tree-based method that uses a bootstrapping algorithm, whereas Survival Support Vector Machine uses a hybrid approach that combines regression and ranking constraints. The data used in this study are generated right-censored survival data. The study uses the randomForestSRC and survivalsvm packages in R. It aims to compare the performance of the Survival Support Vector Machine and Random Survival Forest methods through simulation studies. In the scenario with binary predictor variables, simulation results on right-censored survival data indicate that the Survival Support Vector Machine (SSVM) with the Radial Basis Function (RBF) kernel performs best on small data sets, whereas SSVM with the additive kernel performs best as the data volume grows. In the scenario with mixed predictor variables, Random Survival Forest performs best under all conditions. The method, the proportion of censored data, and the size of the data are factors that affect model performance.
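
As a rough illustration of the kind of simulation described above, the following sketch generates right-censored survival data with binary predictors and scores a set of risk predictions. It is not the authors' procedure: the exponential event and censoring times, the coefficient vector `beta`, the `censor_rate` parameter, and the use of Harrell's concordance index as the performance measure are all assumptions made for this example, and the function names are hypothetical.

```python
import math
import random


def simulate_right_censored(n, beta, censor_rate, seed=0):
    """Draw n subjects with binary predictors and right-censored times.

    Assumptions (not taken from the paper): event times are exponential
    with hazard exp(x . beta), and censoring times are independent
    exponential draws with rate `censor_rate`.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in beta]
        hazard = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        event_time = rng.expovariate(hazard)
        censor_time = rng.expovariate(censor_rate)
        observed = min(event_time, censor_time)
        status = 1 if event_time <= censor_time else 0  # 1 = event, 0 = censored
        rows.append((x, observed, status))
    return rows


def concordance_index(times, statuses, risk_scores):
    """Harrell's C: share of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when subject i had an observed event and
    a shorter observed time than subject j. Tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if statuses[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    beta = [0.8, -0.5]  # hypothetical coefficients
    data = simulate_right_censored(200, beta, censor_rate=0.5, seed=1)
    times = [t for _, t, _ in data]
    statuses = [s for _, _, s in data]
    # Score with the true linear predictor as a stand-in for a fitted model.
    risks = [sum(b * xi for b, xi in zip(beta, x)) for x, _, _ in data]
    print("censored share:", 1 - sum(statuses) / len(statuses))
    print("C-index:", concordance_index(times, statuses, risks))
```

Raising `censor_rate` increases the proportion of censored observations and changing `n` varies the data size, which loosely mirrors two of the factors the abstract reports as affecting model performance.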

Published
2023-09-30
How to Cite
N. Suantari, A. Fitrianto, and B. Sartono, “COMPARATIVE STUDY OF SURVIVAL SUPPORT VECTOR MACHINE AND RANDOM SURVIVAL FOREST IN SURVIVAL DATA”, BAREKENG: J. Math. & App., vol. 17, no. 3, pp. 1495-1502, Sep. 2023.