IDENTIFYING IMPORTANT GENES IN OVARIAN CANCER FROM HIGH-DIMENSIONAL MICROARRAY DATA USING SIFS-CART METHOD

  • Ni Kadek Emik Sapitri, Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Brawijaya, Indonesia (https://orcid.org/0009-0001-4404-0931)
  • Umu Sa'adah, Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Brawijaya, Indonesia
  • Nur Shofianah, Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Brawijaya, Indonesia
Keywords: CART, Important Genes, Machine Learning, Microarray Data, Ovarian Cancer, SIFS

Abstract

Ovarian cancer can be identified from microarray data using machine learning. Many studies focus only on improving classification algorithms to achieve higher performance. However, the purpose of classification is not only to obtain high performance but also to extract new knowledge from the results. This research addresses both aims. Using a hybrid of Supervised Infinite Feature Selection (SIFS) and Classification and Regression Tree (CART), called SIFS-CART, it aims to predict ovarian cancer and identify genes potentially associated with it. The data used is the OVA_ovary dataset. In the best SIFS-CART model, SIFS reduced the 10935 genes in the initial OVA_ovary dataset to 1000 genes, and CART was then built on these 1000 genes. Measured by balanced accuracy (BA), a metric suited to imbalanced microarray data, the best SIFS-CART model achieves 85.7% BA in training and 83.2% in testing. The optimal CART in the best model uses only four of the 1000 selected genes: STAR, WT1, PEG3, and ASPN. A review of the medical literature indicates that STAR, WT1, and PEG3 play an important role in ovarian cancer. The relationship between ASPN and ovarian cancer, however, has not yet been studied in detail by medical researchers.
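The abstract reports balanced accuracy rather than plain accuracy because the data are imbalanced. The following is a minimal sketch of why that choice matters, assuming binary labels and the usual definition of BA as the mean of sensitivity and specificity. The labels, class counts, and function names below are hypothetical and are not taken from the OVA_ovary dataset or from the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced sample: 90 negatives (0) and 10 positives (1).
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_majority = [0] * 100

acc = accuracy(y_true, y_majority)
ba = balanced_accuracy(y_true, y_majority)
print(f"accuracy = {acc:.2f}, balanced accuracy = {ba:.2f}")
```

In this illustration the majority-class predictor looks strong on plain accuracy but scores no better than chance on BA, since it never detects a positive case. That is the behavior that makes BA a reasonable yardstick for imbalanced microarray data.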

Published
2024-07-31
How to Cite
[1]
N. Sapitri, U. Sa’adah, and N. Shofianah, “IDENTIFYING IMPORTANT GENES IN OVARIAN CANCER FROM HIGH-DIMENSIONAL MICROARRAY DATA USING SIFS-CART METHOD”, BAREKENG: J. Math. & App., vol. 18, no. 3, pp. 1909-1918, Jul. 2024.