GENE SELECTION FOR TYPE 2 DIABETES MELLITUS (T2DM) DISEASE USING MULTIPLE SUPPORT VECTOR MACHINE – RECURSIVE FEATURE ELIMINATION (MSVM-RFE) ALGORITHM

Keywords: Type 2 diabetes mellitus, Gene expression, Multiple Support Vector Machine-Recursive Feature Elimination

Abstract

Gene selection is essential for improving classification performance and interpretability in high-dimensional microarray data. This study applies the Multiple Support Vector Machine–Recursive Feature Elimination (MSVM-RFE) algorithm to gene selection for Type 2 Diabetes Mellitus (T2DM). Experiments were conducted on a Gene Expression Omnibus (GEO) microarray dataset comprising 118 samples (73 controls and 45 T2DM cases) and 25,770 genes. MSVM-RFE ranks genes using multiple linear SVM models trained within a 10-fold cross-validation scheme, and it was evaluated under different train–test splits, with and without SMOTE (Synthetic Minority Over-sampling Technique) resampling. The selected gene subsets were then classified using SVMs with linear, radial basis function (RBF), and polynomial kernels. The best configuration achieved 95.67% accuracy, with high sensitivity, specificity, and area under the ROC curve (AUROC), using fewer than 100 genes. These results demonstrate that MSVM-RFE provides a robust and effective gene selection strategy for T2DM microarray analysis.
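
The ranking step summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm. The score used here (mean of the normalized squared weights across folds, divided by their standard deviation) is the criterion commonly described for MSVM-RFE. The linear SVM is a simplified Pegasos-style subgradient trainer standing in for a library solver. The function names (train_linear_svm, msvm_rfe, make_toy_data) and the synthetic data are hypothetical and are not the authors' implementation or dataset. SMOTE resampling and the final kernel SVM classifier are not covered.

```python
import random
from statistics import mean, pstdev


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Pegasos-style subgradient descent for a bias-free linear SVM.

    Labels must be -1 or +1. This is a simplified stand-in for a library solver.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def msvm_rfe(X, y, n_keep, n_folds=10, seed=0):
    """Eliminate one feature per round until n_keep features remain.

    Returns (kept, eliminated); eliminated lists features in removal order.
    """
    active = list(range(len(X[0])))
    eliminated = []
    # Fold k holds out samples k, k + n_folds, k + 2 * n_folds, ...
    folds = [set(range(k, len(X), n_folds)) for k in range(n_folds)]
    while len(active) > n_keep:
        squared = []  # one row of normalized squared weights per fold
        for k, held_out in enumerate(folds):
            rows = [i for i in range(len(X)) if i not in held_out]
            X_sub = [[X[i][j] for j in active] for i in rows]
            y_sub = [y[i] for i in rows]
            w = train_linear_svm(X_sub, y_sub, seed=seed + k)
            norm = sum(v * v for v in w) ** 0.5 or 1.0
            squared.append([(v / norm) ** 2 for v in w])
        # Assumed criterion: mean / standard deviation of squared weights.
        scores = [mean(col) / (pstdev(col) + 1e-12) for col in zip(*squared)]
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return active, eliminated


def make_toy_data(n_samples=40, n_features=8, seed=1):
    """Synthetic data: only features 0 and 1 are shifted by the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] -= 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    kept, eliminated = msvm_rfe(X, y, n_keep=3)
    print("kept features:", kept)
    print("elimination order:", eliminated)
```

Removing one feature per round keeps the sketch short. With tens of thousands of genes, practical implementations usually discard a fraction of the remaining features in each round instead.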

 

Published
2026-04-08
How to Cite
A. K. G. Basir, A. Husain, and A. F. A. Basir, “GENE SELECTION FOR TYPE 2 DIABETES MELLITUS (T2DM) DISEASE USING MULTIPLE SUPPORT VECTOR MACHINE – RECURSIVE FEATURE ELIMINATION (MSVM-RFE) ALGORITHM,” BAREKENG: J. Math. & App., vol. 20, no. 3, pp. 2665–2680, Apr. 2026.