APPLICATION OF SUPPORT VECTOR MACHINE FOR CLASS IMBALANCE LEARNING TO PREDICT ANTICANCER COMPOUNDS OF MEDICINAL PLANTS IN WEST SULAWESI

  • Hikmah Hikmah, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Nur Hilal A Syahrir, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
  • Putri Indi Rahayu, Department of Statistics, Faculty of Mathematics and Natural Sciences, University of West Sulawesi, Indonesia
Keywords: ADASYN, DB-SMOTE, SMOTE, Undersampling, Oversampling, Imbalance

Abstract

Indonesian medicinal plants such as turmeric and soursop have shown promising anticancer properties through bioactive compounds such as curcumin and soursop extracts. Although medicinal plants in Indonesia have been studied extensively, research on the activity of natural products from West Sulawesi remains limited and consists mainly of ethnobotanical surveys. In this work, we propose a machine-learning approach to predict the anticancer activity of compounds in medicinal plants from West Sulawesi by leveraging high-throughput screening data, in particular molecular information from a public database. We applied a Support Vector Machine (SVM) with five sampling techniques to address class imbalance, and evaluated their performance to select the best combination for our dataset. The results show that undersampling and ADASYN can improve the prediction of anticancer activity. Based on these two balancing methods, we identified ten potential anticancer compounds from three medicinal plants in West Sulawesi.
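
As a rough illustration of the idea of pairing an SVM with a resampling step, the sketch below trains one classifier on an imbalanced training set and another on a randomly undersampled copy of it. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification` as a stand-in for molecular descriptors), the RBF kernel and default parameters are assumptions, and `random_undersample` is a hypothetical helper written for this example. The oversampling methods named in the keywords (SMOTE, DB-SMOTE, ADASYN) are not shown; the imbalanced-learn package provides `SMOTE` and `ADASYN` classes that could replace the helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def random_undersample(X, y, rng):
    """Hypothetical helper: randomly drop rows until every class
    has as many samples as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


# Synthetic stand-in data (assumption): roughly 5% "active" compounds (label 1).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Resample the training split only; the test split keeps its original imbalance.
rng = np.random.default_rng(0)
X_bal, y_bal = random_undersample(X_train, y_train, rng)

baseline = SVC(kernel="rbf").fit(X_train, y_train)
balanced = SVC(kernel="rbf").fit(X_bal, y_bal)

for name, model in [("no resampling", baseline), ("undersampling", balanced)]:
    pred = model.predict(X_test)
    print(name,
          "| balanced accuracy:", round(balanced_accuracy_score(y_test, pred), 3),
          "| minority recall:", round(recall_score(y_test, pred), 3))
```

Balanced accuracy and minority-class recall are reported here because plain accuracy can look high on imbalanced data even when the minority class is rarely predicted; which metrics the paper uses is not stated in the abstract.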

 

Published
2024-03-01
How to Cite
H. Hikmah, N. A Syahrir, and P. Rahayu, “APPLICATION OF SUPPORT VECTOR MACHINE FOR CLASS IMBALANCE LEARNING TO PREDICT ANTICANCER COMPOUNDS OF MEDICINAL PLANTS IN WEST SULAWESI”, BAREKENG: J. Math. & App., vol. 18, no. 1, pp. 0141-0150, Mar. 2024.