IMPLEMENTATION OF FEATURE IMPORTANCE XGBOOST ALGORITHM TO DETERMINE THE ACTIVE COMPOUNDS OF SEMBUNG LEAVES (BLUMEA BALSAMIFERA)

Keywords: Sembung Leaves, Feature Importance, XGBoost, Active Compound

Abstract

Sembung is a medicinal plant native to Indonesia that grows optimally in tropical climates. The secondary metabolite compounds found in sembung leaves are biopharmaceutical active ingredients. Fourier Transform Infrared (FTIR) spectroscopy can identify the functional compounds in sembung leaves by analyzing unique peaks in the spectrum, which correspond to specific functional groups of the compounds. In this research, 35 observations were made with 1,866 explanatory variables (wavelengths). Data in which the number of explanatory variables exceeds the number of observations is known as high-dimensional data. One way to handle high-dimensional problems is to select the important variables that affect the target variable. The XGBoost algorithm can calculate a feature importance score for each variable with respect to the target, so that not all variables need to be included in the modeling; this overcomes the problems posed by high-dimensional data. The feature importance calculation identified the lignin skeletal band; the CH and CH2 aliphatic stretching group; C=C, C=N, and C–H in ring structures; DNA and RNA backbones; the NH2 amino-acidic group; and the C=O ester fatty acid as the active compounds contained in sembung leaves.



Published
2025-01-13
How to Cite
K. Kusnaeni, N. F. Adhalia, and A. K. Zulfattah, “IMPLEMENTATION OF FEATURE IMPORTANCE XGBOOST ALGORITHM TO DETERMINE THE ACTIVE COMPOUNDS OF SEMBUNG LEAVES (BLUMEA BALSAMIFERA)”, BAREKENG: J. Math. & App., vol. 19, no. 1, pp. 675-686, Jan. 2025.