IMPLEMENTATION OF FEATURE IMPORTANCE XGBOOST ALGORITHM TO DETERMINE THE ACTIVE COMPOUNDS OF SEMBUNG LEAVES (BLUMEA BALSAMIFERA)
Abstract
Sembung is a medicinal plant native to Indonesia that grows optimally in tropical climates. The secondary metabolites found in sembung leaves are biopharmaceutical active ingredients. Fourier Transform Infrared (FTIR) spectroscopy can identify the compounds in sembung leaves by analyzing unique peaks in the spectrum, which correspond to specific functional groups of the compounds. In this research, 35 observations were made with 1,866 explanatory variables (wavelengths). Data in which the number of explanatory variables exceeds the number of observations is known as high-dimensional data. One way to handle high-dimensional problems is to select the important variables that affect the response variable. The XGBoost algorithm can calculate a feature importance score for each explanatory variable's effect on the response variable, so the model does not have to include all variables, which overcomes the problems of high-dimensional data. The feature importance results identified the lignin skeletal band; the CH and CH2 aliphatic stretching groups; C=C, C=N, and C–H in ring structures; DNA and RNA backbones; the NH2 amino acid group; and the C=O ester fatty acid group as the active compounds contained in sembung leaves.
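The variable-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not call the XGBoost library: it assumes a simplified stand-in in which depth-1 regression trees (stumps) are boosted on squared-error residuals, and the squared-error reduction (gain) of each chosen split is summed per variable and normalized. The data are synthetic, and the function names `best_split` and `gain_importance`, the 200-variable width, and the two informative columns (10 and 50) are all hypothetical choices made for illustration.

```python
import random


def best_split(X, residuals, n_features):
    """Find the single split (stump) with the largest squared-error reduction.

    Returns (gain, feature, threshold, left_mean, right_mean);
    feature is None when no split improves the fit.
    """
    n = len(X)
    total = sum(residuals)
    base = total * total / n  # score of the unsplit node
    best = (0.0, None, None, 0.0, 0.0)
    for j in range(n_features):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - base
            if gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best


def gain_importance(X, y, n_rounds=20, learning_rate=0.3):
    """Boost stumps on residuals and return normalized total gain per variable."""
    n, p = len(X), len(X[0])
    mean_y = sum(y) / n
    pred = [mean_y] * n
    gains = [0.0] * p
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, thr, left, right = best_split(X, residuals, p)
        if j is None:
            break  # no split reduces the error any further
        gains[j] += gain
        for i in range(n):
            pred[i] += learning_rate * (left if X[i][j] <= thr else right)
    total = sum(gains)
    return [g / total for g in gains] if total > 0 else gains


# Synthetic high-dimensional data: 35 observations, as in the study, but only
# 200 variables (assumed here to keep the pure-Python sketch fast).
random.seed(0)
n_obs, n_vars = 35, 200
X = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
# Hypothetical response driven by columns 10 and 50 plus a little noise.
y = [3.0 * row[10] - 2.0 * row[50] + random.gauss(0, 0.1) for row in X]

importance = gain_importance(X, y)
top = sorted(range(n_vars), key=lambda j: importance[j], reverse=True)[:5]
print("Top-ranked variables:", top)
```

Variables that are never chosen for a split receive a score of zero, which is what allows most of the explanatory variables to be dropped from the model. With the XGBoost library itself, comparable scores are exposed through the fitted model's `feature_importances_` attribute, whose exact definition depends on the configured importance type.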
Copyright (c) 2025 Kusnaeni Kusnaeni, Nurul Fuady Adhalia, Abdul Khaliq Zulfattah
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution license that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g. acknowledgement of its initial publication in this journal).
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.