AGGLOMERATIVE HIERARCHICAL CLUSTERING ANALYSIS IN PREDICTING ANTIBACTERIAL ACTIVITY OF COMPOUND BASED ON CHEMICAL STRUCTURE SIMILARITY
Abstract
Resistance to antibiotics is rising to alarmingly high levels. As antibiotics become less effective, infections are becoming more complex and often impossible to treat. The many antibiotics already discovered in marine organisms show that the marine environment, which accounts for over half of the world's biodiversity, is a massive source of novel antibiotics and that this resource must be explored to identify next-generation antibiotics. This research aimed to predict antibacterial activity in marine compounds using a computational approach, reducing the cost and time of collecting marine organisms and of extracting and testing the bioactivity of numerous unknown marine compounds. We used a simple unsupervised learning approach, agglomerative hierarchical clustering, to predict the biological activity of marine compounds. To compile our dataset, we combined antibiotic drug data from the DrugBank database with chemical compound data from marine organisms reported in the literature. We applied five linkage methods to the dataset and compared them using internal validation measures. We found that Ward's method with a squared dissimilarity matrix performed best on this dataset, and 10 of the 73 marine compounds were identified as potentially having antibacterial activity.
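The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fingerprints and compound names below are invented toy data, the Tanimoto distance is an assumed choice of structural dissimilarity, the silhouette width stands in for the unspecified internal validation measures, and only four common linkage rules are shown because the abstract does not name the five that were compared. Cluster distances are updated with the standard Lance-Williams formulas, and the Ward variant operates on squared dissimilarities.

```python
from itertools import combinations

def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def agglomerate(dist, k, linkage="ward"):
    """Merge the closest pair of clusters until k clusters remain."""
    if linkage not in ("single", "complete", "average", "ward"):
        raise ValueError(f"unknown linkage: {linkage}")
    n = len(dist)
    power = 2 if linkage == "ward" else 1  # Ward works on squared dissimilarities
    d = {(i, j): dist[i][j] ** power for i, j in combinations(range(n), 2)}
    members = {i: [i] for i in range(n)}
    next_id = n
    while len(members) > k:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ni, nj = len(members[i]), len(members[j])
        updated = {}
        for h in members:
            if h in (i, j):
                continue
            dhi = d[(min(h, i), max(h, i))]
            dhj = d[(min(h, j), max(h, j))]
            if linkage == "single":
                val = min(dhi, dhj)
            elif linkage == "complete":
                val = max(dhi, dhj)
            elif linkage == "average":
                val = (ni * dhi + nj * dhj) / (ni + nj)
            else:  # Ward update (Lance-Williams form)
                nh = len(members[h])
                val = ((nh + ni) * dhi + (nh + nj) * dhj - nh * dij) / (nh + ni + nj)
            updated[(h, next_id)] = val  # h is always smaller than next_id
        # Drop every distance involving the two merged clusters, add the new ones.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        d.update(updated)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    labels = [0] * n
    for label, idxs in enumerate(members.values()):
        for idx in idxs:
            labels[idx] = label
    return labels

def silhouette(dist, labels):
    """Mean silhouette width; assumes at least two clusters and nonzero distances."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(labels)

# Hypothetical toy fingerprints: sets of substructure bit indices.
fps = {
    "drug_A": {1, 2, 3, 4},
    "drug_B": {1, 2, 3, 5},
    "marine_X": {1, 2, 4, 5},
    "drug_C": {10, 11, 12, 13},
    "marine_Y": {10, 11, 12, 14},
    "marine_Z": {10, 11, 13, 14},
}
names = list(fps)
dist = [[tanimoto_distance(fps[a], fps[b]) for b in names] for a in names]

# Compare linkage rules by an internal validation score.
for method in ("single", "complete", "average", "ward"):
    labels = agglomerate(dist, k=2, linkage=method)
    print(method, dict(zip(names, labels)), round(silhouette(dist, labels), 3))
```

In a setup like this, a marine compound that lands in the same cluster as known antibiotics would be flagged as a candidate for antibacterial activity; the number of clusters `k` and the choice of validation measure are tuning decisions that the abstract does not specify.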
Copyright (c) 2022 Siswanto Siswanto, Nur Hilal A Syahrir
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.