COMPARISON OF ANN METHOD AND LOGISTIC REGRESSION METHOD ON SINGLE NUCLEOTIDE POLYMORPHISM GENETIC DATA

  • Adi Setiawan, Department of Mathematics and Data Science, Faculty of Science and Mathematics, Universitas Kristen Satya Wacana, Indonesia
  • Rachel Wulan Nirmalasari Wijaya, Department of Mathematics and Data Science, Faculty of Science and Mathematics, Universitas Kristen Satya Wacana, Indonesia
Keywords: classification, single nucleotide polymorphism, neural network

Abstract

This study evaluates how well an artificial neural network (ANN) classifies the asthma genetic data distributed with the R package SNPassoc. The single nucleotide polymorphism (SNP) genotypes were encoded under a codominant model: AA, AC and CC were scored 0, 0.5 and 1, respectively, and CC, CT and TT were likewise scored 0, 0.5 and 1. In each case the genotype that comes first in alphabetical order receives the lowest score. Average accuracy, precision, recall and F1 score were computed for the ANN with test-set proportions of 10%, 20%, 30% and 40%, each repeated B = 1000 times, and the results were compared with those of logistic regression. With 20% test data, the ANN gives accuracy, precision, recall and F1 score of 0.7756, 0.7844, 0.9844 and 0.8728, respectively. When the data from all countries are analyzed together, logistic regression gives higher average accuracy, precision and F1 score than the ANN, whereas the ANN gives higher average recall. When each country is analyzed separately, the ANN gives higher accuracy, precision, recall and F1 score than logistic regression.
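The codominant scoring and the four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study itself works in R): the function names `codominant_scores` and `classification_metrics` are hypothetical, genotypes are assumed to be two-letter strings such as "AC" with no missing values, and the label 1 is assumed to mark the positive class. The repetition over B random train/test splits is not shown.

```python
from typing import Dict, List, Sequence


def codominant_scores(genotypes: Sequence[str]) -> List[float]:
    """Score the genotypes of one biallelic SNP as 0, 0.5 or 1.

    Assumption: the homozygote of the alphabetically smaller allele gets 0,
    the heterozygote 0.5, and the other homozygote 1 (AA, AC, CC -> 0, 0.5, 1).
    """
    # Strip an optional "/" separator so "A/C" and "AC" are treated alike.
    cleaned = [g.replace("/", "") for g in genotypes]
    alleles = sorted({allele for g in cleaned for allele in g})
    if len(alleles) > 2:
        raise ValueError(f"expected at most two alleles, got {alleles}")
    if len(alleles) < 2:
        # Only one allele observed: all genotypes are the same homozygote.
        return [0.0 for _ in cleaned]
    high = alleles[1]  # alphabetically larger allele
    # Score = fraction of the genotype's alleles equal to the larger allele.
    return [sum(a == high for a in g) / len(g) for g in cleaned]


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    print(codominant_scores(["AA", "AC", "CC"]))
    print(codominant_scores(["CC", "CT", "TT"]))
    print(classification_metrics([1, 1, 1, 0], [1, 1, 0, 1]))
```

In a full experiment of the kind the abstract describes, these metrics would be computed on the held-out test set of each random split and then averaged over the B repetitions.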

Published
2023-04-16
How to Cite
[1]
A. Setiawan and R. Wijaya, “COMPARISON OF ANN METHOD AND LOGISTIC REGRESSION METHOD ON SINGLE NUCLEOTIDE POLYMORPHISM GENETIC DATA”, BAREKENG: J. Math. & App., vol. 17, no. 1, pp. 0197-0210, Apr. 2023.