PERFORMANCE COMPARISON OF DECISION TREE AND LOGISTIC REGRESSION METHODS FOR CLASSIFICATION OF SNP GENETIC DATA

  • Adi Setiawan, Department of Data Science, Faculty of Science and Mathematics, Satya Wacana Christian University, Indonesia
  • Febi Setivani, Department of Mathematics, Faculty of Science and Mathematics, Satya Wacana Christian University, Indonesia
  • Tundjung Mahatma, Department of Mathematics, Faculty of Science and Mathematics, Satya Wacana Christian University, Indonesia
Keywords: Accuracy, Decision Tree, Logistic Regression

Abstract

This study compares the classification accuracy of the decision tree and logistic regression methods on SNP (Single Nucleotide Polymorphism) genetic data. The decision tree is a data mining classification technique that represents a very large data sample as a smaller set of rules, while logistic regression models the effect of independent variables on a dichotomous dependent variable. Both methods were implemented and analyzed in R and applied to the Asthma data, which was obtained from the R package "SNPassoc" under the data name "asthma". The Asthma data has 57 features, namely Country, Gender, Age, BMI, Smoke, Case control, and the SNP genetic codes. The two methods were compared on the accuracy values they obtained. The proportion of test data was varied over 40%, 30%, 20%, and 10%, and each setting was simulated 1000 times in order to obtain a more reliable accuracy value. For test proportions of 40%, 30%, 20%, and 10%, the decision tree method obtained accuracy values of 0.5793, 0.5777, 0.5745, and 0.5526, respectively, while the logistic regression method obtained 0.7696, 0.7729, 0.7763, and 0.7788, respectively. It can thus be concluded that, in this case, the logistic regression method is better than the decision tree method at classifying the Asthma data.
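
The evaluation protocol described in the abstract (repeated random train/test splits at several test proportions, with accuracy averaged over the repetitions) can be sketched as follows. This is an illustrative Python sketch and not the authors' R code, which is not shown on this page. It makes several assumptions: the data are synthetic genotype-like values (0/1/2) rather than the "asthma" data, a depth-1 decision stump stands in for a full decision tree, logistic regression is fitted by plain gradient descent, and the number of repetitions is reduced from 1000 to 10 for speed. All function names are hypothetical.

```python
import math
import random
from statistics import mean


def make_synthetic(n=200, n_snps=5, seed=1):
    """Synthetic stand-in data: genotype codes 0/1/2 and a binary case/control label."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n)]
    y = []
    for row in X:
        z = 0.8 * (row[0] + row[1]) - 1.6  # assumed additive effect of two SNPs
        y.append(int(rng.random() < 1.0 / (1.0 + math.exp(-z))))
    return X, y


def fit_stump(X, y):
    """Depth-1 decision tree: the single feature/threshold split with fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ll = int(2 * sum(left) >= len(left)) if left else 0
            rl = int(2 * sum(right) >= len(right)) if right else 0
            err = sum(yi != ll for yi in left) + sum(yi != rl for yi in right)
            if best is None or err < best[0]:
                best = (err, j, t, ll, rl)
    _, j, t, ll, rl = best
    return lambda row: ll if row[j] <= t else rl


def fit_logistic(X, y, lr=0.1, epochs=100):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))
            grad[0] += p - yi
            for j, xj in enumerate(row):
                grad[j + 1] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return lambda row: int(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)) >= 0)


def holdout_accuracy(fit, X, y, test_fraction, rng):
    """One random split: train on the remainder, return accuracy on the held-out part."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = max(1, round(test_fraction * len(X)))
    test, train = idx[:n_test], idx[n_test:]
    predict = fit([X[i] for i in train], [y[i] for i in train])
    return sum(predict(X[i]) == y[i] for i in test) / n_test


def repeated_holdout(fit, X, y, test_fraction, repeats, seed=0):
    """Mean test accuracy over `repeats` independent random splits."""
    rng = random.Random(seed)
    return mean(holdout_accuracy(fit, X, y, test_fraction, rng) for _ in range(repeats))


X, y = make_synthetic()
results = {}
for frac in (0.4, 0.3, 0.2, 0.1):
    results[frac] = {
        "tree": repeated_holdout(fit_stump, X, y, frac, repeats=10),
        "logistic": repeated_holdout(fit_logistic, X, y, frac, repeats=10),
    }
    print(f"test={frac:.0%}  tree={results[frac]['tree']:.4f}  "
          f"logistic={results[frac]['logistic']:.4f}")
```

Because the data here are synthetic, the printed accuracies are not comparable to the values reported in the abstract; the sketch only illustrates how averaging over many random splits yields one accuracy value per method and test proportion.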

Published
2024-03-01
How to Cite
[1]
A. Setiawan, F. Setivani, and T. Mahatma, “PERFORMANCE COMPARISON OF DECISION TREE AND LOGISTIC REGRESSION METHODS FOR CLASSIFICATION OF SNP GENETIC DATA”, BAREKENG: J. Math. & App., vol. 18, no. 1, pp. 0403-0412, Mar. 2024.