APPLICATION OF ADASYN OVERSAMPLING TECHNIQUE ON K-NEAREST NEIGHBOR ALGORITHM

  • Herina Marlisa Statistics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Tanjungpura, Indonesia
  • Neva Satyahadewi Statistics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Tanjungpura, Indonesia https://orcid.org/0000-0001-8103-1797
  • Nurfitri Imro'ah Statistics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Tanjungpura, Indonesia
  • Naomi Nessyana Debataraja Statistics Study Program, Faculty of Mathematics and Natural Sciences, Universitas Tanjungpura, Indonesia
Keywords: Adaptive Synthetic, Classification, Imbalanced Data, Diabetes Mellitus

Abstract

K-Nearest Neighbor (KNN) is a widely used data mining algorithm for classification, valued for its effectiveness on large and noisy datasets. Class imbalance, however, can distort its results: a classifier trained on imbalanced data tends to assign new observations to the majority class and overlook the minority class. This research examines whether applying the Adaptive Synthetic (ADASYN) oversampling technique to the K-Nearest Neighbor algorithm can address the imbalance problem, evaluating the resulting accuracy, specificity, and sensitivity. ADASYN oversamples the minority class using distribution weights that reflect how difficult each minority observation is to learn, so harder observations receive more synthetic data. The data are the Pima Indian Diabetes Dataset, obtained from the Kaggle website. The dependent variable is diabetes mellitus status; the independent variables are number of pregnancies, glucose level, diastolic blood pressure, insulin level, Body Mass Index (BMI), and age. The resulting accuracy, specificity, and sensitivity were 72.88%, 73.42%, and 71.79%, respectively. Based on these results, ADASYN combined with the K-Nearest Neighbor algorithm classifies diabetes mellitus in Pima Indian women adequately in the presence of imbalanced data. The results indicate that ADASYN oversampling helps the K-Nearest Neighbor algorithm classify new data without ignoring the minority class.
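
The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the usual ADASYN formulation, in which the difficulty of a minority observation is the share of majority-class points among its k nearest neighbours, and it simplifies partner selection to the nearest minority neighbours. The function names, the toy data, and the confusion-matrix counts are all hypothetical and chosen only for illustration.

```python
import math
import random


def adasyn_allocation(X, y, minority=1, k=3, beta=1.0):
    """Indices of minority samples and the synthetic count assigned to each."""
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    budget = round((n_maj - n_min) * beta)  # total synthetic samples wanted

    ratios = []
    for i in min_idx:
        # k nearest neighbours of X[i] among all other samples, any class
        nearest = sorted((j for j in range(len(X)) if j != i),
                         key=lambda j: math.dist(X[i], X[j]))[:k]
        # difficulty: share of majority-class points in that neighbourhood
        ratios.append(sum(y[j] != minority for j in nearest) / k)

    total = sum(ratios)
    if total == 0:  # every minority point sits among its own class
        return min_idx, [0] * n_min
    # distribution weights ratios/total decide how the budget is split;
    # rounding means the counts may not sum exactly to the budget
    return min_idx, [round(r / total * budget) for r in ratios]


def adasyn_oversample(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Return copies of X and y extended with interpolated minority samples."""
    rng = random.Random(seed)
    min_idx, counts = adasyn_allocation(X, y, minority, k, beta)
    synthetic = []
    for i, g in zip(min_idx, counts):
        # simplification (assumption): partners are the k nearest minority points
        partners = sorted((j for j in min_idx if j != i),
                          key=lambda j: math.dist(X[i], X[j]))[:k]
        if not partners:
            continue
        for _ in range(g):
            j = rng.choice(partners)
            lam = rng.random()  # position along the segment from X[i] to X[j]
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
    return X + synthetic, y + [minority] * len(synthetic)


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical toy data: 8 majority points (label 0) and 4 minority points (label 1).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1], [0, 2], [2, 2],
     [1.0, 0.5], [5, 5], [5, 6], [6, 5]]
y = [0] * 8 + [1] * 4

min_idx, counts = adasyn_allocation(X, y)
X_new, y_new = adasyn_oversample(X, y)
print("synthetic samples per minority point:", counts)
print("class sizes after oversampling:", y_new.count(0), y_new.count(1))
print(classification_metrics(tp=8, fn=2, tn=15, fp=5))  # hypothetical counts
```

In this toy set the minority point at (1.0, 0.5) lies inside the majority cluster, so it receives the largest share of synthetic samples, which is the behaviour the distribution weights are meant to produce. In practice a library implementation of ADASYN and of KNN would normally be used instead of hand-written code.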

Published
2024-07-31
How to Cite
[1]
H. Marlisa, N. Satyahadewi, N. Imro’ah, and N. Debataraja, “APPLICATION OF ADASYN OVERSAMPLING TECHNIQUE ON K-NEAREST NEIGHBOR ALGORITHM”, BAREKENG: J. Math. & App., vol. 18, no. 3, pp. 1829-1838, Jul. 2024.