PERFORMANCE COMPARISON OF K-MEDOIDS AND DENSITY BASED SPATIAL CLUSTERING OF APPLICATION WITH NOISE USING SILHOUETTE COEFFICIENT TEST
Abstract
Cluster analysis is a technique for grouping objects in a database according to the similarity of their characteristics. A clustering result is considered good when each cluster is homogeneous, which can be validated with the silhouette coefficient test. However, outliers in the data can distort the clustering result, so methods that are robust to outliers are used, such as K-Medoids and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The purpose of this study is to compare the results and performance of the two methods, using the silhouette coefficient test, on data on human development indicators for South Sulawesi Province in 2021. The analysis shows that K-Medoids produced 2 groups: a group of 21 districts/cities with comparatively lower human development indicators and a high group of 3 districts/cities. DBSCAN produced 1 group of 19 districts/cities with similar characteristics, and the remaining 5 districts/cities were identified as noise. Based on the silhouette coefficient test, K-Medoids has a higher value than DBSCAN (0.635 and 0.544, respectively), so K-Medoids performs better.
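The comparison described in this abstract can be illustrated with a minimal Python sketch (standard library only; Python 3.8+ for `math.dist`). It is not the authors' implementation, and every specific in it is an assumption: the nine toy points are hypothetical, `k_medoids` is a simplified first-improvement variant of the PAM swap search with a naive initialization, and `eps=1.5` and `min_pts=3` are arbitrary. Because the silhouette coefficient needs at least two groups, a DBSCAN result with a single cluster can only be scored if its noise points are counted somehow; the sketch assumes they form one extra group, a choice the abstract does not state and one that changes the score.

```python
import math

# Hypothetical toy data (NOT the South Sulawesi data): two tight groups plus one far point.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (30, 30)]
NOISE = -1  # label used for DBSCAN noise points (an assumed convention)


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, k):
    """Simplified PAM-style swap search; starting from the first k points is an assumed init."""
    medoids = list(range(k))
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-9:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    # Label each point with the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; the eps-neighborhood includes the point itself."""
    def region(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbors = region(j)
            if len(neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbors)
        cluster += 1
    return labels


def silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); NOISE is scored as one more group."""
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two groups")
    size = {g: labels.count(g) for g in groups}
    scores = []
    for i, p in enumerate(points):
        if size[labels[i]] == 1:
            scores.append(0.0)  # Rousseeuw's convention for a singleton group
            continue
        dist_sum = {g: 0.0 for g in groups}
        for j, q in enumerate(points):
            if j != i:
                dist_sum[labels[j]] += math.dist(p, q)
        a = dist_sum[labels[i]] / (size[labels[i]] - 1)  # mean distance within own group
        b = min(dist_sum[g] / size[g] for g in groups if g != labels[i])  # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    _, km_labels = k_medoids(POINTS, k=2)
    db_labels = dbscan(POINTS, eps=1.5, min_pts=3)
    print("K-Medoids:", km_labels, round(silhouette(POINTS, km_labels), 3))
    print("DBSCAN:   ", db_labels, round(silhouette(POINTS, db_labels), 3))
```

Running the script prints each method's labels and mean silhouette coefficient. On toy data the ranking of the two methods need not match the paper's, since it depends on the data, on `eps` and `min_pts`, and on how noise is scored.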
Copyright (c) 2023 Taufiq Akbar, Georgina Maria Tinungki, Siswanto Siswanto
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.