PERFORMANCE COMPARISON OF K-MEDOIDS AND DENSITY BASED SPATIAL CLUSTERING OF APPLICATION WITH NOISE USING SILHOUETTE COEFFICIENT TEST

Taufiq Akbar; Georgina Maria Tinungki; Siswanto Siswanto

doi:10.30598/barekengvol17iss3pp1605-1616

Taufiq Akbar, Department of Statistics, Faculty of Mathematics and Natural Sciences, Hasanuddin University, Indonesia
Georgina Maria Tinungki, Department of Statistics, Faculty of Mathematics and Natural Sciences, Hasanuddin University, Indonesia
Siswanto Siswanto, Department of Statistics, Faculty of Mathematics and Natural Sciences, Hasanuddin University, Indonesia

Keywords: Cluster Analysis, DBSCAN, K-Medoids, Silhouette Coefficient

Abstract

Cluster analysis is a technique for grouping objects in a database according to their shared characteristics. A grouping is considered good if each cluster is homogeneous, which can be validated with the silhouette coefficient test. Outliers in the data can distort the grouping, so methods that are robust to outliers are used, such as K-Medoids and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This study compares the results and performance of the two methods using the silhouette coefficient test on human development indicator data for South Sulawesi Province in 2021. K-Medoids produced 2 groups: a group of 21 districts/cities and a group of 3 districts/cities with high human development indicators. DBSCAN produced 1 group of 19 districts/cities with similar characteristics and identified the remaining 5 districts/cities as noise. On the silhouette coefficient test, K-Medoids scored higher than DBSCAN (0.635 versus 0.544), so K-Medoids performed better on this data set.
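To make the validation step concrete, the following is a minimal sketch of how a silhouette coefficient can be computed for any labelling of the data, whether it comes from K-Medoids or from DBSCAN. It is not the authors' implementation. The function name `silhouette_coefficient`, the use of Euclidean distance, the convention of scoring singleton groups as 0, and the toy points are all assumptions made for illustration. The abstract does not say how noise points were scored; this sketch simply treats a noise label as one more group, which is only one possible choice.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_coefficient(points, labels):
    """Mean silhouette width over all points.

    Assumption: every distinct label, including a DBSCAN noise label
    such as -1, is treated as an ordinary group.
    """
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    if len(groups) < 2:
        raise ValueError("the silhouette needs at least two groups")

    widths = []
    for p, label in zip(points, labels):
        own = groups[label]
        if len(own) == 1:
            widths.append(0.0)  # assumed convention for singleton groups
            continue
        # a: mean distance from p to the other members of its own group
        # (the distance from p to itself is 0, so it does not affect the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another group
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for key, other in groups.items()
            if key != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical toy data, NOT the South Sulawesi indicators.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_coefficient(toy_points, [0, 0, 1, 1])
badly_mixed = silhouette_coefficient(toy_points, [0, 1, 0, 1])
print(well_separated > badly_mixed)
```

Values close to 1 indicate compact, well-separated groups, and values near or below 0 indicate overlapping or misassigned points. This is the sense in which a larger coefficient is read as better clustering performance.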
