COMPARISON OF K-MEANS AND GAUSSIAN MIXTURE MODEL IN PROFILING AREAS BY POVERTY INDICATORS
Abstract
The Covid-19 pandemic has reduced the income of the Indonesian population, which can push more people into poverty. According to the Indonesian Central Statistics Agency, Central Java is one of the provinces most affected by Covid-19, especially in economic terms. In 2020, the percentage of poor people increased by 0.6% compared with 2019. If this condition is left unaddressed over the long term, it will hamper national development. As a first step in designing a strategy to mitigate the impact of poverty, the economically affected areas need to be profiled appropriately on the basis of poverty indicators. This study compares K-Means clustering and the Gaussian Mixture Model (GMM) to determine which gives the better grouping of the data, judged by three cluster validity indexes: connectivity, Dunn, and silhouette. GMM is a generalization of K-Means clustering that incorporates information about the covariance structure of the data as well as the centers of the latent Gaussians. We used poverty indicator data from the Central Statistics Agency of Central Java: poverty line, percentage of poor population, poverty depth index, and poverty severity index. The results indicate that GMM gives the best grouping with three clusters, whose first, second, and third clusters contain 10, 19, and 6 members, respectively.
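As an illustration of how two of the validity indexes named above rank competing groupings, the following is a minimal Python sketch, not the procedure used in the study. The coordinates and both label vectors are invented toy values (assumptions, not the Central Java data), and the function names are hypothetical. In practice the label vectors would come from a fitted K-Means or GMM model. The connectivity index is omitted for brevity.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    pairs = list(combinations(range(len(points)), 2))
    between = [dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]]
    within = [dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    return min(between) / max(within)


def silhouette_score(points, labels):
    """Mean over all points of (b - a) / max(a, b)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # a singleton cluster gets a silhouette of 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (assumed for illustration): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # grouping that matches the visible structure
poor_labels = [0, 0, 1, 1, 1, 0]  # grouping that mixes the two groups

for name, labels in [("good", good_labels), ("poor", poor_labels)]:
    print(name, dunn_index(points, labels), silhouette_score(points, labels))
```

For both indexes a higher value indicates more compact and better separated clusters, so the grouping that matches the structure of the toy data scores higher on each.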
Copyright (c) 2023 Zumrotul Wahidah, Dina Tri Utari
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.