TEXT CLUSTERING ONLINE LEARNING OPINION DURING COVID-19 PANDEMIC IN INDONESIA USING TWEETS
Abstract
To prevent the spread of the coronavirus, restrictions on social activities, including school activities, were implemented, drawing both support and criticism from the community. Opinions about online learning are widely expressed, mainly on Twitter. The tweets collected can be mined with text clustering to group topics about online learning during the pandemic in Indonesia. K-Means is widely used and performs well in text clustering. However, the high dimensionality of textual data can make computation difficult, so a sampling method is proposed. This paper examines whether clustering a sample of tweets is more efficient than clustering the whole dataset. After pre-processing, five sample sizes (250, 500, 2500, 10000 and 20000) were drawn from 28300 tweets for K-Means clustering. Results showed that, over 10 iterations, the three main cluster topics appeared 90%-100% of the time at sample sizes of 2500, 10000 and 20000, whereas sample sizes of 250 and 500 tended to produce the three main cluster topics only 20%-60% of the time. This means that using around 8% to 35% of the tweets can yield representative clusters with efficient computation, four times faster than using the entire dataset.
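The sample-then-cluster idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the samples were drawn or how tweets were vectorised, so the code below assumes simple random sampling without replacement and raw term counts, and uses a plain Lloyd-style K-Means written with the Python standard library only. The function names, the toy documents, and the choice of k = 2 are hypothetical and serve only as illustration; the corpus size and the sample sizes are the ones stated in the abstract.

```python
import random
from collections import Counter

TOTAL_TWEETS = 28300                            # corpus size stated in the abstract
SAMPLE_SIZES = [250, 500, 2500, 10000, 20000]   # sample sizes stated in the abstract


def sample_share(n, total=TOTAL_TWEETS):
    """Percentage of the corpus covered by a sample of size n."""
    return 100.0 * n / total


def bag_of_words(docs):
    """Turn whitespace-tokenised documents into term-count vectors (assumed representation)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for word, count in Counter(d.split()).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy placeholder documents (hypothetical, NOT taken from the paper's data).
toy_corpus = [
    "internet quota expensive", "internet signal bad",
    "quota internet subsidy", "teacher assignment many",
    "assignment deadline teacher", "many assignment tired",
    "signal bad village", "teacher video assignment",
]

# Simple random sampling without replacement (an assumption, see lead-in).
sample = random.Random(42).sample(toy_corpus, 6)
vocab, X = bag_of_words(sample)
labels = kmeans(X, k=2)

for n in SAMPLE_SIZES:
    print(f"sample of {n} tweets = {sample_share(n):.1f}% of the corpus")
```

In this sketch, clustering cost grows with both the number of documents and the vocabulary size, which is why running K-Means on a sample rather than on all 28300 tweets can reduce computation.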
Copyright (c) 2022 Maulida Fajrining Tyas, Anang Kurnia, Agus Mohamad Soleh
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g. acknowledgement of its initial publication in this journal).
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.