• Maulida Fajrining Tyas, Department of Statistics, Faculty of Mathematics and Natural Sciences, IPB University
• Anang Kurnia, Department of Statistics, Faculty of Mathematics and Natural Sciences, IPB University
• Agus Mohamad Soleh, Department of Statistics, Faculty of Mathematics and Natural Sciences, IPB University
Keywords: Text Clustering, Sampling, K-Means, Twitter, Online Learning


To prevent the spread of the coronavirus, restrictions on social activities were implemented, including on school activities, which drew both support and criticism from the community. Opinions about online learning have been widely expressed, mainly on Twitter. The collected tweets can be mined with text clustering to group the topics discussed about online learning during the pandemic in Indonesia. K-Means is widely used and performs well in text clustering. However, the high dimensionality of textual data makes the computation demanding, so a sampling approach is proposed. This paper examines whether clustering a sample of tweets can be more efficient than clustering the whole dataset. After pre-processing, samples of five sizes (250, 500, 2,500, 10,000, and 20,000) were selected from 28,300 tweets and clustered with K-Means. Over 10 iterations, the three main cluster topics appeared in 90% to 100% of the iterations for sample sizes of 2,500, 10,000, and 20,000, whereas sample sizes of 250 and 500 produced the three main topics in only 20% to 60% of the iterations. This indicates that using only around 8% to 35% of the tweets can yield representative clusters, with computation four times faster than using the entire dataset.
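The sample-then-cluster workflow described in the abstract can be sketched as below. This is a minimal, hedged illustration using only the Python standard library, not the authors' code: the TF-IDF weighting, the farthest-first seeding of K-Means, the toy tweets, and the function names (`tfidf_vectors`, `kmeans`, `sample_and_cluster`, `recovery_rate`, `sampling_fraction`) are all assumptions made for illustration. The abstract does not say how the "appearance" of a topic was judged, so `recovery_rate` approximates it as the share of repeated samples whose clusters never mix two known topics. TF-IDF is built on the sample only, which is assumed to be where the computational saving comes from.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to L2-normalized TF-IDF vectors (dense lists)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, tf in Counter(doc).items():
            # smoothed IDF (an assumption): rarer terms get larger weights
            vec[index[tok]] = tf * (1.0 + math.log((1 + n) / (1 + df[tok])))
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sqdist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Lloyd's K-Means with Euclidean distance; returns one cluster label per vector."""
    rng = random.Random(seed)
    # Farthest-first seeding (an assumption; libraries usually default to k-means++)
    centroids = [list(vectors[rng.randrange(len(vectors))])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sqdist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # assignment step: each vector goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sqdist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # converged: no vector changed cluster
        labels = new_labels
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_and_cluster(docs, sample_size, k, seed=0):
    """Simple random sample without replacement, then K-Means on the sample only."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(docs)), sample_size)
    return idx, kmeans(tfidf_vectors([docs[i] for i in idx]), k, seed=seed)


def recovery_rate(docs, topics, sample_size, k, runs=10):
    """Hypothetical criterion: share of repeated samples whose clusters never mix topics."""
    hits = 0
    for seed in range(runs):
        idx, labels = sample_and_cluster(docs, sample_size, k, seed=seed)
        topics_per_cluster = {}
        for i, lab in zip(idx, labels):
            topics_per_cluster.setdefault(lab, set()).add(topics[i])
        hits += all(len(t) == 1 for t in topics_per_cluster.values())
    return hits / runs


def sampling_fraction(sample_size, population=28300):
    """Share of the 28,300 pre-processed tweets that a sample of this size uses."""
    return sample_size / population


# Hypothetical toy corpus of already pre-processed tweets (not the paper's data)
TOPIC_A = [["internet", "quota", "expensive"], ["internet", "quota", "slow"]]
TOPIC_B = [["teacher", "assignment", "many"], ["teacher", "assignment", "deadline"]]
docs = TOPIC_A * 5 + TOPIC_B * 5
topics = ["A"] * 10 + ["B"] * 10

if __name__ == "__main__":
    idx, labels = sample_and_cluster(docs, sample_size=8, k=2, seed=1)
    print(list(zip([topics[i] for i in idx], labels)))
    print(recovery_rate(docs, topics, sample_size=8, k=2))  # share of 10 repeated samples
    for n in (2500, 10000):
        print(f"{n} of 28300 tweets = {sampling_fraction(n):.1%}")
```

On this toy corpus `recovery_rate` returns 1.0 because the two hypothetical topics share no terms; with real tweets the topics overlap, and small samples are where recovery would be expected to drop, as the abstract reports for 250 and 500 tweets. The last loop only checks the abstract's arithmetic: 2,500 and 10,000 of 28,300 tweets are about 8.8% and 35.3%. In practice, scikit-learn's `TfidfVectorizer` and `KMeans` would replace the hand-written loops.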



How to Cite
M. Tyas, A. Kurnia, and A. Soleh, “TEXT CLUSTERING ONLINE LEARNING OPINION DURING COVID-19 PANDEMIC IN INDONESIA USING TWEETS”, BAREKENG: J. Math. & App., vol. 16, no. 3, pp. 939-948, Sep. 2022.