A PRELIMINARY STUDY OF SENTIMENT ANALYSIS ON COVID-19 NEWS: LESSON LEARNED FROM DATA ACQUISITION, PRE-PROCESSING, AND DESCRIPTIVE ANALYTICS

  • Rahmatin Nur Amalia, Statistics and Data Science Department, Mathematics and Natural Sciences Faculty, IPB University, Indonesia
  • Kusman Sadik, Statistics and Data Science Department, Mathematics and Natural Sciences Faculty, IPB University, Indonesia
  • Khairil Anwar Notodiputro, Statistics and Data Science Department, Mathematics and Natural Sciences Faculty, IPB University, Indonesia
Keywords: sentiment analysis, web scraping, systematic sampling, stratified sampling, topic modeling, word representation

Abstract

Sentiment analysis is a method for analyzing opinions and feelings. Its goal is to determine whether a document expresses positive or negative emotion. As Covid-19 cases spread, news related to Covid-19 frequently became a trending topic in the mass media. Conducting sentiment analysis on all available news is challenging because of the time and cost involved, so a sampling method is needed to obtain a representative set of news articles for the analysis. Web scraping was employed to collect news articles about Covid-19 in Indonesia. To select representative news, two-step sampling was employed, combining stratified and systematic random sampling. According to the topic modelling results with lambda = 0.6, the news articles fall into three topics: updates on Covid-19 cases, vaccination, and government policy. In addition, based on the number of positive and negative words, the articles are grouped into news dominated by positive words, news dominated by negative words, and news with equal numbers of positive and negative words. Several methods have been developed for representing text in numerical form, among them tf-idf weighting and word embedding. Tf-idf weighting does not take word order or meaning into account; it relies only on how frequently words occur locally (within a document) and globally (across documents). Furthermore, it produces vectors as large as the number of unique words in the corpus, so it becomes less effective when many documents are used. In contrast, the vectors generated by the word2vec method are smaller than the number of unique words in the corpus, and word2vec also takes the context of words in the corpus into account.
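
As an illustration of the two-step sampling described in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes that articles are stratified by publication month, that allocation is proportional with a fixed sampling fraction, and that within each stratum every k-th article is taken after a random start. The field names (`id`, `month`), the fraction 0.25, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def systematic_sample(items, n, rng):
    """Take n items at a fixed interval k after a random start in [0, k)."""
    if n >= len(items):
        return list(items)
    k = len(items) // n          # sampling interval (k >= 1 because n < len(items))
    start = rng.randrange(k)     # random starting position within the first interval
    return [items[start + i * k] for i in range(n)]


def two_step_sample(articles, stratum_key, fraction, seed=0):
    """Step 1: group articles into strata. Step 2: sample systematically in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for article in articles:
        strata[article[stratum_key]].append(article)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Proportional allocation (an assumption), with at least one article per stratum.
        n = max(1, round(len(members) * fraction))
        sample.extend(systematic_sample(members, n, rng))
    return sample


# Hypothetical toy corpus: 60 articles spread evenly over three months.
articles = [{"id": i, "month": "2021-0%d" % (1 + i % 3)} for i in range(60)]
sample = two_step_sample(articles, "month", fraction=0.25)
per_month = defaultdict(int)
for article in sample:
    per_month[article["month"]] += 1
```

With 20 articles per month and a fraction of 0.25, each stratum contributes 5 articles. In practice the articles within a stratum would be ordered (for example by publication time) before the systematic step, so that the fixed interval spreads the sample across the whole period.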

Published
2023-12-18
How to Cite
[1]
R. Amalia, K. Sadik, and K. Notodiputro, “A PRELIMINARY STUDY OF SENTIMENT ANALYSIS ON COVID-19 NEWS: LESSON LEARNED FROM DATA ACQUISITION, PRE-PROCESSING, AND DESCRIPTIVE ANALYTICS”, BAREKENG: J. Math. & App., vol. 17, no. 4, pp. 1901-1914, Dec. 2023.