A PRELIMINARY STUDY OF SENTIMENT ANALYSIS ON COVID-19 NEWS: LESSON LEARNED FROM DATA ACQUISITION, PRE-PROCESSING, AND DESCRIPTIVE ANALYTICS
Abstract
Sentiment analysis is a method for analyzing opinions and feelings; its goal is to determine whether a document expresses positive or negative emotion. As Covid-19 cases spread, news related to Covid-19 often became a trending topic in the mass media. Conducting sentiment analysis on every news article is challenging because it can be time-consuming and costly, so a sampling method is needed to obtain representative news for the analysis. Web scraping was employed to collect news articles about Covid-19 in Indonesia. To select representative news, a two-step sampling design combining stratified and systematic random sampling was employed. According to the topic modelling results with λ = 0.6, the news articles are grouped into three topics: updates on Covid-19 cases, vaccination, and government policy. In addition, based on the numbers of positive and negative words, the news articles are grouped into news dominated by positive words, news dominated by negative words, and news with equal numbers of positive and negative words. Several methods for representing text in numerical form have been developed, including tf-idf weighting and word embedding. Tf-idf weighting ignores word order and meaning, relying only on word frequencies both within each document (locally) and across the corpus (globally). Furthermore, it produces vectors whose size equals the number of unique words in the corpus, which makes it less effective when many documents are used. By contrast, the size of the vectors generated by word2vec is smaller than the number of unique words in the corpus, and word2vec takes into account the context in which words appear.
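The abstract names a two-step design (stratified sampling followed by systematic random sampling) but does not state the strata or the allocation rule. The Python sketch below assumes proportional allocation and uses a hypothetical stratum key (the news portal); within each stratum it picks a random start and then takes every k-th article, with k = floor(N_h / n_h). The function names and the data are illustrative only.

```python
import random
from collections import defaultdict


def systematic_sample(items, n, rng):
    """Draw a 1-in-k systematic sample of size n from an ordered list."""
    N = len(items)
    if n <= 0:
        return []
    if n >= N:
        return list(items)
    k = N // n  # sampling interval
    start = rng.randrange(k)  # random start within the first interval
    return [items[start + i * k] for i in range(n)]


def two_step_sample(articles, stratum_of, total_n, seed=0):
    """Step 1: split articles into strata; step 2: systematic sample within each.

    Uses proportional allocation (an assumption), so per-stratum sizes are
    rounded and may not sum exactly to total_n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for article in articles:
        strata[stratum_of(article)].append(article)
    N = len(articles)
    return {
        key: systematic_sample(members, round(total_n * len(members) / N), rng)
        for key, members in strata.items()
    }


# Hypothetical data: 100 articles from two news portals (60 and 40 articles).
articles = [{"id": i, "portal": "A" if i < 60 else "B"} for i in range(100)]
sample = two_step_sample(articles, lambda a: a["portal"], total_n=10)
for portal, chosen in sample.items():
    print(portal, [a["id"] for a in chosen])
```

Systematic selection spreads the sample evenly only if the articles within a stratum are listed in a meaningful order (for example, by publication date); with a random listing it behaves much like simple random sampling.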
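The abstract does not define λ. Assuming it is the relevance weight of the LDAvis topic visualization, terms within topic k are ranked by λ log φ_kw + (1 − λ) log(φ_kw / p_w), where φ_kw is the probability of word w under topic k and p_w is its overall probability in the corpus. The sketch below uses made-up probabilities for a hypothetical "vaccination" topic to show how λ < 1 promotes topic-specific words over words that are frequent everywhere.

```python
import math


def relevance(phi_kw, p_w, lam=0.6):
    """LDAvis-style relevance of term w to topic k.

    phi_kw: probability of w under topic k; p_w: marginal probability of w.
    lam = 1 ranks purely by phi_kw; smaller lam favours topic-specific terms.
    """
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def top_terms(phi_k, p, lam=0.6, n=3):
    """Return the n most relevant terms of one topic."""
    scores = {w: relevance(phi_k[w], p[w], lam) for w in phi_k}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical probabilities over a tiny 4-word vocabulary.
phi_vaccination = {"vaksin": 0.30, "dosis": 0.20, "kasus": 0.25, "pemerintah": 0.25}
p_corpus = {"vaksin": 0.10, "dosis": 0.05, "kasus": 0.40, "pemerintah": 0.45}

print(top_terms(phi_vaccination, p_corpus, lam=1.0))  # -> ['vaksin', 'kasus', 'pemerintah']
print(top_terms(phi_vaccination, p_corpus, lam=0.6))  # -> ['vaksin', 'dosis', 'kasus']
```

With λ = 1 the generic word "kasus" ranks second because it is frequent in the topic; with λ = 0.6 the rarer but topic-specific "dosis" overtakes it.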
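The grouping by positive and negative word counts can be sketched as follows. The word lists here are hypothetical toy lexicons in English for readability; the word lists actually used in the study are not given in the abstract.

```python
import re

# Hypothetical toy lexicons, not the study's own word lists.
POSITIVE = {"recovered", "effective", "improve", "safe"}
NEGATIVE = {"death", "died", "surge", "outbreak"}


def sentiment_group(text):
    """Group an article by comparing its counts of positive and negative words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive-dominated"
    if neg > pos:
        return "negative-dominated"
    return "equal"  # also covers articles containing no lexicon words


print(sentiment_group("The vaccine is safe and effective; thousands recovered."))  # -> positive-dominated
print(sentiment_group("A surge in cases and another death were reported."))  # -> negative-dominated
print(sentiment_group("Patients recovered despite the outbreak."))  # -> equal
```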
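The claim that tf-idf vectors are as long as the vocabulary and ignore word order can be checked with a minimal sketch. It uses the plain variant tf = raw count and idf = log(N / df); common libraries add smoothing or normalization, so their numbers differ in detail. The Indonesian example documents are made up.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Plain tf-idf: tf = raw count in a document (local), idf = log(N / df) (global).

    Returns the sorted vocabulary and one vector per document; every vector has
    exactly len(vocab) entries, one per unique word in the corpus.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    df = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


docs = ["vaksin dosis kedua", "kasus baru covid", "vaksin kasus baru"]
vocab, vectors = tfidf_vectors(docs)
print(len(vocab), [len(v) for v in vectors])  # -> 6 [6, 6, 6]

# Word order is ignored: reordering a document leaves its vector unchanged.
_, reordered = tfidf_vectors(["kasus naik", "naik kasus", "vaksin"])
print(reordered[0] == reordered[1])  # -> True
```

Adding documents with new words lengthens every tf-idf vector, whereas word2vec maps each word to a dense vector whose dimension is fixed in advance, independent of the vocabulary size.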
Copyright (c) 2023 Rahmatin Nur Amalia, Kusman Sadik, Khairil Anwar Notodiputro
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g. acknowledgement of its initial publication in this journal).
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.