SENTIMENT ANALYSIS OF REVIEWS ON X APPS ON GOOGLE PLAY STORE USING SUPPORT VECTOR MACHINE AND N-GRAM FEATURE SELECTION

  • Fahri Aimar Kusumo, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Indonesia, https://orcid.org/0009-0000-9240-6662
  • Dewi Retno Sari Saputro, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Indonesia, https://orcid.org/0000-0002-6569-394X
  • Purnami Widyaningsih, Department of Mathematics, Faculty of Mathematics and Natural Sciences, Universitas Sebelas Maret, Indonesia, https://orcid.org/0000-0002-4737-6502
Keywords: AUC, Feature Selection, N-gram, Sentiment Analysis, SVM

Abstract

Sentiment analysis is an application of text mining used to identify opinions about a particular event or topic in a collection of text. Its main function is to extract information and determine the meaning and opinions expressed by users. Sentiment analysis requires a classification algorithm, such as the Support Vector Machine (SVM). SVM is frequently used for text classification because it can handle high-dimensional data. The idea of SVM is to find the best hyperplane separating two classes in the input space. Text data with a large number of features causes data imbalance and affects the classification process, so feature selection is necessary. Feature selection is a technique for removing irrelevant attributes from a dataset. N-gram feature selection is a statistics-based approach to classifying text. N-grams can classify previously unseen text with high certainty. In sentiment analysis, N-grams work well despite textual errors, run efficiently, require little storage, and process quickly. This research aims to perform sentiment analysis on X application reviews on the Google Play Store using SVM with unigram, bigram, and trigram feature selection. The methodology comprises a theoretical study, web scraping, text preprocessing, sentiment labeling with VADER, weighting with TF-IDF, splitting the data into training data (80%) and testing data (20%), training and evaluating the models, classifying the testing data, and interpreting the results. In total, 3151 testing records were classified. SVM with unigram feature selection achieved the highest accuracy, 90%, with an AUC of 0.93 (excellent). SVM with bigram feature selection achieved an accuracy of 78% with an AUC of 0.81 (good). SVM with trigram feature selection had the lowest accuracy, 68%, with an AUC of 0.66 (poor).
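
The unigram, bigram, and trigram features and the TF-IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `word_ngrams` and `tfidf`, the toy reviews, and the plain textbook formula tf × log(N / df) are all assumptions made for illustration, and the abstract does not state which library or TF-IDF variant was used.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (joined by spaces) of a lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(documents, n):
    """Compute TF-IDF weights per document over word n-grams.

    Assumption: the plain form tf * log(N / df) is used here. Libraries
    often add smoothing and normalization, so their values will differ.
    """
    grams_per_doc = [word_ngrams(doc, n) for doc in documents]
    # Document frequency: how many documents contain each n-gram.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    total_docs = len(documents)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        length = len(grams)
        weights.append({
            g: (c / length) * math.log(total_docs / df[g])
            for g, c in counts.items()
        })
    return weights


# Hypothetical toy reviews, not taken from the study's data.
reviews = [
    "the app keeps crashing",
    "love the new app design",
    "the app is great",
]

for n in (1, 2, 3):
    print(n, word_ngrams(reviews[0], n))

unigram_weights = tfidf(reviews, 1)
bigram_weights = tfidf(reviews, 2)
```

In this toy example, terms that appear in every review (such as "the" and "app") receive a weight of zero, while rarer terms are weighted more heavily. Longer n-grams occur in fewer reviews, so bigram and trigram feature vectors are sparser than unigram ones.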

Published
2025-04-01
How to Cite
F. A. Kusumo, D. R. S. Saputro, and P. Widyaningsih, “SENTIMENT ANALYSIS OF REVIEWS ON X APPS ON GOOGLE PLAY STORE USING SUPPORT VECTOR MACHINE AND N-GRAM FEATURE SELECTION”, BAREKENG: J. Math. & App., vol. 19, no. 2, pp. 1037-1046, Apr. 2025.
