SENTIMENT ANALYSIS OF REVIEWS ON X APPS ON GOOGLE PLAY STORE USING SUPPORT VECTOR MACHINE AND N-GRAM FEATURE SELECTION
Abstract
Sentiment analysis is an application of text mining used to identify the opinions expressed in a body of text about a particular event or topic. Its main function is to extract information and determine the meaning and opinion conveyed by a given user. Sentiment analysis relies on classification algorithms such as the Support Vector Machine (SVM), which is frequently used for text classification because it handles high-dimensional data well. An SVM seeks the best hyperplane separating two classes in the input space. Text data with a large number of features can lead to data imbalance and degrade classification, so feature selection, a technique for removing irrelevant attributes from a dataset, is necessary. N-gram feature selection is a statistics-based approach to text classification that can classify unseen text with high certainty. In sentiment analysis, N-grams work well despite textual errors, run efficiently, require little storage, and are fast to process. This research performs sentiment analysis on reviews of the X application on the Google Play Store using SVM with unigram, bigram, and trigram feature selection. The methodology comprises a theoretical study, web scraping, text preprocessing, sentiment labeling with VADER, term weighting with TF-IDF, splitting the data into training (80%) and testing (20%) sets, training and evaluating the models, classifying the testing data, and interpreting the results. In total, 3151 testing records were classified. SVM with unigram feature selection achieved the highest accuracy, 90%, with an AUC of 0.93 (excellent). SVM with bigram feature selection achieved an accuracy of 78% with an AUC of 0.81 (good). SVM with trigram feature selection had the lowest accuracy, 68%, with an AUC of 0.66 (poor).
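The unigram, bigram, and trigram features and the TF-IDF weighting named in the abstract can be illustrated with a minimal, dependency-free Python sketch. Everything in it is an assumption made for illustration: the three toy reviews, the helper names `word_ngrams`, `tfidf_weights`, and `shared_ngrams`, the use of word-level (rather than character-level) n-grams, and the TF-IDF variant (relative term frequency times ln(N/df)). The abstract does not state which variant or tooling the authors used, and the SVM training step is not reproduced here.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_weights(documents, n):
    """Compute TF-IDF weights over word n-grams for tokenized documents.

    Assumed variant (not necessarily the paper's):
    tf = count / number of n-grams in the document, idf = ln(N / df).
    """
    grams_per_doc = [word_ngrams(doc, n) for doc in documents]
    # Document frequency: the number of documents each n-gram occurs in.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    total_docs = len(documents)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        weights.append({
            g: (c / len(grams)) * math.log(total_docs / df[g])
            for g, c in counts.items()
        })
    return weights


def shared_ngrams(documents, n):
    """Return the sorted n-grams that occur in more than one document."""
    df = Counter(g for doc in documents for g in set(word_ngrams(doc, n)))
    return sorted(g for g, count in df.items() if count > 1)


# Hypothetical reviews, assumed to be already preprocessed and tokenized.
reviews = [
    "the app keeps crashing".split(),
    "the app works great".split(),
    "love the new design".split(),
]

for n, name in [(1, "unigram"), (2, "bigram"), (3, "trigram")]:
    print(name, "features shared across reviews:", shared_ngrams(reviews, n))

bigram_weights = tfidf_weights(reviews, 2)
print(bigram_weights[0])
```

In this toy data the number of features shared between reviews shrinks as n grows. That is one possible, unverified reason why longer n-grams may give a classifier less to generalize from. It is offered only as an illustration and is not a finding of the study.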
Copyright (c) 2025 Fahri Aimar Kusumo, Dewi Retno Sari Saputro, Purnami Widyaningsih

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.