GOJEK DATA ANALYSIS THROUGH TEXT MINING USING SUPPORT VECTOR MACHINE (SVM) AND K-NEAREST NEIGHBOR (KNN)

  • Siti Hadijah Hasanah, Statistics Study Program, Faculty of Science and Technology, Universitas Terbuka, Indonesia
  • Muhamad Riyan Maulana, Information Systems Study Program, Faculty of Science and Technology, Universitas Terbuka, Indonesia
  • Dian Nurdiana, Information Systems Study Program, Faculty of Science and Technology, Universitas Terbuka, Indonesia, https://orcid.org/0000-0002-4141-6438
Keywords: Gojek, KNN, SVM, Text Mining

Abstract

This research applies the SVM and KNN methods to the text analysis of Gojek data and tests their effectiveness. It examines how well the two methods classify user comments and feedback by sentiment, with the practical aim of helping Gojek understand user needs and improve service quality. The data, obtained through web scraping, are categorized into positive and negative sentiment. They are taken from Gojek application user reviews throughout 2022, for a total of 1148 sentiment records, split into 80% training data and 20% testing data. Evaluation of model performance using the confusion matrix and the AUC-ROC curve shows that SVM is more effective than KNN, with accuracy on training data of 92.55% for SVM and 81.71% for KNN, and accuracy on testing data of 82.40% for SVM and 77.09% for KNN.
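
The evaluation setup described above can be illustrated with a minimal sketch, which assumes a plain holdout split and a binary confusion matrix. The helper names (`split_sizes`, `confusion_matrix`, `accuracy`) and the toy labels are hypothetical and are not taken from the paper. The abstract does not state the authors' tooling or how the 80/20 split was rounded, so the floor rounding used here is an assumption.

```python
def split_sizes(n_total, train_fraction=0.8):
    """Return (n_train, n_test) for a simple holdout split.

    Assumption: the training size is floored; the paper does not
    state how the 80/20 split of 1148 records was rounded.
    """
    n_train = int(n_total * train_fraction)
    return n_train, n_total - n_train


def confusion_matrix(y_true, y_pred, positive="positive"):
    """Count TP, FP, FN, TN for a two-class sentiment task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(cm):
    """Accuracy = (TP + TN) / all predictions."""
    total = sum(cm.values())
    return (cm["tp"] + cm["tn"]) / total if total else 0.0


# Split sizes for the 1148 records mentioned in the abstract.
n_train, n_test = split_sizes(1148)

# Toy labels for illustration only; these are not the paper's data.
y_true = ["positive", "negative", "positive", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "negative", "positive"]
cm = confusion_matrix(y_true, y_pred)

print(n_train, n_test)
print(cm, accuracy(cm))
```

Under these assumptions the split gives 918 training and 230 testing records. The same confusion-matrix counts could be computed once for each classifier's predictions to compare SVM and KNN on equal terms.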

Published
2025-04-01
How to Cite
[1]
S. H. Hasanah, M. R. Maulana, and D. Nurdiana, “GOJEK DATA ANALYSIS THROUGH TEXT MINING USING SUPPORT VECTOR MACHINE (SVM) AND K-NEAREST NEIGHBOR (KNN)”, BAREKENG: J. Math. & App., vol. 19, no. 2, pp. 889-902, Apr. 2025.