TEXT CLASSIFICATION USING ADAPTIVE BOOSTING ALGORITHM WITH OPTIMIZATION OF PARAMETERS TUNING ON CABLE NEWS NETWORK (CNN) ARTICLES
Abstract
Advances in communication and information technology have made the exchange of information over the internet faster. One platform that provides online news articles is Cable News Network (CNN), which has been publishing news on its website since 1995. The number of CNN news articles continues to grow, so the articles are categorized to help readers find those in the category they want. Classification is a technique for determining the class of an object from its characteristics, where the class labels are known beforehand. One classification algorithm is adaptive boosting (AdaBoost). AdaBoost classifies by building several weighted decision trees (stumps); the predicted class is the one whose supporting stumps carry the highest total weight. AdaBoost can be combined with parameter tuning to avoid the overfitting or underfitting that results from a weak set of stumps. This study therefore implements AdaBoost with parameter tuning to classify CNN news articles. The data are CNN news articles from 2011 to 2022, sourced from Kaggle, and are categorized into six classes: business, entertainment, health, news, politics, and sports. The performance of the AdaBoost algorithm is measured with the accuracy value and the confusion matrix. The accuracy obtained is 0.78763, the precision is 0.91, the recall is 0.85, and the F1 score is 0.88.
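The weighted vote among stumps that the abstract describes can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation. It assumes the multi-class SAMME weighting rule, and the function names, the `learning_rate` parameter, the stump errors, and the stump predictions are all hypothetical values chosen for illustration.

```python
import math
from collections import defaultdict

# The six classes named in the abstract.
CLASSES = ["business", "entertainment", "health", "news", "politics", "sports"]


def stump_weight(error, n_classes, learning_rate=1.0):
    """Weight of one stump under the SAMME rule (assumed here, not confirmed by the source).

    `error` is the stump's weighted training error; a lower error gives a larger weight.
    `learning_rate` is a hypothetical tunable parameter that scales every weight.
    """
    eps = 1e-10
    error = min(max(error, eps), 1 - eps)  # keep the logarithm finite
    return learning_rate * (math.log((1 - error) / error) + math.log(n_classes - 1))


def weighted_vote(stump_predictions, stump_weights):
    """Return the class whose supporting stumps have the highest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(stump_predictions, stump_weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Illustrative values only: one accurate stump and two barely-better-than-chance stumps.
errors = [0.10, 0.48, 0.49]
weights = [stump_weight(e, len(CLASSES)) for e in errors]
predictions = ["sports", "news", "news"]

predicted = weighted_vote(predictions, weights)
print(predicted)
```

In this example two of the three stumps vote for "news", yet the single low-error stump outweighs them, so the weighted vote differs from a simple majority count.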
Copyright (c) 2024 Dewi Retno Sari Saputro, Krisna Sidiq, Harun Al Rasyid, Sutanto Sutanto
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as this can lead to productive exchanges as well as earlier and greater citation of published works.