Categorical Boosting and Bayesian Optimization in Natural Disaster Tweet Classification
Abstract
Multi-label classification is an important challenge in natural language processing, because a single text can carry more than one label at once. This study applies multi-label classification to group the information contained in Twitter comments about natural disasters in Indonesia. Each comment is assigned one or more of six labels: disaster, location, damage, victims, aid, and others. To handle the complexity of text data, the study uses Categorical Boosting (CatBoost), a decision tree-based boosting algorithm that excels at handling categorical features and reducing overfitting. The model is built with the MultiOutputClassifier approach so that all labels are predicted together. Bayesian optimization is then applied to tune the model; it is a parameter search method that uses a probabilistic model of previous evaluations to select the next parameter combination to try. The optimization focused on four main parameters: number of iterations, learning rate, tree depth, and L2 regularization. The resulting model achieved an accuracy of 75.41% and a Hamming loss of 0.0520, demonstrating the effectiveness of this approach for multi-label classification of Twitter data.
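The two reported metrics measure different things, which a small worked example may make clearer. The sketch below is illustrative only: the toy label matrices are invented, the helper names `hamming_loss` and `subset_accuracy` are hypothetical, and it assumes that the reported accuracy is exact-match (subset) accuracy, which the abstract does not state.

```python
# Column order follows the six labels named in the abstract.
LABELS = ["disaster", "location", "damage", "victims", "aid", "others"]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    exact = sum(
        list(true_row) == list(pred_row)
        for true_row, pred_row in zip(y_true, y_pred)
    )
    return exact / len(y_true)


# Invented toy data: 4 comments x 6 labels (1 = label applies).
y_true = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
y_pred = [
    [1, 1, 0, 0, 0, 0],  # exact match
    [1, 0, 1, 0, 0, 0],  # misses "victims"
    [0, 0, 0, 0, 1, 0],  # exact match
    [1, 0, 0, 0, 0, 1],  # spurious "disaster"
]

print("Hamming loss:   ", hamming_loss(y_true, y_pred))
print("Subset accuracy:", subset_accuracy(y_true, y_pred))
```

In this toy case only 2 of 24 label slots are wrong, yet half of the comments fail the exact-match test. Under the stated assumption, this is how a low Hamming loss can coexist with a noticeably lower accuracy figure.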
Copyright (c) 2025 Enzelica Vica Christina, Wahyu Syaifullah J. S, Kartika Maulida Hindrayani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.