Categorical Boosting and Bayesian Optimization in Natural Disaster Tweet Classification

Keywords: Bayesian Optimization, Categorical Boosting, Multi-Label Classification, Text

Abstract

Multi-label classification is an important challenge in natural language processing because a single text can carry more than one label. This study applies multi-label classification to tweets about natural disasters in Indonesia, assigning each tweet to one or more of six labels: disaster, location, damage, victims, aid, and others. The classifier is Categorical Boosting (CatBoost), a gradient boosting method built on decision trees that is designed to handle categorical features and to limit overfitting. The model is wrapped in MultiOutputClassifier, which fits one classifier per label so that all six labels are predicted together. Hyperparameters are tuned with Bayesian optimization, a search method that uses a probabilistic model of previous evaluations to choose the next parameter combination to try. The optimization covered four parameters: number of iterations, learning rate, tree depth, and L2 regularization. The tuned model achieved an accuracy of 75.41% and a Hamming loss of 0.0520, indicating that the approach is effective for multi-label classification of Twitter data.
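
The two reported metrics can be illustrated with a small sketch. It assumes that "accuracy" means subset (exact-match) accuracy, which the abstract does not state. The function names and the toy label matrices are hypothetical and are not the study's data.

```python
# Label order follows the six categories named in the abstract.
LABELS = ["disaster", "location", "damage", "victims", "aid", "others"]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly.

    Assumption: this is the meaning of "accuracy" in the abstract.
    """
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Hypothetical toy data: 4 tweets, 6 binary labels each.
y_true = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
y_pred = [
    [1, 1, 0, 0, 0, 0],  # exact match
    [1, 0, 1, 0, 0, 0],  # misses "victims": 1 wrong slot
    [0, 0, 0, 0, 1, 0],  # exact match
    [1, 0, 0, 0, 0, 0],  # 2 wrong slots: "disaster" and "others"
]

print(hamming_loss(y_true, y_pred))     # 3 wrong slots out of 24
print(subset_accuracy(y_true, y_pred))  # 2 exact matches out of 4
```

Hamming loss counts every label slot separately, so it can be low even when exact-match accuracy is moderate: a tweet with five of six labels correct contributes only one error to the Hamming loss but counts as fully wrong for subset accuracy.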

Published
2025-09-30