A GENETIC ALGORITHM–PARTICLE SWARM OPTIMIZATION OPTIMIZED DOFCM APPROACH TO ENHANCE CLUSTERING AND OUTLIER DETECTION

  • Sintia Afriyani, Master Program in Statistics, Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Islam Indonesia, Indonesia. ORCID: https://orcid.org/0009-0004-4884-8024
  • Rohmatul Fajriyah, Master Program in Statistics, Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Islam Indonesia, Indonesia. ORCID: https://orcid.org/0000-0002-7132-3937
Keywords: Clustering, Outlier, DOFCM, Genetic Algorithm, Particle Swarm Optimization

Abstract

In the Industry 4.0 era, big data generated by the Internet of Things (IoT) demands advanced analysis techniques. Outlier detection is vital because anomalies may indicate sensor failures, fraud, or abnormal medical records. Fuzzy clustering methods such as Density-Oriented Fuzzy C-Means (DOFCM) are often applied, yet their performance depends on accurate placement of cluster centers, which remains challenging. While several Fuzzy C-Means extensions address outlier sensitivity, most rely on a single optimization strategy. Integrating Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA) into DOFCM has rarely been explored, which makes this study novel in evaluating how different metaheuristics enhance clustering robustness and anomaly detection. This research introduces DOFCM-PSO and DOFCM-GA and tests them on five benchmark datasets containing outliers: Iris, Wine, Sonar, Diabetes, and Ionosphere, using the Silhouette Coefficient (SC) as the evaluation metric. Results show that GA consistently outperforms PSO, improving SC by approximately 0.02–0.03 across datasets. For instance, SC on the Iris dataset rose from 0.6029 (PSO) to 0.6291 (GA), a relative gain of about 4.3%, while on the Wine dataset it rose from 0.2759 to 0.2958, about 7.2%. Evaluations of computational time and of the number of detected outliers further support these findings: although GA required slightly longer runtime than PSO, it substantially reduced the number of outliers while still achieving higher SC values. In the Diabetes dataset, for example, GA reduced the number of detected outliers from 20 to 7 with a modest SC improvement. These results indicate that PSO is more efficient in runtime, whereas GA yields more robust clustering, with fewer flagged outliers and better cluster separation. Despite these promising results, the study is limited by the relatively small dataset sizes and by the sensitivity of the methods to parameter settings, which may influence outcomes. Future work should apply the method to larger datasets and include additional clustering indices.
Overall, DOFCM-GA can be considered a robust approach for fuzzy clustering in the presence of anomalies.
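The Silhouette Coefficient compares, for each point, the mean distance a(i) to the other members of its own cluster with the smallest mean distance b(i) to the members of any other cluster, s(i) = (b(i) - a(i)) / max(a(i), b(i)), and averages s(i) over all points; values near 1 indicate well-separated clusters. The pure-Python sketch below illustrates that definition under stated assumptions: it takes hard labels (for instance, each point assigned to its highest-membership fuzzy cluster), since how the study turned fuzzy memberships into labels is not given here, and the helper names `silhouette_coefficient` and `relative_gain` are hypothetical. It also converts the SC values quoted above into relative gains, which shows that the same absolute improvement corresponds to a larger percentage on datasets with a lower PSO baseline.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points.

    Assumption: hard labels (e.g. each point's highest-membership fuzzy
    cluster), at least two clusters, Euclidean distance.
    """
    labels = list(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # a(i): mean distance from p to the other members of its own cluster
        same = [math.dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        if not same:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)
        # b(i): smallest mean distance from p to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, lj in zip(points, labels) if lj == c)
            / labels.count(c)
            for c in clusters if c != li
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def relative_gain(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Hypothetical toy data: two well-separated groups (not from the study).
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(silhouette_coefficient(pts, [0, 0, 1, 1]), 4))

# SC values quoted in the abstract (PSO -> GA).
for name, pso, ga in [("Iris", 0.6029, 0.6291), ("Wine", 0.2759, 0.2958)]:
    print(f"{name}: {ga - pso:+.4f} absolute, {relative_gain(pso, ga):.2f}% relative")
```

With the quoted figures, the Iris improvement of 0.0262 is about 4.35% of its PSO baseline, while the Wine improvement of 0.0199 is about 7.21%, because the Wine baseline SC is less than half that of Iris.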
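As DOFCM is commonly described, its density-oriented step gives each point a "neighborhood membership": the number of other points within a radius of it, divided by the largest such count in the dataset. Points whose membership falls below a threshold α are set aside as outliers, and the fuzzy clustering then runs on the remaining data. The pure-Python sketch below shows only that density step. The radius, the α value, and the helper names `neighborhood_membership` and `flag_outliers` are hypothetical, and descriptions differ on whether a point counts itself as a neighbor. The study's exact outlier rule may also differ: the abstract reports different outlier counts for DOFCM-PSO and DOFCM-GA, so its outlier labels presumably also depend on the optimized cluster centers, in a way not specified here.

```python
import math


def neighborhood_membership(points, radius):
    """Density factor eta_i / eta_max for each point.

    eta_i counts the other points lying within `radius` of point i; dividing
    by the largest count maps the densities to [0, 1].
    """
    counts = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    eta_max = max(counts)
    if eta_max == 0:
        # every point is isolated at this radius
        return [0.0] * len(points)
    return [c / eta_max for c in counts]


def flag_outliers(points, radius, alpha):
    """Indices of points whose neighborhood membership is below `alpha`."""
    membership = neighborhood_membership(points, radius)
    return [i for i, m in enumerate(membership) if m < alpha]


# Hypothetical toy data: two tight groups plus one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
outliers = flag_outliers(pts, radius=1.5, alpha=0.1)
inliers = [p for i, p in enumerate(pts) if i not in outliers]
print(outliers, len(inliers))
```

Here the point (50, 50) has no neighbors within the radius and is flagged, and the eight inliers would be passed on to the clustering stage. According to the abstract, that stage is where PSO or GA optimizes the placement of the cluster centers.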

Published: 2026-01-26

How to cite: S. Afriyani and R. Fajriyah, “A GENETIC ALGORITHM–PARTICLE SWARM OPTIMIZATION OPTIMIZED DOFCM APPROACH TO ENHANCE CLUSTERING AND OUTLIER DETECTION”, BAREKENG: J. Math. & App., vol. 20, no. 2, pp. 1453–1472, Jan. 2026.