A GENETIC ALGORITHM–PARTICLE SWARM OPTIMIZATION OPTIMIZED DOFCM APPROACH TO ENHANCE CLUSTERING AND OUTLIER DETECTION
Abstract
In the era of Industry 4.0, Big Data from the Internet of Things (IoT) demands advanced analysis techniques. Outlier detection is vital because anomalies may indicate sensor failures, fraud, or abnormal medical records. Fuzzy clustering methods such as Density-Oriented Fuzzy C-Means (DOFCM) are often applied, yet their performance depends on accurate cluster center placement, which remains challenging. While several Fuzzy C-Means extensions address outlier sensitivity, most rely on a single optimization strategy. The integration of Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA) into DOFCM has rarely been explored, which makes this study novel in evaluating how different evolutionary algorithms enhance clustering robustness and anomaly detection. This research introduces DOFCM-PSO and DOFCM-GA and tests them on five benchmark datasets with outliers: Iris, Wine, Sonar, Diabetes, and Ionosphere. The Silhouette Coefficient (SC) was used as the evaluation metric. Results show that DOFCM-GA consistently outperforms DOFCM-PSO, with SC values improving by approximately 0.02–0.03 (equivalent to an increase of 8–12%) across datasets. For instance, the SC for the Iris dataset improved from 0.6029 (PSO) to 0.6291 (GA), while the SC for the Wine dataset increased from 0.2759 to 0.2958. Evaluation of computational time and outlier detection further supports these findings. Although GA required a slightly longer runtime than PSO, it substantially reduced the number of detected outliers while still achieving higher SC values. In the Diabetes dataset, for example, GA decreased the number of outliers from 20 to 7 with a modest SC improvement. These results indicate that PSO is more efficient in runtime, but GA provides more robust clustering by reducing the number of detected outliers and producing better cluster separation. Despite promising results, this study is limited by the relatively small dataset sizes and by sensitivity to parameter settings, which may influence outcomes. Future work should apply the method to larger datasets and include additional clustering validity indices. Overall, DOFCM-GA can be considered a robust approach for fuzzy clustering in the presence of anomalies.
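As a rough illustration of the evaluation metric named in the abstract, the sketch below computes the Silhouette Coefficient for a small labelled point set in plain Python. It is not the authors' implementation: the function names (`harden`, `silhouette_coefficient`), the Euclidean distance, the argmax conversion of fuzzy memberships to hard labels, and the toy data are all assumptions made for the example.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def harden(memberships):
    """Assumed convention: assign each point to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


def silhouette_coefficient(points, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight groups far apart, with fuzzy memberships.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
memberships = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = harden(memberships)
sc_good = silhouette_coefficient(points, labels)
sc_bad = silhouette_coefficient(points, [0, 1, 0, 1])
print(f"well-separated labelling: {sc_good:.4f}")
print(f"mixed-up labelling:       {sc_bad:.4f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned points. This is why a higher SC is read as better separation quality when comparing DOFCM-GA with DOFCM-PSO.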
Copyright (c) 2026 Sintia Afriyani, Rohmatul Fajriyah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution license that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.



