A NOVEL APPROACH TO SYMBOLIC DATA CLUSTERING USING ENHANCED K-MEANS ALGORITHM

  • Husty Serviana Husain Department of Mathematics, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Indonesia https://orcid.org/0000-0003-0739-5757
  • Sapto Wahyu Indratno Department of Mathematics, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Indonesia https://orcid.org/0009-0000-1347-4027
  • Sandy Vantika Department of Mathematics, Faculty of Mathematics and Natural Sciences, Institut Teknologi Bandung, Indonesia https://orcid.org/0009-0007-5422-544X
Keywords: Image clustering, K-Means algorithm, Feature representation, Symbolic data

Abstract

Clustering is a crucial technique in image analysis, yet traditional methods such as K-Means often struggle with complex, high-dimensional, or uncertain data. This limitation makes them less effective at grouping images accurately, particularly when features vary and overlap across categories. To address this problem, this paper introduces a novel approach that integrates symbolic data with the K-Means algorithm to cluster image data more effectively. By representing both color intensity and spatial features symbolically, we improve the algorithm's ability to handle variability and uncertainty. We test our method on the CIFAR-10 dataset, where it achieves a clustering accuracy of 94.0% and an Adjusted Rand Index (ARI) of 0.7, outperforming traditional methods such as K-Means (82.5%), DBSCAN (78.1%), and hierarchical clustering (81.3%). Our results demonstrate that symbolic data analysis offers a more flexible and accurate solution for image clustering, with potential applications in fields such as medical image processing and environmental monitoring. Limitations and directions for future research are also discussed.
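The abstract does not specify the symbolic representation or the distance the method uses. As a minimal sketch, assuming each image is summarized by the [min, max] interval of each per-pixel feature (for example, channel intensities) and that observations are compared with the squared Euclidean distance on interval bounds, K-Means over interval-valued data and the Adjusted Rand Index used in the evaluation could look like the following. The function names (`to_symbolic`, `interval_dist`, `centroid`, `symbolic_kmeans`, `adjusted_rand_index`) and the toy data are hypothetical; this is an illustration, not the authors' implementation.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]   # (lower, upper) bounds of one symbolic feature
SymbolicObs = List[Interval]     # one image summarized as a vector of intervals


def to_symbolic(pixels: Sequence[Sequence[float]]) -> SymbolicObs:
    """Summarize an image (a list of per-pixel feature vectors, e.g. R, G, B)
    by the [min, max] interval of each feature.
    ASSUMPTION: min/max intervals are an illustrative choice, not the paper's."""
    n_feat = len(pixels[0])
    return [(min(p[j] for p in pixels), max(p[j] for p in pixels)) for j in range(n_feat)]


def interval_dist(a: SymbolicObs, b: SymbolicObs) -> float:
    """Squared Euclidean distance on the lower and upper bounds of each interval
    (ASSUMPTION: one of several possible interval distances)."""
    return sum((la - lb) ** 2 + (ua - ub) ** 2 for (la, ua), (lb, ub) in zip(a, b))


def centroid(members: List[SymbolicObs]) -> SymbolicObs:
    """Bound-wise mean, which minimizes the summed interval_dist to the members."""
    n = len(members)
    return [
        (sum(m[j][0] for m in members) / n, sum(m[j][1] for m in members) / n)
        for j in range(len(members[0]))
    ]


def symbolic_kmeans(data: List[SymbolicObs], k: int,
                    n_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd-style K-Means over interval-valued observations; returns labels."""
    rng = random.Random(seed)
    centers = [list(obs) for obs in rng.sample(data, k)]  # random initial centers
    labels = [-1] * len(data)
    for _ in range(n_iter):
        # Assignment step: nearest center under the interval distance.
        new_labels = [min(range(k), key=lambda c: interval_dist(x, centers[c]))
                      for x in data]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: recompute each non-empty cluster's bound-wise mean.
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def adjusted_rand_index(true: Sequence[int], pred: Sequence[int]) -> float:
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    def pairs(n: int) -> float:
        return n * (n - 1) / 2

    sum_ij = sum(pairs(v) for v in Counter(zip(true, pred)).values())
    sum_a = sum(pairs(v) for v in Counter(true).values())
    sum_b = sum(pairs(v) for v in Counter(pred).values())
    expected = sum_a * sum_b / pairs(len(true))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy data: two dark and two bright single-channel "images".
    images = [[[0.10], [0.20]], [[0.15], [0.25]], [[0.80], [0.90]], [[0.85], [0.95]]]
    sym = [to_symbolic(img) for img in images]
    labels = symbolic_kmeans(sym, k=2)
    print(adjusted_rand_index([0, 0, 1, 1], labels))  # 1.0
```

Because the distance is a sum of squared differences on lower and upper bounds, the bound-wise mean is the exact minimizer in the update step, so the loop keeps the usual K-Means guarantee that the within-cluster cost never increases. ARI is invariant to how cluster labels are numbered, which is why the toy run scores 1.0 whichever cluster receives label 0.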

Published
2026-01-26
How to Cite
[1]
H. Serviana Husain, S. Wahyu Indratno, and S. Vantika, “A NOVEL APPROACH TO SYMBOLIC DATA CLUSTERING USING ENHANCED K-MEANS ALGORITHM”, BAREKENG: J. Math. & App., vol. 20, no. 2, pp. 1263–1282, Jan. 2026.