A NOVEL APPROACH TO SYMBOLIC DATA CLUSTERING USING ENHANCED K-MEANS ALGORITHM
Abstract
Clustering is a crucial technique in image analysis, yet traditional methods such as K-Means often struggle with complex, high-dimensional, or uncertain data. This limitation reduces their effectiveness in grouping images accurately, particularly when categories vary internally and share overlapping features. To address this problem, this paper introduces a novel approach that integrates symbolic data with the K-Means algorithm to cluster image data more effectively. Representing both color intensity and spatial features symbolically enhances the algorithm’s ability to handle variability and uncertainty. On the CIFAR-10 dataset, the method achieves a clustering accuracy of 94.0% with an Adjusted Rand Index of 0.7, outperforming traditional K-Means (82.5%), DBSCAN (78.1%), and hierarchical clustering (81.3%). These results indicate that symbolic data analysis offers a more flexible and accurate solution for image clustering, with potential applications in fields such as medical image processing and environmental monitoring. Limitations and directions for future research are also discussed.
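To make the idea in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes that "symbolic" means interval-valued features (a per-channel [min, max] range of color intensity), omits the spatial features, and runs a plain K-Means on the interval bounds before scoring the result with the Adjusted Rand Index. The toy images, the farthest-point initialization, and all function names (`interval_features`, `kmeans`, `adjusted_rand_index`) are illustrative assumptions rather than details taken from the paper.

```python
import random
from collections import Counter
from math import comb


def interval_features(image):
    """Summarize an image (list of (r, g, b) pixels) as per-channel [min, max] bounds."""
    feats = []
    for channel in range(3):
        values = [pixel[channel] for pixel in image]
        feats.extend([min(values), max(values)])
    return feats  # 6 numbers: r_min, r_max, g_min, g_max, b_min, b_max


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; farthest-point init is an assumption made for determinism."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sqdist(p, c) for c in centers))
        centers.append(list(farthest))
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers)


def adjusted_rand_index(true, pred):
    """Adjusted Rand Index computed from the contingency counts of two labelings."""
    n = len(true)
    sum_ij = sum(comb(v, 2) for v in Counter(zip(true, pred)).values())
    sum_a = sum(comb(v, 2) for v in Counter(true).values())
    sum_b = sum(comb(v, 2) for v in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical toy data: 10 dark and 10 bright "images" of 16 random pixels each.
rng = random.Random(0)


def toy_image(low, high, n_pixels=16):
    return [tuple(rng.randint(low, high) for _ in range(3)) for _ in range(n_pixels)]


images = [toy_image(0, 80) for _ in range(10)] + [toy_image(170, 255) for _ in range(10)]
labels_true = [0] * 10 + [1] * 10

features = [interval_features(img) for img in images]
labels_pred = kmeans(features, k=2)
ari = adjusted_rand_index(labels_true, labels_pred)
print(f"ARI on toy data: {ari:.3f}")
```

The toy groups are deliberately well separated, so this only illustrates the pipeline (symbolic summary, clustering, ARI scoring); it says nothing about performance on CIFAR-10.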
Copyright (c) 2026 Husty Serviana Husain, Sapto Wahyu Indratno, Sandy Vantika

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this Journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work, with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g. in institutional repositories or on their websites) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published works.






