CLUSTERING AND VISUALIZATION OF OLYMPIC ATHLETE DATA BASED ON PHYSICAL AND DISCIPLINARY ATTRIBUTES
Abstract
This study aims to identify hidden patterns in international athlete data using clustering and data visualization. The goal is to group athletes by physical characteristics and sport discipline in order to uncover meaningful trends. Using a dataset of over 200,000 entries covering 1896 to 2016, the study applies K-Means, Agglomerative, and DBSCAN clustering. Preprocessing includes handling missing data, selecting relevant variables (Height, Weight, Age, Sex, Sport, and Medal), and normalizing the data. The Silhouette scores of K-Means (0.274), Agglomerative clustering (0.261), and DBSCAN (-0.239) indicate weak cluster structure: the two positive scores close to zero point to overlapping clusters, and the negative DBSCAN score suggests that many points lie closer to another cluster than to their own. K-Means performs slightly better than the other two methods. The findings are visualized through cluster plots and an interactive map of medal distribution. This study highlights the limitations of traditional clustering methods on large datasets and suggests exploring more advanced techniques in future work.
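The abstract names the pipeline but not its implementation. As a rough illustration of what the reported Silhouette scores measure, the following minimal sketch uses only the Python standard library: it drops rows with missing numeric values, one-hot encodes Sex, Sport, and Medal, z-score normalizes Height, Weight, and Age, runs Lloyd's K-Means, and averages the Silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the point's own cluster and b(i) the mean distance to the nearest other cluster. The toy rows, the function names `preprocess`, `kmeans`, and `silhouette`, the drop-rows missing-data strategy, and mapping a missing Medal to "No medal" are all hypothetical assumptions, not the authors' code; Agglomerative clustering and DBSCAN are omitted for brevity.

```python
import math
import random

# Hypothetical toy rows shaped like the selected variables:
# (Height cm, Weight kg, Age, Sex, Sport, Medal). Not real data.
ROWS = [
    (180.0, 75.0, 24, "M", "Athletics", "Gold"),
    (165.0, 55.0, 21, "F", "Gymnastics", None),
    (None, 80.0, 27, "M", "Rowing", None),  # missing Height: row is dropped
    (192.0, 95.0, 29, "M", "Rowing", "Silver"),
    (158.0, 50.0, 19, "F", "Gymnastics", "Bronze"),
    (175.0, 68.0, 25, "F", "Athletics", None),
]


def preprocess(rows):
    """Drop rows with missing numerics, z-score numerics, one-hot categoricals."""
    rows = [r for r in rows if None not in r[:3]]
    # Assumption: a missing Medal means the athlete did not win one.
    cats = [(r[3], r[4], r[5] or "No medal") for r in rows]
    levels = [sorted({c[i] for c in cats}) for i in range(3)]
    numeric = [[float(x) for x in r[:3]] for r in rows]
    cols = list(zip(*numeric))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    X = []
    for num, cat in zip(numeric, cats):
        vec = [(x - m) / s for x, m, s in zip(num, means, stds)]
        for value, lv in zip(cat, levels):
            vec += [1.0 if value == level else 0.0 for level in lv]
        X.append(vec)
    return X


def kmeans(X, k, iters=100, seed=0):
    """Lloyd's algorithm: assign to nearest center, move centers to means."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in X]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(X, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def silhouette(X, labels):
    """Mean Silhouette coefficient over all points (Euclidean distance)."""
    if len(set(labels)) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, p in enumerate(X):
        same = [math.dist(p, q) for j, q in enumerate(X)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster scores 0, as in scikit-learn
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(math.dist(p, q) for q, lab in zip(X, labels) if lab == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    X = preprocess(ROWS)
    labels = kmeans(X, k=2)
    print("labels:", labels)
    print("silhouette:", round(silhouette(X, labels), 3))
```

Values of s(i) near 0 mean a point sits about as close to a neighbouring cluster as to its own, and negative values mean it is closer to another cluster, which is the usual reading of scores like those reported above. When reproducing a DBSCAN score with scikit-learn, note that `sklearn.metrics.silhouette_score` treats the noise label -1 as an ordinary cluster, so whether noise points are excluded affects the result.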
Copyright (c) 2025 VARIANCE: Journal of Statistics and Its Applications

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
