CLUSTERING AND VISUALIZATION OF OLYMPIC ATHLETE DATA BASED ON PHYSICAL AND DISCIPLINARY ATTRIBUTES

Hilwin Nisa; Abela Chairunissa

doi:10.30598/variancevol7iss1page113-122

Hilwin Nisa, Department of Statistics, Faculty of Mathematics and Natural Sciences, Brawijaya University
Abela Chairunissa, Department of Statistics, Faculty of Mathematics and Natural Sciences, Brawijaya University


Keywords: Agglomerative, Cluster Analysis, DBSCAN, K-Means, Sport Analytics

Abstract

This study aims to identify hidden patterns in international athlete data through clustering and data visualization. Athletes are grouped by physical characteristics and sports discipline to uncover meaningful trends. Using a dataset of over 200,000 entries covering 1896 to 2016, the study applies the K-Means, Agglomerative, and DBSCAN clustering methods. Preprocessing includes handling missing data, selecting relevant variables (Height, Weight, Age, Sex, Sport, and Medal), and normalizing the data. The Silhouette scores for K-Means (0.274), Agglomerative (0.261), and DBSCAN (-0.239) indicate suboptimal clustering with overlapping clusters; K-Means performs slightly better than the other two methods. The findings are visualized through cluster plots and an interactive map showing medal distribution. The study highlights the limitations of traditional clustering methods for large datasets and suggests exploring more advanced techniques in future work.
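To make the Silhouette scores in the abstract easier to interpret, the following is a minimal sketch of how the metric can be computed after normalization. It is not the authors' code: the six athlete rows, the two labelings, and the helper names `standardize` and `silhouette_score` are hypothetical, and z-score scaling is assumed as the normalization step because the abstract does not say which one was used.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so height, weight, and age share a common scale."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(math.dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)

# Hypothetical (height cm, weight kg, age) rows, invented for illustration only.
athletes = [
    (150, 45, 19), (152, 48, 20), (155, 47, 21),
    (195, 95, 27), (198, 100, 28), (200, 98, 29),
]
scaled = standardize(athletes)

good = silhouette_score(scaled, [0, 0, 0, 1, 1, 1])   # compact, separated groups
mixed = silhouette_score(scaled, [0, 1, 0, 1, 0, 1])  # groups that interleave
print(f"separated labeling: {good:.3f}")
print(f"interleaved labeling: {mixed:.3f}")
```

The coefficient ranges from -1 to 1. Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate that points lie, on average, closer to another cluster than to their own. Read this way, scores around 0.26 to 0.27 point to weak separation, and a negative score points to many points sitting nearer to a different cluster than the one assigned.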
