CLUSTERING AND VISUALIZATION OF OLYMPIC ATHLETE DATA BASED ON PHYSICAL AND DISCIPLINARY ATTRIBUTES

Hilwin Nisa
Abela Chairunissa

Abstract

This study seeks hidden patterns in international athlete data using clustering and data visualization. Athletes are grouped by physical characteristics and sports discipline to uncover meaningful trends. Using a dataset of more than 200,000 entries covering 1896 to 2016, the study applies three clustering methods: K-Means, Agglomerative clustering, and DBSCAN. Preprocessing consists of handling missing data, selecting the relevant variables (Height, Weight, Age, Sex, Sport, and Medal), and normalizing the data. The Silhouette scores, which range from -1 to 1 with higher values indicating better-separated clusters, are 0.274 for K-Means, 0.261 for Agglomerative clustering, and -0.239 for DBSCAN. These low values indicate weakly separated, overlapping clusters, and the negative DBSCAN score suggests that many points lie closer to a neighboring cluster than to their own. K-Means performs slightly better than the other two methods. The results are visualized through cluster plots and an interactive map of medal distribution. The study highlights the limitations of traditional clustering methods on large datasets and suggests exploring more advanced techniques in future work.
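
To make the reported Silhouette scores easier to interpret, the following is a minimal sketch of how the metric can be computed after normalization. It is not the authors' implementation: the function names `zscore_columns` and `silhouette_score`, the choice of z-score normalization and Euclidean distance, and the six (height, weight) rows are all assumptions made for illustration, and the toy data is hypothetical rather than drawn from the study's dataset.

```python
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it drops out of the sum)
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, other) for other in members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical (height in cm, weight in kg) rows, for illustration only.
athletes = [(150, 45), (152, 48), (155, 50), (190, 90), (193, 95), (196, 100)]
scaled = zscore_columns(athletes)

good = silhouette_score(scaled, [0, 0, 0, 1, 1, 1])   # labels match the groups
mixed = silhouette_score(scaled, [0, 1, 0, 1, 0, 1])  # labels cut across them
print(f"well separated: {good:.3f}, mixed: {mixed:.3f}")
```

For each point, a is the mean distance to its own cluster and b is the mean distance to the nearest other cluster, so (b - a) / max(a, b) approaches 1 when clusters are compact and far apart, sits near 0 when they overlap, and turns negative when a point is closer to another cluster than to its own. In the sketch, the labelling that matches the two groups scores close to 1, while the labelling that cuts across them scores far lower. This is the scale on which the K-Means and Agglomerative values above read as weak separation.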
