Walter Nordström and Jacob Håkansson

Finding Clusters of Similar Artists - Analysis of DBSCAN and K-means Clustering

Abstract

We applied k-means clustering and DBSCAN to the problem of finding sets of similar artists, given a large number of artists and their genres. For our experiments we used data from the Million Song Dataset, a freely available collection of metadata for a million popular music tracks, created specifically for research. We ran the algorithms with varying parameter values and studied the effects. The resulting clusters were analyzed, and for k-means we found three different types of clusters. Although the results from k-means were quite noisy, many of the clusters could be used to gain some insight into the similarity between artists. This implied that using distances as a representation of similarities between artists is viable. DBSCAN did not prove to be as useful, because its clustering method is density-based and the densities of the clusters in the input data differed far too much for DBSCAN to handle. We found that more features in the input data, such as genre per track, would be desirable and would probably improve the results of the algorithms. Further study of other clustering algorithms applied to the same data would shed light on the actual effectiveness of the algorithms studied here.
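
The difficulty DBSCAN has with clusters of differing density can be illustrated with a small sketch. The code below is not the implementation or the data used in this study. It is a minimal textbook-style DBSCAN in plain Python, run on hypothetical toy points: two tight groups that lie close together and one loosely spaced group far away. The names `dbscan`, `eps`, `min_pts`, and the toy coordinates are all assumptions made for illustration.

```python
from math import dist  # Python 3.8+

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# Hypothetical toy data: two dense groups (spacing 0.1) and one sparse group (spacing 1.0).
dense_a = [(i / 10, 0.0) for i in range(5)]
dense_b = [(1.2 + i / 10, 0.0) for i in range(5)]
sparse_c = [(10.0 + i, 0.0) for i in range(5)]
points = dense_a + dense_b + sparse_c

small_eps = dbscan(points, eps=0.15, min_pts=3)
large_eps = dbscan(points, eps=1.1, min_pts=3)
print(small_eps)  # dense groups are separated, sparse group is labeled noise
print(large_eps)  # sparse group is found, but the two dense groups merge
```

With a single global `eps`, neither setting recovers all three groups in this toy example: a radius small enough to separate the dense groups discards the sparse one as noise, while a radius large enough to capture the sparse group merges the dense ones.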