Predicting Response Times for the Spotify Backend

by Rerngvit Yanggratoke, Gunnar Kreitz, Mikael Goldmann, and Rolf Stadler

Best paper at The International Conference on Network and Service Management (CNSM) 2012


We model and evaluate the performance of a distributed key-value storage system that is part of the Spotify backend. Spotify is an on-demand music streaming service that offers low-latency access to a library of over 16 million tracks and currently serves over 10 million users. We first present a simplified model of the Spotify storage architecture to make its analysis feasible. We then introduce an analytical model for the distribution of the response time, a key metric in the Spotify service. We parameterize and validate the model using measurements from two different testbed configurations and from the operational Spotify infrastructure. We find that, within the range of normal load patterns, the model is accurate: measurements are within 11% of predictions. We apply the model to what-if scenarios that are essential to capacity planning and robustness engineering. The main difference between our work and related research in storage system performance is that our model provides distributions of key system metrics, whereas related research generally gives only expectations, which is not sufficient in our case.
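To illustrate why an expectation alone may be insufficient, the following minimal sketch compares two hypothetical response-time distributions that share the same mean but differ in the fraction of requests served within a latency threshold. It is not the model from the paper: the exponential mixtures, the parameter values, and the 50 ms threshold are assumptions chosen purely for illustration.

```python
import math

# Illustrative only: these distributions and numbers are assumptions,
# not the analytical model or measurements from the paper.

def exp_cdf(t_ms, mean_ms):
    """CDF of an exponential response time with the given mean."""
    return 1.0 - math.exp(-t_ms / mean_ms)

def mixture_mean(components):
    """Mean of a mixture given as (weight, mean_ms) pairs."""
    return sum(w * m for w, m in components)

def mixture_cdf(t_ms, components):
    """Fraction of requests answered within t_ms for the mixture."""
    return sum(w * exp_cdf(t_ms, m) for w, m in components)

# Hypothetical system A: every request follows one exponential, mean 10 ms.
system_a = [(1.0, 10.0)]
# Hypothetical system B: 90% fast requests (mean 2 ms) and
# 10% slow requests (mean 82 ms); the overall mean is also 10 ms.
system_b = [(0.9, 2.0), (0.1, 82.0)]

threshold_ms = 50.0
for name, comps in (("A", system_a), ("B", system_b)):
    print(
        f"system {name}: mean = {mixture_mean(comps):.1f} ms, "
        f"P(response <= {threshold_ms:.0f} ms) = "
        f"{mixture_cdf(threshold_ms, comps):.4f}"
    )
```

Both hypothetical systems report the same mean of 10 ms, yet system A serves about 99% of requests within 50 ms while system B serves only about 95%. A latency target stated as a percentile therefore cannot be checked from the mean alone.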