Carl Nyströmer

Alleviating the Lack of Data in Musical Instrument Activity Detection

Abstract:

With ever-growing media and music catalogs, tools for searching and navigating this data are increasingly important. More complex search queries require metadata, but manually labeling the vast amounts of new content is infeasible. In this thesis, automatic labeling of musical instrument activities in songs is investigated, with a focus on ways to alleviate the lack of annotated data for instrument activity detection models. Two methods for alleviating the problem of small amounts of data are proposed and evaluated. Firstly, a self-supervised approach based on random mixing of different instrument stems is investigated. Secondly, a domain-adaptation approach that trains models on MIDI music for classification of real recorded music is explored. The self-supervised approach yields better results than the baseline and suggests that deep learning models can learn instrument activity detection without an intrinsic musical structure in the audio mix. The domain-adaptation models trained solely on MIDI audio performed worse than the baseline; however, using MIDI data in conjunction with real recorded music boosted performance. A hybrid model combining self-supervised learning and domain adaptation, using both MIDI music and real recorded music, produced the best results overall.
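
To illustrate the self-supervised idea described above, the following is a minimal sketch of random stem mixing for generating labeled training examples. The instrument names, gains, and synthetic noise stems are stand-in assumptions for illustration, not the thesis's actual data pipeline.

```python
import numpy as np

# Hypothetical instrument stems: in practice these would be real audio
# stems (e.g. drums, bass, guitar, vocals); here synthetic noise is used
# so the sketch runs on its own.
SAMPLE_RATE = 16000
INSTRUMENTS = ["drums", "bass", "guitar", "vocals"]
rng = np.random.default_rng(0)
stems = {name: rng.standard_normal(SAMPLE_RATE * 3).astype(np.float32)
         for name in INSTRUMENTS}

def random_mix(stems, rng):
    """Mix a random subset of stems and return the audio together with a
    multi-hot instrument-activity target."""
    # Choose how many and which instruments are active in this example.
    k = rng.integers(1, len(stems) + 1)
    active = rng.choice(list(stems), size=k, replace=False)
    # Sum the chosen stems with random gains and normalize the peak.
    mix = np.zeros_like(next(iter(stems.values())))
    for name in active:
        mix += rng.uniform(0.5, 1.0) * stems[name]
    mix /= np.max(np.abs(mix)) + 1e-8
    # Multi-hot label: 1 where the instrument is present in the mix.
    target = np.array([1.0 if name in active else 0.0 for name in stems],
                      dtype=np.float32)
    return mix, target

audio, target = random_mix(stems, rng)
print(target)  # e.g. [1. 0. 1. 0.] -> drums and guitar are active
```

Mixes generated this way have no intrinsic musical structure, yet they still provide audio-label pairs on which an activity detection model can be trained.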