Carl Nyströmer
Alleviating the Lack of Data in Musical Instrument Activity Detection
Abstract:
With ever-growing media and music catalogs, tools for searching and navigating this data are essential. More complex search queries require metadata, but manually labeling the vast amounts of new content is infeasible. This thesis investigates the automatic labeling of musical instrument activities in songs, focusing on ways to alleviate the lack of annotated data for instrument activity detection models. Two methods for mitigating the scarcity of data are proposed and evaluated. First, a self-supervised approach based on randomly mixing stems from different instruments is investigated. Second, a domain-adaptation approach that trains models on MIDI music for classification of real recorded music is explored. The self-supervised approach yields better results than the baseline and indicates that deep learning models can learn instrument activity detection without an intrinsic musical structure in the audio mix. The domain-adaptation models trained solely on MIDI audio performed worse than the baseline; however, using MIDI data in conjunction with real recorded music boosted performance. A hybrid model combining self-supervised learning and domain adaptation, trained on both MIDI music and real recorded music, produced the best results overall.