Automated Multimodal Emotion Recognition

Marcos Fernández Carbonell

Abstract

Being able to read and interpret affective states plays a significant role in human society. However, doing so is difficult in some situations, especially when information is limited to either vocal or visual cues. Many researchers have investigated the so-called basic emotions in a supervised way. This thesis presents the results of a supervised and unsupervised multimodal study of a larger, more realistic set of emotions. To that end, audio and video features are extracted from the GEMEP dataset using openSMILE and OpenFace, respectively. The supervised approach compares multiple solutions and shows that multimodal pipelines can outperform unimodal ones, even with a higher number of affective states. The unsupervised approach combines a conservative and an exploratory method to find meaningful patterns in the multimodal dataset. It also introduces a novel procedure for better understanding the output of clustering techniques.
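
For context, the bimodal feature extraction described above can look roughly as follows. This is a minimal sketch, assuming the opensmile Python wrapper and OpenFace's FeatureExtraction command-line tool are installed; the clip file names and the choice of the eGeMAPS functionals feature set are illustrative assumptions, not necessarily the exact configuration used in the thesis.

```python
import subprocess
import opensmile
import pandas as pd

# Audio: extract eGeMAPS functionals with the opensmile Python wrapper.
# (The specific feature set is an assumption for illustration.)
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
audio_features = smile.process_file("gemep_clip.wav")  # hypothetical file name

# Video: run OpenFace's FeatureExtraction tool, which writes a per-frame CSV
# with facial action units, gaze, and head pose, then load it with pandas.
subprocess.run(
    ["FeatureExtraction", "-f", "gemep_clip.mp4", "-out_dir", "processed"],
    check=True,
)
video_features = pd.read_csv("processed/gemep_clip.csv")

print(audio_features.shape, video_features.shape)
```

The resulting audio and video feature tables can then be fused for the supervised and unsupervised analyses.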

Keywords: Multimodal Machine Learning, Emotion Recognition, Supervised Learning, Unsupervised Learning