DT2140 Multimodal interfaces

Games with the purpose to acquire labels for multimodal data

One of the keys to current speech applications is machine learning of different kinds, which has allowed researchers and developers to reach further than would otherwise have been possible. But machine learning requires training material, which is usually acquired by recording and labelling speech and actions. Virtually every successful speech application, be it multimodal or speech-only, owes much of its success to the labelling and analysis of large amounts of data.

Recording data becomes easier by the day, and we now have access to ever-growing collections of text, sound and video, as well as more unusual data such as motion capture. In many cases, however, the data must still be labelled manually, which is a time-consuming and painstaking process. These days, Amazon's Mechanical Turk (1) is often used to acquire labels for training. Although Mechanical Turk results have consistently exceeded expectations (e.g. Novotney & Callison-Burch, 2010), concerns have been raised that using Mechanical Turk may be a way to exploit workers and circumvent labour laws.

Recent years have seen attempts to acquire labels at low cost and with high efficiency through so-called human computation, using games with a purpose. The terms were coined by Luis von Ahn and Laura Dabbish (von Ahn, 2006; von Ahn & Dabbish, 2008), and refer to a method in which the effort gamers spend playing games is put to some other (possibly hidden) purpose. Von Ahn and Dabbish first used the method in the ESP game (2), which was later acquired by Google and used to collect data for Google's image search.

In this project, you will build a prototype game with a purpose designed to collect labels for audio and video recordings of spoken dialogues, and to test the quality of the labels. The goal is to keep the overall game design general enough that it can be used for other tasks with relatively small effort.
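
One possible shape for the core of such a prototype is an output-agreement round in the style of the ESP game: two players label the same clip independently, and a label is kept only when both propose it. The following is a minimal sketch under that assumption, not a prescribed design. The names `normalize`, `agreed_labels` and `LabelStore`, the clip identifier and the example labels are all hypothetical, and requiring agreement in several rounds is only one simple proxy for label quality.

```python
from collections import Counter, defaultdict


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def agreed_labels(player_a, player_b):
    """Return the labels that both players proposed for the same clip."""
    a = {normalize(x) for x in player_a}
    b = {normalize(x) for x in player_b}
    return a & b


class LabelStore:
    """Accumulates agreed labels per clip across many game rounds."""

    def __init__(self):
        # clip_id -> Counter mapping each label to the number of rounds
        # in which both players agreed on it
        self.counts = defaultdict(Counter)

    def add_round(self, clip_id, player_a, player_b):
        """Record one round and return the labels the players agreed on."""
        matched = agreed_labels(player_a, player_b)
        self.counts[clip_id].update(matched)
        return matched

    def trusted(self, clip_id, min_rounds=2):
        """Labels confirmed in at least `min_rounds` independent rounds."""
        return {
            label
            for label, n in self.counts[clip_id].items()
            if n >= min_rounds
        }


# Hypothetical example: two rounds played on the same dialogue clip.
store = LabelStore()
store.add_round("clip_001", ["Laughter", "question"], ["laughter", "interruption"])
store.add_round("clip_001", ["laughter", "hesitation"], ["hesitation", "Laughter "])
print(store.trusted("clip_001"))
```

Keeping the agreement logic separate from the clip type is one way to keep the design general: the same store could hold labels for audio, video or other data, as long as each item has an identifier.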

A certain level of programming skills is required.

References
Novotney, S., & Callison-Burch, C. (2010). Cheap, fast and good enough: automatic speech recognition with non-expert transcription. In Proc. of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10).
von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose. Communications of the ACM, 51(8), 58-67.
von Ahn, L. (2006). Games with a purpose. Computer, 39(6), 92-94.

(1) Amazon's Mechanical Turk, https://www.mturk.com/
(2) The ESP Game, http://www.gwap.com/gwap/gamesPreview/espgame/

Contact: Olov Engwall

Course responsible: Olov Engwall, engwall@kth.se, 790 75 65