DT2140 Multimodal interfaces
Games with a purpose for acquiring labels for multimodal data
One of the keys to current speech applications is machine learning of different kinds, which has allowed researchers and developers to reach
further than would otherwise have been possible. But machine learning
requires training material, which is usually acquired by recording and
labelling speech and actions. Virtually every successful speech
application, be it multimodal or speech-only, owes much of its success
to the labelling and analysis of great amounts of data.
The recording of data has become easier by the day, and
today, we have access to ever-increasing collections of text, sound
and video, as well as more unusual data such as motion capture. In
many cases, however, manual labelling of the data is still necessary,
and this is a time-consuming and painstaking process.
These days, Amazon's Mechanical Turk (1) is often used to acquire labels
for training. Although Mechanical Turk results have consistently
exceeded expectations (e.g. Novotney & Callison-Burch, 2010),
concerns have been raised that using Mechanical Turk may be a way to
actively exploit workers and circumvent labour laws.
Recent years have seen the first attempts to acquire labels at low
cost and with high efficiency through so-called human computation in
games with a purpose. The terms were coined by Luis von Ahn and
Laura Dabbish (von Ahn, 2006; von Ahn & Dabbish, 2008) and refer to a
method in which the effort gamers spend playing games is harnessed
for some other (possibly hidden) purpose. Von Ahn and Dabbish first used
the method in the ESP game (2), which was later acquired by Google and
used to collect data for Google's image search.
In this project, you will build a prototype game with a purpose
designed to collect labels for audio and video recordings of spoken
dialogues, and to test the quality of the labels. The goal is to keep
the overall game design general enough that it can be adapted to other
tasks with relatively little effort.
A certain level of programming skills is required.
References
Novotney, S., & Callison-Burch, C. (2010). Cheap, fast and good
enough: automatic speech recognition with non-expert transcription. In
Proc. of Human Language Technologies: The 2010 Annual Conference of
the North American Chapter of the Association for Computational
Linguistics (HLT '10).
von Ahn, L. (2006). Games with a purpose. Computer, 92-94.
von Ahn, L., & Dabbish, L. (2008). Designing games with a purpose.
Communications of the ACM, 51(8), 58-67.
(1) Amazon's Mechanical Turk, https://www.mturk.com/
(2) The ESP Game, http://www.gwap.com/gwap/gamesPreview/espgame/
Contact: Olov Engwall
Course responsible: Olov Engwall, engwall@kth.se, 790 75 65