DT2140 Multimodal interfaces
Suggested projects
This list is non-exhaustive and will be extended. You may choose any other project topic that interests you, as long as you have had it approved by one of the teachers of the course.
Supervisors: OE=Olov Engwall, AF=Anders Friberg, RB=Roberto Bresin, JM=Jonas Moll, AL=Anders Lundstrom.
Project categories: Tangible interfaces, Augmented Reality etc. | Visual input and gestures | Sound and Music | Spoken interaction

Gestures as a control in immersive gaming (OE):
Description: Build an application using Kinect that allows you to navigate through Google Street View using gestures, i.e. turn left, turn right, walk ahead, walk backwards, zoom in, zoom out. The aim is to use this application for people who are in an immersive gaming dome (http://mediagrid.org/summit/media/2011_Boston_Summit/GeoDome_Portal.jpg). Test your application by recording one participant navigating through Google Street View in the immersive gaming dome. A minimal sketch of a gesture-to-command mapping is given after the project descriptions below. Requirements: Some programming experience (C++ or C#), high fun factor, good teamwork skills.

Real-time facial animation (OE):
Description: In this project, you will use an existing video-based facial tracking software, Face-API (Seeing Machines), to animate a 3D head in real time. The project involves streaming tracking information into a game engine and employing animation and retargeting techniques to animate a 3D head. You may optionally choose to build a head-mounted camera (i.e. the head-worn rig for the camera) to capture the facial images. Keywords: Real-time facial animation, telepresence, face tracking.

Large virtual camera (OE):
Background: By having a one-to-one correspondence between a real object and a 3D model of the same object, and knowing how they are spatially related, you can project the virtual object onto the real one. This makes it possible to alter the real object's surface representation and to add a dynamic dimension to it (cf. Rorschach's face in the film "Watchmen").
Project description: This project aims to use motion capture technology to track a physical projection screen in space and let a virtual representation of the screen be projected back onto it in real time. The virtual screen, in its turn, acts as a virtual camera in a digital scene. Put together, the tracked screen acts as a large camera viewfinder into a virtual space. Keywords: Motion capture, Mixed Reality, projection, virtual camera
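As a rough illustration of the gesture-control idea in the first project above, the sketch below maps a single Kinect skeleton frame to a navigation command. The joint names, distance thresholds, and the Joint data class are assumptions made for this example only; a real implementation would read skeleton frames from the Kinect SDK (typically in C++ or C#) and forward the resulting commands to the Street View client.

from dataclasses import dataclass

@dataclass
class Joint:
    x: float  # metres to the right of the sensor
    y: float  # metres above the floor
    z: float  # metres in front of the sensor

def classify_gesture(joints):
    """Map one skeleton frame (dict of joint name -> Joint) to a command."""
    head, spine = joints["head"], joints["spine"]
    left, right = joints["hand_left"], joints["hand_right"]
    if right.y > head.y:                 # right hand raised above the head
        return "walk_ahead"
    if left.y > head.y:                  # left hand raised above the head
        return "walk_back"
    if right.x - spine.x > 0.4:          # right hand stretched out to the side
        return "turn_right"
    if spine.x - left.x > 0.4:           # left hand stretched out to the side
        return "turn_left"
    if abs(right.x - left.x) < 0.15:     # hands brought close together
        return "zoom_in"
    return None                          # no recognised gesture

# Example frame: right hand stretched out to the side -> "turn_right"
frame = {
    "head":       Joint(0.0, 1.70, 2.0),
    "spine":      Joint(0.0, 1.20, 2.0),
    "hand_left":  Joint(-0.25, 1.00, 2.0),
    "hand_right": Joint(0.60, 1.10, 2.0),
}
print(classify_gesture(frame))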
Games with a purpose for acquiring labels for multimodal data (OE):
Task: To build a prototype "game with a purpose" designed to collect labels for audio and video recordings of spoken dialogues, and to test the quality of the labels. The goal is to keep the overall game design general enough that it can be reused for other tasks with relatively little effort.
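As a starting point for testing label quality, the sketch below computes a majority label and pairwise inter-annotator agreement for labels collected in the game. The label names and data layout are illustrative assumptions, not part of the project specification.

from collections import Counter
from itertools import combinations

# labels[clip_id] -> labels given to that clip by different players (assumed format)
labels = {
    "dialogue_01_clip_03": ["agreement", "agreement", "hesitation"],
    "dialogue_01_clip_04": ["hesitation", "hesitation", "hesitation"],
}

def pairwise_agreement(votes):
    """Fraction of player pairs that gave the same label to a clip."""
    pairs = list(combinations(votes, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

for clip, votes in labels.items():
    majority, count = Counter(votes).most_common(1)[0]
    print(f"{clip}: majority label '{majority}' "
          f"({count}/{len(votes)} votes, "
          f"pairwise agreement {pairwise_agreement(votes):.2f})")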
Head rotation in 3-party dialogue (OE):
Do people always rotate their heads towards the person they are addressing when standing at a short distance? If not, when do they?
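One way to make this question measurable is sketched below: given the speaker's head position and yaw and the positions of the two listeners (e.g. from motion capture or annotated video), compute how far the head direction deviates from the direction to each listener. The coordinate convention, the 20-degree threshold, and the example values are assumptions for illustration.

import math

def deviation_from(head_pos, head_yaw_deg, target_pos):
    """Angle (degrees) between the head's facing direction and a target.
    Positions are (x, z) ground-plane coordinates; yaw 0 means facing +z."""
    dx = target_pos[0] - head_pos[0]
    dz = target_pos[1] - head_pos[1]
    bearing = math.degrees(math.atan2(dx, dz))          # direction to the target
    diff = (bearing - head_yaw_deg + 180) % 360 - 180   # wrap to [-180, 180)
    return abs(diff)

speaker_pos, speaker_yaw = (0.0, 0.0), 25.0   # head turned 25 deg to the right
listener_a = (1.0, 2.0)                       # to the speaker's right-front
listener_b = (-1.5, 2.0)                      # to the speaker's left-front

for name, pos in [("A", listener_a), ("B", listener_b)]:
    dev = deviation_from(speaker_pos, speaker_yaw, pos)
    verdict = "towards" if dev < 20 else "not towards"
    print(f"Head is oriented {verdict} listener {name} (deviation {dev:.0f} deg)")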
Mutual gaze in 3-party dialogues (OE):
How much eye contact do people have when talking? How much do people look at each other when standing close to, or further away from, each other?
Audiovisual speech perception test with an augmented reality talking face (OE): "Is a 2D or a 3D representation best?"
Multimodal speech perception test with an augmented reality talking face (OE): "How can you read tongue movements?"

Use a freely available API to build a multimodal (or speech-only) interface (OE):
Software: Microsoft ASR & TTS, available in English Windows Vista & 7; WAMI toolkit for Javascript; Nuance Cafe for VoiceXML; CSLU toolkit, as an extension of Lab 3.

Build an app or evaluate the performance of speech recognition on iPhone or Android mobile phones (OE) (Siri, Dragon Dictation etc.)

Speech synthesis evaluation (OE): For example, how good is the "Read out loud" feature in Adobe Reader? Test this functionality of the reader. Let listeners write down what they hear; what is the word accuracy? Let listeners rate the effort and pleasantness; is the "Read out loud" function any good?
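For the word-accuracy part of the synthesis evaluation, a minimal sketch is given below: word accuracy computed as 1 minus the word error rate (WER) between the reference text and what a listener wrote down. The example sentences are invented for illustration.

def word_accuracy(reference, transcript):
    """Word-level Levenshtein distance: accuracy = 1 - (S + D + I) / N."""
    ref, hyp = reference.lower().split(), transcript.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[len(ref)][len(hyp)] / len(ref)

reference = "the quick brown fox jumps over the lazy dog"
heard     = "the quick brown fox jumped over a lazy dog"
print(f"Word accuracy: {word_accuracy(reference, heard):.0%}")  # two errors in nine words, ~78%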
Haptic interfaces

Course responsible: Olov Engwall, engwall@kth.se, 790 75 65