DT2140 Multimodal interfaces
Laboratory exercise 3: Multimodal speech interfaces

Task: Build a simple speech-driven interface, with a talking head, to the object manipulation in Processing (lab exercise 2) and the sound manipulation in pd (lab exercise 1).

Objective: To give the students basic hands-on experience with speech synthesis and speech recognition components, and to illustrate the benefit of using alternative modalities and the importance of efficient confirmation and correction management.

Software & equipment: The CSLU Toolkit, headphones and one microphone. Please bring your own headset/microphone if you have one.

Note: If you wish, you can download the CSLU Toolkit to your own computer and do (much of) the task on your own. To do the task entirely, including the communication with the other programs, you also need the Tcl package udp, Processing and pd on your computer. If you do, save your solution as a .rad file and present it at the scheduled exercise session.
Background: The human-computer interface that you create should be appropriate for e.g. a cell phone (a small screen, some limited cursor control, an inefficient keyboard for letters, but a digit keypad):
- The main interaction (input and output) is via voice.
- You should use an animated agent that speaks with either text-to-speech synthesis or time-aligned recorded prompts of your own voice (see Tutorial 13 for the latter).
- The screen presents the animated agent and the allowed replies from the user for each question.
- The digit keypad may be used in repair subdialogues (e.g. to type the amount to resize by), but should not be the main interaction input.
- The cursor may be used in repair subdialogues to make a user selection graphically.
The agent needs to check that the user has been correctly recognized and to correct misrecognitions. It is, however, inefficient (and hence not allowed in this exercise) to do the check with a separate explicit confirmation question after every single user answer.
Preparations: Before the exercise session, you should
- look at Tutorials 1-3, 5-6, 11-12 and 19, in order to get to know the Rapid Application Developer (RAD) in the toolkit.
- team up with a fellow student to do the lab with.
- (in the lab pairs) draw a flowchart, in a program of your choice (or on paper, scanning the sketch), of the components and the transitions you need for the dialogue interaction in the exercise (see below). The sketch should be submitted in Bilda before the lab. You should be able to present and discuss the flowchart with the lab assistant at the beginning of the lab session.
- decide when and how confirmation requests and repairs should be done in the dialogue.

Instructions: The exercise consists of two parts: one compulsory, where you build a simple and very constrained dialogue in which the interaction is completely system-driven, and one optional, where you create a freer, more user-controlled interaction if you have the time.

Download the following scripts and save them in V:/.rad/:
- The socket program that enables you to communicate with Processing.

Start Processing, load the files in V:/.rad/UDP and press play. Start pd and load the sound patch. Start the RAD (Rapid Application Developer) from the CSLU Toolkit folder on the Start menu.

Use the RAD components to create a dialogue where a computer-animated character asks the user for (with possible answers):
- SHAPE: The shape of the object: [rectangle], [triangle], [circle], [ellipse].

You should insert intermediate confirmation/correction checks at appropriate states in the dialogue (e.g., include a correction option in the next stage) and take the correct action to repair errors, i.e. the user should NOT have to repeat answers that have already been correctly recognized. Repair sub-dialogues may use an alternative method to request the information (see the dialogue components below).

Build the dialogue incrementally: start with a small part of the dialogue, build and test it, then add new components and, finally, repair sub-dialogues. At the end of the exercise session, show your solution to the lab assistant and discuss your choice of dialogue flow and the performance of your interface.

Dialogue components: You should use the following states in your dialogue, but you may use others as well, and you decide how each component is used. You may also make small changes to the states in order to adapt them to your dialogue scheme, as long as the complexity of the task and the use of multimodality are not reduced.
* WELCOME: The agent greets the user.
Useful components and hints:
* Note that for scale, you need to combine the answers at RESIZE and SIZE_AMOUNT, so that you send one scale value, which should be equal to $SIZE_AMOUNT if $RESIZE is "larger" and 1/$SIZE_AMOUNT if $RESIZE is "smaller". That is, $varvalue for scale may be 0.2, 0.25, 0.33, 0.5, 2, 3, 4 or 5.
* The recognition output variable from a GENERIC component (e.g. SHAPE) is referred to by the component name followed by (recog), e.g. $SHAPE(recog).
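The scale combination described in the hints above can be sketched as follows. This is an illustration only: the actual dialogue logic in RAD is written in Tcl, and the UDP port and message format used here are assumptions, not taken from the lab scripts.

```python
import socket

def combine_scale(resize, size_amount):
    """Combine the RESIZE and SIZE_AMOUNT answers into one scale value:
    $SIZE_AMOUNT for "larger", 1/$SIZE_AMOUNT for "smaller"."""
    if resize == "larger":
        return float(size_amount)
    if resize == "smaller":
        return 1.0 / float(size_amount)
    raise ValueError("unexpected RESIZE value: %r" % resize)

def send_scale(scale, host="127.0.0.1", port=6100):
    """Send the scale as a text datagram to Processing.
    Port 6100 and the "scale <value>" format are placeholders."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(("scale %g" % scale).encode("ascii"), (host, port))
    sock.close()
```

For instance, the answers "smaller" and 4 combine into the single value 0.25, which is then sent as one message rather than two.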
Optional add-ons: If you have the time, change the dialogue so that it is more user-driven, i.e., the agent asks "What do you want to do?" and, depending on the user's answer (e.g. "Make a red circle", "Change the sound", "Make it twice as big."), different actions are taken.
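One way to handle such free-form answers is a simple keyword-based interpreter that maps an utterance to an action and its parameters. The sketch below is a hypothetical illustration in Python (the vocabulary, action names and colour set are assumptions, not part of the toolkit), intended only to show the idea of branching on the user's answer:

```python
# Keyword vocabularies; the colour list is an assumed example.
SHAPES = {"rectangle", "triangle", "circle", "ellipse"}
COLOURS = {"red", "green", "blue", "yellow"}

def interpret(utterance):
    """Map a recognized command to (action, parameters); (None, {}) if unknown."""
    words = utterance.lower().replace(".", "").split()
    params = {}
    for w in words:
        if w in SHAPES:
            params["shape"] = w
        elif w in COLOURS:
            params["colour"] = w
    if "shape" in params:
        return "draw", params
    if "sound" in words:
        return "change_sound", params
    if "twice" in words and "big" in words:
        return "resize", {"scale": 2.0}
    return None, {}
```

For example, "Make a red circle" would map to a draw action with shape "circle" and colour "red", while "Make it twice as big." maps to a resize action with scale 2.0.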
Create multimodal repair components:
Requirements:

Course responsible: Olov Engwall, engwall@kth.se, 790 75 65