How Automatic Speech Recognition (ASR) and Speech Synthesis (or Text-To-Speech, TTS) work, and why these are such computationally difficult problems
Where are ASR and TTS used in current commercial interactive applications
What are the usability issues surrounding speech-based interaction systems, particularly in mobile and pervasive computing
What are the challenges in enabling speech as a modality for mobile interaction
What is the current state of the art in ASR and TTS research
What are the differences between the commercial ASR systems' accuracy claims and the needs of mobile interactive applications
What are the difficulties in evaluating the quality of TTS systems, particularly from a usability and user perspective
What opportunities exist for HCI researchers in terms of enhancing systems' interactivity by enabling speech
How do current heuristic guidelines apply to voice interfaces, and how are these influenced by engineering limitations
New for 2021 is a theoretical and practical review of the most recent research on developing design guidelines for conversational user interfaces, which we contextualize in terms of the engineering capabilities of the underlying speech processing systems. Hands-on activities specific to this topic will also be carried out, with participants invited to conduct usability walk-throughs of a readily available system (e.g. Alexa, Google Assistant, Siri), guided by heuristic guidelines.
The course includes three interactive, hands-on activities. In the first activity, participants will propose design alternatives for the error-handling interaction of a smartphone's voice-based search assistant, based on an empirical assessment of the types of ASR errors exhibited (e.g. acoustic, language, semantic). In the second activity, participants will evaluate the quality of the synthetic speech output typically employed in mobile speech interfaces, and propose alternative evaluation methods that better reflect the mobile user experience. NEW ACTIVITY: The third activity will center on uncovering the speech processing errors of a home-based personal assistant and designing interactions that maintain a positive user experience in the face of unexpected variations in speech processing accuracy.