HCI research has long been dedicated to facilitating information transfer between humans and machines in better and more natural ways. Unfortunately, humans' most natural form of communication, speech, is also one of the most difficult modalities for machines to understand, despite, and perhaps because, it is the highest-bandwidth communication channel we possess. While significant research effort, from engineering to linguistics to the cognitive sciences, has been devoted to improving machines' ability to understand speech, the CHI community (and the HCI field at large) has only recently embraced this modality as a central focus of research. This can be attributed in part to the unexpected variations in error rates when processing speech, in contrast with often-unfounded claims of success from industry, but also to the intrinsic difficulty of designing, and especially evaluating, speech and natural language interfaces. As such, the development of interactive speech-based systems is driven mostly by engineering efforts to improve such systems with respect to largely arbitrary performance metrics. These developments have often been devoid of user-centred design principles or of consideration for usability and usefulness, in contrast to the ways graphical user interfaces have benefited from heuristic design guidelines.
Please join us at CHI 2021 if you want to learn how speech recognition and synthesis work, what their limitations and usability challenges are, how they can enhance interaction paradigms, and what the current research and commercial state of the art is.
The course will benefit HCI researchers and practitioners who lack strong expertise in automatic speech recognition (ASR) or text-to-speech synthesis (TTS), who still believe in fulfilling HCI's goal of developing methods and systems that allow humans to interact naturally with increasingly ubiquitous mobile technology, but who are disappointed with the lack of success in using speech and natural language to achieve this goal. In past CHI instances of this course, many attendees came from the speech processing community, especially from relevant industries, attending the course (and CHI) out of interest in learning how to incorporate their own engineering advances into better user-centred designs.
Through this course, we hope that HCI researchers and practitioners will learn how to combine recent advances in speech processing with user-centred principles to design more usable and useful speech-based interactive systems.