Designing Speech and Language Interactions for Mobiles and Wearables

Goals

Our workshop aims to develop speech, audio, and multimodal interaction into a well-established area of study within HCI, leveraging current engineering advances in ASR, NLP, TTS, multimodal and gesture recognition, and brain-computer interfaces (BCI). In return, advances in HCI can contribute to processing algorithms that are informed by, and better address, the usability challenges of such interfaces.

We also aim to increase the cohesion of research currently dispersed across many areas, including HCI, wearable and mobile design, ASR, NLP, BCI as a complement to speech, EMG interaction, eye-gaze input, and gesture processing. Our hope is to energize the HCI and engineering communities to push the boundaries of what is possible with wearable, mobile, social-robot, smart-home, smart-assistant, and pervasive computing, while also making advances within each respective community. For example, recent significant breakthroughs in deep neural networks have largely been confined to audio-only features, yet there is a significant opportunity to incorporate other features and context into this framework (such as multimodal input from wearables). We anticipate that this can only be accomplished through closer collaboration between the speech and HCI communities.

Our ultimate goal is to cross-pollinate ideas across the activities and priorities of different disciplines. With its unique format and reach, this workshop offers the opportunity to strengthen future approaches and unify practices moving forward. The HCI community at large can host researchers from other disciplines with the goal of advancing multimodal interaction design for wearable, mobile, and pervasive computing. The organizing committee for this workshop (the author list) is living proof that such interdisciplinary collaboration is possible.

Topics

We propose to build upon the discussions started during our lively and highly engaging panel on speech interaction held at CHI 2013, which was followed by successful workshops (20+ participants each) on speech and language interaction at CHI 2014, 2016, and 2017. We aim to broaden the scope of these community-building activities to all types of interfaces for which voice interaction may be suitable: desktops, mobiles, wearables, personal assistant robots, and smart home devices. We propose several topics for discussion and activity:

  • What are the important challenges in using speech as a “mainstream” modality? Speech input is increasingly available – can we characterize the applications for which speech is suitable or beneficial?

  • What interaction opportunities are presented by the rapidly evolving mobile and wearable research areas?

  • Can speech and multimodal interaction increase the usability and robustness of interfaces and improve user experience beyond input/output?

  • What can the MobileHCI community learn from Automatic Speech Recognition (ASR), Text-to-Speech Synthesis (TTS), and Natural Language Processing (NLP) research, and in turn, how can it help these communities improve the user acceptance of such technologies? How can work in context and discourse understanding or dialogue management shape research in speech and multimodal UIs? And can we bridge the divide between the evaluation methods used in HCI and the AI-style batch evaluations used in speech processing?

  • How can UI designers make better use of acoustic-prosodic information in speech, for example to recognize emotion or identify users' cognitive states? How can this be translated into the design of empathic voice interfaces?

  • What are the usability challenges of synthetic speech? How can expressiveness and naturalness be incorporated into interface design guidelines, particularly in mobile or wearable contexts where text-to-speech could play a significant role in users' experiences? And how can this be generalized to designing usable UIs for mobile and pervasive (in-car, in-home) applications that rely on multimedia response generation?

  • What are the opportunities and challenges for speech and multimodal interaction with regard to the spontaneous access to information afforded by wearable and mobile devices? And what are the privacy concerns, e.g., around using such modalities in a secure and personal manner?

  • Are there particular challenges in interacting with emerging devices such as smart-home devices, ambient personal assistants, or conversational interfaces?

  • What are the implications for the design of speech and multimodal interaction presented by new contexts for wearable use, including hands-busy or cognitively demanding situations, and unintentional use (as in the case of body-worn sensors)?