Incremental Acquisition and Reuse of Multimodal Affective Behaviors in a Conversational Agent

 

We explore a way to elicit and evaluate affective behavior using crowdsourcing. We show that untrained crowd workers are able to author content for a broad variety of target affect states when given semi-situated narratives as prompts.

December 16, 2018
International Conference on Human-Agent Interaction 2018

 

Authors

Maike Paetzel (Disney Research/Uppsala University)

James Kennedy (Disney Research)

Ginevra Castellano (Uppsala University)

Jill Lehman (Disney Research)


Abstract

To feel novel and engaging over time, an autonomous agent needs a large corpus of potential responses. As the size and multi-domain nature of the corpus grows, however, traditional hand-authoring of dialogue content is no longer practical. While crowdsourcing can help to overcome the problem of scale, a diverse set of authors contributing independently to an agent’s language can also introduce inconsistencies in expressed behavior. In terms of affect or mood, for example, incremental authoring can result in an agent who reacts calmly at one moment but impatiently moments later, with no clear reason for the transition. In contrast, affect in natural conversation develops over time based on both the agent’s personality and contextual triggers. To better achieve this dynamic, an autonomous agent needs to (a) have content and behavior available for different desired affective states and (b) be able to predict what affective state will be perceived by a person for a given behavior. In this proof-of-concept paper, we explore a way to elicit and evaluate affective behavior using crowdsourcing. We show that untrained crowd workers are able to author content for a broad variety of target affect states when given semi-situated narratives as prompts. We also demonstrate that it is possible to strategically combine multimodal affective behavior and voice content from the authored pieces using a predictive model of how the expressed behavior will be perceived.
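To illustrate the selection step described above, the sketch below shows one plausible way an agent could pick a crowd-authored voice/behavior pairing whose predicted perceived affect is closest to a desired affective state. This is a minimal illustration, not the paper's implementation: the names (`ContentPiece`, `select_behavior`, `predict_perceived_affect`), the valence/arousal encoding of affect, and the toy predictor are all assumptions made for the example.

```python
# Hypothetical sketch (not the paper's implementation): given crowd-authored
# content items, choose the voice content plus multimodal behavior whose
# *predicted perceived* affect best matches the affect the agent wants to
# express. Valence/arousal coordinates and all names here are illustrative.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ContentPiece:
    text: str                            # crowd-authored utterance
    behavior: str                        # associated multimodal behavior label
    target_affect: Tuple[float, float]   # authored-for (valence, arousal)


def select_behavior(
    pieces: List[ContentPiece],
    desired_affect: Tuple[float, float],
    predict_perceived_affect: Callable[[ContentPiece], Tuple[float, float]],
) -> ContentPiece:
    """Return the piece whose predicted perceived affect is nearest
    (Euclidean distance) to the desired affective state."""
    def distance(piece: ContentPiece) -> float:
        pv, pa = predict_perceived_affect(piece)
        dv, da = desired_affect
        return ((pv - dv) ** 2 + (pa - da) ** 2) ** 0.5

    return min(pieces, key=distance)


if __name__ == "__main__":
    # Toy predictor standing in for a model trained on crowd perception ratings.
    def toy_predictor(piece: ContentPiece) -> Tuple[float, float]:
        return piece.target_affect

    corpus = [
        ContentPiece("Oh wow, that's wonderful!", "big_smile", (0.8, 0.7)),
        ContentPiece("Hm, I suppose that's fine.", "slow_nod", (0.1, -0.3)),
    ]
    chosen = select_behavior(corpus, desired_affect=(0.7, 0.6),
                             predict_perceived_affect=toy_predictor)
    print(chosen.text, chosen.behavior)
```

In practice, the predictor would be learned from crowd ratings of how authored behaviors are perceived, rather than the pass-through stand-in used here.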

Copyright Notice