Title: Wired for Speech
1 Wired for Speech
- How Voice Activates and Advances the Human-Computer Relationship
- Clifford Nass
- Stanford University
2 Speaking is Fundamental
- Fundamental means of human communication
- Everyone speaks
- IQs as low as 50
- Brains as small as 400 grams
- Humans are built for words
- Learn a new word every two hours for 11 years
3 Listening to Speech is Fundamental
- Womb: Mother's voice differentiation
- One day old: Differentiate speech vs. other sounds
- Responses
- Brain hemispheres
- Four days old: Differentiate native language vs. other languages
- Adults
- Phoneme differentiation at 40-50 phonemes per second
- Cope with cocktail parties
4 Listening Beyond Speech is Fundamental
- Humans are acutely aware of para-linguistic cues
- Gender
- Personality
- Accent
- Emotion
- Identity
5 Humans are Wired for Speech
- Special parts of the brain devoted to
- Speech recognition
- Speech production
- Para-linguistic processing
- Voice recognition and discrimination
6 Therefore
- Voice interface should be the most
- Enjoyable,
- Efficient,
- Memorable
- method for providing and acquiring information
7 Are They? No! Why Not?
- Machines are different than humans
- Technology is insufficient
- But are these good reasons?
8 Critical Insights
- Voice = Human
- Technology Voice = Human Voice
- Human-Technology Interaction = Human-Human Interaction
9 Where's the Leverage?
- Social sciences can give us
- What's important
- What's unimportant
- Understanding
- Methods
- Unanswered questions
10 Male or Female Voice?
- Is gender important?
- Can technology have gender?
11 The Case of BMW
12 Brains are Built to Detect Voice Gender
- First human category
- Infants at six months
- Self-identification by 2-3 years old
- Within seconds for adults
- Multiple ways to recognize gender in voice
- Pitch
- Pitch range
- Variety of other spectral characteristics
13 Once a Person Identifies Gender by Voice
- Guides every interaction
- Same-gender favoritism
- Trust
- Comfort
- Gender stereotyping
14 Gender and Products
- Gender should match product
- More appropriate
- More credible
- Mutual influence of voice and product gender
- Female voices feminize products (and conversely)
- Female products feminize voices (and conversely)
- Match principle
15 Research Context
- Gender of voice (synthetic)
- Gender of user
- Gender of product
- E-Commerce website
16 Examples of Advertisements
- Female voice, female product
- Male voice, female product
- Male voice, male product
17 Appropriateness of the Voice
18 Voice/Product Gender Influences
- Female voices feminize products; male voices masculinize products
- Strongest for opposite-gender products
- Female products feminize voices; male products masculinize voices
- Strong preference when voice matches product
19 Results for User Gender
- People trust voices that match themselves
- Females conform more with female voices
- Males conform more with male voices
- People like voices that match themselves
- Females like the female voice more
- Males like the male voice more
20 Other Results
- Participants denied stereotyping technology
- Participants denied harboring stereotypes!
21 People stereotype voices by gender
- Voice gender should match content gender
- Product descriptions
- Teaching
- Praise
- Jokes
22 Gender is Marked by Word Choice
- Female speech
- More "I, you, she, her, their, myself"
- Less "the, that, these, one, two, some more"
- More compliments
- More apologies
- More relationships between things
- Less description of particular things
- "They" for living things only
- Voices should speak consistently with their gender
23 Selecting Voices
- Voices manifest many traits
- Gender
- Personality
- Age
- Ethnicity
- Voice traits should match content traits
- Content
- Language style
- Appearance (e.g., accent and race)
- Context
- Voice traits should match user traits
24 If Only One Voice
- Consider stereotypes
- Masculine vs. feminine (same voice)
- Boost high frequencies (feminine)
- Boost low frequencies (masculine)
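A minimal sketch of the frequency-boost idea on this slide, assuming a mono signal held as a list of float samples. The slides do not say how the boost is applied; the one-pole band split and the names `one_pole_lowpass`, `tilt`, `gain_low`, `gain_high`, and `alpha` are illustrative assumptions, not the method used in the research.

```python
def one_pole_lowpass(samples, alpha=0.1):
    """Smooth the signal so that mostly low-frequency content remains."""
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move a fraction alpha toward the new sample
        out.append(y)
    return out


def tilt(samples, gain_low=1.0, gain_high=1.0, alpha=0.1):
    """Split into a low band and a high band, scale each, and recombine."""
    low = one_pole_lowpass(samples, alpha)
    return [gain_low * l + gain_high * (x - l) for x, l in zip(samples, low)]


# Hypothetical usage: the same voice pushed toward either impression.
voice = [0.0, 0.5, -0.3, 0.8, -0.6, 0.2]
more_feminine = tilt(voice, gain_high=2.0)   # boost high frequencies
more_masculine = tilt(voice, gain_low=2.0)   # boost low frequencies
```

With both gains left at 1.0 the signal passes through unchanged, so the two gains act as a simple tone control on a single recorded or synthetic voice.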
25 Emotions
26 Emotion and Voice
- Voice is the first indicator of emotion
- Voice emotion has many markers
- Pitch
- Value
- Range
- Change rate
- Amplitude
- Value
- Range
- Change rate
- Words per minute
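The markers listed on this slide can be sketched as simple summary statistics. This assumes a pitch or amplitude track has already been extracted as one number per fixed-length frame; the function names, the `frame_seconds` parameter, and the choice of mean absolute change per second as the "change rate" are illustrative assumptions rather than definitions taken from the slides.

```python
def summarize_track(values, frame_seconds):
    """Return value, range, and change rate for a pitch or amplitude track."""
    if not values:
        raise ValueError("track must contain at least one frame")
    deltas = [abs(b - a) for a, b in zip(values, values[1:])]
    change_rate = sum(deltas) / (len(deltas) * frame_seconds) if deltas else 0.0
    return {
        "value": sum(values) / len(values),   # average level
        "range": max(values) - min(values),   # spread between extremes
        "change_rate": change_rate,           # mean absolute change per second
    }


def words_per_minute(transcript, duration_seconds):
    """Speaking rate from a transcript and the time taken to say it."""
    return len(transcript.split()) * 60.0 / duration_seconds


# Hypothetical pitch track in Hz, one value every 0.5 seconds.
pitch_markers = summarize_track([100.0, 110.0, 90.0, 100.0], frame_seconds=0.5)
rate = words_per_minute("turn left at the next light", duration_seconds=3.0)
```

The same `summarize_track` function serves both pitch and amplitude, since the slide lists the same three markers for each.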
27 Emotion is always relevant
- User has initial emotion
- Interactions create emotions
- Voice is particularly powerful
- Frustration is particularly powerful
28 Emotion and Technology
- Could technology-based voices exhibit emotion?
- Could technology-based voice emotion influence people?
29 Research Context
- Create upset or happy drivers
- Have them drive for 15 minutes
- Female voice gives information and makes suggestions
- Upbeat
- Subdued
30 Number of Accidents
31 Results
- People speak to car much more when emotion is consistent
- People like car much more when emotion is consistent
32 Implications
- User emotion is a critical part of any interaction
- Emotion must match content
- Perception of voice
- Trust
- Intelligence
- User
- Performance
- Comfort
- Enjoyment
33 One Voice Emotion: Select for Goal
- Overall liking
- Slightly happy voice
- Attention-getting
- Anger
- Sadness
- Trust and vulnerability
- Sadness (mild)
34 If You Can't Manipulate Voice Emotion
- Manipulate content
- Manipulate music
35 Using the First Person: Should IT say "I"?
36 Should Voice Interfaces say "I"?
- When should a voice interface say "I"?
- Does synthetic vs. recorded speech affect the answer to the previous question?
37 The Importance of "I"
- "I" is the most basic claim to humanity
- "I think, therefore I am"
- "I, Robot"
- Dobby and monsters don't say "I"
- "I" is the marker of responsibility
- "I made a mistake" vs. "Mistakes were made"
38 Research Context
- Auction site
- Telephone interface with speech recognition
- Recorded bidding behavior
- Online questionnaire
39 Average Bidding Price
40 Results
- When "I" is paired with recorded speech, or no "I" with synthetic speech
- System is higher quality
- Users were much more relaxed
- No "I" is more objective
- "I" is more present
41 Results
- "I" is right for embodiments
- Robots
- Characters
- Autonomous intelligence (KITT)
- "I" is wrong when voice is second fiddle to technology
- Traditional car
- Heavily-branded products
42 Design
- Text-to-Speech is a machine voice
- Recorded speech is a human voice
- Design questions are
- Not philosophical questions
- Not judgment questions
- Experimentally verifiable
43 Mistakes are Tough to Talk About
44 Who is Responsible for Errors?
- Recognition is not perfect
- When system fails, who should be assigned responsibility?
- System
- User
- No one
45 Responding to Errors
- Modesty
- Likable
- Unintelligent (people believe modesty!)
- Criticism
- Isn't really constructive
- Unpleasant
- Intelligent
- Scapegoating
- Effective
- Safe
46 System Responses to Errors
- System blame (most common)
- No blame
- User blame
47 Research context
- Amazon-by-phone
- Numerous planned interaction errors
48 Book Buying
49 Results
- Neutral and system blame
- Sell much better than user blame
- Neutral blame
- Easier to use than system blame
- Nicer than system blame
- User blame is most intelligent!
- System blame is least intelligent
50 Results for Errors
- Take responsibility when unavoidable
- Increases trust
- Increases liking
- Weak negative effect on intelligence
- Ignore errors whenever possible
- Duck responsibility to third party if needed
- Blame the phone line
- Blame the road
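One way to read the guidance on this slide is as a priority order: ignore the error if the dialogue can continue, otherwise shift blame to a plausible third party, and take responsibility only when neither is possible. The sketch below encodes that reading. The ordering, the function name `choose_error_response`, its parameters, and the prompt wording are all hypothetical, not taken from the study.

```python
def choose_error_response(can_ignore, third_party=None):
    """Pick a spoken response to a recognition error.

    can_ignore:  True if the dialogue can continue without mentioning the error.
    third_party: something plausible to blame (e.g. "the phone line"), or None.
    Returns the prompt text, or None when nothing should be said.
    """
    if can_ignore:
        return None  # ignore errors whenever possible
    if third_party is not None:
        # duck responsibility to a third party
        return f"There seems to be a problem with {third_party}. Could you repeat that?"
    # take responsibility only when it is unavoidable
    return "I didn't catch that. Could you repeat it?"


# Hypothetical usage for a telephone interface.
prompt = choose_error_response(can_ignore=False, third_party="the phone line")
```

Note that none of the branches blames the user, consistent with the finding that user blame sold worst.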
51 Results for Errors
- Show commitment to the interaction
- Make guesses
- Show concern
- Gricean maxims
- Quantity
- Relevance
- Clarity
52 Design
- Error recovery is critically important
- Negative experiences are more memorable
- Adaptation is crucially important
- Flattery is effective
- Note times when interaction is successful
- Design to avoid errors
- Alignment (good repetition)
- Air quotes
- Scripting is important at all stages of the interaction
53 Other Key Findings
- Personality
- Accents
- Multiple voices and mixing voices
- Input vs. output modality
- Microphone type
54 Tying it All Together
- Voice interfaces can be the most enjoyable, efficient, and memorable method for acquiring and providing information
- Voice interfaces turn up the volume knob in user responses
- The key is leveraging social aspects of speech
55 Summary Part 1
- Humans are wired for speech
- Interactions with voice interfaces are fundamentally social
- Same social rules
- Same social expectations
56 Summary Part 2
- Social aspects of voice interfaces can be beneficial
- Users perform better
- Users feel better
- Users understand better
- Social aspects of voice interfaces cannot be ignored
- Social audit is critical
- Social design is critical
- Design psychology can be leveraged
- Less expensive than technology
- More effective than technology
- Broader impact than technology