Wired for Speech

How Voice Activates and Advances the Human-Computer Relationship
Clifford Nass, Stanford University

Slides: 57

Transcript and Presenter's Notes


1
  • Wired for Speech
  • How Voice Activates and Advances the
    Human-Computer Relationship
  • Clifford Nass
  • Stanford University

2
Speaking is Fundamental
  • Fundamental means of human communication
  • Everyone speaks
  • IQs as low as 50
  • Brains as small as 400 grams
  • Humans are built for words
  • Learn new word every two hours for 11 years

3
Listening to Speech is Fundamental
  • Womb: Mother's voice differentiation
  • One day old: Differentiate speech vs. other
    sounds
  • Responses
  • Brain hemispheres
  • Four days old: Differentiate native language vs.
    other languages
  • Adults
  • Phoneme differentiation at 40-50 phonemes per
    second
  • Cope with cocktail parties

4
Listening Beyond Speech is Fundamental
  • Humans are acutely aware of para-linguistic cues
  • Gender
  • Personality
  • Accent
  • Emotion
  • Identity

5
Humans are Wired for Speech
  • Special parts of the brain devoted to
  • Speech recognition
  • Speech production
  • Para-linguistic processing
  • Voice recognition and discrimination

6
Therefore
  • Voice interface should be the most
  • Enjoyable,
  • Efficient,
  • Memorable
  • method for providing and acquiring information

7
Are They? No! Why Not?
  • Machines are different than humans
  • Technology is insufficient
  • But are these good reasons?

8
Critical Insights
  • Voice = Human
  • Technology Voice = Human Voice
  • Human-Technology Interaction =
    Human-Human Interaction

9
Where's the Leverage?
  • Social sciences can give us
  • What's important
  • What's unimportant
  • Understanding
  • Methods
  • Unanswered questions

10
Male or Female Voice?
  • Is gender important?
  • Can technology have gender?

11
The Case of BMW

12
Brains are Built to Detect Voice Gender
  • First human category
  • Infants at six months
  • Self-identification by 2-3 years old
  • Within seconds for adults
  • Multiple ways to recognize gender in voice
  • Pitch
  • Pitch range
  • Variety of other spectral characteristics

13
Once Person Identifies Gender by Voice
  • Guides every interaction
  • Same-gender favoritism
  • Trust
  • Comfort
  • Gender stereotyping

14
Gender and Products
  • Gender should match product
  • More appropriate
  • More credible
  • Mutual influence of voice and product gender
  • Female voices feminize products (and conversely)
  • Female products feminize voices (and conversely)
  • Match principle

15
Research Context
  • Gender of voice (synthetic)
  • Gender of user
  • Gender of product
  • E-Commerce website

16
Examples of Advertisements
  • Female voice female product
  • Male voice female product
  • Male voice male product

17
Appropriateness of the Voice

18
Voice/Product Gender Influences
  • Female voices feminize products
  • Male voices masculinize products
  • Strongest for opposite gender products
  • Female products feminize voices
  • Male products masculinize voices
  • Strong preference when voice matches product

19
Results for User Gender
  • People trust voices that match themselves
  • Females conform more with female voices
  • Males conform more with male voices
  • People like voices that match themselves
  • Females like the female voice more
  • Males like the male voice more

20
Other Results
  • Participants denied stereotyping technology
  • Participants denied harboring stereotypes!

21
People stereotype voices by gender
  • Voice gender should match content gender
  • Product descriptions
  • Teaching
  • Praise
  • Jokes

22
Gender is Marked by Word Choice
  • Female speech
  • More "I", "you", "she", "her", "their", "myself"
  • Less "the", "that", "these", "one", "two", "some",
    "more"
  • More compliments
  • More apologies
  • More relationships between things
  • Less description of particular things
  • "They" for living things only
  • Voices should speak consistently with their
    gender

23
Selecting Voices
  • Voices manifest many traits
  • Gender
  • Personality
  • Age
  • Ethnicity
  • Voice traits should match content traits
  • Content
  • Language style
  • Appearance (e.g., accent and race)
  • Context
  • Voice traits should match user traits

24
If Only One Voice
  • Consider stereotypes
  • Masculine vs. feminine (same voice)
  • Boost high frequencies (feminine)
  • Boost low frequencies (masculine)
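
The single-voice adjustment on this slide can be sketched as a first-order spectral tilt. This is only an illustration, not the method used in the talk: the function name spectral_tilt, the amount parameter, and the smoothing constant are assumptions made for the sketch, and a production system would more likely use a proper shelving equalizer.

```python
def spectral_tilt(samples, amount, smoothing=0.2):
    """Tilt the spectrum of a mono signal (hypothetical helper).

    amount > 0 boosts high frequencies by adding the sample-to-sample
    difference; amount < 0 boosts low frequencies by adding a one-pole
    low-pass copy of the signal. amount must lie in [-1, 1].
    """
    if not -1.0 <= amount <= 1.0:
        raise ValueError("amount must be between -1 and 1")
    if not samples:
        return []

    out = []
    prev = samples[0]  # previous input sample, used for the difference
    low = samples[0]   # running low-pass estimate of the signal
    for x in samples:
        if amount >= 0:
            # Differences are large for fast changes, so highs are boosted.
            y = x + amount * (x - prev)
        else:
            # The smoothed copy keeps slow changes, so lows are boosted.
            low += smoothing * (x - low)
            y = x + (-amount) * low
        out.append(y)
        prev = x
    return out
```

A positive amount leaves a constant signal untouched and amplifies a rapidly alternating one, which is the "boost high frequencies" direction; a negative amount does the opposite.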

25
Emotions

26
Emotion and Voice
  • Voice is the first indicator of emotion
  • Voice emotion has many markers
  • Pitch: value, range, change rate
  • Amplitude: value, range, change rate
  • Words per minute
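
The markers listed on this slide can be computed from frame-level measurements. The sketch below assumes a pitch or amplitude track sampled at a fixed frame interval; the names TrackStats, track_stats, and words_per_minute are hypothetical, and "change rate" is interpreted here as mean absolute change per second, which is one plausible reading rather than a definition from the talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TrackStats:
    value: float        # mean level of the track
    span: float         # range: maximum minus minimum
    change_rate: float  # mean absolute change per second


def track_stats(values, frame_seconds):
    """Summarize a pitch (Hz) or amplitude track sampled every frame_seconds."""
    if len(values) < 2:
        raise ValueError("need at least two frames")
    # Absolute change between each pair of neighboring frames.
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return TrackStats(
        value=mean(values),
        span=max(values) - min(values),
        change_rate=mean(steps) / frame_seconds,
    )


def words_per_minute(word_count, duration_seconds):
    """Speaking rate for an utterance of known length."""
    return 60.0 * word_count / duration_seconds
```

The same track_stats call serves both pitch and amplitude, since the slide lists identical markers (value, range, change rate) for each.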

27
Emotion is always relevant
  • User has initial emotion
  • Interactions create emotions
  • Voice is particularly powerful
  • Frustration is particularly powerful

28
Emotion and Technology
  • Could technology-based voices exhibit emotion?
  • Could technology-based voice emotion influence
    people?

29
Research Context
  • Create upset or happy drivers
  • Have them drive for 15 minutes
  • Female voice gives information and makes
    suggestions
  • Upbeat
  • Subdued

30
Number of Accidents

31
Results
  • People speak to car much more when emotion is
    consistent
  • People like car much more when emotion is
    consistent

32
Implications
  • User emotion is a critical part of any
    interaction
  • Emotion must match content
  • Perception of voice
  • Trust
  • Intelligence
  • User
  • Performance
  • Comfort
  • Enjoyment

33
One Voice Emotion: Select for Goal
  • Overall liking
  • Slightly happy voice
  • Attention-getting
  • Anger
  • Sadness
  • Trust and vulnerability
  • Sadness (mild)

34
If You Can't Manipulate Voice Emotion
  • Manipulate content
  • Manipulate music

35
Using the First Person: Should IT Say "I"?

36
Should Voice Interfaces Say "I"?
  • When should a voice interface say "I"?
  • Does synthetic vs. recorded speech affect the
    answer to the previous question?

37
The Importance of "I"
  • "I" is the most basic claim to humanity
  • "I think, therefore I am"
  • "I, Robot"
  • Dobby and monsters don't say "I"
  • "I" is the marker of responsibility
  • "I made a mistake" vs. "Mistakes were made"

38
Research Context
  • Auction site
  • Telephone interface with speech recognition
  • Recorded bidding behavior
  • Online questionnaire

39
Average Bidding Price

40
Results
  • When "I" + Recorded or No "I" + Synthetic
  • System is higher quality
  • Users were much more relaxed
  • No "I" is more objective
  • "I" is more present

41
Results
  • "I" is right for embodiments
  • Robots
  • Characters
  • Autonomous intelligence (KITT)
  • "I" is wrong when voice is second fiddle to
    technology
  • Traditional car
  • Heavily-branded products

42
Design
  • Text-to-Speech is a machine voice
  • Recorded speech is a human voice
  • Design questions are
  • Not philosophical questions
  • Not judgment questions
  • Experimentally verifiable

43
Mistakes are Tough to Talk About

44
Who is Responsible for Errors?
  • Recognition is not perfect
  • When system fails, who should be assigned
    responsibility?
  • System
  • User
  • No one

45
Responding to Errors
  • Modesty: likable, unintelligent (people believe
    modesty!)
  • Criticism: isn't really constructive,
    unpleasant, intelligent
  • Scapegoating: effective, safe

46
System Responses to Errors
  • System blame (most common)
  • No blame
  • User blame

47
Research Context
  • Amazon-by-phone
  • Numerous planned interaction errors

48
Book Buying

49
Results
  • Neutral and system blame
  • Sell much better than user blame
  • Neutral blame
  • Easier to use than system blame
  • Nicer than system blame
  • User blame is most intelligent!
  • System blame is least intelligent

50
Results for Errors
  • Take responsibility when unavoidable
  • Increases trust
  • Increases liking
  • Weak negative effect on intelligence
  • Ignore errors whenever possible
  • Duck responsibility to third party if needed
  • Blame the phone line
  • Blame the road
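
The guidance on this slide can be read as a small decision rule for a dialog system. The sketch below is one possible encoding, not the study's implementation: the ErrorStrategy enum, the choose_strategy function, its two flags, and all prompt wording are hypothetical and written only for illustration.

```python
from enum import Enum


class ErrorStrategy(Enum):
    IGNORE = "ignore"
    SYSTEM_BLAME = "system_blame"
    THIRD_PARTY_BLAME = "third_party_blame"


# Hypothetical prompt wording; the slides do not give the actual scripts.
PROMPTS = {
    ErrorStrategy.IGNORE: "Which title would you like?",
    ErrorStrategy.SYSTEM_BLAME:
        "Sorry, I did not catch that. Which title would you like?",
    ErrorStrategy.THIRD_PARTY_BLAME:
        "There seems to be noise on the line. Which title would you like?",
}


def choose_strategy(can_ignore, responsibility_unavoidable):
    """Pick an error response following the slide's three rules.

    Ignore the error when possible, accept blame when it cannot be
    avoided, and otherwise shift blame to a third party. User blame is
    deliberately never returned.
    """
    if can_ignore:
        return ErrorStrategy.IGNORE
    if responsibility_unavoidable:
        return ErrorStrategy.SYSTEM_BLAME
    return ErrorStrategy.THIRD_PARTY_BLAME
```

For example, PROMPTS[choose_strategy(False, False)] yields the prompt that blames the phone line, matching the "duck responsibility to third party" rule.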

51
Results for Errors
  • Show commitment to the interaction
  • Make guesses
  • Show concern
  • Gricean maxims
  • Quantity
  • Relevance
  • Clarity

52
Design
  • Error recovery is critically important
  • Negative experiences are more memorable
  • Adaptation is crucially important
  • Flattery is effective
  • Note times when interaction is successful
  • Design to avoid errors
  • Alignment (good repetition)
  • Air quotes
  • Scripting is important at all stages of the
    interaction

53
Other Key Findings
  • Personality
  • Accents
  • Multiple voices and mixing voices
  • Input vs. output modality
  • Microphone type

54
Tying it All Together
  • Voice interfaces can be the most enjoyable,
    efficient, and memorable method for acquiring and
    providing information
  • Voice interfaces turn up the volume knob in user
    responses
  • The key is leveraging social aspects of speech

55
Summary Part 1
  • Humans are wired for speech
  • Interactions with voice interfaces are
    fundamentally social
  • Same social rules
  • Same social expectations

56
Summary Part 2
  • Social aspects of voice interfaces can be
    beneficial
  • Users perform better
  • Users feel better
  • Users understand better
  • Social aspects of voice interfaces cannot be
    ignored
  • Social audit is critical
  • Social design is critical
  • Design psychology can be leveraged
  • Less expensive than technology
  • More effective than technology
  • Broader impact than technology