Speech User Interfaces - PowerPoint PPT Presentation


1
Speech User Interfaces
  • CS 160, Fall 2000
  • Professor James Landay
  • October 20, 2000

2
Hall of Fame or Hall of Shame?
  • frys.com
  • Courtesy of Billy Chen

3
Hall of Shame
  • Does not follow the blue-links pattern
  • Navigation separate from content
  • no links on the right
  • Why is this about Fry's ISP?
  • I'm looking for a store!

5
Outline
  • Review
  • Motivation for speech UIs
  • Speech recognition
  • UI problems with speech UIs
  • SpeechActs Guidelines for speech UIs
  • Announcements
  • Speech UI design tools
  • Multimodal UIs

6
Review
  • GOMS
      • doesn't tell you everything you want to know about a UI
      • only gives performance for expert behavior
      • hard to create a model, but still easier than user testing
  • Automated usability testing
      • faster than traditional techniques
      • can involve more participants, giving more convincing data
      • easier to do comparisons across sites
      • tradeoff: you lose observational data
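GOMS predicts expert task time by summing the times of primitive operators, which is also why it says nothing about novices or errors. A minimal sketch of the simplest GOMS variant, the Keystroke-Level Model (KLM), using the commonly cited Card, Moran, and Newell operator estimates; the task sequence is a hypothetical example, and published operator values vary by source and by user.

```python
# Operator times in seconds (commonly cited Card, Moran & Newell estimates;
# K assumes an average skilled typist).
KLM_OPERATORS = {
    "K": 0.20,  # keystroke or button press
    "P": 1.10,  # point with a mouse to a target
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation
}


def klm_time(sequence: str) -> float:
    """Predict expert execution time for a string of KLM operators, e.g. 'MHPK'."""
    return sum(KLM_OPERATORS[op] for op in sequence)


# Hypothetical task: think, reach for the mouse, point at a field, click,
# return to the keyboard, then type a 5-letter word.
task = "MHPKH" + "K" * 5
print(f"{klm_time(task):.2f} s")  # → 4.45 s
```

The model only covers error-free expert execution, which is exactly the limitation noted on the slide.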

7
Motivation for Speech UIs: Pervasive Information Access
8
UIs in the Pervasive Computing Era
  • Future computing devices won't have the same UI as current PCs
      • wide range of devices
      • small or embedded in the environment
      • often with alternative I/O and without screens
      • information appliances

9
Information Access via Speech
10
Speech UI Motivation
  • Smaller devices make I/O difficult
  • People can talk at 90 wpm: high speed
  • Virtually unlimited set of commands
  • Leaves other body parts free
  • Natural
      • speech was evolutionarily selected for
      • reading, writing, and typing were not

11
Speech Recognition
  • Continuous vs. non-continuous (isolated-word) speech
  • Speaker-independent vs. speaker-dependent
  • Speech is often misunderstood even by people
      • we rely on feedback via speech, facial expressions, and gesture
  • Recognizers are trained with real samples
      • often have gender-based problems
  • Based on probabilities (hidden Markov models, Bayes' rule)
      • trigrams of sounds or words
  • Several popular recognizers
      • Nuance, SpeechWorks, IBM ViaVoice, Dragon
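Probabilistic recognizers pick the word sequence W that maximizes P(A | W) P(W) (Bayes' rule), where P(A | W) comes from acoustic models such as HMMs and P(W) from an n-gram language model. A minimal sketch of the trigram half, assuming an unsmoothed maximum-likelihood model trained on a tiny hypothetical corpus and made-up acoustic log-likelihoods; real recognizers add smoothing and search a vastly larger hypothesis space.

```python
import math
from collections import Counter


def train_trigrams(sentences):
    """Count trigrams and their two-word histories, padding with <s> and </s>."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        words = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(2, len(words)):
            tri[(words[i - 2], words[i - 1], words[i])] += 1
            bi[(words[i - 2], words[i - 1])] += 1
    return tri, bi


def lm_logprob(sentence, tri, bi, floor=1e-6):
    """log P(W) under a maximum-likelihood trigram model (floored to avoid log 0)."""
    words = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for i in range(2, len(words)):
        history = (words[i - 2], words[i - 1])
        p = tri[history + (words[i],)] / bi[history] if bi[history] else 0.0
        total += math.log(max(p, floor))
    return total


def decode(candidates, tri, bi):
    """argmax over W of log P(A|W) + log P(W); candidates maps text -> acoustic log-likelihood."""
    return max(candidates, key=lambda w: candidates[w] + lm_logprob(w, tri, bi))


# Hypothetical training text and acoustic scores.
corpus = ["read my mail", "read my new mail", "check my mail", "the male voice"]
tri, bi = train_trigrams(corpus)
acoustic = {"read my male": -4.0, "read my mail": -4.2}  # acoustically near-identical
print(decode(acoustic, tri, bi))  # → read my mail
```

The acoustic model slightly prefers "male", but the language model's context overrules it, which is the kind of context use the slides say computers mostly lack.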

12
Speech Production
  • Three frequency regions of great intensity visible on an oscilloscope
      • come from larynx, throat, mouth
      • two are enough for recognition, but the result sounds tinny
  • Can generate emotional affect in speech
      • Demo: anger, disgust, gladness, sadness, fear, surprise
      • http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
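The three high-intensity frequency regions are usually called formants (resonances of the vocal tract). A minimal source-filter sketch, under stated assumptions: an impulse train stands in for the larynx, and each formant is a Klatt-style second-order digital resonator. The formant frequencies and bandwidths are illustrative values for a neutral vowel, not figures from the slide.

```python
import math


def resonator(signal, freq_hz, bw_hz, fs):
    """Klatt-style second-order digital resonator boosting energy near freq_hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw_hz * t)
    b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1 - b - c  # normalizes the gain at 0 Hz to 1
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def vowel(formants, fs=8000, f0=100, seconds=0.1):
    """Pass an impulse train at pitch f0 through cascaded formant resonators."""
    n = int(fs * seconds)
    period = fs // f0
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal


# Assumed (freq, bandwidth) pairs in Hz for a neutral vowel.
three = vowel([(500, 60), (1500, 90), (2500, 120)])
two = vowel([(500, 60), (1500, 90)])  # without the third formant: per the slide, recognizable but tinny
```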

13
Recognition Problems
  • Poor recognition
      • humans
      • top recognition systems get 5-10% error rates
      • computers don't use much context
  • Background noise
      • even worse recognition rates (20-40% error)
  • Slow
      • a simple matter of hardware getting faster
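Recognizer error rates like these are typically reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch using word-level edit distance; the sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j] (Levenshtein distance over words)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


ref = "send the mail to the male nurse before noon today"
hyp = "send the male to the male nurse before new today"
print(word_error_rate(ref, hyp))  # → 0.2
```

Two substitutions in ten words give 20% WER, the level the slide associates with noisy conditions.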

14
More Recognition Problems
  • Isolated, short words are difficult
      • common words become short
  • Segmentation
      • "silly" versus "sill lea"
  • Spelling
      • "mail" vs. "male"
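Segmentation is ambiguous because continuous speech has no pauses between words, so one sound sequence can split into several word sequences. The slide's "silly" / "sill lea" pair depends on sounds blurring at the word boundary, so this sketch uses the classic "ice cream" / "I scream" pair instead, whose sounds concatenate exactly. The lexicon and its ARPAbet-style phoneme symbols are a hypothetical toy example.

```python
from functools import lru_cache

# Hypothetical toy lexicon mapping words to phoneme tuples.
LEXICON = {
    "I": ("ay",),
    "ice": ("ay", "s"),
    "scream": ("s", "k", "r", "iy", "m"),
    "cream": ("k", "r", "iy", "m"),
}


def segmentations(phonemes):
    """Return every way to split a phoneme sequence into lexicon words."""
    phonemes = tuple(phonemes)

    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(phonemes):
            return [[]]
        results = []
        for word, pron in LEXICON.items():
            if phonemes[i:i + len(pron)] == pron:
                for rest in split_from(i + len(pron)):
                    results.append([word] + rest)
        return results

    return split_from(0)


print(segmentations(["ay", "s", "k", "r", "iy", "m"]))
# → [['I', 'scream'], ['ice', 'cream']]
```

Both splits are acoustically valid; only context (for example, a language model) can pick one, and homophones such as "mail" and "male" need the same help.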

15
Speech UI Problems
  • Speech UI no-nos
      • modes (no feedback)
      • deep hierarchies (aka "voice mail hell")
  • Verbose feedback wastes time and patience
      • only confirm consequential things
      • use meaningful, short cues
  • Interruption
      • half-duplex communication (i.e., no barge-in support)
  • Too much speech on the part of the user is tiring
  • Speech takes up space in working memory
      • can cause problems when problem solving
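Barge-in removes the half-duplex problem by letting the system keep listening while it talks and stop its prompt as soon as the user starts speaking. A minimal sketch, assuming a simple energy threshold as the speech detector and simulated audio frames; the function names, threshold, and frame sizes are hypothetical, and a real system also needs echo cancellation so the prompt itself is not mistaken for the user.

```python
import math


def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def play_with_barge_in(prompt_frames, mic_frames, threshold=0.1, min_frames=3):
    """Play prompt frames until the user speaks; return the frame index where playback stopped.

    Speech is declared after `min_frames` consecutive microphone frames above
    `threshold`, so a single click or cough does not cut the prompt off.
    """
    loud = 0
    for i, (_, mic) in enumerate(zip(prompt_frames, mic_frames)):
        # A real system would send the prompt frame to the speaker here and
        # cancel its echo from `mic` before measuring energy.
        loud = loud + 1 if rms(mic) > threshold else 0
        if loud >= min_frames:
            return i  # barge-in: stop talking and start listening
    return len(prompt_frames)  # prompt finished without interruption


silence, speech = [0.0] * 160, [0.5, -0.5] * 80
prompt = [[0.0] * 160] * 10
mic = [silence] * 4 + [speech] * 6
print(play_with_barge_in(prompt, mic))  # → 6
```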

16
SpeechActs Guidelines for Speech UIs
  • Speech interface to computer tools
      • email, calendar, weather, stock quotes
  • Establish common ground (shared context)
      • make sure people know where they are in the conversation
  • Pacing
      • recognition delays are unnatural; make it clear when they occur
      • barge-in lets the user interrupt, as in real conversations
  • Tapering of prompts
  • Progressive assistance: short error messages at first, longer ones when the user needs more help
  • Implicit confirmation: include the confirmation in the next command
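Tapering, progressive assistance, and implicit confirmation are all small pieces of dialog state. A minimal sketch of how they might fit together; the class, its methods, and the prompt wording are hypothetical illustrations, not SpeechActs code.

```python
class PromptManager:
    """Hypothetical helper showing tapering, progressive assistance, and implicit confirmation."""

    def __init__(self, full_prompt, short_prompt, help_messages):
        self.full_prompt = full_prompt      # spoken the first time
        self.short_prompt = short_prompt    # tapered version for later turns
        self.help_messages = help_messages  # ordered from terse to detailed
        self.turns = 0
        self.consecutive_errors = 0

    def prompt(self):
        # Tapering: full prompt once, then the shorter one.
        self.turns += 1
        return self.full_prompt if self.turns == 1 else self.short_prompt

    def on_error(self):
        # Progressive assistance: each consecutive misrecognition gets more help,
        # staying at the most detailed message once the list runs out.
        level = min(self.consecutive_errors, len(self.help_messages) - 1)
        self.consecutive_errors += 1
        return self.help_messages[level]

    def on_success(self, understood):
        # Implicit confirmation: echo what was understood as part of the next turn
        # instead of asking a separate "Did you say ...?" question.
        self.consecutive_errors = 0
        return f"{understood}. {self.prompt()}"


mail = PromptManager(
    "Welcome to mail. You can say read, next, or delete. What would you like?",
    "What now?",
    ["Sorry?", "Sorry, I didn't catch that.", "Try saying, for example, 'read message 3'."],
)
print(mail.prompt())                         # full welcome prompt
print(mail.on_error())                       # → Sorry?
print(mail.on_success("Reading message 3"))  # → Reading message 3. What now?
```

If "Reading message 3" is wrong, the user can simply barge in and correct it, which is cheaper than confirming every command explicitly.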

17
SpeechActs Video
18
Announcements
  • Web page on using reporting in Active Desktop up this afternoon
      • weekly reports are required
      • for YOUR benefit
  • Interactive prototype due Wed.
      • presentation and report info now online
      • 6 minutes per presenter
  • Questions?

19
SUEDE: Low-fi Prototyping for Speech-based UIs
  • Built-in iterative design
      • design, test, analysis
      • fast: no real recognition
  • Support design practice
      • example scripts
      • Wizard of Oz (WoZ)
  • Handle needs of real UIs
      • error simulation
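In a Wizard of Oz test a hidden human "wizard" plays the recognizer, so recognition is perfect unless errors are injected deliberately. A minimal sketch of error simulation, assuming errors are injected by occasionally swapping the wizard's choice for a different, randomly chosen response; the function name and seed are hypothetical, and this illustrates the idea rather than SUEDE's implementation.

```python
import random


def simulate_recognition(intended, options, error_rate, rng=random):
    """Return the wizard's choice, or a random wrong option with probability error_rate.

    `intended` is what the participant actually said (as judged by the wizard);
    injected errors let designers see how the UI copes with misrecognition.
    """
    if rng.random() < error_rate:
        wrong = [o for o in options if o != intended]
        if wrong:
            return rng.choice(wrong)
    return intended


rng = random.Random(160)  # fixed seed so a test session can be replayed
options = ["read", "next", "delete"]
results = [simulate_recognition("read", options, 0.2, rng) for _ in range(1000)]
print(results.count("read") / len(results))  # fraction recognized correctly, near 1 - error_rate
```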

20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
design area
24
(No Transcript)
25
(No Transcript)
26
SUEDE Summary
  • Speech is an important mode for information access in the field
  • SUEDE supports speech-based UI design
      • moving from concrete examples to abstractions
      • embeds iterative design with design-test-analyze
  • Designers using SUEDE need not be experts in speech recognition technology

Future UIs for Information Access
  • Star Trek-style UI
      • verbally ask the computer for info or services
      • may be common in mobile/hands-free situations
      • hard to get to work well, since it requires perfect speech recognition and unambiguous language understanding
  • Put-that-there-style UI (Bolt et al., 1980)
      • user says "retrieve something like this" while pointing
      • combines speech with gesture to disambiguate (multimodal)
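In a put-that-there interface, words like "this", "that", and "there" are resolved by whatever the user was pointing at when saying them. A minimal sketch of time-based fusion, assuming the recognizer timestamps each word and the pointing device timestamps each selection; the class names, the 1-second window, and the nearest-in-time rule are simplifying assumptions, not Bolt's implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    time: float  # seconds from start of utterance


@dataclass
class Point:
    target: str  # object or location under the pointer
    time: float


DEICTICS = {"this", "that", "there", "here"}


def fuse(words, points, window=1.0):
    """Replace each deictic word with the pointing target closest in time (within `window` s)."""
    resolved = []
    for w in words:
        if w.text in DEICTICS and points:
            nearest = min(points, key=lambda p: abs(p.time - w.time))
            if abs(nearest.time - w.time) <= window:
                resolved.append(nearest.target)
                continue
        resolved.append(w.text)  # non-deictic word, or no gesture close enough
    return " ".join(resolved)


words = [Word("put", 0.0), Word("that", 0.4), Word("there", 1.2)]
points = [Point("blue square", 0.5), Point("upper left corner", 1.3)]
print(fuse(words, points))  # → put blue square upper left corner
```

Neither channel alone is enough: speech gives the action, the gesture gives the referents.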

29
Multimodal Error Correction
  • Dictation error correction study
      • found users are better at correcting recognition errors with a different input modality
      • if the recognizer got it wrong the first time, it will likely get it wrong the second time
      • hyperarticulating makes recognition worse
  • Correct dictation errors with
      • vocal spelling, writing, typing, etc.
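A second modality helps because its errors are largely independent of the speech recognizer's. The slide does not say how the two inputs are combined; one common approach is to rescore the candidates by multiplying the probabilities from both recognizers. A minimal sketch of that approach, with hypothetical candidate lists and scores.

```python
def fuse_nbest(speech_nbest, other_nbest, floor=1e-3):
    """Pick the candidate with the highest product of probabilities from two recognizers.

    A candidate missing from one list gets a small floor probability, so strong
    evidence from one modality can still win.
    """
    candidates = set(speech_nbest) | set(other_nbest)

    def score(c):
        return speech_nbest.get(c, floor) * other_nbest.get(c, floor)

    return max(candidates, key=score)


# Hypothetical scores: speech confuses the homophones "male" and "mail",
# while spelling the word out favors "mail".
speech = {"male": 0.55, "mail": 0.45}
spelling = {"mail": 0.80, "nail": 0.15, "male": 0.05}
print(fuse_nbest(speech, spelling))  # → mail
```

Repeating the word by voice would feed the same confusable evidence to the same recognizer, which is why switching modality tends to work better.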

30
A Better Future: Our Information Access Will Be via Multimodal UIs
  • Benefits
      • take advantage of more than one mode of input/output
      • computers could be used in more situations and places
      • UIs easier to use and useful to more people
  • Obstacles
      • building multimodal UIs is hard
      • often requires immature recognition technology
      • hard to combine recognition technologies
      • programming expertise required to do design
      • this was the state of GUIs in 1980

31
Summary
  • Speech UIs
      • may permit more natural computer access
      • allow us to use computers in more situations
      • are hard to get to work well
          • recognition problems, etc.
  • UI tools are needed for speech UI design
  • Multimodal UIs address some of the problems with pure speech UIs

32
Next Time
  • Presentations
      • attendance required
  • Slides to Francis by 11 PM on Tuesday