Title: Ryan Kilgore, Mark Chignell


1
seeing unfamiliar voices
does visualization of spatial position enhance
voice identification?
  • Ryan Kilgore, Mark Chignell
  • University of Toronto, IBM CAS, KMDI
  • 03 22 06

2
presentation overview
  • Voice collaboration and spatial audio
  • Visualizing audio spaces
  • Experimental methodology
  • Results
  • Discussion

Kilgore & Chignell, Seeing Unfamiliar Voices
HFT 2006
3
problems with voice collaboration
  • Traditional methods of synchronous communication
    do not adequately support large groups
  • Monaural audio, lack of visual feedback, and poor
    audio quality make it difficult to determine:
      • Who is present?
      • Who is speaking?
      • What is being said?

4
spatial audio overview

5
spatial audio benefits (1 of 2)
  • Reduction in masking and facilitation of auditory
    scene analysis (Bregman, 1990; Gilkey & Anderson,
    1997)
  • Increased speech intelligibility in noisy
    environments (Ericson & McKinley, 1997)
  • Increased speech intelligibility in multi-talker
    listening tasks (Drullman & Bronkhorst, 2000;
    Abouchacra, 2001; Bolia, 2001)


6
spatial audio benefits (2 of 2)
  • Distinct voice locations aid in cognition of
    audio conference events (Baldis, 2001; Kilgore
    et al., 2003)
  • Significantly preferred to traditional, monaural
    voice presentation
  • Reduced perception of attention required for
    speaker identification task
  • Increased speaker identification performance,
    particularly for personalized audio spaces


7
Vocal Village interface
8
visualization: audio spaces
  • Early Vocal Village field trials indicated that
    users want a GUI for monitoring and controlling
    the audio space
  • Participants in audio-only field trials have
    highlighted the difficulty of knowing who was
    present in the audio space (Singer et al., 1999)
  • Visual modality can convey awareness-supporting
    information parallel to audio communication
  • Will increased awareness of voice locations aid
    listeners in learning to identify completely
    unfamiliar voices?

9
visualization: previous studies
  • Spatially arranged photos of speakers showed no
    performance benefit, but were preferred by
    listeners (Baldis, 2001)
  • Graphic insert with voice names and locations
    showed no benefit to voice identification in an
    ATC task (MacDonald, 2002)
  • HOWEVER: These studies used familiar
    collaborators, or were limited to only four voices


10
experiment overview
  • Determine if visual representation of voice
    locations will aid listeners in learning to
    recognize voices that are completely unfamiliar
  • Dependent variables:
      • Accuracy and response time for voice
        identification task
      • Confidence in voice identification task
        performance
      • Mental workload (NASA-TLX) (Hart & Staveland,
        1988)
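
As a rough illustration of how a NASA-TLX workload score is typically computed, the sketch below averages the six standard subscale ratings (raw TLX) and also shows the weighted variant based on 15 pairwise comparisons. The slides do not say which variant was used, so the function names, the 0-100 rating scale, and the example values are assumptions for illustration only.

```python
# Sketch of NASA-TLX scoring. The six subscales are the standard ones;
# the rating values used below are made-up illustration data.
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Unweighted ("raw") TLX: the mean of the six subscale ratings."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings, weights):
    """Weighted TLX: each weight is how often a subscale was chosen
    across the 15 pairwise comparisons, so the weights sum to 15."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Hypothetical ratings (0-100) and weights for one participant.
example_ratings = dict(zip(SUBSCALES, (70, 10, 40, 30, 60, 20)))
example_weights = dict(zip(SUBSCALES, (5, 0, 3, 2, 4, 1)))
```

The raw score treats all subscales equally, while the weighted score lets each participant's own ranking of the workload sources shape the total.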


11
experiment methodology (1 of 2)
  • Modified Coordinate Response Measure (CRM)
    listening task (Bolia et al., 2000)
  • "Ready [call sign], go to [color] [number] now"
  • 4 male, 4 female voices
  • Respond to target with color, number, and
    speaker's name
  • 27 participants, no voice training
  • Provided performance feedback (with correct answer)
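
The task structure above can be sketched as a small trial generator and scorer. This is a minimal sketch, not the experiment software: the call signs, colors, and numbers follow the published CRM corpus, while the talker labels (`M1` to `F4`), the function names, and the 40-stimulus default (taken from the block size on the next methodology slide) are assumptions for illustration. The modified task used in the study may have drawn on a different subset.

```python
import random

# Vocabulary of the standard CRM corpus (Bolia et al., 2000).
CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle",
              "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["blue", "green", "red", "white"]
NUMBERS = list(range(1, 9))

# Hypothetical labels for the 4 male and 4 female talkers.
TALKERS = [f"M{i}" for i in range(1, 5)] + [f"F{i}" for i in range(1, 5)]


def make_trial(rng):
    """Draw one target phrase spoken by a randomly chosen talker."""
    call_sign = rng.choice(CALL_SIGNS)
    color = rng.choice(COLORS)
    number = rng.choice(NUMBERS)
    return {
        "talker": rng.choice(TALKERS),
        "color": color,
        "number": number,
        "phrase": f"Ready {call_sign}, go to {color} {number} now",
    }


def make_block(rng, n_stimuli=40):
    """One experimental block of target phrases."""
    return [make_trial(rng) for _ in range(n_stimuli)]


def score_response(trial, color, number, talker):
    """Compare a listener's three-part response with the target."""
    return {
        "color": color == trial["color"],
        "number": number == trial["number"],
        "talker": talker == trial["talker"],
    }
```

Scoring the three response parts separately keeps voice identification accuracy distinct from intelligibility of the color and number.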


12
experiment methodology (2 of 2)
  • Two independent variables
  • 4 experimental blocks
  • 40 stimuli per block (160 total)


13
experiment stimuli

14
experiment 3 results (1 of 3)
Correct Voice Identifications by Experimental
Block

Experimental Block: F(3, 30) = 61.15, p < .001
Number of Voices: F(1, 30) = 68.21, p < .001
Format: F(2, 30) = 1.39, p = .27
Number × Format: F < 1
15
experiment 3 results (2 of 3)
  • Removed data for low-learning participants
  • Excluded subjects that showed no improvement in
    voice identification over duration of experiment
  • 2 Mono participants removed
  • 3 Spatial participants removed
  • 3 SpatialVisual participants removed


16
experiment 3 results (3 of 3)
Correct Voice Identifications (low-learning
subjects removed)

4V: Format × Block, F < 1
8V: Format × Block, F(2, 10) = 5.43, p = .025
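
As a quick sanity check on the reported statistics, the p-value of an F test with 2 numerator degrees of freedom has a closed form, so it can be recomputed without a statistics library. The sketch below uses that standard identity; the function name is hypothetical, and the F values are the ones quoted on the results slides.

```python
def f_test_p_value_df2(f_value, df_denominator):
    """Upper-tail probability P(F > f_value) for an F distribution with
    2 numerator degrees of freedom and df_denominator denominator
    degrees of freedom.

    For a numerator df of exactly 2 the survival function reduces to
    (1 + 2 * f / d2) ** (-d2 / 2). This shortcut does NOT apply to
    other numerator degrees of freedom.
    """
    if f_value < 0 or df_denominator <= 0:
        raise ValueError("need f_value >= 0 and df_denominator > 0")
    return (1.0 + 2.0 * f_value / df_denominator) ** (-df_denominator / 2.0)


# Values quoted on the slides.
p_format_by_block_8v = f_test_p_value_df2(5.43, 10)  # reported p = .025
p_format_main_effect = f_test_p_value_df2(1.39, 30)  # reported p = .27
```

The first value rounds to the reported .025, and the second lands within about 0.01 of the reported .27, which is consistent with the F value itself having been rounded.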
17
discussion
  • Simple visual representation of voice locations
    improves the learning of completely unfamiliar
    voices in larger audio spaces (8 talkers)
  • Visualizations continue to support identification
    as voices become increasingly familiar
  • Spatial presentation of voice, coupled with
    low-cost visualization methods, may be
    particularly useful in supporting:
      • Large collaborative groups
      • Groups with limited familiarity


18
current work: visual scale

19
questions?
fin
20
references (1 of 2)
Abouchacra, K. (2001). Binaural Helmet: Improving speech recognition in noise with spatialized sound. Human Factors, 43(4), 584.
Baldis, Jessica (2001). Effects of spatial audio on memory, comprehension, and preference during desktop conferences. Proceedings of the SIGCHI conference on human factors in computing systems, Vol. 3, 166-173.
Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge: MIT Press.
Bolia, Robert S., W. Todd Nelson, Mark A. Ericson, and Brian D. Simpson (2000). A speech corpus for multitalker communication research. J. Acoust. Soc. Am., 107(2), 1065-1066.
Bolia, R. (2001). Asymmetric performance in the cocktail party effect: implications for the design of spatial audio displays. Human Factors, 43(2), 208.
Drullman, Rob and Adelbert W. Bronkhorst (2000). Multichannel speech intelligibility and talker recognition using monaural, binaural, and three-dimensional auditory presentation. J. Acoust. Soc. Am., 107(4), 2224-2235.
Ericson, M. A., and R. L. McKinley (1997). The intelligibility of multiple talkers separated spatially in noise. In Binaural and Spatial Hearing in Real and Virtual Environments, Gilkey, Robert H. and Timothy R. Anderson (Eds.), NJ: Lawrence Erlbaum Associates, 701-724.
21
references (2 of 2)
Gilkey, Robert H. and Timothy R. Anderson (Eds.) (1997). Binaural and Spatial Hearing in Real and Virtual Environments. New Jersey: Lawrence Erlbaum Associates.
Hart, S. G., and Staveland, L. E. (1988). Development of the NASA-TLX (Task Load Index): results of empirical and theoretical research. In P. A. Hancock and N. Meshkati (Eds.), Human Mental Workload. North Holland: Elsevier Science Publishers, 139-183.
Kilgore, Ryan M., Mark Chignell, and Paul W. Smith (2003). Spatialized audioconferencing: what are the benefits? Proceedings of the 2003 conference of the Centre for Advanced Studies Conference on Collaborative Research, 111-120.
MacDonald, J. (2002). Intelligibility of speech in a virtual 3-D environment. Human Factors, 44(2), 272.
Singer, A., Hindus, D., Stifelman, L., and S. White (1999). Tangible progress: less is more in Somewire audio spaces. Proceedings of the SIGCHI conference on human factors in computing systems, 104-111.
22
spatial audio explanation
  • Perception of relative differences between
    signals picked up by the left and right ears
  • Allows people with binaural hearing to locate
    sound sources in three-dimensional space
  • Product of multiple cues: interaural intensity
    differences (IID), interaural time differences
    (ITD), and head-related transfer functions (HRTFs)
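
To give a feel for the size of one of these cues, the sketch below estimates the interaural time difference with Woodworth's classic spherical-head approximation, ITD = (a / c)(θ + sin θ). It is only an illustration: the head radius, the speed of sound, and the function name are assumed values, and nothing here describes how the Vocal Village system actually renders spatial audio.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air at about 20 °C


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg: 0 is straight ahead, +90 is directly to one side,
    -90 directly to the other. Valid for -90..90 degrees.
    The sign of the result shows which ear the sound reaches first.
    """
    if not -90 <= azimuth_deg <= 90:
        raise ValueError("azimuth must be between -90 and 90 degrees")
    theta = math.radians(abs(azimuth_deg))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        print(f"{angle:>3} deg: {woodworth_itd(angle) * 1e6:.0f} microseconds")
```

Under these assumptions the ITD grows from zero straight ahead to well under a millisecond at the side, which is why small timing differences between the ears carry so much location information.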


23
experimental interface
24
Vocal Village interface
25
experiment 4 visual stimuli (1 of 2)

26
experiment 4 visual stimuli (2 of 2)
27
experiment 4 audio stimuli

28
thanks
  • UofT Interactive Media Lab and the Vocal Village
    development team
  • My committee (Mark Chignell, Greg Jamieson, Ron
    Baecker)
  • IBM Centre for Advanced Studies, Toronto
  • Knowledge Media Design Institute
