Binaural Sonification of Disparity Maps - PowerPoint PPT Presentation

1
Binaural Sonification of Disparity Maps
  • Alfonso Alba, Carlos Zubieta, Edgar Arce
  • Facultad de Ciencias
  • Universidad Autónoma de San Luis Potosí

2
Contents
  • Project description
  • Estimation of disparity maps
  • Segmentation of disparity maps
  • Object sonification
  • Test application
  • Preliminary results
  • Future work

3
Project description
  • The goal of this project is to develop a scene
    sonification system for the visually impaired.
  • Images from a stereo camera pair will be used to
    detect objects in the scene and estimate the
    distance between them and the subject.
  • A binaural audio signal will be synthesized for
    each object, so that the subject can hear the
    objects in the scene in their corresponding
    locations.

4
Scene sonification system
  • The system will consist of the following stages:
    • Stereo image acquisition
    • Disparity map estimation
    • Disparity map segmentation (object detection)
    • Binaural sonification of objects in the scene
  • Here we focus only on the segmentation and
    sonification stages, taking the disparity map as
    given.

5
Estimation of disparity maps
  • Images from a pair of cameras, separated by a
    certain distance, form a stereo image pair.
  • The position of a certain object in one of the
    images will be shifted in the other image by an
    amount inversely proportional to the distance
    between the object and the camera arrangement.
  • This displacement is called disparity, and can be
    computed for each pixel to form a disparity map.
  • We are currently working on a technique to
    compute disparity maps in real time.
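
The inverse relation between disparity and distance can be made concrete with a short sketch. It assumes a rectified pinhole stereo pair, which the slides do not state; `focal_px` and `baseline_m` are hypothetical calibration values, not figures from this project.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Distance Z = f * B / d for a rectified stereo pair (assumed model)."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Doubling the disparity halves the estimated distance.
near = depth_from_disparity(20, focal_px=500, baseline_m=0.1)
far = depth_from_disparity(10, focal_px=500, baseline_m=0.1)
```

Under this model, the pixels with the largest disparity belong to the nearest objects, which is why the later stages treat high disparity as high relevance.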

6
Segmentation of disparity maps
  • Given a disparity map D(x,y), we perform a seeded
    region-growing segmentation to detect the objects
    in the scene.
  • To choose the seeds, the algorithm uses a fitness
    measure [equation missing], where N(x,y) is the
    set of nearest neighbors of (x,y) and q is a
    quality parameter that increases robustness to
    noise.
  • This measure favors homogeneous regions (low
    d_q) with the highest disparity (nearest objects).

7
Region-growing algorithm
  • The algorithm performs the following steps:
    • Choose the pixel (x,y) with the highest fitness,
      add it to a queue Q, and label it with a new
      region label k.
    • Let r_k = D(x,y) be the representative value for
      region k.
    • While Q is not empty, do:
      • Pull the first pixel (x,y) from Q.
      • For each unlabeled neighbor (x',y') of (x,y)
        such that |r_k - D(x',y')| < ε, add (x',y') to
        region k and to the queue Q.
      • Recompute r_k as the average disparity of the
        pixels in region k.
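
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: it assumes 4-connected neighbors, uses label 0 for "unlabeled", and takes the seed from the caller because the fitness measure is not reproduced here. The names `grow_region`, `labels`, and `eps` are hypothetical.

```python
from collections import deque


def grow_region(D, seed, labels, k, eps):
    """Grow region k from seed (x, y) over disparity map D (list of rows).

    labels is modified in place; returns (r_k, number of pixels in region).
    """
    height, width = len(D), len(D[0])
    sx, sy = seed
    labels[sy][sx] = k
    total, count = D[sy][sx], 1
    r_k = total                      # representative value of region k
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()       # pull the first pixel in Q
        # Assumption: 4-connected neighborhood.
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and labels[ny][nx] == 0 and abs(r_k - D[ny][nx]) < eps:
                labels[ny][nx] = k
                queue.append((nx, ny))
                total += D[ny][nx]
                count += 1
        r_k = total / count          # average disparity of region k
    return r_k, count


D = [[10, 10, 2],
     [10, 11, 2],
     [2, 2, 2]]
labels = [[0] * 3 for _ in range(3)]
r_k, size = grow_region(D, seed=(0, 0), labels=labels, k=1, eps=2.0)
```

Keeping a running sum and count makes the recomputation of r_k constant-time per iteration, which matters when the whole frame must be segmented in a few milliseconds.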

8
Object sonification
  • Sound coming from a specific location will suffer
    a series of degradations before it reaches our
    ears.
  • These degradations provide various cues that our
    brain uses to locate the sound source.
  • Binaural spatialization attempts to model these
    cues, in order to allow the listener to hear a
    sound as if it were coming from a specific point
    in space, which is typically defined in spherical
    coordinates.

9
Object sonification
  • We represent each object in the scene with a
    ping-like sound whose frequency depends on the
    disparity, so that the sound becomes more
    alerting as the object becomes closer.
  • The audio signal corresponding to each object is
    fed through a binaural spatialization system
    whose parameters depend on the object's
    position.
  • Spatialization is performed by modeling azimuth
    and range cues. Elevation cues have not been
    implemented (yet).
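
A ping-like sound of this kind could be generated as an exponentially decaying sinusoid. The sketch below is only an illustration: the slides say the frequency depends on disparity but give no mapping, so the linear mapping, the frequency range `f_min` to `f_max`, and the `decay` rate are all assumptions.

```python
import math


def ping_frequency(disparity, max_disparity, f_min=300.0, f_max=1500.0):
    """Map disparity to pitch; higher disparity (closer) gives higher pitch."""
    closeness = max(0.0, min(1.0, disparity / max_disparity))
    return f_min + closeness * (f_max - f_min)


def ping(disparity, max_disparity, sample_rate=44100, duration=0.2, decay=20.0):
    """Return mono samples of a decaying sinusoid (a simple 'ping')."""
    freq = ping_frequency(disparity, max_disparity)
    n_samples = round(sample_rate * duration)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(math.exp(-decay * t) * math.sin(2 * math.pi * freq * t))
    return samples


samples = ping(disparity=32, max_disparity=64)
```

The mono output of `ping` would then be the input to the spatialization stage, which produces the left and right channels.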

10
Azimuth cues
  • Inter-aural Time Difference
    • The sound is delayed by a different amount for
      each ear:
    • T_n ∝ a sin(θ), T_f ∝ a θ.
  • Inter-aural Level Difference (head shadow)
    • The sound is attenuated when passing through the
      head.
    • This cue can be modeled with a one-pole, one-zero
      filter.

Brown et al., 1998
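
Both azimuth cues can be sketched as follows. The time difference here uses Woodworth's spherical-head approximation, ITD = (a/c)(θ + sin θ) for |θ| ≤ π/2, which may differ from the exact model on this slide. The head radius, the speed of sound, and the sign convention (positive azimuth means the source is on the right) are assumptions. The filter is a generic first-order section; its coefficients are left to the caller and are not taken from Brown et al.

```python
import math


def itd_seconds(azimuth_rad, head_radius=0.0875, speed_of_sound=343.0):
    """Woodworth approximation of the inter-aural time difference."""
    return (head_radius / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))


def apply_itd(signal, azimuth_rad, sample_rate=44100):
    """Return (left, right) with the far ear delayed by the ITD."""
    delay = round(itd_seconds(abs(azimuth_rad)) * sample_rate)
    delayed = [0.0] * delay + list(signal)
    padded = list(signal) + [0.0] * delay      # same length, no delay
    # Assumption: positive azimuth = source on the right, left ear is far.
    return (delayed, padded) if azimuth_rad >= 0 else (padded, delayed)


def one_pole_one_zero(x, b0, b1, a1):
    """First-order filter: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        out = b0 * sample + b1 * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


left, right = apply_itd([1.0], math.pi / 2)
```

For a source directly to one side, the sketch delays the far ear by roughly 0.66 ms, which is 29 samples at 44.1 kHz.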
11
Range cues
  • Artificial Reverberation
    • Reverberation is the result of a large number of
      echoes originating from reflections of the sound
      off flat surfaces such as walls.
    • The level of reverberation is roughly constant
      and independent of source location.
    • We use a simple model composed of 4 parallel
      delay lines with feedback.
  • Attenuation
    • The audio signal is attenuated according to the
      inverse-square law.
    • The ratio between the signal and reverberation
      levels provides an additional cue for range.
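
The two range cues can be combined as in the sketch below: the direct signal is scaled by 1/d², while a fixed-level reverberation built from four parallel feedback delay lines is added on top. The delay lengths, feedback gain, reverberation level, and the clamp on small distances are illustrative assumptions, not the values used in the project.

```python
def feedback_delay(x, delay, gain):
    """Delay line with feedback: y[n] = x[n - delay] + gain * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(delay, len(x)):
        y[n] = x[n - delay] + gain * y[n - delay]
    return y


def reverb(x, delays=(1116, 1188, 1277, 1356), gain=0.7):
    """Average of four parallel feedback delay lines (echoes only)."""
    lines = [feedback_delay(x, d, gain) for d in delays]
    return [sum(samples) / len(delays) for samples in zip(*lines)]


def range_cue(dry, distance, reverb_level=0.3,
              delays=(1116, 1188, 1277, 1356), gain=0.7):
    """Mix an inverse-square attenuated signal with constant-level reverb."""
    # Assumption: clamp the distance so the gain never exceeds 1.
    direct_gain = 1.0 / max(distance, 1.0) ** 2
    wet = reverb(dry, delays, gain)
    return [direct_gain * d + reverb_level * w for d, w in zip(dry, wet)]


impulse = [1.0] + [0.0] * 9
out = range_cue(impulse, distance=2.0, delays=(2, 3, 4, 5), gain=0.5)
```

Because only the direct path depends on `distance`, the direct-to-reverberant ratio drops as the object moves away, which is the additional range cue described above.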

12
Test application
  • We simulate a moving scene by taking a 160 x 100
    sub-frame of a precomputed disparity map.
  • The 10 most relevant objects are segmented but
    only objects that are near enough are sonified.
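
The test loop can be pictured with the following sketch. It assumes that "most relevant" means highest representative disparity and that "near enough" is a simple disparity threshold; neither definition is given on the slide, and `sub_frame`, `objects_to_sonify`, and `min_disparity` are hypothetical names.

```python
def sub_frame(D, x0, y0, width=160, height=100):
    """Crop a width x height window from disparity map D (list of rows)."""
    return [row[x0:x0 + width] for row in D[y0:y0 + height]]


def objects_to_sonify(regions, min_disparity, max_objects=10):
    """Keep the max_objects nearest regions, then drop the distant ones.

    regions is a list of (label, representative_disparity) pairs.
    """
    nearest = sorted(regions, key=lambda r: r[1], reverse=True)[:max_objects]
    return [r for r in nearest if r[1] >= min_disparity]


D = [[x + 10 * y for x in range(5)] for y in range(4)]
window = sub_frame(D, x0=1, y0=1, width=2, height=2)
audible = objects_to_sonify([(1, 5.0), (2, 40.0), (3, 25.0)], min_disparity=20.0)
```

Moving `x0` and `y0` a little on every frame simulates camera motion over a single precomputed disparity map.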

13
Preliminary results
  • Fast segmentation times
    • 5 ms per 160 x 100 frame on a 2.4 GHz dual-core
      CPU
    • Over 100 frames per second including the
      sonification stage (but without disparity map
      estimation)
    • Embedded implementation is viable
  • Good azimuth representation: object direction is
    easily perceived.
  • Object range is perceived in a relative manner
    (e.g., one object is nearer than another), but
    not in an absolute way.
  • Between 3 and 5 objects can be sonified before
    too much clutter is heard.

14
Future work
  • Camera setup and calibration
  • Real-time estimation of disparity maps
  • Elevation cues in binaural spatialization
  • Optimization of sonification system
  • Implementation in an embedded device

15
Thank you!