Audio, Video and Multimodal Person Tracking for Meetings

1
Audio, Video and Multimodal Person Tracking for Meetings
  • Alessio Brutti, Roberto Brunelli, Oswald Lanz
  • FBK-irst, Trento, Italy

2
Audio Person Tracking
  • Based on Global Coherence Field (GCF)
      • 2D sound maps
      • Spatial resolution 2 × 2 cm
      • Time step 0.09 s
  • Speech Activity Detection → Acoustic Event
    Detection (speech)
  • Tracking: smoothing of single GCF peaks
      • Adaptive smoothing filter
      • Reset and move when speaker changes
  • Data
      • Each horizontal microphone pair in T-shaped
        arrays
      • 8 NIST-MarkIII channels only for IRST and UPC
        meetings.
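
As a rough illustration of how a 2D sound map of this kind could be computed, the sketch below averages GCC-PHAT coherence over microphone pairs, evaluated at the time delay that each grid point would produce. It assumes the commonly used definition of the Global Coherence Field with PHAT weighting; the function names (`gcc_phat`, `gcf_map`), the speed of sound, the sampling rate and the array layout are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the slides


def gcc_phat(x1, x2, max_lag):
    """GCC-PHAT of two frames for integer lags -max_lag..max_lag.

    A peak at a positive lag means the sound reached microphone 1 later.
    """
    n = len(x1) + len(x2)                  # zero-pad to limit wrap-around
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + 1e-12         # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def gcf_map(signals, mic_xy, pairs, xs, ys, fs):
    """Average pairwise coherence on a 2D grid (hypothetical helper).

    signals: array (n_mics, n_samples), one frame per microphone
    mic_xy:  array (n_mics, 2), microphone positions in metres
    pairs:   list of (i, k) microphone index pairs
    xs, ys:  1-D grid coordinates in metres (e.g. 0.02 m spacing)
    fs:      sampling rate in Hz
    """
    gx, gy = np.meshgrid(xs, ys)           # shape (len(ys), len(xs))
    acc = np.zeros(gx.shape)
    for i, k in pairs:
        d_i = np.hypot(gx - mic_xy[i, 0], gy - mic_xy[i, 1])
        d_k = np.hypot(gx - mic_xy[k, 0], gy - mic_xy[k, 1])
        # Expected delay (in samples) of mic i relative to mic k per cell.
        lag = np.rint((d_i - d_k) / SPEED_OF_SOUND * fs).astype(int)
        spacing = np.linalg.norm(mic_xy[i] - mic_xy[k])
        max_lag = int(np.ceil(spacing / SPEED_OF_SOUND * fs)) + 1
        cc = gcc_phat(signals[i], signals[k], max_lag)
        acc += cc[lag + max_lag]           # look up coherence at that delay
    return acc / len(pairs)
```

With `xs` and `ys` spaced 0.02 m apart, each cell of the returned map covers 2 × 2 cm, and the location of the map maximum can serve as the per-frame source estimate that a tracker then smooths.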

3
Audio Person Tracking results
  • 2007 Evaluation results
  • 2006 vs 2007 results (no silence)
  • Effect of SAD and smoothing
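
The slides do not specify the adaptive smoothing filter used by the audio tracker. The following is a minimal sketch of one way such a filter could behave: it exponentially smooths the GCF peak position with a gain that grows with the peak value, and it resets to the new position once the peak has stayed far away for a few frames (a possible speaker change). The class name `PeakSmoother` and all thresholds and gains are hypothetical.

```python
import math


class PeakSmoother:
    """Smooth per-frame GCF peak positions; every parameter is an assumption."""

    def __init__(self, jump_dist=0.5, jump_frames=3, min_gain=0.1, max_gain=0.8):
        self.jump_dist = jump_dist      # metres; larger jumps suggest a new speaker
        self.jump_frames = jump_frames  # consecutive far peaks needed to reset
        self.min_gain = min_gain        # gain used for weak peaks
        self.max_gain = max_gain        # gain used for strong peaks
        self.state = None               # current smoothed (x, y)
        self._far_count = 0

    def update(self, peak_xy, peak_value):
        """peak_xy: (x, y) of the map maximum; peak_value: its height in [0, 1]."""
        if self.state is None:
            self.state = tuple(peak_xy)
            return self.state
        if math.dist(self.state, peak_xy) > self.jump_dist:
            self._far_count += 1
            if self._far_count >= self.jump_frames:
                self.state = tuple(peak_xy)   # reset and move to the new speaker
                self._far_count = 0
            return self.state
        self._far_count = 0
        confidence = min(max(peak_value, 0.0), 1.0)
        gain = self.min_gain + (self.max_gain - self.min_gain) * confidence
        self.state = tuple(s + gain * (p - s) for s, p in zip(self.state, peak_xy))
        return self.state
```

Feeding the filter only frames that the speech activity detector marks as speech would keep silence frames, where the peak is unreliable, from pulling the estimate away.
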
4
Features of the VisualTracker
  • Sequential Bayesian estimator
  • 5D state space (pos, vel, standing/sitting)
  • linear dynamical model
  • generative likelihood based on colour
    histogram match
  • particle filter implementation

  • Hybrid multi-target approach: efficient
    propagation of marginals using joint visual
    likelihood
  • Multi-view fusion and output
      • product of single-view likelihoods
      • outputs posterior mean
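
The listed ingredients can be sketched as one generic particle filter step for a single target. This is an illustration under stated assumptions, not the tracker's implementation: the state layout `[x, y, vx, vy, posture]`, the noise levels, the posture switch probability and the Bhattacharyya-based histogram likelihood are assumed choices, and the hybrid multi-target propagation is not covered.

```python
import numpy as np


def predict(particles, dt, pos_noise, vel_noise, switch_prob, rng):
    """Linear constant-velocity model; columns are x, y, vx, vy, posture (0/1)."""
    out = particles.copy()
    n = len(out)
    out[:, 0:2] += dt * out[:, 2:4] + pos_noise * rng.standard_normal((n, 2))
    out[:, 2:4] += vel_noise * rng.standard_normal((n, 2))
    flip = rng.random(n) < switch_prob       # occasional sit/stand change
    out[flip, 4] = 1.0 - out[flip, 4]
    return out


def histogram_likelihood(observed, reference, lam=20.0):
    """Colour histogram match via the Bhattacharyya coefficient (assumed form).

    Both histograms are expected to be normalised to sum to 1.
    """
    bc = np.sum(np.sqrt(observed * reference))
    return float(np.exp(-lam * (1.0 - bc)))


def update(particles, view_likelihoods):
    """Multi-view fusion: weights are the product of single-view likelihoods.

    view_likelihoods: one callable per camera, mapping the particle array to
    an array of per-particle likelihoods. Extracting a histogram from each
    image at each particle's projected position is omitted here.
    """
    w = np.ones(len(particles))
    for lik in view_likelihoods:
        w = w * lik(particles)
    total = w.sum()
    if total <= 0.0:                          # all particles rejected
        return np.full(len(particles), 1.0 / len(particles))
    return w / total


def resample(particles, weights, rng):
    """Multinomial resampling according to the normalised weights."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]


def posterior_mean(particles, weights):
    """Weighted mean state; the posture entry reads as P(posture == 1)."""
    return weights @ particles
```

A tracking loop would call `predict`, `update`, `posterior_mean` and `resample` once per frame, in that order, with `rng = np.random.default_rng()`.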

5
Visual Tracker model acquisition
  • Multi-view detection on hot spots in the
    panoramic ceiling camera view
      • generate hypotheses that carry different
        positions and heights
      • score them using a 3D shape model and contour
        likelihood
      • accept a detection if its likelihood is above a
        threshold
  • Parameters to be stored: target height and body
    part histograms for front, side and back views
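
The hypothesise-score-accept loop described above might look like the sketch below. The scoring function stands in for the 3D shape model and contour likelihood, which the slides do not detail; `detect_at_hot_spot`, the position offsets and the candidate heights are hypothetical.

```python
from itertools import product


def detect_at_hot_spot(hot_spot_xy, score_fn, threshold,
                       offsets=(-0.1, 0.0, 0.1),
                       heights=(1.5, 1.6, 1.7, 1.8, 1.9)):
    """Return the best (x, y, height) hypothesis and its score, or None.

    hot_spot_xy: (x, y) floor position of a hot spot, in metres
    score_fn:    callable giving the likelihood of one (x, y, height) hypothesis;
                 a placeholder for the shape and contour model
    threshold:   minimum likelihood for a detection to be accepted
    """
    best = None
    for dx, dy, height in product(offsets, offsets, heights):
        hypothesis = (hot_spot_xy[0] + dx, hot_spot_xy[1] + dy, height)
        score = score_fn(hypothesis)
        if best is None or score > best[1]:
            best = (hypothesis, score)
    # Accept only if the best hypothesis clears the threshold.
    return best if best[1] > threshold else None
```

An accepted hypothesis supplies the target height to store; the body part histograms would then be extracted from the camera views at that position.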

6
Video tracker results
  • Occlusions were not a major concern for tracking
  • Detection is very challenging; further work is
    needed on
      • model adaptation
      • non-instantaneous acquisition
  • Full integration of ceiling camera → flexible
    management of target model

7
Audio-Video Tracking
  • Fusion of single modality outputs
  • It takes the visual track closest to the acoustic
    track
  • It suffers from the weaknesses of both systems
  • No time to devise a real multi-modal tracker.
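
The selection rule described here, taking the visual track closest to the acoustic track, can be written in a few lines. The data layout (a dictionary of track ids to floor positions) and the function name `fuse` are assumptions made for the sketch.

```python
import math


def fuse(acoustic_xy, visual_tracks):
    """Pick the visual track nearest to the acoustic estimate.

    acoustic_xy:   (x, y) from the audio tracker, or None when nobody speaks
    visual_tracks: dict mapping track id -> (x, y) from the video tracker
    Returns (track_id, (x, y)) or None if either modality has no output.
    """
    if acoustic_xy is None or not visual_tracks:
        return None
    track_id = min(visual_tracks,
                   key=lambda t: math.dist(visual_tracks[t], acoustic_xy))
    return track_id, visual_tracks[track_id]
```

Because the output is always one of the visual tracks, a wrong acoustic estimate selects the wrong person and a missing visual track leaves nothing to select, which is how the weaknesses of both systems carry over.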

8
  • Thank you, questions?