Multi-Modal Fusion

1
Multi-Modal Fusion
  • Athanasios Noulas
  • Ben Kröse
  • Intelligent Systems Laboratory
  • University of Amsterdam

2
Presentation Layout
  • Introduction
  • Problem Description
  • Multi-Modal Person Model
  • Audio
  • Video
  • Fusion
  • Learning & Inference
  • Probabilistic Framework
  • Learning & Inference
  • Results & Contributions

3
Introduction
  • My name is Athanasios Noulas
  • PhD Student in I.A.S. group
  • Work on multi-modal fusion in information streams
  • More info: http://www.noulo.net

4
Problem Description
  • Focus on audio-visual fusion
  • Scenario: speaker diarization

5
Multi-Modal Person Models
  • A person generates cues
  • Audio Modality (voice)
  • Video Modality (face)
  • Joint Audio-Visual Space (correlations)

6
Audio Modality
  • Person audio state
  • Generates audio observation vectors
  • Modeled as a Gaussian Mixture Model
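
A minimal sketch of how an observation vector can be scored under a Gaussian Mixture Model. The diagonal covariances and the function and parameter names are assumptions for illustration; the slides do not state the mixture structure the authors used.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log p(x) under a diagonal-covariance Gaussian mixture (assumed form).

    x: (D,) observation, weights: (K,), means: (K, D), variances: (K, D).
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Log density of x under each component N(mean_k, diag(var_k)).
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_exp = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    log_comp = np.log(weights) + log_norm + log_exp

    # Log-sum-exp over components for numerical stability.
    m = np.max(log_comp)
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

Working in the log domain keeps the score usable when it is later combined with transition probabilities over many frames.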

7
Audio Modality
  • Person audio state
  • Evolves over time based on a person-specific
    transition table
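
As a rough illustration, a person-specific transition table can be read as a row-stochastic matrix that drives a Markov chain over the audio state. The two-state example (0 = silent, 1 = speaking) and all names below are hypothetical; the slides do not give the state space or the table values.

```python
import numpy as np

def sample_state_sequence(transition, initial, length, rng):
    """Sample a state path from a first-order Markov chain.

    transition[i, j] = P(state_t = j | state_{t-1} = i); each row sums to 1.
    initial[i] = P(state_0 = i).
    """
    transition = np.asarray(transition, dtype=float)
    initial = np.asarray(initial, dtype=float)
    states = np.empty(length, dtype=int)
    states[0] = rng.choice(len(initial), p=initial)
    for t in range(1, length):
        # The next state depends only on the previous state's row.
        states[t] = rng.choice(transition.shape[1], p=transition[states[t - 1]])
    return states

# Hypothetical table for one person: states tend to persist over time.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])
seq = sample_state_sequence(A, pi, 20, np.random.default_rng(0))
```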

8
Audio Modality
  • Multi-Speaker Situations
  • Assuming independence between different person
    states yields [equation not preserved in the transcript]
  • However, we cannot define product factors for
    [expression not preserved in the transcript]
9
Audio Modality
  • We use a distribution over the number of speakers
    whose parameters are learned off-line [1,2]

10
Video Modality
  • Person video state
  • Generates video observation vectors
  • Modeled as a binary histogram over a codebook
  • Faces detected using the Viola-Jones (VJ) face detector [3]
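
One way to picture a binary histogram over a codebook: each local feature vector from a face window is assigned to its nearest codeword, and the histogram records only whether each codeword occurred. The nearest-neighbour assignment, the Euclidean distance, and the names below are assumptions; the slides do not say how the features or the codebook are built.

```python
import numpy as np

def binary_histogram(features, codebook):
    """Return a 0/1 vector marking which codewords occur in `features`.

    features: (N, D) local descriptors, codebook: (K, D) codewords.
    """
    features = np.atleast_2d(np.asarray(features, dtype=float))
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every feature to every codeword: (N, K).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.zeros(len(codebook), dtype=int)
    hist[nearest] = 1  # presence only, not counts
    return hist
```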

11
Video Modality
  • Person video state
  • Evolves over time based on a person-specific
    transition table

12
Video Modality
  • Multi-camera, multi-person situations require
    dealing with false detections and missed face
    windows in a dummy variable W1

13
Fusion
  • Exploit the correlations between the two
    modalities
  • Extract informative features in terms of Mutual
    Information
  • Intuitively, mutual information is a quantity that
    measures how much one random variable tells us
    about another

14
Mutual Information
  • In terms of Entropy
  • If X, Y are considered Random Variables sampled
    from a Gaussian
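
The formulas from this slide are not in the transcript. For reference, the standard textbook identities are given here; they may differ in notation from what the slide showed. The symbols $\Sigma_X$, $\Sigma_Y$, $\Sigma$, and $\rho$ are introduced here for illustration.

```latex
% Mutual information in terms of entropy
I(X;Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y)

% Jointly Gaussian X, Y with joint covariance
% \Sigma = \begin{pmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{pmatrix}
I(X;Y) = \frac{1}{2} \log \frac{|\Sigma_X| \, |\Sigma_Y|}{|\Sigma|}

% Scalar case with correlation coefficient \rho
I(X;Y) = -\frac{1}{2} \log\left(1 - \rho^2\right)
```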

15
Mutual Information
  • Let X be a pixel's value over 5 frames, and Y the
    corresponding average acoustic energy
  • Two features, comparing regions 1 and 2 in Jt4
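
A hedged sketch of estimating mutual information between a visual signal and an audio signal under the Gaussian assumption, using sample covariances. How the authors window the pixel values and the acoustic energy is not given in the transcript, so the inputs here are simply paired samples, and the function name is hypothetical.

```python
import numpy as np

def gaussian_mutual_information(x, y):
    """Estimate I(X;Y) in nats, assuming X and Y are jointly Gaussian.

    x: (N,) or (N, dx) samples, y: (N,) or (N, dy) paired samples.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    dx = x.shape[1]
    # Joint sample covariance; the marginal blocks are on its diagonal.
    cov = np.atleast_2d(np.cov(np.hstack([x, y]), rowvar=False))
    _, logdet_x = np.linalg.slogdet(cov[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(cov[dx:, dx:])
    _, logdet_xy = np.linalg.slogdet(cov)
    # I = 0.5 * log(|Sigma_X| |Sigma_Y| / |Sigma|)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)
```

A pixel whose intensity varies with the acoustic energy (for example, one in the mouth region of the active speaker) would score higher than a background pixel, which is what makes the quantity useful for selecting informative features.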

16
Probabilistic Fusion
  • Final Model

17
Learning & Inference
  • We learn the person models directly from the test
    data with an adjusted version of the Baum-Welch
    Algorithm
  • We perform off-line inference with the Viterbi
    algorithm
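
For orientation, the textbook Viterbi recursion for a single hidden Markov chain can be sketched as follows. The authors' model is a larger dynamic Bayesian network and their Baum-Welch variant is not described in the transcript, so this is only the generic algorithm under simplified assumptions, with hypothetical argument names.

```python
import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """Most likely state path for an HMM, computed in the log domain.

    log_init: (S,) log P(state_0)
    log_trans: (S, S) log P(state_t = j | state_{t-1} = i) at [i, j]
    log_obs: (T, S) log P(observation_t | state_t = s)
    """
    T, S = log_obs.shape
    delta = np.empty((T, S))            # best log score ending in each state
    back = np.zeros((T, S), dtype=int)  # best predecessor for each state
    delta[0] = log_init + log_obs[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # rows: from, cols: to
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_obs[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path
```

Because the backward trace needs the whole sequence, this form is inherently off-line, consistent with the slide's wording.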

18-23
Results
  • [Six slides of result figures; no text survives in the transcript]

24
Contributions
  • State-of-the-art results in terms of both
    accuracy and precision [1,2,4,5]
  • Dealing with multi-speaker recordings in a
    principled way [1,2]
  • Accounting for misdetection in the video modality [1]
  • Learning the model parameters efficiently by adapting
    the Cross-Entropy method to DBNs [6]
  • On-line approximation suitable for HCI [5]

25
Citations
  • [1] A. K. Noulas, G. Englebienne, B. Kröse,
    "Multi-Modal Speaker Diarization", IEEE
    Transactions on Audio and Speech Processing,
    Special Issue on Multi-Modal Speech Processing,
    2008, submitted.
  • [2] A. K. Noulas, G. Englebienne, B. Kröse,
    "Generative Modeling of Speaker Diarization",
    MLMI 08, submitted.
  • [3] P. Viola and M. Jones, "Robust real-time
    object detection", Technical Report 2001/01,
    Compaq CRL, February 2001.
  • [4] A. K. Noulas, B. Kröse, "EM detection of
    multi-modal cues", ICMI 06.
  • [5] A. K. Noulas, B. Kröse, "On-line Speaker
    Diarization", ICMI 07.
  • [6] A. K. Noulas, B. Kröse, "Cross Entropy for
    learning in DBNs", MLMI 07.