A Novel Approach for Recognizing Auditory Events

1
A Novel Approach for Recognizing Auditory Events and Scenes
  • Ashish Kapoor

2
Problem Description
  • How can we represent arbitrary environments, so
    that we can:
    • Label scene elements
    • Classify environments
    • Synthesize environmental sounds
  • Example: Coffee Shop
    • Basic spectral texture
    • Glasses clinking, doors opening, etc.

3
Outline of Our Approach
  • Create a palette of sounds
    • Epitomes (Jojic et al.) for audio
  • Given an audio segment:
    • Generate distributions over the palette
    • Use the distribution for classification,
      detection, etc.

4
Representation: Palette of Sounds
[Figure labels: World, Palette, Features To Represent Audio, Input Audio]
5
Epitomes for images
  • Epitome
  • Jojic, Frey, and Kannan, ICCV 2003
  • Developed for images

6
Epitomes for Audio
  • 1-D signal
  • 2-D representation, but little vertical
    self-similarity
  • Lots of redundancy (silence, repeated background)
  • Much longer inputs, bigger ratio of input to
    epitome size
  • Hours of data -> 10-30 second epitome

7
Informative Sampling of Patches
  • Original epitome: take patches at random
  • Our approach: try to maximize coverage
    • Reduce the sampling likelihood of patches
      similar to those we have already covered

[Figure: axes f and t; probability of patch selection plotted over t]
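
The coverage idea on this slide can be sketched as a weighted sampling loop. This is a minimal illustration, not the method from the talk: the function name `informative_patch_sampling`, the Gaussian similarity kernel, and the `bandwidth` parameter are all assumptions made for the example. Each time a patch is picked, the selection weight of every candidate is scaled down by its similarity to the picked patch.

```python
import math
import random


def informative_patch_sampling(frames, patch_len, n_patches, bandwidth=1.0, seed=0):
    """Pick patch start indices, down-weighting patches similar to earlier picks.

    frames: list of per-frame feature vectors (lists of floats).
    The Gaussian similarity kernel and `bandwidth` are illustrative assumptions.
    """
    rng = random.Random(seed)
    starts = list(range(len(frames) - patch_len + 1))
    # Flatten every candidate patch into one vector.
    patches = [[x for frame in frames[s:s + patch_len] for x in frame] for s in starts]
    weights = [1.0] * len(patches)
    chosen = []
    for _ in range(n_patches):
        if sum(weights) < 1e-12:
            break  # everything is already covered
        pick = rng.choices(starts, weights=weights, k=1)[0]
        chosen.append(pick)
        for i, patch in enumerate(patches):
            # Mean squared distance between candidate i and the picked patch.
            d2 = sum((a - b) ** 2 for a, b in zip(patch, patches[pick])) / len(patch)
            similarity = math.exp(-d2 / (2.0 * bandwidth ** 2))
            # Similar patches become less likely; identical ones drop to zero.
            weights[i] *= 1.0 - similarity
    return chosen
```

With uniform weights and no update step this reduces to the random sampling of the original epitome; the update is what pushes later picks toward parts of the signal not yet represented.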
8
Example: Toy Sequence
600 frame (10 sec) epitome from 3700 frames (2 min)
[Figure panels: Informative Sampling, Random Sampling]
9
Random vs. Informative
  • Simulation on the toy dataset
  • 2-second-long epitome
  • Likelihood vs. number of patches
  • Averaged over 10 runs

10
Example: Outdoor Sequence
1800 frame (1 min) epitome from 15000 frames (8 min)
11
Classification of Events/Scenes
  • Look at distributions over the epitome
  • Given an audio segment to classify:
    • For all the patches in the audio, recover the
      transformations given the epitome
    • Look at the distribution of the transformations
      to classify

[Figure: P(T | e, c1) for Speech and P(T | e, c2) for Cars, compared with P(T | e, c) for the segment c being classified (???)]
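
A rough sketch of this classification step follows, under simplifying assumptions that are not from the talk: each patch is hard-assigned to its best-matching epitome position (squared error) instead of using a full posterior over transformations, and classes are compared by KL divergence between smoothed histograms. The names `position_histogram` and `classify` are hypothetical.

```python
import math


def position_histogram(frames, epitome, patch_len, smoothing=1.0):
    """Histogram of best-matching epitome positions for all patches in `frames`.

    Hard assignment by squared error is an assumption made for brevity.
    """
    n_pos = len(epitome) - patch_len + 1
    counts = [smoothing] * n_pos  # additive smoothing keeps every bin non-zero
    for s in range(len(frames) - patch_len + 1):
        errors = [
            sum((a - b) ** 2
                for fa, fb in zip(frames[s:s + patch_len], epitome[p:p + patch_len])
                for a, b in zip(fa, fb))
            for p in range(n_pos)
        ]
        counts[errors.index(min(errors))] += 1
    total = sum(counts)
    return [c / total for c in counts]


def classify(frames, epitome, class_hists, patch_len):
    """Return the class whose histogram is closest (in KL divergence) to the segment's."""
    h = position_histogram(frames, epitome, patch_len)

    def kl(q):
        return sum(hi * math.log(hi / qi) for hi, qi in zip(h, q))

    return min(class_hists, key=lambda c: kl(class_hists[c]))
```

Here `class_hists` would be built by running `position_histogram` on the labeled examples of each class against the same epitome.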
12
Experiments
  • 3 Different Environments
    • Highway, Kitchen, Outdoor Parking
    • 6 minutes of data to train a 30-second-long epitome
  • 4 Events to Detect (manually segmented)
    • Speech (22 examples)
    • Car (17 examples)
    • Utensil: knife chopping vegetables (29 examples)
    • Bird Chirp (24 examples)
    • None of the above (30 examples)

13
[Figure panels: Car, Speech, Knife/Utensil, Chirp]
14
Detection Example
  • Speech Detection (hard case)
  • Very noisy environment (148th Ave)
  • Only 5 labeled examples of speech

15
Performance Comparison
  • Mixture of Gaussians
    • For each audio segment to classify:
      • Classify every frame using the mixture
      • Vote among the results
  • Nearest Neighbor
    • Same method as for mixture of Gaussians
    • Computationally too expensive!
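
The frame-level voting baseline can be illustrated as below. This sketch assumes the per-class mixtures have already been fit and are diagonal-covariance, neither of which the slides specify; the `(weight, means, variances)` tuple layout and the function names are hypothetical.

```python
import math
from collections import Counter


def log_gmm(frame, mixture):
    """Log-likelihood of one frame under a diagonal-covariance Gaussian mixture.

    mixture: list of (weight, means, variances) tuples (an assumed layout).
    """
    logs = []
    for weight, means, variances in mixture:
        ll = math.log(weight)
        for x, m, v in zip(frame, means, variances):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        logs.append(ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))


def classify_by_vote(frames, class_mixtures):
    """Label each frame with its most likely class, then take a majority vote."""
    votes = Counter(
        max(class_mixtures, key=lambda c: log_gmm(frame, class_mixtures[c]))
        for frame in frames
    )
    return votes.most_common(1)[0][0]
```

A nearest-neighbor variant would replace the per-frame `max` over mixtures with a search over all labeled training frames, which is where the cost noted on the slide comes from.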

16
Performance vs. Amount of Training Data
17
[Figure panels: Knife/Utensil, Speech, Car, Chirp]
18
Contributions
  • Framework for Acoustic Event Detection and Scene
    Classification
  • Epitomes for Audio
  • Informative Sampling (Can be applied to any
    domain)
  • Distributions over epitomic indexes for
    discrimination

19
Future Work
  • Informative Sampling
  • Maximizing the Minimum Likelihood
  • Discriminative Epitomes
  • Novel Scene Classification
  • Rich Representation using Epitomes
  • Boosting, other ensemble techniques
  • Hierarchical Acoustic Sound Analysis
  • Same model for acoustic event detection, scene
    classification, and synthesis
  • Clustering mechanisms for scene retrieval

20
Acknowledgments
  • Sumit Basu
  • Nebojsa Jojic
  • My friends and fellow interns