Interactive Event Detection in Video and Audio

1
Interactive Event Detection in Video and Audio
  • Rahul Sukthankar, Intel Research Pittsburgh &
    Carnegie Mellon University

2
Contributors
  • Diamond team: L. Huston, Satya, L. Mummert, C.
    Helfrich, L. Fix
  • Forensic video retrieval: J. Campbell, P. Pillai,
    Diamond team
  • Volumetric video analysis: Y. Ke, M. Hebert
  • Sound object detection in soundtracks: D. Hoiem,
    Y. Ke
  • Interactive search-assisted diagnosis for breast
    cancer: Y. Liu, R. Jin, B. Zheng, D. Jukic

3
Why Interactive Event Detection?
  • Events of interest are often not known a priori
      • Data exploration: "find me more things like this"
  • Users' requirements change based on partial
    results
      • Surveillance: "Alert me if you see X ... hmm,
        actually I want Y"
  • Challenges
      • Limited training data
          • can we still learn good event detectors?
      • Efficiency
          • how best to organize/index/pre-process the data?

4
Outline
  • Event detection in audio
      • sound object detection from a few examples
  • Diamond
      • efficient search of non-indexed data
  • Event detection in video
      • forensic video surveillance
      • volumetric analysis for action detection

5
Example: Sound Object Detection
  • Applications of sound object detection
      • "Alert me if you hear a gunshot." (monitoring)
      • "Fast forward to the next swordfight in LotR"
        (search and retrieval)
  • Approach
      • Learn boosted classifier from 5-10 examples of
        the object
      • Scan windowed classifier over all possible
        locations

[Figure: the audio stream is split into clips 1 to N; the clip classifier
classifies each clip as object or non-object and returns the locations of
the detected sound object.]
D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005
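
The scanning step on this slide can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `scan_stream`, the window and hop sizes, and the energy-threshold stand-in `loud_clip` are assumptions made for the example. In the setting of the talk, the clip classifier would be the learned boosted classifier.

```python
from typing import Callable, List, Sequence, Tuple


def scan_stream(
    samples: Sequence[float],
    clip_classifier: Callable[[Sequence[float]], bool],
    window: int,
    hop: int,
) -> List[Tuple[int, int]]:
    """Slide a fixed-length window over the stream and return the
    (start, end) sample ranges that the classifier labels as the object."""
    detections = []
    for start in range(0, len(samples) - window + 1, hop):
        clip = samples[start:start + window]
        if clip_classifier(clip):
            detections.append((start, start + window))
    return detections


def loud_clip(clip: Sequence[float]) -> bool:
    """Hypothetical stand-in for a learned classifier: high mean energy."""
    return sum(x * x for x in clip) / len(clip) > 0.5


# Toy stream: silence, a short loud burst, silence.
stream = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = scan_stream(stream, loud_clip, window=4, hop=2)
print(hits)
```

A smaller hop gives finer localization at the cost of more classifier evaluations.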
6
Sound Object Detection: Clip Classifier
  • Feature extraction
  • Weak classifiers: small decision trees on
    features
  • Learn classifier cascade using AdaBoost

D. Hoiem, Y. Ke, R. Sukthankar, ICASSP 2005
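
To make the boosting step concrete, here is a minimal sketch of discrete AdaBoost over depth-1 decision trees (stumps). It is a simplification under stated assumptions: it trains a single boosted classifier rather than the cascade named on the slide, the toy one-dimensional data stands in for real audio features, and all function names are made up for this example.

```python
import numpy as np


def stump_predict(X, feature, threshold, polarity):
    """Depth-1 tree: +1 on one side of the threshold, -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, np.inf
    for feature in range(X.shape[1]):
        column = X[:, feature]
        # One extra threshold above the maximum allows a constant stump.
        thresholds = np.append(np.unique(column), column.max() + 1.0)
        for threshold in thresholds:
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (feature, threshold, polarity), err
    return best, best_err


def adaboost(X, y, rounds):
    """Discrete AdaBoost; labels y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        (feature, threshold, polarity), err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
        pred = stump_predict(X, feature, threshold, polarity)
        w = w * np.exp(-alpha * y * pred)      # upweight the mistakes
        w = w / w.sum()
        ensemble.append((alpha, feature, threshold, polarity))
    return ensemble


def predict(ensemble, X):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(score >= 0, 1, -1)


# Toy problem no single stump can solve: positives lie in the middle.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, 1, 1, -1, -1])
model = adaboost(X, y, rounds=3)
print(predict(model, X))
```

On this toy data one stump alone misclassifies two points, while three boosting rounds separate the classes.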
7
Sound Object Detection: Results
                 stage 1     stage 2     stage 3
               pos   neg   pos   neg   pos   neg
meow           0.0   1.4   0.0   1.2   2.2   0.8
phone          0.0   0.4   4.3   0.1   5.9   0.0
car horn       0.0   3.9   0.6   2.2   3.6   1.3
door bell      1.4   2.1   2.1   0.4   6.3   0.1
swords         6.1   1.3   6.7   0.1   6.7   0.0
scream         0.3   5.5   2.7   1.4   5.3   1.1
dog bark       0.7   1.0   6.0   0.3   7.7   0.2
laser gun      0.0   6.8   4.4   5.1   6.7   0.9
explosion      4.1   5.2   7.5   1.5  12.0   0.5
light saber    4.8   6.8   9.7   1.0  13.9   0.2
gunshot        8.1   6.1  12.5   2.3  14.5   1.1
close door     7.9   7.8  14.5   4.8  17.6   2.3
male laugh     4.3  14.7   9.5   9.7  13.3   7.0
average        2.9   4.4   6.0   2.2   8.5   1.1
8
Framework for Interactive Event Detection
  • Interactive event detection → non-indexed
    search
  • Search and indexing
      • If queries can be predicted in advance, indexing
        is possible (e.g., Google for text data)
      • The alternative is brute-force search through
        non-indexed data
  • How to perform efficient non-indexed search?
      • May need to execute arbitrary code (learned event
        detector)

9
Brute-Force Search
  • Event detection: the vast majority of the data is
    useless
  • Brute-force search scales poorly with storage volume

[Figure: diagram with Search app, Storage, and User]
10
Diamond: Early Discard
  • Reject as close to storage as possible
  • Reduce volume of data transferred
  • Scales much better!

[Figure: diagram with Search app, Storage, and User]
11
Diamond Architecture
[Figure: Diamond architecture diagram. Labels: App Code (proprietary or
open); Diamond API (open); Diamond code (open); Storage access protocol
(open); and, repeated three times, Storage Runtime, Filter API, Searchlet,
Assoc DMA.]
Diamond is a collaborative project between Intel Research and CMU.
12
Anatomy of a Diamond Searchlet
  • Sequence of partially-ordered filters
  • each filter can pass or drop an object
  • filters share state through attributes
  • Diamond determines an optimal filter order
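
The searchlet idea can be illustrated with a small sketch. Everything here is hypothetical: the `Filter` class, the `cost` and `pass_rate` estimates, and the greedy ordering rule (cheapest cost per discarded object first, subject to dependencies) are assumptions chosen for the example. They are not the Diamond Filter API and not necessarily how Diamond orders filters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Set


@dataclass
class Filter:
    name: str
    run: Callable[[dict, dict], bool]  # (object, shared attributes) -> pass?
    cost: float                        # assumed average cost per object
    pass_rate: float                   # assumed fraction of objects passed
    needs: Set[str] = field(default_factory=set)  # filters that must run first


def order_filters(filters: List[Filter]) -> List[Filter]:
    """Greedy order that respects `needs` and prefers cheap, selective filters."""
    done: Set[str] = set()
    remaining = list(filters)
    ordered = []
    while remaining:
        ready = [f for f in remaining if f.needs <= done]
        if not ready:
            raise ValueError("cyclic filter dependencies")
        best = min(ready, key=lambda f: f.cost / max(1.0 - f.pass_rate, 1e-9))
        ordered.append(best)
        done.add(best.name)
        remaining.remove(best)
    return ordered


def run_searchlet(filters: List[Filter], objects: Iterable[dict]) -> Iterator[dict]:
    """Yield objects that pass every filter; drop at the first failure."""
    ordered = order_filters(filters)
    for obj in objects:
        attrs: Dict[str, object] = {}  # state shared between filters
        if all(f.run(obj, attrs) for f in ordered):  # all() short-circuits
            yield obj


def decode(obj, attrs):
    attrs["brightness"] = sum(obj["pixels"]) / len(obj["pixels"])
    return True


def is_bright(obj, attrs):
    return attrs["brightness"] > 0.5


def is_recent(obj, attrs):
    return obj["year"] >= 2004


filters = [
    Filter("bright", is_bright, cost=1.0, pass_rate=0.3, needs={"decode"}),
    Filter("decode", decode, cost=5.0, pass_rate=1.0),
    Filter("recent", is_recent, cost=0.1, pass_rate=0.5),
]
objects = [
    {"id": 1, "year": 2003, "pixels": [0.9, 0.8]},
    {"id": 2, "year": 2005, "pixels": [0.1, 0.2]},
    {"id": 3, "year": 2005, "pixels": [0.9, 0.7]},
]
order = [f.name for f in order_filters(filters)]
matches = [o["id"] for o in run_searchlet(filters, objects)]
print(order, matches)
```

Because the cheap `recent` filter runs first, object 1 is dropped before the expensive `decode` step is ever executed, which is the point of early discard.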

13
Example Application: Forensic Video Surveillance
  • Timely reconstruction of a crime scene
      • large quantities of video surveillance data
      • current practice: gather and manually scan video
        tapes
      • obvious optimization: transfer data to a central
        site
  • Better solution: send your detector to the data

J. Campbell et al., VSSN 2004
14
Video Action Detection: Goal
15
Idea: Treat Video as a Volume
16
Related work: Recognition using SVMs on
Space-Time Interest Points
[Figure: space-time interest points]
Figures courtesy of Schuldt et al., ICPR 2004
17
Problem with Space-Time Interest Points: Too
Sparse
Two examples of smooth motions where no stable
space-time interest points are detected.
18
Problem with Space-Time Interest Points:
Dependent on Lighting Conditions
19
Volumetric Features on Optical Flow
20
Our Features: 3D Extension of Viola-Jones
[Figure: volumetric features and an integral volume over (x, y, t)]
Volumetric features can be efficiently computed
using integral volumes, with only 8 memory
accesses per feature. The sum of the volume is
e - a - f - g + b + c + h - d, where a to h are the
integral-volume values at the eight corners of the box.
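
The eight-lookup box sum can be sketched with NumPy as below. This is an illustrative reconstruction, not the authors' code: the function names and the (x, y, t) axis order are assumptions, and since the figure is missing, the mapping of the letters a to h onto specific corners is not shown. The sign pattern is the standard 3D inclusion-exclusion: one positive far corner, three negative, three positive, one negative near corner.

```python
import numpy as np


def integral_volume(video: np.ndarray) -> np.ndarray:
    """Cumulative sum along x, y, and t, with a zero border so that
    iv[x, y, t] equals video[:x, :y, :t].sum()."""
    iv = video.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return np.pad(iv, ((1, 0), (1, 0), (1, 0)), mode="constant")


def box_sum(iv, x0, y0, t0, x1, y1, t1):
    """Sum of video[x0:x1, y0:y1, t0:t1] using eight memory accesses."""
    return (
        iv[x1, y1, t1]
        - iv[x0, y1, t1] - iv[x1, y0, t1] - iv[x1, y1, t0]
        + iv[x0, y0, t1] + iv[x0, y1, t0] + iv[x1, y0, t0]
        - iv[x0, y0, t0]
    )


# Toy "video": integer values so the comparison below is exact.
rng = np.random.default_rng(0)
video = rng.integers(0, 10, size=(5, 6, 7))
iv = integral_volume(video)
fast = box_sum(iv, 1, 2, 3, 4, 5, 6)
slow = video[1:4, 2:5, 3:6].sum()
print(fast, slow)
```

A Viola-Jones-style volumetric feature would then be a difference of two or more such box sums, so its cost does not depend on the size of the boxes.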
21
Classifier cascade learned using Direct Feature
Selection (Wu et al., NIPS, 2002)
Millions of potential features for selection, so
AdaBoost is too slow.
An example of the features learned by the
classifier to recognize the hand-wave action in a
detection volume
22
Detection
  • Use a sliding volume over the video sequence
  • Model a true event as a cluster of detections with
    a Gaussian distribution.

23
Generic Volumetric Features
  • Processing non-indexed video is slow: lots of
    data
  • Are there application-independent representations
    for video?
  • Goal: pre-process video once, support multiple
    video event apps.

Y. Ke, unpublished 2006
24
Related work: Space-Time Behavior Based
Correlation
Figures courtesy of Shechtman & Irani, CVPR 2005
25
Interactive Search-Assisted Diagnosis
[Figure: ISAD results. Query: suspicious mass. Rank 1: benign biopsy;
Rank 2: benign biopsy; Rank 3: malignant biopsy. Additional label: "CLOSE?"]
Collaborators: B. Zheng, D. Jukic, L. Yang, R. Jin
26
Query-adaptive Local Distance Learning
  • Previously
      • Various Lp norms: Euclidean distance is typically
        not the best
      • Global metric learning
          • Learn metric that best satisfies user-given
            pairwise data constraints
          • Fares poorly with multimodal data
      • Local metric learning
          • Learn metric that does the above, but weighs
            nearby constraints higher
          • Chicken-and-egg problem
  • What's new
      • Learn a metric for the given query based on its
        neighborhood

27
Summary
  • Many real applications require interactive event
    detection
  • Good for ML algorithms that
      • operate with limited training data
      • train quickly/incrementally
      • exploit unlabeled data
  • Diamond: infrastructure for efficient
    non-indexed search
      • http://diamond.cs.cmu.edu/
  • Interactive event detection in video is still
    painful
      • Good general-purpose representation for event
        detection?