Title: Automated Video Event Detection and Classification (AVEDaC)
1. Automated Video Event Detection and Classification (AVEDaC)
2. Data Collection
TIBURON and VENTANA
[Photos of the two ROVs, from http://www.mbari.org/muse/platforms/ventana.htm and http://www.mbari.org/dmo/vessels/tiburon.html]
3. Acquiring Video
Broadcast-quality video from ROV-mounted cameras
- ROV Ventana
  - 1989 to present
  - Max depth 1,850 m
  - Cameras: Sony DXC-3000 (510x492 (h/v) pixels, 3-CCD RGB), Panasonic WV-E550 (752x592 (h/v) pixels, 3-CCD RGB), Sony HDC-750 (960x600 (h/v) pixels, 3-CCD RGB)
- ROV Tiburon
  - 1998 to present
  - Max depth > 4,000 m
4. Tiburon science camera and lights
Science camera: Panasonic WV-E550 3-chip, 752x592 (h/v) pixel, 3-CCD RGB
Lighting: DeepSea Power & Light HMI lamps, daylight color temperature (5,600 K)
- 2 x 400 W HMI, fixed
- 2 x 400 W HMI on pan/tilts
- 2 x 400 W HMI, optional
5. Problem we are trying to solve
- MBARI ROVs have proven that high-quality video is a useful quantitative scientific instrument for ocean ecology research.
- BUT today, analyzing and interpreting science video is a labor-intensive human activity. The questions we ask are limited by the time and talent it takes to do the detailed analysis.
- Scaling from 2 ROVs to multiple AUVs to 100s of observatory cameras cries out for automated video analysis.
6. ROV video processing
- MBARI collects 300 days per year of video from its ROVs: about 1,000 tapes, 1,000 hours.
- 16,000 tapes, 12,000 hours of undersea video from broadcast-quality cameras.
- Need to enable integration of results over many dives and many years. Over 1,000,000 total annotations in the database (100 annotations/hour).
- Annotating video is time-consuming and tedious, especially quantitative annotation. Can we supply tools to make the analysts more productive? Can we do automated annotation (at least for some things)?
- Can we build systems for real-time analysis and event response at sea?
7. Automated analysis flow
[Flow diagram] Video collected by the ROV travels as interlaced SDI over fiber to a Sony Digital BetaCAM recorder; a capture-control stage feeds the detection and classification steps, which run on a Beowulf cluster with GB Ethernet between the nodes.
8. SDI: Serial Digital Interface
- The Society of Motion Picture and Television Engineers (SMPTE) has defined a family of interfaces called the serial digital interface (SDI) for transmission of data between video equipment. It is a widely used interconnect mechanism in video production facilities and studios. Variations of SDI have been defined for different data rates and data formats.
- 270 megabits per second (Mbps), full duplex: standard-definition (SD) SDI, as defined by SMPTE 259M-1997, 10-Bit 4:2:2 Component Serial Digital Interface
- 1.485/1.4835 gigabits per second (Gbps), full duplex: high-definition (HD) SDI, as defined by SMPTE 292M-1998, Bit-Serial Digital Interface for High Definition Television Systems
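The SD rate can be sanity-checked from the 4:2:2 sampling structure. Below is a minimal Python calculation using the standard SD component sampling rates (13.5 MHz for luma, 6.75 MHz for each chroma channel) and 10 bits per sample; the constants come from the SMPTE standards, and the snippet is only a worked example, not anything from the AVED pipeline.

# Where the 270 Mbps SD-SDI rate comes from (SMPTE 259M):
# 4:2:2 component video samples luma at 13.5 MHz and each of the
# two chroma channels at half that rate; every sample is 10 bits.
luma_rate_hz = 13.5e6                 # Y samples per second
chroma_rate_hz = 6.75e6               # Cb (and Cr) samples per second
bits_per_sample = 10

total_samples = luma_rate_hz + 2 * chroma_rate_hz  # 27 Msamples/s
bit_rate = total_samples * bits_per_sample         # bits per second

print(f"SD-SDI serial rate: {bit_rate / 1e6:.0f} Mbps")  # -> 270 Mbps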
9. Approach to the problem
- We know that humans can detect and identify animals in the underwater video. How do humans do it? Are there clues from neurobiology?
- Starting in 1999: surveyed machine vision and artificial intelligence algorithms for natural-scene video image analysis
- 2001-2002: zeroed in on a Neuromorphic Engineering approach to artificial vision
- 2002-now: partnered with research labs (Caltech: Koch, Perona; USC: Itti)
10. Midwater transect video
1529_04_06_56_04.mpeg
11. Core approach
- Saliency model of attention: detecting events in the visual scene (Koch and Itti)
- Early-vision model of classification: analysis of image features to recognize objects (Perona)
12. General outline of processing
- Preprocessing
- Detection
- Tracking
- Classification
- Visualization
13. Can you spot the siphonophores?
14. Annotator could spot the siphonophores
15. Detection
- Look for strong signals in the scene:
  - Color contrast
  - Illumination contrast
  - Edges
- Select the strongest of these signals
  - Midwater bias for edges and color
- The objects yielding the strongest signals are marked as salient (a sketch follows this list)
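To make these cues concrete, here is a minimal Python sketch (not the AVED implementation) that combines an illumination-contrast map and an edge map into a single saliency map and thresholds it. The weights w_int and w_edge stand in for the midwater bias, the 98th-percentile threshold is an assumption, and the color-contrast channel is omitted for brevity.

import numpy as np
from scipy import ndimage

def saliency_map(gray, w_int=1.0, w_edge=2.0):
    """Combine illumination-contrast and edge cues into one map."""
    gray = gray.astype(float)
    # Center-surround illumination contrast: fine blur minus coarse blur.
    center = ndimage.gaussian_filter(gray, sigma=2)
    surround = ndimage.gaussian_filter(gray, sigma=8)
    intensity = np.abs(center - surround)
    # Edge cue: gradient magnitude from Sobel derivatives.
    edges = np.hypot(ndimage.sobel(gray, axis=0), ndimage.sobel(gray, axis=1))
    sal = w_int * intensity + w_edge * edges
    return sal / (sal.max() + 1e-9)        # normalize to [0, 1]

def salient_pixels(sal, frac=0.98):
    """Mark pixels above the given quantile as candidate salient objects."""
    return sal > np.quantile(sal, frac)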
16. Does the algorithm detect animals in the scene?
- Analyzed several hundred video frames from midwater transects
- Animal in the scene?
  - If yes: what was the most salient event marked by the algorithm?
    - Animal?
    - Snow?
17. Detection combined with tracking
- Use the saliency-based attention model
- Track only salient objects
- Keep the number of tracked objects relatively small
- Classify as an event based on persistence and degree of interest
- Problem: low-contrast elongated structures (e.g., siphonophores) are often eclipsed by high-contrast snow
- Solution: enhanced the saliency-based attention model with a model of lateral inhibition
18. Our scene again
19. Orientation filters
20. Feature map with lateral inhibition
With lateral inhibition: a stronger signal for the two faint siphonophores.
For comparison: the feature map without lateral inhibition.
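A toy Python sketch of how lateral inhibition between orientation channels can produce this effect, under the simplifying assumption that each orientation map is inhibited by the mean of the others: roughly isotropic marine snow drives all orientations equally and is suppressed, while an elongated siphonophore drives a single orientation and survives. This illustrates the principle; it is not the actual enhanced model.

import numpy as np
from scipy import ndimage

def gabor_kernel(sigma, theta, wavelength, size=21):
    """Real part of a Gabor filter tuned to orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def orientation_maps(gray, n_orient=4):
    """Filter the frame with Gabors at n_orient evenly spaced angles."""
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    return [np.abs(ndimage.convolve(gray.astype(float),
                                    gabor_kernel(3.0, t, 8.0)))
            for t in thetas]

def laterally_inhibited(maps, strength=1.0):
    """Each orientation map is inhibited by the mean of the others."""
    out = []
    for i, m in enumerate(maps):
        others = np.mean([o for j, o in enumerate(maps) if j != i], axis=0)
        out.append(np.clip(m - strength * others, 0, None))
    return out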
21. Tracking using Kalman filters
- Track based on prediction of the trajectory (optical flow)
- Assign salient objects to trackers
- Manageable since we don't track too many objects at once
22. X and Y measured and estimated
Using two independent Kalman filters for the x and y coordinates, as sketched below.
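A minimal sketch of one such constant-velocity Kalman filter for a single image coordinate; a pair of these tracks x and y independently. The process and measurement noise values q and r are illustrative, not the project's tuned parameters.

import numpy as np

class Kalman1D:
    """Constant-velocity Kalman filter for one image coordinate."""
    def __init__(self, pos0, q=1e-2, r=4.0):
        self.x = np.array([pos0, 0.0])      # state: [position, velocity]
        self.P = np.eye(2) * 100.0          # state covariance
        self.F = np.array([[1.0, 1.0],      # constant-velocity model
                           [0.0, 1.0]])
        self.H = np.array([[1.0, 0.0]])     # we observe position only
        self.Q = np.eye(2) * q              # process noise
        self.R = np.array([[r]])            # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[0]                    # predicted position

    def update(self, z):
        y = z - self.H @ self.x             # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]                    # filtered position

# Track one salient object: one filter per coordinate.
kx, ky = Kalman1D(120.0), Kalman1D(80.0)
for mx, my in [(122.0, 83.0), (125.0, 85.5), (129.0, 88.0)]:
    kx.predict(); ky.predict()
    print(f"estimate: ({kx.update(mx):.1f}, {ky.update(my):.1f})")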
23. The annotated movie
Kalman_clip1.mpeg
Yellow: actual location. Green: Kalman filter estimate.
24. Comparing detected events with annotations
- Analyzed midwater video transects that had been fully annotated
- Asked:
  - What percentage of annotations did the program detect?
  - What percentage did it miss?
  - Did the program detect any animals that the annotator missed?
  - What else did the program detect?
25. Apply to benthic transects
2526_00_18_53_05.results.mpeg
26. Another example
2344_00_15_40_25.results.mpeg
27. Comparison with professional annotation
28. Classification
- Use a classification program developed by Perona's student Marc'Aurelio Ranzato at Caltech and the Università degli Studi di Padova
- Developed to analyze biological particles
- Based on extracting features using:
  - local jets (Schmid et al. 1997): convolution of the image with derivative-of-Gaussian kernels (see the sketch after this list)
  - image and power-spectrum principal components (Torralba et al. 2003)
- Models the training data with a mixture of Gaussians (Choudrey and Roberts 2003)
- Implemented in Matlab
- Processes grayscale square subimages of the segmented scene containing the object to be classified
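A sketch of the local-jet idea from the feature list above: stack the responses of Gaussian derivative filters up to second order into a per-pixel descriptor. The scale sigma and the 64x64 crop size are illustrative assumptions, not the classifier's actual settings.

import numpy as np
from scipy import ndimage

def local_jet(patch, sigma=2.0):
    """Gaussian derivatives up to order 2 for one grayscale patch."""
    patch = patch.astype(float)
    orders = [(0, 0),                  # smoothed image L
              (0, 1), (1, 0),          # first derivatives Lx, Ly
              (0, 2), (1, 1), (2, 0)]  # second derivatives Lxx, Lxy, Lyy
    return np.stack([ndimage.gaussian_filter(patch, sigma, order=o)
                     for o in orders])  # shape: (6, H, W)

# Descriptor for a segmented event image: jet responses at the center.
patch = np.random.rand(64, 64)          # stand-in for a square subimage
features = local_jet(patch)[:, 32, 32]  # 6-dimensional local jet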
29. Sample images
Rathbunaster californicus
other
Parastichopus leukothele
Poeobius meseres
30. Classification: preliminary results
- Analyzed 7.5 minutes of benthic transect data at Smooth Ridge
- Trained the classifier with:
  - 6,000 images,
  - including 2,600 images of Rathbunaster
- Extracted 210 events (7,250 images) from the transect data
- The program classified:
  - 90% of the Rathbunaster events correctly
  - 10% were misclassified
31. Next steps
- Collecting, training on, and analyzing 7 hours of midwater transect video for Poeobius for a seasonal / El Niño event science study.
- Evaluating and improving the classification system.
- Evaluating automatic adjustment of the weights of the low-level detection feature maps from training images of the target of interest (Navalpakkam & Itti, 2004); a sketch of the idea follows this list.
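A sketch of that weight-adjustment idea, reduced to its simplest form: weight each low-level feature map by a signal-to-noise ratio measured on training images in which the target has been masked. This is an illustrative reduction of Navalpakkam & Itti's method, not their exact formulation, and the dict-of-maps layout is an assumption.

import numpy as np

def learn_feature_weights(feature_maps, target_masks):
    """feature_maps: per training image, a dict {feature name: 2-D map};
    target_masks: matching boolean masks marking the target's pixels."""
    weights = {}
    for name in feature_maps[0]:
        snrs = []
        for maps, mask in zip(feature_maps, target_masks):
            fmap = maps[name]
            target = fmap[mask].mean()              # response on target
            background = fmap[~mask].mean() + 1e-9  # response elsewhere
            snrs.append(target / background)
        weights[name] = float(np.mean(snrs))
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}  # normalized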
32. What we have learned
- Collecting, processing, and analyzing video presents unusual challenges:
  - Quantity of data
  - Quality of data
  - Difficulty in ground-truthing the results
- Modern machine vision research products can be applied to real ocean science problems
- Shorten the time to useful results by partnering with the academic research labs developing the research systems
33. Contributors include
Alexis Wilson, Ishbel Kerkez, Mike Risi, Dorothy Oliver, Karen Salamy,
Dirk Walther (Caltech), Danelle Cline, Dan Davis, Rob Sherlock,
Bruce Robison, Nancy Jacobsen Stout, Marc'Aurelio Ranzato (Caltech/NYU),
Laurent Itti (USC), Christof Koch (Caltech), Pietro Perona (Caltech)
34. Sponsors
- David and Lucile Packard Foundation
- NSF Research Coordination Network: Institute for Neuromorphic Engineering