Detecting Pedestrians Using Patterns of Motion and Appearance (Viola & Jones)

1
Detecting Pedestrians Using Patterns of Motion
and Appearance (Viola & Jones)
  • Jasper Snoek

2
Closely Related Work
  • P. Viola & M. Jones, Robust Real-time Object
    Detection, Workshop on Statistical and
    Computational Theories of Vision, July 2001
  • P. Viola & M. Jones, Rapid Object Detection
    Using a Boosted Cascade of Simple Features,
    CVPR, 2001
  • P. Viola & M. Jones, Robust Real-Time Face
    Detection, IJCV, 2003
  • P. Viola, M. Jones & D. Snow, Detecting
    Pedestrians Using Patterns of Motion and
    Appearance, ICCV 2003

3
The Goals
  • Development of a representation of image motion
    which is extremely efficient to compute.
  • Implementation of a state-of-the-art pedestrian
    detection system which operates on low-resolution
    images under difficult conditions.

The Approach
  • Find extremely basic features of the images that
    can be computed very quickly. (Real-time)
  • Get a huge set of features, and then use machine
    learning techniques (AdaBoost) to find the best
    distinguishing features.

4
The Features
  • First, 5 images are created from the original 2
    (I_t and I_{t+1}) to represent motion:
  • Δ, U, D, L, R, by shifting I_t 1 pixel in the
    corresponding direction (e.g. U means up; Δ means
    no shift, i.e. it's the temporal gradient) and
    taking the absolute difference with I_{t+1}.
  • These images represent crude gradients in motion.
    The image shifted in the direction of motion
    lines up best with I_{t+1}, so the sum of its
    pixels will be smaller than the sums of the
    images shifted in the other directions.
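
The construction of the five motion images can be sketched in plain Python. This is only an illustration under stated assumptions: frames are grayscale lists of rows, "up" means toward row 0, and pixels shifted in from outside the frame replicate the nearest edge pixel. The helper names (shift, abs_diff, motion_images, total) are made up for this sketch and do not come from the slides.

```python
def shift(img, dy, dx):
    """Shift img by (dy, dx) pixels; out-of-frame pixels replicate the edge (an assumption)."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]

def abs_diff(a, b):
    """Pixel-wise absolute difference of two equally sized images."""
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def motion_images(frame_t, frame_t1):
    """Shift frame_t by one pixel in each direction and difference it with frame_t1."""
    offsets = {"delta": (0, 0), "U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
    return {name: abs_diff(shift(frame_t, dy, dx), frame_t1)
            for name, (dy, dx) in offsets.items()}

def total(img):
    """Sum of all pixel values in an image."""
    return sum(map(sum, img))

# Toy example: a single bright pixel moves one row up between the two frames.
frame_t = [[0] * 5 for _ in range(5)]
frame_t1 = [[0] * 5 for _ in range(5)]
frame_t[2][2] = 255
frame_t1[1][2] = 255
sums = {name: total(img) for name, img in motion_images(frame_t, frame_t1).items()}
```

Under these conventions the upward shift aligns the two toy frames exactly, so the U image sums to zero while the other four images do not.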

5
The Features
  • A feature is a thresholded filter, f_i:
  • F_i = α if f_i(I_t, Δ, U, D, L, R) > t_i
  •       β otherwise
  • for some constants α, β, t_i.
  • There are essentially 3 types of filters:
  • 1. f_i = r_i(S)
  • 2. f_i = abs(r_i(Δ) - r_i(S))
  • 3. f_j = φ_j(S)
  • φ_m represents a sum of pixels over a rectangular
    filter m.
  • S is one of I_t, Δ, U, D, L or R.
  • r_i(S) is a sum of pixel values over a box region
    of image S.
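
As a rough sketch of how the first two filter types and the thresholding step fit together, the snippet below uses a naive box sum. The box format (top, left, height, width), the dictionary of images keyed by name, and all numeric values are assumptions made for illustration only.

```python
def box_sum(img, top, left, height, width):
    """r(S): sum of pixel values over a box region of image S (naive version)."""
    return sum(sum(row[left:left + width]) for row in img[top:top + height])

def filter_type1(images, name, box):
    """Type 1: f = r(S) for one of the images."""
    return box_sum(images[name], *box)

def filter_type2(images, name, box):
    """Type 2: f = abs(r(delta) - r(S)) for S in U, D, L, R."""
    return abs(box_sum(images["delta"], *box) - box_sum(images[name], *box))

def feature(filter_value, threshold, alpha, beta):
    """Thresholded filter: alpha if the filter value exceeds the threshold, else beta."""
    return alpha if filter_value > threshold else beta

# Hypothetical 2x2 motion images and a box covering the whole image.
images = {"delta": [[1, 1], [1, 1]], "U": [[0, 0], [0, 1]]}
box = (0, 0, 2, 2)
f1 = filter_type1(images, "U", box)   # 1
f2 = filter_type2(images, "U", box)   # abs(4 - 1) = 3
```

With a threshold of 2 and constants α = 1.0, β = -1.0 (arbitrary values), `feature(f2, 2, 1.0, -1.0)` evaluates to α and `feature(f1, 2, 1.0, -1.0)` to β.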

6
Examples
  • Take two images

[Images: I_t and I_{t+1}]
7
Representing Motion (Examples)
  • Compute U, D, L, R by shifting image I_t over 1
    pixel and taking the absolute difference with
    I_{t+1}. Δ is computed as just abs(I_t - I_{t+1}).

D has a sum of 121,020; U has a sum of 62,126. So
motion is in the upward direction.
[Images: D and U]
8
Filter Type 1
  • f_i = r_i(S), where S is any of I_t, Δ, U, D, L, R.
  • r_i(S) is the sum of
    pixel values over a box region.
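
The slides do not say how the box sums are computed. The related Viola & Jones papers use an integral image (summed-area table), which makes any box sum cost four lookups. The sketch below assumes that technique; the function names and the (top, left, height, width) box format are illustrative.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (one row and column of padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of the original image over the given box, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
whole = box_sum(ii, 0, 0, 3, 3)          # sum of every pixel
lower_right = box_sum(ii, 1, 1, 2, 2)    # 5 + 6 + 8 + 9
```

The table is built once per image, after which the cost of a box sum no longer depends on the box size.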

[Image: L]
9
Filter Type 2
  • f_i = abs(r_i(Δ) - r_i(S))

[Images: Δ and U]
S is any of U, D, L, R. r_i(S) is the sum of
pixel values over a box region.
10
Rectangular Features (Filter Type 3)
  • f_i = φ_i(S), where φ represents a rectangular filter.
  • The filter value is the total difference in pixel
    values between the dark and light parts of the
    rectangles.

Difference: 224
Difference: 6,683
Difference: 5,476
If we set the threshold to 300, this filter can
recognize the symmetry between eyes.
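
A two-rectangle filter of this kind might look as follows. This is a sketch, assuming a box split into a left ("light") half and a right ("dark") half and assuming the absolute difference is reported. The patches and their pixel values are invented. Reading a small response as "the halves are similar" is an interpretation of the threshold example, not something the slides spell out.

```python
def region_sum(img, top, left, height, width):
    """Sum of pixel values over a box region."""
    return sum(sum(row[left:left + width]) for row in img[top:top + height])

def two_rect_feature(img, top, left, height, width):
    """Absolute difference between the left and right halves of a box."""
    half = width // 2
    light = region_sum(img, top, left, height, half)
    dark = region_sum(img, top, left + half, height, half)
    return abs(light - dark)

symmetric_patch = [[10, 20, 20, 10],
                   [10, 20, 20, 10]]
edge_patch = [[200, 200, 0, 0],
              [200, 200, 0, 0]]

symmetric_response = two_rect_feature(symmetric_patch, 0, 0, 2, 4)  # halves match
edge_response = two_rect_feature(edge_patch, 0, 0, 2, 4)            # halves differ strongly
```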
11
Classifier
  • A classifier is a thresholded sum of features.
  • C(I_t, I_{t+1}) = 1 iff Σ_i F_i(I_t, Δ, U, D, L, R) > T.
  • A feature is a thresholded filter:
  • F_i = α if f_i(I_t, Δ, U, D, L, R) > t_i
  •       β otherwise
  • This gives us 4 parameters to select (α, β, t_i,
    T) in addition to choosing what subset of filters
    to use.
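
Written out as code, the classifier is a sum of per-feature votes compared against T. In this sketch the filter responses are assumed to be already computed, and every threshold and constant is an arbitrary illustrative value.

```python
def classify(filter_values, features, T):
    """Return 1 iff the sum of thresholded features exceeds T, else 0.

    filter_values: the filter responses f_i, one per feature.
    features: a list of (t_i, alpha, beta) tuples, one per filter value.
    """
    score = sum(alpha if f > t else beta
                for f, (t, alpha, beta) in zip(filter_values, features))
    return 1 if score > T else 0

# Two hypothetical features with their own thresholds and output constants.
features = [(10, 1.0, -1.0), (5, 0.5, -0.5)]
accepted = classify([12, 7], features, T=1.0)   # score 1.0 + 0.5 = 1.5
rejected = classify([12, 3], features, T=1.0)   # score 1.0 - 0.5 = 0.5
```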

12
AdaBoost
  • 1990 - The Strength of Weak Learnability
    (Schapire)
  • 1997 - Generalized version of AdaBoost (Schapire
    & Singer)
  • AdaBoost is an algorithm for constructing a
    strong classifier as a linear combination
    of simple weak classifiers h_t(x).
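
To make the "linear combination of weak classifiers" concrete, here is a minimal sketch of textbook discrete AdaBoost with single-feature threshold stumps as the weak classifiers. It is not necessarily the exact variant used in the paper, and the tiny data set is invented.

```python
import math

def train_adaboost(X, y, rounds):
    """X: list of feature vectors, y: labels in {-1, +1}.

    Returns a list of weak classifiers (alpha, feature index, threshold, polarity).
    """
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):         # exhaustively search all stumps
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    preds = [pol if x[j] > thr else -pol for x in X]
                    err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, preds)
        err, j, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of the weak classifiers."""
    score = sum(alpha * (pol if x[j] > thr else -pol)
                for alpha, j, thr, pol in ensemble)
    return 1 if score > 0 else -1

X = [[1], [2], [3], [4]]
y = [-1, -1, 1, 1]
model = train_adaboost(X, y, rounds=3)
```

Each round picks the stump with the lowest weighted error, so selecting weak classifiers doubles as feature selection, which is how the approach can choose among a very large filter set.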

13
Cascaded Classifier
  • Using all the features in the classifier would
    take too long.
  • Instead a cascade of classifiers was used where
    each subsequent level of the cascade contains
    more features.
  • This way image patches that are very different
    from actual pedestrians can be thrown out using
    only a few features.
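
The early-rejection idea can be sketched as a loop that stops at the first stage that says "no". The stages here are stand-in callables on a single number rather than real boosted classifiers. They only illustrate the control flow.

```python
def cascade_classify(patch, stages):
    """Run the patch through the stages in order and reject at the first failure.

    Returns (accepted, number of stages evaluated).
    """
    for depth, stage in enumerate(stages, start=1):
        if not stage(patch):
            return False, depth
    return True, len(stages)

# Hypothetical stages of increasing strictness.
stages = [lambda p: p > 0, lambda p: p > 5, lambda p: p > 8]

easy_reject = cascade_classify(-1, stages)   # thrown out by the first stage
late_reject = cascade_classify(6, stages)    # survives two stages, fails the third
accepted = cascade_classify(9, stages)       # passes every stage
```

Most patches in an image look nothing like a pedestrian, so most of them take the cheap first path.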

14
Experiments
  • Train each classifier in the cascade using 2250
    positive examples and 2250 false positives from
    the previous stages of the cascade. (This lowers
    the false positive rate at each stage.)
  • Each stage is trained so that 99.5% of true
    positives from the previous stage are kept while
    10% of false positives are eliminated (if this
    can't be done, more features are added).
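
The per-stage targets compound across the cascade. The short calculation below assumes each stage hits its targets exactly and independently, which is an idealization. The stage count of 10 is an arbitrary example, not the number of stages in the actual detector.

```python
def cascade_rates(num_stages, keep_true_positives=0.995, keep_false_positives=0.90):
    """Fraction of true positives and of false positives that survive num_stages stages."""
    return keep_true_positives ** num_stages, keep_false_positives ** num_stages

detection_rate, false_positive_rate = cascade_rates(10)
# After 10 idealized stages: about 95.1% of true positives and about 34.9% of
# false positives remain.
```

Because a stage only needs to eliminate at least 10% of the false positives, the real false positive rate can fall much faster than this worst case.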

15
Experiments
  • Two detectors (dynamic and static).
  • Dynamic: trained using 54,624 filters on the
    original image I_t and the motion images Δ, U, D,
    L, R.
  • Static: trained using 24,328 filters on only the
    original image I_t.

16
Results
  • ROC curves for the classification (by adjusting
    the number of features)

17
Results
  • Correct detections - 80%
  • False positives (the total number of false
    positives / the total number of patches tested)
  • 1/400,000 for the dynamic detector which
    corresponds to 1 false positive every 2 frames.
  • 1/15,000 for the static detector which
    corresponds to 13 false positives per frame.
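
The per-patch and per-frame figures can be cross-checked with a little arithmetic. The number of patches per frame is not stated on the slide; here it is inferred from the dynamic detector's figures, so treat it as an assumption.

```python
def implied_patches_per_frame(false_positives_per_frame, patches_per_false_positive):
    """Patches tested per frame implied by a per-frame and a per-patch false positive rate."""
    return false_positives_per_frame * patches_per_false_positive

def false_positives_per_frame(patches_per_frame, patches_per_false_positive):
    """Expected false positives per frame at a given per-patch rate."""
    return patches_per_frame / patches_per_false_positive

# Dynamic detector: 1 false positive every 2 frames at a rate of 1/400,000.
patches = implied_patches_per_frame(0.5, 400_000)

# Static detector: rate of 1/15,000 over the same number of patches.
static_fp = false_positives_per_frame(patches, 15_000)
```

The inferred figure is about 200,000 patches per frame, which gives roughly 13.3 false positives per frame for the static detector, in line with the 13 quoted above.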

18
Results
[Images: dynamic detector and static detector]
19

[Images: dynamic detector and static detector]
20
Comments
  • Using more complex features such as optical flow
    would likely be more successful (but might make
    things slower).
  • Why not use basic background subtraction? It
    would greatly reduce the number of pixels the
    detector would have to search over.

21
Comments
  • Using information about where pedestrians were in
    previous frames would improve the detector and
    help against occlusions, etc. (i.e. tracking).
  • Is overfitting a problem? AdaBoost can succumb
    to overfitting the training data (thus
    generalizing badly) by picking too many features.
    Here we have 2250 training examples and 54,624
    features. Is 24.3 features per training example
    not too much?