Learning the Appearance and Motion of People in Video

Provided by: hedvigsi
Learn more at: http://www.ai.mit.edu
Transcript and Presenter's Notes

1
Learning the Appearance and Motion of People in
Video
  • Hedvig Sidenbladh, KTH
  • hedvig_at_nada.kth.se, www.nada.kth.se/~hedvig/
  • Michael Black, Brown University
  • black_at_cs.brown.edu, www.cs.brown.edu/people/black/

2
Collaborators
  • David Fleet, Xerox PARC
    fleet_at_parc.xerox.com
  • Dirk Ormoneit, Stanford University
    ormoneit_at_stat.stanford.edu
  • Jan-Olof Eklundh, KTH
    joe_at_nada.kth.se

3
Goal
  • Tracking and reconstruction of human motion in 3D
  • Articulated 3D model
  • Monocular sequence
  • Pinhole camera model
  • Unknown, cluttered
    environment
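As a rough illustration of the pinhole camera model named above, the sketch below projects 3D points given in camera coordinates onto the image plane. The function name `project_pinhole` and the unit focal length are assumptions made for this example and do not come from the slides.

```python
def project_pinhole(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z) in camera coordinates to 2D image coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division: image position depends only on the ratios X/Z and Y/Z.
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points on the same viewing ray land on the same image point.
near = project_pinhole((1.0, 2.0, 4.0))
far = project_pinhole((2.0, 4.0, 8.0))
```

Because `near` and `far` coincide in the image, a single view cannot separate them by depth, which is one reason monocular 3D tracking is ambiguous.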

4
Why is it Important?
  • Human-machine interaction
  • Robots
  • Intelligent rooms
  • Video search
  • Animation, motion capture
  • Surveillance

5
Why is it Hard?
6
Why is it Hard?
  • People move fast and non-linearly
  • 3D to 2D projection ambiguities
  • Large occlusion
  • Similar appearance of different limbs
  • Large search space

Extreme case
7
Bayesian Inference
Exploit cues in the images. Learn likelihood models: $p(\text{image cue} \mid \text{model})$
Build models of human form and motion. Learn priors over model parameters: $p(\text{model})$
Represent the posterior distribution: $p(\text{model} \mid \text{cue}) \propto p(\text{cue} \mid \text{model})\, p(\text{model})$
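To make the proportionality concrete, here is a minimal sketch of Bayes' rule over a small discrete set of candidate models. The three hypotheses and all numbers are invented for illustration; the talk itself works with a continuous pose space.

```python
def posterior(likelihoods, priors):
    """Normalized posterior: p(model | cue) is proportional to p(cue | model) * p(model)."""
    unnormalized = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all hypotheses have zero posterior mass")
    return [u / total for u in unnormalized]


# Hypothetical values for three candidate poses.
priors = [0.5, 0.3, 0.2]       # p(model)
likelihoods = [0.1, 0.4, 0.4]  # p(cue | model)
post = posterior(likelihoods, priors)
```

In this example the second pose ends up most probable even though the first has the largest prior, because the image cue supports it more strongly.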
8
Human Model
  • Limbs: truncated cones in 3D
  • Pose determined by parameters $\phi$

9
Bregler and Malik 98
State of the Art.
  • Brightness constancy cue
  • Insensitive to appearance
  • Full-body required multiple cameras
  • Single hypothesis

10
Brightness Constancy
$I(x, t+1) = I(x+u, t) + \eta$
Image motion of the foreground as a function of the 3D motion of the body.
Problem: no fixed model of appearance (drift).
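A small sketch of the brightness constancy idea on a 1D "image", assuming the constraint has the form I(x, t+1) = I(x+u, t) + noise. The function name `motion_residuals` and the toy frames are hypothetical.

```python
def motion_residuals(prev_frame, next_frame, u):
    """Residuals I(x, t+1) - I(x+u, t) for every x where x+u lies inside the frame."""
    n = len(prev_frame)
    return [next_frame[x] - prev_frame[x + u] for x in range(n) if 0 <= x + u < n]


# A bright blob that shifts one pixel to the right between frames.
frame_t = [0, 0, 5, 9, 5, 0, 0]
frame_t1 = [0, 0, 0, 5, 9, 5, 0]

# With this sign convention, u = -1 maps each pixel at t+1 back to its source at t.
good = motion_residuals(frame_t, frame_t1, u=-1)
bad = motion_residuals(frame_t, frame_t1, u=0)
good_error = sum(abs(r) for r in good)
bad_error = sum(abs(r) for r in bad)
```

The correct displacement leaves only the noise term, here exactly zero, while a wrong displacement leaves large residuals. Since each frame is compared only with the previous one, small errors can accumulate over time, which is the drift problem noted on the slide.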
11
Cham and Rehg 99
State of the Art.
  • Single camera, multiple hypotheses
  • 2D templates (no drift but view dependent)

$I(x, t) = I(x+u, 0) + \eta$
12
Multiple Hypotheses
  • Posterior distribution over model parameters
    often multi-modal (due to ambiguities)
  • Represent whole distribution
  • sampled representation
  • each sample is a pose
  • predict over time using a particle filtering
    approach
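The sampled representation can be sketched as one generic particle filter step: predict each sample, weight it by a likelihood, and resample. This is a textbook-style illustration under stated assumptions, not the authors' implementation. The 1D "pose", the two-mode likelihood, and the noise level are all invented for the example.

```python
import random


def particle_filter_step(samples, predict, likelihood, rng):
    """Predict every sample, weight it by the likelihood, then resample by weight."""
    predicted = [predict(s, rng) for s in samples]
    weights = [likelihood(s) for s in predicted]
    # Sampling with replacement in proportion to weight keeps all plausible modes alive.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def diffuse(pose, rng):
    """Hypothetical dynamics: the previous pose plus small Gaussian noise."""
    return pose + rng.gauss(0.0, 0.05)


def two_mode_likelihood(pose):
    """Hypothetical ambiguous cue: poses near +1 and near -1 explain the image equally well."""
    return 1.0 if abs(abs(pose) - 1.0) < 0.5 else 0.0


rng = random.Random(0)
samples = [-2.0 + 0.1 * i for i in range(41)]  # each sample is one (1D) pose
new_samples = particle_filter_step(samples, diffuse, two_mode_likelihood, rng)
```

After the step, the sample set concentrates around both modes instead of committing to a single hypothesis.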

13
Deutscher, North, Bascle, Blake 00
State of the Art.
  • Multiple hypotheses
  • Multiple cameras
  • Simplified clothing, lighting and background

14
Sidenbladh, Black, Fleet 00
State of the Art.
  • Multiple hypotheses
  • Monocular
  • Brightness constancy
  • Activity specific prior
  • Significant changes in view and depth;
    template-based methods will fail

15
How to Address the Problems
  • Bayesian formulation
  • $p(\text{model} \mid \text{cue}) \propto p(\text{cue} \mid \text{model})\, p(\text{model})$

16
What do people look like?
Changing background
Varying shadows
Occlusion
Deforming clothing
Low contrast limb boundaries
What do non-people look like?
17
Edge Detection?
  • Probabilistic model?
  • Under/over-segmentation, thresholds,

18
Key Idea 1
  • Use the 3D model to predict the location of limb
    boundaries in the scene.
  • Compute various filter responses steered to the
    predicted orientation of the limb.
  • Compute likelihood of filter responses using a
    statistical model learned from examples.

19
Key Idea 2
Explain the entire image.
$p(\text{image} \mid \text{foreground}, \text{background})$
Generic, unknown, background
Foreground person
20
Key Idea 2
$p(\text{image} \mid \text{foreground}, \text{background}) \propto \dfrac{p(\text{foreground part of image} \mid \text{foreground})}{p(\text{foreground part of image} \mid \text{background})}$
Do not look in parts of the image considered background.
Foreground part of image
21
Training Data
Points on limbs
Points on background
22
Edge Distributions
Edge response steered to model edge
Similar to Konishi et al., CVPR 99
23
Edge Likelihood Ratio
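One plausible way to obtain such a ratio is to learn two histograms of filter responses, one from points on limb edges and one from background points, and compare them. The sketch below assumes this histogram approach. The bin count, the add-one smoothing, and the training responses are all invented for illustration.

```python
import math


def bin_index(value, num_bins, lo, hi):
    """Map a response to a histogram bin, clamping values outside [lo, hi)."""
    width = (hi - lo) / num_bins
    return min(num_bins - 1, max(0, int((value - lo) / width)))


def learn_histogram(responses, num_bins, lo, hi):
    """Normalized histogram; add-one smoothing (an assumption) avoids empty bins."""
    counts = [1.0] * num_bins
    for r in responses:
        counts[bin_index(r, num_bins, lo, hi)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def log_likelihood_ratio(response, p_on, p_off, lo, hi):
    """log( p_on(response) / p_off(response) ) using the learned histograms."""
    b = bin_index(response, len(p_on), lo, hi)
    return math.log(p_on[b] / p_off[b])


# Hypothetical training responses: strong on limb edges, weak on background.
on_edge = [0.6, 0.7, 0.8, 0.9, 0.75]
off_edge = [0.05, 0.1, 0.2, 0.15, 0.3]
p_on = learn_histogram(on_edge, 4, 0.0, 1.0)
p_off = learn_histogram(off_edge, 4, 0.0, 1.0)

strong = log_likelihood_ratio(0.85, p_on, p_off, 0.0, 1.0)
weak = log_likelihood_ratio(0.10, p_on, p_off, 0.0, 1.0)
```

A positive log ratio favors "limb edge here" and a negative one favors background, so no hard edge threshold is needed.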
24
Ridge Distributions
Ridge response steered to limb orientation
Ridge response only on certain image scales!
25
Ridge Likelihood Ratio
26
Motion Training Data
Image locations: $x$ and $x+u$
Motion response $= I(x, t+1) - I(x+u, t)$
Motion response = temporal brightness change given the model of motion = noise term in the brightness constancy assumption
27
Motion distributions
Different underlying motion models
28
Fg, Bg Likelihood
29
Likelihood Formulation
  • Independence assumptions
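  • Cues: $p(\text{image} \mid \text{model}) = p(\text{cue}_1 \mid \text{model})\, p(\text{cue}_2 \mid \text{model})$
  • Spatial: $p(\text{image} \mid \text{model}) = \prod_{x \in \text{image}} p(\text{image}(x) \mid \text{model})$
  • Scales: $p(\text{image} \mid \text{model}) = \prod_{\sigma = 1, \ldots} p(\text{image}(\sigma) \mid \text{model})$
  • Combines cues and scales!
  • Simplification; in reality there are dependencies
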
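Under these independence assumptions the overall likelihood is a product of per-cue, per-location terms, which is usually evaluated as a sum of logarithms. The sketch below shows that bookkeeping. The cue names and ratio values are made up for illustration.

```python
import math


def combined_log_likelihood(ratios_per_cue):
    """Sum the log ratios over all cues and locations (independence assumed)."""
    return sum(math.log(r) for ratios in ratios_per_cue.values() for r in ratios)


# Hypothetical per-pixel likelihood ratios for three cues.
ratios = {
    "edge": [2.0, 4.0],
    "ridge": [0.5],
    "motion": [1.0, 2.0],
}
total_log = combined_log_likelihood(ratios)
```

Here the product of all ratios is 8, so `total_log` equals log 8. Working in the log domain avoids numerical underflow when thousands of pixels contribute.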
30
Likelihood
Foreground pixels
Background pixels
31
Step One Discussed
  • Bayesian formulation
  • $p(\text{model} \mid \text{cue}) \propto p(\text{cue} \mid \text{model})\, p(\text{model})$

32
Models of Human Dynamics
  • Models of dynamics are used to propagate the
    sampled distribution in time
  • Constant velocity model
  • All DOFs in the model parameter space, $\phi$,
    are independent
  • Angles are assumed to change with constant speed
  • Speed and position changes are randomly sampled
    from a normal distribution
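A minimal sketch of such a constant velocity model, treating every degree of freedom independently. The function name `propagate` and the noise standard deviations are assumptions for illustration; the slides give no values.

```python
import random


def propagate(angles, velocities, rng, angle_sigma=0.01, velocity_sigma=0.01):
    """Advance each joint angle by its velocity, adding independent Gaussian noise."""
    new_velocities = [v + rng.gauss(0.0, velocity_sigma) for v in velocities]
    new_angles = [
        a + v + rng.gauss(0.0, angle_sigma)
        for a, v in zip(angles, new_velocities)
    ]
    return new_angles, new_velocities


rng = random.Random(0)
angles = [0.0, 1.0]       # hypothetical joint angles (radians)
velocities = [0.1, -0.2]  # hypothetical angular velocities (radians per frame)

# With the noise switched off the model is a pure constant velocity prediction.
exact_angles, exact_velocities = propagate(
    angles, velocities, rng, angle_sigma=0.0, velocity_sigma=0.0
)
# With noise, each call yields a different plausible next pose for one sample.
noisy_angles, noisy_velocities = propagate(angles, velocities, rng)
```

In a particle filter, `propagate` would be applied to every sample once per frame, so the noise terms spread the samples over nearby poses.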

33
Models of Human Dynamics
  • Action-specific model - Walking
  • Training data: 3D motion capture data
  • From training set, learn mean cycle and common
    modes of deviation (PCA)

Mean cycle
Small noise
Large noise
34
Step Two Also Discussed
  • Bayesian formulation
  • $p(\text{model} \mid \text{cue}) \propto p(\text{cue} \mid \text{model})\, p(\text{model})$

35
Particle Filter
  • Problem: expensive representation of the
    posterior!
  • Approaches to solve the problem:
  • Lower the number of samples (Deutscher et al.,
    CVPR00)
  • Represent the space in other ways (Choo and
    Fleet, ICCV01)

36
Tracking an Arm
1500 samples, 2 min/frame
Moving camera, constant velocity model
37
Self Occlusion
1500 samples, 2 min/frame
Constant velocity model
38
Walking Person
Number of samples reduced from 15000 to 2500 by using the learned
likelihood
2500 samples, 10 min/frame
Walking model
39
Ongoing and Future Work
  • Learned dynamics
  • Correlation across scale
  • Estimate background motion
  • Statistical models of color and texture
  • Automatic initialization

40
Lessons Learned
  • Probabilistic (Bayesian) framework allows:
  • Integration of information in a principled way
  • Modeling of priors
  • Particle filtering allows:
  • Multi-modal distributions
  • Tracking with ambiguities and non-linear models
  • Learning image statistics and combining cues
    improves robustness and reduces computation

41
Conclusions
  • Generic, learned, model of appearance
  • Combines multiple cues
  • Exploits work on image statistics
  • Use the 3D model to predict features
  • Model of foreground and background
  • Exploits the ratio between foreground and
    background likelihood
  • Improves tracking

42
Other Related Work
  • J. Sullivan, A. Blake, M. Isard, and J. MacCormick. Object localization by Bayesian correlation. ICCV99.
  • J. Sullivan, A. Blake, and J. Rittscher. Statistical foreground modelling for object localisation. ECCV00.
  • J. Rittscher, J. Kato, S. Joga, and A. Blake. A Probabilistic Background Model for Tracking. ECCV00.
  • S. Wachter and H. Nagel. Tracking of persons in monocular image sequences. CVIU, 74(3), 1999.