Title: Learning the Appearance and Motion of People in Video
1. Learning the Appearance and Motion of People in Video
- Hedvig Sidenbladh, KTH
- hedvig_at_nada.kth.se, www.nada.kth.se/hedvig/
- Michael Black, Brown University
- black_at_cs.brown.edu, www.cs.brown.edu/people/black/
2. Collaborators
- David Fleet, Xerox PARC
- fleet_at_parc.xerox.com
- Dirk Ormoneit, Stanford University
- ormoneit_at_stat.stanford.edu
- Jan-Olof Eklundh, KTH
- joe_at_nada.kth.se
3. Goal
- Tracking and reconstruction of human motion in 3D
- Articulated 3D model
- Monocular sequence
- Pinhole camera model
- Unknown, cluttered environment
4. Why is it Important?
- Human-machine interaction
- Robots
- Intelligent rooms
- Video search
- Animation, motion capture
- Surveillance
5. Why is it Hard?
6. Why is it Hard?
- People move fast and non-linearly
- 3D to 2D projection ambiguities
- Large occlusion
- Similar appearance of different limbs
- Large search space
Extreme case
7. Bayesian Inference
- Exploit cues in the images. Learn likelihood models p(image cue | model).
- Build models of human form and motion. Learn priors over model parameters p(model).
- Represent the posterior distribution p(model | cue) ∝ p(cue | model) p(model).
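As a rough numerical illustration of this relation (a minimal sketch, not the paper's implementation), the snippet below weights samples drawn from a Gaussian prior over a single hypothetical pose parameter by a Gaussian stand-in for the image likelihood, giving a sampled representation of the posterior.

```python
import numpy as np

# Stand-in likelihood p(cue | model): a Gaussian around the observed cue value.
def likelihood(cue, theta, std=0.5):
    return np.exp(-0.5 * ((cue - theta) / std) ** 2)

# Sampled representation of the prior p(model): draws from a Gaussian over a
# single hypothetical pose parameter (e.g. one joint angle).
samples = np.random.normal(loc=0.0, scale=1.0, size=1000)

# Posterior p(model | cue) ~ p(cue | model) p(model): because the samples are
# drawn from the prior, weighting each sample by its likelihood and normalizing
# yields a sampled representation of the posterior.
cue = 0.8
weights = likelihood(cue, samples)
weights /= weights.sum()

print("posterior mean estimate:", np.sum(weights * samples))
```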
8. Human Model
- Limbs are truncated cones in 3D
- Pose determined by the parameter vector φ
9. Bregler and Malik 98
State of the Art.
- Brightness constancy cue
- Insensitive to appearance
- Full-body tracking required multiple cameras
- Single hypothesis
10. Brightness Constancy
I(x, t+1) = I(x+u, t) + η
- Image motion of the foreground as a function of the 3D motion of the body.
- Problem: no fixed model of appearance (drift).
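A minimal sketch of this constraint, assuming a synthetic image pair and an integer-shift warp (np.roll) as a crude stand-in for the model-predicted image motion u; the residual plays the role of the noise term η.

```python
import numpy as np

# Brightness constancy residual, assuming a predicted per-pixel displacement u
# (here a constant integer shift for simplicity).
def brightness_constancy_residual(I_t, I_t1, u):
    """Residual eta = I(x, t+1) - I(x+u, t) for an integer shift u=(rows, cols)."""
    warped = np.roll(I_t, shift=u, axis=(0, 1))   # crude stand-in for warping I(x+u, t)
    return I_t1 - warped

rng = np.random.default_rng(0)
I_t = rng.random((48, 64))
true_shift = (0, 2)
I_t1 = np.roll(I_t, shift=true_shift, axis=(0, 1)) + 0.01 * rng.standard_normal((48, 64))

# The residual is small only when the hypothesized motion matches the true one.
for u in [(0, 0), (0, 2)]:
    eta = brightness_constancy_residual(I_t, I_t1, u)
    print(u, "mean squared residual:", np.mean(eta ** 2))
```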
11. Cham and Rehg 99
State of the Art.
- Single camera, multiple hypotheses
- 2D templates (no drift, but view dependent)
I(x, t) = I(x+u, 0) + η
12. Multiple Hypotheses
- Posterior distribution over model parameters is often multi-modal (due to ambiguities)
- Represent the whole distribution
- sampled representation
- each sample is a pose
- predict over time using a particle filtering approach (see the sketch below)
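A minimal particle-filter sketch of these ideas, assuming a hypothetical one-dimensional pose state, a random-walk prediction, and a Gaussian stand-in for the learned image likelihood; the real tracker propagates samples over the full pose vector.

```python
import numpy as np

rng = np.random.default_rng(1)
n_particles = 500
particles = rng.normal(0.0, 1.0, n_particles)      # initial pose hypotheses

def predict(particles, noise_std=0.1):
    # Temporal prior: diffuse each hypothesis (a simple random-walk model here).
    return particles + rng.normal(0.0, noise_std, particles.shape)

def weight(particles, observation, std=0.3):
    # Stand-in for the learned likelihood p(image cue | pose).
    w = np.exp(-0.5 * ((observation - particles) / std) ** 2)
    return w / w.sum()

observations = [0.2, 0.35, 0.5]                     # hypothetical cue values over time
for z in observations:
    particles = predict(particles)
    w = weight(particles, z)
    # Resample: keep a sampled, possibly multi-modal representation of the posterior.
    particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    print("posterior mean:", particles.mean())
```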
13. Deutscher, North, Bascle, Blake 00
State of the Art.
- Multiple hypotheses
- Multiple cameras
- Simplified clothing, lighting and background
14. Sidenbladh, Black, Fleet 00
State of the Art.
- Multiple hypotheses
- Monocular
- Brightness constancy
- Activity specific prior
- Under significant changes in view and depth, template-based methods will fail
15. How to Address the Problems
- Bayesian formulation
- p(model | cue) ∝ p(cue | model) p(model)
16. What do people look like?
Changing background
Varying shadows
Occlusion
Deforming clothing
Low contrast limb boundaries
What do non-people look like?
17. Edge Detection?
- Probabilistic model?
- Under/over-segmentation, thresholds, ...
18. Key Idea 1
- Use the 3D model to predict the location of limb boundaries in the scene.
- Compute various filter responses steered to the predicted orientation of the limb (sketched below).
- Compute the likelihood of the filter responses using a statistical model learned from examples.
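A sketch of the steering step, assuming a plain image-gradient operator in place of the paper's filters: the gradient is projected onto the direction normal to a predicted limb orientation, so the response is strong only when the steering matches the true boundary.

```python
import numpy as np

def steered_edge_response(image, theta):
    """Directional derivative across a limb boundary of orientation theta (radians)."""
    gy, gx = np.gradient(image.astype(float))
    # Project the gradient onto the normal (-sin(theta), cos(theta)) of the limb axis.
    return -np.sin(theta) * gx + np.cos(theta) * gy

rng = np.random.default_rng(2)
img = rng.random((64, 64))
img[:, 32:] += 1.0                 # a vertical step edge, a crude "limb boundary"

resp_aligned = steered_edge_response(img, theta=np.pi / 2)   # steering matches the edge
resp_wrong = steered_edge_response(img, theta=0.0)           # steering 90 degrees off
print("aligned response energy:   ", np.abs(resp_aligned).mean())
print("misaligned response energy:", np.abs(resp_wrong).mean())
```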
19. Key Idea 2
Explain the entire image:
p(image | foreground, background)
Generic, unknown background
Foreground person
20. Key Idea 2
p(image | foreground, background) ∝
p(foreground part of image | foreground) / p(foreground part of image | background)
- Do not look in parts of the image considered background
Foreground part of image
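A sketch of this ratio, assuming hypothetical Gaussian response models for "person" and "background" pixels and a binary mask of the pixels the 3D model projects to; only foreground pixels enter the sum, so background regions are never examined.

```python
import numpy as np

def log_gauss(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def log_likelihood_ratio(responses, fg_mask, fg=(1.0, 0.5), bg=(0.0, 0.5)):
    """Sum over foreground pixels of log p(response | fg) - log p(response | bg)."""
    r = responses[fg_mask]
    return np.sum(log_gauss(r, *fg) - log_gauss(r, *bg))

rng = np.random.default_rng(3)
responses = rng.normal(0.0, 0.5, (48, 48))                  # background-like everywhere...
responses[10:30, 20:28] = rng.normal(1.0, 0.5, (20, 8))     # ...except on a "limb"

good_mask = np.zeros((48, 48), bool); good_mask[10:30, 20:28] = True   # correct pose
bad_mask = np.zeros((48, 48), bool);  bad_mask[10:30, 5:13] = True     # wrong pose
print("correct pose:", log_likelihood_ratio(responses, good_mask))
print("wrong pose:  ", log_likelihood_ratio(responses, bad_mask))
```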
21. Training Data
Points on limbs
Points on background
22. Edge Distributions
Edge response steered to model edge
Similar to Konishi et al., CVPR 99
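A sketch of how such distributions can be learned and used, assuming synthetic on-limb and background responses in place of the labelled training points; the normalized histograms give a log-likelihood-ratio lookup table over response values.

```python
import numpy as np

rng = np.random.default_rng(4)
on_limb = np.abs(rng.normal(0.8, 0.3, 5000))      # steered responses at limb boundaries
on_bg = np.abs(rng.normal(0.0, 0.3, 5000))        # responses at background points

bins = np.linspace(0.0, 2.0, 41)
p_on, _ = np.histogram(on_limb, bins=bins, density=True)
p_off, _ = np.histogram(on_bg, bins=bins, density=True)
eps = 1e-6
log_ratio = np.log(p_on + eps) - np.log(p_off + eps)   # lookup table over response value

def edge_log_likelihood_ratio(response):
    """Look up log p(response | limb edge) / p(response | background)."""
    idx = np.clip(np.digitize(response, bins) - 1, 0, len(log_ratio) - 1)
    return log_ratio[idx]

print(edge_log_likelihood_ratio(np.array([0.05, 0.9])))   # weak vs. strong edge response
```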
23. Edge Likelihood Ratio
24. Ridge Distributions
Ridge response steered to limb orientation
Ridge response only on certain image scales!
25. Ridge Likelihood Ratio
26. Motion Training Data
Motion response: I(x, t+1) - I(x+u, t)
- Motion response = temporal brightness change given the model of motion
- i.e. the noise term η in the brightness constancy assumption
27. Motion Distributions
Different underlying motion models
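A sketch of the motion cue as a likelihood, assuming an integer-shift warp and a single Gaussian noise model standing in for the learned motion-response distributions; the pose hypothesis whose predicted motion matches the data scores highest.

```python
import numpy as np

def motion_log_likelihood(I_t, I_t1, u, noise_std=0.05):
    # Temporal difference I(x, t+1) - I(x+u, t) under the hypothesized motion u,
    # scored under a Gaussian noise model (stand-in for the learned distributions).
    warped = np.roll(I_t, shift=u, axis=(0, 1))     # crude warp by integer shift u
    residual = I_t1 - warped                        # the "motion response" per pixel
    return np.sum(-0.5 * (residual / noise_std) ** 2)

rng = np.random.default_rng(5)
I_t = rng.random((32, 32))
I_t1 = np.roll(I_t, shift=(1, 0), axis=(0, 1)) + 0.02 * rng.standard_normal((32, 32))

for u in [(1, 0), (0, 0), (0, 1)]:    # candidate motions from different pose hypotheses
    print(u, motion_log_likelihood(I_t, I_t1, u))
```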
28. Fg, Bg Likelihood
29. Likelihood Formulation
- Independence assumptions
- Cues: p(image | model) = p(cue1 | model) p(cue2 | model)
- Spatial: p(image | model) = ∏_{x ∈ image} p(image(x) | model)
- Scales: p(image | model) = ∏_{σ ∈ {σ1, ...}} p(image(σ) | model)
- Combines cues and scales!
- Simplification; in reality there are dependencies
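A sketch of what these assumptions buy in practice: with independence across cues, pixels and scales, the total log-likelihood is a plain sum of per-cue, per-scale log-ratio values over the foreground pixels (the maps below are random placeholders, not learned models).

```python
import numpy as np

rng = np.random.default_rng(6)
scales = [1, 2, 4]
cues = ["edge", "ridge", "motion"]

def total_log_likelihood(log_ratio_maps, fg_mask):
    """Sum log-likelihood ratios over cues, scales, and foreground pixels."""
    total = 0.0
    for cue in cues:
        for s in scales:
            total += np.sum(log_ratio_maps[(cue, s)][fg_mask])
    return total

# Placeholder per-pixel log-ratio maps, one per (cue, scale) pair.
maps = {(c, s): rng.normal(0.0, 1.0, (32, 32)) for c in cues for s in scales}
mask = np.zeros((32, 32), bool); mask[8:24, 12:20] = True
print("combined log-likelihood:", total_log_likelihood(maps, mask))
```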
30. Likelihood
Foreground pixels
Background pixels
31. Step One Discussed
- Bayesian formulation
- p(model | cue) ∝ p(cue | model) p(model)
32. Models of Human Dynamics
- Models of dynamics are used to propagate the sampled distribution in time
- Constant velocity model
- All DOF in the model parameter space, φ, are independent
- Angles are assumed to change with constant speed
- Speed and position changes are randomly sampled from a normal distribution (see the sketch below)
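A sketch of the constant-velocity prior, assuming a hypothetical number of joint-angle DOF and arbitrary Gaussian noise levels; each particle's angles and angular speeds are diffused independently when the sampled distribution is propagated to the next frame.

```python
import numpy as np

rng = np.random.default_rng(7)

def propagate(angles, velocities, pos_std=0.02, vel_std=0.01):
    # Constant-velocity model: keep each angle's speed, add independent Gaussian
    # noise to both speed and position.
    velocities = velocities + rng.normal(0.0, vel_std, velocities.shape)
    angles = angles + velocities + rng.normal(0.0, pos_std, angles.shape)
    return angles, velocities

n_particles, n_dof = 1000, 25            # e.g. 25 joint-angle DOF per pose hypothesis
angles = rng.normal(0.0, 0.1, (n_particles, n_dof))
velocities = np.zeros((n_particles, n_dof))
angles, velocities = propagate(angles, velocities)
print("propagated angle std for the first DOFs:", angles.std(axis=0)[:3])
```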
33. Models of Human Dynamics
- Action-specific model: walking
- Training data: 3D motion capture data
- From the training set, learn the mean cycle and common modes of deviation (PCA)
(Figure panels: mean cycle, small noise, large noise)
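A sketch of how such a walking prior can be built, assuming synthetic time-normalized joint-angle cycles in place of real motion capture: stack the cycles, take the mean, keep the leading PCA modes of deviation, and sample new cycles as the mean plus a small combination of the modes.

```python
import numpy as np

rng = np.random.default_rng(8)
n_cycles, n_phases, n_dof = 20, 50, 25
t = np.linspace(0, 2 * np.pi, n_phases)
# Synthetic "mocap" cycles: a common sinusoidal pattern plus per-cycle variation.
cycles = np.stack([np.sin(t)[:, None] * rng.normal(1.0, 0.1, n_dof)
                   + 0.05 * rng.standard_normal((n_phases, n_dof))
                   for _ in range(n_cycles)])                 # (cycles, phases, DOF)

flat = cycles.reshape(n_cycles, -1)                           # one row per training cycle
mean_cycle = flat.mean(axis=0)
U, S, Vt = np.linalg.svd(flat - mean_cycle, full_matrices=False)
modes = Vt[:5]                                                # leading modes of deviation

# Sample a new walking cycle: the mean plus a random combination of the modes.
coeffs = rng.normal(0.0, 1.0, 5) * (S[:5] / np.sqrt(n_cycles))
sample_cycle = (mean_cycle + coeffs @ modes).reshape(n_phases, n_dof)
print("sampled cycle shape:", sample_cycle.shape)
```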
34. Step Two Also Discussed
- Bayesian formulation
- p(model | cue) ∝ p(cue | model) p(model)
35. Particle Filter
- Problem: expensive representation of the posterior!
- Approaches to solve the problem:
- Lower the number of samples (Deutscher et al., CVPR00)
- Represent the space in other ways (Choo and Fleet, ICCV01)
36. Tracking an Arm
1500 samples, 2 min/frame
Moving camera, constant velocity model
37. Self Occlusion
1500 samples, 2 min/frame
Constant velocity model
38. Walking Person
Samples reduced from 15000 to 2500 by using the learned likelihood
2500 samples, 10 min/frame
Walking model
39. Ongoing and Future Work
- Learned dynamics
- Correlation across scale
- Estimate background motion
- Statistical models of color and texture
- Automatic initialization
40. Lessons Learned
- Probabilistic (Bayesian) framework allows
- Integration of information in a principled way
- Modeling of priors
- Particle filtering allows
- Multi-modal distributions
- Tracking with ambiguities and non-linear models
- Learning image statistics and combining cues
improves robustness and reduces computation
41. Conclusions
- Generic, learned model of appearance
- Combines multiple cues
- Exploits work on image statistics
- Uses the 3D model to predict features
- Models foreground and background
- Exploits the ratio between foreground and background likelihoods
- Improves tracking
42. Other Related Work
J. Sullivan, A. Blake, M. Isard, and J. MacCormick. Object localization by Bayesian correlation. ICCV99.
J. Sullivan, A. Blake, and J. Rittscher. Statistical foreground modelling for object localisation. ECCV00.
J. Rittscher, J. Kato, S. Joga, and A. Blake. A probabilistic background model for tracking. ECCV00.
S. Wachter and H. Nagel. Tracking of persons in monocular image sequences. CVIU, 74(3), 1999.