Tracking People by Learning Their Appearance

1
Tracking People by Learning Their Appearance

Deva Ramanan
David A. Forsyth
Andrew Zisserman

2
Introduction

Problem: to track the articulations of people
from a video sequence.
We need to determine the number of people in
each frame and estimate their configurations.
Two-stage automatic system:
Build a model of the appearance of each person in
a video.
Track by detecting those models in each frame.

3
Approach

Under our model, the focus of tracking becomes
not so much identifying where an object is, but
learning what it looks like.
Bottom-up approach
Look for candidate body parts in each frame, then
cluster the candidates to find assemblies of
parts that might be people.
Top-down approach
Look for an entire person in a single frame. We
assume people tend to occupy certain key poses,
and so we build models from those poses that are
easy to detect.

4
Temporal Pictorial Structures

First-order Markov model: we replicate the
standard model T times, once for each frame.
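
The slide gives no formula, so the following is only one plausible way to write a pictorial structure replicated over T frames with first-order Markov links. The notation is assumed rather than taken from the slides: P_i^t is the position of part i in frame t, \pi(i) is the parent of part i in the kinematic tree, C_i is the appearance of part i, and I^t is the image at frame t.

```latex
P\left(P^{1:T}_{1:N}, I^{1:T} \mid C_{1:N}\right) \propto
\prod_{t=1}^{T} \prod_{i=1}^{N}
P\left(P_i^t \mid P_i^{t-1}\right)\,
P\left(P_i^t \mid P_{\pi(i)}^t\right)\,
P\left(I^t \mid P_i^t, C_i\right)
```

Read left to right, the three factors are a motion term linking each part to itself in the previous frame, a kinematic term linking each part to its parent within a frame, and an image likelihood that depends on the part's appearance.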

5
Temporal Pictorial Structures
6
Temporal Pictorial Structures
7
Temporal Pictorial Structures
8
Building Models by Clustering

An important observation is that we have some a
priori notion of part appearance Ci as having
rectangular edges.
Detect candidate parts in each frame with an
edge-based part detector.
Cluster the resulting image patches to identify
body parts that look similar across time.
Prune clusters that move too fast in some frames.

9
Building Models by Clustering
10
Detecting Parts with Edges

In the experiments, we used a detector threshold
that was manually set between 10 and 50 (assuming
edge filters are L1-normalized and images are
scaled to 255).
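
To illustrate why the threshold range above is meaningful, here is a minimal 1-D sketch, not the detector used in the experiments. When a filter is L1-normalized and intensities lie in 0..255, the magnitude of any response is bounded by 255, so a threshold of 10 to 50 corresponds to a fixed amount of contrast. The function names and the toy step-edge kernel are assumptions made for illustration.

```python
def l1_normalize(kernel):
    """Scale a kernel so that the absolute values of its taps sum to 1."""
    total = sum(abs(k) for k in kernel)
    return [k / total for k in kernel]


def filter_response(signal, kernel):
    """Valid-mode correlation of a 1-D signal with a kernel."""
    n = len(kernel)
    return [
        sum(k * s for k, s in zip(kernel, signal[i:i + n]))
        for i in range(len(signal) - n + 1)
    ]


def detect_edges(signal, kernel, threshold):
    """Return window start indices whose absolute response meets the threshold."""
    responses = filter_response(signal, l1_normalize(kernel))
    return [i for i, r in enumerate(responses) if abs(r) >= threshold]


# Toy intensity profile (0..255) with one step edge, and a hypothetical step kernel.
profile = [0, 0, 0, 0, 200, 200, 200, 200]
step_kernel = [-1, -1, 1, 1]
edge_positions = detect_edges(profile, step_kernel, threshold=50)
```

A 2-D rectangle detector would combine responses of this kind along the two long sides of a candidate limb; that step is omitted here.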

11
Clustering Image Patches

Mean-shift method.
We create a feature vector for each candidate
segment, consisting of a 512-dimensional RGB
color histogram (8 bins for each color axis). We
scale the feature vector by an empirically
determined value to yield a unit-variance model
in (3).
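
The feature and clustering steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the histogram is normalized to sum to 1, the empirical scaling factor is left out, and the mean-shift step uses a flat kernel with a hypothetical `bandwidth` parameter.

```python
import math


def rgb_histogram(pixels, bins_per_axis=8):
    """Joint RGB histogram as a flat list of bins_per_axis**3 values (512 for 8 bins).

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    """
    width = 256 // bins_per_axis  # 32 intensity levels per bin when bins_per_axis is 8
    hist = [0.0] * (bins_per_axis ** 3)
    count = 0
    for r, g, b in pixels:
        idx = ((r // width) * bins_per_axis + (g // width)) * bins_per_axis + (b // width)
        hist[idx] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def mean_shift_mode(start, points, bandwidth=1.0, max_iter=100, tol=1e-6):
    """Move from `start` toward a density mode using a flat (uniform) kernel."""
    center = list(start)
    for _ in range(max_iter):
        near = [p for p in points if math.dist(p, center) <= bandwidth]
        if not near:
            break
        new_center = [sum(c) / len(near) for c in zip(*near)]
        shift = math.dist(new_center, center)
        center = new_center
        if shift < tol:
            break
    return center


# Toy usage: a 3-pixel patch, and two well-separated groups of 2-D feature points.
patch_hist = rgb_histogram([(0, 0, 0), (255, 255, 255), (255, 255, 255)])
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
mode_a = mean_shift_mode((0.0, 0.0), points)
mode_b = mean_shift_mode((5.0, 5.0), points)
```

Candidates whose mean-shift runs converge to the same mode would be placed in the same cluster. In the real system each point would be a 512-dimensional histogram rather than a 2-D toy vector.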

12
Enforcing a Motion Model

For each cluster, we want to find a sequence of
candidates that obeys our bounded velocity motion
model defined in (5).
We obtain a sequence of segments, at most one per
frame, where the segments are within a fixed
velocity bound of one another and where all lie
close to the cluster center in appearance.
Prune the sequences that are too small or that
never move.
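
One way to realize the sequence search and pruning described above is a small dynamic program. The sketch assumes the candidates passed in already belong to one appearance cluster, that skipped frames are allowed with the velocity bound scaled by the frame gap, and that `vmax`, `min_length`, and `min_travel` are hypothetical parameters; equation (5) itself is not reproduced here.

```python
import math


def longest_track(frames, vmax):
    """Longest chain of candidates with at most one per frame.

    frames: list with one entry per frame, each a list of (x, y) positions.
    Two chained candidates that are `gap` frames apart must lie within
    vmax * gap of each other. Returns a list of (frame_index, (x, y)).
    """
    nodes = [(t, p) for t, cands in enumerate(frames) for p in cands]
    if not nodes:
        return []
    best = [1] * len(nodes)      # length of the best chain ending at each node
    prev = [None] * len(nodes)   # back-pointer to the previous node in that chain
    for j, (tj, pj) in enumerate(nodes):
        for i in range(j):
            ti, pi = nodes[i]
            within_bound = math.dist(pi, pj) <= vmax * (tj - ti)
            if ti < tj and within_bound and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    j = max(range(len(nodes)), key=best.__getitem__)
    track = []
    while j is not None:
        track.append(nodes[j])
        j = prev[j]
    return track[::-1]


def keep_track(track, min_length, min_travel):
    """Reject tracks that are too short or that never leave their start position."""
    if len(track) < min_length:
        return False
    start = track[0][1]
    return any(math.dist(start, p) >= min_travel for _, p in track)


# Toy usage: frame 3 has no candidates; two candidates are far-away outliers.
frames = [[(0, 0), (50, 50)], [(2, 0)], [(4, 1), (90, 90)], [], [(8, 1)]]
track = longest_track(frames, vmax=3)
```

The quadratic loop is fine for a sketch. A practical version would restrict the inner loop to a short window of preceding frames.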

13
Learning Multiple Appearance Models

We use the learned appearance to build better
segment detectors.
We search for new candidates using the medoid
image patch of the valid clusters from Fig. 5c as
a template.
Link up those candidates that obey our velocity
constraints into the final torso track in Fig.
5d.
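
For readers unfamiliar with the term, the medoid of a cluster is the member with the smallest total distance to all other members, so it is always an actual image patch rather than an average. A minimal sketch, assuming each patch is represented by a feature vector and compared with Euclidean distance:

```python
import math


def medoid(vectors):
    """Return the member of `vectors` with the smallest summed distance to the rest."""
    return min(vectors, key=lambda v: sum(math.dist(v, w) for w in vectors))


# Toy usage: the middle point wins even though the mean would be pulled toward 10.
template = medoid([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)])
```

Using a real patch as the template avoids the blurring that averaging several patches would introduce.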

14
Learning Multiple Appearance Models
15
Learning Multiple Appearance Models
16
Approximate Inference

If the torso localization (and estimated
appearance) is poor, the resulting appearance and
localization estimates for the limbs will suffer.
One remedy might be to continually pass messages
in Fig. 9 in a loopy fashion (e.g., reestimate
the torso appearance given the arm appearance).

17
Building Models with Stylized Detectors
18
Detecting Lateral Walking Poses
19
Discriminative Appearance Models
20
Track by Model Detection

Multiple scales
System searches over an image pyramid. It selects
the largest scale at which a person was detected.
Occlusion

21
Track by Model Detection

Spatial smoothing (better than direct MAP)
The smoothed pose tends to be stable since nearby
poses also have high posterior values.
The smoothed pose has sub-pixel accuracy
since it is a local average.
Temporal smoothing
Feed the pose posterior at each frame into
a formal motion model.
Multiple people
Multiple instances
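
The spatial smoothing item on this slide can be illustrated with a posterior-weighted local average around the MAP location. This is a sketch under assumptions: the posterior is given as a small 2-D grid over position only (a real pose also has orientation and multiple parts), and the window `radius` is a hypothetical parameter.

```python
def smoothed_estimate(posterior, radius=1):
    """Posterior-weighted mean position in a window centered on the MAP cell.

    posterior: 2-D list of non-negative scores indexed as posterior[y][x].
    Returns (x, y) as floats, which may fall between grid cells.
    """
    h, w = len(posterior), len(posterior[0])
    my, mx = max(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda yx: posterior[yx[0]][yx[1]],
    )
    total = sx = sy = 0.0
    for y in range(max(0, my - radius), min(h, my + radius + 1)):
        for x in range(max(0, mx - radius), min(w, mx + radius + 1)):
            p = posterior[y][x]
            total += p
            sx += p * x
            sy += p * y
    if total == 0:
        # Degenerate all-zero posterior: fall back to the MAP cell itself.
        return float(mx), float(my)
    return sx / total, sy / total


# Toy usage: the MAP cell is at x=2, but the mass to its right pulls the estimate.
grid = [
    [0, 0, 0, 0],
    [0, 1, 4, 3],
    [0, 0, 0, 0],
]
x_hat, y_hat = smoothed_estimate(grid)
```

Here the MAP estimate alone would report x = 2, while the local average reports x = 2.25, which shows where the sub-pixel accuracy comes from.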

22
Results - Building Models by clustering

Self-starting
Multiple activities

23
Results - Building Models by clustering

Lack of background subtraction

24
Results - Building Models by clustering

Multiple people, recovery from occlusion and
error (see Fig. 18.)

26
Results - Building Models with a Stylized Detector

Lateral-walking pose detection
Appearance model detection

28
Discussion

Comparison of model-building algorithms.
We find the two model-building algorithms
complementary.
If we can observe people for a long time, or if
we expect them to behave predictably, detecting
stylized poses is likely the better approach.
