Title: Summer Work at Vidient, 2006
1. Summer Work at Vidient, 2006
- Ensemble Tracking, Part-Based Tracking, and merging Mean Shift with the Earth Mover's Distance
2. Ensemble Tracking: Sampling
- Collect many pixels in the region of the object
- Build a feature for each pixel and label each pixel as object (+1) or background (-1)
3. The Feature Type
- Histograms of Oriented Gradients (HoG)
- Calculated over a 5x5 pixel region centered on the pixel of interest
- Concatenate the 9D HoG with the average RGB values over the 5x5 pixel region to form a 12D feature vector
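The 12D per-pixel feature described above can be sketched as follows. The exact gradient operator and bin placement are not recorded in the slides, so those details are assumptions:

```python
import numpy as np

def pixel_feature(rgb, x, y, half=2, bins=9):
    """12D feature for the pixel at (x, y): a 9-bin histogram of
    gradient orientations plus the mean RGB, both over a 5x5 patch.
    Gradient operator and bin range are illustrative choices."""
    patch = rgb[y - half:y + half + 1, x - half:x + half + 1]
    gray = patch.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return np.concatenate([hist, patch.reshape(-1, 3).mean(axis=0)])
```

The first 9 entries describe local shape, the last 3 local color, giving each pixel a descriptor a weak classifier can separate.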
4. Ensemble Tracking: Training
- Train a weak classifier on this collection of pixels using a linear SVM
- Weight the weak classifier using AdaBoost
- Combine the weak classifiers into a strong classifier
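One round of the AdaBoost weighting step can be sketched as below (standard discrete AdaBoost; the slide's linear-SVM weak learner would be trained separately on the weighted samples):

```python
import numpy as np

def adaboost_weight(preds, labels, w):
    """One AdaBoost round: given a weak classifier's +/-1 predictions,
    the +/-1 labels, and the current sample weights, return the
    classifier weight alpha and the renormalized sample weights.
    Misclassified samples gain weight for the next weak learner."""
    err = w[preds != labels].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w = w * np.exp(-alpha * labels * preds)
    return alpha, w / w.sum()
```

After the update, the misclassified samples carry half the total weight, which is what forces the next weak classifier to focus on them.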
5. Ensemble Tracking: Tracking
- The output of the strong classifier is F(x) = \sum_{t=1}^{T} \alpha_t h_t(x), where T is the number of weak classifiers
- Convert the classifier output to a pseudo-probability using a sigmoid function and create a confidence map
- Track the object by applying mean shift to the confidence map
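The mean-shift step on the confidence map can be sketched with a flat circular kernel; the window radius and stopping rule here are illustrative assumptions:

```python
import numpy as np

def mean_shift(conf, y0, radius=10, iters=20):
    """Mean shift on a 2D confidence map: repeatedly move the window
    center to the confidence-weighted centroid of the pixels within
    `radius`, until the shift is small."""
    ys, xs = np.mgrid[0:conf.shape[0], 0:conf.shape[1]]
    y, x = y0
    for _ in range(iters):
        mask = (ys - y) ** 2 + (xs - x) ** 2 <= radius ** 2
        w = conf[mask]
        if w.sum() == 0:
            break
        ny = (ys[mask] * w).sum() / w.sum()  # weighted centroid row
        nx = (xs[mask] * w).sum() / w.sum()  # weighted centroid col
        if abs(ny - y) < 0.5 and abs(nx - x) < 0.5:
            break
        y, x = ny, nx
    return y, x
```

Each iteration climbs the confidence surface, so the window settles on a nearby mode of the map.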
6. Ensemble Tracking: Update
- Collect a new set of samples based on the new object position
- Remove the oldest weak classifier in the ensemble
- Re-train the remaining weak classifiers using AdaBoost
- Train a new weak classifier and add it to the ensemble
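The update cycle above amounts to a fixed-size FIFO over weak classifiers; a minimal sketch, where the `reweight` callback stands in for the AdaBoost re-training step from the training slide:

```python
from collections import deque

def update_ensemble(ensemble, new_classifier, reweight, T=5):
    """`ensemble` is a deque of weak classifiers, oldest first.
    Drop the oldest, re-weight the survivors, append the new one."""
    if len(ensemble) == T:
        ensemble.popleft()          # remove the oldest weak classifier
    reweight(ensemble)              # re-train weights of the survivors
    ensemble.append(new_classifier) # add the newly trained classifier
    return ensemble
```

The FIFO discipline means no single frame's training data dominates the ensemble for long, which is what lets the tracker adapt to appearance change.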
7. Some Minor Notes
- We used an outlier rejection scheme that sometimes threw out nearly every sample, so we limit the number of samples that can be removed
- We give classifiers zero weight if they have an error greater than 0.45
- If too many classifiers have zero weight during the update, we train an extra classifier to ensure there are enough good classifiers to track the object in the next frame
8. Ensemble Tracking: Results
9. Ensemble Tracking: Failings
- Difficult to represent the overall object using such tiny (5x5 pixel) regions
- The typical mean-shift problem of settling into a local maximum, which produces poor tracking results
- Imperfect tracking is made worse because we then train a new classifier on background regions
10. Part Tracking: A New Approach
- Try to find the parts of an object (arm, leg, hood, wheel) and keep a list of these parts
- Build an object template based on the spatial relations between these parts
- Track the object in future frames by sliding the template and finding the best match
- Update the template by removing either the oldest or poorly performing parts and training replacement parts
11. The Feature Types
- Maintain two separate part ensembles: one of HoG features and one of color features
- Each list has the same number of parts
- HoG features can be of any size and have 9 bins
- Color features are 4x4x4 histograms
12. The Part Representation
- Each part contains: a feature vector; the position of the part relative to the first part in the list; an age; and an average Euclidean distance between its feature vector and similar-sized parts in the background (used for evaluating performance and weighting)
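The fields listed above map naturally onto a small record type; a sketch with illustrative names:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Part:
    """One entry in a part ensemble, mirroring the fields on the
    slide. Field names are illustrative, not the 2006 code's."""
    feature: np.ndarray   # HoG or color histogram for the part
    offset: tuple         # (dx, dy) relative to the first part in the list
    age: int              # frames since the part was trained
    bg_distance: float    # mean Euclidean distance to similar-sized
                          # background parts; used as the part's weight
```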
13. Choosing the Best Part
- Perform an exhaustive search over every possible part
- Calculate the average Euclidean distance between the proposed part and the background
- Choose the part with the highest average distance
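The selection rule can be sketched directly, with feature vectors as NumPy arrays; how the candidate and background parts are sampled is left open here:

```python
import numpy as np

def choose_best_part(candidates, background):
    """Pick the most discriminative part: for each candidate feature
    vector, average its Euclidean distance to a set of same-sized
    background feature vectors, and keep the candidate with the
    highest average distance."""
    best, best_score = None, -1.0
    for cand in candidates:
        score = np.mean([np.linalg.norm(cand - bg) for bg in background])
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

A part that looks like the background scores near zero and is never chosen; the winner is the patch the background is least able to imitate.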
14. Tracking Using the Part Ensemble
- Slide the parts as a whole and find the best match based on a weighted vote of all parts
- Similar to template matching, where the template is generated on-the-fly
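Scoring one candidate placement of the template can be sketched as below; `frame_features` is a hypothetical accessor returning the feature vector at a pixel, and each part is a (feature, offset, weight) tuple:

```python
import numpy as np

def match_score(frame_features, parts, origin):
    """Weighted-vote score for placing the part template at `origin`:
    each part compares its stored feature with the feature extracted
    at its offset from the origin, weighted by the part's background
    distance. Lower total = better match; the tracker would minimize
    this over candidate origins."""
    score = 0.0
    for feat, (dx, dy), w in parts:
        f = frame_features(origin[0] + dx, origin[1] + dy)
        score += w * np.linalg.norm(f - feat)
    return score
```

Weighting by background distance means the most discriminative parts dominate the vote, so a few occluded or noisy parts do not immediately pull the match away.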
15. Updating the Part Ensemble
- Remove the oldest part
- Train a new part using the exhaustive search method discussed previously
- We also tried removing poorly performing (low-weight) parts, but results degraded
- Poorly performing parts only last several frames before they become the oldest and are removed
16. Results of the Part Tracker
17. Failure of the Part Tracker
- Features are too sparse; it is difficult to track using just a few weak, unstable features
- Difficult to handle partial occlusions: if the majority of parts become occluded quickly (in fewer than N/2 frames), we are unable to track the object
- The drifting problem again: how do we know when it is OK to train a new part, and when we are training on background introduced by drifting?
18. Mean Shift Tracking
- Obtain the mean-shift vector y by maximizing the Bhattacharyya coefficient \rho(p(y), q) = \sum_u \sqrt{p_u(y)\, q_u}, which is equivalent to minimizing the distance d(y) = \sqrt{1 - \rho(p(y), q)}
- A first-order Taylor expansion around the current histogram p(y_0) gives \rho(p(y), q) \approx \frac{1}{2} \sum_u \sqrt{p_u(y_0)\, q_u} + \frac{1}{2} \sum_u p_u(y) \sqrt{q_u / p_u(y_0)}
- The first term is independent of y, so we only need to maximize the second term
- \sqrt{p_u(y)\, q_u} is the Bhattacharyya coefficient for a single bin u
19. The Bhattacharyya Coefficient
- Compares bin i from distribution A with bin i from distribution B, so only corresponding bins are matched
- So the distance between two distributions is d(p, q) = \sqrt{1 - \sum_i \sqrt{p_i q_i}}
20. The Earth Mover's Distance
- Compares bins in distribution A with nearby bins in distribution B
- Allows close matches; not as strict as the Bhattacharyya coefficient
- EMD(p, q) = \frac{\sum_i \sum_j c_{ij} f_{ij}}{\sum_j y_j}, where c_{ij} is a cost function (the distance between histogram bins), f_{ij} is the amount of flow from bin i to bin j, and y_j is the total amount of flow to bin j (a normalization factor)
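The general EMD requires solving a transportation problem, but for 1D histograms with unit-spaced bins the optimal flow has a closed form via cumulative sums; a sketch of that special case:

```python
import numpy as np

def emd_1d(p, q):
    """Earth Mover's Distance between two normalized 1D histograms
    whose adjacent bins are one unit apart: the summed absolute
    difference of the cumulative histograms."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))
```

Shifting all the mass by one bin gives an EMD of 1 and by two bins gives 2, whereas the Bhattacharyya distance is maximal in both cases; this is the "close match" behavior the slide describes.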
21. Combining EMD with Mean Shift
- The original problem of maximizing the Bhattacharyya coefficient becomes one of minimizing the EMD between the target and candidate histograms, with the bin-to-bin cost c_{ij} given by D(x, y), the Euclidean distance function
- \sum_i c_{iu} f_{iu} is the EMD for a single bin u
22. Results of the EMD-MS Tracker
23. Comparisons and Conclusions
- No ground truth, so we cannot make an absolute comparison, only a subjective one
- The part-based tracker tends to achieve better localization than the Ensemble Tracker, and the length of time the object can be tracked before being lost is roughly equal
- The part-based tracker has fewer user-defined parameters and is more ad hoc; the Ensemble Tracker was developed by several people and refined
- EMD-MS tracks for more frames than both the Ensemble Tracker and the part-based tracker, but suffers from high-speed, small-scale drifting (i.e., it jitters)
- Frame rates: EMD-MS 25 Hz, Ensemble Tracker 10 Hz, Part Tracker 7 Hz (?)
- Tests were performed over 18 video sequences
24. References
- S. Avidan. Ensemble Tracking. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005.
- N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2005.
- Q. Zhu, S. Avidan, M.C. Yeh and K.T. Cheng. Fast Human Detection Using a Cascade of Histograms of Oriented Gradients. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), New York, NY, 2006.
- P. Viola and M. Jones. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2001.
- D. Comaniciu, V. Ramesh and P. Meer. Real-Time Tracking of Non-Rigid Objects Using Mean Shift. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Hilton Head Island, SC, 2000.
- Y. Rubner, C. Tomasi and L.J. Guibas. A Metric for Distributions with Applications to Image Databases. In Proc. IEEE International Conference on Computer Vision (ICCV), Bombay, India, 1998.
- D. Wojtaszek and R. Laganière. Tracking and Recognizing People in Colour Using the Earth Mover's Distance. IEEE International Workshop, 2002.