Title: I. Introduction
1Computer Science
I. Introduction
3. Experiments and Results
Problem Definition Given a sequence of tracked
2D feature locations, find the best matching 3D
motion capture sequence from an archive.
Motivation Hand signals are commonly used for
communication in noisy environments or when
people are out of voice range. Examples include
directing an airplane to the runway for takeoff,
controlling traffic flow, and basketball referee
signals.
Contribution No direct 3D structure estimation
is needed. The most relevant work is Parameswaran
and Chellappa [1]. We propose a simpler
alternative to the 2D-3D motion matching problem
that also offers viewpoint invariance.
Figure 1 Basketball Referee Hand Signal
Assumptions We focus on the recognition part of
the algorithm, thus we assume that the video
sequence has been temporally segmented and the
desired 2D feature locations can be reliably
tracked over the whole sequence. Within each
sequence of 2D features, we further assume that
there is only one hand signal.
Data 45 motion capture sequences of basketball
referee gestures from http://mocap.cs.cmu.edu. 2D
image features were synthesized from the 3D
motion capture sequences using a frontal view and
scaled to unit height. Approximately half of the
data were used as prototypes in the archive and
the other half were used for testing.
Why a 3D motion capture archive? The representation
is more complete than a 2D representation, as
there is no need to sample the motion from
multiple views.
Classifier Experiments are conducted using the
nearest neighbor classifier. Hence, given a
sequence of 2D features, the 3D motion sequence
with the lowest alignment score is deemed the
best match.
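The nearest-neighbor rule can be sketched as follows, assuming the alignment score is the accumulated cost of a textbook DTW alignment. The names `dtw_score`, `nearest_neighbor` and `frame_cost` are hypothetical, and the toy data uses scalar frames with an absolute-difference cost in place of the real 2D-3D frame dissimilarity.

```python
import math


def dtw_score(seq_a, seq_b, frame_cost):
    """Accumulated cost of the cheapest monotonic alignment of two sequences."""
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] = cheapest cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def nearest_neighbor(query, archive, frame_cost):
    """Return the label of the archived sequence with the lowest DTW score."""
    return min(archive, key=lambda label: dtw_score(query, archive[label], frame_cost))


# Toy stand-in data (an assumption for illustration): scalar "frames".
archive = {"slow_ramp": [0, 0, 1, 1, 2, 2, 3, 3], "flat": [1, 1, 1, 1]}
query = [0, 1, 2, 3]
best = nearest_neighbor(query, archive, lambda a, b: abs(a - b))
```

Here the query is a faster version of `slow_ramp`, so the warping absorbs the speed difference and that prototype is selected.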
2. Algorithm Overview
2D vs 3D sequence alignment using Dynamic Time Warping
Description of Experiments Three sets of
experiments are conducted with different sets of
features at different noise levels. The first
experiment uses all 31 features shown in Fig 1,
with increasing noise. The second experiment uses
a more realistic set of feature points, indicated
by the shaded points in Fig 1. The last
experiment uses only the shaded points in the upper
body of Fig 1.
Why DTW? The algorithm provides an optimal
alignment between sequences, so we do not have
to worry about variations in the speed of the
Significance of Noise Parameter In the
synthesized images, the person is of unit height.
If we are tracking a person 300 pixels tall,
an error margin of 0.06 in normalized coordinates
simulates a tracker that reports tracked points
within a 36-pixel radius 95% of the time.
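The arithmetic behind this figure can be checked with a short sketch. It assumes that 0.06 is the standard deviation of the added noise and that the 95% figure corresponds to two standard deviations; the poster states neither explicitly, and the function name is hypothetical.

```python
def noise_radius_pixels(sigma_normalized, person_height_px, n_sigma=2.0):
    """Convert a noise level in unit-height coordinates to a pixel radius.

    Assumption: sigma_normalized is a standard deviation, and n_sigma = 2
    stands for the roughly 95% band of a Gaussian.
    """
    return n_sigma * sigma_normalized * person_height_px


# 2 * 0.06 * 300: about 36 pixels for a person 300 pixels tall.
radius = noise_radius_pixels(0.06, 300)
```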
Dissimilarity score Given at least six pairs of
2D to 3D correspondences in a frame, the
projection matrix M can be estimated. Given M,
the back-projection error of the 3D points is
used as the dissimilarity score.
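A minimal sketch of this per-frame score, assuming M is estimated with the standard Direct Linear Transform (DLT) and that the error is the mean Euclidean distance between the projected 3D points and the observed 2D points. The poster does not say which estimator or error norm was used, so both choices, the function names, and the synthetic camera below are assumptions. Each correspondence contributes two linear equations, which is why six pairs suffice for the 11 degrees of freedom of M.

```python
import numpy as np


def estimate_projection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix M from at least six 2D-3D pairs (DLT)."""
    pts3d = np.asarray(pts3d, dtype=float)
    pts2d = np.asarray(pts2d, dtype=float)
    if len(pts3d) < 6 or len(pts3d) != len(pts2d):
        raise ValueError("need at least six matching 2D-3D pairs")
    rows = []
    for (x, y, z), (u, v) in zip(pts3d, pts2d):
        p = [x, y, z, 1.0]
        # Each pair gives two linear equations in the 12 entries of M.
        rows.append(p + [0.0] * 4 + [-u * c for c in p])
        rows.append([0.0] * 4 + p + [-v * c for c in p])
    # Least-squares solution: the right singular vector that belongs to
    # the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)


def reprojection_error(M, pts3d, pts2d):
    """Mean distance between the projected 3D points and the 2D points."""
    pts3d = np.asarray(pts3d, dtype=float)
    pts2d = np.asarray(pts2d, dtype=float)
    homog = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = homog @ M.T
    proj = proj[:, :2] / proj[:, 2:3]  # perspective divide
    return float(np.mean(np.linalg.norm(proj - pts2d, axis=1)))


# Synthetic, noise-free frame: 8 random 3D points seen by a made-up camera.
rng = np.random.default_rng(0)
pts3d = rng.uniform(-1.0, 1.0, size=(8, 3))
camera = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 5.0]])
homog = np.hstack([pts3d, np.ones((8, 1))]) @ camera.T
pts2d = homog[:, :2] / homog[:, 2:3]
score = reprojection_error(estimate_projection(pts3d, pts2d), pts3d, pts2d)
```

Because the 2D points here are an exact projection of the 3D points, the score is close to zero. A 3D frame in a different pose would generally leave a larger residual, which is what makes the error usable as a dissimilarity.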
Future Work Currently there are no temporal
constraints on computing the projection matrix
from frame to frame. Temporal consistency can be
enforced during the matching process to improve
Figure 2 An example of a DTW matching
2D vs 3D alignment Once we are able to compute
the dissimilarity between a frame of 2D features
and a frame of 3D features, the Dynamic Time
Warping (DTW) algorithm can proceed as usual. The
DTW algorithm finds the optimal alignment by
minimizing the total dissimilarity cost.
4. References
[1] V. Parameswaran and R. Chellappa. View
invariants for human action recognition. In CVPR