Title: Interactive Control of Avatars Animated with Human Motion Data
1Interactive Control of Avatars Animated with
Human Motion Data
- By Jehee Lee, Jinxiang Chai, Paul S. A. Reitsma,
Jessica K. Hodgins, Nancy S. Pollard
- Presented by Nathan Hoobler
2Why do we use motion capture?
- Get realistic behavior for free
- An easy interface for generating control for high
DOF models
- Can capture behavior far too complicated to model
by hand
- Kung Fu, Acrobatics, other stylized motion
3What is the problem with motion capture?
- Motion capture data is inherently complicated
- Usually far more degrees of freedom than can be
easily controlled by hand
- Not trivial to synthesize new behaviors
- Transitions between different types of motion are
hard
- Often there are redundant behaviors
4What does this paper do?
- Identify distinct behaviors in the motion capture
data
- Allow intuitive control of high DOF data with a
small DOF interface
- Allow seamless transitions between different
behaviors
5System Overview
- Loosely-patterned data comes in
- A probabilistic transition matrix is built
- Simplified transition graph is used to determine
motion
6System Overview
7What kind of data can we use?
- Long, consistent motion recordings are required
for good transition generation
- Does not handle sensor noise well
8System Overview
- Various datasets come in
- Low-Level transitions are generated
9Low-Level Representation
- At this level, the system is very similar to the
Video Textures technique
- For each frame, find any other frames in the
dataset that are similar
- Calculate the probability of a transition from
frame j to frame k based on how closely the two
frames match
10Low-Level Building the Matrix
- The probability of transitioning from frame i to
frame j is computed as
[equation image not preserved]
where D(i, j) is the weighted distance from
frame i to frame j
and d(p_i, p_j) is
[equation image not preserved]
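As a rough illustration of this slide, the sketch below turns a frame-to-frame distance matrix into transition probabilities with an exponential falloff, in the style of Video Textures. The function name, the `sigma` parameter, the exact form P[i][j] proportional to exp(-D[i][j] / sigma), and the toy distances are all assumptions made for illustration; they are not taken from the slides.

```python
import math

def transition_probabilities(dist, sigma=1.0):
    """Map an n x n distance matrix to row-normalized transition probabilities.

    Assumption: P[i][j] is proportional to exp(-dist[i][j] / sigma), so
    smaller distances give likelier transitions.
    """
    probs = []
    for row in dist:
        weights = [math.exp(-d / sigma) for d in row]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs

# Toy distances between three frames (hypothetical values).
D = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
P = transition_probabilities(D, sigma=1.0)
```

A smaller `sigma` concentrates probability on the closest matches; a larger one spreads it across more candidate frames.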
13So, how efficient is this?
- Since the matrix is just a 2D mapping from any
one frame to any other, the number of transitions
is O(n^2)
- For 4000-12000 frames per dataset (!)
- We need to reduce the number of transitions
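To put numbers on the quadratic growth, the short sketch below counts the entries of a dense transition matrix for the dataset sizes quoted on the slide. The helper name is hypothetical.

```python
def dense_transition_count(n_frames):
    """Number of entries in a dense n x n frame-to-frame transition matrix."""
    return n_frames * n_frames

for n in (4000, 12000):
    # 4000 frames -> 16,000,000 entries; 12000 frames -> 144,000,000 entries.
    print(n, dense_transition_count(n))
```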
14Low-Level Pruning
- We can take advantage of a few useful features of
the Motion Capture data
- Contact with the world should be similar between
transitioning frames
- Any interesting data is going to have mostly
low-probability transitions
- There are many frames that are very similar to
others
- We want to avoid going down dead-end routes
15Low-Level Pruning (Contact)
- Criterion 1: Contact
- Even if frames are very similar, do not
transition if the contact states are different
- (Strict interpretation) Only allow transitions
during contact states
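A minimal sketch of the contact criterion, assuming each frame carries a label describing which body parts touch the environment. The function name and the `contacts` representation (a hashable value per frame) are hypothetical.

```python
def prune_by_contact(probs, contacts):
    """Zero out transitions between frames whose contact states differ.

    probs is an n x n matrix of transition probabilities; contacts[i] is a
    hashable description of frame i's contacts, e.g. frozenset({"left_foot"}).
    """
    n = len(probs)
    return [
        [probs[i][j] if contacts[i] == contacts[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

# Toy example: frame 0 has the left foot planted, frame 1 is airborne.
contacts = [frozenset({"left_foot"}), frozenset()]
pruned = prune_by_contact([[0.5, 0.5], [0.5, 0.5]], contacts)
```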
16Low-Level Pruning (Likelihood)
- Criterion 2: Likelihood
- Throw away transitions whose probability is less
than some threshold value
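The likelihood criterion amounts to a simple threshold on the matrix. In this sketch the function name and the threshold value are assumptions; the slides do not say what threshold was used.

```python
def prune_by_likelihood(probs, threshold):
    """Discard (set to zero) every transition whose probability is below threshold."""
    return [[p if p >= threshold else 0.0 for p in row] for row in probs]

# Toy row: only the 0.6 and 0.3 transitions survive a 0.2 threshold.
pruned = prune_by_likelihood([[0.6, 0.3, 0.1]], threshold=0.2)
```

Rows are not renormalized here; whether to renormalize after pruning is a separate design choice.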
17Low-Level Pruning (Similarity)
- Criterion 3: Similarity
- If a frame has many transitions to states that
are all very similar to each other as well, throw
away all but the best fitting transition
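One way to sketch the similarity criterion is to assume that mutually similar target frames are adjacent in time, so they show up as a run of consecutive nonzero entries in a row of the matrix. Under that assumption (which the slides do not state), keeping the best entry of each run leaves a single representative transition.

```python
def prune_by_similarity(row):
    """Keep only the best transition in each run of consecutive nonzero targets.

    Assumption: similar target frames are neighbors in time, so a run of
    nonzero entries is treated as one group of near-identical transitions.
    """
    pruned = [0.0] * len(row)
    j = 0
    while j < len(row):
        if row[j] == 0.0:
            j += 1
            continue
        start = j
        while j < len(row) and row[j] > 0.0:
            j += 1
        # Indices start..j-1 form one run; keep its most probable entry.
        best = max(range(start, j), key=lambda k: row[k])
        pruned[best] = row[best]
    return pruned

row = [0.0, 0.1, 0.3, 0.2, 0.0, 0.05, 0.4]
kept = prune_by_similarity(row)
```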
18Low-Level Pruning (SCC)
- Criterion 4: Connectedness
- In theory, we want to avoid transitions that
don't lead to well-connected nodes
- Only add transitions that remain within the
largest Strongly Connected Component of the graph
- A maximal subgraph of a directed graph such that
for every pair of vertices u, v in the subgraph,
there is a directed path from u to v and a
directed path from v to u. (Mathworld)
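The connectedness criterion can be sketched with any strongly-connected-components algorithm. The version below uses Kosaraju's two-pass algorithm; that choice, the adjacency-list representation, and the function names are assumptions, since the slides do not say how the component was computed.

```python
def largest_scc(graph):
    """Return the node set of the largest strongly connected component.

    graph maps each node to a list of successors. Uses Kosaraju's two-pass
    algorithm with explicit stacks to avoid recursion limits.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: record nodes in order of depth-first completion.
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in reverse completion order.
    reverse = {n: [] for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].append(src)

    assigned, best = set(), set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = {root}, [root]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        if len(component) > len(best):
            best = component
    return best

def prune_to_largest_scc(graph):
    """Drop every node and transition that leaves the largest SCC."""
    keep = largest_scc(graph)
    return {n: [t for t in graph.get(n, ()) if t in keep] for n in keep}

# Frames 0-2 form a cycle; frames 3 and 4 are a dead end.
g = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: []}
pruned = prune_to_largest_scc(g)
```

After pruning, every remaining frame can reach every other remaining frame, so playback can never get stuck in a dead end.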
20Low-Level Blending
- Need interpolation to avoid discontinuities
- Problem: sharp changes are allowed at contact
points
- Solution: use a non-linear blend function
centered on the contact point and a moving average
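A minimal sketch of a non-linear blend across a transition window. The smoothstep weight is a stand-in chosen for illustration; the slides do not name the blend function that was used. Poses are treated as lists of scalar joint values, which is a simplification (joint rotations are usually blended as quaternions), and the centering on the contact point and the moving average are not modeled.

```python
def ease_in_out(t):
    """Smoothstep weight: 0 at t=0, 1 at t=1, with zero slope at both ends."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)

def blend_transition(outgoing, incoming):
    """Blend two equal-length windows of poses frame by frame.

    The weight moves from the outgoing motion (first frame) to the
    incoming motion (last frame) along the ease-in-out curve.
    """
    n = len(outgoing)
    blended = []
    for f in range(n):
        w = ease_in_out(f / (n - 1)) if n > 1 else 1.0
        blended.append(
            [(1.0 - w) * a + w * b for a, b in zip(outgoing[f], incoming[f])]
        )
    return blended

# One joint, five frames: the value eases from 0 toward 10.
result = blend_transition([[0.0]] * 5, [[10.0]] * 5)
```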
21Low-Level Blending
- Case 1: Follow the incoming frame
- Case 2: Follow the outgoing frame
- Case 3: Choose the side closest to the contact
point
- Case 4: Just let the foot slide; it'll look bad
no matter what
22Low-Level Coordinate System
- Fixed/Global versus Relative
- Each has an advantage, depending on the situation
- The paper uses both, depending on the example
23Fixed/Global Coordinates
- Advantages
- Good for spatial data (the recording environment
corresponds strongly with the simulated
environment)
- Disadvantages
- Not good for synthesizing motion in new
environments
24Relative Coordinates
- Advantages
- Much easier to synthesize motions from anywhere
in the environment into new behaviors
- Disadvantages
- Ignores orientation and position in three-space,
which may be important for some actions
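To make the fixed-versus-relative distinction concrete, the sketch below re-expresses a ground-plane point in a frame attached to the character's root. The function name, the 2D (x, z) simplification, and the heading convention (rotation about the vertical axis, in radians) are assumptions for illustration.

```python
import math

def to_relative(point, root_position, root_heading):
    """Express a ground-plane point (x, z) relative to the character's root.

    The point is translated by the root position and then rotated by the
    negative of the root heading, so global placement drops out.
    """
    dx = point[0] - root_position[0]
    dz = point[1] - root_position[1]
    c, s = math.cos(-root_heading), math.sin(-root_heading)
    return (c * dx - s * dz, s * dx + c * dz)

# The same offset from the root gives the same relative coordinates
# no matter where the character stands in the world.
a = to_relative((1.0, 0.0), (0.0, 0.0), 0.0)
b = to_relative((6.0, 5.0), (5.0, 5.0), 0.0)
```

This is what makes relative coordinates convenient for reusing motion in new places, and also why they discard absolute position and orientation.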
25High-Level Representation
- Low-level representation is far too complicated
to interact with
- Simplify the data by grouping like frames into
clusters
- For each frame, find the possible clusters that
can be transitioned to in the near term
26High-Level Representation
- Various datasets come in
- Low-Level transitions are generated
- Frames are grouped into clusters
27Building Clusters
- We want a simplified data set
- Weight important joints (arms, legs, pelvis,
etc.) high
- Weight less important joints (neck, etc.) low
- Using weighted values, find similar frames and
group them into clusters
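The weighting idea can be sketched with a weighted squared distance and a simple greedy grouping. The greedy scheme is a stand-in chosen for brevity and is not necessarily the clustering method used in the paper; the function names, the `radius` parameter, and the toy data are hypothetical.

```python
def weighted_distance(frame_a, frame_b, weights):
    """Weighted squared distance between two frames of scalar joint values."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, frame_a, frame_b))

def cluster_frames(frames, weights, radius):
    """Greedy clustering: a frame joins the first cluster whose representative
    is within `radius`; otherwise it starts a new cluster."""
    representatives, labels = [], []
    for frame in frames:
        for cid, rep in enumerate(representatives):
            if weighted_distance(frame, rep, weights) <= radius:
                labels.append(cid)
                break
        else:
            representatives.append(frame)
            labels.append(len(representatives) - 1)
    return labels

# Two joints per frame: the first is weighted high, the second is ignored.
frames = [[0.0, 0.0], [0.1, 5.0], [3.0, 0.0], [3.1, 9.0]]
labels = cluster_frames(frames, weights=[1.0, 0.0], radius=0.5)
```

With the second joint weighted at zero, large differences in it do not split the clusters, which mirrors the low weight given to less important joints.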
28High-Level Representation
- Various datasets come in
- Low-Level transitions are generated
- Frames are grouped into clusters
- A transition tree is built for each frame
29Building the Cluster Forest
- Each frame has a tree of clusters representing
its valid transitions
- Find the most probable transition from the
current frame to another cluster
- If the number of frames required to reach that
cluster is within a time threshold, add it to the
forest
- Repeat
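A simplified sketch of the time-threshold test: starting from one frame, a breadth-first search over the pruned transition graph collects every other cluster that can be reached within a fixed number of frames. It ignores transition probabilities and flattens the tree into a dictionary, so it illustrates the idea rather than the paper's procedure. The function name and data layout are hypothetical.

```python
from collections import deque

def reachable_clusters(start, transitions, labels, max_frames):
    """Return {cluster: fewest frames needed} for clusters other than the
    start frame's own cluster that are reachable within max_frames steps.

    transitions maps a frame index to its successor frames; labels[i] is
    the cluster of frame i.
    """
    found = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        frame, steps = queue.popleft()
        cluster = labels[frame]
        if cluster != labels[start] and cluster not in found:
            found[cluster] = steps
        if steps == max_frames:
            continue  # Time threshold reached; do not expand further.
        for nxt in transitions.get(frame, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return found

# Four frames in a chain; frames 0-1 are cluster 0, then clusters 1 and 2.
transitions = {0: [1], 1: [2], 2: [3], 3: []}
labels = [0, 0, 1, 2]
near = reachable_clusters(0, transitions, labels, max_frames=2)
```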
30Caveats about Clustering
- Clustering is not always extremely useful
- Mostly a user interface issue
- Useful for directly selecting the next motion
(Direct Choice)
- Not as useful for procedurally determining
behavior (Path Sketching, Mimic)
31Control Methods
- Several interface methods were used, depending on
how well they suited the example
- Direct Choice
- Sketching
- Video-Capture
32Direct Choice
- Display valid states for the avatar, and let the
user choose
33Path Sketching
- Allow the user to specify a path to follow
- Find motions that will put the avatar in the
right place
34Video Mimic
- Determine limb and body orientation from video
input
- Find closest matching frame(s), and imitate the
user
35Results
- Terrain: Path Sketching
- Step Stool: Path Sketching, Direct Choice
- Playground: Direct Choice
36Any Questions?