Title: Model-Based Hand Tracking with Texture, Shading and Self-Occlusions
1. Model-Based Hand Tracking with Texture, Shading and Self-Occlusions
- Martin de La Gorce
- Nikos Paragios
- David J. Fleet
2. Goal
- Recover 3D hand pose from monocular video.
[Figure panels: Input, 3D Model]
3. Challenges
- Search
- Hands have 30 degrees of freedom
- Fast accelerations make prediction difficult at 30 fps
- Depth uncertainty and reflection ambiguities exist
- Image measurement
- Parts of the hand have similar colors
- Surface texture is limited
- Edges are often ambiguous
- Self-occlusions are ubiquitous
4. Challenges
- E.g., edge and silhouette measurements often leave depth uncertainty and reflection ambiguities unresolved.
[Figure panels: input, edges, color / silhouette]
5. Previous Methodologies
- Stenger et al (2004)
- efficient pruning in pose space
- robust but not accurate
- Sudderth et al (2004)
- loopy graphical model
- occlusions difficult to model
- Common limitations
- optimization remains challenging
- silhouette edges used, but not shading
- Lu et al (2003)
- silhouette, edges and optical flow
- optical flow requires small motions
- difficult to weight cues properly (so independence is assumed)
6. Our Approach
- Improved generative model
- an articulated skeleton overlaid with deformable soft tissue
- inclusion of shading and texture information (no ad hoc terms in the likelihood function)
- Effective optimization
- gradient-based quasi-Newton energy minimization
7. Our Approach
- Parameters: hand pose q, lighting L and texture T
- Generative model: produces the expected image, with an energy measure E (negative log likelihood)
- Estimation problem: find the parameters (q, L, T) that minimize the energy E.
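The estimation problem above can be illustrated with a toy sketch. It assumes independent Gaussian pixel noise, so that the negative log likelihood reduces to a sum of squared residuals up to a constant; the slides do not state the noise model. The 1D renderer `render`, the bar-shaped "hand" and the exhaustive search are hypothetical stand-ins for the real generative model and optimizer. Only the product of L and T is observable in this toy, so T is held fixed.

```python
# Toy 1D "image": a bright bar of width 3 on a dark background.
# `render` is a hypothetical stand-in for the generative model: q is the bar's
# left edge (pose), L a brightness scale (lighting), T the bar's albedo (texture).
def render(q, L, T, width=10, bar=3):
    return [L * T if q <= x < q + bar else 0.0 for x in range(width)]

def energy(q, L, T, observed):
    # Sum of squared residuals: the negative log likelihood under Gaussian
    # pixel noise, up to a scale factor and an additive constant (assumption).
    synthetic = render(q, L, T, width=len(observed))
    return sum((s - o) ** 2 for s, o in zip(synthetic, observed))

# Observation generated by the model itself, so the true parameters are known.
observed = render(4, 0.8, 1.0)

# Exhaustive search over a small grid, standing in for the real optimizer.
candidates = [(q, L / 10, 1.0) for q in range(8) for L in range(1, 11)]
best = min(candidates, key=lambda p: energy(*p, observed))
print(best)
```

The point of the sketch is only that every image cue enters through one residual between a synthesized and an observed image, so no separately weighted terms are needed.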
8. Rest of the talk
- Model details
- hand geometry
- shading and texture
- rendering
- Online tracking and optimization
- form of the energy function
- energy gradient (despite occlusions)
- constrained optimization and initial guess
- Experimental results
9. Hand Geometry
- Skeleton: 17 bones, 28 DOFs (22 joint angles + 6 global), with joint limits
- 3 scaling parameters per bone to adapt morphology
- Surface: 1000 triangular facets
- Skeleton pose controls vertex displacements (Lewis et al 2000)
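One common way for a skeleton pose to drive vertex positions is linear blend skinning (skeleton subspace deformation), where each vertex is a weighted average of where each bone would carry it. The 2D sketch below assumes that scheme; the slides do not say which skinning variant the authors use, and the two-bone setup, the weights and the function names are illustrative only.

```python
import math

def rotate_about(point, pivot, angle):
    """Rotate a 2D point about a pivot by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + c * dx - s * dy, pivot[1] + s * dx + c * dy)

def skin_vertex(rest_vertex, weights, joint, angle):
    """Blend the positions that two bones assign to one vertex.

    Bone 0 is fixed; bone 1 rotates by `angle` about `joint`.
    `weights` = (w0, w1) are assumed to sum to 1.
    """
    moved_by_bone0 = rest_vertex  # bone 0 does not move
    moved_by_bone1 = rotate_about(rest_vertex, joint, angle)
    w0, w1 = weights
    return (w0 * moved_by_bone0[0] + w1 * moved_by_bone1[0],
            w0 * moved_by_bone0[1] + w1 * moved_by_bone1[1])

joint = (1.0, 0.0)
bend = math.pi / 2  # 90-degree flexion of bone 1

tip = skin_vertex((2.0, 0.0), (0.0, 1.0), joint, bend)   # follows bone 1 rigidly
near = skin_vertex((1.2, 0.0), (0.5, 0.5), joint, bend)  # blended near the joint
print(tip, near)
```

Because the blended position is a smooth function of the joint angle, vertex positions stay differentiable with respect to pose, which gradient-based minimization relies on.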
10. Shading and Texture
- Shading
- Distant point source + ambient light
- Lambertian reflectance
- Texture
- Albedo image mapped to surface mesh
- Captures surface texture and residual properties of hand appearance (e.g., transient wrinkles)
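The shading model above can be sketched as follows: under Lambertian reflectance with a distant point source plus ambient light, intensity is the albedo times the sum of the ambient term and the clamped cosine between the surface normal and the light direction. The function and parameter names are hypothetical, and intensities are scalar (grayscale) for brevity.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def shade(albedo, normal, light_dir, light_intensity, ambient):
    """Lambertian intensity: albedo * (ambient + directional * max(0, n . l))."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_angle = max(0.0, sum(a * b for a, b in zip(n, l)))
    return albedo * (ambient + light_intensity * cos_angle)

# A facet facing the light, one at a grazing angle, and one facing away.
facing = shade(0.6, (0, 0, 1), (0, 0, 1), light_intensity=0.8, ambient=0.2)
grazing = shade(0.6, (1, 0, 0), (0, 0, 1), light_intensity=0.8, ambient=0.2)
away = shade(0.6, (0, 0, -1), (0, 0, 1), light_intensity=0.8, ambient=0.2)
print(facing, grazing, away)
```

The lighting parameters enter the intensity linearly here, which is consistent with the gradient with respect to lighting being easy to compute.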
11. Image Synthesis
- Hand Rendering
- Perspective projection
- Hidden surface removal
- Background
- median image from several frames for static camera
- RGB density function p_bg(I) otherwise
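For the static-camera case, the median background image can be sketched as a per-pixel median over several frames. The example uses tiny grayscale frames stored as nested lists; the function name and the data are illustrative.

```python
from statistics import median

def median_background(frames):
    """Per-pixel median over a list of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]

# Three 1x4 frames; a bright "hand" pixel (200) covers a different
# background pixel in each frame.
frames = [
    [[10, 200, 12, 11]],
    [[10, 11, 200, 11]],
    [[10, 11, 12, 200]],
]
background = median_background(frames)
print(background)
```

The median discards the moving hand as long as each pixel shows the background in more than half of the frames.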
12. Objective Function
Residual image
13. Objective Function
When the background is modeled through a density function over color space, p_bg, we re-express the residual as
[Equation; annotation: synthetic silhouette interior]
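One way to read this re-expression, offered as an assumption since the slide's equation is not reproduced here: if pixels outside the synthetic silhouette contribute -log p_bg(I(x)), then the total energy equals a sum over the silhouette interior of (model residual + log p_bg(I(x))), plus a constant that does not depend on the model parameters. The sketch below checks that identity numerically; all names and numbers are made up.

```python
import math

def full_energy(fg_residual, bg_prob, inside):
    """Sum over all pixels: model residual inside the silhouette,
    -log p_bg(I(x)) outside it."""
    return sum(r if s else -math.log(p)
               for r, p, s in zip(fg_residual, bg_prob, inside))

def silhouette_energy(fg_residual, bg_prob, inside):
    """Sum over silhouette pixels only of residual + log p_bg(I(x))."""
    return sum(r + math.log(p)
               for r, p, s in zip(fg_residual, bg_prob, inside) if s)

fg_residual = [0.3, 0.1, 0.2, 0.9, 0.4]   # residual if a pixel is explained by the hand
bg_prob = [0.05, 0.02, 0.01, 0.6, 0.7]    # p_bg evaluated at each observed color
constant = -sum(math.log(p) for p in bg_prob)  # independent of the silhouette

gaps = []
for inside in ([False, True, True, False, False],
               [True, True, False, False, False]):
    gaps.append(full_energy(fg_residual, bg_prob, inside)
                - silhouette_energy(fg_residual, bg_prob, inside))
print([g - constant for g in gaps])
```

Since the two energies differ only by a constant, minimizing over the silhouette interior alone gives the same estimate while touching far fewer pixels.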
14. Online Pose Tracking
- Initial frame
- Rough manual initialization with canonical pose
- Minimize E to find pose, lighting and morphological parameters, with texture map equal to mean skin color
- For each subsequent frame
- Initialize search by extrapolating previous estimates
- Minimize E to refine pose and lighting
- Update texture map given pose and lighting
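The per-frame initialization can be sketched with a constant-velocity extrapolation of the two previous pose estimates. This particular predictor is an assumption; the slides only say that previous estimates are extrapolated.

```python
def extrapolate(prev, prev_prev):
    """Constant-velocity prediction: q_t is approximately q_{t-1} + (q_{t-1} - q_{t-2})."""
    return [2 * a - b for a, b in zip(prev, prev_prev)]

# Two joint angles (radians) estimated at frames t-2 and t-1.
q_prev_prev = [0.8, 0.5]
q_prev = [1.0, 0.5]

# Initial guess for frame t, to be refined by minimizing E.
q_init = extrapolate(q_prev, q_prev_prev)
print(q_init)
```

A good initial guess matters because the search is local: fast hand accelerations can otherwise leave the starting point outside the basin of the correct minimum.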
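15. Gradient w.r.t. Lighting and Pose
The gradient of E with respect to lighting L is straightforward.
The gradient with respect to pose is complicated by occlusion boundaries, where the residual is discontinuous.
Nevertheless, it can be specified analytically ...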
16. 1D Example
Consider a 1D case in the vicinity of an occlusion boundary, where the residual function is discontinuous.
[Plots on the interval 0 to 1: Residual function, Energy function]
17. 1D Example
[Plots: Energy function, Energy derivative]
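The 1D example can be reproduced numerically under simplifying assumptions: the occluding surface covers [0, theta] with constant residual `r_front`, the occluded surface covers [theta, 1] with constant residual `r_back`, and the boundary moves with unit velocity as theta changes. All names and values are hypothetical. The integrated energy is differentiable even though the residual jumps at theta, and its derivative is the residual jump times the boundary velocity. An energy sampled at pixel centres, by contrast, is piecewise constant in theta.

```python
def energy_continuous(theta, r_front=0.2, r_back=0.7):
    """Exact integral over [0, 1] of the piecewise-constant residual."""
    return r_front * theta + r_back * (1.0 - theta)

def energy_hard_pixels(theta, n=10, r_front=0.2, r_back=0.7):
    """Pixel-sampled energy: each pixel centre is either front or back."""
    centres = [(i + 0.5) / n for i in range(n)]
    return sum(r_front if c < theta else r_back for c in centres) / n

def boundary_term(r_front=0.2, r_back=0.7, velocity=1.0):
    """Analytic derivative: residual jump across the boundary times its velocity."""
    return (r_front - r_back) * velocity

theta, h = 0.52, 1e-3
fd_continuous = (energy_continuous(theta + h) - energy_continuous(theta - h)) / (2 * h)
fd_hard = (energy_hard_pixels(theta + h) - energy_hard_pixels(theta - h)) / (2 * h)
print(fd_continuous, boundary_term(), fd_hard)
```

The finite difference of the continuous energy matches the analytic boundary term, while the pixel-sampled energy reports a zero slope between pixel centres, which is why the occlusion boundary needs explicit treatment in the gradient.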
18. Gradient w.r.t. Pose
In 2D the derivative of E w.r.t. pose becomes
[Equation; annotations: residual at occluded surface, residual at occluding surface, occlusion boundaries, boundary velocity, boundary normal]
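A generic form consistent with the annotations above follows from differentiating an integral over a moving domain (the Leibniz/Reynolds rule). It is a sketch under stated conventions, not necessarily the authors' exact expression: R is the residual, Gamma the set of occlusion boundaries, v = dx/dq the boundary velocity, and n the unit boundary normal, taken here to point from the occluding side toward the occluded side. The symbols and the sign convention are assumptions.

```latex
\frac{\partial E}{\partial q_j}
  = \int_{\Omega \setminus \Gamma} \frac{\partial R}{\partial q_j}(x)\, dx
  + \int_{\Gamma} \bigl[ R^{\text{occluding}}(x) - R^{\text{occluded}}(x) \bigr]\,
      \bigl( v_j(x) \cdot n(x) \bigr)\, ds
```

In 1D, with the occluding surface on [0, theta], n = +1 and v = 1, the second term reduces to the jump in the residual across the boundary.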
19. Optimization (Pose and Lighting)
- Optimization via Sequential Quadratic Programming
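To illustrate how joint limits constrain the search, the sketch below uses projected gradient descent with box constraints. This is a deliberately simpler stand-in, not Sequential Quadratic Programming: SQP solves a constrained quadratic subproblem at each iteration, whereas this sketch takes a gradient step and clamps the result to the limits. The toy energy, the limits and the step size are made up.

```python
def project(q, lower, upper):
    """Clamp each parameter to its [lower, upper] joint limit."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(q, lower, upper)]

def projected_gradient_descent(grad, q0, lower, upper, step=0.1, iters=200):
    """Gradient steps followed by projection onto the box of joint limits."""
    q = project(q0, lower, upper)
    for _ in range(iters):
        g = grad(q)
        q = project([v - step * gv for v, gv in zip(q, g)], lower, upper)
    return q

# Toy energy E(q) = (q[0] - 2)^2 + (q[1] + 1)^2, whose unconstrained
# minimum (2, -1) lies outside the joint limits.
target = [2.0, -1.0]

def grad(q):
    return [2 * (v - t) for v, t in zip(q, target)]

q_star = projected_gradient_descent(grad, [0.0, 0.0],
                                    lower=[0.0, -0.5], upper=[1.5, 0.5])
print(q_star)
```

Both parameters end on their limits, the closest feasible point to the unconstrained minimum for this separable energy.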
20. Texture Update
- Texture update by minimization of
- Smoothing term between neighboring texels
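A texture update of this kind can be sketched on a 1D row of texels. The assumed objective is a squared data term on visible texels plus a weighted squared difference between neighboring texels, minimized by Gauss-Seidel sweeps. The objective's exact form, the weight `smooth` and the function name are assumptions, since the slide's equation is not reproduced here.

```python
def update_texture(observed, smooth=0.1, sweeps=200):
    """Minimize sum over visible i of (t_i - o_i)^2
    + smooth * sum over i of (t_i - t_{i+1})^2 by Gauss-Seidel sweeps.

    `observed[i]` is None where texel i is not visible in the current frame.
    Assumes at least two texels and at least one visible texel.
    """
    visible = [o for o in observed if o is not None]
    mean = sum(visible) / len(visible)
    t = [mean if o is None else o for o in observed]
    n = len(t)
    for _ in range(sweeps):
        for i in range(n):
            neighbours = [t[j] for j in (i - 1, i + 1) if 0 <= j < n]
            data_w = 0.0 if observed[i] is None else 1.0
            data_v = 0.0 if observed[i] is None else observed[i]
            # Closed-form minimizer for t[i] with its neighbours held fixed.
            t[i] = ((data_w * data_v + smooth * sum(neighbours))
                    / (data_w + smooth * len(neighbours)))
    return t

# The two end texels are visible; the two middle texels are occluded.
texture = update_texture([0.5, None, None, 0.9])
print(texture)
```

The smoothing term lets visible texels propagate values into occluded ones, so the texture map stays defined where the hand is currently hidden.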
21. Experimental Results
Initialization with canonical hand pose, then optimization to find pose, lighting and morphological parameters
[Figure panels: First frame, Synthetic Image, Residual Image]
22. Experimental Results
[Figure panels: Synthetic Image, Input Image, Residual Image, Synthetic side view]
23. Stenger's Data with Pose-Space Reduction
[Figure panels: Synthetic Image, Input Image, Residual Image, Synthetic side view]
24. Stenger's Data without Pose-Space Reduction
[Figure panels: Synthetic Image, Input Image, Residual Image, Synthetic side view]
25. Stenger's Data with Pose-Space Reduction
[Figure panels: Synthetic Image, Input Image, Residual Image, Synthetic side view]
26. Stenger's Data without Pose-Space Reduction
[Figure panels: Synthetic Image, Input Image, Residual Image, Synthetic side view]
27. Experimental Results
28. Conclusion
- Encouraging results
- efficient use of image information
- efficient local gradient-based search
- Limitations
- local search
- single hypothesis tracking
- no prior on plausible poses (except linear constraints)