Title: Human Motion Capture
1Human Motion Capture
- Guest lecture by
- Thomas B. Moeslund
- Computer Vision and Media Technology
- Aalborg
2Agenda
- Part I General
- What is human motion capture?
- Different motion capture technologies
- Computer vision-based motion capture
- Part II Specific
- Motion capture of a human arm
3What is human motion capture?
- Motion Capture = MoCap
- TBM's definition:
- MoCap is the process of capturing the movements of an object at some resolution
- Digital representation
- "At some resolution": many movable parts in a human
4What is human motion capture?
- Often the main skeleton structure like arms or
legs, but MoCap of soft parts is also possible
5MoCap applications
- Group applications into three groups
- Analysis
- Diagnostics of motion disabilities, athletes
- Control
- Better HCI, special effects
- Surveillance
- Car park monitoring, shopping behaviors
Resolution
6MoCap technologies
- Accelerometer
- Gyroscopes
- Mechanical
- Acoustic
- Electromagnetic
- Optical
10Optical MoCap
12Natural optical MoCap
- The long-term goal, because:
- Controlled vs. uncontrolled environment
- Freedom of movement
- Non-invasive method
- We call this Computer Vision-Based Human MoCap
14CV-Based Human MoCap
- The motion is captured by doing the following for each image in a sequence:
- Find the pose parameters for each body part
- Pose parameters = 3D translation + 3D rotation
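As a rough illustration of what "pose parameters" means here, the sketch below stores the pose of one body part as three translations and three rotation angles and applies it to a point. The Rz * Ry * Rx Euler convention and the names `rotation_matrix` and `apply_pose` are assumptions made for this example; the slides do not specify a convention.

```python
import math

def rotation_matrix(rx, ry, rz):
    """3x3 rotation from angles in radians, composed as Rz * Ry * Rx (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(point, translation, angles):
    """Rotate a 3D point, then translate it: the six pose parameters in action."""
    R = rotation_matrix(*angles)
    rotated = [sum(R[i][j] * point[j] for j in range(3)) for i in range(3)]
    return [rotated[i] + translation[i] for i in range(3)]

# A point 30 cm along the x-axis, rotated 90 degrees about z, then shifted 10 cm in x.
p = apply_pose([30.0, 0.0, 0.0], translation=[10.0, 0.0, 0.0],
               angles=(0.0, 0.0, math.pi / 2))
```

Six numbers per body part is what makes the full-body state-space grow so quickly once many parts are chained together.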
15Two different approaches to CV-based human MoCap
- 1) Find the different body parts in the image and combine them into a model
- 2) Model-based CV
- Beforehand we define a (3D) geometrical model of the human
- Analysis-by-synthesis (AbS) approach
- Project different 3D configurations of the model into the image => 2D
- Assume camera calibration
- Compare projected model with image data via a similarity measure
- Highest similarity => current configuration
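The "project into the image" step can be sketched with a simple pinhole camera. This is a minimal example assuming an already calibrated camera with a single focal length in pixels and a known principal point; the function name `project` and all numeric values are made up for illustration.

```python
def project(point, focal_length, principal_point):
    """Pinhole projection of a 3D point in camera coordinates (z > 0) to pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera")
    u = focal_length * x / z + principal_point[0]
    v = focal_length * y / z + principal_point[1]
    return (u, v)

# Hypothetical model configuration: shoulder and elbow 2 m in front of the camera (cm).
shoulder_3d = (0.0, 0.0, 200.0)
elbow_3d = (30.0, 0.0, 200.0)

shoulder_px = project(shoulder_3d, focal_length=500.0, principal_point=(320.0, 240.0))
elbow_px = project(elbow_3d, focal_length=500.0, principal_point=(320.0, 240.0))
```

Each candidate 3D configuration is pushed through a projection like this, and the resulting 2D model is what gets compared with the image data.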
16Analysis-by-synthesis (AbS)
- Model representation
- Image data representation
- Matching
17AbS - Model Representation
- Cylinders, stick-figure, ellipsoids, cones, etc.
18AbS - Model Representation
- Model representation = state-space representation
- Degrees of freedom (DoF)
- External and internal DoF
- State-space is spanned by the DoF
- One point in state-space = one state, which defines one configuration of the object
- For the human body: fix the internal DoF; external: 25 DoF
19AbS - Image Representation
- Typical image representations
- Edges, contours, silhouettes
20AbS - Matching
- Compare every possible model configuration (external DoF) with the image data
- This is done by projecting each discrete value in the state-space into the image and calculating a similarity measure
- Match = configuration most similar to image data
- E.g., compare silhouettes or edges
- Why is this in general difficult?
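A toy version of this matching loop, assuming a single rotational DoF and a silhouette stored as a set of pixels, might look like the sketch below. The limb renderer, the intersection-over-union similarity, and all names are assumptions chosen for illustration, not the lecture's implementation.

```python
import math

def render_limb(angle_deg, origin=(20, 20), length=15):
    """Synthesize the model: pixels covered by a thin limb at the given angle."""
    a = math.radians(angle_deg)
    pixels = set()
    for step in range(length + 1):
        x = round(origin[0] + step * math.cos(a))
        y = round(origin[1] + step * math.sin(a))
        pixels.add((x, y))
    return pixels

def similarity(model_pixels, image_pixels):
    """Overlap measure (intersection over union) between two silhouettes."""
    union = model_pixels | image_pixels
    return len(model_pixels & image_pixels) / len(union) if union else 0.0

def brute_force_match(image_pixels, angles=range(0, 360)):
    """Try every discrete configuration and keep the most similar one."""
    return max(angles, key=lambda ang: similarity(render_limb(ang), image_pixels))

observed = render_limb(40)   # stand-in for a silhouette segmented from the image
best = brute_force_match(observed)
```

With one DoF at 1 deg resolution this is only 360 comparisons; the trouble starts when the same loop has to run over every combination of many DoF.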
21Why is matching in general difficult?
- Huge state-space => too many configurations
- Human skeleton: e.g., 25 DoF
- Resolution: 1 cm and 1 deg
- Limits
- Internal DoF: fixed
- External DoF: 0-500 cm and 0-360 deg
- Size of state-space (# of different configurations)
- 500^3 * 360^22 ≈ 10^64 ≈ infinity
- Brute force => whatever!!!
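The size estimate can be checked with a few lines of arithmetic. The split into 3 translational and 22 rotational DoF is how the slide's numbers are read here (3 + 22 = 25 DoF); the variable names are illustrative.

```python
import math

translational_dof = 3        # assumed split: 3 translations ...
rotational_dof = 22          # ... plus 22 rotations = 25 DoF
positions_per_axis = 500     # 0-500 cm at 1 cm resolution
angles_per_axis = 360        # 0-360 deg at 1 deg resolution

configurations = (positions_per_axis ** translational_dof
                  * angles_per_axis ** rotational_dof)
order_of_magnitude = math.floor(math.log10(configurations))
```

Here `order_of_magnitude` comes out as 64, which agrees with the 10^64 quoted on the slide.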
22What can we do about it? (1)
- Reduce
- Model: e.g., only capture motion of the arms
- Movements: e.g., 2D
- Resolution: e.g., 10 cm, 10 deg
- DoF: e.g., one blob, or upper and lower arm treated as one arm
- Constraints on state-space parameters
- Based on setup
- Kinematics
- Based on image pre-processing
23What can we do about it? (2)
- Assume a smooth and unimodal surface in the solution space
- Apply an iterative search
- Coarse-to-fine search
- Gradient search in solution space
- Other methods exist
- Be aware of local minima!
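A coarse-to-fine search over a single parameter can be sketched as follows. This is a minimal example assuming a smooth, unimodal cost; the cost function, the interval, and the names `coarse_to_fine` and `unimodal` are hypothetical.

```python
def coarse_to_fine(cost, low, high, samples=10, levels=4):
    """Repeatedly sample the interval, then zoom in around the best sample."""
    best = low
    for _ in range(levels):
        step = (high - low) / samples
        candidates = [low + i * step for i in range(samples + 1)]
        best = min(candidates, key=cost)
        # Next level: search one coarse step to either side of the best sample.
        low, high = best - step, best + step
    return best

def unimodal(angle):
    """Hypothetical cost with a single minimum at 123.4 degrees."""
    return (angle - 123.4) ** 2

estimate = coarse_to_fine(unimodal, 0.0, 360.0)
```

This uses 44 cost evaluations instead of 360 at 1 deg resolution. On a cost surface with several minima, the first coarse level can lock onto the wrong basin, which is the local-minimum risk noted above.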
24What to remember
- MoCap is the process of capturing the movements of an object at some resolution
- Different technologies, but natural CV is best
- Model-based computer vision
- Analysis-by-synthesis approach
- Project model into the image and compare
- Model representation
- State-space representation, degrees of freedom (DoF)
- Image representation
- Edges, contours, silhouettes
- Matching
- Brute force is seldom possible!
- Apply constraints and some kind of search strategy
25CV-based MoCap of a human arm
26CV-based MoCap of a human arm
- Model-based approach
- Representing the model (arm)
- Representing the shoulder
- Representing the image data
- Matching
- Constraints
- Search strategy
27Representing the arm
- Monocular approach
- The arm is a subproblem of the entire human body
- Relevant in general and in HCI in particular
- Assumptions
- The hand is part of the lower arm
- Lengths of upper and lower arm known
- Shoulder position fixed and known
28State-Space of the arm
- Modeling the Human Arm
- Standard approaches
- Cartesian coordinates (E, H) (6 par.)
- Euler angles (θ1, θ2, θ3, θ4) (4 par.)
29Local Screw Axis Model
- Screw axis (H, α) (4 par.)
- Color vision => hand position in image
- Camera calibration => line in space: H => Hz
- Local screw axis model (Hz, α) (2 par.)
- Only two parameters => state-space can be visualized!
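To make the screw-axis idea concrete: with the shoulder fixed and both arm lengths known, a given hand position leaves the elbow free only on a circle around the shoulder-hand axis, so one angle pins it down. The sketch below computes that elbow position. The reference direction from which the angle (`alpha` in the code) is measured is an arbitrary choice here and may differ from the lecture's definition; all names are illustrative.

```python
import math

def sub(a, b): return [a[i] - b[i] for i in range(3)]
def add(a, b): return [a[i] + b[i] for i in range(3)]
def scale(a, s): return [a[i] * s for i in range(3)]
def dot(a, b): return sum(a[i] * b[i] for i in range(3))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def elbow_position(shoulder, hand, upper_len, lower_len, alpha):
    """Elbow on the circle around the shoulder-hand axis, at angle alpha (radians)."""
    axis = sub(hand, shoulder)
    d = norm(axis)
    if d == 0 or d > upper_len + lower_len or d < abs(upper_len - lower_len):
        raise ValueError("hand position not reachable with these arm lengths")
    n = scale(axis, 1.0 / d)
    # Distance from the shoulder to the circle's centre, and the circle's radius.
    d1 = (d * d + upper_len ** 2 - lower_len ** 2) / (2 * d)
    r = math.sqrt(max(upper_len ** 2 - d1 ** 2, 0.0))
    # Orthonormal basis (u, v) of the plane perpendicular to the axis.
    helper = [0.0, 1.0, 0.0] if abs(n[1]) < 0.9 else [1.0, 0.0, 0.0]
    u = cross(n, helper)
    u = scale(u, 1.0 / norm(u))
    v = cross(n, u)
    centre = add(shoulder, scale(n, d1))
    return add(centre, add(scale(u, r * math.cos(alpha)), scale(v, r * math.sin(alpha))))

shoulder = [0.0, 0.0, 0.0]
hand = [40.0, 0.0, 0.0]
elbow = elbow_position(shoulder, hand, upper_len=30.0, lower_len=30.0, alpha=0.7)
```

Whatever value `alpha` takes, the returned elbow stays exactly one upper-arm length from the shoulder and one lower-arm length from the hand, which is why the hand position plus a single angle is enough to describe the arm.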
30Size of the State-Spaces
- Cartesian coordinates: 10^11
- Euler angles: 10^10
- Screw axis: 10^8
- Local screw axis model: 10^4
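Two of these orders of magnitude can be reproduced with quick arithmetic. The Euler-angle count assumes each of the four angles spans 360 deg at 1 deg resolution, which is an assumption of this sketch; the 43200 figure for the local screw axis model is the one quoted on the later "Matching" slide.

```python
import math

euler_configs = 360 ** 4     # four angles, 360 values each (assumed ranges)
lsam_configs = 43200         # value quoted in the deck for the local screw axis model

euler_magnitude = math.floor(math.log10(euler_configs))
lsam_magnitude = math.floor(math.log10(lsam_configs))
reduction_factor = euler_configs // lsam_configs
```

Under these assumptions the two magnitudes are 10 and 4, and the state-space shrinks by a factor of several hundred thousand.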
31Is the Shoulder Joint Static?
- This is virtually always assumed
- Torso => shoulder joint?
- Translation
32Joints in the Shoulder Complex
- SC-joint 3 DoF
- AC-joint 3 DoF
- ST-joint 4 DoF
- GH-joint 3 DoF
- Shoulder joint
- Shoulder complex
33Degrees of Freedom
- SC + AC + ST => closed kinematic chain
- 4 DoF between Thorax and Glenoid
- 2 translations
- 2 rotations
- Governed by GH
- Resulting DoF
- 2 prismatic joints 2 DoF
- GH-joint 3 DoF
34Finding the Displacements
- Dashed: AC-joint
- Dotted: additional
- Solid: GH-joint
35Representing the arm and shoulder
- Local screw axis model (LSAM) => 2 DoF
- Model the displacements of the shoulder
- Given by the LSAM => 2 DoF in total!
- Very compact representation => small state-space
- Model-based approach
- Representing the arm and the shoulder
- Representing the image data: silhouettes
- Matching
36Matching
- Resolution: 1 cm, 1 deg
- Local screw axis model: 43200 configurations
- Constraints based on kinematics
- Constraints on Hz and α
- Search method
37Constraining Hz
Pruning
38Constraining α
39Total effect of constraints
40Matching - Search strategy
- Euler angles -> Local Screw Axis Model
- 10^10 -> 10^4
- Constraints
- 10^4 -> 10^3 (avg.)
- Is that number low enough to do an exhaustive search?
- In general yes, however:
- When combined with the torso, many more possible solutions exist
- The LSAM is based on finding the hand in the image. What if we can't find it, or if the position is uncertain?
41Sequential Monte Carlo (SMC) search
AKA Condensation, particle filter, multiple hypotheses
- Multiple hypotheses at time t
- Multiple hypotheses at time t+1
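How a set of hypotheses at time t becomes a set at time t+1 can be sketched with a basic particle-filter cycle. This is a generic predict-weight-resample step over scalar hypotheses, not the lecture's corrected SMC; the bimodal likelihood and all names are assumptions for illustration.

```python
import math
import random

def smc_step(particles, observe_likelihood, diffusion_std, rng):
    """One predict-weight-resample cycle over a list of scalar hypotheses."""
    # Predict: diffuse each hypothesis with Gaussian noise (no motion model assumed).
    predicted = [p + rng.gauss(0.0, diffusion_std) for p in particles]
    # Weight: score each hypothesis against the current image measurement.
    weights = [observe_likelihood(p) for p in predicted]
    if sum(weights) == 0:
        return predicted  # no evidence at all: keep the predictions unchanged
    # Resample: draw a new set in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(particles))

def two_hand_candidates(x):
    """Hypothetical ambiguous measurement: the hand could be near 40 or near 140."""
    return (math.exp(-0.5 * ((x - 40) / 5) ** 2)
            + math.exp(-0.5 * ((x - 140) / 5) ** 2))

rng = random.Random(0)
particles = [rng.uniform(0, 180) for _ in range(200)]
for _ in range(5):
    particles = smc_step(particles, two_hand_candidates, diffusion_std=2.0, rng=rng)
near_a_peak = sum(1 for p in particles if min(abs(p - 40), abs(p - 140)) < 20)
```

The point of keeping many particles is that an ambiguous measurement does not force an early decision: hypotheses near either candidate can survive from one frame to the next.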
42Using (and improving) the SMC
- Find all the likely positions of the hand in the image and correct the predictions accordingly
- Algorithm
- Find all skin color (hand) blobs in the image
- Color segmentation
- Find the likelihood of each blob being a hand
- Compare blob with ellipse
- Associate a number of particles with each blob in accordance with their likelihoods
- Predict and correct
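The step of associating particles with blobs "in accordance with their likelihoods" could be done in several ways; one simple option is a proportional split with largest-remainder rounding, sketched below. The function name and the example likelihoods are hypothetical, and the lecture may use a different allocation rule.

```python
def allocate_particles(likelihoods, total_particles):
    """Split a particle budget across blobs in proportion to their likelihoods."""
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("at least one blob needs a positive likelihood")
    shares = [total_particles * l / total for l in likelihoods]
    counts = [int(s) for s in shares]
    # Hand out the particles lost to rounding, largest fractional part first.
    leftovers = total_particles - sum(counts)
    by_fraction = sorted(range(len(shares)),
                         key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_fraction[:leftovers]:
        counts[i] += 1
    return counts

# Three skin-colored blobs with made-up hand likelihoods, 50 particles in total.
counts = allocate_particles([0.6, 0.3, 0.1], 50)
uneven = allocate_particles([1.0, 1.0, 1.0], 10)
```

The rounding step matters because the particle budget is fixed: every particle has to end up attached to exactly one blob.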
43Correct the predictions
- For one blob
- For each particle associated with the blob
- Predict arm pose
- LSAM => Hp, Ep
- Correct hand by diffusion
- Along l
- Along H1Hp
- Around H1Hp
- Correct elbow
- Prediction error HpHc
- Au and Al
44Results
Corrected SMC
Standard SMC
45Results
Corrected SMC
Standard SMC
- Weight: black, white, thin black
- Captures ambiguity with fewer particles
46What to remember
- MoCap of the human arm
- Model-based approach
- Representation of the arm
- Euler angles -> Local Screw Axis Model
- 4 DoF -> 2 DoF (can be visualized!)
- Size of state-space: 10^10 -> 10^4
- Representation of the image data: silhouettes
- Matching
- 4 constraints on Hz and 2 on α =>
- Size of state-space: 10^4 -> 10^3 (avg.)
- Search strategy: Sequential Monte Carlo
- Handles multiple hypotheses
- Handles uncertainties regarding the position of the hand
- Overall: the CV model-based approach is powerful, but requires a solution to the problem of the huge state-space
47The end!
48The Likelihood of a Hand
- The area of the blob
- The difference between the CoG and the center of the hand
- Center of hand = position with greatest distance to an edge (distance transform)
- The shape of a hand can be approximated by an ellipse
- Measure area and perimeter => ellipse
- Compare ellipse with blob
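The "center of hand" cue can be sketched with a small distance transform on a binary blob mask. This minimal example uses a city-block distance computed by breadth-first search and assumes the blob is surrounded by background pixels; the mask and function names are made up for illustration.

```python
from collections import deque

def distance_to_background(mask):
    """City-block distance from every pixel to the nearest background (0) pixel."""
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def blob_pixels(mask):
    return [(r, c) for r in range(len(mask)) for c in range(len(mask[0])) if mask[r][c]]

def hand_center(mask):
    """Blob pixel with the greatest distance to an edge."""
    dist = distance_to_background(mask)
    return max(blob_pixels(mask), key=lambda rc: dist[rc[0]][rc[1]])

def center_of_gravity(mask):
    pixels = blob_pixels(mask)
    return (sum(r for r, _ in pixels) / len(pixels),
            sum(c for _, c in pixels) / len(pixels))

# A 5x5 square blob inside a 7x7 image (1 = skin-colored pixel).
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(7)] for r in range(7)]
center = hand_center(mask)
cog = center_of_gravity(mask)
```

For a compact, hand-like blob the two points nearly coincide; a large offset between them would suggest that the blob also contains something else, such as part of the forearm.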
49Results
- Compare standard SMC with our approach
- 50 particles are used
- Image measurements: temporal edges
50Observation PDF
Upper arm
Lower arm
51HT - Convergence
52Finding the Displacements (2)
- Dy2 r(1 sin t)
- t 2700 a b 900 y
- Shoulder rhythm
- Rh/y, f hy
- 4 phases
- 1,2,4 b const. gt Da Dy
- 3 a const. gt Db Dy
- We now have Dy2(f)
- for each phase
53Finding the Displacements (1)
- Vertical displacement: Dv = Dy1 + Dy2
- Horizontal displacement: Dh = Dx1 + Dx2
- From the literature Dx1(f) and Dy1(f)
- Build a model of Dx2(f) and Dy2(f)
54Finding the Displacements (2)
- Find Dy2(f)
- f(y)
- Shoulder rhythm Rh/y, f hy
55Evaluating the Approach
56Analysis-by-Synthesis
- Model representation
- Image representation
- Matching
57Representation
- Pixels Template
- Particle Feature vector
- Low level token Geometric primitives
- Edges, lines, corners, circles, ellipses, etc.
- High level token
- Geometric
Complexity
58Representation