Title: Gesture Recognition
2Gesture Recognition
- Markov Models
- American Sign Language Recognition Using Hidden Markov Models (HMMs)
- Gesture Recognition Using Finite State Machines (FSMs)
- Conclusion
3I. Markov Models
- Hidden Markov Models (HMMs)
- 3 typical problems
- HMM topologies
4a) Hidden Markov Models
- tuple λ = (S, π, A, K, B)
- S = {S0, S1, ..., SN}: states
- π = (p0, p1, ..., pN): initial distribution
- A = (aij): transition probabilities
- K = {o1, o2, ..., oN}: output symbols
- B = {b0(oi), b1(oi), ..., bN(oi)}: emission probabilities
- discrete in time
- state transition probabilities constant over time
5Weather example
- S = {high-pressure, low-pressure}
- π = 1.00
- A [transition matrix: figure not recovered]
- K = {sunshine, rain}
- B: P(sunshine | high) = 0.8, P(rain | high) = 0.2, P(sunshine | low) = 0.3, P(rain | low) = 0.7
6Weather example
- Output sequence
- O = (sunshine, rain, sunshine)
- State sequence X = ???
7b) 3 typical problems
- Evaluation
- Decoding
- Training
8Evaluation
- Given HMM λ and output sequence O
- What is P(O | λ)?
- Trivial algorithm: O(N^T)
- Forward algorithm: O(N^2 T)
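The gap between the two complexities can be seen in a minimal Python sketch. It uses the emission probabilities of the weather example; the initial distribution `PI`, the transition matrix `A`, and the state-emission convention are assumptions made only for illustration, since the slide's matrix was not recovered.

```python
from itertools import product

# Weather HMM. Emission values come from the weather example; the
# initial distribution and the transition matrix are ASSUMED.
STATES = ["high", "low"]
PI = {"high": 1.0, "low": 0.0}                 # assumed
A = {"high": {"high": 0.7, "low": 0.3},        # assumed
     "low": {"high": 0.4, "low": 0.6}}         # assumed
B = {"high": {"sunshine": 0.8, "rain": 0.2},
     "low": {"sunshine": 0.3, "rain": 0.7}}


def evaluate_naive(obs):
    """Sum P(O, X | lambda) over all N^T state sequences X."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = PI[path[0]] * B[path[0]][obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= A[prev][cur] * B[cur][o]
        total += p
    return total


def evaluate_forward(obs):
    """Forward algorithm: alpha_t(j) = P(o_1..o_t, X_t = j | lambda)."""
    alpha = {s: PI[s] * B[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        # Each step costs N^2 multiplications, hence O(N^2 T) overall.
        alpha = {j: sum(alpha[i] * A[i][j] for i in STATES) * B[j][o]
                 for j in STATES}
    return sum(alpha.values())


O = ("sunshine", "rain", "sunshine")
print(evaluate_naive(O), evaluate_forward(O))
```

With these assumed numbers both functions give P(O | λ) ≈ 0.1568; only the amount of work differs.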
9Decoding
- Given HMM λ and output sequence O
- Most probable state sequence?
- Viterbi algorithm: alignment of outputs to states
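A minimal Viterbi sketch in Python, again on the weather example. As before, `PI` and `A` are assumed values for illustration, not the ones from the slide.

```python
STATES = ["high", "low"]
PI = {"high": 1.0, "low": 0.0}                 # assumed
A = {"high": {"high": 0.7, "low": 0.3},        # assumed
     "low": {"high": 0.4, "low": 0.6}}         # assumed
B = {"high": {"sunshine": 0.8, "rain": 0.2},
     "low": {"sunshine": 0.3, "rain": 0.7}}


def viterbi(obs):
    """Return (most probable state sequence, its joint probability)."""
    # delta[s]: probability of the best path that ends in state s.
    delta = {s: PI[s] * B[s][obs[0]] for s in STATES}
    backpointers = []
    for o in obs[1:]:
        step, new_delta = {}, {}
        for j in STATES:
            best = max(STATES, key=lambda i: delta[i] * A[i][j])
            step[j] = best
            new_delta[j] = delta[best] * A[best][j] * B[j][o]
        backpointers.append(step)
        delta = new_delta
    # Trace the best path backwards from the best final state.
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


path, prob = viterbi(("sunshine", "rain", "sunshine"))
print(path, prob)
```

The returned path assigns one state to every output, which is the alignment used later when training the sign models.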
10Training
- Given HMM λ = (S, π, A, K, B) and output sequence O
- Maximize P(O | λ)... How?
- Global optimum → inefficient
- Local optimum:
- Forward-Backward algorithm
- Baum-Welch re-estimator
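One Baum-Welch re-estimation step for a discrete HMM might look like the following sketch. It handles a single observation sequence, uses plain lists indexed by state and symbol number, and omits the scaling that long sequences would need; all of these are simplifying assumptions.

```python
def forward(pi, A, B, obs):
    """alpha[t][i] = P(o_1..o_t, X_t = i)."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(o_t+1..o_T | X_t = i)."""
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation step; returns (pi, A, B, P(O | old model))."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # gamma[t][i] = P(X_t = i | O)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)]
             for t in range(T)]
    # xi[t][i][j] = P(X_t = i, X_t+1 = j | O)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
            / likelihood for j in range(N)] for i in range(N)]
          for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical starting model and data (0 = sunshine, 1 = rain).
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
for _ in range(5):
    pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)
    print(likelihood)
```

Each step can only keep or raise P(O | λ), which is why the procedure converges to a local rather than a global optimum.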
11c) HMM topologies
12II. ASL Recognition (HMMs)
- What is American Sign Language?
- Related work
- HMM approach (MIT,1996)
- HMM topology
- System overview
- Feature extraction
- Desk-based recognizer
- Wearable-based recognizer
13a) What is American Sign Language?
- ≈ 6000 gestures
- Finger spelling
- Pace of spoken conversation
- Eyebrows
14b) Related work
- 1973, Gunnar Johansson
- Human gestures can be recognized solely by motion information.
- 1985, Sperling et al.
- Isolated signs remain intelligible when subsampled to 24x16 pixels
15b) Related work
- System
- Instrumented gloves
- Desktop-based camera systems
- Methods
- Template matching
- Neural nets
- Model-based approach
16b) Related work
- HMMs used successfully in
- speech recognition
- handwriting recognition
- Since 1995: several HMM-based recognizers demonstrated
17c) HMM approach (MIT,1996)
- Thad Starner
- Joshua Weaver
- Alex Pentland
- 1 colour camera
- Unadorned hands
- Real time
18c) HMM approach (MIT,1996)
- Part-of-speech grammar
- pronoun verb noun adjective pronoun
19d) HMM topology
- Estimated number of different states: 5
- → To handle less complicated signs: add skip transitions
- → Empirical fine-tuning: 4-state HMM, 1 skip transition
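A left-to-right topology with a skip transition can be written down as a transition matrix in which only some entries are non-zero. The sketch below builds such a matrix with uniform starting values; the helper name and the position of the skip (state 0 to state 2) are assumptions, since the slide gives only the counts.

```python
def left_to_right_transitions(n_states, skips=()):
    """Uniformly initialised transition matrix for a left-to-right HMM.

    Each state may stay in place or advance by one; every (i, j) pair in
    `skips` adds an extra forward jump. Disallowed transitions are 0, and
    Baum-Welch re-estimation leaves zero entries at 0.
    """
    allowed = [[j in (i, i + 1) for j in range(n_states)]
               for i in range(n_states)]
    for i, j in skips:
        allowed[i][j] = True
    matrix = []
    for row in allowed:
        count = sum(row)
        matrix.append([1.0 / count if ok else 0.0 for ok in row])
    return matrix


# 4 states, 1 skip transition; the skip position is an assumption.
A = left_to_right_transitions(4, skips=[(0, 2)])
for row in A:
    print(row)
```

The skip lets a short sign pass through the model without spending a frame in every state.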
20e) System overview
- Goal: widely usable real-time system without constraints
- Two different mounting locations
- 320 x 240 pixels
- colour
- 10 fps
21e) System overview
- Second-person viewpoint
- First-person viewpoint
22f) Feature extraction
- Algorithm for hand segmentation
- Scan image for a skin-coloured pixel
- Grow region by checking neighbours
- Use centroid as seed for next frame
- → Two blobs
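The segmentation steps above can be sketched in Python as a seeded region grow over a boolean skin mask. The mask representation, the function names, and the choice of 8-connected neighbours are assumptions for illustration; classifying pixels as skin-coloured is taken as already done.

```python
from collections import deque


def find_seed(mask):
    """Scan the image row by row for the first skin-coloured pixel."""
    for y, row in enumerate(mask):
        for x, is_skin in enumerate(row):
            if is_skin:
                return (x, y)
    return None


def grow_region(mask, seed):
    """Collect the 8-connected skin pixels reachable from `seed`."""
    height, width = len(mask), len(mask[0])
    sx, sy = seed
    if not mask[sy][sx]:
        return set()
    region, queue = {seed}, deque([seed])
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and mask[ny][nx] and (nx, ny) not in region):
                    region.add((nx, ny))
                    queue.append((nx, ny))
    return region


def centroid(region):
    """Mean pixel position of a blob."""
    n = len(region)
    return (sum(x for x, _ in region) / n, sum(y for _, y in region) / n)


# Tiny hypothetical frame: '#' marks skin-coloured pixels.
frame = [".....",
         ".##..",
         ".##..",
         "....#"]
mask = [[ch == "#" for ch in row] for row in frame]
blob = grow_region(mask, find_seed(mask))
print(sorted(blob), centroid(blob))
```

For the next frame, the rounded centroid would serve as the seed, so the full image scan is only needed when tracking is lost.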
23f) Feature extraction
- → Second moment analysis
- → Feature vector
- hand's x, y position
- change in x, y between frames
- area/size (in pixels)
- angle of axis of least inertia
- length of eigenvector
- eccentricity of bounding ellipse
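Most of these features follow from the second moments of the blob's pixel coordinates. The sketch below is one common formulation, not necessarily the exact definitions used by the authors: the axis length is taken as the square root of the larger eigenvalue, and the eccentricity is that of the ellipse with the same second moments.

```python
import math


def blob_features(pixels):
    """Second-moment features of a blob given as (x, y) pixel coordinates."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Central second moments: entries of the 2x2 covariance matrix.
    a = sum((x - cx) ** 2 for x, _ in pixels) / n
    b = sum((x - cx) * (y - cy) for x, y in pixels) / n
    c = sum((y - cy) ** 2 for _, y in pixels) / n
    # Eigenvalues of [[a, b], [b, c]].
    root = math.sqrt((a - c) ** 2 + 4 * b * b)
    lam_max = (a + c + root) / 2
    lam_min = (a + c - root) / 2
    # Orientation of the major axis (axis of least inertia), in radians.
    angle = 0.5 * math.atan2(2 * b, a - c)
    ecc = math.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0
    return {"centroid": (cx, cy), "area": n, "angle": angle,
            "major_length": math.sqrt(lam_max), "eccentricity": ecc}


print(blob_features([(0, 0), (1, 0), (2, 0), (3, 0)]))
```

The change in x, y between frames is then simply the difference between the centroids of consecutive frames.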
24g) Desk-based recognizer
- 384 training sentences
- 94 test sentences
- Training
- each sign → separate HMM
- train output probabilities (means, variances)
25Training
- Divide sentence into 5 equal portions
- → Use Viterbi alignment
- → Initial estimates for means and variances
- → Baum-Welch re-estimator → optimized means and variances
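The starting point of this procedure, the equal division and a first mean and variance per portion, can be sketched as follows. The function names and the toy feature vectors are hypothetical; Viterbi alignment would then move the segment boundaries and the estimates would be recomputed.

```python
from statistics import fmean, pvariance


def uniform_segments(frames, n_parts):
    """Split a frame sequence into n_parts contiguous, near-equal portions."""
    T = len(frames)
    bounds = [k * T // n_parts for k in range(n_parts + 1)]
    return [frames[bounds[k]:bounds[k + 1]] for k in range(n_parts)]


def initial_estimates(frames, n_parts=5):
    """Per-portion mean and variance of every feature dimension.

    Needs at least n_parts frames so that no portion is empty.
    """
    estimates = []
    for segment in uniform_segments(frames, n_parts):
        dims = list(zip(*segment))
        estimates.append(([fmean(d) for d in dims],
                          [pvariance(d) for d in dims]))
    return estimates


# Hypothetical sentence of 10 frames with 2 features per frame.
frames = [(float(t), 2.0 * t) for t in range(10)]
for means, variances in initial_estimates(frames):
    print(means, variances)
```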
26Testing
- Concatenate all HMMs in all combinations
- Calculate P(O | λ)
- Recognize sequence with highest probability
27Testing
- [Figure: example word lattice with path probabilities; recoverable labels: "box 0.117", "want", "You paper 0.165", "lose box 0.086"]
28g) Desk-based recognizer
29h) Wearable-based recognizer
- 400 training sentences
- 100 test sentences
- New grammar added (5-word restriction)
- Signer is to look forward
30h) Wearable-based recognizer
31III. Gesture Recognition (FSM)
- Finite State Machine approach
- Modelling using FSMs
- Training the gesture model
- Recognition
- Results
32a) FSM approach
- Pengyu Hong
- Matthew Turk
- Thomas S. Huang
- Goal
- Real-time gesture recognizer
- Technique to segment and align data automatically
- Affiliations: University of Illinois at Urbana, Microsoft Research
33b) Modelling using FSMs
- Feature extraction
- Real-time skin-colour tracking algorithm
- → 2D positions of face and hands
- → Trajectories of the hands relative to the head
- Training data: observing a repeated gesture several times
34b) Modelling using FSMs
- Gesture = ordered sequence of states
- state s = <μ_s, Σ_s, d_s, T_min,s, T_max,s>
- μ_s: 2D centroid
- Σ_s: spatial covariance matrix
- d_s: distance threshold
- [T_min,s, T_max,s]: duration interval
35c) Training the gesture model
- Decouple temporal information from spatial information
- → Learn spatial information
- → Incorporate the temporal data
- → Refine spatial information
361. Spatial clustering
- Define a threshold for the spatial variance
- → Begin with a model of two states
- → Train with dynamic k-means algorithm
- → Split state with largest variance...
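The split-and-retrain loop above can be sketched as follows. This is a simplified stand-in, not the authors' algorithm: it uses Euclidean distance and a scalar variance per state instead of the covariance matrix Σ_s, and the seeding rules for the initial two states and for each split are assumptions.

```python
def assign(points, centroids):
    """Index of the nearest centroid for every point (Euclidean distance)."""
    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda k: (p[0] - centroids[k][0]) ** 2
                   + (p[1] - centroids[k][1]) ** 2)
    return [nearest(p) for p in points]


def kmeans(points, centroids, iterations=50):
    """Plain k-means from the given starting centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def variances(points, centroids, labels):
    """Mean squared distance to the centroid, per state."""
    result = []
    for k, (cx, cy) in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == k]
        result.append(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
                      / len(members) if members else 0.0)
    return result


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def dynamic_kmeans(points, max_variance, max_states=20):
    # Begin with two states, seeded at the first and last data point.
    centroids, labels = kmeans(points, [points[0], points[-1]])
    while len(centroids) < max_states:
        var = variances(points, centroids, labels)
        worst = max(range(len(var)), key=var.__getitem__)
        if var[worst] <= max_variance:
            break
        # Split the state with the largest variance: reseed it with two
        # of its own extreme members, then retrain.
        members = [p for p, l in zip(points, labels) if l == worst]
        far = max(members, key=lambda p: sq_dist(p, centroids[worst]))
        opposite = max(members, key=lambda p: sq_dist(p, far))
        centroids = centroids[:worst] + [far, opposite] + centroids[worst + 1:]
        centroids, labels = kmeans(points, centroids)
    return centroids, labels
```

The variance threshold controls how many states the gesture model ends up with: a smaller threshold forces more splits.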
371. Spatial clustering
- Wave left hand gesture without temporal
information
382. Temporal alignment
- Each data point is assigned a label
- Manually specify the temporal sequence → structure of the FSM
392. Temporal alignment
- Segment training data into gesture samples
- 1 1 1 2 2 2 2 0 0 0 0 2 2 2 1 1
- 1 1 1 2 2 2 0 0 0 0 2 2 1 1 1
- 1 1 2 2 2 2 0 0 0 0 0 2 2 2 1 1
- ...
- Number of samples per state (duration) → T_min,s, T_max,s
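Reading the duration intervals off the labelled samples amounts to run-length encoding each sample and taking the minimum and maximum run per position in the state sequence. A minimal sketch using the three label sequences shown above (function names are hypothetical):

```python
from itertools import groupby

samples = [
    [1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 1, 1],
    [1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 2, 2, 1, 1, 1],
    [1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
]


def runs(labels):
    """Run-length encode a label sequence: [(label, length), ...]."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


def duration_intervals(samples):
    """(label, T_min, T_max) for each position in the state sequence."""
    encoded = [runs(s) for s in samples]
    order = [label for label, _ in encoded[0]]
    for e in encoded:
        if [label for label, _ in e] != order:
            raise ValueError("samples do not follow the same state order")
    lengths = zip(*[[n for _, n in e] for e in encoded])
    return [(label, min(ns), max(ns)) for label, ns in zip(order, lengths)]


print(duration_intervals(samples))
```

For these samples the state order is 1, 2, 0, 2, 1, and the centre state 0 gets the interval [4, 5].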
40Training finished
- <μ_s, Σ_s, d_s, T_min,s, T_max,s>
41d) Recognition
- Real time
- Start all FSMs simultaneously
- Check sample after sample → O(n)
- FSM requirements violated → reset and ignore FSM
- FSM requirements met
- Final state reached → recognizer fires
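The recognition loop can be sketched as below. It is a simplified illustration, not the authors' implementation: membership in a state is tested with a plain Euclidean distance threshold rather than with the covariance Σ_s, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class State:
    centroid: tuple   # (x, y) centre of the state
    radius: float     # distance threshold d_s
    t_min: int        # minimum number of samples spent in the state
    t_max: int        # maximum number of samples spent in the state

    def contains(self, point):
        dx = point[0] - self.centroid[0]
        dy = point[1] - self.centroid[1]
        return dx * dx + dy * dy <= self.radius ** 2


class GestureFSM:
    def __init__(self, name, states):
        self.name, self.states = name, states
        self.reset()

    def reset(self):
        self.index, self.elapsed = 0, 0

    def step(self, point):
        """Consume one sample; return True when the gesture is recognized."""
        current = self.states[self.index]
        is_last = self.index == len(self.states) - 1
        if current.contains(point) and self.elapsed < current.t_max:
            self.elapsed += 1                      # stay in current state
        elif (not is_last and self.elapsed >= current.t_min
              and self.states[self.index + 1].contains(point)):
            self.index, self.elapsed = self.index + 1, 1   # advance
            is_last = self.index == len(self.states) - 1
        else:
            self.reset()                           # requirements violated
            return False
        if is_last and self.elapsed >= self.states[self.index].t_min:
            self.reset()
            return True                            # final state: fire
        return False


def recognize(fsms, samples):
    """Feed every sample to every FSM; yield (sample index, gesture name)."""
    for t, point in enumerate(samples):
        for fsm in fsms:
            if fsm.step(point):
                yield t, fsm.name


wave = GestureFSM("wave", [State((0, 0), 1.0, 2, 3), State((5, 0), 1.0, 2, 3)])
lift = GestureFSM("lift", [State((0, 0), 1.0, 2, 3), State((0, 5), 1.0, 2, 3)])
print(list(recognize([wave, lift], [(0, 0), (0, 0), (5, 0), (5, 0)])))
```

Every sample is handled once per FSM with a constant amount of work, which is where the linear cost in the number of samples comes from.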
42e) Results
43e) Results
44IV. Conclusion
- HMMs
- Use detailed features (orientation, speed, ...)
- Generalized extremely well
- Able to recognize large vocabulary
- FSMs
- Handle gestures with different lengths
- Computational complexity greatly reduced
- Works with small training sets
45References
- [1] T. Starner, J. Weaver, and A. Pentland. Real-time American Sign Language recognition using desk and wearable computer-based video. IEEE Trans. Patt. Analy. and Mach. Intell., 1998.
- [2] P. Hong, M. Turk, and T. S. Huang. Gesture modeling and recognition using finite state machines. Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, March 2000, Grenoble, France.
- [3] P. Hong, M. Turk, and T. S. Huang. Constructing Finite State Machines for Fast Gesture Recognition. 15th International Conference on Pattern Recognition, Barcelona, Spain, Sep 3-7, 2000.