1
Gesture Recognition

Martin Stein
26/5/04

2
Gesture Recognition

I. Markov Models
II. American Sign Language Recognition Using Hidden Markov Models (HMMs)
III. Gesture Recognition Using Finite State Machines (FSMs)
IV. Conclusion

3
I. Markov Models

Hidden Markov Models (HMMs)
3 typical problems
HMM topologies

4
a) Hidden Markov Models

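HMM as a tuple: $\lambda = (S, \Pi, A, K, B)$
$S = \{S_0, S_1, \dots, S_N\}$: states
$\Pi = \{\pi_0, \pi_1, \dots, \pi_N\}$: initial state distribution
$A = (a_{ij})$: transition probabilities, $a_{ij} = P(S_j \text{ at } t+1 \mid S_i \text{ at } t)$
$K = \{o_1, o_2, \dots, o_M\}$: output symbols
$B = \{b_0(o_k), b_1(o_k), \dots, b_N(o_k)\}$: emission probabilities
Discrete in time
State transition probabilities constant over time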

5
Weather example

$S = \{\text{high pressure}, \text{low pressure}\}$
$\Pi = (1.0, 0)$
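$A$: [Figure: transition matrix between the two pressure states]
$K = \{\text{sunshine}, \text{rain}\}$
$B$: $P(\text{sunshine} \mid \text{high}) = 0.8$, $P(\text{rain} \mid \text{high}) = 0.2$, $P(\text{sunshine} \mid \text{low}) = 0.3$, $P(\text{rain} \mid \text{low}) = 0.7$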

[Figure: state diagram; high pressure emits sunshine 0.8, rain 0.2; low pressure emits sunshine 0.3, rain 0.7]
6
Weather example

Output sequence $O = (\text{sunshine}, \text{rain}, \text{sunshine})$
State sequence $X$ = ? (hidden: only $O$ is observed)
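A small sketch of what "hidden" means here: for the observed output, every candidate state sequence X has its own joint probability P(O, X | λ). The emission probabilities come from the slide. The transition matrix did not survive, so the values in `A` are hypothetical, and `PI` assumes the model always starts in high pressure.

```python
# Weather HMM: states 0 = high pressure, 1 = low pressure;
# output symbols 0 = sunshine, 1 = rain.
PI = [1.0, 0.0]           # assumption: always start in high pressure
A = [[0.7, 0.3],          # hypothetical transition probabilities a_ij
     [0.4, 0.6]]          # (the slide's matrix A is not recoverable)
B = [[0.8, 0.2],          # P(sunshine | high), P(rain | high)
     [0.3, 0.7]]          # P(sunshine | low),  P(rain | low)


def joint_probability(obs, states):
    """P(O, X | lambda): probability of emitting obs along one state path."""
    p = PI[states[0]] * B[states[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
    return p


O = [0, 1, 0]  # (sunshine, rain, sunshine)
for X in ([0, 0, 0], [0, 1, 0], [1, 1, 1]):
    print(X, joint_probability(O, X))
```

Different hidden paths explain the same output with different probabilities. The three problems on the next slides are all about reasoning over these paths without enumerating them.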

7
b) 3 typical problems

Evaluation
Decoding
Training

8
Evaluation

Given HMM $\lambda$ and output sequence $O$:
what is $P(O \mid \lambda)$?

Trivial algorithm (sum over all $N^T$ state sequences): $O(T N^T)$
Forward algorithm: $O(N^2 T)$
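The difference between the two algorithms can be checked on the weather example. This sketch computes P(O | λ) once by enumerating all N^T state paths and once with the forward recursion, and both must agree. As before, the transition values are hypothetical and the initial distribution is an assumed reading of the slide.

```python
from itertools import product

# Weather HMM; A is hypothetical, PI assumes a high-pressure start.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]


def brute_force(obs):
    """Trivial algorithm: sum P(O, X) over all N**T state paths."""
    n = len(PI)
    total = 0.0
    for states in product(range(n), repeat=len(obs)):
        p = PI[states[0]] * B[states[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t - 1]][states[t]] * B[states[t]][obs[t]]
        total += p
    return total


def forward(obs):
    """Forward algorithm: alpha_t(j) = (sum_i alpha_{t-1}(i) a_ij) b_j(o_t)."""
    n = len(PI)
    alpha = [PI[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        # N*N work per time step, hence O(N^2 T) overall
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)  # P(O | lambda)


O = [0, 1, 0]  # (sunshine, rain, sunshine)
print(forward(O), brute_force(O))
```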
9
Decoding

Given HMM $\lambda$ and output sequence $O$:
most probable state sequence $X$?

Viterbi algorithm: alignment output → state
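A minimal Viterbi sketch on the same weather model, again with hypothetical transition values. It replaces the sum of the forward algorithm with a max and keeps back-pointers, so the best path can be read off at the end.

```python
# Weather HMM; A is hypothetical, PI assumes a high-pressure start.
PI = [1.0, 0.0]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]


def viterbi(obs):
    """Return the most probable state path and its probability P(O, X | lambda)."""
    n = len(PI)
    # delta[j]: probability of the best path ending in state j at time t
    delta = [PI[j] * B[j][obs[0]] for j in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev, delta, step = delta, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            step.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(step)
    # Trace the best final state back through the stored predecessors
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


O = [0, 1, 0]  # (sunshine, rain, sunshine)
print(viterbi(O))
```

The returned path is the alignment of each output symbol to a state. The same alignment idea is used later to initialise the sign models.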
10
Training

Given HMM $\lambda = (S, \Pi, A, K, B)$ and output sequence $O$:
maximize $P(O \mid \lambda)$. How?

Global optimum → inefficient
Local optimum →
Forward-Backward algorithm
Baum-Welch re-estimator
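A minimal sketch of one Baum-Welch (forward-backward) re-estimation step for a discrete HMM. It is plain Python without scaling, so it only suits short sequences. The starting parameters and the observation sequence are made up. Each step cannot decrease P(O | λ), which is why the procedure converges to a local rather than a global optimum.

```python
def forward_backward(pi, a, b, obs):
    """Unscaled forward/backward variables plus P(O | lambda)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # beta_{T-1}(i) = 1
    for j in range(n):
        alpha[0][j] = pi[j] * b[j][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * a[i][j] for i in range(n)) * b[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])


def baum_welch_step(pi, a, b, obs):
    """One re-estimation step; returns new (pi, a, b) and the old P(O | lambda)."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alpha, beta, like = forward_backward(pi, a, b, obs)
    # gamma[t][i]: P(state i at time t | O, lambda)
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(n)] for t in range(T)]
    # xi[t][i][j]: P(state i at t and state j at t+1 | O, lambda)
    xi = [[[alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / like
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_a = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(n)]
             for i in range(n)]
    new_b = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(m)]
             for j in range(n)]
    return new_pi, new_a, new_b, like


# Hypothetical starting model and training sequence (0 = sunshine, 1 = rain)
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 0, 0, 1, 1, 0, 1]
likelihoods = []
for _ in range(5):
    pi, a, b, like = baum_welch_step(pi, a, b, obs)
    likelihoods.append(like)
print(likelihoods)
```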

11
c) HMM topologies

Ergodic
Left-to-Right
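The two topologies differ only in which transitions a_ij are allowed to be non-zero. This sketch builds that pattern as a 0/1 mask. The `max_skip` parameter (how many states a left-to-right model may jump over) and the uniform initialisation are illustrative assumptions. A four-state left-to-right model with skips, such as the one used for the ASL signs later, is the usage example.

```python
def transition_mask(n_states, topology, max_skip=0):
    """0/1 matrix of allowed transitions a_ij for a given HMM topology.

    ergodic:        every state can reach every state
    left-to-right:  stay, advance by one, or skip ahead over up to max_skip states
    """
    mask = [[0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        for j in range(n_states):
            if topology == "ergodic":
                mask[i][j] = 1
            elif topology == "left-to-right":
                mask[i][j] = 1 if i <= j <= i + 1 + max_skip else 0
            else:
                raise ValueError(f"unknown topology: {topology}")
    return mask


def uniform_transitions(mask):
    """Spread each row's probability evenly over its allowed transitions."""
    return [[m / sum(row) for m in row] for row in mask]


for row in transition_mask(4, "left-to-right", max_skip=1):
    print(row)
```

Training never turns a zero transition into a non-zero one (Baum-Welch re-estimates are proportional to the current value), so the mask fixes the topology for good.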

12
II. ASL Recognition (HMMs)

What is American Sign Language?
Related work
HMM approach (MIT,1996)
HMM topology
System overview
Feature extraction
Desk-based recognizer
Wearable-based recognizer

13
a) What is American Sign Language?

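≈ 6000 gestures for common words
Finger spelling (e.g. for proper nouns)
Signed at roughly the pace of spoken conversation
Facial expression (e.g. eyebrows) carries grammatical information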

14
b) Related work

1973, Gunnar Johansson
Human gestures can be recognized from motion
information alone.
1985, Sperling et al.
Isolated signs remain intelligible when
subsampled to 24x16 pixels

15
b) Related work

System
Instrumented gloves
Desktop-based camera systems
Methods
Template matching
Neural nets
Model-based approach

16
b) Related work

HMMs used successfully in
speech recognition
handwriting recognition
Since 1995: several HMM-based gesture recognizers
demonstrated

17
c) HMM approach (MIT,1996)

Thad Starner
Joshua Weaver
Alex Pentland
1 colour camera
Unadorned hands
Real time

18
c) HMM approach (MIT,1996)

Part-of-speech grammar:
pronoun → verb → noun → adjective → pronoun
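One benefit of such a grammar is that it shrinks the set of sentences the recognizer has to consider. The sketch enumerates all sentences that match the slide's part-of-speech pattern over a hypothetical nine-word lexicon, which is not the system's real vocabulary, and compares the count with unconstrained five-word sequences.

```python
from itertools import product

# Part-of-speech pattern from the slide
GRAMMAR = ["pronoun", "verb", "noun", "adjective", "pronoun"]

# Hypothetical toy lexicon, for illustration only
LEXICON = {
    "pronoun": ["I", "you"],
    "verb": ["want", "lose"],
    "noun": ["box", "paper", "book"],
    "adjective": ["red", "big"],
}


def grammatical_sentences():
    """All word sequences matching the part-of-speech pattern."""
    return product(*(LEXICON[pos] for pos in GRAMMAR))


def is_grammatical(words):
    return len(words) == len(GRAMMAR) and all(
        w in LEXICON[pos] for w, pos in zip(words, GRAMMAR))


vocab_size = sum(len(ws) for ws in LEXICON.values())
allowed = sum(1 for _ in grammatical_sentences())
print(allowed, "grammatical vs", vocab_size ** len(GRAMMAR), "unconstrained")
```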

19
d) HMM topology

Estimated number of distinct states: 5

→ to handle less complicated signs: add skip
transitions
→ fine-tuned empirically: 4-state HMM, 1 skip
transition
20
e) System overview

Goal: widely usable real-time system without
constraints
Two different mounting locations
320 x 240 pixels
colour
10 fps

21
e) System overview

Second-person viewpoint (desk-based)
First-person viewpoint (wearable)

22
f) Feature extraction

Algorithm for hand segmentation
Scan image for skin-coloured pixels
Grow region by checking neighbours
Use centroid as seed for the next frame
→
Two blobs
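The segmentation steps above can be sketched as a breadth-first region grow on a binary skin mask. This assumes the skin-colour classification has already produced the mask and uses 4-connectivity; all function names are hypothetical. Only one blob is segmented here. A second hand could be found by repeating the scan with the first region excluded.

```python
from collections import deque


def find_seed(skin):
    """Raster-scan for the first skin pixel; None if there is none."""
    for r, row in enumerate(skin):
        for c, is_skin in enumerate(row):
            if is_skin:
                return (r, c)
    return None


def grow_region(skin, seed):
    """Breadth-first region growing over 4-connected skin pixels."""
    h, w = len(skin), len(skin[0])
    seen, queue, region = {seed}, deque([seed]), []
    while queue:
        r, c = queue.popleft()
        region.append((r, c))
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and skin[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return region


def centroid(region):
    n = len(region)
    return (sum(r for r, _ in region) / n, sum(c for _, c in region) / n)


def segment(skin, prev_centroid=None):
    """Segment one blob, seeding from the previous frame's centroid if it is skin."""
    seed = None
    if prev_centroid is not None:
        r, c = round(prev_centroid[0]), round(prev_centroid[1])
        if skin[r][c]:
            seed = (r, c)
    if seed is None:
        seed = find_seed(skin)  # fall back to a full scan
    if seed is None:
        return [], None
    region = grow_region(skin, seed)
    return region, centroid(region)


mask = [[0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1]]
region, c = segment(mask)                       # first frame: scan for a seed
region2, c2 = segment(mask, prev_centroid=c)    # next frame: reuse the centroid
print(len(region), c, c2)
```

Seeding from the previous centroid avoids a full image scan in most frames, which matters for real-time operation.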

23
f) Feature extraction

Two blobs

→ Second moment analysis

→
Feature vector (per hand)
hand x, y position
change in x, y between frames
area (in pixels)
angle of axis of least inertia
length of this eigenvector
eccentricity of bounding ellipse
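A sketch of the second-moment analysis for one blob, given its pixels as (row, col) pairs. The angle is the direction of the covariance matrix's major eigenvector, which is the axis of least inertia. Two details are assumptions because the slide does not define them: "length of this eigenvector" is taken as the square root of the largest eigenvalue, and eccentricity as sqrt(1 − λ_min/λ_max).

```python
import math


def blob_features(region, prev_xy=None):
    """Feature dict for one hand blob; x = column, y = row.

    prev_xy is the blob's (x, y) centroid in the previous frame, if known.
    """
    n = len(region)
    xs = [c for _, c in region]
    ys = [r for r, _ in region]
    mx, my = sum(xs) / n, sum(ys) / n
    # Second-order central moments (covariance of the pixel coordinates)
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_max, lam_min = half_trace + root, max(half_trace - root, 0.0)
    # Direction of the major eigenvector = axis of least inertia
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ecc = math.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0
    dx, dy = (mx - prev_xy[0], my - prev_xy[1]) if prev_xy else (0.0, 0.0)
    return {"x": mx, "y": my, "dx": dx, "dy": dy, "area": n,
            "angle": angle, "axis_length": math.sqrt(lam_max),
            "eccentricity": ecc}


print(blob_features([(0, 0), (0, 1), (0, 2), (0, 3)], prev_xy=(0.0, 0.0)))
```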

24
g) Desk-based recognizer

384 training sentences
94 test sentences
Training
each sign → separate HMM
train output probabilities (means, variances)

25
Training

Divide each sentence into 5 equal portions
→
Use Viterbi alignment
→ initial estimates for means, variances

→ Baum-Welch re-estimator → optimized means,
variances
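A sketch of the first step only, the equal split that produces starting estimates. It assumes each of the 5 portions corresponds to one sign of a five-sign sentence and pools the frames of each sign across sentences to get per-dimension means and variances. The Viterbi alignment and Baum-Welch refinement are not shown. Function names and the data layout are hypothetical.

```python
from collections import defaultdict


def split_equal(frames, k):
    """Cut a frame sequence into k contiguous, (nearly) equal portions."""
    n = len(frames)
    bounds = [i * n // k for i in range(k + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(k)]


def initial_estimates(sentences):
    """sentences: list of (words, frames); frames are equal-length feature vectors.

    Returns {word: (means, variances)}, one value per feature dimension.
    """
    pooled = defaultdict(list)
    for words, frames in sentences:
        if len(frames) < len(words):
            raise ValueError("sentence has fewer frames than signs")
        for word, chunk in zip(words, split_equal(frames, len(words))):
            pooled[word].extend(chunk)
    estimates = {}
    for word, vecs in pooled.items():
        dim = len(vecs[0])
        means = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        variances = [sum((v[d] - means[d]) ** 2 for v in vecs) / len(vecs)
                     for d in range(dim)]
        estimates[word] = (means, variances)
    return estimates


# Hypothetical 1-D features for one five-sign training sentence
sentence = (["you", "want", "box", "big", "you"],
            [[0], [0], [1], [1], [2], [2], [3], [3], [0], [0]])
print(initial_estimates([sentence]))
```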
26
Testing

Concatenate all HMMs in all combinations
Calculate $P(O \mid \lambda)$
Recognize sequence with highest probability
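A brute-force sketch of this test procedure: chain the word HMMs of each candidate sentence into one left-to-right model, score the observation sequence with the forward algorithm, and keep the best candidate. To stay short, it uses hypothetical 2-state word models with discrete outputs, a shared self-loop probability, and a toy three-word pattern. A practical recognizer would use Viterbi search over a grammar network rather than scoring every combination separately.

```python
from itertools import product

# Hypothetical discrete word models: each sign is a 2-state left-to-right HMM
# over 3 output symbols (real systems use continuous Gaussian outputs).
WORD_EMISSIONS = {
    "you":   [[0.8, 0.1, 0.1], [0.6, 0.2, 0.2]],
    "want":  [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
    "box":   [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]],
    "paper": [[0.3, 0.3, 0.4], [0.4, 0.3, 0.3]],
}
STAY = 0.5  # hypothetical self-transition probability of every state


def concatenate(words):
    """Chain the word HMMs into one left-to-right sentence HMM."""
    emissions = [row for w in words for row in WORD_EMISSIONS[w]]
    n = len(emissions)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        trans[i][i], trans[i][i + 1] = STAY, 1.0 - STAY
    trans[n - 1][n - 1] = 1.0
    return trans, emissions


def sentence_likelihood(words, obs):
    """Forward algorithm on the sentence model, starting in its first
    state and ending in its last."""
    trans, emis = concatenate(words)
    n = len(emis)
    alpha = [emis[0][obs[0]] if j == 0 else 0.0 for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emis[j][o]
                 for j in range(n)]
    return alpha[-1]


def recognize(obs, candidates):
    """Pick the candidate sentence whose concatenated HMM scores highest."""
    return max(candidates, key=lambda words: sentence_likelihood(words, obs))


# Toy 3-word pattern (pronoun verb noun) to keep the search small
candidates = list(product(["you"], ["want"], ["box", "paper"]))
obs = [0, 0, 1, 1, 2, 2, 2, 2]
print(recognize(obs, candidates))
```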

27
Testing

[Figure: example recognition lattice with candidate signs "you", "want", "lose", "box", "paper" and scores 0.117, 0.165, 0.086]

28
g) Desk-based recognizer
29
h) Wearable-based recognizer

400 training sentences
100 test sentences
New grammar added (5-word restriction)
Signer is to look forward

30
h) Wearable-based recognizer
31
III. Gesture Recognition (FSM)

Finite State Machine approach
Modelling using FSMs
Training the gesture model
Recognition
Results

32
a) FSM approach

Pengyu Hong
Matthew Turk
Thomas S. Huang
Goal:
real-time gesture recognizer
technique to segment and align data automatically

University of Illinois at Urbana-Champaign
Microsoft Research
33
b) Modelling using FSMs

Feature extraction:
real-time skin-colour tracking algorithm
→
2D positions of face and hands
→
trajectories of the hands relative to the head
Training data: a repeated gesture, observed
several times

34
b) Modelling using FSMs

Gesture = ordered sequence of states
state $s = \langle \mu_s, \Sigma_s, d_s, T_{\min,s}, T_{\max,s} \rangle$
$\mu_s$: 2D centroid
$\Sigma_s$: spatial covariance matrix
$d_s$: distance threshold
$[T_{\min,s}, T_{\max,s}]$: duration interval
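A sketch of the state tuple as a small data structure. The slide lists the fields but not how d_s is applied. Here it is assumed that a 2-D point lies in state s when its Mahalanobis distance from μ_s under Σ_s is at most d_s, and that the duration check compares a sample count against [T_min,s, T_max,s]. The class and method names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class GestureState:
    """One FSM state s = <mu_s, Sigma_s, d_s, T_min,s, T_max,s>."""
    mean: tuple            # mu_s: 2-D centroid (x, y)
    cov: tuple             # Sigma_s: ((sxx, sxy), (sxy, syy)), assumed invertible
    dist_threshold: float  # d_s
    t_min: int             # T_min,s: fewest samples allowed in this state
    t_max: int             # T_max,s: most samples allowed in this state

    def distance(self, point):
        """Mahalanobis distance of a 2-D point from the state's centroid."""
        (sxx, sxy), (_, syy) = self.cov
        det = sxx * syy - sxy * sxy
        dx, dy = point[0] - self.mean[0], point[1] - self.mean[1]
        # (dx, dy) Sigma^-1 (dx, dy)^T via the closed-form 2x2 inverse
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return math.sqrt(d2)

    def contains(self, point):
        return self.distance(point) <= self.dist_threshold

    def duration_ok(self, n_samples):
        return self.t_min <= n_samples <= self.t_max


s = GestureState((0.0, 0.0), ((4.0, 0.0), (0.0, 1.0)), 1.0, 2, 5)
print(s.contains((2.0, 0.0)), s.contains((0.0, 2.0)))
```

With an elongated covariance, a point two units away along the wide axis is still inside the state, while the same offset along the narrow axis is not.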

35
c) Training the gesture model

Decouple temporal information from spatial
information
→
learn spatial information
→
incorporate the temporal data
→
refine spatial information

36
1. Spatial clustering

Define a threshold for the spatial variance
→
Begin with a model of two states
→
Train with dynamic k-means algorithm
→
Split the state with the largest variance...
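A sketch of the split-until-small loop. Hong et al. describe a dynamic k-means; this sketch substitutes plain k-means (Lloyd iterations) inside the loop. Several choices are assumptions rather than details from the slide: the variance measure (mean squared distance to the centroid), the split rule (offset the two new centroids by one standard deviation along the axis of larger spread), and the `max_states` cap.

```python
import math


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centers):
    """Group points by their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        k = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
        clusters[k].append(p)
    return clusters


def kmeans(points, centers, iters=50):
    """Plain k-means (Lloyd iterations) starting from the given centers."""
    for _ in range(iters):
        clusters = assign(points, centers)
        new = [mean(cl) if cl else c for c, cl in zip(centers, clusters)]
        if new == centers:
            break
        centers = new
    return centers, assign(points, centers)


def cluster_variance(center, cluster):
    """Mean squared distance of the cluster's points to its center."""
    if not cluster:
        return 0.0
    return sum(dist2(p, center) for p in cluster) / len(cluster)


def split(center, cluster):
    """Replace one center by two, offset along the axis of larger spread."""
    vx = sum((p[0] - center[0]) ** 2 for p in cluster) / len(cluster)
    vy = sum((p[1] - center[1]) ** 2 for p in cluster) / len(cluster)
    ox, oy = (math.sqrt(vx), 0.0) if vx >= vy else (0.0, math.sqrt(vy))
    return (center[0] - ox, center[1] - oy), (center[0] + ox, center[1] + oy)


def spatial_clustering(points, var_threshold, max_states=20):
    centers = list(split(mean(points), points))  # begin with two states
    while True:
        centers, clusters = kmeans(points, centers)
        variances = [cluster_variance(c, cl) for c, cl in zip(centers, clusters)]
        worst = max(range(len(centers)), key=lambda i: variances[i])
        if variances[worst] <= var_threshold or len(centers) >= max_states:
            return centers, clusters
        centers[worst:worst + 1] = list(split(centers[worst], clusters[worst]))


# Hypothetical 2-D hand positions around three locations
points = [(cx + dx, dy) for cx in (0.0, 10.0, 20.0)
          for dx, dy in ((-0.5, 0.0), (0.5, 0.0), (0.0, 0.5), (0.0, -0.5))]
centers, clusters = spatial_clustering(points, var_threshold=1.0)
print(len(centers), centers)
```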

37
1. Spatial clustering

[Figure: "wave left hand" gesture clustered without temporal information]

38
2. Temporal alignment

Each data point is assigned a label
→
Manually specify the temporal sequence
→ structure of the FSM
39
2. Temporal alignment

Segment training data into gesture samples
1 1 1 2 2 2 2 0 0 0 0 2 2 2 1 1
1 1 1 2 2 2 0 0 0 0 2 2 1 1 1
1 1 2 2 2 2 0 0 0 0 0 2 2 2 1 1
...

Number of samples per state → duration interval
$[T_{\min,s}, T_{\max,s}]$
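The duration step can be sketched with run-length encoding. It uses the three labelled samples shown on the slide (the elided "..." samples are not used) and assumes the FSM structure is the state order 1-2-0-2-1 read off those samples. Each sample's run lengths are collected per FSM position, and the minimum and maximum give [T_min,s, T_max,s].

```python
from itertools import groupby


def run_lengths(labels):
    """Collapse a label sequence into (label, run length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


def duration_bounds(samples, structure):
    """(T_min, T_max) per FSM state, from labelled training samples."""
    durations = [[] for _ in structure]
    for labels in samples:
        runs = run_lengths(labels)
        if [label for label, _ in runs] != list(structure):
            raise ValueError(f"sample does not follow structure {structure}")
        for i, (_, length) in enumerate(runs):
            durations[i].append(length)
    return [(min(d), max(d)) for d in durations]


# The three gesture samples shown on the slide
samples = [
    [1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 1, 1],
    [1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 2, 2, 1, 1, 1],
    [1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
]
print(duration_bounds(samples, [1, 2, 0, 2, 1]))
```

Spatial state 2 appears twice in the structure, so durations are tracked per FSM position rather than per spatial cluster.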
40
Training finished

$s = \langle \mu_s, \Sigma_s, d_s, T_{\min,s}, T_{\max,s} \rangle$

41
d) Recognition

Real time
Start all FSMs simultaneously
Check sample after sample → $O(n)$

FSM requirements violated → reset and ignore FSM
FSM requirements met:
final state reached → recognizer fires
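A sketch of the recognition loop, simplified to operate on per-sample state labels rather than raw 2-D positions. It assumes each sample has already been mapped to a spatial state. Every FSM sees every sample, so the work is O(1) per FSM per sample. Three firing and reset rules are assumptions: an FSM fires as soon as its final state has lasted T_min. A violation is either staying too long, leaving a state too early, or seeing an unexpected label. After a violation, the FSM may restart if the current sample matches its first state. The class is hypothetical, not the authors' code; the bounds in the usage example are those derived from the slide's three samples.

```python
class GestureFSM:
    """Hypothetical FSM over per-sample state labels."""

    def __init__(self, name, structure, bounds):
        self.name = name
        self.structure = list(structure)  # expected sequence of state labels
        self.bounds = list(bounds)        # (T_min, T_max) for each position
        self.reset()

    def reset(self):
        self.pos, self.count = -1, 0      # -1: waiting for the first state

    def _try_start(self, label):
        if label == self.structure[0]:
            self.pos, self.count = 0, 1

    def step(self, label):
        """Consume one sample; return True when the gesture is recognized."""
        if self.pos == -1:
            self._try_start(label)
        elif label == self.structure[self.pos]:
            self.count += 1
            if self.count > self.bounds[self.pos][1]:  # stayed too long
                self.reset()
                self._try_start(label)
        elif (self.pos + 1 < len(self.structure)
              and label == self.structure[self.pos + 1]
              and self.count >= self.bounds[self.pos][0]):
            self.pos, self.count = self.pos + 1, 1     # advance to next state
        else:                                          # requirement violated
            self.reset()
            self._try_start(label)
        last = len(self.structure) - 1
        if self.pos == last and self.count == self.bounds[last][0]:
            self.reset()                               # final state reached
            return True
        return False


def run_recognizers(fsms, stream):
    """Feed every sample to every FSM and collect (name, time) firings."""
    fired = []
    for t, label in enumerate(stream):
        for fsm in fsms:
            if fsm.step(label):
                fired.append((fsm.name, t))
    return fired


wave = GestureFSM("wave", [1, 2, 0, 2, 1], [(2, 3), (3, 4), (4, 5), (2, 3), (2, 3)])
other = GestureFSM("other", [3, 4], [(1, 2), (1, 2)])
stream = [1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 1, 1]
print(run_recognizers([wave, other], stream))
```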
42
e) Results

Hand gestures

43
e) Results

Mouse gestures

44
IV. Conclusion

HMMs
Use detailed features (orientation, speed, ...)
Generalized extremely well
Able to recognize large vocabulary
FSMs
Handle gestures with different lengths
Computational complexity greatly reduced
Works with small training sets

45
References

[1] T. Starner, J. Weaver, and A. Pentland. Real-time American Sign Language recognition using desk and wearable computer-based video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998.
[2] P. Hong, M. Turk, and T. S. Huang. Gesture modeling and recognition using finite state machines. Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, March 2000.
[3] P. Hong, M. Turk, and T. S. Huang. Constructing finite state machines for fast gesture recognition. 15th International Conference on Pattern Recognition, Barcelona, Spain, September 3-7, 2000.