Title: Large Vocabulary Unconstrained Handwriting Recognition
Large Vocabulary Unconstrained Handwriting Recognition
- J. Subrahmonia
- Pen Technologies
- IBM T. J. Watson Research Center
Pen Technologies
- Pen-based interfaces in mobile computing
Mathematical Formulation
- H: handwriting evidence on the basis of which the recognizer will make its decision, H = h1, h2, h3, h4, ..., hm
- W: word string from a large vocabulary, W = w1, w2, w3, w4, ..., wn
- Recognizer: W* = argmax_W p(W | H) = argmax_W p(H | W) p(W)
Mathematical Formulation
[Diagram: source-channel view of recognition, a SOURCE followed by a CHANNEL]
Source Channel Model
[Diagram: the WRITER is the source; the DIGITIZER and FEATURE EXTRACTOR form the channel, producing the evidence H that is passed to the DECODER]
Source Channel Model
- Handwriting modeling: HMMs
- Language modeling
- Search strategy
Hidden Markov Models
- Memoryless model + add memory -> Markov model
- Memoryless model + hide something -> mixture model
- Markov model + hide something -> hidden Markov model
- Mixture model + add memory -> hidden Markov model
- Alan B. Poritz, "Hidden Markov Models: A Guided Tour", ICASSP 1988
Memoryless Model
- Coin: heads (1) with probability p, tails (0) with probability 1-p
- Flip the coin 10 times (i.i.d. random sequence)
- Sequence: 1 0 1 0 0 0 1 1 1 1
- Probability: p(1-p)p(1-p)(1-p)(1-p)pppp = p^6 (1-p)^4
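An illustrative sketch of this memoryless model (not from the original slides; the value p = 0.6 is made up): the probability of an i.i.d. 0/1 sequence is simply the product of the per-flip probabilities.

```python
# Memoryless (i.i.d.) coin model: p(1) = p, p(0) = 1 - p.
# The probability of a sequence is the product of per-symbol probabilities.
def iid_sequence_prob(seq, p):
    prob = 1.0
    for s in seq:
        prob *= p if s == 1 else (1.0 - p)
    return prob

# The slide's sequence 1 0 1 0 0 0 1 1 1 1 has probability p^6 (1-p)^4.
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p=0.6))  # = 0.6**6 * 0.4**4
```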
Add Memory: Markov Model
- 2 coins: COIN 1: p(1) = 0.9, p(0) = 0.1; COIN 2: p(1) = 0.1, p(0) = 0.9
- Experiment: flip COIN 1 and note the outcome; if the outcome is heads, flip COIN 1 next, else flip COIN 2; repeat (see the sketch below)
- Sequence 1 1 0 0: probability 0.9 x 0.9 x 0.1 x 0.9
- Sequence 1 0 1 0: probability 0.9 x 0.1 x 0.1 x 0.1
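A minimal Python sketch of this two-coin experiment (not part of the original slides); the previous outcome selects which coin is flipped next, and the first flip uses coin 1.

```python
# Two-coin Markov model: the previous outcome selects the next coin.
# Coin 1: p(1) = 0.9, p(0) = 0.1.  Coin 2: p(1) = 0.1, p(0) = 0.9.
P_ONE = {1: 0.9, 2: 0.1}  # probability of heads (1) for each coin

def markov_sequence_prob(seq):
    prob = 1.0
    coin = 1                                # the first flip uses coin 1
    for outcome in seq:
        p1 = P_ONE[coin]
        prob *= p1 if outcome == 1 else (1.0 - p1)
        coin = 1 if outcome == 1 else 2     # heads -> coin 1, tails -> coin 2
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))   # 0.9 * 0.9 * 0.1 * 0.9
print(markov_sequence_prob([1, 0, 1, 0]))   # 0.9 * 0.1 * 0.1 * 0.1
```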
State Sequence Representation
[State diagram: state 1 (coin 1) and state 2 (coin 2); arcs labeled output/probability: from state 1, output 1 with 0.9 (stay in 1) and output 0 with 0.1 (go to 2); from state 2, output 0 with 0.9 (stay in 2) and output 1 with 0.1 (go to 1)]
- Observed output sequence <-> unique state sequence
Hide the States -> Hidden Markov Model
[State diagram: the same two-state model with states s1 and s2 now hidden; arcs carry both transition probabilities (0.9, 0.1) and output probabilities (0.9, 0.1), so the observed output sequence no longer determines a unique state sequence]
Why Use Hidden Markov Models Instead of Non-hidden Ones?
- Hidden Markov models can be smaller (fewer parameters to estimate)
- States may be truly hidden
  - Position of the hand
  - Positions of the articulators
Summary of HMM Basics
- We are interested in assigning probabilities p(H) to feature sequences
- Memoryless model: this model has no memory of the past
- Markov noticed that in some sequences the future depends on the past; he introduced the concept of a STATE, an equivalence class of the past that influences the future
- Hide the states: HMM
Hidden Markov Models
- Given an observed sequence H:
- Compute p(H) for decoding
- Find the most likely state sequence for a given Markov model (Viterbi algorithm)
- Estimate the parameters of the Markov source (training)
Compute p(H)
[Diagram: a three-state HMM with states s1, s2, s3; each arc is labeled with its transition probability (0.5, 0.4, 0.3, 0.2, 0.1, ...) and its output probabilities p(a), p(b) (0.8/0.2, 0.5/0.5, 0.3/0.7, 0.7/0.3, ...); this model is used in the trellis examples on the following slides]
Compute p(H) contd.
- Compute p(H) where H = a a b b
- Enumerate all ways of producing h1 = a
[Tree: the partial paths that produce h1 = a, with their probabilities, e.g. s1 -> s1 via 0.5 x 0.8 = 0.40, s1 -> s2 via 0.3 x 0.7 = 0.21, and the paths through the null transition (0.2) into s2 and s3 with probabilities 0.04 and 0.03]
Compute p(H) contd.
- Enumerate all ways of producing h1 = a, h2 = a
[Tree: the enumeration grows by one level per output symbol; each partial path is extended with arcs such as 0.5 x 0.8, 0.3 x 0.7, 0.4 x 0.5, 0.5 x 0.3 and the null transition 0.2, so the number of paths grows rapidly with the length of H]
Compute p(H)
- Can save computation by combining paths
[Diagram: the enumeration tree folded into a trellis; paths that reach the same state after consuming the same prefix of H are merged]
Compute p(H)
[Trellis: columns for the prefixes (none), a, aa, aab, aabb and rows for states s1, s2, s3; each arc carries its score (.2, .4 x .5, .1, ...), repeated in every column]
Basic Recursion
- Prob(node) = sum over predecessors of ( Prob(predecessor) x Prob(predecessor -> node) )
- Boundary condition: Prob(s1, 0) = 1
[Forward trellis for H = a a b b; each cell is the summed probability of reaching that state after producing the column's prefix]

State | 0    | a     | aa    | aab   | aabb
s1    | 1.0  | 0.4   | .16   | .016  | .0016
s2    | 0.2  | 0.33  | .182  | .054  | .01256
s3    | 0.02 | 0.063 | .0677 | .0691 | .020156

[Example cell: s2 after "a" = (s1, null) .08 + (s1, a) .21 + (s2, a) .04 = .33; the final entry .020156 at s3 is p(a a b b)]
More Formally: Forward Algorithm
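The formal statement on this slide did not survive conversion. As a hedged sketch, here is the forward recursion for a simplified discrete HMM in which outputs are attached to states and there are no null transitions (a simpler convention than the arc-output model in the trellis above); the parameters below are made-up toy values, not the slides' model.

```python
import numpy as np

# Minimal forward-algorithm sketch for a discrete HMM.
# Assumptions: state emissions, no null transitions, the model starts in state 0.
# A[i, j] = p(next state j | current state i);  B[j, k] = p(output symbol k | state j)
def forward(A, B, obs):
    n_states = A.shape[0]
    alpha = np.zeros((len(obs), n_states))
    alpha[0, 0] = B[0, obs[0]]              # start in state 0, emit first symbol
    for t in range(1, len(obs)):
        # alpha[t, j] = sum_i alpha[t-1, i] * A[i, j] * B[j, obs[t]]
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                  # p(H) = sum over states at the last position

# Toy usage (symbols: a = 0, b = 1):
A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward(A, B, [0, 0, 1, 1]))          # p(a a b b) under this toy model
```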
Find the Most Likely Path for a a b b: Dynamic Programming (Viterbi)
- MaxProb(node) = MAX over predecessors of ( MaxProb(predecessor) x Prob(predecessor -> node) )
[Viterbi trellis for H = a a b b: the same trellis as the forward pass, but each cell keeps only the MAX over its incoming contributions (e.g. s2 after "a" = max(.08, .21, .04) = .21) instead of their sum, so the surviving backpointers trace out the most likely state sequence]
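A matching Viterbi sketch under the same simplified assumptions as the forward sketch above (state emissions, start state 0, toy parameters): the sum is replaced by a max, and backpointers recover the most likely state sequence.

```python
import numpy as np

# Viterbi sketch: same simplified HMM conventions as the forward sketch above.
def viterbi(A, B, obs):
    n_states = A.shape[0]
    delta = np.zeros((len(obs), n_states))
    back = np.zeros((len(obs), n_states), dtype=int)
    delta[0, 0] = B[0, obs[0]]                       # start in state 0
    for t in range(1, len(obs)):
        # scores[i, j] = delta[t-1, i] * A[i, j] * B[j, obs[t]]
        scores = delta[t - 1][:, None] * A * B[:, obs[t]][None, :]
        back[t] = scores.argmax(axis=0)              # best predecessor for each state
        delta[t] = scores.max(axis=0)
    # Trace back the most likely state sequence.
    path = [int(delta[-1].argmax())]
    for t in range(len(obs) - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return delta[-1].max(), path[::-1]

A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(viterbi(A, B, [0, 0, 1, 1]))   # best-path probability and its state sequence
```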
s3
Training HMM parameters
[Diagram: an initial HMM with guessed parameters (transition probabilities 1/3, 1/2, ...; output probabilities p(a) = p(b) = 1/2 on each arc)]
- Training data H = a b a a
- The seven paths through the model that produce H have probabilities .000385, .000578, .000868, .001157, .002604, .001736, .001302
- p(H) = .008632 (the sum of the path probabilities)
Training HMM parameters
- A posteriori probability of path i: p(path i | H) = p(path i, H) / p(H), as in the sketch below
- For the paths above: .045, .067, .134, .100, .201, .150, .301
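Numerically, each path's posterior is just its joint probability divided by p(H). A small sketch (not from the slides) using the path probabilities listed on the previous slide:

```python
# Posterior probability of each path i: p(path_i | H) = p(path_i, H) / p(H).
path_probs = [0.000385, 0.000578, 0.000868, 0.001157,
              0.002604, 0.001736, 0.001302]      # from the previous slide
p_H = sum(path_probs)                            # ~ .008632 (listed values are rounded)
posteriors = [p / p_H for p in path_probs]
print([round(p, 3) for p in posteriors])         # ~ the posteriors listed above
```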
Training HMM parameters
Training HMM parameters
[Diagram: the re-estimated HMM after one pass (transition/output probabilities such as .46, .60, .64/.36, .71/.29, .68/.32, .40, .34, .60/.40, .20) and the new path probabilities .00108, .00129, .00404, .00212, .00253, .00791, .00537]
- Keep on repeating: after 600 iterations p(H) = .037037037
- Starting from another initial parameter set: p(H) = 0.0625
Training HMM parameters
- Converges to a local maximum
- There are (at least) 7 local maxima
- The final solution depends on the starting point
- The speed of convergence depends on the starting point
Training HMM parameters: Forward-Backward Algorithm
- Improves on the path-enumeration algorithm by using the trellis
- Reduces the computation from exponential to linear in the length of H
Forward Backward Algorithm
[Trellis diagram: a node at position j splits every path through it into a prefix that produces h1 ... hj-1 and a suffix that produces the rest of H]
Forward Backward Algorithm
- Probability that hj is produced by the transition s -> s' and the complete output is H =
  (probability of being in state s and producing the output h1, ..., hj-1)
  x p(s -> s', hj)
  x (probability of being in state s' and producing the output hj+1, ..., hm)
Forward Backward Algorithm
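The equations on this slide did not survive conversion. As a hedged sketch under the same simplified state-output assumptions as the earlier code, the backward pass mirrors the forward pass, and combining the two gives the per-position state posteriors used as counts:

```python
import numpy as np

# Backward pass and posterior "counts" for the simplified HMM conventions used
# in the earlier sketches (state emissions, no null transitions, start state 0).
def forward_backward(A, B, obs):
    T, n = len(obs), A.shape[0]
    alpha = np.zeros((T, n))
    beta = np.ones((T, n))                           # beta at the last position = 1
    alpha[0, 0] = B[0, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        # beta[t, i] = sum_j A[i, j] * B[j, obs[t+1]] * beta[t+1, j]
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_H = alpha[-1].sum()
    gamma = alpha * beta / p_H                       # p(state at position t | H)
    return p_H, gamma

A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
p_H, gamma = forward_backward(A, B, [0, 0, 1, 1])
print(p_H, gamma.sum(axis=1))                        # each gamma row sums to 1
```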
Training HMM parameters
- Guess initial values for all parameters
- Compute forward- and backward-pass probabilities
- Compute counts
- Re-estimate the probabilities
- Known as: Baum-Welch, Baum-Eagon, Forward-Backward, E-M
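A hedged end-to-end sketch of this recipe (one E-M re-estimation step plus a training loop) for the same simplified state-output model used in the earlier sketches. The two-state topology and the initial parameter guesses below are made up; only the training string a b a a comes from the slides. As on the convergence slides above, each iteration leaves p(H) unchanged or increases it.

```python
import numpy as np

def baum_welch_step(A, B, obs):
    """One E-M re-estimation step for the simplified state-emission HMM."""
    T, n = len(obs), A.shape[0]
    # E-step: forward and backward probabilities (start state 0).
    alpha = np.zeros((T, n)); alpha[0, 0] = B[0, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_H = alpha[-1].sum()
    gamma = alpha * beta / p_H                         # state posteriors ("counts")
    # Expected transition counts xi[t, i, j] = alpha_t(i) A[i,j] B[j, h_{t+1}] beta_{t+1}(j) / p(H).
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_H
    # M-step: turn the counts into re-estimated probabilities.
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, p_H

# Toy initial guesses (not the slides' model); training string H = a b a a.
A = np.array([[0.5, 0.5], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.5, 0.5]])
obs = [0, 1, 0, 0]
for _ in range(20):
    A, B, p_H = baum_welch_step(A, B, obs)
print(p_H)   # p(H) is non-decreasing across iterations, converging to a local maximum
```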