Title: Large Vocabulary Unconstrained Handwriting Recognition
Large Vocabulary Unconstrained Handwriting Recognition
- J. Subrahmonia
- Pen Technologies
- IBM T. J. Watson Research Center
Pen Technologies
- Pen-based interfaces in mobile computing
Mathematical Formulation
- H: handwriting evidence on the basis of which the recognizer will make its decision, H = h1, h2, h3, h4, ..., hm
- W: word string from a large vocabulary, W = w1, w2, w3, w4, ..., wn
- Recognizer: W* = argmax_W p(W | H) = argmax_W p(H | W) p(W)
Mathematical Formulation
[Diagram: source-channel view of recognition, a SOURCE followed by a CHANNEL]
Source Channel Model
[Diagram: the WRITER is the source; the DIGITIZER and FEATURE EXTRACTOR form the channel, producing the evidence H that is passed to the DECODER]
Source Channel Model
- Handwriting modeling: HMMs
- Language modeling
- Search strategy
Hidden Markov Models
- Memoryless model + add memory -> Markov model
- Memoryless model + hide something -> mixture model
- Markov model + hide something -> hidden Markov model
- Mixture model + add memory -> hidden Markov model
- Alan B. Poritz, "Hidden Markov Models: A Guided Tour", ICASSP 1988
Memoryless Model
- Coin: heads (1) with probability p, tails (0) with probability 1-p
- Flip the coin 10 times (i.i.d. random sequence)
- Sequence: 1 0 1 0 0 0 1 1 1 1
- Probability: p(1-p)p(1-p)(1-p)(1-p)pppp = p^6 (1-p)^4
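An illustrative sketch of this memoryless model (not from the original slides; the value p = 0.6 is made up): the probability of an i.i.d. 0/1 sequence is simply the product of the per-flip probabilities.

```python
# Memoryless (i.i.d.) coin model: p(1) = p, p(0) = 1 - p.
# The probability of a sequence is the product of per-symbol probabilities.
def iid_sequence_prob(seq, p):
    prob = 1.0
    for s in seq:
        prob *= p if s == 1 else (1.0 - p)
    return prob

# The slide's sequence 1 0 1 0 0 0 1 1 1 1 has probability p^6 (1-p)^4.
print(iid_sequence_prob([1, 0, 1, 0, 0, 0, 1, 1, 1, 1], p=0.6))  # = 0.6**6 * 0.4**4
```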
Add Memory: Markov Model
- 2 coins: COIN 1: p(1) = 0.9, p(0) = 0.1; COIN 2: p(1) = 0.1, p(0) = 0.9
- Experiment: flip COIN 1 and note the outcome; if the outcome is heads, flip COIN 1 next, else flip COIN 2; repeat (see the sketch below)
- Sequence 1 1 0 0: probability 0.9 x 0.9 x 0.1 x 0.9
- Sequence 1 0 1 0: probability 0.9 x 0.1 x 0.1 x 0.1
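A minimal Python sketch of this two-coin experiment (not part of the original slides); the previous outcome selects which coin is flipped next, and the first flip uses coin 1.

```python
# Two-coin Markov model: the previous outcome selects the next coin.
# Coin 1: p(1) = 0.9, p(0) = 0.1.  Coin 2: p(1) = 0.1, p(0) = 0.9.
P_ONE = {1: 0.9, 2: 0.1}  # probability of heads (1) for each coin

def markov_sequence_prob(seq):
    prob = 1.0
    coin = 1                                # the first flip uses coin 1
    for outcome in seq:
        p1 = P_ONE[coin]
        prob *= p1 if outcome == 1 else (1.0 - p1)
        coin = 1 if outcome == 1 else 2     # heads -> coin 1, tails -> coin 2
    return prob

print(markov_sequence_prob([1, 1, 0, 0]))   # 0.9 * 0.9 * 0.1 * 0.9
print(markov_sequence_prob([1, 0, 1, 0]))   # 0.9 * 0.1 * 0.1 * 0.1
```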
State Sequence Representation
[State diagram: state 1 (coin 1) and state 2 (coin 2); arcs labeled output/probability: from state 1, output 1 with 0.9 (stay in 1) and output 0 with 0.1 (go to 2); from state 2, output 0 with 0.9 (stay in 2) and output 1 with 0.1 (go to 1)]
- Observed output sequence <-> unique state sequence
Hide the States -> Hidden Markov Model
[State diagram: the same two-state model with states s1 and s2 now hidden; arcs carry both transition probabilities (0.9, 0.1) and output probabilities (0.9, 0.1), so the observed output sequence no longer determines a unique state sequence]
Why Use Hidden Markov Models Instead of Non-hidden Ones?
- Hidden Markov models can be smaller (fewer parameters to estimate)
- States may be truly hidden
  - Position of the hand
  - Positions of the articulators
Summary of HMM Basics
- We are interested in assigning probabilities p(H) to feature sequences
- Memoryless model: this model has no memory of the past
- Markov noticed that in some sequences the future depends on the past; he introduced the concept of a STATE, an equivalence class of the past that influences the future
- Hide the states: HMM
Hidden Markov Models
- Given an observed sequence H:
- Compute p(H) for decoding
- Find the most likely state sequence for a given Markov model (Viterbi algorithm)
- Estimate the parameters of the Markov source (training)
Compute p(H)
[Diagram: a three-state HMM with states s1, s2, s3; each arc is labeled with its transition probability (0.5, 0.4, 0.3, 0.2, 0.1, ...) and its output probabilities p(a), p(b) (0.8/0.2, 0.5/0.5, 0.3/0.7, 0.7/0.3, ...); this model is used in the trellis examples on the following slides]
Compute p(H) contd.
- Compute p(H) where H = a a b b
- Enumerate all ways of producing h1 = a
[Tree: the partial paths that produce h1 = a, with their probabilities, e.g. s1 -> s1 via 0.5 x 0.8 = 0.40, s1 -> s2 via 0.3 x 0.7 = 0.21, and the paths through the null transition (0.2) into s2 and s3 with probabilities 0.04 and 0.03]
Compute p(H) contd.
- Enumerate all ways of producing h1 = a, h2 = a
[Tree: the enumeration grows by one level per output symbol; each partial path is extended with arcs such as 0.5 x 0.8, 0.3 x 0.7, 0.4 x 0.5, 0.5 x 0.3 and the null transition 0.2, so the number of paths grows rapidly with the length of H]
Compute p(H)
- Can save computation by combining paths
[Diagram: the enumeration tree folded into a trellis; paths that reach the same state after consuming the same prefix of H are merged]
Compute p(H)
[Trellis: columns for the prefixes (none), a, aa, aab, aabb and rows for states s1, s2, s3; each arc carries its score (.2, .4 x .5, .1, ...), repeated in every column]
Basic Recursion
- Prob(node) = sum over predecessors of ( Prob(predecessor) x Prob(predecessor -> node) )
- Boundary condition: Prob(s1, 0) = 1
[Forward trellis for H = a a b b; each cell is the summed probability of reaching that state after producing the column's prefix]

State | 0    | a     | aa    | aab   | aabb
s1    | 1.0  | 0.4   | .16   | .016  | .0016
s2    | 0.2  | 0.33  | .182  | .054  | .01256
s3    | 0.02 | 0.063 | .0677 | .0691 | .020156

[Example cell: s2 after "a" = (s1, null) .08 + (s1, a) .21 + (s2, a) .04 = .33; the final entry .020156 at s3 is p(a a b b)]
More Formally: Forward Algorithm
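The formal statement on this slide did not survive conversion. As a hedged sketch, here is the forward recursion for a simplified discrete HMM in which outputs are attached to states and there are no null transitions (a simpler convention than the arc-output model in the trellis above); the parameters below are made-up toy values, not the slides' model.

```python
import numpy as np

# Minimal forward-algorithm sketch for a discrete HMM.
# Assumptions: state emissions, no null transitions, the model starts in state 0.
# A[i, j] = p(next state j | current state i);  B[j, k] = p(output symbol k | state j)
def forward(A, B, obs):
    n_states = A.shape[0]
    alpha = np.zeros((len(obs), n_states))
    alpha[0, 0] = B[0, obs[0]]              # start in state 0, emit first symbol
    for t in range(1, len(obs)):
        # alpha[t, j] = sum_i alpha[t-1, i] * A[i, j] * B[j, obs[t]]
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha[-1].sum()                  # p(H) = sum over states at the last position

# Toy usage (symbols: a = 0, b = 1):
A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(forward(A, B, [0, 0, 1, 1]))          # p(a a b b) under this toy model
```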
Find the Most Likely Path for a a b b: Dynamic Programming (Viterbi)
- MaxProb(node) = MAX over predecessors of ( MaxProb(predecessor) x Prob(predecessor -> node) )
[Viterbi trellis for H = a a b b: the same trellis as the forward pass, but each cell keeps only the MAX over its incoming contributions (e.g. s2 after "a" = max(.08, .21, .04) = .21) instead of their sum, so the surviving backpointers trace out the most likely state sequence]
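A matching Viterbi sketch under the same simplified assumptions as the forward sketch above (state emissions, start state 0, toy parameters): the sum is replaced by a max, and backpointers recover the most likely state sequence.

```python
import numpy as np

# Viterbi sketch: same simplified HMM conventions as the forward sketch above.
def viterbi(A, B, obs):
    n_states = A.shape[0]
    delta = np.zeros((len(obs), n_states))
    back = np.zeros((len(obs), n_states), dtype=int)
    delta[0, 0] = B[0, obs[0]]                       # start in state 0
    for t in range(1, len(obs)):
        # scores[i, j] = delta[t-1, i] * A[i, j] * B[j, obs[t]]
        scores = delta[t - 1][:, None] * A * B[:, obs[t]][None, :]
        back[t] = scores.argmax(axis=0)              # best predecessor for each state
        delta[t] = scores.max(axis=0)
    # Trace back the most likely state sequence.
    path = [int(delta[-1].argmax())]
    for t in range(len(obs) - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return delta[-1].max(), path[::-1]

A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
print(viterbi(A, B, [0, 0, 1, 1]))   # best-path probability and its state sequence
```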
s3
Training HMM parameters
[Diagram: an initial HMM with guessed parameters (transition probabilities 1/3, 1/2, ...; output probabilities p(a) = p(b) = 1/2 on each arc)]
- Training data H = a b a a
- The seven paths through the model that produce H have probabilities .000385, .000578, .000868, .001157, .002604, .001736, .001302
- p(H) = .008632 (the sum of the path probabilities)
Training HMM parameters
- A posteriori probability of path i: p(path i | H) = p(path i, H) / p(H), as in the sketch below
- For the paths above: .045, .067, .134, .100, .201, .150, .301
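Numerically, each path's posterior is just its joint probability divided by p(H). A small sketch (not from the slides) using the path probabilities listed on the previous slide:

```python
# Posterior probability of each path i: p(path_i | H) = p(path_i, H) / p(H).
path_probs = [0.000385, 0.000578, 0.000868, 0.001157,
              0.002604, 0.001736, 0.001302]      # from the previous slide
p_H = sum(path_probs)                            # ~ .008632 (listed values are rounded)
posteriors = [p / p_H for p in path_probs]
print([round(p, 3) for p in posteriors])         # ~ the posteriors listed above
```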
Training HMM parameters
Training HMM parameters
[Diagram: the re-estimated HMM after one pass (transition/output probabilities such as .46, .60, .64/.36, .71/.29, .68/.32, .40, .34, .60/.40, .20) and the new path probabilities .00108, .00129, .00404, .00212, .00253, .00791, .00537]
- Keep on repeating: after 600 iterations p(H) = .037037037
- Starting from another initial parameter set: p(H) = 0.0625
Training HMM parameters
- Converges to a local maximum
- There are (at least) 7 local maxima
- The final solution depends on the starting point
- The speed of convergence depends on the starting point
Training HMM parameters: Forward-Backward Algorithm
- Improves on the path-enumeration algorithm by using the trellis
- Reduces the computation from exponential to linear in the length of H
Forward Backward Algorithm
[Trellis diagram: a node at position j splits every path through it into a prefix that produces h1 ... hj-1 and a suffix that produces the rest of H]
Forward Backward Algorithm
- Probability that hj is produced by the transition s -> s' and the complete output is H =
  (probability of being in state s and producing the output h1, ..., hj-1)
  x p(s -> s', hj)
  x (probability of being in state s' and producing the output hj+1, ..., hm)
Forward Backward Algorithm
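The equations on this slide did not survive conversion. As a hedged sketch under the same simplified state-output assumptions as the earlier code, the backward pass mirrors the forward pass, and combining the two gives the per-position state posteriors used as counts:

```python
import numpy as np

# Backward pass and posterior "counts" for the simplified HMM conventions used
# in the earlier sketches (state emissions, no null transitions, start state 0).
def forward_backward(A, B, obs):
    T, n = len(obs), A.shape[0]
    alpha = np.zeros((T, n))
    beta = np.ones((T, n))                           # beta at the last position = 1
    alpha[0, 0] = B[0, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):
        # beta[t, i] = sum_j A[i, j] * B[j, obs[t+1]] * beta[t+1, j]
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_H = alpha[-1].sum()
    gamma = alpha * beta / p_H                       # p(state at position t | H)
    return p_H, gamma

A = np.array([[0.6, 0.4], [0.0, 1.0]])
B = np.array([[0.8, 0.2], [0.3, 0.7]])
p_H, gamma = forward_backward(A, B, [0, 0, 1, 1])
print(p_H, gamma.sum(axis=1))                        # each gamma row sums to 1
```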
Training HMM parameters
- Guess initial values for all parameters
- Compute forward- and backward-pass probabilities
- Compute counts
- Re-estimate the probabilities
- Known as: Baum-Welch, Baum-Eagon, Forward-Backward, E-M
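A hedged end-to-end sketch of this recipe (one E-M re-estimation step plus a training loop) for the same simplified state-output model used in the earlier sketches. The two-state topology and the initial parameter guesses below are made up; only the training string a b a a comes from the slides. As on the convergence slides above, each iteration leaves p(H) unchanged or increases it.

```python
import numpy as np

def baum_welch_step(A, B, obs):
    """One E-M re-estimation step for the simplified state-emission HMM."""
    T, n = len(obs), A.shape[0]
    # E-step: forward and backward probabilities (start state 0).
    alpha = np.zeros((T, n)); alpha[0, 0] = B[0, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta = np.ones((T, n))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    p_H = alpha[-1].sum()
    gamma = alpha * beta / p_H                         # state posteriors ("counts")
    # Expected transition counts xi[t, i, j] = alpha_t(i) A[i,j] B[j, h_{t+1}] beta_{t+1}(j) / p(H).
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_H
    # M-step: turn the counts into re-estimated probabilities.
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, p_H

# Toy initial guesses (not the slides' model); training string H = a b a a.
A = np.array([[0.5, 0.5], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.5, 0.5]])
obs = [0, 1, 0, 0]
for _ in range(20):
    A, B, p_H = baum_welch_step(A, B, obs)
print(p_H)   # p(H) is non-decreasing across iterations, converging to a local maximum
```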