Title: Improving Chinese Handwriting Recognition by Fusing Speech Recognition
Slide 1: Improving Chinese Handwriting Recognition by Fusing Speech Recognition
- Zhang Xi-Wen
- CSE, CUHK and HCI Lab., ISCAS
- 2005.4.12
Slide 2: Outline
- 1 Chinese handwriting recognition
- 2 Chinese speech recognition
- 3 Information fusion
- 4 Experimental results
Slide 3: Handwriting Recognition
- Handwriting segmentation
- Character recognition
Slide 4: 1.1 Handwriting segmentation
- Segmentation is more difficult for Chinese handwriting.
Slide 5: Character extraction using histogram
- A histogram of between-stroke gaps is built.
- The dimidiate threshold of the histogram is used to extract lines of strokes.
- The dimidiate threshold of the histogram of a line of strokes is used to extract characters.
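The histogram-based extraction described on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it reads "dimidiate threshold" as a cut that splits the gaps into two classes (small gaps inside a character, large gaps between characters), and it picks that cut with an Otsu-style between-class variance criterion, which the slides do not specify. Strokes are reduced to hypothetical `(start, end)` extents along the writing direction, and all function names are illustrative.

```python
def between_stroke_gaps(strokes):
    """Gaps between consecutive strokes; strokes are (start, end), sorted by start."""
    gaps = []
    right_edge = strokes[0][1]
    for start, end in strokes[1:]:
        gaps.append(max(0, start - right_edge))  # overlapping strokes give gap 0
        right_edge = max(right_edge, end)
    return gaps


def two_class_threshold(gaps):
    """Assumed criterion: the cut that maximizes between-class variance (Otsu-style)."""
    values = sorted(gaps)
    n = len(values)
    total = sum(values)
    best_score, best_cut = -1.0, None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        if values[i] == values[i - 1]:
            continue  # no cut can fall between two equal values
        w0, w1 = i / n, (n - i) / n
        m0 = left_sum / i
        m1 = (total - left_sum) / (n - i)
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_score = score
            best_cut = (values[i - 1] + values[i]) / 2
    if best_cut is None:
        raise ValueError("need at least two distinct gap values")
    return best_cut


def group_strokes(strokes, threshold):
    """Start a new group whenever the gap to the previous strokes exceeds the threshold."""
    groups = [[strokes[0]]]
    right_edge = strokes[0][1]
    for stroke in strokes[1:]:
        if stroke[0] - right_edge > threshold:
            groups.append([stroke])
        else:
            groups[-1].append(stroke)
        right_edge = max(right_edge, stroke[1])
    return groups


# Toy data (made up for illustration): seven strokes along one text line.
strokes = [(0, 4), (5, 9), (10, 13), (30, 34), (35, 40), (60, 63), (64, 70)]
gaps = between_stroke_gaps(strokes)
threshold = two_class_threshold(gaps)
groups = group_strokes(strokes, threshold)
print(threshold, [len(g) for g in groups])
```

The same two functions could be applied twice, first across the writing direction to separate lines of strokes and then along each line to separate characters, mirroring the two bullets above. When the two gap classes overlap, any single threshold will split or merge some characters.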
Slide 6: Figure 1. Handwriting segmentation
Slide 7: Remaining problems
- A single Chinese character may be mis-segmented into several characters.
- Several Chinese characters may be mis-grouped into one character.
- Segmentation errors inevitably lead to handwriting recognition errors.
Slide 8: 1.2 Character recognition
- Isolated character recognizer from HW
- Many candidates
Slide 9: Figure 2. Handwriting recognition (panels: the handwriting, the text recognized from the handwriting, and the ground-truth text)
Slide 10: 2 Speech recognition
- Chinese speech.
- On-line, microphone.
- Continuous speech recognizer from MS.
Slide 11: Figure 3. Speech recognition (panels: the text recognized from the speech corresponding to the handwriting, and the ground-truth text)
Slide 12: 3 Text fusion
- An optimization problem
- Dynamic Programming
Slide 13: 3.1 Principles
- The fused text should contain more semantic information.
- Construct a text with the fewest characters and the most semantic information.
Slide 14: 3.2 Four ways
Figure 4. Texts to be fused (panels: the text recognized from the handwriting, and the text recognized from the speech corresponding to the handwriting)
Slide 15: 3.3 Dynamic Programming
- A directed graph.
- Optimal paths.
Slide 16: Figure 5. A directed graph with N levels.
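Finding an optimal path through a directed graph with N levels can be sketched with a Viterbi-style dynamic program. This is an illustrative sketch, not the method as implemented by the authors: the slides do not say how nodes and edges are scored, so `node_score` and `edge_score` are hypothetical callbacks, and the toy candidates use Latin letters with made-up confidences and a made-up bigram table.

```python
def best_path(levels, node_score, edge_score):
    """Return (path, total) for the highest-scoring path with one node per level.

    levels     : list of N non-empty lists of candidate nodes
    node_score : hypothetical function scoring a single candidate
    edge_score : hypothetical function scoring two adjacent candidates
    """
    if not levels or any(not level for level in levels):
        raise ValueError("every level needs at least one candidate")

    # table[i][j] = (best score of a path ending at levels[i][j], backpointer)
    table = [[(node_score(c), None) for c in levels[0]]]
    for i in range(1, len(levels)):
        prev_level, prev_row = levels[i - 1], table[-1]
        row = []
        for cand in levels[i]:
            # Pick the predecessor that gives the best score so far.
            k_best = max(
                range(len(prev_level)),
                key=lambda k: prev_row[k][0] + edge_score(prev_level[k], cand),
            )
            score = (prev_row[k_best][0]
                     + edge_score(prev_level[k_best], cand)
                     + node_score(cand))
            row.append((score, k_best))
        table.append(row)

    # Backtrack from the best node in the last level.
    j = max(range(len(table[-1])), key=lambda k: table[-1][k][0])
    total = table[-1][j][0]
    path = []
    for i in range(len(levels) - 1, -1, -1):
        path.append(levels[i][j])
        j = table[i][j][1]
    path.reverse()
    return path, total


# Toy example: each candidate is (character, recognizer confidence).
levels = [
    [("c", 0.5), ("e", 0.4)],
    [("a", 0.3), ("o", 0.6)],
    [("t", 0.7), ("l", 0.2)],
]
bigrams = {("c", "a"): 1.0, ("a", "t"): 1.0}  # stand-in for a language model

path, total = best_path(
    levels,
    node_score=lambda cand: cand[1],
    edge_score=lambda a, b: bigrams.get((a[0], b[0]), 0.0),
)
print("".join(ch for ch, _ in path))
```

In this toy run the bigram scores outweigh the higher confidence of "o" at the second level, so the path spells "cat". The cost is proportional to the number of levels times the square of the number of candidates per level, instead of enumerating every possible path.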
Slide 17: Figure 6. Text fusion using DP.
(a) Text recognized from the handwriting.
(b) Text recognized from the speech corresponding to the handwriting.
(c) The optimal fused text, corresponding to the optimal path.
(d) The ground-truth text.
Slide 18: 3.4 A language model
Slide 19: Lexicon
Slide 21: 4 Experimental results
Slide 24:
- Thank you very much for your criticism, comments and suggestions!
- Email: xwzhang_at_cse.cuhk.edu.hk
- Tel: 3163-4260