Title: Development of an OCR System
1 Development of an OCR System
Nathan Harmata, TJHSST Computer Systems Lab, 2007-2008
2 What is OCR?
Optical Character Recognition
Font and handwriting based
3 Goals of My Project
Generic recognition for Latin-based fonts
System built from scratch
Proper handling of most formatting
4 Overview of Idocrase System
5 Image Processing
6 Transformations
Attribute Character Model
7 Transformations
Sector Vector - the image is parsed into parts that pass the vertical line test, then each part is transformed into a collection of line segments
Gap Vector - gaps, if any, are found on the four sides of the image
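The vertical line test mentioned for the Sector Vector can be illustrated with a minimal sketch. It assumes the character image is a list of rows of 0/1 pixels and that a part "passes" when no pixel column contains more than one contiguous run of foreground pixels. The function names column_runs and passes_vertical_line_test are hypothetical and are not taken from the Idocrase source; how Idocrase actually splits an image into parts is not shown here.

```python
def column_runs(image, x):
    """Count contiguous runs of foreground (1) pixels in column x."""
    runs = 0
    previous = 0
    for row in image:
        # A new run starts where a foreground pixel follows background.
        if row[x] == 1 and previous == 0:
            runs += 1
        previous = row[x]
    return runs


def passes_vertical_line_test(image):
    """True if every column holds at most one run of foreground pixels.

    Assumption: this is one plausible reading of the "vertical line test"
    for a binary image, not the confirmed Idocrase definition.
    """
    if not image:
        return True
    width = len(image[0])
    return all(column_runs(image, x) <= 1 for x in range(width))


# A horizontal bar passes; a closed ring does not, so a ring-shaped
# character would have to be split into several parts first.
bar = [[0, 0, 0],
       [1, 1, 1],
       [0, 0, 0]]
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
```

Under this reading, a shape such as the ring would be divided (for example into an upper and a lower arc) until each part passes the test.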
8 Transformations
Pixel Concentration Vector - which sides, if any, have a higher concentration of pixels
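A minimal sketch of how a pixel concentration vector might be computed, assuming a binary image stored as a list of rows of 0/1 pixels. The two-component result (horizontal, vertical), the labels other than "balanced", and the tolerance parameter are assumptions made for illustration; only the value "balanced" appears in the slides.

```python
def pixel_concentration(image, tolerance=0.1):
    """Return (horizontal, vertical) concentration labels for a binary image.

    Each label is "balanced" unless one half holds noticeably more
    foreground pixels than the other. `tolerance` is a hypothetical
    threshold, expressed as a fraction of all foreground pixels.
    """
    height = len(image)
    width = len(image[0])
    total = sum(sum(row) for row in image)
    if total == 0:
        return ("balanced", "balanced")

    # For odd dimensions the middle column or row is left out of both halves.
    left = sum(sum(row[: width // 2]) for row in image)
    right = sum(sum(row[(width + 1) // 2:]) for row in image)
    top = sum(sum(row) for row in image[: height // 2])
    bottom = sum(sum(row) for row in image[(height + 1) // 2:])

    def compare(a, b, name_a, name_b):
        if abs(a - b) <= tolerance * total:
            return "balanced"
        return name_a if a > b else name_b

    return (compare(left, right, "left", "right"),
            compare(top, bottom, "top", "bottom"))


# An L-shaped glyph is heavier on the left and on the bottom.
l_shape = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 1, 1, 1]]
print(pixel_concentration(l_shape))  # ('left', 'bottom')
```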
9 Character Recognition
GCDD - Generic Character Definition Database
Averages of Character Models for every character from many different fonts
0
PixelConcentrationVector balanced balanced
SectorVector 4 3
GapVector
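The idea of averaging character models across fonts can be sketched as follows. This sketch assumes each character model has already been reduced to a numeric feature vector and that an unknown character is matched to the closest averaged definition by Euclidean distance. Both choices are assumptions for illustration: the slides show models with symbolic values such as "balanced", and they do not state the matching rule. The names build_definitions and recognize are hypothetical.

```python
from collections import defaultdict
import math


def build_definitions(samples):
    """Average the feature vectors of each character over all fonts.

    `samples` is a list of (character, feature_vector) pairs, one pair
    per font in which the character was measured.
    """
    grouped = defaultdict(list)
    for character, vector in samples:
        grouped[character].append(vector)
    return {
        character: [sum(column) / len(vectors) for column in zip(*vectors)]
        for character, vectors in grouped.items()
    }


def recognize(definitions, vector):
    """Return the character whose averaged definition is nearest to `vector`."""
    return min(definitions, key=lambda ch: math.dist(definitions[ch], vector))


# Two made-up fonts, two characters, two features each.
samples = [
    ("0", [4.0, 3.0]),
    ("0", [4.0, 5.0]),
    ("1", [1.0, 1.0]),
    ("1", [1.0, 3.0]),
]
definitions = build_definitions(samples)
print(definitions["0"])                      # [4.0, 4.0]
print(recognize(definitions, [3.5, 4.0]))    # 0
```

Averaging over many fonts is what makes the definitions generic: a character from an unseen font only needs to land closer to its own average than to any other.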
10 Character Recognition
For a single character
For words, dictionary and grammar references are used.
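The dictionary reference for words could work along these lines. This is a minimal sketch that uses Python's standard difflib module to replace a recognized word with its closest dictionary entry. The function name correct_word, the sample word list, and the 0.6 similarity cutoff are assumptions, and the grammar reference mentioned above is not covered.

```python
import difflib


def correct_word(raw_word, dictionary, cutoff=0.6):
    """Return the closest dictionary word, or raw_word if nothing is close.

    `dictionary` is a list of known words; `cutoff` is a hypothetical
    similarity threshold passed to difflib.get_close_matches.
    """
    if raw_word in dictionary:
        return raw_word
    matches = difflib.get_close_matches(raw_word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_word


words = ["recognition", "character", "system"]
# A typical OCR confusion: the letter "i" read as the digit "1".
print(correct_word("recogn1tion", words))  # recognition
```

A word with no sufficiently similar dictionary entry is returned unchanged, so proper nouns and unknown terms are not forced into a wrong match.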
11 Idocrase Application
12 Results
- Mediocre word recognition
- Doesn't handle formatting well
- Doesn't handle small letters well
- Fairly accurate single-character recognition (93.7)?