Title: Presentation for CBDAR 2005
1 Presentation for CBDAR 2005
Camera Document Restoration for OCR
Shijian Lu, Chew Lim Tan
School of Computing, National University of
Singapore
2- New Document Capture Method
- Traditionally, documents have been captured with
a document scanner. As sensor resolution has
increased in recent years, high-speed non-contact
text capture through a digital camera or hand
phone is becoming an alternative for document
capture and digitization.
- The Related Problems
- Unlike document images captured through a
document scanner, images captured by camera
generally contain two new types of distortion:
the perspective distortion introduced during the
capture process, and the geometric distortion
resulting from the non-flat document surface on
which the text lies. Both distortions must be
removed before OCR.
3 The sample images below show the two new types
of distortion
Images captured through a digital camera: the one
on the left contains only perspective distortion,
while the one on the right contains both
perspective and geometric distortions
4 Reported Methods
- Perspective Rectification
- 1. P. Clark, M. Mirmehdi
- Requires the document boundary, labeled (2) in
the figure on the previous page
- 2. C.R. Dance
- Requires the column boundary, labeled (1) in the
figure on the previous page
- Geometric Rectification
- M. S. Brown, W. B. Seales
- M. Pilu
- Both require auxiliary hardware for 3D
measurements
5- Document Image Recognition
- Text Line Segmentation
- Vertical Stroke Boundary (VSB) Identification
- Distortion Differentiation
- Skew Detection and Correction
- Perspective Distortion Detection and
Rectification
- Geometric Distortion Detection and
Rectification
- OCR Experimentation
6- Document Image Recognition
Text Line Extraction
Text line extraction is implemented through a
character tracing process, which assigns
characters to text lines based on the
point-to-point and point-to-line distance
constraints illustrated below
Character tracing process
7- Document Image Recognition
Text Line Extraction
With classified character centroids, a group of
straight lines or conics can be fitted. The
figures below show the fitted lines
Straight lines and conics fitted using classified
character centroids
8- Document Image Recognition
VSB Identification
- Stroke Boundary Extraction
(a)-(d) Four sets of structuring elements
customized for stroke boundary extraction
9- Document Image Recognition
VSB Identification
- Stroke Boundary Extraction
- Figures (e) and (f) give the left-side and
right-side stroke boundaries extracted using the
two equations below
(a) Distorted character "b"; (b)-(d) stroke
boundaries extracted using the structuring
elements given in (a) of the figure on the
previous page; (e)-(f) stroke boundaries
determined using the two equations on the left
Symbols ⊖ and ⊕ represent the erosion and XOR
operations, respectively
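The erosion-then-XOR idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the slide's own equations, which are not recoverable here: the two-pixel horizontal structuring elements and the names `erode`, `left_boundary`, and `right_boundary` are hypothetical. A foreground pixel survives erosion only if its left (or right) neighbour is also foreground, so XOR-ing the eroded image with the original keeps only the pixels on that side of each stroke.

```python
def erode(img, offsets):
    """Binary erosion: a pixel stays 1 only if every offset lands on a 1.

    Pixels outside the image are treated as background.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            keep = True
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w and img[rr][cc]):
                    keep = False
                    break
            out[r][c] = 1 if keep else 0
    return out


def xor(a, b):
    """Pixel-wise XOR of two binary images of equal size."""
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical structuring elements (assumption, not from the slides):
LEFT_SE = [(0, 0), (0, -1)]   # the pixel and its left neighbour
RIGHT_SE = [(0, 0), (0, 1)]   # the pixel and its right neighbour


def left_boundary(img):
    """Foreground pixels whose left neighbour is background."""
    return xor(img, erode(img, LEFT_SE))


def right_boundary(img):
    """Foreground pixels whose right neighbour is background."""
    return xor(img, erode(img, RIGHT_SE))


# A three-pixel-wide vertical stroke on a 3x5 binary image.
stroke = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
```

Here `left_boundary(stroke)` marks only column 1 and `right_boundary(stroke)` marks only column 3, which is the kind of one-sided boundary the size filter and fuzzy sets would then operate on.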
10- Document Image Recognition
VSB Identification
- Stroke Boundary Extraction
- To facilitate the identification of VSB, the
extracted stroke boundaries are first passed
through a size filter, which removes the small
stroke boundaries.
Stroke boundary extraction: the figure on the
left gives the binarized sample word; the one in
the middle shows the stroke boundaries extracted
from the left side of the character strokes; the
one on the right gives the labeled stroke
boundaries after size filtering
11- Document Image Recognition
VSB Identification
- Fuzzy Set Construction
- The desired VSB must be big, straight, and
properly posed. Two fuzzy sets characterizing
the size and linearity of the stroke boundaries
are first constructed to determine the VSB
candidates. The desired VSB are then identified
based on their pose. The size set is constructed
using Zadeh's S-function.
Parameters a and c are taken as one and two times
the average stroke boundary size. Parameter b
refers to the crossover point.
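As a hedged illustration, the size membership can be sketched with the standard textbook form of Zadeh's S-function, assuming the crossover point is the midpoint b = (a + c) / 2. The slide's own equation is not recoverable here, and the helper name `size_memberships` is hypothetical.

```python
def zadeh_s(x, a, c):
    """Standard Zadeh S-function rising from 0 at a to 1 at c.

    The crossover point b = (a + c) / 2 has membership 0.5.
    """
    if c <= a:
        raise ValueError("require a < c")
    b = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0


def size_memberships(sizes):
    """Size membership value of each stroke boundary.

    Following the text, a is the average boundary size and c is twice
    the average.
    """
    avg = sum(sizes) / len(sizes)
    return [zadeh_s(s, avg, 2.0 * avg) for s in sizes]
```

With this choice of a and c, a boundary no larger than the average gets membership 0, and one at least twice the average gets membership 1.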
12- Document Image Recognition
VSB Identification
Table 1: Constructed size sets (SMV: size
membership value)
- Fuzzy Set Construction
- For the labeled stroke boundaries of the sample
word "laboratory", the membership values of the
size set can be determined using Zadeh's
S-function.
13- Document Image Recognition
VSB Identification
- Fuzzy Set Construction
- The linearity set is constructed based on the
correlation coefficient of the least squares
method.
(xi, yi), i = 1, ..., n, refers to the ith
extracted boundary pixel. Parameters x̄ and ȳ
represent the average x and y coordinates of the
extracted boundary pixels.
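A minimal sketch of the linearity measure, assuming it is the absolute value of the usual Pearson correlation coefficient over the boundary pixels. Taking the absolute value and treating an exactly axis-aligned boundary as perfectly linear are assumptions of this sketch, since the correlation coefficient is undefined when all x (or all y) values are equal.

```python
import math


def linearity(pixels):
    """Absolute correlation coefficient of (x, y) boundary pixels."""
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels)
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels)
    syy = sum((y - mean_y) ** 2 for _, y in pixels)
    if sxx == 0 and syy == 0:
        raise ValueError("all pixels coincide")
    if sxx == 0 or syy == 0:
        # Assumption: an exactly vertical or horizontal run of pixels is
        # collinear, so it is given full linearity membership.
        return 1.0
    return abs(sxy) / math.sqrt(sxx * syy)
```

The degenerate case matters in practice because a vertical stroke boundary in an undistorted image has nearly constant x.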
14- Document Image Recognition
VSB Identification
Table 2: Constructed linearity sets (LMV:
linearity membership value)
- Fuzzy Set Construction
- For the labeled stroke boundaries of the sample
word "laboratory", the membership values of the
linearity set can be determined as the
correlation coefficient of the least squares
fitting method.
15- Document Image Recognition
VSB Identification
- VSB Candidate Determination
- The desired VSB must be big and straight compared
to other stroke boundaries. VSB candidates can
thus be determined by combining the constructed
size and linearity sets, which is carried out
using a fuzzy aggregator
S and L refer to the constructed size and
linearity sets. The compensation factor
indicates where the actual operator is located
between union and intersection. VSB candidates
can thus be determined based on the fact that
the number of VSB is generally half the number
of characters.
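The combination step can be sketched as follows, assuming the aggregator is the Zimmermann-Zysno compensatory ("gamma") operator, which matches the description of a compensation factor placing the result between intersection and union. The slide's own formula is not recoverable here, so this operator, the default `gamma=0.5`, and the helper `pick_candidates` are assumptions. Indices are 0-based.

```python
def gamma_aggregate(s, l, gamma):
    """Compensatory aggregation of size (s) and linearity (l) memberships.

    gamma = 0 gives the fuzzy intersection (algebraic product);
    gamma = 1 gives the fuzzy union (algebraic sum).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    intersection = s * l
    union = 1.0 - (1.0 - s) * (1.0 - l)
    return intersection ** (1.0 - gamma) * union ** gamma


def pick_candidates(size_mv, lin_mv, n_chars, gamma=0.5):
    """Indices of the n_chars // 2 boundaries with the highest score."""
    scores = [gamma_aggregate(s, l, gamma) for s, l in zip(size_mv, lin_mv)]
    k = max(1, n_chars // 2)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

For example, `pick_candidates([0.9, 0.1, 0.8, 0.2], [0.9, 0.9, 0.7, 0.1], n_chars=4)` keeps boundaries 0 and 2, the two that are both large and straight.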
16- Document Image Recognition
VSB Identification
- Fuzzy Set Construction
- For the labeled stroke boundaries of the sample
word "laboratory", the membership values of the
aggregation set can be determined using the
aggregator operation. Based on the fact that the
number of VSB is generally half the number of
characters, stroke boundaries 1, 6, 11, 19, and
20 are determined as the VSB candidates.
Table 3: Constructed aggregation sets (AMV:
aggregation membership value)
17- Document Image Recognition
VSB Identification
The desired VSB can be finally identified through
the use of pose value, which is determined as the
slope of the straight line fitted using the
determined VSB candidates. The pose expectation
for each determined VSB candidate can be
determined as
Parameter k represents the number of nearest
neighbors to the studied VSB candidate and it can
be taken as a number from 3 to 6.
18- Document Image Recognition
VSB Identification
- Pose value determination
- For the labeled stroke boundaries of the sample
word "laboratory", the pose property can be
determined as the slope of the straight line
fitted using the determined VSB candidates. Based
on the pose expectation, stroke boundary 20 is
rejected, as its pose value deviates far from
that of the neighboring VSB candidates.
Table 4: Calculated pose values (PV: pose value)
19- Document Image Recognition
VSB Identification
The table below gives an overview of the
proposed VSB identification process
20- Document Image Recognition
VSB Identification
- VSB Candidate Determination
- Based on the aggregation values given in the
table, stroke boundaries 1, 6, 11, 19, and 20 are
determined as VSB candidates in the following
example. VSB candidate 20 is then rejected
based on its pose property.
VSB identification process: the figure on the
left gives the labeled stroke boundaries; the one
in the middle shows the determined VSB
candidates; the one on the right gives the
identified VSB
21- Document Image Recognition
Distortion Differentiation
Document images with skew or perspective
distortions can be first differentiated from the
ones with geometric distortion based on the
linear fitting error, which can be evaluated
using the distance
where parameters n and m refer to the number of
fitted middle lines and the number of character
centroids within the ith classified character
centroid category. Parameter Li refers to the
middle line fitted to the character centroids
within the ith category. Function Dist(Cj, Li)
calculates the distance between Li and the jth
character centroid Cj within the ith category.
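A rough sketch of the linear fitting error: fit a least-squares line to each category of character centroids and accumulate the point-to-line distances. Averaging over all centroids is an assumed normalization, since the slide's equation is not recoverable here; `fit_line` and `fitting_error` are hypothetical names, and `dist` stands in for Dist(Cj, Li).

```python
import math


def fit_line(points):
    """Least-squares line y = a * x + b through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("cannot fit y = a*x + b to a vertical line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def dist(point, line):
    """Perpendicular distance from a point to the line y = a * x + b."""
    a, b = line
    x, y = point
    return abs(a * x - y + b) / math.hypot(a, 1.0)


def fitting_error(categories):
    """Mean centroid-to-line distance over all text lines (assumed form)."""
    total, count = 0.0, 0
    for centroids in categories:
        line = fit_line(centroids)
        total += sum(dist(c, line) for c in centroids)
        count += len(centroids)
    return total / count
```

Centroids lying on straight text lines give an error near zero, while centroids following a curved text line give a clearly larger value, which is what separates geometric distortion from skew and perspective.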
22- Document Image Recognition
Distortion Differentiation
Skew and perspective distortions can be further
differentiated from each other based on relative
orientation of the fitted middle lines, which can
be evaluated with the following equation
where parameter n refers to the number of fitted
middle lines. Parameter θi refers to the
orientation angle of the ith fitted middle line.
For skewed document images, the relative
orientation RO is close to zero, whereas for
perspective document images RO is normally
much bigger.
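The slide's RO equation is not recoverable here, so the following is only an assumed stand-in with the behaviour described: the mean absolute deviation of the line orientation angles from their average. Parallel text lines (pure skew) give a value near zero, and converging lines (perspective) give a larger one.

```python
import math


def relative_orientation(slopes):
    """Mean absolute deviation of line angles, in radians (assumed form).

    slopes: slope of each fitted middle line.
    """
    angles = [math.atan(a) for a in slopes]
    mean = sum(angles) / len(angles)
    return sum(abs(t - mean) for t in angles) / len(angles)
```

A threshold on this value would then decide between the skew and perspective branches; the threshold itself is not given in the source.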
23- Document Image Recognition
Skew Detection and Correction
Skew distortion can be removed simply, based on
text line information. The skew angle can be
estimated from the slope of the middle line
fitted to each text line
Parameters x and y are the coordinates of the
character centroids within the i-th set, and
x̄ and ȳ are the averages of the x and y
coordinates.
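A minimal sketch of the skew estimate, assuming the usual least-squares slope Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² converted to an angle. The function name is hypothetical, and the sign convention depends on whether the y axis points up or down in the image.

```python
import math


def skew_angle_degrees(centroids):
    """Angle of the least-squares line through character centroids."""
    n = len(centroids)
    if n < 2:
        raise ValueError("need at least two centroids")
    mean_x = sum(x for x, _ in centroids) / n
    mean_y = sum(y for _, y in centroids) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in centroids)
    sxx = sum((x - mean_x) ** 2 for x, _ in centroids)
    # atan2 keeps the result in [-90, 90] and tolerates a vertical line.
    return math.degrees(math.atan2(sxy, sxx))
```

Because the fitted slope alone cannot tell a line from its 180-degree rotation, an estimate of this kind cannot detect upside-down text on its own.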
24- Document Image Recognition
Skew Detection and Correction
We propose to exploit character eigen-points to
detect the upside-down situation, where the skew
angle is bigger than 90 degrees or smaller than
-90 degrees. For each character, eigen-points are
defined as the highest and lowest points in the
direction orthogonal to the orientation of the
middle line fitted to the text line.
Detected character eigen-points
25- Document Image Recognition
Skew Detection and Correction
The orientation of text lines can thus be
determined based on the fact that the number of
ascenders is much bigger than that of descenders
for Roman letters. The top line and base line can
also be fitted using the eigen-points of
characters with no ascender or descender.
Detected character ascenders and descenders, and
the top line and base line fitted using the
eigen-points of characters with no ascender or
descender
26- Document Image Recognition
Skew Detection and Correction
Document images with upside down skew can thus be
detected and restored
27- Document Image Recognition
Perspective Distortion Detection and Correction
We propose to correct the perspective distortion
by constructing a quadrilateral correspondence.
The source quadrilateral is determined using the
top line, the base line, and the identified VSB.
The target rectangle is determined based on the
number of characters enclosed within the source
quadrilateral and an approximate character
aspect ratio of 1:1.
Homography determination: the figure on the left
gives the quadrilateral determined based on the
top line and base line; the one on the right
gives the estimated target rectangle
28- Document Image Recognition
Perspective Distortion Detection and Correction
With four point correspondences, the homography
can be estimated using the following equation
where the four point correspondences are given in
the figure on the previous page
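As an illustration, a homography can be recovered from four point correspondences by solving an 8x8 linear system. This sketch uses the standard direct linear formulation with the last entry of the matrix fixed to 1, which is an assumption and not necessarily the equation on the slide; the function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and similarly for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a point through H using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical source quadrilateral and target rectangle.
quad = [(0, 0), (4, 1), (5, 6), (-1, 4)]
rect = [(0, 0), (10, 0), (10, 5), (0, 5)]
H = homography(quad, rect)
```

Applying `H` to each vertex of `quad` reproduces the matching vertex of `rect` up to rounding error; rectifying an image would additionally require resampling every pixel through the inverse mapping.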
29- Document Image Recognition
Perspective Distortion Detection and Correction
Multiple homographies can thus be determined. The
best one minimizes the following distance
where m is the number of detected text lines and
n is the number of identified VSB. Sli is the
orientation of the ith restored text line and
Savg is the average orientation. ptxj and pbxj
are the x coordinates of the two vertices of the
jth restored VSB, and the component
abs((ptxj - pbxj) / Distavg) is the normalized
horizontal distance between the vertices of that
VSB.
30- Document Image Recognition
Perspective Distortion Detection and Correction
With the optimal homography, camera documents
with perspective distortion can be rectified.
Distorted and corrected camera document
31- Document Image Recognition
Geometric Distortion Detection and Correction
With the detected top line and base line and the
identified VSB, camera documents with geometric
distortion can be segmented into multiple smaller
patches.
(a) Fitted top line and base line and identified
VSB; (b)-(c) VSB processing; (d) segmentation of
the distorted sample word
32- Document Image Recognition
Geometric Distortion Detection and Correction
Based on features including character span,
character ascenders and descenders, and character
intersection numbers, characters can be
classified into 6 categories with 6 different
width-height ratios
Table 6: The classification of characters and
their width-height ratios
33- Document Image Recognition
Geometric Distortion Detection and Correction
Based on the aspect ratios Ri, the width of the
target rectangle can thus be determined as
where VSBavg represents the average size of the
identified VSB and parameter n represents the
number of characters and inter-word blanks
enclosed within the partitioned image patch.
34- Document Image Recognition
Geometric Distortion Detection and Correction
For each segmented image patch, a target
rectangle can be restored. Based on the vertices
of the corresponding quadrilateral and rectangle,
a rectification homography can be estimated for
each segmented patch, and camera documents with
geometric distortions can be rectified patch by
patch.
The segmentation of text lines with geometric
distortion
35- Document Image Recognition
Geometric Distortion Detection and Correction
The figure below gives the segmentation of camera
document with geometric distortion
Camera document sample and the segmentation result
36- Document Image Recognition
Geometric Distortion Detection and Correction
The figure below gives the restored target
rectangles and the corrected document image
Restored target rectangles and rectified document
37 Restored text images are then input to an OCR
engine. 150 sample images with skew, perspective
and geometric distortions are tested.
38- Conclusion-Hardware Outlook
- Document processing may be embedded into future
mobile devices.
- This is possible as the resolution of camera
sensors on mobile phones, PDAs, and webcams has
greatly improved.
- Some dedicated mobile document analysis chips
have been designed and are available on the
market.
- Camera documents may be analyzed and recognized
through a menu command or a simple button click.
39- Conclusion-Research Direction
- For analysis and understanding of documents
captured by mobile devices:
- Skew, perspective and geometric (non-flat
document) distortions are inevitable in real
applications, as images are often captured in a
hurry. Therefore, restoration is always required.
- One research direction is to propose a
recognition algorithm that tolerates distortion,
bypassing the restoration process.
40- Conclusion-Software Embedding
- Currently, the proposed technique works well on
a desktop computer. It has the potential to be
embedded into mobile devices:
- The restoration is fairly fast: the process
takes around 2-4 seconds for 640×480 document
images. The speed can be further improved through
code optimization.
- The proposed method requires only a single
document image captured by the mobile device.
There is no large memory requirement.
41- Future Outlook
- With the improved resolution of webcams, we
envisage rapid remote document capture and text
dissemination through webcams, for subsequent
information processing.
- With the availability of mobile document chips
on the market, document restoration and
recognition algorithms can be adapted and
embedded into mobile devices such as mobile
phones and PDAs.