Omnipage Pro - PowerPoint PPT Presentation

About This Presentation
Title:

Omnipage Pro

Description:

B/W Scanning. Gray Scanning. Color Scanning. Load from image ... Photo. Text recognition. User assisted correction. Result exportation. 8/29/09. Recosoft Ltd ... – PowerPoint PPT presentation

Slides: 43
Provided by: RECO6

Transcript and Presenter's Notes


1
Omnipage Pro
SSIP 2002, Budapest
Internal Structure of the Character Recognition Engine used inside Omnipage Pro
  • Dr. István Marosi
  • Recosoft Ltd., Hungary

2
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
    • Text recognition
    • User-assisted correction
    • Result exportation

3
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
      • Get image
        • B/W scanning
        • Gray scanning
        • Color scanning
        • Load from image file
      • Preprocess image
    • Layout recognition
    • Text recognition
    • User-assisted correction
    • Result exportation

4
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
      • Get image
      • Preprocess image
        • Color separation
        • Thresholding
        • Despeckling
        • Rotation
        • Deskewing
    • Layout recognition
    • Text recognition
    • User-assisted correction
    • Result exportation
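
Two of the preprocessing steps listed above, thresholding and despeckling, can be sketched in a few lines. This is only an illustration, not the method used in OmniPage: the fixed cutoff of 128, the "no ink among the 8 neighbours" speckle rule, and the function names are all assumptions made for the example.

```python
def threshold(gray, cutoff=128):
    """Map a grayscale image (rows of 0-255 ints) to binary: 1 = ink, 0 = paper.
    The fixed cutoff is an assumption; real systems often adapt it per image."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def despeckle(binary):
    """Clear every ink pixel that has no ink among its 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0  # isolated pixel: treat it as scanner noise
    return out


# A 2x2 dark blob (kept) and one isolated dark pixel in the corner (removed).
gray = [
    [250, 250, 250, 250, 250, 250],
    [250,  20,  30, 250, 250, 250],
    [250,  25,  15, 250, 250, 250],
    [250, 250, 250, 250, 250, 250],
    [250, 250, 250, 250, 250,  40],
]
binary = threshold(gray)
clean = despeckle(binary)
for row in clean:
    print("".join("#" if px else "." for px in row))
```

Despeckling this aggressively also removes legitimate single-pixel marks, which is one way preprocessing can turn whole characters into broken ones.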

5
The Preprocessed Image
Joined chars
7
The Preprocessed Image
Broken chars
9
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
      • Text zones
        • Columns of flowed text
        • Tables
        • Inverse text
      • Graphic zones
    • Text recognition
    • User-assisted correction
    • Result exportation

10
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
      • Text zones
      • Graphic zones
        • Line art
        • Photo
    • Text recognition
    • User-assisted correction
    • Result exportation

11
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
    • Text recognition
      • ... Let's do it when the marketing stuff is over ...
    • User-assisted correction
    • Result exportation

12
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
    • Text recognition
    • User-assisted correction
      • By the user's random editing...
        • Pop-up verifier
        • Manual training
      • By proofreading of doubtful words
    • Result exportation

13
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
    • Text recognition
    • User-assisted correction
      • By the user's random editing...
      • By proofreading of doubtful words
        • Correct: User dictionary
        • Changed: IntelliTrain
          • Remember trained characters
          • Apply them on following pages
    • Result exportation

14
IntelliTrain
  • Recognized word: sorneUüng

15
IntelliTrain
  • Recognized word: sorneUüng
  • Fixed word: something

17
IntelliTrain
  • Recognized word: sorneUüng
  • Fixed word: something
  • Substitutions found:
    • m → rn
    • thi → Uü

18
IntelliTrain
  • Recognized word: sorneUüng
  • Fixed word: something
  • Substitutions found:
    • m → rn
    • thi → Uü
  • Perform automatically
    • Learn image pattern and substitution info
    • Find similar substituted (blue) text on the current page
    • Match against pattern of substitution and correct
    • Find such errors on following pages, too
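
The "substitutions found" step can be illustrated on the text level with Python's standard `difflib`. This is a sketch under assumptions, not IntelliTrain itself: the slides say the real feature also learns the image pattern, which is ignored here, and the helper names `find_substitutions` and `apply_substitutions` are made up for the example.

```python
from difflib import SequenceMatcher


def find_substitutions(fixed, recognized):
    """Return (correct, misread) pairs where the OCR output differs from the fix."""
    matcher = SequenceMatcher(None, fixed, recognized, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((fixed[i1:i2], recognized[j1:j2]))
    return pairs


def apply_substitutions(word, pairs):
    """Undo known misreadings in a doubtful word (text only, no image check)."""
    for correct, misread in pairs:
        word = word.replace(misread, correct)
    return word


pairs = find_substitutions("something", "sorneUüng")
print(pairs)                               # [('m', 'rn'), ('thi', 'Uü')]
print(apply_substitutions("Uürd", pairs))  # third
```

Applying such pairs blindly would also damage correct words that contain "rn", which is presumably why the slides pair the substitution with a learned image pattern and only touch matching text.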

19
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
    • Text recognition
    • User-assisted correction
    • Result exportation
      • Combine pages into a document
        • Header / footer recognition
        • Page numbers
        • Hyperlinks (e.g. See Table 20)
      • Save results
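
As a small illustration of the hyperlink item above, cross-references such as "See Table 20" can be located with a regular expression. The pattern, the set of target kinds (Table, Figure, Section) and the function name are assumptions for this sketch; the slides do not say how OmniPage detects them.

```python
import re

# Hypothetical pattern: "see"/"See", then a target kind, then a number.
CROSS_REF = re.compile(r"\b[Ss]ee\s+(Table|Figure|Section)\s+(\d+)")


def find_cross_references(text):
    """Return (kind, number) for every 'See Table 20'-style reference in text."""
    return [(m.group(1), int(m.group(2))) for m in CROSS_REF.finditer(text)]


page = "Results are listed below (see Table 20). See Figure 3 for the layout."
refs = find_cross_references(page)
print(refs)  # [('Table', 20), ('Figure', 3)]
```

Each hit could then be linked to the matching table or figure once all pages have been combined into one document.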

20
Some Marketing talk
  • Main tasks of an OCR system
    • Image acquisition
    • Layout recognition
    • Text recognition
    • User-assisted correction
    • Result exportation
      • Combine pages into a document
      • Save results
        • doc file
        • e-mail
        • Speech synthesizer

21
OP11 Internals
  • Text recognition in ScanSoft's OP11
    • OCR engines available
      • Caere's engine (codename Salt & Pepper)
      • Recognita's engine (codename Paprika)

22
OP11 Internals
  • Text recognition in ScanSoft's OP11
    • OCR engines available
      • Caere's engine (Salt & Pepper)
        • Uses a matrix-matching based algorithm
        • Feature set: 40 cells of an 8x5 grid
        • Good overall description of a shape
        • Weaker at detailed structure
      • Recognita's engine (Paprika)
        • Uses a contour-tracing based algorithm
        • Feature set: convex and concave arcs on the contour
        • Good detailed description of a shape
        • Weaker at overall structure
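
A grid feature set like the "40 cells of an 8x5 grid" can be sketched as follows. Several details are assumptions, since the slide gives only the grid size: the grid is taken as 8 rows by 5 columns, each feature is the ink density of one cell, and templates are compared with a plain sum of absolute differences. None of this is claimed to be the Salt & Pepper implementation.

```python
def grid_features(glyph, rows=8, cols=5):
    """Return rows*cols ink densities (0.0 to 1.0), one per cell, row-major."""
    h, w = len(glyph), len(glyph[0])
    features = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [glyph[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell) if cell else 0.0)
    return features


def distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


# A 16x10 binary "L": two ink columns on the left, two ink rows at the bottom.
glyph = [[1 if x < 2 or y >= 14 else 0 for x in range(10)] for y in range(16)]
features = grid_features(glyph)
print(len(features))                 # 40
print(features[:5])                  # [1.0, 0.0, 0.0, 0.0, 0.0]
print(distance(features, features))  # 0.0
```

Because each cell averages over its pixels, small details such as a thin serif barely change the vector, which fits the slide's remark that this description is good overall but weaker at detailed structure.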

23
OP11 Internals
  • Text recognition in ScanSoft's OP11
    • OCR engines available
      • Caere's engine (Salt & Pepper)
      • Recognita's engine (Paprika)
    • Segmentation algorithms

24
Segmentation
Which pixel groups belong to a single letter?
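
The simplest answer to that question is to treat every connected group of ink pixels as one letter. The sketch below labels connected components with a breadth-first flood fill; the use of 8-connectivity and the function name are assumptions for illustration only.

```python
from collections import deque


def connected_components(rows):
    """Group '#' pixels into 8-connected components, in scan order."""
    h, w = len(rows), len(rows[0])
    seen = set()
    components = []
    for y in range(h):
        for x in range(w):
            if rows[y][x] != "#" or (y, x) in seen:
                continue
            queue = deque([(y, x)])
            seen.add((y, x))
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                # Visit the 3x3 neighbourhood, clipped at the image border.
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if rows[ny][nx] == "#" and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(pixels)
    return components


# A solid blob on the left, and a dot above a short stem on the right.
image = [
    "##..#",
    "##...",
    "....#",
    "....#",
]
sizes = [len(c) for c in connected_components(image)]
print(sizes)  # [4, 1, 2]
```

The dot and the stem come out as two separate components even though together they may form one letter such as "i", and two touching letters would come out as a single component. These are the broken and joined characters shown earlier.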
30
OP11 Internals
  • Text recognition in ScanSoft's OP11
    • OCR engines available
      • Caere's engine (Salt & Pepper)
      • Recognita's engine (Paprika)
    • Segmentation algorithms
    • Developed by independent groups
    • Have different strengths and weaknesses

31
OP11 Internals
  • Text recognition in ScanSoft's OP11
    • OCR engines available
      • Caere's engine (Salt & Pepper)
      • Recognita's engine (Paprika)
    • Segmentation algorithms
    • Conclusion
      • They are complementary
      • Let's create a voting system

32
OP11 Internals
  • Voting strategies
    • External "Black box" voting

[Diagram labels: Image, Paprika, Salt & Pepper, Txt 2, Txt 1, Vote?, Final Txt]
33
OP11 Internals
  • Voting strategies
    • External "Black box" voting

[Diagram labels: Image, Paprika, Salt & Pepper, Txt 2, Txt 1, Dict, Vote, Final Txt]
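
External "black box" voting with a dictionary can be sketched as a word-by-word choice between the two engines' outputs. The rule below (take agreeing words, otherwise prefer the word found in the dictionary, otherwise fall back to the first engine) and the assumption that both outputs are already aligned word for word are illustrative only; the slides do not describe the actual voting rule.

```python
def vote(words_a, words_b, dictionary):
    """Pick one word per position from two equally long engine outputs."""
    final = []
    for a, b in zip(words_a, words_b):
        a_known = a.lower() in dictionary
        b_known = b.lower() in dictionary
        if a == b:
            final.append(a)   # engines agree
        elif b_known and not a_known:
            final.append(b)   # only the second engine produced a real word
        else:
            final.append(a)   # first engine wins, including ties
    return final


dictionary = {"the", "quick", "brown", "fox"}
txt1 = "the quick brovvn fox".split()   # hypothetical output of one engine
txt2 = "tbe quick brown fox".split()    # hypothetical output of the other
final_txt = vote(txt1, txt2, dictionary)
print(" ".join(final_txt))  # the quick brown fox
```

In this sketch the voter sees only finished text, never the character shapes or confidences inside either engine, which is what makes it "black box" voting.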
34
OP11 Internals
  • Voting strategies
    • External "Black box" voting: 15% gain

[Diagram labels: Image, Paprika, Salt & Pepper, Txt 2, Txt 1, Dict, Vote, Final Txt]
35
OP11 Internals
  • Voting strategies
    • External "Black box" voting
    • Internal "Shape" voting

[Diagram labels: Image, Salt & Pepper, Paprika, Txt 1, Txt 2, Dict, Bronze, Final Txt]
36
OP11 Internals
  • Paprika
    • Original segmentation
      • Every independent connected component is a character
      • Good segmentation: recognize
      • Bad segmentation: reject

[Diagram labels: Image, Recognize original segmentation, K.B.]
37
OP11 Internals
  • Paprika

[Diagram labels: Image, Recognize original segmentation, K.B., Train adaptive classifier from original shapes, Txt 1, Adaptive K.B.]
38
OP11 Internals
  • Paprika
  • Try several segmentations
  • Loop if unrecognizable

[Diagram labels: Image, Recognize original segmentation, K.B., Train adaptive classifier from original shapes, Txt 1, Recognize broken and joined shapes, Adaptive K.B.]
39
OP11 Internals
  • Paprika

[Diagram labels: Image, Recognize original segmentation, K.B., Train adaptive classifier from original shapes, Txt 1, Recognize broken and joined shapes, Adaptive K.B., Train adaptive classifier from ugly shapes]
40
OP11 Internals
  • Paprika
  • Try several segmentations
  • Loop if unrecognizable

[Diagram labels: Image, Recognize original segmentation, K.B., Train adaptive classifier from original shapes, Txt 1, Recognize broken and joined shapes, Adaptive K.B., Train adaptive classifier from ugly shapes, Recognize more broken and joined shapes, Txt 2]
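
The "try several segmentations, loop if unrecognizable" idea can be sketched for broken characters as a search over ways to merge neighbouring pieces. Everything here is hypothetical: pieces are represented by plain names instead of pixel shapes, the knowledge base is a dictionary lookup, and `max_merge` limits how many pieces one character may consist of. It shows the control flow only, not Paprika's classifier.

```python
def recognize(pieces, knowledge_base, max_merge=3):
    """Return the recognized text, or None if every segmentation is rejected."""
    if not pieces:
        return ""
    for k in range(1, min(max_merge, len(pieces)) + 1):
        char = knowledge_base.get(tuple(pieces[:k]))
        if char is None:
            continue  # this grouping is rejected, try a different one
        rest = recognize(pieces[k:], knowledge_base, max_merge)
        if rest is not None:
            return char + rest
    return None  # no grouping of the leading pieces led to a full reading


# Toy knowledge base: a lone dot is not a letter, a dot plus a stem is an "i".
kb = {("dot", "stem"): "i", ("stem",): "l", ("bowl",): "o"}
print(recognize(["dot", "stem", "bowl"], kb))  # io
print(recognize(["stem", "bowl"], kb))         # lo
print(recognize(["dot", "dot"], kb))           # None
```

In the flow on the slides, shapes recognized with the original segmentation are first used to train the adaptive classifier, so later passes over broken and joined shapes would consult a richer knowledge base than the fixed one used in this sketch.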
41
OP11 Internals
  • Voting strategies
    • 45% gain

[Diagram labels: Image, Salt & Pepper, Paprika, Txt 1, Txt 2, Dict, Bronze, Final Txt]
42
OP12
  • Voting strategies
    • 20% gain

[Diagram labels: Image, Fireworx, Salt & Pepper, Paprika, Txt 1A, Txt 1B, Txt 2, Dict, Bronze, Final Txt]