Character Recognition Experimental Design - PowerPoint PPT Presentation

Provided by: michaels75

1
Character Recognition Experimental Design
  • Michael S. Spiegel
  • DePauw University

2
Techniques of the Past

3
Generalized Character Recognition
  • Alphabet is created (2 common ways)
  • Character samples from human writers
  • Expert intelligence about the letterforms
  • A character is drawn
  • Character compared against alphabet
  • CR system returns the letter represented by the
    most equivalent alphabet character
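
The comparison step above can be sketched as a nearest-template lookup. This is only an illustration: the slides do not say how characters are represented or compared, so the fixed-length feature vectors, the Euclidean `distance`, and the `recognize` helper are all assumptions.

```python
import math

# Assumption: each character is a fixed-length list of floats (for example,
# resampled pen coordinates). The slides do not specify a representation.
def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(drawn, alphabet):
    """Return the label of the alphabet entry closest to the drawn character.

    alphabet: list of (label, feature_vector) pairs.
    """
    label, _ = min(alphabet, key=lambda entry: distance(drawn, entry[1]))
    return label

# Hypothetical two-letter alphabet with made-up feature vectors.
alphabet = [("a", [0.0, 0.0, 1.0, 1.0]), ("b", [1.0, 0.0, 0.0, 1.0])]
result = recognize([0.9, 0.1, 0.1, 0.8], alphabet)
print(result)
```

A larger alphabet (several templates per letter) fits the same function: add more `(label, vector)` pairs that share a label.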

4
Types of CR
  • On-line
    • e.g., Palm OS
  • Alphabet Independent
    • General pattern recognizer
  • User Dependent
  • Off-line
    • e.g., OCR
  • Alphabet Dependent
    • e.g., Sanskrit or English recognition
  • User Independent

5
Testing a CR System
  • Alphabet and test data obtained from
    • Private collection study
      • Collection style geared toward specifics of
        the CR system (advantage)
      • Yields apples-to-oranges comparisons between
        system results (disadvantage)
    • Existing data corpus
      • At the mercy of the collection style
        (disadvantage)
      • Affords reliable comparisons between system
        results (advantage)

6
Data Corpora

7
Testing a CR System
  • Collections often approximate either
    • Language letter frequencies (alphabet and
      language dependent)
    • Realistic writing conditions (email or text)
  • Difficult to have both in one collection

8
Testing a CR System
  • Data set is often divided into
    • Training set / alphabet
    • Test set
  • Each letter/case must exist at least twice in a
    data set
  • Frequencies plus a 2-count minimum usually manifest
    in a single static training set!

9
Problem Statement
  • Design a collection and experiment methodology
    that provides for
    • User-independent (and therefore user-dependent)
      systems
    • Alphabet and language independence
    • Character and traditional letter frequency
      analysis

10
Our Collection Method
  • Requires 30 instances of each character
  • Upper- and lower-case
  • Training data is not included in test data
  • More instances needed
  • Testing alphabet sizes 1 to 3
  • 30 + 3 = 33 instances of each character of each
    case from each participant needed for statistical
    significance
  • 33 × 26 × 2 = 1716 characters per user!
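
A minimal sketch of how the 33 instances of one character could be split into a random alphabet and a disjoint test set. The numbers 30, 3, 26 and 2 come from the slide; the `split_instances` helper and the use of a shuffle are assumptions, not the author's stated procedure.

```python
import random

TEST_SIZE = 30          # test instances per character and case (from the slide)
MAX_ALPHABET_SIZE = 3   # largest alphabet size tested (from the slide)

def split_instances(instances, alphabet_size, rng):
    """Pick a random alphabet of `alphabet_size` samples and a disjoint test set."""
    if not 1 <= alphabet_size <= MAX_ALPHABET_SIZE:
        raise ValueError("alphabet size must be between 1 and 3")
    if len(instances) < TEST_SIZE + alphabet_size:
        raise ValueError("not enough instances for a disjoint split")
    shuffled = list(instances)
    rng.shuffle(shuffled)
    alphabet = shuffled[:alphabet_size]                       # training templates
    test = shuffled[alphabet_size:alphabet_size + TEST_SIZE]  # never overlaps alphabet
    return alphabet, test

instances_per_char = TEST_SIZE + MAX_ALPHABET_SIZE   # 30 + 3 = 33
chars_per_user = instances_per_char * 26 * 2         # 26 letters, two cases

rng = random.Random(0)
samples = [f"sample_{i}" for i in range(instances_per_char)]  # placeholder sample IDs
alphabet, test = split_instances(samples, 3, rng)
print(len(alphabet), len(test), chars_per_user)
```

With an alphabet of size 3 the split uses all 33 instances; with size 1 or 2, one or two instances are left unused in that draw.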

11
To Apply Letter Frequency
  • 33 instances of each character
  • q = 33 implies e = 1878
  • Too much writing!!
  • 16,820 total characters
  • Extremely difficult to collect
  • Write for an entire day
  • Conduct multiple testing sessions per participant
  • (Oxford Dictionary)
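
The scaling above can be sketched as follows: the rarest letter gets the 33-instance minimum and every other letter is scaled by its frequency relative to the rarest. The `required_counts` helper and the rounding up are assumptions. The ratio 56.9 for "e" is not quoted from the frequency table; it is back-derived from the slide's own figures (1878 / 33).

```python
import math

MIN_INSTANCES = 33  # instances required of the rarest letter (from the slide)

def required_counts(relative_freq, min_instances=MIN_INSTANCES):
    """Scale per-letter counts so the rarest letter gets `min_instances`.

    relative_freq: letter -> frequency in any consistent unit.
    Counts are rounded up (an assumption) so no letter falls short.
    """
    rarest = min(relative_freq.values())
    return {
        letter: math.ceil(min_instances * freq / rarest)
        for letter, freq in relative_freq.items()
    }

# Assumption: frequencies expressed relative to q; the value for "e" is
# back-derived from the slide (1878 / 33), not taken from a published table.
ratios = {"q": 1.0, "e": 56.9}
counts = required_counts(ratios)
print(counts)
```

Passing a full 26-letter table to the same function and summing the values gives the total writing load per case.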

12
Letter Frequency Weighting
  • Sample error: 27.4%
  • Frequency-weighted error: 10.1%
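
One plausible reading of the two figures above: the sample error counts every collected character equally, while the frequency-weighted error weights each letter's error rate by how often that letter occurs in the language. The sketch below assumes that reading; the function names and the toy numbers are hypothetical and are not the study's data.

```python
def sample_error(errors, totals):
    """Unweighted error: misrecognized samples over all samples."""
    return sum(errors.values()) / sum(totals.values())

def frequency_weighted_error(errors, totals, freq):
    """Average of per-letter error rates, weighted by letter frequency."""
    weight_sum = sum(freq[c] for c in totals)
    weighted = sum(freq[c] * errors[c] / totals[c] for c in totals)
    return weighted / weight_sum

# Hypothetical toy data: a common letter recognized well, a rare one poorly.
totals = {"e": 30, "q": 30}   # test samples per letter
errors = {"e": 3, "q": 15}    # misrecognized samples per letter
freq = {"e": 9.0, "q": 1.0}   # made-up relative frequencies

print(sample_error(errors, totals))
print(frequency_weighted_error(errors, totals, freq))
```

In this toy case the weighted error is lower than the sample error because the errors are concentrated in the rare letter.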

13
Varied Size of Alphabet
  • Chose multiple random alphabets of each size per
    user
  • Alphabets of size
    • 1: 33^26 ≈ 3.0e39
    • 2: (33 × 32)^26 ≈ 4.1e78
    • 3: (33 × 32 × 31)^26 ≈ 2.5e117
  • Chose 900 examples to reach 99% confidence that the
    user's mean accuracy is within 0.086
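
The counts and the confidence figure above can be checked with a short sketch. Two assumptions are made here. First, the number of alphabets is the count of ordered selections per letter raised to the 26th power, which mirrors the slide's products. Second, the 0.086 is read as the full width of a worst-case (p = 0.5) normal-approximation interval; the slides do not state which interval was used.

```python
import math

def alphabet_count(instances=33, size=1, letters=26):
    """Ordered ways to pick `size` of `instances` samples, independently per letter."""
    return math.perm(instances, size) ** letters

def interval_width(n, z=2.576):
    """Full width of a worst-case (p = 0.5) normal-approximation interval.

    z = 2.576 is the two-sided 99% critical value of the standard normal.
    """
    return 2 * z * math.sqrt(0.25 / n)

for size in (1, 2, 3):
    print(size, f"{alphabet_count(size=size):.1e}")
print(round(interval_width(900), 3))
```

Under these assumptions 900 sampled alphabets reproduce the slide's 0.086, and the alphabet counts show why exhaustive testing is out of reach and random sampling is used instead.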

14
Example In Action
  • A previous Graffiti study found a user to have
    97.5% accuracy with
  • 1 alphabet containing 3 of each letter, and one
    test run
  • Same user with our new method

15
Conclusions
  • On-line
  • User-dependent
  • Alphabet-independent
  • Novel approach to satisfy these things
  • 60 individuals tested
  • 33 instances per character/case
  • Alphabet sizes 1 to 3

16
Questions?