1
Math Literate Computers
  • Dorothea Blostein
  • School of Computing, Queen's University
  • CICM 2009

Math Literacy: The ability to read and write math
notation.
2
In people, understanding precedes literacy.
Computers are fairly literate, but with shallow
understanding. People learn to read before they
learn to write. Computers are better at
writing than reading.
Math literacy relates to literacy in other
diagram notations: two-dimensional,
domain-specific, natural languages.
3
  • Math Notation - A Tool to Support Reasoning
  • Evolved over centuries
  • Additional notation is invented as needed
  • Many dialects

Goal: Smooth conversion between paper and
electronic documents.
Freedom to think with paper and pencil. Computer
support for typesetting, search, automated
reasoning.
Four Color Theorem, Appel and Haken, 1976
4
Topics
  • Notational Conventions
      • The mapping between information and ink.
  • What is Math Notation, anyway?
      • Semi-standardized. Not formally defined.
  • Approaches to Recognizing Math Notation
  • User Interface Issues
      • People think about meaning, not about ink
        marks.

5
Notational conventions map between information
and ink.
Writing (Generation)
Reading (Recognition)
6
Writing (Generation)
Conventions geared toward generation
Conventions geared toward recognition
Reading (Recognition)
7
Many Diagrams Represent the Same Information
Hard conventions: how to encode information.
Soft conventions: how to make it readable.
8
Topics
  • Notational Conventions
      • The mapping between information and ink.
  • What is Math Notation, anyway?
      • Semi-standardized. Not formally defined.
  • Approaches to Recognizing Math Notation
  • User Interface Issues
      • People think about meaning, not about ink
        marks.

9
Sources of Information about Math Notation
  • Sample Documents: Math notation is defined by
    use in society. Introspection.
  • Written Descriptions
      • Geared toward manual typesetting: Chaundy,
        Barrett, Batey, The Printing of Mathematics,
        1957. Wick, Rules for Typesetting
        Mathematics, 1965. Higham, Handbook of
        Writing for the Math. Sciences, 1993.
        By example. People use their judgment.
      • Geared toward computational typesetting:
        Knuth, Mathematical Typography, Bulletin of
        the AMS, 1979.
  • Program Code: for recognizing and generating
    math notation.
  • Recognition Contests: define datasets and
    evaluation metrics. Contests at ICDAR and GREC:
    arc segmentation, symbol recognition, segmenting
    text and graphics, raster to vector conversion,
    signature verification, document binarization,
    page segmentation.
10
  • Statistics about Math Notation: An Example

Wang and Faure, ICPR 1988
Spatial relations for pairs of bounding boxes.
Ambiguity due to unknown baseline.
Gather statistics from training data. Almost
matches human performance in labeling bounding
boxes.
Top labels: most likely, based on statistics.
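
The idea of labeling a pair of bounding boxes from training statistics can be sketched as follows. This is a minimal illustration, not the method from the cited paper: the single feature (vertical offset of the right box relative to the left box's height), the five bins, and the names Box, offset_bin, train and rank_labels are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box; y grows upward in this sketch."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def offset_bin(left: Box, right: Box, bins: int = 5) -> int:
    """Quantize the right box's vertical-center offset, measured in left-box heights."""
    left_height = left.ymax - left.ymin  # assumed to be positive
    left_center = (left.ymin + left.ymax) / 2
    right_center = (right.ymin + right.ymax) / 2
    offset = (right_center - left_center) / left_height
    clipped = max(-1.0, min(1.0, offset))
    return min(bins - 1, int((clipped + 1.0) / 2.0 * bins))


def train(samples):
    """Count how often each relation label occurs in each offset bin."""
    counts = defaultdict(Counter)
    for left, right, label in samples:
        counts[offset_bin(left, right)][label] += 1
    return counts


def rank_labels(counts, left: Box, right: Box):
    """Return (label, relative frequency) pairs for this pair, most likely first."""
    seen = counts.get(offset_bin(left, right), Counter())
    total = sum(seen.values())
    return [(label, n / total) for label, n in seen.most_common()]
```

Because the baseline is unknown, one bin can hold several labels (for example both "superscript" and "inline"), so the sketch returns a ranked list rather than a single answer.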
11
Topics
  • Notational Conventions
      • The mapping between information and ink.
  • What is Math Notation, anyway?
      • Semi-standardized. Not formally defined.
  • Approaches to Recognizing Math Notation
  • User Interface Issues
      • People think about meaning, not about ink
        marks.

12
Challenges in Math Recognition
  • Compilers easily handle math notation in
    programming languages.
  • 2D math notation is harder
  • Noise causes errors in segmenting and identifying
    symbols.
  • Can't blame the user for mistakes.
  • Hard to capture 2D relationships effectively in a
    string.
  • Symbol recognition: easily confused symbols,
    e.g. ( C O 0 7 > S 5 / 1 l
  • Several roles for symbols
  • Spatial relationships
  • Little redundancy

Handwritten notation is particularly difficult
13
Survey by Blostein and Grbavec, 1997
Evaluate/compare these approaches?
  • The choice of software architecture is difficult
    to make and defend.

14
Procedurally-coded math syntax: no explicit
definition of math syntax. Update code in response
to recognition errors. Can get good recognition
performance.
15
Coordinate grammar: Anderson 1969, in Fu 1977
Attributes: xmin, ymin, xmax, ymax, xcenter.
m encodes meaning.
Apply a rule to a set of symbols: create subsets
with syntactic subgoals.
A clear, well-structured representation of
notational conventions.
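
A rule of this kind can be sketched in a few lines. The example below is an illustration in the spirit of attribute-based rules, not Anderson's actual grammar: the Sym class, the single fraction rule, the left-to-right fallback, and the assumption that y grows upward are all choices made for the sketch. The rule partitions a symbol set around a horizontal bar and gives each subset the subgoal "expression".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    m: str        # meaning carried by the symbol
    xmin: float
    ymin: float
    xmax: float
    ymax: float   # y grows upward in this sketch

    @property
    def xcenter(self) -> float:
        return (self.xmin + self.xmax) / 2

    @property
    def ycenter(self) -> float:
        return (self.ymin + self.ymax) / 2


def try_fraction(symbols):
    """Fraction rule: a bar spanning every other symbol, with subsets above and below."""
    for bar in symbols:
        if bar.m != "-":
            continue
        rest = [s for s in symbols if s is not bar]
        above = [s for s in rest if s.ycenter > bar.ycenter]
        below = [s for s in rest if s.ycenter < bar.ycenter]
        spans_all = all(bar.xmin <= s.xcenter <= bar.xmax for s in rest)
        if spans_all and above and below and len(above) + len(below) == len(rest):
            # Each subset becomes a subgoal of type "expression".
            return f"({parse_expression(above)})/({parse_expression(below)})"
    return None


def parse_expression(symbols):
    """Goal "expression": try the fraction rule, else read the symbols left to right."""
    if not symbols:
        raise ValueError("empty symbol set")
    fraction = try_fraction(symbols)
    if fraction is not None:
        return fraction
    return "".join(s.m for s in sorted(symbols, key=lambda s: s.xcenter))
```

Note that the short minus sign inside a numerator fails the spanning test, so only the long bar triggers the fraction rule.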
16
Okamoto and Miao, 1992
The order of cuts provides the tree-structure of
the expression. A simple and efficient
technique. Can be applied prior to OCR. Special
handling of overlapping symbols.
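
How the order of cuts yields a tree can be sketched as below, assuming the cuts are made at empty gaps in the horizontal and vertical projections. This is a simplified illustration, not the published algorithm: it works on bounding boxes instead of pixel profiles, the tuple layout (label, xmin, ymin, xmax, ymax) with y growing downward is an assumption, and the labels are only there to make the output readable (before OCR they would be anonymous component ids).

```python
def split_by_gaps(boxes, lo, hi):
    """Group boxes that are separated by empty gaps along one axis."""
    ordered = sorted(boxes, key=lo)
    groups = [[ordered[0]]]
    reach = hi(ordered[0])           # farthest extent covered so far
    for box in ordered[1:]:
        if lo(box) > reach:          # empty gap: place a cut here
            groups.append([box])
            reach = hi(box)
        else:
            groups[-1].append(box)
            reach = max(reach, hi(box))
    return groups


def cut(boxes):
    """Recursively cut a non-empty list of boxes into a nested row/column tree."""
    if len(boxes) == 1:
        return boxes[0][0]
    axes = (
        ("row", lambda b: b[1], lambda b: b[3]),      # gaps in x: left-to-right parts
        ("column", lambda b: b[2], lambda b: b[4]),   # gaps in y: top-to-bottom parts
    )
    for kind, lo, hi in axes:
        groups = split_by_gaps(boxes, lo, hi)
        if len(groups) > 1:
            return (kind, [cut(group) for group in groups])
    # No gap on either axis: overlapping symbols need special handling.
    return ("overlap", sorted(b[0] for b in boxes))
```

A symbol such as a square root sign that encloses its argument produces no gap on either axis, which is why the sketch reports it separately instead of cutting further.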
17
2D stochastic context-free grammar: Chou 1989.
Find the most likely parse of the image, without
segmentation.
Hidden Markov Model: Kopec, Chou 1994. An
explicit image-generation model, to drive
recognition. Applied to yellow pages, music
notation.
18
Rewrite rules replace one subgraph by another.
Blostein, Schürr, Software Practice and
Experience, 1999.
Write a graph schema to define the structure of
valid graphs. The PROGRES execution environment
flags violations.
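
The combination of a rewrite rule and a schema check can be sketched in plain Python. This is not PROGRES code and does not imitate its syntax: the Graph class, the node and edge types, and the single fraction rule are hypothetical, chosen only to show one subgraph being replaced by another while a schema flags invalid graphs.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: set = field(default_factory=set)    # (source id, label, target id)


# Hypothetical schema: the node types and typed edges a valid graph may contain.
NODE_TYPES = {"Expr", "Bar", "Fraction"}
EDGE_TYPES = {
    ("Expr", "above", "Bar"),
    ("Expr", "below", "Bar"),
    ("Fraction", "num", "Expr"),
    ("Fraction", "den", "Expr"),
}


def violations(graph):
    """Return a description of every node or edge that the schema does not allow."""
    problems = []
    for node_id, node_type in graph.nodes.items():
        if node_type not in NODE_TYPES:
            problems.append(f"node {node_id}: unknown type {node_type}")
    for src, label, dst in sorted(graph.edges):
        if src not in graph.nodes or dst not in graph.nodes:
            problems.append(f"edge {src}-{label}->{dst}: dangling endpoint")
        elif (graph.nodes[src], label, graph.nodes[dst]) not in EDGE_TYPES:
            problems.append(f"edge {src}-{label}->{dst}: not allowed by schema")
    return problems


def rewrite_fraction(graph):
    """Replace Expr-above->Bar<-below-Expr by a Fraction with num and den edges."""
    for bar, node_type in list(graph.nodes.items()):
        if node_type != "Bar":
            continue
        nums = [s for (s, label, d) in graph.edges if label == "above" and d == bar]
        dens = [s for (s, label, d) in graph.edges if label == "below" and d == bar]
        if len(nums) == 1 and len(dens) == 1:
            num, den = nums[0], dens[0]
            graph.edges -= {(num, "above", bar), (den, "below", bar)}
            graph.nodes[bar] = "Fraction"   # the Bar node is replaced in place
            graph.edges |= {(bar, "num", num), (bar, "den", den)}
            return True
    return False
```

Running violations before and after each rewrite gives a cheap consistency check on the rules themselves.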
19
Compiler-inspired approach, using tree
rewriting: Zanibbi, Blostein, Cordy, ICPR 2002
and PAMI 2002.
Separate analysis of layout, lexical, syntactic,
and semantic aspects. Get partial results even if
there are syntax errors. Find linear structures in
the input, and create a tree from them.
Diagram: operation of a compiler, compared with
recognition of math notation.
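
Finding linear structures and building a tree from them can be sketched as a layout pass followed by a separate output pass. This is a heavily simplified illustration, not the algorithm of the cited papers: the Node class, the quarter-height thresholds, and the assumption that y grows upward are invented for the example. Each baseline is a left-to-right list, and symbols that sit clearly above or below the previous baseline symbol start a nested baseline.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float                               # y grows upward in this sketch
    sup: list = field(default_factory=list)   # nested baseline above
    sub: list = field(default_factory=list)   # nested baseline below


def build_baseline(symbols):
    """Layout pass: return the nodes on one baseline, with nested baselines attached.

    The nodes are modified in place, so pass freshly created nodes.
    """
    baseline = []
    for s in sorted(symbols, key=lambda n: n.xmin):
        if baseline:
            anchor = baseline[-1]
            quarter = 0.25 * (anchor.ymax - anchor.ymin)
            center = (s.ymin + s.ymax) / 2
            if center > anchor.ymax - quarter:   # clearly raised
                anchor.sup.append(s)
                continue
            if center < anchor.ymin + quarter:   # clearly lowered
                anchor.sub.append(s)
                continue
        baseline.append(s)
    for node in baseline:
        node.sup = build_baseline(node.sup)
        node.sub = build_baseline(node.sub)
    return baseline


def to_latex(baseline):
    """Output pass: serialize the tree without checking that the expression is valid."""
    parts = []
    for node in baseline:
        text = node.label
        if node.sub:
            text += "_{" + to_latex(node.sub) + "}"
        if node.sup:
            text += "^{" + to_latex(node.sup) + "}"
        parts.append(text)
    return " ".join(parts)
```

Because the layout pass never consults syntax, an incomplete input such as a trailing operator still yields a tree, which is one way to get partial results despite syntax errors.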
20
Topics
  • Notational Conventions
      • The mapping between information and ink.
  • What is Math Notation, anyway?
      • Semi-standardized. Not formally defined.
  • Approaches to Recognizing Math Notation
  • User Interface Issues
      • People think about meaning, not about ink
        marks.

21
Goal: seamless transition between
  - real world (stylus and paper)
  - electronic world
Electronic → Paper is more advanced than
Paper → Electronic.
Many paper documents are produced from electronic
sources. Eventually include digitally-encoded
contents? Methods used in digital watermarking
are relevant.
22
Entering math expressions
Method 1: Use recognition software. Scan a
document image or write on a data tablet →
recognition software → information.
Method 2: Enter information directly. Type the
information (e.g. LaTeX) or use a structure-based
editor → generate math notation.
User proofreads and corrects.
  • How much user time?
  • How many residual errors?
  • How much frustration?

23
User Frustration
Talk at ICDAR 2001
  • The Argh is a unit of frustration.
  • Kilarghs. Megarghs.
  • Arghometers need to be developed.
  • Document recognition is frustrating because:
  • Users don't like to correct errors made by the
    stupid computer. Better to correct errors they
    made themselves.
  • Users don't like to think about the marks on the
    paper. They would rather think about the document
    contents.
  • Users don't like unpredictable systems. Better
    to adapt themselves (even if inconvenient) to
    achieve predictability.

People eventually feel comfortable with
irritating interfaces.
24
Conclusion
  • Topics: Notational Conventions. What is Math
    Notation, anyway? Math Recognition Approaches.
    User Interface Issues.
  • Possible research directions
  • Precisely define math literacy tasks.
  • Use soft conventions in recognition.
  • Use statistics: know about likely versus unlikely
    expressions.
  • Exploit the advanced state of generation, to
    improve recognition.

A group effort is required.