Title: Understanding Sketches and Diagrams on the Tablet PC
1Understanding Sketches and Diagrams on the Tablet PC
- Balaji Krishnapuram
- In collaboration with the Tablet PC Group (Redmond) and the Collaborative Handwritten Ink Recognition Project group: Martin Szummer, Chris Bishop, Michel Gangnet, Markus Svensen
2Background
- Extensive work on recognizing hand-written text already
- Some problems remain, but it works reasonably well for the most part
- Much more to a user interface than simply text!
3Project Objective
- Assume the text has been separated from the figures in an earlier pre-processing step
- Ongoing research: Markus Svensen
- I focus on sketch and diagram understanding
4Practical Applications
Interest from several product groups: MS Office, Visio, ...
5Understanding Figures: Subtasks
- Fitting: Identify the best affine transformation of the model for a sample of ink
- Scoring: Which template has been drawn?
- Segmentation: What is the best explanation of the whole page of ink?
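As a rough illustration of the fitting subtask, here is a minimal sketch of a least-squares affine fit. It assumes that each template point is already paired with an ink point, which the slides do not assume (their fitting works on unlabelled ink). The function names `fit_affine`, `_solve3` and `_det3` and the 2x3 matrix layout are hypothetical choices made for this example.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve the 3x3 linear system m x = rhs with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate points (e.g. collinear): affine fit is not unique")
    solution = []
    for col in range(3):
        # Replace column `col` of m by the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(template_pts, ink_pts):
    """Least-squares 2x3 affine matrix mapping template_pts onto ink_pts.

    Assumption (not from the slides): template_pts[i] corresponds to ink_pts[i].
    Returns [[a, b, tx], [c, d, ty]] with u = a*x + b*y + tx, v = c*x + d*y + ty.
    """
    n = float(len(template_pts))
    sx = sum(x for x, _ in template_pts)
    sy = sum(y for _, y in template_pts)
    sxx = sum(x * x for x, _ in template_pts)
    sxy = sum(x * y for x, y in template_pts)
    syy = sum(y * y for _, y in template_pts)
    # Normal equations share one 3x3 matrix for both output coordinates.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    pairs = list(zip(template_pts, ink_pts))
    rhs_u = [sum(x * u for (x, _), (u, _) in pairs),
             sum(y * u for (_, y), (u, _) in pairs),
             sum(u for _, (u, _) in pairs)]
    rhs_v = [sum(x * v for (x, _), (_, v) in pairs),
             sum(y * v for (_, y), (_, v) in pairs),
             sum(v for _, (_, v) in pairs)]
    return [_solve3(m, rhs_u), _solve3(m, rhs_v)]
```

With at least three non-collinear point pairs the fit is unique; collinear input raises an error rather than returning an arbitrary transform.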
6Model for generating ink from templates
[Figure not preserved; only the label x1 survives]
7Model for generating ink from templates
Fitting/Scoring: What is the probability of generating all the user ink while drawing the template? Assume the ink points are sampled independently.
8Fitting algorithm
Log-probability of generating all user-drawn ink while drawing the template under a specific affine transformation A
9Noise Immunity
10Fitting/Recognizing Segments
[Figure: three panels labelled "Original ink", "Fit templates", "Recognize"]
11Segmentation: Wrapper Approach
- Stroke: from pen down to pen up
- Assume figures are drawn in a continuous sequence of strokes
- Assume existence of temporal ordering information
- i.e. S1, S2, S3, ..., ST
- Further assume that the max. number of strokes used to draw a template, NS, is reasonably small (e.g. 10 or less)
12Segmentation: Divide & Conquer
Recursive function to identify the optimal partition/score on S1, S2, S3, ..., ST
- (score, partition) = f(S1, S2, S3, ..., ST, NS)
- Base case
- if T < NS, consider fitting/recognising the entire set of strokes as a single figure
- For all k = 2 to T-1: how good is it to divide it at k?
- (score1, partition1) = f(S1, S2, S3, ..., Sk, NS)
- (score2, partition2) = f(Sk+1, Sk+2, ..., ST, NS)
- Total_score(k) = score1 + score2
- Total_partition(k) = partition1 + partition2
- Return the best score/partition out of all the possibilities considered.
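The recursion above can be sketched in Python as follows. This is one reading of the slide, not the authors' code: `score_figure` is a hypothetical callable standing in for the fit/recognise step, the base case uses `<=` so that a figure may use exactly NS strokes, every split position is tried, and memoisation via `lru_cache` is an addition the slide does not mention.

```python
from functools import lru_cache


def segment(strokes, score_figure, max_strokes):
    """Return (best_score, partition) for a temporally ordered stroke list.

    score_figure(group) is a hypothetical callable returning the best
    fit/recognition score (e.g. a log-probability) for treating `group`
    as one figure. The partition is a list of (start, end) index pairs,
    each covering strokes[start:end].
    """

    @lru_cache(maxsize=None)
    def f(i, j):
        best_score, best_partition = float("-inf"), ()
        # Option 1: treat strokes[i:j] as a single figure if it is short enough.
        if j - i <= max_strokes:
            best_score = score_figure(tuple(strokes[i:j]))
            best_partition = ((i, j),)
        # Option 2: split between stroke k-1 and stroke k, solve both parts recursively.
        for k in range(i + 1, j):
            score1, partition1 = f(i, k)
            score2, partition2 = f(k, j)
            if score1 + score2 > best_score:
                best_score = score1 + score2
                best_partition = partition1 + partition2
        return best_score, best_partition

    score, partition = f(0, len(strokes))
    return score, list(partition)
```

Scores are summed, which matches treating each figure's score as a log-probability; a different scoring scheme would need a different way of combining the two parts.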
13Square or 4 Lines?
14Over-explaining / Under-explaining
15Gets it right most of the time
16... but some mistakes too
17Current limitations/problems
- Works fine most of the time! Mistakes when figures are confusingly close or very small
- Slow
- Approx. 5 seconds for each of the previous figs.
- Each fitting takes about 0.1 seconds; combinatorial explosion in partitioning the image into segments
- We use information about the temporal sequence of strokes!
- Temporal information is lost during cut & paste operations
- Users do go back and add things to figures later
- Only considers affine-transform-based fitting.
- Arrows and other complicated templates may need other (non-affine) fitting
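To get a feel for the combinatorial explosion mentioned in the list above, the following sketch counts how many ways T temporally ordered strokes can be cut into contiguous figures of at most NS strokes each, and how many distinct contiguous stroke groups exist. Both function names are hypothetical, and the counts follow only from the stated assumptions (contiguous figures, bounded size), not from any measurement in the slides.

```python
def count_partitions(num_strokes, max_strokes):
    """Count ways to cut an ordered stroke sequence into contiguous figures
    of at most max_strokes strokes each."""
    counts = [1] + [0] * num_strokes  # counts[0] = 1: the empty prefix
    for n in range(1, num_strokes + 1):
        # The last figure uses m strokes; the n - m strokes before it
        # can be partitioned in counts[n - m] ways.
        counts[n] = sum(counts[n - m] for m in range(1, min(max_strokes, n) + 1))
    return counts[num_strokes]


def count_distinct_groups(num_strokes, max_strokes):
    """Count distinct contiguous stroke groups that could ever need fitting."""
    longest = min(max_strokes, num_strokes)
    return sum(num_strokes - length + 1 for length in range(1, longest + 1))
```

When T does not exceed NS, the number of partitions is 2^(T-1), so it doubles with every extra stroke, while the number of distinct groups grows far more slowly. This suggests, as a guess rather than a result from the slides, that reusing each group's fit across partitions could reduce the cost.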
18Further work
- Scoring seems to be perfectly fine
- Main focus on partitioning the image
- how to order the search through the set of all partitions,
- guaranteed to reach the best interpretation eventually.
- Speed gains in fitting/recognizing individual figures
- Line based (instead of point based)
- Randomized algorithms like RANSAC (Phil, Antonio)
- Discriminative approach (feature extraction, learn classifiers for parallelograms, ellipses etc.)
19Acknowledgements
- Martin Szummer, Chris Bishop, Michel Gangnet, Markus Svensen, Hannah Pepper
- Antonio Criminisi, Mike Tipping, Phil Torr
- The whole MLP group
- All those who provided us ink samples from real, human users!
20Questions / Suggestions !?!
Please help us collect more data! Contact Martin
Szummer for more information