From Tessellations to Table Interpretation - PowerPoint PPT Presentation

Slides: 48
Provided by: Ram164
Learn more at: https://tango.byu.edu

Transcript and Presenter's Notes



1
From Tessellations to Table Interpretation
  • R. C. Jandhyala¹, M. Krishnamoorthy¹,
  • G. Nagy¹, R. Padmanabhan¹,
  • S. Seth², W. Silversmith¹
  • ¹DocLab, Rensselaer Polytechnic Institute
  • ²Computer Science and Engineering, University of
    Nebraska-Lincoln
  • (Supported by NSF Grants 044114854 and 0414644,
    and Rensselaer Center for Open Source Software)

2
Goal: Construction of a narrow-domain ontology
from semi-structured web data (table understanding)
3
Outline
  • Tilings (rectangular tessellations), X-Y trees (1984)
  • Grammars
  • Tables
  • Wang categories (1996)
5
Web tables
  • Human-understandable tables cannot be precisely
    defined.
  • Instead, convert them to a smaller set of
    admissible tables.
  • Why? Algorithmic ease.

6
Admissible Tables
  • Have a stub, headings, and data cells.

7
Factor out layout-equivalent tables
9
Rectangular Tessellations
  • Partition of an isothetic rectangle into
    rectangles.
  • Uniquely defined by junction points (location and
    type).
  • Number of tessellations increases rapidly with
    table size.
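A tessellation and its junction points can be checked mechanically. This is a minimal sketch under assumptions that are not from the slides: rectangles are half-open boxes `(x0, y0, x1, y1)` on an integer grid, and a junction is classified only as `T` (three tiles meet) or `+` (four tiles meet) by counting how many tiles have a corner at an interior point. The names `is_tessellation` and `junctions` are hypothetical.

```python
from collections import Counter
from itertools import product

Rect = tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open: x0 <= x < x1

def is_tessellation(outer: Rect, tiles: list[Rect]) -> bool:
    """True if the tiles partition `outer`: every unit cell is covered exactly once."""
    ox0, oy0, ox1, oy1 = outer
    cover = Counter()
    for x0, y0, x1, y1 in tiles:
        # Each tile must be non-empty and lie inside the outer rectangle.
        if not (ox0 <= x0 < x1 <= ox1 and oy0 <= y0 < y1 <= oy1):
            return False
        cover.update(product(range(x0, x1), range(y0, y1)))
    return all(cover[c] == 1 for c in product(range(ox0, ox1), range(oy0, oy1)))

def junctions(outer: Rect, tiles: list[Rect]) -> dict:
    """Map each interior junction point to its type: 'T' or '+'."""
    ox0, oy0, ox1, oy1 = outer
    corners = Counter()
    for x0, y0, x1, y1 in tiles:
        corners.update([(x0, y0), (x0, y1), (x1, y0), (x1, y1)])
    # In a valid tessellation an interior point is a corner of 0, 2 or 4 tiles:
    # two tile corners meet at a T-junction, four at a cross.
    kind = {2: "T", 4: "+"}
    return {p: kind[n] for p, n in corners.items()
            if ox0 < p[0] < ox1 and oy0 < p[1] < oy1 and n in kind}

outer = (0, 0, 2, 2)
tiles = [(0, 0, 1, 2), (1, 0, 2, 1), (1, 1, 2, 2)]  # tall cell left, two cells right
print(is_tessellation(outer, tiles))  # True
print(junctions(outer, tiles))        # {(1, 1): 'T'}
```

Counting tile corners works because no rectangle can cover three of the four quadrants around a point.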

10
XY Tessellations
  • Special case of rectangular tessellations.
  • Successive horizontal and vertical cuts.
  • Easily represented by trees.
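To illustrate why successive cuts map naturally onto a tree, the sketch below turns an XY tree back into its tessellation. Assumptions not from the slides: each internal node records its cut direction (`"V"` for children side by side, `"H"` for children stacked) and the integer width or height of each child. The class `XYNode` and the function `layout` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class XYNode:
    cut: Optional[str] = None  # "V": children side by side, "H": stacked; None: leaf
    children: List["XYNode"] = field(default_factory=list)
    sizes: List[int] = field(default_factory=list)  # width ("V") or height ("H") of each child
    label: str = ""  # cell content, used only by leaves

def layout(node: XYNode, x0: int, y0: int, x1: int, y1: int) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Turn an XY tree into its tessellation: one (label, rectangle) pair per leaf."""
    if node.cut is None:
        return [(node.label, (x0, y0, x1, y1))]
    span = x1 - x0 if node.cut == "V" else y1 - y0
    if len(node.sizes) != len(node.children) or sum(node.sizes) != span:
        raise ValueError("child sizes must exactly fill the parent rectangle")
    out = []
    pos = x0 if node.cut == "V" else y0
    for child, size in zip(node.children, node.sizes):
        if node.cut == "V":  # successive vertical cuts: left to right
            out += layout(child, pos, y0, pos + size, y1)
        else:                # successive horizontal cuts: top to bottom
            out += layout(child, x0, pos, x1, pos + size)
        pos += size
    return out

def leaf(text: str) -> XYNode:
    return XYNode(label=text)

# A spanning title over two column headings.
tree = XYNode("H", [leaf("Title"), XYNode("V", [leaf("a"), leaf("b")], [1, 1])], [1, 1])
print(layout(tree, 0, 0, 2, 2))
# [('Title', (0, 0, 2, 1)), ('a', (0, 1, 1, 2)), ('b', (1, 1, 2, 2))]
```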

11
A tiling and its X-Y tree (aka slicing structure,
puzzle tree, tree map)
12
Non-slicing structures: no XY tree
In fact, X-Y tilings are a vanishingly small fraction
of all tilings. This helps, because tables never
contain this spiral structure.
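One way to quantify this claim rests on an assumption that the slides do not make explicit: if tilings are counted up to combinatorial equivalence (mosaic floorplans), tilings with n rectangles are counted by the Baxter numbers, while those that have an XY tree (slicing floorplans) are counted by the large Schröder numbers. The two counts first differ at n = 5 (92 versus 90), where the two spiral (pinwheel) arrangements appear. Baxter numbers grow roughly like 8^n and large Schröder numbers like (3+2√2)^n ≈ 5.83^n, so the slicing fraction tends to zero. The function names are hypothetical.

```python
from math import comb

def baxter(n: int) -> int:
    """Baxter number B(n): mosaic floorplans (tilings) with n rectangles."""
    total = sum(comb(n + 1, k - 1) * comb(n + 1, k) * comb(n + 1, k + 1)
                for k in range(1, n + 1))
    return total // (comb(n + 1, 1) * comb(n + 1, 2))

def slicing(n: int) -> int:
    """Large Schroeder number r(n-1): slicing floorplans (tilings with an XY tree)."""
    r = [1]  # r[0] = 1
    for m in range(1, n):
        r.append(r[m - 1] + sum(r[k] * r[m - 1 - k] for k in range(m)))
    return r[n - 1]

# n = 5: 90 slicing tilings out of 92; the ratio keeps shrinking as n grows.
for n in (4, 5, 10, 20, 40):
    print(n, slicing(n), baxter(n), slicing(n) / baxter(n))
```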
13
Fundamental Idea
  • Use XY trees to automate table processing and
    understanding.

14
Table to XY tree: EX2XY
  • Applicable to any XY tessellation.
  • Input: Excel table
    • Copy and paste, or import.
    • Edit to make admissible.
  • Output: XY tree
    • as XML for portability.
    • as a parenthesized string for grammars.

15
Example
(http://www40.statcan.ca/l01/cst01/econ50-eng.htm)
16
After import into Excel
17
After Editing
18
Output - XML
  • ltblock id'1.1.2.1' range'17,230,2'gt
  • ltcontentgt
  • Real gross domestic product, expenditure-based,
    by province and territory (millions of chained
    (2002) dollars)
  • lt/contentgt
  • lt/blockgt
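Ids such as `1.1.2.1` look like Dewey-style paths from the root of the XY tree. Under that assumption (the slides define neither the numbering nor the `range` attribute, so `range` is omitted), a tree can be serialized with Python's standard `xml.etree.ElementTree`. The nested-list tree format and the function name `to_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_xml(node, ident: str = "1") -> ET.Element:
    """Serialize an XY tree: a str is a leaf cell, a list holds the sub-blocks of one cut."""
    block = ET.Element("block", id=ident)
    if isinstance(node, str):
        ET.SubElement(block, "content").text = node
    else:
        for i, child in enumerate(node, start=1):
            block.append(to_xml(child, f"{ident}.{i}"))  # child ids extend the parent id
    return block

tree = [["Real gross domestic product"], ["Canada", "Quebec"]]
root = to_xml(tree)
print(ET.tostring(root, encoding="unicode"))
```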

20
Table Grammars
  • Can characterize entire families of tables.
  • Developed a grammar for one family.
  • Input: nested parenthesized notation.
  • Output: accept/reject as an example of the family.

21
Grammar
  • For parsing column headers
  • S → A (Rule 1)
  • A → B (Rule 2)
  • B → c X B | c X (Rules 3 and 4)
  • X → c X | A X | A | c (Rules 5, 6, 7 and 8)
  • S is start symbol.
  • A generates all admissible column headers.
  • B generates category trees.
  • c is a root category.
  • X generates sub-categories.

22
Table Grammars
  • Cannot check whether a table is consistent.
  • Need further geometric-alignment and lexical
    checks.

24
Logical Structure of Tables
  • How to interpret a table?
  • Describe the relationship between header cells and
    content cells (Wang, U. Waterloo, 1996).
  • Wang notation
    • Elegant description.
    • Dimensionality: number of category trees.
    • Cartesian product maps categories to data.

25
Layout-independent Wang notation
Tables with different layouts but the same information
have the same Wang notation.
26
Wang Category Trees for either table
  • characteristic: gonsity, hepth
  • fleck, burlam, falder, multon
  • Any data cell can be designated by a path
    through each category tree.
  • Leaves correspond to row or column headings.
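Designating a data cell by one path per category tree can be sketched with this example's labels. Assumptions: the second tree's root label is not preserved in this transcript, so it is called `item` here (hypothetical), and trees are plain nested dictionaries. Each data cell then corresponds to one element of the Cartesian product of the trees' root-to-leaf paths; the number of trees is the dimensionality.

```python
from itertools import product

# Category trees as nested dicts {label: subtree}; a leaf has an empty dict.
TREES = {
    "characteristic": {"gonsity": {}, "hepth": {}},
    "item": {"fleck": {}, "burlam": {}, "falder": {}, "multon": {}},  # hypothetical root label
}

def paths(label, subtree, prefix=()):
    """Yield every root-to-leaf path below `label` as a tuple of labels."""
    here = prefix + (label,)
    if not subtree:
        yield here
    for child, grandchildren in subtree.items():
        yield from paths(child, grandchildren, here)

def cell_keys(trees):
    """One key per data cell: a path through each category tree."""
    per_tree = [list(paths(root, sub)) for root, sub in trees.items()]
    return list(product(*per_tree))

keys = cell_keys(TREES)
print(len(keys))  # 8 data cells: 2 characteristics x 4 items
print(keys[0])    # (('characteristic', 'gonsity'), ('item', 'fleck'))
```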

27
Real Table Understanding
  • Analyzing logical structure not sufficient.
  • Need additional information from title,
    footnotes, captions, etc.
  • Semantic analysis of the labels is also important
    and needs external knowledge.

28
Does Wang Notation always exist?
  • Not always!
  • Inconsistent tables do not have Wang Notation.
  • Others can be edited using virtual headers.

29
XY tree to Wang Notation Algorithm
  • Input: XY trees.
  • Output: XML version of Wang notation.
  • Checks for table consistency.

30
Algorithm
  • Locate principal regions - stub, headers and
    content cells.
  • Extract Wang categories.
  • Compute Cartesian product of category paths.
  • Match each key to the content of a delta cell.
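The last two steps can be illustrated on a small grid. This sketch assumes the principal regions are already located (a given number of heading rows on top and stub columns on the left), treats an empty heading cell (`None`) as continuing the spanning heading before it, and uses a duplicate key as a simple consistency test. The grid, its values, and the names `fill` and `extract` are hypothetical; the program described above works from XY trees rather than raw grids.

```python
from itertools import product

def fill(labels):
    """A None cell continues the spanning heading that precedes it."""
    out, last = [], None
    for label in labels:
        if label is not None:
            last = label
        out.append(last)
    return out

def extract(grid, header_rows, stub_cols):
    """Map (row path, column path) keys to delta-cell contents."""
    body = grid[header_rows:]
    # Column paths: read each heading row left to right, then transpose.
    col_paths = list(zip(*(fill(row[stub_cols:]) for row in grid[:header_rows])))
    # Row paths: read each stub column top to bottom, then transpose.
    row_paths = list(zip(*(fill([row[c] for row in body]) for c in range(stub_cols))))
    table = {}
    for (i, rp), (j, cp) in product(enumerate(row_paths), enumerate(col_paths)):
        if (rp, cp) in table:
            raise ValueError(f"inconsistent table: duplicate key {(rp, cp)}")
        table[(rp, cp)] = body[i][stub_cols + j]
    return table

grid = [
    [None,     "2006", None, "2007", None],
    [None,     "Q1",   "Q2", "Q1",   "Q2"],
    ["Canada", 10,     11,   12,     13],
    ["Quebec", 4,      5,    6,      7],
]
cells = extract(grid, header_rows=2, stub_cols=1)
print(cells[(("Quebec",), ("2007", "Q1"))])  # 6
```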

31
Conclusions
  • Admissible layouts identified for ease of
    processing.
  • Algorithms developed for
    • extracting XY trees from tables,
    • extracting Wang notation from XY trees.
  • Family of tables identified using a grammar.

32
Future work
  • Augmentations - captions, aggregates, units, etc.
  • Expand the grammar.
  • Automate conversion of table to admissible
    formats.

(http://www40.statcan.ca/l01/cst01/agri111a-eng.htm)
33
THANK YOU
34
Goal: Construction of a narrow-domain ontology
from semi-structured web data (table understanding)
  • Currently multon is the best choice for rapitting
    velters. It is about 25% better than burlam or
    falder, which have the same girby (hepth/gonsity
    ratio).
  • Check another table to see whether elmer is even
    better.
  • NOT TODAY!

35
An H-first tree can be transformed into a V-first
tree (and vice versa)
36
EX2XY Algorithm
  • Two workhorses:
    • Vertical_cut returns the leftmost sub-rectangle
      of a given rectangle.
    • Horizontal_cut returns the topmost sub-rectangle
      of a given rectangle.

37
EX2XY Algorithm (contd.)
  • Used in a pair of procedures, P1 and P2.
  • P1 cuts vertically and submits the first
    sub-rectangle to P2 for horizontal cuts.
  • P2 works the same way, with the cut directions
    swapped.
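The two cut routines and the P1/P2 alternation can be sketched as follows. Assumptions not taken from the slides: cells are `(x0, y0, x1, y1, label)` tuples on the grid of a table that is already admissible, a cut is allowed only along a line that no cell straddles, and the procedure reports a non-slicing layout when neither direction admits a cut. The function names mirror the slide's Vertical_cut, Horizontal_cut, P1 and P2, but the code is an illustration, not the EX2XY source.

```python
X, Y = 0, 1  # axis index: cell[axis] is the low edge, cell[axis + 2] the high edge

def _first_cut(cells, axis):
    """Split off the first (leftmost or topmost) sub-rectangle, or return None."""
    lo = min(c[axis] for c in cells)
    hi = max(c[axis + 2] for c in cells)
    for pos in sorted({c[axis + 2] for c in cells}):
        if lo < pos < hi and not any(c[axis] < pos < c[axis + 2] for c in cells):
            return ([c for c in cells if c[axis + 2] <= pos],
                    [c for c in cells if c[axis] >= pos])
    return None

def vertical_cut(cells):
    return _first_cut(cells, X)

def horizontal_cut(cells):
    return _first_cut(cells, Y)

def _split(cells, cut):
    """Apply `cut` repeatedly, collecting the successive sub-rectangles."""
    pieces = []
    while (result := cut(cells)) is not None:
        first, cells = result
        pieces.append(first)
    return pieces + [cells]

def _proc(cells, cut, name, other, stuck):
    if len(cells) == 1:
        return cells[0][4]  # leaf: the cell label
    pieces = _split(cells, cut)
    if len(pieces) == 1:  # no cut in this direction: hand over to the other procedure
        if stuck:
            raise ValueError("no XY tree: non-slicing layout")
        return other(cells, stuck=True)
    return (name, [other(p) for p in pieces])

def p1(cells, stuck=False):
    """Vertical cuts; each sub-rectangle goes to P2."""
    return _proc(cells, vertical_cut, "V", p2, stuck)

def p2(cells, stuck=False):
    """Horizontal cuts; each sub-rectangle goes to P1."""
    return _proc(cells, horizontal_cut, "H", p1, stuck)

header = [(0, 0, 2, 1, "Title"), (0, 1, 1, 2, "a"), (1, 1, 2, 2, "b")]
print(p1(header))  # ('H', ['Title', ('V', ['a', 'b'])])
```

A spiral (pinwheel) layout such as the one on the non-slicing slide raises the `ValueError`, since no full-length cut exists in either direction.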

38
Parenthesized notation
  • P-notation has a 1:1 correspondence with general
    trees.
  • For the table above, the XY tree sentence is
  • Sxy c c c c c c c c c c c c.
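The one-to-one correspondence can be checked with a round trip. In this sketch, which assumes a concrete syntax that may differ from the exact notation on the slide, a leaf is written as its label and an internal node as its label followed by its children in parentheses; labels such as `x`, `y` and `c` are illustrative. Because children are delimited explicitly, `decode(encode(t))` recovers every tree `t`.

```python
import re

def encode(tree):
    """(label, children) -> parenthesized string."""
    label, children = tree
    if not children:
        return label
    return label + "(" + " ".join(encode(c) for c in children) + ")"

def decode(text):
    """Parenthesized string -> (label, children); the inverse of encode."""
    tokens = re.findall(r"[()]|[^()\s]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos]
        pos += 1
        children = []
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1  # skip "("
            while tokens[pos] != ")":
                children.append(node())
            pos += 1  # skip ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree

t = ("x", [("c", []), ("y", [("c", []), ("c", [])])])
s = encode(t)
print(s)               # x(c y(c c))
print(decode(s) == t)  # True
```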

39
A table with six Wang dimensions
40
XY2WANG: other features
  • Handles more complex scenarios:
    • higher dimensionality,
    • deeper nesting of headers,
    • repetitive headers.

41
(http://www40.statcan.ca/l01/cst01/econ50-eng.htm)
42
Table Augmentations Example
43
Raghav's Experiment
44
Results
45
Results (Contd.)
46
Conclusion
  • Average total time to process a table: 231
    seconds.
  • Average table size: 587 cells before
    preprocessing.
  • Average preprocessing time: 104 seconds.
  • Three-category tables took approximately 27 seconds
    longer than two-category tables.

47
Conclusion (Contd.)
  • Tables with aggregates and footnotes took more time
    to process.
  • Strong correlation between processing time and
    table size.
  • Future work: automatically segmenting
    augmentations, categories, and delta cells using
    visual cues.