Title: Semi-Automatic Semantic Annotation for Hidden-Web Tables
1. Semi-Automatic Semantic Annotation for Hidden-Web Tables
- Cui Tao, David W. Embley
- Data Extraction Research Group
- Department of Computer Science
- Brigham Young University
Supported by NSF
2. Semantic Annotation
- The Hidden Web
  - Hidden behind forms
  - Hard to query
3. Semantic Annotation
"... to find the protein and the amino-acid information for gene cdk-4"
4. Semantic Annotation
- Semantic annotation
  - Machine-understandable
  - Publicly accessible
5. System Overview
- Initial semantic annotation
  - Manually annotate a sample page
  - With respect to a selected ontology
- Table interpretation
  - Automatic
  - Tables from hidden web pages
- Final semantic annotation
  - Automatic
  - Annotate interpreted tables
6. Initial Semantic Annotation
- SMORE: Semantic Markup, Ontology and RDF Editor
Maryland Information and Network Dynamics Lab
8. Table Interpretation
- Table interpretation
  - Locate labels and values
  - Pair labels with values
  - Remember the path
- TISP: Table Interpretation by Sibling Pages
9. TISP
10. Interpretation Technique: Sibling Page Comparison
Same
11. Interpretation Technique: Sibling Page Comparison
Almost Same
12. Interpretation Technique: Sibling Page Comparison
Different
Same
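The comparison on these slides can be sketched in Python. This is only a minimal illustration, assuming that a cell whose text is the same on two sibling pages is a label and a cell whose text differs is a value, and that both tables have the same shape. The function names and the sample tables are hypothetical and do not come from TISP itself.

```python
from typing import List, Tuple

Grid = List[List[str]]

def classify_cells(page_a: Grid, page_b: Grid) -> List[List[str]]:
    """Mark each cell 'label' if its text matches across the two sibling
    tables, otherwise 'value'. Assumes both grids have the same shape."""
    same_shape = len(page_a) == len(page_b) and all(
        len(ra) == len(rb) for ra, rb in zip(page_a, page_b)
    )
    if not same_shape:
        raise ValueError("this sketch only handles tables of identical shape")
    return [
        ["label" if a.strip() == b.strip() else "value" for a, b in zip(ra, rb)]
        for ra, rb in zip(page_a, page_b)
    ]

def pair_labels_with_values(page: Grid, kinds: List[List[str]]) -> List[Tuple[str, str]]:
    """Pair each value cell with the nearest label to its left in the same row."""
    pairs = []
    for row, row_kinds in zip(page, kinds):
        label = None
        for text, kind in zip(row, row_kinds):
            if kind == "label":
                label = text
            elif label is not None:
                pairs.append((label, text))
    return pairs

# Hypothetical sibling tables (made-up data, for illustration only).
SIBLING_A = [["Gene", "abc-1"], ["Protein", "P-ABC"]]
SIBLING_B = [["Gene", "xyz-2"], ["Protein", "P-XYZ"]]

KINDS = classify_cells(SIBLING_A, SIBLING_B)
PAIRS = pair_labels_with_values(SIBLING_A, KINDS)
```

Real sibling pages rarely line up this neatly, so a practical version would need some form of table alignment before the cell-by-cell comparison.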
13. Interpretation Technique: Sibling Page Comparison
Structure Pattern of a Table
Label Path: Identification.Gene model(s).Gene Model
XPath: /html[1]//table[3]/tr[1]/td[2]/table[1]/tr[6]/td[2]/table[1]/tr[2]/td[1]
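As a rough illustration of remembering a positional path and using it again, the following Python sketch records a `tag[n]` path for every cell of a small page and then re-locates one cell with the standard library's limited XPath support. The page fragment, the value `GM-1`, and the helper `positional_paths` are assumptions made for this example, and real hidden-web pages are usually not well-formed XML, so an HTML parser would be needed in practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (not taken from the slides).
PAGE = """
<html><body>
  <table>
    <tr><td>Identification</td><td>
      <table>
        <tr><td>Gene model(s)</td><td>GM-1</td></tr>
      </table>
    </td></tr>
  </table>
</body></html>
"""

def positional_paths(elem, prefix=""):
    """Yield (path, element) for every descendant, using tag[n] steps
    where n counts same-tag siblings starting at 1."""
    counts = {}
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        path = f"{prefix}/{child.tag}[{counts[child.tag]}]"
        yield path, child
        yield from positional_paths(child, path)

root = ET.fromstring(PAGE)

# Remember the path of each cell, keyed by its (stripped) text.
paths = {(e.text or "").strip(): p for p, e in positional_paths(root, "/html[1]")}
value_path = paths["GM-1"]

# Re-locate the same cell later. ElementTree searches relative to the
# root element, so the leading /html[1] step is replaced by ".".
relative = "." + value_path[len("/html[1]"):]
cell = root.find(relative)
```

Once a path like this has been learned from one page, the same path can be tried on each sibling page to pull out the corresponding value.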
14. Annotation
Protein Name
15. Annotation: Split
Nucleotide Size
16. Annotation: Merge
Protein Information
17. Annotation: Union
Name
18. Annotation: Selection
Molecular Function
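The slides name four annotation operations but the transcript does not define them. The Python sketch below shows one plausible reading of each, and the semantics are assumptions: split breaks one table value into several ontology values, merge joins several table values into one, union gathers values from several labels under one concept, and selection keeps only values that pass a test. All function names and sample values are hypothetical.

```python
import re
from typing import Callable, Dict, List

def split(value: str, pattern: str) -> Dict[str, str]:
    """Split one source value into several named target values,
    using a regular expression with named groups (assumed semantics)."""
    match = re.fullmatch(pattern, value.strip())
    return match.groupdict() if match else {}

def merge(values: List[str], sep: str = " ") -> str:
    """Merge several source values into one target value (assumed semantics)."""
    return sep.join(v.strip() for v in values if v.strip())

def union(*value_lists: List[str]) -> List[str]:
    """Collect values found under several source labels into one target,
    dropping duplicates while keeping first-seen order (assumed semantics)."""
    seen, out = set(), []
    for values in value_lists:
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
    return out

def select(values: List[str], keep: Callable[[str], bool]) -> List[str]:
    """Keep only the source values that satisfy a predicate (assumed semantics)."""
    return [v for v in values if keep(v)]

# Made-up values, for illustration only.
SIZE = split("1200 bp", r"(?P<amount>\d+)\s*(?P<unit>\w+)")
INFO = merge(["P-ABC", "", "342 aa"], sep="; ")
NAMES = union(["abc-1", "ABC1"], ["ABC1", "abc one"])
FUNCS = select(["kinase activity", "n/a", "ATP binding"], lambda v: v != "n/a")
```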
19. Generated RDF Annotation
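The generated annotation is not shown in the transcript. As a hedged sketch of what this step might look like, the following Python code turns concept-value pairs into N-Triples lines. The namespaces, the record identifier, and the concept names are hypothetical, and the concept names are assumed to be safe to use inside an IRI.

```python
from typing import Iterable, List, Tuple

BASE = "http://example.org/annotation/"  # hypothetical namespace for records
ONTO = "http://example.org/ontology#"    # hypothetical ontology namespace

def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def to_ntriples(record_id: str, pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Emit one N-Triples line per (concept, value) pair for a single record."""
    subject = f"<{BASE}{record_id}>"
    lines = []
    for concept, value in pairs:
        predicate = f"<{ONTO}{concept}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines

# Made-up record, for illustration only.
TRIPLES = to_ntriples("record1", [("ProteinName", "P-ABC"), ("NucleotideSize", "1200")])
```

Plain string output keeps the example free of third-party dependencies. An RDF library would be the more robust choice for real data.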
20. Querying Annotated Data
"... to find the protein and the amino-acid information for gene cdk-4"
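To illustrate how annotated data could answer a request of this kind, here is a small Python sketch that matches triple patterns in memory. It stands in for a real RDF query language, and the predicate names, record identifiers, and all data values are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Made-up annotated data; predicate names are hypothetical.
TRIPLES: List[Triple] = [
    ("rec1", "geneName", "gene-x"),
    ("rec1", "proteinName", "P-X"),
    ("rec1", "aminoAcids", "100"),
    ("rec2", "geneName", "gene-y"),
    ("rec2", "proteinName", "P-Y"),
]

def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def info_for_gene(triples: List[Triple], gene: str, wanted: List[str]) -> Dict[str, str]:
    """Find records annotated with the gene name, then collect the wanted properties."""
    result: Dict[str, str] = {}
    for subject, _, _ in match(triples, p="geneName", o=gene):
        for prop in wanted:
            for _, _, value in match(triples, s=subject, p=prop):
                result[prop] = value
    return result

ANSWER = info_for_gene(TRIPLES, "gene-x", ["proteinName", "aminoAcids"])
```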
21. Summary
- Semi-automatic semantic annotation for hidden-web tables
- Facilitates large-scale annotation of the web