Title: AeroDAML: Applying Information Extraction to Generate DAML Annotations
1 AeroDAML: Applying Information Extraction to Generate DAML Annotations
- Dr. Paul Kogut
- Lockheed Martin
- Management Data Systems
2 What is Information Extraction?
[Diagram: Information Extraction. Inputs: text or web pages, linguistic knowledge. Outputs: entities, relationships, co-references, events.]
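To make the input/output idea concrete, here is a toy sketch of entity extraction. It is not AeroDAML's extractor: the date pattern, the two-entry gazetteer, and the sample sentence are assumptions made up for illustration, and real systems rely on far richer linguistic knowledge.

```python
import re

# Hypothetical rule: English month name, day, four-digit year.
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)

# Hypothetical gazetteer mapping known names to entity types.
GAZETTEER = {"Austria": "NATION", "Tyrolean Airways": "ORGANIZATION"}


def extract_entities(text):
    """Return (type, surface text) pairs in order of appearance."""
    found = [("ABSOLUTEDATE", m.group(0), m.start()) for m in DATE.finditer(text)]
    for name, entity_type in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((entity_type, m.group(0), m.start()))
    found.sort(key=lambda item: item[2])  # order by position in the text
    return [(entity_type, surface) for entity_type, surface, _ in found]


if __name__ == "__main__":
    sample = "Tyrolean Airways of Austria took delivery on December 19, 1997."
    for entity in extract_entities(sample):
        print(entity)
```

Relationships, co-references, and events need more than pattern matching, which is where the linguistic knowledge in the diagram comes in.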
3 Extraction and Semantic Annotation
- Consumer-side extraction: 3rd party text -> database
  - Advantages
    - Applicable to raw documents (most of the web)
  - Disadvantages
    - Must deal with full complexity of natural language
- Semantic annotation proposed to overcome difficulty of consumer-side extraction, but annotation is labor intensive
- Producer-side extraction: authored text -> annotation
  - Advantages
    - Partial automation reduces manual effort
    - Human-assisted disambiguation
    - Domain customization for intranets and B2B e-commerce
  - Disadvantages
    - Requires manual effort to correct and add rich set of relationships
    - Domain customization requires up-front effort from the author/webmaster
- Both types of extraction will coexist.
4 AeroDAML Architecture
[Architecture diagram. Labels: Text or web pages; DAML Ontologies; Text Extraction; Extraction to DAML Translation; basic annotation; UBOT Annotation Editor; refined annotation; DAML annotated text or web pages.]
5 Client-Server AeroDAML
- Users
  - personnel who routinely produce documents (e.g., intelligence analysts)
  - personnel who have a large collection of legacy documents
6 Web-based AeroDAML
- Users
  - novice/infrequent DAML annotators
  - people who want to do quick/simple annotation of a web page
7 AeroDAML Output: Entities
<aac:ABSOLUTEDATE rdf:about="December19,1997">
  <daml:label><![CDATA[December 19, 1997]]></daml:label>
</aac:ABSOLUTEDATE>
<aac:AIRCRAFT rdf:about="Dash8Series400">
  <daml:label><![CDATA[Dash 8 Series 400]]></daml:label>
</aac:AIRCRAFT>
<aac:MEASURE rdf:about="61-foot">
  <daml:label><![CDATA[61-foot]]></daml:label>
</aac:MEASURE>
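The entity output above follows a regular pattern: an element named after the entity type, an identifier that looks like the label with whitespace removed, and the original text as a label. A minimal sketch of that pattern, assuming this is how identifiers are derived (the slides do not say so); `make_id` and `entity_to_daml` are hypothetical helper names, not part of AeroDAML.

```python
from xml.sax.saxutils import quoteattr


def make_id(text):
    """Drop all whitespace, e.g. 'Dash 8 Series 400' -> 'Dash8Series400' (assumed rule)."""
    return "".join(text.split())


def entity_to_daml(entity_type, text):
    """Render one extracted entity in the element/label shape shown on the slide."""
    # A CDATA section cannot contain "]]>", so split it across two sections if present.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return (
        f"<aac:{entity_type} rdf:about={quoteattr(make_id(text))}>\n"
        f"  <daml:label><![CDATA[{safe}]]></daml:label>\n"
        f"</aac:{entity_type}>"
    )


if __name__ == "__main__":
    for entity_type, text in [
        ("ABSOLUTEDATE", "December 19, 1997"),
        ("AIRCRAFT", "Dash 8 Series 400"),
        ("MEASURE", "61-foot"),
    ]:
        print(entity_to_daml(entity_type, text))
```

The fragments are not a complete document on their own; they would still need an enclosing element that declares the `aac`, `rdf`, and `daml` prefixes.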
8 AeroDAML Output: Relationships
<aac:NATION rdf:about="Austria">
  <daml:label><![CDATA[Austria]]></daml:label>
</aac:NATION>
<aac:ORGANIZATION rdf:about="TyroleanAirways">
  <aac:OrgToLoc rdf:resource="Austria"/>
  <daml:label><![CDATA[Tyrolean Airways]]></daml:label>
</aac:ORGANIZATION>
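A relationship in this output is a child element whose `rdf:resource` points at another entity's identifier. The sketch below shows how a consumer might read such links back out as (subject, property, object) triples. The slide does not show namespace declarations, so the `rdf:RDF` wrapper and the `aac` namespace URI here are assumptions added only to make the fragment parseable.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DAML = "http://www.daml.org/2001/03/daml+oil#"
AAC = "http://example.org/aac#"  # hypothetical; the real AeroDAML ontology URI is not given

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:daml="{DAML}" xmlns:aac="{AAC}">
  <aac:NATION rdf:about="Austria">
    <daml:label><![CDATA[Austria]]></daml:label>
  </aac:NATION>
  <aac:ORGANIZATION rdf:about="TyroleanAirways">
    <aac:OrgToLoc rdf:resource="Austria"/>
    <daml:label><![CDATA[Tyrolean Airways]]></daml:label>
  </aac:ORGANIZATION>
</rdf:RDF>"""


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix from a tag."""
    return tag.rsplit("}", 1)[-1]


def relationships(xml_text):
    """Collect (subject id, property name, object id) for every rdf:resource link."""
    triples = []
    for entity in ET.fromstring(xml_text):
        subject = entity.get(f"{{{RDF}}}about")
        for prop in entity:
            target = prop.get(f"{{{RDF}}}resource")
            if target is not None:
                triples.append((subject, local_name(prop.tag), target))
    return triples


if __name__ == "__main__":
    print(relationships(DOC))
```

Label elements carry no `rdf:resource`, so only `OrgToLoc` is reported for this input.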
9 AeroDAML Output: Co-reference
<aac:PERSON rdf:about="PierreLortie">
  <aac:PersToOrg rdf:resource="BombardierRegionalAircraft"/>
  <daml:equivalentTo rdf:resource="Lortie"/>
  <daml:label><![CDATA[Pierre Lortie]]></daml:label>
</aac:PERSON>
<aac:PERSON rdf:about="Lortie">
  <daml:label><![CDATA[Lortie]]></daml:label>
</aac:PERSON>
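The `equivalentTo` link above says that "Pierre Lortie" and "Lortie" name the same person. One way a consumer could use such links is to merge identifiers into co-reference groups. The sketch below does this with a small union-find. It is an illustrative technique, not something the slides attribute to AeroDAML, and `coreference_groups` is a hypothetical name.

```python
def coreference_groups(ids, equivalences):
    """Group identifiers connected, directly or transitively, by equivalence pairs."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in equivalences:
        parent.setdefault(a, a)  # tolerate ids that appear only in a pair
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    groups = {}
    for i in parent:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    ids = ["PierreLortie", "Lortie", "BombardierRegionalAircraft"]
    links = [("PierreLortie", "Lortie")]  # from the equivalentTo element
    print(coreference_groups(ids, links))
```

Properties attached to either identifier, such as `PersToOrg`, can then be read as applying to the whole group.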
10 AeroDAML Plans
- Integrate with annotation editor
- Improve Web-based AeroDAML
- Allow user to select other ontologies besides the current AeroDAML default ontology for annotation generation
  - OpenCyc or Cyc Upper Ontology
  - CIA World Fact Book
  - IEEE Standard Upper Ontology
  - Dublin Core
  - UNSPSC...
- Try AeroDAML!
  - http://ubot.lockheedmartin.com/ubot/hotdaml/aerodaml.html