Title: Semantic Understanding
1Semantic Understanding
- An Approach Based on Information-Extraction Ontologies
David W. Embley Brigham Young University
2Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
3Grand Challenge
Semantic Understanding
4Grand Challenge
Semantic Understanding
"If ever there were a technology that could generate trillions of dollars in savings worldwide, it would be the technology that makes business information systems interoperable."
(Jeffrey T. Pollock, VP of Technology Strategy, Modulant Solutions)
5Grand Challenge
Semantic Understanding
"The Semantic Web: content that is meaningful to computers and that will unleash a revolution of new possibilities ... Properly designed, the Semantic Web can assist the evolution of human knowledge ..."
(Tim Berners-Lee, Weaving the Web)
6Grand Challenge
Semantic Understanding
"20th Century: Data Processing. 21st Century: Data Exchange. The issue now is mutual understanding."
(Stefano Spaccapietra, Editor in Chief, Journal on Data Semantics)
7Grand Challenge
Semantic Understanding
"The Grand Challenge of semantic understanding has become mission critical. Current solutions won't scale. Businesses need economic growth dependent on the web working and scaling (cost 1 trillion/year)."
(Michael Brodie, Chief Scientist, Verizon Communications)
8What is Semantic Understanding?
Semantics: The meaning or the interpretation of a word, sentence, or other language form.
Understanding: To grasp or comprehend what's intended or expressed.
- Dictionary.com
9Can We Achieve Semantic Understanding?
"A computer doesn't truly understand anything. But computers can manipulate terms in ways that are useful and meaningful to the human user."
- Tim Berners-Lee
Key Point: it only has to be good enough. And that's our challenge and our opportunity!
10Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
11Information Value Chain
Translating data into meaning
12Foundational Definitions
- Meaning: knowledge that is relevant or activates
- Knowledge: information with a degree of certainty or community agreement
- Information: data in a conceptual framework
- Data: attribute-value pairs
- Adapted from Meadow92
13Foundational Definitions
- Meaning: knowledge that is relevant or activates
- Knowledge: information with a degree of certainty or community agreement (ontology)
- Information: data in a conceptual framework
- Data: attribute-value pairs
- Adapted from Meadow92
17Data
- Attribute-Value Pairs
- Fundamental for information
- Thus, fundamental for knowledge and meaning
- Data Frame
- Extensive knowledge about a data item
- Everyday data: currency, dates, time, weights and measures
- Textual appearance, units, context, operators, I/O conversion
- Abstract data type with an extended framework
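The data-frame idea in the bullets above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenter's implementation: the class name `DataFrame`, its fields, the helper `mileage_to_int`, and the mileage patterns are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataFrame:
    """Hypothetical data frame: what we know about one kind of data item."""
    name: str
    value_patterns: List[str]             # textual appearance of values
    keyword_patterns: List[str]           # context words that signal the item
    to_internal: Callable[[str], object]  # input conversion: text -> internal value

    def find_values(self, text):
        """Return (matched text, internal value, start, end) for each value match."""
        hits = []
        for pattern in self.value_patterns:
            for m in re.finditer(pattern, text):
                hits.append((m.group(), self.to_internal(m.group()), m.start(), m.end()))
        return hits

    def find_keywords(self, text):
        """Return (matched text, start, end) for each context keyword."""
        return [(m.group(), m.start(), m.end())
                for pattern in self.keyword_patterns
                for m in re.finditer(pattern, text, re.IGNORECASE)]


def mileage_to_int(text):
    """Convert '7,000' or '85K' to an integer number of miles."""
    text = text.replace(",", "")
    if text[-1] in "kK":
        return int(text[:-1]) * 1000
    return int(text)


# Assumed patterns for illustration only.
mileage = DataFrame(
    name="Mileage",
    value_patterns=[r"\b[1-9]\d{0,2},\d{3}\b", r"\b[1-9]\d{0,2}[kK]\b"],
    keyword_patterns=[r"\bmiles\b", r"\bmileage\b"],
    to_internal=mileage_to_int,
)

value_hits = mileage.find_values("only 7,000 miles on her")
keyword_hits = mileage.find_keywords("only 7,000 miles on her")
```

Keeping the recognizers, the context keywords, and the conversion together is what makes this an "abstract data type with an extended framework" rather than a bare regular expression.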
18Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
19?
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm
23Digital Camera
Olympus C-750 Ultra Zoom
Sensor Resolution: 4.2 megapixels
Optical Zoom: 10 x
Digital Zoom: 4 x
Installed Memory: 16 MB
Lens Aperture: F/8-2.8/3.7
Focal Length min: 6.3 mm
Focal Length max: 63.0 mm
24?
Year: 2002
Make: Ford
Model: Thunderbird
Mileage: 5,500 miles
Features: Red, ABS, 6 CD changer, keyless entry
Price: 33,000
Phone: (916) 972-9117
28Car Advertisement
Year: 2002
Make: Ford
Model: Thunderbird
Mileage: 5,500 miles
Features: Red, ABS, 6 CD changer, keyless entry
Price: 33,000
Phone: (916) 972-9117
29?
Flight     Class  From  Time/Date          To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm 16 06 06   CDG  7:35 am 17 06 06   0
Delta 119  Coach  CDG   10:20 am 24 06 06  JFK  1:00 pm 24 06 06   0
30?
Flight     Class  From  Time/Date          To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm 02 01 04   CDG  7:35 am 03 01 04   0
Delta 119  Coach  CDG   10:20 am 09 01 04  JFK  1:00 pm 09 01 04   0
31Airline Itinerary
Flight     Class  From  Time/Date          To   Time/Date          Stops
Delta 16   Coach  JFK   6:05 pm 02 01 04   CDG  7:35 am 03 01 04   0
Delta 119  Coach  CDG   10:20 am 09 01 04  JFK  1:00 pm 09 01 04   0
32?
Monday, October 13th
Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1    9
Sweden       2  1  0   5   3    6
North Korea  1  2  0   3   4    3
Nigeria      0  3  0   0  11    0
Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2    7
34World Cup Soccer
Monday, October 13th
Group A      W  L  T  GF  GA  Pts.
USA          3  0  0  11   1    9
Sweden       2  1  0   5   3    6
North Korea  1  2  0   3   4    3
Nigeria      0  3  0   0  11    0
Group B      W  L  T  GF  GA  Pts.
Brazil       2  0  1   8   2    7
35?
Calories: 250 cal
Distance: 2.50 miles
Time: 23.35 minutes
Incline: 1.5 degrees
Speed: 5.2 mph
Heart Rate: 125 bpm
38Treadmill Workout
Calories: 250 cal
Distance: 2.50 miles
Time: 23.35 minutes
Incline: 1.5 degrees
Speed: 5.2 mph
Heart Rate: 125 bpm
39?
Place: Bonnie Lake
County: Duchesne
State: Utah
Type: Lake
Elevation: 10,000 feet
USGS Quad: Mirror Lake
Latitude: 40.711ºN
Longitude: 110.876ºW
42Maps
Place: Bonnie Lake
County: Duchesne
State: Utah
Type: Lake
Elevation: 10,100 feet
USGS Quad: Mirror Lake
Latitude: 40.711ºN
Longitude: 110.876ºW
43Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
44Information Extraction Ontologies
Source
Target
Information Extraction
Information Exchange
45What is an Extraction Ontology?
- Augmented Conceptual-Model Instance
- Object and relationship sets
- Constraints
- Data frame value recognizers
- Robust Wrapper (Ontology-Based Wrapper)
- Extracts information
- Works even when a site changes or when new sites come on-line
46CarAds Extraction Ontology
<ObjectSet x="329" y="51" lexical="true" name="Mileage" id="osmx50">
  <DataFrame>
    <InternalRepresentation>
      <DataType typeName="String"/>
    </InternalRepresentation>
    <ValuePhraseList>
      <ValuePhrase hint="Mileage Pattern 1">
        <ValueExpression color="ffffff">
          <ExpressionText>[1-9]\d{0,2}[kK]</ExpressionText>
        </ValueExpression>
        <LeftContextExpression color="ffffff">
    <KeywordPhraseList>
      <KeywordPhrase hint="New phrase 1">
        <KeywordExpression color="ffffff">
          <ExpressionText>\bmiles\b</ExpressionText>
47Extraction Ontologies: An Example of Semantic Understanding
- Intelligent Symbol Manipulation
- Gives the Illusion of Understanding
- Obtains Meaningful and Useful Results
48Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
49A Variety of Applications
- Information Extraction
- Semantic Web Page Annotation
- Free-Form Semantic Web Queries
- Task Ontologies for Free-Form Service Requests
- High-Precision Classification
- Schema Mapping for Ontology Alignment
- Accessing the Hidden Web
- Ontology Generation
- Challenging Applications (e.g. BioInformatics)
50Application 1: Information Extraction
51Constant/Keyword Recognition
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. 1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
Descriptor/String/Position(start/end)
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
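A rough sketch of how a descriptor/string/position table like the one above could be produced. The recognizer patterns here are assumptions made up for this one ad (the actual extraction ontology's data frames are not shown on the slide), and `recognize` is a hypothetical helper. Positions are reported 1-based and inclusive, which is the convention the slide's numbers follow.

```python
import re

AD = ("'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. "
      "Previous owner heart broken! Asking only $11,995. 1415. "
      "JERRY SEINER MIDVALE, 566-3800 or 566-3888")

# (descriptor, pattern) pairs; all patterns are illustrative assumptions.
# When a pattern has a capture group, group 1 is the extracted constant.
RECOGNIZERS = [
    ("Year", r"'(\d{2})\b"),
    ("Make", r"CHEV"),
    ("Make", r"CHEVY"),
    ("Model", r"Cavalier"),
    ("Feature", r"Red"),
    ("Feature", r"5 spd"),
    ("Mileage", r"[1-9]\d{0,2},\d{3}"),
    ("KEYWORD(Mileage)", r"\bmiles\b"),
    ("Price", r"\$([1-9]\d{0,2},\d{3})"),
    ("PhoneNr", r"\b\d{3}-\d{4}\b"),
]


def recognize(text, recognizers):
    """Return rows (descriptor, string, start, end), 1-based inclusive positions."""
    table = []
    for descriptor, pattern in recognizers:
        for m in re.finditer(pattern, text):
            g = 1 if m.re.groups else 0  # use the capture group if there is one
            table.append((descriptor, m.group(g), m.start(g) + 1, m.end(g)))
    return sorted(table, key=lambda row: (row[2], row[3]))


table = recognize(AD, RECOGNIZERS)
for row in table:
    print("|".join(str(field) for field in row))
```

Note that the table deliberately keeps conflicting candidates (two Make strings, and 11,995 claimed by both Price and Mileage); resolving them is left to the heuristics.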
52Heuristics
- Keyword proximity
- Subsumed and overlapping constants
- Functional relationships
- Nonfunctional relationships
- First occurrence without constraint violation
53Keyword Proximity
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
D = 2
D = 52
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. 1415. JERRY SEINER MIDVALE, 566-3800 or 566-3888
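The two distances on this slide can be reproduced from the start/end positions of the two Mileage candidates and the keyword. The sketch below assumes the distance is the difference between the nearest ends of the two spans; the function names `gap` and `closest_to_keyword` are hypothetical.

```python
def gap(a, b):
    """Distance between two (start, end) spans, measured between their nearest ends."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0  # spans overlap


def closest_to_keyword(candidates, keywords):
    """Pick the (string, start, end) candidate nearest to any keyword span."""
    return min(candidates, key=lambda c: min(gap(c[1:], k) for k in keywords))


# Positions taken from the descriptor table (1-based, inclusive).
mileage_candidates = [("7,000", 38, 42), ("11,995", 100, 105)]
mileage_keywords = [(44, 48)]  # "miles"

d_near = gap((38, 42), (44, 48))
d_far = gap((100, 105), (44, 48))
chosen = closest_to_keyword(mileage_candidates, mileage_keywords)
```

With these positions the candidate 7,000 wins for Mileage, which leaves 11,995 to Price.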
54Subsumed/Overlapping Constants
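A minimal sketch of the subsumption part of this heuristic, assuming the rule is "when one match lies entirely inside a longer match for the same descriptor, keep the longer one". The slide does not spell the rule out, so treat this as one plausible reading; `drop_subsumed` is a hypothetical name.

```python
def drop_subsumed(rows):
    """Remove rows whose span lies inside a longer span with the same descriptor.

    Each row is (descriptor, string, start, end) with inclusive positions.
    """
    kept = []
    for descriptor, string, start, end in rows:
        subsumed = any(
            other_desc == descriptor
            and other_start <= start
            and end <= other_end
            and (other_start, other_end) != (start, end)
            for other_desc, _, other_start, other_end in rows
        )
        if not subsumed:
            kept.append((descriptor, string, start, end))
    return kept


rows = [
    ("Make", "CHEV", 5, 8),
    ("Make", "CHEVY", 5, 9),
    ("Model", "Cavalier", 11, 18),
]
resolved = drop_subsumed(rows)
```

Overlaps between different descriptors, such as Price and Mileage both claiming 11,995, are not handled here; the keyword-proximity heuristic covers that case.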
55Functional Relationships
56Nonfunctional Relationships
57First Occurrence without Constraint Violation
58Database-Instance Generator
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
insert into Car values(1001, "97", "CHEVY", "Cavalier", "7,000", "11,995", "566-3800")
insert into CarFeature values(1001, "Red")
insert into CarFeature values(1001, "5 spd")
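As a sketch of the generation step, the resolved values can be written into relational tables with parameterized inserts. The table and column names below follow the Car and CarFeature relations used in the slide's insert statements, but the column types, the `store_car` helper, and the use of SQLite are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Car (Car integer primary key, Year text, Make text, "
    "Model text, Mileage text, Price text, PhoneNr text)"
)
conn.execute("create table CarFeature (Car integer, Feature text)")


def store_car(conn, car_id, single_valued, features):
    """Insert one car: single-valued attributes in Car, many-valued ones in CarFeature."""
    columns = ("Year", "Make", "Model", "Mileage", "Price", "PhoneNr")
    conn.execute(
        "insert into Car values (?, ?, ?, ?, ?, ?, ?)",
        (car_id, *(single_valued.get(c) for c in columns)),
    )
    conn.executemany(
        "insert into CarFeature values (?, ?)",
        [(car_id, feature) for feature in features],
    )


store_car(
    conn,
    1001,
    {"Year": "97", "Make": "CHEVY", "Model": "Cavalier",
     "Mileage": "7,000", "Price": "11,995", "PhoneNr": "566-3800"},
    ["Red", "5 spd"],
)

car_row = conn.execute("select * from Car").fetchone()
feature_rows = conn.execute("select Feature from CarFeature order by rowid").fetchall()
```

Attributes that were not extracted are simply stored as NULL, since `single_valued.get` returns `None` for missing keys.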
59Application 2: Semantic Web Page Annotation
60Annotated Web Page
61OWL
<owl:Class rdf:ID="CarAds">
  <rdfs:label xml:lang="en">CarAds</rdfs:label>
  ......
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="hasMileage" />
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="hasMileage" />
      <owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:maxCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="hasMileage" />
      <owl:allValuesFrom rdf:resource="Mileage" />

<CarAds rdf:ID="CarAdsIns2">
  <CarAdsValue rdf:datatype="&xsd;string">2</CarAdsValue>
</CarAds>

<Mileage rdf:ID="MileageIns2">
  <StartingCharPosition rdf:datatype="&xsd;nonNegativeInteger">237</StartingCharPosition>
  <EndingCharPosition rdf:datatype="&xsd;nonNegativeInteger">241</EndingCharPosition>
</Mileage>
.
<owl:Thing rdf:about="CarAdsIns2">
  <hasMake rdf:resource="MakeIns2" />
  <hasModel rdf:resource="ModelIns2" />
  <hasYear rdf:resource="YearIns2" />
  <hasMileage rdf:resource="MileageIns2" />
  <hasPhoneNr rdf:resource="PhoneNrIns2" />
  <hasPrice rdf:resource="PriceIns2" />
</owl:Thing>
62Application 3: Free-Form Semantic Web Queries
63Step 1. Parse Query
Find me the price and mileage of all red Nissans. I want a 1998 or newer.
Recognized: price, mileage, red, Nissan, 1998, or newer
or newer: >= Operator
64Step 2. Find Corresponding Ontology
Find me the price and mileage of all red Nissans. I want a 1998 or newer.
or newer: >= Operator
Similarity value: 6
Similarity value: 2
65Step 3. Formulate XQuery Expression
- Conjunctive queries run over the selected ontology's extracted values
66Step 3. Formulate XQuery Expression
- Value-phrase-matching words determine conditions
- Conditions
- Color = red
- Make = Nissan
- Year >= 1998 (>= Operator)
67Step 3. Formulate XQuery Expression
for $doc in document("file:///c:/ontos/owlLib/Car.OWL")/rdf:RDF
for $Record in $doc/owl:Thing

let $id := substring-after(xs:string($Record/@rdf:about), "CarIns")
let $Color := $doc/car:Color[@rdf:ID=concat("ColorIns", $id)]/car:ColorValue/text()
let $Make := $doc/car:Make[@rdf:ID=concat("MakeIns", $id)]/car:MakeValue/text()
let $Year := $doc/car:Year[@rdf:ID=concat("YearIns", $id)]/car:YearValue/text()
let $Price := $doc/car:Price[@rdf:ID=concat("PriceIns", $id)]/car:PriceValue/text()
let $Mileage := $doc/car:Mileage[@rdf:ID=concat("MileageIns", $id)]/car:MileageValue/text()

where ($Color="red" or empty($Color)) and
      ($Make="Nissan" or empty($Make)) and
      ($Year>="1998" or empty($Year))
return <Record ID="{$id}">
         <Price>{$Price}</Price>
         <Mileage>{$Mileage}</Mileage>
         <Color>{$Color}</Color>
         <Make>{$Make}</Make>
         <Year>{$Year}</Year>
       </Record>

For each owl:Thing
Get the instance ID and extracted values
Check conditions
Return values
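The four annotated steps (iterate over instances, collect extracted values, check conditions, return values) can be mirrored in a few lines of Python. This sketch is not the XQuery itself: the sample records are invented for illustration, years are compared as integers rather than strings, and `satisfies` is a hypothetical helper. What it preserves is the query's "matches, or was not extracted" condition semantics.

```python
import operator


def satisfies(record, conditions):
    """True if every condition holds or the attribute is missing from the record.

    Mirrors the query's `(value matches) or empty(value)` pattern: a record is
    only rejected when a value was extracted and contradicts the request.
    """
    for attribute, compare, wanted in conditions:
        value = record.get(attribute)
        if value is not None and not compare(value, wanted):
            return False
    return True


# Conditions derived from "red Nissans ... 1998 or newer".
conditions = [
    ("Color", operator.eq, "red"),
    ("Make", operator.eq, "Nissan"),
    ("Year", operator.ge, 1998),
]

# Invented sample records standing in for an ontology's extracted values.
records = [
    {"id": "1", "Color": "red", "Make": "Nissan", "Year": 1999, "Price": "8,500"},
    {"id": "2", "Color": "blue", "Make": "Nissan", "Year": 2001, "Price": "9,900"},
    {"id": "3", "Make": "Nissan", "Year": 1998, "Price": "7,200"},  # no Color extracted
    {"id": "4", "Color": "red", "Make": "Nissan", "Year": 1995, "Price": "3,000"},
]

matching_ids = [r["id"] for r in records if satisfies(r, conditions)]
```

Record 3 is returned even though its color is unknown, which trades some precision for recall when extraction is incomplete.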
68Step 4. Run XQuery Expression Over Ontology's Extracted Data
- Uses Qexo 1.7, GNU's XQuery engine for Java
- Uses XSLT to transform results to an HTML table
69Application 4: Task Ontologies for Free-Form Service Requests
70Example Appointment Request
71Example Car Purchase Request
72Example Apartment Request
73Application 5: High-Precision Classification
74An Extraction Ontology Solution
75Density Heuristic
76Expected Values Heuristic
77Vector Space of Expected Values
Object set   OV     D1   D2
- Year       0.98   16    6
- Make       0.93   10    0
- Model      0.91   12    0
- Mileage    0.45    6    2
- Price      0.80   11    8
- Feature    2.10   29    0
- PhoneNr    1.15   15   11
Similarity to OV: D1 = 0.996, D2 = 0.567
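The slide does not name the similarity measure, but cosine similarity between the expected-values vector OV and each document vector reproduces the reported 0.996 and 0.567, so the sketch below assumes that is what is being computed. The function name `cosine` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Order: Year, Make, Model, Mileage, Price, Feature, PhoneNr
ov = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]  # expected occurrences per record
d1 = [16, 10, 12, 6, 11, 29, 15]                 # counts found in document D1
d2 = [6, 0, 0, 2, 8, 0, 11]                      # counts found in document D2

sim_d1 = round(cosine(ov, d1), 3)
sim_d2 = round(cosine(ov, d2), 3)
print(sim_d1, sim_d2)
```

Because cosine similarity ignores vector length, a page with many car ads and a page with a single ad score alike as long as the proportions of object sets match expectations.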
78Grouping Heuristic
79Grouping
Car Ads
Year, Year, Make, Model: 3
Price, Year, Model, Year: 3
Make, Model, Mileage, Year: 4
Model, Mileage, Price, Year: 4
Grouping: 0.875

Sale Items
Year, Year, Year, Mileage: 2
Mileage, Year, Price, Price: 3
Year, Price, Price, Year: 2
Price, Price, Price, Price: 1
Grouping: 0.500

Expected Number in Group = floor(sum of Ave over the 1-Max object sets) = 4 (for our example)
Grouping = (Sum of Distinct 1-Max Object Sets in each Group) / (Number of Groups × Expected Number in a Group)
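The grouping measure can be checked against both examples with a short sketch. The function names are hypothetical, and the averages used for the expected group size are the Year, Make, Model, Mileage, and Price values from the expected-values table (the 1-Max object sets, on the assumption that Feature and PhoneNr are excluded because they can occur more than once per ad).

```python
import math


def expected_group_size(averages):
    """floor of the summed average occurrences of the 1-Max object sets."""
    return math.floor(sum(averages))


def grouping_score(sequence, group_size):
    """Distinct object sets per consecutive group, normalized by the maximum possible."""
    groups = [sequence[i:i + group_size] for i in range(0, len(sequence), group_size)]
    distinct = sum(len(set(group)) for group in groups)
    return distinct / (len(groups) * group_size)


size = expected_group_size([0.98, 0.93, 0.91, 0.45, 0.80])

car_ads = ["Year", "Year", "Make", "Model",
           "Price", "Year", "Model", "Year",
           "Make", "Model", "Mileage", "Year",
           "Model", "Mileage", "Price", "Year"]
sale_items = ["Year", "Year", "Year", "Mileage",
              "Mileage", "Year", "Price", "Price",
              "Year", "Price", "Price", "Year",
              "Price", "Price", "Price", "Price"]

car_score = grouping_score(car_ads, size)
sale_score = grouping_score(sale_items, size)
```

A page of real car ads tends to cycle through different object sets within each group, while an unrelated page that merely contains years and prices repeats the same few, which is what the lower score captures.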
80Application 6: Schema Mapping for Ontology Alignment
81Problem: Different Schemas
- Target Database Schema
- {Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
- Different Source Table Schemas
- {Run #, Yr, Make, Model, Tran, Color, Dr}
- {Make, Model, Year, Colour, Price, Auto, Air Cond., AM/FM, CD}
- {Vehicle, Distance, Price, Mileage}
- {Year, Make, Model, Trim, Invoice/Retail, Engine, Fuel Economy}
82Solution: Remove Internal Factoring
Discover Nesting: Make, (Model, (Year, Colour, Price, Auto, Air Cond, AM/FM, CD))
83Solution: Replace Boolean Values
84Solution: Form Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, 6300>, <Auto, Auto>, <Air Cond., Air Cond.>, <AM/FM, AM/FM>, <CD, >
85Solution: Adjust Attribute-Value Pairs
<Make, Honda>, <Model, Civic EX>, <Year, 1995>, <Colour, White>, <Price, 6300>, <Auto>, <Air Cond>, <AM/FM>
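The three table-preparation steps (replace Boolean values, form attribute-value pairs, adjust the pairs) can be sketched as below. The source cell values "Yes" and "No" and all function names are assumptions; the slides only show the results of each step.

```python
BOOLEAN_COLUMNS = ("Auto", "Air Cond.", "AM/FM", "CD")


def replace_booleans(row, boolean_columns=BOOLEAN_COLUMNS):
    """Replace an affirmative cell with its column name and anything else with ''."""
    return {
        attr: ((attr if value == "Yes" else "") if attr in boolean_columns else value)
        for attr, value in row.items()
    }


def form_pairs(row):
    """Turn a row into a list of (attribute, value) pairs."""
    return list(row.items())


def adjust_pairs(pairs, boolean_columns=BOOLEAN_COLUMNS):
    """Collapse <Auto, Auto> to <Auto> and drop empty Boolean pairs such as <CD, >."""
    adjusted = []
    for attr, value in pairs:
        if attr in boolean_columns:
            if value:
                adjusted.append((attr,))
        else:
            adjusted.append((attr, value))
    return adjusted


# Assumed source row; "Yes"/"No" stand in for whatever the table actually uses.
row = {"Make": "Honda", "Model": "Civic EX", "Year": "1995", "Colour": "White",
       "Price": "6300", "Auto": "Yes", "Air Cond.": "Yes", "AM/FM": "Yes", "CD": "No"}

pairs = form_pairs(replace_booleans(row))
adjusted = adjust_pairs(pairs)
```

After adjustment the Boolean columns read like ordinary feature strings, so the same extraction ontology that recognizes features in free text can be applied to them.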
86Solution: Do Extraction
87Solution: Infer Mappings
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
89Solution: Do Extraction
π Price Table
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
90Solution: Do Extraction
ρ Colour←Feature π Colour Table
∪ ρ Auto←Feature π Auto β(Yes,) Auto Table
∪ ρ Air Cond.←Feature π Air Cond. β(Yes,) Air Cond. Table
∪ ρ AM/FM←Feature π AM/FM β(Yes,) AM/FM Table
∪ ρ CD←Feature π CD β(Yes,) CD Table
{Car, Year, Make, Model, Mileage, Price, PhoneNr}, {PhoneNr, Extension}, {Car, Feature}
91Application 7: Accessing the Hidden Web
92Obtaining Data Behind Forms
- Web information is stored in databases
- Databases are accessed through forms
- Forms are designed in various ways
93Hidden Web Extraction System
Find green cars costing no more than 9000.
Site Form
User Query
Input Analyzer
Application Extraction Ontology
Extracted Information
Retrieved Page(s)
Output Analyzer
94Application 8: Ontology Generation
95TANGO: Table Analysis for Generating Ontologies
- Recognize and normalize table information
- Construct mini-ontologies from tables
- Discover inter-ontology mappings
- Merge mini-ontologies into a growing ontology
96Recognize Table Information
                                 Religion
             Population          Albanian            Roman     Shia    Sunni
Country      (July 2001 est.)    Orthodox   Muslim   Catholic  Muslim  Muslim  other
Afghanistan  26,813,057                                        15      84      1
Albania      3,510,484           20         70       10
97Construct Mini-Ontology
98Discover Mappings
99Merge
100Application 9: Challenging Applications (e.g. BioInformatics)
101Large Extraction Ontologies
102Complex Semi-Structured Pages
103Additional Analysis Opportunities
- Sibling Page Comparison
- Semi-automatic Lexicon Update
- Seed Ontology Recognition
104Sibling Page Comparison
105Sibling Page Comparison
Attributes
106Sibling Page Comparison
107Sibling Page Comparison
108Semi-automatic Lexicon Update
Additional Source Species or Organisms
Additional Protein Names
109Seed Ontology Recognition
Homo sapiens human
nucleus zinc ion binding nucleic acid binding
9606
Eukaryota Metazoa Chordata Craniata Vertebrata Euteleostomi Mammalia Eutheria Primates Catarrhini Hominidae Homo
zinc ion binding nucleic acid binding
NP_079345
nucleus
linear
NP_079345
FLJ14299
GTTTTTGTGTT.ATAAGTGCATTAACGGCCCACATG
msdspagsnprtpessgsgsggtagpyyspyalygqrlasasalgyq
8 eight
8?p\s?12 8?p11.2 8?p11.23
hypothetical protein FLJ14299
37,?612,?680
37,?610,?585
110Seed Ontology Recognition
111Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
112Limitations and Pragmatics
- Data-Rich, Narrow Domain
- Ambiguities: Context Assumptions
- Incompleteness: Implicit Information
- Common Sense Requirements
- Knowledge Prerequisites
113Busiest Airport?
Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Intl.)
116Busiest Airport?
Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Intl.)
Ambiguous: Whom do we trust? (How do they count?)
117Busiest Airport?
Chicago - 928,735 Landings (Nat. Air Traffic Controllers Assoc.)
        - 931,000 Landings (Federal Aviation Admin.)
Atlanta - 58,875,694 Passengers (Sep., latest numbers available)
Memphis - 2,494,190 Metric Tons (Airports Council Intl.)
Important qualification
118Dow Jones Industrial Average
            High       Low        Last       Chg
30 Indus    10527.03   10321.35   10409.85   85.18
20 Transp   3038.15    2998.60    3008.16    9.83
15 Utils    268.78     264.72     266.45     1.72
66 Stocks   3022.31    2972.94    2993.12    19.65
Graphics, Icons,
120Presentation Outline
- Grand Challenge
- Meaning, Knowledge, Information, Data
- Fun and Games with Data
- Information Extraction Ontologies
- Applications
- Limitations and Pragmatics
- Summary and Challenges
121Some Key Ideas
- Data, Information, and Knowledge
- Data Frames
- Knowledge about everyday data items
- Recognizers for data in context
- Ontologies
- Resilient Extraction Ontologies
- Shared Conceptualizations
- Limitations and Pragmatics
122Some Research Issues
- Building a library of open source data recognizers
- Precisely finding and gathering relevant information
- Subparts of larger data
- Scattered data (linked, factored, implied)
- Data behind forms in the hidden web
- Improving concept matching
- Heuristic orchestration
- Application of NLP techniques
- Calculations, unit conversions, data normalization, ...
- Achieving the potential of the presented applications
www.deg.byu.edu