Semi-Automatically Generating Data-Extraction Ontology

Slides: 12
Provided by: drdavid59
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes


1
Semi-Automatically Generating Data-Extraction
Ontology
  • Yihong Ding
  • March 6, 2001

2
Extract information from Web document
-------------------------------------------------------------------------
-- Cars Application Ontology
--
-- $Revision: 1.2 $
--
-- $Log: cars.osm,v $
-- Revision 1.2  1998/02/20 00:15:55  liddl
-- Cleaned up header
--
-- Revision 1.1  1998/02/20 00:14:14  liddl
-- Initial revision
--
Car [-> object];
Car [0:1] has Year [1:*];
Year matches [4]
  constant
    { extract "\d{2}";
      context "([^\$\d]|^)[4-9]\d[^,\dkK]";
      substitute "^" -> "19"; },
    { extract "\d{2}";
      context "([^\$\d]|^)[4-9]\d,[^\d]";
      substitute "^" -> "19"; },
    { extract "\d{2}";
      context "\b'[4-9]\d\b";
      substitute "^" -> "19"; },
    { extract "\d{2}";
      context "([^\$\d]|^)0\d[^,\dkK]";
      substitute "^" -> "20"; },
3
Ontology
Car [-> object];
Car [0:1] has Make [1:*];
Make matches [10]
  constant
    { extract "\baudi\b"; }
end;
Car [0:1] has Model [1:*];
Model matches [25]
  constant
    { extract "80";
      context "\baudi\S*\s*80\b"; }
end;
Car [0:1] has Mileage [1:*];
Mileage matches [8]
  constant
    { extract "\b[1-9]\d{0,2}k";
      substitute "[kK]" -> "000"; }
end;
Car [0:1] has Price [1:*];
Price matches [8]
  constant
    { extract "[1-9]\d{3,6}";
      context "\$[1-9]\d{3,6}"; }
end;
  • A computational entity: a resource containing
    knowledge about what concepts exist in the
    world and how they relate to one another
  • Components
      • Concepts
          • Domain dependent: context free or context sensitive
          • Domain independent: context free or context sensitive
      • Relationships (relational schema between the
        concepts)
      • Constraints
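
The extract/context/substitute rules on the Cars ontology slides can be read as small regular-expression pipelines. The following is a minimal Python sketch of that reading, not the author's implementation. It assumes that `context` locates a candidate span, `extract` pulls the value out of that span, and `substitute` rewrites the extracted value. The helper `apply_rule`, the sample ad, and the exact patterns are illustrative assumptions.

```python
import re

def apply_rule(text, extract, context=None, substitute=None):
    """Apply one extract/context/substitute rule and return the values found.

    Assumed semantics (not confirmed by the slides): `context` narrows the
    text to candidate spans, `extract` matches inside each span, and
    `substitute` is a (pattern, replacement) pair applied to each value.
    """
    if context:
        spans = [m.group(0) for m in re.finditer(context, text)]
    else:
        spans = [text]
    values = []
    for span in spans:
        for m in re.finditer(extract, span):
            value = m.group(0)
            if substitute:
                pattern, replacement = substitute
                value = re.sub(pattern, replacement, value, count=1)
            values.append(value)
    return values

# Hypothetical car ad used only for illustration.
ad = "Audi 80 95, white, 72k miles, $4500"

# Two-digit year followed by a comma; "^" -> "19" prepends the century.
year = apply_rule(ad, r"\d{2}",
                  context=r"([^\$\d]|^)[4-9]\d,[^\d]",
                  substitute=(r"^", "19"))

# Price: digits are extracted only where a dollar sign precedes them.
price = apply_rule(ad, r"[1-9]\d{3,6}", context=r"\$[1-9]\d{3,6}")

# Mileage: "72k" is expanded to "72000".
mileage = apply_rule(ad, r"\b[1-9]\d{0,2}k", substitute=(r"[kK]", "000"))

print(year, price, mileage)
```

Keeping `context` separate from `extract` lets a rule demand surrounding evidence (a dollar sign, a trailing comma) without including that evidence in the extracted value.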

4
My work
  • Assumptions
      • Given: a knowledge base that already contains
        domain-dependent and domain-independent concepts
          • Pre-defined ontologies: Mikrokosmos, Gene,
            our ontologies, etc.
          • Component recognizers: date, time, price,
            phone number, etc.
      • Given: sample training Web documents
  • Semi-automatically generate the ontology
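
The component recognizers listed above (date, time, price, phone number) could be approximated as a table of named regular expressions. This is a rough Python sketch under that assumption. The slides name only the categories, so every pattern, the `RECOGNIZERS` table, and the `recognize` helper are hypothetical.

```python
import re

# Hypothetical recognizers: the slides list the categories only, so these
# patterns are illustrative and cover just a few common US-style formats.
RECOGNIZERS = {
    "Date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "Time": re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b", re.IGNORECASE),
    "Price": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "PhoneNumber": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}

def recognize(text):
    """Return (concept, value) pairs in the order they occur in the text."""
    found = []
    for name, pattern in RECOGNIZERS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), name, m.group(0)))
    return [(name, value) for _, name, value in sorted(found)]

sample = "Open house 3/6/2001 at 2:30 PM, asking $12,500, call (801) 555-0100."
hits = recognize(sample)
for name, value in hits:
    print(name, "->", value)
```

Because such recognizers are domain independent, the same table could be reused across training documents from different application areas.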

5
Architecture
6
Example CIA Factbook
  • Country: China
  • Location: Eastern Asia, bordering the East China
    Sea, Korea Bay, Yellow Sea, and South China Sea,
    between North Korea and Vietnam
  • Geographic coordinates: 35 00 N, 105 00 E
  • Map references: Asia
  • Area:
  • total: 9,596,960 sq km
  • land: 9,326,410 sq km
  • water: 270,550 sq km

7
Partially completed ontology
  • CountryName matches [30]
  • constant { extract "\bChina\b"; },
  • { extract "\bUnited States\b"; }
  • end;
  • Location matches [50]
  • constant { extract "\bAsia\b"; },
  • { extract "\bEurope\b"; },
  • { extract "\bYellow Sea\b"; },
  • end;
  • Latitude matches [10]
  • constant { extract "\b[1-9]\d{0,2}\b[1-9]\d{0,1}(N|S)"; },
  • end;
  • Longitude matches [10]
  • constant { extract "\b[1-9]\d{0,2}\b[1-9]\d{0,1}(E|W)"; },
  • end;
  • Country: China
  • Location: Eastern Asia, bordering the East China
    Sea, Korea Bay, Yellow Sea, and South China Sea,
    between North Korea and Vietnam
  • Geographic coordinates: 35 00 N, 105 00 E
  • Map references: Asia
  • Area:
  • total: 9,596,960 sq km
  • land: 9,326,410 sq km
  • water: 270,550 sq km

8
Raw completed ontology
  • Country: China
  • Location: Eastern Asia, bordering the East China
    Sea, Korea Bay, Yellow Sea, and South China Sea,
    between North Korea and Vietnam
  • Geographic coordinates: 35 00 N, 105 00 E
  • Map references: Asia
  • Area:
  • total: 9,596,960 sq km
  • land: 9,326,410 sq km
  • water: 270,550 sq km
  • Country [-> object];
  • Country [0:1] has CountryName [1:1];
  • Country [0:1] has Location1 [1:*];
  • ...
  • Country [0:1] has Location8 [1:*];
  • Country [0:1] has Latitude [1:*];
  • Country [0:1] has Longitude [1:*];
  • Country [0:1] has Number1 [1:*];
  • Country [0:1] has Number2 [1:*];
  • Country [0:1] has Number3 [1:*];
  • -- Generalization/Specializations
  • Location1 : Location;
  • ...
  • Location8 : Location;
  • Number1 : Number;
  • Number2 : Number;
  • Number3 : Number;
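
The numbering step behind the raw completed ontology (Location1 through Location8, Number1 through Number3) can be sketched as follows. This is a minimal Python illustration, assuming the recognizers have already produced one concept name per matched value in document order. The function `number_attributes` and the `recognized` list are hypothetical, and the printed lines omit participation constraints.

```python
from collections import Counter

def number_attributes(concepts):
    """Turn recognized concepts into attribute names and generalizations.

    A concept that occurs once keeps its name. A concept that occurs several
    times yields numbered attributes (Location1, Location2, ...), each
    recorded as a specialization of the original concept.
    """
    totals = Counter(concepts)
    seen = Counter()
    attributes, generalizations = [], []
    for concept in concepts:
        if totals[concept] == 1:
            attributes.append(concept)
            continue
        seen[concept] += 1
        numbered = f"{concept}{seen[concept]}"
        attributes.append(numbered)
        generalizations.append((numbered, concept))
    return attributes, generalizations

# Concepts recognized in the China sample, in document order (assumed).
recognized = (["CountryName"] + ["Location"] * 7
              + ["Latitude", "Longitude"] + ["Location"] + ["Number"] * 3)

attributes, generalizations = number_attributes(recognized)
for name in attributes:
    print(f"Country has {name}")
for special, general in generalizations:
    print(f"{special} is a {general}")
```

Attributes appear in the order their values are encountered, so Location8 (from the map reference) follows Latitude and Longitude here.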

9
User control interface
  • Output to user
  • raw completed ontology
  • tagged training web pages
  • the query results
  • User may
  • modify attribute name
  • combine attributes
  • delete useless attributes
  • change relationships
  • add new attributes, new relations, and
    constraints
  • When satisfied, output the final ontology
  • Country: China {CountryName}
  • Location: Eastern Asia {Location1}, bordering the
    East China Sea {Location2}, Korea Bay
    {Location3}, Yellow Sea {Location4}, and South
    China Sea {Location5}, between North Korea
    {Location6}, and Vietnam {Location7}
  • Geographic coordinates: 35 00 N {Latitude}, 105
    00 E {Longitude}
  • Map references: Asia {Location8}
  • Area:
  • total: 9,596,960 {Number1} sq km
  • land: 9,326,410 {Number2} sq km
  • water: 270,550 {Number3} sq km
  • Country: China {CountryName}
  • Location: Eastern Asia {Location1}, bordering the
    East China Sea {Location2}, Korea Bay
    {Location3}, Yellow Sea {Location4}, and South
    China Sea {Location5}, between North Korea
    {Location6}, and Vietnam {Location7}
  • Geographic coordinates: 35 00 N {Latitude}, 105
    00 E {Longitude}
  • Map references: Asia {MapReference}
  • Area:
  • total: 9,596,960 {TotalArea} sq km
  • land: 9,326,410 {LandArea} sq km
  • water: 270,550 {WaterArea} sq km
  • Country: China {CountryName}
  • Location: Eastern Asia, bordering the East China
    Sea, Korea Bay, Yellow Sea, and South China Sea,
    between North Korea, and Vietnam {Location}
  • Geographic coordinates: 35 00 N {Latitude}, 105
    00 E {Longitude}
  • Map references: Asia {MapReference}
  • Area:
  • total: 9,596,960 {TotalArea} sq km
  • land: 9,326,410 {LandArea} sq km
  • water: 270,550 {WaterArea} sq km
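
The user edits illustrated on this slide (renaming Number1 to TotalArea, then combining Location1 through Location7 into a single Location) can be modeled as simple operations on the attribute list. This is a hedged Python sketch. The helpers `rename` and `combine` are hypothetical names, and the real interface presumably also updates relationships and tagged pages, which is not modeled here.

```python
def rename(attributes, old, new):
    """Return a copy of the attribute list with `old` renamed to `new`."""
    return [new if name == old else name for name in attributes]

def combine(attributes, to_merge, new):
    """Replace every attribute in `to_merge` with one attribute `new`.

    The combined attribute takes the position of the first merged one.
    """
    merged = set(to_merge)
    result = []
    for name in attributes:
        if name in merged:
            if new not in result:
                result.append(new)
        else:
            result.append(name)
    return result

# Raw attribute names for the China sample (assumed document order).
raw = (["CountryName"] + [f"Location{i}" for i in range(1, 8)]
       + ["Latitude", "Longitude", "Location8",
          "Number1", "Number2", "Number3"])

edited = rename(raw, "Location8", "MapReference")
edited = rename(edited, "Number1", "TotalArea")
edited = rename(edited, "Number2", "LandArea")
edited = rename(edited, "Number3", "WaterArea")
final = combine(edited, [f"Location{i}" for i in range(1, 8)], "Location")
print(final)
```

Both helpers return new lists rather than mutating their input, so each intermediate state of the ontology stays available if the user wants to undo a step.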

10
Problems
  • Obtain knowledge base
  • Classify related concepts for the sample
    documents
  • Refine
  • Tag the document based on the raw completed
    ontology
  • User interface design and control
  • Strategy for updating the raw completed ontology
    based on user modifications

11
Contribution
  • Exploit existing knowledge
  • Semi-automatically generate an extraction ontology