Ontology Based Extraction of RDF Data from the World Wide Web

Provided by: deg7
Learn more at: https://www.deg.byu.edu

1
Ontology Based Extraction of RDF Data from the World Wide Web
  • Tim Chartrand
  • Master's Thesis
  • Research Supported By NSF

2
Introduction
  • World Wide Web
    • Has a huge amount of existing information
    • Designed primarily for human consumption
  • Semantic Web
    • Is an extension of WWW
    • Gives information a well-defined meaning
    • Allows automation of tasks
  • DEG contribution: Extract data from the WWW
  • Solution
    • Extract Semantic Web data from the WWW
    • Superimpose extracted data

3
Research Overview

4
RDF: What is it?
  • Resource Description Framework
  • Language of the Semantic Web
  • Set of <subject> <predicate> <object> triples
  • <mailto:tim@cs.byu.edu> <genealogy:age> 25
  • <mailto:tim@cs.byu.edu> <genealogy:fatherOf>
    <mailto:tyler@thechartrands.com>
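
As an illustration of the triple model, the two statements on this slide can be held as plain subject/predicate/object tuples. This is only a sketch: the `Triple` type and the `objects` helper are hypothetical names introduced here, not part of the thesis software.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The two example statements from the slide.
triples = [
    Triple("mailto:tim@cs.byu.edu", "genealogy:age", "25"),
    Triple("mailto:tim@cs.byu.edu", "genealogy:fatherOf",
           "mailto:tyler@thechartrands.com"),
]


def objects(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


print(objects(triples, "mailto:tim@cs.byu.edu", "genealogy:age"))
```

Because every statement has the same three-part shape, answering a question such as "what is Tim's age?" reduces to filtering on the first two positions.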

5
DAML
  • Core Concepts
    • daml:Class defines a class
    • daml:Property defines a binary relation, has a
      value
    • rdfs:domain specifies class to which a property
      applies
    • rdfs:range specifies possible values of a
      property
    • daml:UniqueProperty, daml:UnambiguousProperty
      specify cardinality constraints for a property

6
Example Ontology
. . .
<daml:Class rdf:ID="Program">
  <rdfs:label>Program</rdfs:label>
</daml:Class>
<daml:Class rdf:ID="OperatingSystem">
  <rdfs:label>OperatingSystem</rdfs:label>
</daml:Class>
. . .
<daml:DatatypeProperty rdf:ID="Name">
  <rdf:type rdf:resource="daml:UniqueProperty"/>
  <rdf:type rdf:resource="daml:UnambiguousProperty"/>
  <rdfs:domain rdf:resource="Program"/>
  <rdfs:range rdf:resource="rdfs:Literal"/>
</daml:DatatypeProperty>
<daml:Property rdf:ID="supportsOperatingSystem">
  <rdfs:domain rdf:resource="Program"/>
  <rdfs:range rdf:resource="OperatingSystem"/>
</daml:Property>
. . .
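
To show how declarations like these can be read programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a complete RDF/XML document: the `rdf:RDF` wrapper, the namespace declarations, and the `#` prefix on resource references are additions made for this example, since the slide only shows a fragment.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF, RDF Schema, and DAML+OIL (March 2001).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "daml": "http://www.daml.org/2001/03/daml+oil#",
}

# Assumed complete document; the slide shows only the inner declarations.
ONTOLOGY = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:daml="http://www.daml.org/2001/03/daml+oil#">
  <daml:Class rdf:ID="Program"/>
  <daml:Class rdf:ID="OperatingSystem"/>
  <daml:Property rdf:ID="supportsOperatingSystem">
    <rdfs:domain rdf:resource="#Program"/>
    <rdfs:range rdf:resource="#OperatingSystem"/>
  </daml:Property>
</rdf:RDF>"""

# ElementTree addresses namespaced attributes as "{uri}local".
RDF_ID = f"{{{NS['rdf']}}}ID"
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

root = ET.fromstring(ONTOLOGY)

# Class names, in document order.
classes = [c.get(RDF_ID) for c in root.findall("daml:Class", NS)]

# Property name -> (domain, range).
properties = {
    p.get(RDF_ID): (
        p.find("rdfs:domain", NS).get(RDF_RESOURCE),
        p.find("rdfs:range", NS).get(RDF_RESOURCE),
    )
    for p in root.findall("daml:Property", NS)
}

print(classes)
print(properties)
```

A generic XML parser is enough here because the example only reads element names and attributes; a full RDF toolkit would be needed to handle the other ways RDF/XML can serialize the same statements.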

7
DAML → OSM
  • Class → Non-lexical object set
  • Property → Binary relationship set between object
    sets
  • Literal property → Lexical object set and binary
    relationship set between non-lexical and lexical
    object sets
  • Cardinality restriction → Participation constraint
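
The four mapping rules above can be sketched as a small conversion routine. This is an illustration under stated assumptions, not the thesis algorithm: the `DamlProperty` and `OsmModel` types are hypothetical, and the `"0:1"` / `"0:*"` strings are an assumed min:max notation for participation constraints.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class DamlProperty:
    name: str
    domain: str
    range: str                  # a class name, or "rdfs:Literal"
    unique: bool = False        # typed as daml:UniqueProperty
    unambiguous: bool = False   # typed as daml:UnambiguousProperty


@dataclass
class OsmModel:
    nonlexical: Set[str] = field(default_factory=set)
    lexical: Set[str] = field(default_factory=set)
    # (name, from_set, to_set, from_constraint, to_constraint)
    relationships: List[Tuple[str, str, str, str, str]] = field(default_factory=list)


def daml_to_osm(classes: List[str], properties: List[DamlProperty]) -> OsmModel:
    model = OsmModel()
    # Rule 1: each class becomes a non-lexical object set.
    model.nonlexical.update(classes)
    for p in properties:
        if p.range == "rdfs:Literal":
            # Rule 3: a literal property yields a lexical object set
            # plus a relationship set linking it to the domain.
            model.lexical.add(p.name)
            target = p.name
        else:
            # Rule 2: any other property is a binary relationship set.
            target = p.range
        # Rule 4: cardinality restrictions become participation constraints.
        from_c = "0:1" if p.unique else "0:*"
        to_c = "0:1" if p.unambiguous else "0:*"
        model.relationships.append((p.name, p.domain, target, from_c, to_c))
    return model


model = daml_to_osm(
    ["Program", "OperatingSystem"],
    [
        DamlProperty("Name", "Program", "rdfs:Literal", unique=True, unambiguous=True),
        DamlProperty("supportsOperatingSystem", "Program", "OperatingSystem"),
    ],
)
print(model.lexical)
print(model.relationships)
```

In this sketch a unique property limits how often a domain object participates, and an unambiguous property limits how often a value participates, which follows the DAML+OIL meaning of the two property types.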

8
DAML → OSM
<daml:Class rdf:ID="Program">
  <rdfs:label>Program</rdfs:label>
</daml:Class>
<daml:Class rdf:ID="OperatingSystem">
  <rdfs:label>OperatingSystem</rdfs:label>
</daml:Class>
. . .
<daml:DatatypeProperty rdf:ID="Name">
  <rdf:type rdf:resource="daml:UniqueProperty"/>
  <rdf:type rdf:resource="daml:UnambiguousProperty"/>
  <rdfs:domain rdf:resource="Program"/>
  <rdfs:range rdf:resource="rdfs:Literal"/>
</daml:DatatypeProperty>
<daml:Property rdf:ID="supportsOperatingSystem">
  <rdfs:domain rdf:resource="Program"/>
  <rdfs:range rdf:resource="OperatingSystem"/>
</daml:Property>

9
Data Frames
  • Lexical object sets need data frames.
  • Use data-frame library
  • Match lexical object sets with data frames
  • Compare stemmed names and aliases
  • Levenshtein edit distance
  • Soundex
  • Longest common subsequence
  • Weighted average
  • Specialization heuristic
  • Choose most similar data frame (above a threshold)
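
The name-matching step can be sketched as follows, combining the three measures named on this slide into a weighted average and keeping the best data frame above a threshold. The weights, the threshold, and the small library are hypothetical values chosen for illustration; stemming and the specialization heuristic are left out.

```python
def levenshtein(a, b):
    """Edit distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {c: d for d, letters in _GROUPS.items() for c in letters}


def soundex(word):
    """Four-character American Soundex code."""
    word = "".join(c for c in word.lower() if c.isalpha())
    if not word:
        return ""
    out, last = word[0].upper(), _CODES.get(word[0], "")
    for c in word[1:]:
        code = _CODES.get(c, "")
        if code and code != last:
            out += code
        if c not in "hw":        # h and w do not separate equal codes
            last = code
    return (out + "000")[:4]


def similarity(a, b, weights=(0.4, 0.2, 0.4)):
    """Weighted average of the three measures; the weights are assumed."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    scores = (
        1 - levenshtein(a, b) / longest,
        1.0 if soundex(a) == soundex(b) else 0.0,
        lcs_length(a, b) / longest,
    )
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def best_data_frame(name, library, threshold=0.7):
    """Return the most similar data frame, or None if below the threshold."""
    scored = [(similarity(name, alias), frame)
              for frame, aliases in library.items() for alias in aliases]
    score, frame = max(scored, default=(0.0, None))
    return frame if score >= threshold else None


# Hypothetical data-frame library: frame name -> names and aliases.
library = {
    "Phone": ["phone", "telephone"],
    "RentalRate": ["rent", "rate"],
    "Bedrooms": ["bedrooms", "beds"],
}
print(best_data_frame("Bedroom", library))
```

Returning `None` below the threshold leaves the object set unmatched rather than attaching a weakly related data frame to it.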

10
User Modification
  • Provide graphical ontology editor
  • Automate graph layout
  • Allow the user to edit participation constraints
  • Allow user to edit data-frame mapping
  • Provide data frame editor

11
Extracting the Data

12
Pointing to the Data
<html> . . . <body> <table>
<tr> <td>
<a href="..."><b>Stick Death 1.0</b></a><br />
Advance in levels, grab weapons, and unlock new levels and characters.<br />
<b>OS</b> Windows 3.x/95/98/Me/NT/2000/XP<br />
<b>File Size</b>2.66MB<br />
<b>License</b>Free<br />
</td> <td>05/14/2002<br />
<i><b>new</b></i>
</td>
<td></td> <td>2,235</td>
<td><a href="...">Download now</a><br /><br /></td> </tr>
. . .
xpointer(string-range(/html[1]/body[1]/table[1]/tr[1], , 10, 3))
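
To make the pointer concrete, the sketch below resolves a character range inside a table row. It assumes that the last two numbers in the expression are a 1-based start position and a length within the row's text, and it uses a short, well-formed stand-in page because the real listing is elided on the slide. The `string_range` helper is a hypothetical simplification, not an XPointer implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the download listing on the slide.
PAGE = (
    "<html><body><table><tr>"
    "<td><b>Stick Death 1.0</b></td><td>2.66MB</td>"
    "</tr></table></body></html>"
)


def string_range(element, start, length):
    """Return `length` characters of the element's text from 1-based `start`."""
    text = "".join(element.itertext())
    return text[start - 1 : start - 1 + length]


root = ET.fromstring(PAGE)            # root is the <html> element
row = root.find("body/table/tr[1]")   # mirrors /html[1]/body[1]/table[1]/tr[1]
print(string_range(row, 7, 5))        # prints "Death"
```

Storing a path plus a character range, rather than a copy of the text alone, lets each extracted value point back to its place in the source page.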
13
Convert to RDF

14
Superimposed Data

15
Results
  • RDF Data Extraction and Viewing
    • Built 4 data-extraction ontologies
      • 3 from DAML ontologies for data extraction
      • 1 from an existing DAML ontology
    • Most existing DAML ontologies not good for data
      extraction
  • Data Frame Matcher
    • 8 training ontologies, 16 test ontologies
    • 128 lexical object sets, 40 correct matches, 12
      incorrect matches
    • Precision: 77%
    • Recall: 89%
  • Experiment (apartment rentals): 6 students, 3 data
    frames
    • Phone: 2.8 min
    • RentalRate: 16.5 min
    • Bedrooms: 17.5 min
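
The reported precision can be checked directly from the match counts on this slide. The slide does not state how many matches were expected, so the sketch below only back-solves the number implied by the reported recall; treat that figure as an inference, not a reported result.

```python
correct, incorrect = 40, 12

# Precision: correct matches over all matches the matcher proposed.
precision = correct / (correct + incorrect)
print(f"precision = {precision:.1%}")          # prints "precision = 76.9%"

# Recall is correct matches over expected matches. The expected count is
# not given on the slide, so it is inferred from the reported 89%.
reported_recall = 0.89
implied_expected = correct / reported_recall
print(f"implied expected matches = {implied_expected:.1f}")
```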

16
Contributions
  • Advancement of Semantic Web
  • Application of Information Extraction to building
    Semantic Web content
  • Semantic Web data as superimposed information
  • Algorithm for ontology conversion

17
Future Work
  • Data extraction
    • Enhance name matcher with data values
    • Support n-ary relationship sets
  • RDF data generation
    • Generate only one URI for an object
    • Associate concepts from DAML ontologies to
      well-known DAML ontologies