Chapter 12: Ontologies and Knowledge Representation



1
Chapter 12: Ontologies and Knowledge Representation
PRINCIPLES OF DATA INTEGRATION
ANHAI DOAN, ALON HALEVY, ZACHARY IVES
2
Outline
  • Introduction to Knowledge Representation and its
    relevance to data integration
  • Description Logics: a family of KR languages
  • The Semantic Web and its languages

3
Knowledge Representation
  • Knowledge representation (KR) focuses on more
    expressive languages than database schemata and
    integrity constraints
  • Designed for artificial intelligence applications
    (e.g., natural language understanding, planning)
    where complex relationships exist between
    objects.
  • KR uses ontologies to represent relationships
    between elements in a knowledge base.
  • KR is relevant to data integration because
    relationships between data sources can be
    complex.
  • The use of KR in data integration has been
    investigated since the early days of the field.

4
KR in Data Integration Example
Mediated schema: an ontology with classes and
relationships.
Data sources: S1 has comedies, S2 has documentaries,
S3 has movies with at least two awards, and S4 has
comedies with at least one Oscar.
5
Example Part 1
S1 is relevant to Q1 because Comedy is a subclass
of Movie (by subsumption)
6
Example Part 2
S2 is irrelevant to Q2 because Comedy and
Documentary are declared disjoint.
7
Example Part 3(a)
S3 is relevant to Q3 because movies with two
awards will definitely satisfy the second
subgoal.
8
Example Part 3(b)
S4 is relevant to Q3 because oscar is a
sub-property of award.
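The relevance reasoning in the example can be made concrete with a minimal Python sketch. The queries Q1 to Q3 appear only in the slide figures, so their encodings here are assumptions: Q1 asks for movies, Q2 asks for comedies, and Q3 asks for movies with at least one award. The class, disjointness, and sub-property tables, the `Desc` type, and the `relevance` helper are all hypothetical. The check handles only single-parent hierarchies and "at least n" restrictions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical encoding of the example's mediated-schema ontology.
SUBCLASS = {"Comedy": "Movie", "Documentary": "Movie"}   # child -> parent
DISJOINT = {frozenset({"Comedy", "Documentary"})}
SUBPROPERTY = {"oscar": "award"}                          # child -> parent


@dataclass(frozen=True)
class Desc:
    """A class, optionally restricted to 'at least n values of property prop'."""
    cls: str
    prop: Optional[str] = None
    at_least: int = 0


def ancestors(name: str, parent: dict) -> set:
    """Return name plus all of its transitive parents (single inheritance only)."""
    result = {name}
    while name in parent:
        name = parent[name]
        result.add(name)
    return result


def disjoint(a: str, b: str) -> bool:
    """Two classes are disjoint if any pair of their ancestors is declared disjoint."""
    return any(frozenset({x, y}) in DISJOINT
               for x in ancestors(a, SUBCLASS)
               for y in ancestors(b, SUBCLASS))


def relevance(source: Desc, query: Desc) -> str:
    if disjoint(source.cls, query.cls):
        return "irrelevant"                  # no tuple of the source can match
    class_ok = query.cls in ancestors(source.cls, SUBCLASS)
    restriction_ok = query.prop is None or (
        source.prop is not None
        and query.prop in ancestors(source.prop, SUBPROPERTY)
        and source.at_least >= query.at_least)
    # Subsumed description: every tuple in the source satisfies the query.
    return "relevant" if class_ok and restriction_ok else "possibly relevant"


S1, S2 = Desc("Comedy"), Desc("Documentary")
S3, S4 = Desc("Movie", "award", 2), Desc("Comedy", "oscar", 1)
Q1, Q2, Q3 = Desc("Movie"), Desc("Comedy"), Desc("Movie", "award", 1)

for name, src, q in [("S1", S1, Q1), ("S2", S2, Q2), ("S3", S3, Q3), ("S4", S4, Q3)]:
    print(name, relevance(src, q))
```

If a source's description is subsumed by the query, the sketch marks it relevant. If the two are disjoint, it marks the source irrelevant. Any other case, such as S1 against Q3, would need further reasoning, so the sketch reports it only as possibly relevant.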
9
Outline
  • Introduction to Knowledge Representation and its
    relevance to data integration
  • Description Logics: a family of KR languages
  • The Semantic Web and its languages

10
Description Logics: Introduction
  • Description Logics are a subset of first-order
    logic
  • Only unary predicates (called concepts) and
    binary predicates (called roles or properties).
  • Knowledge bases are composed of:
    • a T-box defining the concepts and the roles
    • an A-box including ground facts about individuals
  • Complex concepts are defined by concept
    descriptions
  • The expressive power of the language is
    determined by the set of constructors in the
    grammar of concept descriptions
  • Complex roles can also be defined via constructors

11
T-Boxes
  • Can include statements of the form A ⊑ C
    (inclusion) or A ≡ C (definition), where A is a
    base concept and C can be a concept description.
  • Example grammar for concept descriptions: see
    next slide.
12
An example grammar for concept descriptions.
C, D are complex concepts; A is a base concept.
Many other constructors are possible: union,
existential quantification, equality on role
paths, …
13
Example Terminology
  • a1: Italians are people (really. Don't laugh!)
  • a2: Comedies are movies
  • a3: Comedies are disjoint from documentaries
  • a4: Movies have at most one director
  • a5: Award movies are those that have at least one award
  • a6: Italian hits are award movies whose director is Italian
14
A-box: the Ground Facts
  • A set of assertions of the form C(a) or R(a,b)
  • b is called an R-filler of a.
  • C and R can be complex concept and role descriptions
  • Akin to asserting that a tuple is in a view
    rather than in base relations
  • Below, we state that LaStrada is an Italian hit,
    but we are not given its director or the award it won.

15
Semantics of Description Logics
  • Semantics are based on interpretations.
  • Given a knowledge base Δ, the models of Δ are the
    interpretations that are consistent with Δ's T-box and
    A-box.
  • Any fact that is true in all models of Δ is said
    to be entailed by Δ.

16
Interpretations Formally
  • An interpretation I contains a non-empty domain
    of objects, O^I.
  • I assigns an object a^I in O^I to every constant a in
    the A-box.
  • We make the unique names assumption: a ≠ b implies
    that a^I ≠ b^I.
  • I assigns C^I, a subset of O^I, to every concept C.
  • I assigns a binary relation R^I, a subset of O^I × O^I,
    to every role R.

17
Extensions of Complex Expressions
The extensions of concept and role descriptions
are given by the following equations (|S| denotes
the cardinality of the set S).
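The equations themselves are not reproduced here. The sketch below assumes the grammar contains the constructors used in the example terminology: ⊤, ⊥, A, ¬A, C ⊓ D, ∀R.C, (≥ n R), and (≤ n R), each with its standard set semantics. Under that assumption, the extension of a description over a finite interpretation can be computed recursively. The tuple encoding and the names `Interpretation`, `ext`, `Director1`, and `Award1` are illustrative only.

```python
from dataclasses import dataclass, field

# Concept descriptions as nested tuples (hypothetical encoding):
#   ("top",) ("bottom",) ("atom", A) ("not", A) ("and", C, D)
#   ("all", R, C) ("atleast", n, R) ("atmost", n, R)


@dataclass
class Interpretation:
    domain: set                                   # O^I
    concepts: dict = field(default_factory=dict)  # A -> A^I
    roles: dict = field(default_factory=dict)     # R -> R^I (set of pairs)

    def fillers(self, role: str, d) -> set:
        """All R-fillers of d, i.e. {e | (d, e) in R^I}."""
        return {e for (x, e) in self.roles.get(role, set()) if x == d}


def ext(c: tuple, i: Interpretation) -> set:
    """Compute the extension C^I of concept description c."""
    op = c[0]
    if op == "top":
        return set(i.domain)
    if op == "bottom":
        return set()
    if op == "atom":
        return set(i.concepts.get(c[1], set()))
    if op == "not":
        return i.domain - i.concepts.get(c[1], set())
    if op == "and":
        return ext(c[1], i) & ext(c[2], i)
    if op == "all":
        target = ext(c[2], i)
        return {d for d in i.domain if i.fillers(c[1], d) <= target}
    if op == "atleast":
        return {d for d in i.domain if len(i.fillers(c[2], d)) >= c[1]}
    if op == "atmost":
        return {d for d in i.domain if len(i.fillers(c[2], d)) <= c[1]}
    raise ValueError(f"unknown constructor {op!r}")


interp = Interpretation(
    domain={"LaStrada", "Director1", "Award1"},
    concepts={"Movie": {"LaStrada"}, "Italian": {"Director1"}},
    roles={"director": {("LaStrada", "Director1")},
           "award": {("LaStrada", "Award1")}},
)
award_movie = ("and", ("atom", "Movie"), ("atleast", 1, "award"))
print(ext(award_movie, interp))                                    # {'LaStrada'}
print(sorted(ext(("all", "director", ("atom", "Italian")), interp)))
# ['Award1', 'Director1', 'LaStrada']
```

The extension of ∀director.Italian includes every element that has no director at all, because the condition holds vacuously for such elements.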
18
Conditions on Models
  • An interpretation of Δ is a model if the
    following conditions hold:

19
Example Interpretation
Assume an interpretation with the identity
mapping on individuals in the knowledge base and
a few extra elements (Director1, Award1, Actor2,
…). The following interpretation is a model:
20
Example Interpretation
  • Notes
  • We do not know the director of LaStrada or its
    award.
  • Removing LifeIsBeautiful from Comedy would make
    it a non-model.
  • Adding another director would also make it a
    non-model.
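The model conditions can be checked mechanically on a finite interpretation. This sketch assumes a DL encoding of terminology a1 to a6; for instance, a4 becomes Movie ⊑ (≤ 1 director) and a6 becomes ItalianHit ≡ AwardMovie ⊓ ∀director.Italian. It also assumes an A-box containing ItalianHit(LaStrada) and Comedy(LifeIsBeautiful). These encodings, the interpretation, and all function names are hypothetical reconstructions, not the slide's figure. The sketch reproduces both notes: removing LifeIsBeautiful from Comedy breaks the model, and so does giving LaStrada a second director.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    domain: set
    concepts: dict = field(default_factory=dict)   # A -> A^I
    roles: dict = field(default_factory=dict)      # R -> set of (d, e) pairs


def ext(c: tuple, i: Interpretation) -> set:
    """Extension C^I of a tuple-encoded concept description."""
    def fillers(role, d):
        return {e for (x, e) in i.roles.get(role, set()) if x == d}
    op = c[0]
    if op == "atom":
        return set(i.concepts.get(c[1], set()))
    if op == "not":
        return i.domain - i.concepts.get(c[1], set())
    if op == "and":
        return ext(c[1], i) & ext(c[2], i)
    if op == "all":
        return {d for d in i.domain if fillers(c[1], d) <= ext(c[2], i)}
    if op == "atleast":
        return {d for d in i.domain if len(fillers(c[2], d)) >= c[1]}
    if op == "atmost":
        return {d for d in i.domain if len(fillers(c[2], d)) <= c[1]}
    raise ValueError(op)


# Hypothetical DL encoding of terminology a1-a6 ("sub": A ⊑ C, "def": A ≡ C).
TBOX = [
    ("sub", "Italian", ("atom", "Person")),                                   # a1
    ("sub", "Comedy", ("atom", "Movie")),                                     # a2
    ("sub", "Comedy", ("not", "Documentary")),                                # a3
    ("sub", "Movie", ("atmost", 1, "director")),                              # a4
    ("def", "AwardMovie", ("and", ("atom", "Movie"), ("atleast", 1, "award"))),  # a5
    ("def", "ItalianHit",
     ("and", ("atom", "AwardMovie"), ("all", "director", ("atom", "Italian")))),  # a6
]
ABOX = [("concept", "ItalianHit", "LaStrada"), ("concept", "Comedy", "LifeIsBeautiful")]


def is_model(i: Interpretation, tbox, abox) -> bool:
    """Check the model conditions; individuals map to themselves (identity mapping)."""
    for kind, name, c in tbox:
        lhs, rhs = i.concepts.get(name, set()), ext(c, i)
        if (kind == "sub" and not lhs <= rhs) or (kind == "def" and lhs != rhs):
            return False
    for a in abox:
        if a[0] == "concept" and a[2] not in i.concepts.get(a[1], set()):
            return False
        if a[0] == "role" and (a[2], a[3]) not in i.roles.get(a[1], set()):
            return False
    return True


interp = Interpretation(
    domain={"LaStrada", "LifeIsBeautiful", "Director1", "Award1"},
    concepts={
        "ItalianHit": {"LaStrada"}, "AwardMovie": {"LaStrada"},
        "Movie": {"LaStrada", "LifeIsBeautiful"}, "Comedy": {"LifeIsBeautiful"},
        "Italian": {"Director1"}, "Person": {"Director1"},
    },
    roles={"director": {("LaStrada", "Director1")},
           "award": {("LaStrada", "Award1")}},
)
print(is_model(interp, TBOX, ABOX))           # True

no_comedy = copy.deepcopy(interp)
no_comedy.concepts["Comedy"] = set()          # violates Comedy(LifeIsBeautiful)
print(is_model(no_comedy, TBOX, ABOX))        # False

two_directors = copy.deepcopy(interp)
two_directors.domain.add("Director2")
two_directors.roles["director"].add(("LaStrada", "Director2"))  # violates a4 (and a6)
print(is_model(two_directors, TBOX, ABOX))    # False
```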

21
Inference in Description Logics
  • This is where all the action is: coming up with
    efficient algorithms for inference and
    understanding the complexity of the problems.
  • Subsumption (only for the T-box)
  • A concept C is said to be subsumed by concept D
    w.r.t. a T-box T if C^I ⊆ D^I in every model I of T.
  • Examples

is subsumed by
is subsumed by
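Full subsumption reasoning for this grammar needs tableau-style algorithms, which are too large for a slide-sized example. The sketch below swaps in a simpler technique. It first unfolds an acyclic T-box, rewriting each inclusion A ⊑ C as A ≡ A* ⊓ C with a fresh primitive A*. It then compares the resulting normal forms structurally. This check is sound but incomplete: it never reports a false subsumption, but it can miss true ones, for example when C is unsatisfiable. The axiom encoding and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conj:
    """Normal form of a concept: a conjunction of parts, grouped by kind."""
    atoms: set = field(default_factory=set)        # base concepts and ("not", A) literals
    at_least: dict = field(default_factory=dict)   # R -> largest n in (>= n R)
    at_most: dict = field(default_factory=dict)    # R -> smallest n in (<= n R)
    alls: dict = field(default_factory=dict)       # R -> normal form of C in ∀R.C


def merge(a: Conj, b: Conj) -> Conj:
    """Conjoin two normal forms."""
    alls = dict(a.alls)
    for r, part in b.alls.items():
        alls[r] = merge(alls[r], part) if r in alls else part
    at_least = dict(a.at_least)
    for r, n in b.at_least.items():
        at_least[r] = max(n, at_least.get(r, n))
    at_most = dict(a.at_most)
    for r, n in b.at_most.items():
        at_most[r] = min(n, at_most.get(r, n))
    return Conj(a.atoms | b.atoms, at_least, at_most, alls)


def unfold_tbox(axioms) -> dict:
    """Rewrite A ⊑ C as A ≡ A* ⊓ C (A* a fresh primitive); one definition per name."""
    defs, inclusions = {}, {}
    for kind, name, c in axioms:
        if kind == "def":
            defs[name] = c
        else:
            inclusions.setdefault(name, []).append(c)
    for name, parts in inclusions.items():
        expr = ("atom", name + "*")
        for c in parts:
            expr = ("and", expr, c)
        defs[name] = expr
    return defs


def normalize(c: tuple, defs: dict) -> Conj:
    """Unfold defined names (T-box must be acyclic) and flatten conjunctions."""
    op = c[0]
    if op == "top":
        return Conj()
    if op == "atom":
        return normalize(defs[c[1]], defs) if c[1] in defs else Conj(atoms={c[1]})
    if op == "not":                       # ¬A is kept as an opaque literal
        return Conj(atoms={("not", c[1])})
    if op == "and":
        return merge(normalize(c[1], defs), normalize(c[2], defs))
    if op == "atleast":
        return Conj(at_least={c[2]: c[1]})
    if op == "atmost":
        return Conj(at_most={c[2]: c[1]})
    if op == "all":
        return Conj(alls={c[1]: normalize(c[2], defs)})
    raise ValueError(f"unsupported constructor {op!r}")


def covers(c: Conj, d: Conj) -> bool:
    """Structural check: every conjunct of d is implied by a conjunct of c."""
    return (d.atoms <= c.atoms
            and all(c.at_least.get(r, 0) >= n for r, n in d.at_least.items())
            and all(r in c.at_most and c.at_most[r] <= n for r, n in d.at_most.items())
            and all(covers(c.alls.get(r, Conj()), p) for r, p in d.alls.items()))


def subsumed(c: tuple, d: tuple, axioms) -> bool:
    defs = unfold_tbox(axioms)
    return covers(normalize(c, defs), normalize(d, defs))


TBOX = [
    ("sub", "Italian", ("atom", "Person")),
    ("sub", "Comedy", ("atom", "Movie")),
    ("sub", "Comedy", ("not", "Documentary")),
    ("sub", "Movie", ("atmost", 1, "director")),
    ("def", "AwardMovie", ("and", ("atom", "Movie"), ("atleast", 1, "award"))),
    ("def", "ItalianHit",
     ("and", ("atom", "AwardMovie"), ("all", "director", ("atom", "Italian")))),
]
print(subsumed(("atom", "ItalianHit"), ("atom", "Movie"), TBOX))                        # True
print(subsumed(("atom", "ItalianHit"), ("all", "director", ("atom", "Person")), TBOX))  # True
print(subsumed(("atom", "AwardMovie"), ("atom", "ItalianHit"), TBOX))                   # False
```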
22
Query answering with DLs
  • The simple case: instance checking
  • Does Δ entail C(a) or R(a,b)? i.e., does
    C(a)/R(a,b) hold in every model of Δ?
  • The more general problem is query answering: find
    the answers to a conjunctive query
  • where g1, …, gn can be concept and/or role names.

23
Semantics of Conjunctive Queries
  • Compute the answer to Q in every model of Δ.
  • Any tuple that is in the intersection of the
    answers is entailed by Δ.
  • This should remind you of the semantics of
    certain answers!
  • Let's look at a few examples.

24
Query Answering Example 1
  • Consider Q1 over the following A-box
  • Applying Q1 directly to the A-box would yield no
    answers (the award subgoal would not be matched)
  • However, ItalianHits(LifeIsBeautiful) implies
    that LifeIsBeautiful won at least one award.
  • Hence, LifeIsBeautiful should be in the answer!
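The definition of entailed answers can be illustrated by evaluating a conjunctive query in several models and intersecting the results. Q1 appears only in a figure, so the query used here, Q1(x) :- ItalianHit(x), award(x, y), is an assumption. The models are also assumed, and each lists only the facts the query touches. Intersecting over a finite, hand-picked list of models can only over-approximate the entailed answers. The sketch therefore illustrates the semantics; it does not decide entailment.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(binding: dict, atom: tuple, fact: tuple):
    """Try to extend binding so that atom matches fact; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(head: tuple, body: list, facts: set) -> set:
    """Answers of the conjunctive query head :- body over one set of facts."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := extend(b, atom, f)) is not None]
    return {tuple(b[v] for v in head) for b in bindings}


def certain_answers(head: tuple, body: list, models: list) -> set:
    """Tuples returned in every listed model (the intersection of the answers)."""
    answers = [evaluate(head, body, m) for m in models]
    return set.intersection(*answers) if answers else set()


abox = {("ItalianHit", "LifeIsBeautiful")}
head, body = ("?x",), [("ItalianHit", "?x"), ("award", "?x", "?y")]
print(evaluate(head, body, abox))                 # set(): no award is named

# Two (partial, hypothetical) models: each must give LifeIsBeautiful some award.
model1 = abox | {("award", "LifeIsBeautiful", "Award1")}
model2 = abox | {("award", "LifeIsBeautiful", "Award2"),
                 ("ItalianHit", "Movie2"), ("award", "Movie2", "Award3")}
print(certain_answers(head, body, [model1, model2]))   # {('LifeIsBeautiful',)}
```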

25
Query Answering Example 2
  • Consider Q2 with the following A-box:
  • Comedy(LaFunivia), director(LaFunivia,Antonioni),
    Italian(Antonioni)
  • Neither conjunctive query will yield an answer
    because we know nothing about awards.
  • However, we can reason by cases that the
    following is entailed by Q2.

26
End of Example 2
  • OK, we have
  • But that's not enough to infer that LaFunivia
    should be in the answer.
  • However, we also know that movies have at most
    one director, so
  • Hence, LaFunivia is an answer to Q2.

27
Comparing DLs to OODB
  • Object-oriented databases
  • Also focus on unary and binary relations
  • OODBs are more focused on modeling the physical
    aspects of objects and their properties
  • An object can only belong to a single (most
    specific) class.
  • Description logics are about knowledge and
    complex relationships
  • Class membership can be inferred
  • An individual can belong to multiple classes.

28
Comparing DLs to Relational Views
  • In principle, concept descriptions are view
    definitions
  • Relational views employ selection, projection,
    join, and union, and apply to more than unary and
    binary relations.
  • DLs: universal quantification, number
    restrictions, intersection, …
  • Subsumption corresponds to query containment
  • Universal quantification and number restrictions
    would require negation in conjunctive queries;
    hence containment would be undecidable.
  • In DLs you can put facts directly in views
    (i.e., complex concepts).
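The correspondence between subsumption and query containment can be illustrated on plain conjunctive queries with the standard canonical-database test of Chandra and Merlin. Q ⊆ Q' holds exactly when Q', evaluated over Q's body with Q's variables frozen as constants, returns Q's frozen head. The sketch below is a minimal implementation of that test. The queries are hypothetical and only mirror Comedy ⊓ (≥ 1 award) ⊑ (≥ 1 award).

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def extend(binding: dict, atom: tuple, fact: tuple):
    """Try to extend binding so that atom matches fact; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(head: tuple, body: list, facts: set) -> set:
    """Answers of the conjunctive query head :- body over a set of facts."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := extend(b, atom, f)) is not None]
    return {tuple(b[v] for v in head) for b in bindings}


def contained_in(q_sub, q_sup) -> bool:
    """q_sub ⊆ q_sup iff q_sup returns q_sub's head over q_sub's canonical database."""
    head_sub, body_sub = q_sub
    head_sup, body_sup = q_sup
    canonical_db = set(body_sub)      # variables such as "?x" act as frozen constants
    return tuple(head_sub) in evaluate(head_sup, body_sup, canonical_db)


q_sub = (("?x",), [("Comedy", "?x"), ("award", "?x", "?y")])  # ~ Comedy ⊓ (≥ 1 award)
q_sup = (("?x",), [("award", "?x", "?z")])                     # ~ (≥ 1 award)
print(contained_in(q_sub, q_sup))   # True
print(contained_in(q_sup, q_sub))   # False
```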

29
Outline
  • Introduction to Knowledge Representation and its
    relevance to data integration
  • Description Logics: a family of KR languages
  • The Semantic Web and its languages

30
The Semantic Web
  • Basic idea: annotate content on Web pages with
    semantics
  • Specify that a web page is about a restaurant,
    where the address appears on the page, and what
    the menu items are.
  • On a page with restaurant reviews, mark the
    restaurants with a global identifier so the
    review and restaurant data can be fused.
  • Without these annotations, systems need to infer
    these correlations and are often wrong.

31
Languages, Languages
  • RDF: Resource Description Framework
    • Language for marking up data
    • Triples with a few cool features
  • RDF Schema (RDFS): basic schema for RDF documents
  • OWL: Web Ontology Language. Comes in multiple
    flavors:
    • OWL-Lite
    • OWL-DL, OWL-Full
  • All these languages are influenced by KR
    formalisms (some more and some less)

32
RDF Basics
  • RDF triples are statements about resources
  • They are of the form
  • (subject, predicate, value)
  • Names can get long (because they can be URLs), so
    we often use qnames (qualified names)
  • ex: instead of http://www.example.org/
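Here is a minimal sketch of triples with qname expansion, assuming a small prefix table. The `ex` namespace follows the slide, and the `rdf` prefix is the standard RDF namespace. The resource names and the `expand` and `triple` helpers are hypothetical. Unlike real RDF syntax, this sketch does not distinguish literals from qnames, so a literal that happens to look like `ex:...` would be expanded as well.

```python
PREFIXES = {
    "ex": "http://www.example.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(term: str, prefixes: dict) -> str:
    """Expand a qname such as ex:LaStrada; any other term is returned unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def triple(s: str, p: str, o: str) -> tuple:
    """Build a (subject, predicate, value) triple with qnames expanded to URIs."""
    return tuple(expand(t, PREFIXES) for t in (s, p, o))


graph = {
    triple("ex:LaStrada", "rdf:type", "ex:Movie"),
    triple("ex:LaStrada", "ex:title", "La Strada"),   # plain literal value
}
for t in sorted(graph):
    print(t)
```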

33
RDF as a Graph
Uniform Resource Identifiers are available beyond a
single data set.
34
Uniform Resource Identifiers
  • In a typical database, identifiers are used only
    internally. They have no meaning outside the
    database.
  • RDF uses URIs for subjects, predicates and
    optionally for values
  • Hence, multiple data sets can refer to the same
    identifier.
  • Key benefit for data integration!
  • Note: this does not entail standardization!
  • You're free to invent your own, but you're
    encouraged to reuse existing URIs so your data
    meshes well with others.
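The integration benefit can be sketched by merging two hypothetical data sets that both use the URI http://www.example.org/LaStrada. Once the triples are unioned, grouping by subject puts both descriptions on a single resource. A record keyed by a database-internal identifier stays disconnected. All data values and names are made up for illustration.

```python
from collections import defaultdict

EX = "http://www.example.org/"   # shared namespace (hypothetical data below)

reviews = {(EX + "LaStrada", EX + "rating", "4.5")}
catalog = {(EX + "LaStrada", EX + "director", EX + "Director1")}
local = {("movie_17", EX + "rating", "4.5")}   # database-internal id: no link


def describe(triples: set) -> dict:
    """Group (predicate, value) pairs by subject identifier."""
    by_subject = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].add((p, o))
    return dict(by_subject)


merged = describe(reviews | catalog)
print(len(merged))                              # 1: one resource, two sources
print(len(describe(reviews | catalog | local)))  # 2: the local id stays separate
```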

35
Blank Nodes
Blank node
You can assign IDs to blank nodes, but they are
internal to a document.
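Because blank-node IDs are local to a document, two documents may reuse the same ID for unrelated nodes. A common way to merge such documents is to rename the blank nodes apart before taking the union. The sketch below shows this with hypothetical names and data.

```python
EX = "http://www.example.org/"


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def rename_apart(doc_id: str, triples: set) -> set:
    """Prefix every blank-node ID with its document ID so IDs stay document-local."""
    def fix(term):
        return f"_:{doc_id}_{term[2:]}" if is_blank(term) else term
    return {tuple(fix(t) for t in triple) for triple in triples}


# Both documents happen to use _:b1, but for different (hypothetical) things.
doc1 = {(EX + "LaStrada", EX + "award", "_:b1"), ("_:b1", EX + "label", "some award")}
doc2 = {(EX + "Director1", EX + "address", "_:b1")}

merged = rename_apart("doc1", doc1) | rename_apart("doc2", doc2)
blanks = {t for triple in merged for t in triple if is_blank(t)}
print(sorted(blanks))   # ['_:doc1_b1', '_:doc2_b1']
```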
36
Reification
  • Reification is a way of stating statements about
    statements
  • e.g., provenance, uncertainty, date asserted, …
  • To reify, make the statement itself into a
    resource
  • Once reified, you can state its properties
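RDF has a standard reification vocabulary: rdf:Statement, rdf:subject, rdf:predicate, and rdf:object. The sketch below uses it to turn a triple into a resource and then attaches a provenance property. The `ex:` resources, the `assertedBy` property, and the `reify` helper are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://www.example.org/"   # hypothetical namespace


def reify(triple: tuple, statement_id: str) -> set:
    """Describe a triple as a resource using the RDF reification vocabulary."""
    s, p, o = triple
    return {
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    }


claim = (EX + "LaStrada", EX + "director", EX + "Director1")
graph = reify(claim, EX + "stmt1")
# Once reified, the statement can have its own properties, e.g. provenance.
graph.add((EX + "stmt1", EX + "assertedBy", EX + "SourceS3"))
print(len(graph))   # 5
```

The four reification triples describe the original statement without asserting it. The claim triple itself is not in `graph`.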

37
RDF Schema
  • Enables declaring classes, subclass hierarchies,
    membership in a class, and restrictions on
    domains and ranges of properties.
  • Important: a class can be an instance of another
    class!

38
RDFS: Declaring Properties
  • In RDFS, you can declare sub-properties, domains, and
    ranges of properties.
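The effect of these RDFS declarations can be sketched as forward chaining to a fixpoint. The sketch uses a subset of the RDFS entailment rules: rdfs2 (domain), rdfs3 (range), rdfs5 and rdfs7 (sub-properties), and rdfs9 and rdfs11 (subclasses). Terms are written as qnames for readability. The data and the `rdfs_closure` helper are hypothetical, and the naive pairwise loop is suitable only for small examples.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples: set) -> set:
    """Apply a subset of the RDFS entailment rules until nothing new is derived."""
    g = set(triples)
    while True:
        new = set()
        for (s, p, o) in g:
            for (s2, p2, o2) in g:
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))      # rdfs11: subClassOf is transitive
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))       # rdfs5: subPropertyOf is transitive
                if p == SUBPROP and p2 == s:
                    new.add((s2, o, o2))            # rdfs7: inherit the super-property
                if p == DOMAIN and p2 == s:
                    new.add((s2, TYPE, o))          # rdfs2: subject typed by domain
                if p == RANGE and p2 == s:
                    new.add((o2, TYPE, o))          # rdfs3: value typed by range
                if p == SUBCLASS and p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))          # rdfs9: instances of subclasses
        if new <= g:
            return g
        g |= new


data = {
    ("ex:oscar", SUBPROP, "ex:award"),
    ("ex:award", DOMAIN, "ex:Movie"),
    ("ex:award", RANGE, "ex:Award"),
    ("ex:Comedy", SUBCLASS, "ex:Movie"),
    ("ex:Comedy", TYPE, "ex:Genre"),        # a class used as an instance of a class
    ("ex:LifeIsBeautiful", TYPE, "ex:Comedy"),
    ("ex:LifeIsBeautiful", "ex:oscar", "ex:Oscar1"),
}
closure = rdfs_closure(data)
print(("ex:LifeIsBeautiful", "ex:award", "ex:Oscar1") in closure)   # True
```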

39
OWL: Web Ontology Language
  • Languages based on description logics but without
    the unique-names assumption
  • sameAs and differentFrom specify whether two
    individuals are the same/different.
  • OWL-Lite: intersection, number restrictions (but
    only with 0 or 1), universal and existential
    quantification on properties.
  • OWL-DL: union, complement, disjointness, number
    restrictions, enumeration (Sunday, Monday, …),
    hasValue (filler for property value), and more.
  • OWL-Full: OWL-DL + reification.
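Without the unique-names assumption, two URIs may denote the same individual, and sameAs makes that explicit. One simple way to act on sameAs is to merge the linked URIs with union-find and then rewrite every triple to use a canonical representative. The sketch below does this with hypothetical data. Choosing the lexicographically smallest URI as the canonical one is an arbitrary assumption.

```python
def merge_same_as(triples: set, same_as: str = "owl:sameAs") -> set:
    """Rewrite subjects and values to one representative per sameAs class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == same_as:
            a, b = find(s), find(o)
            if a != b:
                parent[max(a, b)] = min(a, b)   # deterministic representative
    return {(find(s), p, find(o)) for s, p, o in triples if p != same_as}


data = {
    ("a:Movie42", "ex:title", "La Strada"),          # source A's identifier
    ("b:LaStrada", "ex:director", "b:Director1"),    # source B's identifier
    ("a:Movie42", "owl:sameAs", "b:LaStrada"),
}
for t in sorted(merge_same_as(data)):
    print(t)
```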

40
SPARQL: Querying RDF
  • Language based on matching of triples
  • Borrows ideas from conjunctive queries and XQuery

Result
41
SPARQL: The CONSTRUCT Clause
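Triple-pattern matching and the CONSTRUCT clause can be sketched in a few lines. A WHERE clause is a conjunction of triple patterns whose variables must bind consistently, much like a conjunctive query. CONSTRUCT then instantiates a template once per solution. This is a simplified model with set semantics and no FILTER or OPTIONAL. The data and function names are hypothetical, and the SPARQL text in the comments is only the equivalent query.

```python
def match(pattern: tuple, triples: set, binding: dict):
    """Yield every extension of binding under which pattern equals some triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def solutions(where: list, triples: set) -> list:
    """Evaluate a basic graph pattern: a conjunction of triple patterns."""
    results = [{}]
    for pattern in where:
        results = [b for r in results for b in match(pattern, triples, r)]
    return results


def construct(template: list, where: list, triples: set) -> set:
    """Instantiate the template once per solution, skipping unbound variables."""
    out = set()
    for b in solutions(where, triples):
        for pattern in template:
            if all(not t.startswith("?") or t in b for t in pattern):
                out.add(tuple(b.get(t, t) for t in pattern))
    return out


data = {
    ("ex:LaStrada", "ex:director", "ex:Director1"),
    ("ex:LaStrada", "ex:award", "ex:Award1"),
    ("ex:Director1", "ex:nationality", "ex:Italian"),
}
# SELECT ?m ?d WHERE { ?m ex:director ?d . ?d ex:nationality ex:Italian }
italian_directed = [("?m", "ex:director", "?d"), ("?d", "ex:nationality", "ex:Italian")]
print(solutions(italian_directed, data))
# [{'?m': 'ex:LaStrada', '?d': 'ex:Director1'}]

# CONSTRUCT { ?d ex:directed ?m } WHERE { ?m ex:director ?d }
inverse = construct([("?d", "ex:directed", "?m")], [("?m", "ex:director", "?d")], data)
print(inverse)   # {('ex:Director1', 'ex:directed', 'ex:LaStrada')}
```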
42
Summary of Chapter 12
  • Knowledge Representation enables modeling complex
    relationships between classes and objects.
  • The languages of the Semantic Web apply these
    ideas to the Web context and with URIs.
  • The constant challenge: the tradeoff between
    expressive power and computational complexity of
    reasoning
  • Question to ponder: Can we live with a fast
    reasoning algorithm that misses some derivations
    occasionally?