Chapter 12: Ontologies and Knowledge Representation - PowerPoint PPT Presentation

About This Presentation

Title:

Chapter 12: Ontologies and Knowledge Representation

Description:

Title: Chapter 8: XML Subject: Collaborative Data Sharing Author: zives Keywords: Principles of Data Integration Description: QDB-MUD Keynote talk Last modified by – PowerPoint PPT presentation

Slides: 43

Provided by: Ziv8

Learn more at: https://research.cs.wisc.edu

Transcript and Presenter's Notes

1
Chapter 12: Ontologies and Knowledge Representation
Principles of Data Integration
AnHai Doan, Alon Halevy, Zachary Ives
2
Outline

Introduction to Knowledge Representation and its relevance to data integration
Description Logics: a family of KR languages
The Semantic Web and its languages
The Semantic Web and its languages

3
Knowledge Representation

Knowledge representation (KR) focuses on languages that are more expressive than database schemata and integrity constraints.
Designed for artificial intelligence applications (e.g., natural language understanding, planning) where complex relationships exist between objects.
KR uses ontologies to represent relationships between elements in a knowledge base.
KR is relevant to data integration because relationships between data sources can be complex.
The use of KR in data integration has been investigated since the early days of the field.

4
KR in Data Integration Example
Mediated schema: an ontology with classes and relationships
Data sources:
S1: comedies
S2: documentaries
S3: movies with at least two awards
S4: comedies with at least one Oscar
5
Example Part 1
S1 is relevant to Q1 because Comedy is a subclass
of Movie (by subsumption)
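The subsumption step behind Part 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the ontology is reduced to told subclass axioms between named classes (the dictionary `SUBCLASS_OF` is our own encoding), and Q1 is assumed to ask for instances of Movie, since the query itself appeared only as a figure.

```python
# Hypothetical encoding of the told subclass axioms of the mediated schema.
SUBCLASS_OF: dict[str, set[str]] = {
    "Comedy": {"Movie"},
    "Documentary": {"Movie"},
}


def superclasses(concept: str) -> set[str]:
    """All named superclasses of `concept`, including itself (transitive closure)."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def relevant_by_subsumption(source_class: str, query_class: str) -> bool:
    """Sufficient (not necessary) test: everything the source returns is an instance of the queried class."""
    return query_class in superclasses(source_class)


print(relevant_by_subsumption("Comedy", "Movie"))  # True: S1 is relevant to Q1
```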
6
Example Part 2
S2 is irrelevant to Q2 because Comedy and
Documentary are declared disjoint.
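Pruning by disjointness in Part 2 can be sketched the same way. The sketch assumes Q2 asks for comedies (the query was a figure) and stores disjointness axioms as unordered pairs, which is our own encoding. If some superclass of one class is declared disjoint from some superclass of the other, the two classes share no instances in any model.

```python
SUBCLASS_OF = {"Comedy": {"Movie"}, "Documentary": {"Movie"}}
# Declared disjointness axioms, stored as unordered pairs (hypothetical encoding).
DISJOINT = {frozenset({"Comedy", "Documentary"})}


def superclasses(concept):
    """Named superclasses of `concept`, including itself."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def provably_disjoint(c, d):
    """True if a superclass of c is declared disjoint from a superclass of d.

    Then no individual belongs to both c and d in any model, so a source that
    populates c can be pruned for a query over d.
    """
    return any(frozenset({x, y}) in DISJOINT
               for x in superclasses(c) for y in superclasses(d))


print(provably_disjoint("Documentary", "Comedy"))  # True: S2 can be pruned for Q2
```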
7
Example Part 3(a)
S3 is relevant to Q3 because movies with two
awards will definitely satisfy the second
subgoal.
8
Example Part 3(b)
S4 is relevant to Q3 because oscar is a
sub-property of award.
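Parts 3(a) and 3(b) combine a number restriction with a sub-property hierarchy. In the sketch below, each source and the relevant subgoal of Q3 are described by a hypothetical triple (class, property, n), read as "instance of class with at least n property-fillers"; reading Q3's second subgoal as "at least one award" is our assumption. Because every filler of a sub-property is also a filler of its super-property, at least n oscar-fillers imply at least n award-fillers.

```python
SUBCLASS_OF = {"Comedy": {"Movie"}, "Documentary": {"Movie"}}
SUBPROPERTY_OF = {"oscar": {"award"}}


def ancestors(name, hierarchy):
    """`name` plus everything above it in a subclass or sub-property hierarchy."""
    seen, stack = {name}, [name]
    while stack:
        for parent in hierarchy.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def guarantees(source, goal):
    """Does every item described by `source` satisfy `goal`?

    Both are (cls, prop, n) triples meaning "in cls with >= n prop-fillers".
    (>= n sub-property) implies (>= m super-property) whenever n >= m.
    """
    s_cls, s_prop, s_n = source
    g_cls, g_prop, g_n = goal
    return (g_cls in ancestors(s_cls, SUBCLASS_OF)
            and g_prop in ancestors(s_prop, SUBPROPERTY_OF)
            and s_n >= g_n)


GOAL = ("Movie", "award", 1)  # assumed reading of Q3's second subgoal
S3 = ("Movie", "award", 2)    # movies with at least two awards
S4 = ("Comedy", "oscar", 1)   # comedies with at least one Oscar
print(guarantees(S3, GOAL), guarantees(S4, GOAL))  # True True
```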
9
Outline

Introduction to Knowledge Representation and its relevance to data integration
Description Logics: a family of KR languages
The Semantic Web and its languages
The Semantic Web and its languages

10
Description Logics: Introduction

Description Logics are a subset of first-order logic.
Only unary predicates (called concepts) and binary predicates (called roles or properties).
Knowledge bases are composed of:
a T-box, defining the concepts and the roles;
an A-box, containing ground facts about individuals.
Complex concepts are defined by concept descriptions.
The expressive power of the language is determined by the set of constructors in the grammar of concept descriptions.
Complex roles can also be defined via constructors.

11
T-Boxes

Can include statements of the form $A \sqsubseteq C$ (inclusion) and $A \equiv C$ (definition), where A is a base concept and C can be a concept description.
For an example grammar of concept descriptions, see the next slide.
12
An Example Grammar for Concept Descriptions
C, D are complex concepts; A is a base concept.
Many other constructors are possible: union, existential quantification, equality on role paths, ...
13
Example Terminology
a1: Italians are people (really. Don't laugh!)
a2: Comedies are movies
a3: Comedies are disjoint from documentaries
a4: Movies have at most one director
a5: Award movies are those that have at least one award
a6: Italian hits are award movies whose director is Italian
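One plausible rendering of a1 to a6 in standard DL syntax follows. The slide's formulas were images, so this is a reconstruction under assumptions: the concept name ItalianHits matches later slides, while AwardMovies is our own guess at the name used for a5.

```latex
\begin{align*}
a_1 &: \mathit{Italian} \sqsubseteq \mathit{Person}\\
a_2 &: \mathit{Comedy} \sqsubseteq \mathit{Movie}\\
a_3 &: \mathit{Comedy} \sqsubseteq \neg\mathit{Documentary}\\
a_4 &: \mathit{Movie} \sqsubseteq (\leq 1\ \mathit{director})\\
a_5 &: \mathit{AwardMovies} \equiv \mathit{Movie} \sqcap (\geq 1\ \mathit{award})\\
a_6 &: \mathit{ItalianHits} \equiv \mathit{AwardMovies} \sqcap \forall\mathit{director}.\mathit{Italian}
\end{align*}
```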
14
A-box: the Ground Facts

A set of assertions of the form C(a) or R(a,b); b is called an R-filler of a.
C can be a concept description and R a role description.
Akin to asserting that a tuple is in a view rather than in base relations.
For example, the assertion ItalianHits(LaStrada) states that LaStrada is an Italian hit, even though we are not given its director or the award it won.

15
Semantics of Description Logics

Semantics are based on interpretations.
Given a knowledge base Δ, the models of Δ are the interpretations that are consistent with Δ's T-box and A-box.
Any fact that is true in all models of Δ is said to be entailed by Δ.

16
Interpretations Formally

An interpretation I contains a non-empty domain of objects, $O^I$.
I assigns an object $a^I$ in $O^I$ to every constant a in the A-box.
We make the unique names assumption: $a \neq b$ implies $a^I \neq b^I$.
I assigns $C^I$, a subset of $O^I$, to every concept C.
I assigns a binary relation $R^I$, a subset of $O^I \times O^I$, to every role R.

17
Extensions of Complex Expressions
The extensions of concept and role descriptions are given by the following equations ($|S|$ denotes the cardinality of the set S).
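The equations themselves were shown as an image. Assuming the grammar uses the constructors $\top$, $\bot$, $\sqcap$, $\neg A$, $\forall R.C$, $\geq n\,R$ and $\leq n\,R$ (our assumption, consistent with the example terminology), the standard set-theoretic semantics is:

```latex
\begin{align*}
\top^I &= O^I\\
\bot^I &= \emptyset\\
(C \sqcap D)^I &= C^I \cap D^I\\
(\neg A)^I &= O^I \setminus A^I\\
(\forall R.C)^I &= \{x \in O^I \mid \forall y\,((x,y) \in R^I \Rightarrow y \in C^I)\}\\
(\geq n\,R)^I &= \{x \in O^I \mid |\{y \mid (x,y) \in R^I\}| \geq n\}\\
(\leq n\,R)^I &= \{x \in O^I \mid |\{y \mid (x,y) \in R^I\}| \leq n\}
\end{align*}
```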
18
Conditions on Models

An interpretation I is a model of Δ if the following conditions hold:
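The conditions were also shown as formulas. Assuming the T-box T contains only statements $A \sqsubseteq C$ and $A \equiv C$, and writing $\mathcal{A}$ for the A-box (our notation), the standard conditions are:

```latex
\begin{align*}
(A \sqsubseteq C) \in T &\;\Rightarrow\; A^I \subseteq C^I\\
(A \equiv C) \in T &\;\Rightarrow\; A^I = C^I\\
C(a) \in \mathcal{A} &\;\Rightarrow\; a^I \in C^I\\
R(a,b) \in \mathcal{A} &\;\Rightarrow\; (a^I, b^I) \in R^I
\end{align*}
```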

19
Example Interpretation
Assume an interpretation with the identity mapping on individuals in the knowledge base and a few extra elements (Director1, Award1, Actor2, ...). The following interpretation is a model:
20
Example Interpretation

Notes
We do not know the director of LaStrada or its
award.
Removing LifeIsBeautiful from Comedy would make
it a non-model.
Adding another director would also make it a
non-model.
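The notes above can be checked mechanically. The following Python sketch evaluates concept extensions constructor by constructor and tests the model conditions, then reproduces both notes. Assumptions to flag: concept descriptions are encoded as nested tuples of our own design; the T-box is our DL reading of a1 to a6 (AwardMovies is a guessed name); the A-box ItalianHits(LaStrada), Comedy(LifeIsBeautiful) and the interpretation are hypothetical stand-ins for the slide's figure; constants map to themselves; and only concept assertions are checked.

```python
def atom(name):
    return ("atom", name)


def fillers(roles, role, x):
    """All y with (x, y) in the extension of `role`."""
    return {y for (s, y) in roles.get(role, set()) if s == x}


def ext(c, interp):
    """Extension of concept description `c`, one case per constructor."""
    domain, concepts, roles = interp["domain"], interp["concepts"], interp["roles"]
    tag = c[0]
    if tag == "top":
        return set(domain)
    if tag == "bottom":
        return set()
    if tag == "atom":
        return set(concepts.get(c[1], set()))
    if tag == "not":
        return set(domain) - ext(c[1], interp)
    if tag == "and":
        return ext(c[1], interp) & ext(c[2], interp)
    if tag == "all":      # ("all", R, C)
        target = ext(c[2], interp)
        return {x for x in domain if fillers(roles, c[1], x) <= target}
    if tag == "atleast":  # ("atleast", n, R)
        return {x for x in domain if len(fillers(roles, c[2], x)) >= c[1]}
    if tag == "atmost":   # ("atmost", n, R)
        return {x for x in domain if len(fillers(roles, c[2], x)) <= c[1]}
    raise ValueError(f"unknown constructor: {tag!r}")


def is_model(interp, tbox, abox):
    """Check T-box inclusions/definitions and concept assertions (identity mapping on constants)."""
    for kind, name, c in tbox:  # "sub": name is included in c; "eq": name is defined as c
        lhs, rhs = ext(atom(name), interp), ext(c, interp)
        if (kind == "sub" and not lhs <= rhs) or (kind == "eq" and lhs != rhs):
            return False
    return all(individual in ext(concept, interp) for concept, individual in abox)


TBOX = [
    ("sub", "Italian", atom("Person")),                                                        # a1
    ("sub", "Comedy", atom("Movie")),                                                          # a2
    ("sub", "Comedy", ("not", atom("Documentary"))),                                           # a3
    ("sub", "Movie", ("atmost", 1, "director")),                                               # a4
    ("eq", "AwardMovies", ("and", atom("Movie"), ("atleast", 1, "award"))),                    # a5
    ("eq", "ItalianHits", ("and", atom("AwardMovies"), ("all", "director", atom("Italian")))), # a6
]
ABOX = [(atom("ItalianHits"), "LaStrada"), (atom("Comedy"), "LifeIsBeautiful")]

example = {
    "domain": {"LaStrada", "LifeIsBeautiful", "Director1", "Award1"},
    "concepts": {
        "Movie": {"LaStrada", "LifeIsBeautiful"},
        "Comedy": {"LifeIsBeautiful"},
        "AwardMovies": {"LaStrada"},
        "ItalianHits": {"LaStrada"},
        "Italian": {"Director1"},
        "Person": {"Director1"},
    },
    "roles": {"director": {("LaStrada", "Director1")}, "award": {("LaStrada", "Award1")}},
}
print(is_model(example, TBOX, ABOX))  # True

# The two variants from the notes: drop LifeIsBeautiful from Comedy; add a second director.
no_comedy = {**example, "concepts": {**example["concepts"], "Comedy": set()}}
two_directors = {
    **example,
    "domain": example["domain"] | {"Director2"},
    "roles": {**example["roles"],
              "director": example["roles"]["director"] | {("LaStrada", "Director2")}},
}
print(is_model(no_comedy, TBOX, ABOX), is_model(two_directors, TBOX, ABOX))  # False False
```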

21
Inference in Description Logics

This is where all the action is: coming up with efficient algorithms for inference and understanding the complexity of the problems.
Subsumption (only for the T-box):
A concept C is said to be subsumed by concept D w.r.t. a T-box T if $C^I \subseteq D^I$ in every model I of T.
Examples

is subsumed by
is subsumed by
22
Query answering with DLs

The simple case: instance checking.
Does Δ entail C(a) or R(a,b)? I.e., does C(a) (or R(a,b)) hold in every model of Δ?
The more general problem is query answering: find the answers to a conjunctive query $Q(\bar{X}) \text{ :- } g_1(\bar{X}_1), \ldots, g_n(\bar{X}_n)$, where $g_1, \ldots, g_n$ can be concept and/or role names.

23
Semantics of Conjunctive Queries

Compute the answer to Q in every model of Δ.
Any tuple that is in the intersection of the answers is entailed by Δ.
This should remind you of the semantics of certain answers!
Let's look at a few examples.
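The definition can be illustrated by evaluating a conjunctive query over a few models and intersecting the results. This is only an illustration of the semantics, not an algorithm: a knowledge base generally has infinitely many models, and here we hand-build two of them. The query (assumed Q1(?x) :- Movie(?x), award(?x, ?y)), the facts, and the names Award1, Award7, Award2 and Extra1 are hypothetical; variables are written with a leading "?".

```python
def unify(atom, fact, binding):
    """Extend `binding` so that `atom` maps onto `fact`; return None on a clash."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(body, facts, binding):
    """Yield every binding that maps all atoms of `body` into `facts`."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match(body[1:], facts, extended)


def answers(head, body, facts):
    return {tuple(b[v] for v in head) for b in match(body, facts, {})}


def certain_answers(head, body, models):
    """Tuples that are answers in every given model (needs at least one model)."""
    return set.intersection(*[answers(head, body, m) for m in models])


HEAD, BODY = ("?x",), [("Movie", "?x"), ("award", "?x", "?y")]
ABOX = {("ItalianHits", "LifeIsBeautiful")}
M1 = ABOX | {("AwardMovies", "LifeIsBeautiful"), ("Movie", "LifeIsBeautiful"),
             ("award", "LifeIsBeautiful", "Award1")}
M2 = ABOX | {("AwardMovies", "LifeIsBeautiful"), ("Movie", "LifeIsBeautiful"),
             ("award", "LifeIsBeautiful", "Award7"), ("ItalianHits", "Extra1"),
             ("AwardMovies", "Extra1"), ("Movie", "Extra1"), ("award", "Extra1", "Award2")}

print(answers(HEAD, BODY, ABOX))              # set()
print(certain_answers(HEAD, BODY, [M1, M2]))  # {('LifeIsBeautiful',)}
```

Extra1 is an answer in M2 only, so the intersection drops it, while LifeIsBeautiful survives because some award exists in every model.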

24
Query Answering Example 1

Consider Q1 over the following A-box:
Applying Q1 directly to the A-box would yield no answers (the award subgoal would not be matched).
However, ItalianHits(LifeIsBeautiful) implies that LifeIsBeautiful won at least one award.
Hence, LifeIsBeautiful should be in the answer!
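For this particular example, a different technique also works: materialize the consequences of the axioms (a "chase"), inventing a labeled null for the award that must exist, then evaluate the query and keep only answers without nulls. The sketch assumes Q1(x) :- Movie(x), award(x, y) and hard-codes two rules that we read off a5 and a6; it is not a general DL reasoner and does not handle reasoning by cases.

```python
from itertools import count


def chase(facts):
    """Saturate `facts` with two hypothetical rules read off a5 and a6:
         ItalianHits(x) -> AwardMovies(x)
         AwardMovies(x) -> Movie(x), and award(x, n) for some unknown n
    Unknown award objects become fresh labeled nulls "_:n1", "_:n2", ...
    """
    facts = set(facts)
    fresh = count(1)
    changed = True
    while changed:
        changed = False
        for fact in list(facts):
            derived = set()
            if fact[0] == "ItalianHits":
                derived.add(("AwardMovies", fact[1]))
            elif fact[0] == "AwardMovies":
                x = fact[1]
                derived.add(("Movie", x))
                if not any(f[0] == "award" and f[1] == x for f in facts):
                    derived.add(("award", x, f"_:n{next(fresh)}"))
            if not derived <= facts:
                facts |= derived
                changed = True
    return facts


def q1(facts):
    """Assumed Q1(x) :- Movie(x), award(x, y); a null may bind y but never x."""
    movies = {f[1] for f in facts if f[0] == "Movie"}
    awarded = {f[1] for f in facts if f[0] == "award"}
    return {x for x in movies & awarded if not x.startswith("_:")}


ABOX = {("ItalianHits", "LifeIsBeautiful")}
print(q1(ABOX))         # set(): no Movie or award facts are asserted
print(q1(chase(ABOX)))  # {'LifeIsBeautiful'}
```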

25
Query Answering Example 2

Consider Q2 with the following A-box:
Comedy(LaFunivia), director(LaFunivia, Antonioni), Italian(Antonioni)
Neither conjunctive query will yield an answer, because we know nothing about awards.
However, we can reason by cases to show that the following is entailed by Q2.

26
End of Example 2

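OK, we have:
But that's not enough to infer that LaFunivia should be in the answer.
However, LaFunivia is a comedy, hence a movie, and movies have at most one director, so Antonioni is LaFunivia's only director; since Antonioni is Italian, $(\forall director.Italian)(LaFunivia)$ holds.
Hence, LaFunivia is an answer to Q2.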
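The key step above, using "at most one director" to conclude that all of LaFunivia's directors are Italian, can be sketched as a sound but incomplete check. Assumptions: the bound `at_most` is supplied by hand (here derived from a2 and a4), facts are plain Python sets, and membership is checked only against asserted facts. The function name `all_fillers_in` is hypothetical.

```python
def all_fillers_in(individual, role, concept, at_most, role_facts, concept_facts):
    """Sound (incomplete) check that the A-box entails (forall role.concept)(individual).

    `at_most` is a bound on the number of role-fillers derived from the T-box
    (None if unknown). Under the unique names assumption the named fillers are
    pairwise distinct, so once `at_most` of them are named, no unnamed filler can
    exist in any model; it then suffices that each named filler is in `concept`.
    """
    named = {b for (a, b) in role_facts.get(role, set()) if a == individual}
    if at_most is None or len(named) < at_most:
        return False  # some model may add a filler outside `concept`
    if len(named) > at_most:
        raise ValueError("A-box violates the number restriction (inconsistent KB)")
    return named <= concept_facts.get(concept, set())


ROLE_FACTS = {"director": {("LaFunivia", "Antonioni")}}
CONCEPT_FACTS = {"Comedy": {"LaFunivia"}, "Italian": {"Antonioni"}}
# Comedy(LaFunivia), Comedy is included in Movie (a2), Movie has <= 1 director (a4): bound is 1.
print(all_fillers_in("LaFunivia", "director", "Italian", 1, ROLE_FACTS, CONCEPT_FACTS))     # True
print(all_fillers_in("LaFunivia", "director", "Italian", None, ROLE_FACTS, CONCEPT_FACTS))  # False
```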

27
Comparing DLs to OODB

Object-oriented databases
Also focus on unary and binary relations
OODBs are more focused on modeling the physical
aspects of objects and their properties
An object can only belong to a single (most
specific) class.
Description logics are about knowledge and
complex relationships
Class membership can be inferred
An individual can belong to multiple classes.

28
Comparing DLs to Relational Views

In principle, concept descriptions are view definitions.
Relational views employ selection, projection, join, and union, and apply to more than unary and binary relations.
DLs: universal quantification, number restrictions, intersection, ...
Subsumption corresponds to query containment.
Universal quantification and number restrictions would require negation in conjunctive queries; hence containment would be undecidable.
In DLs you can put facts directly in views (i.e., complex concepts).

29
Outline

Introduction to Knowledge Representation and its relevance to data integration
Description Logics: a family of KR languages
The Semantic Web and its languages
The Semantic Web and its languages

30
The Semantic Web

Basic idea: annotate content on Web pages with semantics.
Specify that a web page is about a restaurant, where the address appears on the page, and what the menu items are.
On a page with restaurant reviews, mark the restaurants with a global identifier so the review and restaurant data can be fused.
Without these annotations, systems need to infer these correlations and are often wrong.

31
Languages, Languages

RDF: Resource Description Framework
Language for marking up data
Triples with a few cool features
RDF Schema (RDFS): basic schema for RDF documents
OWL: Web Ontology Language. Comes in multiple flavors:
OWL Lite
OWL DL, OWL Full
All these languages are influenced by KR formalisms (some more, some less).

32
RDF Basics

RDF triples are statements about resources.
They are of the form (subject, predicate, value).
Names can get long (because they can be URIs), so we often use qnames (qualified names):
ex: instead of http://www.example.org/
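Qname expansion is simple string substitution, as the following Python sketch shows. The prefix table and the example triple are hypothetical. As a simplification, terms are plain strings, so a literal that happened to start with a declared prefix and a colon would be expanded by mistake; real RDF syntaxes mark literals explicitly.

```python
PREFIXES = {"ex": "http://www.example.org/"}


def expand(term, prefixes=PREFIXES):
    """Expand a qname such as "ex:LaStrada" into a full URI; leave other terms unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def expand_triple(triple, prefixes=PREFIXES):
    return tuple(expand(t, prefixes) for t in triple)


print(expand_triple(("ex:LaStrada", "ex:director", "ex:Fellini")))
# ('http://www.example.org/LaStrada', 'http://www.example.org/director', 'http://www.example.org/Fellini')
```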

33
RDF as a Graph
Uniform Resource Identifiers available beyond a
single data set.
34
Uniform Resource Identifiers

In a typical database, identifiers are used only internally; they have no meaning outside the database.
RDF uses URIs for subjects, predicates, and optionally for values.
Hence, multiple data sets can refer to the same identifier.
Key benefit for data integration!
Note: this does not entail standardization!
You're free to invent your own URIs, but you're encouraged to reuse existing ones so your data meshes well with others' data.
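To see why shared URIs help, here is a sketch of the restaurant scenario from earlier: a review data set and a restaurant directory, published independently, that happen to use the same URI for the restaurant. All names and values are hypothetical, and literals are plain strings. Under these assumptions, merging is plain set union and the join needs no key-mapping step; this holds only for graphs without blank nodes.

```python
EX = "http://www.example.org/"  # hypothetical namespace reused by both data sets

# Data set 1: a review site that identifies restaurants by URI.
reviews = {
    (EX + "review17", EX + "about", EX + "CafeRoma"),
    (EX + "review17", EX + "stars", "4"),
}
# Data set 2: an independently published directory that reuses the same URI.
directory = {
    (EX + "CafeRoma", EX + "address", "12 Main St"),
}


def objects(graph, subject, predicate):
    """All values v such that (subject, predicate, v) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


merged = reviews | directory  # no blank nodes, so union is a correct merge
addresses = {a
             for r in objects(merged, EX + "review17", EX + "about")
             for a in objects(merged, r, EX + "address")}
print(addresses)  # {'12 Main St'}
```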

35
Blank Nodes
Blank node
You can assign IDs to blank nodes, but they are
internal to a document.
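Because blank-node IDs are local to a document, merging two documents must rename blank nodes apart; otherwise two unrelated "_:x" nodes would be conflated. The sketch below assumes blank nodes are written "_:label" and that only subjects and objects can be blank (as in RDF); the fresh naming scheme "_:b0", "_:b1", ... is our own choice.

```python
from itertools import count


def is_blank(term):
    return term.startswith("_:")


def rename(term, renaming, fresh):
    """Map a blank node to a fresh name (stable within one document); keep other terms."""
    if not is_blank(term):
        return term
    if term not in renaming:
        renaming[term] = f"_:b{next(fresh)}"
    return renaming[term]


def merge(*graphs):
    """Merge RDF graphs, renaming each graph's blank nodes apart before the union."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # a new scope per document
        merged |= {(rename(s, renaming, fresh), p, rename(o, renaming, fresh))
                   for (s, p, o) in graph}
    return merged


g1 = {("_:x", "ex:name", "Alice"), ("_:x", "ex:city", "Madison")}
g2 = {("_:x", "ex:name", "Bob")}
print(len({s for (s, _, _) in merge(g1, g2)}))  # 2 distinct people
print(len({s for (s, _, _) in g1 | g2}))        # 1: naive union conflates them
```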
36
Reification

Reification is a way of making statements about statements:
provenance, uncertainty, date asserted, ...
To reify, make the statement itself into a resource.
Once reified, you can state its properties.
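A sketch of reification with the standard RDF vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) follows. Triples are plain Python tuples of full URIs; the example statement, the identifier ex:stmt1, and the provenance property ex:assertedBy are hypothetical. Note that the reified description does not itself assert the original triple.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://www.example.org/"  # hypothetical namespace


def reify(triple, statement):
    """Describe `triple` as the resource `statement` using the RDF reification vocabulary."""
    s, p, o = triple
    return {
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", s),
        (statement, RDF + "predicate", p),
        (statement, RDF + "object", o),
    }


claim = (EX + "LaStrada", EX + "director", EX + "Fellini")
graph = reify(claim, EX + "stmt1")
# Once reified, the statement can carry properties of its own, e.g. provenance.
graph.add((EX + "stmt1", EX + "assertedBy", EX + "S1"))
print(claim in graph)  # False: the original triple is described, not asserted
```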

37
RDF Schema

Enables declaring classes, subclass hierarchies, membership in a class, and restrictions on the domains and ranges of properties.
Important: a class can be an instance of another class!

38
RDFS: Declaring Properties

In RDFS you can declare sub-properties, and domains and ranges of properties.
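The inferences these declarations license can be sketched as forward chaining to a fixpoint. The sketch covers a subset of the RDFS entailment rules (rdfs2, rdfs3, rdfs5, rdfs7, rdfs9, rdfs11, as numbered in the RDF Semantics specification), uses qname strings for brevity, and ignores literal handling. The quadratic join per round keeps it short; it is not an efficient engine. The example triples (ex:oscar, ex:Award1, and so on) are hypothetical.

```python
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_closure(graph):
    """Forward-chain a subset of the RDFS entailment rules until nothing new is derived."""
    g = set(graph)
    while True:
        new = set()
        for (s, p, o) in g:
            for (s2, p2, o2) in g:
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))       # rdfs2: domain types the subject
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))       # rdfs3: range types the object
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))    # rdfs5: subPropertyOf is transitive
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))          # rdfs7: inherit the super-property
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))       # rdfs9: inherit the superclass type
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))   # rdfs11: subClassOf is transitive
        if new <= g:
            return g
        g |= new


G = {
    ("ex:Comedy", SUBCLASS, "ex:Movie"),
    ("ex:oscar", SUBPROP, "ex:award"),
    ("ex:award", DOMAIN, "ex:Movie"),
    ("ex:award", RANGE, "ex:Award"),
    ("ex:LifeIsBeautiful", TYPE, "ex:Comedy"),
    ("ex:LifeIsBeautiful", "ex:oscar", "ex:Award1"),
}
closure = rdfs_closure(G)
print(("ex:LifeIsBeautiful", "ex:award", "ex:Award1") in closure)  # True (rdfs7)
```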

39
OWL Web Ontology Language

Languages based on description logics, but without the unique names assumption.
sameAs and differentFrom specify whether two individuals are the same or different.
OWL Lite: intersection, number restrictions (but only with 0 or 1), universal and existential quantification on properties.
OWL DL: union, complement, disjointness, number restrictions, enumeration (Sunday, Monday, ...), hasValue (filler for a property value), and more.
OWL Full: OWL DL plus reification.
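Without the unique names assumption, two URIs may denote the same individual, and sameAs makes that explicit. One way to exploit it during integration is to collapse each sameAs-connected group of names onto a single representative, as sketched below with union-find. The union-find approach, the choice of the lexicographically smallest name as representative, and the example URIs are our assumptions; differentFrom is not handled.

```python
def merge_same_as(graph, same_as="owl:sameAs"):
    """Rewrite subjects and objects so each sameAs-connected group uses one name."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (s, p, o) in graph:
        if p == same_as:
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)  # smallest name stays the root

    return {(find(s), p, find(o)) for (s, p, o) in graph if p != same_as}


G = {
    ("a:LaStrada", "owl:sameAs", "b:film_42"),
    ("a:LaStrada", "ex:director", "ex:Fellini"),
    ("b:film_42", "ex:language", "Italian"),
}
for triple in sorted(merge_same_as(G)):
    print(triple)
```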

40
SPARQL: Querying RDF

Language based on matching of triples.
Borrows ideas from conjunctive queries and XQuery.

Result
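Matching a SPARQL basic graph pattern is conjunctive-query evaluation over triples. The sketch below solves the WHERE clause of the hypothetical query `SELECT ?movie ?dir WHERE { ?movie ex:director ?dir . ?dir rdf:type ex:Italian }` over a small hypothetical graph; it omits FILTER, OPTIONAL, and everything else beyond triple patterns.

```python
def match_bgp(patterns, graph, binding=None):
    """Yield every solution (variable -> term mapping) of a basic graph pattern.

    Patterns are (s, p, o) triples; terms starting with "?" are variables.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = dict(binding)
        ok = True
        for term, value in zip(patterns[0], triple):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_bgp(patterns[1:], graph, extended)


def select(variables, patterns, graph):
    """Project solutions onto `variables`; duplicates are kept, as SPARQL does without DISTINCT."""
    return sorted(tuple(b[v] for v in variables) for b in match_bgp(patterns, graph))


WHERE = [("?movie", "ex:director", "?dir"), ("?dir", "rdf:type", "ex:Italian")]
GRAPH = {
    ("ex:LaStrada", "ex:director", "ex:Fellini"),
    ("ex:Fellini", "rdf:type", "ex:Italian"),
    ("ex:Film9", "ex:director", "ex:Director9"),
}
print(select(["?movie", "?dir"], WHERE, GRAPH))  # [('ex:LaStrada', 'ex:Fellini')]
```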
41
SPARQL: The CONSTRUCT Clause
42
Summary of Chapter 12

Knowledge representation enables modeling complex relationships between classes and objects.
The languages of the Semantic Web apply these ideas to the Web context, with URIs.
The constant challenge: the trade-off between expressive power and the computational complexity of reasoning.
Question to ponder: can we live with a fast reasoning algorithm that occasionally misses some derivations?