Title: Learning Probabilistic Relational Models
1 Learning Probabilistic Relational Models
- Lise Getoor, Nir Friedman, Daphne Koller, and Avi Pfeffer
- Presented by Chi Eunkyung
- October 23, 2006
2 Contents
- Introduction
- Underlying framework
- Relational model
- Probabilistic relational model
- Parameter estimation
- Structure selection
- Implementation and experimental results
- Discussion and conclusion
3 Introduction
- Most real-world data are stored in relational DBMSs
- Goal: discovering patterns in structured data
5 Learning Statistical Models
- Traditional approaches
- work well with flat representations
- fixed-length attribute-value vectors
- assume independent, identically distributed (IID) samples
- Problems with flattening relational data
- introduces statistical skew
- loses relational structure
- incapable of detecting link-based patterns
- must fix attributes in advance
6 Underlying Framework
- Relational model
- Probabilistic relational model
7 Relational Model
[Schema diagram: Strain (Unique, Infectivity), Contact (Contact-Type, Close-Contact), Patient (Homeless, Age, HIV-Result, Ethnicity, Skin-Test, Disease-Site); relations: Strain infected-with Patient, Patient interacted-with Contact]
- Describes the types of objects and relations in the database
8 Probabilistic Relational Model
- PRMs conceptually extend Bayesian networks to allow the specification of a probability model for classes of objects rather than a fixed set of simple attributes
- PRMs also allow properties of an entity to depend probabilistically on properties of other related entities
9 Probabilistic Relational Model
[Dependency diagram over the schema: Strain (Unique), Patient (POB, Homeless, HIV-Result, Age, Disease Site), Contact (Contact-Type, Close-Contact, Transmitted)]
10 Probabilistic Relational Model
- Combines the advantages of relational logic and Bayesian networks
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
- Integrates uncertainty with the relational model
- properties of domain entities can depend on properties of related entities
- uncertainty over the relational structure of the domain
11 Mapping PRMs from Relational Models
- A relational model consists of a set of classes X1,…,Xn and a set of relations R1,…,Rm, where each relation Ri is typed
- Each class or entity type (corresponding to a single relational table) is associated with a set of attributes A(Xi) and a set of reference slots R(Xi)
12 PRM Semantics Continued
- Each attribute Aj ∈ A(Xi) takes on values in some fixed domain of possible values denoted V(Aj). We assume that value spaces are finite
- Attribute A of class X is denoted X.A
- For example, the Student class has an Intelligence attribute, and the value space or domain for Student.Intelligence might be {high, low}
13
- An instance I of a schema specifies a set of objects x, partitioned into classes, such that there is a value for each attribute x.A and a value for each reference slot x.ρ
- A(x) is used as shorthand for A(X), where x is of class X. For each object x in the instance and each of its attributes A, we use I_x.A to denote the value of x.A in I
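The schema machinery above (classes, finite value spaces V(A), reference slots) can be sketched in code. Only Student.Intelligence and its {high, low} domain come from the slides; every other attribute and slot name here is an illustrative assumption based on the registration example later in the deck.

```python
# Hedged sketch of a PRM-style schema encoding. Attribute and slot
# names other than Student.Intelligence are hypothetical.

class RelClass:
    def __init__(self, name, attributes, reference_slots=None):
        self.name = name
        # attribute -> finite value space V(A)
        self.attributes = attributes
        # reference slot -> name of the target class
        self.reference_slots = reference_slots or {}

Student = RelClass(
    "Student",
    {"Intelligence": ["high", "low"], "Ranking": ["average", "good"]},
)
Registration = RelClass(
    "Registration",
    {"Grade": ["A", "B", "C"], "Satisfaction": [1, 2, 3]},
    reference_slots={"Student": "Student", "Course": "Course"},
)

print(Student.attributes["Intelligence"])  # ['high', 'low']
```

An instance I would then assign each object a value from V(A) for every attribute and a target object for every reference slot.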
14
- Some attributes, such as name or social security number, are fully determined. Such attributes are labeled as fixed. We assume they are known in any instantiation of the schema
- The other attributes are called probabilistic
15
[ER diagram: Professor (Name, Popularity, Teaching-Ability), Student (Name, Intelligence, Ranking), Course (Name, Instructor, Difficulty, Rating), Registration (RegID, Course, Student, Grade, Satisfaction); one Professor instructs many Courses, and Registration links Students and Courses many-to-many]
16
- A skeleton structure σ of a relational schema is a partial specification of an instance of the schema. It specifies the set of objects Oσ(Xi) for each class, the values of the fixed attributes of these objects, and the relations that hold between the objects
- The values of probabilistic attributes are left unspecified
- A completion I of the skeleton structure σ extends the skeleton by also specifying the values of the probabilistic attributes
17 Relational Skeleton
18 The Completion Instance I
19 Another Relational Skeleton
[Object diagram: Students (Jane Doe: Intelligence high, Ranking average; John Doe: Intelligence ???, Ranking ???), Professors (Prof. Gump: Popularity high, Teaching-Ability ???; Prof. Vincent: Popularity ???, Teaching-Ability ???), Course (Phil201: Difficulty ???, Rating ???), Registrations (RegID 5639: Grade A, Satisfaction 3; RegID 5723: Grade ???, Satisfaction ???)]
- PRMs allow multiple possible skeletons
20 PRM with AU Semantics
[Diagram: the PRM plus a relational skeleton σ induce a ground network over the objects Strain s1, s2; Patient p1, p2, p3; Contact c1, c2, c3]
21 Learning PRMs
[Diagram: a database (Strain, Patient, Contact tables) together with its relational schema is input to learning, which produces a PRM over the same classes]
22 Parameter Estimation
- Assume a known dependency structure S
- Goal: estimate the PRM parameters θ
- entries in the local probability models
- θ is good if it is likely to generate the observed data, instance I
- MLE principle: choose θ so as to maximize the likelihood l(θ : I, S)
- As in Bayesian network learning, the crucial property is decomposition: separate terms for different X.A
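Because the likelihood decomposes by attribute, each local model can be estimated from its own counts, pooled over all objects of the class. A minimal sketch for one attribute with a single parent (the data layout and values are hypothetical):

```python
from collections import Counter, defaultdict

def mle_cpd(rows):
    """MLE for one local model P(X.A | parent).
    rows: (parent_value, attr_value) pairs pooled over all objects
    of class X. Returns a nested dict of conditional probabilities."""
    counts = defaultdict(Counter)
    for parent_val, attr_val in rows:
        counts[parent_val][attr_val] += 1
    return {pa: {a: n / sum(c.values()) for a, n in c.items()}
            for pa, c in counts.items()}

# Illustrative data: Grade conditioned on Intelligence.
rows = [("high", "A"), ("high", "A"), ("high", "B"), ("low", "B")]
cpd = mle_cpd(rows)
print(cpd["high"]["A"])  # 2/3
```

The decomposition property means this same routine runs independently for every X.A, which is what makes parameter estimation tractable.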
23
[Diagram: local probability models over Patient.HIV, Patient.DiseaseSite, Contact.CloseContact, and Contact.Transmitted]
24 Structure Selection
- Idea
- define scoring function
- do local search over legal structures
- Key Components
- legal structures
- evaluating different structures
- structure search
25 Structure Selection
- Key Components
- legal structures
- evaluating different structures
- structure search
26 Legal Structures
- A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic
[Diagram: Researcher (Prof. Gump, Reputation high) is author-of Paper P1 and Paper P2 (both Accepted yes); Reputation depends on an aggregate (sum) over Accepted]
- How do we guarantee that a PRM is acyclic for every skeleton?
27 PRM Dependency Structure S
- Class dependency graph: add an edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted
- A more flexible algorithm allows certain cycles along guaranteed-acyclic relations
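Checking the class dependency graph for cycles is a standard topological-sort test. A sketch using Kahn's algorithm (the edge encoding is an assumption; it does not cover the paper's relaxation for guaranteed-acyclic relations):

```python
from collections import deque

def is_acyclic(deps):
    """deps: dict mapping each attribute to the list of attributes it
    depends on. Returns True iff the dependency graph has no cycle,
    via Kahn's algorithm (repeatedly removing in-degree-0 nodes)."""
    nodes = set(deps) | {p for ps in deps.values() for p in ps}
    indeg = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for child, parents in deps.items():
        for p in parents:
            indeg[child] += 1
            children[p].append(child)
    q = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while q:
        n = q.popleft()
        removed += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                q.append(c)
    return removed == len(nodes)  # all nodes removed -> no cycle

print(is_acyclic({"Researcher.Reputation": ["Paper.Accepted"]}))  # True
print(is_acyclic({"A": ["B"], "B": ["A"]}))                       # False
```

If the class-level graph is acyclic, the induced ground network is acyclic for every skeleton, which is exactly the guarantee the slide asks for.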
28 Structure Selection
- Key Components
- legal structures
- evaluating different structures
- structure search
29 Evaluating Different Structures
- The Bayesian score of a structure S is defined as the posterior probability of the structure given the data I
- This is the standard approach to scoring models used in Bayesian network learning
30
- Using Bayes' rule: P(S | I, σ) ∝ P(I | S, σ) · P(S | σ)
- The marginal likelihood P(I | S, σ) is the crucial component
- it has the effect of penalizing models with a large number of parameters
- thus this score automatically balances the complexity of the structure with its fit to the data
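The building block of this score is the marginal likelihood of one multinomial attribute under a Dirichlet prior, which has a closed form in terms of gamma functions. A sketch (the symmetric prior with alpha = 1 is an illustrative assumption, not the paper's choice of hyperparameters):

```python
from math import lgamma

def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of observed value counts N_1..N_K for one
    multinomial under a symmetric Dirichlet(alpha) prior:
    log [ Gamma(K*a)/Gamma(K*a + N) * prod_k Gamma(a + N_k)/Gamma(a) ]"""
    K, N = len(counts), sum(counts)
    score = lgamma(K * alpha) - lgamma(K * alpha + N)
    for n_k in counts:
        score += lgamma(alpha + n_k) - lgamma(alpha)
    return score

# Concentrated data scores higher than spread-out data of the same size;
# averaging over parameters penalizes needless flexibility automatically.
print(log_marginal_likelihood([8, 0]) > log_marginal_likelihood([4, 4]))  # True
```

Summing such terms over all attributes (and parent configurations) gives the decomposed Bayesian score that the search in the next slides maximizes.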
31
- Key Components
- legal structures
- evaluating different structures
- structure search
32 Structure Search
- Greedy hill-climbing search
- the simplest heuristic search algorithm
- local maxima can be dealt with using random restarts
- But:
- there are infinitely many possible structures
- each step requires expensive database operations
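The greedy hill-climbing loop itself is simple; the cost lies in the `neighbors` (legal structure moves) and `score` functions, shown here as placeholders with a toy numeric example standing in for real structures:

```python
def hill_climb(start, neighbors, score):
    """Greedy ascent: repeatedly move to the best-scoring neighbor,
    stopping at a local maximum. Random restarts (not shown) would
    rerun this from perturbed starting structures."""
    cur, cur_score = start, score(start)
    while True:
        best = max(neighbors(cur), key=score, default=None)
        if best is None or score(best) <= cur_score:
            return cur  # local maximum reached
        cur, cur_score = best, score(best)

# Toy stand-in: "structures" are integers, score peaks at 7,
# neighbors are single-step edits.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
print(result)  # 7
```

In the PRM setting, `neighbors` would generate edge additions, deletions, and reversals that keep the dependency graph legal, and `score` would be the Bayesian score above.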
33 Alternative: Phased Search of the Model Space
- At each phase k, we have a set of potential parents Potk(X.A) for each attribute X.A
- Then apply a standard structure search restricted to the space of structures in which the parents of each X.A are in Potk(X.A)
- Phased search
- first explores dependencies within objects,
- then between objects that are directly related,
- then between objects that are two links apart, etc.
34 Advantages of Phased Search
- gradually explores larger and larger fragments of the infinitely large space
- can give priority to dependencies between objects that are more closely related
- can precompute the database view corresponding to X.A and Potk(X.A)
- most of the expensive computations (the joins and aggregation required in the definition of the parents) are precomputed in these views
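The phased growth of Potk(X.A) can be sketched as a breadth-first expansion over reference slots. This simplified version follows only forward slots (the paper's construction also uses inverse slots and aggregates), and the schema encoding is a hypothetical one:

```python
def potential_parents(attr, schema, phase):
    """Pot_k(X.A) sketch: attributes of classes reachable from attr's
    class through at most `phase` reference slots.
    schema: {class: {"attrs": [...], "refs": {slot: target_class}}}."""
    cls = attr.split(".")[0]
    frontier, reached = {cls}, {cls}
    for _ in range(phase):  # expand one link per phase
        nxt = set()
        for c in frontier:
            nxt |= set(schema[c]["refs"].values())
        frontier = nxt - reached
        reached |= nxt
    return {f"{c}.{a}" for c in reached for a in schema[c]["attrs"]} - {attr}

schema = {
    "Student": {"attrs": ["Intelligence", "Ranking"], "refs": {}},
    "Registration": {"attrs": ["Grade", "Satisfaction"],
                     "refs": {"Student": "Student", "Course": "Course"}},
    "Course": {"attrs": ["Difficulty", "Rating"], "refs": {}},
}
# Phase 0: same-object attributes only.
print(sorted(potential_parents("Registration.Grade", schema, 0)))
# Phase 1 additionally reaches Student and Course attributes.
print("Course.Difficulty" in potential_parents("Registration.Grade", schema, 1))
```

Each phase's restricted parent sets correspond directly to the precomputed database views described above.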
35 Implementation and Experimental Results
- Simple artificial genetic database domain
- Construct training sets of various sizes
- Compare the log-likelihood of a test set of size 100,000 for:
- the gold-standard model
- learned parameters (model structure given)
- learned model (both structure and parameters learned)
36
[Diagram: genetics domain. Father and Mother Person objects each have P-chromosome, M-chromosome, and Blood Type; a child Person's chromosomes depend on the parents'; a Blood Test has Contaminated and Result attributes depending on the person's blood type]
37 Experimental Results
38 Discussion and Conclusion
- Scaling these ideas to large databases
- How to determine the probability distribution when there is an unbound variable
- Treatment of missing values and hidden values, and further automatic discovery of hidden values
- We would want these techniques to help us automatically discover interesting entities and relationships that hold in the world
39 Thank you!