Title: Learning Probabilistic Relational Models
1 Learning Probabilistic Relational Models
- Lise Getoor, Nir Friedman, Daphne Koller, and Avi Pfeffer
- Presented by Chi Eunkyung
- October 23, 2006
2 Contents
- Introduction
- Underlying framework
- Relational model
- Probabilistic relational model
- Parameter estimation
- Structure selection
- Implementation and experimental results
- Discussion and conclusion
3 Introduction
- Most real-world data are stored in relational DBMSs
- Goal: discovering patterns in structured data
5 Learning Statistical Models
- Traditional approaches
- work well with flat representations
- fixed-length attribute-value vectors
- assume independent, identically distributed (IID) samples
- Problems with flattening relational data
- introduces statistical skew
- loses relational structure
- incapable of detecting link-based patterns
- must fix attributes in advance
6 Underlying Framework
- Relational model
- Probabilistic relational model
7 Relational Model
[Schema diagram: Strain (Unique, Infectivity), Contact (Contact-Type, Close-Contact), Patient (Homeless, Age, HIV-Result, Ethnicity, Skin-Test, Disease-Site); relations: Strain infected-with Patient, Patient interacted-with Contact]
- Describes the types of objects and relations in the database
8 Probabilistic Relational Model
- PRMs conceptually extend Bayesian networks to allow the specification of a probability model for classes of objects rather than a fixed set of simple attributes
- PRMs also allow properties of an entity to depend probabilistically on properties of other related entities
9 Probabilistic Relational Model
[Dependency diagram over the schema: Strain (Unique), Patient (POB, Homeless, HIV-Result, Age, Disease Site), Contact (Contact-Type, Close-Contact, Transmitted)]
10 Probabilistic Relational Model
- Combines the advantages of relational logic and Bayesian networks
- natural domain modeling: objects, properties, relations
- generalization over a variety of situations
- compact, natural probability models
- Integrates uncertainty with the relational model
- properties of domain entities can depend on properties of related entities
- uncertainty over the relational structure of the domain
11 Mapping PRMs from Relational Models
- A relational model consists of a set of classes X1,…,Xn and a set of relations R1,…,Rm, where each relation Ri is typed
- Each class or entity type (corresponding to a single relational table) is associated with a set of attributes A(Xi) and a set of reference slots R(Xi)
12 PRM Semantics Continued
- Each attribute Aj ∈ A(Xi) takes on values in some fixed domain of possible values denoted V(Aj). We assume that value spaces are finite
- Attribute A of class X is denoted X.A
- For example, the Student class has an Intelligence attribute, and the value space or domain for Student.Intelligence might be {high, low}
13
- An instance I of a schema specifies a set of objects x, partitioned into classes, such that there is a value for each attribute x.A and a value for each reference slot x.ρ
- A(x) is used as shorthand for A(X), where x is of class X. For each object x in the instance and each of its attributes A, we use I_x.A to denote the value of x.A in I
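The schema machinery above (classes, finite value spaces V(A), reference slots) can be sketched in code. Only Student.Intelligence and its {high, low} domain come from the slides; every other attribute and slot name here is an illustrative assumption based on the registration example later in the deck.

```python
# Hedged sketch of a PRM-style schema encoding. Attribute and slot
# names other than Student.Intelligence are hypothetical.

class RelClass:
    def __init__(self, name, attributes, reference_slots=None):
        self.name = name
        # attribute -> finite value space V(A)
        self.attributes = attributes
        # reference slot -> name of the target class
        self.reference_slots = reference_slots or {}

Student = RelClass(
    "Student",
    {"Intelligence": ["high", "low"], "Ranking": ["average", "good"]},
)
Registration = RelClass(
    "Registration",
    {"Grade": ["A", "B", "C"], "Satisfaction": [1, 2, 3]},
    reference_slots={"Student": "Student", "Course": "Course"},
)

print(Student.attributes["Intelligence"])  # ['high', 'low']
```

An instance I would then assign each object a value from V(A) for every attribute and a target object for every reference slot.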
14
- Some attributes, such as name or social security number, are fully determined. Such attributes are labeled as fixed. We assume they are known in any instantiation of the schema
- The other attributes are called probabilistic
15
[ER diagram: Professor (Name, Popularity, Teaching-Ability), Student (Name, Intelligence, Ranking), Course (Name, Instructor, Difficulty, Rating), Registration (RegID, Course, Student, Grade, Satisfaction); one Professor instructs many Courses, and Registration links Students and Courses many-to-many]
16
- A skeleton structure σ of a relational schema is a partial specification of an instance of the schema. It specifies the set of objects Oσ(Xi) for each class, the values of the fixed attributes of these objects, and the relations that hold between the objects
- The values of probabilistic attributes are left unspecified
- A completion I of the skeleton structure σ extends the skeleton by also specifying the values of the probabilistic attributes
17 Relational Skeleton
18 The Completion Instance I
19 Another Relational Skeleton
[Object diagram: Students (Jane Doe: Intelligence high, Ranking average; John Doe: Intelligence ???, Ranking ???), Professors (Prof. Gump: Popularity high, Teaching-Ability ???; Prof. Vincent: Popularity ???, Teaching-Ability ???), Course (Phil201: Difficulty ???, Rating ???), Registrations (RegID 5639: Grade A, Satisfaction 3; RegID 5723: Grade ???, Satisfaction ???)]
- PRMs allow multiple possible skeletons
20 PRM with AU Semantics
[Diagram: the PRM plus a relational skeleton σ induce a ground network over the objects Strain s1, s2; Patient p1, p2, p3; Contact c1, c2, c3]
21 Learning PRMs
[Diagram: a database (Strain, Patient, Contact tables) together with its relational schema is input to learning, which produces a PRM over the same classes]
22 Parameter Estimation
- Assume a known dependency structure S
- Goal: estimate the PRM parameters θ
- entries in the local probability models
- θ is good if it is likely to generate the observed data, instance I
- MLE principle: choose θ so as to maximize the likelihood l(θ : I, S)
- As in Bayesian network learning, the crucial property is decomposition: separate terms for different X.A
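Because the likelihood decomposes by attribute, each local model can be estimated from its own counts, pooled over all objects of the class. A minimal sketch for one attribute with a single parent (the data layout and values are hypothetical):

```python
from collections import Counter, defaultdict

def mle_cpd(rows):
    """MLE for one local model P(X.A | parent).
    rows: (parent_value, attr_value) pairs pooled over all objects
    of class X. Returns a nested dict of conditional probabilities."""
    counts = defaultdict(Counter)
    for parent_val, attr_val in rows:
        counts[parent_val][attr_val] += 1
    return {pa: {a: n / sum(c.values()) for a, n in c.items()}
            for pa, c in counts.items()}

# Illustrative data: Grade conditioned on Intelligence.
rows = [("high", "A"), ("high", "A"), ("high", "B"), ("low", "B")]
cpd = mle_cpd(rows)
print(cpd["high"]["A"])  # 2/3
```

The decomposition property means this same routine runs independently for every X.A, which is what makes parameter estimation tractable.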
23
[Diagram: local probability models over Patient.HIV, Patient.DiseaseSite, Contact.CloseContact, and Contact.Transmitted]
24 Structure Selection
- Idea
- define scoring function
- do local search over legal structures
- Key Components
- legal structures
- evaluating different structures
- structure search
25 Structure Selection
- Key Components
- legal structures
- evaluating different structures
- structure search
26 Legal Structures
- A PRM defines a coherent probability model over a skeleton σ if the dependencies between object attributes are acyclic
[Diagram: Researcher (Prof. Gump, Reputation high) is author-of Paper P1 and Paper P2 (both Accepted yes); Reputation depends on an aggregate (sum) over Accepted]
- How do we guarantee that a PRM is acyclic for every skeleton?
27 PRM Dependency Structure S
- Class dependency graph: add an edge Paper.Accepted → Researcher.Reputation if Researcher.Reputation depends directly on Paper.Accepted
- A more flexible algorithm allows certain cycles along guaranteed-acyclic relations
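Checking the class dependency graph for cycles is a standard topological-sort test. A sketch using Kahn's algorithm (the edge encoding is an assumption; it does not cover the paper's relaxation for guaranteed-acyclic relations):

```python
from collections import deque

def is_acyclic(deps):
    """deps: dict mapping each attribute to the list of attributes it
    depends on. Returns True iff the dependency graph has no cycle,
    via Kahn's algorithm (repeatedly removing in-degree-0 nodes)."""
    nodes = set(deps) | {p for ps in deps.values() for p in ps}
    indeg = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for child, parents in deps.items():
        for p in parents:
            indeg[child] += 1
            children[p].append(child)
    q = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while q:
        n = q.popleft()
        removed += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                q.append(c)
    return removed == len(nodes)  # all nodes removed -> no cycle

print(is_acyclic({"Researcher.Reputation": ["Paper.Accepted"]}))  # True
print(is_acyclic({"A": ["B"], "B": ["A"]}))                       # False
```

If the class-level graph is acyclic, the induced ground network is acyclic for every skeleton, which is exactly the guarantee the slide asks for.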
28 Structure Selection
- Key Components
- legal structures
- evaluating different structures
- structure search
29 Evaluating Different Structures
- The Bayesian score of a structure S is defined as the posterior probability of the structure given the data I
- This is the standard approach to scoring models used in Bayesian network learning
30
- Using Bayes' rule: P(S | I, σ) ∝ P(I | S, σ) · P(S | σ)
- The marginal likelihood P(I | S, σ) is the crucial component
- it has the effect of penalizing models with a large number of parameters
- thus this score automatically balances the complexity of the structure with its fit to the data
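The building block of this score is the marginal likelihood of one multinomial attribute under a Dirichlet prior, which has a closed form in terms of gamma functions. A sketch (the symmetric prior with alpha = 1 is an illustrative assumption, not the paper's choice of hyperparameters):

```python
from math import lgamma

def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of observed value counts N_1..N_K for one
    multinomial under a symmetric Dirichlet(alpha) prior:
    log [ Gamma(K*a)/Gamma(K*a + N) * prod_k Gamma(a + N_k)/Gamma(a) ]"""
    K, N = len(counts), sum(counts)
    score = lgamma(K * alpha) - lgamma(K * alpha + N)
    for n_k in counts:
        score += lgamma(alpha + n_k) - lgamma(alpha)
    return score

# Concentrated data scores higher than spread-out data of the same size;
# averaging over parameters penalizes needless flexibility automatically.
print(log_marginal_likelihood([8, 0]) > log_marginal_likelihood([4, 4]))  # True
```

Summing such terms over all attributes (and parent configurations) gives the decomposed Bayesian score that the search in the next slides maximizes.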
31
- Key Components
- legal structures
- evaluating different structures
- structure search
32 Structure Search
- Greedy hill-climbing search
- the simplest heuristic search algorithm
- local maxima can be dealt with using random restarts
- But:
- there are infinitely many possible structures
- each step requires expensive database operations
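The greedy hill-climbing loop itself is simple; the cost lies in the `neighbors` (legal structure moves) and `score` functions, shown here as placeholders with a toy numeric example standing in for real structures:

```python
def hill_climb(start, neighbors, score):
    """Greedy ascent: repeatedly move to the best-scoring neighbor,
    stopping at a local maximum. Random restarts (not shown) would
    rerun this from perturbed starting structures."""
    cur, cur_score = start, score(start)
    while True:
        best = max(neighbors(cur), key=score, default=None)
        if best is None or score(best) <= cur_score:
            return cur  # local maximum reached
        cur, cur_score = best, score(best)

# Toy stand-in: "structures" are integers, score peaks at 7,
# neighbors are single-step edits.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
print(result)  # 7
```

In the PRM setting, `neighbors` would generate edge additions, deletions, and reversals that keep the dependency graph legal, and `score` would be the Bayesian score above.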
33 Alternative: Phased Search of the Model Space
- At each phase k, we have a set of potential parents Potk(X.A) for each attribute X.A
- Then apply a standard structure search restricted to the space of structures in which the parents of each X.A are in Potk(X.A)
- Phased search
- first explores dependencies within objects,
- then between objects that are directly related,
- then between objects that are two links apart, etc.
34 Advantages of Phased Search
- gradually explores larger and larger fragments of the infinitely large space
- can give priority to dependencies between objects that are more closely related
- can precompute the database view corresponding to X.A and Potk(X.A)
- most of the expensive computations (the joins and aggregation required in the definition of the parents) are precomputed in these views
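The phased growth of Potk(X.A) can be sketched as a breadth-first expansion over reference slots. This simplified version follows only forward slots (the paper's construction also uses inverse slots and aggregates), and the schema encoding is a hypothetical one:

```python
def potential_parents(attr, schema, phase):
    """Pot_k(X.A) sketch: attributes of classes reachable from attr's
    class through at most `phase` reference slots.
    schema: {class: {"attrs": [...], "refs": {slot: target_class}}}."""
    cls = attr.split(".")[0]
    frontier, reached = {cls}, {cls}
    for _ in range(phase):  # expand one link per phase
        nxt = set()
        for c in frontier:
            nxt |= set(schema[c]["refs"].values())
        frontier = nxt - reached
        reached |= nxt
    return {f"{c}.{a}" for c in reached for a in schema[c]["attrs"]} - {attr}

schema = {
    "Student": {"attrs": ["Intelligence", "Ranking"], "refs": {}},
    "Registration": {"attrs": ["Grade", "Satisfaction"],
                     "refs": {"Student": "Student", "Course": "Course"}},
    "Course": {"attrs": ["Difficulty", "Rating"], "refs": {}},
}
# Phase 0: same-object attributes only.
print(sorted(potential_parents("Registration.Grade", schema, 0)))
# Phase 1 additionally reaches Student and Course attributes.
print("Course.Difficulty" in potential_parents("Registration.Grade", schema, 1))
```

Each phase's restricted parent sets correspond directly to the precomputed database views described above.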
35 Implementation and Experimental Results
- Simple artificial genetic database domain
- Construct training sets of various sizes
- Compare the log-likelihood of a test set of size 100,000 for:
- the gold-standard model
- learned parameters (model structure given)
- learned model (both structure and parameters learned)
36
[Diagram: genetics domain. Father and Mother Person objects each have P-chromosome, M-chromosome, and Blood Type; a child Person's chromosomes depend on the parents'; a Blood Test has Contaminated and Result attributes depending on the person's blood type]
37 Experimental Results
38 Discussion and Conclusion
- Scaling these ideas to large databases
- How to determine the probability distribution when there is an unbound variable
- Treatment of missing values and hidden values, and further automatic discovery of hidden values
- We would want these techniques to help us automatically discover interesting entities and relationships that hold in the world
39 Thank you!