Learning Probabilistic Relational Models - PowerPoint PPT Presentation

1
Learning Probabilistic Relational Models
  • Lise Getoor, Nir Friedman, Daphne Koller, and Avi
    Pfeffer
  • Presented by Chi Eunkyung
  • October 23, 2006

2
Contents
  • Introduction
  • Underlying framework
  • Relational model
  • Probabilistic relational model
  • Parameter estimation
  • Structure selection
  • Implementation and experimental results
  • Discussion and conclusion

3
Introduction
  • Most real-world data are stored in relational
    DBMSs
  • Discovering Patterns in Structured Data

5
Learning Statistical Models
  • Traditional approaches
    • work well with flat representations
    • use fixed-length attribute-value vectors
    • assume an independent (IID) sample
  • Problems
    • introduces statistical skew
    • loses relational structure
    • incapable of detecting link-based patterns
    • must fix attributes in advance

6
Underlying framework
  • Relational model
  • Probabilistic relational model

7
Relational Model
[Figure: relational schema. Classes: Strain, Contact, Patient. Relations: Infected with, Interacted with. Attribute labels: Unique, Infectivity, Contact-Type, Close-Contact, Skin-Test, Homeless, Age, HIV-Result, Ethnicity, Disease-Site.]
  • Describes the types of objects and relations in
    the database

8
Probabilistic Relational Model
  • PRMs conceptually extend Bayesian networks to
    allow the specification of a probability model
    for classes of objects rather than a fixed set of
    simple attributes
  • PRMs also allow properties of an entity to depend
    probabilistically on properties of other related
    entities

9
Probabilistic Relational Model
[Figure: PRM dependency structure over the classes Strain, Patient, and Contact. Attribute labels: Unique, POB, Homeless, HIV-Result, Age, Disease Site, Contact-Type, Close-Contact, Transmitted.]

10
Probabilistic Relational Model
  • Combine advantages of relational logic and Bayesian
    networks
    • natural domain modeling: objects, properties,
      relations
    • generalization over a variety of situations
    • compact, natural probability models
  • Integrate uncertainty with relational model
    • properties of domain entities can depend on
      properties of related entities
    • uncertainty over relational structure of domain

11
Mapping PRMs from Relational Models
  • A relational model consists of a set of classes
    X1, ..., Xn and a set of relations R1, ..., Rm, where
    each relation Ri is typed
  • Each class or entity type (corresponding to a
    single relational table) is associated with a set
    of attributes A(Xi) and a set of reference slots
    R(Xi)

12
PRM Semantics Continued
  • Each attribute Aj ∈ A(Xi) takes on values in some
    fixed domain of possible values denoted V(Aj).
    We assume that value spaces are finite
  • Attribute A of class X is denoted X.A
  • For example, the Student class has an
    Intelligence attribute, and the value space or
    domain for Student.Intelligence might be {high,
    low}

13
  • An instance I of a schema specifies a set of
    objects x, partitioned into classes, such that
    there is a value for each attribute x.A and a
    value for each reference slot x.ρ
  • A(x) is used as a shorthand for A(X), where x is
    of class X. For each object x in the instance
    and each of its attributes A, we use I_{x.A} to
    denote the value of x.A in I

14
  • Some attributes, such as name or social security
    number, are fully determined. Such attributes
    are labeled as fixed. Assume that they are known
    in any instantiation of the schema
  • The other attributes are called probabilistic

15
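[Figure: relational schema with four classes. Professor: Name, Popularity, Teaching-Ability. Student: Name, Intelligence, Ranking. Course: Name, Instructor, Rating, Difficulty. Registration: RegID, Course, Student, Grade, Satisfaction. The relations between classes carry the cardinality labels 1 and M.]
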
16
  • A skeleton structure σ of a relational schema is
    a partial specification of an instance of the
    schema. It specifies the set of objects O^σ(Xi)
    for each class, the values of the fixed
    attributes of these objects, and the relations
    that hold between the objects
  • The values of probabilistic attributes are left
    unspecified
  • A completion I of the skeleton structure σ
    extends the skeleton by also specifying the
    values of the probabilistic attributes
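
The difference between a skeleton and a completion can be sketched in Python. This is only an illustration, not the authors' implementation: the names `SCHEMA`, `Obj`, and `is_completion` are hypothetical, and the objects are borrowed from the Student/Registration example used elsewhere in these slides.

```python
from dataclasses import dataclass, field

# Probabilistic attributes per class (fixed attributes such as Name live in `fixed`).
SCHEMA = {
    "Student": ["Intelligence", "Ranking"],
    "Registration": ["Grade", "Satisfaction"],
}

@dataclass
class Obj:
    cls: str                                    # class of the object, e.g. "Student"
    fixed: dict                                 # fixed attributes, known in every instance
    slots: dict = field(default_factory=dict)   # reference slot -> key of the related object
    values: dict = field(default_factory=dict)  # probabilistic attributes (may be missing)

def is_completion(objects):
    """True if every probabilistic attribute of every object has a value."""
    return all(
        attr in obj.values
        for obj in objects.values()
        for attr in SCHEMA[obj.cls]
    )

# Skeleton: objects, fixed attributes, and relations only.
skeleton = {
    "jane": Obj("Student", {"Name": "Jane Doe"}),
    "r5639": Obj("Registration", {"RegID": 5639}, slots={"Student": "jane"}),
}

# Completion: the same skeleton plus values for the probabilistic attributes.
completion = {
    "jane": Obj("Student", {"Name": "Jane Doe"},
                values={"Intelligence": "high", "Ranking": "average"}),
    "r5639": Obj("Registration", {"RegID": 5639}, slots={"Student": "jane"},
                 values={"Grade": "A", "Satisfaction": 3}),
}
```

Here `is_completion(skeleton)` is false and `is_completion(completion)` is true; a PRM would define a distribution over all completions of a given skeleton.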

17
Relational Skeleton
18
The Completion Instance I
19
Another Relational Skeleton
[Figure: objects in the skeleton; ??? marks an unspecified value]
  • Student: Name = Jane Doe, Intelligence = high, Ranking = average
  • Student: Name = John Doe, Intelligence = ???, Ranking = ???
  • Professor: Name = Prof. Vincent, Popularity = ???, Teaching-Ability = ???
  • Professor: Name = Prof. Gump, Popularity = high, Teaching-Ability = ???
  • Course: Name = Phil201, Difficulty = ???, Rating = ???
  • Registration: RegID = 5639, Grade = A, Satisfaction = 3
  • Registration: RegID = 5723, Grade = ???, Satisfaction = ???
PRMs allow multiple possible skeletons

20
PRM with AU Semantics
[Figure: a PRM combined with a relational skeleton σ made up of the objects Strain s1, s2; Patient p1, p2, p3; Contact c1, c2, c3.]


21
Learning PRMs
[Figure: a Database and a Relational Schema over Strain, Patient, and Contact, and a PRM over the same three classes.]
  • Parameter estimation
  • Structure selection

22
Parameter Estimation
  • Assume known dependency structure S
  • Goal: estimate PRM parameters θ
    • entries in local probability models
  • θ is good if it is likely to generate the
    observed data, instance I
  • MLE principle: choose θ so as to maximize the
    likelihood l of the observed instance I

As in Bayesian network learning, the crucial
property is decomposition: separate terms for
different X.A
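
Because the likelihood splits into one term per attribute X.A, each local probability model can be estimated from counts on its own. The sketch below does this for a single dependency, Registration.Grade given the Intelligence of the registered student. The data and the helper name `mle_cpd` are made up for illustration and do not come from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical complete instance: each registration refers to a student via a reference slot.
students = {"s1": {"Intelligence": "high"}, "s2": {"Intelligence": "low"}}
registrations = [
    {"Student": "s1", "Grade": "A"},
    {"Student": "s1", "Grade": "A"},
    {"Student": "s1", "Grade": "B"},
    {"Student": "s2", "Grade": "B"},
    {"Student": "s2", "Grade": "C"},
]

def mle_cpd(rows, child, parent_of):
    """Maximum-likelihood estimate of P(child | parent) as normalized counts."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[parent_of(row)][row[child]] += 1
    return {
        pa: {val: n / sum(c.values()) for val, n in c.items()}
        for pa, c in counts.items()
    }

# The parent value is reached by following the Student reference slot.
cpd = mle_cpd(registrations, "Grade",
              lambda r: students[r["Student"]]["Intelligence"])
```

In a database setting the counting loop would correspond to a join followed by a group-by count; the in-memory dictionaries here simply stand in for the tables.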
23
[Figure: classes Patient and Contact with the attributes HIV, DiseaseSite, CloseContact, and Transmitted.]

24
Structure Selection
  • Idea
    • define scoring function
    • do local search over legal structures
  • Key Components
    • legal structures
    • evaluating different structures
    • structure search

25
Structure Selection
  • Key Components
    • legal structures
    • evaluating different structures
    • structure search

26
Legal structures
  • A PRM defines a coherent probability model over a
    skeleton σ if the dependencies between object
    attributes are acyclic

[Figure: Researcher Prof. Gump (Reputation = high) is connected by author-of to Paper P1 (Accepted = yes) and Paper P2 (Accepted = yes); the label "sum" marks an aggregate over the papers.]
How do we guarantee that a PRM is acyclic for
every skeleton?
27
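Dependency graph
  • From the PRM dependency structure S, build a
    class-level dependency graph
  • The graph has an edge from Paper.Accepted to
    Researcher.Reputation if Researcher.Reputation
    depends directly on Paper.Accepted
  • If the dependency graph is acyclic, the PRM is
    acyclic for every skeleton
  • The algorithm is more flexible: it allows certain
    cycles along guaranteed acyclic relations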
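
A class-level acyclicity check can be sketched as a depth-first search over (parent, child) attribute pairs. This covers only the basic test; it does not model the more flexible treatment of guaranteed acyclic relations, and the function name `is_acyclic` is a hypothetical choice for this sketch.

```python
def is_acyclic(edges):
    """Depth-first cycle check on a directed graph given as (parent, child) pairs."""
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, []).append(child)
        graph.setdefault(child, [])
    WHITE, GREY, BLACK = 0, 1, 2            # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:          # back edge: a cycle exists
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(visit(n) for n in graph if color[n] == WHITE)

legal = [("Paper.Accepted", "Researcher.Reputation")]
illegal = legal + [("Researcher.Reputation", "Paper.Accepted")]
```

With these inputs `is_acyclic(legal)` holds, while `is_acyclic(illegal)` does not, because reputation and acceptance would then depend on each other.
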
28
Structure Selection
  • Key Components
    • legal structures
    • evaluating different structures
    • structure search

29
Evaluating different structures
  • The Bayesian score of a structure S is defined as
    the posterior probability of the structure given
    the data I
  • Bayesian approach
    • standard approach to scoring models used in
      Bayesian network learning

30
  • Using Bayes' rule
    • P(S | I, σ) ∝ P(I | S, σ) P(S | σ)
  • Marginal likelihood P(I | S, σ)
    • crucial component
    • has the effect of penalizing models with a large
      number of parameters
    • thus this score automatically balances the
      complexity of the structure with its fit to the
      data
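
The penalizing effect of the marginal likelihood can be seen in a small numeric sketch. It assumes a symmetric Dirichlet prior over each multinomial, a common choice in Bayesian network learning that these slides do not state explicitly; the function names and the counts are invented for illustration.

```python
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of one multinomial under a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    score = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        score += lgamma(alpha + c) - lgamma(alpha)
    return score

def family_score(table):
    """Sum over parent configurations; `table` maps parent value -> child value counts."""
    return sum(log_marginal(c) for c in table.values())

# Case 1: the child looks independent of the candidate parent.
no_parent = family_score({None: [10, 10]})
with_parent = family_score({"y0": [5, 5], "y1": [5, 5]})

# Case 2: the child is fully determined by the candidate parent.
dep_no_parent = family_score({None: [10, 10]})
dep_with_parent = family_score({"y0": [10, 0], "y1": [0, 10]})
```

In case 1 the structure without the parent scores higher, since the extra parameters buy no better fit. In case 2 the structure with the parent scores higher, since the improved fit outweighs the added complexity.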

31
  • Key Components
    • legal structures
    • evaluating different structures
    • structure search

32
Structure search
  • Greedy hill-climbing search
    • the simplest heuristic search algorithm
    • local maxima can be dealt with using random
      restarts
  • But
    • there are infinitely many possible structures
    • requires expensive database operations

33
Alternative search model space
  • At each phase k, we have a set of potential
    parents Pot_k(X.A) for each attribute X.A
  • Then apply a standard structure search restricted
    to the space of structures in which the parents
    of each X.A are in Pot_k(X.A)
  • Phased search
    • it first explores dependencies within objects,
    • then between objects that are directly related,
    • then between objects that are two links apart, etc.
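
The phased search can be sketched for a single attribute as a greedy loop that is re-run each time the set of potential parents grows. This is a simplified illustration under several assumptions: it only adds parents (a full search would also consider deleting and reversing edges and would check legality), and `potential`, `toy_score`, and the `GAIN` table are hypothetical stand-ins for the real potential-parent sets and the Bayesian score.

```python
def greedy_add_parents(candidates, parents, score):
    """Hill-climb: repeatedly add the candidate parent that most improves the score."""
    parents = set(parents)
    best = score(parents)
    improved = True
    while improved:
        improved = False
        for cand in sorted(candidates - parents):
            s = score(parents | {cand})
            if s > best:
                best, choice, improved = s, cand, True
        if improved:
            parents.add(choice)
    return parents, best

def phased_search(potential, score, max_phase):
    """Run the greedy search once per phase k, restricted to potential(k)."""
    parents, best = set(), score(set())
    for k in range(max_phase + 1):
        parents, best = greedy_add_parents(potential(k), parents, score)
    return parents, best

def potential(k):
    """Potential parents of Registration.Grade: same object first, then one link away."""
    within = {"Registration.Satisfaction"}
    one_link = {"Registration.Student.Intelligence", "Registration.Course.Difficulty"}
    return within if k == 0 else within | one_link

# Toy score: a made-up fit gain per parent minus a fixed complexity penalty.
GAIN = {
    "Registration.Satisfaction": 0.5,
    "Registration.Student.Intelligence": 3.0,
    "Registration.Course.Difficulty": 2.0,
}

def toy_score(parents):
    return sum(GAIN[p] for p in parents) - 1.0 * len(parents)

chosen, final_score = phased_search(potential, toy_score, max_phase=1)
```

With this toy score nothing is added in phase 0, because the only within-object candidate does not pay for its penalty, and both one-link parents are added in phase 1.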

34
Advantage of phased search
  • gradually explores larger and larger fragments of
    the infinitely large space
  • can give priority to dependencies between objects
    that are more closely related
  • can precompute the database view corresponding to
    X.A and Pot_k(X.A)
  • most of the expensive computations (the joins
    and aggregation required in the definition of the
    parents) are precomputed in these views

35
Implementation and experimental results
  • Simple artificial genetic database domain
  • Construct training sets of various sizes
  • Compare the log-likelihood of a test set of size
    100,000 under
    • the gold standard model
    • learned parameters (model structure given)
    • a learned model (both structure and parameters
      learned)

36
[Figure: genetic domain model. Person has the attributes P-chromosome, M-chromosome, and Blood Type and is linked to a (Father) Person and a (Mother) Person with the same attributes. Blood Test has the attributes Contaminated and Result.]

37
Experimental results
38
Discussion and conclusion
  • Scaling these ideas to large databases
  • How to determine the probability distribution
    when there is an unbound variable
  • Treatment of missing values and hidden values,
    and furthermore automatic discovery of hidden
    values

We would want these techniques to help us
automatically discover interesting entities and
relationships that hold in the world.
39
Thank you!
  • Any Questions?