PROBABILISTIC GRAPHICAL MODELS - PowerPoint PPT Presentation

1
PROBABILISTIC GRAPHICAL MODELS
  • David Madigan
  • Rutgers University
  • madigan_at_stat.rutgers.edu

2
Expert Systems
  • Explosion of interest in Expert Systems in the
    early 1980s
  • Many companies (Teknowledge, IntelliCorp,
    Inference, etc.), many IPOs, much media hype
  • Ad-hoc uncertainty handling

3
Uncertainty in Expert Systems
If A then C ($p_1$)
If B then C ($p_2$)
What if both A and B are true?
Then C is true with CF $p_1 + (p_2 \times (1 - p_1))$
"Currently fashionable ad-hoc mumbo jumbo" (A.F.M. Smith)
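A minimal sketch of the certainty-factor combination rule on this slide, assuming both rules support the same conclusion C. The function name `combine_cf` and the sample values 0.6 and 0.5 are illustrative and do not come from the slides.

```python
def combine_cf(p1: float, p2: float) -> float:
    """Combine two certainty factors that both support the same conclusion."""
    return p1 + p2 * (1.0 - p1)


# Illustrative values only: rule "If A then C" with CF 0.6, rule "If B then C" with CF 0.5.
cf_ab = combine_cf(0.6, 0.5)
cf_ba = combine_cf(0.5, 0.6)
print(cf_ab, cf_ba)
```

The rule is symmetric in its two arguments, because $p_1 + p_2(1 - p_1) = 1 - (1 - p_1)(1 - p_2)$.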
4
Eschewed Probabilistic Approach
  • Computationally intractable
  • Inscrutable
  • Requires vast amounts of data/elicitation

e.g., for n dichotomous variables need $2^n - 1$
probabilities to fully specify the joint
distribution
5
Conditional Independence
$X \perp Y \mid Z$
6
Conditional Independence
  • Suppose A and B are marginally independent.
    $\Pr(A)$, $\Pr(B)$, $\Pr(C \mid A, B) \times 4$ = 6 probabilities
  • Suppose A and C are conditionally independent
    given B: $\Pr(A)$, $\Pr(B \mid A) \times 2$, $\Pr(C \mid B) \times 2$ = 5
  • Chain with 50 variables requires 99 probabilities
    versus $2^{50} - 1$

[Figure: graph on vertices A, B, C, with $C \perp A \mid B$]
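The counts on this slide can be checked with a short sketch, assuming all variables are binary and the chain has the form $X_1 \to X_2 \to \dots \to X_n$. The function names are illustrative.

```python
def full_joint_params(n: int) -> int:
    """Free probabilities needed for an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def chain_params(n: int) -> int:
    """Free probabilities for a binary chain: one marginal for the first variable,
    then two conditional probabilities (one per parent state) for each later one."""
    return 1 + 2 * (n - 1)


print(chain_params(3), full_joint_params(3))
print(chain_params(50), full_joint_params(50))
```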
7
Properties of Conditional Independence (Dawid,
1980)
For any probability measure P and random
variables A, B, and C
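CI 1: $A \perp B \; [P] \Rightarrow B \perp A \; [P]$
CI 2: $A \perp B \cup C \; [P] \Rightarrow A \perp B \; [P]$
CI 3: $A \perp B \cup C \; [P] \Rightarrow A \perp B \mid C \; [P]$
CI 4: $A \perp B$ and $A \perp C \mid B \; [P] \Rightarrow A \perp B \cup C \; [P]$
Some probability measures also satisfy
CI 5: $A \perp B \mid C$ and $A \perp C \mid B \; [P] \Rightarrow A \perp B \cup C \; [P]$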
CI5 satisfied whenever P has a positive joint
probability density with respect to some product
measure
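A small sketch of why CI 5 needs positivity. The distribution below is an assumed example, not one from the slides: three binary variables that are equal with probability 1, so most cells of the joint are zero. Both premises of CI 5 hold, but the conclusion fails. The helper names `marginal` and `indep` are illustrative.

```python
from itertools import product

# Joint over binary (a, b, c) in which A = B = C with probability 1.
# It is not strictly positive: six of the eight cells are zero.
joint = {abc: 0.0 for abc in product((0, 1), repeat=3)}
joint[(0, 0, 0)] = 0.5
joint[(1, 1, 1)] = 0.5


def marginal(keep):
    """Sum the joint down to the coordinates in `keep` (0 = A, 1 = B, 2 = C)."""
    out = {}
    for abc, p in joint.items():
        key = tuple(abc[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def indep(x, y, given=()):
    """Check X independent of Y given Z via p(x,y,z) p(z) == p(x,z) p(y,z) in every cell."""
    pxyz = marginal(x + y + given)
    pxz, pyz, pz = marginal(x + given), marginal(y + given), marginal(given)
    for cell, p in pxyz.items():
        xs = cell[:len(x)]
        ys = cell[len(x):len(x) + len(y)]
        zs = cell[len(x) + len(y):]
        if abs(p * pz[zs] - pxz[xs + zs] * pyz[ys + zs]) > 1e-12:
            return False
    return True


A, B, C = (0,), (1,), (2,)
print(indep(A, B, given=C))   # first premise of CI 5
print(indep(A, C, given=B))   # second premise of CI 5
print(indep(A, B + C))        # conclusion of CI 5
```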
8
Markov Properties for Undirected Graphs
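(Global) S separates A from B $\Rightarrow A \perp B \mid S$
(Local) $\alpha \perp V \setminus \mathrm{cl}(\alpha) \mid \mathrm{bd}(\alpha)$
(Pairwise) $\alpha \perp \beta \mid V \setminus \{\alpha, \beta\}$
(G) $\Rightarrow$ (L) $\Rightarrow$ (P)
[Figure: undirected graph on vertices A, B, C, D, E]
$B \perp \{E, D\} \mid \{A, C\}$ (1)
$B \perp D \mid \{A, C, E\}$ (2)
To go from (2) to (1) need $E \perp B \mid \{A, C\}$? or CI5
Lauritzen, Dawid, Larsen & Leimer (1990)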
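Separation in an undirected graph, which the global Markov property relies on, can be sketched as a breadth-first search that is not allowed to enter S. The five-vertex graph below is hypothetical (the slide's figure is not in the transcript); it was chosen so that the boundary of B is {A, C}.

```python
from collections import deque

# Hypothetical undirected graph on A..E, stored as adjacency sets.
adj = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "E"},
    "D": {"A", "E"},
    "E": {"C", "D"},
}


def separates(adj, a, b, s):
    """Return True if every path from a vertex in `a` to a vertex in `b` meets `s`."""
    blocked = set(s)
    seen = set(a) - blocked
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in b:
            return False
        for w in adj[u]:
            if w not in seen and w not in blocked:
                seen.add(w)
                queue.append(w)
    return True


print(separates(adj, {"B"}, {"D", "E"}, {"A", "C"}))   # separation behind statement (1)
print(separates(adj, {"B"}, {"D"}, {"A", "C", "E"}))   # separation behind statement (2)
print(separates(adj, {"B"}, {"D"}, {"A"}))             # B - C - E - D avoids A
```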
9
Factorizations
A density f is said to factorize according to
G if $f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$,
where $\mathcal{C}$ is the set of cliques of G and the $\psi_C$ are
clique potentials
  • cliques are maximally complete subgraphs


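Proposition: If f factorizes according to a UG
G, then it also obeys the global Markov
property.
Proof: Let S separate A from B in G
and assume $A \cup B \cup S = V$. Let $\mathcal{C}_A$ be the set of cliques with
non-empty intersection with A. Since S separates
A from B, we must have $B \cap C = \emptyset$ for
all C in $\mathcal{C}_A$. Then
$f(x) = \prod_{C \in \mathcal{C}_A} \psi_C(x_C) \prod_{C \notin \mathcal{C}_A} \psi_C(x_C) = f_1(x_{A \cup S}) \, f_2(x_{B \cup S})$,
so $A \perp B \mid S$.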
10
Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
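(Global) S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m \Rightarrow A \perp B \mid S$
(Local) $\alpha \perp \mathrm{nd}(\alpha) \setminus \mathrm{pa}(\alpha) \mid \mathrm{pa}(\alpha)$
(G) $\Rightarrow$ (L)
[Figure: example graphs with vertex sets A, B, and S]
Lauritzen, Dawid, Larsen & Leimer (1990)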
11
Factorizations
A density f admits a recursive factorization
according to an ADG G if $f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$

ADG Global Markov Property $\Leftrightarrow f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$
Lemma: If P admits a recursive factorization
according to an ADG G, then P factorizes
according to $G^m$ (and chordal supergraphs of $G^m$)
Lemma: If P admits a recursive factorization
according to an ADG G, and A is an ancestral set
in G, then $P_A$ admits a recursive factorization
according to the subgraph $G_A$
12
Factorizations
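$p(A,B,C,D,E,F,G,H,S) = p(A)\,p(C \mid A)\,p(D \mid C)\,p(S \mid D,F)\,p(E \mid S)\,p(F \mid G)\,p(G \mid B)\,p(H \mid S,B)\,p(B)$
$\Rightarrow p(S \mid A,B,C,D,E,F,G,H) \propto p(S \mid D,F)\,p(E \mid S)\,p(H \mid S,B)$
[Figure: ADG on vertices A, B, C, D, E, F, G, H, S]
{D, F, E, H, B} is the Markov blanket of S. It
contains the parents of S, the children of S, and
the other parents of the children of S.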
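The Markov blanket can be read off mechanically from the parent sets. The sketch below takes the parent sets implied by the factorization on this slide; the dictionary layout and the function name `markov_blanket` are illustrative choices.

```python
# Parent sets read off the factorization: p(C|A), p(D|C), p(S|D,F), p(E|S), p(F|G), p(G|B), p(H|S,B).
parents = {
    "A": [], "B": [], "C": ["A"], "D": ["C"], "S": ["D", "F"],
    "E": ["S"], "F": ["G"], "G": ["B"], "H": ["S", "B"],
}


def markov_blanket(parents, v):
    """Parents of v, children of v, and the other parents of those children."""
    children = [c for c, ps in parents.items() if v in ps]
    blanket = set(parents[v]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(v)
    return blanket


print(sorted(markov_blanket(parents, "S")))
```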
13
Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
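(Global) S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m \Rightarrow A \perp B \mid S$
(Local) $\alpha \perp \mathrm{nd}(\alpha) \setminus \mathrm{pa}(\alpha) \mid \mathrm{pa}(\alpha)$
(G) $\Rightarrow$ (L):
  • $\mathrm{nd}(\alpha)$ is an ancestral set; $\mathrm{pa}(\alpha)$ obviously
    separates $\alpha$ from $\mathrm{nd}(\alpha) \setminus \mathrm{pa}(\alpha)$ in $(G_{\mathrm{An}(\alpha \cup \mathrm{nd}(\alpha))})^m$

(L) $\Rightarrow$ (factorization):
induction on the number of vertices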
14
d-separation
A chain $\pi$ from a to b in an acyclic directed
graph G is said to be blocked by S if it contains
a vertex $\gamma \in \pi$ such that either
  • $\gamma \in S$ and arrows of $\pi$ do not meet head to head at $\gamma$, or
  • $\gamma \notin S$ nor has $\gamma$ any descendants in S, and arrows
    of $\pi$ do meet head to head at $\gamma$
Two subsets A and B are d-separated by S if all chains from A
to B are blocked by S
15
(No Transcript)
16
d-separation and global Markov property
Let A, B, and S be disjoint subsets of a
directed, acyclic graph, G. Then S d-separates A
from B if and only if S separates A from B in
$(G_{\mathrm{An}(A \cup B \cup S)})^m$
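The equivalence on this slide gives a simple way to test d-separation: take the smallest ancestral set containing A, B and S, moralize the induced subgraph, and test ordinary separation. The sketch below follows those three steps. The example graph uses the parent sets implied by the nine-variable factorization shown earlier in the deck; the function name and data layout are illustrative.

```python
parents = {
    "A": [], "B": [], "C": ["A"], "D": ["C"], "S": ["D", "F"],
    "E": ["S"], "F": ["G"], "G": ["B"], "H": ["S", "B"],
}


def d_separated(parents, a, b, s):
    """True if s separates a from b in the moral graph of the smallest
    ancestral set containing a, b and s."""
    a, b, s = set(a), set(b), set(s)
    # 1. Smallest ancestral set containing a, b and s.
    anc, stack = set(), list(a | b | s)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # 2. Moralize: join each vertex to its parents, marry co-parents, drop directions.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = list(parents[v])
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Ordinary undirected separation: search from a without entering s.
    seen = a - s
    stack = list(seen)
    while stack:
        u = stack.pop()
        if u in b:
            return False
        for w in adj[u] - s:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


print(d_separated(parents, {"D"}, {"F"}, set()))   # D and F meet head to head at S
print(d_separated(parents, {"D"}, {"F"}, {"S"}))   # conditioning on S opens that chain
print(d_separated(parents, {"E"}, {"H"}, {"S"}))
```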
17
UG ADG Intersection
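[Figure: example undirected and acyclic directed graphs on vertices A, B, C (and D), labeled with independence statements including $A \perp C \mid B$; $A \perp B \mid \{C, D\}$ and $C \perp D \mid \{A, B\}$; and $A \perp C$]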
18
UG ADG Intersection
[Figure: diagram with regions labeled UG, ADG, and Decomposable]
  • UG is decomposable if chordal
  • ADG is decomposable if moral
  • Decomposable $\Leftrightarrow$ closed-form log-linear models

No CI5
19
Chordal Graphs and RIP
  • Chordal graphs (uniquely) admit clique orderings
    that have the Running Intersection Property
  • V,T
  • A,L,T
  • L,A,B
  • S,L,B
  • A,B,D
  • A,X

[Figure: chordal graph on vertices V, T, L, A, S, X, D, B]
  • The intersection of each set with those earlier
    in the list is fully contained in one earlier set
  • Can compute cond. probabilities (e.g. $\Pr(X \mid V)$) by
    message passing (Lauritzen & Spiegelhalter,
    Dawid, Jensen)
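A minimal sketch that checks the Running Intersection Property for the clique ordering listed on this slide. The function name is illustrative, and the second ordering is a deliberately shuffled version added here to show a failure.

```python
def has_running_intersection(cliques):
    """Each clique's overlap with the union of all earlier cliques
    must lie inside a single earlier clique."""
    for i in range(1, len(cliques)):
        earlier_union = set().union(*cliques[:i])
        overlap = cliques[i] & earlier_union
        if not any(overlap <= c for c in cliques[:i]):
            return False
    return True


order = [
    {"V", "T"}, {"A", "L", "T"}, {"L", "A", "B"},
    {"S", "L", "B"}, {"A", "B", "D"}, {"A", "X"},
]
# Same cliques with {S, L, B} moved to second place (illustrative counterexample).
shuffled = [order[0], order[3], order[1], order[2], order[4], order[5]]

print(has_running_intersection(order))
print(has_running_intersection(shuffled))
```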

20
Probabilistic Expert System
  • Computationally intractable
  • Inscrutable
  • Requires vast amounts of data/elicitation
  • Chordal UG models facilitate fast inference
  • ADG models better for expert system applications:
    more natural to specify $\Pr(v \mid \mathrm{pa}(v))$

21
Factorizations
UG Global Markov Property $\Leftarrow f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$
ADG Global Markov Property $\Leftrightarrow f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$
22
Lauritzen-Spiegelhalter Algorithm
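  • $\psi(C,S,D) \leftarrow \Pr(S \mid C, D)$
  • $\psi(A,E) \leftarrow \Pr(E \mid A)\,\Pr(A)$
  • $\psi(C,E) \leftarrow \Pr(C \mid E)$
  • $\psi(F,D,B) \leftarrow \Pr(D \mid F)\,\Pr(B \mid F)\,\Pr(F)$
  • $\psi(D,B,S) \leftarrow 1$
  • $\psi(B,S,G) \leftarrow \Pr(G \mid S, B)$
  • $\psi(H,S) \leftarrow \Pr(H \mid S)$

[Figure: two graphs on vertices A, B, C, D, E, F, G, H, S]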
  • Moralize
  • Triangulate

Algorithm is widely deployed in commercial
software
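A sketch of the moralization step, using the parent sets implied by the conditional probabilities in the potentials above (for example, Pr(S | C, D) gives parents C and D for S). The triangulation step is not implemented; the single fill-in edge D-B is an assumption inferred from the cliques (F,D,B) and (D,B,S) on the slide, and the code only checks that every listed clique is complete once that edge is added.

```python
from itertools import combinations

parents = {
    "A": [], "F": [], "E": ["A"], "C": ["E"], "D": ["F"], "B": ["F"],
    "S": ["C", "D"], "G": ["S", "B"], "H": ["S"],
}


def moralize(parents):
    """Return (skeleton edges, edges added by marrying unlinked co-parents)."""
    skeleton = {frozenset((p, v)) for v, ps in parents.items() for p in ps}
    married = set()
    for ps in parents.values():
        for p, q in combinations(ps, 2):
            edge = frozenset((p, q))
            if edge not in skeleton:
                married.add(edge)
    return skeleton, married


skeleton, married = moralize(parents)
print(sorted(sorted(e) for e in married))

# Assumed fill-in edge for the chordless 4-cycle D - F - B - S in the moral graph.
edges = skeleton | married | {frozenset(("D", "B"))}
cliques = ["CSD", "AE", "CE", "FDB", "DBS", "BSG", "HS"]
all_complete = all(
    frozenset(pair) in edges for c in cliques for pair in combinations(c, 2)
)
print(all_complete)
```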
23
LS Toy Example
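$\Pr(C \mid B) = 0.2$, $\Pr(C \mid \neg B) = 0.6$, $\Pr(B \mid A) = 0.5$,
$\Pr(B \mid \neg A) = 0.1$, $\Pr(A) = 0.7$
[Figure: ADG A → B → C and its clique tree AB - B - BC]
  • $\psi(A,B) \leftarrow \Pr(B \mid A)\,\Pr(A)$
  • $\psi(B,C) \leftarrow \Pr(C \mid B)$

Goal: $\Pr(A \mid C)$
Message schedule: AB → BC, BC → AB

Initial potentials
  AB             B       ¬B
    A          0.35    0.35
    ¬A         0.03    0.27
  Separator      B       ¬B
               1       1
  BC             C       ¬C
    B          0.2     0.8
    ¬B         0.6     0.4

After message AB → BC
  Separator      B       ¬B
               0.38    0.62
  BC             C       ¬C
    B          0.076   0.304
    ¬B         0.372   0.248

After entering the evidence C
  BC             C       ¬C
    B          0.076   0
    ¬B         0.372   0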
24
Other Theoretical Developments
Do the UG and ADG global Markov properties
identify all the conditional independences
implied by the corresponding factorizations? Yes.
Completeness for ADGs by Geiger and Pearl
(1988); for UGs by Frydenberg (1988)
Graphical characterization of collapsibility in
hierarchical log-linear models (Asmussen and
Edwards, 1983)
25
Collapsibility
Clinic A
                 Survival
  Care          No     Yes    % No
  Less           3     176     1.7
  More           4     293     1.4

Clinic B
                 Survival
  Care          No     Yes    % No
  Less          17     197     7.9
  More           2      23     8.0

Pooled
                 Survival
  Care          No     Yes    % No
  Less          20     373     5.1
  More           6     316     1.9
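A short sketch that recomputes the percentages from the counts on this slide, assuming the third column is the percentage of non-survivors in each row. Within each clinic the two care levels have nearly the same rate, while the pooled table shows a large gap. The recomputed values agree with the slide to within rounding.

```python
# (No, Yes) survival counts by clinic and amount of care, as given on the slide.
counts = {
    "Clinic A": {"Less": (3, 176), "More": (4, 293)},
    "Clinic B": {"Less": (17, 197), "More": (2, 23)},
}


def pct_no(no, yes):
    """Percentage of non-survivors in a row."""
    return 100.0 * no / (no + yes)


# Pool over clinics by adding the counts cell by cell.
pooled = {
    care: tuple(sum(counts[clinic][care][i] for clinic in counts) for i in (0, 1))
    for care in ("Less", "More")
}

for clinic, table in counts.items():
    print(clinic, {care: pct_no(*row) for care, row in table.items()})
print("Pooled", {care: pct_no(*row) for care, row in pooled.items()})
```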
26
Collapsibility
[Figure: independence graph on Surv., Clinic, and Care]
Theorem: A graphical log-linear model L is
collapsible onto A iff the boundary of every connected component
of $A^c$ is complete.
27
Bayesian Learning for Discrete ADGs
  • Example three binary variables
  • Five parameters

28
Local and Global Independence
29
Bayesian learning
Consider a particular state pa(v) of pa(v)
30
Equivalence Classes and Chain Graphs
  • ADG models for a fixed set of vertices decompose
    into Markov equivalence classes

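[Figure: example equivalence classes, labeled with the independence statements $A \perp C \mid B$; $A \perp D \mid \{B, C\}$ and $B \perp C \mid A$; $A \perp D \mid \{B, C\}$ and $B \perp C$]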
31
Why is this a problem?
  • Repeating analyses for equivalent ADGs leads to
    significant computational inefficiencies.
  • Ensuring that equivalent ADGs have equal
    posterior probabilities imposes severe
    constraints on prior distributions (Geiger and
    Heckerman, 1995).
  • Bayesian model averaging procedures that average
    across ADGs assign weights to statistical models
    that are proportional to equivalence class sizes.

32
Equivalence Class Characterization
Theorem (Verma & Pearl; Glymour et al.;
Frydenberg; AMP94): Two ADGs are Markov
equivalent iff they have the same skeletons and
the same immoralities.
Definition: The essential graph $D^*$ associated with
D is the graph $D^* = \cup (D' \mid D' \sim D)$.
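The theorem translates directly into a test: compare skeletons and immoralities. In the sketch below an immorality is stored as an unordered pair of non-adjacent parents together with their common child; the three small graphs are illustrative examples, not taken from the slides.

```python
from itertools import combinations


def skeleton(parents):
    """Undirected version of the ADG: one unordered pair per arrow."""
    return {frozenset((p, v)) for v, ps in parents.items() for p in ps}


def immoralities(parents):
    """Pairs of non-adjacent parents sharing a child, tagged with that child."""
    skel = skeleton(parents)
    return {
        (frozenset((p, q)), v)
        for v, ps in parents.items()
        for p, q in combinations(sorted(ps), 2)
        if frozenset((p, q)) not in skel
    }


def markov_equivalent(d1, d2):
    return skeleton(d1) == skeleton(d2) and immoralities(d1) == immoralities(d2)


chain = {"A": [], "B": ["A"], "C": ["B"]}         # A -> B -> C
reverse = {"C": [], "B": ["C"], "A": ["B"]}       # A <- B <- C
collider = {"A": [], "C": [], "B": ["A", "C"]}    # A -> B <- C

print(markov_equivalent(chain, reverse))
print(markov_equivalent(chain, collider))
```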
33
Essential Graphs, AMP (1995)
  • Essential graphs are chain graphs
  • $D^*$ is the unique smallest chain graph Markov
    equivalent to D
  • A graph G = (V, E) is equal to $D^*$ for some ADG D
    if and only if G satisfies the following four
    conditions

(i) G is a chain graph
(ii) For every chain component $\tau$ of G, $G_\tau$ is chordal
(iii) The configuration $a \rightarrow b - c$ does not occur as an induced
subgraph of G
(iv) Every arrow $a \rightarrow b \in G$ is
strongly protected in G
also Meek (1995) and Chickering (1995)
34
What's a Chain Graph?
Equivalence: $a \sim b$ iff $a \rightleftharpoons b$ (there is a path from a to b and a path from b to a)
35
Chain Graphs
[Figure: diagram with regions labeled ADG, UG, CG, and Decomposable]
  • Chain graph Markov property, Frydenberg (1990)
  • Equivalence results (LWF, AMP, Meek, Studeny)

[Figure: chain graph on vertices A, B, C, D]
$C \perp D \mid \{A, B\}$ or $C \perp D \mid A$?
Cox & Wermuth (1996)