PROBABILISTIC GRAPHICAL MODELS

Provided by: davidm45

Learn more at: https://www.stat.rutgers.edu

1
PROBABILISTIC GRAPHICAL MODELS

David Madigan
Rutgers University
madigan_at_stat.rutgers.edu

2
Expert Systems

Explosion of interest in Expert Systems in the
early 1980s

Many companies (Teknowledge, IntelliCorp,
Inference, etc.), many IPOs, much media hype
Ad-hoc uncertainty handling

3
Uncertainty in Expert Systems
If A then C ($p_1$)
If B then C ($p_2$)
What if both A and B true?
Then C true with CF $p_1 + p_2 \times (1 - p_1)$
"Currently fashionable ad-hoc mumbo jumbo" (A.F.M. Smith)
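As a small arithmetic illustration of the combination rule on this slide, the sketch below assumes both certainty factors lie in [0, 1]. The function name `combine_cf` and the example values are hypothetical, not taken from any expert-system shell.

```python
def combine_cf(p1: float, p2: float) -> float:
    """Combine two certainty factors for the same conclusion C.

    Implements CF = p1 + p2 * (1 - p1), assuming 0 <= p1, p2 <= 1.
    """
    return p1 + p2 * (1.0 - p1)


# Hypothetical rule strengths: "If A then C (0.6)" and "If B then C (0.5)".
cf_both = combine_cf(0.6, 0.5)
print(f"CF for C when A and B both hold: {cf_both:.2f}")
```

The rule is symmetric in its two arguments, but nothing in it is derived from a joint distribution over A, B, and C, which is the sense in which the uncertainty handling is ad hoc.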
4
Eschewed Probabilistic Approach

Computationally intractable
Inscrutable
Requires vast amounts of data/elicitation

e.g., for n dichotomous variables need $2^n - 1$
probabilities to fully specify the joint
distribution
5
Conditional Independence
$X \perp Y \mid Z$
6
Conditional Independence

Suppose A and B are marginally independent:
$\Pr(A)$, $\Pr(B)$, $\Pr(C \mid A, B) \times 4$ = 6 probabilities
Suppose A and C are conditionally independent given B:
$\Pr(A)$, $\Pr(B \mid A) \times 2$, $\Pr(C \mid B) \times 2$ = 5 probabilities
Chain with 50 variables requires 99 probabilities
versus $2^{50} - 1$

[Figure: chain A → B → C, with $C \perp A \mid B$]
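The counts on this slide can be checked with a short sketch. It assumes binary variables and a first-order chain; the function names are illustrative only.

```python
def full_joint_params(n: int) -> int:
    """Free probabilities needed for an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def chain_params(n: int) -> int:
    """Free probabilities for a chain X1 -> X2 -> ... -> Xn of binary variables.

    One number for Pr(X1), then two numbers per later variable
    (one for each state of its single parent).
    """
    return 1 + 2 * (n - 1)


for n in (3, 50):
    print(n, chain_params(n), full_joint_params(n))
```

For n = 3 this gives 5 versus 7, and for n = 50 it gives 99 versus $2^{50} - 1$.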
7
Properties of Conditional Independence (Dawid,
1980)
For any probability measure P and random
variables A, B, and C
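CI 1: $A \perp B \; [P] \Rightarrow B \perp A \; [P]$
CI 2: $A \perp B \cup C \; [P] \Rightarrow A \perp B \; [P]$
CI 3: $A \perp B \cup C \; [P] \Rightarrow A \perp B \mid C \; [P]$
CI 4: $A \perp B$ and $A \perp C \mid B \; [P] \Rightarrow A \perp B \cup C \; [P]$
Some probability measures also satisfy
CI 5: $A \perp B \mid C$ and $A \perp C \mid B \; [P] \Rightarrow A \perp B \cup C \; [P]$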
CI5 satisfied whenever P has a positive joint
probability density with respect to some product
measure
8
Markov Properties for Undirected Graphs
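(Global) S separates A from B $\Rightarrow A \perp B \mid S$
(Local) $a \perp V \setminus \mathrm{cl}(a) \mid \mathrm{bd}(a)$
(Pairwise) $a \perp b \mid V \setminus \{a, b\}$
(G) $\Rightarrow$ (L) $\Rightarrow$ (P)

[Figure: undirected graph on vertices A, B, C, D, E]

(1) $B \perp \{E, D\} \mid \{A, C\}$
(2) $B \perp D \mid \{A, C, E\}$
(1) $\Rightarrow$ (2). To go from (2) to (1) need $E \perp B \mid \{A, C\}$, or CI5

Lauritzen, Dawid, Larsen & Leimer (1990)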
9
Factorizations
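A density f is said to factorize according to
G if $f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$,
where the $\psi_C$ are clique potentials

cliques are maximal complete subgraphs

Proposition: If f factorizes according to a UG
G, then it also obeys the global Markov
property.
Proof: Let S separate A from B in G
and assume $A \cup B \cup S = V$. Let $\mathcal{C}_A$ be the set of cliques with
non-empty intersection with A. Since S separates
A from B, we must have $B \cap C = \emptyset$ for
all C in $\mathcal{C}_A$. Then
$f(x) = \prod_{C \in \mathcal{C}_A} \psi_C(x_C) \prod_{C \notin \mathcal{C}_A} \psi_C(x_C) = f_1(x_{A \cup S}) \, f_2(x_{B \cup S})$,
so $A \perp B \mid S$.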
10
Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
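(Global) S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m \Rightarrow A \perp B \mid S$
(Local) $a \perp \mathrm{nd}(a) \setminus \mathrm{pa}(a) \mid \mathrm{pa}(a)$
(G) $\Leftrightarrow$ (L)

[Figure: example graphs with vertex sets A, B, and S]

Lauritzen, Dawid, Larsen & Leimer (1990)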
11
Factorizations
A density f admits a recursive factorization
according to an ADG G if $f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$

ADG Global Markov Property $\Leftrightarrow f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$

Lemma: If P admits a recursive factorization
according to an ADG G, then P factorizes
according to $G^m$ (and chordal supergraphs of $G^m$)
Lemma: If P admits a recursive factorization
according to an ADG G, and A is an ancestral set
in G, then $P_A$ admits a recursive factorization
according to the subgraph $G_A$
12
Factorizations
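$p(A,B,C,D,E,F,G,H,S) = p(A)\,p(C \mid A)\,p(D \mid C)\,p(S \mid D,F)\,p(E \mid S)\,p(F \mid G)\,p(G \mid B)\,p(H \mid S,B)\,p(B)$
$\Rightarrow p(S \mid A,B,C,D,E,F,G,H) \propto p(S \mid D,F)\,p(E \mid S)\,p(H \mid S,B)$

[Figure: ADG on vertices A, B, C, D, E, F, G, H, S]

{D, F, E, H, B} is the Markov blanket of S. It
contains the parents of S, the children of S, and
the other parents of the children of S.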
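The Markov blanket can be read off mechanically from the parent sets in the factorization on this slide. The sketch below assumes the graph is stored as a dictionary mapping each vertex to its list of parents; the helper name `markov_blanket` is illustrative.

```python
# Parent sets read off the factorization p(A)p(C|A)p(D|C)p(S|D,F)p(E|S)p(F|G)p(G|B)p(H|S,B)p(B).
parents = {
    "A": [], "B": [],
    "C": ["A"], "D": ["C"],
    "G": ["B"], "F": ["G"],
    "S": ["D", "F"],
    "E": ["S"], "H": ["S", "B"],
}


def markov_blanket(v, parents):
    """Parents of v, children of v, and the other parents of those children."""
    children = [c for c, ps in parents.items() if v in ps]
    blanket = set(parents[v]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(v)  # v itself is a parent of its children
    return blanket


print(sorted(markov_blanket("S", parents)))
```

The result contains exactly the variables that appear alongside S in the three factors that mention S.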
13
Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
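(Global) S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m \Rightarrow A \perp B \mid S$
(Local) $a \perp \mathrm{nd}(a) \setminus \mathrm{pa}(a) \mid \mathrm{pa}(a)$

(G) $\Rightarrow$ (L): $\{a\} \cup \mathrm{nd}(a)$ is an ancestral set; pa(a) obviously
separates a from $\mathrm{nd}(a) \setminus \mathrm{pa}(a)$ in $(G_{\mathrm{An}(\{a\} \cup \mathrm{nd}(a))})^m$
(L) $\Rightarrow$ (factorization): induction on the number of vertices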
14
d-separation
A chain p from a to b in an acyclic directed
graph G is said to be blocked by S if it contains
a vertex $g \in p$ such that either
- $g \in S$ and arrows of p do not meet head to head at g, or
- $g \notin S$, g has no descendants in S, and arrows
of p do meet head to head at g
Two subsets A and B are d-separated by S if all chains from A
to B are blocked by S
16
d-separation and global Markov property
Let A, B, and S be disjoint subsets of the vertices of a
directed acyclic graph G. Then S d-separates A
from B if and only if S separates A from B in
$(G_{\mathrm{An}(A \cup B \cup S)})^m$
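The equivalence on this slide suggests a simple way to test d-separation: restrict to the smallest ancestral set, moralize, and check ordinary graph separation. The following is a minimal sketch under the assumption that the graph is given as a dictionary from each vertex to its list of parents; the function names and the four-vertex example graph are hypothetical.

```python
from itertools import combinations


def ancestral_set(nodes, parents):
    """Smallest ancestral set containing `nodes` (the nodes plus all their ancestors)."""
    result, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in result:
            result.add(v)
            stack.extend(parents[v])
    return result


def moral_graph(keep, parents):
    """Undirected adjacency of the moralized subgraph induced by the ancestral set `keep`."""
    adj = {v: set() for v in keep}
    for v in keep:
        ps = parents[v]  # all parents lie in `keep` because `keep` is ancestral
        for p in ps:  # drop arrow directions
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(ps, 2):  # "marry" parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    return adj


def d_separated(A, B, S, parents):
    """True if S separates A from B in the moral graph of the smallest ancestral set."""
    A, B, S = set(A), set(B), set(S)
    adj = moral_graph(ancestral_set(A | B | S, parents), parents)
    seen, stack = set(A), list(A)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for w in adj[v]:
            if w not in seen and w not in S:
                seen.add(w)
                stack.append(w)
    return True


# Hypothetical example: A -> B <- C, B -> D.
example = {"A": [], "C": [], "B": ["A", "C"], "D": ["B"]}
print(d_separated({"A"}, {"C"}, set(), example))   # head-to-head at B, nothing observed
print(d_separated({"A"}, {"C"}, {"D"}, example))   # a descendant of B is observed
```

In the example, A and C are separated when nothing is observed, but conditioning on B or on its descendant D connects them, matching the head-to-head clause of the d-separation definition.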
17
UG ADG Intersection
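[Figure: example graphs on vertices A, B, C, D in the UG and ADG model classes, labelled with the independences $C \perp A \mid B$; $A \perp C \mid B$; $A \perp B \mid \{C, D\}$ and $C \perp D \mid \{A, B\}$; $A \perp C$]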
18
UG ADG Intersection
UG
ADG
Decomposable

UG is decomposable if chordal
ADG is decomposable if moral
Decomposable: closed-form log-linear models

No CI5
19
Chordal Graphs and RIP

Chordal graphs (uniquely) admit clique orderings
that have the Running Intersection Property

1. {V, T}
2. {A, L, T}
3. {L, A, B}
4. {S, L, B}
5. {A, B, D}
6. {A, X}

[Figure: chordal graph on vertices V, T, L, A, S, X, D, B]

The intersection of each set with those earlier
in the list is fully contained in a previous set
Can compute cond. probabilities (e.g. $\Pr(X \mid V)$) by
message passing (Lauritzen & Spiegelhalter,
Dawid, Jensen)
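The Running Intersection Property of the clique ordering on this slide can be verified directly. The sketch below is a straightforward check of the definition, not an algorithm for finding such an ordering; the function name is illustrative and the second ordering is a hypothetical reshuffle of the same cliques.

```python
def has_running_intersection(cliques):
    """Check that each clique's overlap with all earlier cliques lies inside one earlier clique."""
    seen = set()
    for i, clique in enumerate(cliques):
        clique = set(clique)
        if i > 0:
            overlap = clique & seen
            if not any(overlap <= set(prev) for prev in cliques[:i]):
                return False
        seen |= clique
    return True


# Ordering from the slide.
slide_order = [
    {"V", "T"}, {"A", "L", "T"}, {"L", "A", "B"},
    {"S", "L", "B"}, {"A", "B", "D"}, {"A", "X"},
]
# Hypothetical reshuffle of the same cliques.
shuffled = [
    {"V", "T"}, {"S", "L", "B"}, {"A", "B", "D"},
    {"A", "X"}, {"A", "L", "T"}, {"L", "A", "B"},
]

print(has_running_intersection(slide_order))
print(has_running_intersection(shuffled))
```

In the reshuffled list, {A, L, T} overlaps the earlier cliques in all three of its vertices, and no single earlier clique contains them, so the property fails for that ordering.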

20
Probabilistic Expert System

Computationally intractable
Inscrutable
Requires vast amounts of data/elicitation

Chordal UG models facilitate fast inference
ADG models better for expert system applications:
more natural to specify $\Pr(v \mid \mathrm{pa}(v))$

21
Factorizations
UG: Global Markov Property $\Leftarrow f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$
ADG: Global Markov Property $\Leftrightarrow f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$
22
Lauritzen-Spiegelhalter Algorithm
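$\psi(C,S,D) \leftarrow \Pr(S \mid C, D)$
$\psi(A,E) \leftarrow \Pr(E \mid A)\Pr(A)$
$\psi(C,E) \leftarrow \Pr(C \mid E)$
$\psi(F,D,B) \leftarrow \Pr(D \mid F)\Pr(B \mid F)\Pr(F)$
$\psi(D,B,S) \leftarrow 1$
$\psi(B,S,G) \leftarrow \Pr(G \mid S, B)$
$\psi(H,S) \leftarrow \Pr(H \mid S)$

[Figure: two graphs on vertices A, B, C, D, E, F, G, H, S, with the steps Moralize and Triangulate]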

Algorithm is widely deployed in commercial
software
23
LS Toy Example
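$\Pr(C \mid B) = 0.2$, $\Pr(C \mid \neg B) = 0.6$, $\Pr(B \mid A) = 0.5$,
$\Pr(B \mid \neg A) = 0.1$, $\Pr(A) = 0.7$

[Figure: chain A, B, C and its clique tree AB, B, BC]

$\psi(A,B) \leftarrow \Pr(B \mid A)\Pr(A)$
$\psi(B,C) \leftarrow \Pr(C \mid B)$

Initial potentials:

ψ(A,B)    B       ¬B
A         0.35    0.35
¬A        0.03    0.27

ψ(B)      B       ¬B
          1       1

ψ(B,C)    C       ¬C
B         0.2     0.8
¬B        0.6     0.4

Query: $\Pr(A \mid C)$
Message Schedule: AB → BC, BC → AB

After the message AB → BC:

ψ(B)      B       ¬B
          0.38    0.62

ψ(B,C)    C       ¬C
B         0.076   0.304
¬B        0.372   0.248

With C observed:

ψ(B,C)    C       ¬C
B         0.076   0
¬B        0.372   0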
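The toy example can be reproduced in a few lines. This is a minimal sketch of the two-message schedule on the clique tree AB, B, BC, using the probabilities given on the slide; the variable names are illustrative, index 0 stands for "true" and index 1 for "false", and entering the evidence on C before the first message is an assumption about ordering that does not affect the final answer.

```python
# Index 0 = true, 1 = false.
pr_a = [0.7, 0.3]
pr_b_given_a = [[0.5, 0.5], [0.1, 0.9]]  # indexed [a][b]
pr_c_given_b = [[0.2, 0.8], [0.6, 0.4]]  # indexed [b][c]

# Initialise clique and separator potentials.
psi_ab = [[pr_b_given_a[a][b] * pr_a[a] for b in range(2)] for a in range(2)]
psi_bc = [row[:] for row in pr_c_given_b]
sep_b = [1.0, 1.0]

# Enter the evidence C = true by zeroing the C = false column.
for b in range(2):
    psi_bc[b][1] = 0.0

# Message AB -> BC: marginalise A out of psi_ab, rescale psi_bc.
new_sep = [sum(psi_ab[a][b] for a in range(2)) for b in range(2)]
for b in range(2):
    for c in range(2):
        psi_bc[b][c] *= new_sep[b] / sep_b[b]
sep_b = new_sep
sep_after_first_message = list(sep_b)

# Message BC -> AB: marginalise C out of psi_bc, rescale psi_ab.
new_sep = [sum(psi_bc[b]) for b in range(2)]
for a in range(2):
    for b in range(2):
        psi_ab[a][b] *= new_sep[b] / sep_b[b]
sep_b = new_sep

# psi_ab is now proportional to Pr(A, B, C = true).
pr_c = sum(sum(row) for row in psi_ab)
pr_a_given_c = sum(psi_ab[0]) / pr_c
print(f"Pr(C) = {pr_c:.3f}, Pr(A | C) = {pr_a_given_c:.3f}")
```

After the first message the separator holds the marginal of B (0.38 and 0.62) and the C column of the BC potential holds 0.076 and 0.372, which are the values shown in the slide's tables.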
24
Other Theoretical Developments
Do the UG and ADG global Markov properties
identify all the conditional independences
implied by the corresponding factorizations? Yes.
Completeness for ADGs by Geiger and Pearl
(1988); for UGs by Frydenberg (1988)
Graphical characterization of collapsibility in
hierarchical log-linear models (Asmussen and
Edwards, 1983)
25
Collapsibility
Clinic A
Care    Survival: No   Survival: Yes   % No
Less    3              176             1.7
More    4              293             1.4

Clinic B
Care    Survival: No   Survival: Yes   % No
Less    17             197             7.9
More    2              23              8.0

Pooled
Care    Survival: No   Survival: Yes   % No
Less    20             373             5.1
More    6              316             1.9
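The effect of pooling can be seen by recomputing the non-survival rates from the counts on this slide. In the sketch below, the assignment of the two sub-tables to Clinic A and Clinic B is an assumption about the slide layout (the pooled counts do not depend on it), and the names `counts`, `death_rate`, and `pooled` are illustrative.

```python
# (non-survivors, survivors) by clinic and amount of care.
# Which sub-table belongs to which clinic is assumed from the slide layout.
counts = {
    "Clinic A": {"Less": (3, 176), "More": (4, 293)},
    "Clinic B": {"Less": (17, 197), "More": (2, 23)},
}


def death_rate(died: int, survived: int) -> float:
    """Fraction of cases that did not survive."""
    return died / (died + survived)


# Collapse over clinic by summing the counts cell by cell.
pooled = {
    care: tuple(sum(counts[clinic][care][i] for clinic in counts) for i in range(2))
    for care in ("Less", "More")
}

for clinic, table in counts.items():
    for care, cell in table.items():
        print(clinic, care, f"{100 * death_rate(*cell):.1f}%")
for care, cell in pooled.items():
    print("Pooled", care, f"{100 * death_rate(*cell):.1f}%")
```

Within each clinic the two rates are close, while the pooled table shows a much larger gap between less and more care, which is why collapsing over Clinic is misleading here.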
26
Collapsibility
[Figure: graph on vertices Surv., Clinic, Care]
Theorem: A graphical log-linear model L is
collapsible onto A iff the boundary of every connected component
of $A^c$ is complete.
27
Bayesian Learning for Discrete ADGs

Example three binary variables
Five parameters

28
Local and Global Independence
29
Bayesian learning
Consider a particular state pa(v) of pa(v)
30
Equivalence Classes and Chain Graphs

ADG models for a fixed set of vertices decompose
into Markov equivalence classes

$A \perp C \mid B$
$A \perp D \mid \{B, C\}$, $B \perp C \mid A$
$A \perp D \mid \{B, C\}$, $B \perp C$
31
Why is this a problem?

Repeating analyses for equivalent ADGs leads to
significant computational inefficiencies.
Ensuring that equivalent ADGs have equal
posterior probabilities imposes severe
constraints on prior distributions (Geiger and
Heckerman, 1995).
Bayesian model averaging procedures that average
across ADGs assign weights to statistical models
that are proportional to equivalence class sizes.

32
Equivalence Class Characterization
Theorem (Verma & Pearl; Glymour et al.; Frydenberg; AMP94): Two ADGs are Markov
equivalent iff they have the same skeletons and
the same immoralities.
Definition: The essential graph $D^*$ associated with
D is the graph $D^* := \bigcup \{D' : D' \sim D\}$, where $D' \sim D$ denotes Markov equivalence.
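The skeleton-and-immoralities criterion in the theorem is easy to test on small graphs. The sketch below assumes both ADGs are given over the same vertex set as dictionaries from each vertex to its list of parents; the function names and the four three-vertex graphs are hypothetical examples.

```python
def skeleton(parents):
    """Set of undirected edges obtained by dropping arrow directions."""
    return {frozenset((p, v)) for v, ps in parents.items() for p in ps}


def immoralities(parents):
    """Configurations p -> v <- q in which p and q are not adjacent."""
    skel = skeleton(parents)
    found = set()
    for v, ps in parents.items():
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                if frozenset((p, q)) not in skel:
                    found.add((frozenset((p, q)), v))
    return found


def markov_equivalent(d1, d2):
    """Same skeleton and same immoralities."""
    return skeleton(d1) == skeleton(d2) and immoralities(d1) == immoralities(d2)


chain = {"A": [], "B": ["A"], "C": ["B"]}         # A -> B -> C
reverse = {"C": [], "B": ["C"], "A": ["B"]}       # A <- B <- C
fork = {"B": [], "A": ["B"], "C": ["B"]}          # A <- B -> C
collider = {"A": [], "C": [], "B": ["A", "C"]}    # A -> B <- C

print(markov_equivalent(chain, reverse), markov_equivalent(chain, fork))
print(markov_equivalent(chain, collider))
```

The first three graphs share a skeleton and have no immoralities, so they fall in one equivalence class; the collider has the same skeleton but one immorality and so stands alone.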
33
Essential Graphs: AMP (1995)

Essential graphs are chain graphs
$D^*$ is the unique smallest chain graph Markov
equivalent to D
A graph G = (V, E) is equal to $D^*$ for some ADG D
if and only if G satisfies the following four
conditions

(i) G is a chain graph
(ii) For every chain component $\tau$ of G, $G_\tau$ is chordal
(iii) The configuration $a \to b - c$ does not occur as an induced
subgraph of G
(iv) Every arrow $a \to b \in G$ is
strongly protected in G
See also Meek (1995) and Chickering (1995)
34
What's a Chain Graph?
Equivalence: $a \sim b$ iff there is a path from a to b
and a path from b to a
35
Chain Graphs
ADG
UG
CG
Decomposable