Title: Probabilistic Graphical Models
1Probabilistic Graphical Models
- David Madigan
- Rutgers University
- madigan_at_stat.rutgers.edu
2Expert Systems
- Explosion of interest in Expert Systems in the early 1980s
- Many companies (Teknowledge, IntelliCorp, Inference, etc.), many IPOs, much media hype
- Ad-hoc uncertainty handling
3Uncertainty in Expert Systems
If A then C ($p_1$)
If B then C ($p_2$)
What if both A and B are true?
Then C is true with certainty factor (CF) $p_1 + p_2 \times (1 - p_1)$
"Currently fashionable ad-hoc mumbo jumbo" (A.F.M. Smith)
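A minimal sketch of the combination rule, reading the slide's expression as CF = p1 + p2 × (1 − p1), the usual certainty-factor rule for two rules supporting the same conclusion. The function name and the restriction to values in [0, 1] are assumptions made for illustration.

```python
def combine_certainty_factors(p1: float, p2: float) -> float:
    """Combine two certainty factors that support the same conclusion C."""
    # Assumption: both factors lie in [0, 1] (no negative evidence).
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("certainty factors are assumed to lie in [0, 1]")
    return p1 + p2 * (1.0 - p1)


# Rule "If A then C" with p1 = 0.6 and rule "If B then C" with p2 = 0.5.
cf = combine_certainty_factors(0.6, 0.5)
print(round(cf, 6))
```

The rule is symmetric in p1 and p2 (it equals p1 + p2 − p1·p2), but it is not derived from a joint probability model, which is the criticism the slide quotes.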
4Eschewed Probabilistic Approach
- Computationally intractable
- Inscrutable
- Requires vast amounts of data/elicitation
e.g., for $n$ dichotomous variables, $2^n - 1$ probabilities are needed to fully specify the joint distribution
5Conditional Independence
$X \perp\!\!\!\perp Y \mid Z$
6Conditional Independence
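- Suppose A and B are marginally independent. $\Pr(A)$, $\Pr(B)$, $\Pr(C \mid A, B) \times 4$ = 6 probabilities
- Suppose A and C are conditionally independent given B. $\Pr(A)$, $\Pr(B \mid A) \times 2$, $\Pr(C \mid B) \times 2$ = 5 probabilities
- A chain with 50 variables requires 99 probabilities versus $2^{50} - 1$ for the unrestricted joint distribution
[Figure: graph on vertices A, B, C with $C \perp\!\!\!\perp A \mid B$]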
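The counts on this slide can be reproduced with a few lines of arithmetic. This is a sketch for binary variables only; the function names are hypothetical.

```python
def full_joint_parameters(n: int) -> int:
    """Free probabilities in an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def chain_parameters(n: int) -> int:
    """Free probabilities for a chain X1 -> X2 -> ... -> Xn of binary variables."""
    # Pr(X1) needs 1 number; each Pr(X_i | X_{i-1}) needs 2 (one per parent state).
    return 1 + 2 * (n - 1)


def collider_parameters() -> int:
    """A and B marginally independent, C depending on both: Pr(A), Pr(B), Pr(C | A, B)."""
    return 1 + 1 + 4


print(collider_parameters(), chain_parameters(3), chain_parameters(50))
print(full_joint_parameters(3))
```

The chain count grows linearly in the number of variables, while the unrestricted count grows exponentially.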
7Properties of Conditional Independence (Dawid,
1980)
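For any probability measure $P$ and random variables $A$, $B$, and $C$:
CI 1: $A \perp\!\!\!\perp B \; [P] \Rightarrow B \perp\!\!\!\perp A \; [P]$
CI 2: $A \perp\!\!\!\perp B \cup C \; [P] \Rightarrow A \perp\!\!\!\perp B \; [P]$
CI 3: $A \perp\!\!\!\perp B \cup C \; [P] \Rightarrow A \perp\!\!\!\perp B \mid C \; [P]$
CI 4: $A \perp\!\!\!\perp B$ and $A \perp\!\!\!\perp C \mid B \; [P] \Rightarrow A \perp\!\!\!\perp B \cup C \; [P]$
Some probability measures also satisfy
CI 5: $A \perp\!\!\!\perp B \mid C$ and $A \perp\!\!\!\perp C \mid B \; [P] \Rightarrow A \perp\!\!\!\perp B \cup C \; [P]$
CI 5 is satisfied whenever $P$ has a positive joint probability density with respect to some product measure.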
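To see why CI 5 needs positivity, the following sketch checks conditional independence by brute force on a small discrete joint table. The helper names and the example distribution (three identical copies of one fair coin) are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def marginal(joint, idx):
    """Sum a joint table {state tuple: probability} down to the coordinates in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def is_independent(joint, a, b, c=(), tol=1e-12):
    """Check X_a independent of X_b given X_c via p(a,b,c) p(c) == p(a,c) p(b,c)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    p_abc, p_c = marginal(joint, a + b + c), marginal(joint, c)
    p_ac, p_bc = marginal(joint, a + c), marginal(joint, b + c)
    n = len(next(iter(joint)))
    values = [sorted({x[i] for x in joint}) for i in range(n)]
    for x in product(*values):
        xa, xb, xc = (tuple(x[i] for i in s) for s in (a, b, c))
        lhs = p_abc.get(xa + xb + xc, 0.0) * p_c.get(xc, 0.0)
        rhs = p_ac.get(xa + xc, 0.0) * p_bc.get(xb + xc, 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True


# Coordinates 0, 1, 2 play the roles of A, B, C.
# Non-positive example: A = B = C with probability one.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
premise_1 = is_independent(copies, [0], [1], [2])   # A indep B given C
premise_2 = is_independent(copies, [0], [2], [1])   # A indep C given B
conclusion = is_independent(copies, [0], [1, 2])    # A indep (B, C)
print(premise_1, premise_2, conclusion)
```

Both premises of CI 5 hold for this distribution but the conclusion fails, which is possible only because some states have probability zero.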
8Markov Properties for Undirected Graphs
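(Global) $S$ separates $A$ from $B$ $\Rightarrow$ $A \perp\!\!\!\perp B \mid S$
(Local) $a \perp\!\!\!\perp V \setminus \mathrm{cl}(a) \mid \mathrm{bd}(a)$
(Pairwise) $a \perp\!\!\!\perp b \mid V \setminus \{a, b\}$ for non-adjacent $a$, $b$
(G) $\Rightarrow$ (L) $\Rightarrow$ (P)
[Figure: undirected graph on vertices A, B, C, D, E]
(1) $B \perp\!\!\!\perp \{E, D\} \mid \{A, C\}$
(2) $B \perp\!\!\!\perp D \mid \{A, C, E\}$
(1) $\Rightarrow$ (2). To go from (2) to (1), need $E \perp\!\!\!\perp B \mid \{A, C\}$ or CI5.
Lauritzen, Dawid, Larsen & Leimer (1990)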
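Graph separation, the premise of the global Markov property, is a plain reachability question. The sketch below checks it with a breadth-first search that is not allowed to pass through S. The edge list is a hypothetical five-vertex graph chosen so that the boundary of B is {A, C}; the slide's own figure is not recoverable from the transcript.

```python
from collections import deque


def separates(adj, A, B, S):
    """True if every path from A to B in the undirected graph passes through S."""
    A, B, S = set(A), set(B), set(S)
    seen = set(A - S)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in B:
            return False
        for w in adj[v]:
            if w not in S and w not in seen:
                seen.add(w)
                queue.append(w)
    return True


# Hypothetical undirected graph (an assumption, for illustration only).
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "E"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(separates(adj, {"B"}, {"D", "E"}, {"A", "C"}))
print(separates(adj, {"B"}, {"D"}, {"A", "C", "E"}))
print(separates(adj, {"B"}, {"D", "E"}, {"A"}))
```

In this graph {A, C} separates B from {D, E}, so the global property yields statement (1) directly, while the pairwise property alone only gives statements of the form (2).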
9Factorizations
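A density $f$ is said to factorize according to $G$ if
$f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$
where the $\psi_C$ are clique potentials
- cliques are maximal complete subgraphs
Proposition: If $f$ factorizes according to a UG $G$, then it also obeys the global Markov property.
Proof: Let $S$ separate $A$ from $B$ in $G$ and assume $V = A \cup B \cup S$. Let $\mathcal{C}_A$ be the set of cliques with non-empty intersection with $A$. Since $S$ separates $A$ from $B$, we must have $B \cap C = \emptyset$ for all $C$ in $\mathcal{C}_A$. Then
$f(x) = \prod_{C \in \mathcal{C}_A} \psi_C(x_C) \prod_{C \notin \mathcal{C}_A} \psi_C(x_C) = f_1(x_{A \cup S}) \, f_2(x_{B \cup S})$,
so $A \perp\!\!\!\perp B \mid S$.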
10Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
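(Global) $S$ separates $A$ from $B$ in $(G_{\mathrm{An}(A \cup B \cup S)})^m$ $\Rightarrow$ $A \perp\!\!\!\perp B \mid S$
(Local) $a \perp\!\!\!\perp \mathrm{nd}(a) \setminus \mathrm{pa}(a) \mid \mathrm{pa}(a)$
(G) $\Rightarrow$ (L)
[Figure: graphs with the vertex sets A, B, and S marked]
Lauritzen, Dawid, Larsen & Leimer (1990)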
11Factorizations
A density $f$ admits a recursive factorization according to an ADG $G$ if
$f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$
ADG Global Markov Property $\Leftrightarrow$ $f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$
Lemma: If $P$ admits a recursive factorization according to an ADG $G$, then $P$ factorizes according to $G^m$ (and chordal supergraphs of $G^m$).
Lemma: If $P$ admits a recursive factorization according to an ADG $G$, and $A$ is an ancestral set in $G$, then $P_A$ admits a recursive factorization according to the subgraph $G_A$.
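As an illustration of a recursive factorization, the sketch below builds a joint distribution as a product of one conditional table per vertex. The graph (a chain A → B → C) and all probabilities are made-up values chosen for the example.

```python
from itertools import product

# Hypothetical ADG A -> B -> C with illustrative (made-up) probabilities.
parents = {"A": (), "B": ("A",), "C": ("B",)}
# cpt[v][parent states] = Pr(v = 1 | parents)
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.5, (1,): 0.1},
}


def joint_probability(assignment):
    """f(x) = product over v of f(x_v | x_pa(v)) for a full 0/1 assignment."""
    p = 1.0
    for v, pa in parents.items():
        p_one = cpt[v][tuple(assignment[u] for u in pa)]
        p *= p_one if assignment[v] == 1 else 1.0 - p_one
    return p


order = list(parents)
joint = {
    x: joint_probability(dict(zip(order, x)))
    for x in product((0, 1), repeat=len(order))
}
total = sum(joint.values())
print(round(total, 12))
```

Because every factor is a conditional distribution, the product sums to one without any normalizing constant, unlike a general clique-potential factorization.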
12Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
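(Global) $S$ separates $A$ from $B$ in $(G_{\mathrm{An}(A \cup B \cup S)})^m$ $\Rightarrow$ $A \perp\!\!\!\perp B \mid S$
(Local) $a \perp\!\!\!\perp \mathrm{nd}(a) \setminus \mathrm{pa}(a) \mid \mathrm{pa}(a)$
(G) $\Rightarrow$ (L)
- $\{a\} \cup \mathrm{nd}(a)$ is an ancestral set
- $\mathrm{pa}(a)$ obviously separates $a$ from $\mathrm{nd}(a) \setminus \mathrm{pa}(a)$ in $(G_{\mathrm{An}(\{a\} \cup \mathrm{nd}(a))})^m$
(L) $\Rightarrow$ (factorization)
- induction on the number of vertices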
13d-separation
A chain $\pi$ from $a$ to $b$ in an acyclic directed graph $G$ is said to be blocked by $S$ if it contains a vertex $\gamma \in \pi$ such that either
- $\gamma \in S$ and the arrows of $\pi$ do not meet head to head at $\gamma$, or
- neither $\gamma$ nor any of its descendants is in $S$, and the arrows of $\pi$ do meet head to head at $\gamma$.
Two subsets $A$ and $B$ are d-separated by $S$ if all chains from $A$ to $B$ are blocked by $S$.
15d-separation and global markov property
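Let $A$, $B$, and $S$ be disjoint subsets of the vertices of a directed acyclic graph $G$. Then $S$ d-separates $A$ from $B$ if and only if $S$ separates $A$ from $B$ in $(G_{\mathrm{An}(A \cup B \cup S)})^m$, the moral graph of the smallest ancestral set containing $A \cup B \cup S$.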
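This equivalence gives a simple way to test d-separation: restrict to the ancestral set, moralize (marry parents and drop directions), then check ordinary separation. The following is a sketch under that reading; the function names and the two small example graphs are assumptions for illustration.

```python
def ancestral_set(parents, nodes):
    """Smallest ancestral set containing `nodes` (the nodes plus all their ancestors)."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen


def moral_ancestral_graph(parents, nodes):
    """Undirected moral graph of the subgraph induced by the ancestral set."""
    keep = ancestral_set(parents, nodes)
    adj = {v: set() for v in keep}
    for v in keep:
        pa = list(parents.get(v, ()))
        for p in pa:                       # drop edge directions
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(pa):         # marry parents of a common child
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    return adj


def d_separated(parents, A, B, S):
    """True if S separates A from B in the moralized ancestral graph."""
    A, B, S = set(A), set(B), set(S)
    adj = moral_ancestral_graph(parents, A | B | S)
    seen, stack = set(), [a for a in A if a not in S]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        if v in B:
            return False
        stack.extend(w for w in adj[v] if w not in S and w not in seen)
    return True


collider = {"C": ["A", "B"]}             # A -> C <- B
chain = {"B": ["A"], "C": ["B"]}         # A -> B -> C
print(d_separated(collider, {"A"}, {"B"}, set()))
print(d_separated(collider, {"A"}, {"B"}, {"C"}))
print(d_separated(chain, {"A"}, {"C"}, {"B"}))
```

In the collider, conditioning on C pulls C into the ancestral set, and moralization then joins A and B, which is how the graph-separation test reproduces the head-to-head rule of d-separation.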
16UG ADG Intersection
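[Figure: example graphs on vertices A, B, C (and D), each labelled with the independences it encodes: $C \perp\!\!\!\perp A \mid B$; $A \perp\!\!\!\perp C \mid B$ (three graphs); $A \perp\!\!\!\perp B \mid \{C, D\}$ and $C \perp\!\!\!\perp D \mid \{A, B\}$; $A \perp\!\!\!\perp C$]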
17UG ADG Intersection
[Figure: Venn diagram with regions labelled UG, ADG, and Decomposable]
- UG is decomposable if chordal
- ADG is decomposable if moral
- Decomposable: closed-form log-linear models
No CI5
18Chordal Graphs and RIP
- Chordal graphs (and only chordal graphs) admit clique orderings that have the Running Intersection Property (RIP)
- V,T
- A,L,T
- L,A,B
- S,L,B
- A,B,D
- A,X
[Figure: chordal graph on vertices V, T, L, A, S, X, D, B]
- The intersection of each set with the union of those earlier in the list is fully contained in one of the earlier sets
- Can compute conditional probabilities (e.g. $\Pr(X \mid V)$) by message passing (Lauritzen & Spiegelhalter, Dawid, Jensen)
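The running intersection property can be checked mechanically for the clique list on this slide. The sketch below takes "contained in a previous set" to mean contained in any one earlier clique, not necessarily the immediately preceding one; the function name is hypothetical.

```python
def has_running_intersection(cliques):
    """Check the RIP for an ordered list of cliques."""
    cliques = [frozenset(c) for c in cliques]
    for i in range(1, len(cliques)):
        earlier_union = frozenset().union(*cliques[:i])
        shared = cliques[i] & earlier_union
        # The shared vertices must all sit inside a single earlier clique.
        if not any(shared <= c for c in cliques[:i]):
            return False
    return True


# Clique ordering from the slide.
ordering = [
    {"V", "T"},
    {"A", "L", "T"},
    {"L", "A", "B"},
    {"S", "L", "B"},
    {"A", "B", "D"},
    {"A", "X"},
]
# The same cliques in a different order that breaks the property.
shuffled = [ordering[0], ordering[3], ordering[1], ordering[2], ordering[4], ordering[5]]

print(has_running_intersection(ordering))
print(has_running_intersection(shuffled))
```

For the slide's ordering, the clique {A, B, D} shares {A, B} with earlier cliques, and that set lies inside {L, A, B} rather than inside its immediate predecessor {S, L, B}.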
19Probabilistic Expert System
- Computationally intractable
- Inscrutable
- Requires vast amounts of data/elicitation
- Chordal UG models facilitate fast inference
- ADG models are better for expert system applications: it is more natural to specify $\Pr(v \mid \mathrm{pa}(v))$
20Factorizations
UG Global Markov Property $\Leftarrow$ $f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$
ADG Global Markov Property $\Leftrightarrow$ $f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$
21Lauritzen-Spiegelhalter Algorithm
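[Figure: two graphs on vertices A, B, C, D, E, F, G, H, S]
- $\psi(C, S, D) \leftarrow \Pr(S \mid C, D)$
- $\psi(A, E) \leftarrow \Pr(E \mid A) \Pr(A)$
- $\psi(C, E) \leftarrow \Pr(C \mid E)$
- $\psi(F, D, B) \leftarrow \Pr(D \mid F) \Pr(B \mid F) \Pr(F)$
- $\psi(D, B, S) \leftarrow 1$
- $\psi(B, S, G) \leftarrow \Pr(G \mid S, B)$
- $\psi(H, S) \leftarrow \Pr(H \mid S)$
Algorithm is widely deployed in commercial software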
22LS Toy Example
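$\Pr(C \mid B) = 0.2$, $\Pr(C \mid \neg B) = 0.6$, $\Pr(B \mid A) = 0.5$, $\Pr(B \mid \neg A) = 0.1$, $\Pr(A) = 0.7$
[Figure: graph on vertices A, B, C and its clique tree AB - B - BC]
- $\psi(A, B) \leftarrow \Pr(B \mid A) \Pr(A)$
- $\psi(B, C) \leftarrow \Pr(C \mid B)$
Query: $\Pr(A \mid C)$
Message schedule: AB → BC, BC → AB

Initial potentials
AB      B       ¬B
A       0.35    0.35
¬A      0.03    0.27

B       B       ¬B
        1       1

BC      C       ¬C
B       0.2     0.8
¬B      0.6     0.4

After the message AB → BC
B       B       ¬B
        0.38    0.62

BC      C       ¬C
B       0.076   0.304
¬B      0.372   0.248

With C observed (¬C column set to 0)
BC      C       ¬C
B       0.076   0
¬B      0.372   0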
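The toy example can be traced in a few lines. The sketch below reads the slide's five numbers as Pr(A) = 0.7, Pr(B | A) = 0.5, Pr(B | not A) = 0.1, Pr(C | B) = 0.2, Pr(C | not B) = 0.6 (the negations are an interpretation of the transcript) and follows the schedule AB → BC, BC → AB, entering the evidence C = true between the two messages. The variable names and the point at which evidence is entered are assumptions.

```python
# State index 0 means "true", index 1 means "false".
p_a = [0.7, 0.3]
p_b_given_a = [[0.5, 0.5], [0.1, 0.9]]   # indexed [a][b]
p_c_given_b = [[0.2, 0.8], [0.6, 0.4]]   # indexed [b][c]

# Initialise clique potentials and the separator.
psi_ab = [[p_a[a] * p_b_given_a[a][b] for b in range(2)] for a in range(2)]
psi_bc = [row[:] for row in p_c_given_b]
sep_b = [1.0, 1.0]

# Message AB -> BC: marginalise AB onto B, rescale BC by new/old separator.
new_sep = [psi_ab[0][b] + psi_ab[1][b] for b in range(2)]
for b in range(2):
    for c in range(2):
        psi_bc[b][c] *= new_sep[b] / sep_b[b]
sep_b = new_sep
marginal_b = sep_b[:]                      # Pr(B)
joint_bc = [row[:] for row in psi_bc]      # Pr(B, C)

# Enter evidence C = true by zeroing the "C false" column.
for b in range(2):
    psi_bc[b][1] = 0.0

# Message BC -> AB: marginalise BC onto B, rescale AB by new/old separator.
new_sep = [psi_bc[b][0] + psi_bc[b][1] for b in range(2)]
for a in range(2):
    for b in range(2):
        psi_ab[a][b] *= new_sep[b] / sep_b[b]
sep_b = new_sep

# psi_ab now holds Pr(A, B, C = true); normalise to get Pr(A | C).
p_c = sum(sum(row) for row in psi_ab)
p_a_given_c = [sum(psi_ab[a]) / p_c for a in range(2)]
print(round(p_c, 6), [round(p, 6) for p in p_a_given_c])
```

After the first message the BC potential equals the joint Pr(B, C) with entries 0.076, 0.304, 0.372, 0.248, matching the slide's tables; the final answer works out to Pr(A | C) = 0.28 / 0.448 = 0.625.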
23Other Theoretical Developments
Do the UG and ADG global Markov properties
identify all the conditional independences
implied by the corresponding factorizations? Yes.
Completeness for ADGs was shown by Geiger and Pearl (1988), and for UGs by Frydenberg (1988).
Graphical characterization of collapsibility in
hierarchical log-linear models (Asmussen and
Edwards, 1983)
24Bayesian Learning for ADGs
- Example: three binary variables
- Five parameters
25Local and Global Independence
26Bayesian learning
Consider a particular state pa(v) of pa(v)