Probabilistic Graphical Models - PowerPoint PPT Presentation

About This Presentation
Title: Probabilistic Graphical Models
Description: Statistical Analysis of Web-Generated Data. Author: David Madigan. Last modified by: David Madigan. Created: 4/6/1997 3:24:04 PM. Format: PowerPoint presentation.

Slides: 27
Provided by: DavidM474

Transcript and Presenter's Notes



1
Probabilistic Graphical Models
  • David Madigan
  • Rutgers University
  • madigan_at_stat.rutgers.edu

2
Expert Systems
  • Explosion of interest in Expert Systems in the
    early 1980s
  • Many companies (Teknowledge, IntelliCorp,
    Inference, etc.), many IPOs, much media hype
  • Ad-hoc uncertainty handling

3
Uncertainty in Expert Systems
If A then C (p1)
If B then C (p2)
What if both A and B are true?
Then C is true with certainty factor (CF) p1 + p2 × (1 − p1)
"Currently fashionable ad-hoc mumbo jumbo" (A.F.M. Smith)
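The combination rule on this slide can be written as a one-line function. This is a minimal sketch assuming both rules lend positive support (0 ≤ p1, p2 ≤ 1) to the same conclusion C; the name `combine_cf` is hypothetical. Since p1 + p2(1 − p1) = 1 − (1 − p1)(1 − p2), the result does not depend on which rule fires first, and because it uses only p1 and p2, any dependence between A and B is ignored.

```python
def combine_cf(p1: float, p2: float) -> float:
    """Combine certainty factors of 'If A then C' (p1) and 'If B then C' (p2)
    when both A and B are true, using the rule p1 + p2 * (1 - p1).
    Assumes 0 <= p1, p2 <= 1 (both rules support C)."""
    return p1 + p2 * (1 - p1)


print(combine_cf(0.5, 0.5))    # 0.75
print(combine_cf(0.25, 0.5))   # 0.625
print(combine_cf(0.5, 0.25))   # 0.625: the order of the rules does not matter
```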
4
Eschewed Probabilistic Approach
  • Computationally intractable
  • Inscrutable
  • Requires vast amounts of data/elicitation

e.g., for n dichotomous variables need $2^n - 1$
probabilities to fully specify the joint
distribution
5
Conditional Independence
$X \perp\!\!\!\perp Y \mid Z$: X is conditionally independent of Y given Z
6
Conditional Independence
  • Suppose A and B are marginally independent:
    Pr(A), Pr(B), Pr(C | A, B) × 4 → 6 probabilities
  • Suppose A and C are conditionally independent
    given B: Pr(A), Pr(B | A) × 2, Pr(C | B) × 2 → 5 probabilities
  • A chain with 50 variables requires 99 probabilities
    versus $2^{50} - 1$

[Figure: chain A → B → C, for which $C \perp\!\!\!\perp A \mid B$]
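The counts above assume binary variables: a vertex with k parents needs one probability Pr(v = 1 | pa(v)) for each of its 2^k parent configurations. A small sketch reproducing the slide's numbers; the helper names and the vertex-to-parents encoding of each case are hypothetical.

```python
def n_params_full(n):
    """Free parameters of an unrestricted joint distribution of n binary variables."""
    return 2 ** n - 1


def n_params_adg(parents):
    """Free parameters of an ADG model over binary variables: one probability
    Pr(v = 1 | pa(v) = j) for each of the 2^|pa(v)| parent configurations j."""
    return sum(2 ** len(pa) for pa in parents.values())


# Graphs encoded as vertex -> list of parents.
collider = {"A": [], "B": [], "C": ["A", "B"]}   # A, B marginally independent
chain3 = {"A": [], "B": ["A"], "C": ["B"]}       # A and C independent given B
chain50 = {f"X{i}": [f"X{i-1}"] if i > 0 else [] for i in range(50)}

print(n_params_adg(collider), n_params_adg(chain3))   # 6 5
print(n_params_adg(chain50), n_params_full(50))       # 99 1125899906842623
```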
7
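Properties of Conditional Independence (Dawid, 1980)
For any probability measure P and random
variables A, B, and C:
CI1: $A \perp\!\!\!\perp B \; [P] \Rightarrow B \perp\!\!\!\perp A \; [P]$
CI2: $A \perp\!\!\!\perp B \cup C \; [P] \Rightarrow A \perp\!\!\!\perp B \; [P]$
CI3: $A \perp\!\!\!\perp B \cup C \; [P] \Rightarrow A \perp\!\!\!\perp B \mid C \; [P]$
CI4: $A \perp\!\!\!\perp B$ and $A \perp\!\!\!\perp C \mid B \; [P] \Rightarrow A \perp\!\!\!\perp B \cup C \; [P]$
Some probability measures also satisfy
CI5: $A \perp\!\!\!\perp B \mid C$ and $A \perp\!\!\!\perp C \mid B \; [P] \Rightarrow A \perp\!\!\!\perp B \cup C \; [P]$
CI5 is satisfied whenever P has a positive joint
probability density with respect to some product
measure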
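CI5 can fail when the density is not positive. A standard counterexample (not from the slides): let A, B and C all be copies of one fair coin. Then A ⊥ B | C and A ⊥ C | B hold, yet A is not independent of (B, C). The sketch below checks this by brute force on the joint probability table, using the criterion p(x, y, z) p(z) = p(x, z) p(y, z); the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of (A, B, C) when all three copy one fair coin: A = B = C.
# P is not positive (e.g. P(0, 0, 1) = 0), so CI5 is not guaranteed.
P = {x: (Fraction(1, 2) if x[0] == x[1] == x[2] else Fraction(0))
     for x in product([0, 1], repeat=3)}


def marginal(p, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for x, pr in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0) + pr
    return out


def cond_indep(p, xs, ys, zs=()):
    """X indep. of Y given Z iff p(x,y,z) p(z) == p(x,z) p(y,z) for all binary values."""
    xs, ys, zs = list(xs), list(ys), list(zs)
    pxyz = marginal(p, xs + ys + zs)
    pxz, pyz, pz = marginal(p, xs + zs), marginal(p, ys + zs), marginal(p, zs)
    for x in product([0, 1], repeat=len(xs)):
        for y in product([0, 1], repeat=len(ys)):
            for z in product([0, 1], repeat=len(zs)):
                if (pxyz.get(x + y + z, 0) * pz.get(z, 0)
                        != pxz.get(x + z, 0) * pyz.get(y + z, 0)):
                    return False
    return True


A, B, C = 0, 1, 2                       # coordinate positions in each outcome
print(cond_indep(P, [A], [B], [C]))     # True:  A indep. of B given C
print(cond_indep(P, [A], [C], [B]))     # True:  A indep. of C given B
print(cond_indep(P, [A], [B, C]))       # False: A not indep. of (B, C)
```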
8
Markov Properties for Undirected Graphs
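(Global) S separates A from B ⇒ $A \perp\!\!\!\perp B \mid S$
(Local) $a \perp\!\!\!\perp V \setminus cl(a) \mid bd(a)$
(Pairwise) $a \perp\!\!\!\perp b \mid V \setminus \{a, b\}$ for non-adjacent a, b
(G) ⇒ (L) ⇒ (P)
$B \perp\!\!\!\perp E, D \mid A, C$   (1)
⇓
$B \perp\!\!\!\perp D \mid A, C, E$   (2)
[Figure: undirected graph on A, B, C, D, E]
To go from (2) to (1) we need $E \perp\!\!\!\perp B \mid A, C$, or CI5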
Lauritzen, Dawid, Larsen & Leimer (1990)
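Checking the global Markov property for a UG comes down to graph separation, which is plain reachability: S separates A from B if a search started in A that never enters S cannot reach B. A minimal sketch; the function name is hypothetical, and so is the edge list, since the slide's figure did not survive. The edges are chosen so that bd(B) = {A, C}, which matches statement (1).

```python
from collections import deque


def separates(adj, A, B, S):
    """True if S separates A from B in the undirected graph adj
    (vertex -> set of neighbours): a breadth-first search from A
    that never enters S must not reach B."""
    S = set(S)
    frontier = deque(set(A) - S)
    visited = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in B:
            return False
        for w in adj[v]:
            if w not in visited and w not in S:
                visited.add(w)
                frontier.append(w)
    return True


# Hypothetical edge set with bd(B) = {A, C}.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("C", "E"), ("D", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(separates(adj, {"B"}, {"D", "E"}, {"A", "C"}))  # True, as in (1)
print(separates(adj, {"B"}, {"D"}, {"A"}))            # False: path B-C-E-D
```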
9
Factorizations
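A density f is said to factorize according to
G if $f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$,
where $\mathcal{C}$ is the set of cliques of G and the $\psi_C$ are
clique potentials
  • cliques are maximally complete subgraphs


Proposition: If f factorizes according to a UG
G, then it also obeys the global Markov
property. Proof: Let S separate A from B in G
and assume, without loss of generality, that $A \cup B \cup S = V$.
Let $\mathcal{C}_A$ be the set of cliques with
non-empty intersection with A. Since S separates
A from B, we must have $C \cap B = \emptyset$ for
all C in $\mathcal{C}_A$. Then
$f(x) = \prod_{C \in \mathcal{C}_A} \psi_C(x_C) \prod_{C \notin \mathcal{C}_A} \psi_C(x_C) = h(x_{A \cup S})\, k(x_{B \cup S})$,
so $A \perp\!\!\!\perp B \mid S$.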
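The proposition can be checked numerically on the smallest interesting case. A minimal sketch, assuming the hypothetical UG A - B - C (cliques {A, B} and {B, C}, three states per variable) with random positive clique potentials: {B} separates A from C, so the normalized product of potentials should satisfy A ⊥ C | B, while a generic joint table should not. The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical UG A - B - C with cliques {A, B} and {B, C}; 3 states each.
psi_ab = rng.random((3, 3)) + 0.1
psi_bc = rng.random((3, 3)) + 0.1

# f(a, b, c) = psi_ab(a, b) * psi_bc(b, c) / Z
f = np.einsum("ab,bc->abc", psi_ab, psi_bc)
f /= f.sum()


def a_indep_c_given_b(p):
    """Check A indep. of C given B for a table p[a, b, c] via
    p(a, b, c) p(b) == p(a, b) p(b, c) in every cell."""
    p_ab = p.sum(axis=2)
    p_bc = p.sum(axis=0)
    p_b = p.sum(axis=(0, 2))
    return np.allclose(p * p_b[None, :, None],
                       p_ab[:, :, None] * p_bc[None, :, :])


print(a_indep_c_given_b(f))   # True: f factorizes and {B} separates A from C

# A generic joint table that does not factorize over these cliques.
g = rng.random((3, 3, 3))
g /= g.sum()
print(a_indep_c_given_b(g))   # almost surely False
```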
10
Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
(Global) S separates A from B in $(G_{An(A \cup B \cup S)})^m$ ⇒ $A \perp\!\!\!\perp B \mid S$
(Local) $a \perp\!\!\!\perp nd(a) \setminus pa(a) \mid pa(a)$
(G) ⇔ (L)
Lauritzen, Dawid, Larsen & Leimer (1990)
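The directed global Markov property can be checked mechanically: take the smallest ancestral set containing A ∪ B ∪ S, moralize the induced subgraph (marry parents of a common child, drop directions), and test ordinary separation by S. A minimal sketch; graphs are dicts mapping each vertex to its list of parents, and the function names and the collider example are hypothetical.

```python
from collections import deque
from itertools import combinations


def ancestral_set(parents, nodes):
    """Smallest ancestral set An(nodes): the nodes plus all their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def moral_ancestral_graph(parents, nodes):
    """Moral graph of the subgraph induced by An(nodes)."""
    keep = ancestral_set(parents, nodes)
    adj = {v: set() for v in keep}
    for v in keep:
        pa = parents[v]                    # parents of v are in keep by construction
        for p in pa:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(pa, 2):   # marry parents of a common child
            adj[p].add(q)
            adj[q].add(p)
    return adj


def directed_global_separated(parents, A, B, S):
    """True if S separates A from B in the moral graph of G restricted to An(A u B u S)."""
    A, B, S = set(A), set(B), set(S)
    adj = moral_ancestral_graph(parents, A | B | S)
    frontier = deque(A - S)
    visited = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in B:
            return False
        for w in adj[v] - visited - S:
            visited.add(w)
            frontier.append(w)
    return True


# Hypothetical collider A -> C <- B.
parents = {"A": [], "B": [], "C": ["A", "B"]}
print(directed_global_separated(parents, {"A"}, {"B"}, set()))  # True
print(directed_global_separated(parents, {"A"}, {"B"}, {"C"}))  # False
```

Conditioning on C brings C into the ancestral set, and moralization then joins A and B, which is why the second call returns False.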
11
Factorizations
A density f admits a recursive factorization
according to an ADG G if $f(x) = \prod_{v \in V} f(x_v \mid x_{pa(v)})$

ADG global Markov property ⇔ $f(x) = \prod_{v \in V} f(x_v \mid x_{pa(v)})$
Lemma: If P admits a recursive factorization
according to an ADG G, then P factorizes
according to $G^m$ (and to chordal supergraphs of $G^m$)
Lemma: If P admits a recursive factorization
according to an ADG G, and A is an ancestral set
in G, then $P_A$ admits a recursive factorization
according to the subgraph $G_A$
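The second lemma can be illustrated by borrowing the conditional probabilities of the LS toy example later in this deck (chain A → B → C). A minimal sketch, assuming index 0 encodes "true" and 1 encodes "false": it builds the joint from the recursive factorization and checks that the marginal on the ancestral set {A, B} equals Pr(A) Pr(B | A). The set {A, C} is not ancestral (B is an ancestor of C); the subgraph on {A, C} has no edges, and the analogous product Pr(A) Pr(C) does not reproduce the marginal.

```python
import numpy as np

# Chain A -> B -> C; index 0 = "true", 1 = "false" (an assumed encoding).
p_a = np.array([0.7, 0.3])                 # Pr(A), Pr(not A)
p_b_given_a = np.array([[0.5, 0.5],        # row A:     Pr(B | A),  Pr(not B | A)
                        [0.1, 0.9]])       # row not A
p_c_given_b = np.array([[0.2, 0.8],        # row B:     Pr(C | B),  Pr(not C | B)
                        [0.6, 0.4]])       # row not B

# Recursive factorization: f(a, b, c) = f(a) f(b | a) f(c | b).
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]
print(np.isclose(joint.sum(), 1.0))                       # True

# {A, B} is ancestral: its marginal factorizes over the subgraph A -> B.
marg_ab = joint.sum(axis=2)
print(np.allclose(marg_ab, p_a[:, None] * p_b_given_a))   # True

# {A, C} is not ancestral: the marginal is not Pr(A) Pr(C).
marg_ac = joint.sum(axis=1)
p_c = marg_ac.sum(axis=0)
print(np.allclose(marg_ac, np.outer(p_a, p_c)))           # False
```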
12
Markov Properties for Acyclic Directed
Graphs (Bayesian Networks)
(Global) S separates A from B in $(G_{An(A \cup B \cup S)})^m$ ⇒ $A \perp\!\!\!\perp B \mid S$
(Local) $a \perp\!\!\!\perp nd(a) \setminus pa(a) \mid pa(a)$
  • {a} ∪ nd(a) is an ancestral set, and pa(a) obviously
    separates a from nd(a) \ pa(a) in $(G_{An(\{a\} \cup nd(a))})^m$

(G) ⇒ (L)
(L) ⇒ (factorization):
induction on the number of vertices
13
d-separation
A chain π from a to b in an acyclic directed
graph G is said to be blocked by S if it contains
a vertex γ ∈ π such that either
- γ ∈ S and the arrows of π do not meet head to head at γ, or
- γ ∉ S, none of the descendants of γ is in S, and the arrows
  of π do meet head to head at γ.
Two subsets A and B are d-separated by S if all chains from A
to B are blocked by S
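A direct, brute-force reading of this definition: enumerate every chain (a path that ignores edge directions) from A to B and test each interior vertex γ against the two blocking conditions. Enumerating chains is exponential in general, so this sketch is only meant for small examples; graphs are dicts mapping each vertex to its list of parents, and the function names and example graph are hypothetical.

```python
def descendants(children, v):
    """All descendants of v (children, grandchildren, ...)."""
    out, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out


def chains(parents, children, a, b, path=None):
    """Yield every simple chain (path ignoring directions) from a to b."""
    path = path or [a]
    v = path[-1]
    if v == b:
        yield path
        return
    for w in parents[v] + children[v]:
        if w not in path:
            yield from chains(parents, children, a, b, path + [w])


def blocked(parents, children, chain, S):
    """Apply the two blocking conditions to each interior vertex g of the chain."""
    for prev, g, nxt in zip(chain, chain[1:], chain[2:]):
        head_to_head = prev in parents[g] and nxt in parents[g]
        if not head_to_head and g in S:
            return True
        if head_to_head and g not in S and not (descendants(children, g) & S):
            return True
    return False


def d_separated(parents, A, B, S):
    """True if every chain from A to B is blocked by S."""
    S = set(S)
    children = {v: [] for v in parents}
    for v, pa in parents.items():
        for p in pa:
            children[p].append(v)
    return all(blocked(parents, children, ch, S)
               for a in A for b in B
               for ch in chains(parents, children, a, b))


# Hypothetical graph: A -> C <- B, C -> D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(d_separated(parents, {"A"}, {"B"}, set()))   # True: collider C blocks
print(d_separated(parents, {"A"}, {"B"}, {"D"}))   # False: D is a descendant of C
print(d_separated(parents, {"A"}, {"D"}, {"C"}))   # True
```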
15
d-separation and the global Markov property
Let A, B, and S be disjoint subsets of a
directed acyclic graph G. Then S d-separates A
from B if and only if S separates A from B in
$(G_{An(A \cup B \cup S)})^m$
16
UG ADG Intersection
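[Figure: small UG and ADG examples on A, B, C and on A, B, C, D]
Independence statements shown: $C \perp\!\!\!\perp A \mid B$; $A \perp\!\!\!\perp C \mid B$; $A \perp\!\!\!\perp B \mid C, D$ and $C \perp\!\!\!\perp D \mid A, B$; $A \perp\!\!\!\perp C$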
17
UG ADG Intersection
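[Figure: UG models and ADG models, with their intersection labelled Decomposable]
  • A UG is decomposable if it is chordal
  • An ADG is decomposable if it is moral (no unmarried parents)
  • Decomposable models are log-linear models with
    closed-form maximum likelihood estimates

No CI5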
18
Chordal Graphs and RIP
  • Chordal graphs (and only chordal graphs) admit clique orderings
    that have the Running Intersection Property
  1. V,T
  2. A,L,T
  3. L,A,B
  4. S,L,B
  5. A,B,D
  6. A,X

[Figure: chordal graph on V, T, L, A, S, X, D, B]
  • The intersection of each set with those earlier
    in the list is fully contained in at least one earlier set
  • Can compute conditional probabilities (e.g. Pr(X | V)) by
    message passing (Lauritzen & Spiegelhalter;
    Dawid; Jensen)
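The running intersection property of the ordering listed above can be checked directly. Note that the fifth clique {A, B, D} meets the earlier cliques in {A, B}, which lies inside {L, A, B} rather than the immediately preceding {S, L, B}, so the containing set need not be the previous one. A minimal sketch; the function name is hypothetical.

```python
# Clique ordering listed on the slide (each clique as a set of vertices).
cliques = [{"V", "T"}, {"A", "L", "T"}, {"L", "A", "B"},
           {"S", "L", "B"}, {"A", "B", "D"}, {"A", "X"}]


def has_rip(cliques):
    """Running intersection property: for every j > 0, the intersection of
    C_j with C_0 u ... u C_{j-1} lies inside some single earlier C_i."""
    seen = set()
    for j, c in enumerate(cliques):
        sep = c & seen
        if j > 0 and not any(sep <= cliques[i] for i in range(j)):
            return False
        seen |= c
    return True


print(has_rip(cliques))  # True
# Moving {A, X} to second place breaks it: {A, L, T} then meets {V, T, A, X}
# in {A, T}, which lies inside neither {V, T} nor {A, X}.
print(has_rip([{"V", "T"}, {"A", "X"}, {"A", "L", "T"}]))  # False
```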

19
Probabilistic Expert System
  • Computationally intractable
  • Inscrutable
  • Requires vast amounts of data/elicitation
  • Chordal UG models facilitate fast inference
  • ADG models are better for expert system applications:
    it is more natural to specify Pr(v | pa(v))

20
Factorizations
UG global Markov property ⇔ $f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$ (for positive f)
ADG global Markov property ⇔ $f(x) = \prod_{v \in V} f(x_v \mid x_{pa(v)})$
21
Lauritzen-Spiegelhalter Algorithm
  • ψ(C,S,D) = Pr(S | C, D)
  • ψ(A,E) = Pr(E | A) Pr(A)
  • ψ(C,E) = Pr(C | E)
  • ψ(F,D,B) = Pr(D | F) Pr(B | F) Pr(F)
  • ψ(D,B,S) = 1
  • ψ(B,S,G) = Pr(G | S, B)
  • ψ(H,S) = Pr(H | S)

[Figure: an ADG on A, B, C, D, E, F, G, H, S and the graph obtained from it by the steps below]
  • Moralize
  • Triangulate

Algorithm is widely deployed in commercial
software
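The clique potentials listed on this slide can be reproduced mechanically. A minimal sketch, assuming the DAG read off the conditional probabilities above (each Pr(v | ·) names v's parents: E ← A, C ← E, S ← C, D, D ← F, B ← F, G ← S, B, H ← S) and a hand-picked elimination order; the function names and the order are hypothetical. Moralizing marries C - D and S - B; eliminating F then adds the fill-in edge D - B.

```python
from itertools import combinations

# DAG read off the potentials (vertex -> parents); an assumption, since the
# slide's figure did not survive.
parents = {
    "A": [], "F": [],
    "E": ["A"], "C": ["E"], "D": ["F"], "B": ["F"],
    "S": ["C", "D"], "G": ["S", "B"], "H": ["S"],
}


def moralize(parents):
    """Drop edge directions and join ('marry') every pair of co-parents."""
    adj = {v: set() for v in parents}
    for v, pa in parents.items():
        for p in pa:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def triangulate(adj, order):
    """Eliminate vertices in `order`, adding fill-in edges among each vertex's
    remaining neighbours; return the maximal elimination cliques, which are
    the cliques of the resulting chordal graph."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    found = []
    for v in order:
        nbrs = adj[v] & remaining
        for p, q in combinations(nbrs, 2):
            adj[p].add(q)
            adj[q].add(p)
        found.append(frozenset(nbrs | {v}))
        remaining.discard(v)
    return {c for c in found if not any(c < d for d in found)}


order = ["H", "A", "E", "G", "F", "C", "D", "S", "B"]   # hypothetical order
cliques = triangulate(moralize(parents), order)
for c in sorted(cliques, key=sorted):
    print(sorted(c))
```

With this order the seven cliques are {C, S, D}, {A, E}, {C, E}, {F, D, B}, {D, B, S}, {B, S, G} and {H, S}, matching the list of potentials; a different elimination order can produce a different (possibly larger) triangulation.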
22
LS Toy Example
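Pr(C | B) = 0.2   Pr(C | ¬B) = 0.6   Pr(B | A) = 0.5
Pr(B | ¬A) = 0.1   Pr(A) = 0.7
[Figure: chain A → B → C and its junction tree AB - B - BC]
  • ψ(A,B) = Pr(B | A) Pr(A)
  • ψ(B,C) = Pr(C | B)

Initial potentials:
  ψ(A,B)     B      ¬B
  A        0.35    0.35
  ¬A       0.03    0.27

  φ(B)       B      ¬B
             1       1

  ψ(B,C)     C      ¬C
  B         0.2     0.8
  ¬B        0.6     0.4

Query: Pr(A | C). Message schedule: AB → BC, then BC → AB.

After the message AB → BC:
  φ(B)       B      ¬B
            0.38    0.62

  ψ(B,C)     C      ¬C
  B        0.076   0.304
  ¬B       0.372   0.248

After entering the evidence C:
  ψ(B,C)     C      ¬C
  B        0.076     0
  ¬B       0.372     0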
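The message schedule can be replayed with small arrays. A minimal sketch, assuming index 0 means "true" and 1 means "false" for each variable; it follows the schedule AB → BC, enters the evidence C, then sends BC → AB, each message rescaling the receiving clique by the ratio of new to old separator potentials. The final step (the return message and Pr(A | C)) is computed here rather than read off the slide.

```python
import numpy as np

# Index 0 = "true", 1 = "false" for every variable (an assumed encoding).
p_a = np.array([0.7, 0.3])                  # Pr(A), Pr(not A)
p_b_given_a = np.array([[0.5, 0.5],         # row A:     Pr(B | A),  Pr(not B | A)
                        [0.1, 0.9]])        # row not A
p_c_given_b = np.array([[0.2, 0.8],         # row B:     Pr(C | B),  Pr(not C | B)
                        [0.6, 0.4]])        # row not B

# Initial clique and separator potentials.
psi_ab = p_a[:, None] * p_b_given_a         # psi(A,B) = Pr(B | A) Pr(A)
psi_bc = p_c_given_b.copy()                 # psi(B,C) = Pr(C | B)
phi_b = np.ones(2)                          # separator phi(B) = 1

# Message AB -> BC: sum A out of psi(A,B), rescale psi(B,C) by new/old separator.
new_phi = psi_ab.sum(axis=0)                # [0.38, 0.62]
psi_bc *= (new_phi / phi_b)[:, None]
phi_b = new_phi

# Enter the evidence C: zero out the "not C" column.
psi_bc[:, 1] = 0.0

# Message BC -> AB: sum C out of psi(B,C), rescale psi(A,B).
new_phi = psi_bc.sum(axis=1)                # [0.076, 0.372]
psi_ab *= (new_phi / phi_b)[None, :]
phi_b = new_phi

# psi(A,B) now equals Pr(A, B, C); normalize and sum out B.
p_a_given_c = psi_ab.sum(axis=1) / psi_ab.sum()
print(p_a_given_c)                          # approximately [0.625 0.375]
```

The normalizing constant psi_ab.sum() is Pr(C) = 0.448, and Pr(A | C) = 0.28 / 0.448 = 0.625.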
23
Other Theoretical Developments
Do the UG and ADG global Markov properties
identify all the conditional independences
implied by the corresponding factorizations? Yes.
Completeness: for ADGs by Geiger and Pearl
(1988); for UGs by Frydenberg (1988)
Graphical characterization of collapsibility in
hierarchical log-linear models (Asmussen and
Edwards, 1983)
24
Bayesian Learning for ADGs
  • Example three binary variables
  • Five parameters

25
Local and Global Independence
26
Bayesian learning
Consider a particular state pa(v) of pa(v)