Finding Optimal Bayesian Networks with Greedy Search - PowerPoint PPT Presentation

Provided by: dmax

Transcript and Presenter's Notes

Title: Finding Optimal Bayesian Networks with Greedy Search


1
Finding Optimal Bayesian Networks with Greedy Search
  • Max Chickering

2
Outline
  • Bayesian-Network Definitions
  • Learning
  • Greedy Equivalence Search (GES)
  • Optimality of GES

3
Bayesian Networks
Use B = (S, θ) to represent p(X1, ..., Xn)
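As an illustration of how a pair (S, θ) represents a joint distribution, the sketch below multiplies one conditional probability per node. The three-node structure and all numbers in `theta` are made up for the example and do not come from the slides.

```python
from itertools import product

# Hypothetical structure S: X -> Z <- Y, all variables binary.
parents = {"X": [], "Y": [], "Z": ["X", "Y"]}

# Hypothetical parameters: theta[node][parent values] = P(node = 1 | parents).
theta = {
    "X": {(): 0.3},
    "Y": {(): 0.6},
    "Z": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
}

def joint(assignment):
    """p(x1, ..., xn) as the product over nodes of p(xi | parents(xi))."""
    prob = 1.0
    for node, pars in parents.items():
        p_one = theta[node][tuple(assignment[p] for p in pars)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Summing over all assignments should give 1 (up to rounding).
total = sum(
    joint(dict(zip(parents, values)))
    for values in product([0, 1], repeat=len(parents))
)
```

Calling `joint({"X": 1, "Y": 0, "Z": 1})` multiplies P(X=1), P(Y=0) and P(Z=1 | X=1, Y=0).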
4
Markov Conditions
From factorization: I(X, ND | Par(X))
[Figure: node X with its parents (Par), descendants (Desc) and non-descendants (ND)]
Markov conditions + graphoid axioms characterize all independencies
5
Structure/Distribution Inclusion
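[Figure: the set of all distributions, containing the region of distributions included in structure S (over X, Y, Z) and a point p inside it]
  • p is included in S if there exists θ s.t. B(S, θ) defines p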

6
Structure/Structure Inclusion: T ≤ S
[Figure: within the set of all distributions, the region for structure T (over X, Y, Z) lies inside the region for structure S]
  • T is included in S if every p included in T is included in S

(S is an I-map of T)
7
Structure/Structure Equivalence: T ≈ S
[Figure: within the set of all distributions, structures S and T (over X, Y, Z) cover the same region]
Reflexive, Symmetric, Transitive
8
Equivalence
[Figure: two DAGs over A, B, C, D, annotated with their skeleton and v-structure]
Theorem (Verma and Pearl, 1990): S ≈ T ⇔ same v-structures and skeletons
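The Verma and Pearl characterization translates directly into a check on two edge lists. The sketch below assumes both DAGs are given as lists of directed (parent, child) pairs over the same node set; the function names are illustrative.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: the set of adjacent pairs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def equivalent(edges_s, edges_t):
    """S and T are equivalent iff they share skeleton and v-structures."""
    return (skeleton(edges_s) == skeleton(edges_t)
            and v_structures(edges_s) == v_structures(edges_t))
```

For example, the chains A → B → C and A ← B ← C compare as equivalent, while A → B ← C does not match either, because it has a v-structure at B.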
9
Learning Bayesian Networks
[Figure: generative distribution p over X, Y, Z → iid samples → observed data → learned model over X, Y, Z]
Observed Data (iid samples):
X Y Z
0 1 1
1 0 1
0 1 0
. . .
1 0 1
  1. Learn the structure
  2. Estimate the conditional distributions

10
Learning Structure
  • Scoring criterion: F(D, S)
  • Search procedure: identify one or more structures with high values
    for the scoring function

11
Properties of Scoring Criteria
  • Consistent
  • Locally Consistent
  • Score Equivalent

12
Consistent Criterion
Criterion favors (in the limit) the simplest model
that includes the generative distribution p
  • S includes p, T does not include p ⇒ F(S,D) > F(T,D)
  • Both include p, S has fewer parameters ⇒ F(S,D) > F(T,D)

13
Locally Consistent Criterion
S and T differ by one edge
[Figure: structures S and T over X and Y, differing in a single edge between X and Y]
If I(X, Y | Par(X)) in p then F(S,D) > F(T,D)
Otherwise F(S,D) < F(T,D)
14
Score-Equivalent Criterion
[Figure: two equivalent structures S and T over X and Y]
S ≈ T ⇒ F(S,D) = F(T,D)
15
Bayesian Criterion (consistent, locally consistent and score equivalent)
  • S^h: generative distribution p has same independence constraints as S.
  • F_Bayes(S,D) = log p(S^h | D) = k + log p(D | S^h) + log p(S^h)
      • log p(D | S^h): marginal likelihood (closed form w/ assumptions)
      • log p(S^h): structure prior (e.g. prefer simple)
16
Search Procedure
  • Set of states
  • Representation for the states
  • Operators to move between states
  • Systematic Search Algorithm

17
Greedy Equivalence Search
  • Set of states: equivalence classes of DAGs
  • Representation for the states: essential graphs
  • Operators to move between states: forward and backward operators
  • Systematic Search Algorithm: two-phase greedy

18
Representation: Essential Graphs
[Figure: a DAG over A, B, C, D, E, F and its essential graph, distinguishing compelled edges from reversible edges]
19
GES Operators
Forward Direction: single edge additions
Backward Direction: single edge deletions
20
Two-Phase Greedy Algorithm
  • Phase 1: Forward Equivalence Search (FES)
      • Start with all-independence model
      • Run Greedy using forward operators
  • Phase 2: Backward Equivalence Search (BES)
      • Start with local max from FES
      • Run Greedy using backward operators
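The control flow of the two phases can be sketched independently of how states are represented. In the sketch below, `greedy`, the neighbor functions and the score are all hypothetical stand-ins: the toy states are sets of letters with a hand-written score table, not equivalence classes scored against data.

```python
def greedy(state, neighbors, score):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    while True:
        candidates = list(neighbors(state))
        if not candidates:
            return state
        best = max(candidates, key=score)
        if score(best) <= score(state):
            return state
        state = best

def two_phase_greedy(start, forward, backward, score):
    local_max = greedy(start, forward, score)   # Phase 1 (FES): additions only
    return greedy(local_max, backward, score)   # Phase 2 (BES): deletions only

# Toy stand-in for a search space (assumption, for illustration only).
UNIVERSE = frozenset("abc")
TOY_SCORE = {"": 0, "a": 1, "b": 1, "c": 3, "ab": 6, "ac": 4, "bc": 4, "abc": 5}

def toy_score(state):
    return TOY_SCORE["".join(sorted(state))]

def toy_forward(state):
    return [state | {x} for x in sorted(UNIVERSE - state)]

def toy_backward(state):
    return [state - {x} for x in sorted(state)]

result = two_phase_greedy(frozenset(), toy_forward, toy_backward, toy_score)
```

In this toy run the forward phase overshoots to the full set and the backward phase prunes it again, which mirrors the roles of FES and BES.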

21
Forward Operators
  • Consider all DAGs in the current state
  • For each DAG, consider all single-edge additions
    (acyclic)
  • Take the union of the resulting equivalence
    classes

22
Forward-Operators Example
[Figure: current state → all DAGs in the state → all DAGs resulting from single-edge addition → union of corresponding essential graphs]
23
Forward-Operators Example
24
Backward Operators
  • Consider all DAGs in the current state
  • For each DAG, consider all single-edge deletions
  • Take the union of the resulting equivalence
    classes
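For very small node sets, the forward and backward operator definitions can be followed literally by brute force. The sketch below is only meant to make the definitions concrete: it enumerates every DAG, identifies an equivalence class by its skeleton and v-structures (the Verma and Pearl characterization), and is far too slow for practical use. All function names are illustrative.

```python
from itertools import combinations, product

def is_acyclic(nodes, edges):
    """Repeatedly strip nodes without incoming edges; a cycle leaves some behind."""
    remaining = set(nodes)
    while remaining:
        roots = {n for n in remaining
                 if not any(b == n and a in remaining for a, b in edges)}
        if not roots:
            return False
        remaining -= roots
    return True

def class_key(edges):
    """Identify an equivalence class by (skeleton, v-structures)."""
    skel = frozenset(frozenset(e) for e in edges)
    vs = frozenset(
        (frozenset((a, c)), b)
        for a, b in edges for c, d in edges
        if b == d and a != c and frozenset((a, c)) not in skel
    )
    return skel, vs

def all_dags(nodes):
    pairs = list(combinations(nodes, 2))
    for choice in product((None, 0, 1), repeat=len(pairs)):
        edges = frozenset(
            (a, b) if c == 0 else (b, a)
            for (a, b), c in zip(pairs, choice) if c is not None
        )
        if is_acyclic(nodes, edges):
            yield edges

def neighbors(nodes, state_key, mode):
    """Classes reachable from the state by one edge addition or deletion."""
    result = set()
    for dag in all_dags(nodes):
        if class_key(dag) != state_key:
            continue  # only DAGs in the current state
        if mode == "forward":
            candidates = [dag | {(a, b)} for a in nodes for b in nodes
                          if a != b and (a, b) not in dag and (b, a) not in dag]
        else:
            candidates = [dag - {e} for e in dag]
        for cand in candidates:
            if is_acyclic(nodes, cand):
                result.add(class_key(cand))
    return result
```

With three nodes A, B, C, `neighbors(nodes, class_key(frozenset()), "forward")` returns the three single-edge classes, and calling it with `"backward"` on the class of a complete DAG returns the six two-edge classes.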

25
Backward-Operators Example
[Figure: current state → all DAGs in the state (over A, B, C) → all DAGs resulting from single-edge deletion → union of corresponding essential graphs]
26
Backward-Operators Example
27
DAG Perfect
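  • DAG-perfect distribution p
      • Exists DAG G such that I(X, Y | Z) in p ⇔ I(X, Y | Z) in G

Non-DAG-perfect distribution q: I(A, D | B, C) and I(B, C | A, D)
[Figure: three graphs over A, B, C, D: q with both independencies, and two DAGs that each encode only one of them, I(B, C | A, D) or I(A, D | B, C)]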
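Independence statements of the form I(X, Y | Z) in a DAG G can be checked by d-separation. The sketch below uses the standard ancestral moral graph test and assumes the DAG is given as a mapping from each node to its set of parents. The two four-node DAGs in the usage note are illustrative choices and are not necessarily the ones drawn on the slide.

```python
def d_separated(parents, x, y, z):
    """True if x and y are d-separated given the set z in the DAG `parents`."""
    # 1. Keep only x, y, z and their ancestors.
    keep, stack = set(), [x, y, *z]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])
    # 2. Moralize: connect each node to its parents, and co-parents to each other.
    adj = {n: set() for n in keep}
    for n in keep:
        pars = list(parents[n])
        for p in pars:
            adj[n].add(p)
            adj[p].add(n)
            for q in pars:
                if p != q:
                    adj[p].add(q)
    # 3. Delete z and test whether y is still reachable from x.
    seen, stack = set(z), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n] - seen)
    return True

# Two hypothetical DAGs over A, B, C, D (node -> set of parents).
dag1 = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
dag2 = {"B": set(), "A": {"B"}, "D": {"B"}, "C": {"A", "D"}}
```

Here `dag1` encodes I(A, D | B, C) but not I(B, C | A, D), and `dag2` does the reverse, which illustrates why a distribution with both constraints cannot be perfect with respect to either of them.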
28
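DAG-Perfect Consequence: Composition Axiom Holds in p
If ¬I(X, Y | Z) for a set Y, then ¬I(X, Y' | Z) for some
singleton Y' ∈ Y
[Figure: X dependent on the set {A, B, C, D} implies X dependent on a single member of the set, here C]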
29
Optimality of GES
If p is DAG-perfect wrt some G*
[Figure: G* (over X, Y, Z) defines p; n iid samples from p form the observed data table; GES applied to the data returns the state S]
For large n, S = S*
30
Optimality of GES
[Figure: all-independence state → FES → state includes S* → BES → state equals S*]
  • Proof Outline
      • After first phase (FES), current state includes S*
      • After second phase (BES), the current state = S*

31
FES Maximum Includes S*
Assume: Local Max S does NOT include S*
Any DAG G from S
Markov conditions characterize independencies: in p, there exists X
that is not independent of its non-descendants given its parents
[Figure: X with parent E and non-descendants A, B, C, D]
⇒ ¬I(X, {A,B,C,D} | E) in p
p is DAG-perfect ⇒ composition axiom holds
⇒ ¬I(X, C | E) in p
Locally consistent: adding the C→X edge improves the score, and the resulting EQ class is a neighbor
32
BES Identifies S*
  • Current state always includes S*
      • Local consistency of the criterion
  • Local Minimum is S*
      • Meek's conjecture

33
Meek's Conjecture
  • Any pair of DAGs G, H such that H includes G (G ≤ H)
  • There exists a sequence of
      • (1) covered edge reversals in G
      • (2) single-edge additions to G
  • after each change G ≤ H
  • after all changes G = H
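The slides use the term covered edge without defining it. Under the standard definition, an edge X → Y is covered when Par(Y) = Par(X) ∪ {X}, and reversing such an edge keeps the DAG in the same equivalence class. The sketch below assumes a DAG is given as a mapping from each node to its set of parents; the helper names are illustrative.

```python
def is_covered(parents, x, y):
    """Edge x -> y is covered when Par(y) equals Par(x) plus x itself."""
    return x in parents[y] and parents[y] == parents[x] | {x}

def reverse_edge(parents, x, y):
    """Return a copy of the DAG with x -> y replaced by y -> x."""
    new = {n: set(p) for n, p in parents.items()}
    new[y].discard(x)
    new[x].add(y)
    return new

def covered_edges(parents):
    """All covered edges as (parent, child) pairs."""
    return sorted((x, y) for y, pars in parents.items()
                  for x in pars if is_covered(parents, x, y))

# Hypothetical example: the complete DAG A -> B, A -> C, B -> C.
g = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
g_reversed = reverse_edge(g, "B", "C")
```

In `g`, the edges A → B and B → C are covered while A → C is not, since C has the extra parent B that A lacks.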

34
Meek's Conjecture
[Figure: DAG H over A, B, C, D and a sequence of DAGs G that is transformed step by step into H; independencies shown: I(A, B) and I(C, B | A, D)]
35
Meek's Conjecture and BES
Assume: Local Max S is not S*
Any DAG H from S
Any DAG G from S*
[Figure: G is transformed into H by a sequence of edge additions (Add) and covered edge reversals (Rev)]
36
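Meek's Conjecture and BES
Assume: Local Max S is not S*
Any DAG H from S
Any DAG G from S*
[Figure: G is transformed into H by edge additions (Add) and covered edge reversals (Rev); read backwards, H is transformed into G by edge deletions (Del) and covered edge reversals (Rev)]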
37
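Meek's Conjecture and BES
Assume: Local Max S is not S*
Any DAG H from S
Any DAG G from S*
[Figure: G is transformed into H by edge additions (Add) and covered edge reversals (Rev); read backwards, H is transformed into G by edge deletions (Del) and covered edge reversals (Rev); a deletion step leads from S to a neighbor of S in BES]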
38
Discussion Points
  • In practice, GES is as fast as DAG-based search
  • Neighborhood of essential graphs can be
    generated and scored very efficiently
  • When DAG-perfect assumption fails, we still get
    optimality guarantees
  • As long as composition holds in generative
    distribution, local maximum is inclusion-minimal

39
Thanks!
  • My Home Page
      • http://research.microsoft.com/dmax
  • Relevant Papers
      • Optimal Structure Identification with Greedy Search
          • JMLR Submission
          • Contains detailed proofs of Meek's conjecture and
            optimality of GES
      • Finding Optimal Bayesian Networks
          • UAI02 Paper with Chris Meek
          • Contains extension of optimality results of GES
            when not DAG perfect

40
(No Transcript)
41
Bayesian Criterion is Locally Consistent
  • Bayesian score approaches BIC + constant
  • BIC is decomposable
  • Difference in score is the same for any DAGs that differ
    by the Y→X edge if X has the same parents

[Figure: two DAGs that differ only by the edge Y→X]
Complete network (always includes p)
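Decomposability means the score is a sum of per-node terms that depend only on a node and its parents, so adding Y→X changes a single term. The sketch below uses one common form of BIC for discrete data (maximized log-likelihood minus (log N / 2) times the number of free parameters). The data rows are made up, and the number of states per variable is read off the data, which is an assumption of the example.

```python
import math
from collections import Counter

def arity(data, var):
    return len({row[var] for row in data})

def local_bic(data, node, parents):
    """BIC term for one node given its parents."""
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    n_params = (arity(data, node) - 1) * math.prod(arity(data, p) for p in parents)
    return loglik - 0.5 * math.log(len(data)) * n_params

def bic(data, structure):
    """Decomposable score: the sum of local terms, one per node."""
    return sum(local_bic(data, node, sorted(pars)) for node, pars in structure.items())

# Made-up binary data over X, Y, Z.
rows = [(0, 1, 1), (1, 0, 1), (0, 1, 0), (1, 0, 1),
        (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
data = [dict(zip("XYZ", r)) for r in rows]

# Two pairs of DAGs that differ only by the edge Y -> X; X has no other parents.
s1 = {"X": set(), "Y": set(), "Z": set()}
s1_plus = {"X": {"Y"}, "Y": set(), "Z": set()}
s2 = {"X": set(), "Y": {"Z"}, "Z": set()}
s2_plus = {"X": {"Y"}, "Y": {"Z"}, "Z": set()}

diff1 = bic(data, s1_plus) - bic(data, s1)
diff2 = bic(data, s2_plus) - bic(data, s2)
```

Both differences reduce to `local_bic(data, "X", ["Y"]) - local_bic(data, "X", [])`, so they agree up to floating-point rounding regardless of the rest of the structure.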
42
Bayesian Criterion is Consistent
  • Assume conditionals are
      • unconstrained multinomials
      • linear regressions

Geiger, Heckerman, King and Meek (2001): network structures are curved exponential models
Haughton (1988): Bayesian criterion is consistent
43
Bayesian Criterion is Score Equivalent
S ≈ T ⇒ F(S,D) = F(T,D)
[Figure: two equivalent structures S and T over X and Y]
S^h: no independence constraints
T^h: no independence constraints
S^h = T^h
44
Active Paths
  • Z-active path between X and Y (non-standard)
      • Neither X nor Y is in Z
      • Every pair of colliding edges meets at a member of Z
      • No other pair of edges meets at a member of Z

[Figure: path between X and Y passing through Z]
G ≤ H ⇒ If there is a Z-active path between X and Y in
G, then there is a Z-active path between X and Y in H
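The three conditions of this (non-standard) definition can be checked mechanically for a given path. The sketch below assumes the DAG is a set of directed (parent, child) pairs and the path is a list of nodes; the function name is illustrative.

```python
def is_z_active(edges, path, z):
    """Check the Z-active conditions for the node sequence `path`."""
    # The path must follow edges of the graph, in either direction.
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges and (b, a) not in edges:
            raise ValueError(f"{a} and {b} are not adjacent")
    # Neither endpoint may be in Z.
    if path[0] in z or path[-1] in z:
        return False
    # Interior nodes: colliders must be in Z, all other nodes must not be.
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider != (node in z):
            return False
    return True
```

For the collider X → W ← Y, the path X, W, Y is active only when W is in Z; for the chain X → W → Y it is active only when W is not in Z.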
45
Active Paths
[Figure: an active path through the nodes X, A, Z, W, B, Y]
  • X-Y: Out-of X and In-to Y
  • X-W: Out-of both X and W
  • Any sub-path between A, B ∉ Z is also active
  • A–B, B–C, at least one is out-of B
      • ⇒ Active path between A and C

46
Simple Active Paths
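If an active path between A and B contains Y→X,
then ∃ an active path in which
(1) the edge appears exactly once
OR
(2) the edge appears exactly twice
[Figure: active paths between A and B that traverse the edge Y→X once and twice]
To simplify discussion: assume (1) only; proofs
for (2) are almost identical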
47
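Typical Argument: Combining Active Paths
Z: sink node adjacent to X and Y
G ≤ H
[Figure: graphs G and H over X, Y, Z with active paths from A and B]
In G: suppose there is an active path (AP) in G (X not in the conditioning set, CS) with no
corresponding AP in H. Then Z is not in CS.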
48
Proof Sketch
  • Two DAGs G, H with G < H
  • Identify either
      • a covered edge X→Y in G that has opposite
        orientation in H
      • a new edge X→Y to be added to G such that it
        remains included in H

49
The Transformation
Choose any node Y that is a sink in H
Case 1a: Y is a sink in G; X ∈ Par_H(Y), X ∉ Par_G(Y)
Case 1b: Y is a sink in G; same parents
Case 2a: ∃X s.t. Y→X covered
Case 2b: ∃X s.t. Y→X and W is a parent of Y but not of X
Case 2c: for every Y→X, Par(Y) ⊂ Par(X)
[Figure: one small diagram over X, Y (and W) per case]
50
Preliminaries
(G ≤ H)
  • The adjacencies in G are a subset of the
    adjacencies in H
  • If X→Y←Z is a v-structure in G but not H, then X
    and Z are adjacent in H
  • Any new active path that results from adding X→Y
    to G includes X→Y

51
Proof Sketch: Case 1
Y is a sink in G
Case 1a: X ∈ Par_H(Y), X ∉ Par_G(Y)
[Figure: X→Y in H; X and Y in G before and after adding X→Y]
Suppose there's some new active path between A
and B not in H
[Figure: new active path from A to B through Z, Y and X]
  1. Y is a sink in G, so it must be in CS
  2. Neither X nor next node Z is in CS
  3. In H: AP(A,Z), AP(X,B), Z→Y←X

Case 1b: Parents identical. Remove Y from both
graphs; proof similar
52
Proof Sketch: Case 2
Y is not a sink in G
Case 2a: There is a covered edge Y→X. Reverse
the edge
Case 2b: There is a non-covered edge Y→X such
that W is a parent of Y but not a parent of X
[Figure: W, X, Y in G before and after adding W→X, and in H]
Suppose there's some new active path between A
and B not in H
Y must be in CS, else replace W→X by W → Y → X
(not new). If X not in CS, then in H active: A-W,
X-B, W→Y←X
[Figure: the paths between A and B through W, X, Y and Z in G and in H]
53
Case 2c: The Difficult Case
  • All non-covered edges Y→Z have Par(Y) ⊂ Par(Z)

[Figure: G and H over W1, W2, Y, Z1, Z2]
W1→Y: G no longer < H (Z2-active path between W1 and W2)
W2→Y: G < H
54
Choosing Z
[Figure: G and H, showing Y, Z, D and the descendants of Y in G]
D is the maximal G-descendant in H
Z is any maximal child of Y such that D is a descendant of Z in G
55
Choosing Z
[Figure: G and H from the difficult-case example]
Descendants of Y in G: Y, Z1, Z2
Maximal descendant in H: D = Z2
Maximal child of Y in G that has D = Z2 as descendant: Z2
Add W2→Y
56
Difficult Case Proof Intuition
[Figure: G and H over A, W, Y, Z, D, B, with paths ending at "B or CS"]
  1. W not in CS
  2. Y not in CS, else active in H
  3. In G, next edges must be away from Y until B or CS reached
  4. In G, neither Z nor its descendants are in CS, else active before addition
  5. From (1, 2, 4), AP(A,D) and AP(B,D) in H
  6. Choice of D: directed path from D to B or CS in H
57
(No Transcript)
58
Optimality of GES
  • Definition
      • p is DAG-perfect wrt G:
        independence constraints in p are precisely those in G
  • Assumption
      • Generative distribution p is perfect wrt some G* defined
        over the observable variables
  • S*: equivalence class containing G*
  • Under DAG-perfect assumption, GES results in S*

59
Important Definitions
  • Bayesian Networks
  • Markov Conditions
  • Distribution/Structure Inclusion
  • Structure/Structure Inclusion