Provided by: dmax
Title: Finding Optimal Bayesian Networks with Greedy Search


1
Finding Optimal Bayesian Networks with Greedy Search
  • Max Chickering

2
Reasoning Under Uncertainty
Print Troubleshooter (Win95, Win2k, WinXP)
[Figure: network with nodes Net/Local Printing, Network Up, Correct Local Port, Correct Printer Path, PC to Printer Transport OK, Net Path OK, Local Path OK, Local Cable Connected, Net Cable Connected, Paper Loaded, Printer On and Online, Printer Data OK, Printer Memory Adequate, Print Output OK]
3
Troubleshooters (Win95 on)
4
Answer Wizard (Office 95 on)
5
Machine Learning and Applied Statistics Group
Data
  • Applications
  • Commerce Server
  • SQL Server
  • Spam Detection
  • Machine Translation
  • Analysis of Web Data

6
Outline
  • Bayesian Networks
  • Learning
  • Greedy Equivalence Search (GES)
  • Optimality of GES
  • (Details of Meek's Conjecture)

7
Bayesian Networks
  • Use B = (S, θ) to represent p(X1, ..., Xn)
  • Structural component S is a DAG
  • Parameters θ specify local probability distributions
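
A minimal sketch of how a structure plus local distributions defines a joint distribution, assuming binary variables. The three-node network, the probability values, and the names `parents`, `cpt`, and `joint` are hypothetical and not taken from the slides.

```python
from itertools import product

# Hypothetical structure S: X -> Y -> Z, all variables binary.
parents = {"X": (), "Y": ("X",), "Z": ("Y",)}

# Hypothetical parameters: cpt[node][parent values] = p(node = 1 | parents).
cpt = {
    "X": {(): 0.3},
    "Y": {(0,): 0.2, (1,): 0.9},
    "Z": {(0,): 0.5, (1,): 0.1},
}

def joint(assignment):
    """p(x1, ..., xn) as the product of the local conditional probabilities."""
    prob = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Summing over all assignments should give 1 for any valid parameters.
total = sum(joint(dict(zip(parents, values)))
            for values in product((0, 1), repeat=len(parents)))
```

Each node contributes one factor that depends only on its own value and the values of its parents in the DAG.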

8
Markov Conditions
From factorization: I(X, ND | Par(X))
[Figure: node X with its parents (Par), descendants (Desc), and non-descendants (ND)]
Markov Conditions + Graphoid Axioms characterize all independencies
9
Structure/Distribution Inclusion
[Figure: the space of all distributions, with a point p and a structure S over X, Y, Z]
  • p is included in S if there exist parameters θ such that B(S, θ) defines p

10
Structure/Structure Inclusion: T ≤ S
[Figure: the space of all distributions, with structures S and T, each over X, Y, Z]
  • T is included in S if every p included in T is also included in S

(S is an I-map of T)
11
Structure/Structure Equivalence: T ≈ S
[Figure: the space of all distributions, with structures S and T, each over X, Y, Z]
Reflexive, Symmetric, Transitive
12
Equivalence
[Figure: two DAGs over A, B, C, D, illustrating the skeleton and a v-structure]
Theorem (Verma and Pearl, 1990): S ≈ T ⇔ same v-structures and skeletons
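
The theorem gives a direct test for equivalence. The following is a sketch, assuming a DAG is represented as a set of (parent, child) pairs over a shared node set; the function names are illustrative only.

```python
from itertools import combinations

def skeleton(dag):
    """Undirected adjacencies of a DAG given as a set of (parent, child) edges."""
    return {frozenset(edge) for edge in dag}

def v_structures(dag):
    """Triples a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(dag)
    parents = {}
    for a, c in dag:
        parents.setdefault(c, set()).add(a)
    found = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def equivalent(s, t):
    """Same skeleton and same v-structures."""
    return skeleton(s) == skeleton(t) and v_structures(s) == v_structures(t)

chain = {("X", "Y"), ("Y", "Z")}       # X -> Y -> Z
reversed_chain = {("Y", "X"), ("Z", "Y")}  # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}    # X -> Y <- Z
```

Here `chain` and `reversed_chain` share a skeleton and have no v-structures, while `collider` has the same skeleton but adds the v-structure at Y.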
13
Learning Bayesian Networks
[Figure: generative distribution p over X, Y, Z → iid samples → observed data → learned model]
Observed data:
X Y Z
0 1 1
1 0 1
0 1 0
. . .
1 0 1
  1. Learn the structure
  2. Estimate the conditional distributions
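
As an illustration of step 2, the sketch below estimates conditional distributions by relative frequencies (maximum likelihood) once a structure is fixed. The slides do not say which estimator is used, so this is an assumption; the structure `parents` is hypothetical, and only the rows visible on the slide are used (the elided rows are skipped).

```python
from collections import Counter

def estimate_cpts(data, parents):
    """Relative-frequency estimates of p(node = value | parent values)."""
    family_counts = Counter()
    parent_counts = Counter()
    for row in data:
        for node, pa in parents.items():
            pa_values = tuple(row[q] for q in pa)
            family_counts[(node, pa_values, row[node])] += 1
            parent_counts[(node, pa_values)] += 1
    # key[:2] is (node, parent values), the matching denominator.
    return {key: n / parent_counts[key[:2]] for key, n in family_counts.items()}

# Rows visible on the slide, as dictionaries keyed by variable name.
data = [dict(zip("XYZ", r)) for r in [(0, 1, 1), (1, 0, 1), (0, 1, 0), (1, 0, 1)]]
parents = {"X": (), "Y": ("X",), "Z": ("Y",)}  # hypothetical structure
cpts = estimate_cpts(data, parents)
```

Keys of `cpts` have the form (node, parent values, value); configurations that never occur in the data are simply absent.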

14
Learning Structure
  • Scoring criterion
    • F(S, D)
  • Search procedure
    • Identify one or more structures with high values for the scoring function

15
Bayesian Criterion
  • S^h: generative distribution p has same independence constraints as S
  • F_Bayes(S, D) = log p(S^h | D) = k + log p(D | S^h) + log p(S^h)
    • log p(D | S^h): marginal likelihood (closed form w/ assumptions)
    • log p(S^h): structure prior (e.g. prefer simple)
16
Consistent Scoring Criterion
Criterion favors (in the limit) the simplest model that includes the generative distribution p
  • S includes p, T does not include p ⇒ F(S,D) > F(T,D)
  • Both include p, S has fewer parameters ⇒ F(S,D) > F(T,D)

[Figure: generative distribution p over X, Y, Z and candidate structures over the same variables]
17
Bayesian Criterion is Consistent
  • Assume conditionals are
    • unconstrained multinomials
    • linear regressions

Geiger, Heckerman, King and Meek (2001): network structures are curved exponential models
Haughton (1988): Bayesian criterion is consistent
18
Locally Consistent Criterion
S and T differ by one edge
[Figure: DAGs S and T over X and Y]
If I(X, Y | Par(X)) in p, then F(S,D) > F(T,D)
Otherwise F(S,D) < F(T,D)
19
Bayesian Criterion is Locally Consistent
  • Bayesian score approaches BIC + constant
  • BIC is decomposable
  • Difference in score is the same for any DAGs that differ by the Y→X edge if X has the same parents

[Figure: two DAGs over X and Y that differ by the Y→X edge]
Complete network (always includes p)
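
Decomposability can be checked numerically. The sketch below assumes binary variables and the usual BIC form (log-likelihood minus half the parameter count times log N); the data set and all names are hypothetical, not from the slides.

```python
import math
from collections import Counter

def local_bic(data, node, pa):
    """BIC contribution of one node given its parents (binary variables assumed)."""
    family = Counter((tuple(r[q] for q in pa), r[node]) for r in data)
    margin = Counter(tuple(r[q] for q in pa) for r in data)
    loglik = sum(n * math.log(n / margin[pv]) for (pv, _), n in family.items())
    n_params = 2 ** len(pa)  # one free parameter per parent configuration
    return loglik - 0.5 * n_params * math.log(len(data))

def bic(data, parents):
    """Decomposable score: a sum of per-node local scores."""
    return sum(local_bic(data, node, pa) for node, pa in parents.items())

# Hypothetical binary data over X, Y, Z.
data = [dict(zip("XYZ", r)) for r in
        [(0, 0, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0),
         (0, 0, 1), (1, 0, 0), (1, 1, 1), (0, 0, 0)]]

def gain(parents):
    """Score change from adding Y as a parent of X."""
    with_edge = dict(parents, X=parents["X"] + ("Y",))
    return bic(data, with_edge) - bic(data, parents)

# Two DAGs that differ elsewhere (Z's parents) but give X the same parents.
gain_a = gain({"X": (), "Y": (), "Z": ()})
gain_b = gain({"X": (), "Y": (), "Z": ("Y",)})
```

The two gains agree up to floating-point rounding because only the local score of X changes when the Y→X edge is added.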
20
Bayesian Criterion is Score Equivalent
S ≈ T ⇒ F(S,D) = F(T,D)
[Figure: DAGs S and T over X and Y]
S^h: no independence constraints
T^h: no independence constraints
S^h = T^h
21
Search Procedure
  • Set of states
  • Representation for the states
  • Operators to move between states
  • Systematic Search Algorithm

22
Greedy Equivalence Search
  • Set of states
    • Equivalence classes of DAGs
  • Representation for the states
    • Essential graphs
  • Operators to move between states
    • Forward and Backward Operators
  • Systematic Search Algorithm
    • Two-phase Greedy

23
Representation: Essential Graphs
[Figure: a DAG over A, B, C, D, E, F and its essential graph, distinguishing compelled edges from reversible edges]
24
GES Operators
Forward Direction: single edge additions
Backward Direction: single edge deletions
25
Two-Phase Greedy Algorithm
  • Phase 1: Forward Equivalence Search (FES)
    • Start with all-independence model
    • Run Greedy using forward operators
  • Phase 2: Backward Equivalence Search (BES)
    • Start with local max from FES
    • Run Greedy using backward operators
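
The control flow of the two phases can be sketched independently of Bayesian networks. In the toy below, states are sets of items rather than equivalence classes, and `ITEMS`, `forward`, `backward`, and `score` are hypothetical stand-ins chosen so that the forward phase overshoots and the backward phase repairs it.

```python
def greedy(state, neighbors, score):
    """Move to the best-scoring neighbor until no neighbor improves the score."""
    current, current_score = state, score(state)
    while True:
        candidates = [(score(n), n) for n in neighbors(current)]
        if not candidates:
            return current
        best_score, best = max(candidates, key=lambda c: c[0])
        if best_score <= current_score:
            return current
        current, current_score = best, best_score

def two_phase_greedy(start, forward, backward, score):
    """Phase 1 climbs with forward operators; phase 2 starts from that local max."""
    return greedy(greedy(start, forward, score), backward, score)

# Toy problem (hypothetical): item "c" helps early but is redundant once
# both "a" and "b" are present.
ITEMS = frozenset("abc")

def forward(s):
    return [s | {x} for x in sorted(ITEMS - s)]

def backward(s):
    return [s - {x} for x in sorted(s)]

def score(s):
    both = {"a", "b"} <= s
    ab = 6 if both else len(s & {"a", "b"})
    c = 0 if "c" not in s else (-1 if both else 3)
    return ab + c

forward_max = greedy(frozenset(), forward, score)
result = two_phase_greedy(frozenset(), forward, backward, score)
```

In this toy run the forward phase ends at {a, b, c}, and the backward phase then removes "c". GES follows the same pattern with equivalence classes as states and edge additions and deletions as operators.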

26
Forward Operators
  • Consider all DAGs in the current state
  • For each DAG, consider all single-edge additions
    (acyclic)
  • Take the union of the resulting equivalence
    classes
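
The three steps above can be sketched by brute force for very small node sets. This is an illustration only: it enumerates every member DAG explicitly, which is not how an efficient implementation would generate neighbors. Classes are identified by skeleton plus v-structures (the Verma and Pearl characterization); all function names are hypothetical.

```python
from itertools import combinations, product

def is_acyclic(dag, nodes):
    """Repeatedly strip nodes with no incoming edge; fail if none exists."""
    edges, remaining = set(dag), set(nodes)
    while remaining:
        sources = {n for n in remaining if not any(c == n for _, c in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(p, c) for p, c in edges if p not in sources}
    return True

def class_key(dag):
    """Skeleton plus v-structures identify the equivalence class."""
    skel = frozenset(frozenset(e) for e in dag)
    vs = frozenset(
        (a, c, b)
        for (a, c), (b, c2) in product(dag, repeat=2)
        if c == c2 and a < b and frozenset((a, b)) not in skel
    )
    return skel, vs

def members(dag, nodes):
    """All acyclic orientations of the skeleton that fall in the same class."""
    pairs = [tuple(sorted(e)) for e in class_key(dag)[0]]
    out = []
    for flips in product((False, True), repeat=len(pairs)):
        cand = frozenset((b, a) if f else (a, b) for (a, b), f in zip(pairs, flips))
        if is_acyclic(cand, nodes) and class_key(cand) == class_key(dag):
            out.append(cand)
    return out

def forward_neighbors(dag, nodes):
    """Union of the classes reached by one acyclic edge addition to any member."""
    result = set()
    for g in members(dag, nodes):
        adjacent = {frozenset(e) for e in g}
        for a, b in combinations(sorted(nodes), 2):
            if frozenset((a, b)) in adjacent:
                continue
            for edge in ((a, b), (b, a)):
                h = g | {edge}
                if is_acyclic(h, nodes):
                    result.add(class_key(h))
    return result

nodes = {"X", "Y", "Z"}
from_empty = forward_neighbors(frozenset(), nodes)
from_one_edge = forward_neighbors(frozenset({("X", "Y")}), nodes)
```

Replacing the edge addition with an edge deletion in `forward_neighbors` gives the corresponding backward neighborhood.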

27
Forward-Operators Example
[Figure: current state → all DAGs in the state → all DAGs resulting from single-edge addition → union of corresponding essential graphs]
28
Forward-Operators Example
29
Backward Operators
  • Consider all DAGs in the current state
  • For each DAG, consider all single-edge deletions
  • Take the union of the resulting equivalence
    classes

30
Backward-Operators Example
[Figure: current state → all DAGs in the state (over A, B, C) → all DAGs resulting from single-edge deletion → union of corresponding essential graphs]
31
Backward-Operators Example
32
DAG Perfect
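  • DAG-perfect distribution p
    • Exists DAG G
    • I(X, Y | Z) in p ⇔ I(X, Y | Z) in G

Non-DAG-perfect distribution q
[Figure: three graphs over A, B, C, D; the first is labeled with both I(A, D | B, C) and I(B, C | A, D), the other two with I(B, C | A, D) and I(A, D | B, C) respectively]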
33
Optimality of GES
If p is DAG-perfect wrt some G
[Figure: structure G over X, Y, Z with distribution p → n iid samples → GES → learned state S; S* is the equivalence class containing G]
X Y Z
0 1 1
1 0 1
0 1 0
. . .
1 0 1
For large n, S = S*
34
Optimality of GES
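[Figure: search trajectory: all-independence → (FES) → state includes S* → (BES) → state equals S*, where S* is the equivalence class of the generative structure]
  • Proof Outline
    • After first phase (FES), current state includes S*
    • After second phase (BES), the current state = S*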

35
FES Maximum Includes S*
Assume local max S does NOT include S* (the equivalence class of the generative structure)
Any DAG G from S
Markov Conditions characterize independencies: in p, there exists X not independent of its non-descendants given its parents
[Figure: DAG G over A, B, C, D, E, X]
⇒ ¬I(X, {A, B, C, D} | E) in p
p is DAG-perfect ⇒ composition axiom holds
⇒ ¬I(X, C | E) in p
Locally consistent: adding the C→X edge improves the score, and the resulting equivalence class is a neighbor
36
BES Identifies S*
  • Current state always includes S*
    • Local consistency of the criterion
  • Local Minimum is S*
    • Meek's conjecture

37
Meek's Conjecture
  • Any pair of DAGs G, H such that H includes G (G ≤ H)
  • There exists a sequence of
    • (1) covered edge reversals in G
    • (2) single-edge additions to G
  • after each change G ≤ H
  • after all changes G = H
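
The slides do not define "covered edge". The sketch below assumes the usual definition, that X→Y is covered when Par(Y) = Par(X) ∪ {X}, and shows a reversal on a hypothetical three-node DAG; the function names are illustrative.

```python
def parents_of(dag, node):
    """Parents of a node in a DAG given as a set of (parent, child) edges."""
    return {p for p, c in dag if c == node}

def is_covered(dag, x, y):
    """Edge x -> y is covered when Par(y) = Par(x) ∪ {x} (assumed definition)."""
    return (x, y) in dag and parents_of(dag, y) == parents_of(dag, x) | {x}

def reverse_edge(dag, x, y):
    """Return a copy of the DAG with x -> y replaced by y -> x."""
    return (set(dag) - {(x, y)}) | {(y, x)}

# Hypothetical complete DAG over A, B, C with ordering A, B, C.
g = {("A", "B"), ("A", "C"), ("B", "C")}
g_reversed = reverse_edge(g, "B", "C")
```

In this example A→B and B→C are covered while A→C is not, since C has the extra parent B. Reversing a covered edge leaves the skeleton and the v-structures unchanged, so the result stays in the same equivalence class.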

38
Meek's Conjecture
[Figure: example over A, B, C, D with DAGs G and H; annotated independencies: I(A, B), I(C, B | A, D)]
39
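Meek's Conjecture and BES
Assume local max S is not S*
Any DAG H from S
Any DAG G from S*
[Figure: sequence of edge additions (Add) and covered edge reversals (Rev) transforming G into H, leading from S* to S and passing through a neighbor of S in BES]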
40
Discussion Points
  • In practice, GES is as fast as DAG-based search
    • Neighborhood of essential graphs can be generated and scored very efficiently
  • When DAG-perfect assumption fails, we still get optimality guarantees
    • As long as composition holds in generative distribution, local maximum is inclusion-minimal

41
Thanks!
  • My Home Page
    • http://research.microsoft.com/dmax
  • Relevant Papers
    • Optimal Structure Identification with Greedy Search
      • JMLR Submission
      • Contains detailed proofs of Meek's conjecture and optimality of GES
    • Finding Optimal Bayesian Networks
      • UAI02 Paper with Chris Meek
      • Contains extension of optimality results of GES when not DAG perfect

42
Active Paths
  • Z-active path between X and Y (non-standard)
    • Neither X nor Y is in Z
    • Every pair of colliding edges meets at a member of Z
    • No other pair of edges meets at a member of Z

[Figure: path between X and Y through Z]
G ≤ H ⇒ if there is a Z-active path between X and Y in G, then there is a Z-active path between X and Y in H
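
The three conditions of the (non-standard) definition can be checked mechanically for a given path. This sketch assumes a DAG is a set of (parent, child) pairs and a path is a list of nodes; the function name and the example graphs are hypothetical.

```python
def is_z_active(dag, path, z):
    """Check the Z-active conditions stated above for one path.

    dag:  set of (parent, child) edges
    path: list of nodes; consecutive nodes must be adjacent in dag
    z:    the set Z
    """
    for a, b in zip(path, path[1:]):
        if (a, b) not in dag and (b, a) not in dag:
            raise ValueError(f"{a} and {b} are not adjacent")
    # Neither endpoint may be in Z.
    if path[0] in z or path[-1] in z:
        return False
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in dag and (right, mid) in dag
        # Colliding edges must meet in Z; all other pairs must meet outside Z.
        if collider != (mid in z):
            return False
    return True

collider = {("X", "Z"), ("Y", "Z")}  # X -> Z <- Y
chain = {("X", "Z"), ("Z", "Y")}     # X -> Z -> Y
```

Unlike standard d-separation, this definition requires the collider itself to be in Z; a descendant of the collider in Z does not suffice.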
43
Active Paths
[Figure: path through nodes X, A, Z, W, B, Y]
  • X-Y: out-of X and in-to Y
  • X-W: out-of both X and W
  • Any sub-path between A, B ∉ Z is also active
  • Active paths A-B and B-C, at least one is out-of B
    ⇒ active path between A and C

44
Simple Active Paths
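If an active path between A and B contains Y→X, then ∃ an active path in which
(1) the edge appears exactly once
OR
(2) the edge appears exactly twice
[Figure: paths between A and B in which the edge between X and Y appears once and twice]
To simplify discussion: assume (1) only; proofs for (2) are almost identical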
45
Typical Argument: Combining Active Paths
[Figure: DAGs G and H (G ≤ H) over A, B, X, Y, Z; Z is a sink node adjacent to X and Y]
Suppose there is an active path (AP) in G (X not in the conditioning set, CS) with no corresponding AP in H. Then Z is not in CS.
46
Proof Sketch
  • Two DAGs G, H with G < H
  • Identify either
    • a covered edge X→Y in G that has opposite orientation in H
    • a new edge X→Y to be added to G such that it remains included in H

47
The Transformation
Choose any node Y that is a sink in H
Case 1a: Y is a sink in G, X ∈ ParH(Y), X ∉ ParG(Y)
Case 1b: Y is a sink in G, same parents
Case 2a: ∃X s.t. Y→X covered
Case 2b: ∃X s.t. Y→X, W parent of Y but not X
Case 2c: Every Y→X, Par(Y) ⊂ Par(X)
[Figure: one small graph over X, Y (and W) per case]
48
Preliminaries
(G ≤ H)
  • The adjacencies in G are a subset of the adjacencies in H
  • If X→Y←Z is a v-structure in G but not H, then X and Z are adjacent in H
  • Any new active path that results from adding X→Y to G includes X→Y

49
Proof Sketch: Case 1
Y is a sink in G
Case 1a: X ∈ ParH(Y), X ∉ ParG(Y)
[Figure: X and Y in H and in G]
Suppose there's some new active path between A and B not in H
[Figure: path between A and B through Z, Y, X]
  1. Y is a sink in G, so it must be in CS
  2. Neither X nor next node Z is in CS
  3. In H: AP(A,Z), AP(X,B), Z→Y←X

Case 1b: Parents identical. Remove Y from both graphs; proof similar
50
Proof Sketch: Case 2
Y is not a sink in G
Case 2a: There is a covered edge Y→X. Reverse the edge
Case 2b: There is a non-covered edge Y→X such that W is a parent of Y but not a parent of X
[Figure: W, X, Y in G and in H]
Suppose there's some new active path between A and B not in H
Y must be in CS, else replace W→X by W→Y→X (not new). If X not in CS, then in H: active A-W, X-B, W→Y←X
[Figure: the path between A and B through W, X, Y, Z in G and in H]
51
Case 2c: The Difficult Case
  • All non-covered edges Y→Z have Par(Y) ⊂ Par(Z)

[Figure: G and H over W1, W2, Y, Z1, Z2]
W1→Y: G no longer < H (Z2-active path between W1 and W2)
W2→Y: G < H
52
Choosing Z
[Figure: G and H, showing Y, D, Z and the descendants of Y in G]
D is the maximal G-descendant in H
Z is any maximal child of Y such that D is a descendant of Z in G
53
Choosing Z
[Figure: G and H]
Descendants of Y in G: Y, Z1, Z2
Maximal descendant in H: D = Z2
Maximal child of Y in G that has D = Z2 as descendant: Z2
Add W2→Y
54
Difficult Case Proof Intuition
[Figure: G and H over A, B, W, Y, Z, D; beyond Z the path continues to "B or CS"]
  1. W not in CS
  2. Y not in CS, else active in H
  3. In G, next edges must be away from Y until B or CS reached
  4. In G, neither Z nor desc in CS, else active before addition
  5. From (1,2,4), AP (A,D) and (B,D) in H
  6. Choice of D: directed path from D to B or CS in H
56
Optimality of GES
  • Definition
    • p is DAG-perfect wrt G
    • Independence constraints in p are precisely those in G
  • Assumption
    • Generative distribution p is perfect wrt some G defined over the observable variables
  • S*: equivalence class containing G
  • Under DAG-perfect assumption, GES results in S*

57
Important Definitions
  • Bayesian Networks
  • Markov Conditions
  • Distribution/Structure Inclusion
  • Structure/Structure Inclusion