Title: Bayesian Networks
1. Bayesian Networks
2. Contents
- Introduction
- Probability Theory (skipped)
- Inference
- Clique Tree Propagation
- Building the Clique Tree
- Inference by Propagation
3. Bayesian Networks
- Introduction
4. What is a Bayesian Network?
- Bayesian networks are directed acyclic graphs (DAGs) with an associated set of probability tables.
- The nodes are random variables.
- Certain independence relations are induced by the topology of the graph.
5. Why Use a Bayesian Network?
- Deal with uncertainty in inference via probability (Bayes).
- Handle incomplete data sets, e.g., classification, regression.
- Model domain knowledge, e.g., causal relationships.
6. Example
Use a DAG to model the causality (figure). Nodes: Train Strike, Norman Oversleep, Martin Oversleep, Martin Late, Norman Late, Project Delay, Office Dirty, Boss Failure-in-Love, Boss Angry.
7. Example
Attach prior probabilities to all root nodes.

  Train Strike:          P(T) = 0.1,  P(F) = 0.9
  Norman Oversleep:      P(T) = 0.2,  P(F) = 0.8
  Martin Oversleep:      P(T) = 0.01, P(F) = 0.99
  Boss Failure-in-Love:  P(T) = 0.01, P(F) = 0.99
8. Example
Attach conditional probabilities to non-root nodes. Each column sums to 1.

P(Martin Late | Train Strike, Martin Oversleep):
  Train Strike:       T     T     F     F
  Martin Oversleep:   T     F     T     F
  Martin Late = T     0.95  0.80  0.70  0.05
  Martin Late = F     0.05  0.20  0.30  0.95

P(Norman Untidy | Norman Oversleep):
  Norman Oversleep:   T    F
  Norman Untidy = T   0.6  0.2
  Norman Untidy = F   0.4  0.8
9. Example
Attach conditional probabilities to non-root nodes. Each column sums to 1.

P(Boss Angry | Boss Failure-in-Love, Project Delay, Office Dirty):
  Boss Failure-in-Love:  T     T     T     T     F     F     F     F
  Project Delay:         T     T     F     F     T     T     F     F
  Office Dirty:          T     F     T     F     T     F     T     F
  Boss Angry = very      0.98  0.85  0.60  0.50  0.30  0.20  0.00  0.01
  Boss Angry = mid       0.02  0.15  0.30  0.25  0.50  0.50  0.20  0.02
  Boss Angry = little    0.00  0.00  0.10  0.25  0.20  0.30  0.70  0.07
  Boss Angry = no        0.00  0.00  0.00  0.00  0.00  0.00  0.10  0.90

Question: what is the difference between probability and fuzzy measurements?
10. Example
Medical knowledge (figure).
11. Definition of Bayesian Networks
- A Bayesian network is a directed acyclic graph with the following properties:
  - Each node represents a random variable.
  - Each node representing a variable A with parent nodes representing variables B1, B2, ..., Bn is assigned a conditional probability table (CPT) for P(A | B1, B2, ..., Bn).
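As a concrete illustration of this definition, a fragment of the running example can be stored as a node list with parent sets and CPTs. This is only a sketch; the dictionary layout, the variable names, and the `prob` helper are assumptions made here for illustration, not part of the slides.

```python
# A sketch (not from the slides): store each node with its parents and a CPT
# mapping (parent values) -> P(node = True | parents).
network = {
    "TrainStrike":     {"parents": [], "cpt": {(): 0.10}},
    "MartinOversleep": {"parents": [], "cpt": {(): 0.01}},
    "MartinLate": {
        "parents": ["TrainStrike", "MartinOversleep"],
        # P(Martin Late = T | Train Strike, Martin Oversleep), from slide 8.
        "cpt": {(True, True): 0.95, (True, False): 0.80,
                (False, True): 0.70, (False, False): 0.05},
    },
}

def prob(node, value, parent_values=()):
    """Look up P(node = value | parents = parent_values)."""
    p_true = network[node]["cpt"][tuple(parent_values)]
    return p_true if value else 1.0 - p_true

print(prob("TrainStrike", True))                # 0.1
print(prob("MartinLate", True, (True, False)))  # 0.8
```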
12. Problems
- How to do inference?
- How to learn the probabilities from data?
- How to learn the structure from data?
- What applications do we have?
Bad news: all of them are NP-hard.
13. Bayesian Networks
- Inference
14. Inference
15. Example

P(Train Strike): T 0.1, F 0.9

P(Norman Late | Train Strike):
  Train Strike:     T    F
  Norman Late = T   0.8  0.1
  Norman Late = F   0.2  0.9

P(Martin Late | Train Strike):
  Train Strike:     T    F
  Martin Late = T   0.6  0.5
  Martin Late = F   0.4  0.5

Questions
- P(Martin Late, Norman Late, Train Strike)?  (joint distribution)
- P(Martin Late)?  (marginal distribution)
- P(Martin Late | Norman Late)?  (conditional distribution)
16. Example
Let A = Martin Late, B = Norman Late, C = Train Strike.
Question: P(Martin Late, Norman Late, Train Strike)?  (joint distribution)
The joint distribution is P(A, B, C) = P(C) P(A | C) P(B | C):
  A  B  C  Probability
  T  T  T  0.048
  F  T  T  0.032
  T  F  T  0.012
  F  F  T  0.008
  T  T  F  0.045
  F  T  F  0.045
  T  F  F  0.405
  F  F  F  0.405
e.g., P(A=T, B=T, C=T) = 0.1 × 0.6 × 0.8 = 0.048.
17. Example
Question: P(Martin Late, Norman Late)?  (marginal distribution)
Sum the joint table over C:
  A  B  Probability
  T  T  0.093
  F  T  0.077
  T  F  0.417
  F  F  0.413
e.g., P(A=T, B=T) = 0.048 + 0.045 = 0.093.
18. Example
Question: P(Martin Late)?  (marginal distribution)
Sum over B as well:
  A  Probability
  T  0.51
  F  0.49
e.g., P(A=T) = 0.093 + 0.417 = 0.51.
19. Example
Question: P(Martin Late | Norman Late)?  (conditional distribution)
Using the marginal of B:
  B  Probability
  T  0.17
  F  0.83
e.g., P(A=T | B=T) = P(A=T, B=T) / P(B=T) = 0.093 / 0.17 ≈ 0.547.
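The three queries on slides 15-19 can be reproduced by brute-force enumeration over the small joint table. The sketch below (with A = Martin Late, B = Norman Late, C = Train Strike as above) is an illustration of enumeration only, not the propagation algorithm introduced later.

```python
from itertools import product

# CPTs from the slides: C = Train Strike, A = Martin Late, B = Norman Late.
p_c = {True: 0.1, False: 0.9}
p_a_given_c = {True: {True: 0.6, False: 0.4}, False: {True: 0.5, False: 0.5}}
p_b_given_c = {True: {True: 0.8, False: 0.2}, False: {True: 0.1, False: 0.9}}

# Joint distribution P(A, B, C) = P(C) P(A|C) P(B|C).
joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in product([True, False], repeat=3)}

# Marginal P(A, B): sum out C.
p_ab = {(a, b): sum(joint[(a, b, c)] for c in [True, False])
        for a, b in product([True, False], repeat=2)}

# Marginal P(A): sum out B as well.
p_a = {a: sum(p_ab[(a, b)] for b in [True, False]) for a in [True, False]}

# Conditional P(A=T | B=T) = P(A=T, B=T) / P(B=T).
p_b_true = sum(p_ab[(a, True)] for a in [True, False])

print(joint[(True, True, True)])        # 0.048
print(p_ab[(True, True)])               # 0.093
print(p_a[True])                        # 0.51
print(p_ab[(True, True)] / p_b_true)    # P(Martin Late | Norman Late) ~ 0.547
```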
20. Inference Methods
- Exact algorithms
  - Probability propagation
  - Variable elimination
  - Cutset conditioning
  - Dynamic programming
- Approximation algorithms
  - Variational methods
  - Sampling (Monte Carlo) methods
  - Loopy belief propagation
  - Bounded cutset conditioning
  - Parametric approximation methods
21Independence Assertions
The given terms are called evidences.
- Bayesian Networks have build-in independent
assertions. - An independence assertion is a statement of the
form - X and Y are independent given Z
- We called that X and Y are d-separated by Z.
That is,
or
22. d-Separation (figure)
23. Types of Connections
- Serial connection: X → Z → Y
- Converging connection: X → Z ← Y
- Diverging connection: X ← Z → Y
24. d-Separation in the three connection types: serial, converging, diverging (figure).
25. Joint Distribution
JPT: joint probability table. CPT: conditional probability table.
With the CPTs we can compute all probabilities:
  P(X1, ..., Xn) = ∏_i P(Xi | X1, ..., X(i-1))    (by the chain rule)
                 = ∏_i P(Xi | Pa(Xi))             (by the independence assertions)
where Pa(Xi) denotes the parents of Xi.

Consider n binary random variables:
- To store the JPT of all random variables: 2^n − 1 table entries.
- To store the CPTs of all random variables: Σ_i 2^|Pa(Xi)| table entries.
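For instance, with an assumed structure of n = 8 binary variables and at most two parents per node, the saving looks like this:

```latex
% Worked illustration (assumed structure): n = 8 binary variables,
% each with at most two parents.
\underbrace{2^{8}-1 = 255}_{\text{JPT entries}}
\qquad\text{vs.}\qquad
\sum_{i=1}^{8} 2^{|\mathrm{Pa}(X_i)|} \;\le\; 8\cdot 2^{2} = 32
\quad\text{CPT entries.}
```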
26. Joint Distribution
Consider n binary random variables:
- To store the JPT of all random variables: 2^n − 1 table entries.
- To store the CPTs of all random variables: Σ_i 2^|Pa(Xi)| table entries.
27. Joint Distribution
Comparison of JPT vs. CPT storage for the example network (figure).
28. More on d-Separation
A path from X to Y is d-connecting w.r.t. evidence nodes E if every interior node N on the path has the property that either
- it is serial (linear) or diverging and is not a member of E, or
- it is converging, and either N or one of its descendants is in E.
29. More on d-Separation
Exercise: identify the d-connecting and non-d-connecting paths from X to Y, using the definition above (figure).
30. More on d-Separation
Two nodes are d-separated if there is no d-connecting path between them.
Exercise: remove the minimum number of edges such that X and Y are d-separated (figure).
31. More on d-Separation
Two sets of nodes, say X = {X1, ..., Xm} and Y = {Y1, ..., Yn}, are d-separated w.r.t. evidence nodes E if every pair Xi, Yj is d-separated w.r.t. E. In this case, we have
  P(X, Y | E) = P(X | E) P(Y | E).
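A minimal sketch of the path test above, assuming the DAG is given as a mapping from each node to its list of parents; the function and variable names are made up for illustration.

```python
def descendants(dag_parents, node):
    """All descendants of `node`; dag_parents maps each node to a list of its parents."""
    children = {}
    for n, parents in dag_parents.items():
        for p in parents:
            children.setdefault(p, []).append(n)
    out, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_is_d_connecting(dag_parents, path, evidence):
    """Check whether one undirected path (a list of nodes) is d-connecting
    w.r.t. the evidence set, per the definition above."""
    for i in range(1, len(path) - 1):
        prev, n, nxt = path[i - 1], path[i], path[i + 1]
        converging = prev in dag_parents.get(n, []) and nxt in dag_parents.get(n, [])
        if converging:
            # Converging node: N (or one of its descendants) must be in the evidence.
            if n not in evidence and not (descendants(dag_parents, n) & evidence):
                return False
        else:
            # Serial or diverging node: N must not be in the evidence.
            if n in evidence:
                return False
    return True

# e.g. the diverging path Martin Late <- Train Strike -> Norman Late is blocked
# once Train Strike is observed:
dag = {"TrainStrike": [], "MartinLate": ["TrainStrike"], "NormanLate": ["TrainStrike"]}
print(path_is_d_connecting(dag, ["MartinLate", "TrainStrike", "NormanLate"], set()))            # True
print(path_is_d_connecting(dag, ["MartinLate", "TrainStrike", "NormanLate"], {"TrainStrike"}))  # False
```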
32. Bayesian Networks
- Clique Tree Propagation
33. References
- Developed by Lauritzen and Spiegelhalter and refined by Jensen et al.
  - Lauritzen, S. L., and Spiegelhalter, D. J., "Local computations with probabilities on graphical structures and their application to expert systems," J. Roy. Stat. Soc. B, 50, 157-224, 1988.
  - Jensen, F. V., Lauritzen, S. L., and Olesen, K. G., "Bayesian updating in causal probabilistic networks by local computations," Comp. Stat. Quart., 4, 269-282, 1990.
  - Shenoy, P., and Shafer, G., "Axioms for probability and belief-function propagation," in Uncertainty in Artificial Intelligence, Vol. 4 (R. D. Shachter, T. Levitt, J. F. Lemmer and L. N. Kanal, Eds.), Elsevier, North-Holland, Amsterdam, 169-198, 1990.
34. Clique Tree Propagation (CTP)
- Given a Bayesian network, build a secondary structure called a clique tree (an undirected tree).
- Inference is done by propagating belief potentials among the tree nodes.
- It is an exact algorithm.
35. Notations
  Item                              Notation             Examples
  Random variables, uninstantiated  uppercase            A, B, C
  Random variables, instantiated    lowercase            a, b, c
  Random vectors, uninstantiated    boldface uppercase   X, Y, Z
  Random vectors, instantiated      boldface lowercase   x, y, z
36. Definition: Family of a Node
The family of a node V, denoted F_V, is defined by
  F_V = {V} ∪ Pa(V),
i.e., the node together with its parents.
Examples (figure).
37. Potentials and Distributions
We will model the probability tables as potential functions. All of these tables map a set of random variables to a real value.

Prior probability P(a), a function of a:
  a     P(a)
  on    0.5
  off   0.5

Conditional probability P(b | a), a function of a and b:
  a:             on   off
  P(b=on | a)    0.7  0.2
  P(b=off | a)   0.3  0.8

Conditional probability P(f | d, e), a function of d, e and f:
  d:              on    on    off   off
  e:              on    off   on    off
  P(f=on | d,e)   0.95  0.80  0.70  0.05
  P(f=off | d,e)  0.05  0.20  0.30  0.95
38. Potential
Potentials are used to implement matrices or tables.
Two operations: 1. marginalization, 2. multiplication.
39. Marginalization
Summing a potential over some of its variables, e.g. φ_AB(a, b) = Σ_c φ_ABC(a, b, c).

Example:
  A  B  C  φ_ABC
  T  T  T  0.048
  F  T  T  0.032
  T  F  T  0.012
  F  F  T  0.008
  T  T  F  0.045
  F  T  F  0.045
  T  F  F  0.405
  F  F  F  0.405

  A  B  φ_AB           A  φ_A
  T  T  0.093          T  0.51
  F  T  0.077          F  0.49
  T  F  0.417
  F  F  0.413
40. Multiplication
Pointwise product over consistent entries: φ_Z(z) = φ_X(x) · φ_Y(y), where x and y are the parts of z consistent with X and Y. The result does not necessarily sum to one.

Example: φ_ABC = φ_AB · φ_BC
  A  B  φ_AB           B  C  φ_BC
  T  T  0.093          T  T  0.08
  F  T  0.077          F  T  0.02
  T  F  0.417          T  F  0.09
  F  F  0.413          F  F  0.91

  A  B  C  φ_ABC
  T  T  T  0.093 × 0.08 = 0.00744
  F  T  T  0.077 × 0.08 = 0.00616
  T  F  T  0.417 × 0.02 = 0.00834
  F  F  T  0.413 × 0.02 = 0.00826
  T  T  F  0.093 × 0.09 = 0.00837
  F  T  F  0.077 × 0.09 = 0.00693
  T  F  F  0.417 × 0.91 = 0.37947
  F  F  F  0.413 × 0.91 = 0.37583
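Both operations are easy to sketch for binary variables with a table keyed by value tuples. The `Potential` class below is an illustrative implementation of slides 38-40, not code from the lecture; the representation choices are assumptions.

```python
from itertools import product

class Potential:
    """A table phi over a tuple of binary variables, keyed by value tuples."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)   # e.g. ("A", "B", "C")
        self.table = dict(table)            # e.g. {(True, True, True): 0.048, ...}

    def marginalize(self, keep):
        """Sum out every variable not in `keep` (slide 39)."""
        kept = tuple(v for v in self.variables if v in keep)
        idx = [self.variables.index(v) for v in kept]
        out = {}
        for values, p in self.table.items():
            key = tuple(values[i] for i in idx)
            out[key] = out.get(key, 0.0) + p
        return Potential(kept, out)

    def multiply(self, other):
        """Pointwise product: phi_Z(z) = phi_X(x) * phi_Y(y), where x and y are the
        parts of z consistent with this potential and with `other` (slide 40)."""
        variables = self.variables + tuple(v for v in other.variables
                                           if v not in self.variables)
        out = {}
        for values in product([True, False], repeat=len(variables)):
            assign = dict(zip(variables, values))
            x = tuple(assign[v] for v in self.variables)
            y = tuple(assign[v] for v in other.variables)
            out[values] = self.table[x] * other.table[y]
        return Potential(variables, out)

# e.g. phi_abc.marginalize({"A", "B"}) reproduces the table on slide 39,
# and phi_ab.multiply(phi_bc) reproduces the table on slide 40.
```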
41. The Secondary Structure
Given a Bayesian network over a set of variables U = {V1, ..., Vn}, its secondary structure contains a graphical and a numerical component.
- Graphical component: an undirected clique tree satisfying the join tree property.
- Numerical component: belief potentials on nodes and edges.
42. The Clique Tree T
How do we build a clique tree? The clique tree T for a belief network over a set of variables U = {V1, ..., Vn} satisfies the following properties.
- Each node in T is a cluster or clique (a nonempty set of variables).
- The clusters satisfy the join tree property: given two clusters X and Y in T, all clusters on the path between X and Y contain X ∩ Y.
- For each variable V ∈ U, F_V is included in at least one of the clusters.
- Sepsets: each edge in T is labeled with the intersection of the adjacent clusters.
43. The Numeric Component
How do we assign belief functions? Clusters and sepsets are attached with belief potentials.
- Local consistency: for each cluster X and neighboring sepset S,  Σ_{X\S} φ_X = φ_S.
- Global consistency:  P(U) = ∏_clusters φ_X / ∏_sepsets φ_S.
44. The Numeric Component
How do we assign belief functions? Clusters and sepsets are attached with belief potentials.
The key step is to satisfy these constraints by letting φ_X = P(X) and φ_S = P(S). If so, the probability of any variable V can be read off by marginalizing any cluster X that contains it:
  P(V) = Σ_{X\{V}} φ_X.
45. Bayesian Networks
- Building the Clique Tree
46. The Steps
Belief Network → Moral Graph → Triangulated Graph → Clique Set → Join Tree
47. Moral Graph
Belief Network → Moral Graph
- Convert the directed graph to an undirected one.
- Connect (marry) each pair of parent nodes of every node.
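A sketch of moralization, assuming the DAG is given as a mapping from each node to its list of parents and the moral graph is returned as a set of undirected edges (frozensets).

```python
def moral_graph(dag_parents):
    """Drop directions and marry each pair of parents of every node."""
    edges = set()
    for node, parents in dag_parents.items():
        for p in parents:
            edges.add(frozenset((p, node)))     # original edge, now undirected
        for i, p in enumerate(parents):         # marry parents pairwise
            for q in parents[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

# Example with a fragment of the slide network: the two parents of Martin Late get married.
print(moral_graph({"TrainStrike": [], "MartinOversleep": [],
                   "MartinLate": ["TrainStrike", "MartinOversleep"]}))
```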
48. Triangulation
Moral Graph → Triangulated Graph
- Triangulate: add chords so that every cycle of length greater than three has a chord. There are many ways to do this.
- In practice, this step is done together with the next step (clique selection).
49. Select Clique Set
- Copy G_M to G_M'.
- While G_M' is not empty:
  - Select a node V from G_M' according to a criterion (next slide).
  - Node V and its neighbors form a cluster.
  - Connect all the nodes in the cluster. For each edge added to G_M', add the same edge to G_M.
  - Remove V from G_M'.
50. Select Clique Set
- Criterion:
  - The weight of a node V is the number of values of V.
  - The weight of a cluster is the product of the weights of its constituent nodes.
  - Choose the node that causes the fewest edges to be added.
  - Break ties by choosing the node that induces the cluster with the smallest weight.
- Copy G_M to G_M'.
- While G_M' is not empty:
  - Select a node V from G_M' according to the criterion.
  - Node V and its neighbors form a cluster.
  - Connect all the nodes in the cluster. For each edge added to G_M', add the same edge to G_M.
  - Remove V from G_M'.
51. Select Clique Set
- Criterion:
  - The weight of a node V is the number of values of V.
  - The weight of a cluster is the product of the weights of its constituent nodes.
  - Choose the node that causes the fewest edges to be added.
  - Break ties by choosing the node that induces the cluster with the smallest weight.
A sketch of this elimination loop is shown below.
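The sketch below follows the elimination loop and criterion just described (fewest fill-in edges, ties broken by smallest cluster weight). The graph representation, the `weights` mapping, and the maximal-cluster check are assumptions made for illustration.

```python
from itertools import combinations

def triangulate_and_find_cliques(edges, weights):
    """edges: undirected edges of the moral graph G_M as frozenset pairs;
    weights: node -> number of values. Returns (triangulated edges, cliques)."""
    nodes = {v for e in edges for v in e}
    g_m = {v: {u for e in edges if v in e for u in e if u != v} for v in nodes}  # working copy G_M'
    triangulated = set(edges)
    cliques = []

    def fill_in(v):   # edges needed to fully connect v's current neighbours
        return [frozenset((a, b)) for a, b in combinations(g_m[v], 2) if b not in g_m[a]]

    def cluster_weight(v):  # product of the weights of v and its neighbours
        w = weights[v]
        for u in g_m[v]:
            w *= weights[u]
        return w

    while g_m:
        # Criterion: fewest fill-in edges, ties broken by smallest cluster weight.
        v = min(g_m, key=lambda x: (len(fill_in(x)), cluster_weight(x)))
        cluster = g_m[v] | {v}
        for e in fill_in(v):                        # add induced edges to G_M' and G_M
            a, b = tuple(e)
            g_m[a].add(b); g_m[b].add(a)
            triangulated.add(e)
        if not any(cluster <= c for c in cliques):  # keep only maximal clusters
            cliques.append(cluster)
        for u in g_m[v]:                            # remove v from the working copy
            g_m[u].discard(v)
        del g_m[v]
    return triangulated, cliques
```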
52. Building an Optimal Join Tree
We need a minimal number of edges to connect these cliques, i.e., to build a tree. Given n cliques, n − 1 edges are required. There are many ways to choose them; how do we achieve optimality?
53. Building an Optimal Join Tree
- Begin with a set of n trees, each consisting of a single clique, and an empty set S.
- For each distinct pair of cliques X and Y:
  - Create a candidate sepset S_XY = X ∩ Y, with backpointers to X and Y.
  - Insert S_XY into S.
- Repeat until n − 1 sepsets have been inserted into the forest:
  - Select a sepset S_XY from S, according to the criterion described on the next slide. Delete S_XY from S.
  - Insert S_XY between cliques X and Y only if X and Y are on different trees in the forest.
54. Building an Optimal Join Tree
- Criterion:
  - The mass of S_XY is the number of nodes in X ∩ Y.
  - The cost of S_XY is the weight of X plus the weight of Y.
    - The weight of a node V is the number of values of V.
    - The weight of a set of nodes X is the product of the weights of its constituent nodes.
  - Choose the sepset with the largest mass.
  - Break ties by choosing the sepset with the smallest cost.
- Begin with a set of n trees, each consisting of a single clique, and an empty set S.
- For each distinct pair of cliques X and Y:
  - Create a candidate sepset S_XY = X ∩ Y, with backpointers to X and Y.
  - Insert S_XY into S.
- Repeat until n − 1 sepsets have been inserted into the forest:
  - Select a sepset S_XY from S, according to the criterion above. Delete S_XY from S.
  - Insert S_XY between cliques X and Y only if X and Y are on different trees in the forest.
A sketch of this sepset-selection loop is shown below.
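A sketch of the sepset-selection loop above, using a small union-find to test whether two cliques are already on the same tree; the data layout is an assumption.

```python
from itertools import combinations

def build_join_tree(cliques, weights):
    """cliques: list of frozensets of variables; weights: variable -> number of values.
    Returns the chosen tree edges as (clique_i_index, clique_j_index, sepset)."""
    def weight(clique):
        w = 1
        for v in clique:
            w *= weights[v]
        return w

    # Candidate sepsets for every distinct pair of cliques.
    candidates = [(i, j, cliques[i] & cliques[j])
                  for i, j in combinations(range(len(cliques)), 2)]
    # Criterion: largest mass first, ties broken by smallest cost.
    candidates.sort(key=lambda c: (-len(c[2]),
                                   weight(cliques[c[0]]) + weight(cliques[c[1]])))

    parent = list(range(len(cliques)))          # union-find over the forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree_edges = []
    for i, j, sepset in candidates:
        if len(tree_edges) == len(cliques) - 1:
            break
        ri, rj = find(i), find(j)
        if ri != rj:                            # only join cliques on different trees
            parent[ri] = rj
            tree_edges.append((i, j, sepset))
    return tree_edges
```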
55. Building an Optimal Join Tree
Graphical transformation of the example network (figure).
56. Bayesian Networks
- Inference by Propagation
57. Inferences
- Inference without evidence
- Inference with evidence
PPTC: Probability Propagation in Tree of Cliques.
58. Inference without Evidence
Demo
59. Procedure for PPTC without Evidence
Building the graphical component:
  Belief Network → (graphical transformation) → Join Tree Structure
Building the numerical component:
  Join Tree Structure → (initialization) → Inconsistent Join Tree → (propagation) → Consistent Join Tree, then marginalization.
60. Initialization
- For each cluster and sepset X, set each φ_X(x) to 1.
- For each variable V:
  - Assign to V a cluster X that contains F_V; call X the parent cluster of F_V.
  - Multiply φ_X(x) by P(V | Pa(V)).
A sketch of this step is shown below.
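A sketch of this initialization step, reusing the `Potential` class sketched at slide 40; the `clusters`, `sepsets`, `cpt_potentials`, and `families` arguments and their layout are assumptions.

```python
def initialize(clusters, sepsets, cpt_potentials, families):
    """PPTC initialization, reusing the Potential sketch above.
    clusters, sepsets: dicts name -> all-ones Potential; cpt_potentials: V -> Potential
    for P(V | Pa(V)); families: V -> set of variables in F_V."""
    for V, phi_cpt in cpt_potentials.items():
        # Assign V to a cluster X whose variables contain F_V (the parent cluster of F_V) ...
        X = next(name for name, phi in clusters.items()
                 if families[V] <= set(phi.variables))
        # ... and multiply that cluster's potential by P(V | Pa(V)).
        clusters[X] = clusters[X].multiply(phi_cpt)
    return clusters, sepsets
```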
61. Initialization
Example (figure).
62. Initialization
By the independence assertions, with N clusters and Q variables,
  ∏_{i=1..N} φ_{X_i} = ∏_{j=1..Q} P(V_j | Pa(V_j)) = P(U)
(all sepset potentials are still 1).
63. Initialization
By the independence assertions, the product of the cluster potentials equals P(U). After initialization, global consistency is satisfied, but local consistency is not.
64. Global Propagation
Global propagation is used to achieve local consistency. Let us consider a single message pass from cluster X to cluster Y through sepset S first.
Message passing:
- Projection on the sepset:  φ_S^new = Σ_{X\S} φ_X.
- Absorption on the receiving cluster:  φ_Y ← φ_Y · φ_S^new / φ_S^old.
65. The Effect of a Single Message Pass
A message pass from X to Y (projection on the sepset, absorption on the receiving cluster) leaves the joint distribution encoded by the tree unchanged and makes the sending cluster X consistent with the sepset S.
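Using the `Potential` sketch from slide 40, a single message pass can be written as follows; the division-by-zero convention (0/0 = 0) and the assumption that the sepset potential lists its variables in the same order as the sending cluster are implementation choices, not from the slides.

```python
def pass_message(phi_X, phi_S, phi_Y):
    """Single message pass X -> S -> Y, reusing the Potential sketch above.
    Assumes phi_S.variables appear in the same order inside phi_X."""
    # Projection on the sepset: phi_S_new = sum over X\S of phi_X.
    new_S = phi_X.marginalize(set(phi_S.variables))
    # Absorption on the receiving cluster: phi_Y <- phi_Y * (phi_S_new / phi_S_old),
    # with the convention 0/0 = 0.
    ratio = Potential(new_S.variables,
                      {k: (v / phi_S.table[k] if phi_S.table[k] != 0 else 0.0)
                       for k, v in new_S.table.items()})
    new_Y = phi_Y.multiply(ratio)
    return new_S, new_Y
```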
66. Global Propagation
- Choose an arbitrary cluster X.
- Unmark all clusters. Call Ingoing-Propagation(X).
- Unmark all clusters. Call Outgoing-Propagation(X).
67. Global Propagation
- Choose an arbitrary cluster X.
- Unmark all clusters. Call Ingoing-Propagation(X).
- Unmark all clusters. Call Outgoing-Propagation(X).
- Ingoing-Propagation(X):
  - Mark X.
  - Call Ingoing-Propagation recursively on X's unmarked neighboring clusters, if any.
  - Pass a message from X to the cluster that invoked Ingoing-Propagation(X).
- Outgoing-Propagation(X):
  - Mark X.
  - Pass a message from X to each of its unmarked neighboring clusters, if any.
  - Call Outgoing-Propagation recursively on X's unmarked neighboring clusters, if any.
After global propagation, the clique tree is both globally and locally consistent. (The numbers 1-10 in the figure indicate the order of the message passes.)
68. Marginalization
From a consistent join tree, P(V) is obtained by marginalizing any cluster X that contains V:  P(V) = Σ_{X\{V}} φ_X.
69. Review: Procedure for PPTC without Evidence
Building the graphical component:
  Belief Network → (graphical transformation) → Join Tree Structure
Building the numerical component:
  Join Tree Structure → (initialization) → Inconsistent Join Tree → (propagation) → Consistent Join Tree, then marginalization.
70. Inference with Evidence
Demo
71. Observations
- Observations are the simplest form of evidence.
- An observation is a statement of the form V = v.
- Collections of observations may be denoted E = e, an instantiation of a set of variables E.
- Observations are referred to as hard evidence.
72. Likelihoods
Given E = e, the likelihood of V, denoted λ_V, is defined as
  λ_V(v) = 1 if V ∉ E, or if V ∈ E and v is consistent with e;  λ_V(v) = 0 otherwise.
73. Likelihoods
Example: observations C = on and D = off.
  Variable V   λ_V(on)   λ_V(off)
  A            1         1
  B            1         1
  C            1         0
  D            0         1
  E            1         1
  F            1         1
  G            1         1
  H            1         1
74. Procedure for PPTC with Evidence
- Initialization
- Observation entry
- Global propagation
- Marginalization
- Normalization
75. Initialization with Observations
- Set each likelihood element λ_V(v) to 1.
76. Observation Entry
- Encode the observation V = v as a likelihood λ_V^new with λ_V^new(v') = 1 if v' = v, and 0 otherwise.
- Identify a cluster X that contains V.
- Update φ_X ← φ_X · λ_V^new and λ_V ← λ_V^new.
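A sketch of observation entry, again reusing the `Potential` sketch; the argument names and the binary domain default are assumptions.

```python
def enter_observation(clusters, likelihoods, V, v, domain=(True, False)):
    """Observation entry, reusing the Potential sketch above."""
    # Encode V = v as a 0/1 likelihood over V's domain.
    lam = Potential((V,), {(val,): 1.0 if val == v else 0.0 for val in domain})
    # Identify a cluster X that contains V and fold the likelihood into it.
    X = next(name for name, phi in clusters.items() if V in phi.variables)
    clusters[X] = clusters[X].multiply(lam)   # zero out rows inconsistent with V = v
    likelihoods[V] = lam                      # record the new likelihood for V
    return clusters, likelihoods
```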
77. Marginalization
After global propagation, marginalizing a cluster X that contains V yields the joint of V with the evidence:  P(V, e) = Σ_{X\{V}} φ_X.
78. Normalization
After global propagation, the marginalized potential is P(V, e), not P(V | e). Normalization:
  P(V | e) = P(V, e) / P(e) = P(V, e) / Σ_v P(V = v, e).
79. Handling Dynamic Observations
Suppose that the join tree is now consistent for observations e1. How do we restore consistency if the observations change to e2?
80. Observation States
When the observations change from e1 to e2, there are three observation states for a variable V:
- No change
- Update: V goes from unobserved to observed.
- Retraction: V goes from observed to unobserved, or V = v1 changes to V = v2 with v1 ≠ v2.
81. Handling Dynamic Observations
- Global update: when every variable is in the "no change" or "update" state, i.e., observations are only added.
- Global retraction: when any variable is in the "retraction" state, i.e., an observation is withdrawn or its value changes.