Title: A Graph Model - Bayesian Network
1. A Graph Model - Bayesian Network
- CSCI 4260 Project
- Xiaoli Zhang, ECSE
- April 27, 2006
2. Outline
- Introduction to Bayesian networks
- Inference in BNs
- Triangulation in BNs
- Constructing the junction tree
- Applications of Bayesian networks
3. Graph Model
- Definition
- A collection of variables (nodes) with a set of dependencies (edges) between the variables and a set of probability distribution functions for each variable
- A Bayesian network is a special type of graph model which is a directed acyclic graph (DAG)
4. Bayesian Networks
- A graph
- nodes represent the random variables
- directed edges (arrows) between pairs of nodes
- it must be a directed acyclic graph (DAG)
- the graph represents relationships between the variables
- Conditional probability specifications
- the conditional probability of each variable given its parents in the DAG
5. Bayesian Networks
- Variables A and C are conditionally independent given the variable B:
P(A, B, C) = P(A|B,C) P(B,C) = P(A|B) P(C|B) P(B)
6. An Example
Suppose now that there is a similar link between lung cancer (L) and a chest X-ray (X), and that we also have the following relationships: a history of smoking (S) has a direct influence on bronchitis (B) and lung cancer (L), and L and B have a direct influence on fatigue (F). What is the probability that someone has bronchitis given that they smoke, have fatigue, and have received a positive X-ray result? That is, we want P(b1 | s1, f1, x1), where, for example, the variable B takes on values b1 (has bronchitis) and b2 (does not have bronchitis).
7. Problems with Large Instances
- The joint probability distribution, P(b,s,f,x,l)
- For five binary variables there are 2^5 = 32 values in the joint distribution (for 100 variables there are over 10^30 values)
- How are these values to be obtained?
- Inference
- Obtaining posterior distributions once some evidence is available requires summation over an exponential number of terms, e.g. 2^2 terms in the calculation of P(b1 | s1, f1, x1), which increases to 2^97 if there are 100 variables.
8. An Example Bayesian Network
P(s1) = 0.2
P(l1|s1) = 0.003, P(l1|s2) = 0.00005
P(b1|s1) = 0.25, P(b1|s2) = 0.05
P(f1|b1,l1) = 0.75, P(f1|b1,l2) = 0.10, P(f1|b2,l1) = 0.5, P(f1|b2,l2) = 0.05
P(x1|l1) = 0.6, P(x1|l2) = 0.02
9. The Joint Probability Distribution
Note that our joint distribution over the 5 variables factors according to the structure of the network. Consequently the joint probability distribution can be expressed as
P(s, b, l, f, x) = P(s) P(b|s) P(l|s) P(f|b,l) P(x|l)
For example, the probability that someone has a smoking history, lung cancer but not bronchitis, suffers from fatigue and tests positive in an X-ray test is
P(s1, b2, l1, f1, x1) = P(s1) P(b2|s1) P(l1|s1) P(f1|b2,l1) P(x1|l1) = 0.2 × 0.75 × 0.003 × 0.5 × 0.6 = 0.000135
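The factored joint above can be evaluated directly. A minimal sketch in Python, using the CPT values from slide 8 (the variable and function names are illustrative, not from the slides):

```python
# A sketch of the factored joint distribution for the example network
# (S -> B, S -> L; B, L -> F; L -> X), using the CPT values from the
# slides. Booleans: True = state 1 (e.g. s1), False = state 2 (e.g. s2).

P_s = 0.2                                           # P(s1)
P_l_given_s = {True: 0.003, False: 0.00005}         # P(l1 | s)
P_b_given_s = {True: 0.25, False: 0.05}             # P(b1 | s)
P_f_given_bl = {(True, True): 0.75, (True, False): 0.10,
                (False, True): 0.5, (False, False): 0.05}  # P(f1 | b, l)
P_x_given_l = {True: 0.6, False: 0.02}              # P(x1 | l)

def bern(p, value):
    """Probability of a binary variable taking `value` when P(True) = p."""
    return p if value else 1.0 - p

def joint(s, b, l, f, x):
    """P(s, b, l, f, x) = P(s) P(b|s) P(l|s) P(f|b,l) P(x|l)."""
    return (bern(P_s, s)
            * bern(P_b_given_s[s], b)
            * bern(P_l_given_s[s], l)
            * bern(P_f_given_bl[(b, l)], f)
            * bern(P_x_given_l[l], x))

# Smoker, no bronchitis, lung cancer, fatigued, positive X-ray:
print(joint(s=True, b=False, l=True, f=True, x=True))  # ≈ 0.000135
```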
10. Representing the Joint Distribution
In general, for a network with nodes X1, X2, ..., Xn,
P(x1, x2, ..., xn) = Π i P(xi | pa(Xi))
An enormous saving can be made in the number of values required for the joint distribution. To determine the joint distribution directly for n binary variables, 2^n - 1 values are required. For a BN with n binary variables in which each node has at most k parents, fewer than 2^k × n values are required.
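The saving can be checked with a few lines of arithmetic (function names are illustrative):

```python
# Parameter-count comparison from the slide: a full joint over n binary
# variables needs 2^n - 1 numbers, while a BN whose nodes each have at
# most k parents needs at most 2^k numbers per node, i.e. <= 2^k * n.

def full_joint_params(n):
    return 2 ** n - 1

def bn_params_upper_bound(n, k):
    return (2 ** k) * n

print(full_joint_params(5))             # 31
print(bn_params_upper_bound(5, 2))      # 20
print(full_joint_params(100) > 10**30)  # True: over 10^30 values
```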
11. Inference in BNs and the Junction Tree
- The main point of BNs is to enable probabilistic inference to be performed. Inference is the task of computing the probability of each value of a node in a BN when the values of the other variables are known.
- The general idea is to perform inference by representing the joint probability distribution on an undirected tree called the junction tree
- The junction tree has the following characteristics
- it is an undirected tree whose nodes are clusters of variables
- given two clusters, C1 and C2, every node on the path between them contains their intersection C1 ∩ C2
- a separator, S, is associated with each edge and contains the variables in the intersection between neighbouring nodes
12. Inference in BNs
- Moralize the Bayesian network
- Triangulate the moralized graph
- Let the cliques of the triangulated graph be the nodes of a tree, and construct the junction tree
- Propagate beliefs throughout the junction tree to perform inference
13. Constructing the Junction Tree (1)
Step 1. Form the moral graph from the DAG. Consider the BN in our example: marry the parents of each node and remove the arrows.
[Figure: the example DAG and its moral graph]
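Step 1 can be sketched as follows, assuming the DAG is represented as a dict mapping each node to its set of parents (the representation is an assumption for illustration, not from the slides):

```python
# Moralization sketch: for each node, "marry" its parents (connect them
# pairwise) and drop all edge directions.
from itertools import combinations

def moralize(parents):
    nodes = set(parents)
    edges = set()
    for child, pa in parents.items():
        for p in pa:                      # undirected parent-child edges
            edges.add(frozenset((p, child)))
        for p, q in combinations(pa, 2):  # marry co-parents
            edges.add(frozenset((p, q)))
    return nodes, edges

# The example network: S -> B, S -> L; B, L -> F; L -> X
dag = {"S": set(), "B": {"S"}, "L": {"S"}, "F": {"B", "L"}, "X": {"L"}}
nodes, edges = moralize(dag)
print(frozenset(("B", "L")) in edges)  # True: B and L married (co-parents of F)
```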
14. Constructing the Junction Tree (2)
Step 2. Triangulate the moral graph. An undirected graph is triangulated if every cycle of length greater than 3 possesses a chord.
15. Constructing the Junction Tree (3)
Step 3. Identify the cliques. A clique is a subset of nodes which is complete (i.e. there is an edge between every pair of nodes) and maximal.
Cliques: {B,S,L}, {B,L,F}, {L,X}
16. Constructing the Junction Tree (4)
Step 4. Build the junction tree. The cliques should be ordered (C1, C2, ..., Ck) so that they possess the running intersection property: for all 1 < j ≤ k, there is an i < j such that Cj ∩ (C1 ∪ ... ∪ Cj-1) ⊆ Ci. To build the junction tree, choose one such i for each j and add an edge between Cj and Ci.
Junction tree for the cliques {B,S,L}, {B,L,F}, {L,X}:
{B,S,L} -- {B,L} -- {B,L,F} -- {L} -- {L,X}
with separators {B,L} and {L}.
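Step 4 can be sketched as follows, assuming the cliques are already given in an order satisfying the running intersection property (the names and the tie-breaking choice of i are illustrative):

```python
# Junction-tree construction sketch: for each clique C_j, its separator
# is its intersection with everything before it; connect C_j to an
# earlier clique C_i that contains that separator (here: the most
# recent such clique, though any valid choice works).

def build_junction_tree(cliques):
    """cliques: list of frozensets in running-intersection order."""
    edges = []
    seen = set()
    for j in range(1, len(cliques)):
        seen |= cliques[j - 1]
        sep = cliques[j] & seen
        i = max(i for i in range(j) if sep <= cliques[i])
        edges.append((cliques[i], cliques[j], sep))
    return edges

cliques = [frozenset("BSL"), frozenset("BLF"), frozenset("LX")]
for ci, cj, sep in build_junction_tree(cliques):
    print(sorted(ci), "--", sorted(cj), "separator:", sorted(sep))
```

On the example this reproduces the chain above: {B,S,L} joins {B,L,F} through separator {B,L}, and {L,X} joins {B,L,F} through separator {L}.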
17. Potential Initialization
To initialize the potential functions:
1. set all potentials to unity
2. for each variable Xi, select one node in the junction tree (i.e. one clique) containing both that variable and its parents pa(Xi) in the original DAG
3. multiply that clique's potential by P(xi | pa(xi))
[Figure: the junction tree with separators {B,L} and {L} and its initialized potentials]
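The initialization steps above can be sketched as follows; for brevity this sketch only records which CPT is absorbed by which clique, rather than multiplying numeric tables (the dict representation is an assumption):

```python
# Potential-initialization sketch: every potential starts at unity, and
# each CPT factor P(Xi | pa(Xi)) is multiplied into one clique that
# contains the whole family {Xi} ∪ pa(Xi).

cliques = [frozenset("BSL"), frozenset("BLF"), frozenset("LX")]
families = {                  # Xi -> {Xi} ∪ pa(Xi), from the example DAG
    "S": frozenset("S"),
    "B": frozenset("BS"),
    "L": frozenset("LS"),
    "F": frozenset("FBL"),
    "X": frozenset("XL"),
}

assignment = {}
for var, fam in families.items():
    # select one clique containing the variable and its parents
    clique = next(c for c in cliques if fam <= c)
    assignment.setdefault(clique, []).append(var)

for clique, absorbed in assignment.items():
    print(sorted(clique), "absorbs CPTs for", sorted(absorbed))
```

On the example, {B,S,L} absorbs P(s), P(b|s) and P(l|s); {B,L,F} absorbs P(f|b,l); and {L,X} absorbs P(x|l).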
18. Potential Representation
The joint probability distribution can now be represented in terms of potential functions, φ, defined on each clique and each separator of the junction tree. The joint distribution is given by
P(U) = Π_C φ(C) / Π_S φ(S)
The idea is to transform this representation of the joint distribution into another one in which, for each clique C, the potential function gives the marginal distribution of the variables in C, i.e.
φ(C) = P(C)
This will also apply to the separators, S.
19. Triangulation
- Given a numbered graph, proceed from node n down to node 1
- Determine the lower-numbered nodes which are adjacent to the current node, including those which may have been made adjacent to it earlier in this algorithm
- Connect these nodes to each other
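The fill-in procedure above can be sketched as follows, assuming an adjacency-dict representation and a precomputed numbering (both are assumptions for illustration):

```python
# Triangulation-by-fill-in sketch: visit nodes from the highest number
# down to 1, and connect each node's lower-numbered neighbours to each
# other. Newly added edges are seen by later (lower-numbered) visits.
from itertools import combinations

def triangulate(adj, order):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    for v in sorted(adj, key=order.get, reverse=True):
        lower = [u for u in adj[v] if order[u] < order[v]]
        for u, w in combinations(lower, 2):       # fill-in edges
            adj[u].add(w)
            adj[w].add(u)
    return adj

# A chordless 4-cycle A-B-C-D, numbered A=1, B=2, C=3, D=4
adj = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
order = {"A": 1, "B": 2, "C": 3, "D": 4}
tri = triangulate(adj, order)
print("C" in tri["A"])  # True: the chord A-C was added
```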
20. Triangulation
- Numbering the nodes
- Arbitrarily number the nodes
- Maximum cardinality search
- Give any node the number 1
- For each subsequent number, pick a new unnumbered node that neighbours the most already-numbered nodes
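Maximum cardinality search can be sketched as follows (the adjacency-dict representation and the tie-breaking are assumptions for illustration):

```python
# Maximum cardinality search sketch: number an arbitrary start node 1,
# then repeatedly number the unnumbered node with the most
# already-numbered neighbours (ties broken by dict order).

def max_cardinality_search(adj, start):
    order = {start: 1}
    while len(order) < len(adj):
        best = max((v for v in adj if v not in order),
                   key=lambda v: sum(1 for u in adj[v] if u in order))
        order[best] = len(order) + 1
    return order

# Moral graph of the example: S-B, S-L, B-L (marriage), B-F, L-F, L-X
adj = {"S": {"B", "L"}, "B": {"S", "L", "F"},
       "L": {"S", "B", "F", "X"}, "F": {"B", "L"}, "X": {"L"}}
order = max_cardinality_search(adj, "S")
print(order)
```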
21. Triangulation
[Figure: the example BN and its moralized graph]
22. Triangulation
[Figure: an arbitrary numbering of the nodes, 1-8]
23. Triangulation
[Figure: numbering the nodes by maximum cardinality search]
24. Model Context as Statistical Dependence in OCR
- Pattern pair z = (x1, x2)
- Feature vectors x1 = (u1, v1), x2 = (u2, v2)
- Class labels (y1, y2)
- Joint distribution P(x1, y1, x2, y2)
- Classify by (y1, y2) = argmax over (y1, y2) of P(x1, y1, x2, y2)
25. Modeling Context in OCR
- No dependence:
P(x1, y1, x2, y2) = P(u1, v1, y1, u2, v2, y2) = P(u1|y1) P(v1|y1) P(u2|y2) P(v2|y2) P(y1) P(y2)
26. Modeling Context in OCR
- Intra-pattern class-conditional feature dependence:
P(x1, y1, x2, y2) = P(u1,v1|y1) P(u2,v2|y2) P(y1) P(y2)
27. Modeling Context in OCR
- Inter-pattern class dependence (linguistic context):
P(x1, y1, x2, y2) = P(x1|y1) P(x2|y2) P(y1, y2)
28. Modeling Context in OCR
- Inter-pattern class-feature dependence:
P(x1, y1, x2, y2) = P(x1|y1,y2) P(x2|y1,y2) P(y1) P(y2)
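As a toy illustration of the linguistic-context factorization on the previous slide, the following sketch (all probability values are invented for illustration) shows how the class-pair prior P(y1, y2) can resolve an ambiguous second pattern:

```python
# Toy sketch of P(x1,y1,x2,y2) = P(x1|y1) P(x2|y2) P(y1,y2) for a pair
# of glyphs that could each be digit "0" or letter "O". All numbers are
# invented; the point is only the argmax over label pairs.
from itertools import product

classes = ["0", "O"]
P_x1 = {"0": 0.6, "O": 0.4}   # P(x1 | y1): x1 leans slightly toward "0"
P_x2 = {"0": 0.5, "O": 0.5}   # P(x2 | y2): x2 is completely ambiguous
P_pair = {("0", "0"): 0.45, ("O", "O"): 0.45,   # same-type pairs common
          ("0", "O"): 0.05, ("O", "0"): 0.05}   # mixed pairs rare

best = max(product(classes, repeat=2),
           key=lambda ys: P_x1[ys[0]] * P_x2[ys[1]] * P_pair[ys])
print(best)  # ('0', '0'): context pulls the ambiguous x2 toward '0'
```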
29. References
- Todd A. Stephenson, An Introduction to Bayesian Network Theory and Usage, IDIAP-RR 00-03, February 2000.
- D. Heckerman, Bayesian Networks for Data Mining, Data Mining and Knowledge Discovery 1, 79-119, 1997.
- S. Veeramachaneni, G. Nagy, Style Context with Second-Order Statistics, IEEE Trans. PAMI 27(1), 88-98, 2005.
31. Inference in Bayesian Networks
- The main point of BNs is to enable probabilistic inference to be performed.
- There are two main types of inference to be carried out:
- Belief updating: obtain the posterior probability of one or more variables given evidence concerning the values of other variables
- Abductive inference (belief revision): find the most probable configuration of a set of variables (hypothesis) given evidence
- Consider the BN discussed earlier. What is the probability that someone has bronchitis (B) given that they smoke (S), have fatigue (F), and have received a positive X-ray (X) result?
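For this small network the query can be answered by brute-force enumeration. A sketch (not the junction-tree algorithm the slides develop, just a direct check) using the CPT values from slide 8:

```python
# Brute-force P(b1 | s1, f1, x1): enumerate the factored joint
# P(s)P(b|s)P(l|s)P(f|b,l)P(x|l) and sum out the unobserved variable L.
# Booleans: True = state 1, False = state 2.
from itertools import product

def bern(p, value):
    return p if value else 1.0 - p

def joint(s, b, l, f, x):
    P_f = {(True, True): 0.75, (True, False): 0.10,
           (False, True): 0.5, (False, False): 0.05}
    return (bern(0.2, s)
            * bern({True: 0.25, False: 0.05}[s], b)
            * bern({True: 0.003, False: 0.00005}[s], l)
            * bern(P_f[(b, l)], f)
            * bern({True: 0.6, False: 0.02}[l], x))

def posterior_b(s, f, x):
    num = sum(joint(s, True, l, f, x) for l in (True, False))
    den = sum(joint(s, b, l, f, x) for b, l in product((True, False), repeat=2))
    return num / den

print(round(posterior_b(True, True, True), 3))  # ≈ 0.37
```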