Title: Bayesian Networks
1. Used in Spring 2012, Spring 2013, Winter 2014 (partially)
- Bayesian Networks
- Conditional Independence
- Creating Tables
- Notations for Bayesian Networks
- Calculating conditional probabilities from the tables
- Calculating conditional independence
- Markov Chain Monte Carlo
- Markov Models
- Markov Models and probabilistic methods in vision
2. Introduction to Probabilistic Robotics
- Probabilities
- Bayes rule
- Bayes filters
- Bayes networks
- Markov Chains
3. Bayesian Networks and Markov Models
- Bayesian networks and Markov models
- Applications in User Modeling
- Applications in Natural Language Processing
- Applications in robotic control
- Applications in robot vision
4. Bayesian Networks (BNs) Overview
- Introduction to BNs
- Nodes, structure and probabilities
- Reasoning with BNs
- Understanding BNs
- Extensions of BNs
- Decision Networks
- Dynamic Bayesian Networks (DBNs)
5. Definition of Bayesian Networks
- A data structure that represents the dependence between variables
- Gives a concise specification of the joint probability distribution
- A Bayesian network is a directed acyclic graph (DAG) in which the following holds:
- A set of random variables makes up the nodes of the network
- A set of directed links connects pairs of nodes
- Each node has a probability distribution that quantifies the effects of its parents
6. Conditional Independence
- The relationship between conditional independence and BN structure is important for understanding how BNs work
7. Conditional Independence: Causal Chains
- Causal chains give rise to conditional independence
- Example: smoking causes cancer, which causes dyspnoea
smoking --> cancer --> dyspnoea
Given cancer, dyspnoea is independent of smoking: P(dyspnoea | cancer, smoking) = P(dyspnoea | cancer)
8. Conditional Independence: Common Causes
- Common causes (or ancestors) also give rise to conditional independence
- Example: cancer is a common cause of the two symptoms, a positive X-ray and dyspnoea
X-ray (A) <-- cancer (B) --> dyspnoea (C)
Question: is (A indep C) given B? Yes: given the common cause, the two effects are conditionally independent.
"I have dyspnoea (C) because of cancer (B), so I do not need an X-ray test."
9. Conditional Dependence: Common Effects
- Common effects (or their descendants) give rise to conditional dependence
- Example: cancer is a common effect of pollution and smoking. Given cancer, smoking "explains away" pollution
pollution (A) --> cancer (C) <-- smoking (B)
Question: is (A indep B) given C? No: given the common effect, the two causes become dependent.
If we know that you smoke and have cancer, we do not need to assume that your cancer was caused by pollution.
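To make explaining away concrete, here is a minimal Python sketch. All the numbers are assumed for illustration (they are not from the slides): pollution (A) and smoking (B) are independent causes of cancer (C), yet become dependent once we condition on C.

```python
# Explaining away, numerically (assumed illustrative numbers).
from itertools import product

p_a = 0.1            # P(pollution = high) -- assumed
p_b = 0.3            # P(smoking = true) -- assumed
p_c = {              # P(cancer = true | A, B) -- assumed
    (True, True): 0.05, (True, False): 0.02,
    (False, True): 0.03, (False, False): 0.001,
}

# Full joint P(A, B, C) from the chain rule, with A and B independent.
joint = {}
for a, b, c in product([True, False], repeat=3):
    pc = p_c[(a, b)] if c else 1 - p_c[(a, b)]
    joint[(a, b, c)] = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b) * pc

def prob(pred):
    """Probability of the event defined by pred over the joint table."""
    return sum(p for k, p in joint.items() if pred(*k))

# P(A | C) vs. P(A | C, B): learning B (smoking) lowers belief in A (pollution).
p_a_given_c = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)
p_a_given_cb = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
print(p_a_given_c, p_a_given_cb)   # the second value is smaller: explaining away
```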
10. Joint Distributions for Describing Uncertain Worlds
- Researchers have already found numerous and dramatic benefits of joint distributions for describing uncertain worlds
- Students in robotics and Artificial Intelligence have to understand the problems with using joint distributions directly
- You should discover how the Bayes net methodology allows us to build joint distributions in manageable chunks
11. Bayes Net Methodology
Why do Bayesian methods matter?
- Bayesian methods are one of the most important conceptual advances to have emerged in the Machine Learning / AI field since 1995
- A clean, clear, manageable language and methodology for expressing what the robot designer is certain and uncertain about
- Already many practical applications, for instance in medicine, factories, and helpdesks:
- P(this problem | these symptoms) // we will use P to denote probability
- anomalousness of this observation
- choosing the next diagnostic test | these observations
13. Problem 1: Creating a Joint Distribution Table
- The joint distribution table is an important concept
14. Probabilistic Truth Table
- You can guess this table, or you can take the data from some statistics
- You can build this table based on some partial tables
A truth table of all combinations of the Boolean variables
15. Idea: use decision diagrams to represent these data
16. Use of independence while creating the tables
17. The Wet-Sprinkler-Rain Example
18. Wet-Sprinkler-Rain Example
[Figure: nodes W (wet), S (sprinkler), R (rain)]
19. Problem 1: Creating the Joint Table
20. Our goal is to derive this table
Observe that if I know 7 of these values, the eighth is uniquely determined, since they sum to 1.
So I need to guess, calculate, or find 2^n - 1 = 7 values.
But the same data can be stored explicitly or implicitly, not necessarily in the form of a table!
What extra assumptions can help to create this table?
21. Wet-Sprinkler-Rain Example
22. Sprinkler on given that it rained: P(S | R)
You need to understand causation when you create the table
Wet-Sprinkler-Rain Example: understanding of causation
23. Independence Simplifies Probabilities
We use the independence of variables S and R:
P(S | R) = P(S): sprinkler on given that it rained
We can use these probabilities to create the table
S and R are independent
Wet-Sprinkler-Rain Example
24. Wet-Sprinkler-Rain Example
We create the CPT for S and R based on our knowledge of the problem
Conditional Probability Table (CPT)
[Figure: CPTs for "It rained" (R), "Sprinkler was on" (S), "Grass is wet" (W)]
What about children playing, or a dog urinating? Such causes are still possible; they are covered by the residual value 0.1
This first step shows the collected data
25. Full joint for only S and R
The independence of S and R is used
[Table values: 0.95, 0.90, 0.90, 0.01]
Wet-Sprinkler-Rain Example
Use the chain rule for probabilities
26. Chain Rule for Probabilities
For random variables W, S, R: P(W, S, R) = P(W | S, R) * P(S | R) * P(R), and with S and R independent this becomes P(W | S, R) * P(S) * P(R)
[Table values: 0.95, 0.90, 0.90, 0.01]
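As a sketch of the chain rule in code, the following builds the full joint P(W, S, R) from the pieces, assuming the four values above are the entries of P(W | S, R) and picking illustrative priors P(R) and P(S):

```python
# Build the full joint by the chain rule, using independence of S and R:
# P(W, S, R) = P(W | S, R) * P(S) * P(R).
from itertools import product

p_r = 0.2                       # P(rain) -- assumed
p_s = 0.1                       # P(sprinkler) -- assumed
p_w = {                         # assumed reading of the slide values as P(W | S, R)
    (True, True): 0.95, (True, False): 0.90,
    (False, True): 0.90, (False, False): 0.01,
}

joint = {}                      # keys are (w, s, r) truth assignments
for w, s, r in product([True, False], repeat=3):
    pw = p_w[(s, r)] if w else 1 - p_w[(s, r)]
    joint[(w, s, r)] = pw * (p_s if s else 1 - p_s) * (p_r if r else 1 - p_r)

assert abs(sum(joint.values()) - 1.0) < 1e-9   # a joint must sum to 1
```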
27. Full Joint Probability
- You have a table
- You want to calculate some probability, e.g. P(W)
Wet-Sprinkler-Rain Example
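Continuing the chain-rule sketch above, P(W) is obtained by marginalizing S and R out of the joint:

```python
# Marginalize S and R out of the joint built in the chain-rule sketch.
p_w_true = sum(p for (w, s, r), p in joint.items() if w)
print(p_w_true)   # P(W = true)
```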
28. Independence of S and R means fewer numbers are needed to create the complete joint table for W, S, and R
Six numbers: P(R), P(S), and the four entries of P(W | S, R)
Here we reduced the count only from seven to six numbers
Wet-Sprinkler-Rain Example
29. Explanation of Diagrammatic Notations, such as Bayes Networks
You do not need to build the complete table!
30. You can build a graph of tables, with nodes that correspond to certain types of tables
31. Wet-Sprinkler-Rain Example
32. Wet-Sprinkler-Rain Example
[Figure: CPTs for "It rained" (R), "Sprinkler was on" (S), "Grass is wet" (W)]
This first step shows the collected data
Conditional Probability Table (CPT)
33. Full Joint Probability
- You have a table
- You want to calculate some probability, e.g. P(W)
Once you have this table you can modify it, and you can also calculate everything!
34. Problem 2: Calculating conditional probabilities from the Joint Distribution Table
35. Wet-Sprinkler-Rain Example
Probability that S = T and W = T: P(S = T, W = T)
Probability that the grass is wet given that the sprinkler was on: P(W = T | S = T) = P(S = T, W = T) / P(S = T)
Probability that S = T: P(S = T)
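Using the joint table from the chain-rule sketch above, this conditional probability is just a ratio of sums:

```python
# P(W = T | S = T) = P(W = T, S = T) / P(S = T), from the joint table above.
p_ws = sum(p for (w, s, r), p in joint.items() if w and s)   # P(W = T, S = T)
p_s_only = sum(p for (w, s, r), p in joint.items() if s)     # P(S = T)
print(p_ws / p_s_only)                                       # P(W = T | S = T)
```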
36. Wet-Sprinkler-Rain Example
37. We showed examples of both causal inference and diagnostic inference
We will use this in the next slide
Wet-Sprinkler-Rain Example
38. Explaining Away the Facts from the Table
Calculated earlier from this table: P(R = T | W = T, S = T) < P(R = T | W = T)
Wet-Sprinkler-Rain Example
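The same table exhibits explaining away numerically (still using the assumed numbers of the chain-rule sketch):

```python
# Explaining away from the joint table: learning that the sprinkler was on
# lowers the probability that it rained, given that the grass is wet.
def cond(pred_num, pred_den):
    """Conditional probability of one event given another, from the joint."""
    num = sum(p for k, p in joint.items() if pred_num(*k))
    den = sum(p for k, p in joint.items() if pred_den(*k))
    return num / den

p_r_given_w = cond(lambda w, s, r: r and w, lambda w, s, r: w)
p_r_given_ws = cond(lambda w, s, r: r and w and s, lambda w, s, r: w and s)
print(p_r_given_w, p_r_given_ws)   # the second is smaller: S explains W away
```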
39. Conclusions on This Problem
- The table can be used for explaining away
- The table can be used to calculate conditional independence
- The table can be used to calculate conditional probabilities
- The table can be used to determine causality
40. Problem 3: What if S and R are dependent? Calculating conditional independence
41. Conditional Independence of S and R
Wet-Sprinkler-Rain Example
42. Diagrammatic notation for conditional independence of two variables
Wet-Sprinkler-Rain Example, extended
43. Conditional Independence Formalized for Sets of Variables
Sets of variables S1 and S2 are conditionally independent given a set S3 if P(S1 | S2, S3) = P(S1 | S3)
44. Now we will explain conditional independence in the Cloudy-Wet-Sprinkler-Rain example
45. Example: Lung Cancer Diagnosis
46. Example: Lung Cancer Diagnosis
- A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer
- The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer
- She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to
- A positive X-ray would indicate either TB or lung cancer
47. Nodes and Values in Bayesian Networks
- Q: What are the nodes to represent, and what values can they take?
- A: Nodes can be discrete or continuous
- Boolean nodes represent propositions taking binary values. Example: the Cancer node represents the proposition "the patient has cancer"
- Ordered values. Example: a Pollution node with values low, medium, high
- Integral values. Example: Age, with possible values 1-120
Lung Cancer Example
48. Lung Cancer Example: Nodes and Values
Node name   Type     Values
Pollution   Binary   low, high
Smoker      Boolean  T, F
Cancer      Boolean  T, F
Dyspnoea    Boolean  T, F
Xray        Binary   pos, neg
(Dyspnoea = shortness of breath)
Example of variables as nodes in a BN
49. Lung Cancer Example: Bayesian Network Structure
[Figure: Pollution --> Cancer <-- Smoker; Cancer --> Xray; Cancer --> Dyspnoea]
Lung Cancer Example
50. Conditional Probability Tables (CPTs) in Bayesian Networks
51. Conditional Probability Tables (CPTs) in Bayesian Networks
- After specifying the topology, we must specify the CPT for each discrete node
- Each row of a CPT contains the conditional probability of each node value for one possible combination of values of its parent nodes
- Each row of a CPT must sum to 1
- A CPT for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities
- A node with no parents has one row (its prior probabilities)
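A minimal sketch of how such a CPT can be stored in code, using the Cancer node with parents P (pollution) and S (smoker); the probability values are assumed for illustration:

```python
# CPT for the Cancer node. Each key is one parent assignment (one CPT row);
# we store P(C = true | parents), so the complement makes the row sum to 1.
cpt_cancer = {
    # (pollution_high, smoker): P(cancer = True | P, S) -- assumed numbers
    (True, True): 0.05,
    (True, False): 0.02,
    (False, True): 0.03,
    (False, False): 0.001,
}

def p_cancer(c, pollution_high, smoker):
    """Look up P(C = c | P, S) from the CPT."""
    p_true = cpt_cancer[(pollution_high, smoker)]
    return p_true if c else 1 - p_true
```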
52. Lung Cancer Example: Example of a CPT
[Figure: CPT giving the probability of cancer for each state of the parent variables P and S, with rows such as Smoking = true, Pollution = low]
Notation: C = cancer, P = pollution, S = smoking, X = X-ray, D = dyspnoea
Bayesian network for cancer
Lung Cancer Example
53. Several small CPTs are used to create larger joint distribution tables (JDTs)
54. The Markov Property for Bayesian Networks
- Modelling with BNs requires assuming the Markov property: there are no direct dependencies in the system being modelled which are not already explicitly shown via arcs
- Example: smoking can influence dyspnoea only through causing cancer
55. Software: Netica for Bayesian networks and joint probabilities
56. Reasoning with Numbers Using the Netica Software
Here are the collected data
Lung Cancer Example
57. Representing the Joint Probability Distribution: Example
We want to calculate the joint probability. With C = cancer, P = pollution, S = smoking, X = X-ray, D = dyspnoea, the network structure gives:
P(P, S, C, X, D) = P(P) * P(S) * P(C | P, S) * P(X | C) * P(D | C)
This graph shows how we can calculate the joint probability from the other probabilities in the network
Lung Cancer Example
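A sketch of this factorization in code, reusing `cpt_cancer` from the CPT sketch above; the remaining priors and CPT numbers are assumed for illustration:

```python
# Factored joint for the cancer network:
# P(P, S, C, X, D) = P(P) * P(S) * P(C | P, S) * P(X | C) * P(D | C).
p_pollution_high = 0.1                    # P(P = high) -- assumed
p_smoker = 0.3                            # P(S = true) -- assumed
cpt_xray = {True: 0.9, False: 0.2}        # P(X = pos | C) -- assumed
cpt_dysp = {True: 0.65, False: 0.30}      # P(D = true | C) -- assumed

def joint_prob(p, s, c, x, d):
    """P(P=p, S=s, C=c, X=x, D=d) from the network factorization."""
    def val(p_true, flag):
        return p_true if flag else 1 - p_true
    return (val(p_pollution_high, p) * val(p_smoker, s)
            * val(cpt_cancer[(p, s)], c)   # cpt_cancer from the sketch above
            * val(cpt_xray[c], x) * val(cpt_dysp[c], d))
```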
58. Problem 4: Determining Causality and Bayes Nets
Advertisement Example
59. Causality and Bayes Nets: Advertisement Example
- Bayes nets allow one to learn about causal relationships
- One more example: marketing analysts want to know whether to increase, decrease, or leave unchanged the exposure of some advertisement in order to maximize profit from the sale of some product
- Advertised (A) and Buy (B) will be variables for someone having seen the advertisement or purchased the product
Advertised-Buy Example
60. Causality Example
- So we want to know the probability that B = true given that we force A = true, or A = false
- We could do this by finding two similar populations and observing B based on A = true for one and A = false for the other
- But it may be difficult or expensive to find such populations
- Advertised (A): seen the advertisement
- Buy (B): purchased the product
Advertised-Buy Example
61. How can causality be represented in a graph?
62. Markov Condition and Causal Markov Condition
- But how do we learn whether or not A causes B at all?
- The Markov Condition states: any node in a Bayes net is conditionally independent of its non-descendants given its parents
- The Causal Markov Condition (CMC) states: any phenomenon in a causal net is independent of its non-effects given its direct causes
Advertised (A) and Buy (B)
Advertised-Buy Example
63. Acyclic Causal Graph versus Bayes Net
- Thus, if we have a directed acyclic causal graph C for the variables in X, then, by the Causal Markov Condition, C is also a Bayes net for the joint probability distribution of X
- The reverse is not necessarily true: a network may satisfy the Markov condition without depicting causality
Advertised-Buy Example
64. Causality Example: when we learn that p(b|a) and p(b|¬a) are not equal
- Given the Causal Markov Condition (CMC), we can infer causal relationships from conditional (in)dependence relationships learned from the data
- Suppose we learn with high Bayesian probability that p(b|a) and p(b|¬a) are not equal
- Given the CMC, there are four simple causal explanations for this (more complex ones too)
65. Causality Example: Four Causal Explanations
- A causes B: if they advertise more, you buy more
- B causes A: if you buy more, they have more money to advertise
66. Causality Example: Four Causal Explanations (continued)
- A hidden common cause of A and B (e.g. income): in a rich country they advertise more and people buy more
- A and B are causes for data selection (a.k.a. selection bias, perhaps if the database didn't record false instances of A and B): if you increase information about Ad in the database, then you also increase information about Buy in the database
67. Causality Example (continued)
- But we still don't know if A causes B
- Suppose we learn about the Income (I) and geographic Location (L) of the purchaser
- And we learn with high Bayesian probability the network on the right
Advertised (A = Ad) and Buy (B)
Advertised-Buy Example
68. Causality Example: Using the CMC
- Given the Causal Markov Condition (CMC), the ONLY causal explanation for the conditional (in)dependence relationships encoded in the Bayes net is that Ad is a cause of Buy
- That is, none of the other relationships or combinations thereof produce the probabilistic relationships encoded here
Advertised (Ad) and Buy (B)
Advertised-Buy Example
69. Causality in Bayes Networks
- Thus, Bayes nets allow inference of causal relationships by the Causal Markov Condition (CMC)
70. Problem 5: Determining D-separation in Bayesian Networks
71. D-separation in Bayesian Networks
- We will formulate a graphical criterion of conditional independence
- We can determine whether a set of nodes X is independent of another set Y, given a set of evidence nodes E, via the Markov property
- If every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E
72. Determining D-separation (cont.)
- A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
- A path is blocked given a set of nodes E if there is a node Z on the path for which one of three conditions holds (a code sketch follows after this list):
- Z is in E and Z has one arrow on the path leading in and one arrow out (chain)
- Z is in E and Z has both path arrows leading out (common cause)
- Neither Z nor any descendant of Z is in E, and both path arrows lead into Z (common effect)
Chain / Common cause / Common effect
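A minimal sketch of the path-blocking test; all names here are hypothetical helpers, and the DAG is given as a map from each node to its set of parents:

```python
def descendants(node, parents):
    """All nodes reachable from `node` by following child links."""
    kids = {n for n, ps in parents.items() if node in ps}
    out = set(kids)
    for k in kids:
        out |= descendants(k, parents)
    return out

def path_blocked(path, evidence, parents):
    """True if some inner node Z blocks this undirected path given evidence E."""
    for i in range(1, len(path) - 1):
        prev, z, nxt = path[i - 1], path[i], path[i + 1]
        arrow_in_from_prev = prev in parents[z]   # edge prev -> z on the path
        arrow_in_from_next = nxt in parents[z]    # edge nxt -> z on the path
        if arrow_in_from_prev and arrow_in_from_next:     # common effect
            if z not in evidence and not (descendants(z, parents) & evidence):
                return True
        elif z in evidence:                               # chain or common cause
            return True
    return False

# e.g. the chain A -> B -> C is blocked once B is observed:
parents = {"A": set(), "B": {"A"}, "C": {"B"}}
print(path_blocked(["A", "B", "C"], {"B"}, parents))      # True
```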
73. Another Example of Bayesian Networks: Alarm
Alarm Example
- Let us draw a BN from these data
74. Bayes Net Corresponding to the Alarm-Burglar Problem
Alarm Example
75. Compactness, Global Semantics, Local Semantics, and the Markov Blanket
[Figure: alarm network with nodes Earthquake, Burglar, John calls, Mary calls]
Alarm Example
76. Global Semantics, Local Semantics, and the Markov Blanket for BNs
77. Alarm Example
79. A node's Markov blanket consists of:
- its parents
- its children
- its children's other parents
80. Problem 6: How to Systematically Build a Bayes Network -- Example
82. Alarm Example
83. Alarm Example
84. Alarm Example
85. So we add an arrow
Alarm Example
86. Alarm Example
87. Alarm Example
88. Bayes Net for a Car that Does Not Want to Start
Such networks can be used for robot diagnostics, or for diagnosis of a human performed by a robot
89. Inference in Bayes Nets and How to Simplify It
Alarm Example
90. First method of simplification: enumeration (see the sketch below)
Alarm Example
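A sketch of exact inference by enumeration on the classic burglary-alarm network; the CPT values below are the well-known textbook (AIMA) numbers:

```python
# Inference by enumeration on the burglary-alarm network.
VARS = ["B", "E", "A", "J", "M"]          # a topological order of the network

def p(var, value, ev):
    """P(var = value | parents(var)), read off the CPTs."""
    if var == "B":
        pt = 0.001                         # P(burglary)
    elif var == "E":
        pt = 0.002                         # P(earthquake)
    elif var == "A":                       # P(alarm | B, E)
        pt = {(True, True): 0.95, (True, False): 0.94,
              (False, True): 0.29, (False, False): 0.001}[(ev["B"], ev["E"])]
    elif var == "J":                       # P(John calls | A)
        pt = 0.90 if ev["A"] else 0.05
    else:                                  # P(Mary calls | A)
        pt = 0.70 if ev["A"] else 0.01
    return pt if value else 1 - pt

def enumerate_all(vars_left, ev):
    """Sum the factored joint over every variable not fixed in `ev`."""
    if not vars_left:
        return 1.0
    v, rest = vars_left[0], vars_left[1:]
    if v in ev:
        return p(v, ev[v], ev) * enumerate_all(rest, ev)
    return sum(p(v, val, ev) * enumerate_all(rest, {**ev, v: val})
               for val in (True, False))

def query(var, ev):
    """P(var = True | ev), normalized by enumeration."""
    dist = [enumerate_all(VARS, {**ev, var: val}) for val in (True, False)]
    return dist[0] / sum(dist)

print(query("B", {"J": True, "M": True}))  # approx. 0.284
```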
91. Alarm Example
92. Second Method: Variable Elimination (a sketch of the factor operations follows below)
Alarm Example
Variable A was eliminated
Variable E was eliminated
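A minimal sketch of the two core operations behind variable elimination, factor multiplication and summing a variable out; the numbers in the usage example are dummies for illustration:

```python
# Factors are (variables, table): table maps assignment tuples to numbers.
from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    (v1, t1), (v2, t2) = f1, f2
    vs = v1 + [v for v in v2 if v not in v1]
    table = {}
    for asg in product([True, False], repeat=len(vs)):
        env = dict(zip(vs, asg))
        table[asg] = (t1[tuple(env[v] for v in v1)]
                      * t2[tuple(env[v] for v in v2)])
    return (vs, table)

def sum_out(var, factor):
    """Eliminate `var` from a factor by summing over its two values."""
    vs, t = factor
    i = vs.index(var)
    new_t = {}
    for asg, val in t.items():
        key = asg[:i] + asg[i + 1:]
        new_t[key] = new_t.get(key, 0.0) + val
    return (vs[:i] + vs[i + 1:], new_t)

# e.g. eliminate E from P(E) * P(A | B, E) (dummy values in the second factor):
f_e = (["E"], {(True,): 0.002, (False,): 0.998})
f_a = (["A", "B", "E"], {asg: 0.5 for asg in product([True, False], repeat=3)})
f = sum_out("E", multiply(f_e, f_a))
```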
93. Polytrees are better: inference in a polytree is tractable, while inference in general networks can encode 3SAT
3SAT Example
94. IDEA: Convert the DAG to a polytree
95. Clustering is used to convert non-polytree BNs
96. EXAMPLE: Clustering is used to convert non-polytree BNs
Not a polytree --> Is a polytree
Alarm Example
97. Approximate Inference
- Direct sampling methods
- Rejection sampling
- Likelihood weighting
- Markov chain Monte Carlo
98. 1. Direct Sampling Methods
99. Direct Sampling
Direct sampling generates minterms (complete assignments of all variables) together with their probabilities
100. We start from the top
Notation: W = wet, C = cloudy, R = rain, S = sprinkler
Wet-Sprinkler-Rain Example
101. Cloudy = yes
Wet-Sprinkler-Rain Example
102. Cloudy = yes
Wet-Sprinkler-Rain Example
103. Sprinkler = no
Wet-Sprinkler-Rain Example
104. Wet-Sprinkler-Rain Example
105. Wet-Sprinkler-Rain Example
106. We generated a sample minterm: C ∧ ¬S ∧ R ∧ W
Wet-Sprinkler-Rain Example
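A sketch of direct (prior) sampling for the cloudy-sprinkler-rain-wet network, assuming the standard textbook CPT values for this example:

```python
# Direct sampling: draw each variable in topological order from its CPT.
import random

def sample_once():
    """Sample one minterm (C, S, R, W)."""
    c = random.random() < 0.5                              # P(C)
    s = random.random() < (0.1 if c else 0.5)              # P(S | C)
    r = random.random() < (0.8 if c else 0.2)              # P(R | C)
    w_prob = {(True, True): 0.99, (True, False): 0.90,
              (False, True): 0.90, (False, False): 0.0}[(s, r)]
    w = random.random() < w_prob                           # P(W | S, R)
    return c, s, r, w

samples = [sample_once() for _ in range(100_000)]
# Each sample is one minterm; relative frequencies estimate the joint.
print(sum(1 for c, s, r, w in samples if w) / len(samples))   # approx. P(W)
```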
107. 2. Rejection Sampling Methods
108. Rejection Sampling
- Reject samples that are inconsistent with the evidence
Wet-Sprinkler-Rain Example
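Rejection sampling reuses the samples from the direct-sampling sketch above and simply discards those inconsistent with the evidence (here W = true):

```python
# Keep only samples consistent with the evidence W = true, then estimate.
kept = [smp for smp in samples if smp[3]]                  # W is the 4th field
p_r_given_w = sum(1 for c, s, r, w in kept if r) / len(kept)
print(p_r_given_w)                                         # approx. P(R | W = true)
```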
109. 3. Likelihood Weighting Methods
111. Wet-Sprinkler-Rain Example
Notation: W = wet, C = cloudy, R = rain, S = sprinkler
112. Wet-Sprinkler-Rain Example
113. Wet-Sprinkler-Rain Example
114. Wet-Sprinkler-Rain Example
115. Wet-Sprinkler-Rain Example
116. Wet-Sprinkler-Rain Example
117. Wet-Sprinkler-Rain Example
118. Likelihood Weighting vs. Rejection Sampling
- Both generate consistent estimates of the joint distribution conditioned on the values of the evidence variables
- Likelihood weighting converges faster to the correct probabilities
- But even likelihood weighting degrades with many evidence variables, because a few samples will have nearly all of the total weight (a sketch follows below)
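A self-contained sketch of likelihood weighting on the same network (same assumed CPT values): evidence variables are fixed rather than sampled, and each sample carries the weight of the evidence given its parents:

```python
# Likelihood weighting with evidence W = true.
import random

W_CPT = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.0}         # P(W = true | S, R)

def weighted_sample():
    """Sample the non-evidence variables; weight by P(evidence | parents)."""
    c = random.random() < 0.5                              # P(C)
    s = random.random() < (0.1 if c else 0.5)              # P(S | C)
    r = random.random() < (0.8 if c else 0.2)              # P(R | C)
    return (c, s, r), W_CPT[(s, r)]                        # W = true is fixed

pairs = [weighted_sample() for _ in range(100_000)]
num = sum(w for (c, s, r), w in pairs if r)
den = sum(w for _, w in pairs)
print(num / den)                               # approx. P(R = true | W = true)
```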
120. Sources
- Prof. David Page
- Matthew G. Lee
- Nuria Oliver
- Barbara Rosario
- Alex Pentland
- Ehrlich Av
- Ronald J. Williams
- Andrew Moore's tutorial with the same title
- Russell & Norvig's AIMA site
- Alpaydin's Introduction to Machine Learning site