Bayesian Networks

1
Used in Spring 2012, Spring 2013, Winter 2014
(partially)
  1. Bayesian Networks
  2. Conditional Independence
  3. Creating Tables
  4. Notations for Bayesian Networks
  5. Calculating conditional probabilities from the
    tables
  6. Calculating conditional independence
  7. Markov Chain Monte Carlo
  8. Markov Models.
  9. Markov Models and Probabilistic methods in vision

2
Introduction to Probabilistic Robotics
  1. Probabilities
  2. Bayes rule
  3. Bayes filters
  4. Bayes networks
  5. Markov Chains

3
Bayesian Networks and Markov Models
  • Bayesian networks and Markov models
  • Applications in User Modeling
  • Applications in Natural Language Processing
  • Applications in robotic control
  • Applications in robot Vision

4
Bayesian Networks (BNs) Overview
  • Introduction to BNs
  • Nodes, structure and probabilities
  • Reasoning with BNs
  • Understanding BNs
  • Extensions of BNs
  • Decision Networks
  • Dynamic Bayesian Networks (DBNs)

5
Definition of Bayesian Networks
  • A data structure that represents the dependence
    between variables
  • Gives a concise specification of the joint
    probability distribution
  • A Bayesian Network is a directed acyclic graph
    (DAG) in which the following holds:
  • A set of random variables makes up the nodes in
    the network
  • A set of directed links connects pairs of nodes
  • Each node has a probability distribution that
    quantifies the effects of its parents
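As a concrete illustration of this definition, here is a minimal sketch (Python, not from the slides) of how such a structure can be stored: each node keeps a list of parents and a CPT giving P(node = True | parent values). The priors are placeholder numbers; the Wet CPT echoes values that appear later on the slides, but their pairing with parent combinations is an assumption.

```python
# A minimal sketch of a Bayesian network as a Python data structure.
# Priors are illustrative placeholders; the Wet CPT pairing is assumed.
network = {
    "Rain":      {"parents": [], "cpt": {(): 0.2}},
    "Sprinkler": {"parents": [], "cpt": {(): 0.3}},
    "Wet":       {"parents": ["Sprinkler", "Rain"],
                  "cpt": {(True, True): 0.95, (True, False): 0.90,
                          (False, True): 0.90, (False, False): 0.01}},
}

def p_true(node, assignment):
    """P(node = True | values of its parents taken from `assignment`)."""
    info = network[node]
    key = tuple(assignment[p] for p in info["parents"])
    return info["cpt"][key]

print(p_true("Wet", {"Sprinkler": True, "Rain": False}))   # -> 0.9
```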

6
Conditional Independence
  • The relationship between conditional independence
    and BN structure is important for understanding
    how BNs work

7
Conditional Independence: Causal Chains
  • Causal chains give rise to conditional
    independence
  • Example: Smoking causes cancer, which causes
    dyspnoea

Diagram: smoking → cancer → dyspnoea
8
Conditional Independence: Common Causes
  • Common causes (or ancestors) also give rise to
    conditional independence. Example: Cancer
    is a common cause of the two symptoms, a positive
    Xray and dyspnoea

Diagram: Xray (A) ← cancer (B) → dyspnoea (C)
(A ⊥ C) | B
I have dyspnoea (C) because of cancer (B), so I
do not need an Xray test
9
Conditional Dependence: Common Effects
  • Common effects (or their descendants) give rise
    to conditional dependence
  • Example: Cancer is a common effect of pollution
    and smoking. Given cancer, smoking explains
    away pollution

Diagram: pollution (A) → cancer (C) ← smoking (B)
Given C, A and B are NOT independent: ¬(A ⊥ B | C)
We know that you smoke and have cancer, so we do not
need to assume that your cancer was caused by
pollution
10
Joint Distributions for describing uncertain
worlds
  • Researchers have already found numerous and dramatic
    benefits of Joint Distributions for describing
    uncertain worlds
  • Students in robotics and Artificial
    Intelligence have to understand problems with
    using Joint Distributions
  • You should discover how Bayes Net methodology
    allows us to build Joint Distributions in
    manageable chunks

11
Bayes Net methodology
Why do Bayesian methods matter?
  1. Bayesian Methods are one of the most important
    conceptual advances in the Machine Learning / AI
    field to have emerged since 1995.
  2. A clean, clear, manageable language and
    methodology for expressing what the robot
    designer is certain and uncertain about
  3. Already many practical applications, for
    instance in medicine, factories and helpdesks
  4. P(this problem | these symptoms) // we will use
    P as probability
  5. anomalousness of this observation
  6. choosing the next diagnostic test | these
    observations

13
Problem 1: Creating the Joint Distribution Table
  • The Joint Distribution Table is an important concept

14
Probabilistic truth table
  • You can guess this table, you can take data from
    some statistics,
  • You can build this table based on some partial
    tables

Truth table of all combinations of Boolean
Variables
15
  • Idea: use decision diagrams to represent these
    data.

16
  • Use of independence while creating the tables

17
  • Wet Sprinkler Rain Example

18
Diagram: nodes W (wet), S (sprinkler), R (rain)
Wet-Sprinkler-Rain Example
19
  • Problem 1: Creating the Joint Table

20
Our goal is to derive this table.
Let us observe that if I know 7 of these values, the
eighth is uniquely determined, as their sum is 1.
So I need to guess, calculate or find 2^n - 1 = 7
values.
But the same data can be stored explicitly or
implicitly, not necessarily in the form of a
table!
What extra assumptions can help to create this
table?
21
Wet-Sprinkler-Rain Example
22
P(S | R): sprinkler on under the condition that it rained
You need to understand causation when you create
the table
Wet-Sprinkler-Rain Example
Understanding of causation
23
Independence simplifies probabilities
We use the independence of variables S and R:
P(S | R) = P(S) (sprinkler on under the condition that it
rained)
We can use these probabilities to create the table
S and R are independent
Wet-Sprinkler-Rain Example
24
Wet-Sprinkler-Rain Example
We create the CPT for S and R based on our
knowledge of the problem
Conditional Probability Table (CPT)
Node labels: it rained (R), sprinkler was on (S), grass is wet (W)
What about children playing or a dog urinating on the
grass? That is still possible, and is covered by this value of 0.1
This first step shows the collected data
25
Full joint for only S and R
Independence of S and R is used
Values shown on the slide: 0.95, 0.90, 0.90, 0.01
Wet-Sprinkler-Rain Example
Use the chain rule for probabilities
26
Chain Rule for Probabilities
For random variables W, S, R:
P(W, S, R) = P(W | S, R) · P(S | R) · P(R)
Values shown on the slide: 0.95, 0.90, 0.90, 0.01
27
Full joint probability
  • You have a table
  • You want to calculate some probability

P(W)
Wet-Sprinkler-Rain Example
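A small Python sketch (not from the slides) of this step: build the eight-row joint table for W, S, R with the chain rule P(W, S, R) = P(W | S, R) · P(S) · P(R), using the independence of S and R, then marginalize to obtain P(W). The P(W = T | S, R) values echo the numbers visible on the slides (0.95, 0.90, 0.90, 0.01), but their pairing with (S, R) combinations, and the priors P(S) and P(R), are assumptions made here for illustration.

```python
from itertools import product

# Assumed priors and slide-style P(W=T | S, R) values (pairing assumed).
P_S = 0.3                                        # P(Sprinkler = T), assumed
P_R = 0.2                                        # P(Rain = T), assumed
P_W_given_SR = {(True, True): 0.95, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def prob(is_true, p):
    return p if is_true else 1 - p

# Chain rule with S and R independent: P(W,S,R) = P(W|S,R) P(S) P(R)
joint = {}
for w, s, r in product([True, False], repeat=3):
    joint[(w, s, r)] = (prob(w, P_W_given_SR[(s, r)])
                        * prob(s, P_S) * prob(r, P_R))

print(sum(joint.values()))                       # all 8 entries sum to 1
p_w = sum(p for (w, s, r), p in joint.items() if w)   # marginalize out S, R
print(p_w)                                       # P(W = T)
```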
28
Independence of S and R implies calculating fewer
numbers to create the complete Joint Table for W,
S and R
Six numbers
We reduced only from seven to six numbers
Wet-Sprinkler-Rain Example
29
  • Explanation of Diagrammatic Notations
  • such as Bayes Networks

You do not need to build the complete table!!
30
You can build a graph of tables or nodes which
correspond to certain types of tables
31
Wet-Sprinkler-Rain Example
32
Wet-Sprinkler-Rain Example
Node labels: it rained (R), sprinkler was on (S), grass is wet (W)
This first step shows the collected data
Conditional Probability Table (CPT)
33
Full joint probability
  • You have a table
  • You want to calculate some probability

When you have this table you can modify it; you
can also calculate everything!
P(W)
34
  • Problem 2: Calculating conditional probabilities
    from the Joint Distribution Table

35
Wet-Sprinkler-Rain Example
Probability that S=T and W=T: P(S=T, W=T)
Probability that the grass is wet given that the
sprinkler was on: P(W=T | S=T) = P(W=T, S=T) / P(S=T)
Probability that S=T: P(S=T)
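As a sketch of this calculation (assuming a joint table keyed by Boolean tuples, such as the one built in the earlier chain-rule sketch), the conditional probability is just a ratio of two sums over the table:

```python
def conditional(joint, query_index, given_index):
    """P(X_query = T | X_given = T) from a joint table keyed by Boolean tuples."""
    num = sum(p for key, p in joint.items() if key[query_index] and key[given_index])
    den = sum(p for key, p in joint.items() if key[given_index])
    return num / den

# With the (w, s, r)-keyed joint from the earlier sketch:
#   conditional(joint, query_index=0, given_index=1)   ->   P(W=T | S=T)
```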
36
Wet-Sprinkler-Rain Example
37
We showed examples of both causal inference and
diagnostic inference
We will use this on the next slide
Wet-Sprinkler-Rain Example
38
Explaining Away the facts from the table
Calculated earlier from this table
Wet-Sprinkler-Rain Example
39
Conclusions on this problem
  1. Table can be used for Explaining Away
  2. Table can be used to calculate conditional
    independence.
  3. Table can be used to calculate conditional
    probabilities
  4. Table can be used to determine causality

40
  • Problem 3: What if S and R are dependent?
  • Calculating conditional independence

41
Conditional Independence of S and R
Wet-Sprinkler-Rain Example
42
Diagrammatic notation for conditional
Independence of two variables
Wet-Sprinkler-Rain Example extended
43
Conditional Independence formalized for sets of
variables
Diagram: sets of variables S1, S2, S3
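Stated formally (the standard definition; which of S1, S2, S3 plays each role in the slide's diagram is not recoverable, so the roles below are generic): S1 is conditionally independent of S2 given S3 exactly when, for all combinations of values,
P(S1 | S2, S3) = P(S1 | S3).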
44
Now we will explain conditional independence
CLOUDY - Wet-Sprinkler-Rain Example
45
Example: Lung Cancer Diagnosis
46
Example: Lung Cancer Diagnosis
  1. A patient has been suffering from shortness of
    breath (called dyspnoea) and visits the doctor,
    worried that he has lung cancer.
  2. The doctor knows that other diseases, such as
    tuberculosis and bronchitis, are possible causes,
    as well as lung cancer.
  3. She also knows that other relevant information
    includes whether or not the patient is a smoker
    (increasing the chances of cancer and bronchitis)
    and what sort of air pollution he has been
    exposed to.
  4. A positive Xray would indicate either TB or lung
    cancer.

47
Nodes and Values in Bayesian Networks
  • Q: What are the nodes to represent and what
    values can they take?
  • A: Nodes can be discrete or continuous
  • Boolean nodes represent propositions taking
    binary values. Example: the Cancer node represents
    the proposition "the patient has cancer"
  • Ordered values. Example: a Pollution node with
    values low, medium, high
  • Integral values. Example: Age with possible values
    1-120

Lung Cancer
48
Lung Cancer Example Nodes and Values
Node name   Type     Values
Pollution   Binary   low, high
Smoker      Boolean  T, F
Cancer      Boolean  T, F
Dyspnoea    Boolean  T, F
Xray        Binary   pos, neg
(Dyspnoea = shortness of breath)
Example of variables as nodes in a BN
49
Lung Cancer Example: Bayesian Network Structure
Diagram: Pollution → Cancer ← Smoker; Cancer → Xray; Cancer → Dyspnoea
Lung Cancer
50
Conditional Probability Tables (CPTs) in
Bayesian Networks
51
Conditional Probability Tables (CPTs) in Bayesian
Networks
  • After specifying the topology, we must specify
    the CPT for each discrete node
  • Each row of the CPT contains the conditional
    probability of each node value for one possible
    combination of values of its parent nodes
  • Each row of the CPT must sum to 1
  • A CPT for a Boolean variable with n Boolean
    parents contains 2^(n+1) probabilities
  • A node with no parents has one row (its prior
    probabilities)
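A small sketch of such a CPT for the Cancer node with parents Pollution and Smoker. The probabilities below are assumed for illustration (they match the usual textbook version of this example, but may not be exactly the slides' numbers). There is one row per parent combination, each row holds P(C=T) and P(C=F), and every row sums to 1.

```python
# CPT for Cancer given (Pollution, Smoker); pollution takes values
# "low"/"high", smoker is Boolean.  Numbers are assumed, not the slides'.
cpt_cancer = {
    ("high", True):  (0.05,  0.95),   # (P(C=T | P, S), P(C=F | P, S))
    ("high", False): (0.02,  0.98),
    ("low",  True):  (0.03,  0.97),
    ("low",  False): (0.001, 0.999),
}

# Every row must sum to 1, and there is one row per parent combination.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in cpt_cancer.values())
print(len(cpt_cancer), "rows,", sum(len(r) for r in cpt_cancer.values()), "numbers")
```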

52
Lung Cancer Example: example of a CPT
The CPT gives the probability of cancer given the state
of the variables P and S; one row, for example, has
Pollution = low and Smoking = true
Legend: C = cancer, P = pollution, S = smoking, X = Xray,
D = Dyspnoea
Bayesian Network for cancer
Lung Cancer
53
  • Several small CPTs are used to create larger joint distribution tables (JDTs).

54
The Markov Property for Bayesian Networks
  • Modelling with BNs requires assuming the Markov
    Property:
  • there are no direct dependencies in the system
    being modelled which are not already explicitly
    shown via arcs
  • Example: smoking can influence dyspnoea only
    through causing cancer

55
Software: NETICA for Bayesian Networks and
joint probabilities
56
Reasoning with Numbers Using Netica software
Here are the collected data
Lung Cancer
57
Representing the Joint Probability Distribution:
Example
We want to calculate this
Legend: P = pollution, S = smoking, X = Xray, D = Dyspnoea
This graph shows how we can calculate the joint
probability from other probabilities in the
network
Lung Cancer
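Following the network structure shown earlier (Pollution and Smoker are parents of Cancer; Xray and Dyspnoea are children of Cancer), the joint probability factorizes in the standard BN way, stated here explicitly since the slide's formula did not survive the transcript:
P(P, S, C, X, D) = P(P) · P(S) · P(C | P, S) · P(X | C) · P(D | C)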
58
Problem 4: Determining Causality and Bayes Nets
Advertisement example
59
Causality and Bayes Nets Advertisement example
  • Bayes nets allow one to learn about causal
    relationships
  • One more Example
  • Marketing analysts want to know whether to
    increase, decrease or leave unchanged the
    exposure of some advertisement in order to
    maximize profit from the sale of some product
  • Advertised (A) and Buy (B) will be variables for
    someone having seen the advertisement or
    purchased the product

Advertised-Buy Example
60
Causality Example
  1. So we want to know the probability that B = true
    given that we force A = true, or A = false
  2. We could do this by finding two similar
    populations and observing B based on A = true for
    one and A = false for the other
  3. But it may be difficult or expensive to find
    such populations
  • Advertised (A): has seen the advertisement
  • Buy (B): has purchased the product

Advertised-Buy Example
61
How can causality be represented in a graph?
62
Markov condition and Causal Markov Condition
  • But how do we learn whether or not A causes B at
    all?
  • The Markov Condition states
  • Any node in a Bayes net is conditionally
    independent of its non-descendants given its
    parents
  • The CAUSAL Markov Condition (CMC) states
  • Any phenomenon in a causal net is independent of
    its non-effects given its direct causes

Advertised (A) and Buy (B)
Advertised-Buy Example
63
Acyclic Causal Graph versus Bayes Net
  • Thus, if we have a directed acyclic causal graph
    C for variables in X, then, by the Causal Markov
    Condition, C is also a Bayes net for the joint
    probability distribution of X
  • The reverse is not necessarily true: a network may
    satisfy the Markov condition without depicting
    causality

Advertised-Buy Example
64
Causality Example: when we learn that P(b | a) and
P(b | ¬a) are not equal
  • Given the Causal Markov Condition CMC, we can
    infer causal relationships from conditional
    (in)dependence relationships learned from the
    data
  • Suppose we learn with high Bayesian probability
    that P(b | a) and P(b | ¬a) are not equal
  • Given the CMC, there are four simple causal
    explanations for this (more complex ones too)

65
Causality Example: four causal explanations
  • 1. A causes B: if they advertise more, you buy more
  • 2. B causes A: if you buy more, they have more money
    to advertise
66
Causality Example: four causal explanations,
continued (selection bias)
  • 3. Hidden common cause of A and B (e.g. income):
    in a rich country they advertise more and they
    buy more
  • 4. A and B are causes for data selection
    (a.k.a. selection bias, perhaps if the database
    didn't record false instances of A and B):
    if you increase the information about Ad in the
    database, then you also increase the information
    about Buy in the database
67
Causality Example continued
  • But we still don't know if A causes B
  • Suppose:
  • We learn about the Income (I) and geographic
    Location (L) of the purchaser
  • And we learn with high Bayesian probability the
    network on the right

Advertised (A = Ad) and Buy (B)
Advertised-Buy Example
68
Causality Example - using CMC
  • Given the Causal Markov Condition CMC, the ONLY
    causal explanation for the conditional
    (in)dependence relationships encoded in the Bayes
    net is that Ad is a cause for Buy
  • That is, none of the other relationships or
    combinations thereof produce the probabilistic
    relationships encoded here

Advertised (Ad) and Buy (B)
Advertised-Buy Example
69
Causality in Bayes Networks
  • Thus, Bayes Nets allow inference of causal
    relationships by the Causal Markov Condition (CMC)

70
Problem 5: Determining D-separation in Bayesian
Networks
71
D-separation in Bayesian Networks
  • We will formulate a Graphical Criterion of
    conditional independence
  • We can determine whether a set of nodes X is
    independent of another set Y, given a set of
    evidence nodes E, via the Markov property
  • If every undirected path from a node in X to a
    node in Y is d-separated by E, then X and Y are
    conditionally independent given E

72
Determining D-separation (cont)
  • A set of nodes E d-separates two sets of nodes X
    and Y, if every undirected path from a node in X
    to a node in Y is blocked given E
  • A path is blocked given a set of nodes E, if
    there is a node Z on the path for which one of
    three conditions holds
  • Z is in E and Z has one arrow on the path leading
    in and one arrow out (chain)
  • Z is in E and Z has both path arrows leading out
    (common cause)
  • Neither Z nor any descendant of Z is in E, and
    both path arrows lead into Z (common effect)

Diagrams: chain, common cause, common effect
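A minimal Python sketch of the three blocking conditions for a single undirected path (assumptions: the graph is given as a dict of parent lists, and the path is supplied explicitly as a list of node names). Full d-separation would additionally enumerate all undirected paths between X and Y.

```python
def descendants(node, parents):
    """All descendants of `node` in a DAG given as {child: [parents]}."""
    children = {n: [c for c, ps in parents.items() if n in ps] for n in parents}
    out, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def path_blocked(path, parents, evidence):
    """True if the undirected path is blocked given evidence set E,
    using the three conditions from the slide (chain, common cause,
    common effect)."""
    for i in range(1, len(path) - 1):
        prev, z, nxt = path[i - 1], path[i], path[i + 1]
        into_z_from_prev = prev in parents[z]          # edge prev -> z
        into_z_from_next = nxt in parents[z]           # edge nxt  -> z
        chain = into_z_from_prev != into_z_from_next             # -> z -> or <- z <-
        common_cause = not into_z_from_prev and not into_z_from_next   # <- z ->
        common_effect = into_z_from_prev and into_z_from_next          # -> z <-
        if z in evidence and (chain or common_cause):
            return True
        if common_effect and z not in evidence and not (descendants(z, parents) & evidence):
            return True
    return False

# Example with the lung-cancer structure used earlier:
parents = {"Pollution": [], "Smoker": [], "Cancer": ["Pollution", "Smoker"],
           "Xray": ["Cancer"], "Dyspnoea": ["Cancer"]}
print(path_blocked(["Xray", "Cancer", "Dyspnoea"], parents, {"Cancer"}))    # True
print(path_blocked(["Pollution", "Cancer", "Smoker"], parents, set()))      # True
print(path_blocked(["Pollution", "Cancer", "Smoker"], parents, {"Cancer"})) # False
```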
73
Another Example of Bayesian Networks: Alarm
Alarm Example
  • Let us draw the BN from these data

74
Bayes Net Corresponding to Alarm-Burglar problem
Alarm Example
75
Compactness, Global Semantics, Local Semantics
and Markov Blanket
  • Compactness of Bayes Net

Diagram: Burglar → Alarm ← Earthquake; Alarm → John calls; Alarm → Mary calls
Alarm Example
76
Global Semantics, Local Semantics and Markov
Blanket for BNs
  • Useful concepts

77
Alarm Example
79
  • A node's Markov blanket consists of
  • its parents
  • its children
  • its children's other parents
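A quick sketch of this definition in Python (graph represented as a dict of parent lists, as in the earlier sketches):

```python
def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`;
    `parents` maps each node to the list of its parents."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:                       # children's other parents
        blanket |= set(parents[c])
    blanket.discard(node)
    return blanket

# Alarm network structure (Burglar, Earthquake -> Alarm -> John calls, Mary calls)
parents = {"Burglar": [], "Earthquake": [], "Alarm": ["Burglar", "Earthquake"],
           "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"]}
print(markov_blanket("Burglar", parents))   # {'Earthquake', 'Alarm'}
```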

80
Problem 6: How to systematically build a Bayes
Network -- Example
82
Alarm Example
83
Alarm Example
84
Alarm Example
85
So we add an arrow
Alarm Example
86
Alarm Example
87
Alarm Example
88
Bayes Net for a car that does not want to start
Such networks can be used for robot diagnostics
or for the diagnosis of a human performed by a robot
89
Inference in Bayes Nets and how to simplify it
Alarm Example
90
First method of simplification: Enumeration
Alarm Example
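A minimal sketch of inference by enumeration on the burglary-alarm network. The CPT numbers below are the usual textbook values and are assumed here, since they did not survive in this transcript; the algorithm simply sums the full joint over the hidden variables.

```python
from itertools import product

P_B = 0.001                      # P(Burglary), assumed textbook value
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls=T | A)

def prob(var_true, p):
    return p if var_true else 1 - p

def joint(b, e, a, j, m):
    """Full joint via the chain rule along the network structure."""
    return (prob(b, P_B) * prob(e, P_E) * prob(a, P_A[(b, e)])
            * prob(j, P_J[a]) * prob(m, P_M[a]))

def enumerate_ask(query, evidence):
    """P(query = True | evidence), summing the joint over hidden variables."""
    names = ["B", "E", "A", "J", "M"]
    num = den = 0.0
    for values in product([True, False], repeat=5):
        world = dict(zip(names, values))
        if any(world[v] != val for v, val in evidence.items()):
            continue                     # inconsistent with the evidence
        p = joint(*values)
        den += p
        if world[query]:
            num += p
    return num / den

# e.g. P(Burglary | JohnCalls=T, MaryCalls=T), roughly 0.284 with these numbers
print(enumerate_ask("B", {"J": True, "M": True}))
```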
91
Alarm Example
92
Second method: Variable Elimination
Alarm Example
Variable A was eliminated
Variable E was eliminated
93
Polytrees are better
3SAT Example
94
  • IDEA: Convert DAGs to polytrees
95
  • Clustering is used to convert non-polytree BNs into polytrees
96
EXAMPLE: Clustering is used to convert
non-polytree BNs into polytrees
Not a polytree
Is a polytree
Alarm Example
97
  • Approximate Inference
  • Direct sampling methods
  • Rejection sampling
  • Likelihood weighting
  • Markov chain Monte Carlo

98
  • 1. Direct Sampling Methods

99
Direct Sampling
Direct Sampling generates complete assignments
(minterms) together with their probabilities
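A sketch of direct (prior) sampling for the Cloudy-Sprinkler-Rain-Wet network. The CPT values below are the usual textbook numbers and are assumed here; the slides' numbers may differ. Sampling every variable in topological order produces one complete minterm per call, with exactly its joint probability.

```python
import random

# Assumed CPTs for the Cloudy-Sprinkler-Rain-Wet network.
P_C = 0.5
P_S = {True: 0.10, False: 0.50}                       # P(S=T | C)
P_R = {True: 0.80, False: 0.20}                       # P(R=T | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}     # P(W=T | S, R)

def prior_sample():
    """Sample every variable in topological order from its CPT:
    direct sampling, producing one complete minterm."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return {"C": c, "S": s, "R": r, "W": w}

# Counting samples estimates any probability, e.g. P(R = T):
samples = [prior_sample() for _ in range(10000)]
print(sum(s["R"] for s in samples) / len(samples))
```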
100
We start from the top
Legend: W = wet, C = cloudy, R = rain, S = sprinkler
Wet Sprinkler Rain Example
101
Sampling step: Cloudy = yes
Wet Sprinkler Rain Example
102
Cloudy = yes
Wet Sprinkler Rain Example
103
Sampling step: Sprinkler = no
Wet Sprinkler Rain Example
104
Wet Sprinkler Rain Example
105
Wet Sprinkler Rain Example
106
We generated one sample minterm over C, S, R and W
(a complete assignment of all four variables)
Wet Sprinkler Rain Example
107
  • 2. Rejection Sampling Methods

108
Rejection Sampling
  • Reject samples that are inconsistent with the evidence

Wet Sprinkler Rain Example
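Continuing the direct-sampling sketch above (same assumed CPTs and the same prior_sample function), rejection sampling estimates a conditional probability by discarding every sample that contradicts the evidence:

```python
def rejection_sampling(query, evidence, n=10000):
    """Estimate P(query = T | evidence) by keeping only consistent samples."""
    kept = []
    for _ in range(n):
        s = prior_sample()                      # from the sketch above
        if all(s[v] == val for v, val in evidence.items()):
            kept.append(s[query])               # consistent with the evidence
    return sum(kept) / len(kept) if kept else float("nan")

# e.g. estimate P(Rain = T | Sprinkler = T)
print(rejection_sampling("R", {"S": True}))
```

Note how wasteful this is when the evidence is unlikely: most samples are thrown away, which motivates likelihood weighting below.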
109
  • 3. Likelihood weighting methods
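A sketch of likelihood weighting using the same assumed CPTs (P_C, P_S, P_R, P_W) and the random module from the direct-sampling sketch: evidence variables are fixed rather than sampled, and each sample is weighted by the likelihood of that evidence given its sampled parents.

```python
def weighted_sample(evidence):
    """Fix evidence variables; weight = product of their CPT probabilities."""
    sample, weight = {}, 1.0
    for var, p in [("C", P_C), (None, None), (None, None), (None, None)]:
        pass  # (expanded explicitly below for clarity)
    # Cloudy
    if "C" in evidence:
        sample["C"] = evidence["C"]; weight *= P_C if evidence["C"] else 1 - P_C
    else:
        sample["C"] = random.random() < P_C
    # Sprinkler
    p = P_S[sample["C"]]
    if "S" in evidence:
        sample["S"] = evidence["S"]; weight *= p if evidence["S"] else 1 - p
    else:
        sample["S"] = random.random() < p
    # Rain
    p = P_R[sample["C"]]
    if "R" in evidence:
        sample["R"] = evidence["R"]; weight *= p if evidence["R"] else 1 - p
    else:
        sample["R"] = random.random() < p
    # Wet grass
    p = P_W[(sample["S"], sample["R"])]
    if "W" in evidence:
        sample["W"] = evidence["W"]; weight *= p if evidence["W"] else 1 - p
    else:
        sample["W"] = random.random() < p
    return sample, weight

def likelihood_weighting(query, evidence, n=10000):
    """P(query = T | evidence) as a ratio of summed weights."""
    num = den = 0.0
    for _ in range(n):
        s, w = weighted_sample(evidence)
        den += w
        if s[query]:
            num += w
    return num / den

# e.g. estimate P(Rain = T | Sprinkler = T, Wet = T)
print(likelihood_weighting("R", {"S": True, "W": True}))
```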

Slides 111-117: likelihood weighting on the Wet Sprinkler
Rain Example (legend: W = wet, C = cloudy, R = rain,
S = sprinkler)
118
Likelihood weighting vs. rejection sampling
  1. Both generate consistent estimates of the joint
    distribution conditioned on the values of the
    evidence variables
  2. Likelihood weighting converges faster to the
    correct probabilities
  3. But even likelihood weighting degrades with many
    evidence variables because a few samples will
    have nearly all the total weight

119
(No Transcript)
120
Sources
  • Prof. David Page
  • Matthew G. Lee
  • Nuria Oliver,
  • Barbara Rosario
  • Alex Pentland
  • Ehrlich Av
  • Ronald J. Williams
  • Andrew Moore's tutorial with the same title
  • Russell and Norvig's AIMA site
  • Alpaydin's Introduction to Machine Learning
    site.