Title: Cooperating Intelligent Systems
1. Cooperating Intelligent Systems
- Bayesian networks
- Chapter 14, AIMA
2. Inference
- Inference in the statistical setting means computing probabilities for different outcomes given the available information.
- We need an efficient method for doing this, one that is more powerful than the naïve Bayes model.
3. Bayesian networks
- A Bayesian network is a directed graph in which each node is annotated with quantitative probability information.
- A set of random variables, X1, X2, X3, ..., makes up the nodes of the network.
- A set of directed links connects pairs of nodes, parent → child.
- Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)).
- The graph is a directed acyclic graph (DAG).
4. The dentist network
Nodes: Cavity, Weather, Catch, Toothache
5. The alarm network
A burglar alarm responds to both earthquakes and burglars. Two neighbors, John and Mary, have promised to call you when the alarm goes off. John always calls when there's an alarm, and sometimes when there's not an alarm. Mary sometimes misses the alarm (she likes loud music).
Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
6. The cancer network
From Breese and Koller 1997
7. The cancer network
Nodes: Age, Gender, Toxics, Smoking, Cancer, GeneticDamage, SerumCalcium, LungTumour
P(A,G) = P(A) P(G)
P(C | S,T,A,G) = P(C | S,T)
P(SC,C,LT,GD) = P(SC | C) P(LT | C,GD) P(C) P(GD)
Full joint:
P(A,G,T,S,C,SC,LT,GD) = P(A) P(G) P(T | A) P(S | A,G) · P(C | T,S) · P(GD) P(SC | C) P(LT | C,GD)
From Breese and Koller 1997
8. The product (chain) rule
P(X1, ..., Xn) = Πi P(Xi | Parents(Xi))
(This is the form for Bayesian networks; the general case comes later in this lecture.)
9. A Bayes network node is a function
Parents A = a and B = b are inputs to node C, which outputs the distribution P(C | a, b) (in the figure, e.g., a uniform distribution U(0.7, 1.9)).
10. A Bayes network node is a function
Nodes: A, B → C
- A BN node is a conditional distribution function
- Input: the parent values
- Output: a distribution over the node's own values
- Any type of function from values to distributions can be used.
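As a minimal Python sketch of this view (the node names and numbers are illustrative, not from the lecture), a boolean node's conditional distribution can literally be stored as a function from parent values to a distribution:

```python
# CPT for a boolean node C with boolean parents A and B, viewed as a
# function: parent values -> distribution over C. The table stores
# P(C=True | a, b); the numbers are made up for illustration.
cpt_C = {
    (True, True): 0.95, (True, False): 0.60,
    (False, True): 0.30, (False, False): 0.01,
}

def p_C(c, a, b):
    """Return P(C=c | A=a, B=b)."""
    p_true = cpt_C[(a, b)]
    return p_true if c else 1.0 - p_true
```

Because every input combination maps to a full distribution, the two rows for C=True and C=False always sum to one.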
11. Example: The alarm network
Note: each number in the tables represents a boolean distribution. Hence there is a distribution output for every input.
12. Example: The alarm network
Probability distribution for no earthquake, no burglary, but alarm, and both Mary and John make the call.
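This entry can be sketched in Python with the product rule, using the standard alarm-network CPT values from AIMA (assumed here, since the tables themselves live in the slide figures):

```python
# Standard AIMA CPT values for the alarm network.
P_B, P_E = 0.001, 0.002                      # priors P(B=true), P(E=true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}              # P(J=true | A)
P_M = {True: 0.70, False: 0.01}              # P(M=true | A)

# P(j, m, a, ~e, ~b) = P(j|a) P(m|a) P(a|~b,~e) P(~e) P(~b)
p = P_J[True] * P_M[True] * P_A[(False, False)] * (1 - P_E) * (1 - P_B)
print(p)  # ~ 0.000628
```

Each factor is one CPT lookup, so the joint for any full assignment costs one multiplication per node.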
13. Meaning of a Bayesian network
The BN is a correct representation of the domain iff each node is conditionally independent of its predecessors, given its parents.
14. The alarm network
The fully correct alarm network might look something like the figure. The Bayesian network (red) assumes that some of the variables are independent (or that the dependencies can be neglected since they are very weak). The correctness of the Bayesian network of course depends on the validity of these assumptions.
It is this sparse connection structure that makes the BN approach feasible (linear growth in complexity rather than exponential).
15. How to construct a BN?
- Add nodes in causal order (causal order determined from expertise).
- Determine conditional independence using either (or all) of the following semantics:
- Blocking/d-separation rule
- Non-descendant rule
- Markov blanket rule
- Experience/your beliefs
16. Path blocking & d-separation
- Intuitively, knowledge about Serum Calcium influences our belief about Cancer if we don't know the value of Cancer, which in turn influences our belief about Lung Tumour, etc.
- However, if we are given the value of Cancer (i.e., C true or false), then knowledge of Serum Calcium will not tell us anything about Lung Tumour that we don't already know.
- We say that Cancer d-separates (direction-dependent separation) Serum Calcium and Lung Tumour.
17Path blocking d-separation
Xi and Xj are d-separated if all paths betweeen
them are blocked
- Two nodes Xi and Xj are conditionally independent
given a set W X1,X2,X3,... of nodes if for
every undirected path in the BN between Xi and Xj
there is some node Xk on the path having one of
the following three properties
- Xk ? W, and both arcs on the path lead out of
Xk. - Xk ? W, and one arc on the path leads into Xk
and one arc leads out. - Neither Xk nor any descendant of Xk is in W, and
both arcs on the path lead into Xk. - Xk blocks the path between Xi and Xj
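The three blocking rules can be sketched as a small d-separation test in Python. The structure below is the cancer network from the earlier slides; the function names and the path-enumeration approach are my own (fine for toy networks, exponential in general):

```python
# Cancer network structure, encoded as child -> list of parents.
parents = {
    "Toxics": ["Age"], "Smoking": ["Age", "Gender"],
    "Cancer": ["Toxics", "Smoking"],
    "SerumCalcium": ["Cancer"], "LungTumour": ["Cancer", "GeneticDamage"],
    "Age": [], "Gender": [], "GeneticDamage": [],
}

def neighbours(x):
    return set(parents[x]) | {c for c, ps in parents.items() if x in ps}

def descendants(x):
    out, stack = set(), [x]
    while stack:
        n = stack.pop()
        for c, ps in parents.items():
            if n in ps and c not in out:
                out.add(c)
                stack.append(c)
    return out

def all_paths(x, y, path=None):
    """Yield every simple undirected path from x to y."""
    path = path or [x]
    if x == y:
        yield path
        return
    for n in neighbours(x):
        if n not in path:
            yield from all_paths(n, y, path + [n])

def blocked(path, W):
    """Is this path blocked by evidence set W (the three rules)?"""
    for i in range(1, len(path) - 1):
        a, k, b = path[i - 1], path[i], path[i + 1]
        a_in, b_in = a in parents[k], b in parents[k]
        if a_in and b_in:                     # collider: both arcs into Xk
            if k not in W and not (descendants(k) & set(W)):
                return True
        elif k in W:                          # chain or diverging node in W
            return True
    return False

def d_separated(x, y, W):
    return all(blocked(p, set(W)) for p in all_paths(x, y))
```

For example, SerumCalcium and LungTumour are d-separated given Cancer (diverging node in W), while conditioning on LungTumour opens the collider path between Age and GeneticDamage.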
18. Non-descendants
- A node is conditionally independent of its non-descendants (Zij), given its parents.
19. Markov blanket
(Figure: node Xk with surrounding nodes X1, ..., X6.)
- A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents.
- These constitute the node's Markov blanket.
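The blanket follows directly from the graph structure. A minimal Python sketch, reusing the cancer network from the earlier slides (the encoding and function name are my own):

```python
# Cancer network structure, encoded as child -> list of parents.
parents = {
    "Toxics": ["Age"], "Smoking": ["Age", "Gender"],
    "Cancer": ["Toxics", "Smoking"],
    "SerumCalcium": ["Cancer"], "LungTumour": ["Cancer", "GeneticDamage"],
    "Age": [], "Gender": [], "GeneticDamage": [],
}

def markov_blanket(x):
    """Parents + children + children's other parents of node x."""
    children = [c for c, ps in parents.items() if x in ps]
    blanket = set(parents[x]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(x)
    return blanket
```

For Cancer this yields its parents (Toxics, Smoking), its children (SerumCalcium, LungTumour), and LungTumour's other parent (GeneticDamage).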
20. Efficient representation of PDs
Node C with parents A and B: P(C | a, b) = ?
- Boolean → Boolean
- Boolean → Discrete
- Boolean → Continuous
- Discrete → Boolean
- Discrete → Discrete
- Discrete → Continuous
- Continuous → Boolean
- Continuous → Discrete
- Continuous → Continuous
21. Efficient representation of PDs
- Boolean → Boolean: Noisy-OR, Noisy-AND
- Boolean/Discrete → Discrete: Noisy-MAX
- Bool./Discr./Cont. → Continuous: Parametric distribution (e.g., Gaussian)
- Continuous → Boolean: Logit/Probit
22. Noisy-OR example (Boolean → Boolean)
The effect (E) is off (false) when none of the causes is true. The probability of the effect increases with the number of true causes (for this example).
Example from L.E. Sucar
23. Noisy-OR, general case (Boolean → Boolean)
The example on the previous slide used qi = 0.1 for all i.
Image adapted from Laskey & Mahoney 1999
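A minimal sketch of the noisy-OR combination rule: each true cause independently fails to trigger the effect with probability qi, so the effect is false only if every active cause fails. The qi = 0.1 value matches the slide's example; the function name is my own:

```python
def noisy_or(causes, q=0.1):
    """P(E=true | causes): noisy-OR with inhibition probability q per
    true cause. `causes` is a list of booleans; false causes do not
    contribute (the leak-free case)."""
    p_all_fail = 1.0
    for active in causes:
        if active:
            p_all_fail *= q       # this cause fails with probability q
    return 1.0 - p_all_fail
```

With qi = 0.1, one true cause gives 0.9, two give 0.99, three give 0.999: the probability of the effect rises with the number of true causes, and the CPT needs only n parameters instead of 2^n.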
24. Noisy-MAX (Boolean → Discrete)
- The effect takes on the max value from the different causes.
- Restrictions:
- Each cause must have an off state, which does not contribute to the effect.
- The effect is off when all causes are off.
- The effect must have consecutive escalating values, e.g., absent, mild, moderate, severe.
Image adapted from Laskey & Mahoney 1999
25. Parametric probability densities (Boolean/Discr./Continuous → Continuous)
- Use parametric probability densities, e.g., the normal distribution.
Gaussian networks (a = input to the node)
26. Probit & Logit (Continuous → Boolean)
- If the input is continuous but the output is boolean, use probit or logit.
(Figure: P(A | x) as a function of x.)
27. The cancer network
- Age: 1-10, 11-20, ... (discrete)
- Gender: M, F (discrete/boolean)
- Toxics: Low, Medium, High (discrete)
- Smoking: No, Light, Heavy (discrete)
- Cancer: No, Benign, Malignant (discrete)
- Serum Calcium: Level (continuous)
- Lung Tumour: Yes, No (discrete/boolean)
28. Inference in BN
- Inference means computing P(X | e), where X is a query (variable) and e is a set of evidence variables (for which we know the values).
- Examples:
- P(Burglary | john_calls, mary_calls)
- P(Cancer | age, gender, smoking, serum_calcium)
- P(Cavity | toothache, catch)
29. Exact inference in BN
- Doable for boolean variables: look up entries in the conditional probability tables (CPTs).
30. Example: The alarm network
What is the probability of a burglary if both John and Mary call?
Evidence variables: J, M. Query variable: B.
31. Example: The alarm network
What is the probability of a burglary if both John and Mary call?
P(B) = 0.001 = 10^-3
32. Example: The alarm network
What is the probability of a burglary if both John and Mary call?
33. Example: The alarm network
What is the probability of a burglary if both John and Mary call?
34. Example: The alarm network
What is the probability of a burglary if both John and Mary call?
Answer: P(B | j, m) ≈ 0.28 (28%)
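The computation can be sketched as inference by enumeration in Python, using the standard AIMA CPT values for the alarm network (assumed here, since the slides hold them in figures): sum the full joint over the hidden variables E and A, then normalise over B.

```python
from itertools import product

# Standard AIMA CPT values for the alarm network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def joint(b, e, a, j, m):
    """Full joint P(b, e, a, j, m) via the BN product rule."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Evidence: J = M = true. Sum out hidden E and A, normalise over B.
num = sum(joint(True, e, a, True, True)
          for e, a in product([True, False], repeat=2))
den = num + sum(joint(False, e, a, True, True)
                for e, a in product([True, False], repeat=2))
print(num / den)  # ~ 0.284
```

With these tables the posterior comes out near 0.284, i.e. roughly 28%.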
35. Use depth-first search
A lot of unnecessary repeated computation...
36. Complexity of exact inference
- By eliminating repeated calculations and uninteresting paths we can speed up the inference a lot.
- Linear time complexity for singly connected networks (polytrees).
- Exponential for multiply connected networks.
- Clustering can improve this.
37. Approximate inference in BN
- Exact inference is intractable in large multiply connected BNs → use approximate inference: Monte Carlo methods (random sampling).
- Direct sampling
- Rejection sampling
- Likelihood weighting
- Markov chain Monte Carlo
38. Markov chain Monte Carlo
- Fix the evidence variables (E1, E2, ...) at their given values.
- Initialize the network with values for all other variables, including the query variable.
- Repeat the following many, many, many times:
- Pick a non-evidence variable at random (query Xi or hidden Yj).
- Select a new value for this variable, conditioned on the current values in the variable's Markov blanket.
- Monitor the values of the query variables.
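The loop above can be sketched as Gibbs sampling for P(B | j, m) in the alarm network, again assuming the standard AIMA CPT values. Evidence J = M = true is clamped; B, E, and A are resampled in turn from their Markov-blanket conditionals:

```python
import random

# Standard AIMA CPT values for the alarm network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def p_a_given(b, e, a):
    return P_A[(b, e)] if a else 1 - P_A[(b, e)]

def gibbs(n_samples, seed=0):
    rng = random.Random(seed)
    b, e, a = False, False, False          # arbitrary initial state
    count = 0
    for _ in range(n_samples):
        # Resample B given its Markov blanket {E, A}.
        w = {v: (P_B if v else 1 - P_B) * p_a_given(v, e, a)
             for v in (True, False)}
        b = rng.random() < w[True] / (w[True] + w[False])
        # Resample E given its Markov blanket {B, A}.
        w = {v: (P_E if v else 1 - P_E) * p_a_given(b, v, a)
             for v in (True, False)}
        e = rng.random() < w[True] / (w[True] + w[False])
        # Resample A given {B, E} and the clamped evidence j, m.
        w = {v: p_a_given(b, e, v) * P_J[v] * P_M[v]
             for v in (True, False)}
        a = rng.random() < w[True] / (w[True] + w[False])
        count += b                          # monitor the query variable
    return count / n_samples

print(gibbs(100_000))  # should approach P(B | j, m) ~ 0.284
```

The fraction of samples with B = true converges to the exact posterior (≈ 0.284 with these tables); each update touches only the variable's Markov blanket, never the full joint.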