1
22c:145 Artificial Intelligence: Bayesian Networks
  • Reading: Ch. 14, Russell & Norvig

2
Review of Probability Theory
  • Random Variables
  • The probability that a random variable X has
    value val is written as P(X = val)
  • P : domain → [0, 1]
  • Sums to 1 over the domain
  • P(Raining = true) = P(Raining) = 0.2
  • P(Raining = false) = P(¬Raining) = 0.8
  • Joint distribution
  • P(X1, X2, ..., Xn)
  • A probability assignment to all combinations of
    values of the random variables; it provides complete
    information about the probabilities of those random
    variables.
  • A JPD table for n random variables, each ranging
    over k distinct values, has k^n entries!

3
Review of Probability Theory
  • Conditioning
  • P(A) = P(A | B) P(B) + P(A | ¬B) P(¬B)
         = P(A ∧ B) + P(A ∧ ¬B)
  • A and B are independent iff
  • P(A ∧ B) = P(A) P(B)
  • P(A | B) = P(A)
  • P(B | A) = P(B)
  • A and B are conditionally independent given C iff
  • P(A | B, C) = P(A | C)
  • P(B | A, C) = P(B | C)
  • P(A ∧ B | C) = P(A | C) P(B | C)
  • Bayes Rule
  • P(A | B) = P(B | A) P(A) / P(B)
  • P(A | B, C) = P(B | A, C) P(A | C) / P(B | C)
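To make these identities concrete, here is a minimal Python sketch (added for this transcript, not part of the original slides): it encodes a small joint distribution over two Boolean variables and checks the conditioning and Bayes' rule formulas numerically; the numbers are arbitrary.

    # Minimal sketch (not from the slides): a tiny joint distribution over two
    # Boolean variables A and B, used to check conditioning and Bayes' rule.
    joint = {
        (True, True): 0.12, (True, False): 0.08,
        (False, True): 0.28, (False, False): 0.52,
    }

    def marginal(index, value):
        """P(X = value), where X is the variable at this tuple position."""
        return sum(p for assign, p in joint.items() if assign[index] == value)

    def cond_a_given_b(a, b):
        """P(A = a | B = b) = P(A = a, B = b) / P(B = b)."""
        return joint[(a, b)] / marginal(1, b)

    p_a, p_b = marginal(0, True), marginal(1, True)

    # Conditioning: P(A) = P(A | B) P(B) + P(A | not B) P(not B)
    assert abs(p_a - (cond_a_given_b(True, True) * p_b +
                      cond_a_given_b(True, False) * (1 - p_b))) < 1e-9

    # Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
    p_b_given_a = joint[(True, True)] / p_a
    assert abs(cond_a_given_b(True, True) - p_b_given_a * p_a / p_b) < 1e-9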

4
Bayesian Networks
  • To do probabilistic reasoning, you need to know
    the joint probability distribution
  • But, in a domain with N propositional variables,
    one needs 2^N numbers to specify the joint
    probability distribution
  • We want to exploit independences in the domain
  • Two components: structure and numerical parameters

5
Bayesian networks
  • A simple, graphical notation for conditional
    independence assertions and hence for compact
    specification of full joint distributions
  • Syntax
  • a set of nodes, one per variable
  • a directed, acyclic graph (a link means "directly
    influences")
  • a conditional distribution for each node given
    its parents:
  • P(Xi | Parents(Xi))
  • In the simplest case, the conditional distribution is
    represented as a conditional probability table
    (CPT) giving the distribution over Xi for each
    combination of parent values

6
Bayesian (Belief) Networks
  • Set of random variables, each with a finite set of
    values
  • Set of directed arcs between them forming an acyclic
    graph, representing causal relations
  • Every node A, with parents B1, ..., Bn, has
  • P(A | B1, ..., Bn) specified

7
Key Advantage
  • The conditional independencies (missing arrows)
    mean that we can store and compute the joint
    probability distribution more efficiently
  • How to design a Belief Network?
  • Explore the causal relations

8
Icy Roads
Inspector Smith is waiting for Holmes and Watson,
who are driving (separately) to meet him. It is
winter. His secretary tells him that Watson has
had an accident. He says, It must be that the
roads are icy. I bet that Holmes will have an
accident too. I should go to lunch. But, his
secretary says, No, the roads are not icy, look
at the window. So, he says, I guess I better
wait for Holmes.
Causal Component
10
Icy Roads
Causal component (diagram): Icy → Holmes Crash
11
Icy Roads
Causal component (diagram): Icy → Holmes Crash, Icy → Watson Crash.
H and W are dependent ...
12
Icy Roads
Causal component (diagram): Icy → Holmes Crash, Icy → Watson Crash.
H and W are dependent, but conditionally independent given I.
13
Holmes and Watson in IA
Holmes and Watson have moved to IA. Holmes wakes up
to find his lawn wet. He wonders if it has
rained or if he left his sprinkler on. He looks
at his neighbor Watson's lawn and sees it is
wet too. So, he concludes it must have rained.
15
Holmes and Watson in IA
Diagram: Rain and Sprinkler → Holmes Lawn Wet; Rain → Watson Lawn Wet
17
Holmes and Watson in IA
Diagram: Rain and Sprinkler → Holmes Lawn Wet; Rain → Watson Lawn Wet.
Given W, P(R) goes up.
18
Holmes and Watson in IA
Diagram: Rain and Sprinkler → Holmes Lawn Wet; Rain → Watson Lawn Wet.
Given W, P(R) goes up and P(S) goes down:
explaining away.
19
Inference in Bayesian Networks: Query Types
  • Given a Bayesian network, what questions might we
    want to ask?
  • Conditional probability query: P(x | e)
  • Maximum a posteriori probability:
  • What value of x maximizes P(x | e)?
  • General question: What's the whole probability
    distribution over variable X given evidence e,
    P(X | e)?

20
Using the joint distribution
  • To answer any query involving a conjunction of
    variables, sum over the variables not involved in
    the query.
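As an illustration of this summing-out operation, the following Python sketch (not from the slides; the variable names and numbers are made up) answers queries directly from a full joint distribution stored as a table.

    # Sketch: answer queries from the full joint distribution by summing over
    # the variables not involved in the query. Names and numbers are made up.
    from itertools import product

    variables = ["Rain", "Sprinkler", "WetLawn"]
    joint = {                      # P(Rain, Sprinkler, WetLawn), sums to 1
        (True, True, True): 0.05,  (True, True, False): 0.01,
        (True, False, True): 0.25, (True, False, False): 0.04,
        (False, True, True): 0.10, (False, True, False): 0.05,
        (False, False, True): 0.02, (False, False, False): 0.48,
    }

    def prob(**query):
        """P(query), e.g. prob(Rain=True, WetLawn=True)."""
        total = 0.0
        for assignment in product([True, False], repeat=len(variables)):
            world = dict(zip(variables, assignment))
            if all(world[name] == value for name, value in query.items()):
                total += joint[assignment]
        return total

    print(prob(Rain=True, WetLawn=True))                        # P(R, W)
    print(prob(Rain=True, WetLawn=True) / prob(WetLawn=True))   # P(R | W)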

22
Chain Rule
  • Variables V1, ..., Vn
  • Values v1, ..., vn
  • P(V1 = v1, V2 = v2, ..., Vn = vn) = ∏i P(Vi = vi |
    parents(Vi))

Example network (from the slide's diagram): A → C, B → C, C → D,
with CPTs P(A), P(B), P(C | A, B), P(D | C)
23
Chain Rule
P(ABCD) = P(A = true, B = true, C = true, D = true)
24
Chain Rule
P(ABCD) = P(A = true, B = true, C = true, D = true)
P(ABCD) = P(D | A, B, C) P(A, B, C)
25
Chain Rule
P(ABCD) = P(A = true, B = true, C = true, D = true)
P(ABCD) = P(D | A, B, C) P(A, B, C)
        = P(D | C) P(A, B, C)
(A is independent of D given C; B is independent of D given C)
26
Chain Rule
P(ABCD) = P(A = true, B = true, C = true, D = true)
P(ABCD) = P(D | A, B, C) P(A, B, C)
        = P(D | C) P(A, B, C)
        = P(D | C) P(C | A, B) P(A, B)
(A is independent of D given C; B is independent of D given C)
27
Chain Rule
P(ABCD) = P(A = true, B = true, C = true, D = true)
P(ABCD) = P(D | A, B, C) P(A, B, C)
        = P(D | C) P(A, B, C)
        = P(D | C) P(C | A, B) P(A, B)
        = P(D | C) P(C | A, B) P(A) P(B)
(A is independent of D given C; B is independent of D given C;
A is independent of B)
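The factorization can be evaluated mechanically once the CPTs are known. The Python sketch below does this for the A, B, C, D network; the CPT values are hypothetical placeholders, since the lecture's tables are not in the transcript.

    # Sketch of P(A, B, C, D) = P(A) P(B) P(C | A, B) P(D | C).
    # The CPT values are hypothetical placeholders, not the lecture's tables.
    p_A = 0.3
    p_B = 0.6
    p_C_given_AB = {(True, True): 0.9, (True, False): 0.5,
                    (False, True): 0.4, (False, False): 0.1}
    p_D_given_C = {True: 0.7, False: 0.2}

    def joint(a, b, c, d):
        """P(A=a, B=b, C=c, D=d) via the network factorization."""
        pa = p_A if a else 1 - p_A
        pb = p_B if b else 1 - p_B
        pc = p_C_given_AB[(a, b)] if c else 1 - p_C_given_AB[(a, b)]
        pd = p_D_given_C[c] if d else 1 - p_D_given_C[c]
        return pa * pb * pc * pd

    print(joint(True, True, True, True))   # 0.3 * 0.6 * 0.9 * 0.7 = 0.1134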
29
Icy Roads with Numbers
t = true, f = false
(The CPTs, as used in the following slides: P(I = t) = 0.7;
P(W = t | I = t) = 0.8, P(W = t | I = f) = 0.1;
P(H = t | I = t) = 0.8, P(H = t | I = f) = 0.1.)
The right-hand column in these tables is
redundant, since we know the entries in each row
must add to 1. NB: the columns need NOT add to 1.
32
Probability that Watson Crashes
P(I) = 0.7
P(W) = ?
33
Probability that Watson Crashes
P(I) = 0.7
P(W) = P(W | I) P(I) + P(W | ¬I) P(¬I)
     = 0.8 × 0.7 + 0.1 × 0.3 = 0.56 + 0.03 = 0.59
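The same calculation in Python, using the numbers from the slide (a small sketch added for this transcript):

    # P(W) by summing over the hidden variable I (the slide's numbers).
    p_I = 0.7
    p_W_given_I = {True: 0.8, False: 0.1}
    p_W = p_W_given_I[True] * p_I + p_W_given_I[False] * (1 - p_I)
    print(p_W)   # 0.8*0.7 + 0.1*0.3 = 0.59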
34
Probability of Icy given Watson
P(I) = 0.7
Diagram: Icy → Watson Crash
P(I | W) = ?
35
Probability of Icy given Watson
P(I) = 0.7
Diagram: Icy → Watson Crash
P(I | W) = P(W | I) P(I) / P(W)
         = 0.8 × 0.7 / 0.59 ≈ 0.95
We started with P(I) = 0.7; knowing that Watson
crashed raised the probability to 0.95.
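Again as a small sketch, the Bayes' rule update from the slide:

    # P(I | W) by Bayes' rule, with P(W) = 0.59 from the previous slide.
    p_I, p_W_given_I, p_W = 0.7, 0.8, 0.59
    p_I_given_W = p_W_given_I * p_I / p_W
    print(round(p_I_given_W, 2))   # 0.95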
36
Probability of Holmes given Watson
P(I) = 0.7
Diagram: Icy → Holmes Crash, Icy → Watson Crash
P(H | W) = ?
37
Probability of Holmes given Watson
P(I) = 0.7
Diagram: Icy → Holmes Crash, Icy → Watson Crash
P(H | W) = P(H, I | W) + P(H, ¬I | W)
         = P(H | W, I) P(I | W) + P(H | W, ¬I) P(¬I | W)
         = P(H | I) P(I | W) + P(H | ¬I) P(¬I | W)
         = 0.8 × 0.95 + 0.1 × 0.05 = 0.765
We started with P(H) = 0.59; knowing that Watson
crashed raised the probability to 0.765.
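A sketch of the whole chain of reasoning, P(W), then P(I | W), then P(H | W), with the slide's numbers; the small discrepancy with 0.765 comes from the slide rounding P(I | W) to 0.95 before the last step.

    # P(H | W): sum over Icy, using conditional independence of H and W given I.
    p_I = 0.7
    p_W_given_I = {True: 0.8, False: 0.1}
    p_H_given_I = {True: 0.8, False: 0.1}

    p_W = p_W_given_I[True] * p_I + p_W_given_I[False] * (1 - p_I)   # 0.59
    p_I_given_W = p_W_given_I[True] * p_I / p_W                      # ~0.949

    # P(H | W) = P(H | I) P(I | W) + P(H | not I) P(not I | W)
    p_H_given_W = (p_H_given_I[True] * p_I_given_W +
                   p_H_given_I[False] * (1 - p_I_given_W))
    print(round(p_H_given_W, 3))   # 0.764 (0.765 on the slide, which rounds first)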
38
Prob of Holmes given Icy and Watson
P(I) = 0.7
Diagram: Icy → Holmes Crash, Icy → Watson Crash
P(H | W, ¬I) = P(H | ¬I) = 0.1
Once we know whether the roads are icy, Watson's crash
tells us nothing more about Holmes: H and W are
conditionally independent given I.
39
Example
  • Topology of network encodes conditional
    independence assertions
  • Weather is independent of the other variables
  • Toothache and Catch are conditionally independent
    given Cavity

40
Example
  • I'm at work, neighbor John calls to say my alarm
    is ringing, but neighbor Mary doesn't call.
    Sometimes it's set off by minor earthquakes. Is
    there a burglar?
  • Variables: Burglary, Earthquake, Alarm,
    JohnCalls, MaryCalls
  • Network topology reflects "causal" knowledge
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call

41
Example contd.
42
Compactness
  • A CPT for Boolean Xi with k Boolean parents has
    2^k rows for the combinations of parent values
  • Each row requires one number p for Xi = true (the
    number for Xi = false is just 1 - p)
  • If each variable has no more than k parents, the
    complete network requires O(n · 2^k) numbers
  • I.e., grows linearly with n, vs. O(2^n) for the
    full joint distribution
  • For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers
    (vs. 2^5 - 1 = 31)
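A quick check of this count in Python (the parent counts are those of the burglary network from the earlier slides):

    # Number of independent CPT entries in the burglary network vs. the full joint.
    parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}   # number of parents per node
    bn_numbers = sum(2 ** k for k in parents.values())   # 1+1+4+2+2 = 10
    joint_numbers = 2 ** len(parents) - 1                # 2^5 - 1 = 31
    print(bn_numbers, joint_numbers)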

43
Semantics
  • The full joint distribution is defined as the
    product of the local conditional distributions
  • P(X1, ..., Xn) = ∏i=1..n P(Xi | Parents(Xi))
  • e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
  • = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
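A sketch evaluating this product numerically. The CPT values below are the standard ones from the Russell & Norvig burglary example; the tables themselves are not reproduced in this transcript, so treat them as assumed.

    # P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
    # CPT values assumed from the textbook's burglary example (not in this transcript).
    p_b, p_e = 0.001, 0.002
    p_a_given_be = {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}
    p_j_given_a = {True: 0.90, False: 0.05}
    p_m_given_a = {True: 0.70, False: 0.01}

    p = (p_j_given_a[True] * p_m_given_a[True] *
         p_a_given_be[(False, False)] * (1 - p_b) * (1 - p_e))
    print(p)   # ~0.00063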

44
Constructing Bayesian networks
  • 1. Choose an ordering of variables X1, ..., Xn
  • 2. For i = 1 to n
  • add Xi to the network
  • select parents from X1, ..., Xi-1 such that
  • P(Xi | Parents(Xi)) = P(Xi | X1, ..., Xi-1)
  • This choice of parents guarantees
  • P(X1, ..., Xn) = ∏i=1..n P(Xi | X1, ..., Xi-1)
    (chain rule)
  • = ∏i=1..n P(Xi | Parents(Xi)) (by construction)
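A schematic Python sketch of this construction loop (added here for illustration). The is_independent function is a placeholder for whatever conditional-independence judgment is available, such as domain knowledge or a statistical test; it is not specified on the slides, and the greedy parent pruning below is just one simple way to realize "select a small set of parents".

    # Schematic sketch of the construction loop above. is_independent is a
    # placeholder for a conditional-independence judgment (domain knowledge or
    # a statistical test); the greedy pruning is one simple way to pick a
    # small parent set, not necessarily the minimal one.
    def build_network(ordering, is_independent):
        """Return {variable: list of parents} for the given ordering."""
        parents = {}
        for i, x in enumerate(ordering):
            chosen = list(ordering[:i])          # start with all predecessors
            for y in ordering[:i]:
                rest = [z for z in chosen if z != y]
                if is_independent(x, y, rest):   # drop y if x is independent of y given the rest
                    chosen = rest
            parents[x] = chosen
        return parents

    # e.g. build_network(["M", "J", "A", "B", "E"], my_ci_test)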

45
Example
  • Suppose we choose the ordering M, J, A, B, E
  • P(J | M) = P(J)?

46
Example
  • Suppose we choose the ordering M, J, A, B, E
  • P(J | M) = P(J)? No
  • P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?

47
Example
  • Suppose we choose the ordering M, J, A, B, E
  • P(J | M) = P(J)? No
  • P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
  • P(B | A, J, M) = P(B | A)?
  • P(B | A, J, M) = P(B)?

48
Example
  • Suppose we choose the ordering M, J, A, B, E
  • P(J | M) = P(J)? No
  • P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
  • P(B | A, J, M) = P(B | A)? Yes
  • P(B | A, J, M) = P(B)? No
  • P(E | B, A, J, M) = P(E | A)?
  • P(E | B, A, J, M) = P(E | A, B)?

49
Example
  • Suppose we choose the ordering M, J, A, B, E
  • P(J | M) = P(J)? No
  • P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
  • P(B | A, J, M) = P(B | A)? Yes
  • P(B | A, J, M) = P(B)? No
  • P(E | B, A, J, M) = P(E | A)? No
  • P(E | B, A, J, M) = P(E | A, B)? Yes

50
Example contd.
  • Deciding conditional independence is hard in
    noncausal directions
  • (Causal models and conditional independence seem
    hardwired for humans!)
  • Network is less compact: 1 + 2 + 4 + 2 + 4 = 13
    numbers needed

51
Exercises
P(J, M, A, B, E) = ?  P(M, A, B) = ?  P(¬M, A, B) = ?
P(A, B) = ?  P(M, B) = ?  P(A | J) = ?
52
Summary
  • Bayesian networks provide a natural
    representation for (causally induced) conditional
    independence
  • Topology + CPTs = compact representation of joint
    distribution
  • Generally easy for domain experts to construct