Title: Probabilistic Reasoning and Bayesian Belief Networks
1 Chapter 12
- Probabilistic Reasoning and Bayesian Belief Networks
2 Why probabilistic reasoning?
- Because the world is an uncertain place
3 Uncertainty
- Problem with the standard logical approach: we do not always know the complete truth about the environment
- Example
- Leave(t): leave for the airport t minutes before the flight
- Query: which value of t will get me to the airport on time?
4 Problems
- Why can't we determine t exactly?
- Partial observability
- road state, other drivers' plans
- Uncertainty in action outcomes
- flat tire
- Immense complexity of modeling and predicting traffic
5 Problems
- Three specific issues
- Laziness: too much work to list all antecedents or consequents
- Theoretical ignorance: not enough information on how the world works
- Practical ignorance: even if we know all the physics, we may not have all the facts
6 What happens with a purely logical approach?
- We either write something which risks falsehood
- Leave(45) will get me there on time
- Or something which leads to conclusions too weak to do anything with
- Leave(45) will get me there on time if there's no snow and there's no train crossing Airport Road and my tires remain intact and there isn't a student riot blocking Hudson Road and ...
- Leave(1440) might work fine, but then I'd have to spend the night in the airport
7 Types of Uncertainty
- Uncertainty in prior knowledge. E.g., some causes of a disease are unknown and are not represented in the background knowledge of a medical-assistant agent
8 Types of Uncertainty
- Uncertainty in prior knowledge. E.g., some causes of a disease are unknown and are not represented in the background knowledge of a medical-assistant agent
- Uncertainty in actions. E.g., actions are represented with relatively short lists of preconditions, while these lists are in fact arbitrarily long
9 Types of Uncertainty
- For example, to drive my car in the morning
- It must not have been stolen during the night
- It must not have flat tires
- There must be gas in the tank
- The battery must not be dead
- The ignition must work
- I must not have lost the car keys
- No truck should obstruct the driveway
- I must not have suddenly become blind or paralytic
- Etc.
- Not only would it not be possible to list all of
them, trying would also be very inefficient!
10 Types of Uncertainty
- Uncertainty in prior knowledge. E.g., some causes of a disease are unknown and are not represented in the background knowledge of a medical-assistant agent
- Uncertainty in actions. E.g., actions are represented with relatively short lists of preconditions, while these lists are in fact arbitrarily long
- Uncertainty in perception. E.g., sensors do not return exact or complete information (locality of sensor) about the world; a robot never knows its position exactly
12 Methods for handling uncertainty
- Rules with fudge factors
- Leave(25) → (0.03) get there on time
- Study for exam → (0.8) pass the exam
- Sprinkler → (0.99) WetGrass
- WetGrass → (0.7) Rain
- Problems with combinations (Sprinkler causes Rain?)
13 Solution: Probability
- Given the available evidence, Leave(25) will get me there on time with probability 0.03
- Probability addresses degree of belief, not degree of truth
- Degree of belief changes as evidence about the world changes; this is different from the WORLD changing
- Degree of truth is handled by fuzzy logic
- IsSnowing is true to degree 0.2
14 Solution: Probability
- Probabilities summarize the effects of laziness and ignorance
- We will use combinations of probabilities and utilities to make decisions
15 Notion of Probability
- The probability of a proposition A is a real number P(A) between 0 and 1
- P(True) = 1 and P(False) = 0
16 Objective Interpretation
- Draw a ball from a bag containing n balls of the same size, r red and s yellow.
- The probability that the proposition A = "the ball is red" is true corresponds to the relative frequency with which we expect to draw a red ball: P(A) = r/n
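A quick simulation illustrates this frequency reading of P(A) = r/n; it is only a sketch, and the bag contents (r = 3 red, s = 7 yellow) are made up for illustration.

    import random

    # Hypothetical bag: r = 3 red balls, s = 7 yellow balls, n = 10 in total
    bag = ["red"] * 3 + ["yellow"] * 7

    # Estimate P(A) = P(the ball is red) as a relative frequency over many draws
    draws = 100_000
    red_count = sum(1 for _ in range(draws) if random.choice(bag) == "red")
    print(red_count / draws)  # should be close to r/n = 3/10 = 0.3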
17 Subjective Interpretation
- There are many situations in which there is no objective frequency interpretation
- On a windy day, just before paragliding from the top of El Capitan, you say "there is probability 0.05 that I am going to die"
- You have worked hard in your AI class and you believe that the probability that you will get an A is 0.9
18 Subjective or Bayesian probability
- We will make probability estimates based on knowledge about the world
- P(Leave(45) | No Snow) = 0.75
- Not assertions about the world
- Probability assessment if the world were a certain way
- Probabilities change with new information
- P(Leave(45) | No Snow, 5 AM) = 0.80
- Analogous to entailment, not truth
19 Making decisions under uncertainty
- Suppose I believe the following
- P(Leave(35) gets me there on time | ...) = 0.04
- P(Leave(45) gets me there on time | ...) = 0.75
- P(Leave(60) gets me there on time | ...) = 0.95
- P(Leave(1440) gets me there on time | ...) = 0.9999
- Which action do I choose?
- Depends on my preferences for missing the flight vs. eating in the airport, etc.
- Utility theory is used to represent preferences
- Decision theory takes into account utilities and probabilities (see the sketch below)
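A minimal decision-theoretic sketch using the beliefs above; the utility numbers (value of catching the flight, cost of missing it, cost per minute of waiting) are hypothetical and chosen only to show how probabilities and utilities combine.

    # Expected utility of each action: EU = P(on time) * U(on time) + P(late) * U(late)
    p_on_time = {35: 0.04, 45: 0.75, 60: 0.95, 1440: 0.9999}

    # Hypothetical utilities: catching the flight is worth 100, missing it costs 500,
    # and every minute spent waiting at the airport costs 0.5.
    def expected_utility(t):
        wait_penalty = 0.5 * t
        return p_on_time[t] * (100 - wait_penalty) + (1 - p_on_time[t]) * (-500 - wait_penalty)

    for t in sorted(p_on_time):
        print(f"Leave({t}): EU = {expected_utility(t):.1f}")
    print("Best action:", max(p_on_time, key=expected_utility))  # Leave(60) under these numbers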
20 Axioms of Probability
- For any propositions A and B
- 0 ≤ P(A) ≤ 1
- P(True) = 1 and P(False) = 0
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
- Example
- A = computer science major
- B = born in Iowa
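A quick numeric check of the third axiom on this example; the figures for P(A), P(B), and P(A ∧ B) are made up purely for illustration.

    # Inclusion-exclusion: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
    p_a = 0.2         # hypothetical P(computer science major)
    p_b = 0.1         # hypothetical P(born in Iowa)
    p_a_and_b = 0.02  # hypothetical P(CS major AND born in Iowa)

    print(p_a + p_b - p_a_and_b)  # ≈ 0.28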
21 Notation and Concepts
- Unconditional probability or prior probability
- P(Cavity) = 0.1
- P(Weather = Sunny) = 0.55
- Corresponds to belief prior to the arrival of any (new) evidence
- Weather is a multivalued random variable
- Could be one of <Sunny, Rain, Cloudy, Snow>
- P(Cavity) is shorthand for P(Cavity = true)
22 Joint Distribution
- k random variables X1, ..., Xk
- The joint distribution of these variables is a table in which each entry gives the probability of one combination of values of X1, ..., Xk
- Example (the Toothache/Cavity table used on the next slides):

              Toothache   ¬Toothache
  Cavity        0.04         0.06
  ¬Cavity       0.01         0.89
23 Joint Distribution Says It All
- P(Toothache) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity))
               = P(Toothache ∧ Cavity) + P(Toothache ∧ ¬Cavity)
               = 0.04 + 0.01 = 0.05
24 Joint Distribution Says It All
- P(Toothache ∨ Cavity) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity) ∨ (¬Toothache ∧ Cavity))
                        = 0.04 + 0.01 + 0.06 = 0.11
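The same "the joint says it all" idea as a small sketch: any query probability is a sum of entries of the joint table (the 0.89 entry is the one inferred so that the table sums to 1).

    # Full joint distribution over (Toothache, Cavity); keys are (toothache, cavity).
    joint = {
        (True, True): 0.04, (True, False): 0.01,
        (False, True): 0.06, (False, False): 0.89,
    }

    # Probability of any event = sum of the joint entries whose worlds satisfy it.
    def prob(event):
        return sum(p for world, p in joint.items() if event(*world))

    print(prob(lambda t, c: t))       # P(Toothache) ≈ 0.05
    print(prob(lambda t, c: t or c))  # P(Toothache ∨ Cavity) ≈ 0.11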
25 Conditional Probability
- Definition: P(A ∧ B) = P(A|B) P(B)
- Equivalently, P(A|B) = P(A ∧ B) / P(B)
- Read P(A|B): the probability of A given that we know B
- P(A) is called the prior probability of A
- P(A|B) is called the posterior or conditional probability of A given B
26 Example
- P(Cavity ∧ Toothache) = P(Cavity|Toothache) P(Toothache)
- P(Cavity) = 0.1
- P(Cavity|Toothache) = P(Cavity ∧ Toothache) / P(Toothache) = 0.04/0.05 = 0.8
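The same conditional probability can be read off the joint table from slide 22 (a minimal sketch):

    # P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
    joint = {
        (True, True): 0.04, (True, False): 0.01,   # keys are (toothache, cavity)
        (False, True): 0.06, (False, False): 0.89,
    }
    p_toothache = sum(p for (t, c), p in joint.items() if t)
    print(joint[(True, True)] / p_toothache)  # ≈ 0.8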
27 Posterior Probabilities
- More knowledge does not change previous knowledge, but may render old knowledge unnecessary
- P(Cavity | Toothache, Cavity) = 1
- New evidence may be irrelevant
- P(Cavity | Toothache, Paper due Wed.) = 0.8
28 Car Example
- Three propositions
- Gas
- Battery
- Starts
- P(Battery | Gas) = P(Battery): Gas and Battery are independent
- P(Battery | Gas, ¬Starts) ≠ P(Battery | ¬Starts): Gas and Battery are not independent given ¬Starts (see the sketch below)
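A minimal sketch of why this happens, with made-up numbers and the simplifying assumption that the car starts exactly when it has gas and a good battery (this "explaining away" effect is what the slide describes):

    from itertools import product

    # Hypothetical priors: Gas and Battery are independent a priori.
    P_GAS, P_BATTERY = 0.9, 0.8

    joint = {}
    for gas, battery in product([True, False], repeat=2):
        p = (P_GAS if gas else 1 - P_GAS) * (P_BATTERY if battery else 1 - P_BATTERY)
        joint[(gas, battery, gas and battery)] = p  # world = (gas, battery, starts)

    def prob(event, given=lambda g, b, s: True):
        num = sum(p for w, p in joint.items() if event(*w) and given(*w))
        den = sum(p for w, p in joint.items() if given(*w))
        return num / den

    print(prob(lambda g, b, s: b, given=lambda g, b, s: g))            # P(Battery | Gas) = 0.8
    print(prob(lambda g, b, s: b))                                     # P(Battery) = 0.8
    print(prob(lambda g, b, s: b, given=lambda g, b, s: not s))        # P(Battery | ¬Starts) ≈ 0.29
    print(prob(lambda g, b, s: b, given=lambda g, b, s: g and not s))  # P(Battery | Gas, ¬Starts) = 0.0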
29 Definition of Conditional Probability
- Two ways to think about it
31 Bayes' Rule
- P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
- Hence P(A|B) = P(B|A) P(A) / P(B)
- Bayes' rule is extremely useful for inferring the probability of a diagnosis when the probability of the cause is known.
33 Example
- Given
- P(Cavity) = 0.1
- P(Toothache) = 0.05
- P(Cavity|Toothache) = 0.8
- Bayes' rule tells us
- P(Toothache|Cavity) = (0.8 × 0.05) / 0.1 = 0.4
34 Bayes' Rule example
- Does my car need a new drive axle?
- If a car needs a new drive axle, with 30% probability this car jerks around
- P(jerks | needs axle) = 0.3
- Unconditional probabilities
- P(car jerks) = 1/1000
- P(needs axle) = 1/10,000
- Then
- P(needs axle | jerks) = P(jerks | needs axle) P(needs axle) / P(jerks)
                        = (0.3 × 1/10,000) / (1/1000) = 0.03
- Conclusion: 3 of every 100 cars that jerk need an axle
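The same calculation as a short sketch, straight from the numbers on this slide:

    # Bayes' rule: P(needs axle | jerks) = P(jerks | needs axle) * P(needs axle) / P(jerks)
    p_jerks_given_axle = 0.3
    p_axle = 1 / 10_000
    p_jerks = 1 / 1_000

    print(p_jerks_given_axle * p_axle / p_jerks)  # ≈ 0.03, i.e. 3 of every 100 cars that jerk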
35 Not a dumb question
- Question
- Why should I have to provide an estimate of P(B|A) in order to get P(A|B)?
- Why not just estimate P(A|B) and be done with the whole thing?
36 Not a dumb question
- Answer
- Diagnostic knowledge is often more tenuous than causal knowledge
- Suppose drive axles start to go bad in an epidemic
- e.g., poor construction in a major drive-axle brand two years ago is now haunting us
- P(needs axle) goes way up and is easy to measure
- P(needs axle | jerks) should (and does) go up accordingly, but how do we estimate it directly?
- P(jerks | needs axle) is based on causal information and doesn't change
37 Simple Bayesian Concept Learning (1)
- P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
- Let us suppose we have a set of hypotheses H1 ... Hn.
- For each Hi: P(Hi|E) = P(E|Hi) P(Hi) / P(E)
- Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.
38 Simple Bayesian Concept Learning (2)
- In fact, this can be simplified.
- Since P(E) is independent of Hi, it will have the same value for each hypothesis.
- Hence, it can be ignored, and we can find the hypothesis with the highest value of P(E|Hi) P(Hi).
- We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi).
- This is the likelihood of E given Hi (see the sketch below).
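A minimal sketch of this selection rule; the hypotheses, priors, and likelihoods are made-up placeholders.

    # Pick the hypothesis Hi with the highest P(E|Hi) * P(Hi); P(E) is the same
    # for every hypothesis, so it can be dropped from the comparison.
    priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}         # P(Hi), hypothetical
    likelihoods = {"H1": 0.1, "H2": 0.4, "H3": 0.35}   # P(E|Hi), hypothetical

    scores = {h: likelihoods[h] * priors[h] for h in priors}
    print(scores)                      # roughly {'H1': 0.05, 'H2': 0.12, 'H3': 0.07}
    print(max(scores, key=scores.get)) # H2 is the most likely explanation under these numbers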
39 Bayesian Belief Networks (1)
- A belief network shows the dependencies between a group of variables.
- Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
- C and D are dependent on A; D and E are dependent on B.
- The Bayesian belief network has probabilities associated with each link. E.g., P(C|A) = 0.2, P(C|¬A) = 0.4
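A minimal sketch of how such a network factorises the joint distribution. Only P(C|A) = 0.2 and P(C|¬A) = 0.4 come from this slide; the remaining numbers, and the reading of the structure as A → C, A → D ← B, B → E, are illustrative assumptions.

    from itertools import product

    P_A, P_B = 0.3, 0.6                                        # hypothetical priors
    P_C_given_A = {True: 0.2, False: 0.4}                      # from the slide
    P_D_given_AB = {(True, True): 0.9, (True, False): 0.7,
                    (False, True): 0.5, (False, False): 0.1}   # hypothetical
    P_E_given_B = {True: 0.8, False: 0.2}                      # hypothetical

    def p(value, p_true):  # P(value) given P(True)
        return p_true if value else 1 - p_true

    # The network encodes: P(A,B,C,D,E) = P(A) P(B) P(C|A) P(D|A,B) P(E|B)
    def joint(a, b, c, d, e):
        return (p(a, P_A) * p(b, P_B) * p(c, P_C_given_A[a])
                * p(d, P_D_given_AB[(a, b)]) * p(e, P_E_given_B[b]))

    print(sum(joint(*w) for w in product([True, False], repeat=5)))  # ≈ 1.0: a proper joint distribution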