Title: Uncertainty
1. Uncertainty
- Russell and Norvig, Chapters 14, 15
- Koller article on BNs
- CMCS424, Spring 2002, April 23
2. Uncertain Agent
- [Figure: an agent connected to its environment by sensing and action arrows, each marked with a "?" to indicate uncertainty]
3. An Old Problem
4. Types of Uncertainty
- Uncertainty in prior knowledge
  E.g., some causes of a disease are unknown and are not represented in the background knowledge of a medical-assistant agent
5. Types of Uncertainty
- Uncertainty in actions
  E.g., actions are represented with relatively short lists of preconditions, while these lists are in fact arbitrarily long
- For example, to drive my car in the morning:
  - It must not have been stolen during the night
  - It must not have flat tires
  - There must be gas in the tank
  - The battery must not be dead
  - The ignition must work
  - I must not have lost the car keys
  - No truck should obstruct the driveway
  - I must not have suddenly become blind or paralyzed
  - Etc.
- Not only would it be impossible to list all of them, but would trying to do so be efficient?
6. Types of Uncertainty
- Uncertainty in perception
  E.g., sensors do not return exact or complete information about the world; a robot never knows its position exactly
7. Types of Uncertainty
- Sources of uncertainty:
  - Ignorance
  - Laziness (efficiency?)
- What we call uncertainty is a summary of all that is not explicitly taken into account in the agent's KB
8. Questions
- How to represent uncertainty in knowledge?
- How to perform inferences with uncertain knowledge?
- Which action to choose under uncertainty?
9. How Do We Deal with Uncertainty?
- Implicitly:
  - Ignore what you are uncertain of when you can
  - Build procedures that are robust to uncertainty
- Explicitly:
  - Build a model of the world that describes uncertainty about its state, dynamics, and observations
  - Reason about the effects of actions given the model
10. Handling Uncertainty
- Approaches:
  - Default reasoning
  - Worst-case reasoning
  - Probabilistic reasoning
11. Default Reasoning
- Creed: The world is fairly normal. Abnormalities are rare.
- So, an agent assumes normality until there is evidence to the contrary.
- E.g., if an agent sees a bird x, it assumes that x can fly, unless it has evidence that x is a penguin, an ostrich, a dead bird, a bird with broken wings, ...
12. Representation in Logic
- BIRD(x) ∧ ¬ABF(x) → FLIES(x)
- PENGUIN(x) → ABF(x)
- BROKEN-WINGS(x) → ABF(x)
- BIRD(Tweety)
- Default rule: unless ABF(Tweety) can be proven True, assume it is False
- But what to do if several defaults are contradictory? Which ones to keep? Which ones to reject?
- Very active research field in the '80s: non-monotonic logics (defaults, circumscription, closed-world assumptions); applications to databases
13. Worst-Case Reasoning
- Creed: Just the opposite! The world is ruled by Murphy's Law.
- Uncertainty is defined by sets, e.g., the set of possible outcomes of an action, or the set of possible positions of a robot.
- The agent assumes the worst case, and chooses the actions that maximize a utility function in this case.
- Example: adversarial search
14. Probabilistic Reasoning
- Creed: The world is not divided between "normal" and "abnormal", nor is it adversarial. Possible situations have various likelihoods (probabilities).
- The agent has probabilistic beliefs (pieces of knowledge with associated probabilities, or strengths) and chooses its actions to maximize the expected value of some utility function.
15. How Do We Represent Uncertainty?
- We need to answer several questions:
- What do we represent, and how do we represent it?
  - What language do we use to represent our uncertainty? What are the semantics of our representation?
- What can we do with the representations?
  - What queries can be answered? How do we answer them?
- How do we construct a representation?
  - Can we ask an expert? Can we learn from data?
16. Target Tracking Example
- Maximization of worst-case value of utility vs. of expected value of utility
17. Probability
- A well-known and well-understood framework for uncertainty
- Clear semantics
- Provides principled answers for:
  - Combining evidence
  - Predictive and diagnostic reasoning
  - Incorporation of new evidence
- Intuitive (at some level) to human experts
- Can be learned
18. Notion of Probability
- The probability of a proposition A is a real number P(A) between 0 and 1
- P(True) = 1 and P(False) = 0
- P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
- From these axioms:
  P(A ∨ ¬A) = P(A) + P(¬A) − P(A ∧ ¬A)
  P(True) = P(A) + P(¬A) − P(False)
  1 = P(A) + P(¬A)
  So P(A) = 1 − P(¬A)
- Example: You drive on Rt 1 to UMD often, and you notice that 70% of the time there is a traffic slowdown at the intersection of Paint Branch and Rt 1. The next time you plan to drive on Rt 1, you will believe that the proposition "there is a slowdown at the intersection of PB and Rt 1" is True with probability 0.7.
19. Frequency Interpretation
- Draw a ball from a bag containing n balls of the same size, r red and s yellow.
- The probability that the proposition A = "the ball is red" is true corresponds to the relative frequency with which we expect to draw a red ball: P(A) = r/n
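The frequency interpretation can be checked with a short simulation. A minimal sketch, with made-up bag sizes (r = 3 red, s = 7 yellow) and a fixed seed for reproducibility:

```python
import random

# Hypothetical bag: r = 3 red and s = 7 yellow balls, so n = 10.
r, s = 3, 7
n = r + s
bag = ["red"] * r + ["yellow"] * s

# Estimate P(A) for A = "the drawn ball is red" by repeated draws.
trials = 100_000
rng = random.Random(0)  # fixed seed so the run is reproducible
red_draws = sum(1 for _ in range(trials) if rng.choice(bag) == "red")
estimate = red_draws / trials

# The relative frequency should approach the exact value r/n = 0.3.
print(f"relative frequency: {estimate:.3f}, exact r/n: {r / n:.3f}")
```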
20. Subjective Interpretation
- There are many situations in which there is no objective frequency interpretation:
- On a windy day, just before paragliding from the top of El Capitan, you say "there is probability 0.05 that I am going to die"
- You have worked hard on your AI class and you believe that the probability that you will get an A is 0.9
21. Random Variables
- A proposition that takes the value True with probability p and False with probability 1−p is a random variable with distribution (p, 1−p)
- If a bag contains balls of 3 possible colors (red, yellow, and blue), the color of a ball picked at random from the bag is a random variable with 3 possible values
- The (probability) distribution of a random variable X with n values x1, x2, ..., xn is (p1, p2, ..., pn), with P(X = xi) = pi and Σ_{i=1,...,n} pi = 1
22. Expected Value
- Random variable X with n values x1, ..., xn and distribution (p1, ..., pn)
  E.g., X is the state reached after doing an action A under uncertainty
- Function U of X
  E.g., U is the utility of a state
- The expected value of U after doing A is
  E[U] = Σ_{i=1,...,n} pi U(xi)
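The formula E[U] = Σ pi U(xi) can be sketched directly in code. The outcome names, probabilities, and utilities below are invented purely for illustration:

```python
# Made-up distribution over states reached after an action A,
# and a made-up utility for each state.
distribution = {"goal_reached": 0.7, "stuck": 0.2, "crashed": 0.1}
utility = {"goal_reached": 100.0, "stuck": -10.0, "crashed": -100.0}

# E[U] = sum_i p_i * U(x_i)
expected_utility = sum(p * utility[x] for x, p in distribution.items())
print(expected_utility)  # 0.7*100 - 0.2*10 - 0.1*100 = 58.0
```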
23. Joint Distribution
- k random variables X1, ..., Xk
- The joint distribution of these variables is a table in which each entry gives the probability of one combination of values of X1, ..., Xk
- Example:

              Toothache   ¬Toothache
  Cavity        0.04        0.06
  ¬Cavity       0.01        0.89
24. Joint Distribution Says It All

              Toothache   ¬Toothache
  Cavity        0.04        0.06
  ¬Cavity       0.01        0.89

- P(Toothache) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity))
               = P(Toothache ∧ Cavity) + P(Toothache ∧ ¬Cavity)
               = 0.04 + 0.01 = 0.05
- P(Toothache ∨ Cavity) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity) ∨ (¬Toothache ∧ Cavity))
               = 0.04 + 0.01 + 0.06 = 0.11
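The two computations above amount to summing the matching entries of the joint table. A minimal sketch, storing the slide's table as a dict keyed by (toothache, cavity) truth values:

```python
# The joint distribution from the slide, keyed by (toothache, cavity).
joint = {
    (True, True): 0.04, (False, True): 0.06,
    (True, False): 0.01, (False, False): 0.89,
}

# Marginalize: P(Toothache) = sum over both Cavity values.
p_toothache = sum(p for (t, _c), p in joint.items() if t)

# P(Toothache v Cavity): sum the entries where either proposition holds.
p_toothache_or_cavity = sum(p for (t, c), p in joint.items() if t or c)

print(p_toothache, p_toothache_or_cavity)  # ~0.05 and ~0.11
```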
25. Conditional Probability
- Definition: P(A|B) = P(A ∧ B) / P(B)
- Read P(A|B): "probability of A given B"
- Can also write this as P(A ∧ B) = P(A|B) P(B)
- Called the product rule
26. Example

              Toothache   ¬Toothache
  Cavity        0.04        0.06
  ¬Cavity       0.01        0.89

- P(Cavity|Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
- P(Cavity ∧ Toothache) = ?
- P(Toothache) = ?
- P(Cavity|Toothache) = 0.04/0.05 = 0.8
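The conditional probability in this example reads both quantities straight off the joint table and divides. A minimal sketch:

```python
# Joint table from the slide, keyed by (toothache, cavity).
joint = {
    (True, True): 0.04, (False, True): 0.06,
    (True, False): 0.01, (False, False): 0.89,
}

# P(A|B) = P(A ^ B) / P(B), with A = Cavity and B = Toothache.
p_cavity_and_toothache = joint[(True, True)]
p_toothache = joint[(True, True)] + joint[(True, False)]
p_cavity_given_toothache = p_cavity_and_toothache / p_toothache

print(p_cavity_given_toothache)  # 0.04 / 0.05, i.e. ~0.8
```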
27. Generalization
- P(A ∧ B ∧ C) = P(A|B,C) P(B|C) P(C)
28. Bayes' Rule
- P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
- So: P(A|B) = P(B|A) P(A) / P(B)
29. Example

              Toothache   ¬Toothache
  Cavity        0.04        0.06
  ¬Cavity       0.01        0.89
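Using this table, Bayes' rule can be sketched by computing P(Toothache|Cavity) from the joint and then inverting it, recovering the same P(Cavity|Toothache) = 0.8 as the direct computation:

```python
# Marginals and one conditional, read off the toothache/cavity table.
p_cavity = 0.04 + 0.06                        # P(Cavity) = 0.10
p_toothache = 0.04 + 0.01                     # P(Toothache) = 0.05
p_toothache_given_cavity = 0.04 / p_cavity    # P(Toothache|Cavity) = 0.4

# Bayes' rule: P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache)
p_cavity_given_toothache = p_toothache_given_cavity * p_cavity / p_toothache

print(p_cavity_given_toothache)  # ~0.8, matching the direct computation
```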
30. Generalization
- P(A ∧ B ∧ C) = P(A ∧ B|C) P(C) = P(A|B,C) P(B|C) P(C)
- P(A ∧ B ∧ C) = P(A ∧ B|C) P(C) = P(B|A,C) P(A|C) P(C)
31. Representing Probability
- Naïve representations of probability run into problems.
- Example:
  - Patients in a hospital are described by several attributes:
    - Background: age, gender, history of diseases, ...
    - Symptoms: fever, blood pressure, headache, ...
    - Diseases: pneumonia, heart attack, ...
  - A probability distribution needs to assign a number to each combination of values of these attributes
  - 20 binary attributes require ~10^6 numbers (2^20 entries)
  - Real examples usually involve hundreds of attributes
32. Practical Representation
- Key idea: exploit regularities
- Here we focus on exploiting conditional independence properties
33. A Bayesian Network
- The ICU alarm network
- 37 variables, 509 parameters (instead of 2^37)
34. Independent Random Variables
- Two variables X and Y are independent if
  P(X = x|Y = y) = P(X = x) for all values x, y
- That is, learning the value of Y does not change the prediction of X
- If X and Y are independent, then
  P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
- In general, if X1, ..., Xn are independent, then
  P(X1, ..., Xn) = P(X1) ... P(Xn)
- Requires O(n) parameters
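The O(n) savings can be illustrated concretely: n independent binary variables need only one number each (P(Xi = True)), yet the full 2^n-entry joint is recoverable by multiplication. A minimal sketch with invented probabilities for n = 3:

```python
from itertools import product

# Made-up P(Xi = True) for n = 3 independent binary variables:
# only n parameters instead of 2^n joint entries.
p_true = [0.9, 0.3, 0.5]

def joint_prob(assignment):
    """P(X1=v1, ..., Xn=vn) under full independence: product of factors."""
    prob = 1.0
    for p, v in zip(p_true, assignment):
        prob *= p if v else 1.0 - p
    return prob

# All 2^n joint probabilities are recoverable, and they sum to 1.
total = sum(joint_prob(a) for a in product([False, True], repeat=3))
print(total)
```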
35. Conditional Independence
- Propositions A and B are independent iff
  P(A|B) = P(A)  ⟺  P(A ∧ B) = P(A) P(B)
- A and B are independent given C iff
  P(A|B,C) = P(A|C)  ⟺  P(A ∧ B|C) = P(A|C) P(B|C)
36. Conditional Independence
- Unfortunately, random variables of interest are not independent of each other
- A more suitable notion is that of conditional independence
- Two variables X and Y are conditionally independent given Z if
  P(X = x|Y = y, Z = z) = P(X = x|Z = z) for all values x, y, z
- That is, learning the value of Y does not change the prediction of X once we know the value of Z
- Notation: Ind(X; Y | Z)
37. Car Example
- Three propositions:
  - Gas
  - Battery
  - Starts
- P(Battery|Gas) = P(Battery): Gas and Battery are independent
- P(Battery|Gas, Starts) ≠ P(Battery|Starts): Gas and Battery are not independent given Starts
38. Example: Naïve Bayes Model
- A common model in early diagnosis
- Symptoms are conditionally independent given the disease (or fault)
- Thus, if
  - X1, ..., Xn denote the symptoms exhibited by the patient (headache, high fever, etc.) and
  - H denotes the hypothesis about the patient's health
- then P(X1, ..., Xn, H) = P(H) P(X1|H) ... P(Xn|H)
- This naïve Bayesian model allows a compact representation
- It does embody strong independence assumptions
39. Markov Assumption
- We now make this independence assumption more precise for directed acyclic graphs (DAGs)
- Each random variable X is independent of its non-descendants, given its parents Pa(X)
- Formally: Ind(X; NonDesc(X) | Pa(X))
- [Figure: a DAG around a node X, labeling an ancestor, a parent, a non-descendant, and a descendant]
40. Markov Assumption Example
- In this example:
  - Ind(E; B)
  - Ind(B; E, R)
  - Ind(R; A, B, C | E)
  - Ind(A; R | B, E)
  - Ind(C; B, E, R | A)
41. I-Maps
- A DAG G is an I-Map of a distribution P if all the Markov assumptions implied by G are satisfied by P
- Examples
42. Factorization
- Given that G is an I-Map of P, can we simplify the representation of P?
- Example:
  - Since Ind(X; Y), we have that P(X|Y) = P(X)
  - Applying the chain rule: P(X,Y) = P(X|Y) P(Y) = P(X) P(Y)
  - Thus, we have a simpler representation of P(X,Y)
43. Factorization Theorem
- Thm: if G is an I-Map of P, then
  P(X1, ..., Xn) = Π_i P(Xi | Pa(Xi))
44. Factorization Example
- P(C,A,R,E,B) = P(B) P(E|B) P(R|E,B) P(A|R,B,E) P(C|A,R,B,E)
- versus
- P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)
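The simpler factorization can be turned into code: each variable gets a small local table, and any joint entry is a product of five factors. A sketch for this burglary/earthquake/radio/alarm/call network; all CPT numbers below are invented for illustration:

```python
from itertools import product

# Invented local conditional probabilities, one per factor in
# P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A).
p_b = 0.01                                    # P(B = true)
p_e = 0.02                                    # P(E = true)
p_r_given_e = {True: 0.9, False: 0.001}       # P(R = true | E)
p_a_given_be = {                              # P(A = true | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
p_c_given_a = {True: 0.7, False: 0.01}        # P(C = true | A)

def bern(p, v):
    """Probability of a binary value v given P(true) = p."""
    return p if v else 1.0 - p

def joint(c, a, r, e, b):
    """One entry of the joint, as the product of the five local factors."""
    return (bern(p_b, b) * bern(p_e, e) * bern(p_r_given_e[e], r)
            * bern(p_a_given_be[(b, e)], a) * bern(p_c_given_a[a], c))

# Sanity check: the 2^5 joint entries sum to 1.
total = sum(joint(*vals) for vals in product([False, True], repeat=5))
print(total)
```

Note the parameter count: 1 + 1 + 2 + 4 + 2 = 10 numbers instead of the 2^5 − 1 = 31 an explicit joint table would need.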
45. Consequences
- We can write P in terms of local conditional probabilities
- If G is sparse,
  - that is, |Pa(Xi)| < k,
- then each conditional probability can be specified compactly
  - e.g., for binary variables, these require O(2^k) params
- so the representation of P is compact
  - linear in the number of variables
46. Bayesian Networks
- A Bayesian network specifies a probability distribution via two components:
  - A DAG G
  - A collection of conditional probability distributions P(Xi|Pai)
- The joint distribution P is defined by the factorization
- Additional requirement: G is a minimal I-Map of P
47. Bayesian Networks
- Nodes: random variables; edges: direct probabilistic influence
- Network structure encodes independence assumptions: XRay is conditionally independent of Pneumonia given Infiltrates
48. Bayesian Networks

  Example CPT, P(I | P, T):

     P    T  |   i     ¬i
     p    t  |  0.8    0.2
     p   ¬t  |  0.6    0.4
    ¬p    t  |  0.2    0.8
    ¬p   ¬t  |  0.01   0.99

- Each node Xi has a conditional probability distribution P(Xi|Pai)
- If variables are discrete, P is usually multinomial
- P can be linear Gaussian, a mixture of Gaussians, ...
49. BN Semantics
- Conditional independencies in BN structure + local probability models = full joint distribution over the domain
- Compact, natural representation:
  - nodes have ≤ k parents → 2^k · n vs. 2^n params
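The 2^k · n vs. 2^n gap is easy to make concrete. The sizes below (n = 37 nodes, loosely inspired by the ICU alarm network on an earlier slide, with an assumed fan-in of k = 3) are illustrative only:

```python
# Parameter count for a BN over n binary nodes, each with <= k parents,
# versus the full joint table. n and k are assumed example sizes.
n, k = 37, 3
bn_params = n * 2**k       # at most 2^k rows per node's CPT
joint_params = 2**n        # one entry per full assignment

print(bn_params, joint_params)  # 296 vs 137438953472
```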
50. Queries
- The full joint distribution specifies the answer to any query: P(variable | evidence about others)
- [Network: Tuberculosis and Pneumonia → Lung Infiltrates → Sputum Smear and XRay]
51. BN Learning
- [Figure: Data fed to an Inducer, which outputs a BN]
- BN models can be learned from empirical data:
  - parameter estimation via numerical optimization
  - structure learning via combinatorial search
- The BN hypothesis space is biased towards distributions with independence structure
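For complete discrete data, the parameter-estimation step reduces to counting: the maximum-likelihood estimate of a CPT entry P(X = x | Pa = u) is count(x, u) / count(u). A minimal sketch on fabricated (toothache, cavity) samples:

```python
from collections import Counter

# Fabricated complete data: (toothache, cavity) observations.
data = [(True, True), (True, True), (False, True),
        (True, False), (False, False), (False, False),
        (False, False), (False, False)]

counts = Counter(data)

# ML estimate of P(Toothache = true | Cavity = true):
# count(toothache, cavity) / count(cavity).
n_cavity = sum(n for (_t, c), n in counts.items() if c)
n_toothache_and_cavity = counts[(True, True)]
p_toothache_given_cavity = n_toothache_and_cavity / n_cavity

print(p_toothache_given_cavity)  # 2 of the 3 cavity cases, i.e. 2/3
```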
52. Questions
- How to represent uncertainty in knowledge?
- How to perform inferences with uncertain knowledge?
- Which action to choose under uncertainty?
- If a goal is terribly important, an agent may be better off choosing a less efficient, but less uncertain, action than a more efficient one
- But if the goal is also extremely urgent, and the less uncertain action is deemed too slow, then the agent may take its chances with the faster, but more uncertain, action
53. Summary
- Types of uncertainty
- Default / worst-case / probabilistic reasoning
- Probability theory
- Bayesian networks
- Making decisions under uncertainty
- Exciting research area!
54. References
- Russell & Norvig, Chapters 14, 15
- Daphne Koller's BN notes, available from the class web page
- Jean-Claude Latombe's excellent lecture notes, http://robotics.stanford.edu/~latombe/cs121/winter02/home.htm
- Nir Friedman's excellent lecture notes, http://www.cs.huji.ac.il/~pmai/
55. Questions
- How to represent uncertainty in knowledge?
- How to perform inferences with uncertain knowledge?
- When a doctor receives lab analysis results for some patient, how do they change his prior knowledge about the health condition of this patient?
56. Example: Robot Navigation
- Uncertainty in control
- Courtesy S. Thrun
57. Worst-Case Planning
58-60. Target Tracking Example (incremental slides)
- Open-loop vs. closed-loop strategy
- Off-line vs. on-line planning/reasoning
- Maximization of worst-case value of utility vs. of expected value of utility