Title: CMSC 723: Introduction to Computational Linguistics
1 Chapter 13 Uncertainty
2 Outline
- Acting under Uncertainty
- Basic Probability Notation
- The Axioms of Probability
- Inference Using Joint Distributions
- Independence
- Bayes' Rule and Its Use
- The Wumpus World Revisited
3 Remember the Logics in General Slide?
- Ontological Commitment: What exists in the world (TRUTH)
- Epistemological Commitment: What an agent believes about facts (BELIEF)
4 Uncertainty
- When an agent knows enough facts about its environment, the logical approach enables it to derive plans that are guaranteed to work.
- Unfortunately, agents almost never have access to the whole truth about their environment.
- Agents must act under uncertainty.
- For example, a wumpus agent often will find itself unable to discover which of two squares contains a pit. If those squares are en route to the gold, then the agent might have to take a chance and enter one of the two squares.
5 Uncertainty
- Let action At = leave for airport t minutes before flight. Will At get me there on time?
- Problems:
  - partial observability (road state, other drivers' plans, etc.)
  - noisy sensors (traffic reports, etc.)
  - uncertainty in outcomes (flat tire, etc.)
  - immense complexity of modeling and predicting traffic
6 Can we take a purely logical approach?
- Risks falsehood: "A25 will get me there on time"
- Leads to conclusions that are too weak for decision making: "A25 will get me there on time if there is no accident on the bridge and it doesn't rain and my tires remain intact, etc."
- A1440 might reasonably be said to get me there on time, but I'd have to stay overnight at the airport!
- A90 may be good.
- Logic represents uncertainty by disjunction but cannot tell us how likely the different conditions are.
7 Methods for Handling Uncertainty
- Default or nonmonotonic logic:
  - Assume my car does not have a flat tire
  - Assume A25 works unless contradicted by evidence
  - Issues: What assumptions are reasonable? How to handle contradictions?
- Logic rules with fudge factors:
  - A25 →0.3 get there on time
  - Sprinkler →0.99 WetGrass
  - WetGrass →0.7 Rain
  - Issues: Problems with combination, e.g., does Sprinkler cause Rain?
8 Uncertainty
- The information the agent has cannot guarantee any of these outcomes for A90, but it can provide some degree of belief that they will be achieved.
- Other plans, such as A120, might increase the agent's belief that it will get to the airport on time, but also increase the likelihood of a long wait.
- The right thing to do (the rational decision) therefore depends on both the relative importance of various goals and the likelihood that, and degree to which, they will be achieved.
9 Handling Uncertain Knowledge
10 An Alternative: Use Probability
- Given the available evidence, A25 will get me there on time with probability 0.04.
- Probabilistic assertions summarize the effects of:
  - Laziness: too much work to list the complete set of antecedents or consequents to ensure no exceptions
  - Theoretical ignorance: medical science has no complete theory for the domain
  - Practical ignorance: even if we know all the rules, we might be uncertain about a particular patient
11 Degree of Belief
- The connection between toothaches and cavities is just not a logical consequence in either direction.
- This is typical of the medical domain, as well as most other judgmental domains: law, business, design, automobile repair, gardening, dating, and so on.
- The agent's knowledge can at best provide only a degree of belief in the relevant sentences.
- Our main tool for dealing with degrees of belief will be probability theory, which assigns to each sentence a numerical degree of belief between 0 and 1.
12 Probability
- Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance.
- We might not know for sure what afflicts a particular patient, but we believe that there is, say, an 80% chance (that is, a probability of 0.8) that the patient has a cavity if he or she has a toothache.
- That is, we expect that out of all the situations that are indistinguishable from the current situation as far as the agent's knowledge goes, the patient will have a cavity in 80% of them.
- This belief could be derived from:
  - statistical data,
  - general rules, or
  - a combination of evidence sources.
- The missing 20% summarizes all the other possible causes of toothache that we are too lazy or ignorant to confirm or deny.
13 Probability
- Assigning a probability of 0 to a given sentence corresponds to an unequivocal belief that the sentence is false.
- Assigning a probability of 1 corresponds to an unequivocal belief that the sentence is true.
- Probabilities between 0 and 1 correspond to intermediate degrees of belief in the truth of the sentence.
- The sentence itself is in fact either true or false.
- It is important to note that a degree of belief is different from a degree of truth.
- A probability of 0.8 does not mean "80% true" but rather an 80% degree of belief, that is, a fairly strong expectation.
- Thus, probability theory makes the same ontological commitment as logic, namely, that facts either do or do not hold in the world.
- Degree of truth, as opposed to degree of belief, is the subject of fuzzy logic, which is covered in Section 14.7.
14 Probability
- All probability statements must therefore indicate the evidence with respect to which the probability is being assessed.
- As the agent receives new percepts, its probability assessments are updated to reflect the new evidence.
- Before the evidence is obtained, we talk about prior or unconditional probability; after the evidence is obtained, we talk about posterior or conditional probability.
- In most cases, an agent will have some evidence from its percepts and will be interested in computing the posterior probabilities of the outcomes it cares about.
15 Uncertainty and Rational Decisions
- The presence of uncertainty radically changes the way an agent makes decisions.
- A logical agent typically has a goal and executes any plan that is guaranteed to achieve it. An action can be selected or rejected on the basis of whether it achieves the goal, regardless of what other actions might achieve.
- When uncertainty enters the picture, this is no longer the case.
- Consider again the A90 plan for getting to the airport. Suppose it has a 95% chance of succeeding.
- Does this mean it is a rational choice? Not necessarily: there might be other plans, such as A120, with higher probabilities of success.
- What about A1440? In most circumstances, this is not a good choice, because, although it almost guarantees getting there on time, it involves an intolerable wait.
16 Preferences
- To make such choices, an agent must first have preferences between the different possible outcomes of the various plans.
- A particular outcome is a completely specified state, including such factors as whether the agent arrives on time and the length of the wait at the airport.
- We will be using utility theory to represent and reason with preferences.
- Utility theory says that every state has a degree of usefulness, or utility, to an agent and that the agent will prefer states with higher utility.
17 Examples
- Suppose I believe the following:
  - P(A25 gets me there on time) = 0.04
  - P(A90 gets me there on time) = 0.70
  - P(A120 gets me there on time) = 0.95
  - P(A1440 gets me there on time) = 0.9999
- Which do I choose? Depends on my preferences for missing the flight vs. airport cuisine, etc.
18 Decision Theory
- Preferences, as expressed by utilities, are combined with probabilities in the general theory of rational decisions called decision theory:
- Decision theory = probability theory + utility theory.
- An agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the principle of Maximum Expected Utility (MEU).
19 Design for a Decision-Theoretic Agent
20 13.2 Basic Probability Notation
- We will need a formal language for representing and reasoning with uncertain knowledge.
- Any notation for describing degrees of belief must be able to deal with two main issues:
  - the nature of the sentences to which degrees of belief are assigned, and
  - the dependence of the degree of belief on the agent's experience.
- The version of probability theory we present uses an extension of propositional logic.
21 Random Variables
- A random variable is a function that takes discrete values from a countable domain and maps them to a number between 0 and 1.
- Example: Weather is a discrete (propositional) random variable that has domain <sunny, rain, cloudy, snow>.
  - sunny is an abbreviation for Weather = sunny
  - P(Weather = sunny) = 0.72, P(Weather = rain) = 0.1, etc.
  - Can be written P(sunny) = 0.72, P(rain) = 0.1, etc.
- Other types of random variables:
  - Boolean random variable: has the domain <true, false>, e.g., Cavity (special case of a discrete random variable)
  - Continuous random variable: has the domain of real numbers, e.g., Temp
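A discrete distribution like the one above can be sketched as a plain table in code. The cloudy and snow values below are assumptions filling in the slide's "etc." so the table sums to 1.

```python
# A discrete random variable's distribution as a table of probabilities,
# using the Weather example from the slide. The sunny and rain values are
# the slide's own; cloudy and snow are assumed so the table sums to 1.
weather_dist = {"sunny": 0.72, "rain": 0.1, "cloudy": 0.08, "snow": 0.1}

# A valid distribution assigns each value a number in [0, 1],
# and the probabilities over the whole domain sum to 1.
assert all(0.0 <= p <= 1.0 for p in weather_dist.values())
assert abs(sum(weather_dist.values()) - 1.0) < 1e-9

# P(Weather = sunny), abbreviated P(sunny) on the slide
print(weather_dist["sunny"])  # 0.72
```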
22 Atomic Events
- An atomic event is a complete specification of the state of the world about which the agent is uncertain. It can be thought of as an assignment of particular values to all the variables of which the world is composed.
- For example, if my world consists of only the Boolean variables Cavity and Toothache, then there are just four distinct atomic events:
  - Cavity = false ∧ Toothache = true
  - Cavity = false ∧ Toothache = false
  - Cavity = true ∧ Toothache = true
  - Cavity = true ∧ Toothache = false
23 Properties of Atomic Events
- They are mutually exclusive: at most one can actually be the case.
  - For example, cavity ∧ toothache and cavity ∧ ¬toothache cannot both be the case.
- The set of all possible atomic events is exhaustive: at least one must be the case. That is, the disjunction of all atomic events is logically equivalent to true.
- Any particular atomic event entails the truth or falsehood of every proposition, whether simple or complex. This can be seen by using the standard semantics for logical connectives.
  - For example, the atomic event cavity ∧ ¬toothache entails the truth of cavity and the falsehood of cavity ⇒ toothache.
- Any proposition is logically equivalent to the disjunction of all atomic events that entail the truth of the proposition.
  - For example, the proposition cavity is equivalent to the disjunction of the atomic events cavity ∧ toothache and cavity ∧ ¬toothache.
24 Prior Probability
- Prior (unconditional) probability corresponds to belief prior to the arrival of any (new) evidence.
  - P(sunny) = 0.72, P(rain) = 0.1, etc.
- Vector notation: P(Weather) = <0.7, 0.2, 0.08, 0.02>, i.e.:
  - P(Weather = sunny) = 0.7
  - P(Weather = rain) = 0.2
  - P(Weather = cloudy) = 0.08
  - P(Weather = snow) = 0.02
- This statement defines a prior probability distribution for the random variable Weather (sums to 1 over the domain).
25 Atomic Events and the Universe
- An atomic event is a complete specification of the state of the world about which the agent is uncertain (like interpretations in logic).
- If the world consists of the Boolean variables Cavity and Toothache, there are four distinct atomic events, one of which is Cavity = false ∧ Toothache = true.
- The universe consists of atomic events.
- An event is a set of atomic events.
- P: events → [0, 1]
- P(true) = 1 = P(U)
- P(false) = 0 = P(∅)
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
26 Joint Distribution
- Probability assignment to all combinations of values of random variables.
- The sum of the entries in this table has to be 1.
- Given this table, we can answer all probability questions about this domain.
- The probability of a proposition is the sum of the probabilities of the atomic events in which it holds:
  - P(cavity) = 0.1 (add elements of the cavity row)
  - P(toothache) = 0.05 (add elements of the toothache column)
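As a sketch, the two-variable joint distribution described above can be held as a table keyed by atomic events. The slide's table itself is not reproduced in this transcript, so the four entries below are assumed values chosen to be consistent with the marginals quoted (P(cavity) = 0.1, P(toothache) = 0.05) and to sum to 1.

```python
# A minimal joint distribution over Cavity and Toothache. The four entries
# are assumptions consistent with the marginals quoted on the slide.
joint = {
    (True,  True):  0.04,  # cavity and toothache
    (True,  False): 0.06,  # cavity, no toothache
    (False, True):  0.01,  # no cavity, toothache
    (False, False): 0.89,  # neither
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# A proposition's probability is the sum over the atomic events where it holds.
p_cavity = sum(p for (c, _), p in joint.items() if c)      # "cavity row"
p_toothache = sum(p for (_, t), p in joint.items() if t)   # "toothache column"
print(round(p_cavity, 3), round(p_toothache, 3))  # 0.1 0.05
```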
27 Conditional Probability
- P(cavity) = 0.1 and P(cavity ∧ toothache) = 0.04 are both prior (unconditional) probabilities.
- Once the agent has new evidence concerning a previously unknown random variable, e.g., toothache, we can specify a posterior (conditional) probability, e.g., P(cavity | toothache).
- P(A | B) = P(A ∧ B)/P(B)  (probability of A with the universe limited to B)
- P(cavity | toothache) = 0.04/0.05 = 0.8
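The definition can be checked directly with the slide's numbers:

```python
# Conditional probability from the definition P(A | B) = P(A and B) / P(B),
# using the slide's numbers.
p_cavity_and_toothache = 0.04
p_toothache = 0.05

p_cavity_given_toothache = p_cavity_and_toothache / p_toothache
print(round(p_cavity_given_toothache, 2))  # 0.8
```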
28 Conditional Probability (continued)
- Definition of conditional probability: P(A | B) = P(A ∧ B)/P(B)
- The product rule gives an alternative formulation: P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
- A general version holds for whole distributions: P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
30 13.3 The Axioms of Probability
- All probabilities are between 0 and 1. For any proposition a, 0 ≤ P(a) ≤ 1.
- Necessarily true (i.e., valid) propositions have probability 1, and necessarily false (i.e., unsatisfiable) propositions have probability 0: P(true) = 1, P(false) = 0.
- The probability of a disjunction is given by P(a ∨ b) = P(a) + P(b) - P(a ∧ b).
31 Using the axioms of probability
32 Probability Axioms
- Recall that any proposition a is equivalent to the disjunction of all the atomic events in which a holds; call this set of events e(a).
- Recall also that atomic events are mutually exclusive, so the probability of any conjunction of atomic events is zero, by axiom 2. Hence, from axiom 3, we can derive the following simple relationship.
- The probability of a proposition is equal to the sum of the probabilities of the atomic events in which it holds; that is, P(a) = Σ_{e_i ∈ e(a)} P(e_i).
- This equation provides a simple method for computing the probability of any proposition, given a full joint distribution that specifies the probabilities of all atomic events.
33 Why the Axioms of Probability Are Reasonable
- The axioms of probability can be seen as restricting the set of probabilistic beliefs that an agent can hold.
- This is somewhat analogous to the logical case, where a logical agent cannot simultaneously believe A, B, and ¬(A ∧ B), for example.
- In the logical case, the semantic definition of conjunction means that at least one of the three beliefs just mentioned must be false in the world, so it is unreasonable for an agent to believe all three.
- With probabilities, on the other hand, statements refer not to the world directly, but to the agent's own state of knowledge.
- Why, then, can an agent not hold the following set of beliefs, which clearly violates axiom 3?
  - P(a) = 0.4, P(a ∧ b) = 0.0
  - P(b) = 0.3, P(a ∨ b) = 0.8
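A quick check shows why these beliefs violate axiom 3:

```python
# Checking the belief set on the slide against axiom 3:
# a coherent agent must have P(a or b) = P(a) + P(b) - P(a and b).
p_a, p_b = 0.4, 0.3
p_a_and_b = 0.0
p_a_or_b_claimed = 0.8

p_a_or_b_required = p_a + p_b - p_a_and_b
print(round(p_a_or_b_required, 2))                       # 0.7
print(abs(p_a_or_b_required - p_a_or_b_claimed) < 1e-9)  # False: incoherent
```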
34 de Finetti's Argument
- The axioms of probability:
  - imply that certain logically related events must have related probabilities
  - restrict the set of probabilistic beliefs that an agent can hold
- Example: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
- de Finetti (1931) proved: an agent who bets according to probabilities that violate these axioms can be forced to bet so as to lose money regardless of the outcome.
- Let's look at an example.
35 Example
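The example slide survives only as an image, so here is a sketch reconstructing the standard de Finetti bet for the incoherent beliefs on the previous slide (P(a) = 0.4, P(b) = 0.3, P(a ∨ b) = 0.8). The stakes below (win 4 / lose 6 against a, win 3 / lose 7 against b, win 2 / lose 8 for a ∨ b) follow from Agent 1's own odds; Agent 2 picks which side of each fair-seeming bet Agent 1 takes.

```python
# de Finetti's construction: at Agent 1's own odds, Agent 2 has Agent 1
# bet against a, against b, and for (a or b). Agent 1 considers each bet
# fair, yet loses money in every outcome because the beliefs are incoherent.
def payoff(a: bool, b: bool) -> int:
    bet_a = 4 if not a else -6       # against a, at odds from P(a) = 0.4
    bet_b = 3 if not b else -7       # against b, at odds from P(b) = 0.3
    bet_ab = 2 if (a or b) else -8   # for a or b, at odds from P(a or b) = 0.8
    return bet_a + bet_b + bet_ab

outcomes = {(a, b): payoff(a, b) for a in (True, False) for b in (True, False)}
print(outcomes)  # Agent 1 loses in every one of the four outcomes
assert all(v < 0 for v in outcomes.values())
```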
36 13.4 Probabilistic Inference
- Probabilistic inference is the computation, from observed evidence, of posterior probabilities for query propositions.
- We use the full joint distribution as the knowledge base from which answers to questions may be derived.
- Probabilities in the joint distribution sum to 1.
37 Probabilistic Inference (cont.)
- The probability of any proposition is computed by finding the atomic events where the proposition is true and adding their probabilities:
  - P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
  - P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
- P(cavity) is called a marginal probability, and the process of computing this is called marginalization.
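These sums can be reproduced from the full joint over Toothache, Catch, and Cavity; the eight entries below are the standard table from the book that the slide's numbers come from.

```python
# The full joint distribution over (Toothache, Catch, Cavity), as in the
# book's dental example; the slide's sums are over rows of this table.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# P(cavity or toothache): sum the atomic events where the proposition holds.
p_cav_or_tooth = sum(p for (t, _, c), p in joint.items() if c or t)
# P(cavity): the marginal, summing out Toothache and Catch.
p_cavity = sum(p for (_, _, c), p in joint.items() if c)
print(round(p_cav_or_tooth, 3), round(p_cavity, 3))  # 0.28 0.2
```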
38 Probabilistic Inference (cont.)
- Can also compute conditional probabilities:
  - P(¬cavity | toothache) = P(¬cavity ∧ toothache)/P(toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
- The denominator is viewed as a normalization constant: it stays constant no matter what the value of Cavity is. (The book uses α to denote the normalization constant 1/P(X), for random variable X.)
41 Enumerate-Joint-Ask
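The slide names the ENUMERATE-JOINT-ASK procedure without showing its body. A minimal sketch of the idea (for each value of the query variable, sum the joint entries consistent with the evidence, then normalize) might look like this; the table layout and argument names are assumptions of this sketch, not the book's pseudocode verbatim.

```python
# Sketch of enumeration over the full joint: P(X | e) is computed by
# summing the entries consistent with each value of X and the evidence e,
# then normalizing.
def enumerate_joint_ask(query_var, evidence, joint, variables):
    """Return the posterior distribution over query_var given evidence."""
    qi = variables.index(query_var)
    dist = {}
    for event, p in joint.items():
        if all(event[variables.index(v)] == val for v, val in evidence.items()):
            dist[event[qi]] = dist.get(event[qi], 0.0) + p
    alpha = 1.0 / sum(dist.values())  # normalization constant
    return {val: alpha * p for val, p in dist.items()}

# Usage with the dental joint from the previous slides:
variables = ["Toothache", "Catch", "Cavity"]
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
posterior = enumerate_joint_ask("Cavity", {"Toothache": True}, joint, variables)
print({k: round(v, 2) for k, v in posterior.items()})  # {True: 0.6, False: 0.4}
```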
42 13.5 Independence
- How are P(toothache, catch, cavity, Weather = cloudy) and P(toothache, catch, cavity) related?
43 Independence
45 Independence
- A and B are independent iff:
  - P(A ∧ B) = P(A) P(B)
  - P(A | B) = P(A)
  - P(B | A) = P(B)
- Independence is essential for efficient probabilistic reasoning.
- The joint over Cavity, Toothache, Xray, Weather decomposes into two independent pieces: {Cavity, Toothache, Xray} and {Weather}.
- P(T, X, C, W) = P(T, X, C) P(W)
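The factorization can be sketched by rebuilding the 32-entry joint from its two independent factors. Both tables below are assumed illustrative values (the dental joint from earlier slides reused with X in place of Catch, plus the Weather prior); the point is the parameter count, 8 + 4 numbers instead of 32.

```python
import itertools

# Assumed 8-entry joint over (T, X, C); sums to 1.
p_txc = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}
# Weather prior from the earlier slide.
p_w = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}

# With W independent of (T, X, C), the full joint is just the product.
joint = {(t, x, c, w): p_txc[(t, x, c)] * p_w[w]
         for (t, x, c), w in itertools.product(p_txc, p_w)}

# 8 + 4 numbers specify what would otherwise take 32 entries.
assert len(joint) == 32
assert abs(sum(joint.values()) - 1.0) < 1e-9
```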
46 13.6 Bayes' Rule
47 Bayes' Rule
- Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
- ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
- Or in distribution form: P(Y | X) = P(X | Y) P(Y) / P(X) = αP(X | Y) P(Y)
- Useful for assessing diagnostic probability from causal probability:
  - P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
- E.g., let M be meningitis, S be stiff neck:
  - P(m | s) = P(s | m) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008
- Note: the posterior probability of meningitis is still very small!
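The meningitis computation, checked in code:

```python
# Diagnostic probability from causal probability, with the slide's numbers:
# P(m | s) = P(s | m) P(m) / P(s)
p_s_given_m = 0.8   # P(stiff neck | meningitis)
p_m = 0.0001        # prior P(meningitis)
p_s = 0.1           # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 4))  # 0.0008 -- the posterior is still very small
```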
48 Example 1 (simple case)
- A doctor knows that the disease meningitis causes the patient to have a stiff neck, say, 50% of the time.
- The prior probability that a patient has meningitis is 1/50,000.
- The prior probability that any patient has a stiff neck is 1/20.
- Letting s be the proposition that the patient has a stiff neck and m be the proposition that the patient has meningitis, we have:
  - P(m | s) = P(s | m) P(m) / P(s) = (0.5 × 1/50,000) / (1/20) = 0.0002
50 Example 2 (combining evidence)
51 Conditional Independence
53 Separates
55 Bayes' Rule and Conditional Independence
- P(Cavity | toothache ∧ catch)
  = αP(toothache ∧ catch | Cavity) P(Cavity)
  = αP(toothache | Cavity) P(catch | Cavity) P(Cavity)
- This is an example of a naïve Bayes model:
  - P(Cause, Effect1, ..., Effectn) = P(Cause) Πi P(Effecti | Cause)
- The total number of parameters is linear in n.
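A sketch of the naive Bayes combination above. The conditional tables below are derived from the full joint used on earlier slides (e.g., P(toothache | cavity) = 0.12/0.2 = 0.6), so they are consistent assumptions rather than values stated on this slide.

```python
# Naive Bayes: P(Cavity | toothache, catch)
#   = alpha * P(toothache | Cavity) * P(catch | Cavity) * P(Cavity)
p_cavity = {True: 0.2, False: 0.8}
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_catch_given = {True: 0.9, False: 0.2}       # P(catch | Cavity)

# Unnormalized scores for each value of Cavity, then normalize with alpha.
unnormalized = {c: p_toothache_given[c] * p_catch_given[c] * p_cavity[c]
                for c in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
posterior = {c: alpha * p for c, p in unnormalized.items()}
print({c: round(p, 3) for c, p in posterior.items()})  # {True: 0.871, False: 0.129}
```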
56 Conditioning
- Idea: use conditional probabilities instead of joint probabilities:
  - P(A) = P(A ∧ B) + P(A ∧ ¬B) = P(A | B) P(B) + P(A | ¬B) P(¬B)
- Example: P(symptom) = P(symptom | disease) P(disease) + P(symptom | ¬disease) P(¬disease)
- More generally: P(Y) = Σz P(Y | z) P(z)
- Marginalization and conditioning are useful rules for derivations involving probability expressions.
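Conditioning can be sketched with assumed illustrative numbers for the disease example (the slide gives the formula but no values):

```python
# Conditioning (total probability):
# P(symptom) = P(symptom | disease) P(disease)
#            + P(symptom | no disease) P(no disease)
# All three input numbers are assumed for illustration.
p_disease = 0.01
p_symptom_given_disease = 0.9
p_symptom_given_no_disease = 0.05

p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_no_disease * (1 - p_disease))
print(round(p_symptom, 4))  # 0.0585
```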
57 Conditional Independence
- A and B are conditionally independent given C iff:
  - P(A | B, C) = P(A | C)
  - P(B | A, C) = P(B | C)
  - P(A ∧ B | C) = P(A | C) P(B | C)
- Toothache (T), Spot in X-ray (X), Cavity (C):
  - None of these propositions are independent of one another,
  - but T and X are conditionally independent given C.
58 Conditional Independence (cont.)
- If I have a cavity, the probability that the X-ray shows a spot doesn't depend on whether I have a toothache: P(X | T, C) = P(X | C).
- The same independence holds if I haven't got a cavity: P(X | T, ¬C) = P(X | ¬C).
- Equivalent statements: P(T | X, C) = P(T | C) and P(T, X | C) = P(T | C) P(X | C)
- Write out the full joint distribution (chain rule): P(T, X, C) = P(T | X, C) P(X, C) = P(T | X, C) P(X | C) P(C) = P(T | C) P(X | C) P(C)
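The decomposition means the eight-entry joint can be rebuilt from three small tables; the conditional values below are assumptions consistent with the dental example used throughout.

```python
# Rebuild the joint P(T, X, C) = P(T | C) P(X | C) P(C) using the chain rule
# plus conditional independence of T and X given C. The tables are assumed
# illustrative values.
p_c = {True: 0.2, False: 0.8}
p_t_given_c = {True: 0.6, False: 0.1}   # P(toothache | Cavity)
p_x_given_c = {True: 0.9, False: 0.2}   # P(spot in X-ray | Cavity)

joint = {(t, x, c): (p_t_given_c[c] if t else 1 - p_t_given_c[c])
                    * (p_x_given_c[c] if x else 1 - p_x_given_c[c])
                    * p_c[c]
         for t in (True, False) for x in (True, False) for c in (True, False)}

# Five independent numbers specify all eight joint entries.
assert abs(sum(joint.values()) - 1.0) < 1e-9
print(round(joint[(True, True, True)], 3))  # 0.108
```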
59 Another Example
- Battery is dead (B)
- Radio plays (R)
- Starter turns over (S)
- None of these propositions are independent of one another.
- R and S are conditionally independent given B.
60 Combining Evidence
- We can do the evidence combination sequentially.
61 How Do We Compute the Normalizing Constant (α)?
62 13.7 Wumpus World (Revisited)
- Pij = true iff [i,j] contains a pit
- Bij = true iff [i,j] is breezy
- Include only B1,1, B1,2, B2,1 in the probability model
- Known facts:
  - b = ¬b1,1 ∧ b1,2 ∧ b2,1
  - known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1
- Query: P(P1,3 | known, b)
65 Wumpus World (cont.)
- Key insight: the observed breezes are conditionally independent of the other variables given the known, fringe, and query variables.
- Define Unknown = Fringe ∪ Other; then P(b | P1,3, Known, Unknown) = P(b | P1,3, Known, Fringe).
- The book has the details.
- Bottom line:
  - P(P1,3 | known, b) ≈ <0.31, 0.69>
  - P(P2,2 | known, b) ≈ <0.86, 0.14>
- Query: P(P1,3 | known, b)
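The bottom-line numbers can be reproduced by the fringe summation the slide describes. Assumptions in this sketch: each square contains a pit independently with prior 0.2, and the fringe squares relevant to the observed breezes are [2,2] and [3,1].

```python
# Fringe enumeration for P(P13 | known, b): sum the prior weight of fringe
# configurations consistent with the observed breezes, for each value of
# the query, then normalize. Pit prior 0.2 per square is an assumption of
# this sketch (the book's standard value).
P_PIT = 0.2

def consistent(p13: bool, p22: bool, p31: bool) -> bool:
    """Do these pits explain breezes in [1,2] and [2,1]?"""
    breeze_12 = p13 or p22   # [1,2]'s unknown neighbors are [1,3] and [2,2]
    breeze_21 = p22 or p31   # [2,1]'s unknown neighbors are [2,2] and [3,1]
    return breeze_12 and breeze_21

def weight(*pits: bool) -> float:
    prob = 1.0
    for pit in pits:
        prob *= P_PIT if pit else 1 - P_PIT
    return prob

scores = {}
for p13 in (True, False):
    scores[p13] = sum(weight(p13, p22, p31)
                      for p22 in (True, False) for p31 in (True, False)
                      if consistent(p13, p22, p31))
alpha = 1.0 / sum(scores.values())
print({k: round(alpha * v, 2) for k, v in scores.items()})  # {True: 0.31, False: 0.69}
```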
71 Some Problems for Fun
- Prove the following:
  - P(¬A) = 1 - P(A)
  - P(A ∨ B ∨ C) = P(A) + P(B) + P(C) - P(A ∧ B) - P(A ∧ C) - P(B ∧ C) + P(A ∧ B ∧ C)
- Show that P(A) ≥ P(A, B)
- Show that P(A | B) + P(¬A | B) = 1
- Show that the different formulations of conditional independence are equivalent:
  - P(A | B, C) = P(A | C)
  - P(B | A, C) = P(B | C)
  - P(A ∧ B | C) = P(A | C) P(B | C)
- Conditional Bayes' rule: write an expression for P(A | B, C) in terms of P(B | A, C).