Title: Uncertainty
1. Uncertainty
2. Outline
- Uncertainty
- Probability
- Syntax and Semantics
- Inference
- Independence and Bayes' Rule
3. Sources of Uncertainty
- Information is partial
- Information is not fully reliable.
- Representation language is inherently imprecise.
- Information comes from multiple sources and may be conflicting.
- Information is approximate.
- Non-absolute cause-effect relationships exist.
4. Basic Probability
- Probability theory enables us to make rational decisions.
- Which mode of transportation is safer: car or plane?
- What is the probability of an accident?
5. Basic Probability Theory
- An experiment has a set of potential outcomes, e.g., throwing a die.
- The sample space of an experiment is the set of all possible outcomes, e.g., {1, 2, 3, 4, 5, 6}.
- An event is a subset of the sample space (see the sketch below), e.g.:
  - {2}
  - {3, 6}
  - even = {2, 4, 6}
  - odd = {1, 3, 5}
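A minimal Python sketch of these definitions (the names are my own, not from the slides):

```python
from fractions import Fraction

# Sample space of a fair die: the set of all possible outcomes.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space),
    assuming all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
odd = {1, 3, 5}

print(prob({2}))     # 1/6
print(prob({3, 6}))  # 1/3
print(prob(even))    # 1/2
print(prob(odd))     # 1/2
```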
6. Language of probability
- Random variables: Boolean or discrete
  - e.g., Cavity (do I have a cavity?)
  - e.g., Weather is one of ⟨sunny, rainy, cloudy, snow⟩
- Domain values must be exhaustive and mutually exclusive.
- Elementary propositions
  - e.g., Weather = sunny, Cavity = false (or ¬cavity)
- Complex propositions are formed from elementary propositions and standard logical connectives
  - e.g., Weather = sunny ∨ Cavity = false
7. Language of probability
- Atomic event: a complete specification of the state of the world about which the agent is uncertain.
- E.g., if the world consists of only two Boolean variables, Cavity and Toothache, then there are 4 distinct atomic events:
  - Cavity = false ∧ Toothache = false
  - Cavity = false ∧ Toothache = true
  - Cavity = true ∧ Toothache = false
  - Cavity = true ∧ Toothache = true
8. Axioms of probability
- For any propositions A, B:
  - 0 ≤ P(A) ≤ 1
  - P(true) = 1 and P(false) = 0
  - P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
9. Prior probability
- Prior or unconditional probabilities of propositions:
  - e.g., P(Cavity = true) = 0.1
  - P(Weather = sunny) = 0.72
- These express belief prior to the arrival of any (new) evidence.
- Notation for a prior probability distribution:
  - E.g., suppose the domain of Weather is {sunny, rain, cloudy, snow}.
  - We may write P(Weather) = ⟨0.7, 0.2, 0.08, 0.02⟩ (note we use bold P in this case)
  - instead of
    - P(Weather = sunny) = 0.7
    - P(Weather = rain) = 0.2
    - ...
10. Prior probability
- The joint probability distribution for a set of random variables gives the probability of every atomic event over those variables.
- P(Weather, Cavity) is a 4 × 2 matrix of values:

  Weather =        sunny   rainy   cloudy   snow
  Cavity = true    0.144   0.02    0.016    0.02
  Cavity = false   0.576   0.08    0.064    0.08

- All questions about a domain can be answered by the full joint distribution (see the sketch below).
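As an illustration, the 4 × 2 table can be stored as a Python dictionary and queried by summing the matching entries (the names are my own):

```python
# Full joint distribution P(Weather, Cavity) from the table above.
# Keys are (weather, cavity) pairs.
joint = {
    ('sunny',  True): 0.144, ('rainy',  True): 0.02,
    ('cloudy', True): 0.016, ('snow',   True): 0.02,
    ('sunny',  False): 0.576, ('rainy',  False): 0.08,
    ('cloudy', False): 0.064, ('snow',   False): 0.08,
}

# Any question about the domain is a sum over matching entries, e.g.:
p_sunny = sum(p for (w, c), p in joint.items() if w == 'sunny')
p_cavity = sum(p for (w, c), p in joint.items() if c)

print(p_sunny)   # 0.72
print(p_cavity)  # 0.2
```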
11. Example
- 100 attempts are made to swim a length in 30 seconds. The swimmer succeeds on 20 occasions; therefore the probability that the swimmer can complete the length in 30 seconds is estimated as 20/100 = 0.2.
- P(failure) = 1 − 0.2 = 0.8
12. Conditional probability
- Conditional or posterior probabilities
  - e.g., P(cavity | toothache) = 0.8
  - i.e., given that toothache is all I know
- If we know more, e.g., cavity is also given, then we have
  - P(cavity | toothache, cavity) = 1
- New evidence may be irrelevant, allowing simplification, e.g.,
  - P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8
13. Conditional probability
- Definition of conditional probability:
  - P(a | b) = P(a ∧ b) / P(b), if P(b) > 0
- The definition shows that conditional probabilities can be computed from unconditional probabilities.
- Product rule: joint probability in terms of conditional probability (sketched in code below):
  - P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
14. Probabilistic Reasoning
- Evidence
  - What we know about a situation.
- Hypothesis
  - What we want to conclude.
- Compute
  - P(Hypothesis | Evidence)
15. Credit Card Authorization
- E is the data about the applicant's age, job, education, income, credit history, etc.
- H is the hypothesis that the credit card will provide a positive return.
- The decision of whether to issue the credit card to the applicant is based on the probability P(H | E).
16. Medical Diagnosis
- E is a set of symptoms, such as coughing, sneezing, headache, ...
- H is a disorder, e.g., common cold, SARS, flu.
- The diagnosis problem is to find an H (disorder) such that P(H | E) is maximum.
17. How to Compute P(A | B)?
[Venn diagram: sample space containing overlapping events A and B]
18. Business Students
- Of 100 students completing a course, 20 were business majors. 10 students received an A in the course, and 3 of these were business majors. Suppose A is the event that a randomly selected student got an A in the course, and B is the event that a randomly selected student is a business major. What is the probability of A? What is the probability of A after knowing B is true?
19. Cont'd
- If you look at the picture on the last slide, you can see clearly that
  - P(A | B) = 3/20 = 0.15
- More formally, you can also calculate it (as in the sketch below) by
  - P(A | B) = P(A, B) / P(B) = 0.03 / 0.2 = 0.15
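The same arithmetic, verified in a few lines of Python (the variable names are my own):

```python
# Counts from the problem statement.
students, business_majors, a_and_business = 100, 20, 3

p_b = business_majors / students        # P(B) = 0.2
p_a_and_b = a_and_business / students   # P(A, B) = 0.03
print(p_a_and_b / p_b)                  # P(A | B) = 0.15
```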
20. Inference by enumeration
- Start with the full joint probability distribution, e.g., the dentist domain:

                   toothache          ¬toothache
                   catch    ¬catch    catch    ¬catch
  cavity           0.108    0.012     0.072    0.008
  ¬cavity          0.016    0.064     0.144    0.576

- For any proposition φ, sum the probabilities of the atomic events where it is true: P(φ) = Σ_{ω ⊨ φ} P(ω)
- E.g., P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2 (see the sketch below)
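A minimal sketch of P(φ) = Σ_{ω ⊨ φ} P(ω) in Python; the table values (including the ¬toothache half, which follows the standard textbook completion) are written out as a dictionary, and the names are my own:

```python
# Full joint P(Cavity, Toothache, Catch); keys are
# (cavity, toothache, catch) truth-value triples.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def p(phi):
    """P(phi): sum of P(omega) over atomic events omega where phi holds."""
    return sum(pr for omega, pr in joint.items() if phi(*omega))

print(p(lambda cavity, toothache, catch: toothache))  # 0.2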
22. Inference by enumeration
- Start with the full joint probability distribution (table above).
- Can also compute conditional probabilities:
  - P(¬cavity | toothache)
    = P(¬cavity ∧ toothache) / P(toothache)
    = (0.016 + 0.064) / 0.2
    = 0.4
23. Normalization
- Denote 1/P(toothache) by α, which can be viewed as a normalization constant for the distribution P(Cavity | toothache), ensuring it adds up to 1.
- We thus write:
  - P(Cavity | toothache) = α P(Cavity, toothache)
    = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    = α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩]
    = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
- General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables (see the sketch below).
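A sketch of the same normalization step, self-contained so it repeats the joint table from the earlier sketch:

```python
# Full joint as before; keys are (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Unnormalized P(Cavity, toothache): sum out the hidden variable Catch.
unnormalized = {
    cavity: sum(pr for (c, t, _), pr in joint.items() if c == cavity and t)
    for cavity in (True, False)
}

# alpha = 1 / P(toothache) makes the distribution sum to 1.
alpha = 1 / sum(unnormalized.values())
posterior = {cavity: alpha * pr for cavity, pr in unnormalized.items()}
print(posterior)  # {True: 0.6, False: 0.4}, up to float rounding
```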
24. Inference by enumeration
- Typically, we are interested in
  - the posterior joint distribution of the query variables Y
  - given specific values e for the evidence variables E.
- Let the hidden variables be H = X − Y − E.
- Then the required summation of joint entries is done by summing out the hidden variables:
  - P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
- The terms in the summation are joint entries because Y, E, and H together exhaust the set of random variables. (A generic version is sketched below.)
- Obvious problems:
  - Worst-case time complexity O(d^n), where d is the largest arity and n the number of variables
  - Space complexity O(d^n) to store the joint distribution
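A generic version of this query procedure, sketched under my own naming conventions; it is the brute-force O(d^n) enumeration, not an optimized algorithm:

```python
from typing import Dict, List, Tuple

def enumerate_ask(joint: Dict[Tuple, float],
                  variables: List[str],
                  query: str,
                  evidence: Dict[str, object]) -> Dict[object, float]:
    """P(query | evidence) by summing out every hidden variable.

    `joint` maps full assignments (one value per variable, in order)
    to probabilities. Worst-case time and space are O(d^n).
    """
    qi = variables.index(query)
    unnormalized: Dict[object, float] = {}
    for assignment, p in joint.items():
        world = dict(zip(variables, assignment))
        if all(world[v] == val for v, val in evidence.items()):
            y = assignment[qi]
            unnormalized[y] = unnormalized.get(y, 0.0) + p
    alpha = 1.0 / sum(unnormalized.values())
    return {y: alpha * p for y, p in unnormalized.items()}

# Dentist joint over (Cavity, Toothache, Catch), as before.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
print(enumerate_ask(joint, ['Cavity', 'Toothache', 'Catch'],
                    'Cavity', {'Toothache': True}))
# {True: 0.6, False: 0.4}, up to float rounding
```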
25. Bayes' Rule
- Product rule: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
- ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
- or in distribution form:
  - P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
- Useful for assessing diagnostic probability from causal probability:
  - P(Cause | Effect) = P(Effect | Cause) P(Cause) / P(Effect)
- E.g., let m be meningitis, s be stiff neck:
  - P(m | s) = P(s | m) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008 (checked in the sketch below)
- Note: the posterior probability of meningitis is still very small!
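A trivial sketch checking the meningitis numbers (variable names are mine):

```python
p_s_given_m = 0.8  # P(s | m): stiff neck given meningitis
p_m = 0.0001       # P(m): prior probability of meningitis
p_s = 0.1          # P(s): prior probability of stiff neck

# Bayes' rule: diagnostic probability from causal probability.
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # 0.0008
```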
26. Exercise
A patient takes a lab test and the result comes back positive. The test has a false negative rate of 2% and a false positive rate of 3%. Furthermore, 0.01% of the entire population has this disease. What is the probability of disease if we know the test result is positive? Some info (below, d stands for disease, t for a positive test result and ¬t for a negative one):
P(t | d) = 0.98
P(t | ¬d) = 0.03
P(d) = 0.0001
27. Rough calculation: if 10,000 people take the test, we expect 1 to have the disease, and that person is likely to test positive. Of the remaining people, who do not have the disease, about 300 will test positive anyway. So the chance that a positive test indicates disease is roughly 1/300, a very small number.
28. More precisely, with P(t | d) = 0.98, P(t | ¬d) = 0.03, and P(d) = 0.0001, we get (see the sketch below):
P(d | t) = P(t | d) P(d) / P(t)                                [Bayes' rule]
         = P(t | d) P(d) / [P(t, d) + P(t, ¬d)]                [summing out]
         = P(t | d) P(d) / [P(t | d) P(d) + P(t | ¬d) P(¬d)]   [product rule]
         = 0.98 × 0.0001 / (0.98 × 0.0001 + 0.03 × 0.9999)
         ≈ 0.00325
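The same computation as a short sketch (variable names are mine):

```python
p_t_given_d = 0.98      # sensitivity: 1 - false negative rate
p_t_given_not_d = 0.03  # false positive rate
p_d = 0.0001            # prevalence of the disease

# P(t) by summing out the disease variable (product rule on each term).
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)

print(p_t_given_d * p_d / p_t)  # P(d | t) ≈ 0.00325
```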
29. Independence
- A and B are independent iff
  - P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)
- P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
  - 32 entries reduced to 12
- Absolute independence is powerful but rare.
- Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
30. Conditional independence
- P(Toothache, Cavity, Catch) has 2^3 − 1 = 7 independent entries.
- If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  - (1) P(catch | toothache, cavity) = P(catch | cavity)
- The same independence holds if I haven't got a cavity:
  - (2) P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
- Catch is conditionally independent of Toothache given Cavity:
  - P(Catch | Toothache, Cavity) = P(Catch | Cavity)
- Equivalent statements:
  - P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  - P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
31. Conditional independence cont'd
- Write out the full joint distribution using the chain rule:
  - P(Toothache, Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
    = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
- I.e., 2 + 2 + 1 = 5 independent numbers (see the sketch below)
- In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
- Conditional independence is our most basic and robust form of knowledge about uncertain environments.
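A sketch showing how five numbers recover the full eight-entry joint; the parameter values below are the ones implied by the dentist table used on earlier slides, and the code itself is my own illustration:

```python
from itertools import product

# Five independent numbers (derived from the dentist joint table).
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}  # P(toothache | Cavity)
p_catch_given     = {True: 0.9, False: 0.2}  # P(catch | Cavity)

# Rebuild the full joint from the factorization
# P(T, C, Cav) = P(T | Cav) P(C | Cav) P(Cav).
joint = {}
for cavity, toothache, catch in product([True, False], repeat=3):
    p = p_cavity if cavity else 1 - p_cavity
    p *= p_toothache_given[cavity] if toothache else 1 - p_toothache_given[cavity]
    p *= p_catch_given[cavity] if catch else 1 - p_catch_given[cavity]
    joint[(cavity, toothache, catch)] = p

print(joint[(True, True, True)])     # 0.108
print(joint[(False, False, False)])  # 0.576
```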
32. Naïve Bayes model
- This is an example of a naïve Bayes model:
  - P(Cause, Effect_1, …, Effect_n) = P(Cause) Π_i P(Effect_i | Cause)
- This is correct if the Effects are all conditionally independent given Cause.
- The total number of parameters is linear in n (see the sketch below).
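A minimal naïve Bayes sketch with illustrative parameters of my own (not from the slides):

```python
def naive_bayes_posterior(p_cause, p_effect_given, observed):
    """P(Cause | observed effects), assuming the effects are
    conditionally independent given Cause (the naïve Bayes assumption).

    p_effect_given[c][i] = P(effect_i = true | Cause = c).
    """
    unnormalized = {}
    for c in (True, False):
        p = p_cause if c else 1 - p_cause
        for pe, obs in zip(p_effect_given[c], observed):
            p *= pe if obs else 1 - pe
        unnormalized[c] = p
    alpha = 1 / sum(unnormalized.values())
    return {c: alpha * p for c, p in unnormalized.items()}

# One Boolean cause, three effects: 2n + 1 = 7 parameters, linear in n.
print(naive_bayes_posterior(
    p_cause=0.2,
    p_effect_given={True: [0.9, 0.6, 0.3], False: [0.2, 0.1, 0.3]},
    observed=[True, True, False],
))
```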
33. Summary
- Probability is a rigorous formalism for uncertain knowledge.
- The joint probability distribution specifies the probability of every atomic event.
- Queries can be answered by summing over atomic events.
- For nontrivial domains, we must find a way to reduce the size of the joint distribution.
- Independence and conditional independence provide the tools.