Title: Introduction to Reasoning under Uncertainty
1. Introduction to Reasoning under Uncertainty
MIND Lab Seminars on Reasoning and Planning under Uncertainty
- Ugur Kuter
- MIND Lab
- 8400 Baltimore Avenue, Ste. 200
- College Park, Maryland, 20742
- Web site for the seminars: http://www.cs.umd.edu/users/ukuter/uncertainty/
2. From Classical Logic to Uncertainty
- Logical reasoning is the process of deriving previously-unknown facts from the known ones
- However, it is not always possible to have access to the entire set of facts needed for the reasoning
- What we lose under uncertainty:
- Deduction (reasoning in stages)
- Incrementality
- Modality, Locality, Detachment
3. Existing Approaches
- Semantic Models for Uncertainty
- Rule-based systems (i.e., expert systems)
- Organize the knowledge in terms of if-then rules, each associated with a numeric certainty measure
- Suffer from most of the problems of classical logic under uncertainty
- Declarative Systems
- Organize knowledge based on likelihood, relevance, and causation among events
- Flexible under uncertainty
4. Basic Terminology
- Random variables describe the features of the world whose status may or may not be known
- Propositions are bi-modal (i.e., Boolean) random variables
- We will assume a finite set of propositional symbols, denoted as A, B, ...
- Sometimes, we will use English sentences to denote propositions
- Without loss of generality, we will assume that the world can be described by a finite set of propositions
- An event in the world is an occurrence in the world that is modeled by assigning truth values to one or more propositions
- e.g., "the outcomes of rolling two dice are the same"
5. Probability Theory
- Probabilities as a way of representing the structure of knowledge and reasoning over that knowledge
- Notation: P(A) is the probability that the proposition A is true in the knowledge base (or, alternatively, the probability that the event A occurs in the world)
- The Axioms of Probability Theory:
- 0 ≤ P(A) ≤ 1
- P(certain event) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
6. Joint Probability Distributions
- A joint event describes two occurrences at the same time
- e.g., (A and B) specifies that both propositions A and B are true in the world
- A joint probability distribution over a set of random variables specifies a probability for each possible combination of values for those variables
- e.g., a joint probability distribution for Boolean variables X and Y specifies a probability for four cases:
- (X and Y), (X and ¬Y), (¬X and Y), and finally (¬X and ¬Y)
- The joint probabilities of the cases must sum to 1
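The four-case distribution for Boolean X and Y can be written out explicitly as a small table; a minimal sketch in Python, where the probability values are made up purely for illustration:

```python
# A hypothetical joint distribution over Boolean variables X and Y.
# Keys are (X, Y) truth-value pairs; values are their probabilities.
joint = {
    (True, True):   0.30,  # P(X and Y)
    (True, False):  0.20,  # P(X and not Y)
    (False, True):  0.10,  # P(not X and Y)
    (False, False): 0.40,  # P(not X and not Y)
}

# The probabilities of all cases must sum to 1.
total = sum(joint.values())
assert abs(total - 1.0) < 1e-9
```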
7. Absolute Independence
- Suppose two events in a joint probability distribution are independent of each other (i.e., the occurrence of one does not change the probability of the other)
- Then we can decompose the joint probability distribution into smaller distributions
- The joint probability of two events under absolute independence:
- P(A and B) = P(A) × P(B)
- However, in most problems, absolute independence does not hold
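The decomposition benefit can be seen in a two-coin sketch (the coin interpretation and the numbers are illustrative): under independence, the joint needs no extra parameters beyond the two marginals.

```python
# Under absolute independence the joint factors into marginals:
# P(A and B) = P(A) * P(B).  Illustrative numbers.
p_a = 0.5             # e.g., one fair coin lands heads
p_b = 0.5             # e.g., an unrelated fair coin lands heads
p_a_and_b = p_a * p_b # the joint, computed from marginals alone

# For n independent Boolean variables, n parameters suffice,
# instead of the 2**n entries of a full joint distribution.
```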
8. Set-theoretic Interpretation of Probabilities
- W: the set of all possible worlds described by the knowledge base
- W_A: the set of possible worlds in which A is true
- The probability of A being true is the proportion of W_A to W
- Set-theoretic operations correspond to the logical connectives, i.e.:
- A and B ↔ W_A ∩ W_B
- A or B ↔ W_A ∪ W_B
- ¬A ↔ W - W_A
9. Revisiting the Third Axiom
- P(A or B) = P(A) + P(B) - P(A and B)
- that is, the proportion of W_A ∪ W_B to W
- If A and B are mutually-exclusive events (i.e., W_A ∩ W_B = ∅), then we have
- P(A or B) = P(A) + P(B)
[Venn diagram of the sets W_A and W_B]
10. Using the Axioms of Probability
- Rule via the union of joint events:
- P(A) = P(A and B) + P(A and ¬B),
- since
- A and B ↔ W_A ∩ W_B
- A and ¬B ↔ W_A - W_B
- More generally,
- P(A) = Σ_i P(A and B_i),
- where B_1, B_2, ..., B_k is an exhaustive set of mutually-exclusive events
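The union rule recovers a marginal probability from a joint distribution; a minimal sketch over two Boolean variables (the joint values are made-up numbers):

```python
# Marginalization: P(A) = P(A and B) + P(A and not B).
# Illustrative joint distribution over Booleans A and B:
joint = {
    (True, True):   0.08,  # P(A and B)
    (True, False):  0.12,  # P(A and not B)
    (False, True):  0.32,
    (False, False): 0.48,
}

# Sum the joint over every case in which A is true.
p_a = sum(p for (a, _b), p in joint.items() if a)
assert abs(p_a - 0.20) < 1e-9
```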
11. Using the Axioms of Probability
- Rule for absolute falsity:
- P(A) = P(A and A) + P(A and ¬A), by the rule of the union of joint events
- P(A) = P(A) + P(false), by logical equivalence
- P(false) = 0, by algebra
- Rule for negation:
- P(A or ¬A) = P(A) + P(¬A) - P(A and ¬A), by axiom 3
- P(true) = P(A) + P(¬A) - P(false), by logical equivalence
- 1 = P(A) + P(¬A) - 0, by axiom 2
- P(A) = 1 - P(¬A), by algebra
12. Conditional Probabilities
- A conditional probability, P(A | B), describes the belief in the event A, under the assumption that another event B is known with absolute certainty
- Formal definition:
- P(A | B) = P(A and B) / P(B)
- That is, the proportion of W_A ∩ W_B to W_B
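The definition can be exercised directly on a joint table, with P(B) itself obtained by marginalization; a short sketch with illustrative numbers:

```python
# P(A | B) = P(A and B) / P(B).  Illustrative joint over Booleans A, B.
joint = {
    (True, True):   0.08,  # P(A and B)
    (True, False):  0.12,
    (False, True):  0.32,  # P(not A and B)
    (False, False): 0.48,
}

p_b = joint[(True, True)] + joint[(False, True)]  # P(B) by marginalization
p_a_given_b = joint[(True, True)] / p_b           # restrict to B-worlds
assert abs(p_a_given_b - 0.20) < 1e-9
```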
13. The Bayesians (1763 -- present)
- The basis of Bayesian theory is conditional probabilities
- Bayesian theory sees a conditional probability as a way to describe the structure and organization of human knowledge
- In this view, A | B stands for the event A in the context of the event B
- e.g., the symptom A in the context of a disease B
14. Mathematicians vs. Bayesians, i.e.,
- P(A | B) = P(A and B) / P(B) vs. P(A and B) = P(A | B) P(B)
- Example:
- The probability of the event A = "the outcomes of two dice are equal"
- A mathematician would compute via the rule for joint events A and B_i, where each B_i is the event "the outcome of the first die is i":
- P(A) = Σ_i P(A and B_i)
- = 6 × (1/36)
- = 1/6
15. Mathematicians vs. Bayesians, cont'd
- The Bayesian mindset for dice rolling:
- P(Equality) = Σ_i P(outcome of the second die is i | B_i) P(B_i)
- = 6 × (1/6)(1/6)
- = 1/6
- This is more natural for the assumption-based mental processes of human reasoning, i.e., reasoning of the form "given that I know ..."
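Both routes from the two slides above can be checked mechanically; a sketch using exact rational arithmetic so the two answers compare without rounding error:

```python
from fractions import Fraction

# Mathematician's route: P(A) = sum_i P(A and B_i) = 6 * (1/36).
p_math = sum(Fraction(1, 36) for _ in range(6))

# Bayesian's route: P(A) = sum_i P(second die is i | B_i) * P(B_i)
#                        = 6 * (1/6) * (1/6).
p_bayes = sum(Fraction(1, 6) * Fraction(1, 6) for _ in range(6))

assert p_math == p_bayes == Fraction(1, 6)
```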
16. The CHAIN Rule
- The probability that a joint event (A_1, ..., A_k) occurs can be computed via the conditional probabilities:
- P(A_1, ..., A_k) = P(A_k | A_1, ..., A_k-1)
- × P(A_k-1 | A_1, ..., A_k-2)
- × ...
- × P(A_2 | A_1) P(A_1)
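For k = 3, the chain rule is just a product of one prior and two conditionals; a minimal sketch where the conditional probabilities are hypothetical numbers:

```python
# Chain rule for k = 3:
# P(A1, A2, A3) = P(A1) * P(A2 | A1) * P(A3 | A1, A2).
p_a1 = 0.5              # P(A1)            (illustrative values)
p_a2_given_a1 = 0.4     # P(A2 | A1)
p_a3_given_a1_a2 = 0.3  # P(A3 | A1, A2)

p_joint = p_a1 * p_a2_given_a1 * p_a3_given_a1_a2  # ~ 0.06
```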
17. Evidential Reasoning: The Inversion Rule
- Reasoning about hypotheses and the evidence that does or does not support them is the main venue for Bayesian inference
- P(H | e): given that I know about an evidence e, the probability that my hypothesis H is true
- P(H | e) = P(e | H) P(H) / P(e),
- where P(e | H) is the probability that evidence e will actually be observed in the world, if the hypothesis H is true
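The inversion rule in action, using the disease/symptom reading of H and e from these slides; all numeric values below are hypothetical:

```python
# Inversion (Bayes) rule: P(H | e) = P(e | H) * P(H) / P(e).
p_h = 0.01             # prior: 1% of patients have the disease
p_e_given_h = 0.90     # the symptom is observed in 90% of sick patients
p_e_given_not_h = 0.10 # ... and in 10% of healthy patients

# P(e) by marginalizing over H and not-H:
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

p_h_given_e = p_e_given_h * p_h / p_e  # posterior ~ 0.083
```

Note how the small prior keeps the posterior low even though the symptom is much more likely under the disease hypothesis.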
18. Pooling of Evidence
- Suppose the hypothesis H is that a patient has a particular disease
- Let e_1, ..., e_k be the possible symptoms of that disease
- If we observed some of the symptoms but not all of them, then the combined belief that the hypothesis is true can be computed from the odds
- P(e_1, ..., e_k | H) P(H) / P(e_1, ..., e_k | ¬H) P(¬H)
- This requires an exponential number of conditional probabilities to specify
19. Naïve Bayes
- Use a conditional independence assumption
- e.g., whether we observe a symptom or not depends only on whether the patient has the disease, not on the other symptoms
- Then the conditional probability P(e_1, ..., e_k | H) can be computed as
- P(e_1, ..., e_k | H) = Π_i P(e_i | H)
20. Incremental Bayesian Updating
- Let H be a hypothesis and E = e_1, ..., e_k be the past data (evidence) observed for the hypothesis H
- Suppose we observed a new piece of data (evidence) e
- What is the probability that the hypothesis is true, given the past and the new evidence?
- Computing this probability directly is inefficient because it requires storing all the past evidence and combining it with the new evidence
- Bayesian theory allows us to reformulate the question as follows:
- How do we update our belief in H given the new evidence and our past belief in H?
21. Incremental Bayesian Updating (cont'd)
- Combining prior beliefs with new evidence:
- P(H | E and e) = P(H | E) P(e | E, H) / P(e | E)
- Assuming conditional independence between the new evidence and the old ones given the hypothesis (i.e., P(e | E, H) = P(e | H)), this becomes
- P(H | E and e) = α P(H | E) P(e | H)
- where α = 1 / P(e | E) is a normalization constant, P(H | E and e) is the updated belief in H, P(H | E) is the prior belief in H, and P(e | H) is the probability that the new evidence will be observed given the hypothesis
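The update formula can be folded over a stream of observations, storing only the current belief rather than the past evidence; a sketch in which the per-observation likelihoods are hypothetical:

```python
def update(belief_h, p_e_given_h, p_e_given_not_h):
    """One incremental step: P(H | E, e) is proportional to
    P(H | E) * P(e | H), assuming e is conditionally independent
    of the past evidence E given H.  Returns the normalized belief."""
    unnorm_h = belief_h * p_e_given_h
    unnorm_not_h = (1 - belief_h) * p_e_given_not_h
    return unnorm_h / (unnorm_h + unnorm_not_h)

# Fold in three observations one at a time; only `belief` is stored,
# never the evidence itself.  Likelihood pairs are illustrative.
belief = 0.01  # prior P(H)
for p_pos, p_neg in [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]:
    belief = update(belief, p_pos, p_neg)
```

Each observation that is more likely under H than under ¬H pushes the belief upward; the order of the updates does not matter under the independence assumption.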
22. Hierarchical Models
- So far, we assumed that the evidence from the world is directly linked to the hypothesis we are evaluating
- e.g., a disease and its symptoms
- In many problems, this is not the case
[Diagram: a hypothesis node H connected to evidence nodes e_1, e_2, ..., e_k]
23. 1. Cascading Inference
- What is the probability that a burglary took place, given the two neighbors' testimonies?
- P(B | G, W) = α P(G, W | B) P(B)
- = α P(B) ( P(G, W | B, S=true) P(S=true | B) + P(G, W | B, S=false) P(S=false | B) )
- From conditional independence, we have
- P(G, W | B, S=true/false) = P(G | S) P(W | S)
- Thus, we have
- P(B | G, W) = α P(B) Σ_{S=true,false} P(G | S) P(W | S) P(S | B)
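The cascaded computation can be sketched end to end: sum out the intermediate alarm variable S, then normalize over B. Every numeric probability below is hypothetical; only the structure of the formula comes from the slide.

```python
# Cascading inference: P(B | G, W) is proportional to
# P(B) * sum_S P(G | S) * P(W | S) * P(S | B).
p_b_prior = 0.001                        # P(B=true), illustrative
p_s_given_b = {True: 0.95, False: 0.01}  # P(S=true | B=b)
p_g_given_s = {True: 0.90, False: 0.05}  # P(G=true | S=s)
p_w_given_s = {True: 0.80, False: 0.10}  # P(W=true | S=s)

def unnormalized(b):
    """P(B=b) * sum over S of P(G | S) P(W | S) P(S | B=b)."""
    prior = p_b_prior if b else 1 - p_b_prior
    total = 0.0
    for s in (True, False):
        p_s = p_s_given_b[b] if s else 1 - p_s_given_b[b]
        total += p_g_given_s[s] * p_w_given_s[s] * p_s
    return prior * total

scores = {b: unnormalized(b) for b in (True, False)}
p_b_given_gw = scores[True] / (scores[True] + scores[False])
```

With these numbers, the two testimonies raise the belief in a burglary well above its tiny prior, without ever observing the alarm S directly.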
24. 2. Predicting the Future
- The daughter will call with some probability if she hears the alarm sound
- What is the probability that the daughter will call, given the testimonies of the neighbors?
25. Predicting the Future
- The probability that the daughter will call is
- P(D | e) = Σ_{S=true,false} P(D | S) P(S | e)
- By the inversion rule,
- P(S=true/false | e) = α P(e | S=true/false) P(S=true/false)
- where P(e | S) = P(G | S) P(W | S) as before, and
- P(S) = Σ_{B=true,false} P(S | B) P(B)
26. 3. Explaining Away
- If an event has multiple causes, sometimes the occurrence of one cause reduces our belief in the occurrence of the other
- If the alarm is sensitive enough to go off when an earthquake occurs, then the occurrence of the earthquake explains away the burglary hypothesis
27. Interactions between Multiple Causes
- Conditional Probability Tables (CPTs)
- e.g., a CPT for the alarm S with causes B and E specifies P(S | B, E), P(S | ¬B, E), P(S | B, ¬E), and P(S | ¬B, ¬E)
- The size of a CPT is exponential in the number of causes
- i.e., if there are k causes of an event X, then the CPT for X has 2^k entries
- This creates serious efficiency problems for exact reasoning algorithms
- Approximation techniques have been developed for reasoning over incomplete CPTs
- e.g., Noisy-OR (NOR), Generalized-NOR, Recursive-NOR
- We will discuss these techniques later
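As a preview, the basic Noisy-OR model replaces the 2^k-entry CPT with one parameter per cause: the probability that the cause, acting alone, triggers the event. This sketch uses illustrative parameter values; the later seminar covers the technique properly.

```python
# Noisy-OR: P(X | active causes) = 1 - product over active causes i
# of (1 - p_i), where p_i is the chance that cause i alone triggers X.
def noisy_or(cause_params, active):
    p_no_trigger = 1.0
    for p_i, is_active in zip(cause_params, active):
        if is_active:
            p_no_trigger *= 1.0 - p_i  # cause i fails to trigger X
    return 1.0 - p_no_trigger

params = [0.9, 0.7]                        # e.g., burglary, earthquake
p_both = noisy_or(params, [True, True])    # ~ 0.97
p_none = noisy_or(params, [False, False])  # 0: no active cause, no event
```

k parameters replace 2^k CPT entries, at the cost of assuming the causes act independently and that X cannot occur without some active cause.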
28. Summary
- Reasoning with classical logic has problems with plausible reasoning under uncertainty
- Probability theory provides a machinery for uncertain reasoning, but it is often very inefficient to depend solely on probability theory
- Exponential-sized CPTs
- No structural reasoning over the knowledge
- Bayesians use probability theory to describe the organization (i.e., the structure) of human knowledge and to develop reasoning machinery over such structures
- Hierarchical modeling
- Next week:
- Structural Models of the Bayesians: Networks of Plausible Reasoning