Title: Introduction to Reasoning under Uncertainty
1. Introduction to Reasoning under Uncertainty
MIND Lab Seminars on Reasoning and Planning under Uncertainty
- Ugur Kuter
- MIND Lab
- 8400 Baltimore Avenue, Ste. 200
- College Park, Maryland, 20742
- Web site for the seminars: http://www.cs.umd.edu/users/ukuter/uncertainty/
2. From Classical Logic to Uncertainty
- Logical reasoning is the process of deriving previously-unknown facts from the known ones
- However, it is not always possible to have access to the entire set of facts needed for the reasoning
- What we lose under uncertainty:
- Deduction (reasoning in stages)
- Incrementality
- Modality, Locality, Detachment
3. Existing Approaches
- Semantic Models for Uncertainty
- Rule-based systems (i.e., expert systems)
- Organize the knowledge in terms of if-then rules, each associated with a numeric certainty measure
- Suffer from most of the problems of classical logic under uncertainty
- Declarative Systems
- Organize knowledge based on likelihood, relevance, and causation among events
- Flexible under uncertainty
4. Basic Terminology
- Random variables describe the features of the world whose status may or may not be known
- Propositions are bi-modal (i.e., Boolean) random variables
- We will assume a finite set of propositional symbols, denoted as A, B, ...
- Sometimes, we will use English sentences to denote propositions
- Without loss of generality, we will assume that the world can be described by a finite set of propositions
- An event in the world is an occurrence in the world that is modeled by assigning truth values to one or more propositions
- e.g., "the outcomes of rolling two dice are the same"
5. Probability Theory
- Probabilities as a way of representing the structure of knowledge and reasoning over that knowledge
- Notation: P(A) is the probability that the proposition A is true in the knowledge base (or, alternatively, the probability that the event A occurs in the world)
- The Axioms of Probability Theory:
- 0 ≤ P(A) ≤ 1
- P(certain event) = 1
- P(A or B) = P(A) + P(B) - P(A and B)
6. Joint Probability Distributions
- A joint event describes two occurrences at the same time
- e.g., (A and B) specifies that both propositions A and B are true in the world
- A joint probability distribution over a set of random variables specifies a probability for each possible combination of values for those variables
- e.g., a joint probability distribution for Boolean variables X and Y specifies a probability for four cases:
- (X and Y), (X and ¬Y), (¬X and Y), and finally (¬X and ¬Y)
- The joint probabilities of the cases must sum to 1
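The four-case distribution for Boolean X and Y can be written out explicitly as a small table; a minimal sketch in Python, where the probability values are made up purely for illustration:

```python
# A hypothetical joint distribution over Boolean variables X and Y.
# Keys are (X, Y) truth-value pairs; values are their probabilities.
joint = {
    (True, True):   0.30,  # P(X and Y)
    (True, False):  0.20,  # P(X and not Y)
    (False, True):  0.10,  # P(not X and Y)
    (False, False): 0.40,  # P(not X and not Y)
}

# The probabilities of all cases must sum to 1.
total = sum(joint.values())
assert abs(total - 1.0) < 1e-9
```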
7. Absolute Independence
- Suppose two events in a joint probability distribution are independent of each other (i.e., the occurrence of one does not change the probability of the other)
- Then we can decompose the joint probability distribution into smaller distributions
- The joint probability of two events under absolute independence:
- P(A and B) = P(A) × P(B)
- However, in most problems, absolute independence does not hold
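The decomposition benefit can be seen in a two-coin sketch (the coin interpretation and the numbers are illustrative): under independence, the joint needs no extra parameters beyond the two marginals.

```python
# Under absolute independence the joint factors into marginals:
# P(A and B) = P(A) * P(B).  Illustrative numbers.
p_a = 0.5             # e.g., one fair coin lands heads
p_b = 0.5             # e.g., an unrelated fair coin lands heads
p_a_and_b = p_a * p_b # the joint, computed from marginals alone

# For n independent Boolean variables, n parameters suffice,
# instead of the 2**n entries of a full joint distribution.
```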
8. Set-theoretic Interpretation of Probabilities
- W: the set of all possible worlds described by the knowledge base
- W_A: the set of possible worlds in which A is true
- The probability of A being true is the proportion of W_A to W
- Set-theoretic operations correspond to the logical connectives, i.e.:
- A and B ↔ W_A ∩ W_B
- A or B ↔ W_A ∪ W_B
- ¬A ↔ W - W_A
9. Revisiting the Third Axiom
- P(A or B) = P(A) + P(B) - P(A and B)
- that is, the proportion of W_A ∪ W_B to W
- If A and B are mutually-exclusive events (i.e., W_A ∩ W_B = ∅), then we have
- P(A or B) = P(A) + P(B)
[Venn diagram of the sets W_A and W_B]
10. Using the Axioms of Probability
- Rule via the union of joint events:
- P(A) = P(A and B) + P(A and ¬B),
- since
- A and B ↔ W_A ∩ W_B
- A and ¬B ↔ W_A - W_B
- More generally,
- P(A) = Σ_i P(A and B_i),
- where B_1, B_2, ..., B_k is an exhaustive set of mutually-exclusive events
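The union rule recovers a marginal probability from a joint distribution; a minimal sketch over two Boolean variables (the joint values are made-up numbers):

```python
# Marginalization: P(A) = P(A and B) + P(A and not B).
# Illustrative joint distribution over Booleans A and B:
joint = {
    (True, True):   0.08,  # P(A and B)
    (True, False):  0.12,  # P(A and not B)
    (False, True):  0.32,
    (False, False): 0.48,
}

# Sum the joint over every case in which A is true.
p_a = sum(p for (a, _b), p in joint.items() if a)
assert abs(p_a - 0.20) < 1e-9
```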
11. Using the Axioms of Probability
- Rule for absolute falsity:
- P(A) = P(A and A) + P(A and ¬A), by the rule of the union of joint events
- P(A) = P(A) + P(false), by logical equivalence
- P(false) = 0, by algebra
- Rule for negation:
- P(A or ¬A) = P(A) + P(¬A) - P(A and ¬A), by axiom 3
- P(true) = P(A) + P(¬A) - P(false), by logical equivalence
- 1 = P(A) + P(¬A) - 0, by axiom 2
- P(A) = 1 - P(¬A), by algebra
12. Conditional Probabilities
- A conditional probability, P(A | B), describes the belief in the event A, under the assumption that another event B is known with absolute certainty
- Formal definition:
- P(A | B) = P(A and B) / P(B)
- That is, the proportion of W_A ∩ W_B to W_B
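The definition can be exercised directly on a joint table, with P(B) itself obtained by marginalization; a short sketch with illustrative numbers:

```python
# P(A | B) = P(A and B) / P(B).  Illustrative joint over Booleans A, B.
joint = {
    (True, True):   0.08,  # P(A and B)
    (True, False):  0.12,
    (False, True):  0.32,  # P(not A and B)
    (False, False): 0.48,
}

p_b = joint[(True, True)] + joint[(False, True)]  # P(B) by marginalization
p_a_given_b = joint[(True, True)] / p_b           # restrict to B-worlds
assert abs(p_a_given_b - 0.20) < 1e-9
```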
13. The Bayesians (1763 -- present)
- The basis of Bayesian theory is conditional probabilities
- Bayesian theory sees a conditional probability as a way to describe the structure and organization of human knowledge
- In this view, A | B stands for the event A in the context of the event B
- e.g., the symptom A in the context of a disease B
14. Mathematicians vs. Bayesians, i.e.,
- P(A | B) = P(A and B) / P(B) vs. P(A and B) = P(A | B) P(B)
- Example:
- The probability of the event A = "the outcomes of two dice are equal"
- A mathematician would compute via the rule for joint events A and B_i, where each B_i is the event "the outcome of the first die is i":
- P(A) = Σ_i P(A and B_i)
- = 6 × (1/36)
- = 1/6
15. Mathematicians vs. Bayesians, cont'd
- The Bayesian mindset for dice rolling:
- P(Equality) = Σ_i P(outcome of the second die is i | B_i) P(B_i)
- = 6 × (1/6)(1/6)
- = 1/6
- This is more natural for the assumption-based mental processes of human reasoning, i.e., reasoning of the form "given that I know ..."
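Both routes from the two slides above can be checked mechanically; a sketch using exact rational arithmetic so the two answers compare without rounding error:

```python
from fractions import Fraction

# Mathematician's route: P(A) = sum_i P(A and B_i) = 6 * (1/36).
p_math = sum(Fraction(1, 36) for _ in range(6))

# Bayesian's route: P(A) = sum_i P(second die is i | B_i) * P(B_i)
#                        = 6 * (1/6) * (1/6).
p_bayes = sum(Fraction(1, 6) * Fraction(1, 6) for _ in range(6))

assert p_math == p_bayes == Fraction(1, 6)
```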
16. The CHAIN Rule
- The probability that a joint event (A_1, ..., A_k) occurs can be computed via the conditional probabilities:
- P(A_1, ..., A_k) = P(A_k | A_1, ..., A_k-1)
- × P(A_k-1 | A_1, ..., A_k-2)
- × ...
- × P(A_2 | A_1) P(A_1)
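For k = 3, the chain rule is just a product of one prior and two conditionals; a minimal sketch where the conditional probabilities are hypothetical numbers:

```python
# Chain rule for k = 3:
# P(A1, A2, A3) = P(A1) * P(A2 | A1) * P(A3 | A1, A2).
p_a1 = 0.5              # P(A1)            (illustrative values)
p_a2_given_a1 = 0.4     # P(A2 | A1)
p_a3_given_a1_a2 = 0.3  # P(A3 | A1, A2)

p_joint = p_a1 * p_a2_given_a1 * p_a3_given_a1_a2  # ~ 0.06
```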
17. Evidential Reasoning: The Inversion Rule
- Reasoning about hypotheses and the evidence that does or does not support them is the main venue for Bayesian inference
- P(H | e): given that I know about an evidence e, the probability that my hypothesis H is true
- P(H | e) = P(e | H) P(H) / P(e),
- where P(e | H) is the probability that evidence e will actually be observed in the world, if the hypothesis H is true
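The inversion rule in action, using the disease/symptom reading of H and e from these slides; all numeric values below are hypothetical:

```python
# Inversion (Bayes) rule: P(H | e) = P(e | H) * P(H) / P(e).
p_h = 0.01             # prior: 1% of patients have the disease
p_e_given_h = 0.90     # the symptom is observed in 90% of sick patients
p_e_given_not_h = 0.10 # ... and in 10% of healthy patients

# P(e) by marginalizing over H and not-H:
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

p_h_given_e = p_e_given_h * p_h / p_e  # posterior ~ 0.083
```

Note how the small prior keeps the posterior low even though the symptom is much more likely under the disease hypothesis.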
18. Pooling of Evidence
- Suppose the hypothesis H is that a patient has a particular disease
- Let e_1, ..., e_k be the possible symptoms of that disease
- If we observed some of the symptoms but not all of them, then the combined belief that the hypothesis is true can be computed from the odds
- P(e_1, ..., e_k | H) P(H) / P(e_1, ..., e_k | ¬H) P(¬H)
- This requires an exponential number of conditional probabilities to specify
19. Naïve Bayes
- Use a conditional independence assumption
- e.g., whether we observe a symptom or not depends only on whether the patient has the disease, not on the other symptoms
- Then the conditional probability P(e_1, ..., e_k | H) can be computed as
- P(e_1, ..., e_k | H) = Π_i P(e_i | H)
20. Incremental Bayesian Updating
- Let H be a hypothesis and E = e_1, ..., e_k be the past data (evidence) observed for the hypothesis H
- Suppose we observed a new piece of data (evidence) e
- What is the probability that the hypothesis is true, given the past and the new evidence?
- Computing this probability directly is inefficient because it requires storing all the past evidence and combining it with the new evidence
- Bayesian theory allows us to reformulate the question as follows:
- How do we update our belief in H given the new evidence and our past belief in H?
21. Incremental Bayesian Updating (cont'd)
- Combining prior beliefs with new evidence:
- P(H | E and e) = P(H | E) P(e | E, H) / P(e | E)
- Assuming conditional independence between the new evidence and the old ones given the hypothesis (i.e., P(e | E, H) = P(e | H)), this becomes
- P(H | E and e) = α P(H | E) P(e | H)
- where α = 1 / P(e | E) is a normalization constant, P(H | E and e) is the updated belief in H, P(H | E) is the prior belief in H, and P(e | H) is the probability that the new evidence will be observed given the hypothesis
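The update formula can be folded over a stream of observations, storing only the current belief rather than the past evidence; a sketch in which the per-observation likelihoods are hypothetical:

```python
def update(belief_h, p_e_given_h, p_e_given_not_h):
    """One incremental step: P(H | E, e) is proportional to
    P(H | E) * P(e | H), assuming e is conditionally independent
    of the past evidence E given H.  Returns the normalized belief."""
    unnorm_h = belief_h * p_e_given_h
    unnorm_not_h = (1 - belief_h) * p_e_given_not_h
    return unnorm_h / (unnorm_h + unnorm_not_h)

# Fold in three observations one at a time; only `belief` is stored,
# never the evidence itself.  Likelihood pairs are illustrative.
belief = 0.01  # prior P(H)
for p_pos, p_neg in [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3)]:
    belief = update(belief, p_pos, p_neg)
```

Each observation that is more likely under H than under ¬H pushes the belief upward; the order of the updates does not matter under the independence assumption.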
22. Hierarchical Models
- So far, we assumed that the evidence from the world is directly linked to the hypothesis we are evaluating
- e.g., a disease and its symptoms
- In many problems, this is not the case
[Diagram: a hypothesis node H connected to evidence nodes e_1, e_2, ..., e_k]
23. 1. Cascading Inference
- What is the probability that a burglary took place, given the two neighbors' testimonies?
- P(B | G, W) = α P(G, W | B) P(B)
- = α P(B) ( P(G, W | B, S=true) P(S=true | B) + P(G, W | B, S=false) P(S=false | B) )
- From conditional independence, we have
- P(G, W | B, S=true/false) = P(G | S) P(W | S)
- Thus, we have
- P(B | G, W) = α P(B) Σ_{S=true,false} P(G | S) P(W | S) P(S | B)
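The cascaded computation can be sketched end to end: sum out the intermediate alarm variable S, then normalize over B. Every numeric probability below is hypothetical; only the structure of the formula comes from the slide.

```python
# Cascading inference: P(B | G, W) is proportional to
# P(B) * sum_S P(G | S) * P(W | S) * P(S | B).
p_b_prior = 0.001                        # P(B=true), illustrative
p_s_given_b = {True: 0.95, False: 0.01}  # P(S=true | B=b)
p_g_given_s = {True: 0.90, False: 0.05}  # P(G=true | S=s)
p_w_given_s = {True: 0.80, False: 0.10}  # P(W=true | S=s)

def unnormalized(b):
    """P(B=b) * sum over S of P(G | S) P(W | S) P(S | B=b)."""
    prior = p_b_prior if b else 1 - p_b_prior
    total = 0.0
    for s in (True, False):
        p_s = p_s_given_b[b] if s else 1 - p_s_given_b[b]
        total += p_g_given_s[s] * p_w_given_s[s] * p_s
    return prior * total

scores = {b: unnormalized(b) for b in (True, False)}
p_b_given_gw = scores[True] / (scores[True] + scores[False])
```

With these numbers, the two testimonies raise the belief in a burglary well above its tiny prior, without ever observing the alarm S directly.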
24. 2. Predicting the Future
- The daughter will call with some probability if she hears the alarm sound
- What is the probability that the daughter will call, given the testimonies of the neighbors?
25. Predicting the Future
- The probability that the daughter will call is
- P(D | e) = Σ_{S=true,false} P(D | S) P(S | e)
- By the inversion rule,
- P(S=true/false | e) = α P(e | S=true/false) P(S=true/false)
- where P(e | S) = P(G | S) P(W | S) as before, and
- P(S) = Σ_{B=true,false} P(S | B) P(B)
26. 3. Explaining Away
- If an event has multiple causes, sometimes the occurrence of one cause reduces our belief in the occurrence of the other
- If the alarm is sensitive enough to go off when an earthquake occurs, then the occurrence of the earthquake explains away the burglary hypothesis
27. Interactions between Multiple Causes
- Conditional Probability Tables (CPTs)
- e.g., a CPT for the alarm S with causes B and E specifies P(S | B, E), P(S | ¬B, E), P(S | B, ¬E), and P(S | ¬B, ¬E)
- The size of a CPT is exponential in the number of causes
- i.e., if there are k causes of an event X, then the CPT for X has 2^k entries
- This creates serious efficiency problems for exact reasoning algorithms
- Approximation techniques have been developed for reasoning over incomplete CPTs
- e.g., Noisy-OR (NOR), Generalized-NOR, Recursive-NOR
- We will discuss these techniques later
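As a preview, the basic Noisy-OR model replaces the 2^k-entry CPT with one parameter per cause: the probability that the cause, acting alone, triggers the event. This sketch uses illustrative parameter values; the later seminar covers the technique properly.

```python
# Noisy-OR: P(X | active causes) = 1 - product over active causes i
# of (1 - p_i), where p_i is the chance that cause i alone triggers X.
def noisy_or(cause_params, active):
    p_no_trigger = 1.0
    for p_i, is_active in zip(cause_params, active):
        if is_active:
            p_no_trigger *= 1.0 - p_i  # cause i fails to trigger X
    return 1.0 - p_no_trigger

params = [0.9, 0.7]                        # e.g., burglary, earthquake
p_both = noisy_or(params, [True, True])    # ~ 0.97
p_none = noisy_or(params, [False, False])  # 0: no active cause, no event
```

k parameters replace 2^k CPT entries, at the cost of assuming the causes act independently and that X cannot occur without some active cause.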
28. Summary
- Reasoning with classical logic has problems with plausible reasoning under uncertainty
- Probability theory provides a machinery for uncertain reasoning, but it is often very inefficient to depend solely on probability theory
- Exponential-sized CPTs
- No structural reasoning over the knowledge
- Bayesians use probability theory to describe the organization (i.e., the structure) of human knowledge and to develop reasoning machinery over such structures
- Hierarchical modeling
- Next week:
- Structural Models of the Bayesians: Networks of Plausible Reasoning