Title: Modelling uncertainty
1. Modelling uncertainty
2. Probability of an event
- Classical method: if an experiment has n possible outcomes, assign a probability of 1/n to each experimental outcome.
- Relative frequency method: probability is the relative frequency of the number of events satisfying the constraints.
- Subjective method: probability is a number characterising the likelihood of an event (degree of belief).
3. Axioms of probability theory
Axiom I: The probability value assigned to each experimental outcome must be between 0 and 1.
Axiom II: The sum of the probabilities of all experimental outcomes must be 1.
4. Conditional probability
Denoted by P(A|B); it expresses the belief that event A is true assuming that event B is true (events A and B are dependent).
Definition: Let the probability of event B be positive. The conditional probability of event A given B is
P(A|B) = P(A ∧ B) / P(B)
5. Joint probability
If events A1, A2, ... are mutually exclusive and cover the sample space Ω, and P(Ai) > 0 for i = 1, 2, ..., then for any event B the following equality holds:
P(B) = Σi P(B|Ai) P(Ai)
6. Bayes theorem
Thomas Bayes (1701-1761)
If the events A1, A2, ... fulfil the assumptions of the joint probability theorem, and P(B) > 0, then for i = 1, 2, ... the following equality holds:
P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj)
7. Bayes theorem
Prior probabilities + new information → (Bayes theorem) → posterior probabilities.
Let us denote: H - hypothesis, E - evidence. The Bayes rule has the form
P(H|E) = P(E|H) P(H) / P(E)
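A minimal numeric sketch of the rule above, combined with the law of total probability from slide 5; the probability values are made up for illustration:

```python
# Bayes rule P(H|E) = P(E|H) P(H) / P(E), with P(E) expanded by the
# law of total probability. All numbers are hypothetical.

p_h = 0.01            # prior P(H)
p_e_given_h = 0.9     # likelihood P(E|H)
p_e_given_not_h = 0.2 # likelihood P(E|~H)

# P(E) = P(E|H) P(H) + P(E|~H) P(~H)
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# posterior P(H|E)
p_h_given_e = p_e_given_h * p_h / p_e
print(p_h_given_e)  # ~0.0435: evidence E raises the prior 0.01
```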
8. Difficulties with the joint probability distribution (tabular approach)
- The joint probability distribution has to be defined and stored in memory.
- High computational effort is required to calculate marginal and conditional probabilities.
9. n sample points → 2^n probabilities
[Figure: joint probability table P(B, M)]
10. Requirements for an uncertainty model in rule-based systems
- In logical inference systems, a rule of the form A → B allows us to conclude B whenever A holds, regardless of any other facts. In probabilistic systems, all available evidence must be taken into account.
- Once some thesis has been proved, it can be used in subsequent proofs without proving it again. In probabilistic systems, the evidence used in a proof may change.
- In logic, the truth of compound sentences can be inferred from the truth values of their terms. Probabilistic reasoning does not preserve this property, unless strong independence assumptions are imposed.
11. Certainty factor
- Buchanan, Shortliffe 1975
- Model developed for the rule-based expert system MYCIN
If E (evidence, observation) then H (hypothesis)
12. Belief
MB(H, E) - measure of the increase of belief that H is true based on observation E:
MB(H, E) = 1 if P(H) = 1, otherwise (max(P(H|E), P(H)) - P(H)) / (1 - P(H))
13. Disbelief
MD(H, E) - measure of the increase of disbelief that H is true based on observation E:
MD(H, E) = 1 if P(H) = 0, otherwise (P(H) - min(P(H|E), P(H))) / P(H)
14. Certainty factor
CF(H, E) = MB(H, E) - MD(H, E), CF ∈ [-1, 1]
15. Interpretation of the certainty factor
The certainty factor is associated with a rule "If evidence then hypothesis" and denotes the change in belief that H is true after observation E.
[Figure: rule E → H, arc labelled CF(H, E)]
16. Uncertainty propagation
Parallel rules
17. Uncertainty propagation
Serial rules
If CF(H, ¬E2) is not defined, it is assumed to be 0.
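The propagation formulas on slides 16-17 are figures lost in this text, so the sketch below assumes the standard MYCIN combination rules, which these slides presumably show:

```python
# Sketch of MYCIN-style CF propagation (assumed standard formulas).

def cf_parallel(cf1: float, cf2: float) -> float:
    """Combine CFs of two parallel rules supporting the same hypothesis."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        return cf1 + cf2 * (1 + cf1)
    # mixed signs; undefined when one CF is +1 and the other -1
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

def cf_serial(cf_h_given_e: float, cf_e: float) -> float:
    """Serial rules: the evidence E itself is uncertain with CF(E).
    Negative CF(E) contributes nothing, matching the slide's note that
    an undefined CF(H, ~E) is taken as 0."""
    return cf_h_given_e * max(0.0, cf_e)

print(cf_parallel(0.6, 0.4))  # 0.76
print(cf_serial(0.8, 0.5))    # 0.4
```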
18. Certainty factor - probabilistic definition
Heckerman 1986
CF(H, E) = (P(H|E) - P(H)) / (1 - P(H)) if P(H|E) ≥ P(H), otherwise (P(H|E) - P(H)) / P(H)
19. Certainty measure
Grzymala-Busse 1991
[Figure: rule E → H labelled CF(H, E), with certainty C(E) of the evidence and C(H) of the hypothesis]
20. Example 1
C(s1 ∧ s2) = min(0.2, 0.1) = 0.1
CF(h, s1 ∧ s2) · C(s1 ∧ s2) = 0.4 · 0 = 0 (the certainty 0.1 falls below the 0.2 threshold, so it contributes 0)
C(h) = 0.3 + (1 - 0.3) · 0 = 0.3 + 0 = 0.3
21. Example 2
C(s1 ∧ s2) = min(0.2, 0.8) = 0.2
CF(h, s1 ∧ s2) · C(s1 ∧ s2) = 0.4 · 0.2 = 0.08
C(h) = 0.3 + (1 - 0.3) · 0.08 = 0.3 + 0.7 · 0.08 = 0.356
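A small sketch reproducing Examples 1 and 2. The 0.2 evidence threshold is an assumption inferred from Example 1, where C = 0.1 contributes nothing:

```python
# Certainty-measure update, reconstructed from Examples 1 and 2.
# Assumption: evidence certainty below 0.2 is treated as 0.

def update(c_h: float, cf: float, c_e: float, threshold: float = 0.2) -> float:
    effective = c_e if c_e >= threshold else 0.0
    return c_h + (1 - c_h) * cf * effective

# Example 1: C(s1 ^ s2) = min(0.2, 0.1) = 0.1 < 0.2, so no change
print(update(0.3, 0.4, min(0.2, 0.1)))  # 0.3

# Example 2: C(s1 ^ s2) = min(0.2, 0.8) = 0.2
print(update(0.3, 0.4, min(0.2, 0.8)))  # 0.356
```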
22. Dempster-Shafer theory
Each hypothesis is characterised by two values: belief and plausibility. The theory models not only belief, but also the amount of acquired information.
23. Probability density function (basic probability assignment)
m: 2^Θ → [0, 1], with m(∅) = 0 and Σ_{A ⊆ Θ} m(A) = 1
24. Belief
Belief Bel: 2^Θ → [0, 1] measures the amount of acquired information supporting the belief that the considered set of hypotheses is true:
Bel(A) = Σ_{B ⊆ A} m(B)
25. Plausibility
Plausibility Pl: 2^Θ → [0, 1] measures how much the belief that A is true is limited by the evidence supporting ¬A:
Pl(A) = 1 - Bel(¬A)
26. Combining various sources of evidence
Assume two sources of evidence X and Y, represented by the respective subsets of Θ: X1, ..., Xm and Y1, ..., Yn. Mass functions m1 and m2 are defined on X and Y respectively. Combining the observations from the two sources, a new value m3(Z) is calculated for each subset Z of Θ as follows:
m3(Z) = Σ_{Xi ∩ Yj = Z} m1(Xi) m2(Yj) / (1 - Σ_{Xi ∩ Yj = ∅} m1(Xi) m2(Yj))
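A compact Python sketch of this combination rule over frozensets, assuming (as the formula requires) that the sources are not totally conflicting, so the normaliser is positive:

```python
# Dempster's rule of combination: masses on empty intersections are
# discarded as conflict and the remaining masses are renormalised.

from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    m3, conflict = {}, 0.0
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        z = x & y                      # intersection of the focal sets
        if z:
            m3[z] = m3.get(z, 0.0) + mx * my
        else:
            conflict += mx * my        # mass assigned to the empty set
    norm = 1.0 - conflict              # assumed > 0
    return {z: v / norm for z, v in m3.items()}
```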
27. Example
A - allergy, F - flu, C - cold, P - pneumonia
Θ = {A, F, C, P}, initially m1(Θ) = 1
Observation 1: m2({A,F,C}) = 0.6, m2(Θ) = 0.4

           | m2({A,F,C}) = 0.6 | m2(Θ) = 0.4
m1(Θ) = 1  | m3({A,F,C}) = 0.6 | m3(Θ) = 0.4
28. Example
Observation 2: m4({F,C,P}) = 0.8, m4(Θ) = 0.2

                   | m4({F,C,P}) = 0.8  | m4(Θ) = 0.2
m3({A,F,C}) = 0.6  | m5({F,C}) = 0.48   | m5({A,F,C}) = 0.12
m3(Θ) = 0.4        | m5({F,C,P}) = 0.32 | m5(Θ) = 0.08
29. Example
Observation 3: m6({A}) = 0.75, m6(Θ) = 0.25

                    | m6({A}) = 0.75 | m6(Θ) = 0.25
m5({F,C}) = 0.48    | m7(∅) = 0.36   | m7({F,C}) = 0.12
m5({A,F,C}) = 0.12  | m7({A}) = 0.09 | m7({A,F,C}) = 0.03
m5({F,C,P}) = 0.32  | m7(∅) = 0.24   | m7({F,C,P}) = 0.08
m5(Θ) = 0.08        | m7({A}) = 0.06 | m7(Θ) = 0.02

30. Example
The mass falling on the empty set (conflict between the sources) is m7(∅) = 0.36 + 0.24 = 0.6.
31. Example
Normalisation: divide the non-empty masses by 1 - m7(∅) = 1 - 0.6 = 0.4:
m7({A}) = 0.15 / 0.4 = 0.375
m7({F,C}) = 0.12 / 0.4 = 0.3
m7({A,F,C}) = 0.03 / 0.4 = 0.075
m7({F,C,P}) = 0.08 / 0.4 = 0.2
m7(Θ) = 0.02 / 0.4 = 0.05

Belief-plausibility intervals [Bel, Pl]:
A: [0.375, 0.500]  (Pl(A) = 1 - 0.3 - 0.2 = 0.5)
F: [0, 0.625]      (Pl(F) = 1 - 0.375 = 0.625)
C: [0, 0.625]
P: [0, 0.250]      (Pl(P) = 1 - 0.375 - 0.3 - 0.075 = 0.25)
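For reference, the whole example can be replayed with the combine() sketch from slide 26; bel and pl below are the standard definitions (helper names are my own):

```python
# Replaying the diagnosis example: Theta = {A, F, C, P}.
Theta = frozenset("AFCP")
m1 = {Theta: 1.0}
m2 = {frozenset("AFC"): 0.6, Theta: 0.4}   # observation 1
m4 = {frozenset("FCP"): 0.8, Theta: 0.2}   # observation 2
m6 = {frozenset("A"): 0.75, Theta: 0.25}   # observation 3

m7 = combine(combine(combine(m1, m2), m4), m6)
# m7: {A}: 0.375, {F,C}: 0.3, {A,F,C}: 0.075, {F,C,P}: 0.2, Theta: 0.05

def bel(m, a): return sum(v for s, v in m.items() if s <= a)
def pl(m, a):  return sum(v for s, v in m.items() if s & a)

print(bel(m7, frozenset("A")), pl(m7, frozenset("A")))  # 0.375 0.5 (up to rounding)
```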
32. Fuzzy sets (Zadeh), rough sets (Pawlak)
33. Probabilistic reasoning
34. Probabilistic reasoning
B - burglary, E - earthquake, A - alarm, J - John calls, M - Mary calls
⇒ joint probability distribution P(B, E, A, J, M)
35. Joint probability distribution
36. Probabilistic reasoning
What is the probability of a burglary if Mary called? P(B=y | M=y) = ?
Marginal probability: P(M=y) = Σ_{B,E,A,J} P(B, E, A, J, M=y)
Conditional probability: P(B=y | M=y) = P(B=y, M=y) / P(M=y)
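A sketch of the tabular approach this query implies; the dictionary joint stands in for the (not reproduced) table of slide 35:

```python
# P(B=y | M=y) computed from a full joint table (tabular approach).
# joint maps assignments (b, e, a, j, m) -> probability; in practice
# its 2^5 values would come from the table on slide 35.

def conditional_burglary(joint: dict, b_val: bool, m_val: bool) -> float:
    # marginalise out E, A, J to get P(B=b_val, M=m_val)
    p_bm = sum(p for (b, e, a, j, m), p in joint.items()
               if b == b_val and m == m_val)
    # marginalise out B, E, A, J to get P(M=m_val)
    p_m = sum(p for (b, e, a, j, m), p in joint.items() if m == m_val)
    return p_bm / p_m   # P(B=b_val | M=m_val)
```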
37. Advantages of probabilistic reasoning
- Sound mathematical theory.
- On the basis of the joint probability distribution one can reason about:
  - causes on the basis of the observed consequences,
  - consequences on the basis of given evidence,
  - any combination of the above.
- Clear semantics based on the interpretation of probability.
- The model can be learned from statistical data.
38. Complexity of probabilistic reasoning
- In the alarm example:
  - (2^5 - 1) = 31 values to store,
  - direct access only to rarely needed atomic values, e.g. P(B=1, E=1, A=1, J=1, M=1),
  - calculating any practical value, e.g. P(B=1 | M=1), requires 2^9 elementary operations.
- In general:
  - P(X1, ..., Xn) requires storing 2^n - 1 values,
  - difficult knowledge acquisition (not natural),
  - exponential complexity.
39. Bayes theorem
40. Bayes theorem
B depends on A: P(B|A)
41. The chain rule
P(X1, X2) = P(X1) P(X2|X1)
P(X1, X2, X3) = P(X1) P(X2|X1) P(X3|X1, X2)
...
P(X1, X2, ..., Xn) = P(X1) P(X2|X1) ... P(Xn|X1, ..., Xn-1)
42. Conditional independence of variables in a domain
In any domain one can define a set of variables pa(Xi) ⊆ {X1, ..., Xi-1} such that Xi is independent of the variables from the set {X1, ..., Xi-1} \ pa(Xi). Thus
P(Xi | X1, ..., Xi-1) = P(Xi | pa(Xi))
and
P(X1, ..., Xn) = Π_{i=1}^{n} P(Xi | pa(Xi))
43. Bayesian network
P(A | B1, ..., Bn)
Bi directly influences A
44. Example
45Example
P(B) 0.001
P(E) 0.002
B E P(A) T T 0.950 T F 0.940 F T 0.290 F F
0.001
A P(J) T 0.90 F 0.05
A P(M) T 0.70 F 0.01
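These tables are enough to reconstruct the whole model. A sketch encoding them directly and answering a query by enumeration (the query P(B|J,M) and its value are an illustration, not taken from the slides):

```python
# The alarm network, encoded from the CPTs on slide 45.
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """P(B,E,A,J,M) as the product of the local distributions."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Inference by enumeration: P(B=T | J=T, M=T)
num = sum(joint(True, e, a, True, True)
          for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True)
          for b, e, a in product([True, False], repeat=3))
print(num / den)   # ~0.284
```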
46. Complexity of the representation
- Instead of 31 values it is enough to store 10.
- Easy construction of the model:
  - fewer parameters,
  - more intuitive parameters.
- Easy reasoning.
47. Bayesian networks
A Bayesian network is an acyclic directed graph whose:
- nodes represent formulas or variables in the considered domain,
- arcs represent dependence relations between variables, with the related probability distributions.
48. Bayesian networks
A variable A with parent nodes pa(A) = {B1, ..., Bn} has a conditional probability table P(A | B1, ..., Bn), i.e. P(A | pa(A)); if pa(A) = ∅, the a priori probability equals P(A).

49. Bayesian networks
[Figure: node A with parent set pa(A) and table P(A | B1, B2, ..., Bn)]
An event Bi with no predecessors (pa(Bi) = ∅) has the a priori probability P(Bi).
50. Local semantics of a Bayesian network
- Only direct dependence relations between variables.
- Local conditional probability distributions.
- Assumption of conditional independence of variables not connected in the graph.
51. Global semantics of a Bayesian network
The joint probability distribution is given implicitly. It can be calculated using the following rule:
P(X1, ..., Xn) = Π_i P(Xi | pa(Xi))

52. Global semantics of a Bayesian network
Node numbering: each node's index is smaller than the indices of its successors (parents precede children).
Finally: a Bayesian network is a complete probabilistic model.
53.-54. Global probability distribution
[Figures: nodes A1, A2 with parent sets pa(A1), pa(A2) and conditional distributions P(A1 | B1, ..., Bn), P(A2 | B3, ..., Bn)]
55. Reasoning in Bayesian networks
- Updating belief that a hypothesis H is true given some evidence E, i.e. computing the conditional probability distribution P(H|E).
- Two types of reasoning:
  - probability of a single hypothesis,
  - probabilities of all hypotheses.
56. Example
John calls (J) and Mary calls (M). What is the probability that neither burglary nor earthquake occurred if the alarm rang?
57.-63. Example (step-by-step computation shown in the slide figures)
64. Types of reasoning in Bayesian networks
Causal reasoning: evidence B occurs and we would like to update the probability of hypothesis J. Interpretation: there was a burglary; what is the probability that John will call?
P(J|B) ≈ P(J|A) P(A|B) = 0.9 · 0.95 ≈ 0.86
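The slide keeps only the dominant term of the sum over A. For completeness, using the slide-65 tables (P(J|¬A) = 0.05, P(¬A|B) = 0.05), the full sum is:

$$P(J \mid B) = \sum_{a \in \{T,F\}} P(J \mid a)\, P(a \mid B) = 0.9 \cdot 0.95 + 0.05 \cdot 0.05 = 0.8575 \approx 0.86$$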
65. Types of reasoning in Bayesian networks
Diagnostic reasoning: we observe J; what is the probability that B is true? Interpretation: John calls; what is the probability of a burglary?
P(B) = 0.001

B | P(A)
T | 0.95
F | 0.01

A | P(J)
T | 0.90
F | 0.05

P(B|J) = P(J|B) P(B) / P(J) ≈ 0.0009 (with P(J|B) ≈ 0.95 · 0.9 and P(B) = 0.001)
66. Types of reasoning in Bayesian networks
Intercausal reasoning (explaining away): we observe A; what is the probability that B is true? The alarm rang, so P(B|A) = 0.376, but if an earthquake is observed as well then P(B|A,E) = 0.03.
67. Types of reasoning in Bayesian networks
Mixed reasoning: we observe E and J; what is the probability of A? John calls and we know that there was an earthquake. What is the probability that the alarm rang? P(A|J,E) = 0.03
68. Types of reasoning in Bayesian networks
Causal, diagnostic, intercausal, mixed.
69. Multiply connected Bayesian network
70. Summary
- Models of uncertainty:
  - certainty factor, certainty measure,
  - Dempster-Shafer theory,
  - Bayesian networks,
  - fuzzy sets,
  - rough sets.
71. Summary
- Bayesian networks represent the joint probability distribution.
- Reasoning in multiply connected Bayesian networks is NP-hard.
- Exponential complexity may be avoided by:
  - constructing the network as a polytree,
  - transforming the network into a polytree,
  - approximate reasoning.