Title: Probabilistic Representation and Reasoning
1. Probabilistic Representation and Reasoning
- Given a set of facts/beliefs/rules/evidence
- Evaluate a given statement
- Determine the probability of a statement
- Find a statement that optimizes a set of constraints
- Most probable explanation (MPE): the setting of hidden variables that best explains the observations
2. Probability Theory
- Random Variables
- Boolean: $W_{1,2}$ (just like propositional logic). Two possible values: $\{\text{true}, \text{false}\}$
- Discrete: $\text{Weather} \in \{\text{sunny}, \text{cloudy}, \text{rainy}, \text{snow}\}$
- Continuous: $\text{Temperature} \in \mathbb{R}$
- Propositions
- $W_{1,2} = \text{true}$, $\text{Weather} = \text{sunny}$, $\text{Temperature} = 65$
- These can be combined as in propositional logic
- Consider a car described by 3 random variables
- $\text{Gas} \in \{\text{true}, \text{false}\}$: there is gas in the tank
- $\text{Meter} \in \{\text{empty}, \text{full}\}$: the gas gauge shows the tank is empty or full
- $\text{Starts} \in \{\text{yes}, \text{no}\}$: the car starts when you turn the key in the ignition
4. Joint Probability Distribution
- Each row is called a primitive event
- Rows are mutually exclusive and exhaustive
- Corresponds to an 8-sided coin with the indicated probabilities
5. Any Query Can Be Answered from the Joint
- $P(\text{Gas} = \text{false} \land \text{Meter} = \text{full} \land \text{Starts} = \text{yes}) = 0.0006$
- $P(\text{Gas} = \text{false}) = 0.2$; this is the sum of all cells where $\text{Gas} = \text{false}$
- In general: to compute $P(Q)$ for any proposition $Q$, add up the probability in all cells where $Q$ is true
- $P(G, M, S)$ denotes the entire joint distribution (in the book, $P$ is boldface). It is a table or function that maps from $G$, $M$, and $S$ to a probability.
- $P(\text{true}, \text{empty}, \text{no})$ denotes a single probability value: $P(\text{Gas} = \text{true} \land \text{Meter} = \text{empty} \land \text{Starts} = \text{no})$
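The rule "add up the cells where Q is true" can be sketched in a few lines of Python. The joint table itself is not reproduced in these notes, so the eight cell values below are assumed: they were chosen only to agree with the two probabilities quoted above (0.0006 and 0.2). The names `joint` and `prob` are illustrative.

```python
# Assumed joint table over (Gas, Meter, Starts); values are hypothetical
# but consistent with P(Gas=false, Meter=full, Starts=yes) = 0.0006
# and P(Gas=false) = 0.2.
joint = {
    (True, "full", "yes"): 0.504,
    (True, "full", "no"): 0.216,
    (True, "empty", "yes"): 0.056,
    (True, "empty", "no"): 0.024,
    (False, "full", "yes"): 0.0006,
    (False, "full", "no"): 0.0594,
    (False, "empty", "yes"): 0.0014,
    (False, "empty", "no"): 0.1386,
}


def prob(query):
    """P(Q): add up every cell of the joint in which proposition Q holds."""
    return sum(p for event, p in joint.items() if query(*event))


# A single primitive event, a marginal, and a compound proposition.
p_cell = prob(lambda g, m, s: (not g) and m == "full" and s == "yes")
p_no_gas = prob(lambda g, m, s: not g)
p_empty_or_no_start = prob(lambda g, m, s: m == "empty" or s == "no")
```

Because the rows are mutually exclusive and exhaustive, any proposition over the three variables can be passed as the predicate.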
7. Operations on Probability Tables (1)
- Marginalization (summing away)
- $\sum_{M,S} P(G, M, S) = P(G)$
- $P(G)$ is called a marginal probability distribution. It consists of two probabilities: $P(G = \text{true})$ and $P(G = \text{false})$
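A minimal sketch of marginalization as a table operation follows. The joint values are assumed (the original table is not in these notes) and were picked to match the quoted $P(\text{Gas} = \text{false}) = 0.2$; `marginalize` is an illustrative helper name.

```python
from collections import defaultdict

VARS = ("G", "M", "S")

# Assumed joint table over (Gas, Meter, Starts); hypothetical values.
joint = {
    (True, "full", "yes"): 0.504,
    (True, "full", "no"): 0.216,
    (True, "empty", "yes"): 0.056,
    (True, "empty", "no"): 0.024,
    (False, "full", "yes"): 0.0006,
    (False, "full", "no"): 0.0594,
    (False, "empty", "yes"): 0.0014,
    (False, "empty", "no"): 0.1386,
}


def marginalize(table, variables, keep):
    """Sum away every variable that is not listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(float)
    for event, p in table.items():
        out[tuple(event[i] for i in idx)] += p
    return dict(out)


# Summing away M and S leaves the marginal P(G).
p_g = marginalize(joint, VARS, ("G",))
```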
8. Conditional Probability
- Suppose we observe that $M = \text{full}$. What is the probability that the car will start?
- $P(S = \text{yes} \mid M = \text{full})$
- Definition: $P(A \mid B) = P(A \land B) / P(B)$
9. Conditional Probability
- Select cells that match the condition ($M = \text{full}$)
- Delete remaining cells and the $M$ column
- Renormalize the table to obtain $P(S, G \mid M = \text{full})$
- Sum away Gas: $\sum_G P(S, G \mid M = \text{full}) = P(S \mid M = \text{full})$
- Read answer from the $P(S = \text{yes} \mid M = \text{full})$ cell
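The select, delete, renormalize, sum-away procedure can be sketched as follows. The joint values are assumed, since the table is not reproduced here; they were chosen to be consistent with the probabilities quoted in these slides. `condition_on_meter` is an illustrative name.

```python
# Assumed joint table over (Gas, Meter, Starts); hypothetical values.
joint = {
    (True, "full", "yes"): 0.504,
    (True, "full", "no"): 0.216,
    (True, "empty", "yes"): 0.056,
    (True, "empty", "no"): 0.024,
    (False, "full", "yes"): 0.0006,
    (False, "full", "no"): 0.0594,
    (False, "empty", "yes"): 0.0014,
    (False, "empty", "no"): 0.1386,
}


def condition_on_meter(table, m_obs):
    """Return P(S, G | M = m_obs) as a dict keyed by (gas, starts)."""
    # Select the cells matching the condition and drop the M column.
    selected = {(g, s): p for (g, m, s), p in table.items() if m == m_obs}
    # Renormalize so the remaining cells sum to 1.
    z = sum(selected.values())
    return {key: p / z for key, p in selected.items()}


p_sg = condition_on_meter(joint, "full")

# Sum away Gas to obtain P(S | M = full).
p_s = {}
for (g, s), p in p_sg.items():
    p_s[s] = p_s.get(s, 0.0) + p

answer = p_s["yes"]  # P(S = yes | M = full)
```

With these assumed numbers the answer is 0.5046 / 0.78, which rounds to 0.6469.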
10. Operations on Probability Tables
- Construct $P(G, S \mid M)$ by normalizing the subtable corresponding to $M = \text{full}$ and normalizing the subtable corresponding to $M = \text{empty}$
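11. Chain Rule of Probability
- $P(A, B, C) = P(A \mid B, C)\, P(B \mid C)\, P(C)$
- Proof: by the definition of conditional probability, $P(A \mid B, C)\, P(B \mid C)\, P(C) = \frac{P(A, B, C)}{P(B, C)} \cdot \frac{P(B, C)}{P(C)} \cdot P(C) = P(A, B, C)$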
12. Chain Rule (2)
- Holds for distributions too
- $P(A, B, C) = P(A \mid B, C)\, P(B \mid C)\, P(C)$
- This means that for each setting of $A$, $B$, and $C$, we can substitute into the equation, and it is true
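As a quick numerical illustration, the chain rule can be checked cell by cell on the car example, taking $A = G$, $B = M$, $C = S$. The joint values are assumed (hypothetical numbers consistent with the probabilities quoted in these slides), and `marg` is an illustrative helper.

```python
# Assumed joint table over (Gas, Meter, Starts); hypothetical values.
joint = {
    (True, "full", "yes"): 0.504,
    (True, "full", "no"): 0.216,
    (True, "empty", "yes"): 0.056,
    (True, "empty", "no"): 0.024,
    (False, "full", "yes"): 0.0006,
    (False, "full", "no"): 0.0594,
    (False, "empty", "yes"): 0.0014,
    (False, "empty", "no"): 0.1386,
}


def marg(pred):
    """Total probability of the cells that satisfy `pred`."""
    return sum(p for event, p in joint.items() if pred(*event))


max_err = 0.0
for (g, m, s), p_gms in joint.items():
    p_s = marg(lambda g2, m2, s2: s2 == s)                # P(S)
    p_ms = marg(lambda g2, m2, s2: m2 == m and s2 == s)   # P(M, S)
    p_g_given_ms = p_gms / p_ms                           # P(G | M, S)
    p_m_given_s = p_ms / p_s                              # P(M | S)
    # Chain rule: P(G, M, S) = P(G | M, S) P(M | S) P(S)
    rebuilt = p_g_given_ms * p_m_given_s * p_s
    max_err = max(max_err, abs(rebuilt - p_gms))
```

`max_err` stays at floating-point noise for every one of the eight settings.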
13. Belief Networks (1): Independence
- Defn: Two random variables $X$ and $Y$ are independent iff $P(X, Y) = P(X)\, P(Y)$
- Example
- $X$ is a coin with $P(X = \text{heads}) = 0.4$
- $Y$ is a coin with $P(Y = \text{heads}) = 0.8$
- Joint distribution
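The joint table for the two coins follows directly from the definition: each cell is the product of the two marginals. The sketch below builds it and also tests an arbitrary two-variable table for independence; `is_independent` and the `dependent` counterexample are illustrative additions, not part of the slides.

```python
from itertools import product

p_x = {"heads": 0.4, "tails": 0.6}
p_y = {"heads": 0.8, "tails": 0.2}

# Independent joint: every cell is P(X = x) * P(Y = y).
joint_xy = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}
# Cells: (heads, heads) 0.32, (heads, tails) 0.08,
#        (tails, heads) 0.48, (tails, tails) 0.12


def is_independent(joint, tol=1e-12):
    """Check P(X, Y) = P(X) P(Y) in every cell of a two-variable table."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol for x in xs for y in ys)


# A hypothetical table with the same marginals that is NOT independent.
dependent = {
    ("heads", "heads"): 0.4,
    ("heads", "tails"): 0.0,
    ("tails", "heads"): 0.4,
    ("tails", "tails"): 0.2,
}
```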
14. Belief Networks (2): Conditional Independence
- Defn: Two random variables $X$ and $Y$ are conditionally independent given $Z$ iff $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$
- Example
- $P(S, M \mid G) = P(S \mid G)\, P(M \mid G)$
- Intuition: $G$ independently causes $S$ and $M$
15. Operations on Probability Tables (3): Conformal Product
- Allocate space for the resulting table and then fill in each cell with the product of the corresponding cells
- $P(S, M \mid G) = P(S \mid G) \cdot P(M \mid G)$
16. Properties of Conformal Products
- Commutative
- Associative
- Work on normalized or unnormalized tables
- Work on joint or conditional tables
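One possible way to implement the conformal product is sketched below. A table is represented as a pair `(variables, cells)`; this representation, the function names, and the numeric entries of the two conditional tables are assumptions made for illustration.

```python
from itertools import product


def conformal_product(f1, f2):
    """Multiply two tables cell by cell, matching on shared variables.

    A table is (variables, cells): a tuple of variable names plus a dict
    that maps a tuple of values (in the same order) to a number.
    """
    vars1, cells1 = f1
    vars2, cells2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    # Collect each variable's domain from the cells that mention it.
    domain = {}
    for vs, cells in (f1, f2):
        for key in cells:
            for v, val in zip(vs, key):
                domain.setdefault(v, set()).add(val)
    out = {}
    for values in product(*(sorted(domain[v], key=str) for v in out_vars)):
        row = dict(zip(out_vars, values))
        k1 = tuple(row[v] for v in vars1)
        k2 = tuple(row[v] for v in vars2)
        if k1 in cells1 and k2 in cells2:
            out[values] = cells1[k1] * cells2[k2]
    return out_vars, out


def as_assignments(table):
    """Order-independent view of a table, for comparing two results."""
    variables, cells = table
    return {frozenset(zip(variables, key)): p for key, p in cells.items()}


# Assumed conditional tables P(S | G) and P(M | G); hypothetical values.
p_s_g = (("S", "G"), {("yes", True): 0.7, ("no", True): 0.3,
                      ("yes", False): 0.01, ("no", False): 0.99})
p_m_g = (("M", "G"), {("full", True): 0.9, ("empty", True): 0.1,
                      ("full", False): 0.3, ("empty", False): 0.7})

sm_g = conformal_product(p_s_g, p_m_g)  # P(S, M | G), stored over (S, G, M)
commutes = as_assignments(sm_g) == as_assignments(conformal_product(p_m_g, p_s_g))
```

Nothing in the function requires the inputs to be normalized, which is why the same operation also works on tables that have had evidence rows deleted.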
17. Conditional Independence Allows Us to Simplify the Joint Distribution
- $P(G, M, S) = P(M, S \mid G)\, P(G)$ (chain rule)
- $= P(M \mid G)\, P(S \mid G)\, P(G)$ (conditional independence)
18. Bayesian Networks
- One node for each random variable
- Each node stores a probability distribution $P(\text{node} \mid \text{parents}(\text{node}))$
- Only direct dependencies are shown
- Joint distribution is the conformal product of the node distributions
- $P(G, M, S) = P(G) \cdot P(M \mid G) \cdot P(S \mid G)$
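To make the factorization concrete, the sketch below rebuilds the full joint from three node tables. The node tables are not given in these notes; the entries used here are assumed values, chosen so that the product reproduces the figure $P(\text{Gas} = \text{false} \land \text{Meter} = \text{full} \land \text{Starts} = \text{yes}) = 0.0006$ quoted earlier.

```python
from itertools import product

# Assumed node tables (hypothetical values).
p_g = {True: 0.8, False: 0.2}                        # P(G)
p_m_given_g = {("full", True): 0.9, ("empty", True): 0.1,
               ("full", False): 0.3, ("empty", False): 0.7}   # P(M | G)
p_s_given_g = {("yes", True): 0.7, ("no", True): 0.3,
               ("yes", False): 0.01, ("no", False): 0.99}     # P(S | G)

# Joint distribution as the product of the node distributions.
joint = {
    (g, m, s): p_g[g] * p_m_given_g[(m, g)] * p_s_given_g[(s, g)]
    for g, m, s in product([True, False], ["full", "empty"], ["yes", "no"])
}
```

The network stores 5 independent numbers (1 for G, 2 for M given G, 2 for S given G) where the explicit joint needs 7.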
19. Inference in Bayesian Networks
- Suppose we observe that $M = \text{full}$. What is the probability that the car will start?
- $P(S = \text{yes} \mid M = \text{full})$
- Before, we handled this by the following steps
- Remove all rows corresponding to $M = \text{empty}$
- Normalize remaining rows to get $P(S, G \mid M = \text{full})$
- Sum over $G$: $\sum_G P(S, G \mid M = \text{full}) = P(S \mid M = \text{full})$
- Read answer from the $S = \text{yes}$ entry in the table
- We want to get the same result, but without constructing the joint distribution first.
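20. Inference in Bayesian Networks (2)
- Remove all rows corresponding to $M = \text{empty}$ from all nodes
- $P(G)$: unchanged
- $P(M \mid G)$ becomes $P[G]$ (a table over $G$ alone; square brackets mark a table that is no longer a normalized distribution)
- $P(S \mid G)$: unchanged
- Sum over $G$: $\sum_G P(G) \cdot P[G] \cdot P(S \mid G)$
- Normalize to get $P(S \mid M = \text{full})$
- Read answer from the $S = \text{yes}$ entry in the table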
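These steps can be sketched directly on the node tables, without ever building the eight-cell joint. The node tables below are assumed values (the originals are not in these notes), chosen so that the result agrees with the probabilities quoted in these slides; `infer_starts` is an illustrative name.

```python
# Assumed node tables (hypothetical values).
p_g = {True: 0.8, False: 0.2}                        # P(G)
p_m_given_g = {("full", True): 0.9, ("empty", True): 0.1,
               ("full", False): 0.3, ("empty", False): 0.7}   # P(M | G)
p_s_given_g = {("yes", True): 0.7, ("no", True): 0.3,
               ("yes", False): 0.01, ("no", False): 0.99}     # P(S | G)


def infer_starts(m_obs):
    """Return P(S | M = m_obs) using only the node tables."""
    # Delete rows that contradict the evidence: P(M | G) becomes P[G].
    evid_g = {g: p_m_given_g[(m_obs, g)] for g in p_g}
    # Sum over G of P(G) * P[G] * P(S | G), one entry per value of S.
    unnorm = {
        s: sum(p_g[g] * evid_g[g] * p_s_given_g[(s, g)] for g in p_g)
        for s in ("yes", "no")
    }
    # Normalization is postponed to the very end.
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


posterior = infer_starts("full")
```

With these assumed tables, `posterior["yes"]` is 0.5046 / 0.78, which rounds to 0.6469.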
27. Inference with Tables
Step 1: Delete $M = \text{empty}$ rows from all tables
Step 2: Perform algebra to push summation inwards (no-op in this case)
Step 3: Form conformal product
Step 4: Sum away $G$
Step 5: Normalize
Step 6: Read answer from table: 0.6469
- We never created the joint distribution
- Deleting the $M = \text{empty}$ rows from the individual tables followed by conformal product has the same effect as performing the conformal product first and then deleting the $M = \text{empty}$ rows
- Normalization can be postponed to the end
29. Another Example: Asia (all variables Boolean)
- Suppose we observe Sneeze
- What is $P(\text{Cold} \mid \text{Sneeze}) = P(\mathit{Co} \mid \mathit{sn})$?
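30. Answering the query
- Joint distribution
- $\sum_{A, \mathit{Ca}, \mathit{Sc}} P(\mathit{Co})\, P(\mathit{Sn} \mid \mathit{Co}, A)\, P(\mathit{Ca})\, P(A \mid \mathit{Ca})\, P(\mathit{Sc} \mid \mathit{Ca})$
- Apply evidence: $\mathit{sn}$ (Sneeze = true)
- $\sum_{A, \mathit{Ca}, \mathit{Sc}} P(\mathit{Co})\, P[\mathit{Co}, A]\, P(\mathit{Ca})\, P(A \mid \mathit{Ca})\, P(\mathit{Sc} \mid \mathit{Ca})$
- Push summations in as far as possible
- $P(\mathit{Co}) \sum_{A} P[\mathit{Co}, A] \sum_{\mathit{Ca}} P(A \mid \mathit{Ca})\, P(\mathit{Ca}) \sum_{\mathit{Sc}} P(\mathit{Sc} \mid \mathit{Ca})$
- Evaluate
- $P(\mathit{Co}) \sum_{A} P[\mathit{Co}, A] \sum_{\mathit{Ca}} P(A \mid \mathit{Ca})\, P(\mathit{Ca})\, P[\mathit{Ca}]$
- $P(\mathit{Co}) \sum_{A} P[\mathit{Co}, A]\, P[A]$
- $P(\mathit{Co})\, P[\mathit{Co}]$
- $P[\mathit{Co}]$
- Normalize and extract answer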
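The elimination sequence above can be sketched in Python and compared against brute-force enumeration of the joint. The slides give no numbers for this network, so every probability below is a made-up placeholder, and the structure assumed is Cold and Allergy as parents of Sneeze, Cat as parent of Allergy and Scratch.

```python
from itertools import product

B = (True, False)


def bern(p_true, value):
    """Probability of `value` for a Boolean variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true


# Hypothetical parameters (not taken from the slides).
P_CO = 0.05                                   # P(Co = true)
P_CA = 0.30                                   # P(Ca = true)
P_A_GIVEN_CA = {True: 0.75, False: 0.05}      # P(A = true | Ca)
P_SC_GIVEN_CA = {True: 0.50, False: 0.05}     # P(Sc = true | Ca)
P_SN_GIVEN_CO_A = {(True, True): 0.99, (True, False): 0.85,
                   (False, True): 0.90, (False, False): 0.01}  # P(Sn = true | Co, A)


def posterior_by_elimination():
    # Sum over Sc of P(Sc | Ca): the table P[Ca].
    t_ca = {ca: sum(bern(P_SC_GIVEN_CA[ca], sc) for sc in B) for ca in B}
    # P[A] = sum over Ca of P(A | Ca) P(Ca) P[Ca].
    t_a = {a: sum(bern(P_A_GIVEN_CA[ca], a) * bern(P_CA, ca) * t_ca[ca]
                  for ca in B) for a in B}
    # P[Co] = sum over A of P[Co, A] P[A], with P[Co, A] = P(Sn = true | Co, A).
    t_co = {co: sum(P_SN_GIVEN_CO_A[(co, a)] * t_a[a] for a in B) for co in B}
    unnorm = {co: bern(P_CO, co) * t_co[co] for co in B}
    z = sum(unnorm.values())
    return {co: v / z for co, v in unnorm.items()}


def posterior_by_enumeration():
    unnorm = {True: 0.0, False: 0.0}
    for co, a, ca, sc in product(B, repeat=4):
        unnorm[co] += (bern(P_CO, co) * bern(P_CA, ca)
                       * bern(P_A_GIVEN_CA[ca], a)
                       * P_SN_GIVEN_CO_A[(co, a)]
                       * bern(P_SC_GIVEN_CA[ca], sc))
    z = sum(unnorm.values())
    return {co: v / z for co, v in unnorm.items()}


elim = posterior_by_elimination()
enum = posterior_by_enumeration()
```

Both routes give the same posterior; the elimination route only ever touches tables over one or two variables.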
31. Pruning Leaves
- Leaf nodes not involved in the evidence or the query can be pruned.
- Example: Scratch
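A short justification, written out for the Scratch example under the assumption that Scratch has Cat as its only parent: summing a conditional distribution over its own variable gives 1 for every parent value, so the factor contributes nothing to the result.

```latex
\[
\sum_{\mathit{Sc}} P(\mathit{Sc} \mid \mathit{Ca})
  = P(\mathit{Sc} = \text{true} \mid \mathit{Ca}) + P(\mathit{Sc} = \text{false} \mid \mathit{Ca})
  = 1
  \quad \text{for every value of } \mathit{Ca}.
\]
```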
32. Greedy algorithm for choosing the elimination order
- nodes = set of tables (after evidence)
- V = variables to sum over
- while |nodes| > 1 do
  - Generate all pairs of tables in nodes that share at least one variable
  - Compute size of table that would result from conformal product of each pair (summing over as many variables in V as possible)
  - Let (T1, T2) be the pair with smallest resulting size
  - Delete T1 and T2 from nodes
  - Add conformal product $\sum_V T_1 \cdot T_2$ to nodes
- end
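The greedy rule can be sketched on table scopes alone, since only the sizes matter for choosing the order. The function name, the `(T1, T2, summed, result)` step format, and the tie-breaking (first pair found wins) are assumptions of this sketch; a variable is treated as summable only when no other remaining table mentions it.

```python
from itertools import combinations


def greedy_elimination(scopes, sum_vars, domain_size=2):
    """Return the merge order chosen by the greedy rule.

    scopes: list of sets of variable names, one per table (after evidence).
    sum_vars: set of variables that must be summed away.
    Each step is a tuple (T1, T2, summed_variables, resulting_scope).
    """
    nodes = [frozenset(s) for s in scopes]
    to_sum = set(sum_vars)
    steps = []
    while len(nodes) > 1:
        best = None
        for i, j in combinations(range(len(nodes)), 2):
            t1, t2 = nodes[i], nodes[j]
            if not (t1 & t2):
                continue  # only consider pairs that share a variable
            others = set().union(*(n for k, n in enumerate(nodes) if k not in (i, j)))
            # Sum away as many variables of V as possible right now.
            summable = ((t1 | t2) & to_sum) - others
            result = (t1 | t2) - summable
            size = domain_size ** len(result)
            if best is None or size < best[0]:
                best = (size, i, j, result, summable)
        if best is None:
            break  # the remaining tables share no variables
        _, i, j, result, summable = best
        steps.append((set(nodes[i]), set(nodes[j]), set(summable), set(result)))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [result]
        to_sum -= summable
    return steps


# Scopes of P(Co), P[Co,A], P(A|Ca), P(Ca); sum over A and Ca.
steps = greedy_elimination(
    [{"Co"}, {"Co", "A"}, {"A", "Ca"}, {"Ca"}],
    {"A", "Ca"},
)
```

On these four tables the sketch first merges the two tables that mention Ca (summing Ca away), then sums away A, and finally multiplies the two tables over Co.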
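33. Example of Greedy Algorithm
- Given tables: $P(\mathit{Co})$, $P[\mathit{Co}, A]$, $P(A \mid \mathit{Ca})$, $P(\mathit{Ca})$
- Variables to sum: $A$, $\mathit{Ca}$
- Choose: $P[A] = \sum_{\mathit{Ca}} P(A \mid \mathit{Ca}) \cdot P(\mathit{Ca})$
34. Example of Greedy Algorithm (2)
- Given tables: $P(\mathit{Co})$, $P[\mathit{Co}, A]$, $P[A]$
- Variables to sum: $A$
- Choose: $P[\mathit{Co}] = \sum_{A} P[\mathit{Co}, A] \cdot P[A]$
35. Example of Greedy Algorithm (3)
- Given tables: $P(\mathit{Co})$, $P[\mathit{Co}]$
- Variables to sum: none
- Choose: $P_2[\mathit{Co}] = P(\mathit{Co}) \cdot P[\mathit{Co}]$
- Normalize and extract answer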
36. Bayesian Network For WUMPUS
- $P(P_{1,1}, P_{1,2}, \ldots, P_{4,4}, B_{1,1}, B_{1,2}, \ldots, B_{4,4})$
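37. Probabilistic Inference in WUMPUS
- Suppose we have observed
- No breeze in 1,1
- Breeze in 1,2 and 2,1
- No pit in 1,1, 1,2, and 2,1
- What is the probability of a pit in 1,3?
- $P(P_{1,3} \mid \lnot B_{1,1}, B_{1,2}, B_{2,1}, \lnot P_{1,1}, \lnot P_{1,2}, \lnot P_{2,1})$
38. What is $P(P_{1,3} \mid \lnot B_{1,1}, B_{1,2}, B_{2,1}, \lnot P_{1,1}, \lnot P_{1,2}, \lnot P_{2,1})$?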
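39. Prune Leaves Not Involved in Query or Evidence
40. Prune Independent Nodes
41. Solve Remaining Network
$\sum_{P_{2,2}, P_{3,1}} P(B_{1,1} \mid P_{1,1}, P_{1,2}, P_{2,1})\, P(B_{1,2} \mid P_{1,1}, P_{1,2}, P_{1,3})\, P(B_{2,1} \mid P_{1,1}, P_{2,1}, P_{2,2}, P_{3,1})\, P(P_{1,1})\, P(P_{1,2})\, P(P_{2,1})\, P(P_{2,2})\, P(P_{1,3})\, P(P_{3,1})$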
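42. Performing the Inference
NORM $\sum_{P_{2,2}, P_{3,1}} P(B_{1,1} \mid P_{1,1}, P_{1,2}, P_{2,1})\, P(B_{1,2} \mid P_{1,1}, P_{1,2}, P_{1,3})\, P(B_{2,1} \mid P_{1,1}, P_{2,1}, P_{2,2}, P_{3,1})\, P(P_{1,1})\, P(P_{1,2})\, P(P_{2,1})\, P(P_{2,2})\, P(P_{1,3})\, P(P_{3,1})$
= NORM $\sum_{P_{2,2}, P_{3,1}} P[P_{1,3}]\, P[P_{2,2}, P_{3,1}]\, P(P_{2,2})\, P(P_{1,3})\, P(P_{3,1})$
= NORM $P[P_{1,3}]\, P(P_{1,3}) \sum_{P_{2,2}} P(P_{2,2}) \sum_{P_{3,1}} P[P_{2,2}, P_{3,1}]\, P(P_{3,1})$
$P(P_{1,3} \mid \text{evidence}) = \langle 0.69, 0.31 \rangle$: a 31% chance of a pit in 1,3!
We have reduced the inference to a simple computation over 2x2 tables.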
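The 0.69 / 0.31 split can be reproduced by brute-force enumeration over the three squares left after pruning. This sketch rests on assumptions that are not stated in these slides: a prior pit probability of 0.2 per square, and the textbook rule that a square is breezy exactly when one of its four neighbours holds a pit. Under that rule $B_{1,2}$ also depends on $P_{2,2}$, which differs slightly from the parent sets written in the formulas above.

```python
from itertools import product

PIT_PRIOR = 0.2  # assumed prior probability of a pit in any square


def neighbors(x, y):
    """Four-connected neighbours of square (x, y)."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


# Evidence: breeze observations (squares 1,1, 1,2 and 2,1 are known pit-free).
breeze_obs = {(1, 1): False, (1, 2): True, (2, 1): True}
# Squares whose pit status is still unknown after pruning.
unknown = [(1, 3), (2, 2), (3, 1)]


def consistent(pits):
    """True if every observed breeze value matches the pit placement."""
    return all(
        any(n in pits for n in neighbors(*square)) == breezy
        for square, breezy in breeze_obs.items()
    )


unnorm = {True: 0.0, False: 0.0}
for values in product([True, False], repeat=len(unknown)):
    assignment = dict(zip(unknown, values))
    pits = {square for square, has_pit in assignment.items() if has_pit}
    if not consistent(pits):
        continue  # this placement contradicts the evidence
    weight = 1.0
    for has_pit in values:
        weight *= PIT_PRIOR if has_pit else 1.0 - PIT_PRIOR
    unnorm[assignment[(1, 3)]] += weight

z = sum(unnorm.values())
posterior = {has_pit: w / z for has_pit, w in unnorm.items()}
```

Here `posterior[False]` is 0.16 / 0.232 and `posterior[True]` is 0.072 / 0.232, which round to 0.69 and 0.31.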
- The joint distribution is analogous to the truth table for propositional logic. It is exponentially large, but any query can be answered using it
- Conditional independence allows us to factor the joint distribution using conformal products
- Conditional independence relationships are conveniently visualized and encoded in a belief network DAG
- Given evidence, we can reason efficiently by algebraic manipulation of the factored joint distribution