CMSC 723: Introduction to Computational Linguistics

Transcript and Presenter's Notes
1
Chapter 13: Uncertainty
2
Outline
  • Acting under Uncertainty
  • Basic Probability Notation
  • The Axioms of probability
  • Inference Using Joint Distributions
  • Independence
  • Bayes's Rule and Its Use
  • The Wumpus World Revisited

3
Remember the Logics in General Slide?
  • Ontological commitment: what exists in the world
    (TRUTH)
  • Epistemological commitment: what an agent
    believes about facts (BELIEF)

4
Uncertainty
  • When an agent knows enough facts about its
    environment, the logical approach enables it to
    derive plans that are guaranteed to work.
  • Unfortunately, agents almost never have access to
    the whole truth about their environment.
  • Agents must act under uncertainty.
  • For example, a wumpus agent often will find
    itself unable to discover which of two squares
    contains a pit. If those squares are en route to
    the gold, then the agent might have to take a
    chance and enter one of the two squares.

5
Uncertainty
  • Let action At = leave for the airport t minutes
    before the flight. Will At get me there on time?
  • Problems:
  • partial observability (road state, other drivers'
    plans, etc.)
  • noisy sensors (traffic reports, etc.)
  • uncertainty in outcomes (flat tire, etc.)
  • immense complexity of modeling and predicting
    traffic

6
Can we take a purely logical approach?
  • Risks falsehood: "A25 will get me there on time"
  • Leads to conclusions that are too weak for
    decision making:
  • "A25 will get me there on time if there is no
    accident on the bridge and it doesn't rain and my
    tires remain intact, etc."
  • A1440 might reasonably be said to get me there on
    time, but I'd have to stay overnight at the
    airport!
  • A90 may be good.
  • Logic represents uncertainty by disjunction but
    cannot tell us how likely the different
    conditions are.

7
Methods for Handling Uncertainty
  • Default or nonmonotonic logic:
  • Assume my car does not have a flat tire
  • Assume A25 works unless contradicted by evidence
  • Issues: What assumptions are reasonable? How to
    handle contradictions?
  • Rules with fudge factors:
  • A25 →0.3 get there on time
  • Sprinkler →0.99 WetGrass
  • WetGrass →0.7 Rain
  • Issues: Problems with combination, e.g., does
    Sprinkler cause Rain?

8
Uncertainty
  • The information the agent has cannot guarantee
    any of these outcomes for A90, but it can provide
    some degree of belief that they will be achieved.
  • Other plans, such as A120, might increase the
    agent's belief that it will get to the airport on
    time, but also increase the likelihood of a long
    wait.
  • The right thing to do, the rational decision,
    therefore depends on both the relative importance
    of various goals and the likelihood that, and
    degree to which, they will be achieved.

9
Handling Uncertain Knowledge
10
An Alternative Use Probability
  • Given the available evidence, A25 will get me
    there on time with probability 0.04.
  • Probabilistic assertions summarize the effects of
  • Laziness: too much work to list the complete set
    of antecedents or consequents to ensure no
    exceptions
  • Theoretical ignorance: medical science has no
    complete theory for the domain
  • Practical ignorance: even if we know all the
    rules, we might be uncertain about a particular
    patient

11
Degree of Belief
  • The connection between toothaches and cavities is
    just not a logical consequence in either
    direction.
  • This is typical of the medical domain, as well as
    most other judgmental domains: law, business,
    design, automobile repair, gardening, dating, and
    so on.
  • The agent's knowledge can at best provide only a
    degree of belief in the relevant sentences.
  • Our main tool for dealing with degrees of belief
    will be probability theory, which assigns to each
    sentence a numerical degree of belief between 0
    and 1.

12
Probability
  • Probability provides a way of summarizing the
    uncertainty that comes from our laziness and
    ignorance.
  • We might not know for sure what afflicts a
    particular patient, but we believe that there is,
    say, an 80% chance (that is, a probability of
    0.8) that the patient has a cavity if he or she
    has a toothache.
  • That is, we expect that out of all the situations
    that are indistinguishable from the current
    situation as far as the agent's knowledge goes,
    the patient will have a cavity in 80% of them.
  • This belief could be derived from
  • statistical data,
  • general rules, or
  • a combination of evidence sources.
  • The missing 20% summarizes all the other possible
    causes of toothache that we are too lazy or
    ignorant to confirm or deny.

13
Probability
  • Assigning a probability of 0 to a given sentence
    corresponds to an unequivocal belief that the
    sentence is false.
  • Assigning a probability of 1 corresponds to an
    unequivocal belief that the sentence is true.
  • Probabilities between 0 and 1 correspond to
    intermediate degrees of belief in the truth of
    the sentence.
  • The sentence itself is in fact either true or
    false.
  • It is important to note that a degree of belief
    is different from a degree of truth.
  • A probability of 0.8 does not mean "80% true" but
    rather an 80% degree of belief, that is, a fairly
    strong expectation.
  • Thus, probability theory makes the same
    ontological commitment as logic, namely, that
    facts either do or do not hold in the world.
  • Degree of truth, as opposed to degree of belief,
    is the subject of fuzzy logic, which is covered
    in Section 14.7.

14
Probability
  • All probability statements must therefore
    indicate the evidence with respect to which the
    probability is being assessed.
  • As the agent receives new percepts, its
    probability assessments are updated to reflect
    the new evidence.
  • Before the evidence is obtained, we talk about
    prior or unconditional probability; after the
    evidence is obtained, we talk about posterior or
    conditional probability.
  • In most cases, an agent will have some evidence
    from its percepts and will be interested in
    computing the posterior probabilities of the
    outcomes it cares about.

15
Uncertainty and rational decisions
  • The presence of uncertainty radically changes the
    way an agent makes decisions.
  • A logical agent typically has a goal and executes
    any plan that is guaranteed to achieve it. An
    action can be selected or rejected on the basis
    of whether it achieves the goal, regardless of
    what other actions might achieve.
  • When uncertainty enters the picture, this is no
    longer the case.
  • Consider again the A90 plan for getting to the
    airport. Suppose it has a 95% chance of
    succeeding.
  • Does this mean it is a rational choice? Not
    necessarily: there might be other plans, such as
    A120, with higher probabilities of success.
  • What about A1440? In most circumstances, this is
    not a good choice, because, although it almost
    guarantees getting there on time, it involves an
    intolerable wait.

16
PREFERENCES
  • To make such choices, an agent must first have
    preferences between the different possible
    outcomes of the various plans.
  • A particular outcome is a completely specified
    state, including such factors as whether the
    agent arrives on time and the length of the wait
    at the airport.
  • We will be using utility theory to represent and
    reason with preferences.
  • Utility theory says that every state has a degree
    of usefulness, or utility, to an agent and that
    the agent will prefer states with higher utility.

17
Examples
  • Suppose I believe the following:
  • P(A25 gets me there on time) = 0.04
  • P(A90 gets me there on time) = 0.70
  • P(A120 gets me there on time) = 0.95
  • P(A1440 gets me there on time) = 0.9999
  • Which do I choose? Depends on my preferences for
    missing the flight vs. airport cuisine, etc.

18
Decision theory
  • Preferences, as expressed by utilities, are
    combined with probabilities in the general theory
    of rational decisions called decision theory:
  • Decision theory = probability theory + utility
    theory.
  • An agent is rational if and only if it chooses
    the action that yields the highest expected
    utility, averaged over all the possible outcomes
    of the action. This is called the principle of
    Maximum Expected Utility (MEU), illustrated in
    the sketch below.
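
A minimal sketch of the MEU principle for the airport example. The success probabilities are the ones quoted on the Examples slide; the utility numbers are hypothetical, chosen only to illustrate the trade-off between missing the flight and waiting at the airport.

```python
# Hypothetical utilities; only the probabilities come from the slides.
plans = {
    # plan: (P(on time), utility if on time, utility if late)
    "A25":   (0.04,   100, -500),
    "A90":   (0.70,    95, -500),   # small utility cost for the wait
    "A120":  (0.95,    90, -500),
    "A1440": (0.9999, -100, -500),  # on time, but an intolerable wait
}

def expected_utility(p_on_time, u_on_time, u_late):
    # Average the outcome utilities, weighted by their probabilities.
    return p_on_time * u_on_time + (1 - p_on_time) * u_late

for plan, args in plans.items():
    print(plan, round(expected_utility(*args), 2))

best = max(plans, key=lambda plan: expected_utility(*plans[plan]))
print("MEU choice:", best)   # A120 under these made-up utilities
```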

19
Design for a decision-theoretic agent
20
13.2 Basic Probability Notation
  • We will need a formal language for representing
    and reasoning with uncertain knowledge.
  • Any notation for describing degrees of belief
    must be able to deal with two main issues:
  • the nature of the sentences to which degrees of
    belief are assigned, and
  • the dependence of the degree of belief on the
    agent's experience.
  • The version of probability theory we present uses
    an extension of propositional logic.

21
Random Variables
  • A random variable takes values from a countable
    domain; a probability distribution maps each such
    value to a number between 0 and 1.
  • Example: Weather is a discrete (propositional)
    random variable with domain <sunny, rain, cloudy,
    snow>.
  • sunny is an abbreviation for Weather = sunny
  • P(Weather = sunny) = 0.72, P(Weather = rain) = 0.1,
    etc.
  • Can be written P(sunny) = 0.72, P(rain) = 0.1, etc.
  • Other types of random variables:
  • Boolean random variable: has the domain
    <true, false>, e.g., Cavity (a special case of a
    discrete random variable)
  • Continuous random variable: has the domain of
    real numbers, e.g., Temp

22
Atomic Events
  • An atomic event is a complete specification of
    the state of the world about which the agent is
    uncertain. It can be thought of as an assignment
    of particular values to all the variables of
    which the world is composed.
  • For example, if my world consists of only the
    Boolean variables Cavity and Toothache, then
    there are just four distinct atomic events:
  • Cavity = false ∧ Toothache = true
  • Cavity = false ∧ Toothache = false
  • Cavity = true ∧ Toothache = true
  • Cavity = true ∧ Toothache = false
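
A small sketch of this enumeration: generating the atomic events of a world described by Boolean variables, here Cavity and Toothache as on the slide.

```python
from itertools import product

variables = ["Cavity", "Toothache"]
atomic_events = [dict(zip(variables, values))
                 for values in product([True, False], repeat=len(variables))]

for event in atomic_events:
    print(event)   # four events: mutually exclusive, jointly exhaustive
```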

23
Properties of Atomic events
  • They are mutually exclusiveat most one can
    actually be the case.
  • For example, cavity ? toothache and cavity ?
    ?itoothache cannot both be the case.
  • The set of all possible atomic events is
    exhaustiveat least one must be the case. That
    is, the disjunction of all atomic events is
    logically equivalent to true.
  • Any particular atomic event entails the truth or
    falsehood of every proposition, whether simple or
    complex. This can be seen by using the standard
    semantics for logical connectives.
  • For example, the atomic event cavity ? ?
    toothache entails the truth of cavity and the
    falsehood of cavity ?toothache.
  • Any proposition is logically equivalent to the
    disjunction of all atomic events that entail the
    truth of the proposition.
  • For example, the proposition cavity is equivalent
    to disjunction of the atomic events cavity ?
    toothache and cavity ? ? toothache.

24
Prior Probability
  • Prior (unconditional) probability corresponds to
    belief prior to the arrival of any (new) evidence
  • P(sunny) = 0.72, P(rain) = 0.1, etc.
  • Vector notation: P(Weather) = <0.7, 0.2, 0.08, 0.02>
  • P(Weather = sunny) = 0.7
  • P(Weather = rain) = 0.2
  • P(Weather = cloudy) = 0.08
  • P(Weather = snow) = 0.02
  • This statement defines a prior probability
    distribution for the random variable Weather
    (sums to 1 over the domain)
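
A sketch of this prior distribution as a plain dict, with a check that it is a valid distribution (sums to 1 over the domain).

```python
import math

prior_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}
assert math.isclose(sum(prior_weather.values()), 1.0)
print(prior_weather["sunny"])   # P(Weather = sunny) = 0.7
```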

25
Atomic Events and the Universe
  • An atomic event is a complete specification of
    the state of the world about which the agent is
    uncertain (like interpretations in logic).
  • If the world consists of the Boolean variables
    Cavity and Toothache, there are four distinct
    atomic events, one of which is Cavity = false ∧
    Toothache = true
  • The universe consists of atomic events
  • An event is a set of atomic events
  • P : events → [0, 1]
  • P(true) = 1 = P(U)
  • P(false) = 0 = P(∅)
  • P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

26
Joint Distribution
  • Probability assignment to all combinations of
    values of random variables
  • The sum of the entries in this table has to be 1
  • Given this table, we can answer all probability
    questions about this domain
  • The probability of a proposition is the sum of
    the probabilities of the atomic events in which
    it holds
  • P(cavity) = 0.1 (add elements of the cavity row)
  • P(toothache) = 0.05 (add elements of the
    toothache column)
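
The joint table itself is not reproduced in the transcript. In the sketch below, the four entries are reconstructed from the marginals the slides quote: P(cavity) = 0.1, P(toothache) = 0.05, and P(cavity ∧ toothache) = 0.04 (from the next slide).

```python
# Reconstructed 2x2 joint over (cavity, toothache).
joint = {
    (True,  True):  0.04,  # cavity ∧ toothache
    (True,  False): 0.06,  # cavity ∧ ¬toothache
    (False, True):  0.01,  # ¬cavity ∧ toothache
    (False, False): 0.89,  # ¬cavity ∧ ¬toothache
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# Marginalize: sum the atomic events in which the proposition holds.
p_cavity = sum(p for (cavity, _), p in joint.items() if cavity)
p_toothache = sum(p for (_, toothache), p in joint.items() if toothache)
print(p_cavity, p_toothache)   # ≈ 0.1 and ≈ 0.05
```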

27
Conditional Probability
  • P(cavity) = 0.1 and P(cavity ∧ toothache) = 0.04
    are both prior (unconditional) probabilities
  • Once the agent has new evidence concerning a
    previously unknown random variable, e.g.,
    toothache, we can specify a posterior
    (conditional) probability, e.g.,
    P(cavity | toothache)
  • P(A | B) = P(A ∧ B)/P(B) (probability of A with
    the universe limited to B)
  • P(cavity | toothache) = 0.04/0.05 = 0.8
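
The same computation in code, reusing the reconstructed Cavity/Toothache table from the previous sketch: P(A | B) = P(A ∧ B) / P(B).

```python
joint = {(True, True): 0.04, (True, False): 0.06,
         (False, True): 0.01, (False, False): 0.89}

p_toothache = sum(p for (_, t), p in joint.items() if t)   # ≈ 0.05
p_cavity_and_toothache = joint[(True, True)]               # 0.04
print(p_cavity_and_toothache / p_toothache)                # ≈ 0.8
```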

28
Conditional Probability (continued)
  • Definition of conditional probability:
    P(A | B) = P(A ∧ B)/P(B)
  • The product rule gives an alternative formulation:
    P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
  • A general version holds for whole distributions:
    P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)

29
(No Transcript)
30
13.3 The Axioms of probability
  • All probabilities are between 0 and 1. For any
    proposition a, 0 ≤ P(a) ≤ 1.
  • Necessarily true (i.e., valid) propositions have
    probability 1, and necessarily false (i.e.,
    unsatisfiable) propositions have probability 0:
  • P(true) = 1, P(false) = 0.
  • The probability of a disjunction is given by
  • P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

31
Using the axioms of probability
32
Probability axioms
  • Recall that any proposition a is equivalent to
    the disjunction of all the atomic events in which
    a holds; call this set of events e(a).
  • Recall also that atomic events are mutually
    exclusive, so the probability of any conjunction
    of distinct atomic events is zero, by axiom 2.
    Hence, from axiom 3, we can derive the following
    simple relationship:
  • The probability of a proposition is equal to the
    sum of the probabilities of the atomic events in
    which it holds; that is,
  • P(a) = Σ_{e_i ∈ e(a)} P(e_i)
  • This equation provides a simple method for
    computing the probability of any proposition,
    given a full joint distribution that specifies
    the probabilities of all atomic events.

33
Why the Axioms of Probability Are Reasonable
  • The axioms of probability can be seen as
    restricting the set of probabilistic beliefs that
    an agent can hold.
  • This is somewhat analogous to the logical case,
    where a logical agent cannot simultaneously
    believe A, B, and ¬(A ∧ B), for example.
  • In the logical case, the semantic definition of
    conjunction means that at least one of the three
    beliefs just mentioned must be false in the
    world, so it is unreasonable for an agent to
    believe all three.
  • With probabilities, on the other hand, statements
    refer not to the world directly, but to the
    agent's own state of knowledge.
  • Why, then, can an agent not hold the following
    set of beliefs, which clearly violates axiom 3?
  • P(a) = 0.4     P(a ∧ b) = 0.0
  • P(b) = 0.3     P(a ∨ b) = 0.8

34
de Finetti's Argument
  • The axioms of probability
  • imply that certain logically related events must
    have related probabilities
  • restrict the set of probabilistic beliefs that an
    agent can hold
  • Example: P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  • de Finetti (1931) proved: an agent who bets
    according to probabilities that violate these
    axioms can be forced to bet so as to lose money
    regardless of the outcome.
  • Let's look at an example.

35
Example
36
13.4 Probabilistic Inference
  • Probabilistic inference is the computation, from
    observed evidence, of posterior probabilities for
    query propositions.
  • We use the full joint distribution as the
    knowledge base from which answers to questions
    may be derived.
  • Probabilities in the joint distribution sum to 1.

37
Probabilistic Inference (cont)
  • The probability of any proposition is computed by
    finding the atomic events where the proposition
    is true and adding their probabilities
  • P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 +
    0.008 + 0.016 + 0.064 = 0.28
  • P(cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
  • P(cavity) is called a marginal probability, and
    the process of computing it is called
    marginalization
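
A sketch of the full joint over (Cavity, Toothache, Catch). Six of the entries appear on this slide; the remaining two (the no-cavity, no-toothache cells, 0.144 and 0.576) are taken from the standard textbook table so that the distribution sums to 1.

```python
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

p_cavity = sum(p for (c, _, _), p in joint.items() if c)
p_cavity_or_toothache = sum(p for (c, t, _), p in joint.items() if c or t)
print(p_cavity)                # ≈ 0.2
print(p_cavity_or_toothache)   # ≈ 0.28
```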

38
Probabilistic Inference (cont.)
  • Can also compute conditional probabilities.
  • P(¬cavity | toothache) = P(¬cavity ∧
    toothache)/P(toothache) = (0.016 + 0.064) /
    (0.108 + 0.012 + 0.016 + 0.064) = 0.4
  • The denominator is viewed as a normalization
    constant: it stays the same no matter what the
    value of Cavity is. (The book uses α to denote
    the normalization constant 1/P(X), for random
    variable X.)

39
(No Transcript)
40
(No Transcript)
41
Enumerate-Joint-Ask
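
The slide's figure (the book's ENUMERATE-JOINT-ASK procedure) is not reproduced in the transcript. Below is a hedged Python sketch of the same idea, not the book's pseudocode: sum the matching rows of the full joint and normalize. The dict-based representation of the joint is an assumption of this sketch.

```python
def enumerate_joint_ask(query_var, evidence, joint):
    """Return the normalized distribution P(query_var | evidence) from a
    full joint given as {assignment-tuple: probability}."""
    dist = {}
    for event, p in joint.items():
        a = dict(event)
        if all(a[var] == val for var, val in evidence.items()):
            dist[a[query_var]] = dist.get(a[query_var], 0.0) + p
    alpha = 1.0 / sum(dist.values())   # the normalization constant α
    return {value: alpha * p for value, p in dist.items()}

# Rebuild the Cavity/Toothache/Catch joint from the earlier sketch.
rows = [
    (True,  True,  True,  0.108), (True,  True,  False, 0.012),
    (True,  False, True,  0.072), (True,  False, False, 0.008),
    (False, True,  True,  0.016), (False, True,  False, 0.064),
    (False, False, True,  0.144), (False, False, False, 0.576),
]
joint = {(("Cavity", c), ("Toothache", t), ("Catch", k)): p
         for c, t, k, p in rows}

print(enumerate_joint_ask("Cavity", {"Toothache": True}, joint))
# ≈ {True: 0.6, False: 0.4}, matching P(¬cavity | toothache) = 0.4 above
```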
42
13.5 Independence
  • How are P(toothache, catch, cavity, Weather =
    cloudy) and P(toothache, catch, cavity) related?

43
Independence
44
(No Transcript)
45
Independence
  • A and B are independent iff
  • P(A ∧ B) = P(A) P(B)
  • P(A | B) = P(A)
  • P(B | A) = P(B)
  • Independence is essential for efficient
    probabilistic reasoning

(Figure: the model over Cavity, Toothache, Xray, and
Weather decomposes into one over Cavity, Toothache,
Xray and a separate one over Weather.)
P(T, X, C, W) = P(T, X, C) P(W)
46
13.6 Bayes' Rule
47
Bayes's Rule
  • Product rule: P(a ∧ b) = P(a | b) P(b) =
    P(b | a) P(a)
  • ⇒ Bayes' rule: P(a | b) = P(b | a) P(a) / P(b)
  • or in distribution form:
  • P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y)
  • Useful for assessing a diagnostic probability
    from a causal probability:
  • P(Cause | Effect) = P(Effect | Cause) P(Cause) /
    P(Effect)
  • E.g., let M be meningitis, S be stiff neck:
  • P(m | s) = P(s | m) P(m) / P(s) = 0.8 × 0.0001 /
    0.1 = 0.0008
  • Note: the posterior probability of meningitis is
    still very small!
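
The same diagnostic computation in code, using the slide's numbers.

```python
p_s_given_m = 0.8     # P(stiff neck | meningitis)
p_m = 0.0001          # prior P(meningitis)
p_s = 0.1             # prior P(stiff neck)

p_m_given_s = p_s_given_m * p_m / p_s   # Bayes' rule
print(p_m_given_s)                      # ≈ 0.0008: still a tiny posterior
```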

48
Example 1(simple case)
  • A doctor knows that the disease meningitis(???)
    causes the patient to have a stiff neck(????),
    say, 50 of the time.
  • The prior probability that a patient has
    meningitis is 1/50,000,
  • The prior probability that any patient has a
    stiff neck is 1/20.
  • Letting s be the proposition that the patient has
    a stiff neck and
  • m be the proposition that the patient has
    meningitis, we have
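
The transcript for the following slide is missing; with the numbers just given, Bayes' rule yields the standard result:

```latex
P(m \mid s) = \frac{P(s \mid m)\,P(m)}{P(s)}
            = \frac{0.5 \times 1/50000}{1/20}
            = 0.0002
```

That is, only about 1 in 5,000 patients with a stiff neck is expected to have meningitis.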

49
(No Transcript)
50
Example 2 (combining evidence)
51
Conditional independence
52
(No Transcript)
53
Separates
54
(No Transcript)
55
Bayes' Rule and conditional independence
  • P(Cavity | toothache ∧ catch)
  • = α P(toothache ∧ catch | Cavity) P(Cavity)
  • = α P(toothache | Cavity) P(catch | Cavity)
    P(Cavity)
  • This is an example of a naïve Bayes model:
  • P(Cause, Effect1, …, Effectn) = P(Cause)
    Π_i P(Effect_i | Cause)
  • The total number of parameters is linear in n
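
A minimal naïve Bayes sketch for the cavity model. The conditional probabilities below are derived from the full joint used earlier (e.g., P(toothache | cavity) = 0.12 / 0.2 = 0.6); treat them as illustrative numbers.

```python
p_cause = {True: 0.2, False: 0.8}        # P(Cavity)
p_effect_given_cause = {                 # P(effect | Cavity)
    "toothache": {True: 0.6, False: 0.1},
    "catch":     {True: 0.9, False: 0.2},
}

def score(cause_value, effects):
    # P(Cause = v, effects) = P(Cause = v) * prod_i P(effect_i | Cause = v)
    p = p_cause[cause_value]
    for effect in effects:
        p *= p_effect_given_cause[effect][cause_value]
    return p

# Posterior via normalization: P(Cavity | toothache, catch)
scores = {v: score(v, ["toothache", "catch"]) for v in (True, False)}
alpha = 1.0 / sum(scores.values())
print({v: alpha * s for v, s in scores.items()})   # ≈ {True: 0.87, False: 0.13}
```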

56
Conditioning
  • Idea: use conditional probabilities instead of
    joint probabilities
  • P(A) = P(A ∧ B) + P(A ∧ ¬B)
        = P(A | B) P(B) + P(A | ¬B) P(¬B)
  • Example: P(symptom) = P(symptom | disease)
    P(disease) + P(symptom | ¬disease) P(¬disease)
  • More generally: P(Y) = Σ_z P(Y | z) P(z)
  • Marginalization and conditioning are useful rules
    for derivations involving probability expressions.
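
A short sketch of conditioning (total probability) for the symptom example. The numbers here are illustrative assumptions, not from the slides.

```python
p_disease = 0.01
p_symptom_given_disease = 0.9       # P(symptom | disease), assumed
p_symptom_given_no_disease = 0.05   # P(symptom | ¬disease), assumed

p_symptom = (p_symptom_given_disease * p_disease +
             p_symptom_given_no_disease * (1 - p_disease))
print(p_symptom)   # 0.9*0.01 + 0.05*0.99 = 0.0585
```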

57
Conditional Independence
  • A and B are conditionally independent given C iff
  • P(A | B, C) = P(A | C)
  • P(B | A, C) = P(B | C)
  • P(A ∧ B | C) = P(A | C) P(B | C)
  • Toothache (T), Spot in X-ray (X), Cavity (C)
  • None of these propositions are independent of one
    another
  • But T and X are conditionally independent given C

58
Conditional Independence (cont.)
  • If I have a cavity, the probability that the
    X-ray shows a spot doesn't depend on whether I
    have a toothache: P(X | T, C) = P(X | C)
  • The same independence holds if I haven't got a
    cavity: P(X | T, ¬C) = P(X | ¬C)
  • Equivalent statements: P(T | X, C) = P(T | C) and
    P(T, X | C) = P(T | C) P(X | C)
  • Write out the full joint distribution (chain
    rule): P(T, X, C) = P(T | X, C) P(X, C)
    = P(T | X, C) P(X | C) P(C)
    = P(T | C) P(X | C) P(C)

59
Another Example
  • Battery is dead (B)
  • Radio plays (R)
  • Starter turns over (S)
  • None of these propositions are independent of one
    another
  • R and S are conditionally independent given B

60
Combining Evidence
  • We can do the evidence combination sequentially

61
How Do We Compute the Normalizing Constant (α)?
62
13.7 Wumpus World (Revisited)
  • Pij = true iff [i, j] contains a pit
  • Bij = true iff [i, j] is breezy
  • Include only B1,1, B1,2, B2,1 in the probability
    model
  • Known facts:
  • b = ¬b1,1 ∧ b1,2 ∧ b2,1
  • known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

Query: P(P1,3 | known, b)
63
(No Transcript)
64
(No Transcript)
65
Wumpus World (cont.)
  • Key insight: the observed breezes are
    conditionally independent of the other variables
    given the known, fringe, and query variables
  • Define Unknown = Fringe ∪ Other:
    P(b | P1,3, Known, Unknown) =
    P(b | P1,3, Known, Fringe)
  • The book has the details; a small enumeration
    sketch follows below.
  • Bottom line:
  • P(P1,3 | known, b) ≈ <0.31, 0.69>
  • P(P2,2 | known, b) ≈ <0.86, 0.14>

Query: P(P1,3 | known, b)
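
A hedged sketch of the fringe computation, assuming the standard setup: pits occur independently with prior 0.2; [1,1], [1,2], [2,1] are known pit-free; no breeze in [1,1], breeze in [1,2] and [2,1]. By the conditional-independence insight above, the "other" squares cancel out, so enumerating only the fringe squares (1,3), (2,2), (3,1) reproduces the quoted posteriors.

```python
from itertools import product

PIT_PRIOR = 0.2
fringe = [(1, 3), (2, 2), (3, 1)]

def consistent(pits):
    # Breeze at [1,2] <=> pit at (1,3) or (2,2); breeze at [2,1] <=> pit
    # at (2,2) or (3,1). [1,1]'s lack of breeze is already implied,
    # since its neighbors (1,2) and (2,1) are known pit-free.
    return (pits[(1, 3)] or pits[(2, 2)]) and (pits[(2, 2)] or pits[(3, 1)])

def posterior(query_square):
    """P(pit at query_square | known, b) by enumerating fringe assignments."""
    num = den = 0.0
    for values in product([True, False], repeat=len(fringe)):
        pits = dict(zip(fringe, values))
        weight = 1.0
        for has_pit in values:
            weight *= PIT_PRIOR if has_pit else 1 - PIT_PRIOR
        if consistent(pits):
            den += weight
            if pits[query_square]:
                num += weight
    return num / den

print(round(posterior((1, 3)), 2))   # ≈ 0.31
print(round(posterior((2, 2)), 2))   # ≈ 0.86
```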
66
(No Transcript)
67
(No Transcript)
68
(No Transcript)
69
(No Transcript)
70
(No Transcript)
71
Some Problems for Fun
  • Prove the following:
  • P(¬A) = 1 − P(A)
  • P(A ∨ B ∨ C) = P(A) + P(B) + P(C) − P(A ∧ B) −
    P(A ∧ C) − P(B ∧ C) + P(A ∧ B ∧ C)
  • Show that P(A) ≥ P(A, B)
  • Show that P(A | B) + P(¬A | B) = 1
  • Show that the different formulations of
    conditional independence are equivalent:
  • P(A | B, C) = P(A | C)
  • P(B | A, C) = P(B | C)
  • P(A ∧ B | C) = P(A | C) P(B | C)
  • Conditional Bayes rule: write an expression for
    P(A | B, C) in terms of P(B | A, C).