Title: Bayesian Networks and Markov Models: User Modeling and Natural Language Processing
1. Bayesian Networks and Markov Models: User Modeling and Natural Language Processing
- Bayesian networks and Markov models
- Applications in User Modeling and Natural Language Processing
2. Bayesian Networks and Markov Models
- Bayesian AI
- Bayesian networks
- Decision networks
- Reasoning about changes over time
- Dynamic Bayesian Networks
- Markov models
3. Introduction to Bayesian AI
- Reasoning under uncertainty
- Probabilities
- Bayesian approach
- Bayes' Theorem: conditionalization
- Bayesian decision theory
4. Reasoning under Uncertainty
- Uncertainty: the quality or state of being not clearly known
- Distinguishes deductive knowledge from inductive belief
- Sources of uncertainty
- Ignorance
- Complexity
- Physical randomness
- Vagueness
5. Probability Calculus
- Classic approach to reasoning under uncertainty (origin: Pascal and Fermat)
- Kolmogorov's axioms
- Conditional probability
- Independence
6. Rev. Thomas Bayes (1702-1761)
7. Bayes' Theorem: Conditionalization
- Due to Rev. Thomas Bayes (1764)
- Conditionalization (see the formula after this list)
- Also read as: posterior = prior × likelihood / Pr(evidence)
- Assumptions
- Joint priors over hi and e exist
- Total evidence e is observed
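In standard notation, Bayes' theorem (conditionalization over hypotheses h_i given total evidence e) reads:

$$\Pr(h_i \mid e) \;=\; \frac{\Pr(e \mid h_i)\,\Pr(h_i)}{\Pr(e)} \;=\; \frac{\Pr(e \mid h_i)\,\Pr(h_i)}{\sum_j \Pr(e \mid h_j)\,\Pr(h_j)}$$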
8. Example: Breast Cancer
- Let Pr(h) = 0.01, Pr(e|h) = 0.8 and Pr(e|¬h) = 0.1
- Bayes' theorem yields:
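Plugging the numbers above into Bayes' theorem gives the posterior:

$$\Pr(h \mid e) = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.1 \times 0.99} = \frac{0.008}{0.107} \approx 0.075$$

So even after a positive test, the probability of cancer is only about 7.5%, because the prior is so low.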
9. Bayesian Decision Theory
- Frank Ramsey (1926)
- Decision making under uncertainty: what action to take when the state of the world is unknown
- Bayesian answer: find the utility of each possible outcome (action-state pair), and take the action that maximizes expected utility
10. Bayesian Decision Theory: Example
- Expected utilities
- E(Take umbrella) = 30 × 0.4 + 10 × 0.6 = 18
- E(Leave umbrella) = -100 × 0.4 + 50 × 0.6 = -10
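A minimal sketch of that calculation in Python, assuming (as the numbers suggest) a 0.4 probability of rain and the four outcome utilities above; the variable names are illustrative, not from the slides.

    # Expected utility of each action, assuming Pr(rain) = 0.4.
    # Utilities (taken from the slide's numbers):
    #   take umbrella:  rain -> 30, no rain -> 10
    #   leave umbrella: rain -> -100, no rain -> 50
    p_rain = 0.4
    utility = {
        "take":  {"rain": 30,   "no_rain": 10},
        "leave": {"rain": -100, "no_rain": 50},
    }

    for action, u in utility.items():
        eu = p_rain * u["rain"] + (1 - p_rain) * u["no_rain"]
        print(f"E({action} umbrella) = {eu}")
    # Prints 18.0 for take and -10.0 for leave, so taking the
    # umbrella maximizes expected utility.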
11. Bayesian Conception of an AI
- An autonomous agent that
- has a utility structure (preferences)
- can learn about its world and the relationship (probabilities) between its actions and future states
- maximizes its expected utility
- The techniques used to learn about the world are mainly statistical → data mining
12. Bayesian Networks and Markov Models
- Bayesian AI
- Bayesian networks
- Decision networks
- Reasoning about changes over time
- Dynamic Bayesian Networks
- Markov models
13. Bayesian Networks (BNs): Overview
- Introduction to BNs
- Nodes, structure and probabilities
- Reasoning with BNs
- Understanding BNs
- Extensions of BNs
- Decision Networks
- Dynamic Bayesian Networks (DBNs)
14. Bayesian Networks
- A data structure that represents the dependence between variables
- Gives a concise specification of the joint probability distribution
- A Bayesian Network is a directed acyclic graph (DAG) in which the following holds
- A set of random variables makes up the nodes in the network
- A set of directed links connects pairs of nodes
- Each node has a probability distribution that quantifies the effects of its parents
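In standard form, this concise specification is the factorization of the joint distribution over the nodes X_1, ..., X_n:

$$\Pr(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} \Pr\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)$$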
15. Example: Lung Cancer Diagnosis
- A patient has been suffering from shortness of breath (called dyspnoea) and visits the doctor, worried that he has lung cancer.
- The doctor knows that other diseases, such as tuberculosis and bronchitis, are possible causes, as well as lung cancer. She also knows that other relevant information includes whether or not the patient is a smoker (increasing the chances of cancer and bronchitis) and what sort of air pollution he has been exposed to. A positive X-ray would indicate either TB or lung cancer.
16. Nodes and Values
- Q: What are the nodes to represent and what values can they take?
- A: Nodes can be discrete or continuous
- Boolean nodes represent propositions taking binary values. Example: the Cancer node represents the proposition "the patient has cancer"
- Ordered values. Example: a Pollution node with values low, medium, high
- Integral values. Example: Age with possible values 1-120
17. Lung Cancer Example: Nodes and Values
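A plausible listing of the nodes and their value sets for this example, based on the problem description (the exact value sets are assumptions; Pollution could equally be modelled with three ordered values):

    # Assumed node/value sets for the lung cancer example
    # (not the slide's exact table).
    nodes = {
        "Pollution": ["low", "high"],          # ordered node
        "Smoker":    [True, False],            # Boolean
        "Cancer":    [True, False],            # Boolean
        "Xray":      ["positive", "negative"],
        "Dyspnoea":  [True, False],            # Boolean
    }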
18. Lung Cancer Example: Network Structure
(Figure: DAG with arcs Pollution → Cancer, Smoker → Cancer, Cancer → Xray, Cancer → Dyspnoea)
19. Conditional Probability Tables (CPTs)
- After specifying topology, must specify
- the CPT for each discrete node
- Each row contains the conditional probability of each node value for each possible combination of values in its parent nodes
- Each row must sum to 1
- A CPT for a Boolean variable with n Boolean parents contains 2^(n+1) probabilities
- A node with no parents has one row (its prior probabilities)
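A sketch of how the CPT for the Boolean Cancer node (parents Pollution and Smoker) might be laid out; the probability values below are illustrative placeholders, not the numbers from the slides.

    # CPT for Cancer given its parents (Pollution, Smoker).
    # One row per combination of parent values; each row sums to 1.
    # The numbers are illustrative placeholders, NOT the slide's values.
    cancer_cpt = {
        # (pollution, smoker): (P(cancer=True), P(cancer=False))
        ("high", True):  (0.05,  0.95),
        ("high", False): (0.02,  0.98),
        ("low",  True):  (0.03,  0.97),
        ("low",  False): (0.001, 0.999),
    }

    # A Boolean node with n Boolean parents has 2**n rows of 2 entries,
    # i.e. 2**(n + 1) probabilities in total.
    assert all(abs(sum(row) - 1.0) < 1e-9 for row in cancer_cpt.values())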
20. Lung Cancer Example: CPTs
21. The Markov Property
- Modeling with BNs requires assuming the Markov Property
- There are no direct dependencies in the system being modelled which are not already explicitly shown via arcs
- Example: smoking can influence dyspnoea only through causing cancer
22. Reasoning with Bayesian Networks
- Basic task for any probabilistic inference system: compute the posterior probability distribution for a set of query variables, given new information about some evidence variables
- Also called conditioning, belief updating, or inference
23. Types of Reasoning
24. Reasoning with Numbers: Using Netica software
25. Understanding Bayesian Networks
- A (more compact) representation of the joint probability distribution
- understand how to construct a network
- Encoding of a collection of conditional independence statements
- understand how to design inference procedures
- via the Markov property: each conditional independence implied by the graph is present in the probability distribution
26. Representing the Joint Probability Distribution: Example
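Using the network structure from slide 18 (Pollution and Smoker are parents of Cancer, which is a parent of Xray and Dyspnoea), the joint distribution factorizes as:

$$\Pr(P, S, C, X, D) \;=\; \Pr(P)\,\Pr(S)\,\Pr(C \mid P, S)\,\Pr(X \mid C)\,\Pr(D \mid C)$$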
27. Conditional Independence
- The relationship between conditional independence and BN structure is important for understanding how BNs work
28. Conditional Independence: Causal Chains
- Causal chains give rise to conditional independence. Example: smoking causes cancer, which causes dyspnoea
29. Conditional Independence: Common Causes
- Common causes (or ancestors) also give rise to conditional independence. Example: cancer is a common cause of the two symptoms, a positive X-ray and dyspnoea
(Figure: generic common-cause structure over nodes A, B, C)
30. Conditional Dependence: Common Effects
- Common effects (or their descendants) give rise to conditional dependence. Example: cancer is a common effect of pollution and smoking. Given cancer, smoking explains away pollution
(Figure: generic common-effect structure over nodes A, B, C)
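Written with the slides' own examples, the three patterns amount to:

- Chain (Smoking → Cancer → Dyspnoea): Pr(Dyspnoea | Cancer, Smoking) = Pr(Dyspnoea | Cancer)
- Common cause (Xray ← Cancer → Dyspnoea): Pr(Xray | Cancer, Dyspnoea) = Pr(Xray | Cancer)
- Common effect (Pollution → Cancer ← Smoking): Pollution and Smoking are marginally independent, but become dependent once Cancer (or a descendant of it) is observed, which is the "explaining away" effect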
31. D-separation
- Graphical criterion of conditional independence
- We can determine whether a set of nodes X is independent of another set Y, given a set of evidence nodes E, via the Markov property
- If every undirected path from a node in X to a node in Y is d-separated by E, then X and Y are conditionally independent given E
32. Determining D-separation
- A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E
- A path is blocked given a set of nodes E if there is a node Z on the path for which one of three conditions holds
- Z is in E and Z has one arrow on the path leading in and one arrow out (chain)
- Z is in E and Z has both path arrows leading out (common cause)
- Neither Z nor any descendant of Z is in E, and both path arrows lead into Z (common effect)
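These conditions can also be checked programmatically. The sketch below builds the lung cancer DAG from slide 18 and queries d-separation with networkx, assuming a networkx version that provides d_separated.

    import networkx as nx

    # Lung cancer network structure (slide 18).
    G = nx.DiGraph([
        ("Pollution", "Cancer"),
        ("Smoker", "Cancer"),
        ("Cancer", "Xray"),
        ("Cancer", "Dyspnoea"),
    ])

    # Common cause: Xray and Dyspnoea are d-separated given Cancer.
    print(nx.d_separated(G, {"Xray"}, {"Dyspnoea"}, {"Cancer"}))    # True

    # Common effect: Pollution and Smoker are d-separated given nothing...
    print(nx.d_separated(G, {"Pollution"}, {"Smoker"}, set()))      # True
    # ...but not once the common effect Cancer is observed.
    print(nx.d_separated(G, {"Pollution"}, {"Smoker"}, {"Cancer"})) # False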
33. Determining D-separation (cont.)
(Figure: the three blocking patterns: chain, common cause, common effect)
34. Bayesian Networks: Summary
- Bayes' rule allows unknown probabilities to be computed from known ones
- Conditional independence (due to causal relationships) allows efficient updating
- BNs are a natural way to represent conditional independence info
- qualitative: links between nodes
- quantitative: conditional probability tables (CPTs)
- BN inference
- computes the probability of query variables given evidence variables
- is flexible: we can enter evidence about any node and update beliefs in other nodes
35. Bayesian Networks and Markov Models
- Bayesian AI
- Bayesian networks
- Decision networks
- Reasoning about changes over time
- Dynamic Bayesian Networks
- Markov models
36. Decision Networks
- Extension of BNs to support making decisions
- Utility theory represents preferences between different outcomes of various plans
- Decision theory = utility theory + probability theory
37. Expected Utility
- E: available evidence
- A: a non-deterministic action
- Oi: a possible outcome state
- U: utility
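In one standard form, the expected utility of action A given evidence E is:

$$EU(A \mid E) \;=\; \sum_{i} \Pr(O_i \mid E, A)\; U(O_i)$$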
38. Decision Networks
- A decision network represents
- information about
- the agent's current state
- its possible actions
- the state that will result from the agent's action
- the utility of that state
- Also called Influence Diagrams
- (Howard & Matheson, 1981)
39. Types of Nodes
- Chance nodes (ovals): random variables (same as BNs)
- Have an associated CPT
- Parents can be decision nodes and other chance nodes
- Decision nodes (rectangles): points where the decision maker has a choice of actions
- Utility nodes (value nodes) (diamonds): the agent's utility function
- Have an associated table representing a multi-attribute utility function
- Parents are variables describing the outcome states that directly affect utility
40. Types of Links
- Informational links indicate when a chance node needs to be observed before a decision is made
- Conditioning links indicate the variables on which the probability assignment to a chance node will be conditioned
41. Fever Problem: Description
- Suppose that you know that a fever can be caused by the flu. You can use a thermometer, which is fairly reliable, to test whether or not you have a fever. Suppose you also know that if you take aspirin it will almost certainly lower a fever to normal. Some people (about 5% of the population) have a negative reaction to aspirin. You'll be happy to get rid of your fever, so long as you don't suffer an adverse reaction if you take aspirin.
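One plausible structure for the fever decision problem, read off the description above; the node names and arcs here are assumptions for illustration, not taken from the following slides.

    # Hypothetical structure for the fever decision problem.
    # Chance nodes, a decision node, and a utility node, with the
    # arcs suggested by the story (not the slide's figure).
    chance_nodes   = ["Flu", "Fever", "Thermometer", "Reaction", "FeverLater"]
    decision_nodes = ["TakeAspirin"]
    utility_nodes  = ["U"]

    arcs = [
        ("Flu", "Fever"),               # flu can cause a fever
        ("Fever", "Thermometer"),       # thermometer (fairly reliably) reads the fever
        ("Fever", "FeverLater"),        # a fever may persist
        ("TakeAspirin", "FeverLater"),  # aspirin almost certainly lowers the fever
        ("TakeAspirin", "Reaction"),    # about 5% of people react badly to aspirin
        ("Thermometer", "TakeAspirin"), # informational link: reading observed before deciding
        ("FeverLater", "U"),            # utility depends on still having a fever...
        ("Reaction", "U"),              # ...and on suffering an adverse reaction
    ]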
42. Fever Decision Network
43. Fever Decision Table
44. Bayesian Networks and Markov Models
- Bayesian AI
- Bayesian networks
- Decision networks
- Reasoning about changes over time
- Dynamic Bayesian networks
- Markov models
45. Dynamic Bayesian Networks (DBNs)
- One node for each variable for each time step
- Intra-slice arcs: Xi^t → Xj^t
- Inter-slice (temporal) arcs:
- Xi^t → Xi^(t+1)
- Xi^t → Xj^(t+1)
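With slices indexed by t, these arcs give the usual DBN factorization (standard form):

$$\Pr(X^{t+1} \mid X^{t}) \;=\; \prod_i \Pr\bigl(X_i^{t+1} \mid \mathrm{Parents}(X_i^{t+1})\bigr), \qquad \Pr(X^{0:T}) \;=\; \Pr(X^{0}) \prod_{t=0}^{T-1} \Pr(X^{t+1} \mid X^{t})$$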
46. Fever DBN
47. DBN Reasoning
- Can calculate distributions for events at time t+1 and further: probabilistic projection
- Reasoning can be done using standard BN updating algorithms
- This type of DBN gets very large, very quickly
- Usually keep only two time slices of the network
48. Dynamic Decision Networks
- Decision networks can be extended to include temporal aspects
- Sequence of decisions taken = plan
49. Fever DDN
50. Bayesian Networks and Markov Models
- Bayesian AI
- Bayesian networks
- Decision networks
- Reasoning about changes over time
- Dynamic Bayesian networks
- Markov models
51. Markov Models: Assumptions
- Stationary process: a process of change that is governed by laws that don't change over time
- Markov assumption: the current state depends only on a finite history of the previous states
- First-order MM
- Second-order MM
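In symbols (standard definitions), with X_t the state at time t:

- First-order: Pr(X_t | X_1, ..., X_{t-1}) = Pr(X_t | X_{t-1})
- Second-order: Pr(X_t | X_1, ..., X_{t-1}) = Pr(X_t | X_{t-1}, X_{t-2})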
52. Markov Prediction Models: Example
- Observation: sequence of document requests arriving at a Web site: D1, D2, D3, D2, D1, D4, D2, D3, ...
- Task: predict the next requested document
- First-order MM: calculate Pr(Di | Dj)
- Second-order MM: D1,D2 → D3; D2,D3 → D2; D3,D2 → D1; D2,D1 → D4; D1,D4 → D2; D4,D2 → D3; ... Calculate Pr(Di | Dj,Dk)
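A minimal sketch of fitting the first-order model from the request stream above (variable names are illustrative):

    from collections import Counter, defaultdict

    # Observed request stream from the slide.
    requests = ["D1", "D2", "D3", "D2", "D1", "D4", "D2", "D3"]

    # Count first-order transitions Dj -> Di.
    counts = defaultdict(Counter)
    for prev, cur in zip(requests, requests[1:]):
        counts[prev][cur] += 1

    # Normalize into Pr(Di | Dj) and predict the most likely next request.
    def predict_next(prev):
        total = sum(counts[prev].values())
        probs = {doc: c / total for doc, c in counts[prev].items()}
        return max(probs, key=probs.get), probs

    print(predict_next("D2"))  # ('D3', {'D3': 0.67, 'D1': 0.33}) approximately

The second-order model is built the same way, keyed on the previous two requests instead of one.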
53. Hidden Markov Models (HMMs)
- An HMM is a temporal probabilistic model for a process, where the state of the process is described by a single discrete random variable
- The possible values of the variable are the possible states of the world
- Additional state variables are added by combining them into one mega-variable
54. Hidden Markov Models (cont.)
- State transitions in an HMM
- x: hidden states; y: observable outputs; a: transition probabilities; b: output probabilities
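With these symbols, the parameters are a_ij = Pr(x_{t+1} = j | x_t = i) and b_ik = Pr(y_t = k | x_t = i), and the joint distribution factorizes as (standard HMM form):

$$\Pr(x_{0:T}, y_{0:T}) \;=\; \Pr(x_0)\,\prod_{t=1}^{T} \Pr(x_t \mid x_{t-1})\,\prod_{t=0}^{T} \Pr(y_t \mid x_t)$$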
55. Hidden Markov Models: Example
(Figure: unrolled HMM with hidden states Rain(t-1), Rain(t), Rain(t+1) and observations Umbrella(t-1), Umbrella(t), Umbrella(t+1))
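A small filtering sketch for this model; the parameter values below (0.7, 0.9, 0.2) are commonly used illustrative numbers, not taken from the slides.

    # Forward (filtering) step for the rain/umbrella HMM.
    # States: rain / no rain; observation: umbrella seen or not.
    # Parameters are illustrative values, not from the slides.
    P_RAIN_GIVEN_RAIN = 0.7      # Pr(Rain_t+1 | Rain_t)
    P_RAIN_GIVEN_NO   = 0.3      # Pr(Rain_t+1 | not Rain_t)
    P_UMB_GIVEN_RAIN  = 0.9      # Pr(Umbrella | Rain)
    P_UMB_GIVEN_NO    = 0.2      # Pr(Umbrella | not Rain)

    def forward_step(belief_rain, umbrella_seen):
        """One step of filtering: predict, then weight by the observation."""
        # Predict Pr(Rain_t+1 | evidence so far).
        predicted = (P_RAIN_GIVEN_RAIN * belief_rain
                     + P_RAIN_GIVEN_NO * (1 - belief_rain))
        # Update with the new observation and renormalize.
        like_rain = P_UMB_GIVEN_RAIN if umbrella_seen else 1 - P_UMB_GIVEN_RAIN
        like_no   = P_UMB_GIVEN_NO   if umbrella_seen else 1 - P_UMB_GIVEN_NO
        unnorm_rain = like_rain * predicted
        unnorm_no   = like_no * (1 - predicted)
        return unnorm_rain / (unnorm_rain + unnorm_no)

    belief = 0.5                       # uniform prior on day 0
    for obs in [True, True]:           # umbrella seen on two consecutive days
        belief = forward_step(belief, obs)
        print(round(belief, 3))        # ~0.818, then ~0.883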
56. Summary (I): Bayesian Networks and Markov Models
- BNs are graphical probabilistic models that express causal and evidential relations between propositions
- Dynamic Bayesian Networks (DBNs): the BN is replicated for each time slice
- Markov models are graphical probabilistic models that represent transitions between states
57. Summary (II): Static versus Temporal Reasoning
- Static reasoning
- Bayesian networks
- Decision networks
- Temporal reasoning
- Markov models and HMMs
- Dynamic Bayesian networks