Title: Learning Bayesian Networks
1. Learning Bayesian Networks
(from David Heckerman's tutorial)
2. Learning Bayes Nets From Data
[Diagram: data and prior/expert information feed into a Bayes-net learner, which outputs Bayes net(s) over variables X1, ..., X9]
3. Overview
- Introduction to Bayesian statistics: learning a probability
- Learning probabilities in a Bayes net
- Learning Bayes-net structure
4. Learning Probabilities: the Classical Approach
Simple case: flipping a thumbtack. The true probability θ is unknown. Given i.i.d. data, estimate θ using an estimator with good properties: low bias, low variance, consistency (e.g., the maximum-likelihood estimate).
5. Learning Probabilities: the Bayesian Approach
The true probability θ is unknown; the Bayesian maintains a probability density p(θ) over θ.
6. Bayesian Approach: use Bayes' rule to compute a new density for θ given data d
p(θ | d) = p(θ) p(d | θ) / p(d)
(posterior ∝ prior × likelihood)
7. The Likelihood
For the thumbtack, the likelihood is binomial: a sequence of tosses with h heads and t tails has p(d | θ) = θ^h (1−θ)^t.
8. Example: Application of Bayes' rule to the observation of a single "heads"
[Three plots over θ ∈ [0, 1]: the prior p(θ), the likelihood p(heads | θ) = θ, and the posterior p(θ | heads)]
9. A Bayes net for learning probabilities
10. Sufficient statistics
The counts (h, t), the numbers of observed heads and tails, are sufficient statistics for the data.
11. The probability of heads on the next toss
p(X_{N+1} = heads | d) = ∫ θ p(θ | d) dθ = E[θ | d]
12. Prior Distributions for θ
- Direct assessment
- Parametric distributions
- Conjugate distributions (for convenience)
- Mixtures of conjugate distributions
13. Conjugate Family of Distributions
The Beta distribution is conjugate to the binomial likelihood.
Properties: the posterior stays in the Beta family, with the observed counts added to the hyperparameters.
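The formulas for this slide are the standard Beta-family ones; a reconstruction in LaTeX:

\[
p(\theta) = \mathrm{Beta}(\theta \mid \alpha_h, \alpha_t)
= \frac{\Gamma(\alpha_h + \alpha_t)}{\Gamma(\alpha_h)\,\Gamma(\alpha_t)}\,
\theta^{\alpha_h - 1} (1 - \theta)^{\alpha_t - 1}
\]
\[
p(\theta \mid d) = \mathrm{Beta}(\theta \mid \alpha_h + h,\; \alpha_t + t),
\qquad
p(\text{heads on next toss} \mid d) = \frac{\alpha_h + h}{\alpha_h + \alpha_t + h + t}
\]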
14. Intuition
- The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"
- Equivalent sample size: αh + αt
- The larger the equivalent sample size, the more confident we are about the true probability
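A minimal Python sketch of this imaginary-counts view; the numbers and function name are illustrative, not from the tutorial:

def prob_heads_next(alpha_h, alpha_t, heads, tails):
    # Posterior is Beta(alpha_h + heads, alpha_t + tails) by conjugacy;
    # the predictive probability of heads is the posterior mean.
    return (alpha_h + heads) / (alpha_h + alpha_t + heads + tails)

# Equivalent sample size 2 (a Beta(1, 1) prior), then observe 3 heads, 7 tails:
print(prob_heads_next(1, 1, 3, 7))  # (1 + 3) / (2 + 10) = 0.333...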
15. Beta Distributions
[Plots of four Beta densities: Beta(0.5, 0.5), Beta(1, 1), Beta(3, 2), and Beta(19, 39)]
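A short Python sketch to reproduce curves like these, assuming scipy and matplotlib are available:

import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

theta = np.linspace(0.001, 0.999, 500)
for a, b in [(0.5, 0.5), (1, 1), (3, 2), (19, 39)]:
    plt.plot(theta, beta.pdf(theta, a, b), label=f"Beta({a}, {b})")
plt.xlabel("theta")
plt.ylabel("density")
plt.legend()
plt.show()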
16. Assessment of a Beta Distribution
Method 1: equivalent samples
- assess αh and αt, or
- assess the equivalent sample size αh + αt and the mean αh/(αh + αt)
Method 2: imagined future samples
17. Generalization to m discrete outcomes ("multinomial distribution")
The Dirichlet distribution is the conjugate prior.
Properties: the posterior is again Dirichlet, with the observed counts added to the hyperparameters.
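A reconstruction of the standard Dirichlet formulas in LaTeX:

\[
p(\theta_1, \ldots, \theta_m) = \mathrm{Dir}(\theta \mid \alpha_1, \ldots, \alpha_m)
\propto \prod_{k=1}^{m} \theta_k^{\alpha_k - 1},
\qquad \sum_k \theta_k = 1
\]
\[
p(\theta \mid d) = \mathrm{Dir}(\theta \mid \alpha_1 + N_1, \ldots, \alpha_m + N_m),
\qquad
p(X = k) = \frac{\alpha_k}{\sum_j \alpha_j}
\]

where N_k is the number of observations of outcome k.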
18. More generalizations (see, e.g., Bernardo & Smith, 1994)
- Likelihoods from the exponential family
- Binomial
- Multinomial
- Poisson
- Gamma
- Normal
19. Overview
- Introduction to Bayesian statistics: learning a probability
- Learning probabilities in a Bayes net
- Learning Bayes-net structure
20. From thumbtacks to Bayes nets
The thumbtack problem can be viewed as learning the probability for a very simple BN:
[Diagram: a single node X taking values heads/tails]
21. The next simplest Bayes net
22. The next simplest Bayes net
[Diagram: parameter node ΘY with samples Y1 (case 1), Y2 (case 2), ..., YN (case N); a "?" asks whether the parameters ΘX and ΘY are dependent]
23. The next simplest Bayes net
"parameter independence": ΘX and ΘY are a priori independent
[Diagram: ΘY with Y1 (case 1), Y2 (case 2), ..., YN (case N), with no arc between the parameter nodes]
24. The next simplest Bayes net
"parameter independence" splits this into two separate thumbtack-like learning problems, one per variable.
[Diagram: ΘY with Y1 (case 1), Y2 (case 2), ..., YN (case N)]
25. A bit more difficult...
Three probabilities to learn:
- θ_{X=heads}
- θ_{Y=heads | X=heads}
- θ_{Y=heads | X=tails}
26. A bit more difficult...
[Diagram: parameter nodes ΘX, ΘY|X=heads, ΘY|X=tails; case 1 observes X1 = heads with child Y1, case 2 observes X2 = tails with child Y2]
27. A bit more difficult...
[Diagram: the same network with cases (X1, Y1) and (X2, Y2), values not shown]
28. A bit more difficult...
[Diagram: as above, with "?" marks asking which dependencies hold among the parameter nodes ΘX, ΘY|X=heads, ΘY|X=tails]
29. A bit more difficult...
[Diagram: as above, with the parameter nodes independent]
Under parameter independence, this reduces to 3 separate thumbtack-like problems.
30. In general
Learning probabilities in a BN is straightforward if (see the sketch after this list):
- local distributions are from the exponential family (binomial, Poisson, gamma, ...)
- parameter independence holds
- conjugate priors are used
- the data are complete
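A minimal Python sketch of this recipe for the two-node network X → Y with fully observed binary data, assuming Beta(1, 1) priors and parameter independence; the data and names are illustrative:

from collections import Counter

# Complete data for X -> Y; values are "heads"/"tails" (illustrative).
data = [("heads", "heads"), ("heads", "tails"),
        ("tails", "heads"), ("heads", "heads")]

ALPHA_H, ALPHA_T = 1, 1  # Beta(1, 1) prior for each parameter

def posterior_mean(heads, tails):
    return (ALPHA_H + heads) / (ALPHA_H + ALPHA_T + heads + tails)

x_counts = Counter(x for x, _ in data)
theta_x = posterior_mean(x_counts["heads"], x_counts["tails"])

# One independent thumbtack-like problem per parent configuration of Y.
theta_y = {}
for parent in ("heads", "tails"):
    y_counts = Counter(y for x, y in data if x == parent)
    theta_y[parent] = posterior_mean(y_counts["heads"], y_counts["tails"])

print(theta_x, theta_y)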
31. Incomplete data makes parameters dependent
[Diagram: the same two-variable network, with some Xi or Yi unobserved]
If Xi is unobserved, that case bears on both ΘY|X=heads and ΘY|X=tails in proportion to the posterior over Xi, so the parameter posteriors no longer factor.
32. Overview
- Introduction to Bayesian statistics: learning a probability
- Learning probabilities in a Bayes net
- Learning Bayes-net structure
33. Learning Bayes-net structure
Given data, which model is correct?
[Diagram: model 1 and model 2, two candidate structures over X and Y]
34. Bayesian approach
Given data d: which model is correct? More likely?
[Diagram: data d scored against model 1 and model 2 over X and Y]
35. Bayesian approach: Model Averaging
Given data d: which model is correct? More likely?
[Diagram: data d, model 1 and model 2 over X and Y; predictions are averaged across models]
36. Bayesian approach: Model Selection
Given data d: which model is correct? More likely?
[Diagram: data d, model 1 and model 2 over X and Y]
Keep the best model, for the sake of
- explanation
- understanding
- tractability
37. To score a model, use Bayes' rule
Given data d, the model score is
p(m | d) ∝ p(m) p(d | m),
where the "marginal likelihood" p(d | m) = ∫ p(d | θ, m) p(θ | m) dθ averages the likelihood p(d | θ, m) over the parameters.
38. Thumbtack example
[Diagram: one-node BN, X = heads/tails]
With a conjugate (Beta) prior, the marginal likelihood has a closed form:
p(d | m) = Γ(α)/Γ(α + N) × Γ(αh + h)/Γ(αh) × Γ(αt + t)/Γ(αt), where α = αh + αt and N = h + t.
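A minimal Python sketch of this computation in log space, using scipy's gammaln; the function name is illustrative:

from scipy.special import gammaln

def log_marginal_likelihood(alpha_h, alpha_t, heads, tails):
    # log p(d | m) for the Beta-binomial thumbtack model above.
    alpha, n = alpha_h + alpha_t, heads + tails
    return (gammaln(alpha) - gammaln(alpha + n)
            + gammaln(alpha_h + heads) - gammaln(alpha_h)
            + gammaln(alpha_t + tails) - gammaln(alpha_t))

print(log_marginal_likelihood(1, 1, 3, 7))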
39. More complicated graphs
For the two-node network X → Y, the marginal likelihood factors into 3 separate thumbtack-like terms: one for X, one for Y given X = heads, and one for Y given X = tails.
40. Model score for a discrete BN
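The score on this slide is the standard Bayesian-Dirichlet score (Heckerman, Geiger, and Chickering, 1995); a reconstruction in LaTeX:

\[
p(d \mid m) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
\frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
\prod_{k=1}^{r_i}
\frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
\]

where i ranges over variables, j over configurations of variable i's parents, k over values of variable i, N_{ijk} are the observed counts, N_{ij} = Σ_k N_{ijk}, and α_{ij} = Σ_k α_{ijk}.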
41. Computation of Marginal Likelihood
An efficient closed form exists if:
- local distributions are from the exponential family (binomial, Poisson, gamma, ...)
- parameter independence
- conjugate priors
- no missing data (including no hidden variables)
42. Practical considerations
- The number of possible BN structures for n variables is super-exponential in n (see the sketch below)
- How do we find the best graph(s)?
- How do we assign structure and parameter priors to all possible graphs?
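To make "super-exponential" concrete, a short Python sketch using Robinson's recursion for the number of labeled DAGs on n nodes:

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    # Number of labeled DAGs on n nodes (Robinson's recursion).
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 11):
    print(n, num_dags(n))
# n = 3 gives 25, n = 4 gives 543, and n = 10 already exceeds 4 x 10^18.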
43. Model search
- Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
- Heuristic methods (a greedy-search skeleton follows this list):
- Greedy
- Greedy with restarts
- MCMC methods
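A minimal greedy hill-climbing skeleton in Python, assuming some score function such as the Bayesian-Dirichlet score above; the graph representation and the toy score are illustrative placeholders:

import itertools

def is_acyclic(edges):
    # True iff the directed graph has no cycle (repeatedly remove sources).
    edges, nodes = set(edges), {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(v != n for _, v in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(u, v) for (u, v) in edges if u not in sources}
    return True

def neighbors(dag, nodes):
    # DAGs reachable by adding, deleting, or reversing a single edge.
    for u, v in itertools.permutations(nodes, 2):
        if (u, v) in dag:
            yield dag - {(u, v)}                                  # delete
            if is_acyclic((dag - {(u, v)}) | {(v, u)}):
                yield (dag - {(u, v)}) | {(v, u)}                 # reverse
        elif is_acyclic(dag | {(u, v)}):
            yield dag | {(u, v)}                                  # add

def greedy_search(nodes, score):
    # Hill-climb from the empty graph until no local move improves the score.
    dag = frozenset()
    while True:
        best = max(neighbors(dag, nodes), key=score, default=dag)
        if score(best) <= score(dag):
            return dag
        dag = best

# Toy usage with a stand-in score that simply prefers the edge ("X", "Y").
print(greedy_search(["X", "Y"], lambda d: 1.0 if ("X", "Y") in d else 0.0))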
44. Structure priors
- 1. All possible structures equally likely
- 2. Partial ordering, required / prohibited arcs
- 3. p(m) ∝ similarity(m, prior BN)
45. Parameter priors
- All uniform: Beta(1, 1)
- Use a prior BN
46. Parameter priors
Recall the intuition behind the Beta prior for the thumbtack:
- The hyperparameters αh and αt can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"
- Equivalent sample size: αh + αt
- The larger the equivalent sample size, the more confident we are about the long-run fraction
47. Parameter priors
The prior-BN idea generalizes this: from an equivalent sample size and a prior network, derive an imaginary count for any variable configuration; parameter modularity then yields parameter priors for any BN structure over X1, ..., Xn.
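A reconstruction of the count formula this slide refers to (the BDe prior of Heckerman, Geiger, and Chickering, 1995):

\[
\alpha_{ijk} = \alpha \cdot p(x_i = k,\; \mathrm{pa}_i = j \mid \text{prior BN})
\]

where α is the equivalent sample size and pa_i denotes the parents of X_i in the structure being scored.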
48. Combine user knowledge and data
[Diagram: a prior network with an equivalent sample size, combined with data, yields improved network(s)]