Transcript and Presenter's Notes

Title: Learning With Bayesian Networks


1
Learning With Bayesian Networks
2
Parameter Learning
  • Many models have uncertain variables that must be
    estimated from data.
  • e.g., the category prototypes in Mozer's ordinal
    category model
  • e.g., the thumbtack bias in the thumbtack-flipping
    example (a minimal sketch follows)
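A minimal Python sketch of the thumbtack example, assuming a Beta(1, 1)
prior and invented flip data (both the prior and the data are
illustrative, not from the slides):

    import numpy as np

    # Thumbtack example: estimate the bias theta = p(heads) from flips.
    # A Beta(a, b) prior is conjugate to the Bernoulli likelihood, so the
    # posterior after h heads and t tails is Beta(a + h, b + t).
    a, b = 1.0, 1.0                      # uniform prior (illustrative)
    flips = [1, 0, 1, 1, 0, 1, 1, 1]     # made-up data: 1 = heads
    h, t = sum(flips), len(flips) - sum(flips)

    posterior_mean = (a + h) / (a + b + h + t)   # p(next flip = heads)
    print(f"posterior: Beta({a + h:.0f}, {b + t:.0f}), "
          f"p(heads | data) = {posterior_mean:.3f}")

Conjugacy makes the posterior another Beta, so the update amounts to
adding counts to the prior hyperparameters.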

3
ALL SLIDES STOLEN FROM DAVID HECKERMAN'S TUTORIAL
4-12
(No transcript: slides 4-12 are image-only.)
13
(Image slide; visible text: "background knowledge")
14-18
(No transcript: slides 14-18 are image-only.)
19
(Image slide; visible text: "NO")
20
Learning Probabilities in a Bayes Net
  • Fix the network structure.
  • Use data to update the probabilities.
  • As in the thumbtack example, the probabilities are
    random variables.
  • Given the network structure S^h, parameter vector θ_s,
    and a random sample D = {x_1, x_2, ..., x_N}, compute
    the posterior distribution p(θ_s | D, S^h).
  • This is the probabilistic formulation of all supervised
    and unsupervised learning problems.

21
Local Distribution Functions
  • A local distribution function is a probabilistic
    classification or regression function,
    e.g., linear regression, a neural net, an SVM, or a
    random forest.
  • Consider the unrestricted multinomial distribution:
    each variable X_i is discrete, with values
    x_i^1, ..., x_i^{r_i}.
  • Indices: i ranges over nodes of the graph; j over
    configurations of the parents of node i; k over values
    of node i.
  • "Unrestricted" means one parameter θ_ijk per
    probability, vs. low-dimensional functions of pa_i^j
    (a sketch of this parameterization follows).

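A Python sketch of the unrestricted multinomial parameterization; the
sizes r_i = 3 and q_i = 4 and the Dirichlet-sampled table are invented
for illustration:

    import numpy as np

    # Unrestricted multinomial local distribution for node i:
    # theta_i[j, k] = p(X_i = x_i^k | Pa_i in configuration j), with one
    # free parameter per probability (each row sums to 1). Shapes here
    # are invented: node i has r_i = 3 values, parents have q_i = 4
    # configurations.
    r_i, q_i = 3, 4
    rng = np.random.default_rng(0)
    theta_i = rng.dirichlet(np.ones(r_i), size=q_i)   # q_i x r_i table

    j, k = 2, 1   # parent configuration j, value k of node i
    print("p(X_i = k | Pa_i = j) =", theta_i[j, k])
    print("rows sum to 1:", np.allclose(theta_i.sum(axis=1), 1.0))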
22
Computing Posterior Distribution Over Parameters
  • Assume (1) no missing data, and (2) that the parameter
    vectors are mutually independent. Then the posterior
    factors: p(θ_s | D, S^h) = ∏_i ∏_j p(θ_ij | D, S^h).
  • e.g., for the net structure X → Y (see the sketch
    below)
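A Python sketch of this factorized posterior for X → Y, assuming binary
variables, symmetric Dirichlet priors, and invented complete data:

    import numpy as np

    # Complete data for the net X -> Y; binary variables, invented sample.
    data = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 1)]

    alpha = 1.0   # symmetric Dirichlet hyperparameter (illustrative)

    # With complete data and mutually independent parameters, the
    # posterior factors into one Dirichlet per node and parent
    # configuration:
    #   p(theta | D) = Dir(theta_x) * Dir(theta_y|x=0) * Dir(theta_y|x=1)
    n_x = np.zeros(2)          # counts for X
    n_y = np.zeros((2, 2))     # counts for Y under each X configuration
    for x, y in data:
        n_x[x] += 1
        n_y[x, y] += 1

    post_x = alpha + n_x       # Dirichlet parameters for theta_x
    post_y = alpha + n_y       # one Dirichlet per configuration of X
    print("posterior for theta_x:", post_x)
    print("posterior for theta_y|x:", post_y)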

23
Simplifying Learning
  • Given complete data and mutually independent parameter
    priors, the posterior factors over parameter sets:
    p(θ_s | D, S^h) = ∏_i ∏_j p(θ_ij | D, S^h).
  • Explanation: given complete data, each set of
    parameters is d-separated from each other set of
    parameters in the graph.

[Figure: network with parameter nodes θ_x feeding X and θ_y|x feeding Y,
illustrating that the parameter sets are independent given complete
data]
24
Prediction
  • Given the Dirichlet prior distribution
    p(θ | S^h) = Dir(θ | α_1, ..., α_r),
  • the posterior distribution is
    p(θ | D, S^h) = Dir(θ | α_1 + N_1, ..., α_r + N_r),
  • and prediction uses
    p(x^k | D, S^h) = (α_k + N_k) / (α + N),
    where α = Σ_k α_k and N = Σ_k N_k (see the sketch
    below).
  • How can this be used for supervised learning?
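A Python sketch of the Dirichlet predictive rule, with invented prior
counts α_k and data counts N_k:

    import numpy as np

    # Predictive distribution under a Dirichlet posterior:
    # p(X = x^k | D) = (alpha_k + N_k) / (alpha + N).
    alpha = np.array([1.0, 1.0, 1.0])   # illustrative prior counts
    counts = np.array([5, 2, 1])        # made-up observed counts N_k

    predictive = (alpha + counts) / (alpha.sum() + counts.sum())
    print("p(X = k | D) =", predictive)   # [0.545, 0.273, 0.182]

For supervised learning, the same rule applied to each local posterior
gives the predictive distribution of a target node given its observed
parents.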

25
Missing Data
  • Y: observed variables; Z: unobserved variables.
  • How do we do parameter updates in this case?
  • The posterior becomes a Dirichlet mixture: unless X_i
    and Pa_i are observed in y_l, each case increases the
    number of mixture components.

26
Gibbs Samplingto Handle Missing Data
  • 1. Given a set of observed, incomplete data,
    D = {y_1, ..., y_N}.
  • 2. Fill in arbitrary values for the unobserved
    variables in each case.
  • 3. For each unobserved variable x_il in case l, sample
    it conditioned on all other values in the completed
    data, p(x_il | D_c \ x_il, S^h).
  • 4. Evaluate the posterior density on the completed
    data D_c'.
  • 5. Repeat steps 3 and 4, and compute the mean of the
    posterior density (a sketch follows).
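A Python sketch of Gibbs sampling for the X → Y net, assuming binary
variables, Dirichlet priors, and invented data with some Y values
missing. This uses the data-augmentation variant that alternates
sampling the parameters and the missing values, rather than integrating
the parameters out as in step 3:

    import numpy as np

    rng = np.random.default_rng(1)

    # Net X -> Y, binary. None marks a missing Y value (invented data).
    data = [(0, 1), (1, None), (1, 0), (0, None), (1, 1), (0, 0)]
    alpha = 1.0                                    # Dirichlet prior counts

    # Step 2: fill in arbitrary values for the unobserved variables.
    filled = [(x, y if y is not None else 0) for x, y in data]

    theta_y = np.full((2, 2), 0.5)                 # p(Y | X), initial guess
    means = np.zeros((2, 2))
    n_iter, burn_in = 2000, 500
    for it in range(n_iter):
        # Step 3: resample each missing Y from p(Y | X = x, theta).
        filled = [(x, y if orig_y is not None
                   else rng.choice(2, p=theta_y[x]))
                  for (x, y), (_, orig_y) in zip(filled, data)]
        # Resample the parameters from their posterior given the
        # completed data.
        counts = np.full((2, 2), alpha)
        for x, y in filled:
            counts[x, y] += 1
        theta_y = np.array([rng.dirichlet(counts[x]) for x in range(2)])
        # Steps 4-5: average the posterior draws after burn-in.
        if it >= burn_in:
            means += theta_y
    print("posterior mean of p(Y | X):\n", means / (n_iter - burn_in))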

27
Gaussian Approximation to Handle Missing Data
  • Approximate p(θ_s | D, S^h) as a multivariate Gaussian.
  • Appropriate when the sample size is large, which is
    also when Monte Carlo methods are inefficient.
  • 1. Define g(θ_s) ≡ log p(D, θ_s | S^h).
  • 2. Find the configuration θ̃_s that maximizes g(·).
  • 3. Approximate g(·) by its 2nd-degree Taylor polynomial
    about θ̃_s, with A the negative Hessian of g(·)
    evaluated at θ̃_s.
  • 4. This leads to an approximate posterior that is
    Gaussian, N(θ̃_s, A^-1); see the 1-D sketch below.
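A 1-D Python sketch of these steps for the thumbtack bias with a Beta
prior (prior and counts invented); in one dimension the negative
Hessian is just a negative second derivative:

    import numpy as np

    # Laplace (Gaussian) approximation to a posterior, 1-D sketch:
    # thumbtack bias with a Beta(2, 2) prior and invented counts.
    a, b = 2.0, 2.0
    h, t = 7, 3          # made-up heads/tails counts

    def g(theta):
        # g = log p(D, theta) up to a constant:
        # log-likelihood + log-prior.
        return ((h + a - 1) * np.log(theta)
                + (t + b - 1) * np.log(1 - theta))

    # Step 2: the maximizing configuration (MAP) has a closed form here;
    # in general one would use gradient ascent.
    theta_map = (h + a - 1) / (h + t + a + b - 2)

    # Step 3: negative second derivative of g at the mode, via a central
    # finite difference.
    eps = 1e-5
    hess = (g(theta_map + eps) - 2 * g(theta_map)
            + g(theta_map - eps)) / eps**2
    A = -hess

    # Step 4: p(theta | D) is approximately Normal(theta_map, 1/A).
    print(f"MAP = {theta_map:.3f}, approx std = {np.sqrt(1 / A):.3f}")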
28
Further Approximations
  • As the data sample size increases:
  • the Gaussian peak becomes sharper, so we can make
    predictions based on the MAP configuration alone;
  • the priors can be ignored (their importance
    diminishes).
  • How to do MAP estimation:
  • gradient ascent, or
  • EM (sketched below):
  • E step: compute expected values of the missing data.
  • M step: maximize the parameters given the completed
    data D_c.
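A Python sketch of EM for MAP estimation on the same X → Y setup as the
Gibbs sketch (data and prior invented):

    import numpy as np

    # EM for MAP estimation in the X -> Y net with some Y values missing
    # (invented data); Dirichlet prior counts alpha on each row.
    data = [(0, 1), (1, None), (1, 0), (0, None), (1, 1), (0, 0)]
    alpha = 2.0

    theta_y = np.full((2, 2), 0.5)            # initial p(Y | X)
    for _ in range(50):
        # E step: expected counts of (X, Y); each incomplete case is
        # distributed over the possible Y values using the current
        # parameters.
        ec = np.zeros((2, 2))
        for x, y in data:
            if y is None:
                ec[x] += theta_y[x]           # soft count over Y = 0, 1
            else:
                ec[x, y] += 1
        # M step: MAP (Dirichlet posterior mode) given expected counts.
        theta_y = ((ec + alpha - 1)
                   / (ec + alpha - 1).sum(axis=1, keepdims=True))
    print("MAP estimate of p(Y | X):\n", theta_y)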