Transcript and Presenter's Notes

Title: Learning With Bayesian Networks


1
Learning With Bayesian Networks
2
Parameter Learning
  • Many models have uncertain variables that must be
    estimated from data.
  • e.g., the category prototypes in Mozer's ordinal
    category model
  • e.g., the thumbtack bias in the thumbtack-flipping
    example (a minimal sketch follows)
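A minimal Python sketch of the thumbtack example, assuming a Beta(1, 1)
prior and invented flip data (both the prior and the data are
illustrative, not from the slides):

    import numpy as np

    # Thumbtack example: estimate the bias theta = p(heads) from flips.
    # A Beta(a, b) prior is conjugate to the Bernoulli likelihood, so the
    # posterior after h heads and t tails is Beta(a + h, b + t).
    a, b = 1.0, 1.0                      # uniform prior (illustrative)
    flips = [1, 0, 1, 1, 0, 1, 1, 1]     # made-up data: 1 = heads
    h, t = sum(flips), len(flips) - sum(flips)

    posterior_mean = (a + h) / (a + b + h + t)   # p(next flip = heads)
    print(f"posterior: Beta({a + h:.0f}, {b + t:.0f}), "
          f"p(heads | data) = {posterior_mean:.3f}")

Conjugacy makes the posterior another Beta, so the update amounts to
adding counts to the prior hyperparameters.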

3
ALL SLIDES STOLEN FROM DAVID HECKERMAN'S TUTORIAL
4-12
(No transcript: slides 4-12 are image-only.)
13
(Image slide; visible text: "background knowledge")
14-18
(No transcript: slides 14-18 are image-only.)
19
(Image slide; visible text: "NO")
20
Learning Probabilities in a Bayes Net
  • Fix the network structure.
  • Use data to update the probabilities.
  • As in the thumbtack example, the probabilities are
    random variables.
  • Given the network structure S^h, parameter vector θ_s,
    and a random sample D = {x_1, x_2, ..., x_N}, compute
    the posterior distribution p(θ_s | D, S^h).
  • This is the probabilistic formulation of all supervised
    and unsupervised learning problems.

21
Local Distribution Functions
  • A local distribution function is a probabilistic
    classification or regression function,
    e.g., linear regression, a neural net, an SVM, or a
    random forest.
  • Consider the unrestricted multinomial distribution:
    each variable X_i is discrete, with values
    x_i^1, ..., x_i^{r_i}.
  • Indices: i ranges over nodes of the graph; j over
    configurations of the parents of node i; k over values
    of node i.
  • "Unrestricted" means one parameter θ_ijk per
    probability, vs. low-dimensional functions of pa_i^j
    (a sketch of this parameterization follows).

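A Python sketch of the unrestricted multinomial parameterization; the
sizes r_i = 3 and q_i = 4 and the Dirichlet-sampled table are invented
for illustration:

    import numpy as np

    # Unrestricted multinomial local distribution for node i:
    # theta_i[j, k] = p(X_i = x_i^k | Pa_i in configuration j), with one
    # free parameter per probability (each row sums to 1). Shapes here
    # are invented: node i has r_i = 3 values, parents have q_i = 4
    # configurations.
    r_i, q_i = 3, 4
    rng = np.random.default_rng(0)
    theta_i = rng.dirichlet(np.ones(r_i), size=q_i)   # q_i x r_i table

    j, k = 2, 1   # parent configuration j, value k of node i
    print("p(X_i = k | Pa_i = j) =", theta_i[j, k])
    print("rows sum to 1:", np.allclose(theta_i.sum(axis=1), 1.0))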
22
Computing Posterior Distribution Over Parameters
  • Assume (1) no missing data, and (2) that the parameter
    vectors are mutually independent. Then the posterior
    factors: p(θ_s | D, S^h) = ∏_i ∏_j p(θ_ij | D, S^h).
  • e.g., for the net structure X → Y (see the sketch
    below)
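A Python sketch of this factorized posterior for X → Y, assuming binary
variables, symmetric Dirichlet priors, and invented complete data:

    import numpy as np

    # Complete data for the net X -> Y; binary variables, invented sample.
    data = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 1)]

    alpha = 1.0   # symmetric Dirichlet hyperparameter (illustrative)

    # With complete data and mutually independent parameters, the
    # posterior factors into one Dirichlet per node and parent
    # configuration:
    #   p(theta | D) = Dir(theta_x) * Dir(theta_y|x=0) * Dir(theta_y|x=1)
    n_x = np.zeros(2)          # counts for X
    n_y = np.zeros((2, 2))     # counts for Y under each X configuration
    for x, y in data:
        n_x[x] += 1
        n_y[x, y] += 1

    post_x = alpha + n_x       # Dirichlet parameters for theta_x
    post_y = alpha + n_y       # one Dirichlet per configuration of X
    print("posterior for theta_x:", post_x)
    print("posterior for theta_y|x:", post_y)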

23
Simplifying Learning
  • Given complete data and mutually independent parameter
    priors, the posterior factors over parameter sets:
    p(θ_s | D, S^h) = ∏_i ∏_j p(θ_ij | D, S^h).
  • Explanation: given complete data, each set of
    parameters is d-separated from each other set of
    parameters in the graph.

[Figure: network with parameter nodes θ_x feeding X and θ_y|x feeding Y,
illustrating that the parameter sets are independent given complete
data]
24
Prediction
  • Given the Dirichlet prior distribution
    p(θ | S^h) = Dir(θ | α_1, ..., α_r),
  • the posterior distribution is
    p(θ | D, S^h) = Dir(θ | α_1 + N_1, ..., α_r + N_r),
  • and prediction uses
    p(x^k | D, S^h) = (α_k + N_k) / (α + N),
    where α = Σ_k α_k and N = Σ_k N_k (see the sketch
    below).
  • How can this be used for supervised learning?
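A Python sketch of the Dirichlet predictive rule, with invented prior
counts α_k and data counts N_k:

    import numpy as np

    # Predictive distribution under a Dirichlet posterior:
    # p(X = x^k | D) = (alpha_k + N_k) / (alpha + N).
    alpha = np.array([1.0, 1.0, 1.0])   # illustrative prior counts
    counts = np.array([5, 2, 1])        # made-up observed counts N_k

    predictive = (alpha + counts) / (alpha.sum() + counts.sum())
    print("p(X = k | D) =", predictive)   # [0.545, 0.273, 0.182]

For supervised learning, the same rule applied to each local posterior
gives the predictive distribution of a target node given its observed
parents.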

25
Missing Data
  • Y: observed variables; Z: unobserved variables.
  • How do we do parameter updates in this case?
  • The posterior becomes a Dirichlet mixture: unless X_i
    and Pa_i are observed in y_l, each case increases the
    number of mixture components.

26
Gibbs Samplingto Handle Missing Data
  • 1. Given a set of observed, incomplete data,
    D = {y_1, ..., y_N}.
  • 2. Fill in arbitrary values for the unobserved
    variables in each case.
  • 3. For each unobserved variable x_il in case l, sample
    it conditioned on all other values in the completed
    data, p(x_il | D_c \ x_il, S^h).
  • 4. Evaluate the posterior density on the completed
    data D_c'.
  • 5. Repeat steps 3 and 4, and compute the mean of the
    posterior density (a sketch follows).
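A Python sketch of Gibbs sampling for the X → Y net, assuming binary
variables, Dirichlet priors, and invented data with some Y values
missing. This uses the data-augmentation variant that alternates
sampling the parameters and the missing values, rather than integrating
the parameters out as in step 3:

    import numpy as np

    rng = np.random.default_rng(1)

    # Net X -> Y, binary. None marks a missing Y value (invented data).
    data = [(0, 1), (1, None), (1, 0), (0, None), (1, 1), (0, 0)]
    alpha = 1.0                                    # Dirichlet prior counts

    # Step 2: fill in arbitrary values for the unobserved variables.
    filled = [(x, y if y is not None else 0) for x, y in data]

    theta_y = np.full((2, 2), 0.5)                 # p(Y | X), initial guess
    means = np.zeros((2, 2))
    n_iter, burn_in = 2000, 500
    for it in range(n_iter):
        # Step 3: resample each missing Y from p(Y | X = x, theta).
        filled = [(x, y if orig_y is not None
                   else rng.choice(2, p=theta_y[x]))
                  for (x, y), (_, orig_y) in zip(filled, data)]
        # Resample the parameters from their posterior given the
        # completed data.
        counts = np.full((2, 2), alpha)
        for x, y in filled:
            counts[x, y] += 1
        theta_y = np.array([rng.dirichlet(counts[x]) for x in range(2)])
        # Steps 4-5: average the posterior draws after burn-in.
        if it >= burn_in:
            means += theta_y
    print("posterior mean of p(Y | X):\n", means / (n_iter - burn_in))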

27
Gaussian Approximation to Handle Missing Data
  • Approximate p(θ_s | D, S^h) as a multivariate Gaussian.
  • Appropriate when the sample size is large, which is
    also when Monte Carlo methods are inefficient.
  • 1. Define g(θ_s) ≡ log p(D, θ_s | S^h).
  • 2. Find the configuration θ̃_s that maximizes g(·).
  • 3. Approximate g(·) by its 2nd-degree Taylor polynomial
    about θ̃_s, with A the negative Hessian of g(·)
    evaluated at θ̃_s.
  • 4. This leads to an approximate posterior that is
    Gaussian, N(θ̃_s, A^-1); see the 1-D sketch below.
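A 1-D Python sketch of these steps for the thumbtack bias with a Beta
prior (prior and counts invented); in one dimension the negative
Hessian is just a negative second derivative:

    import numpy as np

    # Laplace (Gaussian) approximation to a posterior, 1-D sketch:
    # thumbtack bias with a Beta(2, 2) prior and invented counts.
    a, b = 2.0, 2.0
    h, t = 7, 3          # made-up heads/tails counts

    def g(theta):
        # g = log p(D, theta) up to a constant:
        # log-likelihood + log-prior.
        return ((h + a - 1) * np.log(theta)
                + (t + b - 1) * np.log(1 - theta))

    # Step 2: the maximizing configuration (MAP) has a closed form here;
    # in general one would use gradient ascent.
    theta_map = (h + a - 1) / (h + t + a + b - 2)

    # Step 3: negative second derivative of g at the mode, via a central
    # finite difference.
    eps = 1e-5
    hess = (g(theta_map + eps) - 2 * g(theta_map)
            + g(theta_map - eps)) / eps**2
    A = -hess

    # Step 4: p(theta | D) is approximately Normal(theta_map, 1/A).
    print(f"MAP = {theta_map:.3f}, approx std = {np.sqrt(1 / A):.3f}")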
28
Further Approximations
  • As the data sample size increases:
  • the Gaussian peak becomes sharper, so we can make
    predictions based on the MAP configuration alone;
  • the priors can be ignored (their importance
    diminishes).
  • How to do MAP estimation:
  • gradient ascent, or
  • EM (sketched below):
  • E step: compute expected values of the missing data.
  • M step: maximize the parameters given the completed
    data D_c.
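A Python sketch of EM for MAP estimation on the same X → Y setup as the
Gibbs sketch (data and prior invented):

    import numpy as np

    # EM for MAP estimation in the X -> Y net with some Y values missing
    # (invented data); Dirichlet prior counts alpha on each row.
    data = [(0, 1), (1, None), (1, 0), (0, None), (1, 1), (0, 0)]
    alpha = 2.0

    theta_y = np.full((2, 2), 0.5)            # initial p(Y | X)
    for _ in range(50):
        # E step: expected counts of (X, Y); each incomplete case is
        # distributed over the possible Y values using the current
        # parameters.
        ec = np.zeros((2, 2))
        for x, y in data:
            if y is None:
                ec[x] += theta_y[x]           # soft count over Y = 0, 1
            else:
                ec[x, y] += 1
        # M step: MAP (Dirichlet posterior mode) given expected counts.
        theta_y = ((ec + alpha - 1)
                   / (ec + alpha - 1).sum(axis=1, keepdims=True))
    print("MAP estimate of p(Y | X):\n", theta_y)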