Learning Bayesian networks - PowerPoint PPT Presentation

1
Learning Bayesian networks
  • Slides by Nir Friedman

2
Learning Bayesian networks
Inducer
3
Known Structure -- Incomplete Data
E, B, A: <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>
Inducer
  • Network structure is specified
  • Data contains missing values
  • We consider assignments to missing values

4
Known Structure / Complete Data
  • Given a network structure G
  • And a choice of parametric family for P(Xi | Pai)
  • Learn parameters for the network from complete data
  • Goal
  • Construct a network that is closest to the probability
    distribution that generated the data

5
Maximum Likelihood Estimation in Binomial Data
  • Applying the MLE principle we get
  • (Which coincides with what one would expect)

Example: (NH, NT) = (3, 2). The MLE estimate is 3/5 = 0.6.
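In the usual notation, the estimate referred to above is the empirical frequency of heads,
\[ \hat{\theta} = \frac{N_H}{N_H + N_T}, \]
which for the example gives 3 / (3 + 2) = 0.6.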
6
Learning Parameters for a Bayesian Network
  • Training data has the form
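For concreteness in the next few slides, one can suppose (as a hypothetical reconstruction of the slide's example) that the network is over the three variables E, B, A from slide 3 and that the data consists of M complete samples:
\[ D = \{\langle e[1], b[1], a[1]\rangle, \ldots, \langle e[M], b[M], a[M]\rangle\}. \]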

7
Learning Parameters for a Bayesian Network
  • Since we assume i.i.d. samples, the likelihood function is
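Under the i.i.d. assumption, and continuing the hypothetical E, B, A example,
\[ L(\Theta : D) = \prod_{m=1}^{M} P(e[m], b[m], a[m] : \Theta). \]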

8
Learning Parameters for a Bayesian Network
  • By the definition of the network, we get
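Assuming, purely for illustration, the structure E → A ← B (so that A's parents are E and B), each joint term factors according to the network:
\[ L(\Theta : D) = \prod_{m=1}^{M} P(e[m] : \Theta)\, P(b[m] : \Theta)\, P(a[m] \mid e[m], b[m] : \Theta). \]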

9
Learning Parameters for a Bayesian Network
  • Rewriting terms, we get
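Grouping the factors by variable, still under the illustrative E → A ← B structure, gives one independent term per conditional probability table:
\[ L(\Theta : D) = \Bigl[\prod_m P(e[m] : \Theta_E)\Bigr] \Bigl[\prod_m P(b[m] : \Theta_B)\Bigr] \Bigl[\prod_m P(a[m] \mid e[m], b[m] : \Theta_{A\mid E,B})\Bigr]. \]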

10
General Bayesian Networks
  • Generalizing for any Bayesian network
  • The likelihood decomposes according to the
    structure of the network.

i.i.d. samples
Network factorization
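Presumably the two labels above annotate the two steps of the general decomposition; in standard notation,
\[ L(\Theta : D) = \prod_m P(x[m] : \Theta) = \prod_m \prod_i P(x_i[m] \mid pa_i[m] : \Theta_i) = \prod_i L_i(\Theta_i : D), \]
where the first equality uses the i.i.d. samples and the second the network factorization.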
11
General Bayesian Networks (Cont.)
  • Complete Data ⇒ Decomposition ⇒ Independent Estimation Problems
  • If the parameters for each family are not related, then they can be
    estimated independently of each other.
  • (This is not true in Genetic Linkage analysis.)

12
Learning Parameters Summary
  • For multinomials we collect sufficient statistics, which are simply
    the counts N(xi, pai)
  • Parameter estimation (standard estimates sketched below)
  • Bayesian methods also require a choice of priors
  • Both MLE and Bayesian estimates are asymptotically equivalent and
    consistent.
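In standard notation, the estimates referred to above are
\[ \hat{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i)}{N(pa_i)} \quad \text{(MLE)}, \qquad \tilde{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i) + \alpha_{x_i \mid pa_i}}{N(pa_i) + \sum_{x_i'} \alpha_{x_i' \mid pa_i}} \quad \text{(Bayesian)}, \]
where the Bayesian form assumes a Dirichlet prior with hyperparameters α (the choice of prior is an assumption here, not stated on the slide).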

13
Known Structure -- Incomplete Data
E, B, A: <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>
Inducer
  • Network structure is specified
  • Data contains missing values
  • We consider assignments to missing values

14
Learning Parameters from Incomplete Data
  • Incomplete data
  • Posterior distributions can become interdependent
  • Consequence
  • ML parameters cannot be computed separately for each multinomial
  • The posterior is not a product of independent posteriors

15
Learning Parameters from Incomplete Data (cont.).
  • In the presence of incomplete data, the
    likelihood can have multiple global maxima
  • Example
  • We can rename the values of the hidden variable H
  • If H has two values, the likelihood has two global maxima
  • Similarly, local maxima are also replicated
  • Many hidden variables ⇒ a serious problem

16
MLE from Incomplete Data
  • Finding MLE parameters is a nonlinear optimization problem

[Figure: the likelihood L(Θ | D) plotted as a function of Θ]
17
MLE from Incomplete Data
  • Finding MLE parameters is a nonlinear optimization problem

[Figure: the likelihood L(Θ | D) plotted as a function of Θ]
18
MLE from Incomplete Data
Both ideas find local maxima only and require multiple restarts to find
an approximation to the global maximum.
19
Gradient Ascent
  • Main result
  • Theorem GA (sketched below)

Requires computation of P(xi, pai | o[m], Θ) for all i, m. Inference
replaces taking derivatives.
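Presumably Theorem GA is the standard gradient formula for table parameters under missing data; writing o[m] for the observed part of the m-th sample,
\[ \frac{\partial \log L(\Theta : D)}{\partial \theta_{x_i \mid pa_i}} = \frac{1}{\theta_{x_i \mid pa_i}} \sum_{m} P(x_i, pa_i \mid o[m], \Theta). \]
Each summand is exactly the quantity P(xi, pai | o[m], Θ) mentioned above, and it is obtained by inference in the network.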
20
Gradient Ascent (cont)
Proof
21
Gradient Ascent (cont)
  • Since

22
Gradient Ascent (cont)
  • Putting it all together, we get
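Presumably the resulting expression is the statement of Theorem GA,
\[ \frac{\partial \log L(\Theta : D)}{\partial \theta_{x_i \mid pa_i}} = \frac{1}{\theta_{x_i \mid pa_i}} \sum_{m} P(x_i, pa_i \mid o[m], \Theta). \]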

23
Expectation Maximization (EM)
  • A general-purpose method for learning from incomplete data
  • Intuition
  • If we had access to the counts, then we could estimate the parameters
  • However, missing values do not allow us to perform the counts
  • Complete the counts using the current parameter assignment (a small
    sketch of this idea follows below)
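A minimal, hypothetical Python sketch of this idea for a tiny network X → Y with binary values (the data, structure, and numbers are illustrative and not taken from the slides):

from collections import defaultdict

VALUES = ['H', 'T']
# Five samples over (X, Y); None marks a missing value.
data = [('H', 'T'), ('T', None), (None, 'H'), ('H', 'H'), ('T', 'T')]

# Current parameter assignment: P(X = x) and P(Y = y | X = x).
theta_x = {'H': 0.5, 'T': 0.5}
theta_y = {(x, y): 0.5 for x in VALUES for y in VALUES}

def joint(x, y):
    """P(X = x, Y = y) under the current parameters."""
    return theta_x[x] * theta_y[(x, y)]

for _ in range(20):
    # E-step: "complete" the counts.  Each sample spreads a total weight of 1
    # over the possible completions of its missing values, in proportion to
    # their posterior probability under the current parameters.
    n_x = defaultdict(float)     # expected N(x)
    n_xy = defaultdict(float)    # expected N(x, y)
    for x_obs, y_obs in data:
        completions = [(x, y)
                       for x in (VALUES if x_obs is None else [x_obs])
                       for y in (VALUES if y_obs is None else [y_obs])]
        z = sum(joint(x, y) for x, y in completions)
        for x, y in completions:
            w = joint(x, y) / z
            n_x[x] += w
            n_xy[(x, y)] += w

    # M-step: re-estimate parameters from the expected counts, using the same
    # MLE formulas as in the complete-data case.
    m = len(data)
    theta_x = {x: n_x[x] / m for x in VALUES}
    theta_y = {(x, y): n_xy[(x, y)] / n_x[x] for x in VALUES for y in VALUES}

print(theta_x)
print(theta_y)

The E-step spreads each incomplete sample over its possible completions in proportion to their posterior probability, and the M-step reuses the complete-data MLE formulas on the resulting expected counts.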

24
Expectation Maximization (EM)
Data (? marks a missing value):
X:  H  T  H  H  T
Y:  ?  ?  H  T  T
Z:  T  T  ?  T  H

Current model: P(Y=H | X=H, Z=T, Θ) = 0.3 and P(Y=H | X=T, Z=T, Θ) = 0.4

Expected counts N(X, Y):
X  Y  N(X, Y)
H  H  1.3
H  T  0.4
T  H  1.7
T  T  1.6

(These numbers are placed for illustration; they have not been computed.)
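As a hypothetical worked step (the slide itself notes its numbers were not actually computed): if the current model gives P(Y=H | X=H, Z=T, Θ) = 0.3, the expected count for X=H, Y=H combines the one fully observed occurrence (the third sample) with the fractional completion of the first sample, whose Y is missing:
\[ \hat{N}(X{=}H, Y{=}H) = 1 + P(Y{=}H \mid X{=}H, Z{=}T, \Theta) = 1 + 0.3 = 1.3. \]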
25
EM (cont.)
[Diagram: the EM iteration starts from an initial network (G, Θ0) together with the training data]
26
Expectation Maximization (EM)
  • In practice, EM converges rather quickly at the start but converges
    slowly near the (possibly local) maximum.
  • Hence, EM is often run for a few iterations and then Gradient Ascent
    steps are applied.

27
Final Homework
  • Question 1: Develop an algorithm that, given a pedigree as input,
    provides the most probable haplotype of each individual in the
    pedigree. Use the Bayesian network model of Superlink to formulate
    the problem exactly as a query. Specify the algorithm at length,
    discussing as many details as you can. Analyze its efficiency.
    Devote time to illuminating notation and presentation.
  • Question 2: Specialize the formula given in Theorem GA for θ in
    genetic linkage analysis. In particular, assume exactly 3 loci
    (Marker 1, Disease 2, Marker 3), with θ being the recombination
    fraction between loci 1 and 2 and 0.1 − θ being the recombination
    fraction between loci 2 and 3.
  • Specify the formula for a pedigree with two parents and two children.
  • Extend the formula to arbitrary pedigrees.
  • Note that θ is the same in many local probability tables.