Title: Learning Bayesian networks

1. Learning Bayesian networks

2. Learning Bayesian networks
[Figure: training data are fed to an Inducer, which outputs a Bayesian network.]
3. Known Structure -- Incomplete Data

E, B, A: <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>

[Figure: the records above are fed to an Inducer, which outputs the parameterized network.]
- Network structure is specified
- Data contains missing values
- We consider assignments to missing values
4. Known Structure -- Complete Data

- Given a network structure G
- and a choice of parametric family for $P(X_i \mid Pa_i)$,
- learn the parameters of the network from complete data.
- Goal: construct a network that is closest to the probability distribution that generated the data.
5. Maximum Likelihood Estimation in Binomial Data

- Applying the MLE principle, we get
  $\hat{\theta} = \frac{N_H}{N_H + N_T}$
- (which coincides with what one would expect).
- Example: $(N_H, N_T) = (3, 2)$; the MLE estimate is $3/5 = 0.6$.
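As a concrete check of the formula above, here is a minimal Python sketch that computes the binomial MLE from observed counts; the function name and the call are illustrative additions, not from the slides.

```python
def binomial_mle(n_heads, n_tails):
    """MLE for P(heads): simply the empirical frequency N_H / (N_H + N_T)."""
    return n_heads / (n_heads + n_tails)

# The slide's example: (N_H, N_T) = (3, 2) gives 3/5 = 0.6.
print(binomial_mle(3, 2))  # 0.6
```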
6. Learning Parameters for a Bayesian Network

- Training data has the form
  $D = \{\langle E[1], B[1], A[1]\rangle, \ldots, \langle E[M], B[M], A[M]\rangle\}$
7. Learning Parameters for a Bayesian Network

- Since we assume i.i.d. samples, the likelihood function is
  $L(\Theta : D) = \prod_m P(E[m], B[m], A[m] : \Theta)$
8. Learning Parameters for a Bayesian Network

- By the definition of the network, we get
  $L(\Theta : D) = \prod_m P(E[m] : \Theta)\, P(B[m] : \Theta)\, P(A[m] \mid B[m], E[m] : \Theta)$
9. Learning Parameters for a Bayesian Network

- Rearranging terms, the likelihood decomposes into a product of local likelihood functions, one per variable:
  $L(\Theta : D) = \Big[\prod_m P(E[m] : \Theta_E)\Big] \Big[\prod_m P(B[m] : \Theta_B)\Big] \Big[\prod_m P(A[m] \mid B[m], E[m] : \Theta_{A \mid B,E})\Big]$
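To make the factorization concrete, the following Python sketch evaluates the likelihood of a tiny complete dataset both ways: directly as a product of joint probabilities, and as a product of the three local likelihoods. The structure (E and B are parents of A) matches the formula above, but all numerical values and the dataset are made-up illustrations, not values from the slides.

```python
# Illustrative parameters for a network with structure E -> A <- B
# (all probabilities here are made up for the demonstration).
p_e = {True: 0.1, False: 0.9}   # P(E)
p_b = {True: 0.2, False: 0.8}   # P(B)
p_a_true = {(True, True): 0.95, (True, False): 0.9,
            (False, True): 0.85, (False, False): 0.05}  # P(A=True | B, E)

def p_a(a, b, e):
    q = p_a_true[(b, e)]
    return q if a else 1.0 - q

# A tiny complete dataset of (e, b, a) samples (also made up).
data = [(False, False, False), (True, False, True), (False, True, True)]

# Likelihood computed directly from the joint P(e, b, a).
joint = 1.0
for e, b, a in data:
    joint *= p_e[e] * p_b[b] * p_a(a, b, e)

# The same likelihood as the product of three local likelihoods.
local_e = local_b = local_a = 1.0
for e, b, a in data:
    local_e *= p_e[e]
    local_b *= p_b[b]
    local_a *= p_a(a, b, e)

print(joint, local_e * local_b * local_a)  # equal, by the factorization
```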
10. General Bayesian Networks

- Generalizing to any Bayesian network:
  $L(\Theta : D) = \prod_m P(x_1[m], \ldots, x_n[m] : \Theta)$   (i.i.d. samples)
  $\phantom{L(\Theta : D)} = \prod_m \prod_i P(x_i[m] \mid Pa_i[m] : \Theta_i)$   (network factorization)
  $\phantom{L(\Theta : D)} = \prod_i L_i(\Theta_i : D)$
- The likelihood decomposes according to the structure of the network.
11. General Bayesian Networks (cont.)

- Complete data ⇒ decomposition ⇒ independent estimation problems.
- If the parameters for each family are not related, then they can be estimated independently of each other.
- (This is not true in genetic linkage analysis.)
12. Learning Parameters: Summary

- For multinomials we collect sufficient statistics, which are simply the counts $N(x_i, pa_i)$.
- Parameter estimation:
  MLE: $\hat{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i)}{N(pa_i)}$
  Bayesian (Dirichlet prior): $\tilde{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i) + \alpha(x_i, pa_i)}{N(pa_i) + \alpha(pa_i)}$
- Bayesian methods also require a choice of priors.
- Both MLE and Bayesian estimates are asymptotically equivalent and consistent.
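A minimal Python sketch of both estimators from the sufficient statistics, assuming the counts have already been collected into a dictionary keyed by (value, parent assignment); the data layout and the uniform Dirichlet pseudo-count are illustrative choices, not from the slides.

```python
from collections import defaultdict

def estimate_cpd(counts, alpha=0.0):
    """Estimate P(x | pa) from counts N(x, pa).

    counts: dict mapping (x, pa) -> N(x, pa).
    alpha:  Dirichlet pseudo-count per entry; alpha=0 gives the MLE,
            alpha>0 gives the Bayesian (posterior-mean) estimate.
    """
    parent_totals = defaultdict(float)
    values = set()
    for (x, pa), n in counts.items():
        parent_totals[pa] += n
        values.add(x)
    k = len(values)  # number of values of X, to normalize the prior
    return {(x, pa): (n + alpha) / (parent_totals[pa] + k * alpha)
            for (x, pa), n in counts.items()}

# Toy counts N(x, pa) for a binary X with a binary parent (assumed numbers).
counts = {("T", "T"): 3, ("F", "T"): 1, ("T", "F"): 2, ("F", "F"): 4}
print(estimate_cpd(counts))             # MLE
print(estimate_cpd(counts, alpha=1.0))  # Bayesian, uniform Dirichlet(1) prior
```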
13. Known Structure -- Incomplete Data

E, B, A: <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>

[Figure: the records above are fed to an Inducer, which outputs the parameterized network.]
- Network structure is specified
- Data contains missing values
- We consider assignments to missing values
14. Learning Parameters from Incomplete Data

- Incomplete data:
  - Posterior distributions can become interdependent.
- Consequences:
  - ML parameters cannot be computed separately for each multinomial.
  - The posterior is not a product of independent posteriors.
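To spell out why (a standard derivation added here, not text from the slide): write each sample as an observed part $o[m]$ and a hidden part $h[m]$. Then

```latex
L(\Theta : D) \;=\; \prod_m P(o[m] \mid \Theta)
             \;=\; \prod_m \sum_{h[m]} \prod_i P\bigl(x_i[m] \mid Pa_i[m] : \Theta_i\bigr)
```

and the sum over completions sits between the two products, so the likelihood no longer decomposes into one independent term per family.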
15. Learning Parameters from Incomplete Data (cont.)

- In the presence of incomplete data, the likelihood can have multiple global maxima.
- Example:
  - We can rename the values of a hidden variable H.
  - If H has two values, the likelihood has two global maxima.
- Similarly, local maxima are also replicated.
- Many hidden variables ⇒ a serious problem.
16. MLE from Incomplete Data

- Finding MLE parameters is a nonlinear optimization problem.

[Figure: a multimodal likelihood surface $L(\Theta : D)$ plotted against $\Theta$.]
17. MLE from Incomplete Data

- Finding MLE parameters is a nonlinear optimization problem.

[Figure: the same likelihood surface $L(\Theta : D)$ against $\Theta$, annotated with local hill-climbing steps.]
18. MLE from Incomplete Data (cont.)

- Both ideas (Gradient Ascent and EM, below) find local maxima only.
- They require multiple restarts to find an approximation to the global maximum.
19. Gradient Ascent

- Theorem GA:
  $\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \frac{1}{\theta_{x_i, pa_i}} \sum_m P(x_i, pa_i \mid o[m], \Theta)$
- Requires computation of $P(x_i, pa_i \mid o[m], \Theta)$ for all $i, m$.
- Inference replaces taking derivatives.
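A minimal sketch of one gradient step built on this identity, assuming an external inference routine has already produced the per-sample posteriors $P(x_i, pa_i \mid o[m], \Theta)$; the renormalization used to keep each conditional a distribution is a crude stand-in for the reparameterizations used in practice, and all names and numbers are illustrative.

```python
def gradient_step(theta, posteriors, lr=0.01):
    """One gradient-ascent step on log P(D | theta) for a single family.

    theta:      dict (x, pa) -> current parameter theta_{x, pa}.
    posteriors: list over samples m of dicts (x, pa) -> P(x, pa | o[m], theta),
                assumed to come from an inference engine.
    """
    # Theorem GA: d log P(D | theta) / d theta_{x, pa}
    #           = (1 / theta_{x, pa}) * sum_m P(x, pa | o[m], theta)
    grad = {key: sum(p[key] for p in posteriors) / theta[key] for key in theta}

    # Take an unconstrained step, then renormalize each conditional
    # distribution so that sum_x theta_{x, pa} = 1 (a crude projection).
    new = {key: max(theta[key] + lr * grad[key], 1e-9) for key in theta}
    totals = {}
    for (x, pa), v in new.items():
        totals[pa] = totals.get(pa, 0.0) + v
    return {(x, pa): v / totals[pa] for (x, pa), v in new.items()}

# Toy usage with one binary variable and a single parent configuration.
theta = {("T", "pa0"): 0.5, ("F", "pa0"): 0.5}
posts = [{("T", "pa0"): 0.8, ("F", "pa0"): 0.2},
         {("T", "pa0"): 0.6, ("F", "pa0"): 0.4}]
print(gradient_step(theta, posts))  # mass shifts toward ("T", "pa0")
```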
20. Gradient Ascent (cont.)

- Proof: by the chain rule,
  $\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \sum_m \frac{1}{P(o[m] \mid \Theta)} \cdot \frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i, pa_i}}$
21. Gradient Ascent (cont.)

- Writing $P(o[m] \mid \Theta)$ as a sum over the full assignments consistent with $o[m]$ and using the network factorization, only the terms containing $\theta_{x_i, pa_i}$ as a factor survive differentiation:
  $\frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \frac{P(x_i, pa_i, o[m] \mid \Theta)}{\theta_{x_i, pa_i}}$
22. Gradient Ascent (cont.)

- Putting it all together, we get
  $\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \sum_m \frac{P(x_i, pa_i, o[m] \mid \Theta)}{P(o[m] \mid \Theta)\, \theta_{x_i, pa_i}} = \frac{1}{\theta_{x_i, pa_i}} \sum_m P(x_i, pa_i \mid o[m], \Theta)$
23. Expectation Maximization (EM)

- A general-purpose method for learning from incomplete data.
- Intuition:
  - If we had access to the counts, we could estimate the parameters.
  - However, missing values do not allow us to perform the counts.
  - So instead, complete the counts using the current parameter assignment.
24. Expectation Maximization (EM)

Data (five samples over X, Y, Z; "?" marks a missing value):
  X: H T H H T
  Y: ? ? H T T
  Z: T T ? T H

Current model:
  $P(Y=H \mid X=H, Z=T, \Theta) = 0.3$
  $P(Y=H \mid X=T, Z=T, \Theta) = 0.4$

Expected counts $N(X, Y)$:
  X  Y  N(X,Y)
  H  H  1.3
  T  H  0.4
  H  T  1.7
  T  T  1.6

(These numbers are placed for illustration; they have not been computed.)
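Although the slide flags the numbers as illustrative, the table can in fact be reproduced by the E-step under one assumption consistent with it: for $N(X, Y)$, only the missing Y values need completing, using the two conditionals above. A minimal Python sketch of that computation:

```python
from collections import defaultdict

# The slide's five samples over (X, Y, Z); None marks a missing value.
data = [("H", None, "T"), ("T", None, "T"), ("H", "H", None),
        ("H", "T", "T"), ("T", "T", "H")]

# The current model's conditionals P(Y=H | X, Z=T) from the slide.
p_y_heads = {("H", "T"): 0.3, ("T", "T"): 0.4}

expected = defaultdict(float)  # expected counts N(X, Y)
for x, y, z in data:
    if y is not None:
        expected[(x, y)] += 1.0      # Y observed: a full count
    else:
        p = p_y_heads[(x, z)]        # E-step: split the count by Y's posterior
        expected[(x, "H")] += p
        expected[(x, "T")] += 1.0 - p

print(dict(expected))
# {('H','H'): 1.3, ('H','T'): 1.7, ('T','H'): 0.4, ('T','T'): 1.6}
```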
25. EM (cont.)

[Figure: the EM loop. The initial network $(G, \Theta_0)$ and the training data enter an iteration that alternates computing expected counts and reestimating parameters, producing networks $(G, \Theta_1), (G, \Theta_2), \ldots$]
26. Expectation Maximization (EM)

- In practice, EM converges rather quickly at the start but slowly near the (possibly local) maximum.
- Hence, EM is often run for a few iterations, and then Gradient Ascent steps are applied.
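A self-contained Python sketch of the EM mechanics and its fast-then-slow convergence, on the simplest possible case: estimating $\theta = P(Y=H)$ when some observations of Y are missing entirely. The scenario and all numbers are illustrative assumptions, not from the slides.

```python
def em_binary(n_heads, n_tails, n_missing, theta=0.5, iters=20):
    """EM for theta = P(Y=H) with n_missing values of Y missing.

    E-step: each missing sample contributes theta expected heads.
    M-step: reestimate theta from the completed counts.
    (With values missing completely at random the answer is just
    n_heads / (n_heads + n_tails); the loop shows the mechanics and
    the geometric slow-down near the maximum.)
    """
    total = n_heads + n_tails + n_missing
    for i in range(iters):
        expected_heads = n_heads + n_missing * theta   # E-step
        theta = expected_heads / total                 # M-step
        print(f"iter {i}: theta = {theta:.6f}")
    return theta

em_binary(n_heads=3, n_tails=2, n_missing=5)  # converges to 0.6
```

Early iterations move theta a lot, while later ones barely change it, which is exactly the behavior motivating the hand-off to gradient steps described above.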
27. Final Homework

- Question 1: Develop an algorithm that, given a pedigree as input, provides the most probable haplotype of each individual in the pedigree. Use the Bayesian network model of Superlink to formulate the problem exactly as a query. Specify the algorithm at length, discussing as many details as you can. Analyze its efficiency. Devote time to illuminating notation and presentation.
- Question 2: Specialize the formula given in Theorem GA for $\theta$ in genetic linkage analysis. In particular, assume exactly 3 loci (Marker 1, Disease 2, Marker 3), with $\theta$ being the recombination fraction between loci 1 and 2, and $0.1 - \theta$ being the recombination fraction between loci 2 and 3.
  - Specify the formula for a pedigree with two parents and two children.
  - Extend the formula to arbitrary pedigrees.
  - Note that $\theta$ is the same in many local probability tables.