Title: Learning Bayesian networks

1. Learning Bayesian networks

2. Learning Bayesian networks
[Figure: training data are fed to an Inducer, which outputs a Bayesian network.]
3. Known Structure -- Incomplete Data

E, B, A: <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>

[Figure: the records above are fed to an Inducer, which outputs the parameterized network.]
- Network structure is specified
- Data contains missing values
- We consider assignments to missing values
4. Known Structure -- Complete Data

- Given a network structure G
- and a choice of parametric family for $P(X_i \mid Pa_i)$,
- learn the parameters of the network from complete data.
- Goal: construct a network that is closest to the probability distribution that generated the data.
5. Maximum Likelihood Estimation in Binomial Data

- Applying the MLE principle, we get
  $\hat{\theta} = \frac{N_H}{N_H + N_T}$
- (which coincides with what one would expect).
- Example: $(N_H, N_T) = (3, 2)$; the MLE estimate is $3/5 = 0.6$.
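As a concrete check of the formula above, here is a minimal Python sketch that computes the binomial MLE from observed counts; the function name and the call are illustrative additions, not from the slides.

```python
def binomial_mle(n_heads, n_tails):
    """MLE for P(heads): simply the empirical frequency N_H / (N_H + N_T)."""
    return n_heads / (n_heads + n_tails)

# The slide's example: (N_H, N_T) = (3, 2) gives 3/5 = 0.6.
print(binomial_mle(3, 2))  # 0.6
```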
6. Learning Parameters for a Bayesian Network

- Training data has the form
  $D = \{\langle E[1], B[1], A[1]\rangle, \ldots, \langle E[M], B[M], A[M]\rangle\}$
7. Learning Parameters for a Bayesian Network

- Since we assume i.i.d. samples, the likelihood function is
  $L(\Theta : D) = \prod_m P(E[m], B[m], A[m] : \Theta)$
8. Learning Parameters for a Bayesian Network

- By the definition of the network, we get
  $L(\Theta : D) = \prod_m P(E[m] : \Theta)\, P(B[m] : \Theta)\, P(A[m] \mid B[m], E[m] : \Theta)$
9. Learning Parameters for a Bayesian Network

- Rearranging terms, the likelihood decomposes into a product of local likelihood functions, one per variable:
  $L(\Theta : D) = \Big[\prod_m P(E[m] : \Theta_E)\Big] \Big[\prod_m P(B[m] : \Theta_B)\Big] \Big[\prod_m P(A[m] \mid B[m], E[m] : \Theta_{A \mid B,E})\Big]$
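To make the factorization concrete, the following Python sketch evaluates the likelihood of a tiny complete dataset both ways: directly as a product of joint probabilities, and as a product of the three local likelihoods. The structure (E and B are parents of A) matches the formula above, but all numerical values and the dataset are made-up illustrations, not values from the slides.

```python
# Illustrative parameters for a network with structure E -> A <- B
# (all probabilities here are made up for the demonstration).
p_e = {True: 0.1, False: 0.9}   # P(E)
p_b = {True: 0.2, False: 0.8}   # P(B)
p_a_true = {(True, True): 0.95, (True, False): 0.9,
            (False, True): 0.85, (False, False): 0.05}  # P(A=True | B, E)

def p_a(a, b, e):
    q = p_a_true[(b, e)]
    return q if a else 1.0 - q

# A tiny complete dataset of (e, b, a) samples (also made up).
data = [(False, False, False), (True, False, True), (False, True, True)]

# Likelihood computed directly from the joint P(e, b, a).
joint = 1.0
for e, b, a in data:
    joint *= p_e[e] * p_b[b] * p_a(a, b, e)

# The same likelihood as the product of three local likelihoods.
local_e = local_b = local_a = 1.0
for e, b, a in data:
    local_e *= p_e[e]
    local_b *= p_b[b]
    local_a *= p_a(a, b, e)

print(joint, local_e * local_b * local_a)  # equal, by the factorization
```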
10. General Bayesian Networks

- Generalizing to any Bayesian network:
  $L(\Theta : D) = \prod_m P(x_1[m], \ldots, x_n[m] : \Theta)$   (i.i.d. samples)
  $\phantom{L(\Theta : D)} = \prod_m \prod_i P(x_i[m] \mid Pa_i[m] : \Theta_i)$   (network factorization)
  $\phantom{L(\Theta : D)} = \prod_i L_i(\Theta_i : D)$
- The likelihood decomposes according to the structure of the network.
11. General Bayesian Networks (cont.)

- Complete data ⇒ decomposition ⇒ independent estimation problems.
- If the parameters for each family are not related, then they can be estimated independently of each other.
- (This is not true in genetic linkage analysis.)
12. Learning Parameters: Summary

- For multinomials we collect sufficient statistics, which are simply the counts $N(x_i, pa_i)$.
- Parameter estimation:
  MLE: $\hat{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i)}{N(pa_i)}$
  Bayesian (Dirichlet prior): $\tilde{\theta}_{x_i \mid pa_i} = \frac{N(x_i, pa_i) + \alpha(x_i, pa_i)}{N(pa_i) + \alpha(pa_i)}$
- Bayesian methods also require a choice of priors.
- Both MLE and Bayesian estimates are asymptotically equivalent and consistent.
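A minimal Python sketch of both estimators from the sufficient statistics, assuming the counts have already been collected into a dictionary keyed by (value, parent assignment); the data layout and the uniform Dirichlet pseudo-count are illustrative choices, not from the slides.

```python
from collections import defaultdict

def estimate_cpd(counts, alpha=0.0):
    """Estimate P(x | pa) from counts N(x, pa).

    counts: dict mapping (x, pa) -> N(x, pa).
    alpha:  Dirichlet pseudo-count per entry; alpha=0 gives the MLE,
            alpha>0 gives the Bayesian (posterior-mean) estimate.
    """
    parent_totals = defaultdict(float)
    values = set()
    for (x, pa), n in counts.items():
        parent_totals[pa] += n
        values.add(x)
    k = len(values)  # number of values of X, to normalize the prior
    return {(x, pa): (n + alpha) / (parent_totals[pa] + k * alpha)
            for (x, pa), n in counts.items()}

# Toy counts N(x, pa) for a binary X with a binary parent (assumed numbers).
counts = {("T", "T"): 3, ("F", "T"): 1, ("T", "F"): 2, ("F", "F"): 4}
print(estimate_cpd(counts))             # MLE
print(estimate_cpd(counts, alpha=1.0))  # Bayesian, uniform Dirichlet(1) prior
```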
13. Known Structure -- Incomplete Data

E, B, A: <Y,N,N>, <Y,?,Y>, <N,N,Y>, <N,Y,?>, ..., <?,Y,Y>

[Figure: the records above are fed to an Inducer, which outputs the parameterized network.]
- Network structure is specified
- Data contains missing values
- We consider assignments to missing values
14. Learning Parameters from Incomplete Data

- Incomplete data:
  - Posterior distributions can become interdependent.
- Consequences:
  - ML parameters cannot be computed separately for each multinomial.
  - The posterior is not a product of independent posteriors.
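To spell out why (a standard derivation added here, not text from the slide): write each sample as an observed part $o[m]$ and a hidden part $h[m]$. Then

```latex
L(\Theta : D) \;=\; \prod_m P(o[m] \mid \Theta)
             \;=\; \prod_m \sum_{h[m]} \prod_i P\bigl(x_i[m] \mid Pa_i[m] : \Theta_i\bigr)
```

and the sum over completions sits between the two products, so the likelihood no longer decomposes into one independent term per family.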
15. Learning Parameters from Incomplete Data (cont.)

- In the presence of incomplete data, the likelihood can have multiple global maxima.
- Example:
  - We can rename the values of a hidden variable H.
  - If H has two values, the likelihood has two global maxima.
- Similarly, local maxima are also replicated.
- Many hidden variables ⇒ a serious problem.
16. MLE from Incomplete Data

- Finding MLE parameters is a nonlinear optimization problem.

[Figure: a multimodal likelihood surface $L(\Theta : D)$ plotted against $\Theta$.]
17. MLE from Incomplete Data

- Finding MLE parameters is a nonlinear optimization problem.

[Figure: the same likelihood surface $L(\Theta : D)$ against $\Theta$, annotated with local hill-climbing steps.]
18. MLE from Incomplete Data (cont.)

- Both ideas (Gradient Ascent and EM, below) find local maxima only.
- They require multiple restarts to find an approximation to the global maximum.
19. Gradient Ascent

- Theorem GA:
  $\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \frac{1}{\theta_{x_i, pa_i}} \sum_m P(x_i, pa_i \mid o[m], \Theta)$
- Requires computation of $P(x_i, pa_i \mid o[m], \Theta)$ for all $i, m$.
- Inference replaces taking derivatives.
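A minimal sketch of one gradient step built on this identity, assuming an external inference routine has already produced the per-sample posteriors $P(x_i, pa_i \mid o[m], \Theta)$; the renormalization used to keep each conditional a distribution is a crude stand-in for the reparameterizations used in practice, and all names and numbers are illustrative.

```python
def gradient_step(theta, posteriors, lr=0.01):
    """One gradient-ascent step on log P(D | theta) for a single family.

    theta:      dict (x, pa) -> current parameter theta_{x, pa}.
    posteriors: list over samples m of dicts (x, pa) -> P(x, pa | o[m], theta),
                assumed to come from an inference engine.
    """
    # Theorem GA: d log P(D | theta) / d theta_{x, pa}
    #           = (1 / theta_{x, pa}) * sum_m P(x, pa | o[m], theta)
    grad = {key: sum(p[key] for p in posteriors) / theta[key] for key in theta}

    # Take an unconstrained step, then renormalize each conditional
    # distribution so that sum_x theta_{x, pa} = 1 (a crude projection).
    new = {key: max(theta[key] + lr * grad[key], 1e-9) for key in theta}
    totals = {}
    for (x, pa), v in new.items():
        totals[pa] = totals.get(pa, 0.0) + v
    return {(x, pa): v / totals[pa] for (x, pa), v in new.items()}

# Toy usage with one binary variable and a single parent configuration.
theta = {("T", "pa0"): 0.5, ("F", "pa0"): 0.5}
posts = [{("T", "pa0"): 0.8, ("F", "pa0"): 0.2},
         {("T", "pa0"): 0.6, ("F", "pa0"): 0.4}]
print(gradient_step(theta, posts))  # mass shifts toward ("T", "pa0")
```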
20. Gradient Ascent (cont.)

- Proof: by the chain rule,
  $\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \sum_m \frac{1}{P(o[m] \mid \Theta)} \cdot \frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i, pa_i}}$
21. Gradient Ascent (cont.)

- Writing $P(o[m] \mid \Theta)$ as a sum over the full assignments consistent with $o[m]$ and using the network factorization, only the terms containing $\theta_{x_i, pa_i}$ as a factor survive differentiation:
  $\frac{\partial P(o[m] \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \frac{P(x_i, pa_i, o[m] \mid \Theta)}{\theta_{x_i, pa_i}}$
22. Gradient Ascent (cont.)

- Putting it all together, we get
  $\frac{\partial \log P(D \mid \Theta)}{\partial \theta_{x_i, pa_i}} = \sum_m \frac{P(x_i, pa_i, o[m] \mid \Theta)}{P(o[m] \mid \Theta)\, \theta_{x_i, pa_i}} = \frac{1}{\theta_{x_i, pa_i}} \sum_m P(x_i, pa_i \mid o[m], \Theta)$
23. Expectation Maximization (EM)

- A general-purpose method for learning from incomplete data.
- Intuition:
  - If we had access to the counts, we could estimate the parameters.
  - However, missing values do not allow us to perform the counts.
  - So instead, complete the counts using the current parameter assignment.
24. Expectation Maximization (EM)

Data (five samples over X, Y, Z; "?" marks a missing value):
  X: H T H H T
  Y: ? ? H T T
  Z: T T ? T H

Current model:
  $P(Y=H \mid X=H, Z=T, \Theta) = 0.3$
  $P(Y=H \mid X=T, Z=T, \Theta) = 0.4$

Expected counts $N(X, Y)$:
  X  Y  N(X,Y)
  H  H  1.3
  T  H  0.4
  H  T  1.7
  T  T  1.6

(These numbers are placed for illustration; they have not been computed.)
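Although the slide flags the numbers as illustrative, the table can in fact be reproduced by the E-step under one assumption consistent with it: for $N(X, Y)$, only the missing Y values need completing, using the two conditionals above. A minimal Python sketch of that computation:

```python
from collections import defaultdict

# The slide's five samples over (X, Y, Z); None marks a missing value.
data = [("H", None, "T"), ("T", None, "T"), ("H", "H", None),
        ("H", "T", "T"), ("T", "T", "H")]

# The current model's conditionals P(Y=H | X, Z=T) from the slide.
p_y_heads = {("H", "T"): 0.3, ("T", "T"): 0.4}

expected = defaultdict(float)  # expected counts N(X, Y)
for x, y, z in data:
    if y is not None:
        expected[(x, y)] += 1.0      # Y observed: a full count
    else:
        p = p_y_heads[(x, z)]        # E-step: split the count by Y's posterior
        expected[(x, "H")] += p
        expected[(x, "T")] += 1.0 - p

print(dict(expected))
# {('H','H'): 1.3, ('H','T'): 1.7, ('T','H'): 0.4, ('T','T'): 1.6}
```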
25. EM (cont.)

[Figure: the EM loop. The initial network $(G, \Theta_0)$ and the training data enter an iteration that alternates computing expected counts and reestimating parameters, producing networks $(G, \Theta_1), (G, \Theta_2), \ldots$]
26. Expectation Maximization (EM)

- In practice, EM converges rather quickly at the start but slowly near the (possibly local) maximum.
- Hence, EM is often run for a few iterations, and then Gradient Ascent steps are applied.
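A self-contained Python sketch of the EM mechanics and its fast-then-slow convergence, on the simplest possible case: estimating $\theta = P(Y=H)$ when some observations of Y are missing entirely. The scenario and all numbers are illustrative assumptions, not from the slides.

```python
def em_binary(n_heads, n_tails, n_missing, theta=0.5, iters=20):
    """EM for theta = P(Y=H) with n_missing values of Y missing.

    E-step: each missing sample contributes theta expected heads.
    M-step: reestimate theta from the completed counts.
    (With values missing completely at random the answer is just
    n_heads / (n_heads + n_tails); the loop shows the mechanics and
    the geometric slow-down near the maximum.)
    """
    total = n_heads + n_tails + n_missing
    for i in range(iters):
        expected_heads = n_heads + n_missing * theta   # E-step
        theta = expected_heads / total                 # M-step
        print(f"iter {i}: theta = {theta:.6f}")
    return theta

em_binary(n_heads=3, n_tails=2, n_missing=5)  # converges to 0.6
```

Early iterations move theta a lot, while later ones barely change it, which is exactly the behavior motivating the hand-off to gradient steps described above.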
27. Final Homework

- Question 1: Develop an algorithm that, given a pedigree as input, provides the most probable haplotype of each individual in the pedigree. Use the Bayesian network model of Superlink to formulate the problem exactly as a query. Specify the algorithm at length, discussing as many details as you can. Analyze its efficiency. Devote time to illuminating notation and presentation.
- Question 2: Specialize the formula given in Theorem GA for $\theta$ in genetic linkage analysis. In particular, assume exactly 3 loci (Marker 1, Disease 2, Marker 3), with $\theta$ being the recombination fraction between loci 1 and 2, and $0.1 - \theta$ being the recombination fraction between loci 2 and 3.
  - Specify the formula for a pedigree with two parents and two children.
  - Extend the formula to arbitrary pedigrees.
  - Note that $\theta$ is the same in many local probability tables.