Hierarchical Mixture of Experts

1
Hierarchical Mixture of Experts
  • Presented by Qi An
  • Machine learning reading group
  • Duke University
  • 07/15/2005

2
Outline
  • Background
  • Hierarchical tree structure
  • Gating networks
  • Expert networks
  • E-M algorithm
  • Experimental results
  • Conclusions

3
Background
  • The idea of mixture of experts
  • First presented by Jacobs and Hinton in 1988
  • Hierarchical mixture of experts
  • Proposed by Jordan and Jacobs in 1994
  • Difference from previous mixture model
  • Mixing weights depend on the input x; the posterior
    weights depend on both the input and the output

4
Example (ME)
5
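One-layer structure
[Figure: a gating network with an ellipsoidal gating function maps the input x to weights g1, g2, g3; three expert networks map x to outputs µ1, µ2, µ3; the weighted expert outputs are combined into the overall output µ.]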
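
The one-layer structure can be sketched in a few lines of NumPy. The slide does not give the formula for the ellipsoidal gating function, so this sketch assumes it means normalized Gaussian kernels; the function names (`ellipsoidal_gates`, `mixture_output`) and the parameters `centers`, `inv_covs` and `priors` are hypothetical.

```python
import numpy as np

def ellipsoidal_gates(x, centers, inv_covs, priors):
    """Gating weights g_i(x) from normalized Gaussian kernels (assumed form).

    x: (d,), centers: (n, d), inv_covs: (n, d, d), priors: (n,).
    Determinant terms are assumed to be absorbed into `priors`.
    """
    diffs = centers - x
    # Squared Mahalanobis distance from x to each kernel centre
    maha = np.einsum("id,ide,ie->i", diffs, inv_covs, diffs)
    log_w = np.log(priors) - 0.5 * maha
    log_w -= log_w.max()              # stabilise the exponentials
    w = np.exp(log_w)
    return w / w.sum()                # weights are positive and sum to 1

def mixture_output(g, mus):
    """Overall output mu = sum_i g_i * mu_i."""
    return g @ mus

# Usage: a point equidistant from all three centres gets equal weights.
centers = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
inv_covs = np.stack([np.eye(2)] * 3)
g = ellipsoidal_gates(np.zeros(2), centers, inv_covs, np.ones(3) / 3)
mu = mixture_output(g, np.array([1.0, 2.0, 3.0]))
```
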
6
Example (HME)
7
Hierarchical tree structure
Linear Gating function
8
  • Expert network
  • At the leaves of the tree
  • For each expert: a linear predictor is passed
    through a link function to give the output of
    the expert
  • Link function: for example, the logistic
    function for binary classification
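
A single expert can be sketched as follows. The slide's equations were lost in extraction, so the weight matrix name `U` and the function names are assumptions: the linear predictor is taken to be `U @ x`, and the link function maps it to the expert output µij.

```python
import numpy as np

def logistic(xi):
    """Logistic link function, suitable for binary classification."""
    return 1.0 / (1.0 + np.exp(-xi))

def expert_output(U, x, link=logistic):
    """Output of one expert: link applied to the linear predictor.

    U: (m, d) weight matrix (hypothetical name), x: (d,) input.
    """
    xi = U @ x          # linear predictor
    return link(xi)     # expert output mu_ij

# Usage: the identity link gives a linear regression expert.
mu_class = expert_output(np.array([[1.0, -1.0]]), np.array([2.0, 2.0]))
mu_regr = expert_output(np.array([[1.0, 2.0]]), np.array([3.0, 4.0]),
                        link=lambda xi: xi)
```
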
9
  • Gating network
  • At the nonterminal nodes of the tree
  • One gating function for the top layer, one for
    each of the other layers
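
The gating equations did not survive extraction. With the linear gating function named on the previous slide, the usual form in Jordan and Jacobs (1994) is a softmax over linear predictors; the weight vectors v_i and v_ij and the notation g_{j|i} (written gij elsewhere in these slides) are taken from that paper rather than from the slide itself.

```latex
\begin{align*}
\text{top layer:} \quad
  & \xi_i = v_i^{\top} x, &
  g_i &= \frac{e^{\xi_i}}{\sum_k e^{\xi_k}} \\
\text{other layers:} \quad
  & \xi_{ij} = v_{ij}^{\top} x, &
  g_{j|i} &= \frac{e^{\xi_{ij}}}{\sum_k e^{\xi_{ik}}}
\end{align*}
```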

10
  • Output
  • At the non-leaf nodes
  • One output for the top node, one for each of
    the other nodes
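
A minimal sketch of how the outputs could be combined up a two-level tree, assuming softmax gating and scalar-output linear experts. The array names `V_top`, `V_low` and `U` are hypothetical; each non-leaf node outputs the gate-weighted sum of its children.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def hme_forward(x, V_top, V_low, U):
    """Forward pass of a two-level HME with scalar-output linear experts.

    x: (d,), V_top: (I, d), V_low: (I, J, d), U: (I, J, d).
    """
    g_top = softmax(V_top @ x)              # g_i, shape (I,)
    g_low = softmax(V_low @ x, axis=1)      # g_{j|i}, shape (I, J)
    mu_leaf = U @ x                         # mu_ij, shape (I, J)
    mu_node = (g_low * mu_leaf).sum(axis=1) # mu_i = sum_j g_{j|i} mu_ij
    mu = g_top @ mu_node                    # mu = sum_i g_i mu_i
    return mu, mu_node, g_top, g_low

# Usage: with all-zero gating weights every gate is uniform.
x = np.array([1.0, 2.0])
U = np.arange(8, dtype=float).reshape(2, 2, 2)
mu, mu_node, g_top, g_low = hme_forward(
    x, np.zeros((2, 2)), np.zeros((2, 2, 2)), U)
```
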

11
Probability model
  • For each expert, assume the true output y is
    chosen from a distribution P with mean µij
  • Therefore, the total probability of generating y
    from x is given by
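
The equation itself is missing from the transcript. In the notation of Jordan and Jacobs (1994), which these slides appear to follow, the total probability is a nested mixture over the gating weights (the parameter names v_i, v_ij and θ_ij are taken from that paper):

```latex
\[
P(y \mid x, \theta)
  = \sum_i g_i(x, v_i) \sum_j g_{j|i}(x, v_{ij}) \, P(y \mid x, \theta_{ij})
\]
```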

12
Posterior probabilities
  • Since gij and gi are computed from the input x
    alone, we refer to them as prior probabilities.
  • We can define the posterior probabilities with
    the knowledge of both the input x and the output
    y using Bayes' rule
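
The Bayes-rule step can be sketched numerically for a two-level tree. The function name `posteriors` and the array layout are assumptions: `lik[i, j]` holds the likelihood P_ij(y) of the observed output under expert (i, j), and the gates are assumed to be strictly positive.

```python
import numpy as np

def posteriors(g_top, g_low, lik):
    """Posterior probabilities for one data pair (x, y).

    g_top: (I,) priors g_i; g_low: (I, J) priors g_{j|i};
    lik: (I, J) likelihoods P_ij(y).
    Returns h_i, h_{j|i} and the joint posterior h_ij = h_i * h_{j|i}.
    """
    joint = g_top[:, None] * g_low * lik               # g_i g_{j|i} P_ij(y)
    h_low = joint / joint.sum(axis=1, keepdims=True)   # h_{j|i}
    h_top = joint.sum(axis=1) / joint.sum()            # h_i
    h_joint = joint / joint.sum()                      # h_ij
    return h_top, h_low, h_joint

# Usage: uniform priors, so the posteriors follow the likelihoods.
h_top, h_low, h_joint = posteriors(
    np.array([0.5, 0.5]),
    np.full((2, 2), 0.5),
    np.array([[1.0, 3.0], [2.0, 2.0]]),
)
```
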

13
E-M algorithm
  • Introduce auxiliary variables zij, which can be
    interpreted as labels that correspond to the
    experts.
  • The probability model can be simplified with the
    knowledge of the auxiliary variables
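
The simplified model is not shown in the transcript. Assuming exactly one indicator z_ij equals 1 for each data pair, as in Jordan and Jacobs (1994), the sum over experts becomes a product, with P_ij(y) denoting the density of expert (i, j):

```latex
\[
P(y, z_{ij} \mid x, \theta)
  = g_i \, g_{j|i} \, P_{ij}(y)
  = \prod_i \prod_j \left\{ g_i \, g_{j|i} \, P_{ij}(y) \right\}^{z_{ij}}
\]
```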

14
E-M algorithm
  • Complete-data likelihood
  • The E-step
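
Both expressions are missing from the transcript. A reconstruction in the notation of Jordan and Jacobs (1994), with t indexing the data pairs and p the iteration (both assumed here), is given below. The E-step replaces each unobserved z_ij by its expectation, the posterior h_ij.

```latex
\begin{align*}
l_c(\theta)
  &= \sum_t \sum_i \sum_j z_{ij}^{(t)}
     \ln \left\{ g_i^{(t)} \, g_{j|i}^{(t)} \, P_{ij}(y^{(t)}) \right\} \\
Q(\theta, \theta^{(p)})
  &= \sum_t \sum_i \sum_j h_{ij}^{(t)}
     \left\{ \ln g_i^{(t)} + \ln g_{j|i}^{(t)} + \ln P_{ij}(y^{(t)}) \right\},
  \qquad h_{ij}^{(t)} = E\!\left[ z_{ij}^{(t)} \right]
\end{align*}
```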

15
E-M algorithm
  • The M-step
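
The M-step equations are also missing. Because the expected complete-data log-likelihood splits into a sum of expert terms and gating terms, the maximization presumably decouples into separate weighted problems, as in Jordan and Jacobs (1994). The symbols θ_ij, v_i and v_ij are that paper's expert and gating parameters, with h the posteriors from the E-step.

```latex
\begin{align*}
\theta_{ij}^{(p+1)}
  &= \arg\max_{\theta_{ij}} \sum_t h_{ij}^{(t)} \ln P_{ij}(y^{(t)}) \\
v_i^{(p+1)}
  &= \arg\max_{v_i} \sum_t \sum_k h_k^{(t)} \ln g_k^{(t)} \\
v_{ij}^{(p+1)}
  &= \arg\max_{v_{ij}} \sum_t \sum_k h_k^{(t)} \sum_l h_{l|k}^{(t)} \ln g_{l|k}^{(t)}
\end{align*}
```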

16
IRLS
  • Iteratively reweighted least squares algorithm
  • An iterative algorithm for computing the maximum
    likelihood estimates of the parameters of a
    generalized linear model
  • A special case of the Fisher scoring method
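
As an illustration, IRLS for one particular generalized linear model, logistic regression, can be sketched as below. This is a generic textbook version rather than the slide's own algorithm; the function name, the optional per-observation `weights` (which could carry the posteriors in an HME M-step) and the tiny `ridge` term for numerical stability are assumptions.

```python
import numpy as np

def irls_logistic(X, y, weights=None, n_iter=25, ridge=1e-8):
    """Fit logistic regression by iteratively reweighted least squares.

    X: (n, d) inputs, y: (n,) targets in [0, 1],
    weights: optional (n,) non-negative observation weights.
    """
    n, d = X.shape
    w_obs = np.ones(n) if weights is None else np.asarray(weights, float)
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current fitted means
        s = w_obs * p * (1.0 - p)               # IRLS weights
        grad = X.T @ (w_obs * (y - p))          # score vector
        hess = X.T @ (X * s[:, None]) + ridge * np.eye(d)  # Fisher information
        step = np.linalg.solve(hess, grad)      # one Fisher scoring step
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

# Usage on a small, non-separable data set (first column is the intercept).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 0.0, 1.0])
beta = irls_logistic(X, y)
```

For the logistic link the Fisher scoring step and the Newton step coincide, which is why the same loop can be read either way.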

17
Algorithm
E-step
M-step
18
Online algorithm
  • This algorithm can be used for online regression
  • For the expert networks
  • where Rij is the inverse covariance matrix for
    expert network EN(i,j)
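
The update equations are missing from the transcript. A minimal sketch, assuming the expert update is a weighted recursive least squares step with the joint posterior h_ij as the weight and a forgetting factor `lam` (the form given in Jordan and Jacobs, 1994); the function and argument names are hypothetical.

```python
import numpy as np

def rls_expert_step(U, R, x, y, h, lam=1.0):
    """One online update of a linear expert by weighted recursive least squares.

    U: (m, d) expert weights, R: (d, d) inverse covariance matrix,
    x: (d,) input, y: (m,) target, h: posterior weight h_ij > 0,
    lam: forgetting factor in (0, 1].
    """
    Rx = R @ x
    # Sherman-Morrison update of the inverse of the weighted input covariance
    R_new = (R - np.outer(Rx, Rx) / (lam / h + x @ Rx)) / lam
    # Move the weights along the posterior-weighted prediction error
    U_new = U + h * np.outer(y - U @ x, R_new @ x)
    return U_new, R_new

# Usage: stream a few weighted samples through the update.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = X @ rng.normal(size=(2, 3)).T
H = rng.uniform(0.1, 1.0, size=20)
U, R = np.zeros((2, 3)), np.eye(3) / 1e-2
for x_t, y_t, h_t in zip(X, Y, H):
    U, R = rls_expert_step(U, R, x_t, y_t, h_t)
```

With `lam=1.0` and zero initial weights, the result matches the batch weighted least squares solution with a small ridge term set by the initial `R`.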

19
Online algorithm
  • For the gating networks
  • Top level: where Si is the inverse covariance
    matrix
  • and
  • Lower level: where Sij is the inverse covariance
    matrix
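
The gating updates were lost in extraction. As best recalled from Jordan and Jacobs (1994), and worth checking against the paper, they take the following recursive form, where ξ_i and ξ_ij are the gating linear predictors, h the posteriors, and λ a forgetting factor (none of these symbols appear on the slide itself):

```latex
\begin{align*}
v_i^{(t+1)}
  &= v_i^{(t)} + S_i^{(t)} \left( \ln h_i^{(t)} - \xi_i^{(t)} \right) x^{(t)} \\
S_i^{(t)}
  &= \lambda^{-1} S_i^{(t-1)}
   - \lambda^{-1}
     \frac{S_i^{(t-1)} x^{(t)} x^{(t)\top} S_i^{(t-1)}}
          {\lambda + x^{(t)\top} S_i^{(t-1)} x^{(t)}} \\
v_{ij}^{(t+1)}
  &= v_{ij}^{(t)} + S_{ij}^{(t)} h_i^{(t)}
     \left( \ln h_{j|i}^{(t)} - \xi_{ij}^{(t)} \right) x^{(t)} \\
S_{ij}^{(t)}
  &= \lambda^{-1} S_{ij}^{(t-1)}
   - \lambda^{-1}
     \frac{S_{ij}^{(t-1)} x^{(t)} x^{(t)\top} S_{ij}^{(t-1)}}
          {\lambda \left[ h_i^{(t)} \right]^{-1} + x^{(t)\top} S_{ij}^{(t-1)} x^{(t)}}
\end{align*}
```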

20
Results
  • Simulated data of a four-joint robot arm moving
    in three-dimensional space

21
Results
22
Conclusions
  • Introduces a tree-structured architecture for
    supervised learning
  • Much faster than the traditional back-propagation
    algorithm
  • Can be used for on-line learning

23
Thank you
  • Questions?