Sparse Approximations to Bayesian Gaussian Processes

Transcript and Presenter's Notes

1
Sparse Approximations to Bayesian Gaussian Processes
  • Matthias Seeger
  • University of Edinburgh

2
Collaborators
  • Neil Lawrence (Sheffield)
  • Chris Williams (Edinburgh)
  • Ralf Herbrich (MSR Cambridge)

3
Overview of the Talk
  • Gaussian processes and approximations
  • Understanding sparse schemes as likelihood
    approximations
  • Two schemes and their relationships
  • Fast greedy selection for the projected latent
    variables scheme (GP regression)

4
Why Sparse Approximations?
  • GPs lead to very powerful Bayesian methods for
    function fitting, classification, etc. Yet (almost)
    nobody uses them!
  • Reason: horrible O(n³) scaling
  • If sparse approximations work, there is a host of
    applications, e.g. as building blocks in Bayesian
    networks, etc.

5
Gaussian Process Models
Gaussian prior (dense), kernel K
  • Target y separated by latent u from all other
    variables: inference becomes a finite problem
    (model sketched below)
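A minimal sketch of the GP model this slide refers to (my reconstruction; the notation K for the kernel matrix is an assumption):

\[
u \sim \mathcal{N}(0, K), \qquad K_{ij} = K(x_i, x_j), \qquad
P(y \mid u) = \prod_{i=1}^{n} P(y_i \mid u_i)
\]

Because each target y_i touches the process only through its latent output u_i, the posterior over the finite vector u determines the whole posterior process.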

6
Parameterisation
  • Data D = {(x_i, y_i)}, i = 1,…,n. Latent outputs
    u = (u_1,…,u_n)
  • Approximate the posterior process P(u(·) | D) by a GP
    Q(u(·) | D) (parameterisation sketched below)
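The slide's parameterisation formula did not survive extraction; a plausible reconstruction is the standard representation used in this line of work (β and Π are my notation):

\[
\mathbb{E}_Q[u(x)] = \sum_{i=1}^{n} \beta_i K(x, x_i), \qquad
\mathrm{Cov}_Q[u(x), u(x')] = K(x, x') - \sum_{i,j} K(x, x_i)\,\Pi_{ij}\,K(x_j, x')
\]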

7
GP Approximations
  • Most (non-MCMC) GP approximations use this
    representation
  • Exact computation of Q(u | D) is intractable: it
    requires working with the full n × n kernel matrix
  • Attractive for sparse approximations: sequential
    fitting of Q(u | D) to P(u | D)

8
Assumed Density Filtering
  • Update (ADF step); the update rule is sketched below
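The update equation is not reproduced in the transcript; a standard reconstruction of the ADF step (including one likelihood term, then projecting back onto the Gaussian family by moment matching; my notation):

\[
\hat{P}(u) \propto P(y_i \mid u_i)\, Q(u), \qquad
Q_{\mathrm{new}} = \arg\min_{Q' \,\text{Gaussian}} \mathrm{KL}\!\left[\hat{P}(u) \,\big\|\, Q'(u)\right]
\]

For a Gaussian family this amounts to matching the mean and covariance of \hat{P}.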

9
Towards Sparsity
  • ADF = Bayesian online learning [Opper]. Multiple
    updates: cavity method [Opper, Winther], EP [Minka]
  • Generalizations: EP [Minka], ADATAP
    [Csato, Opper, Winther (COW)]
  • Sequential updates are suitable for sparse online or
    greedy methods

10
Likelihood Approximations
  • Active set I ⊂ {1,…,n}, |I| = d ≪ n
  • Several sparse schemes can be understood as
    likelihood approximations (see the sketch below)
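A sketch of what "likelihood approximation" means here (my reconstruction of the missing formula): each exact likelihood term is replaced by a term that depends on u only through the active subset u_I,

\[
P(y \mid u) = \prod_{i=1}^{n} P(y_i \mid u_i) \;\approx\; \prod_{i=1}^{n} \tilde{t}_i(u_I),
\]

so the approximate posterior Q(u | D) is determined by the d-dimensional vector u_I.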

11
Likelihood Approximations (II)
[Figure: graphical model with inputs x1,…,x4, latent outputs u1,…,u4, and targets y1,…,y4; active set I = {2, 3}]
12
Likelihood Approximations (III)
  • For such sparse schemes:
  • O(d²) parameters at most
  • Prediction in O(d²), O(d) for the mean only
  • Approximations to the marginal likelihood
    (variational lower bound, ADATAP [COW]), PAC
    bounds [Seeger], etc., become cheap as well!

13
Two Schemes
  • IVM [Lawrence, Seeger, Herbrich (LSH)]: ADF with
    fast greedy forward selection
  • Sparse Greedy GPR [Smola, Bartlett (SB)]: greedy,
    expensive. Can be sped up: Projected Latent
    Variables [Seeger, Lawrence, Williams]. More
    general: sparse batch ADATAP [COW]
  • Not covered here: Sparse Online GP [Csato, Opper]

14
Informative Vector Machine
  • ADF, stopped after d inclusions (could also do
    deletions and exchanges)
  • Fast greedy forward selection using criteria
    known from active learning
  • Faster than the SVM on hard MNIST binary tasks, yet
    probabilistic (error bars, etc.)

15
Why So Simple?
  • Locality property of ADF: the marginal Q_new(u_i) is
    obtained in O(1) from Q(u_i)
  • Locality property plus Gaussianity give relations
    that allow fast evaluation of differential
    criteria

16
KL-Optimal Projections
  • Csato/Opper observed the following (a reconstruction
    is given below)
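The observation itself is not reproduced in the transcript; a plausible reconstruction of the Csató/Opper result, in my notation: within the family of approximations that interact with the data only through u_I, the KL-optimal replacement of a latent u_i is its conditional prior mean given the active set,

\[
\mathbb{E}[u_i \mid u_I] = K_{i,I}\, K_{I,I}^{-1}\, u_I ,
\]

i.e. the projection of u_i onto the span of the active latent variables.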

17
KL-Optimal Projections (II)
  • For a Gaussian likelihood the projection is available
    in closed form
  • Can be used online or batch
  • A bit unfortunate: we use relative entropy both
    ways around!

18
Projected Latent Variables
  • Full GPR samples u_I ~ P(u_I), u_R ~ P(u_R | u_I),
    y ~ N(y | u, σ² I)
  • Instead: y ~ N(y | E[u | u_I], σ² I). The latent
    variables u_R are replaced by their projections in the
    likelihood. Equivalent to SB (without this
    interpretation). See the sketch below.
  • Note: sparse batch ADATAP [COW] is more general
    (non-Gaussian likelihoods)
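The projected likelihood lends itself to a short regression sketch. The NumPy code below is my own minimal illustration (names like rbf_kernel, plv_fit and the random active set are my choices, and the random set is only a stand-in for the greedy selection discussed later); it is not the talk's implementation.

# Minimal sketch of projected-latent-variables GP regression
# (equivalent to the subset-of-regressors predictor).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def plv_fit(X, y, active, sigma2=0.1, lengthscale=1.0, jitter=1e-8):
    """Fit with the projected likelihood y ~ N(K_nI K_II^{-1} u_I, sigma2 I)."""
    XI = X[active]
    K_II = rbf_kernel(XI, XI, lengthscale) + jitter * np.eye(len(active))
    K_nI = rbf_kernel(X, XI, lengthscale)
    # Posterior mean of u_I is K_II A^{-1} K_In y with
    # A = sigma2 K_II + K_In K_nI (O(n d^2) to build, O(d^3) to solve).
    A = sigma2 * K_II + K_nI.T @ K_nI
    alpha = np.linalg.solve(A, K_nI.T @ y)   # = K_II^{-1} times the posterior mean
    return XI, alpha, lengthscale

def plv_predict_mean(Xstar, model):
    """Predictive mean k_*I A^{-1} K_In y, i.e. k_*I @ alpha."""
    XI, alpha, lengthscale = model
    return rbf_kernel(Xstar, XI, lengthscale) @ alpha

# Tiny usage example on synthetic 1-D data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
active = rng.choice(200, size=15, replace=False)   # stand-in for greedy selection
model = plv_fit(X, y, active, sigma2=0.01)
print(plv_predict_mean(np.array([[0.0], [1.5]]), model))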

19
Fast Greedy Selections
  • With this likelihood approximation, typical
    forward selection criteria (MAP [SB], differential
    entropy, info-gain [LSH]) are too expensive
  • Problem: upon inclusion, the latent u_i is coupled
    with all targets y
  • Cheap criterion: ignore most couplings for score
    evaluation (not for inclusion!)

20
Yet Another Approximation
  • To score x_i, we approximate Q_new(u | D) after
    inclusion of i by a surrogate that keeps only a few of
    the new couplings
  • Example: information gain (a generic Gaussian form is
    sketched below)
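The information-gain formula is not reproduced in the transcript; a generic form consistent with the criteria described earlier (my notation): score candidate i by the relative entropy between the posterior after and before its inclusion,

\[
\Delta_i = \mathrm{KL}\!\left[\, Q_{\mathrm{new}}(u) \,\big\|\, Q(u) \,\right],
\]

which, by the locality property, reduces to a function of the one-dimensional marginals Q_new(u_i) and Q(u_i) and can therefore be evaluated in O(1) per candidate.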

21
Fast Greedy Selections (II)
  • Leads to O(1) criteria. The cost of searching over
    all remaining points is dominated by the cost of an
    inclusion
  • Can easily be generalized to allow for couplings
    between u_i and some targets, if desired
  • Can be done for sparse batch ADATAP as well (a toy
    greedy loop is sketched below)
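For concreteness, here is a toy greedy loop in the same NumPy style as the earlier sketch, reusing plv_fit and plv_predict_mean from it. It scores candidates by the absolute residual of the current fit, which is only a crude stand-in for the information-gain criterion, and it refits from scratch rather than using the O(1) rank-one updates described above.

def greedy_plv(X, y, d, sigma2=0.01, lengthscale=1.0):
    """Toy greedy forward selection: grow the active set point by point."""
    active = [int(np.argmax(np.abs(y)))]        # start from the largest target
    while len(active) < d:
        model = plv_fit(X, y, np.array(active), sigma2, lengthscale)
        resid = np.abs(y - plv_predict_mean(X, model))
        resid[active] = -np.inf                 # never re-select an included point
        active.append(int(np.argmax(resid)))
    return np.array(active)

active = greedy_plv(X, y, d=15)
model = plv_fit(X, y, active, sigma2=0.01)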

22
Marginal Likelihood
  • The marginal likelihood has a closed form (a
    reconstruction is given below)
  • It can be optimized efficiently w.r.t. σ and the kernel
    parameters: O(n d (d + p)) per gradient, where p is the
    number of parameters
  • Keep I fixed during line searches, reselect it for new
    search directions
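The formula itself is missing from the transcript; for the projected likelihood it should be the Gaussian evidence of a low-rank-plus-noise model (my reconstruction):

\[
\log P(y \mid I, \sigma^2, \theta)
 = \log \mathcal{N}\!\left(y \,\middle|\, 0,\; \sigma^2 I + K_{nI} K_{II}^{-1} K_{In}\right),
\]

which can be evaluated and differentiated in O(n d²) per step using the Woodbury identity and the matrix determinant lemma, since the covariance is a rank-d perturbation of σ² I.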

23
Conclusions
  • Most sparse approximations can be understood as
    likelihood approximations
  • Several schemes are available, all O(n d²), yet the
    constants do matter here!
  • Fast information-theoretic criteria are effective for
    classification; the extension to active learning is
    straightforward

24
Conclusions (II)
  • Missing: an experimental comparison, especially to test
    the effectiveness of marginal likelihood optimization
  • Extensions:
  • C classes: easy in O(n d² C²), maybe in O(n d² C)
  • Integrate with Bayesian networks [Friedman, Nachman]