Sparse Approximations to Bayesian Gaussian Processes

Transcript and Presenter's Notes

1
Sparse Approximations to Bayesian Gaussian Processes
  • Matthias Seeger
  • University of Edinburgh

2
Collaborators
  • Neil Lawrence (Sheffield)
  • Chris Williams (Edinburgh)
  • Ralf Herbrich (MSR Cambridge)

3
Overview of the Talk
  • Gaussian processes and approximations
  • Understanding sparse schemes as likelihood
    approximations
  • Two schemes and their relationships
  • Fast greedy selection for the projected latent
    variables scheme (GP regression)

4
Why Sparse Approximations?
  • GPs lead to very powerful Bayesian methods for
    function fitting, classification, etc. Yet (almost)
    nobody uses them!
  • Reason: horrible O(n³) scaling
  • If sparse approximations work, there is a host of
    applications, e.g. as building blocks in Bayesian
    networks, etc.

5
Gaussian Process Models
Gaussian prior (dense), kernel K
  • Target y separated by latent u from all other
    variables: inference becomes a finite problem
    (model sketched below)
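A minimal sketch of the GP model this slide refers to (my reconstruction; the notation K for the kernel matrix is an assumption):

\[
u \sim \mathcal{N}(0, K), \qquad K_{ij} = K(x_i, x_j), \qquad
P(y \mid u) = \prod_{i=1}^{n} P(y_i \mid u_i)
\]

Because each target y_i touches the process only through its latent output u_i, the posterior over the finite vector u determines the whole posterior process.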

6
Parameterisation
  • Data D = {(x_i, y_i)}, i = 1,…,n. Latent outputs
    u = (u_1,…,u_n)
  • Approximate the posterior process P(u(·) | D) by a GP
    Q(u(·) | D) (parameterisation sketched below)
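The slide's parameterisation formula did not survive extraction; a plausible reconstruction is the standard representation used in this line of work (β and Π are my notation):

\[
\mathbb{E}_Q[u(x)] = \sum_{i=1}^{n} \beta_i K(x, x_i), \qquad
\mathrm{Cov}_Q[u(x), u(x')] = K(x, x') - \sum_{i,j} K(x, x_i)\,\Pi_{ij}\,K(x_j, x')
\]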

7
GP Approximations
  • Most (non-MCMC) GP approximations use this
    representation
  • Exact computation of Q(u | D) is intractable: it
    requires working with the full n × n kernel matrix
  • Attractive for sparse approximations: sequential
    fitting of Q(u | D) to P(u | D)

8
Assumed Density Filtering
  • Update (ADF step); the update rule is sketched below
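The update equation is not reproduced in the transcript; a standard reconstruction of the ADF step (including one likelihood term, then projecting back onto the Gaussian family by moment matching; my notation):

\[
\hat{P}(u) \propto P(y_i \mid u_i)\, Q(u), \qquad
Q_{\mathrm{new}} = \arg\min_{Q' \,\text{Gaussian}} \mathrm{KL}\!\left[\hat{P}(u) \,\big\|\, Q'(u)\right]
\]

For a Gaussian family this amounts to matching the mean and covariance of \hat{P}.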

9
Towards Sparsity
  • ADF = Bayesian online learning [Opper]. Multiple
    updates: cavity method [Opper, Winther], EP [Minka]
  • Generalizations: EP [Minka], ADATAP
    [Csato, Opper, Winther (COW)]
  • Sequential updates are suitable for sparse online or
    greedy methods

10
Likelihood Approximations
  • Active set I ⊂ {1,…,n}, |I| = d ≪ n
  • Several sparse schemes can be understood as
    likelihood approximations (see the sketch below)
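A sketch of what "likelihood approximation" means here (my reconstruction of the missing formula): each exact likelihood term is replaced by a term that depends on u only through the active subset u_I,

\[
P(y \mid u) = \prod_{i=1}^{n} P(y_i \mid u_i) \;\approx\; \prod_{i=1}^{n} \tilde{t}_i(u_I),
\]

so the approximate posterior Q(u | D) is determined by the d-dimensional vector u_I.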

11
Likelihood Approximations (II)
[Figure: graphical model with inputs x1,…,x4, latent outputs u1,…,u4, and targets y1,…,y4; active set I = {2, 3}]
12
Likelihood Approximations (III)
  • For such sparse schemes:
  • O(d²) parameters at most
  • Prediction in O(d²), O(d) for the mean only
  • Approximations to the marginal likelihood
    (variational lower bound, ADATAP [COW]), PAC
    bounds [Seeger], etc., become cheap as well!

13
Two Schemes
  • IVM [Lawrence, Seeger, Herbrich (LSH)]: ADF with
    fast greedy forward selection
  • Sparse Greedy GPR [Smola, Bartlett (SB)]: greedy,
    expensive. Can be sped up: Projected Latent
    Variables [Seeger, Lawrence, Williams]. More
    general: sparse batch ADATAP [COW]
  • Not covered here: Sparse Online GP [Csato, Opper]

14
Informative Vector Machine
  • ADF, stopped after d inclusions (could also do
    deletions and exchanges)
  • Fast greedy forward selection using criteria
    known from active learning
  • Faster than the SVM on hard MNIST binary tasks, yet
    probabilistic (error bars, etc.)

15
Why So Simple?
  • Locality property of ADF: the marginal Q_new(u_i) is
    obtained in O(1) from Q(u_i)
  • Locality property plus Gaussianity give relations
    that allow fast evaluation of differential
    criteria

16
KL-Optimal Projections
  • Csato/Opper observed the following (a reconstruction
    is given below)
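The observation itself is not reproduced in the transcript; a plausible reconstruction of the Csató/Opper result, in my notation: within the family of approximations that interact with the data only through u_I, the KL-optimal replacement of a latent u_i is its conditional prior mean given the active set,

\[
\mathbb{E}[u_i \mid u_I] = K_{i,I}\, K_{I,I}^{-1}\, u_I ,
\]

i.e. the projection of u_i onto the span of the active latent variables.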

17
KL-Optimal Projections (II)
  • For a Gaussian likelihood the projection is available
    in closed form
  • Can be used online or batch
  • A bit unfortunate: we use relative entropy both
    ways around!

18
Projected Latent Variables
  • Full GPR samples u_I ~ P(u_I), u_R ~ P(u_R | u_I),
    y ~ N(y | u, σ² I)
  • Instead: y ~ N(y | E[u | u_I], σ² I). The latent
    variables u_R are replaced by their projections in the
    likelihood. Equivalent to SB (without this
    interpretation). See the sketch below.
  • Note: sparse batch ADATAP [COW] is more general
    (non-Gaussian likelihoods)
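The projected likelihood lends itself to a short regression sketch. The NumPy code below is my own minimal illustration (names like rbf_kernel, plv_fit and the random active set are my choices, and the random set is only a stand-in for the greedy selection discussed later); it is not the talk's implementation.

# Minimal sketch of projected-latent-variables GP regression
# (equivalent to the subset-of-regressors predictor).
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def plv_fit(X, y, active, sigma2=0.1, lengthscale=1.0, jitter=1e-8):
    """Fit with the projected likelihood y ~ N(K_nI K_II^{-1} u_I, sigma2 I)."""
    XI = X[active]
    K_II = rbf_kernel(XI, XI, lengthscale) + jitter * np.eye(len(active))
    K_nI = rbf_kernel(X, XI, lengthscale)
    # Posterior mean of u_I is K_II A^{-1} K_In y with
    # A = sigma2 K_II + K_In K_nI (O(n d^2) to build, O(d^3) to solve).
    A = sigma2 * K_II + K_nI.T @ K_nI
    alpha = np.linalg.solve(A, K_nI.T @ y)   # = K_II^{-1} times the posterior mean
    return XI, alpha, lengthscale

def plv_predict_mean(Xstar, model):
    """Predictive mean k_*I A^{-1} K_In y, i.e. k_*I @ alpha."""
    XI, alpha, lengthscale = model
    return rbf_kernel(Xstar, XI, lengthscale) @ alpha

# Tiny usage example on synthetic 1-D data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
active = rng.choice(200, size=15, replace=False)   # stand-in for greedy selection
model = plv_fit(X, y, active, sigma2=0.01)
print(plv_predict_mean(np.array([[0.0], [1.5]]), model))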

19
Fast Greedy Selections
  • With this likelihood approximation, typical
    forward selection criteria (MAP [SB], differential
    entropy, info-gain [LSH]) are too expensive
  • Problem: upon inclusion, the latent u_i is coupled
    with all targets y
  • Cheap criterion: ignore most couplings for score
    evaluation (not for inclusion!)

20
Yet Another Approximation
  • To score x_i, we approximate Q_new(u | D) after
    inclusion of i by a surrogate that keeps only a few of
    the new couplings
  • Example: information gain (a generic Gaussian form is
    sketched below)
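The information-gain formula is not reproduced in the transcript; a generic form consistent with the criteria described earlier (my notation): score candidate i by the relative entropy between the posterior after and before its inclusion,

\[
\Delta_i = \mathrm{KL}\!\left[\, Q_{\mathrm{new}}(u) \,\big\|\, Q(u) \,\right],
\]

which, by the locality property, reduces to a function of the one-dimensional marginals Q_new(u_i) and Q(u_i) and can therefore be evaluated in O(1) per candidate.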

21
Fast Greedy Selections (II)
  • Leads to O(1) criteria. The cost of searching over
    all remaining points is dominated by the cost of an
    inclusion
  • Can easily be generalized to allow for couplings
    between u_i and some targets, if desired
  • Can be done for sparse batch ADATAP as well (a toy
    greedy loop is sketched below)
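For concreteness, here is a toy greedy loop in the same NumPy style as the earlier sketch, reusing plv_fit and plv_predict_mean from it. It scores candidates by the absolute residual of the current fit, which is only a crude stand-in for the information-gain criterion, and it refits from scratch rather than using the O(1) rank-one updates described above.

def greedy_plv(X, y, d, sigma2=0.01, lengthscale=1.0):
    """Toy greedy forward selection: grow the active set point by point."""
    active = [int(np.argmax(np.abs(y)))]        # start from the largest target
    while len(active) < d:
        model = plv_fit(X, y, np.array(active), sigma2, lengthscale)
        resid = np.abs(y - plv_predict_mean(X, model))
        resid[active] = -np.inf                 # never re-select an included point
        active.append(int(np.argmax(resid)))
    return np.array(active)

active = greedy_plv(X, y, d=15)
model = plv_fit(X, y, active, sigma2=0.01)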

22
Marginal Likelihood
  • The marginal likelihood has a closed form (a
    reconstruction is given below)
  • It can be optimized efficiently w.r.t. σ and the kernel
    parameters: O(n d (d + p)) per gradient, where p is the
    number of parameters
  • Keep I fixed during line searches, reselect it for new
    search directions
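The formula itself is missing from the transcript; for the projected likelihood it should be the Gaussian evidence of a low-rank-plus-noise model (my reconstruction):

\[
\log P(y \mid I, \sigma^2, \theta)
 = \log \mathcal{N}\!\left(y \,\middle|\, 0,\; \sigma^2 I + K_{nI} K_{II}^{-1} K_{In}\right),
\]

which can be evaluated and differentiated in O(n d²) per step using the Woodbury identity and the matrix determinant lemma, since the covariance is a rank-d perturbation of σ² I.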

23
Conclusions
  • Most sparse approximations can be understood as
    likelihood approximations
  • Several schemes are available, all O(n d²), yet the
    constants do matter here!
  • Fast information-theoretic criteria are effective for
    classification; the extension to active learning is
    straightforward

24
Conclusions (II)
  • Missing: an experimental comparison, especially to test
    the effectiveness of marginal likelihood optimization
  • Extensions:
  • C classes: easy in O(n d² C²), maybe in O(n d² C)
  • Integrate with Bayesian networks [Friedman, Nachman]