Title: Gaussian process modelling
Outline
- Emulators
- The basic GP emulator
- Practical matters
Emulators
Simulator, meta-model, emulator
- I'll refer to a computer model as a simulator
  - It aims to simulate some real-world phenomenon
- A meta-model is a simplified representation or approximation of a simulator
  - Built using a training set of simulator runs
  - Importantly, it should run much more quickly than the simulator itself
  - So it serves as a quick surrogate for the simulator, for any task that would require many simulator runs
- An emulator is a particular kind of meta-model
  - More than just an approximation, it makes fully probabilistic predictions of what the simulator would produce
  - And those probability statements correctly reflect the training information
Meta-models
- Various kinds of meta-models have been proposed by modellers and model users
  - Notably regression models and neural networks
- But these misrepresent the training data
  - The line does not pass through the points
  - The variance around the line also has the wrong form
Emulation
- Desirable properties for a meta-model
  - If asked to predict the simulator output at one of the training data points, it returns the observed output with zero variance
    - Assuming the simulator output doesn't have random noise
  - So it must be sufficiently flexible to pass through all the training data points
    - Not restricted to some regression form
  - If asked to predict output at another point, its predictions will have non-zero variance, reflecting realistic uncertainty
  - Given enough training data it should be able to predict simulator output to any desired accuracy
- These properties characterise what we call an emulator
2 code runs
- Consider one input and one output
- Emulator estimate interpolates data
- Emulator uncertainty grows between data points
3 code runs
- Adding another point changes estimate and reduces
uncertainty
5 code runs
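A minimal numerical sketch of the behaviour these three slides illustrate, assuming a toy one-input simulator f and a squared-exponential covariance with fixed parameters (all illustrative choices, not taken from the slides): conditioning a noise-free GP on 2, 3 and then 5 runs interpolates the data and shrinks the uncertainty between points.

```python
import numpy as np

def f(x):                               # toy stand-in for an expensive simulator
    return np.sin(3 * x) + x

def sq_exp(a, b, delta=0.4, sigma2=1.0):
    # squared-exponential covariance with correlation length delta
    return sigma2 * np.exp(-((a[:, None] - b[None, :]) / delta) ** 2)

def gp_predict(x_train, x_new):
    # noise-free GP conditioning: the mean interpolates the runs, and the
    # variance is zero at the runs and grows between them
    y = f(x_train)
    K = sq_exp(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter
    Ks = sq_exp(x_new, x_train)
    mean = Ks @ np.linalg.solve(K, y)
    var = sq_exp(x_new, x_new).diagonal() - np.einsum(
        "ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, var

x_new = np.linspace(0.0, 2.0, 101)
for n in (2, 3, 5):                     # mimic the 2-, 3- and 5-run slides
    mean, var = gp_predict(np.linspace(0.0, 2.0, n), x_new)
    print(f"{n} runs: max predictive sd = {np.sqrt(var.max()):.3f}")
```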
The basic GP emulator
Gaussian processes
- A Gaussian process (GP) is a probability distribution for an unknown function
  - A kind of infinite-dimensional multivariate normal distribution (see the sketch below)
- If a function f(x) has a GP distribution we write
  - f(.) ~ GP(m(.), c(.,.))
  - m(.) is the mean function
  - c(.,.) is the covariance function
- f(x) has a normal distribution with mean m(x) and variance c(x,x)
  - c(x,x') is the covariance between f(x) and f(x')
- A GP emulator represents the simulator as a GP
  - Conditional on some unknown parameters
  - Estimated from the training data
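A sketch of what the definition means in practice: at any finite set of inputs, a GP reduces to a multivariate normal with mean vector m(xi) and covariance matrix c(xi, xj). The particular m and c below are illustrative assumptions.

```python
import numpy as np

def m(x):                                    # mean function m(.)
    return 0.5 * x

def c(a, b, delta=0.5, sigma2=1.0):          # covariance function c(.,.)
    return sigma2 * np.exp(-((a[:, None] - b[None, :]) / delta) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 50)                # any finite set of inputs
cov = c(x, x) + 1e-10 * np.eye(len(x))       # jitter for numerical stability
draws = rng.multivariate_normal(m(x), cov, size=3)
print(draws.shape)                           # 3 realisations of f(.) on the grid
```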
The mean function
- The emulator's mean function provides the central estimate for predicting the model output f(x)
- It has two parts (sketched below)
- A conventional regression component
  - r(x) = µ + β1h1(x) + β2h2(x) + ... + βphp(x)
  - The regression terms hj(x) are a modelling choice
  - They should reflect how we expect the simulator to respond to its inputs
  - E.g. r(x) = µ + β1x1 + β2x2 + ... + βpxp models a general linear trend
  - The coefficients µ and βj are estimated from the training data
- A smooth interpolator of the residuals yi - r(xi) at the training points
  - Smoothness is controlled by correlation length parameters
  - Also estimated from the training data
The mean function: example
[Figure: two panels. Left: red dots are the training data, the green line is the regression line, the black line is the emulator mean. Right: red dots are the residuals from the regression at the training points, the black line is the smoothed residuals.]
The prediction variance
- The variance of f(x) depends on where x is relative to the training data (illustrated below)
- At a training data point, it is zero
- Moving away from a training point, it grows
  - Growth depends on the correlation lengths
- When far from any training point (relative to correlation lengths), it resolves into two components
  - The usual regression variance
  - An interpolator variance, estimated from the observed variance of the residuals
- The mean function is then just the regression part
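A sketch of this behaviour for the interpolator component alone (the regression variance is omitted for brevity; the squared-exponential correlation and parameter values are illustrative):

```python
import numpy as np

def corr(a, b, delta=0.5):
    return np.exp(-((a[:, None] - b[None, :]) / delta) ** 2)

x_train = np.array([0.0, 1.0, 2.0])
sigma2 = 1.0                                   # interpolator variance
A = corr(x_train, x_train) + 1e-10 * np.eye(len(x_train))

def interp_var(x):
    k = corr(np.atleast_1d(x), x_train)        # correlations with the runs
    return (sigma2 * (1.0 - k @ np.linalg.solve(A, k.T))).item()

print(interp_var(1.0))   # at a training point: essentially zero
print(interp_var(1.5))   # between points: intermediate
print(interp_var(9.0))   # many correlation lengths away: approaches sigma2
```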
Correlation length
- Correlation length parameters are crucial
  - But difficult to estimate
- There is one correlation length for each input
- Points less than one correlation length apart in a single input are highly correlated (see the numerical check below)
  - Learning f(x') says a lot about f(x)
  - So if x' is a training point, the predictive uncertainty about f(x) is small
- But more than about two correlation lengths away, the correlation is minimal
  - We then ignore f(x') when predicting f(x)
  - Just use the regression
- A large correlation length signifies an input with a very smooth and predictable effect on the simulator output
- A small correlation length denotes an input with a more variable, fine-scale influence on the output
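A quick numerical check of these rules of thumb, assuming the squared-exponential correlation form exp(-(d/delta)^2) used in the sketches above:

```python
import numpy as np

# correlation between f(x) and f(x') at a distance of d correlation lengths
for d in (0.5, 1.0, 2.0):
    print(f"{d} correlation lengths apart: corr = {np.exp(-d**2):.3f}")
# 0.5 lengths: 0.779 (high); 1 length: 0.368; 2 lengths: 0.018 (minimal)
```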
Correlation length and variance
[Figure: examples of GP realisations. GEM-SA uses a roughness parameter b, which is the inverse square of the correlation length; σ² is the interpolation variance.]
Practical matters
Modelling
- The main modelling decision is to choose the regression terms hj(x) (see the sketch below)
- We want to capture the broad shape of the response of the simulator to its inputs
  - Then the residuals are small
  - The emulator predicts f(x) with small variance
  - And it predicts realistically for x far from the training data
- If we get it wrong
  - The residuals will be unnecessarily large
  - The emulator has unnecessarily large variance when interpolating
  - And it extrapolates wrongly
Design
- Another choice is the set of training data points
  - This is a kind of experimental design problem (see the sketch below)
- We want points spread over the part of the input space for which the emulator is needed
  - So that no prediction is too far from a training point
- We want this to be true also when we project the points into lower dimensions
  - So that prediction points are not too far from training points in dimensions (inputs) with small correlation lengths
- We also want some points closer to each other
  - To estimate the correlation lengths better
- Conventional designs don't take account of this yet!
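One common space-filling choice that meets the first two requirements is a maximin Latin hypercube. A simple random-restart sketch (production tools search more cleverly, and note this does not add the close pairs needed for estimating correlation lengths):

```python
import numpy as np

def maximin_lhs(n, d, tries=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(tries):
        # Latin hypercube: one point in each of n strata per input, jittered
        pts = (np.argsort(rng.random((n, d)), axis=0) + rng.random((n, d))) / n
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        score = dists[np.triu_indices(n, 1)].min()  # smallest pairwise distance
        if score > best_score:                      # keep the most spread-out design
            best, best_score = pts, score
    return best

design = maximin_lhs(n=20, d=3)   # 20 training inputs in [0, 1)^3
print(design.shape)
```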
Validation
- No emulator is perfect
- The GP emulator is based on assumptions
  - A particular form of covariance function, parametrised by just one correlation length parameter per input
    - Homogeneity of variance and correlation structure
    - Simulators rarely behave this nicely!
  - Getting the regression component right
  - Normality
    - Not usually a big issue
  - Estimating parameters accurately from the training data
    - Can be a problem for correlation lengths
- Failure of these assumptions will mean the emulator does not predict faithfully
  - f(x) will too often lie outside the range of its predictive distribution
- So we need to apply suitable diagnostic checks, such as the one sketched below
When to use GP emulation
- The simulator output should vary smoothly in response to changes in its inputs
  - Discontinuities are difficult to emulate
  - Very rapid and erratic responses to inputs may also need unreasonably many training data points
- The simulator is computer-intensive
  - So it's not practical to run it many thousands of times for Monte Carlo methods
  - But not so costly that we can't run it a few hundred times to build a good emulator
- Not too many inputs
  - Fitting the emulator is hard
  - Particularly if more than a few inputs influence the output strongly
Stochastic simulators
- Throughout this course we are assuming the simulator is deterministic
  - Running it again at the same inputs will produce the same outputs
- If there is random noise in the outputs, we can modify the emulation theory (sketched below)
  - The mean function doesn't have to pass through the data
  - Noise increases the predictive variance
- The benefits of the GP emulator are less compelling
  - But we are working on this!
References
- O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290-1300.
- Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.
- Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.