Title: Ratto Lecture 1
1 Bayesian uncertainty estimation and global sensitivity analysis
M. Ratto, IPSC-JRC, European Commission
2 Bayesian estimation
Let us consider a given deterministic model that has to be estimated according to some set of observations. The analyst has a set of prior assumptions on the model factors $X$, expressed as prior distributions $p(X)$. The posterior distribution of the model factors given the data set $D$ is, applying the Bayes chain rule,
$$p(X \mid D) = \frac{p(D \mid X)\, p(X)}{\int p(D \mid X)\, p(X)\, dX} \propto L(X; D)\, p(X).$$
3 Bayesian estimation
The likelihood function $L(X; D) = p(D \mid X)$ tells us how much the experimental data support the various possible statistical hypotheses on the model parameters. It is a scalar function of the model parameters: e.g. in a standard linear regression model where the variance of the error terms is known, the log-likelihood is proportional to the (negative) sum of squared errors. The likelihood is where the model structure and the data enter the problem.
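Concretely, with Gaussian errors of known variance $\sigma^2$, the log-likelihood of the linear regression model $y_t = x_t^\top \beta + \varepsilon_t$ is, up to an additive constant,
$$\log L(\beta) = -\frac{1}{2\sigma^2} \sum_{t=1}^{N} \left( y_t - x_t^\top \beta \right)^2,$$
so maximising the likelihood is equivalent to minimising the sum of squared errors.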
4 Bayesian estimation
If $Y$ is the model output of interest, its posterior mean and variance are
$$E(Y \mid D) = \int Y(X)\, p(X \mid D)\, dX, \qquad V(Y \mid D) = \int \bigl(Y(X) - E(Y \mid D)\bigr)^2\, p(X \mid D)\, dX.$$
Bayesian parameter estimates can be given in terms of the posterior mean and covariance:
$$\hat{X} = E(X \mid D), \qquad \operatorname{Cov}(X \mid D) = \int (X - \hat{X})(X - \hat{X})^\top\, p(X \mid D)\, dX.$$
5 Bayesian estimation
The integrated likelihood is a measure of fit for the model:
$$p(D) = \int p(D \mid X)\, p(X)\, dX.$$
6 Bayesian hypothesis testing
When a set of models $M_h$, $h = 1, \dots, H$, is available, comparisons are made by considering Bayes factors, obtained by integrating (not maximising) the likelihood over the parameter space.
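In symbols, for two competing models $M_1$ and $M_2$ with parameter vectors $\theta_1$ and $\theta_2$:
$$B_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}, \qquad p(D \mid M_h) = \int p(D \mid \theta_h, M_h)\, p(\theta_h \mid M_h)\, d\theta_h.$$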
7 Bayesian hypothesis testing
The Bayes factor is a "summary of the evidence
provided by the data in favour of one scientific
theory, represented by a statistical model, as
opposed to another" (Kass and Raftery, 1995).
8 Bayesian estimation
Bayesian model averaging (BMA) is an approach to
modelling in which all possible sources of
uncertainty are taken into account (model
structures, model parameters and data
uncertainty) based on Bayesian theory.
9 Bayesian methods
- Bayesian methods are an intuitively attractive solution to the problem of accounting for the different sources of uncertainty, but they present several difficulties.
- The integrals involved can in general be hard to compute.
- Monte Carlo methods (acceptance sampling, importance sampling) or, better, Markov chain Monte Carlo methods (Gibbs sampler, Metropolis-Hastings) are applied to compute them.
10 Monte Carlo methods
The most trivial way is to sample from the prior, $x^{(i)} \sim p(X)$, so that posterior expectations become likelihood-weighted averages:
$$E(Y \mid D) \approx \frac{\sum_{i=1}^{N} Y(x^{(i)})\, L(x^{(i)}; D)}{\sum_{i=1}^{N} L(x^{(i)}; D)}.$$
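A minimal sketch of this strategy in Python; the model function, the data value and the error variance below are purely illustrative assumptions, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Hypothetical deterministic model, for illustration only.
    return x[:, 0] * np.exp(-x[:, 1])

y_obs, sigma2 = 0.5, 0.01   # illustrative datum and known error variance

# 1. Sample the factors from their (independent, uniform) priors.
N = 10_000
X = rng.uniform(0.0, 1.0, size=(N, 2))

# 2. Evaluate the model and the Gaussian likelihood at each point.
Y = model(X)
L = np.exp(-0.5 * (Y - y_obs) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)

# 3. Posterior moments as likelihood-weighted averages over prior samples.
w = L / L.sum()
post_mean = np.sum(w * Y)
post_var = np.sum(w * (Y - post_mean) ** 2)

# Integrated likelihood = prior mean of the likelihood values.
p_D = L.mean()
print(post_mean, post_var, p_D)
```

The effective sample size $1 / \sum_i w_i^2$ of the weights quantifies the inefficiency discussed on the next slide.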
11 Monte Carlo methods
This trivial route has problems since, usually, the posterior is concentrated with respect to the prior. The rate of success of this MC strategy is very small, i.e. most of the sampled $x^{(i)}$'s have very small likelihood values, implying that all inference is dominated by a few points with large likelihood values and that the algorithm converges very slowly.
12 Monte Carlo methods
Other methods have to be applied, Gibbs sampling and Metropolis-Hastings (MCMC) being the most advisable. With samples $x^{(i)}$ drawn from the posterior, posterior expectations become plain averages,
$$E(Y \mid D) \approx \frac{1}{N} \sum_{i=1}^{N} Y(x^{(i)}),$$
and the integrated likelihood can be estimated by the harmonic mean of the likelihood values,
$$\hat{p}(D) = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{L(x^{(i)}; D)} \right]^{-1}.$$
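A minimal random-walk Metropolis-Hastings sketch, reusing the same illustrative model and likelihood as above (the proposal scale and starting point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(x):
    # Unnormalised log-posterior: Gaussian log-likelihood + flat prior
    # on [0, 1]^2 (all choices illustrative).
    if np.any(x < 0.0) or np.any(x > 1.0):
        return -np.inf
    y = x[0] * np.exp(-x[1])                   # hypothetical model
    return -0.5 * (y - 0.5) ** 2 / 0.01

N, step = 50_000, 0.1
chain = np.empty((N, 2))
x = np.array([0.5, 0.5])
lp = log_post(x)
for i in range(N):
    prop = x + step * rng.standard_normal(2)   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        x, lp = prop, lp_prop
    chain[i] = x

# With (approximate) posterior samples, posterior moments are plain averages.
burn = 5_000
Y = chain[burn:, 0] * np.exp(-chain[burn:, 1])
print(Y.mean(), Y.var())
```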
13 Generalised Likelihood Uncertainty Estimation (GLUE)
GLUE can be seen as an extension of RSA or as a simplification of rigorous Bayesian approaches.
Beven K.J., Binley A., The future of distributed models: model calibration and uncertainty prediction, Hydrological Processes, 6, 279-298, 1992.
Romanowicz R., Beven K., Tawn J., Evaluation of predictive uncertainty in nonlinear hydrological models using a Bayesian approach. In: Statistics for the Environment 2, Water Related Issues, Barnett V., Turkman F. (eds), pp. 297-315, Wiley, New York, 1994.
14 What is GLUE
[Flowchart: the data and the MC model runs produce the output Y(t); a weighting function conditions the runs on the data; the weighted runs give the uncertainty estimation.]
15 What is GLUE
Most typical weighting functions are inverse functions of the model fit, e.g.
$$w_i \propto \left( \frac{1}{\sigma_i^2} \right)^{N},$$
where $\sigma_i^2$ is the mean square error for the i-th run.
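A sketch of the GLUE weighting and predictive-uncertainty step, assuming the Monte Carlo runs and their mean square errors are already available (all names and data below are illustrative):

```python
import numpy as np

def glue_quantiles(y_runs, mse, N=1, q=(0.05, 0.95)):
    """GLUE predictive quantiles at one time point.

    y_runs : model outputs of the MC runs
    mse    : mean square error sigma_i^2 of each run vs. the data
    N      : exponent of the (1/sigma_i^2)**N weighting function
    """
    w = (1.0 / mse) ** N
    w = w / w.sum()                    # normalised weights
    order = np.argsort(y_runs)         # rank runs on the output value
    cdf = np.cumsum(w[order])          # weighted cumulative distribution
    return np.interp(q, cdf, y_runs[order])

# Illustrative use with synthetic runs and errors.
rng = np.random.default_rng(2)
y_runs = rng.normal(1.0, 0.3, size=1000)
mse = (y_runs - 1.1) ** 2 + 0.01
print(glue_quantiles(y_runs, mse))
```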
16 GLUE vs. RSA
- GLUE works with multiple sets of factors (via MC sampling).
- Model runs are weighted and ranked on a likelihood scale via conditioning on the observations, and the weights are used to formulate a cumulative distribution of predictions.
Compared with RSA, no on-off filtering is made; instead, a smooth function is used to evaluate the behaviour of the simulations.
17 GLUE vs. Bayesian methods
There are two main problematic aspects with GLUE:
- The definition of the weighting function is a fundamental aspect, and the uncertainty prediction can strongly depend on that definition.
- In a Bayesian framework, this is connected to how errors in the observations and in the model structure are represented by a statistical model; in GLUE, the 'qualitative' definition of the weights, based essentially on an inverse relationship to the mean square error, makes the procedure easier and more flexible, but a bit ambiguous.
18 GLUE vs. Bayesian methods
- The sampling strategy of GLUE has very poor efficiency properties, which can undermine the statistical significance of the GLUE inference.
- The use of importance sampling could be a first step towards improving efficiency, without introducing too much complication into the methodology (e.g. Young and Romanowicz, 2003).
19 A role for SA?
The aim of applying sensitivity analysis in the Bayesian framework is to address the problem of describing the acceptable parameter structure, i.e. to characterise the parameter calibration/estimation problem.
As in RSA, complex interaction structures are typical: complex, structured factor subsets having similar likelihood of being simulators of reality are usually identified.
20 A role for SA?
There are at least two possibilities: performing a sensitivity analysis on the model output itself, or on the likelihood. The calibration issue is addressed by performing a sensitivity analysis of the likelihood (or of the weighting function for GLUE) (Ratto et al., 2001).
21 A role for SA?
When trying to map the output uncertainty back to the input factors, as in RSA:
- conventional multivariate statistics (PCA) is not very revealing (Spear et al., 1994);
- complex parametric interactions do not become evident from looking at univariate marginal probability densities (Spear, 1997).
22 GSA in Bayesian analysis
The likelihood poses methodological issues for SA:
- non-monotonic input-output mapping;
- high level of interaction between input factors.
→ variance-based Global Sensitivity Analysis (GSA): quantitative methods, without restrictions on monotonicity or additivity of the model.
23 Why global SA?
- Global SA of the likelihood means decomposing the variance of the likelihood, i.e. we are asking ourselves:
- which parameters drive most of the variation of the likelihood?
- Does the analysis of the variance give information on which model parameters mainly drive the goodness of the model?
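For independent factors, the variance of the likelihood admits the usual variance (Sobol') decomposition, from which the sensitivity indices follow:
$$V(L) = \sum_i V_i + \sum_{i<j} V_{ij} + \dots, \qquad S_i = \frac{V_i}{V(L)} = \frac{V_{X_i}\!\left[ E(L \mid X_i) \right]}{V(L)}.$$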
24 Why global SA?
- The sensitivity index of the likelihood, $S_i = V_{X_i}[E(L \mid X_i)] / V(L)$,
- measures the variation of the univariate function $E(L \mid X_i)$
- around the unconditional mean $E(L)$, i.e. the integrated likelihood.
25 Why global SA?
- Note:
- decomposing the log-likelihood instead would allow one to factorise the likelihood!
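The reason is that an additive (ANOVA-like) decomposition of $\log L$ becomes a product under exponentiation:
$$\log L(X) = f_0 + \sum_i f_i(X_i) + \dots \;\Longrightarrow\; L(X) = e^{f_0} \prod_i e^{f_i(X_i)} \cdots$$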
26 Why global SA?
- We can call $E(L \mid X_i)$ the conditional integrated likelihood.
- If Xi has a significant main effect, there will be values of Xi for which $E(L \mid X_i)$ is significantly smaller than $E(L)$ and other values for which it is significantly larger than $E(L)$.
- If we were allowed to tune (or fix) the values of Xi, we would be able to make the integrated likelihood increase (i.e. increase the goodness of the model).
- The same holds for groups of factors, allowing interaction structures to be identified.
27 Why GSA?
- The determination of both main effects (as in other methods) and total effects allows a classification in terms of necessary and sufficient conditions:
- factors with a high main effect affect the model output singularly, independently of interactions;
- factors with a small main effect but a high total effect influence the model output mainly through interactions;
- factors with small main and total effects have a negligible effect on the model output and can be fixed or neglected (→ more powerful than RSA!).
28 Why GSA?
- Groups of factors having a high group effect affect the model output singularly, independently of interactions with parameters outside the group.
29 How do we apply GSA in Bayesian analysis?
We should do it in such a way that, with the same set of model runs,
- predictive uncertainty can be estimated,
- sensitivity indices can be computed, and
- the analysis of the interaction structure can be performed.
30 How do we apply SA in Bayesian analysis?
If we sample from the prior (as in GLUE), this is very simple: factors usually have independent prior distributions, so we can design Sobol' or FAST sampling strategies, as in the sketch below. In the general context, when a sample from the posterior is available, methods based on pattern recognition or smoothing can be applied. Moreover, the decomposition of the integrated likelihood has to deal with a harmonic mean estimator, implying specific methodological issues still under analysis.
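A minimal sketch of such a design for first-order and total indices when sampling from independent priors, using the Saltelli (2010) and Jansen (1999) estimators; the model function is a placeholder standing in for the likelihood/weighting function of the runs:

```python
import numpy as np

rng = np.random.default_rng(3)

def model(X):
    # Placeholder for the likelihood/weighting function evaluated on runs.
    return X[:, 0] * np.exp(-X[:, 1]) + 0.1 * X[:, 0] * X[:, 1]

k, N = 2, 100_000
A = rng.uniform(size=(N, k))          # two independent samples from the priors
B = rng.uniform(size=(N, k))
fA, fB = model(A), model(B)
V = np.var(np.concatenate([fA, fB]))  # unconditional variance

for i in range(k):
    ABi = A.copy()
    ABi[:, i] = B[:, i]               # radial design: swap column i only
    fABi = model(ABi)
    Si = np.mean(fB * (fABi - fA)) / V          # first-order index
    STi = 0.5 * np.mean((fA - fABi) ** 2) / V   # total index
    print(f"factor {i}: S_i = {Si:.3f}, S_Ti = {STi:.3f}")
```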
31 A chemical system
We consider an isothermal first-order irreversible reaction in a batch system, A → B.
32 A chemical system
Model solution: $y_B(t) = 1 - (1 - y_{B,0})\, e^{-kt}$, with Arrhenius rate constant $k = k_\infty e^{-E/RT}$.
Model parameters: the kinetic factors $k_\infty$ (pre-exponential factor) and $E$ (activation energy), and the initial condition $y_{B,0}$.
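A short sketch of this solution under the assumptions above (Arrhenius kinetics; parameter values purely illustrative):

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def y_B(t, k_inf, E, T, yB0=0.0):
    """First-order irreversible A -> B in an isothermal batch reactor:
    dyB/dt = k (1 - yB), with Arrhenius rate k = k_inf * exp(-E / (R T))."""
    k = k_inf * np.exp(-E / (R * T))
    return 1.0 - (1.0 - yB0) * np.exp(-k * t)

# Illustrative evaluation on a time grid at one temperature.
t = np.linspace(0.0, 10.0, 50)
print(y_B(t, k_inf=1e3, E=2e4, T=350.0)[:5])
```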
33 A chemical system
Input factors: the kinetic factors $k_\infty$ and $E$, and the initial condition.
Outputs: 1) yB(t) (time-dependent output); 2) a weighting function, as defined in GLUE.
34 A chemical system
Weighting function: $w_i \propto (1/\sigma_i^2)^N$, where $\sigma_i^2$ is the mean square error for the i-th run.
35 A chemical system
[Figure: experimental time series and 5% and 95% confidence bounds for the time-dependent output yB.]
36 A chemical system
Sensitivity of yB(t):
→ main effects sum up to 1 (small interaction);
→ kinetics mainly dominates the process.
37 A chemical system
Sensitivity of the weights $1/\sigma^2$:
→ no predominance in main effects;
→ kinetic parameters have a big interaction.
38 A chemical system
Sensitivity of the weights (likelihood): there is a relationship between
- the difference between total and first-order sensitivity indices,
- the indeterminacy of the optimisation (estimation) problem, and
- the interaction structure of the input factors in the posterior distribution after conditioning on the observations.
39 A chemical system
Scatter plots → main-effect sensitivity indices → RSA.
40 A chemical system
- The initial condition is unimportant (smallest STi).
- The chemical rate factors mainly drive the model fit to the experimental data (highest Si and STi).
- BUT the chemical rate factors cannot be precisely estimated, because their effect is dominated by interaction terms.
- A high correlation between the estimates of $k_\infty$ and $E$ is usually detected.
41 A chemical system
Correlation analysis: the correlation coefficient between the kinetic parameters is positive, i.e. the couple of factors acts in the model as a quotient/difference (an increase in $E$ can be compensated by an increase in $k_\infty$, since the rate depends on $\log k_\infty - E/RT$).
42 A chemical system
Let us now consider the same chemical system, but
assume that 9 sets of observations are available
at 9 different temperatures.
43 A chemical system
[Figure: sensitivity indices and correlation analysis for the nine-temperature data sets.]
44 A chemical system
Comparing the same model with different experimental evidence:
- the interaction between the kinetic factors is strongly reduced;
- the correlation structure is now very weak;
- a model can be over-parameterised or not according to the evidence with which it is compared, implying that the calibration and sensitivity analysis exercise is useful even when the same mathematical model is used to describe different realities.
45 Caveats
Applying a global SA to the weighting/likelihood functions gives the full interaction structure, which obviously is not the same as a functional representation of it. The latter is something additional with respect to a global SA and can in some cases be a formidable task: it usually requires the use of computationally intensive methods and/or the formulation of hypotheses about the interaction structure, introducing a certain degree of arbitrariness into the representation.
46 Caveats
- On the other hand, the application of a global SA provides a quantitative evaluation of fundamental aspects such as:
- which factors are important for calibration, i.e. are somehow conditioned by the observations;
- the degree of complexity of the interaction structure;
- which factors are involved in the interaction structure.
- Such information has general validity, since it is obtained without assumptions about the model structure and/or the error structure.
47 Interaction
Possible approaches:
- correlation analysis of the posterior pdf (B subset);
- variance-based GSA;
- Spear's Tree-Structured Density Estimation technique;
- HDMR, cut-HDMR;
- Morris.