Ratto Lecture 1
Transcript and Presenter's Notes
1
Bayesian uncertainty estimation and global sensitivity analysis
M. Ratto, IPSC-JRC, European Commission
2
Bayesian estimation
Let us consider a given deterministic model that has to be estimated according to some set of observations. The analyst has a set of prior assumptions on the model factors X, expressed as prior distributions p(X). The posterior distribution of the model factors given the data set D is, applying the Bayes chain rule,
p(X | D) = p(D | X) p(X) / p(D) ∝ L(D | X) p(X)
3
Bayesian estimation
The likelihood function L(D | X) tells us how much the experimental data support the various possible statistical hypotheses on the model parameters. It is a scalar function of the model parameters: e.g. in a standard linear regression model where the variance of the error terms is known, the log-likelihood is proportional to the (negative) sum of squared errors. The likelihood is where the model structure and the data enter the problem.
4
Bayesian estimation
If Y is the model output of interest, its posterior mean and variance are
E(Y | D) = ∫ Y(X) p(X | D) dX,   V(Y | D) = ∫ [Y(X) − E(Y | D)]² p(X | D) dX.
Bayesian parameter estimates can be given in terms of the posterior mean and covariance of X in the same way.
5
Bayesian estimation
The integrated likelihood
p(D) = ∫ L(D | X) p(X) dX
is a measure of fit for the model.
6
Bayesian hypothesis testing
When a set of models Mh, h = 1, ..., H, is available, comparisons are made considering Bayes factors
B_hk = p(D | Mh) / p(D | Mk),
obtained by integrating (not maximising) the likelihood over the parameter space.
7
Bayesian Hypothesis Testing
The Bayes factor is a "summary of the evidence
provided by the data in favour of one scientific
theory, represented by a statistical model, as
opposed to another" (Kass and Raftery, 1995).
8
Bayesian estimation
Bayesian model averaging (BMA) is an approach to
modelling in which all possible sources of
uncertainty are taken into account (model
structures, model parameters and data
uncertainty) based on Bayesian theory.
9
Bayesian methods
  • Bayesian methods are an intuitively attractive solution to the problem of accounting for the different sources of uncertainty, but they present several difficulties.
  • The integrals involved can in general be hard to compute. Options are:
  • Monte Carlo methods (acceptance sampling, importance sampling), or better,
  • Markov chain Monte Carlo methods (Gibbs sampler, Metropolis-Hastings).

10
Monte Carlo methods
The most trivial way is to sample from the prior, x(i) ~ p(X), i = 1, ..., N, and weight each run by its likelihood:
E(Y | D) ≈ Σ_i Y(x(i)) L(D | x(i)) / Σ_i L(D | x(i))
11
Monte Carlo methods
This trivial route has problems since, usually, the posterior is concentrated with respect to the prior: the rate of success of this MC strategy is very small, i.e. most of the sampled x(i)'s have very small likelihood values, implying that all inference is dominated by a few points with large likelihood values and that the algorithm converges very slowly.
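This failure mode is easy to demonstrate. The sketch below is a hypothetical toy (a one-parameter linear model y = a·t with known error variance, all numbers made up): sampling from a wide prior and weighting each run by its likelihood does recover the posterior mean, but the effective sample size collapses to a tiny fraction of the runs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model y = a * t; data generated with a_true = 2
t = np.linspace(0.0, 1.0, 20)
data = 2.0 * t + rng.normal(0.0, 0.1, size=t.size)
sigma2 = 0.1 ** 2  # known error variance

# Sample from a (deliberately wide) prior
a_prior = rng.uniform(0.0, 10.0, size=100_000)

# Gaussian likelihood of each prior sample
sse = ((data - a_prior[:, None] * t) ** 2).sum(axis=1)
lik = np.exp(-0.5 * sse / sigma2)

# Posterior mean of Y = a as a likelihood-weighted average of prior samples
w = lik / lik.sum()
a_post_mean = (w * a_prior).sum()

# Effective sample size: how many points actually carry the inference
ess = 1.0 / (w ** 2).sum()
```

The estimate lands near the true value, yet the effective sample size is orders of magnitude below the number of runs, which is exactly the inefficiency described above.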
12
Monte Carlo methods
Other methods have to be applied, Gibbs and Metropolis (MCMC) being the most advisable. With samples x(i) drawn from the posterior,
E(Y | D) ≈ (1/N) Σ_i Y(x(i)),
and the integrated likelihood can be estimated, e.g. via the harmonic mean estimator
p(D) ≈ [ (1/N) Σ_i 1 / L(D | x(i)) ]⁻¹.
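A random-walk Metropolis sampler is only a few lines. The sketch below targets a hypothetical one-parameter posterior (a Gaussian log-likelihood with a flat prior on [0, 10]), chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical log-likelihood: Gaussian in a single parameter a
def log_lik(a):
    return -0.5 * (a - 2.0) ** 2 / 0.05 ** 2

# Flat prior on [0, 10]: log-posterior = log-likelihood inside the support
def log_post(a):
    return log_lik(a) if 0.0 <= a <= 10.0 else -np.inf

# Random-walk Metropolis: propose a jump, accept with prob min(1, p'/p)
a = 5.0
lp = log_post(a)
chain = []
for _ in range(20_000):
    a_new = a + rng.normal(0.0, 0.2)
    lp_new = log_post(a_new)
    if np.log(rng.uniform()) < lp_new - lp:
        a, lp = a_new, lp_new
    chain.append(a)

burned = np.array(chain[5_000:])   # discard burn-in
posterior_mean = burned.mean()
```

Unlike sampling from the prior, every retained draw sits in the high-likelihood region, so no weights are needed and posterior expectations are simple averages over the chain.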
13
Generalised Likelihood Uncertainty Estimation
(GLUE)
GLUE can be seen as an extension of RSA or as a simplification of rigorous Bayesian approaches.
Beven K.J., Binley A., The Future of Distributed Models: Model Calibration and Uncertainty Prediction, Hydrological Processes, 6, 279-298, 1992.
Romanowicz R., Beven K., Tawn J., Evaluation of predictive uncertainty in nonlinear hydrological models using a Bayesian approach. In: Statistics for the Environment 2, Water Related Issues, Barnett V., Turkman F. (eds), pp. 297-315, Wiley, New York, 1994.
14
What is GLUE
Flow diagram: MC model runs produce outputs Y(t); these are compared with the data through a weighting function, which feeds the uncertainty estimation.
15
What is GLUE
Most typical weighting functions take the form
w_i ∝ (σ_i²)^(−N),
where σ_i² is the mean square error for the i-th run.
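A full GLUE pass fits in a few lines. Everything in the sketch below is a hypothetical toy (model y = a·t, inverse-MSE weights with exponent N = 2): weight each Monte Carlo run by (σ_i²)^(−N), then read 5%/95% prediction bounds off the weighted cumulative distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: Monte Carlo runs of a toy model y = a * t vs. data
t = np.linspace(0.0, 1.0, 10)
data = 2.0 * t + rng.normal(0.0, 0.1, size=t.size)

a_samples = rng.uniform(0.0, 5.0, size=5_000)   # sample from the prior
sims = a_samples[:, None] * t                    # one row per MC run
mse = ((sims - data) ** 2).mean(axis=1)          # sigma_i^2 for each run

# GLUE-style weighting function: w_i proportional to (sigma_i^2)^(-N)
N = 2
w = mse ** (-N)
w /= w.sum()

# Uncertainty estimation: weighted 5% / 95% bounds of a prediction
pred = sims[:, -1]                               # model output at t = 1
order = np.argsort(pred)
cdf = np.cumsum(w[order])
lo = pred[order][np.searchsorted(cdf, 0.05)]
hi = pred[order][np.searchsorted(cdf, 0.95)]
```

The same weighted sample serves both purposes discussed later: the bounds (lo, hi) give the uncertainty estimation, and the weights are available for sensitivity analysis of the weighting function itself.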
16
GLUE vs. RSA
GLUE, like RSA:
- works with multiple sets of factors (via MC sampling);
- model runs are weighted and ranked on a likelihood scale via conditioning on the observations, and the weights are used to formulate a cumulative distribution of predictions.
Compared with RSA, no on-off filtering is made; instead, a smooth function is used to evaluate the behaviour of the simulations.
17
GLUE vs. Bayesian methods
  • Two main problematic aspects with GLUE:
  • the definition of the weighting function is a fundamental aspect, and the uncertainty prediction can strongly depend on that definition;
  • in a Bayesian framework, this is connected to how errors in the observations and in the model structure are represented by a statistical model; in GLUE, the 'qualitative' definition of the weights, based essentially on an inverse relationship to the mean square error, makes the procedure easier and more flexible, but a bit ambiguous.

18
GLUE vs. Bayesian methods
  • the sampling strategy of GLUE has very poor efficiency properties, which can make the statistical properties of the GLUE inference of limited significance;
  • the use of importance sampling could be a first step in the direction of improving efficiency, without introducing too much complication in the methodology (e.g. Young and Romanowicz, 2003).

19
A role for SA?
The aim of applying sensitivity analysis in the Bayesian framework is to address the problem of describing the acceptable parameter structure, i.e. to characterise the parameter calibration/estimation problem.
As in RSA, complex interaction structures are typical: complex, structured subsets of factors having similar likelihood of being simulators of reality are usually identified.
20
A role for SA?
There are at least two possibilities: performing a sensitivity analysis on the model output itself, or on the likelihood. The calibration issue is addressed by performing a sensitivity analysis of the likelihood (or of the weighting function for GLUE) (Ratto et al., 2001).
21
A role for SA?
  • When trying to map back the output uncertainty to the input factors, as for RSA:
  • conventional multivariate statistics (PCA) is not very revealing (Spear et al., 1994);
  • complex parametric interactions do not become evident from looking at univariate marginal probability densities (Spear, 1997).

22
GSA in Bayesian analysis
Methodological issues for SA of the likelihood:
  • non-monotonic input-output mapping;
  • high level of interaction between input factors.

→ variance-based Global Sensitivity Analysis (GSA): quantitative methods, without restrictions about monotonicity or additivity of the model.
23
Why global SA?
  • Global SA of the likelihood means decomposing the variance of the likelihood, i.e. we are asking ourselves:
  • which parameters drive most of the variation of the likelihood?
  • Does the analysis of the variance give information on which model parameters mainly drive the goodness of the model?

24
Why global SA?
  • Sensitivity index of the likelihood:
  • S_i = V[E(L | X_i)] / V(L)
  • measures the variation of the univariate function E(L | X_i) around the unconditional mean E(L), i.e. the integrated likelihood.
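The ratio V[E(L | X_i)]/V(L) can be estimated directly from a plain Monte Carlo sample by binning X_i and computing conditional means, a common post-processing trick. The likelihood surface below is a hypothetical stand-in chosen so that one factor clearly dominates:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical likelihood surface over two factors:
# L depends strongly on x1 and only weakly on x2
x = rng.uniform(0.0, 1.0, size=(50_000, 2))
L = np.exp(-8.0 * (x[:, 0] - 0.5) ** 2) + 0.1 * x[:, 1]

def main_effect(xi, y, bins=50):
    """S_i = V[E(y | X_i)] / V(y), estimated by binning X_i."""
    edges = np.quantile(xi, np.linspace(0.0, 1.0, bins + 1))
    idx = np.clip(np.digitize(xi, edges[1:-1]), 0, bins - 1)
    cond_means = np.array([y[idx == b].mean() for b in range(bins)])
    return cond_means.var() / y.var()

S1 = main_effect(x[:, 0], L)   # large: x1 drives the likelihood variance
S2 = main_effect(x[:, 1], L)   # small: x2 barely matters
```

The binned conditional means are exactly the "variation of the univariate function around the unconditional mean" described on the slide; here they flag x1 as the parameter driving the goodness of fit.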

25
Why global SA?
  • Note
  • Decomposing the log-likelihood instead would allow factorising the likelihood!

26
Why global SA?
  • We can call E(L | X_i) the conditional integrated likelihood.
  • If X_i has a main effect, there will be values of X_i for which E(L | X_i) is significantly smaller than the integrated likelihood E(L), and other values for which it is significantly larger.
  • If we were allowed to tune (or fix) the values of X_i, we would be able to let the integrated likelihood increase (i.e. increase the goodness of the model).
  • The same holds for groups of factors, allowing us to identify interaction structures.

27
Why GSA?
  • The determination of both main effects (as in other methods) and total effects allows a classification in terms of necessary and sufficient conditions:
  • factors with a high main effect affect the model output singularly, independently of interaction;
  • factors with a small main effect but a high total effect influence the model output mainly through interaction;
  • factors with small main and total effects have a negligible effect on the model output and can be fixed or neglected (→ more powerful than RSA!).
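This classification can be reproduced numerically. The sketch below uses the standard Saltelli two-matrix sampling scheme with the usual main- and total-effect estimators, applied to a hypothetical toy function f = x1 + x2·x3, where x1 acts alone and x2, x3 act only through interaction:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical toy model with interaction: f = x1 + x2 * x3
def f(x):
    return x[:, 0] + x[:, 1] * x[:, 2]

# Saltelli scheme: two independent sample matrices A and B
n, d = 100_000, 3
A = rng.uniform(-1.0, 1.0, size=(n, d))
B = rng.uniform(-1.0, 1.0, size=(n, d))
fA, fB = f(A), f(B)
V = np.concatenate([fA, fB]).var()

S, ST = [], []
for i in range(d):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                  # replace column i of A with B's
    fABi = f(ABi)
    S.append((fB * (fABi - fA)).mean() / V)          # main effect S_i
    ST.append(0.5 * ((fA - fABi) ** 2).mean() / V)   # total effect ST_i
```

For this toy, x1 has high main and total effect (it acts singularly), while x2 and x3 have near-zero main effect but a clearly positive total effect, i.e. they influence the output only through their interaction.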

28
Why GSA?
  • Groups of factors:
  • groups of factors having a high group effect affect the model output singularly, independently of interaction with parameters outside the group.

29
How do we apply GSA in Bayesian analysis?
We should do it in a way that, with the same set of model runs:
- predictive uncertainty can be estimated,
- sensitivity indices can be computed, and
- the analysis of the interaction structure can be performed.
30
How do we apply SA in Bayesian analysis?
If we sample from the prior (as in GLUE) this is very simple: factors usually have independent prior distributions, so we can design Sobol' or FAST sampling strategies. In a general context, when a sample from the posterior is available, methods based on pattern recognition and smoothing can be applied. Moreover, the decomposition of the integrated likelihood has to deal with a harmonic mean estimator, implying specific methodological issues still under analysis.
31
A chemical system
We consider an isothermal first-order irreversible reaction in a batch system, A → B.
32
A chemical system
Model solution:
y_B(t) = y_B(0) + y_A(0) [1 − exp(−k t)]
Model parameters: the kinetic rate follows the Arrhenius law, k = k∞ exp(−E/(R T)).
33
A chemical system
Input factors: k∞, E and the initial condition.
Outputs:
1) y_B(t) (time-dependent output)
2) a weighting function, as defined in GLUE
34
A chemical system
Weighting function:
w_i ∝ (σ_i²)^(−N),
where σ_i² is the mean square error for the i-th run.
35
A chemical system
Time-dependent output: experimental time series and 5% and 95% confidence bounds for the output y_B.
36
A chemical system
Sensitivity of y_B(t):
→ main effects sum up to 1 (small interaction)
→ the kinetics mainly dominates the process
37
A chemical system
Sensitivity of the weights 1/σ²:
→ no predominance in main effects
→ the kinetic parameters have strong interaction
38
A chemical system
Sensitivity of weights (likelihood)
  • There is a relationship between:
  • the difference between total and first-order sensitivity indices,
  • the indeterminacy of the optimisation (estimation) problem, and
  • the interaction structure of the input factors in the posterior distribution after conditioning on the observations.

39
A chemical system
Scatter plots → main effect sensitivity indices → RSA
40
A chemical system
  • the initial condition is unimportant (smallest ST);
  • the chemical rate factors mainly drive the model fit to the experimental data (highest Si and SiT);
  • BUT the chemical rate factors cannot be precisely estimated, because their effect is dominated by interaction terms;
  • high correlation between the estimates of k∞ and E is usually detected.

41
A chemical system
Correlation analysis: the correlation coefficient between the kinetic parameters is positive, i.e. the couple of factors acts in the model as a quotient/difference.
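The quotient-like behaviour of the kinetic couple can be seen directly from the Arrhenius law. The sketch below uses made-up numbers (k∞ is written k_inf): at a single experimental temperature, raising k∞ and E together leaves the rate k, and hence the fit, unchanged, while observations at a second temperature would separate the two parameter sets.

```python
import numpy as np

R, T = 8.314, 300.0          # gas constant, nominal temperature (made up)
t = np.linspace(0.0, 10.0, 50)

def yB(t, k_inf, E, T):
    """First-order irreversible A -> B with yA(0) = 1: yB = 1 - exp(-k t),
    with the Arrhenius rate k = k_inf * exp(-E / (R * T))."""
    k = k_inf * np.exp(-E / (R * T))
    return 1.0 - np.exp(-k * t)

# Two very different (k_inf, E) pairs tuned to the same k = 0.5 at T = 300 K
E1 = R * T * np.log(1e5 / 0.5)
E2 = R * T * np.log(1e7 / 0.5)

# At the experimental temperature the outputs are indistinguishable ...
gap_same_T = np.abs(yB(t, 1e5, E1, 300.0) - yB(t, 1e7, E2, 300.0)).max()

# ... so only the combination k is identified, not k_inf and E separately.
# Data at a second temperature breaks the compensation:
gap_other_T = np.abs(yB(t, 1e5, E1, 350.0) - yB(t, 1e7, E2, 350.0)).max()
```

This is exactly the interaction flagged by the sensitivity indices of the weights, and it anticipates why the multi-temperature data set considered next weakens the correlation.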
42
A chemical system
Let us now consider the same chemical system, but
assume that 9 sets of observations are available
at 9 different temperatures.
43
A chemical system
Sensitivity indices and correlation analysis.
44
A chemical system
  • Comparing the same model with different experimental evidence:
  • the interaction between the kinetic factors is strongly reduced;
  • the correlation structure is now very weak;
  • a model can be over-parameterised or not according to the evidence with which it is compared, implying that the calibration and sensitivity analysis exercise is useful also when the same mathematical model is used to describe different realities.

45
Caveats
Applying global SA to the weighting/likelihood function gives a full interaction structure, which obviously is not the same as a functional representation of it. The latter is something additional with respect to the global SA itself and can in some cases be a formidable task: it usually requires computationally intensive methods and/or the formulation of hypotheses about the interaction structure, introducing a certain degree of arbitrariness into the representation.
46
Caveats
  • On the other hand, the application of a global SA provides a quantitative evaluation of fundamental aspects such as:
  • which factors are important for calibration, i.e. are somehow conditioned by the observations;
  • the degree of complexity of the interaction structure;
  • which factors are involved in the interaction structure.
  • Such information has general validity, since it is obtained without assumptions about the model structure and/or the error structure.

47
Interaction
  • Possible approaches:
  • correlation analysis of the posterior pdf (B subset)
  • variance-based GSA
  • Spear's Tree-Structured Density Estimation technique
  • HDMR, cut-HDMR
  • Morris