Title: WATER QUALITY PREDICTION FOR RIVER BASIN MANAGEMENT
1WATER QUALITY PREDICTION FOR RIVER BASIN
MANAGEMENT
Olli Malve, Finnish Environment Institute
Model identification, parameter estimation,
validation, prediction
2Cost-efficient implementation of the Water Framework
Directive (WFD) in the European Union
- The EU Water Framework Directive (WFD) was
adopted on 23 October 2000, with the following
key aims:
- to expand the scope of water protection to all
waters, surface waters and groundwater
- to achieve "a good status" for all waters by a
set deadline (2015)
- to implement water management based on river
basins
- to introduce a "combined approach" laying down
emission limit values and quality standards
- to involve citizens more closely
- to streamline the legislation
- to implement river basin management at
reasonable cost.
3Tight schedule of river basin planning (by 2009
according to the WFD).
- A river basin is managed as a natural
geographical and hydrological unit instead of
according to administrative or political
boundaries. Under the EU Water Framework
Directive a management plan needs to be
established for every river basin and updated
every six years.
- A river basin management plan is a detailed account
of how the objectives set for a river basin
(ecological status, quantitative status, chemical
status and protected area objectives) are to be
reached within the time scale required. The plan
should include the characteristics of the river
basin, a review of the impact of human activity
on the status of the water in the basin,
estimates of the effects of existing legislation,
the remaining "gap" to be closed in order to meet
these objectives and a set of measures designed
to fill that gap. Public participation is
essential, i.e. all interested parties should be
fully involved in the discussion of the
cost-effectiveness of the various possible
measures and in the preparation of the river
basin management plan as a whole.
4What is most demanding in planning and
decision making in river basin management?
- Predicting what is likely to happen
if we perform a certain management action, and
- planning and selecting actions which are likely
to attain the selected water quality standard with
a given probability.
5- The more accurate and precise the predictions, the
more efficient the management actions
- → no over- or under-design of actions.
6Decision making and learning from the experience
of taken actions and accumulating observations
Design of experiments and monitoring
Planning or approval of management actions
Water quality prediction
Scientific learning
Decision making
Testing of the attainment of management objectives
Implementation of management plans
Data collection
7Framework for scientific learning and decision
making through prediction
- Statistical inference and causal reasoning
Structural equations
Differential equations
Frequentist
Bayesian
Regression equations
8Water quality prediction
- How can we predict water quality?
Theoretical understanding
Observational data
Water quality prediction
Causal structure
Predictive model
Water body
Experimental data
Model parameters
Not available in full scale
9How can water quality prediction be performed
efficiently and used in planning and decision
making in river basin management?
MANAGEMENT
Selection of management objectives
→
Planning and selection of management actions
→
Water quality prediction
→
Attainment of management goals
10Connections between management, data collection
and prediction
11Basic elements of river basin management
Sustainable use and management and good
ecological status
Primary objective
Attainment of water quality standards
Secondary objective
Selection of feasible management actions
Decision
Design of management actions
Planning
Targeting of pollutant load reduction
Prediction
Set up of water quality standards and acceptable
probability of exceedance.
Decision
12Problems in water quality prediction and river
basin management
Inefficient and biased river basin management
Biased target pollutant load estimates
Large number of lakes, rivers.
Inefficient fitting, validation and prediction
Update of river basin plans every six years
Difficulty of coding, debugging, fitting and
validation
Biased predictions and unrealistic error
estimates
Large number of unknown parameters
Long simulation time
Imprecise parameter estimates
Approximate error estimates
Small longitudinal sample size
Large model errors
Complexity of mechanistic models
13To overcome the difficulties we need efficient
prediction methods
Efficient, precise and unbiased update of river
basin plans every six years
Management objective
Accuracy and precision
Ease of update
Realistic error estimates
Criterion of prediction
HIERARCHICAL model structure
Developed Bayesian methods
Bayesian inference and MCMC methods
Pooling of cross-sectional data
Synthesis of mechanistic and statistical
approaches
Objectives of this study
Mechanistic models
Statistical models
Traditional methods
16Adaptive management procedure (two update cycles)
Design of experiments and monitoring
Planning or approval of management actions
Water quality prediction
1. Update of predictions
2. Update of management plans
Testing of the attainment of management objectives
Implementation of management plans
Data collection
Decision making
Scientific learning
17Prediction methods for statistical decision making
- Different approaches to causal reasoning and
probability estimation
- Bayesian inference and MCMC methods
- Structural Equation Models (SEM)
- Hierarchical linear model
- Logistic model
- Mass balance models
- Differential equation models
- Bayes nets and decision trees (Decision making
tools)
18Classification of prediction methods
- 1. Mechanistic / statistical modelling
- causal structure: mechanistic model
- predictive error: statistical model
- 2. Classical / Bayesian statistical analysis
- point estimate: classical
- full distribution: Bayes
- 3. Cross-sectional / longitudinal data
- several lakes: cross-sectional
- one lake: longitudinal
- 4. Hierarchical / non-hierarchical model
- lakes within lake types: hierarchical
19Classification of prediction methods
Modeling approach
Mechanistic
Statistical
Bayesian
Model structure
Single level
Hierarchical (multilevel)
Scientific discipline
Hydrological
Biological
Chemical
Orientation of data
Longitudinal
Cross-sectional
Mixture
20Why estimate the uncertainty of model parameters
and predictions?
1. Experiment design (research, scientific learning)
→ selection of variables to measure, timing of samples,
determination of the accuracy and number of
measurements in order to minimize the uncertainty of
the model
2. Decision making
→ statistical inference of results
→ realistic foundation for decision making in water
resources management
→ optimal water protection measures
21Causes of model uncertainty
[Figure: observations (x) plotted against y]
1. Inaccurate measurements
2. Low number of measurements
3. Unfavorable timing or location of measurements
4. Stochastic variation
5. Model structure (unidentifiability of model
parameters)
6. etc.
22How have the accuracy of measurements, the uncertainty
of models and the stochastic behavior of phenomena
been taken into account in water resources
research?
Physics, hydrology, hydraulics (hard modelling)
- determinism
- differential equation models
- sensitivity analysis
Biology, limnology, hydrology (soft modelling)
- statistical inference
- regression analysis
23How to integrate hard and soft modelling
methodically?
- to combine the strengths of both:
hard: deterministic (explanatory) modelling of
reaction kinetics and of mass and energy balances
soft: statistical inference and decision making
with incomplete information
- to promote co-operation between
environmental sciences
24Bayesian statistical inference with Markov chain
Monte Carlo (MCMC) sampling
Parameters of a deterministic differential
equation model are treated as random
variables, which have certain statistical
attributes (mean, standard deviation).
25Using Bayesian inference with MCMC methods we
can combine prior information with new
measurements → posterior distribution!
New measurement
Posterior distribution
Prior information
The posterior can be calculated with MCMC methods
26After the posterior distributions of the parameters
have been calculated, credible intervals of predictions
can be calculated with Monte Carlo sampling
[Figure: observations (x) plotted against y, with the prediction and its credible intervals]
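The Monte Carlo step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the linear model, the function names and the randomly generated stand-in for posterior draws are all hypothetical, and a normal model error is assumed.

```python
import random


def predictive_interval(param_samples, sigma_samples, model, x,
                        lower=0.05, upper=0.95, seed=1):
    """Empirical credible interval of a new observation at x.

    For every posterior draw of the parameters the model prediction is
    computed and a normally distributed model error is added; the
    interval is then read from the sorted simulated values.
    """
    rng = random.Random(seed)
    sims = sorted(
        model(theta, x) + rng.gauss(0.0, sigma)
        for theta, sigma in zip(param_samples, sigma_samples)
    )
    lo = sims[int(lower * (len(sims) - 1))]
    hi = sims[int(upper * (len(sims) - 1))]
    return lo, hi


def linear_model(theta, x):
    # Hypothetical model y = a + b * x, used only for illustration
    a, b = theta
    return a + b * x


# Hypothetical stand-in for MCMC output (parameter draws and error std devs)
rng = random.Random(0)
params = [(rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.05)) for _ in range(5000)]
sigmas = [abs(rng.gauss(0.5, 0.05)) for _ in range(5000)]

low, high = predictive_interval(params, sigmas, linear_model, x=3.0)
print(low, high)
```

Dropping the model error term from the simulation would give the narrower band that reflects parameter uncertainty alone.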
27Parameters of complicated environmental models
are seldom identifiable (parameters are
correlated)
Variance of parameters (uncertainty) is great!
Posterior-distribution of a parameter
Confidence limits of predictions are broad!
Confidence limits of a prediction
28BAYESIAN INFERENCE USING MCMC SAMPLING ALGORITHMS
- Model error and parameter uncertainty can be
estimated using Bayesian inference and MCMC
sampling techniques
- → full statistical distribution of predictions
- → realistic design of margin of safety of
management plans
- → no over- or under-design of management actions
- → cost-effective implementation of management
29Bayesian posterior predictive inference
Target pollutant load estimate
Model validation and update
Posterior predictive distributions
New observations
Posterior simulation
Posterior distributions of parameters
Water quality standards
MCMC sampling Fitting of the model
Mechanistic model of processes which produce data
Observations
Prior distributions for model parameters
Statistical model of parameters and model error
30Bayesian inference and MCMC sampling methods
- $P(m)$: prior, information about parameters and
model structure before observations
- $P(d \mid m)$: likelihood function, probability of
the data given the model and error distribution
- $P(m \mid d)$: posterior, probability of the model
given the data; information about parameters and
model structure after observations
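These three quantities are linked by Bayes' theorem. The normalising constant $P(d)$ is not named on the slide and is written out here only for completeness:

```latex
P(m \mid d) = \frac{P(d \mid m)\, P(m)}{P(d)},
\qquad
P(d) = \int P(d \mid m)\, P(m)\, dm
```

Because $P(d)$ does not depend on $m$, sampling methods such as MCMC only need the product $P(d \mid m)\, P(m)$.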
31Frequentist (classical) inference
- Fitting of a nonlinear model to data
- Observation = model + error
- Fitting with least squares methods
- → a parameter is an unknown constant that is to be
estimated. Its confidence interval, which is the
measure of precision, is evaluated using a linear
approximation
32What distinguishes Bayesian inference from frequentist inference
- Formally takes into account all uncertainties
in inference
- Errors in x variables can be taken into account
- Inclusion of prior information
- Efficient numerical sampling algorithms, MCMC
- Predictive distributions
- Posterior distribution includes all the
information necessary for decision making
33MCMC
- The Markov chain Monte Carlo method is used for
sampling the posterior distribution of the parameter
vector $\theta$ if an analytical solution is hard to find
(as for all complicated models).
- The Metropolis-Hastings and Gibbs samplers are the
most commonly used.
34Steps of Bayesian inference and MCMC sampling
- Definition of the model: physical/empirical, error
distribution, parameters, control variables.
- Observational or experimental design and data
collection.
- Definition of prior distributions, e.g. uninformative,
normal, uniform
- Model fitting using MCMC sampling methods
- Post-processing of the MCMC chain: density plots,
credible intervals, predictive distributions
- Testing of normality of model error and
comparison of competing models
35Metropolis-Hastings algorithm
- (i) Select an initial value $\theta_0$ and a proposal
distribution $q$ for the parameter vector $\theta$.
- (ii) Take a random sample $\theta^*$ from the proposal
distribution $q$.
- (iii) Take a random sample $u$ from the uniform
distribution on $[0, 1]$ and accept the new sample,
i.e. set $\theta_{i+1} = \theta^*$, if $u$ is smaller than
the Metropolis-Hastings acceptance ratio.
- Otherwise $\theta_{i+1} = \theta_i$.
- (iv) Go back to step (ii) until enough
samples are generated.
- Difficulty: the proposal distribution must be close
enough to the true posterior distribution.
Usually it is a normal distribution.
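Steps (i) to (iv) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the study: it handles a single parameter, uses a normal random-walk proposal (which is symmetric, so the proposal densities cancel and the method reduces to the Metropolis special case), and the function and argument names are hypothetical.

```python
import math
import random


def metropolis_hastings(log_post, theta0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional parameter.

    log_post returns the logarithm of the unnormalised posterior density.
    """
    rng = random.Random(seed)
    theta = theta0                      # step (i): initial value
    lp = log_post(theta)
    chain = []
    for _ in range(n_samples):
        cand = rng.gauss(theta, step)   # step (ii): sample from the proposal
        lp_cand = log_post(cand)
        u = rng.random()                # step (iii): u ~ U(0, 1)
        # Accept if u is below the posterior ratio (capped at 1)
        if u < math.exp(min(0.0, lp_cand - lp)):
            theta, lp = cand, lp_cand
        chain.append(theta)             # a rejected proposal repeats theta_i
    return chain                        # step (iv): loop until enough samples


# Usage: sample a normal posterior with mean 2 and standard deviation 1
chain = metropolis_hastings(lambda t: -0.5 * (t - 2.0) ** 2,
                            theta0=0.0, step=1.0, n_samples=20000)
kept = chain[2000:]                     # discard the burn-in period
post_mean = sum(kept) / len(kept)
print(post_mean)
```

The step size plays the role of the proposal width mentioned above: too small and the chain moves slowly, too large and most proposals are rejected.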
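36MH algorithm for a nonlinear model with normal
error distribution
- The likelihood function is now that of independent
normal errors with variance $\sigma^2$.
- MH algorithm with non-informative prior
distribution $p(\theta) = 1$ and $\sigma^2$
- (i) Select $\theta_0$ and $q$.
- (ii) Sample $\theta_{new}$ from the proposal distribution
$q(\theta_{old}, \cdot)$ and calculate the sum of squares $SS_{new}$.
- Accept the new sample if $SS_{new} < SS_{old}$ or if a
uniform random number $u$ is smaller than the likelihood
ratio of the new and old samples.
- (iii) Go back to step (ii) until enough
samples are generated.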
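Under the assumptions named on this slide (independent normal errors with a given variance $\sigma^2$, a flat prior $p(\theta) = 1$) and assuming additionally a symmetric proposal distribution, the standard forms of the likelihood and of the acceptance rule are as follows. The symbols $f$, $x_i$, $y_i$ and $n$ are notation introduced here for the model, the inputs, the observations and their number.

```latex
L(\theta) \propto \exp\left(-\frac{SS(\theta)}{2\sigma^2}\right),
\qquad
SS(\theta) = \sum_{i=1}^{n} \bigl(y_i - f(x_i, \theta)\bigr)^2

\text{accept } \theta_{new} \text{ if } SS_{new} < SS_{old}
\text{ or } u < \exp\left(-\frac{SS_{new} - SS_{old}}{2\sigma^2}\right),
\qquad u \sim U(0, 1)
```

The second condition is the ratio $L(\theta_{new}) / L(\theta_{old})$, so a sample that fits worse is still accepted occasionally, which is what lets the chain explore the whole posterior rather than converge to the least squares point.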
37MARKOV CHAIN MONTE CARLO (MCMC) SAMPLING METHOD
FOR PARAMETER ESTIMATION
- Estimation of exact distributions for parameters
and predictions
- Parameter uncertainty (dark area) of predictions
and model error $\varepsilon$ (light area) in
$y = f(\theta, x) + \varepsilon$ can be distinguished.
- Is also applicable to dynamic and nonlinear
oxygen and phytoplankton models.
38Usage of predictive distributions in planning and
decision making
[Figure: predicted concentration of contaminant as a function of load of contaminant, with a given margin of safety f(probability of exceedance); the water quality standard, the 50 % mechanistic prediction, the target load and the margin of safety of the target load are marked]
[Figure: predicted chlorophyll a response surface as a function of nitrogen load and phosphorus load; the chlorophyll a standard and the target nitrogen and phosphorus loads are marked]
- A mechanistic prediction corresponds to a 50 %
probability of exceedance → the target load is large
and it is very likely that the water quality standard
will not be attained.
- The decision maker does not acknowledge the
uncertainty of a water quality prediction → the risk
of non-attainment of the water quality standard is out
of control.
- Use of artificial safety margins → failure to attain
the standard or unnecessarily high cost.
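The difference between a target load read from a point prediction and one chosen for an accepted probability of exceedance can be sketched as follows. This is a hedged illustration only: the linear load-response model, its coefficients, the candidate loads and the standard of 10 units are hypothetical and are not taken from the study.

```python
import random


def exceedance_probability(samples, standard):
    """Fraction of predictive samples above the water quality standard."""
    return sum(1 for s in samples if s > standard) / len(samples)


def target_load(simulate, loads, standard, accepted_prob):
    """Largest candidate load whose exceedance probability is acceptable."""
    feasible = [
        load for load in loads
        if exceedance_probability(simulate(load), standard) <= accepted_prob
    ]
    return max(feasible) if feasible else None


def simulate(load, n=4000, seed=7):
    # Hypothetical predictive model: concentration = k * load + model error,
    # with uncertainty in the coefficient k and a normal model error
    rng = random.Random(seed)
    return [rng.gauss(0.2, 0.02) * load + rng.gauss(0.0, 1.0) for _ in range(n)]


loads = range(0, 101, 5)
# Accepting a 50 % exceedance probability mimics a point (mechanistic) prediction
mechanistic = target_load(simulate, loads, standard=10.0, accepted_prob=0.5)
# Accepting only a 5 % exceedance probability builds in a margin of safety
cautious = target_load(simulate, loads, standard=10.0, accepted_prob=0.05)
print(mechanistic, cautious)
```

The gap between the two loads is the margin of safety, and here it follows from the predictive distribution instead of being set arbitrarily.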
39Structural Equation Model (SEM)
- Tests whether theoretical hypotheses about causal
relationships fit empirical data.
- Correct causal structure → correct parameter
estimates → better predictions
40Finnish lakes
41LAKE PYHÄJÄRVI in SÄKYLÄ: research model
Planktiv = planktivorous fish; Z = zooplankton
(Crustacea); A3 = Cyanobacteria; TP = total
phosphorus; TN = total nitrogen
42LAKE PYHÄJÄRVI in SÄKYLÄ: management model
Planktiv = planktivorous fish; A3 =
Cyanobacteria; TPL = total phosphorus load; TNL =
total nitrogen load
43Lakes in the lake monitoring network of the Finnish
Environment Institute: targeting of nutrient
reduction to attain the chlorophyll a standard.
44Table. Geomorphological typology of Finnish lakes
specified by the Finnish Environment Institute
(SA = surface area, D = depth).
Type  Name                                 Characteristics
I     Large, non-humic lakes               SA > 4,000 ha, color < 30
II    Large, humic lakes                   SA > 4,000 ha, color > 30
III   Medium and small, non-humic lakes    SA 50-4,000 ha, color < 30
IV    Medium area, humic, deep lakes       SA 500-4,000 ha, color 30-90, D > 3 m
V     Small, humic, deep lakes             SA 50-500 ha, color 30-90, D > 3 m
VI    Deep, very humic lakes               Color > 90, D > 3 m
VII   Shallow, non-humic lakes             Color < 30, D < 3 m
VIII  Shallow, humic lakes                 Color 30-90, D < 3 m
IX    Shallow, very humic lakes            Color > 90, D < 3 m
45Objectives of river basin planning of Finnish
Lakes
Primary management objectives
High utility for water uses
Good ecological status
Attainment of chlorophyll a standard
Secondary management objectives
Reduction of nutrient concentrations in lake
Management actions
Reduction of nutrient load
Chlorophyll a < 30 ug/l
Water quality standard
46Computational prediction tools for Finnish lakes
-Lake specific chlorophyll a -Target nutrient
concentration
-Parameter posterior distributions -Error
variances
Predictions
MCMC sampling
Bayesian posterior predictive inference method
Prior parameter values and data
HIERARCHICAL linear regression model
Statistical model
-Nutrient concentrations -Chlorophyll a
standard -Acceptable probability of exceedance of
chlorophyll a standard
Decision variables
47Hierarchical chlorophyll a data
- A hierarchical data structure arises if
individual lakes are sampled cross-sectionally
but studied longitudinally.
- E.g. lakes within lake types, or lake types
within ecoregions, or ecoregions within
continents (Malve and Qian 2006).
Ecoregion
Laketype 2
Laketype 1
Laketype N
Laketype 3
Cross-sectional direction
Lake 1
Lake 2
Lake 3
Longitudinal direction
48Hierarchical linear chlorophyll a model
[DAG diagram with nodes $\beta$, $\sigma^2$, $\beta_i$, $\sigma_i^2$, $\tau$, $\beta_{ij}$, $y_{ijk}$, $x_{ijk}$]
49Partial pooling in predicting chlorophyll a in a
lake with few observations per lake
[Figure: Log(Chla) plotted against x; interclass correlation]
- Instead of predicting from the lake alone or from the
lake type alone, a hierarchical model partially pools
(as an average weighted by the number of observations)
information from those two populations.
- If the interclass correlation is high (1) → pooling is
complete, i.e. the prediction is based on the lake type
population
- If the interclass correlation is low (0) → no pooling,
i.e. the prediction is based on the lake
- If the interclass correlation is between 0 and 1
(partial pooling) → weighted average of those two
populations.
- → If the number of observations per lake is small,
partially pooled predictions are more precise and
accurate than completely pooled or unpooled predictions
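The weighting can be illustrated with the usual precision-weighted average of multilevel models. This is a simplified sketch, not the hierarchical model fitted in the study: the within-lake and between-lake variances are assumed to be known, and all names and numbers are hypothetical.

```python
def partial_pool(lake_mean, n_obs, type_mean, sigma2_within, sigma2_between):
    """Precision-weighted compromise between a lake mean and its lake-type mean.

    sigma2_within  : variance of observations within a lake (assumed known)
    sigma2_between : variance of lake means within a lake type (assumed known)
    """
    if n_obs == 0:
        return type_mean                 # no data: fall back on the lake type
    w_lake = n_obs / sigma2_within       # precision of the lake's own mean
    w_type = 1.0 / sigma2_between        # precision of the lake-type information
    return (w_lake * lake_mean + w_type * type_mean) / (w_lake + w_type)


# A lake with three observations is pulled strongly towards its lake type ...
few = partial_pool(lake_mean=3.0, n_obs=3, type_mean=2.0,
                   sigma2_within=1.0, sigma2_between=0.25)
# ... whereas a lake with 300 observations keeps almost its own mean
many = partial_pool(lake_mean=3.0, n_obs=300, type_mean=2.0,
                    sigma2_within=1.0, sigma2_between=0.25)
print(few, many)
```

In a full Bayesian fit the two variances are estimated together with the means, so the amount of pooling is learned from the data.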
50- Most of the lakes have been observed only a few times
→ lake-type-specific Chla regression models may be
inaccurate and imprecise at the lake level
Lake Onkijärvi: three observations
51Hierarchical linear model
- Partial pooling of information from different
levels of hierarchy increases accuracy and
precision of lake specific chlorophyll a
predictions
Lake Onkijärvi three observations
52- Observed nutrient concentration ranges don't
always cover targeted water quality criteria
- → the lake-specific Chla regression model is not
useful in water quality prediction.
- Due to the partial pooling of information from the
lake-type level, a hierarchical chlorophyll a
regression model can "extrapolate" outside the
lake-specific observational range.
- This is useful if a lake has been observed within a
eutrophic region and the management target is
somewhere in a mesotrophic region.
Päijänne
53- The full statistical distribution of parameters
and predictions can be estimated using Markov
chain Monte Carlo sampling methods and Bayesian
inference (Gelman et al. 2005). The freely available
OpenBUGS software (http://mathstat.helsinki.fi/openbugs/)
is useful in estimation.
54Logistic regression model
- A phytoplankton bloom in a lake is a binary outcome
with a binomial distribution: 1 = bloom and 0 = no bloom.
- The probability of a bloom, $p_i = \Pr(y_i = 1)$, is
predicted using the logistic regression model
$\log\bigl(p_i / (1 - p_i)\bigr) = X_i \beta$
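Solving the logit equation for the probability gives the inverse logit, which can be sketched as below. The predictor (an intercept plus one covariate) and the coefficient values are hypothetical and serve only to show the calculation.

```python
import math


def bloom_probability(x, beta):
    """Inverse logit of the linear predictor X_i * beta."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients: intercept and slope for one covariate
beta = [-4.0, 1.0]
p_low = bloom_probability([1.0, 2.0], beta)    # linear predictor = -2
p_mid = bloom_probability([1.0, 4.0], beta)    # linear predictor = 0
p_high = bloom_probability([1.0, 6.0], beta)   # linear predictor = 2
print(p_low, p_mid, p_high)
```

A linear predictor of zero corresponds to even odds, and the probability rises monotonically with the predictor while staying between 0 and 1.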
55Observed and predicted probability of a bloom in
clear water lakes in Finland
56MASS BALANCE MODELS
- are based on the law of conservation of mass: in a
lake, a nutrient cannot disappear or be created
from nothing
- dM = IN - OUT = dStorage + sedimentation
- Mass fluxes (IN, OUT) and storage change
(dStorage) can be measured.
- The sedimentation rate can be estimated using the
above-mentioned observations.
57Nutrient retention model using MCMC
Chapra's phosphorus model (1974):
$V \, dC/dt = L - Q C - v_s A_s C \;\Rightarrow\; C = L / (Q + v_s A_s)$
Estimated distribution of sedimentation rate
Lake phosphorus concentration vs. phosphorus load,
calculated and observed
80 % fractile of predicted phosphorus (20 %
probability of exceedance) as a function of load
Predicted distribution of phosphorus with given
load 0, 10, 20, ..., 150 kg/d
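The way a fractile of predicted phosphorus follows from the steady-state solution can be sketched as follows. This is an illustration under stated assumptions: the outflow, the lake area and the lognormal draws standing in for the MCMC sample of the settling velocity are hypothetical, and only parameter uncertainty is propagated (no model error term is added).

```python
import random


def steady_state_conc(load, outflow, settling_velocity, area):
    """Steady state of V dC/dt = L - Q C - vs As C, i.e. C = L / (Q + vs As)."""
    return load / (outflow + settling_velocity * area)


def fractile(values, p):
    """Empirical p-fractile of a list of values."""
    ordered = sorted(values)
    return ordered[int(p * (len(ordered) - 1))]


rng = random.Random(42)
# Hypothetical stand-in for posterior draws of the settling velocity vs (m/d)
VS_DRAWS = [rng.lognormvariate(-3.0, 0.3) for _ in range(4000)]
OUTFLOW = 5.0e5   # Q, m3/d (hypothetical)
AREA = 1.0e7      # As, m2 (hypothetical)


def conc_fractile(load_kg_per_day, p):
    """p-fractile of the predicted concentration in mg/m3 for a given load."""
    concs = [
        steady_state_conc(load_kg_per_day, OUTFLOW, vs, AREA) * 1.0e6
        for vs in VS_DRAWS
    ]
    return fractile(concs, p)


for load in (10.0, 50.0, 150.0):
    # The 80 % fractile is exceeded with a probability of 20 %
    print(load, round(conc_fractile(load, 0.80), 1))
```

Reading the load at which the 80 % fractile meets the concentration target gives a target load with a 20 % accepted probability of exceedance.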
58Usage of chlorophyll a regression model and
nutrient retention models in setting the target
nutrient loads with given chla standard and
accepted probability of exceedance.
Chlorophyll a prediction
Prediction with present load
Target nutrient concentrations and loads with
given chla target (5 ug/l)
Total nitrogen prediction
Total nitrogen load mg/m2/d
Target nutrient concentrations and loads with
given chla target (5 ug/l)
Total phosphorus prediction
Total phosphorus load mg/m2/d
59Differential equation (mechanistic) models
- Widely used in river basin management
- High computational costs and data requirements
- Model error and parameter uncertainty can be
estimated using Bayesian inference and MCMC
sampling techniques
- → full statistical distribution of predictions
- → realistic design of margin of safety of
management plans
- → no over- or under-design of management actions
- → cost-effective implementation of management
60Mechanistic modeling
- Limitations
- Error analysis is not straightforward
- High computational cost and data requirements.
61Case study 1
- Lappajärvi: how much is it necessary to reduce the
external phosphorus load into the lake to
prevent algal blooms?
62Objectives of River basin planning of Lake
Lappajärvi in Ähtävänjoki river basin
Small down stream nutrient flux
Good raw water for
High recreational value
Primary management objectives
Secondary management objectives
Attainment of Chlorophyll a standard
Reduction of non-point nutrient load
Management actions
Water quality standard
Chlorophyll a > 10 ug/l
63Computational prediction tools for Lake Lappajärvi
-Temperature stratification, -Thickness of
ice -Vertical mixing
-Chlorophyll a -Total phosphorus -Dissolved
oxygen -Target phosphorus load
Predictions
Statistical error analysis method
Not applied
PROBE model vertical mixing
WATER QUALITY model transformations and mass
balances
Mechanistic models
Decision variables
-Total phosphorus load -Chlorophyll a standard
64Targeting of phosphorus reduction using a
mechanistic water quality model
Chlorophyll a prediction with different loading
reductions
The best available protection measures (loading
level 4) are predicted to attain the chlorophyll a
standard (10 ug/l). → Target phosphorus load
65Case study 2
- River Kymijoki: setting the limits for dioxin
migration during the planned restoration
dredging.
66Objectives of the assessment of the effect of
restoration dredging on the migration of
contaminated sediments in River Kymijoki
Elimination of the exposure of people and organisms
Primary management objectives
Compliance with dioxin standard during and after
restoration dredging
Secondary management objectives
Management actions
Restoration dredging
Water quality standard
Sediment dioxin standard
67Computational prediction tools for River Kymijoki
Predictions
-Flow velocity -Water level
-Migration of dioxin -Dioxin in water and in
sediments -Criteria for restoration dredging
Statistical error analysis method
not applied (sensitivity analysis)
1-D RIVER FLOW model: de Saint-Venant equations,
longitudinal flow dynamics
1-D RIVER DISPERSION model Convection,
dispersion, sedimentation, erosion and sediment
consolidation
Mechanistic models
Decision variables
-Dredging methods -Dioxin standard
68Predicted dioxin content of suspended solids in
river water during the planned restoration
dredging if 10 % or 1 % of dredged sediment is
resuspended.
10 % will be resuspended
1 % will be resuspended
Schedule of dredging
Conclusion: If 1 % is resuspended, the dioxin
content of suspended solids in river water would
not exceed its normal variation. In contrast,
if 10 % is resuspended, the dioxin content in
river water would exceed its normal variation
many fold.
69Case study 3
- Lake Tuusulanjärvi: real time control of
artificial oxygenation devices
Years 2000-2005
70Objectives of design and real time control of
artificial oxygenation of Lake Tuusulanjärvi
High recreational value
Good catch of fish
Primary management objectives
Secondary management objectives
Reduction of algal blooms
Good fish habitats
Attainment of dissolved oxygen standard
Reduction of internal phosphorus load
Actions not included in this study
-Reduction of municipal waste water load
-Reduction of Non-point nutrient load
Artificial oxygenation of hypolimnion
Management actions
-Dilution of lake water with nutrient poor water
-Fisheries management
Water quality standard
Dissolved oxygen > 4 mg/l
71Computational prediction tools for Lake
Tuusulanjärvi
-Dissolved oxygen -Target oxygenation efficiency
-Parameter posterior distributions -Error
variances
Predictions
MCMC sampling
Bayesian posterior predictive inference method
Prior parameter values and data
RESPIRATION sub model Respiration and artificial
dissolved oxygen feed
Mechanistic model
-Artificial oxygenation efficiency -Dissolved
oxygen standard -Acceptable probability of
exceedance of dissolved oxygen standard
Decision variables
72Estimation of the oxygen depletion rate of Lake
Tuusulanjärvi during the winter ice cover period
- Data set: measured temperature and oxygen
concentrations during 1968-2000
Questions
1. Did the oxygen depletion rate decrease in 1979
after the end of waste water loading from the
city of Järvenpää?
2. What are the impacts of restoration measures
(artificial oxygenation pumping after 1973), and how
should the pumps be dimensioned?
3. How can oxygen depletion during winter be predicted?
Physical circumstances for oxygen depletion under ice
cover are very stable, so the oxygen depletion rate
can be estimated as a linear function of time or
as a nonlinear function of temperature and time.
The latter was chosen for didactic purposes.
[Figure: ice cover; oxygen depletion in the water and on the bottom (k); oxygenation pump]
76Prediction of dissolved oxygen regime and
dimensioning of artificial oxygenation
efficiency.
77Case study 4
- Lake Pyhäjärvi in Säkylä: nutrient load reduction
and fisheries management to attain the cyanobacteria
biomass standard.
Years 1992-2007
78Objectives of river basin planning of Lake
Pyhäjärvi
Small down stream nutrient flux
Good raw water for industry
High recreational value
Good catch of fish
Primary management objectives
Attainment of algal biomass standard
Secondary management objectives
Reduction of nutrient concentrations
Increase of zooplankton biomass
Reduction of non-point nutrient load
Fisheries management
Management actions
Algal biomass > 0.86 mg/l
Water quality standard
79Computational prediction tools for Lake Säkylän
Pyhäjärvi
Dominating algal groups -Diatomophyceae -Chrysophyceae
-N-fixing Cyanobacteria -Minor groups
-Parameter posterior distributions -Error
variances -Target nutrient and zooplankton
concentration
Predictions
Bayesian posterior predictive inference method
MCMC sampling
Prior parameter values and data
PHYTOPLANKTON SUB MODEL Growth, decay and grazing
Mechanistic model
-Nutrient concentrations and zooplankton
biomass -Algae biomass standard -Acceptable
probability of exceedance of algal biomass
standard
Decision variables
80Calibration data 1992-1999 and validation data
2000-2004
Diatomophyceae
Chrysophyceae
Cyanobacteria
Other groups
Global irradiance
Zooplankton
Discharge
81Phytoplankton model
82Model fit using Bayesian inference and MCMC
sampling methods: observations and predictive
distributions
[Figure: panels for Diatomophyceae, Chrysophyceae, Cyanobacteria and other groups, years 1992-1999]
- The 5-95 % credible interval of predictions consists
of
- Parameter uncertainty (dark area)
- Model error (light area)
83Prior and posterior distribution of parameters
Diatomophyceae
Chrysophyceae
Cyanobacteria
Other groups
Growth rate
Death respiration sedimentation
Temperature coefficient
Half saturation constant of global irradiance
Half saturation constant of TP
Half saturation constant of TN
Filtration rate of zooplankton
84POSTERIOR SIMULATION: Predicted probability of
exceedance of average Cyanobacteria biomass 0.85
mg L-1 as a function of TP and zooplankton (Z)
biomass
85Simulated Cyanobacteria predictions
(P(A > 0.86) = 0.05) as a function of total
phosphorus load, temperature and zooplankton
(Crustacea) biomass
Top-down control: Planktivorous
fish → Zooplankton → Cyanobacteria
86Summary
- 1. Mechanistic models
- High computational cost and data demands
- Predictive distributions can be calculated using
Bayesian inference and MCMC sampling methods
- Get the causal structure right (SEM) → reliable
predictions
- 2. Linear regression models are efficient tools
for statistical decision making in river basin
management
87- Predictive distributions → realistic and
efficient design of safety margins of management
plans.
- More on water quality prediction for river basin
management: http://lib.tkk.fi/Diss/2007/isbn9789512287505/