Title: BAYESIAN NETWORKS
1 USING BAYESIAN NETWORKS IN MODEL AND DATA INTEGRATION
AND DECISION MAKING IN RIVER BASIN MANAGEMENT
Consideration of opportunities for Bayes networks in predictive water quality modelling
Olli Malve (M. Sc.), Water Resources Research and Management
Citations from:
- Ames, D.P. & Neilson, B. (Utah Water Research Laboratory) 2001. Bayesian Decision Networks for total maximum daily load analysis: East Canyon Creek Case Study (WWW document).
- Reckhow, K.H. (UNC Water Resources Research Institute, North Carolina State University, Raleigh, USA) 1999. Water quality prediction and probability network models: probability network model for nitrogen enrichment and algal blooms in the Neuse River (Can. J. Fish. Aquat. Sci. 56: 1150-1158 (1999)).
See also:
- http://www2.ncsu.edu/ncsu/CIL/WRRI/ken's_page.html (home page of K. Reckhow)
- http://www.epa.gov/OWOW/tmdl/ (Total Maximum Daily Load (TMDL) program)
2 Bayes network for discrete variables, implemented with Hugin software
- Does not include a true Bayesian update of parameters with new data.
- Several other statistical and computational methods and software packages exist; one of the best, OpenBUGS (for continuous variables), was used in hierarchical modelling of Finnish lakes.
- Resembles structural equation models.
- Both belong to the family of probabilistic graphical models.
3 Hierarchical linear chlorophyll a model
[Figure: DAG diagram with nodes β, s2, βi, s2i, t, βij, yijk, xijk]
4 Structural equation model
LAKE PYHÄJÄRVI in SÄKYLÄ research model
Legend: Planktiv = planktivorous fish; Z = zooplankton (Crustacea); A3 = Cyanobacteria; TP = total phosphorus; TN = total nitrogen
5 PHYSICAL WAY OF THINKING
Hydraulic routing of ground and surface water flow in the drainage basin, in river channels, in lakes and in estuaries.
6 Drainage basin, river, lake and estuary are linked with hydraulic principles.
High spatial and temporal resolution.
7 STATISTICAL INFERENCE
Small-scale transport and transformation processes of pollutants in the drainage basin are summarized with probabilistic expressions that characterize the aggregate response of interest to the decision makers.
8 Outcomes expressed as probabilities are an acknowledgement of the lack of precision in predictive models.
9 BAYES NETWORKS
Formally, Bayes networks (BNTs) are directed acyclic graphs in which each node represents a random variable, or uncertain quantity, which can take two or more possible values.
10 Each node represents a multi-valued variable, comprising a collection of mutually exclusive hypotheses (state of a lake: Oligotrophic, Mesotrophic, Eutrophic) or observations (nutrient loading: Low, Medium, High).
11 The arcs signify the existence of direct causal influence between the linked variables, and the strength of these influences is quantified by conditional probabilities.
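The definitions above (nodes with mutually exclusive states, arcs from parent to child, no cycles) can be sketched as a small data structure. This is only an illustrative sketch: the node names `nutrient_loading` and `lake_state` reuse the examples on the slides, but the arc between them and the dictionary layout are assumptions made for the illustration, not part of the original material.

```python
from graphlib import TopologicalSorter, CycleError

# Each node lists its mutually exclusive states and its parent nodes.
# The arc nutrient_loading -> lake_state is an illustrative assumption.
network = {
    "nutrient_loading": {"states": ["Low", "Medium", "High"], "parents": []},
    "lake_state": {
        "states": ["Oligotrophic", "Mesotrophic", "Eutrophic"],
        "parents": ["nutrient_loading"],
    },
}


def topological_order(net):
    """Return node names with parents before children.

    Raises graphlib.CycleError if the arcs contain a cycle,
    i.e. if the graph is not a valid directed acyclic graph.
    """
    graph = {name: set(spec["parents"]) for name, spec in net.items()}
    return list(TopologicalSorter(graph).static_order())


order = topological_order(network)
print(order)
```

Ordering parents before children is also the order in which probabilities are propagated through a network, so the same check doubles as a validity test for the structure.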
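12 Conditional probability: each direct link X -> Y between discrete variables is quantified by a fixed conditional probability matrix M, in which the (x,y) entry is given by

$$M_{y|x} \equiv P(y \mid x) \equiv P(Y = y \mid X = x)$$

$$M = \begin{bmatrix}
P(y_1 \mid x_1) & P(y_2 \mid x_1) & \cdots & P(y_n \mid x_1) \\
P(y_1 \mid x_2) & P(y_2 \mid x_2) & \cdots & P(y_n \mid x_2) \\
\vdots & \vdots & \ddots & \vdots \\
P(y_1 \mid x_m) & P(y_2 \mid x_m) & \cdots & P(y_n \mid x_m)
\end{bmatrix}$$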
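As a minimal sketch of how such a matrix is used, the snippet below stores one row of P(y | x) per parent state and propagates a marginal distribution on X to the child Y. All numbers, and the choice of nutrient loading as X and lake state as Y, are hypothetical values chosen only to illustrate the row-per-parent-state layout.

```python
# Hypothetical CPM: one row per state of X (nutrient loading),
# one entry per state of Y (lake state). Each row must sum to 1.
cpm = {
    "Low":    {"Oligotrophic": 0.7, "Mesotrophic": 0.2, "Eutrophic": 0.1},
    "Medium": {"Oligotrophic": 0.3, "Mesotrophic": 0.4, "Eutrophic": 0.3},
    "High":   {"Oligotrophic": 0.1, "Mesotrophic": 0.3, "Eutrophic": 0.6},
}

# Hypothetical marginal distribution of the parent X.
p_x = {"Low": 0.5, "Medium": 0.3, "High": 0.2}


def child_marginal(p_parent, matrix):
    """Compute P(y) = sum over x of P(x) * P(y | x)."""
    p_y = {}
    for x, row in matrix.items():
        if abs(sum(row.values()) - 1.0) > 1e-9:
            raise ValueError(f"row for X={x} does not sum to 1")
        for y, p in row.items():
            p_y[y] = p_y.get(y, 0.0) + p_parent[x] * p
    return p_y


p_y = child_marginal(p_x, cpm)
print(p_y)
```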
13 QUANTIFYING THE LINKS
Bayes learning of the Conditional Probability Matrix (CPM) from:
1. Observational data
- Simultaneous observations of each variable are tabulated, sorted by the parent variables and converted into categories as prescribed in the node definitions.
- For every combination of states of the parent nodes, the number of occurrences of each state of the child is counted.
- Probabilities are calculated as the number of occurrences of a child state divided by the total number of observations for that combination of parent states.
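The counting procedure described above can be sketched as follows. The function name `learn_cpm`, the record layout and the five observations are hypothetical; the observations are assumed to be already converted into categories.

```python
from collections import Counter, defaultdict


def learn_cpm(records, parents, child):
    """Estimate P(child | parents) by relative frequency.

    records: list of dicts holding one categorical value per variable.
    Returns {parent_state_tuple: {child_state: probability}}.
    """
    counts = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[p] for p in parents)  # combination of parent states
        counts[key][rec[child]] += 1          # occurrences of the child state
    cpm = {}
    for key, child_counts in counts.items():
        total = sum(child_counts.values())
        cpm[key] = {state: n / total for state, n in child_counts.items()}
    return cpm


# Hypothetical categorized observations.
observations = [
    {"loading": "High", "lake_state": "Eutrophic"},
    {"loading": "High", "lake_state": "Eutrophic"},
    {"loading": "High", "lake_state": "Mesotrophic"},
    {"loading": "Low", "lake_state": "Oligotrophic"},
    {"loading": "Low", "lake_state": "Mesotrophic"},
]

cpm = learn_cpm(observations, parents=["loading"], child="lake_state")
print(cpm)
```

Child states that never occur for a given parent combination are simply absent from that row (an implicit probability of zero), and parent combinations that were never observed get no row at all. With sparse data these gaps have to be filled some other way, for example by expert judgement.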
14 2. Parameter learning from model simulations (uncertainty analysis such as Monte Carlo simulation)
- The selected input variables are varied about an appropriate distribution, and random samples are drawn from the model parameter distributions.
-> The simulation results at the selected output variables are tabulated with their corresponding set of input variable conditions.
-> The CPM is generated from this tabulation using the same method described above for observational data.
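A minimal sketch of the Monte Carlo route, under loud assumptions: `toy_model` is a hypothetical stand-in for a real simulation model, and the input distribution, the parameter distribution and the category cut-offs are all invented for the illustration.

```python
import random
from collections import Counter, defaultdict


def toy_model(load, k):
    """Hypothetical stand-in for a simulation model: concentration = k * load."""
    return k * load


def categorize(value, low_cut=1.0, high_cut=2.0):
    """Convert a continuous value into the node categories (assumed cut-offs)."""
    if value < low_cut:
        return "Low"
    if value < high_cut:
        return "Medium"
    return "High"


def simulate_cpm(n_runs=5000, seed=1):
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_runs):
        load = rng.uniform(0.0, 3.0)  # input variable varied about its distribution
        k = rng.uniform(0.8, 1.2)     # random sample of an uncertain model parameter
        conc = toy_model(load, k)
        # Tabulate the output category against the input category.
        counts[categorize(load)][categorize(conc)] += 1
    # Same relative-frequency method as for observational data.
    return {
        parent: {child: n / sum(c.values()) for child, n in c.items()}
        for parent, c in counts.items()
    }


cpm = simulate_cpm()
for parent in ("Low", "Medium", "High"):
    print(parent, cpm[parent])
```

The spread within each row reflects the parameter uncertainty that was sampled. A deterministic model run with fixed parameters would put nearly all of the probability on one child state per row.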
15 3. Parameter learning from scientists, experts, stakeholders, costs and benefits
If data are not available and typical models are not appropriate, conditional probability tables can be generated by eliciting information from experts and stakeholders.
- In a cost and benefit analysis, for example, the costs associated with a wastewater treatment plant upgrade will likely need to be elicited from experts and through market inquiries.
- Benefits associated with water quality improvement (recreation, biological habitat, esthetics and other environmental benefits) are subjective in nature and are difficult to quantify without input from local individuals, stakeholders and experts.
The probabilistic relationships described here may be more difficult to generate than those calculated from data and models.
16 DECISIONS AND UTILITY
A Bayesian Decision Network (BDN) is a specific form of Bayesian network that includes decision and utility nodes and is used to model the relationship between decisions and outcomes.
A decision node contains discrete options instead of a probability distribution across states. A decision node can only exist in one state at a time, representing a decision or management option chosen among multiple choices.
A utility node provides a simple means of estimating the expected values of different outcomes. The expected value E of an uncertain outcome with n states (i = 1, ..., n) is computed as

$$E = \sum_{i=1}^{n} P_i B_i,$$

where B_i is the benefit associated with state i and P_i is the probability of being in that state.
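The expected value formula and the role of a decision node can be sketched together. The two management options, their state probabilities, the benefits and the costs below are hypothetical numbers chosen for illustration; subtracting a cost from the expected benefit is likewise an assumption about how a utility node might be set up.

```python
def expected_value(probabilities, benefits):
    """E = sum of P_i * B_i over the n states of the outcome."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p * b for p, b in zip(probabilities, benefits))


# Hypothetical benefit of each water quality state (poor, fair, good).
benefits = [0.0, 50.0, 100.0]

# Hypothetical decision node: each option fixes the outcome distribution
# over (poor, fair, good) and carries its own cost.
options = {
    "no_upgrade":    {"p": [0.5, 0.3, 0.2], "cost": 0.0},
    "upgrade_plant": {"p": [0.2, 0.3, 0.5], "cost": 20.0},
}

net_value = {
    name: expected_value(opt["p"], benefits) - opt["cost"]
    for name, opt in options.items()
}
best = max(net_value, key=net_value.get)
print(net_value, best)
```

Because the decision node is in exactly one state at a time, each option is evaluated separately and the option with the highest net expected value is selected.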
17 APPLICATION OF Bayes Decision Networks
1. Defining the problem
2. Integrating disparate data sources
3. Scenario generation and analysis
4. Building a Bayesian Decision Network (influence diagram)
5. Obtaining probability distributions
18 Decision tree
- Bayesian networks can be transformed into a decision tree
[Figure: a Bayes net with nodes Hot sunshine, Algal bloom (yes/no), Go swimming (yes/no) and Get ill (yes/no), shown beside the corresponding decision tree. The tree branches on Algal bloom (yes/no) and Go swimming (yes/no), with leaf outcomes Get ill / Feeling well carrying the probability pairs 0.7 / 0.3 and 0.1 / 0.9.]
19 LIST OF REFERENCES
Varis, O. (1990 onwards):
1. Restoration of a temperate lake
2. Fisheries management in a tropical reservoir
3. Real-time monitoring system for a river
4. Rehabilitation of fisheries in a temperate river
5. Cod fisheries management
6. Salmon fisheries management
7. Cost-effective wastewater treatment for a river
8. A nationwide climatic change impact assessment
20 Ames, D.P. & Neilson, B. (Utah Water Research Laboratory) 2001. Bayesian Decision Networks for total maximum daily load analysis: East Canyon Creek Case Study.
Reckhow, K.H. (UNC Water Resources Research Institute, North Carolina State University, Raleigh, USA) 1999. Water quality prediction and probability network models: probability network model for nitrogen enrichment and algal blooms in the Neuse River (Can. J. Fish. Aquat. Sci. 56: 1150-1158 (1999)).
21 SUMMARY
Bayesian Decision Networks provide a successful way to make educated decisions. A BDN is simple enough for stakeholder involvement and understanding, while still containing proven and defensible science. A BDN is a tool for communication between scientists, stakeholders and decision makers.
22 Bayesian Decision Networks
1. Provide a good conceptual framework for clearly defining the relevant variables
2. Establish the relationship between causes and effects in the system
3. Integrate different sources of information into a single analytic tool
4. Capture model responses for quick scenario generation and investigation
5. Quantify risk, which can be used in establishing the margin of safety
23 A carefully devised and calibrated probability network model is ideally designed to communicate at the interface between scientists, stakeholders, and decision makers. By acknowledging the sometimes-substantial uncertainty in model predictions, we enhance, rather than diminish, the value of predictive modelling by focusing on the model's ability to estimate risk.
24 Bayesian Decision Network (influence diagram) of Lake Säkylän Pyhäjärvi
25 Management scenario
26 Studying the effect of zooplankton and TotP load
27 Studying the effect of management actions on the costs and the attainment of water quality standards
Conditional marginal distributions of costs, attainment of the water quality standard and Cyanobacteria (BlueGmax) summer maximum biomass, given buffer strip width (21-36 m), wetland percentage (1.1-1.25 %), forestation (25-31 %) and fish catch (3, on an artificial scale which will be replaced after expert judgement).
28 Water quality modelling and probability network models, with reference to Reckhow, K.H., Can. J. Fish. Aquat. Sci. 56: 1150-1158 (1999).
Modelling of nitrogen enrichment and algal blooms in the Neuse River, North Carolina, USA, with Bayes nets: probabilistic prediction of eutrophication
29 Initial forcing function: spring precipitation is expressed as marginal probabilities assessed from statistics on historic precipitation data in the watershed. The distribution was segmented into three equally likely precipitation ranges (below average, average, above average).
30 The probabilities for percentage of forested buffer reflect a judgemental assessment of the total perennial stream miles in the Neuse River watershed that would be required to have a maintained minimum-width buffer, based on the projected outcome of proposed management plans. The resultant probability estimates are given in the table.
31 Conditional probabilities were assessed for the four intermediate variables. Percentage of nitrogen load reduction was conditional only on the percentage of forested buffer. A scientific expert was consulted for a probabilistic statement reflecting the expected reduction in nitrogen loading due to buffers alone.
32 The nitrogen concentration was expressed as a function of spring precipitation and the nitrogen loading reduction; in the absence of data to fit a statistical model for these variables, nitrogen concentration was based on scientific judgement. The relationship between summer precipitation and summer streamflow was based on a statistical model developed from precipitation and streamflow data.
33 The conditional probabilities for the response variable algal bloom were based in part on scientific judgement (for the effect of nitrogen concentration) and in part on the interpretation of chlorophyll a versus flow data. Using the data, the chlorophyll levels were grouped into algal bloom categories, and flow data were grouped into flow categories. The relative frequency of data points in each algal bloom / flow group determined the initial probabilities; these probabilities were further decomposed, using judgement, to account for the effect of nitrogen concentration.
34 Conditional probabilities for anoxia were based on judgement. These response variable conditional probabilities are presented in the table below.
35 The probabilities expressed on earlier pages can be combined into a joint probability on all variables, which then allows us to solve for a number of interesting variables. While all marginal and conditional probabilities can be easily calculated using the estimates, computation in larger problems is facilitated with Bayes net software.
From the probabilities expressed earlier, the marginal probability of anoxia is 0.30; in Bayesian terms, this calculation reflects only prior information. If the implementation of a management option could assure that at least 95% of streams had the required buffer (p(95-100% forested buffer) = 1.0), then the anoxia probability drops slightly to 0.27. This calculation, although hypothetical, is indicative of the types of policy-related questions that can be addressed with a complete probability network model.
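The mechanics of combining the node probabilities into a joint distribution, and of re-running the calculation under a management scenario, can be sketched on a reduced chain: forested buffer -> nitrogen concentration -> anoxia. This is a simplification of the network described above, and every number below is hypothetical. The sketch does not use the tables of the cited study and does not reproduce the 0.30 and 0.27 figures.

```python
from itertools import product

# Hypothetical marginal and conditional probabilities for a three-node chain.
p_buffer = {"high": 0.4, "low": 0.6}
p_nitrogen_given_buffer = {
    "high": {"elevated": 0.3, "normal": 0.7},
    "low":  {"elevated": 0.6, "normal": 0.4},
}
p_anoxia_given_nitrogen = {
    "elevated": {"yes": 0.5, "no": 0.5},
    "normal":   {"yes": 0.2, "no": 0.8},
}


def joint(p_b):
    """Joint probability P(b, n, a) = P(b) * P(n | b) * P(a | n)."""
    table = {}
    for b, n, a in product(p_b, ["elevated", "normal"], ["yes", "no"]):
        table[(b, n, a)] = (
            p_b[b] * p_nitrogen_given_buffer[b][n] * p_anoxia_given_nitrogen[n][a]
        )
    return table


def marginal_anoxia(p_b):
    """Sum the joint over all other variables to get P(anoxia = yes)."""
    return sum(p for (b, n, a), p in joint(p_b).items() if a == "yes")


prior = marginal_anoxia(p_buffer)
# Management scenario: the buffer requirement is met with certainty.
with_buffer = marginal_anoxia({"high": 1.0, "low": 0.0})
print(prior, with_buffer)
```

Enumerating the full joint table is only practical for small networks. The number of entries grows multiplicatively with every added node, which is why larger problems are handled with Bayes net software.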
36 As another example, the probabilities presented earlier yield p(severe algal bloom) = 0.18. We can make the Bayes net more useful by combining the prior probabilities in the network with new (sample) information to produce a posterior probability, using Bayes' theorem. For example, if spring precipitation is observed to be above average, then the conditional probability becomes p(severe algal bloom | above-average spring precipitation) = 0.21. If, instead, summer stream flow is extremely low (<500 ft³/s), then p(severe algal bloom | summer flow <500) = 0.33. Both events together yield p(severe algal bloom | above-average spring precipitation, summer flow <500) = 0.37.
These types of "what if" probabilistic calculations are relatively quick and easy, even with much larger and more realistic probability network models. In addition, since Bayes' theorem allows the new observational information to be combined with the prior probabilities, as more observational information is incorporated into the analysis, the often-subjective prior is dominated by the newer data-based sample evidence.
Outcomes expressed as probabilities are an acknowledgement of the lack of precision in predictive models. The probabilities, and the relative change in probabilities between scenarios, give decision makers and stakeholders an explicit characterization of risk.
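The figures quoted above can be tied together with Bayes' theorem in a short sketch. It uses only numbers stated in this presentation: the prior 0.18, the posterior 0.21, and the probability 1/3 of above-average spring precipitation that follows from the three equally likely precipitation ranges. The likelihood P(above-average spring | severe bloom) is not given in the source. It is back-calculated here as an implied value, and the variable names are hypothetical.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence


p_bloom = 0.18            # prior p(severe algal bloom), from the text
p_wet = 1.0 / 3.0         # p(above-average spring precipitation), three equal ranges
p_bloom_given_wet = 0.21  # posterior quoted in the text

# Implied likelihood P(above-average spring | severe bloom), back-calculated;
# this value is derived here and does not appear in the source.
p_wet_given_bloom = p_bloom_given_wet * p_wet / p_bloom

# Round trip: applying Bayes' theorem recovers the quoted posterior.
check = posterior(p_bloom, p_wet_given_bloom, p_wet)

# Relative change in bloom probability for each scenario quoted in the text.
scenarios = {"wet spring": 0.21, "low summer flow": 0.33, "both": 0.37}
relative_change = {name: p / p_bloom for name, p in scenarios.items()}
print(p_wet_given_bloom, check, relative_change)
```

Read this way, observing both conditions roughly doubles the probability of a severe bloom relative to the prior, which is the kind of relative change between scenarios that the text highlights for decision makers.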
37 For example, is a probability of a severe algal bloom of 0.37 unacceptably high? If a management action could reduce this probability from 0.37 to 0.20, is that worthwhile, given the costs and changes in attributes? Questions like these, which are of interest to stakeholders, can be examined using probability network models.
The example discussed above suggests that the primary sources of information used to characterize probabilities are (1) observational/experimental evidence or data and (2) expert scientific judgement. In conventional modelling studies, observational information that is based on precise measurements of the variable or relationship of interest is likely to be the least controversial and most useful information. It is an unfortunate but common fact that observational data for the parametrization of water quality models are almost always woefully inadequate for the task.
38 What should be the basis for selecting a predictive model? Few will argue against the viewpoint that the model should be as simple as possible. However, it is also true that few will argue against the viewpoint that the model should be as accurate as possible, and it is likely that few will argue against the viewpoint that a model should correctly characterize processes. Unfortunately, these desirable features for models are often in conflict with one another.
Here a probability network model is recommended as a predictive model to guide Neuse River decision making because uncertainty, or accuracy, is believed to be an essential attribute of a predictive model. Does this mean that we can ignore correct process description and focus on probabilities? No! It is important to recognize that any process model can be easily incorporated into a probability network model if the accuracy of the mathematical process description can be quantified and is acceptable. For example, any (or all) of the mechanistic process descriptions in CE-QUAL-W2 can be represented in a probability network model if all relationships are expressed probabilistically. For this to happen, of course, a complete uncertainty analysis must be undertaken for the CE-QUAL-W2 process description.