Title: Erin Peterson
1Predicting Water Quality Impaired Stream Segments
using Landscape-scale Data and a Regional
Geostatistical Model
- Erin Peterson
- Environmental Risk Technologies
- CSIRO Mathematical Information Sciences
- St Lucia, Queensland
2Space-Time Aquatic Resources Modeling and
Analysis Program
The work reported here was developed under STAR
Research Assistance Agreement CR-829095 awarded
by the U.S. Environmental Protection Agency (EPA)
to Colorado State University. This presentation
has not been formally reviewed by EPA. EPA does
not endorse any products or commercial services
mentioned in this presentation.
3Collaborators
- Dr. David M. Theobald, Natural Resource Ecology Lab, Department of Recreation & Tourism, Colorado State University, USA
- Dr. N. Scott Urquhart, Department of Statistics, Colorado State University, USA
- Dr. Jay M. Ver Hoef, National Marine Mammal Laboratory, Seattle, USA
- Andrew A. Merton, Department of Statistics, Colorado State University, USA
4Overview
- Introduction
- Background
- Patterns of spatial autocorrelation in stream water chemistry
- Predicting water quality impaired stream segments using landscape-scale data and a regional geostatistical model: a case study in Maryland, USA
5Water Quality Monitoring Goals
- Create a regional water quality assessment
- Ecosystem Health Monitoring Program
- Identify water quality impaired stream segments
6Probability-based Random Survey Designs
- Advantages
- Statistical inference about the population of streams over a large area
- Reported in stream kilometers
- Disadvantages
- Does not take watershed influence into account
- Does not identify spatial location of impaired
stream segments
7Purpose
Develop a geostatistical methodology based on
coarse-scale GIS data and field surveys that can
be used to predict water quality characteristics
about stream segments found throughout a large
geographic area (e.g., state)
9Geostatistical Modeling
- Fit an autocovariance function to data
- Describes relationship between observations based
on separation distance
Distances and relationships are represented
differently depending on the distance measure
10Distance Measures and Spatial Relationships
- Straight-line distance (SLD)
- Geostatistical models are typically based on SLD
11Distance Measures and Spatial Relationships
- Symmetric hydrologic distance (SHD)
- Hydrologic connectivity: fish movement
12Distance Measures and Spatial Relationships
- Asymmetric hydrologic distance (AHD)
- Longitudinal transport of material
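The three distance measures can be compared on a toy network. The sketch below is illustrative only: the network layout, segment lengths, site tuples, and function names are all hypothetical and are not taken from the FLoWS tools. A site is written as (segment, km above the bottom of that segment).

```python
import math

# Hypothetical toy network: each segment maps to the segment it flows into
# (None marks the outlet segment), with segment lengths in km.
DOWNSTREAM = {"a": "c", "b": "c", "c": None}
LENGTH_KM = {"a": 4.0, "b": 3.0, "c": 5.0}


def path_to_outlet(segment):
    """Segments visited when travelling downstream from `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = DOWNSTREAM[segment]
    return path


def upstream_distance(site):
    """Distance along the network from the outlet up to the site."""
    segment, offset_km = site
    below = path_to_outlet(segment)[1:]
    return offset_km + sum(LENGTH_KM[s] for s in below)


def sld(xy1, xy2):
    """Straight-line distance between two coordinate pairs."""
    return math.dist(xy1, xy2)


def ahd(upstream_site, downstream_site):
    """Downstream-only distance; None if water cannot flow from the
    first site to the second."""
    d_up = upstream_distance(upstream_site)
    d_down = upstream_distance(downstream_site)
    on_path = downstream_site[0] in path_to_outlet(upstream_site[0])
    if on_path and d_up >= d_down:
        return d_up - d_down
    return None


def shd(site1, site2):
    """Shortest distance along the network, ignoring flow direction."""
    for pair in ((site1, site2), (site2, site1)):
        d = ahd(*pair)
        if d is not None:
            return d
    # Not flow-connected: travel down to the shared junction and back up.
    shared = set(path_to_outlet(site1[0])) & set(path_to_outlet(site2[0]))
    junction = sum(LENGTH_KM[s] for s in shared)
    return (upstream_distance(site1) - junction) + (
        upstream_distance(site2) - junction
    )


site_a = ("a", 1.0)   # on one tributary
site_b = ("b", 2.0)   # on the other tributary
site_c = ("c", 2.5)   # on the main stem below the confluence
```

In this toy case the two tributary sites have an SHD (through the confluence) but no AHD, because water does not flow from one to the other.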
13Distance Measures and Spatial Relationships
- Challenge
- Spatial autocovariance models developed for SLD may not be valid for hydrologic distances
- The resulting covariance matrix may not be positive definite
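One way to see whether a candidate covariance matrix is usable is to attempt a Cholesky factorization, which succeeds only for a symmetric positive definite matrix. The following is a minimal sketch of such a check; the function name and the two example matrices are hypothetical, and the input is assumed to be symmetric.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Return True if a symmetric matrix admits a Cholesky factorization."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False  # factorization breaks down
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


# Exponential covariance for three points on a line (distances 0, 1, 2).
valid = [[math.exp(-abs(i - j) / 2.0) for j in range(3)] for i in range(3)]

# A symmetric matrix that is not a valid covariance matrix.
invalid = [[1.0, 2.0], [2.0, 1.0]]
```

A matrix that fails this check cannot be used for kriging, since it would imply negative variances for some linear combinations of the observations.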
14Asymmetric Autocovariance Models for Stream
Networks
- Weighted asymmetric hydrologic distance (WAHD)
- Developed by Jay Ver Hoef
- Moving average models
- Incorporate flow volume, flow direction, and hydrologic distance
- Positive definite covariance matrices
Ver Hoef, J.M., Peterson, E.E., and Theobald, D.M. Spatial Statistical Models that Use Flow and Stream Distance. Environmental and Ecological Statistics. In press.
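The general shape of a weighted asymmetric model can be sketched as follows. This is an assumption-laden illustration, not the formulation in the cited paper: it assumes an exponential decay with hydrologic distance, scaled by a flow-based weight, and zero covariance between sites that are not flow-connected. The function and parameter names are hypothetical.

```python
import math


def weighted_asymmetric_covariance(ahd_km, weight, sill, range_km):
    """Covariance between two sites under an assumed weighted
    exponential model.

    ahd_km   -- asymmetric hydrologic distance, or None if the sites
                are not flow-connected
    weight   -- flow-based spatial weight for the pair (0 to 1)
    sill     -- variance at zero separation
    range_km -- range parameter of the exponential decay
    """
    if ahd_km is None:
        return 0.0  # no covariance without a flow connection
    return weight * sill * math.exp(-ahd_km / range_km)


unconnected = weighted_asymmetric_covariance(None, 0.5, 2.0, 10.0)
same_place = weighted_asymmetric_covariance(0.0, 1.0, 2.0, 10.0)
```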
15Patterns of Spatial Autocorrelation in Stream
Water Chemistry
16Objectives
- Evaluate 8 chemical response variables
- pH measured in the lab (PHLAB)
- Conductivity (COND) measured in the lab µmho/cm
- Dissolved oxygen (DO) mg/l
- Dissolved organic carbon (DOC) mg/l
- Nitrate-nitrogen (NO3) mg/l
- Sulfate (SO4) mg/l
- Acid neutralizing capacity (ANC) µeq/l
- Temperature (TEMP) °C
- Determine which distance measure is most appropriate
- SLD
- SHD
- WAHD
- More than one?
17Dataset
- Maryland Biological Stream Survey (MBSS) Data
- Maryland Department of Natural Resources
- Maryland, USA
- 1995, 1996, 1997
- Stratified probability-based random survey design
- 881 sites in 17 interbasins
18Maryland, USA
Baltimore
Annapolis
Washington D.C.
Chesapeake Bay
19Spatial Distribution of MBSS Data
20GIS Tools
Automated tools needed to extract data about hydrologic relationships between survey sites did not exist! Wrote Visual Basic for Applications (VBA) programs to:
- Calculate watershed covariates for each stream segment
- Functional Linkage of Watersheds and Streams (FLoWS)
- Calculate separation distances between sites
- SLD, SHD, asymmetric hydrologic distance (AHD)
- Calculate the spatial weights for the WAHD
- Convert GIS data to a format compatible with statistics software
- FLoWS tools will be available on the STARMAP website
- http://nrel.colostate.edu/projects/starmap
21Spatial Weights for WAHD
- Proportional influence (PI): influence of each neighboring survey site on a downstream survey site
- Weighted by catchment area: surrogate for flow volume
22Spatial Weights for WAHD
[Figure: stream network showing survey sites and stream segments]
23Spatial Weights for WAHD
[Figure: stream network with labels A through H]
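A catchment-area weighting of this kind can be sketched on a small network. The example below is an assumption, not the FLoWS implementation: at each confluence a segment's share is its catchment area divided by the total area of all segments entering that confluence, and the PI between two locations is the product of the shares along the flow path. The network, the areas, and the function names are hypothetical, and the slides do not say how PI values are converted into the final spatial weights.

```python
# Hypothetical toy network: segment -> segment it flows into (None = outlet),
# plus the catchment area (km^2) draining to the bottom of each segment.
DOWNSTREAM = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
AREA_KM2 = {"A": 30.0, "B": 10.0, "C": 50.0, "D": 50.0, "E": 120.0}


def segment_pi(segment):
    """Share of the flow entering the next confluence from `segment`."""
    target = DOWNSTREAM[segment]
    if target is None:
        return 1.0  # the outlet segment has no confluence below it
    incoming = [s for s, d in DOWNSTREAM.items() if d == target]
    return AREA_KM2[segment] / sum(AREA_KM2[s] for s in incoming)


def proportional_influence(upstream, downstream):
    """Product of segment shares along the flow path from `upstream`
    to `downstream`; 0.0 if the two are not flow-connected."""
    pi = 1.0
    segment = upstream
    while segment is not None and segment != downstream:
        pi *= segment_pi(segment)
        segment = DOWNSTREAM[segment]
    return pi if segment == downstream else 0.0


pi_a_to_e = proportional_influence("A", "E")   # 0.75 * 0.5
pi_a_to_d = proportional_influence("A", "D")   # different branches
```

The shares at any one confluence sum to 1, so a large tributary dominates the influence on the downstream site, which is the sense in which catchment area stands in for flow volume.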
24Data for Geostatistical Modeling
- Distance matrices
- SLD, SHD, AHD
- Spatial weights matrix
- Contains flow dependent weights for WAHD
- Watershed covariates
- Lumped watershed covariates
- Mean elevation, Urban
- Observations
- MBSS survey sites
25Geostatistical Modeling Methods
- Validation Set
- Unique for each chemical response variable
- Initial Covariate Selection
- 5 covariates
- Model Development
- Restricted model space to all possible linear models
- 4 model sets
27Geostatistical Modeling Methods
- Covariance matrix for SLD and SHD models
- Fit exponential autocorrelation function
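Given a matrix of separation distances (SLD or SHD), an exponential covariance matrix can be built as sketched below. The parameter names (partial_sill, range_km, nugget) are illustrative assumptions, and some software parameterizes the exponential model as exp(-3d/range) rather than exp(-d/range).

```python
import math


def exponential_covariance(distances, partial_sill, range_km, nugget=0.0):
    """Build a covariance matrix from a square distance matrix.

    Off-diagonal entries decay exponentially with separation distance;
    the nugget is added on the diagonal only.
    """
    n = len(distances)
    cov = [
        [partial_sill * math.exp(-distances[i][j] / range_km)
         for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        cov[i][i] += nugget
    return cov


# Two sites separated by exactly one range length (10 km).
example_cov = exponential_covariance(
    [[0.0, 10.0], [10.0, 0.0]], partial_sill=2.0, range_km=10.0, nugget=0.5
)
```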
28Geostatistical Modeling Methods
- Model selection within model set
- GLM: Akaike Information Corrected Criterion (AICC)
- Geostatistical models: spatial AICC (Hoeting et al., in press)
[Equation: spatial AICC]
where n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters.
http://www.stat.colostate.edu/jah/papers/spavarsel.pdf
- Model selection between model types
- 100 predictions: universal kriging algorithm
- Mean square prediction error (MSPE)
- Cannot use AICC to compare models based on different distance measures
- Model comparison: r2 for observed vs. predicted values
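The selection statistics on this slide can be sketched in a few lines. The spatial AICC formula itself is missing from the transcript; the form used below, -2 log L + 2n(p + k + 1)/(n - p - k - 2), is assumed to be the one given by Hoeting et al. and should be checked against that paper. It uses n, p, and k as defined on the slide. The function names are hypothetical, and r2 is taken here to be the squared correlation between observed and predicted values.

```python
def spatial_aicc(log_likelihood, n, p, k):
    """Assumed spatial AICC: n observations, p-1 covariates,
    k autocorrelation parameters."""
    return -2.0 * log_likelihood + 2.0 * n * (p + k + 1) / (n - p - k - 2)


def mspe(observed, predicted):
    """Mean square prediction error over a validation set."""
    return sum((o - q) ** 2 for o, q in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Squared Pearson correlation of observed vs. predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (q - mean_p) for o, q in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((q - mean_p) ** 2 for q in predicted)
    return cov * cov / (var_o * var_p)


example_aicc = spatial_aicc(log_likelihood=-100.0, n=50, p=3, k=2)
example_mspe = mspe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_r2 = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Because the likelihoods of models built on different distance measures are not comparable through AICC, MSPE and r2 on held-out validation sites are the statistics used to compare model types.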
29Results
- Summary statistics for distance measures
- Spatial neighborhood differs
- Affects number of neighboring sites
- Affects median, mean, and maximum separation
distance
30Results
Mean range values
- SLD: 28.2 km
- SHD: 88.03 km
- WAHD: 57.8 km
- Range of spatial autocorrelation differs
- Shortest for SLD
- TEMP: shortest range values
- DO: largest range values
31Results
- Distance Measures
- GLM always had less predictive ability than the geostatistical models
- More than one distance measure usually performed well
- SLD, SHD, WAHD: PHLAB, DOC
- SLD and SHD: ANC, DO, NO3
- WAHD and SHD: COND, TEMP
- SLD: SO4
32Results
Predictive ability of models (r2)
- Strong: ANC, COND, DOC, NO3, PHLAB
- Weak: DO, TEMP, SO4
33Discussion
Distance measure influences how spatial
relationships are represented in a stream network
- Sites' relative influence on other sites
- Dictates form and size of spatial neighborhood
- Important because
- Impacts accuracy of the geostatistical model
predictions
35Discussion
- Probability-based random survey design negatively affected WAHD
- Maximizes spatial independence of sites
- Does not represent spatial relationships in networks
- Validation sites randomly selected
36Discussion
WAHD models explained more variability as the number of neighboring sites increased
- Except when neighbors had
- Similar watershed conditions
- Significantly different chemical response values
37Discussion
- GLM predictions improved as the number of neighbors increased
- Clusters of sites in space have similar watershed conditions
- Statistical regression pulled towards the cluster
- GLM contained hidden spatial information
- Explained additional variability in data with more neighbors
38Predictive Ability of Geostatistical Models (r2)
39Conclusions
- Spatial autocorrelation exists in stream chemistry data at a relatively coarse scale
- Geostatistical models improve the accuracy of water chemistry predictions
- Patterns of spatial autocorrelation differ between chemical response variables
- Ecological processes acting at different spatial scales
- SLD is the most suitable distance measure at regional scale at this time
- Unsuitable survey designs
- SHD: GIS processing time is prohibitive
40Conclusions
- Results are scale specific
- Spatial patterns change with survey scale
- Other patterns may emerge at shorter separation distances
- Further research is needed at finer scales
- Watershed or small stream network
- New survey designs for stream networks
- Capture both coarse and fine scale variation
- Ensure that hydrologic neighborhoods are represented
41Predicting Water Quality Impaired Stream Segments
using Landscape-scale Data and a Regional
Geostatistical Model: A Case Study in Maryland
42Objective
Demonstrate how a geostatistical methodology can
be used to complement regional water quality
monitoring efforts
- Predict regional water quality conditions
- Identify the spatial location of potentially
impaired stream segments
44Methods
Potential covariates
45Methods
Potential covariates after initial model
selection (10)
46Methods
- Fit geostatistical models
- Two distance measures SLD and WAHD
- Restricted model space to all possible linear models
- 1024 models per set
- 9 model sets
- Parameter Estimation
- Maximized profile log-likelihood function
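The figure of 1024 models per set matches the number of subsets of the 10 candidate covariates (2^10), counting the intercept-only model. A minimal sketch of enumerating that model space follows; the covariate names are placeholders, not the covariates used in the study.

```python
from itertools import combinations


def all_linear_models(covariates):
    """Yield every subset of the candidate covariates, from the
    intercept-only model (empty subset) to the full model."""
    for size in range(len(covariates) + 1):
        for subset in combinations(covariates, size):
            yield subset


# Placeholder names standing in for the 10 candidate covariates.
candidates = [f"x{i}" for i in range(1, 11)]
model_space = list(all_linear_models(candidates))
```

Each subset would then be fitted under each of the 9 covariance model sets and ranked within its set.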
47Methods
48Results
- SLD models performed better than WAHD
- Exception: spherical model
- Best models
- SLD: exponential, Mariah, and rational quadratic models
- r2 for SLD model predictions
- Almost identical
- Further analysis restricted to SLD Mariah model
49Results
- Covariates for SLD Mariah model
- WATER, EMERGWET, WOODYWET, FELPERC, MINTEMP
- Positive relationship with DOC
- WATER, EMERGWET, WOODYWET, MINTEMP
- Negative relationship with DOC
- FELPERC
50Cross-validation intervals for Mariah model
regression coefficients
- Cross-validation interval: 95% of regression coefficients produced by leave-one-out cross-validation procedure
- Narrow intervals
- Few extreme regression coefficient values
- Not produced by common sites
- Covariate values for the site are represented in observed data
- Not clustered in space
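A leave-one-out interval of this kind can be sketched as follows. This is a simplified stand-in: it refits an ordinary least squares regression with each site removed, whereas the study refit a geostatistical model, and it takes the central 95% of the resulting coefficients. The function name and the toy data are hypothetical.

```python
import numpy as np


def loo_coefficient_intervals(X, y, coverage=0.95):
    """Refit the regression with each site left out and return the
    lower and upper bounds containing `coverage` of the coefficients.

    X -- (n, p) covariate matrix without an intercept column
    y -- (n,) response vector
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # add the intercept
    coefs = []
    for i in range(n):
        keep = np.arange(n) != i               # drop site i
        beta, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        coefs.append(beta)
    coefs = np.array(coefs)
    tail = (1.0 - coverage) / 2.0 * 100.0
    lower = np.percentile(coefs, tail, axis=0)
    upper = np.percentile(coefs, 100.0 - tail, axis=0)
    return lower, upper


# Toy data lying exactly on y = 2 + 3x, so every refit agrees.
toy_x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
toy_y = [2.0, 5.0, 8.0, 11.0, 14.0]
lower, upper = loo_coefficient_intervals(toy_x, toy_y)
```

Narrow intervals mean that no single site drives the fitted coefficients; a site whose removal produces a coefficient outside the interval is a candidate influential site.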
51r2 Observed vs. Predicted Values
- n = 312 sites: r2 = 0.72
- 1 influential site: r2 without site = 0.66
52Model Fit
53Discussion
- SLD models more accurate than WAHD models
- Landscape-scale covariates were not restricted to
watershed boundaries
- Geology type
- Temperature
- Wetlands and water
54Discussion
- Regression Coefficients
- Narrow cross-validation intervals
- Spatial location of the sites not as important as watershed characteristics
- Extreme regression coefficient values
- Not produced by common sites
- Not clustered in space
- Local-scale factor may have affected stream DOC
- Point source of organic waste
55Spatial Patterns in Model Fit
- North and east of Chesapeake Bay: large squared prediction error (SPE) values
- Naturally acidic blackwater streams with elevated DOC
- Not well represented in observed dataset
- 2 blackwater sites
- Geostatistical model unable to account for natural variability
- Large squared prediction errors
- Large prediction variances
56Spatial Patterns in Model Fit
- West of Chesapeake Bay: low SPE values
- Due to statistical and spatial distribution of observed data
- Regression equation fit to the mean in the data
- Most observed sites have low DOC values
- Less variation in western and central Maryland
- Neighboring sites tend to be similar
- Separation distances shorter in the west
- Short separation distances = stronger covariances
57Model Performance
Unable to account for abrupt differences in DOC
values between neighboring sites with similar
watershed conditions
- What caused abrupt differences?
- Point sources of organic pollution
- Not represented in the model
- Non-point sources of pollution
- Lumped watershed attributes are non-spatial
- Differences due to spatial location of landuse are not represented
- Challenging to represent ecological processes using coarse-scale lumped attributes
- i.e., flow path of water
58Generate Model Predictions
- Prediction sites
- Study area
- 1st, 2nd, and 3rd order non-tidal streams
- 3083 segments, 5973 stream km
- ID downstream node of each segment
- Create prediction site
- More than one site at each confluence
- Generate predictions and prediction variances
- SLD Mariah model
- Universal kriging algorithm
- Assigned predictions and prediction variances
back to stream segments in GIS
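The prediction step can be sketched with the standard universal kriging equations: a generalized least squares estimate of the regression coefficients, plus a correction based on the covariance between the prediction site and the observed sites. This is a generic sketch, not the code used in the study; the function name, argument names, and toy data are assumptions, and the covariance values would come from the fitted SLD model.

```python
import numpy as np


def universal_kriging(X, y, cov, x0, c0, var0):
    """Predict at one site and return (prediction, prediction variance).

    X    -- (n, p) design matrix at observed sites, intercept included
    y    -- (n,) observed values
    cov  -- (n, n) covariance matrix among observed sites
    x0   -- (p,) design vector at the prediction site
    c0   -- (n,) covariance between prediction site and observed sites
    var0 -- variance at the prediction site
    """
    X, y, cov = (np.asarray(a, dtype=float) for a in (X, y, cov))
    x0, c0 = np.asarray(x0, dtype=float), np.asarray(c0, dtype=float)
    cov_inv_X = np.linalg.solve(cov, X)
    cov_inv_y = np.linalg.solve(cov, y)
    cov_inv_c0 = np.linalg.solve(cov, c0)
    xtx = X.T @ cov_inv_X
    beta = np.linalg.solve(xtx, X.T @ cov_inv_y)   # GLS coefficients
    residual = y - X @ beta
    prediction = x0 @ beta + cov_inv_c0 @ residual
    d = x0 - X.T @ cov_inv_c0
    variance = var0 - c0 @ cov_inv_c0 + d @ np.linalg.solve(xtx, d)
    return float(prediction), float(variance)


# Toy check: three sites on a line, exponential covariance, no nugget.
coords = np.array([0.0, 1.0, 3.0])
toy_cov = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)
toy_X = np.column_stack([np.ones(3), coords])
toy_y = np.array([1.0, 2.5, 2.0])
pred_at_obs, var_at_obs = universal_kriging(
    toy_X, toy_y, toy_cov, toy_X[1], toy_cov[:, 1], 1.0
)
```

With no nugget, predicting at an observed site returns the observed value with zero variance, which is a quick sanity check on the implementation. The prediction variance grows for sites far from any observation or with covariate values unlike those in the observed data.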
60Weak Model Fit
61Strong Model Fit
62Water Quality Attainment by Stream Kilometers
- Threshold values for DOC
- Set by Maryland Department of Natural Resources
- High DOC values may indicate biological or
ecological stress
63Implications for Water Quality Monitoring
- Tradeoff between cost-efficiency and model accuracy
- Western Maryland
- Can be described using a single geostatistical model
- Eastern and northeastern Maryland
- Accept poor model fit
- Collect additional survey data
- Develop a separate geostatistical model for eastern Maryland
64Implications for Water Quality Monitoring
- Apply this methodology to other regulated indices
- e.g., conductivity and pH
- Categorize predictions into potentially impaired or unimpaired status
- Report on attainment in stream miles/kilometers
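Reporting attainment in stream kilometers amounts to comparing each segment's prediction with a threshold and summing segment lengths. The sketch below assumes that values above the threshold indicate potential impairment, as the slides suggest for DOC; the threshold value and the segment data are made up for illustration and are not the Maryland DNR values.

```python
def attainment_by_km(segments, threshold):
    """Summarize potentially impaired stream length.

    segments  -- iterable of (length_km, predicted_value) pairs
    threshold -- value above which a segment is flagged as
                 potentially impaired
    """
    segments = list(segments)
    total = sum(length for length, _ in segments)
    impaired = sum(length for length, value in segments if value > threshold)
    return {
        "impaired_km": impaired,
        "unimpaired_km": total - impaired,
        "percent_impaired": 100.0 * impaired / total,
    }


# Made-up segment lengths (km) and predictions, with a made-up threshold.
example_summary = attainment_by_km(
    [(2.0, 3.0), (1.0, 9.0), (1.0, 12.0)], threshold=8.0
)
```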
65Conclusions
- Geostatistical models generated more accurate DOC predictions than previous non-spatial models based on coarse-scale landscape data
- SLD is more appropriate than WAHD for regional geostatistical modeling of DOC at this time
- Probability-based random survey designs
- Maryland, USA
- Adds value to existing water quality monitoring efforts
- Used to evaluate/report regional water quality conditions
- Additional field sampling is not necessary
- Generate inferences about regional stream condition
- ID spatial location of potentially impaired stream segments
66Conclusions
- Model predictions and prediction variances
- Additional field efforts concentrated in
- Areas with large amounts of uncertainty
- Areas with a greater potential for water quality impairment
- Model results displayed visually
- Communicate results to a variety of audiences
67Questions?