Title: Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models
1Using the Maryland Biological Stream Survey Data
to Test Spatial Statistical Models
- A Collaborative Approach to Analyzing Stream
Network Data
Andrew A. Merton
2Overview
- The material presented here is a subset of the work done by Erin Peterson for her Ph.D.
  - Interested in developing geostatistical models for predicting water quality characteristics in stream segments
  - Data: Maryland Biological Stream Survey (MBSS)
- The scope and nature of the problem require interdisciplinary collaboration
  - Ecology, geoscience, statistics, others
3Stream Network Data
- The response data consist of observations within a stream network
  - What does it mean to be a neighbor in such a framework?
  - How does one characterize the distance between neighbors?
  - Should distance measures be confined to the stream network?
  - Does flow (direction) matter?
4Stream Network Data
- Potential explanatory variables are not restricted to be within the stream network
  - Topography, soil type, land usage, etc.
- How does one sensibly incorporate these explanatory variables into the analysis?
  - Can we develop tools to aggregate upstream watershed covariates for subsequent downstream segments?
5Competing Models
- Given a collection of competing models, how does one select the best model?
  - Is one subset of explanatory variables better or closer to the true model?
  - Should one assume correlated residuals and, if so, what form should the correlation function take?
  - How does the distance measure impact the choice of correlation function?
6Functional Distances and Spatial Relationships
Geostatistical models are based on straight-line distance
Straight-line Distance (SLD): "as the crow flies"
Is this an appropriate measure of distance?
Influential continuous landscape variables: geology type or acid rain
7Functional Distances and Spatial Relationships
Distances and relationships are represented differently depending on the distance measure
Symmetric Hydrologic Distance (SHD): hydrologic connectivity ("as the fish swims")
8Functional Distances and Spatial Relationships
Distances and relationships are represented differently depending on the distance measure
Asymmetric Hydrologic Distance (AHD): longitudinal transport of material ("as the sh*t flows")
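The three distance measures can be contrasted on a toy network. The sketch below is only an illustration: the node names, coordinates, and segment lengths are hypothetical, and each sample site is assumed to sit exactly on a network node.

```python
import math

# Hypothetical toy network: node -> (downstream node, stream length to it in km).
# The outlet has no downstream node.
DOWNSTREAM = {
    "a": ("c", 2.0),
    "b": ("c", 3.0),
    "c": ("outlet", 4.0),
    "outlet": (None, 0.0),
}
# Hypothetical planar coordinates (km), used only for straight-line distance.
COORDS = {"a": (0.0, 3.0), "b": (2.0, 3.0), "c": (1.0, 1.5), "outlet": (1.0, 0.0)}


def path_to_outlet(node):
    """Map each node on the downstream path to its stream distance from `node`."""
    dist, out = 0.0, {}
    while node is not None:
        out[node] = dist
        nxt, length = DOWNSTREAM[node]
        dist += length
        node = nxt
    return out


def sld(u, v):
    """Straight-line distance (as the crow flies)."""
    (x1, y1), (x2, y2) = COORDS[u], COORDS[v]
    return math.hypot(x1 - x2, y1 - y2)


def shd(u, v):
    """Symmetric hydrologic distance: shortest route restricted to the network."""
    pu, pv = path_to_outlet(u), path_to_outlet(v)
    # In a tree, the shortest route passes through the first shared downstream node.
    return min(pu[n] + pv[n] for n in pu if n in pv)


def ahd(u, v):
    """Asymmetric hydrologic distance from u down to v; None if v is not downstream of u."""
    return path_to_outlet(u).get(v)
```

Here sites "a" and "b" are 2 km apart in a straight line and 5 km apart along the network, and they have no asymmetric distance at all because neither flows into the other.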
9Candidate Models
- Restrict the model space to general linear models
  - Look at all possible subsets of explanatory variables X (Hoeting et al.)
  - Require a correlation structure that can accommodate the various distance measures
  - Could assume that the residuals are spatially independent, i.e., $\Sigma = \sigma^2 I$ (probably not best)
  - Ver Hoef et al. propose a better solution
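Enumerating all possible subsets of the explanatory variables is straightforward to sketch. The covariate names below are hypothetical placeholders, not the MBSS variables; with p candidate covariates there are 2^p subsets, so the model space grows quickly.

```python
from itertools import combinations


def all_subsets(covariates):
    """Yield every subset of covariate names, starting with the intercept-only model."""
    for size in range(len(covariates) + 1):
        for subset in combinations(covariates, size):
            yield subset


# Hypothetical covariate names, for illustration only.
candidates = list(all_subsets(["elevation", "pct_forest", "soil_type"]))
```

Each subset would then be paired with each distance measure and correlation structure to form one candidate model.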
10Asymmetric Autocovariance Models for Stream
Networks
- Weighted asymmetric hydrologic distance (WAHD)
- Developed by Jay Ver Hoef, National Marine Mammal Laboratory, Seattle
  - Moving average models
- Incorporates flow and uses hydrologic distance
- Represents discontinuity at confluences
11Exponential Correlation Structure
- The exponential correlation function can be used for both SLD and SHD
  - For AHD, one must multiply $\Sigma$ (element-wise) by the weight matrix $A$, i.e., entry $(i, j)$ becomes $a_{ij}\Sigma_{ij}$, hence WAHD
  - The weights represent the proportion of flow volume that the downstream location receives from the upstream location
  - Estimating the $a_{ij}$ is non-trivial: need special GIS tools (Theobald et al.)
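A minimal sketch of the element-wise weighting, assuming the exponential correlation is parameterized as exp(-d / range), which is one common choice and not necessarily the one used in the talk. The function names, the toy distances, and the toy weights are all hypothetical.

```python
import math


def exp_corr(distance, range_param):
    """Exponential correlation exp(-d / range); an assumed parameterization."""
    return math.exp(-distance / range_param)


def wahd_corr(dist, weights, range_param):
    """Element-wise product of the weight matrix and the exponential correlation.

    dist[i][j] is the hydrologic distance between flow-connected sites i and j,
    or None where the two sites are not flow-connected (correlation set to 0).
    """
    n = len(dist)
    return [
        [
            0.0 if dist[i][j] is None
            else weights[i][j] * exp_corr(dist[i][j], range_param)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy example: sites 0 and 1 are flow-connected, site 2 is on a separate branch.
toy_dist = [[0.0, 2.0, None], [2.0, 0.0, None], [None, None, 0.0]]
toy_weights = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_corr = wahd_corr(toy_dist, toy_weights, range_param=4.0)
```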
12GIS Tools
Theobald et al. have created automated tools to extract data about hydrologic relationships between sample points
Visual Basic for Applications programs that
- Calculate separation distances between sites
  - SLD, SHD, asymmetric hydrologic distance (AHD)
- Calculate watershed covariates for each stream segment
  - Functional Linkage of Watersheds and Streams (FLoWS)
- Convert GIS data to a format compatible with statistics software
Figure panels: SLD, AHD
13Spatial Weights for WAHD
- Proportional influence: the influence of each neighboring sample site on a downstream sample site
  - Weighted by catchment area (a surrogate for flow)
- Calculate the influence of each upstream segment on the segment directly downstream
  - Calculate the proportional influence of one sample site on another
- Multiply the edge proportional influences
- Output: n × n weighted incidence matrix
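The steps above can be sketched on a toy network. This is a simplified illustration and not the GIS tooling itself: the segment names and catchment areas are hypothetical, one sample site is assumed per segment, and the site-to-site weight is taken to be the plain product of edge influences as the slide states.

```python
# Hypothetical segments: name -> (downstream segment or None, catchment area in km^2).
SEGMENTS = {
    "s1": ("s3", 30.0),
    "s2": ("s3", 10.0),
    "s3": ("s5", 40.0),
    "s4": ("s5", 60.0),
    "s5": (None, 100.0),
}


def edge_influence(seg):
    """Share of the inflow to seg's downstream neighbor that comes from seg.

    Catchment area stands in for flow volume.
    """
    down, area = SEGMENTS[seg]
    if down is None:
        return 1.0
    total = sum(a for d, a in SEGMENTS.values() if d == down)
    return area / total


def site_influence(upstream, downstream):
    """Product of edge influences along the path; 0.0 if not flow-connected."""
    weight, seg = 1.0, upstream
    while seg is not None and seg != downstream:
        weight *= edge_influence(seg)
        seg = SEGMENTS[seg][0]
    return weight if seg == downstream else 0.0


# n x n weighted incidence matrix, rows = upstream site, columns = downstream site.
names = sorted(SEGMENTS)
incidence = [[site_influence(u, d) for d in names] for u in names]
```

In this toy network s1 contributes 30 / (30 + 10) = 0.75 of the inflow to s3, and s3 contributes 0.4 of the inflow to s5, so the influence of s1 on s5 is about 0.3.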
18Parameter Estimation
- Maximize the (profile) likelihood to obtain estimates for $\beta$, $\theta$, and $\sigma^2$
Profile likelihood
MLEs
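The equations on this slide did not survive transcription. For orientation only, the standard Gaussian forms are given below under the assumption that the model is $Z \sim N(X\beta, \sigma^2\Sigma(\theta))$ with $n$ observations; the notation is an assumption and may differ from the original slide.

```latex
\hat{\beta}(\theta) = \left(X^{\top}\Sigma(\theta)^{-1}X\right)^{-1}X^{\top}\Sigma(\theta)^{-1}Z,
\qquad
\hat{\sigma}^2(\theta) = \frac{1}{n}\left(Z - X\hat{\beta}(\theta)\right)^{\top}\Sigma(\theta)^{-1}\left(Z - X\hat{\beta}(\theta)\right)

-2\log L_{\mathrm{prof}}(\theta) = n\log(2\pi) + n\log\hat{\sigma}^2(\theta) + \log\left|\Sigma(\theta)\right| + n
```

Under this assumption, minimizing the profile expression over $\theta$ and substituting the result back into the first two formulas gives the MLEs of all three parameters.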
19Model Selection
- Hoeting et al. adapted the corrected Akaike Information Criterion (AICC) for spatial models
  - AICC estimates the difference between the candidate model and the true model
  - Select models with small AICC
In the AICC formula, n is the number of observations, p-1 is the number of covariates, and k is the number of autocorrelation parameters
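The formula itself is missing from the transcript. The sketch below assumes the form usually attributed to Hoeting et al., AICC = -2 log L + 2n(p + k + 1) / (n - p - k - 2); treat that form, and the function name, as assumptions to be checked against the original paper.

```python
def aicc(log_lik, n, p, k):
    """Corrected AIC for a spatial model (assumed form, see lead-in).

    log_lik: maximized log-likelihood
    n: number of observations
    p: number of regression parameters (p - 1 covariates plus an intercept)
    k: number of autocorrelation parameters
    """
    denom = n - p - k - 2
    if denom <= 0:
        raise ValueError("too few observations for this many parameters")
    return -2.0 * log_lik + 2.0 * n * (p + k + 1) / denom
```

For a fixed likelihood, the penalty term grows with both the number of covariates and the number of autocorrelation parameters, so a richer model must improve the fit enough to offset it.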
20Spatial Distribution of MBSS Data
21Summary Statistics for Distance Measures
- Distance measure greatly impacts the number of
neighboring sites as well as the median, mean,
and maximum separation distance between sites
22Comparing Distance Measures
- The selected models (one for each distance measure) were compared by computing the mean square prediction error (MSPE)
  - GLM: assumed independent errors
- Withheld the same 100 (randomly) selected records from each model fit
  - Want MSPE to be small
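A minimal sketch of the comparison, assuming MSPE is the average squared difference between withheld observations and their predictions. The helper names and the fixed seed are hypothetical; the seed simply ensures every model is scored on the same withheld records.

```python
import random


def holdout_indices(n_records, n_holdout, seed=0):
    """Pick the same random holdout set every time the same seed is used."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_records), n_holdout))


def mspe(observed, predicted):
    """Mean square prediction error over the withheld records."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
```

Each candidate model would be fit to the remaining records, used to predict the withheld ones, and scored with `mspe`; smaller values indicate better predictive performance.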
23Comparing Distance Measures: Prediction Performance for Various Responses
24Maps of the Relative Weights
- Generated maps by kriging (interpolation)
- Predicted values are linear combinations of the observed data, i.e., $Z_2 = \Sigma_{21}\Sigma_{11}^{-1}Z_1$
$Z_1$ is the observed data, $Z_2$ is the predicted value, $\Sigma_{11}$ is the correlation matrix for the observed sites, and $\Sigma_{21}$ is the correlation matrix between the prediction site and the observed sites
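The relative weights can be sketched as the solution of a small linear system. This assumes the simple-kriging form in which the prediction is Σ21 Σ11⁻¹ Z1 (no mean term), so the weight vector λ solves Σ11 λ = Σ21ᵀ. The function names and toy numbers are hypothetical, and the solver is a plain Gaussian elimination to keep the example dependency-free.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def kriging_weights(sigma11, sigma21):
    """Weights on the observed sites; sigma11 is symmetric, sigma21 is a vector."""
    return solve(sigma11, sigma21)


def predict(sigma11, sigma21, z1):
    """Prediction as a linear combination of the observed data."""
    weights = kriging_weights(sigma11, sigma21)
    return sum(w * z for w, z in zip(weights, z1))


# Toy check: the prediction site coincides with the second observed site,
# so all of the weight should fall on that site.
toy_sigma11 = [[1.0, 0.5], [0.5, 1.0]]
toy_sigma21 = [0.5, 1.0]
toy_weights = kriging_weights(toy_sigma11, toy_sigma21)
```

Mapping these weights for a single prediction site shows which observed sites drive the prediction under each distance measure.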
25Relative Weights Used to Make Prediction at Site 465
27Residual Correlations for Site 465
Straight-line
General Linear Model
29Some Comments on the Sampling Design
- Probability-based random survey design
- Designed to maximize spatial independence of survey sites
  - Does not adequately represent spatial relationships in stream networks using hydrologic distance measures
Sample size: 881
Sites with no neighbors: 244
Number of sites with 1 neighbor: 393
Mean number of neighbors per site: 2.81
30Conclusions
- A collaborative effort enabled the analysis of a complicated problem
  - Ecology: posed the problem of interest, provides insight into variable (model) selection
  - Geoscience: development of powerful tools based on GIS
  - Statistics: development of valid covariance structures, model selection techniques
  - Others: e.g., very understanding (and sympathetic) spouses