Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models


Provided by: mollyle

Transcript and Presenter's Notes

1
Using the Maryland Biological Stream Survey Data
to Test Spatial Statistical Models

A Collaborative Approach to Analyzing Stream
Network Data

Andrew A. Merton
2
Overview

The material presented here is a subset of the
work done by Erin Peterson for her Ph.D.
Interested in developing geostatistical models
for predicting water quality characteristics in
stream segments
Data: Maryland Biological Stream Survey (MBSS)
The scope and nature of the problem requires
interdisciplinary collaboration
Ecology, geoscience, statistics, others

3
Stream Network Data

The response data consist of observations
within a stream network
What does it mean to be a neighbor in such a
framework?
How does one characterize the distance between
neighbors?
Should distance measures be confined to the
stream network?
Does flow (direction) matter?

4
Stream Network Data

Potential explanatory variables are not
restricted to be within the stream network
Topography, soil type, land usage, etc.
How does one sensibly incorporate these
explanatory variables into the analysis?
Can we develop tools to aggregate upstream
watershed covariates for subsequent downstream
segments?

5
Competing Models

Given a collection of competing models, how does
one select the best model?
Is one subset of explanatory variables better or
closer to the true model?
Should one assume correlated residuals and, if
so, what form should the correlation function
take?
How does the distance measure impact the choice
of correlation function?

6
Functional Distances and Spatial Relationships
Geostatistical models are based on straight-line
distance
Straight-line distance (SLD): "as the crow flies"
Is this an appropriate measure of distance?
Influential continuous landscape variables, e.g.,
geology type or acid rain
7
Functional Distances and Spatial Relationships
Distances and relationships are represented
differently depending on the distance measure
Symmetric hydrologic distance (SHD): "as the fish
swims"
Hydrologic connectivity
8
Functional Distances and Spatial Relationships
Distances and relationships are represented
differently depending on the distance measure
Asymmetric hydrologic distance (AHD): "as the sht
flows"
Longitudinal transport of material
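
To make the three distance measures concrete, here is a minimal Python sketch on a toy network. The network, node names, reach lengths, and coordinates are all hypothetical and are not taken from the MBSS data. SLD ignores the network, SHD follows the stream in either direction, and AHD is defined only when one site lies downstream of the other.

```python
import math

# Hypothetical toy network: node -> (downstream node, reach length in km).
# A and B sit on two tributaries that join at C; C drains to the outlet O.
DOWNSTREAM = {"A": ("C", 3.0), "B": ("C", 4.0), "C": ("O", 2.0), "O": (None, 0.0)}
# Hypothetical planar coordinates (km), used only for straight-line distance.
COORDS = {"A": (0.0, 3.0), "B": (4.0, 0.0), "C": (0.0, 0.0), "O": (0.0, -2.0)}


def path_to_outlet(node):
    """Map every node on the downstream path to its distance from `node`."""
    travelled, path = 0.0, {}
    while node is not None:
        path[node] = travelled
        node, length = DOWNSTREAM[node]
        travelled += length
    return path


def sld(a, b):
    """Straight-line distance ("as the crow flies")."""
    return math.dist(COORDS[a], COORDS[b])


def shd(a, b):
    """Symmetric hydrologic distance: shortest route along the network."""
    pa, pb = path_to_outlet(a), path_to_outlet(b)
    return min(pa[n] + pb[n] for n in pa if n in pb)


def ahd(a, b):
    """Asymmetric hydrologic distance from a down to b; None if b is not downstream of a."""
    return path_to_outlet(a).get(b)


print(sld("A", "B"), shd("A", "B"), ahd("A", "B"), ahd("A", "O"))
```

Sites A and B are neighbors under SLD and SHD but have no AHD relationship, because neither drains into the other.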
9
Candidate Models

Restrict the model space to general linear
models
Look at all possible subsets of explanatory
variables X (Hoeting et al.)
Require a correlation structure that can
accommodate the various distance measures
Could assume that the residuals are spatially
independent, i.e., $\Sigma = \sigma^2 I$ (probably not best)
Ver Hoef et al. propose a better solution

10
Asymmetric Autocovariance Models for Stream
Networks

Weighted asymmetric hydrologic distance (WAHD)
Developed by Jay Ver Hoef, National Marine Mammal
Laboratory, Seattle
Moving average models
Incorporates flow and uses hydrologic distance
Represents discontinuity at confluences

11
Exponential Correlation Structure

The exponential correlation function can be used
for both SLD and SHD
For AHD, one must multiply $\Sigma$ (element-wise) by
the weight matrix $A$, i.e., the $(i,j)$ element
becomes $a_{ij}\,\Sigma_{ij}$, hence WAHD
The weights represent the proportion of flow
volume that the downstream location receives from
the upstream location
Estimating the $a_{ij}$ is non-trivial; special
GIS tools are needed (Theobald et al.)
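
The element-wise weighting can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the parameterization exp(-d / theta), the distance matrix `D`, and the weight matrix `A` are all hypothetical.

```python
import numpy as np


def exponential_correlation(dist, theta):
    """Exponential correlation exp(-d / theta); this parameterization is an assumption."""
    return np.exp(-np.asarray(dist, dtype=float) / theta)


# Hypothetical hydrologic distances (km) among three sites;
# site 2 lies downstream of both site 0 and site 1.
D = np.array([[0.0, 7.0, 5.0],
              [7.0, 0.0, 6.0],
              [5.0, 6.0, 0.0]])

# Hypothetical weights a_ij: zero where two sites are not flow-connected.
A = np.array([[1.0, 0.0, 0.4],
              [0.0, 1.0, 0.6],
              [0.4, 0.6, 1.0]])

R = exponential_correlation(D, theta=10.0)
R_wahd = A * R  # element-wise (Hadamard) product, not a matrix product

print(np.round(R_wahd, 3))
```

Sites 0 and 1 sit on different tributaries, so their weighted correlation is exactly zero even though their hydrologic distance is finite.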

12
GIS Tools
Theobald et al. have created automated tools to
extract data about hydrologic relationships
between sample points
Visual Basic for Applications programs that:

Calculate separation distances between sites
- SLD, SHD, asymmetric hydrologic distance (AHD)
Calculate watershed covariates for each stream
segment
- Functional Linkage of Watersheds and Streams
(FLoWS)
Convert GIS data to a format compatible with
statistics software

(Figure panels: SLD; AHD)
13
Spatial Weights for WAHD

Proportional influence: influence of each
neighboring sample site on a downstream sample
site
Weighted by catchment area (surrogate for flow)

Calculate influence of each upstream segment on
segment directly downstream
Calculate the proportional influence of one
sample site on another
Multiply the edge proportional influences
Output
n × n weighted incidence matrix
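
The steps above can be sketched as follows. The segment names, catchment areas, and the assumption of one survey site per segment are hypothetical. The edge rule used here (a segment's catchment area divided by the total area entering the same downstream segment) is one plausible reading of "weighted by catchment area", not a confirmed description of the GIS tools.

```python
import numpy as np

# Hypothetical segments: name -> (segment directly downstream, catchment area in km^2).
SEGMENTS = {
    "s1": ("s3", 30.0),
    "s2": ("s3", 10.0),
    "s3": ("s5", 40.0),
    "s4": ("s5", 60.0),
    "s5": (None, 100.0),
}


def edge_pi(seg):
    """Share of the area entering the downstream segment that comes from `seg`."""
    down, area = SEGMENTS[seg]
    total = sum(a for d, a in SEGMENTS.values() if d == down)
    return area / total


def site_pi(upstream, downstream):
    """Product of edge influences along the flow path; 0 if not flow-connected."""
    pi, seg = 1.0, upstream
    while seg is not None and seg != downstream:
        pi *= edge_pi(seg)
        seg = SEGMENTS[seg][0]
    return pi if seg == downstream else 0.0


# One hypothetical survey site on each of these segments.
sites = ["s1", "s2", "s4", "s5"]
# W[i, j] = proportional influence of site i on site j (n x n, not symmetric).
W = np.array([[site_pi(u, d) for d in sites] for u in sites])

print(np.round(W, 2))
```

In this toy network the influences of s1, s2, and s4 on the outlet site s5 are 0.3, 0.1, and 0.6, which sum to one.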

18
Parameter Estimation

Maximize the (profile) likelihood to obtain
estimates for $\beta$, $\theta$, and $\sigma^2$

Profile likelihood
MLEs
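
The slide's formulas were not transcribed, so the following is only a sketch of the standard approach for a Gaussian linear model with covariance sigma2 * R(theta). For a fixed theta, the regression coefficients and sigma2 have closed-form maximizers, leaving a one-dimensional search over theta. The exponential form of R, the simulated data, and the grid search are all assumptions made for illustration.

```python
import numpy as np


def neg2_profile_loglik(theta, Z, X, D):
    """-2 x profile log-likelihood for Z ~ N(X beta, sigma2 * R(theta)).

    R(theta) = exp(-D / theta) is an assumed exponential correlation form.
    Returns the criterion and the beta and sigma2 that maximize the
    likelihood for this fixed theta.
    """
    n = len(Z)
    R = np.exp(-D / theta)
    L = np.linalg.cholesky(R)                        # R = L L^T
    Xw = np.linalg.solve(L, X)                       # whitened design matrix
    Zw = np.linalg.solve(L, Z)                       # whitened response
    beta, *_ = np.linalg.lstsq(Xw, Zw, rcond=None)   # GLS estimate of beta
    resid = Zw - Xw @ beta
    sigma2 = float(resid @ resid) / n                # MLE of sigma2 given theta
    logdet_R = 2.0 * np.sum(np.log(np.diag(L)))
    return n * np.log(2 * np.pi * sigma2) + logdet_R + n, beta, sigma2


# Hypothetical data: 30 sites, an intercept plus one covariate.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
Z = X @ np.array([1.0, 2.0]) + rng.normal(size=30)

# Crude grid search over theta; a numerical optimizer could be used instead.
grid = np.linspace(0.5, 10.0, 20)
values = [neg2_profile_loglik(t, Z, X, D)[0] for t in grid]
theta_hat = grid[int(np.argmin(values))]
_, beta_hat, sigma2_hat = neg2_profile_loglik(theta_hat, Z, X, D)
```

Whitening with the Cholesky factor avoids forming the inverse of R explicitly, which matters more as the number of sites grows.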
19
Model Selection

Hoeting et al. adapted the corrected Akaike
Information Criterion (AICC) for spatial models
AICC estimates the difference between the
candidate model and the true model
Select models with small AICC

where n is the number of observations, p-1 is the
number of covariates, and k is the number of
autocorrelation parameters
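
The formula itself did not survive in this transcript. The sketch below assumes the form commonly attributed to Hoeting et al., AICC = -2 log L + 2n(p + k + 1) / (n - p - k - 2), which matches the definitions of n, p, and k given above. Treat it as an assumption rather than the slide's exact expression. The log-likelihood values in the example are hypothetical.

```python
def aicc(neg2_loglik, n, p, k):
    """Corrected AIC for a spatial linear model (assumed form, see lead-in).

    neg2_loglik: -2 x maximized log-likelihood
    n: number of observations
    p: number of regression coefficients (p - 1 covariates plus an intercept)
    k: number of autocorrelation parameters
    """
    denom = n - p - k - 2
    if denom <= 0:
        raise ValueError("AICC requires n > p + k + 2")
    return neg2_loglik + 2.0 * n * (p + k + 1) / denom


# Hypothetical comparison: an independence model (k = 0) against a
# spatial model with two autocorrelation parameters (k = 2).
independent = aicc(neg2_loglik=262.0, n=100, p=3, k=0)
spatial = aicc(neg2_loglik=250.0, n=100, p=3, k=2)
best = "spatial" if spatial < independent else "independent"
```

The penalty grows with both p and k, so a spatial model is preferred only when its likelihood gain outweighs the extra autocorrelation parameters.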
20
Spatial Distribution of MBSS Data
21
Summary Statistics for Distance Measures

The choice of distance measure greatly affects the
number of neighboring sites as well as the median,
mean, and maximum separation distance between sites

22
Comparing Distance Measures

The selected models (one for each distance
measure) were compared by computing the mean
square prediction error (MSPE)
GLM: assumed independent errors
Withheld the same 100 (randomly) selected records
from each model fit
Want MSPE to be small
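
A minimal sketch of the comparison, assuming MSPE is the average squared difference between withheld observations and their predictions. The site count and random seed are illustrative assumptions. The point is that one fixed hold-out set is reused for every model, so the MSPE values are comparable.

```python
import numpy as np


def mspe(observed, predicted):
    """Mean square prediction error over the withheld records."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


# Withhold the same 100 randomly selected records from every model fit.
n_sites = 881                      # assumed total number of records
rng = np.random.default_rng(42)    # fixed seed so every model sees the same split
holdout = rng.choice(n_sites, size=100, replace=False)
train = np.setdiff1d(np.arange(n_sites), holdout)

# Each candidate model would be fit on `train` and scored on `holdout`, e.g.
#   score = mspe(z[holdout], model_predictions[holdout])
```
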

23
Comparing Distance Measures: Prediction
Performance for Various Responses
24
Maps of the Relative Weights

Generated maps by kriging (interpolation)
Predicted values are linear combinations of the
observed data, i.e.,

$Z_1$ is the observed data, $Z_2$ is the predicted
value, $\Sigma_{11}$ is the correlation matrix for the
observed sites, and $\Sigma_{21}$ is the correlation matrix
between the prediction site and the observed sites
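
The prediction equation was not transcribed. The sketch below assumes the simple-kriging form, in which the weights solve S11 w = s21 and the prediction is the weighted sum of the observed values. `S11` is the correlation matrix among observed sites and `s21` holds the correlations between the prediction site and the observed sites. The distances, the range parameter, and the zero-mean residuals are hypothetical.

```python
import numpy as np


def kriging_weights(S11, s21):
    """Solve S11 w = s21 so that the prediction is w @ Z1 (simple-kriging form)."""
    return np.linalg.solve(S11, s21)


# Hypothetical distances (km): three observed sites and one prediction site.
D11 = np.array([[0.0, 2.0, 5.0],
                [2.0, 0.0, 4.0],
                [5.0, 4.0, 0.0]])
d21 = np.array([1.0, 1.5, 4.5])

theta = 3.0                        # assumed range parameter
S11 = np.exp(-D11 / theta)         # correlations among observed sites
s21 = np.exp(-d21 / theta)         # correlations with the prediction site
Z1 = np.array([0.4, -0.1, 0.7])    # hypothetical zero-mean residuals

w = kriging_weights(S11, s21)      # relative weight given to each observed site
Z2_hat = w @ Z1                    # predicted value at the new site
```

Mapping `w` for a fixed prediction site shows which observed sites drive the prediction under each distance measure.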
25
Relative Weights Used to Make Prediction at Site
465
26
(No Transcript)
27
Residual Correlations for Site 465
(Figure panels: Straight-line; General Linear Model)
28
(No Transcript)
29
Some Comments on the Sampling Design

Probability-based random survey design
Designed to maximize spatial independence of
survey sites
Does not adequately represent spatial
relationships in stream networks using hydrologic
distance measures

244 sites did not have neighbors
Sample size: 881
Number of sites with 1 neighbor: 393
Mean number of neighbors per site: 2.81
30
Conclusions

A collaborative effort enabled the analysis of a
complicated problem
Ecology: posed the problem of interest, provides
insight into variable (model) selection
Geoscience: development of powerful tools based
on GIS
Statistics: development of valid covariance
structures, model selection techniques
Others: e.g., very understanding (and
sympathetic) spouses