Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models

1
Using the Maryland Biological Stream Survey Data
to Test Spatial Statistical Models
  • A Collaborative Approach to Analyzing Stream
    Network Data

Andrew A. Merton
2
Overview
  • The material presented here is a subset of the
    work done by Erin Peterson for her Ph.D.
  • Interested in developing geostatistical models
    for predicting water quality characteristics in
    stream segments
  • Data: Maryland Biological Stream Survey (MBSS)
  • The scope and nature of the problem requires
    interdisciplinary collaboration
  • Ecology, geoscience, statistics, others

3
Stream Network Data
  • The response data consist of observations
    within a stream network
  • What does it mean to be a neighbor in such a
    framework?
  • How does one characterize the distance between
    neighbors?
  • Should distance measures be confined to the
    stream network?
  • Does flow (direction) matter?

4
Stream Network Data
  • Potential explanatory variables are not
    restricted to be within the stream network
  • Topography, soil type, land usage, etc.
  • How does one sensibly incorporate these
    explanatory variables into the analysis?
  • Can we develop tools to aggregate upstream
    watershed covariates for subsequent downstream
    segments?

5
Competing Models
  • Given a collection of competing models, how does
    one select the best model?
  • Is one subset of explanatory variables better or
    closer to the true model?
  • Should one assume correlated residuals and, if
    so, what form should the correlation function
    take?
  • How does the distance measure impact the choice
    of correlation function?

6
Functional Distances and Spatial Relationships
  • Geostatistical models are based on straight-line
    distance
  • Straight-line distance (SLD): "as the crow
    flies"
  • Is this an appropriate measure of distance?
  • Influential continuous landscape variables:
    geology type or acid rain
7
Functional Distances and Spatial Relationships
  • Distances and relationships are represented
    differently depending on the distance measure
  • Symmetric hydrologic distance (SHD): "as the
    fish swims"
  • Hydrologic connectivity
8
Functional Distances and Spatial Relationships
  • Distances and relationships are represented
    differently depending on the distance measure
  • Asymmetric hydrologic distance (AHD): "as the
    sht flows"
  • Longitudinal transport of material
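
The three distance measures can be contrasted on a toy network. The following is a minimal sketch, not the GIS tooling used in the study: the network, segment lengths, and coordinates are made up, and the function names (`sld`, `shd`, `ahd`) are hypothetical. It assumes SHD is the shortest along-network path regardless of flow direction, and AHD is defined only when one site lies downstream of the other.

```python
import math

# Hypothetical toy network: node -> (downstream node, length of joining segment).
DOWNSTREAM = {
    "a": ("c", 2.0),
    "b": ("c", 3.0),
    "c": ("outlet", 4.0),
    "outlet": (None, 0.0),
}
# Hypothetical planar coordinates, used only for straight-line distance.
COORDS = {"a": (0.0, 5.0), "b": (4.0, 5.0), "c": (2.0, 4.0), "outlet": (2.0, 0.0)}


def path_to_outlet(node):
    """Map each node on the downstream path to its distance from `node`."""
    travelled, path = 0.0, {}
    while node is not None:
        path[node] = travelled
        node, length = DOWNSTREAM[node]
        travelled += length
    return path


def sld(u, v):
    """Straight-line distance (as the crow flies)."""
    return math.dist(COORDS[u], COORDS[v])


def shd(u, v):
    """Symmetric hydrologic distance: along the network, ignoring flow direction."""
    pu, pv = path_to_outlet(u), path_to_outlet(v)
    # The nearest common downstream node gives the shortest in-stream route.
    return min(pu[n] + pv[n] for n in pu if n in pv)


def ahd(u, v):
    """Asymmetric hydrologic distance from u down to v; None if v is not downstream of u."""
    return path_to_outlet(u).get(v)
```

In this toy network, sites `a` and `b` sit on different tributaries: they have a finite SLD and SHD, but `ahd("a", "b")` is `None` because water from `a` never passes `b`.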
9
Candidate Models
  • Restrict the model space to general linear
    models
  • Look at all possible subsets of explanatory
    variables X (Hoeting et al.)
  • Require a correlation structure that can
    accommodate the various distance measures
  • Could assume that the residuals are spatially
    independent, i.e., $\Sigma = \sigma^2 I$ (probably not best)
  • Ver Hoef et al. propose a better solution

10
Asymmetric Autocovariance Models for Stream
Networks
  • Weighted asymmetric hydrologic distance (WAHD)
  • Developed by Jay Ver Hoef, National Marine Mammal
    Laboratory, Seattle
  • Moving average models
  • Incorporates flow and uses hydrologic distance
  • Represents discontinuity at confluences

11
Exponential Correlation Structure
  • The exponential correlation function can be used
    for both SLD and SHD
  • For AHD, one must multiply the correlation matrix
    $\Sigma$ (element-wise) by the weight matrix $A$,
    i.e., entry $\Sigma_{ij}$ becomes $a_{ij}\,\Sigma_{ij}$,
    hence WAHD
  • The weights represent the proportion of flow
    volume that the downstream location receives from
    the upstream location
  • Estimating the $a_{ij}$ is non-trivial; special
    GIS tools are needed (Theobald et al.)
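
The element-wise weighting can be sketched in a few lines. This is an illustration only: it assumes the exponential correlation is parameterized as exp(-d / theta) with a range parameter `theta`, and the distances and weights below are invented rather than taken from the MBSS data. The function names are hypothetical.

```python
import numpy as np


def exponential_corr(dist, theta):
    """Exponential correlation exp(-d / theta); theta is an assumed range parameter."""
    return np.exp(-np.asarray(dist, dtype=float) / theta)


def wahd_corr(dist, weights, theta):
    """Element-wise product of the weight matrix A and the exponential correlation."""
    return np.asarray(weights, dtype=float) * exponential_corr(dist, theta)


# Made-up example: sites 0 and 1 are on separate tributaries above site 2.
dist = np.array([[0.0, 5.0, 2.0],
                 [5.0, 0.0, 3.0],
                 [2.0, 3.0, 0.0]])
# Made-up weights: 1 on the diagonal, 0 where sites are not flow-connected.
A = np.array([[1.0, 0.0, 0.4],
              [0.0, 1.0, 0.6],
              [0.4, 0.6, 1.0]])

R_shd = exponential_corr(dist, theta=4.0)   # usable as-is for SLD or SHD
R_wahd = wahd_corr(dist, A, theta=4.0)      # flow-unconnected pairs become 0
```

The zero in `A` for the tributary pair is what lets the model represent a discontinuity at the confluence. Arbitrary weights are not guaranteed to yield a valid (positive-definite) matrix, which is one reason estimating the weights needs care.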

12
GIS Tools
Theobald et al. have created automated tools to
extract data about hydrologic relationships
between sample points. These are Visual Basic for
Applications programs that
  • Calculate separation distances between sites:
    SLD, SHD, and asymmetric hydrologic distance (AHD)
  • Calculate watershed covariates for each stream
    segment: Functional Linkage of Watersheds and
    Streams (FLoWS)
  • Convert GIS data to a format compatible with
    statistics software

[Figure panels: SLD, AHD]
13
Spatial Weights for WAHD
  • Proportional influence: the influence of each
    neighboring sample site on a downstream sample
    site
  • Weighted by catchment area (a surrogate for flow)
  • Calculate influence of each upstream segment on
    segment directly downstream
  • Calculate the proportional influence of one
    sample site on another
  • Multiply the edge proportional influences
  • Output: n × n weighted incidence matrix
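
The steps listed on this slide can be sketched on a small network. This is an illustration under stated assumptions, not the GIS implementation: the segments and catchment areas are invented, each sample site is identified with the segment it sits on, and an edge's influence is taken to be its catchment area divided by the total area of all segments entering the same downstream segment. All names are hypothetical.

```python
# Hypothetical network: segment -> (segment directly downstream, catchment area).
SEGMENTS = {
    "s1": ("s3", 30.0),
    "s2": ("s3", 10.0),
    "s3": ("s4", 45.0),
    "s5": ("s4", 15.0),
    "s4": (None, 100.0),
}


def edge_influence(seg):
    """Share of the downstream segment's inflow area contributed by `seg`."""
    down, area = SEGMENTS[seg]
    total = sum(a for d, a in SEGMENTS.values() if d == down)
    return area / total


def proportional_influence(up, down):
    """Product of edge influences from `up` to `down`; 0.0 if not flow-connected."""
    influence, seg = 1.0, up
    while seg != down:
        nxt = SEGMENTS[seg][0]
        if nxt is None:          # reached the outlet without passing `down`
            return 0.0
        influence *= edge_influence(seg)
        seg = nxt
    return influence


def weight_matrix(sites):
    """n x n matrix; row = upstream site, column = downstream site."""
    return [[proportional_influence(u, d) for d in sites] for u in sites]


sites = ["s1", "s2", "s3", "s4"]
W = weight_matrix(sites)
```

Here `s1` supplies 0.75 of the area entering `s3`, and `s3` supplies 0.75 of the area entering `s4`, so the influence of `s1` on `s4` is their product. Pairs on separate tributaries, or ordered against the flow, get zero.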

18
Parameter Estimation
  • Maximize the (profile) likelihood to obtain
    estimates for $\beta$, $\theta$, and $\sigma^2$

[Equations not captured in transcript: profile likelihood; MLEs]
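
The equations on this slide did not survive in the transcript, so the following is only a sketch of one standard way to profile a Gaussian likelihood, not necessarily the presenter's formulation. It assumes an exponential correlation exp(-d / theta), a single range parameter searched over a grid, and simulated data; `profile_loglik` and the demo variables are hypothetical names.

```python
import numpy as np


def profile_loglik(theta, z, X, dist):
    """Gaussian log-likelihood with beta and sigma^2 profiled out for a given theta."""
    n = len(z)
    R = np.exp(-dist / theta)                 # assumed exponential correlation
    L = np.linalg.cholesky(R)                 # R = L L^T
    Xw = np.linalg.solve(L, X)                # whitened design matrix
    zw = np.linalg.solve(L, z)                # whitened response
    beta, *_ = np.linalg.lstsq(Xw, zw, rcond=None)   # generalized least squares
    resid = zw - Xw @ beta
    sigma2 = float(resid @ resid) / n         # ML estimate of the variance
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    loglik = -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)
    return loglik, beta, sigma2


# Simulated demo data (made up for illustration).
rng = np.random.default_rng(0)
n = 30
coords = rng.uniform(0.0, 10.0, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
L_true = np.linalg.cholesky(np.exp(-dist / 2.0))
z = X @ np.array([1.0, 0.5]) + L_true @ rng.standard_normal(n)

# Grid search over theta; beta and sigma^2 follow in closed form.
thetas = np.linspace(0.5, 10.0, 20)
theta_hat = max(thetas, key=lambda t: profile_loglik(t, z, X, dist)[0])
loglik_hat, beta_hat, sigma2_hat = profile_loglik(theta_hat, z, X, dist)
```

Profiling reduces the search to the correlation parameter alone, since the regression coefficients and variance have closed-form estimates once the correlation matrix is fixed.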
19
Model Selection
  • Hoeting et al. adapted the corrected Akaike
    Information Criterion (AICC) for spatial models
  • AICC estimates the difference between the
    candidate model and the true model
  • Select models with small AICC

where n is the number of observations, p-1 is the
number of covariates, and k is the number of
autocorrelation parameters
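
The AICC formula itself is missing from the transcript. As a hedged sketch, the function below assumes the form commonly quoted for the spatial AICC, namely -2 log L + 2n(p + k + 1) / (n - p - k - 2), with n, p, and k as defined on the slide. The function name and the example numbers are hypothetical.

```python
def aicc(loglik, n, p, k):
    """Corrected AIC for a spatial model (assumed form, see lead-in).

    loglik: maximized log-likelihood
    n: number of observations
    p: number of regression coefficients (p - 1 covariates plus an intercept)
    k: number of autocorrelation parameters
    """
    denom = n - p - k - 2
    if denom <= 0:
        raise ValueError("need n > p + k + 2 for the correction term")
    return -2.0 * loglik + 2.0 * n * (p + k + 1) / denom


# Made-up comparison: the extra covariate must buy enough likelihood to pay
# for its penalty.
small_model = aicc(loglik=-100.0, n=50, p=3, k=2)
large_model = aicc(loglik=-99.8, n=50, p=4, k=2)
preferred = "small" if small_model < large_model else "large"
```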
20
Spatial Distribution of MBSS Data
21
Summary Statistics for Distance Measures
  • Distance measure greatly impacts the number of
    neighboring sites as well as the median, mean,
    and maximum separation distance between sites

22
Comparing Distance Measures
  • The selected models (one for each distance
    measure) were compared by computing the mean
    square prediction error (MSPE)
  • GLM: assumed independent errors
  • Withheld the same 100 (randomly) selected records
    from each model fit
  • Want MSPE to be small
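
The hold-out comparison can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, the seed is arbitrary, and 881 is the sample size reported later in the talk. The key point is that one set of withheld indices is drawn once and reused for every model so the MSPE values are comparable.

```python
import numpy as np


def holdout_split(n, n_holdout, seed=0):
    """Return (fit_idx, holdout_idx) from one random permutation of range(n)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    return np.sort(idx[n_holdout:]), np.sort(idx[:n_holdout])


def mspe(observed, predicted):
    """Mean square prediction error over the withheld records."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


# Draw the split once, then reuse holdout_idx for every candidate model.
fit_idx, holdout_idx = holdout_split(n=881, n_holdout=100)
```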

23
Comparing Distance Measures: Prediction
Performance for Various Responses
24
Maps of the Relative Weights
  • Generated maps by kriging (interpolation)
  • Predicted values are linear combinations of the
    observed data, i.e.,

$Z_1$ is the observed data, $Z_2$ is the predicted
value, $\Sigma_{11}$ is the correlation matrix for the
observed sites, and $\Sigma$ is the correlation matrix
between the prediction site and the observed sites
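
The kriging equation is missing from the transcript. The sketch below assumes the simple-kriging form, in which the weights solve the linear system defined by the observed-site correlation matrix and the prediction-to-observed correlations, and the prediction is the weighted sum of the observed data (treated as mean-zero, for example residuals). Names and numbers are hypothetical.

```python
import numpy as np


def kriging_weights(corr_obs, corr_pred_obs):
    """Solve corr_obs @ w = corr_pred_obs for the relative weights w."""
    return np.linalg.solve(corr_obs, corr_pred_obs)


def krige(z_obs, corr_obs, corr_pred_obs):
    """Prediction as a linear combination of the observed data."""
    return float(kriging_weights(corr_obs, corr_pred_obs) @ z_obs)


# Made-up example: three observed sites and one prediction site.
dist_obs = np.array([[0.0, 1.0, 4.0],
                     [1.0, 0.0, 3.0],
                     [4.0, 3.0, 0.0]])
dist_pred = np.array([0.5, 0.5, 3.5])
theta = 2.0                                   # assumed range parameter
corr_obs = np.exp(-dist_obs / theta)
corr_pred_obs = np.exp(-dist_pred / theta)

z_obs = np.array([1.2, 0.8, -0.5])
weights = kriging_weights(corr_obs, corr_pred_obs)
prediction = krige(z_obs, corr_obs, corr_pred_obs)
```

Mapping `weights` back onto the site locations gives the kind of relative-weight map the talk describes: under a different distance measure the correlations change, and so does the set of sites that dominate the prediction.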
25
Relative Weights Used to Make Prediction at Site
465
27
Residual Correlations for Site 465
[Figure panels: Straight-line; General Linear Model]
29
Some Comments on the Sampling Design
  • Probability-based random survey design
  • Designed to maximize spatial independence of
    survey sites
  • Does not adequately represent spatial
    relationships in stream networks using hydrologic
    distance measures

  • Sample size: 881
  • 244 sites did not have neighbors
  • Number of sites with 1 neighbor: 393
  • Mean number of neighbors per site: 2.81
30
Conclusions
  • A collaborative effort enabled the analysis of a
    complicated problem
  • Ecology: posed the problem of interest, provides
    insight into variable (model) selection
  • Geoscience: development of powerful tools based
    on GIS
  • Statistics: development of valid covariance
    structures, model selection techniques
  • Others: e.g., very understanding (and
    sympathetic) spouses