Local Enhancement of Global Estimation
Molly Leecaster, Ph.D., and Kerry Ritter, Ph.D.
DAMARS and STARMAP 2nd Annual Conference, Oregon State University, Corvallis, OR, August 11, 2003
Acknowledgement
PROJECT FUNDING
- The work reported here was developed under the
STAR Research Assistance Agreement CR-829095
awarded by the U.S. Environmental Protection
Agency (EPA) to Colorado State University. This
presentation has not been formally reviewed by
EPA. The views expressed here are solely those
of the presenter and STARMAP, the Program they
represent. EPA does not endorse any products or
commercial services mentioned in this
presentation.
Outline of Presentation
- Introduction
- Two-stage sample design
- Spatial modeling of binary EMAP data
  - Indicator kriging
  - Conditional autoregressive model
- Simulation Example
- Future work
Introduction
- EMAP was developed to estimate the areal extent of resources
  - Sample locations are spatially separated
- EMAP participants are interested in global estimation but also have local concerns
  - Spatial modeling
- EMAP data do not provide information on the local spatial structure required for good spatial models
- Therefore:
  - Augment the EMAP design to improve spatial modeling
Goals
- Present an enhancement to the EMAP design
- Use the enhanced sample in spatial models of indicator data
  - Indicator kriging
  - Conditional autoregressive model
Outline of Presentation
- Introduction
- Two-stage sample design
- Spatial modeling of EMAP data
- Simulation Example
- Future work
Two-Stage Systematic Grid Plus Star Cluster Sample Design
- Two stages because there are two goals
  - Systematic (EMAP) grid for global structure
  - Star cluster sample for variogram estimation
- Enhance the EMAP design with additional sample locations
  - Ideal for areal extent and prediction
  - Ideal for variogram estimation
Two-Stage Design
[Figure: map of the two-stage sample design. Pink = absence, blue = presence, black = systematic sites, green = star clusters 1, orange = star clusters 2]
Stage One: Systematic Component (EMAP)
- Based on global estimation requirements
  - e.g., 30 spatially separated locations per stratum
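To make the stage-one requirement concrete, the sketch below computes an areal-extent estimate (the proportion of sites where the resource is present, scaled by stratum area) and a simple standard error from one stratum's systematic sample. It treats the systematic sample as a simple random sample, an approximation assumed here for illustration only; a design-based variance estimator would be needed for a formal analysis. The function name `areal_extent` and its arguments are hypothetical.

```python
import math

def areal_extent(presence, stratum_area):
    """Estimate the areal extent of a resource from a stratum's systematic sample.

    presence: sequence of 0/1 indicators (1 = resource present at the site).
    Returns (proportion, extent in area units, standard error of the extent).
    Assumption: the standard error treats the systematic grid as a simple
    random sample, which is an approximation.
    """
    n = len(presence)
    if n < 2:
        raise ValueError("need at least two sites")
    p = sum(presence) / n
    se_p = math.sqrt(p * (1.0 - p) / (n - 1))
    return p, p * stratum_area, se_p * stratum_area

# Example: 30 sites in a stratum of 1000 area units, 9 with the resource present.
p, extent, se = areal_extent([1] * 9 + [0] * 21, 1000.0)
print(f"proportion={p:.2f}, extent={extent:.0f}, SE={se:.1f}")
```

With n = 30 the standard error of the proportion is at most sqrt(0.25/29) ≈ 0.093, which can be adequate for a global estimate but says nothing about spatial structure at distances shorter than the grid spacing.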
Stage Two: Star Cluster Component
- Star clusters of sample sites around stage-one locations
- Star clusters provide an estimate of small-scale pairwise variance
- Star clusters also add many pairs of samples at various distance lags
- Star clusters provide directional information at small scale
- How should star clusters be specified?
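The pair-count argument can be checked directly. Below is a minimal sketch of the classical (Matheron) semivariogram estimator, γ̂(h) = (1/(2N(h))) Σ (z_i − z_j)², with equal-width lag classes; it also returns N(h), the number of pairs per lag class. Running it on a systematic sample alone, and then on the same sample plus star-cluster sites, shows where the extra pairs fall: a systematic grid has no pairs closer than its spacing. The function name and binning scheme are illustrative choices, not necessarily those used in the study.

```python
import numpy as np

def empirical_variogram(xy, z, lag_width, n_lags):
    """Classical (Matheron) semivariogram with equal-width lag classes.

    Returns lag-class centers, semivariance per class (NaN if a class has
    no pairs), and the number of pairs N(h) per class.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)          # every unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq = 0.5 * (z[i] - z[j]) ** 2           # (z_i - z_j)^2 / 2
    lag = np.floor(dist / lag_width).astype(int)
    keep = lag < n_lags                          # drop pairs beyond the last class
    npairs = np.bincount(lag[keep], minlength=n_lags)
    sums = np.bincount(lag[keep], weights=half_sq[keep], minlength=n_lags)
    with np.errstate(divide="ignore", invalid="ignore"):
        gamma = sums / npairs                    # mean of half squared differences
    centers = (np.arange(n_lags) + 0.5) * lag_width
    return centers, gamma, npairs
```

For indicator data, z holds the 0/1 presence values, so each pair contributes 0 or 0.5.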
Stage Two: Star Cluster Component
- Location of star clusters
  - Adaptive: locate at a specified observed response
    - Does this bias the variogram estimation?
  - Random stage-one locations
  - Systematic subset of stage-one locations
- Size of star clusters
  - Diameter of star < variogram range
  - Diameter of star > variogram range
- Number of star clusters
  - At least two, but how many more?
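The slides leave the star geometry open. One way to make the options concrete is sketched below, under assumptions of our own: each star has six arms at 60° spacing (matching a hexagonal grid) with two sites per arm, which adds 12 sites per cluster. That count is consistent with the sample sizes used later (30 + 2 × 12 = 54 and 30 + 4 × 12 = 78), but the arm layout itself is hypothetical, as are the function names. Cluster centers are drawn at random from the stage-one sites, one of the location options listed above; the star diameter is twice `arm_length`, so it can be set below or above the variogram range.

```python
import numpy as np

def star_cluster(center, arm_length, sites_per_arm=2, n_arms=6):
    """Hypothetical star: n_arms rays at equal angles around `center`, each
    with `sites_per_arm` sites evenly spaced out to `arm_length`. The center
    itself (a stage-one site) is not repeated."""
    cx, cy = center
    angles = 2.0 * np.pi * np.arange(n_arms) / n_arms
    radii = arm_length * np.arange(1, sites_per_arm + 1) / sites_per_arm
    return np.array([(cx + r * np.cos(a), cy + r * np.sin(a))
                     for a in angles for r in radii])

def two_stage_design(stage_one_xy, n_stars, arm_length, seed=0):
    """Stage one: the systematic sites. Stage two: stars centered on randomly
    chosen stage-one sites. Returns all sites and the indices of the hubs."""
    stage_one_xy = np.asarray(stage_one_xy, dtype=float)
    gen = np.random.default_rng(seed)
    hubs = gen.choice(len(stage_one_xy), size=n_stars, replace=False)
    stars = [star_cluster(stage_one_xy[k], arm_length) for k in hubs]
    return np.vstack([stage_one_xy] + stars), hubs

# Example: a hypothetical 6 x 5 systematic grid (30 sites, spacing 10) plus 2 stars.
grid = [(10.0 * c, 10.0 * r) for r in range(5) for c in range(6)]
sites, hubs = two_stage_design(grid, n_stars=2, arm_length=5.0)
print(sites.shape)  # (54, 2)
```

Choosing hubs adaptively (at a specified observed response) would only change how `hubs` is selected, which is exactly where the bias question above arises.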
Spatial Models for Binary Data
- Indicator kriging for geo-referenced data
- Conditional autoregressive model for binary
lattice data
Indicator Kriging
- Binary geo-referenced data
- Spatial correlation structure modeled from data
- Precision of predictions depends on sample
spacing and variogram parameters
Ordinary Indicator Kriging
- Estimate the local indicator mean at each location
- Apply the simple indicator kriging (IK) estimator using the estimated mean
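The two steps above can be written out directly. The sketch below assumes a known spherical covariance with zero nugget (the form used in the simulation later: range 22, sill 0.4). It first estimates the mean indicator by generalized least squares and then applies the simple kriging predictor around that estimate. For simplicity it uses every sample as the neighborhood, so the "local" mean here is the mean over all samples; restricting the computation to a search neighborhood would make it local. This two-step form gives the same predictor as the usual ordinary kriging system with a Lagrange multiplier. Function names are hypothetical.

```python
import numpy as np

def spherical_cov(h, sill=0.4, range_=22.0):
    """Covariance C(h) = sill - gamma(h) for a spherical model with zero nugget."""
    r = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def ordinary_indicator_kriging(xy, z, x0, sill=0.4, range_=22.0):
    """Predict the indicator (probability of presence) at x0 from 0/1 data z."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    C = spherical_cov(dists, sill, range_)                      # data-to-data
    c0 = spherical_cov(np.linalg.norm(xy - np.asarray(x0, dtype=float), axis=1),
                       sill, range_)                            # data-to-target
    ones = np.ones_like(z)
    # Step 1: generalized least-squares estimate of the mean indicator.
    m_hat = (ones @ np.linalg.solve(C, z)) / (ones @ np.linalg.solve(C, ones))
    # Step 2: simple kriging of the residuals around that estimated mean.
    weights = np.linalg.solve(C, c0)
    return m_hat + weights @ (z - m_hat)
```

With zero nugget the predictor reproduces the data at sample locations; elsewhere it is not constrained to [0, 1], a known property of indicator kriging.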
Conditional Autoregressive Model for Binary Data
- Binary lattice data
- Spatial correlation structure assumed to be locally (neighborhood) dependent: a Markov random field
- Neighborhood defined as a fixed pattern of surrounding grid points
- Precision of predictions depends on neighborhood structure, grid size, and variance of the response
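The slides do not give the model's equations. As one common way to specify such a model (an assumption, not necessarily the presenters' exact form), the sketch below uses an intrinsic CAR random effect φ on a hexagonal lattice with a logit link: given its neighbors, φ_i is normal with mean equal to the neighbor average and variance τ²/n_i, where n_i is the number of neighbors, and the probability of presence is logistic(α + φ_i). The neighborhood is the six adjacent cells of an "odd-r" offset hexagon layout (odd rows shifted right by half a cell), a fixed pattern of surrounding grid points; edge cells have fewer neighbors. The offsets, names, and parameters are illustrative.

```python
import numpy as np

# (d_col, d_row) offsets to the six neighbors in an "odd-r" hexagonal layout.
EVEN_ROW_OFFSETS = [(+1, 0), (0, -1), (-1, -1), (-1, 0), (-1, +1), (0, +1)]
ODD_ROW_OFFSETS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (0, +1), (+1, +1)]

def hex_neighbors(col, row, ncol, nrow):
    """Neighboring (col, row) cells inside an ncol x nrow hexagonal lattice."""
    offsets = ODD_ROW_OFFSETS if row % 2 else EVEN_ROW_OFFSETS
    return [(col + dc, row + dr) for dc, dr in offsets
            if 0 <= col + dc < ncol and 0 <= row + dr < nrow]

def icar_conditional(phi, col, row, tau2=1.0):
    """Conditional mean and variance of phi[row, col] given its neighbors."""
    nrow, ncol = phi.shape
    nbrs = hex_neighbors(col, row, ncol, nrow)
    mean = float(np.mean([phi[r, c] for c, r in nbrs]))
    return mean, tau2 / len(nbrs)

def presence_probability(alpha, phi_value):
    """Logit link: P(presence) = 1 / (1 + exp(-(alpha + phi)))."""
    return 1.0 / (1.0 + np.exp(-(alpha + phi_value)))
```

In this form the grid size enters through which cells count as neighbors, which is why the precision of predictions depends on the neighborhood structure and grid size.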
Conditional Autoregressive Model for Binary Data
Comparison of Models
- Ordinary indicator kriging
  - Advantages
    - Knowledge of the spatial relationship improves prediction
    - Assumed spatial relationship is based on the data
  - Disadvantages
    - Not robust to variogram mis-specification
    - Requires a strong stationarity assumption
- Conditional autoregressive model
  - Advantages
    - No need to estimate or model the variogram
    - Can be used without geo-referenced data
  - Disadvantages
    - Assumed spatial relationship is based on a grid size that could be inaccurate
Outline of Presentation
- From last year to now: progress and new directions
- Two-stage sample design
- Spatial modeling of EMAP data
- Simulation Example
- Future work
Simulation Example
- Used simulation so that the spatial structure was known
- Simulated the response from a specified variogram model onto a 50 × 50 hexagonal grid of points
- Specified a presence/absence cutoff
- Applied the two-stage sample design (2 realizations)
- Estimated and modeled the variogram from the sample data
  - For some, made two manual fits and one automatic fit
- Predicted probability of presence using indicator kriging and the conditional autoregressive model
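The slides do not say which criterion the automatic fit used. A common choice for automatic variogram fitting is Cressie's weighted least squares, which minimizes Σ N(h_k)[γ̂(h_k)/γ(h_k; θ) − 1]² over the model parameters θ. The sketch below applies that criterion to a spherical model by brute-force grid search (chosen for transparency rather than speed). The search grids, the function names, and the parameterization by partial sill plus nugget are assumptions.

```python
import numpy as np

def spherical_gamma(h, range_, psill, nugget):
    """Spherical semivariogram; for h > 0, gamma = nugget + psill * shape(h / range)."""
    r = np.minimum(h / range_, 1.0)
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def fit_spherical_wls(lags, gamma_hat, npairs,
                      ranges=np.linspace(1, 40, 40),
                      psills=np.round(np.linspace(0.01, 0.5, 50), 2),
                      nuggets=np.round(np.linspace(0.0, 0.2, 21), 2)):
    """Grid-search fit minimizing Cressie's weighted least-squares criterion.

    lags must be > 0; lag classes with no pairs should be dropped beforehand.
    Returns (range, partial sill, nugget).
    """
    h = np.asarray(lags, dtype=float)
    g = np.asarray(gamma_hat, dtype=float)
    n = np.asarray(npairs, dtype=float)
    # Broadcast every (range, psill, nugget) combination against every lag.
    R, P, N0 = np.meshgrid(ranges, psills, nuggets, indexing="ij")
    model = spherical_gamma(h, R[..., None], P[..., None], N0[..., None])
    crit = np.sum(n * (g / model - 1.0) ** 2, axis=-1)
    i, j, k = np.unravel_index(np.argmin(crit), crit.shape)
    return ranges[i], psills[j], nuggets[k]
```

A manual fit, by contrast, amounts to choosing the three parameters by eye, which is one reason the automatic and manual fits on the following slides differ.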
Simulation Methods
- Simulated data from a Gaussian random field (S-Plus)
  - Spherical variogram: range 22, sill 0.4, nugget 0
  - Simulated value > 2 → presence
- Sample designs
  - Systematic sample (n = 30)
  - Systematic sample plus 2 star clusters (n = 54)
  - Systematic sample plus 4 star clusters (n = 78)
- Models
  - Indicator kriging
  - Conditional autoregressive model
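The simulation recipe can be sketched as follows. This is a Python stand-in for the S-Plus simulation, not the original code: it places points on a hexagonal lattice with unit spacing, builds the spherical covariance (range 22, sill 0.4, nugget 0), draws one Gaussian realization by Cholesky factorization, and codes values above 2 as presence. The field mean is not stated on the slides, so `mean` below is a hypothetical parameter that controls the proportion of presence; a tiny diagonal jitter is added for numerical stability. The lattice is kept small so the dense factorization stays cheap; a 50 × 50 lattice works the same way.

```python
import numpy as np

def hex_lattice(ncol, nrow, spacing=1.0):
    """Centers of an ncol x nrow hexagonal lattice (odd rows shifted half a cell)."""
    pts = [((c + 0.5 * (r % 2)) * spacing, r * spacing * np.sqrt(3.0) / 2.0)
           for r in range(nrow) for c in range(ncol)]
    return np.array(pts)

def simulate_indicator(xy, mean, sill=0.4, range_=22.0, cutoff=2.0, seed=0):
    """One Gaussian random field realization with a spherical covariance,
    thresholded so that values above `cutoff` are presence (1)."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    r = np.minimum(d / range_, 1.0)
    cov = sill * (1.0 - 1.5 * r + 0.5 * r ** 3)
    cov[np.diag_indices_from(cov)] += 1e-8 * sill     # numerical jitter only
    L = np.linalg.cholesky(cov)
    z = mean + L @ np.random.default_rng(seed).standard_normal(len(xy))
    return z, (z > cutoff).astype(int)

# Hypothetical mean of 1.8; a 20 x 20 lattice for speed.
xy = hex_lattice(20, 20)
field, presence = simulate_indicator(xy, mean=1.8)
print(presence.mean())  # fraction of lattice points coded as presence
```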
Data Simulation with Sample Sites
[Figure: simulated data with sample sites. Pink = absence, blue = presence, black = systematic sites, green = star clusters 1, orange = star clusters 2]
Variograms for Sample Designs
[Figure panels: Systematic; Systematic + 2 Stars; Systematic + 4 Stars]
Sample design | Range | Sill | Nugget
Systematic | 17 | 0.17 | 0
Systematic + 2 stars | 20 | 0.4 | 0
Systematic + 4 stars | 14 | 0.4 | 0
Systematic Sample Results
Systematic Sample with 2 Stars
Systematic Sample with 4 Stars
Three Variogram Fits: Systematic + 2 Stars
[Figure panels: Automatic Fit; Manual Fit 1; Manual Fit 2]
Range | Sill | Nugget
17 | 0.3 | 0
   | 0.4 | 0
   | 0.27 | 0
- All fits use the correct (spherical) model
Predictions from Three Variogram Fits
[Figure panels: Automatic Fit; Manual Fit 1; Manual Fit 2]
Comparison of Prediction Errors
- Sensitivity
  - Percentage of presence sites predicted to be present
- Specificity
  - Percentage of absence sites predicted to be absent
- True positive rate (positive predictive value)
  - Percentage of predicted presence sites that truly are present
- True negative rate (negative predictive value)
  - Percentage of predicted absence sites that truly are absent
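The four rates can be computed from a confusion table of predicted versus true presence. The sketch below follows the definitions above, expressed as percentages as in the tables that follow, and calls a site predicted present when its predicted probability exceeds the cutoff (0.5 or 0.3 on the slides). The function and key names are illustrative.

```python
import numpy as np

def prediction_rates(prob, truth, cutoff=0.5):
    """Percent sensitivity, specificity, true positive rate and true negative rate."""
    pred = np.asarray(prob, dtype=float) > cutoff
    true = np.asarray(truth).astype(bool)
    tp = np.sum(pred & true)     # predicted present, truly present
    tn = np.sum(~pred & ~true)   # predicted absent, truly absent
    fp = np.sum(pred & ~true)    # predicted present, truly absent
    fn = np.sum(~pred & true)    # predicted absent, truly present

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "sensitivity": pct(tp, tp + fn),         # of truly present sites
        "specificity": pct(tn, tn + fp),         # of truly absent sites
        "true_positive_rate": pct(tp, tp + fp),  # of predicted-present sites
        "true_negative_rate": pct(tn, tn + fn),  # of predicted-absent sites
    }
```

Lowering the cutoff from 0.5 to 0.3 labels more sites present, which trades specificity for sensitivity; the tables below show that pattern for both models.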
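Comparison of Predictions (Data1F): positive if probability > 0.5
Values in parentheses: (Automatic Fit, Manual Fit 2). All values are percentages.
Model | Sample | Sensitivity | Specificity | True Positive Rate | True Negative Rate
Indicator kriging | Systematic | 28 | 98 | 85 | 74
Indicator kriging | Systematic + 2 stars | 41 (36, 27) | 94 (96, 99) | 77 (80, 76) | 77 (90, 74)
Indicator kriging | Systematic + 4 stars | 32 | 97 | 85 | 75
Conditional autoregressive | Systematic | 15 | 96 | 63 | 70
Conditional autoregressive | Systematic + 2 stars | 56 | 85 | 64 | 80
Conditional autoregressive | Systematic + 4 stars | 54 | 86 | 65 | 80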
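Comparison of Predictions (Data1F): positive if probability > 0.3
Values in parentheses: (Automatic Fit, Manual Fit 2). All values are percentages.
Model | Sample | Sensitivity | Specificity | True Positive Rate | True Negative Rate
Indicator kriging | Systematic | 48 | 91 | 71 | 78
Indicator kriging | Systematic + 2 stars | 59 (56, 44) | 85 (87, 93) | 65 (67, 76) | 81 (80, 78)
Indicator kriging | Systematic + 4 stars | 49 | 91 | 73 | 79
Conditional autoregressive | Systematic | 48 | 80 | 53 | 76
Conditional autoregressive | Systematic + 2 stars | 80 | 46 | 42 | 83
Conditional autoregressive | Systematic + 4 stars | 80 | 49 | 43 | 83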
Data Simulation with Sample Sites
[Figure: simulated data with sample sites. Pink = absence, blue = presence, black = systematic sites, green = star clusters 1, orange = star clusters 2]
Variograms for Sample Designs
[Figure panels: Systematic; Systematic + 2 Stars; Systematic + 4 Stars]
Sample design | Range | Sill | Nugget
Systematic | 15 | 0.27 | 0
Systematic + 2 stars | 12 | 0.30 | 0.05
Systematic + 4 stars | 13 | 0.30 | 0.03
Systematic Sample Results
Systematic Sample with 2 Stars
Systematic Sample with 4 Stars
Three Variogram Fits: Systematic
[Figure panels: Automatic Fit; Manual Fit 1; Manual Fit 2]
Range | Sill | Nugget
30 | 0.25 | 0.21
15 | 0.27 | 0
   | 0.22 | 0
- All fits use the correct (spherical) model
Predictions from Three Variogram Fits
[Figure panels: Automatic Fit; Manual Fit 1; Manual Fit 2]
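Comparison of Predictions (Data3F): positive if probability > 0.5
Values in parentheses: (Automatic Fit, Manual Fit 2). All values are percentages.
Model | Sample | Sensitivity | Specificity | True Positive Rate | True Negative Rate
Indicator kriging | Systematic | 31 (1, 15) | 92 (99, 97) | 65 (88, 69) | 73 (68, 70)
Indicator kriging | Systematic + 2 stars | 21 | 96 | 75 | 72
Indicator kriging | Systematic + 4 stars | 24 | 97 | 81 | 72
Conditional autoregressive | Systematic | 7 | 98 | 65 | 69
Conditional autoregressive | Systematic + 2 stars | 17 | 97 | 71 | 71
Conditional autoregressive | Systematic + 4 stars | 18 | 99 | 88 | 71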
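Comparison of Predictions (Data3F): positive if probability > 0.3
Values in parentheses: (Automatic Fit, Manual Fit 2). All values are percentages.
Model | Sample | Sensitivity | Specificity | True Positive Rate | True Negative Rate
Indicator kriging | Systematic | 62 (72, 37) | 80 (69, 89) | 60 (53, 63) | 81 (84, 75)
Indicator kriging | Systematic + 2 stars | 43 | 90 | 68 | 77
Indicator kriging | Systematic + 4 stars | 44 | 91 | 71 | 77
Conditional autoregressive | Systematic | 68 | 57 | 41 | 77
Conditional autoregressive | Systematic + 2 stars | 78 | 58 | 47 | 84
Conditional autoregressive | Systematic + 4 stars | 80 | 56 | 47 | 85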
Simulation Conclusions: Design
- Two star clusters improved the small-scale features of the variogram
- Two star clusters improved prediction accuracy
- Four star clusters offered little improvement over two stars
Simulation Conclusions: Models
- The variogram model affects predictions
- Kriging tends toward the overall mean probability of presence, i.e., it smooths
- Kriging builds patches whose diameter is approximately the range of the variogram
- The conditional autoregressive model attempts to connect observed presence sites
- Neither model had consistently higher sensitivity or specificity
Future Work
- Further simulation studies on the two-stage design
  - Effect of sample size
  - Number of star clusters necessary to improve variogram estimation
  - Effect of size of star clusters
  - Bias from adaptive second-stage sampling
- Advantages of indicator kriging and the conditional autoregressive model
  - Sensitivity of the conditional autoregressive model to initial values, prior distributions, and grid size
  - Sensitivity of kriging to variogram model specification
Future Work
- Apply the two-stage sample design to real data
  - DDT data from Santa Monica Bay, CA
  - EMAP data and local monitoring data
- Freely distribute functions for applying the conditional autoregressive model on a hexagonal lattice
  - Functions in R to produce hexagonal lattice input for WinBUGS
  - File in WinBUGS to apply the model
- Investigate the optimal grid size to achieve EMAP and spatial modeling goals
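As an illustration of what such lattice input can look like, the sketch below (in Python, not the R functions described above) builds the vectors that the GeoBUGS `car.normal` distribution takes, `adj[]`, `weights[]`, and `num[]`, plus their total length, for a hexagonal lattice, using 1-based area indices as BUGS expects. The "odd-r" hexagon layout, the row-major numbering, and the unit weights are assumptions for illustration.

```python
# (d_col, d_row) neighbor offsets for an "odd-r" hexagonal layout (assumed).
EVEN_ROW = [(+1, 0), (0, -1), (-1, -1), (-1, 0), (-1, +1), (0, +1)]
ODD_ROW = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (0, +1), (+1, +1)]

def hex_car_inputs(ncol, nrow):
    """adj, weights, num vectors for car.normal on an ncol x nrow hex lattice.

    Areas are numbered row by row starting at 1 (area = row * ncol + col + 1).
    """
    adj, num = [], []
    for row in range(nrow):
        offsets = ODD_ROW if row % 2 else EVEN_ROW
        for col in range(ncol):
            nbrs = [(row + dr) * ncol + (col + dc) + 1 for dc, dr in offsets
                    if 0 <= col + dc < ncol and 0 <= row + dr < nrow]
            adj.extend(sorted(nbrs))   # neighbors of this area, listed consecutively
            num.append(len(nbrs))
    weights = [1] * len(adj)           # unit weights: every neighbor counts equally
    return {"adj": adj, "weights": weights, "num": num, "sumNumNeigh": len(adj)}

inputs = hex_car_inputs(3, 2)
print(inputs["num"])  # [2, 4, 3, 3, 4, 2]
```

Edge and corner cells get fewer than six neighbors, so `num` varies across the lattice even though the neighborhood pattern is fixed.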
Systematic (EMAP) Grid Based on Variogram Model
- Kriging variance
- Analog for conditional autoregressive model
Systematic (EMAP) Grid Based on Variogram Model
- Prediction variance is minimized by large covariance between the prediction location and the sample locations
- For kriging, "grid" refers to the sample locations
- For the conditional autoregressive model, "grid" refers to both the sample locations and the prediction locations
- Want sample locations close together
- Samples too far apart:
  - Kriging correctly uses no spatial relationship
  - Conditional autoregressive model incorrectly uses the assumed spatial relationship
- Samples too close together: waste of resources
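The spacing argument can be checked numerically with the ordinary kriging variance, σ²_OK = C(0) − λᵀc₀ − μ, where λ and μ solve Cλ + μ1 = c₀ with 1ᵀλ = 1. The sketch below assumes the spherical model from the simulation (range 22, sill 0.4, nugget 0) and predicts at the center of one cell of a hypothetical 3 × 3 square grid of samples, reporting the variance for several grid spacings. When every sample lies beyond the range, the weights are all 1/n and the variance is C(0)(1 + 1/n), the "no spatial relationship" case above. A square grid is used only to keep the code short.

```python
import numpy as np

def spherical_cov(h, sill=0.4, range_=22.0):
    """Covariance C(h) = sill - gamma(h) for a spherical model with zero nugget."""
    r = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def ok_variance(xy, x0, sill=0.4, range_=22.0):
    """Ordinary kriging variance at x0 given sample locations xy."""
    n = len(xy)
    C = spherical_cov(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1),
                      sill, range_)
    c0 = spherical_cov(np.linalg.norm(xy - x0, axis=1), sill, range_)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C
    A[:n, n] = A[n, :n] = 1.0          # Lagrange column/row for sum(weights) = 1
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    lam, mu = sol[:n], sol[n]
    return sill - lam @ c0 - mu         # C(0) - lambda'c0 - mu

def grid_variance(spacing):
    """Variance at the center of one cell of a 3 x 3 square sample grid."""
    g = spacing * np.array([-1.0, 0.0, 1.0])
    xy = np.array([(x, y) for x in g for y in g])
    return ok_variance(xy, np.array([spacing / 2, spacing / 2]))

for s in (5, 15, 40, 100):
    print(s, round(float(grid_variance(s)), 4))
```

Larger spacings push the variance toward the independence value C(0)(1 + 1/n); smaller spacings lower it, at the cost of more samples per unit area.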