Title: Design and Analysis Process
1. Design and Analysis Process
- Anthony (Tony) R. Olsen
- USEPA NHEERL Western Ecology Division
- 200 S.W. 35th Street
- Corvallis, OR 97333
- Phone 541 754-4790
- Email olsen.tony_at_epa.gov
2. Design Structure
Site Selection Design File
Resource Characteristics
Target Population
Monitoring Objectives
Design Requirements
Survey Design
Institutional Constraints
Sample Frame
3. Survey Design Specification
- Target population definition
- Sample Frame Specification
- Basic Design
- GRTS for areal population
- GRTS for linear continuous population
- GRTS for discrete population
- Simple random sample
- Stratification categories
- Multi-density categories
- Expected sample sizes
- Panel structure
- Oversample specification
- Nested Subsamples
- Intensification domains
- Study-wide base sample
- Multi-Stage or Multi-phase
4. Site Selection Process
- Create GIS coverage for sample frame
- Intersect frame coverage with grid coverage
- Determine multi-density multiplicative factors
for unequal probability
- Select sites
- Create Design File
- Driven by survey design specification
- Create GIS coverage for boundary of study area
- Create a hierarchical grid covering study area
- Ensure < 1 point per grid cell
- Ensure grid cell small enough to be included in
small regions of study area
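The grid-refinement step can be illustrated with a minimal Python sketch. It is not the GIS workflow used in the study. It assumes that "< 1 point per grid cell" means the expected number of selected points (the sum of inclusion probabilities) in every cell is below 1, and it works on a unit square. The function names, coordinates, and probabilities are hypothetical.

```python
from collections import defaultdict

def cell_index(x, y, level):
    """Return the (column, row) of the grid cell holding (x, y) at a given level.

    Level k splits the unit square into 2**k by 2**k cells.
    """
    n = 2 ** level
    # min() keeps points on the upper boundary (x == 1.0) inside the last cell
    return min(int(x * n), n - 1), min(int(y * n), n - 1)

def refine_grid(points, incl_probs, max_level=20):
    """Find the coarsest level at which every cell's expected sample count is < 1."""
    for level in range(max_level + 1):
        totals = defaultdict(float)
        for (x, y), p in zip(points, incl_probs):
            totals[cell_index(x, y, level)] += p
        if max(totals.values()) < 1.0:
            return level
    raise ValueError("no level up to max_level satisfies the constraint")

# Hypothetical frame points (unit-square coordinates) and inclusion probabilities
points = [(0.10, 0.10), (0.15, 0.12), (0.80, 0.90), (0.85, 0.20)]
incl_probs = [0.6, 0.6, 0.4, 0.4]
level = refine_grid(points, incl_probs)
```

Each extra level quarters every cell, so two nearby points with a combined probability of 1 or more force the grid to keep refining until they fall in different cells.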
5. Design File Contents
Has all sites selected by survey design
- Survey Design
- Stratum
- Multi-Density Category
- Panel
- Oversample
- Nesting Identification
- Expected Sample Size
- Initial Weight
- Site Identification
- Site ID
- Latitude/Longitude
- Site Name
- Sample Frame ID
- County
- Map names
- Auxiliary Frame Information
- State
- Omernik Ecoregion
- Other
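The relation between Expected Sample Size and Initial Weight can be sketched as follows. This is a Python illustration under an assumption the slides do not state explicitly: for a linear resource, the initial weight of a site in a multi-density category is the frame length in that category divided by the expected sample size in that category, so each site represents that many kilometers. The category names and numbers are hypothetical.

```python
def initial_weights(frame_length_km, expected_n):
    """Initial weight per site = frame length in category / expected sample size."""
    weights = {}
    for category, length in frame_length_km.items():
        n = expected_n[category]
        if n <= 0:
            raise ValueError(f"expected sample size for {category!r} must be positive")
        weights[category] = length / n
    return weights

# Hypothetical frame lengths (km) and expected sample sizes for one stratum
frame_length_km = {"1st order": 12000.0, "2nd order": 4000.0, "3rd+ order": 2000.0}
expected_n = {"1st order": 40, "2nd order": 30, "3rd+ order": 30}
weights = initial_weights(frame_length_km, expected_n)
```

Summing weight times expected sample size over the categories recovers the total frame length, which is a quick consistency check on a design file.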
6. EMAP West Monitoring Objectives
- Estimate for each State, EPA Region, and Study Region
- Extent (total length) of perennial and non-perennial streams and rivers
- Condition of perennial streams and rivers
- Estimate for each special study area
- Extent of perennial streams and rivers
- Condition of perennial streams and rivers
7. Resource Characteristics
- Linear networks of streams and rivers
- Study region covers 12 western states
- Exclude set of Great rivers
- Density of perennial and non-perennial streams
and rivers differs by mountainous and arid ecoregions
- Very uneven distribution of stream and river
length by Strahler order
8. Institutional Constraints
- States must be able to operate independently of other states
- Best knowledge of streams and rivers exists
within state environment and natural resource agencies
- Common survey design, data forms, and data
management needed for consistency across 12 states
- Probability survey design and analysis expertise
at ORD with study leading to tech transfer to
Regions and states
9. Target Population and Sample Frame
- RF3 GIS coverage for 12 western states
- Based on 1:100,000 maps
- Streams and rivers coded perennial, intermittent,
natural, constructed
- Codes
- Strahler order
- Ecoregion: arid or mountainous
- All stream and river channels in 12 western states
- Defined channel present
- Natural and constructed channels
10. Sample Frame Concerns
- RF3 is known to undercover the
- Target population of all streams and rivers
- Target population of all perennial streams and rivers
- RF3 codes are known to be inaccurate
- Strahler order, due to incompleteness of linear networks
- Perennial and intermittent codes
- Decision made that the undercoverage is likely to be
of a size that can be ignored
- RF3 perennial coverage will have both overcoverage
and undercoverage
11. RF3 Stream Length
12. RF3 Non-Perennial Design Requirements
- States must be able to complete site evaluations
independent of other states
- Site evaluations to be completed in one year
- Sufficient samples in mountainous regions to
determine if have significant RF3 code problems
- Estimate extent (km of length) of non-perennial
and perennial streams and rivers
- Entire study region
- EPA Regions 8, 9, 10
- Each state
- +/- 10% precision at 90% confidence
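A rough Python check of the precision requirement is sketched below. It rests on assumptions the slides do not make: the target is read as +/- 10 percentage points on an estimated proportion, a simple-random-sample normal approximation is used, and the worst case p = 0.5 is taken. A spatially balanced design with a local variance estimator would generally differ. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def half_width(n, p=0.5, conf=0.90):
    """Approximate confidence-interval half-width for a proportion from n sites."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return z * math.sqrt(p * (1 - p) / n)

def sites_needed(margin=0.10, p=0.5, conf=0.90):
    """Smallest n whose approximate half-width is at most `margin`."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_required = sites_needed()          # sites for +/- 0.10 at 90% confidence
hw_state = half_width(100)           # half-width with 100 sites
hw_category = half_width(50)         # half-width with 50 sites
```

Under these assumptions about 68 sites meet the target, so 100 sites is comfortably inside it while 50 sites gives a half-width slightly above 0.10.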
14. RF3 Non-perennial Survey Design
- GRTS design for linear resource
- Stratify by 12 states
- Multi-density categories
- Strahler order 1, 2, 3
- Arid and Mountainous ecoregions
- Sample size 100/state
- Minimum 50 sites per multi-density category
- Panels: one
- No intensification areas
- No nested subsampling
- No oversample
15. Number of RF3 Non-Perennial Sites Selected by Categories
17. Design Status File Creation
Design File
Design File
Site Evaluation
Design Status File
Site Status File
Field Recon
Sample Frame
18. Site Status File Contents
- EMAP Site ID
- Monitoring Program Site ID
- Site Name
- Comments on Site Status
- Other auxiliary information
- Site Status codes
- S: Sampled
- LD: Landowner Denied Access
- PB: Physically inaccessible
- NT: Non-Target
- NS: Not Sampled
- NN: Not Needed
19. Design Status File
- Contains variables from Site Status File
- Contains variables from Design File
- Contains derived variables
- Requires summary information from Sample Frame
- May contain results
- Derived variables
- Albers projection x,y coordinates
- Final weights adjusted for use of oversample
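One common way to derive final weights is sketched below in Python. The slides do not spell out the adjustment, so this is an assumption: sites coded NN (Not Needed) were never evaluated and get weight zero, and the weights of all evaluated sites are scaled so that they sum to the known frame extent. The function name and the numbers are hypothetical.

```python
def adjust_weights(initial_wgt, status, frame_extent):
    """Scale weights of evaluated sites so they sum to the frame extent.

    Sites with status "NN" (Not Needed) receive a final weight of 0.
    """
    evaluated_total = sum(w for w, s in zip(initial_wgt, status) if s != "NN")
    if evaluated_total <= 0:
        raise ValueError("no evaluated sites to carry the frame extent")
    factor = frame_extent / evaluated_total
    return [w * factor if s != "NN" else 0.0 for w, s in zip(initial_wgt, status)]

# Hypothetical example: five selected sites, two of them never needed
initial_wgt = [100.0, 100.0, 100.0, 100.0, 100.0]
status = ["S", "NT", "LD", "NN", "NN"]
final_wgt = adjust_weights(initial_wgt, status, frame_extent=500.0)
```

In practice the adjustment would be applied separately within each stratum or multi-density category, each with its own frame extent.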
20. RF3 Non-Perennial Site Evaluation Process
- Each site evaluated to determine
- Stream channel existence
- Perennial or non-perennial
- Other characteristics
- Three phase evaluation
- Office assessment based on existing information
- Phone call to knowledgeable local person
- Field visit
- 1200 RF3 Non-perennial sites evaluated
21. Population Estimation
- Population size estimates for Site Status categories
- Proportions for categories
- Population mean and variance
- Cumulative Distribution Function estimates for
continuous variables
- Percentile estimates for continuous variables
- Testing for difference between two CDFs
22. Statistical Computing Environment
- S-Plus Professional
- Flexible analyses
- Estimation functions developed
- Cost
- R
- Similar to S-Plus
- S-Plus functions work
- No Cost
- SAS
- Flexible analyses
- SAS macros developed
- Cost
- Information management for data
- EMAP West available from S.W.I.M. web site
- Monitoring organization responsible
- STORET, when it includes design information
- Assume data file format is
- ASCII CSV or Tab delimited
- Excel spreadsheet
- SAS dataset sd2
23. Population Size Estimation Function
- Pop.size.est.fcn
- Z: SiteStatus
- Wgt: final weight
- Stratum
- VarType: SRS, Local
- Conf: confidence level (95)
- X,Y: coordinates for local variance estimator
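The idea behind a population size estimate can be sketched in Python. This is not the S-Plus function itself: it assumes the estimate for each site status code is the sum of the final weights of sites with that code, and it uses only an SRS-type (with-replacement) variance approximation, not the local neighborhood variance estimator. The function name and data are hypothetical.

```python
import math
from statistics import NormalDist

def size_estimates(status, wgt, conf=0.95):
    """Estimated extent and approximate confidence bounds per status code."""
    n = len(status)
    if n < 2:
        raise ValueError("need at least two sites for a variance estimate")
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    results = {}
    for code in sorted(set(status)):
        # Weighted indicator: w_i if site i has this status, else 0
        y = [w if s == code else 0.0 for s, w in zip(status, wgt)]
        total = sum(y)
        mean_y = total / n
        # With-replacement (SRS-type) approximation to the variance of the total
        var = n / (n - 1) * sum((v - mean_y) ** 2 for v in y)
        half = z * math.sqrt(var)
        results[code] = (total, max(total - half, 0.0), total + half)
    return results

# Hypothetical site statuses and final weights (km represented per site)
status = ["S", "S", "NT", "LD", "S", "NT"]
wgt = [50.0] * 6
estimates = size_estimates(status, wgt)
```

Each entry of `estimates` holds the estimated extent followed by the lower and upper confidence bounds, with the lower bound truncated at zero.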
24. Category Proportion Estimates
- Cat.prop.fcn
- Catvar: category variable
- Wgt: final weight
- Stratum
- VarType: SRS, Local
- Conf: confidence level (95)
- X,Y: coordinates for local variance estimator
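A weighted category proportion can be sketched in a few lines of Python. This shows only the point estimate, under the assumption that the proportion for a category is the sum of final weights in that category divided by the sum of all final weights. It does not reproduce the S-Plus function or its variance options, and the names and data are hypothetical.

```python
def category_proportions(catvar, wgt):
    """Weighted proportion of the population in each category."""
    total_wgt = sum(wgt)
    if total_wgt <= 0:
        raise ValueError("weights must sum to a positive value")
    sums = {}
    for category, w in zip(catvar, wgt):
        sums[category] = sums.get(category, 0.0) + w
    return {category: s / total_wgt for category, s in sums.items()}

# Hypothetical condition classes and final weights
catvar = ["Good", "Fair", "Poor", "Good"]
wgt = [10.0, 30.0, 40.0, 20.0]
props = category_proportions(catvar, wgt)
```

With unequal weights the result differs from the raw share of sites: here half the sites are "Good" but they represent only 30 percent of the weighted population.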
25. Mean and Variance Estimation
- Variance.fcn
- Z: result
- Wgt: final weight
- X,Y: coordinates for variance estimator
- Stratum
- VarType: Local or SRS
- R: known extent
- Conf: confidence level
- Etype: Var or SD
- Mean.fcn, Total.fcn
- Z: result
- Wgt: final weight
- X,Y: coordinates for variance estimator
- Stratum
- VarType: Local or SRS
- R: known extent
- Conf: confidence level
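The point estimates behind a weighted mean and total can be sketched in Python. This is an illustration, not the S-Plus code. It assumes the mean is the weight-ratio estimator, and that when a known extent is supplied the total is that extent times the estimated mean. The function names, the `known_extent` argument, and the data are hypothetical.

```python
def weighted_mean(z, wgt):
    """Ratio estimator of the population mean: sum(w * z) / sum(w)."""
    return sum(w * v for v, w in zip(z, wgt)) / sum(wgt)

def weighted_total(z, wgt, known_extent=None):
    """Estimated population total.

    Without a known extent this is sum(w * z); with one, the estimated
    mean is scaled by the known extent instead of the sum of weights.
    """
    if known_extent is None:
        return sum(w * v for v, w in zip(z, wgt))
    return known_extent * weighted_mean(z, wgt)

# Hypothetical results and final weights
z = [2.0, 4.0, 6.0]
wgt = [10.0, 10.0, 20.0]
mean_est = weighted_mean(z, wgt)                          # 4.5
total_est = weighted_total(z, wgt)                        # 180.0
total_known = weighted_total(z, wgt, known_extent=50.0)   # 225.0
```

The two totals differ whenever the sum of the weights does not equal the known extent, which is why the known extent is passed separately.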
26. CDF Estimate for Continuous: cdf.est.fcn
- Z: SiteStatus
- Wgt: final weight
- Cdfval: z-values at which CDF will be estimated
- Stratum
- VarType: SRS, Local
- Conf: confidence level (95)
- X,Y: coordinates for local variance estimator
- Pctval: percentiles to be estimated
- Prop: T = CDF proportion, F = CDF total
- R: known extent
- Size: T = size-weight estimates for discrete
- Swgt: size-weights
- W: known sum of size-weights
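A weighted empirical CDF and a simple percentile lookup can be sketched in Python. This is a simplification and not the S-Plus function: it covers only the proportion form of the CDF, ignores size-weights and variance estimation, and returns an observed value for each percentile where the original may interpolate. The function names and data are hypothetical.

```python
def cdf_estimate(z, wgt, cdfvals):
    """Weighted proportion of the population with result <= each CDF value."""
    total = sum(wgt)
    return [sum(w for v, w in zip(z, wgt) if v <= c) / total for c in cdfvals]

def percentile_estimate(z, wgt, pct):
    """Smallest observed result at which the weighted CDF reaches pct percent."""
    total = sum(wgt)
    cumulative = 0.0
    for v, w in sorted(zip(z, wgt)):
        cumulative += w
        if cumulative / total >= pct / 100:
            return v
    return max(z)

# Hypothetical results and final weights
z = [1.0, 2.0, 3.0, 4.0]
wgt = [10.0, 20.0, 30.0, 40.0]
cdf = cdf_estimate(z, wgt, cdfvals=[2.0, 3.0])
median = percentile_estimate(z, wgt, pct=50)
```

Because the larger results carry larger weights in this example, the weighted median sits above the unweighted one.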
27. Test Difference Between Two CDFs: cdf.test.fcn
- Sample 1: list
- Z: result
- Wgt: final weights
- X-coordinate
- Y-coordinate
- Sample 2: list
- Z: result
- Wgt: final weights
- X-coordinate
- Y-coordinate
- Bounds: upper limits for cdf test classes
- Conf: confidence level
28. Other Functions You Don't See
- cdf.prop.fcn
- cdf.total.fcn
- cdf.size.prop.fcn
- cdf.size.total.fcn
- cdfvar.prop.fcn
- cdfvar.size.prop.fcn
- cdfvar.size.total.fcn
- cdfvar.total.fcn
- cdfvar.test.fcn
- dist2full.
- localmean.cov.fcn
- localmean.fcn
- localmean.var.fcn
- localmean.weight.fcn
- wnas.fcn
29. S-Plus Operation: Initial Step for Each Project
- Create a folder for each project
- Place an S-Plus shortcut icon in folder
- Start S-Plus
- Create a new workspace in project folder
- Browse to find
- S-Plus creates a _Data folder
- Do NOT detach default _Data folder
- Place all estimation functions in single folder
- Start S-Plus
- Execute all estimation functions
- They will be located in default _Data directory
- Only have to do this part once or whenever an
updated set of functions is provided
30. Data Analysis Sequence
- Set up workspace initially
- Import Design File and Site Evaluation File OR
Design Status File
- Adjust weights if not done
- Export final Design Status file
- Do population extent estimation
- Save extent results
- Import data results file
- Do population status estimation
- Category estimates
- CDF estimates
- Mean, Total, Var estimates
- Construct plots
- Save as pdf files
- Save status results
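The import step at the start of this sequence can be sketched in Python for CSV inputs. It is an illustration only, since the deck performs this step in S-Plus. It assumes the Design File and the site status file share a site identifier column, and the column names (`SiteID`, `Stratum`, `InitialWeight`, `SiteStatus`) and sample rows are hypothetical.

```python
import csv
import io

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def build_design_status(design_rows, status_rows, key="SiteID"):
    """Attach site status fields to each design row, matched on the site ID.

    Design rows with no matching status row are kept unchanged.
    """
    status_by_id = {row[key]: row for row in status_rows}
    merged = []
    for row in design_rows:
        combined = dict(row)
        combined.update(status_by_id.get(row[key], {}))
        merged.append(combined)
    return merged

# Hypothetical file contents; real files would be read from disk
design_csv = "SiteID,Stratum,InitialWeight\nW-001,OR,120.5\nW-002,OR,120.5\n"
status_csv = "SiteID,SiteStatus\nW-001,S\nW-002,NT\n"
design_status = build_design_status(read_rows(design_csv), read_rows(status_csv))
```

Keeping every design row, matched or not, makes it easy to spot sites that still lack an evaluation before weights are adjusted.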
31. Examples
- Kentucky Rotating Basin
- EMAP West RF3 Non-Perennial Site Evaluation survey
- Region 10 REMAP Coastal Streams
- Region 6 REMAP Texas 3-ecoregion streams
- New York City water supply watershed streams