Title: NASA USGS Invasive Species Project
1. NASA / USGS Invasive Species Project
- Jeff Morisette and Jeff Pedelty
- NASA Goddard Space Flight Center
- Greenbelt, Maryland
- NASA Goddard Space Flight Center
- Biospheric Sciences Branch Seminar
- 9 March 2004
2. Presentation Outline
- Invasive species overview (mostly with USGS)
- NASA's work with invasive species
- Related data sets
- Statistical Modeling
- Parallel Computing
- Future directions
4. Invasive Species: A Top Environmental Issue of the 21st Century
- Economic Costs
  - $137 billion / yr (Pimentel et al. 1999; NISRC Management Plan, 2001)
- Environmental Costs
  - Decreased biodiversity, ecological services, etc.
- Human-Health Costs
  - West Nile Virus, malaria, etc.
- Agricultural Costs
  - Crop pathogens, hoof-and-mouth, mad cow disease
- Notorious examples include
  - Dutch elm disease, chestnut blight, and purple loosestrife in the northeast; kudzu, Brazilian peppertree, water hyacinth, nutria, and fire ants in the southeast; zebra mussels, leafy spurge, and Asian long-horn beetles in the Midwest; salt cedar, Russian olive, and Africanized bees in the southwest; yellow star thistle, European wild oats, oak wilt disease, Asian clams, and white pine blister rust in California; cheatgrass, various knapweeds and thistles in the Great Basin; whirling disease of salmonids in the northwest; hundreds of invasive species from microbes to mammals in Hawaii; and the brown tree snake in Guam.
- As many as 50,000 now, hundreds new each year ...
5. Federal Government Response
- National Invasive Species Council (EO 13112, 1999)
  - Co-chaired by the Departments of Agriculture, Commerce, and Interior
- USGS has a lead role in dealing with invasive species science in natural and semi-natural areas
  - Responsible for measurement, management, and control on all Department of Interior and adjacent lands ...
6. USGS National Institute of Invasive Species Science
- USGS Biological Resources Division (BRD) laboratory
- Located at USGS's Ft. Collins Science Center
- New facilities opened Aug 02
- Director, Tom Stohlgren
- Many current / future partners ...
- Grand Challenge: Biodiversity and Ecosystem Functioning, with special emphasis on invasive species ... (NRC Committee on Grand Challenges in Environmental Sciences, 2001)
- Needed: A National Center For Biological Invasions (Don Schmitz and Dan Simberloff, Issues in Science and Technology, Summer 2001)
7. USGS Science / Client Needs
- On-demand, predictive (in space and time) landscape- and regional-scale models and maps for biological invasions
  - Pick any point, land management unit, county, state, or region and determine the current invasion, and vulnerability to future invasion by species.
  - Pick any species or group of species, and get current distributions, potential distribution and rate of change, each with estimates of uncertainty.
- Data integration and sharing
  - Comprehensive information on control efforts and cost. Share early detection data, control strategies, local expertise. Help public and private land managers.
9. National Invasive Species Forecasting System (ISFS)
- Research funded by NASA's Earth Science Enterprise
- Terra and Aqua Science Applications
  - Value Added Products from MODIS Time-Series Data Sets to Support DOI/USGS Invasive Species Management (Morisette, Pedelty, Schnase, Stohlgren)
- Interdisciplinary Science
  - Fingerprinting Native and Non-native Biodiversity in the United States - Phase I: The Western US (Stohlgren, Schnase, Morisette, Pedelty)
- ReaSON CAN
  - (Schnase, Smith, Stohlgren)
- Carbon Cycle Science Applications Program
  - Predicting Regional-Scale Exotic Plant Invasions in Grand Staircase-Escalante National Monument (NASA YS/YO NRA - Schnase, Smith, Stohlgren)
- Computational Technologies Program
  - Biotic Prediction: Building the Computational Technology Infrastructure for Public Health and Environmental Forecasting (NASA YS CAN - Schnase, Smith, Stohlgren)
10. Science Questions
- What are the biotic and abiotic factors determining species distributions at local and landscape scales?
- Where are local concentrations of endemism, richness, abundance, and biomass?
- What processes drive habitat and community dynamics?
- How do invasive species interact with other environmental changes?
11. Why this is a difficult challenge
- High resolution, in space and time, is critical but expensive
- Biodiversity hotspots play a critical role in the biosphere; we must be able to adaptively span global and local scales
- Early detection is essential for rapid response and effective management
- Quantifying pathways of introduction is essential for cost/benefit guidance for eradication and control; this requires more than remotely sensed data
- Modeling involves large amounts of data with inherent spatial structure
13. Input data: Soil properties
- Importance
  - Species habitat requirement
  - Determinant of species range boundaries, corridors of invasion, dispersal patterns
- Current Sources
  - Type: STATSGO, local soil maps
  - Moisture: passive microwave, radar, and NIR
- Sources
  - USGS STATSGO
  - http://water.usgs.gov/lookup/getspatial?ussoils
- Currently hold twenty soil-property raster layers at 30m spatial resolution for all of Colorado
14. Input data: Elevation, slope and aspect
- Importance
  - Determinant of species range boundaries, corridors of invasion
  - Influences hydrological, geological, and human processes
- Current Sources
  - GTOPO 30, GLOBE 30 arcsec/100m; USGS/European regional models; US DEM
- Future Sources
  - SRTM (global) 30m H/30m V; high-resolution LIDAR; military DTED2 (global)
- Currently hold Shuttle Radar Topography Mission (SRTM) digital elevation data at 30m spatial resolution, mosaicked and clipped to Colorado.
- Source: USGS
  - http://seamless.usgs.gov/
15. Input data: Vegetation signal
- Importance
  - Vegetation structure: the habitat parameter for many species
  - Structural complexity: major driver of species richness in all environments
- Current Sources
  - Visible/Infrared: ETM, MODIS
  - SAR: estimates of canopy texture, biomass, geometry
  - AVHRR NDVI
- Future Sources
  - LIDAR
  - Vis/IR: ASTER
- Currently hold 4 Tasseled-Cap NDVI layers from Landsat-7 ETM (2000) for Colorado
[Figure label: Airborne LIDAR]
16. ASD Spectra
[Figure: reflectance; panels: Simulated ETM, Simulated ASTER]
17. Input data: Phenology
- Importance
  - Plant phenology an important driver for animal species
  - Many change habitats to track available resources
- Current Sources
  - Multispectral imagery
  - 30m resolution several times per year; 250m resolution daily
- Future Sources
  - Higher temporal resolution multi-spectral
  - Satellite-borne hyperspectral
  - Meteorological data
- Currently hold MODIS Vegetation Index (VI) product (MOD13: 16-day composite with 250m spatial resolution, ver. 004) for four years (Feb. 2000 to present) for three study sites and all of Colorado
19. USGS Predictive Modeling
[Flowchart; boxes listed in processing order where the order is recoverable]
- Input Variables (150)
  - Remotely sensed data (ETM, SPOT, MTI, EO1, etc.)
  - Derived remote sensing (vegetation indices, PCA, Tasseled Cap, other)
  - Biotic/abiotic data
  - Topographic data
  - Species data
  - Vegetation / forest data
  - Soils characteristics
  - Cryptobiotic crusts
  - Wildfire severity
  - Biodiversity
  - Air pollution
  - Geology, other environmental data
- Trend Surface Analysis with Stepwise Multiple Regression, using OLS, GLS, SAR, or Exhaustive Regression
- Testing if There Is Spatial Auto-Correlation in the Residuals (Yes / No)
- Testing if Residuals Are Cross-Correlated with Other Variables (Yes / No)
- Model Residuals Using Co-Kriging
- Model Residuals Using Kriging (Universal, Ordinary, other)
- Regression Trees, Classifications
- Final Trend Surface Map: Large - Small Scale Variability
- Output: GIS - Spatial Statistical Dynamic Models and Maps
  - Hot spots of native biodiversity
  - Distribution of non-native species
  - Potential spread of invasive species
  - Barriers to rapid invasions
  - Corridors that may accelerate invasions
  - Economic and environmental risk assessments, vulnerability of habitats to invasion
  - Priorities for control and containment
20. USGS Predictive Modeling
[Flowchart repeated from slide 19, with added stage labels: Ingest, Modeling, DSS Products]
21. ISFS Architecture
22. Current Statistical Modeling Array
Example: Existing Model Array
- Response: field-measured variable of interest
- GPS X and Y coordinates used to extract information from imagery and ancillary data
- Predictors = f(observed values, satellite data, and/or ancillary data)
  - DEM: elevation, slope, aspect
  - Landsat NDVI
  - Landsat Tasseled Cap bands 1-3
23. Enhanced Statistical Modeling Array
Example: Proposed Model Array

lon   lat   response   Predictor 1   ...   Predictor N
X1    Y1    R1         X11           ...   Xn1
X2    Y2    R2         X12           ...   Xn2

- Response: field-measured variable of interest
- GPS X and Y coordinates used to extract information from imagery and ancillary data
- Predictors = f(observed values, satellite data, and/or ancillary data)
  - DEM: elevation, slope, aspect
  - Landsat NDVI
  - Landsat Tasseled Cap bands 1-3
  - New explanatory variable: MODIS summary layers (2001, 2002), derived from the MODIS time series by a summary method
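24. Kriging residuals to account for spatially correlated errors
- Regression model: $\text{Tot. Plant} = b_0 + b_1\,\text{ETM} + b_2\,\text{ELEV}$
- With kriged residuals: $\text{Tot. Plant} = b_0 + b_1\,\text{ETM} + b_2\,\text{ELEV} + \text{Kriged Residuals}$
- Tot. Plant: field measurements of plant diversity within a sample plot
[Figure: kriged residuals]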
26. What is Kriging?
- Spatial interpolator
- A weighted linear combination of point measurements that exploits the structure of spatial auto-correlation present in the data
- Spatial structure determines the appropriate weights for points that are close; allows for anisotropy
- Spatial structure is determined by modeling the empirical variogram: auto-correlation as a function of the separation distance
- Kriging determines weights by minimizing the variance of the errors: Best Linear Unbiased Estimator (BLUE)
- An Introduction to Applied Geostatistics, Isaaks & Srivastava, 1989, Oxford University Press.
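A minimal sketch of an empirical variogram, assuming the classical estimator (half the mean squared difference of all sample pairs whose separation falls in a distance bin). The function name `empirical_variogram` and the choice of bin edges are illustrative assumptions, not part of the original talk.

```python
import numpy as np

def empirical_variogram(xy, values, bin_edges):
    """Mean semivariance of sample pairs in each separation-distance bin."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)       # each unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)   # pair separation distance
    semivar = 0.5 * (values[i] - values[j]) ** 2   # per-pair semivariance
    which = np.digitize(dist, bin_edges) - 1       # bin index for each pair
    gamma = np.full(len(bin_edges) - 1, np.nan)    # NaN marks empty bins
    for b in range(len(gamma)):
        in_bin = which == b
        if in_bin.any():
            gamma[b] = semivar[in_bin].mean()
    return gamma
```

A variogram model (nugget, sill, range) would then be fitted to the binned values; that fitting step is not shown here.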
27. Why Kriging?
- Stepwise regression is used to find the relationship between field samples and remote sensing, DEM, and ancillary data
- Residuals (observed value minus the prediction from the stepwise regression) are calculated for each sample point
- Residuals are tested for spatial structure by viewing empirical variograms and by statistical hypothesis testing (e.g. Moran's I)
- If spatial structure exists, Kriging is used to estimate the residual surface for the entire study area
- Kriged residual surface is then added to the stepwise regression model to produce a final prediction that includes both small- and large-scale structure
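The residual test can be illustrated with a small Moran's I calculation. This is a sketch under stated assumptions: binary spatial weights (1 for sample pairs within a chosen `radius`, 0 otherwise) are one common choice, and the talk does not say which weighting the USGS workflow uses. The function name `morans_i` is hypothetical.

```python
import numpy as np

def morans_i(xy, residuals, radius):
    """Moran's I of residuals, using binary weights for pairs within `radius`."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(residuals, dtype=float)
    z = z - z.mean()                                # deviations from the mean
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Weight 1 for distinct locations within the radius; self-pairs
    # (and exactly co-located samples) get weight 0.
    w = ((dist > 0) & (dist <= radius)).astype(float)
    return len(z) / w.sum() * (z @ w @ z) / (z @ z)
```

Values well above zero suggest that nearby residuals are similar (spatial structure worth kriging); values near zero suggest little spatial structure. A significance test, e.g. by permutation, would be needed before drawing conclusions.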
28. Why Parallel Kriging?
- Kriging step in USGS processes has presented a major bottleneck.
- Reducing the time of this computation allows different input variables to be considered, larger data sets to be incorporated, and more sites/locations to be modeled.
- Kriging algorithms are widely used in general, and a parallel version has equally wide and general application.
29. Field Sampling in the Rocky Mountain National Park
30. Kriging Algorithm
- Begin with ndata samples of quantity R (e.g. residuals)
- For each pixel in the output image
  - Calculate distance from pixel to each sample data point (ndata x 1)
  - Sort vector to find the nn nearest neighbor samples (nn x 1)
  - Calculate covariance vector $D_j$ for nearest neighbors (nn x 1)
  - Calculate covariance matrix $C_{ij}$ for nearest neighbors (nn x nn)
  - Invert covariance matrix $C_{ij}$
  - Multiply by covariance vector to create weight vector (nn x 1): $W = C^{-1} D$
  - Calculate dot product of data samples and weights to estimate R: $R_{\text{estimated}} = W \cdot V$
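The per-pixel steps of the kriging algorithm can be sketched as follows. This is an illustration only: it assumes an exponential covariance model built from nugget, sill, and range (the talk does not say which model type was used), it solves the linear system rather than forming the inverse explicitly, and the function names are hypothetical.

```python
import numpy as np

def covariance(h, nugget, sill, range_):
    """Exponential covariance at separation h (an assumed model type)."""
    h = np.asarray(h, dtype=float)
    return np.where(h == 0.0, sill, (sill - nugget) * np.exp(-3.0 * h / range_))

def krige_pixel(pixel_xy, sample_xy, sample_values, nn, nugget, sill, range_):
    """Estimate R at one pixel from its nn nearest samples."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    pixel_xy = np.asarray(pixel_xy, dtype=float)
    dist = np.linalg.norm(sample_xy - pixel_xy, axis=1)   # (ndata,)
    nearest = np.argsort(dist)[:nn]                       # nn nearest samples
    pts, vals = sample_xy[nearest], sample_values[nearest]
    D = covariance(dist[nearest], nugget, sill, range_)   # (nn,)
    pair_dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    C = covariance(pair_dist, nugget, sill, range_)       # (nn, nn)
    W = np.linalg.solve(C, D)                             # same as C^-1 D
    return float(W @ vals)                                # dot product with samples
```

With a zero nugget this estimator reproduces the sample value at a sample location, and the estimate decays toward zero far from all samples, which is reasonable for regression residuals whose mean is close to zero.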
31. An Elegantly Parallel Algorithm
- Parallelize using Domain Decomposition
- Each processor gets a chunk of complete rows
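One way to assign chunks of complete rows is a contiguous block split, sketched below. The helper name `rows_for_rank` is hypothetical, and the talk does not specify how leftover rows were handled; here the first few processors simply take one extra row each.

```python
def rows_for_rank(n_rows, n_procs, rank):
    """Half-open (start, stop) range of complete image rows owned by `rank`."""
    base, extra = divmod(n_rows, n_procs)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

For example, a 2048-row image on 128 processors gives each processor 16 consecutive rows.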
32. Medusa / Frio Configuration
- Frio on J. Schnase's desk (node 0): Linux PC w/ 1.2GHz Athlon processor and 1.5GB memory
- Gigabit Ethernet
- Medusa Beowulf Cluster at NASA's GSFC: 128-processor 1.2GHz Athlon MP, 1GB memory on each dual-cpu node, 2 Gbps Myrinet internal network
33. MPI Implementation: Simple Version
- Propagate input data from node 0 to all compute nodes
  - X, Y sample locations and residuals at each point
  - Desired size, location, and resolution of output Kriged image
  - Number of nearest neighbor samples to use
  - Variogram information (nugget, sill, range, model type)
- Node 0 then starts MPI job on compute nodes (medusa)
- Each compute node then
  - Determines its processor number and total number of CPUs
  - Reads local input data file
  - Calculates its assigned rows
  - Writes its rows (subimage) to local disk when finished
- Node 0 grabs all files from each node
- Reassembles complete output image
34. MPI Implementation: Refined Version
- Simple version does all computation before doing any communication
- Refined version overlaps communication w/ computation
- At end of each row, each compute node issues asynchronous send (MPI_ISEND) of the row to node 0
  - Processes next row while previous row is sent to node 0
  - Issues wait/synchronize (MPI_WAIT) to verify receipt of previous row before sending current row.
- Meanwhile, node 0
  - Posts asynchronous receives from each compute node (MPI_IRECV)
  - Issues MPI_WAITs to synchronize
  - Builds output image row by row in memory
- Complete image is available when final compute node finishes
35. Scaling Results
- Run time scales with area Kriged
  - $2048^2$ ran 16x longer than $512^2$
- Nearly linear scaling with processors
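The area arithmetic, along with the usual definitions of speedup and parallel efficiency, can be written out as a short sketch. The function names are illustrative, and the standard definitions (speedup = serial time / parallel time; efficiency = speedup / number of processors) are assumed; the talk does not state how its speedup and efficiency figures were computed.

```python
def area_ratio(side_a, side_b):
    """Ratio of pixel counts of two square images with the given side lengths."""
    return (side_a / side_b) ** 2

def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Fraction of ideal (linear) speedup achieved on n_procs processors."""
    return speedup(t_serial, t_parallel) / n_procs
```

`area_ratio(2048, 512)` evaluates to 16.0, matching the 16x run-time ratio reported for the two image sizes. Perfectly linear scaling corresponds to an efficiency of 1.0.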
36. Timing Results: Wall-clock seconds on Medusa w/ Myrinet
37. Scaling Curves
[Figure axes: processing time (wall-clock seconds) vs. number of processors]
38. Speedup Results
39. Scaling Efficiencies
40. The Kriged Residuals: Cerro Grande Fire Site
41. Next Computing Steps
- Moving to Apple Xserve G5 / Xgrid Environment
  - Server node + 10 compute nodes for GSFC
    - Dual CPU G5 processors (2 GHz, 2 GB memory)
    - Gigabit ethernet connectivity
    - 3 TB Xserve RAID array
  - Server + 5 nodes for USGS
  - Xgrid for pool-of-processors computing model
  - MPI also available
- New systems on order
  - Hope to receive in May
42. Xgrid Computing Environment
- Suitable for loosely coupled distributed computing
- Controller distributes pieces/chunks of work to agent processors
- Collects results when agents finish
- Distributes more chunks to agents as they become free (or join grid)
[Diagram labels: Xgrid controller, Server storage, Xgrid client]
43. Xgrid Kriging
- Divide area to be kriged into finer pieces of work.
- For example, to krig a $1024^2$ region with 20 agents
  - Assign 4 - 8 rows in each piece (or chunk) of work, yielding 128 - 256 chunks.
  - First 20 chunks are assigned, one to each agent.
  - Processors are given more rows as they finish previous chunks, which won't be all at once.
- Controller assembles output image when all chunks have been processed.
- This model is straightforwardly extended to other spatial statistics tasks, e.g. variogram mapping, as long as images fit on a single node.
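The chunked scheduling can be sketched on a single machine. This is not Xgrid itself: a thread pool stands in for the controller and its agents, and `krige_rows` is a hypothetical placeholder that fills each row with its row index instead of doing real kriging.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def make_chunks(n_rows, rows_per_chunk):
    """Split the image rows into (start, stop) chunks of work."""
    return [(s, min(s + rows_per_chunk, n_rows))
            for s in range(0, n_rows, rows_per_chunk)]

def krige_rows(start, stop, n_cols):
    """Placeholder for per-chunk kriging: each row is filled with its index."""
    rows = np.arange(start, stop, dtype=float)[:, None]
    return start, np.repeat(rows, n_cols, axis=1)

def run_grid(n_rows, n_cols, rows_per_chunk, n_agents):
    """Process all chunks with n_agents workers and assemble the image."""
    image = np.empty((n_rows, n_cols))
    chunks = make_chunks(n_rows, rows_per_chunk)
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        # All chunks are queued; each worker takes the next queued chunk
        # when it finishes its previous one. Results come back in
        # submission order and are copied into the output image.
        results = pool.map(lambda c: krige_rows(c[0], c[1], n_cols), chunks)
        for start, block in results:
            image[start:start + block.shape[0]] = block
    return image
```

For 1024 rows, `make_chunks(1024, 4)` yields 256 chunks and `make_chunks(1024, 8)` yields 128, matching the range quoted on the slide. Keeping chunks small relative to the number of agents lets faster agents pick up more of the work.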
44. Graphic provided by NASA's Scientific Visualization Studio
46. Science plan
- Short-Term Challenges
  - strategically focus on new variables (MODIS time series summary methods, precipitation/meteorological data)
  - GSLIB kriging in parallel/Xgrid
  - Combinatorial screening (in lieu of stepwise regression) for linear and logistic regression in parallel/Xgrid
- Long-Range Challenges
  - code for generalized least squares for linear and logistic regression (accounting for spatial structure in model selection)
  - relate empirical results to physical/mechanistic model
  - build modeling to forecast in space and time based on habitat suitability/availability
47. What new and improved models are needed?
- High-dimensionality, hybrid predictive models
  - Temporal, mechanistic, stochastic, and scenario-based
  - Combined economic and ecological models using hundreds of variables
- Scalable spatio-temporal models
  - Molecules, microbes, to landscapes/ecosystems
  - Chemical reaction times to evolutionary/geological times
- Integrated Earth system models
  - Coupled ecosystem/climate models
  - Coupled terrestrial/aquatic models
48. What new and improved measurements are needed?
- Ecosystem biophysical structure
  - Biomass, vertical structure, topography, ocean particulates, pigment fluorescence, trace gas fluxes, near surface atmospheric carbon dynamics, lake and stream chemistry, etc.
- Ecosystem functional capacity / physiological state
  - Pigment concentrations, live biomass, biomass turnover rates, photosynthetic and respiratory capacity, etc.
- Biological population mapping
  - Species, communities, functional-type mixtures, etc.
49. Additional data: MODIS time series
[Figure: MODIS time series, 11/09/2000 to 12/09/2002 (x-axis: time in days), and its autocorrelation function]
50. Additional data: precipitation, temperature
[Figure: NOAA precipitation time series and its autocorrelation function]
51. Public Interface Prototype
InvasiveSpecies.gsfc.nasa.gov