Title: What is Kriging
1 What is Kriging?
- Spatial interpolator
- A weighted linear combination of point measurements that exploits the structure of spatial auto-correlation present in the data
- Spatial structure determines the appropriate weights for points that are close; allows for anisotropy
- Spatial structure is determined by modeling the empirical variogram: auto-correlation as a function of the separation distance
- Kriging determines weights by minimizing the variance of the errors: Best Linear Unbiased Estimator (BLUE)
- An Introduction to Applied Geostatistics, Isaaks and Srivastava, 1989, Oxford University Press.
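As a rough illustration of the empirical variogram mentioned above, the sketch below bins all sample pairs by separation distance and averages half the squared difference in each bin (the classical semivariance estimator). The function name, argument names, and the choice of bin edges are assumptions made for this example, not part of the original slides.

```python
import numpy as np

def empirical_variogram(xs, ys, vals, bin_edges):
    """Semivariance per distance bin; NaN where a bin holds no pairs.

    bin_edges is a hypothetical, user-chosen list of increasing distances.
    """
    i, j = np.triu_indices(len(vals), k=1)          # every unordered pair once
    dist = np.hypot(xs[i] - xs[j], ys[i] - ys[j])   # separation distance
    half_sq = 0.5 * (vals[i] - vals[j]) ** 2        # half squared difference
    which = np.digitize(dist, bin_edges) - 1        # bin index for each pair
    nbins = len(bin_edges) - 1
    gamma = np.full(nbins, np.nan)
    for b in range(nbins):
        in_bin = which == b
        if in_bin.any():
            gamma[b] = half_sq[in_bin].mean()
    return gamma
```

A variogram model (nugget, sill, range) would then be fitted to these binned values; that fitting step is not shown here.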
2 Why Kriging?
- Stepwise regression is used to find the relationship between field samples and remote sensing, DEM, and ancillary data
- Residuals (observed value minus the prediction from the stepwise regression) are calculated for each sample point
- Residuals are tested for spatial structure by viewing empirical variograms and by statistical hypothesis testing (e.g. Moran's I)
- If spatial structure exists, Kriging is used to estimate the residual surface for the entire study area
- The Kriged residual surface is then added to the stepwise regression model to produce a final prediction that includes both small- and large-scale structure
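The Moran's I test named above can be sketched as follows. This is a minimal version that assumes binary neighbor weights (1 for any pair closer than `max_dist`, 0 otherwise); the slides do not say which weighting scheme was used, and the function and parameter names are hypothetical.

```python
import numpy as np

def morans_i(xs, ys, residuals, max_dist):
    """Moran's I with binary distance-band weights (an assumed weighting)."""
    z = residuals - residuals.mean()                       # deviations from mean
    d = np.hypot(xs[:, None] - xs[None, :], ys[:, None] - ys[None, :])
    w = ((d > 0) & (d <= max_dist)).astype(float)          # neighbors, no self-pairs
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)
```

Values well above the null expectation of -1/(n-1) suggest that nearby residuals are similar, which is the situation where Kriging the residuals is worthwhile. A significance test (for example, by permutation) would still be needed and is not shown.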
3 Why Parallel Kriging?
- The Kriging step in USGS processes has been a major bottleneck.
- Reducing the time of this computation allows different input variables to be considered, larger data sets to be incorporated, and more sites/locations to be modeled.
- Kriging algorithms are widely used in general, and a parallel version has equally wide and general application.
4 Field Sampling in the Rocky Mountain National Park
5 Kriging Algorithm
- Begin with ndata samples of quantity R (e.g. residuals)
- For each pixel in the output image
  - Calculate distance from pixel to each sample data point (ndata x 1)
  - Sort vector to find the nn nearest neighbor samples (nn x 1)
  - Calculate covariance vector D_j for nearest neighbors (nn x 1)
  - Calculate covariance matrix C_ij for nearest neighbors (nn x nn)
  - Invert covariance matrix C_ij
  - Multiply by covariance vector to create weight vector (nn x 1)
    - W = C^{-1} D
  - Calculate dot product of data samples and weights to estimate R
    - R_estimated = W · V
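The per-pixel steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the covariance model is taken to be exponential (the slides mention a "model type" without naming one), the function and parameter names are hypothetical, and the weights are obtained with a linear solve, which gives the same W = C^{-1} D without forming the inverse explicitly.

```python
import numpy as np

def exp_covariance(h, nugget, sill, vrange):
    """Assumed exponential covariance: sill at h = 0, decaying with distance."""
    h = np.asarray(h, dtype=float)
    cov = (sill - nugget) * np.exp(-3.0 * h / vrange)
    return np.where(h == 0.0, sill, cov)

def krige_pixel(px, py, xs, ys, vals, nn, nugget, sill, vrange):
    """Estimate R at (px, py) from the nn nearest samples."""
    dist = np.hypot(xs - px, ys - py)              # distance to every sample (ndata)
    idx = np.argsort(dist)[:nn]                    # indices of the nn nearest samples
    d_vec = exp_covariance(dist[idx], nugget, sill, vrange)      # D (nn)
    dx = xs[idx][:, None] - xs[idx][None, :]
    dy = ys[idx][:, None] - ys[idx][None, :]
    c_mat = exp_covariance(np.hypot(dx, dy), nugget, sill, vrange)  # C (nn x nn)
    weights = np.linalg.solve(c_mat, d_vec)        # W = C^{-1} D via a linear solve
    return float(weights @ vals[idx])              # R_estimated = W . V
```

With these assumptions, evaluating the estimator at a sample location reproduces that sample's value, which is a quick sanity check. Duplicate sample locations would make C singular and are not handled here.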
6 An Elegantly Parallel Algorithm
- Parallelize using Domain Decomposition
- Each processor gets a chunk of complete rows
- One job per processor
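One way to hand each processor a contiguous chunk of complete rows is sketched below. The helper name and the rule for spreading leftover rows over the first few processors are assumptions; the slides do not say how uneven row counts were handled.

```python
def rows_for_rank(nrows, nprocs, rank):
    """Return (start, stop) of the half-open row range owned by one processor."""
    base, extra = divmod(nrows, nprocs)      # 'extra' processors get one more row
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

For example, 10 rows over 4 processors gives the ranges (0, 3), (3, 6), (6, 8) and (8, 10), so every row is covered exactly once.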
7 Medusa / Frio Configuration
Frio, on J. Schnase's desk (node 0): Linux PC w/ 1.2 GHz Athlon processor and 1.5 GB memory
Gigabit Ethernet
Medusa Beowulf cluster at NASA's GSFC: 128 processors (1.2 GHz Athlon MP), 1 GB memory on each dual-CPU node, 2 Gbps Myrinet internal network
8 MPI Implementation: Simple Version
- Propagate input data from node 0 to all compute nodes
  - X, Y sample locations and residuals at each point
  - Desired size, location, and resolution of output Kriged image
  - Number of nearest neighbor samples to use
  - Variogram information (nugget, sill, range, model type)
- Node 0 then starts MPI job on compute nodes (Medusa)
- Each compute node then
  - Determines its processor number and total number of CPUs
  - Reads local input data file
  - Calculates its assigned rows
  - Writes its rows (subimage) to local disk when finished
- Node 0 grabs all files from each node
- Reassembles complete output image
9 MPI Implementation: Refined Version
- Simple version does all computation before doing any communication
- Refined version overlaps communication w/ computation
  - At end of each row, each compute node issues asynchronous send (MPI_ISEND) of the row to node 0
  - Processes next row while previous row is sent to node 0
  - Issues wait/synchronize (MPI_WAIT) to verify receipt of previous row before sending current row
- Meanwhile, node 0
  - Posts asynchronous receives from each compute node (MPI_IRECV)
  - Issues MPI_WAITs to synchronize
  - Builds output image row by row in memory
  - Complete image is available when final compute node finishes
- Lots of extra lines of MPI code
- Complicated logic, easy to deadlock while developing
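The send/compute overlap described above can be illustrated without MPI. The sketch below is an analogue only: a single background thread stands in for the network, `submit` plays the role of MPI_ISEND, and `result()` plays the role of MPI_WAIT. It is not the MPI code from the talk, all names are hypothetical, and the workers run one after another here purely to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def compute_row(row, ncols):
    """Hypothetical stand-in for Kriging one image row."""
    return np.full(ncols, float(row))

def store_row(image, row, data):
    """Stand-in for node 0 receiving a row and placing it in the image."""
    image[row] = data

def run_worker(rows, ncols, image, sender):
    pending = None                          # handle for the row still "in flight"
    for row in rows:
        data = compute_row(row, ncols)      # compute while previous send proceeds
        if pending is not None:
            pending.result()                # like MPI_WAIT on the previous row
        pending = sender.submit(store_row, image, row, data)  # like MPI_ISEND
    if pending is not None:
        pending.result()                    # drain the final send

def krige_image(nrows, ncols, nworkers):
    image = np.zeros((nrows, ncols))        # node 0 builds the image in memory
    with ThreadPoolExecutor(max_workers=1) as sender:
        for rank in range(nworkers):
            rows = range(rank * nrows // nworkers, (rank + 1) * nrows // nworkers)
            run_worker(rows, ncols, image, sender)
    return image
```

The key ordering is the same as on the slide: each row is computed before waiting on the previous send, so the wait rarely blocks. Forgetting the final wait, or waiting before computing, are the kinds of mistakes that cost performance or cause hangs in the real MPI version.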
10 Scaling Results
- Run time scales with area Kriged
- 2048^2 ran 16x longer than 512^2 (16x as many pixels)
- Nearly linear scaling with processors
11 The Kriged Residuals: Cerro Grande Fire Site
12 New Cluster Paradigm for ISFS
- We waited until end of CT funding to buy clusters
- Rewarded by new Apple Xserve G5 w/ Xgrid environment
- Offered alternative to Beowulf PC cluster
- Server node + 10 compute nodes for GSFC
- Dual-CPU G5 processors (2 GHz, 1 GB memory)
- Gigabit Ethernet inter-connectivity
- 3 TB Xserve RAID array
- Server + 5 nodes for USGS
- Xgrid offers easy pool-of-processors computing model
- MPI also available for heritage code
- Expect to receive new systems by end of month
13 Xgrid Computing Environment (Distributed Computing for the Rest of Us?)
- Suitable for loosely coupled distributed computing
- Controller distributes tasks to agent processors (tasks include data and code)
- Collects results when agents finish
- Distributes more chunks to agents as they become free and join cluster/grid
Xgrid controller
Server storage
Xgrid client
14 Xgrid Work Flow
15 Cluster Status: Some Helpful Beautiful Displays
Offline: turned off
Unavailable: turned on, but busy w/ other non-cluster tasks
Working: computing on this cluster job
Available: waiting to be assigned cluster work
16 Cluster Status Displays (continued)
Tachometer illustrates total processing
power available to cluster at any time.
Level will change if running on a cluster of
desktop workstations, but will stay steady if
monitoring a dedicated cluster
17 Demo
18 Kriging via Xgrid
- Divide area to be kriged into finer pieces of work.
- For example, to krig a 1024^2 region with 20 agents
  - Assign 4-8 rows in each task (piece of work), yielding 128-256 tasks.
  - First 20 tasks are assigned to all agents.
  - Processors are given more tasks/rows as they finish previous chunks, which won't be all at once
  - Controller assembles output image when all rows have been processed (simple concatenation).
- About 2 dozen extra lines of code
- prompt> xgrid krigxgrid cerrotp.asc cerrotp.krg 1 128
- prompt> xgrid krigxgrid cerrotp.asc cerrotp.krg 2 128
- ...
- prompt> xgrid krigxgrid cerrotp.asc cerrotp.krg 2 128
- prompt> concatenation, etc.
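The task arithmetic above (1024 rows in pieces of 4-8 rows giving 128-256 tasks) and the final concatenation can be sketched as follows. Both helper names are hypothetical, and the sketch leaves out Xgrid itself: it only shows how the row ranges might be cut and how finished chunks could be stitched back together regardless of the order in which agents return them.

```python
import numpy as np

def make_tasks(nrows, rows_per_task):
    """Split rows 0..nrows-1 into (start, stop) tasks of at most rows_per_task rows."""
    return [(s, min(s + rows_per_task, nrows)) for s in range(0, nrows, rows_per_task)]

def assemble(chunks):
    """Concatenate finished chunks in row order.

    chunks maps each task's start row to its 2-D array of Kriged rows.
    """
    return np.concatenate([chunks[s] for s in sorted(chunks)], axis=0)
```

Smaller tasks balance the load better across agents of unequal speed but pay the per-task startup overhead more often, which is why the task size needs tuning.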
19 Favorable Initial Reactions to Xgrid
- Proof-of-concept implementation of Kriging is complete
- Has run on ad hoc collection of 1-3 desktop Apples running OS X
- Startup overhead seems high, but should be much less on dedicated cluster
- Tuning will be required to determine optimal task size
- Can access all cluster job statuses via command line
- Should be straightforward to develop automated, web-enabled scripts
- The learning curve was reasonable, but definitely non-zero
- Setting of passwords tricky, firewalls are an issue (setup issues this AM)
- User community was helpful (e.g. xgrid-users distribution list)
- Xgrid looks promising for other statistical/geospatial tasks
  - Exhaustive regression or variogram map generation
  - Any task done row by row, as long as all data fit within a single processor
  - Independent simulations, e.g. parameter studies
20 The Future Looks Fun
I CAN'T WAIT TO GET MY HANDS ON OUR CLUSTER!