What is Kriging - PowerPoint PPT Presentation

About This Presentation
Title:

What is Kriging


Slides: 21
Provided by: johnls7

Transcript and Presenter's Notes



1
What is Kriging?
  • Spatial interpolator
  • A weighted linear combination of point
    measurements that exploits the structure of the
    spatial auto-correlation present in the data
  • Spatial structure determines the appropriate
    weights for points that are 'close' (allows for
    anisotropy)
  • Spatial structure is determined by modeling the
    empirical variogram (spatial auto-correlation as a
    function of the separation distance)
  • Kriging determines the weights by minimizing the
    variance of the errors: Best Linear Unbiased
    Estimator (BLUE)
  • An Introduction to Applied Geostatistics, Isaaks &
    Srivastava, 1989, Oxford University Press.
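As an illustration of "modeling the empirical variogram", here is a minimal sketch (not the presenters' code) that bins half squared differences of all sample pairs by separation distance and evaluates a spherical model, one common model type. The function names, the binning scheme, and the convention that `sill` is the total sill (nugget included) are assumptions. This omnidirectional version ignores direction, so it does not capture the anisotropy mentioned above.

```python
import math
from itertools import combinations


def empirical_semivariogram(xs, ys, values, lag_width, n_lags):
    """Omnidirectional empirical semivariogram.

    For every pair of samples, half the squared difference of their values is
    binned by separation distance. Returns (mean distance, semivariance,
    pair count) for each non-empty lag bin.
    """
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(values)), 2):
        h = math.hypot(xs[i] - xs[j], ys[i] - ys[j])
        k = int(h // lag_width)          # lag bin index for this pair
        if k < n_lags:                   # ignore pairs beyond the last bin
            sq_sums[k] += 0.5 * (values[i] - values[j]) ** 2
            dist_sums[k] += h
            counts[k] += 1
    return [(dist_sums[k] / counts[k], sq_sums[k] / counts[k], counts[k])
            for k in range(n_lags) if counts[k] > 0]


def spherical_variogram(h, nugget, sill, rng):
    """Spherical model; `sill` is assumed to be the total sill (nugget included)."""
    if h == 0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
```

Fitting the nugget, sill, and range to the binned points (for example by weighted least squares) is a separate step not shown here.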

2
Why Kriging?
  • Stepwise regression is used to find the
    relationship between field samples and remote
    sensing, DEM, and ancillary data
  • Residuals (observed value minus the stepwise
    regression prediction) are calculated for each
    sample point
  • Residuals are tested for spatial structure by
    viewing empirical variograms and by statistical
    hypothesis testing (e.g. Moran's I)
  • If spatial structure exists, Kriging is used to
    estimate the residual surface for the entire
    study area
  • The Kriged residual surface is then added to the
    stepwise regression prediction to produce a final
    prediction that includes both small- and
    large-scale structure
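Moran's I, named above as one test for spatial structure in the residuals, can be computed directly from a spatial weight matrix. The sketch below assumes inverse-distance weights purely for illustration (`inverse_distance_weights` is a hypothetical helper; the slides do not say which weighting was used). It returns only the statistic, not the z-score or p-value a full hypothesis test needs. Under no spatial autocorrelation the expected value is -1/(n-1); values well above that suggest clustering.

```python
import math


def inverse_distance_weights(xs, ys):
    """Hypothetical weighting: w_ij = 1 / d_ij for i != j, 0 on the diagonal.

    Coincident points would cause a division by zero and must be merged first.
    """
    n = len(xs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.hypot(xs[i] - xs[j], ys[i] - ys[j])
    return w


def morans_i(values, w):
    """Global Moran's I of `values` (e.g. residuals) under weight matrix `w`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in w)
    # Weighted cross-products of deviations over all ordered pairs (i, j).
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)
```

For instance, with neighbor weights on a line of four points, the alternating residuals `[1, -1, 1, -1]` give I = -1 (perfect dispersion), while `[1, 1, -1, -1]` give a positive value.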

3
Why Parallel Kriging?
  • The Kriging step in USGS processing has been a
    major bottleneck.
  • Reducing the time of this computation allows
    different input variables to be considered,
    larger data sets to be incorporated, and more
    sites/locations to be modeled.
  • Kriging algorithms are widely used, so a parallel
    version has equally wide and general
    application.

4
Field Sampling in the Rocky Mountain National Park
5
Kriging Algorithm
  • Begin with ndata samples of quantity R (e.g.
    residuals)
  • For each pixel in the output image:
      • Calculate the distance from the pixel to each
        sample data point (ndata x 1)
      • Sort the distance vector to find the nn nearest
        neighbor samples (nn x 1)
      • Calculate the covariance vector D_j for the
        nearest neighbors (nn x 1)
      • Calculate the covariance matrix C_ij for the
        nearest neighbors (nn x nn)
      • Invert the covariance matrix C_ij
      • Multiply by the covariance vector to create the
        weight vector (nn x 1): $W = C^{-1} D$
      • Calculate the dot product of the nearest-neighbor
        data values V and the weights to estimate R:
        $R_{\text{estimated}} = W \cdot V$
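The per-pixel steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the USGS code: the covariance is derived from a spherical variogram as C(h) = sill - gamma(h) (treating `sill` as the total sill), samples are `(x, y, r)` tuples, and `spherical_cov`, `solve`, and `krige_pixel` are hypothetical names.

```python
import math


def spherical_cov(h, nugget, sill, rng):
    """Covariance C(h) = sill - gamma(h) for a spherical variogram (assumed model)."""
    if h == 0:
        return sill
    if h >= rng:
        return 0.0
    r = h / rng
    return (sill - nugget) * (1.0 - 1.5 * r + 0.5 * r ** 3)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (inputs not modified)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def krige_pixel(px, py, samples, nn, nugget, sill, rng):
    """Simple-Kriging estimate of R at (px, py) from samples [(x, y, r), ...]."""
    # Distances from the pixel to every sample, sorted to keep the nn nearest.
    near = sorted(samples, key=lambda s: math.hypot(s[0] - px, s[1] - py))[:nn]
    # Covariance vector D_j between the pixel and each neighbor.
    d = [spherical_cov(math.hypot(x - px, y - py), nugget, sill, rng)
         for x, y, _ in near]
    # Covariance matrix C_ij among the neighbors.
    c = [[spherical_cov(math.hypot(xi - xj, yi - yj), nugget, sill, rng)
          for xj, yj, _ in near] for xi, yi, _ in near]
    # Weights W = C^-1 D, obtained by solving C W = D instead of forming C^-1.
    w = solve(c, d)
    # Estimate R = W . V, the dot product of weights and neighbor values.
    return sum(wi * r for wi, (_, _, r) in zip(w, near))
```

Solving C W = D gives the same weights as inverting C and multiplying, with less work and better numerical behavior. Because the weights are not constrained to sum to one, this is simple Kriging, which assumes the residuals have (approximately) zero mean; a pixel farther than the range from every neighbor therefore gets an estimate of 0.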

6
An Elegantly Parallel Algorithm
  • Parallelize using Domain Decomposition
  • Each processor gets a chunk of complete rows
  • One job per processor
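Giving each processor "a chunk of complete rows" needs a rule for splitting the row count across ranks. The slides do not say how leftover rows were handled, so the sketch below assumes a common choice: contiguous blocks where the first `nrows % nprocs` ranks get one extra row. `row_block` is a hypothetical helper name.

```python
def row_block(rank, nprocs, nrows):
    """Return the half-open row range [start, stop) owned by `rank`.

    Rows are split into contiguous chunks; the first nrows % nprocs ranks
    get one extra row, so chunk sizes differ by at most one.
    """
    base, extra = divmod(nrows, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop
```

For example, 10 rows on 3 processors split as rows 0-3, 4-6, and 7-9, and a 2048-row image on 4 processors gives each one 512 rows.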

7
Medusa / Frio Configuration
Frio (node 0): Linux PC on J. Schnase's desk w/
1.2 GHz Athlon processor and 1.5 GB memory
Gigabit Ethernet
Medusa: 128-processor Beowulf cluster at NASA's
GSFC, 1.2 GHz Athlon MP, 1 GB memory on each
dual-CPU node, 2 Gbps Myrinet internal network
8
MPI Implementation: Simple Version
  • Propagate input data from node 0 to all compute
    nodes:
      • X, Y sample locations and the residual at each
        point
      • Desired size, location, and resolution of the
        output Kriged image
      • Number of nearest neighbor samples to use
      • Variogram information (nugget, sill, range,
        model type)
  • Node 0 then starts the MPI job on the compute
    nodes (Medusa)
  • Each compute node then:
      • Determines its processor number and the total
        number of CPUs
      • Reads its local input data file
      • Calculates its assigned rows
      • Writes its rows (subimage) to local disk when
        finished
  • Node 0 then:
      • Grabs all files from each node
      • Reassembles the complete output image
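The control flow of the simple version (distribute parameters, let each node compute its own rows, reassemble on node 0) can be mimicked on one machine. The sketch below is an analogue, not MPI: threads stand in for compute nodes, a returned `(start_row, rows)` tuple stands in for each node's subimage file, and `compute_row`, `params`, `compute_node`, and `run_simple` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def row_block(rank, nprocs, nrows):
    """Contiguous row range [start, stop) for `rank` (extra rows go to low ranks)."""
    base, extra = divmod(nrows, nprocs)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)


def compute_node(rank, nprocs, params, compute_row):
    """Stand-in for one rank: compute every assigned row, return the subimage."""
    start, stop = row_block(rank, nprocs, params["nrows"])
    return start, [compute_row(r, params) for r in range(start, stop)]


def run_simple(nprocs, params, compute_row):
    """Node 0: hand the same params to every rank, wait for all, then reassemble."""
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        futures = [pool.submit(compute_node, rank, nprocs, params, compute_row)
                   for rank in range(nprocs)]
        # All computation finishes before any "communication" happens.
        subimages = [f.result() for f in futures]
    image = []
    for _, rows in sorted(subimages, key=lambda s: s[0]):   # order by start row
        image.extend(rows)
    return image
```

With real MPI the same shape would broadcast the inputs from rank 0 and gather the subimages back; here everything shares one address space, so the sketch shows only the decomposition and reassembly logic.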

9
MPI Implementation: Refined Version
  • The simple version does all computation before
    doing any communication
  • The refined version overlaps communication w/
    computation
  • Each compute node:
      • At the end of each row, issues an asynchronous
        send (MPI_ISEND) of the row to node 0
      • Processes the next row while the previous row is
        being sent to node 0
      • Issues a wait (MPI_WAIT) to confirm that the
        previous row's send has completed before sending
        the current row
  • Meanwhile, node 0:
      • Posts asynchronous receives from each compute
        node (MPI_IRECV)
      • Issues MPI_WAITs to synchronize
      • Builds the output image row by row in memory
  • The complete image is available when the final
    compute node finishes
  • Lots of extra lines of MPI code
  • Complicated logic, easy to deadlock while
    developing
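The overlap pattern (post a non-blocking send of row k, compute row k+1, wait on the send of row k before posting the next one) can be illustrated without MPI. In this hypothetical sketch a one-thread executor plays the role of MPI_ISEND, `Future.result()` plays MPI_WAIT, and a `queue.Queue` stands in for node 0's posted receives. It shows the ordering logic only; Python threads do not reproduce MPI's performance characteristics.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def compute_node(rows, compute_row, inbox):
    """Compute rows one by one, 'sending' each row while the next is computed."""
    with ThreadPoolExecutor(max_workers=1) as sender:   # stands in for MPI_ISEND
        pending = None
        for r in rows:
            data = compute_row(r)                      # compute the current row
            if pending is not None:
                pending.result()                       # like MPI_WAIT on the previous send
            pending = sender.submit(inbox.put, (r, data))
        if pending is not None:
            pending.result()                           # last wait before the node exits


def run_overlapped(row_chunks, compute_row, nrows):
    """Node 0: start the compute nodes, then place rows in the image as they arrive."""
    inbox = queue.Queue()                              # stands in for posted receives
    workers = [threading.Thread(target=compute_node, args=(chunk, compute_row, inbox))
               for chunk in row_chunks]
    for t in workers:
        t.start()
    image = [None] * nrows
    # One receive per expected row. If a worker fails or the chunks do not
    # cover exactly nrows rows, this loop blocks forever: the same kind of
    # deadlock the slide warns about.
    for _ in range(nrows):
        r, data = inbox.get()
        image[r] = data
    for t in workers:
        t.join()
    return image
```

For example, `run_overlapped([range(0, 3), range(3, 5)], lambda r: r * r, 5)` assembles `[0, 1, 4, 9, 16]` regardless of the order in which rows arrive.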

10
Scaling Results
  • Run time scales with the area Kriged
  • A 2048 x 2048 image ran 16x longer than a
    512 x 512 image
  • Nearly linear scaling with the number of
    processors
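The 16x figure matches an idealized cost model. Assume (this is not stated on the slide) that every output pixel costs the same constant k, regardless of image size, and that the work divides perfectly across p processors. For an N x N image:

```latex
T(N, p) \approx \frac{k \, N^{2}}{p}
\qquad\Longrightarrow\qquad
\frac{T(2048, p)}{T(512, p)} = \left(\frac{2048}{512}\right)^{2} = 4^{2} = 16
```

The 1/p factor corresponds to the near-linear scaling with processors; in practice, startup and reassembly overhead keep the speedup slightly below linear.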

11
The Kriged Residuals: Cerro Grande Fire Site
12
New Cluster Paradigm for ISFS
  • We waited until the end of CT funding to buy
    clusters
  • Rewarded by the new Apple Xserve G5 w/ Xgrid
    environment
  • Offered an alternative to a Beowulf PC cluster
  • Server node + 10 compute nodes for GSFC
  • Dual-CPU G5 processors (2 GHz, 1 GB memory)
  • Gigabit Ethernet interconnect
  • 3 TB Xserve RAID array
  • Server + 5 nodes for USGS
  • Xgrid offers an easy pool-of-processors
    computing model
  • MPI also available for heritage code
  • Expect to receive the new systems by the end of
    the month

13
Xgrid Computing Environment (Distributed
Computing for the Rest of Us?)
  • Suitable for loosely coupled distributed
    computing
  • Controller distributes tasks to agent processors
    (tasks include data and code)
  • Collects results when agents finish
  • Distributes more chunks to agents as they become
    free and join cluster/grid

[Diagram: Xgrid client, Xgrid controller, server storage]
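In the pool-of-processors model, the controller hands a task to any free agent and collects results as agents finish. Python's `ThreadPoolExecutor` behaves similarly: submitted tasks wait in a queue until a worker thread is free. The sketch below is only an analogy, not the Xgrid API; `run_pool` and `run_task` are hypothetical names, and unlike Xgrid, agents cannot join the pool mid-run here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_pool(tasks, run_task, n_agents):
    """Controller: a fixed pool of agents; each free agent takes the next queued task.

    Results are collected in completion order and returned keyed by task index.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=n_agents) as agents:
        futures = {agents.submit(run_task, task): task_id
                   for task_id, task in enumerate(tasks)}
        for fut in as_completed(futures):          # whichever agent finishes first
            results[futures[fut]] = fut.result()
    return results
```

Keying results by task index lets the controller reassemble them in order even though agents finish at different times.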
14
Xgrid Work Flow
15
Cluster Status: Some Helpful, Beautiful Displays
Offline: turned off
Unavailable: turned on, but busy w/
other non-cluster tasks
Working: computing on this cluster job
Available: waiting to be assigned cluster work
16
Cluster Status Displays (continued)
The tachometer illustrates the total processing
power available to the cluster at any time.
The level will change when running on a cluster of
desktop workstations, but will stay steady when
monitoring a dedicated cluster.
17
Demo
18
Kriging via Xgrid
  • Divide the area to be kriged into finer pieces of
    work
  • For example, to krig a 1024 x 1024 region with 20
    agents:
      • Assign 4-8 rows to each task (piece of work),
        yielding 128-256 tasks
      • The first 20 tasks are assigned, one to each
        agent
      • Processors are given more tasks/rows as they
        finish previous chunks, which won't all happen
        at once
  • The controller assembles the output image when all
    rows have been processed (simple concatenation)
  • About 2 dozen extra lines of code
  • prompt xgrid krigxgrid cerrotp.asc cerrotp.krg
    1 128
  • prompt xgrid krigxgrid cerrotp.asc cerrotp.krg
    2 128
  • ...
  • prompt xgrid krigxgrid cerrotp.asc cerrotp.krg
    2 128
  • prompt concatenation, etc.
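The task arithmetic above (1024 rows at 4-8 rows per task gives 128-256 tasks) and the final concatenation step might look like the following sketch. The `(start, stop)` task encoding and the names `make_tasks` and `concatenate` are hypothetical; they are not the arguments of the `krigxgrid` commands shown on the slide.

```python
def make_tasks(nrows, rows_per_task):
    """Split rows [0, nrows) into consecutive tasks of at most rows_per_task rows."""
    return [(start, min(start + rows_per_task, nrows))
            for start in range(0, nrows, rows_per_task)]


def concatenate(results):
    """Reassemble task outputs {(start, stop): rows} into one image in row order."""
    image = []
    for start, stop in sorted(results):
        rows = results[(start, stop)]
        if len(rows) != stop - start:
            raise ValueError(f"task {start}-{stop} returned {len(rows)} rows")
        image.extend(rows)
    return image
```

Results can arrive from agents in any order; sorting on the task's start row restores the image. Smaller tasks balance load better across agents that finish at different times, at the cost of more per-task startup overhead, which is the tuning trade-off noted on the next slide.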

19
Favorable Initial Reactions to Xgrid
  • Proof-of-concept implementation of Kriging is
    complete
  • Has run on an ad hoc collection of 1-3 desktop
    Apples running OS X
  • Startup overhead seems high, but should be much
    less on a dedicated cluster
  • Tuning will be required to determine the optimal
    task size
  • Can access all cluster job statuses via the
    command line
  • Should be straightforward to develop automated,
    web-enabled scripts
  • The learning curve was reasonable, but definitely
    non-zero
  • Setting passwords was tricky, and firewalls are an
    issue (setup issues this AM)
  • User community was helpful (e.g. xgrid-users
    distribution list)
  • Xgrid looks promising for other
    statistical/geospatial tasks:
      • Exhaustive regression or variogram map
        generation
      • Any task done row by row, as long as all data
        fit within a single processor's memory
      • Independent simulations, e.g. parameter studies

20
The Future Looks Fun
I CAN'T WAIT TO GET MY HANDS ON OUR CLUSTER!