Title: Spatial Interpolation: A Brief Introduction
1. Spatial Interpolation: A Brief Introduction
2. General Outline
- Introduction to interpolation
- Deterministic interpolation methods
- Some basic statistical concepts
- Autocorrelation and First Law of Geography
- Geostatistical Interpolation
- Introduction to variography
- Kriging models
3. What is Interpolation?
- Assume we are dealing with a variable that has meaningful values at every point within a region (e.g., temperature, elevation, concentration of some mineral). Then, given the values of that variable at a set of sample points, we can use an interpolation method to predict its value at every point.
- For any unknown point, we take some form of weighted average of the values at surrounding points to predict the value at the point where the value is unknown.
- In other words, we create a continuous surface from a set of points.
- As an example used throughout this presentation, imagine we have data on the concentration of gold in western Pennsylvania at a set of 200 sample locations.
4. Appropriateness of Interpolation
- Interpolation should not be used when there isn't a meaningful value of the variable at every point in space (within the region of interest).
- That is, when points merely represent the presence of events (e.g., crimes), people, or some physical phenomenon (e.g., volcanoes, buildings), interpolation does not make sense.
- Whereas interpolation tries to predict the value of your variable of interest at each point, density analysis (available, for instance, in ArcGIS's Spatial Analyst) takes known quantities of some phenomenon and spreads them across the landscape, based on the quantity measured at each location and the spatial relationship of the measured locations.
- Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Understanding_density_analysis
5. Interpolation vs. Extrapolation
- Interpolation is prediction within the range of our data.
- E.g., having temperature values for a number of locations throughout PA, predict the temperature values at all other locations within PA.
- Note that the methods we are talking about are strictly those of interpolation, not extrapolation.
- Extrapolation is prediction outside the range of our data.
- E.g., having temperature values for a number of locations throughout PA, predict the temperature values in Kazakhstan.
6. First Law of Geography
- "Everything is related to everything else, but near things are more related than distant things." - Waldo Tobler (1970)
- This is the basic premise behind interpolation: near points generally receive higher weights than faraway points.
Reference: Tobler, W. R. (1970). "A computer movie simulating urban growth in the Detroit region". Economic Geography, 46(2): 234-240.
7. Methods of Interpolation
- Deterministic methods
- Use mathematical functions to calculate the values at unknown locations, based either on the degree of similarity (e.g., IDW) or the degree of smoothing (e.g., RBF) in relation to neighboring data points.
- Examples include:
- Inverse Distance Weighted (IDW)
- Radial Basis Functions (RBF)
- Geostatistical methods
- Use both mathematical and statistical methods to predict values at all locations within the region of interest, and to provide probabilistic estimates of the quality of the interpolation based on the spatial autocorrelation among data points.
- Include a deterministic component and errors (uncertainty of prediction)
- Examples include:
- Kriging
- Co-Kriging
Reference: http://www.crwr.utexas.edu/gis/gishydro04/Introduction/TermProjects/Peralvo.pdf
8. Exact vs. Inexact Interpolation
- Interpolators can be either exact or inexact.
- At sampled locations, exact interpolators yield values identical to the measurements.
- I.e., if the observed temperature in city A is 90 degrees, the point representing city A on the resulting grid will still have a temperature of 90 degrees.
- At sampled locations, inexact interpolators predict values that differ from the measured values.
- I.e., if the observed temperature in city A is 90 degrees, the inexact interpolator will still create a prediction for city A, and this prediction will not be exactly 90 degrees.
- The resulting surface will not pass through the original point.
- Inexact interpolation can be used to avoid sharp peaks or troughs in the output surface.
- Model quality can be assessed from the statistics of the differences between predicted and measured values.
- Jumping ahead: the two deterministic interpolators briefly presented here are exact. Kriging can be exact or inexact.
Reference: Burrough, P. A., and R. A. McDonnell. 1998. Principles of Geographical Information Systems. Oxford University Press, Oxford. 333 pp.
9. Part 1: Deterministic Interpolation
10. Inverse Distance Weighted (IDW)
- IDW interpolation explicitly relies on the First Law of Geography. To predict a value for any unmeasured location, IDW uses the measured values surrounding the prediction location. Measured values nearest to the prediction location have greater influence (i.e., weight) on the predicted value than those farther away.
- Thus, IDW assumes that each measured point has a local influence that diminishes with distance (or distance raised to a power q > 1), and weighs points closer to the prediction location more heavily than those farther away; hence the name "inverse distance weighted".
- Inverse squared distance (i.e., q = 2) is a widely used interpolator.
- ArcGIS, for example, allows you to select the value of q.
- The weight of each measured point is proportional to the inverse of distance raised to the power q. As a result, the weights decrease rapidly as distance increases; how fast they decrease depends on the value of q (a small code sketch follows below).
Source: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=How_Inverse_Distance_Weighted_(IDW)_interpolation_works
11. Inverse Distance Weighted - Continued
- Because things that are close to one another are more alike than those farther away, measured values at locations far from the prediction location will have little relationship with the value at that location.
- To speed up the computation, we might use only the several closest points.
- As a result, it is common practice to limit the number of measured values used when predicting the unknown value at a location by specifying a search neighborhood. The specified shape of the neighborhood restricts how far, and where, to look for the measured values to be used in the prediction. Other neighborhood parameters restrict which locations within that shape are used.
- The output surface is sensitive to clustering and the presence of outliers.
12. Search Neighborhood Specification
The 5 nearest neighbors with known values (shown in red) of the unknown point (shown in black) will be used to determine its value.
Points with known elevation values that lie outside the circle are simply too far from the target point at which the elevation value is unknown, so their weights are effectively 0.
13. The Accuracy of the Results
- One way to assess the accuracy of the interpolation is known as cross-validation.
- Remember the initial goal: use all the measured points to create a surface.
- Now suppose we remove one of the measured points from our input and re-create the surface using all the remaining points.
- We can then compare the predicted value at that removed point to the point's actual value.
- We do the same thing for all the points.
- If the average squared difference between the actual values and the predictions is small, then our model is doing a good job of predicting values at unknown points. If this average squared difference is large, then the model isn't that great. This average squared difference is called the mean squared error of prediction. ESRI's Geostatistical Analyst, for instance, reports the square root of this average squared difference.
- Cross-validation is used in other interpolation methods as well.
14. A Cross-Validation Example
- Assume you have measurements at 15 data points, from which you want to create a prediction surface.
- The Measured column gives the measured value at each point. The Predicted column gives the prediction at that point when we remove it from the input (i.e., use the other 14 points to create a surface). The Error column is simply the difference between the measured and predicted values.
- Because we can have an over-prediction or an under-prediction at any point, the error can be positive or negative. So averaging the raw errors won't do us much good if we want to see the overall error: we'll end up with a value that is essentially zero, because the positives and negatives cancel.
- Thus, in order to assess the extent of error in our predictions, we square each error and then take the average of these squared errors. This average is called the mean squared error (MSE).
- ArcGIS, for example, reports the square root of this mean squared error (referred to simply as Root-Mean-Square in Geostatistical Analyst). This root mean squared error is often denoted RMSE.
- A code sketch of this leave-one-out procedure follows below.
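A sketch of leave-one-out cross-validation, reusing the illustrative idw_predict function from the IDW slide (again a sketch, not ArcGIS's internals):

```python
import numpy as np

def loo_rmse(xy, values, q=2.0, k=5):
    """Leave-one-out cross-validation: drop each point in turn, predict it
    from the remaining points with IDW, and summarize errors with the RMSE."""
    xy = np.asarray(xy, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i            # all points except i
        pred = idw_predict(xy[keep], values[keep], xy[i:i + 1], q=q, k=k)[0]
        errors.append(values[i] - pred)               # measured minus predicted
    errors = np.array(errors)
    return np.sqrt(np.mean(errors ** 2))              # root mean squared error
```

Note that the raw errors are squared before averaging, exactly so that over- and under-predictions cannot cancel each other out.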
15. Examples of IDW with Different q's
- Larger q's (i.e., the power to which distance is raised) yield smoother surfaces.
- Food for thought: what happens when q is set to 0?
16. Part 2: A Review of Stats 101
17. Before We Do Any Geostatistics
- Let's review some basic statistical topics:
- Normality
- Variance and standard deviation
- Covariance and correlation
- ...and then briefly re-examine the underlying premise of most spatial statistical analyses:
- Autocorrelation
18. Normality
- A lot of statistical tests, including many in geostatistics, rely on the assumption that the data are normally distributed.
- When this assumption does not hold, the results are often inaccurate.
20. Data Transformations
- Sometimes it is possible to transform a variable's distribution by subjecting it to some simple algebraic operation.
- The logarithmic transformation is the most widely used to achieve normality when the variable is positively skewed (as in the image on the left below).
- Analysis is then performed on the transformed variable.
21. The Mean and the Variance
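The formulas shown on this slide do not survive in the transcript. The standard definitions, matching the worked example on the next slide (which divides by n rather than n - 1), would be:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\mathrm{Var}(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
\mathrm{SD}(x) = \sqrt{\mathrm{Var}(x)}
```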
22. Example: Calculation of Mean and Variance

Person | Test Score | Distance from the Mean | (Distance from the Mean) Squared
-------|------------|------------------------|---------------------------------
1      | 90         | 15                     | 225
2      | 55         | -20                    | 400
3      | 100        | 25                     | 625
4      | 55         | -20                    | 400
5      | 85         | 10                     | 100
6      | 70         | -5                     | 25
7      | 80         | 5                      | 25
8      | 30         | -45                    | 2025
9      | 95         | 20                     | 400
10     | 90         | 15                     | 225

Mean = 75. Variance = 445 (the average of the entries in the last column). Standard deviation (the square root of the variance) = 21.1.
23. Covariance and Correlation
- Covariance is defined as a measure of how much two variables X and Y change together.
- The units of Cov(X, Y) are those of X multiplied by those of Y.
- The covariance of a variable X with itself is simply the variance of X.
- Since these units are fairly obscure, a dimensionless measure of the strength of the relationship between variables is often used instead. This measure is known as the correlation.
- Correlations range from -1 to 1, with values close to 1 indicating a strong direct relationship and values close to -1 indicating a strong inverse relationship.
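For reference, the standard definitions behind the bullets above (not reproduced from the slide itself):

```latex
\mathrm{Cov}(X, Y) = E\big[(X - E[X])\,(Y - E[Y])\big], \qquad
\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
```

Dividing by the two standard deviations is what strips away the obscure units and confines the measure to the range from -1 to 1.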
24. Spatial Autocorrelation
- Sometimes, rather than examining the association between two variables, we might look at the relationship among the values of a single variable at different time points or locations.
- A variable is said to exhibit (positive) spatial autocorrelation if observations that are closer to each other in space have related values (recall Tobler's Law).
- As an aside, there can also be temporal autocorrelation, i.e., values of a variable at points close in time will be related.
25. Examples of Spatial Autocorrelation
(Source: http://image.weather.com/images/maps/current/acttemp_720x486.jpg)
26. Examples of Spatial Autocorrelation (Cont'd)
(Source: http://capita.wustl.edu/CAPITA/CapitaReports/localPM10/gifs/elevatn.gif)
27. Regression
- A statistical method used to examine the relationship between a variable of interest and one or more explanatory variables:
- Strength of the relationship
- Direction of the relationship
- Often referred to as Ordinary Least Squares (OLS) regression
- Available in all statistical packages
- Note that the presence of a relationship does not imply causality.
28. For the purposes of demonstration, let's focus on a simple version of this problem
- Variable of interest (dependent variable)
- E.g., education (years of schooling)
- Explanatory variable (AKA independent variable or predictor)
- E.g., neighborhood income
29. But what does a regression do? An example with a single predictor
30. The example on the previous page can be easily extended to cases where we have more than one predictor
- When we have n > 1 predictors, rather than getting a line in 2 dimensions, we get a (hyper)plane fit in n + 1 dimensions (the +1 accounts for the dependent variable).
- Each independent variable has its own slope coefficient, which indicates the relationship of that particular predictor with the dependent variable, controlling for all other independent variables in the regression.
- The equation of the best-fit line becomes:
- Dep. variable = m1 * predictor1 + m2 * predictor2 + m3 * predictor3 + b + residuals
- where the m's are the coefficients of the corresponding predictors and b is the y-intercept term.
- The coefficient of each predictor may be interpreted as the amount by which the dependent variable changes as that independent variable increases by one unit (holding all other variables constant).
31. Some (Very) Basic Regression Diagnostics
- R-squared: the percent of variance in the dependent variable that is explained by the independent variables.
- The so-called p-value of the coefficient:
- The probability of getting a coefficient (slope) value as far from zero as the one we observe when the true slope is actually zero.
- When p is less than 0.05, the independent variable is conventionally considered a statistically significant predictor of the dependent variable.
- One p-value per independent variable.
- The sign of the coefficient of the independent variable (i.e., the slope of the regression line):
- One coefficient per independent variable.
- Indicates whether the relationship between the dependent and independent variables is positive or negative.
- We should only interpret the sign when the coefficient is statistically significant.
32. Some (But Not All) Regression Assumptions
- The dependent variable should be normally distributed (i.e., the histogram of the variable should look like a bell curve).
- Very importantly, the observations should be independent of each other (the same holds for the regression residuals). If this assumption is violated, our coefficient estimates could be wrong!
33. Part 3: Geostatistical Interpolation
34. Origins
- Geostatistical interpolation involves a set of statistical techniques called kriging (there are a number of different kriging methods).
- Kriging is named after Danie Gerhardus Krige, a South African mining engineer who presented the ideas in his master's thesis in 1951. These ideas were later formalized by the prominent French mathematician Georges Matheron.
- For more information, see:
- Krige, Danie G. (1951). "A statistical approach to some basic mine valuation problems on the Witwatersrand". J. of the Chem., Metal. and Mining Soc. of South Africa 52(6): 119-139.
- Matheron, Georges (1962). Traité de géostatistique appliquée. Editions Technip, France.
- Kriging has two parts: the quantification of the spatial structure in the data (called variography) and the prediction of values at unknown points.
Source of this information: http://en.wikipedia.org/wiki/Daniel_Gerhardus_Krige
35. Motivating Example: Ordinary Kriging
- Imagine we have data on the concentration of gold (denote it by Y) in western Pennsylvania at a set of 200 sample locations (call them points p1, ..., p200).
- Since Y has a meaningful value at every point, our goal is to create a prediction surface for the entire region using these sample points.
- Notation: in this western PA region, Y(p) will denote the concentration of gold at any point p.
36. Global and Local Structure
- Without any a priori knowledge about the distribution of gold in western PA, we have no theoretical reason to expect different concentrations of gold at different locations in that region.
- I.e., theoretically, the expected value of gold concentration should not vary with latitude and longitude.
- In other words, we would expect that there is some general, average value of gold concentration (called the global structure) that is constant throughout the region (even though we assume it is constant, we do not know its value).
- Of course, when we look at the data, we see that there is some variability in the gold concentrations at different points. We can consider this a local deviation from the overall global structure, known as the local structure, residual, or error term.
- In other words, geostatisticians decompose the value of gold Y(p) into the global structure µ(p) and the local structure e(p):
- Y(p) = µ(p) + e(p)
37. e(p)
- As per the First Law of Geography, the local structures e(p) of nearby observations will often be correlated. That is, there is still some meaningful information (i.e., spatial dependence) that can be extracted from the spatially dependent component of the residuals.
- So, our ordinary kriging model will:
- Estimate the constant but unknown global structure µ(p), and
- Incorporate the dependencies among the residuals e(p).
- Doing so will enable us to create a continuous surface of gold concentration in western PA.
38. Assumptions of Ordinary Kriging
- For the sake of the methods that we will be employing, we need to make some assumptions:
- Y(p) should be normally distributed.
- The global structure µ(p) is constant and unknown (as in the gold example).
- The covariance between values of e depends only on the distance between the points.
- To put it more formally: for each distance h and each pair of locations p and t within the region of interest that are h units apart, there exists a common covariance value, C(h), such that Cov(e(p), e(t)) = C(h).
- This is called isotropy.
39. Covariance and Distance
- From the First Law of Geography it follows that as the distance between points increases, the similarity (i.e., covariance or correlation) between the values at those points decreases.
- If we plot this, with inter-point distance h on the x-axis and covariance C(h) on the y-axis, we get a graph that looks something like the one below. This representation of covariance as a function of distance is called the covariogram.
- Alternatively, we can plot correlation against distance (the correlogram).
40. Covariograms and Weights
- Geostatistical methods incorporate this covariance-distance relationship into the interpolation models.
- More specifically, this information is used to calculate the weights.
- Like IDW, kriging is a weighted average of points in the vicinity.
- Recall that in IDW, in order to predict the value at an unknown point, we assume that nearer points will have higher weights (i.e., weights are determined purely by distance).
- In geostatistical techniques, we calculate the distances between the unknown point at which we want to make a prediction and the nearby measured points, and use the value of the covariogram at those distances to calculate the weight of each of these surrounding measured points.
- I.e., the weight of a point h units away will depend on the value of C(h).
41. But...
- Unfortunately, it so happens that one generally cannot estimate covariograms and correlograms directly.
- For that purpose, a related function of distance h, called the semi-variogram (or simply the variogram), is calculated.
- The variogram is denoted by γ(h).
- One can easily obtain the covariogram from the variogram (but not the other way around).
- Covariograms and variograms describe the spatial structure of the data.
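The link between the two functions, under the stationarity assumptions of slide 38, is the standard identity:

```latex
\gamma(h) = C(0) - C(h) \quad\Longleftrightarrow\quad C(h) = C(0) - \gamma(h)
```

This holds whenever the covariogram exists, with C(0) equal to the variance (the sill); the variogram is also defined in more general settings, which is why the reverse direction is not always available.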
42. Interpretation of Variograms
- As mentioned earlier, a covariogram might be thought of as the covariance (i.e., similarity) between point values as a function of distance, such that C(h) is greater at smaller distances.
- A variogram, on the other hand, might be thought of as the dissimilarity between point values as a function of distance, such that the dissimilarity is greater for points that are farther apart.
- Variograms are usually interpreted in terms of the corresponding covariograms or correlograms.
- A common mistake when interpreting variograms is to say that the variance increases with distance; it is the dissimilarity between pairs of point values, not the variance of the variable itself, that grows with distance.
43. Bandwidth (The Maximum Value of h)
- When there are n points, the number of inter-point distances is equal to n(n - 1)/2.
- Example:
- With 15 points, we have 15(15 - 1)/2 = 105 inter-point distances (marked in yellow on the grid in the lower left).
- Since we're using Euclidean distance, the distance between points 1 and 2 is the same as the distance between points 2 and 1, so we count it only once. Also, the distance between a point and itself is always zero, and is of no interest here.
- The maximum distance h on a covariogram or variogram is called the bandwidth, and should equal half the maximum inter-point distance.
- In the figure on the lower right, the blue line connects the two points that are farthest from each other. The bandwidth in this example would then equal half the length of the blue line.
44. Mathematical Definition of a Variogram
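The defining formula does not survive in this transcript; the standard (semi-)variogram definition that the bullets below walk through is:

```latex
\gamma(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)} \big( Y(p_i) - Y(p_j) \big)^2
```

where N(h) is the set of point pairs separated by distance h (in practice, by a distance falling into the bin whose midpoint is h).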
- In other words, for each distance h between 0 and the bandwidth:
- Find all pairs of points i and j that are separated by that distance h.
- For each such point pair, subtract the value of Y at point j from the value of Y at point i, and square the difference.
- Average these squared differences across all point pairs and divide the average by 2. That's your variogram value!
- The division by 2 is the reason for the occasionally used name "semi-variogram".
- However, in practice there will generally be only one pair of points that are exactly h units apart, unless we're dealing with regularly spaced samples. Therefore, we create bins, or distance ranges, into which we place point pairs with similar distances, and estimate γ only at the midpoints of these bins rather than at all individual distances.
- These bins are generally of the same size.
- A rule of thumb is to have at least 30 point pairs per bin.
- We call these estimates of γ(h) at the bin midpoints the empirical variogram (a code sketch of the binning procedure follows below).
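A sketch of that binning procedure, in the same illustrative Python style as the earlier snippets:

```python
import numpy as np

def empirical_variogram(xy, y, n_bins=15, bandwidth=None):
    """Estimate the empirical variogram: bin point pairs by separation
    distance and average half the squared value differences per bin."""
    xy = np.asarray(xy, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)       # each pair counted once, no self-pairs
    d = np.hypot(*(xy[i] - xy[j]).T)          # all n(n-1)/2 pairwise distances
    sq = 0.5 * (y[i] - y[j]) ** 2             # halved squared differences
    if bandwidth is None:
        bandwidth = d.max() / 2               # rule of thumb from slide 43
    edges = np.linspace(0, bandwidth, n_bins + 1)
    mids, gammas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        if in_bin.sum() > 0:                  # ideally at least 30 pairs per bin
            mids.append((lo + hi) / 2)
            gammas.append(sq[in_bin].mean())  # gamma estimate at the bin midpoint
    return np.array(mids), np.array(gammas)
```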
45. Fitting a Variogram Model
- Next, we fit a variogram model (i.e., a curve) to the empirical variogram.
- That is, based on the shape of the empirical variogram, different variogram curves might be fit.
- The curve fitting generally employs the method of least squares, the same method that is used in regression analysis (a small fitting sketch follows below).
A very comprehensive guide on variography by Dr. Tony Smith (University of Pennsylvania): http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
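A minimal least-squares fitting sketch in Python, assuming the empirical_variogram output from the earlier snippet; the spherical form and parameter names here are illustrative, not any particular package's internals:

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, nugget, psill, rng):
    """Spherical variogram model: rises until the range, then flat at the sill."""
    h = np.asarray(h, dtype=float)
    rising = nugget + psill * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, rising, nugget + psill)

# mids, gammas = empirical_variogram(xy, y)   # from the earlier sketch
# params, _ = curve_fit(spherical, mids, gammas,
#                       p0=[0.0, gammas.max(), mids.max() / 2])
# nugget, partial_sill, range_ = params
```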
46. The Variogram Parameters
- Variogram models are a function of three parameters, known as the range, the sill, and the nugget.
- The range r is typically the value of h at which the correlation between point values reaches zero (i.e., there is no longer any spatial autocorrelation).
- The value of γ at r is called the sill, and is generally denoted by s.
- The variance of the sample is used as an estimate of the sill.
- Different models have slightly different definitions of these parameters.
- The nugget deserves a slide of its own.
Graph taken from http://www.geog.ubc.ca/courses/geog570/talks_2001/Variogr1neu.gif
47. Spatial Independence at Small Distances
- Even though we assume that values at points very near each other are correlated, points separated by very, very small distances might be considerably less correlated than expected.
- E.g., you might find a gold nugget and no more gold in the vicinity.
- In other words, even though γ(0) is always 0, γ at very, very small distances may equal a value a that is considerably greater than 0.
- This value a is called the nugget.
- The ratio of the nugget to the sill is known as the nugget effect, and may be interpreted as the percentage of variation in the data that is not spatial.
- The difference between the sill and the nugget is known as the partial sill.
- The partial sill, and not the sill itself, is what Geostatistical Analyst reports.
48. Pure Nugget Effect Variograms
- A pure nugget effect occurs when the covariance between point values is zero at all distances h > 0.
- That is, there is absolutely no spatial autocorrelation in the data (even at small distances).
- A pure nugget effect covariogram and variogram are presented below.
- In this case, interpolation won't give reasonable predictions.
- Most cases are not this extreme and have both a spatially dependent and a spatially independent component, regardless of the variogram model chosen (discussed on the following slides).
49. The Spherical Model
- The spherical model is the most widely used variogram model.
- It is monotonically non-decreasing:
- I.e., as h increases, the value of γ(h) does not decrease; it goes up (until h = r) or stays the same (for h > r).
- γ(h) = s and C(h) = 0 for h ≥ r.
- That is, the covariance is assumed to be exactly zero at distances h ≥ r.
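For reference, one common parameterization of the spherical model, with nugget a, sill s, and range r (texts differ slightly in conventions, so treat this as illustrative):

```latex
\gamma(h) =
\begin{cases}
0, & h = 0, \\[4pt]
a + (s - a)\left( \dfrac{3h}{2r} - \dfrac{h^3}{2r^3} \right), & 0 < h \le r, \\[4pt]
s, & h > r.
\end{cases}
```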
50. The Exponential Model
- The exponential variogram looks very similar to the spherical model, but assumes that the correlation never reaches exactly zero, regardless of how great the distance between points is.
- In other words, the variogram approaches the value of the sill asymptotically.
- Because the sill is never actually reached, the range is generally taken to be the smallest distance beyond which the covariance is 5% or less of the maximum covariance.
- The model is monotonically increasing:
- I.e., as h goes up, so does γ(h).
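A correspondingly common form of the exponential model, in which the factor of 3 encodes the 95% convention for the effective range r described above (again one convention among several):

```latex
\gamma(h) = a + (s - a)\left( 1 - e^{-3h/r} \right), \qquad h > 0.
```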
51. The Wave (AKA Hole-Effect) Model
In the picture on the left, the waves exhibit a periodic pattern, so a non-standard form of spatial autocorrelation applies: peaks are similar in value to other peaks, and troughs are similar in value to other troughs. However, note the dampening in the covariogram and variogram below. That is, peaks that are closer together have values that are more correlated than peaks that are farther apart (and the same holds for troughs).
More is said about the applicability of these models in http://www.gaa.org.au/pdf/gaa_pyrcz_deutsch.pdf. Variogram graph edited slightly from http://www.seas.upenn.edu/~ese502/NOTEBOOK/Part_II/4_Variograms.pdf
52. Variograms and Kriging Weights
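The graphic on this slide does not survive in the transcript. What it illustrates, in standard textbook form (not copied from the slide), is how variogram values become kriging weights: ordinary kriging predicts with a weighted average whose weights solve a linear system built from the variogram,

```latex
\hat{Y}(p_0) = \sum_{i=1}^{n} \lambda_i \, Y(p_i), \qquad \sum_{i=1}^{n} \lambda_i = 1,
```

where the weights satisfy

```latex
\sum_{j=1}^{n} \lambda_j \, \gamma(h_{ij}) + m = \gamma(h_{i0}), \qquad i = 1, \dots, n,
```

with h_ij the distance between points p_i and p_j, h_i0 the distance from p_i to the prediction location, and m a Lagrange multiplier enforcing the unit-sum constraint.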
53. Reviewing Ordinary Kriging
- Again, ordinary kriging will:
- Give us an estimate of the constant but unknown global structure µ(p), and
- Use variography to examine the dependencies among the residuals e(p) and to create the kriging weights.
- We calculate the distances between the unknown point at which we want to make a prediction and the nearby measured points, and use the value of the covariogram at those distances to calculate the weight of each of these surrounding measured points.
- The end result is, of course, a continuous prediction surface.
- Prediction standard errors can also be obtained; this is a surface indicating the accuracy of the prediction.
54. Universal Kriging
- Now take another example: imagine we have data on the temperature at 100 different weather stations (call them w1, ..., w100) throughout Florida, and we want to predict the value of temperature (T) at every point w in the entire state using these data.
- Notation: the temperature at point w is denoted by T(w).
- We know that temperatures at lower latitudes are expected to be higher, so T(w) will be expected to vary with latitude.
- Ordinary kriging is not appropriate here, because it assumes that the global structure is constant everywhere. That is clearly not the case here.
- A method called universal kriging allows for a non-constant global structure.
- We might model the global structure µ as in regression (a hypothetical example follows below).
- Everything else in universal kriging is pretty much the same as in ordinary kriging (e.g., variography).
55. Some More Advanced Techniques
- Indicator kriging is a geostatistical interpolation method that does not require the data to be normally distributed.
- Co-kriging is an interpolation technique used when there is a second variable that is strongly correlated with the variable from which we're trying to create a surface, and which is sampled at the same set of locations as our variable of interest as well as at a number of additional locations.
- For more details on indicator kriging and co-kriging, see one of the texts suggested at the end of this presentation.
56. Isotropy vs. Anisotropy
- When we use isotropic (or omnidirectional) covariograms, we assume that the covariance between point values depends only on distance.
- Recall the covariance stationarity assumption.
- Anisotropic (or directional) covariograms are used when we have reason to believe that direction plays a role as well (i.e., covariance is a function of both distance and direction).
- In some problems, accounting for direction is appropriate (e.g., when wind or water currents might be a factor).
For more on anisotropic variograms, see http://web.as.uky.edu/statistics/users/yzhen8/STA695/lec05.pdf
57. IDW vs. Kriging
- Kriging gives the data a more natural look.
- You see the bull's-eye effect in IDW but not (as much) in kriging.
- Kriging also helps to compensate for the effects of data clustering, assigning individual points within a cluster less weight than isolated data points (or, treating clusters more like single points).
- Kriging also gives us a standard error.
- If the data locations are quite dense and uniformly distributed throughout the area of interest, we will get decent estimates regardless of which interpolation method we choose.
- On the other hand, if the data locations fall in a few clusters with gaps in between, we will obtain pretty unreliable estimates regardless of whether we use IDW or kriging.
These are interpolation results using the gold data in western PA (IDW vs. ordinary kriging).
58. Some Widely Used Texts on Geostatistics
- Bailey, T.C. and Gatrell, A.C. (1995). Interactive Spatial Data Analysis. Addison Wesley Longman, Harlow, Essex.
- Cressie, N.A.C. (1993). Statistics for Spatial Data (Revised Edition). John Wiley & Sons, Inc.
- Isaaks, E.H. and Srivastava, R.M. (1989). An Introduction to Applied Geostatistics. Oxford University Press, New York, 561 p.