Spatial Data Analysis: Surfaces - PowerPoint PPT Presentation

Slides: 67
Provided by: gcam9


1
Spatial Data Analysis: Surfaces
2
Model-Driven Approaches
  • Model of discrete spatial variation
  • Each subregion is described by a statistical
    distribution $Z_i$
  • e.g., homicide counts are Poisson-distributed
  • The main objective of the analysis is to estimate
    the joint distribution of the random variables
    $Z = \{Z_1, \ldots, Z_n\}$
  • Model of continuous spatial variation
  • The whole area is treated as a continuous surface
  • The main objective is to estimate the
    distribution of $Z(x)$, $x \in A$

3
Models of Discrete Spatial Variation
Random variable in area i
  • number of ill people
  • number of newborn babies
  • per capita income

4
Models of Continuous Spatial Variation
Temperature, water pH, soil acidity, ...
[Figure: sampling-station locations and the location at which a value is to be predicted]
5
From Areas to Surfaces
[Diagram: polygon data -> sample generation -> samples (X, Y, Z) -> geostatistics -> continuous surface / grid]
6
From Areas to Surfaces
Space as a planar subdivision
7
From Areas to Surfaces
Space as a planar subdivision
Space as a continuous surface
8
From Areas to Surfaces
9
Geostatistics
  • Applicable to spatial distributions (fields)
  • Typical situation
  • interpolation from field samples

Water Availability Index
Estimated Surface
Estimated Uncertainty
10
What is Geostatistics?
  • Analysis and inference of continuously-distributed
    variables
  • Pollution, zinc concentration, infant mortality
    rate
  • Analysis
  • Describing the spatial variability of the
    phenomenon under study
  • Inference
  • Estimating the unknown values

[Figure: study area -> field samples -> inferences]
11
Why Geostatistics?
  • Techniques appropriate to statistical estimation
    of spatial phenomena

[Diagram labels: study area, field samples, deterministic procedures, geostatistics]
12
Thinking spatially
$Z_1 \sim N(\mu_1, \sigma_1)$
$\mu_1 = \mu_2$, $\sigma_1 = \sigma_2$, $\mathrm{corr}(Z_1, Z_2) = f(h)$
$Z_2 \sim N(\mu_2, \sigma_2)$
How are they distributed? How are they related to
each other? How can I infer a distribution from
one sample?
13
Steps of the Geostatistical Process
DATA
Exploratory Analysis
Structural Analysis
Inference and Interpolation
RESULTS
14
Concept of a Regionalized Variable

[Figure labels: Zone A, Zone B, polluted area, pollution scale]
  • Regionalized variable = structure + randomness
  • Structure
  • Global distribution of natural phenomena
  • The average value of a phenomenon in a given area is
    constant
  • Random
  • Local variation within a given area
  • Values fluctuate around a mean

15
Regionalized Variable
  • $Z(x) = m(x) + \varepsilon'(x) + \varepsilon''$
  • $m(x)$: structural component (constant mean value)
  • $\varepsilon'(x)$: random component, varying spatially
    around $m(x)$
  • $\varepsilon''$: uncorrelated random noise

[Figure labels: Zone A, Zone B, $m(x)$, $\varepsilon'(x)$, $\varepsilon''$]
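The decomposition above can be illustrated with a small simulation along a one-dimensional transect. This is only a sketch under stated assumptions: the function name `simulate_regionalized`, the parameter values and the use of a moving average to create the spatially structured component are illustrative choices, not something taken from the slides.

```python
import random

def simulate_regionalized(n=200, mean=10.0, window=15,
                          sd_struct=1.0, sd_noise=0.3, seed=42):
    """Simulate Z(x) = m + e'(x) + e'' at n equally spaced points on a line."""
    rng = random.Random(seed)
    # White noise that is smoothed below to obtain a spatially structured term.
    white = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    # e'(x): moving average over `window` neighbours, rescaled so that its
    # standard deviation is sd_struct (assumed construction).
    scale = sd_struct / window ** 0.5
    structured = [scale * sum(white[i:i + window]) for i in range(n)]
    # e'': independent noise at every location, with no spatial structure.
    noise = [rng.gauss(0.0, sd_noise) for _ in range(n)]
    # m is constant here, matching "constant mean value" in the slide.
    return [mean + s + e for s, e in zip(structured, noise)]

z = simulate_regionalized()
print(len(z), round(sum(z) / len(z), 2))
```

Neighbouring values of the simulated series resemble each other because they share most of the averaged noise terms, while the `sd_noise` term adds scatter that no spatial model can explain.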
16
Geostatistics
  • Each position in the field is a random variable
  • $E$: extent of the field
  • $\forall u \in E$, $Z(u)$ is a random variable
  • Each measurement is a realization of a random
    variable
  • Let $z(u_1), \ldots, z(u_n)$ be the set of measurements
  • Then $z(u_\alpha)$ is a realization of $Z(u_\alpha)$, $\alpha = 1, \ldots, n$
  • Problem
  • How can we estimate the joint distribution?

17
Uncertainty: the Statistical Approach
  • Basic hypothesis
  • Differences in values are similar for similar
    distances
  • We call this a stationary spatial process
  • We can find the structure of a stationary
    spatial process using a very simple technique
  • The variogram

$\mathrm{Var}[Z(u+h) - Z(u)] = 2\gamma(h)$
18
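EXPERIMENTAL SEMIVARIOGRAM
$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} [z(x_i) - z(x_i + h)]^2$
where $N(h)$ is the number of pairs of samples
separated by the vector $h$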
19
Building the Experimental Semivariogram
  • Step 1 (optional): Transforming area maps into
    samples

20
Building the Experimental Semivariogram
  • Step 2: Measuring spatial variation
  • For each pair $Z(x)$ and $Z(x+h)$, separated by a
    distance $h$, we measure the square of the
    difference between them

[Figure: distance vector $h$ between a pair of samples]
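Step 2 can be sketched in a few lines of Python. The sketch assumes an omnidirectional (isotropic) variogram and groups pairs into distance bins of equal width; the function name `experimental_semivariogram` and its parameters `lag_width` and `n_lags` are illustrative and do not come from the slides.

```python
import math
from itertools import combinations

def experimental_semivariogram(points, lag_width, n_lags):
    """points: list of (x, y, z). Returns (mean distance, gamma, pair count) per lag bin."""
    sq_sums = [0.0] * n_lags   # sum of squared differences per bin
    d_sums = [0.0] * n_lags    # sum of pair distances per bin
    counts = [0] * n_lags      # N(h): number of pairs per bin
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        d = math.hypot(x2 - x1, y2 - y1)
        k = int(d // lag_width)
        if k < n_lags:
            sq_sums[k] += (z1 - z2) ** 2
            d_sums[k] += d
            counts[k] += 1
    # gamma(h) = sum of squared differences / (2 N(h)); empty bins are skipped.
    return [(d_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Four samples on a line with z equal to x.
pts = [(0, 0, 0.0), (1, 0, 1.0), (2, 0, 2.0), (3, 0, 3.0)]
result = experimental_semivariogram(pts, lag_width=1, n_lags=4)
print(result)  # [(1.0, 0.5, 3), (2.0, 2.0, 2), (3.0, 4.5, 1)]
```

With real data the bin width is a judgment call: narrow bins contain few pairs and give noisy estimates, wide bins blur the short-distance behaviour.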
21
HDI VARIOGRAMS
22
Spatial Model Fitting for Variograms
  • After building an experimental variogram, we need
    to fit a theoretical function in order to model
    the spatial variation
  • The fitting procedure is interactive: the user
    selects the theoretical model that best fits the
    data.
  • Some useful models
  • Gaussian, exponential and spherical models

23
Fitting the Semivariogram
24
Plotting the variogram
25
Analysing the variogram
  • Later we will look at fitting a model to the
    variogram, but even without a model we can notice
    some features, which we define here only
    qualitatively
  • Sill: maximum semivariance; represents
    variability in the absence of spatial dependence
  • Range: separation between point-pairs at which
    the sill is reached; the distance beyond which
    there is no evidence of spatial dependence
  • Nugget: semivariance as the separation
    approaches zero; represents variability at a
    point that can't be explained by spatial
    structure.
  • In the previous slide, we can estimate the sill ≈
    1.9, the range ≈ 1200 m, and the nugget ≈ 0.5,
    i.e. about 25% of the sill.

26
Using the experimental variogram to model the
random process
  • Notice that the semivariance $\gamma(h)$ at
    separation vector $h$ now serves as our estimate of
    the covariance structure of the spatial field.
  • So it models the spatially-correlated component
    of the regionalized variable
  • We must go from the experimental variogram to a
    variogram model in order to be able to model the
    random process at any separation.

27
Modelling the variogram
  • From the empirical variogram we now derive a
    variogram model, which expresses semivariance as a
    function of the separation vector. It allows us to:
  • Infer the characteristics of the underlying
    process from the functional form and its
    parameters
  • Compute the semi-variance between any point-pair,
    separated by any vector
  • Interpolate between sample points using an
    optimal interpolator (kriging)
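As a rough illustration of going from experimental points to a model, the sketch below fits a spherical model by trying a list of candidate ranges and, for each, solving the ordinary least-squares line for nugget and partial sill. The names `spherical_shape` and `fit_spherical` are hypothetical, every lag gets equal weight, and nothing forces the fitted nugget or sill to be non-negative, so this is a simplified stand-in for the interactive fitting described in the slides. The synthetic input uses the values read off the variogram earlier (nugget 0.5, sill 1.9, range 1200 m).

```python
def spherical_shape(h, a):
    """Normalised spherical structure: 0 at h = 0, 1 for h >= a."""
    if h >= a:
        return 1.0
    r = h / a
    return 1.5 * r - 0.5 * r ** 3

def fit_spherical(lags, gammas, candidate_ranges):
    """Return (sse, nugget, partial_sill, range) with the smallest squared error."""
    n = len(lags)
    g_mean = sum(gammas) / n
    best = None
    for a in candidate_ranges:
        s = [spherical_shape(h, a) for h in lags]
        s_mean = sum(s) / n
        var_s = sum((si - s_mean) ** 2 for si in s)
        if var_s == 0.0:
            continue  # every lag lies beyond this range: slope not identifiable
        # For a fixed range, gamma = c0 + c1 * shape is a straight-line fit.
        c1 = sum((si - s_mean) * (gi - g_mean) for si, gi in zip(s, gammas)) / var_s
        c0 = g_mean - c1 * s_mean
        sse = sum((c0 + c1 * si - gi) ** 2 for si, gi in zip(s, gammas))
        if best is None or sse < best[0]:
            best = (sse, c0, c1, a)
    return best

# Synthetic "experimental" values: nugget 0.5, sill 1.9 (0.5 + 1.4), range 1200.
lags = [100.0 * k for k in range(1, 21)]
gammas = [0.5 + 1.4 * spherical_shape(h, 1200.0) for h in lags]
sse, nugget, partial_sill, rng_a = fit_spherical(
    lags, gammas, [float(r) for r in range(400, 2001, 100)])
print(round(nugget, 3), round(nugget + partial_sill, 3), rng_a)
```

In practice the fit is often weighted by the number of pairs in each lag, and the result should still be checked visually against the experimental points.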

28
Authorized Models
  • Any variogram function must be able to model the
    following
  • 1. Monotonically increasing
  • possibly with a fluctuation (hole)
  • 2. Constant or asymptotic maximum (sill)
  • 3. Non-negative intercept (nugget)
  • 4. Anisotropy
  • Variograms must obey mathematical constraints so
    that the resulting kriging equations are solvable
    (e.g., positive definite between-sample
    covariance matrices).
  • The permitted functions are called authorized
    models.

29
Spherical Model
[Figure: spherical model; $\gamma$ plotted against $h$, with $C_0$, $C_1$, the sill $C = C_0 + C_1$ and $a$ marked]
30
Exponential Model
[Figure: exponential model; $\gamma$ plotted against $h$, with $C_0$, $C_1$ and $a$ marked]
31
Gaussian Model
[Figure: Gaussian model; $\gamma$ plotted against $h$, with $C_0$, $C_1$ and $a$ marked]
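The formulas behind the three model figures did not survive in the transcript. The sketch below gives one common parameterisation, with nugget `c0`, partial sill `c1` and range `a`. The factor 3 in the exponential and Gaussian models is an assumption (the "practical range" convention, under which about 95% of `c1` is reached at `h = a`); other texts omit it.

```python
import math

def spherical(h, c0, c1, a):
    """Reaches the sill c0 + c1 exactly at h = a."""
    if h == 0:
        return 0.0          # gamma(0) = 0 by definition
    if h >= a:
        return c0 + c1
    r = h / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)

def exponential(h, c0, c1, a):
    """Approaches the sill asymptotically (practical-range convention assumed)."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-3.0 * h / a))

def gaussian(h, c0, c1, a):
    """Parabolic near the origin, asymptotic sill (practical-range convention assumed)."""
    if h == 0:
        return 0.0
    return c0 + c1 * (1.0 - math.exp(-3.0 * (h / a) ** 2))

for model in (spherical, exponential, gaussian):
    print(model.__name__, [round(model(h, 0.5, 1.4, 1200.0), 3)
                           for h in (0, 300, 600, 1200, 2400)])
```

The parameter values in the loop (0.5, 1.4, 1200) are only illustrative.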
32
What sample size to fit a variogram model?
  • Can't use non-spatial formulas for sample size,
    because spatial samples are correlated, and each
    sample is used multiple times in the variogram
    estimate
  • No way to estimate the true error, since we have
    only one realisation
  • Stochastic simulation from an assumed true
    variogram suggests:
  • < 50 points: not at all reliable
  • 100 to 150 points: more or less acceptable
  • > 250 points: almost certainly reliable
  • More points are needed to estimate an anisotropic
    variogram.
  • This is very worrying for many environmental
    datasets (soil cores, vegetation plots, ...),
    especially from short-term fieldwork, where
    sample sizes of 40 to 60 are typical. Should
    variograms even be attempted on such small
    samples?

33
Cross Validation
  • Re-estimate the samples to find errors in the
    model

Variogram Model
  • Error statistics
  • Error histogram
  • Spatial error diagram
  • Observed vs. estimated values

[Flowchart: sample points 1 to 5 are re-estimated in turn; decision "OK?" with branches "Yes" and "No"]
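The re-estimation loop can be sketched as leave-one-out cross validation: each sample is removed in turn and predicted from the others. In the sketch below the predictor is passed in as a function. For brevity the demonstration uses an inverse-distance predictor as a stand-in; it is NOT kriging, and in the workflow of the slides the kriging predictor built from the variogram model would be plugged in instead. All function names are illustrative.

```python
import math

def leave_one_out(samples, predictor):
    """Re-estimate each (x, y, z) sample from all the others.

    Returns a list of (observed, estimated) pairs.
    """
    pairs = []
    for i, (x, y, z) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        pairs.append((z, predictor(others, x, y)))
    return pairs

def error_statistics(pairs):
    """Mean error (bias) and root-mean-square error of the re-estimates."""
    errors = [est - obs for obs, est in pairs]
    n = len(errors)
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, rmse

def inverse_distance(samples, x0, y0, power=2.0):
    """Stand-in predictor (not kriging); assumes no sample lies exactly at (x0, y0)."""
    weights = [math.hypot(x - x0, y - y0) ** -power for x, y, _ in samples]
    return sum(w * s[2] for w, s in zip(weights, samples)) / sum(weights)

samples = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 3.0)]
pairs = leave_one_out(samples, inverse_distance)
mean_error, rmse = error_statistics(pairs)
print(pairs, mean_error, rmse)
```

The list of (observed, estimated) pairs is what feeds the error histogram and the observed vs. estimated plot; keeping the sample coordinates alongside the errors gives the spatial error diagram.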
34
Cross Validation
35
Approaches to spatial prediction
  • This is the prediction of the value of some
    variable at an unsampled point, based on the
    values at the sampled points.
  • This is often called interpolation, but strictly
    speaking that applies only to points that are
    geographically inside the sample set (otherwise
    it is extrapolation).

36
Approaches to prediction: Local predictors
  • Value of the variable is predicted from nearby
    samples
  • Example: concentrations of soil constituents
    (e.g. salts, pollutants)
  • Example: vegetation density

37
Local Predictors
  • Each interpolator has its own assumptions, i.e.
    theory of spatial variability
  • Nearest neighbour
  • Average within a radius
  • Average of n nearest neighbours
  • Distance-weighted average within a radius
  • Distance-weighted average of n nearest neighbours
  • Optimal weighting -> Kriging
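The non-kriging predictors in this list differ only in which neighbours they use and how they weight them. A minimal sketch, with hypothetical function names and an assumed `(x, y, z)` tuple layout for the samples:

```python
import math

def nearest_neighbour(samples, x0, y0):
    """Value of the single closest sample."""
    return min(samples, key=lambda s: math.hypot(s[0] - x0, s[1] - y0))[2]

def idw(samples, x0, y0, power=2.0, n_nearest=None, radius=None):
    """Distance-weighted average; power=0 gives the plain average.

    The neighbourhood can be limited by radius, by number of neighbours, or both.
    """
    near = sorted((math.hypot(x - x0, y - y0), z) for x, y, z in samples)
    if radius is not None:
        near = [(d, z) for d, z in near if d <= radius]
    if n_nearest is not None:
        near = near[:n_nearest]
    if not near:
        raise ValueError("no samples in the search neighbourhood")
    if near[0][0] == 0.0:
        return near[0][1]   # prediction point coincides with a sample
    weights = [d ** -power for d, _ in near]
    return sum(w * z for w, (_, z) in zip(weights, near)) / sum(weights)

samples = [(0, 0, 10.0), (2, 0, 20.0), (0, 2, 30.0)]
print(nearest_neighbour(samples, 0.5, 0))          # 10.0
print(idw(samples, 1, 0, power=0, radius=1.5))     # 15.0 (plain average in a radius)
print(idw(samples, 1, 0, power=1, n_nearest=2))    # 15.0 (weighted, 2 nearest)
```

Every choice here (`power`, `radius`, `n_nearest`) is arbitrary, which is the motivation for letting a variogram model determine the weights instead.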

38
Ordinary Kriging
  • The theory of regionalised variables leads to an
    optimal interpolation method, in the sense that
    the prediction variance is minimized.
  • This is based on the theory of random functions,
    and requires certain assumptions.

39
Optimal local interpolation: motivation
  • Problems with average-in-circle methods
  • 1. No objective way to select radius of circle or
    number of points
  • Problems with inverse-distance methods
  • 1. How to choose power (inverse, inverse squared
    . . . )?
  • 2. How to choose limiting radius?
  • In both cases
  • 1. Uneven distribution of samples could over or
    underemphasize some parts of the field
  • 2. prediction error must be estimated from a
    separate validation dataset

40
An optimal local predictor would have these
features
  • Prediction is made as a linear combination of
    known data values (a weighted average).
  • Prediction is unbiased and exact at known points
  • Points closer to the point to be predicted have
    larger weights
  • Clusters of points reduce to single equivalent
    points, i.e., over-sampling in a small area can't
    bias the result
  • Closer sample points mask farther ones in the
    same direction
  • Error estimate is based only on the sample
    configuration, not the data values
  • Prediction error should be as small as possible.

41
Kriging
  • A Best Linear Unbiased Predictor (BLUP) that
    satisfies certain criteria for optimality.
  • It is only optimal with respect to the chosen
    model!
  • Based on the theory of random processes, with
    covariances depending only on separation (i.e. a
    variogram model)
  • Theory developed several times (Kolmogorov
    1930s, Wiener 1949), but current practice dates
    back to Matheron (1963), formalizing the
    practical work of the mining engineer D. G. Krige
    (South Africa).

42
How do we use Kriging?
  • 1. Sample, preferably at different resolutions
  • 2. Calculate the experimental variogram
  • 3. Model the variogram with one or more
    authorized functions
  • 4. Apply the kriging system, with the variogram
    model of spatial dependence, at each point to be
    predicted
  • Predictions are often at each point on a regular
    grid (e.g. a raster map)
  • These points are actually blocks the size of
    the sampling support
  • Can also predict in blocks larger than the
    original support
  • 5. Calculate the error of each prediction; this
    is based only on the sample point locations, not
    their data values.

43
Prediction with Ordinary Kriging (OK)
  • In OK, we model the value of variable $z$ at
    location $s_i$ as the sum of a regional mean $m$ and a
    spatially-correlated random component $e(s_i)$
  • $Z(s_i) = m + e(s_i)$
  • The regional mean m is estimated from the sample,
    but not as the simple average, because there is
    spatial dependence. It is implicit in the OK
    system.

44
Prediction with Ordinary Kriging (OK)
  • Predict at points, with unknown mean (which must
    also be estimated) and no trend
  • Each point x is predicted as the weighted average
    of the values at all samples
  • The weights assigned to the sample points sum to
    1
  • Therefore, the prediction is unbiased
  • "Ordinary" means: no trend or strata; the regional
    mean must be estimated from the sample

45
Simple and Ordinary Kriging
  • Linear combination of nearest neighbours

Kriging
Inverse Distance Weights
46
Ordinary Kriging
47
Ordinary Kriging

  • Substituting the values we find the weights
  • Kriging estimator
  • Variance
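The equations for these three items are not preserved in the transcript. For reference, the ordinary kriging system in its usual covariance form is sketched below. The notation is an assumption: $C_{ij}$ is the covariance between samples $x_i$ and $x_j$, $C_{i0}$ the covariance between sample $x_i$ and the prediction point $x_0$, $C_{00}$ the covariance at zero separation, and $\mu$ a Lagrange multiplier; the sign convention for $\mu$ differs between textbooks.

```latex
% Kriging estimator, with weights constrained to sum to one
\hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i), \qquad \sum_{i=1}^{n} \lambda_i = 1

% Kriging system: n equations for the weights, plus the constraint above
\sum_{j=1}^{n} \lambda_j C_{ij} + \mu = C_{i0}, \qquad i = 1, \ldots, n

% Kriging (prediction) variance
\sigma^2_{OK} = C_{00} - \sum_{i=1}^{n} \lambda_i C_{i0} - \mu
```

None of the three expressions for the weights or the variance contains a data value $z(x_i)$: only the covariances, and hence the sample configuration, enter.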

48
Kriging example
  • Matrix elements: $C_{ij} = C_0 + C_1 - \gamma(h)$

Theoretical model
$C_{12} = C_{21} = C_{04} = (C_0 + C_1) - \gamma(50\sqrt{2}) = (2 + 20) - \gamma(50\sqrt{2}) = 9.84$
49
Kriging example
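$C_{13} = C_{31} = (C_0 + C_1) - \gamma(\sqrt{150^2 + 50^2}) = 1.23$
$C_{14} = C_{41} = C_{02} = (C_0 + C_1) - \gamma(\sqrt{100^2 + 50^2}) = 4.98$
$C_{23} = C_{32} = (C_0 + C_1) - \gamma(\sqrt{100^2 + 100^2}) = 2.33$
$C_{24} = C_{42} = (C_0 + C_1) - \gamma(\sqrt{100^2 + 150^2}) = 0.29$
$C_{34} = C_{43} = (C_0 + C_1) - \gamma(\sqrt{200^2 + 50^2}) = 0$
$C_{01} = (C_0 + C_1) - \gamma(50) = 12.66$
$C_{03} = (C_0 + C_1) - \gamma(150) = 1.72$
$C_{11} = C_{22} = C_{33} = C_{44} = (C_0 + C_1) - \gamma(0) = 22$

[Figure: sample configuration showing the prediction point $x_0$ and the samples $x_1$, $x_2$, $x_3$, $x_4$, with 50-unit spacing marked]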
50
Kriging example
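Substituting the values $C_{ij}$, we find the
following weights
$\lambda_1 = 0.518$, $\lambda_2 = 0.022$, $\lambda_3 = 0.089$, $\lambda_4 = 0.371$
The estimator is
$\hat{z}(x_0) = 0.518\, z(x_1) + 0.022\, z(x_2) + 0.089\, z(x_3) + 0.371\, z(x_4)$

[Figure: sample configuration of $x_0$ and $x_1$ to $x_4$, with 50-unit spacing marked]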
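The worked example can be re-computed with a short script. Two things in it are assumptions inferred from the numbers rather than stated in the transcript: the variogram is taken to be spherical with nugget 2, partial sill 20 and range 200 (this reproduces the listed covariances, e.g. $C_{01}$ = 12.66 and $C_{11}$ = 22), and the coordinates are one layout that matches all the separation distances in the example. The helper names are illustrative.

```python
import math

def spherical_cov(h, c0=2.0, c1=20.0, a=200.0):
    """C(h) = C0 + C1 - gamma(h) for a spherical variogram (assumed parameters)."""
    if h == 0:
        return c0 + c1
    if h >= a:
        return 0.0
    r = h / a
    return c1 * (1.0 - 1.5 * r + 0.5 * r ** 3)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ok_weights(samples, target, cov):
    """Ordinary kriging weights and prediction variance for one target point."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    n = len(samples)
    # Covariance matrix bordered by ones: the extra row enforces sum(weights) = 1.
    k = [[cov(dist(p, q)) for q in samples] + [1.0] for p in samples]
    k.append([1.0] * n + [0.0])
    rhs = [cov(dist(p, target)) for p in samples] + [1.0]
    sol = solve(k, rhs)
    lam, mu = sol[:n], sol[n]
    variance = cov(0.0) - sum(l * c for l, c in zip(lam, rhs[:n])) - mu
    return lam, variance

# x1..x4 and x0: an assumed layout consistent with the distances in the example.
samples = [(0, 50), (50, 100), (150, 0), (-50, -50)]
weights, variance = ok_weights(samples, (0, 0), spherical_cov)
print([round(w, 3) for w in weights], round(variance, 2))
```

Under these assumptions the computed weights agree with the values given in the example to within about 0.001. The function never receives the data values $z(x_i)$, so both the weights and the variance depend only on the sample configuration and the variogram model.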
51
Sampling configurations
  • There is no agreement on a universally optimal
    sampling configuration for geostatistical
    research (i.e., variogram modelling followed by
    spatial prediction), but for spatial prediction,
    regular (lattice or triangular) sampling is
    optimal (in the isotropic case; otherwise use
    stretched lattices)
  • For variogram modelling, all distances should be
    present, including sufficient information about
    short distances (which are not present when
    sampling regularly)
  • Cross validation on a regular sampling grid will
    not reveal deficiencies in the modelled
    short-distance behaviour of the variogram, yet
    interpolated maps will be dominated by this
    short-distance behaviour.
  • Compromise: put most effort into a regular spread
    and sufficient effort into short-distance replicates.
  • Related questions: adding sampling points to an
    existing design, or reducing (optimizing) an
    existing monitoring network.

52
Questions about kriging
  • what do sill, nugget, range, and anisotropy tell
    about spatial variability of an observed
    variable?
  • what happens if we predict a value at an
    observation location?
  • what does the prediction variance measure?
  • why is the interpolator discontinuous at
    observation locations when the nugget is
    positive?
  • why is the prediction variance pattern
    independent of the data values and dependent only
    on the data configuration?
  • what are the causes for positive nugget effect?

53
Spatial Indices
HDI: Human Development Index (UN)
$\mathrm{HDI} = (\text{longevity} + \text{education} + \text{income}) / 3$, with $0 < \mathrm{HDI} < 1$
54
HDI: From Areas to Surfaces
55
HDI Variograms
56
Human Development Index in São Paulo
[Map legend: from HDI = 0 to HDI = 1]
57
Trend Surfaces for Homicide Rates in São Paulo
Estimate of homicide rates using ordinary kriging
58
Trend Surfaces for Homicide Rates: Binomial Kriging
[Panels: 1996, 1999]
59
Binomial vs. Ordinary Kriging - 1996
Ordinary Kriging
Binomial Kriging
60
Binomial vs. Ordinary Kriging - 1999
Ordinary Kriging
Binomial Kriging
61
Practical Example
  • Analysis of Apgar scores of newborns by borough,
    Rio de Janeiro, 1994.
  • Apgar index
  • Vitality of the newborn baby in the first and fifth
    minutes after birth
  • Respiration, heartbeat, response to stimuli
  • 152 georeferenced samples.
  • Thematic classification
  • High: 77.4 to 83.3
  • Medium high: 74.4 to 77.4
  • Average: 69.5 to 74.4
  • Medium low: 63.4 to 69.5
  • Low: 44.1 to 63.4

62
Practical Example
Boroughs of the Municipality of Rio de Janeiro
Excluded boroughs
63
Exploratory Data Analysis
64
Spatial Correlation Analysis
65
Kriging results
Kriging variance
Spatial Variability of the APGAR index
66
Comparison
Areal data grouped by quintiles
[Legend: 77.4 to 83.3; 74.4 to 77.4; 69.5 to 74.4; 63.4 to 69.5; 44.1 to 63.4; Excluded]
Kriging results