Title: Spatial Statistics
1Spatial Statistics
- YU-FEN LI
- 6/6/2006 6/13/2006
2Point Pattern Descriptors
- Central tendency
- Mean Center (Spatial Mean)
- Weighted Mean Center
- Median Center (Spatial Median): not widely used because of its ambiguity
- Consider n points (x_i, y_i), i = 1, ..., n
3Central tendency Mean Center (Spatial Mean)
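- The two means of the coordinates define the location of the mean center as
  $(\bar{x}_{mc}, \bar{y}_{mc}) = \left( \frac{\sum_{i=1}^{n} x_i}{n},\ \frac{\sum_{i=1}^{n} y_i}{n} \right)$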
4Central tendency Weighted Mean Center
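- The two weighted means of the coordinates define the location of the weighted mean center as
  $(\bar{x}_{wmc}, \bar{y}_{wmc}) = \left( \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},\ \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right)$
- where $w_i$ is the weight at point i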
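As a minimal sketch of the two central-tendency measures, the following computes a mean center and a weighted mean center in plain Python. The function names, coordinates, and weights are made up for illustration and are not from the slides.

```python
def mean_center(points):
    """Return the mean center (x_bar, y_bar) of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Return the weighted mean center; weights[i] is the weight at point i."""
    total = sum(weights)
    xw = sum(w * x for (x, _), w in zip(points, weights)) / total
    yw = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (xw, yw)


# Hypothetical data: four corners of a rectangle, upper points weighted more.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
weights = [1.0, 1.0, 3.0, 3.0]

print(mean_center(points))                    # (2.0, 1.0)
print(weighted_mean_center(points, weights))  # (2.0, 1.5)
```

The heavier weights on the two upper points pull the weighted center upward while leaving the x coordinate unchanged.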
5Point Pattern Descriptors
- Dispersion and Orientation
- Standard distance
- Weighted standard distance
- Standard deviational ellipse
6Dispersion and Orientation Standard Distance
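- Measures how points deviate from the mean center
- Recall the population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- Standard distance: $SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x}_{mc})^2 + \sum_{i=1}^{n} (y_i - \bar{y}_{mc})^2}{n}}$, where $(\bar{x}_{mc}, \bar{y}_{mc})$ is the mean center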
7Dispersion and Orientation Weighted Standard
Distance
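- Points may have different attribute values that reflect their relative importance
- Weighted standard distance: $SD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_{wmc})^2 + \sum_{i=1}^{n} w_i (y_i - \bar{y}_{wmc})^2}{\sum_{i=1}^{n} w_i}}$, where $(\bar{x}_{wmc}, \bar{y}_{wmc})$ is the weighted mean center and $w_i$ is the weight at point i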
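The standard distance and its weighted variant can be sketched with one function, assuming the usual definition (root of the weighted mean squared distance from the center). The function name and the sample data are illustrative assumptions.

```python
import math


def standard_distance(points, weights=None):
    """Standard distance of (x, y) points around their (weighted) mean center.

    With weights=None every point gets weight 1, which gives the
    unweighted standard distance.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    xc = sum(w * x for (x, _), w in zip(points, weights)) / total
    yc = sum(w * y for (_, y), w in zip(points, weights)) / total
    ss = sum(w * ((x - xc) ** 2 + (y - yc) ** 2)
             for (x, y), w in zip(points, weights))
    return math.sqrt(ss / total)


# Hypothetical data
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
weights = [1.0, 1.0, 3.0, 3.0]

print(standard_distance(points))           # sqrt(5), about 2.236
print(standard_distance(points, weights))  # sqrt(4.75), about 2.179
```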
8Dispersion and Orientation Standard
Deviational Ellipse
- Standard distance is a good single measure of the dispersion of the incidents around the mean center, but it does not capture any directional bias
- The standard deviational ellipse gives dispersion in two dimensions and is defined by 3 parameters
- Angle of rotation
- Dispersion along major axis
- Dispersion along minor axis
9Dispersion and Orientation Standard
Deviational Ellipse
- Basic concept is to
- Find the axis going through maximum dispersion (thus derive angle of rotation)
- Calculate standard deviation of the points along this axis (thus derive the length of major axis)
- Calculate standard deviation of points along the axis perpendicular to major axis (thus derive the length of minor axis)
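The three steps above can be sketched by diagonalizing the 2 x 2 covariance matrix of the coordinates: the leading eigenvector is the axis of maximum dispersion. This is an assumption about one way to carry out the concept, not necessarily the formula used in the slides; some GIS packages measure the angle clockwise from north or rescale the axes.

```python
import math


def standard_deviational_ellipse(points):
    """Return (angle_deg, major_sd, minor_sd) for a list of (x, y) points.

    angle_deg is measured counterclockwise from the x axis (an assumption).
    """
    n = len(points)
    xc = sum(x for x, _ in points) / n
    yc = sum(y for _, y in points) / n
    # Population variances and covariance of the coordinates
    sxx = sum((x - xc) ** 2 for x, _ in points) / n
    syy = sum((y - yc) ** 2 for _, y in points) / n
    sxy = sum((x - xc) * (y - yc) for x, y in points) / n
    # Direction of maximum dispersion (principal axis of the covariance matrix)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Eigenvalues of the covariance matrix = variances along the two axes
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major_sd = math.sqrt(half_trace + spread)
    minor_sd = math.sqrt(max(half_trace - spread, 0.0))
    return math.degrees(theta), major_sd, minor_sd


# Hypothetical points stretched along the 45-degree diagonal
pts = [(-2.0, -2.0), (2.0, 2.0), (-1.0, 1.0), (1.0, -1.0)]
angle, major, minor = standard_deviational_ellipse(pts)
print(angle, major, minor)  # about 45.0, 2.0, 1.0
```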
10Statistical Methods in GIS
- Point pattern analyzers
- Location information only
- Line pattern analyzers
- Location and attribute information
- Polygon pattern analyzers
- Location and attribute information
11POINT PATTERN ANALYZERS
- Two primary approaches
- Quadrat Analysis
- based on observing the frequency distribution or density of points within a set of grids
- Nearest Neighbor Analysis
- based on distances of points
12Quadrat Analysis (QA)
- Point Density approach
- The density measured by QA is compared with that of a random pattern
[Figure: example point patterns labeled RANDOM, CLUSTERED, and UNIFORM/DISPERSED]
13Quadrat Analysis (QA)
[Figure: quadrat placement by exhaustive census and by random sampling]
14Quadrat Analysis (QA)
- Apply a uniform or random grid over the area (A), with the size of each quadrat given by $2A / r$
- where r = number of points
- width of square quadrat is $\sqrt{2A / r}$
- radius of circular quadrat is $\sqrt{2A / (\pi r)}$
15Quadrat Analysis (QA) --Frequency distribution
comparison
- Treat each cell as an observation and count the number of points within it
- Compare observed frequencies in the quadrats with expected frequencies that would be generated by
- a random process (modeled by the Poisson distribution)
- a clustered process (e.g. one cell with r points, n-1 cells with 0 points) (n = number of quadrats)
- a uniform process (e.g. each cell has r/n points)
- The standard Kolmogorov-Smirnov (K-S) test for comparing two frequency distributions can then be applied
16Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test
- The test statistic D is simply given by $D = \max_i |O_i - E_i|$
- where $O_i$ and $E_i$ are the observed and expected cumulative proportions of the ith category in the two distributions
- i.e. the largest difference (irrespective of sign) between observed cumulative frequency and expected cumulative frequency
17Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test
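- The critical value at the 5% level is given by
- $D_{\alpha = 0.05} = \frac{1.36}{\sqrt{n}}$
- where n is the number of quadrats
- $D_{\alpha = 0.05} = 1.36 \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$
- in a two-sample case -- where n1 and n2 are the numbers of quadrats in the two sets of distributions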
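A minimal sketch of the one-sample comparison against a random (Poisson) process follows. It assumes the Poisson mean is estimated by the average number of points per quadrat and uses the large-sample 5% critical value 1.36/sqrt(n); the function name and the quadrat counts are made up.

```python
import math
from collections import Counter


def ks_quadrat_test(counts):
    """Return (D, critical_5pct) comparing quadrat counts with a Poisson model.

    counts[i] is the number of points observed in quadrat i.
    """
    n = len(counts)                 # number of quadrats
    lam = sum(counts) / n           # mean points per quadrat (Poisson mean)
    freq = Counter(counts)          # quadrats having exactly k points
    d_stat = 0.0
    obs_cum = 0.0
    exp_cum = 0.0
    for k in range(max(counts) + 1):
        obs_cum += freq.get(k, 0) / n                             # observed cumulative proportion
        exp_cum += math.exp(-lam) * lam ** k / math.factorial(k)  # Poisson cumulative proportion
        d_stat = max(d_stat, abs(obs_cum - exp_cum))
    return d_stat, 1.36 / math.sqrt(n)


# Hypothetical extreme clustering: all 10 points fall in one of 10 quadrats
counts = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]
d, critical = ks_quadrat_test(counts)
print(d, critical)  # D is about 0.532, critical value about 0.430
```

Here D exceeds the critical value, so the pattern would be judged different from random at the 5% level; with only 10 quadrats the large-sample critical value is itself an approximation.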
18Quadrat Analysis Variance-Mean Ratio (VMR)
- Test if the observed pattern is different from a random pattern (generated from a Poisson distribution, for which mean = variance)
- Treat each cell as an observation and count the number of points within it, to create the variable X
- Calculate the variance and mean of X, and create the variance-to-mean ratio: variance / mean
19Quadrat Analysis Variance-Mean Ratio (VMR)
- For a uniform distribution, the variance is zero.
- we expect a variance-mean ratio close to 0
- For a random distribution, the variance and mean are the same.
- we expect a variance-mean ratio around 1
- For a clustered distribution, the variance is relatively large
- we expect a variance-mean ratio above 1
20Significance Test for VMR
- the mean of the observed distribution: $\bar{x} = \frac{\sum_i n_i x_i}{n}$, where $x_i$ is the number of points in a quadrat, $n_i$ is the number of quadrats with $x_i$ points, and n is the total number of quadrats
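A rough sketch of the variance-mean ratio and one commonly used significance check is given below. It assumes the sample variance (divisor n - 1) and the approximation that, under a random pattern, the VMR has mean 1 and standard error sqrt(2/(n - 1)); the exact test statistic on the slide is not recoverable, so treat this as an assumption. Names and data are illustrative.

```python
import math


def variance_mean_ratio(counts):
    """Return (vmr, z) for a list of per-quadrat point counts.

    z compares the VMR with its expected value of 1 under a random
    (Poisson) pattern, using the approximate standard error sqrt(2/(n-1)).
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    vmr = variance / mean
    z = (vmr - 1.0) / math.sqrt(2.0 / (n - 1))
    return vmr, z


# Hypothetical counts for five quadrats
uniform_counts = [2, 2, 2, 2, 2]      # every quadrat has the same count
clustered_counts = [10, 0, 0, 0, 0]   # all points in one quadrat

print(variance_mean_ratio(uniform_counts))    # VMR = 0.0, negative z
print(variance_mean_ratio(clustered_counts))  # VMR = 10.0, large positive z
```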
21Weakness of Quadrat Analysis
- Results may depend on quadrat size and orientation
- Is a measure of dispersion, and not really pattern, because it is based primarily on the density of points, and not their arrangement in relation to one another
- Results in a single measure for the entire distribution, so variations within the region are not recognized (could have clustering locally in some areas, but not overall)
22Weakness of Quadrat Analysis
- For example, quadrat analysis cannot distinguish
between these two, obviously different, patterns
23Nearest-Neighbor Index (NNI)
- Uses distances between points as its basis.
- Compares the observed average distance between each point and its nearest neighbor with the expected average distance that would occur if the distribution were random
- NNI = r_obs / r_exp
- For random pattern, NNI = 1
- For clustered pattern, NNI < 1
- For dispersed pattern, NNI > 1
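The index can be sketched as follows, assuming the standard expected distance for a random pattern, r_exp = 1 / (2 sqrt(n / A)), and the usual standard error 0.26136 / sqrt(n^2 / A) for a z score. No edge correction is applied; the function name, points, and study-area size are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Return (nni, z) for (x, y) points in a study area of the given size."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    r_obs = sum(nearest) / n
    r_exp = 0.5 / math.sqrt(n / area)          # expected under randomness
    std_err = 0.26136 / math.sqrt(n * n / area)
    return r_obs / r_exp, (r_obs - r_exp) / std_err


# Hypothetical data in a 2 x 2 study area (area = 4)
dispersed = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
clustered = [(1.0, 1.0), (1.01, 1.0), (1.0, 1.01), (1.01, 1.01)]

print(nearest_neighbor_index(dispersed, 4.0))  # NNI = 2.0 (> 1, dispersed)
print(nearest_neighbor_index(clustered, 4.0))  # NNI about 0.02 (< 1, clustered)
```

The result depends directly on the area passed in, which illustrates how sensitive the index is to the choice of study-area boundary.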
24Nearest-Neighbor Index (NNI) Significance test
26Nearest-Neighbor Index (NNI)
- Advantages
- NNI takes distance into account
- No quadrat size problem to be concerned with
- However, NNI is not as good as it might appear --
- Index is highly dependent on the boundary for the area: its size and its shape (perimeter)
- Fundamentally based on only the mean distance
- Doesn't incorporate local variations (could have clustering locally in some areas, but not overall)
- Based on point location only and doesn't incorporate the magnitude of the phenomena at that point
27Nearest-Neighbor Index (NNI)
- An adjustment for edge effects is available, but it does not solve all the problems
28Nearest-Neighbor Index (NNI)
- Some alternatives to the NNI are
- the G and F functions, based on the entire frequency distribution of nearest neighbor distances, and
- the K function, based on all interpoint distances.
29Spatial Autocorrelation
- Most statistical analyses are based on the assumption that the values of observations in each sample are independent of one another
- Positive spatial autocorrelation violates this, because samples taken from nearby areas are related to each other and are not independent
30Spatial Autocorrelation
- In ordinary least squares regression (OLS), for example, the correlation coefficients will be biased and their precision exaggerated
- Bias implies correlation coefficients may be higher than they really are
- They are biased because the areas with higher concentrations of events will have a greater impact on the model estimate
- Exaggerated precision (lower standard error) implies they are more likely to be found statistically significant
- Precision is overestimated because, since events tend to be concentrated, there are actually fewer independent observations than is being assumed.
31Spatial Autocorrelation
- Several measures available
- Join Count Statistic
- Moran's I
- Geary's Ratio C
- General (Getis-Ord) G
- Anselin's Local Index of Spatial Autocorrelation (LISA)
(These are discussed later)
32LINE PATTERN ANALYZERS
- Two general types of linear features
- Vectors (lines with arrows)
- Networks
- Spatial attributes of linear features
- Length
- Orientation and Direction
- Spatial attribute of network features
- Connectivity or Topology
33Spatial Attributes of Linear Features -- Length
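[Figure: right triangle with vertices (x1, y1), (x1, y2), and (x2, y2) and sides a, b, c, illustrating the straight-line length between two points]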
34Spatial Attributes of Linear Features -- Length
- Great circle distance D of locations A and B
- $\cos D = \sin a \sin b + \cos a \cos b \cos \Delta\lambda$
- where
- a and b are the latitude readings of locations A and B
- $\Delta\lambda$ is the absolute difference in longitude between A and B
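A small sketch of the great circle distance using the spherical law of cosines, cos D = sin a sin b + cos a cos b cos(longitude difference). The mean Earth radius of 6371 km and the function name are assumptions added for illustration.

```python
import math


def great_circle_distance(lat_a, lon_a, lat_b, lon_b, radius_km=6371.0):
    """Great circle distance in km between A and B (coordinates in degrees)."""
    a = math.radians(lat_a)
    b = math.radians(lat_b)
    dlon = math.radians(abs(lon_a - lon_b))   # absolute difference in longitude
    cos_d = math.sin(a) * math.sin(b) + math.cos(a) * math.cos(b) * math.cos(dlon)
    cos_d = max(-1.0, min(1.0, cos_d))        # guard against rounding error
    return radius_km * math.acos(cos_d)       # angular distance times radius


# A quarter of the equator, and pole to pole
quarter = great_circle_distance(0.0, 0.0, 0.0, 90.0)
half = great_circle_distance(90.0, 0.0, -90.0, 0.0)
print(quarter, half)  # about 10007.5 km and 20015.1 km
```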
35Spatial Attributes of Linear Features
Orientation and Direction
- Orientation
- Directional
- e.g. West-East orientation
- Non-directional (from to )
- e.g. to describe a fault line --
- from location y to location x = from location x to location y
- Direction
- Dependent on the beginning and ending locations
- from location y to location x ≠ from location x to location y
36Directional Statistics Directional Mean
- Directional mean: the average direction of a set of vectors
37Directional Statistics Directional Mean
[Figure: a vector drawn on X-Y axes with its direction angle θ]
38Directional Statistics Circular Variance
- Shows the angular variability of the set of
vectors
[Figure: a set of vectors drawn on X-Y axes]
39Directional Statistics Circular Variance
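- For a set of n vectors,
- $S_v = 1 - \frac{OR}{n}$, where $OR = \sqrt{\left(\sum \sin\theta_i\right)^2 + \left(\sum \cos\theta_i\right)^2}$ is the length of the resultant vector
- $S_v = 0$: all vectors have the same direction, i.e. no circular variability
- $S_v = 1$: all vectors are in opposite directions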
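The directional mean and the circular variance can be sketched together from unit vectors, taking the mean direction as atan2(sum of sines, sum of cosines) and the variance as 1 minus the resultant length divided by n. Angles are assumed to be measured in degrees counterclockwise from the X axis; the function name and test angles are illustrative.

```python
import math


def directional_stats(angles_deg):
    """Return (mean_direction_deg, circular_variance) for vector directions."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    resultant = math.hypot(s, c)                      # length of the resultant vector
    mean_deg = math.degrees(math.atan2(s, c)) % 360.0
    circular_variance = 1.0 - resultant / n
    return mean_deg, circular_variance


print(directional_stats([30.0, 30.0, 30.0]))  # mean about 30, variance about 0
print(directional_stats([0.0, 90.0]))         # mean about 45, variance about 0.293
print(directional_stats([0.0, 180.0])[1])     # variance about 1 (opposite directions)
```

When the vectors cancel out (variance near 1) the mean direction itself is not meaningful, which is why only the variance is printed in the last call.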
40Network Analysis
- Connectivity: how different links are connected
- Vertices: junctions or nodes
- Links/edges: the lines joining the vertices
41Connectivity Matrix (C)
- Cij = 1 if there is a direct connection between i and j
- Cij = 0, otherwise
42Connectivity Matrix (C)
- C1: direct connections
- C2: number of 2-step paths from i to j
- Example: from i to k to j is a 2-step path with one intermediate vertex k
- C3: number of 3-step paths from i to j
- Example: from i to k to m to j is a 3-step path with two intermediate vertices
43Network as a matrix
- C2 = C1 x C1, C3 = C2 x C1, C4 = C3 x C1, C5 = C4 x C1, ...
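As a sketch, successive powers of the connectivity matrix can be built by repeated multiplication with C1. The four-vertex chain network below is a made-up example; note that the matrix powers count every step sequence, including ones that revisit a vertex.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def connectivity_powers(c1, max_steps):
    """Return [C1, C2, ..., C_max_steps], where Ck = C(k-1) x C1."""
    powers = [c1]
    for _ in range(max_steps - 1):
        powers.append(mat_mul(powers[-1], c1))
    return powers


# Hypothetical chain network 0 - 1 - 2 - 3
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
c1, c2, c3 = connectivity_powers(chain, 3)

print(c2[0][2])  # 1: one 2-step path from 0 to 2 (via 1)
print(c2[1][1])  # 2: 1-0-1 and 1-2-1 both return to vertex 1
print(c3[0][3])  # 1: one 3-step path from 0 to 3 (via 1 and 2)
```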
44Minimally connected network
- Each vertex is connected to the network, and there are no superfluous linkages
- The minimum number of edges needed to create a network is V-1, one less than the number of vertices in the network, i.e. e_min = V - 1 (= 5 when V = 6)
45Maximally connected network
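- Nonplanar
- the maximum number of edges is
- e_max = V(V-1) when each direction of a link is counted separately (directed links)
- e_max = V(V-1)/2 when links are undirected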
46Maximally connected network
- Planar --
- the maximum number of edges is e_max = 3(V-2)
47Gamma Index
- The gamma index provides a useful basic ratio for evaluating the relative connectivity of an entire network
- Ratio between the number of edges actually in a given network and the maximum number possible in that network
- γ = actual edges / maximum edges
- For a minimally connected network,
- γ = (V-1) / (3(V-2))
48Alpha Index
- compares the number of actual (fundamental) "circuits" with the maximum number of all possible fundamental circuits
- α = (E - V + 1) / (2V - 5), where 2V - 5 = the maximum number of fundamental circuits
49Diameter
- the number of linkages or steps needed to connect the two most remote nodes in the network
- the better connected the network, the lower the diameter
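The gamma index, alpha index, and diameter can be sketched for a small planar network as below. The planar maxima 3(V - 2) and 2V - 5 are taken from the definitions above; the breadth-first search for the diameter, the function names, and the two six-vertex example networks are assumptions for illustration, and the network is assumed to be connected.

```python
from collections import deque


def gamma_index(edges, vertices):
    """Actual edges divided by the planar maximum 3(V - 2)."""
    return edges / (3 * (vertices - 2))


def alpha_index(edges, vertices):
    """Actual fundamental circuits (E - V + 1) over the maximum (2V - 5)."""
    return (edges - vertices + 1) / (2 * vertices - 5)


def diameter(adjacency):
    """Steps between the two most remote vertices (connected network assumed).

    adjacency maps each vertex to the list of its directly linked vertices.
    """
    longest = 0
    for start in adjacency:
        steps = {start: 0}
        queue = deque([start])
        while queue:                      # breadth-first search from start
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in steps:
                    steps[v] = steps[u] + 1
                    queue.append(v)
        longest = max(longest, max(steps.values()))
    return longest


# Hypothetical networks with 6 vertices: a chain (5 edges) and a ring (6 edges)
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print(gamma_index(5, 6), alpha_index(5, 6), diameter(chain))  # 0.4167, 0.0, 5
print(gamma_index(6, 6), alpha_index(6, 6), diameter(ring))   # 0.5, 0.1429, 3
```

Adding the one extra edge that closes the ring raises both indices and lowers the diameter, in line with the statement that better connected networks have a lower diameter.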
50POLYGON PATTERN ANALYZERS
- We will discuss the use of spatial statistics to
describe and measure spatial patterns formed by
geographic objects that are associated with areas
or polygons.
51Spatial Autocorrelation (SA) Spatial Weights
Matrices
- SA measures the degree of sameness of attribute values among areal units (or polygons) within their neighborhood
- Different ways of specifying spatial relationships
52Neighborhood Definitions Adjacency Criterion
- Immediate (first-order) neighbors of X
- Rook's case
53Neighborhood Definitions Binary Connectivity
Matrix
- C = connectivity matrix with elements cij,
- cij = 1 if the ith polygon is adjacent to the jth polygon
- cij = 0 if the ith polygon is NOT adjacent to the jth polygon
- Symmetrical: cij = cji
- Not efficient
54Neighborhood Definitions Stochastic Matrix
- Row-standardized matrix (stochastic matrix)
- Assume each neighbor exerts the same amount of influence
- W = spatial weights matrix with elements wij, $w_{ij} = \frac{c_{ij}}{\sum_j c_{ij}}$
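Row standardization can be sketched as dividing each row of the binary connectivity matrix by its row sum, so each unit's neighbors share a total weight of 1. The function name and the three-unit example are illustrative; units with no neighbors are given a row of zeros here, which is an assumption.

```python
def row_standardize(c):
    """Return the row-standardized (stochastic) weights matrix for binary C."""
    w = []
    for row in c:
        total = sum(row)  # number of neighbors of this unit
        w.append([value / total if total else 0.0 for value in row])
    return w


# Hypothetical three areal units in a row: unit 1 borders units 0 and 2
c = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
w = row_standardize(c)
print(w)  # [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
```

Unlike C, the resulting W is generally not symmetric: here w[0][1] is 1.0 while w[1][0] is 0.5.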
55Neighborhood Definitions Distance between
polygon centroids
- For example,
- Within a radius of 1 mile
- Adjacency measure is just a binary representation of the distance measure
- 1 = zero distance between two neighboring units
56Spatial Weights Matrices Centroid Distances
- dij represents the distance between areal units i and j
- Weight $w_{ij} = 1 / d_{ij}$
- Inversely proportional to the distance
- Weight $w_{ij} = 1 / d_{ij}^{b}$ with b > 1 (e.g. b = 2)
- Distance decay: spatial relationships diminish more than just proportionally to the distance
57Spatial Autocorrelation (SA)
- Univariate: handle one variable and evaluate how that variable is correlated over space
- Several measures available
- Global measures: assume SA is stable across the study region
- Join Count Statistic: measures the magnitude of SA among polygons with binary nominal data
- Moran's I Index (for interval or ratio data)
- Geary's Ratio C (for interval or ratio data)
- G statistic (for interval or ratio data)
58Spatial Autocorrelation (SA)
- Several measures available
- Local measures: SA may not be stable over the study region
- Local version of the G statistic
- Local Index of Spatial Autocorrelation (LISA): local version of Moran's I and Geary's Ratio C
59Spatial Autocorrelation (SA) Join Count Statistics
- Binary attribute data
- WW
- BW
- BB
- Compare the observed numbers of joins of various types (BB, WW, BW) with those expected from a random pattern
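A minimal sketch of join counting on a regular grid follows, using rook adjacency (shared edges only). The expected counts assume free sampling, where each cell is independently B with probability equal to the observed share of B cells; that choice, the function names, and the two 4 x 4 grids are assumptions for illustration.

```python
def join_counts(grid):
    """Count BB, WW, and BW joins between edge-adjacent cells of a grid.

    grid is a list of equal-length strings made of the characters B and W.
    """
    counts = {"BB": 0, "WW": 0, "BW": 0}
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            # Look right and down only, so each join is counted once
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    pair = grid[r][c] + grid[rr][cc]
                    counts["BW" if pair in ("BW", "WB") else pair] += 1
    return counts


def expected_join_counts(grid):
    """Expected joins under free sampling with p = observed share of B cells."""
    cells = [value for row in grid for value in row]
    p = cells.count("B") / len(cells)
    q = 1.0 - p
    rows, cols = len(grid), len(grid[0])
    joins = rows * (cols - 1) + cols * (rows - 1)   # total rook joins
    return {"BB": joins * p * p, "WW": joins * q * q, "BW": 2 * joins * p * q}


clustered = ["BBBB", "BBBB", "WWWW", "WWWW"]
checkerboard = ["BWBW", "WBWB", "BWBW", "WBWB"]

print(join_counts(clustered))           # {'BB': 10, 'WW': 10, 'BW': 4}
print(join_counts(checkerboard))        # {'BB': 0, 'WW': 0, 'BW': 24}
print(expected_join_counts(clustered))  # {'BB': 6.0, 'WW': 6.0, 'BW': 12.0}
```

The clustered grid has more BB and WW joins than expected (positive autocorrelation), while the checkerboard has only BW joins (negative autocorrelation).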
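60Global spatial autocorrelation statistic -- Moran's I
- $I = \frac{n \sum_{i} \sum_{j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_{i} (x_i - \bar{x})^2}$
- where
- $x_i$ is the value of the interval or ratio variable in areal unit i,
- W is the sum of all elements of the spatial weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and
- n is the number of areal units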
61Global spatial autocorrelation statistic -- Moran's I
- I ranges from -1 to 1
- If no spatial autocorrelation exists, $E(I) = -\frac{1}{n-1}$
- E(I) < 0
- inversely related to n
- Z-test
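Moran's I can be sketched directly from its usual definition, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2. The function name, the binary weights for four units in a row, and the attribute values are made up; the Z-test is not included.

```python
def morans_i(x, w):
    """Global Moran's I for values x and spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [value - mean for value in x]
    w_sum = sum(sum(row) for row in w)          # W: sum of all weights
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_sum) * (cross / sum(d * d for d in dev))


# Hypothetical binary weights: four areal units in a row (0-1-2-3)
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print(morans_i([1, 1, 5, 5], w))  # about 0.333: similar values are neighbors
print(morans_i([1, 5, 1, 5], w))  # about -1.0: neighbors always differ
print(-1 / (4 - 1))               # expected value with no autocorrelation
```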
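62Global spatial autocorrelation statistic -- Geary's Ratio
- $C = \frac{(n-1) \sum_{i} \sum_{j} w_{ij} (x_i - x_j)^2}{2W \sum_{i} (x_i - \bar{x})^2}$
- where
- $x_i$ is the value of the interval or ratio variable in areal unit i,
- W is the sum of all elements of the spatial weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and
- n is the number of areal units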
63Global spatial autocorrelation statistic -- Geary's Ratio
- C ranges from 0 to 2
- C = 0 indicates perfect positive spatial autocorrelation, when all neighboring values are the same
- C = 2 indicates extremely negative spatial autocorrelation
- E(C) = 1, not affected by n
- Z-test
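Geary's Ratio can be sketched from its usual definition, C = (n - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i (x_i - mean)^2). As with any such sketch, the function name, weights, and values are illustrative and the Z-test is omitted.

```python
def gearys_c(x, w):
    """Global Geary's Ratio C for values x and spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    w_sum = sum(sum(row) for row in w)          # W: sum of all weights
    diff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    total = sum((value - mean) ** 2 for value in x)
    return ((n - 1) * diff) / (2 * w_sum * total)


# Hypothetical binary weights: four areal units in a row (0-1-2-3)
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print(gearys_c([1, 1, 5, 5], w))  # 0.5: below 1, positive autocorrelation
print(gearys_c([1, 5, 1, 5], w))  # 1.5: above 1, negative autocorrelation
```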
64Global spatial autocorrelation statistic
General G Statistic
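- Moran's I and Geary's C cannot distinguish clusters of high values (HH) from clusters of low values (LL), as they are concerned only with whether neighboring values are similar or not
- The general G-statistic: $G(d) = \frac{\sum_{i} \sum_{j} w_{ij}(d)\, x_i x_j}{\sum_{i} \sum_{j} x_i x_j}$, $i \neq j$
- where $w_{ij}(d) = 1$ if areal unit j is within d from areal unit i; otherwise $w_{ij}(d) = 0$.
- Z-test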
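The following sketch of the general G statistic, G(d) = sum_ij w_ij(d) x_i x_j / sum_ij x_i x_j for i not equal to j, shows how it separates high-value clusters from low-value clusters. The binary weights stand in for "within distance d"; the function name and values are made up.

```python
def general_g(x, w):
    """General G statistic for non-negative values x and binary weights w(d)."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    numerator = sum(w[i][j] * x[i] * x[j] for i, j in pairs)
    denominator = sum(x[i] * x[j] for i, j in pairs)
    return numerator / denominator


# Hypothetical weights: four units in a row, neighbors within distance d
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

high_cluster = [1, 5, 5, 1]   # the two high values are neighbors
low_cluster = [5, 1, 1, 5]    # the two low values are neighbors

print(general_g(high_cluster, w))  # about 0.761
print(general_g(low_cluster, w))   # about 0.239
```

Both arrangements have the same deviations-from-mean pattern up to sign, so Moran's I would be identical for them, whereas G is high for the first and low for the second.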
65Local spatial autocorrelation statistic LISA
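- Local Index of Spatial Autocorrelation (LISA): local version of Moran's I and Geary's Ratio C
- Local Moran statistic for areal unit i: $I_i = z_i \sum_{j} w_{ij} z_j$, where $z_i$ and $z_j$ are deviations from the mean in standard deviation units
- High value: clustering of similar values (all high or all low)
- Low value: clustering of dissimilar values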
66Local spatial autocorrelation statistic LISA
- Local Geary's Ratio C for areal unit i: $C_i = \sum_{j} w_{ij} (z_i - z_j)^2$
- Low value: clustering of similar values (all high or all low)
- High value: clustering of dissimilar values
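The two local statistics can be sketched side by side, assuming the common forms I_i = z_i * sum_j w_ij z_j and C_i = sum_j w_ij (z_i - z_j)^2 with z as standardized values (population standard deviation) and a row-standardized weights matrix. These forms, the function name, and the data are assumptions for illustration.

```python
import math


def local_moran_geary(x, w):
    """Return (local_I, local_C) lists, one value per areal unit."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((value - mean) ** 2 for value in x) / n)
    z = [(value - mean) / sd for value in x]    # standardized values
    local_i = [z[i] * sum(w[i][j] * z[j] for j in range(n) if j != i)
               for i in range(n)]
    local_c = [sum(w[i][j] * (z[i] - z[j]) ** 2 for j in range(n) if j != i)
               for i in range(n)]
    return local_i, local_c


# Hypothetical row-standardized weights: four units in a row (0-1-2-3)
w = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]

local_i, local_c = local_moran_geary([1, 1, 5, 5], w)
print(local_i)  # end units about 1.0 (similar neighbor), middle units about 0
print(local_c)  # end units 0.0 (similar neighbor), middle units 2.0
```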
67Local spatial autocorrelation statistic local
G-statistic
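- Local G-statistic for areal unit i: $G_i(d) = \frac{\sum_{j} w_{ij}(d)\, x_j}{\sum_{j} x_j}$, $j \neq i$
- Standard scores: $Z(G_i) = \frac{G_i - E(G_i)}{\sqrt{Var(G_i)}}$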
68Local spatial autocorrelation statistic local
G-statistic
- Interpretation of standard scores for