Title: Point Pattern Analysis
1Point Pattern Analysis
- using Spatial Inferential Statistics
2Last time
- Concept of statistical inference
  - Drawing conclusions about populations from samples
  - Null hypothesis of no difference
  - Alternative hypotheses (which we really want to accept)
- Random point pattern
  - Is our observed point pattern significantly different from random?
3How Point Pattern Analysis (PPA) is different
- From Centrographic Statistics (previously)
  - Centrographic Statistics calculates single, summary measures
  - PPA analyses the complete set of points
- From Spatial Autocorrelation (discussed later)
  - With PPA, the points have location only; there is no magnitude value
  - With Spatial Autocorrelation, points have different magnitudes; there is an attribute variable
4Approaches to Point Pattern Analysis
- Two primary approaches
  - Point Density using Quadrat Analysis
    - Based on polygons
    - Analyze points using polygons!
    - Uses the frequency distribution or density of points within a set of grid squares
  - Point Association using Nearest Neighbor Analysis
    - Based on points
    - Uses distances between the points
- Although the above would suggest that the first approach examines first order effects and the second approach examines second order effects, in practice the two cannot be separated.
5Quadrat Analysis: The problem of selecting quadrat size
- Too small: many quadrats with zero points
- Too big: many quadrats have a similar number of points
- O.K.: a size between these two extremes
- Length of quadrat edge is computed from A (study area) and N (number of points)
- Modifiable Areal Unit Problem
6Types of Quadrats
- Uniform grid: used for secondary data
- Random sampling: useful in field work
- [Figure: frequency counts by quadrat]
- Multiple ways to create quadrats, and results can differ accordingly!
- Quadrats don't have to be square, and their size has a big influence
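As a minimal sketch of how frequency counts by quadrat can be produced for a uniform grid: the function names, the sample points, and the rectangular study area below are all hypothetical. The `rule_of_thumb_edge` helper uses a commonly quoted rule of thumb (quadrat area of about 2A/N), which is an assumption here and not necessarily the formula used on these slides.

```python
import math
from collections import Counter

def rule_of_thumb_edge(area, n_points):
    """Edge of a square quadrat with area 2A/N (assumed rule of thumb)."""
    return math.sqrt(2.0 * area / n_points)

def quadrat_counts(points, xmin, ymin, width, height, edge):
    """Count points in each cell of a uniform square grid over the study area."""
    n_cols = math.ceil(width / edge)
    n_rows = math.ceil(height / edge)
    counts = Counter()
    for x, y in points:
        # Clamp so that points lying exactly on the far edge fall in the last cell
        col = min(int((x - xmin) // edge), n_cols - 1)
        row = min(int((y - ymin) // edge), n_rows - 1)
        counts[(row, col)] += 1
    # One count per cell, row by row, including empty cells
    return [counts[(r, c)] for r in range(n_rows) for c in range(n_cols)]

# Hypothetical points in a 4 x 2 study area, with quadrats of edge 1
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (3.5, 1.5), (2.2, 1.9)]
counts = quadrat_counts(points, xmin=0, ymin=0, width=4, height=2, edge=1.0)
print(counts)
```

Rerunning the same points with a different `edge` is a quick way to see how strongly the counts depend on quadrat size.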
7Quadrat Analysis: Variance/Mean Ratio (VMR)
- Apply a uniform or random grid over the area (A), with the width of each square set from A (area of region) and n (number of points)
- Treat each cell as an observation and count the number of points within it, to create the variable X
- Calculate the variance and mean of X, and create the variance-to-mean ratio: variance / mean
- For a uniform distribution, the variance is zero.
  - Therefore, we expect a variance-mean ratio close to 0
- For a random distribution, the variance and mean are the same.
  - Therefore, we expect a variance-mean ratio around 1
- For a clustered distribution, the variance is relatively large
  - Therefore, we expect a variance-mean ratio above 1
See following slide for example. See O&U pp. 98-100 for another example
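The ratio described above can be sketched in a few lines. The three count lists are hypothetical: they are chosen only so that each has 10 quadrats and 20 points, with sums of squares of 60, 40 and 200, matching the figures quoted in the significance-test slide. Using the sample variance (dividing by N - 1) is also an assumption, since the slides do not state the divisor.

```python
def variance_mean_ratio(counts):
    """Variance-to-mean ratio of quadrat counts (sample variance, N - 1)."""
    n = len(counts)
    mean = sum(counts) / n
    sum_sq_diff = sum((x - mean) ** 2 for x in counts)
    variance = sum_sq_diff / (n - 1)
    return variance / mean

# Hypothetical quadrat counts: 10 quadrats, 20 points in each pattern
random_counts = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
uniform_counts = [2] * 10
clustered_counts = [10, 10, 0, 0, 0, 0, 0, 0, 0, 0]

for name, counts in [("random", random_counts),
                     ("uniform", uniform_counts),
                     ("clustered", clustered_counts)]:
    print(name, round(variance_mean_ratio(counts), 3))
```

With these counts the ratio is close to 1 for the random case, 0 for the uniform case, and well above 1 for the clustered case.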
8RANDOM
Note: N = number of quadrats = 10; Ratio = Variance/mean
9Significance Test for VMR
- A significance test can be conducted based upon the chi-square frequency distribution
- The test statistic is given by: (sum of squared differences) / mean, that is $\chi^2 = \left[\sum x^2 - (\sum x)^2 / N\right] / \bar{x}$
- The test will ascertain if a pattern is significantly more clustered than would be expected by chance (but does not test for uniformity)
- The values of the test statistic in our cases would be:
  - random: $(60 - 20^2/10) / 2 = 10$
  - uniform: $(40 - 20^2/10) / 2 = 0$
  - clustered: $(200 - 20^2/10) / 2 = 80$
- For degrees of freedom N - 1 = 10 - 1 = 9, the value of chi-square at the 1% level is 21.666.
- Thus, there is only a 1% chance of obtaining a value of 21.666 or greater if the points had been allocated randomly. Since our test statistic for the clustered pattern is 80, we conclude that there is (considerably) less than a 1% chance that the clustered pattern could have resulted from a random process
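The arithmetic above can be checked with a short sketch. The count lists are hypothetical, picked only to reproduce the sums used on this slide (20 points in 10 quadrats, with sums of squares of 60, 40 and 200). The critical value 21.666 is taken from the slide and is not computed here.

```python
def vmr_chi_square(counts):
    """Test statistic: (sum of squared differences from the mean) / mean."""
    n = len(counts)
    total = sum(counts)
    mean = total / n
    # Computational form of the sum of squared differences
    sum_sq_diff = sum(x * x for x in counts) - total ** 2 / n
    return sum_sq_diff / mean

CRITICAL_1PCT_DF9 = 21.666  # chi-square at the 1% level, 9 degrees of freedom (from the slide)

patterns = {
    "random": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "uniform": [2] * 10,
    "clustered": [10, 10, 0, 0, 0, 0, 0, 0, 0, 0],
}
results = {name: vmr_chi_square(c) for name, c in patterns.items()}
for name, stat in results.items():
    verdict = "more clustered than random" if stat > CRITICAL_1PCT_DF9 else "not significant"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```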
10Quadrat Analysis Frequency Distribution
Comparison
- Rather than base the conclusion on the variance/mean ratio, we can compare observed frequencies in the quadrats (Q = number of quadrats) with expected frequencies that would be generated by
  - a random process (modeled by the Poisson frequency distribution)
  - a clustered process (e.g. one cell with P points, Q-1 cells with 0 points)
  - a uniform process (e.g. each cell has P/Q points)
- The standard Kolmogorov-Smirnov test for comparing two frequency distributions can then be applied (see next slide)
- See Lee and Wong pp. 62-68 for another example and further discussion.
11Kolmogorov-Smirnov (K-S) Test
- The test statistic D is simply given by
  - $D = \max \left| \text{Cum. Obs. Freq.} - \text{Cum. Exp. Freq.} \right|$
  - The largest difference (irrespective of sign) between observed cumulative frequency and expected cumulative frequency
- The critical value at the 5% level is given by
  - $D_{5\%} = 1.36 / \sqrt{Q}$, where Q is the number of quadrats
- Expected frequencies for a random spatial distribution are derived from the Poisson frequency distribution and can be calculated with
  - $p(0) = e^{-\lambda} = 1 / 2.71828^{P/Q}$ and $p(x) = p(x-1) \cdot \lambda / x$
  - where x = number of points in a quadrat and p(x) = the probability of x points
  - P = total number of points, Q = number of quadrats
  - $\lambda = P/Q$ (the average number of points per quadrat)
See next slide for worked example for cluster case
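A minimal sketch of this test, using the Poisson recurrence p(0) = e^(-λ), p(x) = p(x-1)·λ/x and the 5% critical value 1.36/√Q given above. The clustered counts are hypothetical (two quadrats with 10 points each, eight empty), not the data from the worked example, and frequencies are compared as cumulative proportions of quadrats.

```python
import math
from collections import Counter

def poisson_probs(lam, max_x):
    """p(0..max_x) from the recurrence p(0) = e^-lam, p(x) = p(x-1) * lam / x."""
    probs = [math.exp(-lam)]
    for x in range(1, max_x + 1):
        probs.append(probs[-1] * lam / x)
    return probs

def ks_quadrat_test(counts):
    """Return (D, critical value at 5%) for quadrat counts versus a Poisson model."""
    q = len(counts)                 # number of quadrats
    lam = sum(counts) / q           # average number of points per quadrat, P/Q
    max_x = max(counts)
    observed = Counter(counts)      # how many quadrats hold x points
    expected = poisson_probs(lam, max_x)
    d_stat = cum_obs = cum_exp = 0.0
    for x in range(max_x + 1):
        cum_obs += observed[x] / q  # cumulative observed proportion
        cum_exp += expected[x]      # cumulative expected proportion
        d_stat = max(d_stat, abs(cum_obs - cum_exp))
    return d_stat, 1.36 / math.sqrt(q)

clustered_counts = [10, 10, 0, 0, 0, 0, 0, 0, 0, 0]
d_stat, critical = ks_quadrat_test(clustered_counts)
print(f"D = {d_stat:.4f}, critical value = {critical:.4f}")
```

If D exceeds the critical value, the observed frequency distribution differs significantly from the Poisson (random) expectation.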
12Row 10
The spreadsheet spatstat.xls contains worked examples for the Uniform / Clustered / Random data previously used, as well as for Lee and Wong's data
13Weakness of Quadrat Analysis
- Results may depend on quadrat size and orientation (Modifiable Areal Unit Problem)
  - test different sizes (or orientations) to determine the effects of each test on the results
- Is a measure of dispersion, and not really pattern, because it is based primarily on the density of points, and not their arrangement in relation to one another
- Results in a single measure for the entire distribution, so variations within the region are not recognized (could have clustering locally in some areas, but not overall)
For example, quadrat analysis cannot distinguish between these two, obviously different, patterns
For example, overall pattern here is dispersed, but there are some local clusters
14Nearest-Neighbor Index (NNI) (O&U p. 100)
- Uses distances between points
- It compares
  - the mean of the distance observed between each point and its nearest neighbor
  - with the expected mean distance if the distribution were random
- $NNI = \dfrac{\text{Observed Average Distance}}{\text{Expected Average Distance}}$
- For a random pattern, NNI = 1
- For a clustered pattern, NNI approaches 0
- For a dispersed pattern, NNI approaches its maximum of 2.149
See next slide for formulae for calculation
15Calculating Nearest Neighbor Index
Where
- the observed term is the average distance to nearest neighbor
- A is the area of the region (the result is very dependent on this value)
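As a rough sketch of the calculation, the code below uses the standard Clark and Evans expected distance for a random pattern, 0.5·√(A/n). This is an assumption about what the slide's formula shows, since the formula itself is not reproduced in the text. The function names and the 3 x 3 grid of points are hypothetical, and the brute-force search is only suitable for small point sets.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)

def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance divided by the expected one."""
    n = len(points)
    observed = mean_nearest_neighbor_distance(points)
    expected = 0.5 * math.sqrt(area / n)  # assumed Clark-Evans expectation under randomness
    return observed / expected

# Hypothetical evenly spaced pattern: 9 points on a unit-spaced grid in a 3 x 3 area
grid = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
nni = nearest_neighbor_index(grid, area=9.0)
print(f"NNI = {nni:.3f}")
```

Passing a larger `area` for the same points lowers the index, which illustrates how dependent the result is on the study-area value.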
16Significance Test for NNI
- The test statistic is calculated as follows
  - $Z = \dfrac{\text{Av. Distance Observed} - \text{Av. Distance Expected}}{\text{Standard Error}}$
- It has a Normal Frequency Distribution.
- It tests if the observed pattern is significantly different from random.
  - If Z is below -1.96 or above +1.96, we are 95% confident that the distribution is not random.
  - Or we can say: if the observed pattern was random, there are less than 5 chances in 100 we would have observed a Z value this large.
- Note: in the example on the next slide, the fact that the NNI for uniform is 1.96 is coincidence!
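The test can be sketched as follows. The expected distance 0.5·√(A/n) and the standard error 0.26136/√(n²/A) are the standard Clark and Evans expressions; treating them as the formulas behind this slide is an assumption, and the input values are hypothetical.

```python
import math

def nni_z_score(observed_mean, n, area):
    """Z = (observed - expected mean nearest-neighbor distance) / standard error."""
    expected_mean = 0.5 * math.sqrt(area / n)            # assumed expectation under randomness
    standard_error = 0.26136 / math.sqrt(n * n / area)   # assumed Clark-Evans standard error
    return (observed_mean - expected_mean) / standard_error

# Hypothetical input: 9 points in an area of 9 with a mean nearest-neighbor distance of 1.0
z = nni_z_score(observed_mean=1.0, n=9, area=9.0)
significant = abs(z) > 1.96  # two-tailed test at the 5% level
print(f"Z = {z:.3f}, significantly different from random: {significant}")
```

A positive Z indicates observed distances larger than expected (dispersed), and a negative Z indicates smaller distances than expected (clustered).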
17Calculating Test Statistic for Nearest Neighbor
Index
Where
18RANDOM, UNIFORM, CLUSTERED
Z values shown: 5.508, -0.1515, 5.855
Source: Lembro
19Running in ArcGIS: Telecom and Software Companies
Result is very dependent on area of the region. There is an option to insert your own value. Default value is the minimum enclosing rectangle that encompasses all features.
20Results
Scroll up the window to see all the results. Note: the Progress box continues to run until the graphic is closed. Always close the graphic window first.
The graphic is produced if the "Display output graphically" box is checked.
21Evaluating the Nearest Neighbor Index
- Advantages
  - Unlike quadrats, the NNI considers distances between points
  - No quadrat size problem
- However, NNI has problems
  - Very dependent on the value of A, the area of the study region. What boundary do we use for the study area?
    - Minimum enclosing rectangle? (highly affected by a few outliers)
    - Convex hull
    - Convex hull with buffer. What size buffer?
  - There is an adjustment for edge effects, but problems remain
  - Based on only the mean distance to the nearest neighbor
  - Doesn't incorporate local variations, or clustering scale
    - could have clustering locally in some areas, but not overall
  - Based on point location only and does not incorporate magnitude of phenomena (quantity) at that point
22Ripley's K(d) Function
- Ripley's K is calculated multiple times, each for a different distance band
  - So it is represented as K(d): K is a function of distance, d
- The distance bands are placed around every point
- K(d) is the average number of points within each distance (d) of a point, divided by the average density of points in the entire area (n/a)
- If the density is high for a particular band, then clustering is occurring at that distance
- O&U pp. 135-137
Where S is a point, and C(s_i, d) is a circle of radius d, centered at s_i
Ripley, B.D. 1976. The second order analysis of stationary point processes. Journal of Applied Probability 13: 255-266
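A naive sketch of the idea: for each distance d, count the other points falling inside a circle of radius d around every point, average the counts, and divide by the overall density n/a. No edge correction is applied, so this is not what ArcGIS or any particular package computes. The function name and the sample points are hypothetical. For a random pattern the expected value is πd², so values well above that suggest clustering at that distance.

```python
import math

def ripley_k(points, area, distances):
    """Naive K(d) for each d: (area / n^2) * number of ordered pairs within d."""
    n = len(points)
    k_values = []
    for d in distances:
        pairs = 0
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(points):
                # Count point j if it lies in the circle of radius d centered on point i
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    pairs += 1
        k_values.append(area * pairs / (n * n))
    return k_values

# Hypothetical pattern: two tight pairs of points far apart, in an area of 25
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
distances = [0.2, 6.0]
k_values = ripley_k(points, area=25.0, distances=distances)
for d, k in zip(distances, k_values):
    print(f"d = {d}: K = {k:.2f}, expected if random = {math.pi * d * d:.2f}")
```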
23Not this simple with real data!!!
[Figure: K(d) against distance for clustered and dispersed patterns; labels: within, between, clustered, dispersed, "Begins flat"]
- The low end (0.2) corresponds to distances within the cluster
- The high end (0.6) corresponds to distance between the clusters
- The distance bands are placed around every point. Note the big problem of edge effects from circles outside the study area.
Source: O'Sullivan & Unwin, p.
24Running in ArcGIS: Telecom and Software Companies
- Use 9 iterations for tests; 99 takes a long time!
- Weight field: number of points at that location
- Distance bands
- Result is very dependent on area of the region. Can insert your own value.
- Again, study area has a big effect, so there are several options for this
25Interpreting the Results
Not this simple with real data!!!
Observed
Expected
26Results for 10,000 feet Bands
- Distance bands: start 5,000 feet, size 10,000 feet
- Expected assumes random pattern
- Confidence band: 9 iterations (takes a long time for 99!)
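The confidence band can be thought of as the range of K values obtained from repeated random patterns. The sketch below illustrates that idea only and is not ArcGIS's implementation: it assumes a rectangular study area, uses a naive K without edge correction, and all names, sizes and the seed are hypothetical. More simulations (99 instead of 9) give a more reliable band but take longer, since each one recomputes K for every distance.

```python
import math
import random

def naive_k(points, area, d):
    """Naive Ripley's K at distance d (no edge correction)."""
    n = len(points)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= d
    )
    return area * pairs / (n * n)

def csr_envelope(n_points, width, height, distances, n_sims=9, seed=0):
    """Lowest and highest K(d) over n_sims simulated random point patterns."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    area = width * height
    lows = [math.inf] * len(distances)
    highs = [-math.inf] * len(distances)
    for _ in range(n_sims):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_points)]
        for idx, d in enumerate(distances):
            k = naive_k(sim, area, d)
            lows[idx] = min(lows[idx], k)
            highs[idx] = max(highs[idx], k)
    return lows, highs

distances = [5.0, 15.0, 25.0]
lows, highs = csr_envelope(n_points=30, width=100.0, height=100.0, distances=distances)
for d, lo, hi in zip(distances, lows, highs):
    print(f"d = {d}: envelope [{lo:.1f}, {hi:.1f}]")
```

An observed K(d) above the upper end of the band suggests clustering at that distance; one below the lower end suggests dispersion.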
27Results for 20,000 feet Bands
- Distance bands: start 10,000 feet, size 20,000 feet
- Also experiment with different region (study area) boundaries.
28Plotting the Difference Between Observed and Expected K, versus Distance
- Y field: Diffk = ObservedK - ExpectedK
- X field: ExpectedK or HiConfEnv
- Distance between clusters: 70,000 feet ≈ 13 miles ≈ 20 km
29Problems with Ripley's K(d)
- Dependent on study area boundary (edge effect)
  - Circles go outside study area
  - Special adjustments are available (see O&U p. 148)
  - Try different options for boundary in ArcGIS
- Affected by circle radii selected
  - Try different values
- Each point has unit value: no magnitude or quantity
  - Weight field assumes X points at that location
  - e.g. X = 3, then 3 points at that location
30What have we learned?
- How to measure and test if spatial patterns are
clustered or dispersed.
31Why is this important?
We can measure and test, not just look and guess!
Is it clustered?
That is science.
32Not just GIS!
- I taught these tools to senior undergraduate geography students.
- They are also used in Earth Management.
- A former Henan University student and faculty member (now at UT-Dallas) is using Ripley's K function for research on urban forests.
33Next Time
- No classes next week
- Next class will be Wednesday November 17
- Topic
  - Spatial Autocorrelation
  - Unlike PPA, in Spatial Autocorrelation points have different magnitudes; there is an attribute variable.