Point Pattern Analysis

Title: Spatial Statistics; Author: briggs; Last modified by: briggs; Created Date: 4/11/2003 8:31:59 PM; Document presentation format: On-screen Show (4:3)

Slides: 35

Provided by: brig75

Learn more at: https://personal.utdallas.edu

Transcript and Presenter's Notes

Title: Point Pattern Analysis

1
Point Pattern Analysis

using
Spatial Inferential Statistics

2
Last time

Concept of statistical inference
Drawing conclusions about populations from
samples
Null Hypothesis of no difference
Alternative hypotheses (which we really want to
accept)
Random point pattern
Is our observed point pattern significantly different from random?

3
How Point Pattern Analysis (PPA) is different

From Centrographic Statistics (previously)
Centrographic Statistics calculates single,
summary measures
PPA analyses the complete set of points
From Spatial Autocorrelation (discussed later)
With PPA, the points have location only; there is no magnitude value
With Spatial Autocorrelation, points have different magnitudes; there is an attribute variable.

4
Approaches to Point Pattern Analysis

Two primary approaches
Point Density using Quadrat Analysis
Based on polygons
Analyze points using polygons!
Uses the frequency distribution or density of
points within a set of grid squares.
Point Association using Nearest Neighbor Analysis
Based on points
Uses distances between the points
Although the above would suggest that the first
approach examines first order effects and the
second approach examines second order effects, in
practice the two cannot be separated.

5
Quadrat Analysis: The problem of selecting quadrat size
Too small: many quadrats with zero points
Too big: many quadrats have similar number of points
O.K.
Length of quadrat edge depends on A (study area) and N (number of points)
Modifiable Areal Unit Problem
6
Types of Quadrats
Uniform grid: used for secondary data
Random sampling: useful in field work
Frequency counts by quadrat would be
Multiple ways to create quadrats, and results can differ accordingly!
Quadrats don't have to be square, and their size has a big influence
7
Quadrat Analysis: Variance/Mean Ratio (VMR)

Apply a uniform or random grid over the area (A), with the width of each square set from A (area of region) and n (number of points)
Treat each cell as an observation and count the number of points within it, to create the variable X
Calculate variance and mean of X, and create the variance-to-mean ratio = variance / mean
For a uniform distribution, the variance is zero.
Therefore, we expect a variance-mean ratio close to 0
For a random distribution, the variance and mean are the same.
Therefore, we expect a variance-mean ratio around 1
For a clustered distribution, the variance is relatively large
Therefore, we expect a variance-mean ratio above 1

See following slide for example. See O&U pp. 98-100 for another example
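
The VMR steps above can be sketched in Python. This is a minimal illustration, not the lecture's own spreadsheet: the function names are hypothetical, the grid is assumed to be axis-aligned with square cells, and the variance is assumed to divide by the number of quadrats N (dividing by N - 1 is also common). The edge-length helper uses the commonly cited rule of thumb sqrt(2A/n), which is an assumption here because the slide's formula did not survive in the transcript.

```python
import math
from statistics import mean, pvariance


def quadrat_edge(area, n_points):
    """Rule-of-thumb edge length sqrt(2A/n); assumed, not taken from the transcript."""
    return math.sqrt(2.0 * area / n_points)


def quadrat_counts(points, xmin, ymin, edge, n_cols, n_rows):
    """Count points per square cell of an axis-aligned grid.

    Points are assumed to lie inside the grid; points on the upper or
    right boundary are assigned to the last row or column.
    """
    counts = [0] * (n_cols * n_rows)
    for x, y in points:
        col = min(int((x - xmin) / edge), n_cols - 1)
        row = min(int((y - ymin) / edge), n_rows - 1)
        counts[row * n_cols + col] += 1
    return counts


def variance_mean_ratio(counts):
    """Variance of the quadrat counts (divisor N) divided by their mean."""
    return pvariance(counts) / mean(counts)
```

With this divisor, ten quadrats holding two points each give a ratio of 0, matching the uniform case described above.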
8
RANDOM
Note: N = number of quadrats = 10; Ratio = Variance/mean
9
Significance Test for VMR

A significance test can be conducted based upon the chi-square frequency distribution
The test statistic is given by (sum of squared differences) / mean
The test will ascertain if a pattern is significantly more clustered than would be expected by chance (but does not test for uniformity)
The values of the test statistic in our cases would be
random: [60 - (20^2)/10] / 2 = 10
uniform: [40 - (20^2)/10] / 2 = 0
clustered: [200 - (20^2)/10] / 2 = 80
For degrees of freedom = N - 1 = 10 - 1 = 9, the value of chi-square at the 1% level is 21.666.
Thus, there is only a 1% chance of obtaining a value of 21.666 or greater if the points had been allocated randomly. Since our test statistic for the clustered pattern is 80, we conclude that there is (considerably) less than a 1% chance that the clustered pattern could have resulted from a random process
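
The arithmetic of this test can be checked with a short Python sketch. The quadrat counts below are hypothetical: they are chosen only so that each pattern has 20 points in 10 quadrats and the same sums of squares as the slide (60, 40 and 200); the slide's actual counts are not in the transcript. The critical value is the one quoted on the slide.

```python
def chi_square_vmr(counts):
    """Sum of squared differences from the mean, divided by the mean."""
    n = len(counts)
    total = sum(counts)
    mean = total / n
    # Computational form: sum(x^2) - (sum(x))^2 / N
    sum_sq_diff = sum(x * x for x in counts) - total ** 2 / n
    return sum_sq_diff / mean


# Chi-square critical value at the 1% level for 9 degrees of freedom (from the slide).
CRITICAL_1PCT_DF9 = 21.666

# Hypothetical counts matching the slide's totals, not the slide's own data.
patterns = {
    "random": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "uniform": [2] * 10,
    "clustered": [0] * 8 + [10, 10],
}

for name, counts in patterns.items():
    stat = chi_square_vmr(counts)
    verdict = "more clustered than random" if stat > CRITICAL_1PCT_DF9 else "not significant"
    print(f"{name}: chi-square = {stat:.1f} ({verdict})")
```

Only the clustered counts exceed the critical value, which agrees with the conclusion drawn on the slide.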
10
Quadrat Analysis: Frequency Distribution Comparison

Rather than base conclusion on variance/mean ratio, we can compare observed frequencies in the quadrats (Q = number of quadrats) with expected frequencies that would be generated by
a random process (modeled by the Poisson frequency distribution)
a clustered process (e.g. one cell with P points, Q-1 cells with 0 points)
a uniform process (e.g. each cell has P/Q points)
The standard Kolmogorov-Smirnov test for comparing two frequency distributions can then be applied; see next slide
See Lee and Wong pp. 62-68 for another example and further discussion.

11
Kolmogorov-Smirnov (K-S) Test

The test statistic D is simply given by
D = max |Cum. Obs. Freq - Cum. Exp. Freq|
The largest difference (irrespective of sign) between observed cumulative frequency and expected cumulative frequency
The critical value at the 5% level is given by
D (at 5%) = 1.36 / sqrt(Q), where Q is the number of quadrats
Expected frequencies for a random spatial distribution are derived from the Poisson frequency distribution and can be calculated with
p(0) = e^(-λ) = 1 / (2.71828^(P/Q)) and
p(x) = p(x - 1) * λ / x
Where x = number of points in a quadrat and p(x) = the probability of x points
P = total number of points, Q = number of quadrats
λ = P/Q (the average number of points per quadrat)

See next slide for worked example for cluster case
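
The K-S comparison against a Poisson model can be sketched as follows. This is an illustrative sketch rather than the spreadsheet's method: function names are hypothetical, and the example counts are made up for illustration (a clustered case with 20 points in 10 quadrats). The Poisson probabilities use the recursion p(x) = p(x - 1) * λ / x given above.

```python
import math
from collections import Counter


def poisson_probs(lam, max_x):
    """Poisson probabilities p(0)..p(max_x) via p(x) = p(x - 1) * lam / x."""
    probs = [math.exp(-lam)]
    for x in range(1, max_x + 1):
        probs.append(probs[-1] * lam / x)
    return probs


def ks_statistic(counts):
    """Largest absolute gap between observed and Poisson cumulative frequencies."""
    q = len(counts)                  # number of quadrats
    lam = sum(counts) / q            # average points per quadrat, P/Q
    max_x = max(counts)
    observed = Counter(counts)       # how many quadrats hold x points
    expected = poisson_probs(lam, max_x)
    cum_obs = cum_exp = d = 0.0
    for x in range(max_x + 1):
        cum_obs += observed.get(x, 0) / q
        cum_exp += expected[x]
        d = max(d, abs(cum_obs - cum_exp))
    return d


def ks_critical_5pct(q):
    """Critical value at the 5% level, 1.36 / sqrt(Q)."""
    return 1.36 / math.sqrt(q)


# Hypothetical clustered counts: 8 empty quadrats, 2 quadrats with 10 points each.
clustered = [0] * 8 + [10, 10]
d = ks_statistic(clustered)
print(d > ks_critical_5pct(len(clustered)))  # True means the pattern differs from random
```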
12
The spreadsheet spatstat.xls contains worked examples for the Uniform/Clustered/Random data previously used, as well as for Lee and Wong's data
13
Weakness of Quadrat Analysis

Results may depend on quadrat size and
orientation (Modifiable areal unit problem)
Test different sizes (or orientations) to determine the effect of each on the results
Is a measure of dispersion, and not really
pattern, because it is based primarily on the
density of points, and not their arrangement in
relation to one another
Results in a single measure for the entire
distribution, so variations within the region are
not recognized (could have clustering locally in
some areas, but not overall)

For example, quadrat analysis cannot distinguish
between these two, obviously different, patterns
For example, overall pattern here is dispersed,
but there are some local clusters
14
Nearest-Neighbor Index (NNI) (O&U p. 100)

Uses distances between points
It compares
the mean of the distance observed between each point and its nearest neighbor
with the expected mean distance if the distribution was random
NNI = Observed Average Distance / Expected Average Distance
For random pattern, NNI = 1
For clustered pattern, NNI approaches 0
For dispersed pattern, NNI approaches its maximum of 2.149

See next slide for formulae for calculation
15
Calculating Nearest Neighbor Index
Where
The average distance to nearest neighbor is the observed mean
A = area of region; the result is very dependent on this value
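
Because the slide's formulae are not in the transcript, the following Python sketch uses the standard Clark-Evans form as an assumption: the expected mean distance under randomness is taken as 0.5 * sqrt(A / n). Function names are hypothetical, and the brute-force nearest-neighbor search is only suitable for small point sets.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest other point (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance over the expected random value."""
    n = len(points)
    observed = mean_nearest_neighbor_distance(points)
    expected = 0.5 * math.sqrt(area / n)  # assumed Clark-Evans expectation
    return observed / expected


# A 4 x 4 square lattice with unit spacing, treated as covering an area of 16.
lattice = [(x, y) for x in range(4) for y in range(4)]
print(nearest_neighbor_index(lattice, area=16.0))
```

Changing only the `area` argument changes the index, which illustrates why the result is so dependent on the study-area value.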
16
Significance Test for NNI

The test statistic is calculated as follows
Z = (Av. Distance Observed - Av. Distance Expected) / Standard Error
It has a Normal Frequency Distribution.
It tests if the observed pattern is significantly different from random.
If Z is below -1.96 or above +1.96, we are 95% confident that the distribution is not random.
Or we can say: if the observed pattern was random, there are less than 5 chances in 100 we would have observed a z value this large.
Note: in the example on the next slide, the fact that the NNI for uniform is 1.96 is coincidence!
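
A minimal Python sketch of this Z statistic follows. The expected distance and standard error are not given in the transcript, so the standard Clark-Evans expressions are assumed here: expected = 0.5 * sqrt(A / n) and standard error = 0.26136 / sqrt(n^2 / A). The function name is hypothetical.

```python
import math


def nni_z_score(observed_mean, n, area):
    """Z = (observed - expected) / standard error for the nearest-neighbor test."""
    expected = 0.5 * math.sqrt(area / n)            # assumed Clark-Evans expectation
    std_error = 0.26136 / math.sqrt(n * n / area)   # assumed Clark-Evans standard error
    return (observed_mean - expected) / std_error


# Example: 16 points in an area of 16 with an observed mean distance of 1.0.
z = nni_z_score(observed_mean=1.0, n=16, area=16.0)
print("significantly different from random at the 5% level" if abs(z) > 1.96 else "not significant")
```

A positive Z means the observed distances are larger than expected (dispersed); a negative Z means they are smaller (clustered).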

17
Calculating Test Statistic for Nearest Neighbor
Index
Where
18
RANDOM
UNIFORM
CLUSTERED
Z = 5.508
Z = -0.1515
Z = 5.855
Source: Lembro
19
Running in ArcGIS: Telecom and Software Companies
Result is very dependent on area of the region. There is an option to insert your own value. Default value is the minimum enclosing rectangle that encompasses all features.
20
Results
Scroll up the window to see all the results.
Note: Progress box continues to run until graphic is closed. Always close graphic window first.
Graphic is produced if the "Display output graphically" box is checked
21
Evaluating the Nearest Neighbor Index

Advantages
Unlike quadrats, the NNI considers distances
between points
No quadrat size problem
However, NNI has problems
Very dependent on the value of A, the area of the
study region. What boundary do we use for the
study area?
Minimum enclosing rectangle? (highly affected by
a few outliers)
Convex hull
Convex hull with buffer. What size buffer?
There is an adjustment for edge effects but
problems remain
Based on only the mean distance to the nearest
neighbor
Doesn't incorporate local variations, or clustering scale
Could have clustering locally in some areas, but not overall
Based on point location only and does not
incorporate magnitude of phenomena (quantity) at
that point

22
Ripley's K(d) Function

Ripley's K is calculated multiple times, each for a different distance band,
So it is represented as K(d): K is a function of distance, d
The distance bands are placed around every point
K(d) is the average density of points at each distance (d), divided by the average density of points in the entire area (n/a)
If the density is high for a particular band, then clustering is occurring at that distance

O&U pp. 135-137
Where s is a point, and C(si, d) is a circle of radius d, centered at si
Ripley B.D. 1976. The second order analysis of stationary point processes. Journal of Applied Probability 13: 255-266
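
The verbal definition of K(d) can be sketched in Python as below. This is a naive estimator with no edge correction, written for illustration only: the function names are hypothetical, and the normalization (area times the number of ordered pairs within d, divided by n squared) is one common form that follows the description above. Under a random pattern the expected value is pi * d^2.

```python
import math


def ripley_k(points, area, d):
    """Naive K(d): mean count of other points within d, divided by density n/area."""
    n = len(points)
    pairs_within = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs_within += 1
    # (pairs_within / n) is the average count per point; n / area is the density.
    return area * pairs_within / (n * n)


def expected_k(d):
    """Expected K(d) for a completely random pattern, ignoring edge effects."""
    return math.pi * d * d


# Two points one unit apart in an area of 4: no neighbors within 0.5, one each within 1.5.
pts = [(0.0, 0.0), (1.0, 0.0)]
for d in (0.5, 1.5):
    print(d, ripley_k(pts, 4.0, d), expected_k(d))
```

Evaluating `ripley_k` over a list of distances gives the K(d) curve; observed values above `expected_k(d)` suggest clustering at that distance.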
23
Not this simple with real data!!!
The low end (0.2) corresponds to distances within the cluster
The high end (0.6) corresponds to distances between the clusters
The distance bands are placed around every point. Note the big problem of edge effects from circles outside the study area.
(Figure labels: within, between, clustered, dispersed, begins flat)
Source: O'Sullivan & Unwin, p.
24
Running in ArcGIS: Telecom and Software Companies
Use 9 for tests; 99 takes a long time!
Weight field: number of points at that location
Distance bands
Result is very dependent on area of the region. Can insert your own value.
Again, study area has a big effect, so there are several options for this
25
Interpreting the Results
Not this simple with real data!!!
Observed
Expected
26
Results for 10,000 feet Bands
Distance bands: start 5,000 feet, size 10,000 feet
Expected assumes random pattern
Confidence band: 9 iterations (takes long time for 99!)
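
The idea behind a confidence band built from a small number of iterations can be sketched in Python. This is not ArcGIS's implementation: it assumes a rectangular study area, uses a naive K(d) without edge correction, and simply takes the lowest and highest K(d) across simulated random patterns. All names and parameter values are hypothetical.

```python
import math
import random


def ripley_k(points, area, d):
    """Naive K(d) without edge correction."""
    n = len(points)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= d
    )
    return area * pairs / (n * n)


def k_envelope(n, width, height, d, iterations=9, seed=0):
    """Lowest and highest K(d) over random patterns of n points in a rectangle."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    area = width * height
    values = []
    for _ in range(iterations):
        simulated = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        values.append(ripley_k(simulated, area, d))
    return min(values), max(values)


# A tight cluster of 30 points compared with the random envelope at d = 2.
cluster = [(5.0 + 0.01 * i, 5.0) for i in range(30)]
low, high = k_envelope(n=30, width=10.0, height=10.0, d=2.0)
observed = ripley_k(cluster, 100.0, 2.0)
print("clustered at this distance" if observed > high else "within or below the envelope")
```

More iterations give a more stable envelope at the cost of run time, which is the trade-off between 9 and 99 iterations noted on the slide.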
27
Results for 20,000 feet Bands
Distance bands: start 10,000 feet, size 20,000 feet
Also experiment with different region (study area) boundaries.
28
Plotting the Difference Between Observed and Expected K, versus Distance
Y field: DiffK = ObservedK - ExpectedK
X field: ExpectedK or HiConfEnv
Distance between clusters = 70,000 feet (about 13 miles or 20 km)
29
Problems with Ripley K(d)

Dependent on study area boundary (edge effect)
Circles go outside study area
Special adjustments are available (see O&U p. 148)
Try different options for boundary in ArcGIS
Affected by circle radii selected
Try different values
Each point has unit value: no magnitude or quantity
Weight field assumes X points at that location
e.g. if X = 3, then 3 points at that location

30
What have we learned?

How to measure and test if spatial patterns are
clustered or dispersed.

31
Why is this important?
We can measure and test --not just look and
guess!
Is it clustered?
That is science.
32
Not just GIS!

I taught these tools to senior undergraduate
geography students.
They are also used in Earth Management.
A former Henan University student and faculty
member (now at UT-Dallas) is using Ripleys K
function for research on urban forests.

33
Next Time

No classes next week
Next class will be Wednesday November 17
Topic
Spatial Autocorrelation
Unlike PPA, in Spatial Autocorrelation points have different magnitudes; there is an attribute variable.
