Title: Point Pattern Analysis
1Point Pattern Analysis
Point patterns fall between two extremes,
highly clustered and highly dispersed. Most
tests of point patterns compare the observed
pattern to CSR (complete spatial randomness).
The two measurements used to describe pattern are
- Density of points across the analysis area
- Distance between points within the analysis area
2Distance Methods
- Distance methods are becoming more common
- Do not require rasterization
- Easy to do with GIS
3Issues with Length Measurement
- Measurements in GIS are often made on horizontal
projections of objects
- Length and area may be substantially lower than
on a true three-dimensional surface
4Be careful
- Legs 0.25 and 1: Hypotenuse 1.03
- Legs 0.5 and 1: Hypotenuse 1.12
- Legs 1 and 1: Hypotenuse 1.41
- Legs 2 and 1: Hypotenuse 2.24
- Legs 3 and 1: Hypotenuse 3.16
- Not an issue if the gradient is uniform.
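The hypotenuse values can be checked with a short calculation. This is a minimal sketch that assumes the horizontal (map) distance is 1 and the other leg is the vertical rise; the function name `surface_length` is illustrative and not from the slides.

```python
import math

def surface_length(run, rise):
    """Length along a uniform slope, given horizontal run and vertical rise."""
    return math.hypot(run, rise)

# Assumed reading of the slide: horizontal run of 1, rise varies.
for rise in (0.25, 0.5, 1, 2, 3):
    length = surface_length(1.0, rise)
    shortfall = 100 * (1 - 1.0 / length)  # how much the map distance underestimates
    print(f"rise {rise}: surface length {length:.2f}, map distance is {shortfall:.0f}% short")
```

The steeper the slope, the more a horizontal projection underestimates the true surface length.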
5Manhattan Distance
- Distance is computed between two points (cells) by
moving only N-S or E-W.
Cell 2: 10, 20 (row, column)
Cell 1: 15, 15 (row, column)
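For the two cells on this slide, the Manhattan distance can be worked out as a sketch. It assumes unit cell size, so the result is in cells; the Euclidean value is included only for comparison.

```python
import math

def manhattan_distance(cell_a, cell_b):
    """Row steps plus column steps between two (row, column) cells."""
    (r1, c1), (r2, c2) = cell_a, cell_b
    return abs(r1 - r2) + abs(c1 - c2)

def euclidean_distance(cell_a, cell_b):
    """Straight-line distance between the same two cells."""
    (r1, c1), (r2, c2) = cell_a, cell_b
    return math.hypot(r1 - r2, c1 - c2)

cell_1 = (15, 15)
cell_2 = (10, 20)
print(manhattan_distance(cell_1, cell_2))            # 5 rows + 5 columns
print(round(euclidean_distance(cell_1, cell_2), 2))  # shorter, diagonal path
```

The Manhattan distance is never shorter than the Euclidean distance between the same two cells.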
6Distance Methods
- Nearest-Neighbor Distance (NND)
- Basic Statistics from Sample (Mean, SD)
- Compare to Expected Population Mean, SD
- Z statistic, R statistic
- Assumes a normal distribution to compute
expected values
- Global estimate of pattern
7Nearest Neighbor Distance
R < 1 (clustered)
R > 1 (dispersed)
8Nearest Neighbor Analysis
Nearest neighbor analysis examines the distance between each
point and the closest point to it, and then
compares these distances to the expected values for a random
sample of points from a CSR (complete spatial
randomness) pattern. CSR rests on
two assumptions: 1) all places are equally
likely to be the recipient of a case (event), and
2) all cases are located independently of one
another. The mean nearest neighbor distance is
$$\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i$$
where N is the number of points and d_i is the
nearest neighbor distance for point i.
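The mean nearest neighbor distance can be computed directly from coordinates. This is a brute-force sketch with hypothetical sample points; it compares every pair, which is fine for small samples but slow for large ones.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point."""
    result = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(math.hypot(xi - xj, yi - yj)
                      for j, (xj, yj) in enumerate(points) if j != i)
        result.append(nearest)
    return result

def mean_nnd(points):
    """Sum of the nearest neighbor distances d_i divided by N."""
    distances = nearest_neighbor_distances(points)
    return sum(distances) / len(distances)

# Hypothetical coordinates, for illustration only.
points = [(0, 0), (3, 4), (3, 0), (10, 10)]
print(nearest_neighbor_distances(points))
print(round(mean_nnd(points), 3))
```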
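9The expected value of the nearest neighbor
distance in a random pattern
$$E(d_i) = 0.5\sqrt{\frac{A}{N}} + \left(0.0514 + \frac{0.041}{\sqrt{N}}\right)\frac{B}{N}$$
where A is the area and B is the length of the perimeter of the
study area. The variance
$$\operatorname{Var}(d) = 0.070\,\frac{A}{N^{2}} + 0.037\,B\sqrt{\frac{A}{N^{5}}}$$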
10And the Z statistic
$$Z = \frac{\bar{d} - E(d_i)}{\sqrt{\operatorname{Var}(d)}}$$
where \bar{d} is the observed mean nearest neighbor distance.
This approach assumes
- The study area is a regular rectangle or square. Equations for the expected
mean and variance cannot be used for irregularly
shaped study areas.
- Area (A) is calculated by
(Xmax - Xmin) x (Ymax - Ymin), where these
represent the study area boundaries.
R statistic = Observed mean d / Expected d
R = 1 random, R → 0 clustered, R → 2 uniform
11Study area  A  B  E(di)    Var(d)
2 x 0.5       1  5  0.05277  8.85 x 10^-6
1 x 1         1  4  0.05222  8.48 x 10^-6
2 x 2         4  8  0.10444  -
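The values on this slide can be reproduced with the edge-corrected formulas commonly attributed to Donnelly (1978). In this sketch, N = 100 points is an assumption: it is not stated on the slide, but it is the value that reproduces the listed numbers. The function names are illustrative.

```python
import math

def expected_nnd(area, perimeter, n):
    """Expected mean nearest neighbor distance under CSR, with perimeter correction."""
    return 0.5 * math.sqrt(area / n) + (0.0514 + 0.041 / math.sqrt(n)) * perimeter / n

def variance_nnd(area, perimeter, n):
    """Variance of the mean nearest neighbor distance under CSR."""
    return 0.070 * area / n**2 + 0.037 * perimeter * math.sqrt(area / n**5)

def r_and_z(observed_mean, area, perimeter, n):
    """R = observed / expected; Z = (observed - expected) / standard error."""
    expected = expected_nnd(area, perimeter, n)
    std_error = math.sqrt(variance_nnd(area, perimeter, n))
    return observed_mean / expected, (observed_mean - expected) / std_error

N = 100  # assumed number of points (not given on the slide)
for label, area, perimeter in (("2 x 0.5", 1, 5), ("1 x 1", 1, 4), ("2 x 2", 4, 8)):
    e = expected_nnd(area, perimeter, N)
    v = variance_nnd(area, perimeter, N)
    print(f"{label}: E(d) = {e:.5f}, Var(d) = {v:.3g}")
```

The 2 x 0.5 and 1 x 1 rectangles have the same area, but the longer perimeter gives a larger expected distance. Study-area shape therefore matters even when the area is fixed.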
12Wilderness Campsites
Real world study areas are complex and violate
the assumptions of most equations for expected
values.
13- Solution
- Simulate randomization using Monte Carlo
methods.
- Compare simulated distribution to observed.
- If possible, use the true area and perimeter
to compute the expected value.
- Software that does not ask for area/perimeter
or a shapefile of the study area will assume a
rectangle.
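The Monte Carlo idea can be sketched as follows. The example simulates inside a rectangle to stay short; for a real, irregular study area the random points would have to be drawn inside the true boundary (for example by rejecting points that fall outside a polygon). The observed pattern, the seed, and the function names are all hypothetical.

```python
import math
import random

def mean_nnd(points):
    """Mean distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def simulate_csr_mean_nnd(n, width, height, n_sims, rng):
    """Mean NND for n_sims random patterns of n points in a width x height rectangle."""
    sims = []
    for _ in range(n_sims):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(mean_nnd(pts))
    return sims

def pseudo_p_clustered(observed, sims):
    """One-sided pseudo p-value: how often CSR gives a mean NND this small or smaller."""
    as_small = sum(1 for s in sims if s <= observed)
    return (as_small + 1) / (len(sims) + 1)

rng = random.Random(42)
# Hypothetical observed pattern: 30 points crowded into one corner of a unit square.
observed_points = [(rng.uniform(0, 0.1), rng.uniform(0, 0.1)) for _ in range(30)]
observed = mean_nnd(observed_points)
sims = simulate_csr_mean_nnd(30, 1.0, 1.0, 99, rng)
p = pseudo_p_clustered(observed, sims)
print(f"observed {observed:.4f}, simulated {min(sims):.4f} to {max(sims):.4f}, p = {p:.2f}")
```

Because the observed statistic is compared with simulated values rather than with a formula, no assumption about a rectangular study area is built into the test itself, only into how the random points are generated.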
14Auto Theft Within City
15Auto Theft - Downtown
16Auto Theft - Neighborhood
17Nearest Neighbor - ArcMap
Method     Area        Observed NND  Expected NND  Z Score  P-Value
Euclidean  1668437432  278           729           -33.1    0.000
Euclidean  943000863   278           548           -26.3    0.000
Manhattan  1668437432  399           729           -28.6    0.000
Manhattan  57850697    227           235           -1.1     0.284
Manhattan  10743164    251           223           1.8      0.071
18Distance Methods
- G Function (Revised NND)
- Same measurements as NND
- Analyzed using a CDF; compare to Expected
- Expected CDF can be Theoretical or
Generated (E(G(d)))
- d statistic (max distance between Observed
and Expected CDF)
- Can test d statistic with the Kolmogorov-Smirnov Test
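The observed G function and the d statistic can be sketched in a few lines. The theoretical CDF used here is the standard CSR result G(d) = 1 - exp(-λπd²), which ignores edge effects. The grid pattern is hypothetical, and the function names are illustrative.

```python
import math

def nn_distances(points):
    """Distance from each point to its closest other point."""
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]

def g_observed(nnd, d):
    """Share of points whose nearest neighbor is within distance d."""
    return sum(1 for x in nnd if x <= d) / len(nnd)

def g_expected_csr(d, density):
    """Theoretical G(d) under CSR, no edge correction."""
    return 1.0 - math.exp(-density * math.pi * d * d)

def d_statistic(nnd, density):
    """Largest gap between the observed step CDF and the theoretical CDF."""
    ordered = sorted(nnd)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        expected = g_expected_csr(x, density)
        # Check the step CDF just before and just after each jump.
        gap = max(gap, abs((i + 1) / n - expected), abs(i / n - expected))
    return gap

# Hypothetical evenly spaced pattern: a 5 x 5 grid with spacing 1 in a 5 x 5 area.
grid = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
nnd = nn_distances(grid)
density = len(grid) / 25.0
print(g_observed(nnd, 0.5), g_observed(nnd, 1.0))
print(round(d_statistic(nnd, density), 3))
```

For an evenly spaced pattern the observed G stays at 0 until the grid spacing and then jumps to 1, so the gap from the CSR curve is large. Nearest neighbor distances are not independent of one another, so a Kolmogorov-Smirnov p-value for this d is approximate.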
19G Function
1/12 = 0.083
From O'Sullivan and Unwin, Geographic Information
Analysis
20Distance Methods
- F Function
- Similar to G, but measures the distance from each of a
set of random points to its nearest observed point
- Also uses a CDF and the same Expected Distribution
Function as G
- Harder to interpret!
- I have never used it. I also do not like it!
- Both G and F Functions have edge and area
problems. Better to use a generated expected
distribution.
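For comparison with G, the F function can be sketched the same way. Distances are measured from randomly placed locations to the nearest observed point. The clustered events, the unit-square study area, and the function names are hypothetical, and the CSR curve again ignores edge effects.

```python
import math
import random

def empty_space_distances(events, width, height, n_locations, rng):
    """Distance from each random location to its nearest observed event."""
    distances = []
    for _ in range(n_locations):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        distances.append(min(math.hypot(x - ex, y - ey) for ex, ey in events))
    return distances

def f_observed(distances, d):
    """Share of random locations with an event within distance d."""
    return sum(1 for x in distances if x <= d) / len(distances)

def f_expected_csr(d, density):
    """Same theoretical CDF as for G under CSR."""
    return 1.0 - math.exp(-density * math.pi * d * d)

rng = random.Random(7)
# Hypothetical clustered events: 20 points in one corner of a unit square.
events = [(rng.uniform(0, 0.1), rng.uniform(0, 0.1)) for _ in range(20)]
distances = empty_space_distances(events, 1.0, 1.0, 500, rng)
density = len(events) / 1.0
print(round(f_observed(distances, 0.1), 3), round(f_expected_csr(0.1, density), 3))
```

With clustered events most random locations sit in empty space, so the observed F rises more slowly than expected. This is the opposite of G, which rises faster than expected for clustered points.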
21G and F Functions
Clustered
Evenly Spaced
From O'Sullivan and Unwin, Geographic Information
Analysis
22Distance Methods
- K Function (Ripley, 1976)
- Statistic is based on the sum of all the
points within a distance d of each observation
$$K(d) = \frac{1}{\lambda n}\sum_{i=1}^{n} \#\left[S \in C(s_i, d)\right]$$
- where n = number of points
- λ = Density (n/area)
- C(si, d) = a circle with radius d centered at
point si
23Ripley's K counts the number of points found within
distance r of each point. The maximum r
should be about ½ the shorter dimension of the
extent of the input points. If K increases quicker than
expected, the points are clustered. If K
increases slower than expected, the points are
dispersed.
24Distance Methods
- Expected K(d)
- $E(K(d)) = \lambda \pi d^2 / \lambda = \pi d^2$
- $L(d) = (K(d)/\pi)^{1/2}$
- $E(L(d)) = d$
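A minimal sketch of K(d) and L(d) without edge correction follows. The four points and the 2 x 2 study area are hypothetical. Real software usually applies an edge correction, which this example leaves out.

```python
import math

def ripley_k(points, d, area):
    """K(d): count of other points within d of each point, scaled by density and n."""
    n = len(points)
    density = n / area
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                count += 1
    return count / (density * n)

def ripley_l(points, d, area):
    """L(d) = sqrt(K(d) / pi); under CSR the expected value is d."""
    return math.sqrt(ripley_k(points, d, area) / math.pi)

# Hypothetical pattern: corners of a unit square inside a 2 x 2 study area.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
area = 4.0
for d in (0.5, 1.0, 1.5):
    print(d, ripley_k(points, d, area), round(ripley_l(points, d, area), 3))
```

Plotting L(d) against d makes the comparison easy to read, because the CSR expectation is the straight line L(d) = d.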
25K Function
Clustered
L(d)
Evenly Spaced
L(d)
From O'Sullivan and Unwin, Geographic Information
Analysis
27There are a total of 32 points in this analysis.
New Mexico is approximately 500km per side, so
we will set our maximum study distance at 250km.
We choose 25 increments so that we will calculate
the observed L(d) and confidence interval for
every 10km. 99 permutations are used for
creating the confidence envelope in order to test
the null hypothesis at approximately the α = 0.01
level.
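The setup described here can be sketched as a simulation. The example treats the study area as a 500 km square (the slide's own approximation), uses no edge correction, and generates the random patterns with an arbitrary seed. It illustrates the procedure and does not reproduce the software actually used.

```python
import math
import random

def l_values(points, distances, area):
    """Observed L(d) at each distance, with no edge correction."""
    n = len(points)
    pair_d = [math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
              for i in range(n) for j in range(i + 1, n)]
    values = []
    for d in distances:
        ordered_pairs = 2 * sum(1 for p in pair_d if p <= d)  # each pair counts for both points
        k = area * ordered_pairs / (n * n)                    # K(d) = count / (density * n)
        values.append(math.sqrt(k / math.pi))
    return values

def l_envelope(n, side, distances, n_perm, rng):
    """Minimum and maximum simulated L(d) over n_perm random patterns."""
    area = side * side
    lows = [math.inf] * len(distances)
    highs = [-math.inf] * len(distances)
    for _ in range(n_perm):
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        for k, value in enumerate(l_values(pts, distances, area)):
            lows[k] = min(lows[k], value)
            highs[k] = max(highs[k], value)
    return lows, highs

distances = [10 * step for step in range(1, 26)]  # 10, 20, ..., 250 km
lows, highs = l_envelope(32, 500.0, distances, 99, random.Random(0))
alpha = 1 / (99 + 1)  # an observed value outside all 99 simulations
print(alpha)
for d, lo, hi in list(zip(distances, lows, highs))[:5]:
    print(f"{d} km: envelope {lo:.1f} to {hi:.1f}")
```

An observed L(d) below the minimum simulated value suggests dispersion at that distance, and one above the maximum suggests clustering.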
28Figure 2 Graph of K-Function Results
29A graph of the K-function results is shown below.
The observed L(d) is 0 for 10km and 20km because
the closest pair of points is approximately 29km
apart. At a distance of 30km, the observed L(d)
falls within the generated confidence interval.
However, for distances between 40km and 90km the
observed L(d) lies outside of the confidence
interval. This indicates that we can reject the
null hypothesis of CSR. Also, since the observed
L(d) is less than the Minimum L(d), this implies
that we have a statistically significant
dispersed or regular distribution of points.