Spatial Correspondence of Areal Distributions

Slides: 31

Provided by: lem76

Transcript and Presenter's Notes

1
Spatial Correspondence of Areal Distributions

Quadrat and nearest-neighbor analysis deal with a
single distribution of points
Often, we want to compare the distributions of two
or more variables
The coefficient of areal correspondence and the
chi-square statistic perform this task

2
Coefficient of Areal Correspondence

Simple measure of the extent to which two
distributions correspond to one another
Compare wheat farming to areas of minimal
rainfall
Based on the approach of overlay analysis

3
Overlay Analysis

Two distributions of interest are mapped at the
same scale and the outline of one is overlaid
with the other

4
Coefficient of Areal Correspondence

CAC is the ratio of the area where the two
distributions overlap to the total area covered
by either distribution (with the overlap counted
only once)
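
As a rough illustration of the ratio described above, the following Python sketch treats each distribution as a set of equal-area grid cells, so that areas reduce to cell counts. The function name `areal_correspondence` and the cell coordinates are hypothetical and are not taken from the slides.

```python
def areal_correspondence(region_a, region_b):
    """Coefficient of areal correspondence for two sets of grid cells.

    Assumes every cell has the same area, so areas reduce to cell counts.
    """
    overlap = region_a & region_b      # cells covered by both distributions
    combined = region_a | region_b     # cells covered by at least one
    if not combined:
        raise ValueError("neither distribution covers any cells")
    return len(overlap) / len(combined)


# Hypothetical (row, column) cells on a common grid.
wheat = {(0, 0), (0, 1), (1, 0), (1, 1)}
low_rain = {(0, 1), (1, 1), (0, 2), (1, 2)}

# Two shared cells out of six covered cells.
print(areal_correspondence(wheat, low_rain))
```

Identical regions give a value of 1 and regions with no shared cells give 0, which matches the limits stated for CAC.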

6
Result of CAC

Where there is no correspondence, CAC is equal to
0
Where there is total correspondence, CAC is equal
to 1
CAC provides a simple measure of the extent of
spatial association between two distributions,
but it cannot provide any information about the
statistical significance of the relationship

7
Resemblance Matrix

Proposed by Court (1970)
Advantages over CAC
Limits are -1 to +1, with a perfect negative
correspondence given a value of -1
Sampling distribution is roughly normal, so you
can test for statistical significance

8
Chi-Square Statistic

Measures the strength of association between two
distributions
Class Example
Relationship between wheat yield and
precipitation
Two maps showing high and low yields and high and
low precipitation

9
[Map labels: High Precip., High Yield]
10
[Map labels: High Precip., High Yield]
11
Chi-Square

By combining both distributions on one map we can
better understand the relationship between the
two distributions
In this example we are using a grid
The finer the grid, the more precise the
measurement
Four possibilities exist
Low rainfall, low yield
Low rainfall, high yield
High rainfall, low yield
High rainfall, high yield

12
Chi-Square

Record the total number of occurrences into a
table of observed frequencies

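[Table: observed frequencies cross-classified by
wheat yield (High, Low) and precipitation (High,
Low); the cell counts are not in the transcript]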
13
Chi-Square

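Create a table of expected frequencies using
probability (for example, the share of high-rain
cells times the number of high-yield cells)
Expected frequency = (row total x column total) /
table total

[Tables: observed and expected frequencies
cross-classified by wheat yield (High, Low) and
precipitation (High, Low); the cell counts are not
in the transcript]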
14
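Compute Chi-Square

Chi-square is the sum, over all cells, of
(Observed - Expected)^2 / Expected
Therefore, in our example we have

[Table: observed and expected frequencies for the
High/High, High/Low, Low/High, and Low/Low cells;
the values are not in the transcript]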
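
Since the worked numbers did not survive in the transcript, here is a minimal Python sketch of the two steps above: building the expected frequencies from the row and column totals, then summing (Observed - Expected)^2 / Expected. The counts are invented for illustration, and the function names are hypothetical. The sketch assumes every row and column total is greater than zero.

```python
def expected_frequencies(observed):
    """Expected count per cell: row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical grid-cell counts (not from the slides).
# Rows: precipitation (high, low); columns: wheat yield (high, low).
observed = [[30, 10],
            [10, 30]]

print(expected_frequencies(observed))
print(chi_square(observed))
```

With these invented counts every expected frequency is 20, so each cell contributes 10^2 / 20 = 5 and the statistic is 20.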
15
Interpreting Chi Square

Zero indicates no relationship
Large numbers indicate stronger relationship
Or, a table of significance can be consulted to
determine if the specific value is statistically
significant
The fact that we have shown that there is a
correlation between variables does NOT mean that
we have found out anything about WHY this is so.
In our analysis we might state our assumptions as
to why this is so, but we would need to perform
other analyses to show causation.

16
If you don't have Chi-Square values

Yule's Q
Value of Yule's Q always lies between -1 and +1
Value of 0 indicates no relationship
Value of +1 indicates a positive relationship
Value of -1 indicates a negative relationship
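
The slide does not give the formula, so the sketch below uses the standard definition of Yule's Q for a 2 x 2 table, Q = (ad - bc) / (ad + bc), where a and d are the matching cells (high/high and low/low) and b and c are the mismatched cells. The counts and the function name `yules_q` are illustrative assumptions.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    denominator = a * d + b * c
    if denominator == 0:
        raise ValueError("Q is undefined when a*d + b*c is zero")
    return (a * d - b * c) / denominator


# Hypothetical counts: a = high rain/high yield, b = high rain/low yield,
# c = low rain/high yield, d = low rain/low yield.
print(yules_q(30, 10, 10, 30))
```

A table with all counts on the matching diagonal gives +1, one with all counts on the mismatched diagonal gives -1, and equal counts in every cell give 0.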

17
Analysis of Election 2000

Polygon to Polygon
Point to Polygon

18
Assessing Our Cultural Divide: Results from the
2000 Presidential Election
Arthur J. Lembo, Jr., Ph.D., Cornell University
Paul Overberg, USA Today

ABSTRACT
Although the 2000 Presidential election was one of
the closest in recent history, many commentators
noted that the voting patterns appeared to exhibit
a cultural divide, with urban areas voting for Al
Gore and rural areas voting for George W. Bush.
Because most of the comments are based on a
subjective view of the county voting patterns,
this project attempts to provide a quantifiable
measure of the voting patterns exhibited during
the 2000 election. Specifically, we were
interested in determining whether a statistically
significant clustering pattern existed based on
county-wide results, and whether each candidate
won their assumed cultural association (Gore:
urban; Bush: rural). To test these hypotheses, two
separate spatial analysis methods were applied to
county-wide voting patterns within the United
States. The first method used a principle of
spatial autocorrelation called join count analysis
to determine whether voting patterns exhibited
evidence of spatial clustering. The second method
used map overlay to determine the likelihood of
cities falling within either Bush or Gore
counties.

ANALYSIS OF SPATIAL AUTOCORRELATION: JOIN COUNT
ANALYSIS
Join count analysis is a spatial autocorrelation
method that evaluates the statistical significance
of clustering among neighboring polygons. Based
upon the total number of counties won by each
candidate (Gore 588; Bush 2214), the expected
number of adjacent counties that voted for the
same candidate (i.e. two adjacent counties voting
for Bush) was computed. In addition, the actual
number of adjacent counties that voted for the
same candidate was computed using spatial analysis
techniques in ArcView GIS. The results were as
follows.

Table 1. Expected vs. Actual Joins of Adjacent
Counties Voting for the Same Candidate

                    Expected   Actual
Gore/Gore Joins          438      879
Bush/Bush Joins         5516     6253

Assuming an independent random process, we
computed the z-score, or number of standard
deviations away from the mean, for each
candidate's number of joins (Z Gore/Gore = 15.47;
Z Bush/Bush = 8.75). A purely random sample drawn
from a population whose true mean is 0 would, at
the 95% confidence level, fall within a z-score
range of ±1.96. Both values were well above 1.96,
indicating significant positive spatial
autocorrelation. Therefore, the join count
analysis showed that clustering exists within the
county voting patterns. Inferred from this
analysis is the observation that regionalized
voting patterns existed in the 2000 Presidential
Election.

ANALYSIS OF SPATIAL CORRESPONDENCE: OVERLAY
ANALYSIS
A second analysis was used to determine the
likelihood of a county with urban areas voting for
either candidate. For this study, four categories
were evaluated: counties with small cities (under
50,000), medium-sized cities (50,000 to 75,000),
large cities (greater than 75,000), and no cities.
Based on the percentage of counties won by each
candidate (Gore 22%; Bush 78%), we computed the
random probability that a city would fall within a
Bush county or a Gore county. This probability
allowed us to determine the expected number of
cities that would be located within Gore counties
or Bush counties. The actual number of cities
located in a Gore county or Bush county was
determined using overlay analysis with ArcView.
Similar to the previous example, z-scores were
computed for each of the categories [equation not
in transcript], where O is the observed number of
cities falling within a county, E is the expected
number of cities falling within a county, p is the
probability of a city falling in a Bush county, q
is the probability of a city falling in a Gore
county, and n is the total number of cities.

Table 2. Cities Falling Inside a County Won by
Either Bush or Gore

                    Expected       Observed          Z
                   Gore   Bush    Gore   Bush    Gore   Bush
Large (>75K)         66    238     184    119     267    272
Medium (50-75K)      54    196     147     98     470     55
Small            [row incomplete in transcript: 2030 1236 4,998 3]
No City             427   1588     347   1690      18     29

As previously stated, a purely random sample drawn
from a population whose true mean is 0 would, at
the 95% confidence level, fall within a z-score
range of ±1.96. Table 2 indicates that each of the
z-score values exceeds 1.96. This implies a
significant correlation between votes for Al Gore
and counties with cities, and between votes for
George W. Bush and counties without cities (rural
areas).

CONCLUSION
This analysis provided quantifiable evidence that
positive spatial autocorrelation (clustering) of
voting patterns existed during the 2000
Presidential Election. The analysis also showed a
high statistical correlation between urbanized
areas and county votes for Al Gore. Further
analysis is necessary to better understand
causation (e.g. ethnicity, income, age); however,
both analyses indicate that geographic regions
(i.e. urban areas) may have played a large role in
the vote determination for Election 2000.

Figure 1. Examples of Cities in Relation to the
Distribution of Counties. These examples from New
York and Minnesota show that although Bush (in
red) won a majority of the counties, the cities
appear clustered primarily within the few counties
that Gore won (in blue). For example, in
Minnesota, a majority of the cities exist within
Hennepin County, while in New York, virtually
every county Gore won has a city within its
border.

Data provided courtesy of Election Data Services
and USA Today.
19
Election 2000 Results

Join Count Analysis
Table 1. Expected vs. Actual Joins of Adjacent
Counties Voting for the Same Candidate

                    Expected   Actual
Gore/Gore Joins          438      879
Bush/Bush Joins         5516     6253

Z Gore/Gore = 15.47; Z Bush/Bush = 8.75

Overlay Analysis
Table 2. Cities Falling Inside a County Won by
Either Bush or Gore

                    Expected       Observed          Z
                   Gore   Bush    Gore   Bush    Gore   Bush
Large (>75K)         66    238     184    119     267    272
Medium (50-75K)      54    196     147     98     470     55
Small            [row incomplete in transcript: 2030 1236 4,998 3]
No City             427   1588     347   1690      18     29

Not mutually exclusive from large cities. We must
account for this
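
The z-scores in Tables 1 and 2 compare an observed count with the count expected under a random process, but the equation itself is missing from the transcript. The sketch below uses the usual normal approximation to a binomial count, z = (O - E) / sqrt(n p q) with E = n p and q = 1 - p. This is an assumption and may differ from the formula the authors used. The counts are invented and are not the table values.

```python
import math


def binomial_z(observed, n, p):
    """z-score of an observed count against a binomial expectation.

    Assumed form: z = (O - E) / sqrt(n * p * q), with E = n * p and q = 1 - p.
    """
    expected = n * p
    std_dev = math.sqrt(n * p * (1 - p))
    return (observed - expected) / std_dev


# Hypothetical example: 100 cities, each with a 0.5 chance of falling
# in a given candidate's counties, and 60 observed there.
z = binomial_z(observed=60, n=100, p=0.5)
print(z, abs(z) > 1.96)
```

Here the expected count is 50 and the standard deviation is 5, so the observed count lies two standard deviations above expectation, just past the ±1.96 threshold quoted for the 95% confidence level.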
20
Election 2000 Results

There was obvious spatial autocorrelation in the
way people voted. That is, Bush counties and
Gore counties were highly clustered
Also, there was a very high correlation between
urbanized counties and votes for Gore, and between
non-urbanized counties and votes for Bush

21
Analysis of Environmental Justice

Point in Polygon Analysis
By
Greg Thorhaug, CSS620 project, Spring 2001

24
Erie Chi-Squared
25
Summary

Spatial data analysis is possible through basic
statistical methods
More in-depth analysis is possible using spatial
statistics
GIS software may be used to prepare data for
statistical analysis
Spatial data analysis techniques provide a
powerful tool for analyzing GIS data, and enable
users to solve creative problems

26
Cross Tabulation

Assume we have two 9-cell land cover maps, one
from 1980 and one from 2000, with three
categories: A, B, and C.
The resulting cross tabulation provides a
pixel-by-pixel comparison of the interpreted land
cover types for the two dates. For the upper left
cell, the 1980 land cover was A, and the 2000 land
cover was also A. Therefore, this is a match
between the 1980 data and the 2000 data. However,
in the lower right cell the 1980 data indicated a
value of C, while the 2000 value was B. This is
not a match, and would indicate an error between
the two sources.
We can now quantify the results in a matrix as
shown below. This matrix is often called a
confusion matrix.

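[Figure: two 3 x 3 grids, Ground Reference Data
and Interpreted Land Cover Data, are cross
tabulated cell by cell into a Cross Tabulated
Grid. Its nine cells read AA, AA, BA, BA, BB, BB,
BC, CB, CC; their positions are not recoverable
from the transcript.]

                          Ground Reference
                          A    B    C
Map Classification   A    2    2    0
                     B    0    2    1
                     C    0    1    1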
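
The cross tabulation step can be sketched in Python as follows. The two grids are hypothetical layouts: the transcript does not preserve the cell positions, so these were chosen only to agree with the text (upper left cell A/A, lower right cell C in 1980 and B in 2000) and with the counts used in the accuracy calculations on the later slides. The names `map_1980`, `map_2000`, and `cross_tabulate` are illustrative.

```python
from collections import Counter

CLASSES = ["A", "B", "C"]


def cross_tabulate(first_grid, second_grid):
    """Count cell-by-cell class pairs; rows follow first_grid, columns second_grid."""
    pairs = Counter(
        (first, second)
        for first_row, second_row in zip(first_grid, second_grid)
        for first, second in zip(first_row, second_row)
    )
    return [[pairs[(r, c)] for c in CLASSES] for r in CLASSES]


# Assumed layouts (cell positions are not in the transcript).
map_1980 = [["A", "A", "A"],
            ["B", "B", "B"],
            ["A", "C", "C"]]
map_2000 = [["A", "B", "B"],
            ["B", "B", "C"],
            ["A", "C", "B"]]

matrix = cross_tabulate(map_1980, map_2000)
for label, row in zip(CLASSES, matrix):
    print(label, row)
```

Each row of the result holds the cells given one class in 1980, split by their class in 2000, so matches fall on the diagonal.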
27
Confusion Matrix

The matrix on the right compares the two
hypothetical data sets: the 1980 data set and the
2000 data set.
As an example, geographic features that were
classified as A on the map in 1980, and were
still A in 2000, appear in the upper left cell of
the matrix with the value 2 (two pixels met this
criterion). This means that 2 units mapped as A
actually are A. The same reading applies to
classifications B and C along the diagonal.
But there may have been times where the 1980
value was A and the 2000 value was B. In this
case, the 2 in the top row of the matrix, under
column B, says that there are 2 units that were A
in 1980 but are B in 2000.
We can begin to add these numbers up by adding an
additional row and column. But what do these
numbers tell us?

                          Ground Reference
                          A    B    C
Map Classification   A    2    2    0
                     B    0    2    1
                     C    0    1    1

                          Ground Reference
                          A    B    C    Total
Map Classification   A    2    2    0    4
                     B    0    2    1    3
                     C    0    1    1    2
                 Total    2    5    2    9

28
Comparing the maps

The bottom row tells us that there were two cells
that were A, five cells that were B, and two
cells that were C. The rightmost column tells us
that we mapped four cells as A, three cells as B,
and two cells as C. Adding up the diagonal cells
says there were 5 cells where we actually got it
right.
So, the overall map agreement is
total cells on the diagonal / total number of
cells
(2 + 2 + 1) / (2 + 2 + 0 + 0 + 2 + 1 + 0 + 1 + 1)
= 5/9 = 0.56 (56%) agreement

                          Ground Reference
                          A    B    C    Total
Map Classification   A    2    2    0    4
                     B    0    2    1    3
                     C    0    1    1    2
                 Total    2    5    2    9
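
The totals and the overall agreement described on this slide can be checked with a few lines of Python. The matrix values are read from the slide text (rows are the map classification, columns the ground reference, in the order A, B, C); the function name `overall_agreement` is a hypothetical choice.

```python
# Confusion matrix from the slide: rows = map classification (A, B, C),
# columns = ground reference (A, B, C).
matrix = [[2, 2, 0],
          [0, 2, 1],
          [0, 1, 1]]


def overall_agreement(matrix):
    """Cells on the diagonal divided by all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


row_totals = [sum(row) for row in matrix]           # cells mapped as A, B, C
column_totals = [sum(col) for col in zip(*matrix)]  # cells that were A, B, C

print(row_totals, column_totals)
print(round(overall_agreement(matrix), 2))
```

The row totals reproduce the rightmost column of the table and the column totals reproduce its bottom row.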

29
Other Accuracy Assessment

The total correspondence of our example is 56%.
But that only tells us part of the story. What
if we were really interested in classification B?
Were there changes in classification B? Even
here, there are two different ways of
interpreting that question:
If I were interested in mapping all the areas of
B, how well did I get them all? This is called
the map Producer's Accuracy. That is, how well
did we produce a map of classification B?
If I were to use the map to find B, how
successful would I be? This is called the map
User's Accuracy. That is, how much confidence
should a user of the map have in a given
classification?
To compute the map user's accuracy, we divide the
number correct within a row by the total for the
whole row. Staying with our example of
classification B:
We had two cells where B was correct. However, we
mapped three cells as B (in other words, we
incorrectly called a cell B when it should have
been C). Therefore, we have
2 correct B values / 3 total values = 0.67 user's
accuracy.
This means that if we were to use this map and
look for classification B, we would be correct
67% of the time.
To compute the map producer's accuracy, we divide
the number correct within a column by the total
for the whole column. Staying with our example of
classification B:
We had two cells where B was correct. However,
the ground reference shows five cells that should
have been B. Therefore, we have
2 correct B values / 5 total values that should
be B = 0.4 producer's accuracy.
This means that the map captured only 40% of all
the Bs that were out there.

                          Ground Reference
                          A    B    C    Total
Map Classification   A    2    2    0    4
                     B    0    2    1    3
                     C    0    1    1    2
                 Total    2    5    2    9
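
The two accuracy measures for classification B can be sketched in Python as follows, again with rows as the map classification and columns as the ground reference. The function names `users_accuracy` and `producers_accuracy` are illustrative, not taken from the slides.

```python
CLASSES = ["A", "B", "C"]

# Rows = map classification, columns = ground reference.
matrix = [[2, 2, 0],
          [0, 2, 1],
          [0, 1, 1]]


def users_accuracy(matrix, k):
    """Correct cells for class k divided by all cells mapped as k (row total)."""
    return matrix[k][k] / sum(matrix[k])


def producers_accuracy(matrix, k):
    """Correct cells for class k divided by all cells that truly are k (column total)."""
    return matrix[k][k] / sum(row[k] for row in matrix)


b = CLASSES.index("B")
print(round(users_accuracy(matrix, b), 2))
print(round(producers_accuracy(matrix, b), 2))
```

Passing the index of another class evaluates the same two ratios for that class.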

30
User and Producer Accuracy

To test your understanding of all this, compute
the user's and producer's accuracy for
classifications A and C.
This also gives us some indication of the nature
of the errors. For instance, it appears that we
confused classification A with classification B
(on two occasions we said that B was A). By
understanding the nature of the errors, perhaps
we can go back, look over our process, and
correct for that mistake.