Title: Spatial Data Analysis Areas II
1 Spatial Data Analysis, Areas II: Exploratory Spatial Data Analysis
Ifgi, Muenster, Fall School 2005
- Gilberto Câmara
- INPE, Brazil
2 Data-Driven Approaches
- Exploratory spatial data analysis (ESDA)
- Point pattern analysis
- Indices of spatial association
- The first common aspect is to compare the observed pattern in the data (e.g., locations in point pattern analysis, values at locations in spatial autocorrelation) to one in which space is irrelevant.
- The second common aspect is that the spatial pattern, spatial structure, or form of the spatial dependence is derived from the data only.
3 Spatial Autocorrelation
- Complicated name, simple concept...
- Expresses the amount of spatial dependence
- How much proximity matters in spatial data
- Correlation is the key notion
- It indicates how much two properties vary together
- Correlation in space
- Is a variable in a location correlated with its values in nearby places?
- Spatial autocorrelation
4 Positive, High Correlation
5 Sometimes we need to transform the data
Scatter plots: (a) Y versus PORC3_NR (percentage of large farms, by number); (b) log10 Y versus log10(PORC3_NR).
Predicted versus observed plots: (a) model with untransformed variables, R² = 0.61; (b) Model 7, R² = 0.85.
6 Log versus linear correlation
- $Y = aX$: linear correlation
- $Y = X^a$, or $\log Y = a \log X$: log correlation
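A minimal sketch of why the log transform helps, assuming hypothetical data that follow a power law exactly. The helper `pearson` and the sample values are illustrative and not taken from the slides.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical data following Y = X^a with a = 2.5 (an assumption).
xs = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [x ** 2.5 for x in xs]

# Correlation on the raw scale versus the log-log scale.
r_linear = pearson(xs, ys)
r_log = pearson([math.log10(x) for x in xs], [math.log10(y) for y in ys])
print(round(r_linear, 3), round(r_log, 3))
```

On the log-log scale the relation becomes a straight line, so the correlation rises to essentially 1, while on the raw scale it stays visibly lower.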
7 No Correlation
8 Is this data spatially autocorrelated?
9 Spatial Randomness
- Null hypothesis: no spatial autocorrelation
- Spatial randomness
- values observed at a location do not depend on values observed at neighboring locations
- the observed spatial pattern of values is as likely as any other spatial pattern
- the location of values may be altered without affecting the information content of the data
10 Random or Clustered?
Columbus homicide data (source: Luc Anselin)
11 Random or Clustered?
Columbus homicide data (source: Luc Anselin)
12 Random or Clustered?
Columbus homicide data (source: Luc Anselin)
13 Exploratory Spatial Data Analysis
- Visualization of spatial data
- Global Indicators of Spatial Autocorrelation
- Local Indicators of Spatial Autocorrelation (LISA)
14 Visualization of Area Patterns
- Grouping
- Equal intervals
- Quantiles
- Standard deviation
- Be careful!
- Color maps can lead to wrong interpretations
Breast cancer in England (1985-1989)
Source: Bailey and Gatrell, 1995
15 Equal-Interval Visualization
- Defined by maximum and minimum values.
- Shows data dispersion.
- Outliers can mask differences.
Source: Bailey and Gatrell, 1995
16 Quantiles
- Each group has the same number of elements
- Based on ordering (ranking) the values
- e.g., best 25% and worst 25%
Source: Bailey and Gatrell, 1995
17 Standard Deviations
- Dispersion around a mean value
- Breaks at 1 stdev, 1/2 stdev
- Shows the statistical behaviour
- Best suited to normally distributed data
Source: Bailey and Gatrell, 1995
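The three grouping schemes (equal intervals, quantiles, standard deviations) can be sketched as follows. This is an illustration under stated assumptions: the function names and the sample `rates` are hypothetical, and the quantile rule shown is one simple convention among several.

```python
import statistics

def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]

def quantile_breaks(values, k):
    """Upper bounds of k classes holding (nearly) equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k + 1)]

def stdev_breaks(values, step=1.0, n_steps=2):
    """Boundaries at the mean plus or minus multiples of step * stdev."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean + step * sd * i for i in range(-n_steps, n_steps + 1)]

# Hypothetical rates for 12 areas, with one outlier (40).
rates = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]

ei = equal_interval_breaks(rates, 4)   # [11.5, 21.0, 30.5, 40.0]
qt = quantile_breaks(rates, 4)         # [4, 7, 10, 40]
sd = stdev_breaks(rates)               # five boundaries centred on the mean
print(ei, qt, sd)
```

With equal intervals the single outlier pushes 10 of the 12 areas into the first class, which is the masking effect noted for that scheme. The quantile breaks put three areas in each class regardless of the outlier.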
18 Visualization
Source: Bailey and Gatrell, 1995
19 Visualization
Source: Bailey and Gatrell, 1995
20 Spatial Proximity Matrix
- Matrix W (n x n), where each element $w_{ij}$ represents a measure of nearness between $O_i$ and $O_j$
- Criteria
- $w_{ij} = 1$ if $O_i$ touches $O_j$
- $w_{ij} = 1$ if distance($O_i$, $O_j$) < h
- $w_{ij} = 0$ otherwise
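A minimal sketch of the distance criterion, assuming each area is represented by a centroid. The function name `distance_weights` and the coordinates are hypothetical. The "touches" criterion would need polygon geometry and is not shown.

```python
import math

def distance_weights(centroids, h):
    """Binary W: w[i][j] = 1 if i != j and distance(i, j) < h, else 0."""
    n = len(centroids)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(centroids[i], centroids[j]) < h:
                w[i][j] = 1
    return w

# Hypothetical centroids of four areas laid out along a line.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
W = distance_weights(centroids, h=1.5)
for row in W:
    print(row)
```

With h = 1.5 each area is linked only to its immediate neighbours along the line, so W is symmetric with a zero diagonal.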
21 Moving Averages
- Local smoothing of attribute values
$$\hat{\mu}_i = \frac{\sum_{j=1}^{n} w_{ij}\, y_j}{\sum_{j=1}^{n} w_{ij}}$$
- where
- $w_{ij}$ is the spatial weights matrix.
- $y_i$ is the attribute value for each area.
- n is the number of areas
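As an illustration, the local mean of each area can be computed as the weighted average of its neighbours' values, sum_j w_ij * y_j / sum_j w_ij. The function name and data below are hypothetical. Because this W has a zero diagonal, an area's own value is excluded from its local mean, which is an assumption of this sketch.

```python
def local_means(w, y):
    """Local mean per area: sum_j w_ij * y_j / sum_j w_ij."""
    means = []
    for row in w:
        total = sum(row)
        if total == 0:
            means.append(float("nan"))  # area without neighbours
        else:
            means.append(sum(wij * yj for wij, yj in zip(row, y)) / total)
    return means

# Four areas in a line; each is a neighbour of the adjacent ones.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [10.0, 20.0, 60.0, 30.0]

mu = local_means(W, y)
print(mu)  # [20.0, 35.0, 25.0, 60.0]
```

The third area has value 60 but a local mean of 25, the kind of large gap between original value and local mean that flags a spatial transition.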
22 Moving Averages
- Proportion of population aged 70 or older, São
Paulo, 1991
23 Moving Averages using Bar Graphs
Regions where there is a large difference between the original value and the local mean indicate places of spatial transition.
[Bar graph legend: Attribute, Local mean]
24 Moran Scatterplot: Values versus Local Means
Q1 (values +, means +) and Q2 (values -, means -): locations of positive spatial association ("I'm similar to my neighbours").
Q3 (values +, means -) and Q4 (values -, means +): locations of negative spatial association ("I'm different from my neighbours").
[Scatterplot: horizontal axis z (values), vertical axis Wz (local means); Q1 upper right, Q2 lower left, Q3 lower right, Q4 upper left]
25 Moran Scatterplot Map
São Paulo: old-aged population
[Scatterplot quadrants, z on the horizontal axis and Wz on the vertical axis: Q1 = HH, Q2 = LL, Q3 = HL, Q4 = LH]
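The quadrant labels of the Moran scatterplot (HH, LL, HL, LH) can be assigned with a short routine like the one below. It is a sketch under assumptions: z is the deviation from the mean, Wz is the average of z over each area's neighbours, and the function name and data are hypothetical.

```python
def moran_quadrants(w, y):
    """Label each area HH, LL, HL or LH from z_i and its neighbours' mean Wz_i."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    labels = []
    for i, row in enumerate(w):
        total = sum(row)
        wz = sum(wij * zj for wij, zj in zip(row, z)) / total if total else 0.0
        if z[i] >= 0:
            labels.append("HH" if wz >= 0 else "HL")  # Q1 or Q3
        else:
            labels.append("LL" if wz < 0 else "LH")   # Q2 or Q4
    return labels

# Five hypothetical areas in a line, neighbours are adjacent areas.
n = 5
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
y = [60.0, 10.0, 12.0, 50.0, 55.0]

labels = moran_quadrants(W, y)
print(labels)  # ['HL', 'LL', 'LL', 'HL', 'HH']
```

Mapping these labels back onto the areas gives a Moran scatterplot map.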
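26 Indicators of spatial autocorrelation
Global: $\Gamma = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, a_{ij}$
Local: $\Gamma_i = \sum_{j=1}^{n} w_{ij}\, a_{ij}$
where
- $w_{ij}$: spatial proximity between i and j
- $a_{ij}$: measured relation between an object and its neighbours
27 Indicators of spatial autocorrelation
Choices of $a_{ij}$ in $\Gamma = \sum_i \sum_j w_{ij}\, a_{ij}$ and $\Gamma_i = \sum_j w_{ij}\, a_{ij}$, with $z_i = x_i - \bar{x}$:
- Moran (covariance): $a_{ij} = z_i z_j = (x_i - \bar{x})(x_j - \bar{x})$
- Geary (variance): $a_{ij} = (z_i - z_j)^2 = (x_i - x_j)^2$
- G or G* (moving averages): $a_{ij} = x_j$ or $x_i x_j$ ($z_j$ or $z_i z_j$)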
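28 Global Indicators of Spatial Autocorrelation
- Moran's I
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
- where
- n: number of areas,
- $y_i$: attribute value in area i,
- $\bar{y}$: mean value in the study region,
- $w_{ij}$: spatial weights matrix.
- How to interpret the above equation?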
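One way to get a feel for Moran's I is to compute it on small cases. The sketch below assumes the standard form I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z_i the deviation from the mean and S0 the sum of all weights. The function name and test data are hypothetical.

```python
def morans_i(w, y):
    """Global Moran's I for weights matrix w (list of rows) and values y."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in w)                      # sum of all weights
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

# Four areas in a line; neighbours are adjacent areas.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

i_trend = morans_i(W, [1.0, 2.0, 3.0, 4.0])   # similar neighbours: positive I
i_alt = morans_i(W, [1.0, 4.0, 1.0, 4.0])     # alternating values: negative I
print(round(i_trend, 4), round(i_alt, 4))     # 0.3333 -1.0
```

A smooth trend gives a positive value, while a checkerboard-like alternation gives the strongest negative value on this layout.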
29 Global Indicators of Spatial Autocorrelation
- Similar to the traditional correlation calculation, but restricted to spatial neighbours
- Values of I typically range from -1 to 1.
- -1: negative spatial autocorrelation
- 0: no spatial autocorrelation
- 1: positive spatial autocorrelation
- For the old-age population in São Paulo, I = 0.45
- Is this significant?
30 Randomization Strategy
- Empirical distribution function
- permute the arrangement of objects
- associate values with locations
- associate locations with values
- recompute indicators
- Obtain a distribution
- Compare observed $\Gamma$ to the distribution
- Pseudo-significance: p = (T + 1) / (M + 1)
- M: number of permutations
- T: number of times the permuted $\Gamma$ is greater than or equal to the observed $\Gamma$
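The randomization strategy can be sketched for Moran's I as follows. This is an illustration, not a reference implementation: the function names, the seed, and the data are hypothetical, and the one-sided count (permuted value greater than or equal to observed) assumes we are testing for positive autocorrelation.

```python
import random

def morans_i(w, y):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)

def pseudo_p_value(w, y, m=999, seed=0):
    """Return (observed I, pseudo-significance p = (T + 1) / (M + 1))."""
    rng = random.Random(seed)
    observed = morans_i(w, y)
    values = list(y)
    t = 0
    for _ in range(m):
        rng.shuffle(values)                  # reassign values to locations
        if morans_i(w, values) >= observed:  # permuted at least as extreme
            t += 1
    return observed, (t + 1) / (m + 1)

# Ten hypothetical areas in a line with a smooth trend in the attribute.
n = 10
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
y = [float(v) for v in range(1, n + 1)]

observed, p = pseudo_p_value(W, y)
print(round(observed, 4), p)
```

With M = 999 the smallest attainable pseudo p-value is 1/1000, which is why the number of permutations bounds the reported precision.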
31 Random or Clustered?
[Figure: simulated distribution of Moran's I, with the observed value at the extreme]
- Testing Moran's I
- Permute the spatial values 999 times
- Obtain a probability distribution
- Locate the real value in the distribution
- In this case, I = 0.45 (very significant!)
32 Pros and cons of randomization
- Advantages
- non-parametric
- no distributional assumptions
- easy to compute
- easy to interpret
- Disadvantages
- sample specific
- no generalization to population
- precision of pseudo-significance is arbitrary
- 1/(99+1) yields 0.01, and 1/(999+1) yields 0.001
- sensitive to random number generator
33 Random or Clustered?
Moran's I = -0.003
Moran's I = 0.486
Columbus homicide data (source: Luc Anselin)
34 Spatial Analysis
"What distinguishes spatial statistical data analysis is that its main focus is on inquiring about spatial patterns of places and values, the spatial association between them and the systematic variation of the phenomenon in different locations." (Anselin, 1992)
35 Local Indicators of Spatial Autocorrelation (LISA)
- Moran's I is global
- What if we want to find out the spatial correlation of each area?
- Use a local indicator
- Compares the local value to that of its neighbours
36 Local and Global Analysis
- Global
- one statistic to summarize pattern
- Clustering
- Homogeneity
- Local
- location-specific statistics
- clusters
- heterogeneity
37 LISA Definition (Anselin 1995)
- LISA satisfies two requirements
- indicate significant spatial clustering for each location
- sum of LISA proportional to a global indicator of spatial association
- LISA forms of global statistics
- local Moran, local Geary, local Gamma
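The second requirement can be checked numerically for the local Moran. The sketch assumes one common form, I_i = z_i * sum_j w_ij z_j / m2 with m2 = sum_j z_j^2 / n. Under that assumption the local values sum to S0 times the global Moran's I, where S0 is the sum of all weights. Function names and data are hypothetical.

```python
def deviations(y):
    """Deviations of y from its mean."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]

def local_moran(w, y):
    """Local Moran I_i = z_i * sum_j w_ij z_j / m2 (assumed form)."""
    n = len(y)
    z = deviations(y)
    m2 = sum(v * v for v in z) / n
    return [z[i] * sum(w[i][j] * z[j] for j in range(n)) / m2 for i in range(n)]

def global_moran(w, y):
    """Global Moran's I with the n / S0 normalisation."""
    n = len(y)
    z = deviations(y)
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)

W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
y = [1.0, 2.0, 3.0, 4.0]

local = local_moran(W, y)
s0 = sum(sum(row) for row in W)
print(sum(local), s0 * global_moran(W, y))  # both are approximately 2.0
```

The proportionality constant depends on the normalisation chosen. With row-standardised weights S0 equals n, so the global I is the average of the local values.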
38 Use of LISA
- Identify hot spots
- significant local clusters in the absence of global autocorrelation
- some complications in the presence of global autocorrelation (extra heterogeneity)
- significant local outliers
- high surrounded by low and vice versa
- Indicate local instability
- local deviations from global pattern of spatial autocorrelation
39 Local Indicators of Spatial Autocorrelation (LISA)
LISAs enable a quantitative expression of the spatial distribution of values
Distribution characteristics
- concentrations
- persistences
- transitions
40 Local Indicators of Spatial Autocorrelation (LISA)
Local Moran
G index
where $w_{ij}$ is the spatial weight for objects i and j
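For reference, commonly used forms of these two indices are given below. They are assumptions here rather than the slide's exact notation: $y_i$ is the attribute value in area i, $z_i = y_i - \bar{y}$, and $w_{ij}(d)$ is a weight defined by a distance band d.

```latex
% Local Moran (one common normalisation)
I_i = \frac{z_i \sum_{j} w_{ij}\, z_j}{\frac{1}{n} \sum_{j} z_j^2},
\qquad z_i = y_i - \bar{y}

% Getis-Ord G index (observation i excluded from both sums)
G_i(d) = \frac{\sum_{j \neq i} w_{ij}(d)\, y_j}{\sum_{j \neq i} y_j}
```

Other texts scale the local Moran differently, for example by the variance of y. The choice changes the values but not which areas stand out.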
41 Distance Statistics for Local Spatial Association
- Getis-Ord Gi and Gi*
- one statistic for each location
- contiguity as distance bands, $w_{ij}(d)$
- Gi statistic
- does not include observation i
- Gi* statistic
- includes observation i in the sum
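The difference between the two statistics can be sketched with the raw ratio forms, Gi = sum over j != i of w_ij x_j divided by the sum over j != i of x_j, and Gi* with observation i included in both sums (taking w_ii = 1). The function name and data are hypothetical. Tabulated results usually report a standardised Z value, which this sketch does not compute.

```python
def getis_ord_g(w, x, include_self=False):
    """Raw Getis-Ord ratios: Gi (self excluded) or Gi* (self included)."""
    n = len(x)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i and not include_self:
                continue                       # Gi leaves out observation i
            wij = 1.0 if j == i else w[i][j]   # Gi* assumes w_ii = 1
            num += wij * x[j]
            den += x[j]
        out.append(num / den)
    return out

# Four areas in a line; the attribute must be positive for these ratios.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [10.0, 20.0, 60.0, 30.0]

g = getis_ord_g(W, x)
g_star = getis_ord_g(W, x, include_self=True)
print([round(v, 4) for v in g])       # [0.1818, 0.7, 0.8333, 0.6667]
print([round(v, 4) for v in g_star])  # [0.25, 0.75, 0.9167, 0.75]
```

Each ratio is the share of the total attribute found within the neighbourhood of area i, so high values point to concentrations of high values.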
42 Interpretation of Gi Statistics
- Local spatial association
- positive: clusters of high values
- negative: clusters of low values
- Inference
- randomization
- permutation
- Visualization
- map of locations with significant Gi or Gi*
43 Spatial weights matrix
44 Local Indicators of Spatial Autocorrelation (LISA)
- How can we know if a LISA value means anything?
- Use permutation to construct a probability distribution
- Keep one region fixed and change everybody else's place
- Produce a map showing those areas whose LISA values are different from the rest (LISA map).
- Statistical significance
- Not significant
- Significant at 95% (1.96σ), 99% (2.58σ) and 99.9% (3.29σ).
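The permutation idea for a single region (hold its value fixed, shuffle all the others) can be sketched as below. The local Moran form, the one-sided count, the function names, the seed and the data are all assumptions of this illustration.

```python
import random

def local_moran_i(w, z, i, m2):
    """Local Moran for area i: z_i * sum_j w_ij z_j / m2."""
    return z[i] * sum(wij * zj for wij, zj in zip(w[i], z)) / m2

def local_pseudo_p(w, y, i, m=999, seed=0):
    """Pseudo p-value for area i: keep y[i] in place, shuffle the rest."""
    rng = random.Random(seed)
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    m2 = sum(v * v for v in z) / n
    observed = local_moran_i(w, z, i, m2)
    others = z[:i] + z[i + 1:]
    t = 0
    for _ in range(m):
        rng.shuffle(others)
        permuted = others[:i] + [z[i]] + others[i:]  # area i stays fixed
        if local_moran_i(w, permuted, i, m2) >= observed:
            t += 1
    return observed, (t + 1) / (m + 1)

# Twelve hypothetical areas in a line with a cluster of high values at the end.
n = 12
W = [[1 if abs(a - b) == 1 else 0 for b in range(n)] for a in range(n)]
y = [1.0] * 9 + [9.0, 10.0, 9.0]

observed, p = local_pseudo_p(W, y, i=10)
print(round(observed, 3), p)
```

Repeating this for every area and mapping those with small p gives a significance map of the kind described for the LISA map.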
45 LISA Map for old age in São Paulo
46 Data
Proportion of jobs per local population in greater São Paulo
47 Local Moran significance map
48 Spatial Analysis II - LISA
Normalized Gi map classified by standard deviations
49 Spatial Analysis II - LISA
Moran scatterplot map
50 Interpretation and Limitations
- Most important
- assessing lack of spatial randomness
- suggests significant spatial structure
- Multivariate association
- univariate spatial autocorrelation may result from
- multivariate association
- scale mismatch
- need to control for other variables: spatial regression
- LISA clusters and hot spots
- suggest interesting locations
- do not explain
51 LISAs in São Paulo
Population density
population
52 Local Moran: percentage of rented houses
Region            LM
SÉ                 3.579176
LAPA              -1.555046
SANTA CECÍLIA      3.128312
REPÚBLICA          5.159141
BOM RETIRO         2.788280
BRÁS               2.360710
53 Gi: percentage of rented housing
Gi_qi_aluguel
Region            Z          PROB
CONSOLAÇÃO         3.7602    0.0002
SÉ                 3.6893    0.0002
CAMPO LIMPO       -3.0400    0.0024
JD. ÂNGELA        -2.7608    0.0058
54 Gi: percentage of rented housing
GI_qi_aluguel
IBGE              Z          PROB
SÉ                 4.1501    0.0000
REPÚBLICA          4.0764    0.0000
JD. SÃO LUIS      -3.2949    0.0010
CAMPO LIMPO       -3.1093    0.0019
55 Local Moran: no income
IBGE              LOCAL MORAN
CONSOLAÇÃO         3.372106
JD. PAULISTA       5.925623
ITAIM PAULISTA     4.440743
JD. HELENA         3.608146
MOEMA              4.258492
56 Gi statistic: no income
Gi_qi_srend
Region            Z          PROB
VILA CURUÇÁ        4.1568    0.0000
SÃO MIGUEL         3.7919    0.0001
MOEMA             -4.6730    0.0000
ITAIM BIBI        -4.5468    0.0000
57 Gi statistic: no income
Region            Z          PROB
VILA CURUÇÁ        4.3464    0.0000
ITAIM PAULISTA     4.0837    0.0000
MOEMA             -5.0586    0.0000
ITAIM BIBI        -4.8552    0.0000