Mining for Spatial Patterns

About This Presentation

Title:

Mining for Spatial Patterns

Description:

Shashi Shekhar Department of Computer Science University of Minnesota http://www.cs.umn.edu/~shekhar Collaborators: V. Kumar, G. Karypis, C.T. Lu, W. Wu, Y. Huang, V ... – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Mining for Spatial Patterns

1
Mining for Spatial Patterns

Shashi Shekhar
Department of Computer Science
University of Minnesota, http://www.cs.umn.edu/~shekhar
Collaborators: V. Kumar, G. Karypis, C.T. Lu, W. Wu, Y. Huang, V. Raju, P. Zhang, P. Tan, M. Steinbach
This work was partially funded by NASA and Army
High Performance Computing Center

2
Spatial Data Mining (SDM) - Examples

Historical Examples
London Asiatic Cholera 1854 (Griffith)
Dental health and fluoride in water, Colorado, early 1900s
Current Examples
Cancer clusters (CDC), spread of disease (e.g. West Nile virus)
Crime hotspots (NIJ CML, police patrol planning)
Environmental justice (EPA), fair lending practices
Upcoming Applications: Location-aware services
Defense: Sensor networks, mobile ad-hoc networks
Civilian: Mortgage PMI determination based on location

3
Army Relevance of SDM

Strategic
Predicting global hot spots (FORMID)
Army land endangered species vs. training and
war games
Search for local trends in massive simulation
data
Critical infrastructure defense (threat assessment)
Tactical
Inferring enemy tactics (e.g. flank attack) from blobology
Detection of lost ammunition dumps (Dr. Radhakrishnan)
Operational
Interpretation of maps: map matching (locating oneself on a map)
Identifying terrain features, e.g. ravines, valleys, ridges, etc.
Locating the enemy (e.g. sniper in a haystack, sensor networks)
Avoiding friendly fire

4
Spatial Data Mining (SDM) - Definition

Search for implicit, interesting patterns in geo-spatial data
Ex. Reconnaissance, vector maps (NIMA, TEC), GPS, sensor networks
Data Mining vs. Statistics
Primary vs. Secondary analysis
Global vs. local trends
Spatial Data Mining vs. Data Mining
Spatial Autocorrelation
Continuous vs. Discrete data types

5
Background

Spatial Data Mining
Spatial statistics in Geology, Regional Economics
NSF workshop on GIS and DM (3/99)
NSF workshop on spatial data analysis (5/02)
Spatial patterns
Spatial outliers
Location prediction
Associations, colocations
Hotspots, clustering, trends

6
Framework

Two approaches to mining spatial data
1. Pick spatial features; use classical DM methods
2. Use novel data mining techniques
Our Approach
Define the problem: capture special needs
Explore data using maps and other visualization
Try reusing classical DM methods
If classical DM methods perform poorly, try new methods
Evaluate chosen methods rigorously
Tune performance if needed

7
Spatial Association Rule

Citation: Symp. on Spatial Databases 2001
Problem: Given a set of boolean spatial features, find subsets of co-located features, e.g. (fire, drought, vegetation)
Data: continuous space, partition not natural, no reference feature
Classical data mining approach: association rules
But, Look Ma! No Transactions!!! No support measure!
Approach: Work with continuous data without transactionizing it!
confidence = Pr[fire at s | drought in N(s) and vegetation in N(s)]
support = cardinality of the spatial join of instances of fire, drought, dry veg.
participation = min. fraction of instances of a feature in the join result
new algorithm using spatial joins and apriori_gen filters

8
Event Definition

Convert the time series into sequence of events
at each spatial location.

9
Interesting Association Patterns

Use domain knowledge to eliminate uninteresting
patterns.
A pattern is less interesting if it occurs at
random locations.
Approach
Partition the land area into distinct groups
(e.g., based on land-cover type).
For each pattern, find the regions for which the
pattern can be applied.
If the pattern occurs mostly in a certain group
of land areas, then it is potentially
interesting.
If the pattern occurs frequently in all groups of
land areas, then it is less interesting.
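
The land-cover test in the approach above can be sketched in Python. This is only an illustration: the function names, the sample counts, and the `ratio=2.0` cutoff are assumptions and do not come from the presentation. The sketch computes, for each land-cover group, the fraction of locations where a pattern occurs, and flags the pattern when one group clearly dominates.

```python
def group_support(hits, sizes):
    """Fraction of locations in each land-cover group where the pattern occurs.

    hits:  {group: number of locations in the group showing the pattern}
    sizes: {group: total number of locations in the group}
    """
    return {g: hits.get(g, 0) / sizes[g] for g in sizes}


def is_potentially_interesting(support, ratio=2.0):
    """Flag a pattern whose best group beats the runner-up by `ratio`.

    `ratio` is an assumed cutoff chosen for illustration only.
    """
    rates = sorted(support.values(), reverse=True)
    if len(rates) < 2 or rates[0] == 0:
        return False
    return rates[0] >= ratio * rates[1]


# Hypothetical counts for three land-cover groups.
sizes = {"grassland": 50, "forest": 50, "cropland": 40}
concentrated = group_support({"grassland": 40, "forest": 5, "cropland": 2}, sizes)
spread = group_support({"grassland": 25, "forest": 24, "cropland": 20}, sizes)
```

Here `concentrated` is flagged because the pattern occurs mostly in grassland, while `spread` is not because it occurs at similar rates in every group.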

10
Association Rules

Intra-zone non-sequential Patterns

Region corresponds to semi-arid grasslands, a type of vegetation that is able to take advantage of high precipitation more quickly than forests.
Hypothesis: FPAR-Hi events could be related to unusual precipitation conditions.

11
Co-location
Can you find co-location patterns from the
following sample dataset?
Answers and
12
Co-location
Spatial Co-location: a set of features frequently co-located
Given:
A set T of K boolean spatial feature types, $T = \{f_1, f_2, \ldots, f_K\}$
A set P of N locations, $P = \{p_1, \ldots, p_N\}$, in a spatial framework S, where each $p_i \in P$ is of some spatial feature in T
A neighbor relation R over locations in S
Find: $T_c$ = subsets of T frequently co-located
Objective: Correctness, Completeness, Efficiency
Constraints:
R is symmetric and reflexive
Monotonic prevalence measure
Reference Feature Centric
Window Centric
Event Centric
13
Co-location
Comparison with association rules

                                   Association rules      Co-location rules
underlying space                   discrete sets          continuous space
item-types                         item-types             events / Boolean spatial features
collections                        transactions           neighborhoods
prevalence measure                 support                participation index
conditional probability measure    Pr[A in T | B in T]    Pr[A in N(L) | B at L]

Participation index
1. Participation ratio $pr(f_i, c)$ of feature $f_i$ in co-location $c = \{f_1, f_2, \ldots, f_k\}$: fraction of instances of $f_i$ with features $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby
2. Participation index = $\min_i \, pr(f_i, c)$
Algorithm: Hybrid Co-location Miner
14
Spatial Co-location Patterns

Dataset

Spatial features A, B, C and their instances
Possible associations are (A, B), (B, C), etc.
Neighbor relationship includes the following pairs:
A1, B1
A2, B1
A2, B2
B1, C1
B2, C2

15
Spatial Co-location Patterns

Partition approach [Yasuhiko, KDD 2001]
Support not well defined, i.e. not independent of execution trace
Has a fast heuristic which is hard to analyze for correctness/completeness

Dataset

Spatial features A, B, C and their instances
Support: (A,B) = 1, (B,C) = 2
Support: (A,B) = 2, (B,C) = 2
16
Spatial Co-location Patterns

Dataset

Reference feature approach [Han, SSD 95]
C as reference feature to get transactions
Transactions: (B1), (B2)
Support (A,B) ? from Apriori algorithm
Note: Neighbor relationship includes the following pairs:
A1, B1
A2, B1
A2, B2
B1, C1
B2, C2

Spatial feature A,B, C, and their instances
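
A small Python sketch of the reference-feature idea, using the neighbor pairs listed on this slide. The helper names and the convention that an instance id starts with its feature letter are assumptions made for illustration, not the cited method's actual implementation. It builds one transaction per instance of the reference feature C and counts transactions that contain both an A and a B.

```python
# Neighbor pairs as listed on the slide.
NEIGHBOR_PAIRS = [("A1", "B1"), ("A2", "B1"), ("A2", "B2"),
                  ("B1", "C1"), ("B2", "C2")]


def feature_of(instance):
    """Assumed id convention: the leading letter is the feature type."""
    return instance[0]


def reference_transactions(pairs, reference):
    """One transaction per reference instance, holding its neighbors."""
    transactions = {}
    for u, v in pairs:
        # The neighbor relation is symmetric, so check both directions.
        for ref, other in ((u, v), (v, u)):
            if feature_of(ref) == reference:
                transactions.setdefault(ref, set()).add(other)
    return transactions


transactions = reference_transactions(NEIGHBOR_PAIRS, "C")

# Count transactions that contain instances of both A and B.
ab_count = sum(
    1 for items in transactions.values()
    if {"A", "B"} <= {feature_of(i) for i in items}
)
```

With C as the reference feature, the transactions are (B1) and (B2). No transaction contains an A instance, so in this sketch the count for (A, B) comes out as zero even though A and B instances are neighbors in the data.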
17
Spatial Co-location Patterns

Our approach (Event Centric)
Neighborhood instead of transactions
Spatial join on neighbor relationship
Support -> Prevalence
Participation index = min. p_ratio
P_ratio(A, (A,B)) = fraction of instances of A participating in join(A, B, neighbor)
Examples
Support(A,B) = min(2/2, 3/3) = 1
Support(B,C) = min(2/2, 2/2) = 1

Dataset

Spatial features A, B, C and their instances
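
The participation index for a pair of features can be sketched as follows. The neighbor pairs are the ones listed on the dataset slides. The instance lists are an assumption (only the instances named in those pairs), because the dataset figure is not recoverable from the transcript, so individual ratios may differ from the fractions shown on the slide. The function names are illustrative.

```python
NEIGHBOR_PAIRS = [("A1", "B1"), ("A2", "B1"), ("A2", "B2"),
                  ("B1", "C1"), ("B2", "C2")]

# Assumed instance lists: only the instances named in the pairs above.
INSTANCES = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]}


def spatial_join(pairs, f1, f2):
    """Neighbor pairs (x, y) where x is an instance of f1 and y of f2."""
    joined = []
    for u, v in pairs:
        if u[0] == f1 and v[0] == f2:
            joined.append((u, v))
        elif u[0] == f2 and v[0] == f1:
            joined.append((v, u))
    return joined


def participation_ratio(pairs, instances, feature, other):
    """Fraction of `feature` instances that appear in the join with `other`."""
    participating = {x for x, _ in spatial_join(pairs, feature, other)}
    return len(participating) / len(instances[feature])


def participation_index(pairs, instances, f1, f2):
    """Minimum of the two participation ratios for the pair (f1, f2)."""
    return min(participation_ratio(pairs, instances, f1, f2),
               participation_ratio(pairs, instances, f2, f1))


pi_ab = participation_index(NEIGHBOR_PAIRS, INSTANCES, "A", "B")
pi_bc = participation_index(NEIGHBOR_PAIRS, INSTANCES, "B", "C")
pi_ac = participation_index(NEIGHBOR_PAIRS, INSTANCES, "A", "C")
```

Under these assumptions (A, B) and (B, C) both get index 1, and (A, C) gets 0 because no A instance neighbors a C instance. Unlike the partition approach, the result does not depend on the order in which pairs are processed.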
18
Spatial Co-location Patterns

Partition approach
Support: (A,B) = 1, (B,C) = 2
Support: (A,B) = 2, (B,C) = 2

Reference feature approach
C as reference feature
Transactions: (B1), (B2)
Support (A,B) ?

Our approach
Support(A,B) = min(2/2, 3/3) = 1
Support(B,C) = min(2/2, 2/2) = 1

Dataset
Spatial features A, B, C and their instances
19
Spatial Outliers

Spatial Outlier: A data point that is extreme relative to its neighbors
Case Study: traffic stations different from neighbors [SIGKDD 2001, JIDA 2002]
Data: space-time plot, distribution of f(x), S(x)
Distribution of base attribute:
spatially smooth
frequency distribution over value domain: normal
Classical test: Pr[item in population] is low
Q? Distribution of diff(f(x), neighborhood agg{f(x)})
Insight: this statistic is distributed normally!
Test: (z-score on the statistic) > 2
Performance: spatial join, clustering methods

20
Spatial Outlier Detection
Given:
A spatial graph $G = \{V, E\}$
A neighbor relationship (K neighbors)
An attribute function $f: V \to R$
An aggregation function: $R^k \to R$
A comparison function
Confidence level threshold $\theta$
Statistic test function $ST: R \to \{T, F\}$
Find: $O = \{v_i \mid v_i \in V,\ v_i \text{ is a spatial outlier}\}$
Objective:
Correctness: the attribute value of $v_i$ is extreme compared with its neighbors
Computational efficiency
Constraints:
... and ST are algebraic aggregate functions of ... and ...
Computation cost dominated by I/O operations
21
Spatial Outlier Detection
Spatial Outlier Detection Test
1. Choice of Spatial Statistic: $S(x) = f(x) - E_{y \in N(x)}(f(y))$
Theorem: S(x) is normally distributed if f(x) is normally distributed
2. Test for Outlier Detection: $\left| (S(x) - \mu_s) / \sigma_s \right| > \theta$
Hypothesis: I/O cost determined by clustering efficiency
Figure panels: f(x), S(x)
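
A minimal Python sketch of the test on this slide: S(x) is taken as the difference between f(x) and the mean of f over the neighbors of x, and a location is flagged when the absolute z-score of S(x) exceeds a threshold. The sample readings, the chain-shaped neighbor graph, and the default threshold of 2 (matching the "z-score > 2" rule on the earlier slide) are illustrative assumptions, and this in-memory version ignores the I/O and clustering concerns the slides discuss.

```python
from statistics import mean, pstdev


def neighborhood_statistic(values, neighbors):
    """S(x) = f(x) minus the mean of f over the neighbors of x."""
    return {x: values[x] - mean(values[y] for y in neighbors[x])
            for x in values}


def spatial_outliers(values, neighbors, theta=2.0):
    """Locations whose |z-score| of S(x) exceeds theta."""
    s = neighborhood_statistic(values, neighbors)
    mu, sigma = mean(s.values()), pstdev(s.values())
    if sigma == 0:
        return []  # all locations agree with their neighbors equally
    return [x for x in s if abs((s[x] - mu) / sigma) > theta]


# Hypothetical readings from ten stations along a road; station 5 is off.
readings = [10, 11, 10, 12, 11, 50, 11, 10, 12, 11]
values = dict(enumerate(readings))
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(readings)]
             for i in values}

outliers = spatial_outliers(values, neighbors)
```

Station 5 is the only one flagged. Its neighbors also get large negative S values because their neighborhood mean includes the extreme reading, but their z-scores stay below the threshold.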
22
Graphical Spatial Tests
Moran Scatter Plot
Original Data
Variogram Cloud
23
A Unified Approach: Spatial Outliers

Tests: quantitative, graphical
Results:
Computation: spatial self-join
Tests: algebraic functions of join
Join predicate: neighbor relations
I/O cost = f(clustering efficiency)
Our algorithm is I/O-efficient for algebraic tests

Scatter Plot
Original Data
Our Approach
24
Spatial Outlier Detection
Results:
1. CCAM achieves higher clustering efficiency (CE)
2. CCAM has lower I/O cost
3. High CE => low I/O cost
4. Big page => high CE
Figure: I/O cost and CE value for Z-order, CCAM, and Cell-Tree
25
Location Prediction

Citations: IEEE Trans. on Multimedia 2002, SIAM DM Conf. 2001, SIGKDD DMKD 2000
Problem: predict nesting sites in marshes given vegetation, water depth, distance to edge, etc.
Data: maps of nests and attributes
spatially clustered nests, spatially smooth attributes
Classical methods: logistic regression, decision trees, Bayesian classifier
but the independence assumption is violated! Misses auto-correlation!
Spatial auto-regression (SAR), Markov random field Bayesian classifier
Open issue: spatial accuracy vs. classification accuracy
Open issue: performance - SAR learning is slow!

26
Location Prediction
Given:
1. Spatial framework
2. Explanatory functions
3. A dependent class
4. A family of function mappings
Find: Classification model
Objective: maximize classification_accuracy
Constraints: Spatial autocorrelation exists

Nest locations
Distance to open water
Water depth
Vegetation durability
27
Motivation and Framework
28
Spatial AutoRegression (SAR)

Spatial Autoregression Model (SAR)
$y = \rho W y + X\beta + \epsilon$
W models neighborhood relationships
$\rho$ models strength of spatial dependencies
$\epsilon$ error vector
Solutions
$\rho$ and $\beta$ can be estimated using ML or Bayesian stat.
e.g., the spatial econometrics package uses a Bayesian approach with the sampling-based Markov Chain Monte Carlo (MCMC) method.
Likelihood-based estimation requires $O(n^3)$ ops.
Other alternatives: divide and conquer, sparse matrix, LU decomposition, etc.
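
To make the SAR model concrete, the sketch below computes y for given parameters by iterating y <- rho*W*y + X*beta + eps. It is not an estimation procedure, and it is not one of the solution methods listed above. It assumes W is a row-normalized neighborhood matrix in which every site has at least one neighbor, and |rho| < 1, so that the iteration converges. The function names and the four-site example are illustrative.

```python
def row_normalize(adjacency):
    """Scale each row of a 0/1 adjacency matrix so it sums to 1.

    Assumes every site has at least one neighbor (no all-zero rows).
    """
    return [[a / sum(row) for a in row] for row in adjacency]


def sar_response(W, X, beta, eps, rho, iterations=200):
    """Solve y = rho*W*y + X*beta + eps by fixed-point iteration."""
    n = len(W)
    # Non-spatial part of the model: X*beta + eps.
    base = [sum(x * b for x, b in zip(X[i], beta)) + eps[i] for i in range(n)]
    y = base[:]
    for _ in range(iterations):
        y = [rho * sum(W[i][j] * y[j] for j in range(n)) + base[i]
             for i in range(n)]
    return y


# Four sites on a line, one explanatory attribute, no noise.
W = row_normalize([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
X = [[1.0], [2.0], [3.0], [4.0]]
beta = [2.0]
eps = [0.0, 0.0, 0.0, 0.0]

y_spatial = sar_response(W, X, beta, eps, rho=0.5)
y_plain = sar_response(W, X, beta, eps, rho=0.0)
```

With `rho=0.0` the model reduces to ordinary linear regression output (X times beta). With `rho=0.5` each site's value is pulled toward its neighbors' values.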

29
Evaluation

Linear Regression
Spatial Regression
Spatial model is better

30
MRF Bayesian

Markov Random Field based Bayesian Classifiers
$\Pr(l_i \mid X, L_i) = \Pr(X \mid l_i, L_i) \Pr(l_i \mid L_i) / \Pr(X)$
$\Pr(l_i \mid L_i)$ can be estimated from training data
$L_i$ denotes the set of labels in the neighborhood of $s_i$, excluding the label at $s_i$
$\Pr(X \mid l_i, L_i)$ can be estimated using kernel functions
Solutions
stochastic relaxation [Geman]
Iterated conditional modes [Besag]
Graph cut [Boykov]
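
As an illustration of iterated conditional modes, the sketch below labels sites on a one-dimensional chain. Several pieces are assumptions made for the example and are not taken from the presentation: a Gaussian likelihood with known class means stands in for the kernel estimate of Pr(X | l_i, L_i), and a neighbor-agreement bonus weighted by `beta` stands in for Pr(l_i | L_i). Each sweep sets every site to the label that maximizes its local log-posterior given the current labels of its neighbors.

```python
def icm(observations, class_means, beta=1.0, sigma=0.5, max_sweeps=10):
    """Iterated conditional modes on a 1-D chain of sites."""
    n = len(observations)
    label_set = sorted(class_means)

    def log_likelihood(x, label):
        # Assumed Gaussian likelihood around the class mean.
        return -((x - class_means[label]) ** 2) / (2 * sigma ** 2)

    # Start from the label that best explains each observation alone.
    labels = [max(label_set, key=lambda l: log_likelihood(x, l))
              for x in observations]

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            nearby = [labels[j] for j in (i - 1, i + 1) if 0 <= j < n]

            def score(label):
                # Assumed prior: reward agreement with neighboring labels.
                agree = sum(1 for nb in nearby if nb == label)
                return log_likelihood(observations[i], label) + beta * agree

            best = max(label_set, key=score)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # reached a local optimum
    return labels


# Hypothetical noisy observations: low values then high values,
# with two sites (index 2 and 8) that disagree with their surroundings.
observations = [0.1, 0.2, 0.9, 0.1, 0.0, 0.9, 1.0, 0.8, 0.2, 0.9, 1.1]
smoothed = icm(observations, class_means={0: 0.0, 1: 1.0})
independent = icm(observations, class_means={0: 0.0, 1: 1.0}, beta=0.0)
```

With `beta=0.0` each site is classified on its own observation. With the neighbor term switched on, the two isolated sites are relabeled to match their surroundings, which is the kind of spatial smoothing the MRF prior is meant to provide.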

31
Experiment Design
32
Prediction Maps (Learning)
MRF-P Prediction (ADNP = 3.36)
Actual Nest Sites (Real Learning)
NZ = 85
NZ = 138
MRF-GMM Prediction (ADNP = 5.88)
SAR Prediction (ADNP = 9.80)
NZ = 140
NZ = 130
33
Prediction Maps (Testing)
MRF-P Prediction (ADNP = 2.84)
Actual Nest Sites (Real Testing)
Actual Nest Sites (Real Learning)
NZ = 30
NZ = 80
MRF-GMM Prediction (ADNP = 3.35)
SAR Prediction (ADNP = 8.63)
NZ = 76
NZ = 80
34
Comparison (MRF-BC vs. SAR)

SAR can be rewritten as $y = (QX)\beta + Q\epsilon$, where $Q = (I - \rho W)^{-1}$, which can be viewed as a spatial smoothing operation.
This transformation shows that SAR is similar to the linear logistic model and thus suffers from the same limitations, i.e., the SAR model assumes linear separability of classes in the transformed feature space.
The SAR model also makes more restrictive assumptions about the distribution of features and class shapes than MRF.
The relationship between SAR and MRF is analogous to the relationship between logistic regression and Bayesian classifiers.
Our experimental results show that the MRF model yields better spatial and classification accuracies than SAR predictions.
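
The rewritten form of SAR follows from the SAR model equation in two steps, assuming the matrix $I - \rho W$ is invertible:

```latex
\begin{align*}
y &= \rho W y + X\beta + \epsilon \\
(I - \rho W)\, y &= X\beta + \epsilon \\
y &= (I - \rho W)^{-1} X\beta + (I - \rho W)^{-1} \epsilon
   = (QX)\beta + Q\epsilon, \qquad Q = (I - \rho W)^{-1}
\end{align*}
```

The last line is an ordinary linear model in the smoothed features QX, which is why SAR inherits the linear-separability limitation noted above.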

35
MRF vs. SAR
Confusion Matrix
Spatial Confusion Matrix
36
Conclusion and Future Directions

Spatial domains may not satisfy assumptions of classical methods
data: auto-correlation, continuous geographic space
patterns: global vs. local, e.g. spatial outliers vs. outliers
data exploration: maps and albums
Open Issues
patterns: hot-spots, blobology (shape), spatial trends
metrics: spatial accuracy (predicted locations), spatial contiguity (clusters)
spatio-temporal datasets
scale and resolution sensitivity of patterns
geo-statistical confidence measures for mined patterns

37
Army Relevance and Collaborations

Relevance: "Maps are as important to soldiers as guns" - unknown
Joint Projects
High Performance GIS for Battlefield Simulation (ARL Adelphi)
Spatial Querying for Battlefield Situation Assessment (ARL Adelphi)
Joint Publications
w/ G. Turner (ARL Adelphi, MD), D. Chubb (CECOM IEWD)
IEEE Computer (December 1996)
IEEE Transactions on Knowledge and Data Eng. (July-Aug. 1998)
Three conference papers
Visits, Other Collaborations
GIS group, Waterways Experiment Station (Army)
Concept Analysis Agency, Topographic Eng. Center, ARL, Adelphi
Workshop on Battlefield Visualization and Real Time GIS (4/2000)

38
Reference

S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C.T. Lu, Spatial Databases: Accomplishments and Research Needs, IEEE Transactions on Knowledge and Data Engineering, Jan.-Feb. 1999.
S. Shekhar and Y. Huang, Discovering Spatial Co-location Patterns: a Summary of Results, In Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July 2001.
S. Shekhar, C.T. Lu, P. Zhang, Detecting Graph-based Spatial Outliers: Algorithms and Applications, the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
S. Shekhar, C.T. Lu, P. Zhang, Detecting Graph-based Spatial Outlier, Intelligent Data Analysis, To appear in Vol. 6(3), 2002.
S. Shekhar, S. Chawla, the book Spatial Database: Concepts, Implementation and Trends, Prentice Hall, 2002.
S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations, Proc. of 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.
S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, Modeling Spatial Dependencies for Mining Geospatial Data, First SIAM International Conference on Data Mining, 2001.
S. Shekhar, P.R. Schrater, R.R. Vatsavai, W. Wu, and S. Chawla, Spatial Contextual Classification and Prediction Models for Mining Geospatial Data, To appear in IEEE Transactions on Multimedia, 2002.
S. Shekhar, V. Kumar, P. Tan, M. Steinbach, Y. Huang, P. Zhang, C. Potter, S. Klooster, Mining Patterns in Earth Science Data, IEEE Computing in Science and Engineering (Submitted).

39
Reference

S. Shekhar, C.T. Lu, P. Zhang, A Unified Approach to Spatial Outliers Detection, IEEE Transactions on Knowledge and Data Engineering (Submitted).
S. Shekhar, C.T. Lu, X. Tan, S. Chawla, Map Cube: A Visualization Tool for Spatial Data Warehouses, as a chapter of Geographic Data Mining and Knowledge Discovery, Harvey J. Miller and Jiawei Han (eds.), Taylor and Francis, 2001, ISBN 0-415-23369-0.
S. Shekhar, Y. Huang, W. Wu, C.T. Lu, What's Spatial about Spatial Data Mining: Three Case Studies, as a chapter of Data Mining for Scientific and Engineering Applications, V. Kumar, R. Grossman, C. Kamath, R. Namburu (eds.), Kluwer Academic Pub., 2001, ISBN 1-4020-0033-2.
S. Shekhar and Y. Huang, Multi-resolution Co-location Miner: a New Algorithm to Find Co-location Patterns in Spatial Datasets, Fifth Workshop on Mining Scientific Datasets (SIAM 2nd Data Mining Conference), April 2002.