Title: Spatial Data Mining
1Spatial Data Mining
2Learning Objectives
- Understand the concept of Spatial Data Mining
- Learn techniques for finding spatial patterns
3Examples of Spatial Patterns
- 1855 Asiatic Cholera in London.
- A water pump identified as the source.
- Cancer cluster to investigate health hazards.
- Crime hotspots for planning police patrol routes.
- Effects on US weather caused by unusual warming of the Pacific Ocean (El Niño).
4What is a Spatial Pattern?
- What is not a pattern?
- Random, haphazard, chance, stray, accidental, unexpected.
- Without definite direction, trend, rule, method, design, aim, purpose.
- What is a pattern?
- A frequent arrangement, configuration, composition, regularity.
- A rule, law, method, design, description.
- A major direction, trend, prediction.
5Defining Spatial Data Mining
- Search for spatial patterns.
- Non-trivial search as automated as possible.
- Large search space of plausible hypotheses
- Ex. Candidate causes of Asiatic cholera: water, food, air, insects.
- Interesting, useful, and unexpected spatial patterns.
- Useful in a certain application domain
- Ex. Shutting off the identified water pump => saved human lives.
- May provide a new understanding of the world
- Ex. The water pump and cholera connection led to the germ theory.
6What is NOT Spatial Data Mining
- Simple querying of Spatial Data
- Finding neighbors of Canada given names and boundaries of all countries (search space is not large)
- Uninteresting or obvious patterns
- Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).
- Common knowledge: nearby places have similar rainfall
- Mining of non-spatial data
- Diaper sales and beer sales are correlated in the evenings
7Families of Spatial Data Mining Patterns
- Location Prediction
- Where will a phenomenon occur?
- Spatial Interactions
- Which subset of spatial phenomena interact?
- Hot spot
- Which locations are unusual or share
commonalities?
8Location Prediction
- Where will a phenomenon occur?
- Which spatial events are predictable?
- How can a spatial event be predicted from other spatial events?
- Examples
- Where will an endangered bird nest?
- Which areas are prone to fire given maps of vegetation and drought?
- What should be recommended to a traveler in a given location?
9Spatial Interactions
- Which spatial events are related to each other?
- Which spatial phenomena depend on other phenomena?
- Examples
- Earth science
- Climate and disturbance => wildfires: hot, dry, lightning
- Epidemiology
- Disease type and environmental events => West Nile disease: stagnant water source, dead birds, mosquitoes
10Hot spots
- Is a phenomenon spatially clustered?
- Which spatial entities are unusual or share common characteristics?
- Examples
- Crime hot spots to plan police patrols
11Spatial Queries
- Spatial Range Queries
- Find all cities within 50 miles of Paris
- Query has an associated region (location, boundary)
- Answer includes overlapping or contained data regions
- Nearest-Neighbor Queries
- Find the 10 cities nearest to Paris
- Results must be ordered by proximity
- Spatial Join Queries
- Find all cities near a lake
- Join condition involves regions and proximity.
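The three query types above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical planar coordinates in miles and brute-force scans; the place names other than Paris, the coordinates, and the function names are all invented for the example, and a real system would typically use a spatial index instead.

```python
import math

# Hypothetical planar coordinates in miles; not real geography.
cities = {"Paris": (0.0, 0.0), "A": (30.0, 30.0), "B": (10.0, 0.0), "C": (60.0, 0.0)}
lakes = {"L1": (12.0, 0.0), "L2": (200.0, 200.0)}

def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def range_query(points, center, radius):
    """Names of all points within `radius` of `center`."""
    return sorted(name for name, p in points.items() if dist(p, center) <= radius)

def nearest_neighbors(points, center, k):
    """The k names nearest to `center`, ordered by proximity."""
    return sorted(points, key=lambda name: dist(points[name], center))[:k]

def spatial_join(left, right, max_dist):
    """All (left name, right name) pairs within `max_dist` of each other."""
    return sorted((a, b)
                  for a, pa in left.items()
                  for b, pb in right.items()
                  if dist(pa, pb) <= max_dist)

paris = cities["Paris"]
others = {n: p for n, p in cities.items() if n != "Paris"}
within_50 = range_query(others, paris, 50.0)      # cities within 50 miles of Paris
nearest_2 = nearest_neighbors(others, paris, 2)   # the 2 cities nearest to Paris
near_lake = spatial_join(cities, lakes, 15.0)     # cities near a lake
```

Note that only the nearest-neighbor result depends on ordering by proximity; the range query and the join return unordered sets, sorted here by name just for stable output.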
12Unique Properties of Spatial Patterns
- Items in traditional data are independent of each other, whereas properties of locations in a map are often auto-correlated (patterns exist)
- Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex
- Items in traditional data describe discrete objects, whereas spatial data is continuous
13Association Rules
- Support: the number of times a rule shows up in a database
- Confidence: conditional probability of Y given X
- Example
- (Bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high)
- Support: 20, confidence: 0.8
- Interpretation: Locations with limestone bedrock and low soil depth have a high risk of sink hole formation.
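To make the two measures concrete, here is a small sketch that counts support and confidence for the sink hole rule. The five location records are invented for illustration (they do not reproduce the support and confidence figures quoted above), and the field names and helper functions are assumptions.

```python
# Hypothetical per-location records; values are illustrative, not survey data.
locations = [
    {"bedrock": "limestone", "soil_depth_ft": 30, "sinkhole_risk": "high"},
    {"bedrock": "limestone", "soil_depth_ft": 20, "sinkhole_risk": "high"},
    {"bedrock": "limestone", "soil_depth_ft": 40, "sinkhole_risk": "low"},
    {"bedrock": "granite", "soil_depth_ft": 10, "sinkhole_risk": "low"},
    {"bedrock": "limestone", "soil_depth_ft": 80, "sinkhole_risk": "high"},
]

def antecedent(loc):
    """X: limestone bedrock and soil depth under 50 ft."""
    return loc["bedrock"] == "limestone" and loc["soil_depth_ft"] < 50

def consequent(loc):
    """Y: high sink hole risk."""
    return loc["sinkhole_risk"] == "high"

def support_and_confidence(records, x, y):
    """Support = count of records matching X and Y; confidence = P(Y | X)."""
    n_x = sum(1 for r in records if x(r))
    n_xy = sum(1 for r in records if x(r) and y(r))
    confidence = n_xy / n_x if n_x else 0.0
    return n_xy, confidence

support, confidence = support_and_confidence(locations, antecedent, consequent)
```

With these records, three locations satisfy X and two of them also satisfy Y, so the rule has a support count of 2 and a confidence of 2/3.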
14Apriori Algorithm to mine association rules
- Key challenge
- Very large search space
- Key assumption
- Few associations have support above the given threshold
- Associations with low support are not interesting
- Key insight
- If an association item set has high support, then so do all its subsets
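The key insight is usually applied in its contrapositive form: if any subset of a candidate item set falls below the support threshold, the candidate can be discarded without counting it. The sketch below is a compact, unoptimized take on that idea; the transactions, item names, and the `apriori` function signature are assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent item sets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count each candidate and keep only those meeting the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

# Hypothetical transactions, one set of observed items per location.
transactions = [
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil"},
    {"granite", "shallow_soil"},
]
frequent_sets = apriori(transactions, min_support=2)
```

Here `granite` appears only once, so it is dropped in the first pass and no larger item set containing it is ever counted.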
15Association rules Example
16Techniques for Association Mining
- Classical method
- Association rules given item types and transactions
- Assumes spatial data can be decomposed into transactions
- Such decomposition may alter spatial patterns
- New spatial method
- Spatial association rule
- Spatial co-location
17Associations, Spatial associations, co-location
18Associations, Spatial associations, co-location
19Co-location Rules
- For point data in space
- Does not need transactions; works directly with continuous space
- Use neighborhood definition and spatial joins
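As a rough sketch of how a neighborhood definition and a spatial join can replace transactions, the code below joins two point feature types by distance and computes a participation ratio for each. The slides do not name an interest measure; the participation ratio and participation index used here come from the co-location literature and are an assumption, as are the point data, the radius, and the function names.

```python
import math

# Hypothetical point events: (feature type, x, y) in arbitrary planar units.
points = [
    ("stagnant_water", 0.0, 0.0),
    ("stagnant_water", 10.0, 10.0),
    ("stagnant_water", 50.0, 50.0),
    ("dead_bird", 1.0, 0.0),
    ("dead_bird", 10.0, 11.0),
    ("dead_bird", 90.0, 90.0),
    ("dead_bird", 95.0, 95.0),
]

def neighbor_pairs(points, type_a, type_b, radius):
    """Spatial join: index pairs (i, j) of type_a and type_b points within `radius`."""
    return [(i, j)
            for i, (ta, xa, ya) in enumerate(points) if ta == type_a
            for j, (tb, xb, yb) in enumerate(points)
            if tb == type_b and math.hypot(xa - xb, ya - yb) <= radius]

def participation_ratio(points, type_a, type_b, radius):
    """Fraction of type_a points that have at least one type_b neighbor."""
    total = sum(1 for t, _, _ in points if t == type_a)
    participating = {i for i, _ in neighbor_pairs(points, type_a, type_b, radius)}
    return len(participating) / total if total else 0.0

pr_water = participation_ratio(points, "stagnant_water", "dead_bird", 2.0)
pr_bird = participation_ratio(points, "dead_bird", "stagnant_water", 2.0)
# The participation index of the pair is the smaller of the two ratios.
participation_index = min(pr_water, pr_bird)
```

Two of the three water points and two of the four bird points have a neighbor of the other type, so the ratios are 2/3 and 1/2 and the index is 1/2.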
20Co-location rules
21Clustering
- Process of discovering groups in large databases
- Spatial view: rows in a database are points in a multi-dimensional space.
- Visualization may reveal interesting groups
22Clustering
- Hierarchical
- All points in one cluster
- Split and merge till a stop criterion is reached
- Partitional
- Start with random central points
- Assign points to nearest central point
- Update the central points
- Approach with statistical rigor
- Density
- Find clusters based on density of regions
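The three partitional steps listed above (start, assign, update) match what is commonly called k-means, although the slides do not use that name. The following is a minimal sketch under that assumption; the sample points, the fixed random seed, and the function signature are illustrative choices.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Partition `points` (tuples) into k groups; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start with random central points
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest central point.
        assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        # Update each central point to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # no center moved, so the assignment is stable
        centers = new_centers
    return centers, assignment

# Hypothetical sample: two well-separated groups of three points each.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = k_means(sample, k=2)
```

The result depends on the random starting points in general; on well-separated data like this sample the two groups are recovered, but cluster label numbers are arbitrary.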
23Outliers
- Observations inconsistent with the rest of the dataset
- Observations inconsistent with their neighborhoods
- A local instability or discontinuity
24Variogram Cloud
- Create a variogram cloud by plotting (attribute difference, distance) for each pair of points
- Select points common to many outlying pairs
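A small sketch of the two steps follows. It computes one (distance, attribute difference) entry per pair and then counts how often each point appears in an "outlying" pair. The observations are invented, and the rule for what counts as outlying (a nearby pair with a large difference, using the `max_dist` and `min_diff` thresholds) is an assumption, since the slides do not define it.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical observations: (x, y, attribute value). Index 2 is an injected anomaly.
obs = [(0, 0, 10.0), (1, 0, 11.0), (0, 1, 50.0), (1, 1, 10.5), (2, 1, 11.5)]

def variogram_cloud(obs):
    """One (i, j, distance, absolute attribute difference) tuple per pair of points."""
    return [(i, j, math.dist(obs[i][:2], obs[j][:2]), abs(obs[i][2] - obs[j][2]))
            for i, j in combinations(range(len(obs)), 2)]

def suspicious_points(cloud, max_dist, min_diff):
    """Count how often each point appears in a nearby pair with a large difference."""
    counts = Counter()
    for i, j, d, diff in cloud:
        if d <= max_dist and diff >= min_diff:
            counts.update((i, j))
    return counts

cloud = variogram_cloud(obs)
counts = suspicious_points(cloud, max_dist=1.5, min_diff=20.0)
most_suspicious = counts.most_common(1)[0][0]
```

Every outlying pair in this sample contains point 2, so it is the point common to the most outlying pairs.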
25Moran Scatter Plot
- Plot (normalized attribute value, weighted average in the neighborhood) for each location
- Select points in the upper left and lower right quadrants
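The quadrant test can be sketched without any plotting: a location falls in the upper left or lower right quadrant exactly when its normalized value and its neighborhood average have opposite signs. The example assumes a hypothetical one-dimensional chain of locations with equal weights on adjacent neighbors; the data and function names are invented for illustration.

```python
import statistics

# Hypothetical 1-D chain of locations; neighbors are the adjacent locations.
values = [10.0, 11.0, 10.5, 30.0, 10.0, 11.5, 10.5]
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))}

def moran_scatter(values, neighbors):
    """Return (z_i, equal-weight average of neighbors' z) for each location."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    z = [(v - mean) / std for v in values]
    lag = [sum(z[j] for j in neighbors[i]) / len(neighbors[i])
           for i in range(len(values))]
    return list(zip(z, lag))

def off_diagonal(scatter):
    """Indices in the upper left (z < 0, lag > 0) or lower right (z > 0, lag < 0)."""
    return [i for i, (zi, lag) in enumerate(scatter) if zi * lag < 0]

flagged = off_diagonal(moran_scatter(values, neighbors))
```

In this sample the anomalous location 3 lands in the lower right quadrant, and its two neighbors land in the upper left because the anomaly inflates their neighborhood averages, so flagged locations still need inspection.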
26Scatter plot
- Plot (normalized attribute value, weighted average in the neighborhood) for each location
- Fit a linear regression line
- Select points which are unusually far from the regression line.
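The regression step can be sketched as an ordinary least-squares fit followed by a residual check. The (attribute value, neighborhood average) pairs below are hypothetical and assumed to be computed beforehand, and "unusually far" is interpreted as a residual more than two standard deviations from the line, which is an assumed threshold rather than one given in the slides.

```python
import statistics

# Hypothetical (attribute value, neighborhood weighted average) pairs,
# one per location, assumed to have been computed beforehand.
pairs = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 9.0),
         (5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]

def fit_line(pairs):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regression_outliers(pairs, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard deviations."""
    slope, intercept = fit_line(pairs)
    residuals = [y - (slope * x + intercept) for x, y in pairs]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals)
            if spread and abs(r) / spread > threshold]

slope, intercept = fit_line(pairs)
outliers = regression_outliers(pairs)
```

Only the location at index 3, whose neighborhood average is far above what the fitted line predicts for its value, is selected.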
27Conclusion
- Patterns are the opposite of random
- Common spatial patterns
- Location prediction
- Feature interaction
- Hot spot
- Spatial patterns may be discovered using
- Techniques like associations, clustering and
outlier detection