Spatial Data Mining: Three Case Studies - PowerPoint PPT Presentation

1 / 42
About This Presentation
Title:

Spatial Data Mining: Three Case Studies

Description:

Spatial Databases are too large to analyze manually. NASA ... Function family : generalized linear (GL)(logit, probit), non-linear, GL with auto-correlation ... – PowerPoint PPT presentation

Number of Views:316
Avg rating:3.0/5.0
Slides: 43
Provided by: shashis
Category:

less

Transcript and Presenter's Notes

Title: Spatial Data Mining: Three Case Studies


1
Spatial Data Mining Three Case Studies
  • Presented by Chang-Tien Lu
  • Spatial Database Lab
  • Department of Computer Science
  • University of Minnesota
  • ctlu_at_cs.umn.edu
  • http//www.cs.umn.edu/research/shashi-group
  • Group Members
  • Shashi Shekhar, Weili Wu, Yan Huang, C.T. Lu

2
Outline
  • Introduction
  • Case 1 Location Prediction
  • Case 2 Spatial Association Co-location
  • Case 3 Spatial Outlier Detection
  • Conclusion and Future Directions

3
Introduction spatial data mining
  • Spatial Databases are too large to analyze
    manually
  • NASA Earth Observation System (EOS)
  • National Institute of Justice Crime mapping
  • Census Bureau, Dept. of Commerce - Census Data
  • Spatial Data Mining
  • Discover frequent and interesting spatial
    patterns for post processing (knowledge
    discovery)
  • Pattern examples spatial outliers, location
    prediction, clustering, spatial association,
    trends, ..
  • Historical Example
  • London, 1854
  • Cholera water pump

4
Framework
  • Problem statement capture special needs
  • Data exploration maps
  • Try reusing classical methods
  • data mining, spatial statistics
  • Invent new methods if reuse is not applicable
  • Develop efficient algorithms
  • Validation, Performance tuning

5
Case 1 Location Prediction
  • Problem predict nesting site in marshes
  • Given vegetation, water depth, distance to edge,
    etc.
  • Data - maps of nests and attributes
  • spatially clustered nests, spatially smooth
    attributes
  • Classical method logistic regression, decision
    trees, bayesian classifier
  • but, independence assumption is violated !
  • Misses auto-correlation !
  • Spatial auto-regression (SAR)
  • Open issues spatial accuracy vs. classification
    accurary
  • Open issue performance - SAR learning is slow!

6
Location Prediction
Given 1. Spatial Framework 2. Explanatory
functions 3. A dependent class 4. A family
of function mappings Find Classification
model Objectivemaximize classification_accurac
y Constraints Spatial Autocorrelation exists

Nest locations
Distance to open water
Vegetation durability
Water depth
7
Evaluation Change Model
  • Linear Regression
  • Spatial Autoregression Model (SAR)
  • y ?Wy X? ?
  • W models neighborhood relationships
  • ? models strength of spatial dependencies
  • ? error vector
  • Mixed Spatial Autoregression Model (MSAR)
  • y ?Wy X? WX? ?
  • Consider the impact of the explanatory variables
    from the neighboring observations

8
Measure ROC Curve
  • Classification accuracy confusion matrix
  • ROC Curve Locus of the pair (TPR,FPR) for each
    cut-off probability
  • Receiver Operating Characteristic (ROC)
  • TPR AnPn / (AnPn AnPnn)
  • FPR AnnPn / (AnnPnAnnPnn)

9
Evaluation Change Model
  • Linear Regression
  • Spatial Regression
  • Spatial model is better

10
Solution Procedures
  • Spatial Autoregression Model (SAR)
  • y ?Wy X? ?
  • Solutions
  • ? and ? - can be estimated using Maximum
    likelihood theory or Bayesian statistics.
  • e.g., spatial econometrics package uses Bayesian
    approach using sampling-based Markov Chain Monte
    Carlo (MCMC) method.
  • Maximum likelihood-based estimation requires
    O(n3) ops.

11
Evaluation Chang measure
  • Spatial accuracy (map similarity)
  • New measure ADNP
  • Average distance to nearest prediction

12
Predicting Location using Map Similarity
13
Predicting location using Map Similarity
  • PLUMS components
  • Map Similarity Avg. Distance to Nearest
    Prediction(ADNP) ,..
  • Search Algorithm Greedy, gradient descent
  • Function family generalized linear (GL)(logit,
    probit), non-linear,
  • GL with auto-correlation
  • Discretization of parameter space Uniform,
    non-uniform,
  • multi-resolution,

14
Association Rule
  • Supermarket shelf management
  • Goal To identify items that are bought together
    by sufficiently many customers
  • Approach Process the point-of-scale data
    collected with barcode scanners to find
    dependencies among items (Transaction data)
  • A classic rule
  • If a customer buys diaper and milk, then he is
    very likely to buy beer
  • So, dont be surprised if you find six-packs of
    beer stacked next to diapers!

15
Association RulesSupport and confidence
  • Item set I i1, i2, .ik
  • Transactions T t1, t2, tn
  • Association rule A -gt B
  • Support S
  • (A and B) occur in at least S percent of the
    transactions
  • P (A U B)
  • Confidence C
  • Of all the transactions in which A occurs, at
    least C percent of them contains B
  • P (BA)

16
Case 2 Spatial Association Rule
  • Problem Given a set of boolean spatial features
  • find subsets of co-located features,
  • e.g. (fire, drought, vegetation)
  • Data - continuous space, partition not natural
  • Classical data mining approach association rules
  • But, No Transactions!!! No support measure!!
  • Approach Work with continuous data without
    transactionizing it!
  • Participation index (support) min. fraction of
    instances of a features in join result
  • Confidence Pr.fire at s drought in N(s) and
    vegetation in N(s)
  • new algorithm using spatial joins

17
Co-location
Can you find co-location patterns from the
following sample dataset?
Answers and
18
Co-location
Can you find co-location patterns from the
following sample dataset?
19
Co-location
Spatial Co-location A set of features
frequently co-located Given A set T of K
boolean spatial feature types Tf1,f2, , fk
A set P of N locations Pp1, , pN in a
spatial frame work S, pi? P is of some spatial
feature in T A neighbor relation R over
locations in S Find Tc ?subsets of T
frequently co-located Objective Correctness
Completeness Efficiency Constraints R
is symmetric and reflexive Monotonic
prevalence measure
Reference Feature Centric
Window Centric
Event Centric
20
Co-location
Comparison with association rules
  • Participation index
  • Participation index minpr(fi, c)
  • Participation ratio pr(fi, c) of feature fi in
    co-location c f1, f2, , fk
  • Fraction of instances of fi with feature f1,
    f2, f i-1, f i1,, fk nearby.

21
Spatial Co-location Patterns
  • Dataset
  • Spatial feature A,B,C and their instances
  • Possible associations are (A, B), (B, C), etc.
  • Neighbor relationship includes following pairs
  • A1, B1
  • A2, B1
  • A2, B2
  • B1, C1
  • B2, C2

22
Spatial Co-location Patterns
  • Partition approach Yasuhiko, KDD 2001
  • Support not well defined
  • i.e., not independent of execution trace
  • Has a fast heuristic which is hard to analyze
    for
  • correctness/completeness
  • Dataset

Spatial feature A,B, C, and their instances
Support (A,B) 2 (B,C)2
Support (A,B)1 (B,C)2
23
Spatial Co-location Patterns
  • Reference feature approach Han SSD 95
  • Use C as reference feature to get transactions
  • Transactions (B1) (B2)
  • Support (A,B) ?
  • Note Neighbor relationship includes following
  • pairs
  • A1, B1
  • A2, B1
  • A2, B2
  • B1, C1
  • B2, C2
  • Dataset

Spatial feature A,B, C, and their instances
24
Spatial Co-location Patterns
  • Our approach (Event Centric)
  • Neighborhood instead of transactions
  • Spatial join on neighbor relationship
  • Support
  • Participation index Min ( p_ratio )
  • P_ratio(A, (A,B)) fraction of instance of A
    participating in join(A,B, neighbor)
  • Examples
  • Support(A, B)min(3/2,3/2)1.5
  • Support(B, C)min(2/2,2/2)1
  • Dataset

Spatial feature A,B, C, and their instances
25
Spatial Co-location Patterns
  • Partition approach
  • Our approach
  • Dataset

Support(A,B)min(3/2,3/2)1.5
Support(B,C)min(2/2,2/2)1
Support A,B 2 B,C2
Spatial feature A,B, C, and their instances
  • Reference feature approach

C as reference feature Transactions (B1)
(B2) Support (A,B) ?
Support A,B1 B,C2
26
Case 3 Spatial Outliers Detection
  • Spatial Outlier A data point that is extreme
    relative to it neighbors

27
Application Domain Traffic Data
28
Spatial Outlier Detection
  • Given
  • A spatial framework SF consisting of locations
    s1, s2, , sn
  • An attribute function f si ? R
  • (R set of real numbers)
  • A neighborhood relationship N ? SF ? SF
  • A neighborhood aggregation function RN ?
    R
  • A difference function Fdiff R ? R ? R
  • Statistic test function ST R ? True, False
  • Test is based on Fdiff (f, (f, N)
  • Find
  • O vi vi ?V, vi is a spatial outlier
  • Objective
  • Correctness The attribute values of vi is
    extreme, compared with its neighbors
  • Computational efficiency

29
An example of Spatial outlier
30
Spatial Outlier Detection Zs(x) approach
Function
If
Declare x as a spatial outlier
31
Evaluation of Statistical Assumption
  • Distribution of traffic station attribute f(x)
    looks normal
  • Distribution of
    looks normal too!

32
Different Spatial Outlier Test
  • Spatial Statistic Approach
  • Scatter plot approach(Luc Anselin 94)
  • Moran scatter plot approach (Luc Anselin 95)
  • Variogram cloud approach (Graphic)

33
Scatter plot approach
  • Given
  • An attribute function f(x)
  • A neighborhood relationship N(x)
  • An aggregation function
  • A difference function
  • Fdiff ? E(x) (m ? f(x) b)
  • Detect spatial outlier by
  • Statistic test function
  • ST

34
Graphical Spatial Outlier Test
35
Graphical Spatial Tests
Original Data
36
A Unified Algorithm
  • Separate two phases
  • Model building
  • Testing (a node or a set of nodes)
  • Computation structure of model building
  • Key insights
  • Spatial self join using N(x) relationship
  • Algebraic aggregate functions can be computed in
    one disk scan of spatial join
  • Computation structure of testing
  • Single node spatial range query
  • Get-All-Neighbors(x) operation

37
An example Scatter plot
  • Model building
  • An attribute function f(x)
  • Neighborhood aggregate function
  • Distributive aggregate functions
  • Algebraic aggregate functions
  • where
    ,
  • Testing
  • Difference function
  • where
  • Statistic test function

38
Outlier Stations Detected
39
Outlier Station Detected
40
Conclusion and Future Directions
  • Spatial domains may not satisfy assumptions of
    classical methods
  • data auto-correlation, continuous geographic
    space
  • patterns global vs. local, e.g., outliers vs.
    spatial outliers
  • data exploration maps and albums
  • Open Issues
  • patterns hot-spots, spatial trends,
  • metrics spatial accuracy (predicted locations),
    spatial contiguity(clusters)
  • spatio-temporal dataset spatial-temporal
    outliers
  • scale and resolutions sentivity of patterns
  • geo-statistical confidence measure for mined
    patterns

41
Reference
  • S. Shekhar and Y. Huang, Discovering Spatial
    Co-location Patterns a Summary of Results, In
    Proc. of 7th International Symposium on Spatial
    and Temporal Databases (SSTD01), July 2001.
  • S. Shekhar, C.T. Lu, P. Zhang, "Detecting
    Graph-based Spatial Outliers Algorithms and
    Applications, the Seventh ACM SIGKDD
    International Conference on Knowledge Discovery
    and Data Mining, 2001.
  • S. Shekhar, C.T. Lu, P. Zhang, Detecting
    Graph-based Saptial Outlier, Intelligent Data
    Analysis, To appear in Vol. 6(3), 2002
  • S. Chawla, S. Shekhar, W. Wu and U. Ozesmi,
    Extending Data Mining for Spatial Applications
    A Case Study in Predicting Nest Locations, Proc.
    Int. Confi. on 2000 ACM SIGMOD Workshop on
    Research Issues in Data Mining and Knowledge
    Discovery (DMKD 2000), Dallas, TX, May 14, 2000.
  • S. Chawla, S. Shekhar, W. Wu and U. Ozesmi,
    Modeling Spatial Dependencies for Mining
    Geospatial Data, First SIAM International
    Conference on Data Mining, 2001.
  • S. Shekhar, Y. Huang, W. Wu, C.T. Lu, What's
    Spatial about Spatial Data Mining Three Case
    Studies , as Chapter of Book Data Mining for
    Scientific and Engineering Applications. V.
    Kumar, R. Grossman, C. Kamath, R. Namburu (eds.),
    Kluwer Academic Pub., 2001, ISBN 1-4020-0033-2
  • Shashi Shekhar and Yan Huang , Multi-resolution
    Co-location Miner a New Algorithm to Find
    Co-location Patterns in Spatial Datasets, Fifth
    Workshop on Mining Scientific Datasets (SIAM 2nd
    Data Mining Conference), April 2002

42
http//www.cs.umn.edu/research/shashi-group
Thank you !!!
Write a Comment
User Comments (0)
About PowerShow.com