Spatial Data Mining: Three Case Studies

1
Spatial Data Mining: Three Case Studies
For additional details: www.cs.umn.edu/shekhar/problems.html
Shashi Shekhar, University of Minnesota
Presented to UCGIS Summer Assembly 2001
2
Background
  • NSF workshop on GIS and DM (3/99)
  • Spatial data [1, 8]: traffic, bird habitats,
    global climate, logistics, ...
  • Spatial patterns of interest: outliers, location
    prediction, associations, sequential
    associations, trends, ...

3
Framework
  • Problem statement: capture special needs
  • Data exploration: maps, new methods
  • Try reusing classical methods
      • from data mining, spatial statistics
  • If reuse is not possible, invent new methods
  • Validation, performance tuning

4
Case 1: Spatial Outliers
  • Problem: stations different from their neighbors
    (SIGKDD 2001)
  • Data: space-time plot, distributions of f(x) and S(x)
  • Distribution of base attribute
      • spatially smooth
      • frequency distribution over value domain: normal
  • Classical test: Pr[item in population] is low
  • Question: what is the distribution of
    diff(f(x), neighborhood aggregate of f(x))?
  • Insight: this statistic is normally distributed!
  • Test: (z-score on the statistic) > 2
  • Performance: spatial join, clustering methods

5
Spatial outlier detection [4]
  • Spatial outlier
      • A data point that is extreme relative to
        its neighbors
  • Given
      • A spatial graph $G = \{V, E\}$
      • A neighbor relationship (K neighbors)
      • An attribute function $f: V \to \mathbb{R}$
      • An aggregation function $f_{aggr}: \mathbb{R}^k \to \mathbb{R}$
      • Confidence level threshold $\theta$
  • Find
      • $O = \{v_i \mid v_i \in V,\ v_i \text{ is a spatial outlier}\}$
  • Objective
      • Correctness: the attribute value of $v_i$
        is extreme compared with its neighbors
      • Computational efficiency
  • Constraints
      • Attribute value is normally distributed
      • Computation cost dominated by I/O operations

6
Spatial outlier detection
  • Spatial Outlier Detection Test
      1. Choice of spatial statistic:
         $S(x) = f(x) - E_{y \in N(x)}(f(y))$
         Theorem: S(x) is normally distributed
         if f(x) is normally distributed
      2. Test for outlier detection:
         $(S(x) - \mu_s) / \sigma_s > \theta$
  • Hypothesis
      • I/O cost is determined by clustering efficiency

[Figures: plots of f(x) and S(x); a spatial outlier and its neighbors]
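
The test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `spatial_outliers`, the dictionary-based graph, and the toy station values are all assumptions, and the absolute value on the z-score (so that unusually low stations are flagged too) is also an assumption beyond what the slide states.

```python
import statistics

def spatial_outliers(f, neighbors, theta=2.0):
    """Return nodes whose neighborhood-difference statistic is extreme.

    f         : dict node -> attribute value f(x)
    neighbors : dict node -> list of neighbor nodes N(x) (assumed non-empty)
    theta     : z-score threshold (the slides use 2)
    """
    # S(x) = f(x) - average of f over the neighbors of x
    s = {x: f[x] - statistics.fmean([f[y] for y in neighbors[x]]) for x in f}
    values = list(s.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []  # every node looks like its neighbors
    # Assumption: use |z| so both high and low extremes are reported
    return sorted(x for x in s if abs((s[x] - mu) / sigma) > theta)

# Toy data (hypothetical): ten stations on a line, one anomalous reading
f = {i: 10.0 for i in range(10)}
f[5] = 50.0
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
outliers = spatial_outliers(f, neighbors)
```

Here station 5 is reported, while its two neighbors (whose S(x) is pulled negative by station 5) stay under the threshold.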
7
Spatial outlier detection
  • Results
      1. CCAM achieves higher clustering efficiency
         (CE)
      2. CCAM has lower I/O cost
      3. Higher CE leads to lower I/O cost
      4. Increasing page size improves CE for
         all methods

[Figures: I/O cost and CE value for Cell-Tree, Z-order, and CCAM]
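
The slides do not define CE. A rough sketch, assuming CE is the fraction of graph edges whose two endpoints are stored on the same disk page, is given below. The helper names, the 4x4 grid, and the row-major baseline are hypothetical and chosen only for illustration; CCAM and Cell-Tree are not implemented here.

```python
def z_order_key(x, y, bits=16):
    """Interleave the bits of x and y (Morton / Z-order code)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def assign_pages(nodes, key, page_size):
    """Sort nodes by key and pack them into fixed-size pages."""
    ordered = sorted(nodes, key=key)
    return {n: i // page_size for i, n in enumerate(ordered)}

def clustering_efficiency(edges, page_of):
    """Assumed definition: share of edges with both endpoints on one page."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if page_of[u] == page_of[v])
    return same / len(edges)

# Hypothetical 4x4 grid graph with 4-neighbor edges, 4 nodes per page
nodes = [(x, y) for x in range(4) for y in range(4)]
edges = ([((x, y), (x + 1, y)) for x in range(3) for y in range(4)]
         + [((x, y), (x, y + 1)) for x in range(4) for y in range(3)])

ce_z = clustering_efficiency(edges, assign_pages(nodes, lambda n: z_order_key(*n), 4))
ce_row = clustering_efficiency(edges, assign_pages(nodes, lambda n: (n[1], n[0]), 4))
```

On this toy grid the Z-order layout keeps 16 of 24 edges within a page, against 12 of 24 for row-major storage, which shows how node placement alone changes CE.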
8
Case 2: Location Prediction
  • Citations: SIAM DM Conf. 2001, SIGMOD DMKD 2000
  • Problem: predict nesting sites in marshes,
    given vegetation, water depth, distance to edge,
    etc.
  • Data: maps of nests and attributes
      • spatially clustered nests, spatially smooth
        attributes
  • Classical methods: logistic regression, decision
    trees, Bayesian classifier
      • but the independence assumption is violated!
        They miss auto-correlation!
  • Spatial auto-regression (SAR), Markov random
    field Bayesian classifier
  • Open issue: spatial accuracy vs. classification
    accuracy
  • Open issue: performance (SAR learning is slow!)

9
Location Prediction [6, 7, 8]
  • Given
      1. Spatial framework
      2. Explanatory functions
      3. A dependent function
      4. A family of function mappings
  • Find: a function
  • Objective: maximize
    classification_accuracy
  • Constraints
      • Spatial autocorrelation exists


[Figure panels: nest locations, distance to open water, water depth, vegetation durability]
10
Evaluation: Changing Model
  • Linear Regression
  • Spatial Regression
  • Spatial model is better
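
The model equations did not survive in this transcript. For orientation, the two models are commonly written as below. The notation is the standard textbook form and is an assumption here: $y$ is the dependent variable, $X$ the explanatory attributes, $W$ a neighborhood (contiguity) matrix, $\rho$ the strength of spatial dependence, and $\varepsilon$ the error term.

```latex
\begin{align*}
\text{Linear regression:} \quad & y = X\beta + \varepsilon \\
\text{Spatial autoregression (SAR):} \quad & y = \rho W y + X\beta + \varepsilon
\end{align*}
```

Setting $\rho = 0$ reduces the spatial model to the classical one, so the term $\rho W y$ is what captures the auto-correlation that the independence assumption misses.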

11
Evaluation: Changing Measure
New measure
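
The transcript does not say what the new measure is. As one possible spatial accuracy measure, the sketch below computes the average distance from each actual nest to its nearest predicted nest. The function name and the coordinates are hypothetical, and this should not be read as the measure used on the slide.

```python
import math

def avg_distance_to_nearest_prediction(actual, predicted):
    """Mean Euclidean distance from each actual site to the closest prediction."""
    if not actual or not predicted:
        raise ValueError("both point lists must be non-empty")
    total = sum(min(math.dist(a, p) for p in predicted) for a in actual)
    return total / len(actual)

# Hypothetical example: neither prediction hits a nest cell exactly, so both
# score the same on classification accuracy for these cells, yet one is
# clearly closer in space.
actual = [(0, 0), (5, 5)]
near_miss = [(0, 1), (5, 4)]
far_miss = [(9, 9), (9, 0)]

near_score = avg_distance_to_nearest_prediction(actual, near_miss)
far_score = avg_distance_to_nearest_prediction(actual, far_miss)
```

A lower value means predictions fall closer to real nests, which is the distinction a cell-by-cell classification accuracy cannot make.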
12
Case 3: Spatial Association Rules
  • Citation: Symp. on Spatial Databases 2001
  • Problem: given a set of boolean spatial features,
    find subsets of co-located features, e.g. (fire,
    drought, vegetation)
  • Data: continuous space, partition not natural,
    no reference feature
  • Classical data mining approach: association rules
      • But, Look Ma! No Transactions!!! No support
        measure!
  • Approach: work with continuous data without
    transactionizing it!
      • confidence: Pr[fire at s | drought in N(s) and
        vegetation in N(s)]
      • support: cardinality of spatial join of instances
        of fire, drought, dry veg.
      • participation: min. fraction of instances of a
        feature in the join result
      • new algorithm using spatial joins and apriori_gen
        filters

13
Co-location Patterns [2, 3]
Can you find co-location patterns from the
following sample dataset?
Answers and
14
Co-location Patterns
Can you find co-location patterns from the
following sample dataset?
15
Co-location Patterns
  • Spatial Co-location
      • A set of features frequently co-located
  • Given
      • A set $T$ of $K$ boolean spatial feature types,
        $T = \{f_1, f_2, \ldots, f_K\}$
      • A set $P$ of $N$ locations, $P = \{p_1, \ldots, p_N\}$, in a
        spatial framework $S$; each $p_i \in P$ is of some spatial
        feature in $T$
      • A neighbor relation $R$ over locations in $S$
  • Find
      • $T_c$: subsets of $T$ frequently co-located
  • Objective
      • Correctness
      • Completeness
      • Efficiency
  • Constraints
      • $R$ is symmetric and reflexive
      • Monotonic prevalence measure

[Figure panels: Reference Feature Centric, Window Centric, Event Centric]
16
Co-location Patterns
Comparison with association rules

                                   Association rules     Co-location rules
  underlying space                 discrete sets         continuous space
  item-types                       item-types            events / Boolean spatial features
  collections                      transactions          neighborhoods
  prevalence measure               support               participation index
  conditional probability measure  Pr[A in T | B in T]   Pr[A in N(L) | B at L]

  • Participation index
      1. Participation ratio $pr(f_i, c)$ of feature $f_i$ in
         co-location $c = \{f_1, f_2, \ldots, f_k\}$: fraction of
         instances of $f_i$ with features
         $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby
      2. Participation index $= \min_i \{pr(f_i, c)\}$
  • Algorithm
      • Hybrid Co-location Miner
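
The participation ratio and participation index defined above can be computed by brute force, as in the sketch below. This is not the Hybrid Co-location Miner. It assumes that "nearby" means all instances in a group lie pairwise within a fixed radius, and the function name, data layout, and coordinates are hypothetical.

```python
import math
from itertools import product

def participation_index(instances, colocation, radius):
    """Return (participation index, per-feature participation ratios).

    instances  : dict feature -> non-empty list of (x, y) points
    colocation : tuple of feature names, e.g. ("fire", "drought")
    radius     : assumed neighbor relation: Euclidean distance <= radius
    """
    participating = {f: set() for f in colocation}
    # Try every combination of one instance per feature (exponential; toy sizes only)
    for combo in product(*(range(len(instances[f])) for f in colocation)):
        pts = [instances[f][i] for f, i in zip(colocation, combo)]
        pairwise_close = all(
            math.dist(p, q) <= radius
            for k, p in enumerate(pts) for q in pts[k + 1:]
        )
        if pairwise_close:
            for f, i in zip(colocation, combo):
                participating[f].add(i)
    # pr(f_i, c): fraction of instances of f_i that appear in some group
    ratios = {f: len(participating[f]) / len(instances[f]) for f in colocation}
    return min(ratios.values()), ratios

# Hypothetical data: two of three fires have a drought instance nearby
instances = {
    "fire": [(0, 0), (10, 10), (20, 20)],
    "drought": [(0, 1), (10, 11)],
}
pi, ratios = participation_index(instances, ("fire", "drought"), radius=2.0)
```

Here pr(fire) is 2/3 and pr(drought) is 1, so the participation index is 2/3. Because the index takes the minimum over features, it cannot increase when a feature is added to the co-location, which is what lets apriori-style pruning apply.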

17
Conclusions and Future Directions
  • Spatial domains may not satisfy assumptions of
    classical methods
      • data: auto-correlation, continuous geographic
        space
      • patterns: global vs. local, e.g. spatial outliers
        vs. outliers
      • data exploration: maps and albums
  • Open Issues
      • patterns: hot-spots, blobology (shape), spatial
        trends, ...
      • metrics: spatial accuracy (predicted locations),
        spatial contiguity (clusters)
      • spatio-temporal datasets
      • scale and resolution: sensitivity of patterns
      • geo-statistical confidence measures for mined
        patterns

18
References
  [1] S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X.
      Liu and C.T. Lu, Spatial Databases:
      Accomplishments and Research Needs, IEEE
      Transactions on Knowledge and Data Engineering,
      Jan.-Feb. 1999.
  [2] S. Shekhar and Y. Huang, Discovering Spatial
      Co-location Patterns: A Summary of Results, in
      Proc. of 7th International Symposium on Spatial
      and Temporal Databases (SSTD01), July 2001.
  [3] S. Shekhar, Y. Huang, and H. Xiong, Performance
      Evaluation of Co-location Miner, IEEE
      International Conference on Data Mining
      (ICDM01), Nov. 2001. (Submitted)
  [4] S. Shekhar, C.T. Lu, P. Zhang, Detecting
      Graph-based Spatial Outliers: Algorithms and
      Applications, Seventh ACM SIGKDD
      International Conference on Knowledge Discovery
      and Data Mining, 2001.
  [5] S. Shekhar and S. Chawla, Spatial
      Database: Concepts, Implementation and Trends
      (book). (To be published in 2001)
  [6] S. Chawla, S. Shekhar, W. Wu and U. Ozesmi,
      Extending Data Mining for Spatial Applications:
      A Case Study in Predicting Nest Locations,
      Proc. of the 2000 ACM SIGMOD Workshop on
      Research Issues in Data Mining and Knowledge
      Discovery (DMKD 2000), Dallas, TX, May 14,
      2000.
  [7] S. Chawla, S. Shekhar, W. Wu and U. Ozesmi,
      Modeling Spatial Dependencies for Mining
      Geospatial Data, First SIAM International
      Conference on Data Mining, 2001.
  [8] S. Shekhar, P.R. Schrater, R.R. Vatsavai, W. Wu,
      and S. Chawla, Spatial Contextual Classification
      and Prediction Models for Mining Geospatial
      Data, IEEE Transactions on Multimedia, 2001.
      (Submitted)

Some papers are available on the Web site:
http://www.cs.umn.edu/research/shashi-group/