Data Mining in Spatial Databases: A Multi-Disciplinary Promise

1
Data Mining in Spatial Databases: A Multi-Disciplinary Promise
  • Jiawei Han
  • Database Systems Research Lab.
  • Department of Computer Science
  • University of Illinois at Urbana-Champaign
  • http://www.cs.uiuc.edu/hanj

2
Outline
  • Why geo-spatial data mining?
  • Spatial data mining major progress
  • Spatial OLAP
  • Spatial association
  • Spatial classification
  • Spatial clustering and outlier analysis
  • Research challenges in spatial data mining

3
Why Geo-Spatial Data Mining?
  • Spatial data mining
      • Mining interesting knowledge/patterns from huge amounts of spatial data
  • Necessity is the mother of invention
      • Data explosion problem: data is overwhelming and everywhere (automated data collection, satellite images, remote sensing, GPS, mobile computing and network technology, WWW, etc.)
      • Putting data to use: data mining may lead to important discoveries

4
Spatial Data Mining vs. Traditional Spatial Data
Analysis
  • Scalability and performance
      • Handle gigabytes of data, interactive exploration, multi-dimensional drilling/rolling, visualization, ...
  • Tight integration of database systems and GIS
      • Most spatial and aspatial data are stored in relational database systems (e.g., Oracle, MS SQL Server, DB2, Informix), GIS (e.g., ArcInfo, MapInfo), or data warehouses
      • Tight coupling and seamless integration
      • Data cleaning, data integration, and data consolidation
  • New methods and functionalities
      • Association, sequential patterns, classification methods, ...

5
Spatial Data Mining: Confluence of Multiple Disciplines
[Figure: Spatial Data Mining linked to Spatial DB System, Statistics, Machine Learning (AI), Visualization, Geography, Mobile Computing, and Remote Sensing]
6
Outline
  • Why geo-spatial data mining?
  • Spatial data mining major progress
  • Spatial OLAP
  • Spatial association
  • Spatial classification
  • Spatial clustering and outlier analysis
  • Research challenges in spatial data mining

7
Spatial Data Mining: Major Progress
  • Geo-spatial data warehouse and spatial OLAP
  • Spatial data classification/predictive modeling
  • Spatial clustering/segmentation
  • Spatial association and correlation analysis
  • Spatial regression analysis
  • Spatio-temporal pattern analysis
  • Many more to be explored

8
Spatial Data Warehousing
  • Spatial data warehouse
      • Integrated, subject-oriented, time-variant, and nonvolatile spatial data repository for data analysis
  • Spatial data integration: a big issue
      • Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
      • Vendor-specific formats (ESRI, MapInfo, Intergraph, etc.)
  • Spatial data cube: multidimensional spatial database
      • Both dimensions and measures may contain spatial components

9
Star Schema of the BC Weather Warehouse
  • Spatial data warehouse
  • Dimensions: region_name, time, temperature, precipitation
  • Measurements: region_map, area, count

[Figure: star schema with a central fact table linked to the dimension tables]
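
A minimal Python sketch of a roll-up over this schema, assuming temperature and precipitation are already generalized into categories and modeling the spatial measure region_map as a set of region names (pointers to the regions that would be merged) rather than actual merged polygons. Fact, roll_up, and the sample records are hypothetical illustrations, not part of the original warehouse.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    region_name: str    # spatial dimension
    time: str           # dimension
    temperature: str    # dimension, already generalized (e.g. "cold", "mild")
    precipitation: str  # dimension, already generalized (e.g. "dry", "wet")
    area: float         # area of the region (assumed fixed per region)


def roll_up(facts, dims):
    """Aggregate facts into cube cells keyed by the chosen dimensions.

    region_map (the spatial measure) is the set of region names whose
    geometries would be merged for the cell; area is summed over distinct
    regions so a region seen at several times is not double-counted.
    """
    cells = defaultdict(lambda: {"regions": {}, "count": 0})
    for f in facts:
        cell = cells[tuple(getattr(f, d) for d in dims)]
        cell["regions"][f.region_name] = f.area
        cell["count"] += 1
    return {
        key: {
            "region_map": set(c["regions"]),
            "area": sum(c["regions"].values()),
            "count": c["count"],
        }
        for key, c in cells.items()
    }


facts = [
    Fact("R1", "Jan", "cold", "wet", 100.0),
    Fact("R2", "Jan", "cold", "wet", 50.0),
    Fact("R3", "Jan", "mild", "dry", 70.0),
    Fact("R1", "Feb", "cold", "wet", 100.0),
]
by_temp = roll_up(facts, ("temperature",))
# by_temp[("cold",)] == {"region_map": {"R1", "R2"}, "area": 150.0, "count": 3}
```

Keeping region_map as identifiers defers the expensive geometric merge until a cell is actually displayed.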
10
Spatial OLAP: OLAP on Map Data
11
Dynamic Merging of Spatial Objects?
  • Materializing (precomputing) all: too much storage space
  • On-line merge: slow, expensive!
  • A better way: object-based, selective (partial) materialization
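
The slide does not say how objects are chosen for selective materialization. As an illustration only, the sketch below uses a generic greedy heuristic (our assumption, not necessarily the method behind the talk): rank candidate merged objects by the on-line merge cost they would save (access frequency times merge cost) per unit of storage, and precompute them until a storage budget is used up. Candidate and select_for_materialization are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    access_freq: float  # expected number of queries needing this merged object
    merge_cost: float   # cost of merging it on-line (e.g. milliseconds)
    size: float         # storage needed if precomputed (must be > 0)


def select_for_materialization(candidates, budget):
    """Greedy heuristic: highest saved merge cost per unit of storage first."""
    ranked = sorted(
        candidates,
        key=lambda c: c.access_freq * c.merge_cost / c.size,
        reverse=True,
    )
    chosen, used = [], 0.0
    for c in ranked:
        if used + c.size <= budget:  # skip objects that no longer fit
            chosen.append(c.name)
            used += c.size
    return chosen
```

Objects left out are merged on-line when a query needs them, which trades query latency for storage.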

12
Spatial Association and Correlation Mining
What kind of objects are usually located close to golf courses?
    FIND SPATIAL ASSOCIATION RULE DESCRIBING "Golf Course"
    FROM Washington_Golf_courses, Washington
    WHERE CLOSE_TO(Washington_Golf_courses.Obj, Washington.Obj, "3 km")
      AND Washington.CFCC <> "D81"
    IN RELEVANCE TO Washington_Golf_courses.Obj, Washington.Obj, CFCC
    SET SUPPORT THRESHOLD 0.5
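
The query asks which kinds of objects lie CLOSE_TO golf courses within 3 km. A minimal Python sketch of the underlying count, assuming point locations in kilometres, Euclidean distance for CLOSE_TO, and support defined as the fraction of golf courses with at least one object of a given class nearby (the original system's exact support definition may differ). mine_close_to_rules and the class labels used with it are hypothetical.

```python
import math
from collections import Counter


def mine_close_to_rules(targets, objects, max_dist, min_support, excluded=()):
    """Return {feature_class: support} for rules
    is_a(X, target) -> close_to(X, feature_class).

    targets: list of (x, y) locations of target objects (e.g. golf courses).
    objects: list of (x, y, feature_class) for the other map objects.
    excluded: feature classes to ignore (like the CFCC <> "D81" condition).
    """
    hits = Counter()
    for tx, ty in targets:
        near = {
            cls
            for ox, oy, cls in objects
            if cls not in excluded and math.dist((tx, ty), (ox, oy)) <= max_dist
        }
        hits.update(near)  # each class counted at most once per target
    n = len(targets)
    return {cls: k / n for cls, k in hits.items() if k / n >= min_support}
```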
13
Efficient Mining of Spatial Associations
  • Progressive refinement
      • Hierarchy of spatial relationships
          • g_close_to: near_by, touch, intersect, contain, etc.
      • First search for a rough relationship and then refine it
  • Rough spatial computation (as a filter)
      • Use MBRs or an R-tree for rough estimation
  • Detailed spatial algorithm (as refinement)
      • Apply only to objects that have passed the rough spatial association test (support no less than min_support)
  • Micro-clustering and join indexing methods
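
The filter-and-refine steps can be sketched as follows, assuming objects are represented as point sets (a simplification of real polygons and lines) and that close_to means a distance of at most d. Because the distance between two MBRs never exceeds the distance between the objects they enclose, the MBR filter causes no false dismissals, and the rough support it yields is an upper bound, so classes below min_support can be pruned before any exact computation. All function names are hypothetical.

```python
import math
from collections import defaultdict


def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y)."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def mbr_dist(a, b):
    """Minimum distance between two MBRs; never exceeds the true distance."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def exact_dist(p, q):
    """Exact distance between two objects modeled as point sets."""
    return min(math.dist(u, v) for u in p for v in q)


def refine_close_to(targets, objects, d, min_support):
    """targets: list of point sets; objects: list of (point set, class)."""
    t_mbrs = [mbr(t) for t in targets]
    o_mbrs = [mbr(o) for o, _ in objects]
    # Step 1 (filter): candidate pairs whose MBRs are within d.
    rough = defaultdict(set)  # class -> {(target index, object index)}
    for ti, tm in enumerate(t_mbrs):
        for oi, om in enumerate(o_mbrs):
            if mbr_dist(tm, om) <= d:
                rough[objects[oi][1]].add((ti, oi))
    n = len(targets)
    result = {}
    for cls, pairs in rough.items():
        # Rough support is an upper bound, so pruning here is safe.
        if len({ti for ti, _ in pairs}) / n < min_support:
            continue
        # Step 2 (refine): exact test only on surviving candidate pairs.
        matched = {ti for ti, oi in pairs
                   if exact_dist(targets[ti], objects[oi][0]) <= d}
        if len(matched) / n >= min_support:
            result[cls] = len(matched) / n
    return result
```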

14
Spatial Classification and Model Construction
  • Generalization- or clustering-based induction
  • Interactive classification

15
Can Typical Classification Methods Be Applied to
Spatial Classification?
  • Decision-tree classification
      • Entropy-based information gain vs. Gini index vs. MDL
      • Tree pruning methods; boosting/bagging
  • Naïve Bayesian classifier; boosting
  • Bayesian belief networks
  • Neural networks
  • Genetic programming
  • Nearest neighbor and case-based reasoning
  • Support vector machines
  • Association-based multi-dimensional classification
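
To make the split-criterion bullet concrete, here is a small Python sketch comparing entropy-based information gain with Gini-index reduction for one candidate split (MDL is not covered). The H/L labels and the split on a hypothetical near_lake attribute are made-up example data; groups are assumed non-empty.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(labels, groups, impurity):
    """Impurity reduction when `labels` is partitioned into `groups`."""
    n = len(labels)
    weighted = sum(len(g) / n * impurity(g) for g in groups)
    return impurity(labels) - weighted


near_lake = ["H", "H", "H", "L"]       # hypothetical house values near a lake
far_from_lake = ["L", "L", "L", "H"]   # hypothetical house values elsewhere
labels = near_lake + far_from_lake
info_gain = split_gain(labels, [near_lake, far_from_lake], entropy)  # about 0.189
gini_gain = split_gain(labels, [near_lake, far_from_lake], gini)     # 0.125
```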

16
What Kind of Houses Are Highly Valued? Associative Classification
[Figure: map with regions C01 to C10 containing houses labeled H or L (high or low value), a highway, and a lake]
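
Associative classification mines rules whose right-hand side is a class label and classifies with the strongest matching rule. A simplified CBA-style sketch in Python, assuming each house is described by a set of spatial predicates and rules are ranked by confidence, then support. The predicate names close_to_lake and close_to_highway, the records, and both functions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(records, min_support, min_conf, max_len=2):
    """records: list of (frozenset of spatial predicates, class label).
    Returns rules (antecedent, label, support, confidence), best first."""
    n = len(records)
    ante_count, rule_count = Counter(), Counter()
    for items, label in records:
        for k in range(1, max_len + 1):
            for ante in combinations(sorted(items), k):
                ante_count[ante] += 1
                rule_count[(ante, label)] += 1
    rules = []
    for (ante, label), c in rule_count.items():
        sup, conf = c / n, c / ante_count[ante]
        if sup >= min_support and conf >= min_conf:
            rules.append((frozenset(ante), label, sup, conf))
    rules.sort(key=lambda r: (r[3], r[2], len(r[0])), reverse=True)
    return rules


def classify(rules, items, default):
    """Label of the first (strongest) rule whose antecedent holds."""
    for ante, label, _, _ in rules:
        if ante <= items:
            return label
    return default
```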
17
Grouping and Associating Spatial Features for
Classification
18
Spatial Classification: Typical Examples
  • Mining volcanoes on Venus
      • Training set provided by experts
      • Model constructed can be used for prediction
  • Finding stars in galaxies (JPL96)
  • QuakeFinder
      • Find earthquakes related to spatial information

19
Spatial Trend Analysis
  • Function
      • Detect changes and trends along a spatial dimension
      • Study the trend of non-spatial or spatial data changing with space
  • Application examples
      • Observe the trend of changes in climate or vegetation with increasing distance from an ocean
      • Crime rate or unemployment rate change with regard to city geo-distribution
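
One simple way to quantify such a trend is a least-squares line of the attribute against distance from a reference feature. The sketch assumes the caller supplies the distance function (in the usage, a hypothetical straight coastline along x = 0) and needs at least two distinct distances; spatial_trend is a hypothetical name, and real trend analysis may use richer regression models.

```python
def spatial_trend(samples, distance_to_ref):
    """Fit value = intercept + slope * distance by ordinary least squares.

    samples: list of (x, y, value); distance_to_ref(x, y) -> distance from
    the reference feature (e.g. an ocean coastline).
    Returns (slope, intercept); slope < 0 means the value falls with distance.
    """
    d = [distance_to_ref(x, y) for x, y, _ in samples]
    v = [val for _, _, val in samples]
    n = len(d)
    md, mv = sum(d) / n, sum(v) / n
    sxx = sum((di - md) ** 2 for di in d)
    sxy = sum((di - md) * (vi - mv) for di, vi in zip(d, v))
    slope = sxy / sxx
    return slope, mv - slope * md


# Hypothetical readings; the coastline is assumed to lie along x = 0.
readings = [(0, 5, 10.0), (10, 1, 8.0), (20, 3, 6.0), (30, 7, 4.0)]
slope, intercept = spatial_trend(readings, lambda x, y: abs(x))
```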

20
Spatial Cluster Analysis
  • Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.
  • Analysis of distinct features of the clusters
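
As a reference point for the partitioning methods listed above, a minimal k-means (Lloyd's algorithm) sketch for 2-D points. Seeding with the first k points is a simplification chosen for determinism; k-medoids would instead restrict centers to actual data points.

```python
import math


def k_means(points, k, iters=100):
    """Lloyd's k-means on 2-D points; initial centers are the first k points."""
    centers = list(points[:k])
    clusters = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        new = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new == centers:  # converged: assignments no longer change
            break
        centers = new
    return centers, clusters
```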

21
Density-Based Cluster Analysis: OPTICS and Its Applications
22
Clustering and Distribution Density Functions
[Figure: density attractor]
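
The density-attractor idea (as in DENCLUE) can be sketched as: estimate density with a Gaussian kernel, hill-climb from each point to a local density maximum (its attractor), and group points that share an attractor, treating attractors with density below a threshold xi as noise. This sketch substitutes mean-shift iterations for DENCLUE's fixed-size gradient steps and builds only center-defined clusters; sigma, xi, merge_eps, and all function names are assumptions.

```python
import math


def _kernel(x, p, sigma):
    return math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2))


def density(x, data, sigma):
    """Gaussian kernel density (unnormalized) at location x."""
    return sum(_kernel(x, p, sigma) for p in data)


def attractor(x, data, sigma, tol=1e-6, max_iter=500):
    """Hill-climb from x to a local density maximum using mean-shift steps."""
    for _ in range(max_iter):
        w = [_kernel(x, p, sigma) for p in data]
        s = sum(w)
        nx = tuple(sum(wi * p[k] for wi, p in zip(w, data)) / s for k in range(len(x)))
        if math.dist(nx, x) < tol:
            return nx
        x = nx
    return x


def center_defined_clusters(data, sigma, xi, merge_eps=None):
    """Group points whose attractors coincide; weak attractors mean noise."""
    merge_eps = sigma / 2 if merge_eps is None else merge_eps
    clusters = []  # list of (attractor, member points)
    for p in data:
        a = attractor(p, data, sigma)
        if density(a, data, sigma) < xi:
            continue  # attractor density below xi: p is treated as noise
        for center, members in clusters:
            if math.dist(center, a) < merge_eps:
                members.append(p)
                break
        else:
            clusters.append((a, [p]))
    return clusters
```

Arbitrary-shaped clusters would additionally merge attractors connected by a path whose density stays above xi.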
23
Center-Defined and Arbitrary Shaped
24
STING: A Statistical Information Grid Approach
  • Wang, Yang, and Muntz (VLDB97)
  • Each cell stores statistical information about the measure, computed at the lowest level and aggregated upward
  • Multi-level resolution
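
A minimal Python sketch of STING's hierarchical grid, assuming a square region split into 2^(levels-1) cells per side at the bottom level, with each cell keeping count, sum (for the mean), min, and max. Parent cells are computed from their children, not from the raw points. STING's per-cell distribution type and its top-down query processing are omitted; CellStats and build_sting are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class CellStats:
    count: int = 0
    total: float = 0.0
    vmin: float = math.inf
    vmax: float = -math.inf

    @property
    def mean(self):
        return self.total / self.count if self.count else None

    def merge(self, other):
        """Combine two cells' statistics without revisiting raw data."""
        return CellStats(self.count + other.count, self.total + other.total,
                         min(self.vmin, other.vmin), max(self.vmax, other.vmax))


def build_sting(points, extent, levels):
    """points: (x, y, value) inside [0, extent)^2.
    Returns grids, finest first; grids[l][(i, j)] -> CellStats.
    The bottom level has 2**(levels-1) cells per side; each level halves it."""
    size = extent / 2 ** (levels - 1)
    bottom = {}
    for x, y, v in points:
        key = (int(x // size), int(y // size))
        bottom[key] = bottom.get(key, CellStats()).merge(CellStats(1, v, v, v))
    grids = [bottom]
    for _ in range(levels - 1):
        parent = {}
        for (i, j), s in grids[-1].items():
            key = (i // 2, j // 2)  # four children share one parent
            parent[key] = parent.get(key, CellStats()).merge(s)
        grids.append(parent)
    return grids
```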

25
WaveCluster
  • G. Sheikholeslami et al. (1998): multi-resolution cluster analysis based on wavelet transformation
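
A heavily simplified Python sketch of the WaveCluster pipeline: quantize points into a grid, apply one level of a Haar-style low-pass filter (2x2 block averaging), keep dense transformed cells, and label their connected components. The original method uses other wavelet filters and several resolutions; the grid size, threshold, and function name are assumptions.

```python
from collections import deque


def wavecluster(points, extent, grid, threshold):
    """Label 2-D points in [0, extent)^2; grid must be even. -1 means noise."""
    size = extent / grid
    counts = [[0] * grid for _ in range(grid)]
    cells = []
    for x, y in points:  # 1. quantize into a grid x grid count array
        i, j = min(int(x // size), grid - 1), min(int(y // size), grid - 1)
        counts[i][j] += 1
        cells.append((i // 2, j // 2))  # cell in the transformed space
    half = grid // 2
    # 2. one Haar-style low-pass level: average of each 2x2 block
    low = [[(counts[2 * a][2 * b] + counts[2 * a + 1][2 * b]
             + counts[2 * a][2 * b + 1] + counts[2 * a + 1][2 * b + 1]) / 4
            for b in range(half)] for a in range(half)]
    # 3. connected components (4-neighbourhood) of dense transformed cells
    label, next_label = {}, 0
    for a in range(half):
        for b in range(half):
            if low[a][b] < threshold or (a, b) in label:
                continue
            label[(a, b)] = next_label
            queue = deque([(a, b)])
            while queue:
                ca, cb = queue.popleft()
                for na, nb in ((ca + 1, cb), (ca - 1, cb), (ca, cb + 1), (ca, cb - 1)):
                    if (0 <= na < half and 0 <= nb < half
                            and low[na][nb] >= threshold and (na, nb) not in label):
                        label[(na, nb)] = next_label
                        queue.append((na, nb))
            next_label += 1
    # 4. map each point back through its transformed cell
    return [label.get(c, -1) for c in cells]
```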

26
Constraints-Based Clustering
  • Constraints on individual objects
      • Simple selection of such objects before clustering
  • Clustering parameters as constraints
      • k in k-means; radius and min. number of points in density-based clustering
  • Constraints imposed by physical obstacles
      • Clustering with obstructed distance
  • Constraints specified on clusters using SQL aggregates
      • Sum of the profits in each cluster > 1 million
      • Average sales in each cluster > 20 million
      • Min. number of golden customers (in each cluster) > 1000
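
The SQL-aggregate constraints above can be written as predicates over a cluster's members. The sketch only checks candidate clusters against such constraints; a constraint-based clustering algorithm would use checks like these inside its search. The record fields profit, sales, and golden (number of golden customers per record) are hypothetical.

```python
def check_cluster_constraints(clusters, constraints):
    """clusters: {cluster_id: list of record dicts};
    constraints: list of (description, predicate over a list of records).
    Returns {cluster_id: [descriptions of violated constraints]}."""
    violations = {}
    for cid, members in clusters.items():
        failed = [desc for desc, pred in constraints if not pred(members)]
        if failed:
            violations[cid] = failed
    return violations


# The three aggregate constraints from the slide, over hypothetical fields.
SLIDE_CONSTRAINTS = [
    ("sum(profit) > 1 million",
     lambda m: sum(r["profit"] for r in m) > 1_000_000),
    ("avg(sales) > 20 million",
     lambda m: sum(r["sales"] for r in m) / len(m) > 20_000_000),
    ("count(golden customers) > 1000",
     lambda m: sum(r["golden"] for r in m) > 1000),
]
```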

27
Constraint-Based Clustering: Planning ATM Locations
[Figure: clusters C1 to C4 among obstacles (a river crossed by a bridge, and a mountain). Panels: "Spatial data with obstacles" and "Clustering without taking obstacles into consideration"]
28
Clustering with Spatial Obstacles
[Figure panels: "Taking obstacles into account" vs. "Not taking obstacles into account"]
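
Obstructed distance is the length of the shortest path that does not cross an obstacle. A minimal Python sketch, assuming obstacles are line segments (for example, a river drawn as a polyline, with a gap where a bridge crosses): build a visibility graph over the two endpoints and the obstacle vertices, then run Dijkstra's algorithm. Function names are hypothetical, and a practical system would precompute and index the visibility graph.

```python
import heapq
import math


def _orient(a, b, c):
    """Sign of the turn a -> b -> c (positive: counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _crosses(p, q, a, b):
    """True if segment pq properly crosses segment ab (touching is allowed)."""
    return (_orient(a, b, p) * _orient(a, b, q) < 0
            and _orient(p, q, a) * _orient(p, q, b) < 0)


def obstructed_distance(s, t, obstacles):
    """Shortest path length from s to t that never crosses an obstacle.
    obstacles: list of ((x1, y1), (x2, y2)) segments."""
    nodes = [s, t] + [pt for seg in obstacles for pt in seg]

    def visible(p, q):
        return not any(_crosses(p, q, a, b) for a, b in obstacles)

    best = {0: 0.0}
    heap = [(0.0, 0)]  # (distance so far, node index); node 1 is t
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        if u == 1:
            return d
        done.add(u)
        for v in range(len(nodes)):
            if v not in done and visible(nodes[u], nodes[v]):
                nd = d + math.dist(nodes[u], nodes[v])
                if nd < best.get(v, math.inf):
                    best[v] = nd
                    heapq.heappush(heap, (nd, v))
    return math.inf
```

With a river from (0, -5) to (0, 5), the path from (-1, 0) to (1, 0) must detour around an end (2·√26, about 10.2, instead of 2); a gap in the river acting as a bridge restores the direct distance.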
29
Towards a Spatial Data Mining System: An Architecture
[Figure: architecture diagram with components Graphic User Interface; Geo-OLAP Analyzer, Geo-Clustor, Geo-Classifier, Geo-Associator, Geo-Predictor, and future modules; Spatial Database and Warehouse Server (meta data, hierarchy); Non-Spatial DB; Spatial DB]
30
Outline
  • Why geo-spatial data mining?
  • Spatial data mining major progress
  • Spatial OLAP
  • Spatial association
  • Spatial classification
  • Spatial clustering and outlier analysis
  • Research challenges in spatial data mining

31
Research Challenges in Spatial Data Mining
  • Mining temporal spatial data
  • Mining spatial-related stream data
  • Spatial data mining applications (land use,
    bio-medical)

32
Conclusions
  • Spatial data mining vs. traditional spatial analysis
      • Scalability, architecture, functions, methods
  • Good progress has been made on spatial data mining
      • OLAP, association, clustering, classification, outlier analysis, etc.
  • Still lots to be done! A young and promising direction
  • Joint efforts (from multiple disciplines) lead to joyous promises!

33
http://www.cs.uiuc.edu/hanj
  • Thank you !!!

34
Some References on Spatial Data Mining
  • H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2001.
  • M. Ester, A. Frommelt, H.-P. Kriegel, and J. Sander, "Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support", Data Mining and Knowledge Discovery, an International Journal, 4, 2000, pp. 193-216.
  • J. Han, M. Kamber, and A. K. H. Tung, "Spatial Clustering Methods in Data Mining: A Survey", in H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2000.
  • Y. Bedard, T. Merrett, and J. Han, "Fundamentals of Geospatial Data Warehousing for Geographic Knowledge Discovery", in H. Miller and J. Han (eds.), Geographic Data Mining and Knowledge Discovery, Taylor and Francis, 2000.