Title: Data Mining in Spatial Databases: A Multi-Disciplinary Promise
1. Data Mining in Spatial Databases: A Multi-Disciplinary Promise
- Jiawei Han
- Database Systems Research Lab.
- Department of Computing Science
- University of Illinois at Urbana-Champaign
- http://www.cs.uiuc.edu/hanj
2. Outline
- Why geo-spatial data mining?
- Spatial data mining: major progress
- Spatial OLAP
- Spatial association
- Spatial classification
- Spatial clustering and outlier analysis
- Research challenges in spatial data mining
3. Why Geo-Spatial Data Mining?
- Spatial data mining
- Mining interesting knowledge/patterns from huge amounts of spatial data
- Necessity is the mother of invention
- Data explosion problem: data is overwhelming and everywhere (automated data collection, satellite images, remote sensing, GPS, mobile computing and network technology, WWW, etc.)
- Putting data to use: data mining may lead to important discoveries
4. Spatial Data Mining vs. Traditional Spatial Data Analysis
- Scalability and performance
- Handle gigabytes of data, interactive exploration, multi-dimensional drilling/rolling, visualization, ...
- Tight integration of database systems and GIS systems
- Most spatial/aspatial data is stored in relational database systems (e.g., Oracle, MS SQL Server, DB2, Informix), GIS (e.g., ArcInfo, MapInfo), or data warehouses
- Tight coupling and seamless integration
- Data cleaning, data integration, and data consolidation
- New methods and functionalities
- Association, sequential patterns, classification methods, ...
5. Spatial Data Mining: Confluence of Multiple Disciplines
[Figure: Spatial Data Mining at the center, surrounded by the contributing disciplines listed below.]
- Spatial DB System
- Statistics
- Machine Learning (AI)
- Visualization
- Geography
- Mobile Computing
- Remote Sensing
6. Outline
- Why geo-spatial data mining?
- Spatial data mining: major progress
- Spatial OLAP
- Spatial association
- Spatial classification
- Spatial clustering and outlier analysis
- Research challenges in spatial data mining
7. Spatial Data Mining: Major Progress
- Geo-spatial data warehouse and spatial OLAP
- Spatial data classification/predictive modeling
- Spatial clustering/segmentation
- Spatial association and correlation analysis
- Spatial regression analysis
- Spatio-temporal pattern analysis
- Many more to be explored
8. Spatial Data Warehousing
- Spatial data warehouse
- Integrated, subject-oriented, time-variant, and nonvolatile spatial data repository for data analysis
- Spatial data integration: a big issue
- Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
- Vendor-specific formats (ESRI, MapInfo, Intergraph, etc.)
- Spatial data cube: multidimensional spatial database
- Both dimensions and measures may contain spatial components
9. Star Schema of the BC Weather Warehouse
- Spatial data warehouse
- Dimensions
- region_name
- time
- temperature
- precipitation
- Measurements
- region_map
- area
- count
[Figure: star schema diagram showing a fact table linked to dimension tables.]
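A minimal sketch of how a roll-up over such a cube might aggregate the three measures, assuming made-up fact rows and hypothetical names (`FACTS`, `roll_up`). Here `region_map` is kept as a list of region identifiers (pointers) rather than merged geometry, which is only one possible design and not necessarily the one used in the BC weather warehouse.

```python
from collections import defaultdict

# Hypothetical fact rows for illustration only; all values are made up.
FACTS = [
    {"region_name": "R1", "time": "Jan", "temperature": "cold", "precipitation": "wet", "area": 120.0},
    {"region_name": "R2", "time": "Jan", "temperature": "cold", "precipitation": "dry", "area": 80.0},
    {"region_name": "R3", "time": "Jan", "temperature": "mild", "precipitation": "wet", "area": 50.0},
    {"region_name": "R1", "time": "Feb", "temperature": "cold", "precipitation": "wet", "area": 120.0},
]


def roll_up(facts, dims):
    """Group fact rows by the dimensions in `dims` and aggregate the measures."""
    regions_per_cell = defaultdict(dict)  # cell key -> {region_name: area}
    rows_per_cell = defaultdict(int)      # cell key -> number of fact rows
    for row in facts:
        key = tuple(row[d] for d in dims)
        regions_per_cell[key][row["region_name"]] = row["area"]
        rows_per_cell[key] += 1
    cube = {}
    for key, regions in regions_per_cell.items():
        cube[key] = {
            "region_map": sorted(regions),   # pointers to regions, not merged polygons
            "area": sum(regions.values()),   # each distinct region is counted once
            "count": rows_per_cell[key],     # number of fact rows in the cell
        }
    return cube


by_temperature = roll_up(FACTS, ["temperature"])
by_month_and_temperature = roll_up(FACTS, ["time", "temperature"])
```

Rolling up from (time, temperature) to temperature alone merges the January and February cells for "cold", so the same region can contribute several fact rows while its area is still counted once.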
10. Spatial OLAP: OLAP on Map Data
11. Dynamic Merging of Spatial Objects?
- Materializing (precomputing) all? Too much storage space
- On-line merge? Slow, expensive!
- A better way: object-based, selective (partial) materialization
12. Spatial Association and Correlation Mining
What kinds of objects are usually located close to golf courses?
    FIND SPATIAL ASSOCIATION RULE DESCRIBING "Golf Course"
    FROM Washington_Golf_courses, Washington
    WHERE CLOSE_TO(Washington_Golf_courses.Obj, Washington.Obj, "3 km")
      AND Washington.CFCC <> "D81"
    IN RELEVANCE TO Washington_Golf_courses.Obj, Washington.Obj, CFCC
    SET SUPPORT THRESHOLD 0.5
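The support test in the query above can be sketched in a few lines. This is an illustration under simplifying assumptions, not the query processor itself: objects are reduced to points in planar kilometre coordinates, the data is made up, and the names `close_to` and `neighbor_class_support` are hypothetical.

```python
import math


def close_to(a, b, max_km):
    """True if two points (planar coordinates in km) lie within max_km of each other."""
    return math.dist(a, b) <= max_km


def neighbor_class_support(targets, others, max_km):
    """Fraction of target objects that have at least one object of each class nearby."""
    hits = {}
    for t in targets:
        nearby_classes = {cls for cls, pos in others if close_to(t, pos, max_km)}
        for cls in nearby_classes:
            hits[cls] = hits.get(cls, 0) + 1
    return {cls: n / len(targets) for cls, n in hits.items()}


# Made-up coordinates in km, purely illustrative.
golf_courses = [(0, 0), (10, 0), (20, 0), (30, 0)]
other_objects = [
    ("lake", (1, 1)), ("lake", (10, 2)), ("lake", (21, 0)),
    ("school", (0, 2)), ("park", (100, 100)),
]

support = neighbor_class_support(golf_courses, other_objects, max_km=3)
# Keep only the classes that meet the 0.5 support threshold from the query.
frequent = {cls: s for cls, s in support.items() if s >= 0.5}
```

With this toy data, three of the four golf courses have a lake within 3 km, so only the "close to a lake" relationship passes the threshold.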
13. Efficient Mining of Spatial Associations
- Progressive refinement
- Hierarchy of spatial relationships
- g_close_to: near_by, touch, intersect, contain, etc.
- First search for a rough relationship and then refine it
- Rough spatial computation (as a filter)
- Using MBR or R-tree for rough estimation
- Detailed spatial algorithm (as refinement)
- Apply only to those objects which have passed the rough spatial association test (no less than min_support)
- Micro-clustering and join indexing methods
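The filter-and-refine idea can be sketched as follows. This is a simplified illustration, assuming each object is a small set of points and that "close" means the minimum point-to-point distance is within a threshold; the function names are hypothetical, and the min_support pruning between the two steps is left out. Because the distance between two minimum bounding rectangles (MBRs) never exceeds the exact distance, the rough step cannot discard a true match.

```python
import math


def mbr(points):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def mbr_distance(a, b):
    """Smallest distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def exact_distance(p, q):
    """Expensive step: minimum distance over all pairs of points."""
    return min(math.dist(u, v) for u in p for v in q)


def close_pairs(objects, max_dist):
    """Return (candidates after the MBR filter, pairs confirmed by the exact test)."""
    names = sorted(objects)
    boxes = {n: mbr(objects[n]) for n in names}
    candidates = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if mbr_distance(boxes[a], boxes[b]) <= max_dist
    ]
    confirmed = [
        (a, b) for a, b in candidates
        if exact_distance(objects[a], objects[b]) <= max_dist
    ]
    return candidates, confirmed


# Made-up objects for illustration.
objects = {
    "A": [(0, 0), (2, 0), (0, 2)],
    "B": [(3, 3), (5, 5)],
    "C": [(2.5, 0), (4, 0)],
    "D": [(50, 50)],
}
candidates, confirmed = close_pairs(objects, max_dist=2)
```

Here the pair (A, B) survives the rough filter because the two rectangles are about 1.4 apart, but the exact computation rejects it; only (A, C) is confirmed.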
14. Spatial Classification and Model Construction
- Generalization- or clustering-based induction
- Interactive classification
15. Can Typical Classification Methods Be Applied to Spatial Classification?
- Decision-tree classification
- Entropy-based information gain vs. Gini index vs. MDL
- Tree pruning methods, boosting/bagging
- Naïve Bayesian classifier, boosting
- Bayesian belief networks
- Neural network
- Genetic programming
- Nearest neighbor and case-based reasoning
- Support vector machine method
- Association-based multi-dimensional classification
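As a small illustration of two of the split criteria named above, the sketch below computes entropy-based information gain and the Gini-index reduction for a candidate split. The house labels (H/L) and the two spatial predicates are hypothetical, chosen only to show how a spatial feature could be scored like any other attribute.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(labels, groups, impurity):
    """Impurity of the parent minus the size-weighted impurity of the groups."""
    n = len(labels)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups)


# Hypothetical house values, split by two made-up spatial predicates.
labels = ["H", "H", "L", "L"]
by_near_lake = [["H", "H"], ["L", "L"]]      # separates the classes perfectly
by_near_highway = [["H", "L"], ["H", "L"]]   # tells us nothing

lake_info_gain = split_gain(labels, by_near_lake, entropy)
lake_gini_gain = split_gain(labels, by_near_lake, gini)
highway_info_gain = split_gain(labels, by_near_highway, entropy)
```

Both criteria rank the "near lake" split above the "near highway" split in this toy case; they can disagree on less clear-cut data.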
16. What Kind of Houses Are Highly Valued? Associative Classification
[Figure: map of houses labeled H or L, grouped into regions C01 to C06 and C08 to C10, with a highway and a lake marked.]
17. Grouping and Associating Spatial Features for Classification
18. Spatial Classification: Typical Examples
- Mining volcanoes on Venus
- Training set provided by experts
- Model constructed can be used for prediction
- Finding stars in galaxies (JPL96)
- QuakeFinder
- Find earthquakes related to spatial information
19. Spatial Trend Analysis
- Function
- Detect changes and trends along a spatial dimension
- Study the trend of non-spatial or spatial data changing with space
- Application examples
- Observe the trend of changes in climate or vegetation with increasing distance from an ocean
- Crime rate or unemployment rate change with regard to city geo-distribution
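One simple way to quantify such a trend is an ordinary least-squares line of the attribute against distance. The sketch below is an assumption about how a trend could be measured, not a method stated in the slides; the temperature readings are made up.

```python
def linear_trend(distances, values):
    """Least-squares slope and intercept of values as a function of distance."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up readings: distance from the ocean (km) and mean temperature (deg C).
distance_km = [0, 10, 20, 30]
temperature = [20.0, 18.0, 16.0, 14.0]

slope, intercept = linear_trend(distance_km, temperature)
```

A negative slope here would read as "temperature falls by about 0.2 degrees per kilometre inland" for this toy data set.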
20. Spatial Cluster Analysis
- Mining clusters: k-means, k-medoids, hierarchical, density-based, etc.
- Analysis of distinct features of the clusters
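Of the methods listed, k-means is the shortest to sketch. The version below is a plain Lloyd-style iteration on 2-D points with caller-supplied starting centers (an assumption made here so that the result is deterministic); the sample points are made up.

```python
import math


def k_means(points, initial_centers, max_iter=100):
    """Lloyd-style k-means on coordinate tuples; returns (centers, clusters)."""
    centers = [tuple(map(float, c)) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Made-up points forming two obvious groups.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers, clusters = k_means(points, initial_centers=[(0, 0), (10, 0)])
```

Because k-means assigns points by straight-line distance to a center, it favors compact, roughly round clusters; density-based methods are the usual choice for arbitrarily shaped ones.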
21. Density-Based Cluster Analysis: OPTICS and Its Applications
22. Clustering and Distribution Density Functions: Density Attractor
23. Center-Defined and Arbitrary Shaped
24. STING: A Statistical Information Grid Approach
- Wang, Yang and Muntz (VLDB'97)
- Each cell stores statistical distribution of measure at low level
- Multi-level resolution
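A simplified sketch of the grid idea: the bottom layer keeps summary statistics per cell, and each coarser layer is derived from its child cells without going back to the raw points. The choice of statistics (count, sum, min, max), the 2 x 2 grouping, and the function names are assumptions for illustration; the sample readings are made up.

```python
def leaf_layer(points, cell_size):
    """Bottom layer: statistics of `value` for every non-empty grid cell."""
    cells = {}
    for x, y, value in points:
        key = (int(x // cell_size), int(y // cell_size))
        stats = cells.setdefault(key, {"n": 0, "sum": 0.0, "min": value, "max": value})
        stats["n"] += 1
        stats["sum"] += value
        stats["min"] = min(stats["min"], value)
        stats["max"] = max(stats["max"], value)
    return cells


def parent_layer(cells):
    """Next coarser layer: each parent summarizes a 2 x 2 block of child cells."""
    parents = {}
    for (col, row), child in cells.items():
        key = (col // 2, row // 2)
        stats = parents.setdefault(
            key, {"n": 0, "sum": 0.0, "min": child["min"], "max": child["max"]}
        )
        stats["n"] += child["n"]
        stats["sum"] += child["sum"]
        stats["min"] = min(stats["min"], child["min"])
        stats["max"] = max(stats["max"], child["max"])
    return parents


def mean(stats):
    """Mean of the measure in a cell, derived from the stored count and sum."""
    return stats["sum"] / stats["n"]


# Made-up (x, y, value) readings.
samples = [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0), (3.5, 3.5, 30.0)]
leaves = leaf_layer(samples, cell_size=1.0)
top = parent_layer(leaves)
```

A query can then start at the coarse layer and descend only into cells whose statistics look relevant, which is what makes the multi-level resolution useful.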
25. WaveCluster
- G. Sheikholeslami, et al. (1998): multiple wavelet transformation-based cluster analysis
26. Constraint-Based Clustering
- Constraints on individual objects
- Simple selection of such objects before clustering
- Clustering parameters as constraints
- K-means, density-based: radius, min-# of points
- Constraints imposed by physical obstacles
- Clustering with Obstructed Distance
- Constraints specified on clusters using SQL aggregates
- Sum of the profits in each cluster > 1 million
- Average sales in each cluster > 20 million
- Min # of golden customers (in each cluster) > 1000
27. Constraint-Based Clustering: Planning ATM Locations
[Figure: two panels, "Spatial data with obstacles" and "Clustering without taking obstacles into consideration", showing clusters C1 to C4 with a river, a bridge, and a mountain.]
28. Clustering with Spatial Obstacles
[Figure: two panels comparing clustering results, "Taking obstacles into account" and "Not taking obstacles into account".]
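A toy illustration of why obstacles change the result. It assumes a single straight river along the line x = river_x with one bridge on it, which is far simpler than a general obstructed-distance computation; the function name and all coordinates are hypothetical.

```python
import math


def obstructed_distance(p, q, river_x, bridge_y):
    """Travel distance when a river along x = river_x can be crossed only at the bridge.

    Assumes neither point lies on the river itself.
    """
    same_side = (p[0] - river_x) * (q[0] - river_x) > 0
    if same_side:
        return math.dist(p, q)
    bridge = (river_x, bridge_y)
    return math.dist(p, bridge) + math.dist(bridge, q)


# Made-up layout: two candidate ATM sites and one customer east of the river.
centers = {"A": (-1.0, 0.0), "B": (4.0, 0.0)}
customer = (1.0, 0.0)

nearest_euclidean = min(centers, key=lambda c: math.dist(customer, centers[c]))
nearest_obstructed = min(
    centers,
    key=lambda c: obstructed_distance(customer, centers[c], river_x=0.0, bridge_y=10.0),
)
```

By straight-line distance the customer belongs to site A across the river, but once the detour over the bridge is counted, site B on the same bank is closer.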
29. Towards a Spatial Data Mining System: An Architecture
[Figure: layered architecture diagram with the components listed below, from top to bottom.]
- Graphic User Interface
- Geo-OLAP Analyzer, Geo-Clustor, Geo-Classifier, Geo-Associator, Geo-Predictor, Future Modules
- Spatial Database and Warehouse Server
- meta data hierarchy, Non-Spatial DB, Spatial DB
30. Outline
- Why geo-spatial data mining?
- Spatial data mining: major progress
- Spatial OLAP
- Spatial association
- Spatial classification
- Spatial clustering and outlier analysis
- Research challenges in spatial data mining
31. Research Challenges in Spatial Data Mining
- Mining temporal spatial data
- Mining spatial-related stream data
- Spatial data mining applications (land use,
bio-medical)
32. Conclusions
- Spatial data mining vs. traditional spatial analysis
- Scalability, architecture, functions, methods
- Good progress has been made on spatial data mining
- OLAP, association, clustering, classification, outlier analysis, etc.
- Still lots to be done! Young and promising direction
- Joint efforts (from multiple disciplines) lead to joyous promises!
33. http://www.cs.uiuc.edu/hanj