Spatial Congeries Pattern Mining

Provided by: HKUC

Title: Spatial Congeries Pattern Mining

1
Spatial Congeries Pattern Mining

Presented by Iris Zhang
Supervisor: Dr. David Cheung
24 October 2003

2
Outline

Introduction
Motivation
Related work
Formal definition
Algorithms
Experiments
Conclusion

3
Introduction

KDD
Discovery of interesting, implicit, and previously unknown knowledge from large databases [FPM91]
Spatial data mining
Extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases [KH95]

4
Features of Spatial Data Mining

Spatial autocorrelation
"Everything is related to everything else, but nearby things are more related than distant things" (Tobler, 1979)
Spatial heterogeneity
The variation in spatial data is a function of location

5
Motivation

A famous historical example
In 1909, the residents of Colorado Springs were found to have healthy teeth, and the local drinking water had a high level of fluoride.
Researchers confirmed the positive role of fluoride in controlling tooth decay.
Pattern: {healthy teeth, high level of fluoride}

6
Motivation (Cont.)

Another case

[HSX02]

7
Related work

Neighboring Class Sets Mining
Co-location Pattern Mining

8
Neighboring Class Sets

Access records of mobile services

9
Neighboring Class Sets

Neighboring class sets
((timetable, ticket), 4),
((timetable, weather), 3),
((ticket, weather), 2),
((timetable, ticket, weather), 2)

[Mor01]

10
Neighboring Class Sets

Grouping of points

[Mor01]

11
Neighboring Class Sets

Grouping of points

[Mor01]

12
Neighboring Class Sets

Grouping of points

[Mor01]

13
Neighboring Class Sets

Apriori generation of valid instances

[Mor01]

14
Problems

Undercounts the number of instances
Depends on the order of classes when generating instances for a k-neighboring class set (k > 2)
Uses an absolute number as the support threshold

15
Co-location Patterns Mining

Co-location: a subset of Boolean features
E.g. (drought, El Niño, substantial increase in vegetation, extremely high precipitation)

16
Co-location Patterns Mining

Row instance I = {i1, i2, ..., ik} of a co-location C = {f1, f2, ..., fk}
ij is an instance of fj (j = 1, 2, ..., k)
ip and iq are neighbors of each other
E.g. (A.1, B.1) is a row instance of co-location {A, B}
Table instance T of C is the set of all row instances of C
E.g. {(A.1, B.1), (A.2, B.4), (A.3, B.4)} is the table instance of {A, B}

17
Co-location Patterns Mining

Participant ratio for feature fi
Pr({A,B}, A) = 3/4 = 75%, Pr({A,B}, B) = 2/5 = 40%
Participant index of a co-location C
Pi({A,B}) = min(0.75, 0.4) = 0.4
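
The two measures on this slide can be checked with a small sketch. It assumes a row instance is a tuple of (feature, instance id) pairs and that A has 4 instances and B has 5 in total, which is read off the denominators 3/4 and 2/5. The function names are illustrative, not taken from [HSX02].

```python
from collections import defaultdict

def participation_ratios(table_instance, feature_counts):
    """Share of each feature's instances that occur in at least one row instance."""
    participants = defaultdict(set)
    for row in table_instance:
        for feature, instance_id in row:
            participants[feature].add(instance_id)
    return {f: len(participants[f]) / feature_counts[f] for f in feature_counts}

def participation_index(table_instance, feature_counts):
    """Minimum participation ratio over the features of the co-location."""
    return min(participation_ratios(table_instance, feature_counts).values())

# Table instance of {A, B} from the slide: (A.1,B.1), (A.2,B.4), (A.3,B.4)
table_ab = [(("A", 1), ("B", 1)), (("A", 2), ("B", 4)), (("A", 3), ("B", 4))]
# Assumed totals per feature, taken from the denominators on the slide
counts = {"A": 4, "B": 5}

ratios = participation_ratios(table_ab, counts)
index = participation_index(table_ab, counts)
print(ratios, index)
```

B.4 appears in two row instances but is counted once, which is why B's ratio is 2/5 rather than 3/5.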

18
Co-location Pattern Mining

Co-location rule: C1 → C2 (p, cp)
C1 and C2 are co-locations
C1 ∩ C2 = ∅
p: participant index, cp: conditional probability
E.g. A → B (40%, 75%)
Conditional probability of a co-location rule

19
Co-location Patterns Mining

Apriori-property
Participant index is monotonically non-increasing as the size of the co-location increases
Apriori-like mining algorithm
Candidate generation
Instances generation

20
Co-location Patterns Mining

Candidate generation
Join
Prune

21
Co-location Patterns Mining

Instance generation
Geometric approach
Rtree join
Combinatorial approach
Sort-merge join
Hybrid approach
Rtree join to get instances of size 2 co-locations
Sort-merge join to get instances of size k (k > 2) co-locations
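
A minimal sketch of the hybrid approach, under stated assumptions: instances are tuples of point ids, "neighbor" means Euclidean distance at most d, and the size 2 step is a brute-force distance check standing in for a real Rtree join. The coordinates and names are made up for illustration.

```python
from math import dist

def size2_instances(ids_a, ids_b, coords, d):
    """All pairs (a, b) within distance d. Brute-force stand-in for an Rtree join."""
    return [(a, b) for a in sorted(ids_a) for b in sorted(ids_b)
            if dist(coords[a], coords[b]) <= d]

def sort_merge_join(left, right, is_neighbor):
    """Join two size k-1 instance lists that agree on their first k-2 members.

    A size k instance is emitted when the two differing last members are neighbors.
    """
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lp, rp = left[i][:-1], right[j][:-1]
        if lp < rp:
            i += 1
        elif lp > rp:
            j += 1
        else:
            # Find the run of rows sharing this prefix on each side.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][:-1] == lp:
                i_end += 1
            while j_end < len(right) and right[j_end][:-1] == lp:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    if is_neighbor(lrow[-1], rrow[-1]):
                        out.append(lrow + (rrow[-1],))
            i, j = i_end, j_end
    return out

# Hypothetical points, not taken from the slides
coords = {"A.1": (0, 0), "A.2": (10, 0), "B.1": (1, 0),
          "B.2": (10, 1), "C.1": (1, 1), "C.2": (30, 30)}
D = 2

ab = size2_instances(["A.1", "A.2"], ["B.1", "B.2"], coords, D)
ac = size2_instances(["A.1", "A.2"], ["C.1", "C.2"], coords, D)
abc = sort_merge_join(ab, ac, lambda p, q: dist(coords[p], coords[q]) <= D)
print(ab, ac, abc)
```

Only the size 2 step touches coordinates of all points. The larger sizes reuse the already materialised instance lists and check a single extra pair per joined row.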

22
Co-location Patterns Mining

Example

The participant index measure may overrate some co-locations
The features are binary

Pr({A,B}, A) = 2/8 = 25%
Pr({A,B}, B) = 6/6 = 100%
Pi({A,B}) = min(25%, 100%) = 25%
B → A (25%, 100%)
A → B (25%, 25%)
Probability(A,B) = 7/(8×6) ≈ 15%

24
Spatial Congeries Patterns Mining

Input
D = {D1, D2, ..., Dn}
Spatial relation: a rule to regulate the relation of objects in patterns
min_fre: threshold to determine whether an itemset is frequent
Output
Complete set of Spatial Congeries patterns

25
Spatial Congeries Patterns Mining

Example of datasets

Attribute values can be translated to categorical values
{VD=10, WD=shallow, DOP=near, NL=existent} can be a pattern

26
Formal Definition

Item: an attribute value in a dataset. I is the set of all items.
E.g. water depth = shallow
Itemset: a subset of I
E.g. {VD=10, WD=shallow, DOP=near, N=existent}
E.g. {VD=100, WD=deep, DOP=far, N=absent}

27
Formal Definition

Spatial relation: a rule to regulate the spatial relation of objects in patterns
Instances of an item i: the points which have attribute value i
Instances of an itemset: if instances of all items in the itemset satisfy the spatial relation, the combination of these instances is an instance of the itemset.

28
Observation

The number of instances of an itemset is not monotonically non-increasing as the itemset grows
E.g. one instance of {triangle, circle} can form two instances of {triangle, circle, rectangle}
Conclusion: the number of instances of an itemset cannot be used as the measure to determine whether the itemset is a pattern

29
Formal Definition

Frequency of an itemset
Number of instances of the itemset divided by the number of all possible combinations of instances of its items
E.g. Frequency(A,B) = 7/(8×6) ≈ 15%
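
As a quick check of the definition, here is a sketch that takes the instance count of the itemset and the instance counts of its items as plain numbers. The function name and signature are illustrative only.

```python
from math import prod

def frequency(num_instances, item_counts):
    """Instances of the itemset over all possible combinations of item instances."""
    return num_instances / prod(item_counts)

# Slide example: 7 instances of {A, B}, with 8 instances of A and 6 of B
freq_ab = frequency(7, [8, 6])
print(f"{freq_ab:.1%}")  # 14.6%, which the slide rounds to 15%
```

The denominator grows with every item added, which is what lets this measure stay non-increasing even when the raw instance count goes up.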

30
Formal Definition

Spatial Congeries pattern
If the frequency of an itemset is no less than the frequency threshold min_fre, the itemset is a Spatial Congeries pattern.

31
Property of Frequency

Lemma: the frequency of an itemset is monotonically non-increasing as the size of the itemset increases.
Proof (simplified)
For a size k-1 itemset I_{k-1} = {v_1, v_2, ..., v_{k-1}} and a size k itemset I_k = {v_1, v_2, ..., v_{k-1}, v_k}

m_q is the number of instances of I_q; n_q is the number of instances of item v_q.

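
The inequality itself did not survive in the transcript. One way to complete the argument is sketched below. It assumes that dropping v_k from any instance of I_k leaves a valid instance of I_{k-1} (the spatial relation still holds on the remaining members), so each instance of I_k is an instance of I_{k-1} paired with one instance of v_k. The symbol f for frequency is borrowed from the deck's later sample and is an assumption here.

```latex
m_k \le m_{k-1} \cdot n_k
\quad\Longrightarrow\quad
f(I_k) = \frac{m_k}{\prod_{q=1}^{k} n_q}
\le \frac{m_{k-1} \cdot n_k}{\left(\prod_{q=1}^{k-1} n_q\right) \cdot n_k}
= \frac{m_{k-1}}{\prod_{q=1}^{k-1} n_q}
= f(I_{k-1})
```
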
32
Algorithm-1

Step 1: generate the complete set of size 2 patterns by Rtree-join on complete Rtrees

33
Algorithm-1

Step 1: generate the complete set of size 2 patterns by Rtree-join on complete Rtrees

34
Algorithm-1

Step 1: generate the complete set of size 2 patterns by Rtree-join on complete Rtrees

35
Algorithm-1

Step 1: generate the complete set of size 2 patterns by Rtree-join on complete Rtrees

36
Algorithm-1

Step 2: generate size k (k > 2) patterns level by level
Generate size k (k > 2) candidates
Join two size k-1 patterns
Prune those candidates which have subsets that are not frequent
Generate size k (k > 2) instances

37
Sample
Square: a1, Triangle: a2, Circle: b1, Diamond: c1

38
Process of Algorithm-1

Rtree-join (RJ) to find the instances of size 2 candidates
Build an Rtree for each dataset A, B and C
Do RJ to find the instances of size 2 candidates
m(a1b1) = 5, m(a2b1) = 3, m(a1c1) = 2, m(a2c1) = 0, m(b1c1) = 0
Get size 2 patterns a1b1, a2b1, a1c1 according to the frequency threshold 50%
f(a1b1) = 5/(3×3) ≈ 56%, f(a2b1) = 3/(2×3) = 50%, f(a1c1) = 2/(3×1) ≈ 67%, f(a2c1) = 0, f(b1c1) = 0

39
Process of Algorithm-1

Sort-merge-join to find the instances of size k (k > 2) candidates
Generate size 3 candidates
Join size 2 patterns a1b1 and a1c1 to form a1b1c1
Prune a1b1c1 because b1c1 is not a pattern
Get size 3 patterns (there are no size 3 patterns)
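
The numbers in this walkthrough can be reproduced with a short sketch. The instance counts come from the slides, and the per-item totals (3, 2, 3, 1) are read off the denominators. Two things are assumptions made for illustration: the first letter of an item names its dataset, and a candidate takes at most one item per dataset. The instance generation by Rtree-join and sort-merge join is not modelled here.

```python
from itertools import combinations

# Instance counts per item (from the denominators) and per size 2 candidate (from the slide)
n = {"a1": 3, "a2": 2, "b1": 3, "c1": 1}
m2 = {("a1", "b1"): 5, ("a2", "b1"): 3, ("a1", "c1"): 2,
      ("a2", "c1"): 0, ("b1", "c1"): 0}
MIN_FRE = 0.5

def frequency(itemset, num_instances):
    """Instances of the itemset over all combinations of its items' instances."""
    total = 1
    for item in itemset:
        total *= n[item]
    return num_instances / total

def dataset_of(item):
    """Assumed naming convention: 'a1' belongs to dataset 'a'."""
    return item[0]

def join_and_prune(patterns, k):
    """Return (joined candidates, candidates that survive Apriori pruning)."""
    joined, kept = set(), set()
    for p, q in combinations(sorted(patterns), 2):
        merged = tuple(sorted(set(p) | set(q)))
        # Keep only joins that add exactly one item and mix k different datasets.
        if len(merged) != k or len({dataset_of(i) for i in merged}) != k:
            continue
        joined.add(merged)
        # Prune when any size k-1 subset is not a pattern.
        if all(sub in patterns for sub in combinations(merged, k - 1)):
            kept.add(merged)
    return joined, kept

patterns2 = {pair for pair, m in m2.items() if frequency(pair, m) >= MIN_FRE}
joined3, candidates3 = join_and_prune(patterns2, 3)
print(sorted(patterns2), joined3, candidates3)
```

The join produces a1b1c1, and pruning drops it because b1c1 is missing from the size 2 patterns, so no size 3 instances need to be generated.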

40
Algorithm-2

Step 1: generate all patterns for a combination of subsets. Each subset corresponds to an item. All points in the subset have the item as their attribute value.
E.g. The first combination is a1b1c1. It needs to build Rtrees for the subsets of a1, b1, c1 in order to generate size 2 patterns. Then it does a sort-merge join to generate size k (k > 2) patterns.
Step 2: generate all patterns for another combination, until no combination remains
E.g. The second combination is a2b1c1.

41
Process of Algorithm-2

Generate patterns for combination a1b1c1
RJ on Rtrees for a1, b1 and c1 to get instances of candidates a1b1, a1c1, b1c1
Suppose a1b1 and a1c1 are patterns; the size 3 candidate is a1b1c1
Sort-merge-join to get instances of a1b1c1
Generate patterns for combination a2b1c1
RJ on Rtrees for a2, b1, c1 to get instances of candidates a2b1 and a2c1. Because the instances of b1c1 have already been generated, there is no need to do it again
Suppose a2b1 is a pattern
There is no size 3 candidate
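
The order of work in Algorithm-2 can be sketched as a schedule: for each combination (one item per dataset), list the size 2 candidates that still need an Rtree-join, skipping pairs handled by an earlier combination. This only models the bookkeeping. The function name and the use of a set as the cache are assumptions, and the joins themselves are left out.

```python
from itertools import combinations, product

def pair_schedule(items_per_dataset):
    """For each combination, the size 2 candidates not yet joined by an earlier one."""
    done = set()
    schedule = []
    for combo in product(*items_per_dataset):
        fresh = [pair for pair in combinations(combo, 2) if pair not in done]
        done.update(fresh)
        schedule.append((combo, fresh))
    return schedule

# Datasets A, B, C with the items used on the slides
schedule = pair_schedule([["a1", "a2"], ["b1"], ["c1"]])
for combo, fresh in schedule:
    print(combo, fresh)
```

For the second combination only a2b1 and a2c1 are scheduled, since b1c1 was already covered by a1b1c1.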

42
Experiment

Environment
CPU type: Pentium III Xeon 700 MHz
RAM: 4096 MB
Dataset
Synthetic dataset with Gaussian distribution
No. of clusters: 5
Map size: 800
E.g. (622, 478, 5) is a point in a dataset
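
A dataset of this shape could be generated roughly as follows. The slides give only the number of clusters, the map size, and the (x, y, attribute value) point format. Everything else is an assumption made for illustration: uniformly placed cluster centers, the spread sigma, uniformly drawn attribute values in 1..5, clamping to the map, and the fixed seed.

```python
import random

def make_dataset(n_points, n_clusters=5, map_size=800, n_values=5, sigma=40.0, seed=0):
    """Return (x, y, value) points scattered around Gaussian clusters (assumed setup)."""
    rng = random.Random(seed)
    centers = [(rng.uniform(0, map_size), rng.uniform(0, map_size))
               for _ in range(n_clusters)]
    points = []
    for _ in range(n_points):
        cx, cy = rng.choice(centers)
        # Clamp so that every point stays on the map.
        x = min(max(round(rng.gauss(cx, sigma)), 0), map_size)
        y = min(max(round(rng.gauss(cy, sigma)), 0), map_size)
        points.append((x, y, rng.randint(1, n_values)))
    return points

dataset = make_dataset(1000)
print(dataset[:3])
```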

43
Experiment-1
No. of Datasets: 3
No. of Attribute Values: 5
Distance threshold: 100
Frequency threshold: 0.01

44
Experiment-1
No. of Datasets: 3
No. of Attribute Values: 5
Distance threshold: 100
Frequency threshold: 0.01

45
Experiment-1
No. of Datasets: 3
No. of Attribute Values: 5
Distance threshold: 100
Frequency threshold: 0.01

46
Experiment-2
No. of Points in each dataset: 1000
No. of Attribute Values: 5
Distance threshold: 100
Frequency threshold: 0.01

47
Experiment-3
No. of Datasets: 5
No. of Points in each dataset: 1000
No. of Attribute Values: 5
Distance threshold: 100

48
Experiment-4
No. of Datasets: 3
No. of Points in each dataset: 1000
No. of Attribute Values: 5
Frequency threshold: 0.01

49
Conclusions

Neighboring class set mining and co-location pattern mining problems are introduced
Spatial Congeries pattern mining is formulated and provided with two Apriori-like mining algorithms
Future work
More pruning methods should be used to reduce the time and space requirements
The experiments should be done on real datasets

50
References

[HSX02] Huang Y., Shekhar S., Xiong H. Discovering Co-location Patterns from Spatial Datasets: A General Approach. Submitted to IEEE TKDE (under second round review)
[HXSP03] Huang Y., Xiong H., Shekhar S., Pei J. Mining Confident Co-location Rules without A Support Threshold. Proc. of 18th ACM Symposium on Applied Computing (ACM SAC), 2003
[Mor01] Morimoto Y. Mining Frequent Neighboring Class Sets in Spatial Databases. Proc. of ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2001.