Spatial Congeries Pattern Mining - PowerPoint PPT Presentation

Provided by: HKUC

1
Spatial Congeries Pattern Mining
  • Presented by Iris Zhang
  • Supervisor Dr. David Cheung
  • 24 October 2003

2
Outline
  • Introduction
  • Motivation
  • Related work
  • Formal definition
  • Algorithms
  • Experiments
  • Conclusion

3
Introduction
  • KDD
      • Discovery of interesting, implicit, and
        previously unknown knowledge from large
        databases [FPM91]
  • Spatial data mining
      • Extraction of implicit knowledge, spatial
        relations, or other patterns not explicitly
        stored in spatial databases [KH95]

4
Features of Spatial Data Mining
  • Spatial autocorrelation
      • Everything is related to everything else, but
        nearby things are more related than distant
        things (Tobler, 1979)
  • Spatial heterogeneity
      • The variation in spatial data is a function of
        location

5
Motivation
  • A famous historical example
      • In 1909, it was discovered that residents of
        Colorado Springs had healthy teeth and that the
        local drinking water had a high level of
        fluoride. Researchers confirmed the positive
        role of fluoride in controlling tooth decay.
      • {healthy teeth, high level of fluoride}

6
Motivation (Cont)
  • Another case

[HSX02]
7
Related work
  • Neighboring Class Sets Mining
  • Co-location Pattern Mining

8
Neighboring Class Sets
  • Access records of mobile services

9
Neighboring Class Sets
  • Neighboring class sets
      • ((timetable, ticket), 4)
      • ((timetable, weather), 3)
      • ((ticket, weather), 2)
      • ((timetable, ticket, weather), 2)

[Mor01]
10
Neighboring Class Sets
  • Grouping of points

[Mor01]
11
Neighboring Class Sets
  • Grouping of points

[Mor01]
12
Neighboring Class Sets
  • Grouping of points

[Mor01]
13
Neighboring Class Sets
  • Apriori generation of valid instances

[Mor01]
14
Problems
  • Undercounts the number of instances
  • Depends on the order of classes when generating
    instances of k-neighboring class sets (k > 2)
  • Uses an absolute count as the support threshold

15
Co-location Patterns Mining
  • Co-location: a subset of Boolean features
      • E.g. {drought, El Niño, substantial increase in
        vegetation, extremely high precipitation}

16
Co-location Patterns Mining
  • Row instance $I = \{i_1, i_2, \dots, i_k\}$ of a
    co-location $C = \{f_1, f_2, \dots, f_k\}$
      • $i_j$ is an instance of $f_j$ ($j = 1, 2, \dots, k$)
      • Any $i_p$ and $i_q$ in $I$ are neighbors of each other
  • (A.1, B.1) is a row instance of co-location {A, B}
  • Table instance $T$ of $C$: the set of all row
    instances of $C$
  • {(A.1, B.1), (A.2, B.4), (A.3, B.4)} is the table
    instance of {A, B}

17
Co-location Patterns Mining
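  • Participation ratio of feature $f_i$ in co-location $C$
      • $Pr(C, f_i) = \frac{\text{no. of distinct instances of } f_i \text{ in } T(C)}{\text{no. of instances of } f_i}$
      • $Pr(\{A,B\}, A) = 3/4 = 75\%$, $Pr(\{A,B\}, B) = 2/5 = 40\%$
  • Participation index of a co-location $C$
      • $Pi(C) = \min_{f_i \in C} Pr(C, f_i)$
      • $Pi(\{A,B\}) = \min(0.75, 0.4) = 0.4$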
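A minimal sketch of the two measures, computed from the table instance of {A, B} above. The feature totals (4 instances of A, 5 of B) are inferred from the ratios 3/4 and 2/5 on this slide; the function names are hypothetical.

```python
from fractions import Fraction

def participation_ratio(table, features, feature, feature_counts):
    """Pr(C, f): distinct instances of f in the table instance of C / all instances of f."""
    col = features.index(feature)
    distinct = {row[col] for row in table}
    return Fraction(len(distinct), feature_counts[feature])

def participation_index(table, features, feature_counts):
    """Pi(C): the smallest participation ratio over the features of C."""
    return min(participation_ratio(table, features, f, feature_counts) for f in features)

# Table instance of {A, B}; A has 4 instances and B has 5 (inferred from the slide).
TABLE_AB = [("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.4")]
COUNTS = {"A": 4, "B": 5}
print(participation_ratio(TABLE_AB, ["A", "B"], "A", COUNTS))  # 3/4
print(participation_ratio(TABLE_AB, ["A", "B"], "B", COUNTS))  # 2/5
print(participation_index(TABLE_AB, ["A", "B"], COUNTS))       # 2/5
```

Exact fractions are used so that threshold comparisons (e.g. against 40%) are not affected by floating-point rounding.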

18
Co-location Pattern Mining
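  • Co-location rule $C_1 \rightarrow C_2\ (p, cp)$
      • $C_1$ and $C_2$ are co-locations
      • $C_1 \cap C_2 = \emptyset$
      • $p$: participation index; $cp$: conditional probability
      • E.g. $A \rightarrow B\ (40\%, 75\%)$
  • Conditional probability of a co-location rule
      • $cp(C_1 \rightarrow C_2) = |\pi_{C_1}(T(C_1 \cup C_2))| \,/\, |T(C_1)|$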
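A small sketch of the conditional probability, assuming it is the fraction of row instances of $C_1$ that take part in some row instance of $C_1 \cup C_2$. It reproduces the 75% of the example rule A → B; the function name is hypothetical.

```python
from fractions import Fraction

def conditional_probability(table, features, lhs, lhs_row_count):
    """cp(C1 -> C2) = |projection of T(C1 u C2) onto C1| / |T(C1)|.

    table: row instances of C1 u C2, as tuples aligned with `features`
    lhs: the features of C1; lhs_row_count: number of row instances of C1
    """
    cols = [features.index(f) for f in lhs]
    projected = {tuple(row[c] for c in cols) for row in table}
    return Fraction(len(projected), lhs_row_count)

TABLE_AB = [("A.1", "B.1"), ("A.2", "B.4"), ("A.3", "B.4")]
print(conditional_probability(TABLE_AB, ["A", "B"], ["A"], 4))  # 3/4, i.e. cp(A -> B) = 75%
print(conditional_probability(TABLE_AB, ["A", "B"], ["B"], 5))  # 2/5, i.e. cp(B -> A) = 40%
```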

19
Co-location Patterns Mining
  • Apriori property
      • The participation index is monotonically
        non-increasing as the size of the co-location
        increases
  • Apriori-like mining algorithm
      • Candidate generation
      • Instance generation

20
Co-location Patterns Mining
  • Candidate generation
  • Join
  • Prune

21
Co-location Patterns Mining
  • Instance generation
      • Geometric approach: R-tree join
      • Combinatorial approach: sort-merge join
      • Hybrid approach
          • R-tree join to get instances of size-2
            co-locations
          • Sort-merge join to get instances of size-k
            (k > 2) co-location
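The sort-merge join step can be sketched as follows: instances of $\{f_1, \dots, f_{k-2}, f_{k-1}, f_k\}$ come from joining the table instances of $\{f_1, \dots, f_{k-1}\}$ and $\{f_1, \dots, f_{k-2}, f_k\}$ on their shared prefix, keeping a row only if the two last instances are neighbors. This is a minimal sketch, assuming each instance is an `(id, (x, y))` pair and that "neighbors" means Euclidean distance at most `d`; the helper names are hypothetical. The R-tree join used for size 2 is not shown here.

```python
import math

def neighbors(p, q, d):
    """Assumed neighbor relation: Euclidean distance between the two points is at most d."""
    return math.dist(p[1], q[1]) <= d

def prefix_key(row):
    """Join key: ids of all instances except the last one."""
    return tuple(inst[0] for inst in row[:-1])

def sort_merge_join(t1, t2, d):
    """Join rows of {f1..f_{k-1}} (t1) with rows of {f1..f_{k-2}, f_k} (t2)."""
    t1 = sorted(t1, key=prefix_key)
    t2 = sorted(t2, key=prefix_key)
    out = []
    i = j = 0
    while i < len(t1) and j < len(t2):
        k1, k2 = prefix_key(t1[i]), prefix_key(t2[j])
        if k1 < k2:
            i += 1
        elif k1 > k2:
            j += 1
        else:
            # Find the run of rows sharing this prefix in each sorted table.
            i_end, j_end = i, j
            while i_end < len(t1) and prefix_key(t1[i_end]) == k1:
                i_end += 1
            while j_end < len(t2) and prefix_key(t2[j_end]) == k1:
                j_end += 1
            # Combine the runs; keep a row only if the two last instances are neighbors.
            for r1 in t1[i:i_end]:
                for r2 in t2[j:j_end]:
                    if neighbors(r1[-1], r2[-1], d):
                        out.append(r1 + (r2[-1],))
            i, j = i_end, j_end
    return out

A1, B1, C1 = ("A.1", (0, 0)), ("B.1", (3, 0)), ("C.1", (0, 4))
print(sort_merge_join([(A1, B1)], [(A1, C1)], d=10))
```

Sorting both tables once lets each prefix group be matched in a single linear pass instead of comparing every pair of rows.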

22
Co-location Patterns Mining
  • Example

23
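  • The participation index measure may overrate some
    co-locations
  • The features are binary

$Pr(\{A,B\}, A) = 2/8 = 25\%$, $Pr(\{A,B\}, B) = 6/6 = 100\%$
$Pi(\{A,B\}) = \min(25\%, 100\%) = 25\%$
$B \rightarrow A\ (25\%, 100\%)$, $A \rightarrow B\ (25\%, 25\%)$
$\text{Probability}(\{A,B\}) = 7/(8 \times 6) \approx 15\%$
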
24
Spatial Congeries Patterns Mining
  • Input
      • $D = \{D_1, D_2, \dots, D_n\}$, a set of spatial datasets
      • A spatial relation that regulates the relation
        of objects in patterns
      • A threshold min_fre that determines whether an
        itemset is frequent
  • Output
      • The complete set of Spatial Congeries patterns

25
Spatial Congeries Patterns Mining
  • Example of datasets

Attribute values can be translated into categorical
values; e.g. {VD=10, WD=shallow, DOP=near,
NL=existent} can be a pattern

26
Formal Definition
  • Item: an attribute value in a dataset; $I$ is the
    set of all items
      • E.g. water depth = shallow
  • Itemset: a subset of $I$
      • E.g. {VD=10, WD=shallow, DOP=near, N=existent}
      • E.g. {VD=100, WD=deep, DOP=far, N=absent}

27
Formal Definition
  • Spatial relation: a rule that regulates the
    spatial relation of objects in patterns
  • Instances of an item i: the points that have
    attribute value i
  • Instances of an itemset: if one instance of each
    item in the itemset is chosen and these instances
    satisfy the spatial relation, their combination is
    an instance of the itemset

28
Observation
  • The number of instances of an itemset is not
    monotonically non-increasing in the itemset size
  • E.g. one instance of {triangle, circle} can be
    extended into two instances of {triangle, circle,
    rectangle}
  • Conclusion: the number of instances of an itemset
    cannot be used as the measure for deciding whether
    the itemset is a pattern
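The instance definition and the observation above can be checked with a brute-force sketch. It assumes the spatial relation is "every pair of chosen points is within distance d" (the experiments use a distance threshold); the coordinates and the function name are hypothetical.

```python
import math
from itertools import combinations, product

def itemset_instances(points_by_item, itemset, d):
    """One point per item; keep the combination if every pair is within distance d."""
    result = []
    for combo in product(*(points_by_item[item] for item in itemset)):
        if all(math.dist(p, q) <= d for p, q in combinations(combo, 2)):
            result.append(combo)
    return result

points = {
    "triangle": [(0, 0)],
    "circle": [(1, 0)],
    "rectangle": [(0, 1), (1, 1)],
}
pair = itemset_instances(points, ["triangle", "circle"], 2)
triple = itemset_instances(points, ["triangle", "circle", "rectangle"], 2)
print(len(pair), len(triple))  # 1 2
```

The single {triangle, circle} instance yields two {triangle, circle, rectangle} instances, so a raw instance count can grow with the itemset size.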

29
Formal Definition
  • Frequency of an itemset
      • The number of instances of the itemset divided
        by the number of all possible combinations of
        instances of its items
      • E.g. $\text{Frequency}(\{A,B\}) = 7/(8 \times 6) \approx 15\%$

30
Formal Definition
  • Spatial Congeries pattern
  • If the frequency of an itemset is no less than
    the frequency threshold min_fre, the itemset is a
    Spatial Congeries pattern.

31
Property of Frequency
  • Lemma: the frequency of an itemset is
    monotonically non-increasing as the size of the
    itemset increases.
  • Proof (simplified)
      • Consider the size k-1 itemset
        $I_{k-1} = \{v_1, v_2, \dots, v_{k-1}\}$ and the size k
        itemset $I_k = \{v_1, v_2, \dots, v_{k-1}, v_k\}$

$m_q$ is the number of instances of $I_q$; $n_q$ is the
number of instances of item $v_q$.
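The proof's formula did not survive extraction. One way to complete the argument, assuming the spatial relation is downward closed (dropping a point from a valid instance leaves a valid instance, as with a pairwise distance threshold): removing $v_k$ from an instance of $I_k$ gives an instance of $I_{k-1}$, and each instance of $I_{k-1}$ extends to at most $n_k$ instances of $I_k$.

```latex
f(I_q) = \frac{m_q}{\prod_{j=1}^{q} n_j}, \qquad
m_k \le m_{k-1}\, n_k
\;\Longrightarrow\;
f(I_k) = \frac{m_k}{n_k \prod_{j=1}^{k-1} n_j}
       \le \frac{m_{k-1}}{\prod_{j=1}^{k-1} n_j} = f(I_{k-1})
```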
32
Algorithm-1
  • Step 1: generate the complete set of size-2
    patterns by R-tree join on complete R-trees

33
Algorithm-1
  • Step 1: generate the complete set of size-2
    patterns by R-tree join on complete R-trees

34
Algorithm-1
  • Step 1: generate the complete set of size-2
    patterns by R-tree join on complete R-trees

35
Algorithm-1
  • Step 1: generate the complete set of size-2
    patterns by R-tree join on complete R-trees
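The size-2 join in Step 1 finds all point pairs from two datasets that satisfy the spatial relation. The sketch below swaps in a plainly different technique: a uniform grid with cell size `d` instead of an R-tree, which is simple to write self-contained. It assumes the relation is "Euclidean distance at most d"; `grid_join` is a hypothetical name.

```python
import math
from collections import defaultdict

def grid_join(pts_a, pts_b, d):
    """All pairs (a, b) with dist(a, b) <= d, using a uniform grid of cell size d.

    Stand-in for an R-tree join: a point within distance d of `a` can only lie
    in the 3x3 block of cells around the cell of `a`.
    """
    grid = defaultdict(list)
    for p in pts_b:
        grid[(math.floor(p[0] / d), math.floor(p[1] / d))].append(p)
    pairs = []
    for a in pts_a:
        cx, cy = math.floor(a[0] / d), math.floor(a[1] / d)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for b in grid.get((cx + dx, cy + dy), []):
                    if math.dist(a, b) <= d:
                        pairs.append((a, b))
    return pairs

print(grid_join([(0, 0), (500, 500)], [(50, 0), (150, 0), (520, 510)], 100))
```

Like an R-tree, the grid prunes distant candidates before the exact distance test; unlike an R-tree, its cell size is tied to a single distance threshold.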

36
Algorithm-1
  • Step 2: generate size-k (k > 2) patterns level by
    level
      • Generate size-k (k > 2) candidates
          • Join two size-(k-1) patterns
          • Prune candidates that have a subset that
            is not frequent
      • Generate instances of the size-k (k > 2)
        candidates

37
Sample
Square: a1, Triangle: a2, Circle: b1, Diamond: c1

38
Process of Algorithm-1
  • R-tree join (RJ) to find the instances of size-2
    candidates
      • Build an R-tree for each dataset A, B and C
      • Do RJ to find the instances of size-2 candidates
      • $m_{a1b1} = 5$, $m_{a2b1} = 3$, $m_{a1c1} = 2$, $m_{a2c1} = 0$, $m_{b1c1} = 0$
  • Get size-2 patterns a1b1, a2b1, a1c1 according to
    the frequency threshold 50%
      • $f_{a1b1} = 5/(3 \times 3) \approx 56\%$, $f_{a2b1} = 3/(2 \times 3) = 50\%$
      • $f_{a1c1} = 2/(3 \times 1) \approx 67\%$, $f_{a2c1} = 0$
      • $f_{b1c1} = 0$
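The frequency test above can be reproduced with a short sketch. The per-item point counts ($n_{a1} = 3$, $n_{a2} = 2$, $n_{b1} = 3$, $n_{c1} = 1$) are inferred from the denominators on the slide; the instance counts are copied from it. Function names are hypothetical.

```python
from fractions import Fraction
from math import prod

# Points per item (inferred from the slide's denominators).
N = {"a1": 3, "a2": 2, "b1": 3, "c1": 1}
# Instance counts of the size-2 candidates, as returned by the spatial join.
M = {("a1", "b1"): 5, ("a2", "b1"): 3, ("a1", "c1"): 2, ("a2", "c1"): 0, ("b1", "c1"): 0}

def frequency(itemset, count, n):
    """Instances of the itemset / all possible combinations of its items' instances."""
    return Fraction(count, prod(n[i] for i in itemset))

def frequent_itemsets(m, n, min_fre):
    """Itemsets whose frequency is no less than min_fre."""
    return {s for s, c in m.items() if frequency(s, c, n) >= min_fre}

patterns = frequent_itemsets(M, N, Fraction(1, 2))
print(sorted(patterns))  # [('a1', 'b1'), ('a1', 'c1'), ('a2', 'b1')]
```

Exact fractions keep a2b1 (exactly 50%) on the correct side of the "no less than" threshold.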

39
Process of Algorithm-1
  • Sort-merge join to find the instances of size-k
    (k > 2) candidates
      • Generate size-3 candidates
          • Join size-2 patterns a1b1 and a1c1 to form a1b1c1
          • Prune a1b1c1 because b1c1 is not a pattern
      • Get size-3 patterns (there are no size-3
        patterns)
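The join and prune steps can be sketched on this sample. The sketch assumes patterns are sorted tuples of items, that an item's first letter names its dataset (as in the sample), and that an itemset holds at most one item per dataset; `generate_candidates` is a hypothetical name.

```python
from itertools import combinations

def dataset_of(item):
    # Assumed naming convention from the sample: "a1" belongs to dataset A.
    return item[0]

def generate_candidates(patterns):
    """Apriori join + prune: size-k candidates from size-(k-1) patterns (sorted tuples)."""
    patterns = set(patterns)
    candidates = set()
    for p, q in combinations(sorted(patterns), 2):
        # Join: same first k-2 items, last items drawn from different datasets.
        if p[:-1] == q[:-1] and dataset_of(p[-1]) != dataset_of(q[-1]):
            cand = tuple(sorted(p + (q[-1],)))
            # Prune: every size-(k-1) subset must itself be a pattern.
            if all(sub in patterns for sub in combinations(cand, len(cand) - 1)):
                candidates.add(cand)
    return candidates

print(generate_candidates({("a1", "b1"), ("a2", "b1"), ("a1", "c1")}))  # set()
```

With the sample's size-2 patterns, a1b1c1 is formed by the join and then pruned because b1c1 is missing, which leaves no size-3 candidate.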

40
Algorithm-2
  • Step 1: generate all patterns for one combination
    of subsets. Each subset corresponds to an item:
    all points in the subset have that item as their
    attribute value.
      • E.g. the first combination is a1b1c1. R-trees
        are built for the subsets a1, b1 and c1 to
        generate size-2 patterns; then sort-merge join
        generates size-k (k > 2) patterns.
  • Step 2: generate all patterns for the next
    combination, until no combination remains
      • E.g. the second combination is a2b1c1.

41
Process of Algorithm-2
  • Generate patterns for combination a1b1c1
      • RJ on the R-trees for a1, b1 and c1 to get the
        instances of candidates a1b1, a1c1 and b1c1
      • Suppose a1b1 and a1c1 are patterns; the size-3
        candidate is a1b1c1
      • Sort-merge join to get the instances of a1b1c1
  • Generate patterns for combination a2b1c1
      • RJ on the R-trees for a2, b1 and c1 to get the
        instances of candidates a2b1 and a2c1. Because
        the instances of b1c1 have already been
        generated, there is no need to compute them again
      • Suppose a2b1 is a pattern
      • There is no size-3 candidate
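The outer loop of Algorithm-2 can be sketched as a schedule: combinations are taken one at a time (one item per dataset), and a size-2 pair whose instances were computed for an earlier combination is reused rather than joined again. This is a minimal sketch under that reading of the slides; the spatial join itself is omitted and `algorithm2_pair_schedule` is a hypothetical name.

```python
from itertools import combinations, product

def algorithm2_pair_schedule(items_per_dataset):
    """For each combination, list the size-2 pairs that still need a spatial join."""
    done = set()
    schedule = []
    for combo in product(*items_per_dataset):
        new_pairs = [pair for pair in combinations(combo, 2) if pair not in done]
        done.update(new_pairs)
        schedule.append((combo, new_pairs))
    return schedule

for combo, pairs in algorithm2_pair_schedule([["a1", "a2"], ["b1"], ["c1"]]):
    print("".join(combo), ["".join(p) for p in pairs])
# a1b1c1 ['a1b1', 'a1c1', 'b1c1']
# a2b1c1 ['a2b1', 'a2c1']
```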

42
Experiment
  • Environment
      • CPU: Pentium III Xeon 700 MHz
      • RAM: 4096 MB
  • Dataset
      • Synthetic datasets with Gaussian distribution
      • No. of clusters: 5
      • Map size: 800
      • E.g. (622, 478, 5) is a point in a dataset

43
Experiment-1
  • No. of datasets: 3
  • No. of attribute values: 5
  • Distance threshold: 100
  • Frequency threshold: 0.01

44
Experiment-1
  • No. of datasets: 3
  • No. of attribute values: 5
  • Distance threshold: 100
  • Frequency threshold: 0.01

45
Experiment-1
  • No. of datasets: 3
  • No. of attribute values: 5
  • Distance threshold: 100
  • Frequency threshold: 0.01

46
Experiment-2
  • No. of points in each dataset: 1000
  • No. of attribute values: 5
  • Distance threshold: 100
  • Frequency threshold: 0.01

47
Experiment-3
  • No. of datasets: 5
  • No. of points in each dataset: 1000
  • No. of attribute values: 5
  • Distance threshold: 100

48
Experiment-4
  • No. of datasets: 3
  • No. of points in each dataset: 1000
  • No. of attribute values: 5
  • Frequency threshold: 0.01

49
Conclusions
  • The neighboring class set mining and co-location
    pattern mining problems are introduced
  • Spatial Congeries pattern mining is formulated
    and provided with two Apriori-like mining
    algorithms
  • Future work
      • More pruning methods should be used to reduce
        the time and space requirements
      • The experiments should be done on real datasets

50
References
  • [HSX02] Huang Y., Shekhar S., Xiong H.
    Discovering Co-location Patterns from Spatial
    Datasets: A General Approach. Submitted to IEEE
    TKDE (under second round review).
  • [HXSP03] Huang Y., Xiong H., Shekhar S., Pei J.
    Mining Confident Co-location Rules without a
    Support Threshold. Proc. of the 18th ACM Symposium
    on Applied Computing (ACM SAC), 2003.
  • [Mor01] Morimoto Y. Mining Frequent Neighboring
    Class Sets in Spatial Databases. Proc. of the ACM
    SIGKDD Int. Conf. on Knowledge Discovery and Data
    Mining, 2001.

51
Q&A