Title: Spatial-Temporal Data Mining
1Spatial-Temporal Data Mining
- Wei Wang
- Data Mining Lab
- Computer Science Department UCLA
2Outline
- Introduction
- Active Spatial Data Mining
- Spatial data mining trigger
- Temporal Association Rule with Numerical Attributes
- Correlation among object evolutions
- Conclusions and Future Work
3Introduction
- Huge amounts of spatial data are generated every day.
- Earth Observing System
- National Spatial Data Infrastructure
- National Imagery and Mapping Agency
- One meter resolution data
- Digital earth
- Users are usually interested in the hidden information.
- Aggregate information
- Clustering
- Patterns
4Introduction
- Knowledge discovery processes are computationally expensive.
- Today's technological advances provide the computing power needed to carry out such complicated processes.
5Outline
- Introduction
- STING: An approach to active spatial data mining
- Temporal association rules with numerical attributes
- Conclusions and Future Work
6STING
- Since data evolves over time, interesting patterns are likely to emerge or change.
- Goal: identify and find (most) interesting patterns
- Problems
- Knowledge discovery processes are expensive.
- It is not feasible to re-process the entire data set for every change.
- Periodically examine the data.
- Long delays
- Transient patterns might be missed
- Natural solution: usage of triggers.
7STING
- Traditional database triggers cannot be directly applied.
- Expressive power of traditional database triggers is limited, especially in describing spatial relationships.
- Example: trigger an investigation when the size of any cluster exceeds 20.
8STING
- STING was designed to introduce and support spatial triggers efficiently.
- Observation (spatial locality): Only objects added to the shaded area will contribute to the growth of cluster size at this moment.
9STING
- STING Strategy: Monitor only the area occupied by potential clusters and their neighborhoods.
- Observation (cumulative effect): At least 4 more objects are needed to bring the cluster size to 20.
- STING Strategy: Space is organized in a hierarchy so that updates can be suspended at various levels in the hierarchy until the cumulative effect might cause the trigger to be fired.
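The cumulative-effect idea can be sketched as a counter attached to a cell. This is only an illustration: the class name `CellSubTrigger` and its fields are hypothetical, and the numbers assume a cluster of size 16 with a trigger at size 20, which matches the "4 more objects" observation above.

```python
from dataclasses import dataclass


@dataclass
class CellSubTrigger:
    """Hypothetical sub-trigger attached to one cell of the hierarchy."""
    slack: int        # insertions the cell can absorb before the trigger could fire
    pending: int = 0  # insertions suspended at this cell so far

    def insert(self) -> bool:
        """Record one insertion; return True when the full condition must be re-checked."""
        self.pending += 1
        return self.pending >= self.slack


# A cluster of size 16 with a trigger at size 20 needs at least 4 more objects.
cell = CellSubTrigger(slack=20 - 16)
results = [cell.insert() for _ in range(4)]
print(results)  # the first three insertions are absorbed; the fourth forces a re-check
```

Until the counter reaches the slack, no work is propagated to the expensive trigger condition.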
10STING
- Space is recursively divided into smaller rectangular cells down to a specified granularity and is organized via the inherent pyramid hierarchy.
11STING
- STING decomposes a trigger into a set of
sub-triggers associated with individual cells in
the hierarchical structure to monitor the
cumulative effect of data changes within the cell.
[Figure: cell hierarchy; label: Level 4]
12STING
- Updates/insertions are suspended at various
levels in the hierarchy until such time that the
cumulative effect of these insertions might cause
the trigger condition to become satisfied.
[Figure: cell hierarchy; labels: Level 1, Level 0]
13STING
[Figure: cell hierarchy; labels: Level 3, Level 2; annotation: No update of cluster!]
14STING
- Primitive event: insertion, deletion, update
- Composite event: a set of primitive events
- In general, evaluating a trigger T involves two aspects:
- Find a set of composite events E(s) that may cause the trigger condition CT to become true.
- Each time some composite event in E(s) occurs, check the status (false or true) of CT (given that CT was false previously).
- Observation: As a side effect of the occurrence of some composite event, E(s) might also evolve over time.
15STING
- STING Strategy: Two sets of composite events are considered:
- the set of composite events E(s) that can cause CT to become true (need to re-evaluate CT)
- the set of composite events F(s) that can cause a change to E(s) (need to update E(s))
- The sub-triggers are used to monitor composite events in E(s) and F(s) and change accordingly when E(s) and F(s) evolve.
16STING
- Observation: Trigger condition CT is a conjunction of predicates P1 ∧ P2 ∧ ... ∧ Pn and cannot be true if one predicate is false.
- They can be evaluated in a specific order: the ith predicate is tested only when all previous (i - 1) predicates are true.
- The evaluation order should be chosen in such a way that the total cost is minimum.
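One way to choose such an order can be sketched as follows. It assumes each predicate has a known evaluation cost and an independent probability of being true (strictly below 1). Under those assumptions, sorting by cost divided by the probability of being false minimizes the expected cost. The slides do not say how STING picks its order; the numbers and function names here are illustrative.

```python
from itertools import permutations


def expected_cost(order):
    """Expected cost of testing predicates in sequence, stopping at the first false one."""
    total, reach = 0.0, 1.0
    for cost, p_true in order:
        total += reach * cost  # this predicate is tested only if all earlier ones held
        reach *= p_true
    return total


def cheapest_order(predicates):
    """Sort by cost / P(false): cheap, selective predicates go first (assumes p_true < 1)."""
    return sorted(predicates, key=lambda cp: cp[0] / (1.0 - cp[1]))


# (cost, probability of being true); illustrative numbers only
preds = [(5.0, 0.9), (1.0, 0.5), (3.0, 0.2)]
best = cheapest_order(preds)
brute = min(permutations(preds), key=expected_cost)  # exhaustive check for comparison
print(best, expected_cost(best))
```

The brute-force search over all permutations is included only to confirm the sorted order on this small input.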
17STING
- PK-tree is used to index instantiated cells
- Bound on height
- Bounds on number of children
- Uniqueness for any data set
- independent of order of insertion and deletion
- Solid theoretical foundation
- Fast retrieval and efficient maintenance
- Statistical information maintained at each node is used to facilitate the trigger process.
- Sub-trigger
18STING
- Comparison with periodic re-examination via STING
- 200,000 synthetic point objects
- 10,000 insertions/deletions/updates
- If the period is set to be less than 4000 updates, STING consumes fewer CPU cycles.
- Significant delays and missed transient patterns can occur for larger periods.
- Not acceptable in many applications
- No delay and no transient patterns missed with
STING.
19Outline
- Introduction
- STING: An approach to active spatial data mining
- Temporal association rules with numerical attributes
- Conclusions and Future Work
20Temporal Association Rules
- Now we consider general databases with evolving numerical attributes.
- Interesting patterns exhibited in the data are often numerous and complicated.
- Customer churning: If a customer's phone bill increases by at least 10% each month for six months, then he is likely to change his long distance telephone carrier.
- Real estate: People who receive a raise of at least 20% of their salary are likely to move away from big cities.
- Such patterns can be represented by association rules of the form X ⇒ Y, which indicates that the occurrences of X and Y have high correlation.
21Temporal Association Rules
- Earlier work on association rules mainly focused on binary attributes and intra-transaction relationships.
- E.g., ham ⇒ bread
- Support and strength are two metrics used to qualify interesting rules.
- support: number of instances that follow the rule
- N(ham, bread)
- strength: how strong the correlation is
-
-
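The two metrics can be illustrated on a toy set of transactions. Support follows the description above (a count of instances). For strength, the sketch assumes a lift-style ratio, N(X, Y) · N / (N(X) · N(Y)). That formula is an assumption made for illustration and may differ from the definition intended here; the basket data is invented.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def lift(transactions, x, y):
    """Lift-style strength (an assumed definition): P(X and Y) / (P(X) * P(Y))."""
    n = len(transactions)
    return support(transactions, x | y) * n / (
        support(transactions, x) * support(transactions, y)
    )


# Invented example data
baskets = [{"ham", "bread"}, {"ham", "bread", "milk"}, {"bread"}, {"milk"}]
n_ham_bread = support(baskets, {"ham", "bread"})  # N(ham, bread)
strength = lift(baskets, {"ham"}, {"bread"})
print(n_ham_bread, strength)
```

A ratio above 1 means ham and bread occur together more often than independence would predict.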
22Temporal Association Rules
- Consider a set of objects, each of which has a unique ID and a set of time-varying numerical attributes; a sequence of snapshots is taken at some frequency.
- E.g., in an employee database, two attributes are considered: salary and monthly housing expense.
- For a given snapshot, each employee can be mapped to a point in a two-dimensional space.
23Temporal Association Rules
- Given a sequence of snapshots, the trace of an employee can be mapped to a point in a high-dimensional space.
- (<s1, mhe1>, <s2, mhe2>, <s3, mhe3>, <s4, mhe4>, <s5, mhe5>)
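As a small illustration of this mapping, five snapshots of two attributes give a single point with ten coordinates. The function name and the values are made up for the example.

```python
def trace_to_point(snapshots):
    """Concatenate per-snapshot attribute tuples into one high-dimensional point."""
    return tuple(value for snapshot in snapshots for value in snapshot)


# Five snapshots of (salary, monthly housing expense); values are illustrative
trace = [(52000, 1200), (53000, 1250), (55000, 1300), (56000, 1400), (58000, 1500)]
point = trace_to_point(trace)
print(len(point), point)
```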
24Temporal Association Rules
- Temporal association rules represent the correlation among object evolutions.
- (salary: [52000, 56000] → [54000, 58000]) ⇒ (monthly_housing_expense: [1200, 1400] → [1400, 1600])
- Each temporal association rule can also be viewed as an interpretation of a cluster (with certain shape) of points.
[Figure: points plotted against salary and monthly_housing_expense]
25Temporal Association Rules
- Observation: The domain of a numerical attribute might contain a large number of distinct values and might even be continuous.
- E.g., domain(salary) = [50000, 60000].
- Any sub-range can appear in a rule.
- The number of possible rules may be very large, if not infinite.
- Strategy: Each attribute domain is quantized into a set of equi-length base intervals.
- The domain of salary could be quantized into base intervals of length 2000.
- Values within the same interval are not distinguished.
- E.g., 51000 and 51500 are considered the same.
[Figure: salary axis from 50000 to 60000]
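The quantization step can be sketched in a few lines, using the salary domain and interval length from the example above. Treating each base interval as half-open, [start, start + width), is an assumption; the slides do not say which endpoint is included.

```python
def base_interval(value, low=50000, width=2000):
    """Return the half-open base interval [start, start + width) containing `value`."""
    index = int((value - low) // width)
    start = low + index * width
    return (start, start + width)


a = base_interval(51000)
b = base_interval(51500)
print(a, b, a == b)  # both values fall into the same base interval
```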
26Temporal Association Rules
[Figure: salary axis from 50000 to 60000 in steps of 2000]
- E1(salary): [52000, 54000] → [52000, 54000] → [54000, 56000]
- E2(salary): [52000, 56000] → [52000, 54000] → [52000, 56000]
27Temporal Association Rules
28Temporal Association Rules
- The subcube-supercube relationship defines a lattice among all evolution cubes within the evolution space.
- This also holds for the evolution space of more than one attribute.
[Figure: evolution space with axes salary (50000 to 60000) and monthly housing expense (1000 to 2000)]
29Temporal Association Rules
- Some properties of the metrics enable us to
search efficiently through the lattice in a
bottom-up manner.
30Temporal Association Rules
- Observation: Many valid but trivial rules may exist.
- (salary: [52000, 56000]) ⇒ (monthly_housing_expense: [1200, 1400])
- (salary: [50000, 56000]) ⇒ (monthly_housing_expense: [1200, 1400])
- Both rules have the same support and strength since no employee's salary is between 50000 and 52000. However, the first rule conveys more precise information.
31Temporal Association Rules
- Strategy: An interval can be included in a rule only if there is some minimum number of objects whose attribute values fall into that interval.
- The density of each base cube within the evolution cube of a rule has to meet some threshold.
- In the previous example, the second rule can be eliminated.
- Property of density: An evolution cube can satisfy the density threshold only when all of its subcubes satisfy the density threshold.
[Figure annotation: min_density = 2]
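A minimal sketch of the density test, restricted to one attribute and one snapshot for brevity. The salary values are invented so that, as in the previous example, nobody earns between 50000 and 52000. Density is taken to be a raw object count per base interval, and the function name and half-open intervals are assumptions.

```python
from collections import Counter


def dense_enough(values, start, end, low=50000, width=2000, min_density=2):
    """True if every base interval inside [start, end) holds at least `min_density` values."""
    counts = Counter((v - low) // width for v in values)
    first, last = (start - low) // width, (end - low) // width
    return all(counts[i] >= min_density for i in range(first, last))


salaries = [52500, 53000, 54500, 55000, 55500]  # nobody earns between 50000 and 52000
first_rule = dense_enough(salaries, 52000, 56000)
second_rule = dense_enough(salaries, 50000, 56000)
print(first_rule, second_rule)
```

The wider salary range fails because it contains an empty base interval, so the second rule is eliminated while the first is kept.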
32Temporal Association Rules
- General Model
- Data set D
- Language L
- express properties or define subgroups of data
- Selection predicate q
- evaluate whether a sentence φ ∈ L defines a potentially interesting class of D
- Task: find the set {φ ∈ L | q(D, φ) is true}
- If
- a lattice can be formed on sentences in L and
- partial order exists on selection predicate
- then the level-wise algorithm can be used to
prune search space efficiently.
33Temporal Association Rules
- Temporal Association Rule
- Language L: each sentence φ ∈ L is a temporal association rule.
- The selection predicate q(D, φ) is true iff
- q1: support(D, φ) ≥ min_support and
- q2: strength(D, φ) ≥ min_strength and
- q3: density(D, φ) ≥ min_density
- Task: find the set of temporal association rules which satisfy all three predicates.
- Specialization relation <: a lattice on the sentences in L
- subcube/supercube relationship
34Temporal Association Rules
- partial order on qi with respect to <, reading φ' < φ as "φ' is a subcube of φ"
- support(D, φ') ≤ support(D, φ) if φ' < φ
- if strength(D, φ') < min_strength for all φ' < φ, then strength(D, φ) < min_strength
- density(D, φ') ≥ density(D, φ) if φ' < φ
- level-wise algorithm
- basic scheme: starting from the most special (general) sentences, evaluate more and more general (special) sentences, excluding those sentences that cannot be interesting given all the information obtained in earlier iterations.
- Efficient space pruning
- Starting point
- Random sampling
- Order of predicate evaluation
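The basic scheme can be sketched for the simplest case: one attribute, one snapshot, and only the density predicate. The search goes bottom-up from single base intervals to longer runs, and a run is considered only if both of its shorter sub-runs survived the previous level. Support and strength would need their own tests, and the function name and input encoding are assumptions for illustration.

```python
def levelwise_dense_intervals(counts, min_density):
    """Bottom-up level-wise search over runs of consecutive base intervals.

    counts[i] is the number of objects in base interval i. A run (i, j) covers
    base intervals i..j-1. A run of length k+1 is kept only if both of its
    length-k sub-runs were kept at the previous level.
    """
    level = {(i, i + 1) for i, c in enumerate(counts) if c >= min_density}
    found = set(level)
    while level:
        # Extend (i, j) to (i, j + 1) only if the shifted run (i + 1, j + 1) also survived
        level = {(i, j + 1) for (i, j) in level if (i + 1, j + 1) in level}
        found |= level
    return found


runs = levelwise_dense_intervals([0, 2, 3, 1, 4], min_density=2)
print(sorted(runs))
```

Because a sparse base interval rules out every run that contains it, longer candidates never have to be counted against the data at all.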
35Temporal Association Rules
- Efficiency of space pruning
- SR algorithm: after quantization, base intervals are combined as long as their density satisfies the threshold. The original base intervals and the combined intervals are treated as a set of items.
[Figure: experiment parameters: 100000 objects, 100 snapshots, 5 attributes, 500 rules of length 5, density 2, support 5, strength 1.4]
36Conclusions and Future Work
- STING was developed to support spatial data mining triggers very efficiently by
- employing the spatial locality property and
- postponing the trigger condition evaluation until the cumulative effect might cause the trigger to be fired.
- Temporal association rules were introduced to capture relationships among object evolutions.
- Selected continuing work
- Patterns whose cause and consequence do not happen together
- There is a delay for the consequence to show up.
- Patterns involving relationships among objects
- e.g., children tend to live farther away from their parents when they grow up.
37Conclusions and Future Work
- Selected future work
- Data mining over Internet
- data type
- networking issue
- Analytical model
- classify data mining problems
- devise efficient general approach
- Applications
- compiler/programming language
- WWW