Spatial-Temporal Data Mining - PowerPoint PPT Presentation

About This Presentation
Title:

Spatial-Temporal Data Mining


Slides: 38
Provided by: weiwang
Learn more at: https://www.cs.unc.edu

Transcript and Presenter's Notes

Title: Spatial-Temporal Data Mining


1
Spatial-Temporal Data Mining
  • Wei Wang
  • Data Mining Lab
  • Computer Science Department UCLA

2
Outline
  • Introduction
  • Active Spatial Data Mining
  • Spatial data mining trigger
  • Temporal Association Rule with Numerical
    Attributes
  • Correlation among object evolutions
  • Conclusions and Future Work

3
Introduction
  • Huge amounts of spatial data are generated
    every day.
  • Earth Observing System
  • National Spatial Data Infrastructure
  • National Imagery and Mapping Agency
  • One meter resolution data
  • Digital earth
  • Users are usually interested in the hidden
    information.
  • Aggregate information
  • Clustering
  • Patterns

4
Introduction
  • Knowledge discovery processes are computationally
    expensive.
  • Today's technological advances provide the
    computing power needed to carry out such
    complicated processes.

5
Outline
  • Introduction
  • STING: An approach to active spatial data mining
  • Temporal association rules with numerical
    attributes
  • Conclusions and Future Work

6
STING
  • Since data evolves over time, interesting
    patterns are likely to emerge or change.
  • Goal: identify and find the (most) interesting
    patterns
  • Problems
  • Knowledge discovery processes are expensive.
  • It is not feasible to re-process the entire
    data set for every change.
  • Periodically examine the data.
  • Long delays
  • Transient patterns might be missed
  • Natural solution: use triggers.

7
STING
  • Traditional database triggers cannot be directly
    applied
  • Expressive power of traditional database triggers
    is limited, especially in describing spatial
    relationships.
  • Example: trigger an investigation when the size
    of any cluster exceeds 20.

8
STING
  • STING was designed to introduce and support
    spatial triggers efficiently.
  • Observation (spatial locality): only objects
    added to the shaded area will contribute to the
    growth of cluster size at this moment.

9
STING
  • STING strategy: monitor only the area occupied
    by potential clusters and their neighborhoods.
  • Observation (cumulative effect): at least 4 more
    objects are needed for the cluster size to
    reach 20.
  • STING strategy: space is organized in a
    hierarchy so that updates can be suspended at
    various levels in the hierarchy until the
    cumulative effect might cause the trigger to
    fire.

10
STING
  • Space is recursively divided into smaller
    rectangular cells down to a specified granularity
    and is organized via the inherent pyramid
    hierarchy.
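
The recursive division can be sketched in a few lines. This is only an illustration under stated assumptions: the space is taken to be the unit square, every cell is assumed to split into 2 x 2 children (the slides do not give the fan-out), level 0 is taken to be the single root cell, and `build_pyramid` and `parent` are hypothetical names.

```python
from collections import Counter

def build_pyramid(points, depth):
    """Count points per cell at each level of a quadtree-style pyramid.

    Assumption: the space is the unit square [0, 1) x [0, 1) and every cell
    splits into 2 x 2 children, so level k is a 2**k x 2**k grid.
    """
    levels = []
    for k in range(depth + 1):
        side = 2 ** k
        counts = Counter()
        for x, y in points:
            # Index of the level-k cell that contains the point.
            counts[(int(x * side), int(y * side))] += 1
        levels.append(counts)
    return levels

def parent(cell):
    """Cell at the next coarser level that contains `cell`."""
    i, j = cell
    return (i // 2, j // 2)

# Hypothetical point objects.
points = [(0.10, 0.10), (0.15, 0.20), (0.60, 0.70), (0.90, 0.95)]
pyramid = build_pyramid(points, depth=2)
```

Each coarser cell's count is the sum of its children's counts, which is what lets statistics kept at one level summarize everything below it.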

11
STING
  • STING decomposes a trigger into a set of
    sub-triggers associated with individual cells in
    the hierarchical structure to monitor the
    cumulative effect of data changes within the cell.

[Figure: cell hierarchy; surviving label: Level 4]
12
STING
  • Updates/insertions are suspended at various
    levels in the hierarchy until such time that the
    cumulative effect of these insertions might cause
    the trigger condition to become satisfied.
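
A minimal sketch of how a cell might suspend insertions until their cumulative effect matters. The class `SubTrigger`, its `slack` parameter, and the cluster size of 16 are assumptions made for illustration; the slides do not describe the actual data structure.

```python
class SubTrigger:
    """Hypothetical sub-trigger attached to one cell.

    It absorbs insertions until their cumulative count reaches `slack`,
    the minimum number of new objects that could make the trigger fire.
    """

    def __init__(self, slack):
        self.slack = slack
        self.pending = 0

    def insert(self):
        """Record one insertion; return True when re-evaluation is needed."""
        self.pending += 1
        if self.pending < self.slack:
            return False  # update stays suspended at this cell
        self.pending = 0  # hand the accumulated updates to the next level
        return True

# A cluster of 16 objects and a trigger at size 20 leave a slack of 4.
cell = SubTrigger(slack=20 - 16)
signals = [cell.insert() for _ in range(5)]
```

Only the fourth insertion asks for re-evaluation; the first three cost a counter increment each.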

[Figure: cell hierarchy; surviving labels: Level 1, Level 0]
13
STING
[Figure: cell hierarchy; surviving labels: Level 3, Level 2; annotation: "No update of cluster!"]
14
STING
  • Primitive event: insertion, deletion, update
  • Composite event: a set of primitive events
  • In general, evaluating a trigger T involves
    two aspects
  • Find a set of composite events E(s) that may
    cause the trigger condition CT to become true.
  • Each time some composite event in E(s) occurs,
    check the status (false or true) of CT (given
    that CT was false previously).
  • Observation: as a side effect of the occurrence
    of some composite event, E(s) might also evolve
    over time.

15
STING
  • STING strategy: two sets of composite events are
    considered
  • the set of composite events E(s) that can cause
    CT to become true
  • need to re-evaluate CT
  • the set of composite events F(s) that can cause a
    change to E(s)
  • need to update E(s)
  • The sub-triggers are used to monitor composite
    events in E(s) and F(s) and change accordingly
    when E(s) and F(s) evolve.

16
STING
  • Observation: trigger condition CT is a
    conjunction of predicates P1 ∧ P2 ∧ … ∧ Pn and
    cannot be true if one predicate is false.
  • They can be evaluated in a specific order: the
    ith predicate is tested only when all previous
    (i − 1) predicates are true.
  • The evaluation order should be chosen so that
    the total cost is minimized.
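
One way to reason about the evaluation order is sketched below. It assumes each predicate has a known cost and an independent probability of being true, neither of which the slides specify; under that assumption, sorting by cost / (1 − p_true) is a standard rule for ordering short-circuit tests. The names and numbers are hypothetical.

```python
from itertools import permutations

def expected_cost(order):
    """Expected cost of short-circuit evaluation of a conjunction.

    Each predicate is a (cost, p_true) pair; predicate i is only tested
    if all earlier ones were true. Independence is assumed.
    """
    total, reach = 0.0, 1.0
    for cost, p_true in order:
        total += reach * cost  # pay the cost only if evaluation gets here
        reach *= p_true        # probability that the next one is reached
    return total

def greedy_order(predicates):
    """Sort by cost / (1 - p_true): cheap, often-false predicates first."""
    return sorted(predicates, key=lambda cp: cp[0] / (1.0 - cp[1]))

# Hypothetical (cost, probability of being true) values, each p_true < 1.
predicates = [(10.0, 0.9), (1.0, 0.5), (5.0, 0.1)]
best = min(permutations(predicates), key=expected_cost)
```

For these three predicates, the brute-force minimum over all orderings coincides with the greedy order.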

17
STING
  • PK-tree is used to index instantiated cells
  • Bound on height
  • Bounds on number of children
  • Uniqueness for any data set
  • independent of order of insertion and deletion
  • Solid theoretical foundation
  • Fast retrieval and efficient maintenance
  • Statistical information maintained at each node
    is used to facilitate the trigger process.
  • Sub-trigger

18
STING
  • Comparison with periodic re-examination via STING
  • 200,000 synthetic point objects
  • 10,000 insertions/deletions/updates
  • If the period is set to be less than 4000
    updates, STING consumes fewer CPU cycles.
  • Significant delays and missed transient patterns
    can occur for larger periods.
  • Not acceptable in many applications
  • No delay and no transient patterns missed with
    STING.

19
Outline
  • Introduction
  • STING: An approach to active spatial data mining
  • Temporal association rules with numerical
    attributes
  • Conclusions and Future Work

20
Temporal Association Rules
  • Now we are considering general databases with
    evolving numerical attributes.
  • Interesting patterns exhibited in the data are
    often numerous and complicated.
  • Customer churn: if a customer's phone bill
    increases by at least 10% each month for six
    months, then he is likely to change his long
    distance telephone carrier.
  • Real estate: people who receive a raise of at
    least 20% of their salary are likely to move away
    from the big city.
  • Such patterns can be represented by association
    rules of the form X ⇒ Y, which indicates that the
    occurrences of X and Y are highly correlated.

21
Temporal Association Rules
  • Earlier work on association rules mainly focused
    on binary attributes and intra-transaction
    relationship.
  • E.g., ham ⇒ bread
  • Support and strength are two metrics used to
    qualify interesting rules.
  • support: the number of instances that follow the
    rule
  • N(ham, bread)
  • strength: how strong the correlation is
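
The two metrics can be illustrated on a toy basket data set. Support follows the slide (a count of instances). The slide does not give a formula for strength, so the sketch assumes a lift-style ratio, P(X and Y) / (P(X) · P(Y)); that choice and the sample transactions are assumptions.

```python
def support(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def strength(transactions, x, y):
    """Assumed definition: P(X and Y) / (P(X) * P(Y)), i.e. lift."""
    n = len(transactions)
    joint = support(transactions, x | y) / n
    return joint / ((support(transactions, x) / n) * (support(transactions, y) / n))

# Hypothetical transactions.
transactions = [
    {"ham", "bread"},
    {"ham", "bread", "milk"},
    {"bread"},
    {"milk"},
]
n_ham_bread = support(transactions, {"ham", "bread"})
s = strength(transactions, {"ham"}, {"bread"})
```

Under this assumed definition, a value above 1 means ham and bread occur together more often than independence would predict.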

22
Temporal Association Rules
  • Consider a set of objects, each of which has a
    unique ID and a set of time-varying numerical
    attributes; a sequence of snapshots is taken
    at some frequency.
  • E.g., in an employee database, two attributes are
    considered: salary and monthly housing expense.
  • For a given snapshot, each employee can be mapped
    to a point in a two dimensional space.

23
Temporal Association Rules
  • Given a sequence of snapshots, the trace of an
    employee can be mapped to a point in a high
    dimensional space.
  • (⟨s1, mhe1⟩, ⟨s2, mhe2⟩, ⟨s3, mhe3⟩, ⟨s4, mhe4⟩,
    ⟨s5, mhe5⟩)
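
As a small sketch of this mapping, five snapshots of two attributes flatten into one point with ten coordinates. The function name `trace_to_point` and the sample values are made up for illustration.

```python
def trace_to_point(snapshots):
    """Concatenate per-snapshot attribute tuples into one long vector."""
    return tuple(value for snap in snapshots for value in snap)

# Hypothetical (salary, monthly_housing_expense) values for one employee.
snapshots = [
    (52000, 1200),
    (53000, 1250),
    (55000, 1400),
    (56000, 1450),
    (57000, 1500),
]
point = trace_to_point(snapshots)
```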

24
Temporal Association Rules
  • Temporal association rules represent the
    correlation among object evolutions.
  • (salary ∈ [52000, 56000] → [54000, 58000]) ⇒
    (monthly_housing_expense ∈ [1200, 1400] →
    [1400, 1600])
  • Each temporal association rule can also be viewed
    as an interpretation of a cluster (with certain
    shape) of points.

[Figure: scatter plot of points; axes: salary and monthly_housing_expense]
25
Temporal Association Rules
  • Observation: the domain of a numerical attribute
    might contain a large number of distinct values
    and might even be continuous.
  • E.g., domain(salary) = [50000, 60000].
  • Any sub-range can appear in a rule.
  • The number of possible rules may be very large,
    if not infinite.
  • Strategy: each attribute domain is quantized into
    a set of equi-length base intervals.
  • The domain of salary could be quantized into base
    intervals of length 2000
  • Values within the same interval are not
    distinguished.
  • E.g., 51000 and 51500 are considered the
    same.
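
The quantization step can be sketched as follows, using the salary domain and interval length from the slide. Treating intervals as half-open, [start, end), is an assumption, and `base_interval` is a hypothetical helper.

```python
def base_interval(value, low, width):
    """Return the [start, end) base interval that contains `value`.

    Assumption: intervals are half-open, so a value on a boundary
    belongs to the interval that starts there.
    """
    index = int((value - low) // width)
    start = low + index * width
    return (start, start + width)

a = base_interval(51000, low=50000, width=2000)
b = base_interval(51500, low=50000, width=2000)
c = base_interval(52000, low=50000, width=2000)
```

Here `a` and `b` are the same interval, so 51000 and 51500 are not distinguished, while 52000 falls into the next one.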

[Figure: salary axis from 50000 to 60000]
26
Temporal Association Rules
  • Attribute evolution

[Figure: salary axis with ticks at 50000, 52000, 54000, 56000, 58000, and 60000]
E1(salary): [52000, 54000] → [52000, 54000] →
[54000, 56000]
E2(salary): [52000, 56000] → [52000, 54000] →
[52000, 56000]
27
Temporal Association Rules
28
Temporal Association Rules
  • The subcube-supercube relationship defines a
    lattice among all evolution cubes within the
    evolution space.
  • This also holds for the evolution space of more
    than one attribute.
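
A sketch of the subcube test that induces the lattice. An evolution cube is represented here as one (low, high) interval per snapshot, which is an assumed encoding; the two sample cubes reuse the salary numbers from the attribute-evolution slide.

```python
def is_subcube(inner, outer):
    """True if every interval of `inner` lies within the matching one of `outer`."""
    return len(inner) == len(outer) and all(
        o_lo <= i_lo and i_hi <= o_hi
        for (i_lo, i_hi), (o_lo, o_hi) in zip(inner, outer)
    )

# One (low, high) salary interval per snapshot.
e1 = [(52000, 54000), (52000, 54000), (54000, 56000)]
e2 = [(52000, 56000), (52000, 54000), (52000, 56000)]

e1_in_e2 = is_subcube(e1, e2)
e2_in_e1 = is_subcube(e2, e1)
```

The relation is a partial order: a cube is a subcube of itself, and two cubes may be incomparable.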

[Figure: axes: salary (50000 to 60000) and monthly housing expense (1000 to 2000)]
29
Temporal Association Rules
  • Some properties of the metrics enable us to
    search efficiently through the lattice in a
    bottom-up manner.

30
Temporal Association Rules
  • Observation: many valid but trivial rules may
    exist.
  • (salary ∈ [52000, 56000]) ⇒
    (monthly_housing_expense ∈ [1200, 1400])
  • (salary ∈ [50000, 56000]) ⇒
    (monthly_housing_expense ∈ [1200, 1400])
  • Both rules have the same support and strength
    since no employee's salary is between
    50000 and 52000. However, the first rule conveys
    more precise information.

31
Temporal Association Rules
  • Strategy: an interval can be included in a rule
    only if at least some minimum number of objects
    have attribute values that fall into that
    interval.
  • The density of each base cube within the
    evolution cube of a rule has to meet some
    threshold.
  • In the previous example, the second rule can be
    eliminated.
  • Property of density: an evolution cube can
    satisfy the density threshold only when all of
    its subcubes satisfy the density threshold.
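
The density check can be illustrated in one dimension. The sketch simplifies a base cube to a single base interval of one attribute, and the salaries are invented so that none falls between 50000 and 52000, as in the earlier example; both helpers are hypothetical.

```python
from collections import Counter

def base_counts(values, low, width):
    """Number of objects per base interval (keyed by interval start)."""
    return Counter(low + int((v - low) // width) * width for v in values)

def meets_density(counts, start, end, width, min_density):
    """True if every base interval inside [start, end) holds enough objects."""
    return all(counts[s] >= min_density for s in range(start, end, width))

# Hypothetical salaries; none falls between 50000 and 52000.
salaries = [52500, 53000, 54500, 55000, 55500]
counts = base_counts(salaries, low=50000, width=2000)

narrow = meets_density(counts, 52000, 56000, 2000, min_density=2)
wide = meets_density(counts, 50000, 56000, 2000, min_density=2)
```

The range starting at 52000 passes, while the wider range starting at 50000 fails because its first base interval is empty, which is why the second rule is eliminated.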

[Figure annotation: min_density = 2]
32
Temporal Association Rules
  • General Model
  • Data set D
  • Language L
  • express properties or define subgroups of data
  • Selection predicate q
  • evaluate whether a sentence φ ∈ L defines a
    potentially interesting class of D
  • Task: find the set {φ ∈ L | q(D, φ) is true}
  • If
  • a lattice can be formed on sentences in L and
  • partial order exists on selection predicate
  • then the level-wise algorithm can be used to
    prune search space efficiently.

33
Temporal Association Rules
  • Temporal Association Rule
  • Language L: each sentence φ ∈ L is a temporal
    association rule.
  • The selection predicate q(D, φ) is true iff
  • q1: support(D, φ) ≥ min_support and
  • q2: strength(D, φ) ≥ min_strength and
  • q3: density(D, φ) ≥ min_density
  • Task: find the set of temporal association rules
    which satisfy all three predicates.
  • Specialization relation < ⇒ a lattice on the
    sentences in L
  • subcube/supercube relationship

34
Temporal Association Rules
  • partial order on qi with respect to <
    (reading φ' < φ as: φ' is a subcube of φ)
  • support(D, φ') ≤ support(D, φ) if φ' < φ
  • if strength(D, φ') < min_strength for all φ' < φ,
    then strength(D, φ) < min_strength
  • density(D, φ') ≥ density(D, φ) if φ' < φ
  • level-wise algorithm
  • basic scheme: starting from the most special
    (general) sentences, evaluate more and
    more general (special) sentences, excluding those
    sentences that cannot be interesting given all
    the information obtained in earlier iterations.
  • Efficient space pruning
  • Starting point
  • Random sampling
  • Order of predicate evaluation
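
The level-wise scheme can be sketched on a single quantized axis. The sketch assumes an anti-monotone predicate (if it fails on a run of base intervals, it fails on every longer run containing it), uses density as that predicate, and starts from the most special sentences, the single base intervals. The function `levelwise` and the counts are hypothetical.

```python
def levelwise(n_base, q):
    """Bottom-up search over runs (i, j) = base intervals i..j-1.

    Assumes q is anti-monotone: if q fails on a run, it fails on every
    run containing it. A run of length k is tested only when both of
    its sub-runs of length k - 1 passed.
    """
    accepted = []
    level = [(i, i + 1) for i in range(n_base) if q(i, i + 1)]
    while level:
        accepted.extend(level)
        passed = set(level)
        # Extend a run by one interval only if the shifted sub-run passed too.
        candidates = [
            (i, j + 1)
            for (i, j) in level
            if j < n_base and (i + 1, j + 1) in passed
        ]
        level = [run for run in candidates if q(*run)]
    return accepted

counts = [0, 2, 3, 1, 2]  # hypothetical objects per base interval
min_density = 2
tested = []

def dense(i, j):
    """Density predicate; also records which runs were evaluated."""
    tested.append((i, j))
    return min(counts[i:j]) >= min_density

result = levelwise(len(counts), dense)
```

With five base intervals there are 15 possible runs, but pruning leaves only six predicate evaluations in this example.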

35
Temporal Association Rules
  • Efficiency of space pruning
  • SR algorithm: after quantization, base intervals
    are combined as long as their density satisfies
    the threshold. The original base intervals and
    the combined intervals are treated as a set of
    items.

[Figure: experimental setting: 100000 objects, 100 snapshots, 5 attributes, 500 rules of length 5, density 2, support 5, strength 1.4]
36
Conclusions and Future Work
  • STING was developed to support spatial data
    mining triggers very efficiently by
  • employing spatial locality property and
  • postponing the trigger condition evaluation until
    the cumulative effect might cause the trigger to
    be fired.
  • Temporal association rules were introduced to
    capture relationships among object evolutions.
  • Selected ongoing work
  • Patterns whose cause and consequence do not
    happen together
  • There is a delay before the consequence shows up.
  • Patterns involving relationships among objects
  • e.g., children tend to live farther away from
    their parents when they grow up.

37
Conclusions and Future Work
  • Selected future work
  • Data mining over the Internet
  • data type
  • networking issue
  • Analytical model
  • classify data mining problems
  • devise efficient general approach
  • Applications
  • compiler/programming language
  • WWW