SEG 4630 Tutorial 10 - PowerPoint PPT Presentation

Slides: 15
Provided by: seem2
Transcript and Presenter's Notes

Title: SEG 4630 Tutorial 10


1
SEG 4630 Tutorial 10
  • Sequential Patterns
  • Lijun Chang

2
Sequential Patterns
  • Elements and events
  • Sequence: an ordered list of elements
    (transactions)
  • s = <e1 e2 e3 ... en>
  • Element: a collection of one or more events
    (items)
  • e1 = {i1, i2, ..., ik}
  • k-sequence: a sequence with k events (items)
  • E.g. an 8-sequence: <{1,2} {4,6} {7} {8,2,10}>
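
One possible way to hold these definitions in code is sketched below. It assumes a sequence is a Python list of sets (one set per element); the helper name `sequence_length` is made up for illustration.

```python
# A sequence is an ordered list of elements; each element is a set of events.
# This is the 8-sequence from the slide.
s = [{1, 2}, {4, 6}, {7}, {8, 2, 10}]

def sequence_length(seq):
    """Number of events (items) in the sequence, i.e. k for a k-sequence."""
    return sum(len(element) for element in seq)

print(len(s))              # number of elements: 4
print(sequence_length(s))  # number of events: 8
```

Note that the event 2 appears in two different elements and is counted twice, which is why this is an 8-sequence even though it uses only 7 distinct items.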

3
Examples of Sequence Data
[Figure: a sequence database and an example sequence, with labels for Sequence, Element (Transaction), and Event (Item)]
4
Definition of a Subsequence
  • A sequence <a1 a2 ... an> is contained in another
    sequence <b1 b2 ... bm> (m ≥ n) if there exist
    integers i1 < i2 < ... < in such that a1 ⊆ bi1,
    a2 ⊆ bi2, ..., an ⊆ bin
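
The definition can be checked mechanically. The sketch below is one way to do it, assuming sequences are lists of sets; `find_embedding` is a hypothetical helper name, and it returns 0-based indices rather than the 1-based i1, ..., in of the definition. Matching each element at the earliest possible position is enough when there are no timing constraints.

```python
def find_embedding(a, b):
    """Return indices i1 < i2 < ... < in such that a[j] is a subset of b[i_j],
    or None if sequence a is not contained in sequence b."""
    indices = []
    start = 0
    for element in a:
        # Look for the earliest element of b, at or after `start`, that covers it.
        for i in range(start, len(b)):
            if element <= b[i]:
                indices.append(i)
                start = i + 1
                break
        else:
            return None  # no remaining element of b contains this element of a
    return indices

b = [{1, 2}, {4, 6}, {7}, {8, 2, 10}]
print(find_embedding([{2}, {6}, {2, 10}], b))  # [0, 1, 3]
print(find_embedding([{4}, {2}], b))           # [1, 3]
print(find_embedding([{2}, {1}], b))           # None
```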

5
The count of subsequences
  • How many k-subsequences can be extracted from a
    given n-sequence?
  • <a b c d e f g h i>, n = 9
  • k = 4: Y _ _ Y Y _ _ _ Y
  • <a d e i>
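
Each k-subsequence corresponds to choosing k of the n events while keeping their order, so the number of choices is the binomial coefficient C(n, k). The sketch below checks this for n = 9 and k = 4. It treats the nine events as a flat list, since how they are grouped into elements does not affect the count.

```python
from itertools import combinations
from math import comb

events = list("abcdefghi")  # the n = 9 events, in order
n, k = len(events), 4

print(comb(n, k))  # 126

# The selection "Y _ _ Y Y _ _ _ Y" from the slide keeps positions 0, 3, 4, 8.
mask = "Y__YY___Y"
chosen = [e for e, m in zip(events, mask) if m == "Y"]
print(chosen)  # ['a', 'd', 'e', 'i']

# Enumerating every choice of k positions gives the same count.
print(sum(1 for _ in combinations(range(n), k)))  # 126
```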

6
Sequential Patterns Mining
  • Sequential pattern: a frequent subsequence,
    i.e. one whose support is ≥ minsup
  • Apriori principle for sequential data mining:
    any data sequence that contains a particular
    k-sequence must also contain all of its
    (k-1)-subsequences.
  • Iteratively:
  • Generate new candidate k-sequences
  • Prune candidates that have an infrequent
    (k-1)-subsequence
  • Count the supports of the candidates
  • Eliminate infrequent candidate k-sequences
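
The last two steps (support counting and elimination) can be sketched as follows. This is a minimal illustration, not a full mining algorithm: the toy database, the candidate list, and the `minsup` value are hypothetical and do not come from the slides, and timing constraints are ignored.

```python
def contains(data_seq, pattern):
    """True if `pattern` is a subsequence of `data_seq` (no timing constraints)."""
    it = iter(data_seq)
    # Each pattern element must be a subset of some later data element;
    # the shared iterator enforces left-to-right order.
    return all(any(p <= e for e in it) for p in pattern)

def support(database, pattern):
    """Fraction of data sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in database) / len(database)

# Hypothetical toy database (not from the slides).
database = [
    [{1, 2}, {3}, {4}],
    [{1}, {3, 4}],
    [{2}, {3}],
    [{1}, {2}, {4}],
]
candidates = [[{1}, {3}], [{1}, {4}], [{3}, {4}]]
minsup = 0.5

# Keep only the candidates whose support reaches minsup.
frequent = [c for c in candidates if support(database, c) >= minsup]
print(frequent)  # [[{1}, {3}], [{1}, {4}]]
```

Here <{3} {4}> is dropped because only the first data sequence contains 3 strictly before 4; in the second one they fall in the same element.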

7
Candidates Generation
  • Sequence merging (k > 2)
  • To generate a candidate k-sequence, merge two
    frequent (k-1)-sequences w1 and w2 if the
    subsequence obtained by removing the first
    event in w1 is the same as the subsequence
    obtained by removing the last event in w2
  • The resulting candidate is w1 extended with the
    last event of w2
  • Two cases when merging:
  • If the last two events in w2 belong to the same
    element, then the last event in w2 becomes part
    of the last element in w1
  • E.g. generate a 4-sequence from two 3-sequences
  • w1 = <{1} {5} {3}>; dropping the first event →
    <{5} {3}>
  • w2 = <{5} {3,4}>; dropping the last event →
    <{5} {3}>, and 4 is in the same element {3,4}
  • Result: <{1} {5} {3,4}>
  • Otherwise, the last event in w2 becomes a
    separate element appended to the end of w1
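
The merge rule above can be sketched in a few lines. The code assumes a sequence is a tuple of elements and each element is a tuple of events kept in sorted order; the function names are illustrative, and only the k > 2 case described on the slide is handled.

```python
def drop_first(seq):
    """Remove the first event of the first element (drop the element if it empties)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]

def drop_last(seq):
    """Remove the last event of the last element (drop the element if it empties)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())

def merge(w1, w2):
    """Return the candidate k-sequence, or None if w1 and w2 cannot be merged."""
    if drop_first(w1) != drop_last(w2):
        return None
    last_event = w2[-1][-1]
    if len(w2[-1]) >= 2:
        # Last two events of w2 share an element: extend the last element of w1.
        return w1[:-1] + (w1[-1] + (last_event,),)
    # Otherwise the event becomes a new element appended to w1.
    return w1 + ((last_event,),)

w1 = ((1,), (5,), (3,))
print(merge(w1, ((5,), (3, 4))))     # ((1,), (5,), (3, 4))
print(merge(w1, ((5,), (3,), (4,)))) # ((1,), (5,), (3,), (4,))
print(merge(w1, ((1,), (3, 4))))     # None
```

The first call reproduces the slide's example; the second shows the "otherwise" case, where 4 sits in its own element in w2.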

8
Timing Constraints
  • Maxspan: the maximum allowed time difference
    between the earliest event and the latest event
    in the entire sequence
  • Mingap: the minimum allowed time difference
    between two consecutive elements in a sequence
  • Maxgap: the maximum allowed time difference
    between two consecutive elements in a sequence
  • Window size (ws): the maximum allowed time
    difference between the latest and earliest
    occurrences of events in any element of a
    sequential pattern
  • ws = 0 → all events in the same element of a
    pattern must occur simultaneously

9
Problem on Maxgap
  • Using maxgap may contradict the Apriori
    principle
  • Suppose
  • xg = 1 (max-gap)
  • ng = 0 (min-gap)
  • minsup = 60%
  • <{2} {5}> support = 40%
  • <{2} {3} {5}> support = 60%
  • Support increased when the number of events in a
    sequence increases!?
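
The effect can be reproduced on a small example. The sketch below uses a hypothetical five-sequence database with made-up timestamps (the slide's own data table is not available here), chosen so that the supports come out as 40% and 60%. The `occurs` helper is an illustrative matcher that enforces only maxgap; with strictly increasing timestamps the min-gap of 0 is satisfied automatically.

```python
def occurs(seq, pattern, maxgap, prev_time=None):
    """True if `pattern` occurs in `seq` with consecutive elements at most
    `maxgap` apart. `seq` is a list of (timestamp, events) pairs with
    strictly increasing timestamps."""
    if not pattern:
        return True
    for idx, (t, events) in enumerate(seq):
        gap_ok = prev_time is None or t - prev_time <= maxgap
        if gap_ok and pattern[0] <= events:
            if occurs(seq[idx + 1:], pattern[1:], maxgap, t):
                return True
    return False

def support(database, pattern, maxgap):
    return sum(occurs(s, pattern, maxgap) for s in database) / len(database)

# Hypothetical data, not taken from the slides.
database = [
    [(1, {2}), (2, {3}), (3, {5})],
    [(1, {1, 2}), (2, {3}), (3, {5})],
    [(1, {2}), (2, {5})],
    [(1, {2}), (2, {3}), (3, {5}), (4, {2}), (5, {5})],
    [(1, {1}), (2, {4})],
]

print(support(database, [{2}, {5}], maxgap=1))       # 0.4
print(support(database, [{2}, {3}, {5}], maxgap=1))  # 0.6
```

In the first two sequences, 2 and 5 are two time units apart, so <{2} {5}> violates maxgap, while <{2} {3} {5}> bridges the same span in two steps of one unit each.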

Apriori principle for sequential data: any data
sequence that contains a particular k-sequence
must also contain all of its (k-1)-subsequences.

10
Modified Apriori Principle
  • If a k-sequence is frequent, then all of its
    contiguous (k-1)-subsequences must also be
    frequent.
  • Contiguous subsequence: s is a contiguous
    subsequence of w if any of the following holds
  • s is obtained from w by deleting an event from
    either the first or the last element
  • E.g. <{1} {2}> is a contiguous subsequence of
    <{1} {2} {3}>
  • s is obtained from w by deleting an event from
    any element having at least two events
  • E.g. <{1} {2}> is a contiguous subsequence of
    <{1} {2,3} {3}>
  • s is a contiguous subsequence of t, which is
    itself a contiguous subsequence of w.
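
The first two rules describe single deletions, which can be enumerated directly. The sketch below assumes sequences are tuples of tuples; `contiguous_subsequences` is an illustrative name, and an element that loses its only event is removed from the sequence.

```python
def contiguous_subsequences(w):
    """All contiguous (k-1)-subsequences of w obtained by deleting one event.

    w is a tuple of elements; each element is a tuple of events.
    """
    results = set()
    last = len(w) - 1
    for i, element in enumerate(w):
        # An event may be deleted from the first or last element,
        # or from any element that has at least two events.
        if i in (0, last) or len(element) >= 2:
            for j in range(len(element)):
                shorter = element[:j] + element[j + 1:]
                middle = (shorter,) if shorter else ()
                results.add(w[:i] + middle + w[i + 1:])
    return results

print(contiguous_subsequences(((1,), (2,), (3,))))
# the two results are ((1,), (2,)) and ((2,), (3,)); set order may vary

step1 = contiguous_subsequences(((1,), (2, 3), (3,)))
print(((1,), (2, 3)) in step1)                           # True
print(((1,), (2,)) in contiguous_subsequences(((1,), (2, 3))))  # True
```

The last two lines apply the function twice: <{1} {2}> is reached from <{1} {2,3} {3}> through the intermediate <{1} {2,3}>, which is the recursive third rule. Note that <{1} {3}> is not a contiguous subsequence of <{1} {2} {3}>, because the middle element has only one event.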

11
Examples
  • Q.13 (p.480)
  • Consider the following frequent 3-sequences:
  • <{1,2,3}>, <{1,2} {3}>, <{1} {2,3}>, <{1,2} {4}>,
  • <{1,3} {4}>, <{1,2,4}>, <{2,3} {3}>, <{2,3} {4}>,
  • <{2} {3} {3}>, and <{2} {3} {4}>.
  • (a) The candidate 4-sequences produced by the
    candidate generation step of the GSP algorithm:
  • <{1,2,3} {3}>, <{1,2,3} {4}>, <{1,2} {3} {3}>,
    <{1,2} {3} {4}>,
  • <{1} {2,3} {3}>, <{1} {2,3} {4}>.
  • (b) The candidate 4-sequences pruned during the
    candidate pruning step of the GSP algorithm
    (assuming no timing constraints):
  • <{1,2,3} {3}>, <{1,2} {3} {3}>, <{1,2} {3} {4}>,
    <{1} {2,3} {3}>,
  • <{1} {2,3} {4}>.
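
Part (b) can be double-checked with a short script. The sketch below reads the exercise's sequences as tuples of tuples (for example <{1,2} {3}> becomes `((1, 2), (3,))`), takes the six candidates from part (a) as given, and prunes any candidate that has a 3-subsequence outside the frequent list. The helper name is illustrative.

```python
def one_event_deletions(seq):
    """All (k-1)-subsequences obtained by deleting exactly one event."""
    out = set()
    for i, element in enumerate(seq):
        for j in range(len(element)):
            shorter = element[:j] + element[j + 1:]
            out.add(seq[:i] + ((shorter,) if shorter else ()) + seq[i + 1:])
    return out

frequent3 = {
    ((1, 2, 3),), ((1, 2), (3,)), ((1,), (2, 3)), ((1, 2), (4,)),
    ((1, 3), (4,)), ((1, 2, 4),), ((2, 3), (3,)), ((2, 3), (4,)),
    ((2,), (3,), (3,)), ((2,), (3,), (4,)),
}
candidates4 = [
    ((1, 2, 3), (3,)), ((1, 2, 3), (4,)), ((1, 2), (3,), (3,)),
    ((1, 2), (3,), (4,)), ((1,), (2, 3), (3,)), ((1,), (2, 3), (4,)),
]

# Without timing constraints, every 3-subsequence must itself be frequent.
kept = [c for c in candidates4 if one_event_deletions(c) <= frequent3]
pruned = [c for c in candidates4 if not (one_event_deletions(c) <= frequent3)]

print(kept)         # [((1, 2, 3), (4,))]
print(len(pruned))  # 5
```

Only <{1,2,3} {4}> survives; for instance <{1,2,3} {3}> is pruned because its subsequence <{1,3} {3}> is not in the frequent list.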

12
Examples
  • Q.12 (p.480)
  • For each of the sequences w = <e1, ..., elast>
    below, determine whether it is a subsequence of
    the following data sequence
  • s = <{A,B} {C,D} {A,B} {C,D} {A,B} {C,D}>
  • subject to the following timing constraints:
  • mingap = 0 (interval between last event in ei
    and first event in e(i+1) is > 0)
  • maxgap = 2 (interval between first event in ei
    and last event in e(i+1) is ≤ 2)
  • maxspan = 6 (interval between first event in e1
    and last event in elast is ≤ 6)
  • ws = 1 (time between first and last events in ei
    is ≤ 1)
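
The four constraints can be turned into a small checker. The sketch below does not search for occurrences; it only tests one proposed occurrence, described by the first and last timestamps of the events matched to each pattern element. The function name and the example timestamps are hypothetical; the examples assume the six elements of s occur at times 1 to 6, which the exercise text above does not state.

```python
def satisfies_timing(windows, mingap, maxgap, maxspan, ws):
    """Check one candidate occurrence against the four timing constraints.

    `windows` holds, for each pattern element e_i in order, the pair
    (time of its first matched event, time of its last matched event).
    """
    # ws: events matched to one element must lie within the window size.
    if any(last - first > ws for first, last in windows):
        return False
    for (first_i, last_i), (first_next, last_next) in zip(windows, windows[1:]):
        if first_next - last_i <= mingap:  # the gap must be > mingap
            return False
        if last_next - first_i > maxgap:   # the gap must be <= maxgap
            return False
    # maxspan: from the first event of e_1 to the last event of e_last.
    return windows[-1][1] - windows[0][0] <= maxspan

limits = dict(mingap=0, maxgap=2, maxspan=6, ws=1)
print(satisfies_timing([(1, 1), (2, 2)], **limits))  # True
print(satisfies_timing([(1, 1), (4, 4)], **limits))  # False: maxgap exceeded
print(satisfies_timing([(1, 2), (2, 2)], **limits))  # False: mingap violated
print(satisfies_timing([(1, 2), (3, 3)], **limits))  # True
print(satisfies_timing([(1, 3), (4, 4)], **limits))  # False: window too wide
```

A window such as (1, 2) models an element whose events are matched across two adjacent timestamps, which ws = 1 permits.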

13
More Examples
  • s = <{1} {2} {2,5} {3} {4}>
  • w = <{1,2} {3,4}>

14
More Examples
  • s1 = <{1,2,3,4}>
  • w1 = <{1,2} {3,4}> ?
  • s2 = <{1,2,3,4} {5}>
  • w2 = <{1,2} {3,4,5}> ?
  • s3 = <{1} {2} {3} {4}>
  • w3 = <{1,3} {2,4}> ?