Sequential Patterns

About This Presentation

Slides: 31

Provided by: edeg4

Transcript and Presenter's Notes

1
Sequential Patterns
Process Mining

Current State of Research
Edgar de Graaf
LIACS

2
Mining Sequential Patterns

Sequential Patterns
Sequence Databases
AprioriAll
PrefixSpan
Gap Constraints

3
Sequential Patterns

⟨(a,b)(c)(a,b,d)⟩
⟨a1, a2, a3⟩
⟨(3)(4,5)(8)⟩ is contained in ⟨(7)(3,8)(9)(4,5,6)(8)⟩
⟨(3)(4,5)(8)⟩ is not contained in ⟨(7)(3,8)(9)(4)(5,6)(8)⟩
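
The containment examples above can be checked with a short sketch. It assumes a sequence is represented as a Python list of sets (one set per item set); the function name `is_contained` is an illustrative choice, not something from the slides.

```python
def is_contained(pattern, sequence):
    """Return True if every item set of `pattern` is a subset of some
    element of `sequence`, with the elements appearing in the same order."""
    pos = 0
    for itemset in pattern:
        # Greedily skip elements that do not include the current item set.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next item set must match a later element
    return True


pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))    # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4}, {5, 6}, {8}]))  # False
```

The second call fails because (4,5) must fall inside a single element; 4 and 5 spread over two elements do not count.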

4
Sequential databases

The database with sequences

5
Sequential databases

A generated candidate pattern: ⟨(3)(4,5)(8)⟩
Support count: 0

6
Sequential databases

⟨(3)(4,5)(8)⟩
Support count: 0 → 1

7
Sequential databases

⟨(3)(4,5)(8)⟩
Support count: 1
Not contained → not counted

8
Sequential databases

Contained in each of the remaining sequences
Support count: 1 → 2 → 3 → 4 → 5
IF minimal support = 50% THEN ⟨(3)(4,5)(8)⟩ is frequent
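
The counting procedure on these slides can be sketched as follows. The database below is made up for illustration (the one shown on the original slides is not part of this transcript), and the names `support_count` and `is_frequent` are assumptions.

```python
def is_contained(pattern, sequence):
    """Order-preserving subset matching of item sets (lists of sets)."""
    pos = 0
    for itemset in pattern:
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support_count(pattern, database):
    """Number of database sequences that contain the pattern."""
    return sum(is_contained(pattern, seq) for seq in database)


def is_frequent(pattern, database, min_support):
    """min_support is a fraction of the database size, e.g. 0.5 for 50%."""
    return support_count(pattern, database) / len(database) >= min_support


# Hypothetical database, not the one from the slides.
database = [
    [{7}, {3, 8}, {9}, {4, 5, 6}, {8}],  # contains the pattern
    [{3}, {4}, {5}, {8}],                # 4 and 5 are in different elements
    [{3}, {4, 5}, {8}],                  # contains the pattern
    [{4, 5}, {3}, {8}],                  # wrong order
]
pattern = [{3}, {4, 5}, {8}]
print(support_count(pattern, database))     # 2
print(is_frequent(pattern, database, 0.5))  # True
```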
9
Lifting order (1)

Notation by examples
⟨A,B,C⟩: an ordered list of sets = a sequence
Every set A, B and C is unordered, e.g. A = (x,y,z) = (y,z,x) = (z,y,x)
x,y,z is an extension: we ignore the order when counting frequency

10
Lifting order (2)

⟨(t1)(t2)(t3)(t4)⟩ and ⟨(t1)(t3)(t2)(t4)⟩ are frequent
⇒
⟨(t1)(t3,t2)(t4)⟩ is frequent
This says that t3 and t2 frequently occur between t1 and t4, in either order

11
Lifting Order (3)

⟨(t1)(t2)(t3)(t4)⟩ and ⟨(t1)(t3)(t2)(t4)⟩ are infrequent
Suppose (t1) t3,t2 (t4) is frequent
This says that t3 and t2 often occur between t1 and t4

12
Existing Algorithms

AprioriAll: the first algorithm, based on the anti-monotone principle
PrefixSpan: currently the fastest algorithm around; it uses projected databases

13
AprioriAll (1)

AprioriAll(DB, min_sup)
  L1 = frequent sequences of size 1
  k = 2
  while (Lk-1 is not empty)
    Ck = candidateGeneration(Lk-1, k)
    Ck = candidatePruning(Ck, k)
    Lk = supportBasedPruning(Ck)
    k++

14
AprioriAll (2)

candidateGeneration(Lk-1, k)
  Ck = ø
  for each a in Lk-1
    for each b in Lk-1
      if (for all n, 1 ≤ n ≤ k-2: an = bn)
        add to Ck the sequences
          ⟨a1, ..., ak-2, ak-1, bk-1⟩ and
          ⟨a1, ..., ak-2, bk-1, ak-1⟩
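
A minimal runnable sketch of the two pseudocode slides above. To keep it short it assumes every element of a sequence is a single item, so a sequence is a plain tuple; the function names, the absolute `min_sup` count and the example database are illustrative assumptions.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, db):
    return sum(is_subsequence(pattern, seq) for seq in db)


def candidate_generation(prev_level, k):
    """Join two (k-1)-sequences that agree on their first k-2 items."""
    candidates = set()
    for a in prev_level:
        for b in prev_level:
            if a[:k - 2] == b[:k - 2]:
                candidates.add(a + (b[-1],))
                candidates.add(b + (a[-1],))
    return candidates


def candidate_pruning(candidates, prev_level, k):
    """Anti-monotone pruning: every (k-1)-subsequence must be frequent."""
    return {c for c in candidates
            if all(sub in prev_level for sub in combinations(c, k - 1))}


def apriori_all(db, min_sup):
    items = {item for seq in db for item in seq}
    level = {(i,) for i in items if support((i,), db) >= min_sup}
    frequent = set(level)
    k = 2
    while level:
        candidates = candidate_generation(level, k)
        candidates = candidate_pruning(candidates, level, k)
        # Support-based pruning: one pass over the database per candidate.
        level = {c for c in candidates if support(c, db) >= min_sup}
        frequent |= level
        k += 1
    return frequent


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
print(sorted(apriori_all(db, min_sup=3)))
# [('a',), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
```

Handling real item sets would additionally require candidates that extend the last element of a sequence; that case is left out here.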

15
PrefixSpan (1)

Assume the prefix is ⟨(a,b)(c)⟩
Scan the projected database to find every frequent item x such that
⟨(a,b)(c,x)⟩ is frequent or
⟨(a,b)(c)(x)⟩ is frequent
Append x to the prefix and output the pattern
Now call recursively, e.g. PrefixSpan(⟨(a,b)(c,x)⟩, newProjDB)

16
PrefixSpan (2)

A projected DB only stores the postfix
E.g. if the prefix is ⟨(a,b)⟩ then we store ⟨(a,b,x)⟩ as ⟨(_, x)⟩
New projected DB = old projected DB sequences without the prefix

17
PrefixSpan (3)

Faster than AprioriAll:
No candidates are generated that do not occur in the database
Testing is done on a shrinking projected DB
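
The recursion and the projected database can be sketched as follows. As a simplification, every element is assumed to be a single item, so only the ⟨...(c)(x)⟩ kind of extension occurs and the ( _, x) notation is not needed; the function name, parameters and example database are illustrative assumptions.

```python
from collections import Counter


def prefixspan(projected_db, min_sup, prefix=()):
    """Yield (pattern, support) for every frequent pattern extending `prefix`.

    `projected_db` holds only the postfixes that follow `prefix`.
    """
    # Count each item at most once per postfix.
    counts = Counter()
    for postfix in projected_db:
        counts.update(set(postfix))

    for item, count in sorted(counts.items()):
        if count < min_sup:
            continue
        pattern = prefix + (item,)
        yield pattern, count
        # New projected DB: what is left after the first occurrence of item.
        new_db = [p[p.index(item) + 1:] for p in projected_db if item in p]
        yield from prefixspan(new_db, min_sup, pattern)


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
for pattern, count in prefixspan(db, min_sup=3):
    print(pattern, count)
# ('a',) 3
# ('a', 'c') 3
# ('b',) 3
# ('b', 'c') 3
# ('c',) 4
```

Only items that actually occur in the postfixes are ever counted, and each recursive call works on shorter postfixes than the one before.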

18
Gap Constraint

Simple idea: a maximal distance between the item sets of a sequence
E.g. for the sequence ⟨(a)(c)(d)(e)⟩, the pattern ⟨(a)(e)⟩ and a maximal gap of 1, this sequence is not counted
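
A sketch of containment under a gap constraint. It assumes the gap is the number of elements skipped between two consecutive matched item sets; the slide does not define the measure precisely, so this definition and the name `contains_with_max_gap` are assumptions. Greedy matching is not sufficient here, so the sketch backtracks over possible match positions.

```python
def contains_with_max_gap(pattern, sequence, max_gap):
    """pattern and sequence are lists of sets; max_gap is the maximal number
    of sequence elements that may be skipped between consecutive matches."""

    def match(p_idx, start, end):
        if p_idx == len(pattern):
            return True
        for pos in range(start, min(end, len(sequence))):
            if pattern[p_idx] <= sequence[pos]:
                # The next match may skip at most max_gap elements.
                if match(p_idx + 1, pos + 1, pos + 2 + max_gap):
                    return True
        return False

    # The first item set may match anywhere in the sequence.
    return match(0, 0, len(sequence))


sequence = [{"a"}, {"c"}, {"d"}, {"e"}]
print(contains_with_max_gap([{"a"}, {"e"}], sequence, max_gap=1))  # False
print(contains_with_max_gap([{"a"}, {"e"}], sequence, max_gap=2))  # True
```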

19
Process Mining

What is process mining?
Using D/F tables and graphs
Genetic Algorithms
Problem areas
Using sequential patterns

20
What is process mining? (1)

The ordering of events is known, e.g. ⟨(task A)(task B)(task C)⟩
Process mining constructs a Petri net

Figure: example Petri net with the labels pay, ready, claim, register, to_be_evaluated and send_letter
Source: Workflow Management by W. van der Aalst and K. van Hee (1997)
21
What is process mining? (2)

Usability of process mining:
Given the audit trails, what is the workflow network?
Is the mined workflow network equal to the original design? (Delta Analysis)
Is the mined workflow network better than the original design? (Performance Analysis)

22
Using D/F tables and graphs (1)

For every task there is a D/F (dependency/frequency) table
Intuition: if A is often followed by B then the probability of A causing B increases

23
Using D/F tables and graphs (2)

A D/F graph is constructed
IF ((A→B ≥ N) AND (A > B ≥ σ) AND (B < A ≥ σ)) THEN connection from A to B
More complicated rules deal with recursion and short loops
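
A simplified sketch of the frequency part of this rule. It assumes that A > B stands for the number of times A is directly followed by B in the audit trails, and it connects A to B on that count alone; the A→B score with threshold N and the rules for recursion and short loops are left out. All names and the example traces are illustrative.

```python
from collections import Counter


def direct_succession_counts(traces):
    """Count how often each task is directly followed by each other task."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def df_graph_edges(traces, sigma):
    """Connect A to B when A is directly followed by B at least sigma times."""
    counts = direct_succession_counts(traces)
    return {pair for pair, n in counts.items() if n >= sigma}


# Hypothetical audit trails, one string of task names per case.
traces = ["ABCD", "ACBD", "ABCD"]
print(sorted(df_graph_edges(traces, sigma=2)))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
```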

24
Using D/F tables and graphs (3)

D/F Graph example

25
Using D/F tables and graphs (4)

AND/OR-splits:
OR if neither C > B nor B > C is higher than the threshold
AND if both are higher than the threshold

Figure: example with tasks A, B and C
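
The split rule can be sketched on top of direct-succession counts. The sketch assumes B > C is the number of times B is directly followed by C; the slide does not say what happens when only one of the two counts exceeds the threshold, so that case returns None here. Names and traces are illustrative.

```python
from collections import Counter


def direct_succession_counts(traces):
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def classify_split(counts, b, c, threshold):
    """Classify the split in front of tasks b and c as 'AND' or 'OR'."""
    b_then_c = counts[(b, c)] > threshold
    c_then_b = counts[(c, b)] > threshold
    if b_then_c and c_then_b:
        return "AND"  # both orders occur often: b and c run in parallel
    if not b_then_c and not c_then_b:
        return "OR"   # b and c rarely follow each other: a choice
    return None       # undetermined by the rule on the slide


parallel = direct_succession_counts(["ABCD", "ACBD", "ABCD", "ACBD"])
choice = direct_succession_counts(["ABD", "ACD"])
print(classify_split(parallel, "B", "C", threshold=1))  # AND
print(classify_split(choice, "B", "C", threshold=1))    # OR
```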
26
Genetic Algorithms (1)

1. Create an initial population of workflows
2. Calculate their fitness using audit trails
3. Create a child
4. Mutate the child
5. Repeat 3 to 4 to create the new population
6. Go to 2
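
The loop above can be sketched with a toy encoding. Everything specific here is an assumption made for illustration: a workflow is reduced to a set of directed edges between tasks, fitness is the number of direct successions in the audit trails covered by an edge minus the number of edges, parents come from the better half of the population, and the best individual is copied unchanged into the next generation. A real workflow-net encoding would be considerably more involved.

```python
import random
from itertools import permutations


def fitness(edges, traces):
    """Direct successions explained by the edges, minus a size penalty."""
    explained = sum((a, b) in edges for t in traces for a, b in zip(t, t[1:]))
    return explained - len(edges)


def crossover(mother, father, all_edges, rng):
    """For each possible edge, copy its presence from a random parent."""
    return frozenset(e for e in all_edges
                     if e in (mother if rng.random() < 0.5 else father))


def mutate(child, all_edges, rng):
    """Flip the presence of one randomly chosen edge."""
    return child ^ {rng.choice(all_edges)}


def evolve(traces, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    tasks = sorted({task for t in traces for task in t})
    all_edges = list(permutations(tasks, 2))
    # Step 1: random initial population.
    population = [frozenset(e for e in all_edges if rng.random() < 0.5)
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: rank by fitness on the audit trails.
        ranked = sorted(population, key=lambda c: fitness(c, traces),
                        reverse=True)
        parents = ranked[:pop_size // 2]
        new_population = [ranked[0]]  # keep the best (an added assumption)
        # Steps 3 to 5: create and mutate children until the population is full.
        while len(new_population) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = crossover(mother, father, all_edges, rng)
            new_population.append(mutate(child, all_edges, rng))
        population = new_population  # step 6: back to step 2
    return max(population, key=lambda c: fitness(c, traces))


traces = ["ABC", "ABC"]
best = evolve(traces)
print(sorted(best), fitness(best, traces))
```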

27
Genetic Algorithms (2)

Advantages:
Can deal with duplicate tasks and non-free choice
Disadvantages:
The structure of the chromosome
How do we measure fitness?
How do we do cross-over and mutation?

28
Problem Areas (1)

Hidden tasks
Duplicate tasks: when several tasks have the same name

Figure: example with tasks B and C
29
Problem Areas (2)

Mining non-free-choice

Figure: example with tasks A, B, C, D and E
30
Problem Areas (3)

Mining loops
Example trace: ABCDBCD

Figure: example with tasks A, B, C and D
31
Problem Areas (4)

Delta analysis: how do we compare two models?
Other problems: time, dealing with noise and incompleteness

32
Using sequential patterns

Mining loops?
Fitness measure in a GA?
Use in delta analysis?
Generate the important frequent subsequences to help the designer

33
Further research in sequences

How about gaps between items in different item sets?
What type of frequent subsequences to use in fitness?
Lifting order: is it useful in workflow generation?
Further research of lifting order

34
The End

Thank you for your attention
Edgar de Graaf
edegraaf_at_liacs.nl
