Title: Event Stream Processing with Out-of-Order Data Arrival
- Presenter: Mo Liu
- Presentation based on work by
  - Ming Li, Mo Liu, Luping Ding, Elke A. Rundensteiner, and Murali Mani, Worcester Polytechnic Institute, Worcester, MA, USA
- DEPSA at ICDCS 2007, June 29, 2007, Toronto, ON, Canada
Outline
- Introduction
- Preliminary
- Problem with Out-of-Order Event Arrival
- Solution
- Experiment
- Conclusion
- Related Work
Introduction: Event Stream Processing
- Rising interest in the database community
- Wide-ranging and growing applications
Example of event stream processing: shoplifting in retail management
Introduction: Complex Event Processing (CEP)
- Event stream processing engine
  - A stream engine specialized for event stream queries, designed generically to detect and extract expected pattern sequences
  - Performance gain compared to stream systems that use joins to handle event sequence queries
- SASE approach
Introduction: Limitations
- Total-order assumption on event arrival
  - The order in which the query system receives events is the same as their timestamp order
  - Under this assumption, a later arrival means a larger timestamp
- What if events arrive out of order?
  - Out-of-order data arrival is common in distributed computing environments (e.g., due to network traffic)
  - Systems based on the total-order assumption (e.g., SASE) miss qualified results and produce spurious results
Preliminary: Query Language
- EVENT <event pattern>
- WHERE <qualification>
- WITHIN <window>
Example: EVENT SEQ(A, B, D) WITHIN 10 seconds
Queries in SASE assume the above language structure.
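To make the EVENT SEQ ... WITHIN semantics concrete, here is a minimal Python sketch of a match check. The function name `is_match` is hypothetical, the WHERE qualification is omitted, and the window semantics (strictly increasing timestamps, first-to-last span of at most W) are assumptions rather than definitions taken from SASE.

```python
def is_match(events, pattern, window):
    """Return True if `events`, a list of (event_type, timestamp) pairs,
    matches EVENT SEQ(pattern) WITHIN window.

    Assumed semantics: types appear in pattern order, timestamps are
    strictly increasing, and last.ts - first.ts <= window.
    """
    if not events or [etype for etype, _ in events] != list(pattern):
        return False
    ts = [t for _, t in events]
    strictly_increasing = all(a < b for a, b in zip(ts, ts[1:]))
    return strictly_increasing and ts[-1] - ts[0] <= window


# EVENT SEQ(A, B, D) WITHIN 10 seconds
print(is_match([("A", 3), ("B", 6), ("D", 10)], "ABD", 10))   # True
print(is_match([("A", 3), ("B", 11), ("D", 15)], "ABD", 10))  # False: span 12 > 10
print(is_match([("A", 3), ("B", 11), ("D", 8)], "ABD", 10))   # False: d8 precedes b11
```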
Preliminary: Finding Result Sequences
- SSC (Sequence Scan and Construction)
  - Sequence Scan employs an NFA to detect matches; Sequence Construction constructs the expected results
- NFA with AIS (Active Instance Stack)
  - AIS associates a stack with each state of the NFA, storing the events that triggered the NFA transition to that state
- RIP (Most Recent Instance in Previous Stack) field
  - The field records the temporal order relevant to the query
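The scan-and-construct loop can be sketched in Python as below, assuming total-order arrival. The names `InOrderSSC` and `Instance` are hypothetical; each stack stores instances carrying a RIP index into the previous stack, a state only accepts events once the previous stack is non-empty, and a match is assumed to span at most the window. The events (a3, b6, a7, d10, b11, d15, plus an irrelevant c5) follow the running example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    etype: str
    ts: int
    rip: int  # index of the most recent instance in the previous stack (-1 = none)


class InOrderSSC:
    """Sketch of SSC for EVENT SEQ(E1, ..., Em) WITHIN window.

    Assumes total-order arrival, distinct event types in the pattern,
    and that a match spans at most `window` time units.
    """

    def __init__(self, pattern, window):
        self.pattern = list(pattern)
        self.window = window
        self.stacks = [[] for _ in self.pattern]  # one AIS stack per NFA state

    def receive(self, etype, ts):
        if etype not in self.pattern:
            return []  # event type not in the query
        i = self.pattern.index(etype)
        if i > 0 and not self.stacks[i - 1]:
            return []  # state i not reachable yet: no earlier instance to extend
        # RIP = top of the previous stack, i.e. its most recent instance.
        rip = len(self.stacks[i - 1]) - 1 if i > 0 else -1
        inst = Instance(etype, ts, rip)
        self.stacks[i].append(inst)  # append semantics
        if i == len(self.pattern) - 1:
            return self._construct(inst)  # Sequence Construction
        return []

    def _construct(self, last):
        results = []

        def walk(level, inst, suffix):
            seq = [inst] + suffix
            if level == 0:
                if seq[-1].ts - seq[0].ts <= self.window:
                    results.append(tuple(f"{e.etype.lower()}{e.ts}" for e in seq))
                return
            # Under total order, every instance up to the RIP is earlier than inst.
            for prev in self.stacks[level - 1][: inst.rip + 1]:
                walk(level - 1, prev, seq)

        walk(len(self.pattern) - 1, last, [])
        return results


ssc = InOrderSSC("ABD", window=10)
out = []
for etype, ts in [("A", 3), ("C", 5), ("B", 6), ("A", 7), ("D", 10), ("B", 11), ("D", 15)]:
    out += ssc.receive(etype, ts)
print(out)  # [('a3', 'b6', 'd10'), ('a7', 'b11', 'd15')]
```

Construction runs only when an instance of the last type arrives; d15 also reaches a3 b6 d15 and a3 b11 d15, which the window check discards.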
Preliminary: Finding Result Sequences (Cont.)
EVENT SEQ(A, B, D) WITHIN 10 seconds
(Figure: NFA with states 0 to 3 (transitions on A, B, D) and AIS stacks S1, S2, S3 over a stream ordered by timestamp. Sequences shown: a3 b6 d10; on arrival of d15, a3 b6 d15, a3 b11 d15, and a7 b11 d15, with the window (WD) marked.)
Preliminary: Purging Operator States
EVENT SEQ(A, B, D) WITHIN 10 seconds
- PSSC: on seeing d15, purge a3, and so on (3 + 10 < 15, so under total order no future event can fall within a3's window)
(Figure: AIS stacks S1, S2, S3 with instances annotated by their RIP: () a3, (a3) b6, (b6) d10, () a7, (a7) b11, (b11) d15, over an in-order stream on a timestamp axis.)
Problem with Out-of-Order Arrival at SSC: Incomplete Event Retrieval
EVENT SEQ(A, B, D) WITHIN 10 seconds
- Produced result (SSC): a3 b6 d10, a7 b11 d15
- Correct result: a0 b1 d2, a3 b6 d10, a7 b11 d15
- Missing: a0 b1 d2
(Figure: received order of an out-of-order event stream in which a0, b1, and d2 arrive late; the AIS stacks contain only () a3, (a3) b6, (b6) d10, () a7, (a7) b11, (b11) d15.)
Problem with Out-of-Order Arrival at SSC: Event Misplacement
- Produced result: a3 b6 d8, a3 b11 d8
- Correct result: a3 b6 d8
- Wrong: b11 d8 (b11 occurs after d8, so it cannot precede d8 in a sequence)
(Figure: Incorrect AIS Appending. The out-of-order event d8 is appended to the top of S3 above d10 and d15, so its RIP points to b11; the figure marks b11 d8 as wrong and also marks a missing result.)
Problem with Out-of-Order Arrival at PSSC
- Purge in SS: on seeing d15, purge a3, and so on. After that, the out-of-order event d8 arrives, and the result a3 b6 d8 is missing because of an unauthorized AIS purge.
- CLAIM: Any data purge of the active instance stack (AIS) is unauthorized unless total order on data arrival holds for the input stream.
EVENT SEQ(A, B, D) WITHIN 10 seconds
(Figure: Out-of-Order Event Arrival, Example 3. AIS stacks hold () a3, (a3) b6, (b6) d10, () a7, (a7) b11, (b11) d15; the late event d8 would form a3 b6 d8, but a3 has already been purged.)
If a precise query result is required and memory resources are limited, purging based only on the window (WD) in SS is not sufficient for handling out-of-order event arrival.
Solution in SSC
- Event retrieval mechanism
  - To avoid incomplete retrieval, all states of the NFA need to be set active before retrieval over the event stream begins.
(Figure: Out-of-Order Event Arrival with all NFA states active. The late events a0, b1, d2 are now retrieved; the AIS stacks hold () a0, (a0) b1, (b1) d2, () a3, (a3) b6, (b6) d10, () a7, (a7) b11, (b11) d15, and the produced result is a0 b1 d2, a3 b6 d10, a7 b11 d15.)
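To illustrate why every state must be active, the following sketch contrasts two hypothetical retrieval policies. `retrieve` is an illustrative function, and the gated policy (a state accepts events only once the previous stack is non-empty) is our assumption about how incomplete retrieval arises, not a description of SASE's code. The arrival order, with a0 delayed until after b1 and d2, is also hypothetical.

```python
def retrieve(stream, pattern, all_active):
    """Return the timestamps retrieved into each stack (hypothetical model).

    all_active=False: state i accepts events only once stack S(i-1) is non-empty.
    all_active=True: every NFA state is active from the start.
    """
    stacks = {etype: [] for etype in pattern}
    for etype, ts in stream:
        if etype not in stacks:
            continue  # event type not in the query
        i = pattern.index(etype)
        if not all_active and i > 0 and not stacks[pattern[i - 1]]:
            continue  # state i inactive: the event is silently dropped
        stacks[etype].append(ts)
    return stacks


# Hypothetical arrival order: a0 is delayed until after b1 and d2.
late = [("B", 1), ("D", 2), ("A", 0)]
print(retrieve(late, "ABD", all_active=False))  # {'A': [0], 'B': [], 'D': []}
print(retrieve(late, "ABD", all_active=True))   # {'A': [0], 'B': [1], 'D': [2]}
```

With gating, b1 and d2 are discarded before a0 exists, so a0 b1 d2 can never be constructed; with all states active they are kept, and ordering them correctly becomes the job of AIS construction.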
Solution in SSC (Cont.)
- AIS construction mechanism
  - To avoid event misplacement, use sort semantics instead of append semantics
(Figure: Correct AIS Appending. The out-of-order event b8 is placed in S2 by timestamp, between b6 and b11; sequences containing b8 shown: a3 b8 d10, a7 b8 d10, a3 b8 d15, a7 b8 d15.)
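The difference between append and sort semantics shows up in how the RIP is chosen. In this sketch (hypothetical helpers `rip_append` and `rip_sorted`), a stack is a sorted list of timestamps. With append semantics a late d8 points to the top of S2 and gets paired with b11; with sort semantics it points only to instances earlier than d8. The values follow the event misplacement and correct AIS appending examples.

```python
import bisect


def rip_append(prev_stack, ts):
    # Append semantics: RIP is the top of the previous stack (ts is ignored).
    return len(prev_stack) - 1


def rip_sorted(prev_stack, ts):
    # Sort semantics: RIP is the last instance in the sorted previous stack
    # whose timestamp is strictly smaller than ts.
    return bisect.bisect_left(prev_stack, ts) - 1


s2 = [6, 11]  # timestamps of b6 and b11 in stack S2
late_d = 8    # out-of-order event d8
print(s2[: rip_append(s2, late_d) + 1])  # [6, 11] -> d8 paired with b11 (wrong)
print(s2[: rip_sorted(s2, late_d) + 1])  # [6]     -> only b6 precedes d8

s2_sorted = list(s2)
bisect.insort(s2_sorted, 8)  # a late b8 lands between b6 and b11
print(s2_sorted)             # [6, 8, 11]
```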
SSC Algorithm with Out-of-Order Handling
- Out-of-order handling incorporated into SSC
- Input
  - (1) Sequence query: EVENT SEQ(E1, E2, ..., Em) WITHIN W
  - (2) AIS constructed from previously input events
  - (3) Newly received event ei (of event type Ei)
- Output
  - (1) Updated AIS
  - (2) Sequence output of SSC
- 1. IF event type Ei is among E1, E2, ..., Em
  - 2. insert ei into stack Si (using sort semantics)
  - 3. set ei's RIP
  - 4. check the RIP values of the instances in stack Si+1 and reset the ones affected by ei
  - 5. produce event sequences containing ei, if any
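A minimal Python sketch of Algorithm 1, under stated assumptions: the class name `OutOfOrderSSC` is hypothetical, every NFA state is active from the start, each stack holds sorted integer timestamps with a parallel RIP list, and a match needs strictly increasing timestamps with a span of at most W. For step 5 it follows RIPs backwards to S1 and checks RIPs forwards to Sm; this is our reading of "produce event sequences containing ei", not the authors' code.

```python
import bisect


class OutOfOrderSSC:
    """Sketch of Algorithm 1 for EVENT SEQ(E1, ..., Em) WITHIN window.

    Assumptions: distinct event types in the pattern, all NFA states active,
    strictly increasing timestamps within a match, span <= window.
    """

    def __init__(self, pattern, window):
        self.pattern = list(pattern)
        self.window = window
        self.ts = [[] for _ in self.pattern]   # sorted timestamps of each stack
        self.rip = [[] for _ in self.pattern]  # RIP of each instance (-1 = none)

    def _rip_for(self, i, t):
        # Most recent instance in the previous stack with a timestamp < t.
        return bisect.bisect_left(self.ts[i - 1], t) - 1 if i > 0 else -1

    def receive(self, etype, t):
        if etype not in self.pattern:                  # step 1
            return []
        i = self.pattern.index(etype)
        pos = bisect.bisect_left(self.ts[i], t)
        self.ts[i].insert(pos, t)                      # step 2: sort semantics
        self.rip[i].insert(pos, self._rip_for(i, t))   # step 3: set ei's RIP
        if i + 1 < len(self.pattern):                  # step 4: reset affected RIPs
            # Instances in the next stack later than t gain one predecessor.
            start = bisect.bisect_right(self.ts[i + 1], t)
            for k in range(start, len(self.ts[i + 1])):
                self.rip[i + 1][k] += 1
        return self._sequences_through(i, pos)         # step 5

    def _prefixes(self, i, pos):
        # Partial sequences from the first stack up to stack i, position pos.
        if i == 0:
            return [[self.ts[0][pos]]]
        return [pre + [self.ts[i][pos]]
                for p in range(self.rip[i][pos] + 1)
                for pre in self._prefixes(i - 1, p)]

    def _suffixes(self, i, pos):
        # Partial sequences from stack i, position pos, to the last stack.
        # Instance q of the next stack can follow it exactly when its RIP >= pos.
        if i == len(self.pattern) - 1:
            return [[self.ts[i][pos]]]
        return [[self.ts[i][pos]] + suf
                for q, r in enumerate(self.rip[i + 1]) if r >= pos
                for suf in self._suffixes(i + 1, q)]

    def _sequences_through(self, i, pos):
        out = []
        for pre in self._prefixes(i, pos):
            for suf in self._suffixes(i, pos):
                seq = pre + suf[1:]  # ei is the last item of pre and the first of suf
                if seq[-1] - seq[0] <= self.window:
                    out.append(tuple(f"{e.lower()}{t}" for e, t in zip(self.pattern, seq)))
        return out


ssc = OutOfOrderSSC("ABD", window=10)
for etype, t in [("A", 3), ("B", 6), ("A", 7), ("D", 10), ("B", 11), ("D", 15)]:
    ssc.receive(etype, t)
print(ssc.receive("B", 8))  # late b8
# [('a3', 'b8', 'd10'), ('a7', 'b8', 'd10'), ('a7', 'b8', 'd15')]
```

Here a3 b8 d15 is dropped by the window check (span 12 > 10). Each complete sequence is reported exactly once, when its last-arriving event is received, since everything reported at that moment contains the new event.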
Optimization
- Out-of-order handling incorporated into SSC with AIS_CLOCK
- Input and output: same as Algorithm 1
- 1. IF event type Ei is among E1, E2, ..., Em
  - 2. IF ei.timestamp < AIS_CLOCK
    - 3. buffer ei
    - 4. insert ei into stack Si (using sort semantics)
    - 5. set ei's RIP
    - 6. check the RIP values of the instances in stack Si+1 and reset the ones affected
    - 7. produce event sequences containing ei, if any
  - 8. ELSE
    - 9. buffer ei
    - 10. insert ei into stack Si (using append semantics)
    - 11. set ei's RIP
    - 12. IF Ei = Em
      - 13. produce event sequences containing ei, if any
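The AIS_CLOCK test can be sketched as follows for a single stack. This assumes AIS_CLOCK is the largest timestamp inserted into the AIS so far (the slide does not define it), and the helper name `insert_with_ais_clock` is hypothetical. When ei.timestamp >= AIS_CLOCK, appending already keeps the stack sorted and no instance in Si+1 is later than ei, so no RIP needs resetting and a complete sequence through ei can only exist when Ei = Em.

```python
import bisect


def insert_with_ais_clock(stack, t, ais_clock):
    """Insert timestamp t into a sorted AIS stack; return (new_ais_clock, fast_path).

    Assumes AIS_CLOCK is the largest timestamp inserted into the AIS so far.
    """
    if t < ais_clock:
        # Out-of-order event: sort semantics; the caller must still set ei's RIP,
        # reset the affected RIPs in S(i+1), and produce sequences (steps 5 to 7).
        bisect.insort(stack, t)
        return ais_clock, False
    # In-order event: t is at least every stored timestamp, so appending keeps
    # the stack sorted and no instance in S(i+1) is later than ei.
    stack.append(t)
    return t, True


s2, clock, fast = [], float("-inf"), []
for t in [6, 11, 8, 14]:  # b8 arrives out of order
    clock, used_fast_path = insert_with_ais_clock(s2, t, clock)
    fast.append(used_fast_path)
print(s2, clock, fast)  # [6, 8, 11, 14] 14 [True, True, False, True]
```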
Solution for PSSC
- Using K-slack
  - We apply K-slack based on time units. It assumes that out-of-order event arrivals are bounded by K time units; that is, an event can be delayed by at most K time units.
(Figure: received order of an out-of-order stream in which d8 arrives late; result shown: a3 b6 d8.)
- Purge condition
  - ei.timestamp + W + K < CLOCK
  - After waiting for K time units, no out-of-order event with a timestamp within ei's window (at most ei.timestamp + W) can still arrive: every event yet to come has a timestamp of at least CLOCK - K, which exceeds ei.timestamp + W. Thus ei can no longer contribute to forming a new candidate event sequence.
- CLOCK
  - Maintained as the largest timestamp seen so far among the received events
PSSC Algorithm with Out-of-Order Handling
- Out-of-order handling incorporated into SSC purge (PSSC)
- Input: (1) current AIS, (2) CLOCK update triggered by SSC
- Output: updated AIS
- 1. On receiving a CLOCK trigger
  - 2. for each event instance e in AIS
    - 3. IF e.timestamp + W + K < CLOCK
      - 4. purge e
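The purge step of PSSC can be sketched as below. `pssc_purge` is a hypothetical name, and each stack is assumed to be a sorted list of timestamps, so the purgeable instances form a prefix found by binary search. Setting K = 0 reduces it to the window-only purge from the earlier example, which loses a3 before the late d8 arrives; K = 7 covers d8's delay of 7 time units (it arrives when CLOCK = 15).

```python
import bisect


def pssc_purge(stacks, clock, window, k):
    """Purge every instance e with e.ts + W + K < CLOCK from each AIS stack.

    Assumes each stack is a sorted list of timestamps (sort semantics),
    so the purgeable instances always form a prefix of the stack.
    """
    for stack in stacks:
        # e.ts + W + K < CLOCK  <=>  e.ts < CLOCK - W - K
        n = bisect.bisect_left(stack, clock - window - k)
        del stack[:n]


# EVENT SEQ(A, B, D) WITHIN 10; CLOCK = 15 after d15 arrives.
no_slack = [[3, 7], [6, 11], [10, 15]]
pssc_purge(no_slack, clock=15, window=10, k=0)
print(no_slack)    # [[7], [6, 11], [10, 15]]: a3 is gone, so a3 b6 d8 is lost

with_slack = [[3, 7], [6, 11], [10, 15]]
pssc_purge(with_slack, clock=15, window=10, k=7)
print(with_slack)  # [[3, 7], [6, 11], [10, 15]]: a3 is kept for the late d8
```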
Optimization 1: AIS Partition
We can divide each stack in the AIS into two parts: outdated event instances (e.timestamp + W <= CLOCK < e.timestamp + W + K) and up-to-date event instances (e.timestamp + W > CLOCK).
(Figure: SEQ(A, B, D) with W = 7 and a large K = 10. A divider separates the outdated and up-to-date instances in each stack; the figure highlights the cost of the SSC output when d13 arrives.)
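Assuming sorted stacks and the partition conditions stated in this slide, the divider can be located with a single binary search. The function name `partition` and the sample timestamps are hypothetical; W = 7 and CLOCK = 13 follow the figure's W7 and d13.

```python
import bisect


def partition(stack, clock, window):
    """Split a sorted stack of timestamps into (outdated, up_to_date).

    Up-to-date instances satisfy e.ts + W > CLOCK; the remaining ones are
    outdated (kept only because of the K slack, until they are purged).
    """
    divider = bisect.bisect_right(stack, clock - window)  # first e.ts > CLOCK - W
    return stack[:divider], stack[divider:]


print(partition([1, 3, 7, 11], clock=13, window=7))  # ([1, 3], [7, 11])
```

Because stacks are sorted, the divider only moves upward as CLOCK advances, so it can be maintained incrementally rather than recomputed.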
Optimization 2: Lazy Purge
For each CLOCK update, only the instances in the last AIS stack are checked for data purge. For any instance purged from there, we can purge instances in the other AIS stacks by following its RIP path.
(Figure: AIS stacks with RIP paths d10 to b6 to a3 and d15 to b11 to a7.)
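One possible reading of the lazy purge, sketched in Python: `lazy_purge` and its data layout (parallel lists of sorted timestamps and RIP indices) are assumptions, and the K-slack condition is applied to the last stack only. Every instance at or below a RIP is older than the instance pointing to it, so anything reached by following the RIP of a purged instance also satisfies e.timestamp + W + K < CLOCK.

```python
def lazy_purge(stacks, rips, clock, window, k):
    """Lazy K-slack purge: check only the last stack, then follow RIP paths.

    stacks[i] holds the sorted timestamps of S_i; rips[i][j] is the RIP of
    stacks[i][j] into S_(i-1) (-1 = none). The data layout is an assumption.
    """
    i = len(stacks) - 1
    cut = 0  # number of expired instances at the bottom of S_i
    while cut < len(stacks[i]) and stacks[i][cut] + window + k < clock:
        cut += 1
    while i >= 0 and cut > 0:
        # Everything at or below the RIP of the newest purged instance is
        # older than it, hence also expired: that prefix of S_(i-1) goes next.
        follow = rips[i][cut - 1] + 1 if i > 0 else 0
        del stacks[i][:cut]
        del rips[i][:cut]
        if i + 1 < len(stacks):
            # RIPs in S_(i+1) index into S_i, which just lost `cut` entries.
            rips[i + 1] = [max(r - cut, -1) for r in rips[i + 1]]
        cut = follow
        i -= 1


stacks = [[3, 7], [6, 11], [10, 15]]  # a3 a7 | b6 b11 | d10 d15
rips = [[-1, -1], [0, 1], [0, 1]]
lazy_purge(stacks, rips, clock=23, window=10, k=2)
print(stacks, rips)  # [[7], [11], [15]] [[-1], [0], [0]]
```

In this example a7 has already expired (7 + 10 + 2 < 23) but stays until a later check reaches it through a RIP path, which is what makes the purge lazy.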
Experiment 1: Sequence Scan and Construction (SSC)
SEQ(A, B, C, D, E, F)
CPU gain from applying AIS_CLOCK
Out-of-order data percentage: 90%
Y axis: cost of inserting events and resetting RIPs
Experiment 2: Applying AIS Partition during the SSC Purge
- Performance gain on memory
- Performance gain on CPU cost
Conclusion
- In this work, we address the problem of processing event streams with out-of-order data arrival
- We analyze the problems that state-of-the-art event stream processing technology would experience when faced with out-of-order data arrival
- We propose new implementation and optimization strategies for the core stream algebra operators
- We conduct an experimental study that clearly demonstrates the effectiveness of our proposed approach over existing solutions
Related Work
- Some initial work uses K-slack to investigate the out-of-order problem for homogeneous-input stream systems
- Aurora deals with out-of-order data at the operator level: order-sensitive operators wait a certain period of time before closing each window
- The Cayuga system deals with out-of-order data by waiting K time units before all processing, which has higher latency than our approach
- Stream punctuation confirms that a certain value or timestamp will no longer appear in future input streams; it requires a certain service to first be created and appropriately associated
Thank you!