Title: Operator Scheduling in a Data Stream Manager
1. Operator Scheduling in a Data Stream Manager
- D. Carney, U. Çetintemel, A. Rasin, S. Zdonik - Brown University
- M. Cherniack - Brandeis University
- M. Stonebraker - MIT
- Proceedings of the 29th VLDB Conference, Berlin, Germany
Presenter: Sriram Krishnan. Date: 3/30/05
2. Agenda
- Aurora DSMS Architecture
- Scheduling Algorithms
- Tuple Batching
- Experimental Evaluation
- QoS Aware Scheduling
- Conclusion
3. Overview of Stream Processing
- Many applications and devices create data streams.
- Examples: sensor networks, position tracking, network management, health monitoring, etc.
- These applications require timely processing of a large number of continuous, potentially rapid and asynchronous data streams.
4. Aurora data stream manager
- Addresses the performance and processing requirements of stream-based applications.
- Supports multiple concurrent continuous queries on one or more application data streams.
- A continuous query consists of a directed acyclic graph of a well-defined set of operators (boxes in Aurora).
- Applications define their service expectations using Quality-of-Service (QoS) specifications.
5. Operator Scheduler
- A key component of any data stream management system.
- Multiplexes processor usage across multiple continuous queries according to application-specified QoS.
- Simple processor allocation can be achieved by assigning a thread per operator.
- Not good (why?)
6. Paper overview
- This paper shows that finer-grained control of processor allocation can make a significant difference to overall system performance.
- The paper describes the design and implementation of the Aurora scheduler.
7. Motivation: Cost components of continuous query
- Random and round-robin scheduling.
- Inference?
- The actual time spent on processing is smaller than 5% of the overall execution time in both cases.
8. Aurora scheduler
- Performs the following tasks:
- Constructs a dynamic scheduling plan that specifies:
- Which boxes to schedule
- In which order to schedule the boxes
- How many tuples to process at each box execution
- Schedules based on the QoS
- Strives to maximize the overall QoS delivered to the client applications
9. Aurora System Model (High Level)
- Fundamentally a data-flow system.
- Tuples flow through a loop-free, directed graph
of processing operations (a.k.a. boxes).
10. Aurora System Model
- Tuples generated by data sources arrive at the input and are queued for processing.
- The scheduler selects boxes with waiting tuples and executes them on one or more of their input tuples.
- The output tuples of a box are queued at the input of the next box in sequence.
- The QoS is specified primarily in terms of the latency (i.e., delay) of output tuples.
- Output tuples should be produced in a timely fashion; otherwise, QoS degrades as latencies get longer.
11. Aurora Architecture
The box processor executes the appropriate operation and then forwards the output tuples to the router.
Question: Why should we monitor QoS?
- Conceptually, the scheduler:
- Picks a box for execution.
- Ascertains how many tuples to process from its input.
- Passes the information to the multi-threaded box processor.
12. Execution Model
- Thread-based execution
- Each operator/query is processed in its own thread.
- The operating system manages resource allocation.
- Advantages
- Easy to program
- Efficient operating system algorithms
- Disadvantages
- Overhead due to cache misses, lock contention and context switching.
- Software has limited control of resource management.
13. Aurora - Execution Model
- Aurora uses a state-based scheduling execution model.
- There is a single scheduler thread that tracks system state and maintains the execution queue.
- The execution queue is shared among a small number of worker threads.
- This model:
- Enables fine-grained allocation of resources according to application specifications.
- Enables effective batching of operators and tuples. (Why is this not possible with the thread-based model?)
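The state-based model above can be sketched in a few lines. This is only an illustration under stated assumptions, not Aurora's implementation: the function `run_state_based`, the work-item format `(box_name, tuples)`, and the doubling "operator" are all hypothetical, and the calling thread plays the role of the single scheduler thread.

```python
import queue
import threading


def run_state_based(plan, num_workers=2):
    """Run a scheduling plan with one scheduler and a small worker pool.

    `plan` is a list of (box_name, tuples) work items (hypothetical format).
    The calling thread acts as the scheduler: it alone decides what goes
    on the shared execution queue and in which order.
    """
    execution_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = execution_queue.get()
            if item is None:  # sentinel: the scheduler has no more work
                break
            box_name, tuples = item
            # Stand-in for the box's real operator: process the whole train.
            output = [t * 2 for t in tuples]
            with results_lock:
                results.append((box_name, output))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in plan:  # scheduler enqueues boxes together with their trains
        execution_queue.put(item)
    for _ in workers:  # one sentinel per worker
        execution_queue.put(None)
    for w in workers:
        w.join()
    return results


done = run_state_based([("b4", [1, 2]), ("b5", [3]), ("b3", [4, 5, 6])])
print(sorted(done))
```

The number of threads stays fixed no matter how many boxes exist, and each work item carries a whole batch of tuples, which is what makes batching a scheduler decision rather than an operating-system one.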
14. Execution Model - Comparison
- As system workload increases, performance degrades almost linearly in Aurora and exponentially in thread-per-box.
- What does it mean?
15. Two-Level Scheduling
- The first level decides which continuous (sub-)query to process.
- Used for dynamically assigning priorities to operators.
- The second level decides how precisely the selected query should be processed.
- Used for choosing the order in which the component operators will be executed.
- The outcome of these decisions is a sequence of operators, referred to as a scheduling plan.
16. Sample Query Tree
- The tree is rooted at box b1 (Aurora constraint)
- We will refer to this tree in subsequent slides
17. Superbox - Operator Batching
- A tree of boxes rooted at an output box.
- A sequence of boxes that is scheduled as an atomic group.
- Superboxes decrease the overall execution costs and improve scalability.
- They reduce the scheduling overhead by scheduling multiple boxes as a single unit.
- They eliminate the need to access the storage manager for each individual box execution.
18. Scheduling
- First-level scheduling: Superbox selection
- Static and dynamic scheduling approaches
- Static approaches to scheduling are defined prior to runtime.
- Aurora implements a static superbox selection, application-at-a-time: one superbox per query.
- Dynamic approaches use runtime information and statistics to adjust and prioritize scheduling order.
- Second-level scheduling: Superbox traversal
- Specifies the ordering of the boxes in the scheduling plan.
- Accomplished by traversing the superbox.
19. Superbox Traversal
- Superbox traversal refers to how the operators within a superbox should be executed.
- Three traversal algorithms:
- Min-Cost (MC)
- Min-Latency (ML)
- Min-Memory (MM)
20. Superbox Traversal: Min Cost
- Min-Cost (MC): Attempts to optimize throughput by minimizing the number of box calls per output tuple.
- Accomplished by traversing the superbox in post-order.
- A box is scheduled for execution only after all the boxes in its sub-tree are scheduled.
21. Superbox Traversal: Min Cost (Contd.)
- Assume each box has:
- A processing cost per tuple of p
- A box call overhead of o
- A selectivity equal to one (what is this?)
- Exactly one non-empty input queue that contains a single tuple.
- MC traversal executes each box only once.
- In which order are the boxes traversed?
- b4 → b5 → b3 → b2 → b6 → b1
- Execution cost: 15p + 6o (why?)
- Average output tuple latency: 12.5p + 6o
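The two figures above can be checked with a short simulation. This is a sketch under stated assumptions: the tree shape (b4 and b3 feed b2, b5 feeds b3, b2 and b6 feed b1) is inferred from the post-order listing because the figure is not reproduced here, all tuples are assumed to be queued at time zero, and the name `run_plan` is illustrative. Costs are tracked as coefficients of p and o.

```python
from fractions import Fraction

# Assumed tree shape (child -> parent), inferred from the post-order traversal.
PARENT = {"b4": "b2", "b5": "b3", "b3": "b2", "b2": "b1", "b6": "b1", "b1": None}


def run_plan(plan):
    """Execute boxes in plan order; each call drains the box's whole queue.

    Returns (total_p, total_o, avg_latency_p, avg_latency_o): the
    coefficients of p and o. Selectivity is 1 and every queue starts
    with a single tuple, as assumed on the slide.
    """
    queue = {box: 1 for box in PARENT}
    p_used = o_used = 0
    latencies = []  # (p, o) coefficients when each output tuple appears
    for box in plan:
        n, queue[box] = queue[box], 0
        o_used += 1  # one box call overhead per execution
        for _ in range(n):
            p_used += 1  # process one tuple
            if PARENT[box] is None:  # root box: the tuple leaves the system
                latencies.append((p_used, o_used))
            else:
                queue[PARENT[box]] += 1
    count = len(latencies)
    avg_p = Fraction(sum(lat[0] for lat in latencies), count)
    avg_o = Fraction(sum(lat[1] for lat in latencies), count)
    return p_used, o_used, avg_p, avg_o


mc_plan = ["b4", "b5", "b3", "b2", "b6", "b1"]
total_p, total_o, lat_p, lat_o = run_plan(mc_plan)
print(f"cost = {total_p}p + {total_o}o")
print(f"avg latency = {float(lat_p)}p + {float(lat_o)}o")
```

Under these assumptions, b1 is called last with six queued tuples, so its outputs appear after 10p, 11p, ..., 15p of processing, which averages to 12.5p.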
22. Superbox Traversal: Min Latency
- Min-Latency (ML): Average latency of the output tuples can be reduced by producing initial output tuples as fast as possible.
- Defines a value called output cost for each box.
- An estimate of the latency incurred in producing one output tuple.
- Output selectivity
- How many tuples must be processed from the input to produce 1 tuple at the output.
- Product of the selectivities of all boxes downstream, including the current box.
- Relation between output selectivity and output cost?
- Approximately inversely proportional (depends on the cost of the boxes involved).
23. Superbox Traversal: Min Latency
- Traversal?
- b1 → b2 → b1 → b6 → b1 → b4 → b2 → b1 → b3 → b2 → b1 → b5 → b3 → b2 → b1
- The ML traversal incurs nine extra box calls over an MC traversal.
- Note: MC incurred six box calls.
- Total execution cost is 15p + 15o.
- Which one has lower execution time, ML or MC?
- MC always has a lower execution time.
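The box-call count and the latency benefit of this traversal can be checked by replaying the plan. As before, this is a sketch: the tree shape is an assumption inferred from the traversal order, every queue is assumed to start with one tuple at time zero, and `simulate` is an illustrative name.

```python
# Assumed tree shape (child -> parent), inferred from the traversal order.
PARENT = {"b4": "b2", "b5": "b3", "b3": "b2", "b2": "b1", "b6": "b1", "b1": None}

ml_plan = ["b1", "b2", "b1", "b6", "b1", "b4", "b2", "b1",
           "b3", "b2", "b1", "b5", "b3", "b2", "b1"]


def simulate(plan):
    """Replay a plan with selectivity 1 and one initial tuple per queue.

    Returns (tuples_processed, output_at_call), where output_at_call lists,
    for each output tuple, the number of box calls made when it appeared.
    """
    queue = {box: 1 for box in PARENT}
    tuples_processed = 0
    output_at_call = []
    for call_no, box in enumerate(plan, start=1):
        n, queue[box] = queue[box], 0
        tuples_processed += n
        if PARENT[box] is None:  # root box emits output tuples
            output_at_call.extend([call_no] * n)
        else:
            queue[PARENT[box]] += n
    return tuples_processed, output_at_call


processed, output_at_call = simulate(ml_plan)
box_calls = len(ml_plan)
print(f"cost = {processed}p + {box_calls}o, extra calls over MC = {box_calls - 6}")
# Each ML call handles exactly one tuple, so a tuple emitted at call k
# has latency k * (p + o).
avg_calls = sum(output_at_call) / len(output_at_call)
print(f"avg output latency = {avg_calls:.2f} * (p + o)")
```

Under these assumptions the outputs appear after 1, 3, 5, 8, 11 and 15 calls, so the average latency is about 7.17(p + o). ML pays nine more call overheads in total but delivers its first outputs much earlier than MC.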
24. Superbox Traversal: Min Memory
- Min-Memory (MM): Attempts to minimize memory usage.
- Schedules boxes in an order that yields the maximum increase in available memory.
- Defines an expected memory reduction rate (EMRR) for each box.
- EMRR = function(current queue size, selectivity, cost)
25. Superbox Traversal: Min Memory
- Assume the following box (selectivity, cost) values:
- b1 (0.9, 2), b2 (0.4, 2), b3 (0.5, 1), b4 (1.0, 2), b5 (0.4, 3), b6 (0.6, 1)
- Assuming an initial queue size of 1
- Computed EMRR values for the boxes are:
- b1 = 0.05, b2 = 0.3, b3 = 0.5, b4 = 0, b5 = 0.2, b6 = 0.4
- What will be the scheduling plan?
- b3 → b6 → b2 → b5 → b3 → b2 → b1 → b4 → b2 → b1
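The listed EMRR values and the plan can be reproduced with a small greedy simulation. Several things here are assumptions rather than statements from the slides: the formula (1 - selectivity) / cost with a unit tuple size, which happens to match all six listed values; the tree shape (b4 and b3 feed b2, b5 feeds b3, b2 and b6 feed b1); and the rule "always run the non-empty box with the highest EMRR and drain its queue". Queue sizes are expected values, so they can be fractional.

```python
# (selectivity, cost) per box, as listed on the slide.
BOXES = {"b1": (0.9, 2), "b2": (0.4, 2), "b3": (0.5, 1),
         "b4": (1.0, 2), "b5": (0.4, 3), "b6": (0.6, 1)}
# Assumed tree shape (child -> parent); the figure is not reproduced in the text.
PARENT = {"b4": "b2", "b5": "b3", "b3": "b2", "b2": "b1", "b6": "b1", "b1": None}


def emrr(selectivity, cost, tuple_size=1.0):
    """Assumed formula: memory released per unit of processing time."""
    return tuple_size * (1.0 - selectivity) / cost


def min_memory_plan():
    """Greedy sketch: repeatedly drain the non-empty box with the highest EMRR."""
    rate = {box: emrr(sel, cost) for box, (sel, cost) in BOXES.items()}
    queue = {box: 1.0 for box in BOXES}  # initial queue size of 1
    plan = []
    while any(size > 0 for size in queue.values()):
        ready = [box for box in BOXES if queue[box] > 0]
        box = max(ready, key=lambda b: rate[b])
        produced = queue[box] * BOXES[box][0]  # expected number of output tuples
        queue[box] = 0.0
        if PARENT[box] is not None:
            queue[PARENT[box]] += produced
        plan.append(box)
    return rate, plan


rate, plan = min_memory_plan()
print({box: round(r, 2) for box, r in rate.items()})
print(" -> ".join(plan))
```

Boxes such as b3 reappear in the plan because an upstream box (b5) refills their queue after they have already run once.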
26. Tuple Batching (Train Processing)
- Train processing is the execution of a batch of tuples (a tuple train) within a single operator call.
- The goal of tuple train processing is to reduce overall processing cost. How?
- Decreased number of total box calls.
- Cuts down on low-level overhead such as context switching, scheduling, and execution queue maintenance.
- Improves memory utilization (low memory).
- Reduces the shuttling of tuples back and forth between memory and disk.
- Some operators execute faster with a larger number of tuples available in their queues.
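The effect of train size on the number of box calls can be illustrated with the simple cost model from the Min-Cost slide (processing cost p per tuple, overhead o per box call). This is a back-of-the-envelope sketch; the function name and the sample numbers are made up for illustration and ignore memory and disk effects.

```python
import math


def drain_cost(queued_tuples, train_size, p, o):
    """Cost of draining one box's queue in trains of `train_size` tuples.

    Every tuple costs p to process; every box call adds overhead o.
    """
    box_calls = math.ceil(queued_tuples / train_size)
    return queued_tuples * p + box_calls * o


# Hypothetical numbers: 100 queued tuples, p = 1, o = 5.
for size in (1, 10, 100):
    print(size, drain_cost(100, size, p=1, o=5))
```

With these numbers, tuple-at-a-time execution costs 600 units, of which 500 are call overhead, while a single train of 100 tuples costs 105.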
27. Tuple Batching
- The Aurora scheduler implements train processing by telling each box when to execute and how many queued tuples to process.
- Aurora allows an arbitrary number of tuples to be contained within a train.
- What variables dictate the size of a train?
- Variance in latencies
- Total memory footprint
28. Operator Batching Evaluation
Capacity: percent of system resources used.
- RR_BAAT - Round Robin - Box At A Time.
- MC_AAAT - Minimum Cost - Application At A Time.
- What can we infer?
- The scheduling overhead of the box-at-a-time approach is very evident.
29. Latency: Min-Cost vs. Min-Latency
- What can we infer?
- For larger processing costs, ML wins as it optimizes the traversal by minimizing output latency.
- For smaller box processing costs, box call overheads dominate overall costs and MC wins.
30. Memory Requirements: Evaluation
The curves are normalized with respect to the MM values.
- Inference?
- ML is the least memory-efficient traversal, with MC second.
- The crossover towards the end of the time period is a consequence of the fact that different traversals take different times to finish.
31. Tuple Batching - Evaluation
Train size (x-axis) is given as a percentage of the queue size.
Overhead: total execution time less processing time.
In order to isolate the effects of operator scheduling, round-robin BAAT was used for this experiment.
- Inference?
- For a burst size of 4, the overhead quadruples.
- When the train size is equal to one (the entire queue), the average overhead approaches the overhead for the non-bursty case.
32. Comparison of Execution Times
- TAAT (tuple-at-a-time)
- BAAT (tuple trains)
- MC (Superbox)
- The number at the top shows the actual time for processing 100k tuples in the system.
- TAAT is significantly worse than the other methods.
- Superbox scheduling decreases the overall execution time of the system running tuple trains by almost 50%.
- As we go from left to right, the scheduler algorithms become increasingly intelligent and sophisticated, taking more time to generate the scheduling plans.
33. QoS-Driven Scheduling
- Keep track of the latency of tuples that reside in the queues.
- Pick the tuples whose execution will provide the highest expected increase in the aggregate QoS.
- This approach is not scalable (Why?)
- Tuple batching will be difficult.
- High scheduling overhead.
- The Aurora scheduler maintains latency information at the granularity of individual boxes.
- The latency of a box is the average of the latencies of the tuples in its input queue.
34. QoS-Driven Scheduling - Algorithm
Expected output latency: eol(b) = latency(b) + cost(D(b))
Utility: utility(b) = gradient(eol(b))
Expected slack time, est(b), is an indication of how close a box is to a critical point.
Critical point: a point where the QoS changes sharply.
- Priority is assigned so as to order the boxes in terms of their utility and urgency.
- Utility: This value is an estimate of where a box's tuples currently are on the QoS-latency curve at the corresponding output. When is the utility lowest and highest?
- Urgency: Given by the expected slack time.
35. QoS-Driven Scheduling
- Scheduler algorithm:
- First choose for execution those boxes that have the highest utility.
- Among those that have the same utility, choose the ones that have the minimum slack time.
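The two-step rule amounts to a sort with a compound key: utility in descending order, then slack time in ascending order. The sketch below assumes utility and slack time have already been computed per box; the `Box` record, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    utility: float     # assumed precomputed from the QoS-latency curve
    slack_time: float  # assumed precomputed expected slack time, est(b)


def prioritize(boxes):
    """Order boxes by highest utility first, breaking ties by least slack."""
    return sorted(boxes, key=lambda b: (-b.utility, b.slack_time))


# Hypothetical values for illustration only.
candidates = [
    Box("b2", utility=0.8, slack_time=40.0),
    Box("b5", utility=0.3, slack_time=5.0),
    Box("b6", utility=0.8, slack_time=10.0),
]
print([b.name for b in prioritize(candidates)])
```

Here b6 and b2 share the highest utility, so b6 runs first because it is closer to its critical point; b5 runs last despite having the least slack, since utility takes precedence.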
36. QoS-Driven Scheduling - Evaluation
37. CONCLUSION
- Presents an experimental investigation of scheduling algorithms for stream data management systems.
- The authors:
- showed that a naïve scheduling approach of using a thread per box does not scale.
- showed that train scheduling and superbox scheduling substantially reduce system overheads.
- addressed QoS issues and extended their basic algorithms to address application-specific QoS expectations.