Title: Engine Design: Stream Operators Everywhere
1 Engine Design: Stream Operators Everywhere
- Theodore Johnson
- AT&T Labs Research
- johnsont_at_research.att.com
Contributors: Chuck Cranor, Vladislav Shkapenyuk, Oliver Spatscheck
2 Early Data Reduction
- Goal: Query high-speed links using inexpensive off-the-shelf servers.
  - OC48: 2 x 2.4 Gb/sec., 7 million packets/sec.
  - OC192: 2 x 7.2 Gb/sec., 21 million packets/sec.
- Goal: Evaluate queries over every bit of every packet.
- Problem: Not enough cycles in a second.
  - 3 GHz / 21 Mpackets/sec ≈ 142 cycles / packet
- Solution: Push data reduction operators as far down the protocol stack as possible.
  - Into the hardware if possible.
  - View hardware bit twiddling as stream operators.
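The cycle budget above is a single division. A minimal sketch of the arithmetic, using only the clock rate and packet rates quoted on this slide (the function name is illustrative):

```python
# Per-packet cycle budget, using the figures quoted on the slide.
CLOCK_HZ = 3_000_000_000          # 3 GHz processor
OC48_PACKETS_PER_SEC = 7_000_000  # OC48 packet rate from the slide
OC192_PACKETS_PER_SEC = 21_000_000  # OC192 packet rate from the slide


def cycles_per_packet(clock_hz: int, packets_per_sec: int) -> int:
    """Whole CPU cycles available per packet (rounded down)."""
    return clock_hz // packets_per_sec


if __name__ == "__main__":
    print("OC48 :", cycles_per_packet(CLOCK_HZ, OC48_PACKETS_PER_SEC))
    print("OC192:", cycles_per_packet(CLOCK_HZ, OC192_PACKETS_PER_SEC))
```

Every operator that runs on every packet has to fit inside this budget, which is why the cheapest reductions are pushed lowest.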
3 Early Data Reduction in Gigascope
- Gigascope was designed to monitor very high speed (optical) links using complex query sets.
- Multiple levels of data reduction
  - Data reduction in the NIC (depends on NIC capabilities)
    - Snap length (projection)
    - BPF filters
    - Approximate filtering (bitmasks)
    - Data reduction queries (replace the NIC run time system)
  - Low level queries
    - Run queries on kernel input buffers
    - Preliminary filter for the query set
  - Other possibilities ...
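Two of the NIC-level reductions listed above can be sketched in a few lines: snap length as byte truncation, and approximate filtering as a mask/value comparison. This is an illustration only, not Gigascope's interface. The function names are hypothetical, and the buffer is assumed to start at an IPv4 header, so the protocol field is byte 9.

```python
def snap(packet: bytes, snap_len: int) -> bytes:
    """Snap length as projection: keep only the first snap_len bytes."""
    return packet[:snap_len]


def bitmask_match(packet: bytes, offset: int, mask: bytes, value: bytes) -> bool:
    """Approximate filter: compare masked bytes at a fixed offset.

    May accept packets the query does not want (false positives), so an
    exact predicate must still run higher up the stack.
    """
    window = packet[offset:offset + len(mask)]
    if len(window) < len(mask):
        return False  # packet too short to hold the field
    return all((b & m) == v for b, m, v in zip(window, mask, value))


def ipv4_header(protocol: int) -> bytes:
    """Build a minimal 20-byte IPv4 header for the example (assumed layout)."""
    return bytes([0x45, 0, 0, 40, 0, 0, 0, 0, 64, protocol, 0, 0,
                  10, 0, 0, 1, 10, 0, 0, 2])


# One mask/value pair that accepts both TCP (6) and UDP (17):
# it tests only the bits on which 6 and 17 agree.
PROTO_OFFSET = 9
MASK, VALUE = b"\xe8", b"\x00"

if __name__ == "__main__":
    for proto in (6, 17, 1, 47):
        print(proto, bitmask_match(ipv4_header(proto), PROTO_OFFSET, MASK, VALUE))
```

With this mask, ICMP (protocol 1) also passes while GRE (protocol 47) is rejected. The filter is cheap and never drops a wanted packet, but it is not exact.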
4 Example: Router Monitoring
[Figure: layers of the monitoring data path and the data reduction available at each]
- High Level Queries
- Low Level Queries: selection/projection/aggregation, pre-filter
- Kernel: Libpcap / BPF filters, circular buffer
- NIC: snap length (projection), approximate filter (selection), selection/projection/aggregation queries (replace run time system)
- Router: select stream
- Network tap
5 Stream Operators
- Problem: Great heterogeneity in the specifics of manipulating the hardware mechanism
  - Stream selection vs. NIC filters vs. kernel filters, etc.
  - Programmable NIC vs. bit-twiddling NIC vs. non-programmable NIC, etc.
- Solution
  - Define a set of stream operators to evaluate the stream query.
    - Selection, projection, (partial) aggregation
    - Merge, join, reorder?
  - Define hardware capabilities as the types of queries they can execute
  - Multiple query optimization over the query set
    - Low level query nodes feed multiple user queries
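One way to read "hardware capabilities as the types of queries they can execute" is as a placement problem: each layer advertises the operators it supports, and each operator in a plan is pushed to the lowest layer that can run it. The sketch below assumes a linear plan and made-up capability sets. The layer names and the `place` function are hypothetical, not part of Gigascope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    operators: frozenset  # operator types this layer can execute


# Layers listed from lowest (closest to the wire) to highest.
# The capability sets are assumptions chosen for illustration.
STACK = [
    Layer("nic", frozenset({"selection", "projection"})),
    Layer("kernel", frozenset({"selection", "projection"})),
    Layer("low_level_query",
          frozenset({"selection", "projection", "partial_aggregation"})),
    Layer("high_level_query",
          frozenset({"selection", "projection", "partial_aggregation",
                     "aggregation", "merge", "join"})),
]


def place(plan, stack):
    """Assign each operator (in dataflow order) to the lowest usable layer.

    The layer index never decreases, so an operator is never placed
    below an operator that feeds it.
    """
    placement = []
    level = 0
    for op in plan:
        while level < len(stack) and op not in stack[level].operators:
            level += 1
        if level == len(stack):
            raise ValueError(f"no layer can execute {op!r}")
        placement.append((op, stack[level].name))
    return placement


if __name__ == "__main__":
    plan = ["selection", "projection", "partial_aggregation", "aggregation"]
    for op, layer in place(plan, STACK):
        print(f"{op:20s} -> {layer}")
```

Swapping in a non-programmable NIC (an empty operator set) moves the selection and projection up to the kernel layer without changing the plan itself, which is the point of hiding the hardware behind operator types.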
6 Example (network monitoring)
select timestamp, sourceIP, destIP, source_port, dest_port, len, total_length, gp_header
from GAMEPROTOCOL
where sample_hash[50, sourceIP, destIP] and protocol=17 and offset=0
- NIC: snap_len = 134 (projection)
- Pre-filter: protocol=17 and offset=0
- Low-level query:
select timestamp, sourceIP, destIP, source_port, dest_port, len, total_length, gp_header
from GAMEPROTOCOL
where sample_hash[50, sourceIP, destIP] and protocol=17 and offset=0
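The split between the pre-filter and the low-level query can be sketched over packet records held as dictionaries. The pre-filter applies only the cheap equality tests. The low-level query then applies the sampling predicate and projects the select list. The slide does not define the sampling predicate, so `sample_hash` here is a stand-in that assumes "keep roughly the given percentage of (sourceIP, destIP) pairs, chosen by a hash". The CRC32 hash and all function names are assumptions.

```python
import zlib

# Fields named in the select list of the example query.
SELECT_LIST = ["timestamp", "sourceIP", "destIP", "source_port",
               "dest_port", "len", "total_length", "gp_header"]


def sample_hash(percent, *fields):
    """Hypothetical stand-in: keep a key if its hash falls in the lowest percent."""
    key = "|".join(str(f) for f in fields).encode()
    return zlib.crc32(key) % 100 < percent


def prefilter(pkt):
    """Cheap conjuncts pushed below the query: protocol=17 and offset=0."""
    return pkt["protocol"] == 17 and pkt["offset"] == 0


def low_level_query(packets, percent=50):
    """Apply the remaining predicate and project the select list."""
    for pkt in packets:
        if not prefilter(pkt):
            continue  # rejected before any hashing is done
        if not sample_hash(percent, pkt["sourceIP"], pkt["destIP"]):
            continue
        yield {name: pkt[name] for name in SELECT_LIST}
```

Because the hash depends only on the address pair, every packet between the same two hosts gets the same sampling decision, so sampled flows stay complete.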
7 Other Operators?
- Merge: Some NICs deliver packets out of order
- Optical links are not duplex
[Figure: two NICs, each with an In Buffer and an Out Buffer, produce almost ordered streams; Stream Merge combines them on timestamp into an ordered stream.]
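A merge over two almost ordered inputs can be sketched in two steps: first repair each stream with a small reorder buffer, then merge the two ordered streams on timestamp. The sketch assumes tuples of the form (timestamp, payload) and assumes that no tuple arrives more than `slack` positions late. Both the bound and the function names are illustrative.

```python
import heapq


def reorder(stream, slack):
    """Turn an almost ordered stream into an ordered one.

    Holds back at most slack + 1 tuples in a min-heap; correct as long as
    no tuple is more than `slack` positions away from its sorted position.
    """
    heap = []
    for item in stream:
        heapq.heappush(heap, item)
        if len(heap) > slack:
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)


def stream_merge(left, right):
    """Merge two timestamp-ordered streams into one ordered stream."""
    return heapq.merge(left, right)


if __name__ == "__main__":
    nic_a = [(1, "a"), (3, "a"), (2, "a"), (4, "a")]  # one direction of the link
    nic_b = [(2, "b"), (1, "b"), (5, "b")]            # the other direction
    for tup in stream_merge(reorder(nic_a, 1), reorder(nic_b, 1)):
        print(tup)
```

Both functions are generators, so the merge holds only the reorder buffers in memory and can run over unbounded streams.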
8 Summary
- Early data reduction is critical for monitoring very high-speed streams
  - Selection, projection, aggregation.
- Use stream operators to mask the complexity and heterogeneity of hardware / kernel data reduction.
- Issues
  - Multiple query optimization
  - Push more complex operators down the stack?
    - Join? Stratified sampling? Sketches?
  - Optimization at low level / hardware level
    - Approximate filters
    - Avoid duplicate filters. Where to place them?
  - Re-organization when the query set changes.
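One simple way to avoid duplicate filters across a query set is to factor out the conjuncts that every query requires and evaluate them once, low in the stack. The sketch below assumes each query's predicate is a conjunction of (field, value) equality tests. This representation and the function names are illustrative assumptions, not the optimizer described in the talk.

```python
def shared_prefilter(queries):
    """Conjuncts required by every query in the set.

    Safe to evaluate once before any query runs: a packet that fails a
    shared conjunct cannot satisfy any query.
    """
    conjunct_sets = [frozenset(c) for c in queries.values()]
    if not conjunct_sets:
        return frozenset()
    return frozenset.intersection(*conjunct_sets)


def residual_predicates(queries):
    """What each query still has to test after the shared pre-filter."""
    shared = shared_prefilter(queries)
    return {name: frozenset(c) - shared for name, c in queries.items()}


if __name__ == "__main__":
    query_set = {
        "game_traffic": {("protocol", 17), ("offset", 0)},
        "dns_traffic": {("protocol", 17), ("dest_port", 53)},
    }
    print(sorted(shared_prefilter(query_set)))
    print({k: sorted(v) for k, v in residual_predicates(query_set).items()})
```

When the query set changes, the shared set has to be recomputed, which is one concrete form of the re-organization issue listed above.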