Title: Query Processing and Networking Infrastructures
1. Query Processing and Networking Infrastructures
- Day 2 of 2
- Joe Hellerstein
- UC Berkeley
- September 27, 2002
2. Outline
- Day 1: Query Processing Crash Course
- Intro
- Queries as indirection
- How do relational databases run queries?
- How do search engines run queries?
- Scaling up: cluster parallelism and distribution
- Day 2: Research Synergies w/ Networking
- Queries as indirection, revisited
- Useful (?) analogies to networking research
- Some of our recent research at the seams
- Some of your research?
- Directions and collective discussion
3. Indirections
4. Standard Spatial Indirection
- Allows the referent to move without changes to referrers
- Doesn't matter where the object is, we find it.
- Alternative: copying
- Works if updates are managed carefully, or don't exist
5. Temporal Indirection
- Asynchronous communication is indirection in time
- Doesn't matter when the object arrives, you find it
- Analogy to space:
- Sender ↔ referrer
- Recipient ↔ referent
6. Generalizing
- Indirection in Space
- x-to-one or x-to-many?
- Physical or logical mapping?
- Indirection in Time
- Persistence model: storage or re-xmission
- Persistence role: sender or receiver
7. Indirection in Space, Redux
- One-to-one, one-to-many, many-to-many?
- Standard relational issue
- E.g. a virtual address is many-to-one
- E.g. an email distribution list is one-to-many
- Physical or logical?
- Physical: a mapping table
- E.g. page tables, mailing lists, DNS, multicast group lists
- Logical
- E.g. queries, subscriptions, interests
8. Indirection in Time, Redux
- Persistence model: storage or re-xmission
- Storage: e.g. DB, heap, stack, NW buffer, mail queue
- Re-xmission: e.g. polling, retries
- "Joe is so persistent"
- Persistence of put or get?
- Put: e.g. DB insert, email, retry
- Get: e.g. subscription, polling
9. Examples: Storage Systems
- Virtual Memory System
- Space: 1-to-1, physical
- Time: synchronous (no indirection)
- Database System
- Space: many-to-many, logical
- Time: synchronous (no indirection)
- Broadcast Disks
- Space: 1-to-1
- Time: re-xmitted put
10. Examples: Split-Phase APIs
- Polling
- Space: no indirection
- Time: re-xmitted get
- Callbacks (contrast with polling sketched below)
- Space: no indirection
- Time: stored get
- Active Messages
- Space: no indirection
- Time: stored get
- App stores a get with the putter, which tags it on messages
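Since the split-phase distinction is easy to miss in the abstract, here is a minimal Python sketch (illustrative, not from the talk) of one mailbox accessed both ways: polling as a re-transmitted get, and a callback as a get stored with the putter's side.

```python
# Illustrative sketch: polling ("re-xmitted get") vs. callback ("stored get")
# against the same mailbox. All names here are invented for illustration.
import time

class Mailbox:
    def __init__(self):
        self.value = None
        self.callbacks = []          # stored gets: the getter persists here

    def put(self, v):
        self.value = v
        for cb in self.callbacks:    # the putter finds the stored gets
            cb(v)

    def poll(self):
        return self.value            # getter must re-issue this until non-None

# Re-transmitted get: the getter persists by retrying.
def poll_until_ready(box, interval=0.01):
    while box.poll() is None:
        time.sleep(interval)
    return box.poll()

box = Mailbox()
# Stored get: the getter persists as state held on the putter's side.
box.callbacks.append(lambda v: print("callback saw", v))
box.put(42)                          # triggers the stored get immediately
```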
11. Examples: Communication
- Email
- Space: one-to-many, physical
- Mapping is one-to-many, delivery is one-to-one (copies)
- Time: stored put
- Multicast
- Space: one-to-many, physical
- Both mapping and delivery are one-to-many
- Time: roughly synchronous?
12. Examples: Distributed APIs
- RPC
- Space: 1-to-1, physical
- Can be 1-to-many
- Time: synchronous (no indirection)
- Messaging systems
- Space: 1-to-1, physical
- Often 1-to-many
- Time: depends!
- Transactional messaging is stored put
- Exactly-once transmission guaranteed
- Other schemes are re-xmitted put
- At-least-once transmission; idempotency of messages becomes important! (sketched below)
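A small hedged sketch of why idempotency matters under at-least-once (re-xmitted put) delivery; the message ids and the counter effect are invented for illustration.

```python
# Illustrative sketch: the receiver remembers message ids so that
# duplicates produced by sender retries have no additional effect.

class IdempotentReceiver:
    def __init__(self):
        self.seen = set()
        self.total = 0

    def deliver(self, msg_id, amount):
        if msg_id in self.seen:          # duplicate from a retry: drop it
            return
        self.seen.add(msg_id)
        self.total += amount             # effect applied exactly once

rx = IdempotentReceiver()
for msg in [(1, 10), (2, 5), (1, 10)]:   # the sender retried message 1
    rx.deliver(*msg)
print(rx.total)                           # 15, not 25
```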
13. Examples: Logic-Based APIs
- Publish-Subscribe
- Space: one-to-many, logical
- Time: stored get (receiver persists)
- Tuplespaces
- Space: one-to-many, logical
- Time: stored put (sender persists); contrast sketched below
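To make the pub/sub vs. tuplespace contrast concrete, here is an illustrative Python sketch (class and method names are mine; `out`/`inp` loosely follow Linda): in pub/sub the subscription persists and messages are transient, while in a tuplespace the tuple persists and readers are transient.

```python
# Hedged sketch of the time-indirection difference per the taxonomy above.

class PubSub:
    """Receiver persists: subscriptions (stored gets) outlive any message."""
    def __init__(self):
        self.subs = []                       # (predicate, handler) pairs

    def subscribe(self, pred, handler):
        self.subs.append((pred, handler))    # the stored get

    def publish(self, msg):                  # the message itself is NOT stored
        for pred, handler in self.subs:
            if pred(msg):
                handler(msg)

class TupleSpace:
    """Sender persists: tuples (stored puts) outlive any reader."""
    def __init__(self):
        self.tuples = []

    def out(self, t):
        self.tuples.append(t)                # the stored put

    def inp(self, pred):                     # non-blocking take of a match
        for t in self.tuples:
            if pred(t):
                self.tuples.remove(t)
                return t
        return None

ps = PubSub()
ps.subscribe(lambda m: m["topic"] == "cpu", print)
ps.publish({"topic": "cpu", "load": 0.9})    # delivered now; then gone

ts = TupleSpace()
ts.out(("cpu", 0.9))                         # persists until someone takes it
print(ts.inp(lambda t: t[0] == "cpu"))
```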
14. Indirection Summary
- 2 binary indirection variables for space, 2 for time
- Can have indirection in one without the other
- Leads to 24 indirection options
- 16 joint space/time indirections, 4 space-only, 4 time-only
- And few lessons about the tradeoffs!
- Note: issues here in both performance and SW engineering
- E.g. are tuplespaces better than pub/sub?
- Not a unidimensional question!
15. Rendezvous
- Indirection on both the sender and receiver side
- In time and/or space on each side
- Most general case: neither sender nor receiver knows where or when the rendezvous will happen!
- Each chases a reference for "where"
- Each must persist for "when"
16. Join as Rendezvous
- Recall the pipelining hash join (sketched below)
- Combine all blue and gray tuples that match
- A batch rendezvous
- In space: the data items were not stored in a fixed location; they are copied into the hash tables
- In time: both sides do put-persist in the join algorithm, via storage
- A hint of things to come:
- In parallel DBs, the hash table is content-addressed (via the exchange routing function)
- What if the hash table is distributed?
- If a tuple in the join is doing a get, is there a distinction between sender and recipient? Between query and data?
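For readers who missed Day 1, here is a minimal Python sketch of the pipelining (symmetric) hash join as a rendezvous: every arriving tuple is stored on its own side (the put-persist) and probes the other side's table (the get), so matching pairs meet regardless of arrival order. The stream encoding and key functions are illustrative.

```python
# Hedged sketch of a symmetric hash join over an interleaved stream.
from collections import defaultdict

def symmetric_hash_join(stream, key_left, key_right):
    """stream yields ('L', tuple) or ('R', tuple) in any interleaving."""
    tables = {'L': defaultdict(list), 'R': defaultdict(list)}
    keys   = {'L': key_left, 'R': key_right}
    for side, tup in stream:
        other = 'R' if side == 'L' else 'L'
        k = keys[side](tup)
        tables[side][k].append(tup)          # persist: the "put" half
        for match in tables[other][k]:       # probe: the "get" half
            yield (tup, match) if side == 'L' else (match, tup)

# Example: join on the first field, with interleaved arrivals.
events = [('L', (1, 'a')), ('R', (2, 'x')), ('R', (1, 'y')), ('L', (2, 'b'))]
for pair in symmetric_hash_join(iter(events), lambda t: t[0], lambda t: t[0]):
    print(pair)
```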
17. Some Resonances
- We said that query systems are an indirection mechanism
- Logical, many-to-many, but synchronous
- Query-response
- And some dataflow techniques inside query engines seem to provide useful indirection mechanisms
- If we add a network into the picture, life gets very interesting
- Indirection in space: very useful
- Indirection in time: critical
- Rendezvous is a basic operation
18. More Resonance
19. More Interaction: CS262 Experiment w/ Eric Brewer
- Merged the OS and DBMS grad classes over a year
- Eric/Joe, point/counterpoint
- Some tie-ins were obvious:
- Memory mgmt, storage, scheduling, concurrency
- Surprising: QP and networks go well side by side
- E.g. eddies and TCP congestion control
- Both use back-pressure and simple control theory to learn in an unpredictable dataflow environment
20. Scout
- Paths: the key to a comm-centric OS
- "Making Paths Explicit in the Scout Operating System", David Mosberger and Larry L. Peterson. OSDI '96.
[Figure 3: Example Router Graph]
21. Click
- A NW router is a query plan!
- With a twist: flow-based context
- An opportunity for autonomous query optimization
22. Revisiting a NW Classic with DB Goggles
23. Clark & Tennenhouse, SIGCOMM '90
- "Architectural Considerations for a New Generation of Protocols"
- Love it for two reasons:
- Tries to capture the essence of what networks do
- Great for people who need the 10,000-foot view!
- I'm a fan of doing this (witness last week)
- Tries to move the community up the food chain
- Resonances everywhere!!
24. C&T Overview (for amateurs like me)
- Core function of protocols: data xfer
- Data manipulation
- Buffer, checksum, encryption, xfer to/from app space, presentation
- Transfer control
- Flow/congestion ctl, detecting transmission problems, acks, muxing, timestamps, framing
25. C&T's Wacky Ideas
- Thesis: nets are good at transfer control, not so good at data manipulation
- Some C&T wacky ideas for better data manipulation:
- Xfer semantic units, not packets (ALF)
- Auto-rewrite layers to flatten them (ILP)
- Minimize cross-layer ordering constraints
- Control delivery in parallel via packet content
26. DB People Should Be Experts!
- BUT remember the basic Internet assumption: "a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations" (Van Jacobson)
- Spoils the whole optimize-then-execute architecture of query optimization
- What happens when d(environment)/dt < query length??
- What about the competing conversations?
- How do we handle the unknown topology?
- What about partial failure?
- Ideally, we'd like:
- the semantics and optimization of DB dataflow
- with the agility and efficiency of NW dataflow
27. The Cosmic Convergence
[Diagram: DATABASE RESEARCH (data models, query optimization, data scalability) converging with NETWORKING RESEARCH (adaptivity, federated control, geographic scalability), meeting in the middle at: adaptive query processing; continuous queries and streams; P2P query engines; sensor query engines; XML routing; router toolkits; content addressing and DHTs; directed diffusion]
28. What Does the QP Perspective Add?
- In terms of high-level languages?
- In terms of a reusable set of operators?
- In terms of optimization opportunities?
- In terms of batch-I/O tricks?
- In terms of approximate answers?
- A safe route to Active Networks?
- Not computationally complete
- Optimizable and reconfigurable -- data independence applies
- Fun to be had here!
- Addressing a few fronts at Berkeley
29. Some of Our Work at the Seams
- Starting with a centralized engine for remote data sets and streams:
- Telegraph: eddies, SteMs, FLuX
- Deep Web, filesharing systems, sensor streams
- More recently, querying sensor networks:
- TinyDB/TAG: in-network queries
- And DHT-based overlay networks:
- PIER
30. Telegraph Overview
31. Telegraph: An Adaptive Dataflow System
- Themes: adaptivity and sharing
- Adaptivity encapsulated in operators:
- Eddies for order of operations
- State Modules (SteMs) for transient state
- FLuX for parallel load balance and availability
- Work- and state-sharing across flows
- Unlike traditional relational schemes, try to share physical structures
- Franklin, Hellerstein, Hong and students (to follow)
32. Telegraph Architecture
[Diagram: request parsing and metadata (SQL, XML, explicit dataflows; catalog) feeds online query processing modules (Join, Select, Project, Group/Aggregate, Transitive Closure, DupElim) and adaptive routing and optimization operators (Juggle, Eddy, FLuX, SteM), tied together by inter-module communication and scheduling (Fjords), over ingress adapters (File Reader, Sensor Proxy, P2P Proxy, TeSS)]
33. Continuous Adaptivity: Eddies
- A little more state per tuple:
- Ready/done bits (extensible a la Volcano/Starburst)
- Minimal state in the Eddy itself:
- Queue, plus parameters being learned
- Decisions: which tuple in the queue goes to which operator
- Query processing = dataflow routing!! (toy sketch below)
- Ron Avnur
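A toy rendering of the idea, assuming a conjunction of filter operators: each tuple carries its own done bits, and the eddy routes it to any not-yet-done operator. The win-count weighting below is a simplified stand-in for the lottery/back-pressure policy of the actual eddies work; all names are illustrative.

```python
# Hedged sketch of an eddy routing tuples among filter operators.
import random

class Eddy:
    def __init__(self, ops):                 # ops: list of predicate functions
        self.ops = ops
        self.tickets = [1] * len(ops)        # learned routing weights

    def run(self, tuples):
        for t in tuples:
            done = [False] * len(self.ops)
            alive = True
            while alive and not all(done):
                ready = [i for i, d in enumerate(done) if not d]
                weights = [self.tickets[i] for i in ready]
                i = random.choices(ready, weights)[0]   # routing decision
                if self.ops[i](t):
                    done[i] = True           # operator passed the tuple back
                else:
                    alive = False            # tuple filtered out
                    self.tickets[i] += 1     # reward selective operators
            if alive:
                yield t                      # all done bits set: emit

eddy = Eddy([lambda t: t % 2 == 0, lambda t: t > 10])
print(list(eddy.run(range(25))))             # even numbers above 10
```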
34. Two Key Observations
- Break the set-oriented boundary
- Usual DB model: algebra expressions, (R ⋈ S) ⋈ T
- Common DB implementation: pipelining operators!
- Subexpressions needn't be materialized
- Typical implementation is more flexible than the algebra
- We can reorder in-flight operators
- Don't rewrite the graph; impose a router
- Graph edge = absence of a routing constraint
- Observe operator consumption/production rates
- Consumption ~ cost. Production ~ cost × selectivity
- Could break these down per values of tuples
- So fun!
- Simple, incremental, general
- Brings all of query optimization online
- And hence a bridge to ML, Control Theory, Queuing Theory
35. State Modules (SteMs)
- Goal: further adaptivity through competition
- Multiple mirrored sources (AMs)
- Handle rate changes, failures, parallelism
- Multiple alternate operators
- Join = Routing + State
- The SteM operator manages the tradeoffs
- State Module: unifies caches, rendezvous buffers, join state
- Competitive sources/operators share building/probing of SteMs
- Join algorithm hybridization!
- Eddies + SteMs tackle the full (single-site) query optimization problem online
- Vijayshankar Raman, Amol Deshpande
36. FLuX: Routing Across a Cluster
- Fault-tolerant, Load-balancing eXchange
- Continuous/long-running flows need high availability
- Big flows need parallelism
- Adaptive load balancing required
- The FLuX operator: Exchange plus... (partitioning sketched below)
- Adaptive flow partitioning (River)
- Transient state replication and migration
- Replication and checkpointing for SteMs
- Note: set-based, not sequence-based!
- Needs to be extensible to different ops:
- Content-sensitivity
- History-sensitivity
- Dataflow semantics
- Optimize based on edge semantics
- Networking tie-in again:
- At-least-once delivery?
- Exactly-once delivery?
- In/out of order?
- Mehul Shah
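To ground the discussion, a hedged sketch of the content-partitioned Exchange routing that FLuX builds on: tuples route to a consumer by hash of the partitioning key, through a remappable bucket table. The rebalance step is illustrative only; the real FLuX must also replicate and migrate the operator state behind each bucket.

```python
# Illustrative sketch of Exchange-style content partitioning with an
# adjustable routing map. Names and the rebalancing rule are invented.

class Exchange:
    def __init__(self, n_consumers):
        self.n_buckets = 64
        # routing map: hash bucket -> consumer; remappable for balance
        self.route = [b % n_consumers for b in range(self.n_buckets)]
        self.queues = [[] for _ in range(n_consumers)]

    def send(self, tup, key):
        bucket = hash(key) % self.n_buckets
        self.queues[self.route[bucket]].append(tup)

    def rebalance(self, src, dst):
        """Move one bucket from an overloaded consumer to a light one
        (real FLuX must also migrate that bucket's operator state)."""
        for b, c in enumerate(self.route):
            if c == src:
                self.route[b] = dst
                return b

ex = Exchange(n_consumers=2)
for i in range(10):
    ex.send(("tuple", i), key=i)
print([len(q) for q in ex.queues])
ex.rebalance(src=0, dst=1)               # adapt the partitioning online
```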
37. Continuously Adaptive Continuous Queries (CACQ)
- Continuous queries clearly need all this stuff!
- Natural application of the Telegraph infrastructure
- 4 ideas in CACQ:
- Use eddies to allow reordering of ops
- But one eddy will serve for all queries
- Queries are data: join with a Grouped Filter (sketched below)
- A la stored get!
- This idea extended in PSoup (Chandrasekaran & Franklin)
- Explicit tuple lineage:
- Mark each tuple with per-op ready/done bits
- Mark each tuple with per-query completed bits
- Joins via SteMs, shared across all queries
- Note mixed-lineage tuples in a SteM, i.e. shared state is not shared algebraic expressions!
- Delete a tuple from the flow only if it matches no query
- Sam Madden, Mehul Shah, Vijayshankar Raman, Sirish Chandrasekaran
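A hedged sketch of the "queries are data" idea: a grouped filter holds many queries' predicates over one attribute, and each arriving tuple probes them, so the query acts as a stored get that data joins against. The structure and names below are illustrative (a linear scan stands in for the indexed predicate structures a real implementation would use).

```python
# Illustrative grouped filter: many range predicates, probed per tuple.

class GroupedFilter:
    def __init__(self):
        self.preds = {}                       # query_id -> (lo, hi) range

    def add_query(self, qid, lo, hi):
        self.preds[qid] = (lo, hi)            # store the get

    def probe(self, value):
        """Return the ids of all queries this value satisfies."""
        return [q for q, (lo, hi) in self.preds.items() if lo <= value <= hi]

gf = GroupedFilter()
gf.add_query("q1", 0, 50)
gf.add_query("q2", 40, 100)
for reading in (10, 45, 90):
    print(reading, "->", gf.probe(reading))   # per-tuple matching queries
```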
38. Sensor QP: TinyDB/TAG
39. Wireless Sensor Networks
[Diagram: a spectrum of devices, from Palm devices (Linux) down to Smart Dust motes (TinyOS)]
- A spectrum of devices
- Varying degrees of power and network constraints
- The fun is on the small side!
- Our current platform: Mica motes and TinyOS
- 4 MHz Atmel CPU, 4 KB RAM, 40 kbit radio, 512 KB EEPROM, 128 KB Flash
- Sensors: temp, light, accelerometer, magnetometer, mic, etc.
- Wireless, single-ported, multi-hop ad-hoc network
- Spanning-tree communication through the root
40. TinyDB
- A query/trigger engine for motes
- Declarative (SQL-like) language for optimizability
- Data independence arguments in spades here!
- Non-programmers can deal with it
- Lots of challenges at the seams of queries and routing:
- Query plans over a dynamic multi-hop network
- With power and bandwidth consumption as key metrics
- Sam Madden (w/ Hellerstein, Hong, Franklin)
41. Focus: Hierarchical Aggregation
- Aggregation is natural in sensornets
- The big picture is typically what's interesting
- Aggregation can smooth noise and loss
- E.g. signal-processing aggs like wavelets
- Provides data reduction
- Power/network reduction: in-network aggregation
- A hierarchical version of parallel aggregation
- Tricky design space:
- Power vs. quality
- Topology selection
- Value-based routing
- A dynamic environment requires adaptivity
42. TinyDB Sample Apps
- Habitat monitoring: "What is the average humidity in the populated petrel burrows on Great Duck Island right now?"
- Smart office: "Find me the conference rooms that have been reserved but unoccupied for 5 minutes."
- Home automation: "Lower the blinds when light intensity is above a threshold."
43. Performance in SensorNets
- Power consumption
- Communication >> computation
- METRIC: radio wake time
- Send > receive
- METRIC: messages generated
- "Run for 5 years" vs. "burn power for critical events" vs. "run my experiment"
- Bandwidth constraints
- Internal >> external
- Volume >> surface area
- Result quality
- Noisy sensors
- Discrete sampling of continuous phenomena
- Lossy communication channel
44. TinyDB
- SQL-like language for specifying continuous queries and triggers
- Schema management, etc.
- Proxy on the desktop, small query engine per mote
- Plug and play (query snooping)
- To keep the engine tiny, use an eddy-style architecture:
- One explicit copy of each iterator's code image
- Adaptive dataflow in the network
- Alpha available for download on SourceForge
45. Some of the Optimization Issues
- Extensible aggregation API:
- Init(), Iter(), SplitFlow(), Close()
- Properties:
- Amount of intermediate state
- Duplicate sensitivity
- Monotonicity
- Exemplary vs. summary
- Hypothesis testing
- Snooping and suppression
- Compression, presumption, interpolation
- Generally, QP and NW issues intertwine! (decomposable-aggregate sketch below)
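As an illustration of the decomposable-aggregate idea behind this API, here is AVG computed in-network as (sum, count) partial state records merged up a routing tree, so each node sends one small message instead of forwarding raw readings. Function names below are generic and correspond only loosely to the Init()/Iter()/SplitFlow()/Close() API named above; the tree and readings are invented.

```python
# Hedged sketch of TAG-style in-network aggregation for AVG.

def init(value):                 # make a partial state record from a reading
    return (value, 1)

def merge(a, b):                 # combine two partial state records
    return (a[0] + b[0], a[1] + b[1])

def close(state):                # finalize at the root
    s, c = state
    return s / c

# A tiny routing tree: node -> list of children; every mote has a reading.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": []}
readings = {"root": 20, "a": 22, "b": 18, "a1": 25, "a2": 21}

def aggregate(node):
    state = init(readings[node])
    for child in tree.get(node, []):
        state = merge(state, aggregate(child))   # child sends partial state
    return state

print(close(aggregate("root")))  # network-wide average: 21.2
```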
46. PIER: Querying the Internet
47. Querying the Internet
- As opposed to querying over the Internet
- Have to deal with Internet realities:
- Scale, dynamics, federated admin, partial failure, etc.
- Standard distributed DBs won't work
- Applications:
- Start with real-time, distributed network monitoring
- Traffic monitoring, intrusion/spam detection, software deployment detection (e.g. via TBIT), etc.
- Use PIER's SQL as a workload generator for networks?
- Virtual tables determine the load produced by each site
- Queries become a way of specifying site-to-site communication
- Move to infect the network more deeply?
- E.g. indirection schemes like i3, rendezvous mechanisms, etc.
- Overlays only?
48. And P2P QP, Obviously
- Gnutella done right
- And it's so easy! ;-)
- Crawler-free web search
- Bring WYGIWIGY queries to the people
- Ranking, recommenders, etc.
- Got to be more fun here
- If p2p takes off in a big way, queries have to be a big piece
- Why a p2p DB, anyway?
- No good reason I can think of! ;-)
- Focus on the grassroots nature of p2p
- Schema integration and transactions and...??
- No! Work with what you've got! Query the data that's out there
- Nothing complicated for users will fly
- Avoid the DB word: P2P QP, not P2P DB
49. Approach: Leverage DHTs
- Distributed Hash Tables
- A family of distributed content-routing schemes
- CAN, Chord, Pastry, Tapestry, etc.
- An Internet-scale hash table
- A la a wide-area, adaptive Exchange routing table
- With some notion of storage
- Leverage DHTs aggressively:
- As distributed indexes on stored data
- As state modules for query processing
- E.g. use DHTs as the hash tables in a hash join (sketched below)
- As rendezvous points for exchanging info
- E.g. Bloom filters
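A hedged sketch of the "DHT as join hash table" idea: both relations put tuples under hash(join key), so matching tuples rendezvous at whatever node owns that key, and probes go to the same place. A dict-per-node toy stands in for put/get over CAN or Chord; the namespace and tuple format are invented.

```python
# Illustrative sketch: rehashing two relations into a toy DHT by join key.
import hashlib

class ToyDHT:
    def __init__(self, n_nodes=4):
        self.nodes = [dict() for _ in range(n_nodes)]

    def _owner(self, key):                   # consistent key -> node mapping
        h = int(hashlib.sha1(str(key).encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, key, item):
        self._owner(key).setdefault(key, []).append(item)

    def get(self, key):
        return self._owner(key).get(key, [])

dht = ToyDHT()
R = [("r", 1, "a"), ("r", 2, "b")]
S = [("s", 1, "x"), ("s", 1, "y")]
for t in R + S:
    dht.put(("join42", t[1]), t)             # namespace per query, key = join attr
for t in R:                                  # probe at the rendezvous node
    matches = [m for m in dht.get(("join42", t[1])) if m[0] == "s"]
    print(t, "->", matches)
```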
50. PIER: P2P Information Exchange and Retrieval
- Relational-style query executor
- With front ends for SQL and catalogs
- Standard and continuous queries
- With access to DHT APIs
- Currently CAN and Chord; working on Tapestry
- A common DHT API would help
- Currently simulating queries running on tens of thousands of nodes
- Look ma, it scales!
- Widest-scale relational engine ever; looks feasible
- Most of the simulator code will live on in the implementation
- On Millennium and PlanetLab this fall/winter
- Ryan Huebsch and Boon Thau Loo (w/ Hellerstein, Shenker, Stoica)
51. PIER Challenges
- How does this batch workload stress DHTs?
- How does republishing of soft state interact with dataflow?
- And with the semantics of query answers?
- Materialization/precomputation/caching
- Physical tuning meets SteMs meets materialized views
- How to do query optimization in this context?
- Distributed eddies!
- Partial failure is a reality
- At storage nodes? Query execution nodes?
- Impact on results, mitigation
- What about aggregation?
- Similarities/differences with TAG?
- With Astrolabe (Birman et al.)?
- The usual CQ and data-stream query issues, distributed
- Analogous to work in Telegraph, and at Brown, Wisconsin, Stanford
52. All Together Now?
- I thought about changing the names
- Telegraph, Teletiny?
- The group didn't like the branding
- Teletubby!
- Seriously: integration?
- It's a plausible need
- Sensor data + map data + historical sensor logs
- Filesharing + Web
- We have done both of these cheesily
- But fun questions of doing it right
- E.g. pushing predicates and data into the sensor net, or not?
53. References & Resources
54. Database Texts
- Undergrad textbooks:
- Ramakrishnan & Gehrke, Database Management Systems
- Silberschatz, Korth & Sudarshan, Database System Concepts
- Garcia-Molina, Ullman & Widom, Database Systems: The Complete Book
- O'Neil & O'Neil, Database: Principles, Programming, and Performance
- Abiteboul, Hull & Vianu, Foundations of Databases
- Graduate texts:
- Stonebraker & Hellerstein, Readings in Database Systems (a.k.a. "The Red Book")
- Brewer & Hellerstein readings book (e-book?) in progress. Fall 2003?
55. Research Links
- DB group at Berkeley: db.cs.berkeley.edu
- GiST: gist.cs.berkeley.edu
- Telegraph: telegraph.cs.berkeley.edu
- TinyDB: telegraph.cs.berkeley.edu/tinydb, berkeley.intel-research.net/tinydb
- Red Book: redbook.cs.berkeley.edu