Title: Implementing Declarative Overlays
1 Implementing Declarative Overlays
- Boon Thau Loo (1)
- Tyson Condie (1), Joseph M. Hellerstein (1,2), Petros Maniatis (2), Timothy Roscoe (2), Ion Stoica (1)
- (1) University of California at Berkeley, (2) Intel Research Berkeley
2 Overlays Everywhere
- Overlay networks are widely used today
- Routing and forwarding component of large-scale distributed systems
- Provide new functionality over existing infrastructure
- Many examples, variety of requirements
- Packet delivery: Multicast, RON
- Content delivery: CDNs, P2P file sharing, DHTs
- Enterprise systems: MS Exchange
Overlay networks are an integral part of many large-scale distributed systems.
3 Problem
- Non-trivial to design, build and deploy an overlay correctly
- Iterative design process
- Desired properties -> Distributed algorithms and protocols -> Simulation -> Implementation -> Deployment -> Repeat
- Each iteration takes significant time and utilizes a variety of expertise
4 The Goal of P2
- Make overlay development more accessible
- Focus on algorithms and protocol designs, not the implementation
- Tool for rapid prototyping of new overlays
- Specify overlay network at a high level
- Automatically translate specification to protocol
- Provide execution engine for protocol
- Aim for good enough performance
- Focus on accelerating the iterative design process
- Can always hand-tune implementation later
5 Outline
- Overview of P2
- Architecture By Example
- Data Model
- Dataflow framework
- Query Language
- Chord
- Additional Benefits
- Overlay Introspection
- Automatic Optimizations
- Conclusion
6 Traditional Overlay Node
[Figure: Traditional Overlay Node. Labels: Packets In, Overlay Program, Packets Out.]
7 P2 Overlay Node
[Figure: P2 Overlay Node. Labels: Packets In, Overlay Program, P2 Query Processor, Packets Out.]
8 Advantages of the P2 Approach
- Declarative Query Language
- Concise/high level expression
- Statically checkable (termination, correctness)
- Ease of modification
- Unifying framework for introspection and implementation
- Automatic optimizations
- Query and dataflow level
9 Data Model
- Relational data: relational tables and tuples
- Two kinds of tables
- Stored, soft state
- E.g. neighbor(Src,Dst), forward(Src,Dst,NxtHop)
- Transient streams
- Network messages: message(Rcvr, Dst)
- Local timer-based events: periodic(NodeID,10)
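A minimal sketch of what a stored, soft-state table could look like, assuming that "soft state" means rows expire unless they are re-inserted. The class name `SoftStateTable`, the `lifetime_s` parameter, and the injectable `clock` are hypothetical names chosen for illustration and are not taken from P2.

```python
import time
from typing import Callable, Dict, List, Tuple

Row = Tuple  # e.g. ("IP40", "IP58") for neighbor(Src, Dst)


class SoftStateTable:
    """Stored table whose rows expire unless refreshed (hypothetical API)."""

    def __init__(self, name: str, lifetime_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.lifetime_s = lifetime_s  # assumed per-table row lifetime
        self._clock = clock
        self._expiry: Dict[Row, float] = {}

    def insert(self, row: Row) -> None:
        # Re-inserting an existing row refreshes its expiry time.
        self._expiry[row] = self._clock() + self.lifetime_s

    def rows(self) -> List[Row]:
        # Drop expired rows lazily, then return the live ones.
        now = self._clock()
        self._expiry = {r: t for r, t in self._expiry.items() if t > now}
        return list(self._expiry)
```

Passing a fake clock makes the expiry behaviour easy to exercise without sleeping; a transient stream, by contrast, would simply not be stored at all.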
10 Dataflow framework
- Dataflow graph
- C++ dataflow elements
- Similar to Click
- Flow elements (mux, demux, queues)
- Network elements (cc, retry, rate limitation)
- In addition
- Relational operators (joins, selections, projections, aggregation)
11 Outline
- Overview of P2
- Architecture By Example
- Data Model
- Dataflow framework
- Query Language
- Chord in P2
- Additional Benefits
- Overlay Introspection
- Automatic Optimizations
- Conclusion
Simple ring routing example
12 Example: Ring Routing
Each node has an address and an identifier
Objects served by successor
Each object has an identifier.
Every node knows its successor
13 Ring State
- Stored tables
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)
node(IP58,58) succ(IP58,60,IP60)
node(IP40,40) succ(IP40,58,IP58)
14 Example: Ring lookup
- Find the responsible node for a given key k?
n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)
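The lookup pseudocode can be sketched as runnable Python. This is an illustration under stated assumptions: identifiers live on a ring, so the interval test wraps past zero; the interval is treated as half-open, (n, successor], so a key equal to the successor's identifier is owned by that successor; and the recursion is unrolled into a loop. The names `Node`, `in_half_open`, and the link from IP60 back to IP37 that closes the ring are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


def in_half_open(k: int, lo: int, hi: int) -> bool:
    """True if k lies in the ring interval (lo, hi], wrapping past zero."""
    if lo < hi:
        return lo < k <= hi
    # The interval wraps around the ring (lo == hi covers the whole ring).
    return k > lo or k <= hi


@dataclass
class Node:
    ident: int
    addr: str
    succ_addr: str


def lookup(ring: Dict[str, Node], start: str, k: int) -> str:
    """Follow successor pointers from `start` until the owner of k is found."""
    n = ring[start]
    while True:
        succ = ring[n.succ_addr]
        if in_half_open(k, n.ident, succ.ident):
            return succ.addr
        n = succ


# Example ring; the IP60 -> IP37 link is an assumption that closes the ring.
ring = {
    "IP37": Node(37, "IP37", "IP40"),
    "IP40": Node(40, "IP40", "IP58"),
    "IP58": Node(58, "IP58", "IP60"),
    "IP60": Node(60, "IP60", "IP37"),
}
owner = lookup(ring, "IP37", 59)  # forwarded via IP40 and IP58
```

With this ring, a lookup for key 59 started at IP37 resolves to IP60.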
15 Ring Lookup Events
- Event streams
- lookup(Addr, Req, K)
- response(Addr, K, Owner)
[Figure: lookup for key 59 requested by IP37. Node state shown: node(IP40,40), succ(IP40,58,IP58); node(IP58,58), succ(IP58,60,IP60). Messages: lookup(IP37,IP37,59), lookup(IP40,IP37,59), lookup(IP58,IP37,59), response(IP37,59,IP60).]
16 Pseudocode -> Dataflow Strands
Pseudocode:
n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)
17 Dataflow Strand
[Figure: Event Stream -> Strand Elements (Element1, Element2, ..., Elementn) -> Actions]
- Event: Incoming network messages, periodic timers
- Condition: Process event using strand elements
- Action: Outgoing network messages, local table updates
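A minimal sketch of the event, condition, action structure of a strand, assuming each element maps one input tuple to zero or more output tuples. The `Strand` class, the `ping` tuple, and the element functions are hypothetical and only illustrate the shape; they are not P2's element interface.

```python
from typing import Callable, Iterable, List

# An element consumes one tuple and emits zero or more tuples.
Element = Callable[[tuple], Iterable[tuple]]


class Strand:
    """Chain of elements triggered by an event, ending in an action."""

    def __init__(self, elements: List[Element],
                 action: Callable[[tuple], None]) -> None:
        self.elements = elements
        self.action = action

    def push(self, event: tuple) -> None:
        batch = [event]
        for element in self.elements:
            # Feed every tuple produced so far through the next element.
            batch = [out for t in batch for out in element(t)]
        for t in batch:
            self.action(t)


# Hypothetical example: a periodic timer event fans out to all neighbors.
neighbors = {"IP40": ["IP58", "IP60", "IP40"]}
outbox: List[tuple] = []


def join_neighbors(t: tuple) -> List[tuple]:
    _, addr = t
    return [("ping", dst, addr) for dst in neighbors.get(addr, [])]


def drop_self(t: tuple) -> List[tuple]:
    # Selection: discard messages a node would send to itself.
    return [t] if t[1] != t[2] else []


strand = Strand([join_neighbors, drop_self], outbox.append)
strand.push(("periodic", "IP40"))
```

Here the action is "append to an outbox"; in a real node it would be sending a network message or updating a local table.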
18 Pseudocode -> Strand 1
- Stored tables
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)
- Event streams
- lookup(Addr, Req, K)
- response(Addr, K, Owner)
n.lookup(k)
  if k in (n, n.successor]
    return n.successor.addr
  else
    return n.successor.lookup(k)
Event: RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N), succ(NAddr, Succ, SAddr), K in (N, Succ]
Action: SEND response(Req, K, SAddr) to Req
19 Pseudocode to Strand 1
Event: RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N), succ(NAddr, Succ, SAddr), K in (N, Succ]
Action: SEND response(Req, K, SAddr) to Req
Dataflow strand:
- Join: match lookup.Addr = node.Addr
- Join: match lookup.Addr = succ.Addr
- Select: filter K in (N, Succ]
- Project: format response(Req, K, SAddr)
- Output: response
20 Pseudocode to Strand 2
Event: RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N), succ(NAddr, Succ, SAddr), K not in (N, Succ]
Action: SEND lookup(SAddr, Req, K) to SAddr
Dataflow strand:
- Join: lookup.Addr = node.Addr
- Join: lookup.Addr = succ.Addr
- Select: K not in (N, Succ]
- Project: lookup(SAddr, Req, K)
- Output: lookup
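The two strands can be sketched together as relational operators over tuples: join the incoming lookup with the local node and succ tables, select on the interval test, and project the outgoing message. This is an illustrative sketch, not P2's dataflow code; `on_lookup` and `in_half_open` are hypothetical names, and the interval is assumed to be half-open, (N, Succ].

```python
from typing import List, Tuple


def in_half_open(k: int, lo: int, hi: int) -> bool:
    """True if k lies in the ring interval (lo, hi], wrapping past zero."""
    if lo < hi:
        return lo < k <= hi
    return k > lo or k <= hi


# Stored tables, using the tuples shown on the Ring State slide.
NODE = [("IP40", 40), ("IP58", 58)]
SUCC = [("IP40", 58, "IP58"), ("IP58", 60, "IP60")]


def on_lookup(lookup: Tuple[str, str, int]) -> List[Tuple[str, tuple]]:
    """Return (destination, message) pairs produced by one lookup event."""
    naddr, req, k = lookup
    # Join: lookup.Addr = node.Addr and lookup.Addr = succ.Addr
    joined = [(n, succ, saddr)
              for (addr1, n) in NODE if addr1 == naddr
              for (addr2, succ, saddr) in SUCC if addr2 == naddr]
    out = []
    for n, succ, saddr in joined:
        if in_half_open(k, n, succ):
            # Strand 1: select K in (N, Succ], project response, send to Req
            out.append((req, ("response", req, k, saddr)))
        else:
            # Strand 2: select K not in (N, Succ], project lookup, send to SAddr
            out.append((saddr, ("lookup", saddr, req, k)))
    return out
```

For example, `on_lookup(("IP40", "IP37", 59))` forwards the lookup to IP58, and `on_lookup(("IP58", "IP37", 59))` sends the response naming IP60 back to IP37.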
21 Strand Execution
[Figure: strand execution. Arrow labels: lookup, response, lookup/response.]
22 Actual Chord Lookup Dataflow
23 Query Language: Overlog
- SQL equivalent for overlay networks
- Based on Datalog
- Declarative recursive query language
- Well-suited for querying properties of graphs
- Well-studied in database literature
- Static analysis, optimizations, etc.
- Extensions
- Data distribution, asynchronous messaging, periodic timers and state modification
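To illustrate why a recursive query language suits graph properties, here is a small sketch that evaluates the classic two-rule Datalog reachability program by naive fixpoint iteration. It shows plain Datalog semantics only; it is not Overlog and not how P2 evaluates rules, and the `link` and `reach` relation names are chosen for the example.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def reachable(links: Set[Edge]) -> Set[Edge]:
    """Naive fixpoint evaluation of:

        reach(X, Y) :- link(X, Y).
        reach(X, Z) :- link(X, Y), reach(Y, Z).
    """
    reach = set(links)  # first rule: every link is a reach fact
    while True:
        # Second rule: join link(X, Y) with reach(Y, Z) on Y.
        derived = {(x, z)
                   for (x, y) in links
                   for (y2, z) in reach
                   if y == y2}
        if derived <= reach:
            return reach  # no new facts, so the fixpoint is reached
        reach |= derived


links = {("a", "b"), ("b", "c"), ("c", "d")}
facts = reachable(links)
```

Iteration stops once a round derives no new facts, which is the sense in which a Datalog program over finite relations terminates.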
24 Query Language: Overlog
Datalog rule syntax
<head> :- <condition1>, <condition2>, ..., <conditionN>.
Overlog rule syntax
<Action> :- <event>, <condition1>, ..., <conditionN>.
25 Query Language: Overlog
Overlog rule syntax
<Action> :- <event>, <condition1>, ..., <conditionN>.
Event: RECEIVE lookup(NAddr, Req, K)
Condition: node(NAddr, N), succ(NAddr, Succ, SAddr), K in (N, Succ]
Action: SEND response(Req, K, SAddr) to Req
response@Req(Req, K, SAddr) :-
    lookup@NAddr(NAddr, Req, K),
    node@NAddr(NAddr, N),
    succ@NAddr(NAddr, Succ, SAddr),
    K in (N, Succ].
26 P2-Chord
- Chord Routing, including
- Multiple successors
- Stabilization
- Optimized finger maintenance
- Failure recovery
- 47 OverLog rules
- 13 table definitions
- Other examples
- Narada, flooding, routing protocols
10 pt font
27 Performance Validation
- Experimental Setup
- 100 nodes on Emulab testbed
- 500 P2-Chord nodes
- Main goals
- Validate expected network properties
28 Sanity Checks
- Logarithmic diameter and state (correct)
- BW-efficient: 300 bytes/s/node
29 Churn Performance
- Metric: Consistency (Rhea et al.)
- P2-Chord
- P2-Chord @ 64 mins: 97% consistency
- P2-Chord @ 16 mins: 84% consistency
- P2-Chord @ 8 mins: 42% consistency
- Hand-crafted Chord
- MIT-Chord @ 47 mins: 99.9% consistency
- Outperforms P2 under higher churn
- Not intended to replace a carefully hand-crafted Chord
30 Benefits of P2
- Introspection with Queries
- Automatic optimizations
- Reconfigurable Transport (WIP)
31 Introspection with Queries
With Atul Singh (Rice) and Peter Druschel (MPI)
- Unifying framework for debugging and implementation
- Same query language, same platform
- Execution tracing/logging
- Rule and dataflow level
- Log entries stored as tuples and queried
- Correctness invariants, regression tests as queries
- Is the Chord ring well formed? (3 rules)
- What is the network diameter? (5 rules)
- Is Chord routing consistent? (11 rules)
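As an illustration of a correctness invariant expressed as a check over stored state, the sketch below tests whether a set of successor pointers forms a single ring. The slides do not give the actual three Overlog rules, so the definition used here (following successors from any node visits every node once and returns to the start) is an assumption, and `ring_well_formed` is a hypothetical name.

```python
from typing import Dict


def ring_well_formed(succ: Dict[str, str]) -> bool:
    """succ maps each node address to its successor's address."""
    if not succ:
        return False
    start = next(iter(succ))
    seen = set()
    cur = start
    while cur not in seen:
        if cur not in succ:
            return False  # a successor pointer leaves the known node set
        seen.add(cur)
        cur = succ[cur]
    # Well formed only if the walk closes at the start and covered every node.
    return cur == start and len(seen) == len(succ)


good = {"IP40": "IP58", "IP58": "IP60", "IP60": "IP40"}
broken = {"IP40": "IP58", "IP58": "IP60", "IP60": "IP58"}
```

In P2 the same question would be posed as a query over the succ tuples themselves, which is what makes the invariant usable as a regression test on a running overlay.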
32 Automatic Optimizations
- Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005)
- Multi-query sharing
- Common subexpression elimination
- Caching and reuse of previously computed results
- Opportunistically share message propagation across rules
33 Automatic Optimizations
- Cost-based optimizations
- Join ordering affects performance
[Figure: the two lookup strands. Each joins lookup with succ (lookup.Addr = succ.Addr) and with node (lookup.Addr = node.Addr). One strand then applies Select K in (N,Succ) and Project Response(Req,K,SAddr) to emit response; the other applies Select K not in (N,Succ) and Project lookup(SAddr,Req,K) to emit lookup.]
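A small sketch of why join ordering matters, assuming a toy cost model that counts the tuples materialized by a left-deep plan. The cost model, the function names, and the three-row succ table (a node with multiple successors) are assumptions for illustration; the slides do not describe P2's actual cost model.

```python
from typing import List, Sequence, Tuple

Row = Tuple


def join_on_addr(left: List[Row], right: List[Row]) -> List[Row]:
    """Nested-loop equijoin on column 0 (the address) of both inputs.

    Each result keeps the left row and appends the right row's other columns.
    """
    return [left_row + right_row[1:]
            for left_row in left
            for right_row in right
            if left_row[0] == right_row[0]]


def plan_cost(event: List[Row], tables: Sequence[List[Row]]) -> int:
    """Toy cost: total number of tuples materialized by a left-deep plan."""
    current, cost = event, 0
    for table in tables:
        current = join_on_addr(current, table)
        cost += len(current)
    return cost


lookup = [("IP40", "IP37", 59)]
node = [("IP40", 40)]
# Hypothetical: IP40 tracks three successors.
succ = [("IP40", 58, "IP58"), ("IP40", 60, "IP60"), ("IP40", 70, "IP70")]

node_first = plan_cost(lookup, [node, succ])  # intermediate sizes 1, then 3
succ_first = plan_cost(lookup, [succ, node])  # intermediate sizes 3, then 3
```

Both orders produce the same final tuples, but joining the smaller node table first materializes fewer intermediate tuples under this toy measure.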
34 Open Questions
- The role of rapid prototyping?
- How good is good enough performance for rapid prototypes?
- When do developers move from rapid prototypes to hand-crafted code?
- Can we achieve production quality overlays from P2?
35 Future Work
- Right language
- Formal data and query semantics
- Static analysis
- Optimizations
- Termination
- Correctness
36 Conclusion
- P2: Declarative Overlays
- Tool for rapid prototyping new overlay networks
- Declarative Networks
- Research agenda: Specify and construct networks declaratively
- Declarative Routing: Extensible Routing with Declarative Queries (SIGCOMM 2005)
37 Thank You
http://p2.cs.berkeley.edu
38 Latency CDF for P2-Chord
Median and average latency around 1s.