Title: Implementing Declarative Overlays
1Implementing Declarative Overlays
- From two talks by
- Boon Thau Loo (1)
- Tyson Condie (1), Joseph M. Hellerstein (1,2),
- Petros Maniatis (2), Timothy Roscoe (2), Ion Stoica (1)
- (1) University of California at Berkeley, (2) Intel Research Berkeley
2Overlays Everywhere
- Overlay networks are widely used today
- Routing and forwarding component of large-scale distributed systems
- Provide new functionality over existing infrastructure
- Many examples, variety of requirements
- Packet delivery: Multicast, RON
- Content delivery: CDNs, P2P file sharing, DHTs
- Enterprise systems: MS Exchange
Overlay networks are an integral part of many large-scale distributed systems.
3Problem
- Non-trivial to design, build and deploy an overlay correctly
- Iterative design process
- Desired properties → Distributed algorithms and protocols → Simulation → Implementation → Deployment → Repeat
- Each iteration takes significant time and utilizes a variety of expertise
4The Goal of P2
- Make overlay development more accessible
- Focus on algorithms and protocol designs, not the implementation
- Tool for rapid prototyping of new overlays
- Specify overlay network at a high level
- Automatically translate specification to protocol
- Provide execution engine for protocol
- Aim for good enough performance
- Focus on accelerating the iterative design process
- Can always hand-tune implementation later
5Outline
- Overview of P2
- Architecture By Example
- Data Model
- Dataflow framework
- Query Language
- Chord
- Additional Benefits
- Overlay Introspection
- Automatic Optimizations
- Conclusion
6All-Pairs Reachability
R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
For all nodes S and D: if there is a link from S to D, then S can reach D.
link(a,b): there is a link from node a to node b
reachable(a,b): node a can reach node b
- Input: link(source, destination)
- Output: reachable(source, destination)
7All-Pairs Reachability
R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
For all nodes S, D and Z: if there is a link from S to Z, AND Z can reach D, then S can reach D.
- Input: link(source, destination)
- Output: reachable(source, destination)
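The two rules can be read as a fixpoint computation over the link table. The following is a minimal Python sketch of that reading, not P2's evaluator: the function name `reachable` and the sample links are assumptions made for illustration, and the loop re-applies R2 only to facts derived in the previous round.

```python
def reachable(links):
    """Return the set of (S, D) pairs derivable from rules R1 and R2."""
    # R1: every link(S, D) is a reachable(S, D) fact.
    result = set(links)
    delta = set(links)  # facts derived in the previous round
    while delta:
        # R2: join link(S, Z) with the newly derived reachable(Z, D).
        derived = {(s, d) for (s, z) in links for (z2, d) in delta if z == z2}
        delta = derived - result
        result |= delta
    return result

# Hypothetical input table: a chain a -> b -> c -> d.
links = {("a", "b"), ("b", "c"), ("c", "d")}
facts = reachable(links)
print(sorted(facts))
```

The loop stops once a round derives nothing new, so it also terminates on graphs with cycles.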
10Towards Network Datalog
- Specify tuple placement
- Value-based partitioning of tables
- Tuples to be combined are co-located
- Rule rewrite ensures body is always single-site
- All communication is among neighbors
- No multihop routing during basic rule execution
- Link-restricted rules: enforced via simple syntactic restrictions
11Network Datalog
R1: reachable(@S,D) :- link(@S,D)
R2: reachable(@S,D) :- link(@S,Z), reachable(@Z,D)
Query: reachable(@a,N)
Query: reachable(@M,N)
[Figure: nodes a, b, c and d, each holding its partition of the link input table and the reachable output table]
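To make the location specifiers concrete, here is a small Python simulation in which every table is partitioned by its @ attribute and derived tuples travel only one hop, to a direct neighbor. It is a sketch under assumptions: the function name `run_network_datalog`, the message queue, and the step that ships each link tuple to its far end (so that the body of R2 can be joined at a single site) are illustrative choices, not P2's actual rewrite or runtime.

```python
from collections import defaultdict, deque

def run_network_datalog(links):
    """Simulate R1/R2 with tables partitioned by the @ location attribute."""
    link_at = defaultdict(set)    # link(@S, D), stored at node S
    reach_at = defaultdict(set)   # reachable(@S, D), stored at node S
    in_links = defaultdict(set)   # at node Z: every S with link(@S, Z)
    for s, d in links:
        link_at[s].add(d)
        # Assumed rewrite step: S ships link(@S, Z) to neighbor Z so that
        # the body of R2 can be joined entirely at Z.
        in_links[d].add(s)

    inbox = deque()               # (node that stores the fact, destination D)
    for s, dests in link_at.items():
        for d in dests:
            inbox.append((s, d))  # R1 is evaluated locally at S

    sent = 0                      # tuples sent between neighbors
    while inbox:
        node, d = inbox.popleft()
        if d in reach_at[node]:
            continue              # duplicate fact, nothing new to derive
        reach_at[node].add(d)
        # R2 at Z = node: every S with link(@S, Z) learns reachable(@S, D).
        for s in in_links[node]:
            inbox.append((s, d))
            sent += 1
    return {n: set(ds) for n, ds in reach_at.items()}, sent

tables, sent = run_network_datalog({("a", "b"), ("b", "c"), ("c", "d")})
print(tables["a"])   # answers the query reachable(@a, N)
```

The query reachable(@a,N) is answered from node a's partition alone, while reachable(@M,N) would need the partitions of all nodes.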
14Query Execution
R1: path(@S,D,P) :- link(@S,D), P = (S,D).
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P = S • P2.
Query: path(@a,d,P)
[Figure: nodes a, b, c and d with their link tuples; labels: Neighbor table, Forwarding table]
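The path rules extend reachability by carrying the path vector P along with each derived tuple. The sketch below is a centralized Python reading of the two rules, with the function name `all_paths` and the sample links chosen for illustration. The check that skips a path already containing S is an added assumption that keeps the loop finite on cyclic graphs; the rules as written on the slide do not include it.

```python
def all_paths(links):
    """Derive path(S, D, P) tuples; P is a tuple of node names from S to D."""
    # R1: a link (S, D) yields the two-node path P = (S, D).
    paths = {(s, d, (s, d)) for (s, d) in links}
    delta = set(paths)
    while delta:
        new = set()
        for (s, z) in links:
            for (z2, d, p2) in delta:
                # R2: prepend S to a path that starts at Z (P = S followed by P2).
                # Assumption: skip paths that already contain S (no cycles).
                if z == z2 and s not in p2:
                    new.add((s, d, (s,) + p2))
        delta = new - paths
        paths |= delta
    return paths

# Hypothetical topology with two routes from a to d.
links = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}
# Query path(@a, d, P): keep only tuples with S = a and D = d.
query = sorted(p for (s, d, p) in all_paths(links) if (s, d) == ("a", "d"))
print(query)
```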
15Data Model
- Relational data: relational tables and tuples
- Two kinds of tables
- Stored, soft state
- E.g. neighbor(Src,Dst), forward(Src,Dst,NxtHop)
- Transient streams
- Network messages: message(Rcvr, Dst)
- Local timer-based events: periodic(NodeID,10)
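A stored soft-state table can be thought of as a set of tuples that disappear unless they are refreshed. The Python sketch below illustrates only that idea: the class `SoftStateTable`, its `lifetime` parameter, and the explicit `now` argument are hypothetical names for this example, and the slides do not show how P2 itself declares or expires tables.

```python
class SoftStateTable:
    """Stored table whose tuples expire unless they are refreshed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime   # time units a tuple stays valid
        self._expiry = {}          # tuple -> time at which it expires

    def insert(self, row, now):
        # Re-inserting an existing tuple refreshes its lifetime.
        self._expiry[row] = now + self.lifetime

    def rows(self, now):
        # Drop expired tuples lazily, then return the live ones.
        self._expiry = {r: t for r, t in self._expiry.items() if t > now}
        return set(self._expiry)

# neighbor(Src, Dst) tuples with an assumed lifetime of 30 time units.
neighbor = SoftStateTable(lifetime=30)
neighbor.insert(("a", "b"), now=0)
neighbor.insert(("a", "c"), now=20)
live_at_25 = neighbor.rows(now=25)   # both tuples are still live
live_at_40 = neighbor.rows(now=40)   # ("a", "b") expired at time 30
```

Passing the clock in explicitly keeps the example deterministic; a real system would read it from a timer.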
16Example Ring Routing
Each node has an address and an identifier
Objects served by successor
Each object has an identifier.
Every node knows its successor
17Ring State
- Stored tables
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)
node(IP58,58) succ(IP58,60,IP60)
node(IP40,40) succ(IP40,58,IP58)
18Example Ring lookup
- Find the responsible node for a given key k
- n.lookup(k)
- if k in (n, n.successor]
- return n.successor.addr
- else
- return n.successor.lookup(k)
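The pseudocode above translates almost line for line into Python once the interval test handles wrap-around on the ring. This is a sketch under assumptions: the helper `in_half_open`, the `Node` class, and `make_ring` are illustrative names, and the four-node ring (37, 40, 58, 60, with 60 pointing back to 37) extends the identifiers that appear in the slides.

```python
def in_half_open(k, lo, hi):
    """True if k lies in the ring interval (lo, hi], allowing wrap-around."""
    if lo < hi:
        return lo < k <= hi
    return k > lo or k <= hi   # interval wraps past the largest identifier

class Node:
    def __init__(self, ident, addr):
        self.ident, self.addr = ident, addr
        self.successor = None

    def lookup(self, k):
        # The successor owns k if k falls between this node and its successor.
        if in_half_open(k, self.ident, self.successor.ident):
            return self.successor.addr
        return self.successor.lookup(k)

def make_ring(idents):
    """Build nodes sorted by identifier and link each one to its successor."""
    nodes = [Node(i, f"IP{i}") for i in sorted(idents)]
    for n, succ in zip(nodes, nodes[1:] + nodes[:1]):
        n.successor = succ
    return nodes

ring = make_ring([37, 40, 58, 60])
owner = ring[0].lookup(59)   # node 37 asks for key 59
print(owner)
```

The lookup walks 37, 40, 58 and stops at node 58, whose successor 60 owns key 59.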
19Ring Lookup Events
- Event streams
- lookup(Addr, Req, K)
- response(Addr, K, Owner)
node(IP58,58) succ(IP58,60,IP60)
response(IP37,59,IP60)
node(IP40,40) succ(IP40,58,IP58)
n.lookup(k): if k in (n, n.successor] return n.successor.addr, else return n.successor.lookup(k)
lookup(IP58,IP37,59)
lookup(IP40,IP37,59)
lookup(IP37,IP37,59)
20Query Language Overlog
Datalog rule syntax
<head> :- <condition1>, <condition2>, ..., <conditionN>.
Overlog rule syntax
<Action> :- <event>, <condition1>, ..., <conditionN>.
21Query Language Overlog
Overlog rule syntax
<Action> :- <event>, <condition1>, ..., <conditionN>.
Event: RECEIVE lookup(NAddr, Req, K)
Condition: lookup(NAddr, Req, K), node(NAddr, N), succ(NAddr, Succ, SAddr), K in (N, Succ]
Action: SEND response(Req, K, SAddr) to Req
response@Req(Req, K, SAddr) :-
lookup@NAddr(NAddr, Req, K),
node@NAddr(NAddr, N), succ@NAddr(NAddr, Succ, SAddr),
K in (N, Succ].
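One way to read the rule is as a join triggered by an arriving event: the lookup tuple is matched against the local node and succ tables, filtered by the interval condition, and projected into a response addressed to Req. The Python sketch below follows that reading; `fire_response_rule` and `in_interval` are hypothetical helper names, and the code is not how P2 compiles rules.

```python
def in_interval(k, lo, hi):
    """True if k lies in the ring interval (lo, hi], allowing wrap-around."""
    if lo < hi:
        return lo < k <= hi
    return k > lo or k <= hi

def fire_response_rule(lookup_event, node_table, succ_table):
    """Return the response tuples the rule derives for one lookup event."""
    naddr, req, k = lookup_event                  # event: lookup@NAddr(NAddr, Req, K)
    out = []
    for (n_addr, n) in node_table:                # condition: node@NAddr(NAddr, N)
        if n_addr != naddr:
            continue
        for (s_addr, succ, saddr) in succ_table:  # condition: succ@NAddr(NAddr, Succ, SAddr)
            if s_addr != naddr:
                continue
            if in_interval(k, n, succ):           # condition: K in (N, Succ]
                out.append((req, k, saddr))       # action: response@Req(Req, K, SAddr)
    return out

# Tables as stored at node IP58, using the tuples from the ring example.
node_58 = [("IP58", 58)]
succ_58 = [("IP58", 60, "IP60")]
responses = fire_response_rule(("IP58", "IP37", 59), node_58, succ_58)
print(responses)
```

At node IP40 the same event derives nothing, because 59 is outside (40, 58]; forwarding the lookup onward would need a second rule.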
22P2-Chord
- Chord Routing, including
- Multiple successors
- Stabilization
- Optimized finger maintenance
- Failure recovery
- 47 OverLog rules
- 13 table definitions
- Other examples
- Narada, flooding, routing protocols
10 pt font
23Introspection with Queries
With Atul Singh (Rice) and Peter Druschel (MPI)
- Unifying framework for debugging and implementation
- Same query language, same platform
- Execution tracing/logging
- Rule and dataflow level
- Log entries stored as tuples and queried
- Correctness invariants, regression tests as queries
- Is the Chord ring well formed? (3 rules)
- What is the network diameter? (5 rules)
- Is Chord routing consistent? (11 rules)
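The slides do not list the three rules behind the ring check, so the following Python sketch shows only one plausible reading of "well formed": following successor pointers from any node should visit every node exactly once and return to the start. The function name `ring_well_formed` and the dictionary representation of the succ table are assumptions, and a fuller check would also verify identifier ordering.

```python
def ring_well_formed(succ):
    """succ maps each node ID to the ID it believes is its successor."""
    if not succ:
        return False
    start = next(iter(succ))
    seen, current = set(), start
    while current not in seen:
        if current not in succ:
            return False          # pointer to an unknown node
        seen.add(current)
        current = succ[current]
    # Well formed if the walk closed at the start after visiting every node.
    return current == start and len(seen) == len(succ)

good = ring_well_formed({37: 40, 40: 58, 58: 60, 60: 37})
bad = ring_well_formed({37: 40, 40: 58, 58: 60, 60: 40})   # 60 skips node 37
print(good, bad)
```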
24Automatic Optimizations
- Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005)
- Multi-query sharing
- Common subexpression elimination
- Caching and reuse of previously computed results
- Opportunistically share message propagation across rules
25Automatic Optimizations
- Cost-based optimizations
- Join ordering affects performance
[Figure: dataflow strands for the two lookup rules]
Strand 1: lookup → Join lookup.Addr = succ.Addr → Join lookup.Addr = node.Addr → Select K not in (N,Succ] → Project lookup(SAddr,Req,K) → lookup
Strand 2: lookup → Join lookup.Addr = succ.Addr → Join lookup.Addr = node.Addr → Select K in (N,Succ] → Project response(Req,K,SAddr) → response
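A toy cost model shows why the order of the two joins can matter: the size of the intermediate result depends on which table is joined first. The Python sketch below counts intermediate tuples for both orders. The helper names `join_on_addr` and `run_plan` and the three-successor state at node 58 are assumptions for illustration only; the slides give no measurements.

```python
def join_on_addr(rows, table):
    """Nested-loop join of partial results with a table keyed by address."""
    return [row + t[1:] for row in rows for t in table if t[0] == row[0]]

def run_plan(lookup, first, second):
    """Join lookup with `first`, then `second`; return both result sizes."""
    step1 = join_on_addr([lookup], first)
    step2 = join_on_addr(step1, second)
    return len(step1), len(step2)

lookup = ("IP58", "IP37", 59)
node = [("IP58", 58)]
# Assumed state: node 58 tracks three successors (IDs 60, 62 and 70).
succ = [("IP58", 60, "IP60"), ("IP58", 62, "IP62"), ("IP58", 70, "IP70")]

succ_first = run_plan(lookup, succ, node)   # (intermediate, final) sizes
node_first = run_plan(lookup, node, succ)
print(succ_first, node_first)
```

Both orders produce the same final result, but joining node first carries one intermediate tuple into the second join instead of three.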
26Future Work
- Right language
- Formal data and query semantics
- Static analysis
- Optimizations
- Termination
- Correctness
27Conclusion
- P2: Declarative Overlays
- Tool for rapid prototyping new overlay networks
- Declarative Networks
- Research agenda: Specify and construct networks declaratively
- Declarative Routing: Extensible Routing with Declarative Queries (SIGCOMM 2005)