Title: Implementing Declarative Overlays
1Implementing Declarative Overlays
- Timothy Roscoe
- Joint work with Boon Thau Loo, Tyson Condie,
Joseph M. Hellerstein, Petros Maniatis, Ion
Stoica - Intel Research and U.C. Berkeley
- Tuesday, December 08, 2009
2Broad Challenge Network Routing Implementation
- Protocol-centric approach is usual
- Finite state automata
- Asynchronous messages / events
- Intuitive, but
- Hard to
- reason about structure
- check/debug
- compose/abstract/reuse
- But few, if any, new abstractions have emerged
for the problem.
3Talk Overview
- Approach take high-level view
- Routing and Query Processing
- Declarative specifications
- P2 a declarative overlay engine
- OverLog language
- Software dataflow implementation
- Evaluation Chord as a test case
- Ongoing and future work
4Declarative Networking
- The set of routing tables in a network represents
a distributed data structure - The data structure is characterized by a set of
ideal properties which define the network - Thinking in terms of structure, not protocol
- Routing is the process of maintaining these
properties in the face of changing ground facts - Failures, topology changes, load, policy
5Routing and Query Processing
- In database terms, the routing table is a view
over changing network conditions and state - Maintaining it is the domain of distributed
continuous query processing
6Distributed Continuous Query Processing
- Relatively new and active field
- SDIMS, Mercury, IrisLog, Sophia, etc., in
particular PIER - ? May not have all the answers yet
- But brings a wealth of experience and knowledge
from database systems - Relational, deductive, stream processing, etc.
7Goal Declarative Networks
- Express network properties as queries in a
high-level declarative language - More than configuration or policy language
- Apply static checking
- Modular decomposition
- Compile/interpret to maintain network
- Dynamic optimization (e.g. eddies)
- Sharing of computation/communication
8Other advantages
- Can incorporate other knowledge into routing
policies - E.g., physical network knowledge
- Naturally integrates discovery
- Often missing from current protocols
- Also provides an abstraction point for such
information - Knowledge itself doesnt need to be exposed.
9Two directions
- Declarative expression of Internet Routing
protocols - Loo et. al., ACM SIGCOMM 2005
- Declarative implementation of overlay networks
- Loo et. al., ACM SOSP 2005
- The focus of this talk
10Specfic case overlays
- Application level
- e.g. DHTs, P2P networks, ESM, etc.
- IP-oriented
- e.g. RON, IPVPNs, SOS, M/cast, etc.
- More generally routing fn of any large
distributed system - e.g MS Exchange, mgmt systems
11Why overlays?
- Overlays in a very broad sense
- Any application-level routing system
- Email servers, multicast, CDNs, DHTs, etc.
- ? broad applicability
- Ideal test case
- Clearly deployable short-term
- Defers interoperability issues
- Testbed for other domains
- The overlay design space is wide
- ? ensure we cover the bases
12Background
- PIER distributed relational query processor
(Huebsch et.al.) - Used DHT for hashing, trees, etc.
- Click modular s/w forwarding engine (Kohler
et.al.) - Used dataflow element graph
- XORP router (Handley et.al.)
- Dataflow approach to BGP, OSPF, etc.
13P2 A declarative overlay engine
OverLog specification
Software Dataflow Graph
Parser
Planner
Received Packets
Sent Packets
14Data Model
- Relational tuples
- Two types of named relation
- Soft-state tables
- Streams of transient tuples
- Simple, natural model for network state
- Concisely expressed in a declarative language
15Language DataLog
- Well-known relational query language from the
literature - Particularly deductive databases
- Prolog with no imperative constructs
- Equivalent to SQL with recursion
- OverLog variant of DataLog
- Streams tables
- Location specifiers for tuples
16Why DataLog?
- Advantages
- Generality allows great flexibility
- Easy to map prior optimization work
- Simple syntax, easy to extend
- Disadvantages
- Hard for imperative programmers
- Structure may not map to network concepts
- Good initial experimental vehicle
17Overlog by example
- Gossiping a mesh
- materialise(neighbour, 1, 60, infinity).
- materialise(member, 1, 60, infinity).
- gossipEvent(X) - localNode(X), periodic(X,E,10
). - gossipPartner_at_X(X,Y) - gossipEvent_at_X(X),
neighbour_at_X(Y). - member_at_Y(Z) - gossipPartner_at_X(X,Y),
coinflip(weight), member_at_X(Z).
18Software Dataflow Graph
- Elements represented as C objects
- V. efficient tuple handoff
- Virtual fn call refcounts
- Blocking/unblocking w/ continuations
- Single-threaded async i/o scheduler
?
?
19Typical dataflow elements
- Relational operators
- Select, join, aggregate, groupby
- Generalised projection (PEL)
- Networking stack
- Congestion control, routing, SAR, etc.
- Glue elements
- Queues, muxers, schedulers, etc.
- Debugging
- Loggers, watchpoints, etc.
20Evaluation Chord test case
- Why Chord?
- Quite complex overlay
- Several different data structures
- Maintenance dynamics, inc. churn
- Need to show
- We can concisely express Chords properties
- We can execute the specification with acceptable
performance
21Chord (Stoica et. al. 2001)a distributed hash
table
- Flat, cyclic key space of 160-bit identifiers
- Nodes pick a random identifier
- E.g. SHA-1 of IP address, port
- Owner of key k node with lowest ID greater than
k - Efficiently route to owner of any key in log(n)
hops
22Chord data structures
- Predecessor node
- Successor set
- log(n) next nodes
- Finger table
- Pointers to power-of-2 positions around the ring
23Chord dynamics
- Nodes join by looking up the owner of their ID
- Download successor sets from neighbours and
perform lookups for fingers - Periodically measure connectivity to successors
fingers - Stabilization continously optimizes finger table
24Example Chord in 33 rules
25Dataflow graph(some of it, at least)
26Comparison MIT Chord in C
27Perhaps a fairer comparison
- Macedon (OSDI 2004)
- State machines, timers, marshaling, embedded C
- Macedon Chord 360 lines
- 32-bit IDs, no stabilization, single successor
- P2 Chord 34 lines
- 160-bit IDs, full stabilization, log(n) successor
sets, optimized
28Performance?
- Note aim is acceptable performance, not
necessarily that of hand-coded Chord - Analogy SQL / RDBM systems
29Lookup length in hops
30Maintenance bandwidth(comparable with MIT Chord)
31Latency without churn
32Latency under churn
33Ongoing work
- More overlays!
- Pastry, parameterized small-world graphs
- Link-state, distance vector algorithms
- Assorted multicast graphs
- Proper library interface
- Code release later this summer
- Integrate discovery
- Exploit power of full query processor
- Can implement PIER in P2
- Integrated management, monitoring, measurement
34Ongoing work
- Rich seam for further research!
- The right language (SIGMOD possibly)
- Optimization techniques
- Proving safety properties
- Reconfigurable transport protocols
- Dataflow framework facilitates composition
- P2P networks introduce new space for transport
protocols - Debugging support
- Use query processor for online distributed
debugging - Potentially very powerful
35Conclusion
- Diverse overlay networks can be expressed
concisely in a OverLog - Specifications can be directly executed by P2 to
maintain the overlay - Performance of P2 overlays remains comparable
with hand-coded protocols
36Long-term implications
- An abstraction and infrastructure for radically
rethinking networking - One possibility System R for networks
- Where does the network end and the application
begin? - E.g. can run queries to monitor the network at
the endpoints - Integrate resource discovery, management, routing
- Chance to reshuffle the networking deck
37Thanks! Questions?
- Timothy Roscoe
- troscoe_at_acm.org
38Its real
39(No Transcript)
40Consistency under churn
41 Bandwidth usage under churn
42P2 A declarativeoverlay engine
- Everything is a declarative query
- Overlay construction, maintenance, routing,
monitoring - Queries compiled to software dataflow graph and
directly executed - System written from scratch (C)
- Deployable (PlanetLab, Emulab)
- Reasonable performance so far for deployed
overlays