1. Routing on Flat Labels
- Matthew Caesar, Tyson Condie, Jayanthkumar Kannan, Karthik Lakshminarayanan, Ion Stoica, Scott Shenker
- Appeared in SIGCOMM 2006
- Presented by Jackson Pang
- CS 217B, Spring 2009
2. What's wrong with Internet addressing today?
- Hierarchical addressing allows excellent scaling properties
- But it forces addressing to conform to the network topology
- Since topology is not static, addresses can't persistently identify hosts
3. Topology-Based Addressing
- Disadvantage: complicates
  - Access control
  - Topology changes
  - Multi-homing
  - Mobility
- Advantage
  - Scalability
  - Scalability
  - Scalability
4. What's wrong with Internet addressing today?
- The concepts of location and identity are mixed together in IP
- Most network applications today require persistent identity
- It's hard to provide persistent identity in the presence of hierarchical addressing
- Need to decouple identity from addressing
  - Coupling them drastically complicates network configuration, mobility, and address assignment
- Is hierarchy the only way to scale routing?
5. Motivation for Flat Identifiers
- Stable references
  - Shouldn't have to change when an object moves
- Object replication
  - Store an object at many different locations
- Avoid fighting over names
  - Avoid cybersquatting, typosquatting
- Current: <A HREF="http://isp.com/dog.jpg">my friend's dog</A>
- Proposed: <A HREF="http://f0120123112/">my friend's dog</A>
6. Is there an alternative?
- Why not route on flat host identifiers?
  - Assign addresses independently of network topology
- Benefits
  - No separate name-resolution system required
  - Simpler network configuration/allocation/mobility
  - Simpler network-layer access controls
7. Basic idea behind ROFL
- Scalable routing on flat identifiers
- Goal 1: Scale to Internet topologies
  - How do you fetch a web page by typing google.com?
- Goal 2: Support for BGP policies
  - How do you preserve customer-provider and peering relationships?
- Highly challenging problem, but a solution would give a number of benefits
8. Basic mechanisms behind ROFL
- Goal 1: Scale to Internet topologies
  - Mechanism: DHT-style routing; maintain source-routes to successors/fingers
  - Provides: scalable network routing without aggregation
- Goal 2: Support for BGP policies
  - Mechanism: intelligently choose successors/fingers to conform to ISP relationships
  - Provides: support for the policies and operational model of BGP
9. How ROFL works
1. Hosts are assigned topology-independent flat identifiers
10. Background Ideas
- Distributed Hash Tables
  - Hashing, consistent hashing, DHT goals
- Chord
  - Join, leave, fingers, caching
  - Stoica, I., et al., "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications," IEEE/ACM Transactions on Networking, 2003
- Canon
  - Support for hierarchy so that we can do BGP-style routing
  - Ganesan, P., et al., "Canon in G Major: Designing DHTs with Hierarchical Structure," Proc. IEEE ICDCS, 2004
11. DHT Background: Hash Table
- Name-value pairs (or key-value pairs)
  - E.g., www.cnn.com/foo.html (key) and the Web page (value)
- Hash table
  - Data structure that associates keys with values
[Diagram: hash table; lookup(key) returns value]
12. DHT Background: Hash Functions
- Hashing
  - Transform the key into a number
  - And use the number to index an array
- Example hash function
  - Hash(x) = x mod 101, mapping to {0, 1, ..., 100}
  - More sophisticated ones: MD5, SHA-1, SHA-256
- Challenges (collisions, joins and leaves)
  - What if there are more than 101 nodes? Fewer?
  - Which nodes correspond to each hash value?
  - What if nodes come and go over time?
- But ROFL assumes that we've got the hashing issue covered. (A small sketch of the mod-101 example follows below.)
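To make the slide's example concrete, here is a minimal Python sketch (not from the talk) of a 101-bucket hash table using Hash(x) = x mod 101; SHA-1 is used only to turn string keys into integers, and chaining absorbs collisions:

    import hashlib

    NUM_BUCKETS = 101

    def bucket_index(key: str) -> int:
        # Turn the string key into a big integer, then reduce mod 101.
        digest = hashlib.sha1(key.encode()).hexdigest()
        return int(digest, 16) % NUM_BUCKETS

    # Each bucket chains (key, value) pairs so collisions are tolerated.
    table = [[] for _ in range(NUM_BUCKETS)]

    def put(key: str, value) -> None:
        table[bucket_index(key)].append((key, value))

    def lookup(key: str):
        for k, v in table[bucket_index(key)]:
            if k == key:
                return v
        return None

    put("www.cnn.com/foo.html", "<the web page>")
    print(lookup("www.cnn.com/foo.html"))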
13. DHT Background: Consistent Hashing
- Large, sparse identifier space (e.g., 128 bits)
- Hash a set of keys x uniformly to the large ID space
- Hash nodes to the ID space as well
[Diagram: ID space 0 .. 2^128 - 1 represented as a ring; Hash(name) -> object_id, Hash(IP_address) -> node_id]
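As an illustration of the ring mapping (a sketch; the node addresses are made up), MD5 conveniently yields exactly 128-bit identifiers, and each object is stored at the first node clockwise from its ID:

    import hashlib
    from bisect import bisect_left

    ID_SPACE = 2 ** 128

    def chash(s: str) -> int:
        # MD5 digests are exactly 128 bits, matching the slide's ID space.
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % ID_SPACE

    node_ids = sorted(chash(a) for a in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

    def successor(object_id: int) -> int:
        # First node clockwise from object_id, wrapping past 2^128 - 1.
        i = bisect_left(node_ids, object_id)
        return node_ids[i % len(node_ids)]

    print(hex(successor(chash("dog.jpg"))))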
14. DHT (Distributed Hash Table)
- Hash table spread over many nodes
  - Distributed over a wide area
- Main design goals
  - Decentralization: no central coordinator
  - Scalability: efficient even with a large number of nodes
  - Fault tolerance: tolerate nodes joining/leaving
- Two key design decisions
  - How do we map names onto nodes?
  - How do we route a request to that node?
15. Chord Background: What is Chord? What does it do?
- Supports just one operation: given a key, it maps the key onto a node
- In short: a peer-to-peer lookup service
- Solves the problem of locating a data item in a collection of distributed nodes, even under frequent node arrivals and departures
- Efficient location of data items is the core operation in most p2p systems
16. Chord Characteristics (O(log N))
- Simplicity, provable correctness, and provable performance
- Each Chord node needs routing information about only a few other nodes
- Resolves lookups via messages to other nodes (iteratively or recursively)
- Maintains routing information as nodes join and leave the system
18. DNS vs. Chord
- DNS
  - Provides a host-name-to-IP-address mapping
  - Relies on a set of special root servers
  - Names reflect administrative boundaries
  - Is specialized to finding named hosts or services
- Chord
  - Can provide the same service: name = key, value = IP
  - Requires no special servers
  - Imposes no naming structure
  - Can also be used to find data objects that are not tied to certain machines
- But that doesn't mean we replace DNS in ROFL
19. The Base Chord Protocol
- Specifies how to find the locations of keys
- How new nodes join the system
- How to recover from the failure or planned departure of existing nodes
20. Chord: Recall Consistent Hashing
- The hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1
  - ID(node) = hash(IP, Port)
  - ID(key) = hash(key)
- Practically, we hash the node's public key to get the hash key
- Leverages the benefits of consistent hashing
21. Chord Definitions: Successor Nodes
- Shows which node holds which keys
[Diagram: identifier circle with nodes 0, 1, 3 and keys 1, 2, 6; successor(1) = 1, successor(2) = 3, successor(6) = 0]
22. Node Joins and Departures
[Diagram: after node 7 joins, successor(6) = 7; after node 1 departs, successor(1) = 3]
23. Scalable Key Location
- A very small amount of routing information suffices to implement consistent hashing in a distributed environment
- Each node need only be aware of its successor node on the circle
- Queries for a given identifier can be passed around the circle via these successor pointers
- This resolution scheme is correct, BUT inefficient: it may require traversing all N nodes! (See the sketch below.)
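The sketch below (a toy 3-bit ring with nodes 0, 1, 3) shows successor-only resolution: each hop just follows a successor pointer, so a lookup may take up to N hops:

    def successor_of(node, nodes):
        larger = [n for n in sorted(nodes) if n > node]
        return larger[0] if larger else min(nodes)

    def in_range(key, a, b, modulus=8):
        # True if key lies in the half-open ring interval (a, b].
        return key != a and (key - a) % modulus <= (b - a) % modulus

    def naive_lookup(start, key, nodes):
        # Walk successor pointers until we reach the node holding the key.
        node, hops = start, 0
        while not in_range(key, node, successor_of(node, nodes)):
            node = successor_of(node, nodes)
            hops += 1
        return successor_of(node, nodes), hops

    print(naive_lookup(0, 6, {0, 1, 3}))  # key 6 is held by node 0, 2 hops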
24. Acceleration of Lookups
- Lookups are accelerated by maintaining additional routing information
- Each node maintains a routing table with at most m entries (where N = 2^m), called the finger table
- The ith entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on next slide)
- s = successor(n + 2^(i-1)) (all arithmetic mod 2^m)
- s is called the ith finger of node n, denoted n.finger(i).node (a small worked example follows below)
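A tiny sketch of the finger-table rule on the 3-bit example ring (nodes 0, 1, 3), reproducing the table shown on the next slide:

    M = 3                       # 3-bit identifiers: ring of size 2^3 = 8
    NODES = sorted([0, 1, 3])

    def successor(ident):
        # First node whose ID >= ident, wrapping around the ring.
        for n in NODES:
            if n >= ident:
                return n
        return NODES[0]

    def finger_table(n):
        # finger[i] = successor(n + 2^(i-1)) mod 2^M, for i = 1..M
        return [successor((n + 2 ** (i - 1)) % 2 ** M) for i in range(1, M + 1)]

    print(finger_table(0))  # starts 1, 2, 4 -> successors [1, 3, 0]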
25. Finger Tables (of successors)
- Finger table for node 0 on the 3-bit example ring (nodes 0, 1, 3):
start | interval | succ.
1     | [1,2)    | 1
2     | [2,4)    | 3
4     | [4,0)    | 0
26. Finger Tables: Node Joins
- Example: node 6 joins the ring with nodes 0, 1, 3
- Node 0's finger table (starts 1, 2, 4; intervals [1,2), [2,4), [4,0)): successors update from 1, 3, 0 to 1, 3, 6
- Node 3's finger table (starts 4, 5, 7; intervals [4,5), [5,7), [7,3)): successors update from 0, 0, 0 to 6, 6, 0
- Key 6 moves from node 0 to the newly joined node 6
27. Finger Tables: Node Departure
- Example: node 1 departs the ring with nodes 0, 1, 3, 6
- Node 0's finger table (starts 1, 2, 4; intervals [1,2), [2,4), [4,0)): successors update to 3, 3, 6
- Node 1's finger table (starts 2, 3, 5; intervals [2,3), [3,5), [5,1); successors 3, 3, 0) disappears with it
- Node 6's finger table (starts 7, 0, 2; intervals [7,0), [0,2), [2,6)): successors remain 0, 0, 3
- Key 1 moves from the departed node 1 to its successor, node 3
28. Source of Inconsistencies: Concurrent Operations and Failures
- A basic stabilization protocol is used to keep nodes' successor pointers up to date, which is sufficient to guarantee correctness of lookups
- Those successor pointers can then be used to verify the finger table entries
- Every node runs stabilize periodically to find newly joined nodes
29. Stabilization after Join
- n joins
  - predecessor = nil
  - n acquires ns as its successor via some existing node n'
  - n notifies ns that n is its new predecessor
  - ns acquires n as its predecessor
- np runs stabilize
  - np asks ns for its predecessor (now n)
  - np acquires n as its successor
  - np notifies n
  - n acquires np as its predecessor
- All predecessor and successor pointers are now correct
- Fingers still need to be fixed, but old fingers will still work (a sketch of stabilize/notify follows below)
[Diagram: pointer updates on the ring; succ(np) changes from ns to n, pred(ns) changes from np to n]
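A simplified, single-threaded sketch of the stabilize/notify exchange described above (real Chord runs it periodically and asynchronously; the single-node bootstrap case is elided):

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self
            self.predecessor = None

        @staticmethod
        def between(x, a, b, m=2 ** 128):
            # True if x lies strictly inside the ring interval (a, b).
            return 0 < (x - a) % m < (b - a) % m

        def stabilize(self):
            # Ask our successor for its predecessor; if a newly joined
            # node sits between us, adopt it as our new successor.
            p = self.successor.predecessor
            if p is not None and self.between(p.id, self.id, self.successor.id):
                self.successor = p
            self.successor.notify(self)

        def notify(self, candidate):
            # Adopt candidate as predecessor if it is a better fit.
            if self.predecessor is None or self.between(
                    candidate.id, self.predecessor.id, self.id):
                self.predecessor = candidate

    # Two-node demo: each round of stabilize fixes the predecessor links.
    a, b = Node(10), Node(200)
    a.successor, b.successor = b, a
    a.stabilize(); b.stabilize()
    print(a.predecessor.id, b.predecessor.id)  # 200 10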
30. Failure Recovery
- The key step in failure recovery is maintaining correct successor pointers
- To help achieve this, each node maintains a successor-list of its r nearest successors on the ring
- If node n notices that its successor has failed, it replaces it with the first live entry in the list (see the sketch below)
- stabilize will correct finger table entries and successor-list entries pointing to the failed node
- Performance is sensitive to the frequency of node joins and leaves versus the frequency at which the stabilization protocol is invoked
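A sketch of the successor-list repair step (is_alive stands in for whatever liveness probe an implementation uses):

    def repair_successor(successor_list, is_alive):
        # successor_list: our r nearest successors, nearest first.
        for candidate in successor_list:
            if is_alive(candidate):
                return candidate
        raise RuntimeError("all r successors failed; must rejoin the ring")

    # Example: the immediate successor n1 is down, so n2 takes over.
    print(repair_successor(["n1", "n2", "n3"], lambda n: n != "n1"))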
31. Hierarchical DHT: Support for Inter-domain Routing and BGP-ness
- What is a Hierarchical DHT?
[Diagram: hierarchy of domains, e.g., USA at the top, UCLA and MIT below it, CS and EE below UCLA]
- Path Locality
- Path Convergence
- Efficiency, Security, Fault isolation
- Caching, Bandwidth Optimization
- Local DHTs, Access Control
32. Crescendo: Convert Flat DHTs to Hierarchical (Canon-ization)
- Key idea: recursive structure
  - Construct bottom-up: merge smaller DHTs
  - Lowest level: Chord
33. Crescendo: Merging Two Chord Rings
- Black node x connects to y iff
  - y is closer than any other black node, and
  - y = succ(x + 2^i) (see the sketch below)
[Diagram: merged identifier ring with node IDs 0, 2, 3, 5, 8, 10, 12, 13]
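A sketch of the merge rule using the slide's node IDs (the split into "black" and "gray" rings below is an assumption for illustration): node x adds a link to gray node y iff y = succ(x + 2^i) in the merged ring and y is closer to x + 2^i than any black node:

    M = 4  # toy 4-bit identifier space

    def succ(ident, nodes, m=2 ** M):
        # First node clockwise from ident on the ring.
        return min(nodes, key=lambda n: (n - ident) % m)

    def new_links(x, black, gray, m=2 ** M):
        links = set()
        for i in range(M):
            target = (x + 2 ** i) % m
            y = succ(target, black | gray)
            # Link only if the merged-ring successor is a gray node that
            # beats every black node (x's existing finger) to the target.
            if y in gray and (y - target) % m < (succ(target, black) - target) % m:
                links.add(y)
        return links

    print(new_links(0, {0, 3, 8, 12}, {2, 5, 10, 13}))  # -> {2, 5}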
34. Now back to the ROFL: How ROFL works
1. Hosts are assigned topology-independent flat identifiers
35. Identifiers
- Identity tied to a public/private key pair
  - Everyone can know the public key
  - Only authorized parties know the private key
- Self-certifying identifier: hash of the public key (see the sketch below)
- Host associates with a hosting router
  - Proves it knows the private key, to prevent spoofing
  - The router joins the ring on the host's behalf
- Anycast
  - Multiple nodes have the same identifier
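A sketch of how a self-certifying identifier can be derived (the key bytes are a stand-in; a real host would hash its actual encoded public key):

    import hashlib

    public_key_bytes = b"-----BEGIN PUBLIC KEY----- ...example..."
    host_id = hashlib.sha1(public_key_bytes).hexdigest()
    print(host_id)  # 160-bit flat ID, independent of network topology

    # Verification needs no directory lookup: hash the presented public
    # key and compare with the claimed ID; only the holder of the matching
    # private key can then prove ownership, e.g. by signing a nonce.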
36. AS and BGP Policies (a review)
- Economic relationships
  - Peering
  - Provider/customer
- Isolation
  - Routing contained within the hierarchy
37. Isolation in ROFL (Canon)
- Traffic between two hosts traverses no higher than their lowest common provider in the AS hierarchy
- Accomplished by carefully merging the rings
38. Policy Support in ROFL (BGP support)
[Diagram: ASes A, B, and C connected by a peering link]
39. Scalability in ROFL
- Two extensions to improve locality
  - Maintain proximity-based fingers in a policy-safe fashion
  - Pointer caching strategies: prefer nearby, popular pointers
40. Evaluation
- Intra-domain
  - Trace-based, using Rocketfuel topologies of 4 large ISPs with 2-3 million hosts
  - Each host has a 128-bit ID
  - Routers have 9 Mbits of memory for the pointer cache and finger tables
- Inter-domain
  - An inter-AS topology graph is derived from RouteViews
  - Ran simulations with 30 thousand nodes and extrapolated results to 600 million hosts
41. Evaluation Results
Metric        | ROFL                                                                             | BGP+DNS
Join overhead | 450 per host; lightweight joins: 14 per host                                     | Typically 0 per host; 40,000 per prefix
Latency       | Baseline: 135 ms; with pointer caches: 70 ms                                     | With lookup: 137 ms; no lookup: 54 ms
State         | 2 million pointer-cache entries in the core; 100 entries at the edge            | 150 thousand entries at core and edge; 77 million DNS entries at root servers
Failure       | Cached pointers act as backup paths; failures affect only successors and fingers | Undergoes a convergence process; failures have global impact
46. Contributions
- Clean-slate design with novel ideas
  - A world without RIRs to issue IP addresses!
- The design includes both inter-domain and intra-domain routing issues; it did not side-step the essential features
- ROFL is very attractive for new Internet usage patterns such as multicast, mobility, and multi-homing
- Secret sauces
  - Brings up design issues we will face in a clean-slate approach
  - Puts up a good fight against the traditional IP- and DNS-based infrastructure
  - Extends already powerful ideas: DHTs, Chord, hierarchical DHTs (combined 1000 citations)
  - Honesty about "conservation of dirt": the authors didn't try to sweep the hard problems elsewhere
  - Fundamental computer science ideas: hashing, caching, recursion, doubly linked lists
47. ROFL Weaknesses
- Identity tied to a public/private key pair
  - PKI notion in the Internet: who are the trusted authorities?
  - Unlikely but possible hash collisions in the identity -> hash translation
- Per-AS link-state information is assumed to be current and precise; assumes the existence of OSPF-like link-state information
  - Accomplished via the control messages
- Needs a large pointer cache; is that OK with current technology?
- Inter-AS routing needs a large Bloom filter for each neighboring AS
  - Issues with quickly updating Bloom filters on joins and leaves?
48. References
- Stoica, I., et al., "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications," IEEE/ACM Transactions on Networking, 2003
- Ganesan, P., et al., "Canon in G Major: Designing DHTs with Hierarchical Structure," Proc. IEEE ICDCS, 2004
- Special thanks for the presentation material
  - Jennifer Rexford, Princeton
  - Markus Böhning, UC Berkeley