Title: Fault-tolerant Routing in Peer-to-Peer Systems
1 Fault-tolerant Routing in Peer-to-Peer Systems
James Aspnes, Zoë Diamadi, Gauri Shah
Yale University, PODC 2002
2 P2P network
- Bunch of peers.
- Store resources identified by keys.
- Peers subject to crash failures.
- Goal: locate resources efficiently.
3 Properties of ideal network
- Data availability
- Decentralization
- Fault-tolerance
- Scalability
- Load balancing
- Maintaining network
- Dynamic node addition/deletion
- Self-stabilization
- Efficient searching
- Incorporating geography
- Incorporating locality
4 Early P2P systems
Napster
Central server bottleneck
5 Tapestry [JKZ01]
Uses Plaxton's algorithm.
Node xyz links to *XX, x*X and xy*, where * ranges over all digits and X is any digit.
Correct one digit at a time to reach the target.
Pastry [DR01] is similar.
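The digit-by-digit correction can be sketched as follows. This is a minimal illustration, not Tapestry's actual implementation: the function name `prefix_route` is hypothetical, digits are fixed left to right, and the ID space is assumed to be fully populated so that a node matching one more digit always exists.

```python
def prefix_route(source: str, target: str) -> list[str]:
    """Return the IDs visited when one more leading digit is fixed per hop.

    Assumption (not from the slides): every ID of this length is a live node,
    so a neighbor sharing one more leading digit with the target always exists.
    """
    if len(source) != len(target):
        raise ValueError("IDs must have the same number of digits")
    path = [source]
    current = source
    for i, digit in enumerate(target):
        if current[i] != digit:
            # Follow the link that agrees with the target on digits 0..i.
            # The remaining digits could be anything; here they are simply
            # kept from the current node to make the trace easy to read.
            current = target[: i + 1] + current[i + 1:]
            path.append(current)
    return path


if __name__ == "__main__":
    print(prefix_route("325", "461"))
```

The number of hops is at most the number of digits in an ID, which is where the logarithmic routing time comes from.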
6 CAN [RFHKS01]
Partition d-dimensional coordinate space into zones.
[Figure: the unit square with corners (0,0), (1,0), (0,1) and (1,1), d = 2, partitioned into numbered zones.]
Nodes own zones, and keys are hashed to them.
Greedy routing: forward to the neighbor closest to the target.
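A rough sketch of hashing keys to zones and routing greedily between zones is given below. It simplifies heavily and is not CAN's actual protocol: the unit square is assumed to be split into `G` x `G` equal zones (real zones vary in size), there is no wrap-around, and all names (`G`, `key_to_point`, `zone_of`, `greedy_route`) are hypothetical.

```python
import hashlib

G = 4  # hypothetical: the unit square is split into G x G equal zones


def key_to_point(key: str) -> tuple[float, float]:
    """Hash a key to a point in [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2 ** 32
    y = int.from_bytes(digest[4:8], "big") / 2 ** 32
    return x, y


def zone_of(point: tuple[float, float]) -> tuple[int, int]:
    """Grid cell (column, row) that owns the point."""
    return int(point[0] * G), int(point[1] * G)


def greedy_route(start: tuple[int, int], target: tuple[int, int]):
    """Hop to the adjacent zone closest to the target zone until it is reached."""
    path = [start]
    current = start
    while current != target:
        cx, cy = current
        neighbors = [(cx + dx, cy + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= cx + dx < G and 0 <= cy + dy < G]
        # Distance is measured in zones (Manhattan distance on the grid).
        current = min(neighbors,
                      key=lambda z: abs(z[0] - target[0]) + abs(z[1] - target[1]))
        path.append(current)
    return path


if __name__ == "__main__":
    owner = zone_of(key_to_point("song.mp3"))
    print(owner, greedy_route((0, 0), owner))
```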
7 Chord [SMKKB01]
Nodes and resources are mapped to an identifier circle.
Routing table: successor nodes at distances $2^i$.
[Figure: identifier circle (n = 8) with positions 0 to 7; two nodes are annotated with the successor lists 0 0 3 and 6 6 0.]
Greedy routing: forward to the node in the routing table closest to the target.
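The routing table and the greedy step can be sketched on an 8-point identifier circle. This is a simplified illustration rather than Chord's full lookup protocol: the live node set `NODES = [0, 3, 6]` is an assumption chosen for illustration, and the helper names are hypothetical.

```python
from bisect import bisect_left

M = 3                      # identifier bits, so the circle has 2**M = 8 points
NODES = sorted([0, 3, 6])  # hypothetical set of live node identifiers


def successor(ident: int) -> int:
    """First live node at or clockwise after `ident` on the circle."""
    ident %= 2 ** M
    i = bisect_left(NODES, ident)
    return NODES[i % len(NODES)]


def finger_table(node: int) -> list[int]:
    """Successors of the points at distances 1, 2, 4, ... from `node`."""
    return [successor(node + 2 ** i) for i in range(M)]


def clockwise(a: int, b: int) -> int:
    """Clockwise distance from a to b on the identifier circle."""
    return (b - a) % 2 ** M


def route(start: int, key: int) -> list[int]:
    """Greedy routing: hop to the table entry that gets closest to the node
    owning the key without passing it. Returns the nodes visited."""
    owner = successor(key)
    path = [start]
    current = start
    while current != owner:
        # Keep only table entries that do not overshoot the owner.
        candidates = [f for f in finger_table(current)
                      if 0 < clockwise(current, f) <= clockwise(current, owner)]
        current = max(candidates, key=lambda f: clockwise(current, f))
        path.append(current)
    return path


if __name__ == "__main__":
    for node in NODES:
        print(node, finger_table(node))
    print(route(0, 5))
```

Because the table entries sit at doubling distances, each hop can cover at least half of the remaining clockwise distance when the circle is densely populated.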
8 Common underlying structure
- Underlying metric space.
- Nodes embedded in metric space.
- Location determined by key.
- Hashing to balance load.
- Greedy routing.
- O(log n) space at each node.
- O(log n) routing time.
9 Unifying approach
[Figure: keys are mapped by HASH to nodes v1 to v4. In the virtual overlay network, a virtual route follows virtual links; in the physical network, the actual route v1 v2 v3 v4 follows physical links.]
10 Link Distribution
Each node independently selects k long-hop links according to some distribution.
[Figure: node x on a line of nodes, with long-hop links to x-d1 and x-d2.]
11 Abstract model
Simple metric space: 1D line.
Hash(key) = metric space location.
Short-hop links: immediate neighbors.
Long-hop links: inverse-distance distribution,
$\Pr[\mathrm{edge}(u,v)] = \dfrac{1/d(u,v)}{\sum_{v' \ne u} 1/d(u,v')}$.
Greedy routing: forward message to the neighbor closest to the target in metric space.
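A small sketch of this model on a line of n nodes follows. It is an illustration under stated assumptions, not the authors' simulator: links are treated as one-way, long-hop targets are drawn with replacement, and the function names are hypothetical.

```python
import random


def sample_long_hops(u: int, n: int, k: int, rng: random.Random) -> list[int]:
    """Pick k long-hop targets for node u with Pr[v] proportional to 1/|u - v|."""
    others = [v for v in range(n) if v != u]
    weights = [1.0 / abs(u - v) for v in others]
    return rng.choices(others, weights=weights, k=k)


def build_links(n: int, k: int, seed: int = 0) -> dict[int, set[int]]:
    """Short hops to immediate neighbors plus k sampled long hops per node."""
    rng = random.Random(seed)
    links = {}
    for u in range(n):
        neighbors = {v for v in (u - 1, u + 1) if 0 <= v < n}
        neighbors.update(sample_long_hops(u, n, k, rng))
        links[u] = neighbors
    return links


def greedy_route(links: dict[int, set[int]], source: int, target: int) -> list[int]:
    """Forward to the neighbor closest to the target until it is reached."""
    path = [source]
    current = source
    while current != target:
        # A short hop toward the target always exists, so distance shrinks.
        current = min(links[current], key=lambda v: abs(v - target))
        path.append(current)
    return path


if __name__ == "__main__":
    links = build_links(n=1024, k=10)
    print(len(greedy_route(links, 0, 1023)) - 1, "hops")
```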
12 What do we care about?
- Do we get similar upper bounds on routing time with failures?
- Is it possible to design a link distribution that beats the $O(\log^2 n)$ bound for routing given by the 1/d distribution?
- Can we dynamically construct such a network?
13 Greedy routing with failures
Analyze message delivery in phases [Kleinberg 99].
[Figure: phases 0, 1 and 2 around target t.]
Message at node n is in phase i if $2^i \le d(n, t) < 2^{i+1}$.
At most $(\log n + 1)$ such phases.
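The phase of a message is just the integer part of the base-2 logarithm of its distance to the target. A minimal sketch, with hypothetical function names and integer distances assumed:

```python
def phase(distance: int) -> int:
    """Index i with 2**i <= distance < 2**(i + 1), for distance >= 1."""
    if distance < 1:
        raise ValueError("the message has already reached the target")
    return distance.bit_length() - 1


def phases_on_route(distances: list[int]) -> list[int]:
    """Phase of the message at each hop, given its distance to the target."""
    return [phase(d) for d in distances]


if __name__ == "__main__":
    # Distances to the target along one hypothetical route.
    print(phases_on_route([100, 37, 9, 2, 1]))
```

Since greedy routing never increases the distance, the phase index never increases along a route, which is why the analysis can bound the time spent in each phase separately.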
14 [1..log n] long-hop links
Suppose each node has k long-hop links.
Average time spent in each phase: $O((\log n)/k)$.
With $O(\log n)$ such phases, total time: $O((\log^2 n)/k)$.
With failures: suppose each node/link fails with probability $(1-p)$.
Average time spent in each phase: $O((\log n)/(pk))$.
Total time: $O((\log^2 n)/(pk))$.
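The effect of failures can be explored with a small simulation. This is a sketch under assumptions that the slides do not spell out: only long-hop links fail, each independently with probability 1 - p; short-hop links never fail; and each node's long-hop links are sampled lazily when the message arrives. The function names are hypothetical, and the sketch makes no claim about constants.

```python
import random


def route_with_failures(n, k, p, source, target, rng):
    """Greedy-route on the line 0..n-1 and return the hop count.

    Long-hop links are drawn with Pr[v] proportional to 1/d and each one
    survives independently with probability p. Assumption: short-hop links
    to immediate neighbors never fail, so the message always makes progress.
    """
    current, hops = source, 0
    while current != target:
        others = [v for v in range(n) if v != current]
        weights = [1.0 / abs(current - v) for v in others]
        long_hops = [v for v in rng.choices(others, weights=weights, k=k)
                     if rng.random() < p]
        short_hops = [v for v in (current - 1, current + 1) if 0 <= v < n]
        current = min(short_hops + long_hops, key=lambda v: abs(v - target))
        hops += 1
    return hops


def mean_hops(n, k, p, trials=50, seed=1):
    """Average hop count over random source/target pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        source, target = rng.sample(range(n), 2)
        total += route_with_failures(n, k, p, source, target, rng)
    return total / trials


if __name__ == "__main__":
    for p in (1.0, 0.5, 0.25):
        print(p, mean_hops(n=512, k=9, p=p))
```

Running it for several values of p gives a rough feel for how the hop count grows as fewer long-hop links survive.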
15 Simulation results
n = 131072 nodes, log n = 17 links.
What happens with > log n links?
16 What do we care about?
- Do we get similar upper bounds on routing time with failures?
- Is it possible to design a link distribution that beats the $O(\log^2 n)$ bound for routing given by the 1/d distribution?
  - Lower bound on routing time as a function of number of links per node.
- Can we dynamically construct such a network?
17 Intuition for lower bound [KUW88]
Time T needed for a non-increasing real-valued Markov chain $X_0, X_1, X_2, \ldots$ to drop to 1 is bounded by
$E[T] \le \int_1^{X_0} \frac{dz}{\mu_z}$,
where $\mu_z = E[X_t - X_{t+1} \mid X_t = z]$ is a non-decreasing function of z.
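As a worked instance of this kind of bound, suppose (purely as an assumption for illustration) that the chain halves in expectation at every step, so that the expected drop from state z is $\mu_z = z/2$, which is non-decreasing in z. Using the integral form in which the [KUW88] bound is usually stated, with T the time to reach 1:

```latex
\[
  E[T] \;\le\; \int_{1}^{X_0} \frac{dz}{\mu_z}
       \;=\; \int_{1}^{X_0} \frac{2}{z}\,dz
       \;=\; 2 \ln X_0 ,
  \qquad \text{with } \mu_z = \tfrac{z}{2}.
\]
\[
  X_0 = 2^{17} = 131072 \;\Rightarrow\; 2 \ln X_0 = 34 \ln 2 \approx 23.6 .
\]
```

A drop proportional to the current state gives a logarithmic bound; slower expected progress makes the integrand, and hence the bound, larger.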
18 Upper bound on time
[Figure: a trip between SFO and NYC, with points x and z marked.]
Starting from x, average speed at z .
gives lower bound on average crossing speed.
( is non-decreasing so )
gives upper bound on time.
19 Lower bound on time
[Figure: a trip between SFO and NYC, with points x and z marked.]
gives upper bound on average crossing speed.
mz sup
gives lower bound on time.
This may give too large an estimate, so condition
against high bursts of speed.
20 Tool for lower bound
Non-increasing Markov chain $X_0, X_1, X_2, \ldots$ with state space S.
Few long jumps
21 Applying tool to routing
Cannot bound progress of single node with an
arbitrary distribution!
22 Lower bounds
Random graph G. Node x has k independent links on average. x links to (x-1) and (x+1).
Expected time to reach 0 from a point chosen uniformly from 1..n
Probability of choosing links is symmetric about 0 and unimodal.
A bound of order $\ln^2 n$ is worse than the $O(\ln n)$ for a tree: the cost of assuming symmetry between nodes.
23 What do we care about?
- Do we get similar upper bounds on routing time with failures?
- Is it possible to design a link distribution that beats the $O(\log^2 n)$ bound for routing given by the 1/d distribution?
- Can we dynamically construct such a network?
24 Heuristic for construction
New node chooses neighbors using the inverse-distance distribution.
Links to the live nodes closest to the chosen ones.
Selects older nodes to point to it.
[Figure legend: new link, adjusted link, initial link, ideal link; nodes x and y; older node, new node, absent node.]
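One way to read this heuristic is sketched below. The rule used for older nodes is an assumption, not taken from the slides: every link remembers the ideal endpoint it originally drew, and an older node redirects a link to the new node whenever the new node is strictly closer to that ideal endpoint than the link's current target. The class and method names are hypothetical.

```python
import random


class Overlay:
    """Nodes on the line 0..size-1; each link remembers its ideal endpoint."""

    def __init__(self, size: int, k: int, seed: int = 0):
        self.size, self.k = size, k
        self.rng = random.Random(seed)
        self.links = {}  # node -> list of [ideal_target, actual_target]

    def _ideal_targets(self, x: int) -> list[int]:
        # Inverse-distance choice over all positions, live or absent.
        others = [v for v in range(self.size) if v != x]
        weights = [1.0 / abs(x - v) for v in others]
        return self.rng.choices(others, weights=weights, k=self.k)

    def _closest_live(self, ideal: int, exclude: int):
        live = [v for v in self.links if v != exclude]
        return min(live, key=lambda v: abs(v - ideal)) if live else None

    def join(self, x: int) -> None:
        """Add a new node at position x (assumed not to be live already)."""
        # Older nodes adjust a link when x serves its ideal endpoint better.
        for y_links in self.links.values():
            for link in y_links:
                ideal, actual = link
                if actual is None or abs(x - ideal) < abs(actual - ideal):
                    link[1] = x
        # The new node links to the live node closest to each ideal target.
        self.links[x] = [[t, self._closest_live(t, exclude=x)]
                         for t in self._ideal_targets(x)]


if __name__ == "__main__":
    overlay = Overlay(size=64, k=3)
    for position in (10, 40, 25, 60, 3, 33):
        overlay.join(position)
    print(overlay.links[10])
```

With this rule, every link always points at the live node closest to its ideal endpoint, so the network approaches the ideal distribution as absent positions fill in.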
25 Open problems
- Does the lower bound generalize to multidimensional metric spaces?
- Does backtracking give a provably good routing bound?
- Analyze security properties such as anonymity and Byzantine failures.