Title: Peer-to-Peer Networks
1. Peer-to-Peer Networks
- Distributed Algorithms for P2P
- Distributed Hash Tables
P. Felber, Pascal.Felber@eurecom.fr, http://www.eurecom.fr/felber/
2. Agenda
- What are DHTs? Why are they useful?
- What makes a good DHT design
- Case studies
- Chord
- Pastry (locality)
- TOPLUS (topology-awareness)
- What are the open problems?
3. What is P2P?
- A distributed system architecture
- No centralized control
- Typically many nodes, but unreliable and heterogeneous
- Nodes are symmetric in function
- Take advantage of distributed, shared resources (bandwidth, CPU, storage) on peer nodes
- Fault-tolerant, self-organizing
- Operate in a dynamic environment; frequent joins and leaves are the norm
4. P2P Challenge: Locating Content
- Simple strategy: expanding-ring search until the content is found
- If r of N nodes have a copy, the expected search cost is at least N / r, i.e., O(N)
- Need many copies to keep overhead small
Who has this paper?
5. Directed Searches
- Idea
- Assign particular nodes to hold particular content (or know where it is)
- When a node wants this content, go to the node that is supposed to hold it (or know where it is)
- Challenges
- Avoid bottlenecks: distribute the responsibilities evenly among the existing nodes
- Adaptation to nodes joining or leaving (or failing)
- Give responsibilities to joining nodes
- Redistribute responsibilities from leaving nodes
6. Idea: Hash Tables
- A hash table associates data with keys
- A key is hashed to find a bucket in the hash table
- Each bucket is expected to hold #items/#buckets items
- In a Distributed Hash Table (DHT), nodes are the hash buckets
- A key is hashed to find the responsible peer node
- Data and load are balanced across nodes
7. DHTs: Problems
- Problem 1 (dynamicity): adding or removing nodes
- With hash mod N, virtually every key will change its location!
- h(k) mod m ≠ h(k) mod (m+1) ≠ h(k) mod (m-1)
- Solution: use consistent hashing
- Define a fixed hash space
- All hash values fall within that space and do not depend on the number of peers (hash buckets)
- Each key goes to the peer closest to its ID in the hash space (according to some proximity metric), as sketched below
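A minimal consistent-hashing sketch in Python (not from the original slides; SHA-1, the 32-bit ring size, and the use of the successor rule as the proximity metric are illustrative assumptions):

    import hashlib
    from bisect import bisect_left

    M = 32                                   # fixed hash space of size 2^M

    def h(value: str) -> int:
        """Hash into the fixed space, independent of the number of peers."""
        return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** M)

    class ConsistentHashRing:
        def __init__(self, nodes):
            # Each node owns the point h(node) on the ring.
            self.points = sorted((h(n), n) for n in nodes)

        def responsible_node(self, key: str) -> str:
            """The key goes to the first node at or after h(key), wrapping around."""
            ids = [p for p, _ in self.points]
            i = bisect_left(ids, h(key)) % len(self.points)
            return self.points[i][1]

    ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print(ring.responsible_node("some-file"))
    # Adding or removing one node only moves the keys adjacent to it on the ring.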
8. DHTs: Problems (cont'd)
- Problem 2 (size): all nodes must be known to insert or lookup data
- Works with small and static server populations
- Solution: each peer knows of only a few neighbors
- Messages are routed through neighbors via multiple hops (overlay routing)
9. What Makes a Good DHT Design?
- For each object, the node(s) responsible for that object should be reachable via a short path (small diameter)
- The different DHTs differ fundamentally only in their routing approach
- The number of neighbors of each node should remain reasonable (small degree)
- DHT routing mechanisms should be decentralized (no single point of failure or bottleneck)
- Should gracefully handle nodes joining and leaving
- Repartition the affected keys over existing nodes
- Reorganize the neighbor sets
- Bootstrap mechanisms to connect new nodes into the DHT
- To achieve good performance, the DHT must provide low stretch
- Minimize the ratio of DHT routing latency to unicast (IP) latency
10. DHT Interface
- Minimal interface (data-centric)
- Lookup(key) → IP address
- Supports a wide range of applications, because it imposes few restrictions
- Keys have no semantic meaning
- Value is application dependent
- DHTs do not store the data
- Data storage can be built on top of DHTs (see the sketch below)
- Lookup(key) → data
- Insert(key, data)
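A minimal sketch of this layering in Python (the class and helper names are hypothetical; the deck specifies only the operations, not an API, and the transport calls are stubs):

    from typing import Protocol

    class DHT(Protocol):
        def lookup(self, key: bytes) -> str:
            """lookup(key) -> IP address of the node responsible for key."""
            ...

    def send_store(node_ip: str, key: bytes, data: bytes) -> None:
        """Stub transport call; a real system would use TCP/IP here."""
        print(f"store {key!r} at {node_ip}")

    def send_fetch(node_ip: str, key: bytes) -> bytes:
        """Stub transport call."""
        return b"..."

    class Storage:
        """Data storage built on top of the DHT; the DHT itself stores no data."""
        def __init__(self, dht: DHT):
            self.dht = dht

        def insert(self, key: bytes, data: bytes) -> None:
            send_store(self.dht.lookup(key), key, data)

        def lookup(self, key: bytes) -> bytes:
            return send_fetch(self.dht.lookup(key), key)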
11. DHTs in Context
Layered view (example: the CFS file system on top of the Chord DHT):
  User Application
    | load_file / store_file
  File System (CFS): retrieve and store files, map files to blocks
    | load_block / store_block
  Reliable Block Storage (DHash): storage, replication, caching
    | lookup
  DHT (Chord): lookup, routing
    | send / receive
  Transport (TCP/IP): communication
12. DHTs Support Many Applications
- File sharing: CFS, OceanStore, PAST, ...
- Web cache: Squirrel, ...
- Censor-resistant stores: Eternity, FreeNet, ...
- Application-layer multicast: Narada, ...
- Event notification: Scribe
- Naming systems: ChordDNS, INS, ...
- Query and indexing: Kademlia, ...
- Communication primitives: I3, ...
- Backup store: HiveNet
- Web archive: Herodotus
13. DHT Case Studies
- Case Studies
- Chord
- Pastry
- TOPLUS
- Questions
- How is the hash space divided evenly among nodes?
- How do we locate a node?
- How do we maintain routing tables?
- How do we cope with (rapid) changes in membership?
14. Chord (MIT)
- Circular m-bit ID space for both keys and nodes
- Node ID: SHA-1(IP address)
- Key ID: SHA-1(key)
- A key is mapped to the first node whose ID is equal to or follows the key ID (see the sketch below)
- Each node is responsible for O(K/N) keys
- O(K/N) keys move when a node joins or leaves
[Figure: Chord ring with m = 6, IDs ranging from 0 to 2^m - 1]
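A minimal sketch of this mapping in Python, using m = 6 and the node IDs from the figure (the global view of all node IDs is for illustration only; a real Chord node finds the successor by routing, as shown on the next slides):

    import hashlib

    M = 6                                    # ID space of size 2^M, as in the figure

    def chord_id(value: str) -> int:
        """SHA-1 hash truncated to the m-bit ring."""
        return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** M)

    def successor(key_id: int, node_ids: list) -> int:
        """First node whose ID is equal to or follows the key ID (with wraparound)."""
        candidates = [n for n in sorted(node_ids) if n >= key_id]
        return candidates[0] if candidates else min(node_ids)

    nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # node IDs from the figure
    print(successor(54, nodes))                       # key K54 is stored at N56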
15. Chord: State and Lookup (1)
- Basic Chord: each node knows only 2 other nodes on the ring
- Successor
- Predecessor (for ring management)
- Lookup is achieved by forwarding requests around the ring through successor pointers
- Requires O(N) hops
[Figure: Chord ring (m = 6) with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; key K54 is reached by following successor pointers]
16. Chord: State and Lookup (2)
- Finger table: each node knows m other nodes on the ring
- Successors: finger i of n points to the node at n + 2^i (or its successor)
- Predecessor (for ring management)
- O(log N) state per node
- Lookup is achieved by following the closest preceding finger, then the successor (see the sketch below)
- O(log N) hops
[Figure: Chord ring (m = 6) with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56 and key K54; finger table of N8: N8+1 → N14, N8+2 → N14, N8+4 → N14, N8+8 → N21, N8+16 → N32, N8+32 → N42]
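A sketch of finger-based lookup in Python (node objects stand in for remote peers, so the remote calls of real Chord are simulated by direct references; the interval helper treats intervals as circular):

    M = 6

    def in_interval(x: int, a: int, b: int) -> bool:
        """True if x lies in the circular open interval (a, b) on the 2^M ring."""
        a, b = a % 2 ** M, b % 2 ** M
        return (a < x < b) if a < b else (x > a or x < b)

    class Node:
        def __init__(self, node_id: int):
            self.id = node_id
            self.successor = self          # set during join/stabilization
            self.fingers = []              # finger[i] ~ successor(id + 2^i)

        def closest_preceding_finger(self, key_id: int) -> "Node":
            for f in reversed(self.fingers):           # O(log N) state
                if in_interval(f.id, self.id, key_id):
                    return f
            return self

        def find_successor(self, key_id: int) -> "Node":
            n = self
            # Follow fingers until key_id falls between n and n.successor.
            while not in_interval(key_id, n.id, n.successor.id + 1):
                if n.successor is n:                   # single-node ring
                    return n
                nxt = n.closest_preceding_finger(key_id)
                n = nxt if nxt is not n else n.successor
            return n.successor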
17. Chord: Ring Management
- For correctness, Chord needs to maintain the following invariants
- For every key k, succ(k) is responsible for k
- Successor pointers are correctly maintained
- Finger tables are not necessary for correctness
- One can always default to successor-based lookup
- Finger tables can be updated lazily
18. Joining the Ring
- Three-step process
- Initialize all fingers of new node
- Update fingers of existing nodes
- Transfer keys from successor to new node
19. Joining the Ring: Step 1
- Initialize the new node's finger table
- Locate any node n already in the ring
- Ask n to look up the peers at j + 2^0, j + 2^1, j + 2^2
- Use the results to populate the finger table of j
20. Joining the Ring: Step 2
- Updating fingers of existing nodes
- New node j calls an update function on the existing nodes that must point to j
- These are the nodes in the ranges [pred(j) - 2^i + 1, j - 2^i], for each finger index i
- O(log N) nodes need to be updated
[Figure: node N28 joins the Chord ring (m = 6); the finger N8+16, which previously pointed to N32, is updated to point to N28]
21. Joining the Ring: Step 3
- Transfer key responsibility
- Connect to successor
- Copy keys from successor to new node
- Update successor pointer and remove keys
- Only keys in the range are transferred
22. Stabilization
- Case 1: finger tables are reasonably fresh
- Case 2: successor pointers are correct, but fingers are not
- Case 3: successor pointers are inaccurate or key migration is incomplete (MUST BE AVOIDED!)
- The stabilization algorithm periodically verifies and refreshes node pointers (including fingers)
- Basic principle (at node n), as sketched below:
- x = n.succ.pred
- if x ∈ (n, n.succ) then n.succ = x
- notify n.succ
- Eventually stabilizes the system when no node joins or fails
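The same principle as runnable Python (object references stand in for remote peers; the circular-interval helper is repeated here so the sketch is self-contained):

    M = 6

    def in_interval(x: int, a: int, b: int) -> bool:
        """True if x lies in the circular open interval (a, b) on the 2^M ring."""
        a, b = a % 2 ** M, b % 2 ** M
        return (a < x < b) if a < b else (x > a or x < b)

    class Node:
        def __init__(self, node_id: int):
            self.id = node_id
            self.successor = self
            self.predecessor = None

        def stabilize(self):
            """Run periodically: adopt any node that slipped in between us and
            our successor, then tell the successor about us."""
            x = self.successor.predecessor
            if x is not None and in_interval(x.id, self.id, self.successor.id):
                self.successor = x
            self.successor.notify(self)

        def notify(self, n: "Node"):
            """n thinks it might be our predecessor."""
            if self.predecessor is None or in_interval(n.id, self.predecessor.id, self.id):
                self.predecessor = n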
23. Dealing With Failures
- Failure of nodes might cause incorrect lookups
- N8 does not know its correct successor, so a lookup of K19 fails
- Solution: successor list
- Each node n knows its r immediate successors
- After a failure, n knows the first live successor and updates its successor list
- Correct successors guarantee correct lookups
[Figure: Chord ring (m = 6) with nodes N1, N8, N14, N18, N21, N32, N38, N42, N48, N51, N56; lookup(K19) issued at N8 fails after a node failure, even though N18 is responsible for K19]
24. Dealing With Failures (cont'd)
- Successor lists guarantee correct lookup with some probability
- Can choose r to make the probability of lookup failure arbitrarily small
- Assume half of the nodes fail and that failures are independent
- P(n's successor list all dead) = 0.5^r
- P(n does not break the Chord ring) = 1 - 0.5^r
- P(no broken nodes) = (1 - 0.5^r)^N
- r = 2 log2(N) makes this probability 1 - 1/N (see the worked instance below)
- With high probability (1 - 1/N), the ring is not broken
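Worked instance of the slide's formulas in LaTeX, plugging in r = 2 log2 N:

    P(\text{$n$'s successor list all dead}) = 0.5^{\,r} = 0.5^{\,2\log_2 N} = N^{-2}
    \qquad
    P(\text{no broken node}) = \bigl(1 - N^{-2}\bigr)^{N} \approx 1 - \tfrac{1}{N}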
25. Evolution of P2P Systems
- Nodes leave frequently, so surviving nodes must be notified of arrivals to stay connected after their original neighbors fail
- Take time t with N nodes
- Doubling time: time from t until N new nodes join
- Halving time: time from t until N nodes leave
- Half-life: minimum of halving and doubling time
- Theorem: there exists a sequence of joins and leaves such that any node that has received fewer than k notifications per half-life will be disconnected with probability at least (1 - 1/(e-1))^k ≈ 0.418^k
26. Chord and Network Topology
Nodes that are numerically close are not topologically close (1M nodes ≈ 10 hops)
27. Pastry (MSR)
- Circular m-bit ID space for both keys and nodes
- Addresses in base 2^b with m/b digits
- Node ID: SHA-1(IP address)
- Key ID: SHA-1(key)
- A key is mapped to the node whose ID is numerically closest to the key ID
[Figure: Pastry ring with m = 8 and b = 2, IDs ranging from 0 to 2^m - 1]
28. Pastry: Lookup
- Prefix routing from A to B
- At the h-th hop, arrive at a node that shares a prefix of at least h digits with B
- Example: 5324 routes to 0629 via 5324 → 0748 → 0605 → 0620 → 0629
- If there is no such node, forward the message to a neighbor numerically closer to the destination (successor): 5324 → 0748 → 0605 → 0609 → 0620 → 0629
- O(log_{2^b} N) hops
29. Pastry: State and Lookup
- For each prefix, a node knows some other node (if any) with the same prefix and a different next digit
- For instance, N0201:
- (empty prefix): N1???, N2???, N3???
- N0: N00??, N01??, N03??
- N02: N021?, N022?, N023?
- N020: N0200, N0202, N0203
- When there are multiple candidate nodes, choose the topologically closest
- This maintains good locality properties (more on that later); a next-hop sketch follows the figure below
[Figure: routing table of N0201 on a Pastry ring (m = 8, b = 2) with nodes N0002, N0122, N0212, N0221, N0233, N0322, N1113, N2001, N2120, N2222, N3001, N3033, N3200; key K2120 is routed to N2120]
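A sketch of Pastry's next-hop choice in Python (b = 2 and IDs are base-4 digit strings; the routing table and leaf set are plain Python structures, and ID-space wraparound is ignored for brevity, so this is an illustrative simplification rather than the full algorithm):

    def shared_prefix_len(a: str, b: str) -> int:
        """Number of leading digits shared by two IDs."""
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def numeric(node_id: str) -> int:
        return int(node_id, 4)                     # IDs are base-4 digit strings

    def next_hop(local_id: str, key: str, routing_table: dict, leaf_set: list):
        """routing_table maps (row, next_digit) -> node ID; leaf_set is a list of IDs."""
        # 1. If the key falls within the leaf set, go straight to the closest node.
        ids = leaf_set + [local_id]
        if min(map(numeric, ids)) <= numeric(key) <= max(map(numeric, ids)):
            return min(ids, key=lambda n: abs(numeric(n) - numeric(key)))
        # 2. Otherwise use the routing table entry that extends the shared prefix.
        h = shared_prefix_len(local_id, key)
        candidate = routing_table.get((h, key[h]))
        if candidate is not None:
            return candidate
        # 3. Fall back to any known node that is numerically closer to the key
        #    and shares at least as long a prefix with it.
        known = leaf_set + list(routing_table.values())
        closer = [n for n in known
                  if shared_prefix_len(n, key) >= h
                  and abs(numeric(n) - numeric(key)) < abs(numeric(local_id) - numeric(key))]
        return min(closer, key=lambda n: abs(numeric(n) - numeric(key)), default=None)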
30. A Pastry Routing Table
b = 2, so node IDs are written in base 4; m = 16, so IDs have m/b = 8 digits. Node ID: 10233102

Leaf set: the nodes numerically closest to the local node (MUST BE UP TO DATE)
  SMALLER: 10233033  10233021  10233001  10233000
  LARGER:  10233120  10233122  10233230  10233232

Routing table: m/b rows with 2^b - 1 entries per row. Entries in row n share their first n digits with the local node (common prefix + next digit + rest); the entry in column d has d as its next digit; the column corresponding to the local node's own digit at that position is shown as (digit); entries with no suitable node ID are left empty.
  row 0: 02212102  (1)       22301203  31203203
  row 1: (0)       11301233  12230203  13021022
  row 2: 10031203  10132102  (2)       10323302
  row 3: 10200230  10211302  10222302  (3)
  row 4: 10230322  10231000  10232121  (3)
  row 5: 10233001  (1)       10233232  (empty)
  row 6: (0)       (empty)   10233120  (empty)
  row 7: (empty)   (empty)   (2)       (empty)

Neighborhood set: the nodes closest to the local node according to the proximity metric
  13021022  10200230  11301233  31301233
  02212102  22301203  31203203  33213321
31. Pastry and Network Topology
Expected node distance increases with the row number in the routing table: smaller and smaller numerical jumps, bigger and bigger topological jumps
32. Joining
[Figure: node X with ID 0629 joins; construction of 0629's routing table]
33. Locality
- The joining phase preserves the locality property
- First, A must be near X
- Entries in row zero of A's routing table are close to A, and A is close to X, so X's row zero can be taken from A (X0 = A0)
- The distance from B to the nodes in B's row one is much larger than the distance from A to B (B is in A's row zero), so B's row one can be a reasonable choice for X's row one, C's row two for X's row two, etc.
- To avoid cascading errors, X requests the state from each of the nodes in its routing table and updates its own table with any closer node
- This scheme works pretty well in practice
- It minimizes the distance of the next routing step, with no sense of global direction
- Stretch around 2-3
34. Node Departure
- A node is considered failed when its immediate neighbors in the node ID space can no longer communicate with it
- To replace a failed node in the leaf set, a node contacts the live node with the largest index on the side of the failed node and asks for its leaf set
- To repair a failed routing table entry R_d^l, a node first contacts the node referred to by another entry R_i^l (i ≠ d) of the same row and asks for that node's entry for R_d^l
- If a member of the neighborhood set M is not responding, the node asks the other members for their M sets, checks the distance of each newly discovered node, and updates its own M set
35. CAN (Berkeley)
- Cartesian space (d-dimensional)
- The space wraps around: a d-torus
- Incrementally split the space between nodes that join
- The node (cell) responsible for key k is determined by hashing k once for each dimension
[Figure: 2-dimensional CAN space (d = 2) split into cells]
36. CAN: State and Lookup
- A node A maintains state only for its immediate neighbors (N, S, E, W)
- 2d neighbors per node
- Messages are routed to the neighbor that minimizes the Cartesian distance to the destination (see the sketch below)
- More dimensions mean faster routing but also more state
- (d N^(1/d)) / 4 hops on average
- Multiple choices: we can route around failures
[Figure: 2-dimensional CAN; node A forwards towards destination B via its neighbors N, S, E, W]
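A sketch of CAN's greedy forwarding rule in Python (coordinates are points on the unit d-torus and the neighbor set is a plain list, which simplifies the real zone bookkeeping):

    import math

    def torus_distance(p, q):
        """Cartesian distance on the unit d-torus (each coordinate wraps around)."""
        return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)))

    def next_hop(current, neighbors, target):
        """Forward to the neighbor closest to the target, if it is strictly
        closer than the current node; otherwise the current zone owns the key."""
        best = min(neighbors, key=lambda n: torus_distance(n, target), default=None)
        if best is None or torus_distance(best, target) >= torus_distance(current, target):
            return None
        return best

    # Example in d = 2: route from (0.1, 0.1) towards the key point (0.8, 0.7)
    print(next_hop((0.1, 0.1), [(0.3, 0.1), (0.1, 0.3), (0.9, 0.1)], (0.8, 0.7)))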
37. CAN: Landmark Routing
- CAN nodes do not have a pre-defined ID
- Nodes can be placed according to locality
- Use a well-known set of m landmark machines (e.g., root DNS servers)
- Each CAN node measures its RTT to each landmark
- Orders the landmarks by increasing RTT: m! possible orderings
- CAN construction
- Place nodes with the same ordering close together in the CAN
- To do so, partition the space into m! zones: m zones on x, m-1 on y, etc.
- A node interprets its ordering as the coordinates of its zone (see the sketch below)
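A sketch of the ordering-based placement in Python (the landmark names and RTT values are made up; itertools enumerates the m! possible orderings):

    from itertools import permutations

    landmarks = ["l1", "l2", "l3"]                    # m = 3 well-known machines
    all_orderings = list(permutations(landmarks))     # m! = 6 possible orderings / zones

    def zone_index(rtts: dict) -> int:
        """Order landmarks by increasing RTT and map the ordering to a zone number."""
        ordering = tuple(sorted(landmarks, key=lambda l: rtts[l]))
        return all_orderings.index(ordering)

    print(zone_index({"l1": 40.0, "l2": 10.0, "l3": 25.0}))   # ordering (l2, l3, l1) -> zone 3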
38. CAN and Network Topology
[Figure: CAN space split into m! = 6 zones labelled by landmark orderings (ABC, ACB, BAC, BCA, CAB, CBA) for m = 3 landmarks A, B, C]
Use m landmarks to split the space into m! zones
Nodes get a random zone within their ordering's zone
Topologically-close nodes tend to be in the same zone
39. Topology-Awareness
- Problem
- P2P lookup services generally do not take topology into account
- In Chord/CAN/Pastry, neighbors are often not locally nearby
- Goals
- Provide small stretch: route packets to their destination along a path that mimics the router-level shortest-path distance
- Stretch = DHT routing / IP routing
- Our solution
- TOPLUS (TOPology-centric Look-Up Service)
- An extremist design for topology-aware DHTs
40. TOPLUS Architecture
Group nodes into nested groups using IP prefixes: AS, ISP, LAN (an IP prefix is a contiguous address range of the form w.x.y.z/n)
Use the IPv4 address range (32 bits) for node IDs and key IDs
Assumption: nodes with the same IP prefix are topologically close
[Figure: the IP address space partitioned into nested groups]
41. Node State
Each node n is part of a series of telescoping sets H_i with sibling sets S_i
Node n must know all up nodes in its inner group
Node n must know one delegate node in each tier-i set S ∈ S_i
[Figure: node state across the nested groups of the IP address space]
42. Routing with the XOR Metric
- To look up key k, node n forwards the request to the node in its routing table whose ID j is closest to k according to the XOR metric (see the sketch below)
- For j = j31 j30 ... j0 and k = k31 k30 ... k0, d(j, k) = sum over i of |j_i - k_i| * 2^i (i.e., j XOR k read as an integer)
- Refinement of longest-prefix match
- Note that the closest ID is unique: d(j, k) = d(j', k) implies j = j'
- Example (8 bits)
- k = 10010110
- j = 10110110: d(j, k) = 2^5 = 32
- j' = 10001001: d(j', k) = 2^4 + 2^3 + 2^2 + 2^1 + 2^0 = 31
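The same computation in Python (the 8-bit example above, plus a helper for picking the routing-table entry closest to k; the integer literals are just the example IDs):

    def xor_distance(j: int, k: int) -> int:
        """d(j, k) = j XOR k, read as an integer."""
        return j ^ k

    k = 0b10010110
    j1 = 0b10110110
    j2 = 0b10001001
    print(xor_distance(j1, k))   # 32  (= 2^5)
    print(xor_distance(j2, k))   # 31  (= 2^4 + 2^3 + 2^2 + 2^1 + 2^0)

    def closest_entry(routing_table: list, key: int) -> int:
        """Forward to the known ID closest to the key under the XOR metric."""
        return min(routing_table, key=lambda node_id: xor_distance(node_id, key))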
43. Prefix Routing: Lookup
Compute the 32-bit key k (using a hash function)
Perform longest-prefix match against the entries in the routing table, using the XOR metric
Route the message to the node in the inner group with the closest ID (according to the XOR metric)
[Figure: lookup of key k routed down the nested IP-prefix groups]
44. TOPLUS and Network Topology
Smaller and smaller numerical and topological
jumps
Always move closer to the destination
45. Group Maintenance
- To join the system, a node n finds its closest node n'
- n copies the routing and inner-group tables of n'
- n modifies its routing table to satisfy a diversity property
- Requires that the delegate nodes of n and n' are distinct with high probability
- Allows us to find a replacement delegate in case of failure
- Upon failure, update the inner-group tables
- Lazy update of routing tables
- Membership tracking within groups (local, small)
46. On-Demand Caching
Cache data in a group (ISP, campus) with prefix w.x.y.z/r
To look up k, create k' = k with its first r bits replaced by w.x.y.z/r; the node responsible for k' holds k in its cache (see the sketch below)
Extends naturally to multiple levels (cache hierarchy)
[Figure: lookup of the rewritten key k' within the caching group]
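A sketch of the key rewrite in Python (the group prefix 128.178.0.0/16 and the key value are arbitrary examples; keys are 32-bit integers as in TOPLUS):

    def cached_key(k: int, group_prefix: int, r: int) -> int:
        """Replace the first r bits of the 32-bit key k with the group's /r prefix."""
        mask = (1 << (32 - r)) - 1                   # keep the 32 - r low-order bits of k
        return (group_prefix & ~mask & 0xFFFFFFFF) | (k & mask)

    prefix = (128 << 24) | (178 << 16)               # 128.178.0.0/16
    k = 0xDEADBEEF
    k_prime = cached_key(k, prefix, 16)
    print(hex(k_prime))                              # 0x80b2beef: this node caches k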
47. Measuring TOPLUS Stretch
- Obtained prefix information from
- BGP tables from the Oregon and Michigan universities
- Routing registries from Castify and RIPE
- Sample of 1,000 different IP addresses
- Point-to-point IP measurements using King
- TOPLUS distance: weighted average over all possible paths between source and destination
- Weights: probability of a delegate being in each group
- TOPLUS stretch = TOPLUS distance / IP distance
48. Results
- Original tree
- 250,562 distinct IP prefixes
- Up to 11 levels of nesting
- Mean stretch: 1.17
- 16-bit regrouping (prefixes longer than 16 bits aggregated to /16)
- Aggregate small tier-1 groups
- Mean stretch: 1.19
- 8-bit regrouping (prefixes longer than 16 bits aggregated to /8)
- Mean stretch: 1.28
- Original + 1: add one level with 256 8-bit prefixes
- Mean stretch: 1.9
- Artificial, 3-tier tree
- Mean stretch: 2.32
49. TOPLUS Summary
- Problems
- Non-uniform ID space (requires a bias in the hash to balance load)
- Correlated node failures
- Advantages
- Small stretch
- IP longest-prefix matching allows fast forwarding
- On-demand P2P caching is straightforward to implement
- Can easily be deployed in a static environment (e.g., a multi-site corporate network)
- Can be used as a benchmark to measure the speed of other P2P services
50. Other Issues: Hierarchical DHTs
- The Internet is organized as a hierarchy
- Should DHT designs be flat?
- Hierarchical DHTs: multiple overlays managed by possibly different DHTs (Chord, CAN, etc.)
- First, locate the group responsible for the key in the top-level DHT
- Then, find the peer in the next-level overlay, etc.
- By designating the most reliable peers as super-nodes (part of multiple overlays), the number of hops can be significantly decreased
- How can we deploy and maintain such architectures?
51. Hierarchical DHTs: Example
[Figure: a top-level Chord overlay of super-nodes s1, s2, s3, s4, each fronting a lower-level group (e.g., a CAN group and a Chord group)]
52. Other Issues: DHT Querying
- DHTs allow us to locate data very quickly...
- Lookup(Beatles/Help) → IP address
- ...but this only works for perfect matches
- Users tend to submit broad queries
- Lookup(Beatles/) → IP address
- Queries may be inaccurate
- Lookup(Beattles/Help) → IP address
- Idea: index data using partial queries as keys
- Another approach: fuzzy matching (UCSB)
53. Some Other Issues
- Better handling of failures
- In particular, Byzantine failures: a single corrupted node may compromise the system
- Reasoning about the dynamics of the system
- A large system may never achieve a quiescent ideal state
- Dealing with untrusted participants
- Data authentication, integrity of routing tables, anonymity and censorship resistance, reputation
- Traffic-awareness, load balancing
54. Conclusion
- The DHT is a simple, yet powerful abstraction
- Building block of many distributed services (file systems, application-layer multicast, distributed caches, etc.)
- Many DHT designs, with various pros and cons
- Balance between state (degree), speed of lookup (diameter), and ease of management
- The system must support rapid changes in membership
- Dealing with joins/leaves/failures is not trivial
- The dynamics of P2P networks are difficult to analyze
- Many open issues worth exploring