LOOKING UP DATA IN P2P SYSTEMS

Provided by: Pari83
Learn more at: https://www2.cs.uh.edu

1
LOOKING UP DATA IN P2P SYSTEMS
  • Hari Balakrishnan, M. Frans Kaashoek, David
    Karger, Robert Morris, Ion Stoica
  • MIT LCS

2
Key Idea
  • Survey paper
  • Discusses how to access data in a P2P system
  • Covers four solutions
      • CAN
      • Chord
      • Pastry
      • Tapestry

3
INTRODUCTION
  • P2P systems are popular because they offer
      • Low startup cost
      • High scalability at very low cost
      • Use of resources that would otherwise remain
        unused
      • Potential for greater robustness
      • A fully decentralized and distributed design

4
The lookup problem
  • How do we locate data in large P2P systems?
  • One solution: distributed hash tables (DHTs)

5
Previous solutions (I)
  • Centralized database
      • Napster
      • Not scalable
      • Vulnerable to attacks on the database

6
Previous solutions (II)
  • Broadcasting
      • Clients broadcast their requests to their
        neighbors, which forward them to their own
        neighbors, and so on
      • Gnutella
      • Does not scale either: broadcast messages
        consume too much bandwidth
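The bandwidth cost of broadcasting can be seen in a minimal flooding sketch. It assumes, Gnutella-style, that each query carries a hop limit (TTL) and that a node forwards only the first copy it receives; the function name, graph format and message accounting are illustrative, not Gnutella's actual wire protocol.

```python
from collections import deque


def flood(graph, source, ttl):
    """Flood a query from `source` through `graph` (node -> list of neighbors).

    A node forwards the first copy it receives to all neighbors except the
    sender, until the hop limit `ttl` is used up. Duplicate copies that reach
    an already-visited node are counted as messages but not forwarded again.
    Returns (number of nodes reached, number of messages sent).
    """
    seen = {source}
    messages = 0
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for nbr in graph[node]:
            if nbr == sender:
                continue
            messages += 1
            if nbr not in seen:  # first copy: forward it further
                seen.add(nbr)
                queue.append((nbr, node, hops_left - 1))
    return len(seen), messages


# Five fully connected nodes: reaching 4 peers costs 16 messages with TTL 2.
full = {i: [j for j in range(5) if j != i] for i in range(5)}
print(flood(full, source=0, ttl=2))  # (5, 16)
```

Twelve of the sixteen messages are duplicates that reach nodes already holding the query; in a large, well-connected overlay this waste grows with every extra hop of TTL.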

7
Previous solutions (III)
  • Hierarchy, as in the Internet DNS
      • Organizes network nodes into a hierarchy
      • All searches start at the top of the hierarchy
        and propagate down
      • Used by KaZaA, Grokster and others
      • Nodes higher in the tree do much more work than
        lower nodes
      • Vulnerable to the loss of the root node(s)

8
Previous solutions (IV)
  • Freenet
      • Forwards queries from node to node until the
        requested data are found
      • Emphasis is on anonymity, not performance
      • Unpopular documents may become inaccessible
      • Nobody cares!

9
DISTRIBUTED HASH TABLES
  • Implements the primitive lookup(key)
      • Produces a path from a starting node n0 to the
        node holding the key
  • The big tradeoff is between
      • Keeping paths short
      • Minimizing the state information kept by nodes
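To make the tradeoff concrete, here is a minimal sketch (not from the paper) of one extreme: every node keeps a single entry of state, its successor on an identifier ring, and the lookup simply walks clockwise. The function name and the sorted-list representation of the ring are hypothetical.

```python
def lookup_successor_only(node_ids, start, key):
    """Route a lookup clockwise around an ID ring, one successor at a time.

    Each node stores one pointer (its successor), so the path can grow to
    N - 1 hops. Returns the nodes visited, ending at the node responsible
    for `key` (the first node whose ID is >= key, wrapping past the top).
    """
    ids = sorted(node_ids)
    owner = next((n for n in ids if n >= key), ids[0])
    path = [start]
    current = start
    while current != owner:
        # Simulate following this node's single successor pointer.
        current = ids[(ids.index(current) + 1) % len(ids)]
        path.append(current)
    return path


# Eight nodes at IDs 0, 8, ..., 56; key 50 belongs to node 56.
print(lookup_successor_only(range(0, 64, 8), start=0, key=50))
# [0, 8, 16, 24, 32, 40, 48, 56]
```

At the other extreme, a node that stores all N - 1 other nodes reaches any owner in one hop. The designs that follow sit between these two points.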

10
Main design issues
  • Mapping keys to nodes in a balanced way
      • Use a hash function
  • Forwarding a lookup for a key to the appropriate
    node
      • At each step, find a node closer to the node
        holding the key
  • Building routing tables
      • Each node should know its successor
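A minimal sketch of the first design issue: hash both key names and node names into one identifier space, then give each key to the first node ID at or after it. SHA-1 is used here because Chord uses it; the 16-bit ID space, the names `node-i` and `key-k`, and the successor rule (Chord's; CAN, Pastry and Tapestry map keys to nodes differently) are assumptions for illustration.

```python
import hashlib
from bisect import bisect_left

M = 16  # assumption: a 16-bit identifier space, small enough to read


def to_id(name: str) -> int:
    """Hash a key or node name into the shared M-bit ID space with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def owner(sorted_node_ids, key_id):
    """Node responsible for key_id: first node ID >= key_id, wrapping around."""
    i = bisect_left(sorted_node_ids, key_id)
    return sorted_node_ids[i % len(sorted_node_ids)]


nodes = sorted(to_id(f"node-{i}") for i in range(10))
load = {n: 0 for n in nodes}
for k in range(1000):
    load[owner(nodes, to_id(f"key-{k}"))] += 1
print(load)  # number of keys assigned to each node ID
```

Because the hash spreads IDs uniformly, each node receives keys roughly in proportion to the arc of the ring it covers, with no central directory involved.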

11
CAN
  • Uses a d-dimensional key space
      • Partitioned into hyper-rectangles called
        "zones"
  • Each node manages a zone
      • Responsible for all keys in its zone
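A minimal sketch of key ownership in CAN, under stated assumptions: each key is hashed to a point in the unit d-cube (here one salted SHA-1 digest per coordinate, which is just one way to get a uniform point), and the owner is the node whose zone contains that point. The zones are those of the two-dimensional example that follows; the labels A to F are hypothetical.

```python
import hashlib

ZONES = {  # hypothetical labels; each zone is (lower corner, upper corner)
    "A": ((0.0, 0.5), (0.5, 1.0)),
    "B": ((0.5, 0.5), (1.0, 1.0)),
    "C": ((0.0, 0.0), (0.5, 0.5)),
    "D": ((0.5, 0.25), (0.75, 0.5)),
    "E": ((0.75, 0.0), (1.0, 0.5)),
    "F": ((0.5, 0.0), (0.75, 0.25)),
}


def key_to_point(key: str, d: int):
    """Map a key to a point in [0, 1)^d, one hashed coordinate per dimension."""
    coords = []
    for i in range(d):
        h = hashlib.sha1(f"{i}:{key}".encode("utf-8")).digest()
        coords.append(int.from_bytes(h[:8], "big") / 2 ** 64)
    return tuple(coords)


def contains(zone, point):
    """Half-open containment test: lo <= x < hi in every dimension."""
    lo, hi = zone
    return all(l <= x < h for l, x, h in zip(lo, point, hi))


def owner(zones, point):
    """Name of the node whose zone contains the point."""
    for name, zone in zones.items():
        if contains(zone, point):
            return name
    raise ValueError("zones do not cover the point")


print(owner(ZONES, (0.25, 0.3)))  # C
print(owner(ZONES, (0.8, 0.8)))   # B
print(owner(ZONES, key_to_point("song.mp3", 2)))
```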

12
Neighbors
  • Each node keeps track of the addresses of all its
    neighbors
      • This is its routing table
  • Neighbors are nodes whose zones share a
    (d-1)-dimensional hyperplane
      • Zones that touch along fewer dimensions (e.g.,
        only at a corner) do not count
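The neighbor rule above can be written as a short test on axis-aligned zones: two zones are neighbors when their extents overlap in d - 1 dimensions and meet end to end in exactly one. This is a sketch that ignores the wrap-around of the key space; the zone labels A to F for the two-dimensional example are hypothetical.

```python
ZONES = {  # hypothetical labels; each zone is (lower corner, upper corner)
    "A": ((0.0, 0.5), (0.5, 1.0)),
    "B": ((0.5, 0.5), (1.0, 1.0)),
    "C": ((0.0, 0.0), (0.5, 0.5)),
    "D": ((0.5, 0.25), (0.75, 0.5)),
    "E": ((0.75, 0.0), (1.0, 0.5)),
    "F": ((0.5, 0.0), (0.75, 0.25)),
}


def are_neighbors(z1, z2, eps=1e-9):
    """True if the zones share a (d-1)-dimensional face (no wrap-around)."""
    (lo1, hi1), (lo2, hi2) = z1, z2
    touching = overlapping = 0
    for a_lo, a_hi, b_lo, b_hi in zip(lo1, hi1, lo2, hi2):
        if min(a_hi, b_hi) - max(a_lo, b_lo) > eps:
            overlapping += 1  # extents overlap with positive length
        elif abs(a_hi - b_lo) <= eps or abs(b_hi - a_lo) <= eps:
            touching += 1     # extents meet end to end
        else:
            return False      # a gap separates the zones
    return touching == 1 and overlapping == len(lo1) - 1


print(sorted(n for n in ZONES if n != "C" and are_neighbors(ZONES["C"], ZONES[n])))
# ['A', 'D', 'F']
```

Zone B touches zone C only at the corner (0.5, 0.5), so it fails the test: the zones meet end to end in both dimensions instead of one.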

13
A two-dimensional example (I)
14
A two-dimensional example (II)
[Figure: two-dimensional key space from (0, 0) to (1, 1); labeled zones span (0, 0.5) to (0.5, 1), (0.5, 0.5) to (1, 1), (0.5, 0.25) to (0.75, 0.5), (0.75, 0) to (1, 0.5) and (0.5, 0) to (0.75, 0.25)]
In reality, the key space wraps around at its edges.
15
A path from (0.25, 0.3) to (0.8, 0.8)
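[Figure: two-dimensional key space from (0, 0) to (1, 1), divided into zones spanning (0, 0.5) to (0.5, 1), (0.5, 0.5) to (1, 1), (0, 0) to (0.5, 0.5), (0.5, 0.25) to (0.75, 0.5), (0.5, 0) to (0.75, 0.25) and (0.75, 0) to (1, 0.5)]
In reality, the key space wraps around at its edges.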
16
Lookup
  • Routing tries to approximate the straight path
    between current zone and zone holding the key
  • Various optimizations attempt to reduce lookup
    latency
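A minimal sketch of greedy CAN-style forwarding on the six-zone example, routing from (0.25, 0.3) to (0.8, 0.8). It assumes each hop goes to the neighbor whose zone center is nearest to the target in straight-line distance, with no wrap-around; CAN's latency optimizations are not modeled, and the zone labels A to F are hypothetical.

```python
import math

ZONES = {  # hypothetical labels; each zone is (lower corner, upper corner)
    "A": ((0.0, 0.5), (0.5, 1.0)),
    "B": ((0.5, 0.5), (1.0, 1.0)),
    "C": ((0.0, 0.0), (0.5, 0.5)),
    "D": ((0.5, 0.25), (0.75, 0.5)),
    "E": ((0.75, 0.0), (1.0, 0.5)),
    "F": ((0.5, 0.0), (0.75, 0.25)),
}


def contains(zone, p):
    lo, hi = zone
    return all(l <= x < h for l, x, h in zip(lo, p, hi))


def center(zone):
    lo, hi = zone
    return tuple((l + h) / 2 for l, h in zip(lo, hi))


def are_neighbors(z1, z2):
    """Share a (d-1)-dim face: overlap in d-1 dimensions, touch in exactly one."""
    touch = overlap = 0
    for a_lo, a_hi, b_lo, b_hi in zip(*z1, *z2):
        if min(a_hi, b_hi) > max(a_lo, b_lo):
            overlap += 1
        elif a_hi == b_lo or b_hi == a_lo:
            touch += 1
        else:
            return False
    return touch == 1 and overlap == len(z1[0]) - 1


def route(zones, start_point, target, max_hops=32):
    """Greedy routing; returns the zones visited from start to target owner."""
    current = next(n for n, z in zones.items() if contains(z, start_point))
    path = [current]
    while not contains(zones[current], target):
        nbrs = [n for n in zones
                if n != current and are_neighbors(zones[current], zones[n])]
        # Step to the neighbor whose center is closest to the target point.
        current = min(nbrs, key=lambda n: math.dist(center(zones[n]), target))
        path.append(current)
        if len(path) > max_hops:
            raise RuntimeError("greedy routing did not converge")
    return path


print(route(ZONES, (0.25, 0.3), (0.8, 0.8)))  # ['C', 'D', 'B']
```

The path takes two hops, approximating the straight line between the two points: zone B is not reachable directly from C because the two zones meet only at a corner.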

17
Dynamic behavior
  • When a node joins the network, it
      • Picks a random point in the space
      • Finds the node managing the zone containing
        that point
      • Splits that node's zone with it
  • When a node departs
      • Zones are merged
      • A more complex process
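A minimal sketch of the join step. Assumptions: the owner's zone is halved along its longest side (ties go to the first dimension), and the new node takes the half containing its chosen point, so the keys in that half move to it. This is a simplification of CAN's actual split order; the node names are hypothetical, and a real node would draw its point with something like `random.random()` per dimension.

```python
def split(zone, point):
    """Halve a zone along its longest side; return (kept_half, new_half).

    new_half is the half containing `point`.
    """
    lo, hi = list(zone[0]), list(zone[1])
    dim = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[dim] + hi[dim]) / 2
    lower = (tuple(lo), tuple(hi[:dim] + [mid] + hi[dim + 1:]))
    upper = (tuple(lo[:dim] + [mid] + lo[dim + 1:]), tuple(hi))
    if point[dim] < mid:
        return upper, lower
    return lower, upper


def join(zones, new_node, point):
    """Find the zone containing `point` and split it with `new_node`."""
    owner = next(n for n, (lo, hi) in zones.items()
                 if all(l <= p < h for l, p, h in zip(lo, point, hi)))
    kept, taken = split(zones[owner], point)
    zones[owner] = kept
    zones[new_node] = taken  # keys whose points lie here move to new_node
    return owner


zones = {"n1": ((0.0, 0.0), (1.0, 1.0))}
join(zones, "n2", (0.8, 0.3))
join(zones, "n3", (0.9, 0.9))
print(zones)
# {'n1': ((0.0, 0.0), (0.5, 1.0)), 'n2': ((0.5, 0.0), (1.0, 0.5)), 'n3': ((0.5, 0.5), (1.0, 1.0))}
```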

18
Fault-tolerance
  • When a node fails, the neighbor with the smallest
    zone takes over its zone
  • Multiple failures may leave too many nodes
    handling multiple zones
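A minimal sketch of the takeover rule, assuming each node already knows its neighbors and the sizes of their zones. Ownership is kept as a list of zones per node, so the heir visibly ends up managing two zones. Zone labels A to F and the neighbor list for D follow the two-dimensional example and are hypothetical names.

```python
def volume(zone):
    """Volume (area in 2-d) of an axis-aligned zone."""
    lo, hi = zone
    v = 1.0
    for l, h in zip(lo, hi):
        v *= h - l
    return v


def take_over(owned, failed, neighbors):
    """Hand the failed node's zones to its neighbor with the least total volume.

    owned maps node -> list of zones it manages; neighbors maps node -> names.
    """
    heir = min(neighbors[failed],
               key=lambda n: sum(volume(z) for z in owned[n]))
    owned[heir].extend(owned.pop(failed))
    return heir


owned = {
    "A": [((0.0, 0.5), (0.5, 1.0))], "B": [((0.5, 0.5), (1.0, 1.0))],
    "C": [((0.0, 0.0), (0.5, 0.5))], "D": [((0.5, 0.25), (0.75, 0.5))],
    "E": [((0.75, 0.0), (1.0, 0.5))], "F": [((0.5, 0.0), (0.75, 0.25))],
}
print(take_over(owned, "D", {"D": ["B", "C", "E", "F"]}))  # F
print(len(owned["F"]))  # 2: F now manages its own zone and D's
```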

19
CHORD
  • Assigns IDs to keys and nodes in the same
    address space
  • IDs are organized in a ring
      • ID 0 follows the highest ID
  • Each key is stored at its successor: the first
    node whose ID is equal to or follows the key's ID
      • Each node is thus responsible for the keys
        between its predecessor's ID and its own

20
Example
[Figure: Chord ring with nodes N4, N12, N20 and N24 and keys K1, K6, K10 and K15]
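The ring example can be checked with a few lines of code applying Chord's successor rule. The slide does not give the size of the ID space; 5-bit IDs (0 to 31) are assumed here, and the function name is illustrative.

```python
def successor(node_ids, key_id, id_bits=5):
    """Node responsible for key_id: the first node ID >= key_id,
    wrapping around to the smallest node ID past the top of the ring."""
    k = key_id % (2 ** id_bits)
    later = [n for n in sorted(node_ids) if n >= k]
    return later[0] if later else min(node_ids)


nodes = [4, 12, 20, 24]
print({k: successor(nodes, k) for k in [1, 6, 10, 15]})
# {1: 4, 6: 12, 10: 12, 15: 20}
```

So N12 stores both K6 and K10, and a key with ID 25 or more would wrap around to N4.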
21
Finger table
  • Each node keeps a finger table containing the IP
    addresses of nodes
      • Halfway around the key space
      • A quarter of the way around
      • An eighth of the way around, and so on
  • The table has log N entries
  • Allows O(log N) lookups

22
Partial example
[Figure: partial finger pointers on the ring with nodes N4, N12, N20 and N24]
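A minimal sketch of finger tables on the example ring, assuming 5-bit IDs. It uses Chord's rule finger[i] = successor(n + 2^i), so the last finger points about halfway around the ring and the one before it about a quarter of the way. The lookup runs iteratively with a global view of all tables, purely for illustration: each hop jumps to the farthest finger that still precedes the key.

```python
M = 5            # assumption: 5-bit IDs, a ring of 32
RING = 2 ** M


def successor(nodes, ident):
    """First node whose ID is >= ident (mod RING), wrapping past 0."""
    later = [n for n in sorted(nodes) if n >= ident % RING]
    return later[0] if later else min(nodes)


def between(x, a, b):
    """True if x lies in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def fingers(nodes, n):
    """finger[i] = successor(n + 2**i) for i = 0 .. M-1; finger[0] is the successor."""
    return [successor(nodes, n + 2 ** i) for i in range(M)]


def lookup(nodes, start, key):
    """Return (node responsible for key, list of nodes visited)."""
    tables = {n: fingers(nodes, n) for n in nodes}
    n, path = start, [start]
    while not between(key, n, tables[n][0]):
        # Farthest finger strictly between n and the key on the ring.
        n = next(f for f in reversed(tables[n]) if between(f, n, key) and f != key)
        path.append(n)
    owner = tables[n][0]
    if owner != n:
        path.append(owner)
    return owner, path


nodes = [4, 12, 20, 24]
print(fingers(nodes, 4))       # [12, 12, 12, 12, 20]
print(lookup(nodes, 4, 15))    # (20, [4, 12, 20])
print(lookup(nodes, 12, 1))    # (4, [12, 20, 24, 4])
```

With only four nodes several fingers coincide; in a large ring the fingers are distinct, and each hop at least halves the remaining distance to the key, which is where the O(log N) bound comes from.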
23
Fault-tolerance
  • Each node has a successor list
      • Contains the IP addresses of its next r
        successors
  • Guarantees routing progress unless all r
    successors fail at the same time
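A minimal sketch of how a successor list keeps routing alive: when the immediate successor is down, a node forwards to the next live entry. The ring is a sorted list of node IDs from the earlier example; the set `alive` stands in for whatever failure detection a real node would use (an assumption).

```python
def successor_list(ring, n, r):
    """The r nodes following n clockwise; ring is the sorted list of node IDs.
    Assumes r < len(ring), so n never appears in its own list."""
    i = ring.index(n)
    return [ring[(i + k) % len(ring)] for k in range(1, r + 1)]


def first_live_successor(ring, n, r, alive):
    """First entry of n's successor list that is still alive."""
    for s in successor_list(ring, n, r):
        if s in alive:
            return s
    return None  # all r successors failed: progress is no longer guaranteed


ring = [4, 12, 20, 24]
print(successor_list(ring, 20, 2))                    # [24, 4]
print(first_live_successor(ring, 4, 2, {4, 20, 24}))  # 20 (N12 failed)
print(first_live_successor(ring, 4, 2, {4, 24}))      # None (N12 and N20 failed)
```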

24
Dynamic behavior
  • New node n learns its place in the Chord ring by
    asking any existing node to do a lookup(n)
  • It must also
      • Update the successor list of its predecessor
      • Create its own successor list
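A simplified sketch of these join steps on a sorted list of node IDs. The global view stands in for the real protocol: in Chord, the successor would come from lookup(n) issued through an existing node. Node IDs are assumed unique, and the function name is illustrative.

```python
from bisect import bisect_left, insort


def join(ring, new_id, r):
    """Insert new_id into the ring; return the state the join must set up.

    Returns (successor, predecessor, new node's successor list,
    predecessor's updated successor list).
    """
    i = bisect_left(ring, new_id)
    succ = ring[i % len(ring)]  # what lookup(new_id) would return
    pred = ring[i - 1]          # i - 1 == -1 wraps to the largest ID
    insort(ring, new_id)

    def succ_list(n):
        j = ring.index(n)
        return [ring[(j + k) % len(ring)] for k in range(1, r + 1)]

    return succ, pred, succ_list(new_id), succ_list(pred)


ring = [4, 12, 20, 24]
print(join(ring, 16, r=2))  # (20, 12, [20, 24], [16, 20])
print(ring)                 # [4, 12, 16, 20, 24]
```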

25
PASTRY
  • Scalable, self-organizing routing and object
    location infrastructure
  • Each node has a node ID
  • IDs are uniformly distributed in the ID space
  • Includes a proximity metric to measure the
    network distance between pairs of nodes

26
Pastry Nodes
  • Each node maintains three sets of nodes
      • Leaf set
          • Closest nodes in terms of node IDs
          • Same function as Chord's successor list
      • Nodes in the routing table
          • Prefix routing (big idea)
      • Neighborhood set
          • Closest nodes in terms of the proximity
            metric
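A minimal sketch of a leaf set: the numerically closest node IDs, half smaller and half larger than the node's own. The example IDs in the later slides use only the digits 0 to 3, so they are read here as base-4 numbers (Pastry's digits are in base 2^b, so b = 2 is assumed). The leaf-set size of 4 and the ID list, taken from the routing-table example for node 1023, are illustrative.

```python
def id_value(node_id):
    """Read an ID written in base-4 digits (0 to 3) as an integer."""
    return int(node_id, 4)


def leaf_set(node_id, known_ids, size=4):
    """Return (smaller, larger): the size/2 numerically closest IDs on each side.

    size is assumed even and at least 2; wrap-around of the ID space is
    ignored to keep the sketch short.
    """
    half = size // 2
    me = id_value(node_id)
    smaller = sorted((i for i in known_ids if id_value(i) < me), key=id_value)
    larger = sorted((i for i in known_ids if id_value(i) > me), key=id_value)
    return smaller[-half:], larger[:half]


known = ["0221", "2230", "3120", "1130", "1233", "1302",
         "1003", "1013", "1032", "1020", "1022"]
print(leaf_set("1023", known))  # (['1020', '1022'], ['1032', '1130'])
```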

27
Dynamic behavior
  • Pastry is self-organizing
  • Nodes come and go
  • Includes a seed discovery protocol

28
Prefix Routing
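  • At each step, a node forwards an incoming request
    to a node whose ID shares a longer common prefix
    with the destination ID than its own ID does
  • Example
      • Destination ID: 1230
      • Current node ID: 1023 (common prefix "1")
      • Next hop: 12-- (common prefix of at least "12")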

29
Routing table for node 1023
No common prefix:      0221  2230  3120
One common digit:      1130  1233  1302
Two common digits:     1003  1013  1032
Three common digits:   1020  1022
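The table above can be rebuilt mechanically: row l holds IDs that share exactly l leading digits with 1023, with one slot for each value of the next digit. This is a sketch. When several known IDs fit one slot it keeps the first one seen, which is an assumption (Pastry prefers the closest by the proximity metric).

```python
def shared_prefix_len(a, b):
    """Number of leading digits two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def build_routing_table(node_id, known_ids):
    """Row l: IDs sharing exactly l leading digits with node_id, one per next digit."""
    rows = [{} for _ in node_id]
    for other in known_ids:
        l = shared_prefix_len(node_id, other)
        if l == len(node_id):
            continue  # skip node_id itself
        rows[l].setdefault(other[l], other)  # slot keyed by the next digit
    return [sorted(row.values()) for row in rows]


known = ["0221", "2230", "3120", "1130", "1233", "1302",
         "1003", "1013", "1032", "1020", "1022"]
for l, row in enumerate(build_routing_table("1023", known)):
    print(l, row)
```

The slot for 1021 in the last row stays empty because no node with that ID is known, which matches the two entries in the slide's bottom row.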
30
Routing a request for node 1230 (at node 1023)
No common prefix:      0221  2230  3120
One common digit:      1130  1233  1302
Two common digits:     1003  1013  1032
Three common digits:   1020  1022
A request is always sent to a node sharing at least
one more prefix digit with the destination. Here it is node 1233.
31
At node 1233
No common prefix:      0221  2230  3120
One common digit:      1030  1130  1302
Two common digits:     1201  1211  1220
Three common digits:   1230  1232
The node sharing at least one more prefix digit is
node 1230, the destination itself.
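The two-hop route 1023, 1233, 1230 can be replayed with a short sketch that uses the routing-table entries of nodes 1023 and 1233 listed on these slides. At each hop it picks the entry sharing the longest prefix with the destination and requires that prefix to be longer than the current node's. The fallback Pastry uses when no such entry exists is not modeled.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


# Routing-table entries of nodes 1023 and 1233, as listed on the slides.
TABLES = {
    "1023": ["0221", "2230", "3120", "1130", "1233", "1302",
             "1003", "1013", "1032", "1020", "1022"],
    "1233": ["0221", "2230", "3120", "1030", "1130", "1302",
             "1201", "1211", "1220", "1230", "1232"],
}


def route(tables, source, dest):
    """Forward hop by hop to an entry sharing a strictly longer prefix with dest."""
    path = [source]
    node = source
    while node != dest:
        here = shared_prefix_len(node, dest)
        best = max(tables[node], key=lambda e: shared_prefix_len(e, dest))
        if shared_prefix_len(best, dest) <= here:
            # Not modeled: Pastry would instead try a numerically closer
            # node from its leaf set or other tables.
            raise LookupError(f"no entry at {node} extends the prefix")
        node = best
        path.append(node)
    return path


print(route(TABLES, "1023", "1230"))  # ['1023', '1233', '1230']
```

The first hop gains two digits at once ("1" to "123") because 1233 happens to share three digits with the destination; the rule only guarantees at least one more per hop, which bounds the path at the number of digits in an ID.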
32
TAPESTRY
  • Interprets keys as sequences of digits
  • Incremental prefix routing
      • Similar to Pastry
  • Main contribution is its emphasis on proximity
      • In the actual (physical) network
      • Reduces query latency
      • Makes the system much more complex
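One common way to use proximity in prefix-routing systems is to fill each routing-table slot with the physically closest of the nodes eligible for it. The sketch below shows that idea under stated assumptions: the round-trip times are hypothetical measurements, and this is a general illustration rather than Tapestry's exact neighbor-maintenance algorithm.

```python
def shared_prefix_len(a, b):
    """Number of leading digits two equal-length IDs have in common."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def proximity_table(node_id, candidates, latency_ms):
    """Fill each slot (row, next digit) with the lowest-latency eligible node."""
    table = {}
    for c in candidates:
        row = shared_prefix_len(node_id, c)
        if row == len(node_id):
            continue  # c is node_id itself
        slot = (row, c[row])
        if slot not in table or latency_ms[c] < latency_ms[table[slot]]:
            table[slot] = c
    return table


# Hypothetical round-trip times measured by node 1023.
rtt = {"1233": 40.0, "1201": 5.0, "0221": 20.0}
print(proximity_table("1023", list(rtt), rtt))
# {(1, '2'): '1201', (0, '0'): '0221'}
```

Both 1233 and 1201 can serve the slot for prefix "12"; choosing the nearer one keeps each hop short in the physical network, at the cost of measuring and maintaining latencies.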

33
CONCLUSIONS
  • Major issues include
      • Operational costs: lookups take O(log N) hops
        in Chord, Pastry and Tapestry (dN^(1/d) in
        CAN); storage costs vary
      • Fault-tolerance and concurrent changes: only
        Chord and Tapestry can handle them
      • Proximity routing: Pastry, CAN and Tapestry
        have heuristics
      • Malicious nodes: Pastry checks node IDs

34
Summary of costs
                  CAN                   Chord    Pastry   Tapestry
Node state (1)    d                     log N    log N    log N
Lookup (2)        dN^(1/d)              log N    log N    log N
Join (2)          dN^(1/d) + d log N    log² N   log² N   log² N
(1) Number of other nodes known by a given node
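The table's costs are asymptotic, but plugging in numbers shows how CAN's dimension d trades per-node state for path length. This sketch evaluates the lookup formulas with constant factors ignored, for a hypothetical N = 2^20 nodes; base-2 logarithms are an assumption (Pastry's lookup, for instance, is logarithmic in base 2^b).

```python
import math


def can_lookup(n, d):
    """CAN lookup cost from the table: d * N^(1/d) (constant factors ignored)."""
    return d * n ** (1 / d)


def log_lookup(n):
    """Lookup cost log N for Chord, Pastry and Tapestry (base 2 chosen here)."""
    return math.log2(n)


N = 2 ** 20  # hypothetical system of about one million nodes
for d in (2, 4, 10, 20):
    print(f"CAN, d = {d:2}: {d:2} neighbors, about {can_lookup(N, d):6.0f} hops")
print(f"log N designs: about {log_lookup(N):.0f} hops")
```

With d = 2, CAN needs about 2048 hops while keeping only 2 neighbors; raising d to 20 brings lookups down to about 40 hops, the same order as the log N designs.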