LOOKING UP DATA IN P2P SYSTEMS - PowerPoint PPT Presentation

Description:

LOOKING UP DATA IN P2P SYSTEMS Hari Balakrishnan M. Frans Kaashoek David Karger Robert Morris Ion Stoica MIT LCS Key Idea Survey paper Discusses how to access data in ... – PowerPoint PPT presentation

Slides: 35

Provided by: Pari83

Learn more at: https://www2.cs.uh.edu

Transcript and Presenter's Notes

1
LOOKING UP DATA IN P2P SYSTEMS

Hari Balakrishnan, M. Frans Kaashoek, David Karger,
Robert Morris, Ion Stoica
MIT LCS

2
Key Idea

Survey paper
Discusses how to access data in a P2P system
Covers four solutions
CAN
Chord
Pastry
Tapestry

3
INTRODUCTION

P2P systems are popular due to
Low startup cost
High scalability at very low cost
Use of resources that would otherwise remain
unused
Potential for greater robustness
Fully decentralized and distributed

4
The lookup problem

How do we locate data in large P2P systems?
One solution
Distributed hash tables (DHT)

5
Previous solutions (I)

Centralized database
Napster
Not scalable
Vulnerable to attacks on database

6
Previous solutions (II)

Broadcasting
Customers broadcast their requests to their
neighbors, which forward them to their own
neighbors and so on
Gnutella
Does not scale either
Broadcast messages consume too much bandwidth

7
Previous solutions (III)

Internet DNS
Organizes network nodes into a hierarchy
All searches start at top of hierarchy
Propagate down
Used by KaZaA, Grokster and others
Nodes higher in the tree do much more work than
lower nodes
Solution vulnerable to loss of root node(s)

8
Previous solutions (IV)

Freenet
Forwards queries from node to node until
requested data are found
Emphasis is on anonymity
Not performance
Unpopular documents may become inaccessible
Nobody cares!

9
DISTRIBUTED HASH TABLES

Implements primitive lookup(key)
Produces a path going from a node n0 to the
node holding the key
Big tradeoff is between
Keeping paths short
Minimizing state information kept by nodes

10
Main design issues

Mapping keys to nodes in a balanced way
Use a hash function
Forwarding a lookup for a key to appropriate node
Find at each step a node closer to the node
holding the key
Building routing tables
Each node should have a successor
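
The first design issue, mapping keys to nodes through a hash function, can be sketched as follows. This is only an illustration: the choice of SHA-1, the 16-bit ID space, and the names `key_id` and `ID_BITS` are assumptions made for the example, not details taken from the slides.

```python
import hashlib

ID_BITS = 16  # assumed size of the identifier space (2**16 possible IDs)


def key_id(name: str, bits: int = ID_BITS) -> int:
    """Hash an arbitrary name to an integer ID in [0, 2**bits)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


# Keys and nodes can be hashed into the same ID space; the names are made up.
for name in ("song.mp3", "paper.pdf", "node-10.0.0.7"):
    print(name, "->", key_id(name))
```

Because the hash spreads names roughly uniformly over the ID space, each node ends up responsible for a similar share of the keys.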

11
CAN

Uses a d-dimensional key space
Partitioned into hyper-rectangles
"Zones"
Each node manages a zone
Responsible for all keys in zone

12
Neighbors

Each node keeps track of addresses of all its
neighbors
Routing table
Neighbors are defined as nodes sharing a
(d-1)-dimensional hyperplane
Contacts with fewer dimensions in common do not
count
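
The neighbor rule above can be checked mechanically. The sketch below assumes each zone is stored as a pair of corner tuples (low corner, high corner) and ignores the wrap-around of the key space; the name `are_neighbors` and the zone representation are illustrative choices, not part of CAN's specification.

```python
from typing import Tuple

Zone = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (low corner, high corner)


def are_neighbors(a: Zone, b: Zone) -> bool:
    """True if zones a and b share a (d-1)-dimensional face.

    The zones must abut in exactly one dimension and overlap over a
    non-empty interval in every other dimension. Wrap-around is ignored.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    abutting = 0
    for lo1, hi1, lo2, hi2 in zip(a_lo, a_hi, b_lo, b_hi):
        if hi1 == lo2 or hi2 == lo1:
            abutting += 1          # the zones touch along this dimension
        elif min(hi1, hi2) - max(lo1, lo2) <= 0:
            return False           # disjoint along this dimension
    return abutting == 1


top_left = ((0.0, 0.5), (0.5, 1.0))
top_right = ((0.5, 0.5), (1.0, 1.0))
small = ((0.5, 0.25), (0.75, 0.5))
print(are_neighbors(top_left, top_right))  # True: they share an edge
print(are_neighbors(top_left, small))      # False: they only touch at a corner
```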

13
A two-dimensional example (I)
14
A two-dimensional example (II)
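Corners of the key space: (0, 0), (1, 0), (0, 1), (1, 1)
Zones, given as lower-left corner to upper-right corner
(0, 0.5) to (0.5, 1)
(0.5, 0.5) to (1, 1)
(0.5, 0.25) to (0.75, 0.5)
(0.75, 0) to (1, 0.5)
(0.5, 0) to (0.75, 0.25)
In reality the state space wraps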
15
A path from (0.25, 0.3) to (0.8, 0.8)
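Corners of the key space: (0, 0), (1, 0), (0, 1), (1, 1)
Zones, given as lower-left corner to upper-right corner
(0, 0.5) to (0.5, 1)
(0.5, 0.5) to (1, 1)
(0, 0) to (0.5, 0.5)
(0.5, 0.25) to (0.75, 0.5)
(0.5, 0) to (0.75, 0.25)
(0.75, 0) to (1, 0.5)
In reality the state space wraps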
16
Lookup

Routing tries to approximate the straight path
between current zone and zone holding the key
Various optimizations attempt to reduce lookup
latency
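
A minimal sketch of this greedy routing, using the six zones of the two-dimensional example. The zone labels A to F, the hand-derived neighbor table (without wrap-around), and the rule "forward to the neighbor whose zone lies closest to the target point" are assumptions made for illustration. The path drawn in the original figure is not recoverable from the transcript, so the result need not match it.

```python
import math

# Zones from the two-dimensional example: label -> (x_lo, y_lo, x_hi, y_hi).
ZONES = {
    "A": (0.0, 0.5, 0.5, 1.0),
    "B": (0.5, 0.5, 1.0, 1.0),
    "C": (0.0, 0.0, 0.5, 0.5),
    "D": (0.5, 0.25, 0.75, 0.5),
    "E": (0.5, 0.0, 0.75, 0.25),
    "F": (0.75, 0.0, 1.0, 0.5),
}
# Neighbor table worked out by hand from the zones above (no wrap-around).
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D", "F"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "E", "F"],
    "E": ["C", "D", "F"],
    "F": ["B", "D", "E"],
}


def distance_to_zone(point, zone):
    """Euclidean distance from a point to a rectangle (0 if inside)."""
    x, y = point
    x_lo, y_lo, x_hi, y_hi = zone
    dx = max(x_lo - x, 0.0, x - x_hi)
    dy = max(y_lo - y, 0.0, y - y_hi)
    return math.hypot(dx, dy)


def owner(point):
    """Label of the zone containing the point (half-open on the high side)."""
    x, y = point
    for label, (x_lo, y_lo, x_hi, y_hi) in ZONES.items():
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            return label
    raise ValueError("point outside the key space")


def route(source, target):
    """Greedy path of zone labels from the zone of source to that of target."""
    goal = owner(target)
    current = owner(source)
    path = [current]
    while current != goal:
        # Prefer the neighbor closest to the target; the goal wins ties.
        best = min(
            NEIGHBORS[current],
            key=lambda n: (distance_to_zone(target, ZONES[n]), n != goal),
        )
        if best != goal and distance_to_zone(target, ZONES[best]) >= \
                distance_to_zone(target, ZONES[current]):
            raise RuntimeError("greedy routing made no progress")
        current = best
        path.append(current)
    return path


print(route((0.25, 0.3), (0.8, 0.8)))  # ['C', 'A', 'B']
```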

17
Dynamic behavior

When a node joins the network
It picks a random point in the key space
Finds the node managing the zone that contains the point
Splits that node's current zone with it
When a node departs
Zones are merged
More complex process
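
The zone split performed on a join can be sketched as below. Splitting along the longest side is a simplifying assumption made here (the slides do not say which dimension is split), and `split_zone` is a hypothetical helper name.

```python
def split_zone(zone):
    """Split a zone in half along its longest side.

    zone is (low corner, high corner), each a tuple of d floats.
    Returns (half kept by the existing node, half given to the new node).
    """
    lo, hi = zone
    # Index of the dimension with the largest extent (first one on ties).
    dim = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[dim] + hi[dim]) / 2
    kept_hi = hi[:dim] + (mid,) + hi[dim + 1:]
    new_lo = lo[:dim] + (mid,) + lo[dim + 1:]
    return (lo, kept_hi), (new_lo, hi)


kept, new = split_zone(((0.5, 0.0), (0.75, 0.5)))
print(kept)  # ((0.5, 0.0), (0.75, 0.25))
print(new)   # ((0.5, 0.25), (0.75, 0.5))
```

In this run the two halves are exactly the two small zones of the two-dimensional example.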

18
Fault-tolerance

When a node fails, the neighbor with the smallest
zone takes over
Multiple failures may cause too many nodes to
handle multiple zones

19
CHORD

Assigns IDs to keys and nodes in the same
address space
IDs are organized in a ring
ID 0 follows the highest ID
Each node is responsible for all keys that
immediately precede it in the key space

20
Example
Nodes on the ring: N4, N12, N20, N24
Keys on the ring: K1, K6, K10, K15
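
The assignment rule (each key belongs to the first node at or after it on the ring) can be sketched with the IDs of this example. The function name `successor` and the use of a sorted list with binary search are illustrative choices.

```python
from bisect import bisect_left


def successor(node_ids, key_id):
    """First node whose ID is >= key_id, wrapping around to the lowest ID."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


NODES = [4, 12, 20, 24]
for key in (1, 6, 10, 15):
    print(f"K{key} -> N{successor(NODES, key)}")
# K1 -> N4
# K6 -> N12
# K10 -> N12
# K15 -> N20
```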
21
Finger table

Each node keeps a table containing IP addresses
of nodes
Halfway around in the key space
Quarter-of-the-way around
Table has log N entries
Allows O(log N) searches
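
A sketch of a finger table and of a lookup that uses it, on the four-node ring of the example. The 5-bit ID space (`M = 5`), the global `NODES` list standing in for the network, and all function names are assumptions for illustration; a real node would only know its own fingers and would contact other nodes over the network.

```python
from bisect import bisect_left

M = 5                            # assumed ID length in bits: IDs 0..31
NODES = sorted([4, 12, 20, 24])  # node IDs of the example ring


def successor(ident):
    """First node at or after ident on the ring."""
    i = bisect_left(NODES, ident % 2 ** M)
    return NODES[i % len(NODES)]


def finger_table(n):
    """Entry i points to successor(n + 2**i), for i = 0..M-1."""
    return [successor(n + 2 ** i) for i in range(M)]


def between(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    if a < b:
        return a < x < b
    return x > a or x < b


def lookup(start, key):
    """Return (node responsible for key, list of nodes visited)."""
    n, hops = start, [start]
    while key != n:
        fingers = finger_table(n)
        succ = fingers[0]
        if key == succ or between(key, n, succ):
            if succ != n:
                hops.append(succ)
            return succ, hops
        # Jump to the farthest finger that still precedes the key.
        n = next((f for f in reversed(fingers) if between(f, n, key)), succ)
        hops.append(n)
    return n, hops


print(finger_table(4))  # [12, 12, 12, 12, 20]
print(lookup(4, 15))    # (20, [4, 12, 20])
```

The last finger of each node points about halfway around the ring and the one before it about a quarter of the way, which is what lets each hop cut the remaining distance roughly in half.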

22
Partial example
Nodes on the ring: N4, N12, N20, N24
23
Fault-tolerance

Each node has a successor list
Contains IP addresses of next r successors
Guarantees routing progress as long as at least
one of the r successors is up
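
How a successor list keeps routing alive can be sketched as follows. The `is_alive` callback stands in for whatever liveness check a real node would use (for instance a ping with a timeout); it and the function name are hypothetical.

```python
def next_live_successor(successor_list, is_alive):
    """Return the first live node in the list, or None if all r are down."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None


# Successor list of node 4 with r = 3 on the ring 4, 12, 20, 24.
down = {12}
print(next_live_successor([12, 20, 24], lambda n: n not in down))  # 20
```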

24
Dynamic behavior

New node n learns its place in the Chord ring by
asking any extant node to do a lookup(n)
Must also
Update successor list of its predecessor
Create its own successor list

25
PASTRY

Scalable, self-organizing, routing and object
location infrastructure
Each node has a node ID
IDs are uniformly distributed in the ID space
Includes a proximity metric to measure distances
between pairs of nodes

26
Pastry Nodes

Each node maintains three sets of nodes
Leaf set
Closest nodes in terms of node IDs
Same function as Chord's successor list
Nodes in routing table
Prefix routing (big idea)
Neighborhood set
Closest nodes in terms of proximity metric

27
Dynamic behavior

Pastry is self-organizing
Nodes come and go
Includes a seed discovery protocol

28
Prefix Routing

At each step, a node forwards an incoming request
to the node whose node ID has the largest common
prefix with the destination ID
Destination ID 1230
Node ID 1023
Next hop 12--

29
Routing table for node 1023
No common prefix      0221  2230  3120
One common digit      1130  1233  1302
Two common digits     1003  1013  1032
Three common digits   1020  1022
30
Routing request for node 1230
No common prefix      0221  2230  3120
One common digit      1130  1233  1302
Two common digits     1003  1013  1032
Three common digits   1020  1022
The request is always sent to a node sharing at least
one more prefix digit with the destination. Here it is node 1233
31
At node 1233
No common prefix      0221  2230  3120
One common digit      1030  1130  1302
Two common digits     1201  1211  1220
Three common digits   1230  1232
The node with at least one more common prefix
digit is node 1230
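
The two routing steps of this example can be replayed with a short sketch. The tables are the entries listed on the slides for nodes 1023 and 1233, flattened into lists. Choosing the entry with the longest shared prefix is a simplification made here; Pastry itself selects one specific row and column of the table. All function names are illustrative.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Routing-table entries from the slides for nodes 1023 and 1233.
TABLES = {
    "1023": ["0221", "2230", "3120", "1130", "1233", "1302",
             "1003", "1013", "1032", "1020", "1022"],
    "1233": ["0221", "2230", "3120", "1030", "1130", "1302",
             "1201", "1211", "1220", "1230", "1232"],
}


def next_hop(node, dest):
    """Known node sharing a strictly longer prefix with dest, or None."""
    here = shared_prefix_len(node, dest)
    longer = [e for e in TABLES.get(node, [])
              if shared_prefix_len(e, dest) > here]
    return max(longer, key=lambda e: shared_prefix_len(e, dest), default=None)


def route(start, dest):
    """List of nodes visited from start until dest is reached."""
    path, node = [start], start
    while node != dest:
        node = next_hop(node, dest)
        if node is None:
            raise LookupError("no entry with a longer shared prefix")
        path.append(node)
    return path


print(route("1023", "1230"))  # ['1023', '1233', '1230']
```

Each hop extends the shared prefix by at least one digit, so the number of hops is bounded by the number of digits in an ID.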
32
TAPESTRY

Interprets keys as sequences of digits
Incremental prefix routing
Similar to Pastry
Main contribution is emphasis on proximity
In the actual world
Reduces query latency
Makes system much more complex

33
CONCLUSIONS

Major issues include
Operational costs: searches are all O(log n),
storage costs vary
Fault-tolerance and concurrent changes: only
Chord and Tapestry can handle them
Proximity routing: Pastry, CAN and Tapestry have
heuristics
Malicious nodes: Pastry checks node IDs

34
Summary of costs
                 CAN                    Chord     Pastry    Tapestry
Node state (1)   d                      log N     log N     log N
Lookup (2)       d N^(1/d)              log N     log N     log N
Join (2)         d N^(1/d) + d log N    log^2 N   log^2 N   log^2 N
(1) number of other nodes known by a given
