Making P2P Networks Scalable - PowerPoint PPT Presentation

About This Presentation
Title:

Making P2P Networks Scalable

Description:

Making P2P Networks Scalable a paper presentation by Derek Tingle – PowerPoint PPT presentation

Number of Views:89
Avg rating:3.0/5.0
Slides: 35
Provided by: swar95
Category:

less

Transcript and Presenter's Notes

Title: Making P2P Networks Scalable


1
Making P2P Networks Scalable
  • a paper presentation by
  • Derek Tingle

2
P2P Basics
  • Files stored on clients machines
  • Typically read only
  • Search mechanism
  • Download mechanism
  • Wildly popular

3
Gnutella
  • Decentralized
  • Unstructured
  • Flood search
  • Routing Table

4
(No Transcript)
5
Gnutella
  • Decentralized
  • Unstructured
  • Flood search
  • Routing table
  • DANGER!
  • Not scalable

6
Design Goals
  • Allow the Gnutella-like p2p to handle higher
    amount of queries
  • Make it scalable
  • Utilize heterogeneity of machines

7
Search Protocol
  • GIA search is based on random walks
  • Like floods, but less messages
  • But because random walks are blind, there are
    scaling issues
  • So GIA uses biased walks
  • Toward high degree or high capacity?

8
High degree AND high capacity
  • Using a dynamic topology adaptation algorithm
  • Ensures
  • high capacity nodes have a high degree
  • low capacity nodes are close to high capacity
    nodes
  • Level of satisfaction S
  • Measures how close the sum of capacities of a
    node's neighbors normalized by degrees is to that
    node's own capacity
  • The lower the satisfaction level, the shorter the
    adaptation interval

9
To add a node n...
  • Accept if num_nabr is still lt max_nbrs
  • Select the node with the highest degree out of
    the subset of neighbors with a capacity less than
    that of n
  • Only drop that node if it has less neighbors than
    n

10
One-hop replication
  • Each node records an index of neighbor nodes'
    content
  • Ensures that high capacity nodes can respond to a
    greater number of queries

11
Flow control
  • To avoid overloading a node
  • Only can direct queries to a neighbor if the
    neighbor is ready
  • Node uses tokens to signify it can handle queries
  • Node gives out tokens at the rate it can process
    queries
  • If queries are being queued, decrease allocation
    rate
  • Weights the allocation of tokens for capacity
  • If a node isn't using tokens, they are allocated
    to other neighbors
  • Can be piggy backed

12
Search Protocol (again)?
  • Biased random walks aren't random
  • Send queries to highest capacity neighbor with
    tokens
  • Time To Live decremented at each node
  • Book-keeping limits same path traversal
  • MAX_RESPONSES decremented for each found answer
  • Append address of owning node to the forwarded
    query

13
Query Resilience
  • Can't let a query die with a node
  • Keep-alive messages
  • query responses
  • dummy query responses
  • Originator can resend query if no keep-alive
    messages arrive for a while
  • When the topology adapts, the previous
    connections are maintained for a while

14
Simulations
  • GIA compared to
  • Flood
  • Random Walks over Random Topologies
  • Supernode mechanisms
  • queries only flooded between supernodes

15
Assumptions
  • All nodes produce queries at same rate
  • Capacity number of messages processed per unit
    time
  • queues have infinite length
  • specific keyword searches
  • min_alloc min(C/num_nbrs) 4
  • For Flood and Super
  • average diameter is 7
  • TTL is 10
  • Look at relative performance, not absolute

16
Metrics
  • Success rate fraction of queries issued that
    reach the file
  • hop-count
  • delay time taken from query's start to finish
  • Collapse Point (CP) node query rate at point
    beyond which success rate drops below 90
  • Average hop-count before collapse

17
Performance Comparison
  • Search terminates after finding a single answer
  • 5000 and 10,000 nodes for each system
  • .01, .05, .1, .5, 1 replication rates

18
Performance results
  • RWRT better than Flood at high replication, equal
    at low replication
  • GIA has higher hop counts than Flood and Super
  • GIA hop counts lower as replication goes up
  • Flood and Super aren't scalable... duh.

19
Multiple Search Responses
  • Same tests, MAX_RESPONSES 1, 10, 20
  • Flood and Super unchanged
  • GIA and RWRT decline as M_R increases

20
(No Transcript)
21
Node Failure model
  • Force nodes to fail at a uniformly random time
    between 0 and MAXLIFETIME
  • MAXLIFTIME 10s, 100s, 1000s, forever
  • Even at 10s, GIA is 2-4 orders of magnitude
    better than RWRT, Super, and Flood when they
    aren't fialing.

22
Types of P2P searching
  • Centralized (Napster)?
  • Based on user provided file lists
  • Decentralized
  • Queries are distributed to peers
  • Unstructured (Gnutella)?
  • Structured (Chord)?

23
Distributed Hash Tables
  • Pros
  • Scalable
  • Quick lookup
  • O(log n) steps
  • O(n) steps for Gnutella
  • Can find needles
  • Cons (why not DHTs)
  • P2P Clients are transient
  • Require O(log n) repair operations after each
    failure
  • DHTs only support exact match searches
  • P2Ps look for hay
  • (Not really a con)?

24
Analysis and Comparison of P2P Search Models
  • Dimitrios Tsoumakos
  • Nick Roussopoulos

25
Blind
  • Gnutella (flood)?
  • Modified-BFS
  • Iterative Deepening
  • Random Walks

26
Informed
  • Gnutella2 (Super-peer)?
  • Intelligent-BFS
  • APS
  • Local Indices
  • Routing Indices
  • Distributed Resource Location Protocol
  • Gnutella with Shortcuts
  • GIA

27
Gnutella2
  • Uses super-peers (hubs)?
  • They act as local servers for their peers
  • Hubs are connected
  • Queries the hubs sequentially

28
Intelligent-BFS
  • Query similarity metric to find similar queries
  • Forwards to neighbors most likely to answer that
    query
  • Focuses on object discovery rather than message
    reduction
  • Increased number of hits
  • Does not handle node departures well
  • Assumes a node specializes in one file type

29
APS
  • Uses indices to weight random walks
  • Each index value represents a query for a
    specific object directed toward a specific node
  • Index value is raised or lowered based on outcome
    of query
  • Optimistic and pessimistic update approaches
  • Originating node sends query to all neighbors,
    those neighbors send query to one neighbor

30
Local Indices
  • Each node indexes objects stored on nodes within
    a radius r and can answer queries for them
  • A BFS like search is performed
  • Queries hop a distance of 2r1 nodes
  • Accuracy and hits are very high
  • Decreases actual processing time
  • Floods the network with messages
  • Churn is very costly b/c flooding is used to
    update the repository for all joins/leaves

31
Routing Indices
  • Files are assumed to fall into themes
  • Each node stores the number of files of each
    theme reachable from each outgoing path
  • Three functions used to determine best outgoing
    path
  • Queries forwarded to best outgoing path
  • Flooding is used for creation and update, so
    serious issues with dynamic networks
  • Bloom filters...

32
Distributed Resource Location Protocol
  • Initially, random flooding is used to find
    objects
  • When an object is discovered, the query
    backtracks, storing the location of the found
    object on those nodes
  • If a node knows where a queried object is
    located, it can directly contact that node
  • Depending on specificity of queries, only one
    replica of a certain object is ever found
  • In a dynamic network, much flooding

33
Gnutella with Shortcuts
  • Uses standard flooding initially
  • If a peer provides an answer, it is indexed on
    the requesting nodes
  • Nodes forward queries to the ranked shortcuts
    first, then flood if necessary
  • Shortcuts ranked by success rate
  • Very high success rate
  • Works well when users make related queries

34
Results
  • All algorithms that implement flooding in some
    fashion have high success rates
  • Systems that use shortcuts aren't hurt as badly
    by departures as expected, because the more flood
    searches utilized, the more accurate the
    shortcuts
  • GIA is middle of the pack
  • No collapse point test
Write a Comment
User Comments (0)
About PowerShow.com