Title: Efficient Search in Peer to Peer Networks


1
Efficient Search in Peer to Peer Networks
  • By
  • Beverly Yang
  • Hector Garcia-Molina
  • Presented By
  • Anshumaan Rajshiva
  • Date: May 20, 2002

2
P2P Networks
  • Distributed systems in which nodes of equal roles
    and capabilities exchange information and
    services directly with each other.

3
Key Challenges
  • Efficient techniques for search and retrieval of
    data.
  • The best search technique for a system depends
    on the needs of the application.
  • Current search techniques in loose P2P systems
    tend to be very inefficient, either generating
    too much load on the system or providing a very
    poor user experience.

4
Current Reason for Inefficiency
  • Queries are processed by more nodes than desired.

5
Suggested Improvement
  • Processing queries through fewer nodes.

6
Techniques
  • Iterative Deepening
  • Directed BFS
  • Local Indices

7
Problem Framework
  • P2P network modeled as an undirected graph
  • Vertices: nodes in the network
  • Edges: open connections between
    neighbors
  • A message travels from node A to node B in hops
    (one hop per edge traversed)
  • Length of a path: number of hops
  • Source of a query: node submitting the query
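
A minimal Python sketch of this graph model, assuming the network is stored as an adjacency map from each node to its set of neighbors. The helpers `make_graph` and `hop_distances` and the toy topology are illustrative, not from the paper; the length of a path in hops is found with a plain breadth-first traversal.

```python
from collections import deque


def make_graph(edges):
    """Build an undirected adjacency map (node -> set of neighbors) from open connections."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def hop_distances(graph, source):
    """Return the minimum number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


# Hypothetical toy network: A - C - D - B
g = make_graph([("A", "C"), ("C", "D"), ("D", "B")])
print(hop_distances(g, "A")["B"])  # 3
```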

8
Problem Framework
  • When a node receives a query, it processes the
    query locally and can then respond to it,
    forward it, or drop it
  • The address of the source node is unknown to the
    responding node (the scheme used by Gnutella)
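
The sketch below illustrates these per-node decisions under stated assumptions: the `Peer` class, `issue_query`, and the keyword-matching rule are hypothetical, and message delivery is simulated with an in-process FIFO queue. Each node records which neighbor a query identifier arrived from, and responses retrace that reverse path hop by hop, so a responder never learns the source's address (Gnutella-style reverse-path routing).

```python
from collections import deque


class Peer:
    """Hypothetical peer: processes a query locally, then responds, forwards, or drops."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []
        self.came_from = {}  # query id -> neighbor it arrived from (None at the source)
        self.results = {}    # query id -> responders whose answers reached this source

    def handle_query(self, qid, keyword, ttl, sender, outbox):
        if qid in self.came_from:      # already seen this query: drop the duplicate
            return
        self.came_from[qid] = sender
        if keyword in self.files:      # local match: respond along the reverse path
            self.route_response(qid, self.name)
        if ttl > 1:                    # forward to the other neighbors with TTL - 1
            for nbr in self.neighbors:
                if nbr is not sender:
                    outbox.append((nbr, qid, keyword, ttl - 1, self))
        # ttl == 1: the query was processed here but is not forwarded

    def route_response(self, qid, responder):
        # Each hop knows only the previous hop, never the source's address.
        prev = self.came_from[qid]
        if prev is None:
            self.results.setdefault(qid, []).append(responder)
        else:
            prev.route_response(qid, responder)


def issue_query(source, qid, keyword, ttl):
    """Flood a query from source; deliveries are handled in FIFO (BFS) order."""
    source.came_from[qid] = None
    outbox = deque((nbr, qid, keyword, ttl, source) for nbr in source.neighbors)
    while outbox:
        peer, *message = outbox.popleft()
        peer.handle_query(*message, outbox)


def link(x, y):
    x.neighbors.append(y)
    y.neighbors.append(x)


# Hypothetical line network A - B - C - D; B and D hold "song".
a, b, c, d = Peer("A", []), Peer("B", ["song"]), Peer("C", []), Peer("D", ["song"])
link(a, b)
link(b, c)
link(c, d)
issue_query(a, "q1", "song", ttl=2)
issue_query(a, "q2", "song", ttl=3)
print(a.results["q1"])  # ['B']
print(a.results["q2"])  # ['B', 'D']
```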

9
Metrics
  • Cost: average aggregate bandwidth and average
    aggregate processing cost
  • Quality of results: number of results and
    satisfaction of the query

10
Cost
  • As a query message propagates across nodes, each
    node spends some processing resources on behalf
    of the query
  • The main cost is described in terms of bandwidth
    and processing cost

11
Cost
  • Average Aggregate Bandwidth: the average, over a
    set of representative queries, of the aggregate
    bandwidth (in bytes) consumed over every edge on
    behalf of the query
  • Average Aggregate Processing Cost: the average,
    over a set of representative queries, of the
    aggregate processing power consumed at every node
    on behalf of the query
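
A small Python sketch of how these two averages could be computed, assuming per-query measurements are already available as dictionaries (bytes per edge, processing units per node). The measurement format and the numbers are hypothetical.

```python
def aggregate(costs):
    """Aggregate cost of one query: the sum over all edges (or all nodes)."""
    return sum(costs.values())


def average_aggregate(per_query_costs):
    """Average, over a set of representative queries, of each query's aggregate cost."""
    return sum(aggregate(c) for c in per_query_costs) / len(per_query_costs)


# Hypothetical measurements for two representative queries.
bandwidth = [
    {("S", "A"): 100, ("A", "B"): 100},                    # bytes per edge, query 1
    {("S", "A"): 100, ("A", "B"): 100, ("B", "C"): 200},   # bytes per edge, query 2
]
processing = [
    {"A": 5, "B": 5},            # processing units per node, query 1
    {"A": 5, "B": 5, "C": 10},   # processing units per node, query 2
]
print(average_aggregate(bandwidth))   # 300.0
print(average_aggregate(processing))  # 15.0
```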

12
Quality of Results
  • Results are measured from the user's perspective
  • Number of results: the size of the total result
    set
  • Satisfaction of the query: a query is satisfied if
    Z or more results are returned, where Z is a
    value specified by the user
  • Time to satisfaction: how long the user must wait
    for the Zth result to arrive
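
These three quality metrics can be computed directly from the arrival times of results at the source. The sketch assumes Z is at least 1 and that arrival times are measured from query submission; the sample times are hypothetical.

```python
def is_satisfied(arrival_times, z):
    """A query is satisfied if Z or more results are returned."""
    return len(arrival_times) >= z


def time_to_satisfaction(arrival_times, z):
    """Time at which the Zth result arrives, or None if fewer than Z results arrive."""
    if len(arrival_times) < z:
        return None
    return sorted(arrival_times)[z - 1]


arrivals = [0.8, 0.2, 1.5, 0.4]  # hypothetical arrival times in seconds
print(len(arrivals))                      # 4 (number of results)
print(is_satisfied(arrivals, 3))          # True
print(time_to_satisfaction(arrivals, 3))  # 0.8
```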

13
Current Techniques
  • Gnutella: BFS is used with a depth limit D,
    where D = TTL of the message. At all levels
    < D the query is processed by each node and
    results are sent to the source; at level D the
    query is processed but then dropped rather than
    forwarded.
  • Freenet: DFS is used with a depth limit D. Each
    node forwards the query to a single neighbor and
    waits for a definite response from that neighbor
    before forwarding the query to another neighbor
    (if the query was not satisfied) or forwarding
    the results back to the query source (if the
    query was satisfied).
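
To contrast the two strategies, here is a simplified Python sketch on a toy graph, assuming each node's number of matching results is known in a `files` map. The function names are hypothetical, neighbors are tried in alphabetical order as a stand-in for whatever choice a real system makes, and the DFS keeps one global visited set for simplicity. BFS processes every node within the depth limit, while DFS stops as soon as Z results are found.

```python
from collections import deque


def bfs_search(graph, files, source, depth_limit):
    """Gnutella-style flood: every node within depth_limit hops processes the query;
    nodes at the limit process it but do not forward it."""
    depth = {source: 0}
    queue = deque([source])
    processed, results = [], 0
    while queue:
        node = queue.popleft()
        if node != source:
            processed.append(node)
            results += files.get(node, 0)
        if depth[node] < depth_limit:
            for nbr in sorted(graph[node]):
                if nbr not in depth:
                    depth[nbr] = depth[node] + 1
                    queue.append(nbr)
    return processed, results


def dfs_search(graph, files, source, depth_limit, z):
    """DFS sketch: hand the query to one neighbor at a time and stop as soon as
    z results have been found (the query is satisfied)."""
    processed, visited, found = [], {source}, 0

    def visit(node, depth):
        nonlocal found
        for nbr in sorted(graph[node]):
            if found >= z:
                return                  # satisfied: stop forwarding
            if nbr in visited:
                continue
            visited.add(nbr)
            processed.append(nbr)
            found += files.get(nbr, 0)
            if depth + 1 < depth_limit:
                visit(nbr, depth + 1)   # finish this branch before trying the next

    visit(source, 0)
    return processed, found


# Hypothetical network: S links to A and B; A links to C; B links to D.
graph = {"S": {"A", "B"}, "A": {"S", "C"}, "B": {"S", "D"}, "C": {"A"}, "D": {"B"}}
files = {"A": 1, "B": 1, "C": 1, "D": 1}
print(bfs_search(graph, files, "S", 2))     # (['A', 'B', 'C', 'D'], 4)
print(dfs_search(graph, files, "S", 2, 2))  # (['A', 'C'], 2)
```

With Z = 2, DFS processes the query at two nodes instead of four, at the price of visiting them one after another.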

14
Stop and Think
  • If quality of results is measured only by the
    number of results, BFS is ideal
  • If satisfaction is the metric of choice, BFS
    wastes much bandwidth and processing power
  • With DFS, each node processes the query
    sequentially, so searches can be terminated as
    soon as the query is satisfied, thereby
    minimizing cost. However, this sequential
    processing leads to poor response time.

15
Broadcast Policy
  • BFS and DFS fall at opposite extremes of
    bandwidth/processing cost and response time.
  • We need to find some middle ground between the
    two extremes while maintaining quality of results.

16
Iterative Deepening
  • Suited to cases where satisfaction is the metric
    of choice
  • Multiple BFS searches are initiated with
    successively larger depths, until the query is
    satisfied or the maximum depth limit D is reached

17
Iterative Deepening
  • A system-wide policy specifies at which depths
    the iterations are to occur
  • The last depth in the policy must be D
  • A waiting period W (the time between successive
    iterations in the policy) must be specified
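
The constraints on the policy can be captured in a small validated container. This is a hypothetical sketch: the class name, the use of seconds for W, and the requirement that W be positive and depths start at 1 are assumptions, not details from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IterativeDeepeningPolicy:
    """Hypothetical container for a system-wide iterative deepening policy."""
    depths: tuple    # depths at which iterations occur, e.g. (a, b, c)
    wait: float      # waiting period W between successive iterations, in seconds
    max_depth: int   # the system-wide maximum depth D

    def __post_init__(self):
        if not self.depths or self.depths[0] < 1:
            raise ValueError("depths must be non-empty and start at 1 or more")
        if list(self.depths) != sorted(set(self.depths)):
            raise ValueError("depths must be strictly increasing")
        if self.depths[-1] != self.max_depth:
            raise ValueError("the last depth in the policy must be D")
        if self.wait <= 0:
            raise ValueError("the waiting period W must be positive")


policy = IterativeDeepeningPolicy(depths=(1, 3, 5), wait=6.0, max_depth=5)
```

A policy such as `depths=(1, 3)` with `max_depth=5` is rejected with a `ValueError`, because its last depth is not D.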

18
Working of Iterative Deepening
  • Policy P = {a, b, c}, where a < b < c = D
  • S initiates a BFS of depth a by sending out a
    query message with TTL = a to all its neighbors
  • Once a node at depth a receives and processes the
    message, instead of dropping it, the node stores
    the message temporarily
  • The query thus becomes frozen at all nodes a hops
    away from S (the frontier nodes)
  • S receives responses from those nodes that have
    processed the query so far.

19
Working of Iterative Deepening
  • After waiting for time W, if S finds that the
    query has already been satisfied, it does
    nothing.
  • Otherwise, if the query is not yet satisfied, S
    starts the next iteration, initiating a BFS of
    depth b.
  • S sends out a Resend message with TTL = a.
  • A node that receives a Resend message simply
    forwards it; a frontier node (at depth a) instead
    unfreezes the temporarily stored query and
    forwards it with TTL = b - a to its neighbors.
  • This process continues in a similar fashion
    until depth D is reached. At depth D, the query
    is dropped.
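
The iterations above can be simulated in a few lines of Python. This is a simplified sketch under stated assumptions: the function name and toy data are hypothetical, the wait W is modeled only as a satisfaction check between iterations, and the bandwidth spent on Resend messages is not counted. Each iteration resumes from the frozen frontier, so no node processes the query twice.

```python
def iterative_deepening(graph, files, source, policy, z):
    """Run BFS iterations to each depth in policy, resuming from the frozen
    frontier each time, until at least z results have been found."""
    seen = {source}
    frontier = [source]      # nodes currently holding the frozen query
    depth, results, processed = 0, 0, 0
    for target in policy:
        # Resend: frontier nodes unfreeze the query and forward it target - depth hops.
        while depth < target and frontier:
            next_frontier = []
            for node in frontier:
                for nbr in sorted(graph[node]):
                    if nbr not in seen:
                        seen.add(nbr)
                        next_frontier.append(nbr)
                        processed += 1
                        results += files.get(nbr, 0)
            frontier, depth = next_frontier, depth + 1
        # After waiting W, S checks satisfaction before starting another iteration.
        if results >= z:
            break
    return results, processed, depth


# Hypothetical line network S - A - B - C - D with matches at A, C, and D.
graph = {"S": {"A"}, "A": {"S", "B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
files = {"A": 1, "C": 1, "D": 1}
print(iterative_deepening(graph, files, "S", [1, 2, 4], z=1))  # (1, 1, 1)
print(iterative_deepening(graph, files, "S", [1, 2, 4], z=2))  # (3, 4, 4)
```

With Z = 1 the search stops after processing a single node, whereas a plain BFS to depth 4 would process all four.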

20
Working of Iterative Deepening
  • To match Resend messages with their queries,
    every query is assigned a system-wide unique
    identifier.
  • The Resend message contains the identifier of
    the query it represents, and nodes at the
    frontier of a search know which query to
    unfreeze by inspecting this identifier.
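
A frontier node can keep its frozen queries in a map keyed by the query identifier and look them up when a Resend arrives. In this hypothetical sketch, a random `uuid.uuid4()` stands in for the system-wide unique identifier, and a Resend for an unknown or already unfrozen query is simply ignored (an assumption).

```python
import uuid
from dataclasses import dataclass


@dataclass
class Query:
    qid: str       # system-wide unique identifier
    keyword: str
    ttl: int


class FrontierNode:
    """Hypothetical frontier node: stores frozen queries keyed by identifier."""

    def __init__(self):
        self.frozen = {}  # query identifier -> frozen Query

    def freeze(self, query):
        self.frozen[query.qid] = query

    def on_resend(self, qid, new_ttl):
        """Unfreeze the query named by a Resend message and return it with the
        new TTL for forwarding, or None if no such query is frozen here."""
        query = self.frozen.pop(qid, None)
        if query is None:
            return None
        return Query(query.qid, query.keyword, new_ttl)


qid = str(uuid.uuid4())
node = FrontierNode()
node.freeze(Query(qid, "song", ttl=0))
forwarded = node.on_resend(qid, new_ttl=2)   # e.g. TTL = b - a = 3 - 1
print(forwarded.ttl)                         # 2
print(node.on_resend(qid, new_ttl=2))        # None
```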

21
Iterative Deepening
  • [Figure: iterative deepening example showing the source, Node3, Node4, and Level 2]

22
Directed BFS
  • Directed BFS is suited to cases where minimizing
    response time is important.
  • The strategy is to send queries to a subset of
    nodes that are likely to return many results
    quickly, selecting those nodes intelligently
    based on some parameters.
  • For this purpose, a node maintains statistics
    on its neighbors

23
Directed BFS
  • These statistics are based on the number of
    results that were received through each neighbor
    for past queries
  • By sending queries to a small subset of the
    nodes, the cost incurred is reduced
    significantly
  • The quality of results does not decrease
    significantly, provided neighbors are selected
    intelligently
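
A sketch of the neighbor-selection step only, assuming the statistic is the total number of results received through each neighbor for past queries. The class name, the subset size `k`, and the alphabetical tie-breaking are hypothetical choices.

```python
from collections import Counter


class NeighborStats:
    """Hypothetical per-node statistics: results received through each neighbor."""

    def __init__(self):
        self.results_via = Counter()

    def record(self, neighbor, num_results):
        self.results_via[neighbor] += num_results

    def select(self, neighbors, k):
        """Pick the k neighbors that returned the most results for past queries
        (ties broken by name); unseen neighbors count as zero."""
        return sorted(neighbors, key=lambda n: (-self.results_via[n], n))[:k]


stats = NeighborStats()
stats.record("A", 12)
stats.record("B", 3)
stats.record("C", 7)
print(stats.select(["A", "B", "C", "D"], k=2))  # ['A', 'C']
```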

24
Local Indices
  • A node maintains an index over the data of each
    node within r hops of itself, where r is a
    system-wide variable called the radius
  • When a node receives a Query message, it can then
    process the query on behalf of every node within
    r hops of itself
  • Collections of many nodes can be searched by
    processing the query at only a few nodes, while
    keeping the cost low
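
A minimal sketch of a local index, assuming each node's collection is a list of keywords and the index maps each keyword to the nodes within r hops that hold it. The helper names and toy data are hypothetical.

```python
from collections import deque


def nodes_within(graph, start, r):
    """All nodes at most r hops from start, including start itself."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue                       # radius reached: do not expand further
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def build_local_index(graph, holdings, node, r):
    """Index mapping each keyword to the nodes within r hops that hold it."""
    index = {}
    for other in nodes_within(graph, node, r):
        for keyword in holdings.get(other, ()):
            index.setdefault(keyword, set()).add(other)
    return index


def answer_locally(index, keyword):
    """Answer a query on behalf of every node the index covers."""
    return index.get(keyword, set())


# Hypothetical line network A - B - C - D with keyword holdings.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
holdings = {"A": ["x"], "B": ["y"], "C": ["x"], "D": ["x"]}
index = build_local_index(graph, holdings, "B", r=1)
print(sorted(answer_locally(index, "x")))  # ['A', 'C']  (D is 2 hops from B)
```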

25
Local Indices
  • r should be small.
  • The index will be small, typically on the order
    of 50 KB, independent of the size of the network

26
Working of Local Indices
  • A policy specifies the depths at which the query
    will be processed
  • Indices must be created and maintained at each
    node
  • All nodes at depths not listed in the policy
    simply forward the query to the next depth
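
The policy-driven search can be sketched as a BFS in which only nodes at listed depths consult their index. Assumptions: the helper names are hypothetical, the index lookup is simulated by intersecting the set of item holders with the nodes within r hops, and the example policy {1, 4} with r = 1 is chosen only for illustration.

```python
from collections import deque


def matches_within(graph, holders, node, r):
    """Simulated local index lookup: holders of the item within r hops of node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        if dist[cur] < r:
            for nbr in graph[cur]:
                if nbr not in dist:
                    dist[nbr] = dist[cur] + 1
                    queue.append(nbr)
    return holders & set(dist)


def local_indices_search(graph, holders, source, policy, r):
    """BFS up to max(policy) hops; nodes at depths in policy answer from their
    index, while nodes at other depths only forward the query."""
    depth = {source: 0}
    queue = deque([source])
    processed, hits = [], set()
    while queue:
        node = queue.popleft()
        if depth[node] in policy:
            processed.append(node)
            hits |= matches_within(graph, holders, node, r)
        if depth[node] < max(policy):
            for nbr in sorted(graph[node]):
                if nbr not in depth:
                    depth[nbr] = depth[node] + 1
                    queue.append(nbr)
    return processed, hits


# Hypothetical line network S - A - B - C - D - E; B and E hold the item.
graph = {"S": {"A"}, "A": {"S", "B"}, "B": {"A", "C"},
         "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
processed, hits = local_indices_search(graph, {"B", "E"}, "S", policy={1, 4}, r=1)
print(processed)     # ['A', 'D']
print(sorted(hits))  # ['B', 'E']
```

Here only two nodes process the query, yet together their indices cover all six nodes.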

27
Maintaining Indices
  • Joining: a new node sends a Join message with
    TTL = r, and all the nodes within r hops update
    their indices.
  • The Join message contains the metadata of the
    joining node
  • When a node receives this Join message, it in
    turn sends a Join message containing its own
    metadata directly to the new node. The new node
    updates its index.
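
The join exchange can be sketched as follows, assuming every node's index is a dict from node name to that node's metadata and that message passing is replaced by direct updates. The function names and toy data are hypothetical.

```python
from collections import deque


def within_r(graph, start, r):
    """Nodes 1..r hops from start, i.e. those a Join message with TTL = r reaches."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] < r:
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
    return set(dist) - {start}


def handle_join(graph, indices, metadata, new_node, r):
    """Each node within r hops indexes the newcomer's metadata, then replies
    directly with its own metadata, which the newcomer adds to its index."""
    own_index = indices.setdefault(new_node, {})
    for other in within_r(graph, new_node, r):
        indices.setdefault(other, {})[new_node] = metadata[new_node]  # index update
        own_index[other] = metadata[other]                           # direct reply


# Hypothetical network: new node N attaches to A in the line A - B - C.
graph = {"N": {"A"}, "A": {"N", "B"}, "B": {"A", "C"}, "C": {"B"}}
metadata = {"N": ["new.mp3"], "A": ["a.mp3"], "B": ["b.mp3"], "C": ["c.mp3"]}
indices = {}
handle_join(graph, indices, metadata, "N", r=2)
print(sorted(indices["N"]))         # ['A', 'B']
print("N" in indices.get("C", {}))  # False
```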

28
Maintaining Indices
  • Node dies: other nodes update their indices based
    on timeouts
  • Node updates: when a node updates its
    collection, it sends out a small Update message
    with TTL = r, containing the metadata of the
    affected item. All nodes receiving this message
    then update their indices.
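
Both maintenance rules can be sketched on a single node's index. In this hypothetical sketch, time is passed in explicitly as `now`, only Update messages refresh a node's last-heard time, and the 30-second timeout is illustrative; a real system would likely also refresh it on other traffic.

```python
class LocalIndex:
    """Hypothetical index held by one node, covering the nodes within r hops."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence before a node is presumed dead
        self.entries = {}        # node -> {item: metadata}
        self.last_heard = {}     # node -> time we last heard from that node

    def apply_update(self, node, item, metadata, now):
        """Handle an Update message (sent with TTL = r) for one affected item."""
        self.entries.setdefault(node, {})[item] = metadata
        self.last_heard[node] = now

    def expire(self, now):
        """Drop entries of nodes not heard from within the timeout; return them."""
        dead = [n for n, t in self.last_heard.items() if now - t > self.timeout]
        for n in dead:
            self.entries.pop(n, None)
            del self.last_heard[n]
        return dead


idx = LocalIndex(timeout=30)
idx.apply_update("A", "song.mp3", {"size_kb": 4000}, now=0)
idx.apply_update("B", "talk.mp3", {"size_kb": 9000}, now=20)
dead = idx.expire(now=40)
print(dead)                 # ['A']
print(sorted(idx.entries))  # ['B']
```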

29
Results for Iterative Deepening
30
Results for Iterative Deepening
31
Results for Directed BFS
32
Results for Directed BFS
33
Results for Local Indices
34
Conclusions
  • Compared to the techniques used in existing
    systems, the discussed techniques greatly reduce
    the aggregate cost of processing queries over the
    entire system, while maintaining the quality of
    results
  • The schemes are simple and practical to implement
    in existing systems.

35
Conclusions
36
References
  • Beverly Yang and Hector Garcia-Molina, "Efficient
    Search in Peer to Peer Networks," available at
    http://dbpubs.stanford.edu/pub/2001-47