Efficient Search in Peer to Peer Networks

Provided by: ece75

Learn more at: https://eecs.ceas.uc.edu

1
Efficient Search in Peer to Peer Networks

By
Beverly Yang
Hector Garcia-Molina
Presented By
Anshumaan Rajshiva
Date: May 20, 2002

2
P2P Networks

Distributed systems in which nodes of equal roles
and capabilities exchange information and
services directly with each other.

3
Key Challenges

Efficient techniques for search and retrieval of
data.
The best search technique for a system depends on
the needs of the application.
Current search techniques in loose P2P systems
tend to be very inefficient, either generating
too much load on the system or providing a
very poor user experience.

4
Current Reason for Inefficiency

Queries are processed by more nodes than desired.

5
Suggested Improvement

Processing queries through fewer nodes.

6
Techniques

Iterative Deepening
Directed BFS
Local Indices

7
Problem Framework

P2P network: an undirected graph
Vertices: nodes in the network
Edges: open connections between neighbors
A message travels from A to B in hops.
Length of a path: number of hops
Source of a query: the node submitting the query
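
The framework above can be sketched in a few lines of Python. This is
only an illustration: the overlay `OVERLAY`, its node names, and the
helper `hops` are hypothetical and do not come from the presentation.

```python
from collections import deque

# Hypothetical overlay: each key is a node, each value lists the
# neighbors it holds an open connection to (undirected graph).
OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def hops(graph, src, dst):
    """Return the number of hops on a shortest path, or None if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None
```

Here `hops(OVERLAY, "A", "E")` gives the path length between the query
source A and node E in this toy overlay.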

8
Problem Framework

When a node receives a query, it should process
the query locally and then respond to, forward,
or drop the query.
The address of the source node is unknown to the
responding node (the scheme used by Gnutella).

9
Metrics

Cost: Average Aggregate Bandwidth,
Average Aggregate Processing Cost
Quality of Results: Number of Results,
Satisfaction of the query

10
Cost

As a message propagates across nodes, each node
spends some processing resources on behalf of the
query.
The main costs are described in terms of bandwidth
and processing cost.

11
Cost

Average Aggregate Bandwidth: the average, over a
set of representative queries, of the aggregate
bandwidth consumed (in bytes) over each edge on
behalf of the query.
Average Aggregate Processing Cost: the average,
over a set of representative queries, of the
aggregate processing power consumed at each node
on behalf of the query.
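
Both cost metrics have the same shape: sum a per-edge (or per-node)
cost for one query, then average over the query set. A minimal sketch,
assuming the per-query costs have already been measured; the function
name and the sample numbers are made up for illustration.

```python
def average_aggregate(per_query_costs):
    """Average, over queries, of the total cost each query incurred.

    per_query_costs: list of dicts, one per query, mapping an edge
    (for bandwidth, in bytes) or a node (for processing cost) to the
    cost consumed there on behalf of that query.
    """
    if not per_query_costs:
        return 0.0
    totals = [sum(costs.values()) for costs in per_query_costs]
    return sum(totals) / len(totals)

# Hypothetical measurements: bytes sent over each edge for two queries.
sample_bandwidth = [
    {("A", "B"): 100, ("B", "C"): 50},
    {("A", "B"): 70},
]
```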

12
Quality of Results

Results from the perspective of the user
Number of results: the size of the total result set
Satisfaction of the query: a query is satisfied if
Z or more results are returned, where Z is some
value specified by the user
Time to satisfaction: how long the user must wait
for the Zth result to arrive
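
These two definitions translate directly into code. The sketch below
assumes result arrival times are recorded in seconds since the query
was submitted; the function names are illustrative only.

```python
def is_satisfied(arrival_times, z):
    """A query is satisfied once Z or more results have been returned."""
    return len(arrival_times) >= z

def time_to_satisfaction(arrival_times, z):
    """Arrival time of the Zth result, or None if never satisfied."""
    if len(arrival_times) < z:
        return None
    return sorted(arrival_times)[z - 1]
```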

13
Current Techniques

Gnutella: BFS is used with a depth limit of D,
where D = TTL of the message. At all levels < D
the query is processed by each node and results
are sent to the source; at level D the query is
dropped.
Freenet: DFS is used with depth limit D. Each node
forwards the query to a single neighbor and waits
for a definite response from that neighbor before
forwarding the query to another neighbor (if the
query was not satisfied) or forwarding the
results back to the query source (if the query
was satisfied).
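
A depth-limited BFS flood can be sketched as follows. This is a
centralized simulation, not Gnutella's wire protocol, and it assumes
that nodes at depth D still process the query before dropping it,
which is one reading of the slide. The graph, the `data` mapping, and
the function name are hypothetical.

```python
from collections import deque

def bfs_search(graph, data, source, query, depth_limit):
    """Flood `query` from `source` up to `depth_limit` hops.

    Returns (nodes holding a result, number of nodes that processed
    the query). The source itself is not counted.
    """
    seen = {source}
    seen.update(graph[source])
    queue = deque((nbr, 1) for nbr in graph[source])
    results, processed = [], 0
    while queue:
        node, depth = queue.popleft()
        processed += 1                      # every reached node does work
        if query in data.get(node, ()):
            results.append(node)
        if depth < depth_limit:             # at the limit the query is dropped
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, depth + 1))
    return results, processed

# Hypothetical overlay and content placement.
GRAPH = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
         "C": ["A", "B", "D"], "D": ["C"]}
DATA = {"A": {"x"}, "D": {"x"}}
```

Raising `depth_limit` from 2 to 3 finds one more result here, but every
node within range processes the query whether or not it is needed.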

14
Stop and Think

If quality of results is measured only by the
number of results, then BFS is ideal.
If satisfaction is the metric of choice, BFS
wastes much bandwidth and processing power.
With DFS, each node processes the query
sequentially, so searches can be terminated as
soon as the query is satisfied, thereby minimizing
cost. However, sequential processing leads to poor
response time.
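
The early-termination property of DFS can be illustrated with a small
simulation. It is a sketch only: it visits one neighbor at a time and
stops as soon as Z results are found, but it does not model Freenet's
actual routing. All names and the sample overlay are hypothetical.

```python
def dfs_search(graph, data, source, query, depth_limit, z):
    """Sequential depth-limited DFS that stops once z results are found.

    Returns (nodes holding a result, number of nodes that processed
    the query).
    """
    results, processed = [], 0
    seen = {source}

    def visit(node, depth):
        nonlocal processed
        processed += 1
        if query in data.get(node, ()):
            results.append(node)
        if len(results) >= z or depth == depth_limit:
            return
        for nbr in graph[node]:
            if len(results) >= z:           # satisfied: stop forwarding
                return
            if nbr not in seen:
                seen.add(nbr)
                visit(nbr, depth + 1)

    for nbr in graph[source]:
        if len(results) >= z:
            break
        if nbr not in seen:
            seen.add(nbr)
            visit(nbr, 1)
    return results, processed

# Hypothetical overlay and content placement.
GRAPH = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
         "C": ["A", "B", "D"], "D": ["C"]}
DATA = {"A": {"x"}, "D": {"x"}}
```

With Z = 1 only a single node processes the query in this overlay,
which is the cost saving the slide refers to; the price is that nodes
are visited one after another.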

15
Broadcast Policy

BFS and DFS fall at opposite extremes of
bandwidth/processing cost and response time.
We need to find some middle ground between the two
extremes while maintaining quality of results.

16
Iterative Deepening

Used when satisfaction is the metric of choice.
Multiple BFS searches are initiated with
successively larger depths until the query is
satisfied or the maximum depth limit D is reached.

17
Iterative Deepening

A system-wide policy specifies at what depths the
iterations are to occur.
The last depth in the policy must be set to D.
A waiting period W (the time between successive
iterations in the policy) must be specified.

18
Working of Iterative Deepening

Policy P = {a, b, c}
S initiates a BFS of depth a by sending out a
query message with TTL = a to all its neighbors.
Once a node at depth a receives and processes the
message, instead of dropping it, the node
stores the message temporarily.
The query becomes frozen at all nodes a hops
away from S (the frontier nodes).
S receives responses from those nodes that have
processed the query so far.

19
Working of Iterative Deepening

After waiting for time W, if S finds that the
query has already been satisfied, it does
nothing.
Otherwise, if the query is not yet satisfied, S
starts the next iteration, initiating a BFS of
depth b.
S sends out a Resend message with TTL = a.
A node that receives a Resend message simply
unfreezes the query (stored temporarily) and
forwards it with TTL = b - a to its neighbors.
This process continues in a similar fashion
until depth D is reached. At depth D, the query is
dropped.

20
Working of Iterative Deepening

To match queries with Resend messages, every
query is assigned a system-wide unique
identifier.
The Resend message contains the identifier of
the query it represents, and nodes at the
frontier of a search know which query to
unfreeze by inspecting this identifier.
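
The iteration scheme can be sketched as a centralized simulation. It
abstracts away the waiting period W, the Resend messages, and the query
identifiers, and only tracks which depth ranges get processed in each
iteration. The function names, the sample overlay, and the policy
values are assumptions made for illustration.

```python
from collections import deque

def bfs_depths(graph, source, max_depth):
    """Map each node within max_depth hops of source to its depth."""
    depths = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue
        for nbr in graph[node]:
            if nbr not in depths:
                depths[nbr] = depths[node] + 1
                queue.append(nbr)
    return depths

def iterative_deepening(graph, data, source, query, policy, z):
    """Run successive BFS iterations at the depths listed in `policy`.

    Returns (results, nodes that processed the query, depth reached).
    Nodes up to the previous frontier are never re-processed, which
    mirrors unfreezing the query at the frontier.
    """
    depths = bfs_depths(graph, source, policy[-1])
    results, processed, frontier = [], 0, 0
    for limit in policy:
        for node, depth in depths.items():
            if frontier < depth <= limit:   # newly reached in this iteration
                processed += 1
                if query in data.get(node, ()):
                    results.append(node)
        if len(results) >= z:               # satisfied: S does nothing more
            return results, processed, limit
        frontier = limit
    return results, processed, policy[-1]

# Hypothetical overlay, content placement, and policy P = {1, 2, 3}.
GRAPH = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
         "C": ["A", "B", "D"], "D": ["C"]}
DATA = {"A": {"x"}, "D": {"x"}}
POLICY = [1, 2, 3]
```

With Z = 1 the search stops after the first iteration and only the two
depth-1 nodes do any work; with Z = 2 it has to run to the last depth
in the policy.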

21
Iterative Deepening

[Figure: iterative deepening example; labels
shown: Source, Node3, Node4, Level 2]

22
Directed BFS

If minimizing response time is important, use
Directed BFS.
The strategy is to send queries to a subset of
nodes that will return many results quickly,
intelligently selecting those nodes based on some
parameters.
For this purpose, a node maintains statistics
on its neighbors.

23
Directed BFS

These statistics are based on the number of
results that were received through each neighbor
for past queries.
By sending queries to a small subset of the
nodes, the cost incurred is reduced
significantly.
The quality of results is not decreased
significantly, provided neighbors are selected
intelligently.
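
One possible selection rule, using the statistic named on the slide
(results received through each neighbor for past queries), is to
forward only to the top k neighbors. The function name, the
tie-breaking rule, and the sample counts are assumptions.

```python
def select_neighbors(stats, k):
    """Pick the k neighbors that returned the most results in the past.

    stats: dict mapping neighbor -> number of results received through
    that neighbor for past queries. Ties are broken by neighbor name
    so the choice is deterministic.
    """
    ranked = sorted(stats, key=lambda nbr: (-stats[nbr], nbr))
    return ranked[:k]

# Hypothetical per-neighbor result counts kept by one node.
past_results = {"A": 5, "B": 12, "C": 0, "D": 12}
```

The query source would then send its query only to the selected
neighbors instead of all four.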

24
Local Indices

A node maintains an index over the data of each
node within r hops of itself, where r is a
system-wide variable called the radius.
When a node receives a Query message, it can then
process the query on behalf of every node within
r hops of itself.
Collections of many nodes can be searched by
processing the query at only a few nodes, which
keeps the cost low.
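
A minimal sketch of a local index, assuming each node's data can be
represented as a set of item names. The index structure, function
names, and sample overlay are illustrative and not the presentation's
actual data layout.

```python
from collections import deque

def build_local_index(graph, data, node, r):
    """Index the data of every node within r hops of `node` (itself included)."""
    depths = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        if depths[cur] == r:
            continue
        for nbr in graph[cur]:
            if nbr not in depths:
                depths[nbr] = depths[cur] + 1
                queue.append(nbr)
    return {n: set(data.get(n, ())) for n in depths}

def process_query(index, query):
    """Answer a query on behalf of every node covered by the index."""
    return sorted(n for n, items in index.items() if query in items)

# Hypothetical overlay and content placement.
GRAPH = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
         "C": ["A", "B", "D"], "D": ["C"]}
DATA = {"A": {"x"}, "D": {"x"}}
```

With r = 1, node C alone can answer for A, B, and D, so a single
processing step covers four nodes in this overlay.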

25
Local Indices

r should be small.
The index will be small, typically on the order
of 50 KB, independent of the size of the network.

26
Working of Local Indices

A policy specifies at which depths the query will
be processed.
Indices must be created and maintained at each
node.
All nodes at depths not listed in the policy
simply forward the query to the next depth.

27
Maintaining Indices

Node joins: a new node sends a Join message with
TTL = r, and all the nodes within r hops update
their indices.
The Join message contains the metadata about the
joining node.
When a node receives this Join message, it in
turn sends a Join message containing its own
metadata directly to the new node. The new node
updates its indices.

28
Maintaining Indices

Node dies: other nodes update their indices based
on timeouts.
Node updates: when a node updates its
collection, the node sends out a small Update
message with TTL = r, containing the metadata of
the affected item. All nodes receiving this
message subsequently update their index.
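
The join and update steps can be sketched as operations on a shared
in-memory model. This is a simulation under simplifying assumptions:
messages are delivered instantly, metadata is a set of item names, and
timeouts for dead nodes are not modeled. All names are hypothetical.

```python
from collections import deque

def nodes_within(graph, start, r):
    """Nodes reachable from `start` in at most r hops, excluding `start`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if depths[cur] == r:
            continue
        for nbr in graph[cur]:
            if nbr not in depths:
                depths[nbr] = depths[cur] + 1
                queue.append(nbr)
    del depths[start]
    return list(depths)

def join(graph, indices, metadata, new_node, neighbors, r):
    """Connect new_node, then exchange Join messages within r hops."""
    graph[new_node] = list(neighbors)
    for nbr in neighbors:
        graph[nbr].append(new_node)
    indices[new_node] = {}
    for peer in nodes_within(graph, new_node, r):
        # The peer indexes the newcomer's metadata ...
        indices[peer][new_node] = set(metadata[new_node])
        # ... and replies directly with its own metadata.
        indices[new_node][peer] = set(metadata[peer])

def update(graph, indices, node, item, r):
    """Propagate a small Update message for one added item within r hops."""
    for peer in nodes_within(graph, node, r):
        indices[peer].setdefault(node, set()).add(item)
```

For example, if a node N joins a chain A, B, C by connecting to A with
r = 2, then A and B index N's metadata and N indexes theirs, while C,
three hops away, is unaffected.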

29
Results for Iterative Deepening
30
Results for Iterative Deepening
31
Results for Directed BFS
32
Results for Directed BFS
33
Results for Local Indices
34
Conclusions

Compared to current techniques used in existing
systems, the discussed techniques greatly reduce
the aggregate cost of processing queries over the
entire system while maintaining the quality of
results.
The schemes are simple and practical to implement
on existing systems.

35
Conclusions
36
References