Title: Search and Replication in Unstructured Peer-to-Peer Networks
1Search and Replication in Unstructured Peer-to-Peer Networks
- Pei Cao, Christine Lv, Edith Cohen, Kai Li, and Scott Shenker
- ICS 2002
2Outline
- Brief survey of P2P architectures
- Evaluation Methodology
- Search Methods
- Replication
- Conclusions
3Peer-to-Peer Networks
- Peers are connected by an overlay network.
- Users cooperate to share files (e.g., music, videos)
- Dynamic: nodes join or leave frequently
4P2P Network Architectures I
- Centralized
- Use of central directory server (CDS)
- Peers query the CDS to find other peers that hold the desired object
- Pros: very efficient
- Cons: scales poorly
- Single point of failure
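The centralized lookup described above can be sketched as a single in-memory index. This is a minimal illustration, not the design of any particular system; the class name `CentralDirectory` and its methods are hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Toy central directory server (CDS): maps each object to its holders."""

    def __init__(self):
        self._holders = defaultdict(set)

    def publish(self, peer, obj):
        """A peer announces that it holds `obj`."""
        self._holders[obj].add(peer)

    def remove_peer(self, peer):
        """Forget a peer that has left the network."""
        for peers in self._holders.values():
            peers.discard(peer)

    def lookup(self, obj):
        """Return the peers currently known to hold `obj` (one round trip)."""
        return set(self._holders.get(obj, ()))
```

Every query is answered by one lookup, which is why the approach is efficient, but all peers depend on this one index, which is the single point of failure.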
5P2P Network Architectures II
- Decentralized: no central directory server
- Structured
- P2P network topology is tightly controlled
- Files are placed at specified locations
- Unstructured
- No control over network topology or file placement
6P2P Network Architectures III
- Decentralized but structured
- Loosely structured
- Placement of files is based on hints
- Tightly structured
- The structure of the P2P network and the placement of files are precisely defined
- Use of a distributed hash table (DHT)
- Pros: queries are satisfied efficiently
- Good scaling
- Cons: not yet proven to work in practice
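To make "files are placed at specified locations" concrete, here is a minimal sketch of one common hash-based placement rule: hash node names and file keys onto the same ring and assign each key to the first node at or after it. The slides do not name a specific scheme, so the successor rule, the 16-bit ring, and the function names are assumptions for illustration.

```python
import bisect
import hashlib


def ring_id(name, bits=16):
    """Hash a node name or file key onto a ring of 2**bits positions."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def responsible_node(node_names, key, bits=16):
    """Return the node whose ring id is the first at or after the key's id."""
    ring = sorted((ring_id(n, bits), n) for n in node_names)
    ids = [i for i, _ in ring]
    pos = bisect.bisect_left(ids, ring_id(key, bits))
    return ring[pos % len(ring)][1]  # wrap around past the last node
```

Because every peer computes the same answer for a given key, a lookup can go straight to the responsible node instead of searching the network.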
7P2P Network Architectures IV
- Decentralized and Unstructured
- Placement of files is not based on knowledge of the topology
- Finding files
- Node queries its neighbors (usually by flooding)
- Pros: extremely resilient to network changes
- Cons: extremely unscalable
- Generates large loads
8Evaluation Methodology I
- Terminology
- Network Topology
- graph formed by the nodes in the network at a given instant
- Query Distribution
- frequency of lookups to files
- Replication Distribution
- percentage of nodes that have a particular file
9Evaluation Methodology II
- Network Topologies
- Power-Law Random Graph (PLRG)
- Max node degree 1746, median 1, average 4.46
- Normal Random Graph (Random)
- Average and median node degree are both 4
- Gnutella graph (Gnutella)
- Oct 2000 snapshot
- Max degree 136, median 2, average 5.5
- Two-dimensional Grid
- 100 x 100 = 10,000 nodes
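As an illustration of how such topologies can be represented and summarized, the sketch below builds the two-dimensional grid as an adjacency dict and computes the degree statistics quoted for the other graphs. It assumes a grid without wrap-around edges, which the slides do not specify; the function names are hypothetical.

```python
from statistics import median


def grid_graph(rows, cols):
    """Adjacency dict for a rows x cols grid; nodes are (row, col) tuples."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            # Keep only neighbors inside the grid (no wrap-around assumed).
            graph[(r, c)] = [(i, j) for i, j in candidates
                             if 0 <= i < rows and 0 <= j < cols]
    return graph


def degree_stats(graph):
    """Return (max, median, average) node degree of an adjacency dict."""
    degrees = [len(nbrs) for nbrs in graph.values()]
    return max(degrees), median(degrees), sum(degrees) / len(degrees)
```

Calling `degree_stats(grid_graph(100, 100))` summarizes the 10,000-node grid in the same terms as the PLRG and Gnutella graphs.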
10Evaluation Methodology III
- Object query distribution q_i
- Uniform
- Zipf-like
- Object replication density distribution r_i
- Uniform
- Proportional: r_i ∝ q_i
- Square-root: r_i ∝ √q_i
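The query and replication distributions above can be written down directly. This is a minimal sketch: the Zipf exponent `alpha` and the normalization of the replication densities to sum to 1 are assumptions, since the slides only say "Zipf-like" and give proportionality.

```python
def zipf_like(m, alpha=1.0):
    """Query probabilities q_i proportional to 1 / i**alpha for i = 1..m."""
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def replication(q, strategy):
    """Normalized replication densities r_i for a query distribution q."""
    if strategy == "uniform":
        weights = [1.0] * len(q)
    elif strategy == "proportional":
        weights = list(q)
    elif strategy == "square-root":
        weights = [qi ** 0.5 for qi in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `replication(zipf_like(100), "square-root")` gives popular objects more copies than uniform replication does, but fewer than proportional replication.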
11Evaluation Methodology IV
- Metrics
- User aspects
- Pr(success)
- Number of hops
- Load aspects
- Average messages per node
- Number of nodes visited
- Peak number of messages
12Limitation of Flooding I
- Gnutella uses a TTL (time-to-live) to limit the number of hops a query travels
- Problem
- Hard to choose the TTL
- For objects that are widely present in the network, small TTLs suffice
- For objects that are rare in the network, large TTLs are necessary
- Number of query messages grows exponentially as the TTL grows
13Limitation of Flooding II
- A node may receive the same message more than once
- Need for duplicate-detection mechanisms
- Even so, duplication increases as the TTL increases in flooding
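The two flooding limitations can be observed with a small simulation. The sketch below floods a query for a fixed number of hops and counts messages and duplicates; it assumes each node remembers queries it has seen and drops repeats, and the adjacency-dict format and function name are illustrative choices.

```python
def flood(graph, source, ttl):
    """Flood a query from `source` for `ttl` hops.

    `graph` maps each node to a list of its neighbors.
    Returns (messages_sent, nodes_reached, duplicate_messages).
    """
    seen = {source}
    frontier = [(source, None)]  # (node, neighbor it got the query from)
    messages = duplicates = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for nbr in graph[node]:
                if nbr == parent:
                    continue  # do not send the query back where it came from
                messages += 1
                if nbr in seen:
                    duplicates += 1  # received again; dropped, not forwarded
                else:
                    seen.add(nbr)
                    next_frontier.append((nbr, node))
        frontier = next_frontier
    return messages, len(seen), duplicates
```

Running `flood` with increasing `ttl` on the same graph shows both effects: the message count climbs quickly, and a growing share of those messages are duplicates.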
14Limitation of Flooding Conclusion
- Flooding increases per-node overhead
- Need for more scalable search methods
- Expanding Ring
- Random Walks
15Expanding Ring
- Adaptively adjust the TTL
- Multiple floods: start with TTL = 1 and increment the TTL by 2 each time until the search succeeds
- Still produces duplicate messages
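A minimal sketch of the expanding-ring procedure follows. It models each flood as "all nodes within TTL hops of the source"; the cap `max_ttl` and the function names are assumptions, since the slide does not say when to give up.

```python
def nodes_within(graph, source, ttl):
    """Return the set of nodes at most `ttl` hops from `source` (BFS)."""
    seen = {source}
    frontier = [source]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return seen


def expanding_ring(graph, source, holders, max_ttl=9):
    """Flood with TTL = 1, 3, 5, ... until a node in the set `holders` is hit.

    Returns (ttl_used, nodes_contacted_over_all_floods); ttl_used is None
    if no holder is found within `max_ttl` hops.
    """
    contacted = 0
    for ttl in range(1, max_ttl + 1, 2):
        reached = nodes_within(graph, source, ttl)
        contacted += len(reached) - 1  # each flood restarts from the source
        if reached & holders:
            return ttl, contacted
    return None, contacted
```

Because every flood starts over from the source, nearby nodes are contacted again in each round, which is one way to see why duplicate messages remain.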
16Random Walk
- Simple random walk
- Takes too long to find anything
- Multiple-walker random walk
- K walkers, each walking T steps, visit about as many nodes as 1 walker walking KT steps
- More messages mean more overhead
- When to terminate the search
- TTL
- Checking: walkers check back with the query originator once every C steps
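The multiple-walker search with both termination rules can be sketched as below. This is a simplified model, not the authors' simulator: all walkers move in lockstep, a successful walker keeps walking until the next check, and check-back messages are not counted. The `ttl` default is an arbitrary assumption.

```python
import random


def k_walker_search(graph, source, holders, k=32, ttl=1024,
                    check_every=4, rng=None):
    """Multiple-walker random walk with a TTL and periodic check-back.

    `graph` maps each node to a non-empty list of neighbors.
    Returns (found, steps_taken, walker_messages).
    """
    rng = rng or random.Random()
    walkers = [source] * k
    found = False
    messages = 0
    for step in range(1, ttl + 1):
        for i, node in enumerate(walkers):
            walkers[i] = rng.choice(graph[node])  # one hop = one message
            messages += 1
            if walkers[i] in holders:
                found = True
        # Walkers learn that the query is satisfied only when they check
        # back with the originator, once every `check_every` steps.
        if found and step % check_every == 0:
            return True, step, messages
    return found, ttl, messages
```

A larger `check_every` reduces check-back traffic but lets walkers keep going for longer after the object has already been found.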
17Search Traffic Comparison
18Search Delay Comparison
19Lessons Learned about Search Methods
- Key: cover the right number of nodes as quickly as possible and with as little overhead as possible
- Pay attention to
- Adaptive termination
- Minimizing message duplication
- Small expansion in each step
20Replication
- In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object, so replication density matters
- Goal: minimize average search size (number of probes until the query is satisfied)
- Theoretical optimum: copy everything everywhere
- But node storage is limited
21Replication Strategies
- Uniform Replication
- p_i = 1/m
- Simple, resources are divided equally
- Proportional Replication
- p_i ∝ q_i
- Fair, resources per item proportional to demand
- Reflects current P2P practices
22Square-Root Replication
- p_i ∝ √q_i
- Lies in between Uniform and Proportional
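A short derivation shows why the square-root rule arises. It assumes a simplified model that the slides do not spell out: probes hit nodes at random, so the expected search size for object i is inversely proportional to its replication share p_i; the q_i sum to 1; and the p_i sum to 1 because total storage is fixed. Constant factors are dropped.

```latex
\begin{aligned}
&\text{minimize } A(p) = \sum_{i=1}^{m} \frac{q_i}{p_i}
  \quad \text{subject to } \sum_{i=1}^{m} p_i = 1, \\[4pt]
&\frac{\partial}{\partial p_i}
  \Big[ \sum_{j} \frac{q_j}{p_j} + \lambda \Big( \sum_{j} p_j - 1 \Big) \Big]
  = -\frac{q_i}{p_i^{2}} + \lambda = 0
  \;\Longrightarrow\;
  p_i = \frac{\sqrt{q_i}}{\sum_{j} \sqrt{q_j}}, \\[4pt]
&A_{\text{uniform}} = \sum_{i} q_i \, m = m, \qquad
 A_{\text{proportional}} = \sum_{i} \frac{q_i}{q_i} = m, \qquad
 A_{\text{square-root}} = \Big( \sum_{i} \sqrt{q_i} \Big)^{2} \le m .
\end{aligned}
```

Under this model, Uniform and Proportional give the same average search size, and the square-root allocation is never worse than either (the last inequality is Cauchy-Schwarz).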
23Achieving Square-Root Replication I
- Assume that each query keeps track of the number of probes needed
- Store an object at a number of nodes that is proportional to the number of probes
- Two implementations
- Path replication: store the object along the path of a successful walk
- Random replication: store the object randomly among the nodes visited by the walkers
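The two implementations can be sketched as follows. This is an illustration under assumptions: `stores` is a hypothetical dict mapping each node to the set of objects it holds, storage limits and eviction are omitted, and the number of random copies is passed in by the caller (for example, the length of the successful path).

```python
import random


def path_replication(stores, path, obj):
    """Store `obj` at every node on the successful walker's path."""
    for node in path:
        stores[node].add(obj)


def random_replication(stores, visited, obj, count, rng=None):
    """Store `obj` at `count` nodes drawn at random from the visited nodes."""
    rng = rng or random.Random()
    candidates = list(visited)
    for node in rng.sample(candidates, min(count, len(candidates))):
        stores[node].add(obj)
```

In both cases the number of new copies grows with the number of probes the query needed, which is the property the slide relies on.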
24Achieving Square-Root Replication II
25Evaluation of Replication Methods I
- Metrics
- Overall message traffic
- Search delay
- Dynamic simulation
- Assume Zipf-like object query probability
- Poisson query arrivals at 5 queries/sec
- Results are measured from 5,000 sec to 9,000 sec
- Search method: 32-walker random walk with state keeping, checking every 4 steps
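The query workload described above can be generated with a few lines. This sketch is an assumption-laden illustration, not the authors' simulator: the Zipf exponent `alpha`, the seed, and the function name are hypothetical, and only the arrival rate of 5 queries/sec comes from the slide.

```python
import random


def query_stream(m, rate=5.0, duration=100.0, alpha=1.0, seed=0):
    """Yield (time, object_id) pairs for `duration` simulated seconds.

    Arrivals are Poisson with `rate` queries/sec; object popularity is
    Zipf-like over object ids 0..m-1 (id 0 is the most popular).
    """
    rng = random.Random(seed)
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    objects = list(range(m))
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gaps give Poisson arrivals
        if t >= duration:
            return
        yield t, rng.choices(objects, weights=weights)[0]
```

To mimic the measurement window, a caller could run the stream for 9,000 simulated seconds and keep statistics only for events with a timestamp of at least 5,000.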
26Evaluation of Replication Methods II
Square-Root Replication reduces search traffic
27Evaluation of Replication Methods III
28Conclusions
- Multi-walker random walk scales much better than flooding
- Can find data more quickly
- Reduces the traffic load
- Square-root replication distribution is desirable
- Minimizes search delay
- Minimizes the overall search traffic