Title: On Unbiased Sampling for Unstructured Peer-to-Peer Networks
1. On Unbiased Sampling for Unstructured Peer-to-Peer Networks
- Daniel Stutzbach, Reza Rejaie, Nick Duffield,
Subhabrata Sen, Walter Willinger
IMC 2006
2. Outline
- Introduction
- Existing Sampling Techniques
- MRWB Sampling Technique
- Evaluation
- Discussions
3. Sampling in P2P Systems
- Difficult to capture global behavior of p2p systems
- Large scale: millions of peers
- Dynamics
- Sampling in p2p systems
- Explore part of system
- Select representative samples of peer properties
4. Challenges in Sampling P2P Systems
- Obtain representative samples
- Unbiased sampling
- Select samples uniformly at random
- Biased sampling in p2p systems due to
- Temporal dynamics
- Topological structure
5. Sampling Bias: Temporal Dynamics
- General model of p2p system
- Overlay network G(V, E)
- At time t: G_t(V_t, E_t)
- Sampling time ?
- Sample set
- Peer session lengths are highly skewed
- Bias towards short-lived peers (lifetime < sampling time)
- 50% long-lived peers that stay all the time
- 50% short-lived peers with lifetime < sampling time, each replaced by a new short-lived peer
- Uniform sampling, repeated over time
- More than 50% short-lived peers in sample set
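The arithmetic behind this example can be sketched in a few lines. This is a minimal illustration, not the authors' model: the function name, the window and lifetime values, and the assumption that every departing short-lived peer is replaced immediately are all hypothetical.

```python
import math

def short_lived_share(n_long, n_short_slots, window, lifetime):
    """Share of short-lived peers among all distinct peers seen in a window.

    Assumption: each short-lived slot is refilled by a fresh peer as soon
    as the previous one leaves, and long-lived peers never leave.
    """
    # Each slot hosts ceil(window / lifetime) distinct peers during the window.
    generations = math.ceil(window / lifetime)
    distinct_short = n_short_slots * generations
    return distinct_short / (n_long + distinct_short)

# 50 long-lived peers, 50 short-lived slots, window 10x the short lifetime.
print(round(short_lived_share(50, 50, window=10, lifetime=1), 3))  # 0.909
```

With a window no longer than one short lifetime the share is exactly one half; the longer the window, the more the set of distinct peers is dominated by short-lived ones.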
6. Sampling Bias: Temporal Dynamics
- Bias can be avoided!
- Sample peer properties instead of peers
- It must be possible to sample from the same peer more than once, at different points in time
- Vi(t1) ≠ Vi(t2)
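A small simulation can illustrate why sampling whoever is present at each instant removes this bias. It is only a sketch under the same toy assumptions as the 50/50 example (fixed number of long-lived peers, one short-lived peer per slot at any instant); the function name and seed are hypothetical.

```python
import random

def snapshot_short_share(n_long, n_short_slots, n_samples, seed=0):
    """Estimate the short-lived share when sampling peers present at an instant.

    Assumption: at any instant the system holds n_long long-lived peers plus
    one short-lived peer per slot, so the same long-lived peer may be drawn
    many times at different instants.
    """
    rng = random.Random(seed)
    population = n_long + n_short_slots
    hits = 0
    for _ in range(n_samples):
        # Draw one currently present peer uniformly at random;
        # indices >= n_long stand for the short-lived slots.
        if rng.randrange(population) >= n_long:
            hits += 1
    return hits / n_samples
```

Because each draw is taken from the instantaneous population, the estimated share stays close to the true 50% no matter how long the measurement runs.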
7. Sampling Bias: Topological Structure
- Basic operations in p2p sampling
- Query known peers about neighbors
- Bias towards nodes with high degrees
8. Existing Sampling Techniques - BFS
- Breadth-first Search (BFS)
- Crawl a portion of overlay topology
- Samples are unique peers
- Problems
- Bias towards short-lived peers
- Bias towards high degree peers
- Sampled peers are correlated through their neighbor relationships
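A partial BFS crawl can be sketched as follows. The `neighbors` dictionary is a hypothetical stand-in for the "query a known peer about its neighbors" operation, and the toy overlay is invented for illustration.

```python
from collections import deque

def bfs_crawl(neighbors, start, max_peers):
    """Crawl up to max_peers unique peers breadth-first from start.

    neighbors: hypothetical dict mapping each peer to its neighbor list.
    """
    seen = [start]
    seen_set = {start}
    queue = deque([start])
    while queue and len(seen) < max_peers:
        peer = queue.popleft()
        for nb in neighbors.get(peer, []):
            if nb not in seen_set:
                seen_set.add(nb)
                seen.append(nb)
                queue.append(nb)
                if len(seen) == max_peers:
                    break
    return seen

# Toy overlay: hub "h" plus a chain a-b-c hanging off it.
overlay = {"h": ["a", "x", "y"], "a": ["h", "b"], "b": ["a", "c"],
           "c": ["b"], "x": ["h"], "y": ["h"]}
print(bfs_crawl(overlay, "c", 4))  # ['c', 'b', 'a', 'h']
```

Every sampled peer is a neighbor of an earlier sample, which is the correlation the slide refers to, and the high-degree hub is reached quickly even from the far end of the chain.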
9. Existing Sampling Techniques - RW
- Random Walk (RW)
- P: transition matrix
10. Existing Sampling Techniques - RW
- Properties
- Bias towards high degree peers
- No correlation between selected peers
- Bias due to temporal dynamics is not significant
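The degree bias of a plain random walk can be checked numerically. On a connected undirected graph, a walk with P(x, y) = 1/degree(x) has stationary probability degree(x) / (2|E|). The sketch below is illustrative only: the toy overlay and function name are hypothetical, and it iterates the lazy version of the walk (stay put with probability 1/2), which has the same stationary distribution but also converges on bipartite graphs.

```python
def rw_stationary(neighbors, iterations=2000):
    """Approximate the stationary distribution of a simple random walk."""
    nodes = list(neighbors)
    dist = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        # Lazy step: keep half of the mass in place ...
        nxt = {v: 0.5 * dist[v] for v in nodes}
        for x in nodes:
            # ... and spread the other half evenly, P(x, y) = 1/degree(x).
            share = 0.5 * dist[x] / len(neighbors[x])
            for y in neighbors[x]:
                nxt[y] += share
        dist = nxt
    return dist

# Toy undirected overlay with degrees a=3, b=2, c=2, d=1.
overlay = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
pi = rw_stationary(overlay)
print({v: round(p, 3) for v, p in pi.items()})
# {'a': 0.375, 'b': 0.25, 'c': 0.25, 'd': 0.125}
```

Peer "a" is visited three times as often as peer "d", in proportion to their degrees.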
11. MRWB Sampling Technique
- A variant of random walk
- No correlation between selected peers
- Less bias due to temporal dynamics
- Decrease bias toward high degree peers by designing a new transition matrix
12. Adjusting Degree Bias
- Terminology
- P: transition matrix of regular random walk
- Q: new transition matrix
- Target stationary distribution (a vector)
- Goal
- Design Q based on Metropolis-Hastings method (Hastings 1970)
13. Metropolis-Hastings Method
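The formula on this slide did not survive extraction. For reference, the standard Metropolis-Hastings construction is given below, using P and Q as defined on the previous slide and writing \mu for the target stationary distribution (the symbol \mu is an assumption here).

```latex
Q(x,y) =
\begin{cases}
P(x,y)\,\min\!\left(\dfrac{\mu(y)\,P(y,x)}{\mu(x)\,P(x,y)},\; 1\right) & \text{if } x \neq y \\[1.5ex]
1 - \sum_{z \neq x} Q(x,z) & \text{if } x = y
\end{cases}

% Special case: uniform \mu and P(x,y) = 1/\mathrm{degree}(x) for neighbors x, y
Q(x,y) = \min\!\left(\frac{1}{\mathrm{degree}(x)},\; \frac{1}{\mathrm{degree}(y)}\right)
```

In the special case Q is symmetric, so the uniform distribution is stationary for the modified walk.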
14. Metropolized Random Walk (MRW)
- Node x selects neighbor y based on P(x,y)
- Query y for y's degree
- Accept y as the next step with probability min(degree(x)/degree(y), 1)
- How to handle dynamics?
- Node y may not respond
- Backtracking (MRWB)
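The walk step can be sketched on a static graph as follows. The acceptance probability min(degree(x)/degree(y), 1) is what Metropolis-Hastings yields for a uniform target distribution with a simple random walk as the proposal. The overlay, function names, and seed are hypothetical, and peer dynamics are ignored here.

```python
import random

def mrw_step(neighbors, x, rng):
    """One Metropolized random walk step targeting a uniform distribution."""
    y = rng.choice(neighbors[x])  # propose y with P(x, y) = 1/degree(x)
    accept = min(len(neighbors[x]) / len(neighbors[y]), 1.0)
    # On rejection the walk stays at x for this step.
    return y if rng.random() < accept else x

def mrw_visit_shares(neighbors, start, steps, seed=0):
    """Fraction of steps spent at each peer during one long walk."""
    rng = random.Random(seed)
    counts = dict.fromkeys(neighbors, 0)
    x = start
    for _ in range(steps):
        x = mrw_step(neighbors, x, rng)
        counts[x] += 1
    return {v: c / steps for v, c in counts.items()}

# Toy undirected overlay with degrees a=3, b=2, c=2, d=1.
overlay = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
shares = mrw_visit_shares(overlay, "d", 200_000)
```

On this toy overlay every peer ends up with a visit share near 1/4, regardless of its degree.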
15. Metropolized Random Walk with Backtracking (MRWB)
- Maintain stack of visited peers
- Latest visited peer is on top of stack
- If no neighbor of x responds
- Refresh x's neighbor list
- If neighbor list is empty, pop x off the stack
- If stack is empty, random walk fails
[Figure: backtracking example. Labels: x is last step; query y; push y to stack; no response from y; pop y out of stack; query new neighbor z; push z to stack.]
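The backtracking rules can be sketched as below. This is a simplified illustration, not the authors' implementation: `query` is a hypothetical callback that returns a peer's neighbor list or `None` when the peer does not respond, each peer's neighbor list is refreshed at most once, and a refresh that yields no responsive neighbor is treated like an empty list.

```python
import random

def mrwb_walk(query, start, steps, rng):
    """Return the peer reached after `steps` proposals, or None on failure."""
    first = query(start)
    if not first:
        return None
    # Stack entry: [peer, degree, untried neighbors, already refreshed?]
    stack = [[start, len(first), list(first), False]]
    done = 0
    while done < steps:
        if not stack:
            return None  # stack is empty: the walk fails
        entry = stack[-1]  # latest visited peer is on top
        x, deg_x, untried, refreshed = entry
        if not untried:
            fresh = None if refreshed else query(x)
            if fresh:
                # Refresh x's neighbor list once and try again.
                entry[1], entry[2], entry[3] = len(fresh), list(fresh), True
            else:
                stack.pop()  # no usable neighbors: backtrack
            continue
        y = rng.choice(untried)
        y_neighbors = query(y)
        if not y_neighbors:
            untried.remove(y)  # y did not respond: try another neighbor
            continue
        done += 1
        if rng.random() < min(deg_x / len(y_neighbors), 1.0):
            stack.append([y, len(y_neighbors), list(y_neighbors), False])
    return stack[-1][0]

# Toy static overlay; dict.get returns None for unknown (departed) peers.
overlay = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
sample = mrwb_walk(overlay.get, "d", 30, random.Random(0))
```

On a static overlay no peer ever fails to respond, so the walk behaves like the plain Metropolized walk; the stack only matters when peers depart mid-walk.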
16. Performance Evaluation
- Fundamental properties that interact with the walk
- Degree
- Session lengths
- Query latency
- Expected property values obtained from crawling
- Summary statistic D
- S(x): CDF of sampled property x
- E(x): CDF of expected property x
- D = max |S(x) - E(x)|
- D > 0.1: significant bias
- D < 0.01: very little bias
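The summary statistic D is the largest gap between the two empirical CDFs, which can be computed as in this minimal sketch (the function name and sample values are illustrative).

```python
from bisect import bisect_right

def ks_statistic(sampled, expected):
    """D = max over x of |S(x) - E(x)| for two empirical CDFs."""
    s, e = sorted(sampled), sorted(expected)
    d = 0.0
    # Both CDFs are step functions, so the largest gap occurs at an
    # observed value from one of the two data sets.
    for x in s + e:
        s_x = bisect_right(s, x) / len(s)  # S(x): share of samples <= x
        e_x = bisect_right(e, x) / len(e)  # E(x): share of expected <= x
        d = max(d, abs(s_x - e_x))
    return d

print(ks_statistic([1, 1, 2, 3], [1, 2, 3, 4]))  # 0.25
```

By the thresholds on the slide, a D of 0.25 would count as significant bias.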
17. Performance Under Different Session Lengths
- Three distributions of session lengths
- Pareto
- Exponential
- Weibull
MRWB is affected primarily by the rate of local
variation in the degree ratio relative to the
time required to query peers
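Session lengths from the three distribution families can be drawn with Python's standard library, as sketched below. The slides do not give the distribution parameters, so every parameter value here is an arbitrary assumption chosen only to make the example run.

```python
import random

def draw_session_lengths(kind, n, rng):
    """Draw n session lengths (arbitrary time units) from one distribution.

    All parameter values are illustrative assumptions, not the ones used
    in the evaluation.
    """
    if kind == "pareto":
        return [rng.paretovariate(2.0) for _ in range(n)]        # shape 2.0
    if kind == "exponential":
        return [rng.expovariate(1.0) for _ in range(n)]          # rate 1.0
    if kind == "weibull":
        return [rng.weibullvariate(1.0, 0.5) for _ in range(n)]  # scale 1.0, shape 0.5
    raise ValueError(f"unknown distribution: {kind}")

rng = random.Random(0)
sessions = {k: draw_session_lengths(k, 1000, rng)
            for k in ("pareto", "exponential", "weibull")}
```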
18. Performance Under Different Topologies
- Peer discovery mechanisms
- Random Oracle
- FIFO
- Soft State
- History
- 90% invalid info.
19. Empirical Results
20. Determine Walk Length
r = O(log |V|)
Suggested: r = 25
21. Discussions
- Low efficiency
- 4% if r = 25
- Bias correlated with large-scale properties
- Clustering
- Can MRWB discover global properties?
- Network diameter?
- Number of peers?
22. Thanks!