On Unbiased Sampling for Unstructured Peer-to-Peer Networks

1
On Unbiased Sampling for Unstructured
Peer-to-Peer Networks
  • Daniel Stutzbach, Reza Rejaie, Nick Duffield,
    Subhabrata Sen, Walter Willinger

IMC 2006
2
Outline
  • Introduction
  • Existing Sampling Techniques
  • MRWB Sampling Technique
  • Evaluation
  • Discussions

3
Sampling in P2P Systems
  • Difficult to capture global behavior of p2p
    systems
  • Large scale - Millions of peers
  • Dynamics
  • Sampling in p2p systems
  • Explore part of system
  • Select representative samples of peer properties

4
Challenges in Sampling P2P Systems
  • Obtain representative samples
  • Unbiased sampling
  • Select samples uniformly at random
  • Biased sampling in p2p systems due to
  • Temporal dynamics
  • Topological structure

5
Sampling Bias: Temporal Dynamics
  • General model of p2p system
  • Overlay network G(V, E)
  • At time t: G_t(V_t, E_t)
  • Sampling time ?
  • Sample set
  • Peer sessions are highly skewed
  • Bias towards short-lived peers (lifetime < ?)
  • 50% long-lived peers stay all the time
  • 50% short-lived peers (lifetime < ?), replaced by
    new short-lived peers
  • Uniform sampling, times
  • More than 50% short-lived peers in sample set
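The example on this slide can be illustrated with a small simulation. This is a minimal sketch under assumed parameters (the function name, slot count, number of rounds, and samples per round are all hypothetical, and short-lived peers are assumed to be replaced every round): peers are drawn uniformly at random from those present at each sampling time, and every distinct peer seen is added to the sample set.

```python
import random


def short_lived_fraction(n_slots=100, n_rounds=50, samples_per_round=10, seed=0):
    """Fraction of short-lived peers in a sample set built by uniform sampling.

    Assumed model: half of the slots hold long-lived peers that never leave;
    the other half hold short-lived peers that are replaced by fresh
    short-lived peers at every round.
    """
    rng = random.Random(seed)
    half = n_slots // 2
    sample_set = set()
    for t in range(n_rounds):
        # Long-lived peers keep the same identity across rounds.
        present = [("long", i) for i in range(half)]
        # Short-lived peers get a new identity (tagged with t) each round.
        present += [("short", t, i) for i in range(half)]
        # Uniform sampling among the peers present right now.
        for peer in rng.sample(present, samples_per_round):
            sample_set.add(peer)
    n_short = sum(1 for peer in sample_set if peer[0] == "short")
    return n_short / len(sample_set)


if __name__ == "__main__":
    print(short_lived_fraction())
```

Although each individual draw is uniform over the peers present, the set of distinct peers collected over time is dominated by short-lived peers, because the same long-lived peers are drawn repeatedly while every short-lived peer drawn is new.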

6
Sampling Bias: Temporal Dynamics
  • Bias can be avoided!
  • Sample peer property instead of peer
  • It must be possible to sample from the same peer
    more than once at different points in time
  • V_i(t1) ≠ V_i(t2)

7
Sampling Bias: Topological Structure
  • Basic operations in p2p sampling
  • Query known peers about neighbors
  • Bias towards nodes with high degrees

8
Existing Sampling Techniques: BFS
  • Breadth-first Search (BFS)
  • Crawl a portion of overlay topology
  • Samples are unique peers
  • Problems
  • Bias towards short-lived peers
  • Bias towards high degree peers
  • Sampled peers are correlated through their neighbor
    relationships

9
Existing Sampling Techniques: RW
  • Random Walk (RW)
  • P: transition matrix
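For reference, the textbook definition of the simple random walk on an undirected graph is given below. This is the standard form and not necessarily the slide's exact notation; the symbols deg(x) and π are assumptions introduced here. The stationary distribution π, which holds for a connected undirected graph, shows where the degree bias comes from: a peer is visited in proportion to its degree.

```latex
P(x, y) =
\begin{cases}
  \dfrac{1}{\deg(x)} & \text{if } y \text{ is a neighbor of } x,\\[4pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
\pi(x) = \frac{\deg(x)}{2\,|E|}
```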

10
Existing Sampling Techniques: RW
  • Properties
  • Bias towards high degree peers
  • No correlation between selected peers
  • Bias due to temporal dynamics is not significant

11
MRWB Sampling Technique
  • A variant of random walk
  • No correlation between selected peers
  • Less bias due to temporal dynamics
  • Decrease bias toward high degree peers by
    designing new transition matrix

12
Adjusting Degree Bias
  • Terminology
  • P: transition matrix of regular random walk
  • Q: new transition matrix
  • Vector: target stationary distribution
  • Goal
  • Design Q based on Metropolis-Hastings method
    (Hastings 1970)

13
Metropolis-Hastings Method
  • Given
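As a hedged reconstruction, the standard Metropolis-Hastings construction is written out below. The symbol μ for the target stationary distribution is an assumption; P and Q are as defined on the previous slide. Given P and μ, the new transition matrix Q is:

```latex
Q(x, y) =
\begin{cases}
  P(x, y)\,\min\!\left( \dfrac{\mu(y)\,P(y, x)}{\mu(x)\,P(x, y)},\; 1 \right)
    & \text{if } x \neq y,\\[8pt]
  1 - \displaystyle\sum_{z \neq x} Q(x, z)
    & \text{if } x = y.
\end{cases}
```

For a uniform target μ and the simple random walk P(x, y) = 1/deg(x), the ratio inside the minimum reduces to deg(x)/deg(y), so Q(x, y) = min(1/deg(x), 1/deg(y)) for neighboring x and y. This Q is symmetric, which is why its stationary distribution is uniform.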

14
Metropolized Random Walk (MRW)
  • Node x selects neighbor y based on P(x,y)
  • Query y for y's degree
  • Accept y as the next step with Prob.
  • How to handle dynamics?
  • Node y may not respond
  • Backtracking (MRWB)
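The walk step above can be sketched on a static graph as follows. This is a minimal illustration, not the authors' implementation: the acceptance probability min(1, deg(x)/deg(y)) is the standard Metropolis-Hastings choice for a uniform target and is assumed here, and the names `mrw_step`, `visit_frequencies`, and the adjacency-dict representation are hypothetical.

```python
import random


def mrw_step(graph, x, rng):
    """One Metropolized random walk step from node x.

    graph maps each node to a list of its neighbors (undirected graph).
    """
    # Select neighbor y according to the simple random walk P(x, y).
    y = rng.choice(graph[x])
    # Accept y with probability min(1, deg(x) / deg(y)); otherwise stay at x.
    accept_prob = min(1.0, len(graph[x]) / len(graph[y]))
    return y if rng.random() < accept_prob else x


def visit_frequencies(graph, start, steps, seed=0):
    """Fraction of steps the walk spends at each node."""
    rng = random.Random(seed)
    counts = {node: 0 for node in graph}
    x = start
    for _ in range(steps):
        x = mrw_step(graph, x, rng)
        counts[x] += 1
    return {node: c / steps for node, c in counts.items()}


if __name__ == "__main__":
    # A small graph with very uneven degrees (node 0 is a hub).
    g = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
    print(visit_frequencies(g, start=0, steps=100_000))
```

On a long walk the visit frequencies should come out roughly equal for all nodes, whereas a simple random walk on the same graph would visit the hub most often.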

15
Metropolized Random Walk with Backtracking (MRWB)
  • Maintain stack of visited peers
  • Latest visited peer is on top of stack
  • If no neighbor of x responds
  • Refresh x's neighbor list
  • If neighbor list is empty, pop x out of stack
  • If stack is empty, random walk fails
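The backtracking rules above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: `query(peer)` is a hypothetical callback that returns the peer's current neighbor list, or `None` if the peer does not respond; the neighbor list is refreshed on every iteration; and a peer is popped whenever none of its refreshed neighbors respond. The acceptance probability min(1, deg(x)/deg(y)) is likewise assumed.

```python
import random


class WalkFailed(Exception):
    """Raised when the stack of visited peers runs empty."""


def mrwb_walk(start, query, steps, rng):
    """Metropolized random walk with backtracking (simplified sketch).

    query(peer) -> list of neighbor ids, or None if the peer does not respond.
    Returns the peer on top of the stack after `steps` completed walk steps.
    """
    stack = [start]  # most recently visited peer is on top
    hops = 0
    while hops < steps:
        if not stack:
            raise WalkFailed("stack is empty")
        x = stack[-1]
        x_neighbors = query(x)  # refresh x's neighbor list
        if not x_neighbors:
            stack.pop()  # x is gone or has no neighbors: backtrack
            continue
        candidates = list(x_neighbors)
        rng.shuffle(candidates)  # try neighbors in uniformly random order
        for y in candidates:
            y_neighbors = query(y)
            if not y_neighbors:
                continue  # no response from y: query another neighbor
            hops += 1
            # Metropolis acceptance based on the degree ratio.
            if rng.random() < min(1.0, len(x_neighbors) / len(y_neighbors)):
                stack.append(y)
            break
        else:
            stack.pop()  # no neighbor of x responded: backtrack
    return stack[-1]


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    departed = {3}  # peer 3 has left and never responds
    respond = lambda p: None if p in departed else graph[p]
    print(mrwb_walk(0, respond, steps=25, rng=random.Random(1)))
```

Every loop iteration either completes a walk step or pops the stack, so the walk always terminates: it returns a sample or raises `WalkFailed`.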

[Diagram: backtracking example. Labels: x is last step; Query y; Push y to stack; No response from y; Pop y out of stack; Query new neighbor z; Push z to stack]
16
Performance Evaluation
  • Fundamental properties that interact with walk
  • Degree
  • Session lengths
  • Query latency
  • Expected properties obtained from crawling
  • Summary statistic D
  • S(x): CDF of sampled property x
  • E(x): CDF of expected property x
  • D = max |S(x) - E(x)|
  • D > 0.1 ⇒ significant bias
  • D < 0.01 ⇒ very little bias
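The summary statistic D can be computed from two lists of observed values as sketched below. The function names are hypothetical, and D is taken here as the maximum absolute difference between the two empirical CDFs (the Kolmogorov-Smirnov form), which is an assumption about the slide's intent.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return a function x -> fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(sampled, expected):
    """D = max over x of |S(x) - E(x)| for two empirical CDFs."""
    S = empirical_cdf(sampled)
    E = empirical_cdf(expected)
    # Both CDFs are step functions, so the maximum gap occurs at a data point.
    points = set(sampled) | set(expected)
    return max(abs(S(x) - E(x)) for x in points)


if __name__ == "__main__":
    sampled_degrees = [1, 2, 2, 3, 5]      # illustrative values only
    expected_degrees = [1, 2, 3, 3, 4, 5]  # illustrative values only
    print(ks_statistic(sampled_degrees, expected_degrees))
```

With the thresholds on the slide, a result above 0.1 would be read as significant bias and a result below 0.01 as very little bias.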

17
Performance Under Different Session Lengths
  • Three distributions of session lengths
  • Pareto
  • Exponential
  • Weibull

MRWB is affected primarily by the rate of local
variation in the degree ratio relative to the
time required to query peers
18
Performance Under Different Topologies
  • Peer discovery mechanisms
  • Random Oracle
  • FIFO
  • Soft State
  • History
  • 90% invalid info.

19
Empirical Results
20
Determine Walk Length
r = O(log |V|)
Suggested: r = 25
21
Discussions
  • Low efficiency
  • 4% if r = 25
  • Bias correlated with large-scale properties
  • Clustering
  • Can MRWB discover global properties?
  • Network diameter?
  • Number of peers?

22
Thanks!