1
Looking at the Server-side of P2P Systems
  • Yi Qiao, Dong Lu, Fabian E. Bustamante and
    Peter A. Dinda
  • Department of Computer Science
  • Northwestern University
  • www.cs.northwestern.edu

2
What is the Server-side?
  • No architectural distinction between client and
    server in a P2P system
  • Heterogeneity of peers
  • Some peers act more like servers: the server side
  • Some act more like clients: the client side
  • The server side is important for P2P performance
  • Little attention has been given to it

3
Outline
  • Background and Motivation
  • Why schedule the server side?
  • Trace Collection and Study
  • Scheduling Methodology
  • Evaluation
  • Conclusions

4
Background
  • Peers in a P2P data-sharing system
  • Example - Gnutella
  • Phase 1: query, query answer
  • Phase 2: download, upload
  • Role as a client
  • Sends queries, downloads objects
  • Role as a server
  • Answers queries, uploads objects
  • Little research attention

5
Background (Cont.)
Phase 1: Queries and query replies in the P2P
file-sharing system
[Figure: peers P1 to P4 exchanging queries ("Shark Tale?", "Taxi?") and query replies ("Peer 3 got it!", "No idea!")]

6
Background (Cont.)
Phase 2: Download/upload of shared files
[Figure: peers P1 to P4 with download requests "Give me Taxi" and "Give me Shark Tale"]
Little attention has been given to the server side so far

7
Motivation
  • The server side is a key performance bottleneck of
    P2P data-sharing systems
  • 80% of download requests get rejected due to
    saturation of server capacity [Saroiu 2002]
  • User-limited capacity, particularly the number of
    server threads
  • 50% of all object downloads take more than one
    day [Gummadi 2003]
  • Our goal
  • Server load characterization and analysis
  • New scheduling policies to shorten the average
    response time of each download

8
Challenge
  • SRPT has been very successful in web server
    scheduling, but applying it to the P2P server
    side is trickier
  • Requests are often not for whole objects
  • P2P servers are conservative with resource
    consumption
  • Popular P2P servers often operate under
    overloaded conditions
  • Fetch-at-most-once behavior means object
    popularity does NOT follow a Zipf distribution
    [Gummadi 2003]
  • New scheduling policies based on P2P's own
    characteristics are needed

9
Outline
  • Background and Motivation
  • Why schedule the server side?
  • Trace Collection and Study
  • Scheduling Methodology
  • Evaluation
  • Conclusions

10
Trace Collection and Study
  • Trace Collection Methodology
  • Build honey pots
  • Passive monitoring of query strings
  • Download hot contents based on query popularity
  • Run honey pots
  • Make collected objects available to the community
  • Record incoming download requests
  • Arrival time, object name, requested size,
    downloaded size, service time,
  • Findings reported here are based on Gnutella traces

11
Traces in the Study
[Table: the traces differ in connection type, number of
server threads, number of shared objects, and number of requests]

12
Server Workload
  • Distribution of job interarrival time?
  • Distribution of job size?
  • What is the performance bottleneck?
  • Why scheduling?

13
Job Interarrivals
  • Job interarrivals can be well modeled by an
    exponential distribution
  • Coefficient of determination
  • Almost straight line in the semi-log plot
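
The semi-log check on this slide can be reproduced roughly as follows. This is a minimal sketch, not the authors' analysis code: it fits a straight line to the logarithm of the empirical complementary CDF and reports the coefficient of determination. The function name `semilog_fit`, the rate 0.5, and the synthetic sample standing in for a real trace are all assumptions made for illustration.

```python
import math
import random

def semilog_fit(interarrivals):
    """Least-squares line through (x, log CCDF(x)); returns (slope, r_squared)."""
    xs = sorted(interarrivals)
    n = len(xs)
    # Empirical CCDF at the i-th smallest sample; the 0.5 offset avoids log(0).
    ys = [math.log((n - i - 0.5) / n) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Synthetic interarrival times (seconds) standing in for a real trace.
random.seed(0)
gaps = [random.expovariate(0.5) for _ in range(5000)]
slope, r_squared = semilog_fit(gaps)
print(f"slope={slope:.3f}, R^2={r_squared:.4f}")
```

For exponentially distributed gaps the points fall close to a line whose slope is about minus the arrival rate, so an R^2 near 1 supports the exponential model.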

14
Job Arrivals are Independent
  • Autocorrelation of interarrival times: effectively nil
  • Job arrivals are independent of each other
  • A significant difference from web servers
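
One common way to test the independence claim is the sample autocorrelation of the interarrival series at small lags. The sketch below is an assumption about how such a check could be done, not the method used in the study; `autocorrelation` and both sample series are hypothetical.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# Independent synthetic gaps: the lag-1 value should be close to zero.
random.seed(1)
iid_gaps = [random.expovariate(1.0) for _ in range(5000)]
acf_iid = autocorrelation(iid_gaps, 1)

# A steadily growing series is strongly correlated with its own past.
ramp = [float(i) for i in range(100)]
acf_ramp = autocorrelation(ramp, 1)

print(f"iid: {acf_iid:.3f}, ramp: {acf_ramp:.3f}")
```

Values near zero across lags are what "effectively nil" correlation would look like; a series with trends or bursts would show values well away from zero.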

15
Job Sizes
  • Three different job sizes
  • Full object size
  • Requested data chunk size
  • Unique to P2P servers
  • A request is typically for only a small chunk
  • Served data chunk size
  • Unique to P2P servers
  • The client may abort a transfer and switch to
    another server
  • Known only after the job is done

16
Job Sizes (Cont.)
[Figure panels: Object Size, Requested Chunk Size, Served Chunk Size]
  • Three different job sizes
  • They differ by several orders of magnitude
  • Approximated by a Bounded Pareto distribution
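
For readers who want to generate job sizes of this shape, a Bounded Pareto variate can be drawn by inverting its CDF, F(x) = (1 - (L/x)^a) / (1 - (L/H)^a) on [L, H]. The sketch below uses made-up parameters (shape 1.1, bounds of 1 KB and 1 GB); the slides do not give the fitted values.

```python
import random

def bounded_pareto(alpha, low, high, rng=random):
    """Draw one Bounded Pareto sample on [low, high] by inverse-CDF sampling."""
    u = rng.random()
    tail = (low / high) ** alpha
    return low / (1.0 - u * (1.0 - tail)) ** (1.0 / alpha)

# Hypothetical parameters, chosen only to span several orders of magnitude.
ALPHA, LOW, HIGH = 1.1, 1e3, 1e9
random.seed(2)
sizes = sorted(bounded_pareto(ALPHA, LOW, HIGH) for _ in range(10000))
median = sizes[len(sizes) // 2]
print(f"min={sizes[0]:.0f} median={median:.0f} max={sizes[-1]:.0f}")
```

Most samples stay close to the lower bound while a few are far larger, which is the heavy-tailed behavior that makes size-based scheduling attractive.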

17
Server Resource Utilization
  • Resource utilization is conservative
  • Servers run only in the background on ordinary
    computers
  • Users set upper bounds on
  • Number of server threads
  • Aggregate bandwidth usage for upload
  • For our busiest honey pot
  • 1.2% to 20.0% CPU utilization
  • Up to 20 MB memory usage
  • Bottleneck resource
  • The set of server threads for uploading

18
Our Scheduling Problem
Given the total number of concurrent jobs that a
server can take, how should incoming jobs be scheduled
so that the mean response time is minimized?

19
Outline
  • Background and Motivation
  • Why schedule the server side?
  • Trace Collection and Study
  • Scheduling Methodology
  • Evaluation
  • Conclusions

20
Scheduling Policies
  • Shortest Remaining Processing Time (SRPT)
  • Always choose the process with the shortest
    remaining processing time to serve
  • First-Come-First-Served (FCFS)
  • Serve incoming download requests based on arrival
    order
  • Used by Gnutella for its job scheduling
  • Processor Sharing (PS)
  • Each job gets an equal amount of service time in
    turn
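
To make the difference between the policies concrete, here is a minimal sketch that compares FCFS with preemptive SRPT. It is deliberately simplified and is not the simulator used in the talk: it assumes a single server with unit service rate, known job sizes, and no queue limit, and it leaves PS out. The function `mean_response_time` and the three example jobs are hypothetical.

```python
import heapq

def mean_response_time(jobs, policy):
    """jobs: list of (arrival_time, size). policy: "FCFS" or "SRPT".

    Single server, unit service rate; decisions are revisited at each arrival.
    """
    jobs = sorted(jobs)
    pending = []  # heap of (priority key, sequence number, arrival, remaining)
    t, i, total = 0.0, 0, 0.0
    while i < len(jobs) or pending:
        if not pending:
            t = max(t, jobs[i][0])  # idle until the next arrival
        while i < len(jobs) and jobs[i][0] <= t:
            arrival, size = jobs[i]
            key = arrival if policy == "FCFS" else size
            heapq.heappush(pending, (key, i, arrival, size))
            i += 1
        _, seq, arrival, remaining = heapq.heappop(pending)
        next_arrival = jobs[i][0] if i < len(jobs) else float("inf")
        if remaining <= next_arrival - t:
            t += remaining  # the job finishes before anything new arrives
            total += t - arrival
        else:
            remaining -= next_arrival - t  # run until the next arrival
            t = next_arrival
            key = arrival if policy == "FCFS" else remaining
            heapq.heappush(pending, (key, seq, arrival, remaining))
    return total / len(jobs)

example = [(0, 10), (1, 2), (2, 1)]  # one large job followed by two small ones
fcfs = mean_response_time(example, "FCFS")
srpt = mean_response_time(example, "SRPT")
print(f"FCFS={fcfs:.3f} SRPT={srpt:.3f}")  # FCFS=10.667 SRPT=5.667
```

Under FCFS the two small jobs wait behind the large one; under SRPT they finish first and the large job completes at the same time as before, so the mean drops.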

21
SRPT
  • Studied since the 1960s [Schrage 1968]
  • Used for various applications
  • Packet network scheduling [Bux 1983]
  • Scheduling for web servers [Harchol-Balter 2001]
  • Optimal for mean response time of jobs for a
    general G/G/1 queuing system
  • Problem
  • In most cases, service time is unknown until the
    job is done

22
SRPT for P2P Servers
  • Main Challenge
  • How to estimate service time for a request is not
    that clear!
  • File size / Requested Chunk size / Served chunk
    size?
  • One possible approach
  • Use requested chunk size as the scheduling metric
  • SRPT-CS: uses requested chunk size
  • Two optimal approaches
  • Use served chunk size as the scheduling metric
  • SRPT-SS: uses served chunk size
  • Ideal SRPT
  • How well can they do?
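
As an illustration of the SRPT-CS idea, the waiting requests can be kept in a priority queue keyed by requested chunk size, so that a free upload thread always picks the smallest one. This is a sketch under assumptions, not the authors' implementation: the class `SrptCsQueue`, its methods, and the peer names are hypothetical, and preemption of transfers already in progress is not modeled.

```python
import heapq

class SrptCsQueue:
    """Waiting requests ordered by requested chunk size (smallest first)."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: equal sizes are served in arrival order

    def submit(self, request_id, requested_chunk_bytes):
        heapq.heappush(self._heap, (requested_chunk_bytes, self._seq, request_id))
        self._seq += 1

    def next_request(self):
        """Return the id of the request for a free upload thread, or None."""
        if not self._heap:
            return None
        _, _, request_id = heapq.heappop(self._heap)
        return request_id

queue = SrptCsQueue()
queue.submit("peer-a", 4_000_000)
queue.submit("peer-b", 256_000)
queue.submit("peer-c", 1_000_000)
order = [queue.next_request() for _ in range(3)]
print(order)  # ['peer-b', 'peer-c', 'peer-a']
```

The requested chunk size is available when the request arrives, which is why this variant can be applied directly, whereas served chunk size and actual service time are known only afterwards.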

23
Approximating ideal SRPT
  • Depends on the correlations between Requested
    Chunk Size, Served Chunk Size and Service time
  • But these correlations are weak
  • Why?
  • Client can exit anytime during transmission
  • Client can switch to other servers for a data
    chunk
  • Bandwidth bottlenecks exist somewhere else

24
Outline
  • Background and Motivation
  • Why schedule the server side?
  • Trace Collection and Study
  • Scheduling Methodology
  • Evaluation
  • Conclusions

25
Evaluation
  • Evaluation Setup
  • Using a general-purpose queuing simulator
  • Various scheduling policies
  • Trace-driven simulations
  • Queue capacity: 500
  • System load between 0.1 and 10
  • Time slice of 0.01 seconds for PS scheduling
  • Metrics
  • Mean response time
  • Rejection rate
  • Mean slowdown
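
The three metrics can be computed from per-job simulation records as sketched below. The record layout and the function `summarize` are assumptions made for illustration, and slowdown is taken in its usual sense of response time divided by service time; the slides do not spell out the definition.

```python
def summarize(jobs):
    """jobs: list of (arrival, finish, service_time); finish is None if rejected.

    Assumes at least one job completed.
    """
    completed = [(a, f, s) for a, f, s in jobs if f is not None]
    responses = [f - a for a, f, _ in completed]
    slowdowns = [(f - a) / s for a, f, s in completed]
    return {
        "mean_response_time": sum(responses) / len(responses),
        "rejection_rate": (len(jobs) - len(completed)) / len(jobs),
        "mean_slowdown": sum(slowdowns) / len(slowdowns),
    }

# Three hypothetical jobs; the second was rejected because the queue was full.
stats = summarize([(0.0, 4.0, 2.0), (1.0, None, 3.0), (2.0, 8.0, 3.0)])
print(stats)
```

Mean slowdown is the fairness-oriented metric here: it shows whether long jobs pay a disproportionate waiting penalty relative to their own length.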

26
Improved Mean Response Time
[Figure: mean response time; curves labeled FCFS, PS, SRPT-CS, SRPT-SS, SRPT]
  • Ideal SRPT is the best
  • SRPT-CS does much better than FCFS and PS

27
With Lowest Rejection Rate
[Figure: rejection rate; curves labeled FCFS, SRPT, SRPT-CS, SRPT-SS]
  • SRPT-based scheduling policies actually reject
    fewer jobs than FCFS and PS

28
Without Compromising Fairness
  • SRPT-based scheduling policies do not starve large
    jobs
  • Mean slowdown for 10 largest jobs

29
Summary
  • Server-side of P2P is critical to overall system
    performance
  • Not much can be learned from web server
    scheduling
  • SRPT-based scheduling policies can help
  • Lowest mean response time
  • Lowest rejection rate
  • Without compromising fairness
  • Chunk size is a reasonable estimator for service
    time
  • SRPT-CS outperforms FCFS and PS

30
Ongoing Work
  • Large performance gaps between SRPT-CS, SRPT-SS,
    and SRPT
  • Only SRPT-CS can be directly implemented
  • Possible solution: predicting served chunk size
    and service time using time series analysis
  • Trace representativeness
  • Performance in real implementation
  • Cooperative downloading/uploading?
  • Better estimator

31
For more information
  • www.aqualab.cs.northwestern.edu
  • Please also see our related work
  • Dong Lu, Huanyuan Sheng, Peter Dinda. "Size-Based
    Scheduling Policies with Inaccurate Scheduling
    Information." In Proc. of MASCOTS, 2004.
  • Dong Lu, Peter A. Dinda, Yi Qiao, Huanyuan Sheng
    and Fabián E. Bustamante. "Applications of SRPT
    Scheduling with Inaccurate Information." In Proc.
    of MASCOTS, 2004.