Components of a Scalable Distributed Relational Information Service

1
Components of a Scalable Distributed Relational
Information Service
  • Dong Lu
  • June 14, 2005

2
Outline
  • Bird's-Eye View
  • What is RGIS?
  • Architecture
  • What components are studied in the thesis?
  • Size-Based Scheduling With Inaccurate Info
  • Fairness and efficiency as a function of
    correlation
  • Other applications beyond RGIS
  • DualPats: Characterizing and Predicting TCP
    Throughput on the Wide Area Network
  • Why TCP throughput prediction?
  • Flow size / TCP throughput correlation
  • Issues with simple benchmarking
  • DualPats algorithm and dynamic rate adjustment
  • Thesis Contributions

3
RGIS
  • Grid computing
  • Providing dependable, reliable, consistent,
    pervasive, and unlimited computing resources
  • RGIS: Relational Grid Information Service
  • Represents globally distributed resources,
    including the network
  • The relational model allows complex
    compositional queries
  • The relational model is well studied and has a
    large user population
  • RGIS servers are distributed among multiple
    organizations and sites

4
Query and Update Example
  • A query example
  • Find a set of 16 Linux machines on the same LAN,
    each with more than 1 GB of memory, with a total
    memory of at least 32 GB, and each with a link
    capacity > 100 Mb/s
  • An update example
  • Host A has added 1 GB of memory and will be
    available from 1:00 PM to 6:00 PM Central Time
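
A minimal sketch of how the query example above could be written as a single relational query. It uses SQLite through Python's `sqlite3` module as a stand-in for the Oracle 9i back end, and the one-table schema (`host` with columns `os`, `lan`, `mem_mb`, `link_mbps`) is a hypothetical simplification, not the RGIS schema. The key step: on any LAN, the largest total memory that 16 qualifying hosts can reach is the sum of its 16 largest, so the query ranks hosts per LAN and checks that sum. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real RGIS schema is much richer.
SCHEMA = """
CREATE TABLE host (
    name      TEXT PRIMARY KEY,
    os        TEXT,
    lan       TEXT,     -- identifier of the LAN the host is on
    mem_mb    INTEGER,  -- installed memory in MB
    link_mbps INTEGER   -- link capacity in Mb/s
);
"""

# 16 Linux hosts on one LAN, each with > 1 GB of memory and a > 100 Mb/s link,
# totalling at least 32 GB. Rank qualifying hosts per LAN by memory; a LAN works
# exactly when its 16 largest qualifying hosts together reach 32 GB.
QUERY = """
WITH eligible AS (
    SELECT name, lan, mem_mb,
           ROW_NUMBER() OVER (PARTITION BY lan ORDER BY mem_mb DESC, name) AS rk
    FROM host
    WHERE os = 'Linux' AND mem_mb > 1024 AND link_mbps > 100
),
top16 AS (SELECT * FROM eligible WHERE rk <= 16),
feasible AS (
    SELECT lan FROM top16
    GROUP BY lan
    HAVING COUNT(*) = 16 AND SUM(mem_mb) >= 32 * 1024
)
SELECT t.lan, t.name
FROM top16 AS t JOIN feasible AS f ON t.lan = f.lan
ORDER BY t.lan, t.rk;
"""


def build_demo_db():
    """In-memory database: lan-a satisfies the query, lan-b does not."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [(f"a{i:02d}", "Linux", "lan-a", 2048, 1000) for i in range(20)]
    # lan-b has 20 qualifying hosts, but its best 16 total only 24 GB.
    rows += [(f"b{i:02d}", "Linux", "lan-b", 1536, 1000) for i in range(20)]
    conn.executemany("INSERT INTO host VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_machines(conn):
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for lan, name in find_machines(build_demo_db()):
        print(lan, name)
```

Running the module prints the 16 hosts of `lan-a`; `lan-b` has enough qualifying hosts, but its best 16 hold only 24 GB.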

5
RGIS Architecture
  • Clients: users, applications
  • Interfaces: Web interface, SOAP interface,
    authenticated direct interface
  • Canned queries, canned approximate queries
  • Query manager and rewriter: scoping rewrite,
    nondeterminism rewrite, time bounding (and
    iteration of query)
  • Update manager
  • Content Delivery Network (CDN) interface for
    loose consistency
  • Updates are encrypted on the network using
    asymmetric cryptography; only those with the
    appropriate keys have access
  • Oracle 9i front end: transactional inserts and
    updates using stored procedures, queries using
    SELECT statements (uses database access control)
  • RDBMS: Oracle 9i back end (Windows, Linux,
    Parallel Server, etc.)
  • Site-to-site
  • Schema, type hierarchy, indices, and PL/SQL
    stored procedures for each object
6
RGIS Web Interface
8
Query Components
  • GridG: the first synthetic grid generator
  • Topology [SIGMETRICS Performance Evaluation
    Review, Vol. 30, No. 4, 2003]
  • Annotation [SC03-1]
  • Query rewriting techniques to trade off query
    time against result set size
  • Nondeterministic queries [SC03-2]
  • Scoped and approximate queries [GRID03]

9
Update and CDN Components
  • Size-based scheduling with inaccurate info to
    minimize mean update time
  • Fairness and efficiency as a function of
    correlation [MASCOTS04-1]
  • P2P scheduling [LCR04], one in submission
  • Web server scheduling, in submission
  • Other applications [MASCOTS04-2]
  • Characterizing and predicting TCP throughput on
    the WAN to determine update transfer time
    [ICDCS05]

10
Update and CDN Components
  • Modeling and taming parallel TCP on the WAN to
    transfer updates faster [IPDPS05]
  • Fat-tree based end-system multicast to
    disseminate updates scalably [WCW04], one in
    submission

12
Scheduling Section Outline
  • Review of Size-Based Scheduling
  • Motivation
  • Simulation Setup
  • Simulation Results
  • New Applications

13
The scheduling problem
Scheduling is a general problem. Goal: minimize the
mean response time and be fair
  • Updates come from the CDN and wait at the
    scheduler (e.g., updates of size 10K, 8K, 6K,
    and 3K)
  • Which update should run next on the database?
Response time: the time from a job's arrival to its
completion
14
Review of non-size-based scheduling
  • FCFS, PS, etc.
  • FCFS: First-Come, First-Served
  • Intuitive
  • Easiest to implement
  • PS: Processor Sharing
  • Fair: all jobs receive an equal share of the
    resources
  • Also easy to implement

Problem: unaware of job size information, which
results in high mean response time
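
To make the problem concrete, the sketch below computes FCFS and PS response times for the four queued updates from the scheduling-problem slide. It assumes, hypothetically, that all four updates arrive at time 0 in the order 10K, 8K, 6K, 3K and that processing time equals size; neither assumption comes from the slides.

```python
def fcfs(sizes):
    """Response times under FCFS when every job arrives at time 0 (served in list order)."""
    t, out = 0.0, []
    for s in sizes:
        t += s
        out.append(t)
    return out


def processor_sharing(sizes):
    """Response times under PS when every job arrives at time 0.

    With n jobs present each receives 1/n of the server, so jobs finish in
    order of size; between completions the clock advances by
    (work left by the next-smallest job) * (number of jobs still present).
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    out = [0.0] * len(sizes)
    t, served, present = 0.0, 0.0, len(sizes)
    for i in order:
        t += (sizes[i] - served) * present
        served = sizes[i]  # service every remaining job has received so far
        out[i] = t
        present -= 1
    return out


def mean(xs):
    return sum(xs) / len(xs)


sizes = [10, 8, 6, 3]  # the queued updates, in K
print(fcfs(sizes), mean(fcfs(sizes)))                            # [10.0, 18.0, 24.0, 27.0] 19.75
print(processor_sharing(sizes), mean(processor_sharing(sizes)))  # [27.0, 25.0, 21.0, 12.0] 21.25
```

Both policies ignore size, and both make the 3K update wait behind larger work. Running the updates smallest first (3, 6, 8, 10) instead completes them at 3, 9, 17, and 27, for a mean of 14.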
15
Review of size-based scheduling
  • SRPT, FSP, etc.
  • Use the job size (processing time, service time)
    information for scheduling
  • Optimal in mean response time
  • Fair?
  • Easy to implement?

We use Job Size to refer to the Processing Time
(Service Time) of the job
16
Shortest Remaining Processing Time (SRPT)
  • Always serve the job with the minimum remaining
    processing time first; preemptive scheduling
  • Yields the minimum mean response time [Schrage,
    Operations Research, 1968]
  • Surprisingly, it is fair for heavy-tailed job
    size distributions [Bansal and Harchol-Balter,
    Sigmetrics '01]
  • Easy to implement?
  • With accurate a priori job size information: YES
  • Otherwise: NO
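
A minimal event-driven sketch of preemptive SRPT on a single server. The optional `estimates` argument is our own hypothetical way to model an SRPT-E-style scheduler: priority comes from estimated remaining work (estimate minus service received so far), while completion still depends on the true size. The slides do not say how SRPT-E treats a job that outlives its estimate; in this sketch such a job simply keeps top priority.

```python
def simulate_srpt(arrivals, sizes, estimates=None):
    """Preemptive SRPT on one server; returns response times in input order.

    If `estimates` is given, jobs are prioritized by estimated remaining work
    (estimate - service received); the true `sizes` decide when jobs finish.
    """
    n = len(arrivals)
    est = list(sizes) if estimates is None else list(estimates)
    order = sorted(range(n), key=lambda i: arrivals[i])
    remaining = [float(s) for s in sizes]
    attained = [0.0] * n
    response = [0.0] * n
    active = []
    t, k = 0.0, 0  # current time, index of the next arrival in `order`
    while k < n or active:
        if not active:  # server idle: jump to the next arrival
            t = max(t, arrivals[order[k]])
        while k < n and arrivals[order[k]] <= t:
            active.append(order[k])
            k += 1
        # Serve the job with the least (estimated) remaining work; ties by index.
        j = min(active, key=lambda i: (est[i] - attained[i], i))
        next_arrival = arrivals[order[k]] if k < n else float("inf")
        # Run it until it finishes or the next arrival may preempt it.
        run = min(remaining[j], next_arrival - t)
        t += run
        remaining[j] -= run
        attained[j] += run
        if remaining[j] <= 1e-12:
            active.remove(j)
            response[j] = t - arrivals[j]
    return response
```

With the four updates all arriving at time 0, `simulate_srpt([0, 0, 0, 0], [10, 8, 6, 3])` returns `[27.0, 17.0, 9.0, 3.0]` (mean 14). Passing `estimates=[1, 2, 3, 4]`, which reverses the true order, turns the schedule into FCFS (mean 19.75), the kind of degradation the correlation study quantifies.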

17
Fair Sojourn Protocol (FSP)
  • Combines SRPT with PS; preemptive scheduling
  • Mean response time is close to that of SRPT, and
    it is fairer than both SRPT and PS [Friedman, et
    al., Sigmetrics '03]
  • Easy to implement?
  • With accurate a priori job size information: YES
  • Otherwise: NO
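
FSP can be sketched in the same style: it runs a virtual PS system alongside the real server and always serves the unfinished job that completes earliest in that virtual PS system. This follows the published description of FSP [Friedman, et al., Sigmetrics '03] as we understand it and is not the thesis simulator; the hypothetical `estimates` parameter feeds estimated sizes into the virtual PS system, roughly in the spirit of FSP-E.

```python
EPS = 1e-12


def simulate_fsp(arrivals, sizes, estimates=None):
    """Fair Sojourn Protocol on one server; returns response times in input order.

    A virtual PS system runs on (estimated) sizes. The real server works on the
    unfinished job that completes earliest in the virtual system: jobs already
    done virtually come first (by virtual finish time), then the rest in order
    of virtual remaining work, which is their PS completion order.
    """
    n = len(arrivals)
    est = list(sizes) if estimates is None else list(estimates)
    order = sorted(range(n), key=lambda i: arrivals[i])
    rem = [float(s) for s in sizes]  # real remaining work
    vrem = {}                        # job -> remaining work in the virtual PS system
    vdone = {}                       # job -> virtual completion time
    unfinished = set()               # jobs not yet finished on the real server
    response = [0.0] * n

    def priority(i):
        return (0, vdone[i], i) if i in vdone else (1, vrem[i], i)

    t, k = 0.0, 0
    while k < n or unfinished:
        while k < n and arrivals[order[k]] <= t + EPS:
            j = order[k]
            k += 1
            unfinished.add(j)
            vrem[j] = float(est[j])
        running = min(unfinished, key=priority) if unfinished else None
        # Advance to the next event: an arrival, a virtual completion, or a real one.
        dt = arrivals[order[k]] - t if k < n else float("inf")
        if vrem:
            dt = min(dt, min(vrem.values()) * len(vrem))
        if running is not None:
            dt = min(dt, rem[running])
        if vrem:
            share = dt / len(vrem)  # PS: every virtual job gets an equal share
            for i in list(vrem):
                vrem[i] -= share
                if vrem[i] <= EPS:
                    del vrem[i]
                    if i in unfinished:
                        vdone[i] = t + dt
        if running is not None:
            rem[running] -= dt
            if rem[running] <= EPS:
                unfinished.remove(running)
                response[running] = t + dt - arrivals[running]
        t += dt
    return response
```

For a batch arriving at once, the virtual PS order is the size order, so FSP matches SRPT (`[27, 17, 9, 3]` for the four updates). With arrivals `[0, 0, 8]` and sizes `[10, 20, 5]`, FSP yields roughly `[15, 35, 5]` while SRPT yields `[10, 35, 7]`: FSP serves the late small job at once because it would finish first under PS.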

19
Motivation
  • Size-based scheduling requires accurate knowledge
    of job sizes
  • In practice, a priori job size information is not
    always available
  • All the previous work assumes perfect knowledge
    of job sizes a priori
  • How does performance depend on quality of job
    size information?

20
Correlation
We study the performance of size-based
schedulers as a function of the correlation
coefficient (Pearson's R) between actual job
sizes and estimated job sizes.
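
For reference, Pearson's R between the actual sizes $x_i$ and the estimated sizes $y_i$ of $n$ jobs is the standard sample correlation coefficient:

```latex
R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

R = 1 means the estimates are an exact increasing linear function of the actual sizes; R near 0 means there is no linear relationship between them.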
22
Trace generator
  • Inputs: correlation (Pearson's R), distribution A,
    distribution B
  • Output: a trace of correlated random pairs of X
    and Y
  • X has distribution A
  • Y has distribution B
  • X and Y are correlated with coefficient R

23
Trace generator algorithm
  • Algorithm: Normal-To-Anything (NORTA)
  • First developed by Cario and Nelson, in INFORMS
    Journal on Computing 10, 1 (1998)
  • We simplified the algorithm and were the first to
    introduce it into simulation studies of computer
    systems
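
The core NORTA idea fits in a few lines: draw a pair of standard normals with correlation rho, map each through the normal CDF to a uniform, then through the inverse CDF of the target distribution. Because these transforms change the correlation, rho has to be tuned to hit a target R; Cario and Nelson solve for it numerically, whereas this sketch simply bisects on rho using the sample correlation under a fixed seed (non-negative targets only). It is not the simplified variant used in the thesis, and the helpers `exp_inv` and `bounded_pareto_inv` and their parameters are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def exp_inv(u, mean=1.0):
    """Inverse CDF of an exponential distribution with the given mean."""
    return -mean * math.log(1.0 - u)


def bounded_pareto_inv(u, alpha=1.1, lo=1.0, hi=1e5):
    """Inverse CDF of a bounded Pareto(alpha) on [lo, hi], a common heavy-tailed size model."""
    return lo / (1.0 - u * (1.0 - (lo / hi) ** alpha)) ** (1.0 / alpha)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def norta_pairs(n, rho, inv_a, inv_b, rng):
    """n pairs (X, Y) with X from inv_a and Y from inv_b, built from normals correlated by rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Clamp away from 0 and 1 so inverse CDFs such as log(1 - u) stay finite.
        u1 = min(max(_PHI(z1), 1e-12), 1.0 - 1e-12)
        u2 = min(max(_PHI(z2), 1e-12), 1.0 - 1e-12)
        pairs.append((inv_a(u1), inv_b(u2)))
    return pairs


def calibrate_rho(target_r, inv_a, inv_b, n=5000, seed=1, iters=25):
    """Bisect on rho in [0, 1] until the sample Pearson R of the pairs matches target_r."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        xs, ys = zip(*norta_pairs(n, mid, inv_a, inv_b, random.Random(seed)))
        if pearson(xs, ys) < target_r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Reusing the same seed for calibration and generation reproduces the calibrated sample exactly. With heavier tails such as `bounded_pareto_inv`, the sample R becomes noisy, and two different marginals may not be able to reach a high target R at all; in that case the bisection simply ends near rho = 1.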

24
Scatter plot of example traces
(Two panels of Y vs. X, with R = 0.78 and R = 0.13)
25
Performance metrics
  • Mean response time (also called sojourn time or
    turnaround time)
  • Slowdown: the ratio of a job's response time to
    its size; a fairness metric
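
Both metrics are easy to compute from a simulation's output. A minimal sketch, using as example input the response times that an ideal SRPT scheduler gives four jobs of sizes 10, 8, 6, and 3 that all arrive at time 0:

```python
def mean_response_time(response):
    return sum(response) / len(response)


def slowdowns(response, sizes):
    """Per-job slowdown: response time divided by job size (1.0 means no waiting)."""
    return [r / s for r, s in zip(response, sizes)]


sizes = [10, 8, 6, 3]
response = [27, 17, 9, 3]  # SRPT order: 3 finishes at 3, 6 at 9, 8 at 17, 10 at 27
print(mean_response_time(response))  # 14.0
print(slowdowns(response, sizes))    # [2.7, 2.125, 1.5, 1.0]
```

Mean response time rewards finishing small jobs early, while slowdown shows who pays for it: here the largest job has the highest slowdown.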

26
Simulator
  • Supports M/G/1 and G/G/n/m queuing models
  • Simulator validation
  • Little's law
  • Repeated the simulations in the FSP paper
    [Friedman, et al., Sigmetrics '03]
  • Compared with available theoretical results
    [Bansal and Harchol-Balter, Sigmetrics '01]
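
Little's law (L = λW) makes a convenient self-check for a queueing simulator. Over a finite trace that ends when the system empties, the time-average number of jobs in the system, obtained by sweeping arrival and departure events, must equal (jobs / elapsed time) × mean response time; a mismatch points to bookkeeping bugs. A minimal sketch of such a check; the three-job trace in the usage note is made up:

```python
def littles_law_check(arrivals, departures):
    """Return (L, lambda * W) for a trace starting at t = 0 and ending at the last departure."""
    horizon = max(departures)
    events = sorted([(a, +1) for a in arrivals] + [(d, -1) for d in departures])
    area, in_system, last_t = 0.0, 0, 0.0
    for t, delta in events:
        area += in_system * (t - last_t)  # integrate N(t) piecewise
        in_system += delta
        last_t = t
    L = area / horizon
    lam = len(arrivals) / horizon
    W = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return L, lam * W
```

For arrivals `[0, 2, 3]` and departures `[4, 5, 9]`, both values come out to 13/9.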

27
Scheduling Policies
  • PS: processor sharing
  • Size-based scheduling policies
  • SRPT: ideal SRPT scheduler
  • SRPT-E: SRPT scheduler using estimated job sizes
  • FSP: ideal Fair Sojourn Protocol
  • FSP-E: FSP scheduler using estimated job sizes

Each simulation is repeated 20 times, and we
present the average.
29
Mean response time as a function of R
30
Slowdown (R = 0.0224)
31
Slowdown (R = 0.239)
32
Slowdown (R = 0.4022)
33
Slowdown (R = 0.5366)
34
Slowdown (R = 0.7322)
35
Slowdown (R = 0.9779)
36
Simulation Results Conclusions
  • Performance heavily depends on correlation
  • SRPT-E and FSP-E can outperform PS given an
    effective job size estimator
  • Crossover point of performance metrics is a
    function of correlation
  • Also of job size distributions (See TR
    NWU-CS-04-33)

38
New Applications: Web server scheduling (TR
NWU-CS-04-33)
  • Is file size a good estimator of a job's service
    time (processing time)? Not really (R ≈ 0.14)

(Plot: service time (wall-clock time) vs. file size)
39
New Applications: Web server scheduling
  • Domain-based estimator: much more accurate
    prediction of the service time at low overhead

40
New Applications: P2P server-side scheduling
(LCR04)
  • The server side of current file-sharing P2P
    applications is superficially similar to a web
    server
  • Both send back files upon request
  • However, a P2P application can't even know the
    file size accurately a priori
  • Partial downloads
  • Our ongoing work shows that SRPT-E performs well
    using our time-series-based job size estimators

41
Scheduling Section Summary
  • Performance of size-based scheduling policies
    depends on correlation between size estimates and
    actual sizes
  • Fairness, mean response time, etc.
  • Estimator must preserve ordering of job sizes for
    high performance
  • Performance degrades as correlation degrades
  • Effective new estimators for Web and P2P

43
DualPats Overview
  • Algorithm for predicting TCP throughput as a
    function of flow size
  • Minimal active probing
  • Dynamic probe rate adjustment
  • Explaining flow size / throughput correlation
  • Explaining why simple active probing fails
  • Large scale empirical study

44
DualPats Section Outline
  • Why TCP Throughput Prediction?
  • Particulars of Study
  • Flow Size / TCP Throughput Correlation
  • Issues with Simple Benchmarking
  • DualPats Algorithm
  • Stability and Dynamic Rate Adjustment

45
Goal
  • A library call
  • BW = PredictTransfer(src, dst, numbytes)
  • Expected time = numbytes / BW
  • Ideally, we want a confidence interval
  • (BWLow, BWHigh) = PredictTransfer(src, dst,
    numbytes, p)

46
Available Bandwidth
  • Maximum rate a path can offer a flow without
    slowing other flows
  • pathchar, cprobe, nettimer, delphi, IGI,
    pathchirp, pathload
  • Mainly for traffic engineering
  • Available bandwidth can differ significantly from
    TCP throughput
  • Not real time: takes at least tens of seconds to
    run

47
Simple TCP Benchmarking
  • Benchmark paths with a single small probe
  • BW = ProbeSize / Time
  • Widely used: Network Weather Service (NWS) and
    others (Remos benchmarking collector)
  • Not accurate for large transfers on the current
    high-speed Internet
  • Numerous papers show this and attempt to fix it

48
Fixing Simple TCP Benchmarking
  • Logs [Sudharshan]: correlate real transfer
    measurements with benchmarking measurements
  • Recent transfers needed
  • Similar-size transfers needed
  • Measurements at application-chosen times
  • CDF-matching [Swany]: correlate the CDF of real
    transfer measurements with the CDF of
    benchmarking measurements
  • Recent transfers still needed
  • Measurements at application-chosen times

49
Analysis of TCP
  • Extensive research on TCP throughput modeling in
    networking community
  • Really intended to build better TCPs
  • Difficult to use the models online because of
    hard-to-measure parameters
  • Future loss rate and RTT

51
Our Study
  • PlanetLab and additional machines
  • Located all over the world
  • Measurements of throughput
  • Wide open socket buffers (1-3 MB)
  • Simple client/server
  • scp
  • GridFTP
  • Four separate sets of measurements

52
Four sets of measurements
  • Distribution Set: for analysis of TCP throughput
    stability and distributions
  • Correlation Set: for studying the correlation
    between throughput and flow size, and for initial
    testing of the algorithm
  • Verification Set: to test our benchmarking
    mechanism
  • Online Evaluation Set: to test our online
    algorithm

53
Distribution Set
  • For analysis of TCP throughput stability and
    distributions
  • 60 randomly chosen paths among PlanetLab machines
  • 1.6 million transfers (client/server)
  • 100 KB, 200 KB, 400 KB, 10 MB flows
  • 3000 consecutive transfers per (path, flow size)
    pair

54
Correlation Set
  • For studying correlation between throughput and
    flow size, initial testing of algorithm
  • 60 randomly chosen paths among PlanetLab machines
  • 2.4 million transfers, 270 thousand runs,
    client/server
  • 100 KB, 200 KB, 400 KB, 10 MB flows
  • Run: a sweep over the flow sizes for one path

55
Verification Set
  • Test algorithm
  • 30 randomly chosen paths among PlanetLab machines
    and others
  • 4800 transfers, 300 runs, scp and GridFTP
  • 5 KB to 1 GB flows
  • Run: a sweep over the flow sizes for one path

56
Online Evaluation Set
  • Test online algorithm
  • 50 randomly chosen paths among PlanetLab machines
    and others
  • 14000 transfers, scp and GridFTP
  • 40 MB or 160 MB file, randomly chosen
  • 10 days

58
Strong Correlation Between Throughput and Flow
Size
Correlation and Verification Sets
59
An example of Strong Correlation
60
Why Does The Correlation Exist?
  • Slow start and user effects [Zhang]
  • Non-negligible startup overheads
  • Control messages in scp and GridFTP
  • Residual slow start effect
  • SACK results in slow convergence to equilibrium

62
Why Simple Benchmarking Fails
Need more than one probe to capture correlation
Probes are too small
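
A small numeric illustration of the "probes are too small" point, assuming the linear transfer-time model T = A·s + B that the "Solve For A and B" step of DualPats suggests (B a fixed startup cost, 1/A the steady-state rate). The path parameters are hypothetical, not measurements from the study.

```python
def transfer_time(size_bytes, a_sec_per_byte, b_sec):
    """Hypothetical path model: fixed startup overhead B plus A seconds per byte."""
    return a_sec_per_byte * size_bytes + b_sec


def throughput(size_bytes, a, b):
    return size_bytes / transfer_time(size_bytes, a, b)


# Made-up path: 10 MB/s steady state (A = 1e-7 s/byte) and 0.2 s of startup cost.
A, B = 1e-7, 0.2
probe = 64 * 1024   # a small benchmarking probe
big = 100 * 10**6   # a 100 MB transfer
print(f"probe estimate: {throughput(probe, A, B) / 1e6:.2f} MB/s")  # 0.32 MB/s
print(f"actual 100 MB:  {throughput(big, A, B) / 1e6:.2f} MB/s")    # 9.80 MB/s
```

The small probe is dominated by B and underestimates the large transfer's throughput about 30-fold. A single (size, time) point cannot separate A from B, which is why more than one probe is needed.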
64
Our Approach
Two consecutive probes, both larger than the
noise region
65
Our Approach
  • Two consecutive probes are integrated into a
    single probe
  • 400 KB and 800 KB probes combined in a single
    800 KB probe

(Diagram: one transfer starting at time 0; probe one
completes at T1 and probe two at T2)
66
Our Approach
(Plot: transfer time vs. flow size)
Solve For A and B
Predict Throughput For Some Other Transfer
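
Assuming the transfer time grows linearly with flow size, T(s) = A·s + B (our reading of the plot and the "Solve For A and B" step), the two timestamps of the combined probe give two equations in two unknowns: T1 when the first 400 KB have arrived and T2 when all 800 KB have. A minimal sketch; the function names and the probe timings are hypothetical:

```python
def solve_model(s1, t1, s2, t2):
    """Fit T(s) = A*s + B through two (size, time) probe points."""
    if s2 == s1:
        raise ValueError("probe sizes must differ")
    a = (t2 - t1) / (s2 - s1)  # incremental seconds per byte (1/A is the steady-state rate)
    b = t1 - a * s1            # fixed startup overhead in seconds
    return a, b


def predict_throughput(size, a, b):
    """Predicted throughput in bytes/s for a transfer of `size` bytes."""
    return size / (a * size + b)


KB = 1024
a, b = solve_model(400 * KB, 0.30, 800 * KB, 0.50)  # hypothetical probe timings
print(predict_throughput(100 * 1024 * KB, a, b))    # about 2.04e6 bytes/s
```

Here B comes out near 0.1 s and the model predicts a 100 MB transfer to take about 51.3 s. Both points come from one transfer, so the path is probed only once.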
67
Model Fit is Excellent
Low and Normally Distributed Relative Errors At
All Flow Sizes
Correlation Set
69
Stability
  • How long does the TCP throughput function remain
    stable?
  • How frequently should we probe the path?
  • What's the distribution of throughput around the
    function (i.e., the error)?

70
Throughput is Stable For Long Periods
Increasing Max/Min Throughput in Interval
Correlation Set
71
Throughput For a Given Flow Size Is Normally
Distributed In An Interval
Distribution Set
72
Online DualPats Algorithm
  • Fetch probe sequence for destination
  • Start probing process if no data exists
  • Project probe sequence ahead
  • 20-point moving average over values with the
    current sampling interval
  • Apply model using projected data
  • Return result
  • Confidence interval computed using normality
    assumptions
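
The online steps might be sketched as follows, under several assumptions of ours: each stored probe is a (T1, T2) pair for the default 400 KB / 800 KB sizes, "projecting ahead" is taken to be the 20-point moving average of T1 and T2, and the interval treats the per-probe throughput predictions in that window as normally distributed. `predict_transfer` is a hypothetical stand-in for the PredictTransfer call, not the DualPats code.

```python
from statistics import NormalDist, mean, stdev

KB = 1024
S1, S2 = 400 * KB, 800 * KB  # default probe sizes (400 KB / 800 KB)
WINDOW = 20                  # moving-average length (20 points)


def fit(t1, t2):
    """Linear transfer-time model T(s) = A*s + B through the two probe points."""
    a = (t2 - t1) / (S2 - S1)
    return a, t1 - a * S1


def predict_transfer(probes, numbytes, p=0.95):
    """Return (bw, bw_low, bw_high) in bytes/s for a transfer of `numbytes` bytes.

    `probes` holds the path's probe history, oldest first, as (T1, T2) pairs.
    """
    if not probes:
        # The online algorithm would start probing this path here.
        raise LookupError("no probe data for this path yet")
    recent = probes[-WINDOW:]
    # "Project ahead": moving average of the most recent probe timings.
    a, b = fit(mean(t1 for t1, _ in recent), mean(t2 for _, t2 in recent))
    bw = numbytes / (a * numbytes + b)
    # Interval: spread of the per-probe predictions, assumed normal.
    per_probe = [numbytes / (pa * numbytes + pb)
                 for pa, pb in (fit(t1, t2) for t1, t2 in recent)]
    sd = stdev(per_probe) if len(per_probe) > 1 else 0.0
    z = NormalDist().inv_cdf(0.5 + p / 2)
    return bw, bw - z * sd, bw + z * sd
```

The expected transfer time is then `numbytes / bw`; a more careful version would bound the interval away from zero and reject fits with a negative A.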

73
Dynamic Sampling Rate
  • Adjust the sampling interval to correspond to the
    path's stable intervals
  • Limit rate (20 to 1200 seconds)
  • Additive increase / additive decrease of the
    interval, based on the difference between the last
    two probes
  • < 5% → increase interval
  • > 15% → decrease interval
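
The interval-adjustment rule could look like the following sketch. The 20 to 1200 second limits and the 5% / 15% thresholds come from the slide; the 20-second step size, and measuring the difference as the relative change between the throughputs of the last two probes, are our assumptions.

```python
MIN_INTERVAL, MAX_INTERVAL = 20.0, 1200.0  # seconds (limits from the slide)
STEP = 20.0                                # additive step in seconds (assumed)


def next_interval(interval, prev_bw, last_bw):
    """Additive increase / additive decrease of the probing interval."""
    change = abs(last_bw - prev_bw) / prev_bw
    if change < 0.05:    # path looks stable: probe less often
        interval += STEP
    elif change > 0.15:  # path is changing: probe more often
        interval -= STEP
    return min(MAX_INTERVAL, max(MIN_INTERVAL, interval))
```

With these settings, a path that stays stable backs off from 20 s to the 1200 s cap in 59 steps.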

74
Evaluation
  • Slight conservative bias
  • About 90% of predictions have < 20% error

(Plot: P(mean error < X) vs. relative error, from
-0.4 to 0.4, for the mean relative error and the
mean abs(relative error))
Online Evaluation Set
75
Section Summary
  • Algorithm for predicting TCP throughput as a
    function of flow size
  • Minimal active probing
  • Dynamic probe rate adjustment
  • Explaining flow size / throughput correlation
  • Explaining why simple active probing fails
  • Large scale empirical study

77
Thesis Contributions
  • It is feasible to build a scalable distributed
    Relational Grid Information Service
  • RGIS architecture
  • Query rewriting
  • Trade off query time with the size of result set
  • GridG
  • First synthetic grid generator
  • Relationship between power-laws

78
Thesis Contributions
  • Size-based scheduling with imperfect info
  • DualPats: monitoring and predicting TCP
    throughput
  • TameParallelTCP: modeling and taming parallel TCP
  • FatNemo: fat-tree based end-system multicast

79
Future work
  • Integration of research components with RGIS
    system
  • Highly dynamic grid information: how to
    incorporate data from services such as RPS, NWS,
    DualPats, and Remos
  • Passive monitoring of TCP throughput
  • Understanding size-based scheduling in the
    presence of backfilling

80
Acknowledgements
  • Collaborators
  • P2P scheduling: Yi Qiao, Fabian Bustamante
  • FatNemo: Stefan Birrer, Fabian Bustamante
  • RGIS implementation: Andrew Weinrich, Jack Lange,
    Andrew Simpson

84
Current State
  • Prototyped RGIS system
  • Schema, stored SQL
  • Query manager/rewriter
  • SOAP interface, Web interface
  • Publish/subscribe based CDN
  • To be done
  • Integration of update scheduling, TCP throughput
    monitoring, end-system multicast based CDN

85
Finding Sufficiently Large Probe Size
  • Default values 400 KB / 800 KB
  • Upper bound
  • Additive increase until the prediction errors are
    less than a threshold and all have the same sign