Components of a Scalable Distributed Relational Information Service

Slides: 86

Provided by: don53

Learn more at: https://users.cs.northwestern.edu

Transcript and Presenter's Notes

1
Components of a Scalable Distributed Relational
Information Service

Dong Lu
June 14, 2005

2
Outline

Bird's-Eye View
What is RGIS?
Architecture
What components are studied in the thesis?
Size-Based Scheduling With Inaccurate Info
Fairness and efficiency as a function of correlation
Other applications beyond RGIS
DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network
Why TCP throughput prediction?
Flow size / TCP throughput correlation
Issues with simple benchmarking
DualPats algorithm and dynamic rate adjustment
Thesis Contributions

3
RGIS

Grid computing
Providing dependable, reliable, consistent, pervasive, and unlimited computing resources
RGIS: Relational Grid Information Service
Represents globally distributed resources, including the network
The relational model allows complex compositional queries
The relational model is well studied and has a large user population
RGIS servers are distributed among multiple organizations and sites

4
Query and Update Example

A query example
Find a set of 16 Linux machines on the same LAN such that each has more than 1 GB of memory, they have a total memory of at least 32 GB, and each has a link capacity greater than 100 Mb/s
An update example
Host A has added 1 GB of memory and will be available from 1:00 PM to 6:00 PM Central Time
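
The query example above can be sketched against a toy relational table. This is only an illustration under stated assumptions: the `hosts` table, its columns, and the helpers `make_demo_db` and `find_machine_set` are hypothetical and are not the RGIS schema or query interface, and SQLite stands in for the Oracle back end. Per-host conditions are filtered in SQL; the compositional conditions (same LAN, 16 hosts, at least 32 GB in total) are checked in Python.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table of hosts (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hosts "
        "(name TEXT, os TEXT, lan TEXT, mem_gb REAL, link_mbps REAL)"
    )
    rows = [(f"a{i}", "Linux", "A", 2.0, 1000.0) for i in range(16)]
    rows += [(f"b{i}", "Linux", "B", 1.5, 1000.0) for i in range(16)]
    conn.executemany("INSERT INTO hosts VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_machine_set(conn, count=16, min_mem_gb=1.0,
                     min_total_gb=32.0, min_link_mbps=100.0):
    """Return (lan, [host names]) for one satisfying set, or None."""
    rows = conn.execute(
        "SELECT lan, name, mem_gb FROM hosts "
        "WHERE os = 'Linux' AND mem_gb > ? AND link_mbps > ? "
        "ORDER BY lan, mem_gb DESC",
        (min_mem_gb, min_link_mbps),
    ).fetchall()

    # Group eligible hosts by LAN; rows are already sorted by memory.
    by_lan = {}
    for lan, name, mem in rows:
        by_lan.setdefault(lan, []).append((name, mem))

    for lan, hosts in by_lan.items():
        # The `count` largest-memory hosts give the best possible total,
        # so if they fall short no other subset on this LAN can succeed.
        best = hosts[:count]
        if len(best) == count and sum(m for _, m in best) >= min_total_gb:
            return lan, [name for name, _ in best]
    return None
```

Any set that satisfies the constraints is an acceptable answer here, which is why the function stops at the first LAN that qualifies.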

5
RGIS Architecture
[Architecture diagram; component labels:]
Users
Applications
Web Interface
SOAP Interface
Canned Queries
Canned Approximate Queries
Authenticated Direct Interface
Scoping Rewrite
Content Delivery Network Interface (for loose consistency)
Update Manager
Query Manager and Rewriter
Nondeterminism Rewrite
Time Bounding (and Iteration of Query)
Updates are encrypted on the network using asymmetric cryptography; only those with appropriate keys have access
Oracle 9i front end: transactional inserts and updates using stored procedures, queries using select statements (uses the database's access control)
RDBMS
Oracle 9i back end: Windows, Linux, Parallel Server, etc.
Site-to-site
Schema, type hierarchy, indices, and PL/SQL stored procedures for each object
6
RGIS Web Interface
7
RGIS Architecture
8
Query Components

GridG: the first synthetic grid generator
Topology [Sigmetrics Performance Evaluation Review, Vol. 30, No. 4, 2003]
Annotation [SC03-1]
Query rewriting techniques to trade off query time against result set size
Nondeterministic queries [SC03-2]
Scoped and approximate queries [GRID03]

9
Update and CDN Components

Size-based scheduling with inaccurate information to minimize mean update time
Fairness and efficiency as a function of correlation [MASCOTS04-1]
P2P scheduling [LCR04, one in submission]
Web server scheduling [in submission]
Other applications [MASCOTS04-2]
Characterizing and predicting TCP throughput on the WAN to determine update transfer time [ICDCS05]

10
Update and CDN Components

Modeling and taming parallel TCP on the WAN to transfer updates faster [IPDPS05]
Fat-tree based end-system multicast to disseminate updates scalably [WCW04, one in submission]

11
Outline

Bird's-Eye View
What is RGIS?
Architecture
What components are studied in the thesis?
Size-Based Scheduling With Inaccurate Info
Fairness and efficiency as a function of correlation
Other applications beyond RGIS
DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network
Why TCP throughput prediction?
Flow size / TCP throughput correlation
Issues with simple benchmarking
DualPats algorithm and dynamic rate adjustment
Thesis Contributions

12
Scheduling Section Outline

Review of Size-Based Scheduling
Motivation
Simulation Setup
Simulation Results
New Applications

13
The scheduling problem
Scheduling is a general problem. Goal: minimize the mean response time while being fair
[Diagram: updates come from the CDN and queue at the scheduler (sizes 10K, 8K, 6K, 3K) in front of the database]
Which update to run next?
Response time: the time from a job's arrival to its completion
14
Review of Non-size-based scheduling

FCFS, PS, etc.
FCFS: First Come, First Served
Intuitive
Easiest to implement
PS: Processor Sharing
Fair: all jobs receive equal resources
Also easy to implement

Problem: unaware of job size information, which results in high mean response time
15
Review of size-based scheduling

SRPT, FSP, etc.
Use the job size (processing time, service time)
information for scheduling
Optimal in mean response time
Fair?
Easy to implement?

We use Job Size to refer to the Processing Time
(Service Time) of the job
16
Shortest Remaining Processing Time (SRPT)

Always serve the job with the minimum remaining processing time first; preemptive scheduling
Yields minimum mean response time [Schrage, Operations Research, 1968]
Surprisingly, it is fair for heavy-tailed job size distributions [Bansal and Harchol-Balter, Sigmetrics 01]
Easy to implement?
With accurate a priori job size information, YES
Otherwise, NO
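
A minimal single-server sketch of the difference between FCFS and preemptive SRPT, assuming job sizes are known exactly. The function names and the job representation (a list of `(arrival_time, size)` tuples sorted by arrival) are illustrative choices, not part of the thesis simulator.

```python
import heapq


def fcfs_mean_response(jobs):
    """Mean response time under FCFS; jobs sorted by arrival time."""
    clock = 0.0
    total = 0.0
    for arrival, size in jobs:
        clock = max(clock, arrival) + size  # run the job to completion
        total += clock - arrival
    return total / len(jobs)


def srpt_mean_response(jobs):
    """Mean response time under preemptive SRPT on one server."""
    pending = []  # min-heap of (remaining_time, arrival_time)
    clock = 0.0
    total = 0.0
    i = 0
    n = len(jobs)
    while i < n or pending:
        if not pending:
            clock = max(clock, jobs[i][0])  # idle until the next arrival
        # Admit every job that has arrived by now.
        while i < n and jobs[i][0] <= clock:
            heapq.heappush(pending, (jobs[i][1], jobs[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(pending)
        next_arrival = jobs[i][0] if i < n else float("inf")
        # Run until the job finishes or a new arrival may preempt it.
        run = min(remaining, next_arrival - clock)
        clock += run
        if run < remaining:
            heapq.heappush(pending, (remaining - run, arrival))
        else:
            total += clock - arrival
    return total / n
```

For four jobs of sizes 10, 8, 6, and 3 that all arrive at time 0, FCFS in that order gives a mean response time of 19.75, while SRPT gives 14.0.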

17
Fair Sojourn Protocol (FSP)

Combines SRPT with PS; preemptive scheduling
Mean response time is close to that of SRPT, and it is fairer than both SRPT and PS [Friedman et al., Sigmetrics 03]
Easy to implement?
With accurate a priori job size information, YES
Otherwise, NO

18
Scheduling Section Outline

Review of Size-Based Scheduling
Motivation
Simulation Setup
Simulation Results
New Applications

19
Motivation

Size-based scheduling requires accurate knowledge
of job sizes
In practice, a priori job size information is not
always available
All the previous work assumes perfect knowledge
of job sizes a priori
How does performance depend on quality of job
size information?

20
Correlation
We study the performance of size-based schedulers as a function of the correlation coefficient (Pearson's R) between actual job sizes and estimated job sizes.
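
For reference, Pearson's R between actual and estimated sizes can be computed as below. This is the textbook sample formula; the function name `pearson_r` is an illustrative choice.

```python
import math


def pearson_r(actual, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_a * var_e)
```

An estimator that is any increasing linear function of the true size has R = 1, even if its absolute values are wrong.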
21
Scheduling Section Outline

Review of Size-Based Scheduling
Motivation
Simulation Setup
Simulation Results
New Applications

22
Trace generator
Inputs to the trace generator: correlation (Pearson's R), distribution A, distribution B

Output: correlated random pairs of X and Y
X has distribution A
Y has distribution B
X and Y have correlation R

23
Trace generator algorithm

Algorithm: Normal-To-Anything
First developed by Cario and Nelson, in INFORMS Journal on Computing 10, 1 (1998).
We simplified the algorithm and were the first to introduce it into simulation studies of computer systems
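
The core idea of a normal-to-anything transform can be sketched as follows: draw correlated standard normal pairs, map them to uniforms with the normal CDF, then map the uniforms through the inverse CDFs of the target distributions. This is a rough sketch and not the Cario and Nelson algorithm or the simplified version used in the thesis. In particular, `rho` here is the correlation in normal space, and the sketch does not search for the `rho` that yields a given Pearson's R after the transform. The names `exp_inverse_cdf` and `correlated_pairs` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def exp_inverse_cdf(mean):
    """Inverse CDF of an exponential distribution with the given mean."""
    return lambda u: -mean * math.log(1.0 - u)


def correlated_pairs(n, rho, inv_cdf_a, inv_cdf_b, seed=0):
    """Generate n (x, y) pairs with marginals A and B.

    rho is the correlation of the underlying normal pair, in [-1, 1].
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be in [-1, 1]")
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to uniforms; clamp away from 1.0 so inverse CDFs stay finite.
        u1 = min(std_normal.cdf(z1), 1.0 - 1e-12)
        u2 = min(std_normal.cdf(z2), 1.0 - 1e-12)
        pairs.append((inv_cdf_a(u1), inv_cdf_b(u2)))
    return pairs
```

A heavy-tailed marginal can be used by passing a different inverse CDF for `inv_cdf_a` or `inv_cdf_b`.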

24
Scatter plot of example traces
[Two scatter plots of Y against X: R = 0.78 and R = 0.13]
25
Performance metrics

Mean response time (also called sojourn time or turnaround time)
Slowdown: the ratio of a job's response time to its size; used as the fairness metric
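
The two metrics can be computed from per-job records as in this small sketch. The record layout `(arrival, completion, size)` and the function names are assumptions made for illustration.

```python
def job_metrics(arrival, completion, size):
    """Return (response_time, slowdown) for one job."""
    response = completion - arrival
    return response, response / size


def summarize(jobs):
    """jobs: list of (arrival, completion, size) tuples.

    Returns (mean response time, mean slowdown, maximum slowdown).
    """
    metrics = [job_metrics(a, c, s) for a, c, s in jobs]
    n = len(metrics)
    mean_response = sum(r for r, _ in metrics) / n
    mean_slowdown = sum(s for _, s in metrics) / n
    max_slowdown = max(s for _, s in metrics)
    return mean_response, mean_slowdown, max_slowdown
```

A slowdown of 1 means the job never waited; large slowdowns for some job sizes indicate that the policy treats those jobs unfairly.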

26
Simulator

Simulator
Supports M/G/1 and G/G/n/m queuing models
Simulator validation
Little's law
Repeat the simulations in the FSP paper [Friedman et al., Sigmetrics 03]
Compare with available theoretical results [Bansal and Harchol-Balter, Sigmetrics 01]

27
Scheduling Policies

PS: Processor Sharing
Size-based scheduling policies
SRPT: ideal SRPT scheduler
SRPT-E: SRPT scheduler using estimated job sizes
FSP: ideal Fair Sojourn Protocol
FSP-E: FSP scheduler using estimated job sizes

Each simulation is repeated 20 times and we present the average
28
Scheduling Section Outline

Review of Size-Based Scheduling
Motivation
Simulation Setup
Simulation Results
New Applications

29
Mean response time as function of R
30
Slowdown (R = 0.0224)
31
Slowdown (R = 0.239)
32
Slowdown (R = 0.4022)
33
Slowdown (R = 0.5366)
34
Slowdown (R = 0.7322)
35
Slowdown (R = 0.9779)
36
Simulation Results Conclusions

Performance heavily depends on correlation
SRPT-E and FSP-E can outperform PS given an
effective job size estimator
Crossover point of performance metrics is a
function of correlation
Also of job size distributions (See TR
NWU-CS-04-33)

37
Scheduling Section Outline

Review of Size-Based Scheduling
Motivation
Simulation Setup
Simulation Results
New Applications

38
New Applications: Web server scheduling (TR NWU-CS-04-33)

Is file size a good estimator of a job's service time (processing time)? Not really (R ≈ 0.14)

[Plot of service time (wall clock time) against file size]
39
New Applications: Web server scheduling

Domain-based estimator: much more accurate prediction of the service time, at low overhead

40
New Applications: P2P server-side scheduling (LCR 04)

The server side of current file-sharing P2P applications is superficially similar to a web server
Both send back files upon request
However, a P2P application cannot even know the file size accurately a priori
Partial downloads
Our ongoing work shows that SRPT-E performs well using our time-series-based job size estimators

41
Scheduling Section Summary

Performance of size-based scheduling policies
depends on correlation between size estimates and
actual sizes
Fairness, mean response time, etc.
Estimator must preserve ordering of job sizes for
high performance
Performance degrades as correlation degrades
Effective new estimators for Web and P2P

42
Outline

Bird's-Eye View
What is RGIS?
Architecture
What components are studied in the thesis?
Size-Based Scheduling With Inaccurate Info
Fairness and efficiency as a function of correlation
Other applications beyond RGIS
DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network
Why TCP throughput prediction?
Flow size / TCP throughput correlation
Issues with simple benchmarking
DualPats algorithm and dynamic rate adjustment
Thesis Contributions

43
DualPats Overview

Algorithm for predicting the TCP throughput as
function of flow size
Minimal active probing
Dynamic probe rate adjustment
Explaining flow size / throughput correlation
Explaining why simple active probing fails
Large scale empirical study

44
DualPats Section Outline

Why TCP Throughput Prediction?
Particulars of Study
Flow Size / TCP Throughput Correlation
Issues with Simple Benchmarking
DualPats Algorithm
Stability and Dynamic Rate Adjustment

45
Goal

A library call
BW = PredictTransfer(src, dst, numbytes)
Expected time = numbytes / BW
Ideally, we want a confidence interval
(BWLow, BWHigh) = PredictTransfer(src, dst, numbytes, p)
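
The arithmetic that turns a predicted bandwidth into a transfer time can be sketched as below. The prediction call itself is not implemented here; `expected_time` and `expected_time_interval` are hypothetical helper names, and bandwidth is assumed to be in bytes per second.

```python
def expected_time(numbytes, bw):
    """Expected transfer time in seconds for a predicted bandwidth."""
    if bw <= 0:
        raise ValueError("bandwidth must be positive")
    return numbytes / bw


def expected_time_interval(numbytes, bw_low, bw_high):
    """Map a bandwidth confidence interval to a transfer-time interval.

    The high bandwidth bound gives the short time bound and vice versa.
    """
    return expected_time(numbytes, bw_high), expected_time(numbytes, bw_low)
```

For example, a 40 MB transfer with a predicted bandwidth between 4 MB/s and 8 MB/s is expected to take between 5 and 10 seconds.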

46
Available Bandwidth

Maximum rate a path can offer a flow without
slowing other flows
pathchar, cprobe, nettimer, delphi, IGI,
pathchirp, pathload
mainly for traffic engineering
Available bandwidth can differ significantly from
TCP throughput
Not real time, takes at least tens of seconds to
run

47
Simple TCP Benchmarking

Benchmark paths with a single small probe
BW = ProbeSize / Time
Widely used: Network Weather Service (NWS) and others (Remos benchmarking collector)
Not accurate for large transfers on the current high-speed Internet
Numerous papers show this and attempt to fix it

48
Fixing Simple TCP Benchmarking

Logs [Sundharshan]: correlate real transfer measurements with benchmarking measurements
Recent transfers needed
Similar-size transfers needed
Measurements at application-chosen times
CDF matching [Swany]: correlate the CDF of real transfer measurements with the CDF of benchmarking measurements
Recent transfers still needed
Measurements at application-chosen times

49
Analysis of TCP

Extensive research on TCP throughput modeling in
networking community
Really intended to build better TCPs
Difficult to use the models online because of hard-to-measure parameters
Future loss rate and RTT

50
DualPats Section Outline

Why TCP Throughput Prediction?
Particulars of Study
Flow Size / TCP Throughput Correlation
Issues with Simple Benchmarking
DualPats Algorithm
Stability and Dynamic Rate Adjustment

51
Our Study

PlanetLab and additional machines
Located all over the world
Measurements of throughput
Wide open socket buffers (1-3 MB)
Simple client/server
scp
GridFTP
Four separate sets of measurements

52
Four sets of measurements

Distribution Set: for analysis of TCP throughput stability and distributions
Correlation Set: for studying the correlation between throughput and flow size, and for initial testing of the algorithm
Verification Set: test our benchmarking mechanism
Online Evaluation Set: test our online algorithm

53
Distribution Set

For analysis of TCP throughput stability and
distributions
60 randomly chosen paths among PlanetLab machines
1.6 million transfers (client/server)
100 KB, 200 KB, 400 KB, 10 MB flows
3000 consecutive transfers per (path, flow size) pair

54
Correlation Set

For studying correlation between throughput and
flow size, initial testing of algorithm
60 randomly chosen paths among PlanetLab machines
2.4 million transfers, 270 thousand runs,
client/server
100 KB, 200 KB, 400 KB, 10 MB flows
Run: a sweep of flow sizes for one path

55
Verification Set

Test algorithm
30 randomly chosen paths among PlanetLab machines
and others
4800 transfers, 300 runs, scp and GridFTP
5 KB to 1 GB flows
Run: a sweep of flow sizes for one path

56
Online Evaluation Set

Test online algorithm
50 randomly chosen paths among PlanetLab machines
and others
14000 transfers, scp and GridFTP
40 MB or 160 MB file, randomly chosen
10 days

57
DualPats Section Outline

Why TCP Throughput Prediction?
Particulars of Study
Flow Size / TCP Throughput Correlation
Issues with Simple Benchmarking
DualPats Algorithm
Stability and Dynamic Rate Adjustment

58
Strong Correlation Between Throughput and Flow
Size
Correlation and Verification Sets
59
An example of Strong Correlation
60
Why Does The Correlation Exist?

Slow start and user effects [Zhang]
Non-negligible startup overheads
Control messages in scp and GridFTP
Residual slow start effect
SACK results in slow convergence to equilibrium

61
DualPats Section Outline

Why TCP Throughput Prediction?
Particulars of Study
Flow Size / TCP Throughput Correlation
Issues with Simple Benchmarking
DualPats Algorithm
Stability and Dynamic Rate Adjustment

62
Why Simple Benchmarking Fails
Need more than one probe to capture correlation
Probes are too small
63
DualPats Section Outline

Why TCP Throughput Prediction?
Particulars of Study
Flow Size / TCP Throughput Correlation
Issues with Simple Benchmarking
DualPats Algorithm
Stability and Dynamic Rate Adjustment

64
Our Approach
Two consecutive probes, both larger than the
noise region
65
Our Approach

Two consecutive probes are integrated into a single probe
400 KB and 800 KB probes within a single 800 KB transfer

[Timeline from 0: probe one completes at T1, probe two completes at T2]
66
Our Approach
[Plot: transfer time against flow size]
Solve for A and B
Predict throughput for some other transfer
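
One way to read "solve for A and B" is as a straight-line fit of transfer time against flow size through the two probe points, T = A * S + B, so that predicted throughput for a flow of size S is S / (A * S + B). The linear form is an assumption suggested by the plot axes on this slide, and the function names are illustrative.

```python
def fit_two_probes(s1, t1, s2, t2):
    """Solve T = A * S + B from two (size, time) probe measurements."""
    if s1 == s2:
        raise ValueError("probe sizes must differ")
    a = (t2 - t1) / (s2 - s1)  # seconds per byte at steady state
    b = t1 - a * s1            # fixed startup overhead in seconds
    return a, b


def predict_throughput(a, b, size):
    """Predicted throughput (bytes/s) for a transfer of `size` bytes."""
    return size / (a * size + b)
```

With probes of 400 KB finishing at 0.5 s and 800 KB finishing at 0.9 s, the fit gives A of about 1e-6 s/byte and B of about 0.1 s. The predicted throughput for a 10 MB transfer is then about 990 KB/s, noticeably higher than the 800 KB/s that the first probe alone would suggest.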
67
Model Fit is Excellent
Low and Normally Distributed Relative Errors At
All Flow Sizes
Correlation Set
68
DualPats Section Outline

Why TCP Throughput Prediction?
Particulars of Study
Flow Size / TCP Throughput Correlation
Issues with Simple Benchmarking
DualPats Algorithm
Stability and Dynamic Rate Adjustment

69
Stability

How long does the TCP throughput function remain stable?
How frequently should we probe the path?
What's the distribution of throughput around the function (i.e., the error)?

70
Throughput is Stable For Long Periods
Increasing Max/Min Throughput in Interval
Correlation Set
71
Throughput For a Given Flow Size Is Normally
Distributed In An Interval
Distribution Set
72
Online DualPats Algorithm

Fetch probe sequence for destination
Start probing process if no data exists
Project probe sequence ahead
20 point moving average over values with current
sampling interval
Apply model using projected data
Return result
confidence interval computed using normality
assumptions
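
The projection and confidence-interval steps above can be sketched as follows. The slide does not say exactly which quantity is averaged or how the interval is formed, so this sketch assumes a plain mean over the last 20 measurements and an interval of the mean plus or minus a normal quantile times the sample standard deviation of that window. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def project(values, window=20):
    """Project the next value as the moving average of recent values."""
    return mean(values[-window:])


def confidence_interval(values, p=0.95, window=20):
    """Interval around the projection, assuming normally distributed values."""
    recent = values[-window:]
    center = mean(recent)
    if len(recent) < 2:
        return center, center  # not enough data to estimate spread
    z = NormalDist().inv_cdf(0.5 + p / 2.0)  # two-sided normal quantile
    half_width = z * stdev(recent)
    return center - half_width, center + half_width
```

The projected value would then be fed into the throughput model to obtain the prediction for the requested flow size.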

73
Dynamic Sampling Rate

Adjust the sampling interval to correspond to the path's stable intervals
Limit the interval (20 to 1200 seconds)
Additive increase / additive decrease of the interval based on the difference between the last two probes
Difference < 5%: increase interval
Difference > 15%: decrease interval
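
A minimal sketch of the additive adjustment rule, assuming the difference is measured relative to the earlier probe. The step size is not given on the slide; the 20-second default below is a placeholder, and `adjust_interval` is a hypothetical name.

```python
MIN_INTERVAL = 20.0    # seconds, lower limit from the slide
MAX_INTERVAL = 1200.0  # seconds, upper limit from the slide


def adjust_interval(interval, prev_probe, last_probe, step=20.0):
    """Return the next sampling interval in seconds.

    prev_probe and last_probe are the last two probe measurements;
    `step` is an assumed additive step size.
    """
    change = abs(last_probe - prev_probe) / prev_probe
    if change < 0.05:
        interval += step   # path looks stable: probe less often
    elif change > 0.15:
        interval -= step   # path is changing: probe more often
    return min(MAX_INTERVAL, max(MIN_INTERVAL, interval))
```

Differences between 5% and 15% leave the interval unchanged, which avoids oscillating on small fluctuations.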

74
Evaluation

Slight conservative bias
About 90% of predictions have < 20% error

[CDF plot: P(mean error < X) against relative error from -0.4 to 0.4, with curves for mean abs(relative error) and mean relative error]
Online Evaluation Set
75
Section Summary

Algorithm for predicting the TCP throughput as
function of flow size
Minimal active probing
Dynamic probe rate adjustment
Explaining flow size / throughput correlation
Explaining why simple active probing fails
Large scale empirical study

76
Outline

Bird's-Eye View
What is RGIS?
Architecture
What components are studied in the thesis?
Size-Based Scheduling With Inaccurate Info
Fairness and efficiency as a function of correlation
Other applications beyond RGIS
DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network
Why TCP throughput prediction?
Flow size / TCP throughput correlation
Issues with simple benchmarking
DualPats algorithm and dynamic rate adjustment
Thesis Contributions

77
Thesis Contributions

It is feasible to build a scalable distributed
Relational Grid Information Service
RGIS architecture
Query rewriting
Trade off query time with the size of result set
GridG
First synthetic grid generator
Relationship between power-laws

78
Thesis Contributions

Size-based scheduling with imperfect information
DualPats: monitoring and predicting TCP throughput
TameParallelTCP: modeling and taming parallel TCP
FatNemo: fat-tree based end-system multicast

79
Future work

Integration of research components with RGIS
system
Highly dynamic grid information: how to incorporate data from services such as RPS, NWS, DualPats, and Remos
Passive monitoring of TCP throughput
Understanding size-based scheduling in the
presence of backfilling

80
Acknowledgements

Collaborators
P2P scheduling: Yi Qiao, Fabian Bustamante
FatNemo: Stefan Birrer, Fabian Bustamante
RGIS implementation: Andrew Weinrich, Jack Lange, Andrew Simpson

84
Current State

Prototyped RGIS system
Schema, stored SQL
Query manager/rewriter
SOAP interface, web interface
Publish/subscribe based CDN
To be done
Integration of update scheduling, TCP throughput
monitoring, end-system multicast based CDN

85
Finding Sufficiently Large Probe Size