Title: GridFTP-APT: Automatic Parallelism Tuning Mechanism for Data Transfer Protocol GridFTP
1 GridFTP-APT: Automatic Parallelism Tuning Mechanism for Data Transfer Protocol GridFTP
- Takeshi Ito, Hiroyuki Ohsaki, Makoto Imase
- Graduate School of Information Science and Technology, Osaka University, Japan
2 Contents
- Background
- GridFTP
- GridFTP-APT (GridFTP with Automatic Parallelism Tuning)
  - Basic ideas
  - Adjusting the number of parallel TCP connections
  - Determining chunk size
- Simulation
- Conclusions
3 Background
- GridFTP
  - Is designed to transfer large volumes of data efficiently
  - Is designed to solve the existing problems of TCP
  - Has various features
    - Automatic negotiation of TCP socket buffer size
    - Parallel data transfer
    - Third-party control of file transfer
    - Partial file transfer
    - Security
    - Reliable file transfer
4 Parallel Data Transfer
- Multiple TCP connections can be established in parallel
  - Higher throughput can be expected than with a single TCP connection
  - Throughput drops if the number of parallel TCP connections is too large
- How to optimize the number of parallel TCP connections has not been sufficiently studied
5 Objectives
- Maximize the GridFTP goodput
  - By automatically adjusting the number of parallel TCP connections of GridFTP
- Propose a GridFTP-APT mechanism
[Figure: a GridFTP client with GridFTP-APT connected to a GridFTP server by parallel TCP connections. GridFTP-APT operates on GridFTP clients.]
6 Design Principles
- Provide compatibility with existing GridFTP servers
  - Enable interconnection with existing GridFTP servers
  - GridFTP is implemented in the Globus Toolkit
  - A number of GridFTP servers are already in operation
- Avoid using any function specific to certain operating systems or network devices on the computers
  - This enables GridFTP-APT to be easily installed in Grid computing environments
7 GridFTP Goodput in Steady State
- The GridFTP goodput in steady state is derived in [9]
- GridFTP goodput is a convex (upward, single-peaked) function of the number of parallel TCP connections
  - GridFTP-APT utilizes this fact
[Figure: simulation result showing GridFTP goodput as a convex function of the number of parallel TCP connections]
[9] T. Ito, H. Ohsaki, and M. Imase, "On parameter tuning of data transfer protocol GridFTP in wide-area Grid computing," in Proceedings of Second International Workshop on Networks for Grid Applications (GridNets 2005), pp. 415-421, Oct. 2005.
8 Basic Ideas
- Divides a file to transfer into blocks called chunks
- Measures the GridFTP goodput at every chunk transfer
- Adjusts the number of parallel TCP connections at the end of every chunk transfer using a numerical computation algorithm
[Figure: a GridFTP client and a GridFTP server connected by a control channel and a data channel; the file is divided into chunks]
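The first two ideas above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names `split_into_chunks` and `measure_goodput`, the `transfer_chunk` callback, and the injectable `clock` are all hypothetical, and a fixed chunk size is assumed only to keep the example short.

```python
import time
from typing import Callable, Iterator, Tuple


def split_into_chunks(file_size: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that together cover the whole file."""
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        yield offset, length
        offset += length


def measure_goodput(transfer_chunk: Callable[[int, int, int], None],
                    offset: int, length: int, connections: int,
                    clock: Callable[[], float] = time.monotonic) -> float:
    """Transfer one chunk and return its goodput in bytes per second.

    transfer_chunk(offset, length, connections) is a hypothetical callback
    standing in for the real chunk transfer.
    """
    start = clock()
    transfer_chunk(offset, length, connections)
    elapsed = clock() - start
    return length / elapsed if elapsed > 0 else float("inf")
```

The goodput returned for each chunk is the only feedback the tuning step needs, which is why the sketch keeps measurement separate from the transfer itself.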
9 Golden Section Search Algorithm
- GridFTP-APT uses the Golden Section Search (GSS) algorithm as a numerical computation algorithm
- Golden Section Search algorithm
  - One of the numerical computation algorithms for a maximization problem [13]
  - Can be used when f(x) is a convex function in a certain range
- GridFTP goodput is a convex function of the number of parallel TCP connections
[13] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992.
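For readers unfamiliar with GSS, the following is a textbook golden section search for the maximum of a single-peaked function on a continuous interval. It is a generic sketch, not GridFTP-APT itself; the function name `golden_section_max` and the tolerance parameter are chosen for this example.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_max(f, a, b, tol=1e-6):
    """Return x in [a, b] that approximately maximizes a single-peaked f."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration reuses one of the two previous evaluations, so only one new evaluation of f is needed per step. That property matters when every evaluation costs a full chunk transfer.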
10 Adjusting the Number of Parallel TCP Connections
- Searches for a range of the number of parallel TCP connections (bracket) in which GridFTP goodput takes a convex form
  - Initialize the number of parallel TCP connections
  - Increase the number of parallel TCP connections multiplicatively until GridFTP goodput decreases
  - Determine the bracket, which is (2, 4, 8) in the illustrated example
[Figure: GridFTP goodput versus the number of parallel TCP connections, marking the point where GridFTP goodput decreases]
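The bracketing phase can be sketched as below. The slide only states that the number of connections grows multiplicatively until goodput drops; the starting value of 1, the factor of 2, the upper limit, and the names `find_bracket` and `measure` are assumptions made for this example.

```python
def find_bracket(measure, start=1, factor=2, limit=1024):
    """Grow the connection count multiplicatively until goodput decreases.

    measure(n) is a hypothetical callback returning the goodput observed
    with n parallel connections. Returns (left, mid, right), where mid
    gave the best goodput seen and right is the first count that was worse.
    If goodput drops at the very first step, left equals mid.
    """
    left = mid = start
    g_mid = measure(mid)
    right = mid * factor
    while right <= limit:
        g_right = measure(right)
        if g_right < g_mid:
            return left, mid, right
        left, mid, g_mid = mid, right, g_right
        right = mid * factor
    raise ValueError("goodput never decreased below the connection limit")
```

With goodput samples that rise through 1, 2, and 4 connections and fall at 8, this returns (2, 4, 8), matching the bracket in the slide.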
11 Adjusting the Number of Parallel TCP Connections
- Searches for the optimal number of parallel TCP connections using the GSS algorithm
  - Update the number of parallel TCP connections
  - Change the bracket based on the GridFTP goodput of the last chunk transfer
- By repeating these procedures, GridFTP-APT searches for the optimal number of parallel TCP connections
[Figure: successive brackets narrowing around the optimal number of parallel TCP connections]
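One way to apply the bracket update to integer connection counts is sketched below. The slides do not give the rounding rule or the stopping condition, so both are assumptions here: the probe is placed in the larger half of the bracket at roughly the golden ratio, rounded to an integer, and the search stops once the bracket spans only neighbouring values. All function names are hypothetical.

```python
RESPHI = 2 - (1 + 5 ** 0.5) / 2  # about 0.382


def next_probe(left, mid, right):
    """Pick the next connection count inside the larger half of the bracket."""
    if right - mid > mid - left:
        return mid + max(1, round(RESPHI * (right - mid)))
    return mid - max(1, round(RESPHI * (mid - left)))


def update_bracket(bracket, g_mid, probe, g_probe):
    """Shrink the bracket using the goodput measured at the probe."""
    left, mid, right = bracket
    if g_probe > g_mid:
        # The probe is the new best point; the old mid becomes an endpoint.
        new = (mid, probe, right) if probe > mid else (left, probe, mid)
        return new, g_probe
    # The old mid stays best; the probe becomes an endpoint.
    new = (left, mid, probe) if probe > mid else (probe, mid, right)
    return new, g_mid


def search(measure, bracket):
    """Repeat probe and update until the bracket cannot shrink further."""
    left, mid, right = bracket
    g_mid = measure(mid)
    while right - left > 2:
        probe = next_probe(left, mid, right)
        (left, mid, right), g_mid = update_bracket(
            (left, mid, right), g_mid, probe, measure(probe))
    return mid
```

In GridFTP-APT each call to `measure` would correspond to one chunk transfer, so the bracket is updated once per chunk.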
12 Determining Chunk Size
- GridFTP-APT uses the extended block mode of GridFTP for chunk transfers
  - The chunk size must be specified before starting the chunk transfer
- It is important to determine the chunk size appropriately
  - If the chunk size is too small, the GridFTP goodput cannot be measured accurately
  - If the chunk size is too large, the number of parallel TCP connections cannot converge quickly
[Figure: GridFTP goodput versus chunk size]
13 Determining Chunk Size
- GridFTP-APT dynamically configures the chunk size
  - GridFTP-APT predicts the GridFTP goodput of the next chunk transfer
  - GridFTP-APT tries to keep the chunk transfer time as constant as possible
- When GridFTP-APT searches for the optimal number of parallel TCP connections, it predicts the GridFTP goodput of the next chunk transfer by interpolating two samples
[Figure: GridFTP goodput versus the number of TCP connections]
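The chunk-size rule can be illustrated with the sketch below. The slide says only that the next goodput is predicted "by the interpolation of two samples"; linear interpolation, the target transfer time parameter, and the names `predict_goodput` and `next_chunk_size` are assumptions of this example.

```python
def predict_goodput(n, sample_a, sample_b):
    """Linearly interpolate goodput at n connections from two samples.

    Each sample is a (connections, goodput) pair. Linear interpolation is
    an assumption; the slides do not name the interpolation method.
    """
    (n1, g1), (n2, g2) = sample_a, sample_b
    if n1 == n2:
        return (g1 + g2) / 2
    return g1 + (g2 - g1) * (n - n1) / (n2 - n1)


def next_chunk_size(n, sample_a, sample_b, target_seconds):
    """Choose a chunk size (bytes) so the transfer takes about target_seconds."""
    predicted = predict_goodput(n, sample_a, sample_b)  # bytes per second
    return max(1, int(predicted * target_seconds))
```

For example, with samples of 50 bytes/s at 4 connections and 40 bytes/s at 8 connections, the predicted goodput at 6 connections is 45 bytes/s, so a 2-second target gives a 90-byte chunk.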
14 Network Model Used in Simulation
[Figure: network model used in simulation; the surviving labels are 64, 100, 100, 64, and 10, 20 ms]
15 Simulation Result (τ = 10 ms)
- Since the buffer size of the RED routers is small, the maximum goodput was 85 Mbit/s
- GridFTP-APT can utilize network resources effectively approximately 15 s after the start of the file transfer
16 Simulation Result (τ = 20 ms)
- Since the buffer size of the RED routers is small, the maximum goodput was 67.5 Mbit/s
- GridFTP-APT can utilize network resources effectively regardless of the propagation delay of the bottleneck link
17 Conclusions
- Proposed GridFTP-APT
  - An automatic parallelism tuning mechanism for GridFTP
  - Automatically adjusts the number of parallel TCP connections
- Showed that GridFTP with GridFTP-APT realizes high throughput in several network environments
18 Future Work
- Evaluate the effectiveness of GridFTP-APT
  - In more general network environments
  - Networks with background traffic and multiple GridFTP sessions
- Implement GridFTP-APT
  - Demonstrate its effectiveness in real networks
19 Thank you for your kind attention
21
- When does GridFTP-APT determine the chunk size?
  - GridFTP-APT updates its chunk size at every chunk transfer.
- How does GridFTP-APT transfer a chunk?
  - Using the extended block mode of GridFTP, GridFTP-APT can specify which part of the file is transferred.
22
- Can GridFTP-APT be used during third-party file transfer?
  - Yes. GridFTP-APT can be used with third-party transfer after a simple modification.
  - GridFTP-APT measures the round-trip time to determine the chunk size of the first transfer.
  - During third-party file transfer, it is difficult for the GridFTP client to know the round-trip time between the GridFTP servers.
  - If GridFTP-APT is modified to determine the chunk size independently of the round-trip time, it could be used during third-party transfer.
23
- Why does GridFTP-APT predict the GridFTP goodput of the next chunk transfer by the interpolation of two samples?
  - To keep the chunk transfer time as constant as possible for faster convergence.
24
- Can GridFTP-APT realize high throughput if network traffic varies?
  - To some extent, yes, since TCP itself can adapt to variation in the available bandwidth.
  - If the available bandwidth changes significantly during parallelism optimization, GridFTP-APT cannot cope with the situation appropriately. However, we believe such situations are quite rare.