GridFTP-APT: Automatic Parallelism Tuning Mechanism for Data Transfer Protocol GridFTP

Provided by: hiroyuk7

1
GridFTP-APT: Automatic Parallelism Tuning
Mechanism for Data Transfer Protocol GridFTP
  • Takeshi Ito, Hiroyuki Ohsaki, Makoto Imase
  • Graduate School of Information Science and
    Technology, Osaka University, Japan

2
Contents
  • Background
  • GridFTP
  • GridFTP-APT (GridFTP with Automatic Parallelism
    Tuning)
  • Basic ideas
  • Adjusting the number of parallel TCP connections
  • Determining chunk size
  • Simulation
  • Conclusions

3
Background
  • GridFTP
  • Is designed to transfer large volumes of data
    efficiently
  • Is designed to overcome known problems of TCP
  • Has various features
  • Automatic negotiation of TCP socket buffer size
  • Parallel data transfer
  • Third-party control of file transfer
  • Partial file transfer
  • Security
  • Reliable file transfer

4
Parallel Data Transfer
  • Multiple TCP connections can be established in
    parallel
  • Higher throughput can be expected than when a
    single TCP connection is established
  • Throughput drops if the number of parallel TCP
    connections is too large
  • How to optimize the number of parallel TCP
    connections has not been sufficiently studied

5
Objectives
  • Maximize the GridFTP goodput
  • By automatically adjusting the number of parallel
    TCP connections of GridFTP
  • Propose a GridFTP-APT mechanism

  • GridFTP-APT operates on GridFTP clients
[Figure: a GridFTP client with GridFTP-APT connected to a GridFTP server over parallel TCP connections]
6
Design Principles
  • Provide compatibility with existing GridFTP
    servers
  • Enable interconnection with existing GridFTP
    servers
  • GridFTP is implemented in the Globus Toolkit
  • A number of GridFTP servers are already in
    operation
  • Avoid using any function specific to certain
    operating systems or network devices on the
    computers
  • This enables GridFTP-APT to be easily installed
    in Grid computing environments

7
GridFTP goodput in steady state
[9] T. Ito, H. Ohsaki, and M. Imase, "On
parameter tuning of data transfer protocol
GridFTP in wide-area Grid computing," in
Proceedings of Second International Workshop on
Networks for Grid Applications (GridNets 2005),
pp. 415-421, Oct. 2005.
  • GridFTP goodput in steady state is derived in [9]
  • GridFTP goodput is a convex (upward) function of
    the number of parallel TCP connections, i.e. it
    has a single peak

  • GridFTP-APT utilizes this fact
[Figure: simulation result showing GridFTP goodput as a convex function]
8
Basic Ideas
[Figure: a file divided into chunks, transferred between the GridFTP client and the GridFTP server over the data channel, alongside the control channel]
  • Divides a file to transfer into blocks called
    chunks
  • Measures the GridFTP goodput at every chunk
    transfer
  • Adjusts the number of parallel TCP connections at
    the end of every chunk transfer using a numerical
    computation algorithm
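The per-chunk loop described on this slide can be sketched roughly as follows. This is only an illustration, not the authors' implementation: `send_chunk` and `next_parallelism` are hypothetical callables standing in for the actual GridFTP chunk transfer and for the tuning algorithm, and the chunk size is held constant here for simplicity.

```python
import time
from typing import Callable, List, Tuple


def transfer_file(
    file_size: int,
    chunk_size: int,
    send_chunk: Callable[[int, int, int], None],    # hypothetical: (offset, length, n_conn)
    next_parallelism: Callable[[int, float], int],  # hypothetical tuning step
    n_conn: int = 1,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[int, float]]:
    """Send a file chunk by chunk; return (n_conn, goodput in bit/s) per chunk."""
    history = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        start = clock()
        send_chunk(offset, length, n_conn)
        # Guard against a zero interval on very fast (or fake) transfers.
        elapsed = max(clock() - start, 1e-9)
        goodput = 8 * length / elapsed
        history.append((n_conn, goodput))
        # Adjust the parallelism once, at the end of the chunk transfer.
        n_conn = max(1, next_parallelism(n_conn, goodput))
        offset += length
    return history
```

Passing the clock in as a parameter is a design choice made for this sketch so that the loop can be exercised with a simulated transfer instead of a real network.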
9
Golden Section Search Algorithm
[13] W. H. Press, B. P. Flannery, S. A.
Teukolsky, and W. T. Vetterling, Numerical
Recipes in C: The Art of Scientific Computing.
Cambridge University Press, 1992.
  • GridFTP-APT uses the Golden Section Search (GSS)
    algorithm as a numerical computation algorithm
  • Golden Section Search algorithm
  • A numerical computation algorithm for
    maximization problems [13]
  • Can be used when f(x) is a convex function in a
    certain range
  • GridFTP goodput is a convex function of the
    number of parallel TCP connections
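For readers unfamiliar with the algorithm, here is a minimal textbook-style sketch of golden section search for a maximum over a continuous interval. It is not taken from the slides; the function name and the tolerance parameter are illustrative, and it assumes f has a single peak in [a, b].

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_max(f, a, b, tol=1e-6):
    """Return x in [a, b] that approximately maximizes a single-peaked f."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc > fd:
            # The peak lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The peak lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2
```

Each iteration reuses one of the two previous evaluations, so only one new evaluation of f is needed per step. That matters here because, in GridFTP-APT, every evaluation corresponds to a chunk transfer.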

10
Adjusting the Number of Parallel TCP Connections
  • Searches for a range of the number of parallel
    TCP connections (bracket), in which GridFTP
    goodput takes a convex form
  • Initialize the number of parallel TCP connections
  • Increase the number of parallel TCP connections
    multiplicatively until GridFTP goodput decreases
  • Once GridFTP goodput decreases, determine the
    bracket; in the example shown, the bracket is
    (2, 4, 8)
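The bracketing phase might be sketched as below. This is an assumption-laden illustration: the multiplicative factor is taken to be 2 (consistent with the example bracket (2, 4, 8)), `measure` is a hypothetical callable that performs one chunk transfer with n connections and returns the observed goodput, and `limit` is an invented safety cap.

```python
def find_bracket(measure, start=1, limit=1024):
    """Double the connection count until goodput drops; return (lo, mid, hi)."""
    lo = mid = start
    g_mid = measure(mid)
    while True:
        hi = mid * 2
        if hi > limit:
            raise ValueError("goodput never decreased below the limit")
        g_hi = measure(hi)
        if g_hi < g_mid:
            # Goodput fell at hi, so the peak lies between lo and hi.
            # If the very first doubling already fails, lo == mid.
            return (lo, mid, hi)
        lo, mid, g_mid = mid, hi, g_hi
```

With a goodput curve that peaks near 5 connections, the probes are 1, 2, 4, 8 and the returned bracket is (2, 4, 8).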
11
Adjusting the Number of Parallel TCP Connections
  • Searches for the optimal number of parallel TCP
    connections using the GSS algorithm
  • Update the number of parallel TCP connections
  • Change the bracket based on the GridFTP goodput
    of the last chunk transfer
  • By repeating these steps, GridFTP-APT searches
    for the optimal number of parallel TCP
    connections
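One way to apply golden section search to an integer connection count is sketched below. The slides do not specify how probe points are rounded or when the search stops, so both are assumptions of this sketch; `measure` is again a hypothetical callable that returns the goodput of one chunk transfer with n connections.

```python
import math

GOLDEN = (3 - math.sqrt(5)) / 2  # about 0.382


def narrow_bracket(bracket, g_mid, measure):
    """Shrink an integer bracket (lo, mid, hi) and return the best count found."""
    lo, mid, hi = bracket
    while True:
        # Probe inside the larger of the two sub-intervals.
        if hi - mid > mid - lo:
            x = mid + round(GOLDEN * (hi - mid))
        else:
            x = mid - round(GOLDEN * (mid - lo))
        if x == mid:
            return mid  # no untested integer left to probe
        g_x = measure(x)
        if g_x > g_mid:
            # The probe is the new best point; the old mid becomes a bound.
            if x > mid:
                lo, mid = mid, x
            else:
                hi, mid = mid, x
            g_mid = g_x
        elif x > mid:
            hi = x  # the probe is worse, so it becomes the new upper bound
        else:
            lo = x  # the probe is worse, so it becomes the new lower bound
```

Starting from the bracket (2, 4, 8) and a goodput curve peaking at 5, this probes 6, 3 and 5 before stopping at 5.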
12
Determining Chunk Size
  • GridFTP-APT uses the extended block mode of
    GridFTP for chunk transfers
  • The chunk size must be specified before starting
    the chunk transfer
  • It is important to appropriately determine the
    chunk size

[Figure: GridFTP goodput versus chunk size. If the chunk is too small, the GridFTP goodput cannot be measured accurately; if it is too large, the number of parallel TCP connections cannot converge quickly.]
13
Determining Chunk Size
  • GridFTP-APT dynamically configures the chunk size
  • GridFTP-APT predicts the GridFTP goodput of the
    next chunk transfer
  • GridFTP-APT tries to keep the chunk transfer time
    as constant as possible

  • While searching for the optimal number of
    parallel TCP connections, GridFTP-APT predicts
    the GridFTP goodput of the next chunk transfer
    by the interpolation of two samples
[Figure: GridFTP goodput versus number of TCP connections]
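A rough sketch of how such a prediction could set the next chunk size is given below. The slides do not give the formula, so linear interpolation and the `target_seconds` parameter are assumptions made for illustration; each sample is a (connections, goodput in bit/s) pair.

```python
def predict_goodput(s1, s2, n_next):
    """Linearly interpolate goodput at n_next from two (n, goodput) samples."""
    (n1, g1), (n2, g2) = s1, s2
    if n1 == n2:
        return (g1 + g2) / 2  # same connection count: average the samples
    return g1 + (g2 - g1) * (n_next - n1) / (n2 - n1)


def next_chunk_size(s1, s2, n_next, target_seconds):
    """Chunk size in bytes so the next transfer lasts about target_seconds."""
    goodput = predict_goodput(s1, s2, n_next)  # bit/s
    return max(1, int(goodput * target_seconds / 8))
```

For example, with samples (2, 40 Mbit/s) and (8, 80 Mbit/s), the predicted goodput at 5 connections is 60 Mbit/s, so a 2 s target gives a chunk of 15,000,000 bytes.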
14
Network Model Used in Simulation
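[Figure: network model used in the simulation; labels 64, 100, 100, 64 (units not shown) and a propagation delay of 10 or 20 ms]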
15
Simulation Result (τ = 10 ms)
  • Since the buffer size of the RED routers is
    small, the maximum goodput was 85 Mbit/s
  • GridFTP-APT can utilize the network resources
    effectively approximately 15 s after the start
    of the file transfer
16
Simulation Result (τ = 20 ms)
  • Since the buffer size of the RED routers is
    small, the maximum goodput was 67.5 Mbit/s
  • GridFTP-APT can utilize the network resources
    effectively regardless of the propagation delay
    of the bottleneck link
17
Conclusions
  • Proposed GridFTP-APT
  • An automatic parallelism tuning mechanism for
    GridFTP
  • Automatically adjusts the number of parallel TCP
    connections
  • Showed that GridFTP with GridFTP-APT realizes
    high throughput in several network environments

18
Future Work
  • Evaluate the effectiveness of GridFTP-APT
  • In more general network environments
  • Network with background traffic and multiple
    GridFTP sessions
  • Implement GridFTP-APT
  • Demonstrate its effectiveness in real networks

19
  • Thank you for your kind attention

21
  • When does GridFTP-APT determine the chunk size?
  • GridFTP-APT updates its chunk size at every chunk
    transfer.
  • How does GridFTP-APT transfer a chunk?
  • Using extended block mode of GridFTP, GridFTP-APT
    can determine which parts of the file will be
    transferred.

22
  • Can GridFTP-APT be used during third-party file
    transfer?
  • Yes. Our GridFTP-APT can be used with third-party
    transfer with a simple modification.
  • GridFTP-APT measures the round-trip time to
    determine the chunk size for the first transfer.
  • During third-party file transfer, it is difficult
    for GridFTP clients to know the round-trip time
    between GridFTP servers.
  • If we modify GridFTP-APT to determine the chunk
    size without relying on the round-trip time,
    GridFTP-APT could be used during third-party
    transfer.

23
  • Why does GridFTP-APT predict the GridFTP goodput
    of the next chunk transfer by the interpolation
    of two samples?
  • To keep the chunk transfer time as constant as
    possible for faster convergence.

24
  • Can GridFTP-APT realize high throughput if
    traffic of a network varies?
  • To some extent, yes, since TCP itself can adapt
    to variation in the available bandwidth.
  • If the available bandwidth changes significantly
    during parallelism optimization, GridFTP-APT
    cannot cope with the situation appropriately.
    However, we believe such a situation is quite
    rare.