Title: Massive High-Performance Global File Systems for Grid Computing
Massive High-Performance Global File Systems for Grid Computing
- By Phil Andrews, Patricia Kovatch, Christopher Jordan
- Presented by Han S Kim
Outline
I. Introduction
II. GFS via Hardware Assist (SC02)
III. Native WAN-GFS (SC03)
IV. True Grid Prototype (SC04)
V. Production Facility (2005)
VI. Future Work
I. Introduction
Introduction - The Original Mode of Operation for Grid Computing
- Submit the user's job to the ubiquitous grid.
- The job would run on the most appropriate computational platform available.
- Any data required for the computation would be moved to the chosen compute facility's local disk.
- Output data would be written to the same disk.
- The normal utility used for the data transfer would be GridFTP (see the sketch below).
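As a concrete picture of that staging workflow (GridFTP is named on the slide; the hosts, paths, and wrapper below are hypothetical), a job would typically pull its input to local disk before the run and push the output back afterwards, shelling out to the standard globus-url-copy client. A minimal sketch:

```python
import subprocess

# Hypothetical endpoints and scratch paths, for illustration only.
REMOTE_IN  = "gsiftp://archive.example.org/datasets/input.dat"
LOCAL_IN   = "file:///scratch/job123/input.dat"
LOCAL_OUT  = "file:///scratch/job123/output.dat"
REMOTE_OUT = "gsiftp://archive.example.org/results/output.dat"

def stage(src: str, dst: str) -> None:
    """Copy one file with GridFTP's command-line client."""
    subprocess.run(["globus-url-copy", src, dst], check=True)

# The original mode of operation: stage in, compute on local disk, stage out.
stage(REMOTE_IN, LOCAL_IN)
# ... run the application against the local copy of the data ...
stage(LOCAL_OUT, REMOTE_OUT)
```

The global file system approach described later in the talk removes these explicit staging steps entirely.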
Introduction - In Grid Supercomputing
- The very large size of the data sets used:
  - The National Virtual Observatory consists of approximately 50 Terabytes and is used as input by several applications.
- Some applications write very large amounts of data:
  - The Southern California Earthquake Center simulation writes close to 250 Terabytes in a single run.
- Other applications require extremely high I/O rates:
  - The Enzo application (an AMR cosmological simulation code) routinely writes and reads multiple Terabytes per hour.
Introduction - Concerns about Grid Supercomputing
- The normal approach of moving data back and forth may not translate well to a supercomputing grid, mostly because of the very large size of the data sets used.
- These sizes and the required transfer rates are not conducive to routine wholesale migration of input and output data between grid sites.
- The compute system may not have enough room for a required dataset or for the output data.
- The necessary transfer rates may not be achievable.
Introduction - In This Paper
- Shows how a Global File System (GFS), where direct file I/O operations can be performed across a WAN, can obviate these concerns.
- Presents a series of large-scale demonstrations.
II. GFS via Hardware Assist (SC02)
2. GFS via Hardware Assist (SC02) - At That Time
- Global File Systems were still in the concept stage.
- Two concerns:
  - The latencies involved in a widespread network such as the TeraGrid.
  - The file systems did not yet have the capability of being exported across a WAN.
2. GFS via Hardware Assist (SC02) - Approach
- Used hardware capable of encoding Fibre Channel frames within IP packets (FCIP).
- FCIP is an Internet Protocol-based storage networking technology developed by the IETF.
- FCIP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network facilities over IP networks (see the sketch below).
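To make the tunneling idea concrete, here is a toy sketch of the concept only: a Fibre Channel frame is treated as an opaque payload, given a simple length prefix, and carried to a remote SAN gateway inside a TCP/IP stream. The real FCIP encapsulation (RFC 3821) defines its own headers and error handling, and the SC02 demonstration used dedicated hardware for this; nothing below reflects the actual wire format.

```python
import socket
import struct

FCIP_PORT = 3225  # IANA-registered FCIP port; everything else here is illustrative.

def tunnel_fc_frame(fc_frame: bytes, remote_gateway: str) -> None:
    """Toy FCIP-style tunneling: ship an opaque Fibre Channel frame over TCP/IP."""
    with socket.create_connection((remote_gateway, FCIP_PORT)) as sock:
        # A 4-byte length prefix stands in for the real encapsulation header,
        # so the receiver can re-frame the byte stream into discrete FC frames.
        sock.sendall(struct.pack("!I", len(fc_frame)) + fc_frame)

def receive_fc_frame(conn: socket.socket) -> bytes:
    """Receiving gateway: recover one Fibre Channel frame and hand it to the
    local SAN fabric."""
    (length,) = struct.unpack("!I", conn.recv(4, socket.MSG_WAITALL))
    return conn.recv(length, socket.MSG_WAITALL)
```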
2. GFS via Hardware Assist (SC02) - The Goal of This Demo
- That year, the annual Supercomputing conference was in Baltimore.
- The distance between the show floor and San Diego is greater than any within the TeraGrid.
- This was the perfect opportunity to demonstrate whether latency effects would eliminate any chance of a successful GFS at that distance.
2. GFS via Hardware Assist (SC02) - Hardware Configuration between San Diego and Baltimore
[Diagram: two 4-GbE channels at each site connect SDSC and the Baltimore show floor to FCIP gateways, which encode and decode Fibre Channel frames into IP packets; the traffic crosses the TeraGrid backbone and SCinet over a 10 Gb/s WAN.]
2. GFS via Hardware Assist (SC02) - SC02 GFS Performance between SDSC and Baltimore
- 720 MB/s over the 80 ms round trip between SDSC and Baltimore (see the calculation below).
- Demonstrated that a GFS could provide some of the most efficient data transfers possible over TCP/IP.
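One way to appreciate that result: at 720 MB/s over an 80 ms round trip, the bandwidth-delay product means tens of megabytes must be in flight at any instant, which is exactly why WAN latency was the central worry. The figures come from the slide; the back-of-the-envelope calculation is mine:

```python
# Bandwidth-delay product for the SC02 SDSC-Baltimore path.
throughput_mb_s = 720   # observed transfer rate, MB/s (from the slide)
rtt_s = 0.080           # round-trip time, 80 ms (from the slide)

in_flight_mb = throughput_mb_s * rtt_s
print(f"Data in flight: {in_flight_mb:.1f} MB")   # -> 57.6 MB

# To keep the pipe full, the storage and network stack must keep roughly this
# much data outstanding (TCP windows, queued Fibre Channel frames, parallel
# I/O requests) between the two sites at all times.
```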
III. Native WAN-GFS (SC03)
3. Native WAN-GFS (SC03) - Issue and Approach
- Issue: whether Global File Systems were possible without hardware FCIP encoding.
- SC03 was the chance to use pre-release software from IBM's General Parallel File System (GPFS):
  - A true wide-area-enabled file system.
  - Shared-disk architecture.
  - Files are striped across all disks in the file system (see the sketch below).
  - Parallel access to file data and metadata.
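As a rough illustration of the shared-disk striping idea (GPFS's actual block allocation and metadata handling are more sophisticated; this is only a round-robin caricature), a file can be thought of as fixed-size blocks dealt out across all the disks, so a large sequential transfer engages every disk in parallel:

```python
# Minimal round-robin striping sketch; not GPFS's real layout policy.
BLOCK_SIZE = 4 * 1024 * 1024   # 4 MiB blocks (an assumed, typical large block size)

def block_location(offset: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset within a file to (disk index, block slot on that disk)."""
    block_no = offset // BLOCK_SIZE
    return block_no % num_disks, block_no // num_disks

# A 1 GiB sequential read touches every disk in a 32-disk file system,
# so all disks (and all server paths to them) serve the request in parallel.
disks_touched = {block_location(off, 32)[0] for off in range(0, 1 << 30, BLOCK_SIZE)}
print(len(disks_touched))   # -> 32
```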
3. Native WAN-GFS (SC03) - WAN-GPFS Demonstration
- The central GFS: 40 two-processor IA-64 nodes on the show floor, providing sufficient bandwidth to saturate the 10 GbE link.
- Each server had a single Fibre Channel HBA and GbE connections.
- The file system was served across the WAN, over 10 GbE to the TeraGrid, to SDSC and NCSA.
- The mode of operation was to copy data produced at SDSC across the WAN to the disk systems on the show floor, and to visualize it at both SDSC and NCSA.
3. Native WAN-GFS (SC03) - Bandwidth Results at SC03
[Plot: bandwidth over time during the demonstration; the visualization application terminated normally whenever it ran out of data and was restarted.]
3. Native WAN-GFS (SC03) - Bandwidth Results at SC03
- Over a link with a maximum bandwidth of 10 Gb/s, the peak transfer rate was almost 9 Gb/s, and over 1 GB/s was easily sustained.
IV. True Grid Prototype (SC04)
4. True Grid Prototype (SC04) - The Goal of This Demonstration
- To implement a true grid prototype of what a GFS node on the TeraGrid would look like.
- The likely dominant mode of operation for grid supercomputing: the output of a very large dataset to a central GFS repository, followed by its examination and visualization at several sites, some of which may not have the resources to ingest the dataset whole.
- The Enzo application:
  - Writes on the order of a Terabyte per hour, enough to exercise the 30 Gb/s TeraGrid connection.
  - With the post-processing visualization, they could check how quickly the GFS could provide data in this scenario.
  - Ran at SDSC, writing its output directly to the GPFS disks in Pittsburgh (see the sketch below).
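The point of the prototype is that, with a WAN-mounted global file system, the application does not stage data at all: it simply opens a path under the remote GPFS mount and writes, and a visualization job elsewhere reads the same path. A minimal sketch of that usage pattern; the mount point and file naming below are hypothetical, since the slides do not give the actual paths:

```python
import os

# Hypothetical WAN-mounted GPFS path; the real mount point used at SC04 is not
# specified in the slides.
REMOTE_GFS = "/gpfs-wan/scratch/enzo_run"

def write_output_dump(step: int, data: bytes) -> None:
    """Write one simulation output dump directly onto the remote GFS.

    There is no GridFTP staging step: the file system itself moves the blocks
    across the WAN, and a post-processing visualization job at another site
    can read the same path as soon as the data lands.
    """
    os.makedirs(REMOTE_GFS, exist_ok=True)
    with open(os.path.join(REMOTE_GFS, f"dump_{step:05d}.dat"), "wb") as f:
        f.write(data)
```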
4. True Grid Prototype (SC04) - Prototype Grid Supercomputing at SC04
[Diagram: prototype configuration; the links are labeled 40 Gb/s, 40 Gb/s, and 30 Gb/s.]
4. True Grid Prototype (SC04) - Transfer Rates
- The aggregate performance was 24 Gb/s.
- The momentary peak was over 27 Gb/s.
- The rates were remarkably constant.
- Three 10 Gb/s connections ran between the show floor and the TeraGrid backbone (see the conversion below).
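To put those link-level numbers into file-system terms (the rates are from the slide; the unit conversions are mine):

```python
# Convert the SC04 transfer rates from network units to storage units.
aggregate_gbit_s = 24          # sustained aggregate, Gb/s (from the slide)
peak_gbit_s = 27               # momentary peak, Gb/s (from the slide)
capacity_gbit_s = 3 * 10       # three 10 Gb/s connections to the backbone

print(f"Sustained: {aggregate_gbit_s / 8:.1f} GB/s "
      f"= {aggregate_gbit_s / 8 * 3600 / 1000:.1f} TB/hour")   # -> 3.0 GB/s, 10.8 TB/hour
print(f"Peak:      {peak_gbit_s / 8:.2f} GB/s")                 # -> 3.38 GB/s
print(f"Link utilization: {aggregate_gbit_s / capacity_gbit_s:.0%}")   # -> 80%
```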
V. Production Facility (2005)
5. Production Facility (2005) - The Need for Large Disk
- By this time, the size of datasets had become large.
- The NVO dataset was 50 Terabytes per location, which was a noticeable strain on storage resources.
- If a single, central site could maintain the dataset, this would be extremely helpful to all the sites that could then access it in an efficient manner.
- Therefore, a very large amount of spinning disk would be required.
- Approximately 0.5 Petabytes of Serial ATA disk drives was acquired by SDSC.
5. Production Facility (2005) - Network Organization
- The Network Shared Disk (NSD) servers: 64 two-way IBM IA-64 systems, each with a single GbE interface and a 2 Gb/s Fibre Channel Host Bus Adapter.
- The file system is served across the WAN to NCSA and ANL.
- The disks are 32 IBM FAStT100 (DS4100) RAID systems with 67 250 GB drives in each; the total raw storage is 32 x 67 x 250 GB = 536 TB, i.e. roughly 0.5 Petabytes of FAStT100 disk (see the arithmetic below).
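A quick check of the raw-capacity arithmetic quoted above (all numbers from the slide):

```python
# Raw capacity of the 2005 production disk pool.
raid_systems = 32         # IBM FAStT100 / DS4100 RAID systems
drives_per_system = 67    # 250 GB SATA drives in each
drive_gb = 250

raw_gb = raid_systems * drives_per_system * drive_gb
print(f"{raw_gb} GB = {raw_gb / 1000:.0f} TB = {raw_gb / 1_000_000:.2f} PB")
# -> 536000 GB = 536 TB = 0.54 PB of raw (pre-RAID) storage
```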
5. Production Facility (2005) - Serial ATA Disk Arrangement
[Diagram: the Serial ATA drives are organized as 8+P RAID sets, attached through 2 Gb/s Fibre Channel connections.]
5. Production Facility (2005) - Performance Scaling
[Plot: performance scaling; maximum of almost 6 GB/s out of a theoretical maximum of 8 GB/s.]
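The 8 GB/s theoretical ceiling is consistent with the hardware described on the network-organization slide; the reconstruction below is my own reading rather than something stated explicitly in the slides:

```python
# A plausible origin of the ~8 GB/s theoretical maximum.
servers = 64
gbe_per_server_gbit = 1        # one GbE interface per NSD server
server_ceiling_gb_s = servers * gbe_per_server_gbit / 8        # -> 8.0 GB/s

raid_systems = 32
fc_per_raid_gbit = 2           # one 2 Gb/s FC connection per RAID system
disk_ceiling_gb_s = raid_systems * fc_per_raid_gbit / 8        # -> 8.0 GB/s

observed_gb_s = 6              # measured maximum (from the slide)
print(server_ceiling_gb_s, disk_ceiling_gb_s)                  # both paths cap near 8 GB/s
print(f"Achieved fraction of theoretical peak: {observed_gb_s / server_ceiling_gb_s:.0%}")  # -> 75%
```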
5. Production Facility (2005) - Performance Scaling
- The observed discrepancy between read and write rates is not yet understood.
- However, the dominant usage of the GFS is expected to be remote reads.
VI. Future Work
6. Future Work
- Next year (2006), the authors hope to connect to the DEISA computational Grid in Europe, which is planning a similar approach to Grid computing, allowing them to unite the TeraGrid and DEISA Global File Systems in a multi-continent system.
- The key contribution of this approach is a paradigm: at least in the supercomputing regime, data movement and access mechanisms will be the most important delivered capability of Grid computing, outweighing even the sharing or combination of compute resources.
Thank you!