Title: Massive High-Performance Global File Systems for Grid Computing
Massive High-Performance Global File Systems for Grid Computing
- By Phil Andrews, Patricia Kovatch, Christopher Jordan
- Presented by Han S Kim
Outline
I. Introduction
II. GFS via Hardware Assist (SC02)
III. Native WAN-GFS (SC03)
IV. True Grid Prototype (SC04)
V. Production Facility (2005)
VI. Future Work
I. Introduction
Introduction - The Original Mode of Operation for Grid Computing
- Submit the user's job to the ubiquitous grid.
- The job would run on the most appropriate computational platform available.
- Any data required for the computation would be moved to the chosen compute facility's local disk.
- Output data would be written to the same disk.
- The normal utility used for the data transfer would be GridFTP (see the sketch below).
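As a concrete picture of that staging workflow (GridFTP is named on the slide; the hosts, paths, and wrapper below are hypothetical), a job would typically pull its input to local disk before the run and push the output back afterwards, shelling out to the standard globus-url-copy client. A minimal sketch:

```python
import subprocess

# Hypothetical endpoints and scratch paths, for illustration only.
REMOTE_IN  = "gsiftp://archive.example.org/datasets/input.dat"
LOCAL_IN   = "file:///scratch/job123/input.dat"
LOCAL_OUT  = "file:///scratch/job123/output.dat"
REMOTE_OUT = "gsiftp://archive.example.org/results/output.dat"

def stage(src: str, dst: str) -> None:
    """Copy one file with GridFTP's command-line client."""
    subprocess.run(["globus-url-copy", src, dst], check=True)

# The original mode of operation: stage in, compute on local disk, stage out.
stage(REMOTE_IN, LOCAL_IN)
# ... run the application against the local copy of the data ...
stage(LOCAL_OUT, REMOTE_OUT)
```

The global file system approach described later in the talk removes these explicit staging steps entirely.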
Introduction - In Grid Supercomputing
- The very large size of the data sets used:
  - The National Virtual Observatory consists of approximately 50 Terabytes and is used as input by several applications.
- Some applications write very large amounts of data:
  - The Southern California Earthquake Center simulation writes close to 250 Terabytes in a single run.
- Other applications require extremely high I/O rates:
  - The Enzo application (an AMR cosmological simulation code) routinely writes and reads multiple Terabytes per hour.
Introduction - Concerns about Grid Supercomputing
- The normal approach of moving data back and forth may not translate well to a supercomputing grid, mostly because of the very large size of the data sets used.
- These sizes and the required transfer rates are not conducive to routine wholesale migration of input and output data between grid sites.
- The compute system may not have enough room for a required dataset or for the output data.
- The necessary transfer rates may not be achievable.
Introduction - In This Paper
- Shows how a Global File System (GFS), where direct file I/O operations can be performed across a WAN, can obviate these concerns.
- Presents a series of large-scale demonstrations.
II. GFS via Hardware Assist (SC02)
2. GFS via Hardware Assist (SC02) - At That Time
- Global File Systems were still in the concept stage.
- Two concerns:
  - The latencies involved in a widespread network such as the TeraGrid.
  - The file systems did not yet have the capability of being exported across a WAN.
2. GFS via Hardware Assist (SC02) - Approach
- Used hardware capable of encoding Fibre Channel frames within IP packets (FCIP).
- FCIP is an Internet Protocol-based storage networking technology developed by the IETF.
- FCIP mechanisms enable the transmission of Fibre Channel information by tunneling data between storage area network facilities over IP networks (see the sketch below).
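To make the tunneling idea concrete, here is a toy sketch of the concept only: a Fibre Channel frame is treated as an opaque payload, given a simple length prefix, and carried to a remote SAN gateway inside a TCP/IP stream. The real FCIP encapsulation (RFC 3821) defines its own headers and error handling, and the SC02 demonstration used dedicated hardware for this; nothing below reflects the actual wire format.

```python
import socket
import struct

FCIP_PORT = 3225  # IANA-registered FCIP port; everything else here is illustrative.

def tunnel_fc_frame(fc_frame: bytes, remote_gateway: str) -> None:
    """Toy FCIP-style tunneling: ship an opaque Fibre Channel frame over TCP/IP."""
    with socket.create_connection((remote_gateway, FCIP_PORT)) as sock:
        # A 4-byte length prefix stands in for the real encapsulation header,
        # so the receiver can re-frame the byte stream into discrete FC frames.
        sock.sendall(struct.pack("!I", len(fc_frame)) + fc_frame)

def receive_fc_frame(conn: socket.socket) -> bytes:
    """Receiving gateway: recover one Fibre Channel frame and hand it to the
    local SAN fabric."""
    (length,) = struct.unpack("!I", conn.recv(4, socket.MSG_WAITALL))
    return conn.recv(length, socket.MSG_WAITALL)
```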
2. GFS via Hardware Assist (SC02) - The Goal of This Demo
- That year, the annual Supercomputing conference was in Baltimore.
- The distance between the show floor and San Diego is greater than any within the TeraGrid.
- This was the perfect opportunity to demonstrate whether latency effects would eliminate any chance of a successful GFS at that distance.
2. GFS via Hardware Assist (SC02) - Hardware Configuration between San Diego and Baltimore
[Diagram: two 4-GbE channels at each site connect SDSC and the Baltimore show floor to FCIP gateways, which encode and decode Fibre Channel frames into IP packets; the traffic crosses the TeraGrid backbone and SCinet over a 10 Gb/s WAN.]
2. GFS via Hardware Assist (SC02) - SC02 GFS Performance between SDSC and Baltimore
- 720 MB/s over the 80 ms round trip between SDSC and Baltimore (see the calculation below).
- Demonstrated that a GFS could provide some of the most efficient data transfers possible over TCP/IP.
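One way to appreciate that result: at 720 MB/s over an 80 ms round trip, the bandwidth-delay product means tens of megabytes must be in flight at any instant, which is exactly why WAN latency was the central worry. The figures come from the slide; the back-of-the-envelope calculation is mine:

```python
# Bandwidth-delay product for the SC02 SDSC-Baltimore path.
throughput_mb_s = 720   # observed transfer rate, MB/s (from the slide)
rtt_s = 0.080           # round-trip time, 80 ms (from the slide)

in_flight_mb = throughput_mb_s * rtt_s
print(f"Data in flight: {in_flight_mb:.1f} MB")   # -> 57.6 MB

# To keep the pipe full, the storage and network stack must keep roughly this
# much data outstanding (TCP windows, queued Fibre Channel frames, parallel
# I/O requests) between the two sites at all times.
```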
III. Native WAN-GFS (SC03)
3. Native WAN-GFS (SC03) - Issue and Approach
- Issue: whether Global File Systems were possible without hardware FCIP encoding.
- SC03 was the chance to use pre-release software from IBM's General Parallel File System (GPFS):
  - A true wide-area-enabled file system.
  - Shared-disk architecture.
  - Files are striped across all disks in the file system (see the sketch below).
  - Parallel access to file data and metadata.
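As a rough illustration of the shared-disk striping idea (GPFS's actual block allocation and metadata handling are more sophisticated; this is only a round-robin caricature), a file can be thought of as fixed-size blocks dealt out across all the disks, so a large sequential transfer engages every disk in parallel:

```python
# Minimal round-robin striping sketch; not GPFS's real layout policy.
BLOCK_SIZE = 4 * 1024 * 1024   # 4 MiB blocks (an assumed, typical large block size)

def block_location(offset: int, num_disks: int) -> tuple[int, int]:
    """Map a byte offset within a file to (disk index, block slot on that disk)."""
    block_no = offset // BLOCK_SIZE
    return block_no % num_disks, block_no // num_disks

# A 1 GiB sequential read touches every disk in a 32-disk file system,
# so all disks (and all server paths to them) serve the request in parallel.
disks_touched = {block_location(off, 32)[0] for off in range(0, 1 << 30, BLOCK_SIZE)}
print(len(disks_touched))   # -> 32
```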
3. Native WAN-GFS (SC03) - WAN-GPFS Demonstration
- The central GFS: 40 two-processor IA-64 nodes on the show floor, providing sufficient bandwidth to saturate the 10 GbE link.
- Each server had a single Fibre Channel HBA and GbE connections.
- The file system was served across the WAN, over 10 GbE to the TeraGrid, to SDSC and NCSA.
- The mode of operation was to copy data produced at SDSC across the WAN to the disk systems on the show floor, and to visualize it at both SDSC and NCSA.
3. Native WAN-GFS (SC03) - Bandwidth Results at SC03
[Plot: bandwidth over time during the demonstration; the visualization application terminated normally whenever it ran out of data and was restarted.]
3. Native WAN-GFS (SC03) - Bandwidth Results at SC03
- Over a link with a maximum bandwidth of 10 Gb/s, the peak transfer rate was almost 9 Gb/s, and over 1 GB/s was easily sustained.
IV. True Grid Prototype (SC04)
4. True Grid Prototype (SC04) - The Goal of This Demonstration
- To implement a true grid prototype of what a GFS node on the TeraGrid would look like.
- The likely dominant mode of operation for grid supercomputing: the output of a very large dataset to a central GFS repository, followed by its examination and visualization at several sites, some of which may not have the resources to ingest the dataset whole.
- The Enzo application:
  - Writes on the order of a Terabyte per hour, enough to exercise the 30 Gb/s TeraGrid connection.
  - With the post-processing visualization, they could check how quickly the GFS could provide data in this scenario.
  - Ran at SDSC, writing its output directly to the GPFS disks in Pittsburgh (see the sketch below).
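The point of the prototype is that, with a WAN-mounted global file system, the application does not stage data at all: it simply opens a path under the remote GPFS mount and writes, and a visualization job elsewhere reads the same path. A minimal sketch of that usage pattern; the mount point and file naming below are hypothetical, since the slides do not give the actual paths:

```python
import os

# Hypothetical WAN-mounted GPFS path; the real mount point used at SC04 is not
# specified in the slides.
REMOTE_GFS = "/gpfs-wan/scratch/enzo_run"

def write_output_dump(step: int, data: bytes) -> None:
    """Write one simulation output dump directly onto the remote GFS.

    There is no GridFTP staging step: the file system itself moves the blocks
    across the WAN, and a post-processing visualization job at another site
    can read the same path as soon as the data lands.
    """
    os.makedirs(REMOTE_GFS, exist_ok=True)
    with open(os.path.join(REMOTE_GFS, f"dump_{step:05d}.dat"), "wb") as f:
        f.write(data)
```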
4. True Grid Prototype (SC04) - Prototype Grid Supercomputing at SC04
[Diagram: prototype configuration; the links are labeled 40 Gb/s, 40 Gb/s, and 30 Gb/s.]
4. True Grid Prototype (SC04) - Transfer Rates
- The aggregate performance was 24 Gb/s.
- The momentary peak was over 27 Gb/s.
- The rates were remarkably constant.
- Three 10 Gb/s connections ran between the show floor and the TeraGrid backbone (see the conversion below).
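To put those link-level numbers into file-system terms (the rates are from the slide; the unit conversions are mine):

```python
# Convert the SC04 transfer rates from network units to storage units.
aggregate_gbit_s = 24          # sustained aggregate, Gb/s (from the slide)
peak_gbit_s = 27               # momentary peak, Gb/s (from the slide)
capacity_gbit_s = 3 * 10       # three 10 Gb/s connections to the backbone

print(f"Sustained: {aggregate_gbit_s / 8:.1f} GB/s "
      f"= {aggregate_gbit_s / 8 * 3600 / 1000:.1f} TB/hour")   # -> 3.0 GB/s, 10.8 TB/hour
print(f"Peak:      {peak_gbit_s / 8:.2f} GB/s")                 # -> 3.38 GB/s
print(f"Link utilization: {aggregate_gbit_s / capacity_gbit_s:.0%}")   # -> 80%
```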
V. Production Facility (2005)
5. Production Facility (2005) - The Need for Large Disk
- By this time, the size of datasets had become large.
- The NVO dataset was 50 Terabytes per location, which was a noticeable strain on storage resources.
- If a single, central site could maintain the dataset, this would be extremely helpful to all the sites that could then access it in an efficient manner.
- Therefore, a very large amount of spinning disk would be required.
- Approximately 0.5 Petabytes of Serial ATA disk drives was acquired by SDSC.
5. Production Facility (2005) - Network Organization
- The Network Shared Disk (NSD) servers: 64 two-way IBM IA-64 systems, each with a single GbE interface and a 2 Gb/s Fibre Channel Host Bus Adapter.
- The file system is served across the WAN to NCSA and ANL.
- The disks are 32 IBM FAStT100 (DS4100) RAID systems with 67 250 GB drives in each; the total raw storage is 32 x 67 x 250 GB = 536 TB, i.e. roughly 0.5 Petabytes of FAStT100 disk (see the arithmetic below).
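A quick check of the raw-capacity arithmetic quoted above (all numbers from the slide):

```python
# Raw capacity of the 2005 production disk pool.
raid_systems = 32         # IBM FAStT100 / DS4100 RAID systems
drives_per_system = 67    # 250 GB SATA drives in each
drive_gb = 250

raw_gb = raid_systems * drives_per_system * drive_gb
print(f"{raw_gb} GB = {raw_gb / 1000:.0f} TB = {raw_gb / 1_000_000:.2f} PB")
# -> 536000 GB = 536 TB = 0.54 PB of raw (pre-RAID) storage
```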
5. Production Facility (2005) - Serial ATA Disk Arrangement
[Diagram: the Serial ATA drives are organized as 8+P RAID sets, attached through 2 Gb/s Fibre Channel connections.]
5. Production Facility (2005) - Performance Scaling
[Plot: performance scaling; maximum of almost 6 GB/s out of a theoretical maximum of 8 GB/s.]
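The 8 GB/s theoretical ceiling is consistent with the hardware described on the network-organization slide; the reconstruction below is my own reading rather than something stated explicitly in the slides:

```python
# A plausible origin of the ~8 GB/s theoretical maximum.
servers = 64
gbe_per_server_gbit = 1        # one GbE interface per NSD server
server_ceiling_gb_s = servers * gbe_per_server_gbit / 8        # -> 8.0 GB/s

raid_systems = 32
fc_per_raid_gbit = 2           # one 2 Gb/s FC connection per RAID system
disk_ceiling_gb_s = raid_systems * fc_per_raid_gbit / 8        # -> 8.0 GB/s

observed_gb_s = 6              # measured maximum (from the slide)
print(server_ceiling_gb_s, disk_ceiling_gb_s)                  # both paths cap near 8 GB/s
print(f"Achieved fraction of theoretical peak: {observed_gb_s / server_ceiling_gb_s:.0%}")  # -> 75%
```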
5. Production Facility (2005) - Performance Scaling
- The observed discrepancy between read and write rates is not yet understood.
- However, the dominant usage of the GFS is expected to be remote reads.
VI. Future Work
6. Future Work
- Next year (2006), the authors hope to connect to the DEISA computational Grid in Europe, which is planning a similar approach to Grid computing, allowing them to unite the TeraGrid and DEISA Global File Systems in a multi-continent system.
- The key contribution of this approach is a paradigm: at least in the supercomputing regime, data movement and access mechanisms will be the most important delivered capability of Grid computing, outweighing even the sharing or combination of compute resources.
Thank you!