Title: SEMPLAR: High-Performance Remote Parallel I/O over SRB
1. SEMPLAR: High-Performance Remote Parallel I/O over SRB
- Nawab Ali and Mario Lauria
- Department of Computer Science and Engineering
- The Ohio State University
- Columbus, OH 43210
- {alin, lauria}@cse.ohio-state.edu
2. Presentation Outline
- Introduction
- Remote I/O
- Storage Resource Broker
- SEMPLAR
- Design
- Implementation
- Asynchronous I/O over WANs
- Experimental Setup
- Results
- Conclusions
3. Introduction
- Application Trends
  - Big Science projects increasingly require processing of large data sets
    - Sloan Digital Sky Survey, Large Hadron Collider, National Earthquake Engineering Simulation Grid, NIH GenBank
  - Large data sets are stored in repositories at specialized facilities (supercomputer centers)
    - San Diego Supercomputer Center (SDSC)
- Technological Trends
  - Bandwidth of WAN and Internet backbones is growing at a rate that makes it comparable to local interconnect speed
    - TeraGrid, LambdaRail: 40 Gb/s
    - InfiniBand: 10 Gb/s
4. Problem Definition
- Data Storage
  - How do we store the large amounts of data generated by HPC applications?
- Data Retrieval
  - How do we effectively retrieve the data for local processing?
- Research Focus
  - High-Performance I/O over WANs
  - How do we reduce the performance penalty of remote data access?
5. Remote I/O Constraints
- I/O Bandwidth (CCGRID 2005)
- I/O Latency (HPDC 2006)
6. Motivation
- Common approach for retrieving large data sets
  - Staging
  - FTP, GridFTP, Wget
- Problems with staging
  - Overlapping of data transfer and computation is not possible
  - Dynamic data sets require frequent refreshes of the local copy
8. Storage Resource Broker
- SRB was developed at SDSC to provide access to massive volumes of data in a production environment
- It provides transparent access to heterogeneous storage resources
  - Filesystems
  - Database Systems
  - Archival Storage Systems
- Other services offered
  - Authentication, location transparency
9. SRB Architecture
- SRB Servers
  - Each controls a distinct set of physical resources
- Metadata Catalog Service
  - Stores file metadata
  - Access Control
  - File location
- SRB Clients
  - Connect to the SRB servers using the client API
  - C high-level API
  - C low-level API
11. SEMPLAR: SRB-Enabled MPI I/O Library for Access to Remote Storage
- I/O over the Internet
- Storage Virtualization
  - SRB
- High I/O bandwidth
  - Multiple TCP Streams
  - Multiple I/O nodes
- Standard Application Interface
  - MPI I/O
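To make the "multiple TCP streams" idea concrete, here is a minimal sketch of how a byte range of a remote file could be divided so that each stream (for example, one per I/O node) transfers its own contiguous slice. The function name `split_range` and the even, contiguous partitioning are assumptions for illustration only; the slides do not describe how SEMPLAR assigns data to streams.

```python
def split_range(offset, length, num_streams):
    """Split the byte range [offset, offset + length) into contiguous chunks.

    Returns a list of (start, size) pairs, at most one per stream.
    Hypothetical helper: not part of SEMPLAR, SRB, or MPI I/O.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    base, extra = divmod(length, num_streams)
    chunks = []
    start = offset
    for i in range(num_streams):
        # The first `extra` streams carry one additional byte each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more streams than bytes: leave this stream idle
        chunks.append((start, size))
        start += size
    return chunks


# Each (start, size) pair would be handed to a separate connection.
plan = split_range(0, 10, 3)
```

With independent slices, no stream has to wait for another, so the aggregate throughput is limited by the sum of the per-stream rates rather than by a single TCP connection.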
12. SEMPLAR Implementation
- MPI I/O implementations such as ROMIO use the portable ADIO interface
- ADIO implementations are optimized for a particular filesystem
- We have provided an ADIO implementation for the SRB filesystem
13. Remote Asynchronous I/O
- Asynchronous I/O has been shown to be a flexible programming model
  - For some reason, never yet applied to remote I/O
- Traditional advantages of Asynchronous I/O
  - Overlapping of I/O and computation
  - Efficient use of system resources
  - Enhanced I/O performance
- Additional benefits specific to remote I/O
  - Multiple concurrent TCP streams
  - On-the-fly data compression
14. Asynchronous I/O Implementation
- Dual-threaded implementation
  - Main compute thread
  - Auxiliary I/O thread
- Shared I/O queue
  - Asynchronous calls place I/O requests in the queue and return immediately
  - The I/O thread dequeues requests in FIFO order
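The dual-threaded design can be sketched in a few lines. The example below is an illustration in Python rather than the pthread-based C implementation the slides refer to; the class `AsyncIOQueue`, its methods, and the `do_io` callback are hypothetical names introduced here.

```python
import queue
import threading


class AsyncIOQueue:
    """A shared FIFO queue drained by one auxiliary I/O thread (sketch)."""

    def __init__(self, do_io):
        self._do_io = do_io                # callable that performs the actual I/O
        self._requests = queue.Queue()     # thread-safe FIFO shared by both threads
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, request):
        """Enqueue a request and return immediately with a completion event."""
        done = threading.Event()
        self._requests.put((request, done))
        return done

    def _run(self):
        # Auxiliary I/O thread: serve requests in the order they were queued.
        while True:
            item = self._requests.get()
            if item is None:               # sentinel: shut down
                break
            request, done = item
            try:
                self._do_io(request)
            finally:
                done.set()                 # wake any thread waiting on this request

    def close(self):
        self._requests.put(None)
        self._thread.join()


completed = []
io_queue = AsyncIOQueue(completed.append)  # stand-in for a real remote write
events = [io_queue.submit(f"block-{i}") for i in range(3)]
for event in events:
    event.wait()                           # the compute thread blocks only here
io_queue.close()
```

Because a single thread drains the queue, requests complete in submission order, which matches the FIFO behavior described on the slide.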
15. Asynchronous I/O Primitives
- POSIX pthread library
- Asynchronous Primitives
  - MPI_File_iread
  - MPI_File_iwrite
  - MPIO_Wait
  - MPIO_Test
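The calling pattern behind these primitives is: start the transfer, keep computing, optionally poll, then wait. The sketch below mimics that pattern with Python's standard `concurrent.futures` module. It is an analogy, not an MPI binding: `slow_remote_write` and the dictionary `storage` are hypothetical stand-ins for a remote write, `submit` plays the role of `MPI_File_iwrite`, `done()` the role of `MPIO_Test`, and `result()` the role of `MPIO_Wait`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_remote_write(storage, key, data):
    """Hypothetical stand-in for a high-latency remote write."""
    time.sleep(0.05)                      # simulated WAN latency
    storage[key] = data
    return len(data)


storage = {}
with ThreadPoolExecutor(max_workers=1) as io_thread:
    # Start the write and return immediately (analogous to MPI_File_iwrite).
    request = io_thread.submit(slow_remote_write, storage, "result.dat", b"x" * 1024)

    # Computation proceeds while the write is in flight.
    partial_sum = sum(range(1000))

    # Non-blocking completion check (analogous to MPIO_Test).
    finished_early = request.done()

    # Block until the write has completed (analogous to MPIO_Wait).
    bytes_written = request.result()
```

The benefit depends on how much useful computation fits between starting the request and waiting on it; with nothing to overlap, the pattern degenerates to synchronous I/O.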
17. Experimental Setup
- SRB server v3.2.1 running on orion.sdsc.edu
  - Our reference server in San Diego
- NCSA TeraGrid cluster
  - High bandwidth, low latency
- DAS-2 at VU, Amsterdam
  - Low bandwidth, high latency
- Intel Pentium 4 Xeon cluster at OSC
  - High bandwidth, low latency
  - Private IP addresses
18. Benchmarks
- ROMIO perf
  - Measures the read and write performance
  - Synchronous and contiguous I/O
  - Upper bound on the MPI I/O performance
- NAS btio
  - Non-contiguous data access pattern
  - Class C, full version
  - Collective I/O
- MPI-BLAST
  - BLAST: searches protein and nucleotide databases for local alignment
  - MPI-BLAST: MPI wrapper for BLAST
20. Synchronous Remote I/O Performance Results
21. perf I/O Performance
[Figure: perf I/O performance; panels: NCSA TeraGrid Cluster, DAS-2 Cluster]
22. perf I/O Performance
[Figure: perf I/O performance; panel: OSC P4 Cluster]
Results Summary
- NCSA TeraGrid: Read 290.88 Mbps, Write 139.44 Mbps, Ttcp 46.34 Mbps
- DAS-2 Cluster: Read 68.32 Mbps, Write 97.68 Mbps, Iperf 4.82 Mbps
- OSC Xeon Cluster: Read 82.96 Mbps, Write 76.48 Mbps, Iperf 10.81 Mbps
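One way to read the perf summary is to divide each read and write figure by the Ttcp or Iperf figure reported for the same site. The short calculation below does this, assuming (the slide does not say so explicitly) that the Ttcp and Iperf numbers are single-TCP-stream baselines between the cluster and the SRB server. The dictionary keys are labels chosen here for illustration.

```python
# Bandwidths in Mbps, copied from the perf results summary.
# "single_stream" holds the Ttcp/Iperf figure, assumed to be a
# single-TCP-stream baseline.
results = {
    "NCSA TeraGrid": {"read": 290.88, "write": 139.44, "single_stream": 46.34},
    "DAS-2": {"read": 68.32, "write": 97.68, "single_stream": 4.82},
    "OSC Xeon": {"read": 82.96, "write": 76.48, "single_stream": 10.81},
}


def ratios(entry):
    """Return read and write bandwidth as multiples of the baseline."""
    return {op: entry[op] / entry["single_stream"] for op in ("read", "write")}


for site, entry in results.items():
    r = ratios(entry)
    print(f"{site}: read {r['read']:.1f}x, write {r['write']:.1f}x")
```

Under that assumption, the parallel read rate is roughly 6x the baseline on the TeraGrid cluster, roughly 14x on DAS-2, and close to 8x at OSC, with DAS-2 (the low-bandwidth, high-latency site) showing the largest relative gain.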
23. btio Class C Write Performance
[Figure: btio Class C write performance; panels: NCSA TeraGrid Cluster, DAS-2 Cluster]
24. btio Class C Write Performance
[Figure: btio Class C write performance; panel: OSC P4 Cluster]
Results Summary
- NCSA TeraGrid: btio Class C Write 74.04 Mbps, Ttcp 46.34 Mbps
- DAS-2 Cluster: btio Class C Write 56.49 Mbps, Iperf 4.82 Mbps
- OSC Xeon Cluster: btio Class C Write 70.28 Mbps, Iperf 10.81 Mbps
25. Asynchronous Remote I/O Performance Results
26. Asynchronous I/O Experiments
- In our experiments we evaluated the performance benefits achievable by:
  - Restructuring of application code to achieve overlap of computation and I/O
  - Doubling the number of TCP connections between each node and the SRB server
  - Compressing/decompressing data on the fly
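On-the-fly compression means data is compressed chunk by chunk as it leaves the sender and decompressed chunk by chunk as it arrives, without first building a compressed copy of the whole file. The sketch below shows that streaming pattern with Python's standard `zlib` module. The choice of zlib, the function names, and the sample data are assumptions for illustration; the slides do not state which compression scheme was used in the experiments.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Yield compressed pieces for a stream of byte chunks."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:                    # the compressor may buffer small inputs
            yield out
    tail = compressor.flush()      # emit whatever is still buffered
    if tail:
        yield tail


def decompress_chunks(chunks):
    """Yield decompressed pieces for a stream of compressed chunks."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


# Hypothetical payload: highly repetitive, sequence-like data.
original = [b"ACGT" * 4096, b"TTGA" * 4096]
wire = list(compress_chunks(original))          # what would cross the WAN
restored = b"".join(decompress_chunks(wire))    # what the receiver rebuilds
```

Compression trades CPU time for fewer bytes on the wire, so it tends to pay off when the network is the bottleneck and the data is compressible.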
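27. MPI-BLAST pseudocode
28. MPI-BLAST I/O Performance
[Figure: MPI-BLAST I/O performance; panels: NCSA TeraGrid Cluster, DAS-2 Cluster]
29. MPI-BLAST I/O Performance
[Figure: MPI-BLAST I/O performance; panel: OSC P4 Cluster]
30. perf I/O Performance
[Figure: perf I/O performance; panels: DAS-2 Cluster, NCSA TeraGrid Cluster]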
31. Related Work
- Synchronous Remote I/O
  - MPI-IO Remote I/O
    - RIO: single client-server connection
  - Multiple connections
    - GridFTP: striped connections out of a single client
- Asynchronous Remote I/O
  - MTIO
    - Multi-threaded MPI-based I/O library (More et al.)
  - RFS
    - Active Buffering with Threads (ABT)
33. Conclusions
- End-to-end Parallel Remote I/O
  - Multiple, parallel TCP streams increase the available I/O bandwidth
  - SRB provides a consistent interface to heterogeneous storage resources
  - By integrating SRB with MPI I/O, we have developed a scalable, high-performance remote I/O library based on widely deployed tools
- Asynchronous Remote I/O
  - Asynchronous primitives enable the deployment of different performance-enhancing measures for remote I/O
    - Overlap of computation with I/O
    - Asynchronous Split-TCP
    - On-the-fly data compression
34. Future Work
- Caching in the network using public infrastructure (IBP)
- Dynamic degree of data stream parallelism
  - Adjust the number of connections based on the network load
- Coordination between multiple data streams
35. Acknowledgments
- Thanks are due to the following:
  - Reagan Moore, Marcio Faerman and Arcot Rajasekar of the Data Intensive Group (DICE) at the San Diego Supercomputer Center for giving us access to the SRB source.
  - Henri Bal of Vrije Universiteit, Amsterdam for giving us access to the DAS-2 cluster.
  - Rob Pennington and Ruth Aydt at the National Center for Supercomputing Applications (NCSA) for allowing us to use the NCSA TeraGrid cluster for our experiments.
- This work is supported in part by the National Partnership for Advanced Computational Infrastructure, by the Ohio Supercomputer Center through grants PAS0036 and PAS0121, and by NSF grant CNS-0403342. Mario Lauria is partially supported by NSF grant DBI-0317335. Support from Hewlett-Packard is also gratefully acknowledged.
36. Thank You