Title: High Performance Storage System
1. High Performance Storage System
- Harry Hulen
- 281-488-2473
- hulen_at_us.ibm.com
2. HSM: Hierarchical Storage Management
- Purposes of HSM
- Extend disk space
- Back up disk files to tape
- Managed permanent archive
- User sees virtually unlimited file system
- Data migrates down the hierarchy
- Migrated files may be asynchronously purged from a higher level (e.g. disk) to free up space
- Multiple classes of service in a single name space, for example:
- Disk to tape
- Tape only
- High-speed disk to low-cost disk to MAID to tape library to shelf tape
3. HPSS 6.2 Architecture
- Hierarchical global file system
- Distributed cluster architecture provides horizontal growth
- SAN and/or LAN connected
- Metadata engine is IBM DB2
- Multiple storage classes
- Striped disks and tapes for higher data rates
- Multi-petabyte capability in a single name space
- Fully Linux and AIX capable, even mix and match
- Supports new generation tapes and libraries
- New GPFS-HPSS, reverse SAN, NFSv4 and Linux VFS interfaces
- New lower HPSS Light prices
4. New Features in HPSS 6.2
- More Computer and Operating System Options
- The HPSS Core Server can now run on either AIX or Linux
- The Linux platform may be Intel-based xSeries computers, IBM p5-based Open Power computers, or IBM pSeries computers
- Movers and 32- or 64-bit clients may be Linux, AIX, Solaris, or Irix
- HPSS is a cluster solution, and components of different vendors and different operating systems may be mixed together
- Fewer Prerequisites
- HPSS 5.1 requires the Distributed Computing Environment as a prerequisite
- HPSS 6.2 eliminates this dependency
- Simpler, faster installation
- Simplified HPSS setup and initial configuration, reducing startup costs
- Direct SAN access to disks
- In addition to the original Mover-based disk sharing, disks may now be accessed directly over a SAN
- This may be done with the HPSS Client API and PFTP and with the Linux Virtual File System interface described on the next slide
- Reverse SAN access to other file systems' SANs
- The HPSS Local File Mover feature has been updated and tested to allow HPSS to transfer data between another file system and HPSS over the other file system's SAN
- HPSS Movers serve as the transfer agent
- This capability works with any cluster file system offering a Unix/Posix read/write interface, and it has been tested with IBM GPFS, Lustre, and ADIC SNFS
5. New Features in HPSS 6.2
- Linux Virtual File System Interface (VFS)
- Linux applications will benefit from a true POSIX standard read/write interface
- This interface will allow many standard commercial programs that do file I/O to use HPSS as file space, essentially turning them into hierarchical disk-tape applications
- Windows access to HPSS files
- HPSS now supports Windows access to HPSS files using Samba to create a Windows SMB share
- Linux-based Samba runs on a core server, mover, or dedicated computer and uses the VFS interface as its interface to HPSS
- DB2/HPSS Active Tablespace Archive
- IBM DB2 Universal Database can be set up to use Linux-based HPSS to manage DB2 tablespace
- This is a useful way to create active, online archival databases that can migrate to tape yet be accessed normally
- GPFS/HPSS Transparent HSM
- This capability enables HPSS to migrate files from GPFS
- Together, GPFS and HPSS provide a virtually infinite high-performance file system with robust disaster protection
- GridFTP
- Globus GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
- LBNL and ANL have developed and are testing a high-performance GridFTP interface for HPSS
6. HPSS 6.2 Hardware Supported
- IBM LTO-3 Tape Drives on AIX and Linux Movers
- HP LTO-3 Tape Drives on Linux Movers
- IBM 3592 Gen2 Tape Drives
- StorageTek (Sun) Titanium Tape Drives
- IBM 3584 Library
- StorageTek (Sun) SL500 and SL8500 Libraries
- SpectraLogic T Series with Sony AIT and IBM LTO-3 drives
- Generic SCSI tape libraries
- IBM 4000 Series Disks
- DataDirect Networks (DDN) disks by special bid
7. How a Cluster File System Works
1. Client issues READ to Metadata Controller (MDC)
2. MDC accesses metadata on disk
3. MDC sends lock and ticket back to client
4. Client reads data from shared disk over SAN
Examples:
- IBM SAN FS
- ADIC SNFS
- IBM GPFS is functionally equivalent but has a distributed metadata architecture
- HPSS is similar but adds tape (see next slide)
8. How HPSS Works
1. Client issues READ to Core Server (CS)
2. CS accesses metadata on disk
3. CS commands Mover to stage file from tape to disk
4. Mover stages file from tape to disk
5. CS sends lock and ticket back to client
6a. Client reads data from shared disk over SAN
6b. Or Mover reads data and sends to client over LAN
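To make the control flow and the two data paths concrete, here is a minimal sketch in C of the sequence above. Every type and function in it is an invented placeholder used only for illustration (none of this is HPSS code); it simply walks the six steps and picks between the SAN path (6a) and the LAN path (6b).

    /* Illustrative sketch of the HPSS read flow above.
     * All types and functions here are invented placeholders,
     * not real HPSS interfaces. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { const char *path; bool on_disk; bool client_on_san; } file_md;

    /* Step 2: Core Server consults metadata (placeholder lookup). */
    static file_md lookup_metadata(const char *path) {
        file_md md = { path, false, true };   /* pretend the file is on tape only */
        return md;
    }

    static void core_server_read(const char *path) {
        printf("1. Client issues READ for %s to Core Server\n", path);
        file_md md = lookup_metadata(path);
        printf("2. Core Server accesses metadata on disk\n");

        if (!md.on_disk) {
            printf("3. Core Server commands a Mover to stage the file\n");
            printf("4. Mover stages the file from tape to disk cache\n");
            md.on_disk = true;
        }
        printf("5. Core Server sends lock and ticket back to the client\n");

        if (md.client_on_san)
            printf("6a. Client reads data directly from shared disk over the SAN\n");
        else
            printf("6b. Mover reads data and sends it to the client over the LAN\n");
    }

    int main(void) {
        core_server_read("/hpss/projects/results.dat");  /* hypothetical path */
        return 0;
    }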
9. HPSS Enterprise HSM Services: HPSS 6.2 Client Access Overview
10. HPSS Write or Put Over TCP/IP Network
- HPSS Client API
- Hpss_write( ) etc.
- Optional list form to access discontiguous segments
- Parallel, gigabyte/s capability
- Use for performance-critical applications (see the sketch below)
- HPSS PFTP
- Parallel FTP
- FTP-like get-put semantics
- Parallel, gigabyte/s capability
- Most-used HPSS interface
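For the HPSS Client API, the calls follow POSIX-like open/write/close semantics. The sketch below is illustrative only: the header name hpss_api.h, the exact signatures of hpss_Open, hpss_Write, and hpss_Close, and the file path are assumptions to be checked against the HPSS Client API programmer's reference; class-of-service hints are simply left NULL here.

    /* Minimal sketch of a write through the HPSS Client API.
     * Header name, function names, and argument lists are assumptions
     * based on the API's POSIX-like style; consult the HPSS Client API
     * reference for the exact signatures. */
    #include <fcntl.h>
    #include <string.h>
    #include <stdio.h>
    #include "hpss_api.h"          /* assumed HPSS Client API header */

    int main(void) {
        const char *path = "/hpss/projects/results.dat";   /* hypothetical HPSS path */
        const char *buf  = "example payload\n";

        /* Open (create) the file; class-of-service hints and priorities omitted. */
        int fd = hpss_Open((char *)path, O_WRONLY | O_CREAT, 0644, NULL, NULL, NULL);
        if (fd < 0) {
            fprintf(stderr, "hpss_Open failed: %d\n", fd);
            return 1;
        }

        /* POSIX-like write; a list form also exists for discontiguous segments. */
        if (hpss_Write(fd, (void *)buf, strlen(buf)) < 0)
            fprintf(stderr, "hpss_Write failed\n");

        hpss_Close(fd);
        return 0;
    }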
11. SAN-Enabled HPSS Write or Put
- Data transferred directly between client and HPSS disk over the HPSS SAN
- Control is over the TCP/IP network (separation of control and data)
- Supported by HPSS Client API and PFTP
- Currently supported on AIX and Linux
- Used internally by HPSS to move data between disk and tape
12. HPSS Local File Mover or Reverse SAN
- HPSS accesses data on the client SAN
- Examples of client SANs: IBM SAN FS, ADIC SNFS, IBM GPFS
- Activated by PFTP LFPUT-LFGET, with more options coming
- CPU overhead entirely offloaded to HPSS Movers
- Parallel capability and/or direct tape access via Class of Service options
13. Posix VFS Interface for Linux
- HPSS accessed using standard UNIX/Posix semantics (a minimal example follows the diagram below)
- Run standard products on HPSS such as IBM DB2, IBM TSM, NFSv4, and Samba
- VFS is currently available only on Linux
[Diagram: a Unix/Posix application on a Linux client calls the Posix file system interface; HPSS VFS extension daemons and the HPSS Client API move data through a data buffer to the HPSS cluster (Core Server and Data Movers on AIX or Linux), with control over the network and an optional SAN data path.]
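Because the VFS interface presents HPSS as an ordinary mounted Linux file system, any program that uses standard POSIX calls works unchanged. The sketch below uses only standard C library and POSIX calls; the /hpss mount point and file path are hypothetical.

    /* Ordinary POSIX I/O against an HPSS VFS mount.
     * Nothing here is HPSS-specific; /hpss is a hypothetical mount point. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        const char *path = "/hpss/archive/report.txt";   /* hypothetical path */
        const char *msg  = "stored through the HPSS VFS interface\n";

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        if (write(fd, msg, strlen(msg)) < 0) perror("write");
        close(fd);

        /* Reading back may block while HPSS stages the file from tape to disk,
           but the application code is unchanged. */
        char buf[128];
        fd = open(path, O_RDONLY);
        if (fd >= 0) {
            ssize_t n = read(fd, buf, sizeof buf - 1);
            if (n > 0) { buf[n] = '\0'; fputs(buf, stdout); }
            close(fd);
        }
        return 0;
    }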
14. Windows Interface via Samba Agent
[Diagram: a Windows application goes through the Windows file system interface and CIFS client to a Samba server on a Linux agent; the Samba server acts as a Unix/Posix application on top of the HPSS VFS extension daemons and HPSS Client API, which move data through a data buffer to the HPSS cluster (Core Server and Data Movers on AIX or Linux), with control over the network and an optional SAN data path.]
15. GPFS-HPSS Data Flow
[Diagram: GPFS files migrate to HPSS and are staged back to GPFS on demand. One or more GPFS file systems (e.g. a single gpfs file system, or gpfs1, gpfs2, and gpfs3) are mirrored under the HPSS root, so the GPFS directory tree is copied in the HPSS namespace. Sites may administratively enable interfaces to access the HPSS copy of GPFS files.]
16. Demo Configuration
17. DB2/HPSS Active Tablespace Archive
- Active DB2 uses dedicated local disk arrays for tablespace
- Archive DB2 uses HPSS for tablespace through the Virtual FS interface, allowing direct access to archived data
- Archiving databases in HPSS allows virtually unlimited accessible data
- MAID is disk emulating tape, providing faster retrieval than tape
18. (No transcript)
19. Capacity scaling examples (all are single name spaces)
- 4.2 PB Los Alamos National Laboratory (LANL) Secure Computing Facility, in 46M files, growing at a rate of 250 to 300 TB per month
- 3.8 PB The European Centre for Medium-Range Weather Forecasts (ECMWF) in the UK, in 24M files
- 3.3 PB Brookhaven National Laboratory (BNL) in 19M files
- 2.4 PB San Diego Supercomputer Center (SDSC) in 25M files
- 2.2 PB National Centers for Environmental Prediction (NCEP) in 3M files
- 2.0 PB Lawrence Livermore National Lab (LLNL) Secure Computing Facility (SCF) in 31M files
- 2.0 PB LLNL Open Computing Facility (OCF) in 24M files
- 2.2 PB Stanford Linear Accelerator Center (SLAC) in 2.1M files, a tape-only system
- 1.9 PB Commissariat à l'Energie Atomique/Division des Applications Militaires (CEA/DAM) Compute Center in France in 1.8M files
- 1.5 PB National Energy Research Scientific Computing Center (NERSC), in 40M files
- 1.3 PB Institut National de Physique Nucléaire et de Physique des Particules (IN2P3) in 12M files
- 1 PB National Climatic Data Center (NCDC) in 35M files
20. HPSS PFTP transfers data at over 200 MB/sec between Dept. of Energy labs
21. SC2004 2 GB/sec Read-After-Write Demo
Four FAStT storage controllers, 16 Fibre Channel ports, 16 RAID 5 LUNs. As each 500 MB block is written, it is immediately read by the other computer. Aggregate 1 GB/sec write and 1 GB/sec read, total 2 GB/sec.
22. The HPSS Collaboration
- Development, Deployment, and Industry Members Co-Develop and Test HPSS:
- Lawrence Livermore National Lab.
- Sandia National Laboratories
- Los Alamos National Laboratory
- Oak Ridge National Laboratory
- Lawrence Berkeley National Lab
- ECMWF
- CEA/DAM
- SDSC
- BAE
- SLAC
- StorageTek
- SpectraLogic
- DDN
- Gleicher Enterprises
- IBM Global Services in Houston, Texas
- Access to IBM technology (DB2, GPFS, TSM, and CM, for example)
- Project management
- Quality assurance and testing (SEI CMM Level 3)
- Commercial licensing and service
- Advantages of Collaborative Development
- Developers are users: focus on what is needed and what works
- Keeps focus on the high end: the largest data stores
- Software is open, and source code is available to collaboration members and users
- Since 1992