Transcript and Presenter's Notes

Title: High Performance Storage System


1
High Performance Storage System
  • Harry Hulen
  • 281-488-2473
  • hulen@us.ibm.com

2
HSM: Hierarchical Storage Management
  • Purposes of HSM
  • Extend disk space
  • Back up disk files to tape
  • Provide a managed permanent archive
  • User sees a virtually unlimited file system
  • Data migrates down the hierarchy
  • Migrated files may be asynchronously purged from
    a higher level (e.g. disk) to free up space
    (sketched after this list)
  • Multiple classes of service in a single name
    space, for example
  • Disk to tape
  • Tape only
  • High-speed disk to low-cost disk to MAID to tape
    library to shelf tape
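
The migrate-and-purge behavior above can be pictured with a small C sketch.
This is illustration only, not HPSS code: the level names, struct, and
functions are hypothetical. The point is simply that data is copied down the
hierarchy and a higher-level copy can later be freed while the file remains
visible in the name space.

    #include <stdio.h>

    /* Hypothetical model of an HSM hierarchy -- illustration only, not HPSS code. */
    enum level { FAST_DISK, LOW_COST_DISK, MAID, TAPE_LIBRARY, SHELF_TAPE };

    struct file_entry {
        const char *name;        /* the name-space entry never changes          */
        int         has_copy[5]; /* which hierarchy levels currently hold data  */
    };

    /* Copy the data down the hierarchy ("migration"). */
    static void migrate(struct file_entry *f, enum level from, enum level to)
    {
        f->has_copy[to] = 1;
        printf("%s: migrated from level %d to level %d\n", f->name, from, to);
    }

    /* Asynchronously free space at a higher level once a lower copy exists. */
    static void purge(struct file_entry *f, enum level lvl)
    {
        int lower;
        for (lower = lvl + 1; lower <= SHELF_TAPE; lower++) {
            if (f->has_copy[lower]) {
                f->has_copy[lvl] = 0;   /* space freed; file still visible */
                return;
            }
        }
    }

    int main(void)
    {
        struct file_entry f = { "/home/user/data.dat", { 1, 0, 0, 0, 0 } };
        migrate(&f, FAST_DISK, TAPE_LIBRARY);  /* e.g. a disk-to-tape class of service */
        purge(&f, FAST_DISK);                  /* later, the disk copy is purged       */
        return 0;
    }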

3
HPSS 6.2 Architecture
  • Hierarchical global file system
  • Distributed, cluster architecture provides
    horizontal growth
  • SAN and/or LAN connected
  • Metadata engine is IBM DB2
  • Multiple storage classes
  • Striped disks and tapes for higher data rates
  • Multi-petabyte capability in a single name space
  • Fully Linux and AIX capable; the two can even be
    mixed and matched
  • Supports new generation tapes and libraries
  • New GPFS-HPSS, reverse SAN, NFSv4 and Linux VFS
    interfaces
  • New lower HPSS Light prices

4
New Features in HPSS 6.2
  • More Computer and Operating System Options
  • The HPSS Core Server can now run on either AIX or
    Linux
  • The Linux platform may be Intel-based xSeries,
    IBM p5-based OpenPower, or IBM pSeries computers
  • Movers and 32- or 64-bit clients may run Linux,
    AIX, Solaris, or IRIX
  • HPSS is a cluster solution, and components from
    different vendors running different operating
    systems may be mixed together
  • Fewer Prerequisites
  • HPSS 5.1 requires the Distributed Computing
    Environment (DCE) as a prerequisite
  • HPSS 6.2 eliminates this dependency
  • Simpler, faster installation
  • Simplified HPSS setup and initial configuration,
    reducing startup costs
  • Direct SAN access to disks
  • In addition to the original Mover-based disk
    sharing, disks may now be accessed directly over
    a SAN
  • This may be done with the HPSS Client API and
    PFTP and with the Linux Virtual File System
    interface described on the next slide
  • Reverse SAN access to other file systems' SANs
  • The HPSS Local File Mover feature has been
    updated and tested to allow HPSS to transfer data
    between another file system and HPSS over the
    other file system's SAN
  • HPSS Movers serve as the transfer agents
  • This capability works with any cluster file
    system offering a Unix/Posix read/write
    interface, and it has been tested with IBM GPFS,
    Lustre, and ADIC SNFS

5
New Features in HPSS 6.2
  • Linux Virtual File System Interface (VFS)
  • Linux applications will benefit from a true POSIX
    standard read/write interface
  • This interface will allow many standard
    commercial programs that do file I/O to use HPSS
    as file space, essentially turning them into
    hierarchical disk-tape applications
  • Windows access to HPSS files
  • HPSS now supports Windows access to HPSS files
    using Samba to create a Windows SMB share
  • Linux-based Samba runs on a core server, Mover,
    or dedicated computer and uses the VFS interface
    to reach HPSS
  • DB2/HPSS Active Tablespace Archive
  • IBM DB2 Universal Database can be set up to use
    Linux-based HPSS to manage DB2 tablespace
  • This is a useful way to create active, online
    archival databases that can migrate to tape yet
    be accessed normally
  • GPFS/HPSS Transparent HSM
  • This capability enables HPSS to migrate files
    from GPFS
  • Together, GPFS and HPSS provide a virtually
    infinite high performance file system with robust
    disaster protection
  • GridFTP
  • Globus GridFTP is a high-performance, secure,
    reliable data transfer protocol optimized for
    high-bandwidth wide-area networks
  • LBNL and ANL have developed and are testing a
    high performance GridFTP interface for HPSS

6
HPSS 6.2 Hardware Supported
  • IBM LTO-3 Tape Drives on AIX and Linux Movers
  • HP LTO-3 Tape Drives on Linux Movers
  • IBM 3592 Gen2 Tape Drives
  • StorageTek (Sun) Titanium Tape Drives
  • IBM 3584 Library
  • StorageTek (Sun) SL500 and SL8500 Libraries
  • SpectraLogic T Series with Sony AIT and IBM LTO-3
    Drives
  • Generic SCSI tape libraries
  • IBM 4000 Series Disks
  • DataDirect Networks (DDN) Disks by special bid

7
How a Cluster File System Works
1. Client issues READ to Metadata Controller (MDC)
2. MDC accesses metadata on disk
3. MDC sends lock and ticket back to client
4. Client reads data from shared disk over SAN
  • Examples: IBM SAN FS, ADIC SNFS
  • IBM GPFS is functionally equivalent but has a
    distributed metadata architecture
  • HPSS is similar but adds tape (see next slide)

8
How HPSS Works
1. Client issues READ to Core Server (CS)
2. CS accesses metadata on disk
3. CS commands Mover to stage file from tape to disk
4. Mover stages file from tape to disk
5. CS sends lock and ticket back to client
6a. Client reads data from shared disk over SAN
6b. Or Mover reads data and sends to client over LAN
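
The numbered steps above can be summarized in a schematic C sketch (the
cluster file system case on the previous slide is the same flow without the
tape-staging steps). This is an assumption-level illustration, not HPSS
source: every type and function name below is invented for the sketch.

    #include <stdio.h>

    /* Schematic sketch of the read path above -- not HPSS source code. */
    enum residency { ON_DISK, ON_TAPE };
    struct metadata { enum residency where; };

    /* Stubs standing in for the DB2 metadata lookup and Mover/SAN actions. */
    static struct metadata core_server_lookup(const char *path)
    {
        struct metadata md = { ON_TAPE };
        printf("step 2: metadata lookup for %s\n", path);
        return md;
    }
    static void mover_stage_tape_to_disk(const char *p) { printf("steps 3-4: stage %s to disk\n", p); }
    static void grant_san_ticket(const char *p)         { printf("steps 5, 6a: ticket for %s, client reads over SAN\n", p); }
    static void mover_send_over_lan(const char *p)      { printf("step 6b: Mover sends %s over LAN\n", p); }

    static void core_server_read(const char *path, int client_has_san_access)
    {
        struct metadata md = core_server_lookup(path);   /* step 2          */
        if (md.where == ON_TAPE)
            mover_stage_tape_to_disk(path);              /* steps 3 and 4   */
        if (client_has_san_access)
            grant_san_ticket(path);                      /* steps 5 and 6a  */
        else
            mover_send_over_lan(path);                   /* step 6b         */
    }

    int main(void) { core_server_read("/hpss/example.dat", 1); return 0; }

The key point is the separation of control and data: the Core Server grants
access, while bulk data moves either directly over the SAN or through a Mover
over the LAN.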
9
HPSS Enterprise HSM Services
HPSS 6.2 Client Access Overview
10
HPSS Write or Put Over TCP/IP Network
  • HPSS Client API
  • hpss_Write( ), etc. (see the sketch after this
    list)
  • Optional list form to access discontiguous
    segments
  • Parallel, gigabyte/s capability
  • Use for performance-critical applications
  • HPSS PFTP
  • Parallel FTP
  • FTP-like get-put semantics
  • Parallel, gigabyte/s capability
  • Most-used HPSS interface
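
A minimal sketch of a write through the Client API follows. The hpss_Open,
hpss_Write, and hpss_Close names are from the HPSS Client API, but the
argument lists shown are simplified assumptions (for example, the
class-of-service hint arguments are passed as NULL), so treat this as
schematic and take the exact signatures from the HPSS programmer's
documentation.

    #include <fcntl.h>
    #include <string.h>
    #include "hpss_api.h"   /* HPSS Client API header; the header name is assumed here */

    /* Schematic only: argument lists are simplified assumptions, not the exact API. */
    int archive_buffer(const char *hpss_path, const void *buf, size_t len)
    {
        /* Class-of-service hints are passed as NULL; real code may supply them. */
        int fd = hpss_Open((char *)hpss_path, O_WRONLY | O_CREAT, 0644,
                           NULL, NULL, NULL);
        if (fd < 0)
            return -1;
        hpss_Write(fd, (void *)buf, len);   /* striping and parallelism are handled by HPSS */
        hpss_Close(fd);
        return 0;
    }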

11
SAN-Enabled HPSS Write or Put
  • Data transferred directly between client and HPSS
    disk over HPSS SAN
  • Control is over TCP/IP network (separation of
    control and data)
  • Supported by HPSS Client API and PFTP
  • Currently supported on AIX and Linux
  • Used internally by HPSS to move data between
    disk and tape

12
HPSS Local File Mover or Reverse SAN
  • HPSS accesses data on client SAN
  • Examples of client SAN file systems: IBM SAN FS,
    ADIC SNFS, IBM GPFS
  • Activated by PFTP LFPUT/LFGET, with more options
    coming
  • CPU overhead entirely offloaded to HPSS Movers
  • Parallel capability and/or direct tape access via
    Class of Service options

13
Posix VFS Interface for Linux
  • HPSS accessed using standard UNIX/Posix
    semantics (see the sketch after the diagram
    description below)
  • Run standard products on HPSS such as IBM DB2,
    IBM TSM, NFSv4, and Samba
  • VFS currently available only on Linux

(Diagram: a Linux client runs a Unix/Posix application on top of the Posix
file system interface, the HPSS VFS extensions daemons, a data buffer, and
the HPSS Client API; control and data flow to the HPSS cluster (Core Server
and Data Movers on AIX or Linux), with an optional SAN data path.)
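
Because the VFS interface presents HPSS through standard Unix/Posix calls, an
ordinary Linux program needs no HPSS-specific code. A minimal sketch,
assuming a hypothetical /hpss mount point:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "archived through the HPSS VFS interface\n";
        /* /hpss is a hypothetical mount point for an HPSS file system */
        int fd = open("/hpss/projects/demo.log",
                      O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0)
            return 1;
        if (write(fd, msg, strlen(msg)) < 0) {   /* lands on HPSS disk; may later migrate to tape */
            close(fd);
            return 1;
        }
        close(fd);
        return 0;
    }

This is the same property that lets unmodified products such as IBM DB2, IBM
TSM, NFSv4, and Samba use HPSS as file space.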
14
Windows Interface via Samba Agent
(Diagram: a Windows client runs a Windows application over a CIFS client and
the Windows file system interface; a Linux agent runs the Samba server on top
of the HPSS VFS extensions daemons, data buffers, and the HPSS Client API,
which connect to the HPSS cluster (Core Server and Data Movers on AIX or
Linux) over control and data paths, with an optional SAN data path.)
15
GPFS-HPSS Data Flow
(Diagram: GPFS files migrate to HPSS and stage back to GPFS on demand. GPFS
file systems such as gpfs1, gpfs2, and gpfs3 are copied into the HPSS
namespace under the HPSS root, and sites may administratively enable other
interfaces to access the HPSS copy of GPFS files.)
16
Demo Configuration
17
DB2/HPSS Active Tablespace Archive
(Diagram: an active DB2 instance uses dedicated local disk arrays for its
tablespaces, while an archive DB2 instance stores tablespaces in HPSS through
the Virtual FS interface, allowing direct access to archived data. Archiving
databases in HPSS allows virtually unlimited accessible data. MAID is disk
emulating tape, providing faster retrieval than tape.)
18
(No Transcript)
19
Capacity scaling examples (all are single name spaces)
  • 4.2 PB Los Alamos National Laboratory (LANL)
    Secure Computing Facility, in 46M files, growing
    at a rate of 250 to 300 TB per month
  • 3.8 PB The European Centre for Medium-Range
    Weather Forecasts (ECMWF) in the UK, in 24M files
  • 3.3 PB Brookhaven National Laboratory (BNL) in
    19M files
  • 2.4 PB San Diego Supercomputer Center (SDSC) in
    25M files
  • 2.2 PB National Centers for Environmental
    Prediction (NCEP) in 3M files
  • 2.0 PB Lawrence Livermore National Lab (LLNL)
    Secure Computing Facility (SCF) in 31M files
  • 2.0 PB LLNL Open Computing Facility (OCF) in 24M
    files
  • 2.2 PB Stanford Linear Accelerator Center (SLAC)
    in 2.1M files, a tape-only system
  • 1.9 PB Commissariat à l'Energie Atomique/Division
    des Applications Militaires (CEA/DAM) Compute
    Center in France in 1.8M files
  • 1.5 PB National Energy Research Scientific
    Computing Center (NERSC), in 40M files
  • 1.3 PB Institut National de Physique Nucléaire et
    de Physique des Particules (IN2P3) in 12M files
  • 1 PB National Climatic Data Center (NCDC) in 35
    M files

20
HPSS PFTP transfers data at over 200 MB/sec
between Department of Energy laboratories
21
SC2004 2 GB/Sec Read After Write Demo
Four FAStT storage controllers, 16 Fibre Channel
ports, 16 RAID 5 LUNs. As each 500 MB block is
written, it is immediately read by the other
computer. Aggregate 1 GB/sec write and 1 GB/sec
read, for a total of 2 GB/sec.
22
The HPSS Collaboration
  • Development, Deployment and Industry Members
    Co-Develop and Test HPSS
  • Lawrence Livermore National Laboratory
  • Sandia National Laboratories
  • Los Alamos National Laboratory
  • Oak Ridge National Laboratory
  • Lawrence Berkeley National Laboratory
  • ECMWF
  • CEA/DAM
  • SDSC
  • BAE
  • SLAC
  • StorageTek
  • SpectraLogic
  • DDN
  • Gleicher Enterprises
  • IBM Global Services in Houston, Texas
  • Access to IBM technology (DB2, GPFS, TSM, CM for
    example)
  • Project management
  • Quality assurance and testing (SEI CMM Level 3)
  • Commercial licensing and service
  • Advantages of Collaborative Development
  • Developers are users, so the focus is on what is
    needed and what works
  • Keeps the focus on the high end: the largest
    data stores
  • Software is open and source code is available to
    collaboration members and users
  • Since 1992