Title: HPSS

Slide 1: HPSS
The High Performance Storage System
Developed by IBM, LANL, LLNL, ORNL, SNL, NASA Langley, NASA Lewis, Cornell, MHPCC, SDSC, and UW, with funding from DoE, NASA, and NSF
Presented by Christopher Ho, CSci 599
Slide 2: Motivation
- In the last 10 years, processor speeds have increased 50-fold
- Disk transfer rates have increased less than 4x
- RAID now successful, inexpensive
- Tape speeds have increased less than 4x
- tape striping not widespread
- Performance gap is widening!
- Bigger and bigger files (10s, 100s of GB, soon TB)
- => Launch scalable storage initiative
Slide 3: IEEE Mass Storage Reference Model
- Defines layers of abstraction and transparency
- device, location independence
- Separation of policy and mechanism
- Logical separation of control and data flow
- Defines common terminology
- compliance does not imply interoperability
- Scalable, Hierarchical Storage Management
- see http://www.ssswg.org/sssdocs.html
Slide 4: Introduction: Hierarchical Storage
Decreasing cost and speed, increasing capacity (top to bottom):
- Memory
- Disk
- Optical disk
- Magnetic tape
Slide 5: HPSS Objectives
- Scalable
- transfer rate, file size, name space, geography
- Modular
- software subsystems replaceable; network/tape technologies updateable; API access
- Portable
- multiple vendor platforms, no kernel modifications, multiple storage technologies, standards-based, leverage commercial products
Slide 6: HPSS Objectives (cont.)
- Reliable
- distributed software and hardware components
- atomic transactions
- mirror metadata
- failed/restarted servers can reconnect
- storage units can be varied on/offline
Slide 7: Access into HPSS
- FTP
- protocol already supports third-party transfers
- new: partial file transfer (offset, size)
- Parallel FTP
- pget, pput, psetblocksize, psetstripewidth
- NFS version 2
- most like a traditional file system, but slower than FTP
- PIOFS
- parallel distributed FS on the IBM SP2 MPP
- futures: AFS/DCE DFS, DMIG-API
Slide 8: HPSS architecture
[Diagram: processing nodes and I/O nodes on an MPP interconnect; an HPSS server and Storage System Management on a control network; network-attached disk and network-attached tape on a HiPPI/FC/ATM data network; clients access HPSS over the network via NFS, FTP, and the DMIG-API.]
Slide 9: Software infrastructure
- Encina transaction processing manager
- two-phase commit, nested transactions
- guarantees consistency of metadata and server state
- OSF Distributed Computing Environment
- RPC calls for control messages
- thread library
- security (registry, privilege service)
- Kerberos authentication
- 64-bit arithmetic functions
- file sizes up to 2^64 bytes
- 32-bit platforms, big/little-endian architectures
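The idea behind those 64-bit helpers can be sketched briefly: on a 32-bit platform without a native 64-bit integer type, a file size is carried as a pair of 32-bit words, and arithmetic propagates a carry between them. The function names and representation below are illustrative assumptions, not HPSS's actual API.

```python
# Emulate 64-bit unsigned addition using (high, low) pairs of 32-bit words,
# as needed on 32-bit platforms. Names are illustrative, not HPSS APIs.

MASK32 = 0xFFFFFFFF

def u64_add(a, b):
    """Add two (hi, lo) pairs, propagating the carry from the low words."""
    a_hi, a_lo = a
    b_hi, b_lo = b
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if a_lo + b_lo > MASK32 else 0
    hi = (a_hi + b_hi + carry) & MASK32   # overflow beyond 2^64 wraps
    return (hi, lo)

def to_u64(pair):
    """Collapse a (hi, lo) pair into a single integer for checking."""
    hi, lo = pair
    return (hi << 32) | lo

# (4 GiB - 1) + 1 crosses the 32-bit boundary: the carry lands in the high word.
size = u64_add((0, MASK32), (0, 1))
assert to_u64(size) == 2**32
```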
Slide 10: Software components
- Name server
- maps POSIX filenames to an internal file, directory, or link
- Migration/Purge policy manager
- when/where to migrate to the next level in the hierarchy
- after migration, when to purge the copy on this level
- purge initiated when usage exceeds an administrator-configured high-water mark
- each file evaluated by size and time since last read
- migration and purge can also be manually initiated
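The purge policy above can be sketched in a few lines. The thresholds, the size-times-idle-time scoring rule, and all names here are assumptions for illustration; HPSS's actual policy engine is administrator-configurable.

```python
# Hedged sketch of a high-water-mark purge policy: when disk usage exceeds
# the high-water mark, purge already-migrated files (ranked by size and
# time since last read) until usage falls to a low-water mark.
import time
from dataclasses import dataclass

@dataclass
class FileCopy:
    name: str
    size: int            # bytes
    last_read: float     # epoch seconds
    migrated: bool       # a copy already exists at the next level down

def purge_candidates(files, capacity, high_water=0.9, low_water=0.7):
    """Return the files to purge, or an empty list if usage is acceptable."""
    used = sum(f.size for f in files)
    if used <= high_water * capacity:
        return []
    now = time.time()
    # Score combines file size and time since last read, as the slide suggests.
    ranked = sorted((f for f in files if f.migrated),
                    key=lambda f: f.size * (now - f.last_read),
                    reverse=True)
    picked = []
    for f in ranked:
        if used <= low_water * capacity:
            break
        picked.append(f)
        used -= f.size
    return picked
```

Only migrated files are eligible, since purging merely drops the local copy; the data survives at the next level of the hierarchy.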
Slide 11: Software components (cont.)
- Bitfile server
- provides the abstraction of bitfiles to clients
- provides scatter/gather capability
- supports access by file offset and length
- supports random and parallel reads/writes
- works with the file segment abstraction (see Storage server)
Slide 12: Software components (cont.)
- Storage server
- maps segments onto virtual volumes, and virtual volumes onto physical volumes
- virtual volumes allow tape striping
- Mover
- transfers data from a source to a sink
- tape, disk, network, memory
- device control: seek, load/unload, write tape mark, etc.
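The striping that virtual volumes enable can be sketched as a simple address translation. The round-robin layout and all names below are assumptions for illustration, not HPSS's actual on-media format.

```python
# Sketch of the virtual-to-physical mapping behind a striped virtual volume:
# a byte offset within the virtual volume is translated to a stripe member
# (one physical volume) and an offset on that member.

def stripe_location(offset, block_size, stripe_width):
    """Map a virtual-volume byte offset to (member, member_offset)."""
    block = offset // block_size
    member = block % stripe_width             # round-robin across the stripe
    member_block = block // stripe_width      # position within that member
    return member, member_block * block_size + offset % block_size

# With 1 MiB blocks on a 4-wide stripe, consecutive blocks land on
# successive physical volumes, so reads/writes proceed in parallel.
MB = 1 << 20
assert stripe_location(0 * MB, MB, 4) == (0, 0)
assert stripe_location(1 * MB, MB, 4) == (1, 0)
assert stripe_location(4 * MB, MB, 4) == (0, MB)
```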
Slide 13: Software components (cont.)
- Physical Volume Library
- maps a physical volume to a cartridge, and a cartridge to a PVR
- Physical Volume Repository
- controls cartridge mount/dismount functions
- modules for Ampex D2, STK 4480/90 and SD-3, and IBM 3480/3590 robotic libraries
- Repack server
- deletions leave gaps on sequential media
- reads live data, rewrites it on a new sequential volume, frees up the previous volume
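The repack operation is essentially a compacting copy. The data structures below are illustrative assumptions, not HPSS's, but they show why repacking reclaims space on sequential media.

```python
# Hedged sketch of the Repack server's job: copy only the live segments from
# a fragmented sequential volume to a fresh one, skipping the gaps that
# deletions left behind; the old volume can then be freed and reused.

def repack(volume):
    """volume: list of (segment_id, data) entries, with None marking a
    deleted gap. Returns a new, densely packed volume."""
    return [entry for entry in volume if entry is not None]

old = [("s1", b"aaa"), None, ("s2", b"bb"), None, None, ("s3", b"c")]
new = repack(old)
assert new == [("s1", b"aaa"), ("s2", b"bb"), ("s3", b"c")]
```

Because tape is append-only in practice, gaps cannot be reused in place; rewriting live data to a new volume is the only way to reclaim them.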
Slide 14: Software components (cont.)
- Storage system management
- GUI to monitor/control HPSS
- stop/start software servers
- monitor events and alarms, manual mounts
- vary devices on/offline
Slide 15: Parallel transfer protocol: goals
- Provide parallel data exchange between heterogeneous systems and devices
- Support different combinations of parallel and sequential sources/sinks
- Support gather/scatter and random access
- combinations of stripe width, both regular and irregular data block sizes
- Scalable I/O bandwidth
- Transport independent (TCP/IP, HiPPI, FCS, ATM)
Slide 16: Gather/scatter lists
[Diagram: a logical window of the file mapped by a gather/scatter list onto data blocks D1, D2, D3.]
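The mapping the diagram shows can be sketched as follows: given a contiguous (offset, length) request in the logical window, a gather/scatter list says which piece of which block satisfies each part of it. The block layout and names (D1..Dn) are assumptions based on the diagram.

```python
# Sketch of gather/scatter list resolution: translate one contiguous request
# in the logical window into per-block (name, block_offset, length) pieces.

def scatter(offset, length, blocks):
    """blocks: list of (name, block_len) laid out contiguously across the
    logical window. Returns the pieces of each block the request touches."""
    pieces, base = [], 0
    for name, blen in blocks:
        start = max(offset, base)
        end = min(offset + length, base + blen)
        if start < end:
            pieces.append((name, start - base, end - start))
        base += blen
    return pieces

layout = [("D1", 100), ("D2", 50), ("D3", 100)]
# A 120-byte request at offset 80 spans the tail of D1, all of D2, and the
# head of D3; each piece can then be moved by a different mover in parallel.
assert scatter(80, 120, layout) == [("D1", 80, 20), ("D2", 0, 50), ("D3", 0, 50)]
```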
Slide 17: Parallel transport architecture
[Diagram: a client holds control connections to sources S1..Sn and to destinations D1..Dn; data flows in parallel directly between the sources and destinations, bypassing the client.]
Slide 18: Parallel FTP transfer (pget)
[Diagram: numbered pget control flow. The Parallel FTP client contacts the Parallel FTPd, which consults the Name server and Bitfile server; the Storage server directs the HPSS Movers, and data then streams in parallel from the Movers directly to the client's movers.]
Slide 19: Summary
- High performance
- up to 1 GB/s aggregate transfer rates
- Scalable storage
- parallel architecture
- terabyte-sized files
- petabytes in archive
- Robust
- transaction processing manager
- Portable
- IBM, Sun implementations available
Slide 20: Conclusion
- Feasibility has been demonstrated for large, scalable storage
- Software exists, is shipping, and is actively used in the national labs on a daily basis
- Distributed architecture and parallel capabilities mesh well with grid computing