Title: SRM in GGF
1 Storage Resource Management: Providing Uniform Access to Storage Systems (including MSSs)
Arie Shoshani, LBNL (on behalf of the SRM collaboration)
http://sdm.lbl.gov/srm-wg
2 Storage Resource Managers (SRMs)
Definition: SRMs are middleware components whose function is to provide dynamic space allocation and file management of shared storage components on the Grid.
3 History
- 4 years of Storage Resource Management (SRM) activity
- Experience with system implementations of v1.x (2001)
  - MSS: HPSS (LBNL, ORNL, BNL), Enstore (Fermi), JasMINE (Jlab), Castor (CERN), MSS (NCAR), SE (RAL)
  - Disk systems: DRM (LBNL), dCache (Fermi), jSRM (Jlab)
- SRM v2.x spec was finalized (2003)
  - Several implementations of v2.x completed or in progress: Jlab, Fermi, CERN, LBNL
- Started GSM GGF-BOF at GGF8 (June 2003)
- Last SRM collaboration meeting: Sept. 2004
- SRM v3.x spec (for GGF) being finalized (2005)
4 Uniformity of Interface -> Compatibility of SRMs
[Figure: client users/applications and Grid middleware sit above a row of SRMs, which front the storage systems Enstore, dCache, JASMine, Castor, Unix-based disks, and SE.]
5 Current Storage Resource Management Active Working Group
- CERN: Olof Barring, Jean-Philippe Baud, James Casey, Peter Kunszt
- Rutherford Lab: Jens Jensen, Owen Synge
- Jefferson Lab: Bryan Hess, Andy Kowalski, Chip Watson
- Fermilab: Don Petravick, Timur Perelmutov
- LBNL: Junmin Gu, Arie Shoshani, Alex Sim, Kurt Stockinger
- Univa: Rich Wellner
6 Basic Issues
- Suppose you want to run a job on your local machine
  - Need to allocate space
  - Need to bring all input files from a storage system
  - Need to ensure correctness of files transferred
  - Need to monitor and recover from errors
  - What if the files don't fit in the space? Need to manage file streaming
  - Need to remove files to make space for more files
- Now, suppose that the machine and storage space are a shared resource
  - Need to do the above for many users
  - Need to enforce quotas
  - Need to ensure fairness of space allocation and scheduling
7 Basic Issues
- Now, suppose you want to do that on the WAN
  - Need to access a variety of storage systems
  - Mostly remote systems; need to have access permission
  - Need to have special software to access mass storage systems
- Now, suppose you want to run distributed jobs on the WAN
  - Need to allocate remote spaces
  - Need to move (stream) files to remote sites
  - Need to manage file outputs and their movement to destination site(s)
8 Types of storage resource managers
- Disk Resource Manager (DRM)
  - Manages one or more disk resources
- Tape Resource Manager (TRM)
  - Manages access to a tertiary storage system (e.g. HPSS)
- Hierarchical Resource Manager (HRM = TRM + DRM)
  - An SRM that stages files from tertiary storage into its disk cache
9 Peer-to-Peer Uniform Interface
10 General Analysis Scenario
11 Standards for Grid Storage Management
- Main concepts
  - Allocate spaces
  - Get/put files from/into spaces
  - Pin files for a lifetime
  - Release files and spaces
  - Get files into spaces from remote sites
  - Manage directory structures in spaces
  - SRMs communicate as peers
  - Negotiate transfer protocols
  - No logical name space management (rely on GGF-GFS)
12 Where do SRMs belong in the Grid architecture?
[Figure: layered Grid architecture diagram. Recoverable box labels, grouped from top to bottom:
- Collective 2: Request Interpretation and Planning Services; Workflow or Request Management Services; Community Authorization Services; Application-Specific Data Discovery Services; Consistency Services (e.g., Update Subscription, Versioning, Master Copies)
- Collective 1: General Data Discovery Services; Storage Management (Brokering); Compute Scheduling (Brokering); Data Transport Services; Monitoring/Auditing Services; Data Federation Services; Data Filtering or Transformation Services
- Resource: Storage Resource Manager; Compute Resource Management; Data Filtering or Transformation Services; Database Management Services; File Transfer Service (GridFTP); Resource Monitoring/Auditing
- Connectivity: Communication Protocols (e.g., TCP/IP stack); Authentication and Authorization Protocols (e.g., GSI)
- Fabric: Mass Storage System (HPSS); Other Storage Systems; Compute Systems; Networks]
This figure is based on the Grid Architecture paper by the Globus team.
13 SRMs support data movement between storage systems
[Figure: the layered Grid architecture diagram (Collective, Resource, Connectivity, and Fabric layers; based on the Grid Architecture paper by the Globus team), shown here with an added Data Movement box.]
14 SRM Functional Concepts
- Manage spaces dynamically
  - Reservation, lifetime
  - Negotiation
- Manage files in spaces
  - Request to put files in spaces
  - Request to get files from spaces
  - Lifetime, pinning of files, release of files
  - No logical name space management (done by replica location services)
- Access remote sites for files
  - Bring files from other sites and SRMs as requested
  - Use existing transport services (GridFTP, https, ...)
  - Transfer protocol negotiation
- Manage multi-file requests
  - Manage request queues
  - Manage caches
  - Manage garbage collection
- Directory management
  - Unix semantics: srmLs, srmMkdir, srmMv, srmRm, srmRmdir
15 Concepts: Types of Files
- Volatile: temporary files with a lifetime guarantee
  - Files are pinned and released
  - Files can be removed by SRM when released or when lifetime expires
- Permanent
  - No lifetime
  - Files can only be removed by creator (owner)
- Durable: files with a lifetime that CANNOT be removed by SRM
  - Files are pinned and released
  - Files can only be removed by creator (owner)
  - If lifetime expires, invoke administrative action (e.g. notify owner, archive and release)
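The removal rules for the three file types can be sketched in a few lines of Python. This is only an illustration of the rules on this slide: the names `FileType`, `ManagedFile`, `srm_may_remove`, and `needs_admin_action` are assumptions made for the example and are not part of the SRM specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FileType(Enum):
    VOLATILE = "volatile"
    DURABLE = "durable"
    PERMANENT = "permanent"


@dataclass
class ManagedFile:
    file_type: FileType
    expires_at: Optional[float]  # absolute time in seconds; None means no lifetime
    released: bool = False


def _expired(f: ManagedFile, now: float) -> bool:
    return f.expires_at is not None and now >= f.expires_at


def srm_may_remove(f: ManagedFile, now: float) -> bool:
    """Return True if the SRM itself (not the owner) may delete the file."""
    if f.file_type is not FileType.VOLATILE:
        # Durable and permanent files can only be removed by their owner.
        return False
    return f.released or _expired(f, now)


def needs_admin_action(f: ManagedFile, now: float) -> bool:
    """A durable file whose lifetime expired calls for an administrative action."""
    return f.file_type is FileType.DURABLE and _expired(f, now)
```

A volatile file becomes removable as soon as it is released or its lifetime runs out, while an expired durable file is only flagged (for example, to notify the owner) and is never deleted by the SRM.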
16 Concepts: Types of Spaces
- Types
  - Volatile
    - Space can be reclaimed by SRM when lifetime expires
  - Durable
    - Space can be reclaimed by SRM only if it does NOT contain files
    - Can choose to archive files and release space
  - Permanent
    - Space can only be released by owner or administrator
- Assignment of files to spaces
  - Files can only be assigned to spaces of the same type
- Spaces can be reserved
  - No limit on number of spaces
  - Space reference handle is returned to client
  - Total space of each type is subject to SRM and/or VO policies
- Default spaces
  - Files can be put into SRM spaces without explicit reservation
  - Defaults are not visible to client
- Compacting space
  - Release all unused space: space that holds no files, or only files whose lifetime has expired
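One possible reading of "compacting space" is sketched below for a volatile space: expired files are dropped and the reservation shrinks to what the remaining files occupy. The `Space` and `StoredFile` classes and the `compact` function are hypothetical names for this example; the actual behaviour of srmCompactSpace is defined by the SRM specification, not by this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoredFile:
    size: int                    # bytes
    expires_at: Optional[float]  # absolute time in seconds; None means no lifetime


@dataclass
class Space:
    reserved_bytes: int
    files: List[StoredFile] = field(default_factory=list)


def compact(space: Space, now: float) -> int:
    """Drop expired files, shrink the reservation to the live files,
    and return the number of bytes released."""
    live = [f for f in space.files if f.expires_at is None or f.expires_at > now]
    used = sum(f.size for f in live)
    released = space.reserved_bytes - used
    space.files = live
    space.reserved_bytes = used
    return released
```

For instance, a 1000-byte space that holds a 200-byte file with no lifetime and a 300-byte file that has already expired would release 800 bytes under this interpretation.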
17 Concepts: Directory Management
- Usual Unix semantics
  - srmLs, srmMkdir, srmMv, srmRm, srmRmdir
- A single directory for all file types
  - No directories for each type
  - File assignment to types is virtual
  - Files can be placed in SRM-managed directories by maintaining a mapping to the client's directory
- Access control services
  - Support owner/group/world permission
  - Can only be assigned by owner
  - When a file is requested by a user, SRM should check permission with the source site
18 Examples of Directory Structures (user defined)
[Figure: two user-defined directory trees over directories D1 to D4, with each file tagged by type (V = volatile, D = durable, P = permanent). In (1) Mixed file types, files of different types share directories; in (2) By file type, files are grouped into directories by type.]
- Supported function: ChangeFileType
- Advantage of (1): no need to move files when file types are changed
19 Concepts: Space Reservations
- Negotiation
  - Client asks for space: C-guaranteed, MaxDesired
  - SRM returns: S-guaranteed <= C-guaranteed, best effort <= MaxDesired
- Type of space
  - Can be specified
  - Subject to limits per client (SRM or VO policies)
  - Default: volatile
- Lifetime
  - Negotiated: C-lifetime requested
  - SRM returns: S-lifetime <= C-lifetime
- Reference handle
  - SRM returns space reference handle
  - User can provide srmSpaceTokenDescription to recover handles
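The negotiation can be illustrated with a minimal Python sketch in which the SRM never grants more than the client asked for. The function `negotiate_space`, the `SpaceGrant` record, and the limits `free_bytes` and `max_lifetime` are assumptions standing in for whatever SRM or VO policy a real implementation applies.

```python
from dataclasses import dataclass


@dataclass
class SpaceGrant:
    guaranteed: int   # S-guaranteed, in bytes
    best_effort: int  # total the SRM will try to provide, in bytes
    lifetime: int     # S-lifetime, in seconds


def negotiate_space(c_guaranteed: int, max_desired: int, c_lifetime: int,
                    free_bytes: int, max_lifetime: int) -> SpaceGrant:
    """Clamp the client's request to hypothetical SRM limits."""
    if max_desired < c_guaranteed:
        raise ValueError("MaxDesired must be at least C-guaranteed")
    guaranteed = min(c_guaranteed, free_bytes)   # S-guaranteed <= C-guaranteed
    best_effort = min(max_desired, free_bytes)   # best effort <= MaxDesired
    lifetime = min(c_lifetime, max_lifetime)     # S-lifetime <= C-lifetime
    return SpaceGrant(guaranteed, best_effort, lifetime)
```

A client asking for 100 bytes guaranteed, 500 bytes desired, and a one-hour lifetime from an SRM with 300 free bytes and a 30-minute lifetime cap would receive 100 guaranteed, 300 best effort, and 1800 seconds.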
20 Concepts: Transfer Protocol Negotiation
- Negotiation
  - Client provides an ordered list
  - SRM returns the highest-ranked protocol on the list that it supports
- Example
  - Protocol list: bbftp, gridftp, ftp
  - SRM returns: gridftp
- Advantages
  - Easy to introduce new protocols
  - User controls which protocol to use
- Default: SRM policy choice
- How is it returned?
  - As the protocol of the Transfer URL (TURL)
  - Example: bbftp://dm.slac.edu/temp/run11/File678.txt
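The selection rule is simple enough to show as a short Python sketch. `negotiate_protocol` and `turl_protocol` are illustrative helper names, not SRM methods, and the optional `srm_default` argument stands in for the SRM's own policy choice when the client supplies no list.

```python
from typing import Iterable, Optional, Sequence
from urllib.parse import urlsplit


def negotiate_protocol(client_preferences: Sequence[str],
                       srm_supported: Iterable[str],
                       srm_default: Optional[str] = None) -> Optional[str]:
    """Return the first protocol in the client's ordered list that the SRM supports.

    With an empty client list the SRM falls back to its own default;
    with a non-empty list and no match the result is None.
    """
    supported = {p.lower() for p in srm_supported}
    for protocol in client_preferences:
        if protocol.lower() in supported:
            return protocol.lower()
    return srm_default if not client_preferences else None


def turl_protocol(turl: str) -> str:
    """The negotiated protocol is read back from the scheme of the TURL."""
    return urlsplit(turl).scheme
```

With the slide's example, a client list of bbftp, gridftp, ftp against an SRM that supports only gridftp and ftp yields gridftp, and the client learns the outcome from the scheme of the returned TURL.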
21 Concepts: Multi-file Requests
- Can srmRequestToGet multiple files
  - Required: file URLs
  - Optional: space file type, space handle, protocol list
  - Optional: total retry time
- Provide Site URL (SURL)
  - URL known externally, e.g. in replica catalogs
  - e.g. srm://sleepy.lbl.gov:4000/tmp/foo-123
- Get back Transfer URL (TURL)
  - Path can be different from that in the SURL (SRM internal mapping)
  - Protocol chosen by SRM
  - e.g. gridftp://dm.lbl.gov:4000/home/level1/foo-123
- Managing request queue
  - Allocate space according to policy, system load, etc.
  - Bring in as many files as possible
  - Provide information on each file brought in or pinned
  - Bring additional files as soon as files are released
  - Support file streaming
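File streaming through a space smaller than the whole request can be sketched as a small queue simulation. The `StreamingRequest` class below is a hypothetical model written for this illustration; a real SRM also applies policy and system load when deciding what to stage, which this sketch ignores.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class StreamingRequest:
    """Stage a multi-file request through a space smaller than the request."""

    def __init__(self, files: List[Tuple[str, int]], capacity: int) -> None:
        self.pending: Deque[Tuple[str, int]] = deque(files)  # (name, size in bytes)
        self.free = capacity
        self.pinned: Dict[str, int] = {}

    def stage(self) -> List[str]:
        """Bring in queued files, in order, while the next one still fits.

        A file larger than the total capacity would block the queue;
        handling that case is left out of this sketch.
        """
        staged = []
        while self.pending and self.pending[0][1] <= self.free:
            name, size = self.pending.popleft()
            self.free -= size
            self.pinned[name] = size
            staged.append(name)
        return staged

    def release(self, name: str) -> List[str]:
        """The client is done with a file: free its space and stage more files."""
        self.free += self.pinned.pop(name)
        return self.stage()
```

Each call to `release` plays the role of the client releasing a pinned file, which immediately lets the request pull in the next files that fit.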
22 SRM Methods
- Space management: srmReserveSpace, srmReleaseSpace, srmUpdateSpace, srmCompactSpace
- FileType management: srmChangeFileType
- Status/metadata: srmGetRequestStatus, srmGetFileStatus, srmGetRequestSummary, srmGetRequestID, srmGetFilesMetaData, srmGetSpaceMetaData
- File movement: srmPrepareToGet, srmPrepareToPut, srmCopy
- Lifetime management: srmReleaseFiles, srmPutDone, srmExtendFileLifeTime
- Terminate/resume: srmAbortRequest, srmAbortFile, srmSuspendRequest, srmResumeRequest
23 SRM v3.x: Basic vs. Advanced Features
Feature                          BASIC          ADVANCED
File movement
  PrepareToGet                   yes            yes
  PrepareToPut                   yes            yes
  Copy                           no             yes
Request capabilities
  Multi-file streaming           yes            yes
  Transfer protocol negotiation  yes            yes
  File lifetime negotiation      no             yes
File types
  Volatile                       yes            yes
  Permanent                      yes (for MSS)  yes
  Durable                        no             yes
24 Features in Basic vs. Advanced SRM
Feature                          BASIC          ADVANCED
Space reservations
  Space-time negotiation         no             yes
  Space types                    no             yes
Remote access
  gridFTP                        no             yes
  Other SRMs                     no             yes
User-specified directory
  Volatile                       no             yes
  Permanent                      yes            yes
  Durable                        no             yes
Terminate/suspend
  Abort file                     yes            yes
  Abort request                  yes            yes
  Suspend/resume request         no             yes
25 Use Case: Use of SRMs for Robust Directory-to-Directory File Replication
26 Massive Robust File Replication
- Multi-file replication: why is it a problem?
  - Tedious task: many files, repetitious
  - Lengthy task: long time, can take hours, even days
  - Error prone: need to monitor transfers
  - Error recovery: need to restart file transfers
  - Stage and archive from MSS: limited concurrency, down time, transient failures
  - Use of FTP: no large windows / multiple streams
  - Security: both for local MSS and the network
  - Firewalls: transfer from/to MSS must be internal to the site
  - Specialized MSS: HPSS at NERSC, ORNL, ...
  - Legacy MSS: MSS at NCAR
27 Main Idea
- Leverage Storage Resource Manager (SRM) technology
  - Supported by SRM middleware project
  - Leverage experience with other SciDAC projects: PPDG
- What do you get?
  - SRMs queue multi-file requests
  - SRMs allocate space and release space automatically
  - SRMs request files from remote SRMs
  - Recover from network failures
  - SRMs invoke GridFTP: use large windows and parallel streams
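Recovery from transient network failures usually comes down to retrying a failed transfer. The Python sketch below shows one generic way to do that; the exponential backoff, the `transfer_with_retry` name, and the choice of `OSError` as the retryable error are assumptions for this example and do not describe how any particular SRM or DataMover implements recovery.

```python
import time
from typing import Callable


def transfer_with_retry(transfer: Callable[[], None],
                        max_attempts: int = 5,
                        base_delay: float = 1.0,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `transfer` until it succeeds and return the number of attempts used.

    `transfer` is a caller-supplied function that raises OSError (for example
    ConnectionError) on a transient failure. The last failure is re-raised
    once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            transfer()
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            # Wait longer after each failure: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.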
28 DataMover: HRM use in ESG for Robust Multi-file Replication
[Figure: DataMover (runs anywhere) works with one HRM that performs reads and one HRM that performs writes; site labels: BNL, LBNL/ORNL.]
30 Web-Based File Monitoring Tool
- Shows
  - Files already transferred
  - Files during transfer
  - Files to be transferred
- Also shows, for each file
  - Source URL
  - Target URL
  - Transfer rate
31 File tracking helps to identify bottlenecks
Shows that archiving is the bottleneck
32 File tracking shows recovery from transient failures
Total: 45 GB
33 File tracking shows network slowdown and recovery
Total: 53 GB
34 Multi-file Transfer plot from BNL to LBNL (27/02/04)
Event legend: 1 Request ACCEPTED; 2 File SpaceReserved; 3 GridFTP Start; 4 GridFTP End; 5 HPSS MIGRATION_REQUEST; 6 HPSS ARCHIVE_START; 7 HPSS ARCHIVED; 8 File Released
35 Multi-file Transfer plot from BNL to LBNL (10/02/04)
Event legend: 1 Request ACCEPTED; 2 File SpaceReserved; 3 GridFTP Start; 4 GridFTP End; 5 HPSS MIGRATION_REQUEST; 6 HPSS ARCHIVE_START; 7 HPSS ARCHIVED; 8 File Released; 9 File SpaceClaimed; 10 HPSS Archiving_Error
36 Summary
- Storage Resource Management is essential for the Grid
- SRM is a functional definition
  - Adaptable to different frameworks (WS, OGSA, WSRF, ...)
- Multiple implementations interoperate
  - Permits special-purpose implementations for unique products
  - Permits interchanging one SRM product with another
- SRM implementations exist and some are in production use
  - Particle Physics Data Grid
  - Earth System Grid
  - More coming
- Cumulative experience in GGF-WG
  - Specifications: SRM v3.0 complete
37 Extra Slides
38 Space Reservation: Functional Spec
srmReserveSpace
  In:  TUserID userID,
       TSpaceType typeOfSpace,
       String userSpaceTokenDescription,
       TSizeInBytes sizeOfTotalSpaceDesired,
       TSizeInBytes sizeOfGuaranteedSpaceDesired,
       TLifeTimeInSeconds lifetimeOfSpaceToReserve,
       TStorageSystemInfo storageSystemInfo
  Out: TSpaceType typeOfReservedSpace,
       TSizeInBytes sizeOfTotalReservedSpace,
       TSizeInBytes sizeOfGuaranteedReservedSpace,
       TLifeTimeInSeconds lifetimeOfReservedSpace,
       TSpaceToken referenceHandleOfReservedSpace,
       TReturnStatus returnStatus
39 Request-to-Get Files: Functional Spec
srmPrepareToGet
  In:  TUserID userID,
       TGetFileRequest arrayOfFileRequest,
       string arrayOfTransferProtocols,
       string userRequestDescription,
       TStorageSystemInfo storageSystemInfo,
       TLifeTimeInSeconds TotalRetryTime
  Out: TRequestToken requestToken,
       TReturnStatus returnStatus,
       TGetRequestFileStatus arrayOfFileStatus
40 TGetFileRequest typedef: Functional Spec
typedef struct {
    TSURLInfo fromSURLInfo,
    TLifeTimeInSeconds lifetime, // pin time
    TFileStorageType fileStorageType,
    TSpaceToken spaceToken,
    TDirOption dirOption
} TGetFileRequest
41 Detailed sequence of actions for each file being replicated
Anywhere
srmCopy (sourceURL: hpss.lbnl.gov/xyz/file_x, targetURL: mss.ncar.gov/uvw/file_y)
Get list of files from directory
Request files