Title: Storage and Data
1. Storage and Data
- Grid Middleware 6
- David Groep, lecture series 2005-2006
2. Outline
- Data management concepts
  - metadata, logical file name, SURL, TURL, object store
- Protocols
  - GridFTP, SRM
  - RFT/FTS, FPS: scheduled transfers with GT4 (LIGO)
- End-to-end integrated systems
  - SRB
- Structured data and databases
  - OGSA-DAI
- Data curation issues
  - media migration
  - content conversion (emulation or translation?)
3. Grid data management
- Data in a grid need to be
  - located
  - replicated
  - lifetime-managed
  - accessed (sequentially and at random)
- and the user does not know where the data is
4. Types of storage
- File-oriented storage
  - cannot support content-based queries
  - needs annotation metadata to be useful (note that a file system and a file name are a type of metadata)
  - most implementations can handle objects of any size (but MSS tape systems handle very small files poorly)
- Databases
  - structured data representation
  - support content queries well via indexed searches
  - good for small data objects (BLOBs of megabytes, not gigabytes)
5. Grid storage structure
- For file-oriented storage
6. File storage layers (file system analogy)
- Separating the storage concepts
  - helps both interoperation and scalability
- Semantic view
  - description of data in words and phrases
- Metadata view
  - describe data by attribute-value pairs (the file name is also an A-V pair)
  - like file systems such as HPFS, EXT2 and AppleFS with extended attributes
- Object view
  - refers to a blob of data by a meaningless handle (unique ID)
  - e.g. the inode in typical Unix file systems
  - FAT: directory entry plus allocation table (mixes the file name and object views)
- Physical view
  - block devices: a series of blocks on a disk, or a specific tape offset
7. Storage layers (grid naming terminology)
- LFN (Logical File Name), level 2
  - like the file name in a traditional file system
  - may have a hierarchical structure
  - is not directly suitable for access, as it is site-independent
- GUID (Globally Unique ID), level 3
  - opaque handle to reference a specific data object
  - still independent of the site
  - GUID-LFN mapping is 1-n
- SURL (Storage URL, or physical file name, PFN), level 3
  - SE-specific reference to a file
  - understood by the storage management interface
  - GUID-SURL mapping is 1-n
- TURL (Transfer URL), level 4
  - current physical location of a file inside a specific SE
  - is transient (i.e. only exists after being returned by the SE management interface)
  - has a specific lifetime
  - SURL-TURL mapping is 1-(small number, typically 1)
Terminology from EDG, gLite and Globus.
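A small in-memory model makes the four naming layers concrete. This is a minimal sketch under stated assumptions and not an EDG, gLite or Globus API. The class and method names, the GUID and SURL strings, and the rule that turns a SURL into a TURL are all hypothetical. The sketch shows two LFNs aliasing one GUID, one GUID resolving to several SURLs, and a TURL that carries an expiry time.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the naming layers; not an EDG/gLite/Globus API.
final class NamingLayers {
    // LFN -> GUID: several LFNs (aliases) may point at the same GUID
    private final Map<String, String> lfnToGuid = new HashMap<>();
    // GUID -> SURLs: one entry per SE that holds a replica
    private final Map<String, List<String>> guidToSurls = new HashMap<>();

    // A TURL is transient: it is only usable until its expiry time.
    record Turl(String url, Instant expires) {
        boolean isValid(Instant now) { return now.isBefore(expires); }
    }

    void registerLfn(String lfn, String guid) { lfnToGuid.put(lfn, guid); }

    void addReplica(String guid, String surl) {
        guidToSurls.computeIfAbsent(guid, g -> new ArrayList<>()).add(surl);
    }

    // Resolve an LFN via its GUID to the SURLs of all replicas (empty if unknown).
    List<String> surlsFor(String lfn) {
        String guid = lfnToGuid.get(lfn);
        if (guid == null) return List.of();
        return List.copyOf(guidToSurls.getOrDefault(guid, List.of()));
    }

    // Stand-in for the SE management interface handing out a TURL for a SURL.
    // The srm:// -> gsiftp:// rewrite is an invented rule for illustration only.
    static Turl prepareTurl(String surl, Instant now, long lifetimeSeconds) {
        String url = surl.replaceFirst("^srm://", "gsiftp://");
        return new Turl(url, now.plusSeconds(lifetimeSeconds));
    }

    public static void main(String[] args) {
        NamingLayers cat = new NamingLayers();
        String guid = "guid-0001";                            // hypothetical GUID
        cat.registerLfn("/grid/myvo/run42/data.root", guid);
        cat.registerLfn("/grid/myvo/latest.root", guid);      // second LFN, same object
        cat.addReplica(guid, "srm://se1.example.org/myvo/f1");
        cat.addReplica(guid, "srm://se2.example.org/myvo/x9");

        List<String> surls = cat.surlsFor("/grid/myvo/latest.root");
        Turl turl = prepareTurl(surls.get(0), Instant.now(), 3600);
        System.out.println(surls + " -> " + turl.url());
    }
}
```

In a real deployment, the LFN and GUID tables live in the VO catalogues, and the TURL comes from the SE. This is why the sketch keeps `prepareTurl` separate from the catalogue state.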
8. Data Management Services Overview
9. Storage concepts
- using the OSG-EDG-gLite terminology
- Storage Element
  - management interface
  - transfer interface(s)
- Catalogues
  - File Catalogue (metadata catalogues)
  - Replica Catalogue (location service indices)
- Transfer Service
- File Placement
- Data Scheduler
10. Grid Storage Concepts: Storage Element
- Storage Element
  - responsible for manipulating files, on anything from disk to tape-backed mass storage
  - contains services up to the file name level
    - the file name is typically an opaque handle for files,
    - as a higher-level file catalogue serves the metadata, and
    - the same physical file will be replicated to several SEs with different local file names
  - an SE is a site function (not a VO function)
- Capabilities
  - storage space for files
  - storage management interface (staging, pinning)
  - space management (reservation)
  - access (read/write, e.g. via GridFTP, HTTP(S), POSIX(-like))
  - File Transfer Service (controlling the influx of data from other SEs)
11. Storage Element: grid transfer services
- Possibilities
- GridFTP
  - de facto standard protocol
  - supports GSI security
  - striping, parallel transfers and third-party transfers (TPTs, as in regular FTP) are part of the protocol
  - issue: firewalls do not like the open port ranges needed by FTP (in neither active nor passive mode)
- HTTPS
  - single port, so more firewall-friendly
  - implementation of GSI and delegation required (mod_gridsite)
  - TPTs not part of the protocol
12. GridFTP
- secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
- Protocol based
  - multiple independent implementations can interoperate
- Globus Toolkit supplies a reference implementation
  - server, client tools (globus-url-copy), development libraries
13. GridFTP: The Protocol
- The FTP protocol is defined by several IETF RFCs
- Start with the most commonly used subset
  - standard FTP get/put etc., third-party transfer
- Implement standard but often unused features
  - GSS binding, extended directory listing, simple restart
- Extend in various ways, while preserving interoperability with existing servers
  - striped/parallel data channels, partial file, automatic and manual TCP buffer setting, progress monitoring, extended restart
Source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
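The "partial file" and "extended restart" extensions depend on the receiver knowing exactly which byte ranges it already holds. In GridFTP's extended block mode, data arrives as (offset, length) blocks, possibly out of order across parallel streams. The sketch below shows only that bookkeeping and assumes a hypothetical `RangeTracker` class. It does not implement the GridFTP wire format or its restart-marker syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical receiver-side bookkeeping of received byte ranges; not GridFTP code.
final class RangeTracker {
    // start offset -> end offset (exclusive); kept non-overlapping and non-adjacent
    private final TreeMap<Long, Long> ranges = new TreeMap<>();

    // Record a block that arrived, merging it with any touching or overlapping ranges.
    void received(long offset, long length) {
        if (length <= 0) return;
        long start = offset;
        long end = offset + length;
        Map.Entry<Long, Long> lower = ranges.floorEntry(start);
        if (lower != null && lower.getValue() >= start) {   // an earlier range reaches us
            start = lower.getKey();
            end = Math.max(end, lower.getValue());
            ranges.remove(lower.getKey());
        }
        Map.Entry<Long, Long> next = ranges.ceilingEntry(start);
        while (next != null && next.getKey() <= end) {        // later ranges we reach
            end = Math.max(end, next.getValue());
            ranges.remove(next.getKey());
            next = ranges.ceilingEntry(start);
        }
        ranges.put(start, end);
    }

    // Byte ranges [from, to) of a file of the given size that still have to be sent.
    List<long[]> missing(long fileSize) {
        List<long[]> gaps = new ArrayList<>();
        long pos = 0;
        for (Map.Entry<Long, Long> r : ranges.entrySet()) {
            if (r.getKey() >= fileSize) break;
            if (r.getKey() > pos) gaps.add(new long[] {pos, r.getKey()});
            pos = Math.max(pos, r.getValue());
        }
        if (pos < fileSize) gaps.add(new long[] {pos, fileSize});
        return gaps;
    }

    public static void main(String[] args) {
        RangeTracker t = new RangeTracker();
        t.received(4096, 4096);   // blocks may arrive out of order over parallel streams
        t.received(0, 1024);
        for (long[] gap : t.missing(10_000)) {
            System.out.println("resend " + gap[0] + ".." + gap[1]);
        }
    }
}
```

After an interruption, only the gaps need to be requested again. This is what makes a restart cheaper than re-sending the whole file.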
14. GridFTP: The Protocol (cont.)
- Existing standards
  - RFC 959: File Transfer Protocol
  - RFC 2228: FTP Security Extensions
  - RFC 2389: Feature Negotiation for the File Transfer Protocol
  - Draft: FTP Extensions
- GridFTP: Protocol Extensions to FTP for the Grid
  - Grid Forum Recommendation
  - GFD.20
  - http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf
Source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
15. Striped Server Mode
- Multiple nodes work together on a single file and act as a single GridFTP server
- An underlying parallel file system allows all nodes to see the same file system; it must deliver good performance (usually the limiting factor in transfer speed)
  - i.e., NFS does not cut it
- Each node then moves (reads or writes) only the pieces of the file that it is responsible for
- This allows multiple levels of parallelism: CPU, bus, NIC, disk, etc.
  - critical if you want to achieve better than 1 Gb/s without breaking the bank
Source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
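A round-robin (block-cyclic) layout is one simple way to decide which "pieces of the file" each node is responsible for. The sketch assumes fixed-size blocks dealt out to the nodes in turn. The block size, node count and class name are illustrative assumptions, and the striping layout actually used by the Globus server may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical block-cyclic assignment of a file's blocks to the nodes of a striped server.
final class StripeLayout {
    // For each node, the list of {offset, length} blocks that node reads or writes.
    static List<List<long[]>> assign(long fileSize, long blockSize, int nodes) {
        if (blockSize <= 0 || nodes <= 0) {
            throw new IllegalArgumentException("blockSize and nodes must be positive");
        }
        List<List<long[]>> plan = new ArrayList<>();
        for (int i = 0; i < nodes; i++) plan.add(new ArrayList<>());
        long block = 0;
        for (long off = 0; off < fileSize; off += blockSize, block++) {
            long len = Math.min(blockSize, fileSize - off);        // last block may be short
            plan.get((int) (block % nodes)).add(new long[] {off, len});
        }
        return plan;
    }

    public static void main(String[] args) {
        List<List<long[]>> plan = assign(10_000, 1_024, 3);
        for (int n = 0; n < plan.size(); n++) {
            long bytes = plan.get(n).stream().mapToLong(b -> b[1]).sum();
            System.out.println("node " + n + ": " + plan.get(n).size() + " blocks, " + bytes + " bytes");
        }
    }
}
```

Because every node touches only its own blocks, the nodes' CPUs, NICs and disks work in parallel. The shared parallel file system is what lets each node seek straight to its offsets.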
16. [figure]
Source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
17. Disk-to-Disk Striping Performance
Source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
18. GridFTP Caveats
- The protocol requires that the sending side do the TCP connect (possible firewall issues)
  - working on V2 of the protocol
    - add explicit negotiation of streams to relax the directionality requirement above (*)
    - optionally add block checksums and resends
    - add a unique command ID to allow pipelining of commands
- Client/Server
  - currently there is no server library, so peer-to-peer type apps are VERY difficult
  - generally needs a pre-installed server
    - looking at a dynamically installable server
(*) DG: like a kind of application-level BEEP protocol
Source: Bill Allcock, ANL, Overview of GT4 Data Services, 2004
19. SE transfers: random access
- wide-area random access to files is new
- typically addressed by adding GSI to existing cluster protocols
  - dcap -> GSI-dcap
  - rfio -> GSI-RFIO
  - xrootd -> ??
- One (new) OGSA-style service
  - WS-ByteIO
    - Bulk interface
    - RandomIO interface
      - POSIX-like
    - needs negotiation of the actual transfer protocol
      - attachment, DIME, ...
20. SE transfer: local back-end access
- the back end of a grid store is not always just a disk
  - distributed storage systems without native POSIX
  - even if POSIX emulation is provided, it is always slower!
- for grid use, you also need to provide GridFTP
  - and a management interface: SRM
- local access might be through the native protocol
  - but the application may not know
  - and it is usually not secure enough to run over the WAN
  - so it is of no use to others in the grid outside the LAN
21. Storage Management (SRM)
- common management interface on top of many back-end storage solutions
- a GGF draft standard (from the GSM-WG)
22. Standards for Storage Resource Management
- Main concepts
  - allocate spaces
  - get/put files from/into spaces
  - pin files for a lifetime
  - release files and spaces
  - get files into spaces from remote sites
  - manage directory structures in spaces
  - SRMs communicate with other SRMs as peers
  - negotiate transfer protocols
  - no logical name space management (can come from GGF GFS)
Source: A. Sim, CRD, LBNL, 2005
23. SRM Functional Concepts
- Manage spaces dynamically
  - reservation, allocation, lifetime
  - release, compact
  - negotiation
- Manage files in spaces
  - request to put files in spaces
  - request to get files from spaces
  - lifetime, pinning of files, release of files
  - no logical name space management (rely on GFS)
- Access remote sites for files
  - bring files from other sites and SRMs as requested
  - use existing transport services (GridFTP, HTTP, HTTPS, FTP, bbftp, ...)
  - transfer protocol negotiation
- Manage multi-file requests
  - manage request queues
  - manage caches, pre-caching (staging) when possible
  - manage garbage collection
- Directory management
  - manage directory structure in spaces
Source: A. Sim, CRD, LBNL, 2005
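The get-side SRM concepts (prepare a request, wait for staging, pin, transfer, release) form an asynchronous polling pattern. The Java interface below is a hypothetical rendering that borrows its operation names from srmPrepareToGet, srmStatusOfGetRequest and srmReleaseFiles in the SRM method list. The real SRM is a web-service interface with much richer request and status types. `FakeSrm` is an invented stand-in for an SE whose file becomes available after a few polls.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical Java view of three SRM operations; not a real SRM client library.
interface SrmClient {
    String prepareToGet(String surl, List<String> protocols); // ~ srmPrepareToGet: returns a request token
    Optional<String> statusOfGetRequest(String token);        // ~ srmStatusOfGetRequest: TURL once ready
    void releaseFiles(String token);                          // ~ srmReleaseFiles: drops the pin
}

// Fake SE for illustration: a file counts as staged after a fixed number of polls.
final class FakeSrm implements SrmClient {
    private final int pollsUntilReady;
    private final Map<String, Integer> polls = new HashMap<>();
    private final Map<String, String> surls = new HashMap<>();
    final Set<String> pinned = new HashSet<>();
    private int nextId = 0;

    FakeSrm(int pollsUntilReady) { this.pollsUntilReady = pollsUntilReady; }

    @Override public String prepareToGet(String surl, List<String> protocols) {
        String token = "req-" + nextId++;
        polls.put(token, 0);
        surls.put(token, surl);
        return token;
    }

    @Override public Optional<String> statusOfGetRequest(String token) {
        int n = polls.merge(token, 1, Integer::sum);
        if (n < pollsUntilReady) return Optional.empty();     // still staging, e.g. from tape
        pinned.add(token);                                     // file is now pinned on disk
        return Optional.of(surls.get(token).replaceFirst("^srm://", "gsiftp://"));
    }

    @Override public void releaseFiles(String token) { pinned.remove(token); }
}

final class SrmGet {
    // Prepare, poll until a TURL is handed out, transfer while pinned, then release.
    static void fetch(SrmClient srm, String surl, int maxPolls, Consumer<String> transfer) {
        String token = srm.prepareToGet(surl, List.of("gsiftp", "https"));
        try {
            for (int i = 0; i < maxPolls; i++) {
                Optional<String> turl = srm.statusOfGetRequest(token);
                if (turl.isPresent()) {
                    transfer.accept(turl.get());   // the actual GridFTP/HTTPS transfer goes here
                    return;
                }
                pause(10);                         // back off before polling again
            }
            throw new IllegalStateException("request " + token + " not ready after " + maxPolls + " polls");
        } finally {
            srm.releaseFiles(token);               // always release, even on timeout or failure
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        }
    }

    public static void main(String[] args) {
        FakeSrm srm = new FakeSrm(3);
        fetch(srm, "srm://se1.example.org/myvo/f1", 10, turl -> System.out.println("transfer from " + turl));
    }
}
```

The release sits in a `finally` block because a forgotten pin keeps disk space occupied until its lifetime runs out.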
24. SRM Methods by feature
- Space management: srmCompactSpace, srmGetSpaceMetaData, srmGetSpaceToken, srmReleaseFilesFromSpace, srmReleaseSpace, srmReserveSpace, srmUpdateSpace
- Authorization functions: srmCheckPermission, srmGetStatusOfReassignment, srmReassignToUser, srmSetPermission
- Request administration: srmAbortRequestedFiles, srmRemoveRequestedFiles, srmResumeRequest, srmSuspendRequest
- Core (basic): srmChangeFileStorageType, srmExtendFileLifetime, srmGetFeatures, srmGetRequestSummary, srmGetRequestToken, srmGetSRMStorageInfo, srmGetSURLMetaData, srmGetTransferProtocols, srmPrepareToGet, srmPrepareToPut, srmPutFileDone, srmPutRequestDone, srmReleaseFiles, srmStatusOfGetRequest, srmStatusOfPutRequest, srmTerminateRequest
- Copy function: srmCopy, srmStatusOfCopyRequest
- Directory function: srmCp, srmLs, srmMkdir, srmMv, srmRm, srmRmdir, srmStatusOfCpRequest, srmStatusOfLsRequest
Source: A. Sim, CRD, LBNL, 2005
25-30. SRM Interactions (sequence of figures)
31. Storage infrastructure example with SRM
Graphic: Mark van de Sanden, SARA
32. SRM Summary
- SRM is a functional definition
  - adaptable to different frameworks for operation (WS, WSRF, ...)
- Multiple implementations interoperate
  - permit special-purpose implementations for unique products
  - permit interchanging one SRM product for another
- SRM implementations exist and some are in production use
  - Particle Physics Data Grid
  - Earth System Grid
  - more coming
- Cumulative experiences
  - SRM v3.0 specifications to complete
Source: A. Sim, CRD, LBNL, 2005
33. Replicating Data
- Data on the grid may, will and should exist in multiple copies
- Replicas may be temporary
  - for the duration of the job
  - opportunistically stored on cheap but unreliable storage
  - containing output cached near a compute site for later scheduled replication
- Replicas may also provide redundancy
  - at the application level instead of site-local RAID or backup
34. Replication issues
- Replicas are difficult to manage
  - if the data is modifiable
  - and consistency is required
- Grid DM today does not address modifiable data sets
  - as soon as more than one copy of the data exists, modifying it would
    - either result in inconsistency,
    - or require close coordination between storage locations (slow),
    - or almost guarantee a deadlock
- Some wide-area distributed file systems do this (AFS, DFS)
  - but they are not scalable
  - or require a highly available network
35. Grid Storage Concepts: Catalogues
- Catalogues
  - index of files that link to a single object (referenced by GUID)
  - catalogues are logically a VO function, with local instances per site
- Capabilities
  - expose mappings, not actual data
  - File or Metadata Catalogue: names, metadata -> GUID
  - Replica Catalogue and Index: GUID -> SURLs for all SEs containing the file
36. File Catalogues
37. [figure]
Graphic: Peter Kunszt, EGEE DJRA1.4 gLite Architecture
38. Alternatives to the File Catalogue
- Store SURLs with the data in the application's DB schema
  - better adapted to the application's needs
  - easier integration in existing frameworks
39. Grid Storage Concepts: Transfer Service
- Transfer service
  - responsible for moving (replicating) data between SEs
  - transfers are scheduled, as data movement capacity is scarce (not because of WAN network bandwidth, but because of CPU capacity and disk/tape bandwidth in the data movement nodes!)
  - logically a per-VO function, hosted at the site
  - builds on top of the SE abstraction and a data movement protocol, and is coordinated with a specific SE
- Capabilities
  - transfer a SURL at SE1 to a new SURL at SE2
    - using SE mechanisms such as SRM-COPY, or GridFTP directly
    - either push or pull
  - subject to a set of policies, e.g.
    - max. number of simultaneous transfers between SE1 and SE2
    - with specific timeouts or retries
  - asynchronous
    - states like SUBMITTED, PENDING, ACTIVE, CANCELLING, CANCELLED, DONE_INCOMPLETE, DONE_COMPLETE
  - update replica catalogues (GUID -> SURL mappings)
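The asynchronous states listed above can be modelled as a small state machine with a retry budget. The state names come from the slide. The allowed transitions, the retry rule and the class names are assumptions made for this sketch. In particular, a single-file job that runs out of retries ends in DONE_INCOMPLETE here, which simplifies how a real FTS reports partially failed jobs.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// State names from the slide; the transition table below is an assumption.
enum TransferState {
    SUBMITTED, PENDING, ACTIVE, CANCELLING, CANCELLED, DONE_INCOMPLETE, DONE_COMPLETE;

    boolean isTerminal() {
        return this == CANCELLED || this == DONE_INCOMPLETE || this == DONE_COMPLETE;
    }
}

// Hypothetical single-file transfer job; not the gLite FTS implementation.
final class TransferJob {
    private static final Map<TransferState, Set<TransferState>> ALLOWED = new EnumMap<>(TransferState.class);
    static {
        ALLOWED.put(TransferState.SUBMITTED, EnumSet.of(TransferState.PENDING, TransferState.CANCELLING));
        ALLOWED.put(TransferState.PENDING, EnumSet.of(TransferState.ACTIVE, TransferState.CANCELLING));
        // ACTIVE -> PENDING models a failed attempt that is queued again for a retry
        ALLOWED.put(TransferState.ACTIVE, EnumSet.of(TransferState.DONE_COMPLETE,
                TransferState.DONE_INCOMPLETE, TransferState.PENDING, TransferState.CANCELLING));
        ALLOWED.put(TransferState.CANCELLING, EnumSet.of(TransferState.CANCELLED));
        for (TransferState s : TransferState.values()) {
            if (s.isTerminal()) ALLOWED.put(s, EnumSet.noneOf(TransferState.class));
        }
    }

    final String sourceSurl;
    final String targetSurl;
    private final int maxRetries;
    private int attempts = 0;
    private TransferState state = TransferState.SUBMITTED;

    TransferJob(String sourceSurl, String targetSurl, int maxRetries) {
        this.sourceSurl = sourceSurl;
        this.targetSurl = targetSurl;
        this.maxRetries = maxRetries;
    }

    TransferState state() { return state; }

    private void moveTo(TransferState next) {
        if (!ALLOWED.get(state).contains(next)) {
            throw new IllegalStateException(state + " -> " + next + " is not allowed");
        }
        state = next;
    }

    void queue() { moveTo(TransferState.PENDING); }
    void start() { moveTo(TransferState.ACTIVE); attempts++; }
    void cancel() { moveTo(TransferState.CANCELLING); }
    void cancelled() { moveTo(TransferState.CANCELLED); }

    // Outcome of one attempt: done, back to the queue for a retry, or given up.
    void finish(boolean ok) {
        if (ok) moveTo(TransferState.DONE_COMPLETE);
        else if (attempts <= maxRetries) moveTo(TransferState.PENDING);
        else moveTo(TransferState.DONE_INCOMPLETE);
    }

    public static void main(String[] args) {
        TransferJob job = new TransferJob("srm://se1.example.org/f", "srm://se2.example.org/f", 1);
        job.queue();
        job.start();
        job.finish(false);   // first attempt fails, one retry left
        job.start();
        job.finish(true);
        System.out.println(job.state());
    }
}
```

On DONE_COMPLETE, a real transfer service would then register the new GUID -> SURL mapping in the replica catalogue.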
40. File Transfer Service
Graphic: gLite Architecture v1.0 (EGEE-I DJRA1.1)
41. FTS Channels
- A scheduled number of transfers from one site to a (set of) other site(s)
- below: CERNCI to sites on the OPN (next slide)
42. FTS channels
- for scaling reasons
  - one transfer agent for each channel, i.e. each SRC<->TGT pair
  - agents can be spread over multiple boxes
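A channel can be pictured as an independent agent with its own queue and its own limit on simultaneous transfers. This independence is what allows agents to be spread over several machines. The sketch below uses hypothetical class names and a single per-channel limit. It is not the gLite FTS agent code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical channel agent: one queue and one concurrency limit per (source, target) pair.
final class ChannelAgent {
    final String source;
    final String target;
    private final int maxActive;
    private final Deque<String> queue = new ArrayDeque<>();
    private int active = 0;

    ChannelAgent(String source, String target, int maxActive) {
        this.source = source;
        this.target = target;
        this.maxActive = maxActive;
    }

    void submit(String file) { queue.addLast(file); }

    // Start as many queued transfers as the channel limit allows, in FIFO order.
    List<String> schedule() {
        List<String> started = new ArrayList<>();
        while (active < maxActive && !queue.isEmpty()) {
            started.add(queue.pollFirst());
            active++;
        }
        return started;
    }

    // Called when a transfer on this channel finishes, freeing a slot.
    void done() { if (active > 0) active--; }
}

final class ChannelRouter {
    private final Map<List<String>, ChannelAgent> agents = new HashMap<>();
    private final int maxActivePerChannel;

    ChannelRouter(int maxActivePerChannel) { this.maxActivePerChannel = maxActivePerChannel; }

    // Exactly one agent per (source, target) pair; agents share no state with each other.
    ChannelAgent agentFor(String source, String target) {
        return agents.computeIfAbsent(List.of(source, target),
                k -> new ChannelAgent(source, target, maxActivePerChannel));
    }

    public static void main(String[] args) {
        ChannelRouter router = new ChannelRouter(2);
        ChannelAgent cernToSara = router.agentFor("CERN", "SARA");
        for (String f : List.of("f1", "f2", "f3")) cernToSara.submit(f);
        System.out.println(cernToSara.schedule()); // only two slots on this channel
        cernToSara.done();
        System.out.println(cernToSara.schedule()); // f3 starts once a slot frees up
    }
}
```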
43. LHC OPN
44. In network terms
- Cricket graph, 2006: CERN -> SARA via OPN
- link speed is 10 Gb/s
45. FTS: complex services
- Protocol translation
  - although many will, not all SEs support GridFTP
  - the FTS in that case needs protocol translation
  - translation through memory excludes third-party transfers
- Other issues
  - credential handling
    - files on the source and target SE are readable only for specific users and specific VO (groups)
    - SEs are site services, and sites want to be accessed with the end-user credential for traceability (not a generic VO account)
    - continued access to the user credential is needed (as in any compute broker)
46. Grid Storage Concepts: File Placement
- Placement Service
  - manages transfers for which the host site is the destination
  - coordinates updates of the VO file catalogue and the actual transfers (via the FTS, a site-managed service)
- Capabilities
  - transfer a GUID or LFN from A to B (note that the FTS could only operate on SURLs)
  - needs access to the VO catalogues, and thus needs sufficient privileges to do the job (i.e. update the catalogues)
  - API can be the same as for the FTS
47. Data Scheduler
- Like the placement service, but can direct requests to different sites
48. DM: Putting it all together
Graphic: gLite Architecture v1.0 (EGEE-I DJRA1.1)
49. GT4 view on the same issues
- Similar functionality, but more closely linked to the VO than to the site
- based on soft-state registrations (like the information system)
- treats files as the basic resource abstraction
Next two slides: Ann Chervenak, ISI/USC, Overview of GT4 Data Management Services, 2004
50. RLS Framework
[Figure: Replica Location Index (RLI) nodes (two shown) above Local Replica Catalogs (LRCs, five shown)]
- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs
- LRCs use soft-state update mechanisms to inform RLIs about their state: relaxed consistency of the index
- Optional compression of state updates reduces communication, CPU and storage overheads
- A membership service registers participating LRCs and RLIs and deals with changes in membership
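The soft-state idea can be sketched as an index that keeps, for each LRC, the set of logical names that LRC last reported. The index silently forgets an LRC whose updates stop arriving. The timeout, class name and update format are assumptions. The Globus RLS can compress these updates with Bloom filters, whereas this sketch sends plain sets of names. An RLI hit only says which LRCs to ask next, so the index can tolerate being slightly out of date.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical soft-state RLI: remembers what each LRC last reported, until it expires.
final class ReplicaLocationIndex {
    private record Report(Set<String> lfns, long expiresAt) {}

    private final Map<String, Report> byLrc = new HashMap<>();
    private final long timeoutMillis;

    ReplicaLocationIndex(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // A full soft-state update from one LRC replaces whatever it reported before.
    void update(String lrc, Collection<String> lfns, long nowMillis) {
        byLrc.put(lrc, new Report(Set.copyOf(lfns), nowMillis + timeoutMillis));
    }

    // LRCs that, according to unexpired state, may hold mappings for this LFN.
    List<String> lookup(String lfn, long nowMillis) {
        byLrc.values().removeIf(r -> r.expiresAt() <= nowMillis);   // forget silent LRCs
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Report> e : byLrc.entrySet()) {
            if (e.getValue().lfns().contains(lfn)) hits.add(e.getKey());
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        ReplicaLocationIndex rli = new ReplicaLocationIndex(30_000);
        rli.update("lrc.site-a", List.of("lfn:run42", "lfn:run43"), 0);
        rli.update("lrc.site-b", List.of("lfn:run42"), 20_000);
        System.out.println(rli.lookup("lfn:run42", 25_000)); // both LRCs still fresh
        System.out.println(rli.lookup("lfn:run42", 40_000)); // site-a has gone quiet
    }
}
```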
51. Replica Location Service in Context
- The Replica Location Service is one component in a layered data management architecture
- Provides a simple, distributed registry of mappings
- Consistency management is provided by higher-level services
52. Access Control Lists
- Catalogue level
  - protects access to metadata
  - is only advisory for actual file access, unless the storage system only accepts connections from a trusted agent that itself does a catalogue lookup
- SE level
  - either natively (i.e. supported by both the SRM and the transfer services) or via an agent system like gLite I/O
- SRM/transfer level
  - the SRM and GridFTP server need to look up the access rights for each transfer in a local ACL store
  - needs all files to be owned by the SRM, unless the underlying FS supports ACLs
- OS level
  - native POSIX ACL support in the OS needed
  - only available for a limited number of systems (mainly disk based)
  - not (yet) in popular HSM solutions
53. Grid ACL considerations
- Semantics
  - POSIX semantics require that you traverse up the tree to find all constraints
    - this behaviour is both costly and possibly undefined in a distributed context
  - VMS and NTFS container semantics are self-contained
    - taken as a basis for the ACL semantics in many grid services
- ACL syntax and local semantics are typically POSIX-style
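The cost difference between the two ACL semantics shows up in how many ACLs must be fetched for each access check. In a distributed catalogue, each fetch is potentially a remote lookup. The sketch reduces permissions to a set of users per path and assumes that container-style ACLs are copied from the parent when an object is created. Both simplifications and all names are hypothetical. For brevity, the root directory is not checked.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical comparison of POSIX-style and container-style ACL evaluation.
final class AclSemantics {
    // path -> users allowed to traverse (directories) or read (files)
    final Map<String, Set<String>> acl = new HashMap<>();
    int lookups = 0; // counts ACL fetches, i.e. potential remote round trips

    private Set<String> fetch(String path) {
        lookups++;
        return acl.getOrDefault(path, Set.of());
    }

    // POSIX-style: every ancestor directory must grant traversal, then the file itself.
    boolean posixCanRead(String user, String path) {
        String[] parts = path.split("/");              // parts[0] is "" for an absolute path
        StringBuilder prefix = new StringBuilder();
        for (int i = 1; i < parts.length - 1; i++) {
            prefix.append('/').append(parts[i]);
            if (!fetch(prefix.toString()).contains(user)) return false;
        }
        return fetch(path).contains(user);
    }

    // Container-style: the object's own ACL is complete, so one fetch suffices.
    boolean containerCanRead(String user, String path) {
        return fetch(path).contains(user);
    }

    // Container-style creation: the child starts with a copy of its parent's ACL.
    void createInherited(String parent, String child) {
        acl.put(child, new HashSet<>(acl.getOrDefault(parent, Set.of())));
    }

    public static void main(String[] args) {
        AclSemantics s = new AclSemantics();
        s.acl.put("/myvo", new HashSet<>(Set.of("alice", "bob")));
        s.acl.put("/myvo/run1", new HashSet<>(Set.of("alice")));
        s.acl.put("/myvo/run1/f.dat", new HashSet<>(Set.of("alice", "bob")));
        System.out.println("POSIX, bob: " + s.posixCanRead("bob", "/myvo/run1/f.dat"));
        System.out.println("container, bob: " + s.containerCanRead("bob", "/myvo/run1/f.dat"));
        System.out.println("ACL fetches: " + s.lookups);
    }
}
```

The two rules can disagree. Under the container rule, a user may read a file even though its parent directory would deny traversal under POSIX rules. This is why a grid service has to fix its ACL semantics explicitly.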
54. Catalogue ACL method in GT4 with WS-RF
Graphic: Ann Chervenak, ISI/USC, from a presentation to the Design Team, Argonne, 2005
55. Stand-alone solutions: SRB
- the SDSC Storage Resource Broker
56. SRB Data Management Objectives
- Automate all aspects of data management
  - Discovery (without knowing the file name)
  - Access (without knowing its location)
  - Retrieval (using your preferred API)
  - Control (without having a personal account at the remote storage system)
  - Performance (use latency management mechanisms to minimize the impact of wide-area networks)
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
57. Federated SRB server model
[Figure: peer-to-peer brokering between SRB servers. An application issues a request by logical name or attribute condition; server(s) spawn SRB agents that provide parallel data access to resources R1 and R2. The MCAT provides 1. logical-to-physical mapping, 2. identification of replicas, 3. access and audit control.]
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
58. Features
- Authentication
  - encrypted password
  - GSI, certificate based
- Metadata has it all
  - storage in a (definable) flat file system
- Data put into collections (Unix directories); access and control operations possible
- Parallel transport of files
- Physical resources combine into a logical resource
- Encrypted data and/or encrypted metadata
- Free-ish (educational) commercial version of an old SRB at http://www.nirvanastorage.com
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
59. SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture. Access APIs (application, Linux I/O, OAI, WSDL, DLL/Python, Java, NT browsers, GridFTP) sit on top of consistency management and authorization/authentication. The prime server provides the logical name space, latency management, data transport and metadata transport. Below it are a storage abstraction over servers (HRM, ORB) and a catalog abstraction over databases (DB2, Oracle, Postgres, SQLServer, Informix).]
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
60. Production Data Grid
- SDSC Storage Resource Broker
  - federated client-server system, managing
    - over 70 TB of data at SDSC
    - over 10 million files
  - manages data collections stored in
    - archives (HPSS, UniTree, ADSM, DMF)
    - hierarchical resource managers
    - tapes, tape robots
    - file systems (Unix, Linux, Mac OS X, Windows)
    - FTP sites
    - databases (Oracle, DB2, Postgres, SQLServer, Sybase, Informix)
    - virtual object ring buffers
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
61. Mappings on Name Space
- Define a logical resource name
  - list of physical resources
- Replication
  - a write to the logical resource completes when all physical resources have a copy
- Load balancing
  - a write to the logical resource completes when a copy exists on the next physical resource in the list
- Fault tolerance
  - a write to the logical resource completes when copies exist on k of n physical resources
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
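The three mappings differ only in when a write to the logical resource counts as complete. The sketch below encodes those completion rules. The class and method names are hypothetical, and the sketch says nothing about how SRB actually performs or schedules the physical writes.

```java
import java.util.List;
import java.util.Set;

// Hypothetical completion rules for a write to an SRB-style logical resource.
final class LogicalResource {
    enum Policy { REPLICATION, LOAD_BALANCING, FAULT_TOLERANCE }

    private final List<String> physical;
    private final Policy policy;
    private final int k;        // only used for FAULT_TOLERANCE
    private int nextIndex = 0;  // rotating pointer for LOAD_BALANCING

    LogicalResource(List<String> physical, Policy policy, int k) {
        if (physical.isEmpty()) throw new IllegalArgumentException("need at least one physical resource");
        if (policy == Policy.FAULT_TOLERANCE && (k < 1 || k > physical.size())) {
            throw new IllegalArgumentException("k must be between 1 and n");
        }
        this.physical = List.copyOf(physical);
        this.policy = policy;
        this.k = k;
    }

    // Physical resource that the next load-balanced write is directed to.
    String nextTarget() { return physical.get(nextIndex); }

    // Is a write complete, given the physical resources that hold a copy so far?
    boolean complete(Set<String> copies) {
        long have = physical.stream().filter(copies::contains).count();
        return switch (policy) {
            case REPLICATION -> have == physical.size();          // all n copies
            case LOAD_BALANCING -> copies.contains(nextTarget());  // the next one in the list
            case FAULT_TOLERANCE -> have >= k;                      // any k of n
        };
    }

    // Advance the round-robin pointer after a load-balanced write completes.
    void advance() { nextIndex = (nextIndex + 1) % physical.size(); }

    public static void main(String[] args) {
        List<String> res = List.of("disk1", "disk2", "tape1");
        LogicalResource ft = new LogicalResource(res, Policy.FAULT_TOLERANCE, 2);
        System.out.println(ft.complete(Set.of("disk1", "tape1")));
    }
}
```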
62. SRB Development
- Now at version 3.4 (as of November 2005)
  - peer-to-peer federation of ZONES
  - support for multiple independent MCAT catalogs
  - replicated metadata
  - mySQL/BerkeleyDB port
  - OGSA/OGSI-compliant interface
  - GridFTP interfaces
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
63. User Interfaces
- Unix command-line tools: S-commands (e.g. Sls, Spwd, Sget, Sput)
- Windows SRB browser: InQ
- Web interface: mySRB
- Java and C API
- Java admin tools
- DEMO
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
64. Administrative Interface
- Also available as Unix commands
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
65. Unix Command-line Tool: S-commands
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
66. Windows Browser: InQ
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
67. Web Interface
Source: Maurice Bouwhuis, SARA, based on data by Reagan Moore, SDSC
68. Nice and Not So Nice
- It works and is being used in production
- Metadata based
- It supports GSI and will support GridFTP
- For the S-commands, the password is stored in plain text in a file (this should not be necessary)
- InQ does not support GSI
- Not all interfaces have the same capabilities
Source: Maurice Bouwhuis, SARA
69. Structured Data: OGSA-DAI
70. Access to structured data
- Several layers
  - access layer
    - do not virtualise schema and semantics, just get there
    - OGSA-DAI, Spitfire (deprecated)
  - semantic layer
    - interpret and attempt to merge schemas using ontology discovery
    - a research topic today, with some interesting results
    - see e.g. the April VL-e workshop for some nice examples
71. OGSA-DAI
- An extensible framework for data access and integration
- Exposes heterogeneous data resources to a grid through web services
- Interact with data resources
  - queries and updates
  - data transformation / compression
  - data delivery
- Customise for your project using
  - additional activities
  - client toolkit APIs
  - data resource handlers
- A base for higher-level services
  - federation, mining, visualisation, ...
- http://www.ogsadai.org.uk/
Source: Amy Krause, EPCC Edinburgh, OGSA-DAI Overview, GGF17, Tokyo, 2006
72. Considerations
- Efficient client-server communication
  - one request specifies multiple operations
- No unnecessary data movement
  - move computation to the data
  - utilise third-party delivery
  - apply transforms (e.g., compression)
- Build on existing standards
  - fill in gaps where necessary: specifications from the DAIS WG
- Do not hide the underlying data model
  - users must know where to target queries; data virtualisation is hard
- Extensible architecture
  - extensible activity framework
  - cannot anticipate all desired functionality
  - allow users to plug in their own
Based on Amy Krause, EPCC Edinburgh, OGSA-DAI Overview, GGF17, Tokyo, 2006
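A tiny activity pipeline can illustrate both "one request specifies multiple operations" and "allow users to plug in their own". Each activity names the output of an earlier activity as its input, much like `fromLocal` in the request document on the Request/response slide, and the whole chain runs in one round trip. The registry, the `Step` record and the example activities are hypothetical. They are not the OGSA-DAI client toolkit API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical server-side activity pipeline; not the OGSA-DAI API.
final class ActivityPipeline {
    // One activity in a request: its name, its type, the name of its input (or null), and an argument.
    record Step(String name, String type, String input, String arg) {}

    private final Map<String, BiFunction<String, String, String>> registry = new HashMap<>();

    // Plug in a new activity type: (input data, argument) -> output data.
    void register(String type, BiFunction<String, String, String> impl) { registry.put(type, impl); }

    // Run all steps of one request in order; returns every named output, like a response document.
    Map<String, String> perform(List<Step> steps) {
        Map<String, String> outputs = new LinkedHashMap<>();
        for (Step s : steps) {
            BiFunction<String, String, String> impl = registry.get(s.type());
            if (impl == null) throw new IllegalArgumentException("unknown activity type " + s.type());
            String in = null;
            if (s.input() != null) {
                if (!outputs.containsKey(s.input())) {
                    throw new IllegalArgumentException("no earlier output named " + s.input());
                }
                in = outputs.get(s.input());
            }
            outputs.put(s.name(), impl.apply(in, s.arg()));
        }
        return outputs;
    }

    public static void main(String[] args) {
        ActivityPipeline server = new ActivityPipeline();
        Map<String, String> sink = new HashMap<>();
        // hypothetical activities standing in for a query, a transform and a delivery
        server.register("lookup", (in, arg) -> arg.equals("3475") ? "frog" : "");
        server.register("upper", (in, arg) -> in.toUpperCase(Locale.ROOT));
        server.register("deliver", (in, arg) -> { sink.put(arg, in); return "COMPLETED"; });
        Map<String, String> response = server.perform(List.of(
                new Step("statement", "lookup", null, "3475"),
                new Step("transform", "upper", "statement", null),
                new Step("deliverOutput", "deliver", "transform", "ftp://ftp.example.com/home")));
        System.out.println(response + " " + sink);
    }
}
```

Only the final delivery result needs to travel back to the client, which matches the "no unnecessary data movement" goal above.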
73. OGSA-DAI services
- OGSA-DAI uses data services to represent and provide access to a number of data resources
[Figure: one Data Service represents, and accesses, several Data Resources]
Based on Amy Krause, EPCC Edinburgh, OGSA-DAI Overview, GGF17, Tokyo, 2006
74. Services
- Services co-located with the data as much as possible
Based on Amy Krause, EPCC Edinburgh, OGSA-DAI Overview, GGF17, Tokyo, 2006
75. Supported data sources
- Relational: MySQL, DB2, Oracle 10, SQLServer, PostgreSQL
- XML: eXist, Xindice
- Files: text files, binary files, CSV, SwissProt, OMIM
Based on Amy Krause, EPCC Edinburgh, OGSA-DAI Overview, GGF17, Tokyo, 2006
76. Service interaction
[Figure: the Client sends an XML document <perform> ... </perform> and receives an XML document <response> ... </response>; binary data is delivered to a Data Sink]
Based on Amy Krause, EPCC Edinburgh, OGSA-DAI Overview, GGF17, Tokyo, 2006
77. Data Service internals
From Alexander Wöhrer, AustrianGrid, OGSA-DAI tutorial, GGF13 Seoul, 2005
78. Request/response
Request:
  <perform xmlns="..." xmlns:xsi="..."
           xsi:schemaLocation="...">
    <sqlQueryStatement name="statement">
      <expression>
        select * from littleblackbook where id=10
      </expression>
      <resultSetStream name="output"/>
    </sqlQueryStatement>
    <deliverToURL name="deliverOutput">
      <fromLocal from="output"/>
      <toURL>ftp://anon:frog@ftp.example.com/home</toURL>
    </deliverToURL>
  </perform>
Response:
  <gridDataServiceResponse xmlns="...">
    <result name="deliverOutput" status="COMPLETED"/>
    <result name="statement" status="COMPLETED"/>
  </gridDataServiceResponse>
From Alexander Wöhrer, AustrianGrid, OGSA-DAI tutorial, GGF13 Seoul, 2005
79. Client library interaction
- You have to know the back-end structure of the data source
- SQLQuery
    SQLQuery query = new SQLQuery("select * from littleblackbook where id='3475'");
- XPathQuery
    XPathQuery query = new XPathQuery("/entry[@id<10]");
- XSLTransform
    XSLTransform transform = new XSLTransform();
- DeliverToGFTP
    DeliverToGFTP deliver = new DeliverToGFTP("ogsadai.org.uk", 8080, "myresults.txt");
From Alexander Wöhrer, AustrianGrid, OGSA-DAI tutorial, GGF13 Seoul, 2005
80. Simple requests
- Simple requests consist of only one activity
- Send the activity directly to the perform method
    SQLQuery query = new SQLQuery(
        "select * from littleblackbook where id='3475'");
    Response response = service.perform(query);
From Alexander Wöhrer, AustrianGrid, OGSA-DAI tutorial, GGF13 Seoul, 2005
81. Closing Remarks
82. Miscellaneous tidbits
- Data curation: the need to preserve data over time
  - migrating media (preserving readability) is only one aspect
  - also needed:
    - format conversion, or
    - emulation of the programs operating on the data
- Data provenance: the need to know how this data has come into being
  - association of metadata and workflow
  - recording of workflows and workflow instances is essential
  - this is (today) application specific, but maybe, one day,