Title: Architecture of gLite Data Management System
1 Architecture of gLite Data Management System
- Tony Calanducci
- INFN Catania
- International Summer School on Grid Computing 2006 - Ischia (Naples), July 9-21, 2006
2 Outline
- Grid Data Management Challenge
- Storage Elements and SRM
- File Catalogs and DM tools
- Metadata Services
- File Transfer Services
3 The Grid DM Challenge
- Need a common interface to storage resources
  - Storage Resource Manager (SRM)
- Need to keep track of where data are stored
  - File and Replica Catalogs
- Need scheduled, reliable file transfer
  - File transfer service
- Heterogeneity
  - Data are stored on different storage systems using different access technologies
- Distribution
  - Data are stored in different locations; in most cases there is no shared file system or common namespace
  - Data need to be moved between different locations
4 Introduction
- Assumptions
  - Users and programs produce and require data
  - The lowest granularity of the data is the file level (we deal with files rather than data objects or tables)
  - Data = files
- Files
  - Mostly write once, read many
  - Located in Storage Elements (SEs)
  - Several replicas of one file in different sites
  - Accessible by Grid users and applications from anywhere
  - Locatable by the WMS (data requirements in JDL)
- Also
  - The WMS can send (small amounts of) data to/from jobs: Input and Output Sandbox
  - Files may be copied from/to local filesystems (WNs, UIs) to the Grid (SEs)
5 Data services in gLite
- File Access Patterns
  - Write once, read many
  - Rare append-only updates with one owner
  - Frequently updated at one source; replicas check/pull the new version
  - (NOT frequent updates, many users, many sites)
- File naming
  - Mostly, users see the logical file name (LFN)
  - LFN must be unique
    - includes logical directory name
    - in a VO namespace
  - E.g. /gLite/myVOname.org/runs/12aug05/data1.res
- 3 service types for data
  - Storage
  - Catalogs
  - Movement
6 gLite Grid Storage Requirements
- Definition: the Storage Element is the service which allows a user or an application to store data for future retrieval
- Manage local storage (disk) and/or interface to complex Mass Storage Systems (disk arrays and tape libraries) like HPSS, CASTOR, DiskeXtender (UNITREE), ...
- Offer a unique virtual file system even if it uses different storage technologies (arrays of disks and tapes), hiding the details from the users (providing an SRM interface)
- Support basic file transfer protocols
  - GridFTP mandatory (GSI-enabled FTP)
  - Others if available (https, ftp, etc.)
- Support a native I/O (remote file) access protocol
  - POSIX(-like) I/O client library for direct access to data
7 SRM in an example
(Diagram) She is running a job which needs:
- Data for physics event reconstruction
- Simulated data
- Some data analysis files
She will write files remotely too.
The data are spread over several sites:
- At CERN, in dCache
- At Fermilab, in a disk array
- At Nikhef, in a classic SE
8 SRM in an example
(Diagram) Without SRM: "You as a user need to know all the systems!!!"
- dCache: own system, own protocols and parameters
- Castor: no connection with dCache or classic SE
- Classic SE: independent system from dCache or Castor
SRM: "I talk to them on your behalf. I will even allocate space for your files. And I will use transfer protocols to send your files there."
9 Storage Resource Management
- The SRM (Storage Resource Manager) is a protocol for storage resource management
  - It does not do any data transfer
  - It is used to ask a Mass Storage System (MSS) to make a file ready for transfer, or to create space in a disk cache to which a file can be uploaded
  - The actual transfer is done using the file transfer protocol supported by the backend MSS
- Storage resource management needs to take into account
  - Transparent access to files (migration to/from disk pool)
  - File pinning
  - Space reservation
  - File status notification
  - Lifetime management
- The SRM is a single interface that takes care of local storage interaction and provides a Grid interface to the outside world
- In gLite, interactions with the SRM interface are hidden by higher level tools (DM tools and APIs)
10 gLite SE types
- gLite 3.0 data access protocols
  - File transfer: GSIFTP (GridFTP)
  - File I/O (remote file access)
    - gsidcap
    - insecure RFIO
    - secured RFIO (gsirfio)
- Classic SE (no official support anymore)
  - GridFTP server
  - Insecure RFIO daemon (rfiod): only LAN-limited file access
  - Single disk or disk array
  - No quota management
  - Does not support the SRM interface
11 gLite SE types (II)
- Mass Storage Systems (Castor)
  - Files migrated between front-end disk and back-end tape storage hierarchies
  - GridFTP server
  - Insecure RFIO (Castor)
  - Provide an SRM interface with all the benefits
- Disk pool managers (dCache and LCG DPM)
  - Manage distributed storage servers in a centralized way
  - Physical disks or arrays are combined into a common (virtual) file system
  - Disks can be dynamically added to the pool
  - GridFTP server
  - Secure remote access protocols (gsidcap for dCache, gsirfio for DPM)
  - SRM interface
12 GridFTP
- Data transfer and access protocol for secure and efficient data movement
- Standardized in the Global Grid Forum
- Extends the standard FTP protocol
  - Public-key-based Grid Security Infrastructure (GSI) or Kerberos support (both accessible via GSS-API)
  - Third-party control of data transfer
  - Parallel data transfer
  - Striped data transfer
  - Partial file transfer
  - Automatic negotiation of TCP buffer/window sizes
  - Support for reliable and restartable data transfer
  - Integrated instrumentation, for monitoring ongoing transfer performance
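As a rough illustration of the parallel and partial transfer features listed above, the sketch below splits a file of a given size into contiguous byte ranges, one per stream. It is only a simplified model: the function name `split_ranges` is hypothetical, and a real GridFTP server arranges blocks through the protocol itself rather than through a fixed even split.

```python
def split_ranges(size, streams):
    """Split the byte interval [0, size) into (offset, length) pairs, one per stream.

    Illustrative only: an even split, not the block layout GridFTP negotiates.
    """
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges
```

Each pair could then be handed to a separate stream, or a single pair requested on its own as a partial transfer.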
13 gLite Storage Element
14 File naming conventions
- Logical File Name (LFN)
  - An alias created by a user to refer to some item of data, e.g. lfn:/grid/gilda/tony/simple2.dat
- Globally Unique Identifier (GUID)
  - A non-human-readable unique identifier for an item of data, e.g. guid:3a69a819-2023-4400-a2a1-f581ab942044
- Site URL (SURL)
  - Indicates the place (Storage Element) where the file is actually found. Understood by the SRM interface
  - srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/2006-07-10/filef7a916f7-159b-48df-9159-877f2d3c6f58
- Transport URL (TURL)
  - Temporary locator of a replica plus the access protocol, understood by the backend MSS
  - gsiftp://aliserv6.ct.infn.it/aliserv6.ct.infn.it:/gpfs/dpm/gilda/2006-07-10/filef7a916f7-159b-48df-9159-877f2d3c6f58.46193.0
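The four kinds of names can be told apart by their prefix. The sketch below is a minimal illustration, assuming the usual prefixed notation (`lfn:`, `guid:`, `srm://`, and a protocol scheme such as `gsiftp://` for TURLs). The helper names `classify` and `storage_host` and the scheme table are hypothetical, not part of any gLite tool.

```python
from urllib.parse import urlparse

# Illustrative mapping from scheme to kind of name. TURL schemes depend on
# the access protocol; only a few are listed here as an assumption.
KINDS = {
    "lfn": "LFN",
    "guid": "GUID",
    "srm": "SURL",
    "gsiftp": "TURL",
    "gsidcap": "TURL",
    "rfio": "TURL",
}

def classify(name):
    """Return the kind of Grid file name, judged only by its scheme prefix."""
    scheme = name.split(":", 1)[0].lower() if ":" in name else ""
    return KINDS.get(scheme, "unknown")

def storage_host(url):
    """Extract the Storage Element host name from a SURL or TURL."""
    return urlparse(url).hostname or ""
```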
15 SRM Interactions
(Diagram: Client, SRM and Storage, with arrows numbered 1 to 5)
1. The client asks the SRM for a file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
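The five steps can be mimicked with two toy objects. This is only a sketch under stated assumptions: `ToyStorage`, `ToySRM`, the host `se.example.org` and the paths are all hypothetical, and no real SRM call is made. What it shows is that the SRM hands back a TURL and never carries the data itself.

```python
from dataclasses import dataclass, field

@dataclass
class ToyStorage:
    """Stand-in for the backend storage system (steps 2 and 3)."""
    host: str
    files: dict = field(default_factory=dict)  # SURL path -> physical path

    def stage(self, surl_path):
        # Make the file available and report where it physically lives.
        return self.files[surl_path]

@dataclass
class ToySRM:
    """Stand-in for the SRM front end (steps 1 and 4); no data flows through it."""
    storage: ToyStorage

    def get_turl(self, surl, protocol="gsiftp"):
        prefix = f"srm://{self.storage.host}"
        if not surl.startswith(prefix):
            raise ValueError("SURL does not belong to this SRM")
        physical = self.storage.stage(surl[len(prefix):])
        return f"{protocol}://{self.storage.host}{physical}"

storage = ToyStorage("se.example.org", {"/dpm/example.org/home/vo/f1": "/pool01/vo/f1.0"})
srm = ToySRM(storage)
turl = srm.get_turl("srm://se.example.org/dpm/example.org/home/vo/f1")
# Step 5: the client would now open `turl` with the protocol named in its scheme.
```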
16 What is a file catalog
(Diagram: a File Catalog and three SEs)
17 The LFC (LCG File Catalog)
- It keeps track of the location of copies (replicas) of Grid files
- The LFN acts as the main key in the database. It has
  - Symbolic links to it (additional LFNs)
  - Unique Identifier (GUID)
  - System metadata
  - Information on replicas
  - One field of user metadata
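The record layout described above can be pictured as a small in-memory model. This is a sketch only: `CatalogEntry` and `ToyCatalog` are hypothetical names, and the real LFC keeps these records in a database behind its own API.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    lfn: str                                       # main key
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    replicas: list = field(default_factory=list)   # SURLs of the copies
    symlinks: list = field(default_factory=list)   # additional LFNs
    comment: str = ""                              # the single user metadata field

class ToyCatalog:
    def __init__(self):
        self._by_lfn = {}

    def register(self, lfn, surl):
        """Create the entry if needed and record one more replica."""
        entry = self._by_lfn.get(lfn)
        if entry is None:
            entry = CatalogEntry(lfn)
            self._by_lfn[lfn] = entry
        if surl not in entry.replicas:
            entry.replicas.append(surl)
        return entry

    def link(self, lfn, alias):
        """Add a symbolic link: a second LFN resolving to the same entry."""
        entry = self._by_lfn[lfn]
        entry.symlinks.append(alias)
        self._by_lfn[alias] = entry

    def replicas(self, name):
        return list(self._by_lfn[name].replicas)
```

Because the alias points at the same entry, listing replicas through either name gives the same answer.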
18 LFC Features
- Cursors for large queries
- Timeouts and retries from the client
- User-exposed transactional API (with auto rollback on failure)
- Hierarchical namespace and namespace operations (for LFNs)
- Integrated GSI authentication and authorization
- Access Control Lists (Unix permissions and POSIX ACLs)
- Checksums
- Integration with VOMS (VirtualID and VirtualGID)
19 LFC commands
Summary of the LFC Catalog commands
lfc-chmod Change access mode of the LFC file/directory
lfc-chown Change owner and group of the LFC file/directory
lfc-delcomment Delete the comment associated with the file/directory
lfc-getacl Get file/directory access control lists
lfc-ln Make a symbolic link to a file/directory
lfc-ls List file/directory entries in a directory
lfc-mkdir Create a directory
lfc-rename Rename a file/directory
lfc-rm Remove a file/directory
lfc-setacl Set file/directory access control lists
lfc-setcomment Add/replace a comment
20 lfc-ls
- Listing the entries of an LFC directory
  - lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path
  - where path specifies the LFN pathname (mandatory)
- Remember that LFC has a directory tree structure
  - /grid/<VO_name>/<you create it>
  - /grid/<VO_name> is the LFC namespace; the part below it is defined by the user
- All members of a VO have read-write permissions under their directory
- You can set LFC_HOME to use relative paths
  - > lfc-ls /grid/gilda/tony
  - > export LFC_HOME=/grid/gilda
  - > lfc-ls -l tony
  - > lfc-ls -l -R /grid
- Options: -l gives a long listing; -R lists the contents of directories recursively. Don't use it!
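How a relative path combines with LFC_HOME can be pictured with a few lines of path arithmetic. This is a sketch of the idea only: `resolve_lfn` is a hypothetical helper, and raising an error when LFC_HOME is unset is an assumption about reasonable behaviour, not a description of the real client.

```python
import posixpath

def resolve_lfn(path, lfc_home=""):
    """Resolve a possibly relative LFN path against an LFC_HOME-style prefix."""
    if path.startswith("/"):
        return posixpath.normpath(path)          # absolute paths are used as given
    if not lfc_home:
        raise ValueError("relative path given but no LFC_HOME prefix is set")
    return posixpath.normpath(posixpath.join(lfc_home, path))
```

With the prefix set to /grid/gilda, the relative name tony resolves to /grid/gilda/tony, matching the two listing commands in the slide.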
21 lfc-mkdir
- Creating directories in the LFC
  - lfc-mkdir [-m mode] [-p] path...
  - where path specifies the LFC pathname
- Remember that when registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand
- Examples
  - > lfc-mkdir /grid/gilda/tony/demo
- You can check the directory with
  - > lfc-ls -l /grid/gilda/tony
  - drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo
22 lfc-ln
- Creating a symbolic link
  - lfc-ln -s file linkname
  - lfc-ln -s directory linkname
- Creates a link named linkname to the specified file or directory
- Examples
  - > lfc-ln -s /grid/gilda/tony/demo/test /grid/gilda/tony/aLink
- Let's check the link using lfc-ls with long listing (-l)
  - > lfc-ls -l
  - lrwxrwxrwx 1 19122 1077 0 Jun 14 11:58 aLink -> /grid/gilda/tony/demo/test
  - drwxr-xrwx 1 19122 1077 0 Jun 14 11:39 demo
(Callouts in the slide mark aLink as the symbolic link and /grid/gilda/tony/demo/test as the original file)
23 LFC C API
Low level methods (many POSIX-like)
lfc_access lfc_aborttrans lfc_addreplica lfc_apiinit lfc_chclass lfc_chdir lfc_chmod lfc_chown lfc_closedir lfc_creat lfc_delcomment lfc_delete
lfc_deleteclass lfc_delreplica lfc_endtrans lfc_enterclass lfc_errmsg lfc_getacl lfc_getcomment lfc_getcwd lfc_getpath lfc_lchown lfc_listclass lfc_listlinks
lfc_listreplica lfc_lstat lfc_mkdir lfc_modifyclass lfc_opendir lfc_queryclass lfc_readdir lfc_readlink lfc_rename lfc_rewind lfc_rmdir lfc_selectsrvr
lfc_setacl lfc_setatime lfc_setcomment lfc_seterrbuf lfc_setfsize lfc_starttrans lfc_stat lfc_symlink lfc_umask lfc_undelete lfc_unlink lfc_utime send2lfc
24 GFAL: Grid File Access Library
- Interactions with the SE require some components
  - File catalog services to locate replicas
  - SRM interfaces
  - File access mechanism to access files from the SE on the UI/WN
- GFAL does all these tasks for you
  - Hides all these operations
  - Presents a POSIX interface for the I/O operations
    - Single shared library in threaded and unthreaded versions: libgfal.so, libgfal_pthr.so
    - Single header file: gfal_api.h
  - User can create all commands needed for storage management
  - It offers an interface to SRM as well
- Supported protocols
  - file (local or nfs-like access)
  - dcap, gsidcap and kdcap (dCache access)
  - rfio (Castor access) and gsirfio (DPM)
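What GFAL hides can be pictured as a chain of lookups from a logical name to something that can actually be opened. The sketch below uses plain dictionaries as stand-ins for the file catalog and the SRM. The function `resolve_for_open` and all sample values are hypothetical and only illustrate the order of the steps.

```python
def resolve_for_open(lfn, lfn_to_guid, guid_to_surls, surl_to_turl):
    """Follow LFN -> GUID -> SURL -> TURL using dictionaries as stand-ins."""
    guid = lfn_to_guid[lfn]               # file catalog lookup
    for surl in guid_to_surls[guid]:      # replica lookup
        turl = surl_to_turl.get(surl)     # SRM request for a transfer URL
        if turl is not None:
            return turl                   # first replica that can be accessed
    raise LookupError(f"no accessible replica for {lfn}")

lfn_to_guid = {"/grid/vo/data.res": "guid-0001"}
guid_to_surls = {"guid-0001": ["srm://se1.example.org/a", "srm://se2.example.org/a"]}
surl_to_turl = {"srm://se2.example.org/a": "gsiftp://se2.example.org/pool/a"}
turl = resolve_for_open("/grid/vo/data.res", lfn_to_guid, guid_to_surls, surl_to_turl)
```

Here the first replica has no TURL available, so the second one is used; a POSIX-like open on the result would be the final step.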
25 GFAL File I/O API (I)
- int gfal_access (const char *path, int amode)
- int gfal_chmod (const char *path, mode_t mode)
- int gfal_close (int fd)
- int gfal_creat (const char *filename, mode_t mode)
- off_t gfal_lseek (int fd, off_t offset, int whence)
- int gfal_open (const char *filename, int flags, mode_t mode)
- ssize_t gfal_read (int fd, void *buf, size_t size)
- int gfal_rename (const char *old_name, const char *new_name)
- ssize_t gfal_setfilchg (int, const void *, size_t)
- int gfal_stat (const char *filename, struct stat *statbuf)
- int gfal_unlink (const char *filename)
- ssize_t gfal_write (int fd, const void *buf, size_t size)
26 GFAL Catalog API
- int create_alias (const char *guid, const char *lfn, long long size)
- int guid_exists (const char *guid)
- char *guidforpfn (const char *surl)
- char *guidfromlfn (const char *lfn)
- char **lfnsforguid (const char *guid)
- int register_alias (const char *guid, const char *lfn)
- int register_pfn (const char *guid, const char *surl)
- int setfilesize (const char *surl, long long size)
- char *surlfromguid (const char *guid)
- char **surlsfromguid (const char *guid)
- int unregister_alias (const char *guid, const char *lfn)
- int unregister_pfn (const char *guid, const char *surl)
27 GFAL Storage API
- int deletesurl (const char *surl)
- int getfilemd (const char *surl, struct stat64 *statbuf)
- int set_xfer_done (const char *surl, int reqid, int fileid, char *token, int oflag)
- int set_xfer_running (const char *surl, int reqid, int fileid, char *token)
- char *turlfromsurl (const char *surl, char **protocols, int oflag, int *reqid, int *fileid, char **token)
- int srm_get (int nbfiles, char **surls, int nbprotocols, char **protocols, int *reqid, char **token, struct srm_filestatus **filestatuses)
- int srm_getstatus (int nbfiles, char **surls, int reqid, char *token, struct srm_filestatus **filestatuses)
28 GFAL Java API
- The GFAL APIs are available for C/C++ programmers
- Because of the ISSGC06 exercise requirements, we needed a Java version of them
- We wrote a wrapper around the C APIs using the Java Native Interface, and the Java APIs on top of it
- More information can be found here
  - https://grid.ct.infn.it/twiki/bin/view/GILDA/APIGFAL
29 lcg-utils DM tools
- High level interface (CL tools and APIs) to
  - Upload/download files to/from the Grid (UI, CE and WN <---> SEs)
  - Replicate data between SEs and locate the best replica available
  - Interact with the file catalog
- Definition: a file is considered to be a Grid File if it is both physically present in an SE and registered in the File Catalog
- lcg-utils ensure the consistency between files in the Storage Elements and entries in the File Catalog
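The definition of a Grid File translates directly into set operations. The sketch below is illustrative only: `consistency_report` and its category names are hypothetical, and a real check would have to query the SEs and the catalog rather than take ready-made sets.

```python
def consistency_report(se_surls, catalog_surls):
    """Compare what is stored on the SEs with what the catalog has registered."""
    se_surls, catalog_surls = set(se_surls), set(catalog_surls)
    return {
        "grid_files": se_surls & catalog_surls,    # present and registered
        "unregistered": se_surls - catalog_surls,  # on an SE, unknown to the catalog
        "missing": catalog_surls - se_surls,       # registered, but not on any SE
    }
```

Only the first category satisfies the definition; the other two are the inconsistencies the tools are meant to avoid.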
30 lcg-utils commands
lcg-cp Copies a grid file to a local destination
lcg-cr Copies a file to a SE and registers the file in the catalog
lcg-del Delete one file
lcg-rep Replication between SEs and registration of the replica
lcg-gt Gets the TURL for a given SURL and transfer protocol
lcg-sd Sets file status to Done for a given SURL in a SRM request
File Catalog Interaction
lcg-aa Add an alias in LFC for a given GUID
lcg-ra Remove an alias in LFC for a given GUID
lcg-rf Registers in LFC a file placed in a SE
lcg-uf Unregisters in LFC a file placed in a SE
lcg-la Lists the aliases for a given SURL, GUID or LFN
lcg-lg Get the GUID for a given LFN or SURL
lcg-lr Lists the replicas for a given GUID, SURL or LFN
31 LFC interfaces
(Diagram labels: SEs; LFC server; LFC client C API; Python; GFAL; LCG utils; DLI; WMS; CLI: lfc-ls, lfc-mkdir, lfc-setacl, ...)
32 LFC Interfaces (II)
- LFC client commands
  - Provide administrative functionality
  - Unix-like
  - LFNs seen as a Unix filesystem (/grid/<VO>/ ...)
- LFC C API
  - Alternative way to administer the catalog
  - Python wrapper provided
- Integration with GFAL and lcg_util APIs complete
  - lcg-utils access the catalog in a transparent way
- Integration with the WMS completed
  - The RB can locate Grid files: allows for data-based match-making
  - Using the Data Location Interface
33 Data Management CLIs & APIs
- lcg_utils: lcg-* commands + lcg_* API calls
  - Provide (all) the functionality needed by the gLite user
  - Transparent interaction with file catalogs and storage interfaces when needed
  - Abstraction from technology of specific implementations
- Grid File Access Library (GFAL): API
  - Adds file I/O and explicit catalog interaction functionality
  - Still provides the abstraction and transparency of lcg_utils
- edg-gridftp tools: CLI
  - Complete the lcg_utils with low level GridFTP operations
  - Functionality available as API in GFAL
  - May be generalized as lcg-* commands
34 Data Management Tools and APIs
- LFC C API and CLIs
  - Administration tools for entries in the file catalog
- GFAL
  - Interaction with files in SEs and replicas in the File Catalog
- lcg-utils API and CLIs
  - High level wrapper of the GFAL APIs
35 Data Movement (I)
- Many Grid applications will distribute a LOT of data across the Grid sites
- Need an efficient and easy-to-manage file movement service
- gLite File Transfer Service (FTS)
  - Manages the network and the storage at both ends
  - Defines the concept of a CHANNEL: a link between two SEs
  - Channels can be managed by the channel administrators, i.e. the people responsible for the network link and storage systems
  - These are potentially different people for different channels
  - Optimizes channel bandwidth usage: lots of parameters that can be tuned by the administrator
  - VOs using the channel can apply their own internal policies for queue ordering (e.g. professors' transfer jobs are more important than students')
- gLite File Placement Service (FPS)
  - It IS an FTS with the additional catalog lookup and registration steps, i.e. LFNs and GUIDs can be used to perform replication. It could have been called File Replication Service (replica = managed/catalogued copy)
36 Data Movement (II)
- File movement is asynchronous: submit a job
  - Held in file transfer queue
- Data scheduler
  - Single service per VO, can be distributed
  - VO can apply policies (priorities, preferred sites, recovery modes, ...)
- Client interfaces
  - Browser
  - APIs
  - Web service
- File transfer
  - Uses SURL
- File placement
  - Uses LFN or GUID, accesses Catalogues to resolve them
37 Data movement (II)
- File movement is asynchronous: submit a job
  - Held in file transfer queue
- FPS fetches job transfer requests and contacts the File Catalogue to obtain the source/destination SURLs
- Task execution is delegated to FTS
- User can monitor job status through the jobID
- FTS maintains the state of job transfers
- When the job is done, FPS updates the file entry in the catalogue, adding the new replica
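The division of labour between FPS and FTS can be sketched with two toy classes. Everything here is an assumption made for illustration: the class names, the destination naming rule and the two state names are hypothetical, and the transfer itself is reduced to a single method call.

```python
import itertools

class ToyFTS:
    """Keeps the state of transfer jobs; a real FTS moves data asynchronously."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.jobs = {}

    def submit(self, source_surl, dest_surl):
        job_id = next(self._ids)
        self.jobs[job_id] = {"source": source_surl, "dest": dest_surl, "state": "Pending"}
        return job_id

    def run(self, job_id):
        # Stand-in for the asynchronous transfer finishing successfully.
        self.jobs[job_id]["state"] = "Done"

    def status(self, job_id):
        return self.jobs[job_id]["state"]

class ToyFPS:
    """Adds catalog lookup before, and replica registration after, an FTS job."""
    def __init__(self, catalog, fts):
        self.catalog = catalog      # dict: LFN -> list of SURLs
        self.fts = fts
        self.pending = {}           # job id -> (LFN, destination SURL)

    def replicate(self, lfn, dest_se):
        source = self.catalog[lfn][0]                          # catalog lookup
        dest = f"srm://{dest_se}/generated/{lfn.strip('/').replace('/', '_')}"
        job_id = self.fts.submit(source, dest)                 # execution left to FTS
        self.pending[job_id] = (lfn, dest)
        return job_id

    def finalize(self, job_id):
        if self.fts.status(job_id) != "Done":
            return False
        lfn, dest = self.pending.pop(job_id)
        self.catalog[lfn].append(dest)                         # register new replica
        return True
```

The user only holds the job id; the catalogue gains the new replica once the transfer job has reached its final state.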
38 Metadata on the Grid
- Metadata is data about data
- On the Grid: mainly information about files
  - Describe files
  - Locate files based on their contents
- They can also add details on running jobs
- But also simplified DB access on the Grid
  - Many Grid applications need structured data
  - Many applications require only simple schemas
    - Can be modelled as metadata
  - Main advantage: better integration with the Grid environment
    - Metadata Service is a Grid component
    - Grid security
    - Hide DB heterogeneity
- AMGA is the Metadata Component of gLite
39 Example
- Suppose we have a set of movie trailers saved on several storage elements
lfc-ls -l /grid/gilda/trailers
-rw-rw-r-- 1 101 102 10188804 Apr 14 17:21 BatmanBegins.mpg
-rw-rw-r-- 1 109 102  3201028 Apr 14 19:34 alien.mpg
-rw-rw-r-- 1 101 102  3545092 Apr 14 17:19 amelie.mpg
-rw-rw-r-- 1 101 102  5277700 Apr 14 17:27 american2.mpg
-rw-rw-r-- 1 101 102  5828612 Apr 14 17:28 fastfurious.mpg
-rw-rw-r-- 1 192 102 20509586 Apr 20 14:08 insideman.avi
-rw-rw-r-- 1 101 102  5912580 Apr 14 17:31 madagascar.mpg
-rw-rw-r-- 1 101 102  5812228 Apr 14 17:30 matrix.mpg
-rw-rw-r-- 1 192 102 12918756 Apr 20 19:09 pinkpanther.mov
-rw-rw-r-- 1 101 102  6240260 Apr 14 17:30 spiderman.mpg
- We could add more details on their contents (Movie Title, Cast, Runtime, PlotOutline, Genre, Director) by associating metadata with them
- We could then look for movies that satisfy some desired search criteria (e.g. comedies in which our favourite actor performed, or movies about animals and zoos)
40 Metadata Concepts
- Basic Definitions
  - Entries: list of items to which we want to attach metadata
    - (e.g. each movie will be represented as an entry in AMGA)
  - Attribute: key/value pair with type information
    - Name/Key: the name of the attribute (e.g. MovieTitle, Cast, PlotOutline, Runtime, ...)
    - Type: the type (e.g. varchar, int, float, text, numeric, ...)
    - Value: value of an entry's attribute (e.g. Spider Man 2; Tobey Maguire, Kirsten Dunst; 127; ...)
  - Metadata: list of attributes associated with entries
  - Schema: a set of attributes
  - Collection: a set of entries associated with a schema
- We can think of collections as DB tables, schemas as the list of fields (with their types), attributes as columns, entries as rows
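The table analogy can be made concrete with a tiny in-memory model. This is a sketch, not AMGA's API: the `Collection` class is hypothetical, Python types stand in for the attribute types, and the sample genres and the second runtime are made-up illustration values.

```python
class Collection:
    """A set of entries sharing one schema (attribute name -> Python type)."""
    def __init__(self, name, schema):
        self.name = name
        self.schema = dict(schema)
        self.entries = {}                       # entry name -> attribute values

    def add_entry(self, entry_name, **attrs):
        for key, value in attrs.items():
            expected = self.schema.get(key)
            if expected is None:
                raise KeyError(f"attribute {key!r} is not in the schema")
            if not isinstance(value, expected):
                raise TypeError(f"attribute {key!r} must be {expected.__name__}")
        self.entries[entry_name] = attrs

    def select(self, predicate):
        """Return the names of the entries whose attributes satisfy the predicate."""
        return sorted(n for n, a in self.entries.items() if predicate(a))

movies = Collection("/trailers", {"MovieTitle": str, "Runtime": int, "Genre": str})
movies.add_entry("spiderman.mpg", MovieTitle="Spider Man 2", Runtime=127, Genre="Action")
movies.add_entry("madagascar.mpg", MovieTitle="Madagascar", Runtime=86, Genre="Comedy")
comedies = movies.select(lambda a: a.get("Genre") == "Comedy")
```

The schema plays the role of the column definitions, each entry is a row, and `select` is the counterpart of a simple query.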
41 AMGA Features
- Dynamic Schemas
  - Schemas can be modified at runtime by the client
  - Create, delete schemas
  - Add, remove attributes
- Metadata organised as a hierarchy
  - Collections can contain sub-collections
  - Analogy to file system: Collection <-> Directory, Entry <-> File
- Flexible Queries
  - SQL-like query language
  - Joins between schemas
  - Example
selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, "%.mp3")'
42 Security
- Unix style permissions
- ACLs: per-collection or per-entry
- Secure connections: SSL
- Client authentication based on
  - Username/password
  - General X.509 certificates
  - Grid-proxy and VOMS-proxy certificates
- Access control via the Virtual Organization Membership Service (VOMS)
43 AMGA Implementation
- C++ multiprocess server
  - Runs on any Linux flavour
- Backends
  - Oracle, MySQL, PostgreSQL, SQLite
- Two frontends
  - TCP Streaming
    - High performance
    - Client API for C++, Java, Python, Perl, Ruby
  - SOAP
    - Interoperability
- Also implemented as standalone Python library
  - Data stored on filesystem
44 AMGA Datatypes
(Table of AMGA datatypes not preserved in this transcript)
- Using the datatypes in the table, you can be sure that your metadata can easily be moved to all supported back-ends
- If you do not care about DB portability, you can in principle use as entry attribute type ALL the datatypes supported by the back-end, even the more esoteric ones (PostgreSQL Network Address type or Geometric ones)
- We played a little bit with the GIS datatypes offered by MySQL 5
45 Metadata Replication
- Motivation
  - Scalability: support hundreds/thousands of concurrent users
  - Geographical distribution: hide network latency
  - Reliability: no single point of failure
  - DB-independent replication: heterogeneous DB systems
  - Disconnected computing: off-line access (laptops)
- Architecture
  - Asynchronous replication
  - Master-slave: writes only allowed on the master
  - Replication at the application level
    - Replicate metadata commands, not SQL -> DB independence
  - Partial replication: supports replication of only sub-trees of the metadata hierarchy
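The architecture bullets above can be condensed into a small model: the master records metadata commands in a log, and each slave pulls the log later, applying only the commands that fall inside its sub-tree. This is a sketch under stated assumptions. The classes `Master` and `Slave` and the two command names are illustrative and do not reproduce the AMGA wire protocol.

```python
class Master:
    """Accepts writes and records them as metadata commands (not SQL)."""
    def __init__(self):
        self.log = []

    def execute(self, command, path, attrs=None):
        self.log.append((command, path, attrs))


class Slave:
    """Read-only copy that asynchronously pulls commands for one sub-tree."""
    def __init__(self, subtree="/"):
        self.subtree = subtree.rstrip("/") + "/"
        self.position = 0          # index of the next log record to apply
        self.entries = {}

    def pull(self, master):
        for command, path, attrs in master.log[self.position:]:
            if not (path + "/").startswith(self.subtree):
                continue           # partial replication: skip other sub-trees
            if command == "addentry":
                self.entries[path] = dict(attrs or {})
            elif command == "rm":
                self.entries.pop(path, None)
        self.position = len(master.log)
```

Because the log holds commands rather than SQL statements, a slave can apply them to whatever back-end it uses; until it pulls again, it simply serves slightly stale data.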
46 Metadata Replication
Some use cases (diagrams): partial replication, full replication, federation, proxy
48 gLibrary Use Case
- Attempts to create a Multimedia Management System on the Grid
- Examples of multimedia contents handled by gLibrary
  - Images
  - Movies
  - Audio files
  - Office documents (PowerPoint, Word, Excel, OpenOffice)
  - E-mails, PDFs, HTML pages
  - Customized versions of well-known document types (e.g. EGEE PPTs)
  - ...
- Keeps track of and organizes in a uniform way all the additional details (metadata) of files saved in Storage Elements and registered in File Catalogues
- Provides users with an easy way to locate and retrieve files based on their contents
49 gLibrary Java GUI screenshot
(Screenshot: alpha prototype)
50 gLibrary Deployment scenario
(Diagram labels: users (gLibraryManager, gLibrarySubmitter, VO user) authenticate with an X.509 certificate; VOMS issues a VOMS proxy with group and role information; UI; AMGA server with PostgreSQL; three SEs accessed with the VOMS proxy)
51 gMOD: grid Movie On Demand
- gMOD provides a Video-On-Demand service
- The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user's workstation
- For each movie a lot of details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes
- Two kinds of users can interact with gMOD
  - TrailersManagers, who can administer the DB of movies (uploading new ones and attaching metadata to them)
  - GILDA VO users (guests), who can browse, search and choose a movie to be streamed
52 gMOD interactions
(Diagram labels: User; GENIUS Portal; VOMS (get Role); AMGA; Workload Management System; Storage Elements)
53 gMOD screenshot
gMOD is accessible through the GENIUS Portal (https://glite-tutor.ct.infn.it)
54 Data movement introduction
- Grids are naturally distributed systems
- This means that data also needs to be distributed
- First generation data distribution mainly concentrated on copy protocols in a grid environment
  - gridftp
  - http + mod_gridsite
- File movement is started and controlled on the client side
- But copies controlled by clients have problems
55 Direct Client Controlled Data Movement
(Diagram: the Client holds control channels to the Source Storage Element and the Destination Storage Element; the data flow channel runs between the two Storage Elements)
- Although the transport protocol may be robust, state is held inside the client: inconvenient and fragile
- The client only knows about local state, with no global knowledge about data transfers between storage elements
  - Storage elements overwhelmed with replication requests
  - Multiple replications of the same data can happen simultaneously
  - Site has little control over the balance of network resources
  - DoS
56 Transfer Service
- Clear need for a service for data transfer
  - Client connects to service to submit request
  - Service maintains state about transfer
  - Client can periodically reconnect to check status or cancel request
- Service can have knowledge of global state, not just a single request
  - Load balancing
  - Scheduling
(Diagram: the Client talks to the Transfer Service using SOAP via https to submit a new request, monitor progress or cancel a request; the Transfer Service controls the Source and Destination Storage Elements; data flows between the two Storage Elements)
57 Transfer Service Architecture
- Clients submit jobs via SOAP over https
- Jobs are lists of URLs in srm:// format. Some transfer parameters can be specified (streams, buffer sizes)
- Clients cannot subscribe for status changes, but can poll
- C command line clients. C, Java and Perl APIs available
- Backend databases supported: MySQL and Oracle
- Web service runs in a Tomcat5 container; agents run as normal daemons
(Diagram: the Client reaches the Transfer Service over a secure web service connection; the Transfer Service drives the Storage Elements and keeps well-defined state transitions/checkpointing in a Database)
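Since clients cannot subscribe to status changes, they have to poll. The sketch below models that interaction with a toy service. The class `ToyTransferService`, the `tick` method that stands in for the passage of time, and the state names are assumptions made for illustration, not the FTS web service interface.

```python
STATES = ["Submitted", "Pending", "Active", "Done"]

class ToyTransferService:
    """Holds job state on the service side, so the client can disconnect."""
    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, surl_pairs):
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"files": list(surl_pairs), "state": "Submitted"}
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]["state"]

    def cancel(self, job_id):
        if self._jobs[job_id]["state"] != "Done":
            self._jobs[job_id]["state"] = "Canceled"

    def tick(self):
        # Advance every unfinished job by one state (stand-in for real work).
        for job in self._jobs.values():
            if job["state"] in STATES[:-1]:
                job["state"] = STATES[STATES.index(job["state"]) + 1]

def poll_until_finished(service, job_id, max_polls=10):
    """Poll the job state until it is final; return the states observed."""
    seen = []
    for _ in range(max_polls):
        state = service.status(job_id)
        seen.append(state)
        if state in ("Done", "Canceled"):
            break
        service.tick()   # in reality: sleep, then reconnect to the service
    return seen
```

The client keeps nothing but the job id between polls, which is the point of moving the state into the service.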
58 gLite FTS Channels
- FTS Service has a concept of channels
- A channel is a unidirectional connection between two sites
- Transfer requests between these two sites are assigned to that channel
- Channels usually correspond to a dedicated network pipe (e.g., OPN) associated with production
- But channels can also take wildcards
  - * to MY_SITE: all incoming
  - MY_SITE to *: all outgoing
  - * to *: catch all
- Channels control certain transfer properties: transfer concurrency, gridftp streams
- Channels can be controlled independently: started, stopped, drained
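Assigning a transfer to a channel amounts to finding the most specific match among the configured channels. The sketch below writes the wildcard as `*` and tries the dedicated channel first, then the two half-wildcards, then the catch-all. The function `pick_channel` is hypothetical, and the relative order of the incoming and outgoing wildcards is an assumption, not documented FTS behaviour.

```python
def pick_channel(source, dest, channels):
    """Return the most specific configured (source, dest) channel, or None."""
    candidates = [
        (source, dest),   # dedicated channel between the two sites
        ("*", dest),      # all incoming to the destination
        (source, "*"),    # all outgoing from the source
        ("*", "*"),       # catch all
    ]
    for candidate in candidates:
        if candidate in channels:
            return candidate
    return None

channels = {("CERN", "MY_SITE"), ("*", "MY_SITE"), ("*", "*")}
```

With this configuration a transfer from CERN to MY_SITE uses the dedicated channel, any other transfer into MY_SITE uses the incoming wildcard, and everything else falls through to the catch-all.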
59 gLite FTS Agents
- Channel Agents
  - Transfers on a channel are managed by the channel agent
  - Channel agents can perform inter-VO scheduling
- VO Agents
  - Any job submitted to FTS is first handled by the VO agent
  - The VO agent authorises the job and changes its state to Pending
  - VO agents can perform other tasks; naturally these can be VO specific
    - Scheduling
    - File catalog interaction
60 FTS summary
- Efficient and easy-to-manage file movement service
- gLite File Transfer Service (FTS)
  - File movement is asynchronous: submit a job
    - Held in file transfer queue
  - Task execution is delegated to FTS
  - User can monitor job status through the jobID
  - Maintains the state of job transfers
  - Manages the network and the storage at both ends
  - Defines the concept of a CHANNEL: a link between two SEs
  - Channels can be managed by the channel administrators, i.e. the people responsible for the network link and storage systems
  - These are potentially different people for different channels
  - Optimizes channel bandwidth usage: lots of parameters that can be tuned by the administrator
  - VOs using the channel can apply their own internal policies for queue ordering (e.g. professors' transfer jobs are more important than students')
61 FTS conclusion
- FTS offers an important and useful service on the grid: a significant advance on client-managed file transfers
- The FTS channel architecture offers very useful features to control transfers between sites or into a single site, though it may become overly complex in a grid without clear data flow patterns
- The ability to control VO shares and transfer parameters on a channel is important for sites
- The FTS agent architecture allows VOs to connect the transfer service closely with their own data management stacks, a useful feature for HEP experiments
- The service is not completely mature at this stage (bugs were found in it), but it will continue to mature and develop, especially in its relationship to higher level data management components, and significant steps to integrate with the file catalogs have already been taken
62 Data Management Services Summary
- Storage Elements: save data and provide a common interface
  - Storage Resource Manager (SRM): Castor, dCache, DPM, ...
  - Native access protocols: rfio, dcap, nfs, ...
  - Transfer protocols: gsiftp, ftp, ...
- Catalogs: keep track of where data are stored
  - File Catalog and Replica Catalog: LCG File Catalog (LFC)
  - Metadata Catalog: AMGA Metadata Catalogue
- Data Movement: schedules reliable file transfer
  - File Transfer Service: gLite FTS (manages physical transfers)
64 Questions