Title: CLRC e-Science Centre
1CLRC e-Science Centre
SRB
Kerstin Kleese-van Dam, k.kleese_at_dl.ac.uk
2Special thanks to
- George Kremenek - kremenek_at_sdsc.edu
- Alasdair Earl - aearl_at_ph.ed.ac.uk
3Contents
- Introduction
- Architecture description
- What is good
- What needs improving
- What can it be used for
4Introduction
- More and more information is available today. It can be
  - Random Information (e.g. news items)
  - Scientific Data
  - Commercial or Administrative Data
  - Data about Data (metadata describing the content of the actual data)
- The information is generally available via/from
  - Web-sites, Filesystems, Databases, Tape Libraries or on Paper and other non-digital media.
5Introduction (2)
- How do you find the information? Search Engines, Catalogue Systems or Hard Work (big bucket)
- How do you evaluate the information? Combine, Compare, Present
- How do you manage the information? Preservation, Sharing, Replicating, Transferring, Securing
6Where does SRB fit into this Scenario?
- SRB - the Storage Resource Broker can
  - Integrate distributed, heterogeneous storage devices
  - Make data access transparent for the user
  - Help to share, replicate, transfer and preserve data
- SRB cannot
  - Replace metadata catalogues
  - Provide high level information services
7How does SRB fit into a Grid Environment?
- SRB can be used to
  - Manage information required internally by Portals
  - Integrate data across various media
  - Integrate data across sites
- SRB can be used
  - For a particular site
  - In a research collaboration
  - In a wider Grid community
8General Facts
- Storage Resource Broker - SRB
- Developed by the San Diego Supercomputer Center (SDSC) from the mid 1990s for the US government's National Partnership for Advanced Computational Infrastructure (NPACI).
- Initial release 1997
- Latest version V1.1.8 - released February 2001
- In the US approximately 200TB of data are shared via SRB between 30 participating Universities.
- Used by the HPCPortal developed by Mary Thomas' group at SDSC.
9The SRB/MCAT Core Team
- SDSC Team
  - Reagan Moore, Arcot Rajasekar, Michael Wan, George Kremenek, Charlie Coward, Sheau Yen Chen, Roman Olschanowski
- SRB Expertise at SDSC
  - Michael Wan (SRB client/server, drivers, srbBrowser)
  - Arcot Rajasekar (MCAT, DB drivers)
  - George Kremenek (SRB Client Modules, Security, DAM, application design)
  - Charlie Coward - Windows Servers and Browser
  - Sheau Yen Chen - administration
  - Roman Olschanowski - testing
10What is SRB?
- SRB is an Intelligent Data Access System
- SRB provides protocol transparency to diverse and distributed storage systems
- SRB provides location transparency to distributed datasets
- SRB provides access transparency to remote users
- Extends File Systems
- Extends Database Systems
- Extends I/O protocol
11SRB Access
- SRB can be accessed in three ways
  - High Level graphical Java interface - SRB Browser
  - Application Programming interface - SRB API (high and low level)
  - Unix shell Command Line Interface - SRB Scommands
12SRB Concepts(1)
- Provide Scalability (Hosts, Resource Types, Resources, Collections, Data Objects - size and number, Users, Groups)
- Provide Uniform Interfaces (to Resources, Collections and Datasets, authentication across SRB Space)
- Replication of Datasets
- Access Control Lists
- Ticket-based Access
- Authentication and Encryption (text password, encrypted password, SEA and GSI)
- Server-side proxy Operations
- Metadata-based Discovery
13SRB Concepts(2)
- Provide Logical Abstractions
  - srbSpace - an abstract storage space
  - Resource Types - resource defined by properties
  - Resources - resource identified by name and type
  - multiple resources tied together as a single resource
  - Collections - abstraction over directory structure - distributed, curated
  - Datasets - identified by properties
  - Users - authenticated across hosts/networks
  - Domain - abstraction over physical domains
  - Metadata Schema/Attributes
14What is MCAT?
- Cataloging System
- Metadata Repository
- Digital Object Metadata
  - type, format, lineage, usage methods, domain-specific attributes, collection info, etc
- System-level Metadata
  - access control, audit trails, location, replication, resource types, user groups, etc
- Schema-level Metadata
  - ontology, relationships among attributes/schemas, semantics of attributes, etc
- Uniform Access and Federation interface
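As a rough illustration of how object-level metadata can drive discovery, the sketch below filters a toy catalogue by attribute values. The `discover` function, the attribute names and the logical paths are all hypothetical; nothing here is the MCAT schema or the SRB API.

```python
def discover(catalogue, **wanted):
    """Return logical names whose metadata matches every requested attribute."""
    return sorted(
        name
        for name, attrs in catalogue.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


# Hypothetical catalogue: logical name -> object-level metadata attributes.
catalogue = {
    "/proj/img001": {"type": "image", "format": "tiff", "owner": "alice"},
    "/proj/img002": {"type": "image", "format": "png", "owner": "bob"},
    "/proj/notes": {"type": "text", "format": "ascii", "owner": "alice"},
}

print(discover(catalogue, type="image"))
print(discover(catalogue, type="image", owner="alice"))
```

The user asks by property rather than by file path; the catalogue, not the user, knows which objects qualify.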
16SRB V1.x Features
- Multi-platform (clients and servers)
  - SunOS/Solaris, AIX, Cray C90, SGI, OSX
- API and command line interfaces
  - Low-level and high-level APIs
- Storage systems supported
  - Oracle, DB2, Sybase, HPSS, UNIX FS, W2000/NT FS
- Support for distributed servers, GSI authentication, password encryption
17The Storage Resource Broker
18How does SRB work?
- The SRB Server spawns an SRB Agent, which authenticates the User/Application (SRB Client) by comparing its credentials with information stored in MCAT
- The SRB Agent finds the file location in MCAT
- The SRB Agent checks the user request against permissions stored in MCAT
- The SRB Agent contacts the user with the result of his/her request
- The SRB Agent communicates with the user through a port specific to this client session; it can handle one or more requests from the client.
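The request flow on this slide can be sketched as a small toy model. This is an illustration only: `ToyMcat`, `handle_request`, the user, the paths and the resource name are all hypothetical stand-ins, and real SRB authentication is more involved than a password comparison.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMcat:
    """Hypothetical in-memory stand-in for the MCAT catalogue."""
    passwords: dict = field(default_factory=dict)    # user -> password
    locations: dict = field(default_factory=dict)    # logical name -> (resource, path)
    permissions: dict = field(default_factory=dict)  # (user, logical name) -> allowed actions


def handle_request(mcat, user, password, action, logical_name):
    """Mimic the steps an agent performs for one client request."""
    # Step 1: authenticate the client against information held in the catalogue.
    if mcat.passwords.get(user) != password:
        return ("denied", "authentication failed")
    # Step 2: find the file location in the catalogue.
    location = mcat.locations.get(logical_name)
    if location is None:
        return ("error", "no such object")
    # Step 3: check the request against the stored permissions.
    if action not in mcat.permissions.get((user, logical_name), set()):
        return ("denied", "permission denied")
    # Step 4: report the result back to the client.
    return ("ok", location)


mcat = ToyMcat(
    passwords={"alice": "secret"},
    locations={"/home/alice/run1.dat": ("unix-disk-1", "/data/0001")},
    permissions={("alice", "/home/alice/run1.dat"): {"read"}},
)
print(handle_request(mcat, "alice", "secret", "read", "/home/alice/run1.dat"))
print(handle_request(mcat, "alice", "secret", "write", "/home/alice/run1.dat"))
```

Every decision in the sketch is taken from catalogue contents, which mirrors the point of the slide: the agent itself holds no user or location state.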
19The SRB Process Model
[Diagram labels: Application (host, port), SRB Master (port), SRB agents, MCAT]
20How does SRB handle remote Data Access?
- Steps 1-3 are the same as in the simple case - Spawn SRB Agent on the local machine, Authenticate, Check User Request, Locate File
- The SRB Agent contacts a remote SRB Agent via the SRB Server on the remote machine where the data is stored
- The second SRB Agent returns the pointer to the data item to the first SRB Agent, which passes it on to the user
- The SRB Client can then interact with the data item directly, as described before; however, all communication still runs via the first SRB Agent and the machine it is situated on
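A minimal sketch of the relay pattern described on this slide, under simplifying assumptions: the hypothetical `ToyAgent` class relays the data bytes themselves rather than a pointer, and the site names and logical path are invented for illustration.

```python
class ToyAgent:
    """Hypothetical agent attached to one site's storage."""

    def __init__(self, site, catalogue, store):
        self.site = site
        self.catalogue = catalogue  # shared dict: logical name -> site holding the data
        self.store = store          # this site's data: logical name -> bytes
        self.peers = {}             # site name -> ToyAgent reachable at that site
        self.log = []               # records what this agent did with each request

    def read(self, logical_name):
        home = self.catalogue[logical_name]  # locate the data through the catalogue
        if home == self.site:
            self.log.append(f"{self.site}: served locally")
            return self.store[logical_name]
        # Data lives elsewhere: ask the agent at that site, then relay the result.
        self.log.append(f"{self.site}: forwarding to {home}")
        return self.peers[home].read(logical_name)


catalogue = {"/proj/a.dat": "siteB"}
agent_a = ToyAgent("siteA", catalogue, {})
agent_b = ToyAgent("siteB", catalogue, {"/proj/a.dat": b"payload"})
agent_a.peers["siteB"] = agent_b

data = agent_a.read("/proj/a.dat")  # the client only ever talks to agent_a
print(data, agent_a.log, agent_b.log)
```

The client never contacts the second agent, so every byte passes through the first agent's machine. That extra hop is the cost of location transparency in this arrangement.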
21Remote SRB Operation
[Diagram labels: Application, two SRB servers, two SRB agents and MCAT, connected by steps numbered 1 to 6]
22SRB Space
- The SRB Space consists of
  - A number of SRB Servers (possibly across multiple sites)
  - Many heterogeneous Storage Resources linked to SRB Servers via SRB Media Drivers
  - One MCAT System
  - Many Users
- The SRB Space provides a single view on all the data within the Space.
23SRB Space
24MCAT Metadata Catalog
- Stores metadata about
  - Users, Data sets, Resources, Methods
- Provides collection abstraction
- Stores detailed access control information
- Maintains audit trail information on data sets
- Implemented as a relational database with referential integrity constraints (currently uses Oracle, DB2, Sybase)
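To make "relational database with referential integrity constraints" concrete, here is a small sketch using SQLite from the Python standard library in place of Oracle, DB2 or Sybase. The table and column names are invented for illustration and are not the actual MCAT schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users     (user_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE resources (resource_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL,
                        type TEXT NOT NULL);
CREATE TABLE datasets  (
    dataset_id  INTEGER PRIMARY KEY,
    collection  TEXT NOT NULL,
    name        TEXT NOT NULL,
    owner_id    INTEGER NOT NULL REFERENCES users(user_id),
    resource_id INTEGER NOT NULL REFERENCES resources(resource_id)
);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO resources VALUES (1, 'unix-disk-1', 'unix file system')")
conn.execute("INSERT INTO datasets VALUES (1, '/home/alice', 'run1.dat', 1, 1)")

# A data set that points at an unknown resource is rejected by the constraint.
try:
    conn.execute("INSERT INTO datasets VALUES (2, '/home/alice', 'run2.dat', 1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "SELECT d.name, r.name FROM datasets d JOIN resources r USING (resource_id)"
).fetchall()
print(rejected, rows)
```

The constraint guarantees that the catalogue never records a data set on a resource it does not know about, which is what keeps location lookups trustworthy.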
25MCAT Architecture
26Federated Catalog Architecture
[Diagram labels: MAPS, MCAT, CATALOG, Semantics Definitions, Local Routines, Internal Catalogs, External CATALOG Interface, MAPS Interface, Local Interface, CAT-1, CAT-2]
27New MCAT Features
- Meta-Schema to hold System and User meta data schema information
- Extensible meta data schema
- Distributed meta data schema
- Metadata exchange Interface Protocol
- MAPS - Metadata Attribute Presentation Structure
  - query, update and result structures
  - Close to Z39.50
28New MCAT Features (contd.)
- Core Schema Implemented
  - MCAT Core - Data, Resources, Users and Methods
  - Dublin Core
  - IV Core - Image Visualization attributes
- Web-based Prototype User Interface
  - extensible schema functions
  - query, insert and update of meta data
  - integrated presentation of meta data and data
29SRB Data Replication Support
- Replication via Resource Set definition
- Replication support integrated into write function
- srbObjReplicate API can be used for post facto replication
- Synchronous replication across all sites. Can choose any k out of n
- Can choose specific replica on read operation
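One possible reading of "any k out of n" is that a write must land on k of the n resources in a resource set before it returns. The sketch below models that reading; the function names and resource names are hypothetical, and the selection policy (first k available) is an assumption rather than documented SRB behaviour.

```python
def replicated_write(resource_set, name, data, k):
    """Write `data` to the first k available resources of a logical resource set.

    `resource_set` maps a resource name to a dict acting as its storage, or to
    None when that resource is unavailable. Returns the names holding a replica.
    """
    available = [r for r, store in resource_set.items() if store is not None]
    if len(available) < k:
        raise RuntimeError(f"need {k} replicas, only {len(available)} resources available")
    chosen = available[:k]
    for resource in chosen:  # synchronous: return only after every copy is written
        resource_set[resource][name] = data
    return chosen


def read_replica(resource_set, name, prefer=None):
    """Read from a specific replica if requested, otherwise from any holder."""
    candidates = [prefer] if prefer else list(resource_set)
    for resource in candidates:
        store = resource_set.get(resource)
        if store is not None and name in store:
            return resource, store[name]
    raise KeyError(name)


# Hypothetical resource set with one resource currently unavailable.
log_rsrc = {"hpss-sdsc": {}, "hpss-caltech": None, "unix-ncsa": {}}
holders = replicated_write(log_rsrc, "run1.dat", b"abc", k=2)
print(holders, read_replica(log_rsrc, "run1.dat", prefer="unix-ncsa"))
```

The `prefer` argument corresponds to the last bullet: a reader may name the replica it wants instead of accepting whichever one is found first.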
30Data Replication Example
[Diagram labels: Application; sites SAIC, SDSC, Caltech, NCSA; MCAT; three SRB servers; logical resources LogRsrc1 and LogRsrc2; storage systems HPSS (twice), Oracle, DB2, Unix]
31Ticket-based Access Control
- Owner can request ticket for a data set
- Ticket can be issued for a data set or a collection
- Ticket controls access by
  - time-period (start and expire timestamps)
  - number of accesses (count)
  - user names (any, single or group users)
- Non-registered Users can also access using tickets
- Useful for sharing data and access through the web
- Tickets generated and stored in MCAT
- Currently supports read-only tickets
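The three controls listed above (validity period, access count, user names) can be sketched as a single check. The `Ticket` class and its field names are hypothetical and only illustrate the logic; they do not reflect how SRB stores tickets in MCAT.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical read-only ticket; field names are illustrative only."""
    target: str                  # data set or collection the ticket covers
    start: datetime              # beginning of the validity period
    expire: datetime             # end of the validity period
    remaining: int               # number of accesses still allowed
    users: Optional[set] = None  # None means any user, otherwise the allowed names

    def redeem(self, user, path, now):
        """Return True and consume one access if the ticket permits this read."""
        covers = path == self.target or path.startswith(self.target.rstrip("/") + "/")
        allowed = self.users is None or user in self.users
        in_window = self.start <= now <= self.expire
        if covers and allowed and in_window and self.remaining > 0:
            self.remaining -= 1
            return True
        return False


ticket = Ticket(
    target="/home/alice/results",
    start=datetime(2001, 3, 1),
    expire=datetime(2001, 3, 31),
    remaining=2,
)
when = datetime(2001, 3, 15)
print(ticket.redeem("guest", "/home/alice/results/run1.dat", when))  # first access
print(ticket.redeem("guest", "/home/alice/results/run1.dat", when))  # second access
print(ticket.redeem("guest", "/home/alice/results/run1.dat", when))  # count exhausted
```

Because `users` defaults to `None`, the ticket works for a caller who is not a registered user, which matches the web-sharing use case on the slide.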
32SRB API
- Programmatic API
- High-level API
- Low-level API
- SRB Manager API
- Command Level Interface - Scommands
- Graphical User Interface - srbBrowser
- Web Utilities
33SRB API Interface
[Diagram labels: Application, MCAT, SRB Master]
34High and Low-level API
- Low-level API
  - talks to resource drivers
  - no registration of data sets in MCAT
  - no authentication through MCAT
  - User provides all information
- High-level API
  - Uses low-level API to access resources
  - Registers data management information in MCAT
  - Uses MCAT for authentication and meta information
  - Uses MCAT for resource and data discovery
  - Access/store data in remote SRB
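The split between the two API levels can be pictured as two layers, where the upper layer adds catalogue bookkeeping on top of raw driver calls. The classes below are hypothetical Python stand-ins, not the SRB C API, and the `/vault/...` path scheme is invented.

```python
class LowLevelDriver:
    """Hypothetical resource driver: the caller supplies every detail itself."""

    def __init__(self):
        self.blobs = {}  # physical path -> bytes

    def write(self, physical_path, data):
        self.blobs[physical_path] = data

    def read(self, physical_path):
        return self.blobs[physical_path]


class HighLevelClient:
    """Hypothetical high-level layer: same drivers, plus catalogue bookkeeping."""

    def __init__(self, drivers):
        self.drivers = drivers  # resource name -> LowLevelDriver
        self.catalogue = {}     # logical name -> (resource name, physical path)

    def put(self, logical_name, data, resource):
        physical_path = f"/vault/{len(self.catalogue):06d}"
        self.drivers[resource].write(physical_path, data)         # low-level call
        self.catalogue[logical_name] = (resource, physical_path)  # registration

    def get(self, logical_name):
        resource, physical_path = self.catalogue[logical_name]    # discovery
        return self.drivers[resource].read(physical_path)


disk = LowLevelDriver()
client = HighLevelClient({"unix-disk-1": disk})
client.put("/home/alice/run1.dat", b"abc", resource="unix-disk-1")
print(client.get("/home/alice/run1.dat"), client.catalogue)
```

A caller using only `LowLevelDriver` must remember the physical path itself, while a caller of `HighLevelClient` needs only the logical name.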
35System Manager API
- srbChkMdasAuth(conn, userName, userAuth, domain)
- srbChkMdasSysAuth(conn, userName, userAuth, domain)
- srbRegisterUser(conn, userName, domain, password, userType, userAddress, userPhone, userEmail)
- srbRegisterUserGrp(conn, userGrpName, userGrpPassword, userGrpType, userGrpAddress, userGrpPhone, userGrpEmail)
36srbBrowser - A SRB Graphical Interface
- A Java GUI
- Interfaces with SRB servers using the client API library.
- Performs most SRB operations - cp, replicate, import, export, metadata query, etc.
[Diagram labels: USER; Windows or Java GUI (obtains the user's metadata information via SRB, invokes SRB operations); SRB Agent; MCAT; Proxy operation]
37SRB Command Line Interface
[Diagram labels: USER; Environment File; SRB shell commands (Sls, Scp, Scat, Sput, Sget, ...); SRB Agent; MCAT; Proxy operation]
38Scommands
- Sinit - initialize S-environment
- Sexit - clean up
- Sman - get manpage for Scommand
- Scat - display srbObject on screen
- Sput - copy local file into srbSpace
- Sget - copy srbObject to local space
- Sappend - append to srbObject
- Srename - change srbObject name
- Srm - remove srbObject
- Schmod - change/grant access to srbObject
- Scd - change collection
- Spwd - display current collection
- Sls - list collection
- Smkdir - make new collection
- Srmdir - remove old collection
- SgetD - get srbObject information
- SgetR - get resource information
- SgetU - get user information
- SmodD - modify srbObject info
- SmodU - modify user info
- Stoken - get native type information
- Scopy - copy srbObject in another collection and under another name
- Sreplicate - clone object in new resource - same internal id
- Smove - move srbObject to new collection or resource
39Scommands (contd )
- ingestUser - adding a new user or group
- ingestResource - adding a new resource
- ingestLogicalResource - making a new resource grouping
- addLogicalResource - adding to a resource grouping
- ingestLocation - adding new location information
- ingestToken - adding new native types (e.g. resourceType, objectType, userType, domainName, ActionType, ...)
40Scommands
- Sls
  - Sls -h -L number -Y number -r-f collection ...
  - Sls -L number -Y number srbObj
- Sput
  - Sput -p -D dataType -R resourceName -P pathName localFileName ... TargetName
  - Sput -p -D dataType -R resourceName -P pathName -i TargetName
- Sget
  - Sget -C_n -p srbObj ... localFile
- Sreplicate
  - Sreplicate -Cn -p -R resourceName -P pathName srbObj ...
41SRBIO
- open
- creat
- read
- write
- close
- lseek
- fopen
- fread
- fwrite
- fclose
- fseek
- fflush
- fgetc
- fgets
- fputc
- fputs
- getc
- putc
- ungetc
- rewind
- vfprintf
- fprintf
- fscanf
43Useful features
- Easy interfaces to access data held in SRB
- Transparent access independent of location or type
- Support for replication of data
- Support for logical structuring of data
- Database support to locate data
- Ticket system
- Enhanced access right structure
- Modular SRB Media Drivers
- Useful to users and system administrators
45Current Obstacles
- Only one MCAT catalogue - single point of failure, performance, ownership
- All MCAT metadata is visible to everyone
- Data Access at remote sites - too many interim steps
- Documentation not up-to-date
- Installation not straightforward - patches needed, dependent on other software
- Licence required
47Grid Applications within CLRC
- Various Portals to access experimental, data and computing facilities within CLRC and outside.
- Issues
  - Data held widely distributed across the site and in community owned facilities
  - Data required where it is not stored
  - Data located through service that is not local to data holding
48Planned Structure of CLRC - Services
[Diagram labels: Problem Solving Environments, CLRC Authentication, Computing Applications, Experimental Facilities, HPCPortal, Remote systems, Local systems]
49Integrated Solution for Earth Science
[Diagram labels: Data Storage, DataPortal, RasDaMan, Disk, Tape, BADC Catalogue, SRB, HPCPortal]
50General CLRC DataPortal Architecture
[Diagram labels: CLRC DataPortal Server, XML wrapper (twice), Local metadata, Common metadata catalogue database, Local data, Facility 1]
51Server Architecture
[Diagram labels: USER, User input interpreter, User output generator, http, pre-set XSL Script, Query Generator module, Response Generator, XML Schema, XML Parser, External agent, Central metadata repository, Wrapper for other Catalogues, XML File (twice), Ascii file; key entry: Internal]
52Architecture for integrating existing Catalogues
[Diagram labels: DataPortal Server, Request file(s), XML Wrapper, Response Generator, SQL input translator, XML output generator, Local Metadata Catalogue, RasDaMan, SRB, External agent]
53Local Integration of SRB
54Remote Integration of SRB
[Diagram labels: User, Locating Data, DataPortal, HPCPortal, Job submission, RasDaMan, BADC, CSAR, EPCC, two SRB Servers each with an MCAT, six SRB Agents, Data location, Data itself]
55Conclusions
- SRB is a useful tool in the GRID context
- It has many plus points
- But there is still a lot to do
- There is nothing comparable out there!
56Where can you get more information?
- For an SRB license send mail to K.kleese_at_dl.ac.uk
- For general information see the UK Grid Support Centre: http://www.grid-support.ac.uk/
- For specific questions register with the Centre: http://www.grid-support.ac.uk/form.html
- For information on e-science research within CLRC see the CLRC e-Science Centre: http://www.e-science.clrc.ac.uk/