Title: SRB Tutorial NPACI All Hands Meeting 1999
WWW
- Exchange of information
- specifically text, images and multimedia
- hyperlinks to navigate through documents
- search engines for indexing
- Not friendly for exchange of meta-information - yet
- Not easy to integrate with computation
DATA
- Data - any body of information that can be used for computation and communication
- Scientific Data
- data from experiments
- images (scans)
- genetic strings (DNA)
- simulation
METADATA
- Metadata - data that qualifies data
- information that captures the semantics of data
- date of experiment, reactants used, result obtained
- useful for communication and computation
- Example
- Titanic (title)
- Leonardo DiCaprio (lead actor)
- James Cameron (director)
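Metadata of this kind is often stored as attribute-value pairs attached to a data object. The short C sketch below illustrates the idea with the film example; the MetaPair type, the meta_lookup function and the attribute names are illustrative assumptions, not part of SRB or MCAT.

```c
#include <stddef.h>
#include <string.h>

/* One attribute-value pair qualifying a data object (hypothetical layout). */
typedef struct {
    const char *attribute;
    const char *value;
} MetaPair;

/* Return the value recorded for attribute, or NULL if the object has none. */
const char *meta_lookup(const MetaPair *meta, size_t n, const char *attribute)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(meta[i].attribute, attribute) == 0) {
            return meta[i].value;
        }
    }
    return NULL;
}
```

For instance, with the pairs title = Titanic, lead actor = Leonardo DiCaprio and director = James Cameron, looking up "director" returns "James Cameron", while an attribute that was never recorded yields NULL.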
Data Handling Systems
- Require knowledge of the file name
- Distributed file systems
- Persistent object environments
- Require a special interface for data access
- Database systems
- Local solution with well-known file name
- Data migration systems
Data-Intensive Computing
- Support new modes of science
- Enable analysis of very large data sets
- Improve the ability to conduct science
- Build discipline-specific data collections
- Build tools that decrease the time needed to transfer information
- Automate information discovery
- Enable Information Based Computing
Information Based Computing
- Enable information discovery from scientific applications
- Metadata Catalogs
- Enable data management and access to heterogeneous, distributed data sources
- Storage Resource Brokers
- Provide scalable systems, terabyte data access from petabyte archives
- Parallel I/O
Common Middleware
- Distributed computing environment for remote execution of procedures
- Distributed data handling environment for access to archives, databases and file systems
- Inter-realm authentication system
- Distributed information discovery
- Collaboration environment
Evolution of the Data Handling Environment
- Tightly coupled database / archival storage
- Metadata catalog implemented to identify resources
- Separation of data identification from dataset access
- Separation of services from repositories to improve interoperability
- Integration with digital library technology
What is SRB?
- SRB is an Intelligent Data Access System
- SRB provides federated access to datasets
- SRB provides protocol transparency to diverse and distributed storage systems
- SRB provides location transparency to distributed datasets
- SRB provides access transparency to remote users
What is SRB? (contd.)
- Extends File Systems
- Extends Database Systems
- Extends I/O protocol
- Extends WWW
- Extends Digital Library Systems
SRB Concepts (1)
- Provide Scalability
- Hosts
- Resource Types
- Resources
- Collections
- Data Objects - size and number
- Users and Groups
- Methods
- MetaData
SRB Concepts (2)
- Provide Logical Abstractions
- srbSpace - an abstract storage space
- Resource Types - resource defined by properties
- Resources - resource identified by name and type
- multiple resources tied together as a single resource
- Collections - abstraction over directory structure
- distributed and curated
- Datasets - identified by properties
- Users - authenticated across hosts/networks
- Domain - abstraction over physical domains
- Metadata Schema/Attributes
SRB Concepts (3)
- Provide Uniform Interfaces
- Uniform API to Resources - archival, file and DB
- Uniform API to federated Resources
- Uniform Access to Collections and Datasets
- Uniform Authentication across SRB space
SRB Concepts (4)
- Replication of Datasets
- Access Control Lists
- Ticket-based Access
- Auditing
- Authentication and Encryption (SEA)
- Server-side proxy Operations
- Metadata-based Discovery
- Rich Interface - programmatic and interactive
What is MCAT?
- Cataloging System
- Metadata Repository
- Digital Object Metadata
- type, format, lineage, usage methods, domain-specific attributes, collection info, etc.
- System-level Metadata
- access control, audit trails, location, replication, resource types, user groups, etc.
- Schema-level Metadata
- ontology, relationships among attributes/schemas, semantics of attributes, etc.
- Uniform Access and Federation interface
The Storage Resource Broker is Middleware
Diagram: Application (SRB client) -> SRB Server, with MCAT -> storage systems: DB2, Oracle, Illustra, ObjectStore; HPSS, UniTree; UNIX, ftp
Software Architecture of the SRB
Diagram labels: SRB Client; SRB with FILE, DB and Object SIDs; MCAT with Resource, User, Dublin Core and Application Metadata; storage systems DB2, Oracle, Unix, ADSM and HPSS
The SRB Process Model
Diagram: Application (host, port) -> SRB Master (port) -> SRB agents -> MCAT
Federated SRB Operation
Diagram: an application, two SRB servers with their SRB agents, and MCAT, with the request path numbered in steps 1-6
SRB Space
Diagram: many interconnected SRB servers linking data repositories, digital libraries, metadata catalogs and clients. Legend: DR - Data Repository; DL - Digital Library; MC - Meta Catalog; CP - Comp Process / SRB Client
SRB V1.0 Features
- Multi-platform (clients and servers)
- SunOS/Solaris, AIX, Cray C90, DEC OSF
- API and command line interfaces
- Low-level and high-level APIs
- Storage systems supported
- DB2, Illustra, Unitree, HPSS, UNIX
- Support for federated servers
- Released in early September 1997
SRB V1.1 Features
- In beta in DOCT; to be released in January 1998
- Ported to additional platforms - SGI, Cray T3E
- Incorporates the SDSC Encryption and Authentication (SEA) Library
- Ticket-based access control
- Graphical user interface - SRBTool
- Additional storage systems supported
- Oracle, ObjectStore, ftp, http
- Oracle-based MCAT
- Support for proxy operations, e.g. move, copy, replicate
- Data replication using Logical Storage Resource
New SRB Features
- Java-based SRB browser
- C API
- SRBIO - C library for redirecting stdio
- Proxy functions for metadata extraction
- System Monitor for remote auto-startup
- System Parameters stored in MCAT
MCAT Metadata Catalog
- Stores metadata about
- Users, Data sets, Resources, Methods
- Provides collection abstraction
- Stores detailed access control information
- Maintains audit trail information on data sets
- Implemented as a relational database with referential integrity constraints (currently uses DB2, ported to Oracle)
MCAT Architecture
Diagram components: MCAT Interface Functions; MAPS to Schema Convertor; Schema to MAPS Convertor; MAPS Initialization; MAPS Semantics; Answer Extractor and Cursor Control; Dynamic Query Generator; Schema Initialization; Schema Semantics; Oracle Query System; DB2 Query System
Federated Catalog Architecture
Diagram labels: MAPS; MCAT with Semantics Definitions, Local Routines and Internal Catalogs; External CATALOG Interface; MAPS Interface; catalogs CAT-1 and CAT-2, each with a Local Interface, Semantics Definitions and Local Routines
New MCAT Features
- Meta-schema to hold system and user metadata schema information
- Extensible metadata schema
- Distributed metadata schema
- Metadata Exchange Interface Protocol
- MAPS - Metadata Attribute Presentation Structure
- query, update and result structures
- Close to Z39.50
New MCAT Features (contd.)
- Core Schema Implemented
- MCAT Core - Data, Resources, Users and Methods
- Dublin Core
- IV Core - Image Visualization attributes
- Web-based Prototype User Interface
- extensible schema functions
- query, insert and update of metadata
- integrated presentation of metadata and data
SRB Data Replication Support
- Replication via Resource Set definition
- Replication support integrated into the write function
- srbObjReplicate API can be used for post facto replication
- Synchronous replication across all sites; can choose any k out of n
- Can choose a specific replica on a read operation
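The slide leaves the exact meaning of "any k out of n" open. Assuming it means that a synchronous write to a logical resource set of n physical resources counts as successful once at least k replicas are written, the policy can be sketched in C as follows. replicate_k_of_n, the replica_write_fn callback and the demo_write stub (which simulates resource 1 being unavailable) are all hypothetical and not part of the SRB API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-resource write routine: returns true if resource i stored buf. */
typedef bool (*replica_write_fn)(int resource_index, const char *buf, size_t len);

/* Synchronously write buf to each of the n physical resources in a logical
 * resource set. Returns true if at least k of the n writes succeeded; the
 * number of successful writes is stored in *succeeded when it is non-NULL. */
bool replicate_k_of_n(replica_write_fn write_one, int n, int k,
                      const char *buf, size_t len, int *succeeded)
{
    int ok = 0;
    for (int i = 0; i < n; i++) {
        if (write_one(i, buf, len)) {
            ok++;
        }
    }
    if (succeeded != NULL) {
        *succeeded = ok;
    }
    return ok >= k;
}

/* Simulated driver for demonstration: resource 1 is unavailable. */
static bool demo_write(int resource_index, const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return resource_index != 1;
}
```

With three resources and one of them down, a 2-of-3 policy still succeeds while a 3-of-3 policy reports failure; a real implementation would also have to record in the catalog which replicas are current.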
Data Replication (DOCT)
Diagram labels: Application (SAIC); MCAT; SRB servers at SDSC, Caltech and NCSA; logical resources LogRsrc1 and LogRsrc2; storage: HPSS (two), Oracle, DB2, Unix
SEA (SDSC Encryption and Authentication)
- Developed as part of DOCT
- Designed for Supercomputing/MetaComputing Environments
- Based on RSA Public/Private Keys and the RC5 Encryption Algorithm
- Integrated into SRB
- Being integrated into pftp and hsi for Remote HPSS Access
SEA Features
- Secure User/Process Authentication Across Network (TCP Sockets)
- Optional Encryption As Independent Function
- Simple API
- Batch Support - Long-term Certificates
- Adjustable Key Lengths (Speed/Security Tradeoff)
- User-Adjustable Encryption Levels (Speed/Security Tradeoff)
- Multiple Initial User Registration Methods (Set By Administrator)
- Self-Introduction
- Trusted Host
- Password
- Available for Cray T90, C90, T3E, SunOS, Solaris, IRIX, OSF1, AIX, CS6400, NextStep
- More Information: http://www.sdsc.edu/schroede/sea.html
Ticket-based Access Control
- Owner can request a ticket for a data set
- Ticket can be issued for a data set or a collection
- Ticket controls access by
- time period (start and expiry timestamps)
- number of accesses (count)
- user names (any, single or group users)
- Non-registered users can also access data using tickets
- Useful for sharing data and for access through the web
- Tickets generated and stored in MCAT
- Currently supports read-only tickets
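The checks a ticket performs can be sketched in C as below. The Ticket layout and ticket_use are hypothetical (MCAT stores tickets in its own schema), a group ticket is modelled simply as an explicit list of user names, and because only read-only tickets are supported the function only decides whether one read may proceed, consuming one use when it does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define TICKET_MAX_USERS 4

/* Hypothetical ticket record, not MCAT's actual schema. */
typedef struct {
    time_t start;                         /* access allowed from this time ... */
    time_t expire;                        /* ... up to and including this time */
    int remaining;                        /* accesses left (count) */
    const char *users[TICKET_MAX_USERS];  /* allowed users; users[0] == NULL means any user */
} Ticket;

/* Decide whether user may read at time now; on success one access is consumed. */
bool ticket_use(Ticket *t, const char *user, time_t now)
{
    if (now < t->start || now > t->expire) {
        return false;                     /* outside the time period */
    }
    if (t->remaining <= 0) {
        return false;                     /* access count used up */
    }
    if (t->users[0] != NULL) {            /* restricted to named users */
        bool listed = false;
        for (size_t i = 0; i < TICKET_MAX_USERS && t->users[i] != NULL; i++) {
            if (strcmp(t->users[i], user) == 0) {
                listed = true;
                break;
            }
        }
        if (!listed) {
            return false;
        }
    }
    t->remaining--;
    return true;
}
```

A rejected request does not consume an access, so a non-listed user cannot exhaust someone else's ticket.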
SRB API
- Programmatic API
- High-level API
- Low-level API
- SRB Manager API
- Command Level Interface - Scommands
- Graphical User Interface - srbBrowser
- Web Utilities
SRB API Interface
Diagram: Application, SRB Master, MCAT
High- and Low-level API
- Low-level API
- talks to resource drivers
- no registration of data sets in MCAT
- no authentication through MCAT
- User provides all information
- High-level API
- Uses low-level API to access resources
- Registers data management information in MCAT
- Uses MCAT for authentication and meta information
- Uses MCAT for resource and data discovery
- Access/store data in remote SRB
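The division of labour described here can be pictured as a lookup: the high-level layer finds a logical object (collection plus name) in the catalog to obtain the storage type, host and physical path, which are exactly the arguments a low-level call such as srbFileOpen expects the user to supply. The sketch below uses a toy in-memory table in place of MCAT; the CatalogEntry type, the resolve_object function and the table entries are hypothetical (only the collection and host names are borrowed from the tutorial's environment-file example).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog record: where a registered object physically lives. */
typedef struct {
    const char *collection;   /* logical collection */
    const char *object;       /* logical object name */
    const char *stor_type;    /* storage system type */
    const char *host;         /* host of the storage system */
    const char *path;         /* physical path on that system */
} CatalogEntry;

/* Toy stand-in for MCAT; entries are invented for illustration. */
static const CatalogEntry catalog[] = {
    { "/home/u26072.test", "notes.txt", "unix", "ghidorah.sdsc.edu",   "/data/srb/notes.txt" },
    { "/home/u26072.test", "scan.img",  "hpss", "archive.example.org", "/archive/scan.img" },
};

/* High-level step: map (collection, object) to the storType/host/path triple a
 * low-level call needs. Returns NULL if the object is not registered. */
const CatalogEntry *resolve_object(const char *collection, const char *object)
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++) {
        if (strcmp(catalog[i].collection, collection) == 0 &&
            strcmp(catalog[i].object, object) == 0) {
            return &catalog[i];
        }
    }
    return NULL;
}
```

Keeping this mapping in the catalog is what lets an object move between resources without the application changing the name it uses.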
Low-level API
- srbFileOpen(conn, storType, host, fileName, mode)
- srbFileCreate(conn, storType, host, fileName, mode)
- srbFileClose(conn, fd)
- srbFileUnlink(conn, storType, host, fileName)
- srbFileRead(conn, fd, buffer, length)
- srbFileWrite(conn, fd, buffer, length)
- srbFileSeek(conn, fd, offset, whence)
- srbFileSync(conn, fd)
- srbFileStat(conn, storType, host, fileName, statBuf)
- srbFileMkdir(conn, storType, host, dirName, mode)
- srbFileRmdir(conn, storType, host, dirName, mode)
- srbFileChmod(conn, storType, host, fileName, mode)
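Because every low-level call names a storType, the server has to route it to the matching storage driver. The tutorial does not say how this is done; a table of function pointers keyed by storage type is one plausible design, sketched below. The Driver type, dispatch_file_open and the stub unix/hpss drivers are hypothetical; a real driver would open the file on the storage system instead of returning a fixed descriptor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver entry point, loosely following srbFileOpen's arguments
 * minus the connection; returns a descriptor or -1. */
typedef int (*driver_open_fn)(const char *host, const char *file_name, int mode);

/* Stub drivers standing in for real storage-system code. */
static int unix_open(const char *host, const char *file_name, int mode)
{
    (void)host; (void)file_name; (void)mode;
    return 1;   /* pretend descriptor from the Unix driver */
}

static int hpss_open(const char *host, const char *file_name, int mode)
{
    (void)host; (void)file_name; (void)mode;
    return 2;   /* pretend descriptor from the HPSS driver */
}

typedef struct {
    const char *stor_type;
    driver_open_fn open_file;
} Driver;

static const Driver drivers[] = {
    { "unix", unix_open },
    { "hpss", hpss_open },
};

/* Select the driver registered for stor_type and forward the call to it. */
int dispatch_file_open(const char *stor_type, const char *host,
                       const char *file_name, int mode)
{
    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        if (strcmp(drivers[i].stor_type, stor_type) == 0) {
            return drivers[i].open_file(host, file_name, mode);
        }
    }
    return -1;   /* no driver for this storage type */
}
```

Adding a storage system then means adding one table row rather than changing every API entry point.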
Low-level API (contd.)
- srbDbLobjOpen(conn, storType, resourceLoc, positionName, mode)
- srbDbLobjCreate(conn, storType, resourceLoc, positionName, mode)
- srbDbLobjClose(conn, dd)
- srbDbLobjUnlink(conn, storType, host, fileName)
- srbDbLobjRead(conn, dd, buffer, length)
- srbDbLobjWrite(conn, dd, buffer, length)
- srbDbLobjSeek(conn, dd, offset, whence)
High-level API
- srbObjOpen(conn, objChar, mode, collectionName)
- srbObjCreate(conn, objName, objType, resourceName, collectionName, pathName, size)
- srbObjClose(conn, od)
- srbObjUnlink(conn, objChar, collectionName)
- srbObjRead(conn, od, buffer, length)
- srbObjWrite(conn, od, buffer, length)
- srbObjSeek(conn, od, offset, whence)
- srbObjMove(conn, objChar, collectionName, newResourceName, newPathName)
- srbObjReplicate(conn, objChar, collectionName, newResourceName, newPathName)
- srbObjProxyOpr(conn, Operation, sourceDesc, targetDesc)
High-level API (contd.)
- srbGetDatasetInfo(conn, objChar, collectionName, resultStruct, requiredNumber)
- srbGetMoreInfo(resDesc, resultStruct, requiredNumber)
- srbGetDataDirInfo(conn, conditionList, selectList, resultStruct)
- srbModifyDataset(conn, objId, collectionName, newValue1, newValue2, modifyType, resourceName, pathName)
- srbCreateCollect(conn, parentCollectionName, childCollectionName)
- srbListCollect(conn, CollectionName, flag, resultStruct)
- srbModifyCollect(conn, CollectionName, newValue1, newValue2, newValue3, modifyType)
- srbModifyUser(conn, newValue1, newValue2, modifyType)
- srbSetAuditTrail(conn, setValue)
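srbGetDataDirInfo takes a condition list and a select list, which suggests a catalog query of the form "return these attributes for the datasets matching those conditions". The sketch below mimics that idea over an in-memory array. Every condition is assumed to be an attribute-value equality that must hold (the API's actual condition syntax is not shown in this tutorial), and the select list is fixed to the dataset name for brevity. Dataset, Condition, dataset_attr and query_datasets are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical dataset record with a few catalog attributes. */
typedef struct {
    const char *name;
    const char *collection;
    const char *data_type;
    const char *resource;
} Dataset;

/* Hypothetical equality condition on one named attribute. */
typedef struct {
    const char *attribute;   /* "collection", "data_type" or "resource" */
    const char *value;
} Condition;

/* Look up an attribute by name; unknown attributes never match. */
static const char *dataset_attr(const Dataset *d, const char *attribute)
{
    if (strcmp(attribute, "collection") == 0) return d->collection;
    if (strcmp(attribute, "data_type") == 0)  return d->data_type;
    if (strcmp(attribute, "resource") == 0)   return d->resource;
    return NULL;
}

/* Store the names of datasets satisfying all conditions in out (at most max).
 * Returns the number of names written. */
size_t query_datasets(const Dataset *ds, size_t n,
                      const Condition *conds, size_t nconds,
                      const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++) {
        int ok = 1;
        for (size_t j = 0; j < nconds && ok; j++) {
            const char *v = dataset_attr(&ds[i], conds[j].attribute);
            ok = (v != NULL && strcmp(v, conds[j].value) == 0);
        }
        if (ok) {
            out[found++] = ds[i].name;
        }
    }
    return found;
}
```

Returning only matching names keeps the sketch short; a fuller version would copy whichever attributes the select list requests into a result structure.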
System Manager API
- srbChkMdasAuth(conn, userName, userAuth, domain)
- srbChkMdasSysAuth(conn, userName, userAuth, domain)
- srbRegisterUser(conn, userName, domain, password, userType, userAddress, userPhone, userEmail)
- srbRegisterUserGrp(conn, userGrpName, userGrpPassword, userGrpType, userGrpAddress, userGrpPhone, userGrpEmail)
srbBrowser - An SRB Graphical Interface
- A Java GUI
- Interfaces with SRB servers using the client API library
- Performs most SRB operations - cp, replicate, import, export, metadata query, etc.
Diagram: the user works in the Java GUI, which obtains the user's metadata information via SRB and invokes SRB operations; an SRB agent consults MCAT and performs proxy operations
SRB Command Line Interface
Diagram: the user runs SRB shell commands (Sls, Scp, Scat, Sput, Sget, ...) configured by an environment file; an SRB agent serves them, consults MCAT and performs proxy operations
Scommands
- Sinit - initialize S-environment
- Sexit - clean up
- Sman - get manpage for Scommand
- Scat - display srbObject on screen
- Sput - copy local file into srbSpace
- Sget - copy srbObject to local space
- Sappend - append to srbObject
- Srename - change srbObject name
- Srm - remove srbObject
- Schmod - change/grant access to srbObject
- Scd - change collection
- Spwd - display current collection
- Sls - list collection
- Smkdir - make new collection
- Srmdir - remove old collection
- SgetD - get srbObject information
- SgetR - get resource information
- SgetU - get user information
- SmodD - modify srbObject info
- SmodU - modify user info
- Stoken - get native type information
- Scopy - copy srbObject into another collection and under another name
- Sreplicate - clone object in new resource - same internal id
- Smove - move srbObject to new collection or resource
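Scd, Spwd, Smkdir and Srmdir treat collections much like Unix directories. Assuming collection names are slash-separated paths with Unix-style "." and ".." segments (the tutorial does not spell this out), resolving an Scd argument against the current collection could look like the sketch below; resolve_collection is a hypothetical helper, not part of the Scommands.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Resolve arg against the current collection cwd and write a normalized,
 * absolute collection name into out. "." segments are ignored and ".."
 * drops the previous segment. Returns 0 on success, -1 if out is too small. */
int resolve_collection(const char *cwd, const char *arg, char *out, size_t outsz)
{
    char joined[1024];
    int n = (arg[0] == '/') ? snprintf(joined, sizeof joined, "%s", arg)
                            : snprintf(joined, sizeof joined, "%s/%s", cwd, arg);
    if (n < 0 || (size_t)n >= sizeof joined || outsz < 2) {
        return -1;
    }

    size_t len = 0;                        /* length of the path built so far */
    const char *p = joined;
    while (*p) {
        while (*p == '/') p++;             /* skip separators */
        const char *start = p;
        while (*p && *p != '/') p++;       /* find the end of this segment */
        size_t seglen = (size_t)(p - start);
        if (seglen == 0 || (seglen == 1 && start[0] == '.')) {
            continue;
        }
        if (seglen == 2 && start[0] == '.' && start[1] == '.') {
            while (len > 0 && out[len - 1] != '/') len--;   /* drop last segment */
            if (len > 0) len--;                             /* and its slash */
            continue;
        }
        if (len + 1 + seglen + 1 > outsz) {
            return -1;
        }
        out[len++] = '/';
        memcpy(out + len, start, seglen);
        len += seglen;
    }
    if (len == 0) {
        out[len++] = '/';                  /* the root collection */
    }
    out[len] = '\0';
    return 0;
}
```

For example, from the home collection /home/u26072.test, the argument ../other resolves to /home/other.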
Scommands (contd.)
- ingestUser - adding a new user or group
- ingestResource - adding a new resource
- ingestLogicalResource - making a new resource grouping
- addLogicalResource - adding to a resource grouping
- ingestLocation - adding new location information
- ingestToken - adding new native types (e.g. resourceType, objectType, userType, domainName, ActionType, ...)
Scommands
- Sls
- Sls -h -L number -Y number -r -f collection ...
- Sls -L number -Y number srbObj
- Sput
- Sput -p -D dataType -R resourceName -P pathName localFileName ... TargetName
- Sput -p -D dataType -R resourceName -P pathName -i TargetName
- Sget
- Sget -C_n -p srbObj ... localFile
- Sreplicate
- Sreplicate -Cn -p -R resourceName -P pathName srbObj ...
SRBIO
- open
- creat
- read
- write
- close
- lseek
- fopen
- fread
- fwrite
- fclose
- fseek
- fflush
- fgetc
- fgets
- fputc
- fputs
- getc
- putc
- ungetc
- rewind
- vfprintf
- fprintf
- fscanf
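SRBIO is described earlier as a C library for redirecting stdio, covering calls like those listed. One way such redirection can work is a wrapper with the same signature as fopen that inspects the path and routes it either to the local C library or to SRB. In this sketch the "srb:" prefix, srbio_fopen and the stub backend are all hypothetical; the tutorial does not give SRBIO's actual naming convention or internals.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix marking a logical SRB name. */
#define SRB_PREFIX "srb:"

/* Returns nonzero if path names an SRB object rather than a local file. */
int is_srb_path(const char *path)
{
    return strncmp(path, SRB_PREFIX, strlen(SRB_PREFIX)) == 0;
}

/* Stub for an SRB-backed open; a real layer would call the SRB high-level
 * API here and wrap the object in its own stream type. Always fails here. */
static FILE *srb_backend_fopen(const char *srb_name, const char *mode)
{
    (void)srb_name;
    (void)mode;
    return NULL;
}

/* Drop-in replacement for fopen: SRB names go to the backend, others to stdio. */
FILE *srbio_fopen(const char *path, const char *mode)
{
    if (is_srb_path(path)) {
        return srb_backend_fopen(path + strlen(SRB_PREFIX), mode);
    }
    return fopen(path, mode);
}
```

The appeal of this design is that an existing program only has to swap its I/O calls for the wrappers; local files keep working unchanged.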
Web Utilities
- Sgetw - copies an SRBobject to the server site
- Sputw - copies a local file into SRBspace
- Scatw - displays an SRBobject in the browser (handles types)
- Slsw - displays information about SRBobjects
SRB Case Studies
- Digital Libraries
- ELIB - Berkeley Digital Library (UCB)
- ADL - Alexandria Digital Library (UCSB)
- Ecological Archives Data Repository (UCSD)
- Environmental Archives
- International Satellite Cloud Climatology Project Data
- TIES Data Atlas - Chesapeake Bay Estuarine Studies
- DOCT - Patent Workflow System
Digital Libraries
- Access to images, documents and tools
- Large number of files
- Images of various resolutions
- Documents of various types (valences)
- Web-based access - form and spatial queries
- Domain metadata - external DB
- Digital objects replicated
- Uses SRB web interface and low-level API
Digital Libraries and SRB
Diagram: ADL and ELIB, both connected to SRB
DOCT - Patent Workflow
- Archiving Applications and Office-Actions
- Replicated Archiving
- Storage of Issued Patents in multiple forms - SGML, DB Schema, HTML
- Access to Patents from replicated storage
- Controlled Access for Applications and Office-Actions
- Uses SRB web utilities and high-level API
- URL http://www.sdsc.edu/DOCT
Diagram: DOCT patent workflow connecting the Applicant, Examiner, Mail Room, Work Flow, Public Search and Other Works through SRB clients and SRB servers. Legend: A Secure Electronic Filing; B App As Filed (SRB/SEA); C Replication (SRB/SEA); D Mailroom To Workflow; E User To Workflow; F Search Other; G Workflow Agent FrameW; H WorkF To Arch (SRB/SEA); I Applicant Search Auth Web; J Pub Search Elec Commerce
ESADR
- Data Archive for Ecological Society
- Scientists can publish, update and control access to data sets
- Domain metadata kept in external database (Oracle)
- Web-based upload/download mechanism
- FTP and email support for very large data sets
- Spatial, temporal and bibliographic search
- Uses SRB web utilities and high-level API
- ESADR Homepage: http://esadr.sdsc.edu
ESADR Dataflow
Diagram: Register, Login, Connect, Login and Retrieve, through SRB
TIES
- Distributed Data Atlas for Cruise Transects
- Data collected in Chesapeake Bay Estuarine Studies
- 26 transects, 3 times/year, 6 variables
- Reference atlas of 2-D color plots
- Domain metadata stored in Oracle
- Each object registered in MCAT
- 3 partner sites - replication and staging in hierarchical storage (Unix and HPSS)
- Uses Scommands and SRB web utilities
- URL http://www.sdsc.edu/marciano/DOCT/Atlas/doct.html
Global Clouds Database
- Stores cloud information for the whole world
- Tabular data - made available online through SRB
- 6,596 grid cells over the globe
- 200 variables per grid cell
- Data collected every 3 hours over 4 years (1989-92)
- Small metadata - stored in a flat file
- Each cell dataset (4 years of data) is stored in SRB (HPSS)
- Uses Scommands and SRB web utilities
- URL http://www.sdsc.edu/marciano/clouds/clouds.htm
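For a rough sense of scale, the figures above imply the following count of stored values, assuming every one of the 200 variables is recorded at every 3-hourly step from the start of 1989 to the end of 1992 (1992 being a leap year). The storage size in bytes depends on the encoding, which the slide does not give.

```latex
\begin{aligned}
\text{time steps} &= 8\ \tfrac{\text{steps}}{\text{day}} \times (3 \times 365 + 366)\ \text{days} = 8 \times 1461 = 11{,}688\\
\text{values per cell} &= 200 \times 11{,}688 = 2{,}337{,}600\\
\text{values in total} &= 6{,}596 \times 2{,}337{,}600 = 15{,}418{,}809{,}600 \approx 1.5 \times 10^{10}
\end{aligned}
```

Under these assumptions each per-cell SRB object holds on the order of 2.3 million values.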
Summary
- Storing, Publishing, Sharing and Cooperating
- Distributed, Replicated, Heterogeneous Data Cache
- Unifies access to Archival Storage, Database Storage and Disk Storage
- Information Discovery (application-level metadata)
- unifying meta-catalogs (future work)
- Secure, encrypted, access-controlled data movement
- integrate with other security systems (future work)
- No new environment requirement
Future Work
- Integration with IBM digital library software (e.g. federation of MCAT and DL metadata catalog)
- Replication and partitioning of MCAT across the wide area
- Integration with NWS (Network Weather Service)
- Schema evolution, i.e. extending the extensibility feature to cover versioning
- Parallel I/O
Future Work (contd.)
- Caching
- Incorporating the concept of data set "resolution" in the system and APIs
- Streaming I/O and access to video and large visualization data sets
- Extending the IV Core concept to extract metadata for other types of data sets
The SRB/MCAT Team
- Design
- Reagan Moore, Chaitan Baru, Richard Frost, Richard Marciano, Arcot Rajasekar, Wayne Schroeder, Michael Wan
- Implementation
- Michael Wan (SRB client/server, many drivers, srbBrowser)
- Arcot Rajasekar (MCAT, DB drivers, Scommands, SRBIO)
- Wayne Schroeder (SEAL, Illustra and ftp drivers)
- Mike Gleicher (HPSS driver)
- Rob Tempelton/Randy Sharpe (Oracle driver) -- NCSA
- Dave Wade (ObjectStore driver) -- SAIC
- Tom Hacker (Error management) -- U. Mich
- Marcio Faerman (NWS integration) -- UCSD
Client Registration
- Get software at http://www.npaci.edu/DICE/SRB/tarfiles/???
- Fill in the form at http://www.npaci.edu/DICE/SRB/install/SRBUserRegister.html
- SRB Admin will respond with your
- authorization password (should be changed immediately using the Spasswd command)
- domain
- home collection
Setting the Client Environs
- Set paths and environment variables
set path = ($path $SRBDIR/utilities/bin)
set path = ($path $SRBDIR/java/bin)
setenv CLASSPATH /sdsc/local/generic/lib/java/swing/swing.jar
setenv THREADS_FLAG native
Setting the Client Environs (contd.)
- Two environment files
- .srb/.MdasEnv
mdasCollectionHome '/home/u26072.test'
mdasDomainHome 'test'
srbUser 'u26072'
srbHost 'ghidorah.sdsc.edu'
defaultResource 'unix-sdsc'
SEA_OPT '0'
- .srb/.MdasAuth
SRBTEST
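Each .MdasEnv line pairs a keyword with a value in single quotes. Assuming exactly that layout (one key, whitespace, then a single-quoted value with no embedded quotes), a client could read the file one line at a time with a parser like the hypothetical parse_env_line below; the real SRB client's parsing rules may be more permissive.

```c
#include <stddef.h>
#include <string.h>

/* Parse one line of the assumed form   key 'value'   into the caller's
 * buffers. Returns 0 on success, -1 if the line does not match the form
 * or a buffer is too small. */
int parse_env_line(const char *line, char *key, size_t keysz,
                   char *val, size_t valsz)
{
    const char *p = line;
    while (*p == ' ' || *p == '\t') p++;          /* leading whitespace */
    const char *kstart = p;
    while (*p && *p != ' ' && *p != '\t') p++;    /* keyword */
    size_t klen = (size_t)(p - kstart);
    if (klen == 0 || klen >= keysz) return -1;

    while (*p == ' ' || *p == '\t') p++;
    if (*p != '\'') return -1;                    /* value must be quoted */
    const char *vstart = ++p;
    const char *vend = strchr(vstart, '\'');
    if (vend == NULL) return -1;                  /* missing closing quote */
    size_t vlen = (size_t)(vend - vstart);
    if (vlen >= valsz) return -1;

    memcpy(key, kstart, klen);
    key[klen] = '\0';
    memcpy(val, vstart, vlen);
    val[vlen] = '\0';
    return 0;
}
```

Rejecting lines without a closing quote, rather than guessing, makes a truncated environment file fail loudly instead of silently pointing the client at the wrong host.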
Hands-On Demo
- srbBrowser
- Import a File into SRBspace
- Import a Directory as a Collection
- Display a SRBObject
- Create a Collection
- Remove SRBObjects
- Remove a Collection
- Copy a SRBObject
- Replicate a SRBObject
Hands-On Demo (contd.)
- srbBrowser (contd.)
- Display Replicated Object
- Display Access Permission
- Enable Access for Another User
- Change Access for Another User
- Display Metadata Attributes
Hands-On Demo (contd.)
- Scommands
- Sinit -v
- Sls
- Spwd
- Scat <object-name>
- SgetU <user-name>
- SgetR
- Sman <Scommand>
- Srm <object-name>
- Sexit
- Scd <collection-name>
- SgetD <object-name>
- Sput <file-name> <object-name>