Title: Storage Resource Broker Persistent Management of Distributed Data
1Storage Resource Broker Persistent
Management of Distributed Data
Reagan W. Moore
General Atomics, Inc.
San Diego Supercomputer Center
moore_at_sdsc.edu
http://www.nirvanastorage.com
2Topics
- Data management systems
- Data collections, digital libraries
- Distributed data management
- Data grids
- Persistent data management
- Persistent archives
- Common infrastructure for data management
3Data Collections
- Astronomy
- CACR Computing Resource (NPACI)
- National Virtual Observatory (NSF)
- 2 Micron All Sky Survey (NPACI)
- DPOSS Collection (NSF-NVO)
- Hayden Planetarium
- Ecology and Environmental Sciences
- CEED (NPACI)
- Bionome
- HyperLTER (NPACI)
- Land Data Assimilation System
- Knowledge Networks for BioComplexity (NSF)
- Medical Sciences
- Digital Embryo (NLM)
- Molecular Sciences
- JCSG, Synchrotron Data Repository (NSF)
- AFCS, Alliance for Cell Signaling (NIH)
- NeuroSciences
- Biomedical Information Research Network (NIH)
4Data Collections
- Physics and Chemistry
- PPDG, Particle Physics Data Grid (DOE)
- GriPhyN (NSF)
- BaBar (DOE)
- GAMESS (NPACI)
- Digital Libraries and Archives
- SIO Digital Libraries (NSF)
- California Digital Library
- ADEPT (NSF)
- Stanford Digital Library Project (NSF)
- National Archives and Records Administration (NARA)
- Data Grids
- ROADNet, Real-time Observatories, Applications, and Data management
- E-Science at CLRC, UK Grid Starter Kit (UK)
- Library of Congress data grid
- DOE ASCI Data Visualization Corridor
- NASA Information Power Grid
- DOE SciDAC
- Portal Web Services
- NPACI Portal Projects
5Data Collections
- Define the context for describing a collection of digital entities
- Context specified by metadata attributes
- Provenance, origin of the digital entities
- Administrative, location of the digital entities
- Technical, purpose of the digital entities
- Support organization of attributes as hierarchy
of sub-collections
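The idea of a collection as a context of metadata attributes, organized as a hierarchy of sub-collections, can be sketched as follows. This is a minimal illustration, not the SRB or MCAT schema: the class name `Collection`, its fields, and all attribute values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A collection whose context is a set of metadata attributes."""
    name: str
    # The three attribute groups come from the slide; the keys are hypothetical.
    provenance: dict = field(default_factory=dict)      # origin of the entities
    administrative: dict = field(default_factory=dict)  # location of the entities
    technical: dict = field(default_factory=dict)       # purpose of the entities
    sub_collections: list = field(default_factory=list)

    def add(self, child):
        """Attach a sub-collection and return it so calls can be chained."""
        self.sub_collections.append(child)
        return child

    def paths(self, prefix=""):
        """List this collection and every sub-collection as a path."""
        here = f"{prefix}/{self.name}"
        found = [here]
        for child in self.sub_collections:
            found.extend(child.paths(here))
        return found


root = Collection("astronomy", provenance={"origin": "survey telescope"})
survey = root.add(Collection("sky-survey", administrative={"location": "archive-1"}))
survey.add(Collection("images", technical={"purpose": "calibration"}))
print(root.paths())
```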
6Digital Libraries
- Provide services on the data collection
- Ingestion, loading of attribute values
- Extensibility, definition of new attributes
- Discovery, queries on attributes
- Browsing, hierarchical listing
- Presentation, formatting specified data models
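The services listed above (ingestion, extensibility, discovery, browsing) can be illustrated with a toy in-memory catalog. This is only a sketch under stated assumptions: `Catalog` and its methods are hypothetical names, and a real digital library would back them with a database.

```python
class Catalog:
    """Toy attribute catalog; an illustration, not the MCAT schema."""

    def __init__(self, attributes):
        self.attributes = set(attributes)  # attributes the collection defines
        self.records = {}                  # entity name -> {attribute: value}

    def define_attribute(self, name):
        """Extensibility: add a new attribute to the collection."""
        self.attributes.add(name)

    def ingest(self, entity, **values):
        """Ingestion: load attribute values for one digital entity."""
        unknown = set(values) - self.attributes
        if unknown:
            raise KeyError(f"undefined attributes: {sorted(unknown)}")
        self.records.setdefault(entity, {}).update(values)

    def discover(self, **criteria):
        """Discovery: entities whose attributes match every criterion."""
        return sorted(
            name for name, attrs in self.records.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        )

    def browse(self, prefix):
        """Browsing: hierarchical listing of entities under a path prefix."""
        return sorted(name for name in self.records if name.startswith(prefix))


catalog = Catalog(["instrument"])
catalog.ingest("/sky/img1", instrument="camera-a")
catalog.define_attribute("band")  # new attribute added after the first ingest
catalog.ingest("/sky/img2", instrument="camera-a", band="J")
catalog.ingest("/land/map1", instrument="radar")
```

Ingesting a value for an attribute that has not been defined raises `KeyError`, which keeps the set of attributes explicit.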
7Data Grids
- Manage data in a distributed environment
- Logical name space, provide global identifier
- Data access, storage system abstraction
- Replication, disaster back up
- Uniform access, common API across file systems, archives, and databases
- Single sign-on, authenticate across administration domains
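How a logical name space and replication combine for disaster backup can be sketched as below. The classes `Store` and `DataGrid`, the store names, and the logical path are all hypothetical; dictionaries stand in for real storage systems.

```python
class Store:
    """Stand-in for one storage system (file system, archive, or database)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blobs = {}  # physical path -> bytes

    def put(self, path, data):
        self.blobs[path] = data

    def get(self, path):
        if not self.online:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.blobs[path]


class DataGrid:
    def __init__(self):
        self.replicas = {}  # logical name -> [(store, physical path)]

    def write(self, logical_name, data, stores):
        """Store one copy per storage system under a single global identifier."""
        for store in stores:
            path = f"/{store.name}{logical_name}"
            store.put(path, data)
            self.replicas.setdefault(logical_name, []).append((store, path))

    def read(self, logical_name):
        """Return the first replica that can be reached."""
        for store, path in self.replicas[logical_name]:
            try:
                return store.get(path)
            except ConnectionError:
                continue
        raise ConnectionError(f"no replica of {logical_name} is reachable")


disk, tape = Store("disk"), Store("tape")
grid = DataGrid()
grid.write("/demo/run1.dat", b"payload", [disk, tape])
disk.online = False  # simulate loss of one storage system
recovered = grid.read("/demo/run1.dat")
```

The caller only ever uses the logical name; which replica answers is decided inside `read`.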
8Persistent Archives
- Manage technology evolution
- Storage system abstraction, support data migration across storage systems
- Information repository abstraction, support catalog migration to new databases
- Logical name space, support global persistent identifier
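A minimal sketch of data migration across storage systems while the persistent identifier stays fixed. The system names, paths, and catalog layout are assumptions made for illustration; the checksum check is one possible way to confirm that the bits survived the copy.

```python
import hashlib

# Dictionaries stand in for two storage systems (hypothetical names).
systems = {"old_tape": {"vol7/0001": b"records"}, "new_disk": {}}

# Catalog entry: persistent logical name -> current physical location.
catalog = {
    "/archive/report": {
        "system": "old_tape",
        "path": "vol7/0001",
        "sha256": hashlib.sha256(b"records").hexdigest(),
    }
}


def read(logical_name):
    """Resolve the logical name and fetch the bits from wherever they live."""
    entry = catalog[logical_name]
    return systems[entry["system"]][entry["path"]]


def migrate(logical_name, target, new_path):
    """Copy the bits to a new system, verify them, then update the catalog."""
    entry = catalog[logical_name]
    data = read(logical_name)
    systems[target][new_path] = data
    if hashlib.sha256(systems[target][new_path]).hexdigest() != entry["sha256"]:
        raise IOError("checksum mismatch after copy")
    old_system, old_path = entry["system"], entry["path"]
    entry["system"], entry["path"] = target, new_path
    del systems[old_system][old_path]  # retire the old copy only after verifying


migrate("/archive/report", "new_disk", "2003/report.bin")
```

After the call, `read("/archive/report")` returns the same bytes even though the old storage system no longer holds them.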
9SRB
- Integration of collection-based management of digital entities, with
- Remote data access through storage system abstraction
- Catalog access through information repository abstraction
- Automation through collection-owned data
Storage Resource Broker
10Capabilities
- Support legacy systems
- Integrate archives with file systems
- Share distributed data
- Maintain persistent collection
- Control data access
11Digital Entities
- Digital entities are images of reality made of
- Data, the bits (zeros and ones) put on a storage system
- Information, the attributes used to assign semantic meaning to the data
- Knowledge, the structural relationships described by a data model
- Every digital entity requires information and knowledge to be correctly interpreted and displayed
12Digital Entities
- Files
- Text documents, images, spreadsheets, binary files
- URLs
- Database query commands
- Databases
- Directories
13Digital Entities
- Register digital entities into a catalog
- Assign metadata to describe each digital entity
- Separate management of the associated data bits from management of the metadata
- Support manipulation of each digital entity data type
14Technology Management
[Figure: technology management diagram. Labels: New Application; New Operating System; Wrap Storage System; Wrap Display System; Old Storage System; Old Display System; Migrate Encoding Format; Digital Object]
15Preservation of Data
- Migration
- Preserve the data bits
- Preserve the digital entity name
- Preserve the information and knowledge content
for presentation by new applications
16Migration Advantages
- By migrating the digital entity encoding format to new standards, more sophisticated technologies can be applied to express the information and knowledge content inherent in collections of digital entities.
- Requires the ability to associate a data model with each digital entity
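Why a data model must travel with the digital entity can be seen in a small format migration. The sketch below re-encodes headerless CSV as JSON; the `data_model` layout, the field names, and the sample values are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical data model: the structure needed to interpret the bits.
data_model = {"fields": ["station", "temp_c"], "types": [str, float]}

old_bits = b"alpha,12.5\nbeta,9.0\n"  # legacy encoding: headerless CSV


def migrate_csv_to_json(bits, model):
    """Re-encode headerless CSV as JSON, using the model to name and type fields."""
    rows = csv.reader(io.StringIO(bits.decode("utf-8")))
    records = [
        {
            name: cast(value)
            for name, cast, value in zip(model["fields"], model["types"], row)
        }
        for row in rows
    ]
    return json.dumps(records).encode("utf-8")


new_bits = migrate_csv_to_json(old_bits, data_model)
```

Without `data_model`, the old bits are only comma-separated strings; with it, the new encoding carries field names and numeric types explicitly.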
17Uniform API
- Provide common access semantics
- Map from the interface preferred by your
application to the interfaces required by legacy
storage systems
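One common way to provide such a mapping is a driver interface with one adapter per storage system. The sketch below assumes hypothetical class names and a made-up tape-and-slot addressing scheme for the legacy archive; it is not the SRB driver API.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Common access semantics that every storage system must provide."""

    @abstractmethod
    def read(self, path):
        ...

    @abstractmethod
    def write(self, path, data):
        ...


class FileSystemDriver(StorageDriver):
    """Wraps a path-addressed store (a dict stands in for a file system)."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class LegacyArchiveDriver(StorageDriver):
    """Wraps a hypothetical legacy archive addressed by tape and slot."""

    def __init__(self):
        self.slots = {}  # (tape, slot) -> bytes
        self.index = {}  # path -> (tape, slot)

    def read(self, path):
        return self.slots[self.index[path]]

    def write(self, path, data):
        # Reuse the existing slot for a known path, else take the next one.
        address = self.index.setdefault(path, ("tape0", len(self.index)))
        self.slots[address] = data


def copy(source, target, path):
    """Application code sees only the uniform interface."""
    target.write(path, source.read(path))


fs, archive = FileSystemDriver(), LegacyArchiveDriver()
fs.write("/a.txt", b"hello")
copy(fs, archive, "/a.txt")
```

`copy` never needs to know that one side is addressed by path and the other by tape slot.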
18SRB and MCAT
Uniform APIs
[Figure: SRB and MCAT architecture diagram. Labels: Application; Linux I/O; Web WSDL; DLL / Python; Java, NT Browsers; GridFTP; Access APIs; Consistency Management / Authorization-Authentication; Prime Server; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Storage Abstraction; Catalog Abstraction; Databases (DB2, Oracle, Sybase); Servers; HRM]
19Discovery Transparencies
- Naming transparency - find a data set without knowing its name
- Map from attributes to a global file name
- Location transparency - access a data set without knowing where it is
- Map from global file name to local file name
- Access transparency - access a data set without knowing the type of storage system
- Federated client-server architecture
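The three transparencies can be read as a chain of mappings: attributes to global name, global name to local name, and storage type to access method. A minimal sketch follows, in which every table entry, name, and function is hypothetical.

```python
# All names and values below are hypothetical.
ATTRIBUTES = {  # global name -> descriptive attributes
    "/grid/survey/img001": {"object": "M31", "band": "J"},
    "/grid/survey/img002": {"object": "M31", "band": "K"},
}
LOCATIONS = {  # global name -> (storage system type, local name)
    "/grid/survey/img001": ("unix_fs", "/data/a/001.fits"),
    "/grid/survey/img002": ("tape_archive", "VOL12:0042"),
}
READERS = {  # storage system type -> access function
    "unix_fs": lambda local: f"read file {local}",
    "tape_archive": lambda local: f"stage and read {local}",
}


def find(**criteria):
    """Naming transparency: map from attributes to global file names."""
    return sorted(
        name for name, attrs in ATTRIBUTES.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )


def locate(global_name):
    """Location transparency: map from global file name to local file name."""
    return LOCATIONS[global_name]


def access(global_name):
    """Access transparency: same call regardless of the storage system type."""
    system, local_name = locate(global_name)
    return READERS[system](local_name)


matches = find(object="M31", band="K")
result = access(matches[0])
```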
20SRB and MCAT
Transparencies
[Figure: SRB and MCAT architecture diagram. Labels: Application; Linux I/O; Web WSDL; Access APIs; DLL / Python; Java, NT Browsers; GridFTP; Consistency Management / Authorization-Authentication; Prime Server; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Storage Abstraction; Catalog Abstraction; Databases (DB2, Oracle, Sybase); HRM; Servers]
21Persistent Collection
- Maintain authenticity
- Authenticate all accesses
- Assign roles for access control lists (curation, write, annotate, read)
- Manage audit trails of all operations
- Collection-owned data
- All accesses through the data management system
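Role-based access control with an audit trail can be sketched as below. The four role names come from the slide; their ordering (curation above write above annotate above read) is an assumption, and `ManagedCollection` is a hypothetical name.

```python
# Assumed ordering: a higher rank includes the rights of the lower ranks.
ROLE_RANK = {"read": 1, "annotate": 2, "write": 3, "curation": 4}


class ManagedCollection:
    """Collection-owned data: every access passes through request()."""

    def __init__(self):
        self.acl = {}    # user -> role
        self.audit = []  # (user, operation, allowed), one entry per attempt

    def grant(self, user, role):
        if role not in ROLE_RANK:
            raise ValueError(f"unknown role: {role}")
        self.acl[user] = role

    def request(self, user, operation):
        """Check the user's role against the operation and log the attempt."""
        role = self.acl.get(user)
        allowed = role is not None and ROLE_RANK[role] >= ROLE_RANK[operation]
        self.audit.append((user, operation, allowed))
        return allowed


collection = ManagedCollection()
collection.grant("alice", "annotate")
collection.request("alice", "read")   # allowed: annotate outranks read
collection.request("alice", "write")  # denied: annotate is below write
collection.request("bob", "read")     # denied: bob holds no role
```

Denied attempts are logged as well as successful ones, so the audit trail covers all operations.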
22SRB and MCAT
Persistency
[Figure: SRB and MCAT architecture diagram. Labels: Access APIs; Prime Server; Servers]
23Preservation
- Name transparency
- Find a file by attributes (map from attributes to global name)
- Location transparency
- Access a file by a global identifier (map from global to local file name)
- Access transparency
- Use same API to access data in archive or file cache
- Authenticity
- Disaster recovery, replicate data across storage systems
- Audit and process management
(Similar requirements to a data grid)
24SRB and MCAT
Preservation
[Figure: SRB and MCAT architecture diagram. Labels: Application; Linux I/O; Web WSDL; DLL / Python; Access APIs; Java, NT Browsers; GridFTP; Consistency Management / Authorization-Authentication; Prime Server; Logical Name Space; Latency Management; Data Transport; Metadata Transport; Storage Abstraction; Catalog Abstraction; Databases (DB2, Oracle, Sybase); HRM; Servers]
25Technology Convergence
- Data grids as basis for distributed data management
- Federation of distributed resources
- Creation of logical name space to automate discovery
- Distributed data collections
- Discovery based on attributes
- Distributed data storage systems
- Digital libraries
- Development of services for manipulating, viewing data
- Persistent archives
- Management of technology evolution
26Data Naming Ontologies
27Knowledge Creation
- Knowledge syntax
- (consensus)
- RDF, XMI, Topic Map
- Knowledge management
- (recursive operations)
- Oracle parallel database
- Knowledge manipulation
- (spatial/procedural rules)
- Generation of inference rules and mapping to data models
- Knowledge generation
- (scalable inference engine)
- Application of inference rules in inference
engine
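Applying inference rules in an inference engine can be illustrated with simple forward chaining. This is a generic textbook technique shown as a sketch, not the engine the slides refer to; the facts and rules are hypothetical.

```python
def infer(facts, rules):
    """Forward chaining: apply rules until no new facts are produced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Each rule is (set of premises, conclusion); all names are hypothetical.
rules = [
    (frozenset({"is_image", "has_sky_coordinates"}), "is_sky_image"),
    (frozenset({"is_sky_image", "band_is_infrared"}), "belongs_to_infrared_survey"),
]
facts = {"is_image", "has_sky_coordinates", "band_is_infrared"}
derived = infer(facts, rules)
```

The second rule fires only after the first has added `is_sky_image`, which is why the loop repeats until nothing changes.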
28Knowledge-based Data Grid
[Figure: knowledge-based data grid diagram. Labels: Ingest Services; Management; Access Services; Knowledge or Topic-Based Query / Browse; Knowledge Repository for Rules; Relationships Between Concepts; Knowledge; XTM DTD; Rules - KQL; (Model-based Access); Information Repository; Attribute-based Query; Attributes, Semantics; SDLIP; Information; XML DTD; (Data Handling System - SRB); Data; Fields, Containers, Folders; Storage (Replicas, Persistent IDs); MCAT/HDF; Grids; Feature-based Query]