Title: Chimera Virtual Data System
1- Chimera Virtual Data System
- Persistent Archives
2 Chimera Virtual Data System
- Introduction to the GriPhyN Project
- What is GriPhyN
- Four experiments
- Chimera Virtual Data System
- Chimera architecture
3 GriPhyN Project
- What is GriPhyN (Grid Physics Network)
- Includes four physics experiments
- CMS and ATLAS
- LIGO (Laser Interferometer Gravitational-wave Observatory)
- SDSS (Sloan Digital Sky Survey)
4 Chimera Virtual Data System
- Virtual data language(XML and Textual)
- Define VDC entities and queries
- Virtual data language interpreter
- Manipulate derivations and transformations
- VDC(Virtual Data Catalog)
- Entities: transformations, derivations, data
5 Chimera Architecture
[Figure: architecture diagram. Labels: Virtual Data Applications; VDL (Virtual Data Language); Chimera, containing the VDL Interpreter and the VDC (Virtual Data Catalog), connected by SQL; DAG (Directed Acyclic Graph); Data Grid Resources (distributed execution and data management)]
6 Virtual Data Catalog Entities
- Transformation
- Is an executable program.
- Similar to a "function definition" in C
- Derivation
- Represents an execution of a transformation.
- Similar to a "function call" in C
- Records both past and future executions
- Data object
- Is a named entity that may be consumed or produced by a derivation.
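The three entity kinds can be pictured as plain records. The following is a minimal Python sketch, not Chimera's actual catalog schema; the class and field names (`DataObject`, `Transformation`, `Derivation`, `bindings`, `executed`) and the file names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """A named entity that a derivation may consume or produce."""
    name: str


@dataclass(frozen=True)
class Transformation:
    """An executable program with formal arguments (like a function definition)."""
    name: str
    executable: str
    inputs: tuple    # formal input argument names
    outputs: tuple   # formal output argument names


@dataclass
class Derivation:
    """One execution of a transformation (like a function call)."""
    transformation: Transformation
    bindings: dict           # formal argument name -> actual DataObject
    executed: bool = False   # False marks a planned (future) execution

    def consumed(self):
        return [self.bindings[a] for a in self.transformation.inputs]

    def produced(self):
        return [self.bindings[a] for a in self.transformation.outputs]


t1 = Transformation("t1", "/usr/bin/app3", inputs=("a1",), outputs=("a2",))
d1 = Derivation(t1, {"a1": DataObject("run1.raw"), "a2": DataObject("run1.summary")})
```

Keeping `executed` on the derivation is one way to let the same record describe either a past run or one that has only been requested.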
7 Virtual Data Catalog Structure
8 Example Transformation
TR t1( out a2, in a1, none pa="500", none env="100000" ) {
  profile hints.exec-pfn = "/usr/bin/app3";
  argument = "-p "${pa};
  argument = "-f "${a1};
  argument = "-x y";
  argument stdout = ${a2};
  profile env.MAXMEM = ${env};
}
[Figure: a1 -> t1 -> a2]
9 Example Derivations
DV t1( env="20000", pa="600", a2=@{out:run1.exp15.T1932.summary}, a1=@{in:run1.exp15.T1932.raw} );
DV t1( a1=@{in:run1.exp16.T1918.raw}, a2=@{out:run1.exp16.T1918.summary} );
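The second derivation omits pa and env, which suggests that omitted arguments fall back to the defaults declared in the transformation. A minimal Python sketch of that binding step follows. It assumes this default rule and is not Chimera's implementation; the helper names `bind` and `render_t1` are hypothetical.

```python
# Formal arguments of the example transformation t1; None marks a required argument.
T1_DEFAULTS = {"a2": None, "a1": None, "pa": "500", "env": "100000"}


def bind(defaults, actuals):
    """Merge a derivation's actual arguments over the transformation's defaults."""
    unknown = set(actuals) - set(defaults)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    bound = {**defaults, **actuals}
    missing = [name for name, value in bound.items() if value is None]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return bound


def render_t1(bound):
    """Return (command line, stdout target, environment) for one run of t1."""
    args = ["-p " + bound["pa"], "-f " + bound["a1"], "-x y"]
    command = " ".join(["/usr/bin/app3", *args])
    return command, bound["a2"], {"MAXMEM": bound["env"]}


first = bind(T1_DEFAULTS, {
    "env": "20000", "pa": "600",
    "a2": "run1.exp15.T1932.summary", "a1": "run1.exp15.T1932.raw",
})
second = bind(T1_DEFAULTS, {
    "a1": "run1.exp16.T1918.raw", "a2": "run1.exp16.T1918.summary",
})
```

Under these assumptions, `render_t1(first)` uses the overridden values 600 and 20000, while `render_t1(second)` uses the declared defaults 500 and 100000.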
10 Managing Dependencies
TR tr1( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app1";
  argument stdin = ${a1};
  argument stdout = ${a2};
}
TR tr2( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app2";
  argument stdin = ${a1};
  argument stdout = ${a2};
}
DV tr1( a2=@{out:file2}, a1=@{in:file1} );
DV tr2( a2=@{out:file3}, a1=@{in:file2} );
[Figure: file1 -> tr1 -> file2 -> tr2 -> file3]
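The dependency between the two derivations is implied by file2, which the first one writes and the second one reads. The Python sketch below infers that ordering from file names alone. It is an illustration under that assumption, not Chimera's planner; `dependency_graph` is a hypothetical helper, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each derivation: name -> (input files, output files), as in the tr1/tr2 example.
derivations = {
    "tr1": ({"file1"}, {"file2"}),
    "tr2": ({"file2"}, {"file3"}),
}


def dependency_graph(derivations):
    """Map each derivation to the derivations that produce its inputs."""
    producer = {}
    for name, (_, outputs) in derivations.items():
        for f in outputs:
            producer[f] = name
    return {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in derivations.items()
    }


graph = dependency_graph(derivations)
# TopologicalSorter expects node -> predecessors, which is what graph holds.
order = list(TopologicalSorter(graph).static_order())
```

Here file1 has no producer, so tr1 has no predecessors and runs first.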
11 SDSS Cluster Identification Workflow
- Defines five transformations (1 to 5).
12 DAG for Cluster Identification Workflow
- Requesting the last derivation can invoke all the prior steps
- The final step produces the cluster catalog
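Requesting the final product and letting the system work backwards can be sketched as a depth-first walk over the producers of each file. The Python below is a minimal illustration, assuming each step produces exactly one file. The step and file names (`step1` to `step5`, `f1` to `f4`, `raw`, `catalog`) are invented and do not reflect the actual SDSS workflow.

```python
def plan(target, producers, existing):
    """Return the steps needed to materialize `target`, in execution order.

    producers: file name -> (step name, list of input file names)
    existing:  set of file names that are already materialized
    """
    steps = []
    seen = set()

    def visit(name):
        if name in existing or name in seen:
            return
        seen.add(name)
        if name not in producers:
            raise KeyError(f"no derivation produces {name!r}")
        step, inputs = producers[name]
        for dep in inputs:
            visit(dep)          # schedule prerequisites first
        steps.append(step)

    visit(target)
    return steps


# Hypothetical five-step workflow ending in the catalog.
producers = {
    "f1": ("step1", ["raw"]),
    "f2": ("step2", ["f1"]),
    "f3": ("step3", ["f1"]),
    "f4": ("step4", ["f2", "f3"]),
    "catalog": ("step5", ["f4"]),
}
```

Calling `plan("catalog", producers, {"raw"})` schedules all five steps. If `f4` already exists, only `step5` is scheduled.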
13 Chimera Summary
- Concept
- Support management of transformations and derivations as community resources
- Technology
- Includes a virtual data catalog and language
- Uses the GriPhyN virtual data toolkit for automated data derivation
- Results
- Successful early use on CMS and SDSS data generation/analysis experiments
- Future
- Public release of prototype, new apps, knowledge representation, planning
14 Persistent Archives
- PA vs Virtual Data Grid
- The Persistent Archive Research Group of the Grid
Forum promotes the development of an architecture
for the construction of persistent archives.
15 Persistent Archives Requirements
- Name transparency
- Find a file by attributes (map from attributes to global name)
- Location transparency
- Access a file by a global identifier (map from global to local file name)
- Access transparency
- Use the same API to access data in an archive or a file cache
- Authenticity
- Disaster recovery, replicate data across storage systems
- Audit and process management
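Name and location transparency amount to two lookups: attributes to a global name, then the global name to local file names. The Python sketch below shows that two-level mapping with in-memory dictionaries. All catalog contents, identifiers, and paths are invented for illustration, and `find` and `locate` are hypothetical helpers, not part of any archive API.

```python
# Hypothetical catalogs; every name, attribute, and path here is invented.
ATTRIBUTES = {
    "archive:/sdss/run1/summary": {"experiment": "SDSS", "kind": "summary"},
    "archive:/cms/run7/raw": {"experiment": "CMS", "kind": "raw"},
}
REPLICAS = {
    "archive:/sdss/run1/summary": ["/cache/a/summary.dat", "/tape/vol3/summary.dat"],
    "archive:/cms/run7/raw": ["/tape/vol9/raw.dat"],
}


def find(**wanted):
    """Name transparency: attributes -> global names."""
    return sorted(
        g for g, attrs in ATTRIBUTES.items()
        if all(attrs.get(k) == v for k, v in wanted.items())
    )


def locate(global_name):
    """Location transparency: global name -> local file names."""
    return list(REPLICAS[global_name])


matches = find(experiment="SDSS")
copies = locate(matches[0])
```

Holding more than one replica per global name is also what makes the disaster recovery item workable: a reader can fall back to the next local copy.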
16 Preservation Infrastructure
[Figure: Digital Entity, with labels Old Application, Old Operating System, Old Storage System, Old Display System]
17 Technology Management
[Figure: Digital Entity, with labels New Application, New Operating System, Wrap Storage System (Old Storage System), Wrap Display System (Old Display System), Migrate Encoding Format]
18 Data, Information, and Knowledge Content of Digital Entities
- Data
- Digital object
- Objects are streams of bits
- Information
- Any tagged data, which is treated as an attribute.
- Attributes may be tagged data within the digital object, or tagged data that is associated with the digital object
- Knowledge
- Relationships between attributes
- Relationships can be procedural/temporal, structural/spatial, logical/semantic, functional
19 Preservation Approaches
- Storage system abstraction
- Logical name space and entity manipulation
- Information repository abstraction
- Logical schema and physical table structure
- Knowledge repository abstraction
- Topic maps and inference rules
- Digital entity abstraction
- Data model and encoding format
20 Archival Processes
- Appraisal - determine the archivable content
- Accession - determine the initial physical location for the data, and the relationship of the new collection to existing collections
- Arrangement - add administration control, describe the information content (provenance, authenticity, structure, administrative), and decompose digital objects into their components as needed.
- Description - complete the definition of collection attributes by iterating between arrangement, reformatting, and representation.
- Preservation - build an archivable form of the digital entities, characterize the collection context, and manage their storage
- Access - provide query mechanisms for discovering, retrieving, and presenting the digital entities.
21 The use of some capabilities by the seven archival processes
22 Self-Instantiating Archive
- Archive the processes that are used to control the ingestion process
- When accessing the collection, retrieve the processes and the original digital objects
- Apply the processing steps to re-create the information content
- Query the result to discover desired digital objects
- A self-instantiating archive is a virtual data grid
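The retrieve, re-apply, and query sequence can be sketched in a few lines of Python. This is an illustration only: the archive layout, the step names `strip` and `parse`, and the sample records are assumptions. The step code here lives in a registry outside the archive, whereas a real archive would also have to preserve the process definitions themselves.

```python
# Registry of processing steps, looked up by name so that the archive can
# store the step names alongside the original objects.
STEPS = {
    "strip": lambda record: record.strip(),
    "parse": lambda record: dict(zip(("id", "title"), record.split("|"))),
}

archive = {
    "processes": ["strip", "parse"],                       # archived ingestion steps
    "objects": ["  1|Sky survey  ", " 2|Collider run "],   # original digital objects
}


def instantiate(archive):
    """Re-create the information content by re-applying the archived steps."""
    collection = []
    for obj in archive["objects"]:
        for step in archive["processes"]:
            obj = STEPS[step](obj)
        collection.append(obj)
    return collection


def query(collection, **wanted):
    """Return the re-created records whose attributes match `wanted`."""
    return [r for r in collection if all(r.get(k) == v for k, v in wanted.items())]


collection = instantiate(archive)
hits = query(collection, id="2")
```

The parallel with Chimera is that the archived steps play the role of transformations and each access replays them as derivations.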
23 Persistent Archives Summary
- Concept
- Results
- 29 core capabilities have been defined for the implementation of persistent archives from data grids