Chimera Virtual Data System

Provided by: sgz

Transcript and Presenter's Notes

1
  • Chimera Virtual Data System
  • Persistent Archives

2
Chimera virtual data system
  • Introduction to the GriPhyN Project
  • What is GriPhyN
  • Four experiments
  • Chimera Virtual Data System
  • Chimera architecture

3
GriPhyN Project
  • What is GriPhyN (Grid Physics Network)?
  • Includes four physics experiments
  • CMS and ATLAS
  • LIGO (Laser Interferometer Gravitational-wave
    Observatory)
  • SDSS (Sloan Digital Sky Survey)

4
Chimera Virtual Data System
  • Virtual data language (XML and textual)
  • Defines VDC entities and queries
  • Virtual data language interpreter
  • Manipulates derivations and transformations
  • VDC (Virtual Data Catalog)
  • Entities: transformations, derivations, data

5
Chimera Architecture
[Diagram labels: Virtual Data Applications; VDL (Virtual Data Language); Chimera, containing the VDL Interpreter, SQL, and the VDC (Virtual Data Catalog); DAG (Directed Acyclic Graph); Data Grid Resources (distributed execution and data management)]

6
Virtual Data Catalog Entities
  • Transformation
  • Is an executable program.
  • Similar to "function definition" in C
  • Derivation
  • Represents an execution of a transformation.
  • Similar to "function call" in C
  • Stores past and future executions
  • Data object
  • Is a named entity that may be consumed or
    produced by a derivation.
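
The function-definition and function-call analogy above can be sketched in a few lines of Python. This is only an illustrative model, not Chimera's actual catalog schema: the class layout, the field names, and the `sort` example with its data object names are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Transformation:
    """An executable program with formal parameters (the "function definition")."""
    name: str
    executable: str
    inputs: list   # formal names of the data objects it consumes
    outputs: list  # formal names of the data objects it produces


@dataclass
class Derivation:
    """One past or future execution of a transformation (the "function call")."""
    transformation: Transformation
    bindings: dict  # formal parameter name -> data object name

    def consumed(self):
        """Names of the data objects this derivation reads."""
        return {self.bindings[f] for f in self.transformation.inputs}

    def produced(self):
        """Names of the data objects this derivation writes."""
        return {self.bindings[f] for f in self.transformation.outputs}


# Hypothetical example: one program, invoked twice on different data objects.
sort_tr = Transformation("sort", "/usr/bin/sort", inputs=["src"], outputs=["dst"])
calls = [
    Derivation(sort_tr, {"src": "raw.run1", "dst": "sorted.run1"}),
    Derivation(sort_tr, {"src": "raw.run2", "dst": "sorted.run2"}),
]
```

Data objects are modeled here as plain names; a real catalog would also carry their attributes and physical locations.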

7
Virtual Data Catalog Structure
8
Example Transformation
  TR t1( out a2, in a1, none pa = "500", none env = "100000" ) {
    profile hints.exec-pfn = "/usr/bin/app3";
    argument = "-p "${pa};
    argument = "-f "${a1};
    argument = "-x y";
    argument stdout = ${a2};
    profile env.MAXMEM = ${env};
  }

[Diagram: a1 -> t1 -> a2]

9
Example Derivations
  DV t1( env="20000", pa="600",
         a2=@{out:run1.exp15.T1932.summary},
         a1=@{in:run1.exp15.T1932.raw} );
  DV t1( a1=@{in:run1.exp16.T1918.raw},
         a2=@{out:run1.exp16.T1918.summary} );

10
Managing Dependencies
  TR tr1( out a2, in a1 ) {
    profile hints.exec-pfn = "/usr/bin/app1";
    argument stdin = ${a1};
    argument stdout = ${a2};
  }
  TR tr2( out a2, in a1 ) {
    profile hints.exec-pfn = "/usr/bin/app2";
    argument stdin = ${a1};
    argument stdout = ${a2};
  }
  DV tr1( a2=@{out:file2}, a1=@{in:file1} );
  DV tr2( a2=@{out:file3}, a1=@{in:file2} );

[Diagram: file1 -> tr1 -> file2 -> tr2 -> file3]
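
The dependency chain on this slide (file1, tr1, file2, tr2, file3) can be resolved by walking backward from the requested file to whatever already exists. The Python sketch below is one plausible way to do that, not Chimera's actual planner; the `plan` function and the tuple layout are assumptions made for illustration.

```python
# Each derivation is (transformation, inputs, outputs), as on the slide.
DERIVATIONS = [
    ("tr1", ["file1"], ["file2"]),
    ("tr2", ["file2"], ["file3"]),
]


def plan(target, derivations, existing):
    """Return the transformations to run, in order, to produce `target`."""
    # Map every output file to the derivation that produces it.
    producer = {out: d for d in derivations for out in d[2]}
    order, seen = [], set()

    def visit(name):
        if name in existing or name in seen:
            return  # already materialized, or already scheduled
        seen.add(name)
        if name not in producer:
            raise KeyError(f"no derivation produces {name!r}")
        d = producer[name]
        for needed in d[1]:
            visit(needed)  # schedule prerequisites first
        if d not in order:
            order.append(d)

    visit(target)
    return [d[0] for d in order]


print(plan("file3", DERIVATIONS, existing={"file1"}))           # ['tr1', 'tr2']
print(plan("file3", DERIVATIONS, existing={"file1", "file2"}))  # ['tr2']
```

When file2 already exists, only tr2 is scheduled, which is the point of recording derivations: work that has already been done need not be repeated.
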
11
SDSS cluster identification workflow
  • Defines five transformations (1 to 5).

12
DAG for cluster identification workflow
  • Requesting the last derivation can invoke all of
    the prior steps
  • The final step produces the cluster catalog

13
Chimera Summary
  • Concept
  • Supports management of transformations and
    derivations as community resources
  • Technology
  • Includes a virtual data catalog and language
  • Uses the GriPhyN Virtual Data Toolkit for automated
    data derivation
  • Results
  • Successful early use in CMS and SDSS data
    generation/analysis experiments
  • Future
  • Public release of prototype, new apps, knowledge
    representation, planning

14
Persistent Archives
  • Persistent archives (PA) vs. virtual data grid
  • The Persistent Archive Research Group of the Grid
    Forum promotes the development of an architecture
    for the construction of persistent archives.

15
Persistent Archives requirements
  • Name transparency
  • Find a file by attributes (map from attributes to
    global name)
  • Location transparency
  • Access a file by a global identifier (map from
    global to local file name)
  • Access transparency
  • Use same API to access data in archive or file
    cache
  • Authenticity
  • Disaster recovery, replicate data across storage
    systems
  • Audit and process management

16
Preservation Infrastructure
[Diagram labels: Old Application; Old Operating System; Old Storage System; Old Display System; Digital Entity]

17
Technology Management
[Diagram labels: New Application; New Operating System; Wrap Storage System; Wrap Display System; Old Storage System; Old Display System; Migrate Encoding Format; Digital Entity]

18
Data, Information, and Knowledge Content of Digital Entities
  • Data
  • Digital object
  • Objects are streams of bits
  • Information
  • Any tagged data, which is treated as an
    attribute.
  • Attributes may be tagged data within the digital
    object, or tagged data that is associated with
    the digital object
  • Knowledge
  • Relationships between attributes
  • Relationships can be procedural/temporal,
    structural/spatial, logical/semantic, functional

19
Preservation Approaches
  • Storage system abstraction
  • Logical name space and entity manipulation
  • Information repository abstraction
  • Logical schema and physical table structure
  • Knowledge repository abstraction
  • Topic maps and inference rules
  • Digital entity abstraction
  • Data model and encoding format

20
Archival Processes
  • Appraisal - determine the archivable content
  • Accession - determine the initial physical
    location for the data, and the relationship of
    the new collection to existing collections
  • Arrangement - add administration control,
    describe the information content (provenance,
    authenticity, structure, administrative), and
    decompose digital objects into their components
    as needed.
  • Description - complete the definition of
    collection attributes by iterating between
    arrangement, reformatting, and representation.
  • Preservation - build an archivable form of the
    digital entities, characterize the collection
    context, and manage their storage
  • Access - provide query mechanisms for
    discovering, retrieving, and presenting the
    digital entities.

21
Use of capabilities by the seven archival processes

22
Self-Instantiating Archive
  • Archive the processes that are used to control
    the ingestion process
  • When accessing the collection, retrieve the
    processes and the original digital objects
  • Apply the processing steps to re-create the
    information content
  • Query the result to discover desired digital
    objects
  • A self-instantiating archive is a virtual data
    grid
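
The four steps above can be sketched as a small Python round trip. This is a toy model under stated assumptions, not a real archive format: the JSON layout, the `STEP_REGISTRY` of named string operations, and the `ingest`, `instantiate`, and `query` helpers are all hypothetical.

```python
import json

# Hypothetical registry of processing steps, referenced by name in the archive.
STEP_REGISTRY = {
    "strip": str.strip,
    "upper": str.upper,
}


def ingest(raw_objects, steps):
    """Archive the original digital objects together with the ingestion steps."""
    return json.dumps({"objects": raw_objects, "steps": steps})


def instantiate(archive_blob):
    """Retrieve objects and processes, then re-apply the steps in order."""
    record = json.loads(archive_blob)
    collection = record["objects"]
    for step_name in record["steps"]:
        step = STEP_REGISTRY[step_name]
        collection = [step(obj) for obj in collection]
    return collection


def query(collection, predicate):
    """Discover the desired digital objects in the re-created collection."""
    return [obj for obj in collection if predicate(obj)]


blob = ingest(["  alpha ", " beta"], ["strip", "upper"])
collection = instantiate(blob)
print(query(collection, lambda obj: obj.startswith("B")))  # ['BETA']
```

Only the raw objects and the step names are stored; the derived collection is recomputed on access, which is the sense in which such an archive behaves like a virtual data grid.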

23
Persistent Archives Summary
  • Concept
  • Results
  • 29 core capabilities have been defined for the
    implementation of persistent archives from data
    grids