The GriPhyN Virtual Data System
  • When we know exactly how to produce or re-create
    a data object, it becomes virtual: the recipe for
    creating the object can in many contexts act as a
    virtual stand-in for the physical object. By
    tracking the recipes, or provenance, of a
    collaboration's data, the Chimera Virtual Data
    System provides powerful data management
    capabilities for the Grid:
  • the ability to audit data and know exactly how
    it was created
  • the ability to discover datasets of interest
    within massive data collections
  • the ability to specify a virtual workspace of
    data for selective instantiation at later times
  • the ability to re-create data that has been
    deleted, lost, or damaged
  • the ability to capture, exchange, and reason
    about the patterns of scientific and business
    workflows
How it works
  • A user or a user interface prepares a document
    in VDL, the Virtual Data Language, which
    describes the interface (inputs, outputs, and
    environment) of data-transforming applications
    and the arguments with which to call them to
    produce specific data objects.
  • The VDL definitions are stored in a Virtual Data
    Catalog
  • Tools traverse the VDL dependency graph to
    produce an abstract XML workflow
  • Executable workflows for grids running the
    GriPhyN Virtual Data Toolkit (VDT) are run
    through the DAGMan, Condor-G, and GRAM services
  • This research is a collaboration of Ewa Deelman,
    Ian Foster, Carl Kesselman, Gaurang Mehta,
    Douglas Scheftner, Karan Vahi, Jens Voeckler,
    Mike Wilde, and Yong Zhao
  • For code downloads and more info:
    www.griphyn.org/vds, www.griphyn.org/vdt,
    www.globus.org, www.cs.wisc.edu/condor
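
The "How it works" steps can be illustrated with a small sketch. It is not the GriPhyN tooling: the catalog contents, the function names `plan` and `to_xml`, and the XML element names are all hypothetical, chosen only to show how walking a dependency graph yields an ordered abstract workflow in which data that already exists is not regenerated.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: derivation name -> (input files, output files).
CATALOG = {
    "x1": (["file1"], ["file2"]),
    "x2": (["file2"], ["file3"]),
}

def plan(target, catalog, existing):
    """List the derivations to run, in order, so that `target` exists.

    Assumes the catalog's dependency graph is acyclic.
    """
    producer = {out: dv for dv, (_, outs) in catalog.items() for out in outs}
    jobs = []

    def need(name):
        if name in existing:          # already materialized, nothing to do
            return
        if name not in producer:
            raise KeyError(f"no recipe and no copy of {name}")
        dv = producer[name]
        if dv in jobs:                # already scheduled
            return
        for dep in catalog[dv][0]:    # schedule producers of the inputs first
            need(dep)
        jobs.append(dv)

    need(target)
    return jobs

def to_xml(jobs, catalog):
    """Serialize the plan; element and attribute names are made up."""
    root = ET.Element("workflow")
    for dv in jobs:
        job = ET.SubElement(root, "job", id=dv)
        for f in catalog[dv][0]:
            ET.SubElement(job, "uses", file=f, link="input")
        for f in catalog[dv][1]:
            ET.SubElement(job, "uses", file=f, link="output")
    return ET.tostring(root, encoding="unicode")

print(to_xml(plan("file3", CATALOG, existing={"file1"}), CATALOG))
```

Calling `plan("file3", CATALOG, existing={"file1", "file2"})` schedules only `x2`, which is the on-demand behaviour the poster describes: a recipe runs only when its product is missing.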

Virtual Data Defines Workflow
Galaxy Cluster Analysis
Location-independent Workflow for OSG and TeraGrid
Manage workflow
On-demand data generation
Patch workflow following changes
Explain provenance, e.g. file8:
  psearch -t 10 -i file3 file4 file5 -o file8
  summarize -t 10 -i file6 -o file7
  reformat -f fz -i file2 -o file3 file4 file5
  conv -l esd -o aod -i file2 -o file6
  simulate -t 10 -o file1 file2
By Jim Annis, Steve Kent, Vijay Sehkri, Neha Sharma
(Fermilab); Michael Milligan, Yong Zhao (U of Chicago)
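
To make "explain provenance, e.g. file8" concrete, here is a minimal sketch that walks backwards from a file to the commands that produced it. The record layout and the `explain` function are hypothetical. The command strings follow the five commands in the provenance example, with option dashes restored on the assumption that they were lost when the slide was transcribed.

```python
# Hypothetical provenance records: (command line, input files, output files).
RECORDS = [
    ("simulate -t 10 -o file1 file2", [], ["file1", "file2"]),
    ("reformat -f fz -i file2 -o file3 file4 file5",
     ["file2"], ["file3", "file4", "file5"]),
    ("conv -l esd -o aod -i file2 -o file6", ["file2"], ["file6"]),
    ("summarize -t 10 -i file6 -o file7", ["file6"], ["file7"]),
    ("psearch -t 10 -i file3 file4 file5 -o file8",
     ["file3", "file4", "file5"], ["file8"]),
]

def explain(target, records=RECORDS):
    """Return the commands that lead to `target`, earliest first."""
    producer = {out: rec for rec in records for out in rec[2]}
    ordered, seen = [], set()

    def visit(name):
        rec = producer.get(name)
        if rec is None or rec[0] in seen:   # raw input, or already listed
            return
        seen.add(rec[0])
        for dep in rec[1]:                  # explain the inputs first
            visit(dep)
        ordered.append(rec[0])

    visit(target)
    return ordered

for command in explain("file8"):
    print(command)
```

For `file8` the walk lists `simulate`, then `reformat`, then `psearch`. The `conv` and `summarize` steps are left out because `file8` does not depend on them.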
Virtual Data Language
  • Transformation: similar to a "function
    definition"; specifies formal parameters
  • Derivation: similar to a "function call";
    specifies actual parameters; records how data
    products were generated; recipe for re-generation
  • Invocation: record of a derivation execution
TR tr1(in a1, out a2) {
  profile hints.exec-pfn = "/usr/bin/app1";
  argument stdin = ${a1};
  argument stdout = ${a2};
}
TR tr2(in a1, out a2) {
  profile hints.exec-pfn = "/usr/bin/app2";
  argument stdin = ${a1};
  argument stdout = ${a2};
}
DV x1->tr1(a1=@{in:file1}, a2=@{out:file2});
DV x2->tr2(a1=@{in:file2}, a2=@{out:file3});
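
The function-definition and function-call analogy can be mirrored in a few lines of Python. This is only an illustration of the idea, not how the Virtual Data System represents VDL internally. The class names, the `command` and `depends_on` helpers, and the shell-redirection rendering are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Transformation:
    """Like a function definition: names the formal parameters."""
    name: str
    executable: str   # value of the exec-pfn hint
    stdin: str        # formal parameter wired to stdin
    stdout: str       # formal parameter wired to stdout

@dataclass
class Derivation:
    """Like a function call: binds formals to logical file names."""
    name: str
    tr: Transformation
    actuals: dict

    def command(self):
        # Illustrative rendering of the call as a shell command line.
        source = self.actuals[self.tr.stdin]
        sink = self.actuals[self.tr.stdout]
        return f"{self.tr.executable} < {source} > {sink}"

def depends_on(later, earlier):
    """True when `later` reads the file that `earlier` writes."""
    return later.actuals[later.tr.stdin] == earlier.actuals[earlier.tr.stdout]

tr1 = Transformation("tr1", "/usr/bin/app1", stdin="a1", stdout="a2")
tr2 = Transformation("tr2", "/usr/bin/app2", stdin="a1", stdout="a2")
x1 = Derivation("x1", tr1, {"a1": "file1", "a2": "file2"})
x2 = Derivation("x2", tr2, {"a1": "file2", "a2": "file3"})

print(x1.command())
print(x2.command())
print(depends_on(x2, x1))
```

Because `x2` reads `file2`, which `x1` writes, the two derivations form a two-step chain without anyone stating the order explicitly. The dependency graph is implied by the file names alone.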
This research is supported by the National
Science Foundation under contract ITR-0086044
(GriPhyN). Data Grid research is supported by the
Mathematical, Information, and Computational
Sciences Division subprogram of the Office of
Advanced Scientific Computing Research, U.S.
Department of Energy, under Contract
W-31-109-Eng-38 (Data Grid Toolkit).