Scientific Workflows - PowerPoint PPT Presentation

Provided by: ewa49
Transcript and Presenter's Notes

Title: Scientific Workflows


1
Scientific Workflows
  • Ewa Deelman, Carl Kesselman, Gaurang Mehta, Karan
    Vahi
  • Yolanda Gil, Jihie Kim, Varun Ratnakar
  • USC Information Sciences Institute
  • Many contributions from Scott Callaghan, Edward
    Field, Hunter Francoeur, Robert Graves, Nitin
    Gupta, Vipin Gupta, Thomas H. Jordan, Philip
    Maechling, John Mehringer, David Okaya, Li Zhao

2
The Process of Creating an Executable Workflow
User guided
  • Creating a valid workflow template
      • Selecting application components and connecting
        inputs and outputs to specify data flow
      • Adding other steps for data
        conversions/transformations
  • Creating an instantiated workflow
      • Providing input data to pathway inputs (logical
        assignments)
Automated
  • Creating an executable workflow
      • Given requirements of each model, find and assign
        adequate resources for each model
      • Select physical locations for logical names
      • Include data movement steps and data
        registration steps
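The automated step above (assigning resources, resolving logical names to physical locations, and adding data movement and registration steps) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Pegasus implementation: the `Job` class, the `make_executable` function, the site names, the file names, and the `/scratch/` path are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    inputs: list    # logical file names consumed
    outputs: list   # logical file names produced

def make_executable(jobs, replica_catalog, site_for_job):
    """Turn an instantiated workflow into an ordered list of concrete steps.

    jobs is assumed to be in topological order.
    replica_catalog maps logical file name -> (site, physical path).
    site_for_job maps job name -> chosen execution site.
    """
    location = dict(replica_catalog)   # where each file currently lives
    steps = []
    for job in jobs:
        site = site_for_job[job.name]
        for lfn in job.inputs:
            src_site, _path = location[lfn]
            if src_site != site:
                # Data movement step: stage the input to the execution site.
                steps.append(("transfer", lfn, src_site, site))
                location[lfn] = (site, f"/scratch/{lfn}")  # hypothetical path
        steps.append(("compute", job.name, site))
        for lfn in job.outputs:
            # Data registration step: record where the new product lives.
            location[lfn] = (site, f"/scratch/{lfn}")
            steps.append(("register", lfn, site))
    return steps

# Hypothetical two-job workflow mapped onto two sites.
jobs = [
    Job("SGT", ["mesh.in"], ["sgt.out"]),
    Job("SeisGen", ["sgt.out", "rupture.txt"], ["seis.grm"]),
]
replica_catalog = {
    "mesh.in": ("siteA", "/data/mesh.in"),
    "rupture.txt": ("siteB", "/data/rupture.txt"),
}
site_for_job = {"SGT": "siteA", "SeisGen": "siteB"}
steps = make_executable(jobs, replica_catalog, site_for_job)
for step in steps:
    print(step)
```

Only one transfer is generated here, because `sgt.out` is produced at one site and consumed at the other; inputs already present at the execution site need no movement.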
3
SCEC CyberShake
  • Calculate hazard curves by generating synthetic
    seismograms from estimated rupture forecast

[Figure labels: Hazard Map; Strain Green Tensor]
4
SCEC workflows on the TeraGrid
[Diagram components: Workflow Instance Generator, Pegasus, Executable workflow, Condor DAGMan, Condor Glide-ins, Globus, VDS Provenance Tracking Catalog]
5
Executions on the TeraGrid (TG) last fall (Pasadena and USC)
6
Sites done at HPCC, Spring 2006
7
Managing the Scale Through Workflow Partitioning
Nodes are labeled
Nodes are mapped to resources according to labels
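As a rough illustration of label-based partitioning, the sketch below groups workflow nodes by label, derives the dependencies between the resulting partitions, and places each node on the resource assigned to its label. The function name, node names, labels, and resource names are hypothetical; the slide does not describe the actual partitioning algorithm.

```python
from collections import defaultdict

def partition_by_label(labels, edges):
    """Group nodes by label and derive partition-level dependencies.

    labels maps node -> partition label.
    edges is a list of (parent, child) dependencies between nodes.
    """
    partitions = defaultdict(list)
    for node, label in labels.items():
        partitions[label].append(node)

    depends_on = defaultdict(set)
    for parent, child in edges:
        if labels[parent] != labels[child]:
            # An edge crossing partitions makes the child's partition
            # wait for the parent's partition.
            depends_on[labels[child]].add(labels[parent])
    return dict(partitions), {k: sorted(v) for k, v in depends_on.items()}

# Hypothetical labeled workflow.
labels = {"sgt_x": "p1", "sgt_y": "p1", "seis_1": "p2", "peak_1": "p2"}
edges = [("sgt_x", "seis_1"), ("sgt_y", "seis_1"), ("seis_1", "peak_1")]
resource_for_label = {"p1": "cluster_a", "p2": "cluster_b"}

partitions, depends_on = partition_by_label(labels, edges)
placement = {node: resource_for_label[label] for node, label in labels.items()}
print(partitions)
print(depends_on)
print(placement)
```

Each partition can then be mapped and submitted as a unit once the partitions it depends on have finished.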
8
Technical Contributions in Workflow Mapping and
Execution
  • Management of larger-scale computations
  • Automated data management and provenance tracking
  • Partitioning of workflows for increased
    scalability
  • Combining resource provisioning with computation
  • Dynamic deployment of SCEC-specific services

9
The Process of Creating an Executable Workflow
User guided
  • Creating a valid workflow template
      • Selecting application components and connecting
        inputs and outputs to specify data flow
      • Adding other steps for data
        conversions/transformations
  • Creating an instantiated workflow
      • Providing input data to pathway inputs (logical
        assignments)
Automated
  • Creating an executable workflow
      • Given requirements of each model, find and assign
        adequate resources for each model
      • Select physical locations for logical names
      • Include data movement steps, including
        data registration steps
10
WINGS/Pegasus Workflow Instance Generation and
Selection
  • Workflow templates specify complex analysis
    sequences
  • Workflow instances specify data

[Diagram: WINGS supports Workflow Creation, Workflow Selection, and Data Selection]
Roles shown: EXPERT SCIENTIST, SCIENTIST, SCIENTIST RESEARCHING NEW MODELS
Example requests:
  • "Validate this workflow based on the component
    specs"
  • "Show me workflows that generate hazard maps"
  • "Run that with the USGS data set"
  • "Here is a new wave propagation model, takes in a
    series of fault ruptures, is compiled for MPI"
Knowledge sources:
  • Workflow Libraries
  • Ontologies (OWL): domain terms, component types,
    workflow products
  • Application Components
  • Component Specification
  • Data Repositories: preexisting data collections,
    workflow execution results
Workflow Template
  • Specifies data requirements
  • Specifies execution requirements
Stages shown: Workflow Template, Workflow Instance, Pegasus, Executable Workflow, Globus
11
Wings extensions required for CyberShake Workflows
  • Handle nested file collections
  • Handle many files and a large number of workflow
    instantiations (4626 instantiations of each
    component)
  • Handle filenames; metadata no longer relies on
    filenames
[Workflow template diagram. Labels shown include FD_GRID_XYZ, BoxNameCheck, SeismogramGen_Li, GenSeisMetadata, PeakValCalc_Okaya, XYZinput, XYZGRD, RVM, rupvars, rupvar, FileOfSGTNames, SGT, SeisParamValues, SiteName, SeisMetadata, seism, SA. Link and collection labels (FS-*, FCS-*, CCS-*, N1, NC1 to NC3, F1, L4 to L10) are omitted.]
12
Iterative workflow instantiation, mapping and
execution
13
Workflow Instance
[Diagram of the instantiated workflow. Stages shown, top to bottom: XYZGRD, Boxnamecheck, SGT, SeisGen_Li, PeakValCalc, each repeated many times.]
Example files shown:
  • 127_7.txt.variation-s000-h000,
    127_6.txt.variation-s000-h000
  • SGT282, SGT161
  • Seismograms_PAS_127_7.grm,
    Seismograms_PAS_127_6.grm,
    Seismograms_PAS_151_11.grm
  • PeakVals_allPAS_127_7.bsa,
    PeakVals_allPAS_127_6.bsa,
    PeakVals_allPAS_151_11.bsa
4,000 ruptures, >100,000 variations for a site
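To make the fan-out concrete, the following sketch expands a two-step template (SeisGen_Li followed by PeakValCalc) over a collection of ruptures for one site. It is a simplified illustration, not the WINGS instance generator: the `expand_template` function is hypothetical, the file-name patterns are inferred from the example names on the slide, and the SGT inputs to SeisGen_Li are left out for brevity.

```python
def expand_template(site, ruptures):
    """Expand a two-component template into one job pair per rupture.

    ruptures is a list of (source_id, rupture_id) pairs.
    File-name patterns are assumptions inferred from the slide's examples.
    """
    jobs = []
    for source_id, rupture_id in ruptures:
        tag = f"{source_id}_{rupture_id}"
        variation = f"{tag}.txt.variation-s000-h000"
        seismogram = f"Seismograms_{site}_{tag}.grm"
        peak_values = f"PeakVals_all{site}_{tag}.bsa"
        jobs.append({"component": "SeisGen_Li",
                     "inputs": [variation],
                     "outputs": [seismogram]})
        jobs.append({"component": "PeakValCalc",
                     "inputs": [seismogram],
                     "outputs": [peak_values]})
    return jobs

instance = expand_template("PAS", [(127, 7), (127, 6), (151, 11)])
for job in instance:
    print(job["component"], job["inputs"], "->", job["outputs"])
```

With thousands of ruptures and many variations per site, the same loop yields a correspondingly large number of jobs, which is why the instance is generated from the template rather than written by hand.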
14
Technical Contributions: Semantic Metadata
Approach to Creating Large Scientific Workflows
  • Semantic representations of workflow templates to
    express repetitive computational structures and
    collections
  • Expanding template to instances that orchestrate
    large amounts of computations reflecting the
    workflow template structure
  • Generating appropriate metadata descriptions for
    all the new data created during execution and
    full elaboration of workflow specs
  • Ensuring validity of workflow instance
    (BindValidate algorithm)
  • Keeping track of constraints on dataset used,
    including global constraints among multiple
    components as well as local constraints within
    individual components.
  • Mapping equivalent datasets, detecting
    pre-existing intermediate data, and preventing
    unnecessary execution of workflow parts when
    datasets already exist. This allows Pegasus to
    identify the same data products.
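The last point, skipping workflow parts whose outputs already exist, can be illustrated with a small backward traversal from the requested products. This is a sketch under assumptions and not the algorithm used by WINGS or Pegasus; the `prune` function, job names, and file names are hypothetical.

```python
def prune(jobs, existing, requested):
    """Return the names of jobs that still need to run.

    jobs: list of dicts with "name", "inputs", "outputs", in topological order.
    existing: set of data products already available.
    requested: final products the user wants.
    """
    producer = {out: job for job in jobs for out in job["outputs"]}
    needed = set()
    stack = [f for f in requested if f not in existing]
    while stack:
        product = stack.pop()
        job = producer.get(product)
        if job is None or job["name"] in needed:
            continue  # raw input with no producer, or job already scheduled
        needed.add(job["name"])
        # Only missing inputs pull further upstream jobs into the plan.
        stack.extend(i for i in job["inputs"] if i not in existing)
    return [job["name"] for job in jobs if job["name"] in needed]

# Hypothetical three-step chain.
jobs = [
    {"name": "SGT", "inputs": ["mesh"], "outputs": ["sgt"]},
    {"name": "SeisGen", "inputs": ["sgt", "rupture"], "outputs": ["seis"]},
    {"name": "PeakValCalc", "inputs": ["seis"], "outputs": ["peak"]},
]
with_seismogram = prune(jobs, {"mesh", "rupture", "seis"}, ["peak"])
from_scratch = prune(jobs, {"mesh", "rupture"}, ["peak"])
print(with_seismogram)
print(from_scratch)
```

When the seismogram is already available, only the peak-value step remains; without it, the whole chain is scheduled.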

15
Publications
  • SCEC CyberShake Workflows: Automating
    Probabilistic Seismic Hazard Analysis
    Calculations, Philip Maechling, Ewa Deelman, Li
    Zhao, Robert Graves, Gaurang Mehta, Nitin Gupta,
    John Mehringer, Carl Kesselman, Scott Callaghan,
    David Okaya, Hunter Francoeur, Vipin Gupta,
    Yifeng Cui, Karan Vahi, Thomas Jordan, Edward
    Field, in Workflows for e-Science (in press)
  • Managing Large-Scale Workflow Execution from
    Resource Provisioning to Provenance Tracking: The
    CyberShake Example, Ewa Deelman, Scott
    Callaghan, Edward Field, Hunter Francoeur, Robert
    Graves, Nitin Gupta, Vipin Gupta, Thomas H.
    Jordan, Carl Kesselman, Philip Maechling, John
    Mehringer, Gaurang Mehta, David Okaya, Karan
    Vahi, Li Zhao (under review)
  • Semantic Metadata Generation for Large
    Scientific Workflows, Jihie Kim, Yolanda Gil,
    and Varun Ratnakar (under review)
  • Wings for Pegasus: A Semantic Approach to
    Creating Very Large Scientific Workflows,
    Yolanda Gil, Varun Ratnakar, Ewa Deelman, Marc
    Spraragen, and Jihie Kim (under review)