Title: Scientific Workflows
1 Scientific Workflows
- Ewa Deelman, Carl Kesselman, Gaurang Mehta, Karan Vahi
- Yolanda Gil, Jihie Kim, Varun Ratnakar
- USC Information Sciences Institute
- Many contributions from Scott Callaghan, Edward
Field, Hunter Francoeur, Robert Graves, Nitin
Gupta, Vipin Gupta, Thomas H. Jordan, Philip
Maechling, John Mehringer, David Okaya, Li Zhao
2 The Process of Creating an Executable Workflow
User guided
- Creating a valid workflow template
  - Selecting application components and connecting inputs and outputs to specify data flow
  - Adding other steps for data conversions/transformations
- Creating an instantiated workflow
  - Providing input data to pathway inputs (logical assignments)
- Creating an executable workflow
  - Given requirements of each model, find and assign adequate resources for each model
  - Select physical locations for logical names
  - Include data movement steps and data registration steps
Automated
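The first two stages above (template, then instance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the WINGS or Pegasus data model: the `Step` and `Template` classes, the `bind_inputs` helper, and the logical names `rupture_127_7` and `sgt_PAS` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One application component in a workflow template (hypothetical model)."""
    name: str
    inputs: list   # logical input slots, e.g. ["rupture"]
    outputs: list  # logical output slots, e.g. ["seismogram"]


@dataclass
class Template:
    """A workflow template: components plus the data flow between them."""
    steps: list

    def free_inputs(self):
        """Input slots that no step produces; the user must bind data to these."""
        produced = {o for s in self.steps for o in s.outputs}
        consumed = {i for s in self.steps for i in s.inputs}
        return sorted(consumed - produced)


def bind_inputs(template, bindings):
    """Create a workflow instance by assigning logical data names to free inputs."""
    missing = [slot for slot in template.free_inputs() if slot not in bindings]
    if missing:
        raise ValueError(f"unbound inputs: {missing}")
    return {"template": template, "bindings": dict(bindings)}


# Hypothetical two-step template: data flows through the shared "seismogram" slot.
template = Template(steps=[
    Step("SeismogramGen", inputs=["rupture", "sgt"], outputs=["seismogram"]),
    Step("PeakValCalc", inputs=["seismogram"], outputs=["peak_values"]),
])
instance = bind_inputs(template, {"rupture": "rupture_127_7", "sgt": "sgt_PAS"})
```

The instance still refers to data by logical name only; choosing resources and physical locations is left to the automated stage.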
3 SCEC CyberShake
- Calculate hazard curves by generating synthetic
seismograms from estimated rupture forecast
Hazard Map
Strain Green Tensor
4 SCEC workflows on the TeraGrid
[Figure: software stack for SCEC workflows on the TeraGrid. Labels: Workflow Instance Generator, Pegasus, Executable workflow, Condor DAGMan, Condor Glide-ins, Globus, VDS Provenance Tracking Catalog.]
5 Executions on the TG last fall (Pasadena and USC)
6 Sites done at HPCC, Spring 2006
7 Managing the Scale Through Workflow Partitioning
Nodes are labeled
Nodes are mapped to resources according to labels
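A minimal sketch of label-based partitioning, assuming each node carries exactly one label and each label is assigned to one resource. The node names, labels, and resource names are hypothetical; this does not reproduce the partitioner used in Pegasus.

```python
from collections import defaultdict


def partition_by_label(labels):
    """Group workflow nodes into partitions keyed by their label."""
    partitions = defaultdict(list)
    for node, label in labels.items():
        partitions[label].append(node)
    return dict(partitions)


def map_partitions(partitions, label_to_resource):
    """Assign every node in a partition to the resource chosen for its label."""
    return {
        node: label_to_resource[label]
        for label, nodes in partitions.items()
        for node in nodes
    }


# Hypothetical labels: parallel SGT jobs versus serial seismogram jobs.
labels = {"sgt_x": "mpi", "sgt_y": "mpi", "seis_1": "serial", "seis_2": "serial"}
partitions = partition_by_label(labels)
mapping = map_partitions(partitions, {"mpi": "cluster_a", "serial": "cluster_b"})
```

Each partition can then be planned and submitted on its own, so no single planning step has to handle the whole workflow.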
8 Technical Contributions in Workflow Mapping and Execution
- Management of larger-scale computations
- Automated data management and provenance tracking
- Partitioning of workflows for increased scalability
- Combining resource provisioning with computation
- Dynamic deployment of SCEC-specific services
9 The Process of Creating an Executable Workflow
User guided
- Creating a valid workflow template
  - Selecting application components and connecting inputs and outputs to specify data flow
  - Adding other steps for data conversions/transformations
- Creating an instantiated workflow
  - Providing input data to pathway inputs (logical assignments)
- Creating an executable workflow
  - Given requirements of each model, find and assign adequate resources for each model
  - Select physical locations for logical names
  - Include data movement steps, including data registration steps
Automated
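The automated stage (physical locations for logical names, data movement, registration) might look like the sketch below. It assumes a replica catalog shaped as a plain dictionary from logical file name to `{site: physical_url}`; the `plan_data_steps` function, the site names, and the `example.org` URL are hypothetical and are not the Pegasus API.

```python
def plan_data_steps(job, site, replica_catalog):
    """Return the concrete steps for one job: transfers, compute, registrations.

    replica_catalog maps a logical file name to {site: physical_url}.
    """
    steps = []
    for lfn in job["inputs"]:
        replicas = replica_catalog.get(lfn, {})
        if not replicas:
            raise LookupError(f"no replica registered for {lfn}")
        if site not in replicas:
            # Input is not at the execution site: move it from an existing replica.
            source_site = sorted(replicas)[0]
            steps.append(("transfer", lfn, source_site, site))
    steps.append(("compute", job["name"], site))
    for lfn in job["outputs"]:
        # Register new data so that later workflows can locate it.
        steps.append(("register", lfn, site))
    return steps


catalog = {
    "rupture_127_7": {"site_a": "gsiftp://site_a.example.org/data/rupture_127_7"},
}
job = {
    "name": "SeismogramGen",
    "inputs": ["rupture_127_7"],
    "outputs": ["seismogram_127_7"],
}
steps = plan_data_steps(job, "site_b", catalog)
```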
10 WINGS/Pegasus Workflow Instance Generation and Selection
Validate this workflow based on the component specs
- Workflow templates specify complex analysis sequences
- Workflow instances specify data
WINGS
Show me workflows that generate hazard maps
Workflow Creation
Workflow Selection
Workflow Libraries
EXPERT SCIENTIST
Ontologies: domain terms, component types, workflow products
Workflow Template
- Specifies data requirements
- Specifies execution requirements
Application Components
SCIENTIST
(OWL)
Run that with the USGS data set
Data Selection
Data Repositories
Component Specification
- Preexisting data collections
- Workflow execution results
Workflow Instance
SCIENTIST RESEARCHING NEW MODELS
Here is a new wave propagation model; it takes in a series of fault ruptures and is compiled for MPI
Globus
Pegasus
Executable Workflow
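Two of the requests on this slide ("Validate this workflow based on the component specs" and "Show me workflows that generate hazard maps") can be sketched with plain Python sets. This is only an illustration: the slide indicates that the specifications are held in OWL ontologies, whereas the `COMPONENT_SPECS` table, the type names, and both functions below are hypothetical stand-ins.

```python
COMPONENT_SPECS = {
    # component: (required input types, produced output types); hypothetical names
    "SeismogramGen": ({"Rupture", "SGT"}, {"Seismogram"}),
    "PeakValCalc": ({"Seismogram"}, {"PeakValues"}),
}


def validate(template, specs, available_types):
    """Check that each step's input types come from the data or from earlier steps.

    template is an ordered list of component names. Returns (step, missing types)
    pairs; an empty list means the template is valid.
    """
    known = set(available_types)
    errors = []
    for step in template:
        needed, produced = specs[step]
        missing = needed - known
        if missing:
            errors.append((step, sorted(missing)))
        known |= produced
    return errors


def select_by_product(library, product_type, specs):
    """Return names of library workflows with a step producing the requested type."""
    return [
        name for name, template in library.items()
        if any(product_type in specs[step][1] for step in template)
    ]


library = {
    "peaks": ["SeismogramGen", "PeakValCalc"],
    "seis_only": ["SeismogramGen"],
}
ok = validate(library["peaks"], COMPONENT_SPECS, {"Rupture", "SGT"})
bad = validate(["PeakValCalc"], COMPONENT_SPECS, {"Rupture"})
hits = select_by_product(library, "PeakValues", COMPONENT_SPECS)
```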
11 WINGS extensions required for CyberShake Workflows
[Figure: CyberShake workflow diagram. Node and data labels include XYZinput, FD_GRID_XYZ, XYZGRD, BoxNameCheck, FileOfSGTNames, SGT, RVM, rupvars, SeisParamValues, SiteName, SeismogramGen_Li, GenSeisMetadata, seism, SeisMetadata, PeakValCalc_Okaya, and SA.]
- Handle nested file collections
- Handle many files and a large number of workflow instantiations (4626 instantiations of each component)
- Handle filenames; metadata no longer relies on filenames
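Instantiating one component over a nested file collection can be sketched as below, assuming collections are represented as nested Python lists. The `expand` function and the file names are hypothetical; WINGS represents collections semantically rather than as lists.

```python
def expand(component, collection, path=()):
    """Yield one (component, index path, file) instantiation per leaf file.

    collection is a nested list: a collection of collections of files.
    The index path records where the file sits, so no information has to be
    recovered from the file name itself.
    """
    for i, item in enumerate(collection):
        if isinstance(item, list):
            yield from expand(component, item, path + (i,))
        else:
            yield (component, path + (i,), item)


# Hypothetical nested collection: ruptures, each with its own variations.
ruptures = [
    ["127_6.var0", "127_6.var1"],
    ["127_7.var0"],
]
jobs = list(expand("SeismogramGen_Li", ruptures))
```

The same traversal scales to thousands of instantiations per component, since the template names the component once and the collection structure drives the expansion.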
12 Iterative workflow instantiation, mapping and execution
13 Workflow Instance
[Figure: workflow instance. Labels, top to bottom: XYZGRD, Boxnamecheck, SGT (e.g. SGT161, SGT282), SeisGen_Li, PeakValCalc. Example rupture variation files: 127_6.txt.variation-s000-h000, 127_7.txt.variation-s000-h000. Example seismogram files: Seismograms_PAS_127_6.grm, Seismograms_PAS_127_7.grm, Seismograms_PAS_151_11.grm. Example peak value files: PeakVals_allPAS_127_6.bsa, PeakVals_allPAS_127_7.bsa, PeakVals_allPAS_151_11.bsa.]
4,000 ruptures, >100,000 variations for a site
14 Technical Contributions: Semantic Metadata Approach to Creating Large Scientific Workflows
- Semantic representations of workflow templates to express repetitive computational structures and collections
- Expanding templates to instances that orchestrate large amounts of computation reflecting the workflow template structure
- Generating appropriate metadata descriptions for all the new data created during execution and full elaboration of workflow specs
- Ensuring validity of workflow instances (BindValidate algorithm)
- Keeping track of constraints on the datasets used, including global constraints among multiple components as well as local constraints within individual components
- Mapping equivalent datasets, detecting pre-existing intermediate data, and preventing unnecessary execution of workflow parts when datasets already exist
  - Allows Pegasus to identify the same data products
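The last contribution, skipping workflow parts whose data products already exist, can be sketched as a backward walk from the requested outputs. This is a simplified illustration and not the BindValidate algorithm or the Pegasus implementation: datasets are identified here by a hypothetical logical name, whereas the slides describe matching equivalent datasets through their metadata.

```python
def jobs_to_run(jobs, targets, existing):
    """Return the names of jobs that must run to produce the targets.

    jobs: {job_name: (inputs, outputs)}; existing: set of data already available.
    A job is needed only if it produces a required file that does not exist yet.
    """
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed = set()
    stack = [t for t in targets if t not in existing]
    while stack:
        lfn = stack.pop()
        job = producer.get(lfn)
        if job is None:
            raise LookupError(f"nothing produces {lfn} and it does not exist")
        if job in needed:
            continue
        needed.add(job)
        # Follow only those inputs that are not already available.
        stack.extend(i for i in jobs[job][0] if i not in existing)
    return needed


# Hypothetical three-job chain ending in peak values.
jobs = {
    "SGT": (["mesh"], ["sgt"]),
    "SeisGen": (["sgt", "rupture"], ["seismogram"]),
    "PeakValCalc": (["seismogram"], ["peaks"]),
}
full = jobs_to_run(jobs, ["peaks"], {"mesh", "rupture"})
reduced = jobs_to_run(jobs, ["peaks"], {"mesh", "rupture", "seismogram"})
```

When the seismogram is already available, only the final job remains and the upstream jobs are pruned.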
15 Publications
- SCEC CyberShake Workflows - Automating Probabilistic Seismic Hazard Analysis Calculations, Philip Maechling, Ewa Deelman, Li Zhao, Robert Graves, Gaurang Mehta, Nitin Gupta, John Mehringer, Carl Kesselman, Scott Callaghan, David Okaya, Hunter Francoeur, Vipin Gupta, Yifeng Cui, Karan Vahi, Thomas Jordan, Edward Field, in Workflows for e-Science, in press
- Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example, Ewa Deelman, Scott Callaghan, Edward Field, Hunter Francoeur, Robert Graves, Nitin Gupta, Vipin Gupta, Thomas H. Jordan, Carl Kesselman, Philip Maechling, John Mehringer, Gaurang Mehta, David Okaya, Karan Vahi, Li Zhao (Under review)
- Semantic Metadata Generation for Large Scientific Workflows, Jihie Kim, Yolanda Gil, and Varun Ratnakar (Under review)
- Wings for Pegasus: A Semantic Approach to Creating Very Large Scientific Workflows, Yolanda Gil, Varun Ratnakar, Ewa Deelman, Marc Spraragen, and Jihie Kim (Under review)