Pegasus: Mapping Scientific Workflows onto the Grid

Slides: 25
Provided by: deel3
Learn more at: https://pegasus.isi.edu
Transcript and Presenter's Notes


1
Pegasus: Mapping Scientific Workflows onto the Grid
  • Ewa Deelman
  • Center for Grid Technologies
  • USC Information Sciences Institute

2
Pegasus: Acknowledgements
  • Ewa Deelman, Carl Kesselman, Saurabh Khurana,
    Gaurang Mehta, Sonal Patil, Gurmeet Singh,
    Mei-Hui Su, Karan Vahi (Center for Grid
    Computing, ISI)
  • James Blythe, Yolanda Gil (Intelligent Systems
    Division, ISI)
  • Collaboration with Miron Livny (UW Madison)
  • http://pegasus.isi.edu
  • Research funded as part of the NSF GriPhyN, NVO
    and SCEC projects and EU-funded GridLab

3
Outline
  • Workflow Management in Grids
  • Pegasus, Planning for Execution in Grids
  • Applications Using Pegasus
  • In-time planning
  • Future Research Directions

4
Grid Applications
  • Increasing level of complexity
  • Use of individual application components
  • Reuse of individual intermediate data products
    (files)
  • Description of Data Products using Metadata
    Attributes
  • Execution environment is complex and very dynamic
  • Resources come and go
  • Data is replicated
  • Components can be found at various locations or
    staged in on demand
  • Separation between
  • the application description
  • the actual execution description

5
(No Transcript)
6
Why Automate Workflow Generation?
  • Usability: limit the Grid knowledge users must have
  • Monitoring and Directory Service
  • Replica Location Service
  • Complexity
  • User needs to make choices
  • Alternative application components
  • Alternative files
  • Alternative locations
  • The user may reach a dead end
  • Many different interdependencies may occur among
    components
  • Solution cost
  • Evaluate the alternative solution costs
  • Performance
  • Reliability
  • Resource Usage
  • Global cost
  • minimizing cost within a community or a virtual
    organization
  • requires reasoning about individual users'
    choices in light of other users' choices

7
GriPhyN's Executable Workflow Construction
  • Build an abstract workflow based on VDL
    descriptions (Chimera)
  • Build an executable workflow based on the
    abstract workflows (Pegasus)
  • Execute the workflow (Condor's DAGMan)

8
VDL and Abstract Workflow
VDL descriptions
User request: data file c
9
Condor's DAGMan
  • Developed at UW Madison (Livny)
  • Executes a concrete workflow
  • Makes sure the dependencies are followed
  • Executes the jobs specified in the workflow
  • Execution
  • Data movement
  • Catalog updates
  • Provides a rescue DAG in case of failure
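
The behaviour listed above can be pictured with a small sketch. This is not DAGMan's implementation or file format; it is a simplified Python analogue in which `run_dag`, the `parents` mapping, and the `run_job` callback are all hypothetical names. Jobs run only once their parents have succeeded, and whatever did not complete is returned as a "rescue" mapping that could be resubmitted later.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_dag(parents, run_job):
    """Run jobs in dependency order and report what is left to rescue.

    parents maps each job to the set of jobs it depends on.
    run_job(job) returns True on success, False on failure.
    Returns (completed, rescue): the completed jobs in execution order,
    and a mapping of unfinished jobs to their still-unmet dependencies.
    """
    order = list(TopologicalSorter(parents).static_order())
    done = set()
    for job in order:
        deps = parents.get(job, ())
        # A job is attempted only if every parent has already succeeded.
        if all(dep in done for dep in deps) and run_job(job):
            done.add(job)
    completed = [job for job in order if job in done]
    rescue = {
        job: set(parents.get(job, ())) - done
        for job in order
        if job not in done
    }
    return completed, rescue


# Hypothetical four-job workflow in which job "b" fails.
parents = {"b": {"a"}, "c": {"b"}, "d": {"a"}}
completed, rescue = run_dag(parents, lambda job: job != "b")
print(completed)
print(rescue)
```

In this example job b fails, so a and d complete, while b and its dependent c end up in the rescue mapping.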

10
Pegasus: Planning for Execution in Grids
  • Maps from abstract to concrete workflow
  • Algorithmic and AI-based techniques
  • Automatically locates physical locations for both
    components (transformations) and data
  • Finds appropriate resources to execute
  • Reuses existing data products where applicable
  • Publishes newly derived data products
  • Chimera virtual data catalog
  • Provides provenance information

11
Information Components Used by Pegasus
  • Globus Monitoring and Discovery Service (MDS)
  • Locates available resources
  • Finds resource properties
  • Dynamic: load, queue length
  • Static: location of GridFTP server, RLS, etc.
  • Globus Replica Location Service
  • Locates data that may be replicated
  • Registers new data products
  • Transformation Catalog
  • Locates installed executables

12
Example: Workflow Reduction
  • Original abstract workflow
  • If b already exists (as determined by query to
    the RLS), the workflow can be reduced
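
A minimal sketch of this reduction, assuming a two-job chain in which job d1 derives b from a and job d2 derives c from b. The job names, the `reduce_workflow` function, and the `existing` set (a stand-in for the result of an RLS query) are illustrative assumptions, not Pegasus code.

```python
def reduce_workflow(jobs, requested, existing):
    """Return the names of the jobs that still have to run.

    jobs: name -> {"inputs": set of files, "outputs": set of files}
    requested: files the user asked for
    existing: files already available (stand-in for an RLS query)
    """
    producer = {f: name for name, job in jobs.items() for f in job["outputs"]}
    needed = set()
    pending = [f for f in requested if f not in existing]
    while pending:
        f = pending.pop()
        name = producer.get(f)
        if name is None:
            raise ValueError(f"no job produces missing file {f!r}")
        if name in needed:
            continue
        needed.add(name)
        # Only inputs that do not exist yet pull in further jobs.
        pending.extend(i for i in jobs[name]["inputs"] if i not in existing)
    return needed


jobs = {
    "d1": {"inputs": {"a"}, "outputs": {"b"}},
    "d2": {"inputs": {"b"}, "outputs": {"c"}},
}
print(reduce_workflow(jobs, {"c"}, existing={"a"}))
print(reduce_workflow(jobs, {"c"}, existing={"a", "b"}))
```

With only a available, both jobs are kept; once b is also known to exist, only d2 remains.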

13
Mapping from abstract to concrete
  • Query RLS, MDS, and TC, schedule computation and
    data movement
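
The sketch below shows one way a single abstract job could be expanded into concrete steps once a site has been chosen. It is an illustration under stated assumptions, not the Pegasus algorithm: `concretize`, the tuple layout of the steps, and the site names are hypothetical, the `replicas` dictionary stands in for RLS, and inputs produced by earlier jobs in the same workflow are not handled.

```python
def concretize(job, site, replicas, output_site):
    """Expand one abstract job into an ordered list of concrete steps.

    job: {"name": str, "inputs": [lfn, ...], "outputs": [lfn, ...]}
    site: execution site chosen for the job
    replicas: lfn -> list of sites holding a copy (stand-in for RLS)
    output_site: site where the results should be delivered
    """
    steps = []
    for lfn in job["inputs"]:
        holders = replicas.get(lfn, [])
        if not holders:
            raise LookupError(f"no replica of {lfn!r}")
        if site not in holders:
            # Stage the input in from the first known replica.
            steps.append(("transfer", lfn, holders[0], site))
    steps.append(("execute", job["name"], site))
    for lfn in job["outputs"]:
        if output_site != site:
            steps.append(("transfer", lfn, site, output_site))
        # Publish the new data product so later requests can reuse it.
        steps.append(("register", lfn, output_site))
    return steps


job = {"name": "d2", "inputs": ["b"], "outputs": ["c"]}
for step in concretize(job, "ncsa", {"b": ["isi"]}, "isi"):
    print(step)
```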

14
Montage
  • Montage (NASA and NVO)
  • Deliver science-grade custom mosaics on demand
  • Produce mosaics from a wide range of data sources
    (possibly in different spectra)
  • User-specified parameters of projection,
    coordinates, size, rotation and spatial sampling.

Mosaic created by Pegasus-based Montage from a
run of the M101 galaxy images on the TeraGrid.
15
Small Montage Workflow
1200 nodes
16
Montage Acknowledgments
  • Bruce Berriman, John Good, Anastasia Laity,
    Caltech/IPAC
  • Joseph C. Jacob, Daniel S. Katz, JPL
  • http://montage.ipac.caltech.edu/
  • Testbed for Montage: Condor pools at USC/ISI, UW
    Madison, and TeraGrid resources at NCSA, PSC, and
    SDSC.
  • Montage is funded by the National Aeronautics
    and Space Administration's Earth Science
    Technology Office, Computational Technologies
    Project, under Cooperative Agreement Number
    NCC5-626 between NASA and the California
    Institute of Technology.

17
Applications Using Chimera, Pegasus and DAGMan
  • GriPhyN applications
  • High-energy physics: ATLAS, CMS (many)
  • Astronomy: SDSS (Fermilab, ANL)
  • Gravitational-wave physics: LIGO (Caltech, AEI)
  • Astronomy
  • Galaxy Morphology (NCSA, JHU, Fermi, many others,
    NVO-funded)
  • Biology
  • BLAST (ANL, PDQ-funded)
  • Neuroscience
  • Tomography for Telescience (SDSC, NIH-funded)

18
Current System
19
Workflow Refinement and Execution
[Figure: workflow refinement across levels of abstraction over execution time. Labels: users, request, application-level knowledge, relevant components, logical tasks, full abstract workflow, tasks bound to resources and sent for execution, partial execution (executed / not yet executed), workflow refinement, workflow repair, task matchmaker, policy info.]
20
Incremental Refinement
  • Partition the abstract workflow into partial
    workflows
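
The slide does not say how the partitioning is done. As one possible illustration, the sketch below groups jobs by their depth in the dependency graph, so that each level forms a partial workflow that can be refined and submitted only after the previous level has run. Level-based partitioning is an assumption made for this example, and `partition_by_level` is a hypothetical name.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def partition_by_level(parents):
    """Group jobs into partial workflows by depth in the dependency graph.

    parents maps each job to the set of jobs it depends on.
    Returns a list of partitions; partition k holds the jobs whose longest
    chain of ancestors has length k.
    """
    level = {}
    for job in TopologicalSorter(parents).static_order():
        deps = parents.get(job, ())
        level[job] = 1 + max((level[dep] for dep in deps), default=0)
    partitions = {}
    for job, lvl in level.items():
        partitions.setdefault(lvl, []).append(job)
    return [sorted(partitions[lvl]) for lvl in sorted(partitions)]


# Hypothetical diamond-shaped workflow: a feeds b and c, which feed d.
print(partition_by_level({"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}))
```

For the diamond-shaped example this yields three partial workflows: a alone, then b and c together, then d.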

21
Meta-DAGMan
22
Conclusions
  • Pegasus maps complex workflows onto the Grid
  • Uses Grid information services to find resources,
    data and executables
  • Reduces the workflow based on existing
    intermediate products
  • Used in many applications
  • Part of GriPhyN's Virtual Data Toolkit

23
Future Directions
  • Investigate various scheduling techniques
  • Investigate fault tolerance issues
  • Enable flexible interactions between workflow
    refiners (GriPhyN-wide scope: Pegasus, DAGMan)
  • http://pegasus.isi.edu
  • GGF10 workshop on workflow management
  • GGF Workflow management research group
  • deelman_at_isi.edu

24
Summary
  • The Future Grid
  • Knowledge-based reasoning about resources enables
  • Semantic matchmaking
  • Aggregate resource reasoning
  • Task-level reasoning to plan and schedule jobs
    and resources
  • More agility and coordination
  • Wide range of users can specify high-level
    requirements in a mixed-initiative mode
  • Mapping of high-level requirements to details
    required for execution
  • End-to-end resource negotiation and adaptive
    strategies to accommodate failure
  • The Grid Now
  • Syntax-based matchmaking of resources to job
    requirements
  • Condor matchmaker
  • Attribute-based discovery and selection
  • Scheduling of jobs based on Grid-able users that
    specify job execution sequences and computing
    requirements
  • Scripting languages
  • Workflow languages
  • Task graphs
  • Explicit mappings from task to jobs, simple job
    brokers
  • Explicit service negotiation and recovery
    strategies