A Task Pipelining Framework for eScience Workflow Management Systems

1
A Task Pipelining Framework for e-Science
Workflow Management Systems
  • Hyeong S. Kim (hskim_at_dcslab.snu.ac.kr)
  • In Soon Cho (ischo_at_dcslab.snu.ac.kr)
  • Heon. Y. Yeom (yeom_at_snu.ac.kr)
  • Dept. of Computer Science and Engineering
  • Seoul National University

2
Outline
  • Introduction
  • Motivation
  • HVEM Grid
  • Proposing System
  • PIPE File System
  • Conclusion

3
Introduction
  • Complex Scientific Workflow
  • Input/output data are becoming larger and larger
  • In most scientific workflows, we cannot ignore
    the time spent moving intermediate data, which
    accounts for a large portion of the running time
  • Our Focus
  • Staging is our primary concern.
  • We seek a way to pipeline multiple interconnected
    tasks
  • Applications can benefit if the output of the
    preceding task can be used by the succeeding task
    as soon as the data is ready
  • In this paper
  • We consider several components to enable task
    pipelining
  • As a reference implementation, we propose the
    PIPE File System (PFS), which supports various
    legacy applications without modifying them.
  • Our system can also be described in a workflow
    specification, so a user can construct a task
    pipelining framework with no effort beyond
    providing a workflow specification for the PFS.
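
The pipelining idea on this slide can be illustrated with a minimal sketch. It is not the PFS implementation: it assumes two in-process tasks connected by a bounded queue, and the names `producer`, `consumer`, `run_pipelined`, and `CHUNKS` are hypothetical. It only shows that the later task can start working before the earlier task has finished writing all of its output.

```python
import queue
import threading

CHUNKS = 5          # hypothetical number of intermediate data chunks
_DONE = object()    # sentinel marking the end of the producer's output


def producer(out_q, log):
    """Earlier task: emits its intermediate data chunk by chunk."""
    for i in range(CHUNKS):
        log.append(("produced", i))
        out_q.put(i)            # blocks while the bounded queue is full
    out_q.put(_DONE)


def consumer(in_q, log, results):
    """Later task: processes each chunk as soon as it becomes available."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        log.append(("consumed", item))
        results.append(item * item)   # stand-in for real processing


def run_pipelined():
    """Run both tasks concurrently and return the event log and results."""
    q = queue.Queue(maxsize=1)        # small buffer forces interleaving
    log, results = [], []
    t_prod = threading.Thread(target=producer, args=(q, log))
    t_cons = threading.Thread(target=consumer, args=(q, log, results))
    t_cons.start()
    t_prod.start()
    t_prod.join()
    t_cons.join()
    return log, results
```

Calling `run_pipelined()` returns a log in which consumption of the first chunk appears before production of the last one, which is the overlap that staging-based execution cannot provide.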

4
Motivating Application: HVEM Grid
  • HVEM (High Voltage Electron Microscope),
    financially supported by the Ministry of Science
    and Technology in Korea.
  • The HVEM was installed in October 2003 at the
    headquarters of the Korea Basic Science Institute
    (KBSI), a national user facility.
  • The main purpose is to offer a leading-edge
    analytical technology to researchers in diverse
    scientific fields.

5
HVEM Grid System
  • The High Voltage Electron Microscope (HVEM) Grid
    system is a tool built on the concepts of Grid
    and Web Services
  • To control instruments remotely
  • To manage and control 3-D processing of images
  • To store data automatically

6
Image Processing (G-Render)
  • Grid-based image processing system
  • 4-step image processing service
  • 1) Image preprocessing
  • 2) Image alignment
  • 3) Tomogram generation
  • 4) Segmentation
  • Enabling high-performance image processing by
    using the Grid to obtain large amounts of
    computing power

7
Grid Workflow Management System
  • [Figure: Grid workflow management system
    architecture; labels recovered from the diagram]
  • Grid users
  • Build time: Grid Workflow Application Modeling
    and Definition Tools, Workflow Design and
    Definition, Grid Workflow Specification
  • Run time: Grid Workflow Enactment Service
    (Workflow Execution Control, Workflow
    Scheduling, Data Movement, Fault Management)
  • Grid Information Services: Resource Info
    Service, Application Info Service
  • Grid Middleware: Interaction with Grid Resources
  • Grid Resources

8
Design Consideration
  • Application-transparency
  • Supporting various legacy applications
  • Flexibility
  • Providing a general solution
  • Usability

9
Components Required
  • Workflow engine
  • If a sufficient amount of data is available, run
    the next task immediately
  • Storage manager
  • Manage storage
  • Logical to physical mapping
  • Directory management
  • Advertise data availability to workflow engine
  • Physical storage
  • Store input/output files
  • Handle read/write operations
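
The workflow-engine behaviour listed above (start the next task once enough data is available) might be sketched as follows. This is an illustration under assumptions, not the authors' engine: the class name `WorkflowEngineSketch`, the byte threshold, and the method names are all hypothetical, and "starting" a task is reduced to recording its name.

```python
class WorkflowEngineSketch:
    """Hypothetical engine that starts a task once its input has enough data."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes   # assumed meaning of "sufficient"
        self.available = {}                # logical file -> bytes advertised
        self.waiting = {}                  # logical file -> tasks waiting on it
        self.started = []                  # tasks started so far, in order

    def add_dependency(self, task, input_file):
        """Register that `task` reads `input_file`."""
        self.waiting.setdefault(input_file, []).append(task)

    def on_data_available(self, logical_file, nbytes):
        """Called when the storage manager advertises newly written data."""
        total = self.available.get(logical_file, 0) + nbytes
        self.available[logical_file] = total
        if total >= self.threshold:
            # Start every task waiting on this file exactly once.
            for task in self.waiting.pop(logical_file, []):
                self.started.append(task)
```

For example, with a threshold of 100 bytes, a task registered on `/run1/pre.mrc` stays pending after a 60-byte advertisement and is started after a further 50 bytes arrive.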

10
Reference Implementation
  • PIPE File System (PFS) consists of
  • PFS Manager (storage manager)
  • PFS Data Servers (physical storage)
  • PFS Library (user-transparency by FUSE)

11
PFS Manager
  • Resource Management
  • Storage management
  • PFS Data Server Maintenance
  • Logical to physical file mapping
  • Directory management
  • Single access point for the clients
  • Clients can mount the PIPE File System and
    manipulate files as usual
  • Schedule Triggering
  • Advertise data availability to workflow engine
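
The manager responsibilities on this slide (logical-to-physical mapping, directory management, advertising data availability) can be sketched in a few lines. Everything here is an assumption made for illustration: the class `PFSManagerSketch`, the round-robin placement policy, the `/pfs_store/` path prefix, and the `notify` callback are hypothetical and are not taken from the presentation.

```python
class PFSManagerSketch:
    """Hypothetical in-memory stand-in for a storage manager."""

    def __init__(self, data_servers, notify):
        self.servers = list(data_servers)  # names of available data servers
        self.mapping = {}                  # logical path -> (server, physical path)
        self.notify = notify               # callback to the workflow engine
        self._next = 0                     # round-robin cursor (assumed policy)

    def create(self, logical_path):
        """Assign a data server and a physical location to a logical file."""
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        physical = "/pfs_store/" + logical_path.strip("/").replace("/", "_")
        self.mapping[logical_path] = (server, physical)
        return self.mapping[logical_path]

    def lookup(self, logical_path):
        """Resolve a logical path to its (server, physical path) pair."""
        return self.mapping[logical_path]

    def listdir(self, directory):
        """List the files directly inside a logical directory."""
        prefix = directory.rstrip("/") + "/"
        names = []
        for path in self.mapping:
            rest = path[len(prefix):]
            if path.startswith(prefix) and rest and "/" not in rest:
                names.append(rest)
        return sorted(names)

    def report_write(self, logical_path, nbytes):
        """Advertise newly written data to the workflow engine."""
        self.notify(logical_path, nbytes)
```

In this sketch the `notify` callback is the only coupling to the workflow engine, which mirrors the "Schedule Triggering" bullet: the manager reports availability and leaves the decision to start a task to the engine.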

12
PFS Data Servers
  • Physical File Management
  • Store input/output files in its local storage
  • Serve read/write operations

13
PFS Library
  • User-level Library for Application
  • Two components
  • FUSE kernel module
  • Intercept I/O system call
  • Redirect the system call to the PFS Client
  • PFS Client
  • Interpret the I/O system call
  • Redirect the command to PFS Manager or PFS Data
    Server
  • Maintains the open file list
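
The division of labour described for the PFS Client (metadata requests go to the manager, data requests go to a data server, and an open file list is kept locally) can be sketched as below. This does not use FUSE and is not the actual client: `FakeManager`, `FakeDataServer`, and `PFSClientSketch` are hypothetical in-memory stand-ins, and the method signatures are assumptions chosen to keep the example short.

```python
import itertools


class FakeManager:
    """Hypothetical manager: maps each logical path to one data server."""

    def __init__(self):
        self.mapping = {}

    def lookup_or_create(self, path):
        return self.mapping.setdefault(path, ("server-0", path))


class FakeDataServer:
    """Hypothetical data server holding file contents in memory."""

    def __init__(self):
        self.blobs = {}

    def write(self, physical, offset, data):
        buf = self.blobs.setdefault(physical, bytearray())
        if len(buf) < offset:                 # pad a sparse write with zeros
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)

    def read(self, physical, offset, size):
        return bytes(self.blobs.get(physical, b"")[offset:offset + size])


class PFSClientSketch:
    """Routes metadata calls to the manager and data calls to a data server."""

    def __init__(self, manager, servers):
        self.manager = manager
        self.servers = servers                # server name -> FakeDataServer
        self.open_files = {}                  # fd -> (server name, physical path)
        self._fds = itertools.count(3)        # descriptors start after stdio

    def open(self, path):
        server, physical = self.manager.lookup_or_create(path)   # metadata
        fd = next(self._fds)
        self.open_files[fd] = (server, physical)
        return fd

    def write(self, fd, offset, data):
        server, physical = self.open_files[fd]
        return self.servers[server].write(physical, offset, data)  # data path

    def read(self, fd, offset, size):
        server, physical = self.open_files[fd]
        return self.servers[server].read(physical, offset, size)   # data path

    def close(self, fd):
        del self.open_files[fd]
```

In the real system the calls would arrive from the FUSE kernel module rather than from direct method invocations; the sketch only shows the routing decision and the open file list.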

14
Integration
  • [Figure: integration of the workflow scheduler
    with PFS. Diagram labels: Workflow scheduler,
    enactment, metadata, p, fuse, PFS Manager, data,
    PFS Data Server]

15
Conclusion
  • We propose a task pipelining framework
  • Our system provides task pipelining in the form
    of a simple distributed file system
  • A triggering interface is used to enact the next
    task