Title: A Grid-based Extensible, Composable Service Execution
- Karpjoo Jeong (jeongk_at_konkuk.ac.kr)
- Konkuk University
- Suntae Hwang (sthwang_at_kookmin.ac.kr)
- Kookmin University
MGrid: An Integrated and Shared Molecular Simulation Grid/e-Science Infrastructure (Korea e-Science Initiative)
- Karpjoo Jeong (jeongk_at_konkuk.ac.kr)
- Konkuk University, KISTI
- Co-PIs
- Seunho Jung, Konkuk University
- Suntae Hwang, Kookmin University
- Yoonsup Lee, KAIST
Molecular Simulation Computation
- Wide Application Areas
- Nanotechnology
- Biotechnology
- Medical Research
- Mechanical Engineering
- Etc.
Major Obstacles to Effective Molecular Simulation
- Obstacle: Enormous Computational Requirements
- E.g., a protein simulation may take months even on supercomputers
- Solution: Grid Computing
- Cost-effective large-scale computing infrastructure built by sharing and aggregating computing resources
- New obstacle: Complexity
- Grid computing is still too difficult for scientists
- Obstacle: Simulation Result Validation
- Different parameter settings or simulation tools may produce different results, even for the same molecules
- Solution: Comparative Study
- Perform simulations with various parameter settings and tools
- Comparatively analyze related simulation results
- New obstacle: Exponential Increase in Computational Requirements
- Individual scientists or institutes may not be able to afford this approach
MGrid Approach: Integrated and Shared
- Integrated Grid Environment
- Web-based PSE, Computing, Databases, and Analyses
- Shared Environment
- Sharing of Simulation Results
- Comparative and Collective Analyses
- Research Community
- Promote Collaborative Simulation Efforts
MGrid Approach: Collaborative Research Environment
Deciding on Publication from the PSE
- Publish simulation results
- Select the destination: the general-purpose Semantic Grid or e-Glycoconjugates
Register Simulation Results (Insert Metainformation) into e-Glycoconjugates
Search Simulation Results in e-Glycoconjugates
Download Simulation Jobs into the PSE
Analysis and Visualization Using Plug-in Programs
Re-run Simulation Jobs: Download Jobs to a PC
Re-run Simulation Jobs: Create a New Job
Re-run Simulation Jobs: Edit Script
Re-run Simulation Jobs: Upload Input Files
- Upload related files such as 3D coordinate, parameter, or topology files
Re-run Simulation Jobs: Execute
- Click the Auto or Manual button to run the job
MGrid Software Architecture
- Single System View
- Centralized monitoring and control
Diagram components: PSE, scheduling, legacy software support through a standard interface, clusters, and computational/data/semantic grids
MGrid Structure
Diagram components: multiple PSEs, a Grid Portal, an XML interface, and multiple Distributed Job Servers
Challenge: Application-Specific Support on the General Grid System Structure
Diagram: the Client PSE issues an application-specific request; the Grid Middleware (e.g., Globus) and the Local Resource Manager (e.g., PBS) provide application-independent connection, integration, and management; the Simulation Server provides the application-specific service
MGrid Approach: Shared Information Infrastructure
Diagram components: Client PSE, Grid Portal, Grid Middleware (e.g., Globus), Local Resource Manager (e.g., PBS), Simulation Server, Distributed Directory Repository, and Job Metadata Management (identity, configuration, etc.)
Diagram components: Web Browser, Portal Framework, Job Management, Active File System, Global ID Service, Grid Middleware (e.g., Globus), Local Job Management System (e.g., PBS, LSF), and Simulation Server
Diagram (Globus Toolkit 2 version) components: Web Browser, Portal Framework, MGrid Job Management, Active File System, Global ID Service, RSL <invoker XML>, GRAM, Invoker, Job Execution Framework, Information Provider, Event Manager, (MSM) Job Manager, Shared Repository, PBS/LSF/Condor, and (WSM) Wrapper Legacy Driver
Diagram (Globus Toolkit 4 version) components: Web Browser, Portal Framework, MGrid Job Management, Active File System, Global ID Service, RSL Version 2, JSDL, WS-GRAM, Job Execution Framework, Job Factory, (MSM) Job Manager, PBS/LSF/Condor, Information Provider, Event Manager, Shared Repository, and multiple (WSM) Wrapper Legacy Drivers
Legacy Simulation Package
- Client software with convenient utilities
- E.g., GUI, visualization, molecule building, and data management tools
- Simulation program (a kind of engine)
- A kind of script interpreter
- Assumes a working directory where the script file, input data files, and output files are stored
Diagram: Analysis/Visualization Tools and GUI Utilities on the client side; a Simulation Engine driven by a script; internal data files are program-specific
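The slide describes the simulation program as a script interpreter bound to a working directory. A minimal Python sketch of invoking such an engine, assuming it reads its script on standard input and prints its log to standard output, as many legacy engines do; `run_legacy_engine` is a hypothetical helper, not part of MGrid, and the demo uses the Python interpreter as a stand-in engine.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_legacy_engine(engine_cmd: list[str], workdir: Path, script_name: str,
                      log_name: str) -> int:
    """Run a script-driven engine inside its working directory (hypothetical helper).

    The script is fed on standard input and everything the engine prints is
    captured in a log file in the same directory, so script, outputs, and log
    all end up in one place.
    """
    with (workdir / script_name).open("rb") as script, \
            (workdir / log_name).open("wb") as log:
        result = subprocess.run(engine_cmd, stdin=script, stdout=log,
                                stderr=subprocess.STDOUT, cwd=workdir)
    return result.returncode


def demo():
    """Use the Python interpreter as a stand-in engine that reads its script from stdin."""
    with tempfile.TemporaryDirectory() as tmp:
        wd = Path(tmp)
        (wd / "run.inp").write_text(
            "with open('result.out', 'w') as f:\n"
            "    f.write('done')\n"
            "print('finished')\n")
        rc = run_legacy_engine([sys.executable, "-"], wd, "run.inp", "run.log")
        return rc, (wd / "result.out").read_text(), (wd / "run.log").read_text().strip()


if __name__ == "__main__":
    print(demo())
```

Because the engine only ever sees its working directory, replicating that directory is enough to move a run between machines.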
- Integration Approach
- Fine-grained data management: deal with each file individually (complicated)
- Coarse-grained data management: treat the directory as a unit (simple)
Diagram components (one panel per approach): Grid Middleware, Simulation Engine, script
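To illustrate the coarse-grained option, the sketch below treats a whole working directory as one unit by packing it into a single archive that grid middleware could move in one transfer. This is an assumption-laden sketch: the transfer itself (e.g., over GridFTP) is out of scope, and `pack_directory`/`unpack_directory` are hypothetical names.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def pack_directory(workdir: Path) -> bytes:
    """Serialize a whole working directory into one gzip-compressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in sorted(workdir.rglob("*")):
            # Store paths relative to the directory root; rglob already recurses.
            tar.add(path, arcname=path.relative_to(workdir).as_posix(), recursive=False)
    return buf.getvalue()


def unpack_directory(archive: bytes, target: Path) -> None:
    """Recreate a working directory from an archive made by pack_directory."""
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and links escaping the target.
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)


def demo():
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        s, replica = Path(src), Path(dst) / "replica"
        (s / "run.inp").write_text("script")
        (s / "out").mkdir()
        (s / "out" / "traj.log").write_text("log data")
        unpack_directory(pack_directory(s), replica)
        return (replica / "run.inp").read_text(), (replica / "out" / "traj.log").read_text()
```

The fine-grained alternative would need a decision per file, which is exactly the complexity the slide attributes to it.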
System Structure
Diagram components: Client System with PSE, Simulation Service Server, Global Scheduler, Legacy Simulation Package and Legacy Simulation Package Management System (shown on several nodes), Remote Program Invocation System, Remote Monitoring System, and Simulation Working Directories kept consistent by synchronization
Challenging Design Issues
- Parametric Design
- Defined as simulation-system(x), where x is CHARMM, GAUSSIAN, or AMBER
- Minimize and localize the dependency on legacy software
- Current design: an interface for legacy software
- Remote program invocation
- Replicating working directories
- Standard interface for the legacy simulation software management system
Diagram: control messages flow among the PSE, the Remote Program Invocation System, and the Legacy Simulation Package Management System; legacy software-specific data is handled by replicated directory synchronization between the Simulation Working Directories
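One way to read "simulation-system(x)" is a single lookup that returns everything package-specific, so the rest of the system stays package-neutral. The sketch below assumes that reading; `PackageProfile`, `simulation_system`, and `build_job` are hypothetical, and the executable names and log suffixes are placeholders rather than verified defaults of CHARMM, GAUSSIAN, or AMBER.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageProfile:
    """Everything package-specific, kept in one place (values below are illustrative)."""
    name: str
    executable: str   # assumed executable name on the compute server
    log_suffix: str   # assumed suffix of the main log file


_REGISTRY: dict[str, PackageProfile] = {}


def register(profile: PackageProfile) -> None:
    _REGISTRY[profile.name.upper()] = profile


def simulation_system(x: str) -> PackageProfile:
    """Resolve simulation-system(x); the only place that branches on the package."""
    try:
        return _REGISTRY[x.upper()]
    except KeyError:
        raise ValueError(f"unsupported simulation package: {x}") from None


def build_job(x: str, script: str) -> dict[str, str]:
    """Package-neutral job description; package details come only from the profile."""
    profile = simulation_system(x)
    stem = script.rsplit(".", 1)[0]
    return {"package": profile.name, "executable": profile.executable,
            "script": script, "log": stem + profile.log_suffix}


# Placeholder profiles: executable names and suffixes are assumptions.
for _name, _exe, _suffix in [("CHARMM", "charmm", ".out"),
                             ("GAUSSIAN", "g09", ".log"),
                             ("AMBER", "sander", ".mdout")]:
    register(PackageProfile(_name, _exe, _suffix))
```

Adding a fourth package then means registering one more profile, without touching remote invocation or directory replication.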
- Working Directory Replication
- Some data files are architecture-dependent binary data
- Some data files, such as log files, are very large (e.g., a few hundred MB or more)
- Remote program invocation and directory replication must be synchronized
- Current design: an intelligent replication mechanism controlled by the legacy simulation package management system
Diagram: same components as the previous slide (control message flow, replicated directory synchronization)
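The slide names the problems (binary files, very large logs) but not the replication rules. A minimal sketch of one possible policy, assuming the management system skips files by suffix or size and re-sends only files whose size or modification time changed since the last sync; `ReplicationPolicy`, its thresholds, and the `.bin` suffix are hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ReplicationPolicy:
    """Hypothetical policy owned by the package management system."""
    max_bytes: int = 100 * 1024 * 1024          # skip files larger than this (e.g., huge logs)
    skip_suffixes: tuple[str, ...] = (".bin",)  # assumed architecture-dependent binaries


def files_to_replicate(workdir: Path, last_synced: dict[str, tuple[int, int]],
                       policy: ReplicationPolicy) -> list[str]:
    """Return relative paths that are new or changed and allowed by the policy.

    `last_synced` maps relative path -> (size, mtime_ns) recorded at the previous sync.
    """
    selected = []
    for path in sorted(p for p in workdir.rglob("*") if p.is_file()):
        rel = path.relative_to(workdir).as_posix()
        st = path.stat()
        if path.suffix in policy.skip_suffixes or st.st_size > policy.max_bytes:
            continue
        if last_synced.get(rel) != (st.st_size, st.st_mtime_ns):
            selected.append(rel)
    return selected


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        wd = Path(tmp)
        (wd / "small.inp").write_text("abc")
        (wd / "big.log").write_text("x" * 50)
        (wd / "arch.bin").write_bytes(b"\x00\x01")
        policy = ReplicationPolicy(max_bytes=10)
        first = files_to_replicate(wd, {}, policy)
        st = (wd / "small.inp").stat()
        synced = {"small.inp": (st.st_size, st.st_mtime_ns)}
        unchanged = files_to_replicate(wd, synced, policy)
        (wd / "small.inp").write_text("abcd")
        changed = files_to_replicate(wd, synced, policy)
        return first, unchanged, changed
```

Size-plus-mtime is a cheap change test; a content hash would be more robust at the cost of reading every file.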
- Real-time Remote Monitoring of Legacy Software Execution
- Complicated by grid-unaware legacy software, remote execution, and application-dependent monitoring data (e.g., data files or plotted graphs)
- Current design: support remote execution of the traditional monitoring methods scientists already use
- Simplified by replicating working directories
- Performance issue: local visualization vs. remote visualization
Diagram components: PSE, Remote Program Invocation System, Legacy Simulation Package Management System, Simulation Working Directories synchronized by copying the simulation directory, Local Visualization, Remote Visualization
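Once the working directory is replicated, a traditional grep-style check can run locally against the replica. The sketch assumes a made-up log line format (`ENERGY> step=... total=...`); a real package's monitoring keyword and layout would replace the hypothetical `ENERGY_LINE` pattern.

```python
import re
from typing import Iterable

# Hypothetical log format, e.g. "ENERGY> step=10 total=-1234.5".
ENERGY_LINE = re.compile(r"^ENERGY>\s+step=(\d+)\s+total=(-?\d+(?:\.\d+)?)")


def energy_series(lines: Iterable[str]) -> list[tuple[int, float]]:
    """grep-style filter over a replicated log: keep matching lines, parse their fields."""
    series = []
    for line in lines:
        match = ENERGY_LINE.match(line)
        if match:
            series.append((int(match.group(1)), float(match.group(2))))
    return series
```

The resulting (step, energy) pairs could be written out for a plotting tool such as gnuplot. Running this on the local replica rather than on the compute server is the local-visualization side of the performance trade-off above.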
CHARMM-based Prototype
Diagram components: Client System with PSE; Simulation Service Server running CHARMM; CHARMM Management System (in Python and in JavaCoG); GRAM-based remote program invocation; GridFTP-based synchronization between Simulation Data Repositories; grep and gnuplot
Virtual Directory-based Design
- Logically allocated to each simulation job and shared by PSEs and computing servers
- Physically implemented by data grids with GridFTP
Diagram labels: metadata, command, output files
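The logical/physical split can be pictured as a small allocator: each job gets a logical path, and physical replicas are attached to it as they appear. A sketch under assumptions: `VirtualDirectoryService`, the `/mgrid/jobs` root, and the `gsiftp://` URLs are illustrative, not MGrid's actual naming.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class VirtualDirectory:
    """Logical per-job directory; replicas are physical locations (e.g., gsiftp:// URLs)."""
    job_id: str
    logical_path: str
    replicas: list[str] = field(default_factory=list)


class VirtualDirectoryService:
    """Hypothetical allocator shared by PSEs and computing servers."""

    def __init__(self, root: str = "/mgrid/jobs") -> None:
        self._root = root
        self._counter = itertools.count(1)
        self._dirs: dict[str, VirtualDirectory] = {}

    def allocate(self) -> VirtualDirectory:
        """Allocate one logical directory per simulation job."""
        job_id = f"job-{next(self._counter):06d}"
        vdir = VirtualDirectory(job_id, f"{self._root}/{job_id}")
        self._dirs[job_id] = vdir
        return vdir

    def add_replica(self, job_id: str, url: str) -> None:
        """Record a physical copy held by the data grid."""
        self._dirs[job_id].replicas.append(url)

    def resolve(self, job_id: str) -> list[str]:
        """Physical locations a PSE or server can read the job's files from."""
        return list(self._dirs[job_id].replicas)
```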
Decoupling of Control and Data Channels
- Control channels: do not deal with the data inside files
- Data channels: do not deal with computing control
Diagram labels: metadata, command, output files
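The decoupling can be expressed in the message types themselves: a control message names a job, a command, and a virtual directory, but never carries file bytes, while the data channel moves bytes and knows nothing about commands. An in-memory sketch; `ControlMessage`, `DataChannel`, and `handle` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Command(Enum):
    SUBMIT = "submit"
    CANCEL = "cancel"
    STATUS = "status"


@dataclass(frozen=True)
class ControlMessage:
    """Control channel: which job, what to do, and where its data lives (by name only)."""
    job_id: str
    command: Command
    virtual_dir: str


class DataChannel:
    """Data channel: stores and fetches file bytes per virtual directory; no job state."""

    def __init__(self) -> None:
        self._files: dict[tuple[str, str], bytes] = {}

    def put(self, virtual_dir: str, name: str, data: bytes) -> None:
        self._files[(virtual_dir, name)] = data

    def get(self, virtual_dir: str, name: str) -> bytes:
        return self._files[(virtual_dir, name)]

    def names(self, virtual_dir: str) -> list[str]:
        return sorted(n for (d, n) in self._files if d == virtual_dir)


def handle(msg: ControlMessage, data: DataChannel) -> str:
    """Control-side handler: may list file names (metadata) but never reads contents."""
    if msg.command is Command.SUBMIT:
        return f"start {msg.job_id} with inputs {data.names(msg.virtual_dir)}"
    return f"{msg.command.value} {msg.job_id}"
```

Because the control side never parses file contents, it stays application-independent; only the data side has to understand package-specific files.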
MGrid Data Grids
- Distributed Simulation Result Repository
- A collection of virtual directories for simulation results
- Supports global access
- Information Service
- Automatically maintains metadata about simulation jobs
Diagram labels: Information Service, Relocate
Synchronization Issues in Data Grids
- Synchronization between control and data channels
- Synchronization between the PSE and data grids
- Synchronization between computing servers and data grids
Diagram labels: metadata, command, output files
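For the first issue, the two channels can disagree: the control channel may report a job as finished before its outputs have reached the data grid. A minimal sketch, assuming results count as visible only once both channels have signaled; `ResultBarrier` is a hypothetical name and MGrid's actual protocol is not described here.

```python
import threading
import time


class ResultBarrier:
    """Hypothetical sync point: results are visible only after BOTH the control
    channel reports completion and the data channel finishes transferring outputs."""

    def __init__(self) -> None:
        self._control_done = threading.Event()
        self._data_done = threading.Event()

    def control_reports_complete(self) -> None:
        self._control_done.set()

    def data_reports_transferred(self) -> None:
        self._data_done.set()

    def wait_until_visible(self, timeout: float) -> bool:
        """Wait for both signals; return False if `timeout` seconds pass first."""
        deadline = time.monotonic() + timeout
        for event in (self._control_done, self._data_done):
            if not event.wait(max(0.0, deadline - time.monotonic())):
                return False
        return True
```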
Active File System: Development of PSE for BT Applications on the Computational Grid
- Suntae Hwang
- School of Computer Science
- Kookmin University
- sthwang_at_kookmin.ac.kr
Our Approach for PSE
- Integrate legacy software utilities with our PSE
- Allow scientists to use the client software they already have
- E.g., visualization and molecular structure building/analysis tools
- Design and implement a gluing system
- Workflow Management System
- Allow scientists to plan and execute experiments (a set of simulation tasks and human intervention tasks) in a workflow style
- Designed to support a single system view
- Simulation tasks look as if they were run locally
- Allow centralized monitoring and control
- Actual simulation execution is delegated to a distributed simulation platform through grid middleware
- Support BT applications first, then extend to other similar applications
- Chiral Separation by Cyclocarbohydrates (Konkuk University)
- MGrid: A Molecular Simulation Grid
Experiment: Chiral Separation Database
- Differentiate chiral drug candidates (pairs) by docking them with chiral selectors
- Chiral drug candidates (guests): 1,000 for now
- Chiral selectors (hosts): 50 for now
- Motivation for molecular simulation
- So far, real experiments have mostly been used, but they take a couple of years for a single guest-host pair. Selecting the right host is very important, and molecular simulation takes much less time
- Develop a host prediction method by building databases of host-guest docking
- Estimated computation time
- On a single workstation, molecular simulation of a single guest-host pair takes about two weeks
- Molecular simulation of 1,000 × 50 pairs would take about 2,000 years on a single workstation. With MGrid, we can shorten this time significantly.
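The 2,000-year figure follows from the slide's own numbers (1,000 guests, 50 hosts, about two weeks per pair on one workstation):

```latex
\[
1000 \times 50 = 50{,}000\ \text{pairs}, \qquad
50{,}000 \times 2\ \text{weeks} = 100{,}000\ \text{weeks}
\approx \frac{100{,}000}{52}\ \text{years} \approx 1{,}900\ \text{years}
\]
```

This rounds to the slide's "about 2,000 years". Under the idealized assumption that pairs are independent and workstations are equivalent, spreading the pairs over N workstations divides this time by roughly N.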
PSE Design Issues
- Workflow Management
- Manage dependencies between sub-workflows manually
- Using the PSE client and many legacy software utilities
- Manage dependencies between tasks with an automatic engine
- Activate tasks by dependencies
- Forward triggering
- By automatic management
- Using complete data products
- Sometimes requires user confirmation to trigger
- Activate tasks on users' demand for results
- Backward triggering
- For monitoring/analysis
- Using intermediate data products
Diagram: tasks grouped into three sub-workflows
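The two triggering modes can be contrasted in a small dependency engine: forward triggering starts any task whose inputs are complete (optionally asking the user first), while backward triggering starts from the result the user wants and pulls in missing prerequisites. A simplified sketch: `Task` and `WorkflowEngine` are hypothetical, the graph is assumed acyclic, and intermediate data products are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    deps: list[str] = field(default_factory=list)
    needs_confirmation: bool = False   # forward triggering may ask the user first


class WorkflowEngine:
    """Hypothetical dependency engine; assumes the task graph is acyclic."""

    def __init__(self, tasks: list[Task]) -> None:
        self.tasks = {t.name: t for t in tasks}
        self.done: set[str] = set()
        self.log: list[str] = []

    def _run(self, name: str) -> None:
        self.log.append(name)   # stand-in for dispatching the task and collecting its product
        self.done.add(name)

    def forward(self, confirm: Callable[[str], bool] = lambda name: True) -> None:
        """Forward triggering: start each task once all of its inputs are complete."""
        progressed = True
        while progressed:
            progressed = False
            for task in self.tasks.values():
                ready = task.name not in self.done and all(d in self.done for d in task.deps)
                if ready and (not task.needs_confirmation or confirm(task.name)):
                    self._run(task.name)
                    progressed = True

    def backward(self, target: str) -> None:
        """Backward triggering: the user demands `target`; run missing prerequisites first.

        The user's request doubles as confirmation, so needs_confirmation is not checked.
        """
        for dep in self.tasks[target].deps:
            if dep not in self.done:
                self.backward(dep)
        if target not in self.done:
            self._run(target)
```

In the usage below, forward triggering stops at a task awaiting confirmation, and a later demand for the final result completes the chain.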
PSE Design Issues (cont.)
- Gluing system
- The user manually handles the flow of interactive tasks (automatic workflows are not built for them because they are too complicated)
- A glue system is needed to keep all activities, both user interactions and tasks, coordinated
Diagram components: PSE; interactive software utilities; job preparation tool; workflow management for sub-workflows (MC Docking of beta-cyclodextrin and N-acetyltyrosine, MD Simulation, Compute Energy Field)
PSE Design Issues (cont.)
- Product-oriented view of tasks
- A task consists of a Product List and an associated Creator
- All application data that affect inter-task dependencies must belong to some Product List
- All tasks (manual or automatic) that access application data are synchronized through the Product List
Diagram components: Structure Building (beta-cyclodextrin), Structure Building (N-acetyltyrosine), MC Docking (beta-cyclodextrin, N-acetyltyrosine), and MD Simulation, with application data
PSE on Active File System
Diagram components: PSE (job preparation tool, interactive software utilities, workflow management for sub-workflows); tasks Structure Building (beta-cyclodextrin), Structure Building (N-acetyltyrosine), MC Docking (beta-cyclodextrin, N-acetyltyrosine), and MD Simulation; Creator; Product List; Active File System with FTP imports and CHARMM inputs mc-doc.inp and md-sim.inp; normal file access and file path information for the ordinary file system; file access through the Product List
Components of Active File System: Creator
- Creator
- Consists of an input file list and an output file list
- The two lists must contain all associated file names completely
- All input file names must be mapped to active files
- Zero or more output file names are mapped to active files in the associated Product List
- Special kinds of creator
- Interactive utilities that are aware of the Active File System can be creators
- E.g., FTP between the ordinary file system and the Active File System (import)
- An Active File System-enabled text editor (saves files to a Product List)
- Input/output file lists
- Generated automatically or manually by either the Active File Manager or the Job Preparation Tool
- Map information
- Filled in automatically or manually by either the Active File Manager or the Job Preparation Tool
- Resource information
- Filled in automatically by the scheduler or manually by other tools
Diagram: Creator with Inputs, Resource Information, and Outputs
Components of Active File System: Product List
- Product List
- Contains zero or more active files
- Must have exactly one associated Creator
- Can be updated dynamically by adding/removing active files whenever the user decides to see/discard them
Diagram components: Creators with their inputs and outputs
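The Creator and Product List rules above map naturally onto two small data structures. A sketch under assumptions: the field names, the `unmapped_inputs` check, and the rule that a Product List only accepts its own Creator's outputs are illustrative readings of the slides, not the Active File System's actual API; file names in the usage are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Creator:
    """A task's complete input/output file lists plus mapping and resource information."""
    name: str
    inputs: list[str]
    outputs: list[str]
    map_info: dict[str, str] = field(default_factory=dict)   # file name -> active file id
    resource: Optional[str] = None   # filled in by the scheduler or another tool

    def unmapped_inputs(self) -> list[str]:
        """Inputs not yet mapped to an active file; must be empty before dispatch."""
        return [f for f in self.inputs if f not in self.map_info]


@dataclass
class ProductList:
    """Zero or more active files, tied to exactly one Creator."""
    creator: Creator
    active_files: set[str] = field(default_factory=set)

    def add(self, name: str) -> None:
        """Expose an output as an active file; only this creator's outputs are accepted."""
        if name not in self.creator.outputs:
            raise ValueError(f"{name!r} is not an output of creator {self.creator.name!r}")
        self.active_files.add(name)

    def discard(self, name: str) -> None:
        """Drop an active file the user no longer wants to see."""
        self.active_files.discard(name)
```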
Dispatching Task
- The working directory can be built from the Map Information
- File path information can be resolved using information about the resource allocated to the task
- Determine whether remote files are staged, copied, or synced
Diagram components: tasks dispatched via Creators (inputs, outputs, resource information) and Product Lists; file access through the Product List; normal file access and file path information for the ordinary file system; file staging or copying and file sync on demand between the Active File System and the working directory; export to the ordinary file system
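Dispatching combines the Map Information with the allocated resource to lay out the working directory. A minimal sketch, assuming a hypothetical rule that an input already on the allocated host is copied and anything else is staged from its host; `FileLocation`, `plan_dispatch`, and the host/path names are illustrative, and outputs are left to sync on demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileLocation:
    """Where the ordinary file behind an active file physically lives."""
    host: str
    path: str


def plan_dispatch(map_info: dict[str, FileLocation], resource_host: str,
                  workdir: str) -> dict[str, tuple[str, str]]:
    """Build a working-directory plan for a task's inputs from its map information.

    Hypothetical rule: "copy" when the file already sits on the allocated
    resource, otherwise "stage" it from its host. Outputs are not handled here;
    they are synced on demand after the task runs.
    """
    plan = {}
    for local_name, loc in sorted(map_info.items()):
        action = "copy" if loc.host == resource_host else "stage"
        plan[f"{workdir}/{local_name}"] = (action, f"{loc.host}:{loc.path}")
    return plan
```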
Components of Active File System: Active File
- Active File
- Consists of an anchor file and an ordinary file, which may be located at a remote site
- The anchor file records a creation status (INITIATED, CREATING, or COMPLETE) and a sync status (Never Synced, Partially Synced, or Completely Synced)
- All access to an active file is synchronized by a multiple-reader/single-writer lock at the Active File System level
- Readers: visualization utilities, editor (read-only), FTP (export)
- Writers: simulation task, editor (save), FTP (import)
- When writing to a Product List, the Creator must match, or be compatible with, the Product List's Creator
- Otherwise, save the active files in a different or new Product List
Diagram components: Creators (inputs, outputs, resource information), Product List, file access through the Product List, normal file access and file path information for the ordinary file system, working directory, file sync on demand, export
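The anchor-file status and the multiple-reader/single-writer rule can be sketched with a condition variable. Assumptions: the status names come from the slide, but updating the creation status inside `acquire_write`/`release_write` is an illustrative choice, `ActiveFile` is a hypothetical class, and this simple lock favors readers, so a steady stream of readers could starve a writer.

```python
import threading
from enum import Enum


class CreateStatus(Enum):
    INITIATED = "INITIATED"
    CREATING = "CREATING"
    COMPLETE = "COMPLETE"


class SyncStatus(Enum):
    NEVER_SYNCED = "Never Synced"
    PARTIALLY_SYNCED = "Partially Synced"
    COMPLETELY_SYNCED = "Completely Synced"


class ActiveFile:
    """Anchor-file state plus a multiple-reader/single-writer lock (reader-preferring sketch)."""

    def __init__(self, location: str) -> None:
        self.location = location   # the ordinary file, possibly at a remote site
        self.create_status = CreateStatus.INITIATED
        self.sync_status = SyncStatus.NEVER_SYNCED
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self) -> None:   # e.g., visualization utility, FTP export
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self) -> None:  # e.g., simulation task, FTP import
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
            self.create_status = CreateStatus.CREATING

    def release_write(self, complete: bool = True) -> None:
        with self._cond:
            self._writer = False
            if complete:
                self.create_status = CreateStatus.COMPLETE
            self._cond.notify_all()
```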
API for Active File System
- Access primitives for Active File
- open
- create
- close
- read
- write
- lseek
- unlink
- remove
- fcntl
- Standard I/O
- getc/putc
- In context
- access, chmod, chown, link, rename, symlink, readlink
- stat, fstat
- mkproductlist/rmproductlist
- mkcreator/rmcreator
PSE on Active File System (again)
Diagram components:
- Legacy interactive software tools: Molecular Structure Builder (e.g., Insight2), Molecular Structure Viewer/Analyzer (e.g., gOpenMol), other tools (e.g., a text viewer)
- Products: beta-cyclodextrin, MC Docking results, MD Simulation results, Energy Field, Gyration Field
- CHARMM inputs: mc-doc.inp, md-sim.inp, energy.inp, gyration.inp
- New tools for PSE on Active File System: Workflow Manager, Job Preparation Tool, Creator (task) scheduling/monitoring, head/tail filter, built-in viewer (filtered OUT file)
- File movement: import, extract, file staging or copying into working directories, file sync on demand, ordinary file read/write
Summary
- MGrid is an integrated molecular simulation grid environment for computing, databases, and analyses
- The MGrid software architecture is designed to be extensible and composable
- Make the PSE, distributed batch system, and job execution as independent as possible
- Isolate application-dependent operations
- Decoupling of data and control channels
- Control channels are application-independent
- Data channels are application-dependent
- Active File System
- Virtual directory/file-based design