1. Development of Grid Applications on Standard Grid Middleware
- Hiroshi Takemiya, Kazuyuki Shudo, Yoshio Tanaka, Satoshi Sekiguchi
- Grid Technology Research Center, AIST
2. Background
- Computational Grids have become feasible platforms for running Grid-enabled applications.
- How do you implement Grid-enabled applications?
  - Use Globus APIs? Too complicated.
  - MPI? Yes, it is easy, but:
    - it needs co-allocation
    - it cannot use resources with private IP addresses
    - it is not fault tolerant
- Many potential application developers need information on:
  - how to write and execute Grid-enabled programs
  - Is it easy?
  - Is it executed efficiently on computational Grids?
3. Objectives
- Through the work of gridifying a legacy program, we would like to:
  - show how to program Grid-enabled applications
    - Sample application: climate simulation
    - Middleware: Ninf-G (a Globus-based GridRPC system)
  - evaluate the performance of the Grid-enabled application
    - Is it executed efficiently?
  - evaluate the Grid middleware (Globus, Ninf-G)
    - The results should be fed back to the system design and implementation.
  - find possible problems in building and using an international Grid testbed
    - Initiation takes much effort.
    - Keeping it stable is not easy.
4. Outline
- Brief overview of the application
- Ninf-G GridRPC system
  - What is GridRPC?
  - Architecture of Ninf-G
  - How to program using Ninf-G
- Experiment
  - Testbed: the ApGrid Testbed
  - Results
- Lessons learned
- Summary
5. Climate Simulation System
- Forecasting short- to middle-term climate change
  - Windings of jet streams
  - Blocking phenomenon of high atmospheric pressure
- Barotropic S-model proposed by Prof. Tanaka
  - Legacy FORTRAN program
  - Simple and precise: treats vertically averaged quantities
  - 150 sec for a 100-day prediction per simulation
- Keeping high precision over a long period
  - Introducing a perturbation for each simulation
  - Taking a statistical ensemble mean
  - Requires 100 to 1000 simulations
(Figure: sample simulation output, 1989/1/30-2/12)
Gridifying the program enables quick response
6. GridRPC: an RPC-based programming model on the Grid
(Diagram: a user on the Internet calls remote procedures/libraries on remote supercomputers and is notified of the results.)
- Utilization of remote supercomputers
- Large-scale computing utilizing multiple supercomputers on the Grid
7. GridRPC (cont'd)
- vs. MPI
  - Client-server programming is suitable for task-parallel applications.
  - Does not need co-allocation
  - Can use resources with private IP addresses if NAT is available (at least when using Ninf-G)
  - Better fault tolerance
- 1st GridRPC WG at GGF8 (today, 14:00!)
  - Define the standard GridRPC API first; deal with the protocol later
  - Standardize only a minimal set of features; higher-level features can be built on top
  - Provide several reference implementations
    - Ninf-G, NetSolve, ...
8. Ninf-G Features At-a-Glance
- A software package for programming Grid applications using GridRPC
- An easy-to-use, client-server, numerically oriented RPC system
- No stub information needed on the client side
- Built on top of the Globus Toolkit
9. Architecture of Ninf-G
(Diagram: On the server side, the IDL compiler processes an IDL file describing the numerical library and generates a remote library executable plus an interface-information LDIF file, which is registered with GRIS. The client retrieves the interface information from GRIS, requests job start-up through GRAM, and exchanges interface requests/replies and data with the remote library executable over Globus-IO.)
10. How to program using Ninf-G
- Build remote libraries (IDL sketched below)
  - Write an IDL file
  - Compile it using the IDL compiler
  - Register the information to GRIS (simply run make install)
- Write a client program using the GridRPC APIs (client sketched below)
  - Two kinds of RPC APIs:
    - synchronous call (grpc_call())
    - asynchronous call (grpc_call_async())
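As a rough illustration of the two steps, a Ninf-G-style IDL entry for the simulation kernel might look like the following sketch; the module name climate, the function sim, and its arguments are hypothetical, and the authoritative grammar is that of the Ninf-G IDL, not this sketch:

    Module climate;
    Define sim(IN int ndays, IN long seed, OUT double result[ndays])
    "one ensemble member of the S-model climate simulation"
    Required "smodel.o"
    Calls "Fortran" sim(ndays, seed, result);

On the client side, a minimal synchronous call, assuming the GridRPC API being standardized at GGF8 (grpc_initialize(), grpc_function_handle_init(), grpc_call()) and a hypothetical server host name and configuration file, could be sketched as:

    /* Minimal Ninf-G client sketch; error checking omitted. */
    #include "grpc.h"                  /* GridRPC client header */

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t handle;
        int    ndays = 100;            /* 100-day prediction */
        long   seed  = 42;             /* perturbation seed */
        double result[100];

        grpc_initialize("client.conf");    /* read client configuration */
        grpc_function_handle_init(&handle, "server.example.org", "climate/sim");
        grpc_call(&handle, ndays, seed, result);   /* synchronous RPC */
        grpc_function_handle_destruct(&handle);
        grpc_finalize();
        return 0;
    }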
11. Gridifying the original (sequential) climate simulation
- Dividing the program into two parts, as a client-server system
  - Client
    - Pre-processing: reading input data
    - Post-processing: averaging the results of the ensemble
  - Server
    - Climate simulation, visualization
(Diagram: flow of the S-model program; reading data, then solving the equations for each ensemble member in parallel on the servers, then averaging the results and visualizing.)
12. Gridifying the climate simulation (cont'd)
- Behavior of the program
  - Typical of task-parallel applications (see the sketch after this list):
    - Establish connections to all nodes
    - Distribute a task to each node
    - Retrieve a result
    - Throw the next task
- Cost of gridifying the program
  - Performed on a single computer:
    - Eliminating common variables
    - Eliminating data dependences among server processes (the seed for random-number generation)
  - Performed on a Grid environment:
    - Inserting Ninf-G functions
    - Creating a self-scheduling routine
- Added about 100 lines in total (< 10% of the original program); finished in a few days
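The self-scheduling routine mentioned above can be sketched with the asynchronous GridRPC calls. This is a minimal sketch, not the authors' actual code: it assumes the standard grpc_call_async()/grpc_wait_any() API, the hypothetical climate/sim function from the previous sketch, and server host names passed on the command line.

    /* Self-scheduling sketch: each server receives a new ensemble
     * member as soon as its previous one finishes.  Error handling
     * and pre-/post-processing are omitted. */
    #include "grpc.h"

    #define NDAYS  100        /* length of one prediction */
    #define NTASKS 1000       /* ensemble size */

    int main(int argc, char *argv[])
    {
        int nservers = argc - 1;          /* server host names in argv[1..] */
        grpc_function_handle_t handles[nservers];
        grpc_sessionid_t sessions[nservers];
        double results[nservers][NDAYS];
        double mean[NDAYS] = {0};
        int next = 0, done = 0, i, j;

        grpc_initialize("client.conf");
        for (i = 0; i < nservers; i++) {   /* prime every server with one task */
            grpc_function_handle_init(&handles[i], argv[i + 1], "climate/sim");
            grpc_call_async(&handles[i], &sessions[i], NDAYS, (long)next, results[i]);
            next++;
        }
        while (done < NTASKS) {
            grpc_sessionid_t id;
            grpc_wait_any(&id);                    /* first RPC to finish */
            for (i = 0; i < nservers; i++)
                if (sessions[i] == id) break;      /* which server is idle? */
            for (j = 0; j < NDAYS; j++)
                mean[j] += results[i][j];
            done++;
            if (next < NTASKS) {                   /* throw the next task */
                grpc_call_async(&handles[i], &sessions[i], NDAYS, (long)next, results[i]);
                next++;
            }
        }
        for (j = 0; j < NDAYS; j++)
            mean[j] /= NTASKS;                     /* ensemble mean */
        grpc_finalize();
        return 0;
    }

Because tasks are handed out as servers finish, faster or less loaded clusters automatically process more ensemble members.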
13. Testbed: the ApGrid Testbed
- http://www.apgrid.org/
14. Resources used in the experiment
- KOUME Cluster (AIST): client
- UME Cluster (AIST): jobmanager-grd, 40 CPUs + 20 CPUs, AIST GTRC CA
- AMATA Cluster (KU): jobmanager-sqms, 6 CPUs, AIST GTRC CA
- Galley Cluster (Doshisha U.): jobmanager-pbs, 10 CPUs, Globus CA
- Gideon Cluster (HKU): jobmanager-pbs, 15 CPUs, HKU CA
- PRESTO Cluster (TITECH): jobmanager-pbs, 4 CPUs, TITECH CA
- VENUS Cluster (KISTI): jobmanager-pbs, 60 CPUs, KISTI CA
- ASE Cluster (NCHC): jobmanager-pbs, 8 CPUs, NCHC CA
- Handai Cluster (Osaka U.): jobmanager-pbs, 20 CPUs, Osaka CA
- Total: 183 CPUs
15. Illustration of Climate Simulation
(Diagram: the client invokes simulation and visualization servers through each cluster's front node. Front nodes have public IP addresses and run the Globus gatekeeper and a jobmanager (pbs, grd, sqms), with NAT where needed; backend nodes have private or public IP addresses and run the Globus SDK and the Ninf-G library.)
- Sequential run: 8000 sec; execution on the Grid: 300 sec (100 CPUs)
16. Lessons Learned
- We had to put much effort into initiation
  - Problems with the installation of GT2, PBS, and jobmanager-pbs/grd
  - Failures in hostname/IP-address lookup
    - Both for the Internet and the intranet
    - Added host entries to /etc/hosts on our resources
  - Failures of rsh/ssh to/from backend nodes
    - .rhosts, ssh keys, hostname mismatches
  - pbs_rcp was located on an NFS-mounted (nosuid) volume
  - Bugs in jobmanager scripts (jobmanager-grd is not formally released)
  - GT2 has a poor interface to queuing systems
17. Lessons Learned (cont'd)
- We had to put much effort into initiation (cont'd)
- What I asked site administrators to do:
  - Open the firewall / TCP Wrapper
  - Additionally build the Info SDK bundle with gcc32dbg
  - Add $GLOBUS_LOCATION/lib to /etc/ld.so.conf and run ldconfig (this can be avoided by specifying a link option)
  - Change the configuration of xinetd/inetd
  - Enable NAT
18. Lessons Learned (cont'd)
- Difficulties caused by the bottom-up approach to building the ApGrid Testbed and by the problems with installing the Globus Toolkit
  - Most resources are not dedicated to the ApGrid Testbed.
    - There may be busy resources
    - Need a Grid-level scheduler, or a fancy Grid reservation system?
  - Incompatibility between different versions of GT2
19. Lessons Learned (cont'd)
- Performance problems
  - Overhead caused by MDS lookup
    - It takes several tens of seconds
    - Added a new feature to Ninf-G to bypass the MDS lookup
  - The default polling interval of the Globus jobmanager (30 seconds) is not appropriate for running fine-grained applications
    - AIST and Doshisha U. changed the interval to 5 seconds (this requires re-compiling the jobmanager)
20. Lessons Learned (cont'd)
- Performance problems (cont'd)
  - The time to initialize function handles is not negligible
    - Overhead comes not only from the MDS lookup but also from hitting the gatekeeper (GSI authentication) and invoking a jobmanager
    - The current Ninf-G implementation must hit the gatekeeper to initialize function handles one by one
    - Although Globus GRAM can invoke multiple jobs with one contact to the gatekeeper, the GRAM API is not sufficient to control each job
    - We used multithreading for initialization to improve performance (see the sketch below)
    - Ninf-G2 will provide a new feature supporting efficient initialization of multiple function handles
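A minimal sketch of the multithreaded-initialization idea, assuming a thread-safe GridRPC client library and reusing the hypothetical climate/sim function; this illustrates the technique, not the actual Ninf-G internals:

    /* Sketch: initialize one function handle per server in parallel,
     * so the GSI-authentication and jobmanager-invocation latencies
     * overlap instead of accumulating one by one. */
    #include <pthread.h>
    #include "grpc.h"

    typedef struct {
        grpc_function_handle_t *handle;
        char *server;
    } init_arg_t;

    static void *init_handle(void *p)
    {
        init_arg_t *a = (init_arg_t *)p;
        /* each call hits the gatekeeper: GSI auth + jobmanager start */
        grpc_function_handle_init(a->handle, a->server, "climate/sim");
        return NULL;
    }

    void init_all(grpc_function_handle_t *handles, char **servers, int n)
    {
        pthread_t tids[n];
        init_arg_t args[n];
        int i;

        for (i = 0; i < n; i++) {
            args[i].handle = &handles[i];
            args[i].server = servers[i];
            pthread_create(&tids[i], NULL, init_handle, &args[i]);
        }
        for (i = 0; i < n; i++)
            pthread_join(tids[i], NULL);
    }

With n servers, the authentications and jobmanager start-ups then run concurrently, so the total start-up time approaches that of the slowest server rather than the sum over all servers.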
21. Lessons Learned (cont'd)
- We observed that Ninf-G applications did not work correctly due to unexpected cluster configurations
  - GSI authentication failed when establishing connections for file transfers using GASS
    - Backend nodes do not have host certificates
    - Added a new feature to Ninf-G that allows non-secure connections
  - Due to the configuration of the local scheduler (PBS), Ninf-G executables were not activated
    - Example:
      - PBS jobmanager on a 16-node cluster
      - grpc_call() is issued 16 times on the cluster; the application developer expected 16 Ninf-G executables to be invoked simultaneously
      - The PBS queue manager was configured with a maximum of 9 simultaneous job invocations per user
      - 9 Ninf-G executables were launched, but 7 were not activated
    - Added a new feature to Ninf-G to set a timeout for the initialization of a function handle
22. Lessons Learned (cont'd)
- Some resources are not stable
  - Example: if I issue many (more than 20) RPCs, some of them fail (but sometimes all complete)
  - Not yet resolved: GT2? Ninf-G? OS? Hardware?
- Other instability
  - Software upgrades (GT2, PBS, etc.) without notification
    - Noticed only when the application failed
    - "It worked well yesterday, but I'm not sure whether it works today."
  - We could adapt to these instabilities through dynamic task allocation (sketched below).
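One way to realize the dynamic task allocation mentioned above is to drive each server from its own client thread and return failed tasks to a shared queue so another server can retry them. A minimal sketch, again assuming the hypothetical climate/sim function and the standard synchronous grpc_call(); grpc_initialize()/grpc_finalize() and result accumulation happen in a main() that is omitted here:

    /* Fault-tolerant allocation sketch: one client thread per server;
     * a task whose RPC fails goes back on the queue and is retried on
     * whichever server next becomes free.  Termination handling is
     * simplified for brevity. */
    #include <pthread.h>
    #include "grpc.h"

    #define NDAYS  100
    #define NTASKS 1000

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_task = 0, failed[NTASKS], nfailed = 0;

    static int get_task(void)              /* returns -1 when nothing is left */
    {
        int t = -1;
        pthread_mutex_lock(&lock);
        if (nfailed > 0)             t = failed[--nfailed];  /* retry first */
        else if (next_task < NTASKS) t = next_task++;
        pthread_mutex_unlock(&lock);
        return t;
    }

    static void put_back(int t)            /* re-queue a failed task */
    {
        pthread_mutex_lock(&lock);
        failed[nfailed++] = t;
        pthread_mutex_unlock(&lock);
    }

    void *worker(void *server)             /* one thread per server host */
    {
        grpc_function_handle_t h;
        double result[NDAYS];
        int t;

        grpc_function_handle_init(&h, (char *)server, "climate/sim");
        while ((t = get_task()) >= 0) {
            if (grpc_call(&h, NDAYS, (long)t, result) != GRPC_NO_ERROR)
                put_back(t);   /* unstable resource: let another server retry */
            /* else: accumulate result under the lock (omitted) */
        }
        grpc_function_handle_destruct(&h);
        return NULL;
    }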
23. Summary
- Introduced how to develop a Grid-enabled application using Ninf-G
- Many lessons learned:
  - An existing sequential application could be gridified easily using Ninf-G
  - Performance was so-so
  - It is very hard to establish and keep a stable Grid testbed
  - There are performance problems in GT2, and thus in Ninf-G
- The insights gained from the experiments gave important direction for Ninf-G2
  - Ninf-G2 will be released at SC2003
24. Special Thanks (for technical support) to
- Kasetsart University (Thailand): Sugree Phatanapherom
- Doshisha University (Japan): Yusuke Tanimura
- University of Hong Kong (Hong Kong): CHEN Lin, Elaine
- KISTI (Korea): Gee-Bum Koo, Jae-Hyuck
- Tokyo Institute of Technology (Japan): Kenichiro Shirose
- NCHC (Taiwan): Julian Yu-Chung Chen
- Osaka University (Japan): Susumu Date
- AIST (Japan): Grid Support Team
- APAN: HK, TW, JP
25. For more info.
- Ninf/Ninf-G
  - http://ninf.apgrid.org/
  - ninf@apgrid.org
- JOGC paper
  - Y. Tanaka et al., "Ninf-G: A Reference Implementation of RPC-based Programming Middleware for Grid Computing," Journal of Grid Computing, Vol. 1, No. 1, pp. 41-51.
- ApGrid
  - http://www.apgrid.org/