1
Development of Grid Applications on Standard Grid
Middleware
  • Hiroshi Takemiya, Kazuyuki Shudo, Yoshio Tanaka,
    Satoshi Sekiguchi
  • Grid Technology Research Center, AIST

2
Background
  • Computational Grids have become feasible for running
    Grid-enabled applications.
  • How do you implement Grid-enabled applications?
  • Use Globus APIs? Too complicated.
  • MPI? Yes, it's easy, but
  • it needs co-allocation
  • it cannot use private IP address resources
  • it is not fault tolerant
  • Many potential application developers need
    information on
  • how to write/execute Grid-enabled programs
  • is it easy?
  • is it executed efficiently on computational Grids?

3
Objectives
  • Through the work of gridifying a legacy
    program, we would like to:
  • show how to program Grid-enabled applications.
  • Sample application: climate simulation
  • Middleware: Ninf-G (Globus-based GridRPC system)
  • evaluate the performance of the Grid-enabled
    application.
  • Is it executed efficiently?
  • evaluate Grid middleware (Globus, Ninf-G).
  • The results should be fed back into the system
    design and implementation.
  • find possible problems in building/using an
    international Grid testbed.
  • Initiation takes much effort.
  • Keeping it stable is not easy.

4
Outline
  • Brief overview of application
  • Ninf-G GridRPC system
  • What is GridRPC?
  • Architecture of Ninf-G
  • How to program using Ninf-G
  • Experiment
  • Testbed: ApGrid Testbed
  • Results
  • Lessons Learned
  • Summary

5
Climate Simulation System
  • Forecasting short- to middle-term climate change
  • Windings of jet streams
  • Blocking phenomenon of high atmospheric pressure
  • Barotropic S-model proposed by Prof. Tanaka
  • Legacy FORTRAN program
  • Simple and precise
  • Treats vertically averaged quantities
  • 150 sec for a 100-day prediction per simulation
  • Keeps high precision over a long period
  • Introduces a perturbation for each simulation
  • Takes a statistical ensemble mean
  • Requires 100-1000 simulations

[Figure: blocking phenomenon of high atmospheric pressure, 1989/1/30-2/12]
Gridifying the program enables quick response.
6
GridRPC: an RPC-based programming model on the Grid
[Figure: a user calls remote procedures (remote libraries) across the Internet; results are notified back to the client]
Utilization of remote supercomputers
Large-scale computing utilizing multiple supercomputers on the Grid
7
GridRPC (contd)
  • vs. MPI
  • Client-server programming is suitable for
    task-parallel applications.
  • Does not need co-allocation
  • Can use private IP address resources if NAT is
    available (at least when using Ninf-G)
  • Better fault tolerance
  • 1st GridRPC WG at GGF8 (today at 14:00!)
  • Define a standard GridRPC API first; deal with the
    protocol later
  • Standardize only a minimal set of features;
    higher-level features can be built on top
  • Provide several reference implementations
  • Ninf-G, NetSolve, ...

8
Ninf-G Features At-a-Glance
  • A software package for programming Grid
    applications using GridRPC.
  • Easy-to-use, client-server, numerically oriented
    RPC system
  • No stub information at the client side
  • Built on top of the Globus Toolkit

9
Architecture of Ninf-G
[Diagram]
Server side: an IDL file describing the numerical library
is processed by the IDL compiler, which generates the
remote library executable and the interface-information
LDIF file registered with GRIS.
Client side: the client retrieves interface information
from GRIS, starts the remote library executable via GRAM,
and exchanges the interface request/reply and data with it
over Globus-IO.
10
How to program using Ninf-G
  • Build remote libraries
  • Write an IDL file
  • compile it using the IDL compiler
  • register the information with GRIS (simply by
    running make install)
  • Write a client program using GridRPC APIs
  • two kinds of RPC APIs, sketched below
  • synchronous call (grpc_call())
  • asynchronous call (grpc_call_async())
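
As an illustration of these two steps, here is a rough sketch. The IDL fragment follows the general shape of Ninf-G IDL; the module name (climate), entry name (climate_sim), and argument list are hypothetical, and the exact grammar of this Ninf-G release may differ:

    Module climate;

    Define climate_sim(IN int ntime, IN double seed, OUT double result[ntime])
    "Run one perturbed S-model ensemble member"
    Required "smodel.o"
    Calls "Fortran" climate_sim(ntime, seed, result);

The client side, sketched below in C, uses the two RPC APIs named above. The configuration file, server name, and argument list are assumptions, and the handle-initialization signature follows the GridRPC API draft discussed at the GGF WG, so it may not match this Ninf-G release exactly:

    #include "grpc.h"   /* GridRPC client API header; name may vary by release */

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t handle;
        grpc_sessionid_t id;
        double seed = 1.0, result[100];

        grpc_initialize("client.conf");   /* read client configuration */
        grpc_function_handle_init(&handle, "server.example.org",
                                  "climate/climate_sim");

        /* Synchronous call: blocks until the result arrives. */
        grpc_call(&handle, 100, seed, result);

        /* Asynchronous call: returns at once; wait on the session id. */
        grpc_call_async(&handle, &id, 100, seed, result);
        grpc_wait(id);

        grpc_function_handle_destruct(&handle);
        grpc_finalize();
        return 0;
    }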

11
Gridify the original (seq.) climate simulation
  • Dividing the program into two parts as a
    client-server system
  • Client
  • Pre-processing: reading input data
  • Post-processing: averaging results of ensembles
  • Server
  • climate simulation, visualization

[Figure: S-model program flow - reading data; solving equations (replicated across servers); averaging results; visualization]
12
Gridify the climate simulation (contd)
  • Behavior of the program
  • Typical of task-parallel applications
  • Establish connections to all nodes
  • Distribute a task to each node
  • Retrieve a result
  • Submit the next task
  • Cost of gridifying the program
  • Performed on a single computer
  • Eliminating common variables
  • Eliminating data dependences among server
    processes
  • Seed for random number generation
  • Performed in a Grid environment
  • Inserting Ninf-G functions
  • Creating a self-scheduling routine (see the
    sketch below)

Added about 100 lines in total (less than 10% of the
original program); finished in a few days
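
The self-scheduling routine can be sketched as follows: every server is primed with one task, and whenever any outstanding call finishes, the idle server receives the next ensemble member. This is an illustrative sketch only, not the authors' actual code; NUM_SERVERS, NUM_TASKS, NTIME, the handles/seeds/results arrays, and the session-to-handle helper server_of() are all hypothetical:

    /* Illustrative self-scheduling loop over pre-initialized handles. */
    void run_ensemble(void)
    {
        grpc_sessionid_t ids[NUM_SERVERS];
        int next = 0, done = 0;

        /* Prime every server with one task. */
        for (int i = 0; i < NUM_SERVERS && next < NUM_TASKS; i++, next++)
            grpc_call_async(&handles[i], &ids[i], NTIME, seeds[next],
                            results[next]);

        while (done < NUM_TASKS) {
            grpc_sessionid_t finished;
            grpc_wait_any(&finished);   /* block until any call completes */
            done++;
            if (next < NUM_TASKS) {
                int i = server_of(finished);  /* map session back to a handle */
                grpc_call_async(&handles[i], &ids[i], NTIME, seeds[next],
                                results[next]);
                next++;
            }
        }
    }

Because tasks are handed out on completion, faster servers naturally receive more work, which is what later allows the application to adapt to slow or unstable resources.
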
13
Testbed: ApGrid Testbed
http://www.apgrid.org/
14
Resources used in the experiment
  • KOUME Cluster (AIST)
  • Client
  • UME Cluster (AIST)
  • jobmanager-grd (40cpu + 20cpu)
  • AIST GTRC CA
  • AMATA Cluster (KU)
  • jobmanager-sqms, 6cpu
  • AIST GTRC CA
  • Galley Cluster (Doshisha U.)
  • jobmanager-pbs, 10cpu
  • Globus CA
  • Gideon Cluster (HKU)
  • jobmanager-pbs, 15cpu
  • HKU CA
  • PRESTO Cluster (TITECH)
  • jobmanager-pbs, 4cpu
  • TITECH CA
  • VENUS Cluster (KISTI)
  • jobmanager-pbs, 60cpu
  • KISTI CA
  • ASE Cluster (NCHC)
  • jobmanager-pbs, 8cpu
  • NCHC CA
  • Handai Cluster (Osaka U)
  • jobmanager-pbs, 20cpu
  • Osaka CA
  • Total: 183 CPUs

15
Illustration of Climate Simulation
[Figure: the client calls simulation servers (Sim. Server)
and a visualization server (Vis. Server). Each cluster's
front node has a public IP and runs Globus (gatekeeper and
a jobmanager for pbs, grd, or sqms) plus NAT; backend nodes
have private or public IPs and run the Globus SDK and the
Ninf-G library.]
Sequential run: 8000 sec. Execution on the Grid: 300 sec
(100 CPUs).
16
Lessons Learned
  • We had to spend much effort on initiation
  • Problems with the installation of GT2/PBS/jobmanager-pbs,
    grd
  • Hostname/IP address lookups failed
  • Both for the Internet and the intranet
  • Added host entries to /etc/hosts on our resources
    (an example follows this list)
  • rsh/ssh to/from backend nodes failed
  • .rhosts, ssh keys, mismatched hostnames
  • pbs_rcp was located on an NFS-mounted (nosuid)
    volume
  • bugs in jobmanager scripts (jobmanager-grd is not
    formally released)
  • GT2 has a poor interface to queuing systems
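
The /etc/hosts workaround above simply pins name-to-address mappings locally; the addresses and hostnames below are hypothetical placeholders:

    # /etc/hosts entries added on each resource (hypothetical values)
    192.168.10.1   node01.cluster.example.org   node01
    192.168.10.2   node02.cluster.example.org   node02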

17
Lessons Learned (contd)
  • We had to spend much effort on initiation
    (contd)
  • What we asked resource administrators to do:
  • Open the firewall/TCP Wrapper
  • Additionally build the Info SDK bundle with gcc32dbg
  • Add $GLOBUS_LOCATION/lib to /etc/ld.so.conf and
    run ldconfig (this can be avoided by specifying a
    link option); see the example after this list
  • Change the configuration of xinetd/inetd
  • Enable NAT
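
For the dynamic-linker item above, the change amounts to one line appended to /etc/ld.so.conf, followed by running ldconfig as root to rebuild the linker cache; the path assumes GLOBUS_LOCATION=/usr/local/globus and is only an example:

    # line appended to /etc/ld.so.conf (value of $GLOBUS_LOCATION/lib)
    /usr/local/globus/lib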

18
Lessons Learned (contd)
  • Difficulties caused by the bottom-up approach to
    building the ApGrid Testbed and by the problems with
    installing the Globus Toolkit.
  • Most resources are not dedicated to the ApGrid
    Testbed.
  • There may be busy resources
  • Need a Grid-level scheduler, or a fancy Grid reservation
    system?
  • Incompatibility between different versions of GT2

19
Lessons Learned (contd)
  • Performance problems
  • Overhead caused by MDS lookup
  • it takes several tens of seconds
  • Added a new feature to Ninf-G to bypass the MDS
    lookup
  • The default polling interval of the Globus jobmanager
    (30 seconds) is not appropriate for running
    fine-grained applications.
  • AIST and Doshisha U. changed the interval to
    5 seconds (requires re-compiling the jobmanager)

20
Lessons Learned (contd)
  • Performance problems (contd)
  • The time to initialize function handles
    is not negligible
  • Overhead caused not only by MDS lookup but
    also by hitting the gatekeeper (GSI authentication)
    and invoking a jobmanager
  • The current Ninf-G implementation needs to hit the
    gatekeeper to initialize function handles
    one by one
  • Although Globus GRAM can invoke multiple
    jobs with one contact to the gatekeeper, the GRAM API
    is not sufficient to control each job.
  • Used multithreading for initialization to improve
    performance (see the sketch below)
  • Ninf-G2 will provide a new feature that supports
    efficient initialization of multiple function
    handles.
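
A rough sketch of the multithreaded initialization described above, using POSIX threads so that GSI authentication and jobmanager startup for all servers proceed concurrently. The handles/servers arrays and NUM_SERVERS are assumptions, as is the thread-safety of grpc_function_handle_init() in this Ninf-G release:

    #include <pthread.h>

    static void *init_one(void *arg)
    {
        int i = *(int *)arg;
        /* Each thread pays the gatekeeper/jobmanager latency in parallel. */
        grpc_function_handle_init(&handles[i], servers[i],
                                  "climate/climate_sim");
        return NULL;
    }

    static void init_all_handles(void)
    {
        pthread_t threads[NUM_SERVERS];
        int idx[NUM_SERVERS];

        for (int i = 0; i < NUM_SERVERS; i++) {
            idx[i] = i;
            pthread_create(&threads[i], NULL, init_one, &idx[i]);
        }
        for (int i = 0; i < NUM_SERVERS; i++)
            pthread_join(threads[i], NULL);
    }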

21
Lessons Learned (contd)
  • We observed that Ninf-G apps did not work
    correctly due to unexpected configurations of
    clusters
  • GSI authentication failed when establishing
    connections for file transfers using GASS.
  • Backend nodes do not have host certificates.
  • Added a new feature to Ninf-G that allows the use of
    non-secure connections
  • Due to the configuration of the local scheduler
    (PBS), Ninf-G executables were not activated.
  • Example
  • PBS jobmanager on a 16-node cluster
  • Call grpc_call() 16 times on the cluster. The app
    developer expected to invoke 16 Ninf-G
    executables simultaneously.
  • The configuration of the PBS queue manager set the
    maximum number of simultaneous job invocations per
    user to 9
  • 9 Ninf-G executables were launched, but the other 7
    were not activated
  • Added a new feature to Ninf-G to set a
    timeout for the initialization of a function handle.

22
Lessons Learned (contd)
  • Some resources are not stable
  • example: if I call many (more than 20) RPCs,
    some of them fail (but sometimes all complete)
  • not yet resolved
  • GT2? Ninf-G? OS? Hardware?
  • Other instability
  • Software upgrades (gt2, pbs, etc.) without
    notification
  • noticed only when the application failed
  • "it worked well yesterday, but I'm not sure
    whether it works today"
  • We could adapt to this instability through dynamic
    task allocation.

23
Summary
  • Introduced how to develop a Grid-enabled
    application using Ninf-G.
  • Many lessons learned.
  • An existing sequential application could easily be
    gridified using Ninf-G.
  • Performance was so-so.
  • It's very hard to establish/keep a stable Grid
    testbed.
  • Performance problems in GT2, and thus in Ninf-G
  • Insights gained from the experiments gave important
    direction for Ninf-G2.
  • Ninf-G2 will be released at SC2003.

24
Special Thanks (for technical support) to
  • Kasetsart University (Thailand)
  • Sugree Phatanapherom
  • Doshisha University (Japan)
  • Yusuke Tanimura
  • University of Hong Kong (Hong Kong)
  • CHEN Lin, Elaine
  • KISTI (Korea)
  • Gee-Bum Koo, Jae-Hyuck
  • Tokyo Institute of Technology (Japan)
  • Kenichiro Shirose
  • NCHC (Taiwan)
  • Julian Yu-Chung Chen
  • Osaka University (Japan)
  • Susumu Date
  • AIST (Japan)
  • Grid Support Team
  • APAN
  • HK, TW, JP

25
For more info.
  • Ninf/Ninf-G
  • http://ninf.apgrid.org/
  • ninf@apgrid.org
  • JOGC paper
  • Y. Tanaka et al., "Ninf-G: A Reference
    Implementation of RPC-based Programming
    Middleware for Grid Computing," JOGC, Vol. 1,
    No. 1, pp. 41-51.
  • ApGrid
  • http://www.apgrid.org/