Title: Grid-Computing with NetSolve
1 Grid-Computing with NetSolve
2 Grid Computing with NetSolve
- NetSolve introduction, history and overview
- NetSolve collaborations
- NetSolve and the Grid?
- NetSolve and Scheduling on the Grid
- Conclusion
3 NetSolve Introduction
- Developed at the University of Tennessee and the Oak Ridge National Lab.
- Project leaders: Jack Dongarra, Henri Casanova
- Source freely available: http://www.cs.utk.edu/netsolve
4 NetSolve Genesis
- Started as an RPC system for Matlab (Cleve Moler)
  - Each host must have a Matlab license
  - Limited to Matlab's functions (no LAPACK)
  - Other similar projects (Multi-Matlab, etc.)
- Rapidly distanced itself from Matlab
  - Netlib repository: free software
  - Matlab people not too interested!
  - Easier to develop outside Matlab (v4.2)!
5 NetSolve Genesis
- Run-time system that provides access to freely available software running on computational servers
- Easy to use for domain scientists
- Easy to add new software to the servers
- Easy to deploy (light-weight)
- Multiple user interfaces
- RPC programming model
6 NetSolve Brief History
- Jan. 1996: v1.0 (UNIX)
- Jan. 1997: v1.1 (UNIX)
- Sep. 1998: v1.2 (UNIX/Win32)
  - Complete rewrite
  - Mathematica interface
7 NetSolve Overview
8 NetSolve: The server
- Daemon running on a computational server
  - Single host, cluster, Condor cluster, MPP
- Provides access to problems that can be solved using pre-installed software
- Implements basic access control mechanisms
- Monitors its host's workload when possible
- Reports to agent(s)
9 NetSolve: The Agent
- Daemon running on any host
- Maintains information on the available servers
- Gathers workload and network measurements
- Decides how to map tasks to resources
- There can be multiple agents
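The mapping decision described above can be illustrated with a minimal sketch. It assumes the agent ranks servers by an estimated completion time made of a transfer term and a compute term; the cost model, the `ServerInfo` fields, and the function names are hypothetical illustrations, not NetSolve's actual data structures or algorithm.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    # All fields are hypothetical stand-ins for what an agent might track.
    name: str
    mflops: float          # nominal speed of the server
    workload: float        # fraction of the CPU already busy, 0.0 to 1.0
    bandwidth_mbps: float  # measured client-server bandwidth
    latency_s: float       # measured client-server latency


def estimated_time(server, data_mbits, work_mflop):
    """Estimated completion time: network transfer plus compute on a loaded host."""
    transfer = server.latency_s + data_mbits / server.bandwidth_mbps
    available = server.mflops * (1.0 - server.workload)
    if available <= 0.0:
        return float("inf")  # a fully loaded server is never preferred
    return transfer + work_mflop / available


def rank_servers(servers, data_mbits, work_mflop):
    """Return the servers sorted from best to worst estimated completion time."""
    return sorted(servers, key=lambda s: estimated_time(s, data_mbits, work_mflop))


servers = [
    ServerInfo("fast-but-far", 1000.0, 0.0, 1.0, 0.1),
    ServerInfo("slow-but-near", 100.0, 0.0, 100.0, 0.001),
    ServerInfo("fast-but-busy", 1000.0, 0.95, 100.0, 0.001),
]

# Compute-heavy request with little data, then a data-heavy request.
print(rank_servers(servers, 1.0, 10000.0)[0].name)
print(rank_servers(servers, 800.0, 100.0)[0].name)
```

Under this toy model the best server depends on the request: a compute-bound problem favors the fast remote host, while a data-bound one favors the nearby host, which is why both workload and network measurements matter.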
10 NetSolve: agent/server
- Non-hierarchical (no master agent)
  - Makes it easy to deploy
- Any agent/server can be stopped/restarted safely
  - Multiple institutions can contribute
- System can be started on an intranet or the Internet
- System can be open, private or controlled
- Simple failure detection/restart mechanism
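One plausible reading of a simple failure detection/restart mechanism is that a failed request is resubmitted to the next candidate server. The sketch below assumes exactly that; `ServerDown`, `submit_with_restart`, and the two simulated servers are hypothetical names, and the slides do not say how NetSolve implements this.

```python
class ServerDown(Exception):
    """Raised by a (simulated) server that cannot be reached."""


def submit_with_restart(problem, args, servers):
    """Try each candidate server in order; restart the request on the next one on failure."""
    failures = []
    for name, solve in servers:
        try:
            return name, solve(problem, args)
        except ServerDown as err:
            failures.append((name, str(err)))  # remember why this server was skipped
    raise RuntimeError(f"all servers failed: {failures}")


def dead_server(problem, args):
    # Simulates a host that has been stopped.
    raise ServerDown("connection refused")


def live_server(problem, args):
    # Stand-in for a real computation: just sum the inputs.
    return sum(args)


print(submit_with_restart("sum", [1, 2, 3], [("s1", dead_server), ("s2", live_server)]))
```

Because no component is a master, a client in this model only needs an ordered list of candidates; stopping any one server changes which candidate answers, not whether the request completes.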
11 NetSolve: how it works
[Diagram labels: NetSolve server daemon; Register; Client stubs; NetSolve problem description files; Computational Modules; Java applet]
- Problem description files
- Clients download stubs at run time
- Problem description files are portable
- Java applet to generate them
12 Available software
- BLAS
- LAPACK
- ScaLAPACK
- ItPack
- PETSc
- Aztec
- FitPack
- FFTPack
- NAG software
- Minpack
- QMR
- ARPACK
- ImageVision
- MCell
software added by users
13 NetSolve: The client
Multiple interfaces
- Matlab, Mathematica
- C, Fortran
- Perl
- Java API, Java GUI
- MS Excel (in progress)
All interfaces implement the same basic mechanisms
14 NetSolve: Matlab interface
>> netsolve_init
>> netsolve
/LinearAlgebra/L3/dmatmul
/LinearAlgebra/L3/linsol
/ImageProcessing/Vision/filter
>> netsolve('linsol')
solves Ax=b.
2 inputs: matrix A, matrix b
1 output: matrix x
15 NetSolve: Matlab interface
Synchronous call
>> load('a')
>> load('b')
>> x = netsolve('linsol', a, b)
x = 12.326 23.432 ...
>> y = netsolve('linsol', aa, b)
y = 31.234 -0.323 ...
16 NetSolve: Matlab interface
Asynchronous call
>> load('a'); load('A'); load('b')
>> r1 = netsolve_nb('linsol', a, b)
r1 = 0
>> r2 = netsolve_nb('linsol', A, b)
r2 = 1
>> netsolve_nb('status')
request 0: done
request 1: still pending
>> x = netsolve_nb('wait', 0)
x = 1.234 -4.534 ...
>> y = netsolve_nb('probe', 1)
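The non-blocking pattern in the session above (submit a request, receive an integer request id, then probe or wait on it later) can be sketched with a local thread pool standing in for remote servers. `NonBlockingClient` and its `submit`, `probe`, and `wait` methods are hypothetical names chosen for illustration; they are not the NetSolve API.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


class NonBlockingClient:
    """Toy request table: submit returns an integer id, probe polls, wait blocks."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._ids = itertools.count()  # request ids 0, 1, 2, ...
        self._requests = {}

    def submit(self, func, *args):
        """Start the computation and return immediately with a request id."""
        request_id = next(self._ids)
        self._requests[request_id] = self._pool.submit(func, *args)
        return request_id

    def probe(self, request_id):
        """Return True if the request has finished, without blocking."""
        return self._requests[request_id].done()

    def wait(self, request_id):
        """Block until the request finishes, then return its result."""
        return self._requests.pop(request_id).result()

    def shutdown(self):
        self._pool.shutdown(wait=True)


client = NonBlockingClient()
r1 = client.submit(pow, 2, 10)       # first request gets id 0
r2 = client.submit(sum, [1, 2, 3])   # second request gets id 1
x = client.wait(r1)
y = client.wait(r2)
client.shutdown()
```

The point of the pattern is that the client can issue several requests before collecting any result, so independent problems overlap instead of running one after another.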
17 NetSolve: Fortran interface
      parameter (MAX = 100)
      double precision A(MAX,MAX), B(MAX)
      integer IPIV(MAX), N, INFO, LWORK
      integer NSINFO
C     Local LAPACK call
      call DGESV(N, 1, A, MAX, IPIV, B, MAX, INFO)
C     Same call made through NetSolve
      call NETSL('DGESV()', NSINFO,
     $           N, 1, A, MAX, IPIV, B, MAX, INFO)
18 NetSolve: Parallel libraries
- NetSolve user is unaware of parallel processing
- NetSolve takes care of starting the message-passing system, distributing the data, and returning the results.
19 NetSolve and Condor
[Diagram label: Condor Pool (U. of Wisconsin)]
20 NetSolve and Ninf
[Diagram labels: NetSolve Network; ADAPTOR (Java); Ninf Network; Ninf MetaServer; ETL (Tsukuba, Japan)]
21 NetSolve and the Grid
- Emergence of the Grid vision
- Can NetSolve be part of the Grid?
- A somewhat different philosophy
  - Grid: high-performance resources, large-scale apps
    - Global infrastructure
  - NetSolve: various resources, small-scale apps
    - More limited deployment
23 NetSolve Grid Middleware
Q: What does NetSolve need to become Grid middleware?
A: NetSolve provides the right level of abstraction for computational services, but it needs a few more features:
- More low-level control over jobs
  - Non-location-transparent interface
  - Stop jobs, no automatic restart of jobs
- Ways to query the system's topology
- Better network/CPU/memory load sensors
- Ways to manage remote storage for data and executables
- Security
- General job-launching facility (batch systems)
- Interface to a global information directory service
24 NetSolve on the Grid
Minor software enhancements + NWS + Globus seem to do the trick
Minor software enhancements
- int netsl("128.34.45.43:linsol()", ...)
  /* performs no automatic resubmission */
- int netsl_kill(int request_id)
  /* terminates a job */
- void netsl_info()
  /* returns static and dynamic information */
Use of the Network Weather Service (Rich Wolski, U. of Tenn.)
25 NetSolve and Globus
Many parts of Globus seem to provide exactly what's needed
- GRAM: job launching
- MDS: information service (NWS-fed)
- GSI: security
- GASS: remote storage
- HBM: liveness
- GEM? Nexus??
Risks
- Light-weight aspect of NetSolve lost?
- What if Globus fails? (even though NT :) )
- What if some site just does not want Globus installed?
- Developing on top of Globus is no easy task at the moment
26 NetSolve-Globus Design
Goals
- Maintain both modes of execution
- Isolate Globus-specific parts
[Diagram labels: Globus gatekeeper; NetSolve server daemon; Standard NetSolve protocol; Globus NetSolve protocol; GRAM; Local Disk; GSI; GASS storage (GEM?); HBM; Computational Module (x2); NetSolve agent; NetSolve sub-tree; MDS]
27 NetSolve-Globus Status
- Proxy architecture in place
- First experiments with Globus underway
- Transparent access: Globus just looks like more resources if you have a certificate.
  - (Agent talks to MDS?)
- Issues
  - Globus shortcomings (GASS, map-files, MDS non-dist., ...)
  - Globus stability and deployment?
28 NetSolve: a research vehicle
Global deployment is an important issue, but NetSolve is also a great research tool for experimenting in Grid environments right now.
Besides, some applications want the Grid NOW! (after all the hype)
29 Performance on the Grid?
- The power-Grid analogy breaks down for performance.
- Scheduling seems to be the answer.
- Globus is a low-level infrastructure and does not provide scheduling facilities
- Hence projects like AppLeS (Prof. Fran Berman)
[Diagram labels: Application-level information; AppLeS scheduling agent; Static Grid information; Performance; Dynamic Grid information]
30 Scheduling Research with NetSolve?
As middleware, NetSolve provides the ideal level of abstraction for doing research on Grid scheduling for several classes of applications. Even simple RPC-style applications can prove challenging to schedule on the Grid. That research could then in turn be deployed with NetSolve on the Grid.
Bottom line: how to do AppLeS within NetSolve?
31 Scheduling in NetSolve
Issues
- NetSolve's programming model is very general, and NetSolve interfaces to arbitrary software
- Very little application-level information.
- Hence, the scheduler in the agent is primitive as of now.
Solution
- Consider classes of applications
- Build NetSolve-based frameworks for the applications
- With focus on a built-in scheduler
32 First approach: Task farming
- New call in NetSolve for independent tasks
- Opportunity to experiment with scheduling
- Bypassing the agent for decision making (agent becomes an information service)
- Preliminary experimental results satisfactory
netsl_farm("i=1,100", "linsol", <arrays of pointers>)
- Work queue scheduling
- Queue size dynamically tuned according to available resources
- Implemented with NetSolve internals (before new non-location-transparent interface)
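Work queue scheduling, as named above, can be illustrated with a small simulation: whenever a server becomes idle it pulls the next task from a shared queue, so faster servers naturally take more tasks. This is a simplified sketch, not NetSolve's farming code; it assumes one task in flight per server, and `work_queue_schedule` with its parameters is a hypothetical name.

```python
import heapq


def work_queue_schedule(task_costs, server_speeds):
    """Simulate self-scheduling from a shared queue.

    task_costs: work units per task, in queue order.
    server_speeds: work units per second for each server.
    Returns (makespan, assignment) where assignment[i] is the server index of task i.
    """
    # Min-heap of (time at which the server becomes free, server index).
    free_at = [(0.0, s) for s in range(len(server_speeds))]
    heapq.heapify(free_at)
    assignment = []
    makespan = 0.0
    for cost in task_costs:
        now, s = heapq.heappop(free_at)       # earliest idle server takes the task
        finish = now + cost / server_speeds[s]
        assignment.append(s)
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, s))
    return makespan, assignment


# Six equal tasks on one fast and one slow server.
makespan, assignment = work_queue_schedule([4.0] * 6, [2.0, 1.0])
print(makespan, assignment)
```

In this run the server that is twice as fast ends up with twice as many tasks without the scheduler ever needing to know the speeds in advance, which is the main appeal of a work queue when resource performance is hard to predict.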
33 Target Application: MCell
- MCell: 3-D Monte Carlo simulation of neurotransmitter release between cells.
- Developed at Salk Institute, Cornell U.
- Fits the farming semantics and has a need for NetSolve
[Diagram labels: List of seeds; Agent; Input files; NetSolve Servers; Output files; script; MCell; Input scripts]
34 Farming's shortcomings
- NetSolve's farming interface is very general
- Fails to capture applications' idiosyncrasies
[Diagram labels: Input Files; Task 1 (x4)]
Taking advantage of file sharing is paramount!
Need for more evolved scheduling facilities
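To make the file-sharing point concrete, the sketch below shows a greedy scheduler that charges a transfer cost only when a server does not already hold a task's input file. It is an illustration under assumed uniform costs, not the scheduler used in NetSolve; `schedule_with_file_affinity` and its parameters are hypothetical.

```python
def schedule_with_file_affinity(tasks, n_servers, transfer_cost, compute_cost):
    """Greedy list scheduling that reuses input files a server already holds.

    tasks: list of (task_name, input_file) pairs, in submission order.
    Returns (assignment, transfers, makespan).
    """
    ready_at = [0.0] * n_servers              # when each server becomes free
    cached = [set() for _ in range(n_servers)]  # files already present on each server
    assignment = {}
    transfers = 0
    for name, input_file in tasks:
        def finish_time(s):
            # The file is shipped only if server s does not have it yet.
            extra = 0.0 if input_file in cached[s] else transfer_cost
            return ready_at[s] + extra + compute_cost

        best = min(range(n_servers), key=finish_time)
        finish = finish_time(best)            # compute before updating the cache
        if input_file not in cached[best]:
            cached[best].add(input_file)
            transfers += 1
        ready_at[best] = finish
        assignment[name] = best
    return assignment, transfers, max(ready_at)


# Eight tasks sharing two input files, on two servers.
tasks = [(f"t{i}", "A") for i in range(1, 5)] + [(f"t{i}", "B") for i in range(5, 9)]
assignment, transfers, makespan = schedule_with_file_affinity(tasks, 2, 10.0, 1.0)
print(transfers, makespan)
```

With these numbers each file is sent once per server, four transfers in total, whereas a scheduler unaware of shared inputs would send a file with every one of the eight tasks.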
35 Grid-Scheduling Templates
- Shortcomings of AppLeS
- Next generation of schedulers
- Idea: class of applications
  - common scheduler
  - ready-to-use
- First template: Parameter Sweep Applications (MCell, INS2D, ...)
  - large number of independent tasks
  - inputs from files or command-line arguments
  - output to files
  - user provides an executable
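A parameter sweep with the properties listed above can be expanded into independent tasks mechanically. The sketch below assumes the user supplies an executable and a table of parameter values; `parameter_sweep`, the `--name=value` argument style, and the output file naming are hypothetical choices, not part of any NetSolve template.

```python
import itertools


def parameter_sweep(executable, parameters):
    """Expand {name: [values]} into one independent task per combination.

    Each task records the command line to run and the file that receives its output.
    """
    names = sorted(parameters)  # fixed order so task numbering is reproducible
    tasks = []
    combos = itertools.product(*(parameters[n] for n in names))
    for index, values in enumerate(combos):
        args = [f"--{n}={v}" for n, v in zip(names, values)]
        tasks.append({
            "command": [executable, *args],
            "output": f"run_{index:04d}.out",
        })
    return tasks


tasks = parameter_sweep("./simulate", {"seed": [1, 2, 3], "rate": [0.1, 0.2]})
for task in tasks:
    print(" ".join(task["command"]), ">", task["output"])
```

Because the tasks share nothing but the executable and possibly some input files, a scheduler is free to run them in any order on any resource, which is what makes this class a natural first template.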
36 Template structure
A basis for a generic Template structure
[Diagram labels: Application Specific Interfaces; File describing the entire application; Standard Interface; Data structures describing the application-level information; NWS; Scheduler; Measurements/Predictions; Data structures describing the environment; Requests for computation resources; Grid Middleware (NetSolve, ...); Monitoring; The Grid (Globus, ...)]
37 Initial implementation
- First prototype completed last week (non-NetSolve)
- NetSolve code being finished
- Need for a remote storage infrastructure
  - GASS too crude for now (no name space)
  - IBP too early
  - At the moment, simulation with NetSolve!
- Scheduler: early prototypes
- Use of NWS as a plug-in service
- Great infrastructure to do scheduling research
38 Short- and Long-Term Goals
Short term
- Large MCell run with large Condor pools (first complete bio-chemical model of a cell)
  - scheduling over Condor?
  - storage infrastructure?
- Deployment of INS2D in production env.
  - Globus?
Long term
- General template structure
- New PS applications (Monte Carlo)
- New template for new applications
39 Conclusion
- The Grid is an exciting playground
  - Lots of things are needed
  - Many research groups involved
  - Collaborations difficult but getting there
  - A lot of interest from domain scientists
  - Still in need of a sociological model
  - Middleware is a key component.
- Scheduling is the key to the Grid becoming a reality, i.e. usable by more than a selected set of people, not in demo mode