Title: Developing Applications on Today's Grids
1. Developing Applications on Today's Grids
- Tom Goodale
- Max Planck Institute for Gravitational Physics
- goodale_at_aei.mpg.de
2. Grid Projects
- Globus
- GrADS
- Condor
- GridLab
- DataGrid
- Many more; see for example:
  - http://www-fp.mcs.anl.gov/foster/grid-projects/
  - http://www.gridcomputing.com/
3. Globus
- http://www.globus.org
- Large and established project which has contributed much Grid middleware
- Based at Argonne National Laboratory (USA)
- Globus is the most widely deployed software for Grid computing
- The Globus Project is developing fundamental technologies needed to build computational grids.
4. GrADS
- http://hipersoft.cs.rice.edu/grads/
- Grid Application Development Software
- The goal of the Grid Application Development
Software (GrADS) Project is to simplify
distributed heterogeneous computing in the same
way that the World Wide Web simplified
information sharing over the Internet. The GrADS
project will explore the scientific and technical
problems that must be solved to make grid
application development and performance tuning
for real applications an everyday practice.
5. Condor
- http://www.cs.wisc.edu/condor/
- The goal of the Condor Project is to develop,
implement, deploy, and evaluate mechanisms and
policies that support High Throughput Computing
(HTC) on large collections of distributively
owned computing resources. Guided by both the
technological and sociological challenges of such
a computing environment, the Condor Team has been
building software tools that enable scientists
and engineers to increase their computing
throughput.
6. GridLab
- http://www.gridlab.org
- The GridLab project will develop an
easy-to-use, flexible, generic and modular Grid
Application Toolkit (GAT), enabling today's
applications to make innovative use of global
computing resources. The project is grounded by
two principles, (i) the co-development of
infrastructure with real applications and user
communities, leading to working scenarios, and
(ii) dynamic use of grids, with self-aware
simulations adapting to their changing
environment.
7. DataGrid
- http://eu-datagrid.web.cern.ch/eu-datagrid/
- DataGrid is a project funded by the European Union.
The objective is to build the next generation
computing infrastructure providing intensive
computation and analysis of shared large-scale
databases, from hundreds of TeraBytes to
PetaBytes, across widely distributed scientific
communities.
8. ...
- There are many more projects; the preceding was just a sample
- More projects are starting all the time
- The Grid is gaining interest...
9. Grid Infrastructure
- There is already a lot of infrastructure out there to help one run applications on grids
- There is not so much infrastructure so far for tailoring applications to run on grids, but that doesn't stop existing legacy applications from running in grid environments.
- Lots of effort is currently underway to develop portals to facilitate use of the existing infrastructure with existing applications.
10. Globus
- The Globus project has developed the most widely deployed grid infrastructure
- This infrastructure splits into:
  - Security
  - Data management
  - Resource management
  - Information systems
- Additionally, the project has developed an MPI implementation which helps MPI applications to run across multiple computational resources.
11. GSI
- The Globus Toolkit uses the Grid Security Infrastructure (GSI) for enabling secure authentication and communication over an open network. GSI provides a number of useful services for Grids, including mutual authentication and single sign-on.
- GSI uses public key encryption and X.509 certificates, along with SSL, with extensions to allow single sign-on and credential delegation.
- The Globus implementation is GSS-API compliant
12. Data Management
- The Globus Toolkit includes various data management components
- GridFTP
  - GSI-enabled FTP, including multiple parallel streams to increase overall throughput
- Data Replication
  - Multiple copies of data distributed to allow faster access
  - Toolkit provides replica catalogue and replica management software
- GASS
  - Global Access to Secondary Storage
  - Can access data from anywhere with a URL
13. Resource Management
- Globus Resource Allocation Manager (GRAM)
- GRAM processes the requests for resources for remote application execution, allocates the required resources, and manages the active jobs. It also returns updated information regarding the capabilities and availability of the computing resources to the Monitoring and Discovery Service (MDS).
- GRAM provides an API for submitting and canceling a job request, as well as checking the status of a submitted job. The job specification is written by the user in the Resource Specification Language (RSL) and is processed by GRAM as part of the job request.
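As a rough illustration of what such a job specification looks like, the sketch below renders a dictionary of job attributes as an RSL-style string of the form `&(attribute=value)...`. The helper names `to_rsl` and `quote` are hypothetical and are not part of Globus; the quoting rule is a simplification, and nothing here talks to a real GRAM service.

```python
def quote(value):
    """Return a value as an RSL literal, quoting it only when needed.

    Assumption: plain tokens may be written unquoted, and a double quote
    inside a quoted literal is escaped by doubling it.
    """
    text = str(value)
    if text and all(c.isalnum() or c in "/._-" for c in text):
        return text
    return '"' + text.replace('"', '""') + '"'


def to_rsl(attributes):
    """Render a dict of job attributes as a conjunction of RSL relations."""
    parts = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # A sequence becomes a space-separated list of literals.
            rendered = " ".join(quote(v) for v in value)
        else:
            rendered = quote(value)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    print(to_rsl({"executable": "/bin/hostname", "count": 4}))
```

The resulting string is what a user would hand to GRAM as part of the job request; a real submission would of course go through the Globus tools or API rather than this helper.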
14. Information Systems
- Monitoring and Discovery Service (MDS)
- The MDS contains static and dynamic information about compute resources, as well as static and dynamic information about the network performance between compute resources.
- LDAP-based database
- Hierarchical
15. MPICH-G2
- Can be used to run across multiple distributed parallel resources
- Based on the widely available MPICH MPI implementation from Argonne; in fact it is a standard MPICH device which may be built if Globus is installed on the system
- May use the vendor's native MPI implementation for intra-machine communication
- Uses Globus infrastructure to launch jobs on remote resources
16. Condor
- Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility.
- Uses ClassAds to specify resource requirements for jobs.
- Provides checkpointing and process migration
- Can use Globus to act as a batch system across multiple resources
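The idea behind ClassAd matchmaking can be sketched as follows: both the job and the machine advertise their attributes together with a requirements expression, and a match needs both expressions to hold. This is only an illustration in Python; real ClassAds are written in Condor's own expression language, and the attribute names (`opsys`, `memory_mb`, `owner`) and the functions `matches` and `first_match` are assumptions made for this example.

```python
def matches(job, machine):
    """A job and a machine match when each satisfies the other's requirements."""
    return bool(job["requirements"](machine) and machine["requirements"](job))


def first_match(job, machines):
    """Return the name of the first machine matching the job, or None."""
    for machine in machines:
        if matches(job, machine):
            return machine["name"]
    return None


if __name__ == "__main__":
    job = {
        "owner": "alice",
        # The job needs a Linux machine with at least 512 MB of memory.
        "requirements": lambda m: m["opsys"] == "LINUX" and m["memory_mb"] >= 512,
    }
    machines = [
        {"name": "ws1", "opsys": "LINUX", "memory_mb": 256,
         "requirements": lambda j: True},
        {"name": "ws2", "opsys": "LINUX", "memory_mb": 1024,
         # The machine owner refuses jobs from the "guest" account.
         "requirements": lambda j: j["owner"] != "guest"},
    ]
    print(first_match(job, machines))
```

Letting the machine state requirements too is what allows distributively owned workstations to stay under their owners' control.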
17. How Do Applications Use These?
- To use Condor or MPICH-G2, no changes need to be made to the application to make use of the new distributed features
- So simple applications used in their current mode may be able to use the infrastructure transparently.
- More complicated applications have more needs...
18. What is an Application?
- Sometimes causes much confusion in conversations
- Is an application a single process, or many processes collaborating to perform some task?
- For the purposes of this tutorial I will define an application as the latter
  - An application is one or more processes which perform a particular task, such as a simulation or a calculation, on behalf of a user
  - E.g. all processes in an MPI job
19. Application Developers
What do Application Developers Need to Think About in Grid Environments?
- This is very similar to the requirements for an application to be able to run on many different architectures
- Now also need to consider that not all processes in an application are necessarily running on the same resource or even the same architecture
- Not all processes have access to the same environment, and they may not be able to reach the same set of remote resources
20. IO
- As discussed for frameworks, files must be in some format which is readable on all architectures
- Not all processes may have access to the same file systems, so it may be necessary to use communication technologies to access files remotely
- The user may not ordinarily have access to any of the filespaces accessible to the application, so there must be some way to migrate files to and from the space available to the application.
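One way to make a binary file readable on all architectures is to fix the byte order and field sizes explicitly rather than writing native memory layouts. The sketch below uses Python's `struct` module with a little-endian layout; the `GRID` magic tag and the function names are invented for this example and do not correspond to any established file format.

```python
import struct

# Header: 4-byte magic tag plus an unsigned 32-bit element count.
# The "<" prefix forces little-endian byte order and disables padding,
# so the layout does not depend on the machine that writes it.
HEADER = struct.Struct("<4sI")


def pack_samples(samples):
    """Encode a list of floats as header + 64-bit little-endian doubles."""
    payload = struct.pack(f"<{len(samples)}d", *samples)
    return HEADER.pack(b"GRID", len(samples)) + payload


def unpack_samples(blob):
    """Decode bytes produced by pack_samples on any architecture."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != b"GRID":
        raise ValueError("not a sample file")
    return list(struct.unpack_from(f"<{count}d", blob, HEADER.size))


if __name__ == "__main__":
    blob = pack_samples([1.0, -2.5, 3.25])
    print(len(blob), unpack_samples(blob))
```

In practice an existing standard, self-describing format is usually a better choice than a home-grown layout; the point here is only that the on-disk representation must not depend on the writing machine.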
21. Parallel Issues
- If using MPI, it must be an MPI version which can run heterogeneously
  - e.g. MPICH-G2, PACX-MPI.
- When running across multiple resources, communication between processes on different resources has much higher latency and much lower bandwidth than communication between processes on a single resource
- Need to think about communication patterns: is it possible to reduce the amount of communication by, for example, buffering data for longer and sending larger batches of data?
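A simple cost model shows why batching helps on high-latency links: every message pays the latency once, while the bytes themselves pay only for bandwidth. The sketch below assumes the model time = messages x latency + bytes / bandwidth; the function name and the numbers in the usage example are illustrative assumptions, not measurements.

```python
import math


def total_time(n_values, bytes_per_value, values_per_message,
               latency_s, bandwidth_bytes_per_s):
    """Estimated time to send n_values when grouped into messages.

    Each message costs one latency; the payload costs bytes / bandwidth.
    """
    n_messages = math.ceil(n_values / values_per_message)
    total_bytes = n_values * bytes_per_value
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Assumed wide-area link: 50 ms latency, 1 MB/s bandwidth.
    one_by_one = total_time(1000, 8, 1, 0.05, 1e6)
    batched = total_time(1000, 8, 100, 0.05, 1e6)
    print(one_by_one, batched)
```

Under these assumed numbers the transfer is dominated by latency, so sending the same data in 10 messages instead of 1000 reduces the estimated time by roughly two orders of magnitude.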
22. Inter-Process Communication
- Need to locate other processes in the application
- These may be on remote resources
- Remote resources may be firewalled
23. Portability
- Need to be able to compile and run in heterogeneous environments
- Not all resources have the same sets of software available
- When starting a distributed application, how does one make sure that there is a suitable executable there?
- Should base code on standards, not on individual compiler vendors' specific features.
24. Firewalls
- In the modern world a lot of resources are protected by firewalls. These restrict the ports which may be accessed from the outside world, and often the locations in the outside world from which these ports may be opened.
- Not generally a problem for an application running on such a resource
- A real problem for monitoring such an application
- A real problem for running an application across multiple such resources
25. Testbeds - What and Why?
- A testbed is a (heterogeneous) set of machines on which you may test your application.
- May or may not have a uniform distribution of grid infrastructure.
- Why use one?
  - Set of resources which you can find out about and have accounts on.
  - Can ask the sysadmins what went wrong.
  - Can request installation of other software
  - Can thus test your application in a Grid environment with less pain than on a random set of machines
26. Grid Programming Tools
- While there are many Grid projects, and much grid middleware, there is, to date, very little in the way of toolkits which make it easy for an application developer to write an application which makes full use of the possibilities of the Grid.
- Both MPICH-G2 and Condor allow specific classes of applications to make use of the power of the grid to run distributed applications. However, access to resource and data management is still hard to do from an application, and IPC for distributed applications is still hard.
27. GAT - What?
- The Grid Application Toolkit (GAT), which is currently being developed by the GridLab project, aims to make this easier
- The GAT aims to develop an API to enable application developers to make use of the best Grid infrastructure when and as it becomes available
- The GAT API allows access to fundamental grid operations
- The GAT abstracts these operations, allowing access to alternative implementations or instances of entities providing these operations
28. GAT - Why?
- People want to use the Grid
- However, they don't want to have to learn all about the various Grid technologies
- Users want to just submit a job and get results back
- Application developers want to be able to write applications which can access Grid resources and run in a Grid environment; they don't want to have to rewrite parts of their application when new technologies come along
- Want to be able to have applications developed today, so they can use the Grid as it emerges.
- Provides a buffer zone between applications and the Grid.
29. The Grid is complex
[Figure: a Cactus application asks "Is there a better resource I could be using?" and is confronted with many interfaces (SOAP, WSDL, CORBA, OGSA, other) and services (monitoring, profiling, information, logging, security, resource management, notification, application manager, migration, data management) offered by Globus and other Grid infrastructure.]
30. ... need to make it easier to use
[Figure: the Cactus application asks "Is there a better resource I could be using?" by calling GAT_FindResource( ); the GAT sits between the application and the Grid.]
31. GridLab Architecture
- The GridLab architecture is split into several pieces:
  - The application itself
  - A library which interfaces between the application and the Grid middleware
  - The Grid middleware.
- The GridLab project aims to develop the library (the GAT Engine) and a set of middleware (the GridLab services).
32. GAT - What is It?
- GAT: Grid Application Toolkit
  - Implements the GAT-API
  - Used by applications (different languages)
- GAT Adaptors
  - Connect to capabilities/services
- GAT Engine
  - Provides the function bindings for the GAT-API
33. The GAT Engine
- This is a library which applications link against to make use of Grid infrastructure
- It provides stub calls for the basic Grid operations
- Applications can always make calls to any of these operations, and will get an error back if an operation is not available.
- Thus an application need not be rewritten, recompiled or re-linked to make use of new middleware.
- The actual access to middleware is provided by dynamically loadable modules which provide access to specific implementations of these grid operations
34. GAT Engine
- When an application makes a GAT-API call, the
engine searches through an internal database of
adaptors for the requested capability and calls it
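The lookup described above can be sketched as a small registry: adaptors register the capabilities they provide, and a call either reaches an adaptor or returns an error status. This is a minimal Python illustration of the idea only; the class `Engine`, its methods and the status codes are hypothetical and do not reproduce the actual GAT Engine or GAT-API.

```python
STATUS_OK = 0             # hypothetical status codes for this sketch
STATUS_NOT_AVAILABLE = 1


class Engine:
    """Toy engine mapping capability names to registered adaptor functions."""

    def __init__(self):
        self._adaptors = {}  # capability name -> list of handlers

    def register(self, capability, handler):
        """Called by an adaptor when it is loaded to announce a capability."""
        self._adaptors.setdefault(capability, []).append(handler)

    def call(self, capability, *args):
        """Look up the capability and call the first adaptor providing it.

        The application can always make the call; if no adaptor is loaded
        for the capability it gets an error status back instead of failing
        to link or run.
        """
        handlers = self._adaptors.get(capability)
        if not handlers:
            return STATUS_NOT_AVAILABLE, None
        return STATUS_OK, handlers[0](*args)


if __name__ == "__main__":
    engine = Engine()
    engine.register("find_resource", lambda: "localhost")
    print(engine.call("find_resource"))
    print(engine.call("move_file", "a", "b"))
```

Because the application only ever talks to the engine, new middleware can be supported by loading another adaptor without touching the application.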
35. GAT Adaptor
- Interface between the GAT Engine and one or more capabilities
- Translates user requests to the appropriate interface syntax for a capability provider
- Active adaptors change dynamically
- Includes security context
- Returns appropriate error codes
- Examples
  - OGSA adaptor (provides many capabilities)
  - GRAM adaptor (talks directly to gatekeepers)
  - Adaptors for each GridLab service provider
  - Local adaptors (GAT_MoveFile -> cp, GAT_FindResource -> localhost)
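A local adaptor of the kind listed above could look roughly like this: it satisfies the same requests as a Grid-backed adaptor, but with purely local operations. The class and method names are hypothetical stand-ins chosen for this sketch, not the real GAT adaptor interface.

```python
import shutil


class LocalAdaptor:
    """Toy adaptor that serves requests using only the local machine."""

    def move_file(self, source, destination):
        """Local stand-in for a Grid file move: just move the file on disk."""
        shutil.move(source, destination)
        return destination

    def find_resource(self):
        """Local stand-in for resource discovery: the only resource is here."""
        return "localhost"


if __name__ == "__main__":
    adaptor = LocalAdaptor()
    print(adaptor.find_resource())
```

With such an adaptor loaded, the same application can run on a machine with no Grid middleware at all.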
36. GAT Adaptor
37. GAT Adaptor Initialisation
38. GAT Adaptor Call (I)
39. GAT Adaptor Call (II)
40. Current Status
- Many GridLab services in development or available in prototype or alpha release
  - Data management, resource brokering, application monitoring, information systems, access for mobile devices
- GridLab portal under development
- GAT Engine available in prototype form
  - Usable with a test API
  - Allows access to available GridLab services and some other Grid middleware
- Grid operations for the GAT API are being identified and codified, and the actual API is being developed.
41. The Same Application
[Figure: the same application, linked against the GAT, runs on a laptop, on a supercomputer and on the Grid; the diagram is annotated "Firewall issues!" and "No network!".]
42. Getting Ready For The Grid
- Grid toolkits and middleware aren't a magic wand which will 'grid enable' your application. You still need to think!
- Unless your application is very simple, or makes use of a framework, you are very likely to need to modify it.
- Use standards!
  - This is basic to any portable application
- If you want to locate data you need to be able to describe it.
  - The grid will not magically decrease data processing time for data-intensive applications unless the data can be described adequately.
43. Getting Ready ...
- As said before, simple applications can be 'grid enabled' in a basic way by use of MPICH-G2 or Condor.
- In fact any basic MPI application is 'grid enabled'! However, it may need modification to run optimally in a Grid environment.
44. Various Scenarios
- Application steering
  - This should be done in some standard way so that in the future it can be replaced by some actuators from some toolkit which gives authentication and authorisation
- Checkpointing and IO
  - Should be in some standard file format
  - May want to advertise files to some data management system
- Visualisation
  - Again, standard file or data formats allow grid middleware to operate
  - Can be linked to file advertising
45. What Frameworks and Toolkits Give You
- Frameworks such as Cactus give you a lot of these things. Using such a framework frees you as an application developer from having to worry about a lot of issues; the framework developers have hopefully dealt with them instead.
- Similarly, toolkits such as the GAT free you from having to worry about specific Grid infrastructure. All the access to Grid middleware, and worrying about how it is deployed, can be delegated to the toolkit.
- Using frameworks and toolkits frees you from having to worry about a lot of generic things, leaving you more time and energy to work on application-specific things.
46. Example
- A simple example of a GAT-enabled application will be presented and run.
- This will be available from
  - http://www.gridlab.org/WorkPackages/wp-1/Examples