Title: Algorithms and Tools for (Distributed) Heterogeneous Computing
1. Algorithms and Tools for (Distributed) Heterogeneous Computing
- Yves ROBERT
- www.ens-lyon.fr/yrobert
2. Contents
- Framework
- Hardware, system, and administration issues; applications
- Programming environments
- Globus, Legion, Albatross, AppLeS, NetSolve
- Algorithmic and programming aspects
- Data decomposition techniques for cluster computing
- Granularity issues for metacomputing
- Scheduling and load-balancing methods
- Conclusion
3. Bibliography
- Books
- The Grid: Blueprint for a New Computing Infrastructure
- High Performance Cluster Computing, Vol. 1: Architecture and Systems; Vol. 2: Programming and Applications, R. Buyya (ed.), Prentice Hall, 1999
- Journals
- Blueprint for the future of high-performance computing
- The high-performance computing continuum, CACM, Nov. 1997 and Nov. 1998
4. The Grid: Blueprint for a New Computing Infrastructure, I. Foster, C. Kesselman (eds), Morgan Kaufmann, 1999
- ISBN 1-55860-475-8
- 22 chapters by expert authors including Andrew
Chien, Jack Dongarra, Tom DeFanti, Andrew
Grimshaw, Roch Guerin, Ken Kennedy, Paul Messina,
Cliff Neuman, Jon Postel, Larry Smarr, Rick
Stevens, and many others
5. Bibliography (contd)
- Web
- NPACI (National Partnership for Advanced Computational Infrastructure): www.npaci.edu
- An Overview of Computational Grids and Survey of a Few Research Projects, Jack Dongarra, http://www.netlib.org/utk/people/JackDongarra/talks.html
- LIP Report 99-36
- Algorithms and Tools for (Distributed) Heterogeneous Computing: A Prospective Report, www.ens-lyon.fr/yrobert
6. Framework
7. Metacomputing
- The future of parallel computing: distributed and heterogeneous
- Metacomputing: making use of distributed collections of heterogeneous platforms
- Target: tightly-coupled high-performance distributed applications (rather than loosely-coupled cooperative applications)
8. Metacomputing Platforms (1)
- Low end of the field: cluster computing with heterogeneous networks of workstations or PCs
- Ubiquitous in university departments and companies
- The typical poor man's parallel computer
- Running large PVM or MPI experiments
- Make use of all available resources: slower machines in addition to more recent ones
9. Metacomputing Platforms (2)
- High end of the field: a computational grid linking the most powerful supercomputers of the largest supercomputing centers through dedicated high-speed networks
- Middle of the field: connecting medium-size parallel servers (equipped with application-specific databases and application-oriented software) through fast but non-dedicated links, thus creating a meta-system
10. High end (1)
- Globus Ubiquitous Supercomputing Testbed Organization (GUSTO)
- November 1998: 70 institutions, 3 continents
- 17 sites, 330 supercomputers (over 3600 processors)
- Aggregate power in excess of 2 TeraFlops!
11. High end: GUSTO (2)
12. Low end (1)
- Distributed ASCI Supercomputer (DAS)
- A common platform for research
- (Wide-area) parallel computing and distributed applications
- November 1998: 4 universities, 200 nodes
- Node:
- 200 MHz Pentium Pro
- 128 MB memory, 2.5 GB disk
- Myrinet, 1.28 Gbit/s (full duplex)
- Operating system: BSD/OS
- ATM network
13. Low end (2)
14. Administrative Issues (1)
- Intensive computations on a set of processors across several countries and institutions
- Strict rules define the (good) usage of shared resources; these rules must be guaranteed by the runtime system, together with methods to migrate computations to other sites whenever some local request is raised
15. Administrative Issues (2)
- A major difficulty is to avoid a large increase in the administrative overhead
- Each user cannot have an account on each machine on the network
- A single meta-user cannot be the one and only authorized user on the whole set of machines
- Challenge: find a tradeoff that does not increase the administrative load while preserving the users' security
16. Tomorrow's Virtual Super-Computer (1)
- The Web (and the associated databases) is built using:
- A set of disks to store the data
- A network infrastructure enabling a large number of users to access this data
- Metacomputing:
- Using the computing power of the computers linked by the Internet to execute various applications (numerically-intensive applications first, but many others to follow)
- The Internet will slowly evolve into a virtual super-computer
17. Tomorrow's Virtual Super-Computer (2)
- Metacomputing applications will execute on a hierarchical grid
- An interconnection of clusters scattered all around the world
- A fundamental characteristic of the virtual super-computer:
- A set of strongly heterogeneous and geographically scattered resources
18. Algorithmic and Software Issues (1)
Whereas the architectural vision is clear, the
software developments are not so well understood
19. Algorithmic and Software Issues (2)
- Low end of the field:
- Cope with heterogeneity
- A major algorithmic effort to be undertaken
- High end of the field:
- Logically assemble the distributed computers: extensions to PVM and MPI to handle distributed collections of clusters
- Configuration and performance optimization:
- Inherent complexity of networked and heterogeneous systems
- Resources often identified only at runtime
- Dynamic nature of resource characteristics
20. Algorithmic and Software Issues (3)
- High-performance computing applications must:
- Configure themselves to fit the execution environment
- Adapt their behavior to subsequent changes in resource characteristics
- Parallel environments have focused on strongly homogeneous architectures (processor, memory, network):
- Array and loop distribution, parallelizing compilers, HPF constructs, gang scheduling, MPI
- However, metacomputing platforms are strongly heterogeneous!
21. Applications (1)
- All applications involving parallel computing. Performance problems are due to:
- Using a network of heterogeneous machines
- Relying on current (limited) programming environments
- Classical applications such as the grand challenges can be ported to metacomputing platforms
- Forget fine-grain parallelism: there is a deep hierarchy between all memory and communication layers
- Code coupling: the nicest application for metacomputing
22. Applications (2)
- Other applications, outside the world of numerical (or scientific) computing:
- Databases, decision-support systems
- All kinds of multimedia servers (PPI project at Caltech)
- Best candidates: loosely-coupled applications
- All kinds of decomposition (functional, pipeline, data-parallel, macrotasking, ...)
- The actual challenge: implementation of tightly-coupled applications
23. Programming environments
24. Programming models (1)
- Extensions of MPI:
- MPI_Connect, Nexus, PACX-MPI, MPI-Plus, Data-Exchange, VCM, MagPIe, ...
- Globus: a layered approach
- Fundamental layer: a set of core services, including resource management, security, and communications, that enable the linking and interoperation of distributed computer systems
25. Programming models (2)
- Object-oriented technologies to cope with heterogeneity
- Encapsulate "technical details" such as protocols, data representations, migration policies
- Legion is built on Mentat, an object-oriented parallel processing system
- Albatross relies on a high-performance Java system, with a very efficient implementation of Java Remote Method Invocation
26. Programming models (3)
- Far from achieving the holy grail:
- Using the computing resources remotely and transparently, just as we do with electricity, without knowing where it comes from
27. References
- Globus: www.globus.org
- Legion: www.cs.virginia.edu/legion
- Albatross: www.cs.vu.nl/bal/albatross
- AppLeS: www-cse.ucsd.edu/groups/hpcl/apples/apples.html
- NetSolve: www.cs.utk.edu/netsolve
28. Case study: Globus
- A big machinery
- A sophisticated machinery
- The most widely used testbed
29. Layered Architecture
[Figure: the Globus layered architecture]
- Applications
- High-level services and tools: GlobusView, Testbed Status, DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++
- Core services: GRAM, Nexus, Metacomputing Directory Service, Globus Security Interface, Heartbeat Monitor, Gloperf, GASS
30. Core Globus Services
- Communication infrastructure (Nexus)
- Information services (MDS)
- Network performance monitoring (Gloperf)
- Process monitoring (HBM)
- Remote file and executable management (GASS and GEM)
- Resource management (GRAM)
- Security (GSI)
31. Running a Program
- Goal: run a Message Passing Interface (MPI) program on multiple computers
- MPICH-G uses Globus for authentication, resource allocation, executable staging, output redirection, etc.
mpirun -np 4 my_app
32. Globus Components in Action
[Figure: mpirun invokes globusrun, which passes the request to DUROC; DUROC contacts a GRAM at each site; the GRAMs start the processes (P1, P2) through local mechanisms (fork, LSF, LoadLeveler), and the processes communicate via Nexus]
33. DUROC Review
- Simultaneous allocation of a resource set
- Handled via optimistic co-allocation based on free nodes or queue prediction
- In the future, advance reservations will also be supported
- globusrun will co-allocate specific multi-requests
- Uses a Globus component called the Dynamically-Updated Request Online Co-allocator (DUROC)
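The all-or-nothing flavor of optimistic co-allocation can be illustrated with a toy sketch. This is not the DUROC API: the site names, the free-node model, and the function are invented for illustration; real DUROC issues sub-requests through the GRAMs.

```python
# Toy sketch of optimistic co-allocation: a multi-request is granted only
# if every site can supply its share of nodes at the same time.
# Site names and the free-node model are illustrative assumptions.

def co_allocate(multi_request, free_nodes):
    """multi_request: {site: nodes wanted}; free_nodes: {site: nodes free}.
    Returns the allocation, or None if any sub-request cannot be met."""
    for site, wanted in multi_request.items():
        if free_nodes.get(site, 0) < wanted:
            return None  # all-or-nothing: abort the whole multi-request
    # Commit every sub-request (optimistically assumed to still be free)
    return dict(multi_request)

free = {"siteA": 64, "siteB": 16, "siteC": 128}
print(co_allocate({"siteA": 32, "siteB": 8}, free))   # granted
print(co_allocate({"siteA": 32, "siteB": 32}, free))  # None: siteB too small
```

Queue prediction or advance reservations would replace the naive "still free at commit time" assumption made here.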
34. Using Information for Resource Brokering
[Figure: a Resource Broker queries the Metacomputing Directory Service for information and service location ("What computers? What speed? When available?"; e.g., a 20 Mb/sec link), then issues a request such as "50 processors + storage from 10:20 to 10:40 pm" to the GRAMs (Globus Resource Allocation Managers), which drive local schedulers: fork, LSF, EASY-LL, Condor, etc.]
35. Examples of Useful Information
- Characteristics of a compute resource
- IP address, software available, system administrator, networks connected to, OS version, load
- Characteristics of a network
- Bandwidth and latency, protocols, logical topology
- Characteristics of the Globus infrastructure
- Hosts, resource managers
36. Metacomputing Directory Service
- Store information in a distributed directory
- Directory stored in collection of servers
- Directory can be updated by
- Globus system
- Other information providers and tools
- Applications (i.e., users)
- Information dynamically available to
- Tools
- Applications
37. Remote Service Request
- Sender side: init_rsr(), put_int(), put_float(), send_rsr()
- Receiver side: handler, get_int(), get_float()
- Allows the communication method to be selected independently, either automatically or manually
[Figure: at the application level, a startpoint sends to an endpoint; at the implementation level, the communication method is selected among the available methods]
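The put/get interface above can be mimicked in a short sketch. This is a plain-Python illustration of the typed pack/unpack idea, not the actual Nexus implementation; the class name and its methods are assumptions modeled on the slide.

```python
import struct

# Illustrative remote-service-request buffer in the spirit of
# init_rsr()/put_int()/put_float()/send_rsr(); NOT the real Nexus API.
class RSRBuffer:
    def __init__(self):               # plays the role of init_rsr()
        self.data = bytearray()
        self.offset = 0
    def put_int(self, x):
        self.data += struct.pack("!i", x)   # network byte order, 4 bytes
    def put_float(self, x):
        self.data += struct.pack("!d", x)   # 8-byte double
    def get_int(self):
        x, = struct.unpack_from("!i", self.data, self.offset)
        self.offset += 4
        return x
    def get_float(self):
        x, = struct.unpack_from("!d", self.data, self.offset)
        self.offset += 8
        return x

buf = RSRBuffer()       # sender packs arguments in order...
buf.put_int(42)
buf.put_float(3.5)
# ...send_rsr() would ship buf.data to the endpoint, whose handler unpacks:
print(buf.get_int(), buf.get_float())  # 42 3.5
```

The point is that both sides agree on the order and types of the packed values; the transport underneath (the "communication method") can be chosen independently.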
38. Algorithmic issues
39. Data Decomposition Techniques for Cluster Computing
- The block-cyclic distribution paradigm is the preferred layout for data-parallel programs (HPF, ScaLAPACK)
- It evenly balances the total workload only if all processors have the same speed
- Extending ScaLAPACK to heterogeneous clusters turns out to be surprisingly difficult
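As a reminder of why the block-cyclic layout balances homogeneous processors, here is the standard 1D ownership formula as a minimal sketch (the function name is ours; the formula itself is the textbook one):

```python
def block_cyclic_owner(i, block, p):
    """Owner of global index i under a 1D block-cyclic distribution
    with block size `block` over p (homogeneous) processors."""
    return (i // block) % p

# With block=2 and p=3, indices 0..11 cycle through the processors:
print([block_cyclic_owner(i, 2, 3) for i in range(12)])
# -> [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

Each processor owns the same number of blocks, hence the same work, which is exactly what breaks down once processor speeds differ.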
40. Algorithmic challenge
- Bad news: designing a matrix-matrix product or a dense linear solver proves a hard task on a heterogeneous cluster!
- Next problems:
- Simple linear algebra kernels on a collection of clusters (extending the platform)
- More ambitious routines, composed of a variety of elementary kernels, on a heterogeneous cluster (extending the application)
- Implementing more ambitious routines on more ambitious platforms (extending both)
41. Scheduling (1)
- Two-step clustering heuristics for classical parallel machines
- It is difficult to trade off parallelism and communication, even in the presence of unlimited resources
42. Scheduling (2)
- Heterogeneity poses new challenges to scheduling techniques
- Clustering with unlimited resources no longer makes sense
- Sophisticated scheduling heuristics are available, such as dynamically remapping tasks after computing a first allocation based on critical paths
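The critical-path ingredient of such heuristics can be sketched briefly: rank each task by the longest computation path to an exit task, then schedule in decreasing rank order, as list-scheduling heuristics do. The DAG and costs below are hypothetical, and communication costs are omitted for brevity:

```python
def upward_rank(succ, cost):
    """Longest path (in computation cost) from each task to an exit task.
    succ: {task: [successor tasks]}, cost: {task: compute cost}."""
    memo = {}
    def rank(t):
        if t not in memo:
            memo[t] = cost[t] + max((rank(s) for s in succ.get(t, [])),
                                    default=0)
        return memo[t]
    for t in cost:
        rank(t)
    return memo

# Hypothetical task graph: A -> B, A -> C, B -> D, C -> D
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": 2, "B": 3, "C": 1, "D": 2}
ranks = upward_rank(succ, cost)
order = sorted(cost, key=lambda t: -ranks[t])  # highest rank first
print(ranks)   # A: 7, B: 5, C: 3, D: 2
print(order)   # ['A', 'B', 'C', 'D']
```

A remapping pass would then revisit this first allocation using measured processor and link characteristics.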
43. Load-balancing (1)
- Distributing the computations (together with the associated data) can be performed statically, dynamically, or by a mixture of both
- Some simple schedulers are available, but they use naive mapping strategies:
- Master-slave techniques
- "Use the past to predict the future"
44. Load-balancing (2)
- Trade-off between the data distribution parameters and the process spawning and possible migration policies
- Redundant computations might also be necessary to use a heterogeneous cluster at its best capabilities
45. AppLeS, a high-level scheduling and load-balancing tool
- Both application-specific and system-specific information are required to produce good schedules
- Dynamic information is necessary to accurately assess the system state; predictions are accurate only within a particular time frame → the Network Weather Service
- Built on top of Globus or Legion
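The kind of short-term forecasting NWS performs can be sketched with an exponentially weighted moving average. NWS actually maintains a whole family of predictors and picks the currently best one; the single EWMA below, its smoothing factor, and the sample values are illustrative only:

```python
def ewma_forecast(history, alpha=0.5):
    """Predict the next measurement as an exponentially weighted moving
    average of past ones (recent samples weigh more). Illustrative only."""
    estimate = history[0]
    for x in history[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate

# Hypothetical bandwidth samples (Mb/s); the forecast tracks the recent drop
samples = [8.0, 8.0, 8.0, 4.0, 4.0]
print(ewma_forecast(samples))  # 5.0
```

A scheduler like AppLeS would feed such forecasts (bandwidth, latency, CPU availability) into its candidate-schedule evaluation, within the time frame for which they remain accurate.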
46. NetSolve
- The remote computing paradigm:
- The program resides on the server
- The user's data is sent to the server, where the appropriate programs or numerical libraries operate on it
- The result is then sent back to the user's machine
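The round trip above can be mimicked in-process. This is a sketch of the paradigm only, not the NetSolve client API: the registry, the routine names, and the request function are invented, and the real system ships the data over the network to an agent-selected server.

```python
# Sketch of the remote computing paradigm: the "program" lives on the
# server side; the client only ships data and receives the result back.
# Registry and names are illustrative, not the NetSolve interface.

SERVER_LIBRARY = {
    "dot":   lambda x, y: sum(a * b for a, b in zip(x, y)),
    "norm2": lambda x: sum(a * a for a in x) ** 0.5,
}

def netsolve_request(problem, *args):
    """Client stub: send the data, let the server-side routine operate
    on it, and return the result to the caller's machine."""
    routine = SERVER_LIBRARY[problem]   # resolved on the server side
    return routine(*args)

print(netsolve_request("dot", [1, 2, 3], [4, 5, 6]))  # 32
print(netsolve_request("norm2", [3.0, 4.0]))          # 5.0
```

The appeal is that the user never installs the numerical library; the drawback, noted later in these slides, is that load-balancing such requests efficiently is hard.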
47. NetSolve - The big picture
[Figure: a client request flowing through the NetSolve system]
48. NetSolve - Solving a Problem
[Figure: a problem being dispatched to the computational servers]
49. The ScaLAPACK NetSolve Server
50. ScaLAPACK on heterogeneous clusters
- Dynamic allocation strategies are not suited:
- Large (prohibitive?) communication overhead
- Dependences may keep fast processors idle
- Static allocation: load inversely proportional to processor cycle-time (i.e., proportional to speed)
- Efficient, but it is difficult to accurately estimate and predict speed
- Static communication schemes and static memory allocation (for library users)
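A 1D version of such a static allocation can be sketched as follows: give each processor a share of the n rows proportional to its speed 1/t_i, rounded so the shares sum exactly to n. The function name and the largest-remainder rounding are our illustrative choices, not ScaLAPACK's scheme, and a real code would also have to respect block sizes:

```python
def static_allocation(n, cycle_times):
    """Split n rows among processors so that processor i's share is
    (roughly) proportional to its speed 1/t_i, using largest-remainder
    rounding so the shares sum exactly to n. Illustrative sketch."""
    speeds = [1.0 / t for t in cycle_times]
    total = sum(speeds)
    exact = [n * s / total for s in speeds]
    shares = [int(e) for e in exact]
    # Hand the leftover rows to the largest fractional remainders
    leftovers = sorted(range(len(exact)),
                       key=lambda i: exact[i] - shares[i], reverse=True)
    for i in leftovers[: n - sum(shares)]:
        shares[i] += 1
    return shares

# Cycle-times 1, 2, 3 -> relative speeds 6:3:2, so roughly 6/11 of the rows
# go to the fastest processor:
print(static_allocation(100, [1.0, 2.0, 3.0]))  # [55, 27, 18]
```

The hard part in practice is not the arithmetic but obtaining cycle-time estimates that remain valid for the duration of the computation.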
51. Matrix product on a 2D grid
Processor P(i,j) is assigned a rectangle of size r_i × c_j
- Intuition: the load of a processor should be inversely proportional to its cycle-time
52. The 2D grid allocation problem
- Maximize the amount of work: (Σ_i r_i) × (Σ_j c_j)
- Subject to the constraints: ∀ i,j: r_i × t_ij × c_j ≤ 1, where t_ij is the cycle-time of processor P(i,j)
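On a tiny instance, this allocation problem (maximize (Σ_i r_i)(Σ_j c_j) subject to r_i t_ij c_j ≤ 1) can be attacked by discretized brute force: fix the r_i on a grid and set each c_j to its largest feasible value. The cycle-time matrix and step count below are made up for illustration; realistic sizes call for the dedicated heuristics the slides mention.

```python
import itertools

def best_grid_shares(t, steps=50):
    """Maximize (sum r_i)(sum c_j) subject to r_i * t[i][j] * c_j <= 1,
    by grid search over the r_i, with each c_j set to its largest
    feasible value c_j = min_i 1 / (r_i * t[i][j]). Illustration only."""
    p, q = len(t), len(t[0])
    grid = [k / steps for k in range(1, steps + 1)]
    best = (0.0, None, None)
    for r in itertools.product(grid, repeat=p):
        c = [min(1.0 / (r[i] * t[i][j]) for i in range(p)) for j in range(q)]
        work = sum(r) * sum(c)
        if work > best[0]:
            best = (work, r, c)
    return best

# Hypothetical 2x2 cycle-times t[i][j] of processor P(i,j)
t = [[1.0, 2.0],
     [2.0, 4.0]]
work, r, c = best_grid_shares(t)
print(round(work, 3))  # the optimum for this instance is 2.25
```

Note that the objective is invariant when all r_i are scaled up and all c_j scaled down by the same factor, so restricting the r_i to (0, 1] loses nothing.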
53. Grid layout yet to be found!
The placement of the processors P(i,j) in the grid remains to be determined
- Search over all permutations
- Use heuristics to solve this problem!
54. Collections of clusters (1)
[Figure: clusters linked internally by fast links and to one another by slower links]
55. Collections of clusters (2)
- Introduce yet another level of granularity
- Overlap inter-cluster communication with independent computation
- A static approach?
- So far Globus uses a batch system → dedicated machines
- Not sufficient in the long term
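Overlapping inter-cluster communication with independent computation can be sketched with a background thread. The "transfer" here is simulated in-process and the names are illustrative; in practice the transfer would be a slow wide-area message:

```python
import threading
import queue

def remote_fetch(out_q, data):
    """Stand-in for a slow inter-cluster transfer, run in the background."""
    out_q.put([x * 2 for x in data])

def compute_with_overlap(local_work, remote_data):
    q = queue.Queue()
    t = threading.Thread(target=remote_fetch, args=(q, remote_data))
    t.start()                    # start the transfer...
    local = sum(local_work)      # ...and do independent work meanwhile
    t.join()
    remote = q.get()             # the transfer has completed by now
    return local + sum(remote)

print(compute_with_overlap([1, 2, 3], [10, 20]))  # 6 + 60 = 66
```

The benefit appears only when enough independent computation exists to hide the transfer time, which is precisely the granularity question the slide raises.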
56. Conclusion
57. (A) Europe
- While there are several projects related to metacomputing in Europe, there is little coordination and exchange between these projects
- Only a few European institutions have joined the NPACI initiative: only three international affiliates in Europe
58. (B) Algorithmic issues
- The difficulties seem largely underestimated
- Data decomposition, scheduling heuristics, and load balancing become extremely difficult in the context of metacomputing platforms
- The research community focuses on low-level communication protocols and distributed system issues (light-weight process invocation, migration, ...)
59. (C) Programming level
- Which is the right level?
- Data-parallelism: unrealistic, due to heterogeneity
- Explicit message passing: too low-level
- Object-oriented approaches still require the user to have a deep knowledge of both the application's behavior and the underlying resources
- Remote computing systems (NetSolve) face severe limitations in efficiently load-balancing the work
- Relying on specialized but highly-tuned libraries of all kinds may prove a good trade-off
60. (D) Applications
- Key applications (from scientific computing to databases) have dictated the way classical parallel machines are used, programmed, and even updated into more efficient platforms
- Key applications will strongly influence, or even guide, the development of metacomputing environments
61. (D) Applications (contd)
- Which applications will be worth the abundant but hard-to-access resources of the grid?
- Tightly-coupled grand challenges?
- Mobile computing applications?
- Micro-transactions on the Web?
- All these applications require new programming paradigms to enable inexperienced users to access the magic grid!