Title: Grid Computing
1. Grid Computing
02/05/2008
- Grid Systems and scheduling
2. Grid systems
- Many!
- Classification (depends on the author):
- Computational grid:
  - distributed supercomputing (parallel application execution on multiple machines)
  - high throughput (stream of jobs)
- Data grid: provides the means to solve large-scale data management problems
- Service grid: systems that provide services that are not provided by any single local machine
  - on demand: aggregates resources to enable new services
  - collaborative: connects users and applications via a virtual workspace
  - multimedia: infrastructure for real-time multimedia applications
3. Taxonomy of Applications
- Distributed supercomputing: consumes CPU cycles and memory
- High-throughput computing: harvests unused processor cycles
- On-demand computing: meets short-term requirements for resources that cannot be cost-effectively or conveniently located locally
- Data-intensive computing
- Collaborative computing: enables and enhances human-to-human interaction (e.g. the CAVE5D system supports remote, collaborative exploration of large geophysical data sets and the models that generated them)
4. Alternative classification
- independent tasks
- loosely-coupled tasks
- tightly-coupled tasks
5. Application Management
- Description
- Partitioning
- Mapping
- Allocation
6. Description
- Use a grid application description language
- e.g. Grid-ADL and GEL
- One can take advantage of loop constructs to use compilation mechanisms for vectorization
7. Grid-ADL
[Figure: example task graphs, contrasting the traditional-system and alternative-system descriptions]
8. Partitioning/Clustering
- Application represented as a graph:
  - Nodes: jobs
  - Edges: precedence
- Graph partitioning techniques:
  - Minimize communication
  - Increase throughput or speedup
  - Need good heuristics
- Clustering
9. Graph Partitioning
- Optimally allocate the components of a distributed program over several machines
- Communication between machines is assumed to be the major factor in application performance
- NP-hard for the case of 3 or more terminals
10. Collapse the graph
- Given G = (N, E, M):
  - N is the set of nodes
  - E is the set of edges
  - M is the set of machine nodes
11. Dominant Edge
- Take a node n and its heaviest edge e
- Let e1, e2, ..., er be n's other edges whose opposite end nodes are not in M
- Let f1, f2, ..., fk be n's other edges whose opposite end nodes are in M
- If w(e) >= w(e1) + ... + w(er) + max(w(f1), ..., w(fk))
- then the min-cut does not contain e
- so e can be collapsed
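As a sketch (not the authors' code), the dominant-edge test can be written directly from the rule above; the graph representation and the function name are assumptions:

```python
def dominant_edge(adj, n, machines):
    """Sketch of the dominant-edge test described above.

    adj: dict mapping each node to a {neighbour: edge weight} dict
    machines: the set M of machine nodes
    Returns the far end of n's heaviest edge if that edge is dominant
    (and may therefore be collapsed), otherwise None.
    """
    edges = sorted(adj[n].items(), key=lambda kv: kv[1], reverse=True)
    (far_end, w_e), rest = edges[0], edges[1:]
    to_others = [w for v, w in rest if v not in machines]    # w(e1)..w(er)
    to_machines = [w for v, w in rest if v in machines]      # w(f1)..w(fk)
    # dominance condition: w(e) >= sum of non-machine edges
    #                      + heaviest remaining machine edge
    if w_e >= sum(to_others) + max(to_machines, default=0):
        return far_end   # the min-cut cannot contain e: collapse it
    return None
```

Applied repeatedly (together with the two reductions on the following slides), this shrinks the graph before any expensive cut computation.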
12. Machine Cut
- Let the machine cut Mi be the set of all edges between a machine mi and the non-machine nodes of N
- Let Wi be the sum of the weights of all edges in the machine cut Mi
- The Wi are sorted so that W1 >= W2 >= ...
- Any edge with a weight greater than W2 cannot be part of the min-cut
13. Zeroing
- Assume that node n has edges to each of the m machines in M, with weights w1 <= w2 <= ... <= wm
- Reducing the weight of each of the m edges from n to the machines in M by w1 does not change the node assignment of the min-cut
- It reduces the cost of the minimum cut by (m-1)w1
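The zeroing reduction is mechanical enough to state as a one-step helper (a sketch; the name is an assumption):

```python
def zero_machine_edges(weights):
    """Zeroing sketch: `weights` are node n's edge weights to the m
    machine nodes.  Subtracting the smallest weight w1 from every edge
    leaves the min-cut assignment unchanged and lowers its cost by
    (m - 1) * w1, as stated above."""
    w1 = min(weights)
    reduced = [w - w1 for w in weights]
    cost_drop = (len(weights) - 1) * w1
    return reduced, cost_drop
```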
14. Order of Application
- If the previous 3 techniques are repeatedly applied to a graph until none of them is applicable
- then the resulting reduced graph is independent of the order in which the techniques were applied
15. Output
- List of nodes collapsed into each of the machine nodes
- Weight of the edges connecting the machine nodes
- Source: "Graph Cutting Algorithms for Distributed Applications Partitioning", Karin Hogstedt, Doug Kimelman, V. T. Rajan, Tova Roth, and Mark Wegman, 2001
- homepages.cae.wisc.edu/ece556/fall2002/PROJECT/distributed_applications.ppt
16. Graph partitioning
- Hendrickson and Kolda, 2000: edge cuts
  - are not proportional to the total communication volume
  - try to (approximately) minimize the total volume, but not the total number of messages
  - do not minimize the maximum volume and/or number of messages handled by any single processor
  - do not consider the distance between processors (the number of switches a message passes through, for example)
  - the undirected graph model can only express symmetric data dependencies
17. Graph partitioning
- To avoid message contention and improve the overall throughput of the message traffic, it is preferable to restrict communication to processors that are near each other
- But edge-cut is appropriate for applications whose graphs have locality and few neighbors
18. Kwok and Ahmad, 1999: a multiprocessor scheduling taxonomy
19. List Scheduling
- Make an ordered list of processes by assigning them priorities
- Repeatedly execute the following two steps until a valid schedule is obtained:
  - Select from the list the process with the highest priority for scheduling
  - Select a resource to accommodate this process
- Priorities are determined statically, before the scheduling process begins; the first step chooses the process with the highest priority, the second step selects the best possible resource
- Some known list scheduling strategies:
  - Highest Level First (HLF)
  - Longest Path (LP)
  - Longest Processing Time (LPT)
  - Critical Path Method (CPM)
- List scheduling algorithms only produce good results for coarse-grained applications
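A minimal sketch of the two-step loop above (names and data layout are assumptions, and communication costs are ignored): each task carries a duration and a set of predecessors, priorities are static, and each iteration picks the highest-priority ready task and the earliest-free processor:

```python
def list_schedule(tasks, priority, n_procs):
    """List-scheduling sketch.  tasks: dict task -> (duration, set of
    predecessors); priority: static priority function; returns a list
    of (task, processor, start_time) tuples."""
    free = [0.0] * n_procs          # time at which each processor is free
    finish = {}                     # task -> finish time
    schedule, done = [], set()
    ready = [t for t, (_, preds) in tasks.items() if not preds]
    while ready:
        # step 1: pick the ready task with the highest priority
        ready.sort(key=priority, reverse=True)
        t = ready.pop(0)
        dur, preds = tasks[t]
        # step 2: pick the processor that becomes free earliest
        p = min(range(n_procs), key=lambda i: free[i])
        start = max([free[p]] + [finish[q] for q in preds])
        finish[t] = start + dur
        free[p] = finish[t]
        schedule.append((t, p, start))
        done.add(t)
        # tasks whose predecessors are all finished become ready
        for u, (_, pu) in tasks.items():
            if u not in done and u not in ready and pu <= done:
                ready.append(u)
    return schedule
```

The "best possible resource" choice here is simply earliest availability; real list schedulers substitute richer cost models (e.g. earliest finish time including transfer delays).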
20. Static scheduling of task precedence graphs: DSC (Dominant Sequence Clustering)
- Yang and Gerasoulis, 1994: a two-step method for scheduling with communication (focus on the critical path):
  - schedule on an unbounded number of completely connected processors (clusters of tasks)
  - if the number of clusters is larger than the number of available processors, merge the clusters until the number of real processors is reached, considering the network topology (merging step)
21. Graph partitioning
- Kumar and Biswas, 2002: MiniMax
  - multilevel graph partitioning scheme
  - Grid-aware
  - considers two weighted undirected graphs:
    - a workload graph (to model the problem domain)
    - a system graph (to model the heterogeneous system)
22. Resource Management
(1988)
Source: P. K. V. Mangan, Ph.D. Thesis, 2006
23. Resource Management
- A scheduling algorithm has four components:
  - transfer policy: when a node can take part in a task transfer
  - selection policy: which task must be transferred
  - location policy: which node to transfer to
  - information policy: when to collect system state information
24. Resource Management
- Location policy:
  - Sender-initiated
  - Receiver-initiated
  - Symmetrically-initiated
25. Scheduling mechanisms for grids
- Berman, 1998 (extended by Kayser, 2006):
  - Job scheduler
  - Resource scheduler
  - Application scheduler
  - Meta-scheduler
26. Scheduling mechanisms for grids
- Legion
  - University of Virginia (Grimshaw, 1993)
  - Presented at Supercomputing 1997
  - Currently the Avaki commercial product
27. Legion
- an object-oriented infrastructure for grid environments, layered on top of existing software services
- uses the existing operating systems, resource management tools, and security mechanisms at host sites to implement higher-level, system-wide services
- its design is based on a set of core objects
28. Legion
- resource management is a negotiation between resources and the active objects that represent the distributed application
- three steps to allocate resources for a task:
  - Decision: considers the task's characteristics and requirements, the resources' properties and policies, and the user's preferences
  - Enactment: the class object receives an activation request; if the placement is acceptable, it starts the task
  - Monitoring: ensures that the task is operating correctly
29. Globus
- A toolkit with a set of components that implement basic services:
  - security
  - resource location
  - resource management
  - data management
  - resource reservation
  - communication
- From version 1.0 in 1998 to the 2.0 release in 2002 and the latest 3.0, the emphasis is on providing a set of components that can be used either independently or together to develop applications
- The Globus Toolkit version 2 (GT2) design closely follows the architecture proposed by Foster et al.
- The Globus Toolkit version 3 (GT3) design is based on grid services, which are quite similar to web services; GT3 implements the Open Grid Services Infrastructure (OGSI)
- The current version, GT4, is also based on grid services, but with some changes in the standard
30. Globus scheduling
- GRAM: Globus Resource Allocation Manager
- Each GRAM is responsible for a set of resources operating under the same site-specific allocation policy, often implemented by a local resource manager
- GRAM provides an abstraction for remote process queuing and execution, with several powerful features such as strong security and file transfer
- It does not provide scheduling or resource brokering capabilities, but it can be used to start programs on remote resources despite local heterogeneity, thanks to its standard API and protocol
- The Resource Specification Language (RSL) is used to communicate requirements
- To take advantage of GRAM, a user still needs a system that can remember what jobs have been submitted, where they are, and what they are doing
- To track large numbers of jobs, the user needs queuing, prioritization, logging, and accounting; these services cannot be found in GRAM alone, but are provided by systems such as Condor-G
31. MyGrid and OurGrid
- Mainly for bag-of-tasks (BoT) applications
- Use the dynamic Work Queue with Replication (WQR) algorithm:
  - hosts that have finished their tasks are assigned to execute replicas of tasks that are still running
  - tasks are replicated until a predefined maximum number of replicas is reached (in MyGrid, the default is one)
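The WQR dispatch decision for one idle host can be sketched as follows (a simplification with assumed names, not MyGrid's actual code; the caller is expected to drop a task from `running` once any of its replicas completes):

```python
def wqr_assign(queue, running, max_replicas=1):
    """Work Queue with Replication sketch.  queue: list of waiting
    tasks; running: dict task -> number of copies currently running;
    max_replicas: extra copies allowed beyond the original (default 1,
    as in MyGrid).  Returns the task the idle host should run next."""
    if queue:
        # fresh work available: hand out the next waiting task
        task = queue.pop(0)
        running[task] = 1
        return task
    # queue drained: replicate a still-running task below its limit
    for task, copies in running.items():
        if copies <= max_replicas:
            running[task] = copies + 1
            return task
    return None   # nothing left to run or replicate
```

Replication wastes some cycles but guards against slow or failed hosts holding the whole bag of tasks hostage, which is the trade-off WQR makes.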
32. OurGrid
- An extension of MyGrid
- A resource-sharing system based on peer-to-peer technology
- Resources are shared according to a "network of favors" model, in which each peer prioritizes those who have credit in their past history of interactions
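A minimal sketch of that prioritization step (all names are assumptions): each peer keeps a local credit balance per partner and grants idle resources to the requester with the most accumulated favors:

```python
def record_favor(credit, donor, amount):
    """After consuming a favor from `donor`, raise its local credit."""
    credit[donor] = credit.get(donor, 0.0) + amount

def choose_requester(requesters, credit):
    """Grant resources to the requesting peer with the highest local
    credit; peers with no history default to zero credit."""
    return max(requesters, key=lambda peer: credit.get(peer, 0.0))
```

Because each peer accounts only for its own interactions, no global reputation service is needed, which is what makes the model suitable for peer-to-peer grids.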
33. GrADS
- An application scheduler
- The user invokes the Grid Routine component to execute an application
- The Grid Routine invokes the Resource Selector component
- The Resource Selector accesses the Globus MetaDirectory Service (MDS) to get a list of machines that are alive, then contacts the Network Weather Service (NWS) to get system information for those machines
- The Grid Routine then invokes a component called the Performance Modeler with the problem parameters, the machines, and the machine information
- The Performance Modeler builds the final list of machines and sends it to the Contract Developer for approval
- The Grid Routine then passes the problem, its parameters, and the final list of machines to the Application Launcher
- The Application Launcher spawns the job using the Globus management mechanism (GRAM) and also spawns the Contract Monitor
- The Contract Monitor monitors the application, displays the actual and predicted times, and can report contract violations to a re-scheduler
- Although the execution model is efficient from the application's perspective, it does not take into account the existence of other applications in the system
34. GrADS
- Vadhiyar and Dongarra, 2002, proposed a metascheduling architecture in the context of the GrADS project
- The metascheduler receives candidate schedules from the different application-level schedulers and implements scheduling policies for balancing the interests of the different applications
35. EasyGrid
- Mainly concerned with MPI applications
- Allows intercluster execution of MPI processes
36. Nimrod
- Uses a simple declarative parametric modeling language to express parametric experiments
- Provides machinery that automates the tasks of
  - formulating,
  - running,
  - monitoring,
  - and collating the results from the multiple individual experiments
- Incorporates distributed scheduling that can manage the scheduling of individual experiments to idle computers in a local area network
- Has been applied to a range of application areas, e.g. bioinformatics, operations research, network simulation, electronic CAD, ecological modelling, and business process simulation
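The formulation step amounts to expanding a cross product of parameter values into individual experiments; a sketch of that expansion (not Nimrod's actual language or API):

```python
from itertools import product

def expand_experiments(parameters):
    """Parametric-sweep sketch: given a dict of parameter -> list of
    values, yield one experiment (a dict of concrete bindings) per
    point in the cross product of all parameter ranges."""
    names = sorted(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))
```

Each yielded binding would then become one independent job for the distributed scheduler, which is what makes parametric experiments such a natural fit for idle-cycle harvesting.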
37. Nimrod/G
38. AppLeS
- UCSD (Berman and Casanova)
- AppLeS Parameter Sweep Template (APST)
- Uses scheduling based on min-min, max-min, and sufferage, but with heuristics to estimate the performance of resources and tasks
- Performance-information-dependent algorithms (PIDA)
- Main goal: to minimize file transfers
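A sketch of the min-min heuristic mentioned above (the estimate table `eta` stands in for AppLeS's performance predictions; all names are assumptions):

```python
def min_min(tasks, hosts, eta):
    """Min-min sketch.  eta[t][h] is the estimated time to compute
    task t on host h.  Repeatedly pick the (task, host) pair with the
    smallest completion time and commit it."""
    ready = dict.fromkeys(hosts, 0.0)   # host -> time it becomes free
    unassigned = set(tasks)
    assignment = {}
    while unassigned:
        # smallest completion time over all remaining (task, host) pairs
        t, h, ct = min(
            ((t, h, ready[h] + eta[t][h]) for t in unassigned for h in hosts),
            key=lambda x: x[2],
        )
        assignment[t] = h
        ready[h] = ct
        unassigned.remove(t)
    return assignment
```

Max-min differs only in committing, at each round, the task whose best completion time is largest; sufferage instead prioritizes the task that would suffer most from losing its best host.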
39. GRAnD (Kayser et al., CCPE, 2007)
- Distributed submission control
- Data locality:
  - automatic staging of data
  - optimization of file transfers
40. Distributed submission
Results of simulation with Monarc (http://monarc.web.cern.ch/MONARC/) (Kayser, 2006)
41. GRAnD
- Experiments with Globus
  - Discussion list discuss_at_globus.org (05/02/2004):
    - Submission takes 2 s per task
    - Placing 200 tasks in the queue takes 6 min
    - Maximum number of tasks: a few hundred
  - Experiments at CERN (D. Foster et al., 2003):
    - 16 s to submit a task
    - Saturation in the server: 3.8 tasks/minute
42. GRAnD
- Grid Robust Application Deployment
43. GRAnD
44. GRAnD data management
45. GRAnD data management
46. Comparison (Kayser, 2006)
47. Comparison (Kayser, 2006)
48. Condor performance
49. Condor performance
50. Condor x AppMan
51. Condor performance
- experiments on a cluster of 8 nodes (Sanches et al., 2005)
52. ReGS Condor performance
53. ReGS Condor performance
54. Toward Grid Operating Systems
55. Vega GOS (the CNGrid OS)
- GOS overview:
  - A user-level middleware running on a client machine
  - GOS has 2 components: GOS and gnetd
    - GOS is a daemon running on the client machine
    - gnetd is a daemon on the grid server
56. GOS
- Grid process and Grid thread:
  - A Grid process is a unit for managing the whole resource of the Grid
  - A Grid thread is a unit for executing computation on the Grid
- GOS API:
  - GOS API for application developers:
    - grid(): constructs a Grid process on the client machine
    - gridcon(): connects the grid process to the Grid system
    - gridclose(): closes a connected grid
  - gnetd API for service developers on Grid servers:
    - grid_register(): registers a service with the Grid
    - grid_unregister(): unregisters a service
57. Grid
- Not yet mentioned:
  - Simulation: SimGrid and GridSim
  - Monitoring: RTM, MonALISA, ...
  - Portals: GridIce, Genius, ...