Title: Grid Computing
1. Grid Computing
02/05/2008
- Grid Systems and scheduling
2. Grid systems
- Many!
- Classification (depends on the author):
- Computational grid:
  - distributed supercomputing (parallel application execution on multiple machines)
  - high throughput (stream of jobs)
- Data grid: provides the means to solve large-scale data management problems
- Service grid: systems that provide services that are not provided by any single local machine
  - on demand: aggregates resources to enable new services
  - collaborative: connects users and applications via a virtual workspace
  - multimedia: infrastructure for real-time multimedia applications
3. Taxonomy of Applications
- Distributed supercomputing: consumes CPU cycles and memory
- High-throughput computing: harvests unused processor cycles
- On-demand computing: meets short-term requirements for resources that cannot be cost-effectively or conveniently located locally
- Data-intensive computing
- Collaborative computing: enables and enhances human-to-human interaction (e.g. the CAVE5D system supports remote, collaborative exploration of large geophysical data sets and the models that generated them)
4. Alternative classification
- independent tasks
- loosely-coupled tasks
- tightly-coupled tasks
5. Application Management
- Description
- Partitioning
- Mapping
- Allocation
6. Description
- Use a grid application description language
- e.g. Grid-ADL and GEL
- One can take advantage of loop constructs to use compilation mechanisms for vectorization
7. Grid-ADL
[Figure: example task graphs, contrasting the traditional-system and alternative-system descriptions]
8. Partitioning/Clustering
- Application represented as a graph:
  - Nodes: jobs
  - Edges: precedence
- Graph partitioning techniques:
  - Minimize communication
  - Increase throughput or speedup
  - Need good heuristics
- Clustering
9. Graph Partitioning
- Optimally allocate the components of a distributed program over several machines
- Communication between machines is assumed to be the major factor in application performance
- NP-hard for the case of 3 or more terminals
10. Collapse the graph
- Given G = (N, E, M):
  - N is the set of nodes
  - E is the set of edges
  - M is the set of machine nodes
11. Dominant Edge
- Take a node n and its heaviest edge e
- Let e1, e2, ..., er be n's other edges whose opposite end nodes are not in M
- Let f1, f2, ..., fk be n's other edges whose opposite end nodes are in M
- If w(e) >= w(e1) + ... + w(er) + max(w(f1), ..., w(fk))
- then the min-cut does not contain e
- so e can be collapsed
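As a sketch (not the authors' code), the dominant-edge test can be written directly from the rule above; the graph representation and the function name are assumptions:

```python
def dominant_edge(adj, n, machines):
    """Sketch of the dominant-edge test described above.

    adj: dict mapping each node to a {neighbour: edge weight} dict
    machines: the set M of machine nodes
    Returns the far end of n's heaviest edge if that edge is dominant
    (and may therefore be collapsed), otherwise None.
    """
    edges = sorted(adj[n].items(), key=lambda kv: kv[1], reverse=True)
    (far_end, w_e), rest = edges[0], edges[1:]
    to_others = [w for v, w in rest if v not in machines]    # w(e1)..w(er)
    to_machines = [w for v, w in rest if v in machines]      # w(f1)..w(fk)
    # dominance condition: w(e) >= sum of non-machine edges
    #                      + heaviest remaining machine edge
    if w_e >= sum(to_others) + max(to_machines, default=0):
        return far_end   # the min-cut cannot contain e: collapse it
    return None
```

Applied repeatedly (together with the two reductions on the following slides), this shrinks the graph before any expensive cut computation.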
12. Machine Cut
- Let the machine cut Mi be the set of all edges between a machine mi and the non-machine nodes of N
- Let Wi be the sum of the weights of all edges in the machine cut Mi
- The Wi are sorted so that W1 >= W2 >= ...
- Any edge with a weight greater than W2 cannot be part of the min-cut
13. Zeroing
- Assume that node n has edges to each of the m machines in M, with weights w1 <= w2 <= ... <= wm
- Reducing the weight of each of the m edges from n to the machines in M by w1 does not change the node assignment of the min-cut
- It reduces the cost of the minimum cut by (m-1)w1
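The zeroing reduction is mechanical enough to state as a one-step helper (a sketch; the name is an assumption):

```python
def zero_machine_edges(weights):
    """Zeroing sketch: `weights` are node n's edge weights to the m
    machine nodes.  Subtracting the smallest weight w1 from every edge
    leaves the min-cut assignment unchanged and lowers its cost by
    (m - 1) * w1, as stated above."""
    w1 = min(weights)
    reduced = [w - w1 for w in weights]
    cost_drop = (len(weights) - 1) * w1
    return reduced, cost_drop
```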
14. Order of Application
- If the previous 3 techniques are repeatedly applied to a graph until none of them is applicable
- then the resulting reduced graph is independent of the order in which the techniques were applied
15. Output
- List of nodes collapsed into each of the machine nodes
- Weight of the edges connecting the machine nodes
- Source: "Graph Cutting Algorithms for Distributed Applications Partitioning", Karin Hogstedt, Doug Kimelman, V. T. Rajan, Tova Roth, and Mark Wegman, 2001
- homepages.cae.wisc.edu/ece556/fall2002/PROJECT/distributed_applications.ppt
16. Graph partitioning
- Hendrickson and Kolda, 2000: edge cuts
  - are not proportional to the total communication volume
  - try to (approximately) minimize the total volume, but not the total number of messages
  - do not minimize the maximum volume and/or number of messages handled by any single processor
  - do not consider the distance between processors (the number of switches a message passes through, for example)
  - the undirected graph model can only express symmetric data dependencies
17. Graph partitioning
- To avoid message contention and improve the overall throughput of the message traffic, it is preferable to restrict communication to processors that are near each other
- But edge-cut is appropriate for applications whose graphs have locality and few neighbors
18. Kwok and Ahmad, 1999: a multiprocessor scheduling taxonomy
19. List Scheduling
- Make an ordered list of processes by assigning them priorities
- Repeatedly execute the following two steps until a valid schedule is obtained:
  - Select from the list the process with the highest priority for scheduling
  - Select a resource to accommodate this process
- Priorities are determined statically, before the scheduling process begins; the first step chooses the process with the highest priority, the second step selects the best possible resource
- Some known list scheduling strategies:
  - Highest Level First (HLF)
  - Longest Path (LP)
  - Longest Processing Time (LPT)
  - Critical Path Method (CPM)
- List scheduling algorithms only produce good results for coarse-grained applications
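A minimal sketch of the two-step loop above (names and data layout are assumptions, and communication costs are ignored): each task carries a duration and a set of predecessors, priorities are static, and each iteration picks the highest-priority ready task and the earliest-free processor:

```python
def list_schedule(tasks, priority, n_procs):
    """List-scheduling sketch.  tasks: dict task -> (duration, set of
    predecessors); priority: static priority function; returns a list
    of (task, processor, start_time) tuples."""
    free = [0.0] * n_procs          # time at which each processor is free
    finish = {}                     # task -> finish time
    schedule, done = [], set()
    ready = [t for t, (_, preds) in tasks.items() if not preds]
    while ready:
        # step 1: pick the ready task with the highest priority
        ready.sort(key=priority, reverse=True)
        t = ready.pop(0)
        dur, preds = tasks[t]
        # step 2: pick the processor that becomes free earliest
        p = min(range(n_procs), key=lambda i: free[i])
        start = max([free[p]] + [finish[q] for q in preds])
        finish[t] = start + dur
        free[p] = finish[t]
        schedule.append((t, p, start))
        done.add(t)
        # tasks whose predecessors are all finished become ready
        for u, (_, pu) in tasks.items():
            if u not in done and u not in ready and pu <= done:
                ready.append(u)
    return schedule
```

The "best possible resource" choice here is simply earliest availability; real list schedulers substitute richer cost models (e.g. earliest finish time including transfer delays).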
20. Static scheduling of task precedence graphs: DSC (Dominant Sequence Clustering)
- Yang and Gerasoulis, 1994: a two-step method for scheduling with communication (focus on the critical path):
  - schedule on an unbounded number of completely connected processors (clusters of tasks)
  - if the number of clusters is larger than the number of available processors, merge the clusters until the number of real processors is reached, considering the network topology (merging step)
21. Graph partitioning
- Kumar and Biswas, 2002: MiniMax
  - multilevel graph partitioning scheme
  - Grid-aware
  - considers two weighted undirected graphs:
    - a workload graph (to model the problem domain)
    - a system graph (to model the heterogeneous system)
22. Resource Management
(1988)
Source: P. K. V. Mangan, Ph.D. Thesis, 2006
23. Resource Management
- A scheduling algorithm has four components:
  - transfer policy: when a node can take part in a task transfer
  - selection policy: which task must be transferred
  - location policy: which node to transfer to
  - information policy: when to collect system state information
24. Resource Management
- Location policy:
  - Sender-initiated
  - Receiver-initiated
  - Symmetrically-initiated
25. Scheduling mechanisms for grids
- Berman, 1998 (extended by Kayser, 2006):
  - Job scheduler
  - Resource scheduler
  - Application scheduler
  - Meta-scheduler
26. Scheduling mechanisms for grids
- Legion
  - University of Virginia (Grimshaw, 1993)
  - Presented at Supercomputing 1997
  - Currently the Avaki commercial product
27. Legion
- an object-oriented infrastructure for grid environments, layered on top of existing software services
- uses the existing operating systems, resource management tools, and security mechanisms at host sites to implement higher-level, system-wide services
- its design is based on a set of core objects
28. Legion
- resource management is a negotiation between resources and the active objects that represent the distributed application
- three steps to allocate resources for a task:
  - Decision: considers the task's characteristics and requirements, the resources' properties and policies, and the user's preferences
  - Enactment: the class object receives an activation request; if the placement is acceptable, it starts the task
  - Monitoring: ensures that the task is operating correctly
29. Globus
- A toolkit with a set of components that implement basic services:
  - security
  - resource location
  - resource management
  - data management
  - resource reservation
  - communication
- From version 1.0 in 1998 to the 2.0 release in 2002 and the latest 3.0, the emphasis is on providing a set of components that can be used either independently or together to develop applications
- The Globus Toolkit version 2 (GT2) design closely follows the architecture proposed by Foster et al.
- The Globus Toolkit version 3 (GT3) design is based on grid services, which are quite similar to web services; GT3 implements the Open Grid Services Infrastructure (OGSI)
- The current version, GT4, is also based on grid services, but with some changes in the standard
30. Globus scheduling
- GRAM: Globus Resource Allocation Manager
- Each GRAM is responsible for a set of resources operating under the same site-specific allocation policy, often implemented by a local resource manager
- GRAM provides an abstraction for remote process queuing and execution, with several powerful features such as strong security and file transfer
- It does not provide scheduling or resource brokering capabilities, but it can be used to start programs on remote resources despite local heterogeneity, thanks to its standard API and protocol
- The Resource Specification Language (RSL) is used to communicate requirements
- To take advantage of GRAM, a user still needs a system that can remember what jobs have been submitted, where they are, and what they are doing
- To track large numbers of jobs, the user needs queuing, prioritization, logging, and accounting; these services cannot be found in GRAM alone, but are provided by systems such as Condor-G
31. MyGrid and OurGrid
- Mainly for bag-of-tasks (BoT) applications
- Use the dynamic Work Queue with Replication (WQR) algorithm:
  - hosts that have finished their tasks are assigned to execute replicas of tasks that are still running
  - tasks are replicated until a predefined maximum number of replicas is reached (in MyGrid, the default is one)
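The WQR dispatch decision for one idle host can be sketched as follows (a simplification with assumed names, not MyGrid's actual code; the caller is expected to drop a task from `running` once any of its replicas completes):

```python
def wqr_assign(queue, running, max_replicas=1):
    """Work Queue with Replication sketch.  queue: list of waiting
    tasks; running: dict task -> number of copies currently running;
    max_replicas: extra copies allowed beyond the original (default 1,
    as in MyGrid).  Returns the task the idle host should run next."""
    if queue:
        # fresh work available: hand out the next waiting task
        task = queue.pop(0)
        running[task] = 1
        return task
    # queue drained: replicate a still-running task below its limit
    for task, copies in running.items():
        if copies <= max_replicas:
            running[task] = copies + 1
            return task
    return None   # nothing left to run or replicate
```

Replication wastes some cycles but guards against slow or failed hosts holding the whole bag of tasks hostage, which is the trade-off WQR makes.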
32. OurGrid
- An extension of MyGrid
- A resource-sharing system based on peer-to-peer technology
- Resources are shared according to a "network of favors" model, in which each peer prioritizes those who have credit in their past history of interactions
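A minimal sketch of that prioritization step (all names are assumptions): each peer keeps a local credit balance per partner and grants idle resources to the requester with the most accumulated favors:

```python
def record_favor(credit, donor, amount):
    """After consuming a favor from `donor`, raise its local credit."""
    credit[donor] = credit.get(donor, 0.0) + amount

def choose_requester(requesters, credit):
    """Grant resources to the requesting peer with the highest local
    credit; peers with no history default to zero credit."""
    return max(requesters, key=lambda peer: credit.get(peer, 0.0))
```

Because each peer accounts only for its own interactions, no global reputation service is needed, which is what makes the model suitable for peer-to-peer grids.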
33. GrADS
- An application scheduler
- The user invokes the Grid Routine component to execute an application
- The Grid Routine invokes the Resource Selector component
- The Resource Selector accesses the Globus MetaDirectory Service (MDS) to get a list of machines that are alive, then contacts the Network Weather Service (NWS) to get system information for those machines
- The Grid Routine then invokes a component called the Performance Modeler with the problem parameters, the machines, and the machine information
- The Performance Modeler builds the final list of machines and sends it to the Contract Developer for approval
- The Grid Routine then passes the problem, its parameters, and the final list of machines to the Application Launcher
- The Application Launcher spawns the job using the Globus management mechanism (GRAM) and also spawns the Contract Monitor
- The Contract Monitor monitors the application, displays the actual and predicted times, and can report contract violations to a re-scheduler
- Although the execution model is efficient from the application's perspective, it does not take into account the existence of other applications in the system
34. GrADS
- Vadhiyar and Dongarra, 2002, proposed a metascheduling architecture in the context of the GrADS project
- The metascheduler receives candidate schedules from the different application-level schedulers and implements scheduling policies for balancing the interests of the different applications
35. EasyGrid
- Mainly concerned with MPI applications
- Allows intercluster execution of MPI processes
36. Nimrod
- Uses a simple declarative parametric modeling language to express parametric experiments
- Provides machinery that automates the tasks of
  - formulating,
  - running,
  - monitoring,
  - and collating the results from the multiple individual experiments
- Incorporates distributed scheduling that can manage the scheduling of individual experiments to idle computers in a local area network
- Has been applied to a range of application areas, e.g. bioinformatics, operations research, network simulation, electronic CAD, ecological modelling, and business process simulation
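The formulation step amounts to expanding a cross product of parameter values into individual experiments; a sketch of that expansion (not Nimrod's actual language or API):

```python
from itertools import product

def expand_experiments(parameters):
    """Parametric-sweep sketch: given a dict of parameter -> list of
    values, yield one experiment (a dict of concrete bindings) per
    point in the cross product of all parameter ranges."""
    names = sorted(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))
```

Each yielded binding would then become one independent job for the distributed scheduler, which is what makes parametric experiments such a natural fit for idle-cycle harvesting.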
37. Nimrod/G
38. AppLeS
- UCSD (Berman and Casanova)
- AppLeS Parameter Sweep Template (APST)
- Uses scheduling based on min-min, max-min, and sufferage, but with heuristics to estimate the performance of resources and tasks
- Performance-information-dependent algorithms (PIDA)
- Main goal: to minimize file transfers
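A sketch of the min-min heuristic mentioned above (the estimate table `eta` stands in for AppLeS's performance predictions; all names are assumptions):

```python
def min_min(tasks, hosts, eta):
    """Min-min sketch.  eta[t][h] is the estimated time to compute
    task t on host h.  Repeatedly pick the (task, host) pair with the
    smallest completion time and commit it."""
    ready = dict.fromkeys(hosts, 0.0)   # host -> time it becomes free
    unassigned = set(tasks)
    assignment = {}
    while unassigned:
        # smallest completion time over all remaining (task, host) pairs
        t, h, ct = min(
            ((t, h, ready[h] + eta[t][h]) for t in unassigned for h in hosts),
            key=lambda x: x[2],
        )
        assignment[t] = h
        ready[h] = ct
        unassigned.remove(t)
    return assignment
```

Max-min differs only in committing, at each round, the task whose best completion time is largest; sufferage instead prioritizes the task that would suffer most from losing its best host.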
39. GRAnD (Kayser et al., CCPE, 2007)
- Distributed submission control
- Data locality:
  - automatic staging of data
  - optimization of file transfers
40. Distributed submission
Results of simulation with Monarc (http://monarc.web.cern.ch/MONARC/) (Kayser, 2006)
41. GRAnD
- Experiments with Globus
  - Discussion list discuss_at_globus.org (05/02/2004):
    - Submission takes 2 s per task
    - Placing 200 tasks in the queue takes 6 min
    - Maximum number of tasks: a few hundred
  - Experiments at CERN (D. Foster et al., 2003):
    - 16 s to submit a task
    - Saturation in the server: 3.8 tasks/minute
42. GRAnD
- Grid Robust Application Deployment
43. GRAnD
44. GRAnD data management
45. GRAnD data management
46. Comparison (Kayser, 2006)
47. Comparison (Kayser, 2006)
48. Condor performance
49. Condor performance
50. Condor x AppMan
51. Condor performance
- experiments on a cluster of 8 nodes (Sanches et al., 2005)
52. ReGS Condor performance
53. ReGS Condor performance
54. Toward Grid Operating Systems
55. Vega GOS (the CNGrid OS)
- GOS overview:
  - A user-level middleware running on a client machine
  - GOS has 2 components: GOS and gnetd
    - GOS is a daemon running on the client machine
    - gnetd is a daemon on the grid server
56. GOS
- Grid process and Grid thread:
  - A Grid process is a unit for managing the whole resource of the Grid
  - A Grid thread is a unit for executing computation on the Grid
- GOS API:
  - GOS API for application developers:
    - grid(): constructs a Grid process on the client machine
    - gridcon(): connects the grid process to the Grid system
    - gridclose(): closes a connected grid
  - gnetd API for service developers on Grid servers:
    - grid_register(): registers a service with the Grid
    - grid_unregister(): unregisters a service
57. Grid
- Not yet mentioned:
  - Simulation: SimGrid and GridSim
  - Monitoring: RTM, MonALISA, ...
  - Portals: GridIce, Genius, ...