Title: Resource Management of Grid Computing
1Resource Management of Grid Computing
- Juan Chen
- Ying Tao
2Overview
- Grid Computing
- Resource Management
- Local Resource Management
- Global Resource Management
- Scheduling
- Case study: Nimrod/G
3Grid Computing (1)
- A Grid is a very large-scale, generalized distributed network computing system that can scale to Internet-size environments, with machines distributed across multiple organizations and administrative domains.
- Four main aspects characterize a Grid:
  - Multiple administrative domains and autonomy
  - Heterogeneity
  - Scalability
  - Dynamicity or adaptability
4Grid Computing (2)
- Grid computing is concerned with coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.
- The key concept is the ability to negotiate resource-sharing arrangements among a set of participating parties and then to use the resulting resource pool for some purpose.
- The resource management system is therefore the central component of a Grid computing system.
5Designing a Grid architecture is challenging due to
- Supporting adaptability, extensibility, and scalability
- Allowing systems with different administrative policies to interoperate while preserving site autonomy
- Co-allocating resources
- Supporting quality of service
- Economy of computation
6A layered Grid architecture and components
7Globus Architecture
- Labels recovered from the architecture figure:
  - Applications
  - Basic libraries and supporting software: mpich, PSEs
  - Layers: collective, resource, connectivity
  - Globus components: GIISs, GRAM, GRIS, GSS, GIS, GAA
  - Low-level (Fabric) interface: local schedulers such as PBS, Condor, SQMS
8Traditional Resource Management
- Traditional resource management systems are designed and operated under the assumption that
  - They have complete control over a resource
  - They can implement the mechanisms and policies needed for effective use of that resource in isolation
- This is not the case for Grid resource management:
  - Separate administrative domains
  - Resource heterogeneity
  - Lack of control over differing policies
9What is Grid Resource Management?
- Identifying application requirements and resource specifications
- Matching resources to applications
- Allocating, scheduling, and monitoring those resources and applications over time so that applications run as effectively as possible
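The matching step can be illustrated with a minimal Python sketch. The `Resource` and `Requirement` records and their fields (`cpus`, `memory_gb`, `os`) are hypothetical, chosen only for illustration; they are not taken from any Grid middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one resource offered to the Grid."""
    name: str
    cpus: int
    memory_gb: int
    os: str


@dataclass(frozen=True)
class Requirement:
    """Hypothetical description of what an application needs."""
    min_cpus: int
    min_memory_gb: int
    os: str


def match(req, resources):
    """Return the resources that satisfy every stated requirement."""
    return [
        r for r in resources
        if r.cpus >= req.min_cpus
        and r.memory_gb >= req.min_memory_gb
        and r.os == req.os
    ]


pool = [
    Resource("siteA-node1", cpus=8, memory_gb=16, os="linux"),
    Resource("siteB-node7", cpus=2, memory_gb=4, os="linux"),
    Resource("siteC-node3", cpus=16, memory_gb=64, os="solaris"),
]
candidates = match(Requirement(min_cpus=4, min_memory_gb=8, os="linux"), pool)
```

A real system would also have to allocate, schedule, and monitor the selected resources over time; this sketch covers only the selection of candidates.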
10Resource Management System
11Abstract structure of an RMS
12A Grid Resource Management System consists of
- Local resource management system (Resource Layer)
  - Basic resource management unit
  - Provides a standard interface for using remote resources
  - e.g. GRAM
- Global resource management system (Collective Layer)
  - Coordinates all local resource management systems within multiple or distributed Virtual Organizations (VOs)
  - Provides high-level functionality for using all resources efficiently:
    - Job submission
    - Resource discovery and selection
    - Scheduling
    - Co-allocation
    - Job monitoring, etc.
  - e.g. meta-scheduler, resource broker
13Local Resource Management
- The Globus Resource Allocation Manager (GRAM) is responsible for
  1. Processing RSL (Resource Specification Language) specifications representing resource requests, either by denying the request or by creating one or more processes (a "job") that satisfy it
  2. Enabling remote monitoring and management of jobs created in response to a resource request
  3. Periodically updating the MDS information service with information about the current availability and capabilities of the resources that it manages
14Major components of the GRAM implementation
15Resource co-allocator
- A metacomputing application often requires that several resources be allocated simultaneously. In these cases, a resource broker produces a multirequest, and co-allocation is required.
- The role of a co-allocator is to split a request into its constituent components, submit each component to the appropriate resource manager, and then provide a means for manipulating the resulting set of resources as a whole.
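The co-allocator's role (split, submit, manage as a whole) can be sketched in Python as follows. Everything here is an assumption made for illustration: `FakeResourceManager` is a stand-in for a local resource manager, and the request format (a list of dicts with a `site` key) is hypothetical rather than Globus RSL.

```python
class FakeResourceManager:
    """Hypothetical stand-in for a local resource manager at one site."""

    def __init__(self, name):
        self.name = name
        self.jobs = {}      # job id -> state
        self._counter = 0

    def submit(self, component):
        self._counter += 1
        job_id = f"{self.name}-{self._counter}"
        self.jobs[job_id] = "running"
        return job_id

    def cancel(self, job_id):
        self.jobs[job_id] = "cancelled"


class CoAllocation:
    """Handle that lets the caller treat all allocated jobs as one unit."""

    def __init__(self):
        self.handles = []   # list of (manager, job id) pairs

    def status(self):
        return {job_id: mgr.jobs[job_id] for mgr, job_id in self.handles}

    def cancel_all(self):
        for mgr, job_id in self.handles:
            mgr.cancel(job_id)


def co_allocate(multirequest, managers):
    """Split a multirequest and submit each component to its site's manager."""
    allocation = CoAllocation()
    for component in multirequest:
        manager = managers[component["site"]]
        allocation.handles.append((manager, manager.submit(component)))
    return allocation


managers = {s: FakeResourceManager(s) for s in ("siteA", "siteB")}
multirequest = [{"site": "siteA", "count": 4}, {"site": "siteB", "count": 2}]
allocation = co_allocate(multirequest, managers)
before = allocation.status()
allocation.cancel_all()     # one call acts on every component
after = allocation.status()
```

The `CoAllocation` handle is the part that corresponds to "manipulating the resulting set of resources as a whole": the caller never addresses the individual site managers after submission.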
16Types of co-allocation
- A range of different co-allocation services can be constructed:
  - Require all resources to be available before the job is allowed to proceed, and fail globally if failure occurs at any resource
  - Allocate at least N out of M requested resources, then return
  - Return immediately, but gradually return more resources as they become available
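The three co-allocation styles can be contrasted with a small Python sketch. It assumes, purely for illustration, that allocation outcomes arrive as a sequence of `(resource_name, succeeded)` pairs; the function names are hypothetical and do not correspond to any Globus API.

```python
def all_or_nothing(events):
    """Proceed only if every resource is acquired; fail globally otherwise."""
    acquired = []
    for name, ok in events:
        if not ok:
            raise RuntimeError(f"co-allocation failed at {name}")
        acquired.append(name)
    return acquired


def at_least_n(events, n):
    """Return as soon as n of the requested resources have been acquired."""
    acquired = []
    for name, ok in events:
        if ok:
            acquired.append(name)
        if len(acquired) >= n:
            return acquired
    raise RuntimeError(f"only {len(acquired)} of the required {n} acquired")


def incremental(events):
    """Return immediately (as a generator) and hand over resources as they arrive."""
    for name, ok in events:
        if ok:
            yield name


events = [("a", True), ("b", False), ("c", True)]
two_of_three = at_least_n(events, 2)
streamed = list(incremental(events))
```

With these sample events, `all_or_nothing(events)` raises because resource `b` fails, while the other two styles still deliver `a` and `c`.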
17Scheduling
- Scheduling is the matching of application requirements and available resources.
- System-level schedulers focus on throughput and generally do not consider application requirements in scheduling decisions.
- Application-specific schedulers have been very successful for individual applications, but are not easily applied to new applications.
18Grid Application Development Software Project (GrADS) Scheduling
- Launch-time scheduling is the pre-execution determination of an initial matching of application requirements and available resources.
- Rescheduling involves making modifications to that initial matching in response to dynamic system or application changes.
- Meta-scheduling involves the coordination of schedules for multiple applications running on the same Grid at once.
19Grid Application Development Software Architecture
20Launch-time scheduling
- The launch-time scheduler is called just before application launch to determine how the current application execution should be mapped to available Grid resources.
- The resulting schedule specifies the list of target machines, the mapping of virtual application processes to those machines, and the mapping of application data to processes.
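The three parts of a schedule named above can be written down as a small Python data structure. This is only a sketch: the `Schedule` class is hypothetical, and the round-robin placement is a placeholder policy chosen for brevity, not the strategy GrADS uses.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    machines: list              # target machines
    process_to_machine: dict    # virtual process id -> machine
    data_to_process: dict       # data block id -> virtual process id


def round_robin_schedule(machines, n_processes, data_blocks):
    """Build a schedule with a placeholder round-robin placement."""
    process_to_machine = {
        p: machines[p % len(machines)] for p in range(n_processes)
    }
    data_to_process = {
        block: i % n_processes for i, block in enumerate(data_blocks)
    }
    return Schedule(list(machines), process_to_machine, data_to_process)


schedule = round_robin_schedule(["m1", "m2"], 3, ["A", "B", "C", "D"])
```

A real launch-time scheduler would choose the machine list and both mappings from application requirements and current resource information rather than by position.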
21GrADS launch-time scheduling architecture
22The drawbacks of launch-time scheduling
- When two applications are submitted to GrADS at the same time, scheduling decisions are made for each application while ignoring the presence of the other.
- If the launch-time scheduler determines that there are not enough resources for the application, the application cannot make further progress.
- A long-running job in the system can severely impact the performance of numerous new jobs entering the system.
- The root cause of these and other problems is the absence of a metascheduler.
23Rescheduling
- Rescheduling can include changing the machines on which the application is executing, or changing the mapping of data and/or processes to those machines, in response to changes in load or application requirements.
- Rescheduling can be implemented in two ways:
  - Application migration
  - Process swapping
24GrADS rescheduling architecture
25Metascheduling
- The goal of metascheduling is to investigate scheduling policies that take into account both the needs of the application and the overall performance of the system.
- The metascheduler possesses global knowledge of all applications in the system and tries to balance the needs of the applications.
- The metascheduler is implemented by the addition of four components: database manager, permission service, contract negotiator, and rescheduler.
26Metascheduler and interactions
27Introduction of Nimrod
- A large-scale parameter study of a simulation is well suited to high-throughput computing. It involves the execution of a large number of tasks (task farms) over a range of parameters.
- The Nimrod system is designed to address the complexities associated with parametric computing on clusters of distributed systems.
- However, Nimrod as implemented is unsuitable for the large-scale, dynamic context of computational grids, where resources are scattered across several administrative domains, each with its own user policies, queuing system, access cost, and computational power.
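A task farm over a range of parameters amounts to one independent task per parameter combination. The following Python sketch shows the idea; the parameter names and values are invented for illustration, and the code does not reproduce Nimrod's own plan-file syntax.

```python
from itertools import product


def task_farm(parameters):
    """Yield one task (a dict of parameter values) per combination."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical sweep: 2 pressures x 3 temperatures = 6 independent tasks.
tasks = list(task_farm({
    "pressure": [1.0, 2.0],
    "temperature": [300, 350, 400],
}))
```

Because the tasks share no state, each one can be dispatched to whichever resource is available, which is why parameter studies suit high-throughput computing.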
28Nimrod/G
- The shortcomings of Nimrod are addressed by a new system called Nimrod/G.
- It uses the Globus middleware services for dynamic resource discovery and for dispatching jobs over computational grids.
- The architecture of Nimrod/G and its key components are shown on the next slide.
29The architecture of Nimrod/G
30Scheduling and Computational Economy of Nimrod/G
- The Nimrod/G system integrates a computational economy into its scheduling system. This can be handled in two ways:
  - The system works on the user's behalf and tries to complete the assigned work within a given deadline and cost (the approach of the early Nimrod/G prototype).
  - The user enters into a contract with the system and poses requests. The advantage of this approach is that the user knows before the experiment starts whether the system can deliver the results and what the cost will be. It is rather complex, however, and needs Grid middleware services for resource reservation and broker services for negotiating cost.
31The important parameters of computational economy that influence resource scheduling
- The parameters are
  - Resource cost (set by its owner)
  - Price (that the user is willing to pay)
  - Deadline (the time by which application execution needs to be completed)
- The scheduler can use all sorts of information gathered by a resource discoverer and can also negotiate with resource owners to get the best value for money.
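How cost, price, and deadline interact can be shown with a simple greedy planner in Python. This is a sketch under stated assumptions, not Nimrod/G's scheduling algorithm: each resource is described by a hypothetical `(name, cost_per_job, jobs_per_hour)` tuple, and the planner fills the cheapest resources first, up to what each can finish before the deadline.

```python
def plan(resources, n_jobs, deadline_hours, budget):
    """Return (assignment, total_cost), or None if deadline or budget cannot be met."""
    assignment = {}
    remaining = n_jobs
    total_cost = 0.0
    # Cheapest resources first, so the resulting plan has the lowest total cost.
    for name, cost_per_job, jobs_per_hour in sorted(resources, key=lambda r: r[1]):
        if remaining == 0:
            break
        capacity = int(jobs_per_hour * deadline_hours)  # jobs finishable in time
        take = min(capacity, remaining)
        if take == 0:
            continue
        assignment[name] = take
        total_cost += take * cost_per_job
        remaining -= take
    if remaining > 0:
        return None     # not enough capacity before the deadline
    if total_cost > budget:
        return None     # cheapest feasible plan exceeds the user's price
    return assignment, total_cost


resources = [("cheap", 1.0, 2), ("mid", 2.0, 5), ("fast", 5.0, 10)]
result = plan(resources, n_jobs=40, deadline_hours=4, budget=200)
```

Tightening the deadline forces work onto the faster, more expensive resources, and the plan is rejected once the total exceeds what the user is willing to pay.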