Title: Adaptive Agent-based Grid Resource Management
1. Adaptive Agent-based Grid Resource Management
2. Future Computing Applications
- Future high-energy physics experiments will generate 10 petabytes of data per day.
- The Digital Sky Survey will produce many terabytes of astronomical photographic data.
- Astrophysics (e.g., simulations of a supernova explosion or a black hole collision)
- Climate modeling (e.g., simulations of a tornado or prediction of the earth's climate for the next century)
- Economics (e.g., modeling the world economy)
- Modern meteorological forecasting systems
- Automotive/aerospace industry (e.g., simulations of a car crash or a new airplane design)
- Human genome protein folding (30,000 human genome proteins) would require 1,000,000 years of computational time on an up-to-date PC.
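To put two of these figures in perspective, the sketch below converts 10 PB/day into a sustained data rate and shows how 1,000,000 PC-years would shrink if the work were spread over many machines. It assumes decimal units (1 PB = 10^15 bytes), a 365-day year, and perfectly parallel work; the fleet of 100,000 PCs is purely hypothetical.

```python
# Back-of-the-envelope arithmetic for two figures quoted on this slide.
# Assumptions (not from the slides): decimal units (1 PB = 10**15 bytes),
# a 365-day year, and work that splits perfectly across machines.

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 s
HOURS_PER_YEAR = 365 * 24        # 8,760 h


def sustained_rate_gb_per_s(petabytes_per_day: float) -> float:
    """Average ingest rate (GB/s) needed to absorb a daily data volume."""
    bytes_per_day = petabytes_per_day * 10**15
    return bytes_per_day / SECONDS_PER_DAY / 10**9


def wall_clock_years(pc_years: float, machines: int) -> float:
    """Ideal wall-clock time if the work divides evenly across machines."""
    return pc_years / machines


print(f"{sustained_rate_gb_per_s(10):.1f} GB/s")                        # 115.7 GB/s
print(f"{wall_clock_years(1_000_000, 100_000):.0f} years on 100,000 PCs")  # 10 years
print(f"{1_000_000 * HOURS_PER_YEAR:,} CPU-hours in total")              # 8,760,000,000
```

Real workloads rarely parallelize perfectly, so the wall-clock figure is a lower bound.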
3. SOLUTION-1: IBM System Blue Gene
- Specification
  - 131,000 IBM PowerPC processors
  - 1-64 racks
  - 1024 nodes per rack
  - PowerPC 440 at 700 MHz, two per node
  - Peak performance per rack: 5.73 TFlops
- Economics
  - Starting price: $1.5 million
  - Development time and cost: 5 years, $100 million
- Installation
  - Department of Energy's National Nuclear Security Administration's Lawrence Livermore National Laboratory
- http://www.internetnews.com/ent-news/article.php/3432221
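The specification numbers can be cross-checked against each other. The sketch below assumes (this is not stated on the slide) that each PowerPC 440 core has a dual floating-point unit completing two fused multiply-adds per cycle, i.e. 4 floating-point operations per cycle; under that assumption the quoted 5.73 TFlops per rack and the roughly 131,000 processors in 64 racks both follow.

```python
# Cross-check of the Blue Gene figures on this slide.
# Assumption (not on the slide): 4 flops/cycle per processor
# (a dual FPU issuing two fused multiply-adds per cycle).

CLOCK_HZ = 700e6        # PowerPC 440 at 700 MHz
FLOPS_PER_CYCLE = 4     # assumed: 2 FMAs x 2 flops each
PROCS_PER_NODE = 2
NODES_PER_RACK = 1024


def processors(racks: int) -> int:
    return racks * NODES_PER_RACK * PROCS_PER_NODE


def peak_tflops(racks: int) -> float:
    """Theoretical peak, ignoring memory and network limits."""
    return processors(racks) * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12


print(processors(64))                            # 131072 (the slide's "131,000")
print(f"{peak_tflops(1):.2f} TFlops per rack")    # 5.73
print(f"{peak_tflops(64):.0f} TFlops in 64 racks")  # 367
```

Peak figures are upper bounds; sustained performance on real applications is lower.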
4. SOLUTION-2: Sun Grid Compute Utility
- "The Network is the Computer" (Sun Microsystems)
- Sun Fire dual-processor Opteron-based servers with 4 GB RAM per CPU
- Solaris 10 OS (x64)
- Sun N1 Grid Engine 6 software
- Grid network infrastructure: a 1 Gb switched data network and a 100 Mb dedicated management network
- Web-based access portal
- Internet-only access to upload data and applications (no physical access to the location)
- Storage allocation of up to 10 GB per user account
Price: $1 per CPU-hour
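At a flat rate per CPU-hour, renting capacity is simple multiplication. The sketch assumes pure per-CPU-hour billing (no storage or network charges) and treats one "up-to-date PC" from slide 2 as one CPU; the 100-CPU job is a hypothetical example.

```python
# Rough rental cost at the slide's flat rate of $1 per CPU-hour.
# Assumptions: billing is purely per CPU-hour (no storage/network fees)
# and one "up-to-date PC" equals one CPU.

RATE_USD_PER_CPU_HOUR = 1.0
HOURS_PER_YEAR = 365 * 24


def rental_cost_usd(cpus: int, hours: float,
                    rate: float = RATE_USD_PER_CPU_HOUR) -> float:
    """Total charge for `cpus` processors used for `hours` hours."""
    return cpus * hours * rate


# A hypothetical 100-CPU job running for one day:
print(rental_cost_usd(100, 24))                                       # 2400.0
# The protein-folding workload from slide 2 (1,000,000 PC-years):
print(f"{rental_cost_usd(1, 1_000_000 * HOURS_PER_YEAR):,.0f} USD")  # 8,760,000,000 USD
```

The result is independent of how many CPUs the work is spread over: parallelism shortens the wall-clock time but not the CPU-hour bill.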
5. SOLUTION-3: Adaptive Agent-based Grid
- Develop a Grid-based application
- Use the idle, available resources of the underlying network
- Focus on the Resource Management System (RMS) of the Grid
- Apply software agents to make the RMS adaptive
- (We will revisit this slide later.)
6. Computer Systems Architecture
7. Peer-to-Peer Architectures
- Pure
  - A distributed system without any centralized control.
  - All nodes are equivalent in functionality (SERVENT = SERVer + cliENT).
  - Examples: Gnutella, Freenet, Chord, CAN, Tapestry
- Hybrid
  - A central server maintains directories of information about the users registered to the network.
  - The end-to-end interaction (data exchange) takes place directly between two peer clients.
  - Examples: Napster, Kazaa
Hybrid P2P
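In a pure P2P system there is no index to ask, so a Gnutella-style search forwards the query from neighbour to neighbour until a hop limit (TTL) runs out. The sketch below is a simplified, synchronous simulation of that flooding: the overlay, file names and the `flood_query` helper are hypothetical, and the shared `seen` set stands in for the per-message duplicate suppression real servents perform.

```python
from collections import deque

# Hypothetical pure P2P overlay: each servent knows only its neighbours
# and its own files; there is no central index.
neighbours = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"A": set(), "B": {"song.mp3"}, "C": set(), "D": set(), "E": {"paper.pdf"}}


def flood_query(origin: str, filename: str, ttl: int) -> list[str]:
    """Forward a query hop by hop until its TTL is exhausted; return holders of the file."""
    seen = {origin}                       # suppress duplicate deliveries
    frontier = deque([(origin, ttl)])
    hits = []
    while frontier:
        node, hops_left = frontier.popleft()
        if filename in files[node]:
            hits.append(node)
        if hops_left == 0:
            continue                      # TTL exhausted: do not forward further
        for peer in neighbours[node]:
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, hops_left - 1))
    return hits


print(flood_query("A", "paper.pdf", ttl=2))  # [] because E is 3 hops from A
print(flood_query("A", "paper.pdf", ttl=3))  # ['E']
```

The TTL trades search coverage for network traffic, one reason pure P2P search scales poorly.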
8. Hybrid Systems
- Centralized indexing: each peer maintains a connection to the central server, through which queries are sent (Napster).
- Decentralized indexing: super-peers maintain the central indexes for the information shared by the local peers connected to them (Kazaa).
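Both hybrid schemes keep a filename-to-peer index; they differ in how many index holders there are. The sketch below uses one hypothetical `IndexServer` class for both cases: a single instance plays the Napster-style central server, while several instances that also consult their neighbours play Kazaa-style super-peers. It is a simplified illustration only (the real FastTrack protocol used by Kazaa was proprietary), and the lookup asks neighbouring super-peers just one hop away.

```python
from collections import defaultdict


class IndexServer:
    """Hypothetical index holder: maps filename -> peers that share it."""

    def __init__(self, name: str):
        self.name = name
        self.index: dict[str, set[str]] = defaultdict(set)
        self.neighbours: list["IndexServer"] = []  # other super-peers (Kazaa-style only)

    def register(self, peer: str, shared_files: list[str]) -> None:
        for f in shared_files:
            self.index[f].add(peer)

    def local_lookup(self, filename: str) -> set[str]:
        return set(self.index.get(filename, set()))

    def lookup(self, filename: str) -> set[str]:
        """Answer from the local index, then ask neighbouring super-peers (one hop)."""
        hits = self.local_lookup(filename)
        for sp in self.neighbours:
            hits |= sp.local_lookup(filename)
        return hits


# Centralized indexing (Napster-like): one server indexes every peer.
central = IndexServer("central")
central.register("p1", ["a.mp3"])
central.register("p2", ["b.mp3"])
print(central.lookup("b.mp3"))        # {'p2'}

# Decentralized indexing (Kazaa-like): each super-peer indexes its local peers.
sp1, sp2 = IndexServer("sp1"), IndexServer("sp2")
sp1.neighbours.append(sp2)
sp2.neighbours.append(sp1)
sp1.register("p1", ["a.mp3"])
sp2.register("p3", ["a.mp3", "c.mp3"])
print(sorted(sp1.lookup("a.mp3")))    # ['p1', 'p3']
```

In both cases the index only returns peer names; the file itself would then be exchanged directly between the two peers.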
9. Peer-to-Peer Applications
- File sharing: content storage and exchange
- Distributed computing: resource sharing among a number of networked computers
- Collaboration: communication (instant messaging, online games) and collaboration (collaborative editing)
- Platforms: infrastructure to support distributed applications using P2P mechanisms
10. Distributed Systems
- A distributed system (DS) is one in which components located at networked computers communicate and coordinate their actions only by passing messages.
11. Challenges Faced in Building a DS
- Heterogeneity
- Openness
- Security
- Scalability
- Failure handling
- Concurrency
- Transparency
12. Grid Computing
- "A hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities" (Ian Foster)
13. Comparison between P2P and Grid
14. Grid Computing Applications
- Distributed supercomputing: use the grid to solve very large problems that cannot be solved on a single system and need large amounts of CPU, memory, etc.
- High-throughput computing: use the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work.
- On-demand computing: use grid capabilities to meet short-term requirements for resources that cannot be cost-efficiently or conveniently located locally.
- Data-intensive computing: use the grid to synthesize new information from data maintained in geographically distributed repositories, digital libraries, and databases.
- Collaborative computing: use the grid to give various remote clients secure access to a set of distributed services.
15. Grid Types
- Computational Grid
  - Distributed supercomputing
  - High throughput
- Data Grid
- Service Grid
  - On demand
  - Collaborative
  - Multimedia
16. Grid Toolkits
- Globus
- Condor-G
- Legion
- Nimrod-G
- Ninf-G
- NetSolve
- GridSim
17. Grid Projects
- TeraGrid
- NAREGI
- GRIDS
- grasp
- DATAGRID
- UNICORE Plus
- DATATAG
- A detailed listing can be found at http://gridcafe.web.cern.ch/gridcafe/gridprojects/projects.html
18. Grid Components
- Portal (user interface)
- Security
- Broker
- Scheduler
- Data management
- Job Resource Management System (RMS)
- Others (like IPC, Accounting, ...)
19. Grid Resource Management System
20. Grid RMS Functions
- Provide a standard interface for using remote resources
- Coordinate all local resource management systems within multiple or distributed Virtual Organizations (VOs)
- Job submission
- Resource discovery
- Resource selection
- Scheduling
- Co-allocation
- Job monitoring, etc.
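Several of these functions compose naturally: discovery filters the pool, selection picks one candidate, and co-allocation places every part of a multi-part job or none of them. The sketch below is a minimal, in-memory illustration; the `Resource`/`Request` types, the least-loaded selection policy, and the rollback scheme are hypothetical choices, not the design of any particular grid toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    name: str
    free_cpus: int
    free_mem_gb: int
    load: float          # 0.0 = idle, 1.0 = fully busy


@dataclass
class Request:
    cpus: int
    mem_gb: int


def discover(pool: list[Resource], req: Request) -> list[Resource]:
    """Resource discovery: keep only resources able to host the request."""
    return [r for r in pool if r.free_cpus >= req.cpus and r.free_mem_gb >= req.mem_gb]


def select(candidates: list[Resource]) -> Optional[Resource]:
    """Resource selection: least-loaded candidate (one possible policy)."""
    return min(candidates, key=lambda r: r.load, default=None)


def co_allocate(pool: list[Resource], parts: list[Request]) -> Optional[dict[int, str]]:
    """Co-allocation: reserve resources for every part of a job, or for none."""
    reserved: list[tuple[Resource, Request]] = []
    placement: dict[int, str] = {}
    for i, part in enumerate(parts):
        chosen = select(discover(pool, part))
        if chosen is None:
            for r, p in reserved:            # roll back the partial reservation
                r.free_cpus += p.cpus
                r.free_mem_gb += p.mem_gb
            return None
        chosen.free_cpus -= part.cpus
        chosen.free_mem_gb -= part.mem_gb
        reserved.append((chosen, part))
        placement[i] = chosen.name
    return placement


pool = [Resource("n1", 4, 8, 0.5), Resource("n2", 8, 16, 0.1)]
print(co_allocate(pool, [Request(4, 8), Request(4, 8)]))  # {0: 'n2', 1: 'n2'}
print(co_allocate(pool, [Request(4, 8), Request(8, 8)]))  # None (second part cannot be placed)
print(pool[0].free_cpus)                                  # 4: n1's reservation was rolled back
```

The all-or-nothing rollback is what distinguishes co-allocation from scheduling each part independently.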
21. Challenges in RMS
- Scalability: as the Grid grows, its services must be decentralized to avoid bottlenecks and ensure scalability.
- Adaptability: resource availability may fluctuate as computing resources connect and disconnect, so the system needs to adapt itself to these changes.
- Reliability: the system should be able to tolerate failures and recover from them.
- Manageability: management includes various aspects, such as complexity, resource management, fault tolerance, and performance analysis.
22. PROPOSED SOLUTION
- Dynamic, Decentralized
- Self-organizing, Self-adaptive
23. Economic Framework
- Different economic approaches
  - English auction (first-price, open cry, ascending)
  - Dutch auction (open cry, descending)
  - Vickrey auction (second-price, sealed bid)
  - ...
- Reference to Behnaz's work
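The three auction types differ in how the price moves and what the winner pays. The sketch below simulates each one for hypothetical consumers bidding for a producer's CPU time. It assumes bidders act mechanically on private valuations: in the English auction they drop out once the price exceeds their value, in the Dutch auction they accept as soon as the price falls to their value, and in the Vickrey auction they bid their true value (which is a weakly dominant strategy there). The function names and valuations are illustrative only.

```python
from typing import Optional

Result = Optional[tuple[str, float]]   # (winner, price paid) or None


def english_auction(valuations: dict[str, float], start: float, step: float) -> Result:
    """Open cry, ascending: the price rises by `step` until one bidder is left."""
    price = start
    active = [b for b, v in valuations.items() if v >= price]
    while len(active) > 1:
        still_in = [b for b in active if valuations[b] >= price + step]
        if not still_in:             # nobody goes higher: stop at the current price
            break
        price += step
        active = still_in
    if not active:
        return None
    return max(active, key=lambda b: valuations[b]), price  # pays the last price called


def dutch_auction(valuations: dict[str, float], start: float, step: float,
                  floor: float = 0.0) -> Result:
    """Open cry, descending: the price falls until some bidder accepts it."""
    price = start
    while price >= floor:
        takers = [b for b, v in valuations.items() if v >= price]
        if takers:
            return max(takers, key=lambda b: valuations[b]), price
        price -= step
    return None


def vickrey_auction(bids: dict[str, float]) -> Result:
    """Sealed bid, second price: the highest bidder wins but pays the second-highest bid."""
    if len(bids) < 2:
        return None                  # simplification: needs at least two sealed bids
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0], ranked[1][1]


# Hypothetical consumers bidding for one producer's CPU time (value per CPU-hour).
values = {"c1": 10, "c2": 7, "c3": 4}
print(english_auction(values, start=1, step=1))  # ('c1', 8)
print(dutch_auction(values, start=12, step=1))   # ('c1', 10)
print(vickrey_auction(values))                   # ('c1', 7)
```

With these mechanical bidders the English and Vickrey prices both track the second-highest valuation, while the Dutch auction extracts the winner's full value.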
24. Software Agent
- "An agent is a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its design objectives" (Wooldridge and Jennings)
- Agents are normally defined with the help of their properties:
  - Autonomy
  - Intelligence
  - Social ability
  - Reactivity
  - Mobility
Relevance to work
25. Adaptive Agent-based Grid RMS
- Develop a Grid-based application
- Use the idle, available resources of the underlying network
- Focus on the Resource Management System (RMS) of the Grid
- Apply software agents to make the RMS adaptive
26. Our Focus
- Resource management: the process of managing available resources and system workloads in a highly dynamic environment. Resource management includes resource discovery, resource selection, and resource access.
- Self-organization: the system should be able to reconfigure itself to provide the resources needed by the tasks currently being processed.
- Self-adaptation: the system should be able to adapt itself to a changing environment.
- Fault tolerance: the system should recover from node failures (error detection, error isolation, and error correction).
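The three fault-tolerance steps can be illustrated with a heartbeat-based detector, one common technique that the slides do not themselves specify. In the sketch below a hypothetical coordinator (for instance a matchmaker) treats a node as failed when its last heartbeat is older than a timeout (detection), stops using it (isolation), and re-assigns its tasks round-robin to healthy nodes (correction). Class and method names, the timeout, and the round-robin policy are all assumptions; timestamps are passed explicitly in the example so the run is deterministic.

```python
import time
from typing import Optional


class FailureDetector:
    """Hypothetical heartbeat-based failure handling for a coordinator node."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}
        self.tasks: dict[str, list[str]] = {}   # node -> tasks assigned to it
        self.isolated: set[str] = set()
        self.pending: list[str] = []            # tasks with no healthy node to go to

    def heartbeat(self, node: str, now: Optional[float] = None) -> None:
        self.last_seen[node] = time.monotonic() if now is None else now
        self.isolated.discard(node)             # a node that reports again rejoins
        self.tasks.setdefault(node, [])

    def assign(self, node: str, task: str) -> None:
        self.tasks[node].append(task)

    def check(self, now: Optional[float] = None) -> dict[str, str]:
        now = time.monotonic() if now is None else now
        # 1. Error detection: heartbeat older than the timeout.
        failed = [n for n, t in self.last_seen.items()
                  if n not in self.isolated and now - t > self.timeout_s]
        # 2. Error isolation: stop scheduling work on failed nodes.
        self.isolated.update(failed)
        healthy = [n for n in self.last_seen if n not in self.isolated]
        # 3. Error correction: move orphaned tasks to healthy nodes.
        orphans = [t for n in failed for t in self.tasks.pop(n, [])]
        if not healthy:
            self.pending.extend(orphans)
            return {}
        moved = {}
        for i, task in enumerate(orphans):
            target = healthy[i % len(healthy)]
            self.tasks[target].append(task)
            moved[task] = target
        return moved


fd = FailureDetector(timeout_s=5)
fd.heartbeat("n1", now=0)
fd.heartbeat("n2", now=0)
fd.assign("n1", "t1")
fd.assign("n1", "t2")
fd.heartbeat("n2", now=8)      # n1 stays silent
print(fd.check(now=10))        # {'t1': 'n2', 't2': 'n2'}
```

A timeout cannot distinguish a crashed node from a slow one, so a real system would need to handle tasks that turn out to be running twice.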
27. System Model
- Consumers: nodes executing tasks and looking for additional resources
- Producers: idle nodes looking for additional jobs to execute
- Matchmakers: mediators between consumers and producers that allow tasks to be delegated
  - Receive queries and offers
  - Match what can be matched
  - Report the result back
- The matchmaker decides on the basis of the PRICE offered by the producer and the price a consumer can afford.
- Prices differ for different commodities (CPU cycles, bandwidth, storage, memory, etc.).
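The matching rule above (producer's price versus the price a consumer can afford, per commodity) can be sketched as follows. The `Offer`/`Query` records, the "cheapest feasible offer wins" tie-break, and consuming the whole offer once matched are hypothetical simplifications chosen for illustration, not the authors' design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:                 # sent by a producer (idle node)
    producer: str
    commodity: str           # e.g. "cpu", "bandwidth", "storage", "memory"
    quantity: float
    ask_price: float         # price per unit the producer wants


@dataclass
class Query:                 # sent by a consumer (busy node)
    consumer: str
    commodity: str
    quantity: float
    max_price: float         # highest price per unit the consumer can afford


class Matchmaker:
    def __init__(self) -> None:
        self.offers: list[Offer] = []

    def receive_offer(self, offer: Offer) -> None:
        self.offers.append(offer)

    def match(self, q: Query) -> Optional[Offer]:
        """Return the cheapest offer of the same commodity that is big enough and affordable."""
        feasible = [o for o in self.offers
                    if o.commodity == q.commodity
                    and o.quantity >= q.quantity
                    and o.ask_price <= q.max_price]
        if not feasible:
            return None
        best = min(feasible, key=lambda o: o.ask_price)
        self.offers.remove(best)          # simplification: the whole offer is consumed
        return best


mm = Matchmaker()
mm.receive_offer(Offer("p1", "cpu", 10, 0.8))
mm.receive_offer(Offer("p2", "cpu", 10, 0.5))
mm.receive_offer(Offer("p3", "storage", 100, 0.1))
print(mm.match(Query("c1", "cpu", 5, 0.6)).producer)  # p2
print(mm.match(Query("c2", "cpu", 5, 0.6)))           # None: p1 asks 0.8 > 0.6
```

Because prices are attached to a commodity, a consumer's CPU query never matches a storage offer, however cheap.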
28. Matchmaking Process
[Figure: a Matchmaker Agent with its Advertisement DB, between a Requester Agent (Jobs) and a Provider Agent (Resources); labeled flows: service request, capability description, result of matching]
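The figure's components can be read as a small protocol: provider agents store capability descriptions in the advertisement DB, a requester agent sends a service request, and the matchmaker returns the result of matching. The sketch below is a hypothetical in-memory version; the attribute names and the matching rule (numeric attributes are minimums, strings must match exactly) are assumptions made for illustration.

```python
from typing import Union

Value = Union[int, float, str]


class MatchmakerAgent:
    """Hypothetical matchmaker holding an advertisement DB of provider capabilities."""

    def __init__(self) -> None:
        self.advertisement_db: dict[str, dict[str, Value]] = {}

    def advertise(self, provider: str, capability: dict[str, Value]) -> None:
        """Provider agent registers (or updates) its capability description."""
        self.advertisement_db[provider] = capability

    def withdraw(self, provider: str) -> None:
        """Provider leaves the grid or is no longer idle."""
        self.advertisement_db.pop(provider, None)

    def request(self, service_request: dict[str, Value]) -> list[str]:
        """Result of matching: providers satisfying every requested attribute."""
        return [p for p, cap in self.advertisement_db.items()
                if all(self._satisfies(cap.get(k), v) for k, v in service_request.items())]

    @staticmethod
    def _satisfies(offered: object, wanted: Value) -> bool:
        if offered is None:
            return False                  # attribute not advertised
        if isinstance(wanted, (int, float)) and isinstance(offered, (int, float)):
            return offered >= wanted      # numbers are minimum requirements
        return offered == wanted          # everything else must match exactly


mm = MatchmakerAgent()
mm.advertise("prov-1", {"os": "linux", "cpus": 4, "mem_gb": 8})
mm.advertise("prov-2", {"os": "linux", "cpus": 16, "mem_gb": 32})
mm.advertise("prov-3", {"os": "solaris", "cpus": 8, "mem_gb": 16})
print(mm.request({"os": "linux", "cpus": 8}))  # ['prov-2']
mm.withdraw("prov-2")
print(mm.request({"os": "linux", "cpus": 8}))  # []
```

Keeping advertisements in one DB lets the matchmaker answer requests without contacting providers, at the cost of possibly stale entries if providers fail to withdraw.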
29. Dynamic Adaptive Matchmaking Process
30. How to Promote/Demote a Node?
Price will be different for different commodities
(CPU cycles, bandwidth, storage, memory etc)
31. Future Work
- Design the simplest agent structure
- Achieve system adaptation
- Migrate tasks between segments to balance the workload
- Communication between matchmakers
- Design an architecture for node reconfiguration
- Design a robust mechanism for failure handling at a node
- Test the mechanism in the most realistic distributed environment available, such as PlanetLab.