Title: P2P Discovery of Computational Resources for Grid Applications
1. P2P Discovery of Computational Resources for Grid Applications
- Adeep S. Cheema (Microsoft)
- Moosa Muhammad (Motorola)
- Indranil Gupta (UIUC)
Dept. of Computer Science, University of Illinois at Urbana-Champaign (UIUC)
2. Grid vs. P2P: Complementary Areas
- Grid computing (great applications)
  - Dedicated H/W Grids
  - SETI@Home
  - Background Grids
  - Emphasis and hype so far on deployability, not on scale and reliability
- Peer-to-Peer (P2P) computing (great technologies)
  - Decentralized (no server), scalable (1000s of nodes), reliable (network loss, node failure)
  - Emphasis and hype so far on scale and reliability, not on (legal) deployability
- Foster et al. and Ledlie et al. have called for a convergence of these two areas
- Eliminate the hype, increase the user base
3. Setting: Background Grids
- Background Grid: a group of Grid clusters where jobs run in the background on each machine
- Scale Grid-job CPU utilization down/up according to the CPU free-time percentage and RAM free percentage
- Conservative approach
- Makes sense in undergraduate labs: better use of resources without disturbing users (CSIL Lab at CS.UIUC)
- Problem? Resource discovery
- Sample query: find me a machine with > 1.4 GHz Intel P4, > 40% CPU idle, > 512 MB RAM free, running Linux
- Do this in a Grid network with (tens of) 1000s of hosts
- With each host's CPU-free and RAM-free changing all the time
4. Requirements and Idea
- Need a solution that is:
  - Scalable: supports a large pool of diverse resources
  - Efficient and fully decentralized: low background bandwidth, quick query times, quick updates
  - Robust: supports frequent updates and frequent host failures, gives accurate replies to queries
- Basic idea: adapt P2P technologies to provide a substrate that Background Grid applications can build on top of
- But the P2P technology must be modified by adding expressive naming of resources
5. One Second... Any Off-the-Shelf Solutions?
- Existing P2P technologies
  - DHTs (Distributed Hash Tables): Pastry, Chord, CAN, Kelips
  - Support efficient resource discovery, but require unique names for resources
  - Also support range queries (find u2.mp3)
  - Problem: no expressive naming of resources
    - How do you name a resource that is "2.3 GHz P4, 30% CPU idle, 128 MB RAM free" for efficient querying, and for frequent and efficient updates (e.g., change CPU idle to 40%)?
- Existing Grid resource discovery
  - GRIP (Globus)
  - Matchmaking (Condor)
  - Ontology-based (Iamnitchi and Foster)
  - Problem: scalability and reliability not (really) addressed
6. Background Grids: Workload Traces
An aside
- Workstations in the CSIL (undergraduate) lab at CS/UIUC
- 6 candidate machines, monitored every minute, for a 3-week period; over 3000 machine-hours of traces
- Monitored parameters: CPU idle, RAM free, disk space free
7. Short-Term Behavior
- Bursty CPU utilization
- Interspersed with long periods of inactivity
- Low average load
8. Long-Term Behavior
- Temporal and diurnal patterns
- Maintenance-related patterns
9. Background Grids: Behavior Summary
- CPU utilization: bursty and dynamic, and low over the long term
- RAM free: varies a reasonable amount over time
- Disk space free: linear with a small negative slope and occasional small positive jumps
- Because RAM-free and disk-free vary over time, Background Grids make sense!
10. Resource Tuples
Back to our problem
- <ip, port, cpu-speed, tot-ram, cpu-idle, ram-free, disk-free>
- Insert, update, and query in a distributed manner
- Maintain the resource tuples among the hosts themselves (cooperative, no server, decentralized)
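As a concrete illustration of such a tuple and of the kind of query it must answer, here is a minimal sketch; the class and field names are hypothetical, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class ResourceTuple:
    """One host's advertised resources:
    <ip, port, cpu-speed, tot-ram, cpu-idle, ram-free, disk-free>."""
    ip: str
    port: int
    cpu_speed_mhz: int    # static: fixed configuration
    tot_ram_mb: int       # static
    cpu_idle_pct: float   # dynamic: changes continuously
    ram_free_mb: int      # dynamic
    disk_free_mb: int     # dynamic

    def matches(self, min_cpu_speed=0, min_cpu_idle=0.0, min_ram_free=0):
        """Check this tuple against a sample query's thresholds."""
        return (self.cpu_speed_mhz >= min_cpu_speed
                and self.cpu_idle_pct >= min_cpu_idle
                and self.ram_free_mb >= min_ram_free)

# Sample query: > 1.4 GHz, > 40% CPU idle, > 512 MB RAM free
host = ResourceTuple("10.0.0.7", 4160, 1400, 1024, 45.0, 600, 20_000)
print(host.matches(min_cpu_speed=1400, min_cpu_idle=40.0, min_ram_free=512))  # True
```

The hard part, addressed next, is answering such a query without any central server holding all the tuples.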
11. Pastry: A Crash Course
- Pastry [Rowstron and Druschel 01]: a P2P resource discovery system
- Each host is mapped to a unique identifier (nodeID), derived by hashing the host's IP address (SHA-1 or MD5) → load balancing and scale
- All nodeIDs are located on a logical ring
- Each host knows its next few successors and predecessors on the ring
- Each host also knows a few other far-away hosts on the ring
- A message for a destination nodeID is routed through these neighbors; routing to any host takes O(log N) logical hops in a system with N hosts
- Each resource has a unique name (resourceID) in the nodeID space, derived by hashing the resource's name (SHA-1 or MD5) → load balancing and scale
- The host with the nodeID closest to the resourceID is responsible for the resource
- resourceIDs can be inserted, queried, and deleted
- Maintenance of neighbors at each host is autonomous, through a heartbeating protocol that handles arrival, departure, and failure of nodes
- Problem for our resource tuples <ip, port, cpu, ram, disk>: Pastry assumes each resource has a unique, static, one-dimensional name
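The hash-to-ring mapping and closest-node responsibility rule can be sketched as follows; this is a toy 32-bit ring (real Pastry uses 128-bit IDs and prefix routing), and the function names are ours, not Pastry's API:

```python
import hashlib

RING_BITS = 32          # toy ring size; real Pastry IDs are 128-bit
RING = 1 << RING_BITS

def sha1_id(name: str) -> int:
    """Map a name (a host's IP, or a resource name) onto the ring via SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")

def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two IDs on the logical ring (wraps around)."""
    d = abs(a - b)
    return min(d, RING - d)

def responsible_node(resource_id: int, node_ids: list[int]) -> int:
    """The host whose nodeID is closest to the resourceID stores the resource."""
    return min(node_ids, key=lambda n: ring_distance(n, resource_id))

nodes = [sha1_id(f"10.0.0.{i}") for i in range(1, 6)]
owner = responsible_node(sha1_id("some-resource"), nodes)
```

In the real system, of course, `responsible_node` is not computed from a global list; each lookup is routed hop by hop through neighbors in O(log N) steps.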
12. Expressive Naming of Grid Resources
- Split a resource tuple into two parts:
  - A static part: fixed configuration
    - CPU type and speed, total RAM
  - A dynamic part: continuously changing parameters
    - CPU idle, RAM free, disk free
- Can be extended so the split is different for different resource tuples
13. Expressive Naming of Grid Resources (2)
- For a given resource tuple, derive the Pastry resourceID by:
  - Hashing the static part of the resource tuple
  - Appending the dynamic part of the resource tuple verbatim (un-hashed)
14. Ringing Resources
(Figure: resourceID layout, a hashed static part followed by the un-hashed dynamic part)
- Why?
  - Retain load balancing yet optimize for querying
  - A given host's resource tuple will lie in a vicinity within the Pastry ring
  - As the host's behavior changes over time, its resource tuple spans an arc
  - If the dynamic attributes are kept in the same order, one can also search by a dynamic attribute, given the static attributes
15. Ringing Resources (2)
(Figure: resourceID layout, a hashed static part followed by the un-hashed dynamic part)
- The static part consists of attributes that each have a discrete range, and usually a finite one, e.g., CPU speed is a multiple of 100 MHz and < 4 GHz
- The dynamic part:
  - Contains attributes with continuous ranges
  - Is derived by encoding all dynamic attributes into the 32 dynamic bits
  - E.g., truncation; other encodings in the paper
16. When a Host Joins the Background Grid
- Two steps:
  - Convert the resource tuple into a resourceID
  - Insert the resourceID into the Pastry system
- That's it!
17. When a Host's Condition Changes
- E.g., CPU idle changes
- Update the resourceID in the Pastry system:
  - Delete the old resourceID from Pastry
  - Insert the new resourceID into Pastry
  - Uses a single UPDATE message (since the old and new locations are close by)
- How often to update? Updates are initiated by resources either:
  - Periodically, at rate URATE, or
  - When there is a significant change in the dynamic part
    - Parameterized as UCHANGE(resource): UCHANGE(CPU) = 8.7 means update only when CPU idle has changed by more than 8.7% since the last update
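The change-triggered policy reduces to a simple threshold test against the last value a host actually reported; a minimal sketch (function names are ours):

```python
def should_update(last_reported_idle, current_idle, uchange_cpu=8.7):
    """Report an update only when CPU idle has moved by more than
    UCHANGE(CPU) percentage points since the last *reported* value."""
    return abs(current_idle - last_reported_idle) > uchange_cpu

# A host tracks the last value it pushed into Pastry, not the last sample:
last = 50.0
for sample in [53.0, 57.0, 60.0, 41.0]:
    if should_update(last, sample):
        # delete old resourceID, insert new one (single UPDATE message,
        # since the old and new IDs are close by on the ring)
        last = sample
```

Comparing against the last reported value (rather than the previous sample) is what prevents a slow drift from silently accumulating past the threshold without ever triggering an update.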
18. How to Discover a Resource
- E.g., find me a machine with > 1.4 GHz Intel P4, > 40% CPU idle, > 512 MB RAM free, running Linux
- Three alternative approaches:
  - Single-shot: send out one Pastry lookup for one resourceID that satisfies the query parameters. Works well if the frequency of updates and the number of resources are high.
  - Recursive: send out one Pastry lookup with a TTL (time-to-live) field, with recursive tuning of the lookup's resourceID along the way.
  - Parallel search: initiate multiple Pastry lookups, each for a different resourceID that satisfies the query. Works well if resources are limited or contention is high.
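How the single-shot and parallel strategies pick their lookup targets can be sketched as follows, reusing the toy encoding from slide 15 where CPU idle sits in the top 7 of the 32 dynamic bits (an assumption for illustration; the recursive variant's in-flight retuning is omitted):

```python
def single_shot(static_bits, min_idle_pct):
    """One lookup: a resourceID whose dynamic field sits exactly at the
    query's CPU-idle threshold; tuples satisfying the query lie just
    after it on the ring."""
    return (static_bits << 32) | (min_idle_pct << 25)

def parallel(static_bits, min_idle_pct, fanout=4):
    """Several concurrent lookups spread over the satisfying range
    [min_idle, 100], to catch sparsely distributed resources."""
    step = max(1, (100 - min_idle_pct) // fanout)
    return [(static_bits << 32) | (p << 25)
            for p in range(min_idle_pct, 101, step)][:fanout]
```

Because all targets share the hashed static prefix, every lookup lands in the same arc of the ring; parallel search just probes that arc at several points instead of one.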
19. Experimental Results: A. Update Frequency
- 20,000 total hosts; 1000 hosts (only) in Pastry
- Update threshold of 10% change; 1 query per time unit
- All experiments use the CSIL workload traces shown earlier
- The total number of updates (in a simulation run) is low for an update threshold > 10%
20. B. Scalability
- 25% of the hosts stored 70% of the data (skewed distribution)
- The most-loaded host held 0.55% of the load, compared to a system-wide average of 0.1% (scalable)
- The bandwidth consumed scales with the number of host resources injected
21. C. Short-Term Query Performance
- A single-shot search (for different CPU idle values) returns a sufficient number of results!
- Search bandwidth goes down quickly with the time difference between queries
22. D. Long-Term Query Performance
- A sample Grid application requires 12 hosts of specific configurations for a 10-hour period (horizontal blue line)
- Overprovisioning: the brown curve shows total resources; the yellow curve shows extra resources (wastage)
- P2P resource-discovery-based: pink curve
  - Grid application demand is always satisfied, with no wastage due to overprovisioning
23. Summary
- Grid applications need substrates that provide both reliability and scalability (1000s of nodes)
- P2P technologies exist, but need to be adapted to build such substrates
- This paper:
  - Background Grids: use idle CPU, free RAM, etc.
  - Resource discovery: answer queries looking for such resources
  - Solution: adapt the Pastry P2P overlay by using expressive naming of resources
  - Traces collected from the CS undergraduate lab at UIUC
  - The resulting substrate is robust, scalable, and reliable, and beats overprovisioning for long-running Grid applications
24. Questions and Queries?