Title: Slicing with SHARP
1. Slicing with SHARP
- Jeff Chase
- Duke University
2. Federated Resource Sharing
- How do we safely share/exchange resources across domains?
  - Administrative, policy, security, trust domains or sites
- Sharing arrangements form dynamic Virtual Organizations.
- Physical and logical resources (data sets, services, applications).
3. Location-Independent Services
(Figure: clients reach a dynamic server set through request routing under varying load.)
- The last decade has yielded immense progress on building and deploying large-scale adaptive Internet services.
  - Dynamic replica placement and coordination
  - Reliable distributed systems and clusters
  - P2P, indirection, etc.
- Example services: caching network or CDN, replicated Web service, curated data, batch computation, wide-area storage, file sharing.
4. Managing Services
(Figure: a service manager sits between clients and servers, using sensor/actuator feedback control.)
- A service manager adapts the service to changes in load and resource status.
  - E.g., gain or lose a server, instantiate replicas, etc.
  - The service manager itself may be decentralized.
- The service may have contractual targets for service quality: Service Level Agreements or SLAs.
5. An Infrastructure Utility
(Figure: services draw on a shared resource pool. Benefits: resource efficiency, surge protection, robustness, pay as you grow, economy of scale, geographic dispersion.)
- In a utility/grid, the service obtains resources from a shared pool.
  - Third-party hosting providers: a resource market.
- Instantiate the service wherever resources are available and demand exists.
- Consensus: we need predictable performance and SLAs.
6. The Slice Abstraction
- Resources are owned by sites.
  - E.g., node, cell, cluster
- Sites are pools of raw resources.
  - E.g., CPU, memory, I/O, net
- A slice is a partition, bundle, or subset of resources.
- The system hosts application services as guests, each running within a slice.
(Figure: services S1 and S2 run in slices drawn from sites A, B, and C.)
7. Slicing for SLAs
- Performance of an application depends on the resources devoted to it [Muse01].
- Slices act as containers with dedicated bundles of resources bound to the application.
  - Distributed virtual machine / computer
- The service manager determines the desired slice configuration to meet performance goals.
  - May instantiate multiple instances of a service, e.g., in slices sized for different user communities.
- Services may support some SLA management internally, if necessary.
8. Example: Cluster Batch Pools in Cluster-on-Demand (COD)
- Partition a cluster into isolated virtual clusters.
- The virtual cluster owner has exclusive control over its servers.
- Assign nodes to virtual clusters according to load, contracts, and resource usage policies.
- Example service: a simple wrapper for SGE batch scheduler middleware to assess load and obtain/release nodes (a minimal sketch follows this slide).
(Figure: service managers (e.g., SGE) send requests to the COD cluster, which issues grants.)
Dynamic Virtual Clusters in a Grid Site Manager (HPDC 2003), with Laura Grit, David Irwin, Justin Moore, and Sara Sprenkle.
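The wrapper-style service manager above amounts to a feedback loop: poll the batch queue, size the node allocation to the backlog, and ask the site for more nodes or release the excess. The sketch below is illustrative only; CODClient, its request_nodes/release_nodes calls, the pending_jobs probe, and jobs_per_node are hypothetical stand-ins, not COD's actual interface.

# A minimal sketch of a service-manager control loop, assuming a
# hypothetical COD negotiation client and a probe of the SGE queue
# backlog (e.g., by parsing qstat output).
import time

class CODClient:
    """Hypothetical stub for the site manager's negotiation interface."""
    def request_nodes(self, count): ...
    def release_nodes(self, count): ...

def pending_jobs():
    """Hypothetical probe of the SGE queue backlog."""
    return 0

def control_loop(cod, nodes_held=0, jobs_per_node=4, poll_secs=60):
    while True:
        backlog = pending_jobs()
        target = (backlog + jobs_per_node - 1) // jobs_per_node  # nodes needed
        if target > nodes_held:
            cod.request_nodes(target - nodes_held)   # grow the virtual cluster
        elif target < nodes_held:
            cod.release_nodes(nodes_held - target)   # shrink it
        nodes_held = target
        time.sleep(poll_secs)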
9. A Note on Virtualization
- Ideology for the future Grid: adapt the grid environment to the service rather than the service to the grid.
- Enable user control over the application/OS environment.
- Instantiate a complete environment down to the metal.
- Don't hide the OS; it's just another replaceable component.
- Requires/leverages new "underware" for instantiation.
  - Virtual machines (Xen, Collective, JVM, etc.)
  - Net-booted physical machines (Oceano, UDC, COD)
- Innovate below the OS and alongside it (infrastructure services for the control plane).
10. SHARP
- Secure Highly Available Resource Peering
- Interfaces and services for external control of federated utility sites (e.g., clusters).
- A common substrate for extensible policies/structures for resource management and distributed trust management.
- Flexible on-demand computing for a site, and flexible peering for federations of sites.
  - From PlanetLab to the New Grid
- Use it to build a resource dictatorship or a barter economy, or anything in between.
  - Different policies/structures may coexist in different partitions of a shared resource pool.
11. Goals
- The question addressed by SHARP: how do the service managers get their slices?
- How does the system implement and enforce policies for allocating slices?
  - Predictable performance under changing conditions.
  - Establish priorities under local or global constraint.
  - Preserve resource availability across failures.
  - Enable and control resource usage across boundaries of trust or ownership (peering).
  - Balance global coordination with local control.
- Extensible, pluggable, dynamic, decentralized.
12. Non-goals
- SHARP does NOT:
  - define a policy or style of resource exchange
    - e.g., barter, purchase, or central control (e.g., PLC)
  - care how services are named or instantiated
  - understand the SLA requirements or specific resource needs of any application service
  - define an ontology to describe resources
  - specify mechanisms for resource control/policing.
13. Resource Leases
(Figure: the S1 service manager sends a request to the Site A authority, which returns a grant in the form of a signed lease.)
<lease>
  <issuer> A's public key </issuer>
  <signed_part>
    <holder> S1's public key </holder>
    <rset> resource description </rset>
    <start_time> </start_time>
    <end_time> </end_time>
    <sn> unique ID at Site A </sn>
  </signed_part>
  <signature> A's signature </signature>
</lease>
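The lease above is a record signed by the site authority: anyone holding it can check the signed_part against the issuer's public key. A minimal sketch of that check, using Ed25519 from the Python 'cryptography' package as a stand-in for SHARP's signed-XML machinery (the flat JSON fields are illustrative):

# Minimal sketch: sign and verify a lease-like record.
import json
from cryptography.hazmat.primitives.asymmetric import ed25519

site_a = ed25519.Ed25519PrivateKey.generate()      # Site A's keypair

signed_part = json.dumps({
    "holder": "S1's public key",
    "rset": "resource description",
    "start_time": None, "end_time": None,
    "sn": 1,                                        # unique ID at Site A
}).encode()

lease = {"issuer": site_a.public_key(),
         "signed_part": signed_part,
         "signature": site_a.sign(signed_part)}

# verify() raises InvalidSignature if the signed_part was altered.
lease["issuer"].verify(lease["signature"], lease["signed_part"])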
14. Agents (Brokers)
(Figure: the agent is interposed between the S1 service manager and the Site A authority; requests flow toward the site through the agent, and grants flow back.)
- Introduce the agent as an intermediary/middleman.
- Factor policy out of the site authority.
- The site delegates control over its resources to the agent.
- The agent implements a provisioning/allocation policy for the resources under its control.
15. Leases vs. Tickets
- The site authority retains ultimate control over its resources: only the authority can issue leases.
  - Leases are hard contracts for concrete resources.
- Agents deal in tickets.
  - Tickets are soft contracts for abstract resources.
  - E.g., "You have a claim on 42 units of resource type 7 at 3PM for two hours, maybe." (also signed XML)
- Tickets may be oversubscribed as a policy to improve resource availability and/or resource utilization (see the sketch after this slide).
  - The subscription degree gives configurable assurance spanning a continuum from a hint to a hard reservation.
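A minimal sketch of the subscription-degree knob, assuming a hypothetical Agent class that tracks its holdings and outstanding promises; the names and the simple accounting are illustrative, not SHARP's policy code:

class Agent:
    def __init__(self, holdings_units, degree=1.0):
        self.holdings = holdings_units   # units the agent actually holds
        self.degree = degree             # 1.0 = hard reservation; >1.0 = oversubscribed
        self.promised = 0                # units promised in outstanding tickets

    def issue_ticket(self, units):
        """Record the promise if it fits the oversubscription budget."""
        if self.promised + units <= self.holdings * self.degree:
            self.promised += units
            return True
        return False

# degree = 1.0 behaves like a hard reservation; larger degrees trade a
# higher chance of rejection at redeem time for better utilization.
a = Agent(holdings_units=100, degree=1.2)
assert a.issue_ticket(80) and a.issue_ticket(40)   # 120 <= 100 * 1.2
assert not a.issue_ticket(10)                      # would exceed the budget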
16. Service Instantiation
(Figure, steps 1-7: (1) request and (2) grant ticket between the Agent and the Site; (3) request and (4) grant ticket between the Service Manager and the Agent; (5) redeem ticket and (6) grant lease between the Service Manager and the Site; (7) the Site instantiates the service in a virtual machine.)
Like an airline ticket, a SHARP ticket must be
redeemed for a lease (boarding pass) before the
holder can occupy a slice.
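A minimal sketch of this flow with in-memory stand-ins for the three actors. Real SHARP exchanges signed XML over the network and validates claim chains at redeem time; none of that machinery is shown, and all names here are illustrative:

class Site:
    def __init__(self):
        self.next_sn = 0

    def grant_ticket(self, holder, units):            # steps 1-2 (to the agent)
        self.next_sn += 1
        return {"holder": holder, "units": units, "sn": self.next_sn}

    def redeem(self, ticket, holder):                  # steps 5-6
        # A real authority validates the claim chain and checks for
        # conflicts before converting the soft claim into a hard lease.
        lease = {"holder": holder, "units": ticket["units"]}
        self.instantiate(lease)                        # step 7
        return lease

    def instantiate(self, lease):
        print(f"booting VM(s) for {lease['holder']}: {lease['units']} units")

class Agent:
    def __init__(self, site):
        self.ticket = site.grant_ticket("agent", units=40)

    def grant_ticket(self, holder, units):             # steps 3-4
        # A real agent subdivides and re-signs its ticket (delegation).
        return {"holder": holder, "units": units, "parent": self.ticket["sn"]}

site = Site()
agent = Agent(site)                    # site delegates a ticket to the agent
t = agent.grant_ticket("S1", units=8)  # service manager obtains a ticket
lease = site.redeem(t, "S1")           # redeem for a lease; the VM boots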
17Ticket Delegation
Transfer of resources, e.g., as a result of a
peering agreement or an economic transaction.
The site has transitive trust in the delegate.
Agents may subdivide their tickets and delegate
them to other entities in a cryptographically
secure way. Secure ticket delegation is the basis
for a resource economy. Delegation is
accountable if an agent promises the same
resources to multiple receivers, it may/will be
caught.
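A minimal sketch of a delegation chain as a list of signed claims, where each link is signed by the holder named in the previous link. It uses Ed25519 from the Python 'cryptography' package; the flat JSON body is a simplification of SHARP's signed XML, and resource subdivision and terms are omitted:

import json
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.hazmat.primitives import serialization

def raw_pub(priv):
    return priv.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw)

def sign_claim(issuer_priv, holder_pub, units):
    body = json.dumps({"holder": holder_pub.hex(), "units": units}).encode()
    return {"issuer": raw_pub(issuer_priv), "body": body,
            "sig": issuer_priv.sign(body)}

def verify_chain(chain):
    """Each link must carry a valid signature, and each link's issuer
    must be the holder named by the previous link."""
    prev_holder = chain[0]["issuer"]        # the site anchors the chain
    for c in chain:
        ed25519.Ed25519PublicKey.from_public_bytes(c["issuer"]).verify(
            c["sig"], c["body"])            # raises InvalidSignature if tampered
        if c["issuer"] != prev_holder:
            return False
        prev_holder = bytes.fromhex(json.loads(c["body"])["holder"])
    return True

site, agent, s1 = (ed25519.Ed25519PrivateKey.generate() for _ in range(3))
chain = [sign_claim(site, raw_pub(agent), 40),   # site delegates 40 units to the agent
         sign_claim(agent, raw_pub(s1), 8)]      # agent delegates 8 of them to S1
print(verify_chain(chain))                       # True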
18Peering
Sites may delegate resource shares to multiple
agents. E.g., Let my friends at UNC use 20 of
my site this week and 80 next weekend. UNC can
allocate their share to their local users
according to their local policies. Allocate the
rest to my local users according to my policies.
Note tickets issued at UNC self-certify their
users.
19. A SHARP Ticket
<ticket>
  <subticket>
    <issuer> A's public key </issuer>
    <signed_part>
      <principal> B's public key </principal>
      <agent_address> XML-RPC redeem_ticket() </agent_address>
      <rset> resource description </rset>
      <start_time> </start_time>
      <end_time> </end_time>
      <sn> unique ID at Agent A </sn>
    </signed_part>
    <signature> A's signature </signature>
  </subticket>
  <subticket>
    <issuer> B's public key </issuer>
    <signed_part> </signed_part>
    <signature> B's signature </signature>
  </subticket>
</ticket>
20. Tickets are Chains of Claims
(Figure: two linked claim records. Claim a: claimID a, holder A, with a.rset and a.term. Claim b: claimID b, holder B, issuer A, parent a, with b.rset and b.term.)
21. A Claim Tree
- The set of active claims for a site forms a claim tree.
- The site authority maintains the claim tree over the redeemed claims.
(Figure: a claim tree rooted at an anchor of 40 units, subdivided into claims of 25 and 8 units and then into final claims of 9, 10, 3, and 3 units; a ticket T corresponds to one chain of claims from the anchor down to a final claim.)
22Ticket Distribution Example
A 40
- Site transfers 40 units to Agent A
B 8
D 3
E 3
C 25
H 7
- B and C further subdivide resources
F 9
resource space
- C oversubscribes its holdings in granting to H,
creating potential conflict
conflict
G 10
t0
t
time
23. Detecting Conflicts
(Figure: the same allocation plotted against resource space (site capacity 100, A's share 40) and time from t0. C's children F (9), G (10), and H (7) sum to 26 units against C's 25, exposing the conflict.)
24. Conflict and Accountability
- Oversubscription may be a deliberate strategy, an accident, or a malicious act.
- Site authorities serve oversubscribed tickets FCFS; conflicting tickets are rejected (a minimal sketch follows this slide).
  - Balance resource utilization against conflict rate.
- The authority can identify the accountable agent and issue a cryptographically secure proof of its guilt.
  - The proof is independently verifiable by a third party, e.g., a reputation service or a court of law.
- The customer must obtain a new ticket for equivalent resources, using its proof of rejection.
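A minimal sketch, not SHARP's implementation, of FCFS conflict detection at the authority: a claim is rejected once its siblings have exhausted the parent's units, and the parent's issuer is the accountable party. The dict-based claims and the Authority class are illustrative only:

from collections import defaultdict

class Authority:
    def __init__(self):
        self.redeemed = defaultdict(int)   # parent claimID -> units already leased

    def redeem(self, claim):
        """claim: {'id', 'parent', 'parent_units', 'units', 'issuer'}."""
        used = self.redeemed[claim["parent"]]
        if used + claim["units"] > claim["parent_units"]:
            # Conflict: the issuer promised more than it held. A real
            # authority returns a signed rejection naming the issuer.
            return ("reject", claim["issuer"])
        self.redeemed[claim["parent"]] += claim["units"]
        return ("lease", claim["units"])

auth = Authority()
print(auth.redeem({"id": "F", "parent": "C", "parent_units": 25, "units": 9,  "issuer": "C"}))
print(auth.redeem({"id": "G", "parent": "C", "parent_units": 25, "units": 10, "issuer": "C"}))
print(auth.redeem({"id": "H", "parent": "C", "parent_units": 25, "units": 7,  "issuer": "C"}))
# F and G get leases (9 + 10 = 19 <= 25); H is rejected and C is accountable.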
25. Agent as Arbiter
- Agents implement local policies for apportioning resources to competing customers. Examples:
  - Authenticated client identity determines priority.
  - Sell tickets to the highest bidder.
  - Meet long-term contractual obligations; sell the excess.
26. Agent as Aggregator
- Agents may aggregate resources from multiple sites.
- Example: PlanetLab Central.
  - Index by location and resource attributes.
  - Local policies match requests to resources.
- Services may obtain bundles of resources across a federation.
27. Division of Knowledge and Function
- Service Manager (knows the application): instantiate the app, monitor behavior, SLA/QoS mapping, acquire contracts, renew/release.
- Agent/Broker (guesses global status): availability of resources, what kind, how much, where (site grain); how much to expose about resource types? About proximity?
- Authority (knows local status): resource status, configuration, placement, topology, instrumentation, thermals, etc.
28. Issues and Ongoing Work
- SHARP combines resource discovery and brokering in a unified framework.
- The configurable overbooking degree allows a continuum.
- Many possibilities exist for SHARP agent/broker representations, cooperation structures, allocation policies, and discovery/negotiation protocols.
- Accountability is a fundamental property needed in many other federated contexts.
  - Generalize accountable claim/command exchange to other infrastructure services and applications.
- Bidding strategies and feedback control.
29. Conclusion
- Think of PlanetLab as an evolving prototype for planetary-scale on-demand utility computing.
  - Focusing on app services that are light resource consumers but inhabit many locations: a network testbed.
  - It's growing organically, like the early Internet.
  - Rough consensus and (almost) working code.
- PlanetLab is a compelling incubator and testbed for utility/grid computing technologies.
- SHARP is a flexible framework for utility resource management.
  - But it's only a framework.
30. Performance Summary
- SHARP prototype complete and running across PlanetLab.
- Complete performance evaluation in the paper.
- 1.2s end-to-end time to:
  - request a resource 3 peering hops away
  - obtain and validate tickets, hop-by-hop
  - redeem the ticket for a lease
  - instantiate a virtual machine at the remote site
- Oversubscription allows flexible control over resource utilization versus the rate of ticket rejection.
31. Related Work
- Resource allocation/scheduling mechanisms
  - Resource containers, cluster reserves, Overbooking
- Cryptographic capabilities
  - Taos, CRISIS, PolicyMaker, SDSI/SPKI
- Lottery ticket inflation
  - Issuing more tickets decreases the value of existing tickets.
- Computational economies
  - Amoeba, Spawn, Muse, Millenium, eOS
- Self-certifying trust delegation
  - PolicyMaker, PGP, SFS
32. Physical cluster
(Figure: COD servers backed by a configuration database; network boot, automatic configuration, and resource negotiation carve the physical cluster into dynamic virtual clusters via database-driven network install.)
33. SHARP
- Framework for distributed resource management, resource control, and resource sharing across sites and trust domains.
- Challenge: maintain local autonomy -> SHARP approach: sites are the ultimate arbiters of local resources; decentralized protocol.
- Challenge: resource availability in the presence of agent failures -> SHARP approach: tickets time out (leases); controlled oversubscription.
- Challenge: malicious actors -> SHARP approach: signed tickets; audit the full chain of transfers.
34. Claim Delegation and Validity
(Figure: the claim tree at time t0. The anchor holds 40 units; claim delegation subdivides it into claims of 25 and 8 units and then into final and future claims of 9, 10, 7, 3, and 3 units. A ticket T names one chain of claims from the anchor down. Claim records: claim a has claimID a, holder A, issuer A, no parent, and a.rset/term; claim b has claimID b, holder B, issuer A, parent a, and b.rset/term; claim c has claimID c, holder C, issuer B, parent b, and c.rset/term.)

subclaim(claim c, claim p) ≡ c.issuer = p.holder ∧ c.parent = p.claimID ∧ contains(p.rset, c.rset) ∧ subinterval(c.term, p.term)

contains(rset p, rset c) ≡ c.type = p.type ∧ c.count ≤ p.count

subinterval(term c, term p) ≡ p.start ≤ c.start ≤ p.end ∧ p.start ≤ c.end ≤ p.end

ticket(c0, ..., cn) ≡ anchor(c0) ∧ (∀ i = 0..n-1: subclaim(c_{i+1}, c_i))

anchor(claim a) ≡ a.issuer = a.holder ∧ a.parent = null
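A minimal executable rendering of these predicates, using plain Python records in place of SHARP's signed XML claims. Field names follow the slide; signature verification and claim tree bookkeeping are omitted, and the example values are illustrative:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Rset:
    type: int
    count: int

@dataclass
class Term:
    start: float
    end: float

@dataclass
class Claim:
    claimID: str
    holder: str
    issuer: str
    parent: Optional[str]
    rset: Rset
    term: Term

def contains(p: Rset, c: Rset) -> bool:
    return c.type == p.type and c.count <= p.count

def subinterval(c: Term, p: Term) -> bool:
    return p.start <= c.start <= p.end and p.start <= c.end <= p.end

def subclaim(c: Claim, p: Claim) -> bool:
    return (c.issuer == p.holder and c.parent == p.claimID
            and contains(p.rset, c.rset) and subinterval(c.term, p.term))

def anchor(a: Claim) -> bool:
    return a.issuer == a.holder and a.parent is None

def ticket(chain: list) -> bool:
    return anchor(chain[0]) and all(
        subclaim(chain[i + 1], chain[i]) for i in range(len(chain) - 1))

# Example: site A anchors 40 units of type 7, delegates 25 to B, who delegates 7 to C.
a = Claim("a", "A", "A", None, Rset(7, 40), Term(0, 10))
b = Claim("b", "B", "A", "a",  Rset(7, 25), Term(0, 10))
c = Claim("c", "C", "B", "b",  Rset(7, 7),  Term(2, 8))
assert ticket([a, b, c])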
35. Mixed-Use Clustering
(Figure: a physical cluster partitioned into virtual clusters, e.g., a BioGeometry batch pool, a SIMICS/Arch batch pool, Internet Web/P2P emulations, a student semester project, and somebody's buggy hacked OS.)
- Vision: issue leases on isolated partitions of the shared cluster for different uses, with push-button Web-based control over software environments, user access, file volumes, and DNS names.
36. Grids are federated utilities
- Grids should preserve the control and isolation benefits of private environments.
- There's a threshold of comfort that we must reach before grids become truly practical.
  - Users need service contracts.
  - Protect users from the grid (security cuts both ways).
- Many dimensions:
  - decouple Grid support from the application environment
  - decentralized trust and accountability
  - data privacy
  - dependability, survivability, etc.
37. COD and Related Systems
- Other cluster managers based on database-driven PXE installs:
  - Oceano hosts Web services under dynamic load.
  - NPACI Rocks configures Linux compute clusters.
  - Netbed/Emulab configures static clusters for emulation experiments.
(Figure: the systems compared along axes such as dynamic clusters, OS-agnostic, flexible configuration, and open source.)
- COD addresses hierarchical dynamic resource management in mixed-use clusters with pluggable middleware (multigrid).
38. Dynamic Virtual Clusters
(Figure: the COD manager, backed by the COD database and a Web interface, negotiates with pluggable service managers and configures Virtual Cluster 1, Virtual Cluster 2, and a reserve pool (powered off).)
- Allocate resources in units of raw servers.
- Database-driven network install.
- Pluggable service managers: batch schedulers (SGE, PBS), Grid PoP, Web services, etc.
39Enabling Technologies
DHCP host configuration
Linux driver modules, Red Hat Kudzu, partition
handling
DNS, NIS, etc. user/group/mount configuration
NFS etc. network storage automounter
PXE network boot
IP-enabled power units
Power APM, ACPI, Wake-on-LAN
Ethernet VLANs
40. Recent Papers on Utility/Grid Computing
- Managing Energy and Server Resources in Hosting Centers. ACM Symposium on Operating Systems Principles, 2001.
- Dynamic Virtual Clusters in a Grid Site Manager. IEEE High-Performance Distributed Computing, 2003.
- Model-Based Resource Provisioning for a Web Service Utility. USENIX Symposium on Internet Technologies and Systems, 2003.
- An Architecture for Secure Resource Peering. ACM Symposium on Operating Systems Principles, 2003.
- Balancing Risk and Reward in a Market-Based Task Manager. IEEE High-Performance Distributed Computing, 2004.
- Designing for Disasters. USENIX Symposium on File and Storage Technologies, 2004.
- Comparing PlanetLab and Globus Resource Management Solutions. IEEE High-Performance Distributed Computing, 2004.
- Interposed Proportional Sharing for a Storage Service Utility. ACM Symposium on Measurement and Modeling of Computer Systems, 2004.