Title: Lessons from LEAD/VGrADS Demo
1. Lessons from LEAD/VGrADS Demo
- Yang-suk Kee, Carl Kesselman
- ISI/USC
2. Outline
- SC06 Demo Summary
- New Features of VGES
- Year-5 Development and Research Plans
- VGES Support for SC07 Demo
3. LEAD/VGrADS Demo at SC06
- The first integration of the LEAD and VGrADS software stacks
- Identified functionalities and requirements of core components
- Demonstrated the resource slot concept and the potential of QBETS (BQP)
- Showed slot-based scheduling using a performance model
4. [Diagram: LEAD/VGrADS integration architecture]
Components: Portal; Workflow Configuration Service; LEAD BPEL Workflow Engine; App. Factory; Application Service (per task); LEAD Resource Broker; Event Broker; myLEAD (subscribes to messages from the broker, knows what magic to do with input/output files, and talks to RLS/DRS); Virtual Grid Execution System; Scheduler Mapper; Performance Model; Batch Queue Prediction.
Arrow labels: DAG Constraint; Workflow; Annotated DAG; Schedule toward a workflow deadline; Create Services; Launch Services; Run job; Job Notification; Run workflow one step at a time; Workflow and File Status; Adaptation.
LEAD: Linked Environments for Atmospheric Discovery
5. Schedule toward a workflow deadline
[Diagram. Components: Virtual Grid Execution System; Resource Broker; Scheduler Mapper; Performance Model; Batch Queue Prediction; GT4 GRAM; PBS. Three boxes are labeled "(Reserved)".]
6. New Features of Current VGES
- Language
- Support of resource equivalence (limited implementation)
- WS-GRAM schema wrapper for execution on the personalized resources
- Execution system
- Probabilistic guarantee of resource binding
- Resource orchestration and personalization
7. Resource Equivalence
- Specifying exchangeable constraints
- Provides flexibility in resource discovery
- Specifies constraints with precedence in order of appearance
- PE Opteron <> 4 Itanium
- vgdl ClusterOf (node) 4 node Processor PE
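The precedence rule can be illustrated with a minimal Python sketch. It assumes that equivalent alternatives are tried in the order they are written and that the first satisfiable one wins. The function name, the inventory dictionary, and the node counts are hypothetical; this is neither VGDL syntax nor the vgES implementation.

```python
from typing import Dict, List, Optional


def pick_alternative(alternatives: List[str], needed: int,
                     free_nodes: Dict[str, int]) -> Optional[str]:
    """Return the first processor type, in order of appearance,
    that has at least `needed` free nodes; None if none qualifies."""
    for processor in alternatives:
        if free_nodes.get(processor, 0) >= needed:
            return processor
    return None


# Hypothetical inventory: Opteron is preferred but cannot supply 4 nodes,
# so the request falls through to Itanium.
choice = pick_alternative(["Opteron", "Itanium"], 4,
                          {"Opteron": 2, "Itanium": 8})
```

Listing the preferred processor first keeps the request deterministic while still letting discovery succeed when the first choice is unavailable.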
8. WS-GRAM Schema Wrapper
- Providing abstract job description
- Hides WS-GRAM schemas that are irrelevant for specifying applications
- Application-related WS-GRAM schema
- argument, count, directory, environment, executable, job, jobType, library, path, stderr, stdin, stdout
- cf. host, factoryEndpoint
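As a rough illustration of the wrapper idea, the sketch below keeps only the application-related elements named on this slide and drops resource-related ones such as host and factoryEndpoint. The dictionary representation and the function name are assumptions made for the example; the real wrapper's interface is not shown in the slides.

```python
# Element names copied from the slide's list of application-related schema.
APPLICATION_ELEMENTS = frozenset({
    "argument", "count", "directory", "environment", "executable", "job",
    "jobType", "library", "path", "stderr", "stdin", "stdout",
})


def abstract_job_description(full_description: dict) -> dict:
    """Keep only the entries a user needs to describe the application."""
    return {key: value for key, value in full_description.items()
            if key in APPLICATION_ELEMENTS}


# Hypothetical full description; host and factoryEndpoint are filtered out.
abstract = abstract_job_description({
    "executable": "/bin/date",
    "count": 4,
    "host": "cluster.example.org",
    "factoryEndpoint": "https://cluster.example.org/factory",
})
```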
9. Guarantee of Resource Binding
- Deterministic guarantee
- Batch with advance reservation
- Probabilistic guarantee
- Predicts resource availability for batch-scheduled resources
- Models resource allocation of individual resource providers as a random variable with a binomial distribution
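A small worked example may help show why a probabilistic guarantee can still be strong. The sketch assumes each provider either delivers the requested resources or not, independently of the others; independence is an assumption of this example, not a claim from the slides. The probabilities 0.90, 0.85, and 0.70 are the values that appear on a later slide for sdsc, ncsa, and iu.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def prob_any(probs: Sequence[float]) -> float:
    """Probability that at least one provider delivers."""
    return 1.0 - prod(1.0 - p for p in probs)


def prob_at_least(k: int, probs: Sequence[float]) -> float:
    """Probability that at least k providers deliver,
    summing over every possible set of successful providers."""
    n = len(probs)
    total = 0.0
    for m in range(k, n + 1):
        for winners in combinations(range(n), m):
            chosen = set(winners)
            total += prod(p if i in chosen else 1.0 - p
                          for i, p in enumerate(probs))
    return total


site_probs = [0.90, 0.85, 0.70]  # sdsc, ncsa, iu
any_site = prob_any(site_probs)            # 1 - 0.10 * 0.15 * 0.30
all_sites = prob_at_least(3, site_probs)   # 0.90 * 0.85 * 0.70
```

Under these assumptions, requesting from all three sites raises the chance of obtaining at least one allocation to about 0.9955, well above the best single site.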
10. Resource Actualization
[Diagram: resource actualization sequence]
Components: vgES; resource actualization engine; application launcher; cluster; PBS; LSF; Condor; GRAM; WS-GRAM.
Numbered steps: 1 bind, 2 acquire, 3 check, 4 notification, 5 launch, 6 submit, 7 update, 8 notification.
11. [Diagram: slot binding timeline]
Request: vgdl ClusterOf (nd) 4 <100000_at_100>; nd Processor P4
Slot states: described, discovered, bound, active, inactive, unavailable
Actions: select, bind, submit, (activate), (actualize), (cleanup)
Site lists shown: sdsc (p=0.90), ncsa (p=0.85), iu (p=0.70), ada (p=0.65); sdsc (p=0.90), ncsa (p=0.85), iu (p=0.70); sdsc, ncsa, iu
Time axis: 9:00 A.M., 9:10 A.M., 9:55 A.M., 10:00 A.M., 11:00 A.M.
Markers: P1, P2, P3, P4
12. Year-5 Plans
- Extended implementation of slot allocation
- Support of various resource managers (e.g., PBS, LSF, LoadLeveler, Condor)
- Personalization over multiple clusters
- Consistent resource slot provisioning
- Provides efficient resource scheduling techniques
- Tradeoffs between quality, availability, and cost
- Slot optimization
- Optimization of inter/intra slot allocation
- Deploying to as many TeraGrid sites as possible
13. Consistent Resource Provisioning
- Motivation
- Can we get a slot for a specified time period in practice?
- Limitation in both number of processors and wall time
- Goals
- Exploring offline/online algorithms
- Presents system sub-slot schedules to users
14. Resource Slot Provisioning Problem
[Diagram: a LooseBag slot requested for 2 days. Axes: slot size (limit: MaxCPU) and slot duration (limit: MaxWallTime). Labels: S-slot, U-slot. Annotation: "This slot will never be satisfied!"]
15. Resource Slot Provisioning Problem
[Diagram: LooseBag for 2 days; one U-slot shown together with six S-slots]
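One way to picture the provisioning problem is a naive tiling: a request that exceeds a site's processor or wall-time limit is cut into pieces that each fit within both limits. The sketch below is only an illustration under that assumption. It is not the offline/online algorithm the project plans to explore, and the `Slot` type, field names, and limits are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    start_hour: float
    duration_hours: float
    cpus: int


def tile_slot(request: Slot, max_cpus: int, max_wall_hours: float) -> List[Slot]:
    """Split a request into pieces no larger than max_cpus processors
    and no longer than max_wall_hours each."""
    if max_cpus <= 0 or max_wall_hours <= 0:
        raise ValueError("limits must be positive")
    pieces: List[Slot] = []
    start = request.start_hour
    remaining_time = request.duration_hours
    while remaining_time > 0:
        chunk = min(remaining_time, max_wall_hours)
        remaining_cpus = request.cpus
        while remaining_cpus > 0:
            width = min(remaining_cpus, max_cpus)
            pieces.append(Slot(start, chunk, width))
            remaining_cpus -= width
        start += chunk
        remaining_time -= chunk
    return pieces


# Hypothetical request: 8 processors for 2 days (48 hours) on a site that
# allows at most 4 processors and 16 hours per job.
pieces = tile_slot(Slot(0, 48, 8), max_cpus=4, max_wall_hours=16)
```

A real provisioner would also have to make sure consecutive pieces actually start on time, which is where queue-wait prediction and proactive submission come in.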
16. VGES Support for the SC07 Demo
- Resource equivalence
- Enables flexible resource discovery
- Provides a more reliable resource discovery service
- New semantics of binding
- Separates slot binding from actual resource allocations
- Enables the LEAD workflow manager to exploit parallelism
- Probabilistic guarantee of binding
- Provides high slot availability virtually
- Minimizes resource allocation failures due to late resource arrivals
17. VGES Support for the SC07 Demo
- Support of various resource managers
- Plugs in LoadLeveler (Bigred) and LSF (Tungsten)
- Covers most resource managers in TeraGrid
- Callback mechanisms for resource arrivals
- Provides asynchronous event notification
- Lessens the burdens on both the client and the server
- Consistent resource provision for LooseBag slots
- Provisions resources proactively
- Realizes slots in practice
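The callback idea for resource arrivals can be sketched as a small registry: a client registers a function once and is notified when the resource shows up, instead of polling the server. The class, method names, and slot identifiers below are hypothetical and do not reflect the actual vgES interface.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

ArrivalCallback = Callable[[str, str], None]


class ArrivalNotifier:
    """Delivers resource-arrival events to registered callbacks."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[ArrivalCallback]] = defaultdict(list)

    def on_arrival(self, slot_id: str, callback: ArrivalCallback) -> None:
        """Register a callback to run when resources for slot_id arrive."""
        self._callbacks[slot_id].append(callback)

    def resource_arrived(self, slot_id: str, site: str) -> int:
        """Invoke and clear the callbacks for slot_id; return how many ran."""
        callbacks = self._callbacks.pop(slot_id, [])
        for callback in callbacks:
            callback(slot_id, site)
        return len(callbacks)


# Hypothetical usage: the client records arrivals rather than polling.
arrivals: List[Tuple[str, str]] = []
notifier = ArrivalNotifier()
notifier.on_arrival("slot-1", lambda slot, site: arrivals.append((slot, site)))
delivered = notifier.resource_arrived("slot-1", "sdsc")
```

Because the server pushes a single event per arrival, neither side spends effort on repeated status queries.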