Title: Scheduling
1. Scheduling and Resource Management Working Group
- Jennifer Schopf and Bill Nitzberg, co-chairs
- www.cs.nwu.edu/jms/sched-wg/
- sched-wg_at_gridforum.org
- Grid Forum 4 Meeting
- Microsoft / Redmond, WA
- July 10-12, 2000
2. Scheduling Working Group: Objectives and Progress
3. High-level Overview: Solve Grid Resource Management
- Who? -- Developers
- What? -- Agreements / standards
- Capabilities, general protocols, APIs
- Why? -- Interoperability
- Reserving, allocating, using resources
- Managing resources (owner's point of view)
- Support co-scheduling diverse resources
- Enable "better" "use of" resources
4. Charter
- Look at what is done today, gather requirements
- ...refining protocols, interactions, etc.
- ...work to standardize APIs
- Current Focus Areas
- Advance reservations
- Super scheduling
- Token definition
- information: proper nouns, semantics, representation
5. Progress
- Metascheduler Query and Reservation Interface, SchedRFC2, Snell and Clement
- Describing Grid Allocations, SchedRFC3, Karl Czajkowski
- Advance Reservation API, SchedRFC4, Roy and Sander
- Metacomputing Resource Reservations, SchedRFC5, Chapin and Snell
- Security Requirements of the Scheduling Working Group, SchedRFC6, Jackson
6. Scheduling Working Group: A Brief History
7. Advance Reservation and Co-Scheduling Workshop, May 1999
- Defined reservation
- Resource, start, end, duration
- Enumerated desired capabilities
- de-coupled from job submission
- unique printable reservation ID
- query/response - returns list of available slots
- hard and soft reservations
- Enumerated harder stuff to put off til later, e.g., guarantee, cost model
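The reservation defined at the workshop can be pictured as a small record plus a slot query. The sketch below is illustrative only: the names `Reservation` and `available_slots`, the field names, and the use of plain integer ticks for time are assumptions made here, not part of any SchedRFC. Soft reservations are treated as non-blocking, which is one possible reading among several.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)  # hypothetical source of unique, printable reservation IDs


@dataclass(frozen=True)
class Reservation:
    # Note: no job field; the reservation is de-coupled from job submission.
    resource: str
    start: int          # start time, in arbitrary integer ticks
    duration: int       # length of the reservation, same ticks
    hard: bool = True   # hard vs. soft reservation
    res_id: str = field(default_factory=lambda: f"res-{next(_ids)}")

    @property
    def end(self) -> int:
        return self.start + self.duration


def available_slots(booked, resource, duration, horizon):
    """Return (start, end) gaps on `resource` within [0, horizon) that can
    hold `duration` ticks. Only hard reservations block a slot here."""
    busy = sorted(
        (r.start, r.end) for r in booked if r.resource == resource and r.hard
    )
    slots, cursor = [], 0
    for start, end in busy:
        if start - cursor >= duration:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    if horizon - cursor >= duration:
        slots.append((cursor, horizon))
    return slots
```

A query for a 5-tick slot against one hard reservation at ticks 10-15 would return the gaps on either side of it.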
8. Grid Forum 1 (NASA Ames), June 1999
- Initial Charter
- Solve Grid Resource Management
- Three focus areas
- Advance reservations
- Super scheduling
- Resource specification (semantics, tokens)
9. Grid Forum 2 (Northwestern), October 1999
- Refined charter
- Requested
- lists of tokens from different groups
- architecture pictures of existing systems
- Discussed "What is X?"
- e.g., job, scheduler
10. Grid Forum 3 (UCSD), March 2000
- Adopted charter, refocused
- Decided not to work on architecture
- Developed Super-scheduler Model (10 steps)
- Gave overviews of advance reservation systems (GARA, Maui, PBS, LSF)
- Commitments to draft several SchedRFCs
11. Grid Forum 4 (Microsoft), July 2000: Agenda
- Mon 11:30-12:30p Introduction
- Mon 3:00-4:30p Query Interface
- Tue 9:00-10:30a Resource Acquisition Steps
- Tue 10:30-12:00p Security Requirements
- Tue 1:30-3:00p Advance Reservation API
- Tue 3:30-5:00p Scheduling Information
12. Process
- Sched RFC Overview (
- focus on understanding the RFC rather than correcting it
- Gather discussion items during/after overview
- Prioritize and discuss each item
- Next steps (last 5 minutes)
13. Query Interface
- Goal: Refine interface to a scheduler to answer the question(s)
- When will my job start?
- What times can a reservation be guaranteed?
- SchedRFC 2, sections 1-3, pink book p. 1-2
14. Resource Acquisition Steps
- Goal: List the basic steps and capabilities involved in resource reservation, acquisition, and use
- SchedRFC 3 / Resource Acquisition Steps
- Sections 1-2, pink book, p. 6-9
- SchedRFC 5 / Reservation Operations
- First half, pink book, p. 27-29
15. Security Requirements
- Goal: Ensure interoperability between GF security standards and GF scheduling standards
- SchedRFC 6 (not 4), pink book p. 33-35
- Email from Keith Jackson, June 26
16. Advance Reservation API
- Goal: Refine API for advance reservations
- SchedRFC 2 / Res. Interface
- sections 4-5, pink book p. 3-5
- SchedRFC 4 / GARA
- create, modify, status, bind, callbacks, cancel
- SchedRFC 7 / CCS
17. Scheduling Information
- Goal: Standardize scheduling information -- the tokens or labels, semantics, and their representation
- Representing Compute Resources
- GIS WG paper, pink book p. 36-47
- SchedRFC 3 / Allocation Properties
- sections 3-4, pink book p. 9-11
- SchedRFC 5 / Maui
- second half, pink book p. 30-32
18. Query Interface: Notes
- SchedRFC 2, sections 1-3, pink book p. 1-2
19. List of Issues to Discuss
- Earliest time?
- Time as just another resource
- UserID - defer til People
- When do you check against allocations?
- Security -- who's allowed to ask this?
- Ability to extend query (and list of common stuff)
- Reservations for compute only or include bandwidth?
- Return a unique reservation key at the time of Query?
- Optional stuff
- How long is return info good for? unique reservation key
- How about adding quality of estimate
20. QUERY(UserID, TimeNeeded, ResourceList) --> (StartTime, OtherInfo)
- Can we merge these two interfaces - talk about them 1 at a time
- Is it possible for TimeNeeded to be just another resource - move time into ResourceList?
- 30 seconds of cpu over 2 minutes wall clock (RPS can do this)
- 200 kb/s for 1 hour
- Most of the time, a user wants the set of all specified resources over a fixed allocation period (duration)
- 30 seconds of cpu over 2 minutes of wall clock is capacity, and is expressed in ResourceList
- TimeNeeded - AllocationPeriod -- Wall clock time
- could it be start / end / duration?
- What about a full calendaring mechanism?
- UserID - defer til People
21. QUERY(UserID, Duration, ResourceList, OtherInfo) returns --> (StartTime, OtherInfo)
- UserID - defer til People discussion
- Duration is wall clock time
- ResourceList - defer til Sched Info discussion
- StartTime is a wall clock absolute time
- issue of synchronization of clocks
- OtherInfo
- extensible list of other stuff
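To make the revised QUERY signature concrete, here is a minimal sketch. It assumes a toy scheduler whose only state is a `free_at` table giving the time each resource becomes free. The `free_at` and `now` arguments, the `QueryResult` type, and the keys placed in OtherInfo are all hypothetical. The notes defer UserID and ResourceList semantics, so this sketch accepts them without interpreting them.

```python
from typing import Any, Dict, List, NamedTuple, Optional


class QueryResult(NamedTuple):
    start_time: Optional[float]   # absolute wall clock time (epoch seconds)
    other_info: Dict[str, Any]    # extensible list of other stuff


def query(user_id: str, duration: float, resource_list: List[str],
          other_info: Optional[Dict[str, Any]] = None,
          *, free_at: Dict[str, float], now: float) -> QueryResult:
    """Toy QUERY: earliest time at which every requested resource is free.

    `free_at` maps a resource name to the time it becomes free. It stands in
    for real scheduler state and is purely an assumption of this sketch.
    `user_id` and `other_info` are accepted but not interpreted.
    """
    unknown = [r for r in resource_list if r not in free_at]
    if unknown:
        return QueryResult(None, {"error": f"unknown resources: {unknown}"})
    start = max([now] + [free_at[r] for r in resource_list])
    # Duration is wall clock time, so the matching end time is start + duration.
    return QueryResult(start, {"end_time": start + duration})
```

Returning OtherInfo as an open dictionary keeps the reply extensible, so items such as a quality-of-estimate value could be added later without changing the signature.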
22. Next Steps
- Quinn will draft next version
- May be value in the query in specifying whether you are interested in a future reservation or not.
- Dave Jackson will draft a list of OtherInfo
23. Resource Acquisition Steps: Notes
- Super Scheduler Model, GF3
- SchedRFC 3, Sections 1-2, pink book, p. 6-9
- SchedRFC 5, First half, pink book, p. 27-29
24. Goal of 10 Steps
- Conceptual only, consistent language and terminology (grounded in current practices)
- Express core of current practices
- Originally from user's point of view
25. Super-scheduler Model (from GF3)
- Resource discovery - get list of potential systems S
- 1. where authorized, 2. min. application requirements, 3. which machines meet min. requirements
- Choose best system s
- 4. Query and gather (for all s in S) which machine is best (e.g., when will J finish on s, how much will J cost on s)
- 5. select best system s
- Run J on s
- 6. (optional) Advance Reservation
- 7. submit J to s
- 8. setup (staging)
- 9. monitor progress (maybe go back to 4.)
- 10. find out J is done, 11. clean up
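Steps 1 through 5 of the model amount to a filter followed by a selection. The sketch below shows that shape only. The function name `super_schedule` and the three callback hooks are hypothetical, and "best" is reduced to a single estimated finish time, although the slide also mentions cost.

```python
def super_schedule(job, systems, authorized, meets_min, estimate_finish):
    """Steps 1-5 of the model: discover candidate systems, then pick the best.

    All callables are hypothetical hooks supplied by the caller:
      authorized(s)           -> bool    (step 1)
      meets_min(job, s)       -> bool    (steps 2-3)
      estimate_finish(job, s) -> number  (step 4, lower is better)
    Returns the chosen system, or None if no system qualifies.
    """
    # Resource discovery: build the candidate set S.
    candidates = [s for s in systems if authorized(s) and meets_min(job, s)]
    if not candidates:
        return None
    # Query each candidate and select the best system s (step 5).
    return min(candidates, key=lambda s: estimate_finish(job, s))
```

Steps 6 through 11 (reservation, submission, staging, monitoring, cleanup) would follow on the returned system and are left out here.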
26. Stuff to Add/Modify
- Other Operations
- atomically change a reservation
- maintain history of interactions (and state)
- cancel job
- dynamically modify job
- Monitor Progress
- Async. Notification (e.g., callbacks)
- Have to handle legacy systems (that might not let you do anything besides poll)
- Handle when the scheduler cancels a job
- Resource Discovery
- may need to be expanded
- Select best system
- Scheduler state (e.g., usage to date, progress)
- Think about an xyz object which exists through
states 1-11
27. Next Steps
- Jennifer Schopf volunteered to be primary contact for merging this all into a document (an RFC)
- 10 Steps might be the master document that points to all the other RFCs
28. Security Requirements: Notes
- SchedRFC 6 (not 4), pink book p. 33-35
- Email from Keith Jackson, June 26
29. Notes
- Credential refresh
- credential management
- Where should the content of this document reside?
- Security WG is building a "usage scenarios and their security requirements" document
- Scheduler will have to both
- act as a principal
- act on behalf of a user
- Query
- "When will this job run?" gives a lot of policy information
- Where a request is coming from may also affect whether the request is allowed
- part of the authorization question
30. Need for limited delegation
- Goal: ability to perform limited delegation in schedulers
- Security WG would like a well known collection of the services scheduling provides, e.g.,
- querying when my job will start
- starting a job
- Delegating for querying when my job will start
- may have to have an interaction
- User: I want to do the query, what credentials do I need to give you
- Scheduler: I need access to your application, file systems you have access to,
- User then generates only those credentials necessary for that particular service
- Also need to restrict monitoring and canceling jobs
- Delegating to the scheduler and to the application that will run
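The idea of generating only the credentials needed for one service can be illustrated with a credential that carries an explicit list of permitted operations. This is a conceptual sketch only: it involves no cryptography, does not model how Globus or Kerberos work, and every name in it (`DelegatedCredential`, `delegate`, `authorize`, the operation strings) is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DelegatedCredential:
    user: str
    allowed_ops: FrozenSet[str]   # e.g. {"query_start_time"}


def delegate(user: str, *ops: str) -> DelegatedCredential:
    """User side: generate a credential for only the named operations."""
    return DelegatedCredential(user, frozenset(ops))


def authorize(cred: DelegatedCredential, op: str) -> bool:
    """Scheduler side: allow `op` only if it was explicitly delegated."""
    return op in cred.allowed_ops
```

A credential delegated for querying start times would then be refused for monitoring or canceling a job, which is the restriction the notes ask for.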
31. Limited Delegation, cont.
- Examples of limited delegation
- Globus has on/off bit
- Kerberos has IP address in ticket
- Microsoft puts some auth. info into Kerberos ticket
- Limited delegation is a long-term future need
- if the scheduling group is far enough along, we might be able to start looking at achieving it
- since schedulers probably won't be trusted components, we'll eventually need limited delegation
- keep this in mind when designing schedulers
32. What should/can we attack now?
- We need to make unlimited delegation work
- Treat the meta-scheduler as a trusted component
- Scheduling should be asking security
- How do I check if I'm allowed to do X?
- There are some examples of how it's done, but nothing agreed to right now
- GAA is emerging standard for authorization
- GSS-API for authentication
- Formalize scheduling operations we want to apply authorization policy to
- at what point do schedulers ask security questions
33. Next Steps
- Formalize scheduling operations we want to apply authorization policy to
- At what point do schedulers ask security questions
- This is the list of services
- Start with a minimal set we need
- This list will get integrated into the Security requirements doc
- Primary contact author is Keith Jackson
34. Advance Reservation API: Notes
- SchedRFC 2, sections 4-5, pink book p. 3-5
- SchedRFC 4 / GARA
- SchedRFC 7 / CCS
35. Notes
- Ability to specify complex relationships between
resources (e.g., walltime - request for more than
- Implicit credentials in GARA
- Reservations based on (duration, end point)
36. MAKE_RESERVATION(ManagerContact, UserIDSpec, TimeSpec, ResourceList, OtherInfo) --> (SuccessOrError, ResKey, OtherInfo)
- SuccessOrError
- SUCCESS means it worked
- anything else is an implementation dependent error code
- Maybe standardization of errors should be a general Grid Forum activity (? Perf. Monitoring)
- ODBC?
- UserIDSpec -- should credentials be a parameter?
- Would allow you to have a wallet of credentials
- Helps with thread safety
- Parameters could include who has access to the reservation, who is making the reservation, ...
- TimeSpec
- Quinn will make a proposal to cover the basics and make it extensible
- StartTime, EndTime, Duration (choose 1, 2, or 3)
37. CancelReservation(ManagerContact, UserIDSpec, ResKey, OtherInfo) --> (SuccessOrError, OtherInfo)
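The MAKE_RESERVATION and CancelReservation signatures above can be exercised with a small in-memory sketch. Everything beyond the parameter names is an assumption: the `ReservationManager` class stands in for whatever ManagerContact addresses, the error strings are invented (the notes say only that non-SUCCESS codes are implementation dependent), and TimeSpec is read as "give exactly two of StartTime, EndTime, Duration", which is one interpretation of the open TimeSpec item.

```python
import uuid

SUCCESS = "SUCCESS"


def resolve_timespec(start=None, end=None, duration=None):
    """Fill in the missing one of StartTime, EndTime, Duration.

    Assumes exactly two of the three are given; the notes leave the
    actual TimeSpec rules to a later proposal.
    """
    given = sum(v is not None for v in (start, end, duration))
    if given != 2:
        raise ValueError("give exactly two of start, end, duration")
    if start is None:
        start = end - duration
    elif end is None:
        end = start + duration
    else:
        duration = end - start
    if duration <= 0:
        raise ValueError("duration must be positive")
    return start, end, duration


class ReservationManager:
    """In-memory stand-in for whatever ManagerContact would address."""

    def __init__(self):
        self._reservations = {}

    def make_reservation(self, user_id_spec, time_spec, resource_list,
                         other_info=None):
        try:
            start, end, _ = resolve_timespec(**time_spec)
        except (ValueError, TypeError) as exc:
            return "ERR_BAD_TIMESPEC", None, {"detail": str(exc)}
        for r in self._reservations.values():
            overlap = start < r["end"] and r["start"] < end
            if overlap and set(resource_list) & set(r["resources"]):
                return "ERR_CONFLICT", None, {"conflicts_with": r["key"]}
        key = uuid.uuid4().hex  # unique, printable reservation key
        self._reservations[key] = {
            "key": key, "owner": user_id_spec, "start": start, "end": end,
            "resources": list(resource_list),
        }
        return SUCCESS, key, {}

    def cancel_reservation(self, user_id_spec, res_key, other_info=None):
        r = self._reservations.get(res_key)
        if r is None:
            return "ERR_NO_SUCH_RESERVATION", {}
        if r["owner"] != user_id_spec:
            return "ERR_NOT_OWNER", {}
        del self._reservations[res_key]
        return SUCCESS, {}
```

Passing the user identity on every call, rather than holding it in the manager, follows the note that making credentials a parameter helps with thread safety.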
38. Next Steps
- Quinn Snell, Alain Roy, and Joern Gehring will merge and propose a (single) revised API
- Primary contact is Quinn Snell, by HPDC
- Return value may say "I can make my part of the reservation, but you have to also check with Joe"
39. Scheduling Information
- Representing Compute Resources
- GIS WG paper, pink book p. 36-47
- SchedRFC 3, sections 3-4, pink book p. 9-11
- SchedRFC 5, second half, pink book p. 30-32
40. Notes
- Proper nouns, vocabulary
- e.g., wall time, SPECint92
- name spaces are useful
- Semantics
- (context dependent semantics)
- Schema
- Representation
- Language
- need for logical operations
41. Notes
- Need to have both a Requestor and Consumer identified in a scheduling request
- What subset of the properties are required
- It would be nice if two identical resources were naturally represented the same
- Given identical hardware, you may want to provide different views, because the management policies of the systems are different
- Describing resources, allocations, requirements are different; they may need different vocabulary, language,
- Include units as a parameter, e.g., cpu speed 50 MFLOPS rather than having MFLOPS implicit
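Carrying units as an explicit parameter lets a request and an offer be compared even when they are stated differently. The sketch below is one way to picture that. The `Quantity` type, the `satisfies` function, and the small conversion table are all hypothetical and cover only a few units for illustration.

```python
from typing import NamedTuple


class Quantity(NamedTuple):
    value: float
    unit: str   # the unit travels with the value instead of being implicit


# Hypothetical conversion table: (base unit, factor to that base unit).
_TO_BASE = {
    "MFLOPS": ("FLOPS", 1e6),
    "GFLOPS": ("FLOPS", 1e9),
    "MB": ("bytes", 1e6),
    "GB": ("bytes", 1e9),
}


def satisfies(offered: Quantity, required: Quantity) -> bool:
    """True if `offered` is at least `required`, after unit conversion."""
    off_base, off_factor = _TO_BASE[offered.unit]
    req_base, req_factor = _TO_BASE[required.unit]
    if off_base != req_base:
        raise ValueError(f"cannot compare {offered.unit} with {required.unit}")
    return offered.value * off_factor >= required.value * req_factor
```

With the unit attached, a cpu speed of 50 MFLOPS can be checked against a requirement stated in GFLOPS, and a mismatch such as MFLOPS against MB is rejected rather than silently compared.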
42. Next Steps
- New RFC
- application requirements description
- resource description
- allocation description
- Primary contact author Karl Cz ( 3 months)
- Also Gregor von Laszewski, Joern Gehring
43. GF4 Sched Summary
- Change Sched RFC to Sched Working Document
- Revise working document drafts
- Query Interface
- Resource Acquisition Steps
- Security Requirements
- Advance Reservation API
- Scheduling Information
- New WD: "Run a job" API