Title: Scheduling
1 Scheduling and Resource Management Working Group
- Jennifer Schopf, Bill Nitzberg, Uwe Schwiegelshohn, co-chairs
- www.cs.nwu.edu/jms/sched-wg/
- sched-wg_at_gridforum.org
- Global Grid Forum 1
- Amsterdam, The Netherlands
- March, 2001
2 High-level Overview: Solve Grid Resource Management
- Who? -- Developers
- What? -- Agreements / standards
- Capabilities, general protocols, APIs
- Why? -- Interoperability
- Reserving, allocating, using resources
- Managing resources (owners pt-of-view)
- Support co-scheduling diverse resources
- Enable "better" "use of" resources
3 Charter
- Look at what is done today, gather requirements
- ...refining protocols, interactions, etc.
- ...work to standardize APIs
4 Process
- Review working document (< 5 minutes)
  - Focus on understanding the document (revisions) rather than correcting it
- Gather discussion items during/after review
- Prioritize and discuss each item
- Next steps (last 5 minutes)
- Revised documents posted to web site within 2 weeks.
5 Plans for GGF-1
- Monday 2-3:30
  - SuperScheduling 10 steps (30 mins)
  - Token definition (60 mins)
    - information: proper nouns, semantics, representation
- Monday 4-5:30
  - Advance reservations
- Tuesday 4-5:40
  - New topics - give J, B or U a heads up, bring your 5 min. description with you, to be followed by 5 min discussion
- Wednesday 11-12:30
  - Dictionary Group BOF (our tokens are part of this)
6 Since GF-5
- Teleconferences (every other week, starting in Jan.)
- Mailing list
  - sched-wg_at_gridforum.org
  - signup by sending mail to majordomo_at_gridforum.org
- Web site
  - http://www.gridforum.org
  - http://www.cs.nwu.edu/jms/sched-wg
7 Ten Steps for Superscheduling
8 Ten Steps for Superscheduling
- Goal: List the basic steps and capabilities involved in resource reservation, acquisition, and use
9 GGF1 Notes -- Doc 8
- Suggestion: create web template to help with gathering examples
- Requirements definition
  - perhaps should go before step 1
  - could be broken into 2 parts (requirements for every time it's run vs. requirements for this particular run)
  - not clear this includes dynamic requirements, e.g., "I want 10 cpus on hostA or 20 cpus on hostB" -- are there any requirements that aren't known until run time?
- Document doesn't address dynamic adaptation
  - Uwe, Anand, Francesco Prelz will draft
- Iterative suggestion via Mary Thomas and Keith Jackson
10 Ten Steps Notes, cont.
- Setup and Cleanup
  - Staging has both a setup and a cleanup
  - Doc currently only refers to stage out under cleanup, need to include other stuff (e.g., breaking down an environment)
- "Steps" connotes a temporal ordering, perhaps change the title to "10 Things"
11 Next step for 10 steps
- Small wording changes ("steps" implies ordering)
- Comments from group on flow-picture waiting
- Will be bumped into "done" category when these are agreed on
12 Tokens
- There is general token discussion as part of the dictionary group to take place Weds
- Within scheduling we will also need a group - to be talked about tomorrow
13 Attributes for Communication about Scheduling Instances
14 Purpose of this document
- Purpose of this document: to list some important features of lower level scheduler
- Higher level scheduler may ask - "Here's an attribute - which lower level scheduler has it?"
- Define a set of lower level attributes for matchmaking by higher level scheduler (?)
- Define a common set of attributes that every lower level scheduler should provide
15
- The set of attributes that a lower level scheduler should supply to a higher level scheduler
- capabilities, not interfaces
- capabilities of the scheduler, NOT the resource
- end goal - NOT a classification of schedulers
- attributes will have values - yes, no, number
- list of questions that a higher level scheduler asks of a lower level scheduler
16 Example
- Can I assign a node/resource exclusively using
your lower level scheduler?
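
The example question above can be sketched as an attribute query. This is only an illustration, assuming each lower level scheduler advertises a flat set of attributes whose values are yes/no or a number. The class, the function names, and the attribute names (`exclusive_allocation`, `max_reservable_nodes`) are hypothetical and do not come from the working document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Attribute values are yes/no (bool) or a number, as the notes describe.
AttrValue = Union[bool, int, float]


@dataclass
class LowerLevelScheduler:
    """Hypothetical lower level scheduler advertising its capabilities."""
    name: str
    attributes: Dict[str, AttrValue] = field(default_factory=dict)

    def query(self, attribute: str) -> AttrValue:
        # An attribute that is not advertised is treated as "no".
        return self.attributes.get(attribute, False)


def satisfies(value: AttrValue, wanted: AttrValue) -> bool:
    """Yes/no attributes must match exactly; numeric ones must be at least `wanted`."""
    if isinstance(wanted, bool):
        return isinstance(value, bool) and value == wanted
    # bool is a subclass of int in Python, so exclude it from numeric comparison.
    return not isinstance(value, bool) and value >= wanted


def matchmake(schedulers: List[LowerLevelScheduler],
              requirements: Dict[str, AttrValue]) -> List[str]:
    """Return the names of schedulers whose attributes meet every requirement."""
    return [
        s.name
        for s in schedulers
        if all(satisfies(s.query(attr), wanted)
               for attr, wanted in requirements.items())
    ]


schedulers = [
    LowerLevelScheduler("cluster_a", {"exclusive_allocation": True,
                                      "max_reservable_nodes": 64}),
    LowerLevelScheduler("cluster_b", {"exclusive_allocation": False,
                                      "max_reservable_nodes": 128}),
]
# "Can I assign a node/resource exclusively using your lower level scheduler?"
print(matchmake(schedulers, {"exclusive_allocation": True}))
```

The sketch describes capabilities of the scheduler rather than of the resource, and it is a list of queryable tokens rather than a proposed API.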
17 Agreement
- Document is useful
- Is existing set correct? Are they important? Are the defs correct? Do we need to add more?
18 Is the current list ok?
- Everything on the list is necessary, but it doesn't cover the minimal set
19 Things to add
- BW resv/QoS - do you have the ability to let me reserve the bandwidth?
- Can you tell me when a job will start?
- (What about monitoring or performance prediction?)
- (what about interface to resources)
- Can you have confidence info about resource access?
- Are you authorized to talk to me?
20 This document
- This is a list of tokens, not an API
- One possible next step - turn it into an API
21 Bill Suggests
- Change focus to "I'm about to write a high level scheduler - what are the attributes that a lower level scheduler has (now) that I can take advantage of?"
- (this would address things like how semantics about a resource tie in to these queries to a scheduler)
22 Appendix
- Appendix of terms that should not be addressed in this document, those things with an explicit decision to ignore
- Maybe have a better overview of things to divide up the territory - road map document
23 Next Step
- Add new attributes to document
- Send suggestions to Uwe
  - uwe.schwiegelshohn_at_uni-dortmund.de
- Clarify focus of document
- Telecon to discuss this soon
24 Advance Reservation API
25 Why an API and not a protocol?
- An API is what the C programmer sees
- A protocol is what the network sees
- Not yet advanced enough for a protocol, so we've started by talking about an API
26
- Two-phase thing - desire to have soft vs hard resv
- Soft resv
  - airline reservation held for 24 hours
  - fancy restaurants don't let you reserve without calling back, must have 2 phase commit
  - policy decision
- Current 2 phase commit: user specifies when commit must be made, but this may be specified by scheduler as well
- Maybe claim call should return yes/no/come back to me at this time
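
A rough sketch of the two-phase idea discussed above: a soft reservation lapses unless it is committed by a deadline, and the claim call answers yes, no, or "come back at this time". All names here (`Reservation`, `commit`, `claim`, `ClaimStatus`) are hypothetical and are not taken from the Advance Reservation API document. Times are plain numbers in seconds so the example stays deterministic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClaimStatus(Enum):
    YES = "yes"
    NO = "no"
    RETRY_AT = "come back at this time"


@dataclass
class ClaimResult:
    status: ClaimStatus
    retry_at: Optional[float] = None  # only set when status is RETRY_AT


class Reservation:
    """Hypothetical soft reservation that becomes hard once committed."""

    def __init__(self, start: float, commit_by: float):
        self.start = start          # when the reserved resource becomes usable
        self.commit_by = commit_by  # deadline set by the user or the scheduler
        self.committed = False

    def commit(self, now: float) -> bool:
        # Phase 2: confirm the soft reservation before the deadline passes.
        if now <= self.commit_by:
            self.committed = True
        return self.committed

    def claim(self, now: float) -> ClaimResult:
        if not self.committed:
            # Never confirmed (or confirmed too late): nothing to claim.
            return ClaimResult(ClaimStatus.NO)
        if now < self.start:
            # Confirmed, but the reserved window has not opened yet.
            return ClaimResult(ClaimStatus.RETRY_AT, retry_at=self.start)
        return ClaimResult(ClaimStatus.YES)


resv = Reservation(start=100.0, commit_by=50.0)
resv.commit(now=40.0)
print(resv.claim(now=60.0))
```

Whether the commit deadline comes from the user or from the scheduler is a policy decision. In this sketch it is simply a constructor argument, so either side could supply it.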
27
- How probable is need for fancy restaurant reservation?
- Since on the fly we think we can extend the API to handle this situation, this comment taken back...
28 Terminology
- What about changing terminology - some confusion between bind, claim, commit, confirm
- Perhaps add a sub-section for common definitions
- Maybe define them as X1, X2 and X3 instead of using common terms that carry baggage
29 Do we need a 2-phase commit call in this API (for a single resource)
- Globus project says they need a keep alive feature
30 Evaluation of completeness
- Do you have scenarios to use to evaluate this API?
  - we want to support co-scheduling yada yada
- Add 1-3 scenarios (simple) to this document
  - Reservations on 4 CPUs on 2 machines
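
One way the scenario above might be written down for evaluating the API is as an all-or-nothing co-reservation. This sketch assumes the scenario means 4 CPUs on each of two machines, which the notes do not state, and it ignores time windows entirely. `Machine` and `co_reserve` are hypothetical names, not part of the API under discussion.

```python
from typing import List, Tuple


class Machine:
    """Hypothetical machine with a simple count of free CPUs."""

    def __init__(self, name: str, free_cpus: int):
        self.name = name
        self.free_cpus = free_cpus

    def reserve(self, cpus: int) -> bool:
        # Grant the request only if enough CPUs are still free.
        if cpus > self.free_cpus:
            return False
        self.free_cpus -= cpus
        return True

    def cancel(self, cpus: int) -> None:
        # Give back CPUs from a reservation that is being rolled back.
        self.free_cpus += cpus


def co_reserve(requests: List[Tuple[Machine, int]]) -> bool:
    """Reserve every request, or roll back and reserve nothing."""
    granted: List[Tuple[Machine, int]] = []
    for machine, cpus in requests:
        if machine.reserve(cpus):
            granted.append((machine, cpus))
        else:
            # One machine refused: undo the reservations already granted.
            for done_machine, done_cpus in granted:
                done_machine.cancel(done_cpus)
            return False
    return True


host_a = Machine("hostA", free_cpus=8)
host_b = Machine("hostB", free_cpus=2)
print(co_reserve([(host_a, 4), (host_b, 4)]), host_a.free_cpus, host_b.free_cpus)
```

The rollback step is where a soft reservation or 2-phase commit on each single resource would matter, since it lets the caller hold both machines before confirming either.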
31
- Does this fit other schedulers?
  - Epema says yes for prun (Delft, NL)
  - yes again
32 Is the API generic enough?
- Current match to bw brokers
- What about mass storage systems?
  - there is a disk resource in one impl. of this API
  - what about the need for a specific path name?
  - User asks "I need 8 gig of fast storage", would be nice if this could be returned without just a callback
- Policy field could be used for some of this info perhaps
33 Appendix addition - to be addressed in the future (?)
- A reservation system might be able to revoke a portion of the reservation, and the user should be able to decide which portion gets taken away (?) (return dialog to resolve conflicts)
  - This may be more common with storage
  - This (only?) happens when you overbook
  - Call-back specified more clearly could do this
- The API covers this case although a more convenient addition to the API to cover this might be recommended in the future
34 Semantics Question
- Purpose of an API is to write a general program, independent of resource type, but because of semantics (esp. with bind) this may not make sense, so does it make sense to have resource type as a parameter instead of just having a separate call for a resource type?
- If semantics of bind are diff enough based on rsc type, then maybe they should be separated out into diff calls?
- This is just a naming issue, so let's not deal with it
35
- How do call backs get handled?
  - Different thread/write program to handle interrupts right now (with one implementation)
36
- add info on feedback in response to the bind
  - does that go here or in another API
  - in existing CPU systems this is not part of the adv resv system
- This might make resv useless in some cases if it isn't specified
37 What needs to happen next?
- 1 sentence on API vs Protocol
- Perhaps add a sub-section for common definitions (claim, bind, etc)
- Completeness scenarios
- Specify semantics of calls a bit clearer
- add info on feedback in response to the bind (does that go here or in another API)
- Appendix (ignore now) additions
  - no fancy restaurant resv. thing
  - mass storage example
38 When is it done?
- Make the minor cosmetic changes, and this document is done
- Implementation(s) are implemented
  - Alain and Volker are working on one currently
  - other volunteers?
- We'll check with Charlie for more procedural stuff
44 Ten Steps for Superscheduling Summary
- Goal: List the basic steps and capabilities involved in resource reservation, acquisition, and use
- Next Steps
  - Small wording changes ("steps" implies ordering)
  - Comments from group on flow-picture waiting
  - Will be bumped into "done" category when these are agreed on
45 Attributes for Communication about Scheduling Instances
- Goal: Define a set of attributes that a higher level scheduler can ask of a lower-level scheduler
- Next Steps
  - Clarify focus of document
  - Add new attributes to document
  - Telecon to discuss this soon
46 Advance Reservation API
- Why an API and not a protocol?
  - Not yet advanced enough for a protocol, so we've started by talking about an API
- Next steps
  - Minor editorial adjustments
  - Completeness scenarios
  - Add Appendix ("ignore now" items)
  - Move to "done" before next GGF
47 Advance Reservation Protocol
- Allow advance reservations between systems (NOT an API)
- Proposed by Volker Sander, Keith Jackson, Karl Czajkowski, John Karpovich
48 Today's Meeting - New Topics
49 Tokens
- We need a list of tokens and definitions to coordinate with the dictionary group
  - (BOF tomorrow at 11am)
- This list has been started online already
  - www.sdsc.edu/mthomas/GF/terms/sched-terms.cgi
- We propose drafting a document to move this effort forward
50 Current words
- account, advance reservation, allocation, application, application scheduler, application software, bandwidth, batch, batching, co-allocation, cpu, disk
- dispatch, fair share, job, load, load balancing, machine scheduler, memory, meta-schedulers, node, partitioning, perfect load balance, priority
- process, processor, quality of services, queues, quota, reservation, resource, scheduler, scheduling, staging, user, user account
51 Architecture?
- Why would we want one?
  - To give the discussion a framework
- We punted - low hanging fruit was enough
- Which schedulers support adv resv?
52
- What about annotating 10 steps with other WG interactions?
- Document proposal to be sent to WG
53 Resource models
- Resource management and resource specification language
- might be part of the protocol
- MDS schema
- not just terms, but values/hierarchy
- what information is important
54
- Is there a need for a document to describe what a user expects from a scheduling system?
- Who is the user?
- Will there always be a sw layer between the user and the scheduler?
- Do we need to address user involvement or not?
- What about economic models? - see accounting
55
- Do we need a paper to address user involvement?
  - Karl Czajkowski says - everyone will need to do it, so we probably don't need a paper
- what about doing control flow?
  - is this a sched-wg topic or a user interface issue?
- Who is the user?
- What about delegation?
  - Maybe not yet...
56 Scheduling WG Summary
- Three documents discussed
  - Ten Actions for SuperScheduling, Arch/Framework, J. Schopf
    - basically done, only minor edits suggested
  - Lower Level Scheduling Attributes, Syntax/Language, U. Schwiegelshohn
    - in progress, refocusing, to be discussed in telecon
  - Advance Reservation API, API, A. Roy, V. Sander
    - basically done, only minor edits suggested
- New areas suggested for attention
  - Advance reservation Protocol, Protocol, A. Roy, V. Sander, K. Czajkowski, J. Karpovich
  - Scheduling Dictionary, Syntax/Language, J. Schopf