Title: Scheduling


1
Scheduling and Resource Management Area, GGF-3
  • Jennifer Schopf and Bill Nitzberg, co-directors
  • www.cs.nwu.edu/jms/sched-wg/
  • sched-wg@gridforum.org
  • GGF 3 Meeting
  • Frascati, Italy
  • 7-10 Oct 2001

2
Meeting Summary: Scheduling and Resource Management Area, GGF-3
  • Bill Nitzberg and Jennifer Schopf, co-directors
  • www.cs.nwu.edu/jms/sched-wg/
  • sched-wg@gridforum.org
  • GGF 3 Meeting
  • Frascati, Italy
  • 7-10 Oct 2001

3
Active Groups
  • Ready for review
  • 10 Actions for Superscheduling (in review)
  • Advance Reservation API (in review)
  • Scheduler Attributes WG (13 people)
  • Standardizing "Run my job"
  • BOF Scheduling Command Line Interface (15 people)
  • BOF DRM Application API (20 people)
  • WG Grid Resource Management Protocol (31 people)
  • Others
  • WG Scheduling Dictionary (no meeting)
  • BOF Scheduling Optimization (23 people)

4
WG Scheduler Attributes (13 people)
  • Presented revised document
  • Minor revisions suggested
  • Suggestion to look at Data Grid attributes
    (perhaps as next step)
  • Document is ready to enter the GGF publication
    process

5
Grid Resource Management Protocol WG (31 people)
  • Presented revised document (SchedWD 12.1) focused
    on requirements
  • Good feedback and discussion on the overall space
    of issues
  • Next steps
  • Better grouping capabilities vs. requirements of
    the protocol
  • Include both requirements and non-requirements
  • Identify low hanging fruit capabilities and
    draft protocols for them
  • Perhaps a name change (which is more specific)
  • As the protocol is not intended for
    administrative management of resource managers
    (e.g., defining policy)

6
BOF Distributed Resource Management Application
API (20 people)
  • Proposed charter
  • Standard API for "run my job"
  • Base API on existing work from existing DRM
    systems (Intel, Sun, Veridian, Globus, ...)
  • Timeframe: 9 months
  • Discussion and comments were positive
  • Good focus if it's quick, simple, and based
    on existing systems
  • Should be easily extensible for web services
    (portals)
  • Next steps
  • Finalize charter via sched-wg email list
  • Work via email and telecons to develop 1st draft

7
BOF Scheduling Command Line Interface (15 people)
  • Presented qprep document
  • Standard command line for "Run my job"
  • Discussion
  • User focused
  • Compared with POSIX and Globus approaches
  • Next steps
  • Finalize charter via sched-wg email
  • Expand involvement

8
BOF Scheduling Optimization (23 people)
  • Proposal for a Research Group
  • Enable "better" "use of" resources
  • Issues
  • Metrics/constraints for "better"
  • Algorithms
  • Information gathering
  • Long-term goal: better optimization for real
    schedulers
  • Next steps
  • Discuss charter via sched-wg email

9
Scheduling Area: Objectives and Progress
10
High-level Overview: Solve Grid Resource Management
  • Who? Developers
  • What? -- Agreements / standards
  • Capabilities, general protocols, APIs
  • Why? -- Interoperability
  • Reserving, allocating, using resources
  • Managing resources (owner's point of view)
  • Support co-scheduling of diverse resources
  • Enable "better" "use of" resources

11
Charter
  • Look at what is done today, gather
    requirements
  • ...refining protocols, interactions, etc.
  • ...work to standardize APIs and protocols

12
Process
  • Review working document (< 5 minutes)
  • Focus on understanding the document (revisions)
    rather than correcting it
  • Gather discussion items during/after review
  • Prioritize and discuss each item
  • Next steps (last 5 minutes)
  • Revised documents posted to web site within 2
    weeks.

13
GGF-3 Agenda
  • Monday
  • 12:00-13:30 Grid Resource Management WG
  • Tuesday
  • 10:00-11:00 Scheduling Command Line API BOF
  • 11:30-13:30 Scheduling Optimization BOF
  • 14:30-15:30 Scheduling Attributes WG
  • 16:00-17:30 Distributed Resource Management
    Application API BOF
  • Not being presented here
  • Scheduling Dictionary WG

14
Grid Resource Management Working Group (GRM)
www.cs.northwestern.edu/jms/sched-wg/grm-wg.html
sched-wg@gridforum.org
Interim Chair: Bill Nitzberg, bill@computer.org
  • Grid Resource Management Protocol Requirements
  • Authors: Karl Czajkowski, Volker Sander
  • SchedWD 12.1

15
History -- Grid RM Protocol Requirements
  • Child of Advance Reservation API work
  • Charter expanded from Advance Reservation to
    Resource Management
  • GGF-2 consensus: you basically have to do most of
    a general RM protocol to do Advance Reservations
    anyway
  • Charter focused to start with Requirements

16
Why Protocol? Interoperability
  • Command Line standards
  • API (Application Programmer Interface)
  • Protocol
  • Language/Syntax

17
A Protocol Can Have Multiple APIs (e.g., TCP/IP)
  • TCP/IP APIs include BSD sockets, Winsock, System
    V streams, ...
  • The protocol provides interoperability: programs
    using different APIs can exchange information
  • I don't need to know the remote user's API

[Figure: an application using the WinSock API and an application using the Berkeley Sockets API communicate over the TCP/IP protocol (reliable byte streams)]
Slide courtesy of Globus Tutorial, www.globus.org
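A minimal Python sketch of the point above, not taken from the slides: one side writes with the raw socket calls (sendall), the other reads through the file-object API (makefile().readline()), and they still interoperate because both follow the same wire convention. The newline-terminated UTF-8 message format is an assumption for illustration, and socket.socketpair() stands in for a TCP connection between two hosts.

```python
import socket


def send_with_socket_api(sock: socket.socket, text: str) -> None:
    """Sender: raw socket API, writes one newline-terminated UTF-8 message."""
    sock.sendall(text.encode("utf-8") + b"\n")


def recv_with_file_api(sock: socket.socket) -> str:
    """Receiver: file-object API layered over the same connection."""
    with sock.makefile("rb") as stream:
        return stream.readline().rstrip(b"\n").decode("utf-8")


def demo() -> str:
    # socketpair() returns two connected endpoints; here it stands in for TCP
    a, b = socket.socketpair()
    try:
        send_with_socket_api(a, "run my job")
        return recv_with_file_api(b)
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())  # prints: run my job
```

Either side could be replaced by a different API (or language) as long as it sends and expects the same bytes.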
18
An API Can Have Multiple Protocols (e.g., Message
Passing Interface)
  • MPI provides portability: any correct program
    compiles and runs on a platform
  • Does not provide interoperability: all processes
    must link against the same SDK
  • E.g., MPICH and LAM versions of MPI

Slide courtesy of Globus Tutorial, www.globus.org
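The converse can be sketched the same way. In this hedged Python example, MessageAPI is one API and JsonWire / BinaryWire are hypothetical implementations with different wire formats (they only loosely mimic the MPICH/LAM situation and are not how either library encodes messages): portable_program runs unchanged on either one, but cross_talk shows that a message packed by one implementation cannot be read by the other.

```python
from __future__ import annotations

import json
import struct
from abc import ABC, abstractmethod


class MessageAPI(ABC):
    """One API: programs call pack()/unpack() and never see the wire format."""

    @abstractmethod
    def pack(self, rank: int, payload: str) -> bytes: ...

    @abstractmethod
    def unpack(self, data: bytes) -> tuple[int, str]: ...


class JsonWire(MessageAPI):
    """Hypothetical implementation A: JSON on the wire."""

    def pack(self, rank, payload):
        return json.dumps({"rank": rank, "payload": payload}).encode("utf-8")

    def unpack(self, data):
        obj = json.loads(data)
        return obj["rank"], obj["payload"]


class BinaryWire(MessageAPI):
    """Hypothetical implementation B: 4-byte big-endian rank, then UTF-8 text."""

    def pack(self, rank, payload):
        return struct.pack(">i", rank) + payload.encode("utf-8")

    def unpack(self, data):
        (rank,) = struct.unpack(">i", data[:4])
        return rank, data[4:].decode("utf-8")


def portable_program(api: MessageAPI) -> tuple[int, str]:
    # Same source runs against either implementation: portability
    return api.unpack(api.pack(3, "hello"))


def cross_talk(sender: MessageAPI, receiver: MessageAPI) -> bool:
    # Mixing implementations fails because the wire formats differ
    try:
        return receiver.unpack(sender.pack(3, "hello")) == (3, "hello")
    except (ValueError, KeyError, TypeError, struct.error):
        return False
```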
19
Grid RM: Perhaps a Document Series
  • Grid RM Protocol Requirements
  • Grid RM Protocol Operations
  • Capabilities and semantics
  • Leverage Other Grid Services Standards
  • Transport services
  • Security
  • Language
  • Grid RM Protocol Bindings

20
Requirements (SchedWD 12.1)
  • Control
  • Extensibility
  • Notification
  • Reliability
  • Protocol timers
  • Negotiation
  • Hierarchy
  • Secure messaging
  • Security language
  • Resource language

21
1. Control
  • Remote resources (different administrative domains)
  • Coordination
  • Wording may imply accounting for resources
    (consumption); check on this: are we considering
    this? How does the accounting piece interface here?
  • Consumption: reserving or utilizing? Which is
    meant here?
  • Consumption: negotiation of resources vs.
    negotiation of capabilities

22
Charter question
  • Clarify that this has a client's running-job
    focus, not a management interface; this is
    resource access and assignment
  • "Run my job" from the resource consumer's perspective

23
2. Extensibility
  • Bill says "tastes great, less filling" (of course
    we want it extensible)
  • There will be issues in charging and accounting
    that will be difficult to address
  • For example, a switch in accounting methodology
    from charging per CPU hour to congestion-based
    charging
  • So we might want to be clear whether this is
    excluded from what we mean by extensibility

24
3. Notification
  • Asynchronous as well as synchronous

25
4. Reliability
  • What do we mean by "reliable semantics"?
  • One possible meaning: the default does what you
    think it should
  • Another is that at every point where the protocol
    can fail, the failure is recognized and handled in
    the proper way (i.e., you don't submit a job twice,
    or not at all)
  • "Reliable semantics" is a confusing term;
    "reliability semantics" might be a better way to
    say this (as a first guess)
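One common way to get the second meaning (a job is submitted exactly once even when replies are lost) is a client-chosen request ID that the server uses to deduplicate retries. The Python sketch below assumes that technique; it is not something the requirements document specifies, and ToyResourceManager and submit_with_retry are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical RM endpoint that deduplicates submissions by request ID."""
    jobs: dict = field(default_factory=dict)   # request_id -> job_id
    next_job: int = 1

    def submit(self, request_id: str, script: str) -> str:
        # A retry carrying a known request ID gets the original job ID back
        # instead of creating a second job ("don't submit a job twice").
        if request_id in self.jobs:
            return self.jobs[request_id]
        job_id = f"job-{self.next_job}"
        self.next_job += 1
        self.jobs[request_id] = job_id
        return job_id


def submit_with_retry(rm: ToyResourceManager, script: str, lost_replies: int) -> str:
    """Client side: keep retrying (so the job is not lost) with ONE request ID.

    lost_replies simulates replies that never reach the client.
    """
    request_id = str(uuid.uuid4())
    while True:
        job_id = rm.submit(request_id, script)
        if lost_replies == 0:
            return job_id          # reply arrived
        lost_replies -= 1          # reply lost: retry with the same request ID
```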

26
5. Protocol timers
  • Low-level enough that Bill wants to skip over it

27
6. Negotiation
28
7. Hierarchy
  • Simple version: if you end up stacking RMs on
    top of each other, it would be nice if they all
    talked the same protocol
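A hedged Python sketch of the "simple version": if every layer exposes the same interface (here a hypothetical free_cpus()/submit() pair), a broker can sit on top of local RMs or on top of other brokers without knowing which. The interface and the forwarding policy (send to the child with the most free CPUs) are assumptions for illustration only.

```python
from __future__ import annotations

from typing import Protocol


class ResourceManager(Protocol):
    """Hypothetical common interface that every RM in the stack speaks."""

    def free_cpus(self) -> int: ...
    def submit(self, cpus: int) -> str: ...


class LocalRM:
    def __init__(self, name: str, cpus: int) -> None:
        self.name, self.cpus = name, cpus

    def free_cpus(self) -> int:
        return self.cpus

    def submit(self, cpus: int) -> str:
        if cpus > self.cpus:
            raise RuntimeError(f"{self.name}: not enough CPUs")
        self.cpus -= cpus
        return f"{self.name}:{cpus}"


class BrokerRM:
    """An RM stacked on other RMs; it offers the same interface upward."""

    def __init__(self, children: list[ResourceManager]) -> None:
        self.children = children

    def free_cpus(self) -> int:
        # Largest request that any single child could accept
        return max(c.free_cpus() for c in self.children)

    def submit(self, cpus: int) -> str:
        # Forward with the same call to the child with the most free CPUs
        best = max(self.children, key=lambda c: c.free_cpus())
        return best.submit(cpus)
```

For example, BrokerRM([BrokerRM([LocalRM("a", 8), LocalRM("b", 4)]), LocalRM("c", 16)]) treats the inner broker and the local RM identically.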

29
8. Secure messaging
  • How much of this should be a requirement as
    opposed to a suggestion?
  • (Keep-alives in particular)
  • What about data protection?
  • Integrity: can't forge it
  • Protection (confidentiality): can't see it

30
9. Security language
  • Requirement vs supports?
  • Need a broker to be able to delegate
  • What about binding of authorization to a
    collection of resources?
  • Maybe a single capability that is a collection of
    services
  • This may already be under discussion in the Security Area
  • This binding needs to be determined

31
10. Resource language
  • What do we mean by structured language?
  • This needs to be clarified
  • It should not imply a strict hierarchy, for
    example

32
Other topics in the document
  • Monitoring notification
  • Reliable protocol semantics
  • Soft-state/keep-alive support
  • Generalizing resource types
  • Generalizing acquisition modes
  • Service guarantees
  • Access schedule
  • Delayed commitment
  • Dynamic binding

33
Other topics in the document (2)
  • Embedded resource language
  • Communication model

34
  • Authors have been asked to restructure doc
  • Qualities of the protocol
  • Capabilities that the protocol should support
  • Stuff we should work on right away, stuff that
    can wait a bit.

35
Delayed commit - 4.3, page 5
  • Why is this a requirement?
  • This has come up in past meetings
  • Saying the protocol has to be confirmed may be a
    simplifying assumption; this should be clarified
  • Two examples
  • Fancy dinner reservation: confirm it or lose it
  • Airline: you can cancel it, but it is confirmed
    automatically
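The two examples can be contrasted in a small Python sketch. Reservation is a hypothetical record with times as plain numbers: a reservation with a confirm_by deadline must be confirmed before that time or it lapses (the dinner case), while one with no deadline is held until explicitly cancelled (the airline case).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reservation:
    """Hypothetical reservation record; times are plain numbers (e.g. seconds)."""
    confirm_by: float | None        # None: confirmed automatically (airline style)
    confirmed: bool = False
    cancelled: bool = False

    def confirm(self, now: float) -> bool:
        """Dinner style: confirm before the deadline or lose the reservation."""
        if self.cancelled:
            return False
        if self.confirm_by is not None and now > self.confirm_by:
            return False
        self.confirmed = True
        return True

    def cancel(self) -> None:
        """Both styles allow cancellation."""
        self.cancelled = True

    def is_held(self, now: float) -> bool:
        if self.cancelled:
            return False
        if self.confirm_by is None:     # airline style: held until cancelled
            return True
        return self.confirmed or now <= self.confirm_by
```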

36
Delayed commit - 2
  • How does accounting tie into delayed or revocable
    requests?
  • Rephrase: how much do we need to worry about
    accounting issues in this protocol?
  • We probably need to identify when in the protocol
    you can cancel a request
  • Keep in mind that with multi-resource requests,
    partial acquisition is likely
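A minimal Python sketch of avoiding partial acquisition in a single process: take each resource in turn and roll back what was already taken if any part fails. Resource names and the acquire_all helper are hypothetical; across administrative domains the same all-or-nothing effect would need a protocol-level mechanism (for example, the delayed commit discussed above), which this sketch does not model.

```python
from __future__ import annotations


def acquire_all(pools: dict[str, int], request: dict[str, int]) -> bool:
    """Acquire every resource in `request` or none of them.

    `pools` maps a hypothetical resource name to its free units; if any part
    of the request fails, `pools` is restored to its original contents.
    """
    taken: list[tuple[str, int]] = []
    for name, amount in request.items():
        if pools.get(name, 0) < amount:
            # Partial acquisition: give back everything taken so far
            for done_name, done_amount in taken:
                pools[done_name] += done_amount
            return False
        pools[name] -= amount
        taken.append((name, amount))
    return True
```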

37
  • With language a multi-request may be nested
    inside, and many of these issues can be hidden

38
Quality of info discussion
  • It will become important to allow the
    collection of metadata about requests
  • For example, how often a request is granted
    (monitoring info)
  • Quality info, e.g., the airline on-time example,
    the 5-star system for hotels
  • Express this quality in the language?

39
Next Steps
  • Another draft, additional authors(?)
  • sched-wg@gridforum.org for comments
  • New version 2 weeks, telecon shortly thereafter
    (we hope)
  • 1) Which of these are required vs suggested?
    Minimal set?
  • 2) What about use cases or examples?

40
Scheduling command line API (SCLA) BOF
  • Joe Werne: proposed new working group

41
Scheduling Command Line API (SCLA) BOF
  • Proposed New Working Group (SRM Area)
  • Led by J. Werne and J. Schopf
  • Standardizing a command line interface to
    schedulers

42
U.S. DoD Uniform Command-Line Interfaces for Job
Submission and Data Archiving
http://www.pstoolkit.org
  • Joe Werne, Michael Gourlay, Chris Meyer, Chris
    Bizon
  • Colorado Research Associates Division, NorthWest
    Research Associates, Inc.
  • Aram Kevorkian (Chair, Metacomputing Working
    Group), Bill Asbury,
  • Winfried Bernhard, Anthony DelSorbo, Mark Dotson,
    Joseph Robichaux (ASC), Virginia Bedford (ARSC),
    Bradford Blasing (AHPCRC), Dan Duffy, Rebecca
    Fahey, Jeff Hensley, David Sanders (ERDC), Mitch
    Murphy (MHPCC), John Skinner,
  • Ray Sheppard (NAVO), Steve Thompson (ARL)
  • DoD High Performance Computing Modernization
    Office (HPCMO)
  • and the users who have provided valuable
    feedback

43
DoD HPCMP Challenge Program
(an incubator for grid-related problem solving)
  • Multi-platform and multi-center compatibility
  • Run-time data transfer and migration off-line
  • Automated and Semi-automated error recovery
  • Automated job preparation and submission
  • Remote post-processing and visualization
  • Distance collaboration

44
The Problem
[Figure: Site A, Site B, and Site C, with Code A, Code B, and Code C]
45
The Solution
[Figure: archive is a routine that translates from your code to the native syntax and semantics of Site A, Site B, and Site C, because archive figures that you have better things to do]
46
DoD HPCMO Metacomputing Working Group Initiative
Goals
  • Provide translation tools to all DoD users in the
    form of a uniform command-line interface.
  • Write nimble translation tools that are modular,
    maintainable, and reusable.
  • Shepherd implementation at all major U.S.
    supercomputer centers, not just the DoD centers.

47
DoD HPCMO Metacomputing Working Group Initiative
Methods 1. Open Source
  • Enhance uniformity
  • Permit user implementation and modification
  • Maximize community support
  • Avoid duplicated effort

48
DoD HPCMO Metacomputing Working Group Initiative
Methods 2. Perl
  • Advanced text-processing capability
  • Ample support from open-source community
  • POD: man pages, HTML, LaTeX, FrameMaker

www.perldoc.org
49
Tools for Uniform SuperComputing (TUSC) qprep
qprep simplifies and unifies job submission to
queues. qprep is not a replacement for existing
queuing systems (e.g., PBS, NQS); rather,
qprep is a translator between them.
50
Tools for Uniform SuperComputing (TUSC) qprep
qprep will work in one of two ways, both of which
are intended to be familiar to users:
  1. Command-line arguments: qprep nodes=512
     walltime=2400 script
  2. Pseudo-comment directives in the script
     preamble: #PSTQ nodes=512, #PSTQ walltime=2400
qprep will edit the script, translating the preamble to the
native queuing system. To the extent possible,
our plan is to implement qprep via a translation
table.
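A translation table of this kind might look like the following Python sketch. It assumes the preamble form #PSTQ name=value and maps a few directives to PBS-style lines (#PBS -l nodes=..., #PBS -l walltime=..., #PBS -q, #PBS -N); the table contents are illustrative, qprep itself was planned in Perl, and units such as walltime are passed through unchanged rather than converted.

```python
from __future__ import annotations

import re

# Hypothetical translation table: qprep directive -> PBS-style preamble line.
PBS_TABLE = {
    "nodes": "#PBS -l nodes={}",
    "walltime": "#PBS -l walltime={}",   # value passed through, no unit conversion
    "queue": "#PBS -q {}",
    "jobname": "#PBS -N {}",
}

# Assumed pseudo-comment syntax: "#PSTQ name=value"
DIRECTIVE = re.compile(r"^#PSTQ\s+(\w+)=(\S+)\s*$")


def translate(script: str, table: dict[str, str] = PBS_TABLE) -> str:
    """Rewrite known #PSTQ lines using `table`; leave every other line alone."""
    out = []
    for line in script.splitlines():
        m = DIRECTIVE.match(line)
        if m and m.group(1) in table:
            out.append(table[m.group(1)].format(m.group(2)))
        else:
            out.append(line)
    return "\n".join(out)
```

Supporting another queuing system would, under this design, mean supplying another table rather than changing the translator.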
51
Tools for Uniform SuperComputing (TUSC) qprep
qprep prototype code is currently running on T3E,
O2K, and O3K (plus Compaq) systems at six supercomputer
centers: ERDC, NAVO, ASC, AHPCRC, PSC, SDSC.
The qprep specification for the more general
routine (which currently does not exist) has
grown from:
  1. User feedback
  2. Review of NQS, PBS, LSF, GRD functionality
The qprep specification contains 39 directives for job
environment (4), reporting (5), job control (3),
qprep control (4), limit (11), job dependence
(11), and pass through (1).
52
Tools for Uniform SuperComputing (TUSC) qprep
JOB ENVIRONMENT DIRECTIVES
  • account=account_string
  • username=userlist (user@host,user@host,...)
  • exportvars=y: export shell variables to script
  • shell=shell_name
REPORTING DIRECTIVES
  • stderr=file
  • stdout=file
  • keep=stderr,stdout
  • mail=a,b,e,r,t: abort, begin, end, rerun,
    routed
  • mailto=userlist
53
Tools for Uniform SuperComputing (TUSC) qprep
JOB CONTROL DIRECTIVES
  • queue=name
  • jobname=name
  • checkpoint=(n|s|number): never, at queue shutdown,
    or every (number) minutes
qprep CONTROL DIRECTIVES
  • silent=y: do not write job identifier when
    submitting to queue
  • outscript=outfile: by default, outfile is
    filename.pst
  • submit=y
  • erase=y: if submit=y, erase translated
    script after submission
54
Tools for Uniform SuperComputing (TUSC) qprep
LIMIT DIRECTIVES
  • nodes=number
  • cpuspernode=number
  • walltime=time: maximum walltime for script
  • processcputime=time: maximum CPU time for any
    single process in script
  • totalcputime=time: total CPU time used by all
    processes in script
  • procfilesize=size: maximum total size of files
    for a single process
  • totalfilesize=size: maximum total size of all
    files for all processes
  • tape=mt(abcdefgh)number: maximum number of tape
    drives in the device class
  • nice=number: nice level for script when
    resources are shared
  • processmemory=number: maximum memory used by a
    single process (shared)
  • totalmemory=number: total memory used by all
    processes in script (shared)
55
Tools for Uniform SuperComputing (TUSC) qprep
JOB-DEPENDENCE DIRECTIVES
  • synccount=number: synchronize (number) jobs
    executed by 1st job
  • syncwith=number: synchronize with job in which
    synccount is set
  • after=jobid,jobid,...: run after specified jobs
    have begun
  • afterok=jobid,jobid,...: run after specified jobs
    have ended without errors
  • afternotok=jobid,jobid,...: run after specified
    jobs have ended with errors
  • afterany=jobid,jobid,...: run after specified
    jobs have ended, with or without errors
  • dependson=number: run after (number) "before"
    dependencies are satisfied
  • before=jobid,jobid,...: permit specified jobs
    after current job begins
  • beforeok=jobid,jobid,...: permit specified jobs
    after current job ends without errors
  • beforenotok=jobid,jobid,...: permit specified
    jobs after current job ends with errors
  • beforeany=jobid,jobid,...: permit specified
    jobs after current job ends, with or without errors
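The after* family can be read as simple predicates over the states of the named jobs. The Python sketch below assumes a four-state job model (State is hypothetical, not part of the qprep specification) and covers only after, afterok, afternotok, and afterany.

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE_OK = "done_ok"          # ended without errors
    DONE_ERROR = "done_error"    # ended with errors


def after_satisfied(kind: str, deps: list[State]) -> bool:
    """May a job with an after* dependency on `deps` start now?"""
    ended = {State.DONE_OK, State.DONE_ERROR}
    if kind == "after":          # specified jobs have begun
        return all(s is not State.QUEUED for s in deps)
    if kind == "afterok":        # ended without errors
        return all(s is State.DONE_OK for s in deps)
    if kind == "afternotok":     # ended with errors
        return all(s is State.DONE_ERROR for s in deps)
    if kind == "afterany":       # ended, with or without errors
        return all(s in ended for s in deps)
    raise ValueError(f"unknown dependency kind: {kind}")
```

Under afterok, a dependency that ended with errors means the predicate can never become true; how a real queuing system handles such a job is outside this sketch.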
56
Tools for Uniform SuperComputing (TUSC) qprep
EXTRA DIRECTIVES
  • passthrough=arguments: pass quote-delimited
    commands directly to the underlying queuing system
57
(No Transcript)
58
Tools for Uniform SuperComputing (TUSC) qprep
qprep implementation will follow the lines of
archive:
  • qprep will be written in two layers
  • qprep.pm will be a Perl module containing
    subroutines that are site-independent.
  • local.pm will be a Perl module which contains
    subroutines that depend on the local system.
  • local.pm will intentionally contain as little
    code as possible and ample comment statements to
    facilitate implementation at a new site.

59
Example local.pm (from archive routine)
sub local_put {
    my ($host, $path, $file) = @_;                 # $host unused for a local copy
    my $line   = `/bin/cp -f $file $path 2>&1`;    # capture stdout and stderr
    my $status = $?;                               # exit status of cp
    return ($line, $status);
}
sub directory_exists {
    my ($host, $path) = @_;
    my $direxists = ( -d $path ) ? "true" : "false";
    return ($direxists);
}
sub get_file_size {
    my ($host, $path, $file) = @_;
    my $size = -s "$path/$file";    # size in bytes; false if missing or empty
    return ($size);
}
sub local_migrate {
    my ($host, $path, $file) = @_;
    # Stub: a site with a real archival system would migrate the file here
    print "local_migrate: COMMENT: Doing nothing.\n";
    print "If this were a real archival system, this would migrate your file.\n";
    return (0);
}
60
Learn more at www.pstoolkit.org
61
Learn more at www.pstoolkit.org
62
PST Schedule
Over the next 12 months, tools will be
implemented, released, and refined based on user
input, and the number of centers that support PST
will increase, aided by the open-source
philosophy.
  • May 18, 2001: Initial release of archive man page;
    establish www.pstoolkit.org and email groups
  • June 18, 2001: Advertise PST at DoD HPC UGC
  • July 15, 2001: Initial release of qprep man page
  • October 2001: Implement archive on ERDC, NAVO,
    ASC, ARL, AHPCRC, MHPCC, SDSC, PSC
  • December 2001: qprep version 1.0 release
  • May 2002: Initial release of MD and SET layers
  • October 2002: Full release of TUSC, MD, and SET
    layers
  • ACT and PEP layer codes will be released as
    contributed by users.
63
Notes and Next Steps
  • What about Globusrun?
  • Doesn't allow users to define own
  • Globus may not address the attribute issues
    addressed here
  • All these sites will not be running a Globus
    gatekeeper, so using the Globus code is probably
    the wrong way to go about this

64
What about the POSIX standards?
  • Standard doesn't go far enough (possibly)
  • Written before SMP nodes existed, and thereby
    ambiguous
  • What are the 5-6 things needed that aren't in the
    POSIX standard? Agree on those, and these could
    get added in.
  • However, LSF does not follow the POSIX standard

65
People to coordinate with
  • Fabrizio Pacini (Datamat), fpacini@datamat.it
  • Mike Russell (U of Chicago, Cactus),
    russell@cs.uchicago.edu
  • Jenny Schopf (for a better Globus contact),
    jms@mcs.anl.gov

66
Next Step
  • Write charter, determine if there are people to
    help

67
Scheduling Optimization BOF
  • Vincenzo Di Martino and Marco Mililotti
  • Proposed New Research Group

68
BOF Scheduling Optimization
  • Proposed Research group on scheduling
    Optimization techniques
  • SRM-OPT: Scheduling and Resource Management Area
  • (area chairs Bill Nitzberg, Jennifer Schopf)
  • Discussion coordinators
    Vincenzo Di Martino, Marco Mililotti

69
BOF Scheduling Optimization
  • BOF goal
  • To gather interest and requirements, and
  • to start a Research Group on scheduling
    optimization, open to anyone in this room
  • Please specify your level of future involvement
    in the R.G. in the BOF participant list

70
Tentative program (subject to change on the
fly)
  • 15 min: discussion on the meaning of scheduling
    optimization
  • 30 min: discussion on technicalities to obtain
    optimal scheduling
  • 15 min: discussion of interaction with the other
    WGs and RGs, and cross-fertilization
  • 30 min: discussion of Research Group milestones
    and organization; active members and interested
    members
  • 30 min: GGAS experience presentation

71
15 min: What is the meaning of scheduling
optimization?
  • The art of running as many jobs as possible with
    the minimum usage of resources
  • How to avoid resource request conflicts between
    jobs
  • How to maximize the total Grid computing
    throughput without penalizing the single
    job/user
  • To negotiate different cost/performance levels
    with predictable users
  • To keep the geographical area network load to a
    minimum

72
Scheduling: New Techniques and Practice
Evolutionary Algorithms
  • Genetic Algorithms
  • Genetic Programming
  • Evolutionary Programming

Tabu search, reactive tabu search
Swarm Intelligence / Agent-based systems: Particle
swarm optimization, Ant colony
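As an illustration of the first family listed, here is a minimal genetic algorithm in Python for a toy problem: assign jobs with known runtimes to identical machines so as to minimize the makespan (the finish time of the busiest machine). The problem, operators, and parameters are assumptions chosen for brevity, not the BOF's method; elitism guarantees the result is never worse than the round-robin assignment the population starts from.

```python
from __future__ import annotations

import random


def makespan(assign: list[int], runtimes: list[float], machines: int) -> float:
    """Finish time of the busiest machine for a job -> machine assignment."""
    load = [0.0] * machines
    for job, machine in enumerate(assign):
        load[machine] += runtimes[job]
    return max(load)


def genetic_schedule(runtimes: list[float], machines: int, pop_size: int = 30,
                     generations: int = 200, seed: int = 0) -> list[int]:
    """Tournament selection, one-point crossover, point mutation, elitism.

    Assumes pop_size >= 3 and at least one job.
    """
    rng = random.Random(seed)
    n = len(runtimes)

    def fitness(assign: list[int]) -> float:
        return makespan(assign, runtimes, machines)

    # Initial population: one round-robin assignment plus random ones
    pop = [[j % machines for j in range(n)]]
    pop += [[rng.randrange(machines) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        nxt = [min(pop, key=fitness)[:]]                  # elitism
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)     # tournament of 3
            p2 = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]                   # one-point crossover
            if rng.random() < 0.3:                        # point mutation
                child[rng.randrange(n)] = rng.randrange(machines)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)
```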
73
Tightly Bound WGs and RGs
All the SRM WGs and RGs
  • Performance WG
  • Application RG
  • Architecture WG

74
Milestones and Research Group activity plan
  • October 2001: e-mail distribution list and web
    site
  • November 2001: Research Group draft document
  • December 2001: prototype software for the RG
    repository
  • Two-month RG progress review process; GGF
    meeting and activity synchronization
75
Notes/suggestions
  • Common vocabulary
  • What are the constraints?
  • What are the metrics?
  • High-level optimization topics
  • What are the constraints? How do we make smart
    choices? What do users want?

76
Output of an RG/WG: papers. What papers will be
discussed here?
  • Possibilities
  • There are many different approaches to scheduling
    - taxonomy
  • What are the constraints that are possible for
    optimization
  • How should constraints be described?
  • How to capture/suggest flexibility to a current
    scheduling system

77
Layered approach
  • First goal - identify other groups
  • Paper: list of capabilities that are required to
    do good optimization, what constraints to use
    (dynamic, etc.)
  • Paper: what is being used in current
    scheduling systems? How does this relate to
    research in optimization?
  • Longer-term goal
  • Development of a simulator to test the environment
  • Needs constraints, language, etc.
  • Longest-term goal: get better optimization into
    functioning schedulers

78
Other possibilities: information
  • What do you assume is being furnished by the
    users?
  • Paper: what information is needed by the
    optimization technique being used
  • Paper: what metadata is supplied by the
    resources (nodes, hardware, network) in a grid?
    (What's currently being reported?)
  • Paper: how are jobs being described? How should
    they be described in the future?

79
Suggestion
  • Paper comparing current schedulers' optimization
    techniques
  • This perhaps needs a classification of techniques
  • Two views: compare user-optimized vs.
    system-optimized

80
Potential focus?
  • Different resources will have different
    schedulers/optimization techniques - how to
    decide between them
  • Given a single user running a job, we might want
    to let the user know what the choice in selecting
    this is (if a choice is available)

81
Next Steps
  • Define a charter; identify which paper topics
    should be looked at first
  • Since we can't do this on the fly, who wants to
    do it by email?
  • Bruno Volckaert has a related paper.
  • 8-10 people would like to discuss further on the
    maillist

82
Scheduling Attributes Working Group (SG)
http://www.cs.northwestern.edu/jms/sched-wg/sa-wg.html
sched-wg@gridforum.org
Chair: Uwe.Schwiegelshohn@udo.edu
  • Attributes for Communication between Scheduling
    Instances
  • Authors: Uwe Schwiegelshohn, Ramin Yahyapour
  • SchedWD 10.5

83
(No Transcript)
84
(No Transcript)
85
(No Transcript)
86
(No Transcript)
87
(No Transcript)
88
(No Transcript)
89
(No Transcript)
90
Notes
  • Should we include required job length in the
    document, or is this part of the resource
    description?
  • Attendees (including Uwe): 13
  • Number who read the paper: 1

91
Migration Attribute
  • Does this mean within the local management
    sphere or to the outside?
  • Answer: outside (as this is only dealing with
    scheduler-to-scheduler interactions)
  • "Migratable" means I can stop the job and package
    it
  • What about "I can receive a packaged job"?
  • May need 2 attributes: "I can pack" and "I can
    unpack and run"
  • What about stop, move, restart someplace else in
    the system?
  • This is not migration; this is checkpointing
    and restarting
  • Perhaps it should be broken down as
  • Checkpoint continue
  • Checkpoint stop or stop checkpoint
  • Checkpoint, migrate, restart

92
Migration Attribute, cont.
  • You may not have to capture all state to migrate
    (e.g., you could forward messages in flight)
  • Checkpointing implies you can turn the system off
    (and the checkpoint is still OK) this is not
    necessarily true with migration.

93
Notes
  • High level scheduler vs. Low level scheduler?
  • Defined by behavior (a la client/server)
  • What's the data model?
  • Some attributes are intrinsic, other attributes
    may be derived by combining the attributes of
    lower level schedulers
  • How you combine the attributes is not the subject
    of this document

94
Notes
  • Do we lose anything by restricting ourselves to a
    static hierarchy (e.g., all schedulers can be
    represented as a DAG)?

95
Data Grid Scheduling Attributes
  • Document appears close enough for CPU-type
    scheduling, perhaps it should be done
  • It doesn't appear complete for data-type
    attributes
  • For example
  • Guaranteed data transfer completion (reliability)
    e.g., after the job is done, the stage-out will
    definitely happen
  • May need some timeframe (e.g., in less than 200
    years)
  • There are probably others

96
Notes
  • How about the opposite of guaranteed
    completion?
  • Perhaps it just doesn't set the attribute
  • Perhaps this assumption should be added to the
    document

97
Next Steps
  • Add
  • restart attribute
  • Assumption that lack of assertion of an attribute
    is equivalent to asserting the negative of the
    attribute
  • This document enters formal document process
  • Then
  • Architecture that makes use of this
  • API (maybe protocol) that uses this
  • Identify what existing schedulers provide
  • Will need a prototype to know if this is useful
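A hedged Python sketch of two points from this session: the migration attribute split into "I can pack" and "I can unpack and run", and the assumption that an attribute that is not asserted counts as negated. The attribute names in Attr are hypothetical and do not come from SchedWD 10.5.

```python
from enum import Flag, auto


class Attr(Flag):
    """Hypothetical scheduler attributes; names are illustrative only."""
    NONE = 0
    CAN_PACK = auto()              # "I can stop the job and package it"
    CAN_UNPACK_AND_RUN = auto()    # "I can receive a packaged job and run it"
    CHECKPOINT_CONTINUE = auto()
    CHECKPOINT_STOP = auto()
    RESTART = auto()
    GUARANTEED_STAGE_OUT = auto()  # data-grid example: stage-out will happen


def asserts(advertised: Attr, wanted: Attr) -> bool:
    # Closed-world rule: an attribute that is not asserted counts as negated
    return (advertised & wanted) == wanted


def can_migrate(src: Attr, dst: Attr) -> bool:
    # Scheduler-to-scheduler migration needs both halves of the split attribute
    return asserts(src, Attr.CAN_PACK) and asserts(dst, Attr.CAN_UNPACK_AND_RUN)
```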

98
BOF: DRMAA (Distributed Resource Management
Application API)
Proposed co-chairs: John Tollefsrud,
john.tollefsrud@eng.sun.com, Sun; Bill Nitzberg,
bill@computer.org, Veridian
  • www.cs.northwestern.edu/jms/sched-wg
  • sched-wg@gridforum.org

99
Proposed Scope: Run a Job API (steps from Ten
Actions When SuperScheduling, GGF SchedWD 8.5,
J.M. Schopf, July 2001)
  • Phase 1: Resource Discovery
  • Step 1: Authorization Filtering
  • Step 2: Application requirement definition
  • Step 3: Minimal requirement filtering
  • Phase 2: System Selection
  • Step 4: Gathering information (query)
  • Step 5: Select the system(s) to run on
  • Phase 3: Run job
  • Step 6 (optional): Make an advance reservation
  • Step 7: Submit job to resources
  • Step 8: Preparation Tasks
  • Step 9: Monitor progress (maybe go back to 4)
  • Step 10: Find out the job is done
  • Step 11: Completion tasks
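Phases 1 and 2 can be sketched as filters followed by a selection. In this Python example, Machine and its fields are hypothetical, cpus_needed plays the role of the application requirement from step 2, load stands in for the information gathered in step 4, and phase 3 (reservation, submission, monitoring) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical resource description used only for this sketch."""
    name: str
    authorized: bool
    free_cpus: int
    load: float          # stand-in for information gathered in step 4


def select_system(machines: list[Machine], cpus_needed: int) -> Machine | None:
    # Phase 1, step 1: authorization filtering
    candidates = [m for m in machines if m.authorized]
    # Phase 1, step 3: minimal requirement filtering
    candidates = [m for m in candidates if m.free_cpus >= cpus_needed]
    # Phase 2, steps 4-5: pick the least-loaded remaining system
    return min(candidates, key=lambda m: m.load, default=None)
```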

100
Why API? Code Re-Use
  • Command Line standards
  • Script re-use
  • API (Application Programmer Interface)
  • Code re-use
  • Protocol
  • Interoperability
  • Language/Syntax
  • Re-use and interoperability

103
(No Transcript)
104
(No Transcript)
105
(No Transcript)
106
(No Transcript)
107
(No Transcript)
108
(No Transcript)
109
(No Transcript)
110
(No Transcript)
111
(No Transcript)
112
(No Transcript)
113
(No Transcript)
114
(No Transcript)
115
(No Transcript)
116
Proposed Focus
  • Timing: API defined in 9 months
  • E.g., July 2002: DRMAA v1.0 GWD submitted for review
  • Standardize existing systems
  • Don't invent something new

117
Next Steps
  • Solicit participation (YOU ARE HERE)
  • Especially resource management vendors and
    application developers
  • Finalize charter and milestones
  • Discuss via sched-wg email list
  • Submit to GGF chair to form DRMAA working group
  • Draft/standardize API

118
Proposed Charter
  • Develop an API specification for the submission
    and control of jobs to one or more Distributed
    Resource Management (DRM) systems.
  • The scope of this specification is all the high
    level functionality which is necessary for an
    application to consign a job to a DRM system
    including common operations on jobs like
    termination or suspension.
  • The objective is to facilitate the direct
    interfacing of applications to today's DRM
    systems by application builders, portal
    builders, and Independent Software Vendors (ISVs).
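The "common operations on jobs like termination or suspension" suggest a small job state machine. The Python sketch below is hypothetical (states, action names, and the Job class are assumptions, not the DRMAA API) and only shows how such control calls might be validated against a job's current state.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUSPENDED = "suspended"
    DONE = "done"
    TERMINATED = "terminated"


# Hypothetical transitions: user-requested suspend/resume/terminate plus
# scheduler-driven start/finish.
TRANSITIONS = {
    ("start", JobState.QUEUED): JobState.RUNNING,
    ("suspend", JobState.RUNNING): JobState.SUSPENDED,
    ("resume", JobState.SUSPENDED): JobState.RUNNING,
    ("finish", JobState.RUNNING): JobState.DONE,
    ("terminate", JobState.QUEUED): JobState.TERMINATED,
    ("terminate", JobState.RUNNING): JobState.TERMINATED,
    ("terminate", JobState.SUSPENDED): JobState.TERMINATED,
}


class Job:
    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = JobState.QUEUED          # state right after submission

    def control(self, action: str) -> JobState:
        """Apply an action if it is legal in the current state."""
        try:
            self.state = TRANSITIONS[(action, self.state)]
        except KeyError:
            raise ValueError(
                f"cannot {action} a job that is {self.state.value}") from None
        return self.state
```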

119
Notes
  • Good focus if it's quick, simple, and based
    on existing systems
  • If it drags on, then maybe we should do a more
    in-depth process
  • Is this applicable from a web service point of
    view?
  • XML? SOAP objects? Or some such

120
Scheduling Working Group: A Brief History
121
Advance Reservation and Co-Scheduling Workshop,
May 1999
  • Defined reservation
  • Resource, start, end, duration
  • Enumerated desired capabilities
  • de-coupled from job submission
  • unique printable reservation ID
  • query/response - returns list of available slots
  • hard and soft reservations
  • Enumerated harder stuff to put off until later,
    e.g., guarantee, cost model
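The "query/response returns list of available slots" capability can be sketched in Python for a single resource. Reservations are assumed to be half-open [start, end) integer intervals, and free_slots is a hypothetical helper, not part of any advance reservation API.

```python
from __future__ import annotations


def free_slots(reservations: list[tuple[int, int]], window: tuple[int, int],
               duration: int) -> list[tuple[int, int]]:
    """Return free (start, end) gaps inside `window` at least `duration` long.

    Reservations are half-open [start, end) intervals on one resource;
    times are plain integers (e.g. minutes), an assumption of this sketch.
    """
    lo, hi = window
    slots, cursor = [], lo
    for start, end in sorted(reservations):
        gap_end = min(start, hi)
        if start > cursor and gap_end - cursor >= duration:
            slots.append((cursor, gap_end))
        cursor = max(cursor, end)          # skip past this reservation
        if cursor >= hi:
            break
    if hi - cursor >= duration:            # trailing gap up to the window end
        slots.append((cursor, hi))
    return slots
```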

122
Grid Forum 1 (NASA Ames), June 1999
  • Initial Charter
  • Solve Grid Resource Management
  • Three focus areas
  • Advance reservations
  • Super scheduling
  • Resource specification (semantics tokens)

123
Grid Forum 2 (Northwestern), October 1999
  • Refined charter
  • Requested
  • lists of tokens from different groups
  • architecture pictures of existing systems
  • Discussed: What is X?
  • e.g., job, scheduler

124
Grid Forum 3 (UCSD), March 2000
  • Adopted charter, refocused
  • Decided not to work on architecture
  • Developed Super-scheduler Model (10 steps)
  • Gave overviews of advance reservation systems
    (GARA, Maui, PBS, LSF)
  • Commitments to draft several SchedRFCs

125
Grid Forum 4 (Microsoft), July 2000
  • Changed Sched RFC to Sched Working Document
  • Revised working document drafts
  • Query Interface
  • Resource Acquisition Steps
  • Security Requirements
  • Advance Reservation API
  • Scheduling Information
  • Suggested new working documents
  • 10 Steps Run a job API

126
Grid Forum 5 (Boston), October 2000
  • Generic Grid Resource Description combined to
    become the new advance reservation API document
  • Ten Steps for Superscheduling ever closer to
    done
  • Security Requirements of the Scheduling Working
    Group passed on to Security as a usage scenario
  • Grid Query and Reservation Interface

127
GGF-1 (Amsterdam), March 2001
  • Three documents discussed
  • Ten Actions for SuperScheduling, Arch/Framework,
    J. Schopf
  • basically done, only minor edits suggested
  • in progress, refocusing, to be discussed in
    telecon
  • Advance Reservation API, API, A. Roy, V. Sander
  • basically done, only minor edits suggested
  • Lower Level Scheduling Attributes,
    Syntax/Language, U. Schwiegelshohn
  • New areas suggested for attention
  • Advance reservation Protocol, Protocol, A. Roy,
    V. Sander, K. Czajkowski, J. Karpovich
  • Scheduling Dictionary, Syntax/Language, J. Schopf

128
GGF-2 (Washington, DC), July 2001
  • Scheduling Dictionary
  • Collecting words
  • New co-chairs Mary Roehrig, Wolfgang Ziegler,
    and Jennifer Schopf
  • Scheduling Attributes
  • Presented and discussed draft document
  • Advance Reservation Protocol
  • Presented draft document
  • Refocused to Grid Resource Mgmt Protocol
  • Decided to attack requirements first.

129
GGF-3 (Frascati, IT), October 2001
  • WG Grid Resource Management
  • WG Scheduling Attributes
  • WG Scheduling Dictionary
  • New co-chairs, no GGF3 meeting
  • 3 potential new activities (BOFs)
  • Scheduling Command Line Interface
  • Scheduling Optimization
  • Distributed Resource Management Application API