Grids and Grid Scheduling - PowerPoint PPT Presentation

About This Presentation
Title:

Grids and Grid Scheduling

Description:

Grids and Grid Scheduling. Jennifer M. Schopf. Argonne National Lab ... How many people know what Grids and Grid computing are? ... – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Grids and Grid Scheduling


1
Grids and Grid Scheduling
  • Jennifer M. Schopf
  • Argonne National Lab
  • UK National eScience Center (NeSC)

2
Questions for you
  • How many people know what Grids and Grid
    computing are?
  • How many people are familiar with Globus?
  • How many have heard of OGSA/OGSI or WS-RF?

3
What is a Grid
  • Resource sharing
  • Computers, storage, sensors, networks, ...
  • Sharing is always conditional: issues of trust,
    policy, negotiation, payment, ...
  • Coordinated problem solving
  • Beyond client-server: distributed data analysis,
    computation, collaboration, ...
  • Dynamic, multi-institutional virtual orgs
  • Community overlays on classic org structures
  • Large or small, static or dynamic

4
Not A New Idea
  • Late 70s: Networked operating systems
  • Late 80s: Distributed operating systems
  • Early 90s: Heterogeneous computing
  • Mid 90s: Metacomputing
  • Then the Grid (Foster and Kesselman, 1999)
  • Also called parallel distributed computing

5
Why is this hard/different?
  • Lack of central control
  • Where things run
  • When they run
  • Shared resources
  • Contention, variability
  • Communication
  • Different sites imply different sys admins,
    users, institutional goals, and often strong
    personalities

6
So why do it?
  • Computations that need to be done with a time
    limit
  • Data that can't fit on one site
  • Data owned by multiple sites
  • Applications that need to be run bigger, faster,
    more

7
Broader Context
  • Grid Computing has much in common with major
    industrial thrusts
  • Business-to-business, Peer-to-peer, Application
    Service Providers, Storage Service Providers,
    Distributed Computing, Internet Computing
  • Sharing issues not adequately addressed by
    existing technologies
  • Complicated requirements: "run program X at site
    Y subject to community policy P, providing access
    to data at Z according to policy Q"
  • High performance: unique demands of advanced
    high-performance systems

8
Relation to Other Approaches
  • Distributed computing
  • Generally a client-server model
  • Parallel computing
  • Limited to one machine/site
  • Peer-to-peer technologies
  • Limited scope and mechanisms
  • Enterprise-level distributed computing
  • Limited cross-organizational support
  • Web services
  • Not dynamic

9
Grids Pre-WS
  • Before web services, all functionality was
    developed using different protocols, error
    systems, etc.
  • Job Submission
  • GT2 GRAM
  • Modified RPC protocol
  • File Transfers
  • GT2 GridFTP
  • FTP protocol (wuftp server)
  • Monitoring
  • GT2 MDS2
  • LDAP protocol

10
Web-Service Based Grids
  • Use industry standards for internet interactions
    for distributed computing space
  • NOT related to web browsers
  • Protocol: SOAP
  • Language/Schema: WSDL and XML
  • All functionality shares protocols, error
    handling, etc
  • For Globus Toolkit, underpinnings are now WS-RF -
    the web service resource framework standard
  • OASIS standard
  • Support from IBM, HP, Sun, Microsoft, Axis, etc.

11
Grids for us
  • We may want to start within one administrative
    domain
  • Not really a Grid but a starting point
  • We likely want to stick with old tech (GT2?) as
    tried-and-true technology
  • Plan to move to WS-based Grid in next phase
  • One first step will need to be a decision about
    what resources we have and the scope of the problem

12
Any basic Grid questions?

13
Grid Scheduling
  • Grid scheduling
  • Resources over multiple administrative domains
  • Pick resource set (1 or co-scheduling)
  • Assign tasks within that resource
  • User is currently the most common Grid
    Scheduler
  • GGF defined 10 steps currently performed by
    Grid-level schedulers

14
Context, cont.
  • Grid schedulers aren't Local Resource Managers
    (LRMS)
  • No ownership or control over resources
  • Jobs get submitted to LRMS as user
  • Grid scheduler doesn't have control or often even
    info about jobs submitted at this level
  • Some projects have one-off solutions, but nothing
    is really production yet

15
In a nutshell...

16
Our interests...

17
2. Application Definition
  • User
  • Generally user defined
  • Often inaccurate, incomplete
  • Ideally
  • Smart compilers or other tools to automatically
    generate information about application
    requirements and runtimes
  • Today's systems
  • User-defined info at the command line or in
    Condor ClassAds
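
As an illustration of a user-supplied application definition, here is a minimal sketch in Python. It is not Condor ClassAd syntax; the class name, field names, and the rule that a non-positive number counts as "not filled in" are all assumptions made for this example. It shows one way to flag the incomplete definitions the slide warns about.

```python
from dataclasses import dataclass


@dataclass
class AppDefinition:
    """User-supplied description of a job (all field names are hypothetical)."""
    executable: str
    min_memory_mb: int
    min_disk_mb: int
    required_os: str
    est_runtime_s: float  # user estimate, often inaccurate

    def missing_fields(self):
        """Return names of fields left empty or set to a non-positive number."""
        missing = []
        for name, value in vars(self).items():
            is_empty = value in ("", None)
            is_non_positive = isinstance(value, (int, float)) and value <= 0
            if is_empty or is_non_positive:
                missing.append(name)
        return missing


# The user forgot to state a disk requirement.
app = AppDefinition("simulate", min_memory_mb=2048, min_disk_mb=0,
                    required_os="Linux", est_runtime_s=3600.0)
print(app.missing_fields())
```

A scheduler could refuse such a job or fall back to conservative defaults; which is better depends on how much the user estimates can be trusted.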

18
Application Definition for Us
  • Predict resource requirements for different
    algorithms
  • We will need basic data about what should run
    where
  • NOT just flop counts: memory, disk, etc. will
    also matter
  • Basic benchmarks on known machines might work?
  • How will we evaluate tradeoffs?
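
One possible way to use basic benchmarks on known machines is sketched below. It assumes, purely for illustration, that runtime scales inversely with a single benchmark score, and that memory and disk act as hard limits. The function names, dictionary keys, and numbers are hypothetical.

```python
def predict_runtime(ref_runtime_s, ref_score, target_score):
    """Scale a measured runtime by the ratio of benchmark scores.

    Assumes a higher score means a proportionally faster machine.
    """
    if ref_score <= 0 or target_score <= 0:
        raise ValueError("benchmark scores must be positive")
    return ref_runtime_s * ref_score / target_score


def fits(machine, memory_mb, disk_mb):
    """Flop rate is not enough: memory and disk must also be sufficient."""
    return machine["memory_mb"] >= memory_mb and machine["disk_mb"] >= disk_mb


# Algorithm measured at 1200 s on a reference machine with score 100.
target = {"score": 150, "memory_mb": 4096, "disk_mb": 20000}
if fits(target, memory_mb=2048, disk_mb=10000):
    print(predict_runtime(1200.0, 100, target["score"]))
```

A single score hides memory-bound and I/O-bound behavior, so per-algorithm benchmarks would likely be needed before trusting such predictions.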

19
4. Information Gathering
  • Dynamic searches to match resources with
    application requirements
  • User
  • Might use Grid Info System (GIS) like the Globus
    MDS or might just know

20
Information Gathering For Us
  • What data do we need?
  • What is the right way to collect it?
  • How long will it remain valid (update rates)?
  • What are the dependencies?
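
The question of how long collected data remains valid can be made concrete with a small sketch. The `Measurement` class, its fields, and the validity periods below are assumptions for illustration and do not reflect how MDS or any other information system stores data.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One collected value plus how long it can be trusted (hypothetical)."""
    name: str
    value: object
    taken_at: float      # seconds since some shared epoch
    valid_for_s: float   # update rate we believe is needed

    def is_fresh(self, now):
        return now - self.taken_at <= self.valid_for_s


def stale_names(measurements, now):
    """Names of measurements that need to be collected again."""
    return [m.name for m in measurements if not m.is_fresh(now)]


data = [
    Measurement("cpu_load", 0.7, taken_at=0.0, valid_for_s=60.0),
    Measurement("operating_system", "Linux", taken_at=0.0, valid_for_s=86400.0),
]
print(stale_names(data, now=300.0))
```

Dynamic values such as load expire quickly, while static values such as the operating system can be cached for much longer.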

21
5. System Selection
  • Matching between resources and application
    information
  • May involve 2 steps: choosing resource(s) and
    then mapping tasks within that choice
  • Closely tied to what data you have available
  • Users
  • Best estimate
  • What is needed
  • Matches based on current information, using
    variance information and other predictions
  • Today's systems
  • Condor - matchmaking
  • PBS - heuristic algorithms
  • Maui/Silver - submit to local sites, evaluate
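
Matching between resources and application information can be sketched as two-sided matchmaking. This is written in the spirit of Condor matchmaking but is not its ClassAd language or API; the dictionary keys, the predicates, and the machine data are all invented for this example.

```python
def matchmake(job, machines):
    """Return the name of the best mutually acceptable machine, or None.

    Both sides state requirements; the job ranks the acceptable machines.
    """
    candidates = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    if not candidates:
        return None
    return max(candidates, key=job["rank"])["name"]


job = {
    "owner": "alice",
    "requirements": lambda m: m["os"] == "Linux" and m["memory_mb"] >= 1024,
    "rank": lambda m: m["mips"],
}
machines = [
    {"name": "a", "os": "Linux", "memory_mb": 2048, "mips": 500,
     "requirements": lambda j: True},
    {"name": "b", "os": "Linux", "memory_mb": 4096, "mips": 900,
     "requirements": lambda j: j["owner"] != "alice"},  # site policy
    {"name": "c", "os": "Solaris", "memory_mb": 8192, "mips": 1200,
     "requirements": lambda j: True},
]
print(matchmake(job, machines))
```

Machine "b" is faster but its policy rejects the job owner, and "c" fails the job's operating system requirement, so the match falls to "a".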

22
System Selection For Us
  • What data do we have?
  • Selection of resources vs mapping within a
    resource
  • Tradeoffs between algorithm complexity and
    resource strength
  • Optimization function is completion time? Cost?
    Level of detail with a deadline?
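
The choice of optimization function changes which resource is selected. The sketch below assumes two hypothetical policies: minimize completion time when there is no deadline, and minimize cost among the options that meet a deadline. The option names, times, and costs are made up.

```python
def select(options, deadline_s=None):
    """Pick an option from (name, est_completion_s, cost) tuples.

    Without a deadline: fastest option.
    With a deadline: cheapest option that still finishes in time, or None.
    """
    if deadline_s is None:
        return min(options, key=lambda o: o[1])[0]
    feasible = [o for o in options if o[1] <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o[2])[0]


options = [("fast", 600, 50), ("medium", 1800, 20), ("slow", 7200, 5)]
print(select(options))                   # optimize completion time
print(select(options, deadline_s=3600))  # optimize cost under a deadline
```

The same three options yield different answers under the two policies, which is why the optimization function needs to be decided before the selection algorithm.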

23
Resources
  • 2 intro papers by Foster et al.
  • Anatomy of the Grid; Physiology of the Grid
  • www.globus.org -- then to publications
  • Grid scheduling
  • Ten Actions When Grid Scheduling, Jennifer M.
    Schopf, Chapter 2 in Grid Resource Management for
    Grid Computing, Kluwer Publishing, October 2003.
  • Ten Actions When SuperScheduling, Jennifer M.
    Schopf, Global Grid Forum Document GFD.04, 2003
  • www.mcs.anl.gov/jms/Pubs/sched.arch.2002.pdf
  • This talk
  • www.mcs.anl.gov/jms/Talks (not there yet)

24
1. Authorization Filtering
  • Where do you have an account?
  • User
  • List in a drawer
  • Ideally
  • A wallet of credentials, smart enough to
    remember my username at different sites as well
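
A credential wallet could be as simple as a mapping from site to local username. The sketch below is an assumption about how such a wallet might be used for authorization filtering; the site names and usernames are invented, and real credentials would involve certificates rather than plain names.

```python
def authorized_sites(sites, wallet):
    """Keep only sites where the user has an account, with the local username."""
    return [(site, wallet[site]) for site in sites if site in wallet]


# Hypothetical wallet: site name -> username at that site.
wallet = {"site-a.example.org": "jdoe", "site-c.example.org": "john_d"}
all_sites = ["site-a.example.org", "site-b.example.org", "site-c.example.org"]
print(authorized_sites(all_sites, wallet))
```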

25
3. Minimum Requirement Filtering
  • Use static data to limit the search space
  • Used to cut down dynamic queries needed
  • Can be combined with dynamic search (4)
  • User
  • I know I need Linux, don't consider running on
    other machines
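
Filtering on static data before any dynamic query might look like the following sketch. The resource records and attribute names are hypothetical; the point is only that machines failing a static requirement never need to be queried for load or queue length.

```python
def static_filter(resources, required):
    """Keep resources whose static attributes equal every required value."""
    return [r for r in resources
            if all(r.get(key) == value for key, value in required.items())]


resources = [
    {"name": "alpha", "os": "Linux", "arch": "x86"},
    {"name": "beta", "os": "Solaris", "arch": "sparc"},
    {"name": "gamma", "os": "Linux", "arch": "x86"},
]
shortlist = static_filter(resources, {"os": "Linux"})
print([r["name"] for r in shortlist])
```

Only the shortlisted machines would then be passed to the dynamic information gathering of step 4.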

26
6. Advance Reservation (Optional)
  • Reserve resources in a guaranteed way
  • Users
  • Call up sys admins and friends (call, on the
    phone)
  • Ideally
  • Automatically done when you submit a job based on
    user requirements
  • Current systems
  • Enabled in PBSPro and Maui

27
7. Job Submission
  • Run the job on the resources selected
  • User
  • qsub
  • Ideally
  • Make it so
  • Current systems
  • Each has its own API

28
8. Preparation tasks (11. Clean-up tasks)
  • File transfers, directory set ups
  • Users
  • Scp, ftp, mkdir
  • Ideally
  • Automatically done as part of job submission
  • Current systems
  • Condor/DagMan can do file staging

29
9. Monitoring Progress
  • How is my job doing?
  • Should I move it somewhere else?
  • Users
  • qstat
  • Moving is hard to do, so generally not done
  • Ideally
  • System takes care of it based on intuitive
    knowledge of user requirements, and good
    prediction techniques
  • Current Systems
  • Every LRMS has a stat command
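
The question "Should I move it somewhere else?" can be framed as a simple comparison, sketched below. It assumes predictions of remaining runtime are available for both sites and that the cost of moving (checkpointing, file transfer, queue wait) can be expressed in seconds; the function name and all numbers are hypothetical.

```python
def should_migrate(remaining_here_s, remaining_there_s, migration_cost_s):
    """Move only if finishing elsewhere, including the move, is faster."""
    return remaining_there_s + migration_cost_s < remaining_here_s


print(should_migrate(3600, 1200, 600))
print(should_migrate(3600, 3000, 900))
```

In practice the predictions carry uncertainty, so a margin would probably be added before moving a running job.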