Juli Rew - PowerPoint PPT Presentation

1
Scheduler Basics
  • Juli Rew
  • CISL User Forum
  • May 19, 2005

2
Overview
  • IBM Scheduling
  • Life of a Job
  • Submit Filter
  • Batch Priority Scheduler
  • Factors Affecting BPS Job Scheduling
  • LoadLeveler
  • Load Sharing Facility Scheduling
  • LSF Scheduling on Linux Systems
  • Differences from IBM Scheduling

3
IBM Scheduling: Life of a Job
  • llsubmit job
  • Submit Filter: requirements processing
  • BPS: job ordering (build ordered list of jobs)
  • LoadLeveler: job execution (job starts, job completes)
  • Outcomes
  • - Requirements not met: job rejected, done
  • - Requirements problem: staff rejects job, done
  • - Job completes: done

4
Submit Filter Features
  • Checks the LoadLeveler job script for:
  • - valid parameters
  • - valid queue name
  • - consistent combinations of features, e.g.,
    shared/not_shared, tasks_per_node/node options
  • Moves jobs with allocation holds to hold queues
  • Moves jobs with cutoff projects to standby queue
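
The checks listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the NCAR submit filter itself: the `Job` fields, the queue set, the queue names "hold" and "standby", and the single consistency rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical queue names, for illustration only.
VALID_QUEUES = {"com_pr8", "csl_pr8", "standby", "hold"}


@dataclass
class Job:
    queue: str
    project: str
    options: dict = field(default_factory=dict)


def submit_filter(job, held_projects, cutoff_projects):
    """Return (accepted, queue, reason) for a submitted job."""
    if job.queue not in VALID_QUEUES:
        return (False, None, "invalid queue name")
    # Assumed consistency rule: a tasks_per_node request needs a node count.
    if "tasks_per_node" in job.options and "node" not in job.options:
        return (False, None, "tasks_per_node given without node")
    # Jobs from projects with an allocation hold go to a hold queue.
    if job.project in held_projects:
        return (True, "hold", "allocation hold")
    # Jobs from cut-off projects go to the standby queue.
    if job.project in cutoff_projects:
        return (True, "standby", "project cut off")
    return (True, job.queue, "ok")
```

A rejected job returns `accepted == False` with a reason; an accepted job returns the queue it ends up in, which may differ from the one requested.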

5
Batch Priority Job Scheduler Features
  • Written at NCAR
  • Orders jobs based on policy
  • Creates separate facilities (Community, Climate
    System Laboratory)
  • Further separates jobs into proposal groups
    (NCAR/UNIV, CCSM/oCSL)
  • Hands the final order list to LoadLeveler
  • Allows for backfilling of jobs to avoid idle
    resources

6
Bluesky Queue Priorities
  all_spec       all_spec       all_spec       all_spec
  all_sp32       all_sp32       all_sp8        all_sp8
  CSL            COM            CSL            COM
  NCAR UNIV      NCAR UNIV      CCSM oCSL      CCSM oCSL
  csl_sp32       com_sp32       csl_sp8        com_sp8
  csl_pr32       com_pr32       csl_pr8        com_pr8
  ..             ..             ..             ..
  csl_sb32       com_sb32       csl_sb8        com_sb8
  Bottom row, all four columns: interactive, debug, share, test

7
Prioritization of Jobs by BPS
  • all_spec jobs run with the highest priority and
    can access all nodes
  • Below that, all com and csl jobs are divided equally
  • Round robin by group/user
  • Ordering: all_spec first, then the com and csl
    lists feed the top job
  • The 50-50 split is not a hard limit
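
One way to picture the ordering described on this slide is the sketch below. It is an assumption-laden illustration, not the BPS code: the function names and the job tuples are hypothetical, round robin is done by user only, and the soft 50-50 split is modeled by simple alternation that lets one facility continue when the other runs out of jobs.

```python
from collections import OrderedDict, deque
from itertools import zip_longest


def round_robin_by_user(jobs):
    """jobs: list of (user, job_id) in submission order."""
    per_user = OrderedDict()
    for user, job_id in jobs:
        per_user.setdefault(user, deque()).append(job_id)
    ordered = []
    while per_user:
        # Take one job from each user in turn.
        for user in list(per_user):
            ordered.append(per_user[user].popleft())
            if not per_user[user]:
                del per_user[user]
    return ordered


def order_jobs(all_spec, com, csl):
    """all_spec: list of job ids; com and csl: lists of (user, job_id)."""
    ordered = list(all_spec)  # all_spec jobs always come first
    com_list = round_robin_by_user(com)
    csl_list = round_robin_by_user(csl)
    # Alternate com and csl; when one side is empty the other keeps going,
    # so the 50-50 split is not enforced as a hard limit.
    for a, b in zip_longest(com_list, csl_list):
        ordered.extend(j for j in (a, b) if j is not None)
    return ordered
```

For example, with com jobs from two users and a single csl job, the csl job is placed second among the non-all_spec jobs and the remaining com jobs fill the rest of the list.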

8
Other Factors Affecting Job Scheduling
  • Backfilling - Jobs that will not interfere with
    start of highest priority job allowed to slip in
  • - Sweet spot: less than 3 hours and a small node count
  • Allocation Holds - Job flagged if a
    project/division exceeds its 30-day or 90-day
    allocation thresholds
  • - H1 and H2 jobs reordered at a priority above
    standby but below non-flagged jobs
  • Special Initiatives - Nodes reserved for
    real-time or other special runs
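
The backfilling idea can be sketched with a deliberately conservative rule: a waiting job may slip in only if it fits in the currently free nodes and its requested run time ends before the highest priority job is due to start. This is a simplified illustration under those assumptions; the function names and the candidate tuples are hypothetical, and it does not represent how BPS or LoadLeveler actually decide.

```python
def can_backfill(job_nodes, job_hours, free_nodes, hours_until_top_job_starts):
    """True if the job fits in the free nodes and finishes in time."""
    return job_nodes <= free_nodes and job_hours <= hours_until_top_job_starts


def pick_backfill(candidates, free_nodes, hours_until_top_job_starts):
    """candidates: list of (name, nodes, hours) in priority order."""
    chosen = []
    for name, nodes, hours in candidates:
        if can_backfill(nodes, hours, free_nodes, hours_until_top_job_starts):
            chosen.append(name)
            free_nodes -= nodes  # nodes are now taken by the backfilled job
    return chosen
```

Under this rule, short jobs with small node counts are the ones most likely to pass both tests, which matches the "sweet spot" noted on the slide.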

9
Documentation and Utilities
  • batchview command gives snapshot of current
    ordering
  • Basic information on scheduling given at
  • http://www.scd.ucar.edu/docs/ibm/ref/llsched.html

10
LoadLeveler
  • IBM's batch control job system
  • Allows jobs to be started, stopped, or cancelled
  • Controls allocation of resources (CPU, memory)
  • Allows custom scheduler plug-in (e.g., BPS)
  • Two mutually exclusive options: the LoadLeveler
    scheduler or a custom scheduler

11
Load Sharing Facility
  • Commercial product from Platform Computing
  • Currently being used on major Linux platforms
  • Also available for IBM, but still in evaluation
  • Ability to do Hierarchical Fair-Share Scheduling
    with Backfill, based on same facility scheme used
    in BPS
  • Community/CSL facility division implemented
    implicitly within the scheduler rather than
    explicitly by queue name
  • Can schedule among multiple platforms - "Grid

12
Questions?