Faucets Queuing System - PowerPoint PPT Presentation

About This Presentation
Title:

Faucets Queuing System

Description:

Tested on the cool Linux cluster at PPL. Adaptive jobs currently implemented in Charm and MPI ... The code has been checked for stack overflows ... – PowerPoint PPT presentation

Provided by: sameer6
Learn more at: http://charm.cs.uiuc.edu
Transcript and Presenter's Notes

1
Faucets Queuing System
  • Presented by,
  • Sameer Kumar

2
Basic Idea
  • Queuing System to manage Adaptive Jobs
  • Adaptive jobs
  • Jobs that can shrink and expand at runtime
  • Simulations provided encouraging results
  • Also intended to be a general-purpose queuing
    system that supports generic, non-migratable
    Charm and MPI jobs

3
Adaptive Jobs
  • Jobs that can dynamically increase (expand) or
    decrease (shrink) the number of processors they
    are running on
  • Motivation
  • Improve system utilization
  • Decrease system response time
  • Properties
  • minpe, minimum number of processors required for
    the job, related to the memory requirements of
    the job
  • maxpe, maximum number of processors, related to
    speedup
  • profit, profit from running the job
  • deadline, deadline before which the job should be
    finished
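
The four properties above can be pictured as a small record. This is only a sketch: the class name, the clamp helper, and the choice of seconds for the deadline are assumptions made for illustration, not part of the Faucets code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptiveJob:
    """Illustrative record of the job properties listed on the slide."""
    name: str
    minpe: int       # minimum processors (driven by memory requirements)
    maxpe: int       # maximum useful processors (driven by speedup)
    profit: float    # profit from running the job
    deadline: float  # finish-by time; seconds is an assumed unit

    def __post_init__(self):
        if not 1 <= self.minpe <= self.maxpe:
            raise ValueError("need 1 <= minpe <= maxpe")

    def clamp(self, offered: int) -> int:
        """Processors the job would use if offered `offered` processors.

        Returns 0 when the offer is below minpe (the job cannot run),
        otherwise the offer capped at maxpe.
        """
        if offered < self.minpe:
            return 0
        return min(offered, self.maxpe)
```

A scheduler may move a job between minpe and maxpe at runtime; outside that range the job either cannot run or gains nothing from the extra processors.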

4
Adaptive Job Example
  • Consider a 128-processor system
  • Job A arrives, requests 80 processors, and is
    started on 80 processors
  • Job B then arrives and requests 64 processors
  • In traditional systems, Job B is queued and
    allocated 64 processors only after Job A
    finishes, while 48 processors sit idle
  • With adaptive jobs, Job A can be shrunk to 64
    processors so that Job B can start immediately;
    after Job A finishes, Job B can expand and use
    all the processors
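
The example above can be replayed with a few lines of Python. This is a sketch under stated assumptions: the admit and finish helpers are hypothetical, Job A's minpe of 32 and Job B's maxpe of 128 are invented so that the shrink and expand steps are possible, and the greedy shrink order is not taken from the Faucets scheduler.

```python
TOTAL_PES = 128  # system size from the slide


def free_pes(running, total=TOTAL_PES):
    """Processors not currently allocated to any running job."""
    return total - sum(job["alloc"] for job in running.values())


def admit(running, name, minpe, maxpe, total=TOTAL_PES):
    """Start a job, shrinking running jobs toward their minpe if needed.

    Returns False (the job must be queued) when even shrinking every
    running job to its minpe would not free `minpe` processors.
    """
    needed = minpe - free_pes(running, total)
    if needed > 0:
        reclaimable = sum(j["alloc"] - j["minpe"] for j in running.values())
        if reclaimable < needed:
            return False
        for job in running.values():
            give = min(needed, job["alloc"] - job["minpe"])
            job["alloc"] -= give
            needed -= give
            if needed == 0:
                break
    alloc = min(free_pes(running, total), maxpe)
    running[name] = {"minpe": minpe, "maxpe": maxpe, "alloc": alloc}
    return True


def finish(running, name, total=TOTAL_PES):
    """Remove a finished job and let the remaining jobs expand."""
    del running[name]
    spare = free_pes(running, total)
    for job in running.values():
        grow = min(spare, job["maxpe"] - job["alloc"])
        job["alloc"] += grow
        spare -= grow


running = {}
admit(running, "A", minpe=32, maxpe=80)   # A starts on 80 processors
admit(running, "B", minpe=64, maxpe=128)  # A shrinks to 64, B starts on 64
finish(running, "A")                      # B expands to all 128
```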

5
Adaptive Job Scheduler
  • Adaptive Job Scheduler manages adaptive jobs
  • Three major components
  • Job Manager
  • Accepts jobs, schedules them on the parallel
    system, and frees resources when the job is done
  • Scheduling Strategy
  • A pluggable component that decides which jobs
    to schedule
  • Database
  • Logs all events that occur in the scheduler; the
    log can be used for recovery after a crash
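
One way to picture the three components is a job manager that delegates its decisions to an interchangeable strategy object and records every event. The sketch below is an assumption-laden illustration: the class names, the select method, the two example strategies, and the in-memory list standing in for the database are all hypothetical and do not come from the Faucets source.

```python
from abc import ABC, abstractmethod
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    minpe: int
    profit: float


class SchedulingStrategy(ABC):
    """Pluggable policy: picks which queued jobs to start."""

    @abstractmethod
    def select(self, queued: List[Job], free_pes: int) -> List[Job]:
        ...


class FirstComeFirstServed(SchedulingStrategy):
    def select(self, queued, free_pes):
        chosen = []
        for job in queued:
            if job.minpe > free_pes:
                break  # strict FCFS: never skip ahead in the queue
            chosen.append(job)
            free_pes -= job.minpe
        return chosen


class HighestProfitFirst(SchedulingStrategy):
    def select(self, queued, free_pes):
        chosen = []
        for job in sorted(queued, key=lambda j: j.profit, reverse=True):
            if job.minpe <= free_pes:
                chosen.append(job)
                free_pes -= job.minpe
        return chosen


class JobManager:
    """Accepts jobs, starts the ones the strategy selects, logs events."""

    def __init__(self, strategy: SchedulingStrategy, total_pes: int):
        self.strategy = strategy
        self.free_pes = total_pes
        self.queued: List[Job] = []
        self.log = []  # stand-in for the scheduler's database

    def submit(self, job: Job) -> None:
        self.queued.append(job)
        self.log.append(("submit", job.name))

    def schedule(self) -> List[str]:
        started = self.strategy.select(list(self.queued), self.free_pes)
        for job in started:
            self.queued.remove(job)
            self.free_pes -= job.minpe
            self.log.append(("start", job.name))
        return [job.name for job in started]
```

Swapping FirstComeFirstServed for HighestProfitFirst changes which jobs start without touching the job manager, which is the point of keeping the strategy pluggable.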

6
Adaptive Job Scheduler

7
Performance
  • Simulation results on 64 processors with a mean
    job execution time of 64.5 sec, for the
    utilization-maximizing strategy
  • Experiments on a Linux cluster with 64 processors
    and a mean job execution time of 60 sec

  • Terms used in the plots: Arrival Rate; MRT (Mean
    Response Time); Utilization (processor
    utilization); Load Factor (lf); Execution Time

8
Features
  • Multithreaded for fast response
  • Logs all job related information to a database
  • This helps in crash recovery
  • It also improves the security of the system
  • Uses Unix sockets for communication
  • Unix sockets improve the efficiency of the system
  • They also restrict access to the scheduler
  • Provides timed termination of the jobs
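
Timed termination, the last feature above, can be sketched with the Python standard library. The function name and its return convention are assumptions for illustration; the slides do not describe how the Faucets scheduler enforces its time limit.

```python
import subprocess
import sys


def run_with_time_limit(argv, max_seconds):
    """Run a command and kill it if it exceeds max_seconds.

    Returns (exit_code, timed_out). After a kill, exit_code is whatever
    the operating system reports for the terminated process.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=max_seconds), False
    except subprocess.TimeoutExpired:
        proc.kill()   # forcibly terminate the overrunning job
        proc.wait()   # reap the process so it does not linger
        return proc.returncode, True


if __name__ == "__main__":
    # A job that sleeps for 30 seconds but is only allowed 1 second.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_time_limit(sleeper, 1.0))
```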

9
Features (continued)
  • Accepts both batch and interactive jobs
  • MaxPE and MaxTime are system parameters that
    can be used to limit access to the parallel
    machine
  • Tested on the cool Linux cluster at PPL
  • Adaptive jobs currently implemented in Charm
    and MPI
  • For more details, see
  • http://charm.cs.uiuc.edu/research/faucets/faucets.html
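
The MaxPE and MaxTime limits mentioned above amount to an admission check at submission time. The sketch below is hypothetical: the limit values, the function name, and the error messages are invented for illustration, and the real scheduler may apply the limits differently.

```python
MAX_PE = 64       # assumed site limit on processors per job
MAX_TIME = 3600   # assumed site limit on run time, in seconds


def check_request(pes, seconds, max_pe=MAX_PE, max_time=MAX_TIME):
    """Return a list of reasons to reject a request (empty if acceptable)."""
    errors = []
    if pes < 1:
        errors.append("at least one processor is required")
    if pes > max_pe:
        errors.append(f"requested {pes} processors, limit is {max_pe}")
    if seconds > max_time:
        errors.append(f"requested {seconds} s, limit is {max_time} s")
    return errors


if __name__ == "__main__":
    print(check_request(32, 600))     # within both limits
    print(check_request(128, 7200))   # exceeds both limits
```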

10
Super-user Access
  • The queuing system scheduler runs with super-user
    privileges
  • When a new job arrives, it is executed with the
    permissions of the submitting user
  • The code has been checked for stack overflows
  • Direct access to the parallel machine is blocked
    by removing the permissions for rsh, ssh, etc.
  • To start a job, the scheduler changes its group
    id to a queuing-system group that can access the
    cluster

11
Queuing System Commands
  • Similar to current queuing systems
  • fsub is the command to submit batch jobs to the
    queuing system
  • frun runs jobs interactively
  • fjobs lists the jobs
  • fkill can be used to kill jobs

12
Conclusions and Future Work
  • The queuing system has been tested and is ready
    to be installed on the Turing cluster
  • Make the scheduler manage multiple heterogeneous
    clusters by supporting the concept of queues
  • Some of the queues could be batch and others
    interactive
  • Interactive queues would allocate multiple jobs
    to the same node depending on the utilization of
    the nodes
  • Running the scheduler on SP2 and other
    multiprocessing architectures
  • One of the solutions would be to run the faucets
    scheduler on top of a commercial queuing system