Experiences with SSS Software Architecture in a Production Environment. Rick Bradshaw, Narayan Desai, Andrew Lusk, Rusty Lusk, Brian Pellin

Learn more at: https://www.csm.ornl.gov
1
Experiences with SSS software Architecture in a
Production Environment
  • Rick Bradshaw, Narayan Desai, Andrew Lusk,
  • Rusty Lusk, Brian Pellin
  • Mathematics and Computer Science Division
  • Argonne National Laboratory

2
The SSS on Chiba Project
  • This was a summer project launched shortly after
    the last face-to-face meeting in June.
  • Outline
      • Definition of Project
      • Motivation
      • Limitations
      • Approach
      • Experiences
      • Status and Plans
      • Distribution

3
Project Definition
  • Chiba City consists of 256 dual-processor nodes
    running Linux, with Myrinet and Fast Ethernet
  • Scalability testbed
  • Project: determine whether the SSS component
    architecture could be used to replace the existing
    Chiba City system software, consisting of:
      • PBS
      • Maui scheduler
      • Home-grown user software for distributing files
        and executables
          • No shared file system
      • Home-grown system software for managing nodes

4
Motivation
  • Needed better systems software on the Chiba City
    cluster
      • In general
      • For testing other SSS components (e.g.,
        checkpointing)
      • For enabling Chiba as a testbed for scalable OS
        research
  • Needed to more thoroughly test existing
    ANL-written components
      • Stand-alone components: Build-and-Config Manager,
        Process Manager, Event Manager
      • Infrastructure components: Service Directory,
        Communication Library
  • Needed more experience with published XML
    interfaces
  • Had extra programming muscle available over the
    summer

5
Limitations
  • Needed to do this very fast, before summer
    resources evaporated
  • Chiba is in constant use by research computer
    scientists (e.g., developing a parallel file system)
    and computational scientists (e.g., physics,
    biology)

6
Approach
  • Utilize assets on hand
      • Some central components (Service Directory, Event
        Manager, Process Manager, Communication Library)
      • Existing published XML interfaces for these
      • Python programmers
  • Write stubs for other essential components
      • Scheduler: nothing fancy; only does FIFO with
        reservations and backfill
      • Queue Manager (QM): interface among user,
        scheduler, and process manager, with some extra
        capabilities
          • Multiple job steps, e.g., to distribute files
          • Specify OS image to be loaded, to support the
            testbed function
          • PBS compatibility mode, to allow users to reuse
            their job submission scripts
  • Use restriction syntax for stubs, for simplicity
    and speed
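
As an illustration of what "FIFO with reservations and backfill" can mean, here is a minimal Python sketch of one common reading (EASY-style backfill): jobs start in arrival order, the first job that does not fit gets a reservation at the earliest time enough nodes free up, and later jobs may start early only if they do not delay that reservation. This is not the Chiba City stub scheduler itself; the `Job` class, the `schedule` function, and the `(end_time, nodes)` representation of running jobs are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of nodes requested
    walltime: int   # requested run time, in arbitrary time units


def schedule(queue, free_nodes, running, now=0):
    """Return the names of the jobs to start at time `now`.

    queue      -- list of Job in FIFO (arrival) order
    free_nodes -- number of currently idle nodes
    running    -- list of (end_time, nodes) for jobs already running
    """
    started = []
    active = list(running)
    pending = list(queue)

    # FIFO phase: start jobs in arrival order while they fit.
    while pending and pending[0].nodes <= free_nodes:
        job = pending.pop(0)
        free_nodes -= job.nodes
        active.append((now + job.walltime, job.nodes))
        started.append(job.name)
    if not pending:
        return started

    # Reservation: find the earliest time the blocked head job can start.
    head = pending[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(active):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        # The head job can never fit; this sketch simply stops here.
        return started
    extra = avail - head.nodes  # nodes the head job will leave unused

    # Backfill phase: start later jobs that cannot delay the reservation.
    for job in pending[1:]:
        if job.nodes > free_nodes:
            continue
        ends_in_time = now + job.walltime <= shadow_time
        if ends_in_time or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_in_time:
                extra -= job.nodes
            started.append(job.name)
    return started


# Example: 4 idle nodes, one running job that frees 4 nodes at t=10.
queue = [Job("big", 8, 20), Job("short", 2, 5),
         Job("long", 2, 50), Job("tiny", 2, 10)]
backfilled = schedule(queue, free_nodes=4, running=[(10, 4)])
```

In this example "big" cannot start until t=10, so it holds a reservation there; "short" and "tiny" finish by t=10 and are backfilled, while "long" would delay "big" and stays queued.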

7
Experiences
  • At end of summer, after 2-week shakedown, we
    convinced Chiba management to go forward rather
    than reinstall old software. (No more PBS.)
  • Have been running user job mix for about three
    weeks, with no disasters.
  • Shook out some ambiguities in XML specification
    for component interfaces
  • Fixed bugs
  • Found and fixed scalability problems

8
Status and Plans
  • Status
      • Working
      • Collecting user experiences
  • Plans
      • Short term: incorporate other components from the
        Process Management Working Group
          • Paul: kernel module, LAM support, and CP Manager
          • Craig: monitoring and data warehouse
      • Long term
          • Other components from the rest of the project,
            especially Resource Management Working Group
            components
          • Provide Chiba for OS experimentation as part of
            normal batch-scheduled jobs, e.g., for the Sandia
            group

9
The End