1
Batch Software at JLAB
  • Ian Bird
  • Jefferson Lab
  • CHEP2000
  • 7-11 February, 2000

2
Introduction
  • Environment
  • Farms
  • Data flows
  • Software
  • Batch systems
  • JLAB software
  • LSF vs. PBS
  • Scheduler
  • Tape software
  • File pre-staging/caching

3
Environment
  • Computing facilities were designed to
  • Handle data rate of close to 1 TB/day
  • 1st level reconstruction only (2 passes)
  • Match average data rate
  • Some local analysis but mainly export of vastly
    reduced summary DSTs
  • Originally estimated requirements
  • 1000 SI95
  • 3 TB online disk
  • 300 TB tape storage (8 Redwood drives)

4
Environment - real
  • After 1 year of production running of CLAS
    (largest experiment)
  • Detector is far cleaner than anticipated, which
    means
  • Data volume is less: 500 GB/day
  • Data rate is 2.5x anticipated (2.5 kHz)
  • Fraction of good events larger
  • DST sizes are the same as raw data (!)
  • Per event processing time is much longer than
    original estimates
  • Most analysis is done locally; no one is really
    interested in huge data exports
  • Other experiments also have large data rates (for
    short periods)

5
Computing implications
  • CPU requirement is far greater
  • Current farm is 2650 SI95 and will double this
    year
  • Farm has a big mixture of work
  • Not all production; small analysis jobs too
  • We make heavy use of LSF hierarchical scheduling
  • Data access demands are enormous
  • DSTs are huge, many people, frequent accesses
  • Analysis jobs want many files
  • Tape access became a bottleneck
  • Farm's data demand can no longer be satisfied

6
JLab Farm Layout
Plan - FY 2000
[Diagram: farm systems on Fast Ethernet; work, cache, and mass storage file servers interconnected by Gigabit Ethernet; STK Redwood tape drives attached to the mass storage servers]
7
Other farms
  • Batch farm
  • 180 nodes -> 250
  • Lattice QCD
  • 20 node Alpha (Linux) cluster
  • Parallel application development
  • Plans (proposal) for large 256 node cluster
  • Part of larger collaboration
  • Group wants a meta-facility
  • Jobs run on least loaded cluster (wide area
    scheduling)

8
Additional requirements
  • Ability to handle and schedule parallel jobs
    (MPI)
  • Allow collaborators to clone the batch systems
    and software
  • Allow inter-site job submission
  • LQCD is particularly interested in this
  • Remote data access

9
Components
  • Batch software
  • Interface to underlying batch system
  • Tape software
  • Interface to OSM; overcome its limitations
  • Data caching strategies
  • Tape staging
  • Data caching
  • File servers

10
Batch software
  • A layer over the batch management system (see
    the interface sketch below)
  • Allow replacement of the batch system: LSF, PBS
    (DQS)
  • Constant user interface, no matter what the
    underlying system is
  • Batch farm can be managed by the management
    system (e.g. LSF)
  • Build in a security infrastructure (e.g. GSI)
  • Particularly to allow remote access securely
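
A minimal sketch, in Java, of what such an interface layer can look like (all names here are illustrative assumptions, not the actual JLab API): user tools code against a single interface, and each underlying batch system gets its own adapter.

    // Illustrative sketch only -- not the actual JLab software.
    // User tools code against BatchSystem; swapping LSF for PBS
    // means swapping adapters, not changing the user interface.
    public interface BatchSystem {
        String submit(String script, String queue);  // returns a job id
        String query(String jobId);                  // returns a status report
    }

    class LsfBatchSystem implements BatchSystem {
        public String submit(String script, String queue) {
            return run("bsub", "-q", queue, script); // LSF submission command
        }
        public String query(String jobId) {
            return run("bjobs", jobId);              // LSF status command
        }
        private String run(String... cmd) {
            try {
                Process p = new ProcessBuilder(cmd).start();
                return new String(p.getInputStream().readAllBytes()).trim();
            } catch (java.io.IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

A PbsBatchSystem adapter would wrap qsub and qstat in the same way.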

11
Batch system - schematic
[Schematic: user processes (submission, query, statistics) go through submission and query interfaces to the job submission system, which is backed by a database and drives the batch control system (LSF, PBS, DQS, etc.) and its batch processors]
12
Existing batch software
  • Has been running for 2 years
  • Uses LSF
  • Multiple jobs: parameterized jobs (LSF now has
    job arrays; PBS does not)
  • Client is trivial to install on any machine with
    a JRE; no need to install LSF, PBS, etc. (see the
    client sketch below)
  • Eases licensing issues
  • Simple software distribution
  • Remote access
  • Standardized statistics and bookkeeping outside
    of LSF
  • MySQL based
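
Why a JRE-only client is possible, as a hedged sketch (the host, port, and one-line wire protocol below are invented for illustration): the client never links against LSF or PBS; it only sends a request over a socket to the submission server, which is the one machine that needs the licensed software installed.

    import java.io.*;
    import java.net.Socket;

    // Illustrative thin client: needs only a JRE, no LSF/PBS install.
    public class JobClient {
        public static void main(String[] args) throws IOException {
            try (Socket s = new Socket("batch.jlab.example", 9000); // assumed host/port
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                // Invented request format; the real wire protocol is not shown.
                out.println("SUBMIT queue=production script=recon.csh");
                System.out.println("server reply: " + in.readLine()); // e.g. a job id
            }
        }
    }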

13
Existing software cont.
  • Farm can be managed by LSF
  • Queues, hosts, scheduler etc.
  • Rewrite in progress to
  • Add PBS interface (and DQS?)
  • Security infrastructure to permit authenticated
    remote access
  • Clean up

14
PBS as alternative to LSF
  • PBS (Portable Batch System, from NASA)
  • Actively developed
  • Open, freely available
  • Handles MPI (PVM)
  • User interface very familiar to NQS/DQS users
  • Problem (for us) was the (lack of a good)
    scheduler
  • PBS provides only a trivial scheduler, but
  • Provides mechanism to plug in another
  • We were using hierarchical scheduling in LSF

15
PBS scheduler
  • Multiple stages (6); each can be used or not, as
    required, in arbitrary order (see the scheduler
    sketch below)
  • Match-making: matches job requirements to system
    resources
  • System priority (e.g. data available)
  • Queue selection (which queue runs next)
  • User priority
  • User share: which user runs next, based on user
    and group allocations and usage
  • Job age
  • Scheduler has been provided to PBS developers for
    comments and is under test
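
A sketch of the staged-scheduler idea (types and method names are assumptions for illustration, not the scheduler given to the PBS developers): each stage is a pluggable pass that filters or reorders the candidate jobs, and stages compose in whatever order the site configures.

    import java.util.*;
    import java.util.function.UnaryOperator;

    // Illustrative staged scheduler: each stage is one pass over the
    // candidate job list; stages run in the order they are added.
    public class StagedScheduler {
        private final List<UnaryOperator<List<String>>> stages = new ArrayList<>();

        public void addStage(UnaryOperator<List<String>> stage) {
            stages.add(stage);
        }

        public List<String> schedule(List<String> candidateJobs) {
            List<String> jobs = new ArrayList<>(candidateJobs);
            for (UnaryOperator<List<String>> stage : stages) {
                jobs = stage.apply(jobs);  // e.g. match-making, user share, job age
            }
            return jobs;                   // jobs now in dispatch order
        }
    }

The six stages on this slide (match-making, system priority, queue selection, user priority, user share, job age) would each be one such pass.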

16
Mass storage
  • Silo: 300 TB Redwood capacity
  • 8 Redwood drives
  • 5 (+5) 9840 drives
  • Managed by OSM
  • Bottleneck
  • Limited to a single data mover
  • That node has no capacity for more drives
  • 1 TB tape staging RAID disk
  • 5 TB of NFS work areas/caching space

17
Solving tape access problems
  • Add new drives (9840s)
  • Requires 2nd OSM instance
  • Transparent to user
  • Eventual replacement of OSM
  • Transparent to user
  • File pre-staging to the farm
  • Distributed data caching (not NFS)
  • Tools to allow user optimization
  • Charge for (prioritize) mounts

18
OSM
  • OSM has several limitations (and is no longer
    supported)
  • Single mover node is most serious
  • No replacement possible yet
  • Local tapeserver software solves many of these
    problems for us
  • Simple remote clients (Java based) do not need
    OSM except on server

19
Tape access software
  • Simple put/get interface
  • Handles multiple files, directories, etc.
  • Can have several OSM instances, but a unique file
    catalog, transparent to user
  • System fails over between servers
  • Only way to bring 9840s online
  • Data transfer is network (socket) copy in Java
    (see the sketch below)
  • Allows a scheduling/user allocation algorithm to
    be added to tape access
  • Will permit transparent replacement of OSM
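
The slide states that data transfer is a network (socket) copy in Java; here is a minimal sketch of a receiving loop, with the buffer size and length-prefix framing assumed for illustration (the actual tape server protocol is not shown).

    import java.io.*;
    import java.net.Socket;

    // Illustrative receive loop: stream `length` bytes from the tape
    // server's socket to a local file.
    public class TapeCopy {
        static void receive(Socket socket, File dest, long length) throws IOException {
            try (InputStream in = socket.getInputStream();
                 OutputStream out = new BufferedOutputStream(new FileOutputStream(dest))) {
                byte[] buf = new byte[64 * 1024];
                long remaining = length;
                while (remaining > 0) {
                    int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                    if (n < 0) throw new EOFException("connection closed early");
                    out.write(buf, 0, n);
                    remaining -= n;
                }
            }
        }
    }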

20
Data pre-fetching / caching
  • Currently
  • Tape -> stage disk -> network copy to farm node
    local disk
  • Tape -> stage disk -> NFS cache -> farm
  • But this can cause NFS server problems
  • Plan
  • Dual Solaris nodes with:
  • 350 GB disk (RAID 0)
  • Gigabit Ethernet
  • Provides large cache for farm input
  • Stage out entire tapes to cache
  • Cheaper than staging space, better performance
    than NFS
  • Scalable as the farm grows

21
JLab Farm Layout
Plan - FY 2000
[Farm layout diagram repeated from slide 6]
22
File pre-staging
  • Scheduling for pre-staging is done by the job
    server software
  • Splits/groups jobs by tape (could be done by
    user; see the grouping sketch below)
  • Makes a single tape request
  • Holds jobs while files are staged
  • Implemented by batch jobs that release held jobs
  • Released jobs with data available get high
    priority
  • Reduces job slots blocked by jobs waiting for data
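
A sketch of the grouping step described above (the data structures are assumptions; the real job-server logic is not shown): held jobs are bucketed by the tape holding their input, so each tape is requested once and all of its jobs are released together when staging finishes.

    import java.util.*;

    // Illustrative grouping: invert a job-id -> tape-label map so each
    // tape is staged once and its waiting jobs released as a batch.
    public class PreStageGrouper {
        static Map<String, List<String>> groupByTape(Map<String, String> jobToTape) {
            Map<String, List<String>> byTape = new TreeMap<>();
            for (Map.Entry<String, String> e : jobToTape.entrySet()) {
                byTape.computeIfAbsent(e.getValue(), t -> new ArrayList<>())
                      .add(e.getKey());
            }
            return byTape;  // one tape request, then one release, per tape
        }
    }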

23
Conclusions
  • PBS is a sophisticated and viable alternative to
    LSF
  • Interface layer permits:
  • use of the same jobs on different systems; user
    migration
  • adding features to the batch system