1
Using DataStar
Mahidhar Tatineni and Amit Majumdar, SDSC
2
DataStar Configuration
  • 15.6 TF, 2528 processors total
  • 11 32-way 1.7 GHz IBM p690 nodes
  • 2 nodes with 64 GB memory for login and system use
  • 4 nodes with 128 GB memory and 1 node with 256 GB
    for batch scientific computation
  • 3 nodes with 128 GB memory for database,
    DiscoveryLink, HPSS
  • 1 node with 256 GB memory for interactive use
    (post-processing, visualization)
  • 176 8-way 1.5 GHz IBM p655 nodes
  • 16 GB memory
  • Batch scientific computation
  • 96 8-way 1.7 GHz IBM p655 nodes
  • 32 GB memory
  • Batch scientific computation
  • All nodes Federation switch attached
  • All nodes SAN attached
  • Parallel filesystems: 116 TB GPFS and 750 TB GPFS-WAN
    (shared with Blue Gene, TG IA-64 cluster)

3
Logging in
  • SSH2 (Secure Shell 2) client
  • http://www.sdsc.edu/us/consulting/ssh.html
  • For this workshop we will be using
    dspoe.sdsc.edu for submitting and running jobs in
    the express queue.
  • ssh <username>@dspoe.sdsc.edu (DataStar)

4
Moving files
  • SCP
  • scp original_file user@dspoe.sdsc.edu:/to_dir/copied_file
    (see the example commands below)
  • BBFTP
  • For files > 2 GB
  • http://www.sdsc.edu/us/resources/datastar/getstart.html#migrate
  • High Performance Storage System (HPSS) /
    Storage Resource Broker (SRB)
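A minimal sketch of the transfer commands, assuming a workshop account on
dspoe.sdsc.edu; the file names and the /gpfs/username destination directory
are placeholders:

    # Copy a local file to DataStar (prompts for your password/passphrase)
    scp results.dat username@dspoe.sdsc.edu:/gpfs/username/results.dat

    # Copy a file back from DataStar into the current local directory
    scp username@dspoe.sdsc.edu:/gpfs/username/results.dat .

    # For files larger than 2 GB, use bbftp instead; see the
    # getting-started link above for its usage.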

5
File System Structure
  • /home - 1 GB quota. Backed up. Do not store large
    files or output here.
  • /scratch - Local file system (64 GB on each
    node). Space cleaned after each job, so stage
    results off before the job ends (see the sketch
    below).
  • /gpfs - 116 TB shared parallel file system.
  • NOT backed up. Purgeable.
  • /gpfs-wan - Visible for read-write operations via
    the TG network (750 TB)
  • HPSS (archival storage) - 25 PB of tape capacity
  • http://www.sdsc.edu/us/resources/hpss/
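A minimal staging sketch, to be run at the end of a batch job; the directory
names under /gpfs and /scratch are illustrative, not fixed paths:

    # Copy results from node-local /scratch to shared GPFS before the
    # job finishes, since /scratch is cleaned after each job
    mkdir -p /gpfs/username/run01
    cp /scratch/username/output*.dat /gpfs/username/run01/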

6
Batch/Interactive computing
  • Batch job environment
  • Job Manager: LoadLeveler (tool from IBM)
  • Job Scheduler: Catalina (SDSC in-house tool)
  • Job Monitoring: Various commands
  • Batch and interactive use on different nodes.
  • DataStar Login Nodes
  • dslogin.sdsc.edu
  • dspoe.sdsc.edu
  • dsdirect.sdsc.edu

8
Queues and nodes
  • Start with dspoe (interactive queues)
  • Do production runs from dslogin (normal and
    normal32 queues)
  • Use express queues from dspoe to get it right
    now.
  • Use dsdirect for special needs.

9
LoadLeveler Commands
  • Show the current queue state: llq
  • Submit a job to the queue: llsubmit
  • Cancel your job in the queue: llcancel
  • Special (more useful) commands from SDSC's
    in-house tool Catalina:
  • showq - look at the status of the queue
  • show_bf - look at backfill window
    opportunities
  • Note: When a job shows None as the start time
    or says BADRESOURCELIST in the showq output,
    you are requesting resources which are not
    available on the machine (for example, asking for
    too much memory, too many nodes, or too many CPUs
    per node). Example command usage is sketched
    below.
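A minimal usage sketch of these commands; myjob.cmd and the job-step
identifier passed to llcancel are placeholders (use the id reported by
llsubmit or llq):

    llq                   # list jobs currently in the queues
    llq -u $USER          # show only your own jobs
    llsubmit myjob.cmd    # submit a job command file; prints a job-step id
    llcancel ds100.1234.0 # cancel a job by its job-step id
    showq                 # Catalina view of running/queued jobs
    show_bf               # current backfill windows (idle nodes and durations)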

10
Sample Job Scripts
  • Example files are located here:
  • /gpfs/projects/workshop/running_jobs
  • Copy the whole directory.
  • Use the Makefile to compile the source code.
  • Edit the parameters in the job submission
    scripts.
  • The examples illustrate the use of LoadLeveler to
    submit and follow jobs (a generic script is
    sketched below).
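The workshop directory holds the authoritative scripts; the following is
only a generic sketch of a LoadLeveler command file for a parallel POE job.
The class, node counts, wall-clock time, network statement, and executable
name are illustrative values, not DataStar-specific requirements:

    #!/bin/ksh
    # Sample LoadLeveler command file (illustrative values)
    #@ job_type        = parallel
    #@ class           = express              # express, normal, normal32, high
    #@ node            = 2                    # number of p655 nodes
    #@ tasks_per_node  = 8                    # 8 MPI tasks per 8-way node
    #@ wall_clock_time = 00:30:00             # request only what the run needs
    #@ network.MPI     = sn_all,not_shared,US # Federation switch, user space
    #@ output          = job.$(jobid).out
    #@ error           = job.$(jobid).err
    #@ notification    = never
    #@ queue

    poe ./my_mpi_program

Submit it with llsubmit and follow it with llq, as on the previous slide.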

11
Backfill window: show_bf
  • Scenario: Queue draining for running big job(s).
  • Many nodes will sit idle while the big job waits
    for enough currently running jobs to finish and
    free up the required number of nodes.
  • Use the show_bf command to identify the idle
    nodes (and the duration they are available
    for) and use them immediately (see the sketch
    below).
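A minimal sketch of the backfill workflow; the node count and window length
are hypothetical, read the real values from your own show_bf output:

    show_bf                    # suppose it reports 8 nodes free for 2 hours

    # Size the job to fit inside that window, e.g. in the command file:
    #   #@ node            = 8
    #   #@ wall_clock_time = 01:45:00   # comfortably under the window
    llsubmit backfill_job.cmd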

12
SDSC Job Priorities - 1
  • Priorities are determined by a number
    of weighting factors
  • Job size
  • Jobs of > 128 nodes (1024 procs) get highest
    priority
  • Prevents wasted machine dry-outs
  • Favors jobs that can be run nowhere else
  • Allocation size
  • PIs with 1.2M hours need to run more jobs than
    those with 10k hours

13
SDSC Job Priorities - 2
  • Priorities determined by a number of weighting
    factors (cont.)
  • Priority
  • High, normal, express queues
  • Charge factors: high x2, normal x1, express x1.8
  • 4 nodes reserved 24/7 just for express jobs
  • Wait time
  • Jobs increase in priority as they age
  • Big boost for normal jobs older than 4 days or
    high jobs older than 2 days
  • Boosted just under the large jobs

14
Tips to reduce queue wait time
  • Every time you submit a job, look for
    any possible backfill windows (use show_bf)
  • Try to estimate your job runtime and ask
    for the exact amount you need;
    do not just ask for the max of 18 hrs
    (see the sketch below)
  • If possible, scale up your job to a larger
    number of processors/nodes.
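For example, in the LoadLeveler command file set the wall-clock limit to a
realistic estimate plus a small margin rather than the queue maximum (the
value below is illustrative):

    # Run measured at ~2.5 hours: request 3 hours, not the 18-hour maximum
    #@ wall_clock_time = 03:00:00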

15
Archival Storage: HPSS
  • What is HPSS?
  • The centralized, long-term data storage
    system at SDSC is the
    High Performance Storage System (HPSS)
  • Currently stores more than 3 PB of data (as of
    June 2006)
  • Total system capacity of 25 PB
  • Data added at an average rate of 100 TB per month
    (between Aug 05 and Feb 06)

16
SDSC Resources: Applications
  • SDSC provides a wide range of software
    applications installed on the production
    computing platforms. Ranging from finite element
    codes to state-of-the-art visualization packages,
    these applications are available to any
    researcher who has a computing allocation at
    SDSC.
  • The applications are listed on the SDSC
    Applications page:
  • http://www.sdsc.edu/us/resources/applications.html
  • Information is also available on the TeraGrid
    Software page:
  • http://hpcsoftware.teragrid.org/Software/user/index.php

17
SDSC DataStar Applications and Libraries
  • The third-party applications and libraries are
    usually located in /usr/local/apps64 (for 64-bit
    applications) and /usr/local/apps32 (for 32-bit
    applications).
  • The /usr/local/apps directory corresponds to the
    64-bit directory.
  • Users must take care to link the correct versions
    of the libraries when compiling, i.e. use the
    64-bit libraries when compiling with the -q64
    option or when compiling with OBJECT_MODE=64.
  • For example, if you need the single-precision
    version of the fftw 2.1.5 library and you are
    compiling in 64-bit mode, you must link the
    library in
  • /usr/local/apps64/fftw215s/lib/
    (see the compile sketch below)
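A minimal compile-and-link sketch under those assumptions; the compiler
driver, source file, and library names are illustrative, so check the actual
library names with ls /usr/local/apps64/fftw215s/lib before linking:

    # 64-bit compile, so link against the 64-bit single-precision fftw build
    export OBJECT_MODE=64
    mpxlf90_r -q64 -o mycode mycode.f90 \
        -I/usr/local/apps64/fftw215s/include \
        -L/usr/local/apps64/fftw215s/lib -lsrfftw -lsfftw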