Title: NCCS User Forum
1. NCCS User Forum
2. Agenda
- Welcome / Introduction: Phil Webster, CISTO Chief
- SSP Test: Matt Koop, User Services
- Current System Status: Fred Reitz, Operations Lead
- Discover Job Monitor: Tyler Simon, User Services
- NCCS Compute Capabilities: Dan Duffy, Lead Architect
- Analysis System Updates: Tom Maxwell, Analysis Lead
- PoDS: Jules Kouatchou, SIVO
- User Services Updates: Bill Ward, User Services Lead
- Questions and Comments: Phil Webster, CISTO Chief
3. Key Accomplishments
- Incorporation of SCU5 processors into the general queue pool
- Capability to run large jobs (4,000 cores) on SCU5
- Analysis nodes placed in production
- Migrated DMF from Dirac (Irix) to Palm (Linux)
4. New NCCS Staff Members
- Lynn Parnell, Ph.D. Engineering Mechanics, High Performance Computing Lead
- Matt Koop, Ph.D. Computer Science, User Services
- Tom Maxwell, Ph.D. Physics, Analysis System Lead
5. Agenda (section divider; repeats the agenda from slide 2)
6. Key Accomplishments
- Discover/Analysis Environment
  - Added SCU5 (cluster totals 10,840 compute CPUs, 110 TF)
  - Placed analysis nodes (dali01-dali06) in production status
  - Implemented storage area network (SAN)
  - Implemented GPFS multicluster feature
  - Upgraded GPFS
  - Implemented RDMA
  - Implemented InfiniBand token network
- Discover/Data Portal
  - Implemented NFS mounts for select Discover data on Data Portal
- Data Portal
  - Migrated all users/applications to HP Bladeservers
  - Upgraded GPFS
  - Implemented GPFS multicluster feature
  - Implemented InfiniBand IP network
  - Upgraded SLES10 operating system to SP2
- DMF
  - Migrated DMF from Irix to Linux
- Other
7. Discover 2009 Daily Utilization Percentage (chart)
8. Discover Daily Utilization Percentage by Group, May-August 2009 (chart)
- 8/13/09: SCU5 (4,128 cores added)
9. Discover Total CPU Consumption, Past 12 Months (CPU Hours) (chart)
- 9/4/08: SCU3 (2,064 cores added)
- 2/4/09: SCU4 (544 cores added)
- 2/19/09: SCU4 (240 cores added)
- 2/27/09: SCU4 (1,280 cores added)
- 8/13/09: SCU5 (4,128 cores added)
10. Discover Job Analysis, August 2009 (chart)
11. Discover Job Analysis, August 2009 (chart)
12. Discover Availability
Scheduled maintenance, June-August:
- 10 Jun: 17 hrs 5 min, GPFS (Token and Subnets, 3.2.1-12)
- 24 Jun: 12 hours, GPFS (RDMA, multicluster, SCU5 integration)
- 29 Jul: 12 hours, GPFS 3.2.1-13, OFED 1.4, DDN firmware
- 30 Jul: 2 hours 20 minutes, DDN controller replacement
- 19 Aug: 4 hours, NASA AUID transition
Unscheduled outages, June-August:
- 16 Jun: 3 hrs 35 min, nodes out of memory
- 24 Jun: 4 hrs 39 min, maintenance extension
- 6-7 Jul: 4 hrs 18 min, internal switch error
- 13 Jul: 2 hrs 59 min, GPFS error
- 14 Jul: 26 min, nodes out of memory
- 20 Jul: 2 hrs 2 min, GPFS error
- 29 Jul: 55 min, maintenance extension
- 19 Aug: 2 hrs 45 min, maintenance extension
13. Current Issues on Discover: Login Node Hangs
- Symptom: Login nodes become unresponsive.
- Impact: Users cannot log in.
- Status: Developing and testing a solution. The issue arose during critical security patch installation.
14. Current Issues on DMF: Post-Migration Clean-Up
- Symptoms: Various.
- Impact: Various.
- Status: Issues are addressed as they are encountered and reported.
15. Future Enhancements
- Discover Cluster
  - PBS v10
  - Additional storage
  - SLES10 SP2
- Data Portal
  - GDS OPeNDAP performance enhancements
  - Use of GPFS-CNFS for improved NFS mount availability
16. Agenda (section divider; repeats the agenda from slide 2)
17. I/O Study Team
- Dan Kokron
- Bill Putman
- Dan Duffy
- Bill Ward
- Tyler Simon
- Matt Koop
- Harper Pryor
- Building on work by SIVO and GMAO (Brent Swartz)
18. Representative GEOS Output
- Dan Kokron has generated many runs containing data in order to characterize GEOS I/O
- 720-core, quarter-degree GEOS with YOTC-like history:
  - Number of processes that write: 67
  - Total amount of data: 225 GB (written to multiple files)
  - Average write size: 1.7 MB
  - Running in dnb33
  - Using Nehalem cores (GPFS with RDMA)
- Average bandwidth:
  - Timing the entire CFIO calls results in a bandwidth of 3.8 MB/sec
  - Timing just the NetCDF ncvpt calls results in a bandwidth of 44.4 MB/sec
- Why is this so slow?
19. Kernel Benchmarks
- Used the open-source I/O kernel benchmarks xdd and iozone
- Achieved over 1 GB/sec to all the new nobackup file systems
- Wrote two representative one-node C benchmarks (a Python sketch of the plain-write case follows this slide)
  - Using C writes and appending to files
  - Using NetCDF writes with chunking and appending to files
- Ran these benchmarks writing out exactly the same data as process 0 in the GEOS run
  - C writes: average bandwidth of around 900 MB/sec (consistent with the kernel benchmarks)
  - NetCDF writes: average bandwidth of around 600 MB/sec
- Why is GEOS I/O running so slow?
Callouts: C writes, average bandwidth 900 MB/sec; NetCDF writes, average bandwidth 600 MB/sec
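The team's benchmarks were written in C; as a rough, hypothetical stand-in, the sketch below does the same kind of single-node append-and-time measurement in Python. The record shape, write count, and file name are illustrative only, not the actual GEOS history layout.

    import time
    import numpy as np

    # One 2-D record roughly the size of a GEOS history write (~3 MB of 4-byte floats).
    record = np.random.rand(721, 1080).astype('f4')
    n_writes = 100

    start = time.time()
    with open('bench_raw.dat', 'ab') as f:
        for _ in range(n_writes):
            record.tofile(f)   # plain appending writes, analogous to the C benchmark
            f.flush()          # flush Python's buffer after each record
    elapsed = time.time() - start

    total_mb = n_writes * record.nbytes / 1e6
    print('wrote %.0f MB in %.2f s -> %.1f MB/sec' % (total_mb, elapsed, total_mb / elapsed))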
20. Effect of NetCDF Chunking
- How does changing the NetCDF chunk size affect the overall performance?
- The table shows runs varying the chunk size, averaged over 10 runs for each chunk size
- Used the NetCDF kernel benchmark
- The smallest chunk size reproduces the GEOS bandwidth
  - As best as we can tell, this is roughly equivalent to the default chunk size
- The best chunk size turned out to be about the size of the array being written (3 MB)

Chunk size (floats)   Chunk size (KB)   Average bandwidth (MB/sec)
1K                    4                 37
32K                   128               262
128K                  512               492
512K                  2,048             537
1M                    4,096             596
2M                    8,192             497
3M                    12,288            369
6M                    24,576            477
10M                   40,960            327

- References
  - NetCDF-4 Performance Report, Lee et al., June 2008
  - NetCDF online tutorial: http://www.unidata.ucar.edu/software/netcdf/docs_beta/netcdf-tutorial.html
  - "Benchmarking I/O Performance with GEOSdas" and other Modeling Guru posts: https://modelingguru.nasa.gov/clearspace/message/56155615
21. Setting Chunk Size in GEOS
- Dan K. ran several baseline runs to make sure we were measuring things correctly
- Turned on chunking and set the chunk size equal to the write size (1080x721x1x1); see the sketch after the results below
- Dramatic improvement in ncvpt bandwidth
- Why was the last run so slow?
  - Because we had a file system hang during that run

Ncvpt bandwidth (MB/sec) by run:
- Base Line 1 (baseline run with time stamps at each write statement): 44.47
- Base Line 2 (time stamps printed before and after the call to ncvpt): 76.35
- Base Line 3 (time stamp printing moved after the call to ncvpt): 64.69
- Using NetCDF chunking (initial run with NetCDF chunking turned on): 409.87
- NetCDF chunking plus Fortran buffering, run 1 (I/O buffering in the Intel I/O library on top of NetCDF chunking): 421.23
- NetCDF chunking plus Fortran buffering, run 2 (same as the previous run, with very different results): 45.17
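For orientation, the sketch below shows how a chunk size equal to the write size can be requested when a variable is created. It uses the netCDF4 Python module rather than the CFIO/Fortran path GEOS actually uses, and the file and variable names are invented; it only illustrates the setting behind the improvement in the results above.

    import numpy as np
    from netCDF4 import Dataset

    nc = Dataset('history_sketch.nc4', 'w', format='NETCDF4')
    nc.createDimension('time', None)
    nc.createDimension('lat', 721)
    nc.createDimension('lon', 1080)

    # Chunk size set equal to one write: a full 1080 x 721 level at a single time.
    var = nc.createVariable('q', 'f4', ('time', 'lat', 'lon'),
                            chunksizes=(1, 721, 1080))

    level = np.random.rand(721, 1080).astype('f4')
    for t in range(10):
        var[t, :, :] = level   # each write maps onto exactly one chunk
    nc.close()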
22. What next?
- Further explore chunk sizes in NetCDF
  - What is the best chunk size?
  - Do you set the chunk sizes for write performance or for read performance?
  - Once a file has been written with a set chunk size, it cannot be changed without rewriting the file.
- Need to better understand the variability seen in the file system performance
  - It is not uncommon to see a 2x or greater difference in performance from run to run
- Turn the NetCDF kernel benchmark into a multi-node benchmark
  - Use this benchmark for testing system changes and potential new systems
  - Compare performance across NCCS and NAS systems
- Write up results
23. Agenda (section divider; repeats the agenda from slide 2)
24. Ticket Closure Percentiles, 1 March to 31 August 2009 (chart)
25. Issue: Parallel Jobs > 1,500 CPUs
- Original problem: Many jobs would not run at > 1,500 CPUs
- Status at last Forum: Resolved using a different version of the DAPL library
- Current status: Now able to run at 4,000 CPUs using MVAPICH on SCU5
26. Issue: Getting Jobs into Execution
- Long wait for queued jobs before launching
- Reasons:
  - SCALI=TRUE is restrictive
  - Per-user, per-project limits on the number of eligible jobs (use qstat -is)
  - Scheduling policy: first fit on the job list ordered by queue priority and queue time
- User Services will be contacting folks using SCALI=TRUE to assist them in migrating away from this feature
27. Future User Forums
- NCCS User Forum schedule:
  - 8 Dec 2009, 9 Mar 2010, 8 Jun 2010, 14 Sep 2010, and 7 Dec 2010
  - All on Tuesdays
  - All 2:00-3:30 PM
  - All in Building 33, Room H114
- Published:
  - On http://nccs.nasa.gov/
  - On GSFC-CAL-NCCS-Users
28. Agenda (section divider; repeats the agenda from slide 2)
29. Sustained System Performance
- What is the overall system performance?
- Many different benchmarks or peak numbers are available
  - Often unrealistic or not relevant
- SSP refers to a set of benchmarks that evaluates performance as related to real workloads on the system
- SSP concepts originated from NERSC (LBNL)
30. Performance Monitoring
- Not just for evaluating a new system
- Ever wonder if a system change has affected performance?
- Often changes can be subtle and not detected with normal system validation tools
  - Silent corruption
  - Slowness
- Find out immediately instead of after running the application and getting an error
31. Performance Monitoring (cont'd)
- Run real workloads (SSP) to determine performance changes over time
  - Quickly determine if something is broken or slow
  - Perform data verification
- Run automatically on a regular basis as well as after system changes
  - e.g., a change to a compiler or MPI version, or an OS update (a minimal sketch of this kind of tracking follows)
Figure: NERSC SSP example chart
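A minimal sketch of this kind of tracking, assuming a hypothetical benchmark command, log file, and 2x alert threshold; this is not the actual NCCS SSP harness.

    import csv, subprocess, time
    from datetime import datetime

    LOG = 'ssp_history.csv'                  # hypothetical log location
    CMD = ['./ssp_benchmark', '--small']     # hypothetical benchmark command

    start = time.time()
    subprocess.run(CMD, check=True)          # run the fixed workload
    elapsed = time.time() - start

    # Load earlier run times, then append this run with a timestamp.
    history = []
    try:
        with open(LOG) as f:
            history = [float(row[1]) for row in csv.reader(f)]
    except FileNotFoundError:
        pass
    with open(LOG, 'a', newline='') as f:
        csv.writer(f).writerow([datetime.now().isoformat(), '%.1f' % elapsed])

    # Flag runs that drift well outside the historical average.
    if history and elapsed > 2.0 * (sum(history) / len(history)):
        print('WARNING: run took %.1f s, well above the historical average' % elapsed)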
32. Meaningful Measurements
- How you can help:
  - We need your application and a representative dataset for it
  - Ideally it should take 20-30 minutes to run at various processor counts
- Your benefits:
  - Changes to the system that affect your application will be noticed immediately
  - Data will be placed on the NCCS website to show system performance over time
33. Agenda (section divider; repeats the agenda from slide 2)
34. Discover Job Monitor
- All data is presented as a current system snapshot, at 5-minute intervals
- Displays system load as a percentage
- Displays the number of running jobs and cores in use
- Queued jobs and job wait times
- Displays current qstat -a output
- Interactive historical utilization chart
- Message of the day
- Displays average number of cores per job
- Job Monitor
35. Agenda (section divider; repeats the agenda from slide 2)
36. Climate Data Analysis
- Climate models are generating ever-increasing amounts of output data.
- Larger datasets are making it increasingly cumbersome for scientists to perform analyses on their desktop computers.
- Server-side analysis of climate model results is quickly becoming a necessity.
37. Parallelizing Application Scripts
- Many data processing shell scripts can be easily parallelized
  - MatLab, IDL, etc.
- Use task parallelism to process multiple files in parallel (see the sketch after the examples below)
- Each file is processed on a separate core within a single dali node
- Limit load on dali (16 cores per node)
  - Maximum of 10 compute-intensive processes per node

Serial version:
    while ( )
        # process another file
        run.grid.qd.s
    end

Parallel version:
    while ( )
        # process another file in the background
        run.grid.qd.s &
    end
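As a hedged illustration of the same task-parallel pattern in Python, the sketch below fans a list of files out to a pool capped at 10 worker processes, matching the per-node limit above. The input file pattern and the idea of passing each file as an argument to run.grid.qd.s are assumptions, not the actual script interface.

    import glob
    import subprocess
    from multiprocessing import Pool

    def run_grid(path):
        # Placeholder: process one file, here by calling the existing serial script
        # with the file path as an argument (assumed interface).
        subprocess.run(['./run.grid.qd.s', path], check=True)
        return path

    if __name__ == '__main__':
        files = sorted(glob.glob('data/*.nc'))    # hypothetical input files
        with Pool(processes=10) as pool:          # at most 10 compute-intensive processes per node
            for done in pool.imap_unordered(run_grid, files):
                print('finished', done)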
38. ParaView
- Open-source, multi-platform visualization application
  - Developed by Kitware, Inc. (authors of VTK)
  - Designed to process large data sets
  - Built on parallel VTK
- Client-server architecture (a small pvpython sketch follows this list)
  - Client: Qt-based desktop application
  - Data Server: MPI-based parallel application on dali
- Parallel streaming filters for data processing
  - Large library of existing filters
  - Highly extensible using plugins
  - Plugin development required for HDF, NetCDF, OBS data
  - No existing climate-specific tools or algorithms
- Data Server being integrated into ESG
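A small pvpython sketch of the client-server usage, under the assumption that a pvserver is already listening on an analysis node; the host name, data file, and contour level are hypothetical, and the real connection details would come from NCCS documentation.

    # Run with pvpython; connects the client to a remote ParaView data server.
    from paraview.simple import Connect, OpenDataFile, Contour, Show, Render

    Connect('dali01')                      # hypothetical data-server host
    reader = OpenDataFile('fields.vtk')    # hypothetical dataset on the server side
    iso = Contour(Input=reader)            # parallel filter executed on the server
    iso.Isosurfaces = [0.5]                # single contour level, chosen arbitrarily
    Show(iso)
    Render()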
39. ParaView Client
- Qt desktop application that controls data access, processing, analysis, and visualization
40. ParaView Client Features
41. Analysis Workflow Configuration
- Configure a parallel streaming pipeline for data analysis
42. ParaView Applications
- Polar Vortex Breakdown Simulation
- Golevka Asteroid Explosion Simulation
- 3D Rayleigh-Benard problem
- Cross Wind Fire Simulation
43. Climate Data Analysis Toolkit (CDAT)
- Integrated environment for data processing, visualization, and analysis
- Integrates numerous software modules in a Python shell
- Open source with a large, diverse set of contributors
- Analysis environment for ESG, developed at LLNL
44. Data Manipulation
- Exploits NumPy Array and Masked Array (a plain-NumPy sketch follows this list)
- Adds persistent climate metadata
- Exposes NumPy, SciPy, and RPy mathematical operations:
  - Clustering, FFT, image processing, linear algebra, interpolation, max entropy, optimization, signal processing, statistical functions, convolution, sparse matrices, regression, spatial algorithms
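A plain-NumPy sketch of the masked-array behavior CDAT builds on; the values and fill value are made up, and this is not CDAT-specific code.

    import numpy as np

    # A small temperature field with a missing-data fill value of 1.0e20.
    field = np.array([[280.1, 281.4, 1.0e20],
                      [279.8, 1.0e20, 282.0]])

    masked = np.ma.masked_values(field, 1.0e20)   # mask the fill value
    print(masked.mean())                          # mean over valid points only
    print(masked.count(), 'valid of', field.size, 'points')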
45. Grid Support
- Spherical Coordinate Remapping and Interpolation Package (SCRIP)
  - Remapping and interpolation between grids on a sphere
  - Maps between any pair of lat-lon grids
- GridSpec
  - Standard description of earth system model grids
  - To be implemented in the NetCDF CF convention
  - Implemented in CMOR
- MoDAVE
  - Grid visualization
46. Climate Analysis
- Genutil / Cdutil (PCMDI)
  - General utilities for climate data analysis
    - Statistics, array and color manipulation, selection, etc.
  - Climate utilities
    - Time extraction, averages, bounds, interpolation
    - Masking/regridding, region extraction
- PyClimate
  - Toolset for analyzing climate variability (a bare-NumPy sketch of the EOF step follows this list)
  - Empirical Orthogonal Function (EOF) analysis
  - Analysis of coupled data sets
    - Singular Value Decomposition (SVD)
    - Canonical Correlation Analysis (CCA)
  - Linear digital filters
  - Kernel-based probability density function estimation
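A bare-NumPy sketch of the EOF step, using random numbers in place of a real time-by-space anomaly field; PyClimate and related CDAT packages wrap this kind of computation with metadata handling.

    import numpy as np

    # Synthetic anomaly matrix: 120 time steps x 500 grid points, mean removed along time.
    rng = np.random.default_rng(0)
    data = rng.standard_normal((120, 500))
    anomalies = data - data.mean(axis=0)

    # EOFs are the right singular vectors; explained variance comes from the singular values.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt                              # spatial patterns, one per row
    pcs = u * s                            # principal component time series
    explained = s**2 / np.sum(s**2)
    print('leading EOF explains %.1f%% of the variance' % (100 * explained[0]))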
47. CDAT Climate Diagnostics
- Provides a common environment for climate research
- Uniform diagnostics for model evaluation and comparison
Example diagnostics: Taylor diagram, thermodynamic plot, performance portrait plot, Wheeler-Kiladis analysis
48. Contributed Packages
- PyGrADS (potential)
- AsciiData
- BinaryIO
- ComparisonStatistics
- CssGrid
- DsGrid
- Egenix
- EOF
- EzTemplate
- HDF5Tools
- IOAPITools
- Ipython
- Lmoments
- MSU
- NatGrid
- ORT
- PyLoapi
- PynCl
- RegridPack
- ShGrid
- SP
- SpanLib
- SpherePack
- Trends
- Twisted
- ZonalMeans
- ZopeInterface
49. Visualization
- Visualization and Control System (VCS)
  - Standard CDAT 1D and 2D graphics package
- Integrated contributed 2D packages
  - Xmgrace
  - Matplotlib
  - IaGraph
- Integrated contributed 3D packages
  - ViSUS
  - VTK
  - NcVTK
  - MoDAVE
50. Visual Climate Data Analysis Tools (VCDAT)
- CDAT GUI; facilitates:
  - Data access
  - Data processing and analysis
  - Data visualization
- Accepts Python input
  - Commands and scripts
- Saves state
  - Converts keystrokes to Python
- Online help
51. MoDAVE
- Visualization of mosaic grids
- Parallelized using MPI
- Integration into CDAT in progress
- Developed by Tech-X and LLNL
Figure: cubed-sphere visualization
52. ViSUS in CDAT
- Data streaming application
- Progressive processing and visualization of large scientific datasets
- Future capabilities for petascale dataset streaming
- Simultaneous visualization of multiple (1D, 2D, 3D) data representations
53. VisTrails
- Scientific workflow and provenance management system
- Interface for the next version of CDAT
  - History trees, data pipelines, visualization spreadsheet, provenance capture
54. Agenda (section divider; repeats the agenda from slide 2)
55. Background
- Scientists generate large data files
- Processing the files consists of executing a series of independent tasks
- Ensemble runs of models
- All the tasks are run on one CPU
56. PoDS
- Task parallelism tool that takes advantage of distributed architectures as well as multi-core capabilities
- For running serial, independent tasks across nodes
- Makes no assumptions about the underlying applications to be executed
- Can be ported to other platforms
57. PoDS Features
- Dynamic assessment of resource availability
- Each task is timed
- A summary report is provided
58. Task Assignment
Diagram: tasks (Command 1 through Command 9) from the execution file are distributed across Node 1, Node 2, and Node 3.
59. PoDS Usage
- pods.py -help execFile CpusPerNode
  - execFile: file listing all the independent tasks to be executed
  - CpusPerNode: number of CPUs per node. If not provided, PoDS will automatically use the number of CPUs available on each node.
60. Simple Example
- The application randomly generates an integer n between 0 and 10^9
- It loops over n to perform some basic operations
- Each time the application is called, a different n is obtained. We want to run the application 150 times (a sketch of the execution file follows).
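A hedged sketch of how the 150-task execution file might be generated and handed to PoDS; the application name (random_loops.py) and the log naming are invented for illustration.

    # Write 150 independent task lines, one run of the application per line.
    with open('exec_file.txt', 'w') as f:
        for i in range(150):
            f.write('./random_loops.py > run_%03d.log\n' % i)

    # Then hand the list to PoDS, e.g. with 8 CPUs per node:
    #   pods.py exec_file.txt 8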
61. Timing Numbers
Nodes   Cores/Node   Time (s)
1       1            990
1       2            496
1       4            256
1       8            133
2       1            497
2       2            247
2       4            131
2       8            61
62. More Information
- User's Guide on Modeling Guru:
  - https://modelingguru.nasa.gov/clearspace/docs/DOC-1582
- Package available at:
  - /usr/local/other/pods
63. Agenda (section divider; repeats the agenda from slide 2)
64. Important Contacts
- NCCS Support: support@nccs.nasa.gov, (301) 286-9120
- Analysis Lead: Thomas.Maxwell@nasa.gov, (301) 286-7810
- I/O Improvements: Daniel.Q.Duffy@nasa.gov, (301) 286-8830
- PoDS Info: Jules.Kouatchou-1@nasa.gov, (301) 286-6059
- User Services Lead: William.A.Ward@nasa.gov, (301) 286-2954