1 UNCLASSIFIED
User Tools for BlueGene/L and Purple
Scott Futral
Development Environment Group
Advanced Simulation and Computing
Lawrence Livermore National Laboratory
February 24, 2005
This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.
UCRL-PRES-209937
2 Overview
- Status of Purple code development tools
  - Compilers
  - Debuggers
  - Profiling and performance tools
- Status of BlueGene/L tools
  - Some unique aspects of BG/L
  - Tools available and possible
- Getting info from LC Web pages
3 Purple Development Environment: A lot like White!
- The standard AIX environment is available (AIX 5.3)
- IBM XL and GNU compilers
- IBM Math libraries (PESSL)
- Standard Unix tools
- Open source and third-party tools and libraries (Python, Perl, math libs, graphics packages)
- TotalView Debugger
- Forward compatibility for applications built on White
- LCRM batch system
4 Some differences for Purple that could matter
- Federation Switch and IBM MPI
- SLURM resource management: is it transparent?
- Large memory pages may have a performance impact on some apps
- 64-bit kernel and default 64-bit apps will cause some changes
- Zerofault memory debugger NOT available for 64-bit apps
- IBM Purify beta was D.O.A.
- Will other commercial tools be affected?
5 Example of Zerofault GUI
6 Purple Statement of Work (SOW) Performance Tool Requirements
- Profiling
- Event Tracing
- Cluster Wide Development Tool API
- Cluster Wide Development Tool GUI
- Lightweight corefiles
- Timer API
7 Performance Tools for Purple SOW
- PE Benchmarker (PEB)
- Dynamic Probe Class Library (DPCL)
- Intel Trace Collector (ITC)
- Intel Trace Analyzer (ITA)
8 Purple SOW Profiling Requirements
- Profile at source block level (PEB)
- Profiling tool scales to entire system (PEB)
- Profile types
  - CPU time (PEB)
  - HPM (PEB)
  - Page Faults (PEB)
  - (MPI?) I/O (ITC)
  - MPI send/recv bytes (ITC/PEB)
  - Memory usage with source reference (Purify?)
  - OpenMP (PEB)
9 PE Benchmarker Status
- IBM has provided 4.2 alpha release
- Substantial improvement
- Fixes for large applications
- Improvements to clients
- Some problems identified that will be reported
- Release includes new SOW features
- DPCL/PEB block-level instrumentation
- OpenMP tool
- MPI/LAPI communication volume profiling
- Visualizer GUI features
- Additional Features
10 PE Benchmarker Issues
- Java GUI performance problems
  - Slow performance with Java
  - Potential LLNL network issue
- DPCL is unreliable
  - Often requires manual cleanup
    - system processes
    - user processes
    - shared memory segments
  - IBM has identified problem scenarios
  - LLNL can provide system resources to developers
- Client interface issues
11 Purple SOW Tracing Requirements
- Time-stamped event records for all processes/threads (ITC)
- MPI tracing (ITC)
- Lightweight MPI callsite time and count statistics (?)
- Scale to cluster-wide applications (ITC)
- Support all baseline languages (ITC)
- Translate traces to ASCII (ITC)
- Self-defining format (not done by ITC)
- Dynamic activation
- From within a process (ITC)
- From outside a process (ITC?)
- Should not be a default requirement (ITC)
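The tracing requirements above (time-stamped records per process/thread, activation from within the process, translation to ASCII) can be illustrated with a minimal C sketch. Everything here is hypothetical: the `trace_event` layout, the `trace_*` function names, and the one-line ASCII layout are assumptions made for illustration and do not represent ITC, Vampirtrace, or the STF tracefile format. Timestamps are passed in by the caller so the sketch does not depend on any particular timer API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout; NOT the ITC/STF format. */
typedef struct {
    double timestamp;   /* seconds, supplied by the caller */
    int    rank;        /* process id (e.g. an MPI rank) */
    int    thread;      /* thread id within the process */
    char   name[32];    /* event label, e.g. "MPI_Send:enter" */
} trace_event;

#define TRACE_CAPACITY 1024

static trace_event trace_buf[TRACE_CAPACITY];
static int trace_count = 0;
static int trace_enabled = 0;   /* tracing is off unless requested */

/* Dynamic activation and deactivation from within the process. */
void trace_on(void)  { trace_enabled = 1; }
void trace_off(void) { trace_enabled = 0; }

/* Append one record; returns 1 if stored, 0 if dropped
   (tracing inactive or buffer full). */
int trace_record(double t, int rank, int thread, const char *name)
{
    assert(name != NULL);
    if (!trace_enabled || trace_count >= TRACE_CAPACITY)
        return 0;
    trace_event *e = &trace_buf[trace_count++];
    e->timestamp = t;
    e->rank = rank;
    e->thread = thread;
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';   /* always terminate */
    return 1;
}

/* Translate one record to a single ASCII line (no newline);
   returns the value produced by snprintf. */
int trace_to_ascii(const trace_event *e, char *out, size_t n)
{
    return snprintf(out, n, "%.6f %d %d %s",
                    e->timestamp, e->rank, e->thread, e->name);
}
```

Keeping tracing off by default mirrors the "should not be a default requirement" bullet: an application pays only for a flag test until `trace_on()` is called.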
12 Brief Tracing Tool History
- Inception
  - Vampir: Forschungszentrum Juelich
- Pallas Product
  - Vampir: Wolfgang Nagel, TU-Dresden
  - Vampirtrace: Pallas HPC Group
- ASCI Vampir/GuideView Pathforward
  - Vampir, VampirNextGeneration: TU-Dresden
  - Vampirtrace: Pallas HPC Group -> Intel
- Post Pathforward
  - Vampir, VampirNextGeneration: TU-Dresden
  - Intel Trace Collector/Analyzer: Intel
13 Intel Trace Collector/Analyzer Status
- LLNL has not received Purple version of ITC/ITA from Intel or IBM
- Remarks from Intel
- Purple will use a new IBM port of ITC
- ITA trace visualization from an Intel system
- Vampir or VampirNextGeneration on AIX
- If STF tracefile format remains consistent
14 Cluster Wide Tool API
- A means for dynamically activating and deactivating, reading, and resetting data for profiling, trace, and performance statistic instrumentation. (DPCL)
- The API shall dynamically control, activate, and deactivate the instrumentation of portions of an application during execution. (DPCL)
- The University's strong preference is for the Open Source DPCL library and run-time infrastructure
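The four operations named in this requirement (activate, deactivate, read, reset) can be sketched in a few lines of C. This is only an in-process illustration under stated assumptions: the `probe` type and every `probe_*` function are hypothetical names invented here, and none of this is the DPCL API. DPCL inserts probes into running processes from an external tool, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe attached to one instrumented code region.
   Illustrative only; not a DPCL data structure. */
typedef struct {
    const char *region;  /* label for the instrumented portion */
    int    active;       /* nonzero while the probe collects data */
    long   hits;         /* times the region was entered while active */
    double seconds;      /* time attributed to the region while active */
} probe;

/* Create a probe in the inactive state with cleared counters. */
void probe_init(probe *p, const char *region)
{
    assert(p != NULL);
    p->region = region;
    p->active = 0;
    p->hits = 0;
    p->seconds = 0.0;
}

/* Dynamic control: may be called at any point during execution. */
void probe_activate(probe *p)   { p->active = 1; }
void probe_deactivate(probe *p) { p->active = 0; }

/* Called by the instrumented code; a no-op while inactive. */
void probe_hit(probe *p, double elapsed)
{
    if (!p->active)
        return;
    p->hits += 1;
    p->seconds += elapsed;
}

/* Read the collected data without disturbing it. */
long   probe_read_hits(const probe *p)    { return p->hits; }
double probe_read_seconds(const probe *p) { return p->seconds; }

/* Reset the collected data; the active flag is left unchanged. */
void probe_reset(probe *p)
{
    p->hits = 0;
    p->seconds = 0.0;
}
```

Separating reset from deactivate lets a tool sample a region over successive intervals without stopping collection, which is one plausible reading of "reading, and resetting data" in the requirement.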
15 Cluster Wide Tool GUI
- Vampir / Vampirtrace are specified
- ITC / ITA will address these requirements
- Trace analysis of an MPI/threaded application
running over an entire EDTV and Purple system
16 BlueGene/L Development Environment: Some challenges
- Front End node (FEN) operating system is SuSE SLES 9 Linux/Power
- Compute nodes (CN) have a microkernel operating system with function shipping to I/O service nodes
- Shared objects are not supported on the CNs
- Threads are not supported on the CNs
- No virtual memory support on the CN
- Double wide FPU (double hummer) on the PowerPC 440 cores
- Kernel is still in rapid development, so driver levels are evolving
- There is no way to connect to stdin/stdout in an interactive fashion
17 BlueGene/L Development Environment: Implications
- FENs will have a rich development environment, with a familiar Linux feel!
- Some tools from Linux/Intel environments may not be available or as robust on Linux/Power. (Valgrind is one example.)
- CNs have limited tools available.
- Code steering via run/proxy is not available.
- Python not available on CNs
- Tools that instrument at compile/link time are feasible (TAU, mpiP, MPI_prof)
- Post-processing analyzers are possible (Paraver, mpip_insert_src)
18 BlueGene/L Tools: What's available?
- IBM XL compilers (9.1 Fortran, 7.0 C/C++) with extensions to support the SIMD aspect of the double hummer.
- Optimized common vector intrinsics being integrated into compiler
- BLAS/LAPACK
- PAPI access to hardware counters was working on a previous driver level.
- mpiP and MPI_trace for MPI profiling
- TAU profiling layer
- To quote Bob Walkup, "The best performance tool for BGL is a Power 4 system with HPMcount!"
19 BlueGene/L Tools: TotalView Debugger status
- TotalView debugger is ported through a contract with the vendor Etnus. The main component of the contract was the purchase of a license and 3 years of support for about $1M.
- The initial tool has been delivered and works!
- BGL TV will support most basic debugging functionality including watchpoints.
- IBM and Etnus to provide bug fixes before making TV available to our users
- The target driver for TV Limited Availability is driver 040
- Etnus is still working to deliver a milestone which targets performance and scalability tuning. Progress is gated by large system availability.
20 BlueGene/L Tools: What's possible?
- David Skinner of LBL is working with us to port IPM ("IPM is a portable profiling infrastructure which provides users with a concise report on the execution of parallel jobs.")
- A scalable Opentrace format could be developed that would enable tracing through TAU to be analyzed by the VNG tool.
- With effort and $s, more sophisticated tools can be architected to work on BGL (TotalView!)
- Valgrind memory tool in the future through a TRI-LAB EFFORT if funding is available. This will have limitations due to memory requirements of the tool.
- TotalView memory debugger capability could be provided for $250k
- A static Python version?
- DPCL (dynamic probe class library)
- PAPI3
- Leverage BlueGene/L consortium effort
21 Web Information on Tools
- http://www.llnl.gov/computing/hpc/code/software_tools.html