1. Integrating Performance Analysis in the Uintah Software Development Cycle
- Allen D. Malony, Sameer Shende ({malony,sameer}@cs.uoregon.edu)
  Department of Computer and Information Science
  Computational Science Institute
  University of Oregon
- J. Davison de St. Germain, Allan Morris, Steven G. Parker ({dav,amorris,sparker}@cs.utah.edu)
  Department of Computer Science, School of Computing
  University of Utah
2. Outline
- Scientific software engineering
- C-SAFE and Uintah Computational Framework (UCF)
- Goals and design
- Challenges for performance technology integration
- TAU performance system
- Role of performance mapping
- Performance analysis integration in UCF
- TAU performance mapping
- X-PARE
- Concluding remarks
3. Scientific Software (Performance) Engineering
- Modern scientific simulation software is complex
  - Large development teams of diverse expertise
  - Simultaneous development on different system parts
  - Iterative, multi-stage, long-term software development
- Need support for managing the complex software process
  - Software engineering tools for revision control, automated testing, and bug tracking are commonplace
  - In contrast, tools for performance engineering are not
    - evaluation (measurement, analysis, benchmarking)
    - optimization (diagnosis, tracking, prediction, tuning)
- Incorporate performance engineering methodology, supported by flexible and robust performance tools
4. Utah ASCI/ASAP Level 1 Center (C-SAFE)
- C-SAFE was established to build a problem-solving environment (PSE) for the numerical simulation of accidental fires and explosions
  - Combine fundamental chemistry and engineering physics
  - Integrate non-linear solvers, optimization, computational steering, visualization, and experimental data verification
  - Support very large-scale coupled simulations
- Computer science problems
  - Coupling multiple scientific simulation codes with different numerical and software properties
  - Software engineering across diverse expert teams
  - Achieving high performance on large-scale systems
5. Example C-SAFE Simulation Problems
Heptane fire simulation
Typical C-SAFE simulation with a billion degrees of freedom and non-linear time dynamics
Material stress simulation
6. Uintah Problem Solving Environment (PSE)
- Enhanced SCIRun PSE
  - Pure dataflow → component-based
  - Shared memory → scalable multi-/mixed-mode parallelism
  - Interactive only → interactive plus standalone
- Design and implement Uintah component architecture
  - Application programmers provide
    - description of computation (tasks and variables)
    - code to perform a task on a single patch (sub-region of space)
  - Components for scheduling, partitioning, load balancing, ...
  - Follow Common Component Architecture (CCA) model
- Design and implement Uintah Computational Framework (UCF) on top of the component architecture
7. Uintah High-Level Component View
8. Uintah Parallel Component Architecture
9. Uintah Computational Framework (UCF)
- Execution model based on software (macro) dataflow
  - Exposes parallelism and hides data transport latency
- Computations expressed as directed acyclic graphs of tasks
  - A task consumes inputs and produces outputs (inputs to future tasks)
  - Inputs/outputs are specified for each patch in a structured grid (see the sketch after this list)
- Abstraction of global single-assignment memory
  - DataWarehouse
    - Directory mapping names to values (array structured)
    - Write a value once, then communicate it to awaiting tasks
- Task graph gets mapped to processing resources
  - Communication schedule approximates the global optimum
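To make the execution model concrete, here is a minimal, self-contained C++ sketch. It is not the actual UCF API: the DataWarehouse, Task, and variable names below are simplified stand-ins, and a trivial in-order loop stands in for the real dependency-driven scheduler. It shows tasks declaring required and computed variables against a single-assignment warehouse.

    // Toy sketch (not the real Uintah API) of the UCF execution model:
    // tasks declare the DataWarehouse variables they require and compute,
    // and a value may be written only once (single assignment).
    #include <functional>
    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <vector>

    struct DataWarehouse {
      std::map<std::string, double> values;
      void put(const std::string& name, double v) {
        if (!values.emplace(name, v).second)
          throw std::runtime_error("single-assignment violation: " + name);
      }
      double get(const std::string& name) const { return values.at(name); }
    };

    struct Task {
      std::string name;
      std::vector<std::string> requires_;   // inputs read from the warehouse
      std::vector<std::string> computes_;   // outputs written exactly once
      std::function<void(DataWarehouse&)> run;
    };

    int main() {
      DataWarehouse dw;
      std::vector<Task> graph = {
        {"initialize", {}, {"p.mass"},
         [](DataWarehouse& d) { d.put("p.mass", 1.0); }},
        {"interpolateParticlesToGrid", {"p.mass"}, {"g.mass"},
         [](DataWarehouse& d) { d.put("g.mass", d.get("p.mass")); }},
      };
      // Trivial in-order "schedule"; the real framework orders tasks by the
      // dependencies implied by requires/computes and hides communication.
      for (auto& t : graph) { t.run(dw); std::cout << t.name << " done\n"; }
    }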
10. Uintah Task Graph (Material Point Method)
- Diagram of named tasks (ovals) and data (edges)
- Imminent computation
- Dataflow-constrained
- MPM
  - Newtonian material point motion time step
  - Solid values defined at material point (particle)
  - Dashed values defined at vertex (grid)
  - Primed (') values updated during the time step
11. Example Task Graphs (MPM and Coupled)
12. Uintah PSE
- UCF automatically sets up
  - Domain decomposition
  - Inter-processor communication with aggregation/reduction
  - Parallel I/O
  - Checkpoint and restart
  - Performance measurement and analysis (stay tuned)
- Software engineering
  - Coding standards
  - CVS (commits: Y3 - 26.6 files/day, Y4 - 29.9 files/day)
  - Correctness regression testing with Bugzilla bug tracking
  - Nightly build (parallel compiles)
  - 170,000 lines of code (Fortran and C tasks supported)
13. Performance Technology Integration
- Uintah presents challenges to performance integration
  - Software diversity and structure
    - UCF middleware, simulation code modules
    - component-based hierarchy
  - Portability objectives
    - cross-language and cross-platform
    - multi-parallelism: threads, message passing, mixed
  - Scalability objectives
  - High-level programming and execution abstractions
- Requires flexible and robust performance technology
- Requires support for performance mapping
14. TAU Performance System Framework
- Tuning and Analysis Utilities
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
  - Multi-level: system / software / parallelism
  - Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
  - Portable performance profiling/tracing facility
  - Open software approach
15. TAU Performance System Architecture
(architecture diagram; labels include Paraver and EPILOG)
16. Performance Analysis Objectives for Uintah
- Micro tuning
  - Optimization of simulation code (task) kernels for maximum serial performance
- Scalability tuning
  - Identification of parallel execution bottlenecks
    - overheads: scheduler, data warehouse, communication
    - load imbalance
  - Adjustment of task graph decomposition and scheduling
- Performance tracking
  - Understand performance impacts of code modifications
  - Throughout the course of software development
  - C-SAFE application and UCF software
17. Uintah Performance Engineering Approach
- Contemporary performance methodology focuses on control-flow (function) level measurement and analysis
- C-SAFE application involves coupled models with task-based parallelism and dataflow control constraints
- Performance engineering on an algorithmic (task) basis
  - Observe performance based on algorithm (task) semantics
  - Analyze task performance characteristics in relation to other simulation tasks and UCF components
  - Scientific component developers can concentrate on performance improvement at the algorithmic level
  - UCF developers can concentrate on bottlenecks not directly associated with simulation module code
18. Task Execution in Uintah Parallel Scheduler
- Profile methods and functions in the scheduler and in the MPI library (see the wrapper sketch below)
  - Task execution time dominates (what task?)
  - Task execution time distribution per process
  - MPI communication overheads (where?)
- Need to map performance data!
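The MPI side of such measurements is typically obtained without touching application code, through the standard PMPI name-shifted profiling interface. The following is a minimal, illustrative sketch of that mechanism (TAU's actual MPI wrapper layer is far more complete); the counters and printout are invented for illustration.

    // Minimal PMPI interposition sketch: time each MPI_Send and forward to
    // the real implementation via the PMPI_ entry point. Link this object
    // ahead of the MPI library so it intercepts the application's calls.
    #include <mpi.h>
    #include <cstdio>

    static double send_time  = 0.0;   // accumulated time inside MPI_Send
    static long   send_calls = 0;

    int MPI_Send(const void* buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
      double t0 = MPI_Wtime();
      int rc = PMPI_Send(buf, count, type, dest, tag, comm);  // real send
      send_time  += MPI_Wtime() - t0;
      send_calls += 1;
      return rc;
    }

    int MPI_Finalize(void)
    {
      std::printf("MPI_Send: %ld calls, %.3f s\n", send_calls, send_time);
      return PMPI_Finalize();
    }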
19. Semantics-Based Performance Mapping
- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign data correctly
20. Hypothetical Mapping Example
- Particles distributed on surfaces of a cube

    Particle P[MAX];            /* array of particles */
    int GenerateParticles() {
      /* distribute particles over all faces of the cube */
      for (int face = 0, last = 0; face < 6; face++) {
        /* particles on this face */
        int particles_on_this_face = num(face);
        for (int i = last; i < particles_on_this_face; i++) {
          /* particle properties are a function of face */
          P[i] = ... f(face) ...;
        }
        last = particles_on_this_face;
      }
    }
21. Hypothetical Mapping Example (continued)

    int ProcessParticle(Particle p) {
      /* perform some computation on p */
    }

    int main() {
      GenerateParticles();          /* create a list of particles */
      for (int i = 0; i < N; i++)   /* iterate over the list */
        ProcessParticle(P[i]);
    }

- How much time (flops) is spent processing face i particles?
- What is the distribution of performance among faces?
- How is this determined if execution is parallel? (see the sketch below)
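To make these questions concrete, the following self-contained C++ sketch (illustrative only, not TAU) shows the kind of attribution mapping must provide: time is accumulated per cube face, the semantic entity, rather than per routine. The workload and particle layout are invented for the example.

    // Attribute processing time to the semantic entity "cube face" by
    // carrying the face id with each particle and accumulating per face.
    #include <chrono>
    #include <cstdio>
    #include <vector>

    struct Particle { int face; double value; };

    static double face_time[6] = {0};   // seconds attributed to each face

    static void ProcessParticle(const Particle& p) {
      volatile double x = p.value;
      for (int k = 0; k < 1000; k++) x = x * 1.0000001;  // stand-in for real work
    }

    int main() {
      std::vector<Particle> particles;
      for (int face = 0; face < 6; face++)
        for (int i = 0; i < 10000 * (face + 1); i++)     // uneven load per face
          particles.push_back({face, 1.0});

      for (const Particle& p : particles) {
        auto t0 = std::chrono::steady_clock::now();
        ProcessParticle(p);
        std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
        face_time[p.face] += dt.count();                 // map time to the face
      }
      for (int face = 0; face < 6; face++)
        std::printf("face %d: %.4f s\n", face, face_time[face]);
    }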
22. No Performance Mapping versus Mapping
- Typical performance tools report performance with respect to routines
  - Does not provide support for mapping
- TAU's performance mapping can observe performance with respect to the scientist's programming and problem abstractions
Screenshots: TAU (w/ mapping) vs. TAU (no mapping)
23. Uintah Task Performance Mapping
- Uintah partitions individual particles across processing elements (processes or threads)
- Simulation tasks in the task graph work on particles
  - Tasks have domain-specific character in the computation
    - interpolate particles to grid in the Material Point Method
  - Task instances generated for each partitioned particle set
  - Execution scheduled with respect to task dependencies
- How to attribute execution time among different tasks?
  - Assign a semantic name (task type) to a task instance
    - SerialMPM::interpolateParticleToGrid
  - Map a TAU timer object to the (abstract) task (semantic entity)
  - Look up the timer object using the task type (semantic attribute)
  - Further partition along different domain-specific axes (see the sketch below)
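A self-contained C++ sketch (illustrative only, not the TAU mapping API) of the lookup pattern described above: each task instance is keyed by its task-type name, so all instances of a type accumulate into one timer object representing the semantic entity. The runTask helper and the task bodies are invented for the example.

    // Per-task-type timer lookup keyed on the semantic name, mimicking how a
    // scheduler could map many task instances onto one timer object.
    #include <chrono>
    #include <cstdio>
    #include <map>
    #include <string>

    struct Timer { double seconds = 0; long calls = 0; };
    static std::map<std::string, Timer> timers;   // task type -> timer object

    // Wrap execution of one task instance; taskType is the semantic attribute.
    template <typename Fn>
    void runTask(const std::string& taskType, Fn body) {
      Timer& t = timers[taskType];                // look up (or create) by type
      auto t0 = std::chrono::steady_clock::now();
      body();
      std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
      t.seconds += dt.count();
      t.calls   += 1;
    }

    int main() {
      for (int patch = 0; patch < 8; patch++) {   // many instances, one semantic entity
        runTask("SerialMPM::interpolateParticleToGrid", [] { /* task body */ });
        runTask("SerialMPM::computeInternalForce",      [] { /* task body */ });
      }
      for (const auto& [name, t] : timers)
        std::printf("%-45s %ld calls  %.6f s\n", name.c_str(), t.calls, t.seconds);
    }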
24. Task Performance Mapping (Profile)
Mapped task performance across processes
Performance mapping for different tasks
25. Task Performance Mapping (Trace)
Work packet computation events colored by task type
Distinct phases of computation can be identified based on task
26. Task Performance Mapping (Trace - Zoom)
Startup communication imbalance
27. Task Performance Mapping (Trace - Parallelism)
Communication / load imbalance
28. Comparing Uintah Traces for Scalability Analysis
29. Performance Tracking and Reporting
- Integrated performance measurement allows performance analysis throughout the development lifetime
- Apply performance engineering in the software design and development (software engineering) process
  - Create a performance portfolio from regular performance experimentation (coupled with software testing)
  - Use performance knowledge in making key software design decisions, prior to major development stages
  - Use performance benchmarking and regression testing to identify irregularities
  - Support automatic reporting of performance bugs
  - Enable cross-platform (cross-generation) evaluation
30. XPARE - eXPeriment Alerting and REporting
- Experiment launcher automates measurement / analysis
  - Configuration and compilation of performance tools
  - Instrumentation control for Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated in the Uintah software testing harness
- Reporting system conducts performance regression tests (see the sketch below)
  - Apply performance difference thresholds (alert ruleset)
  - Alert users via email if thresholds have been exceeded
  - Web alerting setup and full performance data reporting
  - Historical performance data analysis
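The alerting idea can be pictured with a small self-contained sketch; the metric names, baseline values, and thresholds below are hypothetical and do not reflect XPARE's actual ruleset format or reporting pipeline.

    // Compare a new measurement against a stored baseline and flag any
    // metric whose relative change exceeds its threshold (alert ruleset).
    #include <cstdio>
    #include <map>
    #include <string>

    struct Rule { double baselineSeconds; double maxRelIncrease; };

    int main() {
      std::map<std::string, Rule> rules = {     // per-metric baselines and limits
        {"SerialMPM::interpolateParticleToGrid", {12.4, 0.10}},  // alert if >10% slower
        {"MPI communication",                    { 8.1, 0.25}},
      };
      std::map<std::string, double> current = { // latest experiment results
        {"SerialMPM::interpolateParticleToGrid", 14.0},
        {"MPI communication",                     8.3},
      };
      for (const auto& [metric, rule] : rules) {
        double change = (current.at(metric) - rule.baselineSeconds) / rule.baselineSeconds;
        if (change > rule.maxRelIncrease)
          std::printf("ALERT: %s regressed by %.1f%%\n", metric.c_str(), 100 * change);
      }
    }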
31. XPARE System Architecture
32. Scaling Performance Optimizations (Past)
Last year: initial correct scheduler
Reduce communication by 10x
Reduce task graph overhead by 20x
ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory
33. Scalability to 2000 Processors (Current)
ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory
34. Concluding Remarks
- Modern scientific simulation environments involve a complex (scientific) software engineering process
  - Iterative, diverse expertise, multiple teams, concurrent
- Complex parallel software and systems pose challenging performance analysis problems that require flexible and robust performance technology and methods
  - Cross-platform, cross-language, large-scale
  - Fully-integrated performance analysis system
  - Performance mapping
- Need to support performance engineering methodology within scientific software design and development
  - Performance comparison and tracking
35. Acknowledgements
- Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)
- Center for the Simulation of Accidental Fires and Explosions (C-SAFE), ASCI/ASAP Level 1 center, University of Utah, http://www.csafe.utah.edu
- Computational Science Institute, ASCI/ASAP Level 3 projects with LLNL / LANL, University of Oregon, http://www.csi.uoregon.edu
- ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt