1
Integrating Performance Analysis in the Uintah
Software Development Cycle
  • Allen D. Malony, Sameer Shende
    {malony,sameer}@cs.uoregon.edu
  • Department of Computer and Information Science
  • Computational Science Institute
  • University of Oregon

J. Davison de St. Germain, Allan Morris, Steven
G. Parker {dav,amorris,sparker}@cs.utah.edu
Department of Computer Science, School of
Computing, University of Utah
2
Outline
  • Scientific software engineering
  • C-SAFE and Uintah Computational Framework (UCF)
  • Goals and design
  • Challenges for performance technology integration
  • TAU performance system
  • Role of performance mapping
  • Performance analysis integration in UCF
  • TAU performance mapping
  • XPARE (eXPeriment Alerting and REporting)
  • Concluding remarks

3
Scientific Software (Performance) Engineering
  • Modern scientific simulation software is complex
  • Large development teams of diverse expertise
  • Simultaneous development on different system
    parts
  • Iterative, multi-stage, long-term software
    development
  • Need support for managing the complex software
    process
  • Software engineering tools for revision control,
    automated testing, and bug tracking are
    commonplace
  • In contrast, tools for performance engineering
    are not commonplace:
  • evaluation (measurement, analysis, benchmarking)
  • optimization (diagnosis, tracking, prediction,
    tuning)
  • Incorporate performance engineering methodology,
    supported by flexible and robust performance
    tools

4
Utah ASCI/ASAP Level 1 Center (C-SAFE)
  • C-SAFE was established to build a problem-solving
    environment (PSE) for the numerical simulation of
    accidental fires and explosions
  • Combine fundamental chemistry and engineering
    physics
  • Integrate non-linear solvers, optimization,
    computational steering, visualization, and
    experimental data verification
  • Support very large-scale coupled simulations
  • Computer science problems
  • Coupling multiple scientific simulation codes
    with different numerical and software properties
  • Software engineering across diverse expert teams
  • Achieving high performance on large-scale systems

5
Example C-SAFE Simulation Problems
Heptane fire simulation
Typical C-SAFE simulation with a billion degrees
of freedom and non-linear time dynamics
Material stress simulation
6
Uintah Problem Solving Environment (PSE)
  • Enhanced SCIRun PSE
  • Pure dataflow → component-based
  • Shared memory → scalable multi-/mixed-mode
    parallelism
  • Interactive only → interactive plus standalone
  • Design and implement Uintah component
    architecture
  • Application programmers provide
  • a description of the computation (tasks and variables)
  • code to perform a task on a single patch
    (sub-region of space), as in the sketch below
  • Components for scheduling, partitioning, load
    balancing, etc.
  • Follow Common Component Architecture (CCA) model
  • Design and implement Uintah Computational
    Framework (UCF) on top of the component
    architecture
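
A minimal sketch of what this looks like in code (names and signatures
here are illustrative, not the exact Uintah API): the programmer
declares a task, states which variables it requires and computes, and
supplies a callback that does the work on one patch.

  // Illustrative sketch only -- modeled on Uintah's task-description
  // style, not its exact API.
  void SerialMPM::scheduleInterpolateParticlesToGrid(Scheduler* sched,
                                                     const Patch* patch)
  {
    Task* t = new Task("SerialMPM::interpolateParticlesToGrid",
                       this, &SerialMPM::interpolateParticlesToGrid);
    t->requires(pXLabel);     // particle positions
    t->requires(pMassLabel);  // particle masses
    t->computes(gMassLabel);  // mass interpolated to grid vertices
    sched->addTask(t, patch); // scheduler inserts it into the task graph
  }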

7
Uintah High-Level Component View
8
Uintah Parallel Component Architecture
9
Uintah Computational Framework (UCF)
  • Execution model based on software (macro)
    dataflow
  • Exposes parallelism and hides data transport
    latency
  • Computations expressed as directed acyclic
    graphs of tasks
  • each task consumes inputs and produces outputs
    (inputs to future tasks)
  • inputs/outputs specified for each patch in a
    structured grid
  • Abstraction of global single-assignment memory
  • DataWarehouse
  • Directory mapping names to values (array
    structured)
  • Write a value once, then communicate it to
    awaiting tasks (see the sketch below)
  • Task graph gets mapped to processing resources
  • Communications schedule approximates global
    optimal
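
Continuing the illustrative sketch above (again, hypothetical names,
not the real UCF interface), a task reads upstream values from the
DataWarehouse and writes each of its results exactly once:

  // Illustrative sketch of single-assignment DataWarehouse semantics.
  void SerialMPM::interpolateParticlesToGrid(const Patch* patch,
                                             DataWarehouse* old_dw,
                                             DataWarehouse* new_dw)
  {
    ParticleVariable<double> pMass;
    old_dw->get(pMass, pMassLabel, patch);    // read an upstream value

    NCVariable<double> gMass;                 // node-centered grid variable
    new_dw->allocate(gMass, gMassLabel, patch);
    // ... interpolate particle mass to grid vertices ...
    new_dw->put(gMass, gMassLabel, patch);    // write once; the framework
                                              // forwards it to waiting tasks
  }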

10
Uintah Task Graph (Material Point Method)
  • Diagram of named tasks (ovals) and data (edges)
  • Imminent computation
  • Dataflow-constrained
  • MPM
  • Newtonian material point motion time step
  • Solid edges: values defined at material points
    (particles)
  • Dashed edges: values defined at vertices (grid)
  • Primed (′) values updated during the time step

11
Example Taskgraphs (MPM and Coupled)
12
Uintah PSE
  • UCF automatically sets up
  • Domain decomposition
  • Inter-processor communication with
    aggregation/reduction
  • Parallel I/O
  • Checkpoint and restart
  • Performance measurement and analysis (stay tuned)
  • Software engineering
  • Coding standards
  • CVS (Commits Y3 - 26.6 files/day, Y4 - 29.9
    files/day)
  • Correctness regression testing with bugzilla bug
    tracking
  • Nightly build (parallel compiles)
  • 170,000 lines of code (Fortran and C tasks
    supported)

13
Performance Technology Integration
  • Uintah presents challenges to performance
    integration
  • Software diversity and structure
  • UCF middleware, simulation code modules
  • component-based hierarchy
  • Portability objectives
  • cross-language and cross-platform
  • multi-parallelism: thread, message passing, mixed
  • Scalability objectives
  • High-level programming and execution abstractions
  • Requires flexible and robust performance
    technology
  • Requires support for performance mapping

14
TAU Performance System Framework
  • Tuning and Analysis Utilities
  • Performance system framework for scalable
    parallel and distributed high-performance
    computing
  • Targets a general complex system computation
    model
  • nodes / contexts / threads
  • Multi-level system / software / parallelism
  • Measurement and analysis abstraction
  • Integrated toolkit for performance
    instrumentation, measurement, analysis, and
    visualization
  • Portable performance profiling/tracing facility
  • Open software approach
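
As a point of reference, TAU's standard instrumentation API is a small
set of macros; a minimal profiling sketch (TAU_PROFILE,
TAU_PROFILE_INIT, and TAU_PROFILE_SET_NODE are part of TAU's C++
measurement API):

  #include <TAU.h>

  void computePatch()
  {
    // Associate a timer with this routine for profiling/tracing
    TAU_PROFILE("computePatch()", "void ()", TAU_USER);
    // ... computation to be measured ...
  }

  int main(int argc, char** argv)
  {
    TAU_PROFILE_INIT(argc, argv);
    TAU_PROFILE_SET_NODE(0);  // node in TAU's node/context/thread model
    computePatch();
    return 0;
  }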

15
TAU Performance System Architecture
[Architecture diagram; TAU trace data can feed third-party tools such
as Paraver and EPILOG]
16
Performance Analysis Objectives for Uintah
  • Micro tuning
  • Optimization of simulation code (task) kernels
    for maximum serial performance
  • Scalability tuning
  • Identification of parallel execution bottlenecks
  • overheads: scheduler, data warehouse, communication
  • load imbalance
  • Adjustment of task graph decomposition and
    scheduling
  • Performance tracking
  • Understand performance impacts of code
    modifications
  • Throughout course of software development
  • C-SAFE application and UCF software

17
Uintah Performance Engineering Approach
  • Contemporary performance methodology focuses on
    control flow (function) level measurement and
    analysis
  • C-SAFE application involves coupled-models with
    task-based parallelism and dataflow control
    constraints
  • Performance engineering on algorithmic (task)
    basis
  • Observe performance based on algorithm (task)
    semantics
  • Analyze task performance characteristics in
    relation to other simulation tasks and UCF
    components
  • scientific component developers can concentrate
    on performance improvement at algorithmic level
  • UCF developers can concentrate on bottlenecks not
    directly associated with simulation module code

18
Task Execution in Uintah Parallel Scheduler
  • Profile methods and functions in scheduler and in
    MPI library
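
The MPI measurements come from interposition on MPI's standard
profiling interface (PMPI), the mechanism TAU's MPI wrapper library is
built on; a minimal sketch of one wrapper:

  #include <mpi.h>

  /* The wrapper intercepts MPI_Send, times the real call via its
     PMPI_ name, and records the elapsed time. */
  int MPI_Send(void* buf, int count, MPI_Datatype type,
               int dest, int tag, MPI_Comm comm)
  {
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    double elapsed = MPI_Wtime() - t0;
    /* ... attribute 'elapsed' to the MPI_Send timer ... */
    (void)elapsed;
    return rc;
  }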

Task execution time dominates (what task?)
Task execution time distribution per process
MPI communication overheads (where?)
  • Need to map performance data!

19
Semantics-Based Performance Mapping
  • Associate performance measurements with
    high-level semantic abstractions
  • Need mapping support in the performance
    measurement system to assign data correctly

20
Hypothetical Mapping Example
  • Particles distributed on surfaces of a cube

Particle* P[MAX];                 /* Array of particles */

int GenerateParticles() {
  /* distribute particles over all faces of the cube */
  for (int face = 0, last = 0; face < 6; face++) {
    /* particles on this face */
    int particles_on_this_face = num(face);
    for (int i = last; i < particles_on_this_face; i++) {
      /* particle properties are a function of face */
      P[i] = ... f(face) ...;
    }
    last = particles_on_this_face;
  }
}
21
Hypothetical Mapping Example (continued)
int ProcessParticle(Particle* p) {
  /* perform some computation on p */
}

int main() {
  GenerateParticles();            /* create a list of particles */
  for (int i = 0; i < N; i++)     /* iterate over the list */
    ProcessParticle(P[i]);
}
  • How much time (flops) spent processing face i
    particles?
  • What is the distribution of performance among
    faces?
  • How is this determined if execution is parallel?
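
One way to make these questions answerable is to attribute each
measurement to the face (the semantic entity) rather than the
enclosing routine. A toy illustration of the idea, independent of
TAU's actual mapping API (a sketch using TAU's mapping macros appears
with the Uintah task mapping slide below): embed the semantic
attribute in the particle and index the measurement by it.

  #include <chrono>

  struct Particle { int face; /* ... other properties ... */ };

  static double face_time[6] = {0};  // seconds attributed to each face

  static double now() {              // any wall-clock source works
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
  }

  void ProcessParticle(Particle* p) {
    double t0 = now();
    /* perform some computation on p */
    face_time[p->face] += now() - t0;  // map cost to the semantic entity
  }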

22
No Performance Mapping versus Mapping
  • Typical performance tools report performance with
    respect to routines
  • Does not provide support for mapping
  • TAU's performance mapping can observe performance
    with respect to the scientist's programming and
    problem abstractions

TAU (w/ mapping)
TAU (no mapping)
23
Uintah Task Performance Mapping
  • Uintah partitions individual particles across
    processing elements (processes or threads)
  • Simulation tasks in task graph work on particles
  • Tasks have domain-specific character in the
    computation
  • interpolate particles to grid in Material Point
    Method
  • Task instances generated for each partitioned
    particle set
  • Execution scheduled with respect to task
    dependencies
  • How to attribute execution time among different
    tasks?
  • Assign semantic name (task type) to a task
    instance
  • SerialMPM::interpolateParticleToGrid
  • Map TAU timer object to (abstract) task (semantic
    entity)
  • Look up timer object using task type (semantic
    attribute)
  • Further partition along different domain-specific
    axes
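
In the scheduler this lookup-and-time pattern sketches out as follows,
using TAU's mapping macros (simplified from the actual scheduler
instrumentation; the task-type string serves as the external
association key):

  // Inside MPIScheduler::execute(), once per task type:
  TAU_MAPPING_CREATE(task->getName(), "[MPIScheduler::execute()]",
                     (TauGroup_t)(void*)task->getName().c_str(),
                     task->getName(), 0);

  // Per task instance: look up the timer via the task type
  TAU_MAPPING_OBJECT(tautimer)
  TAU_MAPPING_LINK(tautimer,
                   (TauGroup_t)(void*)task->getName().c_str());

  TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
  TAU_MAPPING_PROFILE_START(doitprofiler, 0)
  task->doit(pg);                 // execute the task instance
  TAU_MAPPING_PROFILE_STOP(0)     // time accrues to the task type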

24
Task Performance Mapping (Profile)
Mapped task performance across processes
Performance mapping for different tasks
25
Task Performance Mapping (Trace)
Work packet computation events colored by task
type
Distinct phases of computation can be identified
based on task
26
Task Performance Mapping (Trace - Zoom)
Startup communication imbalance
27
Task Performance Mapping (Trace - Parallelism)
Communication / load imbalance
28
Comparing Uintah Traces for Scalability Analysis
29
Performance Tracking and Reporting
  • Integrated performance measurement allows
    performance analysis throughout development
    lifetime
  • Applied performance engineering in software
    design and development (software engineering)
    process
  • Create performance portfolio from regular
    performance experimentation (couple with software
    testing)
  • Use performance knowledge in making key software
    design decisions, prior to major development
    stages
  • Use performance benchmarking and regression
    testing to identify irregularities
  • Support automatic reporting of performance bugs
  • Enable cross-platform (cross-generation)
    evaluation

30
XPARE - eXPeriment Alerting and REporting
  • Experiment launcher automates measurement /
    analysis
  • Configuration and compilation of performance
    tools
  • Instrumentation control for Uintah experiment
    type
  • Execution of multiple performance experiments
  • Performance data collection, analysis, and
    storage
  • Integrated in Uintah software testing harness
  • Reporting system conducts performance regression
    tests
  • Apply performance difference thresholds (alert
    ruleset)
  • Alerts users via email if thresholds have been
    exceeded
  • Web alerting setup and full performance data
    reporting
  • Historical performance data analysis
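
The heart of the alerting check is easy to sketch (illustrative only;
the real XPARE ruleset, alert delivery, and data storage live in the
testing harness):

  #include <cstdio>

  // Compare a new measurement against a stored baseline and alert
  // when the relative slowdown exceeds a threshold from the ruleset.
  bool checkRegression(const char* timerName, double baselineSecs,
                       double currentSecs, double threshold /* e.g. 0.10 */)
  {
    double change = (currentSecs - baselineSecs) / baselineSecs;
    if (change > threshold) {
      // XPARE would send an email/web alert with the full data here.
      std::printf("ALERT: %s slowed by %.1f%% (threshold %.1f%%)\n",
                  timerName, 100 * change, 100 * threshold);
      return false;
    }
    return true;
  }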

31
XPARE System Architecture
32
Scaling Performance Optimizations (Past)
Last year: initial correct scheduler
Reduced communication by 10x
Reduced task graph overhead by 20x
ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory
33
Scalability to 2000 Processors (Current)
ASCI Nirvana, SGI Origin 2000, Los Alamos National Laboratory
34
Concluding Remarks
  • Modern scientific simulation environments involve
    a complex (scientific) software engineering
    process
  • Iterative, diverse expertise, multiple teams,
    concurrent
  • Complex parallel software and systems pose
    challenging performance analysis problems that
    require flexible and robust performance
    technology and methods
  • Cross-platform, cross-language, large-scale
  • Fully-integrated performance analysis system
  • Performance mapping
  • Need to support performance engineering
    methodology within scientific software design and
    development
  • Performance comparison and tracking

35
Acknowledgements
  • Department of Energy (DOE), ASCI Academic
    Strategic Alliances Program (ASAP)
  • Center for the Simulation of Accidental Fires
    and Explosions (C-SAFE), ASCI/ASAP Level 1
    center, University of Utah
    http://www.csafe.utah.edu
  • Computational Science Institute, ASCI/ASAP
    Level 3 projects with LLNL / LANL, University
    of Oregon
    http://www.csi.uoregon.edu
  • ftp://ftp.cs.uoregon.edu/pub/malony/Talks/ishpc2002.ppt