Profiling, Performance Tuning, and Design Issues

Slides: 17
Provided by: fredann
Learn more at: https://eecs.ceas.uc.edu
1
Profiling, Performance Tuning, and Design Issues
  • Basic Efficiency Guidelines
  • Select best algorithm.
  • How to know? Scalable? Portable?
  • Use efficient libraries when possible
  • Compiler optimizations.
  • Code Optimization

2
Compiler Options for producing the Fastest
Executable
  • Using optimization flags when compiling can
    greatly reduce the runtime of an executable.
  • Each compiler has a different set of options for
    creating the fastest executable.
  • Often the best compiler options can only be
    arrived at by empirical testing and timing of
    your code.
  • A good reference for compiler flags that can be
    used with various architectures is the SPEC web
    site (www.spec.org).
  • Read the compiler man pages.
  • GNU: -O3 -ffast-math -funroll-loops
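
Since the best flags are usually found by empirical testing and timing, a small timing harness helps. The following is a minimal sketch in C, not a definitive benchmark: the kernel dot() and the helper time_dot() are hypothetical names chosen for illustration, and clock() measures CPU time only at coarse resolution.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel to be timed: a plain dot product. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Return the CPU seconds used by `reps` calls of dot().
 * The accumulated result is written to *out; the volatile
 * accumulator keeps the results from being discarded as dead code. */
double time_dot(const double *x, const double *y, size_t n,
                int reps, double *out)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += dot(x, y, n);
    clock_t stop = clock();
    *out = sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

One way to use it: build the same source once with no optimization and once with -O3 -ffast-math -funroll-loops, call time_dot() on large arrays with enough repetitions to run for a second or more, and compare the reported times. Note that -ffast-math may change floating-point results slightly, so the values should be checked as well as the times.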

3
Optimizing Memory Access
  • Memory access is more of a performance bottleneck
    than processor speed
  • Largest potential for performance improvement
  • Access data to minimize out-of-cache memory use
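
As an illustration of accessing data so that it stays in cache, the sketch below sums the same matrix in two loop orders. It assumes a matrix stored row-major in a flat array, as C does; the function names are hypothetical. The first version walks memory with unit stride, the second jumps by a whole row on every access and so typically misses the cache far more often on large matrices.

```c
#include <stddef.h>

/* Unit-stride traversal: the inner loop touches consecutive
 * elements of a row-major rows x cols matrix. */
double sum_row_order(const double *a, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += a[i * cols + j];
    return sum;
}

/* Strided traversal: the inner loop jumps `cols` elements
 * between accesses, so each access may land on a new cache line. */
double sum_col_order(const double *a, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i * cols + j];
    return sum;
}
```

Both functions visit every element exactly once; only the order of memory accesses differs. In Fortran, where arrays are column-major, the preferred loop order is reversed.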

4
Memory Latencies
  • CPU register: 0 cycles
  • L1 cache hit: 2-3 cycles
  • L1 cache miss satisfied by L2 cache hit: 8-12
    cycles
  • L2 cache miss satisfied from main memory, no TLB
    miss: 75-250 cycles
  • TLB miss requiring only reload of the TLB: 2000
    cycles
  • TLB miss requiring reload of virtual page (page
    fault): hundreds of millions of cycles
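
To see how these latencies combine, the sketch below computes the expected cost of one memory access as a weighted average. It is a simplified model under stated assumptions: TLB misses and page faults are ignored, and the fractions of accesses served by each level are hypothetical inputs rather than measured values.

```c
/* Expected cycles per memory access, given the fraction of all
 * accesses served by L1 and by L2; the remainder is assumed to go
 * to main memory. TLB misses and page faults are not modeled. */
double expected_access_cycles(double frac_l1, double frac_l2,
                              double l1_cycles, double l2_cycles,
                              double mem_cycles)
{
    double frac_mem = 1.0 - frac_l1 - frac_l2;
    return frac_l1 * l1_cycles
         + frac_l2 * l2_cycles
         + frac_mem * mem_cycles;
}
```

For example, with 90% of accesses hitting L1 at 3 cycles, 8% hitting L2 at 12 cycles, and 2% going to main memory at 250 cycles, the model gives 2.7 + 0.96 + 5.0 = 8.66 cycles per access. The 2% of accesses that reach main memory account for more than half of the total, which is why reducing cache misses pays off so well.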

5
Other Code Optimizations
  • Copy Propagation
  • Constant Folding
  • Dead Code Removal
  • Induction Variable Simplification
  • Function Inlining
  • Loop Invariant Conditionals
  • Variable Renaming
  • Loop Invariant Code Motion
  • Loop Fusion
  • Pushing Loops inside Subroutines
  • Loop Index Dependent Conditionals
  • Loop Unrolling
  • Loop Stride Size
  • Floating Point Optimizations
  • Faster Algorithms
  • External Libraries
  • Assembly Code
  • Lookup Tables
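
As one concrete instance from this list, the sketch below shows loop invariant code motion done by hand. The function names are hypothetical. Many compilers perform this transformation themselves when optimization is enabled, so the example is meant to show what the technique does rather than to claim a speedup.

```c
#include <stddef.h>

/* Before: scale / norm is evaluated on every iteration,
 * although neither operand changes inside the loop. */
void scale_naive(double *v, size_t n, double scale, double norm)
{
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * (scale / norm);
}

/* After: the invariant division is hoisted out of the loop
 * and computed once. */
void scale_hoisted(double *v, size_t n, double scale, double norm)
{
    const double factor = scale / norm;
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * factor;
}
```

Both versions multiply each element by the same quotient, so they produce the same values; the second performs one division instead of n.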

6
Code Optimization References
  • Software Optimizations for High Performance
    Computing by Crawford and Wadleigh
  • High Performance Computing by Kevin Dowd et al
  • Performance Optimization for Numerically
    Intensive Codes by Goedecker and Hoisie

7
Timing and Profiling Codes
  • Need to know where to focus attention
  • "Premature optimization is the root of all evil"
    (Donald Knuth)
  • The 80-20 rule: codes generally spend 80% of
    their time executing 20% of their instructions
  • Flat profile: shows how much time your program
    spent in each function, and how many times that
    function was called.
  • Call graph: shows, for each function, which
    functions called it, which other functions it
    called, and how many times.
  • Annotated source listing: a copy of the
    program's source code, labeled with the number of
    times each line of the program was executed.
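
To make the flat profile idea concrete, the following hand-instrumented sketch records the call count and accumulated CPU time of a function. It only illustrates what a flat profile contains; it is not how a profiler is implemented, and the struct layout and all function names are hypothetical.

```c
#include <stdio.h>
#include <time.h>

/* One row of a hand-made flat profile (hypothetical layout). */
struct prof_entry {
    const char *name;     /* function name */
    unsigned long calls;  /* number of calls recorded */
    clock_t ticks;        /* CPU time charged to the function */
};

/* Example workload: partial sum of the harmonic series. */
double harmonic_sum(int n)
{
    double sum = 0.0;
    for (int i = 1; i <= n; i++)
        sum += 1.0 / i;
    return sum;
}

/* Call fn(arg) once, adding the call and its CPU time to entry e. */
double profiled_call(struct prof_entry *e, double (*fn)(int), int arg)
{
    clock_t start = clock();
    double result = fn(arg);
    e->ticks += clock() - start;
    e->calls += 1;
    return result;
}

/* Print one line per function: seconds, call count, name. */
void print_flat_profile(const struct prof_entry *entries, int n)
{
    for (int i = 0; i < n; i++)
        printf("%10.6f s %10lu calls  %s\n",
               (double)entries[i].ticks / CLOCKS_PER_SEC,
               entries[i].calls, entries[i].name);
}
```

A caller would keep one prof_entry per function of interest, route calls through profiled_call(), and print the table at exit. A real profiler gathers the same kind of data without any changes to the source.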

8
GNU gprof
  • The first step in generating profile information
    for your program is to compile and link it with
    profiling enabled: use the -pg option when you
    run the compiler. (This is in addition to the
    options you normally use.)
  • The -pg option also works with a command that
    both compiles and links:
  • cc -o myprog myprog.c utils.c -g -pg
  • Execute the code in the normal manner (this
    writes the profile data to gmon.out):
  • ./myprog
  • Create the profile with gprof:
  • gprof myprog > myprog.prof

9
Profiling on the Beowulf Cluster
  • Compile:
  • pgf77 -Mprof=func program.f
  • pgcc -Mprof=func program.c
  • Run the code to produce a profile data file
    called pgprof.out.
  • View the execution profile:
  • pgprof pgprof.out

10
pgprof (without X Windows)
    Loading....
    Datafile: pgprof.out
    Processes: 1
    pgprof> print
                  Time/
      Calls     Call(%)   Time(%)   Cost(%)   Name
    ------------------------------------------------------------------------
    4100500        0.00     23.43        23   lxi (cdnz3d.f:1632)
    4100500        0.00     21.90        22   damping (cdnz3d.f:2319)
    4100500        0.00     21.87        22   leta (cdnz3d.f:1790)
    4100500        0.00     11.68        12   lzeta (cdnz3d.f:1947)
    4100500        0.00     11.24        33   sum (cdnz3d.f:2107)
        250        0.02      5.99        97   page (cdnz3d.f:1527)
                   0.01      2.79         3   tmstep (cdnz3d.f:678)
    pgprof> quit

11
Overview of PAPI
  • Performance Application Programming Interface
  • The purpose of the PAPI project is to design,
    standardize and implement a portable and
    efficient API to access the hardware performance
    monitor counters found on most modern
    microprocessors.
  • Parallel Tools Consortium project
  • http://www.ptools.org/

12
PAPI Counter Interfaces
  • PAPI provides three interfaces to the underlying
    counter hardware:
  • The low level interface manages hardware events
    in user defined groups called EventSets.
  • The high level interface simply provides the
    ability to start, stop and read the counters for
    a specified list of events.
  • Graphical tools to visualize information.

14
Parallel Communication Profiling
  • A significant factor that affects the performance
    of a parallel application is the balance between
    communication and workload.
  • The challenge of the message passing model is in
    reducing message traffic over the interconnection
    network. Performance analysis tools are needed.
  • Two such tools:
  • VAMPIR (http://www.pallas.com) uses the profile
    extensions to MPI and permits analysis of the
    message events where data is transmitted between
    processors during execution of a parallel
    program. It has a user interface with zooming and
    filtering.
  • PARAVER (http://www.cepba.upc.es/) was developed
    to respond to the basic need to have a
    qualitative perception of the application
    behavior by visual inspection, and then to be
    able to focus on the detailed quantitative
    analysis of the problems.
