Software Optimization - PowerPoint PPT Presentation

The University of New Mexico.

1
Software Optimization
Vikas, Chaudhary
MA 471
2

Steps to create an executable
  • High-level code
  • Intermediate code
  • Assembly code (translated by the assembler)
  • Object code
  • Executable
3
  • Higher Optimizations
  • Within basic blocks of a procedure.
  • Within single and nested loop structures of a
    procedure.
  • Across an entire procedure, including all blocks
    and structures.
  • File (inter-procedural analysis within a source
    file).
  • Cross-file (inter-procedural analysis across all
    procedures).

4
  • Compiler Options
  • There are no strict rules about what each level
    of optimization means, but generally:
  • O0 does one-to-many translations.
  • O1 does basic block optimizations.
  • O2 does loop optimizations.
  • O4 does interfile optimizations.
  • Some compilers also provide an option such as
    odataprefetch to indicate that prefetch
    instructions should be inserted to prefetch data
    from memory to cache.

5
  • Increasing Register Pressure
  • When too many registers are needed, the compiler
    must store values to memory and restore them
    from memory later (register spilling). This
    degrades performance.
  • If we generate assembly code from the compiler
    with the -S option and see an inordinate number
    of load and store instructions, it implies that
    the compiler is generating too many spills.
  • Use the register keyword carefully.

6
  • Dead Code Elimination
  • Dead code elimination is the removal of code
    that is never used.
  • i = 0;
  • if (i != 0) deadcode(i);
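
A minimal C sketch of this example, assuming deadcode is a hypothetical function with a visible side effect (here a counter and a print). Because i is known to be 0 at the test, the branch can never be taken, so an optimizing compiler is free to drop both the test and the call.

```c
#include <stdio.h>

static int calls = 0;   /* counts how often deadcode() actually runs */

/* Hypothetical stand-in for the slide's deadcode(i). */
static void deadcode(int i) {
    calls++;
    printf("unreachable: %d\n", i);
}

/* As written: the condition is always false because i is 0. */
int before_elimination(void) {
    int i = 0;
    if (i != 0)
        deadcode(i);
    return i;
}

/* What the compiler may effectively generate: test and call removed. */
int after_elimination(void) {
    return 0;
}
```

Both functions return the same value, and deadcode is never called in either of them.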

7
  • Constant Folding and Propagation
  • Constant folding is when expressions with
    multiple constants are folded together and
    evaluated at compile time.
  • a = 1 + 2
  • will be replaced by a = 3.
  • Constant propagation then substitutes the known
    value of a wherever a is used.
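
A short C sketch of folding followed by propagation. The variable b and the multiplier 4 are assumptions added for illustration; only a = 1 + 2 comes from the slide.

```c
/* As written in the source. */
int unfolded(void) {
    int a = 1 + 2;   /* constant expression: folded to 3 at compile time */
    int b = a * 4;   /* a is a known constant, so 3 is propagated here   */
    return b;
}

/* What the compiler may reduce the function to: 3 * 4 folded to 12. */
int folded(void) {
    return 12;
}
```

The two functions are equivalent; the second simply has no arithmetic left to do at run time.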

8
Common Subexpression Elimination
Common subexpression elimination analyzes lines of code,
determines where identical subexpressions are
used, and creates a temporary variable to hold one
instance of these values.
a = b + (c + d)
f = e + (c + d)
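
The transformation can be sketched in C as follows, assuming the operators in the example are additions and that all values are integers. The function names and the temporary t are hypothetical.

```c
/* As written: (c + d) is computed twice. */
void cse_before(int b, int c, int d, int e, int *a, int *f) {
    *a = b + (c + d);
    *f = e + (c + d);
}

/* After elimination: the shared subexpression is computed once. */
void cse_after(int b, int c, int d, int e, int *a, int *f) {
    int t = c + d;   /* temporary holding the common subexpression */
    *a = b + t;
    *f = e + t;
}
```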
 
9
  • Strength Reductions
  • Strength reduction means replacing expensive
    operations with cheaper ones.
  • Replacing integer multiplication or division by
    constants with shift operations.
  • Replacing 32-bit integer division with 64-bit
    floating-point division.
  • Replacing floating-point multiplications by small
    constants with floating-point additions.
  • Replacing the power function with floating-point
    multiplications.
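
A few of these replacements, sketched in C. The function names and the particular constants (8, 4, 2.0, a cube) are assumptions chosen for illustration. Unsigned integers are used because a right shift matches division only for non-negative values.

```c
#include <stdint.h>

/* Multiplication by 8 replaced by a left shift by 3. */
uint32_t mul8(uint32_t x)       { return x * 8u; }
uint32_t mul8_shift(uint32_t x) { return x << 3; }

/* Unsigned division by 4 replaced by a right shift by 2. */
uint32_t div4(uint32_t x)       { return x / 4u; }
uint32_t div4_shift(uint32_t x) { return x >> 2; }

/* Multiplication by the small constant 2.0 replaced by an addition. */
double twice(double x)     { return 2.0 * x; }
double twice_add(double x) { return x + x; }

/* A call such as pow(x, 3.0) replaced by two multiplications. */
double cube(double x) { return x * x * x; }
```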

 
10
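Filling Branch Delay Slots
A branch delay slot is the instruction position
immediately after a branch; the instruction placed
there is always executed, whether or not the branch
is taken. If the compiler is used with no
optimization, it will probably insert a nop into
the branch delay slot. An optimizing compiler tries
to fill the slot with a useful instruction instead.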
 
11
Induction Variable Optimization
for (i = 0; i < n; i += 2)
    ia[i] = i * k + m;
where i is the induction variable. The above code can be
replaced by
ic = m;
for (i = 0; i < n; i += 2) {
    ia[i] = ic;
    ic = ic + 2 * k;
}
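
A runnable C sketch of both forms, assuming ia is an int array and using hypothetical function names. Since i advances by 2 each iteration, the running value ic has to advance by 2 * k to stay equal to i * k + m.

```c
/* Original form: one multiplication per iteration. */
void fill_mul(int *ia, int n, int k, int m) {
    for (int i = 0; i < n; i += 2)
        ia[i] = i * k + m;
}

/* Optimized form: the multiplication is replaced by a running sum. */
void fill_add(int *ia, int n, int k, int m) {
    int ic = m;              /* value of i * k + m when i == 0 */
    for (int i = 0; i < n; i += 2) {
        ia[i] = ic;
        ic += 2 * k;         /* i grows by 2, so i * k grows by 2 * k */
    }
}
```

Only the even-indexed elements are written in either version; the odd-indexed ones are left untouched.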
 
12
  • Loop Fission
  • This technique is often used when an inner
    loop consists of a large number of lines and the
    compiler has difficulty generating code without
    spilling.
  • This technique is also helpful in improving
    cache performance.
  • for (i = 0; i < n; i++)
  •     y[i] = y[i] + x[i] + x[i+m];
  • Suppose x[i] and x[i+m] map to the same
    cache location (direct-mapped cache). This will
    cause cache thrashing.

13
  • The loop can be split as
  • for (i = 0; i < n; i++)
  •     y[i] = y[i] + x[i];
  • for (i = 0; i < n; i++)
  •     y[i] = y[i] + x[i+m];
  • This technique might not be very useful when
    the cache is n-way set associative.
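
The fused and split loops, written out as a C sketch. The function names are hypothetical, and x is assumed to hold at least n + m elements. Whether the split version is actually faster depends on the cache organization, as noted above.

```c
/* Fused loop: x[i] and x[i + m] are touched in the same iteration. */
void fused(double *y, const double *x, int n, int m) {
    for (int i = 0; i < n; i++)
        y[i] = y[i] + x[i] + x[i + m];
}

/* After fission: each loop streams through one region of x. */
void fissioned(double *y, const double *x, int n, int m) {
    for (int i = 0; i < n; i++)
        y[i] = y[i] + x[i];
    for (int i = 0; i < n; i++)
        y[i] = y[i] + x[i + m];
}
```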

14
  • Loop Unrolling
  • This technique reduces the effect of
    branches, instruction latency, and potentially
    the number of cache misses.
  • DO I = 1, N
  •   Y(I) = X(I)
  • ENDDO
  • After unrolling by four:
  • NEND = 4 * (N/4)
  • DO I = 1, NEND, 4
  •   Y(I) = X(I)
  •   Y(I+1) = X(I+1)
  •   Y(I+2) = X(I+2)
  •   Y(I+3) = X(I+3)
  • ENDDO
  • DO I = NEND+1, N
  •   Y(I) = X(I)
  • ENDDO
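
The same four-way unrolling, sketched in C with zero-based indexing. The function name is hypothetical; nend plays the role of NEND, and the second loop handles the leftover iterations when n is not a multiple of 4.

```c
/* Copy x into y, unrolled by four, with a cleanup loop for the remainder. */
void copy_unrolled(double *y, const double *x, int n) {
    int nend = 4 * (n / 4);   /* largest multiple of 4 not exceeding n */
    int i;

    for (i = 0; i < nend; i += 4) {
        y[i]     = x[i];
        y[i + 1] = x[i + 1];
        y[i + 2] = x[i + 2];
        y[i + 3] = x[i + 3];
    }
    for (i = nend; i < n; i++)   /* at most three leftover elements */
        y[i] = x[i];
}
```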

15
  • Loading all the values of X before storing the
    values of Y reduces the possibility of cache
    thrashing.
  • The amount of unrolling can decrease the number
    of software prefetch instructions.
  • Excessive unrolling will cause data to be spilled
    from registers to memory.
  • Unrolling increases the size of the object code,
    which might cause too many instruction cache
    misses.

16
  • Clock Cycles in an Unrolled Loop (CC = clock cycle)

Original order   CC    Modified order   CC
Load X(1)         1    Load X(1)         1
Store Y(1)        7    Load X(2)         2
Load X(2)         8    Load X(3)         3
Store Y(2)       14    Load X(4)         4
Load X(3)        15    Store Y(1)        7
Store Y(3)       21    Store Y(2)        8
Load X(4)        22    Store Y(3)        9
Store Y(4)       28    Store Y(4)       10
17
  • Loop Peeling
  • This technique is used by compilers to
    handle boundary conditions.
  • DO I = 1, N
  •   IF (I .EQ. 1) THEN
  •     X(I) = 0
  •   ELSEIF (I .EQ. N) THEN
  •     X(I) = N
  •   ELSE
  •     X(I) = X(I) + Y(I)
  •   ENDIF
  • ENDDO
  • After loop peeling:
  • X(1) = 0
  • DO I = 2, N-1
  •   X(I) = X(I) + Y(I)
  • ENDDO
  • X(N) = N
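
A C sketch of the same peeling with zero-based indexing, assuming the interior update is an addition and that n is at least 2 (for n of 1 the two Fortran versions would disagree, so that case is excluded here). The function names are hypothetical.

```c
/* Before peeling: the boundary tests run on every iteration. */
void before_peeling(double *x, const double *y, int n) {
    for (int i = 0; i < n; i++) {
        if (i == 0)
            x[i] = 0.0;
        else if (i == n - 1)
            x[i] = (double)n;
        else
            x[i] = x[i] + y[i];
    }
}

/* After peeling (assumes n >= 2): first and last iterations are
   handled outside the loop, leaving a branch-free loop body. */
void after_peeling(double *x, const double *y, int n) {
    x[0] = 0.0;
    for (int i = 1; i < n - 1; i++)
        x[i] = x[i] + y[i];
    x[n - 1] = (double)n;
}
```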

18
  • Software Pipelining
  • Software pipelining is a technique for
    reorganizing loops such that each iteration in the
    software-pipelined code is made from instructions
    chosen from different iterations of the original
    loop.
  • [Figure: Iteration 0, Iteration 1, Iteration 2,
    and Iteration 3 of the original loop]
19
  • Software pipelining is an optimization that is
    impossible to duplicate with high-level code,
    since the multiple assembly-language instructions
    that a single line of high-level language creates
    are moved around extensively.
  • Software pipelining is performed only at high
    optimization levels.
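
Although the compiler does this at the instruction level, the shape of a software-pipelined loop (prologue, kernel, epilogue) can be roughly illustrated in C. This is only a source-level approximation under assumed names: the loop y[i] = a * x[i] and both function names are hypothetical, and a compiler remains free to reorder the statements again.

```c
/* Original loop: each iteration loads, computes, then stores. */
void scale(double *y, const double *x, int n, double a) {
    for (int i = 0; i < n; i++) {
        double t = x[i];   /* load    */
        double r = a * t;  /* compute */
        y[i] = r;          /* store   */
    }
}

/* Hand-pipelined version: each kernel iteration mixes work
   belonging to three different iterations of the original loop. */
void scale_pipelined(double *y, const double *x, int n, double a) {
    if (n < 2) {               /* too short to pipeline */
        scale(y, x, n, a);
        return;
    }
    double t = x[0];           /* prologue: load for iteration 0    */
    double r = a * t;          /* prologue: compute for iteration 0 */
    t = x[1];                  /* prologue: load for iteration 1    */
    for (int i = 0; i < n - 2; i++) {
        y[i] = r;              /* store   for iteration i     */
        r = a * t;             /* compute for iteration i + 1 */
        t = x[i + 2];          /* load    for iteration i + 2 */
    }
    y[n - 2] = r;              /* epilogue: store for iteration n - 2 */
    y[n - 1] = a * t;          /* epilogue: compute and store, n - 1  */
}
```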

20
  • Compiler Speculation with Hardware Support
  • Modern compilers try to speculate, either to
    improve scheduling or to increase the issue rate.
  • Hurdle
  • Conditional instructions.
  • In moving instructions across a branch, the
    compiler must ensure that exception behavior is
    not changed and that dynamic data dependences
    remain the same.
  • The compiler also finds out which registers
    are not being used, and those registers are
    renamed.

21
  • Thank you!
