Title: Scalar Optimizations
1. Scalar Optimizations
2. Roadmap
- Last two lectures
  - Iterative data flow analysis
  - SSA
- Today: selected optimizations using these techniques
  - Constant propagation
  - Copy propagation
  - Code motion for loop invariants
  - Partial redundancy elimination
3. Constant Propagation
- s: x ← C    // for some constant C
  ...
  u: ... x ...    // a use of x
- If statement s is the only definition of x reaching statement u, we can replace x with the constant C
  - Saves a register
  - Enables constant folding and dead code elimination
  - Can potentially remove conditional branches
- What if more than one definition reaches u?
  - Data-flow analysis across basic blocks
- Replacement is iterative
  - One replacement may trigger more opportunities
4. Using Dataflow Equations
- ConstIn(b): pairs of <variable, value> that the compiler can prove to hold on entry to block b
  - One <variable, value> pair for each variable
  - <x,C> ∈ ConstIn(b): variable x is guaranteed to take the constant value C on entry to block b
  - <x,NAC>: x is guaranteed not to be a constant
  - <x,UNDEF>: we know nothing assertive about x
- ConstIn(b) = ∧ ConstOut(j) over blocks j ∈ Pred(b)
- Meet operation ∧ for the pairs
  - <x,c> ∧ <x,c> = <x,c>
  - <x,c1> ∧ <x,c2> = <x,NAC>    (c1 ≠ c2)
  - <x,c> ∧ <x,NAC> = <x,NAC>
  - <x,c> ∧ <x,UNDEF> = <x,c>
  - <x,UNDEF> ∧ <x,NAC> = <x,NAC>
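The meet rules above can be sketched directly in Python. This is a minimal sketch: `UNDEF` and `NAC` are sentinel strings here, and anything else is treated as a concrete constant.

```python
# Lattice values: "UNDEF" (top), "NAC" (bottom); anything else is a constant.
UNDEF, NAC = "UNDEF", "NAC"

def meet(a, b):
    """Meet of two lattice values for one variable, per the rules above."""
    if a == UNDEF:
        return b            # <x,UNDEF> is the identity of the meet
    if b == UNDEF:
        return a
    if a == NAC or b == NAC:
        return NAC          # NAC absorbs everything else
    return a if a == b else NAC   # equal constants survive; unequal -> NAC

def meet_maps(m1, m2):
    """Pointwise meet over all variables: ConstIn from two ConstOut maps."""
    return {x: meet(m1.get(x, UNDEF), m2.get(x, UNDEF))
            for x in set(m1) | set(m2)}
```

`meet_maps` computes ConstIn(b) pointwise from two predecessors' ConstOut maps; merging <X,2>, <Y,3> with <X,3>, <Y,2> yields NAC for both variables.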
5. Using Dataflow Equations
- ConstOut(b): pairs of <variable, value> on exit from block b
  - Initialized to ConstIn(b) and modified by processing each statement s in block b in order
  - If s is a simple copy x ← y, the value of y decides x
  - If s is a computation x ← y op z, the values of y and z decide x
    - <x, c1 op c2> ∈ ConstOut if <y,c1> and <z,c2> ∈ ConstOut
    - <x,NAC> ∈ ConstOut if either <y,NAC> or <z,NAC> ∈ ConstOut
    - <x,UNDEF> ∈ ConstOut otherwise
  - If s is a function call or an assignment via a pointer, <x,NAC> ∈ ConstOut
- An optimization opportunity exists only for x such that <x,C> ∈ ConstIn(b) for some constant C
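The transfer function above can be sketched the same way; the three-tuple IR encoding (`copy`, `op`, `call`) is an assumption made for this example, not the lecture's IR.

```python
UNDEF, NAC = "UNDEF", "NAC"
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def transfer(stmt, env):
    """Update the <variable, value> map for one statement, per the rules above.
    stmt is a hypothetical IR tuple: ("copy", x, y), ("op", x, op, y, z),
    or ("call", x) for a call / pointer store that clobbers x."""
    env = dict(env)                         # ConstOut starts as a copy of ConstIn
    if stmt[0] == "copy":                   # x <- y: the value of y decides x
        _, x, y = stmt
        env[x] = env.get(y, UNDEF)
    elif stmt[0] == "op":                   # x <- y op z
        _, x, op, y, z = stmt
        vy, vz = env.get(y, UNDEF), env.get(z, UNDEF)
        if vy == NAC or vz == NAC:
            env[x] = NAC
        elif vy == UNDEF or vz == UNDEF:
            env[x] = UNDEF
        else:
            env[x] = OPS[op](vy, vz)        # both constants: fold to c1 op c2
    elif stmt[0] == "call":
        _, x = stmt
        env[x] = NAC
    return env
```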
6. Example
[CFG figure: two branches merging.]
  Entry: <X,UNDEF>, <Y,UNDEF>
  Left branch:  X ← 2; Y ← 3    →  <X,2>, <Y,3>
  Right branch: X ← 3; Y ← 2    →  <X,3>, <Y,2>
  At the merge (meet): <X,NAC>, <Y,NAC>
  Z ← X + Y    →  <X,NAC>, <Y,NAC>, <Z,NAC>
7. Constant Propagation w/ SSA
- For statements xi ← C, for some constant C, replace all uses of xi with C
- For xi ← φ(C,C,...,C), with all arguments the same constant C, replace the statement with xi ← C
- Iterate
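A minimal sketch of the two SSA rewrite rules, iterated to a fixed point. The statement encoding (a dest paired with an int constant or a `("phi", args)` tuple) is hypothetical, chosen only to illustrate the iteration.

```python
def ssa_const_prop(stmts):
    """Iterate the two rules above over a toy SSA statement list.
    Each statement is (dest, rhs), where rhs is an int constant or a
    ("phi", [args]) tuple whose args are variable names or ints."""
    consts = {}
    changed = True
    while changed:
        changed = False
        for dest, rhs in stmts:
            if dest in consts:
                continue
            if isinstance(rhs, int):            # xi <- C
                consts[dest] = rhs
                changed = True
            elif isinstance(rhs, tuple) and rhs[0] == "phi":
                # Substitute known constants into the phi arguments.
                vals = [consts.get(a, a) if isinstance(a, str) else a
                        for a in rhs[1]]
                if all(isinstance(v, int) for v in vals) and len(set(vals)) == 1:
                    consts[dest] = vals[0]      # xi <- phi(C,...,C) becomes xi <- C
                    changed = True
    return consts
```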
8. Example: SSA
[CFG figure: the entry block assigns a1 ← 3 and d1 ← 2; the loop header computes d3 ← φ(d2,d1) and a3 ← φ(a2,a1), then f1 from a3 and d3 and g1 from 5 and a2, and branches on f1 < g1; one successor computes f2 from g1 and 1 and branches on g1 < a2; a later block computes f3 ← φ(f2,f1) and d2 ← 2. The annotations show each use of a1 and d1 replaced by the constants 3 and 2.]
9. Example: SSA
[Same CFG after one round of replacement. Since d2 ← 2 and d1 ← 2, both arguments of d3 ← φ(d2,d1) are the constant 2, so d3 can be replaced by 2 in the next round.]
10. Example: SSA
- This may continue for a few steps ...
11. Scalar Optimizations
- Constant propagation
- Copy propagation
- Code motion for loop invariants
- Partial redundancy elimination
12. Copy Propagation
  b ← a
  c ← 4 * b; if c > b ...
  d ← b + 2
  e ← a + b
- Idea: use v in place of u wherever possible after the copy statement u ← v
- Benefits
  - Can create dead code
  - Can enable algebraic simplifications
13. Using Dataflow Analysis
- Finding copies in blocks can be represented by a dataflow analysis framework similar to the one for constant propagation
- A pair <u,v> indicates that the value is copied from v to u
- Data flow direction?
  - Forward analysis
- Meet operator?
  - CopyIn(b) = ∩ CopyOut(j) over every predecessor j of b
- Transfer function?
  - CopyOut(b) is computed from CopyIn(b) by processing each operation in b
  - Similar to constant propagation
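The forward copy analysis can be sketched as follows; the two-statement toy IR (`copy`, plus a generic `def` that kills pairs mentioning the defined variable) is an assumption for illustration.

```python
def copy_transfer(block, copies):
    """Propagate the set of <u,v> copy pairs through one block.
    block is a list of ("copy", u, v) and ("def", x) statements."""
    copies = set(copies)
    for stmt in block:
        if stmt[0] == "copy":
            _, u, v = stmt
            # Redefining u kills any pair mentioning u, then <u,v> holds.
            copies = {(a, b) for (a, b) in copies if u not in (a, b)}
            copies.add((u, v))
        else:
            _, x = stmt          # any other definition of x kills pairs using x
            copies = {(a, b) for (a, b) in copies if x not in (a, b)}
    return copies

def copy_analysis(blocks, preds, all_pairs):
    """Forward fixed point: CopyIn(b) = intersection of CopyOut(preds);
    the entry starts empty, everything else is initialized to all_pairs."""
    copy_out = {b: set(all_pairs) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            ps = preds.get(b, [])
            cin = set.intersection(*(copy_out[p] for p in ps)) if ps else set()
            out = copy_transfer(blocks[b], cin)
            if out != copy_out[b]:
                copy_out[b], changed = out, True
    return copy_out
```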
14. Example 1
[Straight-line code annotated with CopyIn/CopyOut; once the pair <b,a> holds, each use of b is rewritten to a:]
  b ← a                        b ← a
  c ← 4 * b; if c > b    →     c ← 4 * a; if c > a    {<b,a>}
  d ← b + 2              →     d ← a + 2              {<b,a>}
  e ← a + b              →     e ← a + a              {<b,a>}
15. Example 2
[CFG annotated with CopyIn/CopyOut before rewriting; block labels are added here for readability:]
  B1: c ← a + b; d ← c; e ← d + d            CopyOut = {<d,c>}
  B2: f ← a + c; g ← e; a ← g + d; if a < c  CopyIn = {<d,c>}, CopyOut = {<d,c>, <g,e>}
  B3: f ← d + g; if f > a; h ← g + 1         CopyIn = CopyOut = {<d,c>, <g,e>}
  B4: b ← g + a; if h < f; c ← 2             CopyIn = {<d,c>, <g,e>}, CopyOut = {<g,e>}
16. Example 2
[The same CFG after copy propagation: d is replaced by c and g by e wherever the copy pair reaches the use:]
  B1: c ← a + b; d ← c; e ← c + c
  B2: f ← a + c; g ← e; a ← e + c; if a < c
  B3: f ← c + e; if f > a; h ← e + 1
  B4: b ← e + a; if h < f; c ← 2
17. Scalar Optimizations
- Constant propagation
- Copy propagation
- Code motion for loop invariants
- Partial redundancy elimination
18. Loop Invariant Code Motion
- A loop invariant expression is a computation whose value does not change as long as control stays in the loop
- Code motion is the optimization that finds loop invariants and moves them out of the loop
    while (i < limit - 2) {...}
      ⇒
    t = limit - 2;
    while (i < t) {...}
19. Part 1: Detecting Loop Invariants
- Mark invariant all statements whose operands either are constants or have all their reaching definitions outside the loop
- How to know this? Iterate until there are no more invariants to mark
  - Iteratively mark all statements whose operands are constants, have all reaching definitions outside the loop, or have only invariant reaching definitions
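The iterative marking above can be sketched as a simple fixed point. The sketch assumes each variable has a single definition site in the loop and that reaching-definitions results are summarized in a precomputed set; a real implementation would query reaching definitions per use.

```python
def find_invariants(loop_stmts, reaching_defs_outside):
    """loop_stmts: dict stmt_id -> (dest, [operands]); an operand is an int
    constant or a variable name.  reaching_defs_outside: variables whose
    reaching definitions at their uses are all outside the loop."""
    defs_in_loop = {dest: sid for sid, (dest, _) in loop_stmts.items()}
    invariant = set()
    changed = True
    while changed:                      # iterate until nothing new is marked
        changed = False
        for sid, (dest, operands) in loop_stmts.items():
            if sid in invariant:
                continue
            if all(isinstance(o, int)
                   or o in reaching_defs_outside
                   or defs_in_loop.get(o) in invariant
                   for o in operands):
                invariant.add(sid)
                changed = True
    return invariant
```

On the running example, t1 ← n * 2 is marked in the first pass (n is defined outside, 2 is a constant), and any statement whose only loop-defined operand is t1 is marked in a later pass.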
20. Loop Invariants
Source:
    do i = 1, 100
      k = i * (n * 2)
      do j = i, 100
        a(i,j) = 100 * n + 10 * k + j
      end
    end
[CFG of the lowered loop nest:]
  i ← 1
  outer test: i < 100
  outer body: t1 ← n * 2; k ← i * t1; j ← i
  inner test: j < 100
  inner body: t2 ← 100 * n; t3 ← 10 * k; t4 ← t2 + t3; t5 ← t4 + j; j ← j + 1
  after inner loop: i ← i + 1
21. Loop Invariants: SSA
Source:
    do i = 1, 100
      k = i * (n * 2)
      do j = i, 100
        a(i,j) = 100 * n + 10 * k + j
      end
    end
[CFG in SSA form:]
  i1 ← 1
  outer header: i2 ← φ(i1,i3); i2 < 100
  outer body: t10 ← n1 * 2; k1 ← i2 * t10; j1 ← i2
  inner header: j2 ← φ(j1,j3); j2 < 100
  inner body: t20 ← 100 * n1; t30 ← 10 * k1; t40 ← t20 + t30; t50 ← t40 + j2; j3 ← j2 + 1
  after inner loop: i3 ← i2 + 1
22. Part 2: Code Motion
- An invariant statement s: x ← y op z can sometimes be moved out of the loop
- Code can be moved to just before the header
  - It will dominate the whole loop after code motion
- Three conditions (following slides)
[Figure: an invariant statement in the loop body is hoisted to a point just before the loop header.]
23. Code Motion
- Condition 1: to move invariant t ← x op y, either the block containing the invariant must dominate all loop exits, or t must not be live-out of any loop exit
[Figure: x ← 1 before the loop; inside the loop, one path assigns x ← 2 (the invariant) and u ← u + 1; the loop repeats while v < 20 after v ← v + 1; x is used after the loop. The invariant's block does not dominate the exit and x is live-out, so hoisting x ← 2 could change the value used after the loop: violation of Condition 1.]
24. Code Motion
- Condition 2: to move invariant t ← x op y, it must be the only definition of t in the loop
[Figure: the same loop, but with both x ← 3 and the invariant x ← 2 inside the loop; hoisting x ← 2 would change which definition reaches the use of x after the loop: violation of Condition 2.]
25. Code Motion
- Condition 3: to move invariant t ← x op y, no use of t in the loop may be reached by any other definition of t
[Figure: the loop contains k ← x before the invariant x ← 2; on the first iteration that use is reached by x ← 1 from outside the loop, so hoisting x ← 2 would change the value k receives: violation of Condition 3.]
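The three conditions can be bundled into one check, assuming the dominance, liveness, and reaching-definition facts have already been computed elsewhere; all the field names below are hypothetical.

```python
def can_hoist(t, def_block, loop):
    """Check the three conditions for hoisting an invariant t <- x op y.
    loop bundles precomputed facts (hypothetical field names):
      "exits": loop-exit blocks; "dominated_by"[b]: blocks dominated by b;
      "live_out_of_exit": variables live-out at some loop exit;
      "defs"[t]: number of definitions of t in the loop;
      "uses_reached_by_other_defs": variables with a loop use reached by
      a different definition of the same variable."""
    # Condition 1: dominate all exits, or t not live-out of any exit.
    cond1 = (all(e in loop["dominated_by"][def_block] for e in loop["exits"])
             or t not in loop["live_out_of_exit"])
    # Condition 2: the only definition of t in the loop.
    cond2 = loop["defs"][t] == 1
    # Condition 3: no use of t in the loop reached by another definition.
    cond3 = t not in loop["uses_reached_by_other_defs"]
    return cond1 and cond2 and cond3
```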
26. Code Motion Example
[The SSA loop nest from slide 21. Assuming t1 is not live outside the loop nest, the statement t10 ← n1 * 2 is invariant and all three conditions are met, so it can be hoisted out of the outer loop.]
  i1 ← 1
  outer header: i2 ← φ(i1,i3); i2 < 100
  outer body: t10 ← n1 * 2; k1 ← i2 * t10; j1 ← i2
  inner header: j2 ← φ(j1,j3); j2 < 100
  inner body: t20 ← 100 * n1; t30 ← 10 * k1; t40 ← t20 + t30; t50 ← t40 + j2; j3 ← j2 + 1
  after inner loop: i3 ← i2 + 1
27. Code Motion Example
[After hoisting, t10 ← n1 * 2 sits in the preheader:]
  i1 ← 1; t10 ← n1 * 2
  outer header: i2 ← φ(i1,i3); i2 < 100
  outer body: k1 ← i2 * t10; j1 ← i2
  inner header: j2 ← φ(j1,j3); j2 < 100
  inner body: t20 ← 100 * n1; t30 ← 10 * k1; t40 ← t20 + t30; t50 ← t40 + j2; j3 ← j2 + 1
  after inner loop: i3 ← i2 + 1
[Now t20 ← 100 * n1, t30 ← 10 * k1, and t40 ← t20 + t30 are invariant in the inner loop and meet all conditions, assuming t2, t3, and t4 are not live outside the loop nest.]
28. Code Motion Example
[After also hoisting the statements that are invariant in the inner loop and meet all conditions.]
29. Scalar Optimizations
- Constant propagation
- Copy propagation
- Code motion for loop invariants
- Partial redundancy elimination
30. Redundant Expressions
- Expression E is redundant at point p if
  - On every path to p, E has been evaluated before reaching p, and none of the constituent values of E has been redefined after that evaluation
- Expression E is partially redundant at point p if
  - E is redundant along some, but not all, paths to p
- To optimize: insert code to make a partially redundant expression fully redundant
31. Loop Invariants
- Loop invariant expressions are partially redundant
  - Available in all loop iterations except the very first one
- Code motion works by making the expression fully redundant
[Figure: on the left, a ← b + c followed on every path by another a ← b + c (redundant); on the right, a ← b + c inside a loop body is redundant along the back edge but not on loop entry (partially redundant).]
32. Partial Redundancy Elimination
- Uses standard data-flow techniques to figure out where to move the code
- Subsumes classical global common subexpression elimination and code motion of loop invariants
- Used by many optimizing compilers
- Traditionally applied to lexically equivalent expressions
  - With SSA support, applied to values as well
33. Partial Redundancy Elimination
- May add a block to deal with critical edges
- Critical edge: an edge leading from a block with more than one successor to a block with more than one predecessor
[Figure: a ← d + e is computed in one predecessor of a join block that computes c ← d + e. The predecessor is rewritten to t ← d + e; a ← t, the critical edge from the other predecessor is split so t ← d + e can be inserted on it, and the join block becomes c ← t.]
34. Partial Redundancy Elimination
- Code duplication to deal with redundancy
[Figure: a ← d + e reaches block B4 along one path; B4 computes c ← d + e. Duplicating B4 per incoming path lets the copy on the path through a ← d + e; t ← a use c ← t, while the other copy keeps c ← d + e.]
- Can we find a way to deal with redundancy in general?
35. Lazy Code Motion
- Redundancy: common expressions, loop invariant expressions, partially redundant expressions
- Desirable properties
  - All redundant computations of expressions that can be eliminated without code duplication are eliminated
  - The optimized program does not perform any computation that is not in the original program execution
  - Expressions are computed at the latest possible time
36. Lazy Code Motion
- Solve four data-flow problems that reveal the limits of code motion
  - AVAIL: available expressions
  - ANT: anticipable expressions
  - EARLIEST: earliest placement for expressions
  - LATER: expressions that can be postponed
- Compute INSERT and DELETE sets from the data-flow solutions for each basic block
  - They define how to move expressions between basic blocks
37. Lazy Code Motion
[Example CFG, blocks B1-B9: B1 assigns z from a and tests x > 3; B2 computes z ← x + y and tests y < 5; B4 tests z < 7; B7 computes b ← x + y; B9 computes c ← x + y. The expression x + y is computed three times. Can we make this better?]
38. Lazy Code Motion
[The same CFG with two placement points for x + y marked. Placing the computation at these points ensures our conditions: every later evaluation of x + y becomes redundant.]
39. Local Information
- For each block b, compute the local sets
  - DEExpr(b): an expression is downward-exposed (locally generated) if it is computed in b and its operands are not modified after its last computation
  - UEExpr(b): an expression is upward-exposed if it is computed in b and its operands are not modified before its first computation
  - NotKilled(b): an expression is not killed if none of its operands is modified in b
- Example:
    f ← b + d
    a ← b + c
    d ← a + e
  DEExpr = {a+e, b+c}; UEExpr = {b+d, b+c}; NotKilled = {b+c}
40. Local Information
- What do they imply?
  - e ∈ DEExpr(b) ⇒ evaluating e at the end of b produces the same result as evaluating it at its original position in b
  - e ∈ UEExpr(b) ⇒ evaluating e at the entry of b produces the same result as evaluating it at its original position in b
  - e ∈ NotKilled(b) ⇒ evaluating e at either the start or the end of b produces the same result as evaluating it at its original position
- Example:
    f ← b + d
    a ← b + c
    d ← a + e
  DEExpr = {a+e, b+c}; UEExpr = {b+d, b+c}; NotKilled = {b+c}
41. Global Information
- Availability
  - AvailIn(n0) = Ø
  - AvailIn(b) = ∩_{x ∈ pred(b)} AvailOut(x), b ≠ n0
  - AvailOut(b) = DEExpr(b) ∪ (AvailIn(b) ∩ NotKilled(b))
  - Initialize AvailIn and AvailOut to the set of all expressions for every block except the entry block n0
- Interpreting Avail sets
  - e ∈ AvailOut(b) ⇒ evaluating e at the end of b produces the same value for e as its most recent evaluation, whether that evaluation is inside b or not
  - AvailOut tells the compiler how far forward e can move
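The availability equations above can be solved with a simple round-robin fixed point; a minimal sketch over sets of expression names:

```python
def compute_avail(blocks, preds, entry, de_expr, not_killed, all_exprs):
    """Round-robin fixed point for the Avail equations above."""
    avail_in = {b: set() if b == entry else set(all_exprs) for b in blocks}
    avail_out = {b: set(all_exprs) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b != entry:                     # AvailIn(n0) stays empty
                ps = preds.get(b, [])
                avail_in[b] = (set.intersection(*(avail_out[p] for p in ps))
                               if ps else set())
            out = de_expr[b] | (avail_in[b] & not_killed[b])
            if out != avail_out[b]:
                avail_out[b], changed = out, True
    return avail_in, avail_out
```

On a diamond CFG where x + y is computed only on one branch, the solution shows x + y available at the end of that branch but not at the join, i.e. it is only partially redundant there.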
42. Global Information
- Anticipability
  - Expression e is anticipated at a point p if e is certain to be evaluated along every path leaving p before any recomputation of e's operands
  - AntOut(nf) = Ø
  - AntOut(b) = ∩_{x ∈ succ(b)} AntIn(x), b ≠ nf
  - AntIn(b) = UEExpr(b) ∪ (AntOut(b) ∩ NotKilled(b))
  - Initialize AntOut to the set of all expressions for every block except the exit block nf
- Interpreting Ant sets
  - e ∈ AntIn(b) ⇒ evaluating e at the start of b produces the same value for e as evaluating it at its original position (later than the start of b), with no additional overhead
  - AntIn tells the compiler how far backward e can move
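Anticipability is the backward mirror of availability; a minimal sketch of the fixed point for the equations above:

```python
def compute_ant(blocks, succs, exit_block, ue_expr, not_killed, all_exprs):
    """Round-robin fixed point for the Ant equations above (backward)."""
    ant_out = {b: set() if b == exit_block else set(all_exprs) for b in blocks}
    ant_in = {b: set(all_exprs) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):
            if b != exit_block:                # AntOut(nf) stays empty
                ss = succs.get(b, [])
                ant_out[b] = (set.intersection(*(ant_in[s] for s in ss))
                              if ss else set())
            new_in = ue_expr[b] | (ant_out[b] & not_killed[b])
            if new_in != ant_in[b]:
                ant_in[b], changed = new_in, True
    return ant_in, ant_out
```

On a diamond CFG where x + y is upward-exposed only in the join block, the solution shows x + y anticipated all the way back to the entry: it is evaluated on every path leaving it.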
43. Example
[The example CFG again, blocks B1-B9, with x + y computed in B2, B7, and B9.]
44. Example: Avail
- AvailIn(b) = ∩_{x ∈ pred(b)} AvailOut(x)
- AvailOut(b) = DEExpr(b) ∪ (AvailIn(b) ∩ NotKilled(b))
[The CFG annotated with the blocks and edges where x + y is available.]
45. Example: Ant
- AntOut(b) = ∩_{x ∈ succ(b)} AntIn(x)
- AntIn(b) = UEExpr(b) ∪ (AntOut(b) ∩ NotKilled(b))
[The CFG annotated with the blocks and edges where x + y is anticipated.]
46. Example: Avail and Ant
[The CFG with both annotations. The interesting spots are the points where x + y is anticipated but not available.]
47. Example: EARLIEST
- EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i) ∩ ¬(NotKilled(i) ∩ AntOut(i))
[The CFG with the edges in EARLIEST for x + y marked.]
48. Placing Expressions
- Earliest placement
  - For an edge <i,j> in the CFG, an expression e is in EARLIEST(i,j) if and only if the computation can legally move to <i,j> and cannot move to any earlier edge
  - EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i) ∩ ¬(NotKilled(i) ∩ AntOut(i))
  - e ∈ AntIn(j): we can move e to the start of block j without introducing unnecessary computation
  - e ∉ AvailOut(i): no previous computation of e is available at the exit of i; if one existed, it would make the computation on <i,j> redundant
  - e ∈ Killed(i) ∪ ¬AntOut(i): we cannot move e further upward
    - e ∈ Killed(i): e cannot be moved to an edge <x,i> with the same value
    - e ∉ AntOut(i): there is another path starting with an edge <i,x> along which e is not evaluated with the same value
49. Placing Expressions
- Earliest placement
  - For an edge <i,j> in the CFG, an expression e is in EARLIEST(i,j) if and only if the computation can legally move to <i,j> and cannot move to any earlier edge
  - EARLIEST(i,j) = AntIn(j) ∩ ¬AvailOut(i) ∩ ¬(NotKilled(i) ∩ AntOut(i))
- For the entry node: EARLIEST(n0,j) = AntIn(j) ∩ ¬AvailOut(n0)
  - We can never move e before the entry point n0, so the last term is ignored
  - n0 must be the dummy entry point
50. Postponing Evaluations
- We want to delay the evaluation of expressions as long as possible
  - Motivation: save register usage
- There is a limit to this delay
  - Not past a use of the expression
  - Not so far that we end up recomputing an expression that has already been evaluated
51. Placing Expressions
- Later (than earliest) placement
  - An expression e is in LaterIn(k) if the evaluation of e can be moved through the entry of k without losing any benefit
    - e ∈ LaterIn(k) if and only if every path that reaches k includes an edge <p,q> such that e ∈ EARLIEST(p,q), and the path from q to k neither kills e nor uses e
  - LaterIn(j) = ∩_{i ∈ pred(j)} LATER(i,j), j ≠ n0
  - LaterIn(n0) = Ø
  - LATER(i,j) = EARLIEST(i,j) ∪ (LaterIn(i) ∩ ¬UEExpr(i)), i ∈ pred(j)
  - An expression e is in LATER(i,j) if the evaluation of e can be moved (postponed) to CFG edge <i,j>
    - e ∈ LATER(i,j) if <i,j> is its earliest placement, or if it can be moved to the entry of i and there is no evaluation (use) of e in block i
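The LATER equations above form another forward fixed point, this time over edges as well as blocks; a minimal sketch:

```python
def compute_later(blocks, preds, entry, earliest_sets, ue_expr, all_exprs):
    """Fixed point for LATER/LaterIn per the equations above.
    earliest_sets maps each CFG edge (i, j) to EARLIEST(i,j)."""
    later_in = {b: set() if b == entry else set(all_exprs) for b in blocks}
    later = {}
    changed = True
    while changed:
        changed = False
        for j in blocks:
            for i in preds.get(j, []):
                # LATER(i,j) = EARLIEST(i,j) | (LaterIn(i) - UEExpr(i))
                later[(i, j)] = (earliest_sets[(i, j)]
                                 | (later_in[i] - ue_expr[i]))
            if j != entry and preds.get(j):
                new_in = set.intersection(*(later[(i, j)] for i in preds[j]))
                if new_in != later_in[j]:
                    later_in[j], changed = new_in, True
    return later, later_in
```

On a chain A → B → C where x + y is in EARLIEST(A,B) and used only in C, the evaluation can be postponed all the way to C's entry.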
52. Example: LATER
- LaterIn(j) = ∩_{i ∈ pred(j)} LATER(i,j), j ≠ n0
- LATER(i,j) = EARLIEST(i,j) ∪ (LaterIn(i) ∩ ¬UEExpr(i)), i ∈ pred(j)
[The CFG with the LATER edges and LaterIn blocks for x + y marked.]
53. Rewriting Code
- INSERT set for each CFG edge
  - The computations that LCM should insert on that edge
  - INSERT(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
  - e ∈ INSERT(i,j) means an evaluation of e should be added between block i and block j
  - Three possible places to add it
- DELETE set for each block
  - The computations that LCM should delete from that block
  - DELETE(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0
  - The first computation of e in i is redundant
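Given the LATER and LaterIn solutions, the INSERT and DELETE sets are pure set computations; a minimal sketch:

```python
def insert_delete(blocks, preds, entry, later, later_in, ue_expr):
    """INSERT(i,j) = LATER(i,j) - LaterIn(j);
       DELETE(i)  = UEExpr(i) - LaterIn(i), for i other than the entry."""
    insert = {(i, j): later[(i, j)] - later_in[j]
              for j in blocks for i in preds.get(j, [])}
    delete = {i: set() if i == entry else (ue_expr[i] - later_in[i])
              for i in blocks}
    return insert, delete
```

If the evaluation of x + y can be postponed onto edge (B,C) but no further, it lands in INSERT(B,C), and the now-redundant upward-exposed evaluation in C lands in DELETE(C).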
54. Example: INSERT and DELETE
- INSERT(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
- DELETE(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0
[The CFG with the LATER sets for x + y marked.]
55. Rewriting Code
- INSERT set for each CFG edge
  - The computations that LCM should insert on that edge
  - INSERT(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
  - If i has only one successor, insert the computations at the end of i
  - If j has only one predecessor, insert the computations at the entry of j
  - Otherwise, split the edge and insert the computations in a new block between i and j
- DELETE set for each block
  - The computations that LCM should delete from that block
  - DELETE(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0
  - The first computation of e in i is redundant; remove it
56. Inserting Code
- Evaluation placement for x ∈ INSERT(i,j)
- Three cases
  - |succs(i)| = 1 ⇒ insert at the end of i
  - |succs(i)| > 1 but |preds(j)| = 1 ⇒ insert at the start of j
  - |succs(i)| > 1 and |preds(j)| > 1 ⇒ create a new block on <i,j> for x
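The three placement cases can be sketched directly; the returned location tags are hypothetical names for the cases above.

```python
def place_insertions(insert, succs, preds):
    """Map each non-empty INSERT(i,j) to a physical location,
    per the three cases above."""
    placements = []
    for (i, j), exprs in insert.items():
        if not exprs:
            continue
        if len(succs[i]) == 1:
            placements.append(("end_of", i, exprs))             # case 1
        elif len(preds[j]) == 1:
            placements.append(("start_of", j, exprs))           # case 2
        else:
            placements.append(("new_block_on", (i, j), exprs))  # case 3: split edge
    return placements
```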
57. Example: INSERT and DELETE
- INSERT(i,j) = LATER(i,j) ∩ ¬LaterIn(j)
- DELETE(i) = UEExpr(i) ∩ ¬LaterIn(i), i ≠ n0
[The CFG with the edge in INSERT for x + y marked.]
58. Example: Rewriting
[The rewritten CFG: B2 becomes t1 ← x + y; z ← t1; if y < 5. An evaluation t1 ← x + y is inserted on the marked edge into B7, B7 becomes b ← t1, and B9 becomes c ← t1. If we put a t1 ← x + y assignment in B5, x + y is fully redundant in B9.]
59. Lazy Code Motion
- Step 1: identify the limits of code motion
  - Available expressions
  - Anticipated expressions
- Step 2: move expression evaluations up
  - Later ones may become redundant
- Step 3: move expression evaluations down
  - Delay evaluation to minimize register lifetimes
- Step 4: rewrite the code
60. Lazy Code Motion
- A powerful algorithm
  - Finds different forms of redundancy in a unified framework
  - Subsumes loop invariant code motion and common subexpression elimination
- Data-flow analysis
  - Composes several simple data-flow analyses to produce a powerful result
61. Summary
- Scalar optimizations
  - Constant propagation
  - Copy propagation
  - Code motion for loop invariants
  - Partial redundancy elimination
- Advanced topics
  - EAC Ch. 10.4: combining multiple optimizations, choosing an order, other objectives
  - Dragon Ch. 9.7-9.8: region-based data-flow analysis
62. Next Lecture (after the midterm)
- Topic: register allocation
- References
  - Dragon Ch. 8.8
  - EAC Ch. 13