Title: EECS 583 Lecture 7 Dataflow Analysis Opti I
- University of Michigan
- January 30, 2002
Announcements
- Homework 1 due next Wednesday
  - See me if you are stuck
- Office hours: 2223 EECS
  - Thursday, 3-5 pm
  - Friday, after 2 pm
- Reading for next week
  - Design of a Portable Global Code Optimizer, my MS thesis
    - Dry reading, more of a manual, but all the facts are there
    - On the course webpage
HW 1 Example output
[Figure: the homework CFG (BB1 to BB4) before and after predicate definitions are inserted. BB1 computes b1 = PBR(BB2) and p1 = CMPP_UN(r4 lt 20) and ends with BRCT p1, b1. In the transformed version BB1 also contains p2 = pclear, p3 = pclear, p2 = CMPP_ON(r4 lt 20), and p3 = CMPP_ON(r4 gt 20); the ops in BB2 are guarded by p2 and the r7 computation in BB3 by p3. BB4 ends with RTS. Control dependences: CD(BB1) = 0, CD(BB2) = -1, CD(BB3) = 1, CD(BB4) = 0.]
HW 1 Example output (2)
[Figure: the same code after if-conversion, shown next to the predicated CFG. BB1 through BB4 become a single block (labeled "BB1 or BB5") in which the original BB1 and BB4 ops are guarded by T, the BB2 ops by p2, and the BB3 ops by p3; the BRCT, BRU, and DUMMY_BR branches no longer appear, and the block ends with RTS if T.]
Loop unrolling: last control flow topic
- Replicate the body of a loop N-1 times (giving N total copies)
  - Loop unrolled N times, or Nx unrolled
- Enable overlap of operations from different iterations
  - Increase potential for ILP (instruction-level parallelism)
- 3 variants
  - Unroll a multiple of the known trip count
  - Unroll with remainder loop
  - While loop unroll

Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
Loop unroll: Type 1
Counted loop, all parms known

r2 is the loop variable. Increment is 1, initial value is 0, final value is 100, so the trip count is 100.

Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop

Unrolled 2x by simply duplicating the body:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop
      r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 1
      blt r2 100 Loop

Unrolled 2x after cleanup:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r1 = MEM[r2 + 1]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + 2
      blt r2 100 Loop

- Remove the r2 increments from the first N-1 iterations and update the last increment
- Remove the branch from the first N-1 iterations
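The Type 1 transformation can be checked by simulating both versions. This Python sketch models the slide's loop (assuming the operator in r4 = r1 * r5 is a multiply) and records each store to MEM[r3 + 0] in a list, so the rolled and 2x-unrolled loops can be compared directly; the function names are hypothetical.

```python
FINAL = 100  # r2 runs from 0 up to 100 in steps of 1, so the trip count is 100

def rolled(mem, r5):
    """Original loop; the store to MEM[r3 + 0] is modeled by appending to a list."""
    stores = []
    r2 = 0
    while True:
        r1 = mem[r2 + 0]
        r4 = r1 * r5
        r6 = r4 << 2
        stores.append(r6)        # MEM[r3 + 0] = r6
        r2 = r2 + 1
        if not r2 < FINAL:       # blt r2 100 Loop
            break
    return stores

def unrolled_2x(mem, r5):
    """Type 1 unroll with N = 2: 100 is a multiple of 2, so no remainder code is needed."""
    stores = []
    r2 = 0
    while True:
        # Copy 1: its increment and branch have been removed
        r1 = mem[r2 + 0]
        r4 = r1 * r5
        r6 = r4 << 2
        stores.append(r6)
        # Copy 2: load offset adjusted by one increment, then a single combined increment
        r1 = mem[r2 + 1]
        r4 = r1 * r5
        r6 = r4 << 2
        stores.append(r6)
        r2 = r2 + 2
        if not r2 < FINAL:       # the only remaining branch
            break
    return stores
```

Both functions perform the same 100 stores in the same order; the unrolled one executes 50 branches instead of 100.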
Loop unroll: Type 2
Counted loop, some parms unknown

r2 is the loop variable. Increment is ?, initial value is ?, final value is ?, trip count is ?

Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = r2 + X
      blt r2 Y Loop

Computed at run time, before the loops:
tc = final - initial
tc = tc / increment
rem = tc % N
fin = initial + rem * increment

RemLoop: r1 = MEM[r2 + 0]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r2 = r2 + X
         blt r2 fin RemLoop

Unrolled loop (two copies shown):
Loop:    r1 = MEM[r2 + 0]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r1 = MEM[r2 + X]
         r4 = r1 * r5
         r6 = r4 << 2
         MEM[r3 + 0] = r6
         r2 = r2 + (N*X)
         blt r2 Y Loop

- The remainder loop executes the leftover iterations
- The unrolled loop is the same as in Type 1, and is guaranteed to execute a multiple of N times
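A sketch of Type 2 under stated assumptions: a positive increment, top-tested loops, and a hypothetical unrolled_with_remainder function whose inner for loop stands in for the N textual copies of the body (copy k loads at offset k * increment). The trip count uses ceiling division clamped at 0, a choice made here so the sketch also handles ranges that are not an exact multiple of the increment.

```python
def rolled(mem, r5, initial, final, inc):
    """Original loop with parameters known only at run time (top-tested for simplicity)."""
    stores = []
    r2 = initial
    while r2 < final:
        r1 = mem[r2]
        stores.append((r1 * r5) << 2)   # r4 = r1 * r5; r6 = r4 << 2; MEM[r3 + 0] = r6
        r2 += inc
    return stores

def unrolled_with_remainder(mem, r5, initial, final, inc, n):
    """Type 2 unroll by n: a remainder loop, then a loop running a multiple of n iterations."""
    assert inc > 0, "sketch assumes a positive increment"
    # Run-time trip count computation
    tc = max(0, (final - initial + inc - 1) // inc)
    rem = tc % n
    fin = initial + rem * inc
    stores = []
    r2 = initial
    # Remainder loop: the rem leftover iterations (skipped entirely when rem == 0)
    while r2 < fin:
        r1 = mem[r2]
        stores.append((r1 * r5) << 2)
        r2 += inc
    # Unrolled loop: each pass stands for n copies of the body; r2 is bumped once by n * inc
    while r2 < final:
        for k in range(n):
            r1 = mem[r2 + k * inc]
            stores.append((r1 * r5) << 2)
        r2 += n * inc
    return stores
```

The sketch computes fin as initial + rem * increment, which stays correct when the initial value is nonzero. Its remainder loop tests before executing, so it does nothing when rem is 0; a bottom-tested RemLoop needs a guard for that case.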
Loop unroll: Type 3
Non-counted loop, some parms unknown (pointer chasing, loop var modified in a strange way, etc.)

Just duplicate the body; none of the loop branches can be removed. Instead, they are converted into conditional breaks.
Can apply this to any loop, including a superblock or hyperblock loop!

Original loop:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      bne r2 0 Loop

Unrolled 2x:
Loop: r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      beq r2 0 Exit
      r1 = MEM[r2 + 0]
      r4 = r1 * r5
      r6 = r4 << 2
      MEM[r3 + 0] = r6
      r2 = MEM[r2 + 0]
      bne r2 0 Loop
Exit:
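For Type 3, a small simulation shows that turning the first copy's loop branch into a conditional break (beq r2 0 Exit) preserves behavior for any list length, even or odd. The linked-list layout built by make_list is hypothetical; as in the slide's code, both loads read MEM[r2 + 0].

```python
def rolled(mem, r2, r5):
    """Original pointer-chasing loop."""
    stores = []
    while True:
        r1 = mem[r2 + 0]
        stores.append((r1 * r5) << 2)   # r4 = r1 * r5; r6 = r4 << 2; MEM[r3 + 0] = r6
        r2 = mem[r2 + 0]
        if r2 == 0:                     # bne r2 0 Loop not taken: leave the loop
            return stores

def unrolled_2x(mem, r2, r5):
    """Type 3 unroll: body duplicated, first loop branch converted into a conditional break."""
    stores = []
    while True:
        r1 = mem[r2 + 0]
        stores.append((r1 * r5) << 2)
        r2 = mem[r2 + 0]
        if r2 == 0:                     # beq r2 0 Exit
            return stores
        r1 = mem[r2 + 0]
        stores.append((r1 * r5) << 2)
        r2 = mem[r2 + 0]
        if r2 == 0:                     # bne r2 0 Loop not taken: fall into Exit
            return stores

def make_list(length):
    """Hypothetical list: the node at address a holds its successor's address (0 ends it)."""
    addrs = [16 * (i + 1) for i in range(length)]
    mem = {a: b for a, b in zip(addrs, addrs[1:] + [0])}
    return mem, addrs[0]
```

No branch disappears, but each pass through the unrolled body now contains two copies of the work for the scheduler to overlap.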
Loop unroll summary
- Goal: enable overlap of multiple iterations to increase ILP
- Type 1 is the most effective
  - All intermediate branches removed, least code expansion
  - Limited applicability
- Type 2 is almost as effective
  - All intermediate branches removed
  - Remainder loop is required, since the trip count is not known at compile time
  - Need to make sure we don't spend much time in the remainder loop
- Type 3 can be effective
  - No branches eliminated
  - But operation overlap is still possible
  - Always applicable (most loops fall into this category!)
  - Use the expected trip count to guide the unroll amount
Dataflow analysis and optimization
- Control flow analysis
  - Treat BB as black box
  - Just care about branches
- Now
  - Start looking at ops in BBs
  - What's computed and where
- Classical optimizations
  - Want to make the computation more efficient
    - Get rid of redundancy
    - Simplify
- Ex: Common Subexpression Elimination (CSE)
  - Is r2 + r3 redundant?
  - Is r4 - r5 redundant?
  - What if there were 1000 BBs?
  - Dataflow analysis!!

  r1 = r2 + r3; r6 = r4 - r5
  r4 = 4; r6 = 8
  r6 = r2 + r3; r7 = r4 - r5
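Within a single straight-line sequence, the CSE question can be answered by tracking which expressions are still available. This sketch assumes the three blocks execute one after another and uses a hypothetical (dest, opcode, sources) encoding. It reports r6 = r2 + r3 (index 4) as redundant, while r7 = r4 - r5 is not, because r4 = 4 intervenes. Answering the same question across many BBs is the job of dataflow analysis.

```python
# Hypothetical encoding: (dest, opcode, sources); constants are strings such as "4"
OPS = [
    ("r1", "+", ("r2", "r3")),
    ("r6", "-", ("r4", "r5")),
    ("r4", "mov", ("4",)),
    ("r6", "mov", ("8",)),
    ("r6", "+", ("r2", "r3")),
    ("r7", "-", ("r4", "r5")),
]

def find_redundant(ops):
    """Return indices of ops that recompute an expression that is still available."""
    available = {}  # (opcode, sources) -> register currently holding that value
    redundant = []
    for i, (dest, opcode, srcs) in enumerate(ops):
        key = (opcode, srcs)
        if key in available:
            redundant.append(i)
        # Writing dest kills expressions that read dest or whose value lived in dest
        available = {k: reg for k, reg in available.items()
                     if dest not in k[1] and reg != dest}
        if dest not in srcs:  # e.g. r2 = r2 + 1 does not leave r2 + 1 available
            available[key] = dest
    return redundant
```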
Dataflow analysis introduction
Dataflow analysis: a collection of information that summarizes the creation/destruction of values in a program. Used to identify legal optimization opportunities.

Pick an arbitrary point in the program:
  r1 = r2 + r3; r6 = r4 - r5
  r4 = 4; r6 = 8
  r6 = r2 + r3; r7 = r4 - r5
- Which VRs (virtual registers) contain useful data values? (liveness, or upward exposed uses)
- Which definitions may reach this point? (reaching defns)
- Which definitions are guaranteed to reach this point? (available defns)
- Which uses below are exposed? (downward exposed uses)
Live variable (liveness) analysis
- Defn: for each point p in a program and each variable y, determine whether y can be used before being redefined, starting at p
- Algorithm sketch
  - For each BB, y is live on entry if it is used before being defined in the BB, or if it is live leaving the block and not defined in the block
  - Backward dataflow analysis, as propagation occurs from uses upwards to defs
  - 4 sets
    - USE = set of external variables consumed in the BB
    - DEF = set of variables defined in the BB
    - IN = set of variables that are live at the entry point of a BB
    - OUT = set of variables that are live at the exit point of a BB
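Written as equations, the algorithm sketch relates the four sets as follows for each block X; the IN/OUT calculation for liveness iterates exactly these two assignments until nothing changes:

```latex
\mathrm{OUT}(X) = \bigcup_{Y \in \mathrm{succ}(X)} \mathrm{IN}(Y)
\qquad
\mathrm{IN}(X) = \mathrm{USE}(X) \cup \bigl(\mathrm{OUT}(X) - \mathrm{DEF}(X)\bigr)
```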
Liveness example
Before the first block: r2, r3, r4, r5 are all live, as they are consumed later; r6 is dead, as it is redefined later
  r1 = r2 + r3
  r6 = r4 - r5
After the first block: r4 is dead, as it is redefined; so is r6. r2, r3, r5 are live
  r4 = 4
  r6 = 8

  r6 = r2 + r3
  r7 = r4 - r5
What does this mean? r6 = r4 - r5 is useless; it produces a dead value!! Get rid of it.
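A backward scan over the straight-line version of this example reproduces the annotations. The live-out set {r1, r6, r7} is a hypothetical assumption (the slide does not say what is used afterwards). Like the slide, the scan keeps the sources of a dead op live, which is why r4 and r5 are live at the top. It also flags r6 = 8, whose value is overwritten by r6 = r2 + r3 before any use.

```python
# Straight-line version of the example: (dest, sources); operators are omitted
OPS = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r6", ("r4", "r5")),   # r6 = r4 - r5
    ("r4", ()),             # r4 = 4
    ("r6", ()),             # r6 = 8
    ("r6", ("r2", "r3")),   # r6 = r2 + r3
    ("r7", ("r4", "r5")),   # r7 = r4 - r5
]

def liveness_scan(ops, live_out):
    """Walk ops backward; return (sorted indices of ops producing dead values, live-in set)."""
    live = set(live_out)
    dead = []
    for i in reversed(range(len(ops))):
        dest, srcs = ops[i]
        if dest not in live:
            dead.append(i)      # the value is overwritten or unused before any read
        live.discard(dest)      # the def ends dest's liveness above this point
        live.update(srcs)       # each use makes its source live above this point
    return sorted(dead), live

dead, live_in = liveness_scan(OPS, live_out={"r1", "r6", "r7"})
```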
Compute USE/DEF sets for each BB (liveness)
DEF is the union of all the LHSs; USE is all the VRs that are used before they are defined

for each basic block in the procedure, X, do
    DEF(X) = ∅
    USE(X) = ∅
    for each operation in sequential order in X, op, do
        for each source operand of op, src, do
            if (src not in DEF(X)) then
                USE(X) += src
            endif
        endfor
        for each destination operand of op, dest, do
            DEF(X) += dest
        endfor
    endfor
endfor
Example USE/DEF calculation (liveness)
BB1: r1 = MEM[r2 + 0]; r2 = r2 + 1; r3 = r1 * r4
BB2: r1 = r1 + 5; r3 = r5 - r1; r7 = r3 * 2
BB3: r2 = 0; r7 = 23; r1 = 4
BB4: r8 = r7 + 5; r1 = r3 - r8; r3 = r1 * 2
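A direct Python transcription of the USE/DEF pseudocode, applied to the four example blocks. Each op is encoded as hypothetical (destinations, sources) lists; operators are dropped because they do not affect USE or DEF.

```python
# Example blocks; each op is (destinations, sources)
BLOCKS = {
    "BB1": [(["r1"], ["r2"]), (["r2"], ["r2"]), (["r3"], ["r1", "r4"])],
    "BB2": [(["r1"], ["r1"]), (["r3"], ["r5", "r1"]), (["r7"], ["r3"])],
    "BB3": [(["r2"], []), (["r7"], []), (["r1"], [])],
    "BB4": [(["r8"], ["r7"]), (["r1"], ["r3", "r8"]), (["r3"], ["r1"])],
}

def use_def(block):
    """USE = registers read before any write in the block; DEF = all destinations."""
    use, defs = set(), set()
    for dests, srcs in block:          # sequential order
        for src in srcs:               # sources are read before the op writes
            if src not in defs:
                use.add(src)
        for dest in dests:
            defs.add(dest)
    return use, defs

USE_DEF = {name: use_def(ops) for name, ops in BLOCKS.items()}
```

For example, USE(BB1) = {r2, r4}: r1 is read in r3 = r1 * r4, but only after r1 = MEM[r2 + 0] defines it in the same block.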
Compute IN/OUT sets for all BBs (liveness)
IN = set of variables that are live when the BB is entered; OUT = set of variables that are live when the BB is exited

initialize IN(X) to ∅ for all basic blocks X
change = 1
while (change) do
    change = 0
    for each basic block in procedure, X, do
        old_IN = IN(X)
        OUT(X) = Union(IN(Y)) for all successors Y of X
        IN(X) = USE(X) + (OUT(X) - DEF(X))
        if (old_IN != IN(X)) then
            change = 1
        endif
    endfor
endwhile
Example IN/OUT calculation (liveness)
BB1: r1 = MEM[r2 + 0]; r2 = r2 + 1; r3 = r1 * r4
BB2: r1 = r1 + 5; r3 = r5 - r1; r7 = r3 * 2
BB3: r2 = 0; r7 = 23; r1 = 4
BB4: r8 = r7 + 5; r1 = r3 - r8; r3 = r1 * 2
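The IN/OUT pseudocode transcribes the same way. The CFG edges are an assumption, since they are not recoverable from the text: this sketch treats the example as a diamond (BB1 branches to BB2 and BB3, which both flow into BB4) with nothing live on exit from BB4. The USE/DEF sets are the ones the USE/DEF calculation gives for these blocks.

```python
USE = {"BB1": {"r2", "r4"}, "BB2": {"r1", "r5"}, "BB3": set(), "BB4": {"r3", "r7"}}
DEF = {"BB1": {"r1", "r2", "r3"}, "BB2": {"r1", "r3", "r7"},
       "BB3": {"r1", "r2", "r7"}, "BB4": {"r1", "r3", "r8"}}
# Assumed diamond CFG, as successor lists
SUCC = {"BB1": ["BB2", "BB3"], "BB2": ["BB4"], "BB3": ["BB4"], "BB4": []}

def liveness(use, defs, succ):
    """Iterate OUT = union of successors' IN and IN = USE + (OUT - DEF) to a fixed point."""
    live_in = {x: set() for x in succ}
    live_out = {x: set() for x in succ}
    change = True
    while change:
        change = False
        for x in succ:
            old_in = live_in[x]
            live_out[x] = set().union(*(live_in[y] for y in succ[x]))
            live_in[x] = use[x] | (live_out[x] - defs[x])
            if old_in != live_in[x]:
                change = True
    return live_in, live_out

IN, OUT = liveness(USE, DEF, SUCC)
```

Under these assumptions r5 is live into BB1 even though BB1 never touches it, because BB2 reads it.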
Reaching definition analysis (rdefs)
- A definition of a variable x is an operation that assigns, or may assign, a value to x
- A definition d reaches a point p if there is a path from the point immediately following d to p such that d is not killed along that path
- A definition of a variable is killed between 2 points when there is another definition of that variable along the path
  - r1 = r2 + r3 kills previous definitions of r1
- Algorithm sketch
  - Forward dataflow analysis, as propagation occurs from defs downwards
  - 4 sets
    - GEN = set of definitions generated in the BB (ops, not vars!!)
    - KILL = set of definitions killed in the BB
    - IN = set of definitions reaching the BB entry
    - OUT = set of definitions reaching the BB exit
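As equations, the forward propagation relates the four sets for each block X as follows; the rdefs IN/OUT calculation iterates these assignments until no OUT set changes:

```latex
\mathrm{IN}(X) = \bigcup_{Y \in \mathrm{pred}(X)} \mathrm{OUT}(Y)
\qquad
\mathrm{OUT}(X) = \mathrm{GEN}(X) \cup \bigl(\mathrm{IN}(X) - \mathrm{KILL}(X)\bigr)
```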
Rdefs example
  1: r1 = r2 + r3
  2: r6 = r4 - r5
defs 1 and 2 reach this point
  3: r4 = 4
  4: r6 = 8
defs 1, 3, 4 reach this point; def 2 is killed by 4
  5: r6 = r2 + r3
  6: r7 = r4 - r5
defs 1, 3, 5, 6 reach this point; defs 2, 4 are killed by 5
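For straight-line code like this example, reaching definitions reduce to a forward scan in which each new def of a register kills the earlier defs of that register. A minimal sketch (the function name is hypothetical) that reproduces the three annotated points:

```python
# (def number, destination register) for the six ops of the example
DEFS = [(1, "r1"), (2, "r6"), (3, "r4"), (4, "r6"), (5, "r6"), (6, "r7")]

def reaching_after(defs, upto):
    """Definitions reaching the point just after op number `upto` (straight-line code)."""
    dest_of = dict(defs)
    reaching = set()
    for num, dest in defs:
        if num > upto:
            break
        # The new def of dest kills all earlier defs of the same register
        reaching = {d for d in reaching if dest_of[d] != dest}
        reaching.add(num)
    return reaching
```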
Compute GEN/KILL sets for each BB (rdefs)
GEN = set of definitions created by an operation; KILL = set of definitions destroyed by an operation. Assume each operation has only 1 destination, for simplicity, so just keep track of ops. The compiler uses refs for a more general solution.

for each basic block in the procedure, X, do
    GEN(X) = ∅
    KILL(X) = ∅
    for each operation in sequential order in X, op, do
        for each destination operand of op, dest, do
            G = op
            K = {all ops which define dest} - op
            GEN(X) = G + (GEN(X) - K)
            KILL(X) = K + (KILL(X) - G)
        endfor
    endfor
endfor
Example GEN/KILL calculation (rdefs)
BB1: r1 = MEM[r2 + 0]; r2 = r2 + 1; r3 = r1 * r4
BB2: r1 = r1 + 5; r3 = r5 - r1; r7 = r3 * 2
BB3: r2 = 0; r7 = 23; r1 = 4
BB4: r8 = r7 + 5; r1 = r3 - r8; r3 = r1 * 2
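A transcription of the GEN/KILL pseudocode, numbering the example ops 1 to 12 in block order (BB1 to BB4, left to right within a block). Like the slide, it assumes one destination per op, so an op number identifies a definition.

```python
# Op number -> destination register
OPS = {
    1: "r1", 2: "r2", 3: "r3",          # BB1
    4: "r1", 5: "r3", 6: "r7",          # BB2
    7: "r2", 8: "r7", 9: "r1",          # BB3
    10: "r8", 11: "r1", 12: "r3",       # BB4
}
BLOCKS = {"BB1": [1, 2, 3], "BB2": [4, 5, 6], "BB3": [7, 8, 9], "BB4": [10, 11, 12]}

def gen_kill(block):
    """GEN = defs in the block that survive to its end; KILL = other defs of the same registers."""
    gen, kill = set(), set()
    for op in block:                                        # sequential order
        dest = OPS[op]
        g = {op}
        k = {o for o, d in OPS.items() if d == dest} - g    # all other defs of dest
        gen = g | (gen - k)
        kill = k | (kill - g)
    return gen, kill

GEN_KILL = {name: gen_kill(ops) for name, ops in BLOCKS.items()}
```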
Compute IN/OUT sets for all BBs (rdefs)
IN = set of definitions reaching the entry of the BB; OUT = set of definitions leaving the BB

initialize IN(X) = ∅ for all basic blocks X
initialize OUT(X) = GEN(X) for all basic blocks X
change = 1
while (change) do
    change = 0
    for each basic block in procedure, X, do
        old_OUT = OUT(X)
        IN(X) = Union(OUT(Y)) for all predecessors Y of X
        OUT(X) = GEN(X) + (IN(X) - KILL(X))
        if (old_OUT != OUT(X)) then
            change = 1
        endif
    endfor
endwhile
Example IN/OUT calculation (rdefs)
BB1: r1 = MEM[r2 + 0]; r2 = r2 + 1; r3 = r1 * r4
BB2: r1 = r1 + 5; r3 = r5 - r1; r7 = r3 * 2
BB3: r2 = 0; r7 = 23; r1 = 4
BB4: r8 = r7 + 5; r1 = r3 - r8; r3 = r1 * 2
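The rdefs IN/OUT iteration on the same example, with ops numbered 1 to 12 in block order and again assuming a diamond CFG (BB1 branches to BB2 and BB3, both of which flow into BB4). The GEN and KILL sets are those the GEN/KILL calculation yields for these blocks.

```python
GEN = {"BB1": {1, 2, 3}, "BB2": {4, 5, 6}, "BB3": {7, 8, 9}, "BB4": {10, 11, 12}}
KILL = {"BB1": {4, 5, 7, 9, 11, 12}, "BB2": {1, 3, 8, 9, 11, 12},
        "BB3": {1, 2, 4, 6, 11}, "BB4": {1, 3, 4, 5, 9}}
# Assumed diamond CFG, as predecessor lists
PRED = {"BB1": [], "BB2": ["BB1"], "BB3": ["BB1"], "BB4": ["BB2", "BB3"]}

def reaching_defs(gen, kill, pred):
    """Iterate IN = union of predecessors' OUT and OUT = GEN + (IN - KILL) to a fixed point."""
    rd_in = {x: set() for x in pred}
    rd_out = {x: set(gen[x]) for x in pred}
    change = True
    while change:
        change = False
        for x in pred:
            old_out = rd_out[x]
            rd_in[x] = set().union(*(rd_out[y] for y in pred[x]))
            rd_out[x] = gen[x] | (rd_in[x] - kill[x])
            if old_out != rd_out[x]:
                change = True
    return rd_in, rd_out

IN, OUT = reaching_defs(GEN, KILL, PRED)
```

Under the diamond assumption, defs 2 through 9 may reach BB4's entry: each arm kills some of BB1's defs, but the union over both arms keeps them.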
Some things to think about
- Liveness and rdefs are basically the same thing
  - All dataflow is basically the same with a few parameters
    - Meaning of gen/kill (use/def)
    - Backward / forward
    - All paths / some paths (must/may)
      - Today we looked at may analysis algorithms
      - How do you adjust to do must algorithms?
- Dataflow can be slow
  - How to implement it efficiently?
  - How to represent the info?
- Predicates
  - Throw a monkey wrench into this stuff
  - So, how are predicates handled?
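The "few parameters" view can be made concrete with one solver. In this sketch (all names hypothetical) the direction selects which edges feed a block, the meet is union for may (some-paths) problems and intersection for must (all-paths) problems, and gen/kill carry the problem-specific meaning (USE/DEF for liveness, GEN/KILL for rdefs). A must problem also needs interior values initialized to the full set (top) rather than the empty set.

```python
def solve(blocks, edges, gen, kill, direction, meet, boundary, top):
    """Generic iterative dataflow sketch.

    Returns (entry_val, exit_val), measured along the flow direction:
    exit_val[b] = gen[b] | (entry_val[b] - kill[b]), and entry_val[b] is the
    meet of the exit values of b's flow predecessors (or `boundary` if none).
    """
    flow_preds = {b: [] for b in blocks}
    for src, dst in edges:
        if direction == "forward":
            flow_preds[dst].append(src)      # information flows src -> dst
        elif direction == "backward":
            flow_preds[src].append(dst)      # information flows dst -> src
        else:
            raise ValueError(direction)
    entry_val = {b: set(top) for b in blocks}
    exit_val = {b: gen[b] | (entry_val[b] - kill[b]) for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            inputs = [exit_val[p] for p in flow_preds[b]]
            entry_val[b] = meet(inputs) if inputs else set(boundary)
            new_exit = gen[b] | (entry_val[b] - kill[b])
            if new_exit != exit_val[b]:
                exit_val[b] = new_exit
                change = True
    return entry_val, exit_val

def may(sets):
    """Some paths: union."""
    return set().union(*sets)

def must(sets):
    """All paths: intersection."""
    return set.intersection(*sets)
```

With the example's USE/DEF sets and an assumed diamond CFG, solve(..., "backward", may, ...) returns (OUT, IN) for liveness, matching what the liveness IN/OUT iteration computes. The forward/may call gives the rdefs sets. Switching the rdefs call to must with top = {1, ..., 12} gives the definitions that reach along all paths, which is empty at BB4's entry.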
Problem of the day (liveness)
  r1 = 3; r2 = r3; r3 = r4
  r1 = r1 + 1; r7 = r1 op r2
  r2 = 0
  r2 = r2 + 1
  r4 = r2 op r1
  r9 = r4 op r8
Problem of the day (rdefs)
  r1 = 3; r2 = r3; r3 = r4
  r1 = r1 + 1; r7 = r1 op r2
  r2 = 0
  r2 = r2 + 1
  r4 = r2 op r1
  r9 = r4 op r8