Title: Code Generation
1Code Generation
- The target machine
- Instruction selection and register allocation
- Basic blocks and flow graphs
- A simple code generator
- Peephole optimization
- Instruction selector generator
- Graph-coloring register allocator
2The Target Machine
- A byte-addressable machine with four bytes to a
word and n general-purpose registers
- Two-address instructions
- op source, destination
- Six addressing modes (form, address, added cost)
- absolute: M, M, 1
- register: R, R, 0
- indexed: c(R), c + contents(R), 1
- indirect register: *R, contents(R), 0
- indirect indexed: *c(R), contents(c + contents(R)), 1
- literal: #c, c, 1
3Examples
MOV R0, M
MOV 4(R0), M
MOV *R0, M
MOV *4(R0), M
MOV #1, R0
4Instruction Costs
- Cost of an instruction = 1 + costs of source and
destination addressing modes
- This cost corresponds to the length (in words) of
the instruction
- Minimizing instruction length also tends to
minimize instruction execution time
5Examples
MOV R0, R1          cost 1
MOV R0, M           cost 2
MOV #1, R0          cost 2
MOV 4(R0), 12(R1)   cost 3
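As a rough illustration of the cost rule, the sketch below computes the cost of a two-address instruction from its operands. The names `operand_cost` and `instruction_cost` are hypothetical. The sketch assumes registers are written `R0`, `R1`, ... and that an indirect register operand carries a leading `*`; every other operand form is treated as adding one word.

```python
import re

def operand_cost(operand: str) -> int:
    """Added cost (in words) of one operand's addressing mode."""
    text = operand.replace(" ", "")
    # Register (R) and indirect register (*R) modes add nothing.
    if re.fullmatch(r"\*?R\d+", text):
        return 0
    # Absolute, indexed, indirect indexed, and literal modes add one word.
    return 1

def instruction_cost(source: str, destination: str) -> int:
    """Cost = 1 + cost of source mode + cost of destination mode."""
    return 1 + operand_cost(source) + operand_cost(destination)

print(instruction_cost("R0", "R1"))          # 1
print(instruction_cost("4(R0)", "12(R1)"))   # 3
```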
6An Example
Consider a = b + c
1. MOV b, R0
   ADD c, R0
   MOV R0, a
2. MOV b, a
   ADD c, a
3. R0, R1, R2 contain the addresses of a, b, c
   MOV *R1, *R0
   ADD *R2, *R0
4. R1, R2 contain the values of b, c
   ADD R2, R1
   MOV R1, a
7Instruction Selection
- Code skeleton
x = y + z
   MOV y, R0
   ADD z, R0
   MOV R0, x
a = b + c; d = a + e
   MOV b, R0
   ADD c, R0
   MOV R0, a
   MOV a, R0
   ADD e, R0
   MOV R0, d
- Multiple choices
a = a + 1
   MOV a, R0
   ADD #1, R0
   MOV R0, a
or simply
   INC a
8Register Allocation
- Register allocation: select the set of variables
that will reside in registers
- Register assignment: pick the specific register
that a variable will reside in
- The problem is NP-complete
9An Example
First sequence:
t = a + b
t = t * c
t = t / d
   MOV a, R1
   ADD b, R1
   MUL c, R0
   DIV d, R0
   MOV R1, t
Second sequence:
t = a + b
t = t + c
t = t / d
   MOV a, R0
   ADD b, R0
   ADD c, R0
   SRDA R0, 32
   DIV d, R0
   MOV R1, t
10Basic Blocks
- A basic block is a sequence of consecutive
statements in which control enters at the
beginning and leaves at the end without halt or
possibility of branching except at the end
11An Example
(1) prod = 0
(2) i = 1
(3) t1 = 4 * i
(4) t2 = a[t1]
(5) t3 = 4 * i
(6) t4 = b[t3]
(7) t5 = t2 * t4
(8) t6 = prod + t5
(9) prod = t6
(10) t7 = i + 1
(11) i = t7
(12) if i <= 20 goto (3)
12Flow Graphs
- A flow graph is a directed graph
- The nodes in the graph are basic blocks
- There is an edge from B1 to B2 iff B2 can immediately
follow B1 in some execution sequence, that is, if
- B2 immediately follows B1 in program text and B1
does not end in an unconditional jump, or
- there is a jump from the last statement of B1 to
the first statement of B2
- B1 is a predecessor of B2, B2 is a successor of B1
13An Example
B0:
(1) prod = 0
(2) i = 1
B1:
(3) t1 = 4 * i
(4) t2 = a[t1]
(5) t3 = 4 * i
(6) t4 = b[t3]
(7) t5 = t2 * t4
(8) t6 = prod + t5
(9) prod = t6
(10) t7 = i + 1
(11) i = t7
(12) if i <= 20 goto (3)
Edges: B0 to B1 (fall through), B1 to B1 (jump)
14Construction of Basic Blocks
- Determine the set of leaders
- the first statement is a leader
- the target of a jump is a leader
- any statement immediately following a jump is a
leader
- For each leader, its basic block consists of the
leader and all statements up to but not including
the next leader or the end of the program
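The leader rules can be sketched in a few lines. This is only an illustration: the statement encoding (a pair of text and an optional 0-based jump target) and the names `find_leaders` and `basic_blocks` are assumptions made for the example, not part of the slides.

```python
def find_leaders(stmts):
    """stmts: list of (text, jump_target); jump_target is a 0-based index or None."""
    leaders = set()
    if stmts:
        leaders.add(0)                      # the first statement is a leader
    for i, (_, target) in enumerate(stmts):
        if target is not None:
            leaders.add(target)             # the target of a jump is a leader
            if i + 1 < len(stmts):
                leaders.add(i + 1)          # so is the statement after a jump
    return sorted(leaders)

def basic_blocks(stmts):
    """Return each block as the list of statement indices it contains."""
    leaders = find_leaders(stmts)
    bounds = leaders + [len(stmts)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

# The dot-product example; only the last statement jumps (to statement 3, index 2).
program = [
    ("prod = 0", None), ("i = 1", None), ("t1 = 4 * i", None),
    ("t2 = a[t1]", None), ("t3 = 4 * i", None), ("t4 = b[t3]", None),
    ("t5 = t2 * t4", None), ("t6 = prod + t5", None), ("prod = t6", None),
    ("t7 = i + 1", None), ("i = t7", None), ("if i <= 20 goto (3)", 2),
]
print(find_leaders(program))   # [0, 2]
```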
15Representation of Basic Blocks
- Each basic block is represented by a record
consisting of
- a count of the number of statements
- a pointer to the leader
- a list of predecessors
- a list of successors
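One possible shape for such a record, together with the edge rules of the flow graph, is sketched below. The class name `BasicBlock`, the function `build_flow_graph`, and the `jumps` encoding (statement index mapped to a target index and a flag saying whether control can also fall through) are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    leader: int                                   # index of the first statement
    count: int                                    # number of statements
    predecessors: list = field(default_factory=list)
    successors: list = field(default_factory=list)

def build_flow_graph(leaders, jumps, n):
    """leaders: sorted leader indices; jumps: {index: (target, falls_through)};
    n: total number of statements."""
    bounds = leaders + [n]
    blocks = [BasicBlock(leaders[k], bounds[k + 1] - bounds[k])
              for k in range(len(leaders))]
    block_of = {b.leader: k for k, b in enumerate(blocks)}

    def link(a, b):
        if b not in blocks[a].successors:
            blocks[a].successors.append(b)
            blocks[b].predecessors.append(a)

    for k, block in enumerate(blocks):
        last = block.leader + block.count - 1
        target, falls_through = jumps.get(last, (None, True))
        if target is not None:
            link(k, block_of[target])             # edge for the jump
        if falls_through and k + 1 < len(blocks):
            link(k, k + 1)                        # edge to the next block in text
    return blocks

# Dot-product example: statement 12 (index 11) jumps conditionally to index 2.
graph = build_flow_graph([0, 2], {11: (2, True)}, 12)
```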
16Define and Use
- A three-address statement x = y + z is said
to define x and to use y and z
- A name is live in a basic block at a given point
if its value is used after that point, perhaps in
another basic block
17Next-Use Information
i: x = ...
   (no assignment to x)
j: y = ... x ...
Statement j uses the value of x defined at i
18An Example
Next uses after each statement (statement numbers in parentheses):
                    b(1), c(1,4), d(2)
(1) a = b + c       a(2,3,5), c(4), d(2)
(2) e = a + d       a(3,5), c(4), e(3)
(3) f = e - a       a(5), c(4), f(4)
(4) e = f + c       a(5), e(5)
(5) g = e - a       g(?)
b, c, d are live at the beginning of the block
19Computing Next Uses
- Scan statements i: x = y op z backward
- Attach to statement i the information currently
found in the symbol table regarding the next uses
and liveness of x, y, and z
- In the symbol table, set x to not live and
clear the next uses of x
- In the symbol table, set y and z to live and
add i to the next uses of y and z
- Liveness is a question among blocks; next uses
are computed within blocks
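The backward scan can be sketched as follows. Statements are numbered from 1 as on the slides; the triple encoding `(x, y, z)` for `x = y op z` and the function name `compute_next_uses` are assumptions made for this example, and the operator itself is irrelevant to the analysis.

```python
def compute_next_uses(block, live_on_exit=()):
    """block: list of (x, y, z) for x = y op z, numbered from 1.
    Returns (attached, live, uses) where attached[i] is the symbol-table
    information recorded at statement i."""
    live = {name: True for name in live_on_exit}
    uses = {}                                  # name -> upcoming uses, nearest first
    attached = {}
    for i in range(len(block), 0, -1):         # scan backward
        x, y, z = block[i - 1]
        attached[i] = {n: (live.get(n, False), list(uses.get(n, [])))
                       for n in (x, y, z)}
        live[x] = False                        # x is redefined here
        uses[x] = []
        for operand in (y, z):                 # done after x, in case x is an operand
            live[operand] = True
            if not uses.get(operand) or uses[operand][0] != i:
                uses[operand] = [i] + uses.get(operand, [])
    return attached, live, uses

example = [("a", "b", "c"), ("e", "a", "d"), ("f", "e", "a"),
           ("e", "f", "c"), ("g", "e", "a")]
attached, live, uses = compute_next_uses(example)
print(uses["b"], uses["c"], uses["d"])         # [1] [1, 4] [2]
```

The final table agrees with the first row of the example: b is next used at (1), c at (1) and (4), and d at (2).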
20A Simple Code Generator
- Consider each statement in a basic block in turn,
remembering if operands are in registers
- Assume that
- each operator has a corresponding target language
operator
- computed results can be left in registers as long
as possible, unless
- we run out of registers
- we reach the end of a basic block
21Register and Address Descriptors
- A register descriptor keeps track of what is
currently in each register
- An address descriptor keeps track of the
location(s) where the current value of the name
can be found at run time
22An Example
d = (a - b) + (a - c) + (a - c)

Statement    Code          Register descriptor   Address descriptor
t = a - b    MOV a, R0     R0(t)                 t(R0)
             SUB b, R0
u = a - c    MOV a, R1     R0(t), R1(u)          t(R0), u(R1)
             SUB c, R1
v = t + u    ADD R1, R0    R0(v), R1(u)          v(R0), u(R1)
d = v + u    ADD R1, R0    R0(d)                 d(R0)
             MOV R0, d
23Code Generation Algorithm
- Consider an instruction of the form x = y op z
- Invoke getreg to determine the location L where
the result of y op z will be placed
- Determine a current location y' of y from the
address descriptor (register location preferred).
If y' is not L, generate MOV y', L
- Generate op z', L, where z' is a current
location of z from the address descriptor
- Update the address and register descriptors for
x, y, z, and L
24Code Generation Algorithm
- Consider an instruction of the form x = y
- If y is in a register, change the register and
address descriptors
- If y is in memory,
- if x has a next use in the block, invoke getreg to
find a register r, generate MOV y, r, and
make r the location of x
- otherwise, generate MOV y, x
25Code Generation Algorithm
- Once all statements in the basic block are
processed, we store those names that are live on
exit and not in their memory locations
26The Function getreg
- Consider an instruction of the form x = y op z
- If y is in a register r that holds the value of
no other names, and y is not live and has no next
use after this statement, return r
- Otherwise, return an empty register r if there is
one
- Otherwise, if x has a next use in the block, or
op is an operator requiring a register, find an
occupied register r. Store the value of r, update
the address descriptor, and return r
- If x has no next use, or no suitable occupied
register can be found, return the memory location
of x
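To make the interplay of getreg and the two descriptors concrete, here is a deliberately reduced sketch. It is an assumption-laden simplification rather than the full algorithm: `getreg` implements only the first two rules (reuse the register of a dead y, else take an empty register) and raises an error instead of spilling, next-use information is recomputed by scanning forward, and all names (`generate`, `OPCODES`, `release`, and so on) are hypothetical.

```python
OPCODES = {"+": "ADD", "-": "SUB"}

def generate(block, live_on_exit, num_regs=2):
    """block: list of (x, op, y, z) meaning x = y op z. Returns target code."""
    reg_desc = {f"R{i}": set() for i in range(num_regs)}  # register -> names held
    addr_desc = {}                                        # name -> registers holding it
    code = []

    def location(name):
        # Prefer a register copy; otherwise use the name's memory location.
        regs = sorted(addr_desc.get(name, ()))
        return regs[0] if regs else name

    def used_later(name, i):
        # Is the current value of name used after statement i?
        for x, _, y, z in block[i + 1:]:
            if name in (y, z):
                return True
            if name == x:
                return False
        return name in live_on_exit

    def getreg(y, i):
        loc = location(y)
        if loc in reg_desc and reg_desc[loc] == {y} and not used_later(y, i):
            return loc                        # rule 1: reuse y's register
        for reg, held in reg_desc.items():
            if not held:
                return reg                    # rule 2: an empty register
        raise RuntimeError("spilling is not handled in this sketch")

    def release(name, reg):
        reg_desc[reg].discard(name)
        addr_desc.get(name, set()).discard(reg)

    for i, (x, op, y, z) in enumerate(block):
        dest = getreg(y, i)
        if location(y) != dest:
            code.append(f"MOV {location(y)}, {dest}")
        code.append(f"{OPCODES[op]} {location(z)}, {dest}")
        for name in list(reg_desc[dest]):     # dest no longer holds its old names
            release(name, dest)
        for reg in list(addr_desc.get(x, ())):
            release(x, reg)                   # old copies of x are stale
        reg_desc[dest].add(x)
        addr_desc[x] = {dest}
        for name in (y, z):                   # free registers of dead operands
            if name != x and not used_later(name, i):
                for reg in list(addr_desc.get(name, ())):
                    release(name, reg)

    for name in sorted(live_on_exit):         # store names that live only in registers
        if addr_desc.get(name):
            code.append(f"MOV {location(name)}, {name}")
    return code

block = [("t", "-", "a", "b"), ("u", "-", "a", "c"),
         ("v", "+", "t", "u"), ("d", "+", "v", "u")]
print("\n".join(generate(block, {"d"})))
```

On the block for d = (a - b) + (a - c) + (a - c), with d live on exit and two registers, the sketch emits the same seven instructions as the worked example.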
28Indexing and Pointer Operations
             i in Ri          i in Mi          i in Si(A)
a = b[i]     MOV b(Ri), R     MOV Mi, R        MOV Si(A), R
                              MOV b(R), R      MOV b(R), R
a[i] = b     MOV b, a(Ri)     MOV Mi, R        MOV Si(A), R
                              MOV b, a(R)      MOV b, a(R)

             p in Rp          p in Mp          p in Sp(A)
a = *p       MOV *Rp, R       MOV Mp, R        MOV Sp(A), R
                              MOV *R, R        MOV *R, R
*p = a       MOV a, *Rp       MOV Mp, R        MOV a, R
                              MOV a, *R        MOV R, *Sp(A)
29Conditional Statements
- Condition codes
if x < y goto z
   CMP x, y
   CJ< z
- Condition code descriptors
x = y + z
if x < 0 goto z
   MOV y, R0
   ADD z, R0
   MOV R0, x
   CJ< z
30Global Register Allocation
- Keep live variables in registers across block
boundaries
- Keep variables frequently used in inner loops in
registers
31Loops
- A loop is a collection of nodes such that
- all nodes in the collection are strongly
connected
- the collection of nodes has a unique entry
- An inner loop is one that contains no other loops
32Variable Usage Counts
- Savings
- Count a saving of one for each use of x in loop L
that is not preceded by an assignment to x in the
same block
- Save two units if we can avoid a store of x at
the end of a block
- Costs
- Cost two units if x is live at the entry or exit
of the inner loop
33An Example
34An Example
use(a, B1) = 0, use(a, B2) = 1
use(a, B3) = 1, use(a, B4) = 0
live(a, B1) = 1, live(a, B2) = 0
live(a, B3) = 0, live(a, B4) = 0
save(a) = (0 + 1 + 1 + 0) + 2 * (1 + 0 + 0 + 0) = 4
save(b) = 5
save(c) = 3
save(d) = 6
save(e) = 4
save(f) = 4
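The arithmetic for save(a) can be reproduced with a one-line sum over the blocks of the loop: one unit per counted use plus two units per block where a store is avoided. The function name `savings` and the dictionary encoding are assumptions for this illustration.

```python
def savings(use, live):
    """Sum use(x, B) + 2 * live(x, B) over the blocks B of the loop."""
    return sum(use[block] + 2 * live[block] for block in use)

use_a = {"B1": 0, "B2": 1, "B3": 1, "B4": 0}
live_a = {"B1": 1, "B2": 0, "B3": 0, "B4": 0}
print(savings(use_a, live_a))   # 4
```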
35An Example
36Register Assignment for Outer Loops
- Apply the same idea for inner loops to
progressively larger loops
- If an outer loop L1 contains an inner loop L2, a
name allocated a register in L2 need not be
allocated a register in L1-L2
- If name x is allocated a register in L1 but not
L2, x must be stored on entrance to L2 and loaded
on exit from L2
- If name x is allocated a register in L2 but not
L1, x must be loaded on entrance to L2 and stored
on exit from L2
37Peephole Optimization
- Improve the performance of the target program by
examining and transforming a short sequence of
target instructions
- May need repeated passes over the code
- Can also be applied directly after intermediate
code generation
38Examples
- Redundant loads and stores
   MOV R0, a
   MOV a, R0
- Algebraic simplification
   x = x + 0
   x = x * 1
- Constant folding
   x = 2 + 3   becomes   x = 5
   y = x + 3   becomes   y = 8
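The redundant load/store case lends itself to a small peephole pass. The sketch below is illustrative only: the instruction encoding `(label, op, src, dst)` and the name `remove_redundant_loads` are assumptions. It drops a `MOV` that merely copies back the value moved by the immediately preceding `MOV`, unless the second instruction carries a label (in which case control could reach it from elsewhere).

```python
def remove_redundant_loads(code):
    """code: list of (label, op, src, dst); label is None when unlabeled."""
    result = []
    for instr in code:
        label, op, src, dst = instr
        if result and label is None and op == "MOV":
            _, prev_op, prev_src, prev_dst = result[-1]
            # MOV R0, a followed by MOV a, R0: the value is already in place.
            if prev_op == "MOV" and (src, dst) == (prev_dst, prev_src):
                continue
        result.append(instr)
    return result

code = [(None, "MOV", "R0", "a"), (None, "MOV", "a", "R0"), (None, "ADD", "b", "R0")]
print(remove_redundant_loads(code))
```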
39Examples
- Unreachable code
#define debug 0
if (debug) { print debugging information }
is translated to
   if 0 <> 1 goto L1
   print debugging information
L1:
which simplifies to
   if 1 goto L1
   print debugging information
L1:
so the print statement is unreachable and can be removed
40Examples
- Flow-of-control optimization
   goto L1                       goto L2
   ...                   =>      ...
L1: goto L2                  L1: goto L2

   goto L1                       if a < b goto L2
   ...                           goto L3
L1: if a < b goto L2     =>      ...
L3:                          L3:
41Examples
- Reduction in strength: replace expensive
operations by cheaper ones
- x^2 → x * x
- fixed-point multiplication and division by a
power of 2 → shift
- floating-point division by a constant →
floating-point multiplication by a constant
42Examples
- Use of machine idioms: hardware instructions for
certain specific operations
- auto-increment and auto-decrement addressing modes
(push or pop stack in parameter passing)
43DAG Representation of Blocks
- Easy to determine
- common subexpressions
- names used in the block but evaluated outside the
block
- names whose values could be used outside the block
44DAG Representation of Blocks
- Leaves labeled by unique identifiers
- Interior nodes labeled by operator symbols
- Nodes optionally given a sequence of identifiers,
having the value represented by the nodes
45An Example
(1) t1 = 4 * i
(2) t2 = a[t1]
(3) t3 = 4 * i
(4) t4 = b[t3]
(5) t5 = t2 * t4
(6) t6 = prod + t5
(7) prod = t6
(8) t7 = i + 1
(9) i = t7
(10) if i <= 20 goto (1)
46Constructing a DAG
- Consider x = y op z. Other statements can be
handled similarly
- If node(y) is undefined, create a leaf labeled y
and let node(y) be this leaf. If node(z) is
undefined, create a leaf labeled z and let
node(z) be that leaf
47Constructing a DAG
- Determine if there is a node labeled op, whose
left child is node(y) and whose right child is
node(z). If not, create such a node. Let n be the
node found or created.
- Delete x from the list of attached identifiers
for node(x). Append x to the list of attached
identifiers for the node n and set node(x) to n
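The construction steps can be sketched with a small class. This is a minimal illustration for statements of the form x = y op z only; the class name `DagBuilder` and its fields are assumptions, and array references, pointers, and calls are deliberately left out.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []      # node id -> (label, left id, right id); leaves have None children
        self.node_of = {}    # identifier -> id of the node holding its current value
        self.attached = []   # node id -> identifiers attached to that node
        self.interior = {}   # (op, left id, right id) -> node id

    def _new_node(self, label, left=None, right=None):
        self.nodes.append((label, left, right))
        self.attached.append([])
        return len(self.nodes) - 1

    def _operand(self, name):
        # If node(name) is undefined, create a leaf labeled name.
        if name not in self.node_of:
            self.node_of[name] = self._new_node(name)
        return self.node_of[name]

    def assign(self, x, op, y, z):
        """Process x = y op z and return the id of the node now holding x."""
        left, right = self._operand(y), self._operand(z)
        key = (op, left, right)
        if key not in self.interior:             # reuse an existing node if possible
            self.interior[key] = self._new_node(op, left, right)
        n = self.interior[key]
        old = self.node_of.get(x)
        if old is not None and x in self.attached[old]:
            self.attached[old].remove(x)         # x no longer names the old node
        self.attached[n].append(x)
        self.node_of[x] = n
        return n

dag = DagBuilder()
n1 = dag.assign("t1", "*", "4", "i")
n2 = dag.assign("t3", "*", "4", "i")   # common subexpression: same node as t1
n3 = dag.assign("i", "+", "i", "1")
n4 = dag.assign("t5", "*", "4", "i")   # i has changed, so this is a new node
```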
48Reconstructing Quadruples
- Evaluate the interior nodes in topological order
- Assign the evaluated value to one of its attached
identifiers x, preferring one whose value is
needed outside the block
- If there is no attached identifier, create a new
temp to hold the value
- If there are additional attached identifiers y1,
y2, ..., yk whose values are also needed outside
the block, add y1 = x, y2 = x, ..., yk = x
49An Example
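(1) t1 = 4 * i
(2) t2 = a[t1]
(3) t3 = b[t1]
(4) t4 = t2 * t3
(5) prod = prod + t4
(6) i = i + 1
(7) if i <= 20 goto (1)
[Figure: DAG for the block, with leaves a, b, 4, i0, 1, 20, prod0 and attached identifiers prod, i, and statement label (1)]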
50Arrays, Pointers, Procedure Calls
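x = a[i]          x = a[i]
a[j] = y          z = x
z = a[i]          a[j] = y
Treating a[i] as a common subexpression (right) is wrong when j may equal i => range analysis
*p = w => aliasing analysis
side effects caused by procedure calls => inter-procedural analysis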
51Ordering Rules
- Any evaluation of or assignment to an element of
array a must follow the previous assignment to an
element of that array if there is one
- Any assignment to an element of array a must
follow any previous evaluation of a
52Ordering Rules
- Any use of any identifier must follow the
previous procedure call or indirect assignment
through a pointer if there is one
- Any procedure call or indirect assignment through
a pointer must follow all previous evaluations of
any identifier
53Generating Code From DAGs
t1 = a + b
t2 = c + d
t3 = e - t2
t4 = t1 - t3
(1) MOV a, R0
(2) ADD b, R0
(3) MOV c, R1
(4) ADD d, R1
(5) MOV R0, t1
(6) MOV e, R0
(7) SUB R1, R0
(8) MOV t1, R1
(9) SUB R0, R1
(10) MOV R1, t4
54Rearranging the Order
t2 = c + d
t3 = e - t2
t1 = a + b
t4 = t1 - t3
(1) MOV c, R0
(2) ADD d, R0
(3) MOV e, R1
(4) SUB R0, R1
(5) MOV a, R0
(6) ADD b, R0
(7) SUB R1, R0
(8) MOV R0, t4
55A Heuristic Ordering for DAG
- Attempt as far as possible to make the evaluation
of a node immediately follow the evaluation of
its leftmost argument
56Node Listing Algorithm
while unlisted interior nodes remain do begin
  select an unlisted node n, all of whose
    parents have been listed
  list n
  while the leftmost child m of n has no unlisted
      parents and is not a leaf do begin
    list m
    n := m
  end
end
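A direct transcription of the node listing algorithm might look like the sketch below. The encoding is an assumption: `children` maps every interior node to the ordered tuple of its children, and anything that is not a key of `children` is treated as a leaf. When several nodes are eligible, the sketch simply picks the first one in dictionary order. The evaluation order is the reverse of the returned list.

```python
def node_listing(children):
    """children maps each interior node to its ordered tuple of children."""
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(node)

    listed = []
    done = set()

    def ready(node):
        # Unlisted, and every parent has already been listed.
        return node not in done and parents.get(node, set()) <= done

    def emit(node):
        listed.append(node)
        done.add(node)

    while len(done) < len(children):
        n = next(node for node in children if ready(node))
        emit(n)
        while True:
            m = children[n][0]                 # leftmost child of n
            if m in children and ready(m):     # interior, no unlisted parents
                emit(m)
                n = m
            else:
                break
    return listed

dag = {
    "t1": ("t2", "t3"), "t2": ("t6", "t4"), "t3": ("t4", "e"),
    "t4": ("t5", "t7"), "t5": ("t6", "c"), "t6": ("a", "b"), "t7": ("d", "e"),
}
order = list(reversed(node_listing(dag)))
print(order)   # ['t7', 't6', 't5', 't4', 't3', 't2', 't1']
```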
57An Example
t7 = d + e
t6 = a + b
t5 = t6 - c
t4 = t5 * t7
t3 = t4 - e
t2 = t6 + t4
t1 = t2 * t3
58Generating Code From Trees
- There exists an algorithm that determines the
optimal order in which to evaluate statements in
a block when the dag representation of the block
is a tree
- Optimal order here means the order that yields
the shortest instruction sequence
59Optimal Ordering for Trees
- Label each node of the tree bottom-up with an
integer denoting the fewest number of registers
required to evaluate the tree with no stores of
intermediate results
- Generate code during a tree traversal by first
evaluating the operand requiring more registers
60The Labeling Algorithm
if n is a leaf then
  if n is the leftmost child of its parent then
    label(n) := 1
  else label(n) := 0
else begin
  let n1, n2, ..., nk be the children of n ordered
    by label so that
    label(n1) >= label(n2) >= ... >= label(nk)
  label(n) := max over 1 <= i <= k of
    (label(ni) + i - 1)
end
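The labeling rule is short enough to state as a recursive function. In this sketch (an illustration under assumed encodings) a leaf is a string and an interior node is a tuple `(op, child1, child2, ...)`; the `leftmost` flag only matters for leaves, and a lone leaf at the root is treated as a leftmost leaf.

```python
def label(tree, leftmost=True):
    """Fewest registers needed to evaluate tree without storing intermediates."""
    if isinstance(tree, str):
        return 1 if leftmost else 0
    _op, *kids = tree
    # Children ordered by label, largest first.
    labels = sorted((label(kid, index == 0) for index, kid in enumerate(kids)),
                    reverse=True)
    # With 0-based positions, label(ni) + i - 1 becomes value + index.
    return max(value + index for index, value in enumerate(labels))

t1 = ("+", "a", "b")
t2 = ("+", "c", "d")
t3 = ("-", "e", t2)
t4 = ("-", t1, t3)
print(label(t1), label(t2), label(t3), label(t4))   # 1 1 2 2
```

For a binary node this reduces to the larger of the two child labels when they differ, and to one more than the common label when they are equal.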
61An Example
For binary interior nodes with child labels l1 and l2:
label(n) = max(l1, l2) if l1 <> l2, and l1 + 1 if l1 = l2
62Code Generation From a Labeled Tree
- Use a stack rstack to allocate registers R0, R1,
..., R(r-1)
- The value of a tree is always computed in the top
register on rstack
- The function swap(rstack) interchanges the top
two registers on rstack
- Use a stack tstack to allocate temporary memory
locations T0, T1, ...
63Cases Analysis
64The Function gencode
procedure gencode(n)
begin
  if n is a left leaf representing operand name and
     n is the leftmost child of its parent then
    print 'MOV' name ',' top(rstack)
  else if n is an interior node with operator op,
          left child n1, and right child n2 then
    if label(n2) = 0 then /* case 1 */
    else if 1 <= label(n1) < label(n2)
            and label(n1) < r then /* case 2 */
    else if 1 <= label(n2) <= label(n1)
            and label(n2) < r then /* case 3 */
    else /* case 4, both labels >= r */
end
65The Function gencode
/* case 1 */
begin
  let name be the operand represented by n2
  gencode(n1)
  print op name ',' top(rstack)
end
/* case 2 */
begin
  swap(rstack)
  gencode(n2)
  R := pop(rstack)
  gencode(n1)
  print op R ',' top(rstack)
  push(rstack, R)
  swap(rstack)
end
66The Function gencode
/* case 3 */
begin
  gencode(n1)
  R := pop(rstack)
  gencode(n2)
  print op top(rstack) ',' R
  push(rstack, R)
end
/* case 4 */
begin
  gencode(n2)
  T := pop(tstack)
  print 'MOV' top(rstack) ',' T
  gencode(n1)
  push(tstack, T)
  print op T ',' top(rstack)
end
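The four cases can be put together in a compact runnable sketch for binary trees. Several details are assumptions made for the example: leaves are strings, interior nodes are tuples `(op, left, right)`, the opcode table `OPS` is hypothetical, and labels are recomputed on demand rather than stored. In case 3 the sketch emits the operator with top(rstack) as source and R as destination, so that the left operand's register receives the result; this is the order the traced example (`SUB R0, R1`) requires.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def label(tree, leftmost=True):
    if isinstance(tree, str):
        return 1 if leftmost else 0
    _, left, right = tree
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

def gencode(tree, r=2):
    rstack = [f"R{i}" for i in reversed(range(r))]    # top of stack is rstack[-1]
    tstack = [f"T{i}" for i in reversed(range(16))]
    out = []

    def swap():
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]

    def gen(n):
        if isinstance(n, str):                        # case 0: a left leaf
            out.append(f"MOV {n}, {rstack[-1]}")
            return
        op, n1, n2 = n
        l1, l2 = label(n1, True), label(n2, False)
        if l2 == 0:                                   # case 1: right child is a leaf
            gen(n1)
            out.append(f"{OPS[op]} {n2}, {rstack[-1]}")
        elif 1 <= l1 < l2 and l1 < r:                 # case 2: right subtree first
            swap()
            gen(n2)
            reg = rstack.pop()
            gen(n1)
            out.append(f"{OPS[op]} {reg}, {rstack[-1]}")
            rstack.append(reg)
            swap()
        elif 1 <= l2 <= l1 and l2 < r:                # case 3: left subtree first
            gen(n1)
            reg = rstack.pop()
            gen(n2)
            out.append(f"{OPS[op]} {rstack[-1]}, {reg}")
            rstack.append(reg)
        else:                                         # case 4: spill the right subtree
            gen(n2)
            temp = tstack.pop()
            out.append(f"MOV {rstack[-1]}, {temp}")
            gen(n1)
            tstack.append(temp)
            out.append(f"{OPS[op]} {temp}, {rstack[-1]}")

    gen(tree)
    return out

t4 = ("-", ("+", "a", "b"), ("-", "e", ("+", "c", "d")))
print("\n".join(gencode(t4)))
```

With two registers, the tree for t4 = (a + b) - (e - (c + d)) yields the seven instructions of the trace: MOV e, R1; MOV c, R0; ADD d, R0; SUB R0, R1; MOV a, R0; ADD b, R0; SUB R1, R0.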
67An Example
Trace (rstack contents shown with the top at the right):
gencode(t4)  [R1, R0]  /* case 2 */
  gencode(t3)  [R0, R1]  /* case 3 */
    gencode(e)  [R0, R1]  /* case 0 */
      print MOV e, R1
    gencode(t2)  [R0]  /* case 1 */
      gencode(c)  [R0]  /* case 0 */
        print MOV c, R0
      print ADD d, R0
    print SUB R0, R1
  gencode(t1)  [R0]  /* case 1 */
    gencode(a)  [R0]  /* case 0 */
      print MOV a, R0
    print ADD b, R0
  print SUB R1, R0
68Multiregister Operations
- Some operations like multiplication, division, or
a function call normally require more than one
register
- The labeling algorithm needs to ensure that
label(n) is always at least the number of
registers required by the operation
69Algebraic Properties
[Figure: expression trees rearranged using algebraic properties; annotations: commutative, associative, commutative, largest]
70Common Subexpressions
- Nodes with more than one parent in a dag are
called shared nodes
- Optimal code generation for dags, on either a
one-register machine or a machine with an
unlimited number of registers, is NP-complete
71Partitioning a DAG into Trees
- Partition a dag into a set of trees by finding
for each root and shared node n, the maximal
subtree with n as root that includes no other
shared nodes, except as leaves
- Determine a code generation ordering for the
trees
- Generate code for each tree using the algorithms
for generating code from trees
72An Example
[Figure: a dag with interior nodes numbered 1 to 7 and leaves a0, b0, c0, d0, e0, and its partition into trees]
73Dynamic Programming Code Generation
- The dynamic programming algorithm applies to a
broad class of register machines with complex
instruction sets
- The machine has r interchangeable registers
- The machine has instructions of the form Ri = E,
where E is any expression containing operators,
registers, and memory locations. If E involves
registers, then Ri must be one of them
74Dynamic Programming
- The dynamic programming algorithm partitions the
problem of generating optimal code for an
expression into sub-problems of generating
optimal code for the sub-expressions of the given
expression
75Contiguous Evaluation
- We say a program P evaluates a tree T
contiguously if
- it first evaluates those subtrees of T that need
to be computed into memory
- it then evaluates the subtrees of the root in
either order
- it finally evaluates the root
76Optimally Contiguous Program
- For the machines defined above, given any program
P to evaluate an expression tree T, we can find
an equivalent program P' such that
- P' is of no higher cost than P
- P' uses no more registers than P
- P' evaluates the tree in a contiguous fashion
- This implies that every expression tree can be
evaluated optimally by a contiguous program
77Dynamic Programming Algorithm
- Phase 1: compute bottom-up for each node n of the
expression tree T an array C of costs, in which
the ith component C[i] is the optimal cost of
computing the subtree S rooted at n into a
register, assuming i registers are available for
the computation. C[0] is the optimal cost of
computing the subtree S into memory
78Dynamic Programming Algorithm
- To compute C[i] at node n, consider each machine
instruction R = E whose expression E matches the
subexpression rooted at node n
- Determine the costs of evaluating the operands of
E by examining the cost vectors at the
corresponding descendants of n
79Dynamic Programming Algorithm
- For those operands of E that are registers,
consider all possible orders in which the
corresponding subtrees of T can be evaluated into
registers
- In each ordering, the first subtree corresponding
to a register operand can be evaluated using i
available registers, the second using i-1
registers, and so on
80Dynamic Programming Algorithm
- For node n, add in the cost of the instruction
R = E that was used to match node n
- The value C[i] is then the minimum cost over all
possible orders
- At each node, store the instruction used to
achieve the best cost for C[i] for each i
- The smallest cost in the vector gives the minimum
cost of evaluating T
81Dynamic Programming Algorithm
- Phase 2: traverse T and use the cost vectors to
determine which subtrees of T must be computed
into memory
- Phase 3: traverse T and use the cost vectors and
associated instructions to generate the final
target code
82An Example
Consider a machine with two registers R0 and
R1 and instructions
Ri = Mj
Mi = Ri
Ri = Rj
Ri = Ri op Rj
Ri = Ri op Mj
83An Example
R0 = c
R1 = d
R1 = R1 / e
R0 = R0 * R1
R1 = a
R1 = R1 - b
R1 = R1 + R0
84Code Generator Generators
- A tool to automatically construct the instruction
selection phase of a code generator
- Such tools may use tree grammars or context-free
grammars to describe the target machines
- Register allocation will be implemented as a
separate mechanism
- Graph coloring is one of the approaches for
register allocation
85Tree Rewriting
a[i] = b + 1
[Figure: intermediate-code tree for the assignment; node labels: ind, mem_b, const_1, ind, reg_sp, const_a, const_i, reg_sp]
86Tree Rewriting
- The code is generated by reducing the input tree
into a single node using a sequence of
tree-rewriting rules
- Each tree-rewriting rule is of the form
replacement ← template { action }
- replacement is a single node
- template is a tree
- action is a code fragment
- A set of tree-rewriting rules is called a
tree-translation scheme
87An Example
reg_i ← +(reg_i, reg_j)   { ADD Rj, Ri }
Each tree template represents a computation
performed by the sequence of machine
instructions emitted by the associated action
88Tree Rewriting Rules
89Tree Rewriting Rules
(6) reg_i ← +(reg_i, ind(+(const_c, reg_j)))   { ADD c(Rj), Ri }
(7) reg_i ← +(reg_i, reg_j)   { ADD Rj, Ri }
(8) reg_i ← +(reg_i, const_1)   { INC Ri }
90An Example
Rule (1) reduces const_a to reg_0:
MOV #a, R0
91An Example
Rule (7) reduces +(reg_0, reg_sp) to reg_0:
ADD SP, R0
92An Example
Two rules apply to the subtree ind(+(const_i, reg_sp)):
(6) ADD i(SP), R0
(5) MOV i(SP), R1
93An Example
Rule (2) reduces mem_b to reg_1:
MOV b, R1
94An Example
Rule (8) reduces +(reg_1, const_1) to reg_1:
INC R1
95An Example
Rule (4) reduces the root, storing reg_1 through reg_0:
MOV R1, *R0
96Tree Pattern Matching
- The tree pattern matching algorithm can be
implemented by extending the multiple-keyword
pattern matching algorithm
- Each tree template is represented by a set of
strings, each of which represents a path from the
root to a leaf
- Each rule is associated with cost information
- The dynamic programming algorithm can be used to
select an optimal sequence of matches
97Semantic Predicates
reg_i ← +(reg_i, const_c)   { if c = 1 then INC Ri else ADD #c, Ri }
The general use of semantic actions and
predicates can provide greater flexibility and
ease of description than a purely grammatical
specification
98Pattern Matching by Parsing
- Use an LR parser to do the pattern matching
- The input tree can be treated as a string by
using its prefix representation
= ind + + const_a reg_sp ind + const_i reg_sp + mem_b const_1
- The tree-translation scheme can be converted into
a syntax-directed translation scheme by replacing
the tree templates with their prefix
representations
99Syntax-Directed Translation Scheme
(1) reg_i → const_c   { MOV #c, Ri }
(2) reg_i → mem_a   { MOV a, Ri }
(3) mem → = mem_a reg_i   { MOV Ri, a }
(4) mem → = ind reg_i reg_j   { MOV Rj, *Ri }
(5) reg_i → ind + const_c reg_j   { MOV c(Rj), Ri }
(6) reg_i → + reg_i ind + const_c reg_j   { ADD c(Rj), Ri }
(7) reg_i → + reg_i reg_j   { ADD Rj, Ri }
(8) reg_i → + reg_i const_1   { INC Ri }
100Advantages of Syntax-Directed Translation Scheme
- The parsing method is efficient and well
understood
- It is relatively easy to retarget the code
generator
- The code generator can be made more efficient by
adding special-case productions
101Disadvantages of Syntax-Directed Translation
Scheme
- A left-to-right order of evaluation is fixed
- The machine description grammar can become
inordinately large
- The context-free grammar is usually highly ambiguous
102Graph Coloring
- In the first pass, target machine instructions
are selected as though there were an infinite
number of symbolic registers
- In the second pass, physical registers are
assigned to symbolic registers using graph
coloring algorithms
- During the second pass, if a register is needed
when all available registers are used, some of
the used registers must be spilled
103Interference Graph
- For each procedure, a register-interference graph
is constructed
- The nodes in the graph are symbolic registers
- An edge connects two nodes if one is live at a
point where the other is defined
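Given liveness information, the edge rule translates almost literally into code. In this sketch the input format is an assumption: each entry pairs the symbolic register defined at some point with the set of symbolic registers live at that point. The name `interference_graph` is hypothetical.

```python
def interference_graph(points):
    """points: iterable of (defined, live) where live is the set of symbolic
    registers live at the point where `defined` is defined."""
    graph = {}
    for defined, live in points:
        graph.setdefault(defined, set())
        for other in live:
            graph.setdefault(other, set())
            if other != defined:              # a register does not interfere with itself
                graph[defined].add(other)
                graph[other].add(defined)
    return graph

points = [("a", {"b"}), ("b", {"a", "c"}), ("c", set())]
graph = interference_graph(points)
```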
104K-Colorable Graphs
- A graph is said to be k-colorable if each node
can be assigned one of the k colors such that no
two adjacent nodes have the same color
- A color represents a register
- The problem of determining whether a graph is
k-colorable is NP-complete
105A Graph Coloring Algorithm
- Remove a node n and its edges if it has fewer
than k neighbors
- Repeat the removing step above until we end up
with the empty graph or a graph in which each
node has k or more adjacent nodes
- In the latter case, a node is selected and
spilled by deleting that node and its edges, and
the removing step above continues
106A Graph Coloring Algorithm
- The nodes in the graph can be colored in the
reverse order in which they are removed
- Each node can be assigned a color not assigned to
any of its neighbors
- Spilled nodes can be assigned any color
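The remove-then-color procedure can be sketched as below. Two choices are assumptions of the sketch rather than part of the slides: the spill candidate is the remaining node with the most neighbors, and spilled nodes are returned separately and left uncolored. The graph is a dictionary from each node to the set of its neighbors, and colors are the integers 0 to k-1.

```python
def color_graph(graph, k):
    """graph: dict node -> set of neighbours (symmetric). Returns (colors, spilled)."""
    work = {node: set(adj) for node, adj in graph.items()}
    stack, spilled = [], []
    while work:
        # Prefer a node with fewer than k remaining neighbours.
        n = next((v for v in work if len(work[v]) < k), None)
        if n is None:
            n = max(work, key=lambda v: len(work[v]))   # assumed spill heuristic
            spilled.append(n)
        else:
            stack.append(n)
        for m in work.pop(n):                           # delete n and its edges
            work[m].discard(n)
    colors = {}
    for n in reversed(stack):                           # reverse order of removal
        used = {colors[m] for m in graph[n] if m in colors}
        colors[n] = next(c for c in range(k) if c not in used)
    return colors, spilled

triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
colors, spilled = color_graph(triangle, 2)
print(colors, spilled)
```

A free color always exists in the second loop because a node pushed on the stack had fewer than k neighbors remaining when it was removed, and only those neighbors can already be colored when its turn comes.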
107An Example
108An Example