Title: Code Generation
1Code Generation
2Code Generation
- The target machine
- Runtime environment
- Basic blocks and flow graphs
- Instruction selection
- Instruction selector generator
- Register allocation
- Peephole optimization
3The Target Machine
- A byte-addressable machine with four bytes to a word and n general-purpose registers
- Two-address instructions
- op source, destination
- Six addressing modes (mode: form, address, added cost)
- absolute: M, M, 1
- register: R, R, 0
- indexed: c(R), c + contents(R), 1
- indirect register: *R, contents(R), 0
- indirect indexed: *c(R), contents(c + contents(R)), 1
- literal: #c, c, 1
4Examples
MOV R0, M
MOV 4(R0), M
MOV *R0, M
MOV *4(R0), M
MOV #1, R0
5Instruction Costs
- Cost of an instruction = 1 + costs of the source and destination addressing modes
- This cost corresponds to the length (in words) of the instruction
- Minimizing instruction length also tends to minimize instruction execution time
6Examples
MOV R0, R1            cost 1
MOV R0, M             cost 2
MOV #1, R0            cost 2
MOV 4(R0), *12(R1)    cost 3
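The cost rule (1 plus the added cost of each operand's addressing mode) can be sketched in Python. This is only an illustration: the function names mode_cost and instruction_cost are hypothetical, and register operands are assumed to be written R0, R1, ... or SP, with * marking indirection and # marking a literal.

```python
import re

def mode_cost(operand: str) -> int:
    """Added cost (extra instruction words) of one operand's addressing mode."""
    op = operand.strip()
    if op.startswith("*"):
        # Indirect register *R costs 0; indirect indexed *c(R) costs 1.
        return 1 if "(" in op else 0
    if op.startswith("#"):
        # Literal #c: the constant occupies one extra word.
        return 1
    if re.fullmatch(r"R\d+|SP", op):
        # Register mode: no extra word.
        return 0
    # Absolute M or indexed c(R): the address or offset takes one word.
    return 1

def instruction_cost(instr: str) -> int:
    """Cost = 1 + added costs of the source and destination modes."""
    _mnemonic, operands = instr.split(None, 1)
    return 1 + sum(mode_cost(o) for o in operands.split(","))

for text in ["MOV R0, R1", "MOV R0, M", "MOV #1, R0", "MOV 4(R0), *12(R1)"]:
    print(text, "->", instruction_cost(text))
```

Under these assumptions the four instructions cost 1, 2, 2, and 3 words respectively.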
7An Example
Consider a := b + c
1. MOV b, R0
   ADD c, R0
   MOV R0, a
2. MOV b, a
   ADD c, a
3. R0, R1, R2 contain the addresses of a, b, c
   MOV *R1, *R0
   ADD *R2, *R0
4. R1, R2 contain the values of b, c
   ADD R2, R1
   MOV R1, a
8Instruction Selection
- Code skeleton
x := y + z
   MOV y, R0
   ADD z, R0
   MOV R0, x
a := b + c
d := a + e
   MOV b, R0
   ADD c, R0
   MOV R0, a
   MOV a, R0
   ADD e, R0
   MOV R0, d
- Multiple choices
a := a + 1
   MOV a, R0
   ADD #1, R0
   MOV R0, a
or
   INC a
9Register Allocation
- Register allocation: select the set of variables that will reside in registers
- Register assignment: pick the specific register that a variable will reside in
- The problem is NP-complete
10An Example
t := a + b
t := t * c
t := t / d
   MOV a, R1
   ADD b, R1
   MUL c, R0
   DIV d, R0
   MOV R1, t

t := a + b
t := t + c
t := t / d
   MOV a, R0
   ADD b, R0
   ADD c, R0
   SRDA R0, 32
   DIV d, R0
   MOV R1, t
11Runtime Environments
- A translation needs to relate the static source text of a program to the dynamic actions that must occur at runtime to implement the program
- Essentially, the relationship between names and data objects
- The runtime support system consists of routines that manage the allocation and deallocation of data objects
12Activations
- A procedure definition associates an identifier (name) with a statement (body)
- Each execution of a procedure body is an activation of the procedure
- An activation tree depicts the way control enters and leaves activations
13An Example
program sort(input, output);
  var a : array [0..10] of integer;
  procedure readarray;
    var i : integer;
    begin
      for i := 1 to 9 do read(a[i])
    end;
  function partition(y, z : integer) : integer;
    var i, j, x, v : integer;
    begin ... end;
  procedure quicksort(m, n : integer);
    var i : integer;
    begin
      if (n > m) then begin
        i := partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n)
      end
    end;
begin
  a[0] := -9999; a[10] := 9999;
  readarray;
  quicksort(1, 9)
end.
14An Example
15Scope
- A declaration associates information with a name
- Scope rules determine which declaration of a name applies
- The portion of the program to which a declaration applies is called the scope of that declaration
16Bindings of Names
- The same name may denote different data objects (storage locations) at runtime
- An environment is a function that maps a name to a storage location
- A state is a function that maps a storage location to the value held there
name --(environment)--> storage location --(state)--> value
17Static and Dynamic Notions
18Storage Organization
- Target code: static
- Static data objects: static
- Dynamic data objects: heap
- Automatic data objects: stack
Memory layout: code, static data, stack, heap
19Activation Records
Fields of an activation record on the stack:
- returned value
- actual parameters
- optional control link
- optional access link
- machine status
- local data
- temporary data
20Activation Records
[Figure: two activation records on the stack, each holding returned value and parameters, links and machine status, and local and temporary data; the frame pointer and the stack pointer are marked on the second record]
21Declarations
P -> { offset := 0 } D
D -> D ; D
D -> id : T
     { enter(id.name, T.type, offset); offset := offset + T.width }
T -> integer
     { T.type := integer; T.width := 4 }
T -> float
     { T.type := float; T.width := 8 }
T -> array [ num ] of T1
     { T.type := array(num.val, T1.type); T.width := num.val * T1.width }
T -> ^ T1
     { T.type := pointer(T1.type); T.width := 4 }
22Nested Procedures
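P -> D
D -> D ; D | id : T | proc id ; D ; S
[Figure: symbol tables for nested procedures. Each table has a header; the outermost header points to nil. Names shown in the tables: a, x, readarray, exchange, quicksort, k, v, partition, i, j]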
23Symbol Table Handling
- Operations
- mktable(previous) creates a new table and returns a pointer to the table
- enter(table, name, type, offset) creates a new entry for name in the table
- addwidth(table, width) records the cumulative width of entries in the header
- enterproc(table, name, newtable) creates a new entry for procedure name in the table
- Stacks
- tblptr: pointers to symbol tables
- offset: the next available relative address
24Declarations
P -> M D
     { addwidth(top(tblptr), top(offset)); pop(tblptr); pop(offset) }
M -> ε
     { t := mktable(nil); push(t, tblptr); push(0, offset) }
D -> D ; D
D -> proc id ; N D ; S
     { t := top(tblptr); addwidth(t, top(offset)); pop(tblptr); pop(offset);
       enterproc(top(tblptr), id.name, t) }
D -> id : T
     { enter(top(tblptr), id.name, T.type, top(offset));
       top(offset) := top(offset) + T.width }
N -> ε
     { t := mktable(top(tblptr)); push(t, tblptr); push(0, offset) }
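A minimal Python sketch of how the tblptr and offset stacks evolve while nested declarations are processed. The SymbolTable class, the tuple encoding of declarations, and the declare and process functions are assumptions made for illustration; only mktable, enter, addwidth, and enterproc come from the slides.

```python
class SymbolTable:
    def __init__(self, previous=None):
        self.previous = previous   # header link to the enclosing table
        self.width = 0             # cumulative width, set by addwidth
        self.entries = {}

def mktable(previous):
    return SymbolTable(previous)

def enter(table, name, type_, offset):
    table.entries[name] = (type_, offset)

def addwidth(table, width):
    table.width = width

def enterproc(table, name, newtable):
    table.entries[name] = ("proc", newtable)

WIDTHS = {"integer": 4, "float": 8}   # widths used in the declaration rules

def process(decls, tblptr, offset):
    """decls: list of ("var", name, type) or ("proc", name, nested_decls)."""
    for d in decls:
        if d[0] == "var":
            _, name, type_ = d
            enter(tblptr[-1], name, type_, offset[-1])
            offset[-1] += WIDTHS[type_]
        else:
            _, name, nested = d
            # Action of N -> ε: open a table linked to the enclosing one.
            t = mktable(tblptr[-1]); tblptr.append(t); offset.append(0)
            process(nested, tblptr, offset)
            # Action of D -> proc id ; N D ; S: close it and record the procedure.
            addwidth(t, offset[-1]); tblptr.pop(); offset.pop()
            enterproc(tblptr[-1], name, t)

def declare(decls):
    tblptr, offset = [], []
    # Action of M -> ε: the outermost table has no enclosing table.
    t = mktable(None); tblptr.append(t); offset.append(0)
    process(decls, tblptr, offset)
    # Action of P -> M D.
    addwidth(tblptr[-1], offset[-1]); tblptr.pop(); offset.pop()
    return t

top = declare([("var", "a", "integer"), ("var", "x", "float"),
               ("proc", "readarray", [("var", "i", "integer")])])
print(top.entries["x"], top.width)
```

Each procedure gets its own table and its own offset counter, so the relative addresses of locals restart at 0 inside every procedure.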
25Records
T -> record D end
T -> record L D end
     { T.type := record(top(tblptr)); T.width := top(offset);
       pop(tblptr); pop(offset) }
L -> ε
     { t := mktable(nil); push(t, tblptr); push(0, offset) }
26Basic Blocks
- A basic block is a sequence of consecutive
statements in which control enters at the
beginning and leaves at the end without halt or
possibility of branching except at the end
27An Example
(1) prod := 0
(2) i := 1
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)
28Control Flow Graphs
- A (control) flow graph is a directed graph
- The nodes in the graph are basic blocks
- There is an edge from B1 to B2 iff B2 immediately follows B1 in some execution sequence, that is, if either
- there is a jump from B1 to B2
- B2 immediately follows B1 in program text
- B1 is a predecessor of B2, B2 is a successor of B1
29An Example
B0:
(1) prod := 0
(2) i := 1
B1:
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)
30Construction of Basic Blocks
- Determine the set of leaders
- the first statement is a leader
- the target of a jump is a leader
- any statement immediately following a jump is a leader
- For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program
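The leader rules translate fairly directly into code. In this Python sketch the representation of a statement as a (text, target) pair, with target being the 0-based index of the jump destination or None, is an assumption made for illustration, as are the function names.

```python
def find_leaders(code):
    """Return the sorted 0-based indices of all leaders."""
    leaders = set()
    if code:
        leaders.add(0)                      # the first statement is a leader
    for idx, (_text, target) in enumerate(code):
        if target is not None:
            leaders.add(target)             # the target of a jump is a leader
            if idx + 1 < len(code):
                leaders.add(idx + 1)        # so is the statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from a leader up to, not including, the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

# The dot-product fragment: twelve statements, the last one jumps to statement (3).
dot = [("prod := 0", None), ("i := 1", None), ("t1 := 4 * i", None),
       ("t2 := a[t1]", None), ("t3 := 4 * i", None), ("t4 := b[t3]", None),
       ("t5 := t2 * t4", None), ("t6 := prod + t5", None), ("prod := t6", None),
       ("t7 := i + 1", None), ("i := t7", None), ("if i <= 20 goto (3)", 2)]
print(basic_blocks(dot))
```

For this fragment the leaders are statements (1) and (3), which gives two blocks: statements (1) to (2) and statements (3) to (12).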
31Representation of Basic Blocks
- Each basic block is represented by a record consisting of
- a count of the number of statements
- a pointer to the leader
- a list of predecessors
- a list of successors
32DAG Representation of Blocks
- Easy to determine
- common subexpressions
- names used in the block but evaluated outside the block
- names whose values could be used outside the block
33DAG Representation of Blocks
- Leaves labeled by unique identifiers
- Interior nodes labeled by operator symbols
- Interior nodes optionally given a sequence of
identifiers, having the value represented by the
nodes
34An Example
(1) t1 := 4 * i
(2) t2 := a[t1]
(3) t3 := 4 * i
(4) t4 := b[t3]
(5) t5 := t2 * t4
(6) t6 := prod + t5
(7) prod := t6
(8) t7 := i + 1
(9) i := t7
(10) if i <= 20 goto (1)
35Constructing a DAG
- Consider x := y op z. Other statements can be handled similarly
- If node(y) is undefined, create a leaf labeled y and let node(y) be this leaf. If node(z) is undefined, create a leaf labeled z and let node(z) be that leaf
36Constructing a DAG
- Determine if there is a node labeled op whose left child is node(y) and whose right child is node(z). If not, create such a node. Let n be the node found or created.
- Delete x from the list of attached identifiers for node(x). Append x to the list of attached identifiers for the node n and set node(x) to n
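The construction steps for a statement x := y op z can be sketched in Python as follows. The DagBuilder class and its field names are hypothetical; nodes are stored as dictionaries and referred to by index, and constants are treated as leaf names.

```python
class DagBuilder:
    def __init__(self):
        self.nodes = []      # each node: {"label", "left", "right", "ids"}
        self.node = {}       # node(x): index of the node currently holding x
        self.interior = {}   # (op, left, right) -> index of an interior node

    def _leaf(self, name):
        # If node(name) is undefined, create a leaf labeled name.
        if name not in self.node:
            self.nodes.append({"label": name, "left": None, "right": None, "ids": []})
            self.node[name] = len(self.nodes) - 1
        return self.node[name]

    def assign(self, x, y, op, z):
        """Process x := y op z and return the index of the node n."""
        ny, nz = self._leaf(y), self._leaf(z)
        key = (op, ny, nz)
        if key not in self.interior:
            # No node labeled op with these children exists yet: create it.
            self.nodes.append({"label": op, "left": ny, "right": nz, "ids": []})
            self.interior[key] = len(self.nodes) - 1
        n = self.interior[key]
        old = self.node.get(x)
        if old is not None and x in self.nodes[old]["ids"]:
            self.nodes[old]["ids"].remove(x)   # delete x from node(x)
        self.nodes[n]["ids"].append(x)         # attach x to n
        self.node[x] = n                       # set node(x) to n
        return n

b = DagBuilder()
n1 = b.assign("t1", "4", "*", "i")
n3 = b.assign("t3", "4", "*", "i")   # common subexpression: reuses the same node
print(n1 == n3, b.nodes[n1]["ids"])
```

Because node(i) is updated when i is reassigned, a later 4 * i after i := i + 1 refers to a different node and is not mistaken for the earlier subexpression.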
37Reconstructing Quadruples
- Evaluate the interior nodes in topological order
- Assign the evaluated value to one of its attached identifiers x, preferring one whose value is needed outside the block
- If there is no attached identifier, create a new temp to hold the value
- If there are additional attached identifiers y1, y2, ..., yk whose values are also needed outside the block, add y1 := x, y2 := x, ..., yk := x
38An Example
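(1) t1 := 4 * i
(2) t2 := a[t1]
(3) t3 := b[t1]
(4) t4 := t2 * t3
(5) prod := prod + t4
(6) i := i + 1
(7) if i <= 20 goto (1)
[Figure: dag for the block, with leaves prod0, i0, a, b, 4, 1, 20, and (1), and with prod and i attached to interior nodes]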
39Generating Code From DAGs
t1 := a + b
t2 := c + d
t3 := e - t2
t4 := t1 - t3
Only R0 and R1 available
(1) MOV a, R0
(2) ADD b, R0
(3) MOV c, R1
(4) ADD d, R1
(5) MOV R0, t1
(6) MOV e, R0
(7) SUB R1, R0
(8) MOV t1, R1
(9) SUB R0, R1
(10) MOV R1, t4
40Rearranging the Order
t2 := c + d
t3 := e - t2
t1 := a + b
t4 := t1 - t3
(1) MOV c, R0
(2) ADD d, R0
(3) MOV e, R1
(4) SUB R0, R1
(5) MOV a, R0
(6) ADD b, R0
(7) SUB R1, R0
(8) MOV R0, t4
41A Heuristic Ordering for DAG
- Attempt as far as possible to make the evaluation of a node immediately follow the evaluation of its leftmost argument
42Node Listing Algorithm
while unlisted interior nodes remain do begin
  select an unlisted node n, all of whose parents have been listed
  list n
  while the leftmost child m of n has no unlisted parents and is not a leaf do begin
    list m
    n := m
  end
end
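A Python sketch of the node listing heuristic. The dag encoding (a dictionary mapping each interior node to its left and right operands, with every other name treated as a leaf) and the function name are assumptions; the sample dag uses node names t1 to t7 and leaves a to e chosen for illustration.

```python
def node_listing(children):
    """Return the interior nodes in listing order; evaluate in the reverse order."""
    parents = {n: [] for n in children}
    for p, kids in children.items():
        for k in kids:
            if k in children:
                parents[k].append(p)
    listed, order = set(), []

    def ready(n):
        # Unlisted, and all of its parents have already been listed.
        return n not in listed and all(p in listed for p in parents[n])

    while len(order) < len(children):
        n = next(x for x in children if ready(x))
        listed.add(n)
        order.append(n)
        while True:
            m = children[n][0]              # leftmost child of n
            if m not in children or not ready(m):
                break                       # m is a leaf or has an unlisted parent
            listed.add(m)
            order.append(m)
            n = m
    return order

dag = {"t1": ("t2", "t3"), "t2": ("t6", "t4"), "t3": ("t4", "e"),
       "t4": ("t5", "t7"), "t5": ("t6", "c"), "t6": ("a", "b"),
       "t7": ("d", "e")}
listing = node_listing(dag)
print(listing)
print(list(reversed(listing)))
```

For this dag the listing is t1, t2, t3, t4, t5, t6, t7, so the statements are generated in the order t7, t6, t5, t4, t3, t2, t1.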
43An Example
t7 := d + e
t6 := a + b
t5 := t6 - c
t4 := t5 * t7
t3 := t4 - e
t2 := t6 + t4
t1 := t2 * t3
44Generating Code From Trees
- There exists an algorithm that determines the optimal order in which to evaluate statements in a block when the dag representation of the block is a tree
- Optimal order here means the order that yields the shortest instruction sequence
45Optimal Ordering for Trees
- Label each node of the tree bottom-up with an integer denoting the fewest number of registers required to evaluate the tree with no stores of intermediate results
- Generate code during a tree traversal by first evaluating the operand requiring more registers
46The Labeling Algorithm
if n is a leaf then
  if n is the leftmost child of its parent then label(n) := 1
  else label(n) := 0
else begin
  let n1, n2, ..., nk be the children of n ordered by label
    so that label(n1) >= label(n2) >= ... >= label(nk)
  label(n) := max over 1 <= i <= k of (label(ni) + i - 1)
end
47An Example
For binary interior nodes
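The labeling rule can be sketched in Python. The tree encoding (a leaf is a string, an interior node is a tuple of an operator followed by its children) and the function name are assumptions. For a binary node with child labels l1 and l2, the general formula reduces to max(l1, l2) when l1 and l2 differ and to l1 + 1 when they are equal.

```python
def label(tree, is_leftmost=True):
    """Label a node with the number of registers needed to evaluate it
    without storing intermediate results."""
    if isinstance(tree, str):
        # A leaf needs a register only if it is a leftmost operand.
        return 1 if is_leftmost else 0
    kids = tree[1:]
    # Order the children's labels from largest to smallest.
    labels = sorted((label(c, i == 0) for i, c in enumerate(kids)), reverse=True)
    # With 0-based i, label(ni) + i - 1 becomes labels[i] + i.
    return max(l + i for i, l in enumerate(labels))

t1 = ("-", "a", "b")
t2 = ("+", "c", "d")
t3 = ("-", "e", t2)
t4 = ("-", t1, t3)
print(label(t1), label(t2), label(t3), label(t4))
```

Here t1 and t2 get label 1, while t3 and t4 get label 2, so two registers suffice for the whole tree.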
48Code Generation From a Labeled Tree
- Use a stack rstack to allocate registers R0, R1, ..., R(r-1)
- The value of a tree is always computed in the top register on rstack
- The function swap(rstack) interchanges the top two registers on rstack
- Use a stack tstack to allocate temporary memory locations T0, T1, ...
49Cases Analysis
[Figure: the cases handled by gencode; leaf operands are labeled name]
50The Function gencode
procedure gencode(n)
begin
  if n is a left leaf representing operand name and n is the leftmost child of its parent then
    print 'MOV' name ',' top(rstack)
  else if n is an interior node with operator op, left child n1, and right child n2 then
    if label(n2) = 0 then /* case 1 */
    else if 1 <= label(n1) < label(n2) and label(n1) < r then /* case 2 */
    else if 1 <= label(n2) <= label(n1) and label(n2) < r then /* case 3 */
    else /* case 4, both labels >= r */
end
51The Function gencode
/* case 1 */
begin
  let name be the operand represented by n2
  gencode(n1)
  print op name ',' top(rstack)
end
/* case 2 */
begin
  swap(rstack)
  gencode(n2)
  R := pop(rstack)
  gencode(n1)
  print op R ',' top(rstack)
  push(rstack, R)
  swap(rstack)
end
52The Function gencode
/* case 3 */
begin
  gencode(n1)
  R := pop(rstack)
  gencode(n2)
  print op top(rstack) ',' R
  push(rstack, R)
end
/* case 4 */
begin
  gencode(n2)
  T := pop(tstack)
  print 'MOV' top(rstack) ',' T
  gencode(n1)
  push(tstack, T)
  print op T ',' top(rstack)
end
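A Python sketch of gencode for binary trees, under these assumptions: a leaf is a string, an interior node is a tuple (op, left, right), op is already a mnemonic such as ADD or SUB, and the stacks are Python lists whose last element is the top. In case 3 the left operand ends up in R, so the sketch emits top(rstack) as the source and R as the destination, which agrees with the SUB R0, R1 step of the worked trace in these slides.

```python
def label(t, leftmost=True):
    """Register-need label of a binary tree node."""
    if isinstance(t, str):
        return 1 if leftmost else 0
    l1, l2 = label(t[1], True), label(t[2], False)
    return max(l1, l2) if l1 != l2 else l1 + 1

def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]

def gencode(n, rstack, tstack, r, out, leftmost=True):
    if isinstance(n, str):
        if leftmost:                              # case 0: a left leaf
            out.append(f"MOV {n}, {rstack[-1]}")
        return
    op, n1, n2 = n
    l1, l2 = label(n1, True), label(n2, False)
    if l2 == 0:                                   # case 1: right operand is a leaf
        gencode(n1, rstack, tstack, r, out)
        out.append(f"{op} {n2}, {rstack[-1]}")
    elif 1 <= l1 < l2 and l1 < r:                 # case 2: evaluate n2 first
        swap(rstack)
        gencode(n2, rstack, tstack, r, out, False)
        reg = rstack.pop()
        gencode(n1, rstack, tstack, r, out)
        out.append(f"{op} {reg}, {rstack[-1]}")
        rstack.append(reg)
        swap(rstack)
    elif 1 <= l2 <= l1 and l2 < r:                # case 3: evaluate n1 first
        gencode(n1, rstack, tstack, r, out)
        reg = rstack.pop()
        gencode(n2, rstack, tstack, r, out, False)
        out.append(f"{op} {rstack[-1]}, {reg}")
        rstack.append(reg)
    else:                                         # case 4: both labels >= r
        gencode(n2, rstack, tstack, r, out, False)
        tmp = tstack.pop()
        out.append(f"MOV {rstack[-1]}, {tmp}")
        gencode(n1, rstack, tstack, r, out)
        tstack.append(tmp)
        out.append(f"{op} {tmp}, {rstack[-1]}")

# t4 := (a + b) - (e - (c + d)) with two registers, R0 on top of rstack.
t4 = ("SUB", ("ADD", "a", "b"), ("SUB", "e", ("ADD", "c", "d")))
code = []
gencode(t4, ["R1", "R0"], ["T1", "T0"], 2, code)
print("\n".join(code))
```

With two registers the sketch emits seven instructions and needs no temporary, ending with the result in R0.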
53An Example
gencode(t4) [R1, R0] /* 2 */
  gencode(t3) [R0, R1] /* 3 */
    gencode(e) [R0, R1] /* 0 */
      print MOV e, R1
    gencode(t2) [R0] /* 1 */
      gencode(c) [R0] /* 0 */
        print MOV c, R0
      print ADD d, R0
    print SUB R0, R1
  gencode(t1) [R0] /* 1 */
    gencode(a) [R0] /* 0 */
      print MOV a, R0
    print ADD b, R0
  print SUB R1, R0
54Common Subexpressions
- Nodes with more than one parent in a dag are called shared nodes
- Optimal code generation for dags is NP-complete, both on a one-register machine and on a machine with an unlimited number of registers
55Partitioning a DAG into Trees
- Partition a dag into a set of trees by finding, for each root and shared node n, the maximal subtree with n as root that includes no other shared nodes, except as leaves
- Determine a code generation ordering for the trees
- Generate code for each tree using the algorithms for generating code from trees
56An Example
[Figure: a dag with interior nodes numbered 1 to 7 and leaves a0, b0, c0, d0, e0, shown before and after partitioning into trees]
57Dynamic Programming Code Generation
- The dynamic programming algorithm applies to a broad class of register machines with complex instruction sets
- The machine has r interchangeable registers
- The machine has instructions of the form Ri := E, where E is any expression containing operators, registers, and memory locations. If E involves registers, then Ri must be one of them
58Dynamic Programming
- The dynamic programming algorithm partitions the
problem of generating optimal code for an
expression into sub-problems of generating
optimal code for the sub-expressions of the given
expression
59Contiguous Evaluation
- We say a program P evaluates a tree T contiguously if
- it first evaluates those subtrees of T that need to be computed into memory
- it then evaluates the subtrees of the root in either order
- it finally evaluates the root
60Optimally Contiguous Program
- For the machines defined above, given any program P to evaluate an expression tree T, we can find an equivalent program P' such that
- P' is of no higher cost than P
- P' uses no more registers than P
- P' evaluates the tree in a contiguous fashion
- This implies that every expression tree can be evaluated optimally by a contiguous program
61Dynamic Programming Algorithm
- Phase 1: compute bottom-up for each node n of the expression tree T an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation. C[0] is the optimal cost of computing the subtree S into memory
62Dynamic Programming Algorithm
- To compute C[i] at node n, consider each machine instruction R := E whose expression E matches the subexpression rooted at node n
- Determine the costs of evaluating the operands of E by examining the cost vectors at the corresponding descendants of n
63Dynamic Programming Algorithm
- For those operands of E that are registers, consider all possible orders in which the corresponding subtrees of T can be evaluated into registers
- In each ordering, the first subtree corresponding to a register operand can be evaluated using i available registers, the second using i-1 registers, and so on
64Dynamic Programming Algorithm
- For node n, add in the cost of the instruction R := E that was used to match node n
- The value C[i] is then the minimum cost over all possible orders
- At each node, store the instruction used to achieve the best cost for C[i] for each i
- The smallest cost in the vector gives the minimum cost of evaluating T
65Dynamic Programming Algorithm
- Phase 2: traverse T and use the cost vectors to determine which subtrees of T must be computed into memory
- Phase 3: traverse T and use the cost vectors and associated instructions to generate the final target code
66An Example
Consider a machine with two registers R0 and R1 and instructions
Ri := Mj
Mi := Ri
Ri := Rj
Ri := Ri op Rj
Ri := Ri op Mj
67An Example
R0 := c
R1 := d
R1 := R1 / e
R0 := R0 * R1
R1 := a
R1 := R1 - b
R1 := R1 + R0
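A Python sketch of Phase 1 (the cost vectors) for the two-register example machine. Several things here are assumptions rather than statements from the slides: every instruction is taken to cost 1, leaves are memory operands, the tree encoding is (op, left, right) with string leaves, the function name cost_vector is hypothetical, and the expression (a - b) + c * (d / e) is read off from the seven-instruction sequence above.

```python
def cost_vector(tree, r):
    """Return [C[0], C[1], ..., C[r]] for the subtree, assuming r >= 1."""
    if isinstance(tree, str):
        # A leaf is already in memory (C[0] = 0); one load Ri := Mj otherwise.
        return [0] + [1] * r
    _op, left, right = tree
    cl, cr = cost_vector(left, r), cost_vector(right, r)
    c = [0] * (r + 1)
    for i in range(1, r + 1):
        # Ri := Ri op Mj: left subtree in a register, right subtree in memory.
        best = cl[i] + cr[0] + 1
        if i >= 2:
            # Ri := Ri op Rj: the subtree evaluated first gets i registers,
            # the second gets i - 1; try both orders.
            best = min(best, cl[i] + cr[i - 1] + 1, cr[i] + cl[i - 1] + 1)
        c[i] = best
    # Into memory: compute into a register, then store with Mi := Ri.
    c[0] = min(c[1:]) + 1
    return c

expr = ("+", ("-", "a", "b"), ("*", "c", ("/", "d", "e")))
print(cost_vector(expr, 2))
```

Under the unit-cost assumption the root vector is [8, 8, 7]; the minimum, 7, equals the length of the instruction sequence shown in the example.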
68Code Generator Generators
- A tool to automatically construct the instruction selection phase of a code generator
- Such tools may use tree grammars or context-free grammars to describe the target machines
- Register allocation will be implemented as a separate mechanism
69Tree Rewriting
a[i] := b + 1
[Figure: the input tree for this assignment, built from the nodes ind, mem b, const 1, const a, const i, and reg sp]
70Tree Rewriting
- The code is generated by reducing the input tree into a single node using a sequence of tree-rewriting rules
- Each tree-rewriting rule is of the form: replacement <- template { action }
- replacement is a single node
- template is a tree
- action is a code fragment
- A set of tree-rewriting rules is called a
tree-translation scheme
71An Example
reg i <- +(reg i, reg j)   { ADD Rj, Ri }
Each tree template represents a computation performed by the sequence of machine instructions emitted by the associated action
72Tree Rewriting Rules
73Tree Rewriting Rules
(6) reg i <- +(reg i, ind(+(const c, reg j)))   { ADD c(Rj), Ri }
(7) reg i <- +(reg i, reg j)   { ADD Rj, Ri }
(8) reg i <- +(reg i, const 1)   { INC Ri }
74An Example
Rule (1): const a is rewritten to reg 0
MOV #a, R0
75An Example
Rule (7): +(reg 0, reg sp) is rewritten to reg 0
ADD SP, R0
76An Example
Two rules match around the subtree ind(+(const i, reg sp)):
Rule (6), covering the larger subtree: ADD i(SP), R0
Rule (5), loading the operand instead: MOV i(SP), R1
77An Example
Rule (2): mem b is rewritten to reg 1
MOV b, R1
78An Example
Rule (8): +(reg 1, const 1) is rewritten to reg 1
INC R1
79An Example
Rule (4): the remaining tree, which stores reg 1 through ind(reg 0), is reduced
MOV R1, *R0
80Tree Pattern Matching
- The tree pattern matching algorithm can be implemented by extending the multiple-keyword pattern matching algorithm
- Each tree template is represented by a set of strings, each of which represents a path from the root to a leaf
- Each rule is associated with cost information
- The dynamic programming algorithm can be used to select an optimal sequence of matches
81Semantic Predicates
reg i <- +(reg i, const c)   { if c = 1 then INC Ri else ADD #c, Ri }
The general use of semantic actions and
predicates can provide greater flexibility and
ease of description than a purely grammatical
specification
82Graph Coloring
- In the first pass, target machine instructions are selected as though there were an infinite number of symbolic registers
- In the second pass, physical registers are assigned to symbolic registers using graph coloring algorithms
- During the second pass, if a register is needed when all available registers are used, some of the used registers must be spilled
83Interference Graph
- For each procedure, a register-interference graph is constructed
- The nodes in the graph are symbolic registers
- An edge connects two nodes if one is live at a point where the other is defined
84K-Colorable Graphs
- A graph is said to be k-colorable if each node can be assigned one of the k colors such that no two adjacent nodes have the same color
- A color represents a register
- The problem of determining whether a graph is k-colorable is NP-complete
85A Graph Coloring Algorithm
- Remove a node n and its edges if it has fewer than k neighbors
- Repeat the removing step above until we end up with the empty graph or a graph in which each node has k or more adjacent nodes
- In the latter case, a node is selected and spilled by deleting that node and its edges, and the removing step above continues
86A Graph Coloring Algorithm
- The nodes in the graph can be colored in the reverse order in which they are removed
- Each node can be assigned a color not assigned to any of its neighbors
- Spilled nodes can be assigned any color
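The remove-and-color procedure can be sketched in Python. The graph encoding (a dictionary from each symbolic register to the set of its neighbors), the function name, and the spill choice (the remaining node with the most neighbors) are assumptions. Unlike the last bullet above, this sketch leaves spilled nodes uncolored, treating them as living in memory.

```python
def color_graph(graph, k):
    """graph: dict node -> set of adjacent nodes (symmetric).
    Returns (colors, spilled), with colors drawn from 0 .. k-1."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack, spilled = [], []
    while work:
        # Look for a node with fewer than k neighbors in the remaining graph.
        n = next((x for x in work if len(work[x]) < k), None)
        if n is None:
            # Every remaining node has k or more neighbors: spill one.
            n = max(work, key=lambda x: len(work[x]))
            spilled.append(n)
        else:
            stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    colors = {}
    for n in reversed(stack):
        # Fewer than k neighbors were present at removal time, so a color is free.
        used = {colors[m] for m in graph[n] if m in colors}
        colors[n] = next(c for c in range(k) if c not in used)
    return colors, spilled

g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(color_graph(g, 3))
print(color_graph(g, 2))
```

With three colors the sample graph needs no spill; with two, the triangle a, b, c forces one node to be spilled.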
87An Example
88An Example
89Peephole Optimization
- Improve the performance of the target program by examining and transforming a short sequence of target instructions
- May need repeated passes over the code
- Can also be applied directly after intermediate code generation
90Examples
- Redundant loads and stores
MOV R0, a
MOV a, R0
- Algebraic simplification
x := x + 0
x := x * 1
- Constant folding
x := 2 + 3  =>  x := 5
y := x + 3  =>  y := 8
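As an illustration of the first case, here is a Python sketch of a peephole pass that drops a load immediately following the matching store. The function names and the string form of instructions are assumptions; an instruction carrying a label (written here as a "L1: " prefix) is deliberately left alone, since control may reach it from elsewhere.

```python
def parse(instr):
    """Split 'OP src, dst' into (op, (src, dst)); a label prefix stays in op."""
    op, _, rest = instr.partition(" ")
    return op, tuple(s.strip() for s in rest.split(","))

def remove_redundant_loads(code):
    """Drop 'MOV a, R' when it directly follows 'MOV R, a'."""
    out = []
    for instr in code:
        if out:
            p_op, p_args = parse(out[-1])
            c_op, c_args = parse(instr)
            if (p_op == c_op == "MOV" and len(p_args) == 2
                    and p_args == c_args[::-1]):
                continue   # the value of a is already in the register
        out.append(instr)
    return out

print(remove_redundant_loads(["MOV R0, a", "MOV a, R0", "ADD b, R0"]))
```

Because deleting one instruction can expose another removable pair, a pass like this may need to be repeated until the code stops changing.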
91Examples
- Unreachable code
#define debug 0
if (debug) (print debugging information)
=>
if 0 <> 1 goto L1
print debugging information
L1:
=>
if 1 goto L1
print debugging information
L1:
92Examples
- Flow-of-control optimization
goto L1
...
L1: goto L2
=>
goto L2
...
L1: goto L2

goto L1
...
L1: if a < b goto L2
L3:
=>
if a < b goto L2
goto L3
...
L3:
93Examples
- Reduction in strength: replace expensive operations by cheaper ones
- x^2 => x * x
- fixed-point multiplication and division by a power of 2 => shift
- floating-point division by a constant => floating-point multiplication by a constant
94Examples
- Use of machine idioms: hardware instructions for certain specific operations
- auto-increment and auto-decrement addressing modes (push or pop stack in parameter passing)