Code Generation - PowerPoint PPT Presentation


Slides: 103
Provided by: deep97



Code Generation
Position of code generator (figure): Source program -> Front end ->
Intermediate code -> Code optimizer -> Intermediate code -> Code generator
-> Target program; all phases use the Symbol table.
  • The target program generated must preserve the
    semantic meaning of the source program and be of
    high quality.
  • It must make effective use of the available
    resources of the target machine.
  • The code generator itself must run efficiently.

A code generator has 3 primary tasks
  • Instruction Selection
  • Register Allocation and Assignment
  • Instruction Ordering

Issues in the Design of a Code Generator
  • The most important criterion for a code generator
    is that it produces correct code.
  • The design depends on:
  • Input to the code generator (IR)
  • Target program (language)
  • Operating system
  • Memory management
  • Instruction selection
  • Register allocation and assignment
  • Evaluation order

Issues in the Design of a Code Generator: 1) Input
to the Code Generator
  • We assume that:
  • The front end has scanned, parsed, and translated
    the source program into a reasonably detailed
    intermediate representation (IR)
  • Type checking, type conversion, and detection of
    obvious semantic errors have already been done
  • The symbol table can provide the run-time
    addresses of the data objects
  • Intermediate representations may be:
  • Three-address representations: quadruples,
    triples, indirect triples
  • Linear representations: postfix notation
  • Virtual machine representations: bytecode, stack
    machine code
  • Graphical representations: syntax tree, DAG

Issues in the Design of a Code Generator: 2) Target
Programs (1)
  • The instruction-set architecture of the target
    machine has a significant impact on the
    difficulty of constructing a good code generator
    that produces high-quality machine code.
  • The most common target-machine architectures are:
  • RISC
  • CISC
  • Stack-based

Issues in the Design of a Code Generator: 2) Target
Programs (2) RISC machine
  • many registers,
  • three-address instructions,
  • simple addressing modes,
  • and a relatively simple instruction-set
    architecture.

Issues in the Design of a Code Generator: 2) Target
Programs (3) CISC machine
  • few registers,
  • two-address instructions,
  • a variety of addressing modes,
  • several register classes,
  • variable-length instructions,
  • and instructions with side effects.

Issues in the Design of a Code Generator: 2) Target
Programs (4) Stack-based machine
  • Operations are done by pushing operands onto a
    stack and then performing the operations on the
    operands at the top of the stack.
  • To achieve high performance, the top of the stack
    is typically kept in registers.
  • Stack-based machines almost disappeared because
    it was felt that the stack organization was too
    limiting and required too many swap and copy
    operations.
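A minimal sketch of the stack discipline described above, written as a tiny interpreter. The instruction names (PUSH, LOAD, ADD, SUB, MUL) are hypothetical and chosen only for illustration; they are not JVM bytecodes. Every arithmetic operation pops its two operands off the top of the stack and pushes the result.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical toy instruction format: ("PUSH", constant), ("LOAD", variable),
# or an arithmetic opcode alone: ("ADD",), ("SUB",), ("MUL",).
Instr = Tuple[Union[str, int], ...]

def run_stack_machine(code: List[Instr], env: Dict[str, int]) -> int:
    """Execute `code` on an operand stack and return the value left on top."""
    stack: List[int] = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(int(instr[1]))
        elif op == "LOAD":
            stack.append(env[str(instr[1])])
        elif op in ("ADD", "SUB", "MUL"):
            right = stack.pop()           # operands come off the top of the stack
            left = stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack[-1]

# (a + b) * 2 in postfix order: a b + 2 *
program = [("LOAD", "a"), ("LOAD", "b"), ("ADD",), ("PUSH", 2), ("MUL",)]
print(run_stack_machine(program, {"a": 3, "b": 4}))  # 14
```

No instruction names a register: operand locations are implicit in the stack, which keeps the code compact.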

Issues in the Design of a Code Generator: 2) Target
Programs (5) Stack-based machine
  • Stack-based architectures were revived with the
    introduction of the Java Virtual Machine (JVM)
  • The JVM is a software interpreter for Java
    bytecodes, an intermediate language produced by
    Java compilers.
  • The interpreter provides software compatibility
    across multiple platforms.
  • Just-in-time (JIT) compilers translate bytecodes
    during run time into the native hardware
    instruction set of the target machine.

Issues in the Design of a Code Generator: 2) Target
Programs (6)
  • The output of the code generator is the target
    program.
  • The target program may be:
  • Absolute machine language
  • It can be placed in a fixed location of memory
    and immediately executed.
  • Relocatable machine language
  • Allows subprograms to be compiled separately.
  • A set of relocatable object modules can be
    linked together and loaded for execution by a
    linking loader.
  • Assembly language
  • Makes the process of code generation somewhat
    easier.

Issues in the Design of a Code Generator: 2) Target
Programs (7) Assumptions in this chapter
  • A very simple RISC machine
  • A few CISC-like addressing modes
  • For readability, assembly code as the target
    language

Issues in the Design of a Code Generator: 3)
Instruction Selection (1)
  • The code generator must map the IR program into a
    code sequence that can be executed by the target
    machine.
  • The complexity of performing this mapping is
    determined by factors such as:
  • the level of the IR
  • the nature of the instruction-set architecture
  • the desired quality of the generated code.

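Issues in the Design of a Code Generator: 3)
Instruction Selection (2) 1) Level of the IR
  • If the IR is high level, the code generator may
    translate each IR statement into machine
    instructions using code templates, which often
    produces poor code that needs further
    optimization.
  • If the IR is low level, it exposes machine details
    that the code generator can use to generate more
    efficient code sequences.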

Issues in the Design of a Code Generator: 3)
Instruction Selection (3) 2) Nature of the
instruction-set architecture
  • For example, the uniformity and completeness of
    the instruction set are important factors.
  • If the target machine does not support each data
    type in a uniform manner, then each exception to
    the general rule requires special handling.
  • On some machines, for example, floating-point
    operations are done using separate registers.

Issues in the Design of a Code Generator: 3)
Instruction Selection (4) 3) Quality of the
generated code
  • It is determined by its speed and size.
  • For the three-address statement a = a + 1, the
    translated sequence is
  • LD R0, a
  • ADD R0, R0, #1
  • ST a, R0
  • Instead, if the target machine has an increment
    instruction (INC), it would be more efficient to
    write the single instruction INC a.
  • We need to know instruction costs in order to
    design good code sequences.
  • However, accurate cost information is often
    difficult to obtain.

Issues in the Design of a Code Generator: 3)
Instruction Selection (5) 4) Instruction speeds
and machine idioms
  • For example, every three-address statement of the
    form x = y + z, where x, y, and z are statically
    allocated, can be translated as
  • LD R0, y
  • ADD R0, R0, z
  • ST x, R0
  • This strategy often produces redundant loads and
    stores.

Issues in the Design of a Code Generator: 3)
Instruction Selection (6) 4) Instruction speeds
and machine idioms (2)
  • For example, the sequence of three-address
    statements
  • a = b + c
  • d = a + e
  • can be translated as
  • LD R0, b
  • ADD R0, R0, c
  • ST a, R0
  • LD R0, a
  • ADD R0, R0, e
  • ST d, R0
  • Here, the fourth statement is redundant since it
    loads a value that has just been stored, and
    so is the third if a is not subsequently used.
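The naive statement-by-statement strategy and the redundant reload it creates can be sketched as follows. This is a minimal illustration under stated assumptions: each statement is an (x, y, op, z) tuple for x = y op z, only register R0 is used, the reload is not the target of a jump, and the function names are hypothetical. Removing the store ST a, R0 as well would additionally require knowing that a is not used later, which this sketch does not check.

```python
from typing import List, Tuple

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def naive_translate(stmts: List[Tuple[str, str, str, str]]) -> List[str]:
    """Translate each x = y op z independently into LD / OP / ST using R0."""
    code: List[str] = []
    for dst, y, op, z in stmts:
        code.append(f"LD R0, {y}")
        code.append(f"{OPCODES[op]} R0, R0, {z}")
        code.append(f"ST {dst}, R0")
    return code

def remove_redundant_loads(code: List[str]) -> List[str]:
    """Drop 'LD R, x' when it immediately follows 'ST x, R' (same R and x)."""
    out: List[str] = []
    for instr in code:
        if out and instr.startswith("LD ") and out[-1].startswith("ST "):
            reg, var = instr[3:].split(", ")
            st_var, st_reg = out[-1][3:].split(", ")
            if reg == st_reg and var == st_var:
                continue  # R still holds x, so reloading it is redundant
        out.append(instr)
    return out

block = [("a", "b", "+", "c"), ("d", "a", "+", "e")]
naive = naive_translate(block)
improved = remove_redundant_loads(naive)
print(len(naive), len(improved))  # 6 5
```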

Issues in the Design of a Code Generator: 4)
Register Allocation (1)
  • A key problem in code generation is deciding what
    values to hold in what registers.
  • Instructions involving register operands are
    usually shorter and faster than those involving
    memory operands, which are larger and
    comparatively slow.
  • Efficient utilization of registers is therefore
    particularly important in code generation.
  • The use of registers is subdivided into two
    subproblems:
  • Register allocation, during which we select the
    set of variables that will reside in registers at
    each point in the program.
  • Register assignment, during which we pick the
    specific register that a variable will reside in.

Issues in the Design of a Code Generator: 4)
Register Allocation (2)
  • For example, certain machines require
    register pairs for some operands and results.
  • M x, y is the multiplication instruction,
  • where x, the multiplicand, is the even register
    of an even/odd register pair and
  • y, the multiplier, is the odd register.
  • The product occupies the entire even/odd register
    pair.

Issues in the Design of a Code Generator: 4)
Register Allocation (3)
  • D x, y the division instruction
  • where the dividend occupies an even/odd register
    pair whose even register is x
  • the divisor is y.
  • After division, the even register holds the
    remainder and the odd register the quotient.

Issues in the Design of a Code Generator: 4)
Register Allocation (4)
  • Now, consider the two three-address code
    sequences below, in which the only difference is
    the operator in the second statement:
    (a)            (b)
    t = a + b      t = a + b
    t = t * c      t = t + c
    t = t / d      t = t / d

Issues in the Design of a Code Generator: 4)
Register Allocation (5)
  • The shortest assembly-code sequences for (a) and
    (b) are
    (a)            (b)
    L  R1, a       L    R0, a
    A  R1, b       A    R0, b
    M  R0, c       A    R0, c
    D  R0, d       SRDA R0, 32
    ST R1, t       D    R0, d
                   ST   R1, t
  • SRDA stands for Shift-Right-Double-Arithmetic, and
  • SRDA R0, 32 shifts the dividend into R1 and
    clears R0 so that all its bits equal the sign bit.

Issues in the Design of a Code Generator: 5)
Evaluation Order
  • The order in which computations are performed
    affects the efficiency of the target code.
  • Some computation orders require fewer registers
    to hold intermediate results than others.
  • Picking a best order in the general case is a
    difficult NP-complete problem.
  • Initially, we shall avoid the problem by
    generating code for the three-address statements
    in the order in which they have been produced by
    the intermediate code generator.

The Target Language
  • Familiarity with the target machine and its
    instruction set is a prerequisite for designing a
    good code generator.
  • In this chapter, we shall use as a target
    language assembly code for a simple computer that
    is representative of many register machines.

A Simple Target Machine Model
  • It is a three-address machine with load and store
    operations, computation operations, jump
    operations, and conditional jumps.
  • It is a byte-addressable machine with n
    general-purpose registers, R0, R1, ..., Rn-1.
  • It has a very limited set of instructions.
  • Assume that all operands are integers.
  • A label may precede an instruction.
  • Most instructions consist of an operator,
    followed by a target, followed by a list of
    source operands.

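  • We assume the following kinds of instructions are
    available:
  • Load operations: LD dst, addr loads the value in
    location addr into location dst (dst = addr)
  • LD r, x loads the value in location x into
    register r
  • LD r1, r2 copies the contents of register r2 into
    register r1
  • Store operations: ST x, r stores the value in
    register r into location x (x = r)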

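  • Computation operations: OP dst, src1, src2, where
    OP is an operator such as ADD or SUB
  • SUB r1, r2, r3 computes r1 = r2 - r3
  • Unconditional jumps
  • BR L causes control to branch to the instruction
    with label L
  • Conditional jumps
  • Bcond r, L, where r is a register and L is a label
  • BLTZ r, L jumps to L if the value in register r is
    less than zero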

Assume the target machine has a variety of addressing
modes:
  • 1. Memory-to-memory (direct): a location can be a
    variable name x, referring to the memory location
    reserved for x.
  • 2. Indexed addressing, useful for accessing arrays,
  • of the form a(r), where a is a variable and r is
    a register.
  • For example, the instruction LD R1, a(R2) sets
  • R1 = contents(a + contents(R2))

  • 3. Register indexed addressing, useful for
    following pointers
  • For example, LD R1, 100(R2) sets
  • R1 = contents(100 + contents(R2))
  • 4. Indirect addressing
  • *r means the memory location found in the
    location represented by the contents of register
    r, and
  • *100(r) means the memory location found in the
    location obtained by adding 100 to the contents
    of r.
  • For example, LD R1, *100(R2) sets
  • R1 = contents(contents(100 + contents(R2)))

  • 5. Immediate addressing
  • The constant is prefixed by #.
  • LD R1, #100 loads the integer 100 into register
    R1, and
  • ADD R1, R1, #100 adds the integer 100 to register
    R1.

  • The three-address statement x = y - z can be
    implemented by the machine instructions
  • LD R1, y
  • LD R2, z
  • SUB R1, R1, R2
  • ST x, R1

  • Suppose a is an array whose elements are 8-byte
    values, perhaps real numbers.
  • Also assume elements of a are indexed starting at
    0.
  • The three-address instruction b = a[i] can be
    implemented by the machine instructions
  • LD R1, i // R1 = i
  • MUL R1, R1, 8 // R1 = R1 * 8
  • LD R2, a(R1) // R2 = contents(a + contents(R1))
  • ST b, R2 // b = R2

  • The assignment into the array a represented by the
    three-address instruction
  • a[j] = c is implemented by
  • LD R1, c // R1 = c
  • LD R2, j // R2 = j
  • MUL R2, R2, 8 // R2 = R2 * 8
  • ST a(R2), R1 // contents(a + contents(R2)) = R1
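The two array patterns above differ only in which side of the assignment the indexed operand appears on, so both can be emitted by small templates. A sketch, assuming the instruction spellings used on these slides (including MUL with an unprefixed constant), registers R1 and R2, and zero-based indexing; the function names are hypothetical.

```python
from typing import List

def gen_indexed_load(dst: str, array: str, index: str, width: int) -> List[str]:
    """Code for dst = array[index], elements of `width` bytes, indexed from 0."""
    return [
        f"LD R1, {index}",          # R1 = index
        f"MUL R1, R1, {width}",     # R1 = index * width (byte offset)
        f"LD R2, {array}(R1)",      # R2 = contents(array + contents(R1))
        f"ST {dst}, R2",            # dst = R2
    ]

def gen_indexed_store(array: str, index: str, src: str, width: int) -> List[str]:
    """Code for array[index] = src, elements of `width` bytes, indexed from 0."""
    return [
        f"LD R1, {src}",            # R1 = src
        f"LD R2, {index}",          # R2 = index
        f"MUL R2, R2, {width}",     # R2 = index * width
        f"ST {array}(R2), R1",      # contents(array + contents(R2)) = R1
    ]

for line in gen_indexed_load("b", "a", "i", 8):
    print(line)
```

Passing a width of 4 instead of 8 adapts the same templates to arrays of 4-byte elements.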

  • To implement the three-address statement
  • x = *p, we can use machine instructions like
  • LD R1, p // R1 = p
  • LD R2, 0(R1) // R2 = contents(0 + contents(R1))
  • ST x, R2 // x = R2

  • The assignment through a pointer, *p = y, is
    similarly implemented in machine code by
  • LD R1, p // R1 = p
  • LD R2, y // R2 = y
  • ST 0(R1), R2 // contents(0 + contents(R1)) = R2

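  • Consider a conditional-jump three-address
    instruction like
  • if x < y goto L
  • The machine-code equivalent would be something
    like
  • LD R1, x
  • LD R2, y
  • SUB R1, R1, R2
  • BLTZ R1, M
  • Here M is the label of the first machine
    instruction generated from the three-address
    instruction that has label L.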

Generate code for the following three-address
statements, assuming a and b are arrays whose
elements are 4-byte values.
  • x = a[i]
  • y = b[j]
  • a[i] = y
  • b[j] = x

Program and Instruction Costs
  • There are costs associated with compiling and
    running a program.
  • Some common cost measures are
  • the length of compilation time, and the size,
    running time, and power consumption of the target
    program.
  • Determining the actual cost of compiling and
    running a program is a complex problem.

  • Finding an optimal target program for a given
    source program is an undecidable problem
  • Many of the subproblems involved are NP-hard.
  • In code generation we must often be content with
    heuristic techniques that produce good but not
    necessarily optimal target programs.

  • Assume each target-language instruction has an
    associated cost.
  • For simplicity, we take the cost of an
    instruction to be one plus the costs associated
    with the addressing modes of the operands.
  • This cost corresponds to the length in words of
    the instruction.

  • Addressing modes involving
  • registers have zero additional cost, while those
    involving a memory location or constant have an
    additional cost of one.
  • Some examples:
  • LD R0, R1: cost 1
  • LD R0, M: cost 2
  • LD R1, *100(R2): cost 2 (the constant 100 occupies
    one extra word)

  • the cost of a target-language program on a given
    input is the sum of costs of the individual
    instructions executed when the program is run on
    that input.
  • Good code-generation algorithms seek to minimize
    the sum of the costs of the instructions executed
    by the generated target program on typical inputs.

  • Determine the cost of the following instruction
    sequence:
  • LD R0, y
  • LD R1, z
  • ADD R0, R0, R1
  • ST x, R0
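The cost model above is simple enough to compute mechanically. A sketch, assuming operands are spelled as on these slides: a bare register (R0, R1, ...) adds 0, and any other operand (a variable name, a constant, or an addressing mode carrying an address or offset such as a(R1)) adds 1. It only handles straight-line code, where every instruction executes exactly once; the function names are hypothetical.

```python
import re
from typing import List

REGISTER = re.compile(r"R\d+")

def operand_cost(operand: str) -> int:
    """A bare register costs 0; any other operand needs one extra word."""
    return 0 if REGISTER.fullmatch(operand.strip()) else 1

def instruction_cost(instr: str) -> int:
    """One word for the instruction itself plus the cost of each operand."""
    parts = instr.split(None, 1)            # opcode, then the operand list
    if len(parts) == 1:
        return 1
    return 1 + sum(operand_cost(op) for op in parts[1].split(","))

def program_cost(code: List[str]) -> int:
    """Total cost of straight-line code (each instruction executed once)."""
    return sum(instruction_cost(instr) for instr in code)

# The b = a[i] sequence for 8-byte elements.
print(program_cost(["LD R1, i", "MUL R1, R1, 8", "LD R2, a(R1)", "ST b, R2"]))  # 8
```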

Basic Blocks and Flow Graphs
Basic Block: A basic block is a sequence of
consecutive statements in which flow of control
enters at the beginning and leaves at the end
without halting or possibility of branching except
at the end.
  • Flow Graph: A graph representation of
    three-address statements is called a flow graph.
  • Nodes in the flow graph represent computations.
  • Edges represent the flow of control.
  • It is used to do a better job of register
    allocation and instruction selection.

Basic Blocks (2)
  • Algorithm: Partitioning three-address
    instructions into basic blocks
  • Input: a sequence of three-address instructions.
  • Output: a list of the basic blocks for that
    sequence in which each instruction is assigned to
    exactly one basic block.
  • Method:
  • We first determine the leaders (the first
    instructions of basic blocks):
  • 1) The first instruction is a leader.
  • 2) Any instruction that is the target of a
    conditional or unconditional goto is a leader.
  • 3) Any instruction that immediately follows a
    conditional or unconditional goto is a leader.
  • For each leader, its basic block consists of the
    leader and all the instructions up to but not
    including the next leader or the end of the
    intermediate program.

Basic Blocks(3)
  • Example: Consider the source code in which a
    10 x 10 matrix a is converted into an identity
    matrix.
  • for i from 1 to 10 do
  • for j from 1 to 10 do
  • a[i, j] = 0.0;
  • for i from 1 to 10 do
  • a[i, i] = 1.0;
  • In generating the intermediate code, we have
    assumed that the real-valued array elements take
    8 bytes each, and that the matrix a is stored in
    row-major form.

Intermediate code to set a 10 x 10 matrix to an
identity matrix
  • 1) i = 1
  • 2) j = 1
  • 3) t1 = 10 * i
  • 4) t2 = t1 + j
  • 5) t3 = 8 * t2
  • 6) t4 = t3 - 88
  • 7) a[t4] = 0.0
  • 8) j = j + 1
  • 9) if j <= 10 goto (3)
  • 10) i = i + 1
  • 11) if i <= 10 goto (2)
  • 12) i = 1
  • 13) t5 = i - 1
  • 14) t6 = 88 * t5
  • 15) a[t6] = 1.0
  • 16) i = i + 1
  • 17) if i <= 10 goto (13)

Basic Blocks (5)
  • The leaders are the following instructions:
  • 1) By rule 1 of the algorithm
  • 2) By rule 2 of the algorithm
  • 3) By rule 2 of the algorithm
  • 10) By rule 3 of the algorithm
  • 12) By rule 3 of the algorithm
  • 13) By rule 2 of the algorithm
  • We conclude that the leaders are instructions 1,
    2, 3, 10, 12, and 13.

Basic Blocks(6)
  • The basic block of each leader contains all the
    instructions from itself until just before the
    next leader.
  • Thus, basic block B1 consists of instruction 1
    alone,
  • B2 consists of instruction 2 alone,
  • B3 consists of instructions 3 through 9,
  • B4 consists of instructions 10 and 11,
  • B5 consists of instruction 12 alone, and
  • B6 consists of instructions 13 through 17.
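The leader rules and the block construction can be sketched directly. This assumes each instruction is a (text, jump_target) pair, where jump_target is the 1-based number of the instruction a conditional or unconditional goto transfers to, or None; only the targets matter for partitioning. The function names are hypothetical. Applied to the identity-matrix code, it reproduces the leaders and blocks listed above.

```python
from typing import List, Optional, Tuple

# Each instruction: (text, jump_target); jump_target is None for non-jumps.
Instr = Tuple[str, Optional[int]]

def find_leaders(code: List[Instr]) -> List[int]:
    """Return the 1-based numbers of the leaders, in increasing order."""
    leaders = {1}                                  # rule 1: first instruction
    for number, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)                    # rule 2: target of a jump
            if number < len(code):
                leaders.add(number + 1)            # rule 3: follows a jump
    return sorted(leaders)

def partition(code: List[Instr]) -> List[List[int]]:
    """Each block runs from its leader up to, not including, the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code) + 1]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

identity = [
    ("i = 1", None), ("j = 1", None), ("t1 = 10 * i", None),
    ("t2 = t1 + j", None), ("t3 = 8 * t2", None), ("t4 = t3 - 88", None),
    ("a[t4] = 0.0", None), ("j = j + 1", None), ("if j <= 10 goto (3)", 3),
    ("i = i + 1", None), ("if i <= 10 goto (2)", 2), ("i = 1", None),
    ("t5 = i - 1", None), ("t6 = 88 * t5", None), ("a[t6] = 1.0", None),
    ("i = i + 1", None), ("if i <= 10 goto (13)", 13),
]
print(find_leaders(identity))  # [1, 2, 3, 10, 12, 13]
```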

Flow Graphs
  • Once an intermediate-code program is partitioned
    into basic blocks, we represent the flow of
    control between them by a flow graph.
  • The nodes of the flow graph are the basic blocks.
  • We add two nodes, called the entry and exit, that
    do not correspond to executable intermediate
    instructions.
  • There is an edge from the entry to the first
    executable node of the flow graph, that is, to
    the basic block that comes from the first
    instruction of the intermediate code.
  • There is an edge to the exit from any basic
    block that contains an instruction that could be
    the last executed instruction of the program.

Flow Graphs(2)
Representation of Flow Graphs
  • Flow graphs, being quite ordinary graphs, can be
    represented by any of the data structures
    appropriate for graphs.
  • It is likely to be more efficient to create a
    linked list of instructions for each basic block.

  • Since virtually every program spends most of its
    time executing its loops, it is especially
    important for a compiler to generate good code
    for loops.
  • Many code transformations depend upon the
    identification of "loops" in a flow graph.

  • We say that a set of nodes L in a flow graph is a
    loop if
  • 1. There is a node in L called the loop entry
    with the property that no other node in L has a
    predecessor outside L. That is, every path from
    the entry of the entire flow graph to any node in
    L goes through the loop entry.
  • 2. Every node in L has a nonempty path,
    completely within L, to the entry of L.

  • Example The flow graph has three loops
  • 1. B3 by itself.
  • 2. B6 by itself.
  • 3. B2, B3, B4.

Flow Graphs (3)
  • The successor of B1 is B2.
  • The successor of B2 is B3.
  • The successors of B3 are B3 and B4.
  • The successors of B4 are B2 and B5.
  • The successor of B5 is B6.
  • The successors of B6 are B6 and exit.
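A sketch of how these successor lists follow from the edge rules: a jump adds an edge to the block whose leader is the jump target, and a block that does not end in an unconditional jump also falls through to the next block (or to exit after the last block). The block summaries below encode the identity-matrix example by hand; the tuple format and the function name are hypothetical, and the sketch assumes the program ends only by falling off its last block.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical block summary: (leader, jump_target, ends_in_unconditional_jump).
# jump_target is the instruction number the block's last instruction may jump
# to, or None when the block does not end in a jump.
Block = Tuple[int, Optional[int], bool]

def successors(blocks: List[Block]) -> Dict[str, List[str]]:
    """Successor lists for blocks B1..Bn, plus the ENTRY and EXIT edges."""
    name_of_leader = {leader: f"B{k}" for k, (leader, _, _) in enumerate(blocks, 1)}
    succ: Dict[str, List[str]] = {"ENTRY": ["B1"]}
    for k, (_, target, unconditional) in enumerate(blocks, 1):
        out: List[str] = []
        if target is not None:
            out.append(name_of_leader[target])     # edge taken by the jump
        if not unconditional:                      # fall-through edge
            out.append(f"B{k + 1}" if k < len(blocks) else "EXIT")
        succ[f"B{k}"] = out
    return succ

identity_blocks = [
    (1, None, False),   # B1: i = 1
    (2, None, False),   # B2: j = 1
    (3, 3, False),      # B3: 3-9, ends with if j <= 10 goto (3)
    (10, 2, False),     # B4: 10-11, ends with if i <= 10 goto (2)
    (12, None, False),  # B5: i = 1
    (13, 13, False),    # B6: 13-17, ends with if i <= 10 goto (13)
]
for block, targets in successors(identity_blocks).items():
    print(block, "->", ", ".join(targets))
```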

Next-Use Information
  • If the value of a variable that is currently in a
    register will never be referenced subsequently,
    then that register can be assigned to another
    variable.
  • Suppose three-address statement i assigns a value
    to x. If statement j has x as an operand, and
    control can flow from statement i to j along a
    path that has no intervening assignments to x,
    then we say statement j uses the value of x
    computed at statement i. We further say that x is
    live at statement i.
  • We wish to determine for each three-address
    statement x = y + z what the next uses of x, y,
    and z are.

Next-Use Information: Algorithm for determining
the liveness and next-use information for each
statement in a basic block
  • INPUT: A basic block B of three-address
    statements. We assume that the symbol table
    initially shows all nontemporary variables in B
    as being live on exit.
  • OUTPUT: At each statement i: x = y + z in B, we
    attach to i the liveness and next-use
    information of x, y, and z.
  • METHOD: We start at the last statement in B and
    scan backwards to the beginning of B. At each
    statement i: x = y + z in B, we do the following:
  • 1. Attach to statement i the information
    currently found in the symbol table regarding the
    next use and liveness of x, y, and z.
  • 2. In the symbol table, set x to "not live" and
    "no next use."
  • 3. In the symbol table, set y and z to "live" and
    the next uses of y and z to i.

Next-Use Information
  • Here we have used + as a symbol representing any
    operator. If the three-address statement i is of
    the form x = + y or x = y, the steps are the same
    as above, ignoring z.
  • Note that the order of steps (2) and (3) may not
    be interchanged because x may be y or z.
  • For example, for quadruple i: x = y op z
  • Record the next uses of x, y, z in the quadruple
  • Mark x dead (its previous value has no next use)
  • The next use of y is i and the next use of z is
    i; y and z are live
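The backward scan can be sketched with a dictionary standing in for the symbol table. Assumptions for this illustration: each statement is an (x, y, z) triple for x = y op z (the operator does not affect liveness, and z is None for x = y), live_on_exit stands for the nontemporary variables, and statements are numbered from 1. Note that steps 2 and 3 run in the order the algorithm requires.

```python
from typing import Dict, List, Optional, Set, Tuple

Stmt = Tuple[str, str, Optional[str]]      # x = y op z (z is None for x = y)
Info = Tuple[bool, Optional[int]]          # (live?, next use)

def next_use(block: List[Stmt], live_on_exit: Set[str]) -> List[Dict[str, Info]]:
    """Backward scan; result[i] holds the info attached to statement i + 1."""
    table: Dict[str, Info] = {}
    for stmt in block:                     # initial symbol-table state
        for name in stmt:
            if name is not None:
                table[name] = (name in live_on_exit, None)
    attached: List[Dict[str, Info]] = [{} for _ in block]
    for i in reversed(range(len(block))):
        x, y, z = block[i]
        operands = [n for n in (y, z) if n is not None]
        # Step 1: attach what the table currently says about x, y and z.
        attached[i] = {n: table[n] for n in [x] + operands}
        # Step 2: x is assigned here, so its earlier value is dead.
        table[x] = (False, None)
        # Step 3: y and z are used here (after step 2, since x may be y or z).
        for n in operands:
            table[n] = (True, i + 1)       # statements are numbered from 1
    return attached

example = [("t1", "a", "b"),               # 1: t1 = a + b
           ("t2", "t1", "c"),              # 2: t2 = t1 * c
           ("a", "t2", None)]              # 3: a = t2
for number, info in enumerate(next_use(example, {"a", "b", "c"}), 1):
    print(number, info)
```

At statement 1 the operand a is reported as not live: it is reassigned at statement 3 before any further use, so a register holding it could be reused right after statement 1.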

Transformation on Basic Block
  • A basic block computes a set of expressions.
  • Transformations are useful for improving the
    quality of code.
  • Two important classes of local optimizations can
    be applied to basic blocks:
  • Structure-preserving transformations
  • Algebraic transformations

The DAG Representation of Basic Blocks
  • Many important techniques for local optimization
    begin by transforming a basic block into a DAG
    (directed acyclic graph).
  • The DAG for a basic block is constructed as
    follows:
  • 1. There is a node in the DAG for each of the
    initial values of the variables appearing in the
    basic block.
  • 2. There is a node N associated with each
    statement s within the block. The children of N
    are those nodes corresponding to statements that
    are the last definitions, prior to s, of the
    operands used by s.

  • 3. Node N is labeled by the operator applied at
    s, and also attached to N is the list of
    variables for which it is the last definition
    within the block.
  • 4. Certain nodes are designated output nodes.
    These are the nodes whose variables are live on
    exit from the block; that is, their values may be
    used later, in another block of the flow graph.
    Calculation of these "live variables" is a matter
    for global flow analysis.

  • The DAG representation of a basic block lets us
    perform several code improving transformations on
    the code represented by the block.
  • a) We can eliminate local common subexpressions,
    that is, instructions that compute a value that
    has already been computed.
  • b) We can eliminate dead code, that is,
    instructions that compute a value that is never
    used.
  • c) We can reorder statements that do not depend
    on one another; such reordering may reduce the
    time a temporary value needs to be preserved in a
    register.
  • d) We can apply algebraic laws to reorder
    operands of three-address instructions, and
    sometimes thereby simplify the computation.

Finding Local Common Subexpressions
  • Common subexpressions can be detected by using
    the "value-number" method.
  • As a new node M is about to be added, we check
    whether there is an existing node N with the same
    children, in the same order, and with the same
    operator.
  • If so, N computes the same value as M and may be
    used in its place.
  • Consider the block
  • a = b + c
  • b = a - d
  • c = b + c
  • d = a - d

The DAG for the basic block
  • The node corresponding to the fourth statement,
    d = a - d, has the operator - and the nodes with
    attached variables a and d0 as children.
  • Since the operator and the children are the same
    as those for the node corresponding to statement
    two, we do not create this node, but add d to the
    list of definitions for the node labeled -.
  • In fact, if b is not live on exit from the block,
    then we do not need to compute that variable, and
    can use d to receive the value represented by the
    node labeled -.

  • The block then becomes
  • a = b + c
  • d = a - d
  • c = d + c
  • However, if both b and d are live on exit, then a
    fourth statement must be used to copy the value
    from one to the other.
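A sketch of the value-number method on the block above. Assumptions: statements are (x, y, op, z) tuples for x = y op z, and the dictionary `index` plays the role of the hash table that finds an existing node with the same operator and the same children in the same order. The sketch does not maintain label lists or handle array assignments; `build_dag` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, str, str, str]           # (x, y, op, z) for x = y op z
Node = Tuple                               # ("leaf", name) or (op, left_id, right_id)

def build_dag(block: List[Stmt]) -> Tuple[List[Node], Dict[str, int]]:
    """Value-number a block: return the DAG nodes and each variable's final node."""
    nodes: List[Node] = []                 # node id -> node key
    index: Dict[Node, int] = {}            # hash table: node key -> node id
    current: Dict[str, int] = {}           # variable -> node of its last definition

    def node_for(key: Node) -> int:
        if key not in index:               # no node with this op and these children
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name: str) -> int:
        if name in current:
            return current[name]           # value defined earlier in the block
        return node_for(("leaf", name))    # initial value, e.g. b0

    for x, y, op, z in block:
        current[x] = node_for((op, operand(y), operand(z)))
    return nodes, current

block = [("a", "b", "+", "c"), ("b", "a", "-", "d"),
         ("c", "b", "+", "c"), ("d", "a", "-", "d")]
nodes, current = build_dag(block)
print(current["b"] == current["d"], len(nodes))  # True 6
```

Four statements plus three initial values yield only six nodes, because d = a - d reuses the node built for b = a - d.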

  • Now consider the block
  • a = b + c
  • b = b - d
  • c = c + d
  • e = b + c

  • When we look for common subexpressions we really
    are looking for expressions that are guaranteed
    to compute the same value, no matter how that
    value is computed.
  • Thus, the DAG method will miss the fact that the
    expression computed by the first and fourth
    statements in the sequence is the same, namely
    b0 + c0.

  • That is, even though b and c both change between
    the first and last statements, their sum remains
    the same, because b + c = (b - d) + (c + d).
  • The DAG does not exhibit any common
    subexpression. However, algebraic identities
    applied to the DAG may expose the equivalence.

Dead Code Elimination
  • Delete from a DAG any root (node with no
    ancestors) that has no live variables attached.
  • Repeated application of this transformation will
    remove all nodes from the DAG that correspond to
    dead code.
  • Example

  • In the DAG above, if a and b are live but c and e
    are not, we can immediately remove the root
    labeled e.
  • Then, the node labeled c becomes a root and can
    be removed. The roots labeled a and b remain,
    since they each have live variables attached.
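The repeated root deletion can be sketched on the DAG of a = b + c; b = b - d; c = c + d; e = b + c. Assumptions: the DAG is a dictionary from node ids to (children, attached variables), the leaves b0, c0, d0 stand for initial values and generate no code, so the sketch never deletes them, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# node id -> (child ids, attached variables); leaves have no children.
Dag = Dict[str, Tuple[List[str], Set[str]]]

def eliminate_dead_roots(dag: Dag, live: Set[str]) -> Dag:
    """Repeatedly delete interior roots that have no live variable attached."""
    dag = {node: (list(kids), set(labels)) for node, (kids, labels) in dag.items()}
    changed = True
    while changed:
        changed = False
        has_parent = {kid for kids, _ in dag.values() for kid in kids}
        for node, (kids, labels) in list(dag.items()):
            if node not in has_parent and kids and not (labels & live):
                del dag[node]                # dead code: nothing live needs it
                changed = True
    return dag

# DAG for: a = b + c; b = b - d; c = c + d; e = b + c
dag = {
    "b0": ([], set()), "c0": ([], set()), "d0": ([], set()),
    "n1": (["b0", "c0"], {"a"}),
    "n2": (["b0", "d0"], {"b"}),
    "n3": (["c0", "d0"], {"c"}),
    "n4": (["n2", "n3"], {"e"}),
}
print(sorted(eliminate_dead_roots(dag, live={"a", "b"})))
# ['b0', 'c0', 'd0', 'n1', 'n2']
```

The first pass removes n4 (labeled e); only then does n3 (labeled c) become a root, so it goes in the second pass, matching the two-step removal described above.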

Structure Preserving Transformations
  • Dead Code Elimination
  • Renaming Temporary Variables
  • Consider t = b + c, where t is a temporary
    variable.
  • If we change it to u = b + c, then we change all
    instances of t to u.
  • Interchange of Statements
  • t1 = b + c
  • t2 = x + y
  • We can interchange these two statements iff
    neither x nor y is t1 and neither b nor c is t2.

Say x is dead, that is, never subsequently used,
at the point where the statement x = y + z
appears in a block. We can safely remove this
statement.
Algebraic Transformations
  • Replace expensive expressions by cheaper ones
  • x = x + 0: eliminate
  • x = x * 1: eliminate
  • x = y ** 2 (why expensive? Answer: it is normally
    implemented by a function call)
  • can be replaced by x = y * y
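A sketch of the three algebraic rules as a pass over a block. Assumptions: statements are (x, y, op, z) tuples with constants written as strings, "**" denotes exponentiation, and only the exact patterns above are recognized; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Stmt = Tuple[str, str, str, str]           # (x, y, op, z) for x = y op z

def simplify(stmt: Stmt) -> Optional[Stmt]:
    """Apply the identities above; None means the statement can be deleted."""
    x, y, op, z = stmt
    if x == y and ((op == "+" and z == "0") or (op == "*" and z == "1")):
        return None                        # x = x + 0 and x = x * 1 change nothing
    if op == "**" and z == "2":
        return (x, y, "*", y)              # y ** 2 becomes the cheaper y * y
    return stmt

def simplify_block(block: List[Stmt]) -> List[Stmt]:
    """Simplify every statement, dropping those that became no-ops."""
    result = []
    for stmt in block:
        new = simplify(stmt)
        if new is not None:
            result.append(new)
    return result

print(simplify_block([("x", "x", "+", "0"), ("x", "x", "*", "1"), ("x", "y", "**", "2")]))
# [('x', 'y', '*', 'y')]
```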
  • Flow graph
  • We can add flow-of-control information to the set
    of basic blocks making up a program by
    constructing a directed graph called a flow graph.
  • There is a directed edge from block B1 to block
    B2 if
  • there is a conditional or unconditional jump from
    the last statement of B1 to the first statement
    of B2, or
  • B2 immediately follows B1 in the order of the
    program, and B1 does not end in an unconditional
    jump.

  • A loop is a collection of nodes in a flow graph
    such that
  • All nodes in the collection are strongly
    connected, that is, from any node in the loop to
    any other, there is a path of length one or more,
    wholly within the loop, and
  • The collection of nodes has a unique entry, that
    is, a node in the loop such that the only way to
    reach a node of the loop from a node outside the
    loop is to first go through the entry.

The DAG representation of Basic Block
  • 1) A = 4 * i
  • 2) B = a[A]
  • 3) C = 4 * i
  • 4) D = b[C]
  • 5) E = B * D
  • 6) F = prod + E
  • 7) prod = F
  • 8) G = i + 1
  • 9) i = G
  • 10) if i <= 20 goto (1)



  • Given the code fragment below,
  • draw the dependency graph before and after
    common subexpression elimination.

x aa 2ab bb y aa 2ab bb
  • dependency graph before CSE

  • dependency graph after CSE



Better code generation requires greater context
  • Over expressions
  • optimal ordering of subtrees
  • Over basic blocks
  • Common subexpression elimination
  • Register tracking with last-use information
  • Over procedures
  • global register allocation, register coloring
  • Over the program
  • Interprocedural flow analysis

Basic blocks
  • Better code generation requires information about
    points of definition and points of use of
    variables.
  • In the presence of flow of control, the value of
    a variable can depend on multiple points in the
    program.
  • y = 12
  • x = y * 2 -- here x = 24
  • label1: ...
  • x = y * 2 -- 24? Can't tell, y may be different
  • A basic block is a single-entry, single-exit
    code fragment: values that are computed within a
    basic block have a single origin, which allows
    more constant folding and common subexpression
    elimination, and better register use.

Finding basic blocks
  • To partition a program into basic blocks
  • Call the first instruction (quadruple) in a basic
    block its leader
  • The first instruction in the program is a leader
  • Any instruction that is the target of a jump is a
    leader
  • Any instruction that follows a jump is a leader
  • In the presence of procedures with side-effects,
    every procedure call ends a basic block
  • A basic block includes the leader and all
    instructions that follow, up to but not including
    the next leader

Transformations on basic blocks
  • Common subexpression elimination: recognize
    redundant computations, replace them with a single
    temporary
  • Dead-code elimination: recognize computations not
    used subsequently, remove their quadruples
  • Interchange statements, for better scheduling
  • Renaming of temporaries, for better register
    usage
  • All of the above require symbolic execution of
    the basic block, to obtain definition/use
    information

Simple symbolic interpretation next-use
  • If x is computed in quadruple i, and is an
    operand of quadruple j, j > i, its value must be
    preserved (in a register or memory) until j.
  • If x is computed again at k, k > i, the value
    computed at i has no further use beyond k, and can
    be discarded (i.e., its register reused)
  • Next-use information is an annotation over the
    quadruples and the symbol table.
  • It is computed in one backward pass over the
    quadruples.

Computing next-use
  • Use the symbol table to annotate the status of
    variables
  • Each operand in a quadruple carries additional
    information:
  • Operand liveness (boolean)
  • Operand next use (later quadruple)
  • On exit from the block, all temporaries are dead
    (no next use)
  • For quadruple q: x = y op z
  • Record the next uses of x, y, z in the quadruple
  • Mark x dead (its previous value has no next use)
  • The next use of y is q and the next use of z is
    q; y and z are live

Register allocation over basic block tracking
  • Goal is to minimize use of registers and memory
  • Doubly linked data structure:
  • For each register, indicate its current contents
    (a set of variables): the register descriptor.
  • For each variable, indicate the location of its
    current value (memory and/or registers): the
    address descriptor.
  • Procedure getreg determines a good choice of
    register to hold the result of the next quadruple

Getreg heuristics
  • For quadruple x = y op z:
  • If y is in Ri, Ri contains no other variable, y
    is not live, and there is no next use of y, use
    Ri.
  • Else if there is an available register Rj, use it.
  • Else if there is a register Rk that holds only
    dead variables, use it.
  • Else if y is in Ri, Ri contains no other variable,
    and y is also in memory, use Ri.
  • Else find a register that holds a live variable,
    store that variable in memory (spill), and use
    the register.
  • Choose the variable whose next use is farthest
    away.
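A sketch of the getreg decision following the heuristic order above. It only chooses a register and reports any spill stores; the caller is assumed to emit loads and update the register and address descriptors afterwards. Assumptions: descriptors are plain dictionaries, "mem" marks an up-to-date memory copy, `next_use` maps each variable still live after the statement to the distance of its next use (absent means dead), and the third rule is read as "holds only dead variables". The signature is hypothetical.

```python
from typing import Dict, List, Set, Tuple

def getreg(y: str,
           regs: Dict[str, Set[str]],
           addr: Dict[str, Set[str]],
           next_use: Dict[str, int]) -> Tuple[str, List[str]]:
    """Choose a register for x in x = y op z.

    regs:     register descriptor, register -> variables it currently holds
    addr:     address descriptor, variable -> locations ("mem", "R0", ...)
    next_use: distance to the next use of every variable still live after
              this statement; a variable missing from it is dead
    Returns the register plus any spill stores to emit before using it.
    """
    def dead(v: str) -> bool:
        return v not in next_use

    # 1. y is alone in Ri and has no next use: reuse Ri.
    for r, held in regs.items():
        if held == {y} and dead(y):
            return r, []
    # 2. An empty register.
    for r, held in regs.items():
        if not held:
            return r, []
    # 3. A register holding only dead variables.
    for r, held in regs.items():
        if all(dead(v) for v in held):
            return r, []
    # 4. y is alone in Ri and memory also holds its current value.
    for r, held in regs.items():
        if held == {y} and "mem" in addr.get(y, set()):
            return r, []
    # 5. Spill the register whose nearest next use is farthest away.
    victim = max(regs, key=lambda r: min(next_use.get(v, float("inf"))
                                         for v in regs[r]))
    stores = [f"ST {v}, {victim}" for v in sorted(regs[victim])
              if not dead(v) and "mem" not in addr.get(v, set())]
    return victim, stores

regs = {"R0": {"a"}, "R1": {"b"}}
addr = {"a": {"R0"}, "b": {"R1"}}
print(getreg("c", regs, addr, next_use={"a": 1, "b": 5}))
# ('R1', ['ST b, R1'])
```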

Using getreg
  • For x = y op z:
  • Call getreg to obtain the target register R
  • Find the current location of y; generate a load
    into a register if it is in memory, and update
    the address descriptor for y
  • Ditto for z
  • Emit the instruction
  • Update the register descriptor for R, to indicate
    it holds x
  • Update the address descriptor for x to indicate it
    resides in R
  • For x = y:
  • A single load; the register descriptor indicates
    that both x and y are in R.
  • On block exit, store the registers that contain
    live variables.

Computing dependencies in a basic block the dag
  • Use a directed acyclic graph (dag) to recognize
    common subexpressions and remove redundant
    quadruples
  • Intermediate code optimization:
  • basic block -> dag -> improved block -> assembly
  • Leaves are labeled with identifiers and
    constants
  • Internal nodes are labeled with operators and
    identifiers

Dag construction
  • Forward pass over basic block
  • For x = y op z:
  • Find the node labeled y, or create one
  • Find the node labeled z, or create one
  • Create a new node for op, or find an existing one
    with descendants y, z (needs a hash scheme)
  • Add x to the list of labels for the new node
  • Remove label x from the node on which it appeared
  • For x = y:
  • Add x to the list of labels of the node that
    currently holds y

Example dot product
  • prod = 0
  • for j in 1 .. 20 loop
  • prod = prod + a(j) * b(j) -- assume 4-byte integers
  • end loop
  • Quadruples
  • prod = 0 -- basic block leader
  • j = 1
  • start: T1 = 4 * j -- basic block leader
  • T2 = a(T1)
  • T3 = 4 * j -- redundant
  • T4 = b(T3)
  • T5 = T2 * T4
  • T6 = prod + T5
  • prod = T6
  • T7 = j + 1
  • j = T7
  • if j <= 20 goto start

Dag for body of loop
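  • Common subexpression identified
  • (Figure: dag for the loop body; the node labeled
    T1, T3 shows the common subexpression 4 * j;
    other node labels include T6, prod and T7, j.)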

From dag to improved block
  • Any topological sort of the dag is a legal
    evaluation order
  • A node without a label is a dead value
  • Choose the label of a live variable over a
    temporary
  • start: T1 = 4 * j
  • T2 = a(T1)
  • T4 = b(T1)
  • T5 = T2 * T4
  • prod = prod + T5
  • j = j + 1
  • if j <= 20 goto start
  • Fewer quadruples, fewer temporaries

Programmers don't produce common subexpressions,
code generators do!
  • A, B: matrix (lo1 .. hi1, lo2 ..
    hi2) -- component size w bytes
  • A(j, k) is at location
  • $base_a + ((j - lo_1)(hi_2 - lo_2 + 1) + k - lo_2) \cdot w$
  • The following requires 19 quadruples
  • for k in lo .. hi loop
  • A(j, k) = 1 + B(j, k)
  • end loop
  • Can reduce to 11 with a dag
  • $base_a + (j - lo_1)(hi_2 - lo_2 + 1) \cdot w$ is loop
    invariant (loop optimization)
  • w is often a power of two (peephole optimization)
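A sketch of the two savings named above, applied to the row-major address base_a + ((j - lo1) * (hi2 - lo2 + 1) + k - lo2) * w. Assumptions: addresses are plain integers, the row of A(j, k) is fixed while k varies (as in the loop over k), and w is a power of two so the multiplication can become a shift. The function names are hypothetical.

```python
from typing import Iterable, List

def element_address(base: int, j: int, k: int,
                    lo1: int, lo2: int, hi2: int, w: int) -> int:
    """Row-major address of A(j, k) for A(lo1 .. hi1, lo2 .. hi2), w-byte components."""
    row_length = hi2 - lo2 + 1                   # components per row
    return base + ((j - lo1) * row_length + (k - lo2)) * w

def row_addresses(base: int, j: int, ks: Iterable[int],
                  lo1: int, lo2: int, hi2: int, w: int) -> List[int]:
    """Addresses of A(j, k) for each k, with the invariant part computed once."""
    if w <= 0 or w & (w - 1):
        raise ValueError("this sketch assumes w is a power of two")
    shift = w.bit_length() - 1                   # w == 1 << shift
    invariant = base + ((j - lo1) * (hi2 - lo2 + 1) - lo2) * w   # hoisted
    return [invariant + (k << shift) for k in ks]  # k * w as a shift

print(element_address(1000, 2, 3, lo1=1, lo2=1, hi2=10, w=8))  # 1096
```

The invariant part folds in the term -lo2 * w as well, so each iteration only adds a shifted k.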

Beyond basic blocks data flow analysis
  • Basic blocks are nodes in the flow graph
  • Can compute global properties of program as
    iterative algorithms on graph
  • Constant folding
  • Common subexpression elimination
  • Live-dead analysis
  • Loop invariant computations
  • Requires complex data structures and algorithms

Using global information register coloring
  • Optimal use of registers in a subprogram: keep all
    variables in registers throughout
  • To reuse registers, need to know lifetime of
    variable (set of instructions in program)
  • Two variables cannot be assigned the same
    register if their lifetimes overlap
  • Lifetime information is translated into
    interference graph
  • Each variable is a node in a graph
  • There is an edge between two nodes if the
    lifetimes of the corresponding variables overlap
  • Register assignment is equivalent to graph
    coloring
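A sketch of turning lifetimes into an interference graph. Assumptions: each lifetime is given as the set of instruction numbers at which the variable is live, and the lifetimes in the usage example are hypothetical values chosen for illustration.

```python
from itertools import combinations
from typing import Dict, Set

def interference_graph(lifetimes: Dict[str, Set[int]]) -> Dict[str, Set[str]]:
    """One node per variable; an edge joins variables whose lifetimes overlap."""
    graph: Dict[str, Set[str]] = {v: set() for v in lifetimes}
    for u, v in combinations(lifetimes, 2):
        if lifetimes[u] & lifetimes[v]:       # live at some common instruction
            graph[u].add(v)
            graph[v].add(u)
    return graph

# Hypothetical lifetimes, given as sets of instruction numbers.
lifetimes = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {5, 6}, "d": {4, 5}}
g = interference_graph(lifetimes)
print(sorted((u, v) for u in g for v in g[u] if u < v))
# [('a', 'b'), ('b', 'd'), ('c', 'd')]
```

Here a and c never overlap, so they could share a register, as could a and d.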

Graph coloring
  • Given a graph and a set of N colors, assign a
    color to each vertex so two vertices connected by
    an edge have different colors
  • Problem is NP-complete
  • A fast heuristic algorithm (Chaitin) is usually
    adequate
  • Any node with fewer than N neighbors is
    colorable, so it can be deleted from the graph.
    Start with the node with the smallest number of
    neighbors.
  • Iterate until the graph is empty, then assign
    colors in inverse order
  • If at any point every remaining node has more
    than N - 1 neighbors, we need to free a register
    (spill). We can then remove that node and
    continue.

  • Example interference graph (figure): nodes A to F
    with edges A-B, A-C, A-E, A-F, B-C, C-D, D-E, D-F
  • Order of removal: B, C, A, E, F, D
  • Assume 3 colors are available: assign colors in
    reverse order, constrained by already colored
    neighbors:
  • D (no constraint), F (D), E (D), A (F, E),
    C (D, A), B (A, C)
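A sketch of the simplify/select scheme on the example graph, whose edges (A-B, A-C, A-E, A-F, B-C, C-D, D-E, D-F) can be read off from the coloring constraints. Assumptions: ties between nodes with equally few neighbors are broken alphabetically, so the removal order can differ slightly from the one on the slide while still giving a valid 3-coloring, and when every node has N or more neighbors this sketch simply spills the node with the most neighbors (a stand-in for the usage-based choice discussed under spilling). `chaitin_color` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

def chaitin_color(graph: Dict[str, Set[str]],
                  n_colors: int) -> Tuple[Dict[str, int], List[str]]:
    """Simplify/select graph coloring; returns (color per node, spilled nodes)."""
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    stack: List[str] = []
    spilled: List[str] = []
    while remaining:
        # Simplify: take the node with the fewest remaining neighbors.
        node = min(remaining, key=lambda v: (len(remaining[v]), v))
        if len(remaining[node]) < n_colors:
            stack.append(node)              # guaranteed to get a color later
        else:
            # Every node has >= n_colors neighbors: spill one. This sketch
            # simply picks the node with the most neighbors.
            node = max(remaining, key=lambda v: (len(remaining[v]), v))
            spilled.append(node)
        for nbr in remaining.pop(node):
            remaining[nbr].discard(node)
    # Select: color nodes in the reverse of the order they were removed.
    colors: Dict[str, int] = {}
    for node in reversed(stack):
        taken = {colors[n] for n in graph[node] if n in colors}
        colors[node] = min(c for c in range(n_colors) if c not in taken)
    return colors, spilled

edges = [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"),
         ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]
graph: Dict[str, Set[str]] = {v: set() for v in "ABCDEF"}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)
colors, spilled = chaitin_color(graph, 3)
print(colors, spilled)
```

With only 2 colors the triangle A-B-C cannot be colored, so the same call reports a spill.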

Better approach to spilling
  • Compute the required number of colors (R) in a
    second pass
  • With N registers, we need to place R - N variables
    in memory
  • Spill the variables with the lowest usage count.
  • Use the loop structure to estimate usage.