Title: Compiler Design Chapter 7
1. Compiler Design - Chapter 7
Translation to Intermediate Code
2. Intermediate representation (IR)
- Abstract syntax tree
- Closer to source code (language)
- Intermediate representation (IR)
- Independent of the details of source language
- Closer to machine code
- Expresses the machine operations without too much machine-specific detail
- IR is easier to apply optimizations to
- IR is simpler than real machine code
- Separation of front-end and back-end
- Figure 7.1
3. Intermediate Representation (IR) Trees
- To do
- translate abstract syntax tree to an intermediate tree
- First
- define semantics of the intermediate tree
- Figure 7.2
- IR trees for MiniJava
4. IR Trees: Expressions (Tree.Exp)
- CONST(i): integer constant i
- NAME(n): symbolic constant n
- TEMP(t): temporary t, a register
- MEM(m): contents of a word of memory starting at address m
5. IR Trees: Expressions (Tree.Exp)
- BINOP(op, e1, e2): binary operator op applied to e1 and e2 (e1 op e2)
- CALL(f, (e1, ..., en)): procedure call; evaluate f, then the arguments in order
- ESEQ(s, e): evaluate s for side effects, then e for the result
6. IR Trees: Statements (Tree.Stm)
- MOVE(TEMP t, e): evaluate e, then move the result to temporary t
- MOVE(MEM(e1), e2): evaluate e1 giving address a, then evaluate e2 and move the result to address a
- EXP(e): evaluate e, then discard the result
7. IR Trees: Statements
- JUMP(e, (l1, ..., ln)): transfer control to address e; labels l1..ln are the possible values for e
- CJUMP(op, e1, e2, t, f): evaluate e1, then e2; compare the results using relational operator op; jump to t if true, f if false
- SEQ(s1, s2): the statement s1 followed by statement s2
- LABEL(n): define the constant value of name n as the current code address; NAME(n) can be used as the target of jumps, calls, etc.
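The node kinds listed on the expression and statement slides can be sketched as plain data classes. This is only an illustration in Python, not the MiniJava compiler's Java classes; the field names (value, addr, dst, if_true, and so on) are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# --- Expressions (Tree.Exp) ---
@dataclass(frozen=True)
class CONST:
    value: int

@dataclass(frozen=True)
class NAME:
    label: str

@dataclass(frozen=True)
class TEMP:
    name: str

@dataclass(frozen=True)
class BINOP:
    op: str
    left: "Exp"
    right: "Exp"

@dataclass(frozen=True)
class MEM:
    addr: "Exp"

@dataclass(frozen=True)
class CALL:
    func: "Exp"
    args: Tuple["Exp", ...]

@dataclass(frozen=True)
class ESEQ:
    stm: "Stm"
    exp: "Exp"

# --- Statements (Tree.Stm) ---
@dataclass(frozen=True)
class MOVE:
    dst: "Exp"  # a TEMP or a MEM node
    src: "Exp"

@dataclass(frozen=True)
class EXP:
    exp: "Exp"

@dataclass(frozen=True)
class JUMP:
    target: "Exp"
    labels: Tuple[str, ...]

@dataclass(frozen=True)
class CJUMP:
    op: str
    left: "Exp"
    right: "Exp"
    if_true: str
    if_false: str

@dataclass(frozen=True)
class SEQ:
    first: "Stm"
    second: "Stm"

@dataclass(frozen=True)
class LABEL:
    name: str

Exp = Union[CONST, NAME, TEMP, BINOP, MEM, CALL, ESEQ]
Stm = Union[MOVE, EXP, JUMP, CJUMP, SEQ, LABEL]

# Example: store 1 into the word at address fp + 8, then define a label.
store = MOVE(MEM(BINOP("PLUS", TEMP("fp"), CONST(8))), CONST(1))
program = SEQ(store, LABEL("done"))
```

Frozen data classes compare structurally, which makes small trees like `program` easy to inspect and compare.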
8. Kinds of Expressions
- Translation among
- Expression, statement, condition
- Examples
- a3(b3)
- In C, assignment can be used as an expression
- a = (a > b)
- if (a/2) ... else ...
9. Translate.Exp
- Expression kinds indicate how an expression might be used
- Ex(exp): expressions that compute a value
- Nx(stm): statements, i.e. expressions that compute no value
- Cx: conditionals (jump to true and false destinations)
10. Kinds of Expressions
- Conversion operators allow use of one form in the context of another
- unEx: convert to a tree expression that computes the value of the inner tree
- unNx: convert to a tree statement that computes the inner tree but returns no value
- unCx(t, f): convert to a statement that evaluates the inner tree and branches to the true destination if non-zero, the false destination otherwise
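The conversion operators can be sketched roughly as follows. This is a minimal Python illustration rather than the compiler's Java code: trees are plain tuples, and two choices are assumptions made here, namely that Ex.unCx tests the value against zero with an NE comparison and that Nx.unEx yields ESEQ(s, CONST 0).

```python
class Ex:
    """Wraps a tree expression that computes a value."""

    def __init__(self, exp):
        self.exp = exp

    def unEx(self):
        return self.exp

    def unNx(self):
        # Evaluate the expression and discard its result.
        return ("EXP", self.exp)

    def unCx(self, t, f):
        # Branch to t when the value is non-zero, to f otherwise.
        return ("CJUMP", "NE", self.exp, ("CONST", 0), t, f)


class Nx:
    """Wraps a tree statement that computes no value."""

    def __init__(self, stm):
        self.stm = stm

    def unNx(self):
        return self.stm

    def unEx(self):
        # Assumption: run the statement, then produce a dummy value 0.
        return ("ESEQ", self.stm, ("CONST", 0))

    def unCx(self, t, f):
        # A statement has no value to branch on.
        raise TypeError("Nx cannot be used as a conditional")


e = Ex(("TEMP", "x"))
as_statement = e.unNx()
as_condition = e.unCx("t", "f")
```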
11. Example
12. Ex Class
- To translate an ordinary expression to a statement and a conditional
- EXP(e): evaluate e, then discard the result
13. Cx Class
- To translate a conditional to an expression, statement, and conditional
14. unEx() method of Cx Class
- Convert a conditional into a value expression (Program 7.3)
15. Simple Variables
- Simple variable v in the current procedure's stack frame later becomes MEM(+(TEMP fp, CONST k)), where k is the offset of v within the frame
16. Simple Variables
- Declaration of a variable is translated into [figure: translated tree], assuming Frame is a class holding all machine-related information
- FP() is the frame pointer; wordsize() is the size of a word (used by all MiniJava variables)
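A frame-resident variable access of the form MEM(+(TEMP fp, CONST k)) can be sketched as below. The Frame class here is a hypothetical stand-in, not the real machine-dependent class: only FP() and wordsize() come from the slides, while alloc_local and the downward-growing layout of locals are assumptions for illustration.

```python
class Frame:
    """Hypothetical stand-in for the machine-dependent Frame class."""

    def __init__(self, word_size=4):
        self._word_size = word_size
        self._locals = 0

    def FP(self):
        # The frame pointer, modelled as a named temporary.
        return ("TEMP", "fp")

    def wordsize(self):
        return self._word_size

    def alloc_local(self):
        """Reserve one word and return its offset k from fp.

        Assumption: locals are laid out at negative offsets from fp.
        """
        self._locals += 1
        return -self._locals * self._word_size


def simple_var(frame, k):
    """Build MEM(+(TEMP fp, CONST k)) for a variable at offset k."""
    return ("MEM", ("BINOP", "PLUS", frame.FP(), ("CONST", k)))


frame = Frame()
k1 = frame.alloc_local()
k2 = frame.alloc_local()
access = simple_var(frame, k2)
```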
17. Array Variables
- In Pascal, an array variable stands for its contents.
- var a, b : array[1..12] of integer
- begin
- a := b // copies the contents of b into a
- end
- In C, an array variable stands for a pointer.
- illegal: int a[12], b[12]; b = a;
- legal: int a[12], *b; b = a;
18. Array Variables
- In MiniJava, array variables behave like pointers
- int[] a, b;
- a = new int[12];
- b = new int[12];
- a = b; // legal, but the original 12 zeros allocated for a are discarded
- MiniJava objects are also pointers
- declaration is translated into memory allocation (for addresses)
- assignment just moves the address value
19. Structured L-Values (I)
- An l-value is the result of an expression that can occur on the left of an assignment statement
- x, p.y, a[i+2]
- An r-value is one that can only appear on the right of an assignment
- a+3, f(x)
- An l-value denotes a location, but an r-value does not
20. Structured L-Values (II)
- Scalar
- only occupies one word
- can be held in a register
- integer or pointer (in MiniJava)
- All variables and l-values are scalar in MiniJava
- It is different in C or Pascal
- A struct in C can be an l-value (not scalar)
- An array or record in Pascal can be an l-value (not scalar)
- MEM() node needs to know the size information (not only the address)
21. Subscripting and Field Selection
- a[i] is at address (i - l) × s + a, where l is the lower bound of the index range, s is the size of the array elements, and a is the base address
- a - s × l can be calculated at compile time
- if a is global with a compile-time constant address
- a.f: select field f of a record a
- add the constant field offset of f to the address a
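The subscript arithmetic can be checked with a small worked example. The sketch below computes the element address both directly, as (i - l) × s + a, and in the rearranged form i × s + (a - s × l), where the parenthesized part is the piece that can be folded at compile time when a is a constant. The function names and the concrete numbers are made up for illustration.

```python
def element_address(a, i, l, s):
    """Address of element i: (i - l) * s + a."""
    return (i - l) * s + a


def element_address_folded(a, i, l, s):
    """Same address, with a - s * l hoisted out as a constant."""
    base = a - s * l  # computable at compile time if a is constant
    return i * s + base


# Array with index range starting at 1, 4-byte elements, base address 1000.
direct = element_address(a=1000, i=5, l=1, s=4)
folded = element_address_folded(a=1000, i=5, l=1, s=4)
```

With these numbers both forms give 1016, and the folded form needs only one multiplication and one addition at run time.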
22. Subscripting and Field Selection
- a[i] is translated as MEM(+(MEM(e), BINOP(MUL, i, CONST w))), where e is a pointer, p = MEM(e) represents the address of a, and CONST(w) is the word size (p + i × w)
- In an IR tree, MEM() means both store (when used as the left child of MOVE) and fetch (when used elsewhere)
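23. Arithmetic
- The Tree language does not have unary arithmetic operators
- -a is translated as 0 - a
- not a is translated as 1 XOR a
- Tree does not support floating-point numbers well (0 - a gives the wrong sign when a is zero, since negating zero should give negative zero)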
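The two rewrites for unary operators, and the floating-point caveat, can be illustrated in a few lines. This is a sketch with trees as plain tuples; the helper names negate and logical_not are made up here, and the float check uses Python's own IEEE 754 doubles rather than anything from the Tree language.

```python
import math


def negate(a):
    """-a as a BINOP: 0 - a."""
    return ("BINOP", "MINUS", ("CONST", 0), a)


def logical_not(a):
    """not a as a BINOP: 1 XOR a (a is assumed to be 0 or 1)."""
    return ("BINOP", "XOR", ("CONST", 1), a)


# Integer behaviour matches the rewrites.
int_negation_ok = all(0 - n == -n for n in range(-5, 6))
not_table = [1 ^ 0, 1 ^ 1]  # not false, not true

# Floating point: subtracting zero from zero does not produce negative zero.
sign_of_subtraction = math.copysign(1.0, 0.0 - 0.0)
sign_of_negation = math.copysign(1.0, -(0.0))
```

The last two values differ in sign, which is why subtraction from zero is not a faithful floating-point negation.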
24. Comparisons
- Translate a cop b as
- RelCx(cop, a.unEx, b.unEx)
- When used as a conditional unCx(t,f) yields
- CJUMP(cop, a.unEx, b.unEx, t, f)
- where t and f are labels.
- When used as a value unEx yields
- ESEQ(SEQ(MOVE(TEMP r, CONST 1),
- SEQ(unCx(t, f),
- SEQ(LABEL f,
- SEQ(MOVE(TEMP r, CONST 0), LABEL t)))),
- TEMP r)
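The two uses of a comparison, as a conditional and as a value, can be sketched as a small class. This is an illustrative Python version with tuple trees, not the compiler's Java RelCx; the helpers new_temp and new_label and their naming scheme are assumptions.

```python
import itertools

_counter = itertools.count()


def new_temp():
    """Fresh temporary (hypothetical naming scheme)."""
    return ("TEMP", f"r{next(_counter)}")


def new_label():
    """Fresh label name (hypothetical naming scheme)."""
    return f"L{next(_counter)}"


class RelCx:
    """A comparison 'left op right' viewed as a conditional."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def unCx(self, t, f):
        return ("CJUMP", self.op, self.left, self.right, t, f)

    def unEx(self):
        # r starts at 1; the false branch overwrites it with 0.
        r, t, f = new_temp(), new_label(), new_label()
        return ("ESEQ",
                ("SEQ", ("MOVE", r, ("CONST", 1)),
                 ("SEQ", self.unCx(t, f),
                  ("SEQ", ("LABEL", f),
                   ("SEQ", ("MOVE", r, ("CONST", 0)),
                    ("LABEL", t))))),
                r)


cmp_ab = RelCx("LT", ("TEMP", "a"), ("TEMP", "b"))
as_jump = cmp_ab.unCx("t", "f")
as_value = cmp_ab.unEx()
```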
25. Conditionals (I)
- The short-circuiting Boolean operators have already been transformed into if-expressions in MiniJava abstract syntax
- e.g., x < 5 && a > b turns into if x < 5 then a > b else 0
- Translate if e1 then e2 else e3 into
- IfThenElseExp(e1, e2, e3)
- When used as a value, unEx yields
- ESEQ(SEQ(SEQ(e1.unCx(t, f),
- SEQ(SEQ(LABEL t, SEQ(MOVE(TEMP r, e2.unEx), JUMP join)),
- SEQ(LABEL f, SEQ(MOVE(TEMP r, e3.unEx), JUMP join)))),
- LABEL join), TEMP r)
26. Conditionals (II)
- As a conditional unCx(t,f) yields
- SEQ(e1.unCx(tt,ff),
- SEQ(SEQ(LABEL tt, e2.unCx(t, f)),
- SEQ(LABEL ff, e3.unCx(t, f))))
27. Conditionals (III)
- Create a new subclass for if statements
28. Conditionals (IV)
- Applying unCx(t, f) to if x < 5 then a > b else 0
- SEQ(CJUMP(LT, x.unEx, CONST 5, tt, ff),
- SEQ(SEQ(LABEL tt, CJUMP(GT, a.unEx, b.unEx, t, f)),
- SEQ(LABEL ff, JUMP f)))
- or more optimally
- SEQ(CJUMP(LT, x.unEx, CONST 5, tt, f),
- SEQ(LABEL tt, CJUMP(GT, a.unEx, b.unEx, t, f)))
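To see that the shorter tree behaves like the longer one, both can be written as tuples and run through a tiny interpreter. Everything below is a sketch under assumptions: the tuple encoding, the helper names flatten, value, and run, and the simplification of writing JUMP with a bare label are all made up for this illustration.

```python
def flatten(stm):
    """Linearize nested SEQ nodes into a flat list of statements."""
    if stm[0] == "SEQ":
        return flatten(stm[1]) + flatten(stm[2])
    return [stm]


def value(exp, env):
    """Evaluate a CONST or TEMP leaf."""
    return exp[1] if exp[0] == "CONST" else env[exp[1]]


OPS = {"LT": lambda p, q: p < q, "GT": lambda p, q: p > q}


def run(stm, env):
    """Execute jumps until control leaves the tree; return the exit label."""
    code = flatten(stm)
    where = {s[1]: i for i, s in enumerate(code) if s[0] == "LABEL"}
    pc = 0
    while True:
        s = code[pc]
        if s[0] == "CJUMP":
            _, op, left, right, t, f = s
            target = t if OPS[op](value(left, env), value(right, env)) else f
        elif s[0] == "JUMP":
            target = s[1]
        else:  # LABEL: fall through to the next statement
            pc += 1
            continue
        if target not in where:
            return target  # jumped to an outside label such as t or f
        pc = where[target]


x, a, b = ("TEMP", "x"), ("TEMP", "a"), ("TEMP", "b")
plain = ("SEQ", ("CJUMP", "LT", x, ("CONST", 5), "tt", "ff"),
         ("SEQ", ("SEQ", ("LABEL", "tt"), ("CJUMP", "GT", a, b, "t", "f")),
          ("SEQ", ("LABEL", "ff"), ("JUMP", "f"))))
better = ("SEQ", ("CJUMP", "LT", x, ("CONST", 5), "tt", "f"),
          ("SEQ", ("LABEL", "tt"), ("CJUMP", "GT", a, b, "t", "f")))

outcome = run(better, {"x": 3, "a": 2, "b": 1})
```

The second form saves one label and one jump because the false branch of the first test goes straight to f.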
29. Strings
- Translation of string literals
- Make a label (at the declaration place)
- Then put a fragment into a global list
- Frame.string(label, literal)
- Anytime we need to refer to the literal, use the label
- All string functions are provided as runtime functions (return an address for the return value)
- We don't worry about string literals or functions, as there are no such things in MiniJava
30. Record and Array Creation
- A bunch of values referenced by a single address (e.g. an object via object construction, or a String literal)
- Need to allocate space on a heap, not on the stack (since they may outlive the procedure in which they are created)
- We assume we have a function malloc to allocate heap memory for us
- We don't deal with the garbage collection issue
- Figure 7.4
31. Record and Array Creation
32. Allocate for Array
- Determine how much space is needed
- (length of the array + 1) × (size of integer)
- Use first cell to store the length
- Generate code that calls malloc to get heap memory (return address r)
- Generate code to save length at offset 0
- Generate code to initialize elements to zero starting at offset 4
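The allocation steps can be simulated directly to check the layout. This is a sketch of what the generated code would do at run time, not the code generator itself: the Heap class, its bump-pointer malloc, the starting address 1000, and the 4-byte word size are all assumptions for illustration.

```python
WORD = 4  # assumed size of an integer in bytes


class Heap:
    """Toy heap: a bump allocator over a dictionary of words."""

    def __init__(self, start=1000):
        self.words = {}
        self.next_free = start

    def malloc(self, nbytes):
        addr = self.next_free
        self.next_free += nbytes
        return addr


def new_int_array(heap, length):
    """Allocate an int array with its length stored in the first cell."""
    nbytes = (length + 1) * WORD      # one extra word for the length
    r = heap.malloc(nbytes)           # address returned by malloc
    heap.words[r] = length            # save length at offset 0
    for offset in range(WORD, nbytes, WORD):
        heap.words[r + offset] = 0    # zero the elements from offset 4
    return r


heap = Heap()
r = new_int_array(heap, 3)
elements = [heap.words[r + WORD * k] for k in range(1, 4)]
```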
33. While Loops
- while c do s
- evaluate c
- if false, jump to next statement after loop
- if true, fall into loop body
- branch to top of loop
- e.g.,
- test:
- if not(c) jump done
- s
- jump test
- done:
34. While Loops
- The tree produced is
- Nx(SEQ(SEQ(SEQ(LABEL test, c.unCx( body, done)),
- SEQ(SEQ(LABEL body, s.unNx), JUMP(NAME test))),
- LABEL done))
- Deal with break
- simply a JUMP to done
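The loop tree and the treatment of break can be sketched as two small builder functions. This is an illustrative Python version with tuple trees; the function names and the convention of passing the condition as a callable taking (true label, false label) are assumptions, and the Nx wrapper is left out.

```python
def while_loop(cond_unCx, body_stm, test="test", body="body", done="done"):
    """Build the statement tree for 'while c do s'.

    cond_unCx(t, f) must return the conditional jump for c.
    """
    return ("SEQ",
            ("SEQ",
             ("SEQ", ("LABEL", test), cond_unCx(body, done)),
             ("SEQ", ("SEQ", ("LABEL", body), body_stm),
              ("JUMP", ("NAME", test)))),
            ("LABEL", done))


def break_stm(done="done"):
    """A break is simply a JUMP to the enclosing loop's done label."""
    return ("JUMP", ("NAME", done))


def cond(t, f):
    # Example condition: i < 10
    return ("CJUMP", "LT", ("TEMP", "i"), ("CONST", 10), t, f)


loop = while_loop(cond, break_stm())
```

In a real translator the done label of the innermost loop would have to be threaded through to wherever break is translated; here it is just a default argument.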
35. For loops
i = lo;
limit = hi;
while (i < limit) {
  // body
  i++;
}
36. Function Call
- Translate p.m(a1, ..., an) to CALL(NAME lcm, [p, e1, ..., en]), where lcm (a label) is method m of class c, and p is the object
37. Declarations
- Variable declaration
- Additional space reserved in the frame
- Function declaration
- A new fragment of Tree code will be kept for the
function body
38. Variable Definition
- Declaration: add a temp to the frame
- Remember the temp in our symbol table
- When using it, use access to get the offset
- Initialization to zero happens before the
function body
39. Function Definition
- Prologue
- Beginning of function (assembly language)
- A label for the function
- Adjust the stack pointer to get a new frame
- Allocate for formals
- Callee/caller-save register allocation
- Body
- Epilogue
- Move return value
- Restore registers
- Return to caller
- Assembly language stuff
40. Fragments
- A function will have
- A frame
- A body
- Abstract them into a fragment to be translated
into assembly language code
41. Classes and Objects
- Objects are like records
- Allocate heap space for them
- Use symbol table to remember
- Location of symbol
- Determine which method to use
- Class declarations are like record declarations
- Methods are like functions
- Frame body
- this pointer