Title: Intermediate Representations (IRs, Chapter 4)
1Intermediate Representations (IRs, Chapter 4)
- Mooly Sagiv
- Schreiber 313
- 03-640-7606
- http://www.math.tau.ac.il/sagiv/courses/acd.html
2Outline
- Issues in IR design
- High-Level IRs
- Medium-Level IRs
- Low-Level IRs
- Multi-Level IRs
- MIR, HIR, and LIR
- ICAN Representations
- Other IRs
- Conclusions
3Compiler Structure
String of characters -> Scanner -> tokens -> Parser -> AST -> Semantic analyzer -> IR -> Code Generator -> Object code
Shared by the phases: Symbol table and access routines, OS Interface
4Intermediate Language Selection
- Low vs. high level control flow structures
- Flat vs. hierarchical (tree)
- Machine vs. high level instructions
- (Symbolic) registers vs. stack
- Normal forms (SSA)
- Intermediate forms: Control Flow Graph, Call Graph, Program Dependence Graph
- Issues: engineering, efficiency, portability, optimization level, taste
5IRs in the Book
LIR:
    s2 ← s1
    s4 ← s3
    s6 ← s5
L1: if s2 > s6 goto L2
    s7 ← addr a
    s8 ← 4 * s9
    s10 ← s7 + s8
    [s10] ← 2
    s2 ← s2 + s4
    goto L1
L2:
MIR:
    v ← v1
    t2 ← v2
    t3 ← v3
L1: if v > t3 goto L2
    t4 ← addr a
    t5 ← 4 * i
    t6 ← t4 + t5
    *t6 ← 2
    v ← v + t2
    goto L1
L2:
HIR:
    for v ← v1 by v2 to v3 do
        a[i] ← 2
    endfor
6Issues in IR Design
- Portability
- Optimization level
- Complexity of the compiler
- Reuse of legacy compiler parts
- Compilation cost
- Multiple vs. single IR level
- Compiler maintenance
7Example: MIPS Compiler
UCODE (stack-based IR) -> load/store-based architecture
8Example: MIPS Compiler
UCODE (stack-based IR) -> medium-level IR -> load/store-based architecture
9Example: PA-RISC (HP-RISC)
UCODE (stack-based IR) -> load/store-based architecture
10Example: PA-RISC (HP-RISC)
UCODE (stack-based IR) -> translator -> very low IR (SLLIC) -> optimizer -> very low IR (SLLIC) -> code generator -> load/store-based architecture
11Why do we need multiple representations?
- Lower representations expose more computations
- More effective standard optimizations
- Examples: strength reduction, loop invariants, ...
- Higher representations provide more non-determinism
- More effective parallelization (reordering)
- Data cache optimizations
12Example Arrays
LIR:
    r1 ← [fp-4]
    r2 ← r1 + 2
    r3 ← [fp-8]
    r4 ← r3 * 20
    r5 ← r2 + r4
    r6 ← 4 * r5
    r7 ← fp - 216
    f7 ← [r7+r6]
MIR:
    t1 ← j + 2
    t2 ← i * 20
    t3 ← t1 + t2
    t4 ← 4 * t3
    t5 ← addr a
    t6 ← t5 + t4
    t7 ← *t6
C-code:
    float a[20][10];
    ...
    ... a[i][j+2] ...
    Address: addr(a) + 4 * (i * 20 + j + 2)
HIR:
    t ← a[i, j+2]
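The MIR sequence on this slide computes the element address one temporary at a time. As a rough illustration, the Python sketch below mirrors those steps. The function name `element_address` and its parameters are invented for this example, and the multiplier 20 and the 4-byte element size are simply copied from the slide (they are exposed as parameters because the right stride depends on the array layout).

```python
def element_address(base, i, j, row_stride=20, elem_size=4):
    """Follow the slide's MIR steps for a[i][j+2], one temporary per step.

    base stands in for 'addr a'. row_stride and elem_size default to the
    constants that appear on the slide; both are assumptions of this sketch.
    """
    t1 = j + 2              # t1 <- j + 2
    t2 = i * row_stride     # t2 <- i * 20
    t3 = t1 + t2            # t3 <- t1 + t2
    t4 = elem_size * t3     # t4 <- 4 * t3
    t5 = base               # t5 <- addr a
    t6 = t5 + t4            # t6 <- t5 + t4
    return t6               # the load t7 <- *t6 would read from this address
```

Each Python assignment corresponds to one MIR instruction, which is the sense in which the lower-level form exposes more computations to the optimizer than the single HIR array reference.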
13External Representation
- Internal IR representation is used in the compiler
- External representation is needed for
- Compiler debugging
- Cross-module integration
- Design issues
- Representing pointers
- Unique representation of temporaries
- Compaction
15Abstract Syntax Trees
- Compact source representation
- No punctuation symbols
- Tree defines hierarchy
- Used for Front-Ends
- Sometimes include symbol table pointers
- Can be translated into HIR
- Can also be used for compaction
16Example AST
C-CODE:
    int f(int a, int b)
    { int c;
      c = a + 2;
      print(c);
    }
[Figure: AST for f. Node kinds shown: function, paramlist, body, declist, stmtList, call, arglist, ident, const, end]
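To make the tree structure concrete, here is a minimal Python sketch of an AST for the function f on this slide. The `Node` class, the `assign` and `add` node kinds, and the use of plain Python lists instead of end-terminated lists are assumptions made for the illustration, not the representation used in the book.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # node kind, e.g. "function", "ident", "const"
    value: object = None           # identifier name or constant value, if any
    children: list = field(default_factory=list)

def ident(name):
    return Node("ident", name)

def build_f_ast():
    """Hand-built AST for: int f(int a, int b) { int c; c = a + 2; print(c); }"""
    assign = Node("assign", children=[
        ident("c"),
        Node("add", children=[ident("a"), Node("const", 2)]),
    ])
    call = Node("call", children=[
        ident("print"),
        Node("arglist", children=[ident("c")]),
    ])
    body = Node("body", children=[
        Node("declist", children=[ident("c")]),
        Node("stmtList", children=[assign, call]),
    ])
    params = Node("paramlist", children=[ident("a"), ident("b")])
    return Node("function", "f", [params, body])

def preorder(node):
    """Yield the root first, then each subtree from left to right."""
    yield node
    for child in node.children:
        yield from preorder(child)
```

Walking the result with `preorder` visits the punctuation-free hierarchy only: no braces, semicolons or parentheses survive in the tree.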
17Other HIRs
- Lambda expressions
- Normal linear forms
- Preserve control flow structures and arrays
- Simplified control flow structures
- Eliminate GOTOs
- Continuations
19Medium Level IR
- Source and target language independent
- Machine independent representation for program variables and temporaries
- Simplified control flow constructs
- Portable
- Sufficient in many optimizing compilers: MIR, Sun-IR
21Low Level IR
- One-to-one correspondence with machine instructions
- Deviations from the machine
- Alternative code, e.g., MULTIPLY
- Addressing modes
- Side effects?
- Instruction selection in the last phase
- Appropriate compiler data structure can hide
dependence
22Side Effect Operations(PA-RISC)
MIR:
L1: t2 ← *t1
    t1 ← t1 + 4
    ...
    t3 ← t3 + 1
    t5 ← t3 < t4
    if t5 goto L1
PA-RISC (Option 1):
    LDWM 4(0, r2), r3
    ...
    ADDI 1, r4, r4
    COMB,< r4, r5, L1
PA-RISC (Option 2):
    LDWX r2(0, r1), r3
    ...
    ADDIB,< 4, r2, r5, L1
23LIR in Tiger
/* assem.h */
typedef enum { I_OPER, I_LABEL, I_MOVE } AS_instr_kind;

struct AS_instr_ {
    AS_instr_kind kind;
    union {
        struct { string assem; Temp_tempList dst, src; AS_targets jumps; } OPER;
        struct { string assem; Temp_label label; } LABEL;
        struct { string assem; Temp_tempList dst, src; } MOVE;
    } u;
};
25Multi-Level Intermediate Representations
- Multiple representations in the same language
- Compromise between computation exposure and high level description
- SUN-IR: arrays can be represented with multiple subscripts
- SLLIC: MULTIPLY and DIVIDE operations
27XBNF for MIR
28XBNF for Receive Instruction
29XBNF for Assignments
30XBNF for Control Flow Instructions
31XBNF for Call/Return Instruction
32XBNF for Sequence(Volatile Instructions)
33XBNF for Constants
34 XBNF for Identifiers
35Example
C-code:
    void make_node(p, n)
        struct node *p;
        int n;
    {   struct node *q;
        q = malloc(sizeof(struct node));
        q->next = nil;
        q->value = n;
        p->next = q;
    }
MIR:
make_node:
    begin
    receive p(val)
    receive n(val)
    q ← call malloc, (8, int)
    q.next ← nil
    q.value ← n
    p.next ← q
    return
    end
36insert_node:
    begin
    receive n(val)
    receive l(val)
    t1 ← l.value
    if n <= t1 goto L1
    t2 ← l.next
    if t2 != nil goto L2
    call make_node, (l, type1; n, int)
    return
L2: t4 ← l.next
    call insert_node, (n, int; t4, type1)
    return
L1: return
    end
C-code:
    void insert_node(n, l)
        int n;
        struct node *l;
    {   if (n > l->value)
            if (l->next == nil) make_node(l, n);
            else insert_node(n, l->next);
    }
37MIR Issues
- A MIN instruction does not usually exist on the target machine
- Both value and location computation for Boolean conditions
MIR:
    t1 ← t2 min t3
PA-RISC:
    MOVE r2, r1
    COM,> r3, r2
    MOVE r3, r1
MIR (Boolean conditions):
    t3 ← t1 < t2
    if t3 goto L1
  or
    if t1 < t2 goto L1
38HIR
- Obtained from MIR
- Extra constructs
- Array references
- High level constructs
39XBNF for HIR
40MIR
    v ← opd1
    t2 ← opd2
    t3 ← opd3
    if t2 > 0 goto L2
L1: if v < t3 goto L3
    instructions
    v ← v + t2
    goto L1
L2: if v > t3 goto L3
    instructions
    v ← v + t2
    goto L2
L3:
HIR
    for v ← opd1 by opd2 to opd3
        instructions
    endfor
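The expansion of the HIR for loop into MIR on this slide is mechanical, so it can be sketched as a small generator of MIR text. The function name `lower_for` and the string-based instruction format are assumptions made for this illustration; the temporaries t2 and t3 and the labels L1 to L3 are fixed as on the slide, whereas a real compiler would generate fresh names. The arrow is written `<-` to keep the code ASCII-only.

```python
def lower_for(v, opd1, opd2, opd3, body):
    """Expand 'for v <- opd1 by opd2 to opd3 ... endfor' into MIR text lines.

    body is a list of already-lowered MIR lines. It is emitted twice, once
    for a non-positive step (loop at L1) and once for a positive step (L2).
    """
    step = f"{v} <- {v} + t2"
    return (
        [f"{v} <- {opd1}",
         f"t2 <- {opd2}",
         f"t3 <- {opd3}",
         "if t2 > 0 goto L2",
         f"L1: if {v} < t3 goto L3"]
        + body
        + [step, "goto L1",
           f"L2: if {v} > t3 goto L3"]
        + body
        + [step, "goto L2", "L3:"]
    )
```

For example, `lower_for("v", "v1", "v2", "v3", ["*t6 <- 2"])` returns 13 lines with the loop body duplicated, which shows why the HIR form is the more compact one when the sign of the step is not known at compile time.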
41insert_node:
    begin
    receive n(val)
    receive l(val)
    t1 ← l.value
    if n > t1 then
        t2 ← l.next
        if t2 = nil then
            call make_node, (l, type1; n, int)
            return
        else
            t4 ← l.next
            call insert_node, (n, int; t4, type1)
            return
        fi
    fi
    end
C-code:
    void insert_node(n, l)
        int n;
        struct node *l;
    {   if (n > l->value)
            if (l->next == nil) make_node(l, n);
            else insert_node(n, l->next);
    }
42LIR
- Obtained from MIR
- Extra features
- Low level addressing
- Load/Store
- Eliminated constructs
- Variables
- Selectors
- Parameters
43XBNF for LIR
44XBNF for LIR (Contd.)
45insert_node:
    begin
    s800 ← s1
    s801 ← s2
    s802 ← [s801+0]
    if s800 <= s802 goto L1
    s803 ← [s801+4]
    if s803 != nil goto L2
    s1 ← s801
    s2 ← s800
    call make_node, ra
    return
L2: s1 ← s800
    s2 ← [s801+4]
    call insert_node, ra
    return
L1: return
    end
C-code:
    void insert_node(n, l)
        int n;
        struct node *l;
    {   if (n > l->value)
            if (l->next == nil) make_node(l, n);
            else insert_node(n, l->next);
    }
47Representing MIR in ICAN
- An MIR program can be (internally) represented as an abstract syntax tree
- The general construction
- A (union) type for every non-terminal
- An enumerated type kind for every production
- A tuple for every production
- Other ideas
- Flatten the hierarchy in some cases
- Use functions to abstract MIR properties (simplifies semantic manipulations)
48ICAN Tuples for MIR Instruction (Table 4.7)
Label:
    <kind: label, lbl: Label>
receive VarName(ParamType)
    <kind: receive, left: VarName, ptype: ParamType>
VarName ← Operand1 Binop Operand2
    <kind: binasgn, left: VarName, opr: Binop, opd1: Operand1, opd2: Operand2>
VarName ← Unop Operand
    <kind: unasgn, left: VarName, opr: Unop, opd: Operand>
VarName ← Operand
    <kind: valasgn, left: VarName, opd: Operand>
...
49IROper = enum {
    add,                                  +
    sub,                                  - (binary)
    mul,                                  * (binary)
    div,                                  /
    mod, min, max,
    eql, neql, less, lseq, grtr, gteq,    =, !=, <, <=, >, >=
    shl, shr, shra, and, or, xor,
    ind,                                  * (pointer dereference)
    indelt,                               *. (dereference to a field)
    neg,                                  - (unary)
    not,                                  !
    addr, val,
    cast,                                 (type cast)
    ... }
Table 4.6
50MIRKind = enum {label, receive, binasgn, unasgn, ..., sequence}
OpdKind = enum {var, const, type}
ExpKind = enum {binexp, unexp, noexp, listexp}
Exp_Kind: MIRKind → ExpKind
Has_Left: MIRKind → boolean
Exp_Kind := {<label, noexp>, <receive, noexp>, <binasgn, binexp>,
             <unasgn, unexp>, ..., <callexp, listexp>, ..., <sequence, noexp>}
Has_Left := {<label, false>, <receive, true>, <binasgn, true>,
             <unasgn, true>, <valasgn, true>, <condasgn, true>, <castasgn, true>,
             ..., <unif, false>, ...}
51Inst: array [1..n] of Instruction
Inst[1] = <kind: label, lbl: L1>
Inst[2] = <kind: valasgn, left: b, opd: <kind: var, val: a>>
Inst[3] = <kind: binasgn, left: c, opr: add,
           opd1: <kind: var, val: b>,
           opd2: <kind: const, val: 1>>
MIR
L1: b ← a
    c ← b + 1
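The tuple encoding on this slide maps naturally onto records in most languages. As a hedged sketch, the Python below stores the three instructions as dictionaries with the same field names (kind, lbl, left, opr, opd, opd1, opd2) and prints them back as MIR text. The helper names `opd_text` and `to_mir`, the `BINOPS` table and the ASCII arrow `<-` are assumptions of this example; only the three instruction kinds from the slide are handled.

```python
# The three instructions from the slide, one dictionary per ICAN tuple.
INST = [
    {"kind": "label", "lbl": "L1"},
    {"kind": "valasgn", "left": "b",
     "opd": {"kind": "var", "val": "a"}},
    {"kind": "binasgn", "left": "c", "opr": "add",
     "opd1": {"kind": "var", "val": "b"},
     "opd2": {"kind": "const", "val": 1}},
]

# Printable symbols for a few operator names (only a subset, for illustration).
BINOPS = {"add": "+", "sub": "-", "mul": "*", "div": "/", "less": "<"}

def opd_text(opd):
    """Render a var or const operand as text."""
    return str(opd["val"])

def to_mir(inst):
    """Render one instruction tuple as a line of MIR text."""
    kind = inst["kind"]
    if kind == "label":
        return inst["lbl"] + ":"
    if kind == "valasgn":
        return inst["left"] + " <- " + opd_text(inst["opd"])
    if kind == "binasgn":
        return (inst["left"] + " <- " + opd_text(inst["opd1"]) + " "
                + BINOPS[inst["opr"]] + " " + opd_text(inst["opd2"]))
    raise ValueError("unsupported instruction kind: " + kind)
```

Dispatching on the `kind` field is the same idea as the Exp_Kind and Has_Left functions: properties of an instruction are looked up from its kind instead of being hard-wired at every use.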
52insert_node:
    begin
    receive n(val)
    receive l(val)
    t1 ← l.value
    if n <= t1 goto L1
    t2 ← l.next
    if t2 != nil goto L2
    call make_node, (l, type1; n, int)
    return
L2: t4 ← l.next
    call insert_node, (n, int; t4, type1)
    return
L1: return
    end
Fig 4.9
53Representing HIR in ICAN
- Similar to MIR (Table 4.8)
- For statement has three expressions (Figure 4.10)
- Break if and for
54Representing LIR in ICAN
- Similar to MIR (Table 4.9, 4.10)
- No list expressions (Figure 4.11)
55Example (4.12, 4.13)
LIR:
L1: r1 ← [r7+4]
    r2 ← [r7+r8]
    r3 ← r1 + r2
    r4 ← -r3
    if r3 > 0 goto L2
    r5 ← (r9) r1
    [r7-8](2) ← r5
L2: return r4
ICAN:
Inst[1] = <kind: label, lbl: L1>
Inst[2] = <kind: loadmem, left: r1,
           addr: <kind: addrrc, reg: r7, disp: 4, len: 4>>
Inst[3] = <kind: loadmem, left: r2,
           addr: <kind: addr2r, reg: r7, reg2: r8, len: 4>>
56HIR, MIR, LIR as an ADT
- View IR as an abstract data type
- Example fields
- ProcName - the procedure name
- nblocks - the number of basic blocks
- ninsts: array [1..nblocks] of integer
- Block: array [1..nblocks] of array [..] of Instruction
- Succ, Pred: integer → set of integer
- Example methods
- insert_before(i, j, ninsts, Block, inst)
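As an illustration of the ADT view, the following Python sketch gives one plausible reading of `insert_before`: it places `inst` immediately before instruction j of basic block i and keeps `ninsts` in step. The behaviour is inferred from the name and parameter list on the slide, and the 1-based indices follow the array declarations there; Python lists stand in for the ICAN arrays.

```python
def insert_before(i, j, ninsts, block, inst):
    """Insert inst just before instruction j of basic block i.

    Indices are 1-based, as in the slide's array declarations.
    ninsts and block are modified in place.
    """
    if not 1 <= i <= len(block):
        raise IndexError("no such basic block: %d" % i)
    if not 1 <= j <= ninsts[i - 1]:
        raise IndexError("no such instruction: %d" % j)
    block[i - 1].insert(j - 1, inst)   # shift instructions j.. one slot down
    ninsts[i - 1] += 1                 # keep the per-block count consistent
```

A pass that calls `insert_before(1, 2, ninsts, block, "x")` never touches the underlying arrays directly, so the storage layout can change without rewriting the optimizer.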
58Triples
- Three address instructions
- Implicit names for results (instruction index)
- No need for temporary names
- Usually represented via pointers
- Program transformations may be tricky
- Can be translated from/into MIR
59MIR
L1: i ← i + 1
    t1 ← i + 1
    t2 ← p + 4
    t3 ← *t2
    p ← t2
    t4 ← t1 < 10
    *r ← t3
    if t4 goto L1
TRIPLES
(1) i + 1
(2) i sto (1)
(3) i + 1
(4) p + 4
(5) *(4)
(6) p sto (4)
(7) (3) < 10
(8) r *sto (5)
(9) if (7), (1)
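The translation from MIR to triples can be sketched in a few lines. The Python below handles only binary assignments, labels and conditional branches to labels that have already been seen; the tuple-based input format and the rule that names of the form t1, t2, ... are temporaries are assumptions of this example. Results of temporaries become implicit triple numbers, while assignments to program variables get an explicit `sto` triple.

```python
import re

def is_temp(name):
    """Treat names such as t1 or t42 as compiler temporaries (an assumption)."""
    return isinstance(name, str) and re.fullmatch(r"t\d+", name) is not None

def mir_to_triples(mir):
    """Translate a list of MIR tuples into triples; triple k is result[k-1]."""
    triples = []
    temp_ref = {}        # temporary name -> "(k)" of the triple that computes it
    label_ref = {}       # label -> "(k)" of the first triple after the label
    pending = []         # labels waiting for the next emitted triple

    def ref(x):
        return temp_ref.get(x, str(x))

    def emit(text):
        triples.append(text)
        number = "(%d)" % len(triples)
        for lab in pending:
            label_ref[lab] = number
        pending.clear()
        return number

    for inst in mir:
        if inst[0] == "label":
            pending.append(inst[1])
        elif inst[0] == "bin":
            _, dst, op, a, b = inst
            result = emit("%s %s %s" % (ref(a), op, ref(b)))
            if is_temp(dst):
                temp_ref[dst] = result          # implicit name, no store needed
            else:
                emit("%s sto %s" % (dst, result))
        elif inst[0] == "if":
            _, cond, target = inst
            emit("if %s, %s" % (ref(cond), label_ref[target]))
        else:
            raise ValueError("unsupported instruction: %r" % (inst,))
    return triples
```

Because a triple is named by its position, inserting or deleting one renumbers everything after it, which is why program transformations on triples may be tricky.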
60Trees
- Compact representation for expressions
- A basic block is a sequence of trees
- Assignments can be implicit or explicit
61MIR
L1: i ← i + 1
    t1 ← i + 1
    t2 ← p + 4
    t3 ← *t2
    p ← t2
    t4 ← t1 < 10
    *r ← t3
    if t4 goto L1
Trees
62Combining trees may lead to incorrect computation
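MIR:
    a ← a + 1
    b ← a + a
[Figure: the trees a: add(a, 1) and b: add(a, a), and the incorrect single tree b: add(a: add(a, 1), a: add(a, 1)) obtained by combining them]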
63Preorder Translation into MIR
[Figure: expression tree rooted at t4: less]
MIR:
    t5 ← i + 1
    t4 ← t5 < 10
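The translation of the tree on this slide into MIR can be sketched as a short recursive walk. In the Python below a tree node is a tuple `(name, op, left, right)` and a leaf is a plain variable name or constant; this encoding, the function name `tree_to_mir` and the ASCII arrow `<-` are assumptions of the example. The walk handles the operands first and then emits the instruction for the node itself, so each result is computed before it is used.

```python
def tree_to_mir(node, out):
    """Append MIR text for node to out and return the name holding its value."""
    if not isinstance(node, tuple):
        return str(node)                 # leaf: variable name or constant
    name, op, left, right = node
    left_text = tree_to_mir(left, out)   # code for the operands comes first
    right_text = tree_to_mir(right, out)
    out.append("%s <- %s %s %s" % (name, left_text, op, right_text))
    return name
```

Calling it on `("t4", "<", ("t5", "+", "i", 1), 10)` with an empty list produces the two instructions shown on the slide, `t5 <- i + 1` followed by `t4 <- t5 < 10`.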
64Advantages of Trees
- Minimize temporaries
- Amenable to many optimizations
- Locally optimized code with register allocation can be used
- Easy to translate into Polish-prefix code (used for automatic instruction selection)
65Directed Acyclic Graphs (DAGs)
- A combination of trees
- Operands which are reused are linked
- Nodes may be annotated with variable names
66MIR
L1: i ← i + 1
    t1 ← i + 1
    t2 ← p + 4
    t3 ← *t2
    p ← t2
    t4 ← t1 < 10
    *r ← t3
    if t4 goto L1
DAG
67MIR
    c ← a
    b ← a + 1
    c ← 2 * a
    d ← -c
    c ← a + 1
    c ← b + a
    d ← 2 * a
    b ← c
DAG
68Properties of DAGs
- Very compact
- Local common sub-expression elimination
- Not so easy to optimize
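The way a DAG links reused operands, and thereby performs local common sub-expression elimination, can be sketched as follows. The Python below builds a DAG for one basic block by interning every (operator, operands) combination; the input format, the operator names and the eight-instruction block are assumptions of this example (the block follows the basic block on the earlier DAG slide, with operators filled in where the slide text is ambiguous).

```python
def build_dag(block):
    """Build a DAG for a basic block given as (dst, op, operand, ...) tuples.

    Returns (nodes, current): nodes[k] is the key of node k, and current maps
    each assigned variable to the node that holds its final value.
    """
    nodes = []       # node id -> ("leaf", value) or (op, operand ids...)
    index = {}       # structural key -> node id, so equal nodes are shared
    current = {}     # variable -> node id of its current value

    def intern(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(x):
        # A variable assigned earlier in the block refers to that node;
        # anything else is an initial value or a constant, i.e. a leaf.
        return current[x] if x in current else intern(("leaf", x))

    for dst, op, *args in block:
        ids = tuple(operand(a) for a in args)
        current[dst] = ids[0] if op == "copy" else intern((op,) + ids)
    return nodes, current

BLOCK = [
    ("c", "copy", "a"),
    ("b", "add", "a", 1),
    ("c", "mul", 2, "a"),
    ("d", "neg", "c"),
    ("c", "add", "a", 1),
    ("c", "add", "b", "a"),
    ("d", "mul", 2, "a"),
    ("b", "copy", "c"),
]
```

For `BLOCK` the six operator instructions collapse into four operator nodes, because the second `a + 1` and the second `2 * a` reuse existing nodes. The same sharing is what makes later transformations harder: changing one node affects every variable that points to it.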
69Conclusions
- Representations in the book
- HIR, MIR, LIR
- Other representations
- Triples, Trees, DAGs, Stack machines
- Source language dependent
- Algol Object Code(1960)
- Pascal P-code (1980)
- Prolog Warren machine code (1977)
- Java bytecode (1996)
- Microsoft .net?