Title: CSCI 435 Compiler Design
1CSCI 435 Compiler Design
- Week 6 Class 3
- Section 4 to Section 4.1.2
- (279-290)
- Ray Schneider
2Topics of the Day
- Processing the intermediate code
- Interpretation
- Recursive Interpretation
- Iterative Interpretation
3Where we are ...
- Now we have an annotated syntax tree, either
actually in memory as a data structure (Broad
Compilers) or implicitly available during parsing
(Narrow Compilers).
- The Annotated Syntax Tree bears traces of its
origin, the language constructs and the like,
represented by nodes and subtrees, despite the
relative paradigm independence of the methods
being used
- NOW THE NEXT STEP: Transforming the AST into
Intermediate Code
4Status of various modules in compiler construction
The AST is full of nodes reflecting the specific
semantic concepts of the source language.
Intermediate Code Generation reduces the set of
specific node types to a small set of general
concepts easily implemented on actual machines.
FIND and REWRITE: Intermediate Code Generation
finds the language characteristic nodes and
subtrees in the AST and rewrites them into
subtrees that use only a small number of
features, each corresponding closely to a set of
machine instructions.
5After FIND and REPLACE
- The resulting tree should be called THE
INTERMEDIATE CODE TREE but is usually still
called the AST
- Features of the Intermediate Code Tree are
- expressions, including assignments
- routine calls, procedure headings, and return
statements
- conditional and unconditional jumps
- IN ADDITION
- administrative features, e.g. memory allocation
for global variables, activation record
allocation, and module linkage information
- the entire range of high-level concepts of the
language is replaced by a few rather low-level
concepts
6Processing the Intermediate Code
- Involves either ...
- a little pre-processing followed by execution on
an Interpreter, or
- a lot of pre-processing in the form of machine
code generation followed by execution on hardware
- Whatever the processing system ...
- Writing the Run-Time and Library system is the
majority of the work and is primarily just brute
force coding.
- We will begin by looking at Interpretation
7Simplest way ...process AST using an ...
- INTERPRETER
- An Interpreter considers the nodes of the AST in
the correct order and performs the actions
prescribed by the semantics of the language
- NOTE: unlike compilation, interpretation requires
the program's input data
- An Interpreter performs actions similar to those of
a CPU, except that it works on AST nodes rather than
Machine Instructions
- A CPU by contrast works on Machine Instructions
given in the correct order and performs the
actions demanded by the language, as translated
into instructions governed by the semantics
of the machine
- TWO KINDS OF INTERPRETERS
- RECURSIVE (works directly on the AST), and
- ITERATIVE (works on a linearized version of the
AST)
8Simple Recursive Compiler from 1.2.8 fig 1.19
(21)
9Recursive Interpretation
- An interpreting routine is provided for each node
type in the AST
- Each such routine calls other similar routines
- The meaning of a language construct is
defined as a function of the meanings of its
components
- The Interpretation starts by calling the
interpretation routine for Program with the top
node of the AST as a parameter
- An important ingredient of a Recursive
Interpreter is the UNIFORM SELF-IDENTIFYING DATA
REPRESENTATION
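The scheme above (one interpreting routine per node type, each calling the others) can be sketched in C as follows. This is a minimal illustration, not the textbook's code: the node types, the `Node` layout, and all function names are assumptions made for the example, and values are plain `int`s rather than self-identifying records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node types for a tiny expression language. */
typedef enum { NODE_NUMBER, NODE_ADD, NODE_MUL } NodeType;

typedef struct Node {
    NodeType type;
    int value;                 /* used by NODE_NUMBER only */
    const struct Node *left;   /* used by NODE_ADD and NODE_MUL */
    const struct Node *right;
} Node;

static int interpret_expression(const Node *node);

/* One interpreting routine per node type. */
static int interpret_number(const Node *node) {
    return node->value;
}

static int interpret_add(const Node *node) {
    return interpret_expression(node->left) + interpret_expression(node->right);
}

static int interpret_mul(const Node *node) {
    return interpret_expression(node->left) * interpret_expression(node->right);
}

/* Dispatch on the node type; the routines above call back into this one,
   so the meaning of a node is a function of the meanings of its children. */
static int interpret_expression(const Node *node) {
    assert(node != NULL);
    switch (node->type) {
    case NODE_NUMBER: return interpret_number(node);
    case NODE_ADD:    return interpret_add(node);
    case NODE_MUL:    return interpret_mul(node);
    }
    assert(!"unknown node type");
    return 0;
}

/* Interpretation starts here, with the top node of the AST as parameter. */
int interpret_program(const Node *root) {
    return interpret_expression(root);
}
```

Building the tree for 2 + 3 * 4 by hand and passing its top node to `interpret_program` walks the tree depth-first; the recursion depth follows the depth of the AST.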
10Uniform Self-Identifying Data Representation
- The Interpreter has to manipulate data values
whose types and sizes are not known when
the Interpreter is written
- Implementation requires a generic model:
- values are implemented as variable-size records that
specify the type of the run-time value,
its size, and the run-time value itself
- a POINTER to such a record serves as the VALUE
during Interpretation
11Example Complex Numbers
- Two Parts of Data Representation
- Actual Values, vary from entity to entity
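- Type of Value, things in common
[Figure: a value of type complex_number and its type
description. Labels: "re", "im", "real". One part is
specific to the given value of type complex_number;
the other part is common to all values of type
complex_number.]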
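One possible C rendering of the two-part representation for the complex-number example is sketched below. The struct layouts and the names `TypeDescriptor`, `Value`, `new_value`, and `get_field` are assumptions for illustration only; the slides prescribe only that a value record carries its type, its size, and the value itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Part common to all values of one type (hypothetical layout). */
typedef struct {
    const char *type_name;            /* e.g. "complex_number" */
    size_t field_count;
    const char *const *field_names;   /* e.g. "re", "im" */
    const char *field_type;           /* e.g. "real"; same for every field here */
} TypeDescriptor;

/* Part specific to one value: a variable-size record holding the type,
   the size, and then the run-time value itself. */
typedef struct {
    const TypeDescriptor *type;
    size_t size;                      /* number of bytes in data[] */
    double data[];                    /* C99 flexible array member */
} Value;

static const char *const complex_fields[] = { "re", "im" };
const TypeDescriptor complex_number_type = {
    "complex_number", 2, complex_fields, "real"
};

/* Allocate a value of the given type; the caller releases it with free().
   The returned pointer is what the interpreter passes around as "the value". */
Value *new_value(const TypeDescriptor *type, const double *fields) {
    size_t size = type->field_count * sizeof(double);
    Value *v = malloc(sizeof(Value) + size);
    assert(v != NULL);
    v->type = type;
    v->size = size;
    memcpy(v->data, fields, size);
    return v;
}

/* Look a field up by name, using only what the value says about itself. */
double get_field(const Value *v, const char *name) {
    for (size_t i = 0; i < v->type->field_count; i++) {
        if (strcmp(v->type->field_names[i], name) == 0) {
            return v->data[i];
        }
    }
    assert(!"no such field");
    return 0.0;
}
```

Every `complex_number` value shares the single `complex_number_type` descriptor; only the `data` part differs from value to value.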
12Status Indicator another important feature
- Used to direct the flow of control
- Primary Component
- Mode of Operation of the Interpreter: an
enumeration value whose usual value is something
like "Normal Mode", indicating sequential flow of
control; other values indicate Jumps, Exceptions,
Function Returns
- Second Component
- A value in the wider sense that supplies more
information about the Non-Sequential Flow of
Control, e.g. Return Mode, Names and Values of
Exceptions, Label for Jump Mode, etc.
- The Status Indicator should also contain the file
name and line number of the text where the status
indicator was created, and possibly other
debugging information
- Each interpreting routine checks the status
indicator after each call to another routine to
see how to carry on
13Outline of a routine for recursive interpretation
of an if-statement
PROCEDURE Elaborate if statement (If node):
    SET Result TO Evaluate condition (If node .condition);
    IF Status .mode /= Normal mode: RETURN;
    IF Result .type /= Boolean:
        ERROR "Condition in if-statement is not of type Boolean";
        RETURN;
    IF Result .boolean .value = True:
        Elaborate statement (If node .then part);
    ELSE Result .boolean .value = False:
        // Check if there is an else-part at all:
        IF If node .else part /= No node:
            Elaborate statement (If node .else part);
        ELSE If node .else part = No node:
            SET Status .mode TO Normal mode;
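The outline above, together with the status indicator from the previous slide, might look roughly like this in C. Everything here is a sketch under assumptions: the `Status`, `Result`, `Statement`, and `IfNode` layouts, the `SET_STATUS` macro, and the toy statement model (a statement either bumps a counter or raises an exception) are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { NORMAL_MODE, JUMP_MODE, EXCEPTION_MODE, RETURN_MODE } Mode;

/* Status indicator: the mode plus the place where it was last set. */
typedef struct {
    Mode mode;
    const char *file;
    int line;
} Status;

Status status = { NORMAL_MODE, __FILE__, __LINE__ };

#define SET_STATUS(m) (status = (Status){ (m), __FILE__, __LINE__ })

typedef enum { TYPE_BOOLEAN, TYPE_INTEGER } ValueType;
typedef struct { ValueType type; int value; } Result;

/* Toy statement (assumption): either raises an exception or bumps a counter. */
typedef struct { bool raises; int *counter; } Statement;

typedef struct {
    Result condition;              /* toy condition: a constant */
    const Statement *then_part;
    const Statement *else_part;    /* NULL plays the role of "No node" */
} IfNode;

static Result evaluate_condition(const IfNode *if_node) {
    return if_node->condition;
}

static void elaborate_statement(const Statement *statement) {
    if (statement->raises) {
        SET_STATUS(EXCEPTION_MODE);
        return;
    }
    ++*statement->counter;
}

void elaborate_if_statement(const IfNode *if_node) {
    assert(if_node != NULL);
    Result result = evaluate_condition(if_node);
    if (status.mode != NORMAL_MODE) return;   /* check after every call */
    if (result.type != TYPE_BOOLEAN) {
        fprintf(stderr, "Condition in if-statement is not of type Boolean\n");
        return;
    }
    if (result.value) {
        elaborate_statement(if_node->then_part);
    } else if (if_node->else_part != NULL) {
        elaborate_statement(if_node->else_part);
    } else {
        SET_STATUS(NORMAL_MODE);              /* no else-part: nothing to do */
    }
}
```

A caller of `elaborate_if_statement` would in turn inspect `status.mode` on return, so that an exception raised inside a branch propagates outward without any C-level non-local jump.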
14Typical Handling of the Symbol Table
- Variables, named constants, and other named
entities are handled by the Symbol Table. For
something like a variable V of type T, the entry
is a record, say "Declarable", that holds
- a pointer to the name V,
- the file name and line number of its declaration
- an indication of its kind (variable, constant,
field selector, etc.)
- a pointer to the type T
- a pointer to newly allocated room for the value
of V
- a bit telling whether or not V has been
initialized, if known
- one or more scope- and stack-related pointers,
depending on the language
- other data as required (language dependent)
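As a rough illustration, the "Declarable" record could be declared in C along these lines. The field names, the `Type` struct, the single `next_in_scope` link, and the helpers `declare` and `lookup` are assumptions for the sketch; a real interpreter would choose scope and stack pointers to suit its language.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum { KIND_VARIABLE, KIND_CONSTANT, KIND_FIELD_SELECTOR } Kind;

typedef struct { const char *name; size_t size; } Type;

typedef struct Declarable {
    const char *name;                  /* pointer to the name V */
    const char *file_name;             /* where V was declared */
    int line_number;
    Kind kind;                         /* variable, constant, ... */
    const Type *type;                  /* pointer to the type T */
    void *value;                       /* room allocated for the value of V */
    bool initialized;                  /* has V been given a value yet? */
    struct Declarable *next_in_scope;  /* one scope-related pointer */
} Declarable;

/* Create an entry and allocate zeroed room for its value. */
Declarable *declare(const char *name, Kind kind, const Type *type,
                    const char *file_name, int line_number,
                    Declarable *scope) {
    Declarable *d = malloc(sizeof *d);
    assert(d != NULL);
    d->name = name;
    d->file_name = file_name;
    d->line_number = line_number;
    d->kind = kind;
    d->type = type;
    d->value = calloc(1, type->size);
    assert(d->value != NULL);
    d->initialized = false;
    d->next_in_scope = scope;
    return d;
}

/* Find the most recent declaration of a name; NULL if there is none. */
Declarable *lookup(Declarable *scope, const char *name) {
    for (; scope != NULL; scope = scope->next_in_scope) {
        if (strcmp(scope->name, name) == 0) return scope;
    }
    return NULL;
}
```

Because each entry records its file name and line number, a run-time error such as reading an uninitialized variable can be reported against the declaration that introduced it.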
15Summary
- A Recursive Interpreter can generally be written
quickly, so it is useful for rapid prototyping
- Not the best architecture for heavy duty
interpreting but good for debugging language
concepts and features
- Big Disadvantage: Very Slow, as much as 1000
times slower than compiled code for the same
language
- This can be improved somewhat by doing as much
static context checking as possible in the
pre-interpretive phase (see Memoization pg.286)
16Iterative Interpretation
- The structure of an Iterative Interpreter is much
closer to that of a CPU than the structure of a
Recursive Interpreter is.
- Consists of a flat loop over a case statement
which contains a code segment for each node type
- the code segment for a node type implements the
semantics of that node type
- It requires a fully annotated and threaded AST
and maintains an ACTIVE NODE POINTER which points
to the node being interpreted, i.e. the ACTIVE
NODE
- The interpreter runs the code for the Active Node
and then moves the Active Node Pointer to another
node, the successor node.
17An iterative interpreter for the demo compiler of
1.2: JUST A BIG SWITCH STATEMENT
#include <stdio.h>     /* for printf() */
#include "parser.h"    /* for types AST_node and Expression */
#include "thread.h"    /* for Thread_AST() and Thread_start */
#include "stack.h"     /* for Push() and Pop() */
#include "backend.h"   /* for self check */

static AST_node *Active_node_pointer;

static void Interpret_iteratively(void) {
    while (Active_node_pointer != 0) {
        /* there is only one node type, Expression: */
        Expression *expr = Active_node_pointer;
        switch (expr->type) {
        case 'D':                       /* digit: push its value */
            Push(expr->value);
            break;
        case 'P': {                     /* parenthesized binary expression */
            int e_right = Pop();        /* the right operand was pushed last */
            int e_left = Pop();
            switch (expr->oper) {
            case '+': Push(e_left + e_right); break;
            case '*': Push(e_left * e_right); break;
            }
            break;
        }
        }
        /* follow the thread to the successor node */
        Active_node_pointer = Active_node_pointer->successor;
    }
    printf("%d\n", Pop());              /* print the result */
}

void Process(AST_node *icode) {
    Thread_AST(icode);
    Active_node_pointer = Thread_start;
    Interpret_iteratively();
}
18the Iterative Interpreter 1
- Data Structures resemble those inside a compiled
program more than those in a Recursive
Interpreter
- e.g. an array holds the global data; if the source
language is stack oriented, then the iterative
interpreter maintains a stack.
- Variables and Entities have an address, which is
generally an offset into a memory array
- The Symbol Table is no longer needed for
execution, but is useful for generating better
error messages
19the Iterative Interpreter 2
- An Iterative Interpreter has more information about
run-time events than a compiled program but less
than a recursive interpreter
- one can make up for the lack of a symbol table in
an iterative interpreter by using SHADOW MEMORY
parallel to the memory arrays maintained by the
interpreter. The Shadow Memory holds properties
of the corresponding byte in memory, e.g. "is
uninitialized", "is a non-first byte of a
pointer", "belongs to a read-only array"; the
different properties can be encoded with byte-codes
- Some Iterative Interpreters store the AST in a
single array for several reasons
- easier to write it to file
- more compact representation
- reusable without regenerating the AST
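The shadow-memory idea can be sketched as a second byte array kept in step with the interpreter's memory array. The flag values, the 64-byte memory size, and the function names below are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEMORY_SIZE 64

/* Hypothetical byte-codes for the properties of one memory byte. */
enum {
    SHADOW_UNINITIALIZED = 1 << 0,
    SHADOW_POINTER_TAIL  = 1 << 1,   /* a non-first byte of a pointer */
    SHADOW_READ_ONLY     = 1 << 2    /* belongs to a read-only array */
};

static uint8_t memory[MEMORY_SIZE];
static uint8_t shadow[MEMORY_SIZE];  /* shadow[i] describes memory[i] */

/* Clear the memory and mark every byte as uninitialized. */
void reset_memory(void) {
    for (size_t i = 0; i < MEMORY_SIZE; i++) {
        memory[i] = 0;
        shadow[i] = SHADOW_UNINITIALIZED;
    }
}

void mark_read_only(size_t address) {
    assert(address < MEMORY_SIZE);
    shadow[address] |= SHADOW_READ_ONLY;
}

/* Store a byte; returns false if the location is read-only. */
bool store_byte(size_t address, uint8_t byte) {
    assert(address < MEMORY_SIZE);
    if (shadow[address] & SHADOW_READ_ONLY) return false;
    memory[address] = byte;
    shadow[address] &= (uint8_t)~SHADOW_UNINITIALIZED;
    return true;
}

/* Load a byte; returns false if the location was never written. */
bool load_byte(size_t address, uint8_t *out) {
    assert(address < MEMORY_SIZE);
    if (shadow[address] & SHADOW_UNINITIALIZED) return false;
    *out = memory[address];
    return true;
}
```

An interpreter built this way can turn a `false` return into a precise diagnostic ("read of uninitialized byte at offset 5"), which is the kind of information a symbol table would otherwise have supplied.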
20Three Forms of Storing an AST: a Graph
21Storing an AST in an array or as
pseudo-instructions
[Figure: an if-statement stored in two forms.
Array: condition, IF, statement 1, statement 2,
statement 3, statement 4.
Pseudo-Instructions: condition, IF_FALSE,
statement 1, JUMP, statement 2, statement 3,
statement 4.]
22AST Constructions and interpretation
- The array form usually puts the successor of a
node right after the node
- may even omit the successor pointer altogether,
making the next element the default successor, and
include pointers only when the successor is NOT
the next element
- Historically, an Iterative Interpreter mimics a
CPU working on a compiled program, and the AST
array mimics the compiled program
- Iterative Interpreters are easier to write even
than recursive interpreters and much easier than
compilers
- Only serious deficiency is speed: even the best
interpreter is typically 30 times slower than
optimized compiled code
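The array form with a default successor can be sketched as a small pseudo-instruction interpreter. The opcode set (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_IF_FALSE`, `OP_JUMP`, `OP_HALT`), the `Instruction` layout, and the evaluation stack are assumptions for illustration; only `IF_FALSE` and `JUMP` are taken from the slide labels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pseudo-instruction set. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_IF_FALSE, OP_JUMP, OP_HALT } Opcode;

typedef struct {
    Opcode op;
    int operand;   /* value for OP_PUSH; target index for OP_IF_FALSE, OP_JUMP */
} Instruction;

#define STACK_MAX 32

/* Run the array and return the value left on top of the stack. */
int run(const Instruction *code, size_t length) {
    int stack[STACK_MAX];
    size_t sp = 0;
    size_t active = 0;                 /* index of the active node */

    while (active < length) {
        const Instruction *node = &code[active];
        size_t next = active + 1;      /* default successor: the next element */
        switch (node->op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = node->operand;
            break;
        case OP_ADD:
        case OP_MUL: {
            assert(sp >= 2);
            int right = stack[--sp];
            int left = stack[--sp];
            stack[sp++] = (node->op == OP_ADD) ? left + right : left * right;
            break;
        }
        case OP_IF_FALSE:              /* explicit successor only when needed */
            assert(sp >= 1);
            if (stack[--sp] == 0) next = (size_t)node->operand;
            break;
        case OP_JUMP:
            next = (size_t)node->operand;
            break;
        case OP_HALT:
            next = length;
            break;
        }
        active = next;
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

Only `OP_IF_FALSE` and `OP_JUMP` carry a successor index; every other element simply falls through to its neighbor, which is what lets the array omit successor pointers in the common case.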
23Next time ...
- Next time we'll start looking at Code Generation
- We'll spend about two or three classes on it.
24Homework for Week 8
- Bison Familiarization
- Read the entire 39 pages of "A Compact Guide To
Lex and Yacc" // you can skim through it the
first time
- THEN concentrate first on getting the lex example
on page 10 running
- THEN after you have that running go on to
Practice, Part 1 and strive to get the primitive
calculator running (pages 14 through 17)
- HINTS: the lex input on page 10 can be made to
run by extending it with the line (cribbed from
our text)
- int yywrap(void) { return 1; } // at the end, and
you have to put
- #include <stdlib.h> // at the top of your lex.yy.c
output; then the code will add line numbers to a
text file, reading the file name in from the
command line and sending the output to stdout.
25References
- Text: Modern Compiler Design (figures)
- Lex: A Lexical Analyzer Generator, by M.E. Lesk
and E. Schmidt
- Yacc: Yet Another Compiler-Compiler, by Stephen C.
Johnson
- see http://dinosaur.compilertools.net/yacc/index.html
and http://dinosaur.compilertools.net/lex/index.html