Title: CSCI 435 Compiler Design
1. CSCI 435 Compiler Design
- Week 5 Class 3
- Section 3.1.7 to 3.2.1
- (230 to 244)
- Ray Schneider
2. Topics of the Day
- L-Attributed Grammars
- . . . and top-down parsers
- . . . and bottom-up parsers
- S-Attributed Grammars
- Equivalence of L-attributed and S-attributed
Grammars
- Manual Methods
- Threading the AST
3. L-attributed grammars
- parsing process creates nodes in the syntax tree
from left to right
- either TOP DOWN, parent to children, or
- BOTTOM UP, children to parent
- L-attributed grammar
- attribute grammars which allow the attributes to
be evaluated in one left-to-right traversal of
the syntax tree
- characterized by the fact that no dependency
graph of any of its production rules has a
data-flow arrow that points from a child to that
child or to a child to the left of it.
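The condition above can be checked mechanically. The following is a minimal sketch in C, assuming a hypothetical encoding in which each data-flow arrow of one production rule is a pair of positions (0 for the parent, 1..k for the children from left to right). The names Dependency and rule_is_l_attributed are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One data-flow arrow in the dependency graph of a production rule.
 * Position 0 is the parent (left-hand side); positions 1..k are the
 * children, numbered left to right. (Hypothetical encoding.) */
typedef struct {
    int from;  /* position supplying the value */
    int to;    /* position receiving the value */
} Dependency;

/* Returns true if no arrow runs from a child to that same child or
 * to a child on its left. Arrows that start or end at the parent
 * are always accepted by this check. */
bool rule_is_l_attributed(const Dependency *deps, size_t n_deps) {
    assert(deps != NULL || n_deps == 0);
    for (size_t i = 0; i < n_deps; i++) {
        int from = deps[i].from;
        int to = deps[i].to;
        bool child_to_child = from > 0 && to > 0;
        if (child_to_child && to <= from) {
            return false;  /* arrow points at the same child or leftwards */
        }
    }
    return true;
}
```

An arrow from child 1 to child 2 passes the check; an arrow from child 2 to child 1 does not.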
4. Example(s)
- Many programming language grammars are
L-attributed, such as our previously seen
Constant_Definition dependency graph
- Constant_Definition is L-attributed
- Number is an example of a non-L-attributed
grammar
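5. [Figure: syntax tree with top node A, second-tier
nodes B1 to B4 and third-tier nodes C1 to C3,
processed left to right. Synthesized attributes
flow upward; inherited attributes flow downward.]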
An important consequence of the L-attributed
property is that once work on a node has started,
no part of the compiler will need to return to
one of the node's siblings on the left to do
processing there. Two sets of attributes play a
role in processing a node, say C2:
1) the nodes on the path to the top: C2, B3, A
2) the siblings to the left at all tiers
6. The point ...
- the attributes of the node being processed and
of the nodes on its path to the top are stored in
the corresponding nodes
- nodes to the left are complete except for the
effects of synthesized attributes, which can be
stored in the parent node
- at all times we can restrict ourselves to the
nodes that lie on the path from the top to the
node being processed
- this is like top-down parsing: the code can be
generated as the parsing passes through
7. L-attributed grammars and top-down parsers
- We saw LLgen earlier, which injected parameters
and code directly into the parser
- Coordination of parsing and attribute evaluation
is a great simplification but is applicable to a
smaller set of attribute grammars
- On the next slide we look at an LLgen example. It
includes an inherited attribute that is produced
as a synthesized attribute of one non-terminal
and then passed down through the processing of
the production rule; the result is passed up as
a value assigned to the memory location t.
8. LLgen code for L-attributed grammar for simple
expression
    {
    #include "symbol_table.h"
    }

    %lexical get_next_token_class;
    %token IDENTIFIER;
    %token DIGIT;
    %start Main_Program, main;

    main {symbol_table sym_tab; int result;}:
        {init_symbol_table(&sym_tab);}
        declarations(sym_tab)
        expression(sym_tab, &result)
        {printf("result = %d\n", result);}
    ;

    declarations(symbol_table sym_tab):
        declaration(sym_tab) [ ',' declaration(sym_tab) ]* ';'
    ;

    declaration(symbol_table sym_tab) {symbol_entry *sym_ent;}:
        IDENTIFIER {sym_ent = look_up(sym_tab, Token.repr);}
        '=' DIGIT {sym_ent->value = Token.repr - '0';}
    ;

    expression(symbol_table sym_tab; int *e) {int t;}:
        term(sym_tab, &t) {*e = t;}
        expression_tail_option(sym_tab, e)
    ;

    expression_tail_option(symbol_table sym_tab; int *e) {int t;}:
        '-' term(sym_tab, &t) {*e -= t;}
        expression_tail_option(sym_tab, e)
    |
    ;

    term(symbol_table sym_tab; int *t):
        IDENTIFIER {*t = look_up(sym_tab, Token.repr)->value;}
    ;
- the symbol table, containing representations of
some identifiers and their integer values, is
produced as a synthesized attribute by
declarations in main
- it is then passed down as an inherited attribute
through expression -> expression_tail_option -> term
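For comparison, the same attribute flow can be written by hand as a recursive descent parser. This is only a sketch under simplifying assumptions that are not in the slides: identifiers are single lowercase letters, the symbol table is a fixed array, syntax errors are caught with assert, and evaluate_program is a hypothetical entry point. The inherited attribute becomes an input parameter and the synthesized attribute becomes an output parameter.

```c
#include <assert.h>
#include <ctype.h>

/* Symbol table: one integer value per single-letter identifier. */
typedef struct { int value[26]; } SymbolTable;

static const char *input;  /* next unread character of the program */

static char next_char(void) {  /* skips blanks; returns 0 at end of input */
    while (*input == ' ') input++;
    return *input;
}

static void expect(char c) {
    char got = next_char();
    assert(got == c);
    (void)got;
    input++;
}

/* declaration -> IDENTIFIER '=' DIGIT */
static void declaration(SymbolTable *sym_tab) {
    char id = next_char();
    assert(islower((unsigned char)id));
    input++;
    expect('=');
    char digit = next_char();
    assert(isdigit((unsigned char)digit));
    input++;
    sym_tab->value[id - 'a'] = digit - '0';
}

/* declarations -> declaration [ ',' declaration ]* ';' */
static void declarations(SymbolTable *sym_tab) {
    declaration(sym_tab);
    while (next_char() == ',') {
        input++;
        declaration(sym_tab);
    }
    expect(';');
}

/* term -> IDENTIFIER; sym_tab is inherited, *t is synthesized */
static void term(const SymbolTable *sym_tab, int *t) {
    char id = next_char();
    assert(islower((unsigned char)id));
    input++;
    *t = sym_tab->value[id - 'a'];
}

/* expression_tail_option -> '-' term expression_tail_option | empty */
static void expression_tail_option(const SymbolTable *sym_tab, int *e) {
    if (next_char() == '-') {
        int t;
        input++;
        term(sym_tab, &t);
        *e -= t;
        expression_tail_option(sym_tab, e);
    }
}

/* expression -> term expression_tail_option */
static void expression(const SymbolTable *sym_tab, int *e) {
    int t;
    term(sym_tab, &t);
    *e = t;
    expression_tail_option(sym_tab, e);
}

/* main -> declarations expression; returns the value of the expression */
int evaluate_program(const char *text) {
    SymbolTable sym_tab = {{0}};
    int result;
    input = text;
    declarations(&sym_tab);
    expression(&sym_tab, &result);
    return result;
}
```

With the input "a=7,b=2;a-b-b" the table is filled by declarations and then handed down to every term, which is the path described above.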
9. A more technical definition of a 'narrow' compiler
- A narrow compiler is a compiler based formally or
informally on some form of L-attributed grammar.
- It does not save substantially more information
than that which is present on the path from the
top to the node being processed
- Generally that path is proportional to ln(n),
where n is the length of the program, and the
entire AST is proportional to n
10. L-attributed grammars and bottom-up parsers
- attribute evaluation in bottom-up parsers is not
so obvious; the problem is that inherited
attributes must be passed down and the proper
path is not known until the node is resolved
through handle detection
- Yacc, the most famous LALR(1) parser generator,
does it anyway. How?
- To the stack of shifted terminals and reduced
non-terminals in a bottom-up parser we add an
attribute stack containing the attributes of each
stack element in the same order
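The parallel attribute stack can be pictured with a small sketch in C. It is not how yacc is implemented internally; the names ParseStack, shift and reduce_minus are made up for illustration, and only the single rule E -> E '-' T is handled.

```c
#include <assert.h>

#define STACK_MAX 64

/* Parse stack and attribute stack kept in lockstep: entry i of
 * attr[] holds the attribute of the grammar symbol in symbol[i]. */
typedef struct {
    char symbol[STACK_MAX];  /* 'E', 'T' or '-' in this sketch */
    int  attr[STACK_MAX];    /* synthesized attribute of that symbol */
    int  top;                /* number of entries in use */
} ParseStack;

/* Push a symbol together with its attribute. */
void shift(ParseStack *s, char symbol, int attr) {
    assert(s->top < STACK_MAX);
    s->symbol[s->top] = symbol;
    s->attr[s->top] = attr;
    s->top++;
}

/* Reduce by E -> E '-' T: pop three entries, push E with its value. */
void reduce_minus(ParseStack *s) {
    assert(s->top >= 3);
    int left = s->attr[s->top - 3];   /* attribute of E, "$1" in yacc */
    int right = s->attr[s->top - 1];  /* attribute of T, "$3" in yacc */
    s->top -= 3;
    shift(s, 'E', left - right);
}
```

Because both arrays are indexed by the same position, a reduction can find the attributes of the right-hand side at fixed offsets from the top of the stack.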
11. Code in the middle
- A → B C becomes
- A → B {C.inh_attr := f(B.syn_attr);} C
- where B.syn_attr is a synthesized attribute of B
and C.inh_attr is an inherited attribute of C
- the code is attached to an ε-rule, say A_action1
- A → B A_action1 C
- A_action1 → ε {C.inh_attr := f(B.syn_attr);}
- The code is now at the end of an alternative and
can be executed when that alternative is reduced
- Only works if A → B • C is the only hypothesis
- Another trick
- lay out the attribute stack so that the one and
only synthesized attribute of one node is in the
same position as the one and only inherited
attribute of the next node (no code needs to be
executed in between, i.e. no code in the middle
of a grammar rule)
12. S-attributed grammars
- S-attributed grammars have no inherited
attributes at all (seems like a tough
restriction)
- BUT anything that can be done in an L-attributed
grammar can be done in an S-attributed grammar
- Now we just stack the synthesized attributes and
at the end of the alternative the parent scoops
up all the attributes and processes them
- {$$ = new_expr(); $$->type = '-'; $$->expr = $1; $$->term = $3;}
- yacc grammar rules require exactly one
synthesized attribute. If more than one is
required, a record is returned, forming the only
attribute
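The "record as the only attribute" idea can be sketched in plain C. The struct layout, new_digit, reduce_subtraction, eval_expr and free_expr are assumptions made for this illustration; only the name new_expr and the fields type, expr and term come from the slide.

```c
#include <assert.h>
#include <stdlib.h>

/* The single synthesized attribute of every symbol is a pointer to a
 * record, so several values can travel up together. */
typedef struct Expr {
    char type;          /* '-' for subtraction, 'D' for a digit leaf */
    int value;          /* used when type == 'D' */
    struct Expr *expr;  /* left operand when type == '-' */
    struct Expr *term;  /* right operand when type == '-' */
} Expr;

Expr *new_expr(void) {
    Expr *e = calloc(1, sizeof *e);
    assert(e != NULL);
    return e;
}

Expr *new_digit(int value) {
    Expr *e = new_expr();
    e->type = 'D';
    e->value = value;
    return e;
}

/* Action run when "expression '-' term" is reduced; d1 and d3 stand
 * for yacc's $1 and $3, and the return value stands for $$. */
Expr *reduce_subtraction(Expr *d1, Expr *d3) {
    Expr *result = new_expr();
    result->type = '-';
    result->expr = d1;
    result->term = d3;
    return result;
}

/* Evaluates the tree built by the reductions. */
int eval_expr(const Expr *e) {
    if (e->type == 'D') return e->value;
    return eval_expr(e->expr) - eval_expr(e->term);
}

void free_expr(Expr *e) {
    if (e == NULL) return;
    free_expr(e->expr);
    free_expr(e->term);
    free(e);
}
```

The parent only sees the attributes of its children after they are complete, which is exactly the S-attributed restriction.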
13. Equivalence of L- and S-attributed grammars
- L-attributed grammars are rather easily converted
to S-attributed grammars
- The Method: delay any computation that cannot be
done now to a later moment when it can be done
- Implementation: create a data structure which is
populated with the computations specified for
inherited attributes plus synthesized attributes
and pass it up the levels until the computation
can be made
- Simple in principle but only practically feasible
for small problems
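One way to picture the delaying method is the sketch below, again for the subtraction grammar. A term cannot look up its identifier because the symbol table is no longer passed down, so it records the identifier instead; the record travels up as a synthesized attribute and is evaluated at the top. The type Deferred and the three functions are hypothetical names, and identifiers are assumed to be single lowercase letters.

```c
#include <assert.h>

#define MAX_TERMS 16

/* Deferred form of "id - id - ...": the identifiers are recorded now
 * and looked up later, once the symbol table is available. */
typedef struct {
    char ids[MAX_TERMS];
    int count;
} Deferred;

/* Synthesized by term: nothing is looked up yet. */
Deferred defer_term(char id) {
    Deferred d = { {0}, 0 };
    d.ids[d.count++] = id;
    return d;
}

/* Synthesized by expression '-' term: extend the pending work. */
Deferred defer_minus(Deferred left, char id) {
    assert(left.count < MAX_TERMS);
    left.ids[left.count++] = id;
    return left;
}

/* Run at the top, where the table finally exists; values[c - 'a']
 * holds the value of identifier c. */
int run_deferred(const Deferred *d, const int values[26]) {
    assert(d->count > 0);
    int result = values[d->ids[0] - 'a'];
    for (int i = 1; i < d->count; i++) {
        result -= values[d->ids[i] - 'a'];
    }
    return result;
}
```

The record grows with the amount of postponed work, which gives a feel for why the method is only practical for small problems.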
14. Manual Methods
- Much context processing is still done manually by
writing code in traditional languages like C or
C++
- Two methods to collect context information from
the AST are
- Symbolic Interpretation, and
- Data-Flow Equations
- Start with the AST developed by syntax analysis
- Add flow-of-control information to each node
- typically in the form of successor pointers
linking the nodes of the AST to form the
additional data structure, THE CONTROL FLOW GRAPH
15. Constructing the Control Flow Graph
- Statically by THREADING the tree
- The threading routine for node type N (a
non-terminal) gets a pointer to the node,
determines which production rule of N describes
the node, and calls threading routines for its
children
- Accomplished by building a dynamic data structure
recursively using a Last node pointer
- Last node pointer points to the dynamically last
node
- each new node N is stored in Last node pointer
.successor and then the Last node pointer is made
to point to N.
16. Ex. threading routine for a binary expression
    PROCEDURE Thread binary expression (Expr node pointer):
        Thread expression (Expr node pointer .left operand);
        Thread expression (Expr node pointer .right operand);
        // link this node to the dynamically last node
        SET Last node pointer .successor TO Expr node pointer;
        // make this node the new dynamically last node
        SET Last node pointer TO Expr node pointer;
[Figure: control-flow graph of b*b - 4*a*c]
Statically the '-' node is the first node,
but dynamically, at run time, the left-most b is
the first node
17. Threading code for demo compiler from Section 1.2
    #include "parser.h"  /* for types AST_node and Expression */
    #include "thread.h"  /* for self check */
                         /* PRIVATE */
    static AST_node *Last_node;

    static void Thread_expression(Expression *expr) {
        switch (expr->type) {
        case 'D':
            Last_node->successor = expr; Last_node = expr;
            break;
        case 'P':
            Thread_expression(expr->left);
            Thread_expression(expr->right);
            Last_node->successor = expr; Last_node = expr;
            break;
        }
    }

    AST_node *Thread_start;

    void Thread_AST(AST_node *icode) {
        AST_node Dummy_node;

        Last_node = &Dummy_node; Thread_expression(icode);
        Last_node->successor = (AST_node *)0;
        Thread_start = Dummy_node.successor;
    }
Global variable
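The threading code depends on parser.h, which the slides do not show. The sketch below is a self-contained variant that assumes a simple node layout of its own (the label field and the thread_labels helper are additions for illustration) and returns the start of the thread instead of storing it in a global. It can be used to check the order for b*b - 4*a*c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed node layout; the real one lives in parser.h.
 * 'D' marks a leaf, 'P' marks an operator node. */
typedef struct AST_node {
    char type;
    char label;                     /* operand name or operator */
    struct AST_node *left, *right;  /* operands, for 'P' nodes */
    struct AST_node *successor;     /* set by threading */
} AST_node;

static AST_node *Last_node;  /* the dynamically last node so far */

static void Thread_expression(AST_node *expr) {
    switch (expr->type) {
    case 'D':
        Last_node->successor = expr;
        Last_node = expr;
        break;
    case 'P':
        Thread_expression(expr->left);
        Thread_expression(expr->right);
        Last_node->successor = expr;
        Last_node = expr;
        break;
    }
}

/* Threads the tree and returns the first node to be executed. */
AST_node *Thread_AST(AST_node *icode) {
    AST_node Dummy_node;
    assert(icode != NULL);
    Last_node = &Dummy_node;
    Thread_expression(icode);
    Last_node->successor = NULL;
    return Dummy_node.successor;
}

/* Copies the labels along the thread into out; returns the count. */
size_t thread_labels(const AST_node *start, char *out, size_t max) {
    size_t n = 0;
    for (const AST_node *p = start; p != NULL && n < max; p = p->successor) {
        out[n++] = p->label;
    }
    return n;
}
```

For the tree of b*b - 4*a*c the thread starts at the left-most b and ends at the '-' node, so the labels come out in postfix order.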
18. Complications caused by flow of control
- example: the IF statement (two problems)
- 1. the node corresponding to the run-time
decision has two successors, not one
- 2. when we get to the end of the IF, the node
that follows must be recorded as successor at the
end of both the then-part and the else-part, so a
single Last node pointer is no longer sufficient
- Solution to 1: store two successor pointers in
the IF node
- Solutions to 2:
- replace Last node pointer by a set of last nodes,
or
- construct a special join node to merge the
diverging flow of control (such a node is part of
the Control Flow Graph but not of the AST)
19. Simple Threading Routine for if-statements
    PROCEDURE Thread if statement (If node pointer):
        Thread expression (If node pointer .condition);
        SET Last node pointer .successor TO If node pointer;
        SET End if join node TO Generate join node ();

        SET Last node pointer TO address of a local node Aux last node;
        Thread block (If node pointer .then part);
        SET If node pointer .true successor TO Aux last node .successor;
        SET Last node pointer .successor TO address of End if join node;

        SET Last node pointer TO address of Aux last node;
        Thread block (If node pointer .else part);
        SET If node pointer .false successor TO Aux last node .successor;
        SET Last node pointer .successor TO address of End if join node;

        SET Last node pointer TO address of End if join node;
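The routine can be tried out with the following C sketch. The Node layout, the NODE_* kinds and thread_program are assumptions made for illustration; conditions are single nodes and blocks are lists linked through next. The sketch deviates from the slide in one small way: each branch is linked to the join node before the branch's first node is read back, so that an empty branch leads straight to the join node. Join nodes are heap-allocated and never freed here.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_SIMPLE, NODE_IF, NODE_JOIN } NodeKind;

typedef struct Node {
    NodeKind kind;
    struct Node *next;             /* AST: next statement in the block */
    struct Node *condition;        /* AST, NODE_IF only */
    struct Node *then_part;        /* AST, NODE_IF only */
    struct Node *else_part;        /* AST, NODE_IF only */
    struct Node *successor;        /* CFG */
    struct Node *true_successor;   /* CFG, NODE_IF only */
    struct Node *false_successor;  /* CFG, NODE_IF only */
} Node;

static Node *Last_node;

static void thread_block(Node *first);

static void thread_if(Node *if_node) {
    /* thread the condition, then the IF node itself */
    Last_node->successor = if_node->condition;
    Last_node = if_node->condition;
    Last_node->successor = if_node;

    Node *join = calloc(1, sizeof *join);
    assert(join != NULL);
    join->kind = NODE_JOIN;

    Node aux = {0};  /* local stand-in for "the node before the branch" */

    Last_node = &aux;
    thread_block(if_node->then_part);
    Last_node->successor = join;
    if_node->true_successor = aux.successor;

    aux.successor = NULL;
    Last_node = &aux;
    thread_block(if_node->else_part);
    Last_node->successor = join;
    if_node->false_successor = aux.successor;

    Last_node = join;  /* whatever follows the IF hangs off the join node */
}

static void thread_block(Node *first) {
    for (Node *stmt = first; stmt != NULL; stmt = stmt->next) {
        if (stmt->kind == NODE_IF) {
            thread_if(stmt);
        } else {
            Last_node->successor = stmt;
            Last_node = stmt;
        }
    }
}

/* Threads a whole statement list; returns the first node executed. */
Node *thread_program(Node *first) {
    Node dummy = {0};
    Last_node = &dummy;
    thread_block(first);
    Last_node->successor = NULL;
    return dummy.successor;
}
```

After threading "IF c THEN s1 ELSE s2 END IF; s3", both s1 and s2 lead to the same join node, and the join node leads to s3.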
20. AST and control flow graph after threading
Note that the Last node pointer has been moved to
point to the end-if join node.
21. Using an attribute grammar
- Threading the AST can be expressed using an
attribute grammar
- successor pointers are implemented as inherited
attributes, and
- each node has an additional synthesized
attribute that is set by the evaluation rules to
the pointer to the first node to be executed in
the tree.
    If_statement (INH successor, SYN first) →
        'IF' Condition 'THEN' Then_part 'ELSE' Else_part 'END' 'IF'
        ATTRIBUTE RULES
            SET If_statement .first TO Condition .first;
            SET Condition .true successor TO Then_part .first;
            SET Condition .false successor TO Else_part .first;
            SET Then_part .successor TO If_statement .successor;
            SET Else_part .successor TO If_statement .successor;
22. Doubly-linked list representation of CFG
A doubly linked list implementation of the
control flow graph provides two sets of pointers
for each node: dynamic successors and dynamic
predecessors. This gives algorithms traversing
the control flow graph great freedom of movement
and proves particularly useful when processing
data-flow equations
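A minimal sketch of such a node in C follows, assuming a small fixed bound on the number of edges per node. CfgNode, cfg_init and cfg_link are hypothetical names. Every edge is entered in both directions at once, so the two pointer sets stay consistent.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EDGES 4

typedef struct CfgNode {
    struct CfgNode *succ[MAX_EDGES];  /* dynamic successors */
    struct CfgNode *pred[MAX_EDGES];  /* dynamic predecessors */
    size_t n_succ, n_pred;
} CfgNode;

/* Starts a node off with no edges. */
void cfg_init(CfgNode *node) {
    node->n_succ = 0;
    node->n_pred = 0;
}

/* Adds the edge from -> to in both directions. */
void cfg_link(CfgNode *from, CfgNode *to) {
    assert(from->n_succ < MAX_EDGES);
    assert(to->n_pred < MAX_EDGES);
    from->succ[from->n_succ++] = to;
    to->pred[to->n_pred++] = from;
}
```

For an if-statement the condition node ends up with two successors and the join node with two predecessors, so a traversal can move forward or backward from either one.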
23. Next time
- We'll start looking at two manual methods for
context handling
- SYMBOLIC INTERPRETATION and
- DATA FLOW EQUATIONS
- Then we'll start looking at the process of
Intermediate Code Generation
24. Homework for Week 7
- Get the Lex/Flex generated code from page 95,
figure 2.41, to run under Visual C++
- Hints: resolve function conflicts by adding
#include <appropriate library> in the generated
C file. For example, exit, malloc, realloc, and
free are in <stdlib.h>; the strcpy() function is
in <string.h>
- You can include the default main() by adding a 1
to the line: #define YY_MAIN 1
- And then extend the default main to call
get_next_token() and print out appropriate results
25. References
- Text: Modern Compiler Design