Title: Syntax Directed Translation
1Syntax Directed Translation
2Phases of a Compiler
- 1. Lexical Analyzer (Scanner)
- Takes the source program and converts it into tokens.
- 2. Syntax Analyzer (Parser)
- Takes tokens and constructs a parse tree.
- 3. Semantic Analyzer
- Takes a parse tree and constructs an abstract
syntax tree with attributes.
3Phases of a Compiler (Contd.)
- 4. Syntax Directed Translation
- Takes an abstract syntax tree and produces
interpreter code (translation output).
- 5. Intermediate-code Generator
- Takes an abstract syntax tree and produces
unoptimized intermediate code.
4Motivation: Parser as Translator
- Syntax-directed translation
- Input to the parser: stream of tokens
- The parser applies syntax + translation rules (often hardcoded in the parser)
- Output of the parser: ASTs, byte code, assembly code, etc.
5Important
- Syntax-directed translation means attaching actions to the grammar rules (productions).
- The actions are executed during compilation (not during the generation of the compiler, and not at run time of the program!): either when replacing a nonterminal with its right-hand side (LL, top-down) or when replacing a handle with a nonterminal (LR, bottom-up).
- The compiler-compiler generates a parser that knows how to parse the program (LR, LL). The actions are embedded in the parser and are executed according to the parsing mechanism.
6Example Expressions
- E → E + T
- E → T
- T → T * F
- T → F
- F → ( E )
- F → num
7Synthesized Attributes
- The attribute value of the nonterminal on the left-hand side of a grammar rule depends on the values of the attributes on the right-hand side.
- Typical for LR (bottom-up) parsing.
- Example: T → T * F {$$.val = $1.val * $3.val}
- (Figure: the parent's T.val is computed from the children's T.val and F.val.)
8Example Expressions In LEX
- E → E + T {$$.val = $1.val + $3.val}
- E → T {$$.val = $1.val}
- T → T * F {$$.val = $1.val * $3.val}
- T → F {$$.val = $1.val}
- F → ( E ) {$$.val = $2.val}
- F → num {$$.val = $1.val}
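Each rule above computes val only from the values of the children, so a post-order walk over the parse tree is enough to evaluate it. A minimal Python sketch, assuming a parse tree encoded as nested tuples (the encoding and the function name val are illustrative, not part of the slides):

```python
# Hypothetical parse-tree encoding:
#   ("num", 3)           for F -> num
#   ("+", left, right)   for E -> E + T
#   ("*", left, right)   for T -> T * F
# Unit productions (E -> T, T -> F, F -> ( E )) only copy val, so they are omitted.

def val(node):
    """Return the synthesized attribute val of a node (children first)."""
    kind = node[0]
    if kind == "num":
        return node[1]                      # value supplied by the scanner
    left, right = val(node[1]), val(node[2])
    if kind == "+":
        return left + right                 # val of E + T
    if kind == "*":
        return left * right                 # val of T * F
    raise ValueError(f"unknown node kind: {kind!r}")

tree = ("+", ("num", 2), ("*", ("num", 3), ("num", 4)))   # 2 + 3 * 4
print(val(tree))   # 14
```

An LR parser performs the same computation without building the tree: each reduce step pops the children's values and pushes the parent's value.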
9Example 2: Type definitions
- D → T L
- T → int
- T → real
- L → id , L
- L → id
10Inherited attributes
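- The value of an attribute of one of the symbols on the right-hand side of a grammar rule depends on the attributes of the other symbols (on the left or on the right).
- Typical for LL (top-down) parsing.
- D → T {$2.type = $1.type} L
- L → id , {$3.type = $$.type} L
- (Figure: parse tree in which T.type is copied to the L.type below D, and each L.type is passed down to the L.type of the nested list after "id ,".)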
11Type definitions
- D → T {$2.type = $1.type} L
- T → int {$$.type = int}
- T → real {$$.type = real}
- L → id , L {gen(id.name, $$.type); $3.type = $$.type}
- L → id {gen(id.name, $$.type)}
- (Figure: T.type = int.)
12Type definitions LL(1)
- D → T {$2.type = $1.type} L
- T → int {$$.type = int}
- T → real {$$.type = real}
- L → id {gen(id.name, $$.type); $2.type = $$.type} R
- R → , id {gen(id.name, $$.type)}
- R → ε
- (Figure: T.type = int.)
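In a recursive-descent (LL) parser, an inherited attribute can be modeled as a function parameter. The following Python sketch follows the LL(1) declaration grammar of this slide; the token-list input, the function names, and the list that stands in for gen's symbol table are assumptions made for illustration:

```python
def parse_decl(tokens):
    """Parse D -> T L, L -> id R, R -> , id | epsilon; return the gen() calls."""
    generated = []                 # stands in for the symbol table
    pos = 0

    def gen(name, typ):
        generated.append((name, typ))

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("int", "real", ","):
            raise SyntaxError(f"identifier expected at position {pos}")
        name = tokens[pos]
        pos += 1
        return name

    def T():
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in ("int", "real"):
            typ = tokens[pos]      # synthesized: T.type
            pos += 1
            return typ
        raise SyntaxError("type expected")

    def L(inherited_type):
        gen(expect_id(), inherited_type)   # gen(id.name, L.type)
        R(inherited_type)                  # R.type = L.type

    def R(inherited_type):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            gen(expect_id(), inherited_type)
        # otherwise R -> epsilon: nothing to do

    L(T())                         # D -> T L with L.type = T.type
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return generated

print(parse_decl(["int", "a", ",", "b"]))   # [('a', 'int'), ('b', 'int')]
```

Because R → , id is not recursive in the grammar as given, this sketch rejects a third identifier.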
13How to arrange things for LL(1) on stack?
- Besides the grammar symbols, also put the actions on the stack, together with a shadow copy of each nonterminal.
- Each time an action appears on top of the stack, execute it.
- Shadow copies are used to collect synthesized values and pass them further to the right of the rule.
14LR parser
- Given the current state on top of the stack and the current token, consult the action table.
- Either shift, i.e., read a new token, put it on the stack, and push the new state,
- or reduce, i.e., remove some elements from the stack and, given the newly exposed top of the stack and the nonterminal to be put on top of the stack, consult the goto table for the new state on top of the stack.
- (Figure: LR(k) parser with input a ... b, a stack s0 X0 ... sn-1 Xn-1 sn, and the action and goto tables.)
15LR parser adapted.
- Same as before, plus:
- Whenever a reduce step is performed, execute the action associated with the grammar rule. If left-to-right inherited attributes exist, actions can also be executed in the middle of a rule.
- A record of attributes, associated with a grammar symbol, can be put on the stack.
- (Figure: the LR(k) parser as before, with an attributes record stored next to each stack entry.)
16LL parser
- If the top symbol X is a terminal, it must match the current token m.
- If X is a nonterminal, pop it from the stack. Then look up table entry T[X, m] and push the grammar rule found there in reverse order.
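The two steps above can be sketched as a table-driven loop. This Python sketch assumes the left-recursion-free expression grammar (E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | num) and a hand-built table; the names TABLE and ll1_parse are illustrative:

```python
# T[X, m]: right-hand side to push for nonterminal X and lookahead m.
# An empty list stands for an epsilon rule; "$" marks the end of the input.
TABLE = {
    ("E", "num"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "num"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "num"): ["num"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the list of (nonterminal, rhs) rules applied; raise SyntaxError on bad input."""
    stack = ["$", "E"]
    tokens = list(tokens) + ["$"]
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        m = tokens[pos]
        if top in NONTERMINALS:
            if (top, m) not in TABLE:
                raise SyntaxError(f"no rule for ({top}, {m})")
            rhs = TABLE[(top, m)]
            applied.append((top, rhs))
            stack.extend(reversed(rhs))   # push the rule in reverse order
        elif top == m:
            pos += 1                      # terminal matches the current token
        else:
            raise SyntaxError(f"expected {top}, found {m}")
    return applied

for lhs, rhs in ll1_parse(["num", "+", "num", "*", "num"]):
    print(lhs, "->", " ".join(rhs) or "epsilon")
```

Pushing the right-hand side in reverse order leaves its leftmost symbol on top of the stack, so the symbols are expanded from left to right.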
17Example: LR parsing of 2+3*4
Stack      Input    Action     Attribute
(empty)    2+3*4    shift      num.type=2
num        +3*4     F→num      F.type=2
F          +3*4     T→F        T.type=2
T          +3*4     E→T        E.type=2
E          +3*4     shift
E+         3*4      shift      num.type=3
E+num      *4       F→num      F.type=3
E+F        *4       T→F        T.type=3
E+T        *4       shift
E+T*       4        shift      num.type=4
E+T*num             F→num      F.type=4
E+T*F               T→T*F      T.type=12
E+T                 E→E+T      E.type=14
18LL parser Adapted
- If the top symbol X is a terminal, it must match the current token m.
- Put actions onto the stack as part of the rules. Hold for each nonterminal a record with attributes.
- If X is a nonterminal, replace the top of the stack with its shadow copy. Then look up table entry T[X, m] and push the grammar rule found there in reverse order.
- If X is a shadow copy, remove it. This way a nonterminal can deliver values down and up.
- (Figure: LL(k) parser with input a ... b, a stack X Y Z with attribute records, the parsing table, and the actions.)
19Example: LL parsing of int a,b
On stack         To be read   Rule      Action
D                int a,b
(D)LT            int a,b      D→TL
(D)L(T)int       int a,b      T→int
(D)L(T)          a,b                    T.type=int
(D)L             a,b                    L.type=int
(D)(L)Rid        a,b          L→idR
(D)(L)R          ,b                     gen(a,int), R.type=int
(D)(L)(R)id,     ,b           R→, id
(D)(L)(R)id      b
(D)(L)(R)                               gen(b,int), R.type=int
- A symbol in parentheses, such as (D), is the shadow copy of that nonterminal.
20Expressions in LLEliminating left recursion
- Original grammar:
- E → E + T
- E → T
- T → T * F
- T → F
- F → ( E )
- F → num
- After eliminating left recursion:
- E → T E'
- E' → + T E'
- E' → ε
- T → F T'
- T' → * F T'
- T' → ε
- F → ( E )
- F → num
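The transformation on this slide follows the standard pattern: A → A α | β becomes A → β A', A' → α A' | ε. A minimal Python sketch for immediate left recursion only; the list-of-symbols representation and the function name are assumptions:

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    alternatives is a list of right-hand sides, each a list of symbols.
    Returns a dict from nonterminal to its new alternatives;
    the empty list [] stands for epsilon.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}       # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [alt + [new] for alt in others],
        new: [alt + [new] for alt in recursive] + [[]],
    }

print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

Indirect left recursion (A → B ..., B → A ...) is not handled by this sketch.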
21Example: (2+3)*4
- Input: (2+3)*4
- E → T E'
- E' → + T E'
- E' → ε
- T → F T'
- T' → * F T'
- T' → ε
- F → ( E )
- F → num
22Actions in LL
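- E → T {$2.down = $1.up} E' {$$.up = $2.up}
- E' → + T {$3.down = $$.down + $2.up} E' {$$.up = $3.up}
- E' → ε {$$.up = $$.down}
- T → F {$2.down = $1.up} T' {$$.up = $2.up}
- T' → * F {$3.down = $$.down * $2.up} T' {$$.up = $3.up}
- T' → ε {$$.up = $$.down}
- F → ( E ) {$$.up = $2.up}
- F → num {$$.up = $1.up}
- (Figure: parse tree for (2+3)*4 under this grammar, with leaves 2, 3, 4 and ε.)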
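The down/up attributes map naturally onto a recursive-descent parser: down becomes a parameter of the function for E' or T', and up becomes its return value. A hedged Python sketch, assuming the left-recursion-free expression grammar with + and *; the tokenizer and all function names are illustrative:

```python
import re

def evaluate(text):
    """Evaluate an expression over integers, +, *, and parentheses."""
    # Characters other than digits, parentheses, + and * are silently skipped.
    tokens = re.findall(r"\d+|[()+*]", text) + ["$"]
    pos = 0

    def peek():
        return tokens[pos]

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def E():
        return E_prime(T())             # E'.down = T.up ; E.up = E'.up

    def E_prime(down):
        if peek() == "+":
            advance()
            return E_prime(down + T())  # next E'.down = down + T.up
        return down                     # E' -> epsilon: up = down

    def T():
        return T_prime(F())             # T'.down = F.up ; T.up = T'.up

    def T_prime(down):
        if peek() == "*":
            advance()
            return T_prime(down * F())  # next T'.down = down * F.up
        return down                     # T' -> epsilon: up = down

    def F():
        tok = advance()
        if tok == "(":
            up = E()
            if advance() != ")":
                raise SyntaxError("')' expected")
            return up
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = E()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result

print(evaluate("(2+3)*4"))   # 20
```

Passing the partial result down keeps + and * left-associative even though the grammar is right-recursive.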
23Syntax Directed Translation Scheme
- A syntax-directed translation scheme is a syntax-directed definition in which the net effect of the semantic actions is to print out a translation of the input to a desired output form.
- This is accomplished by including emit statements in semantic actions that write out text fragments of the output, as well as string-valued attributes that compute text fragments to be fed into emit statements.
24Syntax-Directed Translation
- Grammar symbols are associated with attributes to attach information to the programming language constructs that they represent.
- Values of these attributes are evaluated by the semantic rules associated with the production rules.
- Evaluation of these semantic rules:
- may generate intermediate code
- may put information into the symbol table
- may perform type checking
- may issue error messages
- may perform some other activities
- in fact, they may perform almost any activity.
- An attribute may hold almost anything:
- a string, a number, a memory location, a complex record.
25Syntax-Directed Definitions and Translation
Schemes
- When we associate semantic rules with productions, we use two notations:
- Syntax-Directed Definitions
- Translation Schemes
26Schemes
- Syntax-Directed Definitions:
- give high-level specifications for translations
- hide many implementation details, such as the order of evaluation of semantic actions.
- We associate a production rule with a set of semantic actions, and we do not say when they will be evaluated.
- Translation Schemes:
- indicate the order of evaluation of semantic actions associated with a production rule.
- In other words, translation schemes give a little bit of information about implementation details.
27Syntax-Directed Definitions
- A syntax-directed definition is a generalization of a context-free grammar in which:
- Each grammar symbol is associated with a set of attributes.
- This set of attributes for a grammar symbol is partitioned into two subsets, called
- synthesized and
- inherited attributes of that grammar symbol.
- Each production rule is associated with a set of semantic rules.
- Semantic rules set up dependencies between attributes, which can be represented by a dependency graph.
- This dependency graph determines the evaluation order of these semantic rules.
- Evaluation of a semantic rule defines the value of an attribute. But a semantic rule may also have some side effects, such as printing a value.
28Annotated Parse Tree
- A parse tree showing the values of attributes at each node is called an annotated parse tree.
- The process of computing the attribute values at the nodes is called annotating (or decorating) the parse tree.
- Of course, the order of these computations depends on the dependency graph induced by the semantic rules.
29Syntax-Directed Definition
- In a syntax-directed definition, each production A → α is associated with a set of semantic rules of the form:
- b = f(c1, c2, ..., cn)
- where f is a function and b can be one of the following:
- b is a synthesized attribute of A and c1, c2, ..., cn are attributes of the grammar symbols in the production (A → α),
- OR
- b is an inherited attribute of one of the grammar symbols in α (on the right side of the production), and c1, c2, ..., cn are attributes of the grammar symbols in the production (A → α).
30Attribute Grammar
- So, a semantic rule b = f(c1, c2, ..., cn) indicates that the attribute b depends on the attributes c1, c2, ..., cn.
- In a syntax-directed definition, a semantic rule may just evaluate the value of an attribute, or it may have some side effects such as printing values.
- An attribute grammar is a syntax-directed definition in which the functions in the semantic rules cannot have side effects (they can only evaluate values of attributes).
31Syntax-Directed Definition -- Example
- Production        Semantic Rules
- L → E return      print(E.val)
- E → E1 + T        E.val = E1.val + T.val
- E → T             E.val = T.val
- T → T1 * F        T.val = T1.val * F.val
- T → F             T.val = F.val
- F → ( E )         F.val = E.val
- F → digit         F.val = digit.lexval
- Symbols E, T, and F are associated with a synthesized attribute val.
- The token digit has a synthesized attribute lexval (it is assumed that it is evaluated by the lexical analyzer).
32Annotated Parse Tree -- Example
33Dependency Graph
34Syntax-Directed Definition Example2
- Production        Semantic Rules
- E → E1 + T        E.loc = newtemp(), E.code = E1.code || T.code || add E1.loc,T.loc,E.loc
- E → T             E.loc = T.loc, E.code = T.code
- T → T1 * F        T.loc = newtemp(), T.code = T1.code || F.code || mult T1.loc,F.loc,T.loc
- T → F             T.loc = F.loc, T.code = F.code
- F → ( E )         F.loc = E.loc, F.code = E.code
- F → id            F.loc = id.name, F.code = ""
- Symbols E, T, and F are associated with synthesized attributes loc and code.
- The token id has a synthesized attribute name (it is assumed that it is evaluated by the lexical analyzer).
- It is assumed that || is the string concatenation operator.
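The loc and code attributes can be computed in one bottom-up pass over an expression tree. The following Python sketch is an illustration under assumptions: trees are nested tuples, code is a list of instruction strings (so list concatenation plays the role of the string concatenation operator), and temporaries are named t1, t2, ... in the order the nodes are finished:

```python
import itertools

def make_generator():
    """Return a gen(node) function with its own temporary counter."""
    counter = itertools.count(1)

    def newtemp():
        return f"t{next(counter)}"

    def gen(node):
        """Return (loc, code) for a node; code is a list of instructions."""
        if isinstance(node, str):           # F -> id: loc is the name, no code
            return node, []
        op, left, right = node
        l_loc, l_code = gen(left)
        r_loc, r_code = gen(right)
        loc = newtemp()
        mnemonic = {"+": "add", "*": "mult"}[op]
        return loc, l_code + r_code + [f"{mnemonic} {l_loc},{r_loc},{loc}"]

    return gen

gen = make_generator()
loc, code = gen(("+", "a", ("*", "b", "c")))   # a + b * c
print(loc)                                     # t2
print("\n".join(code))
# mult b,c,t1
# add a,t1,t2
```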
35Syntax-Directed Definition Inherited Attributes
- Production        Semantic Rules
- D → T L           L.in = T.type
- T → int           T.type = integer
- T → real          T.type = real
- L → L1 id         L1.in = L.in, addtype(id.entry, L.in)
- L → id            addtype(id.entry, L.in)
- Symbol T is associated with a synthesized attribute type.
- Symbol L is associated with an inherited attribute in.
36A Dependency Graph Inherited Attributes
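- Input: real p q
- Parse tree: D has children T and L; T derives real; L has children L1 and id (q); L1 derives id (p).
- Dependency graph: T.type = real → L.in = real → L1.in = real; L.in feeds addtype(q, real) with id.entry = q; L1.in feeds addtype(p, real) with id.entry = p.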
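Any topological order of the dependency graph is a valid evaluation order for the semantic rules. A minimal Python sketch for the input real p q, using graphlib from the standard library (Python 3.9+); the attribute-instance names used as dictionary keys are illustrative labels:

```python
from graphlib import TopologicalSorter   # Python 3.9+

# Each attribute instance maps to the set of instances it depends on.
deps = {
    "T.type": set(),
    "L.in": {"T.type"},        # D -> T L   : L.in = T.type
    "L1.in": {"L.in"},         # L -> L1 id : L1.in = L.in
    "addtype(q)": {"L.in"},    # L -> L1 id : addtype(id.entry, L.in)
    "addtype(p)": {"L1.in"},   # L -> id    : addtype(id.entry, L.in), applied at L1
}

# One rule per attribute instance; v holds the values computed so far.
rules = {
    "T.type": lambda v: "real",
    "L.in": lambda v: v["T.type"],
    "L1.in": lambda v: v["L.in"],
    "addtype(q)": lambda v: ("q", v["L.in"]),
    "addtype(p)": lambda v: ("p", v["L1.in"]),
}

values = {}
order = list(TopologicalSorter(deps).static_order())
for attr in order:
    values[attr] = rules[attr](values)   # all dependencies are already computed

print(values["addtype(p)"], values["addtype(q)"])   # ('p', 'real') ('q', 'real')
```

If the dependency graph contained a cycle, static_order would raise graphlib.CycleError, which corresponds to a definition with no valid evaluation order.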
37Syntax Trees
- Decoupling translation from parsing: trees.
- Syntax tree: an intermediate representation of the compiler's input.
- Example procedures: mknode, mkleaf
- Employment of the synthesized attribute nptr (pointer)
- PRODUCTION         SEMANTIC RULE
- E → E1 + T         E.nptr = mknode(+, E1.nptr, T.nptr)
- E → E1 - T         E.nptr = mknode(-, E1.nptr, T.nptr)
- E → T              E.nptr = T.nptr
- T → ( E )          T.nptr = E.nptr
- T → id             T.nptr = mkleaf(id, id.lexval)
- T → num            T.nptr = mkleaf(num, num.val)
38Draw the Syntax Tree
- Expression: a - 4 + c
- (Figure: syntax tree whose id leaves point to the symbol-table entries for a and c, and whose num leaf holds 4.)
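The calls made by the semantic rules while parsing can be replayed by hand. This Python sketch builds the tree for a - 4 + c, assuming that reading of the slide's expression; the Node class and the show helper are illustrative, and id leaves store the name instead of a pointer to a symbol-table entry:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                       # operator, "id", or "num"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: object = None             # name or number, for leaves only

def mkleaf(label, value):
    return Node(label, value=value)

def mknode(op, left, right):
    return Node(op, left, right)

# Order of calls made by a bottom-up parse of a - 4 + c:
p1 = mkleaf("id", "a")
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", "c")
p5 = mknode("+", p3, p4)

def show(n):
    """Render a tree as a fully parenthesized string."""
    if n.left is None:
        return str(n.value)
    return f"({show(n.left)} {n.label} {show(n.right)})"

print(show(p5))   # ((a - 4) + c)
```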
39Directed Acyclic Graphs for Expressions
- a + a * ( b - c ) + ( b - c ) * d
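A DAG can be built with the same mknode/mkleaf interface if each call first checks whether an identical node already exists. A hedged Python sketch, assuming the expression is a + a * (b - c) + (b - c) * d (the operators are an assumption, since the slide text shows only the operands); the class and method names are illustrative:

```python
class DagBuilder:
    """Create each distinct (label, left, right) node only once."""

    def __init__(self):
        self.nodes = []    # node id -> (label, left id, right id)
        self.index = {}    # (label, left id, right id) -> node id

    def _intern(self, key):
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def mkleaf(self, name):
        return self._intern((name, None, None))

    def mknode(self, op, left, right):
        return self._intern((op, left, right))

b = DagBuilder()

def b_minus_c():
    return b.mknode("-", b.mkleaf("b"), b.mkleaf("c"))

root = b.mknode(
    "+",
    b.mknode("+", b.mkleaf("a"), b.mknode("*", b.mkleaf("a"), b_minus_c())),
    b.mknode("*", b_minus_c(), b.mkleaf("d")),
)
print(len(b.nodes))   # 9
```

The corresponding syntax tree has 13 nodes; the DAG shares the leaf a and the subexpression b - c.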
40Examples
- 1. Postfix and prefix notations
- We have already seen how to generate them.
- Let us generate Java byte code.
- E -> E + E {emit(iadd)}
- E -> E * E {emit(imul)}
- E -> T
- T -> ICONST {emit(sipush ICONST.string)}
- T -> ( E )
41Abstract Stack Machine
- We now present (from the Java Virtual Machine Spec, see http://java.sun.com/docs/books/vmspec/2nd-edition/html/VMSpecTOC.doc.html) a simple stack machine and illustrate how to generate code for it via syntax-directed translation.
- The abstract machine code for an expression simulates a stack evaluation of the postfix representation of the expression. Expression evaluation proceeds by processing the postfix representation from left to right.
42Evaluation
- 1. Push each operand onto the stack when it is encountered.
- 2. Evaluate a k-ary operator by using the value located k-1 positions below the top of the stack as the leftmost operand, and so on, until the value on the top of the stack is used as the rightmost operand.
- 3. After the evaluation, all k operands are popped from the stack, and the result is pushed onto the stack (or there could be a side effect).
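The three steps above can be sketched as a small postfix evaluator in Python. The operator table, including the unary neg, and the space-separated token format are assumptions made for illustration:

```python
import operator

# Operator name -> (arity k, function).
OPS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "neg": (1, operator.neg),
}

def eval_postfix(tokens):
    """Evaluate a postfix token list with an explicit operand stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            k, fn = OPS[tok]
            if len(stack) < k:
                raise ValueError(f"not enough operands for {tok}")
            operands = stack[-k:]        # leftmost operand is k-1 below the top
            del stack[-k:]               # step 3: pop all k operands
            stack.append(fn(*operands))  # step 3: push the result
        else:
            stack.append(int(tok))       # step 1: push each operand
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]

print(eval_postfix("3 5 * 2 -".split()))   # 13
```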
43Example
- stmt -> ID = expr { stmt.t = expr.t || istore a }
- Applied to a = 3*b - c:
- bipush 3
- iload b
- imul
- iload c
- isub
- istore a
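The instruction sequence above is the postfix form of the right-hand side followed by a store. A minimal Python sketch that produces it from an expression tree, assuming nested tuples for expressions and symbolic variable names as on the slide (real JVM code addresses local variables by index); the function names are illustrative:

```python
OPCODES = {"+": "iadd", "-": "isub", "*": "imul"}

def expr_code(node):
    """Return the stack code (a list of instructions) for an expression tree."""
    if isinstance(node, int):
        # bipush takes a signed byte; wider constants need sipush.
        op = "bipush" if -128 <= node <= 127 else "sipush"
        return [f"{op} {node}"]
    if isinstance(node, str):
        return [f"iload {node}"]
    op, left, right = node
    # Postfix order: left operand, right operand, then the operator.
    return expr_code(left) + expr_code(right) + [OPCODES[op]]

def assign_code(target, expr):
    """stmt -> ID = expr: code for expr followed by a store into ID."""
    return expr_code(expr) + [f"istore {target}"]

code = assign_code("a", ("-", ("*", 3, "b"), "c"))   # a = 3*b - c
print("\n".join(code))
```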
44Java Virtual Machine
- Analogous to the abstract stack machine, the Java Virtual Machine is an abstract processor architecture that defines the behavior of Java bytecode programs.
- The stack (in the JVM) is referred to as the operand stack or value stack. Operands are fetched from the stack and the result is pushed back onto the stack.
- Advantage: VM code is compact, as the operands need not be explicitly named.
45Data Types
- The int data type can hold 32-bit signed integers in the range -2^31 to 2^31 - 1.
- The long data type can hold 64-bit signed integers.
- Integer instructions in the Java VM are also used to operate on Boolean values.
- Other data types that the Java VM supports are byte, short, float, double. (Your project should handle at least three data types.)
46Selected Java VM Instructions
- Java VM instructions are typed, i.e., the operator explicitly specifies what operand types it expects.
- Expression Evaluation:
- sipush n: push a 2-byte signed int onto the stack
- iload v: load/push local variable v
- istore v: store the top of the stack into local variable v
- iadd: pop two elements and push their sum
- isub: pop two elements and push their difference
47Selected Java VM Instructions
- imul: pop two elements and push their product
- iand: pop two elements and push their bitwise and
- ior: pop two elements and push their bitwise or
- ineg: pop the top element and push its negation
- lcmp: pop two elements (64-bit integers) and push the comparison result: 1 if vs0 < vs1, 0 if vs0 = vs1, otherwise -1 (vs0 is the top of the stack, vs1 the element below it).
- i2l: convert integer to long
- l2i: convert long to integer
48Selected Java VM Instructions
- Branches:
- goto L: unconditional transfer to label L
- ifeq L: transfer to label L if the top of the stack is 0
- ifne L: transfer to label L if the top of the stack is != 0
- Call/Return: each method/procedure has memory space allocated to hold local variables (vars register), an operand stack (optop register), and an execution environment (frame register).
49Selected Java VM Instructions
- invokestatic p: invoke method p. Pop the args from the stack as initial values of the formal parameters (actual parameters are pushed before calling).
- return: return from the current procedure
- ireturn: return from the current procedure with the integer value on top of the stack.
- areturn: return from the current procedure with the object reference return value on top of the stack.
50Selected Java VM Instructions
- Array Manipulation: the Java VM has an object data type for references to arrays and objects.
- newarray int: create a new array of integers, using the top of the stack as the size. Pop the stack and push a reference to the newly created array.
- iaload: pop the array subscript expression on top of the stack and the array pointer (next stack element). Push the value contained in this array element.
- iastore
51Selected Java VM Instructions
- Object Manipulation:
- new C: create a new instance of class C (using the heap) and push the reference onto the stack.
- getfield f: push the value of field f of the object pointed to by the object reference at the top of the stack.
- putfield f: store the value on top of the stack into field f of the object whose reference is the next stack element.
52Selected Java VM Instructions
- Simplifying Instructions:
- ldc constant: treated here as a macro that generates either bipush or sipush, depending on the constant.
53Byte Code (JVM Instructions)
- No-arg operand (instructions needing no arguments, hence taking only one byte):
- examples: aaload, aastore, aconst_null, aload_0, aload_1, areturn, arraylength, astore_0, athrow, baload, iaload, imul, etc.
- One-arg operand: bipush, sipush, ldc, etc.
- Methodref_arg_op:
- invokestatic, invokenonvirtual, invokevirtual
54Byte Code (JVM Instructions)
- Fieldref_arg_op:
- getfield, getstatic, putfield, putstatic
- Class_arg_op:
- checkcast, instanceof, new
- Label_arg_op (instructions that use labels):
- goto, ifeq, ifne, jsr, jsr_w, etc.
- Localvar_arg_op:
- iload, fload, aload, istore
55Translating an If statement
- stmt -> if expr then stmt1
- { out = newlabel();
- stmt.t = expr.t || ifnonnull out || stmt1.t ||
- label out: nop }
- example
- if ( a 907) x x1 x x3
56Translating a while statement
- stmt -> WHILE ( expr ) stmt1
- { in = newlabel();
- out = newlabel();
- emit(stmt.t = label in: nop || expr.t ||
ifnonnull out || stmt1.t || goto in ||
label out) }
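The if and while schemes can be sketched as Python functions that splice already-generated code lists together. This is an illustration under assumptions: code is a list of instruction strings, labels are written as "L1: nop", and the exit test uses ifeq (transfer if the top of the stack is 0) where the slides write ifnonnull, which assumes that expr leaves an int 0 for false:

```python
import itertools

_labels = itertools.count(1)

def newlabel():
    """Return a fresh label name: L1, L2, ..."""
    return f"L{next(_labels)}"

def gen_if(expr_code, then_code):
    """stmt -> if expr then stmt1"""
    out = newlabel()
    return expr_code + [f"ifeq {out}"] + then_code + [f"{out}: nop"]

def gen_while(expr_code, body_code):
    """stmt -> WHILE ( expr ) stmt1"""
    in_label, out = newlabel(), newlabel()
    return (
        [f"{in_label}: nop"]
        + expr_code
        + [f"ifeq {out}"]          # leave the loop when expr is 0
        + body_code
        + [f"goto {in_label}", f"{out}:"]
    )

# while (x) x = x - 1
loop = gen_while(["iload x"], ["iload x", "sipush 1", "isub", "istore x"])
print("\n".join(loop))
```

Each call draws fresh labels, so nested statements can be generated by passing the result of one call as the body_code or then_code of another.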
57References
- Compilers: Principles, Techniques and Tools, Aho, Sethi, and Ullman, Chapter 5
- http://www.cs.rpi.edu/moorthy/Courses/compiler98/Lectures/lecturesinppt/lecture15.ppt
- http://www.cs.wisc.edu/bodik/cs536/NOTES/lecture14.ppt
- http://www.ece.utexas.edu/doron/yacc.ppt
- Appel, Chapter 4 and 5
- http://203.208.166.84/dtanvirahmed/cse309N/CSE309-5.1-3.ppt