Title: Abstract Syntax
1Abstract Syntax
- Mooly Sagiv
- Schreiber 317
- 03-640-7606
- Wed 10:00-12:00
- http://www.cs.tau.ac.il/~msagiv/courses/wcc02.html
2Outline
- The general idea
- Bison
- Motivating example: interpreter for arithmetic expressions
- The need for abstract syntax
- Abstract syntax for Straight-line code
- Abstract syntax for Tiger (exercise)
3Semantic Analysis during Recursive Descent Parsing
- Scanner returns semantic values for some tokens
- The function of every non-terminal returns the corresponding subtree value
- When A → B C D is applied, the function for A can use the values returned by B, C, and D
- The function can also pass parameters, e.g., to D(), reflecting left contexts
4
Grammar: E → num E'    E' → ε | + num E'

int E() {
  switch (tok) {
    case num: { int temp = tok.val; eat(num); return EP(temp); }
    default: error();
  }
}

int EP(int left) {
  switch (tok) {
    case $: return left;   /* end of input: E' → ε */
    case +: { eat(+); int temp = tok.val; eat(num); return EP(left + temp); }
    default: error();
  }
}
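The recursive-descent evaluator above can be turned into a compilable sketch. The version below assumes the grammar E → num E', E' → + num E' | ε, and adds a hypothetical one-string scanner (`advance`), a `failed` flag in place of `error()`, and a wrapper named `evaluate`; none of these names come from the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Token kinds for the toy grammar  E -> num E'   E' -> + num E' | (empty). */
enum kind { NUM, PLUS, END, BAD };

static const char *src;   /* remaining input (hypothetical scanner state) */
static enum kind tok;     /* current token kind */
static int tokval;        /* semantic value of the current NUM token */
static int failed;        /* set to 1 on the first syntax error */

/* Scanner: skips blanks, then reads the next token and its semantic value. */
static void advance(void) {
    while (*src == ' ') src++;
    if (*src == '\0') {
        tok = END;
    } else if (*src == '+') {
        tok = PLUS;
        src++;
    } else if (isdigit((unsigned char)*src)) {
        char *end;
        tok = NUM;
        tokval = (int)strtol(src, &end, 10);
        src = end;
    } else {
        tok = BAD;
    }
}

static void eat(enum kind expected) {
    if (tok == expected) advance(); else failed = 1;
}

static int EP(int left);

/* E -> num E' : the value of num is passed down to E' as its left context. */
static int E(void) {
    if (tok == NUM) {
        int temp = tokval;
        eat(NUM);
        return EP(temp);
    }
    failed = 1;
    return 0;
}

/* E' -> + num E' | (empty) : `left` is the value accumulated so far. */
static int EP(int left) {
    switch (tok) {
    case END:
        return left;
    case PLUS: {
        eat(PLUS);
        if (tok != NUM) { failed = 1; return 0; }
        int temp = tokval;
        eat(NUM);
        return EP(left + temp);
    }
    default:
        failed = 1;
        return 0;
    }
}

/* Parses and evaluates `text`; returns 0 on success, -1 on a syntax error. */
int evaluate(const char *text, int *result) {
    assert(text != NULL && result != NULL);
    src = text;
    failed = 0;
    advance();
    *result = E();
    return failed ? -1 : 0;
}
```

Calling `evaluate("1 + 2 + 3", &v)` leaves 6 in `v`. The parameter `left` is what lets the right-recursive E' still add from left to right.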
5Semantic Analysis during Bottom-Up Parsing
- Scanner returns semantic values for some tokens
- Use parser stack to store the corresponding subtree values
- When A → B C D is reduced, the function for A can use the values returned by B, C, and D
- No action in the middle of the rule
6Example
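Grammar: E → E + num | num
[Figure: parse tree annotated with semantic values. The subtree E has value 7, the token num has value 5, and the root E has value 12.]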
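The idea of keeping subtree values on the parser stack can be sketched by hand for the grammar E → E + num | num, which is assumed here from the example values (7, 5, 12). This is not how Bison's generated tables work: the `reduce` function below simply pattern-matches the top of the stack, and all names (`shift`, `reduce`, `parse_sum`) are hypothetical.

```c
#include <assert.h>

#define MAXSTACK 64

/* Grammar symbols that can sit on the parser stack. */
enum sym { SYM_NUM, SYM_PLUS, SYM_E };

/* One stack entry: a grammar symbol together with its semantic value. */
struct entry { enum sym sym; int val; };

static struct entry stack[MAXSTACK];
static int top;  /* number of entries currently on the stack */

static void shift(enum sym s, int val) {
    assert(top < MAXSTACK);
    stack[top].sym = s;
    stack[top].val = val;
    top++;
}

/* Tries E -> E + num first, then E -> num, on the top of the stack.
   Returns 1 if a reduction was performed, 0 otherwise. */
static int reduce(void) {
    if (top >= 3 && stack[top - 3].sym == SYM_E &&
        stack[top - 2].sym == SYM_PLUS && stack[top - 1].sym == SYM_NUM) {
        int v = stack[top - 3].val + stack[top - 1].val;  /* $$ = $1 + $3 */
        top -= 3;
        shift(SYM_E, v);
        return 1;
    }
    if (top >= 1 && stack[top - 1].sym == SYM_NUM) {
        int v = stack[top - 1].val;                       /* $$ = $1 */
        top -= 1;
        shift(SYM_E, v);
        return 1;
    }
    return 0;
}

/* Evaluates nums[0] + nums[1] + ... + nums[n-1] by shifting and reducing. */
int parse_sum(const int *nums, int n) {
    assert(n >= 1);
    top = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0) shift(SYM_PLUS, 0);
        shift(SYM_NUM, nums[i]);
        while (reduce()) { }
    }
    assert(top == 1 && stack[0].sym == SYM_E);
    return stack[0].val;
}
```

For the input 7 + 5, the stack holds E(7) + num(5) just before the last reduction, which pops three entries and pushes E(12).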
7Bison Specification
Declarations
%%
Productions
%%
C-Routines
8Interpreter (in Bison)
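%{
/* declarations of yylex() and yyerror() */
%}
%union { int num; string id; }
%token <id> ID
%token <num> NUM
%type <num> e f t
%start e
%%
e : e '+' t   { $$ = $1 + $3; }
  | e '-' t   { $$ = $1 - $3; }
  | t         { $$ = $1; }
  ;
t : t '*' f   { $$ = $1 * $3; }
  | t '/' f   { $$ = $1 / $3; }
  | f         { $$ = $1; }
  ;
f : NUM       { $$ = $1; }
  | ID        { $$ = lookup($1); }
  | '-' e     { $$ = - $2; }
  | '(' e ')' { $$ = $2; }
  ;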
9Interpreter (compact spec.)
%{
/* declarations of yylex() and yyerror() */
%}
%union { int num; string id; }
%token <id> ID
%token <num> NUM
%type <num> e
%start e
%left PLUS MINUS
%left MUL DIV
%right UMINUS
%%
e : e PLUS e            { $$ = $1 + $3; }
  | e MINUS e           { $$ = $1 - $3; }
  | e MUL e             { $$ = $1 * $3; }
  | e DIV e             { $$ = $1 / $3; }
  | NUM
  | ID                  { $$ = lookup($1); }
  | MINUS e %prec UMINUS { $$ = - $2; }
  | '(' e ')'           { $$ = $2; }
  ;
11
stack            input    action
e(7) + e(11)     + 17     reduce e → e + e
12
stack            input    action
e(18) + e(17)    (end)    reduce e → e + e
13So why can't we write all the compiler code in Bison?
14
%{
#include <assert.h>
#include <stdio.h>
#include <string.h>
typedef struct table *Table_;
struct table { string id; int value; Table_ tail; };
Table_ Table(string id, int value, struct table *tail);
Table_ table = NULL;
int lookup(Table_ table, string id) {
  assert(table != NULL);
  if (strcmp(id, table->id) == 0) return table->value;  /* compare contents, not pointers */
  else return lookup(table->tail, id);
}
void update(Table_ *tabptr, string id, int value) {
  *tabptr = Table(id, value, *tabptr);
}
%}
%union { int num; string id; }
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
%type <num> exp
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm ;
stm  : stm SEMICOLUMN stm
     | ID ASSIGN exp             { update(&table, $1, $3); }
     | PRINT LPAREN exps RPAREN  { printf("\n"); }
     ;
exps : exp                       { printf("%d", $1); }
     | exps COMMA exp            { printf("%d", $3); }
     ;
exp  : INT                       { $$ = $1; }
     | ID                        { $$ = lookup(table, $1); }
     | exp PLUS exp              { $$ = $1 + $3; }
     | exp MINUS exp             { $$ = $1 - $3; }
     | exp TIMES exp             { $$ = $1 * $3; }
     | exp DIV exp               { $$ = $1 / $3; }
     | stm COMMA exp             { $$ = $3; }
     | '(' exp ')'               { $$ = $2; }
     ;
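The symbol table used by these actions can also be tried outside Bison. The sketch below is plain C and makes two assumptions that the slide does not state: `string` is an ordinary C string (`const char *`), and identifiers are compared by content with `strcmp`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef const char *string;   /* assumption: strings are plain C strings */
typedef struct table *Table_;
struct table { string id; int value; Table_ tail; };

/* Constructor: allocates a new binding in front of `tail`. */
Table_ Table(string id, int value, Table_ tail) {
    Table_ t = malloc(sizeof *t);
    assert(t != NULL);
    t->id = id;
    t->value = value;
    t->tail = tail;
    return t;
}

/* Returns the most recent value bound to `id`; the identifier must exist. */
int lookup(Table_ table, string id) {
    assert(table != NULL);
    if (strcmp(id, table->id) == 0) return table->value;
    return lookup(table->tail, id);
}

/* Prepends a new binding, shadowing any older binding of the same id. */
void update(Table_ *tabptr, string id, int value) {
    *tabptr = Table(id, value, *tabptr);
}
```

After `update(&t, "a", 1)` followed by `update(&t, "a", 3)`, `lookup(t, "a")` returns 3, because the newer binding sits in front of the older one. Old bindings are never freed, which is acceptable for a short interpreter run but not for a long-lived process.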
15Historical Perspective
- Originally parsers were written without tools
- yacc, bison, ... made tools acceptable
- But it is still difficult to write compilers in parser actions (top-down and bottom-up)
- Natural grammars are ambiguous
- No modularity principle
- Many useful programming language features prevent code generation while parsing, e.g., use before declaration and gotos
16Abstract Syntax
- Intermediate program representation
- Defines a tree
- Preserves program hierarchy
- Generated by the parser
- Declared using an (ambiguous) context-free grammar (relatively flat); not meant for parsing
- Keywords and punctuation symbols are not stored (not relevant once the tree exists)
- Big programs can also be handled (possibly via virtual memory)
17Issues
- Concrete vs. Abstract syntax tree
- Need to store concrete source position
- Abstract syntax can be defined by
- Ambiguous context free grammar
- C recursive data type
- Constructor functions
- Debugging routines linearize the tree
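The three ingredients listed above (a C recursive data type, constructor functions, and a debugging routine that linearizes the tree) can be sketched for a tiny expression language. Everything here is illustrative: the `pos` field is a hypothetical line number standing in for the source position, `A_Times` is an assumed operator name, and `A_linearize` is an assumed name for the debugging routine.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { A_Plus, A_Minus, A_Times, A_Div } A_binop;
typedef struct A_exp_ *A_exp;

/* Recursive data type: a kind tag plus a union with one member per variant. */
struct A_exp_ {
    enum { A_numExp, A_opExp } kind;
    int pos;                       /* source position (hypothetical: a line number) */
    union {
        int num;
        struct { A_exp left; A_binop oper; A_exp right; } op;
    } u;
};

/* Constructor functions, one per variant. */
A_exp A_NumExp(int pos, int num) {
    A_exp e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = A_numExp;
    e->pos = pos;
    e->u.num = num;
    return e;
}

A_exp A_OpExp(int pos, A_exp left, A_binop oper, A_exp right) {
    A_exp e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = A_opExp;
    e->pos = pos;
    e->u.op.left = left;
    e->u.op.oper = oper;
    e->u.op.right = right;
    return e;
}

/* Appends `s` at buf[len]; returns the new length (room is kept for the NUL). */
static size_t append(char *buf, size_t size, size_t len, const char *s) {
    size_t n = strlen(s);
    assert(len + n < size);
    memcpy(buf + len, s, n + 1);
    return len + n;
}

/* Debugging routine: linearizes the tree into `buf` in prefix form. */
size_t A_linearize(A_exp e, char *buf, size_t size, size_t len) {
    static const char *const names[] = { "+", "-", "*", "/" };
    char tmp[32];
    assert(e != NULL);
    if (e->kind == A_numExp) {
        snprintf(tmp, sizeof tmp, "%d", e->u.num);
        return append(buf, size, len, tmp);
    }
    len = append(buf, size, len, "(");
    len = append(buf, size, len, names[e->u.op.oper]);
    len = append(buf, size, len, " ");
    len = A_linearize(e->u.op.left, buf, size, len);
    len = append(buf, size, len, " ");
    len = A_linearize(e->u.op.right, buf, size, len);
    return append(buf, size, len, ")");
}
```

Building the tree for 1 + 2 * 3 with the constructors and linearizing it yields the string `(+ 1 (* 2 3))`: the parentheses of the concrete syntax are gone from the tree and reappear only in the debugging output.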
18Abstract Syntax for Straight-line Program
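19
%{
#include "absyn.h"
%}
%union { int num; string id; A_stm stm; A_exp exp; A_expList expList; }
%token <num> INT
%token <id> ID
%token ASSIGN PRINT LPAREN RPAREN
/* each non-terminal now carries a tree, not a number */
%type <stm> prog stm
%type <exp> exp
%type <expList> exps
%left SEMICOLUMN COMMA
%left PLUS MINUS
%left TIMES DIV
%start prog
%%
prog : stm                       { $$ = $1; }
     ;
stm  : stm SEMICOLUMN stm        { $$ = A_CompoundStm($1, $3); }
     | ID ASSIGN exp             { $$ = A_AssignStm($1, $3); }
     | PRINT LPAREN exps RPAREN  { $$ = A_PrintStm($3); }
     ;
exps : exp                       { $$ = A_ExpList($1, NULL); }
     | exp COMMA exps            { $$ = A_ExpList($1, $3); }  /* head expression, then tail list */
     ;
exp  : INT                       { $$ = A_NumExp($1); }
     | ID                        { $$ = A_IdExp($1); }
     | exp PLUS exp              { $$ = A_OpExp($1, A_Plus, $3); }
     | exp MINUS exp             { $$ = A_OpExp($1, A_Minus, $3); }
     | exp TIMES exp             { $$ = A_OpExp($1, A_Time, $3); }
     | exp DIV exp               { $$ = A_OpExp($1, A_Div, $3); }
     | stm COMMA exp             { $$ = A_EseqExp($1, $3); }
     | '(' exp ')'               { $$ = $2; }
     ;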
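Once the parser only builds trees, interpretation becomes a separate pass over the tree. The sketch below shows such a pass for a subset of the straight-line language (assignments, compound statements, numbers, identifiers, binary operators). The struct layouts, the fixed-size environment, and the names `evalExp`, `execStm`, and `valueOf` are assumptions for illustration; the contents of `absyn.h` are not shown in the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { A_Plus, A_Minus, A_Times, A_Div } A_binop;
typedef struct A_exp_ *A_exp;
typedef struct A_stm_ *A_stm;

struct A_exp_ {
    enum { A_numExp, A_idExp, A_opExp } kind;
    union {
        int num;
        const char *id;
        struct { A_exp left; A_binop oper; A_exp right; } op;
    } u;
};

struct A_stm_ {
    enum { A_compoundStm, A_assignStm } kind;
    union {
        struct { A_stm stm1, stm2; } compound;
        struct { const char *id; A_exp exp; } assign;
    } u;
};

static void *checked_malloc(size_t n) {
    void *p = malloc(n);
    assert(p != NULL);
    return p;
}

/* Constructor functions with the argument order used in the grammar actions. */
A_exp A_NumExp(int num) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_numExp; e->u.num = num;
    return e;
}
A_exp A_IdExp(const char *id) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_idExp; e->u.id = id;
    return e;
}
A_exp A_OpExp(A_exp left, A_binop oper, A_exp right) {
    A_exp e = checked_malloc(sizeof *e);
    e->kind = A_opExp;
    e->u.op.left = left; e->u.op.oper = oper; e->u.op.right = right;
    return e;
}
A_stm A_AssignStm(const char *id, A_exp exp) {
    A_stm s = checked_malloc(sizeof *s);
    s->kind = A_assignStm; s->u.assign.id = id; s->u.assign.exp = exp;
    return s;
}
A_stm A_CompoundStm(A_stm stm1, A_stm stm2) {
    A_stm s = checked_malloc(sizeof *s);
    s->kind = A_compoundStm; s->u.compound.stm1 = stm1; s->u.compound.stm2 = stm2;
    return s;
}

/* Hypothetical environment: a small fixed array of variable bindings. */
#define MAXVARS 32
static struct { const char *id; int value; } env[MAXVARS];
static int nvars;

/* Finds the slot of `id`, creating it with value 0 if it does not exist yet. */
static int *slot(const char *id) {
    for (int i = 0; i < nvars; i++)
        if (strcmp(env[i].id, id) == 0) return &env[i].value;
    assert(nvars < MAXVARS);
    env[nvars].id = id;
    env[nvars].value = 0;
    return &env[nvars++].value;
}

int evalExp(A_exp e) {
    switch (e->kind) {
    case A_numExp: return e->u.num;
    case A_idExp:  return *slot(e->u.id);
    default: {
        int l = evalExp(e->u.op.left), r = evalExp(e->u.op.right);
        switch (e->u.op.oper) {
        case A_Plus:  return l + r;
        case A_Minus: return l - r;
        case A_Times: return l * r;
        default:      assert(r != 0); return l / r;
        }
    }
    }
}

void execStm(A_stm s) {
    if (s->kind == A_compoundStm) {
        execStm(s->u.compound.stm1);
        execStm(s->u.compound.stm2);
    } else {
        *slot(s->u.assign.id) = evalExp(s->u.assign.exp);
    }
}

int valueOf(const char *id) { return *slot(id); }
```

For the program `a := 5 + 3; b := a * 2 - 1`, built by nesting the constructor calls, `execStm` leaves 8 in `a` and 15 in `b`. The interpreter never sees tokens or grammar rules, only the tree.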
20Summary
- Flex and Bison simplify the task of writing compiler/interpreter front-ends
- Abstract syntax provides a clear interface with other compiler phases
- Supports general programming languages
- But the design of an abstract syntax for a given PL may take some time