Title: Abstract Syntax
Abstract Syntax Trees
- So far, a parser traces the derivation of a sequence of tokens
- The rest of the compiler needs a structural representation of the program
- Abstract syntax trees
  - Like parse trees, but ignore some details
  - Abbreviated as AST
Abstract Syntax Trees (Cont.)
- Consider the grammar
  - E → int | ( E ) | E + E
- And the string
  - 5 + (2 + 3)
- After lexical analysis (a list of tokens)
  - int5 '+' '(' int2 '+' int3 ')'
- During parsing we build a parse tree
AST Covered
- We built ASTs by hand in the first project
- Let's see what the Galles text has to say about ASTs
- Let's also look at some code
Example of Abstract Syntax Tree
AST for 5 + (2 + 3):
  PLUS
    5
    PLUS
      2
      3
- Also captures the nesting structure
- But abstracts from the concrete syntax
  - so it is more compact and easier to use
- An important data structure in a compiler
Example of Parse Tree
Parse tree for 5 + (2 + 3):
  E
    E
      int5
    +
    E
      (
      E
        E
          int2
        +
        E
          int3
      )
- Traces the operation of the parser
- Does capture the nesting structure
- But too much info
  - Parentheses
  - Single-successor nodes
Semantic Actions
- This is what we'll use to construct ASTs
- Each grammar symbol may have attributes
  - For terminal symbols (lexical tokens), attributes can be calculated by the lexer
- Each production may have an action
  - Written as X → Y1 … Yn { action }
  - That can refer to or compute symbol attributes
Semantic Actions: An Example
- Consider the grammar
  - E → int | E + E | ( E )
- For each symbol X define an attribute X.val
  - For terminals, val is the associated lexeme
  - For non-terminals, val is the expression's value (computed from the values of its subexpressions)
- We annotate the grammar with actions:
    E → int          { E.val = int.val }
      | E1 + E2      { E.val = E1.val + E2.val }
      | ( E1 )       { E.val = E1.val }
Semantic Actions: An Example (Cont.)
- String: 5 + (2 + 3)
- Tokens: int5 '+' '(' int2 '+' int3 ')'
    Productions          Equations
    E  → E1 + E2         E.val = E1.val + E2.val
    E1 → int5            E1.val = int5.val = 5
    E2 → ( E3 )          E2.val = E3.val
    E3 → E4 + E5         E3.val = E4.val + E5.val
    E4 → int2            E4.val = int2.val = 2
    E5 → int3            E5.val = int3.val = 3
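These equations can be checked with a small Java sketch. It builds the parse tree for 5 + (2 + 3) by hand. The class names are hypothetical, with one class per production, and each class's val() implements that production's action. A node's val() first obtains its children's values, so the equations are resolved bottom-up.

```java
// Hypothetical parse-tree classes, one per production of E -> int | E + E | ( E ).
// Each val() implements the semantic action of its production.
abstract class Node {
    abstract int val();
}

// E -> int        { E.val = int.val }
class IntNode extends Node {
    private final int lexval;
    IntNode(int lexval) { this.lexval = lexval; }
    int val() { return lexval; }
}

// E -> E1 + E2    { E.val = E1.val + E2.val }
class PlusNode extends Node {
    private final Node e1, e2;
    PlusNode(Node e1, Node e2) { this.e1 = e1; this.e2 = e2; }
    int val() { return e1.val() + e2.val(); }  // children first, then this node
}

// E -> ( E1 )     { E.val = E1.val }
class ParenNode extends Node {
    private final Node e1;
    ParenNode(Node e1) { this.e1 = e1; }
    int val() { return e1.val(); }
}

class ValDemo {
    // Parse tree for 5 + (2 + 3), built by hand instead of by a parser.
    static Node example() {
        Node e4 = new IntNode(2);          // E4 -> int2
        Node e5 = new IntNode(3);          // E5 -> int3
        Node e3 = new PlusNode(e4, e5);    // E3 -> E4 + E5
        Node e2 = new ParenNode(e3);       // E2 -> ( E3 )
        Node e1 = new IntNode(5);          // E1 -> int5
        return new PlusNode(e1, e2);       // E  -> E1 + E2
    }

    public static void main(String[] args) {
        System.out.println(example().val()); // prints 10
    }
}
```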
Semantic Actions: Notes
- Semantic actions specify a system of equations
  - The order of resolution is not specified
- Example:
    E3.val = E4.val + E5.val
  - Must compute E4.val and E5.val before E3.val
  - We say that E3.val depends on E4.val and E5.val
- The parser must find the order of evaluation
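Dependency Graph
Figure: the parse tree of 5 + (2 + 3). The nodes E, E1, E2, E3, E4 and E5 sit above the leaves int5, int2 and int3, which have the values 5, 2 and 3. Arrows show the dependencies between the val attributes.
- Each node labeled E has one slot for the val attribute
- Note the dependencies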
Evaluating Attributes
- An attribute must be computed after all its successors in the dependency graph have been computed
  - In the previous example, attributes can be computed bottom-up
- Such an order exists when there are no cycles
  - Cyclically defined attributes are not legal
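Finding such an order is a topological sort of the dependency graph. Below is a minimal Java sketch. It assumes the graph is given as a map from each attribute to the attributes it depends on. The attribute names follow the 5 + (2 + 3) example, and AttrOrder is a hypothetical name. The sketch uses Kahn's algorithm: an attribute becomes ready once everything it depends on has been placed. If some attributes never become ready, the definitions are cyclic.

```java
import java.util.*;

class AttrOrder {
    // deps.get(a) lists the attributes that a depends on (they must be computed before a).
    // Returns an evaluation order, or throws if the dependencies contain a cycle.
    static List<String> order(Map<String, List<String>> deps) {
        Map<String, Integer> pending = new TreeMap<>();     // unresolved dependencies per attribute
        Map<String, List<String>> users = new HashMap<>();  // reverse edges: d -> attributes needing d
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            pending.putIfAbsent(e.getKey(), 0);
            for (String d : e.getValue()) {
                pending.putIfAbsent(d, 0);
                pending.merge(e.getKey(), 1, Integer::sum);
                users.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());   // no dependencies, e.g. token values
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String a = ready.poll();
            result.add(a);                                  // everything a depends on is done
            for (String u : users.getOrDefault(a, Collections.emptyList()))
                if (pending.merge(u, -1, Integer::sum) == 0) ready.add(u);
        }
        if (result.size() != pending.size())
            throw new IllegalStateException("cyclic attribute dependencies");
        return result;
    }

    // Dependencies among the val attributes for 5 + (2 + 3).
    static Map<String, List<String>> example() {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("E.val", Arrays.asList("E1.val", "E2.val"));
        deps.put("E1.val", Arrays.asList("int5.val"));
        deps.put("E2.val", Arrays.asList("E3.val"));
        deps.put("E3.val", Arrays.asList("E4.val", "E5.val"));
        deps.put("E4.val", Arrays.asList("int2.val"));
        deps.put("E5.val", Arrays.asList("int3.val"));
        return deps;
    }

    public static void main(String[] args) {
        System.out.println(order(example()));
    }
}
```

For a tree-shaped graph like this one, any order the sketch produces is bottom-up. The cycle check matters only for attribute definitions that refer to each other.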
Semantic Actions: Notes (Cont.)
- Synthesized attributes
  - Calculated from attributes of descendants in the parse tree
  - E.val is a synthesized attribute
  - Can always be calculated in a bottom-up order
- Grammars with only synthesized attributes are called S-attributed grammars
  - The most common kind of grammar
Semantic Actions: Top-Down Approach
- Recursive-descent interpreter
- Consider this grammar
    S  → E
    E  → T E'
    E' → + T E'
    E' → - T E'
    E' → ε
    T  → F T'
    T' → * F T'
    T' → / F T'
    T' → ε
    F  → id
    F  → num
    F  → ( E )
- We need a type for the semantic value of each non-terminal and token (here, int)
Recursive-Descent Interpreter
    int T() {
        switch (tok.kind) {
            case ID: case NUM: case LPAREN:
                return Tprime(F());
            default:
                print("expected ID, NUM, or left-paren");
                skipto(T_follow);
                return 0;
        }
    }

    int Tprime(int a) {
        switch (tok.kind) {
            case TIMES:  eat(TIMES);  return Tprime(a * F());
            case DIVIDE: eat(DIVIDE); return Tprime(a / F());
            case PLUS: case MINUS: case RPAREN: case EOF:
                return a;
            default:
                /* error handling */
        }
    }
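The fragment above leaves out the tokenizer and the other procedures. Here is a self-contained Java sketch of the whole interpreter for the grammar on the previous slide. It makes some simplifying assumptions:
- Only numbers are supported (no id, so there is no symbol table).
- The character-level lexer is hypothetical.
- Errors throw an exception instead of calling print and skipto.

The argument a of Eprime and Tprime carries the value computed so far into the right-recursive helper.

```java
import java.util.ArrayList;
import java.util.List;

class RDInterp {
    enum Kind { NUM, PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN, EOF }

    static final class Tok {
        final Kind kind;
        final int value;  // only meaningful for NUM
        Tok(Kind kind, int value) { this.kind = kind; this.value = value; }
    }

    private static final String OPS = "+-*/()";
    private static final Kind[] OP_KINDS =
        { Kind.PLUS, Kind.MINUS, Kind.TIMES, Kind.DIVIDE, Kind.LPAREN, Kind.RPAREN };

    private final List<Tok> toks;
    private int pos = 0;
    private Tok tok;  // current lookahead token

    RDInterp(String src) { toks = lex(src); tok = toks.get(0); }

    static int eval(String src) { return new RDInterp(src).S(); }

    // Hypothetical lexer: integers and + - * / ( ), whitespace skipped.
    static List<Tok> lex(String s) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                int j = i;
                while (j < s.length() && Character.isDigit(s.charAt(j))) j++;
                out.add(new Tok(Kind.NUM, Integer.parseInt(s.substring(i, j))));
                i = j;
                continue;
            }
            int k = OPS.indexOf(c);
            if (k < 0) throw new IllegalArgumentException("unexpected character " + c);
            out.add(new Tok(OP_KINDS[k], 0));
            i++;
        }
        out.add(new Tok(Kind.EOF, 0));
        return out;
    }

    // Consume the current token and advance (never called on EOF).
    private void eat(Kind k) {
        if (tok.kind != k) throw new IllegalStateException("expected " + k + ", found " + tok.kind);
        tok = toks.get(++pos);
    }

    // S -> E EOF
    int S() {
        int v = E();
        if (tok.kind != Kind.EOF) throw new IllegalStateException("expected end of input");
        return v;
    }

    // E -> T E'
    int E() { return Eprime(T()); }

    // E' -> + T E' | - T E' | epsilon   (a = value of everything to the left)
    int Eprime(int a) {
        switch (tok.kind) {
            case PLUS:  eat(Kind.PLUS);  return Eprime(a + T());
            case MINUS: eat(Kind.MINUS); return Eprime(a - T());
            case RPAREN: case EOF: return a;  // epsilon: lookahead is in FOLLOW(E')
            default: throw new IllegalStateException("unexpected " + tok.kind);
        }
    }

    // T -> F T'
    int T() { return Tprime(F()); }

    // T' -> * F T' | / F T' | epsilon
    int Tprime(int a) {
        switch (tok.kind) {
            case TIMES:  eat(Kind.TIMES);  return Tprime(a * F());
            case DIVIDE: eat(Kind.DIVIDE); return Tprime(a / F());
            case PLUS: case MINUS: case RPAREN: case EOF: return a;
            default: throw new IllegalStateException("unexpected " + tok.kind);
        }
    }

    // F -> num | ( E )
    int F() {
        switch (tok.kind) {
            case NUM: { int v = tok.value; eat(Kind.NUM); return v; }
            case LPAREN: { eat(Kind.LPAREN); int v = E(); eat(Kind.RPAREN); return v; }
            default: throw new IllegalStateException("expected NUM or left-paren, found " + tok.kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("5 + (2 + 3)")); // prints 10
    }
}
```

For example, eval("10 - 3 - 2") returns 5. E' is right-recursive, but passing the running value into Eprime keeps subtraction left-associative.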
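JavaCC Version
- Grammar
    S → E
    E → T ( + T | - T )*
    T → F ( * F | / F )*
    F → id | num | ( E )
- Note
  - The EBNF loop in E → T ( + T | - T )* replaces the right-recursive rules E → T E', E' → + T E' | - T E' | ε (and likewise for T)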
JavaCC Version (Cont.)
    void Start() :
    { int i; }
    {
        i = Exp() <EOF> { System.out.println(i); }
    }

    int Exp() :
    { int a, i; }
    {
        a = Term()
        ( "+" i = Term() { a = a + i; }
        | "-" i = Term() { a = a - i; }
        )*
        { return a; }
    }

    int Factor() :
    { Token t; int i; }
    {
        t = <IDENTIFIER> { return lookup(t.image); }
      | t = <INTEGER_LITERAL> { return Integer.parseInt(t.image); }
      | "(" i = Exp() ")" { return i; }
    }
Semantic Actions: Reduce and Shift
- We can now illustrate how semantic actions are implemented for LR parsing
- Keep attributes on the stack
  - On shift a, push the attribute for a on the stack
  - On reduce X → α
    - pop the attributes for α
    - compute the attribute for X
    - and push it on the stack
Performing Semantic Actions: Example
- Recall the example from the previous lecture
    E → T + E1       { E.val = T.val + E1.val }
      | T            { E.val = T.val }
    T → int * T1     { T.val = int.val * T1.val }
      | int          { T.val = int.val }
- Consider the parsing of the string 3 * 5 + 8
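Performing Semantic Actions: Example (Cont.)
(| separates the stack from the remaining input; the number after a symbol is its attribute value)
    | int * int + int         shift
    int3 | * int + int        shift
    int3 * | int + int        shift
    int3 * int5 | + int       reduce T → int
    int3 * T5 | + int         reduce T → int * T
    T15 | + int               shift
    T15 + | int               shift
    T15 + int8 |              reduce T → int
    T15 + T8 |                reduce E → T
    T15 + E8 |                reduce E → T + E
    E23 |                     accept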
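The trace can be replayed with an explicit attribute stack. Here is a minimal Java sketch. It assumes the shift/reduce decisions are already known: they are copied from the trace, not computed from an LR table. It also pushes a dummy 0 as the attribute of the tokens * and +, which carry no value. Each reduce method pops the attributes of the right-hand side and pushes the attribute of the left-hand side.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ValueStack {
    // One attribute per grammar symbol on the parse stack.
    private final Deque<Integer> stack = new ArrayDeque<>();

    // Shift a token: push its attribute (0 for tokens without a value, like * and +).
    void shift(int attr) { stack.push(attr); }

    // T -> int          { T.val = int.val }
    void reduceTInt() { stack.push(stack.pop()); }

    // T -> int * T1     { T.val = int.val * T1.val }
    void reduceTIntTimesT() {
        int t1 = stack.pop();
        stack.pop();                 // attribute of '*'
        int i = stack.pop();
        stack.push(i * t1);
    }

    // E -> T            { E.val = T.val }
    void reduceET() { stack.push(stack.pop()); }

    // E -> T + E1       { E.val = T.val + E1.val }
    void reduceETPlusE() {
        int e1 = stack.pop();
        stack.pop();                 // attribute of '+'
        int t = stack.pop();
        stack.push(t + e1);
    }

    int top() { return stack.peek(); }
    int depth() { return stack.size(); }

    // Replays the shift/reduce sequence of the trace for 3 * 5 + 8.
    static ValueStack parse358() {
        ValueStack s = new ValueStack();
        s.shift(3);            // int3
        s.shift(0);            // *
        s.shift(5);            // int5
        s.reduceTInt();        // T5
        s.reduceTIntTimesT();  // T15
        s.shift(0);            // +
        s.shift(8);            // int8
        s.reduceTInt();        // T8
        s.reduceET();          // E8
        s.reduceETPlusE();     // E23
        return s;
    }

    public static void main(String[] args) {
        System.out.println(parse358().top()); // prints 23
    }
}
```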
Inherited Attributes
- Another kind of attribute
- Calculated from attributes of the parent and/or siblings in the parse tree
- Example: a line calculator
A Line Calculator
- Each line contains an expression
  - E → int | E + E
- Each line is terminated with the ; sign
  - L → E ; | + E ;
- In the second form, the value of the previous line is used as the starting value
- A program is a sequence of lines
  - P → ε | P L
Attributes for the Line Calculator
- Each E has a synthesized attribute val
  - Calculated as before
- Each L has a synthesized attribute val
    L → E ;        { L.val = E.val }
      | + E ;      { L.val = E.val + L.prev }
- We need the value of the previous line
- We use an inherited attribute L.prev
Attributes for the Line Calculator (Cont.)
- Each P has a synthesized attribute val
  - The value of its last line
    P → ε          { P.val = 0 }
      | P1 L       { P.val = L.val; L.prev = P1.val }
- Each L has an inherited attribute prev
  - L.prev is inherited from sibling P1.val
- Example
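Example of Inherited Attributes
Figure: a program P → P1 L, where P1 → ε (val 0). The line L contains an expression E3 → E4 + E5, with E4 → int2 (val 2) and E5 → int3 (val 3).
- val: synthesized
- prev: inherited
- All can be computed in depth-first order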
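Here is a minimal Java sketch of the line calculator's attribute flow. It assumes each line is a string such as "2 + 3;" or "+ 4;". String.split stands in for real lexing and parsing; that is an assumption of the sketch, not part of the grammar. The loop variable val passes P1.val down as L.prev, and L.val comes back as the new P.val.

```java
import java.util.Arrays;
import java.util.List;

class LineCalc {
    // E -> int | E + E        E.val: sum of the integers (synthesized)
    static int evalE(String e) {
        int sum = 0;
        for (String part : e.split("\\+")) sum += Integer.parseInt(part.trim());
        return sum;
    }

    // L -> E ;      { L.val = E.val }
    // L -> + E ;    { L.val = E.val + L.prev }   (prev is the inherited attribute)
    static int evalL(String line, int prev) {
        String s = line.trim();
        if (!s.endsWith(";")) throw new IllegalArgumentException("missing ';' in: " + line);
        s = s.substring(0, s.length() - 1).trim();
        return s.startsWith("+") ? evalE(s.substring(1)) + prev : evalE(s);
    }

    // P -> epsilon  { P.val = 0 }
    // P -> P1 L     { L.prev = P1.val; P.val = L.val }
    static int evalP(List<String> lines) {
        int val = 0;                    // P.val of the empty program
        for (String line : lines) {
            val = evalL(line, val);     // pass P1.val down as L.prev; L.val becomes P.val
        }
        return val;
    }

    public static void main(String[] args) {
        System.out.println(evalP(Arrays.asList("2 + 3;", "+ 4;", "+ 1 + 1;"))); // prints 11
    }
}
```

A line without a leading + ignores prev, so "2 + 3;" followed by "7;" yields 7.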
Semantic Actions: Notes (Cont.)
- Semantic actions can be used to build ASTs
- And many other things as well
  - They are also used for type checking, code generation, etc.
- The process is called syntax-directed translation
  - A substantial generalization over CFGs
Constructing an AST
- We first define the AST data type
  - Supplied by us for the project
- Consider an abstract tree type with two constructors:
  - mkleaf(n): a leaf node labeled n
  - mkplus(T1, T2): a PLUS node with subtrees T1 and T2
Constructing an AST (Cont.)
- We define a synthesized attribute ast
  - The values of the ast attribute are ASTs
  - We assume that int.lexval is the value of the integer lexeme
  - Computed using semantic actions
    E → int          { E.ast = mkleaf(int.lexval) }
      | E1 + E2      { E.ast = mkplus(E1.ast, E2.ast) }
      | ( E1 )       { E.ast = E1.ast }
AST Construction Example
- Consider the string int5 '+' '(' int2 '+' int3 ')'
- A bottom-up evaluation of the ast attribute gives
    E.ast = mkplus(mkleaf(5),
                   mkplus(mkleaf(2), mkleaf(3)))
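Here is a Java sketch of the two constructors and of the evaluation above. Ast, Leaf, Plus and AstBuild are hypothetical names standing in for the project's AST classes, and toString is only there to display the tree. The production E → ( E1 ) creates no node. It just passes E1.ast up, which is why the parentheses disappear from the AST.

```java
abstract class Ast { }

// mkleaf(n): a leaf labeled with the integer n
class Leaf extends Ast {
    final int n;
    Leaf(int n) { this.n = n; }
    @Override public String toString() { return Integer.toString(n); }
}

// mkplus(T1, T2): a PLUS node with two subtrees
class Plus extends Ast {
    final Ast left, right;
    Plus(Ast left, Ast right) { this.left = left; this.right = right; }
    @Override public String toString() { return "PLUS(" + left + ", " + right + ")"; }
}

class AstBuild {
    static Ast mkleaf(int n) { return new Leaf(n); }
    static Ast mkplus(Ast t1, Ast t2) { return new Plus(t1, t2); }

    // E.ast for int5 + ( int2 + int3 ), following the semantic actions:
    //   E -> int       { E.ast = mkleaf(int.lexval) }
    //   E -> E1 + E2   { E.ast = mkplus(E1.ast, E2.ast) }
    //   E -> ( E1 )    { E.ast = E1.ast }
    static Ast example() {
        Ast inner = mkplus(mkleaf(2), mkleaf(3));  // ( E1 ) adds no node
        return mkplus(mkleaf(5), inner);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints PLUS(5, PLUS(2, 3))
    }
}
```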
Review
- We can specify language syntax using a CFG
- A parser will answer whether s ∈ L(G)
  - and will build a parse tree
  - which we convert to an AST
  - and pass on to the rest of the compiler
Abstract Parse Trees: Expression Grammar
- Abstract syntax
    E → E + E
    E → E - E
    E → E * E
    E → E / E
    E → id
    E → num
AST Node Types
    public abstract class Exp {
        public abstract int eval();
    }

    public class PlusExp extends Exp {
        private Exp e1, e2;
        public PlusExp(Exp a1, Exp a2) { e1 = a1; e2 = a2; }
        public int eval() {
            return e1.eval() + e2.eval();
        }
    }

    public class Identifier extends Exp {
        private String f0;
        public Identifier(String n0) { f0 = n0; }
        public int eval() {
            return lookup(f0);   // value of the variable named f0
        }
    }

    public class IntegerLiteral extends Exp {
        private String f0;
        public IntegerLiteral(String n0) { f0 = n0; }
        public int eval() {
            return Integer.parseInt(f0);
        }
    }
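The slide leaves lookup undefined. Below is a self-contained Java sketch that assumes variable values come from a Map passed to eval. That signature change is made for this sketch only. The sketch also adds a hypothetical TimesExp class to show that each new operator is one more subclass.

```java
import java.util.HashMap;
import java.util.Map;

abstract class Exp {
    // env maps identifier names to values (stands in for the slide's lookup).
    abstract int eval(Map<String, Integer> env);
}

class PlusExp extends Exp {
    private final Exp e1, e2;
    PlusExp(Exp a1, Exp a2) { e1 = a1; e2 = a2; }
    int eval(Map<String, Integer> env) { return e1.eval(env) + e2.eval(env); }
}

// Hypothetical extra node type: E -> E * E
class TimesExp extends Exp {
    private final Exp e1, e2;
    TimesExp(Exp a1, Exp a2) { e1 = a1; e2 = a2; }
    int eval(Map<String, Integer> env) { return e1.eval(env) * e2.eval(env); }
}

class Identifier extends Exp {
    private final String f0;
    Identifier(String n0) { f0 = n0; }
    int eval(Map<String, Integer> env) {
        Integer v = env.get(f0);
        if (v == null) throw new IllegalStateException("undefined variable " + f0);
        return v;
    }
}

class IntegerLiteral extends Exp {
    private final String f0;
    IntegerLiteral(String n0) { f0 = n0; }
    int eval(Map<String, Integer> env) { return Integer.parseInt(f0); }
}

class ExpDemo {
    // AST for x * (y + 2)
    static Exp example() {
        return new TimesExp(new Identifier("x"),
                new PlusExp(new Identifier("y"), new IntegerLiteral("2")));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 3);
        env.put("y", 4);
        System.out.println(example().eval(env)); // prints 18
    }
}
```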
JavaCC Example for AST Construction
    Exp Start() :
    { Exp e; }
    {
        e = Exp() { return e; }
    }

    Exp Exp() :
    { Exp e1, e2; }
    {
        e1 = Term()
        ( "+" e2 = Term() { e1 = new PlusExp(e1, e2); }
        | "-" e2 = Term() { e1 = new MinusExp(e1, e2); }
        )*
        { return e1; }
    }

    Exp Factor() :
    { Token t; Exp e; }
    {
        t = <IDENTIFIER> { return new Identifier(t.image); }
      | t = <INTEGER_LITERAL> { return new IntegerLiteral(t.image); }
      | "(" e = Exp() ")" { return e; }
    }
Positions
- We must remember positions in the source file
  - Lexical analysis, parsing and semantic analysis are not done simultaneously, so when a later phase finds an error the lexer is no longer at that position
  - Positions are necessary for error reporting
- AST nodes must keep pos fields, which indicate the position within the original source file
- The lexer must pass this information to the parser
- AST node constructors must be augmented to initialize the pos fields
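Here is a minimal Java sketch of AST nodes that carry positions. It assumes the parser copies each node's line and column from the beginLine and beginColumn fields of the JavaCC Token that starts the construct. Three things are assumptions made for this illustration:
- the class names;
- the convention that a binary node takes its left operand's position;
- the line:column message format.

```java
// Hypothetical AST classes with a source position in every node.
abstract class Exp {
    final int line, column;   // where this construct starts in the source file
    Exp(int line, int column) { this.line = line; this.column = column; }

    // Formats an error message that points back into the source file.
    String error(String msg) { return line + ":" + column + ": " + msg; }
}

class Identifier extends Exp {
    final String name;
    Identifier(String name, int line, int column) {
        super(line, column);
        this.name = name;
    }
}

class PlusExp extends Exp {
    final Exp e1, e2;
    // One possible convention: a binary node takes the position of its left operand.
    PlusExp(Exp e1, Exp e2) {
        super(e1.line, e1.column);
        this.e1 = e1;
        this.e2 = e2;
    }
}

class PosDemo {
    // As if the parser had read "a + b" on line 3, with a in column 5 and b in column 9.
    // A JavaCC parser would take these numbers from t.beginLine and t.beginColumn.
    static PlusExp example() {
        return new PlusExp(new Identifier("a", 3, 5), new Identifier("b", 3, 9));
    }

    public static void main(String[] args) {
        PlusExp sum = example();
        System.out.println(sum.e2.error("undefined variable b")); // prints 3:9: undefined variable b
        System.out.println(sum.error("type mismatch"));           // prints 3:5: type mismatch
    }
}
```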
JavaCC Class Token
- Each Token object has the following fields (plus a static factory method):
  - int kind
  - int beginLine, beginColumn, endLine, endColumn
  - String image
  - Token next
  - Token specialToken
  - static final Token newToken(int ofKind)
- Unfortunately, .
Visitors
- The "syntax separate from interpretation" style of programming
  - vs. the object-oriented style of programming
- Visitor pattern
  - A visitor implements an interpretation
  - A visitor object contains a visit method for each syntax-tree class
  - Syntax-tree classes contain accept methods
  - The visitor calls accept ("what is your class?"); accept then calls the visit method of the visitor
Example Expression Classes
    public abstract class Exp {
        public abstract int accept(Visitor v);
    }

    // Fields are public so that visitors can read them.
    public class PlusExp extends Exp {
        public Exp e1, e2;
        public PlusExp(Exp a1, Exp a2) { e1 = a1; e2 = a2; }
        public int accept(Visitor v) { return v.visit(this); }
    }

    public class Identifier extends Exp {
        public String f0;
        public Identifier(String n0) { f0 = n0; }
        public int accept(Visitor v) { return v.visit(this); }
    }

    public class IntegerLiteral extends Exp {
        public String f0;
        public IntegerLiteral(String n0) { f0 = n0; }
        public int accept(Visitor v) { return v.visit(this); }
    }
An Interpreter Visitor
    public interface Visitor {
        public int visit(PlusExp n);
        public int visit(Identifier n);
        public int visit(IntegerLiteral n);
    }

    public class Interpreter implements Visitor {
        public int visit(PlusExp n) {
            return n.e1.accept(this) + n.e2.accept(this);
        }
        public int visit(Identifier n) {
            return lookup(n.f0);
        }
        public int visit(IntegerLiteral n) {
            return Integer.parseInt(n.f0);
        }
    }
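Here are the expression classes and the interpreter, assembled into one runnable Java sketch. The Map-based environment replaces the undefined lookup, which is an assumption of the sketch. NodeCounter is a hypothetical second visitor. It shows that a new interpretation is just a new visitor class, while the syntax-tree classes stay unchanged. The call v.visit(this) inside each accept method picks the right overload at compile time, because the static type of this is the concrete class.

```java
import java.util.HashMap;
import java.util.Map;

interface Visitor {
    int visit(PlusExp n);
    int visit(Identifier n);
    int visit(IntegerLiteral n);
}

abstract class Exp {
    abstract int accept(Visitor v);
}

class PlusExp extends Exp {
    final Exp e1, e2;
    PlusExp(Exp a1, Exp a2) { e1 = a1; e2 = a2; }
    int accept(Visitor v) { return v.visit(this); }  // resolves to visit(PlusExp)
}

class Identifier extends Exp {
    final String f0;
    Identifier(String n0) { f0 = n0; }
    int accept(Visitor v) { return v.visit(this); }  // resolves to visit(Identifier)
}

class IntegerLiteral extends Exp {
    final String f0;
    IntegerLiteral(String n0) { f0 = n0; }
    int accept(Visitor v) { return v.visit(this); }  // resolves to visit(IntegerLiteral)
}

// Evaluates an expression; env stands in for lookup (unboxing fails on an undefined name).
class Interpreter implements Visitor {
    private final Map<String, Integer> env;
    Interpreter(Map<String, Integer> env) { this.env = env; }
    public int visit(PlusExp n) { return n.e1.accept(this) + n.e2.accept(this); }
    public int visit(Identifier n) { return env.get(n.f0); }
    public int visit(IntegerLiteral n) { return Integer.parseInt(n.f0); }
}

// A second interpretation over the same classes: counts AST nodes.
class NodeCounter implements Visitor {
    public int visit(PlusExp n) { return 1 + n.e1.accept(this) + n.e2.accept(this); }
    public int visit(Identifier n) { return 1; }
    public int visit(IntegerLiteral n) { return 1; }
}

class VisitorDemo {
    // AST for x + (1 + 2)
    static Exp example() {
        return new PlusExp(new Identifier("x"),
                new PlusExp(new IntegerLiteral("1"), new IntegerLiteral("2")));
    }

    public static void main(String[] args) {
        Map<String, Integer> env = new HashMap<>();
        env.put("x", 10);
        System.out.println(example().accept(new Interpreter(env))); // prints 13
        System.out.println(example().accept(new NodeCounter()));    // prints 5
    }
}
```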
Abstract Syntax for MiniJava (I)
- Package syntaxtree
- Program(MainClass m, ClassDeclList cl)
- MainClass(Identifier i1, Identifier i2, Statement s)
----------------------------
- abstract class ClassDecl
- ClassDeclSimple(Identifier i, VarDeclList vl, MethodDeclList ml)
- ClassDeclExtends(Identifier i, Identifier j, VarDeclList vl, MethodDeclList ml)
----------------------------
- VarDecl(Type t, Identifier i)
- MethodDecl(Type t, Identifier i, FormalList fl, VarDeclList vl, StatementList sl, Exp e)
- Formal(Type t, Identifier i)
Abstract Syntax for MiniJava (II)
- abstract class Type
- IntArrayType()
- BooleanType()
- IntegerType()
- IdentifierType(String s)
----------------------------
- abstract class Statement
- Block(StatementList sl)
- If(Exp e, Statement s1, Statement s2)
- While(Exp e, Statement s)
- Print(Exp e)
- Assign(Identifier i, Exp e)
- ArrayAssign(Identifier i, Exp e1, Exp e2)
----------------------------
Abstract Syntax for MiniJava (III)
- abstract class Exp
- And(Exp e1, Exp e2), LessThan(Exp e1, Exp e2)
- Plus(Exp e1, Exp e2), Minus(Exp e1, Exp e2)
- Times(Exp e1, Exp e2), Not(Exp e)
- ArrayLookup(Exp e1, Exp e2), ArrayLength(Exp e)
- Call(Exp e, Identifier i, ExpList el)
- IntegerLiteral(int i)
- True(), False()
- IdentifierExp(String s)
- This()
- NewArray(Exp e), NewObject(Identifier i)
----------------------------
- Identifier(String s)
---- list classes ----------
- ClassDeclList(), ExpList(), FormalList(), MethodDeclList(), StatementList(), VarDeclList()
Syntax Tree Nodes: Details
    package syntaxtree;
    import visitor.Visitor;
    import visitor.TypeVisitor;

    public class Program {
        public MainClass m;
        public ClassDeclList cl;

        public Program(MainClass am, ClassDeclList acl) {
            m = am; cl = acl;
        }

        public void accept(Visitor v) {
            v.visit(this);
        }

        public Type accept(TypeVisitor v) {
            return v.visit(this);
        }
    }
ClassDecl.java
    package syntaxtree;
    import visitor.Visitor;
    import visitor.TypeVisitor;

    public abstract class ClassDecl {
        public abstract void accept(Visitor v);
        public abstract Type accept(TypeVisitor v);
    }
ClassDeclExtends.java
    package syntaxtree;
    import visitor.Visitor;
    import visitor.TypeVisitor;

    public class ClassDeclExtends extends ClassDecl {
        public Identifier i;
        public Identifier j;
        public VarDeclList vl;
        public MethodDeclList ml;

        public ClassDeclExtends(Identifier ai, Identifier aj,
                                VarDeclList avl, MethodDeclList aml) {
            i = ai; j = aj; vl = avl; ml = aml;
        }

        public void accept(Visitor v) {
            v.visit(this);
        }

        public Type accept(TypeVisitor v) {
            return v.visit(this);
        }
    }
StatementList.java
    package syntaxtree;
    import java.util.Vector;

    public class StatementList {
        private Vector<Statement> list;

        public StatementList() {
            list = new Vector<Statement>();
        }

        public void addElement(Statement n) {
            list.addElement(n);
        }

        public Statement elementAt(int i) {
            return list.elementAt(i);
        }

        public int size() {
            return list.size();
        }
    }
Package visitor: Visitor.java
    package visitor;
    import syntaxtree.*;

    public interface Visitor {
        public void visit(Program n);
        public void visit(MainClass n);
        public void visit(ClassDeclSimple n);
        public void visit(ClassDeclExtends n);
        public void visit(VarDecl n);
        public void visit(MethodDecl n);
        public void visit(Formal n);
        public void visit(IntArrayType n);
        public void visit(BooleanType n);
        public void visit(IntegerType n);
        public void visit(IdentifierType n);
        public void visit(Block n);
        public void visit(If n);
        public void visit(While n);
        public void visit(Print n);
        public void visit(Assign n);
        public void visit(ArrayAssign n);
        public void visit(And n);
        public void visit(LessThan n);
        public void visit(Plus n);
        public void visit(Minus n);
        public void visit(Times n);
        public void visit(ArrayLookup n);
        public void visit(ArrayLength n);
        public void visit(Call n);
        public void visit(IntegerLiteral n);
        public void visit(True n);
        public void visit(False n);
        public void visit(IdentifierExp n);
        public void visit(This n);
    }
x = y.m(1, 4 + 5)
- Statement → AssignmentStatement
- AssignmentStatement → Identifier1 "=" Expression ";"
- Identifier1 → <IDENTIFIER>
- Expression → Expression1 "." Identifier2 "(" ( ExpList )? ")"
- Expression1 → IdentifierExp
- IdentifierExp → <IDENTIFIER>
- Identifier2 → <IDENTIFIER>
- ExpList → Expression2 ( "," Expression3 )*
- Expression2 → <INTEGER_LITERAL>
- Expression3 → PlusExp → Expression "+" Expression → <INTEGER_LITERAL> "+" <INTEGER_LITERAL>
AST
Statement s → Assign(Identifier, Exp)
  Identifier(x)
  Call(Exp, Identifier, ExpList)
    IdentifierExp(y)
    Identifier(m)
    ExpList el (init, then add each argument)
      add IntegerLiteral(1)
      add Plus(Exp, Exp)
        IntegerLiteral(4)
        IntegerLiteral(5)
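The same tree can be built with plain constructor calls. Below is a Java sketch with minimal stand-ins for the syntaxtree classes. They have only the constructors and fields needed here; the real classes also have accept methods. The sketch builds the AST for x = y.m(1, 4 + 5), and its toString methods are an assumption added for display.

```java
import java.util.Vector;

abstract class Exp { }
abstract class Statement { }

class Identifier {
    final String s;
    Identifier(String s) { this.s = s; }
    public String toString() { return "Identifier(" + s + ")"; }
}

class IdentifierExp extends Exp {
    final String s;
    IdentifierExp(String s) { this.s = s; }
    public String toString() { return "IdentifierExp(" + s + ")"; }
}

class IntegerLiteral extends Exp {
    final int i;
    IntegerLiteral(int i) { this.i = i; }
    public String toString() { return "IntegerLiteral(" + i + ")"; }
}

class Plus extends Exp {
    final Exp e1, e2;
    Plus(Exp e1, Exp e2) { this.e1 = e1; this.e2 = e2; }
    public String toString() { return "Plus(" + e1 + ", " + e2 + ")"; }
}

// Same shape as StatementList: a Vector with addElement/elementAt/size.
class ExpList {
    private final Vector<Exp> list = new Vector<>();
    void addElement(Exp n) { list.addElement(n); }
    Exp elementAt(int i) { return list.elementAt(i); }
    int size() { return list.size(); }
    public String toString() { return "ExpList" + list; }
}

class Call extends Exp {
    final Exp e; final Identifier i; final ExpList el;
    Call(Exp e, Identifier i, ExpList el) { this.e = e; this.i = i; this.el = el; }
    public String toString() { return "Call(" + e + ", " + i + ", " + el + ")"; }
}

class Assign extends Statement {
    final Identifier i; final Exp e;
    Assign(Identifier i, Exp e) { this.i = i; this.e = e; }
    public String toString() { return "Assign(" + i + ", " + e + ")"; }
}

class CallDemo {
    // x = y.m(1, 4 + 5);
    static Statement build() {
        ExpList el = new ExpList();                      // init
        el.addElement(new IntegerLiteral(1));            // add
        el.addElement(new Plus(new IntegerLiteral(4), new IntegerLiteral(5)));  // add
        return new Assign(new Identifier("x"),
                new Call(new IdentifierExp("y"), new Identifier("m"), el));
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```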
MiniJava Grammar (I)
- Program → MainClass ClassDecl*
  - Program(MainClass, ClassDeclList)
    Program Goal() :
    {
        MainClass m;
        ClassDeclList cl = new ClassDeclList();
        ClassDecl c;
    }
    {
        m = MainClass()
        ( c = ClassDecl() { cl.addElement(c); } )*
        <EOF> { return new Program(m, cl); }
    }
MiniJava Grammar (II)
- MainClass → class id { public static void main ( String [ ] id ) { Statement } }
  - MainClass(Identifier, Identifier, Statement)
- ClassDecl → class id { VarDecl* MethodDecl* }
            → class id extends id { VarDecl* MethodDecl* }
  - ClassDeclSimple(...), ClassDeclExtends(...)
- VarDecl → Type id ;
  - VarDecl(Type, Identifier)
- MethodDecl → public Type id ( FormalList ) { VarDecl* Statement* return Exp ; }
  - MethodDecl(Type, Identifier, FormalList, VarDeclList, StatementList, Exp)
MiniJava Grammar (III)
- FormalList → Type id FormalRest*
             → ε
- FormalRest → , Type id
- Type → int [ ]
       → boolean
       → int
       → id
MiniJava Grammar (IV)
- Statement → { Statement* }
            → if ( Exp ) Statement else Statement
            → while ( Exp ) Statement
            → System.out.println ( Exp ) ;
            → id = Exp ;
            → id [ Exp ] = Exp ;
- ExpList → Exp ExpRest*
          → ε
- ExpRest → , Exp
MiniJava Grammar (V)
- Exp → Exp op Exp
      → Exp [ Exp ]
      → Exp . length
      → Exp . id ( ExpList )
      → INTEGER_LITERAL
      → true
      → false
      → id
      → this
      → new int [ Exp ]
      → new id ( )
      → ! Exp
      → ( Exp )
- op is one of && < + - * (matching And, LessThan, Plus, Minus, Times in the abstract syntax)
References
- Andrew W. Appel, Modern Compiler Implementation in Java (2nd Edition), Cambridge University Press, 2002
- http://compiler.kaist.ac.kr/courses/cs420/classtps/Chapter05.pps
- David Galles, Modern Compiler Design, Scott/Jones