Title: Syntax Analysis
1Syntax Analysis
- Mooly Sagiv
- http://www.math.tau.ac.il/msagiv/courses/wcc01.html
- Textbook: Modern Compiler Implementation in C
- Chapter 3
2A motivating example
- Create a desk calculator
- Challenges
- Non trivial syntax
- Recursive expressions (semantics)
- Operator precedence
3Solution (lexical analysis)
/* desk.l */
%%
[0-9]+      { yylval = atoi(yytext); return NUM; }
"+"         { return PLUS; }
"-"         { return MINUS; }
"/"         { return DIV; }
"*"         { return MUL; }
"("         { return LPAR; }
")"         { return RPAR; }
"//".*[\n]  ; /* comment */
[ \t\n]     ; /* whitespace */
.           { error("illegal symbol", yytext[0]); }
4Solution (syntax analysis)
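/* desk.y */
%token NUM
%left PLUS MINUS
%left MUL DIV
%start P
%%
P : E              { printf("%d\n", $1); }
  ;
E : NUM            { $$ = $1; }
  | LPAR E RPAR    { $$ = $2; }
  | E PLUS E       { $$ = $1 + $3; }
  | E MINUS E      { $$ = $1 - $3; }
  | E MUL E        { $$ = $1 * $3; }
  | E DIV E        { $$ = $1 / $3; }
  ;
%%
#include "lex.yy.c"

flex desk.l
bison -y desk.y
cc y.tab.c -ll -ly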
5Solution (syntax analysis)
a.out < input
// input
7 + 5 * 3
22
6Subjects
- The task of syntax analysis
- Automatic generation
- Error handling
- Context free grammars and derivations
- Ambiguous Grammars
- Predictive Parsing (sketch only)
- Bottom-up syntax analysis
- Bison: a parser generator
Next week
7Basic Compiler Phases
Source program (string)
Front-End
lexical analysis
Tokens
syntax analysis
Abstract syntax tree
semantic analysis
Back-End
Fin. Assembly
8Syntax Analysis (Parsing)
- input
- Sequence of tokens
- output
- Abstract Syntax Tree
- Report syntax errors
- unbalanced parentheses
- Create symbol-table
- Create pretty-printed version of the program
- In some cases the tree need not be generated
(one-pass compilers)
9Handling Syntax Errors
- Report and locate the error
- Diagnose the error
- Correct the error
- Recover from the error in order to discover more errors, without reporting too many strange errors
10Example
a a ( b c d
11The Valid Prefix Property
- For every prefix of tokens t1, t2, ..., ti that the parser identifies as legal
- there exist tokens ti+1, ti+2, ..., tn such that t1, t2, ..., tn is a syntactically valid program
- If every token is considered as a single character
- For every prefix word u that the parser identifies as legal
- there exists a word w such that u.w is a valid program
12Error Diagnosis
- Line number
- may be far from the actual error
- The current token
- The expected tokens
- Parser configuration
13Error Recovery
- Becomes less important in interactive environments
- Example heuristics
- Search for a semicolon and ignore the statement
- Try to replace tokens for common errors
- Refrain from reporting 3 subsequent errors
- Globally optimal solutions
- For every input w, find a valid program w' with a minimal distance from w
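The "search for a semicolon and ignore the statement" heuristic can be sketched as a small helper that skips ahead in a token array. This is only an illustration: the token kinds and the name skip_to_semicolon are hypothetical and do not come from the slides or from Bison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum tok_kind { T_ID, T_NUM, T_SEMI, T_EOF, T_OTHER };

/* Panic-mode style recovery: starting at `pos`, discard tokens up to and
   including the next semicolon (or stop at end of input).
   Returns the index at which parsing may resume. */
size_t skip_to_semicolon(const enum tok_kind *toks, size_t n, size_t pos)
{
    assert(toks != NULL);
    while (pos < n && toks[pos] != T_EOF && toks[pos] != T_SEMI)
        pos++;
    if (pos < n && toks[pos] == T_SEMI)
        pos++;                       /* consume the semicolon itself */
    return pos;
}
```

A parser would call such a helper after reporting an error and then continue with the next statement, so that later errors can still be found.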
14Designing a parser
language design
context-free grammar design
Bison
parser (c program)
15Context Free Grammar (Review)
- What is a grammar
- Derivations and Parsing Trees
- Ambiguous grammars
- Resolving ambiguity
16Context Free Grammars
- Non-terminals
- Start non-terminal
- Terminals (tokens)
- Context Free Rules: <Non-Terminal> → Symbol Symbol ... Symbol
17Example Context Free Grammar
1 S → S ; S
2 S → id := E
3 S → print (L)
4 E → id
5 E → num
6 E → E + E
7 E → (S, E)
8 L → E
9 L → L, E
18Derivations
- Show that a sentence is in the grammar (valid program)
- Start with the start symbol
- Repeatedly replace one of the non-terminals by a right-hand side of a production
- Stop when the sentence contains terminals only
- Rightmost derivation
- Leftmost derivation
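A single leftmost derivation step can be sketched as string rewriting. The encoding below is an assumption made for this sketch only: every grammar symbol is one character, upper-case letters are non-terminals, and everything else is a terminal. The function name leftmost_step is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Replace the leftmost non-terminal in `form` by `rhs`.
   Returns 1 on success; returns 0 if there is no non-terminal, if the
   leftmost one is not `lhs`, or if the result would not fit in `cap` bytes. */
int leftmost_step(char *form, size_t cap, char lhs, const char *rhs)
{
    size_t len = strlen(form), rlen = strlen(rhs), i = 0;

    while (i < len && !isupper((unsigned char)form[i]))
        i++;                                    /* find leftmost non-terminal */
    if (i == len || form[i] != lhs)
        return 0;
    if (len + rlen > cap)                       /* new length len-1+rlen, plus NUL */
        return 0;
    memmove(form + i + rlen, form + i + 1, len - i);   /* tail, including NUL */
    memcpy(form + i, rhs, rlen);
    return 1;
}
```

Starting from "S" and applying productions one at a time, the derivation stops when a call finds no upper-case letter left, which matches the "terminals only" stopping rule.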
19Example Derivations
S
S ; S
S ; id := E
id := E ; id := E
id := num ; id := E
id := num ; id := E + E
id := num ; id := E + num
id := num ; id := num + num
a := 56 ; b := 77 + 16
1 S → S ; S
2 S → id := E
3 S → print (L)
4 E → id
5 E → num
6 E → E + E
7 E → (S, E)
8 L → E
9 L → L, E
20Parse Trees
- The trace of a derivation
- Every internal node is labeled by a non-terminal
- Each symbol is connected to the deriving
non-terminal
21Example Parse Tree
           S
     ______|______
    S      ;      S
  __|__         __|__
 id := E       id := E
       |           __|__
      num         E  +  E
                  |     |
                 num   num
22Ambiguous Grammars
- Two leftmost derivations
- Two rightmost derivations
- Two parse trees
23A Grammar for Arithmetic Expressions
1 E → E + E
2 E → E * E
3 E → id
4 E → (E)
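Why ambiguity matters can be seen by evaluating the two parse trees that such a grammar allows for one input. The sketch below assumes the two binary rules are for + and *, uses 7 + 5 * 3 as the input, and builds both trees by hand; the struct and function names are hypothetical.

```c
#include <stddef.h>

/* A tiny expression tree: leaves hold a value, inner nodes an operator. */
struct node {
    char op;                          /* '+', '*', or 0 for a leaf */
    int value;                        /* used only when op == 0 */
    const struct node *left, *right;
};

int eval(const struct node *n)
{
    if (n->op == 0)
        return n->value;
    int l = eval(n->left), r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}

/* Parse tree with + at the root: 7 + (5 * 3). */
int value_add_at_root(void)
{
    struct node a = {0, 7, NULL, NULL}, b = {0, 5, NULL, NULL}, c = {0, 3, NULL, NULL};
    struct node mul = {'*', 0, &b, &c};
    struct node add = {'+', 0, &a, &mul};
    return eval(&add);
}

/* Parse tree with * at the root: (7 + 5) * 3. */
int value_mul_at_root(void)
{
    struct node a = {0, 7, NULL, NULL}, b = {0, 5, NULL, NULL}, c = {0, 3, NULL, NULL};
    struct node add = {'+', 0, &a, &b};
    struct node mul = {'*', 0, &add, &c};
    return eval(&mul);
}
```

The first tree evaluates to 22 and the second to 36, so a parser for the ambiguous grammar needs extra information (such as precedence declarations) to pick one.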
24Non Ambiguous Grammar for Arithmetic Expressions
- 1 E → E + T
- 2 E → T
- 3 T → T * F
- 4 T → F
- 5 F → id
- 6 F → (E)
Ambiguous grammar
1 E → E + E
2 E → E * E
3 E → id
4 E → (E)
25Non Ambiguous Grammars for Arithmetic Expressions
- 1 E → E + T
- 2 E → T
- 3 T → T * F
- 4 T → F
- 5 F → id
- 6 F → (E)

- 1 E → E + T
- 2 E → T
- 3 T → F * T
- 4 T → F
- 5 F → id
- 6 F → (E)
Ambiguous grammar
1 E → E + E
2 E → E * E
3 E → id
4 E → (E)
26Efficient Parsers
- Pushdown automata
- Deterministic
- Report an error as soon as the input is not a prefix of a valid program
- Not usable for all context free grammars
bison
Ambiguity errors
parse tree
27Kinds of Parsers
- Top-Down (Predictive Parsing) LL
- Construct parse tree in a top-down manner
- Find the leftmost derivation
- For every non-terminal and token predict the next production
- Bottom-Up LR
- Construct parse tree in a bottom-up manner
- Find the rightmost derivation in a reverse order
- For every potential right hand side and token decide when a production is found
28Example Grammar for Predictive Parsing
1 S → if E then S else S
2 S → begin S L
3 S → print (E)
4 L → end
5 L → ; S L
6 E → num = num
29Predictive Parser(utility functions)
enum token {IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, LP, RP};
extern enum token getToken(void);
enum token tok;  /* current lookahead token */
void advance(void) { tok = getToken(); }
void eat(enum token t) { if (tok == t) advance(); else error(); }
30Predictive Parser (S)
void S(void) {
  switch (tok) {
  case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
  case BEGIN: eat(BEGIN); S(); L(); break;
  case PRINT: eat(PRINT); eat(LP); E(); eat(RP); break;
  default:    error(tok, "Expecting if, begin, or print");
  }
}
1 S → if E then S else S
2 S → begin S L
3 S → print (E)
31Predictive Parser (L)
void L(void) {
  switch (tok) {
  case END:  eat(END); break;
  case SEMI: eat(SEMI); S(); L(); break;
  default:   error(tok, "Expecting end or semicolon");
  }
}
4 L → end
5 L → ; S L
32Predictive Parser (E)
void E(void) {
  switch (tok) {
  case NUM: eat(NUM); eat(EQ); eat(NUM); break;
  default:  error(tok, "Expecting a number");
  }
}
6 E → num = num
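The three procedures S, L and E can be assembled into one translation unit and run on a pre-lexed token array. Several pieces below are assumptions of this sketch and not part of the slides: the EOI end-of-input token, the array-backed getToken, the use of setjmp/longjmp to stop at the first error, and the entry point name parse_statement.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* EOI (end of input) is an addition of this sketch. */
enum token { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, LP, RP, EOI };

static const enum token *input;      /* hypothetical pre-lexed token array */
static size_t input_len, input_pos;
static enum token tok;               /* current lookahead token */
static jmp_buf on_error;             /* where error() jumps back to */

static enum token getToken(void)
{
    return input_pos < input_len ? input[input_pos++] : EOI;
}

static void advance(void) { tok = getToken(); }

static void error(enum token t, const char *msg)
{
    fprintf(stderr, "syntax error at token %d: %s\n", (int)t, msg);
    longjmp(on_error, 1);            /* give up at the first error */
}

static void eat(enum token t)
{
    if (tok == t) advance(); else error(tok, "unexpected token");
}

static void S(void);
static void L(void);
static void E(void);

static void S(void)
{
    switch (tok) {
    case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
    case BEGIN: eat(BEGIN); S(); L(); break;
    case PRINT: eat(PRINT); eat(LP); E(); eat(RP); break;
    default:    error(tok, "expecting if, begin, or print");
    }
}

static void L(void)
{
    switch (tok) {
    case END:  eat(END); break;
    case SEMI: eat(SEMI); S(); L(); break;
    default:   error(tok, "expecting end or semicolon");
    }
}

static void E(void)
{
    switch (tok) {
    case NUM: eat(NUM); eat(EQ); eat(NUM); break;
    default:  error(tok, "expecting a number");
    }
}

/* Returns 1 if the tokens form exactly one statement S, 0 otherwise. */
int parse_statement(const enum token *toks, size_t n)
{
    input = toks; input_len = n; input_pos = 0;
    if (setjmp(on_error))
        return 0;
    advance();
    S();
    if (tok != EOI)
        error(tok, "trailing input");
    return 1;
}
```

Each procedure decides which production to use by looking only at the current token, which is what makes the parser predictive.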
33Predictive Parser for Arithmetic Expressions
- 1 E → E + T
- 2 E → T
- 3 T → T * F
- 4 T → F
- 5 F → id
- 6 F → (E)
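The grammar on this slide is left-recursive (E appears first on the right-hand side of its own rule), so a procedure E() written directly from rule 1 would call itself before consuming any token. One common workaround, which is an assumption of this sketch and not something the slides prescribe, is to replace the left recursion by loops: E becomes T followed by any number of "+ T", and T becomes F followed by any number of "* F". The sketch reads characters instead of tokens and uses numbers in place of id so that it can also compute a value; all names are hypothetical.

```c
#include <ctype.h>

static const char *p;        /* cursor into the input string */
static int failed;           /* set on the first syntax error */

static void skip_ws(void) { while (*p == ' ') p++; }
static int expr(void);

/* F -> number | ( E ) */
static int factor(void)
{
    skip_ws();
    if (*p == '(') {
        p++;
        int v = expr();
        skip_ws();
        if (*p == ')') p++; else failed = 1;
        return v;
    }
    if (isdigit((unsigned char)*p)) {
        int v = 0;
        while (isdigit((unsigned char)*p))
            v = v * 10 + (*p++ - '0');
        return v;
    }
    failed = 1;
    return 0;
}

/* T -> F { * F } */
static int term(void)
{
    int v = factor();
    for (skip_ws(); *p == '*'; skip_ws()) { p++; v *= factor(); }
    return v;
}

/* E -> T { + T } */
static int expr(void)
{
    int v = term();
    for (skip_ws(); *p == '+'; skip_ws()) { p++; v += term(); }
    return v;
}

/* Parses and evaluates `s`; returns 1 and stores the value in *out on
   success, returns 0 on a syntax error. */
int evaluate(const char *s, int *out)
{
    p = s;
    failed = 0;
    int v = expr();
    skip_ws();
    if (failed || *p != '\0')
        return 0;
    *out = v;
    return 1;
}
```

Because term() handles * inside each operand of +, multiplication binds tighter than addition, the same precedence that the E/T/F layering of the grammar encodes.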
34Summary
- Context free grammars provide a natural way to define the syntax of programming languages
- Ambiguity may be resolved
- Predictive parsing is natural
- Good error messages
- Natural error recovery
- But not expressive enough
- Next lesson: LR bottom-up parsing