Title: Parsing


1
Parsing
  • Compiler
  • Baojian Hua
  • bjhua_at_ustc.edu.cn

2
Front End
[Diagram: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR]
3
Parsing
  • The parser translates the source program into
    abstract syntax trees
  • Input: token sequence
  • from the lexer
  • Output: abstract syntax trees
  • checks the validity of programs
  • builds the compiler's internal data structures for
    programs
  • Must take the program syntax into account

4
Conceptually
[Diagram: token sequence + language syntax -> parser -> abstract syntax tree]
5
Syntax: Context-free Grammar
  • Context-free grammars are (often) given by BNF
    expressions (Backus-Naur Form)
  • read Dragon sec 2.2
  • More expressive than regular expressions (RE) in theory
  • Good for defining language syntax

6
Context-free Grammar (CFG)
  • A CFG consists of 4 components
  • a set of terminals (tokens) T
  • a set of nonterminals N
  • a set of production rules P
  • s -> t1 t2 ... tn
  • with s ∈ N, and t1, ..., tn ∈ (T ∪ N)
  • a unique start nonterminal S

7
Example
  • // Recall the min-ML language in code3
  • // (simplified)
  • N = {decs, dec, exp}
  • T = {SEMICOLON, VAL, ID, ASSIGN, NUM}
  • S = decs
  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM
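
The four components of this example can be written down as plain data. The following is a minimal sketch in Python, not part of the original slides: the names `Grammar`, `MIN_ML` and `check_grammar` are illustrative, and the empty alternative for `decs` is an assumption (inferred from the EOF case of `parseDecs` later in the deck).

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    nonterminals: frozenset
    terminals: frozenset
    productions: dict      # nonterminal -> list of bodies (tuples of symbols)
    start: str

MIN_ML = Grammar(
    nonterminals=frozenset({"decs", "dec", "exp"}),
    terminals=frozenset({"SEMICOLON", "VAL", "ID", "ASSIGN", "NUM"}),
    productions={
        # () is the empty body; this alternative is an assumption
        "decs": [("dec", "SEMICOLON", "decs"), ()],
        "dec": [("VAL", "ID", "ASSIGN", "exp")],
        "exp": [("ID",), ("NUM",)],
    },
    start="decs",
)

def check_grammar(g: Grammar) -> bool:
    """Return True if g satisfies the four-component definition of a CFG."""
    symbols = g.nonterminals | g.terminals
    return (
        g.nonterminals.isdisjoint(g.terminals)
        and g.start in g.nonterminals
        and set(g.productions) <= g.nonterminals
        and all(sym in symbols
                for bodies in g.productions.values()
                for body in bodies
                for sym in body)
    )

print(check_grammar(MIN_ML))
```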

8
Derivation
  • A derivation
  • starts with the unique start nonterminal S
  • repeatedly replaces a nonterminal s in the current
    string by the body (right-hand side) of a
    production rule for s
  • stops when only terminals remain
  • The final string consists of terminals only and
    is called a sentence (program)
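
The derivation process can be replayed mechanically. This is a small Python sketch under stated assumptions: it always rewrites the leftmost nonterminal, the name `leftmost_derivation` is illustrative, and the empty alternative for `decs` is assumed so that the derivation can terminate.

```python
PRODUCTIONS = {
    "decs": [["dec", "SEMICOLON", "decs"], []],   # [] = empty body (assumed)
    "dec": [["VAL", "ID", "ASSIGN", "exp"]],
    "exp": [["ID"], ["NUM"]],
}

def leftmost_derivation(start, choices):
    """Yield each sentential form; choices[i] picks the alternative used at step i."""
    form = [start]
    yield list(form)
    for choice in choices:
        # index of the leftmost nonterminal in the current sentential form
        i = next(k for k, sym in enumerate(form) if sym in PRODUCTIONS)
        form[i:i + 1] = PRODUCTIONS[form[i]][choice]
        yield list(form)

# Choices that derive the tokens of:  val x = 5; val y = x;
steps = list(leftmost_derivation("decs", [0, 0, 1, 0, 0, 0, 1]))
for form in steps:
    print(" ".join(form) or "(empty)")
```

Each printed line is one sentential form; the last line contains terminals only, so it is a sentence of the grammar.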

9
Example
  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM

decs -> ... (a choice)
Derive me: val x = 5; val y = x;
10
Example
  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM

decs -> dec SEMICOLON decs
     -> VAL ID ASSIGN exp SEMICOLON decs
     -> VAL ID ASSIGN NUM SEMICOLON decs
     -> VAL ID ASSIGN NUM SEMICOLON dec SEMICOLON decs
     -> ...
     -> VAL ID ASSIGN NUM SEMICOLON VAL ID ASSIGN ID SEMICOLON decs
     -> VAL ID ASSIGN NUM SEMICOLON VAL ID ASSIGN ID SEMICOLON
Derive me: val x = 5; val y = x;
11
Another Way to Derive the same Program
  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM

decs -> dec SEMICOLON decs
     -> dec SEMICOLON dec SEMICOLON decs
     -> ...
Derive me: val x = 5; val y = x;
12
Derivation
  • For the same string, there may exist many derivations
  • left-most derivation
  • right-most derivation
  • Parsing is the problem of taking a string of
    terminals and figuring out whether it can be
    derived from a CFG
  • error-detection

13
Parse Trees
  • Derivations can also be represented as trees
  • useful for understanding ASTs (discussed later)
  • Idea
  • each internal node is labeled with a nonterminal
  • each leaf node is labeled with a terminal
  • each use of a rule in a derivation explains how
    to generate the children of a node in the parse
    tree
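
A parse tree can be held in a very small data structure. The sketch below (illustrative names, not from the slides) stores a tree as nested tuples, reads the sentence off the leaves, and lists the rule used at each internal node. The `decs` node with no children assumes an empty alternative for `decs`.

```python
def leaves(tree):
    """Return the terminals at the leaves, left to right."""
    if isinstance(tree, str):          # a leaf is just the terminal's name
        return [tree]
    _label, *children = tree
    return [leaf for child in children for leaf in leaves(child)]

def rules_used(tree):
    """Return the production applied at each internal node, in pre-order."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    body = [c if isinstance(c, str) else c[0] for c in children]
    rule = f"{label} -> {' '.join(body) or '(empty)'}"
    return [rule] + [r for child in children for r in rules_used(child)]

# Parse tree for the tokens of:  val x = 5;
tree = ("decs",
        ("dec", "VAL", "ID", "ASSIGN", ("exp", "NUM")),
        "SEMICOLON",
        ("decs",))                     # decs deriving the empty string (assumed rule)

print(leaves(tree))
print(rules_used(tree))
```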

14
Example
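  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM

Derive me: val x = 5; val y = x;
[Parse tree: the root decs has children dec, SEMI and decs. The first dec has children VAL, ID, = and exp, and that exp has the leaf 5. The inner decs again has children dec, SEMI and decs; its dec is a similar case.]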
15
Different Derivations, same Tree
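decs -> dec SEMICOLON decs
     -> VAL ID ASSIGN exp SEMICOLON decs
     -> ...
decs -> dec SEMICOLON decs
     -> dec SEMICOLON dec SEMICOLON decs
     -> ...
Derive me: val x = 5; val y = x;
[Parse tree, identical for both derivations: the root decs has children dec, SEMI and decs. The first dec has children VAL, ID, = and exp, and that exp has the leaf 5. The inner decs again has children dec, SEMI and decs; its dec is a similar case.]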
16
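Parse Tree has Meanings: post-order traversal
decs -> dec SEMICOLON decs
     -> VAL ID ASSIGN exp SEMICOLON decs
     -> ...
decs -> dec SEMICOLON decs
     -> dec SEMICOLON dec SEMICOLON decs
     -> ...
Derive me: val x = 5; val y = x;
[Parse tree: the root decs has children dec, SEMI and decs. The first dec has children VAL, ID, = and exp, and that exp has the leaf 5. The inner decs again has children dec, SEMI and decs; its dec is a similar case.]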
17
Ambiguous Grammars
  • A grammar is ambiguous if the same sequence of
    tokens can give rise to two or more different
    parse trees

18
Example
  • exp -> num
  •      | id
  •      | exp + exp
  •      | exp * exp

Derive me: 3+4*5
exp -> exp + exp -> 3 + exp -> 3 + exp * exp -> 3 + 4 * exp -> 3 + 4 * 5
exp -> exp * exp -> exp + exp * exp -> 3 + exp * exp -> 3 + 4 * exp -> 3 + 4 * 5
19
Example
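  • exp -> num
  •      | id
  •      | exp + exp
  •      | exp * exp

exp -> exp + exp -> 3 + exp -> 3 + exp * exp -> 3 + 4 * exp -> 3 + 4 * 5
[Parse tree 1: the root exp has children exp, + and exp. The left exp is the leaf 3; the right exp has children exp (4), * and exp (5). It groups as 3 + (4 * 5).]
exp -> exp * exp -> exp + exp * exp -> 3 + exp * exp -> 3 + 4 * exp -> 3 + 4 * 5
[Parse tree 2: the root exp has children exp, * and exp. The left exp has children exp (3), + and exp (4); the right exp is the leaf 5. It groups as (3 + 4) * 5.]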
20
Ambiguous Grammars
  • Problem: compilers make use of parse trees to
    interpret the meaning of parsed programs
  • different parse trees have different meanings
  • e.g. 4 + 5 * 6 is not (4 + 5) * 6
  • languages with ambiguous grammars are DISASTROUS:
    the meaning of programs isn't well-defined! You
    can't tell what your program might do!
  • Solution: rewrite the grammar into an equivalent, unambiguous form
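
To make the problem concrete, the sketch below enumerates every parse tree of a token list under the ambiguous grammar exp -> num | exp + exp | exp * exp and evaluates each tree bottom-up (a post-order traversal). It is an illustration only: the function names are hypothetical, and the operators in the example (4 + 5 * 6) are an assumption, since they were lost in the transcript.

```python
def parse_trees(tokens):
    """Return all parse trees; a tree is an int or a tuple (left, op, right)."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):          # any operator may be the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree):
    """Post-order traversal: evaluate both children, then apply the operator."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs * rhs

trees = parse_trees([4, "+", 5, "*", 6])
for t in trees:
    print(t, "=>", evaluate(t))
```

The same token sequence yields two trees with two different values, which is exactly why the grammar has to be rewritten.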

21
Eliminating ambiguity
  • In programming language syntax, ambiguity often
    arises from missing operator precedence or
    associativity
  • * has higher precedence than +
  • both + and * are left-associative
  • Why or why not?
  • Rewrite the grammar to take this into account

22
Example
  • exp -> num
  •      | id
  •      | exp + exp
  •      | exp * exp

exp -> exp + term
     | term
term -> term * factor
      | factor
factor -> num
        | id
Q: is the right grammar ambiguous? Why or why not?
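
One way to explore the question experimentally is to count parse trees. The sketch below is a brute-force counter, not an algorithm from the slides; it assumes the grammar has no empty bodies and no unit cycles, and the names `count_parse_trees`, `AMBIGUOUS` and `STRATIFIED` are illustrative. Counting a single sentence does not prove unambiguity in general, it only checks that sentence.

```python
from functools import lru_cache

def count_parse_trees(productions, start, tokens):
    """Count the parse trees of `tokens` (a list of token kinds) from `start`."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        if symbol not in productions:                  # terminal
            return int(j == i + 1 and tokens[i] == symbol)
        return sum(ways(body, i, j) for body in productions[symbol])

    @lru_cache(maxsize=None)
    def ways(body, i, j):
        if not body:
            return int(i == j)
        if len(body) == 1:
            return count(body[0], i, j)
        rest = body[1:]
        # the first symbol covers tokens[i:k]; every symbol needs >= 1 token
        return sum(count(body[0], i, k) * ways(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return count(start, 0, len(tokens))

AMBIGUOUS = {"exp": [("num",), ("id",), ("exp", "+", "exp"), ("exp", "*", "exp")]}
STRATIFIED = {
    "exp": [("exp", "+", "term"), ("term",)],
    "term": [("term", "*", "factor"), ("factor",)],
    "factor": [("num",), ("id",)],
}

sentence = ["num", "+", "num", "*", "num"]             # e.g. 3 + 4 * 5
print(count_parse_trees(AMBIGUOUS, "exp", sentence))
print(count_parse_trees(STRATIFIED, "exp", sentence))
```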
23
Parser
  • A program to check whether a program is derivable
    from a given grammar
  • expensive in general
  • must be fast
  • e.g. to compile a kernel of 2000k lines
  • even for small application code
  • Theorists have developed specialized kinds of
    grammars which can be parsed efficiently
  • LL(k) and LR(k)

24
Predictive parsing
  • A.K.A. recursive descent parsing, top-down
    parsing
  • simple to code by hand
  • efficient
  • can parse a large set of grammars
  • Key idea
  • one (recursive) function for each nonterminal
  • one clause for each production rule (right-hand side)

25
Example
  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM

(* step 1: represent tokens *)
datatype token = Val | Id of string | Num of int | Assign | Semicolon | Eof

(* step 2: connect with lexer *)
val current : token ref = ref (getToken ())
fun advance () = current := getToken ()
fun eat (t : token) =
    if !current = t
    then advance ()
    else error ("want ", t, ", but got ", !current)
26
  • decs -> dec SEMICOLON decs
  •       | ε
  • dec -> VAL ID ASSIGN exp
  • exp -> ID
  •      | NUM

(* step 1: represent tokens *)
datatype token = Val | Id of string | Num of int | Assign | Semi | Eof

(* step 2: connect with lexer *)
val current : token ref = ref (getToken ())
fun advance () = current := getToken ()
fun eat (t : token) = ...

(* step 3: build the parser *)
fun parseDecs () =
    case !current of
        Val => (parseDec (); eat Semi; parseDecs ())
      | Eof => ()
      | _ => error "want VAL or EOF"
and parseDec () = ...
and parseExp () = ...
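
For comparison, here is a complete version of the same three steps as a Python sketch. It is not the author's code: the token representation as (kind, text) pairs, the `Parser` class and the shape of the returned tree are all assumptions made for illustration, and a real lexer is assumed to supply the tokens.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [("EOF", "")]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos][0]

    def eat(self, kind):
        """Consume the current token if it has the expected kind."""
        if self.current != kind:
            raise ParseError(f"want {kind}, but got {self.current}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def parse_decs(self):              # one function per nonterminal
        if self.current == "VAL":      # one clause per production rule
            dec = self.parse_dec()
            self.eat("SEMICOLON")
            return [dec] + self.parse_decs()
        if self.current == "EOF":      # the empty alternative
            return []
        raise ParseError("want VAL or EOF")

    def parse_dec(self):
        self.eat("VAL")
        name = self.eat("ID")
        self.eat("ASSIGN")
        return ("val", name, self.parse_exp())

    def parse_exp(self):
        if self.current == "ID":
            return ("id", self.eat("ID"))
        if self.current == "NUM":
            return ("num", int(self.eat("NUM")))
        raise ParseError("want ID or NUM")

# Tokens of:  val x = 5; val y = x;
tokens = [("VAL", "val"), ("ID", "x"), ("ASSIGN", "="), ("NUM", "5"), ("SEMICOLON", ";"),
          ("VAL", "val"), ("ID", "y"), ("ASSIGN", "="), ("ID", "x"), ("SEMICOLON", ";")]
ast = Parser(tokens).parse_decs()
print(ast)
```

Each function looks only at the current token to decide which clause to take, which is what makes the parser predictive.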
27
Moral
  • The key point in predictive parsing is to
    determine the production rule to use (the
    recursive function to call)
  • must know the start symbols of each rule
  • start symbols must not overlap
  • e.g. exp -> NUM | ID
  • This motivates the idea of First and Follow sets

28
Moral
  • Current nonterminal is S, and the current input
    token is t
  • if wk starts with t, then choose wk, or
  • if wk derives the empty string and the string following
    S starts with t, then choose wk
  • First symbol sets of wi (1 <= i <= n) don't overlap,
    to avoid backtracking
  • S -> w1
  •    | w2
  •    | ...
  •    | wn

29
Nullable, First and Follow sets
  • To use predictive parsing, we must compute
  • Nullable: nonterminals that derive the empty string
  • First(γ): set of terminals that can begin any
    string derivable from γ
  • Follow(X): set of terminals that can immediately
    follow any string derivable from nonterminal X
  • Read Dragon sec 4.4.2 and Tiger sec 3.2
  • Fixpoint algorithms

30
Nullable, First and Follow sets
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
  • Which of the symbols X, Y and Z can derive the empty string?
  • What terminals may the strings derived from X, Y
    and Z begin with?
  • What terminals may follow X, Y and Z?

31
Nullable
  • X can derive the empty string iff
  • base case
  • X -> ε
  • inductive case
  • X -> Y1 ... Yn
  • Y1, ..., Yn are n nonterminals that can all derive
    the empty string

32
Computing Nullable
Nullable <- {}
while (Nullable still changes)
    for (each production X -> α)
        switch (α)
        case ε:
            Nullable ∪= {X}
            break
        case Y1 ... Yn:
            if (Y1 ∈ Nullable && ... && Yn ∈ Nullable)
                Nullable ∪= {X}
            break
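
The fixpoint loop above can be sketched in a few lines of Python. The grammar encoding (a dict from nonterminal to a list of bodies, with the empty list standing for ε) and the name `compute_nullable` are assumptions made for illustration.

```python
GRAMMAR = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "Y": [["c"], []],              # [] is the empty body (Y -> ε)
    "X": [["Y"], ["a"]],
}

def compute_nullable(grammar):
    nullable = set()
    changed = True
    while changed:                 # repeat until a full pass changes nothing
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                # all() is True for the empty body, which covers the base case;
                # terminals are never in `nullable`, so they block the rule
                if head not in nullable and all(s in nullable for s in body):
                    nullable.add(head)
                    changed = True
    return nullable

print(sorted(compute_nullable(GRAMMAR)))
```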

33
Example Nullables
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Round      0
Nullable   {}
34
Example Nullables
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Round      0    1
Nullable   {}   {Y, X}
35
Example Nullables
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Round      0    1        2
Nullable   {}   {Y, X}   {Y, X}
36
First(X)
  • Set of terminals that strings derived from X can begin with
  • X =>* a ...
  • Rules
  • base case
  • X -> a ...
  • First(X) ∪= {a}
  • inductive case
  • X -> Y1 Y2 ... Yn
  • First(X) ∪= First(Y1)
  • if Y1 ∈ Nullable, First(X) ∪= First(Y2)
  • if Y1, Y2 ∈ Nullable, First(X) ∪= First(Y3)
  • ...

37
Computing First
// Suppose Nullable has been computed
First(X) <- {}   // for each X
while (First still changes)
    for (each production X -> α)
        switch (α)
        case a ...:
            First(X) ∪= {a}
            break
        case Y1 ... Yn:
            First(X) ∪= First(Y1)
            if (Y1 ∉ Nullable)
                break
            First(X) ∪= First(Y2)
            // similar as above for Y3 ... Yn
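
A compact Python sketch of this computation follows. It assumes Nullable is already known and uses an illustrative grammar encoding (dict of bodies, empty list for ε); `compute_first` is a hypothetical name. The inner loop walks a body from left to right and stops at the first symbol that is not nullable.

```python
GRAMMAR = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "Y": [["c"], []],              # [] is the empty body (Y -> ε)
    "X": [["Y"], ["a"]],
}
NULLABLE = {"X", "Y"}              # assumed to be computed beforehand

def compute_first(grammar, nullable):
    first = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for sym in body:
                    # a terminal contributes itself; a nonterminal its First set
                    new = first[sym] if sym in grammar else {sym}
                    if not new <= first[head]:
                        first[head] |= new
                        changed = True
                    if sym not in nullable:
                        break      # later symbols cannot begin the string
    return first

for name, terminals in compute_first(GRAMMAR, NULLABLE).items():
    print(name, sorted(terminals))
```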

38
Example First
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
Round      0
First(Z)   {}
First(Y)   {}
First(X)   {}
39
Example First
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
Round      0    1
First(Z)   {}   {d}
First(Y)   {}   {c}
First(X)   {}   {c, a}
40
Example First
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
Round      0    1        2
First(Z)   {}   {d}      {d, c, a}
First(Y)   {}   {c}      {c}
First(X)   {}   {c, a}   {c, a}
41
Example First
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
Round      0    1        2           3
First(Z)   {}   {d}      {d, c, a}   {d, c, a}
First(Y)   {}   {c}      {c}         {c}
First(X)   {}   {c, a}   {c, a}      {c, a}
42
Parsing with First
Production    First of the body
Z -> d        {d}
Z -> X Y Z    {a, c, d}
Y -> c        {c}
Y -> ε        {}
X -> Y        {c}
X -> a        {a}
Now consider this string: d. Suppose we choose
the production Z -> X Y Z. But we get stuck at
X -> Y | a: neither can accept d! Why?
First(Z) = {d, c, a}
First(Y) = {c}
First(X) = {c, a}
Nullable = {X, Y}
43
Follow(X)
  • Set of terminals that may follow X
  • S =>* ... X a ...
  • Rules
  • Base case
  • Follow(X) = {}
  • inductive case
  • Y -> β1 X β2
  • Follow(X) ∪= First(β2)
  • if β2 is Nullable, Follow(X) ∪= Follow(Y)

44
Computing Follow(X)
Follow(X) <- {}   // for each X
while (Follow still changes)
    for (each production Y -> β1 X β2)
        Follow(X) ∪= First(β2)
        if (β2 is Nullable)
            Follow(X) ∪= Follow(Y)
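
The Follow computation can be sketched the same way. In this Python illustration, Nullable and First are written out by hand for the running example, and the helper `first_of_string` (a hypothetical name) extends First from single symbols to strings of symbols. No end-of-input marker is added, so Follow(Z) stays empty, as in the slides.

```python
GRAMMAR = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "Y": [["c"], []],              # [] is the empty body (Y -> ε)
    "X": [["Y"], ["a"]],
}
NULLABLE = {"X", "Y"}
FIRST = {"Z": {"d", "c", "a"}, "Y": {"c"}, "X": {"c", "a"}}

def first_of_string(symbols):
    """Return (First set, is_nullable) for a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        result |= FIRST.get(sym, {sym})      # a terminal's First set is itself
        if sym not in NULLABLE:
            return result, False
    return result, True

def compute_follow(grammar):
    follow = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue             # Follow is defined for nonterminals
                    new, rest_nullable = first_of_string(body[i + 1:])
                    if rest_nullable:
                        new = new | follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

for name, terminals in compute_follow(GRAMMAR).items():
    print(name, sorted(terminals))
```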

45
Example Follow
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
     First        Follow, round 0
Z    {d, c, a}    {}
Y    {c}          {}
X    {c, a}       {}
46
Example Follow
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
     First        Follow, round 0   round 1
Z    {d, c, a}    {}                {}
Y    {c}          {}                {d, c, a}
X    {c, a}       {}                {d, c, a}
47
Example Follow
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
     First        Follow, round 0   round 1     round 2
Z    {d, c, a}    {}                {}          {}
Y    {c}          {}                {d, c, a}   {d, c, a}
X    {c, a}       {}                {d, c, a}   {d, c, a}
48
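Predictive Parsing Table
  • With Nullable, First(), and Follow(), we can
    make a parsing table P(N, T)
  • each entry contains a set of productions

[Table: rows are the nonterminals N1, N2, N3, ...; columns are the terminals t1, t2, t3, t4, ... and (EOF); the filled entries hold productions such as ri, rj, rk.]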
49
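Predictive Parsing Table
  • For each rule X -> γ
  • for each a ∈ First(γ), add X -> γ to P(X, a)
  • if γ is nullable, add X -> γ to P(X, b) for each
    b ∈ Follow(X)
  • all other entries are errors

[Table: rows are the nonterminals N1, N2, N3, ...; columns are the terminals t1, t2, t3, t4, ... and (EOF); the filled entries hold productions such as r1, ri, rk.]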
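
The table construction can be sketched in Python as follows. Nullable, First and Follow are written out by hand for the running grammar Z -> d | X Y Z, Y -> c | ε, X -> Y | a; the dict-based table and the names `first_of_string` and `build_table` are assumptions for illustration. A production with a nullable body is also entered under every terminal in the Follow set of its left-hand side.

```python
GRAMMAR = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "Y": [["c"], []],              # [] is the empty body (Y -> ε)
    "X": [["Y"], ["a"]],
}
NULLABLE = {"X", "Y"}
FIRST = {"Z": {"d", "c", "a"}, "Y": {"c"}, "X": {"c", "a"}}
FOLLOW = {"Z": set(), "Y": {"a", "c", "d"}, "X": {"a", "c", "d"}}

def first_of_string(symbols):
    """Return (First set, is_nullable) for a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        result |= FIRST.get(sym, {sym})
        if sym not in NULLABLE:
            return result, False
    return result, True

def build_table(grammar):
    """Map (nonterminal, terminal) to the list of candidate production bodies."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            lookahead, body_nullable = first_of_string(body)
            if body_nullable:
                lookahead = lookahead | FOLLOW[head]
            for terminal in lookahead:
                table.setdefault((head, terminal), []).append(body)
    return table

table = build_table(GRAMMAR)
for cell in sorted(table):
    print(cell, table[cell])
```

Cells that never receive a production are simply absent from the dict; they correspond to the error entries.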
50
Example Predictive Parsing Table
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}

     a                 c                 d
Z    Z -> X Y Z        Z -> X Y Z        Z -> d, Z -> X Y Z
Y    Y -> ε            Y -> c, Y -> ε    Y -> ε
X    X -> Y, X -> a    X -> Y            X -> Y

     First        Follow
X    {c, a}       {c, d, a}
Y    {c}          {c, d, a}
Z    {d, c, a}    {}
52
LL(1)
  • A context-free grammar is called LL(1) if it can
    be parsed this way
  • Left-to-right parsing
  • Leftmost derivation
  • 1 token lookahead
  • This means that in the predictive parsing table,
    there is at most one production in every entry
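
The "at most one production per entry" condition is easy to check once a table exists, and it is what lets a table-driven parser run without backtracking. The Python sketch below writes two tables out by hand: the one for the running Z/Y/X example, and one for the min-ML grammar that assumes an empty alternative for decs with EOF in Follow(decs). The names `is_ll1` and `accepts` are illustrative.

```python
# Cells map (nonterminal, lookahead) to a list of production bodies.
ZYX_TABLE = {
    ("Z", "a"): [["X", "Y", "Z"]], ("Z", "c"): [["X", "Y", "Z"]],
    ("Z", "d"): [["d"], ["X", "Y", "Z"]],
    ("Y", "a"): [[]], ("Y", "c"): [["c"], []], ("Y", "d"): [[]],
    ("X", "a"): [["Y"], ["a"]], ("X", "c"): [["Y"]], ("X", "d"): [["Y"]],
}
MIN_ML_TABLE = {
    ("decs", "VAL"): [["dec", "SEMICOLON", "decs"]],
    ("decs", "EOF"): [[]],                      # assumed empty alternative
    ("dec", "VAL"): [["VAL", "ID", "ASSIGN", "exp"]],
    ("exp", "ID"): [["ID"]],
    ("exp", "NUM"): [["NUM"]],
}

def is_ll1(table):
    """LL(1): no cell of the parsing table holds more than one production."""
    return all(len(bodies) <= 1 for bodies in table.values())

def accepts(table, nonterminals, start, tokens):
    """Table-driven predictive parse; assumes is_ll1(table) holds."""
    stack = ["EOF", start]
    tokens = list(tokens) + ["EOF"]
    pos = 0
    while stack:
        top = stack.pop()
        if top in nonterminals:
            bodies = table.get((top, tokens[pos]))
            if not bodies:
                return False                     # empty cell: syntax error
            stack.extend(reversed(bodies[0]))    # leftmost symbol ends up on top
        elif top == tokens[pos]:
            pos += 1                             # terminal matches: consume it
        else:
            return False
    return pos == len(tokens)

print(is_ll1(ZYX_TABLE), is_ll1(MIN_ML_TABLE))
print(accepts(MIN_ML_TABLE, {"decs", "dec", "exp"}, "decs",
              ["VAL", "ID", "ASSIGN", "NUM", "SEMICOLON"]))
```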

53
Speeding up set Construction
  • All these sets (Nullable, First, Follow) can be
    computed simultaneously
  • see Tiger algorithm 3.13
  • Order the computation
  • What's the optimal order in which to compute these sets?

54
Example Speeding up set Construction
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
Round      0
First(Z)   {}
First(Y)   {}
First(X)   {}
Q1: What's a reasonable order here?
Q2: How to set this order?
55
Directed Graph Model
Z -> d | X Y Z
Y -> c | ε
X -> Y | a
Nullable = {X, Y}
[Directed graph with nodes Z, Y and X, annotated with First(Z) = {d, c, a}, First(Y) = {c} and First(X) = {c, a}]
Q1: What's a reasonable order here?
Q2: How to set this order?
Order: Y, X, Z
56
Reverse Topological Sort
  • Quasi-topologically sort the directed graph
  • Quasi: a true topological sort of a general directed
    graph is impossible (it may contain cycles)
  • also known as reverse depth-first ordering
  • Reverse: information (First) flows from
    successors to predecessors
  • Refer to your favorite algorithm book
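
One possible way to obtain such an order is sketched below in Python. The dependency graph is an assumption about what the slide's figure shows: an edge A -> B means that First(A) needs First(B). Self-loops are dropped, and a depth-first post-order lists successors before predecessors. The function names are illustrative.

```python
GRAMMAR = {
    "Z": [["d"], ["X", "Y", "Z"]],
    "Y": [["c"], []],              # [] is the empty body (Y -> ε)
    "X": [["Y"], ["a"]],
}
NULLABLE = {"X", "Y"}

def first_dependencies(grammar, nullable):
    """Edge A -> B when First(A) depends on First(B)."""
    deps = {a: [] for a in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            for sym in body:
                if sym in grammar and sym != head and sym not in deps[head]:
                    deps[head].append(sym)
                if sym not in nullable:
                    break          # symbols further right cannot contribute
    return deps

def reverse_dfs_order(deps, root):
    """Depth-first post-order from root: successors come before predecessors."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for succ in deps[node]:
            if succ not in seen:
                visit(succ)
        order.append(node)

    visit(root)
    return order

deps = first_dependencies(GRAMMAR, NULLABLE)
print(deps)
print(reverse_dfs_order(deps, "Z"))
```

Processing the nonterminals in this order lets each First set be computed after the sets it depends on, so fewer rounds are needed; with cycles in the graph the fixpoint loop is still required.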

57
Problem
  • LL(1) can only be used with grammars in which
    the production rules for each nonterminal start
    with different terminals
  • Unfortunately, many grammars don't have this
    perfect property

58
Example
  • exp -> num
  •      | id
  •      | exp + exp
  •      | exp * exp

exp -> exp + term
     | term
term -> term * factor
      | factor
factor -> num
        | id
Q: is the right grammar LL(1)? Why or why not?
59
Solutions
  • Left-recursion elimination
  • Left-factoring
  • Read
  • Dragon sec 4.3.2, 4.3.3, 4.3.4
  • Tiger sec 3.2

60
Example
exp -> term exp'
exp' -> + term exp'
      | ε
term -> factor term'
term' -> * factor term'
       | ε
factor -> num
        | id

exp -> exp + term
     | term
term -> term * factor
      | factor
factor -> num
        | id
Q: is the right grammar LL(1)? Are those two
grammars equivalent?
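
The step from the left-recursive grammar to the primed one follows the standard rewrite A -> A α | β into A -> β A', A' -> α A' | ε. The Python sketch below handles only this immediate case (not indirect left recursion); the function name and grammar encoding are illustrative.

```python
def eliminate_left_recursion(grammar):
    """Rewrite A -> A α | β  as  A -> β A',  A' -> α A' | ε (immediate case only)."""
    result = {}
    for head, bodies in grammar.items():
        recursive = [b[1:] for b in bodies if b and b[0] == head]   # the α parts
        others = [b for b in bodies if not b or b[0] != head]       # the β parts
        if not recursive:
            result[head] = bodies
            continue
        new = head + "'"
        result[head] = [b + [new] for b in others]
        result[new] = [a + [new] for a in recursive] + [[]]         # [] is ε
    return result

EXP = {
    "exp": [["exp", "+", "term"], ["term"]],
    "term": [["term", "*", "factor"], ["factor"]],
    "factor": [["num"], ["id"]],
}

for head, bodies in eliminate_left_recursion(EXP).items():
    print(head, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```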
61
LL(k)
  • LL(1) can be further generalized to LL(k)
  • Left-to-right parsing
  • Leftmost derivation
  • k token lookahead
  • Q: table size? Other problems with this approach?

62
Summary
  • Context-free grammars are a mathematical tool for
    specifying language syntax
  • and others
  • Writing parsers for general grammars is hard and
    costly
  • LL(k) and LR(k)
  • LL(1) grammars can be implemented efficiently
  • table-driven algorithms (again!)