Programming Language Concepts CIS 635 - PowerPoint PPT Presentation

1
Programming Language Concepts (CIS 635)

Elsa L Gunter
4303 GITC
NJIT, http://www.cs.njit.edu/elsa/635-spring2004

2
Attribute Grammars

Attribute grammars add to BNF grammars an
additional field with a function describing the
meaning (attributes) of the construct described
by the BNF rule
Attributes can be used to describe an interpreter
or even a simple compiler
Usually used to describe abstract syntax trees to
be generated

3
Example Attribute Grammar

<Sum> ::= 0
  value(<Sum>) = 0
<Sum> ::= 1
  value(<Sum>) = 1
<Sum> ::= <Sum> + <Sum>
  value(<Sum1>) = value(<Sum2>) + value(<Sum3>)
<Sum> ::= (<Sum>)
  value(<Sum1>) = value(<Sum2>)
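
The value attribute above is synthesized: each node's value is computed from the values of its children. A minimal C sketch of that bottom-up computation follows. The tree type Sum, the SumKind tags and the function sum_value are hypothetical names introduced only for this illustration; they are not part of the course code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree type for the <Sum> grammar; all names are illustrative. */
typedef enum { SUM_ZERO, SUM_ONE, SUM_PLUS, SUM_PAREN } SumKind;

typedef struct Sum {
    SumKind kind;
    const struct Sum *left;   /* <Sum2> for SUM_PLUS, inner <Sum> for SUM_PAREN */
    const struct Sum *right;  /* <Sum3> for SUM_PLUS, otherwise NULL */
} Sum;

/* Synthesized attribute: a node's value is computed from its children. */
int sum_value(const Sum *s) {
    assert(s != NULL);
    switch (s->kind) {
    case SUM_ZERO:  return 0;
    case SUM_ONE:   return 1;
    case SUM_PLUS:  return sum_value(s->left) + sum_value(s->right);
    case SUM_PAREN: return sum_value(s->left);
    }
    return 0; /* not reached for well-formed trees */
}
```

Each case of the switch corresponds to one grammar rule and its attribute equation.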

4
Attribute Grammars

An inherited attribute describes the meaning of
the nonterminals on the right in the rule in
terms of the nonterminal on the left
A synthesized attribute describes the meaning of
the left-hand nonterminal in terms of the
nonterminals on the right

5
YACC

The input to YACC (a parser generator for C) is
basically an attribute grammar with only
synthesized attributes (inherited attributes can
be handled using global variables (typically
tables))
ML-YACC is a version of YACC that produces SML
code instead of C code

6
Parsing Programs

Parsing is the process of tracing or constructing
a parse tree for a given input string
Process usually broken into two phases:
Lexer: generating tokens from string or character
stream
Parser: generating parse tree from token list or
stream
Lexer called from parser

7
Recursive Descent Parsing

Recursive descent parsers are a class of parsers
derived fairly directly from a BNF grammar
A recursive descent parser traces out a parse
tree in top-down order; it is a top-down parser

8
Recursive Descent Parsing

Each nonterminal in the grammar has a subprogram
associated with it; the subprogram parses all
phrases that the nonterminal can generate
Each nonterminal in the right-hand side of a rule
corresponds to a recursive call to the
associated subprogram

9
Recursive Descent Parsing

Each subprogram must be able to decide how to
begin parsing by looking at the left-most
character in the string to be parsed
May do so directly, or indirectly by calling
another parsing subprogram
Recursive descent parsers, like other top-down
parsers, cannot be built from left-recursive
grammars

10
Sample Grammar

<expr> ::= <term> | <term> + <expr>
         | <term> - <expr>
<term> ::= <factor> | <factor> * <term>
         | <factor> / <term>
<factor> ::= <id> | ( <expr> )

11
Tokens as SML Datatypes

+ - * / ( ) <id>
Becomes an SML datatype
datatype token =
    Id_token of string
  | Left_parenthesis | Right_parenthesis
  | Times_token | Divide_token
  | Plus_token | Minus_token

12
Parse Trees as Datatypes

<expr> ::= <term> | <term> + <expr>
         | <term> - <expr>
datatype Expr =
    Term_as_Expr of Term
  | Plus_Expr of (Term * Expr)
  | Minus_Expr of (Term * Expr)

13
Parse Trees as Datatypes

<term> ::= <factor> | <factor> * <term>
         | <factor> / <term>
and Term =
    Factor_as_Term of Factor
  | Mult_Term of (Factor * Term)
  | Div_Term of (Factor * Term)

14
Parse Trees as Datatypes

<factor> ::= <id> | ( <expr> )
and Factor =
    Id_as_Factor of string
  | Parenthesized_Expr_as_Factor of Expr

15
Parsing Lists of Tokens

Will create three mutually recursive functions
expr : token list -> (Expr * token list)
term : token list -> (Term * token list)
factor : token list -> (Factor * token list)
Each parses what it can and gives back parse and
remaining tokens
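
The "parse what you can and hand back the rest" convention can also be sketched in C. The version below is an illustration under stated assumptions, not the course's code: it only recognizes input and builds no parse tree, tokens sit in an array closed by a TOK_END marker (an addition of this sketch), and failure is reported by returning NULL instead of raising an exception. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds mirroring the SML token datatype; TOK_END marks the end of
   the array and is an addition of this sketch. */
typedef enum { TOK_ID, TOK_LPAREN, TOK_RPAREN, TOK_TIMES, TOK_DIVIDE,
               TOK_PLUS, TOK_MINUS, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    const char *name;   /* identifier text, only meaningful for TOK_ID */
} Token;

/* Each function returns the tokens left over after what it parsed,
   or NULL when it cannot parse anything at the given position. */
const Token *parse_expr(const Token *toks);

const Token *parse_factor(const Token *toks) {
    assert(toks != NULL);
    if (toks->kind == TOK_ID)
        return toks + 1;                      /* factor is an identifier */
    if (toks->kind == TOK_LPAREN) {           /* factor is ( expr ) */
        const Token *rest = parse_expr(toks + 1);
        if (rest != NULL && rest->kind == TOK_RPAREN)
            return rest + 1;
    }
    return NULL;
}

const Token *parse_term(const Token *toks) {
    const Token *rest = parse_factor(toks);
    if (rest != NULL && (rest->kind == TOK_TIMES || rest->kind == TOK_DIVIDE))
        return parse_term(rest + 1);          /* factor, operator, term */
    return rest;                              /* factor alone, or failure */
}

const Token *parse_expr(const Token *toks) {
    const Token *rest = parse_term(toks);
    if (rest != NULL && (rest->kind == TOK_PLUS || rest->kind == TOK_MINUS))
        return parse_expr(rest + 1);          /* term, operator, expr */
    return rest;                              /* term alone, or failure */
}
```

A caller that wants to accept only complete inputs would additionally check that the returned pointer refers to the TOK_END token; a pointer to any other token means some input was left unparsed.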

16
Parsing an Expression

<expr> ::= <term> [ ( + | - ) <expr> ]
fun expr tokens =
  (case term tokens
    of (term_parse, tokens_after_term) =>
      (case tokens_after_term
        of (Plus_token :: tokens_after_plus)
          =>

18
Parsing a Plus Expression

<expr> ::= <term> [ ( + | - ) <expr> ]
fun expr tokens =
  (case term tokens
    of (term_parse, tokens_after_term) =>
      (case tokens_after_term
        of (Plus_token :: tokens_after_plus)
          =>

19
Parsing a Plus Expression

<expr> ::= <term> + <expr>
(case expr tokens_after_plus
  of (expr_parse, tokens_after_expr) =>
    (Plus_Expr (term_parse, expr_parse),
     tokens_after_expr))

21
Building Plus Expression Parse Tree

<expr> ::= <term> + <expr>
(case expr tokens_after_plus
  of (expr_parse, tokens_after_expr) =>
    (Plus_Expr (term_parse, expr_parse),
     tokens_after_expr))

22
Parsing a Minus Expression

<expr> ::= <term> - <expr>
| (Minus_token :: tokens_after_minus) =>
  (case expr tokens_after_minus
    of (expr_parse, tokens_after_expr) =>
      (Minus_Expr (term_parse, expr_parse),
       tokens_after_expr))

24
Parsing an Expression as a Term

<expr> ::= <term>
| _ => (Term_as_Expr term_parse,
        tokens_after_term)))
Code for term is the same except for replacing
addition with multiplication and subtraction with
division

25
Parsing Factor as Id

<factor> ::= <id>
and factor (Id_token id_name :: tokens) =
    (Id_as_Factor id_name, tokens)

26
Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )
| factor (Left_parenthesis :: tokens) =
    (case expr tokens
      of (expr_parse, tokens_after_expr) =>

27
Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )
(case tokens_after_expr
  of Right_parenthesis :: tokens_after_rparen =>
    (Parenthesized_Expr_as_Factor expr_parse,
     tokens_after_rparen)))

28
( a + b ) * c - d

expr [Left_parenthesis, Id_token "a", Plus_token,
      Id_token "b", Right_parenthesis, Times_token,
      Id_token "c", Minus_token, Id_token "d"]

29
( a + b ) * c - d

val it = (Minus_Expr (Mult_Term
  (Parenthesized_Expr_as_Factor
    (Plus_Expr (Factor_as_Term
      (Id_as_Factor "a"), Term_as_Expr
      (Factor_as_Term (Id_as_Factor "b")))),
   Factor_as_Term (Id_as_Factor "c")),
  Term_as_Expr (Factor_as_Term (Id_as_Factor
  "d"))), []) : Expr * token list

30
( a + b ) * c - d

Parse tree (children indented under their parent):
<expr>
  <term>
    <factor>
      (
      <expr>
        <term>
          <factor>
            <id> a
        +
        <expr>
          <term>
            <factor>
              <id> b
      )
    *
    <term>
      <factor>
        <id> c
  -
  <expr>
    <term>
      <factor>
        <id> d

31
a + b * c - d

expr [Id_token "a", Plus_token, Id_token "b",
      Times_token, Id_token "c", Minus_token,
      Id_token "d"]
val it = (Plus_Expr (Factor_as_Term
  (Id_as_Factor "a"), Minus_Expr
  (Mult_Term (Id_as_Factor "b",
    Factor_as_Term (Id_as_Factor "c")),
   Term_as_Expr (Factor_as_Term (Id_as_Factor
   "d")))), []) : Expr * token list

32
a + b * c - d

Parse tree (children indented under their parent):
<expr>
  <term>
    <factor>
      <id> a
  +
  <expr>
    <term>
      <factor>
        <id> b
      *
      <term>
        <factor>
          <id> c
    -
    <expr>
      <term>
        <factor>
          <id> d

33
( a + b * c - d

expr [Left_parenthesis, Id_token "a", Plus_token,
      Id_token "b", Times_token, Id_token "c",
      Minus_token, Id_token "d"]
uncaught exception nonexhaustive match failure
  raised at: arith_exp.sml:94.12
Can't parse because it was expecting a right
parenthesis but it got to the end without
finding one

34
a + b ) * c - d )

expr [Id_token "a", Plus_token, Id_token "b",
      Right_parenthesis, Times_token, Id_token "c",
      Minus_token, Id_token "d"]
val it = (Plus_Expr (Factor_as_Term
  (Id_as_Factor "a"), Term_as_Expr
  (Factor_as_Term (Id_as_Factor "b"))),
  [Right_parenthesis,Times_token,Id_token
  "c",Minus_token,Id_token "d"]) : Expr * token
  list

35
Error Cases?

What if factor doesn't find an id token or a left
parenthesis when it starts?
What if it doesn't find a right parenthesis after
the expression?

36
Streams in Place of Lists

More realistically, we don't want to create the
entire list of tokens before we can start parsing
We want to generate one token at a time and use
it to make one step in parsing
Will use
(token option * (unit -> token option))
in place of token list

37
Parsing an Expression

<expr> ::= <term> [ ( + | - ) <expr> ]
fun expr tokens =
  (case term tokens
    of (SOME term_parse,
        tokens_after_term) =>
      (case tokens_after_term
        of (SOME Plus_token,
            tokens_after_plus) =>

38
Parsing a Plus Expression

<expr> ::= <term> + <expr>
fun expr tokens =
  (case term tokens
    of (SOME term_parse,
        tokens_after_term) =>
      (case tokens_after_term
        of (SOME Plus_token,
            tokens_after_plus) =>

39
Parsing a Plus Expression

<expr> ::= <term> + <expr>
(case expr (tokens_after_plus(),
            tokens_after_plus)
  of (SOME expr_parse,
      tokens_after_expr) =>
    (SOME (Plus_Expr (term_parse,
                      expr_parse)),
     tokens_after_expr)

41
Building Plus Expression Parse Tree

<expr> ::= <term> + <expr>
(case expr (tokens_after_plus(),
            tokens_after_plus)
  of (SOME expr_parse,
      tokens_after_expr) =>
    (SOME (Plus_Expr (term_parse,
                      expr_parse)),
     tokens_after_expr)

42
What If No Expression After Plus

<expr> ::= <term> + <expr>
| (NONE, rem_tokens) =>
    (NONE, rem_tokens))
Code for Minus_token is almost identical

43
What If No Plus or Minus

<expr> ::= <term>
| _ => (SOME (Term_as_Expr term_parse),
        tokens_after_term))

44
What if No Term

<expr> ::= <term> [ ( + | - ) <expr> ]
| (NONE, rem_tokens) =>
    (NONE, rem_tokens))
Code for term is the same as for expr except for
replacing addition with multiplication and
subtraction with division

45
Parsing Factor as Id

<factor> ::= <id>
and factor (SOME (Id_token id_name),
            tokens) =
    (SOME (Id_as_Factor id_name),
     (tokens(), tokens))

46
Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )
| factor (SOME Left_parenthesis,
          tokens) =
    (case expr (tokens(), tokens)
      of (SOME expr_parse,
          tokens_after_expr) =>

47
Parsing Factor as Parenthesized Expression

<factor> ::= ( <expr> )
(case tokens_after_expr
  of (SOME Right_parenthesis,
      tokens_after_rparen) =>
    (SOME (Parenthesized_Expr_as_Factor
           expr_parse),
     (tokens_after_rparen(), tokens_after_rparen))

48
What if No Right Parenthesis

<factor> ::= ( <expr> )
| _ => (NONE, tokens_after_expr))

49
What If No Expression After Left Parenthesis

<factor> ::= ( <expr> )
| (NONE, rem_tokens) =>
    (NONE, rem_tokens))

50
What If No Id or Left Parenthesis

<factor> ::= <id> | ( <expr> )
| factor tokens = (NONE, tokens)

51
Parsing Factor as Id

<factor> ::= <id>
and factor (SOME (Id_token id_name),
            tokens) =
    (true, (tokens(), tokens))

52
Parsing - in C

Assume global variable nextToken that holds
the latest token removed from token stream
Assume subroutine lex( ) to analyze the character
stream, find the next token at the head of that
stream and update nextToken with that token
Assume subroutine error( ) to raise an exception

53
Parsing expr in C

<expr> ::= <term> [ ( + | - ) <expr> ]
void expr ( ) {
  term ( );
  if (nextToken == PLUS_CODE) {
    lex ( );
    expr ( ); }
  else if (nextToken == MINUS_CODE) {
    lex ( );
    expr ( ); } }

54
SML Code

fun expr tokens =
  (case term tokens
    of (true, tokens_after_term) =>
      (case tokens_after_term
        of (SOME Plus_token, tokens_after_plus) =>
          (case expr (tokens_after_plus(),
                      tokens_after_plus)
            of (true, tokens_after_expr) =>
              (true, tokens_after_expr)

55
Parsing expr in C (optimized)

<expr> ::= <term> [ ( + | - ) <expr> ]
void expr ( ) {
  term ( );
  while (nextToken == PLUS_CODE ||
         nextToken == MINUS_CODE) {
    lex ( );
    term ( ); } }

56
Parsing factor in C

<factor> ::= <id>
void factor ( ) {
  if (nextToken == ID_CODE)
    lex ( );

57
Parsing factor in C

<factor> ::= ( <expr> )
  else if (nextToken ==
           LEFT_PAREN_CODE) {
    lex ( );
    expr ( );
    if (nextToken ==
        RIGHT_PAREN_CODE)
      lex ( );

58
Comparable SML Code

| factor (SOME Left_parenthesis, tokens) =
    (case expr (tokens(), tokens)
      of (true, tokens_after_expr) =>
        (case tokens_after_expr
          of (SOME Right_parenthesis,
              tokens_after_rparen) =>
            (true, (tokens_after_rparen(),
                    tokens_after_rparen))

59
Parsing factor in C

    else
      error ( );
      /* Right parenthesis missing */ }
  else
    error ( );
    /* Neither <id> nor ( was found at start */ }
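
The C fragments on the preceding slides can be assembled into one compilable recognizer. The sketch below is a reconstruction under stated assumptions rather than the course's own file: lex reads single characters and runs of letters from a string, error uses setjmp/longjmp as a stand-in for raising an exception, and TIMES_CODE, DIVIDE_CODE, END_CODE, UNKNOWN_CODE, the variables input and on_error, the function term and the wrapper recognize are names introduced here for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <setjmp.h>
#include <stddef.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, UNKNOWN_CODE };

static const char *input;   /* characters not yet consumed */
static int nextToken;       /* latest token removed from the stream */
static jmp_buf on_error;    /* where error() jumps back to */

static void error(void) { longjmp(on_error, 1); }

/* Find the next token at the head of the stream; store it in nextToken. */
static void lex(void) {
    char c;
    while (*input == ' ')
        input++;
    c = *input;
    if (c == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)c)) {
        while (isalpha((unsigned char)*input))
            input++;
        nextToken = ID_CODE;
        return;
    }
    input++;
    switch (c) {
    case '+': nextToken = PLUS_CODE;        break;
    case '-': nextToken = MINUS_CODE;       break;
    case '*': nextToken = TIMES_CODE;       break;
    case '/': nextToken = DIVIDE_CODE;      break;
    case '(': nextToken = LEFT_PAREN_CODE;  break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = UNKNOWN_CODE;     break;
    }
}

static void expr(void);

static void factor(void) {
    if (nextToken == ID_CODE) {
        lex();
    } else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();        /* right parenthesis missing */
    } else {
        error();            /* neither <id> nor ( was found at start */
    }
}

static void term(void) {
    factor();
    if (nextToken == TIMES_CODE || nextToken == DIVIDE_CODE) {
        lex();
        term();
    }
}

static void expr(void) {
    term();
    if (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        expr();
    }
}

/* Returns 1 when the whole text is an <expr>, 0 otherwise. */
int recognize(const char *text) {
    assert(text != NULL);
    input = text;
    if (setjmp(on_error) != 0)
        return 0;           /* error() was called during parsing */
    lex();
    expr();
    return nextToken == END_CODE;
}
```

With these definitions, recognize("( a + b ) * c - d") returns 1, while recognize("( a + b * c - d") and recognize("a + b ) * c - d") both return 0: the first failure comes from error(), the second from input left over after expr returns.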

60
Error cases in SML

(* No right parenthesis *)
| _ => (false, tokens_after_expr))
(* No expression found *)
| (false, rem_tokens) =>
    (false, rem_tokens))
(* Neither <id> nor left parenthesis found *)
| factor tokens = (false, tokens)

61
Lexers - Simple Parsers

Lexers are parsers driven by regular grammars
Use character codes and arithmetic comparisons
rather than case analysis to determine syntactic
category for each character
Often some semantic action must be taken
Compute a number or build a string and record it
in a symbol table

62
Example

<pos> ::= <digit> <pos> | <digit>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
fun digit c =
  (case Char.ord c - Char.ord #"0"
    of n => if n >= 0 andalso n <= 9
            then SOME n
            else NONE)

63
Example

fun pos [] = NONE
  | pos (c::cs) =
    (case digit c
      of SOME m =>
        (case pos cs
          of SOME (p, n) => SOME (10 * p, m * p + n)
           | NONE => SOME (10, m))
       | NONE => NONE)
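
For comparison with the C parsing code earlier in the deck, the same lexer can be sketched in C. This is an illustration that assumes pos yields a pair: the power of ten needed to place one more digit in front of the number, and the number itself (so a single digit m gives the pair 10 and m). Out-parameters replace the SML option type, and the parameter names are introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* <digit>: succeeds on '0'..'9' and stores the digit's numeric value. */
bool digit(char c, int *value) {
    assert(value != NULL);
    if (c >= '0' && c <= '9') {
        *value = c - '0';
        return true;
    }
    return false;
}

/* <pos> ::= <digit> <pos> | <digit>
   Reads the longest run of digits at the start of s. On success stores
   10^(number of digits) in *power and the number read in *value. */
bool pos(const char *s, int *power, int *value) {
    int m, p, n;
    assert(s != NULL && power != NULL && value != NULL);
    if (!digit(*s, &m))
        return false;             /* also covers the empty string */
    if (pos(s + 1, &p, &n)) {     /* <digit> <pos> */
        *power = 10 * p;
        *value = m * p + n;
    } else {                      /* a single <digit> */
        *power = 10;
        *value = m;
    }
    return true;
}
```

For the input "407" the innermost call yields 10 and 7, the middle call yields 100 and 7, and the outer call yields 1000 and 407.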

64
Problems for Recursive-Descent Parsing

Left Recursion
A ::= Aw
translates to a subroutine that loops forever
Indirect Left Recursion
A ::= Bw
B ::= Av
causes the same problem
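
One common way around direct left recursion is to rewrite the rule so that it consumes input before repeating. The C sketch below assumes a hypothetical grammar A ::= A w | v, where v and w are the single characters 'v' and 'w' (the base alternative v is an assumption added here so that A derives some string). It parses the equivalent form "one v followed by zero or more w".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grammar: A ::= A w | v.
   A direct translation would begin parse_A by calling parse_A on the same
   input, so it would recurse forever without consuming anything.
   The equivalent form "v followed by zero or more w" consumes a 'v'
   first and then loops over the 'w's. */
const char *parse_A(const char *s) {
    assert(s != NULL);
    if (*s != 'v')
        return NULL;        /* every A must begin with v */
    s++;
    while (*s == 'w')
        s++;                /* zero or more trailing w */
    return s;               /* rest of the input */
}
```

The function returns a pointer to the unparsed remainder of the input, or NULL when the input does not start with an A.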

65
Problems for Recursive-Descent Parsing

Parser must always be able to choose the next
action based only on the very next token
Pairwise disjointedness test: can we always
determine which rule (in the non-extended BNF) to
choose based on just the first token?
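
The test can be sketched in C by representing, for each alternative of a nonterminal, the set of tokens that can begin it as a bitmask and checking that no two masks overlap. The token bits and the function name pairwise_disjoint are hypothetical, and the masks are assumed to have been worked out by hand beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token codes, one bit each, so a set of tokens is a bitmask. */
enum { T_ID = 1 << 0, T_LPAREN = 1 << 1, T_PLUS = 1 << 2, T_MINUS = 1 << 3 };

/* first[i] is the set of tokens that can begin alternative i of one
   nonterminal. The test passes when no token begins two alternatives. */
bool pairwise_disjoint(const unsigned *first, size_t n) {
    size_t i, j;
    assert(first != NULL || n == 0);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if ((first[i] & first[j]) != 0)
                return false;   /* some token starts both alternatives */
    return true;
}
```

For the sample grammar, the two alternatives of <factor> begin with <id> and ( respectively, so they pass. The three alternatives of <expr> in non-extended BNF all begin with <term>, so each can start with either <id> or ( and the test fails for them.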

66
Pairwise Disjointedness Test