Title: Programming Language Concepts CIS 635
1Programming Language Concepts (CIS 635)
- Elsa L Gunter
- 4303 GITC
- NJIT, http://www.cs.njit.edu/elsa/635-spring2004
2Attribute Grammars
- Attribute grammars add to BNF grammars an additional field with a
  function describing the meaning (attributes) of the construct
  described by the BNF rule
- Attributes can be used to describe an interpreter or even a simple
  compiler
- Usually used to describe abstract syntax trees to be generated
3Example Attribute Grammar
- <Sum> ::= 0
    value(<Sum>) = 0
- <Sum> ::= 1
    value(<Sum>) = 1
- <Sum> ::= <Sum> + <Sum>
    value(<Sum1>) = value(<Sum2>) + value(<Sum3>)
- <Sum> ::= ( <Sum> )
    value(<Sum1>) = value(<Sum2>)
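The value attribute above can be read as a recursive function over parse trees. The following is a minimal sketch in C, assuming a hypothetical tree type (`struct sum`, `enum sum_kind`) with one node kind per rule; none of these names come from the slides.

```c
#include <stddef.h>

/* Hypothetical tree representation: one node kind per <Sum> rule. */
enum sum_kind { SUM_ZERO, SUM_ONE, SUM_PLUS, SUM_PAREN };

struct sum {
    enum sum_kind kind;
    const struct sum *left;   /* <Sum2> for SUM_PLUS and SUM_PAREN */
    const struct sum *right;  /* <Sum3> for SUM_PLUS only */
};

/* Synthesized attribute: the value of a node is computed from the
 * values of its children, exactly as in the four rules. */
int value(const struct sum *s)
{
    switch (s->kind) {
    case SUM_ZERO:  return 0;                                /* <Sum> ::= 0 */
    case SUM_ONE:   return 1;                                /* <Sum> ::= 1 */
    case SUM_PLUS:  return value(s->left) + value(s->right); /* <Sum> ::= <Sum> + <Sum> */
    case SUM_PAREN: return value(s->left);                   /* <Sum> ::= ( <Sum> ) */
    }
    return 0; /* not reached for well-formed trees */
}
```

Because every rule defines the left-hand value only from the right-hand values, a single bottom-up traversal is enough.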
4Attribute Grammars
- An inherited attribute describes the meaning of the nonterminals on
  the right in the rule in terms of the nonterminal on the left
- A synthesized attribute describes the meaning of the left-hand
  nonterminal in terms of the nonterminals on the right
5YACC
- The input to YACC (a parser generator for C) is basically an
  attribute grammar with only synthesized attributes (inherited
  attributes can be handled using global variables, typically tables)
- ML-YACC is a version of YACC that produces SML code instead of C code
6Parsing Programs
- Parsing is the process of tracing or constructing a parse tree for a
  given input string
- Process usually broken into two phases
  - Lexer: generating tokens from string or character stream
  - Parser: generating parse tree from token list or stream
- Lexer called from parser
7Recursive Descent Parsing
- Recursive descent parsers are a class of parsers derived fairly
  directly from BNF grammar
- A recursive descent parser traces out a parse tree in top-down order;
  it is a top-down parser
8Recursive Descent Parsing
- Each nonterminal in the grammar has a subprogram associated with it;
  the subprogram parses all phrases that the nonterminal can generate
- Each nonterminal in the right-hand side of a rule corresponds to a
  recursive call to the associated subprogram
9Recursive Descent Parsing
- Each subprogram must be able to decide how to begin parsing by
  looking at the left-most character in the string to be parsed
  - May do so directly, or indirectly by calling another parsing
    subprogram
- Recursive descent parsers, like other top-down parsers, cannot be
  built from left-recursive grammars
10Sample Grammar
- <expr> ::= <term> | <term> + <expr>
           | <term> - <expr>
- <term> ::= <factor> | <factor> * <term>
           | <factor> / <term>
- <factor> ::= <id> | ( <expr> )
11Tokens as SML Datatypes
- + - * / ( ) <id>
- Becomes an SML datatype
    datatype token =
        Id_token of string
      | Left_parenthesis | Right_parenthesis
      | Times_token | Divide_token
      | Plus_token | Minus_token
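To connect the token datatype with the lexer phase described earlier, here is a rough C sketch of a lexer that turns a string into an array of such tokens. The names `struct token`, `tokenize`, and the 32-character name buffer are hypothetical, and the rule that an identifier is a letter followed by letters or digits is an assumption; the slides do not define `<id>`.

```c
#include <ctype.h>
#include <stddef.h>

/* C counterpart of the SML token datatype (names are illustrative). */
enum token_kind {
    ID_TOKEN, LEFT_PARENTHESIS, RIGHT_PARENTHESIS,
    TIMES_TOKEN, DIVIDE_TOKEN, PLUS_TOKEN, MINUS_TOKEN
};

struct token {
    enum token_kind kind;
    char name[32];          /* identifier text; empty for other tokens */
};

/* Fill out[0..max-1] with the tokens of src. Returns the token count,
 * or -1 on an unexpected character or when out is too small. */
int tokenize(const char *src, struct token *out, int max)
{
    int n = 0;
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (n == max) return -1;
        out[n].name[0] = '\0';
        if (isalpha((unsigned char)*src)) {
            /* Assumed identifier shape: letter (letter | digit)* */
            size_t len = 0;
            while (isalnum((unsigned char)*src)) {
                if (len + 1 < sizeof out[n].name) out[n].name[len++] = *src;
                src++;
            }
            out[n].name[len] = '\0';
            out[n].kind = ID_TOKEN;
        } else {
            switch (*src) {
            case '(': out[n].kind = LEFT_PARENTHESIS;  break;
            case ')': out[n].kind = RIGHT_PARENTHESIS; break;
            case '*': out[n].kind = TIMES_TOKEN;       break;
            case '/': out[n].kind = DIVIDE_TOKEN;      break;
            case '+': out[n].kind = PLUS_TOKEN;        break;
            case '-': out[n].kind = MINUS_TOKEN;       break;
            default:  return -1;
            }
            src++;
        }
        n++;
    }
    return n;
}
```

This version builds the whole token array up front, which matches the token-list style of parsing rather than the one-token-at-a-time style.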
12Parse Trees as Datatypes
- <expr> ::= <term> | <term> + <expr>
           | <term> - <expr>
    datatype Expr =
        Term_as_Expr of Term
      | Plus_Expr of (Term * Expr)
      | Minus_Expr of (Term * Expr)
13Parse Trees as Datatypes
- <term> ::= <factor> | <factor> * <term>
           | <factor> / <term>
    and Term =
        Factor_as_Term of Factor
      | Mult_Term of (Factor * Term)
      | Div_Term of (Factor * Term)
14Parse Trees as Datatypes
- <factor> ::= <id> | ( <expr> )
    and Factor =
        Id_as_Factor of string
      | Parenthesized_Expr_as_Factor of Expr
15Parsing Lists of Tokens
- Will create three mutually recursive functions
    expr : token list -> (Expr * token list)
    term : token list -> (Term * token list)
    factor : token list -> (Factor * token list)
- Each parses what it can and gives back parse and
remaining tokens
16Parsing an Expression
- <expr> ::= <term> [( + | - ) <expr> ]
    fun expr tokens =
      (case term tokens
        of ( term_parse , tokens_after_term) =>
          (case tokens_after_term
            of ( Plus_token :: tokens_after_plus) =>
19Parsing a Plus Expression
- <expr> ::= <term> + <expr>
              (case expr tokens_after_plus
                of ( expr_parse , tokens_after_expr) =>
                  ( Plus_Expr ( term_parse , expr_parse ),
                    tokens_after_expr))
22Parsing a Minus Expression
- <expr> ::= <term> - <expr>
            | ( Minus_token :: tokens_after_minus) =>
              (case expr tokens_after_minus
                of ( expr_parse , tokens_after_expr) =>
                  ( Minus_Expr ( term_parse , expr_parse ),
                    tokens_after_expr))
24Parsing an Expression as a Term
- <expr> ::= <term>
            | _ => (Term_as_Expr term_parse , tokens_after_term)))
- Code for term is same except for replacing addition with
  multiplication and subtraction with division
25Parsing Factor as Id
- <factor> ::= <id>
    and factor (Id_token id_name :: tokens) =
          ( Id_as_Factor id_name, tokens)
26Parsing Factor as Parenthesized Expression
- <factor> ::= ( <expr> )
      | factor ( Left_parenthesis :: tokens) =
          (case expr tokens
            of ( expr_parse , tokens_after_expr) =>
27Parsing Factor as Parenthesized Expression
- <factor> ::= ( <expr> )
              (case tokens_after_expr
                of Right_parenthesis :: tokens_after_rparen =>
                  ( Parenthesized_Expr_as_Factor expr_parse ,
                    tokens_after_rparen)))
28( a + b ) * c - d
- expr [Left_parenthesis, Id_token "a", Plus_token,
        Id_token "b", Right_parenthesis, Times_token,
        Id_token "c", Minus_token, Id_token "d"];
29( a + b ) * c - d
- val it = (Minus_Expr (Mult_Term
    (Parenthesized_Expr_as_Factor
      (Plus_Expr (Factor_as_Term
        (Id_as_Factor "a"), Term_as_Expr
        (Factor_as_Term (Id_as_Factor "b")))),
    Factor_as_Term (Id_as_Factor "c")),
    Term_as_Expr (Factor_as_Term (Id_as_Factor
    "d"))), []) : Expr * token list
30( a + b ) * c - d
- Parse tree (children indented under their parent):
    <expr>
      <term>
        <factor>
          (
          <expr>
            <term>
              <factor>
                <id> a
            +
            <expr>
              <term>
                <factor>
                  <id> b
          )
        *
        <term>
          <factor>
            <id> c
      -
      <expr>
        <term>
          <factor>
            <id> d
31a + b * c - d
- expr [Id_token "a", Plus_token, Id_token "b",
        Times_token, Id_token "c", Minus_token,
        Id_token "d"];
- val it = (Plus_Expr (Factor_as_Term
    (Id_as_Factor "a"), Minus_Expr
    (Mult_Term (Id_as_Factor "b",
      Factor_as_Term (Id_as_Factor "c")),
    Term_as_Expr (Factor_as_Term (Id_as_Factor
    "d")))), []) : Expr * token list
32a + b * c - d
- Parse tree (children indented under their parent):
    <expr>
      <term>
        <factor>
          <id> a
      +
      <expr>
        <term>
          <factor>
            <id> b
          *
          <term>
            <factor>
              <id> c
        -
        <expr>
          <term>
            <factor>
              <id> d
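The two parses above show that this grammar nests to the right: a + b * c - d groups as a + ((b * c) - d). The following C sketch makes that grouping visible by parsing with the same three mutually recursive routines and writing a fully parenthesized string instead of a tree. It is only an illustration under several assumptions: identifiers are single lowercase letters, the input is read directly from a string, and `render`, `BUF`, `in`, and `ok` are hypothetical names.

```c
#include <stdio.h>

#define BUF 128              /* size the caller's buffer must have */

static const char *in;       /* unread input, playing the role of the token list */
static int ok;               /* cleared when something expected is missing */

static void skip_spaces(void) { while (*in == ' ') in++; }

static void expr(char *buf);

/* <factor> ::= <id> | ( <expr> ) */
static void factor(char *buf)
{
    skip_spaces();
    if (*in >= 'a' && *in <= 'z') {
        buf[0] = *in++;
        buf[1] = '\0';
    } else if (*in == '(') {
        in++;
        expr(buf);
        skip_spaces();
        if (*in == ')') in++; else ok = 0;   /* missing right parenthesis */
    } else {
        buf[0] = '\0';
        ok = 0;                              /* neither <id> nor ( */
    }
}

/* <term> ::= <factor> | <factor> * <term> | <factor> / <term> */
static void term(char *buf)
{
    char left[BUF], right[BUF];
    factor(left);
    skip_spaces();
    if (ok && (*in == '*' || *in == '/')) {
        char op = *in++;
        term(right);                         /* right operand is a whole <term> */
        snprintf(buf, BUF, "(%s%c%s)", left, op, right);
    } else {
        snprintf(buf, BUF, "%s", left);
    }
}

/* <expr> ::= <term> | <term> + <expr> | <term> - <expr> */
static void expr(char *buf)
{
    char left[BUF], right[BUF];
    term(left);
    skip_spaces();
    if (ok && (*in == '+' || *in == '-')) {
        char op = *in++;
        expr(right);                         /* right operand is a whole <expr> */
        snprintf(buf, BUF, "(%s%c%s)", left, op, right);
    } else {
        snprintf(buf, BUF, "%s", left);
    }
}

/* Returns 1 and fills buf when all of src is an <expr>, else 0. */
int render(const char *src, char *buf)
{
    in = src;
    ok = 1;
    expr(buf);
    skip_spaces();
    return ok && *in == '\0';
}
```

With this sketch, `render("a + b * c - d", buf)` leaves `(a+((b*c)-d))` in `buf`, which mirrors the Plus_Expr / Minus_Expr nesting in the SML result.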
33( a + b * c - d
- expr [Left_parenthesis, Id_token "a", Plus_token,
        Id_token "b", Times_token, Id_token "c",
        Minus_token, Id_token "d"];
  uncaught exception nonexhaustive match failure
    raised at: arith_exp.sml:94.12
- Can't parse because it was expecting a right parenthesis but it got
  to the end without finding one
34a + b ) * c - d
- expr [Id_token "a", Plus_token, Id_token "b",
        Right_parenthesis, Times_token, Id_token "c",
        Minus_token, Id_token "d"];
- val it = (Plus_Expr (Factor_as_Term
    (Id_as_Factor "a"), Term_as_Expr
    (Factor_as_Term (Id_as_Factor "b"))),
    [Right_parenthesis, Times_token, Id_token
    "c", Minus_token, Id_token "d"]) : Expr * token
    list
35Error Cases?
- What if factor doesn't find an id token or a left parenthesis when it
  starts?
- What if it doesn't find a right parenthesis after the expression?
36Streams in Place of Lists
- More realistically, we don't want to create the entire list of tokens
  before we can start parsing
- We want to generate one token at a time and use it to make one step
  in parsing
- Will use
    (token option * (unit -> token option))
  in place of token list
37Parsing an Expression
- <expr> ::= <term> [( + | - ) <expr> ]
    fun expr tokens =
      (case term tokens
        of ( SOME term_parse ,
             tokens_after_term) =>
          (case tokens_after_term
            of ( SOME Plus_token,
                 tokens_after_plus) =>
39Parsing a Plus Expression
- <expr> ::= <term> + <expr>
              (case expr (tokens_after_plus(),
                          tokens_after_plus)
                of ( SOME expr_parse,
                     tokens_after_expr) =>
                  ( SOME ( Plus_Expr (term_parse,
                                      expr_parse)),
                    tokens_after_expr)
42What If No Expression After Plus
- <expr> ::= <term> + <expr>
                | ( NONE , rem_tokens) =>
                  ( NONE , rem_tokens))
- Code for Minus_token is almost identical
43What If No Plus or Minus
- <expr> ::= <term>
            | _ => ( SOME (Term_as_Expr term_parse) ,
                     tokens_after_term))
44What if No Term
- <expr> ::= <term> [( + | - ) <expr> ]
        | ( NONE , rem_tokens) =>
          ( NONE , rem_tokens))
- Code for term is same as for expr except for
replacing addition with multiplication and
subtraction with division
45Parsing Factor as Id
- <factor> ::= <id>
    and factor (SOME (Id_token id_name) ,
                tokens) =
          (SOME (Id_as_Factor id_name),
           (tokens(), tokens))
46Parsing Factor as Parenthesized Expression
- <factor> ::= ( <expr> )
      | factor (SOME Left_parenthesis ,
                tokens) =
          (case expr (tokens(), tokens)
            of (SOME expr_parse,
                tokens_after_expr) =>
47Parsing Factor as Parenthesized Expression
- <factor> ::= ( <expr> )
              (case tokens_after_expr
                of ( SOME Right_parenthesis ,
                     tokens_after_rparen ) =>
                  (SOME (Parenthesized_Expr_as_Factor
                           expr_parse),
                   (tokens_after_rparen(), tokens_after_rparen))
48What if No Right Parenthesis
- <factor> ::= ( <expr> )
                | _ => (NONE, tokens_after_expr))
49What If No Expression After Left Parenthesis
- <factor> ::= ( <expr> )
            | ( NONE , rem_tokens) =>
              ( NONE , rem_tokens))
50What If No Id or Left Parenthesis
- <factor> ::= <id> | ( <expr> )
      | factor tokens = (NONE, tokens)
51Parsing Factor as Id
- <factor> ::= <id>
    and factor (SOME (Id_token id_name) ,
                tokens) =
          ( true , (tokens(), tokens))
52Parsing - in C
- Assume global variable currentToken (written nextToken in the code on
  the following slides) that holds the latest token removed from token
  stream
- Assume subroutine lex( ) to analyze the character stream, find the
  next token at the head of that stream and update currentToken with
  that token
- Assume subroutine error( ) to raise an exception
53Parsing expr in C
- <expr> ::= <term> [( + | - ) <expr> ]
    void expr ( ) {
      term ( );
      if (nextToken == PLUS_CODE) {
        lex ( );
        expr ( ); }
      else if (nextToken == MINUS_CODE) {
        lex ( );
        expr ( ); } }
54SML Code
    fun expr tokens =
      (case term tokens
        of ( true , tokens_after_term) =>
          (case tokens_after_term
            of (SOME Plus_token, tokens_after_plus) =>
              (case expr (tokens_after_plus(),
                          tokens_after_plus)
                of ( true , tokens_after_expr) =>
                  ( true , tokens_after_expr)
55Parsing expr in C (optimized)
- <expr> ::= <term> [( + | - ) <expr> ]
    void expr ( ) {
      term( );
      while (nextToken == PLUS_CODE ||
             nextToken == MINUS_CODE) {
        lex ( );
        term ( ); }
    }
56Parsing factor in C
- <factor> ::= <id>
    void factor ( ) {
      if (nextToken == ID_CODE)
        lex ( );
57Parsing factor in C
- <factor> ::= ( <expr> )
      else if (nextToken ==
               LEFT_PAREN_CODE) {
        lex ( );
        expr ( );
        if (nextToken ==
            RIGHT_PAREN_CODE)
          lex ( );
58Comparable SML Code
      | factor (SOME Left_parenthesis , tokens) =
          (case expr (tokens(), tokens)
            of ( true , tokens_after_expr) =>
              (case tokens_after_expr
                of ( SOME Right_parenthesis ,
                     tokens_after_rparen ) =>
                  ( true , (tokens_after_rparen(),
                            tokens_after_rparen))
59Parsing factor in C
        else
          error ( );
          /* Right parenthesis missing */
      }
      else
        error ( );
        /* Neither <id> nor ( was found at start */
    }
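Putting the C fragments together, a complete recognizer might look like the sketch below. It keeps the slides' `nextToken`, `lex`, `error`, `expr`, `factor`, and the `*_CODE` names for identifiers, plus, minus, and parentheses, but several parts are assumptions: `lex` reads from a string held in a hypothetical global `input`, `error` only counts failures instead of raising an exception, `term` is written by analogy with the loop form of `expr`, and `TIMES_CODE`, `DIVIDE_CODE`, `END_CODE`, `BAD_CODE`, and `recognizes` are invented for the example.

```c
#include <ctype.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, TIMES_CODE, DIVIDE_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

static const char *input;   /* character stream (assumed to be a string) */
static int nextToken;       /* latest token removed from the stream */
static int errors;          /* error( ) counts here rather than raising */

static void error(void) { errors++; }

/* Find the next token at the head of the stream and update nextToken. */
static void lex(void)
{
    while (isspace((unsigned char)*input)) input++;
    if (*input == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)*input)) {
        while (isalnum((unsigned char)*input)) input++;
        nextToken = ID_CODE;
        return;
    }
    switch (*input++) {
    case '+': nextToken = PLUS_CODE;        break;
    case '-': nextToken = MINUS_CODE;       break;
    case '*': nextToken = TIMES_CODE;       break;
    case '/': nextToken = DIVIDE_CODE;      break;
    case '(': nextToken = LEFT_PAREN_CODE;  break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = BAD_CODE;         break;
    }
}

static void expr(void);

/* <factor> ::= <id> | ( <expr> ) */
static void factor(void)
{
    if (nextToken == ID_CODE) {
        lex();
    } else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
            lex();
        else
            error();        /* right parenthesis missing */
    } else {
        error();            /* neither <id> nor ( found at start */
    }
}

/* <term>, written as a loop by analogy with expr */
static void term(void)
{
    factor();
    while (nextToken == TIMES_CODE || nextToken == DIVIDE_CODE) {
        lex();
        factor();
    }
}

/* <expr>, in the loop form */
static void expr(void)
{
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        term();
    }
}

/* Returns 1 when the whole string is an <expr>, else 0. */
int recognizes(const char *s)
{
    input = s;
    errors = 0;
    lex();
    expr();
    return errors == 0 && nextToken == END_CODE;
}
```

Unlike the list-based SML parser, which returned leftover tokens for a + b ) * c - d, this sketch rejects that input because `recognizes` also requires the stream to be exhausted.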
60Error cases in SML
    (* No right parenthesis *)
                | _ => ( false , tokens_after_expr))
    (* No expression found *)
            | ( false , rem_tokens) =>
              ( false , rem_tokens))
    (* Neither <id> nor left parenthesis found *)
      | factor tokens = ( false , tokens)
61Lexers: Simple Parsers
- Lexers are parsers driven by regular grammars
- Use character codes and arithmetic comparisons rather than case
  analysis to determine syntactic category for each character
- Often some semantic action must be taken
  - Compute a number or build a string and record it in a symbol table
62Example
- <pos> ::= <digit> <pos> | <digit>
- <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    fun digit c =
      (case Char.ord c - Char.ord #"0"
        of n => if n >= 0 andalso n <= 9
                then SOME n
                else NONE)
63Example
    fun pos [] = NONE
      | pos (c::cs) =
        (case digit c
          of SOME m =>
            (case pos cs
              of SOME(p, n) => SOME(10*p, m*p + n)
               | NONE => SOME(10, m))
           | NONE => NONE)
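For comparison with the C parsing slides, the same digit and pos idea can be sketched in C using character arithmetic. This is a loose counterpart rather than a translation: it uses a loop instead of recursion, reports how many characters were consumed instead of returning an option, and ignores overflow. The signatures are assumptions.

```c
/* Numeric value of c, or -1 when c is not a decimal digit.
 * Uses arithmetic comparison on character codes, not a case per digit. */
int digit(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* <pos> ::= <digit> <pos> | <digit>
 * Reads the longest run of digits at the start of s.
 * Returns the number of characters consumed (0 when s does not start
 * with a digit) and stores the number in *value when at least one
 * digit was read. Overflow is not checked in this sketch. */
int pos(const char *s, long *value)
{
    int n = 0;
    long v = 0;
    while (digit(s[n]) >= 0) {
        v = 10 * v + digit(s[n]);   /* semantic action: compute the number */
        n++;
    }
    if (n > 0)
        *value = v;
    return n;
}
```

The returned count tells the caller where the next token starts, which is the role the remaining character list plays in the SML version.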
64Problems for Recursive-Descent Parsing
- Left Recursion
    A ::= Aw
  translates to a subroutine that loops forever
- Indirect Left Recursion
    A ::= Bw
    B ::= Av
  causes the same problem
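To see why the subroutine loops, take a concrete left-recursive rule such as A ::= A b | b (chosen here only as an illustration). The naive subroutine would call itself before consuming any input. The strings it generates are one or more b's, so one common workaround is to recognize them with a loop instead; the function name `match_A` below is hypothetical.

```c
/* Naive translation of  A ::= A b | b  (shown only as a comment,
 * because calling it would recurse without ever consuming input):
 *
 *     int A(const char *s) { int n = A(s); ... }
 *
 * Equivalent loop form:  A ::= b { b }
 * Returns the number of characters matched, or 0 when s does not
 * start with an A. */
int match_A(const char *s)
{
    int n = 0;
    if (s[n] != 'b')
        return 0;           /* an A must start with b */
    n++;
    while (s[n] == 'b')     /* each further b extends the same A */
        n++;
    return n;
}
```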
65Problems for Recursive-Descent Parsing
- Parser must always be able to choose the next action based only on
  the very next token
- Pairwise disjointedness test: can we always determine which rule (in
  the non-extended BNF) to choose based on just the first token?
66Pairwise Disjointedness Test
- For each rule
    A ::= y
- Calculate
    FIRST(y) = {a | y =>* aw} ∪ {ε | if y =>* ε}
- For each pair of rules A ::= y and A ::= z, require
    FIRST(y) ∩ FIRST(z) = { }
- Test too strong: can't handle
    <expr> ::= <term> [( + | - ) <expr> ]
67Example
- Grammar
    <S> ::= <A> a <B> b
    <A> ::= <A> b | b
    <B> ::= a <B> | a
- FIRST(<A> b) = {b}
- FIRST(b) = {b}
- Rules for <A> not pairwise disjoint
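Once the FIRST sets are known, the test itself is a simple pairwise intersection check. The sketch below assumes each FIRST set has already been computed by hand and encoded as a bit mask, one bit per terminal; the encoding, the `T_*` constants, and `pairwise_disjoint` are hypothetical, and ε is not modeled.

```c
/* One bit per terminal (illustrative encoding). */
enum { T_ID = 1, T_LPAREN = 2, T_A = 4, T_B = 8 };

/* first[i] is the FIRST set of the i-th right-hand side of one
 * nonterminal. Returns 1 when no two of the n sets share a terminal,
 * 0 otherwise. */
int pairwise_disjoint(const unsigned *first, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (first[i] & first[j])
                return 0;   /* some terminal starts both alternatives */
    return 1;
}
```

For the rules for <A> above, both masks are `T_B`, so the check fails; for <factor> ::= <id> | ( <expr> ) the masks `T_ID` and `T_LPAREN` are disjoint and the check passes.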