Title: Functional Programming
1Functional Programming
- The design of the imperative languages is based directly on the von Neumann architecture
- Efficiency is the primary concern, rather than the suitability of the language for software development
- The design of the functional languages is based on mathematical functions
- A solid theoretical basis that is also closer to the user, but relatively unconcerned with the architecture of the machines on which programs will run
2Mathematical Functions
- A mathematical function is a mapping of members of one set, called the domain set, to another set, called the range set
- A lambda expression specifies the parameter(s) and the mapping of a function in the following form
- λ(x) x * x * x
- for the function cube(x) = x * x * x
3Lambda Expressions
- Lambda expressions describe nameless functions
- Lambda expressions are applied to parameter(s) by placing the parameter(s) after the expression
- e.g., (λ(x) x * x * x)(2)
- which evaluates to 8
4Functional Forms
- A higher-order function, or functional form, is
one that either takes functions as parameters or
yields a function as its result, or both
5Function Composition
- A functional form that takes two functions as parameters and yields a function whose value is the first actual parameter function applied to the application of the second
- Form: h ≡ f ∘ g
- which means h(x) ≡ f(g(x))
- For f(x) ≡ x + 2 and g(x) ≡ 3 * x,
- h ≡ f ∘ g yields (3 * x) + 2
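Function composition can be approximated in C with function pointers. The sketch below takes f(x) = x + 2 and g(x) = 3 * x from the example above; the names unary_fn and compose_apply are hypothetical, and because C cannot create a new function at run time, the helper applies the composition to an argument instead of returning h itself.

```c
/* A unary function on ints, passed around as a function pointer. */
typedef int (*unary_fn)(int);

/* The two example functions: f(x) = x + 2 and g(x) = 3 * x. */
int f(int x) { return x + 2; }
int g(int x) { return 3 * x; }

/* Applies the composition (outer o inner) to x, i.e. outer(inner(x)).
   This is a stand-in for h = f o g, not a true function-valued result. */
int compose_apply(unary_fn outer, unary_fn inner, int x) {
    return outer(inner(x));
}
```

Calling compose_apply(f, g, x) computes (3 * x) + 2, while swapping the arguments computes 3 * (x + 2), which shows that composition is order-sensitive.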
6Apply-to-all
- A functional form that takes a single function as a parameter and yields a list of values obtained by applying the given function to each element of a list of parameters
- Form: α
- For h(x) ≡ x * x
- α(h, (2, 3, 4)) yields (4, 9, 16)
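Apply-to-all corresponds to what many languages call map. A minimal sketch in C, assuming integer lists stored in arrays; apply_to_all and unary_fn are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

typedef int (*unary_fn)(int);

/* h(x) = x * x, the function from the example above. */
int h(int x) { return x * x; }

/* Applies fn to each of the n elements of in and stores the results in out.
   The caller provides out with room for at least n elements. */
void apply_to_all(unary_fn fn, const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = fn(in[i]);
    }
}
```

With in = {2, 3, 4}, calling apply_to_all(h, in, out, 3) fills out with the squares of the inputs.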
7Fundamentals of Functional Programming Languages
- The objective of the design of an FPL is to mimic
mathematical functions to the greatest extent
possible
8Fundamentals of Functional Programming Languages
- The basic process of computation is fundamentally different in an FPL than in an imperative language
- In an imperative language, operations are done and the results are stored in variables for later use
- Management of variables is a constant concern and source of complexity for imperative programming
- In an FPL, variables are not necessary, as is the case in mathematics
9Referential Transparency
- In an FPL, the evaluation of a function always
produces the same result given the same parameters
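The contrast can be illustrated in C, which does not enforce referential transparency. In this sketch (both function names are hypothetical), square depends only on its argument, while next_id depends on hidden state and so returns a different value on each call.

```c
/* Referentially transparent: the result depends only on x. */
int square(int x) { return x * x; }

/* Not referentially transparent: a hidden static counter changes
   between calls, so the same call yields different results. */
int next_id(void) {
    static int counter = 0;
    return ++counter;
}
```

Any occurrence of square(5) can be replaced by its value without changing the program, which is not true of next_id().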
10Lexical and Syntax Analysis
11Lexical analysis
- lexical analyzer strips off all the comments
- tokens are the smallest units of a programming language
- tokens can be recognized by regular expressions
12Lexical analysis
- integer constant regular expression
- [0-9]+ (one or more occurrences)
- where [0-9] matches any one of
- 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- a regular expression matches a string of characters (or not)
13Regular expressions
- (all these are tokens)
- a (single letter)
- matches a and nothing else
- [abc] (set)
- matches any single char inside the brackets
- [a-zA-Z] (ranges)
- matches all upper and lower case letters
14Regular expressions
- if r and s are regular expressions, then so are the following
- (r) same as r
- rs (concatenation)
- matches something matching r, immediately followed by something matching s
- r|s (or)
- matches anything matching either r or s
- r+ matches one or more matches of r in a row
- r* matches zero or more matches of r in a row
- r? matches zero or one matches of r
15Precedence in regular expressions
- Highest: ( )
- [ ]
- *, +, ?
- rs (juxtaposition)
- Lowest: r|s
16Regular expressions
- Variable names
- [A-Za-z_][A-Za-z_0-9]*
- integer constants
- [0-9]+
- real constants
- [0-9]+\.[0-9]+([eE][+-]?[0-9]+) (Pascal)
- e.g., 3.14e0: integer part 3, point, fraction 14, exponent e0
- in C (this is allowed)
- .314e1
- 314.e2
- 314.e-2
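These patterns can be tried out with the POSIX regex API (regcomp, regexec, regfree from regex.h). The sketch below is one way to do it, assuming a POSIX system; the helper name matches and the three pattern constants are hypothetical, the patterns are anchored with ^ and $ so the whole string must match, and the real-constant pattern is a reading of the Pascal form above with the exponent required.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the whole string s matches the extended regular expression
   pattern, 0 if it does not, and -1 if the pattern fails to compile. */
int matches(const char *pattern, const char *s) {
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    int rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Anchored versions of the token patterns. */
const char *IDENT_RE = "^[A-Za-z_][A-Za-z_0-9]*$";
const char *INT_RE   = "^[0-9]+$";
const char *REAL_RE  = "^[0-9]+\\.[0-9]+([eE][+-]?[0-9]+)$";
```

Under these patterns 3.14e0 is accepted as a real constant, while the C-style forms .314e1 and 314.e2 are rejected because digits are required on both sides of the point.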
17Lexical analysis
- Language implementation systems must analyze source code, regardless of the specific implementation approach
- Nearly all syntax analysis is based on a formal description of the syntax of the source language (BNF)
18Syntax Analysis
- The syntax analysis portion of a language processor nearly always consists of two parts
- A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar)
- A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)
19Using BNF to Describe Syntax
- Provides a clear and concise syntax description
- The parser can be based directly on the BNF
- Parsers based on BNF are easy to maintain
20Reasons to Separate Lexical and Syntax Analysis
- Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser
- Efficiency - separation allows optimization of the lexical analyzer
- Portability - parts of the lexical analyzer may not be portable, but the parser always is portable
21Lexical Analysis
- A lexical analyzer is a pattern matcher for character strings
- A lexical analyzer is a front-end for the parser
- Identifies substrings of the source program that belong together - lexemes
- Lexemes match a character pattern, which is associated with a lexical category called a token
- sum is a lexeme; its token may be IDENT
22Lexical Analysis
- The lexical analyzer is usually a function that is called by the parser when it needs the next token
- Three approaches to building a lexical analyzer
- Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
- Design a state diagram that describes the tokens and write a program that implements the state diagram
- Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram
23State Diagram Design
- A naïve state diagram would have a transition
from every state on every character in the source
language - such a diagram would be very large!
24Lexical Analysis (cont.)
- In many cases, transitions can be combined to simplify the state diagram
- When recognizing an identifier, all uppercase and lowercase letters are equivalent
- Use a character class that includes all letters
- When recognizing an integer literal, all digits are equivalent - use a digit class
25Lexical Analysis
- Reserved words and identifiers can be recognized together (rather than having a part of the diagram for each reserved word)
- Use a table lookup to determine whether a possible identifier is in fact a reserved word
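A table lookup of this kind might look like the following sketch. The token codes and the particular reserved words are hypothetical values chosen for illustration; a real language would list its own.

```c
#include <string.h>
#include <stddef.h>

/* Token codes (hypothetical values for this sketch). */
enum { IDENT = 1, KW_IF, KW_ELSE, KW_WHILE, KW_RETURN };

struct reserved {
    const char *word;
    int code;
};

/* The reserved-word table: one row per reserved word. */
static const struct reserved RESERVED[] = {
    { "if", KW_IF },
    { "else", KW_ELSE },
    { "while", KW_WHILE },
    { "return", KW_RETURN },
};

/* Returns the reserved-word code for lexeme, or IDENT if the lexeme
   is not in the table and is therefore an ordinary identifier. */
int lookup(const char *lexeme) {
    size_t n = sizeof RESERVED / sizeof RESERVED[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(lexeme, RESERVED[i].word) == 0) {
            return RESERVED[i].code;
        }
    }
    return IDENT;
}
```

A linear scan is enough for a handful of words; larger tables are often sorted for binary search or replaced by a hash table.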
26Lexical Analysis
- Convenient utility subprograms
- getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass
- addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme
- lookup - determines whether the string in lexeme is a reserved word (returns a code)
27State Diagram
28Lexical Analysis
- Implementation (assume initialization)
    int lex() {
      getChar();
      switch (charClass) {
        /* Identifiers and reserved words */
        case LETTER:
          addChar();
          getChar();
          while (charClass == LETTER || charClass == DIGIT) {
            addChar();
            getChar();
          }
          return lookup(lexeme);
          break;
        /* ... continued on the next slide ... */
29Lexical Analysis
        /* Integer literals */
        case DIGIT:
          addChar();
          getChar();
          while (charClass == DIGIT) {
            addChar();
            getChar();
          }
          return INT_LIT;
          break;
      } /* End of switch */
    } /* End of function lex */
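The fragment above leaves initialization and the utility subprograms to the reader. The sketch below fills them in under some assumptions: input comes from a string, token codes and the two reserved words are hypothetical, and the lookahead character is read once in initLexer so that lex can skip blanks and start from nextChar. This differs from the slide's version, which calls getChar at the top of lex and therefore relies on a separator character between tokens.

```c
#include <ctype.h>
#include <string.h>

enum { LETTER, DIGIT, OTHER, END_OF_INPUT };           /* character classes */
enum { IDENT = 10, INT_LIT = 11, KW_IF = 20, KW_WHILE = 21,
       UNKNOWN = 98, EOF_TOKEN = 99 };                 /* token codes */

static const char *src;   /* remaining input */
static int nextChar;      /* one character of lookahead */
static int charClass;     /* class of nextChar */
char lexeme[64];          /* lexeme being accumulated */
static size_t lexLen;

/* getChar: read the next character and determine its class. */
static void getChar(void) {
    nextChar = (unsigned char)*src;
    if (nextChar == '\0') { charClass = END_OF_INPUT; return; }
    src++;
    if (isalpha(nextChar)) charClass = LETTER;
    else if (isdigit(nextChar)) charClass = DIGIT;
    else charClass = OTHER;
}

/* addChar: append nextChar to lexeme (overlong lexemes are truncated). */
static void addChar(void) {
    if (lexLen + 1 < sizeof lexeme) {
        lexeme[lexLen++] = (char)nextChar;
        lexeme[lexLen] = '\0';
    }
}

/* lookup: reserved-word code, or IDENT for ordinary identifiers. */
static int lookup(const char *s) {
    if (strcmp(s, "if") == 0) return KW_IF;
    if (strcmp(s, "while") == 0) return KW_WHILE;
    return IDENT;
}

/* initLexer: point the lexer at text and prime the lookahead. */
void initLexer(const char *text) { src = text; getChar(); }

/* lex: return the code of the next token; its text is left in lexeme. */
int lex(void) {
    lexLen = 0;
    lexeme[0] = '\0';
    while (charClass != END_OF_INPUT && isspace(nextChar)) getChar();
    switch (charClass) {
    case LETTER:
        do { addChar(); getChar(); }
        while (charClass == LETTER || charClass == DIGIT);
        return lookup(lexeme);
    case DIGIT:
        do { addChar(); getChar(); } while (charClass == DIGIT);
        return INT_LIT;
    case END_OF_INPUT:
        return EOF_TOKEN;
    default:              /* any other single character */
        addChar();
        getChar();
        return UNKNOWN;
    }
}
```

A caller would invoke initLexer once and then call lex repeatedly until it returns EOF_TOKEN.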
30The Parsing Problem
- Goals of the parser, given an input program
- Find all syntax errors; for each, produce an appropriate diagnostic message, and recover quickly
- Produce the parse tree, or at least a trace of the parse tree, for the program
31The Parsing Problem
- Two categories of parsers
- Top down - produce the parse tree, beginning at the root
- Order is that of a leftmost derivation
- Traces or builds the parse tree in preorder
- Bottom up - produce the parse tree, beginning at the leaves
- Order is that of the reverse of a rightmost derivation
- Parsers look only one token ahead in the input
32Top-down parsing
- Builds the parse tree from the root down
- Easier to write a top-down parser by hand
- Root node → leaves
- Abstract → concrete
- Uses grammar left → right
- Works by "guessing"
33Bottom-up parsing
- Parse tree is built in bottom up order (post-order)
- yacc, bison: bottom-up parser creators
- read from bottom
- Leaves → root node
- Concrete → abstract
- Uses grammar right → left
- Works by "pattern matching"
34Parsing
- A top down parser traces or builds the parse tree in preorder.
- A preorder traversal of a parse tree begins with the root.
- Each node is visited before its branches are followed. Branches are followed in left-to-right order.
- Leftmost derivation.
- Right recursive
- A bottom-up parser constructs a parse tree by beginning at the leaves and progressing toward the root.
- Shift-reduce
- Rightmost derivation.
- Left recursive
35Parsing
- Gt: S → ε        Gb: S → ε
- S → aS           S → Sa
- S → bS           S → Sb
- [Figure: parse trees over a and b under Gt (right-recursive) and under Gb (left-recursive), each chain of S nodes ending in ε]
36The Parsing Problem
- Top-down Parsers
- Given a sentential form, xAα, the parser must choose the correct A-rule to get the next sentential form in the leftmost derivation, using only the first token produced by A
- The most common top-down parsing algorithms
- Recursive descent - a coded implementation
- LL parsers - table driven implementation
37The Parsing Problem
- Bottom-up parsers
- Given a right sentential form, α, determine what substring of α is the right-hand side of the rule in the grammar that must be reduced to produce the previous sentential form in the rightmost derivation
- The most common bottom-up parsing algorithms are in the LR family
38The Parsing Problem
- The Complexity of Parsing
- Parsers that work for any unambiguous grammar are complex and inefficient (O(n^3), where n is the length of the input)
- Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time (O(n), where n is the length of the input)
39Recursive-Descent Parsing
- There is a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal
- EBNF is ideally suited for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals
40Recursive-Descent Parsing
- A grammar for simple expressions
- <expr> → <term> {(+ | -) <term>}
- <term> → <factor> {(* | /) <factor>}
- <factor> → id | ( <expr> )
41Recursive-Descent Parsing
- Assume we have a lexical analyzer named lex, which puts the next token code in nextToken
- The coding process when there is only one RHS
- For each terminal symbol in the RHS, compare it with the next input token; if they match, continue, else there is an error
- For each nonterminal symbol in the RHS, call its associated parsing subprogram
42Recursive-Descent Parsing
    /* Function expr
       Parses strings in the language
       generated by the rule:
       <expr> -> <term> {(+ | -) <term>}
     */
    void expr() {
      /* Parse the first term */
      term();
43Recursive-Descent Parsing
      /* As long as the next token is + or -, call
         lex to get the next token, and parse the
         next term */
      while (nextToken == PLUS_CODE ||
             nextToken == MINUS_CODE) {
        lex();
        term();
      }
    }
- This particular routine does not detect errors
- Convention: Every parsing routine leaves the next token in nextToken
44Recursive-Descent Parsing
- A nonterminal that has more than one RHS requires an initial process to determine which RHS it is to parse
- The correct RHS is chosen on the basis of the next token of input (the lookahead)
- The next token is compared with the first token that can be generated by each RHS until a match is found
- If no match is found, it is a syntax error
45Recursive-Descent Parsing
    /* Function factor
       Parses strings in the language
       generated by the rule:
       <factor> -> id | (<expr>) */
    void factor() {
      /* Determine which RHS */
      if (nextToken == ID_CODE)
        /* For the RHS id, just call lex */
        lex();
46Recursive-Descent Parsing
      /* If the RHS is (<expr>), call lex to pass
         over the left parenthesis, call expr, and
         check for the right parenthesis */
      else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE)
          lex();
        else
          error();
      } /* End of else if (nextToken == ... */
      else error(); /* Neither RHS matches */
    }
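The routines above can be assembled into a complete recognizer. The following is a sketch under stated assumptions: the token codes, the string-based lex, the error counter, and the wrapper parses are all hypothetical additions, and term is written by analogy with expr since the slides do not show it.

```c
#include <ctype.h>

enum { ID_CODE, PLUS_CODE, MINUS_CODE, MULT_CODE, DIV_CODE,
       LEFT_PAREN_CODE, RIGHT_PAREN_CODE, END_CODE, BAD_CODE };

static const char *src;   /* remaining input */
static int nextToken;     /* code of the current token */
static int errors;        /* number of syntax errors seen */

/* lex: skip blanks and put the next token code in nextToken. */
static void lex(void) {
    while (isspace((unsigned char)*src)) src++;
    char c = *src;
    if (c == '\0') { nextToken = END_CODE; return; }
    if (isalpha((unsigned char)c)) {
        while (isalnum((unsigned char)*src)) src++;
        nextToken = ID_CODE;
        return;
    }
    src++;
    switch (c) {
    case '+': nextToken = PLUS_CODE; break;
    case '-': nextToken = MINUS_CODE; break;
    case '*': nextToken = MULT_CODE; break;
    case '/': nextToken = DIV_CODE; break;
    case '(': nextToken = LEFT_PAREN_CODE; break;
    case ')': nextToken = RIGHT_PAREN_CODE; break;
    default:  nextToken = BAD_CODE; break;
    }
}

static void error(void) { errors++; }

static void expr(void);

/* <factor> -> id | ( <expr> ) */
static void factor(void) {
    if (nextToken == ID_CODE) {
        lex();
    } else if (nextToken == LEFT_PAREN_CODE) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN_CODE) lex();
        else error();
    } else {
        error();   /* neither RHS matches */
    }
}

/* <term> -> <factor> {(* | /) <factor>} */
static void term(void) {
    factor();
    while (nextToken == MULT_CODE || nextToken == DIV_CODE) {
        lex();
        factor();
    }
}

/* <expr> -> <term> {(+ | -) <term>} */
static void expr(void) {
    term();
    while (nextToken == PLUS_CODE || nextToken == MINUS_CODE) {
        lex();
        term();
    }
}

/* Returns 1 if text is a sentence of <expr>, 0 otherwise. */
int parses(const char *text) {
    src = text;
    errors = 0;
    lex();          /* every routine expects nextToken to be loaded */
    expr();
    return errors == 0 && nextToken == END_CODE;
}
```

Each parsing function mirrors one grammar rule, and the braces of EBNF become while loops, which is why no left recursion appears in the code.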
47Recursive-Descent Parsing
- The LL Grammar Class
- The Left Recursion Problem
- If a grammar has left recursion, either direct or indirect, it cannot be the basis for a top-down parser
- A grammar can be modified to remove left recursion
48Top Down Parsing
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → '(' <expr> ')' | num | ident
- Note: Knowing something about lexical analysis, we can define num and ident as terminal symbols
- Exact definitions of num and ident are details left to lexical analysis
49Start
- Parse tree so far: <expr>
- Input: 1 + 2 * 3
50First Guess
- <expr> is expanded with the left-recursive rule, giving <expr> + <term>; no input has been consumed
51Second Guess
- The new leftmost <expr> is expanded the same way; still no input consumed
52Third Guess
- The leftmost <expr> is expanded again
53Fourth Guess
- The leftmost <expr> is expanded yet again; the tree keeps growing while the input 1 + 2 * 3 remains unread
54Maybe we just guessed poorly?
55Should have picked <term>
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → '(' <expr> ')' | num | ident
- When we reach a bad choice we just back up and try again...
- [Figure: <expr> → <term> → <factor> → <ident> over the input 1 + 2 * 3]
56Should have picked <term>
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → '(' <expr> ')' | num | ident
- [Figure: <expr> → <term> → <factor> → <num> over the input 1 + 2 * 3]
57Problem
- Grammar as written is not of a type that can be used successfully with top-down parsing
- Grammar contains left-recursive productions
- <expr> → <expr> + <term> | <term>
- <term> → <term> * <factor> | <factor>
- <factor> → '(' <expr> ')' | num | ident
58Recursion
- Recall recursion:
- Check for terminating condition
- Recurse
- Not:
- Recurse
- Check for terminating condition
- Fix is to make grammar right-recursive.
59Making it right-recursive
- <expr> → <expr> + <term> | <term>
- [Figure: expanding the leftmost <expr> repeatedly produces <term> + <term>, then <term> + <term> + <term>, and so on]
60Parsing Problem
- G3 = <{expr, term, factor}, {0, 1, .., 9, +, -, *, /, (, )}, expr, P> where P is as follows
- <expr> → <expr> + <term> | <expr> - <term> | <term>
- <term> → <term> * <factor> | <term> / <factor> | <factor>
- <factor> → id | num | ( <expr> )
61Parsing Problem
- Top-down grammar for arithmetic expression
- G4 = <{expr, e_tail, term, t_tail, F}, {0, 1, .., 9, +, -, *, /, (, )}, expr, P> where P is as follows
- <expr> → <term><e_tail>
- <e_tail> → ε | + <term><e_tail> | - <term><e_tail>
- e_tail means possibly more terms
- <term> → <factor><t_tail>
- <t_tail> → ε | * <factor><t_tail> | / <factor><t_tail>
- t_tail means possibly more factors
- <factor> → id | num | (<expr> )
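A tail-style grammar like G4 maps directly onto recursive functions. The sketch below goes one step beyond recognition and evaluates integer expressions, passing the value computed so far into each tail so that - and / stay left-associative. It is an illustration under assumptions: only num and parenthesized factors are handled (no id), and the names evaluate, ok, and skip are hypothetical.

```c
#include <ctype.h>

static const char *p;   /* current position in the input */
static int ok;          /* cleared when a syntax error is found */

static void skip(void) { while (isspace((unsigned char)*p)) p++; }

static int expr(void);

/* <factor> -> num | ( <expr> ) */
static int factor(void) {
    skip();
    if (isdigit((unsigned char)*p)) {
        int v = 0;
        while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
        return v;
    }
    if (*p == '(') {
        p++;
        int v = expr();
        skip();
        if (*p == ')') p++; else ok = 0;
        return v;
    }
    ok = 0;
    return 0;
}

/* <t_tail> -> epsilon | * <factor><t_tail> | / <factor><t_tail> */
static int t_tail(int left) {
    skip();
    if (*p == '*') { p++; int r = factor(); return t_tail(left * r); }
    if (*p == '/') {
        p++;
        int r = factor();
        if (r == 0) { ok = 0; return 0; }   /* treat division by zero as an error */
        return t_tail(left / r);
    }
    return left;                            /* epsilon: no more factors */
}

/* <term> -> <factor><t_tail> */
static int term(void) { int v = factor(); return t_tail(v); }

/* <e_tail> -> epsilon | + <term><e_tail> | - <term><e_tail> */
static int e_tail(int left) {
    skip();
    if (*p == '+') { p++; int r = term(); return e_tail(left + r); }
    if (*p == '-') { p++; int r = term(); return e_tail(left - r); }
    return left;                            /* epsilon: no more terms */
}

/* <expr> -> <term><e_tail> */
static int expr(void) { int v = term(); return e_tail(v); }

/* Parses text; on success stores the value in *result and returns 1. */
int evaluate(const char *text, int *result) {
    p = text;
    ok = 1;
    int v = expr();
    skip();
    if (*p != '\0') ok = 0;   /* trailing input is an error */
    if (ok) *result = v;
    return ok;
}
```

Because each function looks at one character before choosing a rule, no guessing or backing up is needed, unlike the left-recursive grammar.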
62Parsing Problem
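- [Figure: parse tree under G4, drawn with E, T and F for <expr>, <term> and <factor>; the leaves include 2, 3, 4, 5 and the operator -, with ε for each empty tail]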
63Parsing Problem
- [Figure: parse tree under G4, drawn with E, T and F; the leaves include 2, /, (, 3, -, 5 and ), with ε for each empty tail]
64Recursive-Descent Parsing
- The other characteristic of grammars that disallows top-down parsing is the lack of pairwise disjointness
- The inability to determine the correct RHS on the basis of one token of lookahead
- Def: FIRST(α) = {a | α =>* aβ}
- (If α =>* ε, ε is in FIRST(α))
65Recursive-Descent Parsing
- Pairwise Disjointness Test
- For each nonterminal, A, in the grammar that has more than one RHS, for each pair of rules, A → αi and A → αj, it must be true that
- FIRST(αi) ∩ FIRST(αj) = ∅
- Examples
- A → a | bB | cAb
- A → a | aB
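For rules like the two examples, where every RHS begins with a terminal, FIRST of each RHS is just its first symbol, so the test reduces to comparing first characters. The sketch below covers only that simplified case (an assumption; a general test must compute FIRST sets through nonterminals), and pairwise_disjoint is a hypothetical name.

```c
#include <stddef.h>

/* Returns 1 if no two right-hand sides begin with the same symbol, else 0.
   Assumes each RHS is a non-empty string that starts with a terminal, so
   FIRST(rhs) contains exactly its first character. */
int pairwise_disjoint(const char *const rhs[], size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (rhs[i][0] == rhs[j][0]) {
                return 0;   /* FIRST sets intersect */
            }
        }
    }
    return 1;
}
```

The first example passes because its FIRST sets are {a}, {b} and {c}; the second fails because both RHSs start with a.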
66Recursive-Descent Parsing
- Left factoring can resolve the problem
- Replace
- <variable> → identifier | identifier [<expression>]
- with
- <variable> → identifier <new>
- <new> → ε | [<expression>]
- or
- <variable> → identifier [[<expression>]]
- (the outer brackets are metasymbols of EBNF)
67Parser Classification
- Parsers are broadly broken down into
- LL - Top down parsers
- L - Scan Left to Right
- L - Traces leftmost derivation of input string
- LR - Bottom up parsers
- L - Scan left to right
- R - Traces rightmost derivation of input string (in reverse)
- Typical notation
- LL(1), LL(0), LR(1), LR(k)
- Number (k) refers to maximum look ahead
- Lower is better!
68Parser Classification
- Thus k is maximum height of stack
- k = 0: no stack; k = 1: single variable; k > 1: stack
- Writing a grammar with small k is not easy!
69Tradeoff
- Grammar ? Parser
- LL Parsers are a subset of LR Parsers
- Anything parsable with LL is parsable with LR.
Reverse is not true.
- [Figure: the LL class drawn as a region inside the larger LR class]
[Figure sequence, slides 70-94: recursive-descent parse of the input 1 + 2 * 3, one step per slide]
- expr calls term, which calls factor; factor finds num (1) and succeeds
- t_tail finds nothing, so term succeeds
- expr calls e_tail, which finds + and calls term
- term calls factor, which finds num (2) and succeeds
- t_tail finds * and calls factor, which finds num (3); the next t_tail finds nothing
- t_tail, term and e_tail each return success, and finally expr succeeds
95What happened?
- [Figure: the parse tree that was traced. <expr> → <term> <e_tail>; the first <term> → <factor> <t_tail> with num 1 and an empty tail (ε); <e_tail> → + <term> <e_tail>, whose <term> covers num 2, *, num 3; the remaining tails are ε]
96Bottom-up Parsing
- The parsing problem is finding the correct RHS in
a right-sentential form to reduce to get the
previous right-sentential form in the derivation
97Bottom-up Parsing
- Intuition about handles
- Def: β is the handle of the right sentential form γ = αβw if and only if S =>*rm αAw =>rm αβw
- Def: β is a phrase of the right sentential form γ if and only if S =>* γ = α1Aα2 =>+ α1βα2
- Def: β is a simple phrase of the right sentential form γ if and only if S =>* γ = α1Aα2 => α1βα2
98Bottom-up Parsing
- Intuition about handles
- The handle of a right sentential form is its leftmost simple phrase
- Given a parse tree, it is now easy to find the handle
- Parsing can be thought of as handle pruning
99Bottom-up Parsing
- Shift-Reduce Algorithms
- Reduce is the action of replacing the handle on the top of the parse stack with its corresponding LHS
- Shift is the action of moving the next token to the top of the parse stack
100Bottom-up Parsing
- A parser table can be generated from a given
grammar with a tool, e.g., yacc
101Introduction
- Shift-reduce parsing
- A general style of bottom-up parsing
- Attempts to construct a parse tree for an input string
- Beginning at the leaves (the bottom)
- Working up towards the root (the top)
- A process of reducing the input string to the
start symbol
102Introduction
- At each reduction step, a substring matching the RHS of a production is replaced by the symbol on the LHS
- If the substring is chosen correctly at each step, a rightmost derivation is traced in reverse
103Example
- Consider the following grammar
- S → aABe    A → Abc | b    B → d
- The sentence abbcde can be reduced to S by the following steps
- abbcde → aAbcde → aAde → aABe → S
- rightmost derivation (in reverse)
104Introduction
- Two problems to be solved
- How to locate the substring to be reduced?
- What production to choose if there are many
productions with that substring on the RHS?
105Stack Implementation (of Shift-Reduce Parsing)
- A stack is used to hold grammar symbols and an input buffer to hold the input string
- $ is used to mark the bottom of the stack and the end of the input
- Initially the stack is empty and we have a string, say w, as input ($ is end of string)
- STACK    INPUT
- $        w$
106Stack Implementation
- The parser shifts zero or more input symbols onto the stack until a substring β (called a handle) is on top of the stack
- Then β is reduced to the LHS of a production
- Repeat until (an error is seen or) the stack has the start symbol and input is empty
- STACK    INPUT
- $S       $
- At this point, parser halts with success
107Example
- Suppose we have the (ambiguous) CFG
- E → E + E | E * E | (E) | id
- Consider the input string id1 + id2 * id3 that can be derived (rightmost) as
- E ⇒ E + E ⇒ E + E * E ⇒ E + E * id3 ⇒ E + id2 * id3 ⇒ id1 + id2 * id3
- Show the actions of a shift-reduce parser
109Actions
- Shift: shift next symbol onto top of stack
- Reduce: locate the left end of a handle within the stack (right end is the top) and decide the non-terminal to replace the handle
- Accept: announce successful completion
- Error: discover syntax error and call error recovery routine
110Conflicts during Parsing
- Shift-reduce conflict
- Cannot decide whether to shift or reduce
- Reduce-reduce conflict
- Cannot decide which of the several reductions to
make (multiple productions to choose from)
111Conflict Resolution
- Conflict resolution by adapting the parsing algorithm (e.g., in parser generators)
- Shift-reduce conflict
- Resolve in favor of shift
- Reduce-reduce conflict
- Use the production that appears earlier
112Bottom-up Parsing
- Advantages of LR parsers
- They will work for nearly all grammars that describe programming languages.
- They work on a larger class of grammars than other bottom-up algorithms, but are as efficient as any other bottom-up parser.
- They can detect syntax errors as soon as it is possible.
- The LR class of grammars is a superset of the class parsable by LL parsers.
113Bottom-up Parsing
- LR parsers must be constructed with a tool
- Knuth's insight: A bottom-up parser could use the entire history of the parse, up to the current point, to make parsing decisions
- There were only a finite and relatively small number of different parse situations that could have occurred, so the history could be stored in a parser state, on the parse stack
114Bottom-up Parsing
- An LR configuration stores the state of an LR parser
- (S0X1S1X2S2...XmSm, aiai+1...an$)
115Bottom-up Parsing
- LR parsers are table driven, where the table has two components, an ACTION table and a GOTO table
- The ACTION table specifies the action of the parser, given the parser state and the next token
- Rows are state names; columns are terminals
- The GOTO table specifies which state to put on top of the parse stack after a reduction action is done
- Rows are state names; columns are nonterminals
116Structure of An LR Parser
117Bottom-up Parsing
- Initial configuration: (S0, a1...an$)
- Parser actions
- If ACTION[Sm, ai] = Shift S, the next configuration is
- (S0X1S1X2S2...XmSmaiS, ai+1...an$)
- If ACTION[Sm, ai] = Reduce A → β and S = GOTO[Sm-r, A], where r = the length of β, the next configuration is
- (S0X1S1X2S2...Xm-rSm-rAS, aiai+1...an$)
118Bottom-up Parsing
- Parser actions (continued)
- If ACTION[Sm, ai] = Accept, the parse is complete and no errors were found.
- If ACTION[Sm, ai] = Error, the parser calls an error-handling routine.
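The four actions can be seen at work in a small table-driven driver. The sketch below assumes the widely used textbook SLR table for the grammar 1: E → E + T, 2: E → T, 3: T → T * F, 4: T → F, 5: F → ( E ), 6: F → id; whether this matches the table in the slide's figure is an assumption. The encoding is also hypothetical: a positive entry n means shift to state n, a negative entry -n means reduce by rule n, ACC means accept, and 0 means error. Only states are kept on the stack, since the grammar symbols are implied by them, and each input character is one token, with i standing for id.

```c
#include <stddef.h>

enum { ACC = 100 };

/* ACTION[state][terminal]; columns are id, +, *, (, ), $. */
static const int ACTION[12][6] = {
    { 5,  0,  0, 4,  0,   0 },   /* state 0 */
    { 0,  6,  0, 0,  0, ACC },   /* state 1 */
    { 0, -2,  7, 0, -2,  -2 },   /* state 2 */
    { 0, -4, -4, 0, -4,  -4 },   /* state 3 */
    { 5,  0,  0, 4,  0,   0 },   /* state 4 */
    { 0, -6, -6, 0, -6,  -6 },   /* state 5 */
    { 5,  0,  0, 4,  0,   0 },   /* state 6 */
    { 5,  0,  0, 4,  0,   0 },   /* state 7 */
    { 0,  6,  0, 0, 11,   0 },   /* state 8 */
    { 0, -1,  7, 0, -1,  -1 },   /* state 9 */
    { 0, -3, -3, 0, -3,  -3 },   /* state 10 */
    { 0, -5, -5, 0, -5,  -5 },   /* state 11 */
};

/* GOTO_TABLE[state][nonterminal]; columns are E, T, F (0 = no entry). */
static const int GOTO_TABLE[12][3] = {
    { 1, 2, 3 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
    { 8, 2, 3 }, { 0, 0, 0 }, { 0, 9, 3 }, { 0, 0, 10 },
    { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 },
};

/* Per rule (index 1..6): length of the RHS and index of the LHS. */
static const int RHS_LEN[7] = { 0, 3, 1, 3, 1, 3, 1 };
static const int LHS[7]     = { 0, 0, 0, 1, 1, 2, 2 };

/* Maps an input character to its ACTION column; '\0' plays the role of $. */
static int token_index(char c) {
    switch (c) {
    case 'i':  return 0;
    case '+':  return 1;
    case '*':  return 2;
    case '(':  return 3;
    case ')':  return 4;
    case '\0': return 5;
    default:   return -1;
    }
}

/* Returns 1 if input is accepted by the table, 0 on a syntax error. */
int lr_parse(const char *input) {
    int stack[256];
    int top = 0;
    size_t i = 0;
    stack[0] = 0;                                /* initial state S0 */
    for (;;) {
        int t = token_index(input[i]);
        if (t < 0) return 0;
        int act = ACTION[stack[top]][t];
        if (act == ACC) return 1;                /* Accept */
        if (act > 0) {                           /* Shift */
            if (top + 1 >= 256) return 0;
            stack[++top] = act;
            i++;
        } else if (act < 0) {                    /* Reduce A -> beta */
            int rule = -act;
            top -= RHS_LEN[rule];                /* pop r states */
            int next = GOTO_TABLE[stack[top]][LHS[rule]];
            stack[++top] = next;                 /* push GOTO[Sm-r, A] */
        } else {
            return 0;                            /* Error */
        }
    }
}
```

The driver itself never changes from grammar to grammar; only the tables do, which is why generating them with a tool is practical.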
119LR Parsing Table
120LR Parsing Introduction
- An efficient bottom-up parsing technique
- Can parse a large set of CFGs
- Technique is called LR(k) parsing
- L for left-to-right scanning of input
- R for rightmost derivation (in reverse)
- k for number of tokens of look-ahead
- (when k is omitted, it implies k = 1)
121LL(k) vs. LR(k)
- LL(k): must predict which production to use having seen only the first k tokens of the RHS
- Works only with some grammars
- But simple algorithm (can construct by hand)
- LR(k): more powerful
- Can postpone the decision until it has seen the tokens of the entire RHS of a production and k more beyond
122LR(k)
- Can recognize virtually all programming language constructs (if CFG can be given)
- Most general non-backtracking shift-reduce method known, but can be implemented efficiently
- Class of grammars that can be parsed is a superset of grammars parsed by LL(k)
- Can detect syntax errors as soon as possible
123LR(k)
- Main drawback: too tedious to do by hand for typical programming lang. grammars
- We need an LR parser generator
- Many available
- Yacc (yet another compiler compiler) or bison for C/C++ environment
- CUP (Construction of Useful Parsers) for Java environment; JavaCC is another parser generator for Java (it produces top-down parsers)
- We write the grammar and the generator produces the parser for that grammar