Title: Compilers
Compilers
- Introduction
- Basic Compiler Function
- Lexical Analysis
- Syntactic Analysis
- Operator-Precedence Parsing
- Shift-Reduce Parsing
- Recursive-Descent Parsing
N.K. Srinath srinath_nk_at_yahoo.com 1
RVCE
Lexical Analysis
Lexical analysis involves scanning the program to be compiled. Scanned items are recognized directly as single tokens. Alternatively, these tokens could be defined as part of the grammar.
Example:
<id> ::= <letter> | <id> <letter> | <id> <digit>
<letter> ::= A | B | C | ... | Z
<digit> ::= 0 | 1 | 2 | ... | 9
In such a case the scanner would recognize as tokens the single characters A, B, ..., Z, 0, 1, ..., 9. The parser could then interpret a sequence of such characters as the language construct <id>.
Features
- The length of identifiers could be restricted.
- The scanner generally recognizes both single-character and multiple-character tokens directly.
- The scanner output consists of a sequence of tokens.
- Each token can be considered to have a fixed-length code.
- For a Pascal grammar, the list of integer codes for each token is provided in a table.
Issues in Lexical Analyzer
- The lexical analyzer has to recognize the longest possible string.
- Example: for the identifier newva, the candidate strings are n, ne, new, newv, newva.
- There is no end delimiter for the tokens defined.
- Normally we don't return a comment as a token; comments are processed only by the lexical analyzer.
- The symbol table holds information about tokens.
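The longest-match rule above can be sketched as follows. This is a minimal illustration, not the scanner from the slides: the token classes in `PATTERNS` and the function name `scan` are assumptions made for the example.

```python
import re

# Hypothetical token classes, chosen only for illustration.
PATTERNS = [
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("int", re.compile(r"[0-9]+")),
    ("op", re.compile(r":=|<=|>=|<>|[-+*/=<>():;,]")),
]

def scan(source):
    """Return (kind, text) pairs, taking the longest match at each position."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():          # blanks only separate tokens
            pos += 1
            continue
        best = None
        for kind, pattern in PATTERNS:     # try every class, keep the longest
            m = pattern.match(source, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens

print(scan("newva := new1"))
```

Because there is no end delimiter, the scanner keeps extending `newva` until the next character can no longer belong to the identifier.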
- Some scanners enter the identifiers directly into a symbol table. The token specifier for an identifier may be a pointer to the symbol table entry for that identifier.
- The entire program is not scanned at one time.
- The scanner usually operates as a procedure that is called by the parser when it needs another token.
- The scanner is responsible for reading the lines of the source program and possibly for printing the source listing.
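A rough sketch of a scanner that works this way is given below. The class name `Scanner`, the method `next_token`, and the token pattern are hypothetical; keywords are not distinguished from identifiers here, which a real scanner would do.

```python
import re

# Assumed token shapes: identifiers, integers, ':=' and single characters.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|:=|\S")

class Scanner:
    def __init__(self, lines):
        self.symbols = {}                  # identifier -> symbol-table index
        self._tokens = self._scan(lines)   # lazy: nothing is scanned yet

    def _scan(self, lines):
        for line in lines:                 # read one source line at a time
            for text in WORD.findall(line):
                if text[0].isalpha():
                    # The token specifier is the symbol-table index.
                    index = self.symbols.setdefault(text, len(self.symbols))
                    yield ("id", index)
                else:
                    yield (text, None)

    def next_token(self):
        """Called by the parser whenever it needs another token."""
        return next(self._tokens, None)

scanner = Scanner(["SUM := SUM + X"])
print(scanner.next_token(), scanner.next_token(), scanner.next_token())
```

The generator keeps the scanner's position between calls, so the program is never scanned all at once.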
- The scanner ignores comments, except for printing them as part of the output listing.
- The scanner must take the characteristics of the language into account.
- Example: FORTRAN
  - Columns 1-5: statement number
  - Column 6: continuation of line
  - Columns 7-72: program statement
- Example: Pascal
  - Blanks function as delimiters for tokens
  - Statements can be continued freely
  - End of statement is indicated by ; (semicolon)
- Scanners should take the rules for the formation of tokens into account.
- Example: 'READ' should not be considered a keyword, as it is within quotes; i.e., a string within quotes should not be scanned for tokens.
- Blanks are significant within a quoted string.
- Blanks play different roles in different languages.
- Example 1: FORTRAN statement
  DO 10 I = 1, 100
  DO is a keyword, I is an identifier, and 10 is the statement number.
Statement: DO 10 I = 1
Here DO10I is an identifier. Note: blanks are ignored in FORTRAN statements, and hence this is an assignment statement (DO10I = 1). In this case the scanner must look ahead to see if there is a comma (,) before it can decide on the proper identification of the characters DO.
Example 2: In FORTRAN, keywords may also be used as identifiers. Words such as IF, THEN, and ELSE might represent either keywords or variable names.
   IF (THEN .EQ. ELSE) THEN
      IF = THEN
   ELSE
      THEN = IF
   ENDIF
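The look-ahead needed for the DO statement in Example 1 can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `classify_do` is hypothetical, and the check treats any comma after the = sign as loop bounds, so it would be fooled by a comma inside parentheses.

```python
import re

def classify_do(statement):
    """Decide whether a FORTRAN statement starting with DO is a loop or an assignment."""
    # Blanks are not significant in FORTRAN, so remove them first.
    text = statement.replace(" ", "").upper()
    loop = re.match(r"DO(\d+)([A-Z][A-Z0-9]*)=(.*)$", text)
    # Look ahead: a comma after '=' means loop bounds follow.
    if loop and "," in loop.group(3):
        return "do-loop"
    if re.match(r"[A-Z][A-Z0-9]*=", text):
        return "assignment"
    return "unknown"

print(classify_do("DO 10 I = 1, 100"))
print(classify_do("DO 10 I = 1"))
```

The second call is classified as an assignment to the variable DO10I, because no comma is found.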
Modeling Scanners as Finite Automata
Finite automata provide an easy way to visualize the operation of a scanner. An algorithm to recognize a token is shown below.
get first Input_Character
if Input_Character in ['A'..'Z'] then
   begin
      while Input_Character in ['A'..'Z', '0'..'9'] do
         begin
            get next Input_Character
            if Input_Character = '_' then
               begin
                  get next Input_Character
                  Last_Char_Is_Underscore := true
               end {if '_'}
            else
               Last_Char_Is_Underscore := false
         end {while}
      if Last_Char_Is_Underscore then
         return (Token_Error)
      else
         return (Valid_Token)
   end {if first in ['A'..'Z']}
else
   return (Token_Error)
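The recognizer above could be written in Python roughly as follows. The function name `is_valid_identifier` is hypothetical, and the sketch adds one assumption not stated in the algorithm: the whole input string must be consumed for the token to be valid.

```python
import string

LETTERS = set(string.ascii_uppercase)
ALNUM = LETTERS | set(string.digits)

def is_valid_identifier(text):
    """Accept a letter followed by letters/digits, with single embedded underscores."""
    chars = iter(text)
    ch = next(chars, None)                 # get first input character
    if ch not in LETTERS:
        return False                       # token error: must start with A..Z
    last_char_is_underscore = False
    while ch in ALNUM:
        ch = next(chars, None)             # get next input character
        if ch == "_":
            ch = next(chars, None)         # skip the underscore
            last_char_is_underscore = True
        else:
            last_char_is_underscore = False
    # Assumption: leftover characters also make the token invalid.
    return ch is None and not last_char_is_underscore

print(is_valid_identifier("A_B"), is_valid_identifier("A_"))
```

Each branch corresponds to a state transition of the finite automaton: a trailing underscore or a double underscore leaves the automaton in a non-accepting state.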
SYNTACTIC ANALYSIS
- The syntax analyzer creates the syntactic structure of the given source program.
- The syntax analyzer is also known as the parser.
- The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.
- The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar or not.
  - If it does, the parser creates the parse tree of that program.
  - Otherwise, the parser gives error messages.
- A context-free grammar
  - gives a precise syntactic specification of a programming language.
  - The design of the grammar is an initial phase of the design of a compiler.
  - A grammar can be directly converted into a parser by some tools.
- The parser works on a stream of tokens.
- The smallest item is a token.
- We categorize parsers into two groups:
  - Top-down parser: the parse tree is created top to bottom, starting from the root.
  - Bottom-up parser: the parse tree is created bottom to top, starting from the leaves.
Bottom-up: Bottom-up methods begin with the terminal nodes of the tree and attempt to combine these into successively higher-level nodes until the root is reached.
Top-down: Top-down methods begin with the rule of the grammar that specifies the goal of the analysis (i.e., the root of the tree), and attempt to construct the tree so that the terminal nodes match the statement being analyzed.
- Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
- Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars:
  - LL for top-down parsing
  - LR for bottom-up parsing
Bottom-up parsing example
E -> T + E | T
T -> int * T | int | (E)
Consider the string: int * int + int
Bottom-up parsing reduces a string to the start symbol by inverting productions:
int * int + int     T -> int
int * T + int       T -> int * T
T + int             T -> int
T + T               E -> T
T + E               E -> T + E
E
[Figure: parse tree for int * int + int]
Context-Free Grammars
- Inherently recursive structures of a programming language are defined by a context-free grammar.
- In a context-free grammar, we have:
  - A finite set of terminals (in our case, this will be the set of tokens)
  - A finite set of non-terminals (syntactic variables)
  - A finite set of production rules in the following form
    A -> α
    where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string)
  - A start symbol (one of the non-terminal symbols)
- Example:
  E -> E + E | E - E | E * E | E / E | - E
  E -> ( E )
  E -> id
Derivations
E => E+E
- E+E derives from E: we can replace E by E+E.
- To be able to do this, we have to have a production rule E -> E+E in our grammar.
E => E+E => id+E => id+id
A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.
Left-most derivation: E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
Right-most derivation: E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
We will see that top-down parsers try to find the left-most derivation of the given source program.
We will see that bottom-up parsers try to find the right-most derivation of the given source program in reverse order.
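The difference between the two derivation orders can be illustrated with a small sketch. The helper `derive` is hypothetical; it assumes the grammar has the single non-terminal E and that each step supplies the right-hand side to substitute.

```python
def derive(steps, leftmost=True, nonterminal="E"):
    """Apply each right-hand side to the left-most (or right-most) non-terminal."""
    form = [nonterminal]
    history = ["".join(form)]
    for rhs in steps:
        positions = [i for i, sym in enumerate(form) if sym == nonterminal]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = rhs                # replace one non-terminal by the rhs
        history.append("".join(form))
    return history

STEPS = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
print(" => ".join(derive(STEPS, leftmost=True)))
print(" => ".join(derive(STEPS, leftmost=False)))
```

Both runs use the same productions and end in -(id+id); only the order in which the two E symbols are replaced differs.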
Parse Tree
- Inner nodes of a parse tree are non-terminal symbols.
- The leaves of a parse tree are terminal symbols.
- A parse tree can be seen as a graphical representation of a derivation.
[Figure: parse tree for the derivation step => -(E)]
Ambiguity
A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar.
E => E+E => id+E => id+E*E => id+id*E => id+id*id
E => E*E => E+E*E => id+E*E => id+id*E => id+id*id
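The ambiguity can be checked mechanically by counting parse trees. The sketch below assumes the reduced grammar E -> E + E | E * E | id (only the productions used in the two derivations above); the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):                       # trees deriving tokens[i:j] from E
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):      # choose the operator at the root
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence shows that the grammar is ambiguous; here the two trees correspond to choosing + or * as the root operator.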
Operator-Precedence Parsing
- It is very simple.
- Used in languages where virtually all operators are used. Example: SNOBOL.
- Three disjoint relations <., =, and .> are used between certain pairs of terminals.
- If a <. b, we say that a yields precedence to b.
- If a = b, we say that a has the same precedence as b.
- If a .> b, we say that a takes precedence over b.
OPERATOR PRECEDENCE PARSING
The bottom-up parsing technique considered here is called the operator-precedence method. This method is based on examining pairs of consecutive operators in the source program and making decisions about which operation should be performed first.
Example: A + B * C - D    (1)
There are two ways of determining what precedence relation should hold between a pair of terminals.
First method: an intuitive method based on the traditional notions of associativity and precedence of operators. By the usual rules, multiplication and division have higher precedence than addition and subtraction. Examining the first two operators in (1), + and *, we find that + has lower precedence than *. This is written as + <. * (+ has lower precedence than *).
Consider the following grammar for expressions:
E -> E A E | (E) | -E | id    (2)
A -> + | - | * | /
It is not an operator grammar. If we substitute for A each of its alternates, we obtain the following operator grammar:
E -> E + E | E - E | E * E | E / E | (E) | -E | id
The problem with this grammar is that it is ambiguous: it does not indicate the precedence of the operators.
The first (intuitive) method resolves the ambiguities of the grammar shown in (2).
Second method: an unambiguous grammar for the language is constructed first. This grammar reflects the correct associativity and precedence in its parse trees.
Example: for arithmetic expressions involving +, -, *, / the grammar is
E -> E + E | E - E | E * E | E / E | (E) | -E | id
Given an unambiguous grammar, there is a mechanical method for constructing operator-precedence relations from it.
Example: for the expression id + id * id, the operator-precedence relations are given in a table.
[Table: operator-precedence relations]
- The given expression is considered as a string.
- All the nonterminals are removed, and the correct relation <., =, or .> is placed between terminals as per the operator-precedence relations.
- $ is placed at the beginning and the end of the string.
- Example: $ id + id * id $
  $ <. id .> + <. id .> * <. id .> $
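The marking step described above can be sketched as follows. The relation table `REL` holds the usual textbook values for the terminals id, +, * and $; it is an assumption here, since the table itself is not reproduced in the slides, and the function name `annotate` is hypothetical.

```python
# Assumed precedence relations for (left terminal, right terminal).
REL = {
    ("id", "+"): ".>", ("id", "*"): ".>", ("id", "$"): ".>",
    ("+", "id"): "<.", ("+", "+"): ".>", ("+", "*"): "<.", ("+", "$"): ".>",
    ("*", "id"): "<.", ("*", "+"): ".>", ("*", "*"): ".>", ("*", "$"): ".>",
    ("$", "id"): "<.", ("$", "+"): "<.", ("$", "*"): "<.",
}

def annotate(tokens):
    """Put $ at both ends and insert the relation between each adjacent pair."""
    padded = ["$"] + list(tokens) + ["$"]
    out = [padded[0]]
    for left, right in zip(padded, padded[1:]):
        relation = REL.get((left, right))
        if relation is None:
            # No relation in the table: the pair cannot appear together.
            raise SyntaxError(f"no precedence relation between {left} and {right}")
        out += [relation, right]
    return " ".join(out)

print(annotate(["id", "+", "id", "*", "id"]))
```

A pair with no entry in the table is reported as a syntax error, which is how an operator-precedence parser detects illegal token sequences.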
Precedence matrix for the Pascal grammar
For a Pascal grammar, the precedence relations for some of the tokens are explained below.
Example: PROGRAM = VAR: these two tokens have equal precedence.
begin <. FOR: begin has lower precedence than FOR.
There are some relations which do not follow the ordinary rules for comparisons.
Example: ; .> end and end .> ; i.e., when ; is followed by end, the ; has higher precedence, and when end is followed by ; the end has higher precedence.
In all cases where no precedence relation exists in the table, the two tokens cannot appear together in any legal statement. If such a combination occurs during parsing, it should be recognized as an error.
Example: Pascal statement
begin READ ( VALUE ) ;
This Pascal statement is scanned from left to right, one token at a time. For each pair of operators, the precedence relation between them is determined.
1. . . . begin READ ( id ) ;
   Precedence relations are placed between the tokens.
2. . . . begin READ ( <N1> ) ;
   The id (VALUE) has been recognized as the nonterminal <N1>.
3. . . . begin <N2> ;
   READ ( <N1> ) has been recognized as the nonterminal <N2>.
- Example: Show a step-by-step parsing for the assignment
  VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
  (i) . . . id1 := id2 DIV int - id3 * id4
- The left-to-right scan is continued in each step only far enough to determine the next portion of the statement to be recognized, which is the first portion delimited by <. and .>.
- Once this portion has been determined, it is interpreted as a nonterminal according to some rule of the grammar.
The parse tree is constructed from the terminal nodes up towards the root, hence the term bottom-up parsing.
The id SUMSQ is interpreted as the single nonterminal <N1>, which is an operand of DIV. That is, <N1> in the tree corresponds to two nonterminals, <factor> and <term>, as per the Pascal grammar.
(ii) . . . id1 := <N1> DIV int - id3 * id4
(iii) . . . id1 := <N1> DIV <N2> - id3 * id4
(iv) . . . id1 := <N3> - id3 * id4
(v) . . . id1 := <N3> - <N4> * id4
(vi) . . . id1 := <N3> - <N4> * <N5>
(vii) . . . id1 := <N3> - <N6>
(viii) . . . id1 := <N7>
(ix) . . . <N8>
SHIFT-REDUCE PARSING
- Operator-precedence parsing is a bottom-up parsing technique.
- It was developed into shift-reduce parsing.
- This method makes use of a stack to store tokens that have not yet been recognized in terms of the grammar.
- The actions of the parser are controlled by entries in a table, which is somewhat similar to the precedence matrix.
- The two main actions of shift-reduce parsing are:
  - Shift: push the current token onto the stack.
  - Reduce: recognize symbols on top of the stack according to a rule of the grammar.
- Example: begin READ ( id ) . . .
- Steps (token stream and stack):
1. . . . begin READ ( id ) . . .
   begin is the first token; the parser pushes it onto the stack (shift).
2. . . . begin READ ( id ) . . .
   The next token, READ, is also shifted onto the stack.
3. . . . begin READ ( id ) . . .
4. . . . begin READ ( id ) . . .
5. . . . begin READ ( id ) . . .
   When the parser examines the token ), the reduce action is invoked.
A set of tokens from the top of the stack is reduced to a nonterminal symbol from the grammar. It is pushed onto the stack, to be reduced later as part of the READ statement.
Note: The shift action is taken by an operator-precedence parser when it encounters the relations <. and =. The reduce action is taken when an operator-precedence parser encounters the relation .>.
Shift-reduce parsing can also be represented as follows, with | separating the left string (already on the stack) from the remaining input.
Shift: move | one place to the right. This shifts a terminal to the left string.
   ABC|xyz => ABCx|yz
Reduce: apply an inverse production at the right end of the left string.
   If A -> xy is a production, then Cbxy|ijk => CbA|ijk
Example with reductions only
int * int + int     reduce T -> int
int * T + int       reduce T -> int * T
T + int             reduce T -> int
T + T               reduce E -> T
T + E               reduce E -> T + E
E
Shift-reduce parsing example
|int * int + int     shift
int | * int + int    shift
int * | int + int    shift
int * int | + int    reduce T -> int
int * T | + int      reduce T -> int * T
T | + int            shift
T + | int            shift
T + int |            reduce T -> int
T + T |              reduce E -> T
T + E |              reduce E -> T + E
E |
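The trace above can be reproduced with a small hand-written shift-reduce sketch for the grammar E -> T + E | T, T -> int * T | int | (E). The function name `shift_reduce` and the way reductions are chosen (fixed stack patterns plus one token of look-ahead) are assumptions made for this example; a real parser would drive these decisions from a table.

```python
def shift_reduce(tokens):
    """Return (accepted, actions) for E -> T + E | T, T -> int * T | int | (E)."""
    stack, rest, actions = [], list(tokens), []

    def reduce(n, lhs, rule):
        del stack[-n:]                     # pop the handle
        stack.append(lhs)                  # push the non-terminal
        actions.append(f"reduce {rule}")

    while True:
        look = rest[0] if rest else None   # one token of look-ahead
        if stack[-3:] == ["int", "*", "T"]:
            reduce(3, "T", "T -> int * T")
        elif stack[-3:] == ["(", "E", ")"]:
            reduce(3, "T", "T -> (E)")
        elif stack[-3:] == ["T", "+", "E"]:
            reduce(3, "E", "E -> T + E")
        elif stack[-1:] == ["int"] and look != "*":
            reduce(1, "T", "T -> int")
        elif stack[-1:] == ["T"] and look != "+":
            reduce(1, "E", "E -> T")
        elif rest:
            stack.append(rest.pop(0))      # shift the next token
            actions.append("shift")
        else:
            break
    return stack == ["E"], actions

ok, actions = shift_reduce(["int", "*", "int", "+", "int"])
print(ok)
print("\n".join(actions))
```

The input is accepted when the stack holds only the start symbol E and no input remains.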
Note (handle): We only want to reduce at handles. A handle is a string that can be reduced and that also allows further reductions back to the start symbol. In shift-reduce parsing, handles always appear at the top of the stack. Handles are never to the left of the rightmost non-terminal. Therefore, shift and reduce moves are sufficient; the | need never move left. Bottom-up parsing algorithms are based on recognizing handles.
RECURSIVE DESCENT PARSING
- Recursive descent is a top-down parsing technique.
- It is made up of a procedure for each non-terminal symbol in the grammar.
- Each procedure tries to find a sub-string of the input that can be interpreted as its non-terminal.
- During this process it may call other procedures, or call itself recursively, to search for other non-terminals.
- If the procedure finds the non-terminal, it returns success; otherwise it returns failure.
Example: <read> ::= READ ( <id-list> )
The procedure for the READ statement, derived from the grammar, is given below. The grammar rule used for the id-list is
<id-list> ::= id { , id }
unlike the one considered earlier:
<id-list> ::= id | <id-list> , id
This is because of a fundamental difficulty: the earlier rule is left recursive. If the procedure decided to try the second alternative (<id-list> , id), it would immediately call itself recursively to find an <id-list>. This would result in another immediate recursive call, which leads to an unending chain.
Note that an integer code is provided for each token, as listed earlier.

procedure READ
begin
   FOUND := FALSE
   if TOKEN = 8 {READ} then
      begin
         advance to next token
         if TOKEN = 20 { ( } then
            begin
               advance to next token
               if IDLIST returns success then
                  if TOKEN = 21 { ) } then
                     begin
                        FOUND := TRUE
                        advance to next token
                     end {if ) }
            end {if ( }
      end {if READ}
   if FOUND = TRUE then
      return success
   else
      return failure
end {READ}

[Figure: the READ procedure has been invoked and has examined the tokens READ and ( in the stream (indicated by dashed lines).]
[Figure: READ has called IDLIST (indicated by solid line), which has examined the token id.]
[Figure: IDLIST has returned to READ, indicating success.]
Note: The parse tree was constructed beginning at the root, hence the term top-down parsing.
Procedure for IDLIST
This procedure checks whether the id-list is as per the grammar.

procedure IDLIST
begin
   FOUND := FALSE
   if TOKEN = 22 {id} then
      begin
         FOUND := TRUE
         advance to next token
         while (TOKEN = 14 { , }) and (FOUND = TRUE) do
            begin
               advance to next token
               if TOKEN = 22 {id} then
                  advance to next token
               else
                  FOUND := FALSE
            end {while}
      end {if id}
   if FOUND = TRUE then
      return success
   else
      return failure
end {IDLIST}

A flag is set to indicate that the identifier has been found and is valid. The while statement then checks for a comma (,). If , is not followed by an id, the function returns failure; e.g., id,id, is an error.

Recursive-descent parse of the assignment statement: the procedures and the parse tree are shown for the nonterminal symbols.
<assign> ::= id := <exp>
procedure ASSIGN
begin
   FOUND := FALSE
   if TOKEN = 22 {id} then
      begin
         advance to next token
         if TOKEN = 15 {:=} then
            begin
               advance to next token
               if EXP returns success then
                  FOUND := TRUE
            end {if :=}
      end {if id}
   if FOUND = TRUE then
      return success
   else
      return failure
end {ASSIGN}

[Figure: parse tree for the assignment]
The grammar for expressions is
<exp> ::= <term> { + <term> | - <term> }

procedure EXP
begin
   FOUND := FALSE
   if TERM returns success then
      begin
         FOUND := TRUE
         while ((TOKEN = 16 {+}) or (TOKEN = 17 {-})) and (FOUND = TRUE) do
            begin
               advance to next token
               if TERM returns failure then
                  FOUND := FALSE
            end {while}
      end {if TERM}
   if FOUND = TRUE then
      return success
   else
      return failure
end {EXP}
[Figure: parse tree for ASSIGN - TERM]
procedure TERM
begin
   FOUND := FALSE
   if FACTOR returns success then
      begin
         FOUND := TRUE
         while ((TOKEN = 18 {*}) or (TOKEN = 19 {DIV})) and (FOUND = TRUE) do
            begin
               advance to next token
               if FACTOR returns failure then
                  FOUND := FALSE
            end {while}
      end {if FACTOR}
   if FOUND = TRUE then
      return success
   else
      return failure
end {TERM}

TERM calls the procedure FACTOR. If it returns success, TERM proceeds. The FACTOR procedure is as follows.
procedure FACTOR
begin
   FOUND := FALSE
   if (TOKEN = 22 {id}) or (TOKEN = 23 {int}) then
      begin
         FOUND := TRUE
         advance to next token
      end {if id or int}
   else
      if TOKEN = 20 { ( } then
         begin
            advance to next token
            if EXP returns success then
               if TOKEN = 21 { ) } then
                  begin
                     FOUND := TRUE
                     advance to next token
                  end {if ) }
         end {if ( }
   if FOUND = TRUE then
      return success
   else
      return failure
end {FACTOR}
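The procedures READ, IDLIST, ASSIGN, EXP, TERM and FACTOR could be collected into one Python class roughly as follows. The integer token codes are the ones used in the procedures above (with 15, 16 and 18 read as :=, + and * from context); the `Parser` class and its `accept` helper are hypothetical conveniences, not part of the slides.

```python
# Token codes as used in the procedures above.
READ, COMMA, BECOMES, PLUS, MINUS = 8, 14, 15, 16, 17
TIMES, DIV, LPAREN, RPAREN, ID, INT = 18, 19, 20, 21, 22, 23

class Parser:
    def __init__(self, codes):
        self.codes = list(codes)           # token codes from the scanner
        self.pos = 0

    @property
    def token(self):
        return self.codes[self.pos] if self.pos < len(self.codes) else None

    def accept(self, code):
        """Advance past the current token if it has the given code."""
        if self.token == code:
            self.pos += 1
            return True
        return False

    def read(self):                        # <read> ::= READ ( <id-list> )
        return (self.accept(READ) and self.accept(LPAREN)
                and self.idlist() and self.accept(RPAREN))

    def idlist(self):                      # <id-list> ::= id { , id }
        if not self.accept(ID):
            return False
        while self.accept(COMMA):
            if not self.accept(ID):
                return False
        return True

    def assign(self):                      # <assign> ::= id := <exp>
        return self.accept(ID) and self.accept(BECOMES) and self.exp()

    def exp(self):                         # <term> { + <term> | - <term> }
        if not self.term():
            return False
        while self.token in (PLUS, MINUS):
            self.pos += 1
            if not self.term():
                return False
        return True

    def term(self):                        # <factor> { * <factor> | DIV <factor> }
        if not self.factor():
            return False
        while self.token in (TIMES, DIV):
            self.pos += 1
            if not self.factor():
                return False
        return True

    def factor(self):                      # id | int | ( <exp> )
        if self.token in (ID, INT):
            self.pos += 1
            return True
        return self.accept(LPAREN) and self.exp() and self.accept(RPAREN)

# READ ( id , id )
print(Parser([READ, LPAREN, ID, COMMA, ID, RPAREN]).read())
# VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
print(Parser([ID, BECOMES, ID, DIV, INT, MINUS, ID, TIMES, ID]).assign())
```

Like the procedures it mirrors, each method returns success or failure and leaves the current position just past the construct it recognized; it does not check that the whole input was consumed.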
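Recursive-Descent Parsing (uses backtracking)
S -> aBc
B -> bc | b
input: abc
[Figure: the parser expands S -> aBc and first tries B -> bc; the match then fails at the final c, so it backtracks and tries B -> b.]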
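A backtracking recursive-descent recognizer for this grammar can be sketched with Python generators. The names `GRAMMAR`, `parse` and `accepts` are hypothetical; the sketch tries alternatives in the order listed and would loop forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],
}

def parse(symbols, text, pos=0):
    """Yield every position at which `symbols` can finish matching text[pos:]."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        for alternative in GRAMMAR[head]:          # try alternatives in order
            for mid in parse(alternative, text, pos):
                # If the rest fails here, the loops move on: that is the backtrack.
                yield from parse(rest, text, mid)
    elif text.startswith(head, pos):               # terminal symbol
        yield from parse(rest, text, pos + len(head))

def accepts(text):
    return any(end == len(text) for end in parse(["S"], text))

print(accepts("abc"))
```

For the input abc, the alternative B -> bc consumes the c that S still needs, so the match fails and the parser falls back to B -> b, which succeeds.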