Title: Compiler Construction Principles
Compiler Construction Principles and Implementation Techniques
- Dr. Ying JIN
- Associate Professor
- Sept. 2007
What we have already introduced
- What is this course about?
- What is a compiler?
- Ways to design and implement a compiler
- General functional components of a compiler
- General translation process of a compiler
What will be introduced
- Scanning
- The first phase in a compiler
- Functional requirement (input, output, functions)
- Data structures
- General techniques in developing a scanner
- Two formal languages
- Regular expression
- Finite automata (DFA, NFA)
- Three algorithms
- From regular expression to NFA
- From NFA to DFA
- Minimizing DFA
- One implementation
- Implementing DFA to get a scanner
Functional components of a Compiler
(Figure: the compiler is divided into an analysis part (front end) and a synthesis part (back end); its components are lexical analysis (scanning), syntax analysis (parsing), semantic analysis, intermediate code generation, intermediate code optimization and target code generation.)
Presentation Time
- What have you read and learned?
Outline
- 2.1 Overview
- 2.1.1 General Function of a Scanner
- 2.1.2 Some Issues about Scanning
- 2.2 Finite Automata
- 2.2.1 Definition and Implementation of DFA
- 2.2.2 Non-Deterministic Finite Automata
- 2.2.3 Transforming NFA into DFA
- 2.2.4 Minimizing DFA
- 2.3 Regular Expressions
- 2.3.1 Definition of Regular Expressions
- 2.3.2 Regular Definition
- 2.3.4 From Regular Expression to DFA
- 2.4 Design and Implementation of a Scanner
- 2.4.1 Developing a Scanner from DFA
- 2.4.2 A Scanner Generator Lex
Knowledge Relation Graph
(Figure: the lexical definition is written using regular expressions; a regular expression is transformed into an NFA, the NFA is transformed into a DFA, the DFA is minimized, and the scanner is developed by implementing the minimized DFA.)
2.1 Overview
- General function of scanning
- Input
- Output
- Functional description
- Some issues about scanning
- Tokens
- Blank/tab, return, newline, comments
- Lexical errors
General Function of Scanning
- Input
- Source program
- Output
- Sequence of words (tokens)
- Functional description (similar to spell checking)
- Read the source program
- Recognize words one by one according to the lexical definition of the source language
- Build the internal representation of the words (tokens)
- Check for lexical errors
- Output the sequence of words
Some Issues about Scanning
Tokens
- Source program
- A stream of characters
- Token
- A sequence of characters that can be treated as a unit in the grammar of a programming language
- Token types
- Identifier: x, y1, ...
- Number: 12, 12.3, ...
- Keywords (reserved words): int, real, main, ...
- Operator: +, -, *, /, >, <, ...
- Delimiter: semicolon (;), comma (,), ...
Tokens
- A token is represented as a pair <token-type, semantic information>
- Semantic information for each token type
- Identifier: the string
- Number: the value
- Keywords (reserved words): the number in the keyword table
- Operator: itself
- Delimiter: itself
Keywords
- Keywords
- Words that have a special meaning
- Cannot be used for other meanings
- Reserved words
- Words that are reserved by a programming language for a special meaning
- Can be used for other meanings, overloading the previous meaning
- Keyword table
- Records all keywords defined by the source programming language
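A keyword table can be as simple as a constant array that is searched linearly. The sketch below is illustrative only. It assumes the ToyL reserved words listed later in this chapter. The name is_keyword is hypothetical and plays the role of the IsKeyword function used in the scanner slides, returning the keyword's position or -1.

```c
#include <string.h>

/* Keyword table: assumed to hold the ToyL reserved words. */
static const char *const keyword_table[] = {
    "var", "if", "then", "else", "while", "read", "write", "int"
};

#define NUM_KEYWORDS (sizeof keyword_table / sizeof keyword_table[0])

/* Returns the position of word in the keyword table, or -1 if it is not a keyword. */
int is_keyword(const char *word) {
    for (size_t i = 0; i < NUM_KEYWORDS; i++) {
        if (strcmp(word, keyword_table[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

For example, is_keyword("while") returns 4 and is_keyword("x") returns -1. A linear search is adequate for eight words. A larger table could be kept sorted and searched with bsearch from <stdlib.h>.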
Sample Source Program
(Figure: a sample source program shown as a raw stream of characters, including blanks and line breaks.)
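Sequence of Tokens (k: keyword, ide: identifier, num: number)
[var, k] [;] [int, k] [x, ide] [,] [y, ide] [;]
[read, k] [(] [x, ide] [)] [;] [read, k]
[(] [y, ide] [)] [;] [if, k] [x, ide] [>]
[y, ide] [then, k] [write, k] [(] [1, num] [)] [;]
[else, k] [write, k] [(] [0, num] [)] [;]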
Blank, tab, newline, comments
- No semantic meaning
- Only for readability
- Can be removed
- Line numbers must still be counted (e.g. for reporting errors)
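Layout characters can be removed while line numbers are still counted, as the small C function below sketches. This is a sketch under assumptions. The comment syntax is taken to be C-style /* ... */, since the chapter does not fix one. The name skip_layout is hypothetical.

```c
#include <ctype.h>

/* Skips blanks, tabs, newlines and comments starting at p and adds the number of
   newlines passed to *line. Comments are assumed to be C-style block comments.
   Returns a pointer to the first character of the next token (or to the final '\0');
   an unterminated comment runs to the end of the input. */
const char *skip_layout(const char *p, int *line) {
    for (;;) {
        if (*p == '\n') {
            (*line)++;
            p++;
        } else if (isspace((unsigned char)*p)) {
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) {
                if (*p == '\n') (*line)++;
                p++;
            }
            if (*p != '\0') p += 2;   /* step over the closing star-slash */
        } else {
            return p;
        }
    }
}
```

Starting from line 1, skipping the input "  \n/* c\n */ x" leaves the pointer at x and the line counter at 3.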
The End of Scanning
- Two possible situations
- Reading the character that marks the end of a program, e.g. '.' in PASCAL
- Reaching the end of the source program file
Lexical Errors
- Only a limited set of errors can be found during scanning
- Illegal characters
- characters that do not belong to the character set of the source language
- A wrong first character
- /abc
- Lexical Error Recovery
- Once a lexical error has been found, the scanner does not stop; it takes some measure to continue the process of scanning
- Ignore the current character and start again from the next character
- Example: if a then x 12.else
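The recovery strategy "ignore the current character and start from the next one" can be sketched as below. The allowed character set is an assumption made for illustration: letters, digits, layout characters and + - * / < > = : ; , ( ). The function check_characters is hypothetical. It only reports illegal characters and does not recognize tokens.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed character set of the source language (illustrative, not a definitive list). */
static int is_allowed(char c) {
    return isalnum((unsigned char)c) || isspace((unsigned char)c) ||
           (c != '\0' && strchr("+-*/<>=:;,()", c) != NULL);
}

/* Reports every illegal character in src and keeps going with the next character.
   Returns the number of lexical errors found. */
int check_characters(const char *src) {
    int errors = 0, line = 1;
    for (const char *p = src; *p != '\0'; p++) {
        if (*p == '\n') line++;
        if (!is_allowed(*p)) {
            fprintf(stderr, "line %d: illegal character '%c' ignored\n", line, *p);
            errors++;   /* ignore it; the loop resumes at the next character */
        }
    }
    return errors;
}
```

For example, check_characters("x := 12 $ y ? 3") reports '$' and '?' and returns 2. The rest of the input is still examined after each error.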
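Scanner
- Attached scanner: syntax analysis calls the scanner, which reads from the character list and returns one token per call
- Independent scanner: the scanner turns the whole character list into a token list, which syntax analysis then reads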
To Develop a Scanner
- Now we know the function of a scanner
- How do we implement a scanner?
- Basis: the lexical rules of the source programming language
- The set of allowed characters
- What kinds of tokens it has
- The structure of each token type
- The scanner will be developed according to the lexical rules
How to define the lexical structure of a programming language?
- Natural language (English, Chinese, ...)
- - easy to write, but ambiguous and hard to implement
- Formal languages
- - need some special background knowledge
- - concise and easy to implement
- - allow automation
- Two formal languages for defining the lexical structure of a programming language
- Finite automata
- Non-deterministic finite automata (NFA)
- Deterministic finite automata (DFA)
- Regular expressions (RE)
- Both can formally describe lexical structure
- The set of allowed words
- They are equivalent
- FA is easy to implement; RE is easy to use
2.2 Finite Automata
- Definition of DFA
- Implementation of DFA
- Non-Deterministic Finite Automata
- Transforming NFA into DFA
- Minimizing DFA
Definition of DFA
- - formal definition
- - two ways of representations
- - examples
- - some concepts
Formal Definition of DFA
- A DFA is a 5-tuple $(\Sigma, SS, S_0, \delta, TS)$
- $\Sigma$ (alphabet): the set of allowed characters; each character is called an input symbol
- $SS = \{S_0, S_1, S_2, \ldots\}$: a finite set whose elements are called states
- $S_0 \in SS$: the start state
- $\delta: SS \times \Sigma \to SS \cup \{\emptyset\}$: the transforming function
- $TS \subseteq SS$: the set of terminal (accept) states
- Note: $\delta$ is a function that takes a state and a symbol and returns either one unique state or $\emptyset$ (undefined)
One DFA defines a set of strings, each of which is a sequence of characters in $\Sigma$. The start state gives the starting point for generating strings, the terminal states give the end points, and the transforming function gives the rules for generating strings.
Features of a DFA
- One start state
- For a given state and symbol, at most one edge
- Functions of a DFA
- It defines a set of strings
- It can be used to define the lexical structure of a programming language
Two Ways of Representation
- Table
- Convenient for implementation
- Graph
- Easy to read and understand
Two Ways of Representation (Table)
- Transforming table
- Start state: S0
- Terminal states
- Header row: characters
- Header column: states
- Cell: a state or ∅
Example of Transforming Table
- Σ = {a, b, c, d}
- SS = {S0, S1, S2, S3}
- Start state: S0
- Set of terminal states: {S3}
- δ(S0, a) = S1, δ(S0, c) = S2,
- δ(S0, d) = S3, δ(S1, b) = S1,
- δ(S1, d) = S2, δ(S2, a) = S3,
- δ(S3, c) = S3

      a    b    c    d
S0    S1   ∅    S2   S3
S1    ∅    S1   ∅    S2
S2    S3   ∅    ∅    ∅
S3    ∅    ∅    S3   ∅
Two Ways of Representation (Graph)
- Graph
- start state
- terminal state
- state
- edge
Example of Graphical DFA
- The DFA of the transforming-table example (Σ = {a, b, c, d}, start state S0, terminal state S3), drawn as a graph with one edge per defined transition
(Figure: transition graph over the states S0, S1, S2 and S3.)
Some Concepts
- String acceptable by a DFA
- Let A be a DFA and $a_1 a_2 \ldots a_n$ a string. If there exists a sequence of states $(S_0, S_1, \ldots, S_n)$ satisfying $\delta(S_{i-1}, a_i) = S_i$ for $i = 1, \ldots, n$, where $S_0$ is the start state and $S_n$ is one of the accept states, then the string $a_1 a_2 \ldots a_n$ is acceptable by the DFA A.
- Set of strings defined by a DFA
- The set of all the strings that are acceptable by a DFA A is called the set of strings defined by A, denoted $L(A)$
Some Concepts
- Special case
- If a DFA consists of a single state, which is both the start state and the accept state, and has no edges, the set of strings defined by the DFA is $\{\varepsilon\}$, the set containing only the empty string (not the empty set $\emptyset$).
Relating DFA to the Lexical Structure of a Programming Language
- A DFA can define the lexical structure of one word type in a programming language
- e.g. unsigned real numbers
- A DFA can be defined for the lexical structure of all the words in a programming language
- The set of strings defined by the DFA is the set of allowed words in the programming language
- The implementation of the DFA can be used as a scanner for the programming language
Assignment
- For the standard C programming language
- Find the token types and their lexical structures
- Write a DFA for each token type
Implementation of DFA
- Objective (what implementing a DFA means)
- Given a DFA that defines the rules for a set of strings
- Develop a program that
- Reads a string
- Checks whether this string is accepted by the DFA
- Two ways
- Based on the transforming table of the DFA
- Based on the graphical representation of the DFA
Transforming-Table-Based Implementation
- Main idea
- Input: a string
- Output: true if the string is acceptable, otherwise false
- Data structure
- Transforming table (two-dimensional array T)
- Two variables
- CurrentState: records the current state
- CurrentChar: records the current character read from the string
Transforming-Table-Based Implementation
- Main idea
- General Algorithm
- 1. CurrentState := S0
- 2. Read the first character of the string as CurrentChar
- 3. If CurrentChar is not the end of the string and T(CurrentState, CurrentChar) ≠ error:
- CurrentState := T(CurrentState, CurrentChar)
- read the next character of the string as CurrentChar
- go to 3
- 4. If CurrentChar is the end of the string and CurrentState is one of the terminal states, return true; otherwise, return false.
Example
      a    b    c    d
S0    S1   ∅    S2   S3
S1    ∅    S1   ∅    S2
S2    S3   ∅    ∅    ∅
S3    ∅    ∅    S3   ∅
1) abbacc true
2) cab false
Transforming Table for the DFA (program structure)
- The transforming table for the DFA
- Variables CurrentChar and CurrentState
- Read the string to be checked
- The checking process
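A table-driven version of the general algorithm might look like this in C. It is a sketch with several assumptions. It uses the example DFA with its last state read as S3. Characters a to d map to table columns 0 to 3, and NONE (-1) stands for an undefined (∅) entry. The name dfa_accepts is hypothetical.

```c
#include <stdbool.h>

enum { NSTATES = 4, NSYMBOLS = 4, NONE = -1 };

/* Transforming table T: rows are states S0..S3, columns are symbols a..d. */
static const int T[NSTATES][NSYMBOLS] = {
    /*            a     b     c     d  */
    /* S0 */ {    1, NONE,    2,    3 },
    /* S1 */ { NONE,    1, NONE,    2 },
    /* S2 */ {    3, NONE, NONE, NONE },
    /* S3 */ { NONE, NONE,    3, NONE },
};
static const bool terminal[NSTATES] = { false, false, false, true };

/* Returns true if the DFA accepts str (steps 1-4 of the general algorithm). */
bool dfa_accepts(const char *str) {
    int current_state = 0;                        /* 1. CurrentState := S0 */
    for (const char *p = str; *p != '\0'; p++) {  /* 2./3. read characters one by one */
        char current_char = *p;
        if (current_char < 'a' || current_char > 'd')
            return false;                         /* symbol outside the alphabet */
        int next = T[current_state][current_char - 'a'];
        if (next == NONE)
            return false;                         /* undefined entry: reject */
        current_state = next;
    }
    return terminal[current_state];               /* 4. end of the string reached */
}
```

With this table, dfa_accepts("abdacc") and dfa_accepts("ca") return true. dfa_accepts("cab") returns false because T(S3, b) is undefined. Keeping the DFA in a table means a different DFA needs only a different table, not new code.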
Graph-Based Implementation of DFA
- Each state corresponds to a case statement
- Each edge corresponds to a goto statement
- For an accept state, add one more branch: if the current character is the end of the string, then accept

Li: case CurrentChar of
      a: goto Lj
      b: goto Lk
      other: Error()

Li (accept state): case CurrentChar of
      a: goto Lj
      b: goto Lk
      end of string: return true
      other: Error()
LS0: read character to CurrentChar
     case CurrentChar of
       a: goto LS1
       c: goto LS2
       d: goto LS3
       other: return false
LS1: read character to CurrentChar
     case CurrentChar of
       b: goto LS1
       d: goto LS2
       other: return false
LS2: read character to CurrentChar
     case CurrentChar of
       a: goto LS3
       other: return false
LS3: read character to CurrentChar
     case CurrentChar of
       c: goto LS3
       end of string: return true
       other: return false
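The graph-based scheme maps directly onto C labels and goto statements. This sketch uses the same example DFA as the transforming table, in particular the assumption that the c-edge of S3 leads back to S3 as in the table. The name dfa_accepts_goto is hypothetical, and the end of the string is its terminating '\0'.

```c
#include <stdbool.h>

/* Graph-based implementation: one label per state, one goto per edge. */
bool dfa_accepts_goto(const char *str) {
    const char *p = str;
    char current_char;

LS0:
    current_char = *p++;
    switch (current_char) {
    case 'a': goto LS1;
    case 'c': goto LS2;
    case 'd': goto LS3;
    default:  return false;          /* also covers the end of the string */
    }
LS1:
    current_char = *p++;
    switch (current_char) {
    case 'b': goto LS1;
    case 'd': goto LS2;
    default:  return false;
    }
LS2:
    current_char = *p++;
    switch (current_char) {
    case 'a': goto LS3;
    default:  return false;
    }
LS3:                                  /* accept state: extra branch for the end of the string */
    current_char = *p++;
    switch (current_char) {
    case 'c':  goto LS3;
    case '\0': return true;
    default:   return false;
    }
}
```

This version accepts exactly the strings that the table-driven version accepts. The trade-off is that the DFA is hard-wired into the control flow. That is fast, but the code must be rewritten whenever the DFA changes.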
Definition of NFA
Formal Definition
- An NFA is a 5-tuple $(\Sigma, SS, SS_0, \delta, TS)$
- $\Sigma$ (alphabet): the set of allowed characters; each character is called an input symbol
- $SS = \{S_0, S_1, S_2, \ldots\}$: a finite set whose elements are called states
- $SS_0 \subseteq SS$: the set of start states
- $\delta: SS \times (\Sigma \cup \{\varepsilon\}) \to 2^{SS}$ (the power set of SS): the transforming function
- $TS \subseteq SS$: the set of terminal (accept) states
- Note: $\delta$ is a function that takes a state and a symbol (or $\varepsilon$) and returns a set of states, or $\emptyset$ (undefined)
Differences between DFA and NFA
                  DFA                    NFA
Start state       one start state        a set of start states
ε-edges           not allowed            allowed
T(S, a)           one state S or ∅       {S1, ..., Sn} or ∅
Implementation    easy                   non-deterministic
Example of NFA
- Σ = {a, b, c, d}
- SS = {S0, S10, S2, S3}
- Set of start states: {S0, S10}
- Set of terminal states: {S3}
- δ(S0, a) = {S10, S3}, δ(S0, ε) = {S2},
- δ(S10, b) = {S10}, δ(S10, ε) = {S2},
- δ(S2, ε) = {S3},
- δ(S3, c) = {S3}
(Figure: the NFA drawn as a graph.)
From NFA to DFA
Main Idea
- Solve two problems
- ε-edges
- ε-closure(SS)
- Several edges with the same symbol leaving one state
- Merge those edges: NextStates(SS, a)
- Conversion of NFA to DFA
- Use a set of NFA states as one DFA state
- Ensure that the DFA accepts the same set of strings as the NFA
ε-closure
- For a given NFA A and a set of states SS:
- Initially, ε-closure(SS) = SS
- If a state s in ε-closure(SS) has an ε-edge to a state s' that is not in ε-closure(SS), add s' to ε-closure(SS)
- Repeat until no state in ε-closure(SS) has an ε-edge to a state outside ε-closure(SS)
ε-closure -- Example
ε-closure({S0, S10}):
{S0, S10} -> {S0, S10, S2} (ε-edge of S0) -> {S0, S10, S2} (the ε-edge of S10 also leads to S2) -> {S0, S10, S2, S3} (ε-edge of S2)
Moving States
- For a given set of states SS and a symbol a in an NFA A:
- $NextStates(SS, a) = \{ s \mid \text{there is a state } s_1 \in SS \text{ and an edge } s_1 \xrightarrow{a} s \text{ in } A \}$
Moving States
NextStates({S0, S10}, a) = {S10, S3}
NextStates({S0, S10}, b) = {S10}
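Both operations are easy to compute when a set of NFA states is stored as a bit mask. The sketch below encodes the example NFA under two assumptions: its last state is read as S3, and S0, S10, S2 and S3 are numbered as bits 0 to 3. The names eps_closure and next_states are hypothetical stand-ins for ε-closure and NextStates.

```c
/* NFA states of the example, used as bit positions: S0 = bit 0, ..., S3 = bit 3. */
enum { S0, S10, S2, S3, NSTATES };
#define BIT(s) (1u << (s))

/* eps[s]: the states reached from s by one epsilon-edge. */
static const unsigned eps[NSTATES] = { [S0] = BIT(S2), [S10] = BIT(S2), [S2] = BIT(S3) };

/* move_on[s][i]: the states reached from s by one edge labelled 'a' + i. */
static const unsigned move_on[NSTATES][4] = {
    [S0]  = { [0] = BIT(S10) | BIT(S3) },   /* S0 -a-> S10, S3 */
    [S10] = { [1] = BIT(S10) },             /* S10 -b-> S10 */
    [S3]  = { [2] = BIT(S3) },              /* S3 -c-> S3 */
};

/* epsilon-closure(set): keep adding epsilon-successors until nothing new appears. */
unsigned eps_closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NSTATES; s++)
            if (set & BIT(s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* NextStates(set, a) for a symbol a in 'a'..'d'. */
unsigned next_states(unsigned set, char a) {
    unsigned result = 0;
    for (int s = 0; s < NSTATES; s++)
        if (set & BIT(s))
            result |= move_on[s][a - 'a'];
    return result;
}
```

eps_closure applied to {S0, S10} yields all four states, and next_states({S0, S10}, 'a') yields {S10, S3}.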
Algorithm
- Given an NFA $A = (\Sigma, SS, SS_0, \delta, TS)$
- Generate an equivalent DFA $A' = (\Sigma, SS', S_0', \delta', TS')$
- Steps
- (1) $S_0' = \varepsilon\text{-closure}(SS_0)$; add $S_0'$ to $SS'$
- (2) Select an unhandled state s from $SS'$; for each symbol $a \in \Sigma$:
- let $s' = \varepsilon\text{-closure}(NextStates(s, a))$; if $s'$ is empty, no edge is added
- add $\delta'(s, a) = s'$ to $\delta'$
- if $s' \notin SS'$, add $s'$ to $SS'$
- (3) Repeat (2) until all states in $SS'$ are handled
- (4) For a state $s = \{S_1, \ldots, S_n\}$ in $SS'$, if some $S_i \in TS$, then s is an accept state of $A'$; add s to $TS'$
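The subset construction can be sketched with the same bit-mask representation. The sketch makes several assumptions. It encodes the example NFA with its last state read as S3 and stores NFA state sets as bit masks. DFA states are numbered in the order they are discovered, and -1 marks a missing edge. The function subset_construction and the arrays dstate, dtran and daccept are hypothetical names.

```c
#include <stdbool.h>

enum { S0, S10, S2, S3, NSTATES };         /* NFA states as bit positions 0..3 */
enum { NSYMS = 4, MAXD = 16 };             /* symbols a..d; at most 2^4 state sets */
#define BIT(s) (1u << (s))

static const unsigned eps[NSTATES] = { [S0] = BIT(S2), [S10] = BIT(S2), [S2] = BIT(S3) };
static const unsigned move_on[NSTATES][NSYMS] = {
    [S0]  = { [0] = BIT(S10) | BIT(S3) },  /* S0 -a-> S10, S3 */
    [S10] = { [1] = BIT(S10) },            /* S10 -b-> S10 */
    [S3]  = { [2] = BIT(S3) },             /* S3 -c-> S3 */
};
static const unsigned nfa_start = BIT(S0) | BIT(S10);
static const unsigned nfa_terminal = BIT(S3);

static unsigned eps_closure(unsigned set) {
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NSTATES; s++)
            if (set & BIT(s)) set |= eps[s];
    } while (set != prev);
    return set;
}

unsigned dstate[MAXD];      /* DFA state i is the set of NFA states dstate[i] */
int dtran[MAXD][NSYMS];     /* DFA edges; -1 means no edge */
bool daccept[MAXD];

/* Index of set among the DFA states found so far, adding it if it is new. */
static int find_or_add(unsigned set, int *count) {
    for (int i = 0; i < *count; i++)
        if (dstate[i] == set) return i;
    dstate[*count] = set;
    return (*count)++;
}

/* Builds the DFA and returns its number of states; DFA state 0 is the start state. */
int subset_construction(void) {
    int count = 0;
    find_or_add(eps_closure(nfa_start), &count);             /* step (1) */
    for (int i = 0; i < count; i++) {                        /* steps (2)-(3) */
        for (int a = 0; a < NSYMS; a++) {
            unsigned next = 0;                                /* NextStates(dstate[i], a) */
            for (int s = 0; s < NSTATES; s++)
                if (dstate[i] & BIT(s)) next |= move_on[s][a];
            next = eps_closure(next);
            dtran[i][a] = next != 0 ? find_or_add(next, &count) : -1;
        }
        daccept[i] = (dstate[i] & nfa_terminal) != 0;        /* step (4) */
    }
    return count;
}
```

For the example NFA it finds three DFA states, {S0, S10, S2, S3}, {S10, S2, S3} and {S3}, and all of them are accepting. The loop over i doubles as the worklist: newly found states are appended to dstate and handled later in the same loop.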
Example
S0' = ε-closure({S0, S10}) = {S0, S10, S2, S3}

                       a                b                c
{S0, S10, S2, S3}      {S10, S3, S2}    {S10, S3, S2}    {S3}
{S10, S3, S2}          ∅                {S10, S3, S2}    {S3}
{S3}                   ∅                ∅                {S3}

Every DFA state contains S3, so all three are accept states.
Minimizing DFA
Problem
- Equivalence of two DFAs
- Two DFAs are equivalent if they accept the same set of strings
- Among the DFAs that accept the same set of strings, the minimal DFA is the one with the fewest states
How does this happen?
Equivalent DFAs
(Figure: two DFAs, drawn with states such as S1 and S2, that accept the same set of strings.)
There are states that accept the same set of strings!
Main Idea
- Equivalent states
- For two states s1 and s2 in a DFA, if s1 and s2 accept the same set of strings when each is treated as the start state, then s1 and s2 are called equivalent states
- Two ways of minimizing a DFA
- Merging equivalent states
- Splitting non-equivalent states
Algorithm
- Given a DFA $A = (\Sigma, SS, S_0, \delta, TS)$
- Generate an equivalent DFA $A' = (\Sigma, SS', S_0', \delta', TS')$
- Splitting steps
- (1) Start with two groups: the non-terminal states and the terminal states
- (2) Select one group of states $SS_i = \{S_{i1}, \ldots, S_{in}\}$ and replace $SS_i$ with split($SS_i$)
- (3) Repeat (2) until no group can be split any further
- (4) $SS'$ = the set of groups
- (5) $S_0'$ is the group containing $S_0$
- (6) If a group consists of terminal states of A, it is a terminal state of $A'$
- (7) $\delta'(SS_i, a) = SS_j$ if there is an edge $S_i \xrightarrow{a} S_j$ in A with $S_i \in SS_i$ and $S_j \in SS_j$
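Splitting a Set of States
- Given
- a DFA $A = (\Sigma, SS, S_0, \delta, TS)$
- groups of states $SS_1, \ldots, SS_m$ with $SS_1 \cup \cdots \cup SS_m = SS$
- $SS_i = \{S_{i1}, \ldots, S_{in}\}$
- split($SS_i$) splits $SS_i$ into two groups $G_1$ and $G_2$:
- For j = 1 to n
- if, for every $a \in \Sigma$, $\delta(S_{i1}, a) = S_k$ and $\delta(S_{ij}, a) = S_l$ where $S_k$ and $S_l$ belong to the same group $SS_p$ (or both are undefined), add $S_{ij}$ to $G_1$
- otherwise, add $S_{ij}$ to $G_2$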
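Partition refinement can be sketched in C as below. This is a Moore-style variant rather than the exact procedure on the slide. Instead of splitting one group into G1 and G2 at a time, each pass splits every group at once by comparing each state's current group and the groups of its successors. It stops when a pass creates no new group, which should give the same final groups. The DFA is a hypothetical example in which S1 and S2 are equivalent, and an undefined edge (-1) is treated as its own group. The functions minimize, same_group and succ_group are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

enum { NSTATES = 4, NSYMS = 2 };      /* states S0..S3, symbols a and b */

/* Hypothetical DFA in which S1 and S2 are equivalent; -1 marks an undefined edge. */
static const int delta[NSTATES][NSYMS] = {
    /*        a   b */
    /* S0 */ { 1,  2 },
    /* S1 */ { 3, -1 },
    /* S2 */ { 3, -1 },
    /* S3 */ {-1, -1 },
};
static const bool terminal[NSTATES] = { false, false, false, true };

/* Group of the successor of s on symbol a, or -1 if the edge is undefined. */
static int succ_group(const int group[], int s, int a) {
    return delta[s][a] < 0 ? -1 : group[delta[s][a]];
}

/* s and t stay together only if they share a group now and, for every symbol,
   their successors share a group (or both are undefined). */
static bool same_group(const int group[], int s, int t) {
    if (group[s] != group[t]) return false;
    for (int a = 0; a < NSYMS; a++)
        if (succ_group(group, s, a) != succ_group(group, t, a)) return false;
    return true;
}

/* Fills group[] with the group number of each state; returns the number of groups. */
int minimize(int group[NSTATES]) {
    for (int s = 0; s < NSTATES; s++)
        group[s] = terminal[s] ? 1 : 0;          /* step (1): non-terminal / terminal */
    int prev = -1;
    for (;;) {
        int next[NSTATES], n = 0;
        for (int s = 0; s < NSTATES; s++) {      /* split every group in one pass */
            next[s] = -1;
            for (int t = 0; t < s && next[s] < 0; t++)
                if (same_group(group, s, t)) next[s] = next[t];
            if (next[s] < 0) next[s] = n++;
        }
        memcpy(group, next, sizeof next);
        if (n == prev) return n;                 /* no group was split: done */
        prev = n;
    }
}
```

For this DFA, minimize returns 3. S1 and S2 end up in the same group, so the minimized DFA has the states {S0}, {S1, S2} and {S3}.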
Simple Example
(Figure: the states {S0, S1, S2, S3, S4} of a DFA are split step by step into groups.)
2.3 Regular Expressions
- Definition of Regular Expressions
- Regular Definition
- From Regular Expression to DFA
Definition of Regular Expressions (RE)
- - Some Concepts
- - Formal Definition of RE
- - Example
- - Properties of RE
- - Extensions to RE
- - Limitations of RE
- - Using RE to define Lexical Structure
Some Concepts
- Alphabet: a non-empty finite set of symbols, denoted $\Sigma$; each of its elements is called a symbol
- String: a finite sequence of symbols; we use $\varepsilon$ to represent the empty string
- $\{\varepsilon\}$ is different from the empty set $\emptyset$
- Length of a string: the number of symbols in the string; we use $|\alpha|$ to represent the length of the string $\alpha$
- Concatenation of strings
- If $\alpha$ and $\beta$ are strings, we use $\alpha\beta$ for the concatenation of the two strings; in particular, $\alpha\varepsilon = \varepsilon\alpha = \alpha$
Some Operators on Sets of Strings
- Product of sets of strings
- If A and B are two sets of strings, AB is called the product of the two sets: $AB = \{\alpha\beta \mid \alpha \in A, \beta \in B\}$
- In particular, $\{\varepsilon\}A = A\{\varepsilon\} = A$ and $\emptyset A = A\emptyset = \emptyset$, where $\emptyset$ represents the empty set
- Power of a set of strings
- If A is a set of strings, $A^i$ is called the i-th power of A, where i is a non-negative integer:
- $A^0 = \{\varepsilon\}$
- $A^1 = A$, $A^2 = AA$
- $A^k = AA \cdots A$ (k times)
- Positive closure: $A^+ = A^1 \cup A^2 \cup A^3 \cup \cdots$
- Star closure: $A^* = A^0 \cup A^1 \cup A^2 \cup A^3 \cup \cdots$
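As a small worked example of these operators, take the hypothetical sets A = {a, b} and B = {c}:

```latex
\begin{aligned}
AB &= \{ac,\ bc\} \\
A^0 &= \{\varepsilon\}, \quad A^1 = \{a,\ b\}, \quad A^2 = AA = \{aa,\ ab,\ ba,\ bb\} \\
A^{+} &= \{a,\ b,\ aa,\ ab,\ ba,\ bb,\ aaa,\ \ldots\} \\
A^{*} &= \{\varepsilon\} \cup A^{+} = \{\varepsilon,\ a,\ b,\ aa,\ ab,\ \ldots\}
\end{aligned}
```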
Formal Definition
- For a given alphabet $\Sigma$, a regular expression over $\Sigma$ defines a set of strings over $\Sigma$
- We use $R_\Sigma$ to represent a regular expression over $\Sigma$, and $L(R_\Sigma)$ to represent the set of strings that $R_\Sigma$ defines
Formal Definition
- $\varepsilon$ is a regular expression, $L(\varepsilon) = \{\varepsilon\}$
- $\emptyset$ is a regular expression, $L(\emptyset) = \emptyset$
- For any $c \in \Sigma$, c is a regular expression, $L(c) = \{c\}$
- If A and B are regular expressions, the following operators can be used
- Parentheses: (A), $L((A)) = L(A)$
- Choice among alternatives: $A \mid B$, $L(A \mid B) = L(A) \cup L(B)$
- Concatenation: $AB$, $L(AB) = L(A)L(B)$
- Repetition: $A^*$, $L(A^*) = L(A)^*$
Example
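A few worked examples of the definition, over the hypothetical alphabet Σ = {a, b, c}:

```latex
\begin{aligned}
L(a \mid b) &= L(a) \cup L(b) = \{a,\ b\} \\
L((a \mid b)\,c) &= L(a \mid b)\,L(c) = \{ac,\ bc\} \\
L(a^*) &= L(a)^* = \{\varepsilon,\ a,\ aa,\ aaa,\ \ldots\} \\
L((a \mid b)^*\,c) &= \{c,\ ac,\ bc,\ aac,\ abc,\ bac,\ bbc,\ \ldots\}
\end{aligned}
```

The last set contains every string of a's and b's (possibly empty) followed by a single c.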
Comparing with DFA
- Equivalent in describing sets of strings
- Can be converted into each other
- DFA is convenient for implementation
- RE is convenient for defining and understanding
- Both can be used to define the lexical structure of programming languages
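Properties
- $A \mid B = B \mid A$ (commutativity of |)
- $A \mid (B \mid C) = (A \mid B) \mid C$ (associativity of |)
- $A(BC) = (AB)C$ (associativity of concatenation)
- $A(B \mid C) = AB \mid AC$ (distributivity)
- $(A \mid B)C = AC \mid BC$ (distributivity)
- $(A^*)^* = A^*$ (idempotence of *)
- $\varepsilon A = A\varepsilon = A$ ($\varepsilon$ is the identity of concatenation)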
Extensions
- Some extensions can be made to facilitate definitions
- One or more repetitions: A+
- Any symbol: .
- Range: [0-9], [a-z], [A-Z]
- Not in the range: [^abc]
- Optional: r? = (ε | r)
Limitations
- RE cannot define structures such as
- Pairing, e.g. matching ( and )
- Nesting of such pairs
- RE cannot describe structures in which a substring must be repeated exactly, because a finite automaton can remember only a bounded amount of its input
- For example, {wcw | w is a string of a's and b's}
- (a|b)*c(a|b)* cannot be used, because it cannot guarantee that the strings on both sides of c are always the same
Regular Definition
Definition
- It is inconvenient to define sets of long strings with RE alone, so another formal notation, called regular definition, is introduced
- The main idea is to give names to some sub-expressions of an RE
- Example
- (1|2|...|9)(0|1|2|...|9)*
- NZ_digit = 1|2|...|9
- digit = NZ_digit | 0
- NZ_digit digit*
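The regular definition above translates naturally into named pieces of a POSIX extended regular expression, as sketched below. The sketch relies on the POSIX <regex.h> interface (regcomp, regexec, regfree). That interface is available on POSIX systems but is not part of ISO C. The macro names and matches_nz_number are hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Named sub-expressions, mirroring the regular definition. */
#define NZ_DIGIT "[1-9]"               /* NZ_digit = 1|2|...|9 */
#define DIGIT    "(" NZ_DIGIT "|0)"    /* digit = NZ_digit | 0 */

/* True if the whole of text matches NZ_digit digit*. */
bool matches_nz_number(const char *text) {
    regex_t re;
    /* ^ and $ anchor the pattern, so the entire string must match. */
    if (regcomp(&re, "^" NZ_DIGIT DIGIT "*$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;                  /* the pattern did not compile */
    bool matched = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

matches_nz_number("120") is true, while "012", "0" and "12a" are rejected. Compiling the pattern on every call keeps the sketch short; a real scanner would compile it once.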
Defining Lexical Structure of ToyL
- letter = a|...|z|A|...|Z
- digit = 0|...|9
- NZ-digit = 1|...|9
- Reserved words
- Reserved = var | if | then | else | while | read | write | int
- Identifiers
- identifier = letter (letter | digit)*
- Constant
- integer: int = NZ-digit digit* | 0
- Other symbols
- syms = + | - | * | / | > | < | ( | ) | ...
- Lexical structure
- lex = Reserved | identifier | int | syms
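From RE to NFA
Rules
- $\varepsilon$ is a regular expression, $L(\varepsilon) = \{\varepsilon\}$: a start state and a terminal state joined by an ε-edge
- $\emptyset$ is a regular expression, $L(\emptyset) = \emptyset$: a start state and a terminal state with no edge
- For any $c \in \Sigma$, c is a regular expression, $L(c) = \{c\}$: a start state and a terminal state joined by a c-edge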
Rules
- (A): $L((A)) = L(A)$, no change to NFA(A)
- AB: $L(AB) = L(A)L(B)$; the terminal state of NFA(A) is connected to the start state of NFA(B) by an ε-edge
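Rules
- (A): $L((A)) = L(A)$, no change to NFA(A)
- $A \mid B$: $L(A \mid B) = L(A) \cup L(B)$; a new start state has ε-edges to the start states of NFA(A) and NFA(B), and the terminal states of NFA(A) and NFA(B) have ε-edges to a new terminal state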
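Rules
- $A^*$: $L(A^*) = L(A)^*$; a new start state and a new terminal state are added, with ε-edges from the new start state to the start state of NFA(A) and to the new terminal state, and from the terminal state of NFA(A) back to its start state and to the new terminal state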
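Attention
- The rules introduced above apply to NFAs that have one start state and one terminal state
- Any NFA can be extended to meet this requirement: add a new start state with ε-edges to the original start states, and a new terminal state reached by ε-edges from the original terminal states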
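These rules are commonly known as Thompson's construction, and they can be sketched in C. Each rule returns a fragment with exactly one start state and one terminal state, which is the requirement stated above. The resulting NFA is then simulated with ε-closures instead of being converted to a DFA. Several parts are assumptions. The names Frag, sym, cat, alt, star, nfa_accepts and example_re are hypothetical, and the edge label 0 is taken to mark an ε-edge. The example expression (a|b)*b was chosen for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAXS 64          /* assumed large enough for the expressions built here */
#define EPS 0            /* edge label 0 marks an epsilon-edge */

/* Each state built by these rules has at most two outgoing edges. */
typedef struct { int label[2]; int to[2]; int n; } State;
static State st[MAXS];
static int nstates = 0;

/* An NFA fragment with one start state and one terminal state. */
typedef struct { int start, accept; } Frag;

static int new_state(void) { st[nstates].n = 0; return nstates++; }

static void edge(int from, int label, int to) {
    st[from].label[st[from].n] = label;
    st[from].to[st[from].n] = to;
    st[from].n++;
}

/* c: start --c--> terminal */
Frag sym(int c) {
    int s = new_state(), t = new_state();
    edge(s, c, t);
    return (Frag){s, t};
}

/* AB: epsilon-edge from A's terminal state to B's start state */
Frag cat(Frag a, Frag b) {
    edge(a.accept, EPS, b.start);
    return (Frag){a.start, b.accept};
}

/* A|B: new start and terminal states joined to both fragments by epsilon-edges */
Frag alt(Frag a, Frag b) {
    int s = new_state(), t = new_state();
    edge(s, EPS, a.start);
    edge(s, EPS, b.start);
    edge(a.accept, EPS, t);
    edge(b.accept, EPS, t);
    return (Frag){s, t};
}

/* A*: skip A entirely, or loop from A's terminal state back to its start */
Frag star(Frag a) {
    int s = new_state(), t = new_state();
    edge(s, EPS, a.start);
    edge(s, EPS, t);
    edge(a.accept, EPS, a.start);
    edge(a.accept, EPS, t);
    return (Frag){s, t};
}

/* Adds s and every state reachable from it by epsilon-edges to set. */
static void closure(bool set[], int s) {
    if (set[s]) return;
    set[s] = true;
    for (int i = 0; i < st[s].n; i++)
        if (st[s].label[i] == EPS) closure(set, st[s].to[i]);
}

/* Simulates the NFA fragment f on the string w. */
bool nfa_accepts(Frag f, const char *w) {
    bool cur[MAXS] = {false};
    closure(cur, f.start);
    for (; *w != '\0'; w++) {
        bool next[MAXS] = {false};
        for (int s = 0; s < nstates; s++)
            if (cur[s])
                for (int i = 0; i < st[s].n; i++)
                    if (st[s].label[i] == (unsigned char)*w) closure(next, st[s].to[i]);
        memcpy(cur, next, sizeof cur);
    }
    return cur[f.accept];
}

/* The example expression (a|b)*b. */
Frag example_re(void) {
    return cat(star(alt(sym('a'), sym('b'))), sym('b'));
}
```

For (a|b)*b, nfa_accepts accepts "ab", "b" and "abbb" but rejects "ba" and the empty string. Every state built by these rules has at most two outgoing edges, so two edge slots per state are enough.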
Example
(Figure: an NFA over the symbols a and b, built step by step with the rules above.)
2.4 Design and Implementation of a Scanner
- Developing a Scanner Manually
- A Scanner Generator Lex
Developing a Scanner Manually
(Figure: lexical definition in regular expressions -> (transforming) NFA -> (transforming) DFA -> (minimizing) minimized DFA -> (implement) scanner.)
Implement Scanner with DFA
- Implementation of a DFA
- Only checks whether a string is acceptable by the DFA
- Implementation of a scanner
- Does not merely check
- Recognizes an acceptable string (word) and establishes its internal representation <token-type, semantic information>
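DFA for ToyL
(Figure: a DFA with the states start, IntNum, ID, Assign and done. From start, an NZ-digit leads to IntNum, which loops on digit and moves to done on any other character; a letter leads to ID, which loops on letter and digit and moves to done on any other character; the assignment operator is recognized through Assign; each of +, -, *, /, >, <, (, ), ... leads directly to done.)
- At start, "other" refers to symbols that are not allowed!
- Reserved (key) words are decided by checking each identifier against the reserved (key) word table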
Developing a Scanner from DFA
- Input: a sequence of symbols, with a special symbol EOF marking the end of the sequence
- Output: a sequence of tokens
Developing a Scanner from DFA
Token Type
typedef enum {
    IDE, NUM, ASS,            // identifier, number, assignment
    PLUS, MINUS, MUL,         // +, -, *
    DIV, GT, LT, EQ,          // /, >, <, =
    SEMI, LPAREN, RPAREN,     // ;, (, )
    LG, RG, COLON,            // further delimiters, :
    VAR, THEN, ELSE,          // keywords
    INT, WHILE, READ,
    WRITE, IF
} TkType;
Developing a Scanner from DFA
Data Structure for TOKEN
struct Token {
    TkType type;
    char val[50];
};
Developing a Scanner from DFA
Global Variables
char str[50];                  // stores the characters of the current token read so far
int len = 0;                   // the length of str
struct Token tk;               // current token
struct Token TokenList[100];   // the sequence of tokens
int total = 0;                 // the number of tokens generated
Developing a Scanner from DFA
Predefined Functions
- ReadNext(): reads the current symbol into CurrentChar; if the current symbol is EOF, it returns false, otherwise it returns true
- IsKeyword(str): checks whether str is one of the keywords; if str is a keyword, it returns the number of the keyword, otherwise it returns -1
Developing a Scanner from DFA
        if (not ReadNext()) exit;
start:  case CurrentChar of
          '1'..'9':           str[len] = CurrentChar; len++; goto IntNum;
          'a'..'z', 'A'..'Z': str[len] = CurrentChar; len++; goto ID;
          ':':                goto Assign;
          '+':                tk.type = PLUS; if (not ReadNext()) exit; goto done;
          '-', '*', '/', '>', '<', '(', ')', ...: handled like '+'
          other:              error();
Developing a Scanner from DFA
IntNum: if (not ReadNext()) { if (len != 0) error(); exit; }
        case CurrentChar of
          '0'..'9': str[len] = CurrentChar; len++; goto IntNum;
          other:    str[len] = '\0'; tk.type = NUM; strcpy(tk.val, str); goto done;
Developing a Scanner from DFA
ID:     if (not ReadNext()) { if (len != 0) error(); exit; }
        case CurrentChar of
          '0'..'9', 'a'..'z', 'A'..'Z': str[len] = CurrentChar; len++; goto ID;
          other: str[len] = '\0';
                 if (IsKeyword(str) != -1) tk.type = IsKeyword(str);
                 else { tk.type = IDE; strcpy(tk.val, str); }
                 goto done;
Developing a Scanner from DFA
Assign: if (not ReadNext()) { if (len != 0) error(); exit; }
        case CurrentChar of
          '=':   tk.type = ASS; if (not ReadNext()) exit; goto done;
          other: error();
Developing a Scanner from DFA
done:   TokenList[total] = tk;   // add the new token to the token list
        total++;
        len = 0;                 // start storing a new token string
        strcpy(str, "");         // reset the token string
        goto start;              // start scanning a new token
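Putting the pieces together, the DFA for ToyL can be turned into a compact C scanner. This is a sketch rather than the slides' exact program, and it differs in several assumed ways:
- It works on an in-memory string instead of calling ReadNext.
- It uses a reduced token set: LG, RG and COLON are left out, and COMMA plus an ERR type for illegal characters are added.
- It assumes := is the assignment operator.
- It skips blanks, tabs and newlines, which the pseudocode above does not handle.

The names scan and is_keyword are hypothetical. The states start, IntNum, ID, Assign and done appear as comments.

```c
#include <ctype.h>
#include <string.h>

/* Reduced ToyL token types (an assumption; see the lead-in). */
typedef enum {
    IDE, NUM, ASS, PLUS, MINUS, MUL, DIV, GT, LT, EQ, SEMI, LPAREN, RPAREN, COMMA,
    VAR, IF, THEN, ELSE, WHILE, READ, WRITE, INT,   /* keywords, same order as the table */
    ERR                                             /* illegal character */
} TkType;

typedef struct { TkType type; char val[50]; } Token;

static const char *const keywords[] = {"var", "if", "then", "else", "while", "read", "write", "int"};

/* Position of s in the keyword table, or -1 (like IsKeyword). */
static int is_keyword(const char *s) {
    for (int i = 0; i < (int)(sizeof keywords / sizeof keywords[0]); i++)
        if (strcmp(s, keywords[i]) == 0) return i;
    return -1;
}

/* Scans src into at most max tokens and returns the number of tokens produced. */
int scan(const char *src, Token tokens[], int max) {
    static const char syms[] = "+-*/><=;(),";
    static const TkType sym_types[] = {PLUS, MINUS, MUL, DIV, GT, LT, EQ, SEMI, LPAREN, RPAREN, COMMA};
    int total = 0;
    const char *p = src;
    while (*p != '\0' && total < max) {
        if (isspace((unsigned char)*p)) { p++; continue; }      /* layout is skipped */
        Token tk = { .type = ERR, .val = "" };
        const char *begin = p;                                  /* state start */
        if (isalpha((unsigned char)*p)) {                       /* state ID */
            while (isalnum((unsigned char)*p)) p++;
            tk.type = IDE;
        } else if (isdigit((unsigned char)*p)) {                /* state IntNum */
            if (*p++ != '0')                                    /* "0" alone, or NZ-digit digit* */
                while (isdigit((unsigned char)*p)) p++;
            tk.type = NUM;
        } else if (p[0] == ':' && p[1] == '=') {                /* state Assign */
            p += 2;
            tk.type = ASS;
        } else {                                                /* one-character symbols */
            const char *hit = strchr(syms, *p);
            if (hit != NULL) tk.type = sym_types[hit - syms];   /* otherwise ERR stays */
            p++;                                                /* skip the character either way */
        }
        size_t len = (size_t)(p - begin);
        if (len >= sizeof tk.val) len = sizeof tk.val - 1;      /* truncate overlong lexemes */
        memcpy(tk.val, begin, len);
        tk.val[len] = '\0';
        if (tk.type == IDE) {                                   /* reserved words via the table */
            int k = is_keyword(tk.val);
            if (k >= 0) tk.type = (TkType)(VAR + k);
        }
        tokens[total++] = tk;                                   /* state done */
    }
    return total;
}
```

Scanning "var x; x := 12 + y0" yields eight tokens: var, x, ;, x, :=, 12, + and y0. The keyword table turns var into VAR, and the others are classified as IDE, SEMI, ASS, NUM and PLUS. An illegal character such as $ becomes an ERR token, and scanning resumes at the next character, following the recovery strategy described earlier.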
A Scanner Generator: Lex
- Different versions of Lex
- flex is distributed with the GNU compiler package produced by the Free Software Foundation and is freely available from the Internet
(Figure: flex reads a .l file containing RE-like definitions and generates lex.yy.c, which contains the scanner function yylex().)
Summary for Chapter 2: Scanning
Summary
- About finite automata
- Definition of DFA
- (Σ, start state, set of states, set of terminal states, f)
- Definition of NFA
- (Σ, set of start states, set of states, set of terminal states, f)
- Differences between NFA and DFA
- Number of start states
- ε-edges
- An NFA allows more than one edge for the same state and symbol
Summary
- About finite automata
- From NFA to DFA
- main idea
- problems to solve
- Minimizing DFA
- main idea
- problems to solve
- Implementing DFA
Summary
- About regular expressions
- Definition of regular expression
- Regular definition
- From regular expression to NFA
- main idea
- problems to solve
- Defining lexical structure with regular expressions
Summary
- About the scanner
- Define the lexical structure of the programming language with regular expressions
- Transform the regular expressions into an NFA
- Transform the NFA into a DFA
- Minimize the DFA
- Implement the DFA
Summary
- Original problem
- Develop a scanner
- Read the source program as a stream of characters, and recognize tokens according to the lexical rules of the source language
- General techniques
- Use RE to define the lexical structure
- RE -> NFA -> DFA -> minimized DFA -> implementation
- General problem
- Use RE/FA to define structural rules
- Check whether the input meets the structural rules
Summary
- Application to similar problems
- Use RE (DFA) to formally describe structures
- Strings
- Security policies
- Interface specifications
- Check
- whether a string meets the structural rules
- whether a given execution meets the security policies
- Property checking
Any Questions?
Reading Assignment
- Topic: How to develop a parser?
- Objectives
- Get to know
- What is a parser? (input, output, functions)
- What is the syntactic structure of a C program?
- What are the different parsing techniques and their main ideas?
- References
- Optional textbooks
- Hand in a report in either English or Chinese; one group will be asked to give a presentation at the beginning of the next class
- Tips
- Collect more information from textbooks and the Internet
- Establish your own opinion