Compiler Construction Principles - PowerPoint PPT Presentation

Description:

Title: Network Management Author: Valued Gateway Customer Last modified by: JINYING Created Date: 12/8/1997 7:20:36 PM Document presentation format – PowerPoint PPT presentation

1
Compiler Construction Principles and Implementation Techniques
  • Dr. Ying JIN
  • Associate Professor
  • Sept. 2007

2
What we have already introduced
  • What is this course about?
  • What is a compiler?
  • The ways to design and implement a compiler
  • General functional components of a compiler
  • The general translating process of a compiler

3
What will be introduced
  • Scanning
  • The first phase in a compiler
  • Functional requirement (input, output, functions)
  • Data structures
  • General techniques in developing a scanner
  • Two formal languages
  • Regular expression
  • Finite automata (DFA, NFA)
  • Three algorithms
  • From regular expression to NFA
  • From NFA to DFA
  • Minimizing DFA
  • One implementation
  • Implementing DFA to get a scanner

4
Functional components of a Compiler
[Figure: the phases of a compiler in order: Lexical Analysis (scanning), Syntax Analysis (parsing), Semantic Analysis, Intermediate Code Generation, Intermediate Code Optimization, Target Code Generation; the phases are divided into analysis (front end) and synthesis (back end)]
5
Presentation Time
  • What have you read and learned?

6
Outline
  • 2.1 Overview
  • 2.1.1 General Function of a Scanner
  • 2.1.2 Some Issues about Scanning
  • 2.2 Finite Automata
  • 2.2.1 Definition and Implementation of DFA
  • 2.2.2 Non-Deterministic Finite Automata
  • 2.2.3 Transforming NFA into DFA
  • 2.2.4 Minimizing DFA
  • 2.3 Regular Expressions
  • 2.3.1 Definition of Regular Expressions
  • 2.3.2 Regular Definition
  • 2.3.3 From Regular Expression to DFA
  • 2.4 Design and Implementation of a Scanner
  • 2.4.1 Developing a Scanner from DFA
  • 2.4.2 A Scanner Generator Lex

7
Knowledge Relation Graph
[Figure: Lexical definition --using--> Regular Expression --transforming--> NFA --transforming--> DFA --minimizing--> minimized DFA --implement--> Scanner; developing a scanner is based on the lexical definition]
9
2.1 Overview
  • General function of scanning
  • Input
  • Output
  • Functional description
  • Some issues about scanning
  • Tokens
  • Blank/tab, return, newline, comments
  • Lexical errors

10
General Function of Scanning
  • Input
  • Source program
  • Output
  • Sequence of words (tokens)
  • Functional description (similar to spelling)
  • Read source program
  • Recognize words one by one according to the
    lexical definition of the source language
  • Build the internal representation of words (tokens)
  • Check for lexical errors
  • Output the sequence of words

11
Some Issues about Scanning
12
Tokens
  • Source program
  • Stream of characters
  • Token
  • A sequence of characters that can be treated as a
    unit in the grammar of a programming language
  • Token types
  • Identifier: x, y1, ...
  • Number: 12, 12.3, ...
  • Keywords (reserved words): int, real, main, ...
  • Operator: +, -, *, /, >, <, ...
  • Delimiter: ;, (, ), ...

13
Tokens
  • The content of a token: (token-type, semantic information)
  • Semantic information
  • Identifier: the string
  • Number: the value
  • Keywords (reserved words): the number in the keyword table
  • Operator: itself
  • Delimiter: itself
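
The token content described above can be sketched as a small data structure. This is only an illustration: the names `TokenType`, `Token`, `make_token` and the contents of `KEYWORD_TABLE` are assumptions, not part of the slides.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union


class TokenType(Enum):
    IDENTIFIER = auto()
    NUMBER = auto()
    KEYWORD = auto()
    OPERATOR = auto()
    DELIMITER = auto()


# Hypothetical keyword table; a keyword's "number" is its index here.
KEYWORD_TABLE = ["var", "int", "read", "write", "if", "then", "else", "while"]


@dataclass(frozen=True)
class Token:
    type: TokenType
    info: Union[str, int, float]  # the semantic information


def make_token(kind: TokenType, text: str) -> Token:
    """Pair a token type with the semantic information listed on the slide."""
    if kind is TokenType.NUMBER:
        # Number: the value
        return Token(kind, float(text) if "." in text else int(text))
    if kind is TokenType.KEYWORD:
        # Keyword: the number in the keyword table
        return Token(kind, KEYWORD_TABLE.index(text))
    # Identifier: the string; operator and delimiter: itself
    return Token(kind, text)
```

For example, `make_token(TokenType.KEYWORD, "int")` stores the table index of `int` rather than the string.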
14
Keywords
  • Keywords
  • Words that have a special meaning in the language
  • In some languages they can still be used for other meanings, which overloads the special meaning
  • Reserved words
  • Words that are reserved by a programming language for a special meaning
  • Cannot be used for other meanings
  • Keyword table
  • Records all keywords defined by the source programming language

15
Sample Source Program

[Figure: the sample source program laid out as a stream of individual characters; the recoverable words are var, int, read, if, then, else and write, the identifiers x and y, and the numbers 1 and 0]
16
Sequence of Tokens
<var, k> ? <int, k> <x, ide> , <y, ide> ?
<read, k> ( <x, ide> ) ? <read, k> ( <y, ide> ) ?
<if, k> <x, ide> > <y, ide> <then, k> <write, k> ( <1, num> ) ?
<else, k> <write, k> ( <0, num> ) ?
(k = keyword, ide = identifier, num = number)
17
Blank, tab, newline, comments
  • No semantic meaning
  • Only for readability
  • Can be removed
  • Line number should be calculated

18
The End of Scanning
  • Two possible situations
  • On reading the character that represents the end of a program
  • PASCAL: .
  • On reaching the end of the source program file

19
Lexical Errors
  • Limited types of errors can be found during
    scanning
  • illegal character
  • , ?
  • the first character is wrong
  • /abc
  • Lexical Error Recovery
  • Once a lexical error has been found, the scanner does not stop; it takes some measures to continue scanning
  • Ignore the current character and start again from the next character

if a then x 12.else
20
Scanner
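  • Two forms
  • Attached scanner: syntax analysis calls the scanner, which reads the CharList and returns one Token per call
  • Independent scanner: the scanner turns the whole CharList into a TokenList, which is then passed to syntax analysis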
21
To Develop a Scanner
  • Now we know what the function of a scanner is
  • How to implement a scanner?
  • Basis: the lexical rules of the source programming language
  • Set of allowed characters
  • What kinds of tokens does it have?
  • The structure of each token-type
  • The scanner will be developed according to the lexical rules

22
How to define the lexical structure of a
programming language?
  • Natural language (English, Chinese )
  • - easy to write, ambiguous and hard to implement
  • Formal languages
  • - need some special background knowledge
  • - concise, easy to implement
  • - automation

23
  • Two formal languages for defining lexical
    structure of a programming language
  • Finite automata
  • Non-deterministic finite automata (NFA)
  • Deterministic finite automata (DFA)
  • Regular expressions (RE)
  • Both of them can formally describe the lexical structure
  • The set of allowed words
  • They are equivalent
  • FA is easy to implement; RE is easy to use

25
2.2 Finite Automata
  • Definition of DFA
  • Implementation of DFA
  • Non-Deterministic Finite Automata
  • Transforming NFA into DFA
  • Minimizing DFA

26
Definition of DFA
  • - formal definition
  • - two ways of representations
  • - examples
  • - some concepts

27
Formal Definition of DFA
  • (Σ, SS, S0, δ, TS)
  • Σ (alphabet): the set of allowed characters; each character is called an input symbol
  • SS = {S0, S1, S2, ...}: a finite set; each element is called a state
  • S0 ∈ SS: start state
  • δ: SS × Σ → SS ∪ {⊥}: transforming function
  • TS ⊆ SS: set of terminal (accept) states
  • Note: δ is a function which accepts a state and a symbol and returns either one unique state or ⊥ (no definition)

One DFA defines a set of strings; each string is a sequence of characters in Σ. The start state gives the starting point for generating strings, the terminal states give the end points, and the transforming function gives the rules for how to generate strings.
28
  • Features of a DFA
  • One start state
  • For each state and each symbol, there is at most one outgoing edge
  • Functions of DFA
  • It defines a set of strings
  • It can be used for defining lexical structure of
    a programming language

29
Two ways of Representations
  • Table
  • Convenient for implementation
  • Graph
  • easy to read and understand

30
Two ways of Representations (Table)
  • Transforming Table
  • start state S0
  • terminal state
  • Header row: characters
  • Header column: states
  • Cells: states or ⊥

31
Example of Transforming Table
  • Σ = {a, b, c, d}
  • SS = {S0, S1, S2, S}
  • Start state: S0
  • Set of terminal states: {S}
  • (S0,a)→S1, (S0,c)→S2,
  • (S0,d)→S, (S1,b)→S1,
  • (S1,d)→S2, (S2,a)→S,
  • (S,c)→S

      a     b     c     d
S0    S1    ⊥     S2    S
S1    ⊥     S1    ⊥     S2
S2    S     ⊥     ⊥     ⊥
S     ⊥     ⊥     S     ⊥
32
Two ways of Representations (graph)
  • Graph
  • start state
  • terminal state
  • State
  • Edge

33
Example of Graphical DFA
  • Σ = {a, b, c, d}
  • SS = {S0, S1, S2, S}
  • Start state: S0
  • Set of terminal states: {S}
  • (S0,a)→S1, (S0,c)→S2,
  • (S0,d)→S, (S1,b)→S1,
  • (S1,d)→S2, (S2,a)→S,
  • (S,c)→S

[Figure: state diagram of this DFA with the states S0, S1, S2 and S]
34
Some Concepts
  • String acceptable by a DFA
  • If A is a DFA and a1 a2 ... an is a string, and there exists a sequence of states (S0, S1, ..., Sn) which satisfies
  • S0 --a1--> S1, S1 --a2--> S2, ..., Sn-1 --an--> Sn,
  • where S0 is the start state and Sn is one of the accept states,
  • then the string a1 a2 ... an is acceptable by the DFA A.
  • Set of strings defined by DFA
  • The set of all the strings that are acceptable by a DFA A is called the set of strings defined by A, which is denoted as L(A)

35
Some Concepts
  • Special case
  • If a DFA is composed of one state with no edges, which is both the start state and the accept state, the set of strings defined by the DFA is {ε}, the set that contains only the empty string.

36
Relating DFA to Lexical Structure of a
Programming language
  • Use a DFA to define the lexical structure of one
    word-type in a programming language
  • unsigned real number
  • A DFA can be defined for the lexical structure of
    all the words in a programming language
  • The set of strings defined by the DFA is the set
    of allowed words in the programming language
  • The implementation of the DFA can be used as a
    scanner for the programming language

37
Assignment
  • For standard C programming language
  • Find out the token types and their lexical
    structures
  • Write down DFA for each token type

38
Implementation of DFA
39
Implementation of DFA
  • Objective (meaning of implementing a DFA)
  • Given a DFA which defines rules for a set of
    strings
  • Develop a program, which
  • Read a string
  • Check whether this string is accepted by the DFA
  • Two ways
  • Basing on transforming table of DFA
  • Basing on graphical representation of DFA

40
Transforming Table based Implementation
  • Main idea
  • Input a string
  • Output true if acceptable, otherwise false
  • Data structure
  • Transforming table (two dimensional array T)
  • Two variables
  • CurrentState record current state
  • CurrentChar record current character that is
    read in the string

41
Transforming Table based Implementation
  • Main idea
  • General Algorithm
  • 1. CurrentState = S0
  • 2. Read the first character as CurrentChar
  • 3. If CurrentChar is not the end of the string:
  •      if T(CurrentState, CurrentChar) ≠ error,
  •        CurrentState = T(CurrentState, CurrentChar),
  •        read the next character of the string as CurrentChar,
  •        goto 3
  • 4. If CurrentChar is the end of the string and CurrentState is one of the terminal states, return true; otherwise, return false.
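
The general algorithm above can be sketched in Python as follows. The table `T` encodes the example DFA from the transforming-table slide; representing "no definition" as a missing dictionary entry and the name `accepts` are assumptions made for this sketch.

```python
# Transforming table of the example DFA; a missing entry means "no definition".
T = {
    ("S0", "a"): "S1", ("S0", "c"): "S2", ("S0", "d"): "S",
    ("S1", "b"): "S1", ("S1", "d"): "S2",
    ("S2", "a"): "S",
    ("S", "c"): "S",
}
START_STATE = "S0"
TERMINAL_STATES = {"S"}


def accepts(string: str) -> bool:
    """Return True if the DFA described by T accepts the string."""
    current_state = START_STATE
    for current_char in string:
        next_state = T.get((current_state, current_char))
        if next_state is None:
            # No edge for this state and character: the string is rejected.
            return False
        current_state = next_state
    # End of the string: accept only in a terminal state.
    return current_state in TERMINAL_STATES
```

With this table, `accepts("ca")` is true (S0 to S2 to S), while `accepts("cab")` is false because S has no edge for b.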

42
Example
      a     b     c     d
S0    S1    ⊥     S2    S
S1    ⊥     S1    ⊥     S2
S2    S     ⊥     ⊥     ⊥
S     ⊥     ⊥     S     ⊥
1) abbacc: false (S1 has no edge for a)
2) cab: false (S has no edge for b)
43
Transforming table for the DFA
Variables CurrentChar, CurrentState
Read the string to be checked
Checking process
44
Graph based Implementation of DFA
  • each state corresponds to a case statement
  • each edge corresponds to a goto statement
  • for accept state, add one more branch, if current
    char is the end of the string then accept

Li: case CurrentChar of
      a: goto Lj
      b: goto Lk
      other: Error( )

For an accept state i:
Li: case CurrentChar of
      a: goto Lj
      b: goto Lk
      end of string: return true
      other: Error( )
45
LS0: read character to CurrentChar
     case CurrentChar of
       a: goto LS1
       c: goto LS2
       d: goto LS3
       other: return false
LS1: read character to CurrentChar
     case CurrentChar of
       b: goto LS1
       d: goto LS2
       other: return false
46
LS2: read character to CurrentChar
     case CurrentChar of
       a: goto LS3
       other: return false
LS3: read character to CurrentChar
     case CurrentChar of
       c: goto LS3
       end of string: return true
       other: return false
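
Python has no goto statement, so the following sketch imitates the graph-based implementation with one branch per label and a variable that holds the current label. It is an assumption-laden illustration: the function name `accepts_graph` is invented, and the c edge of the terminal state follows the transforming table, (S, c) → S.

```python
def accepts_graph(string: str) -> bool:
    """Graph-style DFA check: one branch per state label, one test per edge."""
    chars = iter(string)
    label = "LS0"
    while True:
        ch = next(chars, None)  # None marks the end of the string
        if label == "LS0":
            if ch == "a":
                label = "LS1"
            elif ch == "c":
                label = "LS2"
            elif ch == "d":
                label = "LS3"
            else:
                return False
        elif label == "LS1":
            if ch == "b":
                label = "LS1"
            elif ch == "d":
                label = "LS2"
            else:
                return False
        elif label == "LS2":
            if ch == "a":
                label = "LS3"
            else:
                return False
        else:  # LS3, the terminal state
            if ch == "c":
                label = "LS3"
            elif ch is None:
                return True  # extra branch of an accept state
            else:
                return False
```

Compared with the table-based version, the DFA is hard-wired into the control flow, so changing the automaton means changing the code.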
47
Definition of NFA
48
Formal Definition
  • (Σ, SS, SS0, δ, TS)
  • Σ (alphabet): the set of allowed characters; each character is called an input symbol
  • SS = {S0, S1, S2, ...}: a finite set; each element is called a state
  • SS0 ⊆ SS: set of start states
  • δ: SS × (Σ ∪ {ε}) → (power set of SS) ∪ {⊥}: transforming function
  • TS ⊆ SS: set of terminal (accept) states
  • Note: δ is a function which accepts a state and a symbol (or ε) and returns a set of states or ⊥ (no definition)

49
Differences between DFA and NFA
                  DFA                NFA
Start state       One start state    Set of start states
ε-edges           Not allowed        Allowed
T(S, a)           S' or ⊥            {S1, ..., Sn} or ⊥
Implementation    Easy               Non-deterministic
50
Example of NFA
  • Σ = {a, b, c, d}
  • SS = {S0, S10, S2, S}
  • Set of start states: {S0, S10}
  • Set of terminal states: {S}
  • (S0,a)→{S10, S}, (S0,ε)→{S2},
  • (S10,b)→{S10}, (S10,ε)→{S2},
  • (S2,ε)→{S},
  • (S,c)→{S}

[Figure: state diagram of this NFA with the states S0, S10, S2 and S]
51
From NFA to DFA
52
Main Idea
  • Solve two problems
  • ε-edges
  • ε-closure(SS)
  • Merging those edges with the same symbol
  • NextStates(SS, a)
  • Conversion of NFA to DFA
  • Using a set of states in the NFA as one state in the DFA
  • Assuring that the same set of strings is accepted

53
ε-closure
  • For a given NFA A and a set of states SS,
  • initially ε-closure(SS) = SS
  • If there exists a state s in ε-closure(SS) which has an ε-edge to a state s' and s' ∉ ε-closure(SS), add s' to ε-closure(SS)
  • Repeat until no state in ε-closure(SS) has an ε-edge to a state that is not in ε-closure(SS)
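
A minimal sketch of the ε-closure computation, using the example NFA from the slides. The edge dictionary `NFA_EDGES`, the use of the empty string as the label of ε-edges, and the function name are assumptions of this sketch.

```python
EPS = ""  # label used for ε-edges in this sketch

# Example NFA: (state, symbol) -> set of next states.
NFA_EDGES = {
    ("S0", "a"): {"S10", "S"},
    ("S0", EPS): {"S2"},
    ("S10", "b"): {"S10"},
    ("S10", EPS): {"S2"},
    ("S2", EPS): {"S"},
    ("S", "c"): {"S"},
}


def epsilon_closure(states, edges=NFA_EDGES):
    """Return all states reachable from `states` through ε-edges only."""
    closure = set(states)
    worklist = list(states)
    while worklist:
        s = worklist.pop()
        for target in edges.get((s, EPS), set()):
            if target not in closure:
                # A new state was reached; its own ε-edges must be followed too.
                closure.add(target)
                worklist.append(target)
    return closure
```

The worklist makes the "repeat until nothing changes" step explicit: every newly added state is examined once for further ε-edges.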

54
ε-closure -- Example
ε-closure({S0, S10}) = {S0, S10} → {S0, S10, S2} → {S0, S10, S2} → {S0, S10, S2, S}
55
Moving States
  • For a given set of states SS and a symbol a in an NFA A,
  • NextStates(SS, a) = { s | there is a state s1 ∈ SS and an edge s1 --a--> s in A }
56
Moving States
NextStates({S0, S10}, a) = {S10, S}
NextStates({S0, S10}, b) = {S10}
57
Algorithm
  • Given an NFA A = {Σ, SS, SS0, δ, TS}
  • Generating an equivalent DFA A' = {Σ, SS', S0', δ', TS'}
  • Steps
  • (1) S0' = ε-closure(SS0), add S0' to SS'
  • (2) select one state s from SS'; for any symbol a ∈ Σ,
  • let s' = ε-closure(NextStates(s, a)),
  • add (s, a) → s' to δ',
  • if s' ∉ SS', add s' to SS'
  • (3) repeat (2) until all states are handled
  • (4) for a state s' in SS', s' = {S1, ..., Sn}, if there exists Si ∈ TS, then s' is an accept state in A'; add s' to TS'
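
The conversion steps can be sketched as below for the example NFA. This sketch takes the ε-closure of every target set, which reproduces the state sets shown in the example table; all names (`NFA_EDGES`, `next_states`, `nfa_to_dfa`) and the empty-string label for ε are assumptions.

```python
EPS = ""  # label used for ε-edges in this sketch

NFA_EDGES = {
    ("S0", "a"): {"S10", "S"}, ("S0", EPS): {"S2"},
    ("S10", "b"): {"S10"}, ("S10", EPS): {"S2"},
    ("S2", EPS): {"S"}, ("S", "c"): {"S"},
}
NFA_STARTS = {"S0", "S10"}
NFA_TERMINALS = {"S"}
ALPHABET = ["a", "b", "c"]


def epsilon_closure(states):
    closure, worklist = set(states), list(states)
    while worklist:
        s = worklist.pop()
        for t in NFA_EDGES.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                worklist.append(t)
    return closure


def next_states(states, symbol):
    """Merge all edges labelled `symbol` that leave any state in `states`."""
    result = set()
    for s in states:
        result |= NFA_EDGES.get((s, symbol), set())
    return result


def nfa_to_dfa():
    start = frozenset(epsilon_closure(NFA_STARTS))        # step (1)
    dfa_states, dfa_edges, unhandled = {start}, {}, [start]
    while unhandled:                                      # steps (2) and (3)
        s = unhandled.pop()
        for a in ALPHABET:
            target = frozenset(epsilon_closure(next_states(s, a)))
            if not target:
                continue  # no definition for (s, a)
            dfa_edges[(s, a)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                unhandled.append(target)
    # step (4): a DFA state is terminal if it contains an NFA terminal state
    dfa_terminals = {s for s in dfa_states if s & NFA_TERMINALS}
    return start, dfa_states, dfa_edges, dfa_terminals
```

Each DFA state is a `frozenset` of NFA states so that it can be used as a dictionary key.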

58
Example
  • Σ = {a, b, c}

S0' = ε-closure({S0, S10}) = {S0, S10, S2, S}

                      a               b               c
{S0, S10, S2, S}      {S10, S, S2}    {S10, S, S2}    {S}
{S10, S, S2}          ⊥               {S10, S, S2}    {S}
{S}                   ⊥               ⊥               {S}
59
Minimizing DFA
60
Problem
  • Equivalence of two DFAs
  • Two DFAs are equivalent if the sets of strings they accept are the same
  • Among those DFAs that accept the same set of strings, the minimal DFA refers to the one that has the minimal number of states

How does this happen?
61
Equivalent DFAs
[Figure: two equivalent DFAs with different numbers of states]
There are states that accept the same set of strings!
62
Main Idea
  • Equivalent states
  • For two states s1 and s2 in a DFA, if s1 and s2 are each treated as the start state and they accept the same set of strings, s1 and s2 are called equivalent states
  • Two ways to minimize a DFA
  • Merging equivalent states
  • Splitting non-equivalent states

63
Algorithm
  • Given a DFA A = {Σ, SS, S0, δ, TS}
  • Generating an equivalent DFA A' = {Σ, SS', S0', δ', TS'}
  • Splitting Steps
  • (1) two groups: {non-terminal states}, {terminal states}
  • (2) select one group of states SSi = {Si1, ..., Sin},
  • replace SSi with split(SSi)
  • (3) repeat (2) until all groups are handled (no group can be split any further)
  • (4) SS' = set of groups
  • (5) S0' is the group containing S0
  • (6) if a group consists of terminal states of A, it is a terminal state of A'
  • (7) δ': SSi --a--> SSj, if there is Si --a--> Sj in A, Si ∈ SSi, Sj ∈ SSj
64
Splitting a Set of States
  • Given
  • a DFA A = {Σ, SS, S0, δ, TS}
  • Groups of states {SS1, ..., SSm}, SS1 ∪ ... ∪ SSm = SS
  • SSi = {Si1, ..., Sin},
  • split(SSi) is to split SSi into two groups G1 and G2:
  • For j = 1 to n
  • if for every a ∈ Σ,
  • (Si1, a) → Sk ∧ (Sij, a) → Sl ⇒ Sk and Sl belong to the same group SSp, add Sij to G1
  • Otherwise, add Sij to G2
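
The splitting procedure can be sketched as below. As a simplification, this sketch splits a group into one subgroup per distinct behaviour instead of only G1 and G2; repeating the two-way split reaches the same final groups. The function `minimize`, the small DFA used to try it out, and the assumption that every state is reachable from the start state are all illustrative choices, not taken from the slides.

```python
def minimize(states, alphabet, edges, start, terminals):
    """Minimize a DFA by repeatedly splitting groups of states."""
    terminals = set(terminals)
    # (1) two initial groups: non-terminal states and terminal states
    groups = [g for g in (set(states) - terminals, terminals) if g]

    def group_index(state):
        for index, group in enumerate(groups):
            if state in group:
                return index
        return None  # "no definition"

    changed = True
    while changed:                      # (2) and (3): split until stable
        changed = False
        for group in groups:
            behaviour = {}
            for s in group:
                # For each symbol, record which group the edge leads to.
                key = tuple(group_index(edges.get((s, a))) for a in alphabet)
                behaviour.setdefault(key, set()).add(s)
            if len(behaviour) > 1:
                groups.remove(group)
                groups.extend(behaviour.values())
                changed = True
                break

    # (4) to (7): each group becomes one state of the new DFA
    merged = {s: frozenset(g) for g in groups for s in g}
    new_edges = {(merged[s], a): merged[t] for (s, a), t in edges.items()}
    new_terminals = {merged[s] for s in terminals}
    return set(merged.values()), new_edges, merged[start], new_terminals


# Hypothetical DFA in which S1 and S2 are equivalent states.
EDGES = {("S0", "a"): "S1", ("S0", "b"): "S2",
         ("S1", "a"): "S1", ("S2", "a"): "S2"}
new_states, new_edges, new_start, new_terminals = minimize(
    {"S0", "S1", "S2"}, ["a", "b"], EDGES, "S0", {"S1", "S2"})
```

In this small example the two terminal states behave identically for every symbol, so they stay in one group and the result has two states.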

65
Simple Example
[Figure: a simple example of splitting the states S0, S1, S2, S3, S4 into groups]
67
2.3 Regular Expressions
  • Definition of Regular Expressions
  • Regular Definition
  • From Regular Expression to DFA

68
Definition of Regular Expressions (RE)
  • - Some Concepts
  • - Formal Definition of RE
  • - Example
  • - Properties of RE
  • - Extensions to RE
  • - Limitations of RE
  • - Using RE to define Lexical Structure

69
Some Concepts
  • alphabet: a non-empty finite set of symbols, which is denoted as Σ; each of its elements is called a symbol.
  • string: a finite sequence of symbols; we use ε (or λ) to represent the empty string
  • Note: the set {ε} is different from the empty set ∅
  • length of a string: the number of symbols in a string; we use |α| to represent the length of the string α
  • concatenation operator for strings
  • if α and β are strings, we use αβ for the concatenation of the two strings; in particular, we have εα = αε = α

70
Some Operators on Set of Strings
  • product of sets of strings
  • if A and B are two sets of strings, AB is called the product of the two sets of strings, AB = {αβ | α ∈ A, β ∈ B}
  • in particular {ε}A = A{ε} = A, where ε represents the empty string
  • power of a set of strings
  • if A is a set of strings, A^i is called the i-th power of A, where i is a non-negative integer
  • A^0 = {ε}
  • A^1 = A, A^2 = A A
  • A^k = AA......A (k times)
  • positive closure: A+ = A^1 ∪ A^2 ∪ A^3 ∪ ......
  • star closure: A* = A^0 ∪ A^1 ∪ A^2 ∪ A^3 ∪ ......
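
The operators above can be tried out on small finite sets. In this sketch the function names are assumptions, the empty string `""` plays the role of ε, and the star closure is cut off at a maximum power because the real closure is infinite.

```python
def product(A, B):
    """AB: every string of A followed by every string of B."""
    return {x + y for x in A for y in B}


def power(A, i):
    """A^i, with A^0 = {ε}."""
    result = {""}
    for _ in range(i):
        result = product(result, A)
    return result


def star_closure(A, max_power):
    """Union of A^0 .. A^max_power: a finite approximation of A*."""
    result = set()
    for i in range(max_power + 1):
        result |= power(A, i)
    return result
```

For instance, `power({"a", "b"}, 2)` gives the four strings of length two over a and b, and the product with the empty set is always empty.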

71
Formal Definition
  • For a given alphabet Σ, a regular expression for Σ defines a set of strings of Σ
  • We use RΣ to represent a regular expression for Σ, and L(RΣ) to represent the set of strings that RΣ defines.

72
Formal Definition
  • ∅ is a regular expression, L(∅) = { }
  • ε is a regular expression, L(ε) = {ε}
  • for any c ∈ Σ, c is a regular expression, L(c) = {c}
  • if A and B are regular expressions, the following operators can be used
  • parentheses: ( A ), L( (A) ) = L(A)
  • choice among alternatives: A | B, L( A | B ) = L(A) ∪ L(B)
  • concatenation: A B, L( A B ) = L(A)L(B)
  • repetition: A*, L( A* ) = L(A)*

73
Example
  • Σ = {a, b}

74
Comparing with DFA
  • Equivalent in describing the set of strings
  • Can be converted into each other
  • DFA is convenient for implementation
  • RE is convenient for defining and understanding
  • Both of them can be used to define the lexical
    structure of programming languages

75
Properties
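  • A | B = B | A (choice is commutative)
  • A | (B | C) = (A | B) | C (choice is associative)
  • A (B C) = (A B) C (concatenation is associative)
  • A (B | C) = A B | A C (concatenation distributes over choice from the left)
  • (A | B) C = A C | B C (concatenation distributes over choice from the right)
  • A** = A* (repetition is idempotent)
  • A+ = A A* (relation between positive closure and star closure)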

76
Extensions
  • Some extensions can be made to facilitate definition
  • one or more repetitions: A+
  • any symbol: .
  • range: [0-9] [a-z] [A-Z]
  • not in the range: ~(a|b|c)
  • optional: r? = (ε | r)
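
Several of these extensions exist in Python's standard `re` module, which makes it easy to check on a few sample strings that an extension and its basic form describe the same strings. This is only a spot check on samples, not a proof; the helper `same_language` and the sample list are assumptions. Note that Python writes "not in the range" as `[^abc]`, which differs from the notation on the slide.

```python
import re


def same_language(pattern_a, pattern_b, samples):
    """Check that two patterns agree on every sample string."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )


SAMPLES = ["", "a", "aa", "b", "ab", "7", "x7", "Z"]

# A+ is the same as A A*
plus_ok = same_language("a+", "aa*", SAMPLES)
# r? is the same as (ε | r); the empty alternative stands for ε
optional_ok = same_language("a?", "(|a)", SAMPLES)
# a range is shorthand for a choice among its symbols
range_ok = same_language("[0-9]", "(0|1|2|3|4|5|6|7|8|9)", SAMPLES)
```

All three flags come out true for these samples, while a pair such as `a+` and `a*` is told apart by the empty string.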

77
Limitations
  • RE cannot define structures like
  • Pairing: ( )
  • Nesting
  • RE cannot describe structures in which an arbitrarily long part must be repeated exactly
  • for example w c w, where w is a string containing a and b

(a|b)* c (a|b)* cannot be used, because it cannot guarantee that the strings on both sides of c are always the same
78
Regular Definition
79
Definition
  • It is inconvenient to define sets of long strings with RE, so another formal notation is introduced, which is called Regular Definition
  • The main idea is to give names to some sub-expressions in an RE
  • Example
  • (1|2|...|9)(0|1|2|...|9)*
  • NZ_digit = 1|2|...|9
  • digit = NZ_digit | 0
  • NZ_digit digit*

80
Defining Lexical Structure of ToyL
  • letter = a|...|z|A|...|Z
  • digit = 0|...|9
  • NZ-digit = 1|...|9
  • Reserved words
  • Reserved = var | if | then | else | while | read | write | int
  • Identifiers
  • identifier = letter (letter | digit)*
  • Constant
  • integer: int = NZ-digit digit* | 0
  • Other symbols: syms = + | - | * | / | > | < | ... | ( | )
  • Lexical structure
  • lex = Reserved | identifier | int | syms
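
The idea of a regular definition, naming sub-expressions and reusing them, maps directly onto building patterns from named strings. The sketch below uses Python's `re` module; it assumes that an identifier is a letter followed by letters or digits (as in the DFA for ToyL), leaves out the other symbols, and the function `classify` is invented for illustration.

```python
import re

# Named sub-expressions, written in Python's regular expression syntax.
letter = "[a-zA-Z]"
digit = "[0-9]"
nz_digit = "[1-9]"

reserved = "var|if|then|else|while|read|write|int"
identifier = f"{letter}({letter}|{digit})*"   # assumed form of identifiers
integer = f"{nz_digit}{digit}*|0"


def classify(word):
    """Return the name of the first definition that matches the whole word."""
    if re.fullmatch(reserved, word):
        return "Reserved"
    if re.fullmatch(identifier, word):
        return "identifier"
    if re.fullmatch(integer, word):
        return "int"
    return None
```

Reserved words are tested first because every reserved word also has the shape of an identifier.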

81
From RE to NFA
82
Rules
  • ∅ is a regular expression, L(∅) = { }
  • ε is a regular expression, L(ε) = {ε}
  • for any c ∈ Σ, c is a regular expression, L(c) = {c}

[Figure: NFA with a single edge labelled c]
83
Rules
  • ( A ), L( (A) ) = L(A), no change
  • A B, L( A B ) = L(A)L(B)

[Figure: NFA(A) connected to NFA(B) by an ε-edge]
84
Rules
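  • ( A ), L( (A) ) = L(A), no change
  • A | B, L( A | B ) = L(A) ∪ L(B)

[Figure: NFA(A) and NFA(B) side by side, joined to a common start state and a common terminal state by four ε-edges]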
85
Rules
  • A*, L( A* ) = L(A)*

[Figure: NFA(A) with four ε-edges that allow it to be skipped or repeated]
86
Attention
  • The rules introduced above are effective for those NFAs that have one start state and one terminal state
  • Any NFA can be extended to meet this requirement

[Figure: an NFA extended with a new start state and a new terminal state that are connected to it by ε-edges]
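
The rules can be sketched as functions that build NFA fragments with exactly one start state and one terminal state. The exact placement of the ε-edges in the original figures is not recoverable, so this sketch assumes the usual Thompson-style layout; the class `Fragment` and all function names are assumptions.

```python
import itertools

EPS = ""  # label used for ε-edges in this sketch
_ids = itertools.count()


class Fragment:
    """An NFA with exactly one start state and one terminal state."""

    def __init__(self, start, terminal, edges):
        self.start, self.terminal, self.edges = start, terminal, edges


def symbol(c):
    s, t = next(_ids), next(_ids)
    return Fragment(s, t, [(s, c, t)])


def concat(A, B):
    # A B: connect the terminal state of A to the start state of B
    return Fragment(A.start, B.terminal,
                    A.edges + B.edges + [(A.terminal, EPS, B.start)])


def choice(A, B):
    # A | B: new start and terminal states, four ε-edges
    s, t = next(_ids), next(_ids)
    extra = [(s, EPS, A.start), (s, EPS, B.start),
             (A.terminal, EPS, t), (B.terminal, EPS, t)]
    return Fragment(s, t, A.edges + B.edges + extra)


def star(A):
    # A*: skip A entirely, or run it and optionally repeat it
    s, t = next(_ids), next(_ids)
    extra = [(s, EPS, A.start), (A.terminal, EPS, t),
             (s, EPS, t), (A.terminal, EPS, A.start)]
    return Fragment(s, t, A.edges + extra)


def accepts(nfa, string):
    """Simulate the NFA by tracking the set of states it could be in."""
    def closure(states):
        result, work = set(states), list(states)
        while work:
            u = work.pop()
            for p, label, q in nfa.edges:
                if p == u and label == EPS and q not in result:
                    result.add(q)
                    work.append(q)
        return result

    current = closure({nfa.start})
    for ch in string:
        moved = {q for p, label, q in nfa.edges if p in current and label == ch}
        current = closure(moved)
    return nfa.terminal in current


# (a|b)* a b b, built bottom-up from the rules
abb = concat(concat(concat(star(choice(symbol("a"), symbol("b"))),
                           symbol("a")), symbol("b")), symbol("b"))
```

Because every fragment keeps a single start and a single terminal state, the rules compose without any special cases.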
87
Example
  • (a | b)* a b b (a | b)*

[Figure: the NFA constructed for this regular expression from the rules above, with edges labelled a, b and ε]
89
2.4 Design and Implementation of a Scanner
  • Developing a Scanner Manually
  • A Scanner Generator Lex

90
Developing a Scanner Manually
[Figure: Lexical Definition in Regular Expression --transforming--> NFA --transforming--> DFA --minimizing--> minimized DFA --implement--> Develop a Scanner]
91
Implement Scanner with DFA
  • Implementation of DFA
  • Just checking whether a string is acceptable by the DFA
  • Implementation of a Scanner
  • Not just checking,
  • but recognizing an acceptable string (word) and establishing its internal representation
  • <token-type, semantic information>

92
Defining Lexical Structure of ToyL
  • letter = a|...|z|A|...|Z
  • digit = 0|...|9
  • NZ-digit = 1|...|9
  • Reserved words
  • Reserved = var | if | then | else | while | read | write | int
  • Identifiers
  • identifier = letter (letter | digit)*
  • Constant
  • integer: int = NZ-digit digit* | 0
  • Other symbols: syms = + | - | * | / | > | < | ... | ( | )
  • Lexical structure
  • lex = Reserved | identifier | int | syms

93
DFA for ToyL
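[Figure: DFA for ToyL with the states start, IntNum, ID, Assign and done. start --NZ-digit--> IntNum; IntNum --digit--> IntNum; IntNum --other--> done; start --letter--> ID; ID --letter or digit--> ID; ID --other--> done; start --> Assign --> done; start --(+, -, *, /, >, <, (, ))--> done]
"other" refers to those symbols that are not allowed!
Reserved (key) words are decided by checking the identifier in the reserved (key) word table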
94
Developing a Scanner from DFA
  • Input a sequence of symbols, with a special
    symbol EOF as the end of the sequence
  • Output a sequence of tokens

95
Developing a Scanner from DFA
Token Type
typedef enum {
    IDE, NUM, ASS,          // identifier, number, assignment
    PLUS, MINUS, MUL,       // +, -, *
    DIV, GT, LT, EQ,        // /, >, <, =
    SEMI, LPAREN, RPAREN,   // ;, (, )
    LG, RG, COLON,
    VAR, THEN, ELSE,        // keywords
    INT, WHILE, READ,
    WRITE, IF
} TkType;
96
Developing a Scanner from DFA
Data Structure for TOKEN
struct Token {
    TkType type;
    char val[50];
};
97
Developing a Scanner from DFA
Global Variables
- char str[50]: stores the string that has been read so far
- int len = 0: the length of str
- Token tk: the current token
- Token TokenList[100]: the sequence of tokens
- int total = 0: the number of tokens generated
98
Developing a Scanner from DFA
Predefined Functions
- ReadNext(): reads the current symbol into CurrentChar; if the current symbol is EOF it returns false, else it returns true
- IsKeyword(str): checks whether str is one of the keywords; if str is a keyword it returns the number of the keyword, else it returns -1
99
Developing a Scanner from DFA
        if (not ReadNext()) exit
start:  case CurrentChar of
          1..9:        str[len] = CurrentChar; len++; goto IntNum
          a..z, A..Z:  str[len] = CurrentChar; len++; goto ID
          ':':         goto Assign
          '+':         tk.type = PLUS; if (not ReadNext()) exit; goto done
          ...          (-, *, /, >, <, (, ) and the remaining single-character symbols are handled like '+')
          other:       error()
100
Developing a Scanner from DFA
IntNum: if (not ReadNext()) { if (len != 0) error; exit }
        case CurrentChar of
          0..9:   str[len] = CurrentChar; len++; goto IntNum
          other:  tk.type = NUM; strcpy(tk.val, str); goto done
101
Developing a Scanner from DFA
ID:     if (not ReadNext()) { if (len != 0) error; exit }
        case CurrentChar of
          0..9:        str[len] = CurrentChar; len++; goto ID
          a..z, A..Z:  str[len] = CurrentChar; len++; goto ID
          other:       if (IsKeyword(str) != -1)
                         tk.type = IsKeyword(str)
                       else { tk.type = IDE; strcpy(tk.val, str) }
                       goto done
102
Developing a Scanner from DFA
Assign: if (not ReadNext()) { if (len != 0) error; exit }
        case CurrentChar of
          '=':    tk.type = ASS; if (not ReadNext()) exit; goto done
          other:  error()
103
Developing a Scanner from DFA
done:   TokenList[total] = tk    // add the new token to the token list
        total++
        len = 0                  // start storing a new token string
        strcpy(str, "")          // reset the token string
        goto start               // start scanning a new token
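
The scanner built from the DFA can also be sketched without goto statements. The version below is a structured Python approximation, not the code from the slides: it assumes that the assignment symbol is `:=`, handles only the single-character symbols listed in `SINGLE`, also accepts a lone `0` as a number (following the regular definition of int), and all names are illustrative.

```python
import string

KEYWORDS = ["var", "if", "then", "else", "while", "read", "write", "int"]
SINGLE = {"+": "PLUS", "-": "MINUS", "*": "MUL", "/": "DIV", ">": "GT",
          "<": "LT", "(": "LPAREN", ")": "RPAREN", ";": "SEMI"}
LETTERS = string.ascii_letters
DIGITS = string.digits


def is_keyword(text):
    """Return the number of the keyword in the table, or -1."""
    return KEYWORDS.index(text) if text in KEYWORDS else -1


def scan(source):
    """Turn a string of characters into a list of (type, info) tokens."""
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # blanks carry no meaning
            i += 1
        elif ch in "123456789":               # state IntNum
            j = i + 1
            while j < len(source) and source[j] in DIGITS:
                j += 1
            tokens.append(("NUM", int(source[i:j])))
            i = j
        elif ch == "0":                       # int = NZ-digit digit* | 0
            tokens.append(("NUM", 0))
            i += 1
        elif ch in LETTERS:                   # state ID
            j = i + 1
            while j < len(source) and source[j] in LETTERS + DIGITS:
                j += 1
            word = source[i:j]
            number = is_keyword(word)
            tokens.append(("KEYWORD", number) if number != -1 else ("IDE", word))
            i = j
        elif ch == ":":                       # state Assign (assumes ":=")
            if source[i + 1:i + 2] != "=":
                raise ValueError(f"expected '=' after ':' at position {i}")
            tokens.append(("ASS", ":="))
            i += 2
        elif ch in SINGLE:
            tokens.append((SINGLE[ch], ch))
            i += 1
        else:
            raise ValueError(f"illegal character {ch!r} at position {i}")
    return tokens
```

Each `elif` branch corresponds to one edge leaving the start state, and the inner `while` loops play the role of the self-loops on IntNum and ID.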
104
A Scanner Generator Lex
  • Different versions of Lex
  • flex is distributed as part of the GNU compiler package produced by the Free Software Foundation and is freely available on the Internet

[Figure: a .l file (RE-like definition) is given to flex, which generates lexyy.c containing yylex()]
105
Summary for 2. Scanning
106
Summary
  • About finite automata
  • Definition of DFA
  • (Σ, start state, set of states, set of terminal states, f)
  • Definition of NFA
  • (Σ, set of start states, set of states, set of terminal states, f)
  • Differences between NFA and DFA
  • Number of start states
  • ε-edges
  • Allows more than one edge for a state and one same symbol

107
Summary
  • About finite automata
  • From NFA to DFA
  • main idea
  • solve problem
  • Minimizing DFA
  • main idea
  • solve problem
  • Implementing DFA

108
Summary
  • About regular expressions
  • Definition of regular expression
  • Regular definition
  • From regular expression to NFA
  • main idea
  • solving problem
  • Defining lexical structure with regular expression

109
Summary
  • About scanner
  • Defining lexical structure of the programming
    language with regular expression
  • Transforming regular expression into NFA
  • Transforming NFA into DFA
  • Minimizing DFA
  • Implementing DFA

110
Summary
  • Original Problem
  • Develop a Scanner
  • Read source program in the form of stream of
    characters, and recognize tokens with respect to
    lexical rules of source language
  • General techniques
  • Use RE to define Lexical structure
  • RE -> NFA -> DFA -> minimized DFA -> implement
  • General Problem
  • Use RE/FA to define the structural rules
  • Check whether the input meets the structural rules

111
Summary
  • Application in similar problems
  • Use RE(DFA) to formally describe the structures
  • Strings
  • Security policies
  • Interface specification
  • Check
  • whether a string meets the structural rules
  • Whether certain execution meets security
    policies
  • Properties checking

112
Any Questions?
113
Reading Assignment
  • Topic: How to develop a parser?
  • Objectives
  • Get to know
  • What is a parser? (input, output, functions)
  • The syntactic structure of C programs?
  • Different Parsing techniques and their main idea?
  • References
  • Optional textbooks
  • Hand in a report either in English or in Chinese,
    and one group will be asked to give a
    presentation at the beginning of next class
  • Tips
  • Collect more information from textbooks and
    internet
  • Establish your own opinion