Title: The scanning process
1. The scanning process
- Goal: automate the process
- Idea:
  - Start with an RE
  - Build a DFA
- How?
  - We can build a non-deterministic finite automaton (Thompson's construction)
  - Convert that to a deterministic one (Subset construction)
  - Minimize the DFA (Hopcroft's algorithm)
  - Implement it
- Existing scanner generator: flex
2. The scanning process: step 1
- Let's build a mini-scanner that recognizes exactly those strings of a's and b's that end in ab
- Step 1: Come up with a Regular Expression
  - (a|b)*ab
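As a quick sanity check of the expression, here is a minimal sketch using Python's standard `re` module. The helper name `ends_in_ab` is made up for this illustration. Note that `re` is a backtracking matcher, not the DFA built in the following steps; it only serves to confirm which strings the expression describes.

```python
import re

# Strings over {a, b} that end in "ab".
ENDS_IN_AB = re.compile(r"(a|b)*ab")

def ends_in_ab(s: str) -> bool:
    # fullmatch requires the whole string to match, not just a prefix.
    return ENDS_IN_AB.fullmatch(s) is not None

for s in ["ab", "aab", "abab", "ba", "abc", ""]:
    print(repr(s), ends_in_ab(s))
```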
3. The scanning process: step 2
- Step 2: Use Thompson's construction to create an NFA for that expression
- We want to be able to automate the process
- Thompson's construction gives a systematic way to create an NFA from an RE.
- It builds the NFA in a bottom-up manner.
- At any time during construction:
  - there is only one final state
  - no transitions leave the final state
  - components are linked together using ε-productions.
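The bottom-up construction can be sketched as follows. This is an illustrative sketch, not the slides' own code: the class name `Thompson`, the use of `None` as the ε label, and the state numbering are all assumptions. Each method returns a fragment as a `(start, final)` pair, so every fragment has exactly one final state with no outgoing transitions.

```python
from collections import defaultdict
from itertools import count

EPS = None  # label used in this sketch for ε-productions (an assumption)

class Thompson:
    def __init__(self):
        self.edges = defaultdict(set)  # (state, label) -> set of target states
        self._ids = count(1)           # fresh state numbers 1, 2, 3, ...

    def _new(self):
        return next(self._ids)

    def symbol(self, c):
        # Fragment for a single character: start --c--> final.
        s, f = self._new(), self._new()
        self.edges[(s, c)].add(f)
        return s, f

    def concat(self, x, y):
        # Link x's final state to y's start state with an ε-production.
        self.edges[(x[1], EPS)].add(y[0])
        return x[0], y[1]

    def union(self, x, y):
        # New start branches to both fragments; both finals join a new final.
        s, f = self._new(), self._new()
        self.edges[(s, EPS)] |= {x[0], y[0]}
        self.edges[(x[1], EPS)].add(f)
        self.edges[(y[1], EPS)].add(f)
        return s, f

    def star(self, x):
        # New start may skip x entirely; x's final may loop back or exit.
        s, f = self._new(), self._new()
        self.edges[(s, EPS)] |= {x[0], f}
        self.edges[(x[1], EPS)] |= {x[0], f}
        return s, f

# Build (a|b)*ab from the inside out.
t = Thompson()
a_or_b = t.union(t.symbol("a"), t.symbol("b"))
start, final = t.concat(t.concat(t.star(a_or_b), t.symbol("a")), t.symbol("b"))

states = {s for (s, _) in t.edges} | {d for ds in t.edges.values() for d in ds}
print(len(states), "states; start =", start, "final =", final)
```

Each operator adds at most two states, which is why the NFA for this expression ends up with twelve of them.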
4. The scanning process: step 2
- Step 2: Use Thompson's construction to create an NFA for that expression
[Figure: Thompson NFA fragments for a, b, a|b, and (a|b)*, linked with ε-productions]
5. The scanning process: step 2
- Step 2: Use Thompson's construction to create an NFA for that expression
[Figure: the complete Thompson NFA for (a|b)*ab]
6. The scanning process: step 3
- Step 3: Use subset construction to convert the NFA to a DFA
- Observation:
  - Two states qi and qk linked together with an ε-production in the NFA should be part of the same state in the DFA, because the machine goes from qi to qk without consuming input.
- The ε-closure() function takes a state q and returns all the states that can be reached from q on ε-productions only (including q itself).
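A minimal sketch of ε-closure as a graph traversal. The NFA figure did not survive in this transcript, so the ε-productions below are an assumption: one numbering of the Thompson NFA for (a|b)*ab that is consistent with the state sets listed on the later slides.

```python
# ε-productions of an NFA for (a|b)*ab (assumed numbering, see lead-in).
EPS_EDGES = {
    1: {2, 8},
    2: {3, 4},
    5: {7},
    6: {7},
    7: {2, 8},
    8: {9},
    10: {11},
}

def epsilon_closure(q):
    """Return every state reachable from q using ε-productions only."""
    closure = {q}          # q reaches itself without consuming input
    stack = [q]
    while stack:
        state = stack.pop()
        for nxt in EPS_EDGES.get(state, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

print(sorted(epsilon_closure(1)))  # [1, 2, 3, 4, 8, 9]
```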
7. The scanning process: step 3
- Step 3: Use subset construction to convert the NFA to a DFA
- Observation:
  - If, on some input a, the NFA can go to any one of k states, then those k states should be represented by a single state in the DFA.
- The δ() function takes as input a state q and a character x and returns all states that we can go to from q when reading a single x.
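A small sketch of δ, plus a helper that applies it to a whole set of NFA states. The transition table is an assumed numbering of the NFA for (a|b)*ab, and `delta_set` is a name introduced here for illustration.

```python
# Character transitions of an assumed NFA for (a|b)*ab:
# (state, character) -> set of target states.
DELTA = {
    (3, "a"): {5},
    (4, "b"): {6},
    (9, "a"): {10},
    (11, "b"): {12},
}

def delta(q, x):
    """States reachable from q by reading the single character x."""
    return set(DELTA.get((q, x), ()))

def delta_set(states, x):
    """Union of delta(q, x) over every q in a set of NFA states."""
    result = set()
    for q in states:
        result |= delta(q, x)
    return result

# From the set {1, 2, 3, 4, 8, 9}, reading 'a' can lead to two NFA states,
# which the DFA will represent as part of a single state.
print(sorted(delta_set({1, 2, 3, 4, 8, 9}, "a")))  # [5, 10]
```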
8. The scanning process: step 3
- Step 3: Use subset construction to convert the NFA to a DFA
- The start state Q0 of the DFA is the ε-closure of the start state q0 of the NFA
- Compute ε-closure(δ(Q0, x)) for each valid input character x. This will generate new states.
- Systematically compute ε-closure(δ(Qi, x)) until no new states can be created.
- The final states of the DFA are those that contain final states of the NFA.
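The steps above can be sketched as a worklist loop. This is a sketch under stated assumptions: the NFA tables use an assumed state numbering for (a|b)*ab (the original figure is not recoverable from this transcript), and the names `closure`, `move`, and `subset_construction` are introduced here for illustration.

```python
from collections import deque

# Assumed NFA for (a|b)*ab: ε-productions and character transitions.
EPS = {1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7}, 7: {2, 8}, 8: {9}, 10: {11}}
DELTA = {(3, "a"): {5}, (4, "b"): {6}, (9, "a"): {10}, (11, "b"): {12}}
NFA_START, NFA_FINAL, ALPHABET = 1, {12}, "ab"

def closure(states):
    """ε-closure of a set of NFA states."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in EPS.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def move(states, x):
    """All NFA states reachable from the set by reading one x."""
    return {t for q in states for t in DELTA.get((q, x), ())}

def subset_construction():
    start = closure({NFA_START})
    names = {start: "Q0"}      # DFA state (set of NFA states) -> name
    table = {}                 # (DFA state name, character) -> DFA state name
    work = deque([start])
    while work:                # stops when no new DFA states appear
        current = work.popleft()
        for x in ALPHABET:
            target = closure(move(current, x))
            if not target:
                continue       # no transition on x from this DFA state
            if target not in names:
                names[target] = f"Q{len(names)}"
                work.append(target)
            table[(names[current], x)] = names[target]
    # A DFA state is final if it contains a final NFA state.
    finals = {name for s, name in names.items() if s & NFA_FINAL}
    return names, table, finals

names, table, finals = subset_construction()
for (src, x), dst in sorted(table.items()):
    print(f"delta({src}, {x}) = {dst}")
print("final states:", sorted(finals))
```

Using `frozenset` for DFA states lets each set of NFA states serve directly as a dictionary key, so detecting an already-seen state is a simple lookup.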
9. The scanning process: step 3
- Step 3: Use subset construction to convert the NFA to a DFA
ε-closure(1) = {1, 2, 3, 4, 8, 9}
10. The scanning process: step 3
Q0 = {1,2,3,4,8,9}
δ(Q0, a) = {5,7,8,9,2,3,4,10,11} = Q1
δ(Q0, b) = {6,7,8,9,2,3,4} = Q2
δ(Q1, a) = Q1
δ(Q1, b) = {6,7,8,9,2,3,4,12} = Q3
δ(Q2, a) = Q1
δ(Q2, b) = Q2
δ(Q3, a) = Q1
δ(Q3, b) = Q2
12. The scanning process: step 4
- Step 4: Use Hopcroft's algorithm to minimize the DFA
States Q0 and Q2 behave the same way, so they
can be merged. Note that even though Q3 also
behaves the same way, it cannot be merged with Q0
or Q2 because Q3 is a final state while Q0 and Q2
are not.
δ(Q0, a) = Q1
δ(Q0, b) = Q2
δ(Q2, a) = Q1
δ(Q2, b) = Q2
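[Figure: minimized DFA with states 0, 1, and 3, where state 0 merges Q0 and Q2. Transitions: 0 on a to 1, 0 on b to 0, 1 on a to 1, 1 on b to 3, 3 on a to 1, 3 on b to 0.]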
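The merge of Q0 and Q2 can be checked with a compact sketch of Hopcroft-style partition refinement, applied to the DFA from the subset construction. The function name `hopcroft` and the data layout are assumptions of this sketch, and it expects a complete DFA, meaning every state has a transition on every character.

```python
# DFA produced by the subset construction for (a|b)*ab.
STATES = {"Q0", "Q1", "Q2", "Q3"}
FINALS = {"Q3"}
ALPHABET = "ab"
DELTA = {
    ("Q0", "a"): "Q1", ("Q0", "b"): "Q2",
    ("Q1", "a"): "Q1", ("Q1", "b"): "Q3",
    ("Q2", "a"): "Q1", ("Q2", "b"): "Q2",
    ("Q3", "a"): "Q1", ("Q3", "b"): "Q2",
}

def hopcroft(states, finals, alphabet, delta):
    """Partition the states of a complete DFA into equivalence classes."""
    finals = frozenset(finals)
    others = frozenset(states) - finals
    # Start by separating final from non-final states.
    partition = {block for block in (finals, others) if block}
    worklist = set(partition)
    while worklist:
        splitter = worklist.pop()
        for x in alphabet:
            # States whose x-transition leads into the splitter block.
            sources = {q for q in states if delta[(q, x)] in splitter}
            for block in list(partition):
                inside, outside = block & sources, block - sources
                if not inside or not outside:
                    continue  # the splitter does not divide this block
                partition.remove(block)
                partition |= {inside, outside}
                if block in worklist:
                    worklist.remove(block)
                    worklist |= {inside, outside}
                else:
                    # Only the smaller half needs to be processed later.
                    worklist.add(min(inside, outside, key=len))
    return partition

blocks = hopcroft(STATES, FINALS, ALPHABET, DELTA)
print(sorted(sorted(block) for block in blocks))
# [['Q0', 'Q2'], ['Q1'], ['Q3']]
```

Q3 starts in its own block because it is final, so it is never merged with Q0 or Q2 even though its transitions are identical to theirs.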
13. In practice
- flex is a scanner generator that takes an RE specification and follows the described process to generate a DFA.
- The user additionally specifies:
  - actions to be performed whenever a valid string has been recognized
    - e.g. insert identifier in symbol table
  - error messages to be generated when the input string is invalid.
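To make the idea of a table-driven scanner with user actions concrete, here is a hand-written Python sketch built on the minimized DFA for (a|b)*ab. It is not flex output and does not use flex's specification syntax; the names `scan`, `record`, and `ScanError` are hypothetical.

```python
# Minimized DFA for (a|b)*ab: state 0 merges Q0 and Q2, state 3 is final.
TABLE = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START, FINALS = 0, {3}

class ScanError(Exception):
    """Raised when the input is not accepted by the DFA."""

def scan(text, on_match):
    """Run the DFA over text and call the user action if it is accepted."""
    state = START
    for pos, ch in enumerate(text):
        if (state, ch) not in TABLE:
            raise ScanError(f"invalid character {ch!r} at position {pos}")
        state = TABLE[(state, ch)]
    if state not in FINALS:
        raise ScanError(f"input {text!r} does not end in 'ab'")
    return on_match(text)

symbol_table = set()

def record(lexeme):
    # Stand-in for an action such as "insert identifier in symbol table".
    symbol_table.add(lexeme)
    return lexeme

print(scan("abab", record), symbol_table)
```

Calling `scan("abc", record)` raises `ScanError` for the invalid character, and `scan("ba", record)` raises it because the DFA stops in a non-final state.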
14. In practice
- Errors that are typically detected during scanning include:
  - Unterminated strings
  - Unterminated comments
  - Invalid characters