The scanning process

Slides: 15
Provided by: vdo66
Transcript and Presenter's Notes

1
The scanning process
  • Goal: automate the process
  • Idea:
    • Start with an RE
    • Build a DFA
  • How?
    • We can build a non-deterministic finite automaton
      (Thompson's construction)
    • Convert that to a deterministic one (subset
      construction)
    • Minimize the DFA (Hopcroft's algorithm)
    • Implement it
  • Existing scanner generator: flex

2
The scanning process: step 1
  • Let's build a mini-scanner that recognizes
    exactly those strings of a's and b's that end in ab
  • Step 1: Come up with a regular expression
    • (a|b)*ab
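
Written out with its alternation and Kleene star, the expression for this language is (a|b)*ab. As a quick sanity check before building any automaton, the expression can be tried with Python's standard `re` module. This is only a sketch: the helper name `accepts` and the sample strings are illustrative and do not come from the slides.

```python
import re

# Step 1's regular expression: any mix of a's and b's, followed by "ab".
ENDS_IN_AB = re.compile(r"(a|b)*ab")

def accepts(text: str) -> bool:
    """Return True when the whole string matches (a|b)*ab."""
    return ENDS_IN_AB.fullmatch(text) is not None

for sample in ["ab", "aab", "babab", "", "a", "ba", "abc"]:
    print(repr(sample), accepts(sample))
```

`fullmatch` is used rather than `search` because the scanner must accept exactly the strings in the language, not strings that merely contain a match.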

3
The scanning process: step 2
  • Step 2: Use Thompson's construction to create an
    NFA for that expression
  • We want to be able to automate the process
  • Thompson's construction gives a systematic way to
    create an NFA from an RE.
  • It builds the NFA in a bottom-up manner.
  • At any time during construction:
    • there is only one final state
    • no transitions leave the final state
    • components are linked together using
      ε-transitions.
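
The bottom-up construction can be sketched in Python as follows. The class `NFA`, its method names, and the state numbering are assumptions made for illustration, so the numbers will not match those in the slides' figures. Each builder returns a fragment `(start, final)` with a single final state that has no outgoing transitions, and fragments are glued together with ε-transitions only.

```python
from itertools import count

EPS = None  # label used for ε-transitions

class NFA:
    def __init__(self):
        self.edges = {}            # state -> list of (label, target)
        self._ids = count(1)

    def new_state(self):
        state = next(self._ids)
        self.edges[state] = []
        return state

    def link(self, src, label, dst):
        self.edges[src].append((label, dst))

    def symbol(self, ch):
        # Fragment for a single character: start --ch--> final.
        s, f = self.new_state(), self.new_state()
        self.link(s, ch, f)
        return s, f

    def concat(self, left, right):
        # left's final state stops being final once it is linked onward.
        self.link(left[1], EPS, right[0])
        return left[0], right[1]

    def union(self, left, right):
        s, f = self.new_state(), self.new_state()
        for frag in (left, right):
            self.link(s, EPS, frag[0])
            self.link(frag[1], EPS, f)
        return s, f

    def star(self, inner):
        s, f = self.new_state(), self.new_state()
        self.link(s, EPS, inner[0])         # enter the loop
        self.link(s, EPS, f)                # or skip it entirely
        self.link(inner[1], EPS, inner[0])  # repeat
        self.link(inner[1], EPS, f)         # or leave
        return s, f

def accepts(nfa, start, final, text):
    """Simulate the NFA directly by tracking the set of active states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            for label, dst in nfa.edges[stack.pop()]:
                if label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in text:
        step = {dst for s in current for label, dst in nfa.edges[s] if label == ch}
        current = closure(step)
    return final in current

# Build (a|b)*ab from the inside out.
nfa = NFA()
a_or_b = nfa.union(nfa.symbol("a"), nfa.symbol("b"))
start, final = nfa.concat(nfa.concat(nfa.star(a_or_b), nfa.symbol("a")), nfa.symbol("b"))
print(len(nfa.edges), "states; final state has", len(nfa.edges[final]), "outgoing edges")
```

Simulating the NFA directly, as `accepts` does, is mainly useful for checking the construction; the following steps turn the NFA into a DFA so that no sets of states need to be tracked at scan time.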

4
The scanning process: step 2
  • Step 2: Use Thompson's construction to create an
    NFA for that expression

[Figure: NFA fragments built by Thompson's construction for a, b, a|b, and (a|b)*, joined by ε-transitions]

5
The scanning process: step 2
  • Step 2: Use Thompson's construction to create an
    NFA for that expression

[Figure: the complete NFA for (a|b)*ab, with transitions labeled a, b, and ε]

6
The scanning process: step 3
  • Step 3: Use subset construction to convert the
    NFA to a DFA
  • Observation:
    • Two states qi and qk linked by an
      ε-transition in the NFA should belong to the same
      state in the DFA, because the machine can go from
      qi to qk without consuming input.
  • The ε-closure() function takes a state q and
    returns all the states that can be reached from q
    on ε-transitions only (including q itself).
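
The closure function is a plain graph reachability search restricted to ε-edges. A minimal sketch, assuming the ε-edges are stored as a dictionary from a state to its ε-successors; the four-state example graph is hypothetical and is not the NFA from the slides.

```python
def epsilon_closure(state, eps_edges):
    """Return every state reachable from `state` using ε-transitions only."""
    closure = {state}          # a state always reaches itself
    worklist = [state]
    while worklist:
        current = worklist.pop()
        for nxt in eps_edges.get(current, ()):
            if nxt not in closure:
                closure.add(nxt)
                worklist.append(nxt)
    return closure

# Hypothetical ε-edges: 0 -> 1, 0 -> 2, 1 -> 3, 3 -> 0 (a cycle is harmless).
eps_edges = {0: [1, 2], 1: [3], 3: [0]}
print(sorted(epsilon_closure(0, eps_edges)))
print(sorted(epsilon_closure(2, eps_edges)))
```

The `closure` set doubles as the visited set, which is what keeps the search from looping on ε-cycles.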

7
The scanning process: step 3
  • Step 3: Use subset construction to convert the
    NFA to a DFA
  • Observation:
    • If, on some input a, the NFA can go to any one of
      k states, then those k states should be
      represented by a single state in the DFA.
  • The δ() function takes as input a state q and a
    character x and returns all states that we can go
    to from q when reading a single x.

8
The scanning process: step 3
  • Step 3: Use subset construction to convert the
    NFA to a DFA
  • The start state Q0 of the DFA is the ε-closure of
    the start state q0 of the NFA
  • Compute ε-closure(δ(Q0, x)) for each valid input
    character x. This will generate new states.
  • Systematically compute ε-closure(δ(Qi, x)) until
    no new states can be created.
  • The final states of the DFA are those that
    contain final states of the NFA.

9
The scanning process: step 3
  • Step 3: Use subset construction to convert the
    NFA to a DFA

ε-closure(1) = {1, 2, 3, 4, 8, 9}
10
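The scanning process: step 3
Q0 = {1, 2, 3, 4, 8, 9}
ε-closure(δ(Q0, a)) = {5, 7, 8, 9, 2, 3, 4, 10, 11} = Q1
ε-closure(δ(Q0, b)) = {6, 7, 8, 9, 2, 3, 4} = Q2
ε-closure(δ(Q1, a)) = Q1
ε-closure(δ(Q1, b)) = {6, 7, 8, 9, 2, 3, 4, 12} = Q3
ε-closure(δ(Q2, a)) = Q1
ε-closure(δ(Q2, b)) = Q2
ε-closure(δ(Q3, a)) = Q1
ε-closure(δ(Q3, b)) = Q2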
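
The state sets listed above can be reproduced mechanically. The sketch below runs the subset construction on a 12-state NFA for (a|b)*ab. The edge lists are an assumption: they were reconstructed to be consistent with the sets shown on the slides, since the figure itself did not survive in the transcript. The function and variable names are illustrative as well.

```python
from collections import deque

# Assumed NFA for (a|b)*ab, reconstructed from the slide's state sets.
EPS_EDGES = {1: [2, 8], 2: [3, 4], 5: [7], 6: [7], 7: [2, 8], 8: [9], 10: [11]}
SYM_EDGES = {(3, "a"): [5], (4, "b"): [6], (9, "a"): [10], (11, "b"): [12]}
NFA_START, NFA_FINAL, ALPHABET = 1, 12, "ab"

def eps_closure(states):
    """All states reachable from `states` on ε-transitions only."""
    closure, work = set(states), list(states)
    while work:
        for nxt in EPS_EDGES.get(work.pop(), ()):
            if nxt not in closure:
                closure.add(nxt)
                work.append(nxt)
    return frozenset(closure)

def move(states, ch):
    """All states reachable from `states` by reading the single character ch."""
    return {t for s in states for t in SYM_EDGES.get((s, ch), ())}

def subset_construction():
    start = eps_closure({NFA_START})
    names = {start: "Q0"}          # NFA state set -> DFA state name
    table = {}                     # (DFA state, char) -> DFA state
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ch in ALPHABET:
            target = eps_closure(move(current, ch))
            if not target:
                continue           # no transition on this character
            if target not in names:
                names[target] = f"Q{len(names)}"
                queue.append(target)
            table[(names[current], ch)] = names[target]
    finals = {name for subset, name in names.items() if NFA_FINAL in subset}
    return names, table, finals

names, table, finals = subset_construction()
for subset, name in names.items():
    print(name, sorted(subset))
print(table)
print("final states:", finals)
```

Using `frozenset` for the state sets lets them serve as dictionary keys, which is how the loop detects that a set has already been given a DFA state name.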
11
The scanning process: step 3
12
The scanning process: step 4
  • Step 4: Use Hopcroft's algorithm to minimize the
    DFA

States Q0 and Q2 behave the same way, so they
can be merged. Note that even though Q3 also
behaves the same way, it cannot be merged with Q0
or Q2 because Q3 is a final state while Q0 and Q2
are not.
δ(Q0, a) = Q1   δ(Q0, b) = Q2
δ(Q2, a) = Q1   δ(Q2, b) = Q2

[Figure: minimized DFA with states 0, 1, and 3; edges labeled a and b]

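
The merge of Q0 and Q2 can be checked with a small partition-refinement loop. Note that this sketch is not Hopcroft's worklist algorithm: it is the simpler, slower refinement scheme (closer to Moore's algorithm), which yields the same partition. It assumes a complete DFA, that is, a transition for every state and character. The transition table is the one derived in step 3, and the names `DFA` and `minimize` are illustrative.

```python
# DFA from step 3: (state, char) -> state.
DFA = {
    ("Q0", "a"): "Q1", ("Q0", "b"): "Q2",
    ("Q1", "a"): "Q1", ("Q1", "b"): "Q3",
    ("Q2", "a"): "Q1", ("Q2", "b"): "Q2",
    ("Q3", "a"): "Q1", ("Q3", "b"): "Q2",
}
STATES = ["Q0", "Q1", "Q2", "Q3"]
FINALS = {"Q3"}
ALPHABET = "ab"

def minimize(states, finals, alphabet, delta):
    """Group equivalent states; returns a list of sets of state names."""
    # Coarsest possible split: final states versus non-final states.
    block_of = {s: int(s in finals) for s in states}
    while True:
        # A state's signature: its own block plus the block reached on each char.
        signature = {
            s: (block_of[s],) + tuple(block_of[delta[(s, ch)]] for ch in alphabet)
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        # Refinement only ever splits blocks, so an unchanged count means done.
        if len(set(refined.values())) == len(set(block_of.values())):
            break
        block_of = refined
    groups = {}
    for s in states:
        groups.setdefault(refined[s], set()).add(s)
    return list(groups.values())

for group in minimize(STATES, FINALS, ALPHABET, DFA):
    print(sorted(group))
```

Q1 ends up alone because reading b takes it to the final state Q3, while Q0 and Q2 both stay among non-final states on every input.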
13
In practice
  • flex is a scanner generator that takes an RE
    specification and follows the described process
    to generate a DFA.
  • The user additionally specifies:
    • actions to be performed whenever a valid string
      has been recognized
      • e.g. insert an identifier in the symbol table
    • error messages to be generated when the input
      string is invalid.
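
The pairing of patterns with actions can be imitated in a few lines of Python. This is not flex syntax and not how flex is implemented: it is a sketch that tries each regular expression at the current position instead of running a generated DFA. The rule set, the token names, and the `symbol_table` layout are all made up for illustration.

```python
import re

symbol_table = {}   # identifier -> index, filled in by an action

def on_identifier(lexeme):
    # Action: insert the identifier in the symbol table on first sight.
    symbol_table.setdefault(lexeme, len(symbol_table))
    return ("IDENT", lexeme)

def on_number(lexeme):
    return ("NUMBER", int(lexeme))

# Pattern/action pairs, loosely in the spirit of a scanner specification.
RULES = [
    (re.compile(r"[A-Za-z_][A-Za-z0-9_]*"), on_identifier),
    (re.compile(r"[0-9]+"), on_number),
    (re.compile(r"\s+"), lambda lexeme: None),   # skip whitespace
]

def scan(text):
    tokens, pos = [], 0
    while pos < len(text):
        # Pick the longest match among all rules; earlier rules win ties.
        best, best_action = None, None
        for pattern, action in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best.end()):
                best, best_action = m, action
        if best is None:
            # Error message for input that no rule recognizes.
            raise SyntaxError(f"invalid input at position {pos}: {text[pos]!r}")
        token = best_action(best.group())
        if token is not None:
            tokens.append(token)
        pos = best.end()
    return tokens

print(scan("x 42 y x"))
print(symbol_table)
```

Preferring the longest match and breaking ties by rule order mirrors the disambiguation policy commonly used by scanner generators.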

14
In practice
  • Errors that are typically detected during
    scanning include:
    • Unterminated strings
    • Unterminated comments
    • Invalid characters
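
To make the three error classes concrete, here is a hand-written sketch for a tiny, made-up token language: identifiers, double-quoted strings that must close on the same line, and `/* ... */` comments. The token language, the function name `scan_with_errors`, and the recovery policy (skip to the end of the line or of the input) are assumptions, not something the slides prescribe.

```python
def scan_with_errors(source):
    """Return (tokens, errors); each error is (position, message)."""
    tokens, errors = [], []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(("IDENT", source[i:j]))
            i = j
        elif ch == '"':
            close = source.find('"', i + 1)
            newline = source.find("\n", i + 1)
            if close == -1 or (newline != -1 and newline < close):
                errors.append((i, "unterminated string"))
                i = n if newline == -1 else newline   # resume on the next line
            else:
                tokens.append(("STRING", source[i + 1:close]))
                i = close + 1
        elif source.startswith("/*", i):
            close = source.find("*/", i + 2)
            if close == -1:
                errors.append((i, "unterminated comment"))
                i = n                                 # nothing left to scan
            else:
                i = close + 2
        else:
            errors.append((i, f"invalid character {ch!r}"))
            i += 1                                    # skip it and keep going
    return tokens, errors

print(scan_with_errors('x "hi" /* note */ y'))
print(scan_with_errors('a "oops'))
print(scan_with_errors('a /* never closed'))
print(scan_with_errors('a # b'))
```

Collecting errors in a list instead of stopping at the first one lets a single run report every lexical problem in the input.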