Title: Regular Expressions Finite State Automaton
1. Regular Expressions, Finite State Automaton
2. Regular expressions
- Terminology on formal languages
- alphabet: a finite set of symbols
- string: a finite sequence of alphabet symbols
- language: a (finite or infinite) set of strings
- Regular operations on languages
- Union: R ∪ S = { x | x ∈ R or x ∈ S }
- Concatenation: RS = { xy | x ∈ R and y ∈ S }
- Kleene closure: R* = R concatenated with itself 0 or more times
= {ε} ∪ R ∪ RR ∪ RRR ∪ ...
= strings obtained by concatenating a finite number of strings from the set R.
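The three regular operations can be tried out on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings. The function names are illustrative, and because R* is infinite in general, `kleene_upto` only builds the part obtained from at most k concatenations.

```python
from itertools import product

def union(R, S):
    """R ∪ S: strings that are in R or in S."""
    return set(R) | set(S)

def concat(R, S):
    """RS: every x in R followed by every y in S."""
    return {x + y for x, y in product(R, S)}

def kleene_upto(R, k):
    """Finite approximation of R*: concatenations of at most k strings from R."""
    result = {""}   # concatenating zero strings gives the empty string ε
    layer = {""}
    for _ in range(k):
        layer = concat(layer, R)   # one more factor from R
        result |= layer
    return result

R = {"a", "b"}
S = {"c"}
print(sorted(union(R, S)))        # ['a', 'b', 'c']
print(sorted(concat(R, S)))       # ['ac', 'bc']
print(sorted(kleene_upto(R, 2)))  # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```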
3. Regular Expressions
- A pattern notation for describing certain kinds of sets of strings
- Given an alphabet Σ:
- ε is a regular exp. (denotes the language {ε})
- for each a ∈ Σ, a is a regular exp. (denotes the language {a})
- if r and s are regular exps. denoting L(r) and L(s) respectively, then so are
- (r) | (s) (denotes the language L(r) ∪ L(s))
- (r)(s) (denotes the language L(r)L(s))
- (r)* (denotes the language L(r)*)
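The inductive definition above maps directly onto a small syntax tree. The sketch below is one possible encoding, not a standard library API: the class names (`Eps`, `Sym`, `Alt`, `Cat`, `Star`) and the helper `ends` are made up for illustration. `ends(e, text, i)` returns every position j such that `text[i:j]` is in L(e).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Eps:            # ε
    pass

@dataclass(frozen=True)
class Sym:            # a single alphabet symbol a
    ch: str

@dataclass(frozen=True)
class Alt:            # (r) | (s)
    r: object
    s: object

@dataclass(frozen=True)
class Cat:            # (r)(s)
    r: object
    s: object

@dataclass(frozen=True)
class Star:           # (r)*
    r: object

def ends(e, text, i):
    """All positions j such that text[i:j] belongs to L(e)."""
    if isinstance(e, Eps):
        return {i}
    if isinstance(e, Sym):
        return {i + 1} if text[i:i + 1] == e.ch else set()
    if isinstance(e, Alt):
        return ends(e.r, text, i) | ends(e.s, text, i)
    if isinstance(e, Cat):
        return {k for j in ends(e.r, text, i) for k in ends(e.s, text, j)}
    if isinstance(e, Star):
        seen = {i}            # zero repetitions
        frontier = {i}
        while frontier:       # add one more repetition until nothing new appears
            frontier = {k for j in frontier for k in ends(e.r, text, j)} - seen
            seen |= frontier
        return seen
    raise TypeError(f"not a regular expression node: {e!r}")

def matches(e, text):
    return len(text) in ends(e, text, 0)

example = Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("c"))   # (a|b)*c
print(matches(example, "abac"))   # True
print(matches(example, "abca"))   # False
```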
4. Common Extensions to r.e. Notation
- One or more repetitions of r: r+
- A range of characters: [a-zA-Z], [0-9]
- An optional expression: r?
- Any single character: .
- Giving names to regular expressions, e.g.
- letter = [a-zA-Z_]
- digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- ident = letter ( letter | digit )*
- Integer_const = digit+
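Named sub-expressions like these can be mimicked in Python's `re` module by composing pattern strings. This is a sketch only: the variable names are illustrative, and `(?:...)` is Python's non-capturing group, used here in place of plain parentheses.

```python
import re

LETTER = r"[a-zA-Z_]"
DIGIT = r"[0-9]"
IDENT = rf"{LETTER}(?:{LETTER}|{DIGIT})*"   # letter ( letter | digit )*
INTEGER_CONST = rf"{DIGIT}+"                # digit+

def is_ident(s):
    # fullmatch requires the whole string to match, as a recognizer would
    return re.fullmatch(IDENT, s) is not None

def is_integer_const(s):
    return re.fullmatch(INTEGER_CONST, s) is not None

print(is_ident("_tmp1"))         # True
print(is_ident("1abc"))          # False
print(is_integer_const("2024"))  # True
```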
5. Examples of Regular Expressions
- Identifiers
- Letter → (a|b|c| ... |z|A|B|C| ... |Z)
- Digit → (0|1|2| ... |9)
- Identifier → Letter ( Letter | Digit )*
- Numbers
- [0-9]   [0-9]+   [0-9]*
- [1-9][0-9]*   ([1-9][0-9]*)|0
- -?[0-9]+
- [0-9]*\.[0-9]+   ([0-9]+)|([0-9]*\.[0-9]+)
- [eE][-+]?[0-9]+   ([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?
- -?( ([0-9]+) | ([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)? )
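The last number pattern can be checked with Python's `re` module. The sketch below assumes one plausible reading of the slide: character classes in square brackets, `+` and `*` as repetition, and an exponent sign class `[-+]`. Under the usual precedence (concatenation binds tighter than `|`), the optional exponent attaches only to the decimal alternative, so a string such as `1e5` is not accepted.

```python
import re

# -?( ([0-9]+) | ([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)? )
NUMBER = r"-?(?:[0-9]+|[0-9]*\.[0-9]+(?:[eE][-+]?[0-9]+)?)"

def is_number(s):
    return re.fullmatch(NUMBER, s) is not None

for s in ["-12", ".5", "1.5E-3", "1e5", "1."]:
    print(s, is_number(s))
# -12 True
# .5 True
# 1.5E-3 True
# 1e5 False
# 1. False
```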
6. Examples of Regular Expressions
- Numbers
- Integer → (+|-|ε) (0 | (1|2|3| ... |9)(Digit*) )
- Decimal → Integer . Digit*
- Real → ( Integer | Decimal ) E (+|-|ε) Digit*
- Complex → ( Real , Real )
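These definitions build on one another, which can be sketched by composing pattern strings in Python. Assumptions in this sketch: the sign option is read as an optional `+` or `-`, `Digit` is `[0-9]`, and the spaces in `( Real , Real )` are layout only, so `COMPLEX` allows no whitespace. The variable names are illustrative.

```python
import re

DIGIT = r"[0-9]"
INTEGER = rf"[+-]?(?:0|[1-9]{DIGIT}*)"               # optional sign, no leading zeros
DECIMAL = rf"{INTEGER}\.{DIGIT}*"                    # Integer . Digit*
REAL = rf"(?:{DECIMAL}|{INTEGER})E[+-]?{DIGIT}*"     # ( Integer | Decimal ) E sign Digit*
COMPLEX = rf"\({REAL},{REAL}\)"                      # ( Real , Real )

def full(pattern, s):
    return re.fullmatch(pattern, s) is not None

print(full(INTEGER, "-42"))           # True
print(full(INTEGER, "007"))           # False
print(full(REAL, "3.14E-2"))          # True
print(full(COMPLEX, "(1E0,2.5E-3)"))  # True
```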
7. Exercise of Regular Expressions
- Identifiers
- [a-z]   [a-zA-Z]   [a-zA-Z0-9]
- [a-zA-Z][a-zA-Z0-9]*
- Strings
- "this is a string"
- \".*\" <- wrong!!! why?
- \"[^"]*\"
- ??? ??
- 0? 1? ???? ??? ???...
- 0?? ???? ??? 001
- 0?? ???? 0?? ??? ??? 0010
- 0? 1? ??? ??? ??? ______
- 0? ?? ?? ??? ?? ??? ______
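For the string exercise on this slide, the difference between the two quoted-string patterns shows up as soon as a line contains more than one string. A small sketch using Python's `re` module, assuming the patterns are "a quote, any characters, a quote" and "a quote, any non-quote characters, a quote"; the sample text is made up:

```python
import re

text = 'x = "one" + "two"'

# '.' also matches a quote, and '*' takes as much as it can,
# so the match runs from the first quote to the last one.
greedy = re.findall(r'".*"', text)

# Excluding the quote character from the body stops at the first closing quote.
tight = re.findall(r'"[^"]*"', text)

print(greedy)  # ['"one" + "two"']
print(tight)   # ['"one"', '"two"']
```

A scanner that always takes the longest match would show the same behaviour with the first pattern.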
8. Recognizing Tokens: Finite Automata
- A finite automaton is a 5-tuple (Q, Σ, T, q0, F), where
- Σ is a finite alphabet
- Q is a finite set of states
- T: Q × Σ → Q is the transition function
- q0 ∈ Q is the initial state and
- F ⊆ Q is a set of final states.
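The 5-tuple can be written down almost literally as a data structure. The sketch below is illustrative: the class and field names are assumptions, the transition function T is stored as a dictionary keyed by (state, symbol), and the example automaton (binary strings with an even number of 1s) is not from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DFA:
    states: frozenset      # Q
    alphabet: frozenset    # Σ
    delta: dict            # T, as a mapping (state, symbol) -> state
    start: str             # q0
    finals: frozenset      # F

    def accepts(self, word):
        state = self.start
        for ch in word:
            if ch not in self.alphabet:
                return False               # symbol outside Σ
            state = self.delta[(state, ch)]
        return state in self.finals

even_ones = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset("01"),
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    },
    start="even",
    finals=frozenset({"even"}),
)

print(even_ones.accepts("0110"))  # True
print(even_ones.accepts("1"))     # False
```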
9. Finite Automata: An Example
- A (deterministic) finite automaton (DFA) to match C-style comments
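The state diagram for this slide is not reproduced in the text, so the following is only one possible DFA for a complete `/* ... */` comment. The state names (`start`, `slash`, `body`, `star`, `done`, `error`) are chosen for this sketch.

```python
def is_c_comment(text):
    """Return True if text is exactly one C-style comment: /* ... */"""
    state = "start"
    for ch in text:
        if state == "start":
            state = "slash" if ch == "/" else "error"
        elif state == "slash":
            state = "body" if ch == "*" else "error"
        elif state == "body":
            state = "star" if ch == "*" else "body"
        elif state == "star":
            if ch == "/":
                state = "done"      # saw the closing */
            elif ch == "*":
                state = "star"      # still one '*' away from closing
            else:
                state = "body"
        else:
            state = "error"         # anything after "done", or already in error
    return state == "done"

print(is_c_comment("/* hi */"))         # True
print(is_c_comment("/***/"))            # True
print(is_c_comment("/*/"))              # False
print(is_c_comment("/* unterminated"))  # False
```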
10. Example 2
- Consider the problem of recognizing register names
- Register → r (0|1|2| ... |9) (0|1|2| ... |9)*
- Allows registers of arbitrary number
- Requires at least one digit
- RE corresponds to a recognizer (or DFA)
- Transitions on other inputs go to an error state, se
11. Example 2 (continued)
- DFA operation
- Start in state s0 and take transitions on each input character
- DFA accepts a word x iff x leaves it in a final state (s2)
- So,
- r17 takes it through s0, s1, s2 and accepts
- r takes it through s0, s1 and fails
- a takes it straight to se
12. Example 2 (continued)
- To be useful, recognizer must turn into code

δ  | r  | 0,1,2,3,4,5,6,7,8,9 | All others
s0 | s1 | se                  | se
s1 | se | s2                  | se
s2 | se | s2                  | se
se | se | se                  | se

Table encoding RE
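A table like this one can drive a very small recognizer loop. The sketch below encodes the transition table for r Digit Digit* as nested dictionaries. The helper names (`char_class`, `trace`, `accepts`) are illustrative, and s2 is taken as the only final state, as stated on the DFA operation slide.

```python
def char_class(ch):
    """Map an input character to a column of the table."""
    if ch == "r":
        return "r"
    if ch in "0123456789":
        return "digit"
    return "other"

DELTA = {
    "s0": {"r": "s1", "digit": "se", "other": "se"},
    "s1": {"r": "se", "digit": "s2", "other": "se"},
    "s2": {"r": "se", "digit": "s2", "other": "se"},
    "se": {"r": "se", "digit": "se", "other": "se"},
}
FINAL = {"s2"}

def trace(word):
    """States visited while reading word, starting in s0."""
    state = "s0"
    path = [state]
    for ch in word:
        state = DELTA[state][char_class(ch)]   # one table lookup per character
        path.append(state)
    return path

def accepts(word):
    return trace(word)[-1] in FINAL

print(trace("r17"), accepts("r17"))  # ['s0', 's1', 's2', 's2'] True
print(trace("r"), accepts("r"))      # ['s0', 's1'] False
print(trace("a"), accepts("a"))      # ['s0', 'se'] False
```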
13. What if we need a tighter specification?
- r Digit Digit* allows arbitrary numbers
- Accepts r00000
- Accepts r99999
- What if we want to limit it to r0 through r31?
- Write a tighter regular expression
- Register → r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
- Register → r0|r1|r2| ... |r31|r00|r01|r02| ... |r09
- Produces a more complex DFA
- Has more states
- Same cost per transition
- Same basic implementation
14. Tighter register specification (continued)
- The DFA for
- Register → r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
- Accepts a more constrained set of registers
- Same set of actions, more states
15. Tighter register specification (continued)

δ  | r  | 0,1 | 2  | 3  | 4-9 | All others
s0 | s1 | se  | se | se | se  | se
s1 | se | s2  | s2 | s5 | s4  | se
s2 | se | s3  | s3 | s3 | s3  | se
s3 | se | se  | se | se | se  | se
s4 | se | se  | se | se | se  | se
s5 | se | s6  | se | se | se  | se
s6 | se | se  | se | se | se  | se
se | se | se  | se | se | se  | se

Table encoding RE for the tighter register specification
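The larger table plugs into the same kind of lookup loop; only the data changes. In this sketch the column order is r, 0-1, 2, 3, 4-9, all others. The set of final states is not listed in the text, so s2 through s6 are assumed here, derived from the tighter regular expression. The helper names are illustrative.

```python
COLUMNS = ("r", "0,1", "2", "3", "4-9", "other")

def char_class(ch):
    """Map an input character to a column of the table."""
    if ch == "r":
        return "r"
    if ch in ("0", "1"):
        return "0,1"
    if ch in ("2", "3"):
        return ch
    if ch in "456789":
        return "4-9"
    return "other"

ALL_ERR = ("se",) * 6
ROWS = {
    "s0": ("s1", "se", "se", "se", "se", "se"),
    "s1": ("se", "s2", "s2", "s5", "s4", "se"),
    "s2": ("se", "s3", "s3", "s3", "s3", "se"),
    "s3": ALL_ERR,
    "s4": ALL_ERR,
    "s5": ("se", "s6", "se", "se", "se", "se"),
    "s6": ALL_ERR,
    "se": ALL_ERR,
}
FINAL = {"s2", "s3", "s4", "s5", "s6"}   # assumption: derived from the RE

def accepts(word):
    state = "s0"
    for ch in word:
        # same cost per transition as the smaller table: one lookup
        state = ROWS[state][COLUMNS.index(char_class(ch))]
    return state in FINAL

print(accepts("r31"))  # True
print(accepts("r32"))  # False
print(accepts("r09"))  # True
```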
16. Automating Scanner Construction
- RE → NFA (Thompson's construction)
- Build an NFA for each term
- Combine them with ε-moves
- NFA → DFA (subset construction)
- Build the simulation
- DFA → Minimal DFA
- Hopcroft's algorithm
- DFA → RE (not part of the scanner construction)
- All pairs, all paths problem
- Take the union of all paths from s0 to an accepting state
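The NFA to DFA step can be sketched in a few lines. The following is a minimal worklist version of the subset construction, under these assumptions: the NFA is a dictionary mapping state to {symbol: set of states}, the empty string `""` labels ε-moves, and a missing DFA transition stands for the error state. The example NFA for a(b|c)* is hand-built with ε-moves for illustration and is not the exact output of Thompson's construction.

```python
from collections import deque

EPS = ""   # label used for ε-moves in this sketch

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    d0 = eps_closure(nfa, {start})
    delta = {}
    seen = {d0}
    work = deque([d0])
    while work:
        current = work.popleft()
        for a in alphabet:
            moved = {t for s in current for t in nfa.get(s, {}).get(a, ())}
            if not moved:
                continue                 # no entry means the error state
            target = eps_closure(nfa, moved)
            delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    dfa_finals = {d for d in seen if d & finals}
    return d0, delta, dfa_finals

def dfa_accepts(d0, delta, dfa_finals, word):
    state = d0
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in dfa_finals

# Hand-built NFA for a(b|c)*, with state 5 accepting.
nfa = {
    0: {"a": {1}},
    1: {EPS: {2}},
    2: {EPS: {3, 5}},
    3: {"b": {4}, "c": {4}},
    4: {EPS: {2}},
}
d0, delta, dfa_finals = subset_construction(nfa, 0, {5}, "abc")
print(dfa_accepts(d0, delta, dfa_finals, "abcb"))  # True
print(dfa_accepts(d0, delta, dfa_finals, "ba"))    # False
```

The resulting DFA has three reachable states. A minimization pass such as Hopcroft's algorithm would then merge the two accepting states, since they behave identically.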