Title: Regular Languages and Expressions
1Regular Languages and Expressions
- Surinder Kumar Jain,
- University of Sydney
2Regular Languages and Expressions
- Automaton
- DFA
- NFA
- ε-NFA
- CFG as a DFA
- Equivalence
- Minimal DFA
- Expressions
- Definition
- Conversion from/to Automaton
- Regular Languages
- Pumping Lemma (proving non-regularity)
- Closures
- Equivalence
3Deterministic Finite Automaton
- A system with many states
- Can transition from one state to another
- Usually caused by external input
- Set of states is finite
- System is in one state at any given time
4DFA
- Mathematical Definition of a DFA
- $A = (Q, \Sigma, \delta, q_0, F)$
- $Q$: States. The DFA is in exactly one of these finitely many states at any time.
- $\Sigma$: Input symbols. The DFA changes from one state to another on consuming an input symbol.
- $\delta$: Transition function.
- Given a state and an input symbol, gives the next DFA state
- A function $\delta : Q \times \Sigma \to Q$.
- $q_0$: Initial DFA state
- $F$: Accepting states. Once the DFA reaches one of these states it need not consume any more input symbols; a string is accepted if the DFA is in one of these states when the input ends.
5DFA Example
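$Q = \{\text{waiting}, \text{pending}, \text{rejected}, \text{accepted}, \text{paid}\}$
$\Sigma = \{\text{receive}, \text{reject}, \text{accept}, \text{pay}\}$
$\delta$: $\delta(\text{waiting}, \text{receive}) = \text{pending}$, $\delta(\text{pending}, \text{reject}) = \text{rejected}$, $\delta(\text{pending}, \text{accept}) = \text{accepted}$, $\delta(\text{accepted}, \text{pay}) = \text{paid}$
$q_0 = \text{waiting}$
$F = \{\text{rejected}, \text{paid}\}$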
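The example DFA can be simulated in a few lines. This is a minimal sketch in Python, not part of the original slides. The helper name `run_dfa` is hypothetical, the fourth state is taken to be `accepted` (the name the transitions use), and an undefined transition is assumed to reject the input, since the slide leaves $\delta$ partial.

```python
# Transition function of the example DFA as a dict: (state, symbol) -> next state.
DELTA = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING = {"rejected", "paid"}


def run_dfa(symbols):
    """Return True if the sequence of input symbols ends in an accepting state."""
    state = START
    for symbol in symbols:
        # Assumption: an undefined (state, symbol) pair rejects the input.
        if (state, symbol) not in DELTA:
            return False
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


print(run_dfa(["receive", "accept", "pay"]))  # True
print(run_dfa(["receive", "accept"]))         # False
```

Storing the transition function as a dictionary keeps the code close to the five-tuple definition: the keys are exactly the pairs in $Q \times \Sigma$ for which a move is defined.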
6Transition Diagrams
[Figure: transition diagram of the example DFA. start → Waiting; Waiting --receive--> Pending; Pending --accept--> Accepted; Accepted --pay--> Paid; Pending --reject--> Rejected. Rejected and Paid are the accepting states.]
7Language
- Alphabet: a finite set of symbols
- Concatenation (joining)
- Strings
- A subset of strings is a language
- A DFA defines a language
- The alphabet is the set of input symbols
- Concatenation: one symbol follows another
- Acceptance: a sequence of symbols takes the DFA from the start state to one of the accepting states
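8Non-deterministic Finite Automaton (NFA)
- Five-tuple like a DFA, $(Q, \Sigma, \delta, q_0, F)$
- Transition function returns a set of states, not a single state
- Several outgoing arcs may carry the same symbol
- The NFA can be in several states at the same time
- Language of an NFA: the strings for which at least one possible run ends in an accepting state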
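Being "in several states at the same time" can be made concrete by tracking a set of current states. The sketch below is illustrative only: the NFA (strings over {0, 1} ending in "01") and the name `run_nfa` are assumptions, not taken from the slides.

```python
# Hypothetical example NFA: accepts strings over {0, 1} that end in "01".
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},   # two outgoing arcs on the same symbol
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
NFA_START = "q0"
NFA_ACCEPTING = {"q2"}


def run_nfa(word):
    """Track the set of states the NFA could be in after each symbol."""
    current = {NFA_START}
    for symbol in word:
        nxt = set()
        for state in current:
            # A missing entry means the NFA has no move from this state.
            nxt |= NFA_DELTA.get((state, symbol), set())
        current = nxt
    # Accept if at least one possible run ends in an accepting state.
    return bool(current & NFA_ACCEPTING)


print(run_nfa("1101"))  # True
print(run_nfa("0110"))  # False
```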
9Equivalence of DFA and NFA
- Any NFA language can be described by some DFA
- Adding non-determinism does not add any expressive power
- Why use NFAs then?
- Easier to construct for some languages
- May have fewer states and be less complex
- Algorithm to convert an NFA to a DFA
- For an $n$-state NFA, the DFA may have up to $2^n$ states
- Inaccessible states can be thrown away
- Observation: in practice the DFA often has about the same number of states as the NFA, though it usually has more transitions
10NFA to DFA conversion
- For an NFA $N = (Q, \Sigma, \delta, q_0, F)$,
- construct the DFA $D = (Q_d, \Sigma, \delta_d, \{q_0\}, F_d)$
- $Q_d$ = power set of $Q$
- $\delta_d(S, a) = \bigcup_{p \in S} \delta(p, a)$ for every $S \in Q_d$
- $F_d = \{S \mid S \subseteq Q \text{ and } S \cap F \neq \emptyset\}$
- A DFA operates on one state at a time; an NFA operates on sets of states.
- Given a state, the NFA gives a set of new states
- Make every possible set of NFA states a DFA state
- Transition from one set of states to the set of all states reachable from it on the input symbol
- Any set containing an accepting NFA state is an accepting state of the DFA
11NFA to DFA conversion complexity
- $O(2^n)$ (the number of subsets of a set)
- Efficient algorithm
- Do not construct the entire power set
- Start with the start state
- Only construct subsets that are reachable from the start state
- The number of states in the DFA is usually much less than $2^n$.
- In practice the DFA has about the same number of states as the NFA, though it often has more transitions
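The "efficient algorithm" can be sketched as a lazy subset construction that starts from the start state and only builds subsets it actually reaches. The function name `nfa_to_dfa`, the dictionary encoding of the transition function, and the example NFA are assumptions made for illustration.

```python
from collections import deque


def nfa_to_dfa(delta, start, accepting, alphabet):
    """Lazy subset construction: only subsets reachable from {start} are built."""
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            # Union of the NFA moves of every state in the subset.
            target = frozenset(
                q for p in subset for q in delta.get((p, symbol), ())
            )
            dfa_delta[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is accepting if it contains an accepting NFA state.
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return seen, dfa_delta, start_set, dfa_accepting


# Hypothetical NFA for strings over {0, 1} ending in "01".
delta = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
states, dfa_delta, dfa_start, dfa_accepting = nfa_to_dfa(delta, "q0", {"q2"}, "01")
print(len(states))  # 3 reachable subsets instead of 2**3 = 8
```

For this NFA the DFA has as many states as the NFA, which matches the observation that the $2^n$ bound is rarely reached.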
12ε-NFA
- Includes ε (the empty string, not in the alphabet) as a transition label
- ε is the identity for concatenation
- $a.\varepsilon = \varepsilon.a = a$ for all $a$
- Spontaneous transition without consuming an input
13Equivalence to NFA
- An ε-NFA language can be described by some NFA
- Every NFA can be described by some DFA
- Adding ε-transitions does not add any expressive power
- Why use ε-NFAs then?
- Easier to construct for some languages
- Useful in proving equivalence of languages
14Conversion to NFA
- Conversion aims to remove ε-transitions
- Define a new set of states
- ε-transitions are contained inside each set
- No ε arc leaves or enters the new set of states
- Epsilon closure (eclose)
- For a state, the set of all states reachable spontaneously
- Follow the ε arcs recursively and include the reachable states in the epsilon closure
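The epsilon closure is a plain graph reachability computation. A minimal sketch, assuming the ε arcs are given as a dictionary from a state to the set of its ε-successors (the names `eclose` and `EPS` are illustrative); an explicit stack is used in place of recursion:

```python
def eclose(state, eps):
    """Set of states reachable from `state` using only ε arcs (including itself)."""
    closure = {state}
    stack = [state]
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


# Hypothetical ε arcs: a -> b -> c, and d -> a.
EPS = {"a": {"b"}, "b": {"c"}, "d": {"a"}}
print(sorted(eclose("a", EPS)))  # ['a', 'b', 'c']
```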
15epsilon-NFA to DFA conversion
- For an ε-NFA $N = (Q, \Sigma, \delta, q_0, F)$,
- construct the DFA $D = (Q_d, \Sigma, \delta_d, \mathrm{eclose}(q_0), F_d)$
- $Q_d = \{S \subseteq Q \mid S = \mathrm{eclose}(S)\}$
- $\delta_d(S, a) = \mathrm{eclose}\left(\bigcup_{p \in S} \delta(p, a)\right)$ for every $S \in Q_d$
- $F_d = \{S \in Q_d \mid S \cap F \neq \emptyset\}$
- A DFA operates on one state at a time; an ε-NFA operates on sets of states with no ε-transition leaving the set
- Make every eclose set a DFA state
- Transition from one set of states to the eclose of the set of states reached
- Any set containing an accepting state of the ε-NFA is an accepting state of the DFA
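Putting the closure and the subset construction together gives a direct ε-NFA to DFA conversion. The sketch below builds only the reachable ε-closed sets. The function names, the dictionary encoding, and the example automaton for a*b* are assumptions for illustration.

```python
def eclose_set(states, eps):
    """ε-closure of a set of states."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for q in eps.get(p, ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def enfa_to_dfa(delta, eps, start, accepting, alphabet):
    """Build the reachable part of the DFA whose states are ε-closed sets."""
    dfa_start = eclose_set({start}, eps)
    dfa_delta, seen, todo = {}, {dfa_start}, [dfa_start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            # Move on symbol a, then close the result under ε arcs.
            moved = {q for p in subset for q in delta.get((p, a), ())}
            target = eclose_set(moved, eps)
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_accepting = {s for s in seen if s & set(accepting)}
    return seen, dfa_delta, dfa_start, dfa_accepting


def accepts(dfa_delta, dfa_start, dfa_accepting, word):
    """Run the constructed DFA on a word."""
    state = dfa_start
    for a in word:
        state = dfa_delta[(state, a)]
    return state in dfa_accepting


# Hypothetical ε-NFA for a*b*: s --a--> s, s --ε--> t, t --b--> t; t accepts.
delta = {("s", "a"): {"s"}, ("t", "b"): {"t"}}
eps = {"s": {"t"}}
states, d, q0, f = enfa_to_dfa(delta, eps, "s", {"t"}, "ab")
print(accepts(d, q0, f, "aabb"), accepts(d, q0, f, "ba"))  # True False
```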
16Programs as Automata
- An imperative program can be represented as a Control Flow Graph (CFG) with
- statements at nodes and
- predicates at edges
- It can be converted into a CFG with
- both statements and predicates at edges
- by pushing node statements up incoming edges
- Such a CFG is a DFA
- Program points are states
- Statements are input symbols that change the program state from one program point to another
17Regular Expression
- Algebraic expression to denote languages
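- Composed of the symbols $\varepsilon$, $\emptyset$, $+$, $^*$, $.$, $($, $)$ and the symbols of the alphabet
- The language is generated using rules
- $L(\varepsilon) = \{\varepsilon\}$
- $L(\emptyset) = \emptyset$ (the empty set)
- $L(a) = \{a\}$ for every alphabet symbol $a$
- $L(p + q) = L(p) \cup L(q)$
- $L(p.q) = \{u.v \mid u \in L(p),\ v \in L(q)\}$
- $L(p^*) = \bigcup_{n \ge 0} L(p)^n$, where $L(p)^0 = \{\varepsilon\}$ and $L(p)^k = L(p).L(p)^{k-1}$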
18Regular Expression Example
- $a + b.c$
- The language generated is
- $\{a,\ b.c\}$
- $a.b.c^*.d$
- The language generated is
- $\{a.b.d,\ a.b.c.d,\ a.b.c.c.d,\ a.b.c.c.c.d,\ \dots\}$
- A finite way to express an infinite language
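These two examples can be tried with Python's standard `re` module, reading them as `a + b.c` and `a.b.c*.d`. Note the notation shift, which is an assumption of this sketch: in `re` syntax concatenation is implicit and union is written `|`.

```python
import re

# Slide notation -> Python `re` syntax (concatenation is implicit, + becomes |).
UNION = re.compile(r"a|bc")    # a + b.c
STAR = re.compile(r"abc*d")    # a.b.c*.d

# fullmatch requires the whole string to be in the language.
print([w for w in ["a", "bc", "ab", "abc"] if UNION.fullmatch(w)])
# ['a', 'bc']
print([w for w in ["abd", "abcd", "abccd", "abc", "ad"] if STAR.fullmatch(w)])
# ['abd', 'abcd', 'abccd']
```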
19Equality of Languages
- DEFINITION
- Two regular expressions (or automata)
- are EQUAL
- if they both generate the same language
- Thus
- $(a.b)^* + (b.a)^* + a.(b.a)^* + b.(a.b)^* = (\varepsilon + b).(a.b)^*.(\varepsilon + a)$
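An identity of this kind, here read as the two descriptions of the strings of alternating a's and b's, can be sanity-checked by brute force on all short strings. This is only a bounded test, not a proof of equality; the helper `same_up_to` is hypothetical and uses Python `re` syntax, where `b?` plays the role of (ε + b).

```python
import re
from itertools import product


def same_up_to(regex1, regex2, alphabet, max_len):
    """Compare two regexes on every string over `alphabet` of length <= max_len."""
    r1, r2 = re.compile(regex1), re.compile(regex2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False, w   # w is a counterexample
    return True, None


lhs = r"(ab)*|(ba)*|a(ba)*|b(ab)*"
rhs = r"(b?)(ab)*(a?)"
print(same_up_to(lhs, rhs, "ab", 8))            # (True, None)
print(same_up_to(r"(ab)*", r"(ba)*", "ab", 4))  # (False, 'ab')
```

A real equality test would compare automata rather than enumerate strings, since the bounded check can miss counterexamples longer than `max_len`.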
20Algebraic laws of regular expressions
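- $p + q = q + p$
- $(p + q) + r = p + (q + r)$
- $(p.q).r = p.(q.r)$
- $\emptyset + p = p + \emptyset = p$
- $\varepsilon.p = p.\varepsilon = p$
- $\emptyset.p = p.\emptyset = \emptyset$
- $p.(q + r) = p.q + p.r$
- $(p + q).r = p.r + q.r$
- $p + p = p$
- $(p^*)^* = p^*$
- $\emptyset^* = \varepsilon$
- $\varepsilon^* = \varepsilon$
- $p.p^* = p^*.p$
- $(p + q)^* = (p^*.q^*)^*$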
21Finite Automaton and Regular Expressions
- Every language
- defined by a finite automaton is also defined by some regular expression
- defined by a regular expression is also defined by some DFA
22DFA to Regular expression
- Hopcroft's formula
- $R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}.(R_{kk}^{(k-1)})^*.R_{kj}^{(k-1)}$
- $R_{ij}^{(n)}$ is the regular expression for all paths from $i$ to $j$ ($n$ is the number of states)
- States are sorted in some order and numbered 1 to $n$
- $R_{ij}^{(k)}$ is the regular expression for all paths from $i$ to $j$ passing only through intermediate nodes whose sort order is at most $k$
- Computed for all $i, j$ for $k = 0$, then $k = 1, \dots, k = n$
- $R_{s f_1}^{(n)} + \dots + R_{s f_k}^{(n)}$ is the regular expression of the DFA
- $s$ is the start state, $f_1, \dots, f_k$ are the accepting states, $n$ is the number of states.
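The recurrence translates almost directly into code. The sketch below is an illustration under stated assumptions: the output uses Python `re` syntax (`|` for +, implicit concatenation), `None` stands for ∅ and the empty string for ε, and only the trivial simplifications for ∅ and ε are applied, so the expressions grow quickly. The function names and the example DFA (strings over {0, 1} containing at least one 1) are hypothetical.

```python
import re
from itertools import product


def union(p, q):
    if p is None:
        return q             # ∅ + q = q
    if q is None:
        return p             # p + ∅ = p
    return p if p == q else f"(?:{p}|{q})"


def concat(p, q):
    if p is None or q is None:
        return None          # ∅.p = p.∅ = ∅
    return p + q             # "" plays the role of ε


def star(p):
    if p is None or p == "":
        return ""            # ∅* = ε* = ε
    return f"(?:{p})*"


def dfa_to_regex(n, delta, start, accepting):
    """States are numbered 1..n; delta maps (state, symbol) -> state."""
    # k = 0: direct arcs only, plus ε when i == j.
    R = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = "" if i == j else None
            for (p, a), q in delta.items():
                if p == i and q == j:
                    r = union(r, re.escape(a))
            R[i, j] = r
    # k = 1..n: additionally allow state k as an intermediate node.
    for k in range(1, n + 1):
        R = {
            (i, j): union(
                R[i, j],
                concat(concat(R[i, k], star(R[k, k])), R[k, j]),
            )
            for i in range(1, n + 1)
            for j in range(1, n + 1)
        }
    result = None
    for f in accepting:
        result = union(result, R[start, f])
    return result


# Hypothetical DFA: state 1 is the start, state 2 is reached once a 1 is seen.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 2}
regex = dfa_to_regex(2, DELTA, start=1, accepting={2})
words = ["".join(t) for n in range(5) for t in product("01", repeat=n)]
print(all(bool(re.fullmatch(regex, w)) == ("1" in w) for w in words))
```

The final line compares the generated expression with the intended language on all strings of length below 5, which is a bounded check rather than a proof.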
23DFA to RE - complexity
- Hopcroft's formula is $O(n^3 4^n)$:
- $n^3$ to compute the table and
- $4^n$ because the size of the regular expression can grow by a factor of 4 at every step.
- In practice it is close to $O(n^3)$
- by simplifying the regular expression at every step and
- using a judicious algorithm that avoids recomputation of $R_{kk}^{(k)}$
- Most DFAs have close to $n$, not $2^n$, accessible states
- A faster state elimination method, close to $O(n^2)$, is also available
24RE to Automata conversion
- Regular expression is converted to an ε-NFA
- The ε-NFA can then be converted to an NFA and to a DFA
- RE to ε-NFA conversion rules
- $\varepsilon$ → one-edge (two-state) automaton with an ε-transition
- $\emptyset$ → two-state automaton with no edges
- $a$ → two-state automaton with an $a$ transition
- $+$ → a new start/accept state joining the two arguments of $+$ in parallel
- $.$ → accept state of the first is the start state of the second
- $^*$ → an ε edge joining start/accept of the argument and a new start/accept state
- Convert the resulting ε-NFA to a DFA
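The conversion rules can be sketched as a recursive builder over a syntax tree. This is a Thompson-style variant written for illustration, not necessarily the exact construction on the slide: concatenation links the two fragments with an ε arc instead of merging states, and the tuple encoding of the expression (`"sym"`, `"eps"`, `"empty"`, `"alt"`, `"cat"`, `"star"`) and all function names are assumptions.

```python
from itertools import count

_ids = count()  # fresh state numbers


def build(node, edges):
    """Return (start, accept) of an ε-NFA fragment; arcs are appended to `edges`.

    Arcs are triples (source, label, target); label None stands for ε.
    """
    kind = node[0]
    if kind in ("eps", "empty", "sym"):
        s, f = next(_ids), next(_ids)
        if kind == "eps":
            edges.append((s, None, f))
        elif kind == "sym":
            edges.append((s, node[1], f))
        return s, f                      # "empty": two states, no arc
    if kind == "cat":
        s1, f1 = build(node[1], edges)
        s2, f2 = build(node[2], edges)
        edges.append((f1, None, s2))     # accept of first feeds start of second
        return s1, f2
    if kind == "alt":
        s1, f1 = build(node[1], edges)
        s2, f2 = build(node[2], edges)
        s, f = next(_ids), next(_ids)    # new start/accept around both arguments
        edges += [(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)]
        return s, f
    if kind == "star":
        s1, f1 = build(node[1], edges)
        s, f = next(_ids), next(_ids)
        # Skip the argument, enter it, repeat it, or leave it.
        edges += [(s, None, f), (s, None, s1), (f1, None, s1), (f1, None, f)]
        return s, f
    raise ValueError(kind)


def closure(states, edges):
    """ε-closure of a set of states."""
    states, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in edges:
            if src == p and label is None and dst not in states:
                states.add(dst)
                stack.append(dst)
    return states


def matches(node, word):
    """Build the ε-NFA for `node` and simulate it on `word`."""
    edges = []
    start, accept = build(node, edges)
    current = closure({start}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current


# a.b.c*.d written as a syntax tree.
TREE = ("cat", ("sym", "a"),
        ("cat", ("sym", "b"), ("cat", ("star", ("sym", "c")), ("sym", "d"))))
print(matches(TREE, "abccd"), matches(TREE, "abc"))  # True False
```

Simulating the ε-NFA directly avoids the final DFA conversion; the subset construction would be applied to `edges` if a DFA were needed.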
25Direct conversion
- Augment the regular expression $r$ to $(r)\#$, where $\#$ is an end marker.
- Assign a position number to each occurrence of an alphabet symbol
- Compute for each node of the syntax tree
- nullable ($\varepsilon$ is in the language)
- firstpos (set of possible first alphabet positions)
- lastpos (set of possible last alphabet positions)
- Compute for each position
- followpos (set of possible next alphabet positions after this position)
- Construct the DFA
26Applications
- Unix text search, search matching patterns (grep)
- Lexical/Parser analysis
- Parse text against a regular expression
- find the set of first tokens at this expression root
- find the set of last tokens at this expression root
- can the expression at this root match the empty string?
- find the set of next tokens after an alphabet position in a regular expression
- Efficient search for patterns in a very large repository (web text search)
27Regular Language
- DEFINITION
- A language (a set of strings)
- is defined to be a regular language if
- it can be defined by a finite automaton
- by a DFA or
- by an NFA or
- by an ε-NFA or
- by a regular expression
- Four different ways to describe a regular
language
28Pumping Lemma
- If $L$ is a regular language then there exists
- an integer $n$ such that
- for every string $w$ in $L$ with $|w| \ge n$
- we can break $w$ into $x, y, z$ such that $w = x.y.z$
- $y \ne \varepsilon$
- $|x.y| \le n$
- $x.y^k.z$ is in $L$ (for all $k \ge 0$)
- Proof based on
- For a DFA with $n$ states
- any string of length $\ge n$
- must revisit a state
- Used to prove that a language is not regular
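As a worked example of how the lemma is used, the standard argument that $\{a^m b^m\}$ is not regular can be written out. The language is chosen here for illustration and does not appear in the slides.

```latex
\begin{align*}
&\text{Let } L = \{a^m b^m \mid m \ge 0\} \text{ and suppose } L \text{ is regular with pumping constant } n.\\
&\text{Choose } w = a^n b^n \in L, \text{ so } |w| = 2n \ge n.\\
&\text{Any split } w = x.y.z \text{ with } y \ne \varepsilon \text{ and } |x.y| \le n \text{ has } y = a^j \text{ for some } 1 \le j \le n.\\
&\text{Pumping with } k = 2 \text{ gives } x.y^2.z = a^{n+j} b^n \notin L, \text{ since } n + j \ne n.\\
&\text{This contradicts the lemma, so } L \text{ is not regular.}
\end{align*}
```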
29Closure property
- A language is a set of strings over a finite alphabet
- Language operators
- Union of two languages: $L(A \cup B) = L(A) \cup L(B)$ (via regular expressions)
- Intersection
- Concatenation: $L(A.B) = \{a.b \mid a \in A,\ b \in B\}$
- Kleene closure: $L(A^*) = \bigcup_{n \ge 0} A^n$
- $A^0 = \{\varepsilon\}$ and $A^n = A.A^{n-1}$
- Complement: $L(\overline{A}) = \{a \mid a \notin A\}$ (with respect to some overall alphabet set) (via DFA)
- Difference: $L(A - B) = L(A) - L(B)$ (via DFA)
- Reversal: $L(A^R) = \{a_k.a_{k-1} \dots a_1 \mid a_1 \dots a_{k-1}.a_k \in A\}$ (switch $q_0$ and $F$)
- Homomorphism: replace an alphabet symbol with another regular expression
- Inverse homomorphism
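Two of these closures have short DFA-based constructions. The sketch below shows complement by flipping the accepting set of a complete DFA, and intersection by the standard product construction (a technique named here as an assumption; the slide does not say how intersection is obtained). All names and the two example DFAs are hypothetical.

```python
def complement(states, accepting):
    """Complement of a complete DFA: same transitions, accepting set flipped."""
    return set(states) - set(accepting)


def intersect(delta1, start1, acc1, delta2, start2, acc2, alphabet):
    """Product DFA for the intersection; states are pairs (p, q)."""
    start = (start1, start2)
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            # Both component DFAs move in lockstep on the same symbol.
            target = (delta1[p, a], delta2[q, a])
            delta[(p, q), a] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {(p, q) for (p, q) in seen if p in acc1 and q in acc2}
    return seen, delta, start, accepting


def accepts(delta, start, accepting, word):
    state = start
    for a in word:
        state = delta[state, a]
    return state in accepting


# Hypothetical complete DFAs over {a, b}: EVEN_A accepts an even number of a's,
# ENDS_B accepts strings ending in b.
EVEN_A = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}
ENDS_B = {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"}
states, delta, start, acc = intersect(EVEN_A, "e", {"e"}, ENDS_B, "n", {"y"}, "ab")
print(accepts(delta, start, acc, "aab"), accepts(delta, start, acc, "ab"))  # True False

ODD_A_ACCEPTING = complement({"e", "o"}, {"e"})
print(accepts(EVEN_A, "e", ODD_A_ACCEPTING, "a"))  # True
```

Complementing by flipping the accepting set is only valid when every state has a transition on every symbol, which is why both examples are complete DFAs.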
30Decision properties
- Is the language described empty?
- Is a particular string in the described language?
- Do two different descriptions of languages actually describe the same language?
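For a language given as a DFA, the first two questions reduce to reachability and simulation. A minimal sketch with hypothetical names, assuming the transition function is a dictionary from (state, symbol) to state and that a missing entry rejects:

```python
def is_empty(delta, start, accepting):
    """A DFA's language is empty iff no accepting state is reachable from start."""
    seen, todo = {start}, [start]
    while todo:
        p = todo.pop()
        if p in accepting:
            return False
        for (src, _symbol), dst in delta.items():
            if src == p and dst not in seen:
                seen.add(dst)
                todo.append(dst)
    return True


def is_member(delta, start, accepting, word):
    """Membership: run the DFA on the word."""
    state = start
    for a in word:
        if (state, a) not in delta:
            return False
        state = delta[state, a]
    return state in accepting


# Hypothetical DFA: s and t alternate on a; u is unreachable from s.
D = {("s", "a"): "t", ("t", "a"): "s", ("u", "a"): "u"}
print(is_empty(D, "s", {"u"}))         # True
print(is_empty(D, "s", {"t"}))         # False
print(is_member(D, "s", {"t"}, "a"))   # True
```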
31Conversions
- Decision properties may require conversion between various forms.
- Can the conversion be done in reasonable time?
Conversion | Complexity
Computing ε-closures | $O(n^3)$ (Warshall's), $O(n)$
Subset construction | $O(2^n)$
NFA to DFA | $O(n^3 2^n)$ (in practice $O(n^3 s)$)
DFA to NFA conversion | $O(n)$
NFA/DFA to Regular Expression | $O(n^3 4^n)$ (worst case; actual is much less)
Regular Expression to ε-NFA | $O(n)$
Regular Expression to NFA | $O(n^3)$
Regular Expression to DFA | O(n34n32n)
32Equivalence of automata
- Equivalence of two states
- States p and q in an automaton are defined to be equivalent if
- for all input strings applied at state p or q
- p ends up in an accepting state
- if and only if
- q also ends up in an accepting state
- The accepting state reached from p does not have to be the same accepting state as that reached from q
33Minimization of DFA
- If two states p and q are equivalent
- we can combine them into a single state
- it won't affect the language accepted by the DFA
- This process of combining states together is called minimization
- The table-filling algorithm can find whether two states are equivalent or not. Complexity $O(n^2)$
- Non-equivalent pairs are distinguishable
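The table-filling idea can be sketched as a fixed-point computation over pairs of states. This version repeats passes until nothing changes, so it is simpler but slower than the $O(n^2)$ bound quoted above; the function names and the three-state example DFA are assumptions for illustration.

```python
from itertools import combinations


def distinguishable_pairs(states, delta, accepting, alphabet):
    """Table-filling: mark pairs of states that some input string tells apart."""
    # Basis: an accepting and a non-accepting state are distinguished by ε.
    marked = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in accepting) != (q in accepting)
    }
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[p, a], delta[q, a]))
                # len(succ) == 1 means both states move to the same state.
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


def equivalent(p, q, marked):
    return p == q or frozenset((p, q)) not in marked


# Hypothetical DFA for strings ending in 1; A and B behave identically.
STATES = ["A", "B", "C"]
DELTA = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "B", ("B", "1"): "C",
         ("C", "0"): "B", ("C", "1"): "C"}
marked = distinguishable_pairs(STATES, DELTA, {"C"}, "01")
print(equivalent("A", "B", marked), equivalent("A", "C", marked))  # True False
```

Merging every block of mutually equivalent states (here A and B) gives the minimized DFA.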
34Minimum DFA
- Minimum DFA is unique
- Eliminate all states not reachable from start
- Determine which states are equivalent
- Partition states into blocks of equivalent states
- Equivalence is transitive
- Thus no state is in two blocks
- Equivalence of two Regular Languages
- Convert them into their minimum DFAs
- and check for isomorphism
- Union method
- Make a minimum DFA of the union of the two
- The start states of the two original DFAs are equivalent if and only if the DFAs are equivalent