Regular Languages and Expressions - PowerPoint PPT Presentation


Surinder Kumar Jain, University of Sydney – PowerPoint PPT presentation


1
Regular Languages and Expressions

Surinder Kumar Jain,
University of Sydney

2
Regular Languages and Expressions

Automaton
  DFA
  NFA
  ε-NFA
  CFG as a DFA
  Equivalence
  Minimal DFA
Expressions
  Definition
  Conversion from/to Automaton
Regular Languages
  Pumping Lemma (proving a language is not regular)
  Closures
  Equivalence

3
Deterministic Finite Automaton

A system with a finite set of states
Can transition from one state to another
Transitions are usually caused by external input
System is in exactly one state at any given time
Deterministic: the current state and the input symbol fix the next state

4
DFA

Mathematical Definition of a DFA
$A = (Q, \Sigma, \delta, q_0, F)$
$Q$: states; the DFA is in exactly one of these finitely many states at any time.
$\Sigma$: input symbols; the DFA moves from one state to another on consuming an input symbol.
$\delta$: transition function. Given a state and an input symbol, it gives the next DFA state: $\delta : Q \times \Sigma \to Q$.
$q_0$: initial DFA state, $q_0 \in Q$.
$F$: accepting states, $F \subseteq Q$. The DFA accepts an input string if it is in one of these states after consuming the whole string.

5
DFA Example
Q = {waiting, pending, rejected, accepted, paid}
Σ = {receive, reject, accept, pay}
δ: (waiting, receive) → pending, (pending, reject) → rejected, (pending, accept) → accepted, (accepted, pay) → paid; pairs not listed have no move (an implicit dead state)
q0 = waiting
F = {rejected, paid}
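
A minimal Python sketch of running the example DFA, assuming δ is stored as a dictionary keyed by (state, symbol). Since the slide lists only some transitions, this sketch treats a missing pair as a move to an implicit dead state (represented by `None`); `run` and `accepts` are illustrative names, not from the slides.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Transition function of the order example; a missing (state, symbol)
# pair means "no move", i.e. the DFA falls into an implicit dead state (None).
DELTA: Dict[Tuple[str, str], str] = {
    ("waiting", "receive"): "pending",
    ("pending", "reject"): "rejected",
    ("pending", "accept"): "accepted",
    ("accepted", "pay"): "paid",
}
START = "waiting"
ACCEPTING: Set[str] = {"rejected", "paid"}


def run(word: Iterable[str]) -> Optional[str]:
    """Extended transition function: consume the symbols one at a time from START."""
    state: Optional[str] = START
    for symbol in word:
        if state is None:
            break  # once in the dead state, the DFA stays there
        state = DELTA.get((state, symbol))
    return state


def accepts(word: Iterable[str]) -> bool:
    """Accept iff the whole word leads from START to an accepting state."""
    return run(word) in ACCEPTING
```

For instance, `accepts(["receive", "accept", "pay"])` is True, while `accepts(["receive", "accept"])` is False because accepted is not in F.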
6
Transition Diagrams
[Figure: transition diagram of the DFA. start → Waiting; Waiting (receive) → Pending; Pending (reject) → Rejected; Pending (accept) → Accepted; Accepted (pay) → Paid]
7
Language

Alphabet: a finite set of symbols
Concatenation (joining) of symbols gives strings
A language is a subset of the strings over an alphabet
A DFA defines a language
  The alphabet is the set of input symbols
  Concatenation: one symbol follows another
  Acceptance: a string is in the language if its sequence of symbols takes the DFA from the start state to one of the accepting states

8
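Non-deterministic Finite Automaton (NFA)

Five-tuple like a DFA, $(Q, \Sigma, \delta, q_0, F)$
Transition function returns a set of states, not one state: $\delta : Q \times \Sigma \to 2^Q$
Several outgoing arcs may carry the same symbol
Can be in several states at the same time
Language of an NFA: the strings for which at least one sequence of choices leads from $q_0$ to an accepting state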
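
A sketch of how an NFA can be simulated by tracking the whole set of states it could be in. The example NFA (strings over {0, 1} ending in 01) and the names `NFA_DELTA` and `nfa_accepts` are hypothetical, chosen only to show two arcs leaving q0 on the same symbol.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical example NFA: accepts strings over {0, 1} that end in "01".
# delta maps (state, symbol) to a *set* of next states; a missing key means the empty set.
NFA_DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("q0", "0"): {"q0", "q1"},  # two outgoing arcs on the same symbol
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
NFA_START = "q0"
NFA_ACCEPTING = {"q2"}


def nfa_accepts(word: Iterable[str]) -> bool:
    """Track every state the NFA could be in after each symbol."""
    current: FrozenSet[str] = frozenset({NFA_START})
    for symbol in word:
        current = frozenset(
            nxt for p in current for nxt in NFA_DELTA.get((p, symbol), set())
        )
    # Accept if at least one of the possible current states is accepting.
    return bool(current & NFA_ACCEPTING)
```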

9
Equivalence of DFA and NFA

Any NFA language can be described by some DFA
Adding non-determinism does not add expressive power
Why use NFAs, then?
  Easier to construct for some languages
  May have fewer states and be less complex
Algorithm to convert an NFA to a DFA
  For an $n$-state NFA, the DFA may have up to $2^n$ states
  Inaccessible states can be thrown away
Observation: in practice the DFA often has about as many states as the NFA, though usually more transitions

10
NFA to DFA conversion

For an NFA $N = (Q, \Sigma, \delta, q_0, F)$,
construct the DFA $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$ where
$Q_D = 2^Q$ (the powerset of $Q$)
$\delta_D(S, a) = \bigcup_{p \in S} \delta(p, a)$ for every $S \in Q_D$ and $a \in \Sigma$
$F_D = \{ S \subseteq Q \mid S \cap F \neq \emptyset \}$
Intuition: a DFA is in one state at a time, while an NFA is in a set of states.
  Given a state and a symbol, the NFA gives a set of new states
  Make every possible set of NFA states a DFA state
  On a symbol, move from a set of states to the set of all states reachable from it
  Any set that contains an NFA accepting state is an accepting state of the DFA

11
NFA to DFA conversion complexity

$O(2^n)$ DFA states in the worst case (the number of subsets of an $n$-element set)
Efficient algorithm
  Do not construct the entire power set
  Start with the start state $\{q_0\}$
  Only construct subsets that are reachable from the start state
  The number of DFA states is usually much less than $2^n$
In practice the DFA has about as many states as the NFA, though it often has more transitions
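
The efficient (lazy) subset construction can be sketched as a worklist loop that starts from {q0} and only builds subsets it actually reaches. This is a Python sketch under the assumption that the NFA's δ is a dictionary mapping (state, symbol) to a set of states; `subset_construction` is an illustrative name.

```python
from collections import deque
from typing import Dict, FrozenSet, Set, Tuple

Subset = FrozenSet[str]


def subset_construction(
    delta: Dict[Tuple[str, str], Set[str]],
    start: str,
    accepting: Set[str],
    alphabet: Set[str],
) -> Tuple[Dict[Tuple[Subset, str], Subset], Subset, Set[Subset]]:
    """Lazy subset construction: only build subsets reachable from {start}."""
    dfa_start: Subset = frozenset({start})
    dfa_delta: Dict[Tuple[Subset, str], Subset] = {}
    dfa_accepting: Set[Subset] = set()
    seen = {dfa_start}
    work = deque([dfa_start])
    while work:
        S = work.popleft()
        if S & accepting:  # any NFA accepting state makes S accepting
            dfa_accepting.add(S)
        for a in sorted(alphabet):
            # delta_D(S, a) = union of delta(p, a) over p in S
            T: Subset = frozenset(q for p in S for q in delta.get((p, a), set()))
            dfa_delta[(S, a)] = T
            if T not in seen:  # a new subset: schedule it for processing
                seen.add(T)
                work.append(T)
    return dfa_delta, dfa_start, dfa_accepting
```

On a hypothetical NFA for strings ending in 01 (q0 loops on 0 and 1, q0 also goes to q1 on 0, q1 goes to q2 on 1), only 3 of the 8 possible subsets are built: {q0}, {q0, q1} and {q0, q2}.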

12
epsilon-NFA

Allows transitions labelled ε (the empty string, which is not in the alphabet)
ε is the identity for concatenation: $w.\epsilon = \epsilon.w = w$ for every string $w$
An ε transition is spontaneous: it consumes no input

13
Equivalence to NFA

An ε-NFA language can be described by some NFA
Every NFA can be described by some DFA
Adding ε transitions does not add expressive power
Why use ε-NFAs, then?
  Easier to construct for some languages
  Useful in proofs of equivalence, e.g. converting a regular expression to an automaton

14
Conversion to NFA

Conversion aims to remove ε transitions
Define a new set of states, each of which is a set of original states
  ε transitions are contained inside each set
  No ε arc leaves a set
Epsilon closure (eclose)
  For a state $q$, the set of all states reachable from $q$ spontaneously, including $q$ itself
  Follow the ε arcs recursively and include every reachable state in the epsilon closure
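
Computing eclose by following ε arcs recursively amounts to a graph search. A minimal sketch, assuming the ε arcs are given as a dictionary from each state to the states one ε arc away; the example data (states A to D, including an ε cycle) is hypothetical.

```python
from typing import Dict, FrozenSet, Set

# EPS[q] = states reachable from q by ONE epsilon arc (hypothetical example data).
EPS: Dict[str, Set[str]] = {
    "A": {"B"},
    "B": {"C", "D"},
    "D": {"A"},  # a cycle of epsilon arcs is handled by the visited set
}


def eclose(state: str, eps: Dict[str, Set[str]]) -> FrozenSet[str]:
    """All states reachable from `state` using only epsilon arcs (including itself)."""
    closure = {state}
    stack = [state]
    while stack:
        p = stack.pop()
        for q in eps.get(p, set()):
            if q not in closure:  # follow epsilon arcs recursively
                closure.add(q)
                stack.append(q)
    return frozenset(closure)
```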

15
epsilon-NFA to DFA conversion

For an ε-NFA $N = (Q, \Sigma, \delta, q_0, F)$,
construct the DFA $D = (Q_D, \Sigma, \delta_D, \text{eclose}(q_0), F_D)$ where
$Q_D = \{ S \subseteq Q \mid S = \text{eclose}(S) \}$ (the ε-closed sets; eclose of a set is the union of the closures of its members)
$\delta_D(S, a) = \text{eclose}\left( \bigcup_{p \in S} \delta(p, a) \right)$ for every $S \in Q_D$ and $a \in \Sigma$
$F_D = \{ S \in Q_D \mid S \cap F \neq \emptyset \}$
Each DFA state is a set of ε-NFA states with no ε transition leaving the set
  Make the eclose sets DFA states
  On a symbol, move from a set of states to the epsilon closure of all states reachable from it
  Any set that contains an accepting state of the ε-NFA is an accepting state of the DFA
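
Combining eclose with the subset construction, the ε-NFA to DFA conversion can be sketched as follows. Assumptions: ε arcs are stored in the same dictionary as ordinary arcs under the label `""` (empty string), and only ε-closed sets reachable from eclose(q0) are built; all names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Subset = FrozenSet[str]
EPSILON = ""  # assumption: epsilon arcs are stored under the empty-string label


def eclose(states: Iterable[str], delta: Dict[Tuple[str, str], Set[str]]) -> Subset:
    """Epsilon closure of a set of states."""
    closure = set(states)
    stack = list(closure)
    while stack:
        p = stack.pop()
        for q in delta.get((p, EPSILON), set()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return frozenset(closure)


def enfa_to_dfa(
    delta: Dict[Tuple[str, str], Set[str]],
    start: str,
    accepting: Set[str],
    alphabet: Set[str],
) -> Tuple[Dict[Tuple[Subset, str], Subset], Subset, Set[Subset]]:
    """Subset construction where every DFA state is an epsilon-closed set."""
    dfa_start = eclose({start}, delta)
    dfa_delta: Dict[Tuple[Subset, str], Subset] = {}
    dfa_accepting: Set[Subset] = set()
    seen = {dfa_start}
    work = deque([dfa_start])
    while work:
        S = work.popleft()
        if S & accepting:
            dfa_accepting.add(S)
        for a in sorted(alphabet):
            # delta_D(S, a) = eclose(union of delta(p, a) over p in S)
            T = eclose((q for p in S for q in delta.get((p, a), set())), delta)
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    return dfa_delta, dfa_start, dfa_accepting
```

For a hypothetical ε-NFA with an ε arc p → q, p looping on a, q looping on b and q accepting (the language a*b*), the DFA states are {p, q}, {q} and the empty set, and the first two are accepting.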

16
Programs as Automata

An imperative program can be represented as a Control Flow Graph (CFG) with
  statements at nodes and
  predicates at edges
It can be converted into a CFG with both statements and predicates at edges
  by pushing node statements onto their incoming edges
Such a CFG is a DFA
  Program points are states
  Statements are input symbols that move the program from one program point to the next

17
Regular Expression

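Algebraic expression to denote languages
Composed of the symbols ε, ∅, + (union), * (star), . (concatenation), ( and ), and alphabet symbols
The language is generated using the rules
$L(\epsilon) = \{\epsilon\}$
$L(\emptyset) = \emptyset$
$L(a) = \{a\}$ for every alphabet symbol $a$
$L(p + q) = L(p) \cup L(q)$
$L(p.q) = \{ x.y \mid x \in L(p),\ y \in L(q) \}$
$L(p^*) = \bigcup_{n \ge 0} L(p)^n$, where $L(p)^0 = \{\epsilon\}$ and $L(p)^k = L(p).L(p)^{k-1}$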
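
To make the rules for L(·) concrete, the sketch below computes the strings of L(r) up to a chosen length. It assumes a hypothetical tuple encoding of regular expressions (not any standard library format) and cuts every set off at `max_len`, since L(p*) is infinite in general.

```python
from typing import FrozenSet, Tuple

# Hypothetical AST encoding (an assumption, not a standard format):
# ("eps",) | ("empty",) | ("sym", a) | ("union", p, q) | ("concat", p, q) | ("star", p)
Regex = Tuple


def lang(r: Regex, max_len: int) -> FrozenSet[str]:
    """Strings of L(r) with length <= max_len, following the rules for L(.)."""
    kind = r[0]
    if kind == "eps":
        return frozenset({""})  # L(eps) = {eps}
    if kind == "empty":
        return frozenset()  # L(empty) = empty set
    if kind == "sym":
        return frozenset({r[1]}) if len(r[1]) <= max_len else frozenset()
    if kind == "union":
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "concat":
        left, right = lang(r[1], max_len), lang(r[2], max_len)
        return frozenset(x + y for x in left for y in right if len(x + y) <= max_len)
    if kind == "star":
        # L(p*) = union over n >= 0 of L(p)^n, starting from L(p)^0 = {eps}
        base = lang(r[1], max_len)
        result = {""}
        while True:
            bigger = result | {x + y for x in result for y in base if len(x + y) <= max_len}
            if bigger == result:  # no new strings within the length bound
                return frozenset(result)
            result = bigger
    raise ValueError(f"unknown node {kind!r}")
```

With this encoding, `lang` applied to a + b.c yields {"a", "bc"}, and for (a + b)* it includes "ab", which is why L(p*) concatenates any strings of L(p) rather than repeating one string.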

18
Regular Expression Example

a + b.c
The language generated is
{a, b.c}
a.b.c*.d
The language generated is
{a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, ...}
A finite way to express an infinite language

19
Equality of Languages

DEFINITION
Two regular expressions (or automata)
are EQUAL
if they generate the same language
Thus
$(a.b)^* + (b.a)^* + a.(b.a)^* + b.(a.b)^* = (\epsilon + b).(a.b)^*.(\epsilon + a)$
(both denote the strings of alternating a's and b's)
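
Equality of languages can be spot-checked (not proved) by comparing both expressions on every short string. This sketch translates the identity into Python's `re` syntax, writing union as `|` and (ε + x) as `x?`; agreement up to a finite length is only evidence, not a proof of equality.

```python
import itertools
import re

# Both sides of the identity in Python `re` syntax:
# "+" (union) becomes "|", "." is implicit, and (eps + x) becomes "x?".
LEFT = re.compile(r"(?:ab)*|(?:ba)*|a(?:ba)*|b(?:ab)*")
RIGHT = re.compile(r"b?(?:ab)*a?")


def agree_up_to(max_len: int, alphabet: str = "ab") -> bool:
    """Check that both expressions accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(LEFT.fullmatch(w)) != bool(RIGHT.fullmatch(w)):
                return False
    return True
```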

20
Algebraic laws of regular expressions

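$p + q = q + p$
$(p + q) + r = p + (q + r)$
$(p.q).r = p.(q.r)$
$\emptyset + p = p + \emptyset = p$
$\epsilon.p = p.\epsilon = p$
$\emptyset.p = p.\emptyset = \emptyset$
$p.(q + r) = p.q + p.r$
$(p + q).r = p.r + q.r$
$p + p = p$
$(p^*)^* = p^*$
$\emptyset^* = \epsilon$
$\epsilon^* = \epsilon$
$p.p^* = p^*.p$
$(p + q)^* = (p^*.q^*)^*$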

21
Finite Automaton and Regular Expressions

Every language defined by a finite automaton is also defined by some regular expression
Every language defined by a regular expression is also defined by some DFA

22
DFA to Regular expression

Hopcroft's formula
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)}.\left(R_{kk}^{(k-1)}\right)^*.R_{kj}^{(k-1)}$
States are sorted in some order and numbered 1 to $n$ ($n$ is the number of states)
$R_{ij}^{(k)}$ is the regular expression for all paths from $i$ to $j$ whose intermediate states are all numbered at most $k$
$R_{ij}^{(0)}$ covers only the direct arcs from $i$ to $j$ (plus ε when $i = j$)
$R_{ij}^{(n)}$ is the regular expression for all paths from $i$ to $j$
Computed for all $i, j$ for $k = 0$, then $k = 1, \ldots, k = n$
$R_{s f_1}^{(n)} + \cdots + R_{s f_m}^{(n)}$ is the regular expression of the DFA, where $s$ is the start state and $f_1, \ldots, f_m$ are the accepting states
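
A sketch of the $R_{ij}^{(k)}$ table computation, building each expression as a Python `re` pattern string so the result can be tested against the DFA. Assumptions: states are numbered 1 to n, `None` stands for ∅, `""` for ε, and no simplification is done beyond the ∅ and ε laws, so the output grows quickly; all names are illustrative.

```python
import re
from typing import Dict, List, Optional, Tuple

# None stands for the empty set and "" for epsilon (an encoding assumption).
RE = Optional[str]


def union(r: RE, s: RE) -> RE:
    if r is None:
        return s
    if s is None:
        return r
    if r == s:
        return r  # p + p = p
    return f"(?:{r}|{s})"


def concat(*parts: RE) -> RE:
    # Concatenating anything with the empty set gives the empty set.
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def star(r: RE) -> RE:
    # empty* = epsilon* = epsilon
    return "" if r is None or r == "" else f"(?:{r})*"


def dfa_to_regex(n: int, delta: Dict[Tuple[int, str], int], start: int, accepting: List[int]) -> RE:
    """States are numbered 1..n; R[i][j] holds R_ij^(k) for the current k."""
    # k = 0: direct arcs from i to j, plus epsilon when i == j.
    R: List[List[RE]] = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = ""
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(a))
    for k in range(1, n + 1):
        new: List[List[RE]] = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) . (R_kk^(k-1))* . R_kj^(k-1)
                new[i][j] = union(R[i][j], concat(R[i][k], star(R[k][k]), R[k][j]))
        R = new
    result: RE = None
    for f in accepting:  # R_{s f_1}^(n) + ... + R_{s f_m}^(n)
        result = union(result, R[start][f])
    return result
```

For a hypothetical two-state DFA over {0, 1} accepting strings with an even number of 1s (state 1 is the start and the only accepting state), `re.fullmatch(dfa_to_regex(2, delta, 1, [1]), w)` succeeds exactly for such strings.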

23
DFA to RE - complexity

Hopcroft's formula is $O(n^3 4^n)$:
  $n^3$ to compute the table, and
  $4^n$ because the regular expression can grow by a factor of 4 at every step
In practice it can be close to $O(n^3)$
  by simplifying the regular expression at every step and
  using a judicious algorithm that avoids recomputing $R_{kk}^{(k)}$
Most DFAs have close to $n$, not $2^n$, accessible states
A faster state elimination method, close to $O(n^2)$, is also available

24
RE to Automaton conversion

A regular expression is converted to an ε-NFA
The ε-NFA can then be converted to an NFA and to a DFA
RE to ε-NFA conversion rules
  ε → two states joined by one ε edge
  ∅ → two states with no edges
  a → two states joined by an edge labelled a
  + → a new start state and a new accept state joining the two arguments of + in parallel, via ε edges
  . → the accept state of the first argument is the start state of the second
  * → a new start state and a new accept state, with ε edges from the new start to the argument's start and to the new accept, and from the argument's accept back to its start and to the new accept
Convert the resulting ε-NFA to a DFA
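
The RE to ε-NFA rules can be sketched recursively: each case returns the start and accept state of a new fragment. This uses a hypothetical tuple encoding of regular expressions as an assumption, stores arcs as (from, label, to) triples with `""` for ε, and links the two halves of a concatenation with an ε edge instead of merging states (an equivalent variant of the slide's rule). `matches` then simulates the ε-NFA to test the construction.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: ("eps",) | ("empty",) | ("sym", a) | ("union", p, q) | ("concat", p, q) | ("star", p)
Edge = Tuple[int, str, int]  # (from, label, to); label "" means an epsilon arc


class Builder:
    def __init__(self) -> None:
        self.edges: List[Edge] = []
        self.count = 0

    def new_state(self) -> int:
        self.count += 1
        return self.count - 1

    def build(self, r: Tuple) -> Tuple[int, int]:
        """Return (start, accept) of an epsilon-NFA fragment for r."""
        kind = r[0]
        if kind in ("union", "concat"):
            s1, f1 = self.build(r[1])
            s2, f2 = self.build(r[2])
            if kind == "concat":
                self.edges.append((f1, "", s2))  # accept of first leads to start of second
                return s1, f2
            s, f = self.new_state(), self.new_state()
            self.edges += [(s, "", s1), (s, "", s2), (f1, "", f), (f2, "", f)]
            return s, f
        if kind == "star":
            s1, f1 = self.build(r[1])
            s, f = self.new_state(), self.new_state()
            self.edges += [(s, "", s1), (s, "", f), (f1, "", s1), (f1, "", f)]
            return s, f
        s, f = self.new_state(), self.new_state()
        if kind == "eps":
            self.edges.append((s, "", f))
        elif kind == "sym":
            self.edges.append((s, r[1], f))
        elif kind != "empty":  # "empty": two states, no edges
            raise ValueError(f"unknown node {kind!r}")
        return s, f


def matches(r: Tuple, word: str) -> bool:
    """Build the epsilon-NFA for r, then simulate it on word."""
    b = Builder()
    start, accept = b.build(r)

    def eclose(states: Set[int]) -> Set[int]:
        closure, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for (x, label, y) in b.edges:
                if x == p and label == "" and y not in closure:
                    closure.add(y)
                    stack.append(y)
        return closure

    current = eclose({start})
    for ch in word:
        current = eclose({y for (x, label, y) in b.edges if x in current and label == ch})
    return accept in current
```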

25
Direct conversion

Augment the regular expression r to (r).#, where # is an end marker
Give a position number to each occurrence of an alphabet symbol
Compute for each node of the syntax tree
  nullable (ε is in the node's language)
  firstpos (set of positions that can match the first symbol)
  lastpos (set of positions that can match the last symbol)
Compute for each position
  followpos (set of positions that can follow this position)
Construct the DFA

26
Applications

Unix text search: searching for matching patterns (grep)
Lexical analysis / parsing
  Parse text against a regular expression
  Find the set of first tokens at this expression root
  Find the set of last tokens at this expression root
  Can the expression at this root generate the empty string?
  Find the set of next tokens after an alphabet position in a regular expression
Efficient search for patterns in very large repositories (web text search)

27
Regular Language

DEFINITION
A language (a set of strings)
is defined to be a regular language if
it can be defined by
  a DFA, or
  an NFA, or
  an ε-NFA, or
  a regular expression
Four different ways to describe a regular language

28
Pumping Lemma

If $L$ is a regular language, then there exists an integer $n$ such that
for every string $w \in L$ with $|w| \ge n$
we can break $w$ into $x, y, z$ such that $w = x.y.z$ and
  $y \neq \epsilon$
  $|x.y| \le n$
  $x.y^k.z \in L$ for all $k \ge 0$
Proof based on:
  for a DFA with $n$ states,
  reading any string of length $\ge n$ visits $n + 1$ states,
  so it must revisit a state
Used to prove that a language is not regular
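
The proof idea (a long enough string must revisit a state) can be turned into a small function that finds x, y and z for a given DFA and string. This is an illustration only, assuming a total DFA stored as a dictionary; the parity DFA in the usage note is hypothetical. In practice the lemma is used in the opposite direction, to show that a language is not regular.

```python
from typing import Dict, Set, Tuple

Delta = Dict[Tuple[str, str], str]


def accepts(delta: Delta, start: str, accepting: Set[str], w: str) -> bool:
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting


def pump_split(delta: Delta, start: str, w: str) -> Tuple[str, str, str]:
    """Split w = x.y.z at the first repeated state, so y != eps and |x.y| <= n."""
    n = len({p for (p, _) in delta} | set(delta.values()))  # number of states
    if len(w) < n:
        raise ValueError("the lemma only covers strings with |w| >= n")
    first_seen = {start: 0}  # state -> number of symbols read when first reached
    state = start
    # Reading the first n symbols visits n + 1 states, so one must repeat.
    for i, symbol in enumerate(w[:n], start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            j = first_seen[state]
            return w[:j], w[j:i], w[i:]  # y is the loop read between the two visits
        first_seen[state] = i
    raise AssertionError("unreachable for a DFA whose states all appear in delta")
```

For a DFA over {0, 1} that accepts strings with an even number of 1s, splitting "1111" gives x = "", y = "11", z = "11", and every x.y^k.z keeps an even number of 1s.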

29
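Closure properties

A language is a set of strings over a finite alphabet
Language operators (regular languages are closed under each)
  Union: $L(A + B) = L(A) \cup L(B)$ (via regular expressions)
  Intersection: $L(A) \cap L(B)$
  Concatenation: $L(A.B) = \{ a.b \mid a \in L(A),\ b \in L(B) \}$
  Kleene closure: $L(A^*) = \bigcup_{n \ge 0} L(A)^n$, where $L(A)^0 = \{\epsilon\}$ and $L(A)^n = L(A).L(A)^{n-1}$
  Complement: $\overline{L(A)} = \{ w \in \Sigma^* \mid w \notin L(A) \}$ with respect to an overall alphabet $\Sigma$ (via DFA)
  Difference: $L(A) - L(B) = L(A) \cap \overline{L(B)}$ (via DFA)
  Reversal: $L(A)^R = \{ a_k.a_{k-1} \cdots a_1 \mid a_1 \cdots a_{k-1}.a_k \in L(A) \}$ (reverse every arc and switch the roles of $q_0$ and $F$)
  Homomorphism: replace each alphabet symbol with a string
  Inverse homomorphism: $h^{-1}(L) = \{ w \mid h(w) \in L \}$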
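
Intersection and difference are usually shown closed with the product construction (a technique not spelled out on the slide): run both DFAs in parallel on pairs of states. A Python sketch, assuming total DFAs given as (delta, start, accepting) triples; `keep` is an illustrative parameter that picks the operation.

```python
from typing import Any, Callable, Dict, Set, Tuple

# A DFA is a (delta, start, accepting) triple with a total transition function.
DFA = Tuple[Dict[Tuple[Any, str], Any], Any, Set[Any]]


def accepts(dfa: DFA, w: str) -> bool:
    delta, state, accepting = dfa
    for c in w:
        state = delta[(state, c)]
    return state in accepting


def product(a: DFA, b: DFA, alphabet: str, keep: Callable[[bool, bool], bool]) -> DFA:
    """Run A and B in parallel on pair states; keep(in_A, in_B) decides acceptance."""
    (da, sa, fa), (db, sb, fb) = a, b
    start = (sa, sb)
    delta: Dict[Tuple[Any, str], Any] = {}
    accepting: Set[Any] = set()
    todo, seen = [start], {start}
    while todo:  # only pair states reachable from the start pair are built
        pair = todo.pop()
        p, q = pair
        if keep(p in fa, q in fb):
            accepting.add(pair)
        for c in alphabet:
            nxt = (da[(p, c)], db[(q, c)])
            delta[(pair, c)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, start, accepting
```

`keep=lambda x, y: x and y` gives the intersection, `x or y` the union, and `x and not y` the difference L(A) − L(B).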

30
Decision properties

Is the described language empty?
Is a particular string in the described language?
Do two different descriptions actually describe the same language?
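
The first two questions have simple algorithms once the language is given as a DFA: emptiness is reachability of an accepting state, and membership is just running the DFA. A sketch under the assumption that δ is a dictionary keyed by (state, symbol); the function names are illustrative.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Delta = Dict[Tuple[Hashable, str], Hashable]


def is_empty(delta: Delta, start: Hashable, accepting: Set[Hashable]) -> bool:
    """L(A) is empty iff no accepting state is reachable from the start state."""
    succ: Dict[Hashable, Set[Hashable]] = {}
    for (p, _symbol), q in delta.items():
        succ.setdefault(p, set()).add(q)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over the transition graph
        p = queue.popleft()
        if p in accepting:
            return False
        for q in succ.get(p, set()):
            if q not in seen:
                seen.add(q)
                queue.append(q)
    return True


def is_member(delta: Delta, start: Hashable, accepting: Set[Hashable], w: Iterable[str]) -> bool:
    """Run the DFA on w; a missing transition means w is rejected."""
    state = start
    for symbol in w:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting
```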

31
Conversions

Decision properties may require conversion between various forms.
Can the conversion be done in reasonable time?

Conversion complexity
  Computing ε-closures: $O(n^3)$ (e.g. Warshall's algorithm)
  Subset construction: $O(2^n)$ states
  NFA to DFA: $O(n^3 2^n)$ (in practice $O(n^3 s)$, where $s$ is the number of accessible DFA states)
  DFA to NFA: $O(n)$
  NFA/DFA to regular expression: $O(n^3 4^n)$ in the worst case (usually much less in practice)
  Regular expression to ε-NFA: $O(n)$
  Regular expression to NFA: $O(n^3)$
  Regular expression to DFA: via the ε-NFA, so exponential in the worst case
32
Equivalence of automata

Equivalence of two states
States p and q in an automaton are defined to be equivalent if
  for every input string w,
  starting from p, w ends in an accepting state
  if and only if
  starting from q, w also ends in an accepting state
The accepting state reached from p does not have to be the same accepting state as the one reached from q

33
Minimization of DFA

If two states p and q are equivalent,
we can combine them into a single state;
it won't affect the language accepted by the DFA
This process of combining states is called minimization
The table-filling algorithm finds whether two states are equivalent or not; complexity $O(n^2)$ with a careful implementation
Non-equivalent pairs are distinguishable
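
A straightforward version of the table-filling algorithm: mark pairs split by ε (one accepting, one not), then keep marking any pair whose successors on some symbol are already marked, until nothing changes. This repeated-pass sketch is simpler but slower than the $O(n^2)$ bound quoted on the slide; names and example data are hypothetical. Pairs that are never marked are equivalent and can be merged.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set, Tuple


def distinguishable_pairs(
    states: List[Hashable],
    alphabet: Iterable[str],
    delta: Dict[Tuple[Hashable, str], Hashable],
    accepting: Set[Hashable],
) -> Set[FrozenSet[Hashable]]:
    """Table-filling: return every pair of states that some input string distinguishes."""
    symbols = list(alphabet)
    # Basis: an accepting and a non-accepting state are distinguished by epsilon.
    marked: Set[FrozenSet[Hashable]] = {
        frozenset((p, q))
        for p, q in combinations(states, 2)
        if (p in accepting) != (q in accepting)
    }
    changed = True
    while changed:  # induction: repeat until no new pair is marked
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in symbols:
                successors = frozenset((delta[(p, a)], delta[(q, a)]))
                # If the successors on some symbol are distinguishable, so are p and q.
                if len(successors) == 2 and successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked
```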

34
Minimum DFA

The minimum DFA is unique (up to renaming of states)
  Eliminate all states not reachable from the start state
  Determine which states are equivalent
  Partition the states into blocks of equivalent states
  Equivalence is transitive,
  so no state is in two blocks
Equivalence of two regular languages
  Convert both into their minimum DFAs
  and check whether they are isomorphic
  Union method
    Treat the two DFAs as a single automaton (the union of their states) and run the table-filling algorithm
    The start states of the two original DFAs are equivalent if and only if the DFAs accept the same language