Title: Regular Languages and Finite Automata
1Regular Languages and Finite Automata
- CS311
- Western Washington University
2Regular Languages
Before we talk about regular languages, let's
give them some context. Have you ever faced the
challenge of validating formatted input?
Hopefully YES!
The Recognition Problem: Write an algorithm to
recognize input strings that have a certain
property, e.g. complex numbers.
We can solve this class of problems by
identifying regular languages and designing
finite automata that accept these languages.
3Language Basics
A language L over a finite alphabet A is a set
of strings of letters from A. We can also say
that L is a subset of A*, where A* is the set of
all strings over A.
So the recognition problem can be stated as: if
we are given a language L and a string w, can we
tell if w is in L?
Well, we need to be able to describe L. We can
construct languages from the letters of the
alphabet by using the language operations of
union, concatenation, and closure. These
languages are called regular languages.
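The three language operations can be tried out on small sets of strings. What follows is a minimal sketch, not part of the original slides: the function names and the `max_len` cutoff for closure are assumptions made for illustration, and the empty string is written as `""`.

```python
def union(L, M):
    """L ∪ M: the strings that are in L or in M."""
    return L | M

def concat(L, M):
    """L · M: every string of L followed by every string of M."""
    return {u + v for u in L for v in M}

def closure(L, max_len):
    """The strings of L* whose length is at most max_len.

    L* is infinite whenever L contains a nonempty string, so this
    sketch cuts the enumeration off at max_len (an assumption).
    """
    result = {""}            # the empty string is always in L*
    frontier = {""}
    while frontier:
        # extend the newest strings by one more string of L
        frontier = {u + v for u in frontier for v in L
                    if len(u + v) <= max_len} - result
        result |= frontier
    return result

A = {"a"}
B = {"b"}
print(sorted(union(A, B)))                # ['a', 'b']
print(sorted(concat(A, B)))               # ['ab']
print(sorted(closure(union(A, B), 2)))    # ['', 'a', 'aa', 'ab', 'b', 'ba', 'bb']
```

Only the closure needs a cutoff; union and concatenation of finite languages are always finite.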
4Definition
The collection of regular languages over
alphabet A can be defined recursively as:
Basis: ∅, {Λ}, and {a} are regular languages
∀a ∈ A, where Λ denotes the empty string.
Induction: If L and M are regular languages, then
the following languages are also regular
languages:
L ∪ M    L · M    L*
Consider the alphabet A = {a, b}. The basis gives
us the following regular languages:
∅, {Λ}, {a}, {b}
Which languages can we construct using the
induction step?
5Regular Expressions
We can also describe a regular language in
terms of an algebraic expression called a regular
expression. Regular expressions over an
alphabet A are also defined inductively:
Basis: ∅, Λ, and a are regular expressions
∀a ∈ A, where Λ denotes the empty string.
Induction: If R and S are regular expressions,
then the following expressions are also regular
expressions:
(R)    R + S    R · S    R*
Regular expressions support the following
precedence hierarchy:
*  (highest)
·
+  (lowest)
6We can construct infinitely many regular
expressions over the alphabet A = {a, b}, such
as: Λ, ∅, a, b, a + b, a*, (a + b)*, ...
We can represent concatenation, ·, using
juxtaposition of letters. So we can write a · b
as ab.
The reason we mention regular expressions is to
describe regular languages. So for each regular
expression E, we can associate a regular language
L(E) according to the following rules:
L(∅) = ∅
L(Λ) = {Λ}
L(a) = {a}  ∀a ∈ A
L(R + S) = L(R) ∪ L(S)
L(R · S) = L(R) · L(S)
L(R*) = L(R)*
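The rules for L(E) translate almost line for line into a recursive function. The sketch below is illustrative only: the nested-tuple encoding of expressions (`("sym", "a")`, `("+", R, S)`, `(".", R, S)`, `("*", R)`, `("lambda",)`, `("empty",)`) and the length bound `n` are assumptions, not something the slides define.

```python
def lang(E, n):
    """Return the strings of L(E) that have length <= n."""
    op = E[0]
    if op == "empty":                 # L(∅) = ∅
        return set()
    if op == "lambda":                # L(Λ) = {Λ}
        return {""}
    if op == "sym":                   # L(a) = {a}
        return {E[1]} if n >= 1 else set()
    if op == "+":                     # L(R + S) = L(R) ∪ L(S)
        return lang(E[1], n) | lang(E[2], n)
    if op == ".":                     # L(R · S) = L(R) · L(S)
        return {u + v for u in lang(E[1], n) for v in lang(E[2], n)
                if len(u + v) <= n}
    if op == "*":                     # L(R*) = L(R)*
        base = lang(E[1], n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator: {op!r}")

# Example expression chosen for illustration: a + bc*
E = ("+", ("sym", "a"), (".", ("sym", "b"), ("*", ("sym", "c"))))
print(sorted(lang(E, 3)))    # ['a', 'b', 'bc', 'bcc']
```

The bound `n` is what keeps the closure case finite; raising it reveals more of the language.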
7Practice
Given a regular expression, identify the
language:
a + bc
ab + bc
What about the other way around: given a
language, describe it as a regular expression.
This problem can be simple for finite languages
such as {a, b, c}, which we can represent as
a + b + c, but not so simple for infinite
languages. Infinite languages can be either
regular or nonregular. Here's an infinite regular
language: {a, aa, aaa, ..., a^n, ...}
Does every regular expression describe a
distinct regular language? Heck no!
8Properties of Regular Expressions
Consider the languages described by a + b and
b + a; they both represent the language {a, b}.
We can express this equality as
L(a + b) = L(b + a) = {a, b}
For regular expressions R and S, we can say
that R = S if L(R) = L(S).
Summary of Properties of Regular Expressions
1) + properties
   R + T = T + R
   R + ∅ = ∅ + R = R
   R + R = R
   (R + S) + T = R + (S + T)
2) · properties
   R∅ = ∅R = ∅
   RΛ = ΛR = R
   (RS)T = R(ST)
3) Distributive
   R(S + T) = RS + RT
   (S + T)R = SR + TR
9Closure Properties
4) ∅* = Λ* = Λ
5) R* = R*R* = (R*)* = R + R*
   R* = Λ + R* = (Λ + R)* = (Λ + R)R*
6) R*R = RR*
7) (R + S)* = (R* + S*)* = (R*S*)*
8) R(SR)* = (RS)*R
9) (R*S)* = Λ + (R + S)*S
   (RS*)* = Λ + R(R + S)*
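One way to build confidence in an identity is to compare both sides on every short string. The sketch below uses Python's standard `re` module, whose syntax writes the slides' + as `|`; the helper names and the length bound of 6 are assumptions for illustration. Agreement up to a bound is evidence, not a proof.

```python
import re
from itertools import product

def bounded_language(pattern, alphabet="ab", max_len=6):
    """All strings over `alphabet` of length <= max_len that match `pattern` entirely."""
    compiled = re.compile(pattern)
    return {"".join(w)
            for n in range(max_len + 1)
            for w in product(alphabet, repeat=n)
            if compiled.fullmatch("".join(w))}

def agree(p, q, **kw):
    """True if p and q describe the same strings up to the length bound."""
    return bounded_language(p, **kw) == bounded_language(q, **kw)

# Property 8 with R = a, S = b:  R(SR)* = (RS)*R
print(agree("a(ba)*", "(ab)*a"))      # True
# Property 7 with R = a, S = b:  (R + S)* = (R*S*)*
print(agree("(a|b)*", "(a*b*)*"))     # True
# Not an identity: a*b versus ba*
print(agree("a*b", "ba*"))            # False
```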
10Finite Automata
Recall that we have the goal of solving the
string recognition problem. We're getting
closer now that we have a formal algebraic
representation for languages. Now we want some
mechanical approach for determining if a string
belongs to a language. We can create a
computing machine called a finite automaton to
help.
There are 2 types of finite automata:
1) Deterministic Finite Automata (DFA)
2) Nondeterministic Finite Automata (NFA)
11DFA
A DFA over a finite alphabet A is a finite
directed graph with the property that each node
emits exactly one labeled edge for each distinct
element in A. The nodes are called states. There
is exactly one special state called the start or
initial state, and zero or more states called
final states.
Create a DFA over the alphabet A = {a, b}. How
many DFAs can you create for this alphabet?
[Figure: sample DFA diagram with a Start arrow and states 0, 1, 2, 3]
12So now that we know what a DFA looks like, what
does it mean? A DFA accepts a string w in A* if
there is a path from the start state to some
final state such that w is the concatenation of
the labels on the edges of the path. Otherwise
the DFA rejects w. The set of all strings
accepted by a DFA M is called the language of M
and is expressed L(M).
Which strings are accepted by our sample DFA? Is
this set finite or infinite?
How do these DFAs relate to regular languages?
Actually, the class of regular languages is
exactly the same as the class of languages
accepted by DFAs. There is an algorithm for
transforming any RE into a DFA and vice versa.
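Because a DFA has exactly one edge per letter out of every state, checking acceptance is a single walk through the graph. Here is a minimal sketch; the dictionary encoding of the edges and the example machine (which accepts the strings ending in b) are assumptions made for illustration, not the sample DFA from the slide.

```python
def dfa_accepts(delta, start, finals, w):
    """Follow the unique path labeled w from the start state."""
    state = start
    for letter in w:
        state = delta[(state, letter)]    # exactly one edge per letter
    return state in finals

# Hypothetical DFA over A = {a, b}: accepts the strings that end in b.
delta = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 0, (1, "b"): 1,
}
print(dfa_accepts(delta, 0, {1}, "aab"))   # True
print(dfa_accepts(delta, 0, {1}, "aba"))   # False
print(dfa_accepts(delta, 0, {1}, ""))      # False
```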
13Without knowing the algorithm, can we construct
some DFAs for some REs? Sure! Create a DFA to
recognize L((ab)*) over the alphabet A = {a, b}.
What about L(a + b)? L(a* + b*)?
14NFA
The NFA is less restrictive than a DFA. It
differs from a DFA in that each node can have
zero or more edges, and each edge is labeled with
either a letter from A or Λ. The same label may
also appear on more than one edge leaving the
same node; therefore nondeterminism can occur.
If an edge is labeled Λ, then we can travel that
edge without consuming an input letter.
Acceptance and rejection of a string are the
same as for DFAs. NFAs can be easier to
construct because they don't need an edge out of
each node for each letter in the alphabet.
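Since an NFA may offer several paths for the same string, one common way to simulate it is to track the set of all states it could be in. The sketch below assumes a dictionary from (state, label) to a set of target states, with `""` standing for a Λ label; the names and the example machine are hypothetical.

```python
def lambda_closure(edges, states):
    """All states reachable from `states` using only Λ-edges (label "")."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in edges.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def nfa_accepts(edges, start, finals, w):
    """True if some path labeled w leads from start to a final state."""
    current = lambda_closure(edges, {start})
    for letter in w:
        moved = set()
        for q in current:
            moved |= set(edges.get((q, letter), ()))
        current = lambda_closure(edges, moved)
    return bool(current & finals)

# Hypothetical NFA for a* + b*: Λ-edges from 0 to 1 and to 2,
# an a-loop on 1, a b-loop on 2, final states 1 and 2.
edges = {(0, ""): {1, 2}, (1, "a"): {1}, (2, "b"): {2}}
print(nfa_accepts(edges, 0, {1, 2}, "aaa"))   # True
print(nfa_accepts(edges, 0, {1, 2}, ""))      # True
print(nfa_accepts(edges, 0, {1, 2}, "ab"))    # False
```

Missing edges are simply absent from the dictionary, which mirrors the point that an NFA does not need an edge for every letter.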
15How do NFAs relate to regular languages?
Actually, the class of regular languages is
exactly the same as the class of languages
accepted by NFAs. So we have quite a few ways of
representing the regular languages:
Regular expressions
DFAs
NFAs
Is every DFA an NFA? Is every NFA a DFA?
Consider the regular expression aa. Construct
both a DFA and an NFA to represent it.
16Transforming REs into FAs
As promised, there is an algorithm for
transforming an RE into an FA. Start the
algorithm with a machine having a start state, a
single final state, and an edge labeled with the
given RE.
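Apply the following rules until all edges are
labeled with either a letter or Λ.
1) If an edge is labeled with ∅, erase it.
2) Transform any diagram like
     i --(R + S)--> j
   into the diagram
     i --R--> j
     i --S--> j      (two parallel edges)
3) Transform any diagram like
     i --(R · S)--> j
   into the diagram
     i --R--> k --S--> j      (k is a new state)
4) Transform any diagram like
     i --(R*)--> j
   into the diagram
     i --Λ--> k --Λ--> j      (k is a new state
                               with a loop k --R--> k)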
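The four rewriting rules can be sketched as a loop over edges that still carry a compound expression. This is an illustrative sketch under assumptions: expressions are nested tuples (`("sym", "a")`, `("+", R, S)`, `(".", R, S)`, `("*", R)`, `("lambda",)`, `("empty",)`), states are integers with 0 as start and 1 as the single final state, and `""` stands for a Λ label. The small `accepts` helper is included only so the result can be tried out.

```python
from itertools import count

def re_to_nfa(expr):
    """Return (edges, start, final); edges maps (state, label) -> set of states."""
    start, final = 0, 1
    fresh = count(2)                      # supplies new state numbers
    pending = [(start, expr, final)]      # edges still labeled with an RE
    edges = {}
    while pending:
        i, e, j = pending.pop()
        op = e[0]
        if op == "empty":                 # rule 1: erase the edge
            continue
        elif op == "+":                   # rule 2: two parallel edges
            pending += [(i, e[1], j), (i, e[2], j)]
        elif op == ".":                   # rule 3: pass through a new state
            k = next(fresh)
            pending += [(i, e[1], k), (k, e[2], j)]
        elif op == "*":                   # rule 4: loop on a new state
            k = next(fresh)
            pending += [(i, ("lambda",), k), (k, e[1], k), (k, ("lambda",), j)]
        elif op == "sym":                 # a letter: the edge is finished
            edges.setdefault((i, e[1]), set()).add(j)
        elif op == "lambda":              # Λ: the edge is finished
            edges.setdefault((i, ""), set()).add(j)
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return edges, start, final

def accepts(edges, start, final, w):
    """Simulate the NFA by tracking the set of reachable states."""
    def close(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in edges.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    current = close({start})
    for letter in w:
        step = set()
        for q in current:
            step |= edges.get((q, letter), set())
        current = close(step)
    return final in current

# Example expression chosen for illustration: (ab)*
E = ("*", (".", ("sym", "a"), ("sym", "b")))
nfa = re_to_nfa(E)
print(accepts(*nfa, "abab"))   # True
print(accepts(*nfa, "aba"))    # False
```

The result is in general an NFA with Λ-edges, not a DFA.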
18Practice
Now that we know the algorithm, it might be a
bit easier to construct FAs from REs. Try
a + ab. Then ab + bc.
19Transforming FAs into REs
And now an algorithm for transforming FAs into
REs. Assume that we have either a DFA or an NFA.
- Create a new start state s, and draw a new edge
  labeled with Λ from s to the original start
  state.
- Create a new final state f, and draw new edges
  labeled with Λ from all the original final
  states to f.
- For each pair of states i and j that have more
  than 1 edge from i to j, replace all edges by a
  single edge labeled with the regular expression
  formed by the sum of the labels on each of the
  edges from i to j.
- Construct a sequence of new machines by
  eliminating one state at a time until the only
  states remaining are s and f. As each state is
  eliminated, construct a new machine from the
  previous machine as follows:
20Eliminate state k. Let old(i,j) denote the label
on edge <i,j> of the current machine. If there is
no edge <i,j>, then set old(i,j) = ∅. Then, for
each pair of edges <i,k> and <k,j>, where i ≠ k
and j ≠ k, calculate a new edge label, new(i,j),
as follows:
new(i,j) = old(i,j) + old(i,k) old(k,k)* old(k,j)
For all other edges <i,j> where i ≠ k and j ≠ k,
set
new(i,j) = old(i,j)
The states of the new machine are those of the
current machine with state k eliminated. The
edges of the new machine are those edges <i,j>
for which label new(i,j) has been calculated.
After all states except s and f have been
eliminated, we wind up with a single edge <s,f>
labeled with the regular expression representing
the regular language accepted by the original FA.
21Practice the Algorithm
Use the FA to RE algorithm to convert the
following FA into an RE.
[Figure: FA diagram with a Start arrow, states 0, 1, and 2, and edges labeled a; a,b; b; a,b]
22Regular Grammars
We can also describe a regular language in
terms of a regular grammar. A regular grammar is
a grammar where each production takes one of the
following two forms:
S → w, where w is a string of terminals or Λ
S → wT, where T is a nonterminal
We can construct many different grammars to
describe the same language, just as we could
construct many regular expressions to describe a
single language.
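Because every production has at most one nonterminal, and it sits at the right end, deriving strings from a regular grammar is a simple search. The sketch below is illustrative: the dictionary encoding, the convention that nonterminals are single capital letters, the use of `""` for Λ, and the example grammar are all assumptions.

```python
from collections import deque

def derive(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        prefix, nt = form[:-1], form[-1]      # every form looks like wT
        for rhs in productions[nt]:
            new = prefix + rhs
            if not new or not new[-1].isupper():
                # no nonterminal left: new is a string of the language
                if len(new) <= max_len:
                    found.add(new)
            elif len(new) - 1 <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

# Hypothetical grammar: S → Λ | A | B,  A → a | aA,  B → b | bB
grammar = {
    "S": ["", "A", "B"],
    "A": ["a", "aA"],
    "B": ["b", "bB"],
}
print(sorted(derive(grammar, "S", 2)))    # ['', 'a', 'aa', 'b', 'bb']
```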
23Examples
Regular Expression    Regular Grammar
a*                    S → Λ | aS
(a + b)*              S → Λ | aS | bS
a* + b*               S → Λ | A | B
                      A → a | aA
                      B → b | bB
a*b                   S → b | aS
ba*                   S → bA
                      A → Λ | aA
a*bc*                 S → aS | bC
                      C → Λ | cC
ab                    ??????
24NFA to Regular Grammar
We can construct a regular grammar from an NFA
by performing the following steps:
- Rename the states to a set of capital letters.
- The start symbol is the NFA's start state.
- For each transition from I to J labeled with a,
  create the production I → aJ.
- For each transition from I to J labeled with Λ,
  create the production I → J.
- For each final state K, create the production
  K → Λ.
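These steps amount to one pass over the transitions and one pass over the final states. A minimal sketch follows, assuming the states are already named with capital letters, transitions are given as (I, label, J) triples with `""` for a Λ label, and productions are collected in a dictionary; the function name and example NFA are hypothetical.

```python
def nfa_to_grammar(edges, start, finals):
    """Return (start symbol, productions) for the given NFA."""
    productions = {}
    for i, label, j in edges:
        # I → aJ for a letter a, or I → J when the label is Λ ("")
        productions.setdefault(i, []).append(label + j)
    for k in finals:
        productions.setdefault(k, []).append("Λ")     # K → Λ
    return start, productions

# Hypothetical NFA for a*b: an a-loop on S, a b-edge from S to T, T final.
edges = [("S", "a", "S"), ("S", "b", "T")]
start, prods = nfa_to_grammar(edges, "S", ["T"])
for left, rights in prods.items():
    print(left, "→", " | ".join(rights))
# S → aS | bT
# T → Λ
```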