Title: Closure Properties of Regular Languages
Closure Properties of Regular Languages
- Regular languages are closed under many set operations. Let L1 and L2 be regular languages.
- (1) L1 ∪ L2 (the union) is regular.
- (2) L1L2 (the concatenation) is regular.
- (3) L1* (the Kleene star) and L1+ (the Kleene plus) are regular.
- (4) L1^R (the reversed language) is regular.
- (5) The complement of L1 is regular.
- (6) L1 ∩ L2 (the intersection) is regular.
- (7) L1 - L2 (the set subtraction) is regular.
Proof of the Closure Properties
- We can use regular grammars, finite automata (FA), or regular expressions, whichever makes each proof simplest.
- Let r1 and r2 be regular expressions that denote the languages L1 and L2, respectively.
- (1) Clearly, r1 + r2 is a regular expression that denotes the union of L1 and L2. Since every regular expression denotes a regular language, L1 ∪ L2 is regular.
- We can also prove this property constructively as follows. Let G1 = (VN1, VT1, P1, S1) and G2 = (VN2, VT2, P2, S2) be regular grammars that generate L1 and L2, respectively. Without loss of generality, assume that VN1 and VN2 are disjoint, i.e., VN1 ∩ VN2 = ∅. (Otherwise, we can always rename nonterminals to obtain grammars that satisfy this property.) Construct a regular grammar G with a new start symbol S, the production rules S → S1 | S2, and all the rules in P1 and P2. Clearly, L(G) = L1 ∪ L2.
- (2) Clearly, r1r2 is a regular expression that denotes the language L1L2, which means L1L2 is regular.
- (3) Let r1 be a regular expression for L1. Clearly, (r1)* is a regular expression for L1*. Since L1+ = L1L1*, L1+ is regular by property (2). (The identity L1+ = L1* - {ε} holds only when ε is not in L1, so it cannot be used in general.)
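Proof of the Closure Properties (cont'd)
(4) Suppose that an FSA M1 accepts L1. We modify M1 as follows: reverse the direction of every transition, make the old start state the only accepting state, and add a new start state with ε-transitions to the old accepting states. Clearly, the resulting automaton recognizes the reversed language of L1.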
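The modification in part (4) can be sketched in Python as follows. This is a minimal illustration, not the slide's figure: the dictionary encoding of the automaton and the example machine `m1` are assumptions, and the sketch uses a set of start states in place of a new start state with ε-transitions.

```python
from collections import defaultdict

def reverse_nfa(transitions, starts, accepts):
    """Reverse an automaton given as {(state, symbol): set_of_next_states}."""
    reversed_transitions = defaultdict(set)
    for (state, symbol), targets in transitions.items():
        for target in targets:
            reversed_transitions[(target, symbol)].add(state)
    # Old accepting states become start states; old start states become accepting.
    return dict(reversed_transitions), set(accepts), set(starts)

def accepts_string(transitions, starts, accepts, string):
    """Simulate the (possibly nondeterministic) automaton on the input string."""
    current = set(starts)
    for symbol in string:
        current = {t for s in current for t in transitions.get((s, symbol), ())}
    return bool(current & set(accepts))

# Hypothetical M1: accepts the strings over {a, b} that start with "ab".
m1 = ({(0, "a"): {1}, (1, "b"): {2}, (2, "a"): {2}, (2, "b"): {2}}, {0}, {2})
m1_reversed = reverse_nfa(*m1)

print(accepts_string(*m1, "abb"))           # True
print(accepts_string(*m1_reversed, "abb"))  # False
print(accepts_string(*m1_reversed, "bba"))  # True
```

The reversed machine is generally nondeterministic even when M1 is deterministic, which is why the simulation tracks a set of current states.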
Proof of the Closure Properties (cont'd)
- You can also prove part (4) with a regular grammar G1 for L1. Recall that we chose to restrict the production rules of regular grammars to the form A → xB or A → x, where A and B are arbitrary nonterminal symbols and x is a string of terminal symbols or ε. There is another form of regular grammars, where the rules are restricted to the form A → Bx or A → x. If we reverse the right side of each production rule of G1, then the resulting grammar G is in this other form and generates L1^R.
- (5) As for part (4), we modify the state transition graph M1 of a deterministic automaton that recognizes L1 as follows.
- Add the dead state, if it is not shown in the transition graph. (Recall that we usually do not show the dead state for convenience.)
- Change accepting states to non-accepting states and non-accepting states to accepting states.
- (6) Since L1 ∩ L2 = complement(complement(L1) ∪ complement(L2)), and regular languages are closed under union and complementation (properties (1) and (5) above), L1 ∩ L2 is regular.
- (7) Since L1 - L2 = L1 ∩ complement(L2), it is regular by properties (5) and (6) above.
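As a rough illustration of parts (5) to (7), the Python sketch below complements a DFA by first adding the dead state and then swapping accepting and non-accepting states. It then checks the two set identities on all short strings. The tuple encoding of a DFA and the machines `M1` and `M2` are assumptions made for this example only.

```python
from itertools import product

def complement(dfa, dead="dead"):
    """Complement a DFA given as (transitions, states, alphabet, start, accepting)."""
    transitions, states, alphabet, start, accepting = dfa
    total = dict(transitions)
    all_states = set(states) | {dead}
    for state in all_states:          # add the dead state and the missing transitions
        for symbol in alphabet:
            total.setdefault((state, symbol), dead)
    return total, all_states, alphabet, start, all_states - set(accepting)

def accepts(dfa, string):
    transitions, _, _, state, accepting = dfa
    for symbol in string:
        state = transitions.get((state, symbol))
        if state is None:             # a missing transition is the implicit dead state
            return False
    return state in accepting

# Hypothetical examples over {a, b}: M1 accepts strings starting with "a"
# (dead state not shown), M2 accepts strings of even length.
M1 = ({(0, "a"): 1, (1, "a"): 1, (1, "b"): 1}, {0, 1}, "ab", 0, {1})
M2 = ({(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}, {0, 1}, "ab", 0, {0})
not_M1, not_M2 = complement(M1), complement(M2)

for n in range(5):
    for w in map("".join, product("ab", repeat=n)):
        in_l1, in_l2 = accepts(M1, w), accepts(M2, w)
        assert accepts(not_M1, w) == (not in_l1)              # property (5)
        # (6): w is in L1 ∩ L2 iff w is not in complement(L1) ∪ complement(L2)
        assert (in_l1 and in_l2) == (not (accepts(not_M1, w) or accepts(not_M2, w)))
        # (7): w is in L1 - L2 iff w is in L1 ∩ complement(L2)
        assert (in_l1 and not in_l2) == (in_l1 and accepts(not_M2, w))
```

Skipping the dead state would make the complement wrong: without it, the string "ba" would be rejected by both M1 and its supposed complement.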
Properties of Context-free Languages
- Let L1 and L2 be CFLs.
- (1) L1 ∪ L2 (the union) is a CFL.
- (2) L1L2 (the concatenation) is a CFL.
- (3) L1* (the Kleene star) and L1+ (the Kleene plus) are CFLs.
- (4) L1^R (the reversed language) is a CFL.
- (5) L1 ∩ L2 (the intersection) is not necessarily a CFL.
- (6) The complement of L1 is not necessarily a CFL.
Proof of the Context-free Language Properties
- Let G1 = (VN1, VT1, P1, S1) and G2 = (VN2, VT2, P2, S2) be CF grammars that generate L1 and L2, respectively. Without loss of generality, assume that VN1 and VN2 are disjoint, i.e., VN1 ∩ VN2 = ∅. (Otherwise, we can rename nonterminals.) In each construction below, S is a new start symbol.
- (1) Construct a CFG G by merging the rules of grammars G1 and G2 and adding the new rules S → S1 | S2. (This is the same technique as for regular languages.)
- (2) Construct a CFG G by merging the rules of G1 and G2 and adding the new rule S → S1S2.
- (3) For L1*, add the rules S → S1S | ε to grammar G1. For L1+, add the rules S → S1S | S1.
- (4) Construct a CFG from G1 by changing each rule A → α to A → α^R, i.e., reverse the right side of each production rule.
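The constructions (1) to (4) can be tried out with a small Python sketch. Everything about the encoding is an assumption for illustration: a grammar is a dictionary from a nonterminal to a list of right sides, every symbol is a single character, and the sample grammars `G1` and `G2` are invented. The helper `language` only works for grammars without ε-rules, so the star construction (which differs from the plus construction only by the alternative S → ε) is left out.

```python
def union(g1, s1, g2, s2, start="S"):
    return {**g1, **g2, start: [s1, s2]}            # S → S1 | S2

def concatenation(g1, s1, g2, s2, start="S"):
    return {**g1, **g2, start: [s1 + s2]}           # S → S1S2

def plus(g1, s1, start="S"):
    return {**g1, start: [s1 + start, s1]}          # S → S1S | S1

def reversal(g1):
    return {head: [rhs[::-1] for rhs in bodies] for head, bodies in g1.items()}

def language(grammar, start, max_len):
    """Terminal strings of length <= max_len; assumes the grammar has no ε-rules."""
    seen, found, todo = set(), set(), [start]
    while todo:
        form = todo.pop()
        if len(form) > max_len or form in seen:     # forms never shrink, so prune
            continue
        seen.add(form)
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:                               # no nonterminal left
            found.add(form)
            continue
        for rhs in grammar[form[i]]:                # expand the leftmost nonterminal
            todo.append(form[:i] + rhs + form[i + 1:])
    return found

G1 = {"A": ["aAb", "ab"]}   # hypothetical: a^n b^n, n >= 1
G2 = {"B": ["cB", "c"]}     # hypothetical: c^n, n >= 1

print(sorted(language(union(G1, "A", G2, "B"), "S", 4)))
# ['aabb', 'ab', 'c', 'cc', 'ccc', 'cccc']
print(sorted(language(concatenation(G1, "A", G2, "B"), "S", 4)))  # ['abc', 'abcc']
print(sorted(language(plus(G1, "A"), "S", 4)))                    # ['aabb', 'ab', 'abab']
print(sorted(language(reversal(G1), "A", 4)))                     # ['ba', 'bbaa']
```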
Proof of the Context-free Language Properties (cont'd)
- (5) We know that L1 = {a^i b^i c^j | i, j ≥ 0} and L2 = {a^k b^n c^n | k, n ≥ 0} are CFLs. But L1 ∩ L2 = {a^i b^i c^i | i ≥ 0} is not a CFL.
- (6) Suppose that CFLs are closed under complementation. Since CFLs are closed under union (property (1)), and L1 ∩ L2 = complement(complement(L1) ∪ complement(L2)), CFLs would then be closed under intersection. This contradicts property (5).
Minimizing the Number of ε-Production Rules
- Theorem. Given an arbitrary CFG G, we can construct a CFG G' such that L(G') = L(G), and if ε is not in L(G), then G' does not have any ε-production rule. If ε ∈ L(G), then S → ε is the only ε-production rule of G'.
- Proof (an algorithm). Let G = (VT, VN, P, S), and let A, B ∈ VN. We construct a CFG G' = (VT, VN, P', S) from G by the following steps.
- (1) Find the set W of all nonterminals of G which derive ε as follows.
    W0 = {A | A ∈ VN and A → ε is in P}
    Do W(i+1) = Wi ∪ {A | A ∈ VN and A → α is in P, for some α ∈ (Wi)*}
    until (W(i+1) = Wi)
    W = Wi   // W contains all nonterminal symbols from which ε can be derived.
- (2) Delete all ε-productions from P. Call this new set of productions P1.
- (3) Modify P1 to P' as follows. If a production A → α is in P1, then put the rules A → α and A → β into P', for all β (β ≠ ε) which are obtained from α by deleting one or more nonterminals in the set W constructed by step (1).
- (4) If S is in W, then add S → ε to P'.
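Steps (1) to (4) can be sketched in Python as follows. The encoding is an assumption for illustration: a grammar is a dictionary from a nonterminal to a list of right sides, each symbol is one character, and the empty string stands for ε. The test grammar is the one used in the example on the next slide.

```python
from itertools import product

def nullable(grammar):
    """Step (1): the set W of nonterminals that derive ε."""
    w = set()
    while True:
        # A joins W when some right side consists only of symbols already in W
        # (the empty right side qualifies immediately, which yields W0).
        bigger = w | {a for a, bodies in grammar.items()
                      if any(all(s in w for s in body) for body in bodies)}
        if bigger == w:
            return w
        w = bigger

def minimize_epsilon_rules(grammar, start):
    w = nullable(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # Step (3): every symbol in W may be kept or deleted.
            choices = [(s, "") if s in w else (s,) for s in body]
            for pick in product(*choices):
                beta = "".join(pick)
                if beta and beta not in new_bodies:   # step (2): never emit A → ε
                    new_bodies.append(beta)
        result[head] = new_bodies
    if start in w:
        result[start].append("")                      # step (4): S → ε
    return result

G = {"S": ["ADC", "EFg"], "A": ["aA", ""], "D": ["FGH", "b"], "C": ["c", ""],
     "E": ["a"], "F": ["f", ""], "G": ["Gg", "H"], "H": ["h", ""]}
G_prime = minimize_epsilon_rules(G, "S")

print(sorted(nullable(G)))                             # ['A', 'C', 'D', 'F', 'G', 'H', 'S']
print(" | ".join(b or "ε" for b in G_prime["S"]))
# ADC | AD | AC | A | DC | D | C | EFg | Eg | ε
```

The sketch does not remove unit rules such as A → A that the deletion step can produce in other grammars; that clean-up is outside the algorithm above.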
Minimizing the Number of ε-Production Rules (example)
Convert the following CFG G to another CFG G' such that L(G') = L(G) and G' has the smallest possible number of ε-production rules.
G:   S → ADC | EFg    A → aA | ε    D → FGH | b    C → c | ε
     E → a            F → f | ε     G → Gg | H     H → h | ε
Computing W:
     W0 = {A, C, F, H}
     W1 = W0 ∪ {G} = {A, C, F, G, H}
     W2 = W1 ∪ {D} = {A, C, D, F, G, H}
     W3 = W2 ∪ {S} = {A, C, D, F, G, H, S}
     W4 = W3 ∪ ∅ = {A, C, D, F, G, H, S}
P1:  S → ADC | EFg    A → aA    D → FGH | b    C → c
     E → a            F → f     G → Gg | H     H → h
P':  S → ADC | AD | AC | DC | A | D | C | ε | EFg | Eg
     A → aA | a
     D → FGH | FG | FH | GH | F | G | H | b
     C → c            E → a     F → f
     G → Gg | g | H   H → h
Eliminating Useless Symbols from a CFG
- Lemma 1. Given a CFG G = (VT, VN, P, S), we can construct an equivalent CFG G' = (VT, VN', P', S), such that every nonterminal symbol A in VN' derives a string x ∈ (VT)*.
- Proof. Let OLDV and NEWV be sets of nonterminals, and A be an arbitrary nonterminal. We construct VN' and P' as follows.
    OLDV = ∅;  NEWV = {A | A → w is in P for some w ∈ (VT)*};
    while (OLDV ≠ NEWV) do {
        OLDV = NEWV;
        NEWV = OLDV ∪ {A | A → α is in P for some α in (VT ∪ OLDV)*};
    }
    VN' = NEWV;  P' = {A → α | A → α is in P and α ∈ (VN' ∪ VT)*}
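A minimal Python sketch of the Lemma 1 construction follows. The encoding is an assumption for illustration: a grammar maps each nonterminal to a list of right sides, symbols are single characters, uppercase letters are nonterminals, and the empty string stands for ε. The grammar `G` is the one from the worked example later in this section.

```python
def generating(grammar):
    """NEWV of Lemma 1: nonterminals that derive some terminal string."""
    old, new = None, set()
    while old != new:
        old = new
        # A is added when some right side uses only terminals and symbols in OLDV.
        new = old | {head for head, bodies in grammar.items()
                     if any(all(not s.isupper() or s in old for s in body)
                            for body in bodies)}
    return new

def lemma1(grammar):
    """Keep only rules whose symbols are all terminals or generating nonterminals."""
    keep = generating(grammar)
    return {head: [body for body in bodies
                   if all(not s.isupper() or s in keep for s in body)]
            for head, bodies in grammar.items() if head in keep}

G = {"S": ["AD", "EFg"], "A": ["aGD"], "D": ["FGd"], "C": ["cCEc"],
     "E": ["Ee"], "F": ["Ff", ""], "G": ["Gg", "g"], "H": ["hH", "h"]}

print(sorted(generating(G)))  # ['A', 'D', 'F', 'G', 'H', 'S']
print(lemma1(G)["S"])         # ['AD']
```

Starting from the empty set instead of the slide's initial NEWV gives the same result, because the first pass through the loop collects exactly the nonterminals with a terminal-only right side.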
Eliminating Useless Symbols from a CFG (cont'd)
Lemma 2. Given a CFG G = (VT, VN, P, S), we can construct an equivalent CFG G' = (VT', VN', P', S), such that, for each symbol X ∈ VT' ∪ VN', the start symbol derives αXβ, for some α, β ∈ (VT' ∪ VN')*, i.e., S can derive a sentential form (a string of terminals and nonterminals) which contains symbol X.
Proof. The following algorithm computes VT', VN' and P'.
(1) Let VT' and VN' be the empty sets.
(2) Put S into VN'.
(3) If A ∈ VN is put into VN' and A → α1 | α2 | ... | αn, then all nonterminals in αi, 1 ≤ i ≤ n, are put into VN' and all terminals in αi are put into VT'.
(4) Repeat (3) until there is no symbol to be added to VN'.
(5) Let P' contain all the productions in P except for the ones which have a symbol not in VT' ∪ VN'.
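The Lemma 2 algorithm amounts to a reachability search from S. The Python sketch below assumes the same illustrative encoding as before (single-character symbols, uppercase letters as nonterminals, the empty string for ε); the grammar `P1` is the result of Step 1 in the worked example of this section.

```python
def reachable(grammar, start):
    """All symbols (terminals and nonterminals) that appear in some
    sentential form derivable from the start symbol."""
    seen, todo = {start}, [start]
    while todo:
        head = todo.pop()
        for body in grammar.get(head, []):
            for symbol in body:
                if symbol not in seen:
                    seen.add(symbol)
                    if symbol.isupper():      # only nonterminals are expanded further
                        todo.append(symbol)
    return seen

def lemma2(grammar, start):
    """Drop the rules of unreachable nonterminals. A rule with a reachable
    left side contains only reachable symbols, so filtering by left side suffices."""
    keep = reachable(grammar, start)
    return {head: list(bodies) for head, bodies in grammar.items() if head in keep}

P1 = {"S": ["AD"], "A": ["aGD"], "D": ["FGd"],
      "F": ["Ff", ""], "G": ["Gg", "g"], "H": ["hH", "h"]}

symbols = reachable(P1, "S")
print(sorted(s for s in symbols if s.isupper()))      # ['A', 'D', 'F', 'G', 'S']
print(sorted(s for s in symbols if not s.isupper()))  # ['a', 'd', 'f', 'g']
print(sorted(lemma2(P1, "S")))                        # ['A', 'D', 'F', 'G', 'S']
```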
Eliminating Useless Symbols from a CFG (cont'd)
- Theorem. Given an arbitrary CFG G = (VT, VN, P, S), we can construct an equivalent CFG G' = (VT', VN', P', S), such that,
- (1) for each A ∈ VN', A ⇒* x for some x ∈ (VT')* (i.e., A derives a terminal string or ε), and
- (2) for each X ∈ VT' ∪ VN', S ⇒* αXβ, for some α, β ∈ (VN' ∪ VT')* (i.e., the start symbol can derive a sentential form which contains X).
- Proof. Use Lemma 1 and then Lemma 2, in that order.
Eliminating Useless Symbols from a CFG (example)
Example. Eliminate useless symbols from the following CFG G.
G:   S → AD | EFg    A → aGD    D → FGd    C → cCEc
     E → Ee    F → Ff | ε    G → Gg | g    H → hH | h
Step 1: Apply Lemma 1 to find the set of nonterminals VN' such that every nonterminal symbol in VN' derives a string x ∈ (VT)*.
     OLDV = ∅;  NEWV = {F, G, H}
     OLDV = NEWV;  NEWV = OLDV ∪ {D} = {D, F, G, H}
     OLDV = NEWV;  NEWV = OLDV ∪ {A} = {A, D, F, G, H}
     OLDV = NEWV;  NEWV = OLDV ∪ {S} = {A, D, F, G, H, S}
     OLDV = NEWV;  NEWV = OLDV ∪ ∅ = {A, D, F, G, H, S}
     VN' = NEWV = {A, D, F, G, H, S}
Find the set of rules P'.
     P' = {S → AD    A → aGD    D → FGd    F → Ff | ε    G → Gg | g    H → hH | h}
- P' = {S → AD    A → aGD    D → FGd    F → Ff | ε    G → Gg | g    H → hH | h}
- Step 2: Find the set of symbols V = VT' ∪ VN' such that each symbol in V can be derived starting from S.
     VT' = VN' = ∅   // initialize with empty set
     VN' = VN' ∪ {S} = {S};  VT' = VT' ∪ ∅ = ∅
     VN' = VN' ∪ {A, D} = {S, A, D};  VT' = VT' ∪ ∅ = ∅
     VN' = VN' ∪ {G, F} = {S, A, D, G, F};  VT' = VT' ∪ {a, d} = {a, d}
     VN' = VN' ∪ ∅ = {S, A, D, F, G};  VT' = VT' ∪ {g, f} = {a, d, g, f}
     VN' = VN' ∪ ∅ = {S, A, D, F, G};  VT' = VT' ∪ ∅ = {a, d, g, f}
- Cleaned set of rules:
     P'' = {S → AD    A → aGD    D → FGd    F → Ff | ε    G → Gg | g}
Eliminating Useless Symbols from a CFG (cont'd)
Remark. Notice that applying Lemma 2 first and then Lemma 1 may fail to eliminate all useless productions.
Example. Consider a grammar with rules P = {S → AB | a    A → a}. By applying Lemma 1 first, we have P' = {S → a    A → a}; then applying Lemma 2, we have P'' = {S → a}. However, if we apply Lemma 2 first, we have P' = {S → AB | a    A → a}. Then applying Lemma 1, we have P'' = {S → a    A → a}, which still has a useless production.
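The effect of the order can be reproduced with a compact Python sketch of both lemmas. The encoding (single-character symbols, uppercase nonterminals) and the function names `lemma1` and `lemma2` are assumptions for illustration.

```python
def lemma1(grammar):
    """Remove nonterminals that derive no terminal string, and rules that use them."""
    old, keep = None, set()
    while old != keep:
        old = keep
        keep = old | {h for h, bodies in grammar.items()
                      if any(all(not s.isupper() or s in old for s in b) for b in bodies)}
    return {h: [b for b in bodies if all(not s.isupper() or s in keep for s in b)]
            for h, bodies in grammar.items() if h in keep}

def lemma2(grammar, start):
    """Remove nonterminals that cannot be reached from the start symbol."""
    seen, todo = {start}, [start]
    while todo:
        for body in grammar.get(todo.pop(), []):
            for s in body:
                if s.isupper() and s not in seen:
                    seen.add(s)
                    todo.append(s)
    return {h: list(bodies) for h, bodies in grammar.items() if h in seen}

P = {"S": ["AB", "a"], "A": ["a"]}

print(lemma2(lemma1(P), "S"))  # {'S': ['a']}
print(lemma1(lemma2(P, "S")))  # {'S': ['a'], 'A': ['a']}
```

In the second call A is still reachable when Lemma 2 runs, because the rule S → AB has not yet been removed; it only becomes unreachable after Lemma 1 deletes that rule.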
Ambiguous Context-free Grammars
- There are two kinds of ambiguity in a language.
- Lexical ambiguity (or semantic ambiguity): A symbol or an expression has more than one meaning (e.g., story, saw).
- Syntactic ambiguity (or structural ambiguity): An expression can be parsed in two different ways. A CFG is ambiguous if its language has a string for which there is more than one parse tree.
For a given context-free grammar G and a string x, the parse tree shows how x is derived with the rules of G (see an example on the next slide). In a programming language, different parse trees can give different object code. In this course we will only study syntactic ambiguity of context-free grammars.
Example 1 (in natural language). "A man entered a room with a picture" can be interpreted in two different ways.
Ambiguous Context-free Grammars (cont'd)
- Example 2 (in formal language). The following context-free grammar is ambiguous, because it has two parse trees, shown in Figures (a) and (b) below, for string p ∨ q ∧ r. One tree groups the string as (p ∨ q) ∧ r and the other as p ∨ (q ∧ r).
     G:  S → S∨S | S∧S | ¬S | A    A → p | q | r
Some Techniques for Designing Unambiguous CFG
(1) Use parentheses such that each derivation tree generates a unique string. Notice that this technique changes the language by introducing new terminal symbols, the parentheses.
Example.
     Ambiguous G1:    S → S∨S | S∧S | ¬S | A    A → p | q | r
     Unambiguous G2:  S → (S∨S) | (S∧S) | ¬(S) | A    A → p | q | r
Some Techniques for Designing Unambiguous CFG (cont'd)
- (2) Modify the production rules that cause the ambiguity.
- Examples. (a) Grammar G3 below is clearly ambiguous, because for string bcb it can generate either the left b first and then the right b, or vice versa. Grammar G4 does not have this possibility, because it generates the left b's first, if any.
     Ambiguous G3:    S → bS | Sb | c
     Unambiguous G4:  S → bS | A    A → Ab | c
Figure (a). Ambiguity of G3
Figure (b). Unambiguous G4.
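One way to see the difference between G3 and G4 is to count parse trees. The Python sketch below is an illustrative counter, not a general parser: it assumes single-character symbols, a grammar given as a dictionary from nonterminals to lists of right sides, no ε-rules, and no cycles of unit rules (both G3 and G4 satisfy these assumptions).

```python
from functools import lru_cache

def count_parse_trees(grammar, start, string):
    """Number of distinct parse trees of `string` from `start`."""
    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        if symbol not in grammar:                       # terminal symbol
            return 1 if string[i:j] == symbol else 0
        return sum(splits(body, i, j) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def splits(body, i, j):
        """Ways for the symbols of `body` to derive string[i:j], left to right."""
        if len(body) == 1:
            return trees(body[0], i, j)
        # Without ε-rules every symbol covers at least one character, so the
        # first symbol ends at k > i and leaves enough room for the rest.
        return sum(trees(body[0], i, k) * splits(body[1:], k, j)
                   for k in range(i + 1, j - len(body) + 2))

    return trees(start, 0, len(string))

G3 = {"S": ["bS", "Sb", "c"]}
G4 = {"S": ["bS", "A"], "A": ["Ab", "c"]}

print(count_parse_trees(G3, "S", "bcb"))  # 2
print(count_parse_trees(G4, "S", "bcb"))  # 1
```

A count above 1 for a single string is enough to show that a grammar is ambiguous; a count of 1 on a few strings is only evidence, not a proof, that it is unambiguous.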
Some Techniques for Designing Unambiguous CFG (cont'd)
- (b) The following grammar G5 is ambiguous, since it can generate ε in two ways. We eliminate this possibility by applying the technique of reducing ε-production rules. Grammar G6 is the result.
     G5:  S → B | D    B → bBc | ε    D → dDe | ε
     G6:  S → B | D | ε    B → bBc | bc    D → dDe | de
- (c) Grammar G1 can be modified in two different ways to make it unambiguous. Notice that for G7 we used the same technique as for Example (a) above.
     G7:  S → A∨S | A∧S | ¬S | A    A → p | q | r
     G8:  S → D∨S | D    D → C∧D | C    C → ¬C | A    A → p | q | r
- For G8 we set up a precedence rule such that ∨, if any, is derived (by S) first, then ∧ (by D), and then ¬ (by C), in that order from the top of the parse tree. The later an operator is derived, the higher precedence it has over the others.
Known Facts about Ambiguous Context-free Grammars
- There is no algorithm that can tell whether an arbitrary CFG is ambiguous or not.
- There are so-called inherently ambiguous context-free languages, for which every CFG is ambiguous. Here is an example: {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}.
- There is no algorithm that can convert an arbitrary ambiguous CFG, whose language is not inherently ambiguous, to an unambiguous one.
Normal Forms of Context-free Grammars
When we investigate context-free grammars and their languages, it is sometimes convenient to have the right side of each production rule meet a certain form. Such a form is called a normal form. There are two normal forms for context-free grammars: Chomsky Normal Form (CNF) and Greibach Normal Form (GNF).
Let G = (VN, VT, P, S) be a context-free grammar. Grammar G is in CNF if all the production rules of the grammar are of the form A → BC or A → a, where A, B, C ∈ VN and a ∈ VT. A context-free grammar is in GNF if every production rule of the grammar is of the form A → aα, where A ∈ VN, a ∈ VT, and α ∈ (VN)*. Notice that α is a string of nonterminal symbols or the null string.
We can show that every context-free grammar whose language does not contain ε can be converted to CNF and GNF. (Recall that we can eliminate all ε-production rules from a given context-free grammar if its language does not contain ε.) The following example shows how to convert a context-free grammar to CNF. We can easily generalize the idea. Converting a context-free grammar to GNF is quite involved (see the text, Chapter 6). We shall not study the proof.
Converting a Context-free Grammar to CNF (example)
- Suppose that a context-free grammar has a production rule A → aBCDbE, which is not in CNF. We introduce new nonterminal symbols and production rules in CNF such that A can derive the right side string aBCDbE as follows.
     A → A1B1    A1 → a    // and we let B1 derive BCDbE as follows
     B1 → BC1              // and we let C1 derive CDbE as follows
     C1 → CD1              // and we let D1 derive DbE as follows
     D1 → DE1              // and we let E1 derive bE as follows
     E1 → F1E    F1 → b    // F1 derives the terminal b
Example. Convert the following context-free grammar to CNF.
     S → AaBCb    A → abb    B → aC    C → aCb | ac
Answer.
     S → AA1    A1 → A2A3    A2 → a    A3 → BA4    A4 → CA5    A5 → b
     A → B1B2    B1 → a    B2 → B3B4    B3 → b    B4 → b
     B → C1C    C1 → a
     C → D1D2 | E1E2    D1 → a    D2 → CD3    D3 → b    E1 → a    E2 → c
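The idea of the example generalizes to the following Python sketch. It is an illustration under stated assumptions, not the textbook procedure: the input grammar uses single-character symbols with uppercase nonterminals, has no ε-rules and no unit rules A → B, and the fresh names X1, X2, ... (which differ from the names A1, B1, ... used above) are assumed not to occur in the input. The helpers `is_cnf` and `derives` are hypothetical checks added only to test the result.

```python
from functools import lru_cache

def to_cnf(grammar):
    cnf, counter = {}, 0

    def fresh(bodies):
        """Create a new nonterminal with the given right sides."""
        nonlocal counter
        counter += 1
        cnf[f"X{counter}"] = bodies
        return f"X{counter}"

    def lift(symbol):
        # A terminal inside a long right side is replaced by a new nonterminal.
        return symbol if symbol.isupper() else fresh([(symbol,)])

    def binarize(symbols):
        """Turn a right side of length >= 2 into a pair of nonterminals."""
        first, rest = lift(symbols[0]), symbols[1:]
        second = lift(rest[0]) if len(rest) == 1 else fresh([binarize(rest)])
        return (first, second)

    for head, bodies in grammar.items():
        cnf[head] = [(b,) if len(b) == 1 else binarize(tuple(b)) for b in bodies]
    return cnf

def is_cnf(cnf):
    return all((len(b) == 1 and not b[0].isupper()) or
               (len(b) == 2 and all(s.isupper() for s in b))
               for bodies in cnf.values() for b in bodies)

def derives(cnf, start, string):
    """Membership test for a CNF grammar by recursion over substrings."""
    @lru_cache(maxsize=None)
    def d(symbol, i, j):
        return any((len(b) == 1 and string[i:j] == b[0]) or
                   (len(b) == 2 and any(d(b[0], i, k) and d(b[1], k, j)
                                        for k in range(i + 1, j)))
                   for b in cnf[symbol])
    return d(start, 0, len(string))

G = {"S": ["AaBCb"], "A": ["abb"], "B": ["aC"], "C": ["aCb", "ac"]}
cnf = to_cnf(G)

print(is_cnf(cnf))                      # True
print(derives(cnf, "S", "abbaaacacb"))  # True
```

The string abbaaacacb is the shortest one in the language: A gives abb, then a, B gives aac, C gives ac, and finally b.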