Title: CSE 3813 Introduction to Formal Languages and Automata
- Chapter 8
- Properties of Context-free Languages
- These class notes are based on material from our
textbook, An Introduction to Formal Languages and
Automata, 4th ed., by Peter Linz, published by
Jones and Bartlett Publishers, Inc., Sudbury, MA,
2006. They are intended for classroom use only
and are not a substitute for reading the textbook.
The pumping lemma for context-free languages
- Suppose you have a CFG G in which the variable A has two different rules that derive two different strings, e.g.,
- (1) $S \to vAz$
- (2) $A \to wAy$
- (3) $A \to x$
- Applying rule 2 recursively, we can use these rules to generate the following derivation:
- $S \Rightarrow vAz \Rightarrow vwAyz \Rightarrow vwwAyyz \Rightarrow vwwwAyyyz \Rightarrow \cdots \Rightarrow vw^nxy^nz$
The pumping lemma for CFLs
- Of course, we can apply rule 3 at any point along the way to bring the process to a halt. Thus, the following strings are all legitimate strings in the language:
- vxz, vwxyz, vwwxyyz, vwwwxyyyz, etc.
- In fact, with rules 2 and 3 in the grammar, there is no way to prevent the language from containing an infinite number of strings of the form $vw^nxy^nz$, for $n \ge 0$.
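The derivation above can be replayed mechanically. This Python sketch is only an illustration under stated assumptions: the hypothetical helpers `apply` and `derive` treat v, w, x, y, z as single terminal symbols and rewrite the first occurrence of a rule's left side, which is enough here because every sentential form contains at most one variable.

```python
# Hypothetical sketch: v, w, x, y, z are single terminal symbols here.
RULES = {
    1: ("S", "vAz"),
    2: ("A", "wAy"),
    3: ("A", "x"),
}


def apply(form, rule):
    """Rewrite the first occurrence of the rule's left side with its right side."""
    lhs, rhs = RULES[rule]
    if lhs not in form:
        raise ValueError(f"rule {rule} does not apply to {form!r}")
    return form.replace(lhs, rhs, 1)


def derive(n):
    """Sentential forms for: rule 1 once, rule 2 n times, then rule 3."""
    forms = [apply("S", 1)]
    for _ in range(n):
        forms.append(apply(forms[-1], 2))
    forms.append(apply(forms[-1], 3))
    return ["S"] + forms


print(" => ".join(derive(2)))              # S => vAz => vwAyz => vwwAyyz => vwwxyyz
print([derive(n)[-1] for n in range(4)])   # ['vxz', 'vwxyz', 'vwwxyyz', 'vwwwxyyyz']
```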
The pumping lemma for CFLs
- Remember the definition of Chomsky Normal Form grammars: a CFG is in Chomsky Normal Form (CNF) if every production is of one of these two types, where A, B, C are variables and a is a terminal:
- $A \to BC$
- $A \to a$
- Remember also that we can put any CFG into CNF (omitting the null string, if it belongs to the original language).
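To make the two CNF production shapes concrete, here is a small checker. It is a sketch, not a textbook algorithm: the grammar encoding (a dict from each variable to a list of right-hand-side tuples) and the convention that uppercase symbols are variables are assumptions made for illustration, and `is_cnf` is a hypothetical name.

```python
def is_variable(symbol):
    """Assumed convention: uppercase symbols are variables, all others terminals."""
    return symbol.isupper()


def is_cnf(grammar):
    """True if every production has the form A -> BC or A -> a."""
    for bodies in grammar.values():
        for body in bodies:
            two_vars = len(body) == 2 and all(is_variable(s) for s in body)
            one_term = len(body) == 1 and not is_variable(body[0])
            if not (two_vars or one_term):
                return False
    return True


# S -> AB | AC, C -> SB, A -> a, B -> b generates { a^n b^n : n >= 1 }
cnf_grammar = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
# S -> aSb | ab generates the same language but is not in CNF
plain_grammar = {"S": [("a", "S", "b"), ("a", "b")]}

print(is_cnf(cnf_grammar), is_cnf(plain_grammar))  # True False
```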
The pumping lemma for CFLs
- If a grammar is in CNF, then its derivation tree will be binary; that is, every node will have at most two children. Why? There are only three possibilities:
- (1) The node represents the first type of rule above, in which a single variable produces two variables.
- (2) The node represents the second type of rule above, in which a single variable produces a single terminal.
- (3) The node is a terminal node and so has no children.
The pumping lemma for CFLs
- A path in a binary tree is either empty, or consists of a node, one of its descendants, and all of the nodes in between.
- The length of a path is the number of nodes it contains. (We will use this definition in this class; however, most of the time length and height are measured in terms of the number of edges, not the number of nodes.)
- The height of a binary tree is the length of its longest path.
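Because this class counts nodes rather than edges, it can help to see the definition as code. The sketch below assumes a hypothetical encoding of a tree as a tuple `(label, child, ...)`; `height` returns the number of nodes on the longest root-to-leaf path, so a single node has height 1 and an empty tree has height 0.

```python
def height(tree):
    """Height measured in nodes, as defined above (an empty tree has height 0)."""
    if tree is None:
        return 0
    _label, *children = tree
    return 1 + max((height(c) for c in children), default=0)


# Derivation tree of "ab" in the CNF grammar S -> AB, A -> a, B -> b
tree = ("S", ("A", ("a",)), ("B", ("b",)))
print(height(tree))  # 3: the longest path is S, A, a
```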
The pumping lemma for CFLs
- You could create a very tall binary tree by having all branches be unary.
- You can create the shortest possible binary tree by having all of its branches be binary, except possibly for some or all of the branches at the bottom level of the tree.
The pumping lemma for CFLs
- What is the smallest height possible in a binary tree of 7 nodes? How many leaf nodes does it have?
- Answer: height 3; number of leaves 4.
The pumping lemma for CFLs
- What is the smallest height possible in a binary tree of 15 nodes? How many leaf nodes does it have?
- Answer: height 4; number of leaves 8.
The pumping lemma for CFLs
- What is the smallest height possible in a binary tree of 31 nodes? How many leaf nodes does it have?
- Answer: height 5; number of leaves 16.
The pumping lemma for CFLs
- What is the smallest height possible in a binary tree of $2^n - 1$ nodes? How many leaf nodes does it have?
- Answer: height $n$; number of leaves $2^{n-1}$.
The pumping lemma for CFLs
- Note the pattern here:
- In a completely filled binary tree with $2^n - 1$ nodes, half of the nodes (rounding up) will be leaves. That is, $2^n / 2$ nodes will be leaf nodes, and we can rewrite $2^n / 2$ as $2^{n-1}$.
- This leads us to the following lemma.
The pumping lemma for CFLs
- Lemma
- For any $h \ge 1$, a binary tree that has more than $2^{h-1}$ leaf nodes must have a height greater than $h$.
- Example
- If a binary tree has 17 leaf nodes, can it have a height of 5?
- No. A complete binary tree of height 5 has only $2^4 = 16$ leaf nodes, so a binary tree with 17 leaves must have a height greater than 5.
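The answers to the questions above and the lemma itself reduce to the formula $2^{h-1}$ for the largest number of leaves a binary tree of height h (counted in nodes) can have. A minimal Python sketch, with hypothetical function names, that tabulates the full trees of heights 3 to 5 and finds the least height for 16 and 17 leaves:

```python
def max_leaves(h):
    """Most leaves a binary tree of height h (counted in nodes) can have."""
    return 2 ** (h - 1)


def min_height(leaves):
    """Smallest height h with max_leaves(h) >= leaves."""
    h = 1
    while max_leaves(h) < leaves:
        h += 1
    return h


for h in (3, 4, 5):
    # height, node count, and leaf count of a completely filled tree
    print(h, 2 ** h - 1, max_leaves(h))   # 3 7 4 / 4 15 8 / 5 31 16
print(min_height(16), min_height(17))     # 5 6
```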
The pumping lemma for CFLs
- Here is the point of all this:
- If the height of the derivation tree for a given string in the language is $h$, then its longest path has $h$ nodes, and every node on that path except the final terminal leaf is a variable. If the grammar has fewer than $h - 1$ variables, then by the pigeonhole principle at least one variable must recur on that path in the derivation of this string.
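The pigeonhole step can be illustrated by scanning a root-to-leaf path for the first variable that occurs twice. This is only a sketch: the path is given as a list of node labels, uppercase labels are assumed to be variables, and `first_repeat` is a hypothetical helper. The example path has four variable nodes but uses only the three variables S, A, B, so a repeat is guaranteed.

```python
def first_repeat(path):
    """Return (i, j, label) for the first variable that recurs along path, else None.

    Assumes uppercase labels are variables; the final node is the terminal leaf.
    """
    seen = {}
    for j, label in enumerate(path):
        if not label.isupper():
            continue
        if label in seen:
            return seen[label], j, label
        seen[label] = j
    return None


print(first_repeat(["S", "A", "B", "A", "a"]))  # (1, 3, 'A')
```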
The pumping lemma for CFLs
- For a variable to recur farther down in the same path, it must be either:
- self-recursive (e.g., $A \to aA$), or
- path-recursive (e.g., $A \to aB$ and $B \to bA$).
- In either case, this variable may be pumped an unrestricted number of times.
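Whether a variable is self-recursive or path-recursive, it can reach itself in the graph that links each variable to the variables on its right-hand sides. The sketch below finds such variables with a depth-first search; the grammar encoding (variable to list of right-hand-side tuples, uppercase symbols as variables) and the function name are assumptions.

```python
def recursive_variables(grammar):
    """Variables A with A =>+ ...A... (self- or path-recursive)."""
    # Edge A -> B whenever B occurs in some right-hand side of A.
    edges = {
        a: {s for body in bodies for s in body if s.isupper()}
        for a, bodies in grammar.items()
    }
    result = set()
    for start in edges:
        stack, seen = list(edges[start]), set()
        while stack:
            b = stack.pop()
            if b == start:              # start reaches itself
                result.add(start)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(edges.get(b, ()))
    return result


grammar = {
    "S": [("A", "c")],
    "A": [("a", "A"), ("a", "B")],   # A is self-recursive
    "B": [("b", "C")],
    "C": [("c", "B"), ("c",)],       # B and C are path-recursive
}
print(sorted(recursive_variables(grammar)))  # ['A', 'B', 'C']
```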
Theorem 8.1
- Let L be a CFL. Then there is an integer m such that for any $w \in L$ satisfying $|w| \ge m$, there are strings u, v, x, y, and z satisfying:
- $w = uvxyz$
- $|vy| > 0$
- $|vxy| \le m$
- for any $i \ge 0$, $uv^ixy^iz \in L$
The pumping lemma for CFLs
- We can use the pumping lemma for context-free languages to prove that a particular language is not context-free.
- We do this by assuming that the language is context-free; this means that there must be an m satisfying the conditions given above.
- If we find that this causes a contradiction, then we know the language can't be a CFL.
Proof
- Given the language $L = \{a^ib^ic^i : i \ge 1\}$, assume that L is context-free.
- Let $w = a^mb^mc^m$, so that $|w| \ge m$.
- According to Theorem 8.1, $|vy| > 0$. Thus, v and y together must contain at least one type of symbol.
- According to Theorem 8.1, $|vxy| \le m$. Thus, the string vxy can contain at most two distinct types of symbols.
Proof
- The string vxy can't contain all three symbols, a, b, and c. (Why? Any substring of $a^mb^mc^m$ that contains both an a and a c must contain all m b's between them, so its length is at least m + 2, but $|vxy| \le m$.)
- The string $uv^2xy^2z$ contains additional occurrences of the symbols in v and y.
- Therefore, $uv^2xy^2z$ cannot contain equal numbers of all three symbols.
- But the pumping lemma says that $uv^2xy^2z$ must be a legitimate string in L. This is a contradiction.
- Consequently, L cannot be a context-free language.
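The case analysis can also be checked by brute force for one small value. The sketch below fixes m = 4 purely for illustration (the lemma does not tell us m), enumerates every decomposition of $a^mb^mc^m$ allowed by Theorem 8.1, and pumps with i = 0 and i = 2; the helper names are hypothetical. Finding no surviving decomposition agrees with the argument above, but it is a sanity check for one m, not a proof.

```python
def in_lang(s):
    """Membership in L = { a^i b^i c^i : i >= 1 }."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


def decompositions(w, m):
    """Yield every (u, v, x, y, z) with w = uvxyz, |vxy| <= m and |vy| > 0."""
    n = len(w)
    for i in range(n + 1):
        end = min(i + m, n)                  # vxy must fit inside w[i:i + m]
        for j in range(i, end + 1):          # v = w[i:j]
            for k in range(j, end + 1):      # x = w[j:k]
                for e in range(k, end + 1):  # y = w[k:e]
                    if (j - i) + (e - k) > 0:
                        yield w[:i], w[i:j], w[j:k], w[k:e], w[e:]


def survives(parts, pumps=(0, 2)):
    """True if u v^p x y^p z stays in L for every tested p."""
    u, v, x, y, z = parts
    return all(in_lang(u + v * p + x + y * p + z) for p in pumps)


m = 4  # illustrative choice only
w = "a" * m + "b" * m + "c" * m
survivors = [d for d in decompositions(w, m) if survives(d)]
print(len(survivors))  # 0
```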
Example
- Given the language $L = \{a^ib^ic^i : i \ge 1\}$, how would you try to process this language using a pushdown automaton?
- We can ensure that we have an equal number of a's and b's by pushing the a's onto the stack one at a time, then popping them off and matching them up with the b's one by one.
Example
- However, once we have done that, we don't have anything left to match the c's with, so we can't guarantee that we have the same number of c's as a's and b's.
- We can't solve this problem by pushing a's or b's back onto the stack.
- This is due to the limitations of the type of memory we have in a PDA.
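The stack idea from the previous slide can be simulated with a Python list. In this hypothetical helper, the stack is always empty by the time the first c is read, so it carries no information that could be used to count the c's; the three inputs below look the same at that point, even though only one of them is in L.

```python
def stack_after_ab(s):
    """Mimic the stack idea on input a^i b^j c^k.

    Push one 'a' per a, pop one per b. Returns the stack when the first
    non-b symbol after the b's is reached, or None if a's and b's don't match.
    """
    stack = []
    pos = 0
    while pos < len(s) and s[pos] == "a":
        stack.append("a")
        pos += 1
    while pos < len(s) and s[pos] == "b":
        if not stack:
            return None          # more b's than a's
        stack.pop()
        pos += 1
    if stack:
        return None              # more a's than b's
    return stack                 # always empty here: nothing left to count the c's


for s in ("aabbcc", "aabbc", "aabbccccc"):
    print(s, stack_after_ab(s))  # [] for all three
```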
Pumping lemma (again)
- The pumping lemma for regular languages states that every sufficiently long string in a regular language contains a short substring that can be pumped.
- The pumping lemma for context-free languages states that every sufficiently long string in a context-free language contains two short (and close-together) substrings that can be pumped (the same number of times).
Formal statement (again)
Let L be a context-free language. Then there exists some positive integer m such that any string $w \in L$ of length $|w| \ge m$ can be decomposed into substrings u, v, x, y, z such that $w = uvxyz$, $|vxy| \le m$, $|v| > 0$ or $|y| > 0$, and $uv^kxy^kz \in L$ for all $k \ge 0$.
Informal statement
Every context-free language has a pumping length such that every string in the language that is at least this long can be pumped to yield another string in the language. The string can be divided into five parts such that the second and fourth parts can be repeated together, or pumped, any number of times (including zero), and the resulting string remains in the language.
What is m?
In the pumping lemma for regular languages, the
pumping length m reflects the number of states
of the finite automaton. In the pumping lemma
for context-free languages, what does m reflect?
Roughly, it is the length of the longest string
that can be generated by a parse tree in which
the same nonterminal never occurs twice on the
same path through the tree.
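In a sufficiently large parse tree, some nonterminal must repeat along some path from the root. This follows from the pigeonhole principle.
[Figure: parse tree with root S in which the nonterminal A occurs twice on one path; the yield is divided into u, v, x, y, z.]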
Proof Idea
- The repetition of some nonterminal along a path through the parse tree allows us to replace the subtree under the last occurrence of the nonterminal with the subtree under an earlier occurrence of the nonterminal and still get a valid parse tree.
- This corresponds to pumping v and y. (Replacing the subtree under the earlier occurrence with the subtree under the later one removes v and y, giving uxz.)
- Note that the parse tree of the previous slide corresponds to the following derivation:
- $S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvxyz$
Important to remember
You can use a pumping lemma to prove that a
language is not context-free (or regular). You
cannot use a pumping lemma to prove that a
language is context-free (or regular).
Exercise
The language $L = \{ww : w \in \{a, b\}^*\}$ is not context-free. Pick a string in L: try $a^mb^ma^mb^m$. Then note that you must consider three cases: since $|vxy| \le m$, vxy must be a substring of the prefix $a^mb^m$, the middle $b^ma^m$, or the suffix $a^mb^m$. Intuitively, why can't a PDA accept this language, although it can accept the language $\{ww^R : w \in \{a, b\}^*\}$?
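As a hint for the last question, here is a sketch of the nondeterministic PDA for $\{ww^R\}$, simulated by trying every possible midpoint (each guess stands for one branch of the automaton). Because a stack is last-in, first-out, popping returns the first half in reverse order, which is exactly what $ww^R$ needs and what $ww$ cannot use. The function name is hypothetical.

```python
def accepts_wwr(s):
    """Simulate the nondeterministic PDA for { w w^R } by trying every midpoint.

    Phase 1 pushes the guessed w; phase 2 pops one symbol per input symbol
    and requires a match, so the popped symbols come out as w reversed.
    """
    for mid in range(len(s) + 1):       # each guess is one branch of the PDA
        stack = list(s[:mid])           # push w
        ok = True
        for ch in s[mid:]:
            if not stack or stack.pop() != ch:
                ok = False
                break
        if ok and not stack:
            return True
    return False


print(accepts_wwr("abba"), accepts_wwr("abab"))  # True False
```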
Definition 8.1 Linear Languages
A context-free language L is said to be linear if there exists a linear context-free grammar G such that $L = L(G)$. (Remember that a linear grammar has at most one variable on the right side of each production rule.)
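The linearity condition is easy to test syntactically. A minimal sketch, assuming a hypothetical encoding in which each variable maps to a list of right-hand-side tuples, uppercase symbols are variables, and the empty tuple stands for λ:

```python
def is_linear(grammar):
    """True if every right-hand side contains at most one variable."""
    return all(
        sum(1 for s in body if s.isupper()) <= 1
        for bodies in grammar.values()
        for body in bodies
    )


linear = {"S": [("a", "S", "b"), ()]}                   # S -> aSb | λ
nonlinear = {"S": [("S", "S"), ("a", "S", "b"), ()]}    # S -> SS | aSb | λ
print(is_linear(linear), is_linear(nonlinear))  # True False
```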
Theorem 8.2 Pumping Lemma for Linear Languages
Let L be an infinite linear language. Then there exists some positive integer m such that any $w \in L$ with $|w| \ge m$ can be decomposed as $w = uvxyz$ with $|uvyz| \le m$ and $|vy| \ge 1$, such that $uv^ixy^iz \in L$ for all $i = 0, 1, 2, \ldots$
Pumping Lemma for Linear Languages
Note that the conclusion of this theorem differs from Theorem 8.1: in Theorem 8.1 we have $|vxy| \le m$, while in Theorem 8.2 we have $|uvyz| \le m$. This implies that the strings v and y to be pumped must now be within m symbols of the left and right ends of w, respectively. The middle string x can be of arbitrary length. Theorem 8.2 helps establish the fact that the family of linear languages is a proper subset of the family of context-free languages.
Closure properties for context-free languages
The family of context-free languages is closed under the operations of
- Union
- Concatenation
- Kleene closure
but not under the operations of
- Intersection
- Complementation
Definition
- A context-free grammar (CFG) is a 4-tuple $G = (V, T, S, P)$, where V and T are disjoint sets, $S \in V$, and P is a finite set of rules of the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$.
- V: non-terminals or variables
- T: terminals
- S: start symbol
- P: productions or grammar rules
Closure properties of CFLs
- CFLs are closed under union, concatenation, and Kleene closure.
- Proof by construction. Let
- $G_1 = (V_1, T_1, S_1, P_1)$ and
- $G_2 = (V_2, T_2, S_2, P_2)$,
- with
- $L_1 = L(G_1)$ and
- $L_2 = L(G_2)$.
Union
- We create the grammar $G_u = (V_u, T_1 \cup T_2, S_u, P_u)$ generating $L_1 \cup L_2$:
- 1. Rename the elements of $V_2$ if necessary so that $V_1 \cap V_2 = \emptyset$.
- 2. Create a new start symbol $S_u$, not already in $V_1$ or $V_2$.
- 3. Set $V_u = V_1 \cup V_2 \cup \{S_u\}$.
- 4. Set $P_u = P_1 \cup P_2 \cup \{S_u \to S_1 \mid S_2\}$.
- Construction completed.
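The four steps can be sketched in Python. Assumptions: a grammar is a pair `(start, productions)`, the variables are exactly the keys of `productions`, right-hand sides are tuples, and the empty tuple stands for λ. Step 1 is done by appending `_1` and `_2` to the variable names, so the renamed sets are always disjoint and the new start symbol `S_u` cannot clash with them. All names are hypothetical.

```python
def rename(grammar, suffix):
    """Step 1: append suffix to every variable name."""
    start, prods = grammar
    new = {v: v + suffix for v in prods}

    def sub(body):
        return tuple(new.get(s, s) for s in body)

    return new[start], {new[v]: [sub(b) for b in bodies] for v, bodies in prods.items()}


def union(g1, g2, new_start="S_u"):
    """Steps 2-4: add S_u -> S1 | S2 on top of the renamed grammars."""
    s1, p1 = rename(g1, "_1")
    s2, p2 = rename(g2, "_2")
    return new_start, {**p1, **p2, new_start: [(s1,), (s2,)]}


g1 = ("S", {"S": [("a", "S", "b"), ()]})   # { a^n b^n : n >= 0 }
g2 = ("S", {"S": [("c", "S"), ()]})        # { c^n : n >= 0 }
start, prods = union(g1, g2)
for var, bodies in prods.items():
    print(var, "->", bodies)
# S_1 -> [('a', 'S_1', 'b'), ()]
# S_2 -> [('c', 'S_2'), ()]
# S_u -> [('S_1',), ('S_2',)]
```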
Concatenation
- We create the grammar $G_c = (V_c, T_1 \cup T_2, S_c, P_c)$ generating $L_1L_2$:
- 1. Rename the elements of $V_2$ if necessary so that $V_1 \cap V_2 = \emptyset$.
- 2. Create a new start symbol $S_c$, not already in $V_1$ or $V_2$.
- 3. Set $V_c = V_1 \cup V_2 \cup \{S_c\}$.
- 4. Set $P_c = P_1 \cup P_2 \cup \{S_c \to S_1S_2\}$.
- Construction completed.
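The concatenation construction differs only in the new rule. This sketch assumes step 1 has already been carried out (it raises an error if the variable sets overlap); the `(start, productions)` encoding, with the variables as the keys of `productions`, and the name `concat` are hypothetical.

```python
def concat(g1, g2, new_start="S_c"):
    """Steps 2-4: add S_c -> S1 S2 (assumes step 1 made V1 and V2 disjoint)."""
    (s1, p1), (s2, p2) = g1, g2
    if set(p1) & set(p2) or new_start in p1 or new_start in p2:
        raise ValueError("variable sets overlap; rename first (step 1)")
    return new_start, {**p1, **p2, new_start: [(s1, s2)]}


g1 = ("A", {"A": [("a", "A"), ("a",)]})   # { a^n : n >= 1 }
g2 = ("B", {"B": [("b", "B"), ("b",)]})   # { b^n : n >= 1 }
start, prods = concat(g1, g2)
print(start, prods[start])  # S_c [('A', 'B')]
```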
Closure under Kleene star
- Let $G_1$ be any context-free grammar with start symbol S, where S does not appear on the right side of any production. Adding the rules
- $S \to \lambda$ and
- $S \to SS$
- creates a new context-free grammar $G_2$ such that $L(G_2)$ is the result of applying the Kleene star operator to $L(G_1)$.
Kleene Closure
- We create the grammar $G = (V, T, S, P)$ generating $L_1^*$:
- 1. Create a new start symbol S, not already in $V_1$.
- 2. Set $V = V_1 \cup \{S\}$.
- 3. Set $P = P_1 \cup \{S \to S_1S \mid \lambda\}$.
- Construction completed. (See text for justification.)
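A sketch of this construction in Python. Assumptions: a grammar is a pair `(start, productions)` whose variables are the keys of `productions`, and the empty tuple stands for λ; `star` is a hypothetical name. The new start symbol is checked against $V_1$, as step 1 requires.

```python
def star(g1, new_start="S"):
    """Build G with S -> S1 S | λ on top of g1 = (S1, P1)."""
    s1, p1 = g1
    if new_start in p1:
        raise ValueError("new start symbol must not already be in V1")
    return new_start, {**p1, new_start: [(s1, new_start), ()]}


g1 = ("S1", {"S1": [("a", "S1", "b"), ("a", "b")]})   # { a^n b^n : n >= 1 }
start, prods = star(g1)
print(prods[start])  # [('S1', 'S'), ()]
```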
Not closed under intersection
- The context-free languages are not closed under intersection. However, the intersection of a context-free language with a regular language is always a context-free language.
- The context-free languages are not closed under complementation.
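The usual witness for the first claim (not taken from these slides) intersects $L_1 = \{a^nb^nc^k : n, k \ge 0\}$ and $L_2 = \{a^kb^nc^n : n, k \ge 0\}$, both context-free, to get $\{a^nb^nc^n : n \ge 0\}$, which is not context-free by the same argument as the earlier proof. Since $L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}}$, closure under union and complementation would imply closure under intersection, which is why the second claim follows. The sketch below only checks the set identity on all short strings; it proves nothing about context-freeness, and the function names are hypothetical.

```python
import re
from itertools import product


def in_l1(s):
    """{ a^n b^n c^k : n, k >= 0 } (context-free)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2))


def in_l2(s):
    """{ a^k b^n c^n : n, k >= 0 } (context-free)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return bool(m) and len(m.group(2)) == len(m.group(3))


def in_l3(s):
    """{ a^n b^n c^n : n >= 0 } (not context-free)."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


# Compare L1 ∩ L2 with L3 on every string over {a, b, c} of length <= 6.
words = ["".join(t) for k in range(7) for t in product("abc", repeat=k)]
agree = all((in_l1(w) and in_l2(w)) == in_l3(w) for w in words)
print(agree)  # True
```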
Corollary
- Are regular languages context-free?
- Yes.
- Why?
- We can express any regular language in the form of a CFG.
- The regular languages are a proper subset of the CFLs.
Are Regular Languages context free?
- Proof
- According to your textbook, the set of regular languages is the smallest set that contains all the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ (for every $a \in \Sigma$) and is closed under the operations of union, concatenation, and Kleene star. We just demonstrated that the operations of union, concatenation, and Kleene star on CFLs produce CFLs, so all we need to do is show that the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ have CFGs.
Are Regular Languages context free?
- The empty language can be written
- $S \to S$
- The language consisting of the null string can be written
- $S \to \lambda$
- The language consisting of a single character a can be written
- $S \to a$
- QED
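These three grammars can be sanity-checked with a bounded search over leftmost derivations. The sketch below is an illustration, not a decision procedure: it prunes sentential forms with more than `max_len` terminals, stops after `max_forms` distinct forms, treats the keys of the production dict as the variables, and uses the empty tuple for λ; all names are hypothetical.

```python
from collections import deque


def bounded_strings(start, prods, max_len, max_forms=10_000):
    """Terminal strings of length <= max_len found by leftmost derivations."""
    found, seen = set(), set()
    queue = deque([(start,)])
    while queue and len(seen) < max_forms:
        form = queue.popleft()
        if form in seen:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in prods), None)
        if idx is None:                     # no variables left: a terminal string
            found.add("".join(form))
            continue
        for body in prods[form[idx]]:       # expand the leftmost variable
            new = form[:idx] + tuple(body) + form[idx + 1:]
            if sum(1 for s in new if s not in prods) <= max_len:
                queue.append(new)
    return found


empty_lang = {"S": [("S",)]}   # S -> S
null_lang = {"S": [()]}        # S -> λ
single_a = {"S": [("a",)]}     # S -> a
for g in (empty_lang, null_lang, single_a):
    print(bounded_strings("S", g, max_len=3))
# set()
# {''}
# {'a'}
```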
Decision properties of context-free languages
We can decide:
- Membership
- Emptiness
- Infiniteness
But there is no algorithm for deciding whether two CFGs generate the same language!
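Membership can be decided with the CYK algorithm when the grammar is in Chomsky Normal Form. The sketch below is a plain version of CYK; the encoding (a dict from each variable to a list of right-hand-side tuples) is an assumption, and the test grammar $S \to AB \mid AC$, $C \to SB$, $A \to a$, $B \to b$, which generates $\{a^nb^n : n \ge 1\}$, is just an example.

```python
def cyk(grammar, start, s):
    """CYK membership test for a grammar in CNF.

    table[i][j] holds the variables that derive s[i : i + j + 1].
    """
    n = len(s)
    if n == 0:
        return False  # a CNF grammar (without S -> λ) cannot derive the empty string
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        table[i][0] = {a for a, bodies in grammar.items() if (ch,) in bodies}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for a, bodies in grammar.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length - 1].add(a)
    return start in table[0][n - 1]


grammar = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
print([cyk(grammar, "S", s) for s in ("ab", "aabb", "aab", "abab")])
# [True, True, False, False]
```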