CFL Pumping Lemma - PowerPoint PPT Presentation

Provided by: iden

Transcript and Presenter's Notes



1
CFL Pumping Lemma
  • Similar to the regular-language PL, but you have to
    pump two substrings in the middle of the string, in
    tandem (i.e., the same number of copies of each).
    Formally:
  • For every CFL L
  • there exists an integer n such that
  • for every z in L with $|z| \ge n$
  • there exists a split $z = uvwxy$ with $|vwx| \le n$ and $|vx| > 0$
    such that
  • for all $i \ge 0$, $uv^iwx^iy$ is in L.
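The Python sketch below makes the tandem pumping concrete on the CFL $\{a^kb^k \mid k \ge 0\}$. It is an illustration, not part of the slides. The helper names `in_anbn` and `pump`, the value n = 5, and the hand-picked split are all assumptions. The loop checks only finitely many values of i, so it illustrates the statement rather than proving it.

```python
def in_anbn(s: str) -> bool:
    """Membership test for {a^k b^k | k >= 0}."""
    k = len(s) // 2
    return s == "a" * k + "b" * k


def pump(u: str, v: str, w: str, x: str, y: str, i: int) -> str:
    """Return u v^i w x^i y: v and x are copied the same number of times."""
    return u + v * i + w + x * i + y


n = 5
z = "a" * n + "b" * n
# Hand-picked split: vwx = "ab" straddles the boundary between a's and b's.
u, v, w, x, y = "a" * (n - 1), "a", "", "b", "b" * (n - 1)

assert u + v + w + x + y == z
assert len(v + w + x) <= n and len(v + x) > 0
for i in range(6):
    pumped = pump(u, v, w, x, y, i)
    print(i, pumped, in_anbn(pumped))
```

Every printed line ends in True. Now consider the split with x empty and y = b^n, that is, `pump(u, v, w, "", "b" * n, 2)`. Pumping it gives $a^{n+1}b^n$, which is not in the language. For this z, the a's and b's must grow together.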

2
Outline of Proof of PL
  • Let there be a Chomsky-normal-form CFG for L with
    m variables. Pick $n = 2^m$.
  • Because CNF grammars have bodies of no more than
    2 symbols, a string z of length at least n must have some
    path with at least $m + 1$ variables.
  • Thus, some variable must appear twice on the
    path.
  • Compare with the DFA argument about a path longer
    than the number of states.
  • Focus on some path that is as long as any path in
    the tree. On this path, we can find a duplication
    of some variable A among the bottom $m + 1$
    variables on the path.
  • Let the lower A derive w and the upper A derive
    vwx.

3
  • CNF guarantees us that $|vwx| \le n$ and $vx \ne \epsilon$.
  • By repeatedly replacing the lower A's tree by the
    upper A's tree, we see that $uv^iwx^iy$ has a parse tree
    for all $i > 1$ (the case $i = 1$ is z itself).
  • Replacing the upper A's tree by the lower one shows the
    case $i = 0$, i.e., uwy is in L.

4
(No Transcript)
5
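  • Consider the derivation $S \Rightarrow^* uAy \Rightarrow^* uvAxy \Rightarrow^* uvwxy$.
  • Hence $uv^iwx^iy \in L$ for all $i \ge 0$.
[Figure: parse tree with root S; u and y lie outside the upper A; the upper A yields v, a lower A, and x; the lower A yields w.]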
6
Another formulation
  • A path in a binary tree representing a derivation
    for a grammar in CNF either is empty or consists
    of a node, one of its descendants, and all nodes
    in between.
  • The length of a path is the number of nodes it
    contains.
  • The height of a binary tree is the length of its
    longest path.

LEMMA: For any $h \ge 1$, a binary tree having more
than $2^h$ leaf nodes must have a height greater
than $h + 1$.
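The lemma rests on one fact. A tree of height h, counting the nodes on its longest path, has at most $2^{h-1}$ leaves. Below is a minimal Python sketch of that bound. It assumes every node has at most two children, as in a CNF parse tree. `max_leaves` is a hypothetical helper, not part of the slides.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def max_leaves(h: int) -> int:
    """Most leaves a tree can have when every node has at most two children and
    its longest path contains at most h nodes (the slide's notion of height)."""
    if h == 1:
        return 1  # a lone node is itself a leaf
    below = max_leaves(h - 1)
    # The root is a leaf, has one subtree, or has two subtrees of height <= h - 1.
    return max(1, below, 2 * below)


for h in range(1, 30):
    assert max_leaves(h) == 2 ** (h - 1)
    # The lemma: height <= h + 1 permits at most 2^h leaves, so a tree with
    # more than 2^h leaves must have height greater than h + 1.
    assert max_leaves(h + 1) == 2 ** h
```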
7
  • The pumping lemma constant for CFLs is $2^{n+1}$, where n
    is the number of variables in the grammar in CNF.
  • Why? Because the derivation tree for a
    sufficiently long string must have a height of at
    least $n + 2$:
  • it has more than $2^n$ leaf nodes, and therefore its
    height is greater than $n + 1$.
  • With n nonterminals, a string must have a tree of
    height at least $n + 2$ to be long enough for the PL to
    be applied.
  • A height of $n + 1$ gets us to the level of the
    bottom-most non-leaf nodes (nodes labeled with a
    nonterminal).
  • Consider a leaf and the $n + 1$ nodes above it;
    since there are only n variables, one must appear
    twice.
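Here is a sketch of the pigeonhole step. Walk up from the leaf through the bottom $n + 1$ variable labels and stop at the first repeat. For illustration, a path is represented as a plain list of variable names. `find_repeat` and `pumping_constant` are hypothetical helpers.

```python
from itertools import product


def pumping_constant(num_vars: int) -> int:
    """The constant used on this slide: 2^(n+1) for a CNF grammar with n variables."""
    return 2 ** (num_vars + 1)


def find_repeat(path: list, num_vars: int):
    """path lists the variable labels from the root down to the last variable
    (the terminal leaf is omitted). Scanning upward through the bottom
    num_vars + 1 labels, return (upper, lower) indices of the first variable
    seen twice, or None if those labels are all distinct."""
    start = max(0, len(path) - (num_vars + 1))
    lowest = {}  # label -> index of its lowest occurrence seen so far
    for i in range(len(path) - 1, start - 1, -1):
        if path[i] in lowest:
            return i, lowest[path[i]]
        lowest[path[i]] = i
    return None


# Pigeonhole: any n + 1 labels drawn from n variables contain a repeat.
variables = ["S", "A", "B"]
assert all(find_repeat(list(p), len(variables)) is not None
           for p in product(variables, repeat=len(variables) + 1))
print(find_repeat(["S", "A", "B", "S", "A"], 3))  # (1, 4): upper A, lower A
```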

8
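[Figure: example parse tree with root S for $z = (ab)(b)(ab)(b)(a) = uvwxy$, i.e., u = ab, v = b, w = ab, x = b, y = a; internal nodes C1 to C9, B, and two occurrences of A; annotations: "Root of vwx", "Height of this subtree $\le n + 2$", "Cannot overlap w", "$|vx| > 0$".]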
9
Example
The classic non-CFL
  • $L = \{a^ib^ic^i \mid i \ge 0\}$ is not a CFL.
  • Suppose it were. Then let n be the PL constant
    for L.
  • Consider $z = a^nb^nc^n$. We can write $z = uvwxy$, with
    $|vwx| \le n$ and $|vx| > 0$ (i.e., either v or x is
    non-empty).
  • Two cases to consider:
  • Both v and x contain only one type of alphabet
    symbol: v does not contain both a's and b's or
    both b's and c's, and the same holds for x. Since
    $|vwx| \le n$, v and x together cannot include all three
    symbols, so $uv^2wx^2y$ cannot contain equal numbers
    of a's, b's, and c's.
  • Either v or x contains more than one type of
    symbol: in this case $uv^2wx^2y$ may contain equal
    numbers of a's, b's, and c's, but they won't be in
    the correct order.
  • One of these cases must occur, and both result in a
    contradiction. So the assumption that L is a CFL
    is false.
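For a fixed small n, the case analysis can be checked mechanically. This is an illustration, not a proof, because the PL constant is arbitrary. The Python sketch below tries every split $z = uvwxy$ with $|vwx| \le n$ and $|vx| > 0$. It looks for a split that survives pumping with both i = 0 and i = 2. For $z = a^nb^nc^n$ no split survives. The CFL $\{a^kb^k\}$, used for contrast, always has one. The function names are hypothetical.

```python
def find_pumpable_split(z, n, member, powers=(0, 2)):
    """Try every split z = uvwxy with |vwx| <= n and |vx| > 0. Return the first
    split whose pumped strings u v^i w x^i y are all members for i in powers,
    or None if every split fails for some i."""
    for start in range(len(z)):
        for end in range(start + 1, min(start + n, len(z)) + 1):
            span = end - start  # |vwx|
            for lv in range(span + 1):
                for lx in range(span - lv + 1):
                    if lv + lx == 0:
                        continue  # |vx| > 0 is required
                    u, y = z[:start], z[end:]
                    v, x = z[start:start + lv], z[end - lx:end]
                    w = z[start + lv:end - lx]
                    if all(member(u + v * i + w + x * i + y) for i in powers):
                        return u, v, w, x, y
    return None


def in_abc(s):
    """Membership in {a^i b^i c^i | i >= 0}."""
    k = len(s) // 3
    return s == "a" * k + "b" * k + "c" * k


def in_ab(s):
    """Membership in {a^i b^i | i >= 0}, a CFL used for contrast."""
    k = len(s) // 2
    return s == "a" * k + "b" * k


for n in range(2, 6):
    assert find_pumpable_split("a" * n + "b" * n + "c" * n, n, in_abc) is None
    print(n, find_pumpable_split("a" * n + "b" * n, n, in_ab))
```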

10
Example
  • $L = \{0^{k^2} \mid k \text{ is any integer}\}$ is not a CFL.
  • Suppose it were. Then let n be the PL constant
    for L.
  • Consider $z = 0^{n^2}$. We can write $z = uvwxy$, with
    $|vwx| \le n$ and $|vx| > 0$.
  • Then $uvvwxxy$ should be in L.
  • But $n^2 < |uvvwxxy| \le n^2 + n < (n+1)^2$, so there is
    no perfect square that $|uvvwxxy|$ could be.
  • By "proof by contradiction," L is not a CFL.
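The arithmetic in the last step can be spot-checked. We have $|uv^2wx^2y| = n^2 + |vx|$ with $1 \le |vx| \le n$, and $n^2 + n < n^2 + 2n + 1 = (n+1)^2$. The Python below confirms, for small n, that no length in that window is a perfect square. The helper names are illustrative.

```python
from math import isqrt


def is_square(m: int) -> bool:
    return isqrt(m) ** 2 == m


def gap_has_square(n: int) -> bool:
    """Is any of n^2 + 1, ..., n^2 + n a perfect square? These are the possible
    lengths of u v^2 w x^2 y when |z| = n^2 and 1 <= |vx| <= n."""
    return any(is_square(n * n + k) for k in range(1, n + 1))


# The window (n^2, n^2 + n] lies strictly between n^2 and (n + 1)^2.
assert not any(gap_has_square(n) for n in range(1, 2000))
```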

11
Example
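  • $L = \{ww \mid w \in \{0,1\}^*\}$ is not a CFL.
  • Suppose it were. Then let n be the PL constant
    for L.
  • Choosing a string is less obvious for this
    language.
  • Try $z = 0^n10^n1$.
  • But it can be pumped by dividing as follows (taking $n \ge 3$):
    $u = 0^{n-1}$, $v = 0$, $w = 1$, $x = 0$, $y = 0^{n-1}1$, so
    $uv^iwx^iy = 0^{n-1+i}10^{n-1+i}1$ is in L for every $i \ge 0$.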

12
  • Try another string: $z = 0^n1^n0^n1^n$.
  • It seems to capture more of the "essence" of the
    language.
  • Use the PL condition that the string can be pumped by
    dividing it into $z = uvwxy$, where $|vwx| \le n$.
  • vwx must straddle the midpoint of z. Otherwise,
    if it lies only in the first half of z, pumping up
    to $uv^2wx^2y$ moves a 1 into the first position of
    the second half, so it cannot be of the form ww. If
    it lies in the second half, a 0 is moved into the last
    position of the first half, so it cannot be of the form
    ww. (If $|vx|$ is odd, the pumped string has odd length
    and cannot be of the form ww in any case.)
  • If vwx straddles the midpoint of z, pumping z
    down to uwy yields $0^n1^i0^j1^n$, where i and j cannot
    both be n. This string cannot be of the form ww.
    Contradiction!
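Both choices of z can be checked by brute force for small n. Again, this is an illustration rather than a proof. The sketch searches every split with $|vwx| \le n$ and $|vx| > 0$ for one that survives i = 0 and i = 2. The helper names are hypothetical. The first attempt, $0^n10^n1$, has such a split. The second, $0^n1^n0^n1^n$, has none.

```python
def find_pumpable_split(z, n, member, powers=(0, 2)):
    """Try every split z = uvwxy with |vwx| <= n and |vx| > 0. Return the first
    split whose pumped strings u v^i w x^i y are all members for i in powers,
    or None if every split fails for some i."""
    for start in range(len(z)):
        for end in range(start + 1, min(start + n, len(z)) + 1):
            span = end - start  # |vwx|
            for lv in range(span + 1):
                for lx in range(span - lv + 1):
                    if lv + lx == 0:
                        continue  # |vx| > 0 is required
                    u, y = z[:start], z[end:]
                    v, x = z[start:start + lv], z[end - lx:end]
                    w = z[start + lv:end - lx]
                    if all(member(u + v * i + w + x * i + y) for i in powers):
                        return u, v, w, x, y
    return None


def in_ww(s):
    """Membership in {ww | w in {0,1}*}."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]


for n in range(3, 6):
    first_try = "0" * n + "1" + "0" * n + "1"
    second_try = "0" * n + "1" * n + "0" * n + "1" * n
    assert find_pumpable_split(first_try, n, in_ww) is not None
    assert find_pumpable_split(second_try, n, in_ww) is None
```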

13
Example
  • $L = \{a^ib^jc^k \mid i < j < k\}$ is not a CFL.
  • Suppose it were. Then let n be the PL constant
    for L.

Consider $z = a^nb^{n+1}c^{n+2}$. We can write $z = uvwxy$,
with $|vwx| \le n$ and $|vx| > 0$, and $uv^iwx^iy \in L$ for
every $i \ge 0$.
This time we must pump down as well as pump
up. First consider the case where vx contains
at least one a. Then, since $|vwx| \le n$, vx can
contain no c's. Therefore $uv^2wx^2y$ has at least $n + 1$
a's and exactly $n + 2$ c's, which is
impossible for strings in L. If vx contains no
a's, then it must contain a b or a c. In this
case $uv^0wx^0y = uwy$ has either fewer than $n + 1$
b's or fewer than $n + 2$ c's, but in either case
exactly n a's. This is also impossible for
strings in L.
By "proof by contradiction," L is not a CFL.
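The same brute-force illustration applies here, with $z = a^nb^{n+1}c^{n+2}$ and i ∈ {0, 2}. Those are exactly the two pumps the argument uses. For small n, no split survives both. The helper names are hypothetical, and membership is tested with a regular expression.

```python
import re


def find_pumpable_split(z, n, member, powers=(0, 2)):
    """Try every split z = uvwxy with |vwx| <= n and |vx| > 0. Return the first
    split whose pumped strings u v^i w x^i y are all members for i in powers,
    or None if every split fails for some i."""
    for start in range(len(z)):
        for end in range(start + 1, min(start + n, len(z)) + 1):
            span = end - start  # |vwx|
            for lv in range(span + 1):
                for lx in range(span - lv + 1):
                    if lv + lx == 0:
                        continue  # |vx| > 0 is required
                    u, y = z[:start], z[end:]
                    v, x = z[start:start + lv], z[end - lx:end]
                    w = z[start + lv:end - lx]
                    if all(member(u + v * i + w + x * i + y) for i in powers):
                        return u, v, w, x, y
    return None


def in_increasing(s):
    """Membership in {a^i b^j c^k | i < j < k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) < len(m.group(2)) < len(m.group(3))


for n in range(1, 6):
    z = "a" * n + "b" * (n + 1) + "c" * (n + 2)
    assert find_pumpable_split(z, n, in_increasing) is None
```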
14
Example
  • $L = \{a^ib^jc^k \mid 0 \le i \le j \le k\}$ is not a CFL.
  • Suppose it were. Then let n be the PL
    constant for L.

Consider $z = a^nb^nc^n$. We can write $z = uvwxy$, with
$|vwx| \le n$ and $|vx| > 0$, and $uv^iwx^iy \in L$ for every
$i \ge 0$.
  • When both v and x contain only one type of
    symbol (v does not contain both a's and b's or both
    b's and c's, and the same holds for x), we must divide
    into three sub-cases:
  • vx has no a's. Then try pumping down to obtain
    $uv^0wx^0y = uwy$. It contains fewer b's or fewer c's
    than a's.
  • vx has no b's. Then a's or c's must appear in v
    or x, because both can't be the empty string. If
    a's appear, then $uv^2wx^2y$ contains more a's than
    b's. If c's appear, then $uv^0wx^0y$ contains more
    b's than c's.
  • vx has no c's. The string $uv^2wx^2y$ contains more a's or
    more b's than c's.
  • When either v or x contains more than one type of
    symbol, $uv^2wx^2y$ will not contain symbols in the
    correct order.

By "proof by contradiction," L is not a CFL.
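As a small-n illustration of the three sub-cases, the brute-force search below finds no split of $z = a^nb^nc^n$ that survives both i = 0 and i = 2 for this language. The helper names are hypothetical. The result says nothing about larger n on its own.

```python
import re


def find_pumpable_split(z, n, member, powers=(0, 2)):
    """Try every split z = uvwxy with |vwx| <= n and |vx| > 0. Return the first
    split whose pumped strings u v^i w x^i y are all members for i in powers,
    or None if every split fails for some i."""
    for start in range(len(z)):
        for end in range(start + 1, min(start + n, len(z)) + 1):
            span = end - start  # |vwx|
            for lv in range(span + 1):
                for lx in range(span - lv + 1):
                    if lv + lx == 0:
                        continue  # |vx| > 0 is required
                    u, y = z[:start], z[end:]
                    v, x = z[start:start + lv], z[end - lx:end]
                    w = z[start + lv:end - lx]
                    if all(member(u + v * i + w + x * i + y) for i in powers):
                        return u, v, w, x, y
    return None


def in_nondecreasing(s):
    """Membership in {a^i b^j c^k | 0 <= i <= j <= k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) <= len(m.group(2)) <= len(m.group(3))


for n in range(1, 6):
    z = "a" * n + "b" * n + "c" * n
    assert find_pumpable_split(z, n, in_nondecreasing) is None
```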
15
Example
  • $L = \{xyx \mid x, y \in \{a,b\}^* \text{ and } |x| \ge 1\}$ is not a CFL.
  • Suppose it were. Then let n be the PL
    constant for L.

Let $z = a^nb^na^nb^n$. Then $z = uvwxy$ for some u, v,
w, x, and y satisfying $|vx| > 0$, $|vwx| \le n$, and
$uv^iwx^iy \in L$ for every $i \ge 0$.
Suppose that vx contains either only a's from
the first group or only b's from the last group.
Then $uv^2wx^2y$ is either $a^{n+i}b^na^nb^n$ or $a^nb^na^nb^{n+i}$
for some $0 < i \le n$, and in neither case can this
string be of the form xyx for any x with $|x| > 0$.
Otherwise, vx contains either a b from the
first group or an a from the second. In this case
$uv^0wx^0y$ is either $a^ib^ja^kb^n$ or $a^nb^ia^jb^k$, where in
either case i and k are positive and $j < n$. Neither
of these strings can be of the form required for
L either.
By "proof by contradiction," L is not a CFL.
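Membership in this language reduces to a simple test. A string is in L when some nonempty prefix of length at most half the string is also a suffix. With that test, the small-n brute-force illustration can be repeated for $z = a^nb^na^nb^n$. No split survives both i = 0 and i = 2. The helper names are hypothetical.

```python
def find_pumpable_split(z, n, member, powers=(0, 2)):
    """Try every split z = uvwxy with |vwx| <= n and |vx| > 0. Return the first
    split whose pumped strings u v^i w x^i y are all members for i in powers,
    or None if every split fails for some i."""
    for start in range(len(z)):
        for end in range(start + 1, min(start + n, len(z)) + 1):
            span = end - start  # |vwx|
            for lv in range(span + 1):
                for lx in range(span - lv + 1):
                    if lv + lx == 0:
                        continue  # |vx| > 0 is required
                    u, y = z[:start], z[end:]
                    v, x = z[start:start + lv], z[end - lx:end]
                    w = z[start + lv:end - lx]
                    if all(member(u + v * i + w + x * i + y) for i in powers):
                        return u, v, w, x, y
    return None


def in_xyx(s):
    """Membership in {xyx | x, y in {a,b}*, |x| >= 1}: some nonempty prefix
    of length at most |s|/2 is also a suffix."""
    return set(s) <= {"a", "b"} and any(
        s[:k] == s[len(s) - k:] for k in range(1, len(s) // 2 + 1))


for n in range(1, 5):
    z = "a" * n + "b" * n + "a" * n + "b" * n
    assert find_pumpable_split(z, n, in_xyx) is None
```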
16
Closure Properties of CFLs
  • CFLs are closed under union, concatenation, and
    Kleene star.
  • CFLs are closed under reversal.
  • As for regular languages, the proofs are by
    construction.

17
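If $L_1$ and $L_2$ are CFLs, then $L_1 \cup L_2$ is a CFL
Start with CFGs $G_1 = (V_1, \Sigma, S_1, P_1)$ and
$G_2 = (V_2, \Sigma, S_2, P_2)$.
  • Construct $G_u = (V_u, \Sigma, S_u, P_u)$ generating $L_1 \cup L_2$:
  • rename elements of $V_2$ if necessary so that
    $V_1 \cap V_2 = \emptyset$
  • define $V_u = V_1 \cup V_2 \cup \{S_u\}$, where $S_u \notin V_1 \cup V_2$
  • define $P_u = P_1 \cup P_2 \cup \{S_u \to S_1 \mid S_2\}$

If $L_1$ and $L_2$ are CFLs, then $L_1L_2$ is a CFL
Construct $G_c = (V_c, \Sigma, S_c, P_c)$ generating $L_1L_2$:
  • rename elements of $V_2$ so that $V_1 \cap V_2 = \emptyset$
  • define $V_c = V_1 \cup V_2 \cup \{S_c\}$, where $S_c \notin V_1 \cup V_2$
  • define $P_c = P_1 \cup P_2 \cup \{S_c \to S_1S_2\}$
If $L_1$ is a CFL, then $L_1^*$ is a CFL
Construct $G = (V, \Sigma, S, P)$ generating $L_1^*$:
  • define $V = V_1 \cup \{S\}$, where $S \notin V_1$
  • define $P = P_1 \cup \{S \to S_1S \mid \epsilon\}$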
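The three constructions are short enough to code directly. The sketch below uses an encoding of our own, which is an assumption: a grammar is a pair (start, productions), and the productions map each variable to a list of bodies, each body a tuple of symbols. Any symbol that is not a key is a terminal. The fresh start names `S_u`, `S_c`, and `S_star` are assumed not to clash with existing variables. The helper `strings_up_to` lists every derivable string up to a length bound by fixpoint iteration, and it is used here only to check the results.

```python
def rename(g, tag):
    """Append tag to every variable so that two grammars share no variables."""
    start, prods = g

    def new(sym):
        return sym + tag if sym in prods else sym

    return new(start), {new(a): [tuple(new(s) for s in b) for b in bodies]
                        for a, bodies in prods.items()}


def union(g1, g2):
    (s1, p1), (s2, p2) = rename(g1, "_1"), rename(g2, "_2")
    return "S_u", {**p1, **p2, "S_u": [(s1,), (s2,)]}  # S_u -> S1 | S2


def concat(g1, g2):
    (s1, p1), (s2, p2) = rename(g1, "_1"), rename(g2, "_2")
    return "S_c", {**p1, **p2, "S_c": [(s1, s2)]}  # S_c -> S1 S2


def star(g1):
    s1, p1 = g1
    return "S_star", {**p1, "S_star": [(s1, "S_star"), ()]}  # S -> S1 S | eps


def strings_up_to(g, max_len):
    """All terminal strings of length <= max_len derivable from the start
    symbol, computed by fixpoint iteration (terminals are single characters)."""
    start, prods = g
    lang = {a: set() for a in prods}
    changed = True
    while changed:
        changed = False
        for a, bodies in prods.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    options = lang[sym] if sym in prods else {sym}
                    partial = {p + q for p in partial for q in options
                               if len(p + q) <= max_len}
                if not partial <= lang[a]:
                    lang[a] |= partial
                    changed = True
    return lang[start]


G1 = ("S", {"S": [("a", "S", "b"), ("a", "b")]})  # {a^k b^k | k >= 1}
G2 = ("S", {"S": [("c", "S"), ("c",)]})            # {c^k | k >= 1}
print(sorted(strings_up_to(union(G1, G2), 4)))
```

Both example grammars use the same start symbol name S on purpose. Without the renaming step, union and concatenation would mix their productions.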
18
Non-closure Under Intersection
  • The following language is not a CFL:
    $L = \{0^i1^j2^k3^l \mid i = k \text{ and } j = l\}$.
  • Intuitively, you need a variable and productions
    like $A \to 0A2 \mid 02$ to generate the matching 0's and
    2's, while you need another variable to generate
    matching 1's and 3's. But these variables would
    have to generate strings that did not interleave.
  • However, the simpler language $\{0^i1^j2^k3^l \mid i = k\}$
    is a CFL.
  • A grammar:
  • $S \to S3 \mid A$
  • $A \to 0A2 \mid B$
  • $B \to 1B \mid \epsilon$
  • Likewise the CFL $\{0^i1^j2^k3^l \mid j = l\}$.
  • Their intersection is L.

19
Closure under Complement?
  • The complements of some CFLs are also CFLs.
  • Example: $\{a^nb^n \mid n \ge 0\}$.
  • Its complement can be accepted by a PDA:
  • swap the accepting and non-accepting states of a
    deterministic PDA that recognizes $\{a^nb^n \mid n \ge 0\}$.
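Swapping acceptance works here because the machine is deterministic and reads its whole input. Below is a minimal Python simulation of such a stack machine, written under that assumption. The function names are hypothetical, and a Python function stands in for a formal PDA.

```python
from itertools import product


def dpda_accepts_anbn(s: str) -> bool:
    """Simulate a deterministic PDA for {a^n b^n | n >= 0}. The finite control
    is (seen_b, dead); the stack holds one 'A' per unmatched a above '$'.
    The machine reads the entire input in every case."""
    stack = ["$"]      # bottom-of-stack marker
    seen_b = False
    dead = False       # trap state: the input can no longer be in L
    for ch in s:
        if dead:
            continue   # keep consuming input without changing anything
        if ch == "a" and not seen_b:
            stack.append("A")
        elif ch == "b" and stack[-1] == "A":
            seen_b = True
            stack.pop()
        else:
            dead = True
    return not dead and stack == ["$"]


def complement_accepts(s: str) -> bool:
    """The same deterministic machine with accepting and non-accepting outcomes swapped."""
    return not dpda_accepts_anbn(s)


for n in range(9):
    for t in product("ab", repeat=n):
        s = "".join(t)
        k = len(s) // 2
        assert complement_accepts(s) == (s != "a" * k + "b" * k)
```

Swapping outcomes is not enough for a nondeterministic PDA. Some other branch may still accept a string that one branch rejects, which is why the next slide matters.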

20
Non-closure of CFL's Under Complement
  • But not always!
  • The complement of the non-CFL
    $L = \{0^i1^j2^k3^l \mid i = k \text{ and } j = l\}$ is a CFL.
  • Here is a PDA P recognizing it:
  • Nondeterministically choose whether to check $i \ne k$
    or $j \ne l$.
  • A nondeterministic PDA checks one or the other,
    but is capable of checking either one.
  • Say we want to check i?k.
  • As long as 0's come in, count them on the stack.
  • Ignore 1's.
  • Pop the stack for each 2.
  • As long as we have not just exposed the
    bottom-of-stack marker when the first 3 comes in,
    accept, and keep accepting as long as 3's come
    in.
  • But we also have to accept, and keep accepting,
    as soon as we see that the input is not in
    $L(0^*1^*2^*3^*)$.
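The sketch below simulates the nondeterministic PDA in Python by trying each branch and accepting if any branch accepts. One branch rejects the shape $0^*1^*2^*3^*$, one checks $i \ne k$, and one checks $j \ne l$. As a simplification, each counting branch decides at the end of the input, not when the first 3 arrives. The names are hypothetical. The exhaustive loop compares the result with the complement of L for strings up to length 6.

```python
import re
from itertools import product

SHAPE = re.compile(r"0*1*2*3*")  # the regular language L(0*1*2*3*)


def counts_differ(s: str, push_sym: str, pop_sym: str) -> bool:
    """One branch: push a marker for each push_sym, pop one for each pop_sym,
    ignore all other symbols, and accept iff the two counts differ."""
    stack = ["$"]  # bottom-of-stack marker
    for ch in s:
        if ch == push_sym:
            stack.append("X")
        elif ch == pop_sym:
            if stack[-1] == "$":
                return True  # more pops than pushes: the counts already differ
            stack.pop()
    return stack != ["$"]  # leftover pushes: the counts differ


def complement_pda_accepts(s: str) -> bool:
    """Accept if some branch accepts: wrong shape, i != k, or j != l."""
    if not SHAPE.fullmatch(s):
        return True
    return counts_differ(s, "0", "2") or counts_differ(s, "1", "3")


def in_L(s: str) -> bool:
    m = re.fullmatch(r"(0*)(1*)(2*)(3*)", s)
    return bool(m) and len(m[1]) == len(m[3]) and len(m[2]) == len(m[4])


for n in range(7):
    for t in product("0123", repeat=n):
        s = "".join(t)
        assert complement_pda_accepts(s) == (not in_L(s))
```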

21
Another Example
  • Suppose $L_1$ and $L_2$ are CFLs. If the CFLs were
    closed under complementation, then the language
    $\overline{\overline{L_1} \cup \overline{L_2}}$
    would be context free, because the CFLs are closed under
    union.
  • But by De Morgan's law,
    $\overline{\overline{L_1} \cup \overline{L_2}} = L_1 \cap L_2$.
  • It would then follow that CFLs are closed under
    intersection. But we have already seen that they
    are not!

22
Testing Emptiness of a CFL
  • As for regular languages, we really take a
    representation of some language and ask whether
    it represents $\emptyset$.
  • In this case, the representation can be a CFG or
    PDA.
  • It is our choice, since there are algorithms to convert
    one to the other.
  • The test: use a CFG and check whether the start symbol is
    useless.
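The start symbol is always reachable. So it is useless exactly when it is not generating, meaning it derives no terminal string. The test below computes the generating variables by fixpoint in Python. It uses a hypothetical encoding: productions map each variable to a list of bodies, each a tuple of symbols, and symbols that are not keys are terminals.

```python
def generating(prods):
    """Variables that derive at least one terminal string, found by fixpoint."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in prods.items():
            if head not in gen and any(
                    all(sym in gen or sym not in prods for sym in body)
                    for body in bodies):
                gen.add(head)
                changed = True
    return gen


def is_empty(start, prods):
    """L(G) is empty iff the start symbol is not generating."""
    return start not in generating(prods)


# S -> AB, A -> aA, B -> b : A never finishes a derivation, so L(G) is empty.
G_EMPTY = {"S": [("A", "B")], "A": [("a", "A")], "B": [("b",)]}
# Adding S -> c makes the language nonempty.
G_NONEMPTY = {**G_EMPTY, "S": [("A", "B"), ("c",)]}
print(is_empty("S", G_EMPTY), is_empty("S", G_NONEMPTY))  # True False
```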

23
Testing Finiteness of a CFL
  • Let L be a CFL. Then there is some pumping lemma
    constant n for L.
  • Test all strings of length between n and 2n - 1
    for membership (as in next slide).
  • If there is any such string, it can be pumped,
    and the language is infinite.
  • If there is no such string, then n - 1 is an
    upper limit on the length of strings, so the
    language is finite.
  • Trick: if there were a string $z = uvwxy$ of length
    2n or longer, you could find a shorter string uwy
    in L, but it's at most n shorter (why?). Thus, if
    there are any strings of length 2n or more, you
    can repeatedly cut out vx to get, eventually, a
    string whose length is in the range n to 2n - 1.
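Below is a direct Python sketch of this test. It assumes two things: a membership function `member` that decides L, and a valid PL constant n for L. Both are inputs the slide takes for granted. The sketch enumerates every string over the alphabet with length from n to 2n - 1. For a finite language, any n larger than its longest string is a valid constant, because the lemma then holds vacuously.

```python
from itertools import product


def is_infinite(member, alphabet, n):
    """L is infinite iff it contains a string whose length is in [n, 2n - 1].
    Assumes member decides L and n is a valid pumping-lemma constant for L."""
    for length in range(n, 2 * n):
        for letters in product(alphabet, repeat=length):
            if member("".join(letters)):
                return True
    return False


def in_anbn(s):
    k = len(s) // 2
    return s == "a" * k + "b" * k


FINITE = {"ab", "aab", "b"}  # longest string has length 3, so n = 4 is valid
print(is_infinite(in_anbn, "ab", 2))              # True: "ab" has length in [2, 3]
print(is_infinite(FINITE.__contains__, "ab", 4))  # False: nothing of length 4..7
```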

24
Testing Membership of a String in a CFL
  • Simulating a PDA for L on string w doesn't quite
    work, because the PDA can grow its stack
    indefinitely on $\epsilon$ input, and we never finish,
    even if the PDA is deterministic.
  • There is an $O(n^3)$ algorithm (n = length of w)
    that uses a "dynamic programming" technique.
  • It is called the Cocke-Younger-Kasami (CYK) algorithm.

25
CYK Algorithm
  • Start with a CNF grammar for L.
  • Build a two-dimensional table:
  • Row = length of a substring of w.
  • Column = beginning position of the substring.
  • Entry in row i and column j = set of variables
    that generate the substring of w beginning at
    position j and extending for i positions.
  • These entries are denoted $X_{j,i+j-1}$, i.e., the
    subscripts are the first and last positions of
    the string represented, so the first row is
    $X_{11}, X_{22}, \ldots, X_{nn}$, the second row is
    $X_{12}, X_{23}, \ldots, X_{n-1,n}$, and so on.

26
Table
  • The horizontal axis corresponds to the positions
    of the string $w = a_1a_2 \cdots a_n$.
  • Table entry $X_{ij}$ is the set of nonterminals A
    such that $A \Rightarrow^* a_ia_{i+1} \cdots a_j$.
  • We are particularly interested in whether S is in
    $X_{1n}$, because that is the same as saying $S \Rightarrow^* w$ (that
    is, w is in L).



27
  • Basis (row 1): $X_{ii}$ = the set of variables A such
    that $A \to a$ is a production and a is the symbol
    at position i of w.
  • The grammar is in CNF, therefore the only way to
    derive a terminal is with a production of the
    form $A \to a$, so $X_{ii}$ is the set of nonterminals A
    such that $A \to a_i$ is a production of G.
  • Induction: suppose we want to compute $X_{ij}$, which
    is in row $j - i + 1$.
  • We can derive $a_ia_{i+1} \cdots a_j$ from A if there is a
    production $A \to BC$, B derives some proper prefix of
    $a_ia_{i+1} \cdots a_j$, and C derives the rest.
  • Thus, we must ask if there is any value of k such
    that
  • $i \le k < j$,
  • B is in $X_{ik}$, and
  • C is in $X_{k+1,j}$.
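The basis and induction steps above translate almost line for line into code. Below is a minimal Python sketch of CYK membership. It assumes a hypothetical encoding: a CNF grammar is a list of (head, body) pairs, where body is either a one-character terminal or a pair of variables. The table keys are the slides' 1-based pairs (i, j). The example grammar is the one used on the next slide.

```python
def cyk(word, grammar, start="S"):
    """CYK membership test for a CNF grammar.

    grammar is a list of (head, body) pairs where body is either a one-character
    terminal string (A -> a) or a pair of variables (A -> BC). Returns
    (accepted, table), where table[(i, j)] is the set X_ij from the slides,
    using 1-based positions i <= j."""
    n = len(word)
    table = {}
    for i in range(1, n + 1):  # basis: row 1
        table[(i, i)] = {head for head, body in grammar if body == word[i - 1]}
    for length in range(2, n + 1):  # row number j - i + 1
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):  # split point: i <= k < j
                for head, body in grammar:
                    if (isinstance(body, tuple)
                            and body[0] in table[(i, k)]
                            and body[1] in table[(k + 1, j)]):
                        cell.add(head)
            table[(i, j)] = cell
    return n > 0 and start in table[(1, n)], table


# Grammar from the example slide: S -> AB, A -> BB | a, B -> AB | b
G = [("S", ("A", "B")), ("A", ("B", "B")), ("A", "a"),
     ("B", ("A", "B")), ("B", "b")]
accepted, X = cyk("aabbb", G)
print(accepted, X[(1, 5)])
```

The three nested loops over length, i, and k give the $O(n^3)$ bound for a fixed grammar.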

28
Example
  • We'll use the algorithm to determine if the
    string $w = aabbb$ is in the language generated by
    the grammar
  • $S \to AB$
  • $A \to BB \mid a$
  • $B \to AB \mid b$
  • Note that $a_1 = a$, so $X_{11}$ is the set of all
    variables that immediately derive a, that is,
    $X_{11} = \{A\}$. Since $a_2 = a$, we also have $X_{22} = \{A\}$, and
    so on to get
  • $X_{11} = \{A\}$, $X_{22} = \{A\}$, $X_{33} = \{B\}$, $X_{44} = \{B\}$,
    $X_{55} = \{B\}$

29
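  • Compute $X_{12}$: since $X_{11} = \{A\}$ and $X_{22} = \{A\}$, $X_{12}$
    consists of all variables on the left side of a
    production whose right side is AA. There are none, so
    $X_{12}$ is empty.
  • Next, $X_{23}$: since $A \in X_{22}$ and $B \in X_{33}$, the
    required right side is AB, thus $X_{23} = \{S, B\}$.
  • The rest is easy:
  • $X_{12} = \emptyset$, $X_{23} = \{S, B\}$, $X_{34} = \{A\}$, $X_{45} = \{A\}$,
  • $X_{13} = \{S, B\}$, $X_{24} = \{A\}$, $X_{35} = \{S, B\}$,
  • $X_{14} = \{A\}$, $X_{25} = \{S, B\}$,
  • $X_{15} = \{S, B\}$
  • Since S is in $X_{15}$, $w \in L(G)$.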

30
$S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$
31
$S \to AB$, $A \to BB \mid a$, $B \to AB \mid b$

            a      a      b      b      b
  row 1:    A      A      B      B      B
  row 2:    ∅      S,B    A      A
  row 3:    S,B    A      S,B
  row 4:    A      S,B
  row 5:    S,B
32
Another Example
  • $X \to aXb \mid ab$
  • Step 1: put it into CNF.
  • Step 2: apply the CYK algorithm to aaabbb.
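The slide leaves the CNF conversion as an exercise, so the conversion here is our own. Introduce $A \to a$ and $B \to b$, then split the three-symbol body with a helper variable Y: $X \to AY \mid AB$, $Y \to XB$. The Python sketch below repeats the same CYK routine so that the block stands alone. It then cross-checks the result against $\{a^kb^k \mid k \ge 1\}$ for all strings up to length 8.

```python
from itertools import product


def cyk(word, grammar, start):
    """CYK for a CNF grammar given as (head, body) pairs; body is a terminal
    character or a pair of variables. Returns (accepted, table of X_ij)."""
    n = len(word)
    table = {(i, i): {h for h, b in grammar if b == word[i - 1]}
             for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            table[(i, j)] = {h for k in range(i, j) for h, b in grammar
                             if isinstance(b, tuple)
                             and b[0] in table[(i, k)]
                             and b[1] in table[(k + 1, j)]}
    return n > 0 and start in table[(1, n)], table


# Assumed CNF form of X -> aXb | ab.
G_CNF = [("X", ("A", "Y")), ("X", ("A", "B")), ("Y", ("X", "B")),
         ("A", "a"), ("B", "b")]
print(cyk("aaabbb", G_CNF, "X")[0])  # True

for n in range(1, 9):
    for t in product("ab", repeat=n):
        s = "".join(t)
        k = len(s) // 2
        assert cyk(s, G_CNF, "X")[0] == (s == "a" * k + "b" * k)
```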

33
$X \to aXb \mid ab$
34
Another Example
$S \to AB \mid BC$, $A \to BA \mid a$, $B \to CC \mid b$, $C \to AB \mid a$
Test for the string baaba.
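Here the same CYK sketch, repeated so the block is self-contained, is run on this grammar and the string baaba. The grammar encoding is the same hypothetical one as before. Working through the table gives $X_{15} = \{S, A, C\}$, so baaba is in the language. The tests below re-check several entries.

```python
def cyk(word, grammar, start="S"):
    """CYK for a CNF grammar given as (head, body) pairs; body is a terminal
    character or a pair of variables. Returns (accepted, table of X_ij)."""
    n = len(word)
    table = {(i, i): {h for h, b in grammar if b == word[i - 1]}
             for i in range(1, n + 1)}
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            table[(i, j)] = {h for k in range(i, j) for h, b in grammar
                             if isinstance(b, tuple)
                             and b[0] in table[(i, k)]
                             and b[1] in table[(k + 1, j)]}
    return n > 0 and start in table[(1, n)], table


G = [("S", ("A", "B")), ("S", ("B", "C")), ("A", ("B", "A")), ("A", "a"),
     ("B", ("C", "C")), ("B", "b"), ("C", ("A", "B")), ("C", "a")]
accepted, X = cyk("baaba", G)
for length in range(1, 6):  # print the table row by row, as on the slides
    row = [sorted(X[(i, i + length - 1)]) for i in range(1, 7 - length)]
    print(length, row)
print("baaba accepted:", accepted)
```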
35
CYK as a Parsing Algorithm
  • The applicability of the CYK algorithm as a parser is
    limited by the computational requirements needed
    to find a derivation.
  • For an input string of length n, $(n^2 + n)/2$ sets
    need to be constructed to complete the dynamic
    programming table.
  • Each of these sets may require the consideration
    of several decompositions of the associated
    substring.

36
Preview of Undecidable CFL Problems
  • Is a given CFG ambiguous?
  • Is a given CFG inherently ambiguous?
  • Is the intersection of two CFLs empty?
  • Are two CFLs the same?
  • Is a given CFL equal to $\Sigma^*$, where $\Sigma$ is the
    alphabet of the language?

37
The Chomsky Hierarchy
[Figure: nested language classes with their recognizers: Recursively Enumerable Languages (Turing Machine) ⊃ Context Sensitive Languages (Linear Bounded Automata) ⊃ Context Free Languages (Push Down Automata) ⊃ Regular Languages (Finite Automata).]
38
Context-Sensitive Grammars
  • The next grammar type, more powerful than CFGs,
    is a "somewhat restricted" grammar.
  • A grammar is context-sensitive if all
    productions are of the form $x \to y$, where
    $x, y \in (V \cup T)^+$ and $|x| \le |y|$.
  • Fundamental property:
  • the grammar is non-contracting, i.e., the length of
    successive sentential forms can never decrease.
  • Why "context-sensitive"?
  • All productions can be rewritten in a normal form
    $xAy \to xvy$.
  • Effectively, "A can be replaced by v only in the
    context of a preceding x and a following y."

39
Example
$S \Rightarrow aAbc \Rightarrow abAc \Rightarrow abBbcc \Rightarrow aBbbcc \Rightarrow$
$aaAbbcc \Rightarrow aabAbcc \Rightarrow aabbAcc \Rightarrow aabbBbccc \Rightarrow$
$aabBbbccc \Rightarrow aaBbbbccc \Rightarrow aaabbbccc$
  • CSG for $\{a^nb^nc^n \mid n \ge 1\}$
  • Try to derive $a^3b^3c^3$.

$S \to abc \mid aAbc$, $Ab \to bA$, $Ac \to Bbcc$, $bB \to Bb$, $aB \to aa \mid aaA$
A and B are "messengers": an A is created on the
left, travels to the right to the first c, and
creates another b and c. Then it sends B back to
create the corresponding a. This is similar to the way
one would program a TM to accept the language.
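Each step of the derivation shown above can be checked against the productions. The Python sketch below uses an encoding of our own: productions are (lhs, rhs) string pairs, and upper-case letters are variables. It confirms that every step rewrites exactly one occurrence of one left side. It also confirms that no production shortens a string. The helper name `one_step` is hypothetical.

```python
RULES = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
         ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]

DERIVATION = ["S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aaAbbcc", "aabAbcc",
              "aabbAcc", "aabbBbccc", "aabBbbccc", "aaBbbbccc", "aaabbbccc"]


def one_step(src: str, dst: str) -> bool:
    """True if dst is obtained from src by rewriting one occurrence of some lhs."""
    for lhs, rhs in RULES:
        pos = src.find(lhs)
        while pos != -1:
            if src[:pos] + rhs + src[pos + len(lhs):] == dst:
                return True
            pos = src.find(lhs, pos + 1)
    return False


assert all(len(lhs) <= len(rhs) for lhs, rhs in RULES)  # non-contracting
assert all(one_step(a, b) for a, b in zip(DERIVATION, DERIVATION[1:]))
```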
40
Linear-Bounded Automata
  • A limited TM in which tape use is restricted
  • Uses only the part of the tape occupied by the input
  • I.e., has an unbounded tape, but the amount that
    can be used is a function of the input
  • Restrict usable part of tape to exactly the cells
    taken by the input
  • LBA is assumed to be nondeterministic

41
Relation between CSLs and LBAs
  • If a language L is accepted by some linear
    bounded automaton, then there is a
    context-sensitive grammar that generates L.
  • Every sentential form in a derivation of w from a CSG G
    has length at most |w|, because any CSG G is
    non-contracting.
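The non-contracting property is also what makes membership decidable. A derivation of w never passes through a sentential form longer than |w|. A search over the finitely many such forms must therefore terminate. The Python sketch below assumes the $a^nb^nc^n$ grammar from the example, encoded as string rewrite rules, and `csg_member` is a hypothetical name. It illustrates the bounded search only, not an actual LBA construction.

```python
from collections import deque

RULES = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
         ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]


def csg_member(word, rules, start="S"):
    """Breadth-first search over sentential forms of length <= |word|.
    Terminates because there are finitely many such forms and, for a
    non-contracting grammar, longer forms can never shrink back to word."""
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == word:
            return True
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                nxt = form[:pos] + rhs + form[pos + len(lhs):]
                if len(nxt) <= len(word) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                pos = form.find(lhs, pos + 1)
    return False


print(csg_member("aaabbbccc", RULES), csg_member("aabbbcc", RULES))  # True False
```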