Title: Properties of Context-Free Languages
1Properties of Context-Free Languages
- Juan Carlos Guzmán
- CS 6413 Theory of Computation
- Southern Polytechnic State University
2Summary
- Normal Forms
- Pumping Lemma
- Closure Properties
- Decision Properties
3Normal Forms
- Recall that many different grammars generate the same language
- We would like to restrict the form of the productions of the CFG
- Chomsky Normal Form
- Greibach Normal Form
- Tasks to accomplish
- Eliminate useless symbols
- Eliminate ε-productions
- Eliminate unit productions
4Grammar Transformations
- We are about to present a series of transformations on grammars
- You should consider each of them as a transformation function
- T : Grammar → Grammar
5Elimination of Useless Symbols
- Let G = (V, T, P, S)
- X ∈ V is useful if there exist α, β, and a terminal string w such that
- S ⇒* αXβ ⇒* w
- Two considerations
- X must generate strings
- X ⇒* v, for some terminal string v
- X must be reachable from S
- S ⇒* αXβ
6Elimination of Non-Generating Symbols (Tg)
- Let G = (V, T, P, S) be a CFG
- G' = (V' ∪ {S}, T, P', S), where
- V' = {A | (A → α) ∈ P ∧ α ∈ (T ∪ V')*}
- P' = {(A → α) | (A → α) ∈ P ∧ A ∈ V' ∧ α ∈ (T ∪ V')*}
- G' contains only generating symbols
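The recursive definition of V' above can be computed by iterating until no new variable is added. The following is a minimal Python sketch under stated assumptions, not code from the slides: the grammar encoding (a dict from each variable to a list of right-hand-side tuples) and the names `generating_symbols` and `remove_non_generating` are chosen here for illustration only.

```python
def generating_symbols(productions, terminals):
    """Return the variables that derive at least one terminal string."""
    generating = set()
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            # head is generating if some body uses only terminals and
            # variables already known to be generating
            if any(all(s in terminals or s in generating for s in body)
                   for body in bodies):
                generating.add(head)
                changed = True
    return generating


def remove_non_generating(productions, terminals, start):
    """Sketch of Tg: keep generating variables (plus the start symbol)."""
    keep = generating_symbols(productions, terminals)
    return {
        head: [b for b in bodies
               if all(s in terminals or s in keep for s in b)]
        for head, bodies in productions.items()
        if head in keep or head == start
    }


# Illustrative grammar (assumed encoding): C never derives a terminal string.
P = {
    "S": [("a",), ("A",)],
    "A": [("A", "B"), ("B", "C", "A"), ("a",)],
    "B": [("b",)],
    "C": [("A", "C", "A"), ("B", "C", "B")],
}
print(generating_symbols(P, {"a", "b"}))
print(remove_non_generating(P, {"a", "b"}, "S"))
```

In this sketch the production A → BCA disappears together with C, because its body mentions a non-generating variable.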
7Example
- G = ({S, A, B, C}, {a, b}, P, S), where
- P = { S → a | A,
- A → AB | BCA | a,
- B → b,
- C → ACA | BCB }
- V' = {S, A, B}
- G' = ({S, A, B}, {a, b}, P', S), where
- P' = { S → a | A,
- A → AB | a,
- B → b }
8Elimination of Non-Reachable Symbols (Tr)
- Let G(V,T,P,S ) be a CFG
- G' = (V', T, P', S), where
- V' = {S} ∪ {B | (A → αBβ) ∈ P ∧ A ∈ V'}
- P' = {A → α | (A → α) ∈ P ∧ A ∈ V'}
- G' contains only reachable symbols
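The set V' of reachable variables can be computed with a simple worklist search starting from S. This is a hedged Python sketch: the dict-of-tuples grammar encoding and the function names are assumptions for illustration, and every variable is assumed to appear as a key of the dictionary.

```python
def reachable_symbols(productions, start):
    """Return the variables reachable from the start symbol."""
    reachable = {start}
    worklist = [start]
    while worklist:
        head = worklist.pop()
        for body in productions.get(head, []):
            for sym in body:
                # only variables (dictionary keys) are tracked here
                if sym in productions and sym not in reachable:
                    reachable.add(sym)
                    worklist.append(sym)
    return reachable


def remove_non_reachable(productions, start):
    """Sketch of Tr: keep only productions whose head is reachable."""
    keep = reachable_symbols(productions, start)
    return {head: list(bodies)
            for head, bodies in productions.items() if head in keep}


# Illustrative grammar (assumed encoding): C is never reached from S.
P = {
    "S": [("a",), ("A",)],
    "A": [("A", "B"), ("a",)],
    "B": [("b",)],
    "C": [("A", "C", "A"), ("B", "C", "B")],
}
print(remove_non_reachable(P, "S"))
```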
9Example
- G = ({S, A, B, C}, {a, b}, P, S), where
- P = { S → a | A,
- A → AB | a,
- B → b,
- C → ACA | BCB }
- V' = {S, A, B}
- G' = ({S, A, B}, {a, b}, P', S), where
- P' = { S → a | A,
- A → AB | a,
- B → b }
10Useful Symbols
- Remove, in this order
- non-generating symbols
- non-reachable symbols
11Elimination of ε-Productions (Tε)
- Let G = (V, T, P, S) be a CFG
- Ve = {A | (A → α) ∈ P ∧ α ∈ Ve*} (the nullable variables)
- G' = (V, T, P', S), where
- P' = { A → α0 X1 α1 … Xk αk |
- (A → α0 B1 α1 … Bk αk) ∈ P ∧
- for all 1 ≤ i ≤ k: Bi ∈ Ve ∧ Xi ∈ {ε, Bi} ∧
- for all 0 ≤ i ≤ k: αi ∈ (T ∪ (V − Ve))* ∧
- |α0 X1 α1 … Xk αk| > 0 }
- G' does not contain ε-productions and generates L(G) − {ε}
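The construction above can be sketched in Python as follows. This is an illustrative sketch, not the slides' own code: the grammar encoding (dict of right-hand-side tuples, with the empty tuple standing for ε) and the function names are assumptions. Each occurrence of a nullable variable is either kept or dropped, and empty bodies are discarded.

```python
from itertools import product


def nullable_variables(productions):
    """Return the variables that derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            # all() over an empty body is True, so A -> ε makes A nullable
            if head not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon_productions(productions):
    """Sketch of the ε-elimination: expand nullable occurrences, drop ε."""
    nullable = nullable_variables(productions)
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            # a nullable symbol may be kept or dropped; others are always kept
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                candidate = tuple(s for part in pick for s in part)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# S -> aSbS | bSaS | ε, with ε encoded as the empty tuple
P = {"S": [("a", "S", "b", "S"), ("b", "S", "a", "S"), ()]}
for body in remove_epsilon_productions(P)["S"]:
    print("S ->", "".join(body))
```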
12Example
- G = ({S}, {a, b}, P, S), where
- P = { S → aSbS | bSaS | ε }
- Ve = {S}
- G' = ({S}, {a, b}, P', S), where
- P' = { S → aSbS | aSb | abS | ab
- | bSaS | bSa | baS | ba }
- Note that G' does not generate ε
13Elimination of Unit Productions (Tu)
- Let G = (V, T, P, S) be a CFG
- Let Up = {(A, A) | A ∈ V}
- ∪ {(A, C) | (A, B) ∈ Up ∧ (B → C) ∈ P}
- G' = (V, T, P', S), where
- P' = {A → α | (A, B) ∈ Up ∧ (B → α) ∈ P ∧ α ∉ V}
- G' does not contain unit productions and generates L(G)
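The unit pairs Up and the new production set can be computed as in the following Python sketch. The encoding (dict of right-hand-side tuples) and the function names are assumptions for illustration; the grammar used is the usual expression grammar over +, *, parentheses, and a.

```python
def unit_pairs(productions):
    """Return all pairs (A, B) such that A derives B using unit productions only."""
    variables = set(productions)
    pairs = {(a, a) for a in variables}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(pairs):
            for body in productions[b]:
                if len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def remove_unit_productions(productions):
    """Sketch of Tu: A gets every non-unit body of each B with (A, B) a unit pair."""
    variables = set(productions)
    pairs = unit_pairs(productions)
    result = {a: [] for a in productions}
    for a in productions:
        for b in productions:  # dictionary order keeps the output deterministic
            if (a, b) not in pairs:
                continue
            for body in productions[b]:
                is_unit = len(body) == 1 and body[0] in variables
                if not is_unit and body not in result[a]:
                    result[a].append(body)
    return result


P = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("a",), ("(", "E", ")")],
}
for head, bodies in remove_unit_productions(P).items():
    print(head, "->", " | ".join("".join(b) for b in bodies))
```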
14Example
- G = ({E, T, F}, {+, *, (, ), a}, P, E), where
- P = { E → E+T | T, T → T*F | F, F → a | (E) }
- Up = {(E,E), (E,T), (E,F), (T,T), (T,F), (F,F)}
- G' = ({E, T, F}, {+, *, (, ), a}, P', E), where
- P' = { E → E+T | T*F | a | (E),
- T → T*F | a | (E),
- F → a | (E) }
15Summary of Transformations
- Given a CFG G, we can obtain a new grammar G' that has
- no ε-productions
- no unit productions
- no useless symbols
- by transforming the original grammar in this order
- Tr ∘ Tg ∘ Tu ∘ Tε (function composition: Tε is applied first and Tr last)
16Results of the Transformations
- After the transformations
- the grammars do not have useless symbols (and associated productions)
- their productions (A → α) are not
- ε-productions
- unit productions
- Therefore, α must satisfy
- |α| > 1, or
- α ∈ T
17Implications for Transformed Grammars
- Transformed grammars have some nice properties
- No unit productions
- No ε-productions
- However, they produce bushy trees
18Chomsky Normal Form
- Any CFG that does not generate ε can be transformed so that each of its productions is of the form
- A → BC, where A, B, C ∈ V
- A → a, where A ∈ V ∧ a ∈ T
- The idea behind CNF is to obtain grammars whose parse trees are binary trees
19Chomsky Normal Form
- Productions of grammars not yet in CNF, but already transformed, are of the following forms
- A → X1 … Xk, k > 1, all Xi ∈ T ∪ V, or
- A → a, a ∈ T
- We need to further transform the first kind of productions so that
- the right-hand side consists only of variables, and
- long RHSs are broken into chains of productions
20Chomsky Normal Form
- Transformations
- For every terminal a that appears on a RHS of length 2 or more
- Create a production A → a, where A is a new variable
- Replace a in all such productions with A
- Replace every production A → B1 … Bk (k > 2) with
- A → B1C1
- C1 → B2C2
- …
- Ck-2 → Bk-1Bk
- where C1, …, Ck-2 are new variables
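The two steps above (lifting terminals out of long right-hand sides, then breaking long right-hand sides into chains) can be sketched in Python. This is a minimal sketch under assumptions: the input grammar already has no ε-productions and no unit productions, the encoding is a dict of right-hand-side tuples, and the fresh variable names `T_a` and `C1, C2, …` are hypothetical and assumed not to clash with existing variables.

```python
def to_cnf(productions, terminals):
    """Convert a grammar without ε- and unit productions to CNF (sketch)."""
    result = {head: [] for head in productions}
    terminal_var = {}  # terminal -> fresh variable that produces it

    def var_for(t):
        if t not in terminal_var:
            terminal_var[t] = f"T_{t}"
            result[terminal_var[t]] = [(t,)]
        return terminal_var[t]

    counter = 0
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 1:  # already of the form A -> a
                result[head].append(body)
                continue
            # step 1: the right-hand side must consist of variables only
            symbols = [var_for(s) if s in terminals else s for s in body]
            # step 2: break A -> B1 ... Bk into a chain of binary productions
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"C{counter}"
                result[current].append((symbols[0], fresh))
                result[fresh] = []
                current = fresh
                symbols = symbols[1:]
            result[current].append(tuple(symbols))
    return result


# S -> aSb | ab
cnf = to_cnf({"S": [("a", "S", "b"), ("a", "b")]}, {"a", "b"})
for head, bodies in cnf.items():
    print(head, "->", " | ".join(" ".join(b) for b in bodies))
```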
21Example
22Greibach Normal Form
- All productions must be of the form
- A → aB1 … Bk, k ≥ 0
- Note that each derivation step is associated with the generation of a terminal
- This translates nicely to PDAs where each movement of the automaton will be guided by the recognition of an input character
- To convert to GNF
- Order the variables (A1 … An)
- Modify the production set so that
- Ai → Ajα implies that i ≤ j
- remove left recursion, i.e., Ai → Ajα implies that i < j
- Ai → aα
- Ai → aα, α ∈ V*
- The algorithm resembles matrix triangularization
- It appears in the 1st edition of our book
23Relation Between Height and Yield of a CNF Parse Tree
- Note that tree nodes of grammars in CNF are
- binary nodes for productions (A → BC)
- unit terminal nodes for productions (A → a)
- The yield of a complete CNF parse tree of height n is of size 2^(n-1) or less
[Figure: CNF parse tree rooted at S with yield a1 a2 a3 … at; the part above the leaves has height n-1 and at most 2^(n-1) nodes on its bottom level, and the whole tree has height n]
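The bound on the yield can be justified by a short induction on the height. The sketch below assumes that "height" means the length (number of edges) of the longest root-to-leaf path; the symbols $w$, $w_B$, and $w_C$ are introduced here only for the argument.

```latex
\textbf{Claim.} If every root-to-leaf path of a CNF parse tree has length at
most $n$, then its yield $w$ satisfies $|w| \le 2^{n-1}$.

\textbf{Basis} ($n = 1$): the tree is a single production $A \to a$, so
$|w| = 1 = 2^{0}$.

\textbf{Induction} ($n > 1$): the root uses a production $A \to BC$. The
subtrees rooted at $B$ and $C$ have paths of length at most $n-1$, so their
yields $w_B$ and $w_C$ satisfy
\[
  |w| = |w_B| + |w_C| \le 2^{n-2} + 2^{n-2} = 2^{n-1}.
\]
```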
24Pumping Lemma
- Let L be a context-free language. Then there exists a constant n (which depends on L) such that for every string z in L such that |z| ≥ n, we can break z into five strings, z = uvwxy, such that
- |vwx| ≤ n
- vx ≠ ε
- For all i ≥ 0, the string u v^i w x^i y is also in L
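To get a feel for the three conditions, one can enumerate by brute force every split z = uvwxy with |vwx| ≤ n and vx ≠ ε, and test whether pumping stays inside the language. The Python sketch below is only an illustration under assumptions: the function names are hypothetical, only a few exponents i are tested, and a finite check for one n and one z is not a proof. It contrasts { a^k b^k c^k }, a standard example of a non-context-free language, with the context-free { a^k b^k }.

```python
def in_anbncn(s):
    """Membership test for { a^k b^k c^k | k >= 0 }."""
    k = len(s) // 3
    return len(s) % 3 == 0 and s == "a" * k + "b" * k + "c" * k


def in_anbn(s):
    """Membership test for { a^k b^k | k >= 0 }."""
    k = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * k + "b" * k


def pumpable_splits(z, n, member, exponents=(0, 2, 3)):
    """List the splits (u, v, w, x, y) of z with |vwx| <= n and vx != ε
    for which u v^i w x^i y is in the language for every tested i."""
    splits = []
    for start in range(len(z) + 1):                            # u = z[:start]
        for end in range(start, min(start + n, len(z)) + 1):   # vwx = z[start:end]
            for v_end in range(start, end + 1):
                for x_start in range(v_end, end + 1):
                    u, v, w = z[:start], z[start:v_end], z[v_end:x_start]
                    x, y = z[x_start:end], z[end:]
                    if v + x == "":
                        continue  # violates vx != ε
                    if all(member(u + v * i + w + x * i + y) for i in exponents):
                        splits.append((u, v, w, x, y))
    return splits


n = 4
print(pumpable_splits("a" * n + "b" * n + "c" * n, n, in_anbncn))
print(pumpable_splits("a" * n + "b" * n, n, in_anbn))
```

For the first language no split survives, because a window of length at most n touches at most two of the three letters; for the second, splits such as v = a, x = b around the middle can be pumped.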
25Pumping Lemma
- In plain words
- For any context-free language
- Words of large size will contain a substring
- Somewhere in the middle
- Not null, not too big
- That substring can itself be broken into three pieces vwx
- v not null or x not null
- v and x can be pumped (together) over and over again
- The new words are guaranteed to be in the language
- How large the words must be in order to be considered large depends on the actual language
26Pumping Lemma Proof
- Find a CNF for the language
- The size of the word relates to the height of the tree
[Figure: a path in the parse tree through the variables A0, A1, A2, …, Ak, ending in a terminal a]
27Pumping Lemma Proof
- Find a CNF for the language
- For large words, a variable must be repeated
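[Figure: parse tree rooted at S in which one path passes through Ai and then Aj; the yield is divided into u, v, w, x, y]
- Note: Ai = Aj, i < j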
28Related Strings
- The strings
- uwy
- uvvwxxy
- u v^n w x^n y
- are also in the language
29How about ε?
- If the language contains ε
- The transformations remove ε from the grammar
- Therefore you get a different language!!!
- CNF is not defined for languages with ε
- If a language contains ε
- A new grammar can be given, which generates the same language
- ε is generated in a single derivation step
- All other productions comply with CNF
30Closure Properties
- Context-free languages are closed under
- Substitution
- Regular Operators
- Homomorphism
- Reversal
- Intersection with regular language
- Inverse homomorphism
31Substitution
- A substitution is an operation which replaces characters with strings
- These strings are pulled from a particular language
32Substitution: Formally
- Let Σ be an alphabet
- Let La be a language associated with each a ∈ Σ
- s(a) = La
- s(a1a2…an) = s(a1)s(a2)…s(an) = La1 La2 … Lan
- s(L) = ∪ { s(w) | w ∈ L }
33Substitution
- CFLs are closed under substitution with CFLs
- Let G = (V, Σ, P, S), such that L(G) = L
- Let Ga = (Va, Ta, Pa, Sa), such that L(Ga) = La
- Let G' = (V', T', P', S) where
- V' = V ∪ (∪a∈Σ Va)
- T' = (∪a∈Σ Ta)
- P' = (∪a∈Σ Pa) ∪ P'', where
- P'' is all productions of P, where each terminal a was replaced by the corresponding Sa
- G' generates s(L)
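The construction of G' can be sketched in a few lines of Python. This is an illustrative sketch only: the dict-of-tuples grammar encoding, the function name `substitute`, and the variable names `P0` and `P1` are assumptions, and the variable sets of all grammars are assumed to be pairwise disjoint.

```python
def substitute(productions, start, sub_grammars):
    """Build a grammar for s(L).

    sub_grammars maps each terminal a of the original grammar to a pair
    (productions_a, start_a) describing a grammar for L_a.
    """
    result = {}
    # replace each terminal a in the original productions by S_a
    for head, bodies in productions.items():
        result[head] = [
            tuple(sub_grammars[s][1] if s in sub_grammars else s for s in body)
            for body in bodies
        ]
    # add all productions of the substituted grammars
    for prods_a, _start_a in sub_grammars.values():
        for head, bodies in prods_a.items():
            result[head] = list(bodies)
    return result, start


# S -> SS | 0S1 | ε, with 0 replaced by "(" and 1 replaced by ")"
G = {"S": [("S", "S"), ("0", "S", "1"), ()]}
subs = {
    "0": ({"P0": [("(",)]}, "P0"),
    "1": ({"P1": [(")",)]}, "P1"),
}
new_grammar, new_start = substitute(G, "S", subs)
print(new_grammar)
```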
34Example
- G = ({S}, {0, 1}, P, S), where
- P = { S → SS | 0S1 | ε }
- L0 = {(}
- L1 = {)}
- Or
- L0 = {0}
- L1 = {1}
35Closure Under Regular Operators
- CFLs are closed under
- Union
- Concatenation
- Closure (*), and positive closure (+)
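Each of these closure properties has a one-production grammar construction: add a fresh start symbol on top of the given grammars. The Python sketch below shows the standard constructions under assumptions: the grammars are dicts of right-hand-side tuples with disjoint variable sets, the empty tuple stands for ε, and the fresh start symbol name `S0` is hypothetical and assumed unused.

```python
def union(g1, s1, g2, s2, new_start="S0"):
    """L(G1) ∪ L(G2): S0 -> S1 | S2."""
    return {new_start: [(s1,), (s2,)], **g1, **g2}, new_start


def concatenation(g1, s1, g2, s2, new_start="S0"):
    """L(G1)L(G2): S0 -> S1 S2."""
    return {new_start: [(s1, s2)], **g1, **g2}, new_start


def star(g, s, new_start="S0"):
    """L(G)*: S0 -> S S0 | ε."""
    return {new_start: [(s, new_start), ()], **g}, new_start


def plus(g, s, new_start="S0"):
    """L(G)+: S0 -> S S0 | S."""
    return {new_start: [(s, new_start), (s,)], **g}, new_start


G1 = {"X": [("a", "X", "b"), ("a", "b")]}   # a^k b^k, k >= 1
G2 = {"Y": [("c", "Y"), ("c",)]}            # c^k, k >= 1
print(union(G1, "X", G2, "Y"))
print(star(G1, "X"))
```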
36Closure Under Homomorphism
- CFLs are closed under homomorphism
- This is a special case of substitution
- Substitution with a single string
37Reversal
- CFLs are closed under reversal
- Just reverse all productions
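"Reversing a production" means reversing its right-hand side. A minimal Python sketch, assuming a dict-of-tuples grammar encoding and a hypothetical function name `reverse_grammar`:

```python
def reverse_grammar(productions):
    """Reverse the right-hand side of every production."""
    return {
        head: [tuple(reversed(body)) for body in bodies]
        for head, bodies in productions.items()
    }


# S -> aSb | ab | c  becomes  S -> bSa | ba | c
G = {"S": [("a", "S", "b"), ("a", "b"), ("c",)]}
print(reverse_grammar(G))
```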
38Intersection with a Regular Language
- CFLs are not closed under intersection
- They are closed under intersection with a regular language
39Inverse Homomorphism
- CFLs are closed under inverse homomorphism
40Decision Properties of CFLs
- Complexity to transform grammars to PDAs, and within PDAs
- Complexity of transformation to CNF
- Testing Emptiness of CFLs
- Testing Membership in a CFL
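One common way to test membership, once the grammar is in CNF, is the CYK dynamic-programming algorithm. The slides do not name a specific algorithm at this point, so CYK is an assumed choice here; the grammar encoding and the function name `cyk` are likewise illustrative. The sketch fills a table of the variables that derive each substring and runs in time cubic in the length of the word.

```python
def cyk(productions, start, word):
    """Return True if the CNF grammar derives word (CYK sketch)."""
    n = len(word)
    if n == 0:
        return False  # a grammar in CNF as defined above cannot derive ε
    # table[i][l] = set of variables deriving word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, bodies in productions.items():
            if (ch,) in bodies:          # productions A -> a
                table[i][1].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in productions.items():
                    for body in bodies:  # productions A -> BC
                        if (len(body) == 2 and body[0] in left
                                and body[1] in right):
                            table[i][length].add(head)
    return start in table[0][n]


# CNF grammar for { a^k b^k | k >= 1 }: S -> AB | AC, C -> SB, A -> a, B -> b
G = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
print(cyk(G, "S", "aabb"), cyk(G, "S", "abab"))
```

Emptiness can be tested in a similar spirit by checking whether the start symbol is a generating symbol.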
41Undecidable Problems
- Is a given CFG G ambiguous?
- Is a given CFL L inherently ambiguous?
- Is the intersection of two CFLs empty?
- Are two CFLs the same?
- Is a given CFL equal to Σ*, where Σ is the alphabet of the language?