Title: COSC 341 Lecture 14: Is English context-free?
1 COSC 341 Lecture 14: Is English context-free?
S → NP VP
NP → ADJ NP | ART NP | N
VP → V NP
N → apple | mouse | cat
V → eats | hugs
ADJ → itchy | scratchy
ART → a | an | the
Derivation:
S ⇒ NP VP
⇒ ART NP VP
⇒ ART N VP
⇒ ART N V NP
⇒ ART N V ART NP
⇒ ART N V ART N
⇒ the N V ART N
⇒ the mouse V ART N
⇒ the mouse hugs ART N
⇒ the mouse hugs the N
⇒ the mouse hugs the cat
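The derivation above can be checked mechanically. The following is a minimal sketch, not part of the lecture: it stores the grammar as a Python dictionary and confirms that each line of the derivation follows from the previous one by rewriting exactly one non-terminal. The names GRAMMAR, DERIVATION and is_step are made up for this illustration.

```python
# Hypothetical encoding of the slide's grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ADJ", "NP"], ["ART", "NP"], ["N"]],
    "VP": [["V", "NP"]],
    "N": [["apple"], ["mouse"], ["cat"]],
    "V": [["eats"], ["hugs"]],
    "ADJ": [["itchy"], ["scratchy"]],
    "ART": [["a"], ["an"], ["the"]],
}


def is_step(before, after):
    """True if `after` results from rewriting exactly one non-terminal of `before`."""
    for i, symbol in enumerate(before):
        for rhs in GRAMMAR.get(symbol, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False


# The derivation from the slide, one sentential form per entry.
DERIVATION = [
    "S",
    "NP VP",
    "ART NP VP",
    "ART N VP",
    "ART N V NP",
    "ART N V ART NP",
    "ART N V ART N",
    "the N V ART N",
    "the mouse V ART N",
    "the mouse hugs ART N",
    "the mouse hugs the N",
    "the mouse hugs the cat",
]

forms = [line.split() for line in DERIVATION]
all_valid = all(is_step(x, y) for x, y in zip(forms, forms[1:]))
print(all_valid)
```

Because every rule has a single non-terminal on its left, a step is checked without looking at the neighbouring symbols. That is exactly what "context-free" means.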
2 The importance of context
In English, "base" can mean cowardly and "ball" can mean dance. If we use CFG rules like
base → cowardly
ball → dance
to express this, they would allow us to derive
baseball ⇒* cowardly dance
Typically words have many synonyms:
base → foundation | alkali | headquarters | cowardly
We need knowledge of context before substituting, and so want to take adjoining words into account:
base line → starting point
base metal → not precious metal
base villain → cowardly evil-doer
Instead of a CFG, we need a grammar whose rules allow us to replace one whole string of symbols (terminals and non-terminals) by another. We need a type 0, or phrase-structure, grammar.
3 Phrase-structure grammars
- A type 0 grammar has rules u → v where u and v are any strings of terminals and non-terminals (with u non-empty).
- Example: Let L = {a^n b^n c^n | n > 0}.
- We use non-terminals A, B, C, plus two more that we leave until later, to build a type 0 grammar for L.
- We need 3 kinds of rules:
  - rules that give strings with an equal number of As, Bs, and Cs (possibly not in the right order)
  - rules that allow us to correct the order of non-terminals
  - rules that let us replace non-terminals by terminals (provided they are in the right order).
- To generate all strings of the form (ABC)^n we use
  - S → ABCS | ABC
- To let the non-terminals be realigned correctly:
  - BA → AB, CA → AC and CB → BC
- The problem with the third kind of rule is that we can't just use rules like A → a, because such a rule might be used before the non-terminals have been put into the right order.
4 Example (continued)
So we say that C can be replaced by c, but only if it is preceded by c or b:
cC → cc and bC → bc
Similarly, B can be replaced by b if preceded by b or a:
bB → bb and aB → ab
And A can be replaced by a if preceded by a:
aA → aa
Everything works fine as long as we have an a at the front to get us started, but where does that a come from? We still can't finish things off with a rule A → a, because then ABCABC could become abcabc; in other words, the terminals could be substituted too soon.
A sneaky solution is to use a non-terminal F for Front. Using F, we can allow A to be replaced by a only if it is preceded by a or F:
aA → aa and FA → a
To put non-terminal F on the front (left) we need a new start symbol S':
S' → FS
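To see how the context conditions block premature substitution, here is a small sketch (an illustration, not from the lecture) that replays two rule sequences. It uses the rules introduced so far, with the single character Z standing in for the new start symbol. The helper names rewrite, replay and applicable are hypothetical, and each rule is applied at the leftmost place it matches.

```python
# Rules collected from the slides; "Z" is a stand-in for the new start symbol.
RULES = [
    ("Z", "FS"),
    ("S", "ABCS"), ("S", "ABC"),
    ("BA", "AB"), ("CA", "AC"), ("CB", "BC"),
    ("cC", "cc"), ("bC", "bc"),
    ("bB", "bb"), ("aB", "ab"),
    ("aA", "aa"), ("FA", "a"),
]


def rewrite(form, lhs, rhs):
    """Apply the rule lhs -> rhs at the leftmost occurrence of lhs in form."""
    if (lhs, rhs) not in RULES:
        raise ValueError(f"{lhs} -> {rhs} is not a rule of the grammar")
    pos = form.find(lhs)
    if pos == -1:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form[:pos] + rhs + form[pos + len(lhs):]


def replay(start, steps):
    """Return every sentential form obtained by applying the steps in order."""
    forms = [start]
    for lhs, rhs in steps:
        forms.append(rewrite(forms[-1], lhs, rhs))
    return forms


def applicable(form):
    """List the rules whose left-hand side occurs somewhere in form."""
    return [(lhs, rhs) for lhs, rhs in RULES if lhs in form]


# Reorder the non-terminals first, then substitute terminals left to right.
PATIENT = [("Z", "FS"), ("S", "ABCS"), ("S", "ABC"),
           ("CA", "AC"), ("BA", "AB"), ("CB", "BC"),
           ("FA", "a"), ("aA", "aa"), ("aB", "ab"),
           ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

# Substitute terminals as early as the rules allow, without reordering.
EAGER = [("Z", "FS"), ("S", "ABCS"), ("S", "ABC"),
         ("FA", "a"), ("aB", "ab"), ("bC", "bc")]

print(" => ".join(replay("Z", PATIENT)))
print(" => ".join(replay("Z", EAGER)))
print(applicable(replay("Z", EAGER)[-1]))
```

The first sequence ends in aabbcc. The second stops at abcABC, where no rule applies: A may only become a after a or F, so the out-of-order form can never turn into abcabc.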
5 Example (continued)
Now the complete grammar is ready:
S' → FS
S → ABCS | ABC
BA → AB
CA → AC
CB → BC
cC → cc
bC → bc
bB → bb
aB → ab
aA → aa
FA → a
Try the grammar out by generating a few strings and see whether you feel convinced it produces {a^n b^n c^n | n > 0}.
Exercise: What language is generated by the following type 0 grammar?
S → ABS | λ
AB → BA
BA → AB
A → a
B → b
6 Recursive languages
Fact: Languages generated by type 0 grammars are the recursively enumerable languages (accepted by TMs that may loop on strings not in the language). Some of these languages (in fact all those we've designed TMs for) are recursive (accepted by TMs that halt on all input strings, so for them the decision problem of Membership is solvable).
Question: Is there a type of grammar generating precisely the recursive languages?
Answer: No. It follows from Rice's Theorem (ch 11) that no algorithm exists which can always tell, by looking at the structure of an arbitrary grammar, whether the language it generates is recursive. The closest we have is the idea of a type 1 grammar, also called a context-sensitive grammar.
7 Type 1 grammars
A type 1 grammar is context-sensitive like a type 0 grammar, but insists that, for each rule u → v, length(u) ≤ length(v). The clever idea is that since the rules are monotonic and cannot generate λ, a brute-force algorithm can be used to decide membership of strings (only finitely many sentential forms are no longer than the string being tested), and so every type 1 grammar generates a recursive language.
Example:
S → aSBA | abA
AB → BA
bB → bb
bA → ba
aA → aa
Context-sensitive languages are accepted by machines called linear bounded automata (LBAs). LBAs are Turing machines in which, for every input string w, markers are placed on the tape length(w) apart and only this space is used for computation.
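The brute-force membership test mentioned above can be sketched as follows. This is one possible reading of "brute force", not necessarily the algorithm the lecture has in mind: since no rule shortens a sentential form, any form longer than the input w can be thrown away, which leaves only finitely many forms to search. The names TYPE1_RULES and is_member are made up for this illustration.

```python
# The example type 1 grammar from the slide.
TYPE1_RULES = [
    ("S", "aSBA"), ("S", "abA"),
    ("AB", "BA"),
    ("bB", "bb"), ("bA", "ba"), ("aA", "aa"),
]


def is_member(rules, start, w):
    """Decide whether w is derivable from start using non-contracting rules."""
    if not all(len(lhs) <= len(rhs) for lhs, rhs in rules):
        raise ValueError("every rule must satisfy length(u) <= length(v)")
    seen = {start}
    frontier = [start]
    while frontier:
        form = frontier.pop()
        if form == w:
            return True
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:  # try the rule at every position where it matches
                new = form[:pos] + rhs + form[pos + len(lhs):]
                # Forms longer than w can never shrink back to w, so prune them.
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    frontier.append(new)
                pos = form.find(lhs, pos + 1)
    return False  # search space exhausted, so w is not in the language


for candidate in ["aba", "aabbaa", "abab", "aabba"]:
    print(candidate, is_member(TYPE1_RULES, "S", candidate))
```

The search always halts because there are only finitely many strings of length at most length(w) over the grammar's symbols. The same length bound is what lets an LBA work within the space between its markers.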
8 The Chomsky hierarchy
[Figure: nested regions, listed from outermost to innermost]
Outer space
Recursively enumerable
Recursive
Context-sensitive
Context-free
Deterministic context-free