Title: Introduction to Language Theory
1Introduction to Language Theory
Programming Language Translators
- Prepared by
- Manuel E. Bermúdez, Ph.D.
- Associate Professor
- University of Florida
2Introduction to Language Theory
- Definition: An alphabet (or vocabulary) Σ is a finite set of symbols.
- Example: Alphabet of Pascal
- - / < (operators)
- begin end if var (keywords)
- <identifier> (identifiers)
- <string> (strings)
- <integer> (integers)
- , ( ) (punctuators)
- Note: All identifiers are represented by one symbol, because Σ must be finite.
3Introduction to Language Theory
- Definition: A sequence t = t1 t2 … tn of symbols from an alphabet Σ is a string.
- Definition: The length of a string t = t1 t2 … tn (denoted |t|) is n. If n = 0, the string is ε, the empty string.
- Definition: Given strings s = s1 s2 … sn and t = t1 t2 … tm, the concatenation of s and t, denoted st, is the string s1 s2 … sn t1 t2 … tm.
4Introduction to Language Theory
- Note: εu = u = uε, and uεv = uv, for any strings u, v (including ε).
- Definition: Σ* is the set of all strings of symbols from Σ.
- Note: Σ* is called the reflexive, transitive closure of Σ.
- Σ* is described by the graph (Σ*, ·), where · denotes concatenation, and there is a designated start node, ε.
5Introduction to Language Theory
- Example: Σ = {a, b}.
- (Σ*, ·)
- Σ* is countably infinite, so we can't compute all of Σ*, and can only compute finite subsets of Σ*, but we can compute whether a given string is in Σ*.
[Figure: graph of (Σ*, ·) for Σ = {a, b}, starting at ε, with edges labeled a and b; nodes shown include aa, ab, ba, bb, aba, abb.]
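The two computable tasks named above (listing a finite subset of Σ*, and testing membership in Σ*) can be sketched as follows. This is a minimal illustration, not part of the original slides; the names `strings_up_to` and `in_sigma_star` are hypothetical, and symbols are assumed to be single characters.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        # product(..., repeat=0) yields one empty tuple, i.e. the empty string
        for symbols in product(sorted(alphabet), repeat=n):
            yield "".join(symbols)

def in_sigma_star(s, alphabet):
    """Membership test for Sigma*: every symbol of s must come from the alphabet."""
    return all(ch in alphabet for ch in s)

SIGMA = {"a", "b"}
finite_subset = list(strings_up_to(SIGMA, 2))
print(finite_subset)                  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(in_sigma_star("abba", SIGMA))   # True
print(in_sigma_star("abc", SIGMA))    # False
```

The generator never finishes enumerating Σ* itself; it only produces the finite subset bounded by `max_len`.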
6Introduction to Language Theory
- Example: Σ = Pascal vocabulary.
- Σ* = all possible alleged Pascal programs, i.e. all possible inputs to a Pascal compiler.
- Need to specify L ⊆ Σ*, the correct Pascal programs.
- Definition: A language L over an alphabet Σ is a subset of Σ*.
7Introduction to Language Theory
- Example: Σ = {a, b}.
- L1 = ∅ is a language
- L2 = {ε} is a language
- L3 = {a} is a language
- L4 = {a, ba, bbab} is a language
- L5 = {a^n b^n | n ≥ 0} is a language
- where a^n = aa…a, n times
- L6 = {a, aa, aaa, …} is a language
- Note: L5 is an infinite language, but described finitely.
8Introduction to Language Theory
- THIS IS THE MAIN GOAL OF LANGUAGE SPECIFICATION
- To describe (infinite) programming languages
finitely, and to provide corresponding finite
inclusion-test algorithms.
9Language Constructors
- Definition: The catenation (or product) of two languages L1 and L2, denoted L1L2, is the set {uv | u ∈ L1, v ∈ L2}.
- Example: L1 = {ε, a, bb}, L2 = {ac, c}
- L1L2 = {ac, c, aac, ac, bbac, bbc}
- = {ac, c, aac, bbac, bbc}
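For finite languages, the catenation can be computed directly from the definition. The sketch below models a language as a Python set of strings, with ε as the empty string; the function name `catenate` is an illustrative choice, not something from the slides.

```python
def catenate(l1, l2):
    """Return {uv | u in l1, v in l2} for two finite languages."""
    return {u + v for u in l1 for v in l2}

L1 = {"", "a", "bb"}   # "" stands for the empty string epsilon
L2 = {"ac", "c"}
print(sorted(catenate(L1, L2)))  # ['aac', 'ac', 'bbac', 'bbc', 'c']
```

Because the result is a set, the duplicate `ac` (from ε·ac and a·c) collapses automatically, matching the second line of the example.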
10Language Constructors
- Definition: L^n = LL…L (n times),
- and L^0 = {ε}.
- Example: L = {a, bb}
- L^3 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
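The power of a finite language can be computed by repeated catenation, starting from L^0 = {ε}. The following is a small sketch under the same assumption as before (languages as Python sets of strings); `power` is a hypothetical helper name.

```python
def power(lang, n):
    """Return L^n: the catenation of `lang` with itself n times, with L^0 = {epsilon}."""
    result = {""}                      # L^0 = {epsilon}
    for _ in range(n):
        result = {u + v for u in result for v in lang}
    return result

L = {"a", "bb"}
print(sorted(power(L, 3)))
# ['aaa', 'aabb', 'abba', 'abbbb', 'bbaa', 'bbabb', 'bbbba', 'bbbbbb']
```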
11Language Constructors
- Definition: The union of two languages L1 and L2 is the set L1 ∪ L2 = {u | u ∈ L1} ∪ {v | v ∈ L2}.
- Definition: The Kleene star (L*) of a language L is the set L* = ∪ L^n, n ≥ 0.
- Example: L = {a, bb}
- L* = {any string composed of a's and bb's}
- Definition: The Transitive Closure (L+) of a language L is the set L+ = ∪ L^n, n ≥ 1.
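L* and L+ are usually infinite, so a program can only list the part of them below some length bound. The sketch below does that for finite L; the names `star_up_to` and `plus_up_to` and the length bound are assumptions made for illustration. `plus_up_to` relies on the standard identity L+ = L·L*.

```python
def star_up_to(lang, max_len):
    """Strings of L* (union of L^n, n >= 0) whose length is at most max_len."""
    result = {""}        # L^0 = {epsilon}
    frontier = {""}      # strings found in the most recent round
    while frontier:
        frontier = {u + v for u in frontier for v in lang
                    if len(u + v) <= max_len} - result
        result |= frontier
    return result

def plus_up_to(lang, max_len):
    """Strings of L+ (union of L^n, n >= 1) whose length is at most max_len."""
    return {u + v for u in lang for v in star_up_to(lang, max_len)
            if len(u + v) <= max_len}

L = {"a", "bb"}
print(sorted(star_up_to(L, 3), key=lambda s: (len(s), s)))
# ['', 'a', 'aa', 'bb', 'aaa', 'abb', 'bba']
print("" in plus_up_to(L, 3))   # False: epsilon is not in L+ here
```

The loop terminates because only finitely many strings fit under `max_len` and each round keeps only strings not seen before.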
12Language Constructors
- Note
- In general, L* = L+ ∪ {ε}, but L+ ≠ L* - {ε}.
- For example, consider L = {ε}. Then
- {ε} = L+ ≠ L* - {ε} = {ε} - {ε} = ∅.
13Grammars
- Goal: Providing a means for describing languages finitely.
- Method: Provide a subgraph (Σ*, →) of (Σ*, ·), and a start node S, such that the set of reachable nodes (from S) are the strings in the language.
14Grammars
- Example: Σ = {a, b}
- L = {a^n b^n | n ≥ 0}
[Figure: graph over Σ = {a, b} with edges labeled a and b; nodes shown include aa, ab, ba, bb, aaa, aab, bba, bbb, aaba, aabb, bbaa, bbab.]
15Grammars
- ⇒ (derives) is a relation defined by a finite set of rewrite rules known as productions.
- Definition: Given a vocabulary V, a production is a pair (u, v) ∈ V* × V*, denoted u → v. u is called the left-part; v is called the right-part.
16Grammars
- Example: Pseudo-English.
- V = {Sentence, NP, VP, Adj, N, V, boy, girl, the, tall, jealous, hit, bit}
- Sentence → NP VP (one production)
- NP → N
- NP → Adj NP
- N → boy
- N → girl
- Adj → the
- Adj → tall
- Adj → jealous
- VP → V NP
- V → hit
- V → bit
- Note: English is much too complicated to be described this way.
17Grammars
- Definition
- Given a finite set of productions P ⊆ V* × V*, the relation ⇒ is defined such that
- ∀ α, β, u, v ∈ V*, αuβ ⇒ αvβ iff
- u → v ∈ P is a production.
- Example:
- Sentence → NP VP      Adj → the
- NP → N                Adj → tall
- NP → Adj NP           Adj → jealous
- N → boy               VP → V NP
- N → girl              V → hit
-                       V → bit
18Grammars
- Sentence ⇒ NP VP
- ⇒ Adj NP VP
- ⇒ the NP VP
- ⇒ the Adj NP VP
- ⇒ the jealous NP VP
- ⇒ the jealous N VP
- ⇒ the jealous girl VP
- ⇒ the jealous girl V NP
- ⇒ the jealous girl hit NP
- ⇒ the jealous girl hit Adj NP
- ⇒ the jealous girl hit the NP
- ⇒ the jealous girl hit the N
- ⇒ the jealous girl hit the boy
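The derivation above can be replayed mechanically: each step picks a position in the current string and a production whose left-part occurs there, then substitutes the right-part. The sketch below assumes a representation chosen only for illustration (strings as tuples of symbols, productions as pairs of tuples); `p`, `derive_step`, and `STEPS` are hypothetical names.

```python
def p(left, right):
    """Build a production (u, v) from two space-separated symbol strings."""
    return (tuple(left.split()), tuple(right.split()))

# Productions of the Pseudo-English example
PRODUCTIONS = {
    p("Sentence", "NP VP"), p("NP", "N"), p("NP", "Adj NP"),
    p("N", "boy"), p("N", "girl"),
    p("Adj", "the"), p("Adj", "tall"), p("Adj", "jealous"),
    p("VP", "V NP"), p("V", "hit"), p("V", "bit"),
}

def derive_step(form, position, production):
    """One derivation step: rewrite the left-part found at `position` into the right-part."""
    left, right = production
    if production not in PRODUCTIONS:
        raise ValueError("not a production of the grammar")
    if form[position:position + len(left)] != left:
        raise ValueError("left-part does not occur at this position")
    return form[:position] + right + form[position + len(left):]

# (position, production) for each of the 13 steps shown above
STEPS = [
    (0, p("Sentence", "NP VP")), (0, p("NP", "Adj NP")), (0, p("Adj", "the")),
    (1, p("NP", "Adj NP")), (1, p("Adj", "jealous")), (2, p("NP", "N")),
    (2, p("N", "girl")), (3, p("VP", "V NP")), (3, p("V", "hit")),
    (4, p("NP", "Adj NP")), (4, p("Adj", "the")), (5, p("NP", "N")),
    (5, p("N", "boy")),
]

form = ("Sentence",)
for position, production in STEPS:
    form = derive_step(form, position, production)
print(" ".join(form))  # the jealous girl hit the boy
```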
19Grammars
- Definition: A grammar is a 4-tuple G = (Φ, Σ, P, S)
- where
- Φ is a finite set of nonterminals,
- Σ is a finite set of terminals,
- V = Φ ∪ Σ is the grammar's vocabulary,
- S ∈ Φ is called the start or goal symbol,
- and P ⊆ V* × V* is a finite set of productions.
- Example: Grammar for {a^n b^n | n ≥ 0}.
- G = (Φ, Σ, P, S), where
- Φ = {S},
- Σ = {a, b},
- and P = {S → aSb, S → ε}
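The 4-tuple maps naturally onto a small record type. The sketch below encodes the example grammar and lists the terminal strings reachable from the start symbol, exploring only strings up to a length bound. The field names, the `sentences` function, and the bound are illustrative assumptions; symbols are assumed to be single characters and left-parts non-empty.

```python
from collections import deque
from typing import NamedTuple

class Grammar(NamedTuple):
    nonterminals: frozenset   # first component of the 4-tuple
    terminals: frozenset      # second component (the alphabet)
    productions: tuple        # pairs (u, v) of strings; "" is epsilon
    start: str                # start (goal) symbol

def sentences(g, max_form_len):
    """Terminal strings reachable from g.start, visiting only strings of
    length <= max_form_len (breadth-first search over rewriting steps)."""
    seen = {g.start}
    queue = deque([g.start])
    found = set()
    while queue:
        form = queue.popleft()
        if all(ch in g.terminals for ch in form):
            found.add(form)
        for u, v in g.productions:
            i = form.find(u)
            while i != -1:                       # try every occurrence of u
                new = form[:i] + v + form[i + len(u):]
                if len(new) <= max_form_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(u, i + 1)
    return found

G = Grammar(frozenset("S"), frozenset("ab"), (("S", "aSb"), ("S", "")), "S")
print(sorted(sentences(G, 7), key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

The bound applies to intermediate strings as well, so with a bound of 7 the longest intermediate string is aaaSbbb and the longest result is aaabbb.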
20Grammars
- Derivations
- S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ …
- Applying S → ε at each step: S ⇒ ε, aSb ⇒ ab, aaSbb ⇒ aabb, aaaSbbb ⇒ aaabbb, aaaaSbbbb ⇒ aaaabbbb
- Note: Normally, grammars are given by simply listing the productions.
21Grammar Conventions
- TWS convention:
- Upper case letter (identifier): nonterminal
- Lower case letter (string): terminal
- Lower case Greek letter: strings in V*
- Left part of the first production is assumed to be the start symbol, e.g.
- S → aSb
- S → ε
- Left part omitted if same as for preceding production, e.g.
- S → aSb
- → ε
22Grammars
- Example Grammar for identifiers.
- Identifier → Letter
- → Identifier Letter
- → Identifier Digit
- Letter → a    → A
- → b    → B
- .
- .
- → z    → Z
- Digit → 0
- → 1
- .
- .
- → 9
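The identifier grammar translates almost line for line into a recursive membership test. The sketch below follows the three Identifier productions directly; it assumes Letter covers a..z and A..Z and Digit covers 0..9, as the elided rows suggest, and `is_identifier` is a hypothetical name.

```python
import string

LETTERS = set(string.ascii_letters)   # a..z and A..Z
DIGITS = set(string.digits)           # 0..9

def is_identifier(s):
    """Identifier -> Letter | Identifier Letter | Identifier Digit"""
    if len(s) == 0:
        return False                      # no production derives the empty string
    if len(s) == 1:
        return s in LETTERS               # Identifier -> Letter
    # Identifier -> Identifier Letter | Identifier Digit
    return is_identifier(s[:-1]) and (s[-1] in LETTERS or s[-1] in DIGITS)

print(is_identifier("x1"))   # True
print(is_identifier("1x"))   # False
```

The recursion peels symbols off the right end because the grammar is left-recursive; an equivalent loop would check that the first symbol is a letter and the rest are letters or digits.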
23Grammars
- Definition: The language generated by a grammar G is the set L(G) = {σ ∈ Σ* | S ⇒* σ}.
- Definition: A sentential form generated by a grammar G is any string α such that S ⇒* α.
- Definition: A sentence generated by a grammar G is any sentential form σ such that σ ∈ Σ*.
24Grammars
- Example
- sentential forms:
- S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ …
- sentences (applying S → ε at each step):
- ε, ab, aabb, aaabbb, aaaabbbb
- Lemma: L(G) = {σ | σ is a sentence}
- Proof: Trivial.
25Grammars
- Example: A → aABC
- → aBC
- aB → ab
- bB → bb
- bC → bc
- CB → BC
- cC → cc
26Grammars
- Derivations: A ⇒ aABC ⇒ aaABCBC ⇒ …
- From A: A ⇒ aBC ⇒ abC ⇒ abc
- From aABC: aABC ⇒ aaBCBC ⇒ aabCBC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc
- From aaABCBC: aaABCBC ⇒ aaaBCBCBC ⇒ (2) aaaBBCBCC ⇒ aaaBBBCCC ⇒ aaabBBCCC ⇒ (2) aaabbbCCC ⇒ aaabbbcCC ⇒ (2) aaabbbccc
- Here ⇒ (2) abbreviates two derivation steps.
- L(G) = {a^n b^n c^n | n ≥ 1}
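Because no production of this grammar shortens a string, every sentence of length at most k can be found by searching only strings of length at most k. The sketch below runs such a bounded search; the function name `reachable_sentences`, the single-character symbol encoding, and the bound of 6 are assumptions made for illustration.

```python
from collections import deque

# Productions of the example, as (left-part, right-part) pairs
PRODUCTIONS = [
    ("A", "aABC"), ("A", "aBC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"),
    ("CB", "BC"), ("cC", "cc"),
]
TERMINALS = set("abc")

def reachable_sentences(productions, start, terminals, max_len):
    """Breadth-first search over strings derivable from `start`,
    keeping only strings of length <= max_len; returns the terminal ones."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            found.add(form)
        for u, v in productions:
            i = form.find(u)
            while i != -1:                       # rewrite each occurrence of u
                new = form[:i] + v + form[i + len(u):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(u, i + 1)
    return found

print(sorted(reachable_sentences(PRODUCTIONS, "A", TERMINALS, 6)))
# ['aabbcc', 'abc']
```

Dead ends such as aabcBC (no production applies to cB) are explored and simply never reach a terminal string.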
27The Chomsky Hierarchy
- A hierarchy of grammars, the languages they generate, and the machines that accept those languages.
28The Chomsky Hierarchy
Type  Language Name               Grammar Name                    Restrictions on Grammar                   Accepting Machine
0     Recursively Enumerable      Unrestricted re-writing system  None                                      Turing Machine
1     Context-Sensitive Language  Context-Sensitive Grammar       For all α → β, |α| ≤ |β|                  Linear Bounded Automaton
2     Context-Free Language       Context-Free Grammar            For all α → β, α ∈ Φ                      Push-Down Automaton (parser)
3     Regular Language            Regular Grammar                 For all α → β, α ∈ Φ, β ∈ Σ ∪ ΣΦ ∪ {ε}    Finite-State Automaton
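The restrictions in the table can be checked production by production. The sketch below uses the usual textbook forms of the restrictions, which is an assumption here: type 1 requires the left-part to be no longer than the right-part, type 2 requires the left-part to be a single nonterminal, and type 3 additionally requires the right-part to be ε, a terminal, or a terminal followed by a nonterminal. It returns the most restrictive type whose condition every production meets. The name `chomsky_type` is hypothetical, symbols are single characters, and the customary exception that lets a context-sensitive grammar contain S → ε is not modeled.

```python
def chomsky_type(productions, nonterminals, terminals):
    """Return 3, 2, 1 or 0: the most restrictive type whose condition
    holds for every production (u, v). "" stands for epsilon."""
    def is_type3(u, v):
        if u not in nonterminals:
            return False
        return (v == ""
                or (len(v) == 1 and v in terminals)
                or (len(v) == 2 and v[0] in terminals and v[1] in nonterminals))

    def is_type2(u, v):
        return u in nonterminals          # left-part is one nonterminal

    def is_type1(u, v):
        return len(u) <= len(v)           # productions never shrink the string

    for level, check in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(check(u, v) for u, v in productions):
            return level
    return 0

ANBN = [("S", "aSb"), ("S", "")]
ANBNCN = [("A", "aABC"), ("A", "aBC"), ("aB", "ab"), ("bB", "bb"),
          ("bC", "bc"), ("CB", "BC"), ("cC", "cc")]
AN = [("S", "aS"), ("S", "")]

print(chomsky_type(ANBN, {"S"}, {"a", "b"}))                     # 2
print(chomsky_type(ANBNCN, {"A", "B", "C"}, {"a", "b", "c"}))    # 1
print(chomsky_type(AN, {"S"}, {"a"}))                            # 3
```

This classifies a grammar, not a language: a language is of a given type if some grammar of that type generates it.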
29Language Hierarchy
0 Recursively Enumerable Languages
1 Context-Sensitive Languages: {a^n b^n c^n | n ≥ 0}
2 Context-Free Languages: {a^n b^n | n ≥ 0}
3 Regular Languages: {a^n | n ≥ 0}
English?
We will deal with type 2 (syntax) and type 3 (lexicon) languages.