Title: RESUME THEORY OF COMPUTATION
1 RESUME THEORY OF COMPUTATION
2 Problems in CS
- Are all problems programmable?
- What statement of a problem constitutes an
implementable program?
- Do the specifications of a program always lead to
a program?
- Is it always possible to find specifications of a
problem that lead to a program?
3 Purposes of Finite Automata
- Design a language for the mathematical
specification of computer languages in general,
for all computers
- A DFA is a model for designing programs that use a
constant amount of memory
- Examples are ATMs, memory machines, parity
machines, adding machines, Web searches, news
analysts' searches, stock analysts' searches,
shopping robots, grep in Unix
4 Adding Machine
[Figure: binary addition with carry bits written above the operands]
101101 + 111001 = 1100110
States are (i1 i2 carry): (000), (001), (010),
(011), (100), (101), (110), (111)
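The adding machine can be sketched as a loop that reads one pair of input bits at a time and remembers only the carry. This is a minimal sketch, not the slide's exact machine: the function name `serial_add` is hypothetical, and the inputs are assumed to be binary strings read from the least significant bit.

```python
def serial_add(x: str, y: str) -> str:
    """Add two binary strings using only one bit of memory (the carry)."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)  # pad the shorter operand with zeros
    carry = 0
    out = []
    # Each step sees the triple (i1, i2, carry), as in the state list above.
    for i1, i2 in zip(reversed(x), reversed(y)):
        total = int(i1) + int(i2) + carry
        out.append(str(total % 2))  # output bit
        carry = total // 2          # next carry
    if carry:
        out.append("1")
    return "".join(reversed(out))
```

With the operands from the slide, `serial_add("101101", "111001")` gives `"1100110"`.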
5 Definition of a DFA
DFA A = (Q, Σ, δ, s, F)
Q - finite set of states
Σ - finite input alphabet
δ - transition function from Q x Σ → Q
s - initial state, s ∈ Q
F - favorable, or accepting, states, F ⊆ Q
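The five components map directly onto a small simulator. The sketch below is an illustration under assumptions: `run_dfa` and the parity machine `PARITY_DELTA` are hypothetical names, and the transition function is stored as a dictionary keyed by (state, symbol).

```python
def run_dfa(delta, start, accepting, word):
    """Return True if the DFA (delta, start, accepting) accepts word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # exactly one next state: deterministic
    return state in accepting

# Hypothetical parity machine: accepts binary strings with an even number of 1s.
PARITY_DELTA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}

def even_ones(word):
    return run_dfa(PARITY_DELTA, "even", {"even"}, word)
```

The machine needs no memory beyond the current state, which is the constant-memory property finite automata are meant to capture.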
6 Example 1
[Figure: state diagram with states s, r, q and arrows labeled a and b]
Accepts the language L(a^n b^m)
Ex: statement, statement, ..., if-then, if-then, ...,
end
7 Example 2
[Figure: state diagram with states s, r, q and arrows labeled a and b]
Accepts any string with 2 consecutive
a's, L(..., a, a, ...)
Ex: Any program with a nested pair of while
loops
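One possible transition table for a machine of this kind is sketched below. The diagram itself is not recoverable, so the arrows in `AA_DELTA` are an assumption: s means no progress, r means one a has just been read, and q means two consecutive a's have been seen.

```python
# Assumed transitions; the state names s, r, q follow the slide.
AA_DELTA = {
    ("s", "a"): "r", ("s", "b"): "s",
    ("r", "a"): "q", ("r", "b"): "s",
    ("q", "a"): "q", ("q", "b"): "q",  # once in q, stay in q
}

def has_double_a(word):
    """Accept strings over {a, b} that contain two consecutive a's."""
    state = "s"
    for ch in word:
        state = AA_DELTA[(state, ch)]
    return state == "q"
```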
8 Example 3
[Figure: state diagram with states s, f, q, p and arrows labeled a and b]
L(a, b^n, a, b^m)
Ex: one case followed by multiple statements
9 Applications in Computer Science
- Programming sequences, branching, loops
- Pattern matching (for WWW and AI)
- Lexical analysis in compilers
- Finite state machines in software specs and design
- Word processors
- Design of telecommunication protocols
- Design of circuits for VLSI
- Hardware design
- Control mechanisms
10 Definition of a configuration
A configuration is a composite of state (pieces
left), position (pattern on the board), input
(next moves).
Ex: In reading input aaaabba, after aaa has been
read and state q has been reached, the remaining
input is abba, so (q, abba) is a configuration on
the way to acceptance.
w is accepted if it yields a favorable state.
(q, w) ⊢ (q1, w1) if ∃ σ ∈ Σ such that w = σw1 and
δ(q, σ) = q1; then (q, w) yields (q1, w1) in one
step.
(q, w) yields (q', w') if ∃ a sequence of
configurations (q1, w1), ..., (qk, wk) such that
(q1, w1) = (q, w), (qk, wk) = (q', w'), and each
(qi, wi) yields (qi+1, wi+1) in one step.
All configurations yield unique configurations as
the automaton is deterministic.
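The "yields in one step" relation can be sketched as a function on (state, remaining input) pairs. The two-state machine `DELTA` below is hypothetical and chosen only so that the input aaaabba from the example passes through the configuration (q, abba).

```python
def step(delta, config):
    """One step: consume the first remaining symbol and move to the next state."""
    state, remaining = config
    return (delta[(state, remaining[0])], remaining[1:])

def trace(delta, start, word):
    """List every configuration from (start, word) until the input is used up."""
    config = (start, word)
    history = [config]
    while config[1]:
        config = step(delta, config)
        history.append(config)
    return history

# Hypothetical DFA: stays in q on a, moves to p on b, returns to q on a.
DELTA = {("q", "a"): "q", ("q", "b"): "p", ("p", "a"): "q", ("p", "b"): "p"}
```

Because `step` returns exactly one configuration, each configuration has a unique successor, which is the determinism property stated above.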
11 Non-deterministic Finite Automata
- Union of 2 DFAs gives an NDFA, as there can be more
than 1 arrow from a state with the same input.
- For example, in pattern matching of A ∪ B, we may
match A or we may match B.
- NDFA A = (Q, Σ, Δ, s, F)
- where Δ ⊆ Q x (Σ ∪ {ε}) x Q is a transition
relation, (q, a, p) ∈ Δ
- The empty string as input permits the
automaton to jump from one state to the other.
- Examples are searches for one of two words,
finding all occurrences of a name in a DB, threads
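A transition relation can be simulated by tracking the set of all states the automaton might be in. The sketch below assumes the relation is a set of (q, a, p) triples with the empty string "" standing for an empty-string jump; the example relation `AB_OR_BA`, which matches one of two words, is hypothetical.

```python
def epsilon_closure(states, relation):
    """All states reachable from `states` using only empty-string jumps."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for (src, label, dst) in relation:
            if src == q and label == "" and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def nfa_accepts(relation, start, accepting, word):
    """Accept if at least one branch of the computation ends in an accepting state."""
    current = epsilon_closure({start}, relation)
    for symbol in word:
        moved = {dst for (src, label, dst) in relation
                 if src in current and label == symbol}
        current = epsilon_closure(moved, relation)
    return bool(current & accepting)

# Hypothetical NDFA for the union of the words "ab" and "ba".
AB_OR_BA = {
    ("s", "", "a0"), ("s", "", "b0"),
    ("a0", "a", "a1"), ("a1", "b", "f"),
    ("b0", "b", "b1"), ("b1", "a", "f"),
}
```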
12 Equivalence of NDFA and DFA
- Definition: Automata A and A1 that accept the same
language are equivalent.
- Theorem:
- For every non-deterministic finite automaton A there
exists a deterministic finite automaton A1
equivalent to A.
13 NDFA → DFA
- Combine all accepting states into one
- Combine states, q and r, that go to the same next
state, p, on the same input, a.
- If needed, add a virtual start state to the above
with ε jumps
- If needed, in case of one state, q, going to two
states, r and s, on the same input a, add a virtual
state with ε jumps to q1 and q2.
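A different, standard route to the same result is the subset construction, where each DFA state is a set of NDFA states. The sketch below is that technique, not the informal steps listed above, and it assumes an NDFA without empty-string jumps to stay short. The example relation `ENDS_IN_AB` is hypothetical.

```python
def subset_construction(relation, start, accepting, alphabet):
    """Build a DFA whose states are frozensets of NDFA states (no empty jumps)."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    todo = [start_set]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            target = frozenset(dst for (src, label, dst) in relation
                               if src in current and label == symbol)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is accepting if it contains any accepting NDFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa_delta, start_set, dfa_accepting

def dfa_accepts(dfa, word):
    delta, state, accepting = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

# Hypothetical NDFA accepting strings over {a, b} that end in "ab".
ENDS_IN_AB = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
ENDS_IN_AB_DFA = subset_construction(ENDS_IN_AB, 0, {2}, "ab")
```

In this example only three subsets are reachable, so the resulting DFA has three states.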
14 Regular Expressions
- Regular expressions are algebraic expressions
used to
- Design an algorithmic procedure for mathematical
specification of languages acceptable by
computers of all types
- Convert an FA back to its specs through an
algorithmic procedure.
- Do simulations of computers and newly invented
languages.
- They are used in lexical analyzers
15 Language for regular expressions
Some of the complex programming languages can be
obtained from simpler languages using ∪, ∩, −,
concatenation and Kleene stars.
Concatenation of strings u, v is uv; of languages
L1, L2 it is L1L2 = {uv : u ∈ L1, v ∈ L2}
Kleene Star L* of L is the infinite union
{ε} ∪ L ∪ L^2 ∪ L^3 ∪ ...
L* = {w1w2...wk : k ≥ 0, wi ∈ L}
Theorem: If languages L and M are acceptable by a
finite automaton, so are L ∪ M, L ∩ M, Σ* − L
(complement), L − M, LM (concatenation), L* (Kleene Star)
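For finite languages the two operations can be tried out directly on Python sets. This is a sketch under an assumption: since L* is infinite, the hypothetical `kleene_star` helper stops after `max_power` concatenations.

```python
def concat(l1, l2):
    """L1L2 = {uv : u in L1, v in L2}."""
    return {u + v for u in l1 for v in l2}

def kleene_star(lang, max_power):
    """Union of L^0, L^1, ..., L^max_power (a finite approximation of L*)."""
    result = {""}   # L^0 contains only the empty string
    power = {""}
    for _ in range(max_power):
        power = concat(power, lang)
        result |= power
    return result
```

For example, `kleene_star({"ab"}, 2)` gives the empty string, "ab" and "abab".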
16 Definition of a Regular Expression
- It is a string over an alphabet
- Σ ∪ {(, ), ε, ∅, ∪, *}
- It is mostly used in lexical analyzers.
- ∅, ε, and a, ∀a ∈ Σ (the ground elements), are
regular expressions.
- If α, β are regular expressions then
(α ∪ β), (αβ), α* are regular expressions by
induction
- No other string is a regular expression.
17 Examples
- L((ab*)a) = {w : w is of the form a b^n a}
- An identifier in C begins with a letter and may be
followed by a string of letters and digits. The
identifiers can thus be expressed by the regular
expression
- (a-z ∪ A-Z) (a-z ∪ A-Z ∪ 0-9)*
- Identifiers of languages with underscores can be
expressed by
- (a-z ∪ A-Z) ((_ (a-z ∪ A-Z ∪ 0-9)) ∪
(a-z ∪ A-Z ∪ 0-9))*
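The two identifier expressions can be checked with Python's `re` module. This is a sketch under assumptions: the second pattern reads the expression above as "each underscore must be followed by a letter or digit", and the function names are hypothetical.

```python
import re

# A letter followed by any string of letters and digits.
IDENT = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

# Assumed reading of the underscore variant: after the first letter, repeat
# either "underscore then letter/digit" or a single letter/digit.
IDENT_UNDERSCORE = re.compile(r"[a-zA-Z](?:_[a-zA-Z0-9]|[a-zA-Z0-9])*")

def is_identifier(text):
    return IDENT.fullmatch(text) is not None

def is_identifier_with_underscore(text):
    return IDENT_UNDERSCORE.fullmatch(text) is not None
```

`fullmatch` is used so that the whole string, not just a prefix, must belong to the language.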
18 Regular Expression → FA
- ground singletons become
- doubletons become
- (a ∪ b) becomes
[Figures: the automaton for each case; for (a ∪ b), two
branches labeled a and b joined by ε arrows]
19 FA → Reg. Exp.
- Replace 2 favorable states with 1.
- Label nodes 1 to n.
- Replace arrows from i to j and from j to k with an
arrow labeled l_{i,j} l_{j,k}
- If there is an arrow from j to j, use the label
l_{i,j} l_{j,j}* l_{j,k}
- Different arrows from i to j will be replaced by
l1 ∪ l2
20 Example
[Figure: an example FA and the expression it becomes]
21 Non-regular languages
- FA and regular languages can describe programs
using a fixed amount of memory regardless of
input. For loops that must count their iterations
you need non-regular languages.
- A loop in a program can be represented by a
regular language or an FA if we can insert or
remove iterations without changing the nature of
the program. Then the FA does not have to
remember the number of iterations needed.
- Example: L(a^n b^n) = {ab, a^2b^2, a^3b^3, ...} is
not a regular language because there MUST be n
iterations of a before accepting n iterations of b.
- If the program must remember the number of
iterations needed, it cannot be described by a
regular expression.
22 Pumping Lemma for Reg. Exp.
- It states that we can stuff iterations into the
program without making a difference to the
program
- For every regular language L there exists a constant
n such that every string w in L, where |w| ≥ n, can
be broken into 3 strings, w = xyz, with
- y ≠ ε
- |xy| ≤ n
- For all k ≥ 0 the string x y^k z is also in L
23 Steps in proof
- Assume it is regular.
- Find the defining property of the language
- Define a y such that |xy| ≤ n
- Find a k such that x y^k z destroys the property of
the language
- You have now found one example that is not in the
language
- Therefore the language cannot be regular
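The steps can be tried by brute force on L(a^n b^n). The sketch below is only an illustration for one chosen n, not a proof: it takes w = a^n b^n, tries every split w = xyz with y non-empty and |xy| at most n, and checks that pumping with k = 2 leaves the language. The function names are hypothetical.

```python
def in_anbn(w):
    """Membership in {a^n b^n : n >= 1}."""
    n = len(w) // 2
    return len(w) % 2 == 0 and n >= 1 and w == "a" * n + "b" * n

def every_split_fails(n):
    """True if no split of a^n b^n with |xy| at most n survives pumping with k = 2."""
    w = "a" * n + "b" * n
    for xy_len in range(1, n + 1):
        for x_len in range(xy_len):          # keeps y non-empty
            x, y, z = w[:x_len], w[x_len:xy_len], w[xy_len:]
            if in_anbn(x + y * 2 + z):       # x y^2 z still in the language?
                return False
    return True
```

Since |xy| is at most n, y consists only of a's, so repeating it adds a's without adding b's and the defining property is destroyed.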
24 Examples
- L(a^n b^n) is not regular
- L = {w = 1^p where p is a prime} is not regular
- Languages of even length palindromes, L(w w^R),
and odd length palindromes, L(w b w^R), are not
regular.
- L(a^n b c^m) is not regular when n and m must be
related (e.g. n = m); with independent n and m the
language is a*bc*, which is regular
- This is because we cannot introduce more
iterations without destroying the nature of the
language or the program.
25 Equivalence and Minimization of Automata
- p and q are equivalent if
- ∀w, δ(p, w) is an accepting state
- iff δ(q, w) is an accepting state
- If 2 states are not equivalent, then they are
distinguishable: for some input w, one reaches an
accepting state while the other does not.
- The idea is to find the FA (or RE) with the
minimal number of states equivalent to the one
being used.
- Equivalent automata are indistinguishable by the
time they get to an accepting state. That is, if
they both accept exactly the same inputs, they
are indistinguishable.
26 Definition of equivalent states
- States p and q are equivalent if for all input
strings w,
- δ(p, w) is an accepting state if and only if
- δ(q, w) is an accepting state.
- Any pair of states that are not distinguishable
as they proceed to be accepted are equivalent.
- Any state that is not accepting cannot be
equivalent to any accepting state.
- States that reach an accepting state on exactly the
same single inputs remain candidates for equivalence.
- States that also reach an accepting state on exactly
the same multiple-symbol inputs are equivalent.
27 Proof of inequivalency
- It is easier to show the states that are not
equivalent than the ones that are equivalent on a
very large FA. To prove they are equivalent all
inputs (0, 1, 00, 01, 11, 000, ...) must be
considered.
- The accepting state(s) is (are) not equivalent to
any non-accepting state
- Going back one step, then 2, then 3, ..., from the
accepting state(s), the states getting there on
different inputs are not equivalent.
- This can be accomplished with an inequivalence
table.
- Eliminate first all unreachable states.
- Start by treating all accepting states as
equivalent. All non-accepting states are
inequivalent to accepting states.
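An inequivalence table can be filled in mechanically. The sketch below uses the usual table-filling idea: first mark every accepting/non-accepting pair, then keep marking pairs whose successors on some input are already marked. The DFA in `EXAMPLE_DELTA` is hypothetical, and unreachable states are assumed to have been removed already.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, accepting):
    """Return the set of state pairs (as frozensets) that are NOT equivalent."""
    # Base case: an accepting state is never equivalent to a non-accepting one.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accepting) != (p[1] in accepting)}
    changed = True
    while changed:                      # work backwards, one input symbol at a time
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Hypothetical DFA: A and B behave identically, C is the only accepting state.
EXAMPLE_DELTA = {
    ("A", "0"): "B", ("A", "1"): "C",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}
PAIRS = distinguishable_pairs("ABC", "01", EXAMPLE_DELTA, {"C"})
```

Pairs left unmarked at the end, here only A and B, are equivalent and can be merged to minimize the automaton.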
28 Context-free Grammars
- Context-free languages are those that can be
recognized by a pushdown automaton (an FA with a
stack)
- Since the 1960s CFGs have been used to turn out
parsers automatically.
- They are used to describe document formats with
Document Type Definitions, DTDs, used in XML for
creating Web pages.
- Grammars can define languages through the use of
Parse Trees.
29 Example
- The language of palindromes cannot be represented
by REs, but it can be defined recursively as
- ε, 0, 1 are palindromes
- If w is a palindrome, so are 0w0 and 1w1
- The productions (or rules) are
- P → ε
- P → 0
- P → 1
- P → 0P0
- P → 1P1
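The five productions can be run as a generator: start from the three base cases and repeatedly wrap each string in a matching pair of symbols. This is a minimal sketch; the function name and the `max_len` cut-off are assumptions added so that the enumeration stops.

```python
def palindromes(max_len):
    """All strings derivable from P with length at most max_len."""
    result = set()
    # Base productions: P -> empty string, P -> 0, P -> 1
    frontier = {w for w in ("", "0", "1") if max_len >= len(w)}
    while frontier:
        result |= frontier
        # Recursive productions: P -> 0P0 and P -> 1P1
        frontier = {c + w + c for w in frontier for c in "01"
                    if max_len >= len(w) + 2}
    return result
```

For example, `sorted(palindromes(2))` gives `['', '0', '00', '1', '11']`.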
30 Definition of a CFG
- Set of symbols that form the strings of the
grammar are called terminals, Σ
- Set of variables are called non-terminals, NT
- A start symbol, S ∈ NT
- Set of productions or rules, R
- R ⊆ NT x (Σ ∪ NT)*
- which serve to derive the terminals from the
non-terminals
31 Example of a VERY limited language
From this, strings can be generated automatically,
but they may not make sense.
S → Np Vp
Np → N | Ap N | ε
Ap → Ap A | ε
Vp → V Np
A → ...
N → ...
V → ...
Given a sentence, we can verify if it is in the
language if we can find a sequence of derivations
using the rules. This is what a parser
does. Derivations can be left or right.
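A left derivation can be replayed step by step by always rewriting the leftmost non-terminal. In the sketch below the rules for S, Np, Ap and Vp follow the slide, while the words chosen for A, N and V are hypothetical placeholders, since the slide leaves them open.

```python
RULES = {
    "S": [["Np", "Vp"]],
    "Np": [["N"], ["Ap", "N"], []],   # [] stands for the empty string
    "Ap": [["Ap", "A"], []],
    "Vp": [["V", "Np"]],
    "A": [["big"]],                   # hypothetical lexicon
    "N": [["dog"], ["cat"]],          # hypothetical lexicon
    "V": [["sees"]],                  # hypothetical lexicon
}

def leftmost_derivation(choices):
    """Apply the chosen alternative to the leftmost non-terminal at each step."""
    form = ["S"]
    steps = [" ".join(form)]
    for choice in choices:
        idx = next(i for i, sym in enumerate(form) if sym in RULES)
        form[idx:idx + 1] = RULES[form[idx]][choice]
        steps.append(" ".join(form))
    return steps

STEPS = leftmost_derivation([0, 1, 0, 1, 0, 0, 0, 0, 0, 1])
```

The last entry of `STEPS` is the derived sentence "big dog sees cat"; a parser works in the opposite direction, searching for such a sequence of choices given the sentence.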
32 Parse trees
Parse trees are graphical representations of
derivations that do not differentiate between
left and right derivations.
33 Applications
1. Parsers were the first applications of CFGs.
2. The yacc command in Unix takes a CFG and generates
a parser that creates either a tree or a piece of
object code. It allows stating the precedence of
operators in expressions.
3. XML (Extensible Mark-up Language) is related to
HTML (Hypertext Markup Language), the language
with which Web pages are created; both can be
described with a DTD (Document Type Definition),
which is a CFG describing the tags allowed.
34 Pushdown Automata
An FA can easily be expressed as derivations, so
that any FA can be expressed as a CFG: every
δ(s, a) = s1 can be expressed as s → a s1.
However, there are non-regular, context-free
languages such as L(a^n b^n). If we add a stack to
an FA such that every time it reads a b it pops an
a, then it does not need to remember how many a's
there were.
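The stack idea for L(a^n b^n) can be sketched in a few lines: push every a, pop one a for every b, and accept if the stack is empty at the end. The function name is hypothetical and n is assumed to be at least 1, matching the earlier example ab, a^2b^2, a^3b^3.

```python
def accepts_anbn(word):
    """Recognize a^n b^n (n >= 1) with a stack instead of a counter."""
    stack = []
    seen_b = False
    for ch in word:
        if ch == "a":
            if seen_b:          # an a after a b breaks the pattern
                return False
            stack.append("a")   # push every a
        elif ch == "b":
            seen_b = True
            if not stack:       # more b's than a's
                return False
            stack.pop()         # each b pops one a
        else:
            return False
    return seen_b and not stack
```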
35 Mechanism of stack state transitions
- q0 is the state that represents a guess that we
are not yet in the middle. In state q0 we read
input symbols and push them onto the stack.
- At any time we guess that we have seen the
middle and go to state q1. Here the right part of w
will be on top of the stack and the left part on
the bottom.
- In state q1 we compare input symbols with the
symbol at the top of the stack. If they do not
match, the guess was wrong and this branch dies.
- If the input symbol matches the symbol on the
top of the stack, we pop it; we keep popping until
the stack is empty and enter an accepting state.
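The guessing can be simulated deterministically by trying every possible middle position in turn. The sketch below does this for even-length palindromes w w^R; the function name is hypothetical, and each loop iteration plays the role of one non-deterministic branch.

```python
def accepts_wwr(word):
    """Accept strings of the form w followed by w reversed."""
    for middle in range(len(word) + 1):      # one branch per guessed middle
        stack = list(word[:middle])          # state q0: push the first part
        matched = True
        for ch in word[middle:]:             # state q1: compare and pop
            if not stack or stack.pop() != ch:
                matched = False              # wrong guess: this branch dies
                break
        if matched and not stack:            # empty stack: accept
            return True
    return False
```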
36 Definition of Pushdown Automata
P = (Q, Σ, Γ, s0, a0, Δ, F) where
Q is a finite set of states
Σ is the input alphabet
Γ is the set of stack symbols
s0 ∈ Q is the initial state
a0 is the start symbol needed at the bottom of the
stack to get to a favorable state after the stack
has been emptied
Δ(s, a, X), with s ∈ Q, a ∈ Σ ∪ {ε}, X ∈ Γ, is the
transition relation with output (p, S), where p ∈ Q
and S is the string of symbols replacing X on top
of the stack
F ⊆ Q is the set of favorable states.
37 CFG → PDA
There is a PDA that accepts input strings by
empty stack rather than by a favorable state, and
it is described by P = (Q, Σ, Γ, s0, a0, Δ).
Theorem: Acceptance by empty stack and acceptance
by favorable state are equivalent: a language is
accepted by a PDA of one kind if and only if it is
accepted by a PDA of the other kind.
Theorem: Given any CFG, G, there exists an
algorithm that constructs a PDA, A, such that
L(A) = L(G)
38 PDA → CFG, RL ⊂ PDA = CFG
Theorem: Given any PDA, A, there exists an
algorithm that constructs a CFG, G, such that
L(G) = L(A)
Theorem: Every Regular Language is context free.
39 Languages that are not Context-Free
Take a derivation S ⇒ uAz ⇒ uvAyz ⇒ uvxyz.
Then it must be true that A ⇒ x and A ⇒ vAy.
Then we can derive further, getting
S ⇒ uAz ⇒ uvAyz ⇒ uv^2Ay^2z ⇒ uv^3Ay^3z ⇒ ...
⇒ uv^nAy^nz
Lemma: Let G = (Σ, NT, R, S) be a context-free
grammar. Then there exists a number n such that
any string w ∈ L(G) with length |w| ≥ n can be
written as w = uvxyz for some strings
u, v, x, y, z ∈ Σ*, such that
1. |v| > 0 or |y| > 0
2. |vxy| ≤ n
3. For any k ≥ 0, u v^k x y^k z ∈ L(G)
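The lemma can be explored by brute force on a language that is not named in the slides, the standard example a^n b^n c^n. The sketch below is an illustration for one chosen n rather than a proof: it tries every decomposition w = uvxyz of a^n b^n c^n that satisfies conditions 1 and 2 and checks that condition 3 fails for k = 0 or k = 2. The function names are hypothetical.

```python
def in_anbncn(w):
    """Membership in {a^n b^n c^n : n >= 0}."""
    n = len(w) // 3
    return len(w) % 3 == 0 and w == "a" * n + "b" * n + "c" * n

def no_decomposition_pumps(n):
    """True if every uvxyz split of a^n b^n c^n breaks condition 3."""
    w = "a" * n + "b" * n + "c" * n
    size = len(w)
    for i in range(size + 1):                  # u = w[:i]
        for j in range(i, size + 1):           # v = w[i:j]
            for k in range(j, size + 1):       # x = w[j:k]
                # y = w[k:m]; the bound on m enforces |vxy| at most n
                for m in range(k, min(size, i + n) + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:m], w[m:]
                    if not v and not y:        # condition 1 needs v or y non-empty
                        continue
                    if all(in_anbncn(u + v * t + x + y * t + z) for t in (0, 2)):
                        return False
    return True
```

Because vxy is at most n symbols long it can touch at most two of the three letters, so pumping always leaves the counts unequal.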