Title: 91.304 Foundations of Theoretical Computer Science
191.304 Foundations of (Theoretical) Computer
Science
- Chapter 1 Lecture Notes
- David Martin
- dm_at_cs.uml.edu
This work is licensed under the Creative Commons
Attribution-ShareAlike License. To view a copy of
this license, visit http://creativecommons.org/licenses/by-sa/2.0/
or send a letter to Creative
Commons, 559 Nathan Abbott Way, Stanford,
California 94305, USA.
2Chapter 1 Regular Languages
- Simple model of computation
- Input a string, and either accept or reject it
- Models a very simple type of function, a
predicate on strings: f : Σ* → {0,1}
- See example of a state-transition diagram
3Syntax of DFA
- A deterministic finite automaton (DFA) is a
5-tuple (Q,Σ,δ,q0,F) such that
- Q is a finite set of states
- Σ (sigma) is an alphabet
- δ : Q×Σ → Q (delta) is the transition function
- q0 ∈ Q (q naught) is the start state
- F ⊆ Q is the set of accepting states
- Usually these names are used, but others are
possible as long as the role is clear
4DFA syntax
- It is deterministic because for every input
(q,c), the next state is a uniquely determined
member of Q
- because the codomain of δ is Q
- Fix the previous example to fit these constraints
- The same example DFA, specified formally
5DFA computation
- This definition is different from but equivalent
to the one in the text
- Let M=(Q,Σ,δ,q0,F) be a DFA. We define the
extended transition function δ* : Q×Σ* → Q
inductively as follows. For all q ∈ Q,
δ*(q,ε) = q. If w ∈ Σ* and c ∈ Σ, let δ*(q,wc) =
δ(δ*(q,w),c)
- According to this definition, δ*(q,x) is the
state of the machine after starting in state q
and reading the entire string x
- See example
6Language recognized by DFA
- The language recognized by the DFA M is written
L(M) and defined as L(M) = { x ∈ Σ* | δ*(q0,x) ∈ F }
- Think of L() as an operator that turns a program
into the language it specifies
- We will use L() for other types of machines and
grammars too
7Example
- Let L2 = { x ∈ {0,1}* | the binary number x is a
multiple of 2 } and build a DFA M2 such that
L(M2) = L2
- Remember this means L(M2) ⊆ L2 and L2 ⊆ L(M2)
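The definitions of δ* and L(M) can be tried out directly. The sketch below is one possible encoding, not the one used in lecture: the class name `DFA`, the state names "even" and "odd", and the choice to read the empty string as the number 0 are all assumptions made for illustration.

```python
class DFA:
    """A DFA (Q, Sigma, delta, q0, F); delta is a dict keyed by (state, char)."""

    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = dict(delta)
        self.start = start
        self.accepting = set(accepting)
        # Determinism: delta must be a total function Q x Sigma -> Q.
        for q in self.states:
            for c in self.alphabet:
                assert self.delta[(q, c)] in self.states

    def delta_star(self, q, x):
        """Extended transition function: the state reached from q after reading x."""
        for c in x:  # unrolls delta*(q, wc) = delta(delta*(q, w), c)
            q = self.delta[(q, c)]
        return q

    def accepts(self, x):
        """x is in L(M) exactly when delta*(q0, x) is an accepting state."""
        return self.delta_star(self.start, x) in self.accepting


# Hypothetical M2: "even" means the bits read so far end in 0 (or nothing was read,
# which this sketch treats as the number 0).
M2 = DFA(
    states={"even", "odd"},
    alphabet={"0", "1"},
    delta={("even", "0"): "even", ("even", "1"): "odd",
           ("odd", "0"): "even", ("odd", "1"): "odd"},
    start="even",
    accepting={"even"},
)
```

Calling `M2.accepts("110")` follows the run even, odd, odd, even, so the string for the number 6 is accepted.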
8Definition of regular languages
- A language L is regular if there exists a DFA M
such that L = L(M)
- The class of regular languages over the alphabet
Σ is called REG and defined REG = { L ⊆ Σ* | L
is regular } = { L(M) | M is a DFA over Σ }
- Now we know 4 classes of languages: ∅, FIN, REG,
and ALL
9Problems
- For all k ≥ 1, let Ak = { 0^(kn) | n ≥ 0 }. Prove that
(∀ k ≥ 1) Ak ∈ REG
- Solution is a scheme, not a single DFA
- (Harder) Build a DFA for L3 = { x ∈ {0,1}* | the binary
number x is a multiple of 3 }
- Build a DFA for L4 = { x ∈ {a,b}* | x contains an odd
number of b's and an even number of a's }
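One way to approach the "multiple of 3" problem is to let each state remember the value read so far modulo 3. The sketch below is a possible solution scheme, not necessarily the one intended in lecture; the helper names `make_multiple_dfa` and `accepts` are made up here, and the empty string is treated as the number 0.

```python
def make_multiple_dfa(m):
    """Return (delta, start, accepting) for binary multiples of m (hypothetical helper)."""
    # State r means: the bits read so far, as a binary number, are congruent to r mod m.
    # Appending a bit c turns the value v into 2*v + c, so r becomes (2*r + c) mod m.
    delta = {(r, c): (2 * r + int(c)) % m for r in range(m) for c in "01"}
    return delta, 0, {0}


def accepts(dfa, x):
    """Run the DFA on x and report whether it ends in an accepting state."""
    delta, q, accepting = dfa
    for c in x:
        q = delta[(q, c)]
    return q in accepting


M3 = make_multiple_dfa(3)  # three states: 0, 1, 2
```

Like the A_k problem, this is a scheme: `make_multiple_dfa(m)` yields a different m-state machine for each m.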
10Measuring DFA complexity
- Suppose
- you have a DFA with states named 00000000 ..
11111111 (2^8 = 256 unique states)
- an LCD attached to the thing showing the current
state name
- Σ = { c } (for clock pulse)
- δ(q, c) = (q + 1) & 0xFF
- This is a simple counter machine: feed it clocks
and it counts upwards
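The counter machine is small enough to write down directly. This is a minimal sketch assuming states are stored as integers 0..255 and the single input character is the letter "c"; the names `run`, `state_name` and `table` are illustrative only.

```python
def delta(q, c):
    """Transition function of the 8-bit counter: one clock pulse advances the state."""
    assert c == "c" and 0 <= q <= 0xFF
    return (q + 1) & 0xFF  # wraps from 11111111 back to 00000000


def run(q, pulses):
    """Feed a string of clock pulses to the machine, starting in state q."""
    for c in pulses:
        q = delta(q, c)
    return q


def state_name(q):
    """The 8-character state name that the LCD would show."""
    return format(q, "08b")


# The same program written out as an explicit table: one entry per (state, input) pair.
table = {(q, "c"): delta(q, "c") for q in range(256)}
```

The one-line formula and the 256-entry table describe the same δ; only the formula is concise.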
11Measuring DFA complexity
- Time complexity
- A DFA always takes one transition per input
character
- So time complexity is not useful here
- Program complexity
- A DFA's program is (mostly) its δ
- The model specifies no particular programming
language for δ: it's just a table mapping
(state, input) pairs to (state) outputs
- Though it can sometimes be specified concisely,
as in δ(q, c) = (q + 1) & 0xFF
- Reprogram the clock for any permutation of {0,1}^8
and δ's table remains just as big
12Measuring DFA complexity
- Space complexity: the amount of memory used
- But a DFA has no extra memory; it only remembers
what state it is in
- Can't look back or forward
- So a DFA always uses the same amount of memory,
namely the amount of memory required to remember
what state it's in
- Needs to remember current element of Q
- Can write down that number in log2 |Q| bits
13DFAs as real computers
- Consider a 256 MB computer that takes a finite
input and produces a finite output
- Inputs: clock pulses, interrupts, hard drive,
keyboard, mouse, network, etc.
- Outputs: video, hard drive, network, etc.
- Can code everything in binary
- But DFA only accepts or rejects input
14Recognition model for functions
- Can still sort of be modeled by a DFA
- PC = { x#y | x,y ∈ {0,1}* and the input x
produces the output y }
- Note: the # character is just a separator
- DFA plays the role of equipment verifier
- Verifying correctness seems easier than computing
the output, but at least it's related
15Are DFAs reasonable?
- One issue is that the programs don't seem to
reflect much about the problem being solved
- If you can figure out how many bits of memory are
needed for the solution, then you can always
build a DFA based on that knowledge; could be
tedious and really large
- No difference in program complexity between
programs that use the same amount of memory means
DFAs don't help us see the difference between
programs very easily
- Neural nets??
16Are DFAs reasonable?
- Similarly: an 8-bit counter is structurally very
different than a 9-bit counter
- More memory needed ⇒ totally different δ program
needed
- Not very modular!
17Are DFAs reasonable?
- Another issue is that DFAs prefer the beginning
of their inputs to the end of their inputs
- L5 = { x ∈ {0,1}* | the fifth digit from the left
of x is 0 }
- L6 = { x ∈ {0,1}* | the fifth digit from the right
of x is 0 }
- DFAs know where the input begins but not where it
ends
18Is REG reasonable?
- We should be able to combine computations as
subroutines in simple ways
- logical OR (A ∪ B)
- logical AND (A ∩ B)
- concatenation (A · B) and star (A*)
- hard to prove!! motivation for NFA
- complement (A^c)
- reversal (A^R)
- All above are easy to do as logic circuits
- Will discuss further as closure under language
operations
19Nondeterministic Finite Automata
- Will relax two of these DFA rules
- Each (state, char) input must produce exactly one
(state) output
- Must consume one character in order to advance
state
- Example: L6 = Σ* bob Σ*
- See M6
- The NFA accepts the input if there exists any way
of reading the input that winds up in an
accepting state at the end of the string
- Otherwise it rejects the input
20NFAs
- Thus the NFA rejects the input if there doesn't
exist any way of reading the input that winds up
in an accepting state at the end of the string
- In other words, every way of reading the input
leads to a nonaccepting state
- Example M7
- L7 ?
[Diagram: NFA M7 with states 1, 2, 3 and arcs labeled a, b, c, ε, ε]
21Ways to think of NFAs
- NFAs want to accept inputs and will always take
the most advantageous alternative(s)
- Because they will accept if there exists any way
to get to an accepting state at the end of the
string
- The quickest way there may be just one of many
ways, but it doesn't matter
- http://www.chompchomp.com/frag05/frag05.01.a.htm
22Ways to think of NFAs
[Diagram: three arcs labeled a]
- fork() model
- Input string is in a variable
- fork() at every nondeterministic choice point
- subprocess 1 (parent) follows first transition
- subprocess 2 (child) follows second
- subprocess 3 (child) follows third (if any), etc.
- A process that can't follow any transition calls
exit() -- and gives up its ability to accept
- A process that makes it through the whole string
and is in an accepting state prints out ACCEPT
- A single ACCEPT is enough
23Syntax of DFA (repeat)
- A deterministic finite automaton (DFA) is a
5-tuple (Q,Σ,δ,q0,F) such that
- Q is a finite set of states
- Σ is an alphabet
- δ : Q×Σ → Q is the transition function
- q0 ∈ Q is the start state
- F ⊆ Q is the set of accepting states
- Usually these names are used, but others are
possible as long as the role is clear
24Syntax of NFA
- A nondeterministic finite automaton (NFA) is a
5-tuple (Q,Σ,δ,q0,F) such that
- Q is a finite set of states
- Σ is an alphabet
- δ : Q×(Σ ∪ {ε}) → P(Q) is the transition function
- q0 ∈ Q is the start state
- F ⊆ Q is the set of accepting states
- Usually these names are used, but others are
possible as long as the role is clear
25Syntax of NFA
- Definition: Σε = Σ ∪ {ε}
- We'll use this frequently enough
- Differences on state-transition diagram
- ?(1,a) 1 (not ?(1,a) 1)
- ?(1,?) 1, 2
- ?(3, c) 2, 3
- ?(2,a)
- ?(3,?) 3
[Diagram: Example M8, an NFA with states 1, 2, 3 and arcs labeled a, b, c, c, ε, ε]
26NFA computation
- This next definition is different from but
equivalent to the one in the text
- Book's definition may be easier to understand at
first, but that makes its version of Theorem 1.39
(subset construction) harder
- Goal: a function δ* : Q×Σ* → P(Q) where δ*(q,x) is
the set of all states reachable in the machine
after starting in state q and reading the entire
string x
- Then for an NFA M, we will define something like
L(M) = { x ∈ Σ* | δ*(q0,x) contains some
accepting state }
27NFA computation
- Let M=(Q,Σ,δ,q0,F) be an NFA. We define some
auxiliary functions
- E : Q → P(Q) by ("ε-closure")
- E(q) = { p ∈ Q | p is reachable from q by
following a chain of 0 or more ε
transitions }
- Although E takes elements of Q as input, we'll
also use it as a function that takes subsets of Q
as input (that is, elements of P(Q)). So E : P(Q)
→ P(Q) by
E(S) = the union of E(q) over all q ∈ S
In other words, given a set as input, just
process each element independently...
28NFA computation
- Thus E(q) is the set of all states you can get to
from q without reading any input
- In M8, E(3) = ?  E({2,1}) = ?
- We define a simple extension of δ that takes a
set of states as input
- δ : Q×Σε → P(Q) (this comes with the NFA)
- δ : P(Q)×Σε → P(Q) defined by
- δ(S,c) = the union of δ(q,c) over all q ∈ S
Again, given a set as input, just process each
element independently...
29NFA computation
- We have a function E() that follows ε-transitions
and an extension of δ that behaves like δ but takes
sets as input
- δ* : Q×Σ* → P(Q) is defined inductively: For all
q ∈ Q, δ*(q,ε) = E({q})
- If w ∈ Σ* and c ∈ Σ, let
- δ*(q,wc) = E(δ(δ*(q,w),c))
30NFA computation
- Finally, we define L(M) = { x ∈ Σ* | δ*(q0,x)
contains some accepting state }
= { x ∈ Σ* | δ*(q0,x) ∩ F ≠ ∅ }
- δ*(1,ac) = E(δ(δ*(1,a),c))
- δ*(1,a) = E(δ(δ*(1,ε),a))
- δ*(1,ε) = ?
- δ*(1,ac) = ?
31Question
- "How do I know when to follow ε transitions and
when not to?"
- If you're talking about δ, then don't--it's the
program itself. δ can express that "there is an
ε transition here" but you never go any further
than that one hop.
- If you're talking about δ*, then do--because it
includes E() as part of its definition, which is
there precisely in order to follow ε transitions
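The division of labor between δ (one hop), E (follow ε transitions) and δ* (both, repeatedly) can be sketched in a few lines. The NFA below is a made-up example for a* ∪ (ab)*, not one of the machines from the slides, and representing ε by the empty Python string is an assumption of this sketch.

```python
EPS = ""  # stands for the label ε on an arc


def E(delta, S):
    """ε-closure of a set of states S: follow a chain of 0 or more ε transitions."""
    closure, todo = set(S), list(S)
    while todo:
        q = todo.pop()
        for p in delta.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                todo.append(p)
    return closure


def delta_set(delta, S, c):
    """δ extended to sets: process each element of S independently (one hop only)."""
    result = set()
    for q in S:
        result |= delta.get((q, c), set())
    return result


def delta_star(delta, q, x):
    """δ*(q, ε) = E({q});  δ*(q, wc) = E(δ(δ*(q, w), c))."""
    current = E(delta, {q})
    for c in x:
        current = E(delta, delta_set(delta, current, c))
    return current


def accepts(nfa, x):
    """Accept when δ*(q0, x) contains some accepting state."""
    delta, q0, F = nfa
    return bool(delta_star(delta, q0, x) & F)


# Hypothetical NFA for a* ∪ (ab)*; a missing key means the empty set.
delta = {
    ("s", EPS): {"p", "r"},  # guess which half of the union to verify
    ("p", "a"): {"p"},
    ("r", "a"): {"t"},
    ("t", "b"): {"r"},
}
nfa = (delta, "s", {"p", "r"})
```

Note that `delta_set` never follows ε arcs on its own; only `E` does, which mirrors the answer given on the slide.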
32NFAs are good at union (or)
- L2 = { x ∈ {0,1}* | the binary number x is a multiple
of 2 }
- L3 = { x ∈ {0,1}* | the binary number x is a multiple
of 3 }
- Let A = L2 ∪ L3
- NFA for A using guess-and-verify strategy
- Preview of Theorem 1.45
33The Subset Construction
- Theorem 1.39: For every NFA M1 there exists a DFA
M2 such that L(M1) = L(M2)
- Proof idea: Well, how does fork() work on a
uniprocessor machine?
34The Subset Construction
- Proof: Let M1=(Q1,Σ,δ1,init1,F1) be the NFA and
define the DFA M2=(Q2,Σ,δ2,init2,F2) as follows
- Q2 = P(Q1).
- Each state of the DFA records the set of states
that the NFA can simultaneously be in
- Can compare DFA states for equality but also look
"inside" the state name to find a set of NFA
state names
- Define δ2 : Q2×Σ → Q2 (that is, δ2 : P(Q1)×Σ →
P(Q1)) by
- δ2(S,a) = E1(δ1(S,a)): Go to whatever states
are reachable from the states in S and reading
the character a
Remember, in an NFA: δ1 : Q1×Σε → P(Q1) (from
def), δ1 : P(Q1)×Σε → P(Q1) (extend to sets), E1 : P(Q1)
→ P(Q1) (ε-closure)
35The Subset Construction
- init2 = E(init1)
- F2 = { q ∈ Q2 | q ∩ F1 ≠ ∅ }, in other words F2 = { S ⊆
Q1 | S ∩ F1 ≠ ∅ }
- The effect is that the DFA knows all states that
are reachable in the NFA after reading the string
so far. If any one of them is accepting, then
the current DFA state is accepting too, otherwise
it's not.
- If you believe this then that's all it takes to
see that the construction is correct. So,
convince yourself with an example. QED
36Subset construction example
- Q2 = { ∅,{1},{2},{3},{1,2},{1,3},{2,3},{1,2,3} }
- (On board)
- init2 = {1,2,3}
- F2 = { {3},{1,3},{2,3},{1,2,3} }
[Diagram: Example M8 (think of this as M1 in the construction), with states 1, 2, 3 and arcs labeled a, b, c, c, ε, ε]
37Be methodical
- Need to compute δ2({1,2,3},c) =
E1(δ1({1,2,3},c))
- By definition, δ1({1,2,3},c) = δ1(1,c) ∪ δ1(2,c) ∪
δ1(3,c) = {2,3}
- Then take E1({2,3}) = {2,3}
- Save intermediate results for reuse
- It's OK to eliminate unreachable states in
practice, even though that's not what the
construction really does
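The construction can be carried out mechanically. The sketch below builds only the reachable subsets (the shortcut mentioned on this slide, so it is not literally Q2 = P(Q1)), caches every δ2 entry as it goes, and checks the result against a direct NFA simulation. The example NFA for a* ∪ (ab)* and all function names are assumptions made for illustration.

```python
from itertools import product

EPS = ""  # stands for ε


def E(delta, S):
    """ε-closure of the set S, returned as a frozenset so it can name a DFA state."""
    closure, todo = set(S), list(S)
    while todo:
        q = todo.pop()
        for p in delta.get((q, EPS), set()) - closure:
            closure.add(p)
            todo.append(p)
    return frozenset(closure)


def step(delta1, S, a):
    """δ2(S, a) = E1(δ1(S, a))."""
    moved = set()
    for q in S:
        moved |= delta1.get((q, a), set())
    return E(delta1, moved)


def subset_construction(delta1, init1, F1, alphabet):
    """Build the reachable part of the DFA M2 from the NFA M1."""
    init2 = E(delta1, {init1})
    delta2, todo, seen = {}, [init2], {init2}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = step(delta1, S, a)
            delta2[(S, a)] = T  # saved so each table entry is computed once
            if T not in seen:
                seen.add(T)
                todo.append(T)
    F2 = {S for S in seen if S & F1}
    return delta2, init2, F2


def dfa_accepts(dfa, x):
    delta2, S, F2 = dfa
    for a in x:
        S = delta2[(S, a)]
    return S in F2


def nfa_accepts(delta1, init1, F1, x):
    S = E(delta1, {init1})
    for a in x:
        S = step(delta1, S, a)
    return bool(S & F1)


def all_strings(alphabet, max_len):
    """Every string over the alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


# Hypothetical NFA M1 for a* ∪ (ab)*; a missing key means the empty set.
delta1 = {("s", EPS): {"p", "r"}, ("p", "a"): {"p"},
          ("r", "a"): {"t"}, ("t", "b"): {"r"}}
F1 = {"p", "r"}
dfa = subset_construction(delta1, "s", F1, "ab")
```

Using frozensets as DFA state names is what lets the code both compare states for equality and look "inside" a state to find the NFA states it records.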
38Subset construction conclusion
- Adding nondeterminism makes programs shorter but
not able to do new things
- Remember: regular languages are defined to be
those "recognized by a DFA"
- We now have a result that says that every
language that is recognized by an NFA is regular
too
- So if you are asked to show that a language is
regular, you can exhibit a DFA or NFA for it and
rely on the subset construction theorem
- Sometimes questions are specifically about DFAs
or NFAs, though... pay attention to the precise
wording
39More NFA examples
- Write an NFA for {ab,abc} with 3 states
- NFA and DFA for {ε} over Σ={0,1}
- Rule: ε ∈ L(M) ⇔ ?
- NFA and DFA for ∅ over Σ={0,1}
40Closure properties
- The presence or absence of closure properties
says something about how well a set tolerates an
operation
- Definition. Let S ⊆ U be a set in some universe
U and ⊙ be an operation on elements of U. We say
that S is closed under ⊙ if applying ⊙ to
element(s) of S produces another element of S.
- For example, if ⊙ is a binary operation U×U → U,
then we're saying that (∀ x ∈ S and y ∈ S) x ⊙ y ∈ S
41Closure properties illustrated
[Diagram: a set S drawn inside the universe U]
Applying the operation to elements of S never
takes you outside of S. S is closed with respect
to the operation. This example shows unary operations.
42Closure properties
- Having a closure property usually means there is
some type of "natural fit" between the operation
and the set
- Examples
- N is closed under + and × but not − and ÷
- Z is closed under + and − and × and unary −
(negation) but not ÷ or √
- Q−{0} is closed under × and ÷ but not + or −
43More examples
- L1 = { x ∈ {0,1}* | |x| is a multiple of 3 }
- is closed under string reversal and concatenation
- L3 = { x ∈ {0,1}* | the binary number x is a multiple
of 3 }
- is also closed under string reversal and
concatenation, harder to see though
- L4 = { x ∈ {a,b}* | x contains an odd number of b's and an
even number of a's }
- is closed under string reversal
- is not closed under string concatenation
44Closure higher abstraction
- We will usually be concerned with closure of
language classes under language operations
- Previous examples were closure of sets containing
non-set elements under various familiar
operations
- We consider DFAs and NFAs to be programs and we
want assurance that their outputs can be combined
in desired ways just by manipulating their
programs (like using one as a subroutine for the
other)
- Representative question: is REG closed under
(language) concatenation?
45The regular operations
- The regular operations on languages are
- ∪ (union)
- · (concatenation)
- * (Kleene star)
- The name "regular operations" is not that
important
- Too bad we use the word "regular" for so much
- REG is closed under these regular operations
- That's why they're called "regular" operations
- This does not mean that each regular language is
closed under each of these operations!
46The regular operations
- REG is closed under union: Theorem 1.25 (using
DFAs), Theorem 1.45 (using NFAs)
- REG is closed under concatenation: Theorem 1.47
(NFAs)
- REG is closed under *: Theorem 1.49 (NFAs)
- Study these constructions!!
- REG is also closed under complement and reversal
(not in book)
47Regular expressions
- You are probably familiar with these
- Example: "int .*\(.*\)" is a (flex format)
regular expression that appears to match C
function prototypes that return ints
- In our treatment, a regular expression is a
program that generates a language of matching
strings when you "run it"
- We will use a very compact definition that
simplifies things later
48Regular expressions
- Definition. Let Σ be an alphabet not containing
any of the special characters in this list: ε ∅
) ( ∪ · * We define the syntax of the
(programming) language REX(Σ), abbreviated as
REX, inductively
- Base cases
- For all a ∈ Σ, a ∈ REX. In other words, each single
character from Σ is a regular expression all by
itself.
- ε ∈ REX. In other words, the literal symbol ε is a
regular expression. In this context it is not
the empty string but rather the single-character
name for the empty string.
- ∅ ∈ REX. Similarly, the literal symbol ∅ is a
regular expression.
49Regular expressions
- Definition continued
- Induction cases
- For all r1, r2 ∈ REX, ( r1 ∪ r2 ) ∈ REX
also
- For all r1, r2 ∈ REX, ( r1 · r2 ) ∈ REX also
(the parentheses, ∪ and · are literal symbols;
r1 and r2 are variables)
50Regular expressions
- Definition continued
- Induction cases continued
- For all r ∈ REX, ( r* ) ∈ REX also
- Examples over Σ={0,1}
- ε and 0 and 1 and ∅
- (((1·0)·(ε ∪ ∅))*)
- εε is not a regular expression
- Remember, in the context of regular expressions,
ε and ∅ are ordinary characters
51Semantics of regular expressions
- Definition. We define the meaning of the
language REX(Σ) inductively using the L()
operator so that L(r) denotes the language
generated by r as follows
- Base cases
- For all a ∈ Σ, L(a) = { a }. A single-character
regular expression generates the corresponding
single-character string.
- L(ε) = { ε }. The symbol for the empty string
actually generates the empty string.
- L(∅) = ∅. The symbol for the empty language
actually generates the empty language.
52Regular expressions
- Definition continued
- Induction cases
- For all r1, r2 ∈ REX, L( (r1 ∪ r2) ) = L(r1) ∪
L(r2)
- For all r1, r2 ∈ REX, L( (r1 · r2) ) = L(r1) ·
L(r2)
- For all r ∈ REX, L( ( r* ) ) = (L(r))*
- No other string is in REX(Σ)
- Example
- L( ( ((1·0)·(ε ∪ ∅))* ) ) includes
- ε,10,1010,101010,10101010,...
53Orientation
- We used highly flexible mathematical notation and
state-transition diagrams to specify DFAs and
NFAs
- Now we have a precise programming language REX
that generates languages
- REX is designed to close the simplest languages
under ∪, ·, *
54Abbreviations
- Instead of parentheses, we use precedence to
indicate grouping when possible.
- * (highest)
- ·
- ∪ (lowest)
- Instead of ·, we just write elements next to
each other
- Example: (((1·0)·(ε ∪ ∅))*) can be written as
(10(ε ∪ ∅))* but there is no further abbreviation
- (Not in text) If r ∈ REX(Σ), instead of writing
rr*, we write r+
55Abbreviations
- Instead of writing a union of all characters from
Σ together to mean "any character", we just write
Σ
- In a flex/grep regular expression this would be
called "."
- Instead of writing L(r) when r is a regular
expression, we consider r alone to simultaneously
mean both the expression r and the language it
generates, relying on context to disambiguate
56Abbreviations
- Caution: regular expressions are strings
(programs). They are equal only when they
contain exactly the same sequence of characters.
- (((1·0)·(ε ∪ ∅))*) can be abbreviated (10(ε ∪ ∅))*
- however (((1·0)·(ε ∪ ∅))*) ≠ (10(ε ∪ ∅))* as strings
- but (((1·0)·(ε ∪ ∅))*) = (10(ε ∪ ∅))* when they are
considered to be the generated languages
- more accurately then, L( (((1·0)·(ε ∪ ∅))*) ) =
L( (10(ε ∪ ∅))* ) = L( (10)* )
57Facts
- REX(Σ) is itself a language over an alphabet Γ
that is
- Γ = Σ ∪ { ) , ( , ∪ , · , * , ε , ∅ }
- For every Σ, |REX(Σ)| = ∞
- ∅,(∅*),((∅*)*),...
- even without knowing Σ there are infinitely many
elements in REX(Σ)
- Question: Can we find a DFA or NFA M with L(M) =
REX(Σ)?
58Examples
- Find a regular expression for { w ∈ {0,1}* | w ≠
10 }
- Find a regular expression for { x ∈ {0,1}* | the
6th digit counting from the rightmost
character of x is 1 }
- Find a regular expression for L3 = { x ∈ {0,1}* | the
binary number x is a multiple of 3 }
59The DFA for L3
[Diagram: the DFA for L3 with states 0, 1, 2 and arcs labeled 0 and 1; annotation on the diagram: (0 1* 0)*]
Regular expression: (0 ∪ 1 _____________ 1)*
60Regular expression for L3
- (0 ∪ 1 (0 1* 0)* 1)*
- L3 is closed under concatenation, because of the
overall form ( )*
- Now suppose x ∈ L3. Is x^R ∈ L3?
- Yes: see this by reversing the regular
expression and observing that the same regular
expression results
- So L3 is also closed under reversal
- Lemma 1.55: For every regular expression r, L(r)
is a regular language.
- Proof by induction on regular expressions.
- We used induction to create all of the regular
expressions and then to define their languages,
so we can use induction to visit each one and
prove a property about it
62L(REX) ⊆ REG
- Base cases
- For every a ∈ Σ, L(a) = { a } is obviously
regular
- L(ε) = { ε } ∈ REG also
- L(∅) = ∅ ∈ REG
[Diagram: automaton with an arc labeled a]
63L(REX) ⊆ REG
- Induction cases
- Suppose the induction hypothesis holds for r1 and
r2. Namely, L(r1) ∈ REG and L(r2) ∈ REG. We
want to show that L( (r1 ∪ r2) ) ∈ REG also. But
look: by definition, L( (r1 ∪ r2) ) = L(r1) ∪
L(r2)
- Since both of these languages are regular, we
can apply Theorem 1.45 (closure of REG under ∪)
to conclude that their union is regular.
64L(REX) ⊆ REG
- Induction cases
- Now suppose L(r1) ∈ REG and L(r2) ∈ REG. By
definition, L( (r1 · r2) ) = L(r1) · L(r2)
- By Theorem 1.47, this concatenation is regular
too.
- Finally, suppose L(r) ∈ REG. Then by
definition, L( (r*) ) = (L(r))*
- By Theorem 1.49, this language is also regular.
QED
65On to REG ⊆ L(REX)
- Now we'll show that each regular language (one
accepted by an automaton) also can be described
by a regular expression
- Hence REG = L(REX)
- In other words, regular expressions are
equivalent in power to finite automata
- This equivalence is called Kleene's Theorem (1.54
in book)
66Converting DFAs to REX
- Lemma 1.60 in textbook
- This approach uses yet another form of finite
automaton called a GNFA (generalized NFA)
- The technique is easier to understand by working
an example than by studying the proof
67Syntax of GNFA
- A generalized NFA is a 5-tuple (Q,Σ,δ,qs,qa) such
that
- Q is a finite set of states
- Σ is an alphabet
- δ : (Q-{qa})×(Q-{qs}) → REX(Σ) is the transition
function
- qs ∈ Q is the start state
- qa ∈ Q is the (one) accepting state
68GNFA syntax summary
- Arcs are labeled with regular expressions
- Meaning is that "input matching the label moves
from old state to new state" -- just like NFA,
but not just a single character at a time
- Start state has no incoming transitions, accept
has no outgoing
- Every pair of states (except start & accept) has
two arcs between them
- Every state has a self-loop (except start &
accept)
69Construction strategy
- Will convert a DFA into a GNFA then iteratively
shrink the GNFA until we end up with a diagram
like this, meaning that exactly that input
that matches the giant regular expression is in
the language
[Diagram: start state qs with a single arc labeled "giant regular expression" leading to accepting state qa]
70Converting DFA to GNFA
[Diagram: the DFA for L3 (states 0, 1, 2; arcs labeled 0 and 1) next to the GNFA built from it, which adds the states qs and qa and arcs labeled ε]
Adding new start state qs is straightforward. Then
make each DFA accepting state have an ε
transition to the single accepting state qa
71Interpreting arcs
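- δ : (Q-{qa})×(Q-{qs}) → REX(Σ). In this diagram,
- δ(0,1) = 1, δ(2,0) = ∅, δ(2,qa) = ∅
- δ(1,1) = ∅, δ(2,2) = 1, δ(0,qa) = ε
[Diagram: the GNFA from the previous slide, with states qs, 0, 1, 2, qa and arcs labeled 0, 1 and ε]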
72Eliminating a GNFA state
- We arbitrarily choose an interior state (not qs
or qa) to rip out of the machine
Question: how is the ability of state i to get to
state j affected when we remove rip? Only the
solid and labeled states and transitions are
relevant to that question
[Diagram: states i, rip, j with arcs i→rip labeled R1, rip→rip labeled R2, rip→j labeled R3, and i→j labeled R4]
73Eliminating a GNFA state
- We produce a new GNFA that omits rip
- Its i-to-j label will compensate for the missing
state
- We will do this for every (i,j) ∈
(Q-{qa})×(Q-{qs})
- So we have to rewrite every label in order to
eliminate this one state
- New label for i-to-j is
- R4 ∪ (R1 · (R2)* · R3)
[Diagram: states i, rip, j with arcs i→rip labeled R1, rip→rip labeled R2, rip→j labeled R3, and i→j labeled R4]
74Don't overlook
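- The case (i,i) ∈ (Q-{qa})×(Q-{qs})
- New label for i-to-i is still
- R4 ∪ (R1 · (R2)* · R3)
- Example proceeds on whiteboard, or see textbook
for a different one
[Diagram: states i and rip with arcs i→rip labeled R1, rip→rip labeled R2, rip→i labeled R3, and i→i labeled R4]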
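The relabeling rule R4 ∪ (R1 · (R2)* · R3) is mechanical enough to run. The sketch below applies it to the three-state "multiple of 3" DFA and checks the resulting expression with Python's `re` module. Labels are kept as Python regex strings with `None` standing for ∅ and the empty string for ε; these encodings, the helper names, and the rip order are assumptions of this sketch rather than anything fixed by the construction.

```python
import re
from itertools import product


def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return "(?:%s|%s)" % (r, s)


def concat(r, s):
    if r is None or s is None:
        return None  # ∅ concatenated with anything is ∅
    return r + s


def star(r):
    if r is None or r == "":
        return ""  # ∅* = ε* = ε
    return "(?:%s)*" % r


def rip_state(label, states, rip):
    """Remove state rip; every remaining pair (i, j) gets R4 ∪ R1 (R2)* R3."""
    rest = [q for q in states if q != rip]
    new = {}
    for i in rest:
        for j in rest:
            r1, r2 = label.get((i, rip)), label.get((rip, rip))
            r3, r4 = label.get((rip, j)), label.get((i, j))
            new[(i, j)] = union(r4, concat(concat(r1, star(r2)), r3))
    return new, rest


def dfa_to_regex(delta, states, start, accepting):
    """Convert a DFA to a GNFA with new states "qs" and "qa", then rip out every DFA state."""
    label = {("qs", start): ""}
    for q in accepting:
        label[(q, "qa")] = ""
    for (q, c), p in delta.items():
        label[(q, p)] = union(label.get((q, p)), c)
    all_states = ["qs"] + list(states) + ["qa"]
    for rip in states:
        label, all_states = rip_state(label, all_states, rip)
    return label[("qs", "qa")]  # None would mean the language is empty


# The mod-3 DFA: state r is the value read so far modulo 3.
delta = {(r, c): (2 * r + int(c)) % 3 for r in range(3) for c in "01"}
regex = dfa_to_regex(delta, [0, 1, 2], 0, {0})
pattern = re.compile(regex)
```

The expression produced this way is usually much longer than a hand-simplified one; ripping the states in a different order gives a different but equivalent expression.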
75g/re/p
- What does grep do?
- (int|float)_rec.*emp becomes
- (Σ*)(int ∪ float)_rec(Σ*)emp(Σ*)
- What does it mean?
- How does it work?
- Regular expression → NFA → DFA → state reduction
- Then run DFA against each line of input, printing
out the lines that it accepts
76State machines
- Very common programming technique
    while (true) {
        switch (state) {
        case NEW_CONNECTION:
            process_login();
            state = RECEIVE_CMD;
            break;
        case RECEIVE_CMD:
            if (process_cmd() == CMD_QUIT)
                state = SHUTDOWN;
            break;
        case SHUTDOWN:
            ...
        }
    }
77This course so far
- 1.1 Introduction to languages & DFAs
- 1.2 NFAs and DFAs recognize the same class of
languages
- 1.3 REX generates the same class of languages
- Three different programming "languages" specified
in different levels of formality that solve the
same types of computational problems
- Four, if you count GNFAs
- Five, if you count UFAs
78Strategies
- If you're investigating a property of regular
languages, then as soon as you know L ∈ REG, you
know there are DFAs, NFAs, Regexes that describe
it. Use whatever representation is convenient
- But sometimes you're investigating the properties
of the programs themselves: changing states,
adding a * to a regex, etc. Then the knowledge
that other representations exist might be
relevant and might not
79All finite languages are regular
- Theorem (not in book): FIN ⊆ REG
- Proof: Suppose L ∈ FIN.
- Then either L = ∅, or L = { s1, s2, ..., sn } where
n ∈ N and each si ∈ Σ*.
- A regular expression describing L is, therefore,
either ∅ or
- s1 ∪ s2 ∪ ... ∪ sn QED
- Note that this proof does not work for n = ∞
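The proof is constructive, so it can be turned into a short function. This is a sketch in Python's `re` syntax rather than in REX: `|` plays the role of ∪, and the never-matching pattern `(?!)` is used here as a stand-in for the symbol ∅. The function name and the example language are made up for illustration.

```python
import re


def finite_language_regex(strings):
    """Return a Python regex for a finite language: s1 | s2 | ... | sn, or a stand-in for ∅."""
    if not strings:
        return r"(?!)"  # an empty negative lookahead never matches: the empty language
    # re.escape keeps characters such as "(" or "*" inside a string literal.
    return "|".join(re.escape(s) for s in sorted(strings))


L = {"", "01", "0011"}
pattern = re.compile(finite_language_regex(L))
```

Here `pattern.fullmatch(s)` succeeds exactly for the three strings in `L`. The function needs the whole list of strings up front, which is the same reason the proof does not extend to infinite languages.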
80Picture so far
[Venn diagram: FIN inside REG inside ALL. Each point is a language in this Venn diagram.
REG = L(DFA) = L(NFA) = L(REX) = L(UFA) = L(GNFA) ⊋ FIN
L(DFA) is "the class of languages generated by DFAs"
In the region of ALL outside REG: is there a language out here?]
811.4 Nonregular languages
- For each possible language L,
- ∅ ⊆ L. So ∅ is the smallest language. And ∅ is
regular
- L ⊆ Σ*. So Σ* is the largest language. And Σ* is
regular
- Yet there are languages in between these two
extremes that are not regular
82A nonregular language
- B = { 0^n 1^n | n ≥ 0 }
- = { ε, 01, 0011, 000111, ... }
- is not regular
- Why?
- Q: how many bits of memory would a DFA need in
order to recognize B?
- A: there appears to be no single number of bits
that's big enough to work for every element of B
- Remember, the DFA needs to reject all strings
that are not in B
83Other examples
- C = { w ∈ {0,1}* | n0(w) = n1(w) }
- Needs to count a potentially unbounded number of
'0's... so nonregular
- D = { w ∈ {0,1}* | n01(w) = n10(w) }
- Needs to count a potentially unbounded number of
'01' substrings... so ??
- Need a technique for establishing nonregularity
that is more formal and... less intuitive?
84Proving nonregularity
- To prove that a language is
nonregular, you have to show that no DFA
whatsoever recognizes the language
- Not just the DFA that is your best effort at
recognizing the language
- The pumping lemma can be used to do that
- The pumping lemma says that every regular
language satisfies the "regular pumping property"
(RPP)
- Given this, if we can show that a language like B
doesn't satisfy the RPP, then it's not regular
85Pumping lemma, informally
- Roughly: "if a regular language contains any
'long' strings, then it contains infinitely many
strings"
- Start with a regular language and suppose that
some DFA M=(Q,Σ,δ,q0,F) for it has |Q| = 10 states.
- What if M accepts some particular string s where
s = c1c2...c15 so that |s| = 15?
[Diagram: start state q0]
86Pigeonhole principle
- With 15 input characters, the machine will visit
at most 16 states
- But there are only 10 states in this machine
- So clearly it will visit at least one of its
states more than once
- Let rpt be our name for the first state that is
visited multiple times on that particular input s
- Let acc be our name for the accepting state that
s leads to, namely, δ*(q0,s) = acc
- Let y be our name for the leftmost substring of s
for which δ*(rpt, y) = rpt
- Since there are no ε transitions in a DFA, a
state being "visited multiple times" means that
it read at least one character. Therefore, |y| >
0
87sequence of states that M visits after
reading the characters below
[Diagram annotations: the repeated segment has length > 0 and is completed within the first 10 characters]
After reading c1...c10 (first 10 chars of s), M
must have already been to state rpt and returned
to it at least once... because there are only 10
states in M. Of course the repetition could have
been encountered earlier than 10 characters too...
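88sequence of states that M visits after
reading the characters below
[Diagram annotations: the repeated segment has length > 0 and is completed within the first 10 characters]
Assigning new names to the pieces of s...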
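89sequence of states that M visits after
reading the characters below
[Diagram annotations: the repeated segment has length > 0 and is completed within the first 10 characters]
Assigning new names to the pieces of s... So s =
xyz as shown above. With these names, the other
constraints can be written |y| > 0 and |xy| ≤ 10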
90M accepts other strings too
91M accepts other strings too
- Consider the string xz
- δ*(q0,x) = rpt
- δ*(rpt,z) = acc (from previous slide)
- So xz ∈ L(M) too
92M accepts other strings too
- Consider the string xyyz
- δ*(q0,xy) = rpt (from 2 slides ago)
- δ*(rpt,y) = rpt (from same previous result)
- δ*(rpt,z) = acc (from same previous result)
- So xyyz ∈ L(M) also
- Apparently we can repeat y as many times as we
want
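The pigeonhole argument is effectively an algorithm for finding x, y and z. The sketch below runs a DFA on s, stops at the first state that is visited a second time, and splits s there; it then confirms that pumping y keeps the string accepted. The three-state "multiple of 3" DFA stands in for the ten-state M of the example, and the function names are illustrative.

```python
def pump_decomposition(delta, q0, s):
    """Split s = x y z at the first repeated state (rpt) on the run, or return None."""
    seen = {q0: 0}  # state -> number of characters read when it was first visited
    q = q0
    for n, c in enumerate(s, start=1):
        q = delta[(q, c)]
        if q in seen:  # q is rpt: visited after seen[q] characters and again after n
            start = seen[q]
            return s[:start], s[start:n], s[n:]
        seen[q] = n
    return None  # s is shorter than the number of states and no state repeated


def run(delta, q, s):
    """The extended transition function: the state reached from q after reading s."""
    for c in s:
        q = delta[(q, c)]
    return q


# Stand-in machine: state r is the value read so far modulo 3; start and accept state 0.
delta = {(r, c): (2 * r + int(c)) % 3 for r in range(3) for c in "01"}
x, y, z = pump_decomposition(delta, 0, "1001")  # "1001" is 9, a multiple of 3
```

For this input the run is 0, 1, 2, 1, 0, so rpt is state 1, and the split is x = "1", y = "00", z = "1". Both xz = "11" and xyyz = "100001" are again multiples of 3.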
93p-regular-pumpable strings
- Definition (not in textbook): A string s is said
to be p-regular-pumpable in a language L ⊆ Σ* if
there exist x,y,z ∈ Σ* such that
- (1) s = xyz ("x,y,z are a decomposition of s")
- (2) |y| > 0
- (3) |xy| ≤ p
- (4) For all i ≥ 0,
- x y^i z ∈ L ("the y part of s can be pumped
to produce other strings in the language")
- It follows that s must be a member of L for it to
be p-pumpable
- The 15-character string s in the previous example
was 10-pumpable in L(M)
94p-regular-pumpable languages
- Definition: A language L is p-regular-pumpable if
- for every s ∈ L such that |s| ≥ p, the string s
is p-pumpable in L
- in other words, "every long enough string in L is
pumpable"
- Our previous example language was
15-regular-pumpable
95RPP(p) and RPP
- Definition: RPP(p) is the class of languages that
are p-regular-pumpable. In other words, RPP(p) =
{ L ⊆ Σ* | L is p-regular-pumpable }
- Definition: RPP is the class of languages that are
p-regular-pumpable for some p. In other
words, RPP is the union of RPP(p) over all p ≥ 0
- Lots of notation and apparent complexity, but the
idea is simple: RPP is the class of languages in
which every long string is pumpable
96Pumping lemma
- Theorem 1.70 (rephrased): If L ⊆ Σ* is recognized
by a p-state DFA, then L ∈ RPP(p)
- Proof: Just like our example, but use p instead of
the constant 10 (number of states)
- Corollaries
- REG ⊆ RPP
- L ∉ RPP ⇒ L ∉ REG (primary application of Pumping Lemma)
97Proving a language nonregular
- First unravel these definitions, but it amounts
to proving that L is not a member of RPP. Then
it follows that L isn't regular
- Proving that L isn't in RPP allows you to
concentrate on the language rather than
considering all possible proposed programs that
might recognize it
98Unraveling RPP a direct rephrasing
- Rephrasing: L is a member of RPP if
- There exists p ≥ 0 such that
- For every s ∈ L satisfying |s| ≥ p,
- There exist x,y,z ∈ Σ* such that
- (1) s = xyz
- (2) |y| > 0
- (3) |xy| ≤ p
- (4) For all i ≥ 0,
- x y^i z ∈ L
(∃ p) (∀ s) (∃ x,y,z) (∀ i) !!! Pretty complicated
99Question from last time
- (Question) Didn't you earlier say "regular
languages are closed under concatenation"?
- (Answer) No, I wrote that REG is closed under
concatenation
- Subtle but important distinction. REG (the class
of all regular languages) is closed under
language concatenation
- If A,B ∈ REG then A·B ∈ REG
- That does not mean that each regular language is
itself closed under string concatenation
- {10, 1} ∈ REG but 101 ∉ {10, 1}
100Nonregularity proof by contradiction
- Claim: Let B = { 0^n 1^n | n ≥ 0 }. Then B is not
regular
- Proof: We show that B is not a member of RPP by
contradiction.
- So assume that B ∈ RPP (and hope to reach a
contradiction soon). Then there exists p ≥ 0
associated with the definition in RPP.
- We let s = 0^p 1^p. (Not the exact same variable
as in the RPP property, but an example of one
such possible setting of it.) Now we know that s
∈ B because it has the right form.
101Proof continued
- Now |s| = 2p ≥ p. By assumption that B ∈ RPP,
there exist x,y,z such that
- (1) s = xyz (= 0^p 1^p, remember)
- (2) |y| > 0
- (3) |xy| ≤ p
- (4) For all i ≥ 0,
- x y^i z ∈ B
- Part (3) implies that xy ∈ 0* because the first
p-many characters of s = xyz are all 0
- So y consists solely of '0' characters
- ... at least one of them, according to (2)
102Proof continued
- But consider
- s = xyz = xy^1z = 0^p 1^p (where we started)
- y consists of one or more '0' characters
- so xy^2z contains more '0' characters than '1'
characters. In other words,
- xy^2z = 0^(p+|y|) 1^p
- so xy^2z ∉ B = { 0^n 1^n | n ≥ 0 }.
- This contradicts part (4)!!
- Since the contradiction followed merely from the
assumption that B ∈ RPP (and right and meet and
true reasoning about which we have no doubt),
that assumption must be wrong. QED
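The case analysis in this proof can be replayed by brute force for small p. The sketch below enumerates every decomposition of s = 0^p 1^p that satisfies (1)-(3) and confirms that pumping with i = 2 always leaves B. It only covers the values of p it is run on, so it illustrates the proof rather than replacing it; the function names are made up for this example.

```python
def in_B(w):
    """Membership in B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def decompositions(s, p):
    """All (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end in range(1, min(p, len(s)) + 1):  # end = |xy|
        for start in range(end):  # start = |x|, so y = s[start:end] is nonempty
            yield s[:start], s[start:end], s[end:]


def refuted_by_pumping(p, i=2):
    """True if x y^i z falls outside B for every decomposition of 0^p 1^p."""
    s = "0" * p + "1" * p
    return all(not in_B(x + y * i + z) for x, y, z in decompositions(s, p))
```

Pumping down with i = 0 works just as well here, since xz then has fewer '0' characters than '1' characters.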
103Observations
- We needed (and got) a contradiction that was a
necessary consequence of the assumption that B ∈
RPP and then relied on the Theorem 1.70
corollaries
- RPP mainly concerns strings that are longer than
p
- So you should concentrate on strings longer than
p...
- even though p is a variable. But clearly
|0^p 1^p| > p
- In our example we didn't "do" much after our
initial choice of s and thinking about the
implications: we found a contradiction right away
- Many other choices of s would work, but many
don't, and even some that do work require more
complex arguments, for example, s = 0^(⌊p/2⌋+1)
1^(⌊p/2⌋+1)
- Choosing s wisely is usually the most important
thing
104Picture so far
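[Venn diagram: nested regions FIN $\subset$ REG $\subset$ RPP $\subset$ ALL. Each point is a language in this Venn diagram.]
- In FIN: $\{0101, \varepsilon\}$
- In REG but not FIN: $0^*(101)^*$
- In RPP but not REG: we'll see an example later
- In ALL but not RPP: $B = \{0^n 1^n \mid n \ge 0\}$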
105More on contradictions
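- Consider this shortcut attempt to prove that $B = \{0^n 1^n \mid n \ge 0\}$ is not regular
- Proof: Suppose $B \in \mathrm{RPP}$. By RPP,
- There exists $p \ge 0$ such that
- For every $s \in B$ satisfying $|s| \ge p$,
- There exist $x, y, z \in \Sigma^*$ such that
- (1) $s = xyz$
- (2) $|y| > 0$
- (3) $|xy| \le p$
- (4) For all $i \ge 0$, $x y^i z \in B$
- So let $s = (1010)^p$. Then $s \notin B$, which is inconsistent with the RPP statement. Contradiction??
- NO. The RPP statement only constrains strings $s \in B$, so exhibiting a string outside B contradicts nothing.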
106Simplifying RPP proofs
- I find it easier to forget about contradiction proofs and instead prove directly that a language is not in RPP
- So we need a direct, formal version of the statement that $L \notin \mathrm{RPP}$
107Unraveling RPP (repeat)
- Rephrasing: L is a member of RPP if
- There exists $p \ge 0$ such that
- For every $s \in L$ satisfying $|s| \ge p$,
- There exist $x, y, z \in \Sigma^*$ such that
- (1) $s = xyz$
- (2) $|y| > 0$
- (3) $|xy| \le p$
- (4) For all $i \ge 0$, $x y^i z \in L$
- $(\exists p)\,(\forall s)\,(\exists x,y,z)\,(\forall i)$: pretty complicated!
108Unraveling non-RPP
- Rephrasing: L is not in RPP if
- For every $p \ge 0$
- There exists some $s \in L$ satisfying $|s| \ge p$ such that
- For every $x, y, z \in \Sigma^*$ satisfying 1-3:
- (1) $s = xyz$,
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- There exists some $i \ge 0$ for which $x y^i z \notin L$
- $(\forall p)\,(\exists s)\,(\forall x,y,z)\,(\exists i)$: still complicated, but you don't have to use contradiction now
109A direct proof of nonregularity
- Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$ ('a' is just some character). Then D is not regular.
- Proof idea: The pumping lemma says there's a fixed-size loop in any DFA that accepts long strings. You can repeat the characters in that loop as many times as you want to get longer strings that the machine accepts. Each time you add a repetition you grow the pumped string by a constant length.
- But the spacing between strings in D above keeps changing; it's never constant. So D doesn't have the pumping property.
110A direct proof of nonregularity
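- Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$. Then D is not in RPP and thus not regular.
- Proof: Let $p \ge 0$ and set $s = a^{(p+1)^2}$. Then $s \in D$ and $|s| > p$ (so such an s certainly exists).
- Now let $x, y, z \in \Sigma^*$ be any strings satisfying
- (1) $xyz = s = a^{(p+1)^2}$
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- Our goal is to produce some i such that $x y^i z \notin D$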
111Direct proof continued
- (We'll actually show that $x y^0 z \notin D$)
- Observe that $y = a^j$ for some $1 \le j \le p$, so
- $|x y^0 z| = |a^{(p+1)^2 - j}| < (p+1)^2$
- Since $j \le p$ we know that $-j \ge -p$ and thus
- $|x y^0 z| = (p+1)^2 - j \ge (p+1)^2 - p = p^2 + p + 1 > p^2$
- In other words, $x y^0 z$ has $> p^2$ characters and $< (p+1)^2$ characters. So $|x y^0 z|$ is not a perfect square and thus $x y^0 z \notin D$. QED
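The arithmetic in this proof is easy to spot-check numerically. Over a one-letter alphabet only lengths matter, so the sketch below works with integers instead of strings; `is_square` and `pumping_down_leaves_D` are hypothetical names, and the finite range checked is an illustration rather than a proof.

```python
from math import isqrt

def is_square(k):
    """True when k is a perfect square (k >= 0)."""
    return isqrt(k) ** 2 == k

def pumping_down_leaves_D(p):
    """For s = a^((p+1)^2) and every legal y = a^j (1 <= j <= p), check that
    |x y^0 z| = (p+1)^2 - j lies strictly between p^2 and (p+1)^2."""
    n = (p + 1) ** 2
    for j in range(1, p + 1):
        length = n - j
        if not (p * p < length < n) or is_square(length):
            return False
    return True
```

Each pumped-down length falls strictly between two consecutive squares, which is exactly why it cannot be a square itself.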
112Direct or contradiction proof?
- Both work fine... it's your choice
- But you must clearly state what you are doing
- If proof by contradiction, say so
- If direct proof, say so
113Game theory formulation
- The direct proof technique can be formulated as a two-player game
- You are the player who wants to establish that L is not pumpable
- Your opponent wants to make it difficult for you to succeed
- Both of you have to play by the rules
114Game theory continued
- The game has just four steps.
- (1) Your opponent picks $p \ge 0$
- (2) You pick $s \in L$ such that $|s| \ge p$
- (3) Your opponent chooses $x, y, z \in \Sigma^*$ such that $s = xyz$, $|y| > 0$, and $|xy| \le p$
- (4) You produce some $i \ge 0$ such that $x y^i z \notin L$
115Game theory continued
- If you are able to succeed through step 4, then you have won only one round of the game
- Like winning one round of Tic-tac-toe
- Do example for a member of D
- To show that a language is not in RPP you must show that you can always win, regardless of your opponent's legal moves
- Realize that the opponent is free to choose the most inconvenient or difficult p and x, y, z imaginable that are consistent with the rules
116Game theory continued
- So you have to present a strategy for always winning and convincingly argue that it will always win
- So your choices in steps 2 and 4 have to depend on the opponent's choices in steps 1 and 3
- And you don't know what the opponent will choose
- So your choices need to be framed in terms of the variables p, x, y, z
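The game can be written down as a small referee program. This is a sketch under stated assumptions: a strategy is a pair of functions (`pick_s` for step 2, `pick_i` for step 4), the referee enumerates every legal step 3 move for one given p, and all names are hypothetical. Winning for the finitely many p you try is evidence, not a proof.

```python
def in_B(w):
    """Membership test for B = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def legal_splits(s, p):
    """Step 3: every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def strategy_wins(in_L, p, pick_s, pick_i):
    """Play steps 2-4 against every legal opponent move for this p (step 1)."""
    s = pick_s(p)                                  # step 2: your choice
    if not in_L(s) or len(s) < p:
        return False                               # illegal move: you lose
    return all(not in_L(x + y * pick_i(p, x, y, z) + z)   # step 4 vs. step 3
               for x, y, z in legal_splits(s, p))

def pick_s_B(p):
    return "0" * p + "1" * p

def always_two(p, x, y, z):
    return 2
```

Note that `pick_s` sees only p and `pick_i` sees only p, x, y, z, which mirrors the rule that your choices must be framed in terms of the opponent's variables.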
117Game theory continued
- Ultimately it is not very different from the direct proof
- But it states clearly what choices you may make and what you may not (a common cause of errors in proofs)
- Repeat previous proof in this framework
118A direct proof of nonregularity
- Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$. Then D is not in RPP and thus not regular.
- Proof: [Step 1, opponent's choice] Let $p \ge 0$
- [Step 2, your choice and reasoning] Set $s = a^{(p+1)^2}$. Then $s \in D$ and $|s| > p$ (so such an s certainly exists).
- [Step 3, opponent's choice] Now let $x, y, z \in \Sigma^*$ be any strings satisfying
- (1) $xyz = s = a^{(p+1)^2}$
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- Our goal is to produce some i such that $x y^i z \notin D$
119Direct proof continued
- [Step 4, your choice] (We'll actually show that $x y^0 z \notin D$)
- [Step 4, your reasoning] Observe that $y = a^j$ for some $1 \le j \le p$, so
- $|x y^0 z| = |a^{(p+1)^2 - j}| < (p+1)^2$
- Since $j \le p$ we know that $-j \ge -p$ and thus
- $|x y^0 z| = (p+1)^2 - j \ge (p+1)^2 - p = p^2 + p + 1 > p^2$
- In other words, $x y^0 z$ has $> p^2$ characters and $< (p+1)^2$ characters. So $|x y^0 z|$ is not a perfect square and thus $x y^0 z \notin D$. QED
120Unraveling RPP (repeat)
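- Rephrasing: L is a member of RPP if
- There exists $p \ge 0$ such that
- For every $s \in L$ satisfying $|s| \ge p$,
- There exist $x, y, z \in \Sigma^*$ such that
- (1) $s = xyz$
- (2) $|y| > 0$
- (3) $|xy| \le p$
- (4) For all $i \ge 0$, $x y^i z \in L$
- Theorem: $\mathrm{REG} \subseteq \mathrm{RPP}$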
121Structural facts about RPP
- (1) If $L \in \mathrm{RPP}(p)$ (meaning "strings in L with length $\ge p$ are pumpable") and $q > p$ then $L \in \mathrm{RPP}(q)$
- (2) If $L \notin \mathrm{RPP}(q)$ and $q > p$ then $L \notin \mathrm{RPP}(p)$ (contrapositive of 1)
- Thus if you have a proof that establishes $L \notin \mathrm{RPP}(q)$ only when $q \ge 5$, that's good enough: it follows that L is not regular
- Relevant for the "C is not regular" problem
122Structural facts about RPP
- If $L \in \mathrm{FIN}$ and the longest string in L has length n, then
- $L \in \mathrm{RPP}(n+1)$
- $L \notin \mathrm{RPP}(q)$ for all $q < n+1$
- Note: RPP is a class of languages that's only interesting because of its relation to REG. It is not a reasonable proposal for a computation model!
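The two facts about finite languages can be observed on a small example. The sketch below assumes a finite language stored as a Python set and only tries exponents i up to a bound `max_i`, so in general a `True` answer would be inconclusive; for a finite language it is reliable as long as `max_i >= 2`, because pumping the longest string with i = 2 already leaves the language. The names `has_pumpable_split` and `looks_in_RPP` are hypothetical.

```python
def legal_splits(s, p):
    """Every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def has_pumpable_split(L, s, p, max_i):
    """True if some legal split keeps x y^i z in L for every i in 0..max_i."""
    return any(all(x + y * i + z in L for i in range(max_i + 1))
               for x, y, z in legal_splits(s, p))

def looks_in_RPP(L, p, max_i=3):
    """Bounded check of 'L in RPP(p)' for a finite set of strings L."""
    return all(has_pumpable_split(L, s, p, max_i) for s in L if len(s) >= p)

L = {"", "ab", "abba"}               # example finite language
n = max(len(w) for w in L)           # length of its longest string
```

With this `L`, `looks_in_RPP(L, n + 1)` holds vacuously (no string is long enough to be tested), while every smaller threshold fails.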
123Unraveling non-RPP (repeat)
- L is not in RPP if
- For every $p \ge 0$ (opponent's choice)
- There exists some $s \in L$ satisfying $|s| \ge p$ such that (your choice)
- For every $x, y, z \in \Sigma^*$ satisfying 1-3: (opponent's)
- (1) $s = xyz$,
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- There exists some $i \ge 0$ for which $x y^i z \notin L$ (yours)
124Another example
- Let $C = \{0^m 1^n \mid m \ne n\}$. Is C regular? Try to prove it isn't
- Set $s = 0^p 1^{2p}$. If opponent chooses $x = \varepsilon$, $y = 0^p$, $z = 1^{2p}$, then we can set $i = 2$ and win because $x y^2 z = 0^{2p} 1^{2p} \notin C$.
- What if opponent chooses a shorter y?
- Looks like it's relatively easy to be a member of C and hard to not be a member of C
- Can force opponent to choose $y \in 0^*$
- So try to arrange it so that no matter what y is, some number of repetitions of it will match the target number of '1's
125Direct proof?
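One standard way to carry out the idea of making "some number of repetitions" of y hit the target is to pad the ones with p!. This particular choice, $s = 0^p 1^{p+p!}$ with $i = 1 + p!/|y|$, is an assumption added here for illustration and is not taken from the slides; the Python sketch below plays it against every legal opponent move for small p. All function names are hypothetical.

```python
from math import factorial

def in_C(w):
    """Membership test for C = { 0^m 1^n : m != n }."""
    m = len(w) - len(w.lstrip("0"))       # number of leading '0' characters
    ones = w[m:]
    return set(ones) <= {"1"} and m != len(ones)

def legal_splits(s, p):
    """Step 3: every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def pick_s(p):
    # Step 2 (assumed strategy): p zeros followed by p + p! ones.
    return "0" * p + "1" * (p + factorial(p))

def pick_i(p, y):
    # Step 4: y = 0^j with 1 <= j <= p, so j divides p! exactly.
    return 1 + factorial(p) // len(y)

def strategy_wins(p):
    s = pick_s(p)
    if not in_C(s) or len(s) < p:
        return False                       # step 2 would be illegal
    return all(not in_C(x + y * pick_i(p, y) + z)
               for x, y, z in legal_splits(s, p))
```

The reason it works: with $y = 0^j$, the pumped string has $p - j + j(1 + p!/j) = p + p!$ zeros, which equals the number of ones.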
126Using closure properties
- Can simplify argument a great deal
- Fact: If L is not regular then $L^c$ is not regular either.
- Proof: If L is not regular but $L^c$ were regular, then $(L^c)^c$ would also be regular because REG is closed under complement. But $(L^c)^c = L$. QED
- Recall the languages $B = \{0^m 1^n \mid m = n\}$ and $C = \{0^m 1^n \mid m \ne n\}$. C is similar to $B^c$...
127Using closure properties
- Start over
- $B = \{0^m 1^n \mid m = n\}$ (known nonreg), $C = \{0^m 1^n \mid m \ne n\}$ (suspected nonreg)
- Certainly $B \subseteq C^c$
- If $m = n$ then it's true that (not $m \ne n$)
- But $B \ne C^c$
- Find example $x \in C^c - B$...
- On the other hand, $B = 0^* 1^* \cap C^c$
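These set relations can be confirmed on all binary strings up to a small length. The sketch below is a bounded check only, with hypothetical helper names; it restricts every language to strings of length at most 8.

```python
import re
from itertools import product

def count_form(w):
    """Return (m, n) if w = 0^m 1^n, otherwise None."""
    match = re.fullmatch(r"(0*)(1*)", w)
    return (len(match.group(1)), len(match.group(2))) if match else None

def in_B(w):
    mn = count_form(w)
    return mn is not None and mn[0] == mn[1]

def in_C(w):
    mn = count_form(w)
    return mn is not None and mn[0] != mn[1]

# All binary strings of length 0..8.
words = ["".join(t) for k in range(9) for t in product("01", repeat=k)]

B = {w for w in words if in_B(w)}
C_complement = {w for w in words if not in_C(w)}
zero_star_one_star = {w for w in words if count_form(w) is not None}
```

On this sample, `B` is a proper subset of `C_complement` (the string 10 is one witness in the difference), and intersecting with `zero_star_one_star` recovers `B` exactly.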
128Using closure properties
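- Fact: If $L_1 \cap L_2 \notin \mathrm{REG}$ and $L_1 \in \mathrm{REG}$, then $L_2 \notin \mathrm{REG}$
- Proof: Suppose (a) $L_1 \cap L_2 \notin \mathrm{REG}$ and $L_1 \in \mathrm{REG}$ and (b) $L_2 \in \mathrm{REG}$. Since REG is closed under $\cap$ we know that $L_1 \cap L_2 \in \mathrm{REG}$, but that contradicts assumption (a). Thus (a) and (b) can't both be true. QED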
129Topics for Exam 1
- Basic objects
- The main hierarchy: alphabets, strings, languages, classes
- Functions
- Relations
- Sets and operations on sets
- $\cup$, $\cap$, complement, $\times$, $\mathcal{P}(S)$, $A - B$, $|S|$
- $\subseteq$, $\in$
- $\{\,\text{element} \mid \text{predicate(element)}\,\}$
- Propositional and predicate logic
- $\forall$ and $\exists$
130Topics for Exam 1
- Strings
- $\varepsilon$ versus $\emptyset$
- Operations on strings: concatenation, exponentiation, reversal
- Languages
- Operations: concatenation, exponentiation, reversal, $\cup$, $\cap$, $^*$, complement, everything applicable to sets, $\{\varepsilon\}$ versus $\emptyset$
- Language classes
- FIN, REG, ALL
131Topics for Exam 1
- REG and its many formulations
- DFA, NFA, GNFA, UFA, REX
- Syntax and semantics of each model
- L() as program-to-language operator
- Conversions between models
- Subset construction for NFA, UFA
- DFA $\to$ GNFA $\to$ REX
- REX $\to$ NFA
132Topics for Exam 1
- Closure properties of language classes
- REG as a reasonable model of computation
- Arguments for, against
- Homework problems through homework 3
- Lectures and reading up through section 1.3 (excluding nonregularity)
133Exam 1
- You may bring and consult a single-sided,
handwritten sheet of notes, which you must turn
in with the exam (and will get back later)
134Applying these closure properties
- $B = \{0^m 1^n \mid m = n\}$, $C = \{0^m 1^n \mid m \ne n\}$
- $0^* 1^* \cap C^c = B$
- $0^* 1^*$ is obviously regular
- B is known to be nonregular
- $C^c$ is therefore nonregular
- Thus C is nonregular too
135Another closure properties attempt
- $B = \{0^m 1^n \mid m = n\} = \{0^n 1^n \mid n \ge 0\}$ (known nonreg)
- $BB = \{0^n 1^n 0^m 1^m \mid n, m \ge 0\}$
- Want to show that $BB \notin \mathrm{REG}$
- We know that REG is closed under language concatenation. What does that say about whether BB is regular or not?
- Is the class of non-regular languages ($\mathrm{REG}^c$) closed under language concatenation too?
136No
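- Let $\Sigma = \{a\}$ and $D = \{a^{n^2} \mid n \ge 2\}$
- Then $D^c = \{a^k \mid k \le 1 \text{ or } k \text{ is not a square}\} = \{\varepsilon, a^1, a^2, a^3, a^5, a^6, a^7, a^8, a^{10}, \ldots\}$
- We previously proved that $D \notin \mathrm{REG}$
- Thus $D^c \notin \mathrm{REG}$ (by "fact" we proved)
- But $D^c D^c = a^* \in \mathrm{REG}$ !!!
- Thus $\mathrm{REG}^c$ is not closed under language concatenation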
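The claim that concatenating the complement of D with itself yields all of $a^*$ can be checked up to a bound. Since the alphabet has one letter, the sketch below represents each string $a^k$ by its length k; the bound `N` and the names are assumptions made for illustration.

```python
from math import isqrt

N = 200  # check all lengths 0..N

def in_D(k):
    """True when a^k is in D = { a^(n^2) : n >= 2 }."""
    r = isqrt(k)
    return r >= 2 and r * r == k

# Lengths of strings in the complement of D, up to N.
Dc = {k for k in range(N + 1) if not in_D(k)}

# Lengths of strings in (complement of D)(complement of D), up to N:
# concatenating a^i and a^j gives a^(i+j).
DcDc = {i + j for i in Dc for j in Dc if i + j <= N}
```

Every length up to `N` shows up in `DcDc`: non-squares arise as k + 0, and each square $n^2 \ge 4$ arises as $(n^2 - 1) + 1$, where $n^2 - 1$ is not a square.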
137Back to problem
- $B = \{0^m 1^n \mid m = n\}$ (known nonreg), $BB = \{0^n 1^n 0^m 1^m \mid n, m \ge 0\}$
- Want to show that $BB \notin \mathrm{REG}$
- But there's no general result for that
- When applying a closure property, you have to make sure it's true!
- Nonetheless, it is true that $BB \notin \mathrm{REG}$
- Because $(BB) \cap 0^* 1^* = B$
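The identity used in the last step can be verified on all short binary strings. This is a bounded sanity check with hypothetical helper names, not a proof; it compares both sides restricted to strings of length at most 10.

```python
import re
from itertools import product

def in_B(w):
    """Membership test for B = { 0^n 1^n : n >= 0 }."""
    m = re.fullmatch(r"(0*)(1*)", w)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_BB(w):
    """w is in BB if it splits into two pieces that are both in B."""
    return any(in_B(w[:k]) and in_B(w[k:]) for k in range(len(w) + 1))

# All binary strings of length 0..10.
words = ["".join(t) for k in range(11) for t in product("01", repeat=k)]

lhs = {w for w in words if in_BB(w) and re.fullmatch(r"0*1*", w)}
rhs = {w for w in words if in_B(w)}
```

A string such as 0101 is in BB but is filtered out by the intersection, because a 1 precedes a 0; what survives is exactly the part of B within the length bound.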
138Chapter 1 closing considerations
- We don't and won't have many results about the class $\mathrm{REG}^c$
- Being nonregular says that the language lacks a certain type of structure: it's more complicated than a DFA can handle
- All real computers are finite devices and all finite languages are regular
- Yet the programming models are brittle: the program has to change for larger and larger inputs
- We've seen some easy-to-specify languages that aren't regular
- So REG is not a good general-purpose programming model...?