91.304 Foundations of Theoretical Computer Science - PowerPoint PPT Presentation

Slides: 138
Provided by: csU72

1
91.304 Foundations of (Theoretical) Computer Science
  • Chapter 1 Lecture Notes
  • David Martin
  • dm_at_cs.uml.edu

This work is licensed under the Creative Commons
Attribution-ShareAlike License. To view a copy of
this license, visit http://creativecommons.org/licenses/by-sa/2.0/
or send a letter to Creative Commons, 559 Nathan
Abbott Way, Stanford, California 94305, USA.
2
Chapter 1 Regular Languages
  • Simple model of computation
  • Input a string, and either accept or reject it
  • Models a very simple type of function: a
    predicate on strings, $f: \Sigma^* \to \{0,1\}$
  • See example of a state-transition diagram

3
Syntax of DFA
  • A deterministic finite automaton (DFA) is a
    5-tuple $(Q, \Sigma, \delta, q_0, F)$ such that
  • $Q$ is a finite set of states
  • $\Sigma$ (sigma) is an alphabet
  • $\delta: Q \times \Sigma \to Q$ (delta) is the transition function
  • $q_0 \in Q$ (q naught) is the start state
  • $F \subseteq Q$ is the set of accepting states
  • Usually these names are used, but others are
    possible as long as the role is clear

4
DFA syntax
  • It is deterministic because for every input
    (q,c), the next state is a uniquely determined
    member of Q
  • because the codomain of $\delta$ is $Q$
  • Fix the previous example to fit these constraints
  • The same example DFA, specified formally

5
DFA computation
  • This definition is different from but equivalent
    to the one in the text
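  • Let $M = (Q, \Sigma, \delta, q_0, F)$ be a DFA. We define the
    extended transition function $\hat\delta: Q \times \Sigma^* \to Q$
    inductively as follows. For all $q \in Q$,
    $\hat\delta(q, \varepsilon) = q$. If $w \in \Sigma^*$ and $c \in \Sigma$, let
    $\hat\delta(q, wc) = \delta(\hat\delta(q, w), c)$
  • According to this definition, $\hat\delta(q, x)$ is the
    state of the machine after starting in state $q$
    and reading the entire string $x$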
  • See example
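The inductive definition of $\hat\delta$ translates almost line for line into code. Here is a minimal Python sketch. It assumes $\delta$ is stored as a dictionary keyed by (state, character) pairs. The parity machine is a hypothetical example, not the one drawn on the slide.

```python
def delta_hat(delta, q, x):
    """Extended transition function, following the inductive definition:
    delta_hat(q, "") = q and delta_hat(q, w + c) = delta(delta_hat(q, w), c)."""
    if x == "":
        return q                      # base case: nothing read, still in q
    w, c = x[:-1], x[-1]              # split x = wc with c a single character
    return delta[(delta_hat(delta, q, w), c)]


# Hypothetical DFA over {0,1}: the state records the parity of the 1s read so far.
parity_delta = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

print(delta_hat(parity_delta, "even", "1101"))  # three 1s were read: prints odd
```

The recursion mirrors the definition. A plain loop over x computes the same state and avoids Python's recursion limit on long inputs.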

6
Language recognized by DFA
  • The language recognized by the DFA M is written
    $L(M)$ and defined as $L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0, x) \in F\}$
  • Think of L() as an operator that turns a program
    into the language it specifies
  • We will use L() for other types of machines and
    grammars too

7
Example
  • Let $L_2 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a
    multiple of 2$\}$ and build a DFA $M_2$ such that
    $L(M_2) = L_2$
  • Remember this means $L(M_2) \subseteq L_2$ and $L_2 \subseteq L(M_2)$
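Here is one possible $M_2$, sketched in Python under two assumptions. First, the empty string and leading zeros are allowed, with ε read as the number 0. Second, $\delta$ is stored as a dictionary. The state is simply the last bit read. The loop checks both inclusions on every string up to length 10. That is evidence, not a proof.

```python
from itertools import product

# One possible M2: the state is the value read so far mod 2, i.e. the last bit.
M2_delta = {(q, b): int(b) for q in (0, 1) for b in "01"}
M2_start, M2_accepting = 0, {0}


def accepts(delta, start, accepting, x):
    q = start
    for c in x:
        q = delta[(q, c)]
    return q in accepting


def in_L2(x):
    # Assumption: the empty string is read as the number 0.
    return (int(x, 2) if x else 0) % 2 == 0


# Both inclusions, checked on every string of length <= 10 (evidence, not a proof).
for n in range(11):
    for bits in product("01", repeat=n):
        x = "".join(bits)
        assert accepts(M2_delta, M2_start, M2_accepting, x) == in_L2(x)
```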

8
Definition of regular languages
  • A language $L$ is regular if there exists a DFA $M$
    such that $L = L(M)$
  • The class of regular languages over the alphabet
    $\Sigma$ is called REG and defined $\mathrm{REG} = \{L \subseteq \Sigma^* \mid L$
    is regular$\} = \{L(M) \mid M$ is a DFA
    over $\Sigma\}$
  • Now we know 4 classes of languages , FIN, REG,
    and ALL

9
Problems
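  • For all $k \ge 1$, let $A_k = \{0^{kn} \mid n \ge 0\}$. Prove that
    $(\forall k \ge 1)\ A_k \in \mathrm{REG}$
  • Solution is a scheme, not a single DFA
  • (Harder) Build a DFA for $L_3 = \{x \in \{0,1\}^* \mid$ the binary
    number $x$ is a multiple of 3$\}$
  • Build a DFA for $L_4 = \{x \in \{a,b\}^* \mid x$ contains an odd
    number of b's and an even number of a's$\}$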
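Below is one possible set of answers as a Python sketch. The $A_k$ answer is a scheme, meaning a function from $k$ to a DFA. The $L_3$ machine tracks the value read so far mod 3, since reading bit $b$ turns $v$ into $2v + b$. It treats the empty string as 0, which is an assumption. The $L_4$ machine tracks two parities. The function names are illustrative only.

```python
from itertools import product


def run(dfa, x):
    """dfa = (delta, start, accepting); return True if it accepts x."""
    delta, q, accepting = dfa
    for c in x:
        q = delta[(q, c)]
    return q in accepting


def make_A(k):
    """Scheme for A_k = {0^(kn) : n >= 0} over {0}: k states counting 0s mod k."""
    delta = {(q, "0"): (q + 1) % k for q in range(k)}
    return delta, 0, {0}


def make_L3():
    """Binary multiples of 3: the state is the value read so far, mod 3.
    Appending bit b to a number v gives 2v + b."""
    delta = {(r, b): (2 * r + int(b)) % 3 for r in range(3) for b in "01"}
    return delta, 0, {0}


def make_L4():
    """Over {a,b}: the state is (parity of a's, parity of b's) read so far."""
    delta = {}
    for pa in (0, 1):
        for pb in (0, 1):
            delta[((pa, pb), "a")] = (1 - pa, pb)
            delta[((pa, pb), "b")] = (pa, 1 - pb)
    return delta, (0, 0), {(0, 1)}   # accept: even number of a's, odd number of b's


def strings(alphabet, max_len):
    """All strings over alphabet of length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)


print(run(make_A(3), "000000"), run(make_L3(), "1001"), run(make_L4(), "aab"))
```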

10
Measuring DFA complexity
  • Suppose
  • you have a DFA with states named 00000000 ..
    11111111 ($2^8 = 256$ unique states)
  • an LCD attached to the thing showing the current
    state name
  • $\Sigma = \{c\}$ (c for clock pulse)
  • $\delta(q, c) = (q + 1)\ \&\ \mathtt{0xFF}$
  • This is a simple counter machine: feed it clocks
    and it counts upwards
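Here is a sketch of the counter machine. It reads the slide's wrap-around as a bitwise AND with 0xFF, which is an assumption about the operator lost from the formula. It also shows how much memory it takes to remember one of the 256 states.

```python
import math


def counter_delta(q, c):
    """delta(q, c) = (q + 1) & 0xFF: each clock pulse bumps an 8-bit counter."""
    if c != "c":
        raise ValueError("the only input symbol is the clock pulse 'c'")
    return (q + 1) & 0xFF            # 255 wraps around to 0


q = 0                                 # start in state 00000000
for _ in range(300):                  # feed it 300 clock pulses
    q = counter_delta(q, "c")
print(format(q, "08b"))               # 300 mod 256 = 44, prints 00101100

# Naming one of the 256 states takes log2(256) = 8 bits.
print(math.log2(256))                 # prints 8.0
```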

11
Measuring DFA complexity
  • Time complexity
  • A DFA always takes one transition per input
    character
  • So time complexity is not useful here
  • Program complexity
  • A DFA's program is (mostly) its $\delta$
  • The model specifies no particular programming
    language for $\delta$: it's just a table mapping
    (state, input) pairs to (state) outputs
  • Though it can sometimes be specified concisely,
    as in $\delta(q, c) = (q + 1)\ \&\ \mathtt{0xFF}$
  • Reprogram the clock for any permutation of $\{0,1\}^8$
    and $\delta$'s table remains just as big

12
Measuring DFA complexity
  • Space complexity: the amount of memory used
  • But a DFA has no extra memory: it only remembers
    what state it is in
  • Can't look back or forward
  • So a DFA always uses the same amount of memory,
    namely the amount of memory required to remember
    what state it's in
  • Needs to remember the current element of $Q$
  • Can write down that number in $\log_2 |Q|$ bits

13
DFAs as real computers
  • Consider a 256 MB computer that takes a finite
    input and produces a finite output
  • Inputs: clock pulses, interrupts, hard drive,
    keyboard, mouse, network, etc.
  • Outputs: video, hard drive, network, etc.
  • Can code everything in binary
  • But DFA only accepts or rejects input

14
Recognition model for functions
  • Can still sort of be modeled by a DFA
  • $PC = \{x\#y \mid x, y \in \{0,1\}^*$ and the input $x$
    produces the output $y\}$
  • Note: the character between $x$ and $y$ (written $\#$ here) is just a separator
  • DFA plays the role of equipment verifier
  • Verifying correctness seems easier than computing
    the output, but at least it's related

15
Are DFAs reasonable?
  • One issue is that the programs don't seem to
    reflect much about the problem being solved
  • If you can figure out how many bits of memory are
    needed for the solution, then you can always
    build a DFA based on that knowledge (though it could be
    tedious and really large)
  • No difference in program complexity between programs using the same
    amount of memory means DFAs don't help us see the
    difference between programs very easily
  • Neural nets??

16
Are DFAs reasonable?
  • Similarly: an 8-bit counter is structurally very
    different from a 9-bit counter
  • More memory needed $\Rightarrow$ totally different $\delta$ program
    needed
  • Not very modular!

17
Are DFAs reasonable?
  • Another issue is that DFAs prefer the beginning
    of their inputs to the end of their inputs
  • $L_5 = \{x \in \{0,1\}^* \mid$ the fifth digit from the left
    of $x$ is 0$\}$
  • $L_6 = \{x \in \{0,1\}^* \mid$ the fifth digit from the right
    of $x$ is 0$\}$
  • DFAs know where the input begins but not where it
    ends
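To make the asymmetry concrete, here is a Python sketch of one DFA for each language. The machines are not minimized, and the state encodings are our own choice. The $L_5$ machine only counts to five. The $L_6$ machine cannot tell which character is the last one, so it has to keep a sliding window of the last five characters.

```python
from itertools import product


def run(dfa, x):
    delta, q, accepting = dfa
    for c in x:
        q = delta[(q, c)]
    return q in accepting


def dfa_L5():
    """Fifth digit from the LEFT is 0: count the first four characters, then
    the fifth one decides between two trap states, 'yes' and 'no'."""
    delta = {}
    for b in "01":
        for i in range(4):
            delta[(i, b)] = i + 1
        delta[(4, b)] = "yes" if b == "0" else "no"
        delta[("yes", b)] = "yes"
        delta[("no", b)] = "no"
    return delta, 0, {"yes"}


def dfa_L6():
    """Fifth digit from the RIGHT is 0: the machine never knows when the input
    ends, so each state remembers the last (up to) five characters read."""
    windows = ["".join(t) for n in range(6) for t in product("01", repeat=n)]
    delta = {(w, b): (w + b)[-5:] for w in windows for b in "01"}
    accepting = {w for w in windows if len(w) == 5 and w[0] == "0"}
    return delta, "", accepting


print(len({q for q, _ in dfa_L5()[0]}))   # 7 states
print(len({q for q, _ in dfa_L6()[0]}))   # 63 states (not minimized)
```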

18
Is REG reasonable?
  • We should be able to combine computations as
    subroutines in simple ways
  • logical OR ($A \cup B$)
  • logical AND ($A \cap B$)
  • concatenation ($A \circ B$) and star ($A^*$)
  • hard to prove!! (motivation for NFAs)
  • complement ($A^c$)
  • reversal ($A^R$)
  • All above are easy to do as logic circuits
  • Will discuss further as closure under language
    operations

19
Nondeterministic Finite Automata
  • Will relax two of these DFA rules:
  • Each (state, char) input must produce exactly one
    (state) output
  • Must consume one character in order to advance
    state
  • Example: $L_6 = \Sigma^* bob \Sigma^*$
  • See $M_6$
  • The NFA accepts the input if there exists any way
    of reading the input that winds up in an
    accepting state at the end of the string
  • Otherwise it rejects the input

20
NFAs
  • Thus the NFA rejects the input if there doesn't
    exist any way of reading the input that winds up
    in an accepting state at the end of the string
  • In other words: every way of reading the input
    leads to a nonaccepting state
  • Example M7
  • L7 ?

[Figure: state-transition diagram of M7, with states 1, 2, 3 and arcs labeled a, b, c, and ε (twice)]
21
Ways to think of NFAs
  • NFAs want to accept inputs and will always take
    the most advantageous alternative(s)
  • Because they will accept if there exists any way
    to get to an accepting state at the end of the
    string
  • The quickest way there may be just one of many
    ways, but it doesn't matter
  • http://www.chompchomp.com/frag05/frag05.01.a.htm

22
Ways to think of NFAs
[Figure: a state with three outgoing arcs, all labeled a]
  • fork() model
  • Input string is in a variable
  • fork() at every nondeterministic choice point
  • subprocess 1 (parent) follows first transition
  • subprocess 2 (child) follows second
  • subprocess 3 (child) follows third (if any), etc.
  • A process that can't follow any transition calls
    exit() -- and gives up its ability to accept
  • A process that makes it through the whole string
    and is in an accepting state prints out ACCEPT
  • A single ACCEPT is enough
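On one processor, the fork() model can be simulated by exploring the children one at a time. Here is a minimal Python sketch. It assumes the NFA's transitions are a dictionary from (state, symbol) to a set of states, with "" standing for ε. The example NFA, which accepts strings ending in ab, is hypothetical.

```python
def nfa_accepts_fork(delta, start, accepting, x):
    """fork()-style simulation: try every way of reading x.
    delta maps (state, symbol) to a set of states; the symbol "" stands for ε.
    A "process" is a configuration (state, number of characters read)."""
    explored = set()                 # configurations some process already handled

    def process(q, i):
        if (q, i) in explored:       # also keeps ε-cycles from looping forever
            return False
        explored.add((q, i))
        if i == len(x) and q in accepting:
            return True              # this process prints ACCEPT
        # fork one child per available transition: ε moves, then reading x[i]
        children = [(p, i) for p in delta.get((q, ""), set())]
        if i < len(x):
            children += [(p, i + 1) for p in delta.get((q, x[i]), set())]
        # no children means this process just exits; one ACCEPT anywhere is enough
        return any(process(p, j) for p, j in children)

    return process(start, 0)


# Hypothetical NFA over {a,b} accepting the strings that end in "ab"
ends_ab = {(1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}}
print(nfa_accepts_fork(ends_ab, 1, {3}, "abab"))   # True
print(nfa_accepts_fork(ends_ab, 1, {3}, "aba"))    # False
```

Because of the explored set, each (state, position) configuration is handled at most once. Python's recursion limit still caps the input length at roughly a thousand characters.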

23
Syntax of DFA (repeat)
  • A deterministic finite automaton (DFA) is a
    5-tuple $(Q, \Sigma, \delta, q_0, F)$ such that
  • $Q$ is a finite set of states
  • $\Sigma$ is an alphabet
  • $\delta: Q \times \Sigma \to Q$ is the transition
    function
  • $q_0 \in Q$ is the start state
  • $F \subseteq Q$ is the set of accepting states
  • Usually these names are used, but others are
    possible as long as the role is clear

24
Syntax of NFA
  • A nondeterministic finite automaton (NFA) is a
    5-tuple $(Q, \Sigma, \delta, q_0, F)$ such that
  • $Q$ is a finite set of states
  • $\Sigma$ is an alphabet
  • $\delta: Q \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(Q)$ is the transition function
  • $q_0 \in Q$ is the start state
  • $F \subseteq Q$ is the set of accepting states
  • Usually these names are used, but others are
    possible as long as the role is clear

25
Syntax of NFA
  • Definition: $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$
  • We'll use this frequently enough
  • Differences on state-transition diagram
  • $\delta(1,a) = \{1\}$ (not $\delta(1,a) = 1$)
  • ?(1,?) 1, 2
  • $\delta(3, c) = \{2, 3\}$
  • $\delta(2, a) = \emptyset$
  • ?(3,?) 3

[Figure: state-transition diagram with states 1, 2, 3 and arcs labeled a, b, c, c, and ε (twice)]
Example M8
26
NFA computation
  • This next definition is different from but
    equivalent to the one in the text
  • The book's definition may be easier to understand at
    first, but that makes its version of Theorem 1.39
    (subset construction) harder
  • Goal: a function $\hat\delta: Q \times \Sigma^* \to \mathcal{P}(Q)$ where $\hat\delta(q,x)$ is
    the set of all states reachable in the machine
    after starting in state $q$ and reading the entire
    string $x$
  • Then for an NFA $M$, we will define something like
    $L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0,x)$ contains some
    accepting state$\}$

27
NFA computation
  • Let $M = (Q, \Sigma, \delta, q_0, F)$ be an NFA. We define some
    auxiliary functions
  • $E: Q \to \mathcal{P}(Q)$ by ("ε-closure")
  • $E(q) = \{p \in Q \mid p$ is reachable from $q$ by
    following a chain of 0 or more $\varepsilon$
    transitions$\}$
  • Although E takes elements of Q as input, we'll
    also use it as a function that takes subsets of Q
    as input (that is, elements of $\mathcal{P}(Q)$). So
    $E: \mathcal{P}(Q) \to \mathcal{P}(Q)$ by $E(S) = \bigcup_{q \in S} E(q)$

In other words, given a set as input, just
process each element independently...
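The ε-closure is plain graph reachability over ε arcs. Here is a minimal Python sketch using the same assumed representation, with "" standing for ε. For a single state, $E(q)$ is `E(delta, {q})`. The three-state fragment is hypothetical.

```python
def E(delta, states):
    """ε-closure of a set of states: every state reachable by 0 or more ε moves.
    delta maps (state, symbol) to a set of states; "" stands for ε."""
    closure = set(states)            # chains of length 0: the states themselves
    stack = list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, ""), set()):
            if p not in closure:     # each state is pushed at most once
                closure.add(p)
                stack.append(p)
    return closure


# Hypothetical NFA fragment with ε moves 1 -> 2 and 2 -> 3
eps = {(1, ""): {2}, (2, ""): {3}}
print(sorted(E(eps, {1})), sorted(E(eps, {3})))   # [1, 2, 3] [3]
```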
28
NFA computation
  • Thus E(q) is the set of all states you can get to
    from q without reading any input
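  • In M8, $E(3) = {?}$ and $E(\{2,1\}) = {?}$
  • We define a simple extension of $\delta$ that takes a
    set of states as input
  • $\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ (this comes with the NFA)
  • $\delta: \mathcal{P}(Q) \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ defined by $\delta(S, c) = \bigcup_{q \in S} \delta(q, c)$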

Again, given a set as input, just process each
element independently...
29
NFA computation
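  • We have a function E() that follows ε-transitions
    and a version of $\delta$ that behaves like $\delta$ but takes
    sets as input
  • $\hat\delta: Q \times \Sigma^* \to \mathcal{P}(Q)$ is defined inductively. For all
    $q \in Q$, $\hat\delta(q, \varepsilon) = E(\{q\})$
  • If $w \in \Sigma^*$ and $c \in \Sigma$, let
  • $\hat\delta(q, wc) = E(\delta(\hat\delta(q, w), c))$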
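Putting E(), the set version of $\delta$, and the induction together gives $\hat\delta$ for an NFA. Here is a Python sketch under the same assumed representation. The example NFA's transitions are a hypothetical illustration, not a transcription of M8.

```python
def E(delta, states):
    """ε-closure; "" stands for ε in the keys of delta."""
    closure, stack = set(states), list(states)
    while stack:
        for p in delta.get((stack.pop(), ""), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def delta_set(delta, states, c):
    """delta extended to sets: the union of delta(q, c) over q in states."""
    out = set()
    for q in states:
        out |= delta.get((q, c), set())
    return out


def delta_hat(delta, q, x):
    """delta_hat(q, ε) = E({q}); delta_hat(q, wc) = E(delta(delta_hat(q, w), c))."""
    current = E(delta, {q})
    for c in x:                      # unrolls the induction one character at a time
        current = E(delta, delta_set(delta, current, c))
    return current


# Hypothetical NFA (an assumption, for illustration only):
# ε moves 1 -> 2 -> 3, a loops on 1, c goes from 3 to 2 and to 3; F = {3}
nfa = {(1, ""): {2}, (2, ""): {3}, (1, "a"): {1}, (3, "c"): {2, 3}}
print(sorted(delta_hat(nfa, 1, "ac")))   # [2, 3]
```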

30
NFA computation
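  • Finally, we define $L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0,x)$
    contains some accepting state$\} = \{x \in \Sigma^* \mid \hat\delta(q_0, x) \cap F \neq \emptyset\}$
  • $\hat\delta(1, ac) = E(\delta(\hat\delta(1, a), c))$
  • $\hat\delta(1, a) = E(\delta(\hat\delta(1, \varepsilon), a))$
  • $\hat\delta(1, \varepsilon) = {?}$
  • $\hat\delta(1, ac) = {?}$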
31
Question
  • "How do I know when to follow ε transitions and
    when not to?"
  • If you're talking about $\delta$, then don't--it's the
    program itself. $\delta$ can express that "there is an
    ε transition here" but you never go any further
    than that one hop.
  • If you're talking about $\hat\delta$, then do--because it
    includes E() as part of its definition, which is
    there precisely in order to follow ε transitions

32
NFAs are good at union (or)
  • $L_2 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a multiple
    of 2$\}$
  • $L_3 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a multiple
    of 3$\}$
  • Let $A = L_2 \cup L_3$
  • NFA for A using guess-and-verify strategy
  • Preview of Theorem 1.45
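Here is a Python sketch of the guess-and-verify idea. A new start state guesses, by an ε move, which machine will accept, and that machine then checks the guess. The DFAs for $L_2$ and $L_3$ track the value read so far mod 2 and mod 3. The representation ("" for ε) and the tagging of states are our own choices.

```python
from itertools import product


def union_nfa(dfa1, dfa2):
    """Guess-and-verify NFA for L(dfa1) ∪ L(dfa2). A new start state makes an
    ε move into each machine (the guess); that machine then verifies the guess.
    Each DFA is (delta, start, accepting); states are tagged 1 or 2 so the two
    machines never share a state name."""
    delta = {("start", ""): {(1, dfa1[1]), (2, dfa2[1])}}
    for tag, (d, _, _) in ((1, dfa1), (2, dfa2)):
        for (q, c), p in d.items():
            delta[((tag, q), c)] = {(tag, p)}
    accepting = {(1, q) for q in dfa1[2]} | {(2, q) for q in dfa2[2]}
    return delta, "start", accepting


def nfa_accepts(nfa, x):
    """Track the set of states the NFA could be in ("" marks ε moves)."""
    delta, start, accepting = nfa

    def close(states):
        states, stack = set(states), list(states)
        while stack:
            for p in delta.get((stack.pop(), ""), set()):
                if p not in states:
                    states.add(p)
                    stack.append(p)
        return states

    current = close({start})
    for c in x:
        current = close({p for q in current for p in delta.get((q, c), set())})
    return bool(current & accepting)


# DFAs for L2 and L3: the state is the value read so far mod 2 (resp. mod 3)
M2 = ({(r, b): (2 * r + int(b)) % 2 for r in range(2) for b in "01"}, 0, {0})
M3 = ({(r, b): (2 * r + int(b)) % 3 for r in range(3) for b in "01"}, 0, {0})
A = union_nfa(M2, M3)
print(nfa_accepts(A, "1001"), nfa_accepts(A, "101"))   # 9 -> True, 5 -> False
```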

33
The Subset Construction
  • Theorem 1.39: For every NFA $M_1$ there exists a DFA
    $M_2$ such that $L(M_1) = L(M_2)$
  • Proof idea: well, how does fork() work on a
    uniprocessor machine?

34
The Subset Construction
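  • Proof: Let $M_1 = (Q_1, \Sigma, \delta_1, init_1, F_1)$ be the NFA and
    define the DFA $M_2 = (Q_2, \Sigma, \delta_2, init_2, F_2)$ as follows
  • $Q_2 = \mathcal{P}(Q_1)$.
  • Each state of the DFA records the set of states
    that the NFA can simultaneously be in
  • Can compare DFA states for equality but also look
    "inside" the state name to find a set of NFA
    state names
  • Define $\delta_2: Q_2 \times \Sigma \to Q_2$, that is,
    $\delta_2: \mathcal{P}(Q_1) \times \Sigma \to \mathcal{P}(Q_1)$, by
  • $\delta_2(S, a) = E_1(\delta_1(S, a))$: go to whatever states
    are reachable from the states in $S$ by reading
    the character $a$ (and then following ε transitions)

Remember, in an NFA: $\delta_1: Q_1 \times \Sigma_\varepsilon \to \mathcal{P}(Q_1)$ from the
definition; $\delta_1: \mathcal{P}(Q_1) \times \Sigma_\varepsilon \to \mathcal{P}(Q_1)$ extended to sets;
$E_1: \mathcal{P}(Q_1) \to \mathcal{P}(Q_1)$ is the ε-closure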
35
The Subset Construction
  • $init_2 = E(init_1)$
  • $F_2 = \{q \in Q_2 \mid q \cap F_1 \neq \emptyset\}$, in other words
    $F_2 = \{S \subseteq Q_1 \mid S \cap F_1 \neq \emptyset\}$
  • The effect is that the DFA knows all states that
    are reachable in the NFA after reading the string
    so far. If any one of them is accepting, then
    the current DFA state is accepting too, otherwise
    it's not.
  • If you believe this then that's all it takes to
    see that the construction is correct. So,
    convince yourself with an example. QED
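Here is the subset construction as a Python sketch, with frozensets of NFA states serving as DFA states. It only generates the subsets reachable from $init_2$, a common practical shortcut, rather than all of $\mathcal{P}(Q_1)$ as in the proof. The example NFA is hypothetical, and "" stands for ε.

```python
from itertools import product


def subset_construction(nfa, alphabet):
    """DFA for an NFA (delta1, init1, F1), where delta1 maps (state, symbol) to a
    set of states and "" stands for ε. Follows
    delta2(S, a) = E1(delta1(S, a)), init2 = E(init1), F2 = {S : S ∩ F1 ≠ ∅},
    but only builds the subsets reachable from init2."""
    delta1, init1, F1 = nfa

    def E1(states):
        closure, stack = set(states), list(states)
        while stack:
            for p in delta1.get((stack.pop(), ""), set()):
                if p not in closure:
                    closure.add(p)
                    stack.append(p)
        return frozenset(closure)    # frozenset so a set of NFA states can be a DFA state

    init2 = E1({init1})
    delta2, seen, todo = {}, {init2}, [init2]
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = E1({p for q in S for p in delta1.get((q, a), set())})
            delta2[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    F2 = {S for S in seen if S & F1}
    return delta2, init2, F2


def run_dfa(dfa, x):
    delta, q, accepting = dfa
    for c in x:
        q = delta[(q, c)]
    return q in accepting


# Hypothetical NFA (an assumption): ε moves 1 -> 2 -> 3, a loops on 1,
# c goes from 3 to 2 and to 3; F = {3}. It accepts exactly a*c*.
nfa = ({(1, ""): {2}, (2, ""): {3}, (1, "a"): {1}, (3, "c"): {2, 3}}, 1, {3})
dfa = subset_construction(nfa, "abc")
print(sorted(sorted(S) for S in {S for S, _ in dfa[0]}))   # [[], [1, 2, 3], [2, 3]]
```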

36
Subset construction example
  • $Q_2 = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$
  • (On board)
  • $init_2 = \{1,2,3\}$
  • $F_2 = \{\{3\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$

[Figure: state-transition diagram with states 1, 2, 3 and arcs labeled a, b, c, c, and ε (twice)]
Example M8 (think of this as M1 in the
construction)
37
Be methodical
  • Need to compute $\delta_2(\{1,2,3\}, c) = E_1(\delta_1(\{1,2,3\}, c))$
  • By definition, $\delta_1(\{1,2,3\}, c) = \delta_1(1,c) \cup \delta_1(2,c) \cup \delta_1(3,c) = \{2,3\}$
  • Then take $E_1(\{2,3\}) = \{2,3\}$
  • Save intermediate results for reuse
  • It's OK to eliminate unreachable states in
    practice, even though that's not what the
    construction really does

38
Subset construction conclusion
  • Adding nondeterminism makes programs shorter but
    not able to do new things
  • Remember regular languages are defined to be
    those "recognized by a DFA"
  • We now have a result that says that every
    language that is recognized by an NFA is regular
    too
  • So if you are asked to show that a language is
    regular, you can exhibit a DFA or NFA for it and
    rely on the subset construction theorem
  • Sometimes questions are specifically about DFAs
    or NFAs, though... pay attention to the precise
    wording

39
More NFA examples
  • Write an NFA for $\{ab, abc\}$ with 3 states
  • NFA and DFA for $\{\varepsilon\}$ over $\Sigma = \{0,1\}$
  • Rule ? 2 L(M) , ?
  • NFA and DFA for $\emptyset$ over $\Sigma = \{0,1\}$

40
Closure properties
  • The presence or absence of closure properties
    says something about how well a set tolerates an
    operation
  • Definition. Let $S \subseteq U$ be a set in some universe
    $U$ and $\odot$ be an operation on elements of $U$. We say
    that $S$ is closed under $\odot$ if applying $\odot$ to
    element(s) of $S$ produces another element of $S$.
  • For example, if $\odot$ is a binary operation $U \times U \to U$,
    then we're saying that $(\forall x \in S$ and $y \in S)\ x \odot y \in S$

41
Closure properties illustrated
[Figure: a set S drawn inside a universe U]
Applying the operation to elements of S never
takes you outside of S: S is closed with respect
to that operation. This example shows unary operations.
42
Closure properties
  • Having a closure property usually means there is
    some type of "natural fit" between the operation
    and the set
  • Examples
  • N is closed under and and but not - and
  • Z is closed under and - and and unary -
    (negation) but not or
  • Q-0 is closed under and but not or -

43
More examples
  • $L_1 = \{x \in \{0,1\}^* \mid |x|$ is a multiple of 3$\}$
  • is closed under string reversal and concatenation
  • $L_3 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a multiple
    of 3$\}$
  • is also closed under string reversal and
    concatenation, harder to see though
  • $L_4 = \{x \in \{a,b\}^* \mid x$ contains an odd number of b's and an
    even number of a's$\}$
  • is closed under string reversal
  • is not closed under string concatenation
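A brute-force check on short strings cannot prove a closure property, but it quickly catches a false claim. This Python sketch makes two assumptions. It reads $L_1$ as the strings whose length is a multiple of 3, and it treats the empty string as the number 0 in $L_3$.

```python
from itertools import product


def strings(max_len):
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)


def in_L1(x):
    return len(x) % 3 == 0                      # reading L1 as "|x| is a multiple of 3"


def in_L3(x):
    return (int(x, 2) if x else 0) % 3 == 0     # assumption: "" denotes 0


def in_L4(x):
    return x.count("b") % 2 == 1 and x.count("a") % 2 == 0


for name, member in (("L1", in_L1), ("L3", in_L3)):
    sample = [x for x in strings(8) if member(x)]
    assert all(member(x[::-1]) for x in sample), name                 # reversal
    assert all(member(x + y) for x in sample for y in sample), name   # concatenation

# L4 is not closed under concatenation: one counterexample suffices.
print(in_L4("b"), in_L4("b" + "b"))   # True False
```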

44
Closure: higher abstraction
  • We will usually be concerned with closure of
    language classes under language operations
  • Previous examples were closure of sets containing
    non-set elements under various familiar
    operations
  • We consider DFAs and NFAs to be programs and we
    want assurance that their outputs can be combined
    in desired ways just by manipulating their
    programs (like using one as a subroutine for the
    other)
  • Representative question: is REG closed under
    (language) concatenation?

45
The regular operations
  • The regular operations on languages are
  • $\cup$ (union)
  • $\circ$ (concatenation)
  • $*$ (Kleene star)
  • The name "regular operations" is not that
    important
  • Too bad we use the word "regular" for so much
  • REG is closed under these regular operations
  • That's why they're called "regular" operations
  • This does not mean that each regular language is
    closed under each of these operations!

46
The regular operations
  • REG is closed under union: Theorem 1.25 (using
    DFAs), Theorem 1.45 (using NFAs)
  • REG is closed under concatenation: Theorem 1.47
    (NFAs)
  • REG is closed under star: Theorem 1.49 (NFAs)
  • Study these constructions!!
  • REG is also closed under complement and reversal
    (not in book)
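Below are sketches of the three constructions as they are usually presented. Union adds a new start state with ε arcs into both machines. Concatenation adds ε arcs from the first machine's accepting states to the second machine's start. Star adds a new accepting start state plus ε arcs back to the old start. The dictionary representation ("" for ε), the tagging of states, and all function names are our own; this is not code from the textbook.

```python
def char(a):
    """NFA accepting exactly the one-character string a."""
    return {(0, a): {1}}, 0, {1}


def tag(nfa, t):
    """Rename every state q to (t, q) so two machines never share a state."""
    delta, start, accepting = nfa
    renamed = {((t, q), c): {(t, p) for p in ps} for (q, c), ps in delta.items()}
    return renamed, (t, start), {(t, q) for q in accepting}


def union(n1, n2):
    """New start state with ε moves ("" below) into both machines."""
    (d1, s1, f1), (d2, s2, f2) = tag(n1, 1), tag(n2, 2)
    return {**d1, **d2, ("new", ""): {s1, s2}}, "new", f1 | f2


def concat(n1, n2):
    """ε moves from every accepting state of n1 to the start of n2."""
    (d1, s1, f1), (d2, s2, f2) = tag(n1, 1), tag(n2, 2)
    delta = {**d1, **d2}
    for q in f1:
        delta[(q, "")] = delta.get((q, ""), set()) | {s2}
    return delta, s1, f2


def star(n):
    """New accepting start state; ε moves from it and from every old accepting
    state back to the old start."""
    d, s, f = tag(n, 1)
    delta = dict(d)
    for q in f | {"new"}:
        delta[(q, "")] = delta.get((q, ""), set()) | {s}
    return delta, "new", f | {"new"}


def accepts(nfa, x):
    """Simulate the NFA by tracking the ε-closed set of possible states."""
    delta, start, accepting = nfa

    def close(states):
        states, stack = set(states), list(states)
        while stack:
            for p in delta.get((stack.pop(), ""), set()):
                if p not in states:
                    states.add(p)
                    stack.append(p)
        return states

    current = close({start})
    for c in x:
        current = close({p for q in current for p in delta.get((q, c), set())})
    return bool(current & accepting)


ten_star = star(concat(char("1"), char("0")))      # (10)*
print([w for w in ["", "10", "1010", "1", "101", "01"] if accepts(ten_star, w)])
```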

47
Regular expressions
  • You are probably familiar with these
  • Example: "int .*\(.*\)" is a (flex format)
    regular expression that appears to match C
    function prototypes that return ints
  • In our treatment, a regular expression is a
    program that generates a language of matching
    strings when you "run it"
  • We will use a very compact definition that
    simplifies things later

48
Regular expressions
  • Definition. Let $\Sigma$ be an alphabet not containing
    any of the special characters in this list: $\varepsilon$, $\emptyset$,
    ), (, $\cup$, $\circ$, $*$. We define the syntax of the
    (programming) language $\mathrm{REX}(\Sigma)$, abbreviated as
    REX, inductively
  • Base cases
  • For all $a \in \Sigma$, $a \in \mathrm{REX}$. In other words, each single
    character from $\Sigma$ is a regular expression all by
    itself.
  • $\varepsilon \in \mathrm{REX}$. In other words, the literal symbol $\varepsilon$ is a
    regular expression. In this context it is not
    the empty string but rather the single-character
    name for the empty string.
  • $\emptyset \in \mathrm{REX}$. Similarly, the literal symbol $\emptyset$ is a
    regular expression.

49
Regular expressions
  • Definition continued
  • Induction cases
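  • For all $r_1, r_2 \in \mathrm{REX}$, $(r_1 \cup r_2) \in \mathrm{REX}$
    also
  • For all $r_1, r_2 \in \mathrm{REX}$, $(r_1 \circ r_2) \in \mathrm{REX}$ also

(Annotations: the parentheses and operator symbols are literal symbols; $r_1$ and $r_2$ are variables)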
50
Regular expressions
  • Definition continued
  • Induction cases continued
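  • For all $r \in \mathrm{REX}$, $(r^*) \in \mathrm{REX}$ also
  • Examples over $\Sigma = \{0,1\}$
  • $\varepsilon$ and 0 and 1 and $\emptyset$
  • $(((1 \circ 0) \circ (\varepsilon^*))^*)$
  • $\varepsilon\varepsilon$ is not a regular expression
  • Remember, in the context of regular expressions,
    $\varepsilon$ and $\emptyset$ are ordinary characters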

51
Semantics of regular expressions
  • Definition. We define the meaning of the
    language REX(?) inductively using the L()
    operator so that L(r) denotes the language
    generated by r as follows
  • Base cases
  • For all $a \in \Sigma$, $L(a) = \{a\}$. A single-character
    regular expression generates the corresponding
    single-character string.
  • $L(\varepsilon) = \{\varepsilon\}$. The symbol for the empty string
    actually generates the empty string.
  • $L(\emptyset) = \emptyset$. The symbol for the empty language
    actually generates the empty language.

52
Regular expressions
  • Definition continued
  • Induction cases
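  • For all $r_1, r_2 \in \mathrm{REX}$, $L((r_1 \cup r_2)) = L(r_1) \cup L(r_2)$
  • For all $r_1, r_2 \in \mathrm{REX}$, $L((r_1 \circ r_2)) = L(r_1) \circ L(r_2)$
  • For all $r \in \mathrm{REX}$, $L((r^*)) = (L(r))^*$
  • No other string is in $\mathrm{REX}(\Sigma)$
  • Example
  • $L((((1 \circ 0) \circ (\varepsilon^*))^*))$ includes
  • $\varepsilon, 10, 1010, 101010, 10101010, \ldots$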
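Because $L(r)$ is usually infinite, a program can only list part of it. This Python sketch encodes regular expressions as nested tuples, a hypothetical encoding that mirrors the inductive syntax. It then computes the strings of $L(r)$ up to a length bound by following the inductive semantics case by case.

```python
def lang(r, n):
    """Strings of L(r) of length <= n. r is a nested tuple:
    ("sym", a), ("eps",), ("empty",), ("union", r1, r2), ("concat", r1, r2), ("star", r1)."""
    kind = r[0]
    if kind == "sym":
        return {r[1]} if n >= 1 else set()
    if kind == "eps":
        return {""}                          # L(ε) = {ε}
    if kind == "empty":
        return set()                         # L(∅) = ∅
    if kind == "union":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "concat":
        left, right = lang(r[1], n), lang(r[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":
        base = lang(r[1], n) - {""}          # ε contributes nothing new to a star
        result, frontier = {""}, {""}
        while frontier:                      # add one more factor per round
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"not a regular expression: {r!r}")


# (((1 ∘ 0) ∘ (ε*))*) from the example
r = ("star", ("concat", ("concat", ("sym", "1"), ("sym", "0")), ("star", ("eps",))))
print(sorted(lang(r, 8), key=len))   # ['', '10', '1010', '101010', '10101010']
```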

53
Orientation
  • We used highly flexible mathematical notation and
    state-transition diagrams to specify DFAs and
    NFAs
  • Now we have a precise programming language REX
    that generates languages
  • REX is designed to close the simplest languages
    under $\cup$, $\circ$, $*$

54
Abbreviations
  • Instead of parentheses, we use precedence to
    indicate grouping when possible.
  • $*$ (highest)
  • $\circ$
  • $\cup$ (lowest)
  • Instead of $\circ$, we just write elements next to
    each other
  • Example: $(((1 \circ 0) \circ (\varepsilon^*))^*)$ can be written as
    $(10(\varepsilon^*))^*$ but there is no further abbreviation
  • (Not in text) If $r \in \mathrm{REX}(\Sigma)$, instead of writing
    $rr^*$, we write $r^+$

55
Abbreviations
  • Instead of writing a union of all characters from
    $\Sigma$ together to mean "any character", we just write
    $\Sigma$
  • In a flex/grep regular expression this would be
    called "."
  • Instead of writing L(r) when r is a regular
    expression, we consider r alone to simultaneously
    mean both the expression r and the language it
    generates, relying on context to disambiguate

56
Abbreviations
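  • Caution: regular expressions are strings
    (programs). They are equal only when they
    contain exactly the same sequence of characters.
  • $(((1 \circ 0) \circ (\varepsilon^*))^*)$ can be abbreviated $(10(\varepsilon^*))^*$
  • however $(((1 \circ 0) \circ (\varepsilon^*))^*) \neq (10(\varepsilon^*))^*$ as strings
  • but $(((1 \circ 0) \circ (\varepsilon^*))^*) = (10(\varepsilon^*))^*$ when they are
    considered to be the generated languages
  • more accurately then, $L((((1 \circ 0) \circ (\varepsilon^*))^*)) = L((10(\varepsilon^*))^*)$
  • $= L((10)^*)$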

57
Facts
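  • $\mathrm{REX}(\Sigma)$ is itself a language over an alphabet $\Gamma$,
    that is,
  • $\Gamma = \Sigma \cup \{\,),\ (,\ \cup,\ \circ,\ *,\ \varepsilon,\ \emptyset\,\}$
  • For every $\Sigma$, $|\mathrm{REX}(\Sigma)| = \infty$
  • $\emptyset, (\emptyset^*), ((\emptyset^*)^*), \ldots$
  • even without knowing $\Sigma$ there are infinitely many
    elements in $\mathrm{REX}(\Sigma)$
  • Question: Can we find a DFA or NFA $M$ with
    $L(M) = \mathrm{REX}(\Sigma)$?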

58
Examples
  • Find a regular expression for $\{w \in \{0,1\}^* \mid w \neq 10\}$
  • Find a regular expression for $\{x \in \{0,1\}^* \mid$ the
    6th digit counting from the rightmost
    character of $x$ is 1$\}$
  • Find a regular expression for $L_3 = \{x \in \{0,1\}^* \mid$ the
    binary number $x$ is a multiple of 3$\}$

59
The DFA for L3
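[Figure: the DFA for $L_3$, with states 0, 1, 2 and arcs labeled 0 and 1; one part of the diagram is marked $(01^*0)$]
Regular expression: $(0 \cup 1$ _____________ $1)^*$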
60
Regular expression for L3
  • $(0 \cup 1(01^*0)^*1)^*$
  • $L_3$ is closed under concatenation, because of the
    overall form $(\cdots)^*$
  • Now suppose $x \in L_3$. Is $x^R \in L_3$?
  • Yes: one way to see this is by reversing the regular
    expression and observing that the same regular
    expression results
  • So L3 is also closed under reversal
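Python's `re` module can check this regular expression against the definition of $L_3$ on all short strings, and can check closure under reversal the same way. In `re` syntax, ∪ becomes `|` and `(?:...)` groups without capturing. The empty string is treated as 0, which is an assumption.

```python
import re
from itertools import product

# (0 ∪ 1(01*0)*1)* written in Python's re syntax
L3_REGEX = re.compile(r"(?:0|1(?:01*0)*1)*")


def in_L3(x):
    return (int(x, 2) if x else 0) % 3 == 0   # assumption: "" denotes 0


for n in range(13):
    for t in product("01", repeat=n):
        x = "".join(t)
        assert (L3_REGEX.fullmatch(x) is not None) == in_L3(x)   # regex generates L3
        assert in_L3(x) == in_L3(x[::-1])                       # closed under reversal
print("regex agrees with L3 on all strings up to length 12")
```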

61
Regular expressions generate regular languages
  • Lemma 1.55 For every regular expression r, L(r)
    is a regular language.
  • Proof by induction on regular expressions.
  • We used induction to create all of the regular
    expressions and then to define their languages,
    so we can use induction to visit each one and
    prove a property about it

62
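$L(\mathrm{REX}) \subseteq \mathrm{REG}$
  • Base cases
  • For every $a \in \Sigma$, $L(a) = \{a\}$ is obviously
    regular
  • $L(\varepsilon) = \{\varepsilon\} \in \mathrm{REG}$ also
  • $L(\emptyset) = \emptyset \in \mathrm{REG}$
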
63
$L(\mathrm{REX}) \subseteq \mathrm{REG}$
  • Induction cases
  • Suppose the induction hypothesis holds for $r_1$ and
    $r_2$. Namely, $L(r_1) \in \mathrm{REG}$ and $L(r_2) \in \mathrm{REG}$. We
    want to show that $L((r_1 \cup r_2)) \in \mathrm{REG}$ also. But
    look: by definition, $L((r_1 \cup r_2)) = L(r_1) \cup L(r_2)$
  • Since both of these languages are regular, we
    can apply Theorem 1.45 (closure of REG under $\cup$)
    to conclude that their union is regular.

64
$L(\mathrm{REX}) \subseteq \mathrm{REG}$
  • Induction cases
  • Now suppose $L(r_1) \in \mathrm{REG}$ and $L(r_2) \in \mathrm{REG}$. By
    definition, $L((r_1 \circ r_2)) = L(r_1) \circ L(r_2)$
  • By Theorem 1.47, this concatenation is regular
    too.
  • Finally, suppose $L(r) \in \mathrm{REG}$. Then by
    definition, $L((r^*)) = (L(r))^*$
  • By Theorem 1.49, this language is also regular.
    QED

65
On to $\mathrm{REG} \subseteq L(\mathrm{REX})$
  • Now we'll show that each regular language (one
    accepted by an automaton) also can be described
    by a regular expression
  • Hence $\mathrm{REG} = L(\mathrm{REX})$
  • In other words, regular expressions are
    equivalent in power to finite automata
  • This equivalence is called Kleene's Theorem (1.54
    in book)

66
Converting DFAs to REX
  • Lemma 1.60 in textbook
  • This approach uses yet another form of finite
    automaton called a GNFA (generalized NFA)
  • The technique is easier to understand by working
    an example than by studying the proof

67
Syntax of GNFA
  • A generalized NFA is a 5-tuple $(Q, \Sigma, \delta, q_s, q_a)$ such
    that
  • $Q$ is a finite set of states
  • $\Sigma$ is an alphabet
  • $\delta: (Q - \{q_a\}) \times (Q - \{q_s\}) \to \mathrm{REX}(\Sigma)$ is the transition
    function
  • $q_s \in Q$ is the start state
  • $q_a \in Q$ is the (one) accepting state

68
GNFA syntax summary
  • Arcs are labeled with regular expressions
  • Meaning is that "input matching the label moves
    from old state to new state" -- just like NFA,
    but not just a single character at a time
  • Start state has no incoming transitions, accept
    has no outgoing
  • Every pair of states (except start and accept) has
    two arcs between them
  • Every state has a self-loop (except start and
    accept)

69
Construction strategy
  • Will convert a DFA into a GNFA, then iteratively
    shrink the GNFA until we end up with a diagram
    like this, meaning that exactly the input
    that matches the giant regular expression is in
    the language

[Figure: two states, $q_s$ and $q_a$, joined by a single arc labeled "giant regular expression"]
70
Converting DFA to GNFA
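[Figure: a DFA with states 0, 1, 2 and arcs labeled 0 and 1, next to the GNFA built from it, which adds a start state $q_s$ and an accepting state $q_a$ attached by ε arcs]
Adding a new start state $q_s$ is straightforward. Then
make each DFA accepting state have an ε
transition to the single accepting state $q_a$.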
71
Interpreting arcs
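  • $\delta: (Q - \{q_a\}) \times (Q - \{q_s\}) \to \mathrm{REX}(\Sigma)$. In this diagram,
  • $\delta(0,1) = 1$, $\delta(2,0) = \emptyset$, $\delta(2,q_a) = \emptyset$
  • $\delta(1,1) = \emptyset$, $\delta(2,2) = 1$, $\delta(0,q_a) = \varepsilon$

[Figure: the GNFA with states $q_s$, 0, 1, 2, $q_a$]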
72
Eliminating a GNFA state
  • We arbitrarily choose an interior state (not qs
    or qa) to rip out of the machine

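Question: how is the ability of state i to get to
state j affected when we remove rip? Only the
solid, labeled states and transitions are
relevant to that question.
[Figure: arc $i \to j$ labeled $R_4$, arc $i \to$ rip labeled $R_1$, self-loop on rip labeled $R_2$, arc rip $\to j$ labeled $R_3$]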
73
Eliminating a GNFA state
  • We produce a new GNFA that omits rip
  • Its i-to-j label will compensate for the missing
    state
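  • We will do this for every $(i,j) \in (Q - \{q_a\}) \times (Q - \{q_s\})$
  • So we have to rewrite every label in order to
    eliminate this one state
  • New label for i-to-j is
  • $R_4 \cup (R_1 (R_2)^* R_3)$

[Figure: arc $i \to j$ labeled $R_4$, arc $i \to$ rip labeled $R_1$, self-loop on rip labeled $R_2$, arc rip $\to j$ labeled $R_3$]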
74
Don't overlook
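  • The case $(i,i) \in (Q - \{q_a\}) \times (Q - \{q_s\})$
  • New label for i-to-i is still
  • $R_4 \cup (R_1 (R_2)^* R_3)$
  • Example proceeds on whiteboard, or see textbook
    for a different one

[Figure: self-loop on $i$ labeled $R_4$, arc $i \to$ rip labeled $R_1$, self-loop on rip labeled $R_2$, arc rip $\to i$ labeled $R_3$]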
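The whole conversion fits in a short Python sketch: DFA to GNFA, then ripping states until only $q_s$ and $q_a$ remain. Labels are Python `re` pattern strings, and `None` plays the role of ∅. The new i-to-j label is $R_4 \cup (R_1 (R_2)^* R_3)$, including the i-to-i case. The names `qs` and `qa` are assumed not to clash with the DFA's state names.

```python
import re
from itertools import product


def union(r, s):
    """Regex union; None stands for the ∅ label (no arc)."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"


def concat(*parts):
    """Regex concatenation; concatenating with ∅ gives ∅."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts)


def star(r):
    """Regex star; ∅* generates only ε, written as the empty pattern."""
    return "" if r is None else f"(?:{r})*"


def dfa_to_regex(states, alphabet, delta, start, accepting):
    """DFA -> GNFA -> rip interior states one at a time -> a single regex (or None)."""
    qs, qa = "qs", "qa"
    label = {(qs, start): ""}                      # ε arc from the new start state
    for q in accepting:
        label[(q, qa)] = ""                        # ε arcs into the single accept state
    for q in states:
        for c in alphabet:
            key = (q, delta[(q, c)])
            label[key] = union(label.get(key), re.escape(c))
    remaining = list(states)
    for rip in list(states):
        remaining.remove(rip)
        loop = star(label.get((rip, rip)))         # (R2)*
        new_label = {}
        for i in [qs] + remaining:
            for j in remaining + [qa]:             # includes the (i, i) case
                r = union(label.get((i, j)),       # R4 ∪ (R1 (R2)* R3)
                          concat(label.get((i, rip)), loop, label.get((rip, j))))
                if r is not None:
                    new_label[(i, j)] = r
        label = new_label
    return label.get((qs, qa))


# The 3-state DFA for binary multiples of 3 (state = value read so far mod 3)
delta3 = {(r, b): (2 * r + int(b)) % 3 for r in range(3) for b in "01"}
regex = dfa_to_regex([0, 1, 2], "01", delta3, 0, {0})
print([x for x in ["", "11", "110", "111", "1001"] if re.fullmatch(regex, x)])
# ['', '11', '110', '1001']
```

The resulting pattern is correct but far longer than the hand-derived $(0 \cup 1(01^*0)^*1)^*$. The order in which states are ripped affects its size.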
75
g/re/p
  • What does grep do?
  • (int|float)_rec.*emp becomes
  • $(\Sigma^*)(\mathrm{int} \cup \mathrm{float})\mathrm{\_rec}(\Sigma^*)\mathrm{emp}(\Sigma^*)$
  • What does it mean?
  • How does it work?
  • Regular expression $\to$ NFA $\to$ DFA $\to$ state reduction
  • Then run DFA against each line of input, printing
    out the lines that it accepts
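Here is a grep-like filter in Python. It illustrates only the matching semantics: `re.search` succeeds exactly when the whole line matches the pattern padded with (Σ*) on both sides. Python's `re` is a backtracking matcher, not the regex → NFA → DFA pipeline described above. The input lines are made up.

```python
import re


def grep(pattern, lines):
    """Keep the lines that contain a match for pattern somewhere in them."""
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


# Made-up input lines, for illustration only
source = ["int_rec employee;", "float_rec.emp = 3.5;", "double_rec emp;"]
pattern = r"(int|float)_rec.*emp"
print(grep(pattern, source))   # ['int_rec employee;', 'float_rec.emp = 3.5;']

# Same result with the implicit (Σ*) padding written out and the whole line matched
padded = re.compile(r".*(?:" + pattern + r").*")
assert [line for line in source if padded.fullmatch(line)] == grep(pattern, source)
```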

76
State machines
  • Very common programming technique
        while (true) {
          switch (state) {
            case NEW_CONNECTION:
              process_login();
              state = RECEIVE_CMD;
              break;
            case RECEIVE_CMD:
              if (process_cmd() == CMD_QUIT)
                state = SHUTDOWN;
              break;
            case SHUTDOWN:
              ...
          }
        }
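The loop above can be made runnable in Python as a sketch. `process_login()` and `process_cmd()` are not defined on the slide, so they are replaced by stubs that read from a scripted list of commands. Treating the end of input as QUIT is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    NEW_CONNECTION = auto()
    RECEIVE_CMD = auto()
    SHUTDOWN = auto()


def run_session(commands):
    """Drive the state loop over a scripted list of commands and return
    the sequence of states visited. Login and command handling are stubs."""
    state = State.NEW_CONNECTION
    pending = iter(commands)
    trace = [state]
    while state is not State.SHUTDOWN:
        if state is State.NEW_CONNECTION:
            # stand-in for process_login()
            state = State.RECEIVE_CMD
        elif state is State.RECEIVE_CMD:
            # stand-in for process_cmd(); running out of input counts as QUIT
            if next(pending, "QUIT") == "QUIT":
                state = State.SHUTDOWN
        trace.append(state)
    return trace


print([s.name for s in run_session(["LIST", "QUIT"])])
# ['NEW_CONNECTION', 'RECEIVE_CMD', 'RECEIVE_CMD', 'SHUTDOWN']
```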

77
This course so far
  • 1.1 Introduction to languages and DFAs
  • 1.2 NFAs and DFAs recognize the same class of
    languages
  • 1.3 REX generates the same class of languages
  • Three different programming "languages" specified
    in different levels of formality that solve the
    same types of computational problems
  • Four, if you count GNFAs
  • Five, if you count UFAs

78
Strategies
  • If you're investigating a property of regular
    languages, then as soon as you know $L \in \mathrm{REG}$, you
    know there are DFAs, NFAs, and regexes that describe
    it. Use whatever representation is convenient
  • But sometimes you're investigating the properties
    of the programs themselves: changing states,
    adding a to a regex, etc. Then the knowledge
    that other representations exist might be
    relevant and might not

79
All finite languages are regular
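  • Theorem (not in book): $\mathrm{FIN} \subseteq \mathrm{REG}$
  • Proof: Suppose $L \in \mathrm{FIN}$.
  • Then either $L = \emptyset$, or $L = \{s_1, s_2, \ldots, s_n\}$ where
    $n \in \mathbb{N}$ and each $s_i \in \Sigma^*$.
  • A regular expression describing $L$ is, therefore,
    either $\emptyset$ or
  • $s_1 \cup s_2 \cup \cdots \cup s_n$. QED
  • Note that this proof does not work for $n = \infty$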
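Here is the proof's construction in Python `re` syntax: join the escaped strings with `|`. Since `re` has no symbol for ∅, the sketch uses `(?!)`, an empty negative lookahead that can never match. The sample language is arbitrary.

```python
import re


def finite_language_regex(strings):
    """Regex (Python re syntax) for a finite language {s1, ..., sn}:
    s1 ∪ s2 ∪ ... ∪ sn, or a never-matching pattern when the language is ∅."""
    if not strings:
        return "(?!)"     # empty negative lookahead: matches nothing, like ∅
    return "|".join(re.escape(s) for s in sorted(strings))


L = {"", "ab", "abc", "0.5"}
pattern = re.compile(finite_language_regex(L))
print([x for x in ["", "ab", "abc", "abcd", "0x5", "0.5"] if pattern.fullmatch(x)])
# ['', 'ab', 'abc', '0.5']
```

Escaping matters: without `re.escape`, the "." in "0.5" would also match "0x5".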

80
Picture so far
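[Figure: Venn diagram in which each point is a language. FIN sits inside REG, which sits inside ALL, with $\mathrm{REG} = L(\mathrm{DFA}) = L(\mathrm{NFA}) = L(\mathrm{REX}) = L(\mathrm{UFA}) = L(\mathrm{GNFA}) \supseteq \mathrm{FIN}$. $L(\mathrm{DFA})$ is annotated "the class of languages generated by DFAs". A point outside REG asks: is there a language out here?]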
81
1.4 Nonregular languages
  • For each possible language $L$,
  • $\emptyset \subseteq L$. So $\emptyset$ is the smallest language. And $\emptyset$ is
    regular
  • $L \subseteq \Sigma^*$. So $\Sigma^*$ is the largest language. And $\Sigma^*$ is
    regular
  • Yet there are languages in between these two
    extremes that are not regular

82
A nonregular language
  • $B = \{0^n 1^n \mid n \ge 0\}$
  • $= \{\varepsilon, 01, 0011, 000111, \ldots\}$
  • is not regular
  • Why?
  • Q: how many bits of memory would a DFA need in
    order to recognize B?
  • A: there appears to be no single number of bits
    that's big enough to work for every element of B
  • Remember, the DFA needs to reject all strings
    that are not in B

83
Other examples
  • $C = \{w \in \{0,1\}^* \mid n_0(w) = n_1(w)\}$
  • Needs to count a potentially unbounded number of
    '0's... so nonregular
  • $D = \{w \in \{0,1\}^* \mid n_{01}(w) = n_{10}(w)\}$
  • Needs to count a potentially unbounded number of
    '01' substrings... so ??
  • Need a technique for establishing nonregularity
    that is more formal and... less intuitive?
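A quick experiment shows why intuition about "counting" is risky here. Every boundary between a run of 0s and a run of 1s is either a 01 or a 10, and the two kinds alternate, so the counts never differ by more than one. This Python sketch checks, for all strings up to length 12, that membership in $D$ depends only on the first and last characters.

```python
from itertools import product


def in_D(w):
    # str.count is exact here: an occurrence of "01" cannot overlap another "01"
    return w.count("01") == w.count("10")


for n in range(13):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert in_D(w) == (w == "" or w[0] == w[-1])
print("on these lengths, D = strings that are empty or start and end with the same symbol")
```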

84
Proving nonregularity
  • To prove that a language is
    nonregular, you have to show that no DFA
    whatsoever recognizes the language
  • Not just the DFA that is your best effort at
    recognizing the language
  • The pumping lemma can be used to do that
  • The pumping lemma says that every regular
    language satisfies the "regular pumping property"
    (RPP)
  • Given this, if we can show that a language like B
    doesn't satisfy the RPP, then it's not regular

85
Pumping lemma, informally
  • Roughly "if a regular language contains any
    'long' strings, then it contains infinitely many
    strings"
  • Start with a regular language and suppose that
    some DFA $M = (Q, \Sigma, \delta, q_0, F)$ for it has $|Q| = 10$ states.
  • What if $M$ accepts some particular string $s$ where
    $s = c_1 c_2 \cdots c_{15}$ so that $|s| = 15$?

86
Pigeonhole principle
  • With 15 input characters, the machine will visit
    at most 16 states
  • But there are only 10 states in this machine
  • So clearly it will visit at least one of its
    states more than once
  • Let rpt be our name for the first state that is
    visited multiple times on that particular input s
  • Let acc be our name for the accepting state that
    $s$ leads to, namely, $\hat\delta(q_0, s) = acc$
  • Let $y$ be our name for the leftmost substring of $s$
    for which $\hat\delta(rpt, y) = rpt$
  • Since there are no ε transitions in a DFA, a
    state being "visited multiple times" means that
    it read at least one character. Therefore, $|y| > 0$

87
[Figure: the sequence of states that M visits after reading the characters of $s$ shown below it, with one stretch marked as having length > 0 and the first 10 characters marked]
After reading $c_1 \cdots c_{10}$ (the first 10 characters of $s$), $M$
must have already been to state rpt and returned
to it at least once, because there are only 10
states in $M$. Of course, the repetition could have
been encountered earlier than 10 characters too.
88
[Figure: the same sequence of states, with one stretch marked as having length > 0 and the first 10 characters marked]
Assigning new names to the pieces of $s$...
89
[Figure: the same sequence of states, with $s$ split into pieces $x$, $y$, $z$]
Assigning new names to the pieces of $s$... So $s = xyz$
as shown above. With these names, the other
constraints can be written $|y| > 0$ and $|xy| \le 10$.
90
M accepts other strings too
  • Consider the string xz

91
M accepts other strings too
  • Consider the string xz
  • $\hat\delta(q_0, x) = rpt$
  • $\hat\delta(rpt, z) = acc$ (from previous slide)
  • So $xz \in L(M)$ too

92
M accepts other strings too
  • Consider the string xyyz
  • $\hat\delta(q_0, xy) = rpt$ (from 2 slides ago)
  • $\hat\delta(rpt, y) = rpt$ (from same previous result)
  • $\hat\delta(rpt, z) = acc$ (from same previous result)
  • So $xyyz \in L(M)$ also
  • Apparently we can repeat y as many times as we
    want
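The pigeonhole argument is constructive: run the DFA, record when each state is first reached, and stop at the first repeat. Here is a Python sketch that assumes $\delta$ is a dictionary. The three-state DFA for binary multiples of 3 stands in for the 10-state machine of the example.

```python
def pump_split(delta, q0, s):
    """Split s = xyz at the first repeated state on the DFA's run over s:
    x leads from q0 to rpt, y leads from rpt back to rpt, z is the rest.
    Returns None if no state repeats (then |s| < number of states)."""
    first_visit = {q0: 0}            # state -> number of characters read on arrival
    q = q0
    for k, c in enumerate(s, start=1):
        q = delta[(q, c)]
        if q in first_visit:         # rpt: second visit after reading k characters
            i = first_visit[q]
            return s[:i], s[i:k], s[k:]
        first_visit[q] = k
    return None


def accepts(delta, q0, accepting, x):
    q = q0
    for c in x:
        q = delta[(q, c)]
    return q in accepting


# Example: the 3-state DFA for binary multiples of 3 (state = value mod 3)
delta3 = {(r, b): (2 * r + int(b)) % 3 for r in range(3) for b in "01"}
x, y, z = pump_split(delta3, 0, "110")     # "110" is 6 in binary
print(repr(x), repr(y), repr(z))           # '' '11' '0'
print([accepts(delta3, 0, {0}, x + y * i + z) for i in range(4)])  # [True, True, True, True]
```

Because the repeat is found within the first |Q| characters, the split always satisfies $|xy| \le |Q|$ and $|y| > 0$, matching the constraints derived above.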

93
p-regular-pumpable strings
  • Definition (not in textbook): A string $s$ is said
    to be p-regular-pumpable in a language $L \subseteq \Sigma^*$ if
    there exist $x, y, z \in \Sigma^*$ such that
  • $s = xyz$ ("x,y,z are a decomposition of s")
  • $|y| > 0$
  • $|xy| \le p$
  • For all $i \ge 0$,
  • $x y^i z \in L$ ("the y part of s can be pumped
    to produce other strings in the language")
  • It follows that s must be a member of L for it to
    be p-pumpable
  • The 15-character string s in the previous example
    was 10-pumpable in L(M)

94
p-regular-pumpable languages
  • Definition A language L is p-regular-pumpable if
  • for every $s \in L$ such that $|s| \ge p$, the string $s$
    is p-pumpable in $L$
  • in other words, "every long enough string in L is
    pumpable"
  • Our previous example language was
    15-regular-pumpable

95
RPP(p) and RPP
  • Definition: RPP(p) is the class of languages that
    are p-regular-pumpable. In other words,
    $\mathrm{RPP}(p) = \{L \subseteq \Sigma^* \mid L$ is p-regular-pumpable$\}$
  • Definition: RPP is the class of languages that are
    p-regular-pumpable for some p. In other
    words, $\mathrm{RPP} = \bigcup_{p \ge 0} \mathrm{RPP}(p)$
  • Lots of notation and apparent complexity, but the
    idea is simple: RPP is the class of languages in
    which every long string is pumpable

96
Pumping lemma
  • Theorem 1.70 (rephrased): If $L \subseteq \Sigma^*$ is recognized
    by a p-state DFA, then $L \in \mathrm{RPP}(p)$
  • Proof: Just like our example, but use p instead of
    the constant 15 (number of states)
  • Corollaries
  • $\mathrm{REG} \subseteq \mathrm{RPP}$
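The pigeonhole step of the proof can be sketched in Python. Assumptions: the DFA is complete and given as a dict from (state, character) to state, so p is the number of distinct states in its keys; the function name and the 3-state example machine (counting 1s mod 3) are hypothetical.

```python
# Hypothetical DFA with p = 3 states: counts 1s modulo 3 (state 0 = start, accepting).
DELTA = {(q, c): (q + (c == "1")) % 3 for q in range(3) for c in "01"}


def pumping_decomposition(delta, start, s):
    """Split s = xyz following the pigeonhole step of the pumping lemma proof.

    delta maps (state, char) -> state for a complete DFA with p states.
    Reading the first p characters of s visits p + 1 states, so some state
    repeats; the characters read between the two visits form y.
    """
    p = len({q for (q, _) in delta})            # number of states
    if len(s) < p:
        raise ValueError("the lemma only speaks about strings with |s| >= p")
    first_visit = {start: 0}                    # state -> position of its first visit
    q = start
    for pos, c in enumerate(s[:p], start=1):
        q = delta[(q, c)]
        if q in first_visit:                    # first repeated state found
            i = first_visit[q]
            return s[:i], s[i:pos], s[pos:]     # |xy| = pos <= p and |y| >= 1
        first_visit[q] = pos
    raise AssertionError("unreachable: p + 1 visits to p states must repeat one")


x, y, z = pumping_decomposition(DELTA, 0, "111111")
print((x, y, z))   # ('', '111', '111')
```

Because the split comes from a repeated state, the loop argument from the earlier slides applies to it directly.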

Primary application of Pumping Lemma
97
Proving a language nonregular
  • First unravel these definitions, but it amounts
    to proving that L is not a member of RPP. Then
    it follows that L isn't regular
  • Proving that L isn't in RPP allows you to
    concentrate on the language rather than
    considering all possible proposed programs that
    might recognize it

98
Unraveling RPP a direct rephrasing
  • Rephrasing: L is a member of RPP if
  • There exists $p \ge 0$ such that
  • For every $s \in L$ satisfying $|s| \ge p$,
  • There exist $x, y, z \in \Sigma^*$ such that
  • (1) $s = xyz$
  • (2) $|y| > 0$
  • (3) $|xy| \le p$
  • (4) For all $i \ge 0$,
  • $xy^iz \in L$

$(\exists p)\,(\forall s)\,(\exists x,y,z)\,(\forall i)$ !!! Pretty complicated
99
Question from last time
  • (Question) Didn't you earlier say "regular
    languages are closed under concatenation"?
  • (Answer) No, I wrote that REG is closed under
    concatenation
  • Subtle but important distinction. REG (the class
    of all regular languages) is closed under
    language concatenation
  • If $A, B \in \mathrm{REG}$ then $AB \in \mathrm{REG}$
  • That does not mean that each regular language is
    itself closed under string concatenation:
  • $\{10, 1\} \in \mathrm{REG}$ but $10 \cdot 1 = 101 \notin \{10, 1\}$
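The distinction can be seen concretely with finite languages represented as Python sets (a small sketch; `concat` is our own helper name):

```python
def concat(A, B):
    """Language concatenation AB = {ab | a in A, b in B}, for finite languages A and B."""
    return {a + b for a in A for b in B}


L = {"10", "1"}
LL = concat(L, L)          # {"1010", "101", "110", "11"}: finite, hence regular
print("101" in LL)         # True: 10 . 1 = 101 is in LL ...
print("101" in L)          # False: ... but not in L itself
```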

100
Nonregularity proof by contradiction
  • Claim: Let $B = \{0^n 1^n \mid n \ge 0\}$. Then B is not
    regular
  • Proof: We show that B is not a member of RPP by
    contradiction.
  • So assume that $B \in \mathrm{RPP}$ (and hope to reach a
    contradiction soon). Then there exists $p \ge 0$
    associated with the definition in RPP.
  • We let $s = 0^p 1^p$. (Not the exact same variable
    as in the RPP property, but an example of one
    such possible setting of it.) Now we know that
    $s \in B$ because it has the right form.

101
Proof continued
  • Now $|s| = 2p \ge p$. By the assumption that $B \in \mathrm{RPP}$,
    there exist x, y, z such that
  • (1) $s = xyz$ ($= 0^p 1^p$, remember)
  • (2) $|y| > 0$
  • (3) $|xy| \le p$
  • (4) For all $i \ge 0$,
  • $xy^iz \in B$
  • Part (3) implies that $xy \in 0^*$ because the first
    p-many characters of $s = xyz$ are all 0
  • So y consists solely of '0' characters
  • ... at least one of them, according to (2)

102
Proof continued
  • But consider
  • $s = xyz = xy^1z = 0^p 1^p$ (where we started)
  • y consists of one or more '0' characters
  • so $xy^2z$ contains more '0' characters than '1'
    characters. In other words,
  • $xy^2z = 0^{p+|y|} 1^p$
  • so $xy^2z \notin B = \{0^n 1^n \mid n \ge 0\}$.
  • This contradicts part (4)!!
  • Since the contradiction followed merely from the
    assumption that $B \in \mathrm{RPP}$ (and right and meet and
    true reasoning about which we have no doubt),
    that assumption must be wrong. QED
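A brute-force sanity check of this argument for small values of p can be written in Python. It is only a sketch (the helper names are ours), and checking finitely many p is no substitute for the proof, which covers every p.

```python
def in_B(w):
    """Membership test for B = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def every_split_fails_at_i2(p):
    """For s = 0^p 1^p, check that every split allowed by (1)-(3)
    (|y| > 0, |xy| <= p) has x y^2 z outside B, as the proof argues."""
    s = "0" * p + "1" * p
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            assert set(y) == {"0"}          # part (3) forces y into 0*
            if in_B(x + y * 2 + z):
                return False
    return True


print(all(every_split_fails_at_i2(p) for p in range(30)))   # True
```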

103
Observations
  • We needed (and got) a contradiction that was a
    necessary consequence of the assumption that
    $B \in \mathrm{RPP}$, and then relied on the Theorem 1.70
    corollaries
  • RPP mainly concerns strings of length at least
    p
  • So you should concentrate on strings of length at
    least p...
  • even though p is a variable. But clearly
    $|0^p 1^p| = 2p \ge p$
  • In our example we didn't "do" much after our
    initial choice of s: just by thinking about its
    implications, we found a contradiction right away
  • Many other choices of s would work, but many
    don't, and even some that do work require more
    complex arguments, for example,
    $s = 0^{\lfloor p/2 \rfloor + 1} 1^{\lfloor p/2 \rfloor + 1}$
  • Choosing s wisely is usually the most important
    thing
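Why does $s = 0^{\lfloor p/2 \rfloor + 1} 1^{\lfloor p/2 \rfloor + 1}$ need a longer argument? Its length is at most $p + 2$, so for $p \ge 3$ the condition $|xy| \le p$ no longer keeps y inside the block of 0s. The sketch below (our own helper name) lists the shapes y can take; each shape would need its own case in the proof.

```python
def classify_splits(p: int) -> set:
    """Shapes of y over all legal splits (|y| > 0, |xy| <= p)
    of s = 0^(p//2 + 1) 1^(p//2 + 1)."""
    k = p // 2 + 1
    s = "0" * k + "1" * k
    shapes = set()
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            y = s[x_len:xy_len]
            if set(y) == {"0"}:
                shapes.add("only 0s")
            elif set(y) == {"1"}:
                shapes.add("only 1s")
            else:
                shapes.add("0s then 1s")
    return shapes


print(sorted(classify_splits(2)))   # ['only 0s']
print(sorted(classify_splits(4)))   # ['0s then 1s', 'only 0s', 'only 1s']
```

With $s = 0^p 1^p$, by contrast, only the first shape can occur, which is why that proof needed a single case.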

104
Picture so far
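[Venn diagram; each point is a language. Nested regions $\mathrm{FIN} \subseteq \mathrm{REG} \subseteq \mathrm{RPP} \subseteq \mathrm{ALL}$, with examples $\{0101, \varepsilon\} \in \mathrm{FIN}$, $0(101)^* \in \mathrm{REG}$, and $B = \{0^n 1^n \mid n \ge 0\}$ outside RPP. The region $\mathrm{RPP} - \mathrm{REG}$ is marked "We'll see an example later".]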
105
More on contradictions
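  • Consider this shortcut attempt to prove that
    $B = \{0^n 1^n \mid n \ge 0\}$ is not regular
  • Proof: Suppose $B \in \mathrm{RPP}$. By RPP,
  • There exists $p \ge 0$ such that
  • For every $s \in B$ satisfying $|s| \ge p$,
  • There exist $x, y, z \in \Sigma^*$ such that
  • (1) $s = xyz$
  • (2) $|y| > 0$
  • (3) $|xy| \le p$
  • (4) For all $i \ge 0$,
  • $xy^iz \in B$
  • So let $s = (1010)^p$. Then $s \notin B$, which is
    inconsistent with the RPP statement.
    Contradiction??

NO: the RPP statement only makes claims about strings $s \in B$ with $|s| \ge p$, so exhibiting some $s \notin B$ contradicts nothing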
106
Simplifying RPP proofs
  • I find it easier to forget about contradiction
    proofs and instead prove directly that a language
    is not in RPP
  • So we need a direct, formal version of the
    statement that $L \notin \mathrm{RPP}$

107
Unraveling RPP (repeat)
  • Rephrasing: L is a member of RPP if
  • There exists $p \ge 0$ such that
  • For every $s \in L$ satisfying $|s| \ge p$,
  • There exist $x, y, z \in \Sigma^*$ such that
  • (1) $s = xyz$
  • (2) $|y| > 0$
  • (3) $|xy| \le p$
  • (4) For all $i \ge 0$,
  • $xy^iz \in L$

$(\exists p)\,(\forall s)\,(\exists x,y,z)\,(\forall i)$ !!! Pretty complicated
108
Unraveling non-RPP
  • Rephrasing: L is not in RPP if
  • For every $p \ge 0$
  • There exists some $s \in L$ satisfying $|s| \ge p$ such
    that
  • For every $x, y, z \in \Sigma^*$ satisfying (1)-(3)
  • (1) $s = xyz$,
  • (2) $|y| > 0$, and
  • (3) $|xy| \le p$
  • There exists some $i \ge 0$ for which
  • $xy^iz \notin L$

$(\forall p)\,(\exists s)\,(\forall x,y,z)\,(\exists i)$ Still complicated,
but you don't have to use contradiction now
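For reference, the two rephrasings can be written as single formulas in standard logic notation. The second comes from the first by pushing the negation through each quantifier, which turns conditions (1)-(3) into the hypothesis of an implication:

```latex
L \in \mathrm{RPP} \iff (\exists p \ge 0)\,(\forall s \in L,\ |s| \ge p)\,(\exists x, y, z \in \Sigma^*)\,
  \big[\, s = xyz \wedge |y| > 0 \wedge |xy| \le p \wedge (\forall i \ge 0)\ xy^i z \in L \,\big]

L \notin \mathrm{RPP} \iff (\forall p \ge 0)\,(\exists s \in L,\ |s| \ge p)\,(\forall x, y, z \in \Sigma^*)\,
  \big[\, (s = xyz \wedge |y| > 0 \wedge |xy| \le p) \Rightarrow (\exists i \ge 0)\ xy^i z \notin L \,\big]
```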
109
A direct proof of nonregularity
  • Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$ ('a' is just
    some character). Then D is not regular.
  • Proof idea: The pumping lemma says there's a
    fixed-size loop in any DFA that accepts long
    strings. You can repeat the characters in that
    loop as many times as you want to get longer
    strings that the machine accepts. Each time you
    add a repetition you grow the pumped string by a
    constant length.
  • But the spacing between strings in D above keeps
    changing; it's never constant. So D doesn't have
    the pumping property.

110
A direct proof of nonregularity
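  • Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$. Then D is
    not in RPP and thus not regular.
  • Proof: Let $p \ge 0$ and set $s = a^{(p+1)^2}$. Then $s \in D$ and
    $|s| > p$ (so such an s certainly exists).
  • Now let $x, y, z \in \Sigma^*$ be any strings satisfying
  • (1) $xyz = s = a^{(p+1)^2}$,
  • (2) $|y| > 0$, and
  • (3) $|xy| \le p$
  • Our goal is to produce some i such that $xy^iz \notin D$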

111
Direct proof continued
  • (We'll actually show that $xy^0z \notin D$)
  • Observe that $y = a^j$ for some $1 \le j \le p$, so
  • $xy^0z = a^{(p+1)^2 - j}$ and $|xy^0z| = (p+1)^2 - j < (p+1)^2$
  • Since $j \le p$ we know that $-j \ge -p$ and thus
  • $|xy^0z| = (p+1)^2 - j$
  • $\ge (p+1)^2 - p$
  • $= p^2 + p + 1$
  • $> p^2$
  • In other words, $xy^0z$ has more than $p^2$ characters and fewer than
    $(p+1)^2$ characters. So $|xy^0z|$ is not a perfect
    square and thus $xy^0z \notin D$. QED
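The inequality chain can be spot-checked numerically. A minimal sketch (our own helper names; `math.isqrt` needs Python 3.8+) that tests the bounds for every allowed j, for many values of p:

```python
from math import isqrt


def is_square(k: int) -> bool:
    return isqrt(k) ** 2 == k


def pumping_down_never_square(p: int) -> bool:
    """For one p, check the slide's bounds: deleting y = a^j (1 <= j <= p)
    from a^((p+1)^2) leaves a length strictly between p^2 and (p+1)^2."""
    for j in range(1, p + 1):
        length = (p + 1) ** 2 - j
        if not (p * p < length < (p + 1) ** 2) or is_square(length):
            return False
    return True


print(all(pumping_down_never_square(p) for p in range(200)))   # True
```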

112
Direct or contradiction proof?
  • Both work fine... it's your choice
  • But you must clearly state what you are doing
  • If proof by contradiction, say so
  • If direct proof, say so

113
Game theory formulation
  • The direct proof technique can be formulated as a
    two-player game
  • You are the player who wants to establish that L
    is not pumpable
  • Your opponent wants to make it difficult for you
    to succeed
  • Both of you have to play by the rules

114
Game theory continued
  • The game has just four steps.
  • (1) Your opponent picks $p \ge 0$
  • (2) You pick $s \in L$ such that $|s| \ge p$
  • (3) Your opponent chooses $x, y, z \in \Sigma^*$ such that $s = xyz$,
    $|y| > 0$, and $|xy| \le p$
  • (4) You produce some $i \ge 0$ such that $xy^iz \notin L$
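The four steps map naturally onto code. In this sketch (function names hypothetical), steps 2 and 4 are a fixed strategy for the language D from the direct proof, step 3 tries every split the opponent is allowed to make, and step 1 is the loop over p. Only finitely many p can be tried, so this illustrates the game rather than proving anything.

```python
from math import isqrt


def in_D(w: str) -> bool:
    """D = {a^(n^2) | n >= 0}."""
    return set(w) <= {"a"} and isqrt(len(w)) ** 2 == len(w)


def my_string(p: int) -> str:                      # step 2: may depend only on p
    return "a" * (p + 1) ** 2


def my_exponent(x: str, y: str, z: str) -> int:    # step 4: may depend on x, y, z
    return 0


def i_always_win(p: int) -> bool:
    """Play the fixed strategy against every legal opponent move in step 3."""
    s = my_string(p)
    assert in_D(s) and len(s) >= p          # our step-2 move is legal
    for xy_len in range(1, p + 1):          # opponent: |xy| <= p
        for x_len in range(xy_len):         # opponent: |y| > 0
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if in_D(x + y * my_exponent(x, y, z) + z):
                return False                # the opponent beat us with this split
    return True


print(all(i_always_win(p) for p in range(40)))   # True
```

Note that `my_string` sees only p and `my_exponent` sees only the opponent's split, mirroring which information each step is allowed to use.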

115
Game theory continued
  • If you are able to succeed through step 4, then
    you have won only one round of the game
  • Like winning one round of Tic-tac-toe
  • Do example for a member of D
  • To show that a language is not in RPP you must
    show that you can always win, regardless of your
    opponent's legal moves
  • Realize that the opponent is free to choose the
    most inconvenient or difficult p and x,y,z
    imaginable that are consistent with the rules

116
Game theory continued
  • So you have to present a strategy for always
    winning and convincingly argue that it will
    always win
  • So your choices in steps 2 and 4 have to depend on
    the opponent's choices in steps 1 and 3
  • And you don't know what the opponent will choose
  • So your choices need to be framed in terms of the
    variables p, x, y, z

117
Game theory continued
  • Ultimately it is not very different from the
    direct proof
  • But it states clearly what choices you may make
    and what you may not (a common cause of errors
    in proofs)
  • Repeat previous proof in this framework

118
A direct proof of nonregularity
Step 1, opponent's choice
Step 2, your choice and reasoning
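  • Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$. Then D is
    not in RPP and thus not regular.
  • Proof: Let $p \ge 0$ and set $s = a^{(p+1)^2}$. Then $s \in D$ and
    $|s| > p$ (so such an s certainly exists).
  • Now let $x, y, z \in \Sigma^*$ be any strings satisfying
  • (1) $xyz = s = a^{(p+1)^2}$,
  • (2) $|y| > 0$, and
  • (3) $|xy| \le p$
  • Our goal is to produce some i such that $xy^iz \notin D$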

Step 3, opponent's choice
119
Direct proof continued
  • (We'll actually show that $xy^0z \notin D$)
  • Observe that $y = a^j$ for some $1 \le j \le p$, so
  • $xy^0z = a^{(p+1)^2 - j}$ and $|xy^0z| = (p+1)^2 - j < (p+1)^2$
  • Since $j \le p$ we know that $-j \ge -p$ and thus
  • $|xy^0z| = (p+1)^2 - j$
  • $\ge (p+1)^2 - p$
  • $= p^2 + p + 1$
  • $> p^2$
  • In other words, $xy^0z$ has more than $p^2$ characters and fewer than
    $(p+1)^2$ characters. So $|xy^0z|$ is not a perfect
    square and thus $xy^0z \notin D$. QED

Step 4, your choice
Step 4, your reasoning
120
Unraveling RPP (repeat)
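  • Rephrasing: L is a member of RPP if
  • There exists $p \ge 0$ such that
  • For every $s \in L$ satisfying $|s| \ge p$,
  • There exist $x, y, z \in \Sigma^*$ such that
  • (1) $s = xyz$
  • (2) $|y| > 0$
  • (3) $|xy| \le p$
  • (4) For all $i \ge 0$,
  • $xy^iz \in L$
  • Theorem: $\mathrm{REG} \subseteq \mathrm{RPP}$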

121
Structural facts about RPP
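  • (1) If $L \in \mathrm{RPP}(p)$ (meaning "strings in L with length
    at least p are pumpable") and $q > p$, then $L \in \mathrm{RPP}(q)$
  • (2) If $L \notin \mathrm{RPP}(q)$ and $q > p$, then $L \notin \mathrm{RPP}(p)$
    (contrapositive of 1)
  • Thus if you have a proof that establishes
    $L \notin \mathrm{RPP}(q)$ only when $q \ge 5$, that's good enough:
    by (2), L is not in RPP(p) for any smaller p either, so L is not
    in RPP and therefore not regular
  • Relevant for "C is not regular" problem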

122
Structural facts about RPP
  • If $L \in \mathrm{FIN}$ and the longest string in L has length
    n, then
  • $L \in \mathrm{RPP}(n+1)$
  • $L \notin \mathrm{RPP}(q)$ for all $q < n+1$
  • Note: RPP is a class of languages that's only
    interesting because of its relation to REG. It
    is not a reasonable proposal for a computation
    model!
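The finite-language fact can be checked mechanically. Below is a hedged sketch (our own helper `in_rpp_p`, not from the notes) that decides membership in RPP(p) for a finite language given as a Python set. For finite L the test is exact rather than approximate: $xy^{n+1}z$ contains $n+1$ copies of the nonempty y, so it is longer than every string in L.

```python
def in_rpp_p(L: set, p: int) -> bool:
    """Exact test of "L is in RPP(p)" for a *finite* language L (a set of strings).

    Testing i = 0 .. n + 1 is exact here, where n is the longest length in L:
    x y^(n+1) z contains n + 1 copies of the nonempty y, so it is longer
    than every string of L and always fails.
    """
    n = max(map(len, L), default=-1)

    def pumpable(s: str) -> bool:
        for xy_len in range(1, min(p, len(s)) + 1):      # |xy| <= p
            for x_len in range(xy_len):                  # |y| > 0
                x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
                if all(x + y * i + z in L for i in range(n + 2)):
                    return True
        return False

    return all(pumpable(s) for s in L if len(s) >= p)


L = {"", "0", "01", "110"}                       # longest string has length n = 3
print(in_rpp_p(L, 4))                            # True  (nothing of length >= 4 to pump)
print([in_rpp_p(L, q) for q in range(4)])        # [False, False, False, False]
```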

123
Unraveling non-RPP (repeat)
  • L is not in RPP if
  • For every $p \ge 0$ (opponent's choice)
  • There exists some $s \in L$ satisfying $|s| \ge p$ such
    that (your choice)
  • For every $x, y, z \in \Sigma^*$ satisfying (1)-(3) (opponent's choice)
  • (1) $s = xyz$,
  • (2) $|y| > 0$, and
  • (3) $|xy| \le p$
  • There exists some $i \ge 0$ (your choice) for which
  • $xy^iz \notin L$

124
Another example
  • Let $C = \{0^m 1^n \mid m \ne n\}$. Is C regular? Try to
    prove it isn't
  • Set $s = 0^p 1^{2p}$. If the opponent chooses $x = \varepsilon$, $y = 0^p$,
    $z = 1^{2p}$, then we can set $i = 2$ and win because
    $xy^2z = 0^{2p} 1^{2p} \notin C$.
  • What if the opponent chooses a shorter y?
  • Looks like it's relatively easy to be a member of
    C and hard to not be a member of C
  • Can force the opponent to choose $y \in 0^*$
  • So try to arrange it so that no matter what y
    is, some number of repetitions of it will match
    the target number of '1's
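One standard way to realize this idea, which is our addition rather than something taken from these slides, is $s = 0^p 1^{p + p!}$: every $j = |y|$ with $1 \le j \le p$ divides $p!$, so pumping $y = 0^j$ up to $i = 1 + p!/j$ copies yields exactly $p + p!$ zeros, matching the 1s. The sketch below only checks the counts for small p; the helper names are hypothetical.

```python
from math import factorial


def pump_to_equal(p: int, j: int) -> int:
    """Exponent i making x y^i z = 0^(p + p!) 1^(p + p!) when y = 0^j."""
    assert 1 <= j <= p
    return 1 + factorial(p) // j            # exact, since j divides p! for j <= p


def check(p: int) -> bool:
    zeros, ones = p, p + factorial(p)       # s = 0^p 1^(p + p!)
    for j in range(1, p + 1):               # every |y| the opponent may choose
        i = pump_to_equal(p, j)
        # pumping adds (i - 1) * j zeros and leaves the 1s alone
        if zeros + (i - 1) * j != ones:
            return False
    return True


print(all(check(p) for p in range(1, 12)))  # True
```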

125
Direct proof?
  • Hmmm

126
Using closure properties
  • Can simplify argument a great deal
  • Fact: If L is not regular then $L^c$ is not regular
    either.
  • Proof: If L is not regular but $L^c$ were regular,
    then $(L^c)^c$ would also be regular because REG is
    closed under complement. But $(L^c)^c = L$. QED
  • Recall the languages $B = \{0^m 1^n \mid m = n\}$ and
    $C = \{0^m 1^n \mid m \ne n\}$. C is similar to $B^c$...

127
Using closure properties
  • Start over
  • $B = \{0^m 1^n \mid m = n\}$ (known nonreg),
    $C = \{0^m 1^n \mid m \ne n\}$ (suspected nonreg)
  • Certainly $B \subseteq C^c$
  • If $m = n$ then it's true that (not $m \ne n$)
  • But $B \ne C^c$
  • Find example $x \in C^c - B$...
  • On the other hand, $B = 0^*1^* \cap C^c$

128
Using closure properties
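  • Fact: If $L_1 \cap L_2 \notin \mathrm{REG}$ and $L_1 \in \mathrm{REG}$, then $L_2 \notin \mathrm{REG}$
  • Proof: Suppose (a) $L_1 \cap L_2 \notin \mathrm{REG}$ and $L_1 \in \mathrm{REG}$, and
    (b) $L_2 \in \mathrm{REG}$. Since REG is closed under $\cap$ we know
    that $L_1 \cap L_2 \in \mathrm{REG}$, but that contradicts assumption
    (a). Thus (a) and (b) can't both be true. QED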

129
Topics for Exam 1
  • Basic objects
  • The main hierarchy: alphabets, strings,
    languages, classes
  • Functions
  • Relations
  • Sets and operations on sets
  • $\cup$, $\cap$, complement, $\times$, $\mathcal{P}(S)$, $A - B$, $|S|$
  • $\subseteq$, $\in$
  • $\{\text{element} \mid \text{predicate(element)}\}$
  • Propositional and predicate logic
  • $\forall$ and $\exists$

130
Topics for Exam 1
  • Strings
  • ? versus
  • Operations on strings concatenation,
    exponentiation, reversal
  • Languages
  • Operations: concatenation, exponentiation,
    reversal, $\cup$, $\cap$, star, complement, everything
    applicable to sets, ? versus
  • Language classes
  • FIN, REG, ALL

131
Topics for Exam 1
  • REG and its many formulations
  • DFA, NFA, GNFA, UFA, REX
  • Syntax and semantics of each model
  • $L(\cdot)$ as program-to-language operator
  • Conversions between models
  • Subset construction for NFA, UFA
  • DFA $\to$ GNFA $\to$ REX
  • REX $\to$ NFA

132
Topics for Exam 1
  • Closure properties of language classes
  • REG as a reasonable model of computation
  • Arguments for, against
  • Homework problems through homework 3
  • Lectures and reading up through section 1.3
    (excluding nonregularity)

133
Exam 1
  • You may bring and consult a single-sided,
    handwritten sheet of notes, which you must turn
    in with the exam (and will get back later)

134
Applying these closure properties
  • $B = \{0^m 1^n \mid m = n\}$, $C = \{0^m 1^n \mid m \ne n\}$
  • $0^*1^* \cap C^c = B$
  • Thus C is nonregular too

obviously regular
known to be nonregular
therefore nonregular
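The identity $0^*1^* \cap C^c = B$ can be spot-checked on all short binary strings, and the same search finds an example of a string in $C^c - B$ (the question from slide 127). This is a bounded sanity check, not a proof; the helper names are ours.

```python
import re
from itertools import product


def split_01(w):
    """Return (m, n) if w = 0^m 1^n, else None."""
    match = re.fullmatch(r"(0*)(1*)", w)
    return None if match is None else (len(match.group(1)), len(match.group(2)))


def in_C(w):            # C = {0^m 1^n | m != n}
    mn = split_01(w)
    return mn is not None and mn[0] != mn[1]


def in_B(w):            # B = {0^m 1^n | m = n}
    mn = split_01(w)
    return mn is not None and mn[0] == mn[1]


# All binary strings of length <= 8, shortest first.
words = ["".join(t) for n in range(9) for t in product("01", repeat=n)]

# On every tested string, 0*1* intersected with the complement of C is exactly B ...
assert all((split_01(w) is not None and not in_C(w)) == in_B(w) for w in words)
# ... while the complement of C alone is strictly larger than B:
print(next(w for w in words if not in_C(w) and not in_B(w)))   # 10
```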
135
Another closure properties attempt
  • $B = \{0^m 1^n \mid m = n\} = \{0^n 1^n \mid n \ge 0\}$
    (known nonreg)
  • $BB = \{0^n 1^n 0^m 1^m \mid n, m \ge 0\}$
  • Want to show that $BB \notin \mathrm{REG}$
  • We know that REG is closed under language
    concatenation. What does that say about whether
    BB is regular or not?
  • Is the class of non-regular languages ($\mathrm{REG}^c$)
    closed under language concatenation too?

136
No
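  • Let $\Sigma = \{a\}$ and $D = \{a^{n^2} \mid n \ge 2\}$
  • Then $D^c = \{a^k \mid k \le 1 \text{ or } k \text{ is not a square}\}$
    $= \{\varepsilon, a^1, a^2, a^3, a^5, a^6, a^7, a^8, a^{10}, \ldots\}$
  • We previously proved that $D \notin \mathrm{REG}$
  • Thus $D^c \notin \mathrm{REG}$ (by the "fact" we proved)
  • But $D^c D^c = a^* \in \mathrm{REG}$ !!!
  • Thus $\mathrm{REG}^c$ is not closed under language
    concatenation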
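Why is $D^c D^c = a^*$? Since $\varepsilon \in D^c$, every $a^k$ with k not a square is already in $D^c D^c$; and for $k = n^2$ with $n \ge 2$, $a^k = a^1 \cdot a^{n^2 - 1}$, where $n^2 - 1$ lies strictly between $(n-1)^2$ and $n^2$. A small bounded check of this (our own helper names, lengths below 500 only):

```python
from math import isqrt


def in_Dc(k: int) -> bool:
    """a^k is in the complement of D = {a^(n^2) | n >= 2}: k <= 1 or k is not a square."""
    return k <= 1 or isqrt(k) ** 2 != k


def in_Dc_Dc(k: int) -> bool:
    """a^k is in D^c D^c iff k = k1 + k2 with a^k1 and a^k2 both in D^c."""
    return any(in_Dc(k1) and in_Dc(k - k1) for k1 in range(k + 1))


print(all(in_Dc_Dc(k) for k in range(500)))   # True
```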

137
Back to problem
  • $B = \{0^m 1^n \mid m = n\}$ (known nonreg),
    $BB = \{0^n 1^n 0^m 1^m \mid n, m \ge 0\}$
  • Want to show that $BB \notin \mathrm{REG}$
  • But there's no general result for that
  • When applying a closure property, you have to
    make sure it's true!
  • Nonetheless, it is true that $BB \notin \mathrm{REG}$
  • Because $(BB) \cap 0^*1^* = B$
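The identity $(BB) \cap 0^*1^* = B$ can also be spot-checked on short strings: intersecting with $0^*1^*$ forces one of the two factors to be empty. A bounded sketch with our own helper names:

```python
import re
from itertools import product


def in_B(w):
    """B = {0^n 1^n | n >= 0}."""
    match = re.fullmatch(r"(0*)(1*)", w)
    return match is not None and len(match.group(1)) == len(match.group(2))


def in_BB(w):
    """w is in BB iff some split point cuts it into two strings of B."""
    return any(in_B(w[:k]) and in_B(w[k:]) for k in range(len(w) + 1))


words = ["".join(t) for n in range(11) for t in product("01", repeat=n)]
# BB intersected with 0*1* agrees with B on every binary string of length <= 10.
assert all((in_BB(w) and re.fullmatch(r"0*1*", w) is not None) == in_B(w) for w in words)
print(in_BB("0101"), in_B("0101"))   # True False
```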

138
Chapter 1 closing considerations
  • We don't and won't have many results about the
    class $\mathrm{REG}^c$
  • Being nonregular says that the language lacks a
    certain type of structure: it's more complicated
    than a DFA can handle
  • All real computers are finite devices and all
    finite languages are regular
  • Yet the programming models are brittle: the
    program has to change for larger and larger
    inputs
  • We've seen some easy-to-specify languages that
    aren't regular
  • So REG is not a good general-purpose programming
    model...?