Language Learning Week 2

1
Language Learning Week 2
Pieter Adriaans (pietera_at_science.uva.nl)
Sophia Katrenko (katrenko_at_science.uva.nl)
2
Contents Week 2
  • DEC grammars
  • And Rational numbers
  • And Music
  • Learning and decoding
  • N-grams
  • Tri-grams
  • Digital Bluff poker

3
Contents Week 2
  • DEC grammars
  • And Rational numbers
  • And Music
  • Learning and decoding

4
Start of this research (2001): patents
  • Adriaans P., van Dungen M., A method for
    automatically controlling electronic musical
    devices by means of real-time construction and
    search of a multi-level data structure, European
    Patent no. 1.062656, United States Patent
    6313390, (2001)
  • Reduction of time complexity of learning DEC
    grammars from $O(n^2)$ to $O(n \log n)$.

5
DEC Grammars 1 (Kohonen)
  • Dynamically expanding context grammars
  • A rule of a DEC grammar has the following form:
    X => y
  • X is a context, y is a symbol
  • Rules are deterministic
  • Let S be a start symbol
  • abababababababab...
  • is described by
  • S => a
  • a => b
  • b => a

6
DEC Grammars 2
  • abdbebabdbebabdbebabdbebab...
  • S => a
  • a => b
  • b => d
  • d => b
  • b => e (Conflict 1: expand context of b)

7
DEC Grammars 2
  • abdbebabdbebabdbebabdbebab...
  • S => a
  • a => b
  • ab => d
  • d => b
  • db => e (Conflict 1: expand context of b)

8
DEC Grammars 2
  • abdbebabdbebabdbebabdbebab...
  • S => a
  • a => b
  • ab => d
  • d => b
  • db => e (Conflict 1: expand context of b)
  • e => b
  • b => a (Conflict 2: expand context of b)

9
DEC Grammars 2
  • abdbebabdbebabdbebabdbebab...
  • S => a
  • a => b
  • ab => d
  • d => b
  • db => e (Conflict 1: expand context of b)
  • e => b
  • eb => a (Conflict 2: expand context of b)
  • Grammar is stable from now on

10
DEC Grammars 2
  • abdbebabdbebabdbebabdbebab...
  • is described by the DEC grammar
  • S => a
  • a => b
  • ab => d
  • d => b
  • db => e
  • e => b
  • eb => a
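
The conflict-and-expand steps on these slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kohonen's original procedure and not the patented O(n log n) method: the function names `learn_dec` and `generate` are hypothetical, the start symbol is written as the character "S", and "S" is assumed not to occur in the input string. A new rule gets the shortest context that is not a postfix of an existing head; on a conflict, both contexts are lengthened to the left until they differ.

```python
def learn_dec(s, start="S"):
    """Learn DEC rules (context -> next symbol) from a finite prefix s."""
    text = start + s
    rules = {}  # head -> (predicted symbol, position where the rule was created)
    for i in range(1, len(text)):
        nxt = text[i]
        # Heads are kept postfix-free, so at most one head matches text[:i].
        head = next((text[i - k:i] for k in range(1, i + 1)
                     if text[i - k:i] in rules), None)
        if head is None:
            # No rule applies: add the shortest context that is not a
            # postfix of an existing head.
            k = 1
            while any(h.endswith(text[i - k:i]) for h in rules):
                k += 1
            rules[text[i - k:i]] = (nxt, i)
        elif rules[head][0] != nxt:
            # Conflict: expand both contexts to the left until they differ.
            old_sym, old_i = rules.pop(head)
            k = len(head) + 1
            while text[old_i - k:old_i] == text[i - k:i]:
                k += 1
            rules[text[old_i - k:old_i]] = (old_sym, old_i)
            rules[text[i - k:i]] = (nxt, i)
    return {h: sym for h, (sym, _) in rules.items()}


def generate(rules, n, start="S"):
    """Generate the first n symbols described by a connected rule set."""
    out = start
    while len(out) < n + 1:
        head = next(h for h in rules if out.endswith(h))
        out += rules[head]
    return out[1:]


rules = learn_dec("abdbeb" * 4)
print(rules)
print(generate(rules, 12))
```

On the example string the sketch ends with the same seven rules as the slide, and feeding those rules to `generate` reproduces the string.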

11
Definition
  • An infinite string s can be learned by a
    deterministic DEC grammar if there is a constant
    c such that after scanning $s_c$, the first c
    symbols of s, the rule set P stabilizes, i.e. P
    has a finite set of rules with finite heads.

12
DEC grammars 3
  • What class of languages?
  • How can we learn them?
  • How can we generate them?
  • Any practical use?

13
DEC grammars (1)
  • A deterministic connected DEC grammar is a triple
    $\langle \Sigma, P, S \rangle$ where $\Sigma$ is a
    finite set of terminals, $S \in N$ is a start
    symbol, and $P$ is a finite set consisting of the
    initial rule $S \to \sigma$ and a finite set of
    concatenation rules of the form
    $\alpha \to \sigma$ where $\alpha \in \Sigma^+$
    and $\sigma \in \Sigma$.
  • If we have a string $\beta\alpha$ then the rule
    $\alpha \to \sigma$ allows us to create
    $\beta\alpha\sigma$ ($\beta$ in $\Sigma^*$).

14
DEC grammars (2)
  • The rules $\alpha \to \sigma$ are deterministic,
    connected and minimal.
  • Deterministic: no $\alpha$ is a postfix of, or
    identical to, any other $\alpha$.
  • Connected: for each finite string s that can be
    constructed from the start symbol there is a rule
    $\alpha \to \sigma \in P$ such that $\alpha$ is a
    postfix of s.
  • Minimal: there are no superfluous rules.

15
Numbers
  • N (Z+): natural numbers (1, 2, 3, ...)
  • Z: whole numbers or integers
    (..., -3, -2, -1, 0, 1, 2, 3, ...)
  • Q: rational numbers (expressible as p/q where p
    and q are integers and q is nonzero)
  • R: real numbers
  • Let A be a finite alphabet. We define:
  • $A^\infty$: the set of all infinite strings
    consisting of elements of A.
  • $[0,1) \subset R$: the interval between 0
    (included) and 1 (not included).
  • Important are the sets of finite binary strings
    $\{0,1\}^*$ and decimal strings
    $\{0,1,2,3,4,5,6,7,8,9\}^*$.

16
Facts and definitions
  • There are terminating decimals like 5/8 = 0.625
    and non-terminating decimals like 1/3 = 0.333...
    With each terminating decimal we will associate
    an infinite string with a tail of zeros, like
    5/8 = 0.625000... The non-terminating decimals
    can be periodic, e.g. 2/7 = 0.285714..., where
    the block of digits 285714 repeats itself
    indefinitely.
  • We will call a decimal fraction 0.n semi-periodic
    if it is either finite or periodic after an
    arbitrary initial segment.
  • Example:
    0.4623098239098423098456456456456456456456...

17
Facts and definitions
  • Fact: There is a one-to-one correspondence
    between semi-periodic equivalence classes of
    elements of $A^\infty$ and $[0,1) \subset R$. Each
    semi-periodic equivalence class in $A^\infty$
    corresponds with a unique fraction in a number
    system with base (radix) |A|.
  • Semi-periodic equivalence: 1 and 0.999999...;
    0.123 and 0.1229999999...

18
Theorem: 0.n is a semi-periodic decimal fraction
iff 0.n is rational.
  • Proof (semi-periodic implies rational): We first
    take the simple periodic case. Suppose 0.n is a
    periodic decimal with a repeating block d of
    length l. In this case $0.n = d/(10^l - 1)$,
    which is a rational number. In the semi-periodic
    case 0.n is a terminating decimal plus a periodic
    decimal divided by a power of 10, so the result
    follows from the fact that Q is closed under
    addition, subtraction and division by nonzero
    integers.
  • Examples: 0.123123123123... = 123/999 and
    0.456745674567... = 4567/9999
  • (Rational implies semi-periodic): Suppose 0.n is
    rational, so p/q = 0.n. Consider the standard
    long-division algorithm. A division by q has at
    most q-1 distinct nonzero remainders, so it
    either terminates or falls into a repeating block
    of at most q-1 digits, i.e. it is semi-periodic.
  • Example: 1/7 = 0.142857142857142857... (142857
    has length 6)
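
Both directions of the proof can be checked on small cases with a short Python sketch. The function names `decimal_expansion` and `to_fraction` are hypothetical, and the input is assumed to satisfy 0 <= p < q. The first function runs long division until a remainder repeats; the second applies the d/(10^l - 1) formula to the repeating block and adds the initial segment.

```python
from fractions import Fraction


def decimal_expansion(p, q):
    """Digits of p/q after the decimal point as (initial segment, repeating block)."""
    digits, seen, r = [], {}, p
    while r != 0 and r not in seen:
        seen[r] = len(digits)      # remember where this remainder first appeared
        r *= 10
        digits.append(str(r // q))
        r %= q
    if r == 0:                     # terminating decimal: no repeating block
        return "".join(digits), ""
    start = seen[r]                # the block repeats from this digit onwards
    return "".join(digits[:start]), "".join(digits[start:])


def to_fraction(prefix, period):
    """Rational number for the semi-periodic decimal 0.prefix(period)(period)..."""
    k, l = len(prefix), len(period)
    head = Fraction(int(prefix or "0"), 10 ** k)
    if not period:
        return head
    return head + Fraction(int(period), (10 ** l - 1) * 10 ** k)


print(decimal_expansion(1, 7))
print(to_fraction("", "123"))
```

Because there are at most q-1 nonzero remainders, the loop in `decimal_expansion` runs at most q times, which is the counting argument used in the proof.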

19
Central Theorem
  • Theorem: A string can be learned by a
    deterministic DEC grammar iff it is
    semi-periodic.
  • Proof (learnable implies semi-periodic): Suppose a
    string s is deterministic DEC learnable. Since
    the rules in P in the limit are finite,
    deterministic and connected, there must be a
    point in s where the rules start to loop, i.e. s
    must be semi-periodic.
  • (Semi-periodic implies learnable): the string
    splits into three parts.
  • 0.123987134584983248034534534534534534534534534534
    534534...
  • 0.1239871345849832480 | 345345345345345345 |
    34534534534534534...
  • Non-periodic part of length d (all new DEC rules)
  • Transition part of length d (some new DEC rules)
  • Periodic part of infinite length (no new DEC
    rules)

20
Conclusions: learning DEC grammars = learning
rational numbers.
  • Each deterministic DEC grammar can be described
    by a rational number in a number system with the
    lexicon as its base. Efficiently learnable in
    terms of the length of the input.
  • Rather deep problem: is there an upper bound to
    the time complexity in terms of the complexity of
    the DEC grammar itself? Guess: at least
    super-exponential. Example: 1/7 =
    0.142857142857142857... (142857 has length 6). I
    only need to know ($\log 7 + O(1)$) bits to create
    the semi-random sequence 142857, i.e. DEC grammars
    could make great deterministic pseudo-random
    number generators, which makes them hard to learn.
  • Further work: non-deterministic DEC grammars,
    stochastic DEC grammars, DEC grammars and
    $\omega$-languages, bird songs, etc.
  • There is beauty in numbers!!

21
Some theorems and lemmas
  • Theorem: 0.n is a semi-periodic decimal fraction
    if and only if 0.n is rational.
  • Each DEC grammar generates a unique infinite
    string from the initial symbol.
  • Theorem: A string can be learned by a
    deterministic DEC grammar if and only if it is
    semi-periodic.
  • Lemma: An infinite string is deterministic DEC
    learnable if and only if there exists a rational
    number that describes that string.
  • Consequence: Each deterministic DEC grammar can
    be described by a rational number in a number
    system with the lexicon as its base.

22
Level tree
23
  • Demo!

24
Skylarks: http://eurise.univ-st-etienne.fr/tantini/Skylarks/
  • Birds produce songs made up of sequences of sound
    units called syllables. In a song, a syllable can
    be produced once or several times. The skylark
    (Alauda arvensis) is an oscine of the family
    Alaudidae that produces an extremely long and
    complex territorial proclamation song, since the
    lark can emit up to 700 different sound units
    (syllables). Biologists work on the phenomenon of
    micro-dialect or micro-geographical variation:
    components of the song are learned, transmitted
    and shared by the members of a group of
    neighbors.

25
The square root of 2 is irrational
  • Assume $r = p/q$
  • $r^2 = 2$
  • p/q is irreducible, i.e. p and q have no common
    divisors.
  • We have
  • $(p/q)^2 = 2$
  • $p^2 = 2q^2$
  • $p^2$ is divisible by 2, consequently p is
    divisible by 2 (Euclid VII, 30).
  • Assume
  • $p = 2s$
  • $p^2 = (2s)^2 = 2q^2$
  • $2s^2 = q^2$
  • $q^2$ is divisible by 2, consequently q is
    divisible by 2 (Euclid VII, 30).
  • Thus p and q are both divisible by 2. But then
    p/q is reducible. Contradiction.

26
Learning and coding
  • Coding a message with a key
  • Decoding a message with a key
  • Reconstructing the key from the messages: a sort
    of learning

27
Spying in the 18th century: Hemsterhuis
28
Spying in the 18th century: Hemsterhuis
  • Ma Toute Chere diotime permettez que jimplore
    serieusement le secours de Votre sagesse en
    faveur dune tres digne femme dune famille
    assez nombreuse peut-être la plus malheureuse
    qui existe. Voici leur triste histoire en peut de
    mots, laquelle comme Vous sentirez aisement, je
    ne scourois ( ?) confier à Ame qui viva quà Vous
    seule au Gr. H.
  • 15,16. 56 23,31. 33,35,37,31,51,68 
    20,2,6,27,74,35,34. 19,73,60. 23,29. 54,52.
    81,26,5,42. 74,13. 14. 23,32. 81. 56, 43,44,18.
    35,21,25,51,55,57. 27,9,12. 72,77,52,19,14,16,17.
    19,15. 61,ii,42. 56,57,43,33,34,7i,54,55 par
    plusieurs circonstances jointes à l
    2,31,23,41,76,77,61,5,6,50,45,47. 23,37.
    i8,9,i9., 36,37,38. ce 82.50,38. 20,60,48,35,26.
    39,40. 52. 30,3i,32. composition quelconque
    29,31,42,35,16. 15,16,12. 75,61,62,63. 81,52.
    18,84,41,22. Il seroit 1,9,5,42. 62,83,41,54,55.
    36,30,47. 59,58. 82. fût 60,50,51,42,5,13,19,84.
    36,37,58. suivant le jugement les plus experts
    apres le plus scrupuleux examen 2,15. 27.66. 34.
  • 57,60,21,31. 23,6. 71,49,50. 72. 77,52,2,5,58.
    60,59,19. 39,40. 26. 57,58,42,52,33,15,19,14.
    79,72. 59,49,50,22,42,41,83,62,84,60,9,31.
    exactement dans letat ou elle etoit
    26,85,72,50,83,15,16,ii. 42,18,43,44,33,54,38,17.
    73,43,57,5,2,28,47,14. 75,21,22. i,26,4,42,47,48.
    est posterioris curae. 19,15. 12,6,5,9,i9,42.
    10,83,41,79,55. .23,16. 15,13,2. faire sentir
    54,34,25,79,49,19,14,16. qui devront suivre
    15,16,18,21,22,83,34,37,35,34,42,47,62,57.
    23,30,31,32. 57,58,56,40,33. aussi
    45,29,54,55,71,18,55. que celle ci  70. 17,41.
    45,47. 56. put denirer 2,59,60.
    36,37,38,15,39,40,32,48. 81,29,18,ii,49,50,31,32,1
    7. pour 22. 55,59,54,52,41,5,6,14. sur la
    probite, lexperience les lumieres.
    23,21,22,39,37,32,15,54,6,12. 41,79.
    81,43,44,57,49,19,42. 17,16. 1,2,61,18.
    parfaitement 86. 11,26,27,12. 34,37,45,30,31,32.
    14,16,17,6,18,85,55. on pourrait lui
    32,31,19,27,75,60,36,10,6,5. 3 20. 74,58,57.
    46,43,9,81. 48,47,45,35  23,61,54,34.
    45,46,26,65,33,35,i6, de 79 10,31,2,49,50.
  • 52,66,15,20,72. 80,55,65,53,35,16 tres essentiel
    74,16. 15. 64. 82. 86. 85,26,27. 23,16,14.
    22,56,19,61,28,29,79. 25,35,26,27,23.
    81,32,50,17,19,49,50,34,60,57,61. 74,55.
    7,6,79,72,3 ?,23,16. pour 73,72,80,82,55,57.
    actuellement sans comparaison 81,5,6,65,2,6,5.
    23,62. 45,49,50,51  75 29,42,26,83. 19,54. le
    connait. Il serait utile que ces
    41,50,17,84,35,10,45,83,19,49,50,48. puissent
    lui être données 56,72,5. 37,50,56,58,57,11,9,27,2
    6,25,21. 23,6. 81,43,19,74,12. 70 comme je me
    rapelle si je ne me trompe que 54,38. 28.46. a
    des relations particulieres avec
    59,38,56,14,19,50,45,21. ne serait il pas
    possible que dans loccasion il vaudroit sy
    employer en notre faveur ou quil put donner
    quelques ordres à cette fin Voila bien du
    malheur sans doute quon ne scauroit attribuer
    avec justice, ni à la negligence de ce pauvre
    viellard, ne à la tendre sensibilité de Madame,
    ni même à la bizarrerie de la conduite de
    lEpoux. mais enfin est il evident Ma diotime,
    que tout cet 61,63,21,57,28,30,29. nest quun
    50,9,31. 12,32,3i,ii. pour defendre 80,47,48.
    45,46,60,i,77,5,21,22. de l19,27,23,2,17,45,35,32
    ,42, 63,43,50. de 50,49,48. 66, 43,44,19,17. adieu

29
Spying in the 18th century: Hemsterhuis
  • 15,16. 56 23,31. 33,35,37,31,51,68 
    20,2,6,27,74,35,34. 19,73,60. 23,29. 54,52.
    81,26,5,42. 74,13. 14. 23,32. 81. 56, 43,44,18.
    35,21,25,51,55,57. 27,9,12. 72,77,52,19,14,16,17.
    19,15. 61,ii,42. 56,57,43,33,34,7i,54,55 par
    plusieurs circonstances jointes à l
    2,31,23,41,76,77,61,5,6,50,45,47. 23,37.
    i8,9,i9., 36,37,38. ce 82.50,38. 20,60,48,35,26.
    39,40. 52. 30,3i,32. composition quelconque
    29,31,42,35,16. 15,16,12. 75,61,62,63. 81,52.
    18,84,41,22. Il seroit 1,9,5,42. 62,83,41,54,55.
    36,30,47. 59,58. 82. fût 60,50,51,42,5,13,19,84.
    36,37,58. suivant le jugement les plus experts
    apres le plus scrupuleux examen 2,15. 27.66. 34.

30
Spying in the 18th century: Hemsterhuis
  • 1 2 3 4 5 6 7 8 9 10
  • f i g u r e z v o u
  • 11 12 13 14 15 16 17 18 19 20
  • s s u r l e s r i v
  • 21 22 23 24 25 26 27 28 29 30
  • e s d u g a n g e u
  • 31 32 33 34 35 36 37 38 39 40
  • n e b a r q u e q u
  • 41 42 43 44 45 46 47 48 49 50
  • i t o u c h e s o n
  • 51 52 53 54 55 56 57 58 59 60
  • s a b l e p r e c i
  • 61 62 63 64 65 66 67 68 69 70
  • e u x h m j z w k
  • 71 72 73 74 75 76 77 78 79 80
  • b a c d d f f k l m
  • 81 82 83 84 85 86
  • p p t t t
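
Decoding with this key is a plain table lookup, which the following Python sketch illustrates. It is an illustration under assumptions, not a reconstruction of the full cipher: only entries 1 to 60 are used, because the rows for 61 to 70 and 81 to 86 have fewer letters than numbers in this transcript, and the names `KEY`, `decode` and `encode` are hypothetical.

```python
# Entries 1..60 of the key table, row by row, as they appear in the transcript.
KEY = ("figurezvou" "ssurlesriv" "esdugangeu"
       "nebarquequ" "itoucheson" "sablepreci")


def decode(cipher, key=KEY):
    """Decode one word written as comma-separated numbers, e.g. '15,16'."""
    return "".join(key[int(n) - 1] for n in cipher.split(","))


def encode(word, key=KEY):
    """Encode a word using the first number listed for each letter.

    A homophonic cipher like this one offers several numbers per letter;
    always taking the first is a simplification for illustration only.
    """
    return ",".join(str(key.index(ch) + 1) for ch in word)


print(decode("15,16"))            # first word of the ciphertext
print(decode("33,35,37,31,51"))   # first five numbers of the fourth group
print(encode("regler"))
```

Because frequent letters such as e and u have many numbers, simple frequency counting on the ciphertext is less informative than for a one-to-one substitution, which is what makes reconstructing the key a learning problem.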

31
Spying in the 18th century: Hemsterhuis
  • l e . p . d e . b r u n s w v i e n d r a . i c
    i . d e . l a . p a r t . d u . r . d e . p . p o
    u r . r e g l e r . n o s . a f a i r e s . i l .
    e s t . p r o b a b l e par plusieurs
    circonstances jointes à l ' i n d i f f e r e n c
    e . d u . r o i . q u e . ce s . n e . v i s r a
    . q u . a . u n e . composition quelconque e n t
    r e . l e s . d e u x . p a . r t i s . il seroit
    f o r t . u t i l e . q u e . c e . s . f û t i n
    s t r u i t . q u e . suivant le jugement les
    plus experts apres le plus scrupuleux examen i l
    . n . l . a . r i e n . d e . b o n . a . f a i r
    e . i c i . q u . a . r e t a b l i r . l a . c o
    n s t i t u t i o n .

32
N-grams 1
  • An n-gram model of a text consists of the
    collection of all sequences of n words occurring
    in the text.
  • The basic idea behind n-gram models is that in a
    sequence of n words
    $w_1, w_2, w_3, \ldots, w_{n-1}, w_n$, the sequence
    $w_1, w_2, w_3, \ldots, w_{n-1}$ predicts the
    occurrence of word $w_n$.
  • Deterministic DEC grammars can generate only one
    structure, but n-gram models can generate
    (sometimes infinite) sets of sentences.

33
N-grams 2
  • John owns a dog.
  • John sees a cat.
  • Bi-gram model of sample (CONTEXT, WORD,
    FREQUENCY):
  • ( , John, 2)
  • (John, owns, 1)
  • (owns, a, 1)
  • (a, dog, 1)
  • (John, sees, 1)
  • (sees, a, 1)
  • (a, cat, 1)

34
N-grams 3
  • Tri-gram model of sample (CONTEXT, WORD,
    FREQUENCY):
  • ( , John, 2)
  • ( John, owns, 1)
  • ( John, sees, 1)
  • (John owns, a, 1)
  • (owns a, dog, 1)
  • (John sees, a, 1)
  • (sees a, cat, 1)
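
The frequency tables for the two sample sentences can be produced mechanically. The Python sketch below is one possible way to do it; the function name `ngram_counts` and the padding token `<s>` (standing in for the empty context positions) are assumptions, and the final full stop of each sentence is left out.

```python
from collections import Counter


def ngram_counts(sentences, n, pad="<s>"):
    """Count (context, word) pairs, where the context is the n-1 preceding words."""
    counts = Counter()
    for sentence in sentences:
        # Pad the start so that the first word also has a full context.
        words = [pad] * (n - 1) + sentence.split()
        for i in range(n - 1, len(words)):
            context = tuple(words[i - n + 1:i])
            counts[(context, words[i])] += 1
    return counts


sample = ["John owns a dog", "John sees a cat"]
for (context, word), freq in ngram_counts(sample, 2).items():
    print(" ".join(context), word, freq)
for (context, word), freq in ngram_counts(sample, 3).items():
    print(" ".join(context), word, freq)
```

Each model has seven entries for this sample; the only entry with frequency 2 is the sentence-initial "John".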

35
Tri-grams 1
  • $w_{-1}, w_0$ John owns a dog.
  • $w_{-1}, w_0$ John sees a cat.
  • Tri-gram model of sample (CONTEXT, WORD,
    PROBABILITY):
  • ($w_{-1}$ $w_0$, John, 1)
  • ($w_0$ John, owns, 0.5)
  • ($w_0$ John, sees, 0.5)
  • (John owns, a, 1)
  • (owns a, dog, 1)
  • (John sees, a, 1)
  • (sees a, cat, 1)

36
Tri-grams 2
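  • $P(w_n \mid w_1, \ldots, w_{n-1}) = P(w_n \mid w_{n-2}, w_{n-1})$
  • For the total probability of a sentence of n
    words we have (we will write $w_{1,n-1}$ for
    $w_1, \ldots, w_{n-1}$):
  • $P(w_{1,n}) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_{1,2}) \cdots P(w_n \mid w_{1,n-1})$
    $= P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_{1,2}) \cdots P(w_n \mid w_{n-2,n-1})$
    $= P(w_1) P(w_2 \mid w_1) \prod_{i=3}^{n} P(w_i \mid w_{i-2,i-1})$
    $= \prod_{i=1}^{n} P(w_i \mid w_{i-2,i-1})$
    (if we add two vacuous words $w_{-1}$ and $w_0$
    before each sentence)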

37
Tri-grams 3
  • We can calculate the probability for the tri-gram
    $w_1, w_2, w_3$ by counting the number of
    occurrences of the sequence $w_1, w_2, w_3$ in the
    text and dividing this number by the number of
    occurrences of the di-gram $w_1, w_2$ in the text:
    $P(w_3 \mid w_1, w_2) = \#(w_1, w_2, w_3) / \#(w_1, w_2)$.
  • We now have P(cat | sees a) = #(sees a cat) /
    #(sees a), where #(x) is the number of
    occurrences of x in the text.
  • We can use these tri-gram probabilities to assign
    a probability to a complete sentence.
  • We can also use them during the process of
    sentence generation with a tri-gram grammar.
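
The counting rule and the product formula for a whole sentence can be tried out on the two sample sentences with a small Python sketch. The names `train`, `prob` and `sentence_prob` are hypothetical, `<s>` stands in for the two vacuous words, and one simplifying assumption is made: a di-gram is counted only where a word follows it, so that the probabilities for each context sum to 1.

```python
from collections import Counter


def train(sentences, pad="<s>"):
    """Count tri-grams and their two-word contexts, with two padding words."""
    tri, bi = Counter(), Counter()
    for sentence in sentences:
        w = [pad, pad] + sentence.split()
        for i in range(2, len(w)):
            tri[(w[i - 2], w[i - 1], w[i])] += 1
            bi[(w[i - 2], w[i - 1])] += 1
    return tri, bi


def prob(tri, bi, w1, w2, w3):
    """P(w3 | w1, w2) = #(w1, w2, w3) / #(w1, w2); 0 for an unseen context."""
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0


def sentence_prob(tri, bi, sentence, pad="<s>"):
    """Product of the tri-gram probabilities of all words in the sentence."""
    w = [pad, pad] + sentence.split()
    p = 1.0
    for i in range(2, len(w)):
        p *= prob(tri, bi, w[i - 2], w[i - 1], w[i])
    return p


tri, bi = train(["John owns a dog", "John sees a cat"])
print(prob(tri, bi, "sees", "a", "cat"))
print(sentence_prob(tri, bi, "John owns a dog"))
print(sentence_prob(tri, bi, "John owns a cat"))
```

With only two training sentences the tri-gram model gives "John owns a cat" probability 0, because the tri-gram "owns a cat" never occurs; a bi-gram model of the same sample would accept it.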

38
Digital bluff poker: uncooperative teachers 1
39
Digital bluff poker: uncooperative teachers 2
  • Opponent and proponent make alternating hidden
    moves, 0 or 1, and then check the result.
  • Proponent 1, Opponent 1: Proponent pays Opponent
    1,-
  • Proponent 0, Opponent 0: Proponent pays Opponent
    1,-
  • Proponent 0, Opponent 1: Opponent pays Proponent
    1,-
  • Proponent 1, Opponent 0: Opponent pays Proponent
    1,-
  • Observation: As soon as one of the players uses a
    system that is known to the other player, the
    latter can win, provided he has enough computing
    power.
  • Ergo: a learning task
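
To make the learning task concrete, here is a small Python sketch in which the proponent follows a fixed periodic system and the opponent learns it from context, much like an n-gram predictor. The function name `play`, the context length `k` and the default guess "0" are assumptions made for illustration; the payoff rule is the one from the slide (equal moves: proponent pays opponent).

```python
def play(proponent_moves, k=3):
    """Return the opponent's balance when it predicts from the last k moves."""
    table = {}       # context (last k proponent moves) -> move that followed it
    balance = 0      # opponent's winnings
    history = ""
    for move in proponent_moves:
        context = history[-k:]
        guess = table.get(context, "0")          # unseen context: guess 0
        balance += 1 if guess == move else -1    # equal moves: opponent wins 1,-
        table[context] = move                    # learn from the revealed move
        history += move
    return balance


print(play("011" * 20))
```

Against the period-3 system 011011... the opponent loses a little while the table fills up and then wins every round; against truly random moves no such table helps.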

40
Digital bluff poker: uncooperative teachers 3
  • Observation: If one of the players makes only
    random moves, P(0) = P(1) = 0.5, there is no
    winning strategy.
  • Proof: If proponent plays 1 and opponent plays
    random, P(0) = P(1) = 0.5, then the probability
    of winning or losing is P(1,0) = P(1,1) = 0.5. If
    proponent plays 0 and opponent plays random,
    P(0) = P(1) = 0.5, then the probability of
    winning or losing is P(0,1) = P(0,0) = 0.5.

41
Digital bluff poker: uncooperative teachers 4
  • In this case the exchange of money between the
    players is a random walk in one dimension.
  • The expected absolute balance (gain or loss) of
    one of the players after n turns is approximately
    $\sqrt{2n/\pi}$.
  • Surprisingly, the most probable number of sign
    changes in a walk is 0, followed by 1, then 2,
    etc.
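
The square-root estimate can be compared with the exact value for a fair game. The Python sketch below computes the expected absolute balance after n rounds directly from the binomial distribution; the function name `expected_abs_balance` is hypothetical, and each round is assumed to move the balance by +1 or -1 with probability 0.5.

```python
from math import comb, pi, sqrt


def expected_abs_balance(n):
    """Exact E|S_n| for n fair rounds, where S_n is the sum of n +1/-1 steps."""
    # With k winning rounds out of n the balance is k - (n - k) = 2k - n.
    return sum(abs(2 * k - n) * comb(n, k) for k in range(n + 1)) / 2 ** n


n = 100
print(expected_abs_balance(n))   # exact value
print(sqrt(2 * n / pi))          # approximation from the slide
```

For n = 100 the two values differ by well under one percent; the expected balance itself, with sign, is 0 for both players.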

42
Contents Week 2
  • DEC grammars
  • And Rational numbers
  • And Music
  • Learning and decoding

43
Contents Week 2
  • DEC grammars
  • And Rational numbers
  • And Music
  • Learning and decoding
  • N-grams
  • Tri-grams
  • Digital Bluff poker