Title: Colin de la Higuera
1Grammatical inference techniques and algorithms
Colin de la Higuera
2Acknowledgements
- Laurent Miclet, Tim Oates, Jose Oncina, Rafael
Carrasco, Paco Casacuberta, Pedro Cruz, Rémi
Eyraud, Philippe Ezequel, Henning Fernau,
Jean-Christophe Janodet, Thierry Murgue, Frédéric
Tantini, Franck Thollard, Enrique Vidal, ...
- and a lot of other people to whom I am grateful
3Outline
- 1 An introductory example
- 2 About grammatical inference
- 3 Some specificities of the task
- 4 Some techniques and algorithms
- 5 Open issues and questions
41 How do we learn languages?
5The problem
- You are in an unknown city and have to eat.
- You therefore go to some selected restaurants.
- Your goal is to build a model of the city (a map).
6The data
- Up Down Right Left Left → Restaurant
- Down Down Right → Not a restaurant
- Left Down → Restaurant
7Hopefully something like this
[Figure: a DFA over the moves u, d, l, r whose states are labelled R (restaurant) or N (not a restaurant)]
8[Figure-only slide]
9Further arguments (1)
- How did we get hold of the data?
- Random walks
- Following someone
- someone knowledgeable
- Someone trying to lose us
- Someone on a diet
- Exploring
10Further arguments (2)
- Can we not have better information (for example the names of the restaurants)?
- But then we may only have the information about the routes to restaurants (not to the non-restaurants)
11Further arguments (3)
- What if instead of getting the information "Elimo" or "restaurant", I get the information "good meal" or "7/10"?
- Reinforcement learning, POMDPs
12Further arguments (4)
- Where is my algorithm to learn these things?
- Perhaps I should consider several algorithms for the different types of data?
13Further arguments (5)
- What can I say about the result?
- What can I say about the algorithm?
14Further arguments (6)
- What if I want something richer than an automaton?
- A context-free grammar
- A transducer
- A tree automaton
15Further arguments (7)
- Why do I want something as rich as an automaton?
- What about
- A simple pattern?
- Some SVM obtained from features over the strings?
- A neural network that would allow me to know, with high probability, whether some path will bring me to a restaurant?
16Our goal/idea
- The ancient Greeks
- A whole is more than the sum of its parts
- Gestalt theory
- A whole is different from the sum of its parts
17Better said
- There are cases where the data cannot be analyzed by considering it in bits
- There are cases where intelligibility of the pattern is important
18What do people know about formal language theory?
Nothing
Lots
19A small reminder on formal language theory
- The Chomsky hierarchy of languages and of grammars
20A crash course in Formal language theory
- Symbols
- Strings
- Languages
- Chomsky hierarchy
- Stochastic languages
21Symbols
- are taken from some alphabet Σ
Strings
are sequences of symbols from Σ
22Languages
- are sets of strings over Σ
Languages
are subsets of Σ*
23Special languages
- Are recognised by finite state automata
- Are generated by grammars
24DFA Deterministic Finite State Automaton
25[Figure: an example DFA over {a, b}]
abab ∈ L
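Membership for a DFA is a single left-to-right scan, one transition per symbol, which is why the membership problem is linear for regular languages. A minimal Python sketch (the two-state automaton below is illustrative, not the one from the figure):

```python
# One table lookup per symbol: membership for a DFA is a linear scan.
def accepts(delta, q0, finals, w):
    q = q0
    for a in w:
        q = delta.get((q, a))
        if q is None:          # undefined transition: reject
            return False
    return q in finals

# Illustrative two-state DFA over {a, b}: accepts strings with an even
# number of b's (not the automaton of the figure above).
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}
print(accepts(delta, 0, {0}, "abab"))   # True: two b's
```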
26What is a context-free grammar?
- A 4-tuple (Σ, S, V, P) such that
- Σ is the alphabet
- V is a finite set of non-terminals
- S is the start symbol
- P ⊂ V × (V∪Σ)* is a finite set of rules
27Example of a grammar
- The Dyck1 grammar
- (Σ, S, V, P)
- Σ = {a, b}
- V = {S}
- P = {S → aSbS, S → λ}
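Since S → aSbS | λ generates exactly the well-balanced strings when a is read as an opening bracket and b as a closing one, membership can be tested with a single counter. A small Python sketch (this counter characterization is a standard fact about Dyck1, not something established on the slides):

```python
# Counter-based membership test for the language of S -> aSbS | lambda,
# reading a as an opening bracket and b as a closing one.
def in_dyck1(w):
    depth = 0
    for c in w:
        depth += 1 if c == "a" else -1
        if depth < 0:          # a b closes a bracket that was never opened
            return False
    return depth == 0          # every a must eventually be closed

print(in_dyck1("aabb"))   # True: derived on the next slide
print(in_dyck1("abba"))   # False
```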
28Derivations and derivation trees
- S ⇒ aSbS
- ⇒ aaSbSbS
- ⇒ aabSbS
- ⇒ aabbS
- ⇒ aabb
[Figure: the derivation tree for aabb, with λ-leaves for the erased S's]
29Chomsky Hierarchy
- Level 0: no restriction
- Level 1: context-sensitive
- Level 2: context-free
- Level 3: regular
30Chomsky Hierarchy
- Level 0: whatever Turing machines can do
- Level 1:
- {a^n b^n c^n : n ∈ ℕ}
- {a^n b^m c^n d^m : n, m ∈ ℕ}
- {uu : u ∈ Σ*}
- Level 2: context-free
- {a^n b^n : n ∈ ℕ}
- brackets
- Level 3: regular
- Regular expressions (GREP)
31The membership problem
- Level 0: undecidable
- Level 1: decidable
- Level 2: polynomial
- Level 3: linear
32The equivalence problem
- Level 0: undecidable
- Level 1: undecidable
- Level 2: undecidable
- Level 3: polynomial, but only when the representation is a DFA
33[Figure: an example PFA over {a, b}]
PFA: Probabilistic Finite (state) Automaton
34[Figure: an example DPFA over {a, b}, with transition and final-state probabilities]
DPFA: Deterministic Probabilistic Finite (state) Automaton
35What is nice with grammars?
- Compact representation
- Recursivity
- Says how a string belongs, not just if it belongs
- Graphical representations (automata, parse trees)
36What is not so nice with grammars?
- Even the easiest class (level 3) contains SAT, Boolean functions, parity functions
- Noise is very harmful
- Think about adding edit noise to the language {w : |w|_a ≡ 0 (mod 2) ∧ |w|_b ≡ 0 (mod 2)}
372 Specificities of grammatical inference
- Grammatical inference consists (roughly) in finding the (or a) grammar or automaton that has produced a given set of strings (sequences, trees, terms, graphs).
38The field
Inductive Inference
Pattern Recognition
Machine Learning
Grammatical Inference
Computational linguistics
Computational biology
Web technologies
39The data
- Strings, trees, terms, graphs
- Structural objects
- Basically the same gap of information as in
programming between tables/arrays and data
structures
40Alternatives to grammatical inference
- 2 steps:
- Extract features from the strings
- Use a very good method over ℝⁿ
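A minimal sketch of the two-step alternative, assuming bigram counts as the features over Σ = {a, b} (the feature choice is hypothetical; any string kernel or embedding would do):

```python
from itertools import product

# Step 1: map each string to a fixed-length vector in R^n (bigram counts).
def bigram_features(w, alphabet="ab"):
    index = {p + q: i
             for i, (p, q) in enumerate(product(alphabet, repeat=2))}
    v = [0] * len(index)
    for i in range(len(w) - 1):
        v[index[w[i:i + 2]]] += 1
    return v

# Step 2 would feed these vectors to any good learner over R^n (an SVM, say).
print(bigram_features("abab"))   # [0, 2, 1, 0]: counts for aa, ab, ba, bb
```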
41Examples of strings
- A string in Gaelic and its translation to English:
- Tha thu cho duaichnidh ri èarr àirde de a coisich deas damh
- You are as ugly as the north end of a southward traveling ox
42[Figure-only slide]
43[Figure-only slide]
44>A BAC41M14 LIBRARYCITB_978_SKB AAGCTTATTCAATAGT
TTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA GTGGA
TAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTT
TCAAT GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTAT
TGCCATCTTTATTTCA GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCC
ACTTTCCCCAAGCTCACACAGCAAAGA CACGGGGACACCAGGACTCCAT
CTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT GGCAGGTGACA
GAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG
GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAG
TCGAGTTCCA GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAG
TCCAAAAACACTAAACTTCAA CAATATGTTCTTTTGGCTTGCATTTGTG
TATAACCGTAATTAAAAAGCAAGGGGACAACA CACAGTAGATTCAGGAT
AGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA TGGGGA
GGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTT
TGAA TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTT
TTTAAAAATATAATA AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGT
TATGAAAAAAATTCCAACTTTGTGCA TCCAAGGACCAGATTTTTTTTAA
AATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG TATTCACTATCA
AATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC T
AAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGC
TACTTGCGA GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAG
AGGTTGCAGTAGGCCAAGAT GGCGCCACTGCACTCCAGCCTGGGTGACA
GAGCAAGACCCTATGTCCAAAAAAAAAAAAA AAAAAAAGGAAAAGAAAA
AGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC TGTGTAC
CCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCT
GAC TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCA
CACATATGCATCCA TGAACAACAGATAGTGGTTTTTGCATGACCTGAAA
CATTAATGAAATTGTATGATTCTAT
45[Figure-only slide]
46[Figure-only slide]
47[Figure-only slide]
48 <book>
  <part>
    <chapter>
      <sect1/>
      <sect1>
        <orderedlist numeration="arabic">
          <listitem/>
          <ffragbody/>
        </orderedlist>
      </sect1>
    </chapter>
  </part>
</book>
49<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
  <!ELEMENT pagina (titulus?, poema)>
  <!ELEMENT titulus (#PCDATA)>
  <!ELEMENT auctor (praenomen, cognomen, nomen)>
  <!ELEMENT praenomen (#PCDATA)>
  <!ELEMENT nomen (#PCDATA)>
  <!ELEMENT cognomen (#PCDATA)>
  <!ELEMENT poema (versus)>
  <!ELEMENT versus (#PCDATA)>
]>
<pagina>
  <titulus>Catullus II</titulus>
  <auctor>
    <praenomen>Gaius</praenomen>
    <nomen>Valerius</nomen>
    <cognomen>Catullus</cognomen>
  </auctor>
50[Figure-only slide]
51A logic program learned by GIFT
color_blind(Arg1) :- start(Arg1,X), p11(Arg1,X).
start(X,X).
p11(Arg1,P) :- mother(M,P), p4(Arg1,M).
p4(Arg1,X) :- woman(X), father(F,X), p3(Arg1,F).
p4(Arg1,X) :- woman(X), mother(M,X), p4(Arg1,M).
p3(Arg1,X) :- man(X), color_blind(X).
523 Hardness of the task
- One thing is to build algorithms; another is to be able to state that they work.
- Some questions:
- Does this algorithm work?
- Do I have enough learning data?
- Do I need some extra bias?
- Is this algorithm better than the other?
- Is this problem easier than the other?
53Alternatives to answer these questions
- Use well-admitted benchmarks
- Build your own benchmarks
- Solve a real problem
- Prove things
54Use well-admitted benchmarks
- yes: allows comparisons
- no: many parameters
- problem: difficult to do better (also, in GI, there are not that many benchmarks around!)
55Build your own benchmarks
- yes: allows progress
- no: you play against yourself
- problem: one invents the benchmark where one is best!
56Solve a real problem
- yes: it is the final goal
- no: we don't always know why things work
- problem: how much pre-processing?
57Theory
- Because you may want to be able to say something more than "it seems to work in practice".
58Identification in the limit
- A class of languages 𝓛 and a class of grammars G
- The naming function L maps each grammar g ∈ G to the language L(g) it represents
- A presentation of L ∈ 𝓛 is a function f: ℕ → X with yields(f) = L; presentations with the same range yield the same language: f(ℕ) = g(ℕ) ⇒ yields(f) = yields(g)
- A learner α maps finite initial segments of presentations to grammars in G; success means L(α(f)) = yields(f)
59L is identifiable in the limit in terms of G from Pres iff ∀L ∈ 𝓛, ∀f ∈ Pres(L):
given the successive elements f1, f2, ..., fi, ..., fn, ... of the presentation, the learner outputs hypotheses h1, h2, ..., hi, ..., hn, ... such that
∃n: ∀i ≥ n, hi = hn and L(hi) = L
60- "He did not want to compose another Quixote (which is easy) but the Quixote. Needless to add that he never contemplated a mechanical transcription of the original; he did not propose to copy it. His admirable ambition was to produce a few pages that would coincide, word for word and line for line, with those of Miguel de Cervantes."
- "My undertaking is not difficult, essentially," I read in another part of the letter. "I should only have to be immortal to carry it out."
- Jorge Luis Borges (1899-1986)
- Pierre Menard, Author of the Quixote (The Garden of Forking Paths), Ficciones
614 Algorithmic ideas
62The space of GI problems
- Type of input (strings)
- Presentation of input (batch)
- Hypothesis space (subset of the regular grammars)
- Success criteria (identification in the limit)
63Types of input
64Types of input - oracles
- Membership queries
- Is string S in the target language?
- Equivalence queries
- Is my hypothesis correct?
- If not, provide counter example
- Subset queries
- Is the language of my hypothesis a subset of the
target language?
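The three query types are easy to pin down as an interface. A minimal Python sketch, with the target simulated by a predicate and counterexamples searched in a small finite universe (a stand-in for a real oracle, which would know the target language exactly):

```python
# The three oracle query types as an interface.
class Teacher:
    def __init__(self, target, universe):
        self.target = target              # target(w) -> bool
        self.universe = universe          # strings searched for counterexamples

    def member(self, w):
        # Membership query: is w in the target language?
        return self.target(w)

    def equivalent(self, hypothesis):
        # Equivalence query: None if correct, else a counterexample.
        return next((w for w in self.universe
                     if hypothesis(w) != self.target(w)), None)

    def subset(self, hypothesis):
        # Subset query: is the hypothesis language a subset of the target?
        return next((w for w in self.universe
                     if hypothesis(w) and not self.target(w)), None)

t = Teacher(lambda w: w.count("b") % 2 == 0, ["", "a", "b", "ab", "bb"])
print(t.member("ab"))                     # False: one b
print(t.equivalent(lambda w: True))       # "b" is a counterexample
```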
65Presentation of input
- Arbitrary order
- Shortest to longest
- All positive and negative examples up to some length
- Sampled according to some probability distribution
66Presentation of input
- Text presentation
- A presentation of all strings in the target language
- Complete presentation (informant)
- A presentation of all strings over the alphabet of the target language, labeled as + or −
67Hypothesis space
- Regular grammars
- A welter of subclasses
- Context-free grammars
- Fewer subclasses
- Hyper-edge replacement graph grammars
68Success criteria
- Identification in the limit
- Text or informant presentation
- After each example, learner guesses language
- At some point, guess is correct and never changes
- PAC learning
69Theorems due to Gold
- The good news:
- Any recursively enumerable class of languages can be learned in the limit from an informant (Gold, 1967)
- The bad news:
- A language class is superfinite if it includes all finite languages and at least one infinite language
- No superfinite class of languages can be learned in the limit from a text (Gold, 1967)
- That includes the regular and context-free languages
70A picture
[Figure: learning settings arranged along two axes, from little information / poor languages to a lot of information / rich languages: sub-classes of reg from pos; context-free from pos; DFA from pos + neg; DFA from queries; mildly context sensitive from queries]
71Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
724.1 RPNI
- Regular Positive and Negative Grammatical Inference
- Identifying regular languages in polynomial time
- Jose Oncina & Pedro García, 1992
73- It is a state merging algorithm
- It identifies any regular language in the limit
- It works in polynomial time
- It admits polynomial characteristic sets
74The algorithm
- function rmerge(A, p, q)
-   A ← merge(A, p, q)
-   while ∃a ∈ Σ, ∃r: {p', q'} ⊆ δA(r, a), p' ≠ q' do
-     rmerge(A, p', q')
75- A ← APTA(X+); Fr ← {δ(q0, a) : a ∈ Σ}; K ← {q0}
- While Fr ≠ ∅ do
-   choose q from Fr
-   if ∃p ∈ K : L(rmerge(A, p, q)) ∩ X− = ∅ then A ← rmerge(A, p, q)
-   else K ← K ∪ {q}
-   Fr ← {δ(q, a) : q ∈ K, a ∈ Σ} \ K
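A compact Python sketch of the loop above, with states named by their APTA prefixes; merging folds one state into another and recursively re-merges to restore determinism. This is a didactic rendering, not Oncina and García's implementation:

```python
import copy

# Build the augmented prefix tree acceptor of X+; states are prefixes.
def apta(Xplus):
    delta, finals = {"": {}}, set()
    for w in Xplus:
        q = ""
        for a in w:
            delta.setdefault(q, {})[a] = q + a
            q += a
            delta.setdefault(q, {})
        finals.add(q)
    return delta, finals

def fold(delta, finals, p, q):
    # rmerge: redirect q's incoming edges to p, then merge q's outgoing
    # transitions, folding recursively to restore determinism.
    for trans in delta.values():
        for a in trans:
            if trans[a] == q:
                trans[a] = p
    if q in finals:
        finals.discard(q)
        finals.add(p)
    for a, q2 in delta.pop(q).items():
        if a in delta[p]:
            if delta[p][a] != q2:
                fold(delta, finals, delta[p][a], q2)
        else:
            delta[p][a] = q2

def accepts(delta, finals, w):
    q = ""
    for a in w:
        q = delta.get(q, {}).get(a)
        if q is None:
            return False
    return q in finals

def rpni(Xplus, Xminus):
    delta, finals = apta(Xplus)
    red = [""]                           # promoted (kernel) states
    while True:
        blue = sorted({t for s in red for t in delta[s].values()
                       if t not in red}, key=lambda s: (len(s), s))
        if not blue:
            return delta, finals
        q = blue[0]                      # smallest blue state, length-lex
        for p in red:                    # try to merge q with each red state
            d2, f2 = copy.deepcopy(delta), set(finals)
            fold(d2, f2, p, q)
            if not any(accepts(d2, f2, w) for w in Xminus):
                delta, finals = d2, f2   # merge accepted
                break
        else:
            red.append(q)                # no merge possible: promote q

# The sample of slides 76-98 (the slides converge to a three-state DFA).
delta, finals = rpni(["", "aaa", "aaba", "ababa", "bb", "bbaaa"],
                     ["aa", "ab", "aaaa", "ba"])
print(len(delta), "states; accepts bb:", accepts(delta, finals, "bb"))
```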
76X+ = {λ, aaa, aaba, ababa, bb, bbaaa}
[Figure: APTA(X+), with states numbered 1 to 15]
X− = {aa, ab, aaaa, ba}
77Try to merge 2 and 1
[Figure: the APTA, with states 2 and 1 as the merge candidates]
X− = {aa, ab, aaaa, ba}
78Needs more merging for determinization
[Figure: after the merge, state {1,2}; nondeterminism remains to be resolved]
X− = {aa, ab, aaaa, ba}
79But now string aaaa is accepted, so the merge
must be rejected
[Figure: after the determinization merges, states {1,2,4,7}, {3,5,8}, {9,11}; the automaton now accepts aaaa ∈ X−]
X− = {aa, ab, aaaa, ba}
80Try to merge 3 and 1
[Figure: the APTA again, with states 3 and 1 as the merge candidates]
X− = {aa, ab, aaaa, ba}
81Requires to merge 6 with 1,3
[Figure: after the merge, state {1,3}; determinization requires merging 6 with {1,3}]
X− = {aa, ab, aaaa, ba}
82And now to merge 2 with 10
[Figure: state {1,3,6}; determinization now requires merging 2 with 10]
X− = {aa, ab, aaaa, ba}
83And now to merge 4 with 13
[Figure: states {1,3,6} and {2,10}; next, 4 must be merged with 13]
X− = {aa, ab, aaaa, ba}
84And finally to merge 7 with 15
[Figure: states {1,3,6}, {2,10}, {4,13}; finally, 7 must be merged with 15]
X− = {aa, ab, aaaa, ba}
85No counterexample is accepted, so the merges are kept
[Figure: the automaton with states {1,3,6}, {2,10}, {4,13}, {7,15}; no string of X− is accepted]
X− = {aa, ab, aaaa, ba}
86Next possible merge to be checked is 4,13 with
1,3,6
[Figure: the same automaton; candidate merge: {4,13} with {1,3,6}]
X− = {aa, ab, aaaa, ba}
87More merging for determinization is needed
[Figure: after the merge and determinization, state {1,3,4,6,13}]
X− = {aa, ab, aaaa, ba}
88But now aa is accepted
[Figure: states {1,3,4,6,8,13} and {2,7,10,11,15}; the automaton accepts aa ∈ X−, so the merge is rejected]
X− = {aa, ab, aaaa, ba}
89So we try 4,13 with 2,10
[Figure: back to the previous automaton; candidate merge: {4,13} with {2,10}]
X− = {aa, ab, aaaa, ba}
90After determinizing, negative string aa is again
accepted
[Figure: state {2,4,7,10,13,15}, with {9,11} and {5,8}; the negative string aa is again accepted]
X− = {aa, ab, aaaa, ba}
91So we try 5 with 1,3,6
[Figure: back again; candidate merge: 5 with {1,3,6}]
X− = {aa, ab, aaaa, ba}
92But again we accept ab
[Figure: states {1,3,5,6,12} and {2,9,10,14}; the negative string ab is accepted]
X− = {aa, ab, aaaa, ba}
93So we try 5 with 2,10
[Figure: candidate merge: 5 with {2,10}]
X− = {aa, ab, aaaa, ba}
94Which is OK. So next possible merge is 7,15
with 1,3,6
[Figure: states {1,3,6}, {2,5,10}, {4,9,13}, {7,15}, {8,12}, {11,14}]
X− = {aa, ab, aaaa, ba}
95Which is OK. Now try to merge 8,12 with
1,3,6,7,15
[Figure: state {1,3,6,7,15}; candidate merge: {8,12} with {1,3,6,7,15}]
X− = {aa, ab, aaaa, ba}
96And ab is accepted
[Figure: states {1,3,6,7,8,12,15} and {2,5,10,11,14}; the negative string ab is accepted]
X− = {aa, ab, aaaa, ba}
97Now try to merge 8,12 with 4,9,13
[Figure: back to states {1,3,6,7,15}, {2,5,10}, {4,9,13}, {8,12}, {11,14}; candidate merge: {8,12} with {4,9,13}]
X− = {aa, ab, aaaa, ba}
98This is OK and no more merge is possible so the
algorithm halts.
[Figure: the final automaton, with states {1,3,6,7,11,14,15}, {2,5,10}, {4,8,9,12,13}]
X− = {aa, ab, aaaa, ba}
99Definitions
- Let ≤ be the length-lexicographic ordering over Σ*
- Let Pref(L) be the set of all prefixes of strings in some language L
100Short prefixes
- Sp(L) = {u ∈ Pref(L) : ∀v, δ(q0, v) = δ(q0, u) ⇒ u ≤ v}
- There is one short prefix per useful state
101Kernel-sets
- N(L) = {ua ∈ Pref(L) : u ∈ Sp(L), a ∈ Σ} ∪ {λ}
- There is an element in the kernel-set for each useful transition
102A characteristic sample
- A sample is characteristic (for RPNI) if
- ∀x ∈ Sp(L), ∃xu ∈ X+
- ∀x ∈ Sp(L), ∀y ∈ N(L):
- δ(q0, x) ≠ δ(q0, y) ⇒
- ∃z ∈ Σ*: (xz ∈ X+ ∧ yz ∈ X−) ∨ (xz ∈ X− ∧ yz ∈ X+)
103About characteristic samples
- If you add more strings to a characteristic sample, it is still characteristic
- There can be many different characteristic samples
- Change the ordering (or the exploring function in RPNI) and the characteristic sample will change
104Conclusion
- RPNI identifies any regular language in the limit
- RPNI works in polynomial time: the complexity is in O(‖X+‖³ · ‖X−‖)
- There are many significant variants of RPNI
- RPNI can be extended to other classes of grammars
105Open problems
- RPNI's complexity bound is not tight. Find the correct complexity.
- The definition of the characteristic set is not tight either. Find a better definition.
106Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
1074.2 The k-reversible languages
- The class was proposed by Angluin (1982)
- The class is identifiable in the limit from text
- The class is composed of the regular languages that can be accepted by a DFA whose reversal is deterministic with look-ahead k
108- Let A = (Σ, Q, δ, I, F) be an NFA;
- we denote by A^T = (Σ, Q, δ^T, F, I) the reversal automaton, with
- δ^T(q, a) = {q' ∈ Q : q ∈ δ(q', a)}
109[Figure: an NFA A over {a, b} with states 0 to 4, and its reversal A^T]
110Some definitions
- u is a k-successor of q if |u| = k and δ(q, u) ≠ ∅
- u is a k-predecessor of q if |u| = k and δ^T(q, u^T) ≠ ∅
- λ is the 0-successor and 0-predecessor of any state
111[Figure: the automaton A from slide 109]
- aa is a 2-successor of 0 and 1 but not of 3
- a is a 1-successor of 3
- aa is a 2-predecessor of 3 but not of 1
112- An NFA is deterministic with look-ahead k iff ∀q, q' ∈ Q, q ≠ q':
- if ({q, q'} ⊆ I) or ({q, q'} ⊆ δ(q'', a) for some state q'' and symbol a)
- then no string u is a k-successor of both q and q'
113Prohibited
[Figure: two distinct a-successors of a state sharing a common k-successor u, |u| = k]
114Example
[Figure: the automaton A from slide 109]
- This automaton is not deterministic with look-ahead 1, but is deterministic with look-ahead 2
115K-reversible automata
- A is k-reversible if A is deterministic and A^T is deterministic with look-ahead k
- Example:
[Figure: two automata over {a, b}: one deterministic, the other deterministic with look-ahead 1]
116Violation of k-reversibility
- Two states q, q' violate the k-reversibility condition iff
- they violate the determinism condition: q, q' ∈ δ(q'', a)
- or they violate the look-ahead condition:
- q, q' ∈ F and ∃u ∈ Σ^k: u is a k-predecessor of both
- or ∃u ∈ Σ^k: δ(q, a) ∩ δ(q', a) ≠ ∅ and u is a k-predecessor of both q and q'
117Learning k-reversible automata
- Key idea: the order in which the merges are performed does not matter!
- Just merge states that do not comply with the conditions for k-reversibility
118K-RL Algorithm
- Data: k ∈ ℕ, X a sample of a k-reversible language L
- A ← APTA(X)
- While ∃q, q' that are k-reversibility violators do
-   A ← merge(A, q, q')
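A small Python rendering of this loop, mirroring the violation conditions of the previous slide. This is a didactic sketch with a brute-force search for k-predecessors, not Angluin's efficient algorithm; since the prefix tree has a single initial state, the "both initial" case cannot arise here:

```python
# NFA stored as {state: {symbol: set(targets)}}, states named by prefixes.
def k_preds(delta, q, k):
    # All length-k strings labelling a path that ends in state q.
    edges = [(s, a, t) for s, d in delta.items()
             for a, ts in d.items() for t in ts]
    frontier = {("", q)}
    for _ in range(k):
        frontier = {(a + u, s) for (u, r) in frontier
                    for (s, a, t) in edges if t == r}
    return {u for (u, _) in frontier}

def violators(delta, finals, k):
    for s, d in delta.items():                    # determinism violated
        for a, ts in d.items():
            if len(ts) > 1:
                return tuple(sorted(ts)[:2])
    states = list(delta)
    for i, q1 in enumerate(states):
        for q2 in states[i + 1:]:
            if k_preds(delta, q1, k) & k_preds(delta, q2, k):
                if q1 in finals and q2 in finals:  # look-ahead: both final
                    return q1, q2
                for a in delta[q1]:                # look-ahead: common successor
                    if a in delta[q2] and delta[q1][a] & delta[q2][a]:
                        return q1, q2
    return None

def merge(delta, finals, q1, q2):
    # Merge q2 into q1 by renaming q2 everywhere.
    for a, ts in delta.pop(q2).items():
        delta[q1].setdefault(a, set()).update(ts)
    for d in delta.values():
        for ts in d.values():
            if q2 in ts:
                ts.discard(q2)
                ts.add(q1)
    if q2 in finals:
        finals.discard(q2)
        finals.add(q1)

def k_rl(Xplus, k):
    delta, finals = {"": {}}, set()               # prefix tree acceptor
    for w in Xplus:
        for i, a in enumerate(w):
            delta.setdefault(w[:i], {}).setdefault(a, set()).add(w[:i + 1])
            delta.setdefault(w[:i + 1], {})
        finals.add(w)
    while True:                                   # merge until no violators
        pair = violators(delta, finals, k)
        if pair is None:
            return delta, finals
        merge(delta, finals, *pair)

# The run of slides 119-121: abba/abbbba merge first, then abb/abbbb.
delta, finals = k_rl(["a", "aa", "abba", "abbbba"], k=2)
print(len(delta), "states")                       # 7 states remain
```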
119Let X = {a, aa, abba, abbbba}, k = 2
[Figure: PTA(X), states named by their prefixes]
Violators, for u = ba: the final states abba and abbbba both have ba as a 2-predecessor
120Let X = {a, aa, abba, abbbba}, k = 2
[Figure: the PTA after merging abba and abbbba]
Violators, for u = bb: abb and abbbb share the 2-predecessor bb and an a-successor
121Let X = {a, aa, abba, abbbba}, k = 2
[Figure: the resulting 2-reversible automaton; merging abb with abbbb creates a b-loop between abb and abbb]
122Properties (1)
- ∀k ≥ 0, ∀X, the k-RL algorithm outputs a k-reversible language
- L(k-RL(X)) is the smallest k-reversible language that contains X
- The class of k-reversible languages is identifiable in the limit from text
123Properties (2)
- A regular language L is k-reversible iff
- (u1v)⁻¹L ≠ ∅ ∧ (u2v)⁻¹L ≠ ∅ ∧ |v| = k
- ⇒ (u1v)⁻¹L = (u2v)⁻¹L
- (two prefixes that end with a common suffix of length at least k are Nerode-equivalent)
124Properties (3)
- L_k-RL(X) ⊇ L_(k+1)-RL(X)
- L_k-TSS(X) ⊇ L_(k-1)-RL(X)
125Properties (4)
- The time complexity is O(k · ‖X‖³)
- The space complexity is O(‖X‖)
- The algorithm is not incremental
126Properties (5): polynomial aspects
- Polynomial characteristic sets
- Polynomial update time
- But not necessarily a polynomial number of mind changes
127Extensions
- Sakakibara built an extension for context-free grammars whose tree language is k-reversible
- Marion & Besombes propose an extension to tree languages
- Different authors propose to learn these automata and then estimate the probabilities, as an alternative to learning stochastic automata
128Exercises
- Construct a language L that is not k-reversible, ∀k ≥ 0
- Prove that the class of k-reversible languages is not in TxtEx
- Run the k-RL algorithm on X = {aa, aba, abb, abaaba, baaba} for k = 0, 1, 2, 3
129Solution (idea)
- Lk = {a^i : i ≤ k}
- Then for each k, Lk is k-reversible but not (k−1)-reversible
- And ∪k Lk = a*
- So there is an accumulation point
130Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
1314.4 Active learning: learning DFA from membership and equivalence queries (the L* algorithm)
132The classes C and H
- sets of examples
- representations of these sets
- the computation of L(x) (and h(x)) must take place in time polynomial in ‖x‖
133Correct learning
- A class C is identifiable with a polynomial number of queries of type T if there exists an algorithm α that
- ∀L ∈ C identifies L with a polynomial number of queries of type T
- does each update in time polynomial in ‖f‖ and in Σ‖xi‖, where the xi are the counter-examples seen so far
134Algorithm L*
- Angluin's papers
- Some talks by Rivest
- Kearns and Vazirani
- Balcázar, Díaz, Gavaldà & Watanabe
135Some references
- Learning regular sets from queries and counter-examples, D. Angluin, Information and Computation, 75, 87-106, 1987
- Queries and concept learning, D. Angluin, Machine Learning, 2, 319-342, 1988
- Negative results for equivalence queries, D. Angluin, Machine Learning, 5, 121-150, 1990
136The Minimal Adequate Teacher
- You are allowed
- strong equivalence queries
- membership queries.
137General idea of L*
- find a consistent table (representing a DFA)
- submit it as an equivalence query
- use the counterexample to update the table
- submit membership queries to make the table complete
- iterate
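A compact sketch of that loop in Python. The MAT is simulated: membership by a predicate, equivalence by brute-force comparison up to a bounded length (a real teacher would answer exactly); counterexample handling follows the slides, adding every prefix of the counterexample to S:

```python
from itertools import product

def lstar(member, alphabet, max_len=6):
    S, E = [""], [""]
    row = lambda s: tuple(member(s + e) for e in E)

    def fix_table():
        # Returns True once the table is closed and consistent.
        for t in [s + a for s in S for a in alphabet]:
            if row(t) not in [row(s) for s in S]:      # not closed
                S.append(t)
                return False
        for s1, s2 in product(S, S):                   # not consistent?
            if s1 != s2 and row(s1) == row(s2):
                for a in alphabet:
                    for e in E:
                        if member(s1 + a + e) != member(s2 + a + e):
                            E.append(a + e)            # distinguishing experiment
                            return False
        return True

    while True:
        while not fix_table():
            pass
        reps = {row(s): s for s in S}        # one representative per state
        def accepts(w):
            q = reps[row("")]
            for a in w:
                q = reps[row(q + a)]         # closedness guarantees this exists
            return member(q)
        cex = next(("".join(w) for n in range(max_len + 1)
                    for w in product(alphabet, repeat=n)
                    if accepts("".join(w)) != member("".join(w))), None)
        if cex is None:
            return S, E                      # passed the equivalence check
        S.extend(c for c in (cex[:i] for i in range(1, len(cex) + 1))
                 if c not in S)              # add all prefixes of the cex to S

# Target: strings over {a, b} with an even number of b's.
S, E = lstar(lambda w: w.count("b") % 2 == 0, "ab")
print(S, E)   # two distinct rows: the two states of the minimal DFA
```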
138An observation table
[Table: rows indexed by the strings λ, a, b, aa, ab; columns by the experiments λ, a; each cell is 1 if the concatenation is in L, 0 otherwise]
139[Table: the same observation table; the columns are the experiments (E), the upper rows the states (S), also called the test set, and the lower rows the transitions (T)]
140Meaning
- δ(q0, λ·λ) ∈ F ⇔ λ ∈ L
[Table: the cell (λ, λ) encodes whether λ ∈ L]
141- δ(q0, ab·a) ∈ F ⇔ aba ∈ L
[Table: the cell (ab, a) encodes whether aba ∈ L]
142Equivalent prefixes
- These two rows are equal, hence δ(q0, λ) = δ(q0, ab)
[Table: the rows of λ and ab coincide]
143Building a DFA from a table
[Table: each distinct row becomes a state]
144[Figure: the DFA built from the table: states correspond to the distinct rows, transitions follow the one-symbol extensions]
145Some rules
- The set E of experiments is suffix-closed
- The set S of states is prefix-closed
- The transition set is T = S·Σ \ S
146An incomplete table
[Table: some cells are not yet filled]
147Good idea
- We can complete the table by making membership queries...
[Figure: to fill cell (u, v), ask the membership query: uv ∈ L?]
148A table is
- closed if any row of T corresponds to some row in S
[Table: a table that is not closed]
149And a table that is not closed
[Table: one row of T matches no row of S]
150What do we do when we have a table that is not closed?
- Let s be the row (of T) that does not appear in S
- Add s to S, and ∀a ∈ Σ, add sa to T
151An inconsistent table
- Are a and b equivalent?
[Table: the rows of a and b are equal, yet their one-symbol extensions differ]
152A table is consistent if
- every pair of equal rows in S remains equal after appending any symbol:
- row(s1) = row(s2)
- ⇒
- ∀a ∈ Σ, row(s1a) = row(s2a)
153What do we do when we have an inconsistent table?
- Let a ∈ Σ be such that row(s1) = row(s2) but row(s1a) ≠ row(s2a)
- If row(s1a) ≠ row(s2a), it is so for some experiment e
- Then add the experiment ae to the table
154What do we do when we have a closed and consistent table?
- We build the corresponding DFA
- We make an equivalence query!!!
155What do we do if we get a counter-example?
- Let u be this counter-example
- ∀w ∈ Pref(u) do
- add w to S
- ∀a ∈ Σ such that wa ∈ Pref(u), add wa to T
156 Run of the algorithm
[Table: the initial table, S = E = {λ}]
Table is now closed and consistent
157An equivalence query is made!
The counterexample baa is returned
158[Table: the prefixes b, ba, baa of the counterexample are added to S; a, bb, bab, baaa, baab are added to T; membership queries fill the new rows]
159[Table: with the experiments {λ, a}]
Table is now closed and consistent
160Proof of the algorithm
A sketch only. Understanding the proof is important for further algorithms; Balcázar et al. is a good place for that.
161Termination / Correctness
- For every regular language there is a unique minimal DFA that recognizes it
- Given a closed and consistent table, one can generate a consistent DFA
- A DFA consistent with a table has at least as many states as there are different rows in S
- If the algorithm has built a table with n different rows in S, where n is the number of states of the minimal target DFA, then it is the target
162Finiteness
- Each closure failure adds one different row to S
- Each inconsistency failure adds one experiment, which also creates a new row in S
- Each counterexample adds one different row to S
163Polynomial
- |E| ≤ n
- at most n − 1 equivalence queries
- membership queries ≤ n(n−1)m, where m is the length of the longest counter-example returned by the oracle
164Conclusion
- With a MAT you can learn DFA
- but also a variety of other classes of grammars
- it is difficult to see how powerful a MAT really is
- probably as much as PAC learning
- It is easy to find a class and a set of queries and to provide an algorithm that learns with them
- it is more difficult for it to be meaningful
- Discussion: why are these queries meaningful?
165Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
1664.5 SEQUITUR
- http://sequence.rutgers.edu/sequitur/
- (Nevill-Manning & Witten, 97)
- Idea: construct a CF grammar from a very long string w, such that L(G) = {w}
- No generalization
- Linear time (+/-)
- Good compression rates
167Principle
- The grammar with respect to the string:
- Each rule has to be used at least twice
- There can be no sub-string of length 2 that appears twice
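The digram-uniqueness invariant can be illustrated offline: while some digram occurs twice, replace it by a fresh non-terminal. The sketch below is closer in spirit to the Re-Pair compressor than to SEQUITUR's incremental, linear-time processing, and it does not enforce the rule-utility constraint, but it realizes the same invariant:

```python
from collections import Counter

def compress(s):
    rules, seq, fresh = {}, list(s), 0
    while True:
        digrams = Counter(zip(seq, seq[1:]))
        digram, count = max(digrams.items(), key=lambda kv: kv[1],
                            default=(None, 0))
        if count < 2:
            return seq, rules                # every digram is now unique
        nt = f"A{fresh}"                     # fresh non-terminal
        fresh += 1
        rules[nt] = list(digram)
        out, i = [], 0
        while i < len(seq):                  # left-to-right, non-overlapping
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out

seq, rules = compress("abcabdabcabd")        # the example string of slide 169
print(seq)                                   # the two repeated halves collapse
print(rules)
```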
168Examples
- S → abcdbc becomes S → aAdA, A → bc
- S → aabaaab becomes S → AaA, A → aab
- or S → AbAab, A → aa
169abcabdabcabd
170- In the beginning, God created the heavens and the earth.
- And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
- And God said, Let there be light: and there was light.
- And God saw the light, that it was good: and God divided the light from the darkness.
- And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.
- And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters.
- And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so.
- And God called the firmament Heaven. And the evening and the morning were the second day.
171[Figure-only slide]
172Sequitur options
- appending a symbol to rule S
- using an existing rule
- creating a new rule
- and deleting a rule.
173Results
- On text:
- SEQUITUR: 2.82 bpc
- compress: 3.46 bpc
- gzip: 3.25 bpc
- PPMC: 2.52 bpc
174Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
1754.6 Using a simplicity bias
- (Langley & Stromsten, 00)
- Based on the algorithm GRIDS (Wolff, 82)
- Main characteristics:
- MDL principle
- Not characterizable
- Not tested on large benchmarks
176Two learning operators
- Creation of non-terminals and rules:
- NP → ART ADJ NOUN, NP → ART ADJ ADJ NOUN
- becomes
- NP → ART AP1, NP → ART ADJ AP1, AP1 → ADJ NOUN
177- Merging two non-terminals:
- NP → ART AP1, NP → ART AP2, AP1 → ADJ NOUN, AP2 → ADJ AP1
- becomes
- NP → ART AP1, AP1 → ADJ NOUN, AP1 → ADJ AP1
178- Scoring function: MDL principle: |G| + Σ_{w∈T} |d(w)|
- Algorithm:
- find the best merge that improves the current grammar
- if no such merge exists, find the best creation
- halt when no improvement
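A minimal sketch of how such a score can be computed, assuming |G| counts one symbol per left-hand side plus the right-hand-side symbols and |d(w)| counts rule applications (the exact measures used by GRIDS may differ):

```python
# MDL-style scoring sketch for comparing grammars.
def grammar_size(rules):
    # rules: non-terminal -> list of right-hand sides (lists of symbols)
    return sum(1 + len(rhs) for rhss in rules.values() for rhs in rhss)

def mdl_score(rules, derivations):
    # derivations: one list of applied rules per training sentence
    return grammar_size(rules) + sum(len(d) for d in derivations)

flat      = {"NP": [["ART", "ADJ", "NOUN"], ["ART", "ADJ", "ADJ", "NOUN"]]}
recursive = {"NP": [["ART", "AP1"]],
             "AP1": [["ADJ", "NOUN"], ["ADJ", "AP1"]]}
print(grammar_size(flat), grammar_size(recursive))   # 9 and 9: a tie here,
# but the recursive grammar keeps |G| fixed as longer noun phrases appear,
# while the flat grammar must add one new rule per new NP shape.
```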
179Results
- On subsets of English grammars (15 rules, 8 non-terminals, 9 terminals): 120 sentences to converge
- on (ab)*: all (15) strings of length ≤ 30
- on Dyck1: all (65) strings of length ≤ 12
180Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
1815 Open questions and conclusions
- dealing with noise
- classes of languages that adequately mix Chomsky's hierarchy with edit distance / compacity
- stochastic context-free grammars
- polynomial learning from text
- learning POMDPs
- fast algorithms
182ERNESTO SÁBATO, EL TÚNEL
- I sensed that I had fallen into a trap and wanted to flee. I made an enormous effort, but it was too late: my body no longer obeyed me. I resigned myself to witnessing what was about to happen, as if it were an event foreign to my person. The man began to transform me into a bird, a human-sized bird. He began with my feet: I saw how they gradually turned into rooster's feet, or something like that. Then the transformation of the whole body followed, upward, the way water rises in a pond. My only hope now lay in my friends, who inexplicably had not arrived. When they finally came, something happened that horrified me: they did not notice my transformation. They treated me as always, which proved that they saw me as always. Thinking that the magician was deluding them into seeing me as a normal person, I decided to tell them what he had done to me. Although my intention was to relate the phenomenon calmly, so as not to worsen the situation by irritating the magician with too violent a reaction (which might lead him to do something even worse), I began shouting out the whole story. Then I observed two astonishing facts: the sentence I wanted to utter came out as a harsh bird's shriek, a desperate and strange shriek, perhaps because of what was still human in it; and, what was infinitely worse, my friends did not hear that shriek, just as they had not seen my great bird's body; on the contrary, they seemed to hear my usual voice saying the usual things, for at no moment did they show the slightest astonishment. I fell silent, terrified. The master of the house then looked at me with a sarcastic glint in his eyes, almost imperceptible and in any case noticed only by me. Then I understood that nobody, ever, would know that I had been transformed into a bird. I was lost forever, and the secret would go with me to the grave.