Title: Language Learning Week 11
1. Language Learning Week 11
Pieter Adriaans (pietera@science.uva.nl)
Sophia Katrenko (katrenko@science.uva.nl)
2. Contents Week 11
- Learning Human Languages
- Learning context-free grammars
- Emile
3. GI Research Questions
- Research Question: What is the complexity of human language?
- Research Question: Can we make a formal model of the language development of young children that allows us to understand
  - why the process is efficient?
  - why the process is discontinuous?
- Underlying Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- Research Question: Semantic learning, e.g. can we construct ontologies for specific domains from (scientific) text?
4. Chomsky Hierarchy and the Complexity of Human Language
5. Complexity of Natural Language: the Zipf distribution
[Figure: Zipf rank-frequency curve, with a structured high-frequency core and a heavy low-frequency tail]
6. Observations
- Word frequencies in human utterances are dominated by power laws (see the rank-frequency sketch below):
  - a structured high-frequency core
  - a heavy low-frequency tail
- Open versus closed word classes (function words)
- Natural language is open; grammar is elastic. The occurrence of new words is a natural phenomenon. Syntactic/semantic bootstrapping must play an important role in language learning.
- Bootstrapping will be important for ontology learning as well as for child language acquisition.
- A better understanding of NL distributions is necessary.
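A minimal sketch, not from the slides, of how the power-law claim can be checked on a corpus: compute the rank-frequency table; under a Zipf-like distribution, log frequency falls roughly linearly with log rank. The file name `corpus.txt` is a hypothetical placeholder.

```python
# Sketch only: rank-frequency table of a corpus; "corpus.txt" is a placeholder.
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]

if __name__ == "__main__":
    corpus = open("corpus.txt", encoding="utf-8").read()   # hypothetical input file
    for rank, word, freq in rank_frequency(corpus)[:20]:   # the high-frequency core
        print(rank, word, freq)
```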
7. Learning NL from text: probabilistic versus recursion-theoretic approaches
- 1967, Gold: any super-finite class of languages (which includes the regular languages and everything further up the Chomsky hierarchy) cannot be learned from positive data alone.
- 1969, Horning: probabilistic context-free grammars can be learned from positive data. Given a text T and two grammars G1 and G2, we are able to approximate max(P(G1|T), P(G2|T)).
- ICGI, > 1990: empirical approach. Just build algorithms and try them; approximate NL from below: Finite → Regular → Context-free → Context-sensitive.
8. Situation < 2004
- GI seems to be hard
- No identification in the limit
- Ill-understood power laws dominate (word) frequencies in human communication
- Machine learning algorithms have difficulties in these domains
- PAC learning does not converge on these domains
- Nowhere near learning natural languages
- We were running out of ideas
9. Situation < 2004: Learning Regular Languages
- Reasonable success in learning regular languages of moderate complexity (Evidence-Based State Merging, Blue-Fringe)
- Transparent representation: Deterministic Finite Automata (DFA); a minimal sketch follows below
- DEMO
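A minimal sketch, assuming the standard state-merging setup: the transparent DFA representation starts from a prefix tree acceptor (PTA) built from the positive sample; Evidence-Based State Merging / Blue-Fringe learners then merge compatible states. The merging step itself is not shown here.

```python
# Sketch only: the prefix tree acceptor (PTA) underlying state-merging learners.
def build_pta(samples):
    """Build a DFA whose states are the prefixes of the positive sample."""
    delta = {0: {}}            # state -> {symbol: next state}
    accepting = set()
    next_state = 1
    for word in samples:
        state = 0
        for symbol in word:
            if symbol not in delta[state]:
                delta[state][symbol] = next_state
                delta[next_state] = {}
                next_state += 1
            state = delta[state][symbol]
        accepting.add(state)
    return delta, accepting

def accepts(delta, accepting, word):
    state = 0
    for symbol in word:
        if symbol not in delta[state]:
            return False
        state = delta[state][symbol]
    return state in accepting

delta, accepting = build_pta(["ab", "abab", "ababab"])
print(accepts(delta, accepting, "abab"))   # True: in the sample
print(accepts(delta, accepting, "ba"))     # False: the PTA accepts only the sample
# A state-merging learner would now merge compatible states to generalize.
```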
10. Situation < 2004: Learning Context-free Languages
- A number of approaches: learning probabilistic CFGs, the Inside-Outside algorithm, EMILE, ABL
- No transparent representation: Push-Down Automata (PDA) are not really helpful for modelling the learning process
- No adequate convergence on interesting real-life corpora
- The problem of sparse data sets
- Complexity issues are ill-understood
11. EMILE: natural language allows bootstrapping
- Lewis Carroll's famous poem 'Jabberwocky' starts with:
  - 'Twas brillig, and the slithy toves
  - Did gyre and gimble in the wabe:
  - All mimsy were the borogoves,
  - And the mome raths outgrabe.
12. EMILE: Characteristic Expressions and Contexts
- An expression of a type T is characteristic for T if it only appears with contexts of type T.
- Similarly, a context of a type T is characteristic for T if it only appears with expressions of type T.
- Let G be a grammar (context-free or otherwise) of a language L. G has context separability if each type of G has a characteristic context, and expression separability if each type of G has a characteristic expression.
- Natural languages seem to be context- and expression-separable.
- This is nothing but stating that languages can define their own concepts internally ("... is a noun", "... is a verb"). A toy illustration follows below.
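A toy illustration, not from the slides: given a hand-labelled set of occurrences, an expression is characteristic for a type T exactly when every context it appears with has type T. The mini-corpus and type labels below are invented.

```python
# Toy data: (expression, its type, context, the context's type); all labels invented.
occurrences = [
    ("the dog",      "NP", "(.) barks",     "NP"),
    ("the dog",      "NP", "John sees (.)", "NP"),
    ("barks",        "VP", "the dog (.)",   "VP"),
    ("sees the dog", "VP", "John (.)",      "VP"),
]

def characteristic_expressions(occurrences, t):
    """Expressions of type t that only ever appear with contexts of type t."""
    exprs = {e for e, et, _, _ in occurrences if et == t}
    return {e for e in exprs
            if all(ct == t for e2, _, _, ct in occurrences if e2 == e)}

print(characteristic_expressions(occurrences, "NP"))   # {'the dog'}
```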
13. EMILE: Natural languages are shallow
- A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C).
- This seems to hold for natural languages: large dictionaries, low thickness.
14. Regular versus context-free: merging versus clustering
[Figure: learning regular languages merges automaton states; learning context-free languages clusters contexts and expressions]
15. The EMILE learning algorithm
- One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under m (the universal distribution).
- General idea: split every sentence into an expression and a surrounding context, and cluster.
[Figure: a sentence split into an expression and its context]
16. Grammar Formalisms: Context-free
- Context-free grammar (a small executable version follows below):
  Sentence → Name Verb
  Sentence → Name T_Verb Name
  Name → Mary | John
  Verb → walks
  T_Verb → loves
- Sentences: John loves Mary; Mary walks
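A minimal sketch, not from the slides: the toy grammar above written as a Python dictionary, with a brute-force generator that enumerates every derivable sentence. The representation and function names are my own.

```python
# Sketch only: the slide's grammar as a dictionary, plus a brute-force generator.
from itertools import product

grammar = {
    "Sentence": [["Name", "Verb"], ["Name", "T_Verb", "Name"]],
    "Name":     [["Mary"], ["John"]],
    "Verb":     [["walks"]],
    "T_Verb":   [["loves"]],
}

def expand(symbol):
    """Yield every word sequence derivable from `symbol` (this grammar has no recursion)."""
    if symbol not in grammar:                      # terminal symbol
        yield [symbol]
        return
    for rhs in grammar[symbol]:
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield [w for part in parts for w in part]

for sentence in expand("Sentence"):
    print(" ".join(sentence))   # John loves Mary, Mary walks, ... (6 sentences in total)
```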
17. Grammar Formalisms: Categorial Grammars
- Categorial Grammar (lexicalistic):
  loves → Name \ Sentence / Name
  walks, runs → Name \ Sentence
  Mary, John → Name
- Parsing as deduction (derivation of "John loves Mary"; a small recognizer sketch follows below):
  Sentence
  Name   Name\Sentence
  Name   (Name\Sentence)/Name   Name
  John   loves                  Mary
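A minimal recognizer sketch for the derivation above, assuming only the two application rules A · (A\B) ⊢ B and (B/A) · A ⊢ B; the tuple encoding of categories and the CYK-style chart are my own choices, not EMILE code.

```python
# Sketch only: CYK-style recognizer with just the two application rules.
# Categories: atoms, ("\\", A, B) for A\B, ("/", B, A) for B/A.
LEXICON = {
    "John":  {"Name"},
    "Mary":  {"Name"},
    "walks": {("\\", "Name", "Sentence")},                  # Name\Sentence
    "loves": {("/", ("\\", "Name", "Sentence"), "Name")},   # (Name\Sentence)/Name
}

def combine(x, y):
    """Categories derivable from adjacent constituents with categories x and y."""
    out = set()
    if isinstance(y, tuple) and y[0] == "\\" and y[1] == x:   # x . (x\b)  =>  b
        out.add(y[2])
    if isinstance(x, tuple) and x[0] == "/" and x[2] == y:    # (b/y) . y  =>  b
        out.add(x[1])
    return out

def parse(words):
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for x in chart[(i, k)]:
                    for y in chart[(k, i + span)]:
                        cell |= combine(x, y)
            chart[(i, i + span)] = cell
    return chart[(0, n)]

print(parse("John loves Mary".split()))   # {'Sentence'}
print(parse("Mary walks".split()))        # {'Sentence'}
```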
18. Categorial Grammar: propositional calculus without structural rules
- Interchange: from x, A, y, B, z ⊢ C infer x, B, y, A, z ⊢ C
- Contraction: from x, A, A, y ⊢ C infer x, A, y ⊢ C
- Thinning: from x, y ⊢ C infer x, A, y ⊢ C
- Logic: A, (A → B) ⊢ B and (A → B), A ⊢ B
- Grammar: A · (A \ B) ⊢ B and (A / B) · A ⊢ B
19. Categorial Grammar Formalism: Algebraic specification
- M is a multiplicative system
- A · B = { x · y ∈ M | x ∈ A, y ∈ B }
- C / B = { x ∈ M | ∀y ∈ B: x · y ∈ C }
- A \ C = { y ∈ M | ∀x ∈ A: x · y ∈ C }
20. Categorial Grammar Formalism: Algebraic specification as database operations
- Name = {John, Mary}
- Verb = {walks, runs}
- S = Name · Verb = {John, Mary} · {walks, runs} = {John walks, John runs, Mary walks, Mary runs}
21. Categorial Grammar Formalism: Algebraic specification as database operations
- Name \ S = {John, Mary} \ {John walks, John runs, Mary walks, Mary runs} = {walks, runs}
- S / Verb = {John walks, John runs, Mary walks, Mary runs} / {walks, runs} = {John, Mary}
- A small executable version of these operations follows below.
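A minimal executable sketch of the algebraic operations of slides 19-21, using space-separated concatenation as the product and finite sets of phrases as types. The real definitions quantify over all of M; here the candidate prefixes and suffixes are restricted to those occurring in C, which suffices for this finite example.

```python
# Sketch only: types as finite sets of phrases, product = space-separated concatenation.
def mul(A, B):
    """A . B = { x.y | x in A, y in B }"""
    return {x + " " + y for x in A for y in B}

def right_div(C, B):
    """C / B: all x such that x.y is in C for every y in B (candidates taken from C)."""
    candidates = {" ".join(c.split()[:-len(b.split())]) for c in C for b in B}
    return {x for x in candidates if x and all(x + " " + y in C for y in B)}

def left_div(A, C):
    """A \\ C: all y such that x.y is in C for every x in A (candidates taken from C)."""
    candidates = {" ".join(c.split()[len(a.split()):]) for c in C for a in A}
    return {y for y in candidates if y and all(x + " " + y in C for x in A)}

Name = {"John", "Mary"}
Verb = {"walks", "runs"}
S = mul(Name, Verb)
print(S)                    # {'John walks', 'John runs', 'Mary walks', 'Mary runs'}
print(left_div(Name, S))    # Name \ S  = {'walks', 'runs'}
print(right_div(S, Verb))   # S / Verb  = {'John', 'Mary'}
```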
22. EMILE 3.0 stages: Take sample
- Sample: John loves Mary; Mary walks
23. EMILE 3.0 stages: First-order explosion
- Sample: John loves Mary; Mary walks
- S/loves Mary → John
- S/Mary → John loves
- S → John loves Mary
- John\S/Mary → loves
- John\S → loves Mary
- John loves\S → Mary
- S/walks → Mary
- S → Mary walks
- Mary\S → walks
- (A sketch that enumerates these splits follows below.)
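A minimal sketch of the first-order explosion: every split of a sample sentence into left context, expression and right context is turned into a rule left\S/right → expression. The plain-string notation mirrors the slide; the function name is my own.

```python
# Sketch only: every split of a sentence yields a rule "left\S/right -> expression".
def explode(sentence):
    words = sentence.split()
    rules = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            left, expr, right = words[:i], words[i:j], words[j:]
            lhs = "S"
            if left:
                lhs = " ".join(left) + "\\" + lhs
            if right:
                lhs = lhs + "/" + " ".join(right)
            rules.append((lhs, " ".join(expr)))
    return rules

for sentence in ["John loves Mary", "Mary walks"]:
    for lhs, expr in explode(sentence):
        print(lhs, "->", expr)
# S/loves Mary -> John, S/Mary -> John loves, ..., John\S/Mary -> loves, ..., Mary\S -> walks
```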
24. EMILE 3.0 stages: First-order explosion
- Sample: John loves Mary; Mary walks
25. EMILE 3.0 stages: Complete first-order explosion
- Sample: John loves Mary; Mary walks
26. EMILE 3.0 stages: Clustering
- Sample: John loves Mary; Mary walks
27. EMILE 3.0 stages: Clustering
- Sample: John loves Mary; Mary walks
28. EMILE 3.0 stages: Clusters → non-terminal names
- Sample: John loves Mary; Mary walks
- The clusters are given the non-terminal names A, B, C, D, E
29. EMILE 3.0 stages: Proto-rules
- S/loves Mary → A, S/Mary → B, S → C, John\S/Mary → D, John\S → E, John loves\S → A, S/walks → A, Mary\S → E
- A → John, B → John loves, C → John loves Mary, D → loves, E → loves Mary, A → Mary, C → Mary walks, E → walks
- (A clustering sketch that reproduces these groups follows below.)
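A minimal sketch, not the actual EMILE 3.0 implementation, of the complete first-order explosion plus clustering: a membership oracle (here simply the listed target language, an assumption made for illustration) fills the context × expression matrix, and expressions that fit exactly the same contexts are grouped. On this two-sentence sample the grouping reproduces the five clusters A-E.

```python
# Sketch only: membership-oracle variant of the complete explosion + clustering.
from collections import defaultdict

# Target language of the toy grammar, used here as the membership oracle (an assumption).
LANGUAGE = {"John walks", "Mary walks", "John loves John",
            "John loves Mary", "Mary loves John", "Mary loves Mary"}

def splits(sentence):
    """All (left context, expression, right context) splits of a sentence."""
    w = sentence.split()
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            yield tuple(w[:i]), tuple(w[i:j]), tuple(w[j:])

sample = ["John loves Mary", "Mary walks"]
contexts, expressions = set(), set()
for s in sample:
    for left, expr, right in splits(s):
        contexts.add((left, right))
        expressions.add(expr)

def fits(context, expr):
    left, right = context
    return " ".join(left + expr + right) in LANGUAGE

# Group expressions that fit exactly the same set of contexts.
clusters = defaultdict(set)
for expr in expressions:
    clusters[frozenset(c for c in contexts if fits(c, expr))].add(expr)

for name, exprs in zip("ABCDE", clusters.values()):   # naming order is arbitrary
    print(name, sorted(" ".join(e) for e in exprs))
# One cluster is {John, Mary}, another {loves Mary, walks}, etc., as on the slide.
```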
30. EMILE 3.0 stages: Generalize into context-free rules
- From the proto-rule John\S/Mary → D we obtain S → John D Mary.
- Since A → John and A → Mary (characteristic expressions), this generalizes to S → A D A.
- Resulting grammar (even a fragment of it already generalizes beyond the sample; see the sketch below):
  S → A D A | B A | C | A E
  A → John | Mary
  B → A D
  C → A D A | A E
  D → loves
  E → A D | walks
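A small check, under the rules as reconstructed above, that the induced grammar generalizes: already the fragment S → A D A | A E with A → John | Mary, D → loves, E → walks yields sentences that were not in the two-sentence sample.

```python
# Sketch only: expand the fragment S -> A D A | A E, A -> John|Mary, D -> loves, E -> walks.
A, D, E = ["John", "Mary"], ["loves"], ["walks"]
generated = [f"{a1} {d} {a2}" for a1 in A for d in D for a2 in A] + \
            [f"{a} {e}" for a in A for e in E]
print(generated)
# 6 sentences, e.g. 'Mary loves John', of which only 2 were in the original sample.
```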
31. Theorem (Adriaans 1992)
- If a language L has a context-free grammar G that is shallow, L is sampled according to the Universal Distribution, and a member-check function is available, then L can be learned efficiently from text.
- Assumptions: natural language is shallow; the distribution of sentences in a text is simple.
32. EMILE 3.0 (1992): Problems, not very practical
- Take sample: positive examples
- First-order explosion: deduction
- Complete first-order explosion: positive and negative examples
- Clustering: deduction
- Non-terminal names: deduction
- Proto-rules: induction
- Context-free rules: induction
33. EMILE 3.0 (1992): Problems
- Take sample: positive examples
- First-order explosion: deduction
- Complete first-order explosion: positive and negative examples
- Clustering: deduction
- Non-terminal names: deduction
- Proto-rules: induction
- Context-free rules: induction
- Supervised, not from text alone: speakers do not give negative examples
34. EMILE 3.0 (1992): Problems
- Take sample: positive examples
- First-order explosion: deduction
- Complete first-order explosion: positive and negative examples
- Clustering: deduction
- Non-terminal names: deduction
- Proto-rules: induction
- Context-free rules: induction
- Supervised, not from text alone: speakers do not give negative examples
- Polynomial, but very complex due to overlapping clusters
35. EMILE 3.0 (1992): Only theoretical value
- Take sample: positive examples
- First-order explosion: deduction
- Complete first-order explosion: positive and negative examples
- Clustering: deduction
- Non-terminal names: deduction
- Proto-rules: induction
- Context-free rules: induction
- Supervised, not from text alone: speakers do not give negative examples
- Polynomial, but very complex due to overlapping clusters
- Batch oriented, not incremental
36. EMILE 4.1 (2000), Vervoort
- Unsupervised
- Two-dimensional clustering: random search for maximized blocks in the matrix
- Incremental: thresholds for the filling degree of blocks
- Simple (but sloppy) rule induction using characteristic expressions
37. Clustering (2-dimensional)
- John makes tea
- John likes tea
- John likes eating
- John makes coffee
- John likes coffee
- John is eating
- (A sketch of the context × expression matrix for these sentences follows below.)
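A minimal sketch of the two-dimensional clustering idea on these sentences: build the context × expression matrix from all splits and test whether a candidate block is completely filled; EMILE 4.1 searches (randomly, with support thresholds) for large such blocks. The block tested below is chosen by hand for illustration.

```python
# Sketch only: context x expression matrix for the slide's sentences, plus a block test.
sentences = ["John makes tea", "John likes tea", "John likes eating",
             "John makes coffee", "John likes coffee", "John is eating"]

observed = set()
for s in sentences:
    w = s.split()
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            context = (" ".join(w[:i]), " ".join(w[j:]))   # (left part, right part)
            observed.add((context, " ".join(w[i:j])))

def filled(contexts, expressions):
    """True iff every context/expression combination occurs in the sample."""
    return all((c, e) in observed for c in contexts for e in expressions)

block_contexts = [("John makes", ""), ("John likes", "")]
print(filled(block_contexts, ["tea", "coffee"]))             # True: a fully filled block
print(filled(block_contexts, ["tea", "coffee", "eating"]))   # False: 'John makes eating' unseen
```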
38. EMILE 4.1: Clustering sparse matrices of contexts and expressions
[Figure: sparse matrix with contexts as rows and expressions as columns; a characteristic context and a characteristic expression mark a type's block]
39. EMILE is guaranteed to find types with the right settings
- Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least the lengths of c_ch and e_ch, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<maxC and T<maxE be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<maxC and expressions from T<maxE, then EMILE will find type T. (Vervoort 2000)
40. Original grammar
- S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
- NP → NP_a | NP_p
- VP_a → V_t NP | V_t NP P NP_p
- NP_a → John | Mary | the man | the child
- NP_p → the car | the city | the house | the shop
- P → with | near | in | from
- V_i → appears | is | seems | looks
- V_s → thinks | hopes | tells | says
- V_t → knows | likes | misses | sees
- ADV → large | small | ugly | beautiful
41. Learned grammar after 100,000 examples
- 0 → 17 6
- 0 → 17 22 17 6
- 0 → 17 22 17 22 17 22 17 6
- 6 → misses 17 | likes 17 | knows 17 | sees 17
- 6 → 22 17 6
- 6 → appears 34 | looks 34 | is 34 | seems 34
- 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
- 17 → the child | Mary | the city | the man | John | the car | the house | the shop
- 22 → tells that | thinks that | hopes that | says that
- 22 → 22 17 22
- 34 → small | beautiful | large | ugly
42. Bible books
- King James version
- 31,102 verses, 82,935 lines
- 4.8 MB of English text
- "001001 In the beginning God created the heaven and the earth."
- 66 experiments with increasing sample size
- Initially the Book of Genesis, then the Book of Exodus, ...
- Full run: 40 minutes, 500 MB on an Ultra-2 SPARC
43. Bible books
44. GI on the Bible
- 0 → Thou shall not 582
- 0 → Neither shalt thou 582
- 582 → eat it
- 582 → kill .
- 582 → commit adultery .
- 582 → steal .
- 582 → bear false witness against thy neighbour .
- 582 → abhor an Edomite
45. Knowledge base in the Bible
- Dictionary Type 76: Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah, Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah, Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon, Kohath, Merari, Aaron, Amram, Mushi, Shimei, Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan, Zophah, Elpaal, Jehieli
- Dictionary Type 362: plague, leprosy
- Dictionary Type 414: Simeon, Judah, Dan, Naphtali, Gad, Asher, Issachar, Zebulun, Benjamin, Gershom
- Dictionary Type 812: two, three, four
- Dictionary Type 1056: priests, Levites, porters, singers, Nethinims
- Dictionary Type 978: afraid, glad, smitten, subdued
- Dictionary Type 2465: holy, rich, weak, prudent
- Dictionary Type 3086: Egypt, Moab, Dumah, Tyre, Damascus
- Dictionary Type 4082: heaven, Jerusalem
46. Evaluation
- Strengths: works efficiently on large corpora; learns (partial) grammars; unsupervised
- Weaknesses:
  - EMILE 4.1 needs a lot of input
  - Convergence to a meaningful syntactic type is rarely observed
  - Types seem to be semantic rather than syntactic
- Why? Hypothesis: the distribution in real-life text is semantic, not syntactic
- But, most of all: sparse data!
47. Contents Week 11
- Learning Human Languages
- Learning context-free grammars
- Emile