Title: Learning Theory and Natural Languages
1Learning Theory and Natural Languages
- Presented by Yaron Singer
2Outline
- Introduction
- Formal Learning Theory and motivation
- Gold's Paradigm
- Alternative Models of Language Acquisition
- Strong Nativism (time permitting)
3Introduction
4Introduction to today's presentation
- This presentation is a brief introduction to
formal learning theory (to be defined shortly).
- The questions which will be discussed:
- What are natural languages?
- Which languages can humans learn?
- Do humans learn a language from zero, or do they
have some inborn mechanism which enables language
acquisition (Strong Nativism, Chomsky)?
- Can we impose some constraints and construct a
formal model in which artificial natural language
acquisition is possible?
5Formal Learning Theory and Motivation
6Motivation
- "I wish to construct a precise model for the
intuitive notion 'able to speak a language' in
order to be able to investigate theoretically how
it can be achieved artificially."
- E. M. Gold (1967)
7Comparative Grammar
- Comparative Grammar is the attempt to
characterize the class of natural languages
through formal specification of their grammars.
- Theories of comparative grammar begin with
Chomsky (e.g. 1957, 1965).
8Formal Learning Theory
- What is Formal Learning Theory?
- A link between the results of acquisitional studies
and comparative grammar.
- For example:
- Suppose we prove that, given one rule of grammar
in some language, there is an algorithm which
generates all the rules of grammar of that language.
- Then we find out that children first use only one
rule of grammar.
- We may then hypothesize that children use this
algorithm to generate a grammar.
9What do we know about natural languages?
- One of the most fundamental properties of natural
languages is:
- Children can learn them through unsystematic
exposure within a few years.
10Thus
- If we can construct a formal model of how
children learn a language, we may be able to
train a computer to learn a language.
- Maybe.
11Gold's Paradigm
12Definitions
- For the purpose of this discussion we will need
to define the following:
- Language
- Learner
- Learning Environment
- Criterion of Learning
13Languages and Grammars
- We shall define languages to be sets of
sentences.
- The only constraint is that the set of all
possible sentences in the language is countable.
- All logically possible grammars are defined here
as all possible Turing Machines.
14Decidable Languages
- A language is said to be decidable iff it has a
grammar and its complement has a grammar.
- We focus on non-empty languages.
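The definition of decidability above can be sketched in code. The following is a minimal illustration, assuming that a "grammar" is modelled as a Python generator that enumerates a language rather than as a Turing Machine; the names `decide`, `even_as` and `odd_as` are hypothetical. If both the language and its complement can be enumerated, alternating between the two enumerations settles membership of any sentence in finite time.

```python
from itertools import count
from typing import Callable, Iterator


def decide(sentence: str,
           enum_language: Callable[[], Iterator[str]],
           enum_complement: Callable[[], Iterator[str]]) -> bool:
    """Alternate between the two enumerations until the sentence shows up.

    Assumes both enumerations are infinite (a finite language can repeat
    its members) and that the sentence belongs to one of them.
    """
    positive, negative = enum_language(), enum_complement()
    while True:
        if next(positive) == sentence:
            return True
        if next(negative) == sentence:
            return False


# Toy language (an assumption for illustration): strings of a's of even length.
def even_as() -> Iterator[str]:
    return ("a" * n for n in count(0, 2))


# Its complement relative to the universe of all strings of a's.
def odd_as() -> Iterator[str]:
    return ("a" * n for n in count(1, 2))
```

With only one of the two enumerators, a sentence outside the language would make the search run forever, which is why the definition asks for a grammar on both sides.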
15Environment
- To understand how a learner acquires a language,
we must understand their learning environment.
- Assumptions on the learning environment:
- Sentences are presented one after another with no
ungrammatical intrusions
- Negative information is withheld
- Each sentence in L eventually appears
- Repetitions are allowed
- Sentences can arrive in any order
- Sentences are presented forever
16Learning Environments as Text
- Gold describes environments as infinite text.
- Note that we are already making an assumption
which we know isn't correct:
- We are assuming that all input is grammatical.
- We know that hugs, kisses, smiles, crying, tone,
intonation, volume of speech (shouting,
whispering), etc. affect learning.
- This simplifies our model (can't / won't hug a
computer).
17Texts as Environments
- An environment is referred to as a text.
- A text t is for a language L if every member of L
appears somewhere in t (repetitions are allowed),
and no non-members of L appear in t.
- L(t) denotes the language for which t is a text.
- We denote a text by t, and the first n members of t
are denoted t_n.
- The set of all finite sequences of any length
(t_1, t_2, ...) is denoted SEQ.
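The notions of a text and of its finite prefixes can be made concrete with a small sketch. It assumes a finite toy language, so that a text can be produced by cycling through its members; the helper names `text_for`, `prefix` and `consistent` are illustrative and not part of Gold's formalism.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator, List, Set


def text_for(language: Set[str]) -> Iterator[str]:
    """One possible text for a finite, non-empty language: cycle through it forever."""
    return cycle(sorted(language))


def prefix(text: Iterable[str], n: int) -> List[str]:
    """The first n members of a text (the finite sequence the learner has seen)."""
    return list(islice(text, n))


def consistent(t_n: List[str], language: Set[str]) -> bool:
    """A prefix can only come from a text for L if it contains no non-member of L."""
    return all(sentence in language for sentence in t_n)
```

Every member of the language appears in this text, repetitions occur, and nothing outside the language is ever presented, which matches the three conditions in the definition.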
18Learning Function
- A learning function is defined as any function
from the set of all finite sentence sequences
(denoted SEQ) to the set of possible grammars.
- Note: a learning function may be undefined on
some sequences.
19Children implementing Learning Functions
- A child who acquires a language implements a
learning function, since the child maps finite
sequences of sentences into grammars.
- We make a few assumptions here:
- Linguistic input
- One grammatical hypothesis
20Learning Functions
- We wish to define some criterion that would
enable us to decide what is a good learning
function and what is a bad one.
21Criterion of Learning
- In his paper "Language Identification in the
Limit", Gold suggested the following
criterion for learning:
- A learning function f is defined on a text t if f
is defined on t_n for all n in N.
- If f is defined on t and, for some grammar g in G,
f(t_n) = g for all but finitely many n in N, then f
is said to converge on t to g.
- If f converges on t to a grammar for L(t), then f
is said to identify t.
- If f identifies every text for a language L,
then f is said to identify L.
- If f identifies every language in a set of
languages, then f is said to identify that set of
languages.
- A collection of languages is said to be
identifiable if there is some learning function f
which identifies it.
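The criterion above can be explored on a finite prefix of a text. The sketch below assumes a deliberately simple stand-in for grammars (a frozenset naming the finite language it generates) and a hypothetical learner `guess_seen`; all names are illustrative. A finite prefix can only suggest convergence, since the definition quantifies over the whole infinite text.

```python
from itertools import cycle, islice
from typing import Callable, FrozenSet, List, Optional, Sequence

Grammar = FrozenSet[str]  # stand-in for a grammar: the finite language it generates
Learner = Callable[[Sequence[str]], Optional[Grammar]]


def guess_seen(t_n: Sequence[str]) -> Optional[Grammar]:
    """Conjecture exactly the sentences seen so far (undefined on the empty sequence)."""
    return frozenset(t_n) if t_n else None


def conjectures(f: Learner, text_prefix: Sequence[str]) -> List[Optional[Grammar]]:
    """f(t_1), f(t_2), ..., f(t_n) for the finite prefix supplied."""
    return [f(text_prefix[:n]) for n in range(1, len(text_prefix) + 1)]


def last_change(guesses: List[Optional[Grammar]]) -> int:
    """1-based index from which the conjecture stays fixed within the prefix."""
    n = len(guesses)
    while n > 1 and guesses[n - 2] == guesses[n - 1]:
        n -= 1
    return n


language = frozenset({"a", "b", "c"})
t_12 = list(islice(cycle(["a", "a", "b", "c"]), 12))  # prefix of a text with repetitions
guesses = conjectures(guess_seen, t_12)
```

On this prefix the learner settles on the correct grammar once every sentence of the language has appeared, and `last_change(guesses)` reports the position of its final change of mind.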
22Intuitive Example
- To have some intuition on what is meant by
identifying a language, let's consider the
following example:
- A text t is fed to a learner M, one sentence at a
time.
- With each new input, M is faced with a new finite
sequence of sentences.
- M is defined on t if it offers a hypothesis on
all of these finite sequences of sentences.
- If M is undefined somewhere on t, then it is
stuck.
- If M does not get stuck and after some finite
time converges on t to a grammar g for L(t), M has
identified the language.
(Figure: a text fed to the learner M)
23Questions
24Unidentifiable Collection of Languages
- In his paper, Gold proved the following
remarkable proposition:
- PROPOSITION: Let L be a collection of
languages that includes every finite
language and at least one infinite language.
Then L is not identifiable.
- CONCLUSION: This places a serious constraint on
models of language acquisition by children, as
natural languages are infinite.
- Gold's learning paradigm offers useful conditions
on comparative grammar only to the extent that
the paradigm accurately portrays normal language
acquisition.
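The tension behind Gold's proposition can be illustrated, though not proved, with two toy learners. The sketch assumes a hypothetical grammar encoding (a frozenset for a finite language, the string "ALL" for the infinite language of all non-empty strings of a's); the names are illustrative. One learner handles every finite language but never settles on the infinite one, and the other does the opposite. Gold's actual argument shows that no single learner can do both.

```python
from typing import Callable, FrozenSet, Sequence, Union

# Hypothetical grammar encoding: a frozenset names a finite language,
# the string "ALL" names the infinite language {a, aa, aaa, ...}.
Grammar = Union[FrozenSet[str], str]
Learner = Callable[[Sequence[str]], Grammar]


def finite_learner(t_n: Sequence[str]) -> Grammar:
    """Identifies every finite language: conjecture exactly what has been seen."""
    return frozenset(t_n)


def infinite_learner(t_n: Sequence[str]) -> Grammar:
    """Always conjectures the infinite language, so it is wrong on every finite one."""
    return "ALL"


def mind_changes(f: Learner, text_prefix: Sequence[str]) -> int:
    """Count how often the conjecture changes along a finite prefix."""
    guesses = [f(text_prefix[:n]) for n in range(1, len(text_prefix) + 1)]
    return sum(1 for g, h in zip(guesses, guesses[1:]) if g != h)


# Prefix of a text for the infinite language: a, aa, aaa, ...
infinite_prefix = ["a" * k for k in range(1, 51)]
# Prefix of a text for the finite language {a, aa}.
finite_prefix = ["a", "aa"] * 25
```

On `infinite_prefix` the first learner changes its conjecture with every new sentence, so it never converges, while on `finite_prefix` it changes once and then stays correct. The second learner never changes its mind but is wrong on every finite language.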
25Alternative Models of Language Acquisition
26Exploring alternative models
- We will now explore alternative models to Gold's
model.
- In some ways we will try to make the model
tighter:
- Computable learning functions
- Learning functions which generate infinite
languages
- In other ways we'll try to make the model
looser:
- Noisy text: learning with intrusions.
- We will see the constraints that these
alternative models impose on the languages which
they are able to learn.
27Alternative Models of Language Acquisition
- Do children implement a learning function?
- The current hypothesis is that the learning
functions children implement form a proper
subset of the class of all learning functions.
(Figure: the set "children" drawn inside the set "Learning Functions")
28Computability
- We would like to believe that language
acquisition is computable, i.e. for a natural
language L there exists a Turing Machine M which,
given a text for L, generates a Turing Machine G
that is a grammar for L, so that M identifies L.
- Computable functions are a small subset of all
learning functions.
- If we assume that language acquisition is
computable, what constraints are we imposing?
29Identifiable Computable Languages
- Can every identifiable collection of languages be
identified by a computable learning function?
- PROPOSITION: There are collections L of
languages such that L is identifiable, but no
computable learning function identifies L.
30Is that good or bad?
- Maybe there is a natural language that a child
can learn, but no algorithm exists which enables
a computer to speak that language.
- On the other hand, if we assume that language
acquisition is computable, this proposition
lets us disregard non-computable learning
functions, which narrows down our question of
what natural languages are.
- Even so, there are too many learning
functions. Let's consider subsets of the set of
computable functions.
31Learning Functions
(Figure: diagram with regions labeled "Learning Functions", "Learning functions of identifiable languages" and "Computable")
32Nontriviality
- Natural languages are infinite.
- (No natural language contains a longest
sentence.)
- A learning function is considered nontrivial if:
- It is computable
- It produces a grammar which generates an infinite
language on every finite sequence for which it is
defined.
33Constraints of Nontriviality
- The next proposition shows that nontriviality
imposes limits on the computable learner.
- PROPOSITION: There are collections L of infinite
languages such that some computable function
identifies L, but no nontrivial learning
function identifies L.
- If children are nontrivial learners, then there
are collections of infinite languages beyond
their reach that might otherwise (if they're not
nontrivial learners) have been available.
34Learning Functions
(Figure: diagram with regions labeled "Learning Functions", "Learning functions of identifiable languages", "Computable", "Nontrivial" and "Children?")
35Natural Environments
- Gold has defined the environment as a text.
- On the one hand:
- We know that a real learning environment contains
ungrammatical intrusions, as well as omissions of
some grammatical sentences.
- On the other hand:
- An arbitrary text presents sentences in an
arbitrary order. Usually this is not the case.
- "Waldo is in the red house."
- "All positive even integers greater than two can
be expressed as the sum of two primes."
36Noisy Text
- Suppose D is some arbitrary finite set.
- A noisy text for a language L is a text for L ∪
D. That is, a text for L into which a finite number
of intrusions has been inserted.
- It is easy to see that no collection of languages
that includes finite variants (languages differing
by only finitely many sentences) is identifiable on
noisy text.
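The claim about finite variants can be checked on a toy case. The sketch below assumes finite languages so that texts can be generated by cycling; `noisy_text`, `L1` and `L2` are hypothetical names. It builds one noisy text for each of two languages that differ by a single sentence and shows that the two texts coincide, so no learner could tell from the input which language it is facing.

```python
from itertools import cycle, islice
from typing import Iterator, Set


def noisy_text(language: Set[str], intrusions: Set[str]) -> Iterator[str]:
    """A text for L ∪ D, for finite L and D: cycle through the union forever."""
    return cycle(sorted(language | intrusions))


L1 = {"a", "b"}
L2 = {"a", "b", "c"}  # finite variant of L1: differs by the single sentence "c"

# Noisy text for L1 with intrusion set D = {"c"}.
t_for_L1 = list(islice(noisy_text(L1, {"c"}), 9))
# Noisy text for L2 with the empty intrusion set (the empty set is finite).
t_for_L2 = list(islice(noisy_text(L2, set()), 9))
```

Because the same sequence is a legitimate noisy text for both languages, a learner that converges on it can be right about at most one of them.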
37Infinite Languages on Noisy Text
- But what about infinite languages?
- PROPOSITION: There are collections L of
infinite, disjoint, and computable languages such
that no computable function identifies L on noisy
text.
- Conclusion: trying to construct a model on noisy
text imposes serious constraints!
38Strong Nativism
39Summary
- We have presented Gold's paradigm and the
difficulties with the assumptions it makes.
- We have explored a few alternative models, and
witnessed their constraints.
- We have briefly introduced the concept of Strong
Nativism.
- Ultimately, we might hope to find sufficiently
powerful conditions on the human learning
function, on the environments in which it
typically operates, and on the criterion of
success to uniquely define the class of natural
languages.