Title: Systematicity in sentence processing by recurrent neural networks
1. Systematicity in sentence processing by recurrent neural networks
- Stefan Frank
- Nijmegen Institute for Cognition and Information
- Radboud University Nijmegen
- The Netherlands
2. "Please make it heavy on computers and AI and light on the psycho stuff" (Konstantopoulos, personal communication, December 23, 2005)
3. Systematicity in language
- Imagine you meet someone who only knows two sentences of English:
Could you please tell me where the toilet is?
I can't find my hotel.
So (s)he does not know:
Could you please tell me where my hotel is?
I can't find the toilet.
This person has no knowledge of English but has simply memorized some lines from a phrase book.
4. Systematicity in language
- Human language behavior is (more or less) systematic: if you know some sentences, you know many.
- Sentences are not atomic but made up of words.
- Likewise, words can be made up of morphemes (e.g., un + clear → unclear, un + stable → unstable, …).
- It seems like language results from applying a set of rules (grammar, morphology) to symbols (words, morphemes).
5. Systematicity in language
- The Classical symbol system hypothesis: the mind contains word-like symbols that are manipulated by structure-sensitive processes (Fodor & Pylyshyn, 1988). E.g., for dealing with language:
- boy and girl are nouns (N)
- loves and sees are verbs (V)
- N V N is a possible sentence structure
- This hypothesis explains the systematicity found in language: if you know the N V N structure, you know all N V N sentences (boy sees girl, girl loves boy, boy sees boy, …)
6. Some issues for the Classical theory
- Lack of systematic behavior: Why are people often so unsystematic in practice?
The boy plays. OK
The boy who the girl likes plays. OK
The boy who the girl who the man sees likes plays. OK?
The athlete who the coach who the sponsor hired trained won. OK!
7. Some issues for the Classical theory
- Lack of systematic behavior: Why are people often so unsystematic in practice?
- Lack of systematicity in language: Why are there exceptions to rules?
help + ful → helpful, help + less → helpless
meaning + ful → meaningful, meaning + less → meaningless
beauty + ful → beautiful, beauty + less → ugly (not beautiless)
8. Some issues for the Classical theory
- Lack of systematic behavior: Why are people often so unsystematic in practice?
- Lack of systematicity in language: Why are there exceptions to rules?
- Development: How do children learn the rules from what they hear?
The Classical theory has answers to these
questions, but no explanations.
9. Connectionism
- The state of mind is represented as a pattern of activity over a large number of simple, quantitative (i.e., non-logical) processing units (neurons).
- These units are connected by weighted links, forming a (neural) network through which activation moves around.
- The connection weights are adjusted to the network's input and task.
- The network develops its own internal representation of the input.
- It should generalize to new (test) inputs.
10. Connectionism and the Classical issues
- Lack of systematic behavior: Systematicity is built on top of an unsystematic architecture.
- Lack of systematicity in language: "Beautiless" is expected statistically but never occurs, so the network learns it doesn't exist.
- Development: The network adapts to its input.
But can neural networks explain systematicity, or
even behave systematically?
11. Connectionism and systematicity
- Fodor & Pylyshyn (1988): Neural networks cannot be systematic. They only learn to associate examples rather than becoming sensitive to structure.
- Systematicity: knowing X → knowing Y. Generalization: training on X → learning Y. So, systematicity equals generalization (Hadley, 1994).
- Demonstrations of connectionist systematicity:
- require many training examples but use only few tests
- are not robust: oversensitive to training details
- only display weak systematicity: words occur in the same syntactic positions in training and test sentences
12. Simple Recurrent Networks: Elman (1990)
Feedforward networks have long-term memory (LTM)
but no short-term memory (STM). So how to process
sequential input, like the words of a sentence?
[Diagram: a feedforward network with input, hidden, and output layers]
13. SRNs and systematicity: Van der Velde et al. (2004)
- An SRN processed a minilanguage with:
- 18 words (boy, girl, loves, sees, who, ., …)
- 3 sentence types:
- N V N . (boy sees girl.)
- N V N who V N . (boy sees girl who loves boy.)
- N who N V V N . (boy who girl sees loves boy.)
- Nouns and verbs were divided into four groups; each had two nouns and two verbs.
- In training sentences, nouns and verbs were from the same group: < 0.44% of possible sentences were used for training.
- In test sentences, nouns and verbs came from different groups. Note: this tests weak systematicity only (see the sketch below).
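The same-group/cross-group split is easy to make concrete. Below is a minimal Python sketch; the group assignments and word lists are illustrative placeholders rather than Van der Velde et al.'s actual lexicon, and only the N V N type is generated:

```python
import itertools

# Illustrative lexicon: 4 groups, each with 2 nouns and 2 verbs
# (placeholder words, not the study's actual lexicon).
GROUPS = [
    (["boy", "girl"],  ["loves", "sees"]),
    (["man", "woman"], ["hears", "calls"]),
    (["cat", "dog"],   ["bites", "chases"]),
    (["bird", "fish"], ["follows", "likes"]),
]

def nvn(nouns, verbs):
    """All 'N V N .' sentences over the given nouns and verbs."""
    return [f"{n1} {v} {n2} ." for n1, v, n2 in itertools.product(nouns, verbs, nouns)]

# Training sentences: nouns and verbs from the SAME group.
train = [s for nouns, verbs in GROUPS for s in nvn(nouns, verbs)]

# Test sentences: nouns from OTHER groups than the verb's (weak systematicity:
# every word still occurs in the same syntactic position as in training).
all_nouns = [n for nouns, _ in GROUPS for n in nouns]
test = [s for nouns, verbs in GROUPS
        for s in nvn([n for n in all_nouns if n not in nouns], verbs)]

print(len(train), "training /", len(test), "test N V N sentences")  # 32 / 288
```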
14. SRNs and systematicity: Van der Velde et al. (2004)
- SRNs fail on test sentences, so:
- they do not generalize to structurally similar sentences
- they cannot learn systematic behavior from a small training set
- they do not form good models of human language behavior
- But:
- What does it mean to fail? Maybe the network was more than completely non-systematic?
- Was the size of the network appropriate?
- larger network → more STM → better processing?
- smaller network → less LTM → better generalization?
- Was the language complex enough? With more different words there is more reason to abstract to syntactic types (nouns, verbs).
15. SRNs and systematicity: replication of Van der Velde et al. (2004)
- What if a network does not generalize at all? When given a new sentence, it can only use the last word, because combining words requires generalization.
- This hypothetical, unsystematic network serves as the baseline for rating SRN performance:
- Performance = 1: the network never makes ungrammatical predictions.
- Performance = 0: the network does not generalize at all, but gives the best possible output based on the last word.
- Performance = −1: the network only makes ungrammatical predictions.
- Positive performance indicates systematicity (see the sketch below).
16. Network architecture
[Diagram, bottom to top: input layer (w = 18 units, one for each word) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units, one for each word); a code sketch follows]
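A minimal numpy sketch of a forward pass through this architecture; the logistic activation, the random weight initialization, and the absence of bias terms are assumptions, not details taken from the slide:

```python
import numpy as np

w, n, h = 18, 20, 10  # lexicon size, recurrent units, hidden units

rng = np.random.default_rng(0)
W_in  = rng.normal(0, 0.1, (n, w))  # input -> recurrent layer
W_rec = rng.normal(0, 0.1, (n, n))  # recurrent layer -> itself (the STM)
W_hid = rng.normal(0, 0.1, (h, n))  # recurrent -> hidden layer
W_out = rng.normal(0, 0.1, (w, h))  # hidden -> output (next-word prediction)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def process(word_ids):
    """Feed a word-index sequence through the SRN, one word at a time."""
    state = np.zeros(n)              # recurrent state: the short-term memory
    for i in word_ids:
        x = np.zeros(w)
        x[i] = 1.0                   # one-hot input, one unit per word
        state = sigmoid(W_in @ x + W_rec @ state)
        hidden = sigmoid(W_hid @ state)
        out = sigmoid(W_out @ hidden)
    return out                       # activation per possible next word

print(process([0, 5, 1]).shape)      # e.g. "boy sees girl" -> (18,)
```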
17. SRN Results
Positive performance at each word of each test
sentence type, so there is some systematicity.
18. SRN Results: effect of recurrent layer size
[Plots of performance for sentence types N V N, N V N who V N, and N who N V V N]
Larger networks (n = 40) do better, but very large ones (n = 100) overfit.
19. SRN performance and memory
- SRNs do show systematicity to some extent.
- But their performance is limited:
- small n → limited processing capacity (STM)
- large n → large LTM → overfitting
- How to combine a large STM with a small LTM?
20. Echo State Networks: Jaeger (2003)
- Keep the connections to and within the recurrent layer fixed at random values.
- The recurrent layer becomes a dynamical reservoir: a non-specific STM for the input sequence.
- Some constraints on the dynamical reservoir (see the sketch after this list):
- large enough
- sparsely connected (here: 15%)
- weight matrix has spectral radius < 1
- LTM capacity:
- in SRNs: O(n²)
- in ESNs: O(n)
- So, can ESNs combine a large STM with a small LTM?
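The reservoir constraints translate directly into code. A minimal sketch, assuming numpy and the values mentioned on the slide (15% connectivity, spectral radius just below 1):

```python
import numpy as np

def make_reservoir(n, connectivity=0.15, spectral_radius=0.9, seed=0):
    """Fixed random reservoir satisfying the slide's constraints (a sketch).

    These O(n^2) weights stay frozen; only the O(n) connections from the
    reservoir to the trained layers are adjusted later, so enlarging the
    STM does not enlarge the trained LTM.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, n))
    W *= rng.random((n, n)) < connectivity  # keep ~15% of the connections
    # Rescale so the largest |eigenvalue| is < 1, which keeps the
    # reservoir dynamics bounded (the echo state property).
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W

W_res = make_reservoir(100)
print(np.max(np.abs(np.linalg.eigvals(W_res))))  # ~0.9
```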
21. Network architecture
[Diagram, bottom to top: input layer (w = 18 units) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units); connections into and within the recurrent layer are untrained, connections from the recurrent layer upward are trained]
The STM remains untrained, but the network does develop internal representations.
22. ESN Results
Positive performance at each word of each test sentence type, so there is some systematicity, but less than in an SRN of the same size.
23. ESN Results: effect of recurrent layer size
[Plots of performance for sentence types N V N, N V N who V N, and N who N V V N]
Bigger is better: no overfitting even when n = 1530!
24. ESN Results: effect of lexicon size (n = 100)
[Plots of performance for sentence types N V N, N V N who V N, and N who N V V N]
Note: with larger w, a smaller percentage of possible sentences is used for training.
25. Strong systematicity
- 30 words (boy(s), girl(s), like(s), see(s), who, …)
- Many sentence types:
- N V N . (girl sees boys.)
- N V N who V N . (girl sees boys who like boy.)
- N who N V V N . (girl who boy sees likes boy.)
- N who V N V N who N V . (girls who like boys see boys who girl likes.)
- Unlimited recursion (girls see boy who sees boy who sees man who …)
- Number agreement between nouns and verbs (see the sketch below)
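A generator for this kind of language can be sketched in a few lines; the word lists, the relative-clause probability, and the recursion depth cap are illustrative assumptions, and only subject relatives are produced for brevity:

```python
import random

# Illustrative singular/plural lexicon (not the actual 30-word lexicon).
NOUNS = {"sg": ["boy", "girl", "man", "woman"],
         "pl": ["boys", "girls", "men", "women"]}
VERBS = {"sg": ["likes", "sees"], "pl": ["like", "see"]}

def noun_phrase(num, depth):
    """Noun, optionally extended with a 'who ...' relative clause."""
    head = random.choice(NOUNS[num])
    if depth > 0 and random.random() < 0.3:   # recursion, capped by depth
        obj = noun_phrase(random.choice(["sg", "pl"]), depth - 1)
        head += f" who {random.choice(VERBS[num])} {obj}"  # verb agrees with head
    return head

def sentence(depth=2):
    num = random.choice(["sg", "pl"])  # number agreement: subject <-> main verb
    subj = noun_phrase(num, depth)
    obj = noun_phrase(random.choice(["sg", "pl"]), depth)
    return f"{subj} {random.choice(VERBS[num])} {obj} ."

random.seed(1)
print(sentence())  # e.g. "girls who see boy like man ."
```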
26. Strong systematicity
- In training sentences: females as grammatical subjects, males as grammatical objects (girl sees boy)
- In test sentences: vice versa (boy sees girl)
- Positive performance on all words of four test sentence types:
- N who V N V N . (boy who likes girls sees woman.)
- N V N who V N . (boy likes girls who see woman.)
- N who N V V N . (boys who man likes see girl.)
- N V N who N V . (boys like girl who man sees.)
27. Conclusions
- ESNs can display both weak and strong systematicity
- even with few training sentences and many test sentences
- By doing less training, the network can learn more:
- training fewer connections gives better results
- training on a smaller fraction of possible sentences gives better results
- Can connectionism explain systematicity?
- No, because neural networks do not need to be systematic.
- Yes, because they need to adapt to systematicity in the training input.
- The source of systematicity is not the cognitive system, but the external world.