Title: Systematicity in sentence processing by recurrent neural networks
1. Systematicity in sentence processing by recurrent neural networks
- Stefan Frank
- Nijmegen Institute for Cognition and Information
- Radboud University Nijmegen
- The Netherlands
2. "Please make it heavy on computers and AI and light on the psycho stuff" (Konstantopoulos, personal communication, December 23, 2005)
3. Systematicity in language
- Imagine you meet someone who only knows two sentences of English:
Could you please tell me where the toilet is?
I can't find my hotel.
So (s)he does not know:
Could you please tell me where my hotel is?
I can't find the toilet.
This person has no knowledge of English but has simply memorized some lines from a phrase book.
4. Systematicity in language
- Human language behavior is (more or less) systematic: if you know some sentences, you know many.
- Sentences are not atomic but made up of words.
- Likewise, words can be made up of morphemes (e.g., un + clear → unclear, un + stable → unstable, …).
- It seems like language results from applying a set of rules (grammar, morphology) to symbols (words, morphemes).
5. Systematicity in language
- The Classical symbol system hypothesis: the mind contains word-like symbols that are manipulated by structure-sensitive processes (Fodor & Pylyshyn, 1988). E.g., for dealing with language:
- boy and girl are nouns (N)
- loves and sees are verbs (V)
- N V N is a possible sentence structure
- This hypothesis explains the systematicity found in language: if you know the N V N structure, you know all N V N sentences (boy sees girl, girl loves boy, boy sees boy, …)
6. Some issues for the Classical theory
- Lack of systematic behavior: Why are people often so unsystematic in practice?
The boy plays. OK
The boy who the girl likes plays. OK
The boy who the girl who the man sees likes plays. OK?
The athlete who the coach who the sponsor hired trained won. OK!
7. Some issues for the Classical theory
- Lack of systematic behavior: Why are people often so unsystematic in practice?
- Lack of systematicity in language: Why are there exceptions to rules?
help + ful → helpful, help + less → helpless
meaning + ful → meaningful, meaning + less → meaningless
beauty + ful → beautiful, beauty + less → ugly (not beautiless)
8. Some issues for the Classical theory
- Lack of systematic behavior: Why are people often so unsystematic in practice?
- Lack of systematicity in language: Why are there exceptions to rules?
- Development: How do children learn the rules from what they hear?
The Classical theory has answers to these
questions, but no explanations.
9. Connectionism
- The state of mind is represented as a pattern of activity over a large number of simple, quantitative (i.e., non-logical) processing units (neurons).
- These units are connected by weighted links, forming a (neural) network through which activation moves around.
- The connection weights are adjusted to the network's input and task.
- The network develops its own internal representation of the input.
- It should generalize to new (test) inputs.
10. Connectionism and the Classical issues
- Lack of systematic behavior: Systematicity is built on top of an unsystematic architecture.
- Lack of systematicity in language: "Beautiless" is expected statistically but never occurs, so the network learns it doesn't exist.
- Development: The network adapts to its input.
But can neural networks explain systematicity, or
even behave systematically?
11. Connectionism and systematicity
- Fodor & Pylyshyn (1988): Neural networks cannot be systematic. They only learn to associate examples rather than becoming sensitive to structure.
- Systematicity: knowing X → knowing Y. Generalization: training on X → learning Y. So, systematicity equals generalization (Hadley, 1994).
- Demonstrations of connectionist systematicity:
- require many training examples but use only few tests
- are not robust: oversensitive to training details
- only display weak systematicity: words occur in the same syntactic positions in training and test sentences
12. Simple Recurrent Networks: Elman (1990)
Feedforward networks have long-term memory (LTM)
but no short-term memory (STM). So how to process
sequential input, like the words of a sentence?
[Diagram: a feedforward network with input, hidden, and output layers]
13. SRNs and systematicity: Van der Velde et al. (2004)
- An SRN processed a minilanguage with:
- 18 words (boy, girl, loves, sees, who, ., …)
- 3 sentence types:
- N V N . (boy sees girl.)
- N V N who V N . (boy sees girl who loves boy.)
- N who N V V N . (boy who girl sees loves boy.)
- Nouns and verbs were divided into four groups; each had two nouns and two verbs.
- In training sentences, nouns and verbs were from the same group: < 0.44% of possible sentences were used for training.
- In test sentences, nouns and verbs came from different groups. Note: this tests weak systematicity only (see the sketch below).
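The same-group/cross-group split is easy to make concrete. Below is a minimal Python sketch; the group assignments and word lists are illustrative placeholders rather than Van der Velde et al.'s actual lexicon, and only the N V N type is generated:

```python
import itertools

# Illustrative lexicon: 4 groups, each with 2 nouns and 2 verbs
# (placeholder words, not the study's actual lexicon).
GROUPS = [
    (["boy", "girl"],  ["loves", "sees"]),
    (["man", "woman"], ["hears", "calls"]),
    (["cat", "dog"],   ["bites", "chases"]),
    (["bird", "fish"], ["follows", "likes"]),
]

def nvn(nouns, verbs):
    """All 'N V N .' sentences over the given nouns and verbs."""
    return [f"{n1} {v} {n2} ." for n1, v, n2 in itertools.product(nouns, verbs, nouns)]

# Training sentences: nouns and verbs from the SAME group.
train = [s for nouns, verbs in GROUPS for s in nvn(nouns, verbs)]

# Test sentences: nouns from OTHER groups than the verb's (weak systematicity:
# every word still occurs in the same syntactic position as in training).
all_nouns = [n for nouns, _ in GROUPS for n in nouns]
test = [s for nouns, verbs in GROUPS
        for s in nvn([n for n in all_nouns if n not in nouns], verbs)]

print(len(train), "training /", len(test), "test N V N sentences")  # 32 / 288
```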
14. SRNs and systematicity: Van der Velde et al. (2004)
- SRNs fail on test sentences, so:
- they do not generalize to structurally similar sentences
- they cannot learn systematic behavior from a small training set
- they do not form good models of human language behavior
- But:
- What does it mean to fail? Maybe the network was more than completely non-systematic?
- Was the size of the network appropriate?
- larger network → more STM → better processing?
- smaller network → less LTM → better generalization?
- Was the language complex enough? With more different words there is more reason to abstract to syntactic types (nouns, verbs).
15. SRNs and systematicity: replication of Van der Velde et al. (2004)
- What if a network does not generalize at all? When given a new sentence, it can only use the last word, because combining words requires generalization.
- This hypothetical, unsystematic network serves as the baseline for rating SRN performance:
- Performance = 1: the network never makes ungrammatical predictions.
- Performance = 0: the network does not generalize at all, but gives the best possible output based on the last word.
- Performance = −1: the network only makes ungrammatical predictions.
- Positive performance indicates systematicity (see the sketch below).
16. Network architecture
[Diagram, bottom to top: input layer (w = 18 units, one for each word) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units, one for each word); a code sketch follows]
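A minimal numpy sketch of a forward pass through this architecture; the logistic activation, the random weight initialization, and the absence of bias terms are assumptions, not details taken from the slide:

```python
import numpy as np

w, n, h = 18, 20, 10  # lexicon size, recurrent units, hidden units

rng = np.random.default_rng(0)
W_in  = rng.normal(0, 0.1, (n, w))  # input -> recurrent layer
W_rec = rng.normal(0, 0.1, (n, n))  # recurrent layer -> itself (the STM)
W_hid = rng.normal(0, 0.1, (h, n))  # recurrent -> hidden layer
W_out = rng.normal(0, 0.1, (w, h))  # hidden -> output (next-word prediction)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def process(word_ids):
    """Feed a word-index sequence through the SRN, one word at a time."""
    state = np.zeros(n)              # recurrent state: the short-term memory
    for i in word_ids:
        x = np.zeros(w)
        x[i] = 1.0                   # one-hot input, one unit per word
        state = sigmoid(W_in @ x + W_rec @ state)
        hidden = sigmoid(W_hid @ state)
        out = sigmoid(W_out @ hidden)
    return out                       # activation per possible next word

print(process([0, 5, 1]).shape)      # e.g. "boy sees girl" -> (18,)
```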
17. SRN Results
Positive performance at each word of each test
sentence type, so there is some systematicity.
18. SRN Results: effect of recurrent layer size
[Plots of performance for sentence types N V N, N V N who V N, and N who N V V N]
Larger networks (n = 40) do better, but very large ones (n = 100) overfit.
19. SRN performance and memory
- SRNs do show systematicity to some extent.
- But their performance is limited:
- small n → limited processing capacity (STM)
- large n → large LTM → overfitting
- How to combine a large STM with a small LTM?
20. Echo State Networks: Jaeger (2003)
- Keep the connections to and within the recurrent layer fixed at random values.
- The recurrent layer becomes a dynamical reservoir: a non-specific STM for the input sequence.
- Some constraints on the dynamical reservoir (see the sketch after this list):
- large enough
- sparsely connected (here: 15%)
- weight matrix has spectral radius < 1
- LTM capacity:
- in SRNs: O(n²)
- in ESNs: O(n)
- So, can ESNs combine a large STM with a small LTM?
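The reservoir constraints translate directly into code. A minimal sketch, assuming numpy and the values mentioned on the slide (15% connectivity, spectral radius just below 1):

```python
import numpy as np

def make_reservoir(n, connectivity=0.15, spectral_radius=0.9, seed=0):
    """Fixed random reservoir satisfying the slide's constraints (a sketch).

    These O(n^2) weights stay frozen; only the O(n) connections from the
    reservoir to the trained layers are adjusted later, so enlarging the
    STM does not enlarge the trained LTM.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, n))
    W *= rng.random((n, n)) < connectivity  # keep ~15% of the connections
    # Rescale so the largest |eigenvalue| is < 1, which keeps the
    # reservoir dynamics bounded (the echo state property).
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W

W_res = make_reservoir(100)
print(np.max(np.abs(np.linalg.eigvals(W_res))))  # ~0.9
```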
21. Network architecture
[Diagram, bottom to top: input layer (w = 18 units) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units); connections into and within the recurrent layer are untrained, connections from the recurrent layer upward are trained]
The STM remains untrained, but the network does develop internal representations.
22. ESN Results
Positive performance at each word of each test sentence type, so there is some systematicity, but less than in an SRN of the same size.
23. ESN Results: effect of recurrent layer size
[Plots of performance for sentence types N V N, N V N who V N, and N who N V V N]
Bigger is better: no overfitting even when n = 1530!
24. ESN Results: effect of lexicon size (n = 100)
[Plots of performance for sentence types N V N, N V N who V N, and N who N V V N]
Note: with larger w, a smaller percentage of possible sentences is used for training.
25. Strong systematicity
- 30 words (boy(s), girl(s), like(s), see(s), who, …)
- Many sentence types:
- N V N . (girl sees boys.)
- N V N who V N . (girl sees boys who like boy.)
- N who N V V N . (girl who boy sees likes boy.)
- N who V N V N who N V . (girls who like boys see boys who girl likes.)
- Unlimited recursion (girls see boy who sees boy who sees man who …)
- Number agreement between nouns and verbs (see the sketch below)
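A generator for this kind of language can be sketched in a few lines; the word lists, the relative-clause probability, and the recursion depth cap are illustrative assumptions, and only subject relatives are produced for brevity:

```python
import random

# Illustrative singular/plural lexicon (not the actual 30-word lexicon).
NOUNS = {"sg": ["boy", "girl", "man", "woman"],
         "pl": ["boys", "girls", "men", "women"]}
VERBS = {"sg": ["likes", "sees"], "pl": ["like", "see"]}

def noun_phrase(num, depth):
    """Noun, optionally extended with a 'who ...' relative clause."""
    head = random.choice(NOUNS[num])
    if depth > 0 and random.random() < 0.3:   # recursion, capped by depth
        obj = noun_phrase(random.choice(["sg", "pl"]), depth - 1)
        head += f" who {random.choice(VERBS[num])} {obj}"  # verb agrees with head
    return head

def sentence(depth=2):
    num = random.choice(["sg", "pl"])  # number agreement: subject <-> main verb
    subj = noun_phrase(num, depth)
    obj = noun_phrase(random.choice(["sg", "pl"]), depth)
    return f"{subj} {random.choice(VERBS[num])} {obj} ."

random.seed(1)
print(sentence())  # e.g. "girls who see boy like man ."
```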
26. Strong systematicity
- In training sentences: females as grammatical subjects, males as grammatical objects (girl sees boy)
- In test sentences: vice versa (boy sees girl)
- Positive performance on all words of four test sentence types:
- N who V N V N . (boy who likes girls sees woman.)
- N V N who V N . (boy likes girls who see woman.)
- N who N V V N . (boys who man likes see girl.)
- N V N who N V . (boys like girl who man sees.)
27. Conclusions
- ESNs can display both weak and strong systematicity
- even with few training sentences and many test sentences
- By doing less training, the network can learn more:
- training fewer connections gives better results
- training on a smaller fraction of possible sentences gives better results
- Can connectionism explain systematicity?
- No, because neural networks do not need to be systematic.
- Yes, because they need to adapt to systematicity in the training input.
- The source of systematicity is not the cognitive system, but the external world.