1
Generalization and Systematicity in Echo State
Networks
  • Stefan Frank
  • Institute for Logic, Language and Computation
  • University of Amsterdam
  • The Netherlands
  • Michal Čerňanský
  • Institute of Applied Informatics
  • Slovak University of Technology
  • Bratislava, Slovakia

2
Systematicity in language
  • The ability to produce/understand some sentences
    is intrinsically connected to the ability to
    produce/understand certain others (Fodor &
    Pylyshyn, 1988)
  • If you understand "Quokkas are cute" and "I eat
    nice food"...
  • ...you also understand "Quokkas are nice food" and
    "I eat cute quokkas" (and many more...)
  • ...unless you learned (a bit of) English by
    memorizing a phrase book

3
Systematicity and connectionism: Fodor & Pylyshyn
(1988)
  • A compositional symbol system is needed to
    explain this phenomenon
  • Neural networks do not provide such a system
  • So connectionism cannot account for systematicity
    (and connectionist modelling should be abandoned)

Do neural networks learn sentences as if they
memorize a phrase book, or can they display
systematicity?
4
Systematicity and connectionism
  • Systematicity in language is just like
    generalization in neural networks
  • Do neural networks generalize to the same extent
    as people do?
  • Hadley (1994)
  • People display strong systematicity: words that
    have only been observed in one grammatical
    position (e.g., quokkas as a subject noun) can be
    generalized to new positions (e.g., quokkas as
    the object of eat)
  • Connectionist models of sentence processing have
    not been shown to generalize in this way (note:
    in 1994)

5
Systematicity and connectionism
  • The standard approach in connectionist modelling
    of sentence processing:
  • Small, artificial language
  • Random sampling of many sentences for training
  • Simple recurrent network (SRN; Elman, 1990)
    trained on next-word prediction
  • Test on new sentences
  • Because of the large random sample, each word will
    have occurred in each legal position → no test
    for strong systematicity
  • Even when SRN systematicity has been claimed:
  • Excessive training → not psychologically
    realistic
  • Training details were crucial → no robust outcomes

6
Echo state networks
  • SRNs require slow, iterative training (e.g.,
    backprop)
  • Echo state network (ESN; Jaeger, 2001)
  • Train only output connections
  • One-shot learning by linear regression
  • No training parameters
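
To make the one-shot training concrete, here is a minimal numpy sketch of an ESN whose readout is learned by linear regression. It is an illustration only: the layer sizes, weight ranges, spectral radius, and ridge term are assumptions, not the settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not the values from the paper
n_in, n_res, n_out = 26, 100, 26        # one-hot word in, next-word prediction out

# Input and recurrent weights are fixed at random values and never trained
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # spectral radius < 1

def run_reservoir(inputs):
    """Drive the reservoir with a word sequence; collect its states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:                     # u: one-hot input vector
        x = np.tanh(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """One-shot learning: a single (ridge) linear regression from reservoir
    states to target outputs. Only these output weights are trained.
    The small ridge term is an assumption; plain least squares also works."""
    A = states.T @ states + ridge * np.eye(n_res)
    return np.linalg.solve(A, states.T @ targets).T   # prediction: states @ W_out.T
```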

[Figure: architecture diagrams of the simple recurrent network and the echo
state network, each with input (words), a recurrent layer, and output (word
predictions); in the ESN, only the output connections are trained]

Can ESNs display strong systematicity in sentence
processing?
7
Simulations: The language
  • 26 words
  • 12 plural nouns (3 females, 3 males, 6 animals)
  • 10 transitive plural verbs
  • 2 prepositions
  • 1 relative-clause marker: that
  • 1 end-of-sentence marker: end
  • Sentence types:
  • Simple N V N: girls see boys end
  • Prepositional phrase: girls see boys with quokkas
    end
  • Subject-relative clause: girls that see boys like
    quokkas end
  • Object-relative clause: girls that boys see like
    quokkas end
  • Multiple embeddings: girls that see boys that
    quokkas like avoid elephants end
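
For concreteness, a toy generator for this kind of language is sketched below. The word lists, embedding probabilities, and depth limit are illustrative assumptions; the paper's actual grammar is not reproduced here.

```python
import random

# Illustrative vocabulary; the paper's exact word lists may differ
FEMALES = ["girls", "women", "sisters"]
MALES = ["boys", "men", "brothers"]
ANIMALS = ["quokkas", "elephants", "dogs", "cats", "birds", "mice"]
NOUNS = FEMALES + MALES + ANIMALS                    # 12 plural nouns
VERBS = ["see", "like", "avoid", "chase", "hear",
         "follow", "help", "know", "teach", "call"]  # 10 transitive plural verbs
PREPS = ["with", "near"]                             # 2 prepositions

def noun_phrase(depth=0):
    """A noun, optionally extended with a relative clause."""
    words = [random.choice(NOUNS)]
    if depth < 2 and random.random() < 0.3:          # allow multiple embeddings
        if random.random() < 0.5:                    # subject-relative: N that V NP
            words += ["that", random.choice(VERBS)] + noun_phrase(depth + 1)
        else:                                        # object-relative: N that NP V
            words += ["that"] + noun_phrase(depth + 1) + [random.choice(VERBS)]
    return words

def sentence():
    words = noun_phrase() + [random.choice(VERBS)] + noun_phrase()
    if random.random() < 0.2:                        # optional prepositional phrase
        words += [random.choice(PREPS), random.choice(NOUNS)]
    return words + ["end"]

print(" ".join(sentence()))   # e.g. "girls that see boys like quokkas end"
```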

8
Simulations: Training and test sentences
  • For training: 5,000 sentences in which all females
    are subjects and all males are objects
  • For testing: new sentences with one
    subject-relative clause (SRC) or one
    object-relative clause (ORC)
  • SRC1: girls that like boys see men end
  • SRC2: girls like boys that see men end
  • ORC1: girls that women like see men end
  • ORC2: girls like boys that women see end
  • Mere generalization: 10,759 sentences with
    female subjects and male objects (as during
    training)

9
Simulations: Training and test sentences
  • For training: 5,000 sentences in which all females
    are subjects and all males are objects
  • For testing: new sentences with one
    subject-relative clause (SRC) or one
    object-relative clause (ORC), here shown with male
    subjects and female objects
  • SRC1: boys that like girls see women end
  • SRC2: boys like girls that see women end
  • ORC1: boys that men like see women end
  • ORC2: boys like girls that men see end
  • Mere generalization: 10,759 sentences with
    female subjects and male objects (as during
    training)
  • Strong systematicity: 10,800 sentences with male
    subjects and female objects (unlike during
    training)
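
The gender configurations that define this split can be sketched as a small classifier. It assumes the subject and object nouns have already been extracted (which in general requires parsing) and reuses the illustrative word lists from the generator sketch above; the counts 10,759 and 10,800 come from enumerating the full grammar, which is not shown.

```python
FEMALES = {"girls", "women", "sisters"}   # illustrative, as in the generator sketch
MALES = {"boys", "men", "brothers"}

def configuration(subj, obj):
    """Classify a sentence by the gender configuration of its subject/object."""
    if subj not in MALES and obj not in FEMALES:
        return "training / mere generalization"  # females only as subjects,
                                                 # males only as objects
    if subj in MALES and obj in FEMALES:
        return "strong systematicity"            # configuration never seen in training
    return "other"

print(configuration("girls", "boys"))   # training / mere generalization
print(configuration("boys", "women"))   # strong systematicity
```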

10
Simulations: Rating performance
  • Performance:
  • The output vector is the network's estimate of
    next-word probabilities
  • The true probability distribution follows from
    the grammar
  • The cosine between the two vectors is the measure
    of network performance
  • Baseline:
  • Take all n-gram models (based on the training
    sentences), from n = 1 to the number of words in
    the sentence so far
  • The one that performs best (at each point in each
    test sentence) is the baseline
  • To be considered systematic, the ESN should
    generally perform better than the best n-gram
    model
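
A sketch of this scoring scheme in numpy follows. The cosine measure and the pick-the-best n-gram baseline are as described above; the function names and data layout are assumptions.

```python
import numpy as np
from collections import Counter, defaultdict

def cosine(p, q):
    """Performance measure: cosine between two next-word distributions."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def ngram_counts(train_sentences, max_n):
    """Map each history (tuple of n-1 preceding words) to next-word counts."""
    counts = defaultdict(Counter)
    for sent in train_sentences:
        for i, word in enumerate(sent):
            for n in range(1, max_n + 1):
                if i - (n - 1) >= 0:
                    counts[tuple(sent[i - (n - 1):i])][word] += 1
    return counts

def ngram_distribution(counts, history, vocab):
    """Next-word distribution of one n-gram model (None if history unseen)."""
    c = counts.get(tuple(history))
    if not c:
        return None
    total = sum(c.values())
    return np.array([c[w] / total for w in vocab])

def best_ngram_score(counts, prefix, true_dist, vocab):
    """Baseline at one point in a sentence: the best cosine over all
    n-gram models with n = 1 .. number of words seen so far."""
    best = 0.0
    for n in range(1, len(prefix) + 1):
        history = prefix[len(prefix) - (n - 1):] if n > 1 else []
        dist = ngram_distribution(counts, history, vocab)
        if dist is not None:                 # unseen histories are skipped
            best = max(best, cosine(dist, true_dist))
    return best
```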

11
ESN results: Generalization
  • The ESN generally outperforms n-gram models when
    testing for mere generalization

12
ESN results: Systematicity
  • The ESN often performs much worse than n-gram
    models when testing for strong systematicity

13
Improving ESN performance
  • Old solution (Frank, 2006): add a layer of units
    → iterative training needed
  • New solution: use informative rather than random
    word representations
  • Let the representation of word i (i.e., its input
    weight vector w_i) encode co-occurrence info
  • Efficient (one-shot, non-iterative)
  • Unsupervised (not task-dependent)
  • Captures paradigmatic relations (representations
    of words from the same syntactic category tend to
    cluster together)
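
One simple way to build such representations is sketched below: count co-occurrences over the training corpus and use the normalized rows as input weight vectors. The adjacent-word window and the unit-length normalization are assumptions; the paper's exact co-occurrence statistics may differ.

```python
import numpy as np

def cooccurrence_vectors(train_sentences, vocab):
    """Row i is the representation of word i: its co-occurrence counts with
    every vocabulary word, normalized to unit length. These rows are used
    in place of the random input weight vectors w_i."""
    index = {w: j for j, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for sent in train_sentences:
        for a, b in zip(sent, sent[1:]):   # adjacent words (an assumed window)
            M[index[a], index[b]] += 1
            M[index[b], index[a]] += 1
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0.0, 1.0, norms)

# One-shot and unsupervised: a single pass over the training corpus, with no
# task-specific error signal. Words with similar contexts (i.e., from the same
# syntactic category) end up with similar rows, capturing the paradigmatic
# relations mentioned above.
vocab = ["girls", "boys", "see", "like", "end"]          # illustrative
train_sentences = [["girls", "see", "boys", "end"],
                   ["girls", "like", "boys", "end"]]
W_in_rows = cooccurrence_vectors(train_sentences, vocab)
```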

14
ESN results: Generalization
  • The original ESN (random word representations) and
    the modified ESN (co-occurrence-based
    representations) perform similarly when tested for
    mere generalization

15
ESN results: Systematicity
  • The modified ESN generally outperforms both the
    n-gram models and the original ESN when tested for
    strong systematicity
  • Strong systematicity without iterative training