1
Generalization and Systematicity in Echo State
Networks
  • Stefan Frank
  • Institute for Logic, Language and Computation
  • University of Amsterdam
  • The Netherlands
  • Michal Čerňanský
  • Institute of Applied Informatics
  • Slovak University of Technology
  • Bratislava, Slovakia

2
Systematicity in language
  • The ability to produce/understand some sentences
    is intrinsically connected to the ability to
    produce/understand certain others (Fodor &
    Pylyshyn, 1988)
  • If you understand "Quokkas are cute" and "I eat
    nice food"...
  • ...you also understand "Quokkas are nice food" and
    "I eat cute quokkas" (and many more...)
  • ...unless you learned (a bit of) English by
    memorizing a phrase book

3
Systematicity and connectionism: Fodor & Pylyshyn
(1988)
  • A compositional symbol system is needed to
    explain this phenomenon
  • Neural networks do not provide such a system
  • So connectionism cannot account for systematicity
    (and connectionist modelling should be abandoned)

Do neural networks learn sentences as if they
memorize a phrase book, or can they display
systematicity?
4
Systematicity and connectionism
  • Systematicity in language is just like
    generalization in neural networks
  • Do neural networks generalize to the same extent
    as people do?
  • Hadley (1994)
  • People display strong systematicity: words that
    have only been observed in one grammatical
    position (e.g., quokkas as a subject noun) can be
    generalized to new positions (e.g., quokkas as
    the object of eat)
  • Connectionist models of sentence processing have
    not been shown to generalize in this way (note:
    in 1994)

5
Systematicity and connectionism
  • The standard approach in connectionist modelling
    of sentence processing:
  • Small, artificial language
  • Random sampling of many sentences for training
  • Simple recurrent network (SRN; Elman, 1990)
    trained on next-word prediction
  • Test on new sentences
  • Because of the large random sample, each word will
    have occurred in each legal position → no test
    for strong systematicity
  • Even when SRN systematicity has been claimed:
  • Excessive training → not psychologically
    realistic
  • Training details were crucial → no robust outcomes

6
Echo state networks
  • SRNs require slow, iterative training (e.g.,
    backprop)
  • Echo state network (ESN; Jaeger, 2001)
  • Train only output connections
  • One-shot learning by linear regression
  • No training parameters
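
To make the one-shot training concrete, here is a minimal numpy sketch of an ESN whose readout is learned by linear regression. It is an illustration only: the layer sizes, weight ranges, spectral radius, and ridge term are assumptions, not the settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not the values from the paper
n_in, n_res, n_out = 26, 100, 26        # one-hot word in, next-word prediction out

# Input and recurrent weights are fixed at random values and never trained
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # spectral radius < 1

def run_reservoir(inputs):
    """Drive the reservoir with a word sequence; collect its states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:                     # u: one-hot input vector
        x = np.tanh(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """One-shot learning: a single (ridge) linear regression from reservoir
    states to target outputs. Only these output weights are trained.
    The small ridge term is an assumption; plain least squares also works."""
    A = states.T @ states + ridge * np.eye(n_res)
    return np.linalg.solve(A, states.T @ targets).T   # prediction: states @ W_out.T
```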

[Figure: architecture diagrams of the simple recurrent network and the echo
state network, each with input (words), a recurrent layer, and output (word
predictions); in the ESN, only the output connections are trained]

Can ESNs display strong systematicity in sentence
processing?
7
Simulations: The language
  • 26 words
  • 12 plural nouns (3 females, 3 males, 6 animals)
  • 10 transitive plural verbs
  • 2 prepositions
  • 1 relative-clause marker: that
  • 1 end-of-sentence marker: end
  • Sentence types:
  • Simple N V N: girls see boys end
  • Prepositional phrase: girls see boys with quokkas
    end
  • Subject-relative clause: girls that see boys like
    quokkas end
  • Object-relative clause: girls that boys see like
    quokkas end
  • Multiple embeddings: girls that see boys that
    quokkas like avoid elephants end
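
For concreteness, a toy generator for this kind of language is sketched below. The word lists, embedding probabilities, and depth limit are illustrative assumptions; the paper's actual grammar is not reproduced here.

```python
import random

# Illustrative vocabulary; the paper's exact word lists may differ
FEMALES = ["girls", "women", "sisters"]
MALES = ["boys", "men", "brothers"]
ANIMALS = ["quokkas", "elephants", "dogs", "cats", "birds", "mice"]
NOUNS = FEMALES + MALES + ANIMALS                    # 12 plural nouns
VERBS = ["see", "like", "avoid", "chase", "hear",
         "follow", "help", "know", "teach", "call"]  # 10 transitive plural verbs
PREPS = ["with", "near"]                             # 2 prepositions

def noun_phrase(depth=0):
    """A noun, optionally extended with a relative clause."""
    words = [random.choice(NOUNS)]
    if depth < 2 and random.random() < 0.3:          # allow multiple embeddings
        if random.random() < 0.5:                    # subject-relative: N that V NP
            words += ["that", random.choice(VERBS)] + noun_phrase(depth + 1)
        else:                                        # object-relative: N that NP V
            words += ["that"] + noun_phrase(depth + 1) + [random.choice(VERBS)]
    return words

def sentence():
    words = noun_phrase() + [random.choice(VERBS)] + noun_phrase()
    if random.random() < 0.2:                        # optional prepositional phrase
        words += [random.choice(PREPS), random.choice(NOUNS)]
    return words + ["end"]

print(" ".join(sentence()))   # e.g. "girls that see boys like quokkas end"
```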

8
Simulations: Training and test sentences
  • For training: 5,000 sentences in which all females
    are subjects and all males are objects
  • For testing: new sentences with one
    subject-relative clause (SRC) or one
    object-relative clause (ORC)
  • SRC1: girls that like boys see men end
  • SRC2: girls like boys that see men end
  • ORC1: girls that women like see men end
  • ORC2: girls like boys that women see end
  • Mere generalization: 10,759 sentences with
    female subjects and male objects (as during
    training)

9
Simulations: Training and test sentences
  • For training: 5,000 sentences in which all females
    are subjects and all males are objects
  • For testing: new sentences with one
    subject-relative clause (SRC) or one
    object-relative clause (ORC), here shown with male
    subjects and female objects
  • SRC1: boys that like girls see women end
  • SRC2: boys like girls that see women end
  • ORC1: boys that men like see women end
  • ORC2: boys like girls that men see end
  • Mere generalization: 10,759 sentences with
    female subjects and male objects (as during
    training)
  • Strong systematicity: 10,800 sentences with male
    subjects and female objects (unlike during
    training)
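
The gender configurations that define this split can be sketched as a small classifier. It assumes the subject and object nouns have already been extracted (which in general requires parsing) and reuses the illustrative word lists from the generator sketch above; the counts 10,759 and 10,800 come from enumerating the full grammar, which is not shown.

```python
FEMALES = {"girls", "women", "sisters"}   # illustrative, as in the generator sketch
MALES = {"boys", "men", "brothers"}

def configuration(subj, obj):
    """Classify a sentence by the gender configuration of its subject/object."""
    if subj not in MALES and obj not in FEMALES:
        return "training / mere generalization"  # females only as subjects,
                                                 # males only as objects
    if subj in MALES and obj in FEMALES:
        return "strong systematicity"            # configuration never seen in training
    return "other"

print(configuration("girls", "boys"))   # training / mere generalization
print(configuration("boys", "women"))   # strong systematicity
```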

10
Simulations: Rating performance
  • Performance:
  • The output vector is the network's estimate of
    next-word probabilities
  • The true probability distribution follows from
    the grammar
  • The cosine between the two vectors is the measure
    of network performance
  • Baseline:
  • Take all n-gram models (based on the training
    sentences), from n = 1 to the number of words in
    the sentence so far
  • The one that performs best (at each point in each
    test sentence) is the baseline
  • To be considered systematic, the ESN should
    generally perform better than the best n-gram
    model
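
A sketch of this scoring scheme in numpy follows. The cosine measure and the pick-the-best n-gram baseline are as described above; the function names and data layout are assumptions.

```python
import numpy as np
from collections import Counter, defaultdict

def cosine(p, q):
    """Performance measure: cosine between two next-word distributions."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def ngram_counts(train_sentences, max_n):
    """Map each history (tuple of n-1 preceding words) to next-word counts."""
    counts = defaultdict(Counter)
    for sent in train_sentences:
        for i, word in enumerate(sent):
            for n in range(1, max_n + 1):
                if i - (n - 1) >= 0:
                    counts[tuple(sent[i - (n - 1):i])][word] += 1
    return counts

def ngram_distribution(counts, history, vocab):
    """Next-word distribution of one n-gram model (None if history unseen)."""
    c = counts.get(tuple(history))
    if not c:
        return None
    total = sum(c.values())
    return np.array([c[w] / total for w in vocab])

def best_ngram_score(counts, prefix, true_dist, vocab):
    """Baseline at one point in a sentence: the best cosine over all
    n-gram models with n = 1 .. number of words seen so far."""
    best = 0.0
    for n in range(1, len(prefix) + 1):
        history = prefix[len(prefix) - (n - 1):] if n > 1 else []
        dist = ngram_distribution(counts, history, vocab)
        if dist is not None:                 # unseen histories are skipped
            best = max(best, cosine(dist, true_dist))
    return best
```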

11
ESN results: Generalization
  • The ESN generally outperforms n-gram models when
    testing for mere generalization

12
ESN results: Systematicity
  • The ESN often performs much worse than n-gram
    models when testing for strong systematicity

13
Improving ESN performance
  • Old solution (Frank, 2006): add a layer of units
    → iterative training needed
  • New solution: use informative rather than random
    word representations
  • Let the representation of word i (i.e., its input
    weight vector w_i) encode co-occurrence info
  • Efficient (one-shot, non-iterative)
  • Unsupervised (not task-dependent)
  • Captures paradigmatic relations (representations
    of words from the same syntactic category tend to
    cluster together)
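
One simple way to build such representations is sketched below: count co-occurrences over the training corpus and use the normalized rows as input weight vectors. The adjacent-word window and the unit-length normalization are assumptions; the paper's exact co-occurrence statistics may differ.

```python
import numpy as np

def cooccurrence_vectors(train_sentences, vocab):
    """Row i is the representation of word i: its co-occurrence counts with
    every vocabulary word, normalized to unit length. These rows are used
    in place of the random input weight vectors w_i."""
    index = {w: j for j, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for sent in train_sentences:
        for a, b in zip(sent, sent[1:]):   # adjacent words (an assumed window)
            M[index[a], index[b]] += 1
            M[index[b], index[a]] += 1
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.where(norms == 0.0, 1.0, norms)

# One-shot and unsupervised: a single pass over the training corpus, with no
# task-specific error signal. Words with similar contexts (i.e., from the same
# syntactic category) end up with similar rows, capturing the paradigmatic
# relations mentioned above.
vocab = ["girls", "boys", "see", "like", "end"]          # illustrative
train_sentences = [["girls", "see", "boys", "end"],
                   ["girls", "like", "boys", "end"]]
W_in_rows = cooccurrence_vectors(train_sentences, vocab)
```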

14
ESN results: Generalization
  • The original ESN (random word representations) and
    the modified ESN (co-occurrence-based
    representations) perform similarly when tested for
    mere generalization

15
ESN results: Systematicity
  • The modified ESN generally outperforms both the
    n-gram models and the original ESN when tested for
    strong systematicity
  • Strong systematicity without iterative training