Transcript and Presenter's Notes

Title: An evaluation measure after all


1
An evaluation measure after all?
  • Janet Dean Fodor
  • The Graduate Center, CUNY
  • San Sebastian, June 2006

2
Credit where it's due
Joint work with William G. Sakas
(Hunter College and CUNY Graduate Center)
and CUNY graduate students:
David Brizan, Carrie Crowther, Arthur Hoskey,
Xuân-Nga Kam, Iglika Stoyneshka, Lidiya Tornyova
  • This is one part of the CUNY-CoLAG project
    (Computational Language Acquisition Group)

3
Agenda
  • Chapter 1, Aspects of the Theory of Syntax (1965)
  • A program for modeling language acquisition
  • Why have we not fulfilled it?
  • Valuable studies of what children know when, but
    still no viable psycho-computational models.

4
Aspects Chapter 1
  • Let us consider what is involved in the
    construction of an acquisition model for
    language.
  • Representation of input signal and
    derivations
  • The class of possible grammars
  • A method for selecting one on the
    basis of a child's primary linguistic
    data: an evaluation measure.
  • An actual acquisition model must have a strategy
    for finding hypotheses. E.g., grammars
    that exceed a certain value (in terms of the
    evaluation measure).

5
From creating rules to setting parameters
  • The Chapter 1 program for modeling acquisition
    was never fulfilled. Too many possible grammars;
    no plausible EM.
  • Shift to parameter theory (Chomsky 1981):
    languages differ only in their lexicons and the
    values of a small finite number of parameters,
    e.g., null subject.
  • Now a finite number of possible grammars.
  • Input sentences trigger the parameter values,
    so the learner knows which parameter values to
    adopt to license each input sentence.
  • Incremental (memoryless) learning. Choose the
    next grammar hypothesis on the basis of the
    current input sentence only (a toy sketch of
    this setup follows below).
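
Purely as illustration, a minimal Python sketch of this picture: a grammar is a vector of binary parameter values, and an incremental (memoryless) learner keeps a single hypothesis, revising it on the basis of the current input sentence alone. The parameter names and the licenses/next_hypothesis functions are hypothetical placeholders, not part of any actual model.

    # Illustrative sketch only: grammars as binary parameter vectors,
    # and an incremental (memoryless) learner that keeps one hypothesis.
    from typing import Callable, Dict, Iterable

    Grammar = Dict[str, bool]   # e.g. {"null_subject": True, "wh_movement": False}

    def incremental_learn(sentences: Iterable[str],
                          initial: Grammar,
                          licenses: Callable[[Grammar, str], bool],
                          next_hypothesis: Callable[[Grammar, str], Grammar]) -> Grammar:
        """Process sentences one at a time; no memory of past inputs or grammars."""
        g = dict(initial)
        for s in sentences:
            if not licenses(g, s):          # current grammar fails on this sentence
                g = next_hypothesis(g, s)   # revise using this sentence alone
        return g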

6
But ambiguous triggers (Clark 1989)
  • Pat expects Sue to win. How is case licensed on
    Sue?
  • ECM: the matrix verb governs the lower subject.
    Or SCM: non-finite Infl assigns case to its
    subject.
  • What can Pat sing? Why is the object in initial
    position?
  • WH-movement or Scrambling
  • Over-optimistic: for each parameter, an
    unambiguous trigger, innately specified.
    Realistically, it would be masked by other
    differences between languages.
    (Clark 1989; Gibson & Wexler 1994)
  • The null subject parameter is not typical!

7
So parameter setting needs EM too
  • Ambiguity of triggers ⇒ no unique grammar to
    choose for an input sentence, but a pool of
    candidates.
  • So either the switch-setting mechanism contains
    overrides, or the switches are set only after a
    choice is made between candidates.
  • Either way, a decision must be made: an EM.
  • So not automatic triggering, and not
    error-free. Parameter settings must be
    revisable. (Non-deterministic; important below.)

8
EM must include the Subset Principle
  • Poverty of the Stimulus (POS)
  • POPS: poverty of the positive stimulus.
    E.g., little if any exposure to
    parasitic gaps.
  • So grammar formation can't be purely
    data-driven. (Kam et al.) Whatever
    data are available must entail the rest.
  • PONS: extreme poverty of the negative
    stimulus.
  • I.e., little if any information about what is
    ungrammatical.
  • So all grammar choices must be
    conservative.
  • Subset Principle: if one language properly
    contains another, and both are compatible with
    the input, EM must favor the latter, the smaller
    language (sketched below).
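
A hedged sketch of how the Subset Principle could be applied to a pool of candidate grammars, assuming (purely for illustration) that each grammar's language can be enumerated as a finite set of sentences; the language_of function is a hypothetical placeholder.

    # Illustration only: among candidates compatible with the input, discard any
    # grammar whose language properly contains another surviving candidate's language.
    def subset_principle_filter(candidates, language_of):
        """candidates: list of grammars; language_of(g) -> set of sentences (toy assumption)."""
        keep = []
        for g in candidates:
            lg = language_of(g)
            # rule out g if some other candidate generates a proper subset of lg
            if not any(language_of(h) < lg for h in candidates if h is not g):
                keep.append(g)
        return keep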

9
Aspects illuminates current learning models
  • Whether learning is rule-based or
    parameter-based
  • On hearing a sentence that my current grammar
    does not license, what is the pool of possible
    grammars to switch to?
  • The obviously right answer: all and only the
    grammars compatible with this input sentence.
  • Preferably ranked by EM, so that all learners
    follow the same route, favor the same
    generalizations.
  • Necessarily ranked by EM wherever there are
    subset/superset choices.

10
Recent models of parameter setting
  • Recent learning models do not reflect this
    picture at all.
  • They have no analogue of triggering.
  • They have no way to compute the set of candidate
    grammars.
  • There is no way to apply an evaluation metric,
    including SP.
  • Four examples follow.

11
Candidate grammar pool in recent models
  • Any grammar identical to Gcurrent except that one
    transformation is added, to convert the wrongly
    generated word string into the observed word
    string, or one transformation, selected at
    random, is deleted, regardless of whether the
    string is then licensed. (Wexler & Culicover 1980)
  • Any grammar that differs from Gcurrent with
    respect to one parameter value. (Then check
    whether it licenses the string; adopt it only if
    it does; sketched below.)
    (Gibson & Wexler 1994)
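
A toy Python rendering (an illustration, not the authors' implementation) of the Gibson & Wexler step just described: flip one randomly chosen parameter of the current grammar and adopt the change only if the resulting grammar licenses the input. The licenses function is again a hypothetical placeholder.

    import random

    # Toy rendering of the Gibson & Wexler (1994) candidate pool: on an unparsable
    # input, flip one randomly chosen parameter and keep the change only if it helps.
    def tla_step(g, sentence, licenses):
        if licenses(g, sentence):
            return g                              # nothing to learn from this input
        candidate = dict(g)
        p = random.choice(list(candidate))        # change exactly one parameter value
        candidate[p] = not candidate[p]
        return candidate if licenses(candidate, sentence) else g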

12
Candidate grammars in recent models
  • From the set of all grammars, a batch that are
    being simultaneously evaluated against successive
    sentences. (Breed the fitter ones. Then evaluate
    a batch of the offspring. And again.)
    (Clark 1992)
  • A grammar selected from among all grammars, with
    probability based on how well each of its
    parameter values has performed in the past. (If
    it parses the new input, upgrade the weights of
    its p-values; if it fails, downgrade them;
    sketched below.) (Yang 2000)
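
A rough sketch of the reward/punish idea in Yang-style variational learning: each parameter carries a weight (the probability of one of its values), a grammar is sampled from those weights, and the sampled values are nudged up after a successful parse and down after a failure. The update rule and learning rate below are illustrative assumptions, not Yang's exact formulation.

    import random

    # Toy variational (reward/punish) learner over parameter weights.
    def variational_step(weights, sentence, licenses, rate=0.02):
        """weights: dict parameter -> probability of value True (toy assumption)."""
        g = {p: random.random() < w for p, w in weights.items()}   # sample a grammar
        success = licenses(g, sentence)
        new_weights = {}
        for p, w in weights.items():
            sampled = 1.0 if g[p] else 0.0        # the value that was actually used
            target = sampled if success else (1.0 - sampled)
            new_weights[p] = w + rate * (target - w)   # reward or punish that value
        return new_weights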

13
The CUNY model (STL)
  • Retain as much of triggering as possible: as in
    triggering, the input sentence should tell the
    learner which parameters could be reset to
    license the sentence.
  • E.g., What can Pat sing? WH-movement or
    Scrambling
  • Who is he looking at? WH-movement;
    ?Pied-piping
  • We call this parametric decoding.
  • But no one knows how to do it, fully and
    feasibly.

14
There's no good substitute for decoding
  • Trial-and-error selection of a grammar hypothesis
    without reference to the input sentence is
    inefficient. Our simulation studies confirm this.
    Many input sentences go by with nothing learned
    from them, because an unsuccessful hypothesis was
    tested.
  • Models that assign a success score to grammars
    don't waste inputs, but have to cover the search
    space. Without decoding, these are also
    inefficient.
  • Compare decoding: the target grammar is one of
    these, one of these, and one of these. Intersect.
    Apply EM and SP (sketched below).
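
A sketch of the intersect-then-evaluate idea, under the assumption (hypothetical, and exactly the hard part) that a decode function could return, for each sentence, the full set of grammars compatible with it; em_rank stands in for the evaluation measure, with SP built into the ranking.

    # Illustration of decoding-as-intersection; decode(s) is assumed, not given.
    def learn_by_decoding(sentences, all_grammars, decode, em_rank):
        """decode(s) -> set of grammars compatible with s; em_rank(g): lower is better."""
        candidates = set(all_grammars)
        for s in sentences:
            candidates &= decode(s)          # the target grammar survives every intersection
        return min(candidates, key=em_rank)  # EM (including SP) picks among the survivors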

15
Only partial decoding is feasible
  • The sentence parsing routines (innate) can do
    parametric decoding: when the current grammar
    fails to parse the input sentence, draw on
    additional parameter values to complete the
    parse. Adopt those values.
  • But for full decoding, the parse would have to be
    parallel: compute every parse tree for the
    sentence.
  • Unrealistic! Even adults can't do full parallel
    parsing.
  • With serial parsing (compute just one structure
    for the sentence), only partial decoding
    (sketched below).
  • Partial decoding is not fail-safe. An unattended
    language might be a subset of the adopted
    language.
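
A very schematic sketch of partial decoding with a serial parser: the parser is assumed (hypothetically) to report which extra parameter values it borrowed to complete its single parse, and the learner adopts exactly those values. The parse_borrowing interface is an invented stand-in for the innate parsing routines.

    # Schematic partial decoding: adopt only the parameter values the serial
    # parser actually drew on to complete one parse of the input.
    def partial_decode_step(g, sentence, parse_borrowing):
        """parse_borrowing(g, s) -> (success, borrowed_values): toy parser interface."""
        success, borrowed = parse_borrowing(g, sentence)
        if success and borrowed:
            new_g = dict(g)
            new_g.update(borrowed)     # adopt the borrowed parameter values
            return new_g
        return g                       # parse failed, or needed nothing new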

16
Summary (so far) on decoding
  • The Chapter 1 blueprint for a model of
    acquisition from primary linguistic data requires
    exhaustive decoding ⇒ the set of all grammars
    compatible with the sentence, so that
    EM can select the best one.
  • How to do this was not specified in Chapter 1.
  • All we can realistically assume is partial
    decoding. And that looks to be as useless for SP
    as none at all.

17
What to do?
  • First, a detour. Reconsider a traditional
    (clunky) old approach to EM: enumeration.
  • It needs a new twist, to make it psychologically
    acceptable.
  • But then it solves the problem of how to apply
    SP: locate candidates in EM order; adopt the
    first that works.
  • It also solves another dire problem: the fact
    that SP can itself cause learning failures.
  • Partial decoding fits profitably into this model.

18
Traditional solution: enumeration (Gold)
  • Assume an innate ordering of all
    grammars/languages, with subsets preceding
    supersets, plus other EM rankings.
  • The learning algorithm must test grammars in that
    sequence, moving to the next one only when
    preceding grammars have been disconfirmed
    (a toy rendering follows below).
  • By doing so the learner automatically respects
    SP.
  • But psycholinguistically absurd! 30 parameters ⇒
    a billion grammars. Learning by enumeration
    within a reasonable time bound is likely to be
    intractable. (Pinker 1979)
  • No role for decoding at all; just trial-and-error
    again.
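
The enumeration learner itself is trivial to state; the sketch below gives a memoryless rendering, assuming an innately ordered list of grammars (subsets before supersets) and a hypothetical licenses test. Its implausibility lies in the size of the list, not in the code.

    # Gold-style learning by enumeration: walk an innately ordered list of
    # grammars, abandoning one only when the current input disconfirms it.
    def enumeration_learn(sentences, ordered_grammars, licenses):
        index = 0                      # assumes the target grammar is somewhere in the list
        for s in sentences:
            while not licenses(ordered_grammars[index], s):
                index += 1             # move on only when the current grammar fails
        return ordered_grammars[index]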

19
Re-thinking enumeration
  • Twist the ordering of all possible grammars into
    a lattice, representing the subset-superset
    relations between them. (Strictly, a poset.)

For our 3,072 languages: 31,504 subset relations.
20
Approx. 10% of the CoLAG grammar lattice
[Lattice diagram: supersets above, subsets below]
21
How a learner could use the lattice
  • At the bottom of the lattice are languages with
    no proper subsets. We call these smallest
    languages.
  • They are the only legitimate hypotheses when
    learning begins.
  • As smallest languages are tried and disconfirmed
    by input, they are erased from the lattice.
  • So the pool of legitimate hypotheses at the
    bottom edge changes as learning proceeds. But all
    respect SP.
  • Smallest languages might be tested by
    trial-and-error, or more efficiently by partial
    decoding, using the parser (a toy sketch follows
    below).
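
A toy sketch of a lattice-based learner under stated assumptions: grammars are hashable identifiers, the innate lattice is given as a map from each grammar to its proper subsets, disconfirmed grammars are erased, and the legitimate pool is always the set of surviving grammars with no surviving proper subset. Hypothesis testing is shown as trial-and-error; partial decoding could narrow the pool instead.

    # Toy lattice-based learner: hypothesize only "smallest" surviving languages,
    # and erase a grammar from the lattice once the input disconfirms it.
    def lattice_learn(sentences, grammars, proper_subsets, licenses):
        """proper_subsets[g] -> set of grammars whose languages are proper subsets of g's."""
        alive = set(grammars)
        current = None
        for s in sentences:
            if current is not None and licenses(current, s):
                continue                       # nothing to learn from this sentence
            if current is not None:
                alive.discard(current)         # erase the disconfirmed grammar
                current = None
            # legitimate pool: surviving grammars with no surviving proper subset
            pool = [g for g in alive if not (proper_subsets[g] & alive)]
            for g in pool:                     # trial-and-error over the pool
                if licenses(g, s):
                    current = g
                    break
        return current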

22
Enumeration by lattice - pros and cons
  • Safe but efficient. No need to crawl through all
    languages between Lcurrent and Ltarget; only
    through all subsets of Ltarget.
  • Erasing grammars is like phonological learning,
    where all possible distinctions at birth are
    whittled down by exposure to the target language.
  • Keeping track of disconfirmed grammars by erasure
    does not add to memory load.
  • To check: can we successfully integrate the
    smallest-languages restriction into the
    decoding process?

23
Now: the lattice solves another bizarre problem
  • Parameter setting implies incremental learning.
  • Incremental learning is considered plausible /
    desirable because it requires no memory for past
    inputs or past grammar hypotheses. (In contrast
    to "little linguist" models.)
  • But SP and incremental learning are incompatible.
    SP becomes over-conservative. It causes
    undershoot errors which can prevent convergence
    on the target.
  • SP demands selection of the least inclusive
    language compatible with the current input
    sentence: an absurdly small language, lacking
    constructions previously acquired. E.g.,
    It's bedtime. ⇒ no topicalization,
    extraposition, passive, tag questions, ...

24
Learning failures without SP and with SP
  • The culprit (again) is ambiguity of triggers, a
    fact about the natural language domain.
  • How could the learning mechanism cope?
  • The erasure of grammars from the lattice blocks
    excessive retrenchment. No more undershoot
    errors.
  • But are we really born with this grammar lattice
    in our heads? Could it all be physics? Could it
    be projected? Current evidence suggests it can't.

25
To wrap up
  • Starting with Chomsky's Chapter 1 blueprint
    (idealization to instantaneous acquisition),
    attempt to build a process model of syntactic
    parameter setting.
  • The aim is computational rigor plus psychological
    verisimilitude. Impose strict limits on
    resources: memory, computation.
    (Pinker 1979)
  • We have looked two nasty problems in the eye,
    concerning the choice among grammars all
    compatible with the input, to find out what a
    solution would have to be like.
  • After 40 years, a suggestion. The lattice model
    is an idea about a direction towards a possible
    solution.