Title: Statistical Language Learning: Mechanisms and Constraints
1 Statistical Language Learning: Mechanisms and Constraints
- Jenny R. Saffran
- Department of Psychology & Waisman Center
- University of Wisconsin - Madison
3 What kinds of learning mechanisms do infants possess?
- How do infants master complex bodies of knowledge?
- Learning requires both experience and innate structure: a bridge between nature and nurture?
- Constraints on learning: computational, perceptual, input-driven, maturational (all neural, though we are not working at that level of analysis)
4 Language acquisition: Experience versus innate structure
- How much of language acquisition can be explained by learning?
- Language-specific linguistic structures
- Learning does not offer transparent explanations
- How is abstract linguistic structure acquired?
- Why are human languages so similar?
- Why can't non-human learners acquire human language?
5 Today's talk
- Consider a new approach to language learning that may begin to address some of these outstanding central issues in the study of language and beyond
6 Statistical Learning
$\mathrm{pr}(Y \mid X) = \frac{\mathrm{freq}(XY)}{\mathrm{freq}(X)}$
7 Statistical Learning
$\mathrm{pr}(Y \mid X) = \frac{\mathrm{freq}(XY)}{\mathrm{freq}(X)}$
- What computations are performed?
- What are the units over which computations are performed?
- Are these the right computations and units given the structure of human languages?
8 Breaking into language
9 Word segmentation
10 Word segmentation cues
- Words in isolation
- Pauses/utterance boundaries
- Prosodic cues (e.g., word-initial stress in English)
- Correlations with objects in the environment
- Phonotactic/articulatory cues
- Statistical cues
11 Statistical learning
[Diagram labels: High likelihood, High likelihood, Low likelihood]
Continuations within words are systematic
Continuations between words are arbitrary
12 Transitional probabilities
PRETTY BABY
.80 (within PRETTY) versus $\frac{\mathrm{freq}(\textit{tyba})}{\mathrm{freq}(\textit{ty})} = .0002$ (across the PRETTY|BABY boundary)
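A minimal sketch of the transitional-probability computation, pr(Y|X) = freq(XY) / freq(X), over adjacent syllables. The toy utterances and the `transitional_probabilities` helper are hypothetical, chosen only so that the within-word transition (pre to ty) comes out higher than the cross-boundary one (ty to ba); the resulting values are not the corpus figures quoted on the slide.

```python
from collections import Counter

# Hypothetical, hand-syllabified toy utterances (not real corpus data).
UTTERANCES = [
    ["pre", "ty", "ba", "by"],
    ["pre", "ty", "do", "ggy"],
    ["pre", "ty", "flow", "er"],
    ["pre", "ty", "ba", "by"],
    ["pre", "tend"],
]


def transitional_probabilities(utterances):
    """Return {(X, Y): freq(XY) / freq(X)} for adjacent syllables.

    freq(X) counts X only where another syllable follows it, so
    utterance-final syllables do not dilute the estimate.
    """
    pair_counts, first_counts = Counter(), Counter()
    for syllables in utterances:
        for x, y in zip(syllables, syllables[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


tps = transitional_probabilities(UTTERANCES)
within_word = tps[("pre", "ty")]      # 4 of 5 "pre" tokens continue with "ty"
across_boundary = tps[("ty", "ba")]   # 2 of 4 "ty" tokens continue with "ba"
print(within_word, across_boundary)
```

Counting pairs only inside an utterance is a design choice of this sketch: it avoids treating the pause between utterances as a syllable transition.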
13 Infants can use statistical cues to find word boundaries
- Saffran, Aslin, & Newport (1996)
- 2-minute exposure to a nonsense language (tokibu, gopila, gikoba, tipolu)
- Only statistical cues to word boundaries
- Tested on discrimination between words and part-words (sequences spanning word boundaries)
14 Experimental setup
15 Headturn Preference Procedure
16
tokibugikobagopilatipolutokibu
gopilatipolutokibugikobagopila
gikobatokibugopilatipolugikoba
tipolugikobatipolugopilatipolu
tokibugopilatipolutokibugopila
tipolutokibugopilagikobatipolu
tokibugopilagikobatipolugikoba
tipolugikobatipolutokibugikoba
gopilatipolugikobatokibugopila
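The stream above can be segmented from its statistics alone. A minimal sketch, assuming every syllable is two letters (consonant plus vowel) and placing a boundary wherever the transitional probability drops below 1.0. Both the syllabification rule and the threshold are simplifying assumptions that happen to suit this four-word language; they are not a claim about the procedure infants use.

```python
from collections import Counter

# The exposure stream from the slide, joined into one unbroken string.
STREAM = "".join([
    "tokibugikobagopilatipolutokibu",
    "gopilatipolutokibugikobagopila",
    "gikobatokibugopilatipolugikoba",
    "tipolugikobatipolugopilatipolu",
    "tokibugopilatipolutokibugopila",
    "tipolutokibugopilagikobatipolu",
    "tokibugopilagikobatipolugikoba",
    "tipolugikobatipolutokibugikoba",
    "gopilatipolugikobatokibugopila",
])


def syllabify(stream, size=2):
    """Cut the stream into fixed-size chunks (assumed CV syllables)."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def transitional_probabilities(units):
    """Return {(X, Y): freq(XY) / freq(X)} over adjacent units."""
    pair_counts = Counter(zip(units, units[1:]))
    first_counts = Counter(units[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


def segment(units, tps, threshold=1.0):
    """Start a new word wherever the transition into a unit is below threshold."""
    words, current = [], [units[0]]
    for x, y in zip(units, units[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


syllables = syllabify(STREAM)
tps = transitional_probabilities(syllables)
words = segment(syllables, tps)
print(sorted(set(words)))
```

In this stream each syllable belongs to exactly one word, so within-word transitions are perfectly predictable and only word-final syllables have several possible continuations.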
19 Results
[Figure: looking times (sec)]
20 Detecting sequential probabilities
- Statistical learning for word segmentation
- Infants track transitional probabilities, not frequencies of co-occurrence (Aslin, Saffran, & Newport, 1997)
- The first usable cue to word boundaries: use of statistical cues precedes use of lexical stress cues (Thiessen & Saffran, 2003)
- Statistical learning is facilitated by the intonation contours of infant-directed speech (Thiessen, Hill, & Saffran, 2005)
- Infants treat tokibu as an English word (Saffran, 2001)
- Emerging words feed into syntax learning (Saffran & Wilson, 2003)
- Other statistics useful for learning phonetic categories, lexical categories, etc.
- Beyond language: domain generality
- Tone sequences (Saffran et al., 1999; Saffran & Griepentrog, 2001): golabupabikututibudaropi... → ACEDGFCBGAFD
- Visuospatial and visuomotor sequences (Hunt & Aslin, 2000; Fiser & Aslin, 2003)
- Even non-human primates can do it! (Hauser, Newport, & Aslin, 2001)
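To see how transitional probability and co-occurrence frequency can come apart, here is a toy construction in which a word and a part-word occur equally often but differ in their internal transitional probabilities. The words, block order, and repeat count are hypothetical and only loosely in the spirit of a frequency-matched design; they are not the published stimuli.

```python
from collections import Counter

# Hypothetical three-syllable words with no shared syllables.
A = ("pa", "bi", "ku")
B = ("ti", "bu", "do")
C = ("go", "la", "tu")
D = ("da", "ro", "pi")

# A and B occur twice per block, C and D once.
# A is followed by B in only half of its occurrences.
BLOCK = [A, B, C, B, A, D]
REPEATS = 50
syllables = [syl for _ in range(REPEATS) for word in BLOCK for syl in word]


def ngram_counts(units, n):
    """Count every run of n adjacent units in the stream."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


unigrams = ngram_counts(syllables, 1)
bigrams = ngram_counts(syllables, 2)
trigrams = ngram_counts(syllables, 3)


def tp(x, y):
    """Transitional probability pr(y | x) = freq(xy) / freq(x)."""
    return bigrams[(x, y)] / unigrams[(x,)]


word = C                         # a low-frequency word
part_word = (A[2], B[0], B[1])   # spans the A|B word boundary

word_freq, part_freq = trigrams[word], trigrams[part_word]
word_tps = [tp(word[0], word[1]), tp(word[1], word[2])]
part_tps = [tp(part_word[0], part_word[1]), tp(part_word[1], part_word[2])]
print(word_freq, part_freq, word_tps, part_tps)
```

A learner tracking only how often a three-syllable sequence occurs could not tell the two test items apart, while a learner tracking transitional probabilities could, because the part-word contains a transition that holds only half the time.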
21 So does statistical learning really tell us anything about language learning?
22 Language acquisition: Experience versus innate structure
- How much of language acquisition can be explained by learning?
- Language-specific linguistic structures ?
- Learning does not offer transparent explanations
- How is abstract linguistic structure acquired?
- Why are human languages so similar?
- Why can't non-human learners acquire human languages?
23 Acquisition of basic phrase structure
- Words occur serially, but representations of sentences contain clumps of words (phrases)
- → How is this structure acquired? Where does it come from?
- Innately endowed as part of Universal Grammar (X-bar theory)?
- Prosodic cues? (probabilistically available)
- Predictive dependencies as cues to phrase units cross-linguistically (cf. mid-20th-century structural linguistics phrasal diagnostics)
- Nouns often occur without articles, but articles usually require nouns
- *The walked down the street.
- NP often occurs without prepositions, but P usually requires NP
- *She walked among.
- NP often occurs without Vtrans, but Vtrans usually requires object NP
- *The man hit.
24 Statistical cue to phrase boundaries
- Unidirectional predictive dependencies → high conditional probabilities
- Can humans use predictive dependencies to find phrase units? (Saffran, 2001)
- Artificial grammar learning task
- Dependencies were the only phrase structure cues
- Adults and kids learned the basic structure of the language
25 Statistical cue to phrase boundaries
- Predictive dependencies assist learners in the discovery of abstract underlying structure.
- → Predicts better phrase structure learning when predictive dependencies are available than when they are not.
- Constraint on learning: provides a potential learnability explanation for why languages so frequently contain predictive dependencies
26 Do predictive dependencies enhance learning?
- Methodology: Contrast the acquisition of two artificial grammars (Saffran, 2002)
- Predictive language
- - Contains predictive dependencies between word classes as a cue to phrasal units
- Non-predictive language
- - No predictive dependencies between word classes
27 Predictive language
- S → AP BP (CP)
- AP → A (D)
- BP → CP F
- CP → C (G)
- A = {BIFF, SIG, RUD, TIZ}
- Note: Dependencies are the opposite direction from English (head-final language)
Possible AP strings: A, AD
Possible CP strings: C, CG
28 Non-predictive language
- S → AP BP
- AP → (A) (D)
- BP → CP F
- CP → (C) (G)
- e.g., in English: NP → (Det) (N)
Possible AP strings: A, D, AD
Possible CP strings: C, G, CG
Possible NP strings: Det, N, Det N
29 Predictive vs. Non-predictive language comparison
                        Predictive (P)   Non-predictive (N)
Sentence types                12                 9
Five-word sentences           33                11
Three-word sentences          11                44
Lexical categories             5                 5
Vocabulary size               16                16
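The rewrite rules for the two languages are small enough to expand exhaustively. A sketch, assuming optional elements are independent, an AP or CP is never empty, and each sentence type is weighted equally (the actual exposure corpora may have weighted them differently). The dictionaries and helper names are illustrative. It reproduces the 12 versus 9 sentence types in the comparison and shows the dependency asymmetry: in the Predictive language every D is preceded by A, while A does not require a following D.

```python
from itertools import product

# Category strings each phrase can expand to, read off the rewrite rules.
PREDICTIVE = {
    "AP": [("A",), ("A", "D")],           # AP -> A (D)
    "CP": [("C",), ("C", "G")],           # CP -> C (G)
}
NON_PREDICTIVE = {
    "AP": [("A",), ("D",), ("A", "D")],   # AP -> (A) (D), empty phrase excluded
    "CP": [("C",), ("G",), ("C", "G")],   # CP -> (C) (G), empty phrase excluded
}


def sentence_types(phrases, optional_final_cp):
    """Expand S -> AP BP, with BP -> CP F, plus an optional sentence-final CP."""
    tails = [()] + phrases["CP"] if optional_final_cp else [()]
    return [ap + cp + ("F",) + tail
            for ap, cp, tail in product(phrases["AP"], phrases["CP"], tails)]


def prob_neighbor(sentences, item, neighbor, offset):
    """Share of `item` tokens whose neighbor at `offset` (-1 or +1) is `neighbor`."""
    hits = total = 0
    for sentence in sentences:
        for i, category in enumerate(sentence):
            if category == item:
                total += 1
                j = i + offset
                hits += 0 <= j < len(sentence) and sentence[j] == neighbor
    return hits / total


predictive = sentence_types(PREDICTIVE, optional_final_cp=True)
non_predictive = sentence_types(NON_PREDICTIVE, optional_final_cp=False)

d_preceded_by_a = prob_neighbor(predictive, "D", "A", -1)            # D requires A
a_followed_by_d = prob_neighbor(predictive, "A", "D", +1)            # A does not require D
d_preceded_by_a_np = prob_neighbor(non_predictive, "D", "A", -1)     # no dependency
print(len(predictive), len(non_predictive))
print(d_preceded_by_a, a_followed_by_d, d_preceded_by_a_np)
```

The contrast between a conditional probability of 1.0 in one direction and a lower value in the other is what makes the dependency unidirectional; in the Non-predictive expansion neither direction is reliable.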
30 Experiment 1
- Participants: Adults and 6- to 9-year-olds
- Predictive versus Non-predictive phrase structure languages
- Language: Between-subject variable
- Incidental learning task
- 40 min. auditory exposure, with descending sentential prosody
- Auditory forced-choice test
- Novel grammatical vs. novel ungrammatical
- Same test items for all participants
[Slide graphic, words as extracted: RUD BIFF HEP KLOR LUM CAV DUPP. LUM TIZ.]
31 Results
[Figure: mean score (chance = 15)]
32 Experiment 2: Effect of predictive dependencies beyond the language domain?
- Same grammars, different vocabulary
- Nonlinguistic materials: Alert sounds
- Exp. 1 materials (Predictive and Non-predictive grammars and test items), translated into non-linguistic vocabulary
- Adult participants
33 Linguistic versus non-linguistic
[Figure: mean score (chance = 15)]
34 New auditory non-linguistic task: Predictive vs. Non-predictive languages
35 Non-linguistic replication
[Figure: mean score (chance = 15)]
36 Predictive language > Non-predictive language
- Predictive dependencies play a role in learning
- For both linguistic and non-linguistic auditory materials
- Also seen for simultaneous visual displays
- But not sequential visual displays → modality effects
- Human languages may contain predictive dependencies because they assist the learner in finding structure.
- The structure of human languages may have been shaped by human learning mechanisms.
- → Predict different patterns of learning for appropriately aged human learners versus non-human learners.
37 Infant/Tamarin comparison: Methodology (with Marc Hauser at Harvard)
- Infants: Headturn Preference Procedure; laboratory exposure; test measures looking times
- Tamarins: Orienting Procedure; home cage exposure; test measures orienting responses
- Paired methods previously used in studies of word segmentation, simple grammars, etc. (Hauser, Newport, & Aslin, 2001; Hauser, Weiss, & Marcus, 2002; etc.)
38 Materials
- Predictive vs. Non-Predictive languages (between Ss)
- Small Grammar: Used to validate methodology
- Grammars written over individual words, not categories (one A word, one C word, etc.)
- 8 sentences, repeated
- 2 min. exposure (infants) or 2 hrs. exposure (tamarins)
- Grammatical (familiar) vs. ungrammatical test items
- Large Grammar: Languages from adult studies
- Grammars written over categories (category A, C, etc.)
- 50 sentences, repeated
- 21 min. exposure (infants) or 2 hrs. exposure (tamarins)
- Grammatical (novel) vs. ungrammatical test items
39 Tamarin results
[Figure panels: Small grammar, Large grammar]
41 Infant results (12-month-olds, 12 per group)
[Figure panels: Small grammar, Large grammar; axis: looking times (sec)]
42 Cross-species differences
- Small grammar vs. large grammar
- Tamarins only learned the small grammar
- Difficulty with generalization? Memory for sentence exemplars?
- Can learn patterns over individual elements but not categories?
- Infants learned both systems, despite size of large grammar
- Availability of predictive dependencies
- Only affected the tamarins learning the small grammar
- Affected the infants regardless of the size of the grammar
- Consistent with constrained statistical learning hypothesis → human learning mechanisms may have shaped the structure of natural languages
43 Constrained statistical learning as a theory of language acquisition?
- Word segmentation, aspects of phonology, aspects of syntax
- Developing the theory
- Scaling up: Multiple probabilistic cues in the input (e.g., prosodic cues), multiple levels of language in the input, more realistic speech (e.g., infant-directed speech, IDS)
- Mapping to meaning: Are statistically-segmented words good labels?
- Critical period effects: Exogenous constraints on statistical learning
- Modularity: Distinguishing domain-specific and domain-general factors
- e.g., statistical learning of musical syntax
- Bilingualism: Separating languages and computing separate statistics
- Relating to real acquisition outcomes: Individual differences
- Patients with congenital amusia, with Isabelle Peretz, U. de Montreal
- Specific Language Impairment study, with Dr. Julia Evans, UW-Madison
44 Conclusions
- Infants are powerful language learners: Rapid acquisition of complex structure without external reinforcement
- However, humans are constrained in the types of patterns they readily acquire
- Understanding what is not learnable may be just as valuable as cataloging what infants can learn
- → These predispositions may be among the factors that have shaped the structure of human language
45 Acknowledgements
Infant Learning Lab, UW-Madison
- National Institutes of Health: R01 HD37466, P30 HD03352
- National Science Foundation: PECASE BCS-9983630
- UW-Madison Graduate School
- UW-Madison Waisman Center
- Members of the Infant Learning Lab
- All the parents and babies who have participated!