Title: Formal Typology: Explanation in Optimality Theory
1. Formal Typology: Explanation in Optimality Theory
- Paul Smolensky
- Cognitive Science Department
- Johns Hopkins University
with Géraldine Legendre, Donald Mathis, Melanie Soderstrom, Alan Prince, Suzanne Stevenson, Peter Jusczyk
2. Advertisement
The Harmonic Mind: From neural computation to optimality-theoretic grammar
Paul Smolensky & Géraldine Legendre
- Blackwell 2002 (??)
- Develop the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture
- Apply to the theory of grammar
3. Chomsky 1988
- 1. What is the system of knowledge?
- 2. How does this system of knowledge arise in the mind/brain?
- 3. How is this knowledge put to use?
- 4. What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge? (p. 3)
4. Responsibilities of Grammatical Theory
Chomsky's Big 4 questions concerning knowledge of grammar
OT
Structure
Nativist hypothesis
Acquisition
Processing
Neuro-genetics
Not new to Chomsky or generative grammar
5. Jakobson's Program
- Linguistic theory is not just for theoretical linguists
- The same principles that explain formal cross-linguistic and language-internal distributional patterns can also explain
  - Acquisition
  - Processing
  - Neurological breakdown
6. Jakobson's Program
- Markedness enables a Grand Unified Theory for the cognitive science of language: Avoid α
- Structure
  - Inventories lack α
  - Alternations eliminate α
- Acquisition
  - α is acquired late
- Processing
  - α is processed poorly
- Neural
  - Brain damage most easily disrupts α
7. Talk Plan
OT Explanation
- Structure
- Acquisition
- Processing
- Neuro-genetics
8. Responsibilities of Grammatical Theory
Chomsky's Big 4 questions concerning knowledge of grammar
Structure
Structure of UG: captured in a general formalism for grammars and their variation
Acquisition
Processing
Neuro-genetics
9. From Markedness to OT
- Formalizing markedness → OT
- Markedness constraints
- Faithfulness constraints
- Competition
- Strict domination
- Strong universality: Richness of the Base
10. Structure, Formal Result: Formalizing Markedness, Two Problems
- Goal: Change the epiphenomenal explanatory status of markedness
  - Markedness explains grammars (e.g., rules): informal commentary about grammar
  - vs. Markedness IS grammar: markedness-grammars formally determine languages
11. Structure, Formal Result: Formalizing Markedness, Two Problems
- Problem 1: Multidimensional integration
  - Each dimension of linguistic structure independently has its own marked pole, but how do these dimensions combine?
  - Turns out to be related to another fundamental problem
12. Structure, Formal Result: Formalizing Markedness, Two Problems
- α is marked ⇒ Avoid α
- But when and how does avoidance happen? Problem 2: Pervasive variability in avoidance
  - Inventories: If a segment is absent in French because it is marked, how can it be present in English despite being marked?
    - The grammar of every language turns on or off "No α" (*α), a markedness constraint. OT: a more subtle version that also solves
  - Alternations: If in environment E, α → β because α is more marked than β, how do we explain that in environment E′, α does not → β even though α is more marked than β?
13. Structure, Formal Result: Formalizing Markedness
- Most crudely: Why aren't marked elements always avoided?
- Something must oppose markedness forces.
- Markedness cannot be the sole basis of a formal grammatical theory: it is only one half of the complete story.
14. Structure, Formal Result: The Great Dialectic
- Phonological representations serve two masters:
FAITHFULNESS
MARKEDNESS
Locked in eternal conflict
15. Structure, Formal Result: The Core Constraints of Con
- MARKEDNESS: *α (minimize effort; maximize distinctiveness)
  - Constraint *α ∈ Con ⇔ α meets empirical criteria for marked
  - Freedom? Empirically constrained by universal patterns
- FAITHFULNESS (be this invariant form)
  - /input/ → [output] is the identity map, i.e.,
  - elements /x/ and [x] are in one-to-one correspondence and identical (McCarthy & Prince 95)
  - Constraints: MAX(x), DEP(x), IDENT(x), …
  - Essentially determined by elements x of representation
  - Freedom? Representations, as always, are empirically constrained to allow statement of markedness constraints
"In OT you can invent any constraint you want"
16. Structure, Formal Result: Conflict
- Dialectic: MARK vs. FAITH conflict
  - Why aren't marked elements always avoided?
    - Because sometimes MARK is over-ruled by FAITH
  - Why aren't words always pronounced in their invariant, lexical form?
    - Because sometimes FAITH is over-ruled by MARK
- Constraint ℂ1 over-rules (dominates) ℂ2: ℂ1 ≫ ℂ2
- Whether M gets violated (whether marked elements fail to be avoided) varies by
  - Language (in some, M ≫ F; in others, F ≫ M)
  - Context (in some, M ≫ F2; in others, F1 ≫ M)
17. Structure, Formal Result: Conflict
- Dialectic: MARK vs. FAITH conflict
- Whether M gets violated (whether marked elements fail to be avoided) varies by
  - Language (in some, M ≫ F; in others, F ≫ M)
  - Context (in some, M ≫ F2; in others, F1 ≫ M)
- Why is there cross-linguistic variation?
  - The Phonetic vs. Lexical (MARK vs. FAITH) dialectic gets resolved differently
- Typology by re-ranking: Factorial Typology
  - Possible human languages correspond to rankings of Con
  - (n constraints give n! rankings; many are equivalent)
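The re-ranking idea above can be sketched in a few lines of Python. This is only an illustrative toy, not part of the talk: the three-constraint Con, the three candidates for the input /CVC/, and their violation counts are assumptions chosen to keep the example small.

```python
from itertools import permutations

# Hypothetical toy constraint set and candidates for the input /CVC/.
CON = ("NOCODA", "MAX", "DEP")
CANDIDATES = {
    "CVC":   {"NOCODA": 1, "MAX": 0, "DEP": 0},  # faithful, keeps the coda
    "CV":    {"NOCODA": 0, "MAX": 1, "DEP": 0},  # deletes the final C
    "CV.CV": {"NOCODA": 0, "MAX": 0, "DEP": 1},  # epenthesizes a vowel
}

def optimal(ranking):
    """Pick the candidate whose violation profile, read in ranking order,
    is lexicographically smallest (strict domination)."""
    return min(CANDIDATES, key=lambda c: tuple(CANDIDATES[c][k] for k in ranking))

def factorial_typology():
    """Map each predicted output to the list of rankings that select it."""
    typology = {}
    for ranking in permutations(CON):
        typology.setdefault(optimal(ranking), []).append(ranking)
    return typology

if __name__ == "__main__":
    for output, rankings in factorial_typology().items():
        print(output, "<-", [" >> ".join(r) for r in rankings])
```

In this toy, the 3! = 6 rankings collapse into three distinct outcomes, which illustrates the remark that many rankings are equivalent.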
18. Structure, Formal Result: Formalizing Markedness
- Problem 2: Avoidance of the marked is pervasively variable: exactly where does marked material appear?
  - Solution: Constraint ranking, MARK w.r.t. FAITH
- Will now see this also solves
- Problem 1: Multidimensional markedness
  - Solution: a single constraint ranking for all constraints in a given language
19. Structure, Formal Result: Formalizing Markedness
- Markedness is multidimensional
  - Each dimension has its universally marked pole
  - How do dimensions combine? (*M1, M2) vs. (M1, *M2)
  - CVC.CV with final stress (*STRESSHEAVY, MAINSTRESSRIGHT) vs. CVC.CV with initial stress
- Integrate via a common markedness currency: Harmony
  - Numerical: M1: −3.2; M2: −2.8
  - Symbolic: M1 absolutely worse than M2
- OT
  - For a given language, there is a single constraint ranking for all constraints
  - Strict domination hierarchy: markedness on higher-ranked constraints can never be compensated for by unmarkedness on lower-ranked ones
20. Structure, Formal Result: Competition for Optimality
- Given an input, an OT grammar does not provide a procedure for how to construct the output but rather a description of the output: the structure that best-satisfies the constraint ranking
- Best-satisfies is a comparative criterion: outputs compete, and the grammar identifies the winner, i.e. the optimal (grammatical, highest-Harmony) output for that input
21. Structure, Formal Result: Harmonic Competition
- Stress is on the initial heavy syllable iff the number of light syllables n obeys [inequality not preserved]
- Pathological grammars
22. Structure, Formal Result: Harmonic Competition
- Symbolic Harmony: Strict domination
  - STRESSHEAVY ≫ MAINSTRESSRIGHT: Stress the initial heavy syllable
  - MAINSTRESSRIGHT ≫ STRESSHEAVY: Stress the final syllable
- Strict domination ⇒ Grammars can't count
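A minimal sketch of the contrast between numerical and symbolic Harmony, under stated assumptions: the word shape (one initial heavy syllable followed by `n_light` light syllables), the gradient counting of MAINSTRESSRIGHT violations, and the weights 2.5 and 1.0 are all hypothetical choices made for illustration.

```python
def violations(n_light, stress):
    """Violation counts for a word of one heavy syllable plus n_light light ones."""
    if stress == "initial":
        # The heavy syllable is stressed; main stress sits n_light syllables
        # away from the right edge (assumed gradient violation count).
        return {"STRESSHEAVY": 0, "MAINSTRESSRIGHT": n_light}
    # Final stress: the heavy syllable is left unstressed (unless it is final).
    return {"STRESSHEAVY": 1 if n_light > 0 else 0, "MAINSTRESSRIGHT": 0}

def numerical_winner(n_light, weights):
    """Numerical Harmony: weighted sum of violations, so weights trade off."""
    def harmony(stress):
        v = violations(n_light, stress)
        return -sum(weights[c] * v[c] for c in weights)
    return max(("initial", "final"), key=harmony)

def strict_winner(n_light, ranking):
    """Strict domination: compare violation profiles lexicographically."""
    def profile(stress):
        v = violations(n_light, stress)
        return tuple(v[c] for c in ranking)
    return min(("initial", "final"), key=profile)

if __name__ == "__main__":
    weights = {"STRESSHEAVY": 2.5, "MAINSTRESSRIGHT": 1.0}  # hypothetical
    for n in range(1, 6):
        print(n,
              numerical_winner(n, weights),
              strict_winner(n, ("STRESSHEAVY", "MAINSTRESSRIGHT")),
              strict_winner(n, ("MAINSTRESSRIGHT", "STRESSHEAVY")))
```

With these assumed weights the numerical grammar switches from initial to final stress once there are three or more light syllables, so it effectively counts; each strict ranking gives the same answer for every n ≥ 1.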
23. Structure, Formal Result: OT Formal Definition
- Gen: Specifies candidate outputs for any given input
- Con: The constraint set
- A grammar: A hierarchical ranking of Con
- H-Eval: Given two candidates and a ranking, a formal definition (employing strict domination) of which has higher Harmony, i.e. which better-satisfies the ranking
- I → O mapping: I ↦ the maximal-Harmony candidates in Gen(I)
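The four components above can be sketched as follows. This is a hedged toy, not the formal definition itself: the `gen` function (faithful parse, final-segment deletion, final-vowel epenthesis) and the three constraint functions are simplifying assumptions introduced only for illustration.

```python
from typing import Callable, List, Sequence

# A constraint maps an (input, output) pair to a number of violations.
Constraint = Callable[[str, str], int]

def gen(inp: str) -> List[str]:
    """Toy Gen (an assumption): faithful parse, delete final segment, add final V."""
    return [inp, inp[:-1], inp + "V"]

def nocoda(inp: str, out: str) -> int:
    return 1 if out.endswith("C") else 0     # crude check: word-final C counts as a coda

def max_(inp: str, out: str) -> int:
    return max(0, len(inp) - len(out))       # deleted segments

def dep(inp: str, out: str) -> int:
    return max(0, len(out) - len(inp))       # inserted segments

def more_harmonic(inp: str, a: str, b: str, ranking: Sequence[Constraint]) -> bool:
    """H-Eval: a beats b iff a does better on the highest-ranked
    constraint on which the two candidates differ."""
    for constraint in ranking:
        va, vb = constraint(inp, a), constraint(inp, b)
        if va != vb:
            return va < vb
    return False  # identical profiles: neither is more harmonic

def optimal_outputs(inp: str, ranking: Sequence[Constraint]) -> List[str]:
    """The maximal-Harmony candidates in Gen(inp)."""
    cands = gen(inp)
    return [a for a in cands
            if not any(more_harmonic(inp, b, a, ranking) for b in cands)]

if __name__ == "__main__":
    print(optimal_outputs("CVC", [nocoda, max_, dep]))
    print(optimal_outputs("CVC", [max_, dep, nocoda]))
```

Returning a list rather than a single form reflects the definition: the mapping yields the set of maximal-Harmony candidates, which may contain more than one member when profiles tie.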
24. Structure, Formal Result: Richness of the Base
- Universality: All systematic cross-linguistic variation arises from differences in constraint ranking
- Therefore
  - Con is universal; H-Eval is universal
  - Gen is universal, including the space of possible inputs as well as possible outputs
  - i.e., No systematic cross-linguistic variation is due to differences in inputs
  - e.g., Languages with no surface codas cannot get this property from limitations on the lexicon (e.g., a morpheme structure constraint Cwd) but rather from the ranking
  - i.e., The grammar must have the property that even if there were C-final inputs, there would still be no surface codas
25. Aside
- Richness of the Base is a principle for inducing a grammar (generalizing) from a set of grammatical items
- It can be justified by the central principle of John Goldsmith's presentation
  - Maximize the probability of the data
26. Structure, Conceptual Question: Explanatory Power
- "OT is as unexplanatory as extrinsically-ordered rule-theory"
- "Stipulating ranking amounts to stipulating ordering"
27. Structure, Conceptual Question: Analytic Restrictiveness
- "You can make up any constraint you want in OT"
28. Structure, Explanatory Goal: Consequences of ℂ ∈ Con, I: The Subordination Pattern
- E.g., ℂ = NOCODA
- Recall
  - If "No codas" is in UG, why do codas ever appear?
- Conflict
  - With faithfulness constraints
  - With other markedness constraints (other dimensions of markedness)
- Cross-linguistic variation: codas are less and less restricted as NOCODA is subordinated to more and more conflicting constraints (i.e., dimensions of markedness)
29. Structure, Empirical Application: Subordination Pattern, Codas
NOCODA
No codas at all
Codas only in stressed syllables
Geminate codas
Codas unrestricted
except prohibited inter-vocalically V.CV
30. Structure, Conceptual Question: Multiplicity of Constraints
- For a second pervasive pattern generated by ℂ ∈ Con
- "Any framework which leads to the morass of constraints found in OT analyses in phonology cannot possibly be explanatorily adequate."
31. Structure, Explanatory Goal: Consequences of ℂ ∈ Con, II: Factorial Interaction
- Factorial interaction: with varying interaction (re-ranking), n simple modular constraints correspond to
  - Multiplicity of rules (many more than n)
  - Complex, non-modular rules
  - Rules + representational/notational tricks
  - Rules + constraints
- E.g., ℂ = NOCODA
32. Structure, Empirical Application: Factorial Interaction, Codas
- Consider expanding the faithfulness constraints in Con from {MAX} to {MAX, DEP}
  - Number of constraints increases by 1
  - Number of corresponding rules doubles, as the set of repairs now includes epenthesis as well as deletion
- NOCODA ≫ MAX: C → Ø / __]σ
- NOCODA ≫ DEP: Ø → V / C__]σ
- ONSET ≫ MAX: V → Ø / σ[__
- ONSET ≫ DEP: Ø → C / σ[__V
33. Structure, Empirical Application: Factorial Interaction, Codas
In general, the number of comparable rules increases much faster than the number of constraints
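The growth claim can be illustrated with a small sketch. It assumes, as a simplification, that each (markedness ≫ faithfulness) pair corresponds to exactly one repair rule; the descriptive labels in `MARKED` and `REPAIR` are illustrative, not taken from the slides.

```python
from itertools import product

# What each markedness constraint penalizes, and which repair becomes
# available when the given faithfulness constraint is dominated (labels assumed).
MARKED = {"NOCODA": "a coda C", "ONSET": "an onsetless syllable"}
REPAIR = {"MAX": "deletion", "DEP": "epenthesis"}

def comparable_rules(markedness, faithfulness):
    """One repair 'rule' per (markedness >> faithfulness) pair."""
    return [
        f"{m} >> {f}: fix {MARKED[m]} by {REPAIR[f]}"
        for m, f in product(markedness, faithfulness)
    ]

if __name__ == "__main__":
    for faith in (["MAX"], ["MAX", "DEP"]):
        rules = comparable_rules(["NOCODA", "ONSET"], faith)
        print(len(faith) + 2, "constraints ->", len(rules), "rules")
        for rule in rules:
            print("   ", rule)
```

Under this assumption, constraints grow additively (|M| + |F|) while comparable rules grow multiplicatively (|M| × |F|): adding DEP takes the toy from 3 constraints and 2 rules to 4 constraints and 4 rules.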
34. Structure, Explanatory Goal: Consequences of ℂ ∈ Con, II: Factorial Interaction
- Factorial interaction: with varying interaction (re-ranking), n simple modular constraints correspond to
  - Multiplicity of rules (many more than n)
  - Complex, non-modular rules
  - Rules + representational/notational tricks
  - Rules + constraints
- E.g., ℂ = NOCODA
35. Structure, Empirical Application: Factorial Interaction, Codas
- STRESS-TO-WEIGHT ≫ NOCODA
  - Codas only in stressed syllables
  - C → Ø in unstressed σ: segmental rule sensitive to foot structure (non-modular rules)
- ANCHOR-R ≫ NOCODA
  - Codas only word-finally
  - C → Ø / __]σ plus final-C extrametricality (representational trick)
- MAXµ ≫ NOCODA
  - Only geminate codas /Cµ/
  - C → Ø / __]σ plus Hayes' exclusivity of association (notational trick)
36. Structure, Empirical Application: Factorial Interaction
- STRESS-TO-WEIGHT ≫ NOCODA: Codas only in stressed syllables
- STRESS-TO-WEIGHT ≫ *Cµ
  - Geminates only after stressed V
  - µ → Ø in unstressed σ
- ANCHOR-R ≫ NOCODA: Codas only word-finally
- ANCHOR-R ≫ *[+voi, −son]
  - Obstruent devoicing except word-finally
  - [+voi] → [−voi] / [__, −son], plus a further device to block it word-finally
- MAXµ ≫ NOCODA: Only geminate codas /Cµ/
- MAXµ ≫ WEIGHT-TO-STRESS
  - Geminates are the only codas in unstressed syllables
  - C → Ø in unstressed σ, plus exclusivity of association
37. Structure, Jakobson's Program: Markedness + Faithfulness = Harmony
- In summary
  - Jakobson's key insight concerning linguistic structure: the central organizing principle of grammar is Minimize Markedness
  - OT formalizes this as Maximize Harmony
  - OT formalizes Markedness via violable constraints
  - OT adds the crucial notion of Faithfulness: the other (lexical) half of the phonological dialectic
  - OT Harmony combines Markedness with Faithfulness; their conflict is adjudicated via ranking
  - Ranking unifies multiple dimensions of markedness
38. Structure: Summary
- OT achieves the explanatory goals of
  - Changing the epiphenomenal status of markedness in grammatical theory: markedness is now in grammar, not about grammar
  - A strongly universalist formalism exhibiting Inherent Typology
  - Robust falsifiability
39. Responsibilities of Grammatical Theory
Chomsky's Big 4 questions concerning knowledge of grammar
OT
Structure
Acquisition
Processing
Neuro-genetics
40. Acquisition, Formal Result I: Learning Theory
- Learning algorithm
  - Provably correct and efficient (when part of a general decomposition of the grammar learning problem)
- Sources
  - Tesar 1995 et seq.
  - Tesar & Smolensky 1993, …, 2000
  - See … for how to exploit the analogy to weighted OT (Goldsmith, today)
- If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E
41. Acquisition, Formal Result I: Constraint Demotion Algorithm
If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E
Correctly handles the difficult case: multiple violations in E
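One demotion step can be sketched as below. This is a simplified illustration of the idea stated above, assuming a stratified hierarchy represented as a list of sets (highest-ranked first) and violation counts given as dictionaries; the function name and data layout are hypothetical, and the published algorithm should be consulted for the full details.

```python
def constraint_demotion(strata, winner, loser):
    """One Constraint Demotion step on a stratified hierarchy.

    strata: list of sets of constraint names, highest-ranked first.
    winner: violation counts of the heard form A.
    loser:  violation counts of the expected form E.
    Returns a new list of strata in which A is more harmonic than E.
    """
    names = set().union(*strata)
    # Mark cancellation: only differences in violation counts matter,
    # which is what lets multiple violations in E be handled uniformly.
    winner_marks = {c for c in names if winner.get(c, 0) > loser.get(c, 0)}
    loser_marks = {c for c in names if loser.get(c, 0) > winner.get(c, 0)}
    if not loser_marks:
        raise ValueError("no ranking can make the winner beat this loser")

    stratum_of = {c: i for i, s in enumerate(strata) for c in s}
    pivot = min(stratum_of[c] for c in loser_marks)  # highest-ranked loser mark

    new = [set(s) for s in strata]
    for c in winner_marks:
        if stratum_of[c] <= pivot:           # not yet dominated by the pivot
            new[stratum_of[c]].discard(c)
            if pivot + 1 == len(new):        # need a fresh stratum at the bottom
                new.append(set())
            new[pivot + 1].add(c)            # minimal demotion: just below pivot
    return [s for s in new if s]             # drop strata emptied by demotion

if __name__ == "__main__":
    h0 = [{"NOCODA", "MAX", "DEP"}]
    heard, expected = {"NOCODA": 1}, {"MAX": 1}   # heard CVC, expected CV
    print(constraint_demotion(h0, heard, expected))
```

In the usage example, hearing a coda-final form where a deletion candidate was expected demotes NOCODA to just below MAX, and repeating the step with the same pair then leaves the hierarchy unchanged.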
42. Acquisition, Conceptual Question: Large Grammar Space
- "Huge number of grammars: OT is too unrestrictive"
43. Acquisition, Formal Result II: Learnability and the Initial State
- M ≫ F is learnable with /in+possible/ → impossible
  - "not" = in-, except when followed by …
  - the "exception that proves the rule": M = NPA
- M ≫ F is not learnable from data if there are no "exceptions" (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: both M and F are satisfied, so there is no M vs. F conflict and no evidence for their ranking
- Thus must have M ≫ F in the initial state, H0
44. Acquisition, Empirical Application: Initial State, Experimental Test
- Collaborators
  - Peter Jusczyk
  - Theresa Allocco
  - (Elliott Moreton, Karen Arnold)
- Here, only a thumbnail sketch (more in the OT Workshop Thursday)
45. Acquisition, Empirical Application: Initial State, Experimental Test
- Linking hypothesis
  - More harmonic phonological stimuli ⇒ Longer listening time
- More harmonic
  - Satisfying M ≻ violating M, when equal on F
  - Satisfying F ≻ violating F, when equal on M
  - When one must choose one or the other, it is more harmonic to satisfy M: M ≫ F
- M = Nasal Place Assimilation (NPA)
46-49. Acquisition, Empirical Application: 4.5 Months (NPA)
[Four slides of experimental results at 4.5 months; figures not preserved]
50. Acquisition, Jakobson's Program: Markedness and Distance from Initial State
- X is universally more marked than Y
  - In addition to the constraints M1, M2, …, Mk violated by Y, X also violates markedness constraints M′1, M′2, …, M′n
- Y will be acquired (become admitted into the child's inventory) after M1, M2, …, Mk are all demoted below relevant faithfulness constraints
- These demotions are all necessary for X to be acquired, and additional demotions of M′1, M′2, …, M′n are also required
- X will require more time to be acquired
51. Responsibilities of Grammatical Theory
Chomsky's Big 4 questions concerning knowledge of grammar
OT
Structure
Nativist hypothesis
Acquisition
Processing
Neuro-genetics
52. Processing, Formal Results: Context-Free Parsing Algorithm
- Theorem (Tesar 1994, 1995a, b, 1996). Suppose
  - Gen parses a string of input symbols into structures specified via a context-free grammar
  - Con: constraints meet a tree-locality condition and penalize empty structure
- Then a given dynamic programming algorithm is
  - Left-to-right
  - General (any such Gen, Con)
  - Guaranteed to find the optimal outputs
  - As efficient as parsers for conventional context-free grammars.
53. Processing, Formal Results: Finite-State Parsing Algorithm
- Theorem (Ellison 1994). Suppose
  - Gen(I) is representable as a (non-deterministic) finite-state transducer (particular to I) mapping the input string to a set of output candidates
  - Con: constraints are reducible to multiply-violable binary constraints, each representable as a finite-state transducer mapping an output candidate to a sequence of violation marks
- Then composing the Gen(I) and rank-sequenced constraint-transducers yields a transducer that
  - Directly maps I to its optimal outputs
  - Can be efficiently pruned by dynamic programming
54. Processing, Formal Results: Complexity of Violable Constraints
- Theorem (Frank and Satta 1998). Suppose
  - Gen is representable as a (non-deterministic) finite-state transducer mapping an input string to a set of output candidates
  - Con: the set of structures incurring n violations of each constraint is generable by a finite-state machine, and n can be finitely bounded for each constraint
- Then the mapping from inputs to optimal outputs has the complexity of a finite-state transducer.
- Theorem (Hiller 1996, Smolensky 1997).
  - If n is unbounded, there are (extremely simple) OT grammars with greater computational complexity.
55. Processing, Conceptual Question: Processing (Symbolic) Theory
- "Infinite candidate set ⇒ uncomputable"
56. Processing, Empirical Application: Sentence Processing
- Because an OT grammar assigns a parse to any input, no additional principles (e.g., parsing heuristics) are needed for parsing the initial, incomplete segment of a sentence
- Linking hypothesis
  - Processing difficulty arises when previously established structure needs to be abandoned in the face of further input
57. Processing, Empirical Application: PP Attachment
The servant of the actress who (Cuetos & Mitchell 88)
Assuming who is ambiguous for Case.
Violates NOM, LOCALITY2
Violates NOM, AGRCASE
Violates GEN
- LOCALITY: If XP c-commands YP, then XP precedes YP.
- AGRCASE: A relative pronoun must agree in Case with the modified NP.
- CASE: GEN ≫ DAT ≫ ACC ≫ NOM (universal)
58. Processing, Empirical Application: PP Attachment
The servant of the actress who (Cuetos & Mitchell 88)
- If GEN, AGRCASE ≫ LOCALITY2, then attach high
- If LOCALITY2 ≫ GEN or AGRCASE, then attach low
59. Processing, Empirical Application: PP Attachment
- Preliminary result: A cross-linguistic typology of PP attachment patterns (across differences in case and embedding depth)
- Empirically promising, but not perfect
- Unclear yet how rankings determining parsing preferences relate to rankings in the pure competence grammar
60. Processing, Jakobson's Program: Processing and Markedness
- Phonological analogy: Incrementally parse CVC
  - /C/ → C
  - /CV/ → CV
  - /CVC/ → CV.C
  - Now expect a V; if get it, no reanalysis
  - But if get a C, need reanalysis ⇒ difficulty
  - /CVCC/ → CVC.C
- Processing marked material (coda C) creates difficulty because it is initially analyzed as unmarked (as an onset)
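The phonological analogy can be sketched as a small left-to-right parser. This is a toy under stated assumptions: input is a string over the symbols C and V, a consonant is always parsed first as an onset, and the handling of word-initial consonant clusters is an arbitrary simplification not discussed in the talk.

```python
def incremental_parse(segments):
    """Parse C/V segments left to right.

    Returns (syllabified string, number of reanalyses). A consonant is first
    parsed as an onset (unmarked); it is reanalyzed as a coda (marked) only
    when the next segment turns out to be another consonant.
    """
    syllables = []
    reanalyses = 0
    for seg in segments:
        last_is_bare_onset = bool(syllables) and "V" not in syllables[-1]
        if seg == "V":
            if last_is_bare_onset:
                syllables[-1] += "V"          # expected vowel arrives: no reanalysis
            else:
                syllables.append("V")
        elif seg == "C":
            if last_is_bare_onset and len(syllables) > 1:
                coda = syllables.pop()        # abandon the onset analysis
                syllables[-1] += coda         # re-attach as coda of the previous syllable
                reanalyses += 1
                syllables.append("C")
            elif last_is_bare_onset:
                syllables[-1] += "C"          # word-initial cluster (toy assumption)
            else:
                syllables.append("C")         # default: parse as an onset
        else:
            raise ValueError(f"unexpected segment: {seg!r}")
    return ".".join(syllables), reanalyses

if __name__ == "__main__":
    for word in ("C", "CV", "CVC", "CVCV", "CVCC"):
        print(word, incremental_parse(word))
```

The reanalysis counter plays the role of the linking hypothesis: difficulty is predicted exactly where previously built structure (the onset parse) has to be abandoned.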
61. Processing, Conceptual Question: Processing (Symbolic) Theory
- "OT not psychologically plausible"
62. Responsibilities of Grammatical Theory
Chomsky's Big 4 questions concerning knowledge of grammar
OT
Structure
Nativist hypothesis
Acquisition
Processing
Neuro-genetics
63. Neuro-genetics, Formal Results: Neural Representations (Gen)
64. OT and Connectionism
- OT derives from the numerical formalism of Harmonic Grammar (Legendre, Miyata, Smolensky, 1990), itself derived from connectionist Harmony maximization
65. Neuro-genetics, Formal Results: Neural Constraints (Con)
NOCODA: A syllable has no coda
H(a[σ k æ t]) = −s_NOCODA < 0
66. Neuro-genetics, Formal Results: UGenome for CV Theory
- The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device
  - Proteins → Universal grammatical principles?
- Case study: Basic CV Syllable Theory
- Introduce an abstract genome notion parallel to (and encoding) an abstract neural network
- Collaborators
  - Melanie Soderstrom
  - Donald Mathis
67. Neuro-genetics, Formal Results: Network Architecture
/C1 C2 /
C1 V C2
68. Neuro-genetics, Formal Results: PARSE
- All connection coefficients are 2
69. Neuro-genetics, Formal Results: ONSET
- All connection coefficients are −1
70. Neuro-genetics, Formal Results: Connectivity Geometry
71. Neuro-genetics, Formal Results: Constraint PARSE
- Input units grow south and connect
- Output units grow east and connect
- Correspondence units grow north & west and connect with input & output units.
72. Neuro-genetics, Formal Results: Connectivity Genome
- Contributions from ONSET and PARSE
73. Neuro-genetics, Formal Results: Processing
74. Neuro-genetics, Formal Results: Learning
75. Neuro-genetics, Formal Results: Learning Behavior
- A simplified system can be solved analytically
- Learning algorithm turns out to be approximately
  - Δs_i(…) ≈ ε · [# violations of constraint_i …]
76. Conclusion
- OT is enabling progress on several explanatory goals for linguistic theory
  - Inherent typology
  - General learning theory
  - General processing theory
  - General biological realization
Often, OT formalizes Jakobson's program
Thank you for your attention