Title: The Harmonic Mind
1. The Harmonic Mind
- Paul Smolensky
- Cognitive Science Department
- Johns Hopkins University
with
Géraldine Legendre, Alan Prince
Peter Jusczyk, Donald Mathis, Melanie Soderstrom
A Mystery Co-laborator
2. Personal Firsts, thanks to SPP
- First invited talk! (first visit to JHU, 1986)
- First public confessional: midnight thoughts of a worried connectionist (UNC, 1988)
- First generative syntax talk (Memphis, 1994)
- First attempt at stand-up comedy (Columbia, 2000)
- First rendition of a 900-page book as a graphical synopsis in PowerPoint (1 minute from now)
3. Advertisement
The Harmonic Mind: From neural computation to optimality-theoretic grammar
Paul Smolensky & Géraldine Legendre
- Blackwell 2002 (??)
- Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture
- A case study in formalist multidisciplinary cognitive science
4. Talk Plan
- Sketch the ICS cognitive architecture, pointing to contributions from/to traditional disciplines
- Topics of direct philosophical relevance
  - Explanation of the productivity of cognition
  - Nativism
- Theoretical work
  - Symbolic
  - Connectionist
- Experimental work
5. Mystery Quote 1
- "Smolensky has recently been spending a lot of his time trying to show that, vivid first impressions to the contrary notwithstanding, some sort of connectionist cognitive architecture can indeed account for compositionality, productivity, systematicity, and the like. It turns out to be rather a long story; 185 pages are devoted to Smolensky's telling of it, and there appears to be no end in sight. It seems it takes a lot of squeezing to get this stone to bleed."
6. Processing I: Activation
- Computational neuroscience → ICS
- Key sources
  - Hopfield 1982, 1984
  - Cohen & Grossberg 1983
  - Hinton & Sejnowski 1983, 1986
  - Smolensky 1983, 1986
  - Geman & Geman 1984
  - Golden 1986, 1988
Processing (spreading activation) is optimization: Harmony maximization
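The closing claim, that spreading activation is Harmony maximization, can be sketched with a tiny Hopfield-style network. The weights below are invented for illustration; this is not a network from the talk.

```python
import numpy as np

def harmony(W, a):
    # network Harmony: H(a) = 1/2 a^T W a
    return 0.5 * a @ W @ a

def settle(W, a, steps=100, seed=0):
    # asynchronous unit updates; each one can only raise (or keep) Harmony
    rng = np.random.default_rng(seed)
    a = a.copy()
    for _ in range(steps):
        i = rng.integers(len(a))
        net = W[i] @ a - W[i, i] * a[i]   # input to unit i from the others
        a[i] = 1.0 if net >= 0 else -1.0  # locally Harmony-maximizing state
    return a

W = np.array([[0., 1., -1.],
              [1., 0., -1.],
              [-1., -1., 0.]])            # symmetric, zero diagonal
a0 = np.array([-1., 1., 1.])
a1 = settle(W, a0)
assert harmony(W, a1) >= harmony(W, a0)   # settling did not lower Harmony
```

Because W is symmetric with a zero diagonal, each update changes H by (a_new − a_old)·net ≥ 0, so spreading activation literally climbs the Harmony landscape.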
7. Processing II: Optimization
- Cognitive psychology → ICS
- Key sources
  - Hinton & Anderson 1981
  - Rumelhart, McClelland, & the PDP Group 1986
8. Representation
- Symbolic theory → ICS
  - Complex symbol structures
- Generative linguistics → ICS
  - Particular linguistic representations
- PDP connectionism → ICS
  - Distributed activation patterns
- ICS
  - Realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (tensor product representations, etc.)
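The realization idea can be sketched concretely with tensor product representations; the 2-dimensional filler and role vectors below are illustrative choices, not the book's.

```python
import numpy as np

# Fillers (symbols) and roles (string positions) as vectors; the structure
# "A B" is realized as the sum over constituents of filler (outer) role.
fillers = {"A": np.array([1., 0.]), "B": np.array([0., 1.])}
roles = [np.array([1., 1.]), np.array([1., -1.])]   # linearly independent

def realize(string):
    return sum(np.outer(fillers[s], roles[k]) for k, s in enumerate(string))

def unbind(T, k):
    # with linearly independent roles, the dual basis recovers filler k
    R = np.column_stack(roles)
    dual = np.linalg.inv(R.T)        # column k is the dual of role k
    return T @ dual[:, k]

T = realize("AB")
assert np.allclose(unbind(T, 0), fillers["A"])
assert np.allclose(unbind(T, 1), fillers["B"])
```

Because the roles are linearly independent, unbinding is exact: the symbolic constituents are fully present in the distributed activation pattern.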
9. Representation
10. Constraints
- Linguistics (markedness theory) → ICS
- ICS → Generative linguistics: Optimality Theory
- Key sources
  - Prince & Smolensky 1993 ms.: Rutgers report
  - McCarthy & Prince 1993 ms.
  - Texts: Archangeli & Langendoen 1997; Kager 1999; McCarthy 2001
  - Electronic archive: rutgers/ruccs/roa.html
Met in SPP Debate, 1988!
11. Constraints
NOCODA: A syllable has no coda
H(kæt) includes the penalty −s_NOCODA < 0 for the final coda t
12. Constraint Interaction I
- ICS → Grammatical theory
- Harmonic Grammar
  - Legendre, Miyata, & Smolensky 1990 et seq.
13. Constraint Interaction I
The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths.
Any formal language can be so generated.
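A toy Harmonic Grammar tableau makes "best-satisfies, given differential strengths" concrete; the constraint strengths and violation counts below are invented for illustration.

```python
# Harmony of a candidate = -(sum of strength * violations); the grammar's
# output is the candidate maximizing H. Strengths are illustrative.
weights = {"NOCODA": 2.0, "FAITH": 1.5}

candidates = {".kat.": {"NOCODA": 1, "FAITH": 0},   # faithful, keeps coda
              ".ka.":  {"NOCODA": 0, "FAITH": 1}}   # deletes the coda

def H(violations):
    return -sum(weights[c] * v for c, v in violations.items())

winner = max(candidates, key=lambda c: H(candidates[c]))
assert winner == ".ka."    # NOCODA outweighs FAITH at these strengths
```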
14. Harmonic Grammar Parser
- Simple, comprehensible network
- Simple grammar G
  - X → A B; Y → B A
- Language
- Parsing
15. Harmonic Grammar Parser
16. Harmonic Grammar Parser
17. Harmonic Grammar Parser
H(Y, B) > 0; H(Y, A) > 0
- Weight matrix for Y → B A
18. Harmonic Grammar Parser
- Weight matrix for X → A B
19. Harmonic Grammar Parser
- Weight matrix for entire grammar G
20. Bottom-up Parsing
21. Top-down Parsing
22. Explaining Productivity
- Full-scale parsing of formal languages by neural-network Harmony maximization: productive competence
- How to explain?
23. 1. Structured representations
24. 2. Structured connections
25. Proof of Productivity
- Productive behavior follows mathematically from combining
  - the combinatorial structure of the vectorial representations encoding inputs & outputs, and
  - the combinatorial structure of the weight matrices encoding knowledge
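The combination step rests on Harmony being bilinear: the Harmony of a composite representation decomposes exactly into constituent terms plus an interaction term. The random symmetric network below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
W = M + M.T                      # symmetric weight matrix
u, v = rng.standard_normal(n), rng.standard_normal(n)

def H(a):
    return 0.5 * a @ W @ a       # network Harmony

# Harmony of a composite vector = sum over parts + interaction term,
# so behavior on novel combinations follows from behavior on the parts.
assert np.isclose(H(u + v), H(u) + H(v) + u @ W @ v)
```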
26. Mystery Quote 2
- "Paul Smolensky has recently announced that the problem of explaining the compositionality of concepts within a connectionist framework is solved in principle. This sounds suspiciously like the offer of a free lunch, and it turns out, upon examination, that there is nothing to it."
27. Explaining Productivity I
Intra-level decomposition: A B → {A, B}
Inter-level decomposition: A B → [1, 0, −1, 1]
28. Explaining Productivity II
Intra-level decomposition: G → {X→A B, Y→B A}
Inter-level decomposition: A B → [1, 0, −1, 1]
29. Mystery Quote 3
- "… even after all those pages, Smolensky hasn't so much as made a start on constructing an alternative to the Classical account of the compositionality phenomena."
30. Constraint Interaction II: OT
- ICS → Grammatical theory
- Optimality Theory
  - Prince & Smolensky 1993
31. Constraint Interaction II: OT
- Differential strength encoded in strict domination hierarchies
- Every constraint has complete priority over all lower-ranked constraints (combined)
- Take-the-best heuristic (Hertwig, today)
  - constraint ≈ cue
  - ranking ≈ cue validity
  - decision-theoretic justification for OT?
- Approximate numerical encoding employs special (exponentially growing) weights
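The final bullet can be made concrete: with base-2 weights, the numerical (Harmonic Grammar style) comparison reproduces the strict-domination (lexicographic) comparison, provided violation counts stay below the base. The encoding below is an illustrative sketch.

```python
ranking = ["C1", "C2", "C3"]                     # highest-ranked first
# exponentially growing weights: 4, 2, 1 for C1 >> C2 >> C3
weights = {c: 2 ** (len(ranking) - 1 - i) for i, c in enumerate(ranking)}

def cost(viol):          # weighted, numerical penalty
    return sum(weights[c] * viol[c] for c in ranking)

def lex(viol):           # strict-domination (lexicographic) comparison key
    return tuple(viol[c] for c in ranking)

a = {"C1": 0, "C2": 1, "C3": 1}   # violates only the lower constraints
b = {"C1": 1, "C2": 0, "C3": 0}   # one violation of the top constraint
assert cost(a) < cost(b)          # 3 < 4: numerical encoding agrees
assert lex(a) < lex(b)            # (0,1,1) < (1,0,0): strict domination
```

With enough violations of lower constraints (counts at or above the base), the two comparisons come apart, which is why the weights must grow exponentially with hierarchy depth.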
32. Constraint Interaction II: OT
- Stress is on the initial heavy syllable iff the number of light syllables n obeys
No way, man
33. Constraint Interaction II: OT
- Constraints are universal
- Human grammars differ only in how these constraints are ranked
  - factorial typology
- The first true contender for a formal theory of cross-linguistic typology
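Factorial typology in miniature: re-ranking a two-constraint set over toy candidates generates both predicted language types. Candidates and violation counts are illustrative.

```python
from itertools import permutations

candidates = {".ka.":  {"NOCODA": 0, "FAITH": 1},   # coda deleted
              ".kat.": {"NOCODA": 1, "FAITH": 0}}   # faithful

def winner(ranking):
    # strict domination: compare violation vectors lexicographically
    return min(candidates, key=lambda c: tuple(candidates[c][k] for k in ranking))

# every ranking of the universal constraints is a possible grammar
typology = {winner(r) for r in permutations(["NOCODA", "FAITH"])}
assert typology == {".ka.", ".kat."}    # both language types are generated
```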
34. The Faithfulness / Markedness Dialectic
- cat: /kæt/ → [kæt]; violates NOCODA. Why?
- FAITHFULNESS requires identity
- MARKEDNESS often opposes it
- The Markedness/Faithfulness dialectic → diversity
  - English: NOCODA ≪ FAITH
  - Polynesian: FAITH ≪ NOCODA (French)
- Another markedness constraint, M:
  - Nasal Place Agreement (Assimilation; NPA)
  - mb ≻ nb, ŋb (labial); nd ≻ md, ŋd (coronal); ŋg ≻ mg, ng (velar)
35. Nativism I: Learnability
- Learning algorithm
  - Provably correct and efficient (under strong assumptions)
- Sources
  - Tesar 1995 et seq.
  - Tesar & Smolensky 1993, …, 2000
- If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E
36. Constraint Demotion Learning
- If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E
Correctly handles a difficult case: multiple violations in E
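A simplified sketch of the demotion step, representing a ranking as a list of strata (sets of constraints), highest-ranked first; the full Tesar & Smolensky algorithm handles more than this.

```python
def demote(strata, viol_heard, viol_expected):
    """One Constraint Demotion step: demote each constraint violated (more)
    by the heard form A below the highest-ranked constraint violated (more)
    by the expected form E."""
    loser = {c for c, v in viol_expected.items() if v > viol_heard.get(c, 0)}
    winner = {c for c, v in viol_heard.items() if v > viol_expected.get(c, 0)}
    # highest stratum containing a constraint that penalizes the loser E
    top = next(i for i, s in enumerate(strata) if s & loser)
    new = [set(s) for s in strata] + [set()]
    for c in winner:
        i = next(j for j, s in enumerate(new) if c in s)
        if i <= top:                  # ranked too high: demote it minimally
            new[i].discard(c)
            new[top + 1].add(c)
    return [s for s in new if s]

# hear A (violates M) when E (violates F) was expected: M drops below F
assert demote([{"M"}, {"F"}], {"M": 1}, {"F": 1}) == [{"F"}, {"M"}]
```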
37. Nativism I: Learnability
- M ≫ F is learnable with /in+possible/ → impossible
  - im-, not in-, when followed by p
  - the exception that proves the rule: M = NPA
- M ≫ F is not learnable from data if there are no exceptions (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: ✓M and ✓F, so there is no M vs. F conflict and no evidence for their ranking
- Thus M ≫ F must hold in the initial state, H₀
38. Nativism II: Experimental Test
- Linking hypothesis:
  - More harmonic phonological stimuli → longer listening time
- More harmonic:
  - ✓M ≻ *M, when equal on F
  - ✓F ≻ *F, when equal on M
  - When one must be chosen, it is more harmonic to satisfy M: M ≫ F
- M = Nasal Place Assimilation (NPA)
- Collaborators
  - Peter Jusczyk
  - Theresa Allocco
  - (Elliott Moreton, Karen Arnold)
39. 4.5 Months (NPA)
40. 4.5 Months (NPA)
41. 4.5 Months (NPA)
42. 4.5 Months (NPA)
43. Nativism III: UGenome
- Can we combine
  - the connectionist realization of harmonic grammar
  - OT's characterization of UG
  to examine the biological plausibility of UG as innate knowledge?
- Collaborators
  - Melanie Soderstrom
  - Donald Mathis
44. Nativism III: UGenome
- The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device
- Introduce an abstract genome notion parallel to (and encoding) the abstract neural network
- Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
45. The Problem
- No concrete examples of such a LAD exist
- Even highly simplified cases pose a hard problem
- How can genes, which regulate the production of proteins, encode symbolic principles of grammar?
- Test preparation: Syllable Theory
46. Basic Syllabification Function
- /underlying form/ → [surface form]
- Plural form of dish:
  - /dɪʃ+z/ → .dɪ.ʃəz.
  - /CVCC/ → .CV.CVC. (V epenthetic)
47. Basic Syllabification Function
- /underlying form/ → [surface form]
- Plural form of dish:
  - /dɪʃ+z/ → .dɪ.ʃəz.
  - /CVCC/ → .CV.CVC. (V epenthetic)
- Basic CV Syllable Structure Theory
  - Prince & Smolensky 1993: Chapter 6
  - Basic: no more than one segment per syllable position: .(C)V(C).
48. Basic Syllabification Function
- /underlying form/ → [surface form]
- Plural form of dish:
  - /dɪʃ+z/ → .dɪ.ʃəz.
  - /CVCC/ → .CV.CVC. (V epenthetic)
- Basic CV Syllable Structure Theory
- Correspondence Theory
  - McCarthy & Prince 1995 (M&P)
  - /C₁V₂C₃C₄/ → .C₁V₂.C₃VC₄. (V epenthetic)
49. Syllabification Constraints (Con)
- PARSE: Every element in the input corresponds to an element in the output (no deletion; M&P: MAX)
50. Syllabification Constraints (Con)
- PARSE: Every element in the input corresponds to an element in the output
- FILL^V/C: Every output V/C segment corresponds to an input V/C segment; every syllable position in the output is filled by an input segment (no insertion/epenthesis; M&P: DEP)
51. Syllabification Constraints (Con)
- PARSE: Every element in the input corresponds to an element in the output
- FILL^V/C: Every output V/C segment corresponds to an input V/C segment
- ONSET: No V without a preceding C
52. Syllabification Constraints (Con)
- PARSE: Every element in the input corresponds to an element in the output
- FILL^V/C: Every output V/C segment corresponds to an input V/C segment
- ONSET: No V without a preceding C
- NOCODA: No C without a following V
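The four constraints can be applied to the earlier /CVCC/ example. The violation counts and the ranking ONSET ≫ PARSE ≫ FILL ≫ NOCODA below are hand-assigned for this sketch; only the winner, .CV.CVC., comes from the slides.

```python
ranking = ["ONSET", "PARSE", "FILL", "NOCODA"]   # assumed for illustration

candidates = {   # hand-counted violations for input /CVCC/
    ".CV.CVC.": {"ONSET": 0, "PARSE": 0, "FILL": 1, "NOCODA": 1},  # epenthesis
    ".CV.":     {"ONSET": 0, "PARSE": 2, "FILL": 0, "NOCODA": 0},  # deletion
    ".CV.CV.":  {"ONSET": 0, "PARSE": 1, "FILL": 1, "NOCODA": 0},  # mixed
}

def eval_ot(ranking, candidates):
    # strict domination: compare violation vectors lexicographically
    return min(candidates, key=lambda c: tuple(candidates[c][k] for k in ranking))

assert eval_ot(ranking, candidates) == ".CV.CVC."   # the /CVCC/ output above
```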
53. SAnet architecture
/C₁ C₂/
C₁ V C₂
54. Connection substructure
55. PARSE
- All connection coefficients are 2
56. ONSET
- All connection coefficients are −1
57. Activation dynamics
- Boltzmann Machine / Harmony Theory dynamics (temperature T → 0)
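Boltzmann-machine settling with temperature driven toward 0 can be sketched as simulated annealing over 0/1 units; the weights, biases, and cooling schedule below are illustrative, not the model's.

```python
import numpy as np

def anneal(W, b, a, T0=2.0, decay=0.9, sweeps=50, seed=0):
    # stochastic updates: unit i turns on with probability sigmoid(net/T);
    # as T -> 0 the dynamics become deterministic Harmony maximization
    rng = np.random.default_rng(seed)
    a, T = a.copy(), T0
    for _ in range(sweeps):
        for i in rng.permutation(len(a)):
            net = W[i] @ a + b[i]
            p = 1.0 / (1.0 + np.exp(-net / T))
            a[i] = 1.0 if rng.random() < p else 0.0
        T *= decay                   # cool toward T -> 0
    return a

W = np.array([[0., 1.], [1., 0.]])   # mutually excitatory pair
b = np.array([0.5, 0.5])             # positive biases
out = anneal(W, b, np.zeros(2))
# at low temperature the net settles into the Harmony maximum: both on
assert list(out) == [1.0, 1.0]
```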
58. Boltzmann-type learning dynamics
Gradient descent in error
- Two clamped phases: P⁺ (input & output clamped), P⁻ (input only)
- Δsᵢ = ε(⟨Hᵢ⟩_P⁺ − ⟨Hᵢ⟩_P⁻)
- During the processing of training data in phase P±, whenever a unit φ (of type F) and a unit ψ (of type Ψ) are simultaneously active, modify sᵢ by ±ε (ε/N_P per pattern)
59. Crucial Open Question (Truth in Advertising)
- Relation between strict domination and neural networks?
- Apparently not a problem in the case of the CV Theory
60. To be encoded
- How many different kinds of units are there?
- What information is necessary (from the source unit's point of view) to identify the location of a target unit, and the strength of the connection with it?
- How are constraints initially specified?
- How are they maintained through the learning process?
61. Unit types
- Input units: C, V
- Output units: C, V, x
- Correspondence units: C, V
- 7 distinct unit types
- Each represented in a distinct sub-region of the abstract genome
- We help ourselves to implicit machinery to spell out these sub-regions as distinct cell types, located in a grid as illustrated
62. Connectivity geometry
63. Constraint PARSE
- Input units grow south and connect
- Output units grow east and connect
- Correspondence units grow north & west and connect with input & output units
64. Constraint ONSET
- Short connections grow north-south between adjacent V output units,
- and between the first V node and the first x node
65. Direction of projection growth
- Topographic organizations are widely attested throughout neural structures
- Activity-dependent growth: a possible alternative
- Orientation information (axes)
  - Chemical gradients during development
- Cell age: a possible alternative
66. Projection parameters
- Direction
- Extent
  - Local
  - Non-local
- Target unit type
- Strength of connections (encoded separately)
67. Connectivity Genome
- Contributions from ONSET and PARSE
68. ONSET
- x₀ segment: S, S, V_O
- N, S, x₀
69. Encoding connection strength
- Network-level specification
- For each constraint Cᵢ, need to embody
  - the constraint strength sᵢ
  - the connection coefficients (per F → Ψ cell-type pair)
- The product of these is the contribution of Cᵢ to the F → Ψ connection weight
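The factorization can be sketched directly: the total weight between two cell types is the strength-weighted sum of each constraint's fixed coefficient matrix. All numbers below are illustrative.

```python
import numpy as np

# per-constraint coefficient matrices for one F -> Psi cell-type pair
coeff = {"PARSE": np.array([[2., 0.], [0., 2.]]),
         "ONSET": np.array([[0., -1.], [-1., 0.]])}
strength = {"PARSE": 1.0, "ONSET": 0.5}   # constraint strengths s_i

# total weight matrix: sum over constraints of s_i * coefficients_i
W = sum(strength[c] * coeff[c] for c in coeff)
assert np.allclose(W, [[2.0, -0.5], [-0.5, 2.0]])
```

Keeping strengths and coefficients factored apart is what lets learning adjust only the sᵢ while the genome fixes the coefficients.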
70. Processing
71. Development
72. Learning
73. Learning Behavior
- The simplified system can be solved analytically
- The learning algorithm turns out to give
  - Δsᵢ(∞) ∝ ε ⟨violations of constraintᵢ⟩_P
74. Abstract Gene Map
General Developmental Machinery | Connectivity | Constraint Coefficients
- Connectivity genes (per unit type C-I, V-I, C-C, …): direction, extent, target; CORRESPOND
- Constraint-coefficient genes: C_O: V–x, B, 1; C_C: V–C, B, −2; C_C: C_I–C_O, 1; V_C: V_I–V_O, 1; G…, G…
75. Summary
- Described an attempt to integrate
  - connectionist theory of mental processes (computational neuroscience, cognitive psychology)
  - symbolic theory of
    - mental functions (philosophy, linguistics)
    - representations
      - general structure (philosophy, AI)
      - specific structure (linguistics)
- Informs the theory of UG
  - Form, content
  - Genetic encoding
76. Mystery Quote 4
- "Smolensky, it would appear, would like a special dispensation for connectionist cognitive science to get the goodness out of Classical constituents without actually admitting that there are any."
77. Mystery Quote 5
- "The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. The goal should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non-symbolic computation; … conceptual-level research with new computational concepts and techniques that reflect an understanding of how conceptual-level theoretical constructs emerge from subconceptual computation"
78. Mystery Quote 5
- "The view that the goal of connectionist research should be to replace other methodologies may represent a naive form of eliminative reductionism. The goal should not be to replace symbolic cognitive science, but rather to explain the strengths and weaknesses of existing symbolic theory; to explain how symbolic computation can emerge out of non-symbolic computation; to enrich conceptual-level research with new computational concepts and techniques that reflect an understanding of how conceptual-level theoretical constructs emerge from subconceptual computation"
79. Thanks for your attention