Title: Week 6. Optimality Theory and acquisition.
GRS LX 865 Topics in Linguistics
Optimality Theory
- Grammar involves constraints on representations (e.g., SS, LF, PF, or perhaps a combined representation).
- The constraints exist in all languages.
- Where languages differ is in how important each constraint is relative to each of the others.
Optimality Theory
- In our analysis, one constraint is Parse-T, which says that tense must be realized in a clause. A structure without tense (where TP has been omitted, say) violates this constraint.
- Another constraint is F ("Don't have a functional category"). A structure with TP violates this constraint.
Optimality Theory
- Parse-T and F are in conflict: it is impossible to satisfy both at the same time.
- When constraints conflict, the language-particular choice of which constraint is more important (more highly ranked) determines which constraint is satisfied and which must be violated.
Optimality Theory
- So if F >> Parse-T, TP will be omitted.
- And if Parse-T >> F, TP will be included.
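The two rankings above can be made concrete with a small evaluation function. This is a minimal sketch, assuming just two candidates (with and without TP) with one violation mark each; the candidate labels and the name `evaluate` are hypothetical. Each candidate's violations are read off in ranking order, and the lexicographically best profile wins, which is how strict domination works in OT.

```python
# Violation marks for two hypothetical candidates.
VIOLATIONS = {
    "with TP": {"F": 1, "Parse-T": 0},     # has a functional projection
    "without TP": {"F": 0, "Parse-T": 1},  # tense is not realized
}


def evaluate(ranking, violations):
    """Return the optimal candidate under a strict ranking (highest first).

    Violation counts are listed in ranking order; tuples compare
    lexicographically, so the highest-ranked constraint that distinguishes
    two candidates decides between them.
    """
    def profile(candidate):
        return tuple(violations[candidate][c] for c in ranking)

    return min(violations, key=profile)


print(evaluate(["F", "Parse-T"], VIOLATIONS))  # F >> Parse-T: without TP
print(evaluate(["Parse-T", "F"], VIOLATIONS))  # Parse-T >> F: with TP
```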
Optimality Theory: the big picture
- Universal Grammar is the set of constraints that languages must obey.
- Languages differ only in how those constraints are ranked relative to one another. (So, parameters are rankings.)
- The kid's job is to re-rank constraints until they match the ranking that generated the input that s/he hears.
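One way to picture re-ranking is error-driven demotion: when the kid's output differs from what s/he hears, constraints that favor the wrong output get moved down. The sketch below is a simplified version of Tesar and Smolensky's Constraint Demotion, adapted to a single total ranking. It is a swapped-in illustration, not the learning procedure assumed in this lecture, and `demote` is a hypothetical name.

```python
def demote(ranking, winner_viol, loser_viol):
    """Re-rank so the observed (adult) form beats the learner's output.

    ranking: constraint names, highest first.
    winner_viol / loser_viol: violation counts for the form heard in the
    input and for the learner's current (wrong) output.
    """
    prefers_winner = [c for c in ranking if winner_viol[c] < loser_viol[c]]
    if not prefers_winner:
        raise ValueError("no constraint prefers the observed form")
    pivot = prefers_winner[0]  # highest-ranked winner-preferring constraint
    above = ranking[:ranking.index(pivot)]
    # Loser-preferring constraints that outrank the pivot are demoted
    # to just below it; everything else keeps its relative order.
    demoted = [c for c in above if loser_viol[c] < winner_viol[c]]
    kept = [c for c in ranking if c not in demoted]
    cut = kept.index(pivot) + 1
    return kept[:cut] + demoted + kept[cut:]


learner = ["F", "Parse-T"]         # currently omits TP
heard = {"F": 1, "Parse-T": 0}     # adult clause with TP
produced = {"F": 0, "Parse-T": 1}  # learner's clause without TP
print(demote(learner, heard, produced))  # ['Parse-T', 'F']
```

Once the ranking matches, the same error no longer triggers any change.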
The idea
- Kids are subject to conflicting constraints:
- Parse-T: Include a projection for tense.
- Parse-Agr: Include a projection for agreement.
- F: Don't complicate your tree with functional projections.
- F2: Don't complicate your tree so much as to have two functional projections.
The idea
- Sometimes Parse-T beats out F, and then there's a TP. Or Parse-Agr beats out F, and then there's an AgrP. Or both Parse-T and Parse-Agr beat out F2, and so there's both a TP and an AgrP.
- But what does "sometimes" mean?
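The four-way competition can be sketched by adding AgrP to the candidate set. This assumes (hypothetically) that F assigns one mark per functional projection and F2 one mark to any clause with two of them; under each fixed ranking below, the winner comes out as described in the bullets above.

```python
# Hypothetical candidates: the set of functional projections in the clause.
CANDIDATES = {
    "no FP (NRF)": set(),
    "TP only": {"TP"},
    "AgrP only": {"AgrP"},
    "TP + AgrP": {"TP", "AgrP"},
}


def violations(projections):
    """Violation marks for one candidate (assumed counting scheme)."""
    return {
        "Parse-T": 0 if "TP" in projections else 1,
        "Parse-Agr": 0 if "AgrP" in projections else 1,
        "F": len(projections),  # one mark per functional projection
        "F2": 1 if len(projections) >= 2 else 0,
    }


def optimal(ranking):
    """Winning candidate under a strict ranking, highest constraint first."""
    def profile(name):
        marks = violations(CANDIDATES[name])
        return tuple(marks[c] for c in ranking)

    return min(CANDIDATES, key=profile)


print(optimal(["F2", "Parse-T", "F", "Parse-Agr"]))  # TP only
print(optimal(["F2", "Parse-Agr", "F", "Parse-T"]))  # AgrP only
print(optimal(["Parse-T", "Parse-Agr", "F2", "F"]))  # TP + AgrP
print(optimal(["F2", "F", "Parse-T", "Parse-Agr"]))  # no FP (NRF)
```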
Floating constraints
- The innovation in Legendre et al. (2000) that gets us off the ground is the idea that, as kids re-rank constraints, the position of a constraint in the hierarchy can get somewhat fuzzy, such that the positions of two constraints can overlap.
- [Diagram: ranges for F and Parse-T]
Floating constraints
- When the kid evaluates a form in the constraint system, the position of Parse-T is fixed somewhere in its range, and so Parse-T winds up sometimes outranking F and sometimes outranked by it.
Floating constraints
- (Under certain assumptions) this predicts that we would see TP in the structure 50% of the time, and structures without TP the other 50% of the time.
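To see where the 50% comes from, here is a sketch that enumerates every total ranking compatible with the fixed parts of the hierarchy and counts how many put TP in the winner. The key assumption, also noted in the code, is that each compatible total ranking is equally likely; the function `tp_rate` and its arguments are hypothetical.

```python
from itertools import permutations


def tp_rate(constraints, fixed_pairs, has_tp):
    """Fraction of compatible total rankings whose winner contains TP.

    constraints: the constraint names to rank.
    fixed_pairs: (higher, lower) pairs every ranking must respect;
        constraints with overlapping ranges are simply left unordered.
    has_tp: function from a ranking (tuple, highest first) to True/False.
    Assumption: each compatible total ranking is equally likely.
    """
    compatible = [
        r for r in permutations(constraints)
        if all(r.index(hi) < r.index(lo) for hi, lo in fixed_pairs)
    ]
    return sum(1 for r in compatible if has_tp(r)) / len(compatible)


def parse_t_wins(ranking):
    # TP is included exactly when Parse-T >> F.
    return ranking.index("Parse-T") < ranking.index("F")


# F and Parse-T overlap completely, so neither order is fixed.
print(tp_rate(["F", "Parse-T"], [], parse_t_wins))                  # 0.5
# If F were fixed above Parse-T, TP would never appear.
print(tp_rate(["F", "Parse-T"], [("F", "Parse-T")], parse_t_wins))  # 0.0
```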
French kid data
- Looked at 3 French kids from CHILDES.
- Broke development into stages using a modified MLU-type measure, based on how long most of their utterances were (2 words, more than 2 words) and how many of the utterances contained verbs.
- Looked at tense and agreement in each of the three stages represented in the data.
French kid data
- Kids start out using 3sg agreement and present tense for practically everything (correct or not).
- We took this to be a default:
- No agreement? Pronounce it as 3sg. No tense? Pronounce it as present. Neither? Pronounce it as an infinitive.
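The default rule can be written as a small function. This is a sketch with hypothetical feature labels; it returns the features that get pronounced, not actual French verb forms.

```python
def pronounce(tense=None, agreement=None):
    """Features a verb is pronounced with, filling in defaults.

    tense: e.g. "past" or "present"; None means there is no TP.
    agreement: e.g. "1sg" or "3sg"; None means there is no AgrP.
    """
    if tense is None and agreement is None:
        return "infinitive"      # neither: infinitive
    return (agreement or "3sg",  # no agreement: 3sg
            tense or "present")  # no tense: present


print(pronounce())                 # infinitive
print(pronounce(tense="past"))     # ('3sg', 'past')
print(pronounce(agreement="1sg"))  # ('1sg', 'present')
```

Several different inputs, e.g. `pronounce(tense="present")` and `pronounce(tense="present", agreement="3sg")`, come out identical.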
French kid data
- This means that if a kid uses 3sg or present tense, we can't tell whether they are really using 3sg (they might be) or are not using agreement at all and are just pronouncing the default.
- So we looked only at non-present tense forms and non-3sg forms, to avoid the question of the defaults.
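Here is a sketch of that counting, using made-up tokens coded as (agreement, tense). Only non-3sg and non-present tokens count as evidence, since 3sg and present may be defaults; the function name is hypothetical.

```python
# Made-up verb tokens for illustration: (agreement, tense).
tokens = [
    ("3sg", "present"),
    ("1sg", "present"),
    ("3sg", "past"),
    ("1sg", "past"),
    ("3sg", "present"),
]


def informative_rates(tokens):
    """Share of tokens that are unambiguously non-3sg / non-present."""
    n = len(tokens)
    non_3sg = sum(1 for agr, _ in tokens if agr != "3sg")
    non_present = sum(1 for _, tense in tokens if tense != "present")
    return non_3sg / n, non_present / n


print(informative_rates(tokens))  # (0.4, 0.4)
```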
French kid data
- We found that tense and agreement develop differently: in the first stage we looked at, kids were using tense fine, but in the next stage their tense got worse as their agreement improved.
- The middle stage looks like competition between T and Agr for a single node.
A detail about counting
- We counted non-3sg and non-present verbs.
- To see how close kids' utterances were to adults' utterances, we need to know how often adults use non-3sg and non-present forms, and then see how close the kids come to matching that level.
- So, adults use non-present tense around 31% of the time; when a kid uses 31% non-present tense, we take that to be 100% success.
- In the last stage we looked at, kids were basically right at the 100% success level for both tense and agreement.
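The success measure is a ratio against the adult rate. A minimal sketch, assuming success is simply the kid's rate divided by the adult rate (31% for non-present tense, from the slide) and is not capped at 100%; the name `success` is hypothetical.

```python
ADULT_NON_PRESENT = 0.31  # adults' non-present rate, from the slide


def success(kid_rate, adult_rate=ADULT_NON_PRESENT):
    """Kid's rate of a form as a percentage of the adult rate (not capped)."""
    return 100 * kid_rate / adult_rate


print(round(success(0.31)))   # 100
print(round(success(0.155)))  # 50
```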
Proportion of non-present and non-3sg verbs
Proportion of non-finite root forms
A model to predict the percentages
- Stage 3b (first stage)
- No agreement
- About 1/3 NRFs, 2/3 tensed forms
- [Ranking diagram: F2, F, Parse-T, Parse-A]
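One floating configuration that yields exactly this 1/3 vs. 2/3 split can be checked by brute force. In this sketch, which is an assumption for illustration and not necessarily the ranking in the slide's diagram, F2 >> F >> Parse-A are fixed and Parse-T floats anywhere above Parse-A. Each compatible total ranking is treated as equally likely, and F is assumed to assign one mark per functional projection.

```python
from itertools import permutations

CONSTRAINTS = ["F2", "F", "Parse-T", "Parse-A"]
CANDIDATES = {"NRF": set(), "TP": {"TP"}, "AgrP": {"AgrP"},
              "TP+AgrP": {"TP", "AgrP"}}


def violations(projections):
    # Assumed counting: F marks each functional projection, F2 marks two.
    return {
        "Parse-T": 0 if "TP" in projections else 1,
        "Parse-A": 0 if "AgrP" in projections else 1,
        "F": len(projections),
        "F2": 1 if len(projections) >= 2 else 0,
    }


def winner(ranking):
    return min(CANDIDATES, key=lambda name: tuple(
        violations(CANDIDATES[name])[c] for c in ranking))


def predicted(fixed_pairs):
    """Share of each winner over all total rankings that respect fixed_pairs.

    fixed_pairs: (higher, lower) pairs; everything else is free to float.
    Assumption: every compatible total ranking is equally likely.
    """
    rankings = [r for r in permutations(CONSTRAINTS)
                if all(r.index(hi) < r.index(lo) for hi, lo in fixed_pairs)]
    shares = {}
    for r in rankings:
        w = winner(r)
        shares[w] = shares.get(w, 0) + 1 / len(rankings)
    return shares


# Hypothetical stage 3b: F2 >> F >> Parse-A fixed; Parse-T floats above Parse-A.
stage_3b = predicted([("F2", "F"), ("F", "Parse-A"), ("Parse-T", "Parse-A")])
print({name: round(share, 2) for name, share in stage_3b.items()})
# {'NRF': 0.33, 'TP': 0.67}
```

Of the three compatible rankings, Parse-T outranks F in two (TP wins) and is outranked by F in one (no functional projection, so an NRF).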
A model to predict the percentages
- Stage 4b (second stage)
- Non-3sg agreement and non-present tense each about 15% (about 40% agreeing, 50% tensed)
- About 20% NRFs
- [Ranking diagram: F2, F, Parse-T, Parse-A]
A model to predict the percentages
- Stage 4c (third stage)
- Everything appears to have tense and agreement (adult-like levels)
- [Ranking diagram: F2, F, Parse-T, Parse-A]
Predicted vs. observed: tense
Predicted vs. observed: agreement
Predicted vs. observed: NRFs
Various things (homework)
- Is the OT model we just proposed a structure-building model or a full competence model?
- How does the OT model fit into the overall big picture alongside the ATOM model?
Various things (homework)
- For French, we assumed that NRFs appear when both TP and AgrP are missing. Yet Schütze & Wexler (1996) claimed that root infinitives appear when either TP or AgrP is missing.
- Which one is it?
French v. English
- English T and Agr are pronounced as:
- /s/ if we have the features 3, sg, present
- /ed/ if we have the feature past
- /Ø/ otherwise
- French T and Agr are pronounced as, e.g.:
- danser: NRF
- a dansé: (3sg) past
- je danse: 1sg (present)
- j'ai dansé: 1sg past
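The English rule is small enough to write out directly. A sketch with hypothetical feature names: it shows that /Ø/ covers both a clause with no T/Agr and ordinary finite forms such as 1sg present, whereas the French forms listed above keep the NRF (danser) distinct.

```python
def english_suffix(person=None, number=None, tense=None):
    """English T/Agr spell-out as on the slide (feature names hypothetical)."""
    if tense == "past":
        return "ed"
    if person == 3 and number == "sg" and tense == "present":
        return "s"
    return ""  # /Ø/ otherwise


print(repr(english_suffix(person=3, number="sg", tense="present")))  # 's'
print(repr(english_suffix(person=1, number="sg", tense="past")))     # 'ed'
print(repr(english_suffix(person=1, number="sg", tense="present")))  # ''
print(repr(english_suffix()))                                        # '' (no T/Agr)
```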
What we're doing
- The driver who my neighbor who I trust suggested took me to the airport.
- The driver who my neighbor who my boss trusts suggested took me to the airport.
- Overarching hypothesis: Sentence difficulty has to do with holding onto several unsatisfied dependencies. Longer ones are harder to hold.
- Question: What measures length?
- Hypothesis: New referents.
How do we see if that's right?
- Center-embedded sentences are the most taxing: several dependencies have been started, and the center-most element is triple-counted.
- The driver who my neighbor who I trust
- That's the most sensitive point; it seems to be near the critical point of processability.
Experimenting
- Does it matter whether we have a known referent (I, you) or a new referent (my neighbor)?
- To know for sure, we hold everything constant except the most embedded subject and see whether there are differences (which can then be attributed to the only thing that's different: the properties of the most embedded subject).
Building the items
- The driver who my neighbor who I trust suggested took me to the airport.
- The driver who my neighbor who John trusts suggested took me to the airport.
- The driver who my neighbor who the housekeeper trusts suggested took me to the airport.
- The driver who my neighbor who they trust suggested took me to the airport.
Planning the experiment
- Each set of four sentences constitutes a token set (a.k.a. item).
- Each item has four conditions (1st/2nd-person pronoun, name, definite description, 3rd-person pronoun).
- Counterbalancing rules:
- Each subject will judge no more than one sentence from each token set.
- Each subject will judge all conditions and will see equal numbers of sentences from each condition.
- Every sentence in every token set will be judged by some subject.
Trial lists
- We have four conditions, so we need:
- Four different scripts (versions of the lists)
- Some number of quadruples of token sets
- E.g., items 1-4, each with conditions a-d:
- Subj W: 1a, 2b, 3c, 4d (script 1)
- Subj X: 1b, 2c, 3d, 4a (script 2)
- Subj Y: 1c, 2d, 3a, 4b (script 3)
- Subj Z: 1d, 2a, 3b, 4c (script 4)
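The pattern above is a Latin square: script n gives item i the condition at position (i + n - 2) mod 4. Here is a minimal sketch that generates the scripts and checks the three counterbalancing rules; `script` and `check` are hypothetical helpers, and the check assumes the number of items is a multiple of four.

```python
CONDITIONS = "abcd"


def script(number, n_items):
    """Script `number` (1-4): item i gets condition (i + number - 2) mod 4."""
    return [f"{item}{CONDITIONS[(item + number - 2) % 4]}"
            for item in range(1, n_items + 1)]


def check(scripts, n_items):
    """Assert the three counterbalancing rules (n_items a multiple of 4)."""
    seen = set()
    for s in scripts:
        items = [code[:-1] for code in s]
        assert len(set(items)) == len(items)  # at most one sentence per item
        conds = [code[-1] for code in s]
        assert len({conds.count(c) for c in CONDITIONS}) == 1  # balanced
        seen.update(s)
    assert len(seen) == n_items * len(CONDITIONS)  # every sentence judged


for n in range(1, 5):
    print(f"script {n}:", ", ".join(script(n, 4)))
check([script(n, 20) for n in range(1, 5)], 20)
```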
Our experiment
- We will have 20 items (picked from the ones you submitted) and 20 fillers.
- (Note: That's on the small side for a real experiment.)
- Next steps:
- Create the lists of test sentences for the four different scripts.
- Spec out and pseudocode our experiment.
- Investigate PsyScript.
- Run the experiment.
- Deal with the data.
Creating the scripts
- Our sentences are made of very predictable components:
- The X who/that the Y who/that Z VP1 VP2 VP3
- The only thing that changes across conditions is Z (along with the agreement on the most embedded verb, e.g., trust vs. trusts), while the rest changes across token sets.
- We can use Excel to build these from their pieces, to avoid unnecessary errors.
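The same assembly can be sketched outside Excel. In this hypothetical data layout, each item stores its shared pieces plus, for each condition, the most embedded subject and the matching form of the embedded verb; calling the most embedded verb `vp3` is an assumption about the labels. The assertion enforces a length of exactly eight regions.

```python
# Hypothetical item: shared pieces plus one (Z, embedded verb) pair per condition.
ITEM = {
    "subj1": "The driver",
    "rel1": "who",
    "subj2": "my neighbor",
    "rel2": "who",
    "conditions": {
        1: ("I", "trust"),
        2: ("John", "trusts"),
        3: ("the housekeeper", "trusts"),
        4: ("they", "trust"),
    },
    "vp2": "suggested",
    "vp1": "took me to the airport.",
}


def regions(item, condition):
    """Assemble the regions of one sentence for one condition."""
    z, vp3 = item["conditions"][condition]
    parts = [item["subj1"], item["rel1"], item["subj2"], item["rel2"],
             z, vp3, item["vp2"], item["vp1"]]
    assert len(parts) == 8, "every sentence must be exactly 8 regions"
    return parts


print(" ".join(regions(ITEM, 3)))
# The driver who my neighbor who the housekeeper trusts suggested took me to the airport.
```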
Worksheets
- Components: Subj1, Rel1, Subj2, Rel2, Subj3a, Subj3b, Subj3c, VP3, VP2, VP1, Answer, Question
- Fillers: Question, Answer, Regions
- The way I've set it up, everything needs to be exactly 8 regions long (even the fillers).
Worksheets
- Constructed
- Computes the item (token group) and condition from the row number, and comes up with a code like I5V2 (fifth token group, version 2). Builds the sentence region by region based on the condition number.
- Tables
- Keeps track of what will be on each script.
- Scripts are divided into blocks, and each block has one of each condition and four fillers, randomized.
- The sort column is 2*block plus a random number (to order the blocks, but randomly within each block).
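The two worksheet calculations can be mimicked in Python to check the logic. This sketch assumes a hypothetical layout: data starts on row 2 and each token group's four conditions occupy consecutive rows. `item_and_condition`, `code`, and `block_order` are illustrative names, not the actual worksheet formulas. Because the random part is always below 1, a sort key of 2*block plus a random number keeps every block together and in order while shuffling the trials inside it.

```python
import random


def item_and_condition(row, first_row=2, n_conditions=4):
    """(item, condition) for a worksheet row, under the assumed layout."""
    index = row - first_row
    return index // n_conditions + 1, index % n_conditions + 1


def code(row):
    item, cond = item_and_condition(row)
    return f"I{item}V{cond}"


def block_order(trials, rng=random):
    """Sort on 2*block + a random number in [0, 1).

    Block b's keys fall in [2b, 2b + 1), so blocks never interleave,
    but the order within each block is random.
    """
    return sorted(trials, key=lambda t: 2 * t["block"] + rng.random())


print(code(2), code(5), code(19))  # I1V1 I1V4 I5V2

rng = random.Random(0)
trials = [{"block": b, "trial": t} for b in (2, 1) for t in range(4)]
print([t["block"] for t in block_order(trials, rng)])  # [1, 1, 1, 1, 2, 2, 2, 2]
```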
Worksheets
- Script
- The master script sheet.
- This generates a script based on the columns you put into I1 and J1. (The columns refer to the Tables sheet, where the item and condition numbers will be found.)
- B and C for script 1
- D and E for script 2
- F and G for script 3
- H and I for script 4
- Script a through script d
- The actual scripts.
- Select the part of the script sheet that has data (A1:O41) and copy it.
- Go to the script a sheet.
- Paste special and choose Value (so we don't copy formulas, only results).
- Delete columns B-D (item, cond, row), select rows 2-41, hit the sort button, then delete column A (sort) and row 1 (labels).
- Save as tab-delimited text.
The scripts are ready
- So, we have the data that we're going to use.
- The next thing is to figure out how we're going to test these.
- The goal is to test reading time on each region of the sentence by presenting the sentence region by region.
Thinking through the experiment
- What do we want to have happen?
- Display some instructions
- Do some practice trials
- Display a "practice is over" message
- Do some real trials
- Display "thanks!"
- The trials
- Show the fully obscured sentence, wait for a key
- Reveal the next region, wait for a key, until done
- Ask the question, wait for a response
- Give sound feedback about correctness
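A sketch of what one trial's displays could look like, assuming a non-cumulative moving window in which only the current region is visible and every other character (except spaces) is masked with a dash. The function `window` is hypothetical and is not taken from the movingwindow script.

```python
def window(regions, shown):
    """Moving-window display in which only region `shown` is visible.

    shown = -1 gives the fully obscured sentence. Hidden characters become
    '-', while spaces are kept so word lengths stay visible.
    """
    out = []
    for i, region in enumerate(regions):
        if i == shown:
            out.append(region)
        else:
            out.append("".join(" " if ch == " " else "-" for ch in region))
    return " ".join(out)


regions = ["The driver", "who", "my neighbor", "who", "I", "trust",
           "suggested", "took me to the airport."]
for k in range(-1, len(regions)):  # one display per key press
    print(window(regions, k))
```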
PsyScript
- To do this, we'll use PsyScript, an environment for creating psychology experiments on the Mac.
- (It's basically the only freely available software of this type that shows promise of working in the future; if PsyScope had not become commercial as E-Prime, we'd be learning that instead.)
AppleScript
- The underlying machinery behind PsyScript is something called AppleScript.
- This has been part of the Mac OS for about the past 10 years, although it has been gaining power and popularity recently.
- AppleScript is a means by which you can tell other programs what to do.
- For example, tell Internet Explorer to go to a particular web page, or tell Word to create a new document and type the date.
- Until you have an actual need for this, it doesn't seem very exciting.
AppleScript
- AppleScript is a sophisticated high-level programming language designed to be human-readable (and kind of human-writable). It's supposed to look a lot like English.
- PsyScript itself is an application that can be bossed around by AppleScript, and it has the features that are useful in psycholinguistic experiments, such as timing, drawing, input, and data-recording functions.
Getting started
- To write (and use) AppleScript, we use Script Editor.
- Easiest way to do this: find the "end experiment" script and double-click on it. It reads:
    tell application "PsyScript"
        end experiment
    end tell
Note about PsyScript
- PsyScript runs faster from the Script Editor.
- If you run PsyScript from the Script Editor, you have to manually tell it where your script is.
- To do this, find the line that says tell fileHelper to setContainer and change the thing in parentheses to what you see when you Command-click on the name of the script in the title bar of the Script Editor window: the folder names from bottom to top, each separated by ":", not including the actual name of the script. E.g.:
- setContainer("Station 5:Desktop Folder:PsyScript")
Movingwindow
- I wrote a script called movingwindow to do what we're going to do today.
- The stimuli and instructions files are in a folder called resources, in the same folder as the script. The names of these files are set at the top of the script; in mine, they are:
- Mwstimuli.txt: sentence list as exported from Excel (tab-delimited text, exporting e.g. script a)
- Mwpractice.txt: sentence list for the practice items
- Mwinstruc.txt: initial instructions
- Mwready.txt: post-practice instructions
- Mwthanks.txt: end-of-experiment debriefing
- Results are stored in the results folder.
Sentence lists
- To generate the sentence lists in the right format for movingwindow, go to one of the script a-d sheets, do Save As from Excel, and choose tab-delimited text.
- Columns should be code, question, answer, sentence (in eight columns).
- The end results will come out in a file that you can load back into Excel (a tab-delimited file).
- Columns are code, region number, time for region, correct answer (1/0), text of region.
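Once the results file is back, a first analysis is mean reading time per region and condition. This sketch assumes the five columns listed above, no header row, and codes of the form I5V2 for test items (anything else, such as fillers, is skipped); keeping only correctly answered trials is an analysis choice, not something the slide specifies. The function name and the sample lines are made up.

```python
import csv
import io
from collections import defaultdict


def mean_times(results_text, correct_only=True):
    """Mean reading time per (condition, region number).

    Assumed columns, as listed on the slide: code, region number, time,
    correct (1/0), region text. Codes like I5V2 are test items; other
    rows (fillers, blank lines) are skipped. No header row is assumed.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.reader(io.StringIO(results_text), delimiter="\t"):
        if len(row) < 5 or not row[0].startswith("I") or "V" not in row[0]:
            continue
        if correct_only and row[3] != "1":
            continue
        condition = int(row[0].split("V")[1])
        key = (condition, int(row[1]))
        totals[key] += float(row[2])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


sample = (
    "I1V1\t1\t350\t1\tThe driver\n"
    "I1V1\t2\t410\t1\twho\n"
    "I2V1\t1\t370\t1\tThe cook\n"
    "I3V2\t1\t500\t0\tThe chef\n"
    "F1\t1\t300\t1\tA filler\n"
)
print(mean_times(sample))  # {(1, 1): 360.0, (1, 2): 410.0}
```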