Title: Given an annotated corpus
1Given an annotated corpus
Using annotated corpora to study syntactic variation and change
Ann Taylor, University of York (UK)
2Outline
- Introduction to our series of syntactically annotated corpora of earlier stages of English
- Illustration of the kind of research that can be done with these corpora that couldn't be done without them
3Syntactically annotated corpora of earlier stages
of English
- The York-Toronto-Helsinki Parsed Corpus of Old English Prose (Taylor et al., 2003)
- The Penn-Helsinki Parsed Corpus of Middle English II (Kroch and Taylor, 2000)
- The Penn-Helsinki Parsed Corpus of Early Modern English (Kroch et al., 2005)
- The Parsed Corpus of Early English Correspondence (Taylor et al., 2006)
- The Penn Parsed Corpus of Modern British English (Kroch et al., in progress)
4
Corpus    Period        Word Count
YCOE      c.800-1100    1,452,086
PPCME2    1125-1500     1,155,965
PPCEME    1500-1710     1,657,058
PCEEC     1410-1700     2,162,134
Total                   6,427,243
PPCMBE    1710-1914     3,000,000
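A quick arithmetic check on the word counts, sketched in Python (the variable names are illustrative and not part of any corpus tool): the stated total of 6,427,243 is the sum of the four corpora listed above it and does not include PPCMBE, whose figure appears after the total.

```python
# Word counts copied from the slide; PPCMBE is kept separate because
# its figure is listed after the total.
counted = {
    "YCOE": 1_452_086,
    "PPCME2": 1_155_965,
    "PPCEME": 1_657_058,
    "PCEEC": 2_162_134,
}
ppcmbe = 3_000_000

total = sum(counted.values())
print(f"{total:,}")           # 6,427,243
print(f"{total + ppcmbe:,}")  # 9,427,243
```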
5Audience
- The corpora are intended primarily to support quantitative work in language variation and change
- Goals
  - Easy access to structures, not just lexis or part of speech
  - Large enough to generate valid statistics
  - Sufficient coverage to trace changes over time
6The annotation system
- A modified Penn Treebank scheme
- Cosmetic changes
  - Nodes are given labels more familiar to generative linguists
- Major changes
  - No VP
  - Function is marked on a wider range of sentential and NP nodes, but not on PPs
7
( (IP-MAT (CONJ and)
          (NP-SBJ (PRO I))
          (BEP am)
          (ADJP (ADJ sure)
                (CP-THT (C 0)
                        (IP-SUB (NP-SBJ (PRO I))
                                (MD shall)
                                (VB desyre)
                                (NP-OB1 (PRO it)))))
          (PP (PN because)
              (CP-ADV (C 0)
                      (IP-SUB (NP-SBJ (PRO you))
                              (BEP are)
                              (ADVP-LOC (ADV there)))))
          (. .))
  (ID OSBORNE,5.002.40))
8Old English
( (IP-MAT (CONJ ac)
          (NP-NOM (PRON he))
          (VBD bediglode)
          (ADVP (ADV swa) (ADV teah))
          (NP-ACC (PRO his) (NA dada))
          (NP-DAT (DD tam) (ND casere)
                  (NP-DAT-PRN (NRD Dioclitiane))
                  (CP-REL (WNP-NOM-1 (DN se))
                          (C 0)
                          (IP-SUB (NP-NOM T-1)
                                  (BEDI was)
                                  (NP-NOM-PRD (NP-GEN (NG deofles))
                                              (NN biggencga)))))
          (. .))
  (ID coaelive,ALS_Sebastian8.1215))
9Correspondence Corpus
( (METADATA (AUTHOR BRIAN_DUPPAMALEFRIEND158961)
            (RECIPIENT JUSTINIAN_ISHAMMALEFRIEND161139)
            (LETTER DUPPA_001E31650AUTOGRAPHFRIEND))
  (IP-IMP (IP-MAT-PRN (NP-SBJ (PRO I))
                      (VBP pray))
          (VBI putt)
          (NP-OB1 (PRO it))
          (PP (P upon)
              (NP (PRO your) (N score)))
          (. ,))
  (ID DUPPA,4.001.13))
10Searching the corpora with CorpusSearch
- Searches structures using dominance and precedence relations
- Generates statistics
- Can search its own output
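To make "dominance and precedence" concrete, here is a minimal Python sketch of the idea. It is not CorpusSearch and does not use its query language; the helper names `parse`, `subtrees` and `object_order` are invented for this illustration. It reads the Osborne tree shown earlier (ID OSBORNE,5.002.40), finds every IP node that immediately dominates both a VB and an NP-OB1, and uses the left-to-right order of those sisters as precedence.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested (label, children) tuples; words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()

def subtrees(tree):
    """Yield every labelled node, top-down and left to right."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def object_order(clause):
    """'VO' or 'OV' if the clause immediately dominates VB and NP-OB1, else None."""
    labels = [c[0] for c in clause[1] if isinstance(c, tuple)]
    if "VB" in labels and "NP-OB1" in labels:
        return "VO" if labels.index("VB") < labels.index("NP-OB1") else "OV"
    return None

TREE = """( (IP-MAT (CONJ and) (NP-SBJ (PRO I)) (BEP am) (ADJP (ADJ sure)
(CP-THT (C 0) (IP-SUB (NP-SBJ (PRO I)) (MD shall) (VB desyre) (NP-OB1 (PRO it)))))
(PP (PN because) (CP-ADV (C 0) (IP-SUB (NP-SBJ (PRO you)) (BEP are)
(ADVP-LOC (ADV there))))) (. .)) (ID OSBORNE,5.002.40))"""

root = parse(TREE)
hits = [(n[0], object_order(n)) for n in subtrees(root)
        if n[0].startswith("IP") and object_order(n)]
print(hits)  # [('IP-SUB', 'VO')]
```

Only the first IP-SUB ("I shall desyre it") matches, because it is the only clause whose daughters include both a non-finite verb and a direct object.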
11Variation in verb-object order in Old and Middle
English
12 Verb-object order in Old English
Ac he sceal pa sacfullan gesibbian
But he must the quarrelsome reconcile
'But he must reconcile the quarrelsome ...'
(colwstan1,ALet_2_Wulfstan_1 188.256)
Se wolde gelytlian pone lyfigendan hælend
He would diminish the living lord
'He would diminish the living lord ...'
(colwstan1,ALet_2_Wulfstan_155.98)
13 Verb-object order in Middle English
ear he hefde his ranceun fulleliche ipaizet
before he had his ransom fully paid
'Before he had fully paid his ransom ...'
(CMANCRIW,II.101.1228)
zef pu wult habben bricht sichde wid pine heorte echnen
if you will have bright sight with your heart's eyes
'If you will have bright sight with your heart's eyes ...'
(CMANCRIW,II.73.839)
14
- The question: what factors affect object position in OE and ME?
- The data: 10,000 tokens containing a medial auxiliary, a non-finite verb and an object
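As a rough illustration of how such tokens might be coded for object position, here is a hedged Python sketch. The role labels (SBJ, AUX, V, OBJ, OTHER) are hand-assigned for this example and are not the corpus annotation; the reading of "medial auxiliary" as "the auxiliary precedes both the non-finite verb and the object" is likewise an assumption of the sketch. The four clauses are the Old and Middle English examples from the two preceding slides.

```python
from collections import Counter

# Each clause is a list of (words, role) pairs. Roles are assigned by hand
# for illustration only; the parsed corpora use full syntactic labels.
CLAUSES = {
    "Ac he sceal pa sacfullan gesibbian": [
        ("Ac", "OTHER"), ("he", "SBJ"), ("sceal", "AUX"),
        ("pa sacfullan", "OBJ"), ("gesibbian", "V"),
    ],
    "Se wolde gelytlian pone lyfigendan hælend": [
        ("Se", "SBJ"), ("wolde", "AUX"), ("gelytlian", "V"),
        ("pone lyfigendan hælend", "OBJ"),
    ],
    "ear he hefde his ranceun fulleliche ipaizet": [
        ("ear", "OTHER"), ("he", "SBJ"), ("hefde", "AUX"),
        ("his ranceun", "OBJ"), ("fulleliche", "OTHER"), ("ipaizet", "V"),
    ],
    "zef pu wult habben bricht sichde wid pine heorte echnen": [
        ("zef", "OTHER"), ("pu", "SBJ"), ("wult", "AUX"), ("habben", "V"),
        ("bricht sichde", "OBJ"), ("wid pine heorte echnen", "OTHER"),
    ],
}

def object_position(clause):
    """Return 'OV' or 'VO', or None if the auxiliary does not precede verb and object."""
    roles = [role for _, role in clause]
    aux, verb, obj = (roles.index(r) for r in ("AUX", "V", "OBJ"))
    if not (aux < verb and aux < obj):
        return None  # outside the data set as this sketch defines it
    return "OV" if obj < verb else "VO"

counts = Counter(object_position(c) for c in CLAUSES.values())
print(counts["OV"], counts["VO"])  # 2 2
```

With real data the same coding would be read off the parsed trees and cross-tabulated against date of text, length of object and type of object.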
15Factors affecting object position
- Date of text
- Length of object
- Type of object
20Type of object
- Quantified
- Negative
- Positive (non-negative, non-quantified)
21 Quantified objects (Middle English)
zef ze habbed ani god don
if you have any good done
'... if you have done any good ...'
(CMANCRIW,I.76.310)
fordon pe he scal azein zeuen awiht
for he shall again give something
'... for he shall again give something.'
(CMLAMBX1,31.396)
22 Negative objects (Middle English)
pt he ne mai nan ping don us buten godes leaue
that he neg can no thing do us without God's leave
'... that he can do nothing to us without God's leave.'
(CMANCRIW,II.169.2346)
swa pet ho ne scal of pere wunde habbe nan oder uuel
so that she neg shall from her wound have no other evil
'... so that she shall have no other evil from her wound.'
(CMLAMB1,83.195)
26Syntactic variation in PDE
- Heavy-NP shift (Wasow & Arnold)
- Dative alternation (Wasow, Bresnan)
- Particle shift (Gries)
- Saxon vs. of-genitive (Szmrecsanyi)
- Complementizer omission in complement and relative clauses (Jaeger & Wasow)
- Topicalization, etc. (Cresswell)
27Conclusions
- The study of syntactic variation is an up-and-coming topic in linguistics
- It can't be studied using the usual methods (introspection, intuition) but requires naturally occurring data
- Unannotated text corpora are of only limited use for this
- To study syntactic variation efficiently, you really need annotated data, and the more the better