Title: Automatic Measurement of Syntactic Development in Child Language
1. Automatic Measurement of Syntactic Development in Child Language
- Kenji Sagae
- Language Technologies Institute
- Student Research Symposium
- September 2005
- Joint work with
- Alon Lavie and Brian MacWhinney
2. Using Natural Language Processing in Child Language Research
- CHILDES Database (MacWhinney, 2000)
- Several megabytes of child-parent dialog transcripts
- Part-of-speech and morphology analysis
- Tools available
- Recently proposed syntactic annotation scheme (Sagae et al., 2004)
- Grammatical Relations (GRs)
- POS analysis not enough for many research questions
- Very small amount of annotated data
- Parsing
- Can we use current NLP tools to analyze CHILDES GRs?
- Allows, for example, automatic measurement of syntactic development
3. Outline
- The CHILDES GR annotation scheme
- Automatic GR analysis
- Measurement of Syntactic Development
4. CHILDES GR Scheme (Sagae et al., 2004)
- Addresses needs of child language researchers
- Grammatical Relations (GRs)
- Subject, object, adjunct, etc.
- Labeled dependencies
[Diagram: a labeled dependency, shown as an arc from the head to the dependent annotated with its dependency label]
5. CHILDES GR Scheme Includes Important GRs for Child Language Study
6. Automatic Syntactic (GR) Analysis
- Input: a sentence
- Output: dependency structure (GRs)
- Three steps:
- Text preprocessing
- Unlabeled dependency identification
- Dependency labeling
7. Step 1: Text Preprocessing Prepares Utterances for Parsing
- CHAT transcription system
- Explicitly marks certain extra-grammatical material: disfluency, retracing, and repetitions
- CLAN tools (MacWhinney, 2000)
- Remove extra-grammatical material
- Provide POS and morphological analyses
- CHAT and CLAN tools are publicly available
- http://childes.psy.cmu.edu
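In practice the CLAN tools perform this cleanup; purely as an illustration of what step 1 removes, here is a minimal Python sketch that strips a couple of simplified CHAT-style markers (angle-bracketed retracing/repetition and "&"-prefixed fillers). The regexes and marker set are assumptions for illustration, not the actual CLAN behavior.

```python
import re

def clean_chat_utterance(utterance):
    """Illustrative cleanup of CHAT-style extra-grammatical material.

    The real preprocessing uses CLAN; this sketch only handles retraced or
    repeated material marked with [//] or [/] and '&'-prefixed fillers.
    """
    # Drop retraced/repeated spans in angle brackets: "<I want> [//] I need ..." -> "I need ..."
    utterance = re.sub(r"<[^>]*>\s*\[//?\]", "", utterance)
    # A single retraced/repeated word may appear without angle brackets.
    utterance = re.sub(r"\S+\s*\[//?\]", "", utterance)
    # Remove fillers such as "&uh".
    utterance = re.sub(r"&\S+", "", utterance)
    # Collapse any leftover whitespace.
    return " ".join(utterance.split())

print(clean_chat_utterance("<I want> [//] I need the &uh ball ."))
# -> "I need the ball ."
```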
8. Step 2: Unlabeled Dependency Identification
- Why?
- Large training corpus: the Penn Treebank (Marcus et al., 1993)
- Head-table converts constituents into dependencies
- Use an existing parser (trained on the Penn Treebank)
- Charniak (2000)
- Convert output to dependencies
- Alternatively, a dependency parser
- For example, the MALT parser (Nivre and Scholz, 2004) or Yamada and Matsumoto (2003)
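To illustrate the head-table idea, the sketch below percolates a lexical head up from each constituent using a toy head table and records (dependent, head) word pairs. The table, tree encoding, and uniform right-to-left search are simplifying assumptions, not the actual rules used with the Charniak parser's output.

```python
# Toy head table: for each constituent label, the preferred labels of its head child.
HEAD_TABLE = {
    "S": ["VP", "NP"],
    "VP": ["VB", "VBP", "NP"],
    "NP": ["NN", "NNS", "NP"],
}

def find_head(label, children):
    """Pick the head child, searching right to left (a simplification;
    real head rules specify a search direction per category)."""
    for preferred in HEAD_TABLE.get(label, []):
        for child in reversed(children):
            if child[0] == preferred:
                return child
    return children[-1]

def to_dependencies(tree, deps):
    """Return the lexical head of `tree`, appending (dependent, head) word pairs to `deps`.

    A tree node is (label, [children]) for constituents or (tag, word) for preterminals.
    """
    label, rest = tree
    if isinstance(rest, str):                 # preterminal: (tag, word)
        return rest
    child_heads = [to_dependencies(child, deps) for child in rest]
    head_word = child_heads[rest.index(find_head(label, rest))]
    for word in child_heads:
        if word != head_word:
            deps.append((word, head_word))    # unlabeled dependency
    return head_word

tree = ("S", [("NP", [("PRP", "We")]),
              ("VP", [("VBP", "eat"),
                      ("NP", [("DT", "the"), ("NN", "cheese"), ("NN", "sandwich")])])])
deps = []
to_dependencies(tree, deps)
print(deps)   # [('the', 'sandwich'), ('cheese', 'sandwich'), ('sandwich', 'eat'), ('We', 'eat')]
```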
9. Unlabeled Dependency Identification
[Diagram: unlabeled dependency arcs over the sentence "We eat the cheese sandwich", with words attached to heads such as "eat" and "sandwich"]
10. Domain Issues
- Parser training data is in a very different domain
- WSJ vs. parent-child dialogs
- Domain-specific training data would be better
- But would have to be created (manually)
- Performance is acceptable
- Shorter, simpler sentences
- Unlabeled dependency accuracy:
- WSJ test data: 92%
- CHILDES data (2,000 words): 90%
11. Final Step: Dependency Labeling
- Training data is required
- Labeling dependencies is easier than finding unlabeled dependencies
- Less training data is needed for labeling than for full labeled dependency parsing
- Use a classifier
- TiMBL (Daelemans et al., 2004)
- Extract features from unlabeled dependency structure
- GR labels are target classes
12. Dependency Labeling
13. Features Used for GR Labeling
- Head and dependent words
- Also their POS tags
- Whether the dependent comes before or after the head
- How far the dependent is from the head
- The label of the lowest node in the constituent tree that includes both the head and the dependent
14. Features Used for GR Labeling
Consider the words "we" and "eat". Features: we, pro, eat, v, before, 1, S. Class: SUBJ
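A minimal sketch of how such a feature vector might be assembled for one head-dependent pair (the function and argument names are illustrative; the lowest covering constituent label is simply passed in rather than read off a tree):

```python
def gr_features(dep_word, dep_pos, head_word, head_pos,
                dep_index, head_index, lowest_constituent):
    """Feature vector for one unlabeled dependency, following slide 14's example."""
    direction = "before" if dep_index < head_index else "after"
    distance = abs(head_index - dep_index)
    return [dep_word, dep_pos, head_word, head_pos, direction, distance, lowest_constituent]

# "We eat the cheese sandwich": dependent "we" (pro) attached to head "eat" (v).
print(gr_features("we", "pro", "eat", "v", 0, 1, "S"))
# -> ['we', 'pro', 'eat', 'v', 'before', 1, 'S'], with target class SUBJ
```

Vectors like this, paired with their GR labels, are the training instances for the memory-based classifier (TiMBL).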
15. Good GR Labeling Results with Small Training Set
- 5,000 words for training
- 2,000 words for testing
- Accuracy of dependency labeling (on perfect dependencies): 91.4%
- Overall accuracy (Charniak parser + dependency labeling): 86.9%
16. Some GRs Are Easier Than Others
- Overall accuracy: 86.9%
- Easily identifiable GRs
- DET, POBJ, INF, NEG: precision and recall above 98%
- Difficult GRs
- COMP, XCOMP: below 65%
- Less than 4% of the GRs seen in training and test sets
17. Precision and Recall of Specific GRs
18. Index of Productive Syntax (IPSyn) (Scarborough, 1990)
- A measure of child language development
- Assigns a numerical score for grammatical complexity (from 0 to 112 points)
- Used in hundreds of studies
19. IPSyn Measures Syntactic Development
- IPSyn designed for investigating differences in language acquisition
- Differences in groups (for example, bilingual children)
- Individual differences (for example, delayed language development)
- Focus on syntax
- Addresses weaknesses of Mean Length of Utterance (MLU)
- MLU surprisingly useful until age 3, then reaches ceiling (or becomes unreliable)
- IPSyn is very time-consuming to compute
20. IPSyn Is More Informative Than MLU in Children Over Age 3 yrs
21. Computing IPSyn (manually)
- Corpus of 100 transcribed utterances
- Consecutive, no repetitions
- Identify 56 specific language structures (IPSyn Items)
- Examples:
- Presence of auxiliaries or modals
- Inverted auxiliary in a wh-question
- Conjoined clauses
- Fronted or center-embedded subordinate clauses
- Count occurrences (zero, one, two or more)
- Add counts
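The scoring arithmetic itself is simple once the structures have been identified; a minimal sketch, with hypothetical item names:

```python
def ipsyn_score(item_counts):
    """Sum IPSyn points: each of the 56 items earns 0, 1, or 2 points for
    zero, one, or two-or-more occurrences in the 100-utterance sample."""
    return sum(min(count, 2) for count in item_counts.values())

# Hypothetical counts for three of the 56 items:
counts = {"auxiliary_or_modal": 5, "inverted_aux_wh_question": 1, "conjoined_clauses": 0}
print(ipsyn_score(counts))   # 2 + 1 + 0 = 3 of the 112 possible points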
22. Automating IPSyn
- Existing state of manual computation
- Spreadsheets
- Search each sentence for language structures
- Use part-of-speech tagging to narrow down the number of sentences for certain structures
- For example: Verb + Noun, Determiner + Adjective + Noun
- Can't we just use part-of-speech tagging?
- Only one other automated implementation of IPSyn exists, and it uses only words and POS tags
23. Automating IPSyn without Syntactic Analysis
- Use patterns of words and parts-of-speech to find language structures
- Computerized Profiling, or CP (Long, Fey and Channell, 2004)
- Works well for many IPSyn items
- Det + Adjective + Noun sequence
- But does not work very well for several important items
- Fronted or center-embedded subordinate clauses
- Inverted auxiliary in a wh-question
- Cuts down manual work significantly (good)
- Fully automatic IPSyn scores only somewhat accurate (not so good)
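For the items CP handles well, a word/POS pattern really is sufficient; a minimal sketch of a Det + Adjective + Noun search over a tagged utterance (the tag names are illustrative CHILDES-style tags, not CP's actual implementation):

```python
def has_det_adj_noun(pos_tags):
    """True if the tagged utterance contains a determiner + adjective + noun sequence."""
    return any(pos_tags[i:i + 3] == ["det", "adj", "n"] for i in range(len(pos_tags) - 2))

# "I want the big ball" tagged pro v det adj n:
print(has_det_adj_noun(["pro", "v", "det", "adj", "n"]))   # True
```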
24. Some IPSyn Items Require Syntactic Analysis for Reliable Recognition (and some don't)
- Determiner + Adjective + Noun
- Auxiliary verb
- Adverb modifying adjective or nominal
- Subject + Verb + Object
- Sentence with 3 clauses
- Conjoined sentences
- Wh-question with inverted auxiliary/modal/copula
- Relative clauses
- Propositional complements
- Fronted subordinate clauses
- Center-embedded clauses
25. Automating IPSyn with Grammatical Relation Analyses
- Search for language structures using patterns that involve POS tags and GRs (labeled dependencies)
- Still room for under- and over-generalization, but patterns are easier to write and more reliable
- Examples:
- Wh-embedded clauses: search for wh-words whose head (or transitive head) is a dependent in a GR of types XCSUBJ, XCPRED, XCJCT, XCMOD, COMP, or XCOMP
- Relative clauses: search for a CMOD where the dependent is to the right of the head
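As a concrete version of the relative-clause pattern above, the sketch below scans labeled dependencies represented as (dependent index, head index, label) triples; the representation and the example analysis are assumptions for illustration.

```python
def has_relative_clause(dependencies):
    """Relative-clause pattern from the slide: a CMOD whose dependent
    is to the right of its head."""
    return any(label == "CMOD" and dep_idx > head_idx
               for dep_idx, head_idx, label in dependencies)

# "this is the car I saw" (word indices 0-5), with "saw" a CMOD dependent of "car":
deps = [(0, 1, "SUBJ"), (2, 3, "DET"), (3, 1, "PRED"), (4, 5, "SUBJ"), (5, 3, "CMOD")]
print(has_relative_clause(deps))   # True
```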
26. Evaluation Data
- Two sets of transcripts with IPSyn scoring from two different child language research groups
- Set A
- Scored fully manually
- 20 transcripts
- Ages about 3 yrs.
- Set B
- Scored with CP first, then manually corrected
- 25 transcripts
- Ages about 8 yrs.
- (Two transcripts in each set were held out for development and debugging)
27. Evaluation Metrics: Point Difference
- Point difference
- The absolute point difference between the scores provided by our system and the scores computed manually
- Simple, and shows how close the automatic scores are to the manual scores
- Acceptable range
- Smaller for older children
28. Evaluation Metrics: Point-to-Point Accuracy
- Point-to-point accuracy
- Reflects overall reliability over each scoring decision made in the computation of IPSyn scores
- Scoring decisions: presence or absence of language structures in the transcript
- Point-to-point accuracy = C(correct decisions) / C(total decisions)
- Commonly used for assessing inter-rater reliability among human scorers (for IPSyn, about 94%)
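Both metrics reduce to a line or two of arithmetic; a sketch assuming an overall score per transcript and parallel lists of per-item scoring decisions:

```python
def point_difference(auto_score, manual_score):
    """Absolute difference between the automatic and manual IPSyn scores."""
    return abs(auto_score - manual_score)

def point_to_point_accuracy(auto_decisions, manual_decisions):
    """Fraction of individual scoring decisions on which the automatic
    system agrees with the manual scoring."""
    correct = sum(a == m for a, m in zip(auto_decisions, manual_decisions))
    return correct / len(manual_decisions)

print(point_difference(78, 81))                               # 3
print(point_to_point_accuracy([1, 0, 2, 1], [1, 1, 2, 1]))    # 0.75
```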
29. Results
- IPSyn scores from:
- Our GR-based system (GR)
- Manual scoring (HUMAN)
- Computerized Profiling (CP)
30. GR-based IPSyn Is Quite Accurate
31. Comparing Our GR-IPSyn and CP-IPSyn
32. Error Analysis: Four Problematic Items Cause Half of the Errors
- Four (of 56) IPSyn items account for about half of all mistakes made by our GR-based system
- (a) Propositional complement: 16.9%
- I said you can go now
- (b) Copula/Modal/Aux for emphasis or ellipsis: 12.3%
- I thought he ate his cake, but he didn't.
- (c) Relative clause: 10.6%
- This is the car I saw.
- (d) Bitransitive predicate: 5.8%
- I gave her the book.
- (a), (c), (d): incorrect GR analysis
- (b): imperfect search pattern
33. Conclusion and Future Work
- We can annotate transcripts of child language with Grammatical Relations using current NLP tools and a small amount of manually annotated data
- The reliability of an automated version of IPSyn that uses CHILDES GRs is close to that of human scoring
- GR analysis still needs work
- More training data
- Other parsing techniques
- Use of GR-based IPSyn by child language researchers should reveal additional problem areas
34. References
- Charniak, E. 2000. A maximum-entropy-inspired parser. Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics. Seattle, WA.
- Daelemans, W., Zavrel, J., van der Sloot, K., and van den Bosch, A. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, Reference Guide. ILK Research Group Technical Report Series, no. 04-02.
- Long, S. H., Fey, M. E., Channell, R. W. 2004. Computerized Profiling (version 9.6.0). Cleveland, OH: Case Western Reserve University.
- MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates.
- Marcus, M. P., Santorini, B., Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19.
- Nivre, J., Scholz, M. 2004. Deterministic dependency parsing of English text. Proceedings of the International Conference on Computational Linguistics (pp. 64-70). Geneva, Switzerland.
- Sagae, K., MacWhinney, B., Lavie, A. 2004. Adding syntactic annotations to transcripts of parent-child dialogs. Proceedings of the Fourth International Conference on Language Resources and Evaluation. Lisbon, Portugal.
- Scarborough, H. S. 1990. Index of Productive Syntax. Applied Psycholinguistics, 11, 1-22.
35. Where POS Tagging Is Not Enough
- Sentences with the same POS sequence may have different structure
- Before, he told the man he was cold.
- Before he told the story, he was cold.
- Some syntactic structures are difficult to recognize using only POS tags and words
- Search patterns may under- and over-generate
- Using syntactic analysis is easier and more reliable