Title: Automatic Measurement of Syntactic Development in Child Language
1. Automatic Measurement of Syntactic Development in Child Language
- Kenji Sagae
- Language Technologies Institute
- Student Research Symposium
- September 2005
- Joint work with
- Alon Lavie and Brian MacWhinney
2. Using Natural Language Processing in Child Language Research
- CHILDES Database (MacWhinney, 2000)
- Several megabytes of child-parent dialog transcripts
- Part-of-speech and morphology analysis
- Tools available
- Recently proposed syntactic annotation scheme (Sagae et al., 2004)
- Grammatical Relations (GRs)
- POS analysis not enough for many research questions
- Very small amount of annotated data
- Parsing
- Can we use current NLP tools to analyze CHILDES GRs?
- Allows, for example, automatic measurement of syntactic development
3. Outline
- The CHILDES GR annotation scheme
- Automatic GR analysis
- Measurement of Syntactic Development
4. CHILDES GR Scheme (Sagae et al., 2004)
- Addresses needs of child language researchers
- Grammatical Relations (GRs)
- Subject, object, adjunct, etc.
- Labeled dependencies
[Diagram: a labeled dependency, shown as an arc from the head to the dependent annotated with its dependency label]
5. CHILDES GR Scheme Includes Important GRs for Child Language Study
6. Automatic Syntactic (GR) Analysis
- Input: a sentence
- Output: dependency structure (GRs)
- Three steps:
- Text preprocessing
- Unlabeled dependency identification
- Dependency labeling
7. Step 1: Text Preprocessing Prepares Utterances for Parsing
- CHAT transcription system
- Explicitly marks certain extra-grammatical material: disfluency, retracing, and repetitions
- CLAN tools (MacWhinney, 2000)
- Remove extra-grammatical material
- Provide POS and morphological analyses
- CHAT and CLAN tools are publicly available
- http://childes.psy.cmu.edu
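In practice the CLAN tools perform this cleanup; purely as an illustration of what step 1 removes, here is a minimal Python sketch that strips a couple of simplified CHAT-style markers (angle-bracketed retracing/repetition and "&"-prefixed fillers). The regexes and marker set are assumptions for illustration, not the actual CLAN behavior.

```python
import re

def clean_chat_utterance(utterance):
    """Illustrative cleanup of CHAT-style extra-grammatical material.

    The real preprocessing uses CLAN; this sketch only handles retraced or
    repeated material marked with [//] or [/] and '&'-prefixed fillers.
    """
    # Drop retraced/repeated spans in angle brackets: "<I want> [//] I need ..." -> "I need ..."
    utterance = re.sub(r"<[^>]*>\s*\[//?\]", "", utterance)
    # A single retraced/repeated word may appear without angle brackets.
    utterance = re.sub(r"\S+\s*\[//?\]", "", utterance)
    # Remove fillers such as "&uh".
    utterance = re.sub(r"&\S+", "", utterance)
    # Collapse any leftover whitespace.
    return " ".join(utterance.split())

print(clean_chat_utterance("<I want> [//] I need the &uh ball ."))
# -> "I need the ball ."
```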
8. Step 2: Unlabeled Dependency Identification
- Why?
- Large training corpus: the Penn Treebank (Marcus et al., 1993)
- Head-table converts constituents into dependencies
- Use an existing parser (trained on the Penn Treebank)
- Charniak (2000)
- Convert output to dependencies
- Alternatively, a dependency parser
- For example, the MALT parser (Nivre and Scholz, 2004) or Yamada and Matsumoto (2003)
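To illustrate the head-table idea, the sketch below percolates a lexical head up from each constituent using a toy head table and records (dependent, head) word pairs. The table, tree encoding, and uniform right-to-left search are simplifying assumptions, not the actual rules used with the Charniak parser's output.

```python
# Toy head table: for each constituent label, the preferred labels of its head child.
HEAD_TABLE = {
    "S": ["VP", "NP"],
    "VP": ["VB", "VBP", "NP"],
    "NP": ["NN", "NNS", "NP"],
}

def find_head(label, children):
    """Pick the head child, searching right to left (a simplification;
    real head rules specify a search direction per category)."""
    for preferred in HEAD_TABLE.get(label, []):
        for child in reversed(children):
            if child[0] == preferred:
                return child
    return children[-1]

def to_dependencies(tree, deps):
    """Return the lexical head of `tree`, appending (dependent, head) word pairs to `deps`.

    A tree node is (label, [children]) for constituents or (tag, word) for preterminals.
    """
    label, rest = tree
    if isinstance(rest, str):                 # preterminal: (tag, word)
        return rest
    child_heads = [to_dependencies(child, deps) for child in rest]
    head_word = child_heads[rest.index(find_head(label, rest))]
    for word in child_heads:
        if word != head_word:
            deps.append((word, head_word))    # unlabeled dependency
    return head_word

tree = ("S", [("NP", [("PRP", "We")]),
              ("VP", [("VBP", "eat"),
                      ("NP", [("DT", "the"), ("NN", "cheese"), ("NN", "sandwich")])])])
deps = []
to_dependencies(tree, deps)
print(deps)   # [('the', 'sandwich'), ('cheese', 'sandwich'), ('sandwich', 'eat'), ('We', 'eat')]
```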
9. Unlabeled Dependency Identification
[Diagram: unlabeled dependency arcs over the sentence "We eat the cheese sandwich", with words attached to heads such as "eat" and "sandwich"]
10. Domain Issues
- Parser training data is in a very different domain
- WSJ vs. parent-child dialogs
- Domain-specific training data would be better
- But would have to be created (manually)
- Performance is acceptable
- Shorter, simpler sentences
- Unlabeled dependency accuracy:
- WSJ test data: 92%
- CHILDES data (2,000 words): 90%
11. Final Step: Dependency Labeling
- Training data is required
- Labeling dependencies is easier than finding unlabeled dependencies
- Less training data is needed for labeling than for full labeled dependency parsing
- Use a classifier
- TiMBL (Daelemans et al., 2004)
- Extract features from unlabeled dependency structure
- GR labels are target classes
12. Dependency Labeling
13. Features Used for GR Labeling
- Head and dependent words
- Also their POS tags
- Whether the dependent comes before or after the head
- How far the dependent is from the head
- The label of the lowest node in the constituent tree that includes both the head and the dependent
14. Features Used for GR Labeling
Consider the words "we" and "eat". Features: we, pro, eat, v, before, 1, S. Class: SUBJ
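A minimal sketch of how such a feature vector might be assembled for one head-dependent pair (the function and argument names are illustrative; the lowest covering constituent label is simply passed in rather than read off a tree):

```python
def gr_features(dep_word, dep_pos, head_word, head_pos,
                dep_index, head_index, lowest_constituent):
    """Feature vector for one unlabeled dependency, following slide 14's example."""
    direction = "before" if dep_index < head_index else "after"
    distance = abs(head_index - dep_index)
    return [dep_word, dep_pos, head_word, head_pos, direction, distance, lowest_constituent]

# "We eat the cheese sandwich": dependent "we" (pro) attached to head "eat" (v).
print(gr_features("we", "pro", "eat", "v", 0, 1, "S"))
# -> ['we', 'pro', 'eat', 'v', 'before', 1, 'S'], with target class SUBJ
```

Vectors like this, paired with their GR labels, are the training instances for the memory-based classifier (TiMBL).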
15. Good GR Labeling Results with Small Training Set
- 5,000 words for training
- 2,000 words for testing
- Accuracy of dependency labeling (on perfect dependencies): 91.4%
- Overall accuracy (Charniak parser + dependency labeling): 86.9%
16. Some GRs Are Easier Than Others
- Overall accuracy: 86.9%
- Easily identifiable GRs
- DET, POBJ, INF, NEG: precision and recall above 98%
- Difficult GRs
- COMP, XCOMP: below 65%
- Less than 4% of the GRs seen in training and test sets
17. Precision and Recall of Specific GRs
18. Index of Productive Syntax (IPSyn) (Scarborough, 1990)
- A measure of child language development
- Assigns a numerical score for grammatical complexity (from 0 to 112 points)
- Used in hundreds of studies
19. IPSyn Measures Syntactic Development
- IPSyn designed for investigating differences in language acquisition
- Differences in groups (for example, bilingual children)
- Individual differences (for example, delayed language development)
- Focus on syntax
- Addresses weaknesses of Mean Length of Utterance (MLU)
- MLU surprisingly useful until age 3, then reaches ceiling (or becomes unreliable)
- IPSyn is very time-consuming to compute
20. IPSyn Is More Informative Than MLU in Children Over Age 3 yrs
21. Computing IPSyn (manually)
- Corpus of 100 transcribed utterances
- Consecutive, no repetitions
- Identify 56 specific language structures (IPSyn Items)
- Examples:
- Presence of auxiliaries or modals
- Inverted auxiliary in a wh-question
- Conjoined clauses
- Fronted or center-embedded subordinate clauses
- Count occurrences (zero, one, two or more)
- Add counts
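The scoring arithmetic itself is simple once the structures have been identified; a minimal sketch, with hypothetical item names:

```python
def ipsyn_score(item_counts):
    """Sum IPSyn points: each of the 56 items earns 0, 1, or 2 points for
    zero, one, or two-or-more occurrences in the 100-utterance sample."""
    return sum(min(count, 2) for count in item_counts.values())

# Hypothetical counts for three of the 56 items:
counts = {"auxiliary_or_modal": 5, "inverted_aux_wh_question": 1, "conjoined_clauses": 0}
print(ipsyn_score(counts))   # 2 + 1 + 0 = 3 of the 112 possible points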
22. Automating IPSyn
- Existing state of manual computation
- Spreadsheets
- Search each sentence for language structures
- Use part-of-speech tagging to narrow down the number of sentences for certain structures
- For example: Verb + Noun, Determiner + Adjective + Noun
- Can't we just use part-of-speech tagging?
- Only one other automated implementation of IPSyn exists, and it uses only words and POS tags
23. Automating IPSyn without Syntactic Analysis
- Use patterns of words and parts-of-speech to find language structures
- Computerized Profiling, or CP (Long, Fey and Channell, 2004)
- Works well for many IPSyn items
- Det + Adjective + Noun sequence
- But does not work very well for several important items
- Fronted or center-embedded subordinate clauses
- Inverted auxiliary in a wh-question
- Cuts down manual work significantly (good)
- Fully automatic IPSyn scores only somewhat accurate (not so good)
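For the items CP handles well, a word/POS pattern really is sufficient; a minimal sketch of a Det + Adjective + Noun search over a tagged utterance (the tag names are illustrative CHILDES-style tags, not CP's actual implementation):

```python
def has_det_adj_noun(pos_tags):
    """True if the tagged utterance contains a determiner + adjective + noun sequence."""
    return any(pos_tags[i:i + 3] == ["det", "adj", "n"] for i in range(len(pos_tags) - 2))

# "I want the big ball" tagged pro v det adj n:
print(has_det_adj_noun(["pro", "v", "det", "adj", "n"]))   # True
```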
24. Some IPSyn Items Require Syntactic Analysis for Reliable Recognition (and some don't)
- Determiner + Adjective + Noun
- Auxiliary verb
- Adverb modifying adjective or nominal
- Subject + Verb + Object
- Sentence with 3 clauses
- Conjoined sentences
- Wh-question with inverted auxiliary/modal/copula
- Relative clauses
- Propositional complements
- Fronted subordinate clauses
- Center-embedded clauses
25. Automating IPSyn with Grammatical Relation Analyses
- Search for language structures using patterns that involve POS tags and GRs (labeled dependencies)
- Still room for under- and over-generalization, but patterns are easier to write and more reliable
- Examples:
- Wh-embedded clauses: search for wh-words whose head (or transitive head) is a dependent in a GR of types XCSUBJ, XCPRED, XCJCT, XCMOD, COMP, or XCOMP
- Relative clauses: search for a CMOD where the dependent is to the right of the head
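As a concrete version of the relative-clause pattern above, the sketch below scans labeled dependencies represented as (dependent index, head index, label) triples; the representation and the example analysis are assumptions for illustration.

```python
def has_relative_clause(dependencies):
    """Relative-clause pattern from the slide: a CMOD whose dependent
    is to the right of its head."""
    return any(label == "CMOD" and dep_idx > head_idx
               for dep_idx, head_idx, label in dependencies)

# "this is the car I saw" (word indices 0-5), with "saw" a CMOD dependent of "car":
deps = [(0, 1, "SUBJ"), (2, 3, "DET"), (3, 1, "PRED"), (4, 5, "SUBJ"), (5, 3, "CMOD")]
print(has_relative_clause(deps))   # True
```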
26. Evaluation Data
- Two sets of transcripts with IPSyn scoring from two different child language research groups
- Set A
- Scored fully manually
- 20 transcripts
- Ages about 3 yrs.
- Set B
- Scored with CP first, then manually corrected
- 25 transcripts
- Ages about 8 yrs.
- (Two transcripts in each set were held out for development and debugging)
27. Evaluation Metrics: Point Difference
- Point difference
- The absolute point difference between the scores provided by our system and the scores computed manually
- Simple, and shows how close the automatic scores are to the manual scores
- Acceptable range
- Smaller for older children
28. Evaluation Metrics: Point-to-Point Accuracy
- Point-to-point accuracy
- Reflects overall reliability over each scoring decision made in the computation of IPSyn scores
- Scoring decisions: presence or absence of language structures in the transcript
- Point-to-point accuracy = C(correct decisions) / C(total decisions)
- Commonly used for assessing inter-rater reliability among human scorers (for IPSyn, about 94%)
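Both metrics reduce to a line or two of arithmetic; a sketch assuming an overall score per transcript and parallel lists of per-item scoring decisions:

```python
def point_difference(auto_score, manual_score):
    """Absolute difference between the automatic and manual IPSyn scores."""
    return abs(auto_score - manual_score)

def point_to_point_accuracy(auto_decisions, manual_decisions):
    """Fraction of individual scoring decisions on which the automatic
    system agrees with the manual scoring."""
    correct = sum(a == m for a, m in zip(auto_decisions, manual_decisions))
    return correct / len(manual_decisions)

print(point_difference(78, 81))                               # 3
print(point_to_point_accuracy([1, 0, 2, 1], [1, 1, 2, 1]))    # 0.75
```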
29. Results
- IPSyn scores from:
- Our GR-based system (GR)
- Manual scoring (HUMAN)
- Computerized Profiling (CP)
30. GR-based IPSyn Is Quite Accurate
31. Comparing Our GR-IPSyn and CP-IPSyn
32. Error Analysis: Four Problematic Items Cause Half of the Errors
- Four (of 56) IPSyn items account for about half of all mistakes made by our GR-based system
- (a) Propositional complement: 16.9%
- I said you can go now
- (b) Copula/Modal/Aux for emphasis or ellipsis: 12.3%
- I thought he ate his cake, but he didn't.
- (c) Relative clause: 10.6%
- This is the car I saw.
- (d) Bitransitive predicate: 5.8%
- I gave her the book.
- (a), (c), (d): incorrect GR analysis
- (b): imperfect search pattern
33. Conclusion and Future Work
- We can annotate transcripts of child language with Grammatical Relations using current NLP tools and a small amount of manually annotated data
- The reliability of an automated version of IPSyn that uses CHILDES GRs is close to that of human scoring
- GR analysis still needs work
- More training data
- Other parsing techniques
- Use of GR-based IPSyn by child language researchers should reveal additional problem areas
34. References
- Charniak, E. 2000. A maximum-entropy-inspired parser. Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics. Seattle, WA.
- Daelemans, W., Zavrel, J., van der Sloot, K., and van den Bosch, A. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, Reference Guide. ILK Research Group Technical Report Series, no. 04-02.
- Long, S. H., Fey, M. E., Channell, R. W. 2004. Computerized Profiling (version 9.6.0). Cleveland, OH: Case Western Reserve University.
- MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates.
- Marcus, M. P., Santorini, B., Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19.
- Nivre, J., Scholz, M. 2004. Deterministic dependency parsing of English text. Proceedings of the International Conference on Computational Linguistics (pp. 64-70). Geneva, Switzerland.
- Sagae, K., MacWhinney, B., Lavie, A. 2004. Adding syntactic annotations to transcripts of parent-child dialogs. Proceedings of the Fourth International Conference on Language Resources and Evaluation. Lisbon, Portugal.
- Scarborough, H. S. 1990. Index of Productive Syntax. Applied Psycholinguistics, 11, 1-22.
35. Where POS Tagging Is Not Enough
- Sentences with the same POS sequence may have different structure
- Before, he told the man he was cold.
- Before he told the story, he was cold.
- Some syntactic structures are difficult to recognize using only POS tags and words
- Search patterns may under- and over-generate
- Using syntactic analysis is easier and more reliable