1
Automatic Measurement of Syntactic Development
in Child Language
  • Kenji Sagae
  • Language Technologies Institute
  • Student Research Symposium
  • September 2005
  • Joint work with
  • Alon Lavie and Brian MacWhinney

2
Using Natural Language Processing in Child
Language Research
  • CHILDES Database (MacWhinney, 2000)
  • Several megabytes of child-parent dialog
    transcripts
  • Part-of-speech and morphology analysis
  • Tools available
  • Recently proposed syntactic annotation scheme
    (Sagae et al., 2004)
  • Grammatical Relations (GRs)
  • POS analysis not enough for many research
    questions
  • Very small amount of annotated data
  • Parsing
  • Can we use current NLP tools to analyze CHILDES
    GRs?
  • Allows, for example, automatic measurement of
    syntactic development

3
Outline
  • The CHILDES GR annotation scheme
  • Automatic GR analysis
  • Measurement of Syntactic Development

4
CHILDES GR Scheme (Sagae et al., 2004)
  • Addresses needs of child language researchers
  • Grammatical Relations (GRs)
  • Subject, object, adjunct, etc.
  • Labeled dependencies

[Diagram: a labeled dependency, with an arc from the dependent to its head carrying the dependency label]
5
CHILDES GR Scheme Includes Important GRs for
Child Language Study
6
Automatic Syntactic (GR) Analysis
  • Input: a sentence
  • Output: dependency structure (GRs)
  • Three steps
  • Text preprocessing
  • Unlabeled dependency identification
  • Dependency labeling

7
Step 1: Text Preprocessing Prepares Utterances
for Parsing
  • CHAT transcription system
  • Explicitly marks certain extra-grammatical
    material: disfluency, retracing, and repetitions
  • CLAN tools (MacWhinney, 2000)
  • Remove extra-grammatical material
  • Provide POS and Morphological analyses
  • CHAT and CLAN tools are publicly available
  • http://childes.psy.cmu.edu

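The following is a minimal Python sketch of the kind of cleanup this step performs. It is not the CLAN implementation; the marker conventions ([/] for repetition, [//] for retracing, &-prefixed fillers) follow CHAT, but the regular expressions and the example utterance are illustrative only.

    import re

    def strip_chat_disfluencies(utterance):
        """Rough stand-in for the CLAN cleanup: drop fillers and material
        that the speaker repeated or retraced, as marked in CHAT."""
        # filled pauses / fragments such as "&um" or "&-uh"
        utterance = re.sub(r"&\S+", "", utterance)
        # multi-word material marked as repeated "<...> [/]" or retraced "<...> [//]"
        utterance = re.sub(r"<[^>]*>\s*\[/+\]", "", utterance)
        # a single repeated or retraced word, e.g. "ball [/]"
        utterance = re.sub(r"\S+\s*\[/+\]", "", utterance)
        return re.sub(r"\s+", " ", utterance).strip()

    print(strip_chat_disfluencies("&uh <I want> [//] I need the ball ."))
    # -> "I need the ball ."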
8
Step 2: Unlabeled Dependency Identification
  • Why?
  • Large training corpus: Penn Treebank (Marcus et
    al., 1993)
  • Head-table converts constituents into
    dependencies
  • Use an existing parser (trained on the Penn
    Treebank)
  • Charniak (2000)
  • Convert output to dependencies
  • Alternatively, a dependency parser
  • For example: MALT parser (Nivre and Scholz,
    2004), Yamada and Matsumoto (2003)

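As an illustration of the head-table conversion, the sketch below turns a toy constituent tree for the example sentence into unlabeled (dependent, head) pairs. The tree encoding, the tiny head table, and the search rules are simplified assumptions; real conversions use a much larger head table.

    # Toy constituency tree for "We eat the cheese sandwich":
    # a node is (label, children); a node that directly dominates a word holds that word.
    TREE = ("S",
            [("NP", ["We"]),
             ("VP", [("V", ["eat"]),
                     ("NP", [("DT", ["the"]), ("N", ["cheese"]), ("N", ["sandwich"])])])])

    # Toy head table: search direction plus child labels that may supply the head.
    HEAD_TABLE = {
        "S":  ("left",  ["VP", "NP"]),
        "VP": ("left",  ["V", "VP"]),
        "NP": ("right", ["N", "NP", "DT"]),
    }

    def find_head(node):
        """Percolate lexical heads upward through the tree."""
        label, children = node
        if isinstance(children[0], str):          # directly dominates a word
            return children[0]
        direction, priorities = HEAD_TABLE.get(label, ("left", []))
        ordered = children if direction == "left" else list(reversed(children))
        for wanted in priorities:
            for child in ordered:
                if child[0] == wanted:
                    return find_head(child)
        return find_head(ordered[0])              # fallback: first child in search order

    def to_dependencies(node, deps=None):
        """Each non-head child of a constituent depends on that constituent's head."""
        if deps is None:
            deps = []
        label, children = node
        if isinstance(children[0], str):
            return deps
        head = find_head(node)
        for child in children:
            child_head = find_head(child)
            if child_head != head:
                deps.append((child_head, head))   # (dependent, head)
            to_dependencies(child, deps)
        return deps

    print(to_dependencies(TREE))
    # [('We', 'eat'), ('sandwich', 'eat'), ('the', 'sandwich'), ('cheese', 'sandwich')]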
9
Unlabeled Dependency Identification
Example: "We eat the cheese sandwich"
[Diagram: unlabeled dependencies, with "We" and "sandwich" attached to the head "eat", and "the" and "cheese" attached to "sandwich"]
10
Domain Issues
  • Parser training data is in a very different
    domain
  • WSJ vs. parent-child dialogs
  • Domain-specific training data would be better
  • But would have to be created (manually)
  • Performance is acceptable
  • Shorter, simpler sentences
  • Unlabeled dependency accuracy
  • WSJ test data: 92%
  • CHILDES data (2,000 words): 90%

11
Final Step: Dependency Labeling
  • Training data is required
  • Labeling dependencies is easier than finding
    unlabeled dependencies
  • Less training data is needed for labeling than
    for full labeled dependency parsing
  • Use a classifier
  • TiMBL (Daelemans et al., 2004)
  • Extract features from unlabeled dependency
    structure
  • GR labels are target classes

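To make the memory-based idea concrete, here is a toy stand-in for TiMBL: it labels a dependency with the GR of its nearest stored training instance under a simple feature-overlap distance. The three training instances and all feature values are hypothetical.

    from collections import Counter

    # (dependent word, dependent POS, head word, head POS, position, distance, lowest node) -> GR label
    TRAINING = [
        (("we", "pro", "eat", "v", "before", 1, "S"), "SUBJ"),
        (("sandwich", "n", "eat", "v", "after", 3, "VP"), "OBJ"),
        (("the", "det", "sandwich", "n", "before", 2, "NP"), "DET"),
    ]

    def overlap_distance(a, b):
        """Number of feature positions on which two instances disagree."""
        return sum(1 for x, y in zip(a, b) if x != y)

    def label_gr(features, k=1):
        """Return the majority GR label among the k most similar training instances."""
        nearest = sorted(TRAINING, key=lambda item: overlap_distance(features, item[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    print(label_gr(("you", "pro", "drink", "v", "before", 1, "S")))   # -> SUBJ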
12
Dependency Labeling
13
Features Used for GR Labeling
  • Head and dependent words
  • Also their POS tags
  • Whether the dependent comes before or after the
    head
  • How far the dependent is from the head
  • The label of the lowest node in the constituent
    tree that includes both the head and dependent

14
Features Used for GR Labeling
Consider the words we and eat.
Features: we, pro, eat, v, before, 1, S
Class: SUBJ
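A small sketch of how such a feature vector could be built from a POS-tagged sentence and one unlabeled dependency; the label of the lowest covering constituent is passed in rather than computed from a tree, and the tag names are illustrative.

    def extract_features(tagged, dep_i, head_i, lowest_node):
        """Features for one dependency: dependent and head words and POS tags,
        relative position, distance, and the lowest constituent label covering both."""
        dep_word, dep_pos = tagged[dep_i]
        head_word, head_pos = tagged[head_i]
        position = "before" if dep_i < head_i else "after"
        return (dep_word, dep_pos, head_word, head_pos,
                position, abs(head_i - dep_i), lowest_node)

    tagged = [("we", "pro"), ("eat", "v"), ("the", "det"), ("cheese", "n"), ("sandwich", "n")]
    print(extract_features(tagged, 0, 1, "S"))
    # ('we', 'pro', 'eat', 'v', 'before', 1, 'S')  -> class SUBJ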
15
Good GR Labeling Results with Small Training Set
  • 5,000 words for training
  • 2,000 words for testing
  • Accuracy of dependency labeling (on perfect
    dependencies): 91.4%
  • Overall accuracy (Charniak parser + dependency
    labeling): 86.9%

16
Some GRs Are Easier Than Others
  • Overall accuracy: 86.9%
  • Easily identifiable GRs
  • DET, POBJ, INF, NEG: precision and recall above
    98%
  • Difficult GRs
  • COMP, XCOMP: below 65%
  • Less than 4% of the GRs seen in training and test
    sets.

17
Precision and Recall of Specific GRs
18
Index of Productive Syntax (IPSyn) (Scarborough,
1990)
  • A measure of child language development
  • Assigns a numerical score for grammatical
    complexity
  • (from 0 to 112 points)
  • Used in hundreds of studies

19
IPSyn Measures Syntactic Development
  • IPSyn was designed for investigating differences
    in language acquisition
  • Differences between groups (for example, bilingual
    children)
  • Individual differences (for example, delayed
    language development)
  • Focus on syntax
  • Addresses weaknesses of Mean Length of Utterance
    (MLU)
  • MLU surprisingly useful until age 3, then reaches
    ceiling (or becomes unreliable)
  • IPSyn is very time-consuming to compute

20
IPSyn Is More Informative Than MLU in Children
Over Age 3 yrs
21
Computing IPSyn (manually)
  • Corpus of 100 transcribed utterances
  • Consecutive, no repetitions
  • Identify 56 specific language structures (IPSyn
    Items)
  • Examples
  • Presence of auxiliaries or modals
  • Inverted auxiliary in a wh-question
  • Conjoined clauses
  • Fronted or center-embedded subordinate clauses
  • Count occurrences (zero, one, two or more)
  • Add counts

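Once the 56 items have been identified, the arithmetic is simple; a sketch of the counting step, with hypothetical occurrence counts for a few items:

    def ipsyn_total(item_counts):
        """Each IPSyn item contributes 0, 1, or 2 points depending on whether it
        occurs zero, one, or two-or-more times; the score is the sum over all
        56 items (maximum 56 x 2 = 112)."""
        return sum(min(count, 2) for count in item_counts.values())

    print(ipsyn_total({"auxiliary_or_modal": 5,
                       "inverted_aux_in_wh_question": 1,
                       "conjoined_clauses": 0}))   # -> 3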
22
Automating IPSyn
  • Existing state of manual computation
  • Spreadsheets
  • Search each sentence for language structures
  • Use part-of-speech tagging to narrow down the
    number of sentences for certain structures
  • For example: Verb + Noun, Determiner + Adjective +
    Noun
  • Can't we just use part-of-speech tagging?
  • Only one other automated implementation of IPSyn
    exists, and it uses only words and POS tags

23
Automating IPSyn without Syntactic Analysis
  • Use patterns of words and parts-of-speech to find
    language structures
  • Computerized Profiling, or CP (Long, Fey and
    Channell, 2004)
  • Works well for many IPSyn items
  • Det + Adjective + Noun sequence
  • But does not work very well for several important
    items
  • Fronted or center-embedded subordinate clauses
  • Inverted auxiliary in a wh-question
  • Cuts down manual work significantly (good)
  • Fully automatic IPSyn scores are only somewhat
    accurate (not so good)

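A sketch of the kind of word/POS pattern such an approach relies on, here for a Det + Adjective + Noun sequence; the tag names are illustrative, and the flatness of the pattern is exactly why items involving clause structure are hard to capture this way.

    def has_det_adj_noun(tagged):
        """POS-only pattern: a determiner immediately followed by an adjective and a noun."""
        tags = [pos for _, pos in tagged]
        return any(tags[i:i + 3] == ["det", "adj", "n"] for i in range(len(tags) - 2))

    print(has_det_adj_noun([("the", "det"), ("big", "adj"), ("dog", "n"), ("barked", "v")]))  # True
    print(has_det_adj_noun([("he", "pro"), ("was", "v"), ("cold", "adj")]))                   # False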
24
Some IPSyn Items Require Syntactic Analysis for
Reliable Recognition (and some don't)
  • Determiner + Adjective + Noun
  • Auxiliary verb
  • Adverb modifying adjective or nominal
  • Subject + Verb + Object
  • Sentence with 3 clauses
  • Conjoined sentences
  • Wh-question with inverted auxiliary/modal/copula
  • Relative clauses
  • Propositional complements
  • Fronted subordinate clauses
  • Center-embedded clauses

25
Automating IPSyn with Grammatical Relation
Analyses
  • Search for language structures using patterns
    that involve POS tags and GRs (labeled
    dependencies)
  • Still room for under- and over-generalization,
    but patterns are easier to write and more
    reliable
  • Examples
  • Wh-embedded clauses: search for wh-words whose
    head (or transitive head) is a dependent in a GR
    of types XCSUBJ, XCPRED, XCJCT, XCMOD,
    COMP, or XCOMP
  • Relative clauses: search for a CMOD where the
    dependent is to the right of the head (see the
    sketch below)

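A sketch of the relative-clause pattern above, applied to GR triples of the form (dependent index, head index, label); the triples for "This is the car I saw." are illustrative, not system output.

    def has_relative_clause(grs):
        """Relative clause: a CMOD relation whose dependent lies to the right of its head."""
        return any(label == "CMOD" and dep_i > head_i for dep_i, head_i, label in grs)

    # "This(0) is(1) the(2) car(3) I(4) saw(5)" -- "saw" attaches to "car" by CMOD
    grs = [(0, 1, "SUBJ"), (3, 1, "PRED"), (2, 3, "DET"), (4, 5, "SUBJ"), (5, 3, "CMOD")]
    print(has_relative_clause(grs))   # True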
26
Evaluation Data
  • Two sets of transcripts with IPSyn scoring from
    two different child language research groups
  • Set A
  • Scored fully manually
  • 20 transcripts
  • Ages about 3 yrs.
  • Set B
  • Scored with CP first, then manually corrected
  • 25 transcripts
  • Ages about 8 yrs.
  • (Two transcripts in each set were held out for
    development and debugging)

27
Evaluation Metrics: Point Difference
  • Point difference
  • The absolute point difference between the scores
    provided by our system and the scores computed
    manually
  • Simple, and shows how close the automatic scores
    are to the manual scores
  • Acceptable range
  • Smaller for older children

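A worked example of the metric with hypothetical (automatic, manual) score pairs:

    def point_difference(automatic, manual):
        """Absolute difference between the automatic and the manual IPSyn score."""
        return abs(automatic - manual)

    pairs = [(97, 93), (80, 82), (64, 64)]          # hypothetical score pairs
    diffs = [point_difference(a, m) for a, m in pairs]
    print(diffs, sum(diffs) / len(diffs))           # [4, 2, 0] 2.0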
28
Evaluation Metrics: Point-to-Point Accuracy
  • Point-to-point accuracy
  • Reflects overall reliability over each scoring
    decision made in the computation of IPSyn scores
  • Scoring decisions: presence or absence of
    language structures in the transcript
  • Point-to-Point Acc = C(Correct Decisions) /
    C(Total Decisions)
  • Commonly used for assessing inter-rater
    reliability among human scorers (for IPSyn, about
    94%).

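A sketch of the computation over hypothetical per-item scoring decisions (1 = structure judged present, 0 = absent):

    def point_to_point_accuracy(automatic, manual):
        """Fraction of scoring decisions on which automatic and manual scoring agree."""
        return sum(a == m for a, m in zip(automatic, manual)) / len(manual)

    automatic = [1, 1, 0, 1, 0, 1]
    manual    = [1, 1, 0, 0, 0, 1]
    print(point_to_point_accuracy(automatic, manual))   # -> 0.833...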
29
Results
  • IPSyn scores from
  • Our GR-based system (GR)
  • Manual scoring (HUMAN)
  • Computerized Profiling (CP)

30
GR-based IPSyn Is Quite Accurate
31
Comparing Our GR-IPSyn and CP-IPSyn
32
Error Analysis: Four Problematic Items Cause
Half of the Errors
  • Four (of 56) IPSyn items account for about half
    of all mistakes made by our GR-based system
  • (a) Propositional complement: 16.9%
  • I said you can go now.
  • (b) Copula/Modal/Aux for emphasis or ellipsis:
    12.3%
  • I thought he ate his cake, but he didn't.
  • (c) Relative clause: 10.6%
  • This is the car I saw.
  • (d) Bitransitive predicate: 5.8%
  • I gave her the book.
  • (a), (c), (d): Incorrect GR analysis
  • (b): Imperfect search pattern

33
Conclusion and Future Work
  • We can annotate transcripts of child language
    with Grammatical Relations using current NLP
    tools and a small amount of manually annotated
    data
  • The reliability of an automated version of IPSyn
    that uses CHILDES GRs is close to that of human
    scoring
  • GR analysis still needs work
  • More training data
  • Other parsing techniques
  • Use of GR-based IPSyn by child language
    researchers should reveal additional problem areas

34
References
  • Charniak, E. 2000. A maximum-entropy-inspired
    parser. Proceedings of the First Annual Meeting
    of the North American Chapter of the Association
    for Computational Linguistics. Seattle, WA.
  • Daelemans, W., Zavrel, J., van der Sloot, K., and
    van den Bosch, A. 2004. TiMBL: Tilburg Memory-Based
    Learner, version 5.1, Reference Guide. ILK
    Research Group Technical Report Series, no.
    04-02.
  • Long, S. H., Fey, M. E., Channell, R. W. 2004.
    Computerized Profiling (version 9.6.0).
    Cleveland, OH: Case Western Reserve University.
  • MacWhinney, B. 2000. The CHILDES Project: Tools
    for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum
    Associates.
  • Marcus, M. P., Santorini, B., Marcinkiewicz, M.
    A. 1993. Building a large annotated corpus of
    English: the Penn Treebank. Computational
    Linguistics, 19.
  • Nivre, J., Scholz, M. 2004. Deterministic
    dependency parsing of English text. Proceedings of
    the International Conference on Computational
    Linguistics (pp. 64-70). Geneva, Switzerland.
  • Sagae, K., MacWhinney, B., Lavie, A. 2004.
    Adding syntactic annotations to transcripts of
    parent-child dialogs. Proceedings of the Fourth
    International Conference on Language Resources
    and Evaluation. Lisbon, Portugal.
  • Scarborough, H. S. 1990. Index of Productive
    Syntax. Applied Psycholinguistics, 11, 1-22.

35
Where POS Tagging Is Not Enough
  • Sentences with same POS sequence may have
    different structure
  • Before, he told the man he was cold.
  • Before he told the story, he was cold.
  • Some syntactic structures are difficult to
    recognize using only POS tags and words
  • Search patterns may under- and over-generate
  • Using syntactic analysis is easier and more
    reliable