Report on Semi-supervised Training for Statistical Parsing

1
Report on Semi-supervised Training for
Statistical Parsing
  • Zhang Hao
  • 2002-12-18

2
Brief Introduction
  • Why semi-supervised training?
  • Co-training framework and applications
  • Can parsing fit in this framework?
  • How?
  • Conclusion

3
Why Semi-supervised Training
  • Compromise between supervised and unsupervised
    training
  • Pay-offs
  • Minimize the need for labeled data
  • Maximize the value of unlabeled data
  • Easy portability

4
Co-training Scenario
  • Idea: two different students learn from each
    other, incrementally, mutually improving
  • Scheme: difference (motive) → mutual learning
    (optimization) → agreement (objective)
  • Task: to optimize the objective function of
    agreement
  • Heuristic selection is important: what to learn?

5
Blum & Mitchell, 98 Co-training Assumptions
  • Classification problem
  • Feature redundancy
  • Allows different views of data
  • Each view is sufficient for classification
  • View independence: features of the two views are
    conditionally independent, given the class

6
Blum & Mitchell, 98 Co-training Example
  • Course home page classification (yes/no)
  • Two views: page content text / anchor text (an
    even cleaner example of two views: the two sides
    of a coin)
  • Two naïve Bayes classifiers, one per view,
    should agree

7
Blum & Mitchell, 98 Co-Training Algorithm
  • Given:
  • A set L of labeled training examples
  • A set U of unlabeled examples
  • Create a pool U′ of examples by choosing u
    examples at random from U
  • Loop for k iterations:
  • Use L to train a classifier h1 that considers
    only the x1 portion of x
  • Use L to train a classifier h2 that considers
    only the x2 portion of x
  • Allow h1 to label p positive and n negative
    examples from U′
  • Allow h2 to label p positive and n negative
    examples from U′
  • Add these self-labeled examples to L
  • Randomly choose 2p + 2n examples from U to
    replenish U′

The selected examples are the most confidently
labeled ones, i.e. heuristic selection.
The ratio n:p matches the ratio of negative to
positive examples.
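The loop above can be sketched as runnable Python. The `train` callback and the classifier interface `(label, confidence)` are illustrative assumptions; the defaults u=75, k=30, p=1, n=3 are the values Blum & Mitchell report.

```python
import random

def cotrain(L, U, train, u=75, k=30, p=1, n=3):
    """Sketch of the Blum & Mitchell co-training loop above.

    L     : list of ((x1, x2), label) seed examples
    U     : list of unlabeled (x1, x2) examples
    train : callback (examples, view) -> classifier, where a
            classifier maps one view of x to (label, confidence);
            this interface is an illustrative assumption
    """
    U = list(U)
    random.shuffle(U)
    pool, U = U[:u], U[u:]            # pool U' of u random examples
    for _ in range(k):
        h1 = train(L, 0)              # h1 sees only the x1 portion of x
        h2 = train(L, 1)              # h2 sees only the x2 portion of x
        for h, view in ((h1, 0), (h2, 1)):
            # Keep this view's p most confident positives and
            # n most confident negatives (heuristic selection).
            scored = sorted(pool, key=lambda x: h(x[view])[1],
                            reverse=True)
            picked = [x for x in scored if h(x[view])[0] == 1][:p] + \
                     [x for x in scored if h(x[view])[0] == 0][:n]
            for x in picked:
                L.append((x, h(x[view])[0]))   # self-labeled example
                pool.remove(x)
        # Replenish the pool with 2p + 2n fresh unlabeled examples.
        pool, U = pool + U[:2 * p + 2 * n], U[2 * p + 2 * n:]
    return train(L, 0), train(L, 1)
```

Note that the labeled set L grows only with self-labeled examples each view is most confident about, which is exactly the heuristic-selection point the notes emphasize.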
8
Family of Algorithms Related to Co-training
Nigam & Ghani, 2000
9
Parsing as Supertagging and Attaching (Sarkar,
2001)
  • The difference between parsing and other NLP
    applications (WSD, WBPC, TC, NEI):
  • A tree vs. A label
  • Composite vs. Monolithic
  • Large parameter space vs. Small
  • LTAG
  • Each word is tagged with a lexicalized elementary
    tree (supertagging)
  • Parsing is a process of substitution and
    adjoining of elementary trees
  • A supertagger finishes a very large part of the
    job a traditional parser must do

10
A Glimpse of Supertags
11
Two Models for Co-training
  • H1 selects trees based on previous context
    (tagging probability model)
  • H2 computes attachments between trees and
    returns the best parse (parsing probability
    model)

12
Sarkar 2000 Co-training Algorithm
  • 1. Input: labeled and unlabeled data
  • 2. Update cache:
  • Randomly select sentences from the unlabeled
    data and refill the cache
  • If the cache is empty, exit
  • 3. Train models H1 and H2 using the labeled data
  • 4. Apply H1 and H2 to the cache
  • 5. Pick the most probable n parses from H1 (run
    through H2) and add them to the labeled data
  • 6. Pick the most probable n parses from H2 and
    add them to the labeled data
  • 7. n = n + k; go to step 2
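The steps above can be sketched as a cache loop. The model interfaces are illustrative assumptions, and the step-5 detail of running H1's picks through H2 is omitted for brevity.

```python
import random

def sarkar_cotrain(labeled, unlabeled, train_h1, train_h2,
                   n=10, k=5, cache_size=500):
    """Sketch of the cache-based co-training loop above. train_h1 and
    train_h2 return models mapping a sentence to (parse, probability);
    these interfaces are illustrative assumptions."""
    unlabeled = list(unlabeled)
    while True:
        # Step 2: refill the cache with random unlabeled sentences.
        random.shuffle(unlabeled)
        cache, unlabeled = unlabeled[:cache_size], unlabeled[cache_size:]
        if not cache:
            break                        # cache empty: exit
        # Step 3: train both models on the current labeled set.
        h1, h2 = train_h1(labeled), train_h2(labeled)
        # Steps 4-6: parse the cache and add each model's n most
        # probable analyses to the labeled set.
        for h in (h1, h2):
            ranked = sorted(cache, key=lambda s: h(s)[1], reverse=True)
            labeled += [(s, h(s)[0]) for s in ranked[:n]]
        n += k                           # step 7: grow n, go to step 2
    return labeled
```

Growing n by k each round lets the models label more aggressively as they (presumably) improve.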

13
JHU SW2002 Tasks
  • Co-train Collins' CFG parser with Sarkar's LTAG
    parser
  • Co-train re-rankers
  • Co-train CCG supertaggers and parsers

14
Co-training: The Algorithm
  • Requires:
  • Two learners with different views of the task
  • Cache Manager (CM) to interface with the
    disparate learners
  • A small set of labeled seed data and a larger
    pool of unlabelled data
  • Pseudo-code:
  • Init: train both learners with the labeled seed
    data
  • Loop
  • CM picks unlabelled data to add to cache
  • Both learners label cache
  • CM selects newly labeled data to add to the
    learners' respective training sets
  • Learners re-train

15
Novel Methods: Parse Selection
  • Want to select training examples for one parser
    (the student) labeled by the other (the teacher)
    so as to minimize noise and maximize training
    utility.
  • Top-n: choose the n examples for which the
    teacher assigned the highest scores.
  • Difference: choose the examples for which the
    teacher assigned a higher score than the student
    by some threshold.
  • Intersection: choose the examples that received
    high scores from the teacher but low scores from
    the student.
  • Disagreement: choose the examples for which the
    two parsers provided different analyses and the
    teacher assigned a higher score than the student.
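The four selection heuristics can be sketched as a single function. The tuple layout and the cutoff values are illustrative assumptions, not the workshop's actual settings.

```python
def select(scored, method, n=10, threshold=0.2, hi=0.8, lo=0.2):
    """Sketch of the four parse-selection heuristics above.

    scored : list of (sentence, teacher_score, student_score,
             same_parse) tuples; the layout and cutoffs are
             illustrative assumptions.
    Returns the sentences chosen for the student's training set.
    """
    if method == "top-n":
        # the n examples the teacher scored highest
        ranked = sorted(scored, key=lambda e: e[1], reverse=True)
        return [s for s, t, u, _ in ranked[:n]]
    if method == "difference":
        # teacher's score exceeds the student's by some threshold
        return [s for s, t, u, _ in scored if t - u > threshold]
    if method == "intersection":
        # high teacher score but low student score
        return [s for s, t, u, _ in scored if t > hi and u < lo]
    if method == "disagreement":
        # different analyses, and the teacher scored higher
        return [s for s, t, u, same in scored if not same and t > u]
    raise ValueError("unknown method: " + method)
```

The common thread: every heuristic favors examples where the teacher is confident and the student stands to learn something new.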

16
Effect of Parse Selection
17
CFG-LTAG Co-training
18
Re-rankers Co-training
  • What is Re-ranking?
  • A re-ranker reorders the output of an n-best
    (probabilistic) parser based on features of the
    parse
  • While parsers use local features to make
    decisions, re-rankers use features that can span
    the entire tree
  • Instead of co-training parsers, co-train
    different re-rankers

19
Re-rankers Co-training
  • Motivation: why re-rankers?
  • Speed
  • parse the data once
  • reorder it many times
  • Objective function
  • The lower runtime of re-rankers allows us to
    explicitly maximize agreement between parses
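Because re-rankers only rescore fixed n-best lists, that agreement objective is cheap to evaluate directly. A minimal sketch, treating each re-ranker as a parse-scoring function (an illustrative assumption, not the workshop's actual interface):

```python
def agreement(reranker1, reranker2, nbest_lists):
    """Fraction of sentences on which two re-rankers pick the same
    parse from the n-best list -- the agreement objective the slide
    proposes to maximize. Each re-ranker is modeled as a function
    from a parse to a score (an illustrative assumption)."""
    same = sum(1 for parses in nbest_lists
               if max(parses, key=reranker1) == max(parses, key=reranker2))
    return same / len(nbest_lists)
```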

20
Re-rankers Co-training
  • Motivation: why re-rankers?
  • Accuracy
  • Re-rankers can improve performance of existing
    parsers
  • Collins '00 cites a 13 percent reduction in
    error rate from re-ranking
  • Task closer to classification
  • A re-ranker can be seen as a binary classifier:
    either a parse is the best for a sentence or it
    isn't
  • This is the original domain co-training was
    intended for

21
Re-rankers Co-training
  • Experimental, with much still to be explored.
    Remember: a re-ranker is easier to develop
  • Re-ranker 1: log-linear model
  • Re-ranker 2: linear perceptron model

Room for improvement: the current best parser
scores 89.7%; an oracle that picks the best parse
from the top 50 scores 95%.
22
JHU SW2002 Conclusion
  • Largest experimental study to date on the use of
    unlabelled data for improving parser performance.
  • Co-training enhances performance for parsers and
    taggers trained on small (500 to 10,000
    sentences) amounts of labeled data.
  • Co-training can be used for porting parsers
    trained on one genre to parse on another without
    any new human-labeled data at all, improving on
    state-of-the-art for this task.
  • Even tiny amounts of human-labelled data for the
    target genre enhance porting via co-training.
  • New methods for Parse Selection have been
    developed, and play a crucial role.

23
How to Improve Our Parser?
  • Similar setting: limited labeled data (Penn CTB)
    plus a large amount of unlabeled data from a
    somewhat different domain (PKU People's Daily)
  • To try:
  • The re-ranker development cycle is much shorter,
    so it is worth trying. Many ML techniques may be
    utilized.
  • Re-ranker agreement is still an open question

24
Thanks