1
Semi-supervised learning and self-training
  • LING 572
  • Fei Xia
  • 02/14/06

2
Outline
  • Overview of Semi-supervised learning (SSL)
  • Self-training (a.k.a. Bootstrapping)

3
Additional Reference
  • Xiaojin Zhu (2006) Semi-supervised learning
    literature survey.
  • Olivier Chapelle et al. (2005) Semi-supervised
    Learning. The MIT Press.

4
Overview of SSL
5
What is SSL?
  • Labeled data
  • Ex: in POS tagging, tagged sentences.
  • Creating labeled data is difficult, expensive,
    and/or time-consuming.
  • Unlabeled data
  • Ex: in POS tagging, untagged sentences.
  • Obtaining unlabeled data is easier.
  • Goal: use both labeled and unlabeled data to
    improve performance.

6
  • Learning
  • Supervised (labeled data only)
  • Semi-supervised (both labeled and unlabeled data)
  • Unsupervised (unlabeled data only)
  • Problems
  • Classification
  • Regression
  • Clustering
  • → Focus on the semi-supervised classification problem.

7
A brief history of SSL
  • The idea of self-training appeared in the 1960s.
  • SSL took off in the 1970s.
  • Interest in SSL increased in the 1990s, mostly
    due to applications in NLP.

8
Does SSL work?
  • Yes, under certain conditions.
  • Problem: knowledge of p(x) carries information
    that is useful for inferring p(y|x).
  • Algorithm: the modeling assumptions fit the
    problem structure well.
  • SSL will be most useful when there is far more
    unlabeled data than labeled data.
  • SSL can degrade performance when mistakes
    reinforce themselves.

9
Illustration (Zhu, 2006)
10
Illustration (cont)
11
Assumptions
  • Smoothness (continuity) assumption: if two points
    x1 and x2 in a high-density region are close,
    then the corresponding outputs y1 and y2 should
    be close as well.
  • Cluster assumption: if points are in the same
    cluster, they are likely to be of the same class.
  • Low-density separation: the decision boundary
    should lie in a low-density region.

12
SSL algorithms
  • Self-training
  • Co-training
  • Generative models
  • Ex: EM with generative mixture models
  • Low-density separation
  • Ex: Transductive SVM
  • Graph-based models

13
Which SSL method should we use?
  • It depends.
  • Semi-supervised methods make strong model
    assumptions. Choose the ones whose assumptions
    fit the problem structure.

14
Semi-supervised and active learning
  • They address the same issue: labeled data are
    hard to get.
  • Semi-supervised learning: the learner chooses which
    unlabeled data to add (with automatically assigned
    labels) to the labeled data.
  • Active learning: the learner chooses which unlabeled
    data to have annotated by a human.

15
Self-training
16
Basics of self-training
  • Probably the earliest SSL idea.
  • Also called self-teaching or bootstrapping.
  • Appeared in the 1960s and 1970s.
  • First well-known NLP paper: (Yarowsky, 1995)

17
Self-training algorithm
  • Let L be the set of labeled data and U the set of
    unlabeled data.
  • Repeat:
  • Train a classifier h on the training data L.
  • Classify the data in U with h.
  • Find the subset U' of U with the most confident
    scores.
  • L = L ∪ U'
  • U = U - U'
  • (A sketch of this loop in code is given below.)

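A minimal Python sketch of this loop, assuming a scikit-learn classifier,
a fixed 0.95 confidence threshold, and NumPy-array inputs (none of these
choices come from the slides):

```python
# Illustrative self-training loop; classifier choice, threshold, and data
# format are assumptions made for this sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95, max_iter=20):
    L_X, L_y = np.asarray(X_labeled), np.asarray(y_labeled)
    U = np.asarray(X_unlabeled)
    h = LogisticRegression(max_iter=1000)
    for _ in range(max_iter):
        h.fit(L_X, L_y)                          # train a classifier h on L
        if len(U) == 0:
            break
        probs = h.predict_proba(U)               # classify the data in U with h
        picked = probs.max(axis=1) >= threshold  # most confident subset U'
        if not picked.any():
            break
        new_y = h.classes_[probs[picked].argmax(axis=1)]
        L_X = np.vstack([L_X, U[picked]])        # L = L ∪ U'
        L_y = np.concatenate([L_y, new_y])
        U = U[~picked]                           # U = U - U'
    return h.fit(L_X, L_y)                       # final classifier trained on the grown L
```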
18
Case study (Yarowsky, 1995)
19
Setting
  • Task: word sense disambiguation (WSD)
  • Ex: "plant" as a living thing vs. a factory
  • Unsupervised: only a few seed collocations are
    needed for each sense.
  • Learner: decision list (a sketch follows below)

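A minimal sketch of a decision-list learner in the spirit of Yarowsky's
classifier; the smoothing constant, the feature representation (a set of
collocation strings per example), and the two-sense restriction are
assumptions made for illustration:

```python
# Illustrative decision list for two senses "A" and "B"; smoothing (alpha)
# and the feature format are assumptions, not from the slides.
import math
from collections import defaultdict

def train_decision_list(examples, alpha=0.1):
    """examples: iterable of (collocations, sense) pairs, sense in {"A", "B"}."""
    counts = defaultdict(lambda: {"A": 0.0, "B": 0.0})
    for collocations, sense in examples:
        for c in collocations:
            counts[c][sense] += 1
    rules = []
    for c, n in counts.items():
        # log-likelihood ratio of the two senses given this collocation
        llr = math.log((n["A"] + alpha) / (n["B"] + alpha))
        rules.append((abs(llr), c, "A" if llr > 0 else "B"))
    rules.sort(reverse=True)                 # strongest evidence first
    return rules

def classify(rules, collocations, default="A"):
    present = set(collocations)
    for _, c, sense in rules:                # first matching rule wins
        if c in present:
            return sense
    return default
```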
20
Assumption 1: One sense per collocation
  • Nearby words provide strong and consistent clues
    to the sense of a target word.
  • The effect varies depending on the type of
    collocation:
  • It is strongest for immediately adjacent
    collocations.
  • Assumption 1 → use collocations in the decision
    rules.

21
Assumption 2: One sense per discourse
  • The sense of a target word is highly consistent
    within any given document.
  • The assumption holds most of the time (99.8% in
    their experiment).
  • Assumption 2 → filter and augment the additions
    of unlabeled data.

22
Step 1: Identify all examples of the given word
("plant"). These form our sample set S.
23
Step 2: Create initial labeled data using a small
number of seed collocations.
  • Sense A ("life") and Sense B ("manufacturing")
    seed collocations give our initial labeled set L(0).
  • U(0) = S - L(0) is the residual (still unlabeled)
    data set.
  • (A sketch of this step is given below.)
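A small sketch of this seeding step; the seed words are the ones named on
the slide, while the data format (each example as a set of nearby words)
is an assumption:

```python
# Illustrative seeding step: label examples whose context contains a seed
# collocation for exactly one sense; everything else goes to U(0).
seeds = {"A": {"life"}, "B": {"manufacturing"}}

def initial_labeling(S, seeds):
    """S: list of contexts, each a set of words near one occurrence of 'plant'."""
    L0, U0 = [], []
    for context in S:
        matched = [sense for sense, words in seeds.items() if context & words]
        if len(matched) == 1:      # unambiguous seed match -> labeled
            L0.append((context, matched[0]))
        else:                      # no match (or conflicting matches) -> residual
            U0.append(context)
    return L0, U0                  # L(0) and U(0) = S - L(0)
```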
24
Initial labeled data
25
Step 3a: Train a DL (decision-list) classifier on L.
26
Step 3b: Apply the classifier to the entire sample set.
  • Add to L the members of U that are tagged with a
    probability above a threshold (see the sketch below).
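A sketch of this thresholded augmentation using the decision-list rules
sketched earlier; the rule's log-likelihood ratio is used as the confidence
score, and the threshold value is an assumption:

```python
# Illustrative Step 3b: move examples whose strongest matching rule exceeds
# an LLR threshold from U into L; the threshold value (2.0) is an assumption.
def augment(rules, L, U, threshold=2.0):
    """L: list of (collocations, sense); U: list of collocation sets."""
    new_L, new_U = list(L), []
    for collocations in U:
        present = set(collocations)
        hit = next(((llr, c, s) for llr, c, s in rules if c in present), None)
        if hit is not None and hit[0] >= threshold:   # confident enough: add to L
            new_L.append((collocations, hit[2]))
        else:
            new_U.append(collocations)                # stays in the residual set
    return new_L, new_U
```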
27
Step 3c: Filter and augment this addition using
Assumption 2 (one sense per discourse). A sketch of
this filter follows below.
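A sketch of the one-sense-per-discourse filter; the document-to-example
mapping and the 0.75 majority threshold are assumptions made for
illustration:

```python
# Illustrative one-sense-per-discourse filter: if one sense clearly dominates
# a document, extend that sense to all examples in the document.
from collections import Counter

def discourse_filter(labels, documents, min_ratio=0.75):
    """labels: dict example_id -> sense; documents: dict doc_id -> list of example_ids."""
    out = dict(labels)
    for doc_id, example_ids in documents.items():
        senses = [labels[e] for e in example_ids if e in labels]
        if not senses:
            continue
        sense, count = Counter(senses).most_common(1)[0]
        if count / len(senses) >= min_ratio:
            for e in example_ids:          # augment: label every example in the discourse
                out[e] = sense
    return out
```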
28
Repeat step 3 until convergence.
29
The final DL classifier
30
The original algorithm
Keep initial labeling unchanged.
31
Options for obtaining the seeds
  • Use words in dictionary definitions
  • Use a single defining collocate for each sense
  • Ex: "bird" and "machine" for the word "crane"
  • Label salient corpus collocates
  • Collect frequent collocates
  • Manually check the collocates
  • → Getting the seeds is not hard.

32
Performance
  • Baseline: 63.9%
  • Supervised learning: 96.1%
  • Unsupervised (this algorithm): 90.6% - 96.5%
33
Why does it work?
  • (Steven Abney, 2004) Understanding the Yarowsky
    Algorithm

34
Summary of self-training
  • The algorithm is straightforward and intuitive.
  • It produces outstanding results.
  • However, added unlabeled data can pollute the
    original labeled data (mistakes can reinforce
    themselves).

35
Additional slides
36
Papers on self-training
  • Yarowsky (1995): WSD
  • Riloff et al. (2003): identifying subjective nouns
  • Maeireizo et al. (2004): classifying dialogues as
    emotional or non-emotional