Title: Semi-supervised learning and self-training
1. Semi-supervised learning and self-training
- LING 572
- Fei Xia
- 02/14/06
2. Outline
- Overview of Semi-supervised learning (SSL)
- Self-training (a.k.a. Bootstrapping)
3. Additional References
- Xiaojin Zhu (2006). Semi-supervised learning literature survey.
- Olivier Chapelle et al. (2005). Semi-Supervised Learning. The MIT Press.
4. Overview of SSL
5. What is SSL?
- Labeled data
  - Ex: POS tagging → tagged sentences
  - Creating labeled data is difficult, expensive, and/or time-consuming.
- Unlabeled data
  - Ex: POS tagging → untagged sentences
  - Obtaining unlabeled data is easier.
- Goal: use both labeled and unlabeled data to improve performance.
6. What is SSL? (cont)
- Learning
  - Supervised (labeled data only)
  - Semi-supervised (both labeled and unlabeled data)
  - Unsupervised (unlabeled data only)
- Problems
  - Classification
  - Regression
  - Clustering
- → Focus: the semi-supervised classification problem
7. A brief history of SSL
- The idea of self-training appeared in the 1960s.
- SSL took off in the 1970s.
- Interest in SSL increased in the 1990s, mostly due to applications in NLP.
8. Does SSL work?
- Yes, under certain conditions.
  - Problem itself: the knowledge of p(x) carries information that is useful in inferring p(y|x).
  - Algorithm: the modeling assumptions fit the problem structure well.
- SSL is most useful when there is far more unlabeled data than labeled data.
- SSL can degrade performance when mistakes reinforce themselves.
9. Illustration (Zhu, 2006)
10. Illustration (cont)
11. Assumptions
- Smoothness (continuity) assumption: if two points x1 and x2 in a high-density region are close, then so should be the corresponding outputs y1 and y2.
- Cluster assumption: if points are in the same cluster, they are likely to be of the same class.
- Low-density separation: the decision boundary should lie in a low-density region.
12. SSL algorithms
- Self-training
- Co-training
- Generative models
  - Ex: EM with generative mixture models
- Low-density separation
  - Ex: Transductive SVM
- Graph-based models
13. Which SSL method should we use?
- It depends.
- Semi-supervised methods make strong model
assumptions. Choose the ones whose assumptions
fit the problem structure.
14. Semi-supervised and active learning
- They address the same issue: labeled data are hard to get.
- Semi-supervised: choose the unlabeled data to be added to the labeled data.
- Active learning: choose the unlabeled data to be annotated.
15. Self-training
16. Basics of self-training
- Probably the earliest SSL idea.
- Also called self-teaching or bootstrapping.
- Appeared in the 1960s and 1970s.
- First well-known NLP paper: Yarowsky (1995)
17. Self-training algorithm
- Let L be the set of labeled data and U be the set of unlabeled data.
- Repeat:
  - Train a classifier h with training data L
  - Classify the data in U with h
  - Find the subset U' of U with the most confident scores
  - L ∪ U' → L
  - U − U' → U
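A minimal Python sketch of this loop follows. The scikit-learn-style classifier interface (fit / predict_proba / classes_), the fixed confidence threshold, and the iteration cap are illustrative assumptions, not part of the algorithm itself.

import numpy as np

def self_train(clf, L, U, threshold=0.95, max_iter=10):
    # L: list of (x, y) pairs; U: list of unlabeled feature vectors x.
    U = list(U)
    for _ in range(max_iter):
        X, y = zip(*L)
        clf.fit(list(X), list(y))             # train a classifier h on L
        if not U:
            break
        probs = clf.predict_proba(U)          # classify the data in U with h
        best = probs.argmax(axis=1)           # most likely label per example
        conf = probs.max(axis=1)
        keep = conf >= threshold              # the most confident subset U'
        if not keep.any():
            break                             # nothing confident enough: stop
        L = L + [(U[i], clf.classes_[best[i]])      # L ∪ U' → L
                 for i in np.flatnonzero(keep)]
        U = [U[i] for i in np.flatnonzero(~keep)]   # U − U' → U
    return clf, L, U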
18. Case study (Yarowsky, 1995)
19. Setting
- Task: WSD
  - Ex: "plant" → living vs. factory
- Unsupervised: just need a few seed collocations for each sense.
- Learner: decision list
20. Assumption 1: One sense per collocation
- Nearby words provide strong and consistent clues to the sense of a target word.
- The effect varies depending on the type of collocation; it is strongest for immediately adjacent collocations.
- Assumption 1 → use collocations in the decision rules (see the sketch below).
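As an illustration of what such decision rules can condition on, here is a small sketch of extracting collocational features for one occurrence of a target word; the window size k and the feature names are assumptions made for illustration.

def collocation_features(tokens, i, k=2):
    # Collocational features for the target word tokens[i]:
    # the immediately adjacent words plus words within a +/-k window.
    feats = []
    if i > 0:
        feats.append(("word-1", tokens[i - 1]))     # word to the left
    if i + 1 < len(tokens):
        feats.append(("word+1", tokens[i + 1]))     # word to the right
    for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
        if j != i:
            feats.append(("in-window", tokens[j]))  # word within +/-k words
    return feats

For example, collocation_features("the plant manufactures cars".split(), 1) yields the adjacent-word and window features for this occurrence of "plant".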
21. Assumption 2: One sense per discourse
- The sense of a target word is highly consistent within any given document.
- The assumption holds most of the time (99.8% in their experiment).
- Assumption 2 → filter and augment the addition of unlabeled data (see the sketch below).
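A simplified sketch of the filtering half of this step, assuming the tagged examples are grouped by document (the data layout is an assumption; the actual step both filters mismatching tags and augments coverage within a document):

from collections import Counter

def one_sense_per_discourse(labels_by_doc):
    # labels_by_doc: {doc_id: [sense, sense, ...]}
    # Relabel every tagged occurrence in a document with the
    # document's majority sense.
    out = {}
    for doc, senses in labels_by_doc.items():
        majority, _ = Counter(senses).most_common(1)[0]
        out[doc] = [majority] * len(senses)   # one sense per discourse
    return out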
22. Step 1: Identify all examples of the given word ("plant")
This yields our sample set S.
23. Step 2: Create initial labeled data using a small number of seed collocations
- Sense A seed: "life"; Sense B seed: "manufacturing"
- The seed-matching examples form our L(0); U(0) = S − L(0) is the residual data set (see the sketch below).
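A small sketch of this seeding step, under the assumption that each example is represented by its set of (name, word) features as above; examples matching the seeds of exactly one sense go into L(0), and the rest remain in U(0):

def seed_label(examples, seeds):
    # seeds: e.g. {"A": {"life"}, "B": {"manufacturing"}} (illustrative)
    L, U = [], []
    for feats in examples:
        words = {w for _, w in feats}          # collocate words in this example
        matched = [s for s, seed in seeds.items() if words & seed]
        if len(matched) == 1:
            L.append((feats, matched[0]))      # goes into L(0)
        else:
            U.append(feats)                    # residual set U(0)
    return L, U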
24. Initial labeled data
25. Step 3a: Train a DL classifier
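Yarowsky ranks decision-list rules by a smoothed log-likelihood ratio between the two senses of each feature. A minimal sketch, with the smoothing constant alpha and the two-sense data layout as assumptions:

import math
from collections import defaultdict

def train_decision_list(labeled, alpha=0.1):
    # labeled: list of (features, sense) pairs with sense in {"A", "B"}.
    counts = defaultdict(lambda: {"A": 0, "B": 0})
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, c in counts.items():
        # smoothed log-likelihood ratio log P(A|f) / P(B|f)
        llr = math.log((c["A"] + alpha) / (c["B"] + alpha))
        rules.append((abs(llr), f, "A" if llr > 0 else "B"))
    rules.sort(reverse=True)                   # strongest evidence first
    return rules

def classify(rules, feats):
    # Apply the first (highest-ranked) matching rule.
    fs = set(feats)
    for strength, f, sense in rules:
        if f in fs:
            return sense, strength
    return None, 0.0                           # no rule matched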
26. Step 3b: Apply the classifier to the entire set
- Add to L the members of U that are tagged with probability above a threshold.
27. Step 3c: Filter and augment this addition with Assumption 2
28. Repeat Step 3 until convergence
29. The final DL classifier
30. The original algorithm
Keep initial labeling unchanged.
31. Options for obtaining the seeds
- Use words in dictionary definitions
- Use a single defining collocate for each sense
  - Ex: "bird" and "machine" for the word "crane"
- Label salient corpus collocates:
  - Collect frequent collocates
  - Manually check the collocates
- → Getting the seeds is not hard.
32. Performance
- Baseline: 63.9%
- Supervised learning: 96.1%
- Unsupervised learning: 90.6% to 96.5%
33. Why does it work?
- See Steven Abney (2004), "Understanding the Yarowsky Algorithm".
34. Summary of self-training
- The algorithm is straightforward and intuitive.
- It produces outstanding results.
- However, the added unlabeled data can pollute the original labeled data.
35. Additional slides
36. Papers on self-training
- Yarowsky (1995): WSD
- Riloff et al. (2003): identify subjective nouns
- Maeireizo et al. (2004): classify dialogues as emotional or non-emotional