Title: An introduction to self-taught learning
1 An introduction to self-taught learning
Raina et al., 2007, Self-taught Learning: Transfer Learning from Unlabeled Data
- Presented by Zenglin Xu
- 10-09-2007
2 Outline
- Related learning paradigms
- A self-taught learning algorithm
3 Related learning paradigms
- Semi-supervised learning
- Transfer learning
- Multi-task learning
- Domain adaptation
- Biased sample selection
- Self-taught learning
4 Semi-supervised learning
- In addition to the labeled training data, a large set of unlabeled test data is available
- The training data and test data are drawn from the same distribution
- The unlabeled data can be assigned the class labels of the supervised learning task (a self-training sketch follows this slide)
- Reference
- Chapelle et al., 2006, Semi-Supervised Learning
- Zhu, 2005, Semi-Supervised Learning Literature Survey
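
A minimal self-training sketch of this setting, using scikit-learn's SelfTrainingClassifier; the toy data, threshold, and base classifier here are assumptions for illustration only:

# Semi-supervised sketch: unlabeled points are marked with label -1 and a
# base classifier is iteratively retrained on its own confident predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

y_semi = y.copy()
y_semi[50:] = -1  # pretend only the first 50 examples are labeled

clf = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
clf.fit(X, y_semi)
print(clf.predict(X[:5]))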
5 Transfer learning
- The theory of transfer of learning was introduced by Thorndike and Woodworth (1901). They explored how individuals transfer learning from one context to another context that shares similar characteristics
- Transfer of knowledge from one supervised task to another; it requires labeled data from a different but related task
- E.g., transferring knowledge from Newsgroups data to Reuters data
- Related work in computer science
- Thrun and Mitchell, 1995, Learning one more thing
- Ando and Zhang, 2005, A framework for learning predictive structures from multiple tasks and unlabeled data
6 Multi-task learning
- Learns a problem together with other related problems at the same time, using a shared representation
- This often leads to a better model for the main task, because it allows the learner to use the commonality among the tasks
- Multi-task learning is a kind of inductive transfer: by learning tasks in parallel with a shared representation, what is learned for each task can help the other tasks be learned better (a minimal sketch follows this slide)
- Reference
- Caruana, 1997, Multitask Learning
- Ben-David and Schuller, 2003, Exploiting task relatedness for multiple task learning
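
One concrete instantiation of a shared representation is joint feature selection across tasks, e.g. scikit-learn's MultiTaskLasso; the toy data below is an assumption for illustration:

# Multi-task sketch: MultiTaskLasso fits several regression tasks jointly,
# forcing them to select the same features -- a simple shared representation.
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
W = np.zeros((20, 3))                    # 3 related tasks
W[:5, :] = rng.normal(size=(5, 3))       # all tasks depend on the same 5 features
Y = X @ W + 0.1 * rng.normal(size=(100, 3))

model = MultiTaskLasso(alpha=0.1).fit(X, Y)
print((model.coef_ != 0).sum(axis=1))    # nonzero features, identical across tasks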
7 Domain adaptation
- A term currently popular in natural language processing
- Indeed, it can be viewed as a form of transfer learning
- The supervised setting usually involves:
- A large pool of out-of-domain labeled data
- A small pool of in-domain labeled data (a feature-augmentation sketch follows this slide)
- Reference
- Daume III, 2007, Frustratingly Easy Domain Adaptation
- Daume III and Marcu, 2006, Domain Adaptation for Statistical Classifiers
- Ben-David et al., 2006, Analysis of Representations for Domain Adaptation
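
The "frustratingly easy" approach can be illustrated by feature augmentation: each input is copied into a shared block plus a domain-specific block, and any standard classifier is then trained on the augmented features. A rough sketch, with toy pools and shapes that are assumptions:

# Feature augmentation in the spirit of Daume III (2007):
# phi(x) = [x, x, 0] for out-of-domain points, [x, 0, x] for in-domain points.
import numpy as np

def augment(X, domain):
    zeros = np.zeros_like(X)
    return np.hstack([X, X, zeros]) if domain == "source" else np.hstack([X, zeros, X])

X_src = np.random.randn(500, 10)         # assumed large out-of-domain pool
y_src = np.random.randint(0, 2, 500)
X_tgt = np.random.randn(30, 10)          # assumed small in-domain pool
y_tgt = np.random.randint(0, 2, 30)

X_all = np.vstack([augment(X_src, "source"), augment(X_tgt, "target")])
y_all = np.concatenate([y_src, y_tgt])
# X_all / y_all can now be fed to an ordinary classifier, e.g. a linear SVM.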
8 Biased sample selection
- Also called covariate shift
- It deals with the case where the training data and test data are drawn from different distributions over the same domain
- The objective is to correct the bias (an importance-weighting sketch follows this slide)
- Reference
- Shimodaira, 2000, Improving predictive inference under covariate shift
- Zadrozny, 2004, Learning and evaluating classifiers under sample selection bias
- Bickel et al., 2007, Discriminative learning for differing training and test distributions
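
A common way to correct the bias is importance weighting: estimate how test-like each training input is and reweight the training loss by the density ratio. A rough sketch, loosely in the spirit of the discriminative approach of Bickel et al.; the logistic-regression ratio estimate and toy data are assumptions:

# Covariate-shift sketch: estimate w(x) ~ p_test(x) / p_train(x) with a
# domain classifier, then pass the weights to the task classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, size=(300, 5))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(loc=0.5, size=(300, 5))     # shifted inputs, same labeling rule

X_dom = np.vstack([X_train, X_test])            # domain classifier: 0 = train, 1 = test
y_dom = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
dom = LogisticRegression().fit(X_dom, y_dom)

p = dom.predict_proba(X_train)[:, 1]
weights = p / (1 - p)                           # density-ratio estimate

clf = LogisticRegression().fit(X_train, y_train, sample_weight=weights)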
9 Self-taught learning
- Self-taught learning
- Uses unlabeled data
- Does not require the unlabeled data to come from the same generative distribution as the labeled data
- The unlabeled data may have labels different from those of the supervised learning task
- Reference
- Raina et al., 2007, Self-taught Learning: Transfer Learning from Unlabeled Data
11 Outline
- Related learning paradigms
- A self-taught learning algorithm
- Algorithm
- Experiment
12 Sparse coding: a self-taught learning algorithm
- Learn a high-level feature representation using unlabeled data
- E.g., random unlabeled images usually contain basic visual patterns (such as edges) that are similar to those in the images to be classified (such as images of elephants)
- Apply the representation to the labeled data and use it for classification
13 Step 1: learning higher-level representations
Given unlabeled data $x_u^{(1)}, \dots, x_u^{(k)}$, optimize the following:

$$\min_{b,\,a} \;\; \sum_{i=1}^{k} \Bigl\| x_u^{(i)} - \sum_{j} a_j^{(i)} b_j \Bigr\|_2^2 + \beta \bigl\| a^{(i)} \bigr\|_1
\quad \text{subject to} \quad \| b_j \|_2 \le 1, \;\; \forall j$$

where $b_1, \dots, b_s$ are the basis vectors and $a^{(1)}, \dots, a^{(k)}$ are the activations.
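
A minimal sketch of this step, using scikit-learn's mini-batch dictionary learning as a stand-in for the paper's sparse-coding solver; the random patch data and all parameter values are assumptions:

# Step 1 sketch: learn basis vectors b_j from unlabeled patches by solving an
# L1-regularized reconstruction problem; dictionary learning plays the role of
# the sparse-coding optimization above.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

X_unlabeled = np.random.randn(10000, 196)   # assumed 14x14 unlabeled patches
dico = MiniBatchDictionaryLearning(
    n_components=512,                       # number of bases, matching the 512-basis figure
    alpha=1.0,                              # sparsity penalty (beta above)
    transform_algorithm="lasso_lars",
)
dico.fit(X_unlabeled)
bases = dico.components_                    # rows are the learned basis vectors b_j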
14 Bases learned from image patches and speech data
15 Step 2: apply the representation to the labeled data and use it for classification
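
Continuing the same sketch, Step 2 encodes each labeled example as its sparse activations over the fixed bases and trains an ordinary classifier on those activations; the bases, labeled data, and the linear-SVM choice below are assumptions:

# Step 2 sketch: with the bases fixed, the sparse activations a(x) become the
# features of the labeled examples, fed to a standard classifier.
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

bases = np.random.randn(512, 196)           # stands in for the bases learned in Step 1
bases /= np.linalg.norm(bases, axis=1, keepdims=True)

X_labeled = np.random.randn(200, 196)       # assumed labeled examples
y_labeled = np.random.randint(0, 2, 200)    # assumed binary labels

coder = SparseCoder(dictionary=bases,
                    transform_algorithm="lasso_lars",
                    transform_alpha=1.0)
features = coder.transform(X_labeled)       # activations a(x) over the bases
clf = LinearSVC().fit(features, y_labeled)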
16 High-level features computed
Using a set of 512 learned image bases (Fig. 2, left), Figure 3 illustrates a solution to the previous optimization problem
17 High-level features computed
18 High-level features computed
20 Connection to PCA
21 Connection to PCA
- PCA results in linear feature extraction, in that the features a_j^(i) are simply a linear function of the input
- The bases b_j must be orthogonal, so the number of PCA features cannot exceed the dimension n of the input
- Sparse coding has neither of these limitations (a short restatement in symbols follows this slide)
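
In symbols, using the notation of Step 1, where $B = [b_1, \dots, b_s]$ and $s$ is the number of bases (a brief restatement of the contrast, added for clarity):

$$\text{PCA:}\quad \hat{a}(x) = B^{\top} x, \qquad B^{\top} B = I, \qquad s \le n$$

$$\text{Sparse coding:}\quad \hat{a}(x) = \arg\min_{a} \Bigl\| x - \sum_{j} a_j b_j \Bigr\|_2^2 + \beta \| a \|_1, \qquad s \text{ may exceed } n$$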
22 Outline
- Related learning paradigms
- A self-taught learning algorithm
- Algorithm
- Experiment
23 Experiment setting
24 Experiment setting
25 Experimental results on images
26 Experimental results on characters
27 Experimental results on music data
28 Experimental results on text data
29 Comparison with results using features trained on labeled data
Table 7. Accuracy on the self-taught learning tasks when sparse coding bases are learned on unlabeled data (third column), or when principal components / sparse coding bases are learned on the labeled training set (fourth/fifth column).
30 Discussion
- Is it useful to learn a high-level feature representation in a unified process using both the labeled data and the unlabeled data?
- How does the similarity between the labeled data and the unlabeled data affect the performance?
- And more?