Title: Transfer Learning with Applications to Text Classification
1. Transfer Learning with Applications to Text Classification
- Jing Peng
- Computer Science Department
2. Machine Learning
- Machine learning: the study of algorithms that improve performance P on some task T using experience E
- A well-defined learning task: <P, T, E>
3. Learning to recognize targets in images
4. Learning to classify text documents
5. Learning to build forecasting models
6. Growth of Machine Learning
- Machine learning is the preferred approach to:
- Speech processing
- Computer vision
- Medical diagnosis
- Robot control
- News article processing
- This machine learning niche is growing because of:
- Improved machine learning algorithms
- Lots of available data
- Software too complex to code by hand
7. Learning
- Given: labeled training examples
- Least squares methods
- Learning focuses on minimizing the approximation error over a hypothesis space H (a minimal sketch follows below)
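A minimal sketch of the least-squares idea on this slide, assuming a linear hypothesis class; the feature matrix X and labels y below are synthetic placeholders, not data from the talk.

    import numpy as np

    # Synthetic data: 100 examples, 5 features (placeholder values).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ true_w + 0.1 * rng.normal(size=100)

    # Least squares: pick the hypothesis h(x) = w.x in H that minimizes
    # the squared approximation error ||Xw - y||^2.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("estimated weights:", np.round(w, 2))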
8. Transfer Learning with Applications to Text Classification
- Main challenge: transfer learning with
- High dimensional data (more than 4,000 features)
- Overlapping feature sets (fewer than 80 features are shared)
- Solution with performance bounds
9. Standard Supervised Learning
Diagram: labeled training data and unlabeled test data both come from the New York Times; the classifier achieves 85.5.
10. In Reality
Diagram: labeled training data come from Reuters, while the unlabeled test data come from the New York Times; the classifier achieves only 64.1. Labeled New York Times data are not available!
11. Domain Difference → Performance Drop
- Ideal setting: train on NYT, test on NYT (New York Times) → 85.5
- Realistic setting: train on Reuters, test on NYT → 64.1
12. High Dimensional Data Transfer
- High dimensional data
- Text categorization
- Image classification
- The number of features in our experiments is more than 4,000
- Challenges
- High dimensionality: more features than training examples, so Euclidean distance becomes meaningless
13. Why Dimension Reduction?
Figure: the maximum pairwise distance DMAX and the minimum pairwise distance DMIN.
14-15. Curse of Dimensionality
Figures: how the distances behave as the number of dimensions grows (a small simulation follows below).
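A small simulation (using NumPy and SciPy) of the effect these figures illustrate, assuming points drawn uniformly at random: as the dimensionality grows, the largest and smallest pairwise Euclidean distances become nearly indistinguishable, which is why plain Euclidean distance loses its discriminative power.

    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    for dim in (2, 10, 100, 1000, 4000):
        points = rng.uniform(size=(200, dim))
        dists = pdist(points)  # all pairwise Euclidean distances
        print(f"dim={dim:5d}  DMAX/DMIN = {dists.max() / dists.min():.2f}")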
16. High Dimensional Data Transfer
- High dimensional data
- Text categorization
- Image classification
- The number of features in our experiments is more than 4,000
- Challenges
- High dimensionality: more features than training examples, so Euclidean distance becomes meaningless
- Are the feature sets completely overlapping? No: fewer than 80 features are the same
- Are the marginal distributions closely related? Not necessarily, which makes it harder to find transferable structures
- A proper similarity definition is needed
17. PAC (Probably Approximately Correct) Learning Requirement
- Training and test distributions must be the same (the standard guarantee is recalled below)
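For reference, a standard textbook statement of the PAC guarantee this requirement rests on (not taken from the slides): for every target concept and every distribution $\mathcal{D}$, given enough i.i.d. training examples drawn from $\mathcal{D}$, the learner outputs a hypothesis $h$ with

    \Pr\big[\operatorname{err}_{\mathcal{D}}(h) \le \varepsilon\big] \ge 1 - \delta,
    \qquad \operatorname{err}_{\mathcal{D}}(h) = \Pr_{x \sim \mathcal{D}}\big[h(x) \ne c(x)\big],

where the error is measured under the same distribution $\mathcal{D}$ that generated the training data. It is exactly this same-distribution assumption that fails in the transfer setting.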
18. Transfer between high dimensional overlapping distributions
- Overlapping distributions
- Data from the two domains may not lie in exactly the same space, but at most in an overlapping one
19-22. Transfer between high dimensional overlapping distributions
Data from the two domains may not lie in exactly the same space, but at most in an overlapping one.

         x       y       z      label
    A    ?       1       0.2      1
    B    0.09    ?       0.1      1
    C    0.01    ?       0.3     -1
23-26. Transfer between high dimensional overlapping distributions
- Problems with overlapping distributions
- Overlapping features alone may not provide sufficient predictive power

         f1      f2      f3     label
    A    ?       1       0.2      1
    B    0.09    ?       0.1      1
    C    0.01    ?       0.3     -1

Hard to predict correctly: using only the shared feature f3, A is equally far from B and C (|0.2 - 0.1| = |0.2 - 0.3| = 0.1), so its label cannot be decided.
27-29. Transfer between high dimensional overlapping distributions
- Overlapping distributions
- Use the union of all features and fill in the missing values with zeros?
- Does it help?

         f1      f2      f3     label
    A    0       1       0.2      1
    B    0.09    0       0.1      1
    C    0.01    0       0.3     -1
30. Transfer between high dimensional overlapping distributions
31-32. Transfer between high dimensional overlapping distributions
D^2(A, B) = 0.0181 > D^2(A, C) = 0.0101
A is misclassified into the class of C instead of the class of B (a numeric check follows below).
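A quick numeric check of this zero-fill example, a sketch assuming plain squared Euclidean distance on the vectors above. The zero-filled non-overlapping feature f2 adds the same dominant term to both distances (the point of the next slide), and the ordering still puts A closer to C than to B.

    import numpy as np

    # Zero-filled feature vectors from the table above (f1, f2, f3).
    A = np.array([0.00, 1.0, 0.2])   # true f1 unknown, filled with 0
    B = np.array([0.09, 0.0, 0.1])   # true f2 unknown, filled with 0
    C = np.array([0.01, 0.0, 0.3])   # true f2 unknown, filled with 0

    def sq_dist(u, v):
        """Squared Euclidean distance."""
        return float(np.sum((u - v) ** 2))

    print("D^2(A, B) =", round(sq_dist(A, B), 4))  # 1.0181
    print("D^2(A, C) =", round(sq_dist(A, C), 4))  # 1.0101
    # A ends up closer to C (label -1) than to B (label 1), so a
    # nearest-neighbour rule misclassifies A.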
33. Transfer between high dimensional overlapping distributions
- When one uses the union of overlapping and non-overlapping features and replaces missing values with zeros:
- the distance between the two marginal distributions p(x) can become asymptotically very large as a function of the non-overlapping features
- the non-overlapping features become the dominant factor in the similarity measure
34. Transfer between high dimensional overlapping distributions
- High dimensionality can obscure important features
35-36. Transfer between high dimensional overlapping distributions
Figure: the blue points are closer to the green points than to the red points.
37. LatentMap: a two-step correction
- Step 1: Missing value regression
- Brings the marginal distributions closer
- Step 2: Latent space dimensionality reduction
- Further brings the marginal distributions closer
- Ignores unimportant features and those that are noisy or carry imputation errors
- Identifies transferable substructures across the two domains
38-44. Missing Value Regression
- Predict the missing values (recall the previous example)
- Step 1: project onto the overlapping feature z
- Step 2: map from z back to x, using a relationship found by regression
With the imputed values, D(img(A), B) = 0.0109 < D(img(A), C) = 0.0125, so A is correctly classified into the same class as B (a sketch of the regression step follows below).
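A minimal sketch of this imputation step, assuming an ordinary linear regression from the shared feature z to the missing feature x, fitted on the domain where x is observed. The toy numbers come from the table above; the sketch does not try to reproduce the slide's exact distance values.

    import numpy as np

    # Columns: x, y, z; z is the feature shared by both domains.
    # np.nan marks a value the domain never observes.
    A = np.array([np.nan, 1.0, 0.2])   # target-domain point, x missing
    B = np.array([0.09, np.nan, 0.1])  # source-domain point, y missing
    C = np.array([0.01, np.nan, 0.3])  # source-domain point, y missing

    # Step 1: project onto the shared feature z.
    # Step 2: regress x on z where x is observed (points B and C),
    # then use the fitted line to fill in A's missing x.
    a, b = np.polyfit([B[2], C[2]], [B[0], C[0]], deg=1)
    A_imputed = A.copy()
    A_imputed[0] = a * A[2] + b

    print("imputed A:", np.round(A_imputed, 3))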
45-49. Dimensionality Reduction
Diagram: the word-vector matrix, built from the overlapping features together with the filled-in missing values.
50-52. Dimensionality Reduction
- Project the word-vector matrix onto its most important, inherent sub-space
- This yields a low dimensional representation (a sketch follows below)
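A sketch of this projection using a truncated SVD, assuming the filled-in word-vector matrix is stacked with documents as rows; the matrix X and the target dimensionality k below are placeholders for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    # Placeholder word-vector matrix: rows are documents from both
    # domains, columns are the union of features (missing values filled).
    X = rng.random((300, 4000))

    # Truncated SVD: keep the top-k singular directions and project the
    # documents into that low dimensional latent sub-space.
    k = 50
    U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    X_low = U[:, :k] * s[:k]  # low dimensional representation

    print("original shape:", X.shape)    # (300, 4000)
    print("reduced shape:", X_low.shape)  # (300, 50)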
53-55. Solution (high dimensionality)
- Recall the previous example
Before the correction, the blue points are closer to the green points than to the red points.
56-57. Solution (high dimensionality)
After the correction, the blue points are closer to the red points than to the green points.
58. Properties
- It brings the marginal distributions of the two domains closer
- Marginal distributions are brought closer in the high dimensional space (Section 3.2)
- The distance between the two marginal distributions is further reduced in the low dimensional space (Theorem 3.2)
- It brings the two domains' conditional distributions closer
- Nearby instances from the two domains have similar conditional distributions (Section 3.3)
- It can reduce the domain-transfer risk
- The risk of the nearest neighbor classifier can be bounded in the transfer learning setting (Theorem 3.3)
59-62. Experiment (I)
- Data sets
- 20 Newsgroups: 20,000 newsgroup articles
- SRAA (simulated/real auto/aviation): 73,128 articles from 4 discussion groups (simulated auto racing, simulated aviation, real autos, and real aviation)
- Reuters: 21,758 Reuters news articles (1987)
- Baseline methods
- naïve Bayes, logistic regression, SVMs
- Knn-Reg: missing values filled, but no SVD
- pLatentMap: SVD, but missing values left as 0
The last two baselines try to justify the two steps in our framework.
63. Learning Tasks
64. Experiment (II)
Overall performance: 10 wins, 1 loss.
65. Experiment (III)
66. Conclusion
- Problem: high dimensional, overlapping domain transfer
- Text and image categorization
- Step 1: fill in the missing values
- Brings the two domains' marginal distributions closer
- Step 2: SVD dimension reduction
- Further brings the two marginal distributions closer (Theorem 3.2)
- Clusters points from the two domains, making the conditional distribution transferable (Theorem 3.3)