Title: Semi-Supervised Learning Using Label Mean
1. Semi-Supervised Learning Using Label Mean
- Yu-Feng Li (1), James T. Kwok (2), Zhi-Hua Zhou (1)
- (1) LAMDA Group, Nanjing University, China; {liyf, zhouzh}@lamda.nju.edu.cn
- (2) Dept. of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong; jamesk@cse.ust.hk
2. The Problem
Many SVM algorithms for supervised learning are efficient, but existing S3VMs (Semi-Supervised SVMs) are not.
What is the major obstacle to designing efficient S3VMs, and how can we design one?
3. Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
4. Introduction: Semi-Supervised Learning (SSL)
[Figure: optimal hyperplane]
The goal of SSL is to improve the performance of supervised learning by utilizing unlabeled data.
5. Introduction: SSL Applications
- Text categorization [Joachims, ICML99]
- Hand-written digit classification [Zhu et al., ICML03; Zhu et al., ICML05]
- Medical image segmentation [Grady & Funka-Lea, ECCV04]
- Image retrieval [He et al., ACM Multimedia04]
- Word sense disambiguation [Niu et al., ACL04; Yarowsky, ACL95; Cuong, Thesis07]
- Object detection [Rosenberg et al., WACV05]
- ...
6. Introduction: Many SSL Algorithms
- Generative methods [Miller & Uyar, NIPS96; Nigam et al., MLJ00; Fujino et al., AAAI05; etc.]
- Disagreement-based methods [Blum & Mitchell, COLT98; Mitchell, ICCS99; Nigam & Ghani, CIKM00; Zhou & Li, TKDE05]
- Graph-based methods [Zhou et al., NIPS02; Zhu et al., ICML03; Belkin et al., JMLR06]
- ...
- Recent surveys of the SSL literature
- Chapelle et al., eds., Semi-Supervised Learning, MIT Press, 2006
- Zhu, Semi-Supervised Learning Literature Survey, 2007
- Zhou & Li, Semi-supervised learning by disagreement, KAIS, 2009
7. Introduction: S3VMs
- Semi-Supervised Support Vector Machine [Bennett & Demiriz, NIPS99]
- Transductive SVM [Joachims, ICML99]
- Laplacian SVM [Belkin et al., JMLR06]
- SDP relaxations [De Bie & Cristianini, NIPS04; De Bie & Cristianini, JMLR06]
- Many optimization algorithms for S3VMs [Chapelle et al., JMLR08]
- ...
8. Introduction: S3VMs
[Figure: optimal hyperplane passing through the low-density region]
Low-density assumption / cluster assumption [Chapelle et al., ICML05]
9. Introduction: S3VM Formulations
The objective includes a loss on the labeled data, e.g., the hinge loss. (The equation itself is omitted in this transcript.)
The effect of the objective in S3VM has been well-studied in [Chapelle et al., JMLR08].
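For reference, the standard S3VM objective takes the following shape; this is a sketch of the usual formulation in the literature, not copied from this slide:

```latex
\min_{w,\,b,\,\hat{y}}\ \frac{1}{2}\|w\|^{2}
  + C_{1}\sum_{i=1}^{l}\ell\big(y_{i},\, f(x_{i})\big)
  + C_{2}\sum_{j=l+1}^{n}\ell\big(\hat{y}_{j},\, f(x_{j})\big),
\qquad f(x) = w^{\top}\phi(x) + b,
```

where $\ell$ is, e.g., the hinge loss $\ell(y, f(x)) = \max\{0,\, 1 - y\,f(x)\}$, and $\hat{y}_j \in \{\pm 1\}$ are the unknown labels of the unlabeled instances, which also appear as optimization variables.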
10. Introduction: Efficiency of Existing S3VMs
- [Bennett & Demiriz, NIPS99] formulated S3VM as a mixed-integer programming problem, which is computationally intractable in general
- Transductive SVM [Joachims, ICML99] iteratively solves standard supervised SVM problems; however, the number of iterations may be quite large in practice
- Laplacian SVM [Belkin et al., JMLR06] solves a small SVM with labeled data only, but it needs to compute the inverse of an n×n matrix (O(n³) time and O(n²) memory)
Existing S3VMs are inefficient.
11. Introduction: Analysis
- Our main observation
- Most S3VM algorithms aim at estimating the correct label of each unlabeled instance
- The number of constraints in the optimization problem is then as large as the number of unlabeled samples
Can we use simpler statistics instead of the labels to reduce the number of constraints, while still achieving performance competitive with state-of-the-art SSL methods? Our answer: label means.
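To make the statistic concrete: a label mean is just the centroid of the (feature-space) representations of the instances assigned to one class. A minimal NumPy sketch, with a hypothetical toy feature matrix and class assignment:

```python
import numpy as np

# Toy feature matrix for four unlabeled instances and a guessed
# class assignment d (d[j] = +1 or -1); both are hypothetical.
X_unlabeled = np.array([[0.0, 1.0],
                        [0.2, 0.8],
                        [1.0, 0.0],
                        [0.9, 0.1]])
d = np.array([+1, +1, -1, -1])

# The two label means: centroids of the instances assigned to each
# class. An S3VM working with means needs only these two statistics,
# not one constraint per unlabeled instance.
mean_pos = X_unlabeled[d == +1].mean(axis=0)
mean_neg = X_unlabeled[d == -1].mean(axis=0)
```

However d changes, only two vectors are passed on to the learner, which is what shrinks the constraint set.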
12. Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
13. Our Methods: Usefulness of the Label Mean
We consider the following optimization problem (equation omitted in this transcript), in which the unknowns include estimates of the label means of the unlabeled data.
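The slide's equation is not preserved here. One plausible shape, reconstructed from the surrounding description (the mean-margin constraints and the symbols $\hat{m}_{\pm}$ are assumptions, not copied from the talk), is an SVM on the labeled data plus margin constraints on the two estimated label means:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\xi_{i}
\quad\text{s.t.}\quad
y_{i}\big(w^{\top}\phi(x_{i}) + b\big) \ge 1 - \xi_{i},\quad
\xi_{i} \ge 0,\quad i = 1,\dots,l,
```

together with $w^{\top}\hat{m}_{+} + b \ge 1$ and $-\big(w^{\top}\hat{m}_{-} + b\big) \ge 1$, where $\hat{m}_{+}$ and $\hat{m}_{-}$ are the estimated label means of the unlabeled data.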
14. Our Methods: Usefulness of the Label Mean (cont.)
MeanS3VM: a difference between the two objectives only exists when the samples are non-separable.
This analysis suggests that, if an S3VM knows the label means of the unlabeled instances, it can closely approximate an SVM that knows all the labels of the unlabeled instances!
This motivates us to first estimate the label means of the unlabeled instances.
15. Our Methods: Estimating the Label Mean
Maximal margin approach (equation omitted in this transcript).
We propose two algorithms to solve it: one based on convex relaxation, the other on alternating optimization.
- Note that it has far fewer constraints than S3VM, which greatly reduces the time complexity of the optimization.
- It can also be explained in terms of MMD [Gretton et al., NIPS06], which aims to separate the distributions of the different classes with a large margin.
16. Our Methods: Convex Relaxation Approach
Consider the dual (equation omitted), and apply the minimax relaxation of [Li et al., AISTATS09].
The result is a Multiple Kernel Learning (MKL) problem.
17. Our Methods: Convex Relaxation Approach (cont.)
The relaxation involves an exponential number of base kernels, which is too expensive to handle directly.
Solution: a cutting plane algorithm.
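The cutting-plane idea can be sketched generically: keep a small active set, solve the restricted problem over it, and grow the set with a violated constraint until none remains. The callbacks and the toy test problem below are hypothetical stand-ins, not the actual MKL subroutines:

```python
def cutting_plane(solve_restricted, most_violated, violation, init,
                  tol=1e-9, max_iter=100):
    """Generic cutting-plane loop: repeatedly solve a restricted problem
    over a small active set, then add a violated constraint until no
    constraint is violated by more than tol."""
    active = []
    sol = init
    for _ in range(max_iter):
        c = most_violated(sol)          # cheap search for a violated constraint
        if violation(sol, c) <= tol:    # nothing violated: sol solves the full problem
            break
        active.append(c)
        sol = solve_restricted(active)  # e.g., MKL over the active base kernels only
    return sol, active

# Toy instance (purely illustrative): minimize x subject to x >= a for
# each a in cuts; the restricted solve is just the max over active cuts.
cuts = [0.3, 0.9, 0.5]
sol, active = cutting_plane(
    solve_restricted=lambda act: max(act),
    most_violated=lambda x: max(cuts, key=lambda a: a - x),
    violation=lambda x, a: a - x,
    init=0.0,
)
```

In the toy run only one cut (0.9) ever enters the active set, mirroring why cutting planes stay cheap: the restricted problem is usually far smaller than the full one.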
18. Our Methods: Finding the Most Violated d
To find the most violated d, we need to solve a maximization problem (equation omitted; one of its terms does not depend on d).
Rewritten, it is a concave QP, which cannot be solved efficiently.
However, the cutting plane method only requires adding some violated constraint at each iteration, not necessarily the most violated one.
Hence, we propose a simple and efficient method for finding a good approximation of the most violated d: the resulting problem is linear in d and can be solved by sorting.
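A linear objective in a binary vector d under a fixed class-balance budget is maximized by sorting. The sketch below illustrates this principle; the score vector and the exact balance constraint are assumptions, not the paper's precise subproblem:

```python
def approx_most_violated_d(scores, n_pos):
    """Approximately pick the most violated assignment d by sorting.

    Maximizes sum_j d[j] * scores[j] subject to exactly n_pos entries
    being +1 (an assumed class-balance constraint): give +1 to the
    n_pos largest scores. O(u log u) for u unlabeled instances,
    instead of solving a concave QP.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    d = [-1.0] * len(scores)
    for j in order[:n_pos]:
        d[j] = 1.0
    return d

# Hypothetical usage: four unlabeled instances, two assumed positive.
d = approx_most_violated_d([0.2, -0.5, 0.9, 0.1], n_pos=2)
```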
19. Our Methods: Alternating Optimization
Iterate until convergence:
- With d fixed, solve for the dual variables: this is a standard SVM problem.
- With the dual variables fixed, solve for d: this can again be done by sorting.
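The two steps above fit a generic alternating loop; the skeleton below shows only the control flow, with hypothetical stand-in callbacks (a real implementation would plug in an SVM solver and the sorting step):

```python
def alternating_optimization(solve_svm, solve_d, d0, max_iter=50):
    """Alternate between the two subproblems until d stops changing.

    solve_svm(d): fit a standard SVM treating d as known labels,
                  returning the dual variables (here a stand-in score).
    solve_d(alpha): re-choose d with the dual variables fixed
                    (linear in d, hence solvable by sorting).
    """
    d = d0
    alpha = None
    for _ in range(max_iter):
        alpha = solve_svm(d)
        d_new = solve_d(alpha)
        if d_new == d:        # fixed point: a (possibly local) solution
            break
        d = d_new
    return alpha, d

# Toy stand-ins (hypothetical): "alpha" is a single score, d a +/-1 list.
solve_svm = lambda d: 0.5 * d[0] - 0.2 * d[1]
solve_d = lambda alpha: [1, -1] if alpha >= 0 else [-1, 1]
alpha, d = alternating_optimization(solve_svm, solve_d, d0=[1, -1])
```

Convergence to a fixed point is guaranteed here only by the iteration cap; as the next slide notes, the scheme may stop at a local solution.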
20. Our Methods: Comparison and meanS3VM Implementation
- The convex relaxation approach performs global optimization.
- The alternating optimization approach may get stuck in a local solution, but it is simpler and empirically faster.
We use the d obtained from either approach, together with the labels of the labeled data, to train a final SVM.
We denote the convex relaxation approach by meanS3VM-mkl and the alternating optimization approach by meanS3VM-iter.
21. Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
22. Experiments: Four Kinds of Tasks
- Benchmark tasks
- UCI data sets
- Text categorization
- Speed
23. Experiments: Benchmark Tasks
We follow the same setup as existing S3VM studies.
24. Experiments: UCI Data Sets
9 data sets, 10 labeled examples, 50 train / 50 test, 20 runs.
MeanS3VMs achieve highly competitive performance on all data sets; in particular, they achieve the best performance on 6 of the 9 tasks.
25. Experiments: Text Categorization
10 binary tasks, 2 labeled examples, 50 train / 50 test, 20 runs.
MeanS3VMs achieve highly competitive performance on all data sets; they achieve the best performance on 8 of the 10 tasks.
26. Experiments: Speed
On large data sets (more than 1,000 instances), meanS3VM-mkl is much faster than Laplacian SVM.
meanS3VM-iter is almost always the fastest method: on large data sets, it is 10 times faster than Laplacian SVM and 100 times faster than TSVM.
27. Conclusion
- Main contribution
- An S3VM given the label means closely approximates an SVM given the full labels
- Two efficient and effective SSL methods
- Future work
- Theoretical study of the effect of label means
- Other approaches to estimating label means
Thanks!