Title: Semi-Supervised Learning Using Label Mean
1. Semi-Supervised Learning Using Label Mean
- Yu-Feng Li (1), James T. Kwok (2), Zhi-Hua Zhou (1)
- (1) LAMDA Group, Nanjing University, China; {liyf, zhouzh}@lamda.nju.edu.cn
- (2) Dept. of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong; jamesk@cse.ust.hk
2. The Problem
Many SVM algorithms for supervised learning are efficient, but existing S3VMs (Semi-Supervised SVMs) are not.
What is the major obstacle to designing efficient S3VMs, and how can we design one?
3. Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
4. Introduction: Semi-Supervised Learning (SSL)
[Figure: optimal hyperplane]
The goal of SSL is to improve the performance of supervised learning by utilizing unlabeled data.
5. Introduction: SSL Applications
- Text categorization [Joachims, ICML99]
- Hand-written digit classification [Zhu et al., ICML03; Zhu et al., ICML05]
- Medical image segmentation [Grady & Funka-Lea, ECCV04]
- Image retrieval [He et al., ACM Multimedia04]
- Word sense disambiguation [Niu et al., ACL04; Yarowsky, ACL95; Cuong, Thesis07]
- Object detection [Rosenberg et al., WACV05]
- ...
6. Introduction: Many SSL Algorithms
- Generative methods [Miller & Uyar, NIPS96; Nigam et al., MLJ00; Fujino et al., AAAI05; etc.]
- Disagreement-based methods [Blum & Mitchell, COLT98; Mitchell, ICCS99; Nigam & Ghani, CIKM00; Zhou & Li, TKDE05]
- Graph-based methods [Zhou et al., NIPS02; Zhu et al., ICML03; Belkin et al., JMLR06]
- ...
- Recent surveys of the SSL literature
- Chapelle et al., eds., Semi-Supervised Learning, MIT Press, 2006
- Zhu, Semi-Supervised Learning Literature Survey, 2007
- Zhou & Li, Semi-supervised learning by disagreement, KAIS, 2009
7. Introduction: S3VMs
- Semi-Supervised Support Vector Machine [Bennett & Demiriz, NIPS99]
- Transductive SVM [Joachims, ICML99]
- Laplacian SVM [Belkin et al., JMLR06]
- SDP relaxations [De Bie & Cristianini, NIPS04; De Bie & Cristianini, JMLR06]
- Many optimization algorithms for S3VMs [Chapelle et al., JMLR08]
- ...
8. Introduction: S3VMs
[Figure: optimal hyperplane passing through the low-density region]
Low-density assumption / cluster assumption [Chapelle et al., ICML05]
9. Introduction: S3VM Formulations
The objective includes a loss on the labeled data, e.g., the hinge loss. (The equation itself is omitted in this transcript.)
The effect of the objective in S3VM has been well-studied in [Chapelle et al., JMLR08].
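For reference, the standard S3VM objective takes the following shape; this is a sketch of the usual formulation in the literature, not copied from this slide:

```latex
\min_{w,\,b,\,\hat{y}}\ \frac{1}{2}\|w\|^{2}
  + C_{1}\sum_{i=1}^{l}\ell\big(y_{i},\, f(x_{i})\big)
  + C_{2}\sum_{j=l+1}^{n}\ell\big(\hat{y}_{j},\, f(x_{j})\big),
\qquad f(x) = w^{\top}\phi(x) + b,
```

where $\ell$ is, e.g., the hinge loss $\ell(y, f(x)) = \max\{0,\, 1 - y\,f(x)\}$, and $\hat{y}_j \in \{\pm 1\}$ are the unknown labels of the unlabeled instances, which also appear as optimization variables.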
10. Introduction: Efficiency of Existing S3VMs
- [Bennett & Demiriz, NIPS99] formulated S3VM as a mixed-integer programming problem, which is computationally intractable in general
- Transductive SVM [Joachims, ICML99] iteratively solves standard supervised SVM problems; however, the number of iterations may be quite large in practice
- Laplacian SVM [Belkin et al., JMLR06] solves a small SVM with labeled data only, but it needs to compute the inverse of an n×n matrix (O(n³) time and O(n²) memory)
Existing S3VMs are inefficient.
11. Introduction: Analysis
- Our main observation
- Most S3VM algorithms aim at estimating the correct label of each unlabeled instance
- The number of constraints in the optimization problem is then as large as the number of unlabeled samples
Can we use simpler statistics instead of the labels to reduce the number of constraints, while still achieving performance competitive with state-of-the-art SSL methods? Our answer: label means.
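To make the statistic concrete: a label mean is just the centroid of the (feature-space) representations of the instances assigned to one class. A minimal NumPy sketch, with a hypothetical toy feature matrix and class assignment:

```python
import numpy as np

# Toy feature matrix for four unlabeled instances and a guessed
# class assignment d (d[j] = +1 or -1); both are hypothetical.
X_unlabeled = np.array([[0.0, 1.0],
                        [0.2, 0.8],
                        [1.0, 0.0],
                        [0.9, 0.1]])
d = np.array([+1, +1, -1, -1])

# The two label means: centroids of the instances assigned to each
# class. An S3VM working with means needs only these two statistics,
# not one constraint per unlabeled instance.
mean_pos = X_unlabeled[d == +1].mean(axis=0)
mean_neg = X_unlabeled[d == -1].mean(axis=0)
```

However d changes, only two vectors are passed on to the learner, which is what shrinks the constraint set.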
12. Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
13. Our Methods: Usefulness of the Label Mean
We consider the following optimization problem (equation omitted in this transcript), in which the unknowns include estimates of the label means of the unlabeled data.
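The slide's equation is not preserved here. One plausible shape, reconstructed from the surrounding description (the mean-margin constraints and the symbols $\hat{m}_{\pm}$ are assumptions, not copied from the talk), is an SVM on the labeled data plus margin constraints on the two estimated label means:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\xi_{i}
\quad\text{s.t.}\quad
y_{i}\big(w^{\top}\phi(x_{i}) + b\big) \ge 1 - \xi_{i},\quad
\xi_{i} \ge 0,\quad i = 1,\dots,l,
```

together with $w^{\top}\hat{m}_{+} + b \ge 1$ and $-\big(w^{\top}\hat{m}_{-} + b\big) \ge 1$, where $\hat{m}_{+}$ and $\hat{m}_{-}$ are the estimated label means of the unlabeled data.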
14. Our Methods: Usefulness of the Label Mean (cont.)
MeanS3VM: a difference between the two objectives only exists when the samples are non-separable.
This analysis suggests that, if an S3VM knows the label means of the unlabeled instances, it can closely approximate an SVM that knows all the labels of the unlabeled instances!
This motivates us to first estimate the label means of the unlabeled instances.
15. Our Methods: Estimating the Label Mean
Maximal margin approach (equation omitted in this transcript).
We propose two algorithms to solve it: one based on convex relaxation, the other on alternating optimization.
- Note that it has far fewer constraints than S3VM, which greatly reduces the time complexity of the optimization.
- It can also be explained in terms of MMD [Gretton et al., NIPS06], which aims to separate the distributions of the different classes with a large margin.
16. Our Methods: Convex Relaxation Approach
Consider the dual (equation omitted), and apply the minimax relaxation of [Li et al., AISTATS09].
The result is a Multiple Kernel Learning (MKL) problem.
17. Our Methods: Convex Relaxation Approach (cont.)
The relaxation involves an exponential number of base kernels, which is too expensive to handle directly.
Solution: a cutting plane algorithm.
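The cutting-plane idea can be sketched generically: keep a small active set, solve the restricted problem over it, and grow the set with a violated constraint until none remains. The callbacks and the toy test problem below are hypothetical stand-ins, not the actual MKL subroutines:

```python
def cutting_plane(solve_restricted, most_violated, violation, init,
                  tol=1e-9, max_iter=100):
    """Generic cutting-plane loop: repeatedly solve a restricted problem
    over a small active set, then add a violated constraint until no
    constraint is violated by more than tol."""
    active = []
    sol = init
    for _ in range(max_iter):
        c = most_violated(sol)          # cheap search for a violated constraint
        if violation(sol, c) <= tol:    # nothing violated: sol solves the full problem
            break
        active.append(c)
        sol = solve_restricted(active)  # e.g., MKL over the active base kernels only
    return sol, active

# Toy instance (purely illustrative): minimize x subject to x >= a for
# each a in cuts; the restricted solve is just the max over active cuts.
cuts = [0.3, 0.9, 0.5]
sol, active = cutting_plane(
    solve_restricted=lambda act: max(act),
    most_violated=lambda x: max(cuts, key=lambda a: a - x),
    violation=lambda x, a: a - x,
    init=0.0,
)
```

In the toy run only one cut (0.9) ever enters the active set, mirroring why cutting planes stay cheap: the restricted problem is usually far smaller than the full one.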
18. Our Methods: Finding the Most Violated d
To find the most violated d, we need to solve a maximization problem (equation omitted; one of its terms does not depend on d).
Rewritten, it is a concave QP, which cannot be solved efficiently.
However, the cutting plane method only requires adding some violated constraint at each iteration, not necessarily the most violated one.
Hence, we propose a simple and efficient method for finding a good approximation of the most violated d: the resulting problem is linear in d and can be solved by sorting.
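A linear objective in a binary vector d under a fixed class-balance budget is maximized by sorting. The sketch below illustrates this principle; the score vector and the exact balance constraint are assumptions, not the paper's precise subproblem:

```python
def approx_most_violated_d(scores, n_pos):
    """Approximately pick the most violated assignment d by sorting.

    Maximizes sum_j d[j] * scores[j] subject to exactly n_pos entries
    being +1 (an assumed class-balance constraint): give +1 to the
    n_pos largest scores. O(u log u) for u unlabeled instances,
    instead of solving a concave QP.
    """
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    d = [-1.0] * len(scores)
    for j in order[:n_pos]:
        d[j] = 1.0
    return d

# Hypothetical usage: four unlabeled instances, two assumed positive.
d = approx_most_violated_d([0.2, -0.5, 0.9, 0.1], n_pos=2)
```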
19. Our Methods: Alternating Optimization
Iterate until convergence:
- With d fixed, solve for the dual variables: this is a standard SVM problem.
- With the dual variables fixed, solve for d: this can again be done by sorting.
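The two steps above fit a generic alternating loop; the skeleton below shows only the control flow, with hypothetical stand-in callbacks (a real implementation would plug in an SVM solver and the sorting step):

```python
def alternating_optimization(solve_svm, solve_d, d0, max_iter=50):
    """Alternate between the two subproblems until d stops changing.

    solve_svm(d): fit a standard SVM treating d as known labels,
                  returning the dual variables (here a stand-in score).
    solve_d(alpha): re-choose d with the dual variables fixed
                    (linear in d, hence solvable by sorting).
    """
    d = d0
    alpha = None
    for _ in range(max_iter):
        alpha = solve_svm(d)
        d_new = solve_d(alpha)
        if d_new == d:        # fixed point: a (possibly local) solution
            break
        d = d_new
    return alpha, d

# Toy stand-ins (hypothetical): "alpha" is a single score, d a +/-1 list.
solve_svm = lambda d: 0.5 * d[0] - 0.2 * d[1]
solve_d = lambda alpha: [1, -1] if alpha >= 0 else [-1, 1]
alpha, d = alternating_optimization(solve_svm, solve_d, d0=[1, -1])
```

Convergence to a fixed point is guaranteed here only by the iteration cap; as the next slide notes, the scheme may stop at a local solution.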
20. Our Methods: Comparison and meanS3VM Implementation
- The convex relaxation approach performs global optimization.
- The alternating optimization approach may get stuck in a local solution, but it is simpler and empirically faster.
We use the d obtained from either approach, together with the labels of the labeled data, to train a final SVM.
We denote the convex relaxation approach by meanS3VM-mkl and the alternating optimization approach by meanS3VM-iter.
21. Outline
- Introduction
- Our Methods
- Experiments
- Conclusion
22. Experiments: Four Kinds of Tasks
- Benchmark tasks
- UCI data sets
- Text categorization
- Speed
23. Experiments: Benchmark Tasks
We follow the same setup as existing S3VM studies.
24. Experiments: UCI Data Sets
9 data sets, 10 labeled examples, 50 train / 50 test, 20 runs.
MeanS3VMs achieve highly competitive performance on all data sets; in particular, they achieve the best performance on 6 of the 9 tasks.
25. Experiments: Text Categorization
10 binary tasks, 2 labeled examples, 50 train / 50 test, 20 runs.
MeanS3VMs achieve highly competitive performance on all data sets; they achieve the best performance on 8 of the 10 tasks.
26. Experiments: Speed
On large data sets (more than 1,000 instances), meanS3VM-mkl is much faster than Laplacian SVM.
meanS3VM-iter is almost always the fastest method: on large data sets, it is 10 times faster than Laplacian SVM and 100 times faster than TSVM.
27. Conclusion
- Main contribution
- An S3VM given the label means closely approximates an SVM given the full labels
- Two efficient and effective SSL methods
- Future work
- Theoretical study of the effect of label means
- Other approaches to estimating label means
Thanks!