1
Semi-Supervised Learning Using Label Mean
  • Yu-Feng Li1, James T. Kwok2, Zhi-Hua Zhou1
  • 1LAMDA Group, Nanjing University, China
  • {liyf, zhouzh}@lamda.nju.edu.cn
  • 2 Dept. of Computer Science & Engineering,
  • Hong Kong University of Science and Technology,
    Hong Kong
  • jamesk@cse.ust.hk

2
The Problem
Many SVM algorithms for supervised learning are
efficient, but existing S3VMs (semi-supervised
SVMs) are not.
What is the major obstacle to designing efficient
S3VMs, and how can an efficient S3VM be designed?
3
Outline
  • Introduction
  • Our Methods
  • Experiments
  • Conclusion

4
Introduction: Semi-Supervised Learning (SSL)
Optimal Hyperplane
The goal of SSL is to improve the performance of
supervised learning by utilizing unlabeled data.
5
Introduction: SSL Applications
  • Text categorization [Joachims, ICML'99]
  • Hand-written digit classification [Zhu et al.,
    ICML'03; Zhu et al., ICML'05]
  • Medical image segmentation [Grady & Funka-Lea,
    ECCV'04]
  • Image retrieval [He et al., ACM Multimedia'04]
  • Word sense disambiguation [Niu et al., ACL'04;
    Yarowsky et al., ACL'95; Cuong, Thesis'07]
  • Object detection [Rosenberg et al., WACV'05]

6
Introduction: Many SSL Algorithms
  • Generative methods [Miller & Uyar, NIPS'96;
    Nigam et al., MLJ'00; Fujino et al., AAAI'05;
    etc.]
  • Disagreement-based methods [Blum & Mitchell,
    COLT'98; Mitchell, ICCS'99; Nigam & Ghani,
    CIKM'00; Zhou & Li, TKDE'05]
  • Graph-based methods [Zhou et al., NIPS'02; Zhu
    et al., ICML'03; Belkin et al., JMLR'06]
  • Recent surveys of the SSL literature:
  • Chapelle et al., eds., Semi-Supervised Learning,
    MIT Press, 2006
  • Zhu, Semi-Supervised Learning Literature Survey,
    2007
  • Zhou & Li, Semi-supervised learning by
    disagreement, KAIS, 2009

7
Introduction: S3VMs
  • Semi-supervised Support Vector Machine [Bennett
    & Demiriz, NIPS'99]
  • Transductive SVM [Joachims, ICML'99]
  • Laplacian SVM [Belkin et al., JMLR'06]
  • SDP relaxations [De Bie & Cristianini, NIPS'04;
    De Bie & Cristianini, JMLR'06]
  • Many optimization algorithms for S3VMs [Chapelle
    et al., JMLR'08]

8
Introduction: S3VMs
Optimal Hyperplane
Low-Density Assumption / Cluster Assumption
[Chapelle et al., ICML'05]
9
Introduction: S3VM Formulations
Loss on labeled data, e.g., hinge loss
The effect of the objective in S3VMs has been
well studied in [Chapelle et al., JMLR'08].
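The objective itself is an image on the slide and is missing from this transcript. The standard S3VM objective that the annotation ("loss on labeled data, e.g., hinge loss") refers to has the following general form, with l labeled and u unlabeled examples (notation assumed here, not taken from the slides):

```latex
\min_{\mathbf{w},\,b}\;
\frac{1}{2}\|\mathbf{w}\|^2
\;+\; C_1 \sum_{i=1}^{l} \max\!\bigl(0,\; 1 - y_i\,(\mathbf{w}^\top \phi(\mathbf{x}_i) + b)\bigr)
\;+\; C_2 \sum_{j=l+1}^{l+u} \max\!\bigl(0,\; 1 - \bigl|\mathbf{w}^\top \phi(\mathbf{x}_j) + b\bigr|\bigr)
```

The first sum is the usual hinge loss on the labeled data; the second, symmetric hinge on the unlabeled data pushes the decision boundary away from unlabeled points, encoding the low-density assumption.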
10
Introduction: Efficiency of Existing S3VMs
  • [Bennett & Demiriz, NIPS'99] formulated the
    S3VM as a mixed-integer programming problem,
    which is computationally intractable in general
  • The Transductive SVM [Joachims, ICML'99]
    iteratively solves standard supervised SVM
    problems; however, the number of iterations may
    be quite large in practice
  • The Laplacian SVM [Belkin et al., JMLR'06]
    solves a small SVM with labeled data only, but
    it needs to compute the inverse of an n×n matrix
    (O(n³) time and O(n²) memory)

Existing S3VMs are inefficient.
11
Introduction: Analysis
  • Our main observation:
  • Most S3VM algorithms aim at estimating the
    correct label of each unlabeled instance
  • The optimization problem therefore has as many
    constraints as there are unlabeled samples

Can we use a simpler statistic in place of the
individual labels, reducing the number of
constraints while still achieving performance
competitive with state-of-the-art SSL methods?
Our answer: the label means.
12
Outline
  • Introduction
  • Our Methods
  • Experiments
  • Conclusion

13
Our Methods: Usefulness of the Label Mean
We consider the following optimization problem,
in which estimates of the label means of the
unlabeled data take the place of the individual
labels. [Slide formula not reproduced in the
transcript.]
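The formula is an image on the slide and is lost from the transcript. Based on the surrounding text, it is presumably an SVM objective in which hinge losses on the two estimated label means of the unlabeled data replace the per-instance unlabeled losses; a plausible sketch (an assumption, not the slide's exact formula) is:

```latex
\min_{\mathbf{w},\,b}\;
\frac{1}{2}\|\mathbf{w}\|^2
\;+\; C_1 \sum_{i=1}^{l} \max\!\bigl(0,\; 1 - y_i\,(\mathbf{w}^\top \phi(\mathbf{x}_i) + b)\bigr)
\;+\; C_2 \Bigl[
  \max\!\bigl(0,\; 1 - (\mathbf{w}^\top \hat{\mathbf{m}}_{+} + b)\bigr)
  + \max\!\bigl(0,\; 1 + (\mathbf{w}^\top \hat{\mathbf{m}}_{-} + b)\bigr)
\Bigr]
```

where \(\hat{\mathbf{m}}_{+}\) and \(\hat{\mathbf{m}}_{-}\) are the estimated label means of the unlabeled positives and negatives. Note that only two unlabeled-side terms remain, regardless of the number of unlabeled instances.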
14
Our Methods: Usefulness of the Label Mean (cont.)
MeanS3VM
This motivates us to first estimate the label
means of the unlabeled instances.
A difference only exists when the samples are
non-separable.
This analysis suggests that, if an S3VM knows
the label means of the unlabeled instances, it
can closely approximate an SVM that knows all
the labels of the unlabeled instances!
15
Our Methods: Estimating the Label Means
Maximal margin approach
We propose two algorithms to solve it: one based
on convex relaxation, the other on alternating
optimization.
  • Note that it has far fewer constraints than an
    S3VM, which greatly reduces the time complexity
    of the optimization.
  • It can also be interpreted in terms of MMD
    [Gretton et al., NIPS'06], as it aims to
    separate the distributions of the different
    classes with a large margin.
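As a concrete sketch of why the constraint count drops (a hypothetical helper, not the paper's code): given a candidate assignment d over the unlabeled data, the two label means are just class-conditional averages, so the unlabeled data contributes only two quantities to the optimization instead of one constraint per instance.

```python
import numpy as np

def label_means(Xu, d):
    """Class means of the unlabeled data under a candidate assignment d
    (d[j] = 1 for putative positives, 0 for putative negatives)."""
    Xu = np.asarray(Xu, dtype=float)
    d = np.asarray(d)
    m_pos = Xu[d == 1].mean(axis=0)  # estimated positive label mean
    m_neg = Xu[d == 0].mean(axis=0)  # estimated negative label mean
    return m_pos, m_neg
```

However many unlabeled instances there are, the downstream SVM only ever sees `m_pos` and `m_neg`.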

16
Our Methods: Convex Relaxation Approach
Consider the dual
Consider the minimax relaxation [Li et al.,
AISTATS'09]
Multiple Kernel Learning
17
Our Methods: Convex Relaxation Approach (cont.)
Exponentially many base kernels, which is too
expensive to handle directly.
Solution: a cutting plane algorithm
18
Our Methods: Finding the Most Violated d
To find the most violated d, we need to solve the
following maximization problem
Rewritten as
It is a concave QP and cannot be solved
efficiently
(One term of the rewritten objective is not
related to d.)
However, the cutting plane method only requires
adding a violated constraint at each iteration,
not necessarily the most violated one.
Hence, we propose a simple and efficient method
for finding a good approximation of the most
violated d
The resulting problem is linear in d and can be
solved by sorting
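The sort-based step can be sketched as follows (hypothetical function name and inputs; the class-balance constraint fixing the number of unlabeled positives is an assumption carried over from the problem setup):

```python
import numpy as np

def approx_most_violated_d(scores, n_pos):
    """Approximately find the most violated assignment d.

    Once the terms not involving d are dropped, the subproblem is
    linear in d: maximize sum_j scores[j] * d[j] subject to
    sum_j d[j] = n_pos (the class-balance constraint). The maximizer
    sets d[j] = 1 for the n_pos largest scores, which a single sort
    finds in O(u log u) time.
    """
    scores = np.asarray(scores, dtype=float)
    d = np.zeros(len(scores), dtype=int)
    d[np.argsort(scores)[::-1][:n_pos]] = 1
    return d
```

Because a sort replaces an integer QP, each cutting-plane iteration stays cheap even for large unlabeled sets.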
19
Our Methods: Alternating Optimization
Iterate until convergence:
Fix d, solve for the dual variables: a standard
SVM
Fix the dual variables, solve for d: can still
be solved by sorting
20
Our Methods: Comparison and meanS3VM
Implementation
The convex relaxation approach performs global
optimization.

The alternating optimization approach may get
stuck in a local solution, but it is simple and
empirically faster.
We use the d obtained from these two approaches,
together with the labels of the labeled data, to
train a final SVM.
We denote the convex relaxation approach by
meanS3VM-mkl and the alternating optimization
approach by meanS3VM-iter.
21
Outline
  • Introduction
  • Our Methods
  • Experiments
  • Conclusion

22
Experiments: Four Kinds of Tasks
  • Benchmark tasks
  • UCI data sets
  • Text categorization
  • Speed

23
Experiments: Benchmark Tasks
Following the same setup as S3VM
24
Experiments: UCI Data Sets
9 data sets, 10 labeled examples, 50 train / 50
test, 20 runs
The meanS3VMs achieve highly competitive
performance on all data sets; in particular, they
achieve the best performance on 6 of the 9 tasks.
25
Experiments: Text Categorization
10 binary tasks, 2 labeled examples, 50 train /
50 test, 20 runs
The meanS3VMs achieve highly competitive
performance on all data sets; they achieve the
best performance on 8 of the 10 tasks.
26
Experiments: Speed
On large data sets (with more than 1,000
instances), meanS3VM-mkl is much faster than the
Laplacian SVM.
meanS3VM-iter is almost the fastest method. On
large data sets, meanS3VM-iter is 10 times faster
than the Laplacian SVM and 100 times faster than
the TSVM.
27
Conclusion
  • Main contributions:
  • An S3VM that knows the label means closely
    approximates an SVM that knows the full labels
  • Two efficient and effective SSL methods
  • Future work:
  • Theoretical study of the effect of label means
  • Other approaches to estimating label means
Thanks!