Title: Co-Training and Expansion: Towards Bridging Theory and Practice
1 Co-Training and Expansion: Towards Bridging Theory and Practice
- Maria-Florina Balcan, Avrim Blum, Ke Yang
- Carnegie Mellon University, Computer Science Department
2 Combining Labeled and Unlabeled Data (a.k.a. Semi-supervised Learning)
- Many applications have lots of unlabeled data, but labeled data is rare or expensive
  - Web page and document classification
  - OCR, image classification
- Several methods have been developed to try to use unlabeled data to improve performance, e.g.
  - Transductive SVM
  - Co-training
  - Graph-based methods
3 Co-training: method for combining labeled and unlabeled data
- Works in scenarios where examples have distinct, yet sufficient feature sets
  - An example x = (x1, x2) has two views, one per feature set
- Belief is that the two parts of the example are consistent, i.e. there exist c1, c2 such that c1(x1) = c2(x2) = c(x)
  - Each view is sufficient for correct classification
- Works by using unlabeled data to propagate learned information.
4 Co-Training: method for combining labeled and unlabeled data
- For example, if we want to classify web pages: one view is the text on the page itself, the other is the text of the links pointing to the page
5 Iterative Co-Training
- Have learning algorithms A1, A2 on each of the two views.
- Use labeled data to learn two initial hypotheses h1, h2.
- Look through unlabeled data to find examples where one of the hi is confident but the other is not.
- Have the confident hi label such examples for algorithm A(3-i).
- Repeat. (A minimal sketch of this loop is given below.)
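The loop above can be written as a short sketch. The code below is illustrative only, not the authors' implementation: it assumes two scikit-learn-style learners with fit/predict_proba, a confidence threshold tau, and a maximum round count, all of which are our own choices.

```python
# Minimal sketch of iterative co-training (illustrative, not the authors' code).
# Assumes scikit-learn-style learners A1, A2 and data split into two views.
import numpy as np

def co_train(A1, A2, X1_lab, X2_lab, y_lab, X1_unlab, X2_unlab,
             tau=0.95, max_rounds=30):
    """Return hypotheses (h1, h2) trained by iterative co-training."""
    # Labeled pools for each view; both start from the same labeled data.
    pools = [[X1_lab.copy(), y_lab.copy()], [X2_lab.copy(), y_lab.copy()]]
    views = [X1_unlab.copy(), X2_unlab.copy()]
    unlabeled = np.ones(len(X1_unlab), dtype=bool)   # mask of still-unlabeled examples

    for _ in range(max_rounds):
        # Learn h1, h2 from the current labeled pools.
        h = [A1.fit(pools[0][0], pools[0][1]), A2.fit(pools[1][0], pools[1][1])]
        added = False
        for i in (0, 1):                              # view i teaches view 1-i
            if not unlabeled.any():
                break
            probs = h[i].predict_proba(views[i][unlabeled])
            conf = probs.max(axis=1) >= tau           # examples where h_i is confident
            if not conf.any():
                continue
            idx = np.where(unlabeled)[0][conf]
            labels = h[i].predict(views[i][idx])
            # Hand the confidently labeled examples to the other view's learner.
            pools[1 - i][0] = np.vstack([pools[1 - i][0], views[1 - i][idx]])
            pools[1 - i][1] = np.concatenate([pools[1 - i][1], labels])
            unlabeled[idx] = False
            added = True
        if not added:                                 # no confident examples left
            break
    return h[0], h[1]
```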
6 Iterative Co-Training, A Simple Example: Learning Intervals
[Figure: target intervals on the two views, with initial hypotheses h1 and h2]
- Use labeled data to learn initial hypotheses h1 and h2
- Use unlabeled data to bootstrap (a small simulation of this example follows below)
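As a concrete illustration of the intervals example, here is a small self-contained simulation (our own sketch, not from the talk; the interval bounds, sample sizes, and round count are made up). Each view is a point on the real line, the target on each view is an interval, and each side learns the tightest interval around the points it currently trusts, so it never labels a true negative as positive.

```python
# Tiny co-training simulation with interval concepts (illustrative sketch only).
import random

random.seed(0)
LO, HI = 0.3, 0.7          # the target interval, identical on both views here

def sample_neg():
    x = random.uniform(0, 1)
    while LO <= x <= HI:
        x = random.uniform(0, 1)
    return x

def sample(label, n):
    """Draw n consistent two-view examples with the given label."""
    if label:   # both views land inside the target interval
        return [(random.uniform(LO, HI), random.uniform(LO, HI)) for _ in range(n)]
    return [(sample_neg(), sample_neg()) for _ in range(n)]

labeled_pos = sample(True, 2)                     # a couple of labeled positives
unlabeled = sample(True, 500) + sample(False, 500)

# Confident positive sets for each view start from the labeled positives.
conf = [set(x for x, _ in labeled_pos), set(y for _, y in labeled_pos)]

for round_ in range(10):
    # Each learner outputs the tightest interval around its confident points,
    # so it never calls a true negative "positive".
    h = [(min(c), max(c)) for c in conf]
    for x1, x2 in unlabeled:
        if h[0][0] <= x1 <= h[0][1]:
            conf[1].add(x2)                       # view 1 is confident, teaches view 2
        if h[1][0] <= x2 <= h[1][1]:
            conf[0].add(x1)                       # view 2 is confident, teaches view 1
    covered = sum(h[0][0] <= x1 <= h[0][1] or h[1][0] <= x2 <= h[1][1]
                  for x1, x2 in unlabeled[:500]) / 500
    print(f"round {round_}: fraction of positives covered = {covered:.2f}")
```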
7 Theoretical/Conceptual Question
- What properties do we need for co-training to work well?
- Need assumptions about
  - the underlying data distribution
  - the learning algorithms on the two sides
8 Theoretical/Conceptual Question
- What property of the data do we need for co-training to work well?
- Previous work
  - (1) Independence given the label
  - (2) Weak rule dependence
- Our work: a much weaker assumption about how the data should behave
  - an expansion property of the underlying distribution
  - though we will need a stronger assumption on the learning algorithms compared to (1)
9 Co-Training, Formal Setting
- Assume that examples are drawn from a distribution D over the instance space X = X1 x X2.
- Let c be the target function; assume that each view is sufficient for correct classification
  - c can be decomposed into c1, c2 over the two views, s.t. D has no probability mass on examples x with c1(x1) ≠ c2(x2)
- Let X+ and X- denote the positive and negative regions of X.
- Let D+ and D- be the marginal distributions of D over X+ and X- respectively.
- Think of D+ as a bipartite graph with X1+ on one side and X2+ on the other, with an edge for every positive example (x1, x2) of nonzero probability.
[Figure: the instance space split into the positive region (D+) and the negative region (D-)]
10 Expansion (Formalization)
- We assume that D+ is expanding.
- Expansion: for some ε > 0, every pair of sets S1 ⊆ X1+, S2 ⊆ X2+ satisfies
  Pr(S1 ⊕ S2) ≥ ε · min[ Pr(S1 ∧ S2), Pr(¬S1 ∧ ¬S2) ]
  where probabilities are over D+, and S1 ⊕ S2 is the event that exactly one of "x1 ∈ S1", "x2 ∈ S2" holds. (A small brute-force check of this condition is sketched below.)
- This is a natural analog of the graph-theoretic notions of conductance and expansion.
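To make the definition concrete, here is a brute-force check of the expansion constant on a tiny finite positive distribution. The example distribution and domain names are made up for illustration; this is not part of the paper.

```python
# Brute-force check of the expansion condition on a small finite D+ (sketch).
from itertools import chain, combinations

# D+ as a dict: (x1, x2) -> probability, over small view domains.
X1 = ["a", "b", "c"]
X2 = ["u", "v", "w"]
D_pos = {("a", "u"): 0.3, ("a", "v"): 0.2, ("b", "v"): 0.2,
         ("b", "w"): 0.2, ("c", "w"): 0.1}

def subsets(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

def prob(event):
    return sum(p for ex, p in D_pos.items() if event(ex))

def expansion_constant():
    """Largest eps with Pr(S1 xor S2) >= eps * min(Pr(S1 and S2), Pr(~S1 and ~S2))."""
    eps = float("inf")
    for S1 in map(set, subsets(X1)):
        for S2 in map(set, subsets(X2)):
            both = prob(lambda ex: ex[0] in S1 and ex[1] in S2)
            neither = prob(lambda ex: ex[0] not in S1 and ex[1] not in S2)
            xor = prob(lambda ex: (ex[0] in S1) != (ex[1] in S2))
            denom = min(both, neither)
            if denom > 0:
                eps = min(eps, xor / denom)
    return eps

print("empirical expansion constant:", expansion_constant())
```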
11 Expansion: Property of the Underlying Distribution
- Necessary condition for co-training to work well
  - If S1 and S2 (our confident sets) do not expand, then we might never see examples for which one hypothesis could help the other.
- We show it is also sufficient for co-training to generalize well in a relatively small number of iterations, under some assumptions
  - the data is perfectly separable
  - we have strong learning algorithms on the two sides
12 Expansion, Examples: Learning Intervals
[Figure: a non-expanding distribution vs. an expanding distribution in the intervals example]
13 Expansion
- Weaker than independence given the label and than weak rule dependence.
  - E.g., w.h.p. a random degree-3 bipartite graph is expanding, but would NOT have independence given the label, or weak rule dependence. (A quick check of the first comparison is sketched below.)
[Figure: positive region D+ drawn as a bipartite graph, alongside the negative region D-]
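To see why independence given the label is the stronger condition, here is a short derivation of ours (not from the slides): if the two views are independent under D+, then for any S1 ⊆ X1+, S2 ⊆ X2+, writing p = Pr(x1 ∈ S1) and q = Pr(x2 ∈ S2) over D+, expansion holds with ε = 1.

```latex
% Conditional independence over D^+ implies expansion (our sketch).
% With p = \Pr(S_1), q = \Pr(S_2) and the two views independent under D^+:
\begin{align*}
\Pr(S_1 \oplus S_2) &= p(1-q) + (1-p)q = p + q - 2pq,\\
\min\big[\Pr(S_1 \wedge S_2),\, \Pr(\bar S_1 \wedge \bar S_2)\big]
  &= \min\big[pq,\ (1-p)(1-q)\big].
\end{align*}
% Assume w.l.o.g. that pq <= (1-p)(1-q), i.e. p + q <= 1. Then
% pq <= (p+q)^2/4 <= (p+q)/4, so 3pq <= p + q, hence
\[
\Pr(S_1 \oplus S_2) = p + q - 2pq \;\ge\; pq
  \;=\; \min\big[\Pr(S_1 \wedge S_2),\, \Pr(\bar S_1 \wedge \bar S_2)\big],
\]
% i.e. the expansion condition holds with \varepsilon = 1. A random degree-3
% bipartite graph can be expanding without any such independence.
```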
14 Main Result
- Assume D+ is ε-expanding.
- Assume that on each of the two views we have algorithms A1 and A2 for learning from positive data only.
- Assume that we have initial confident sets S1^0 ⊆ X1+ and S2^0 ⊆ X2+ whose joint probability mass Pr(S1^0 ∧ S2^0) under D+ is bounded away from zero.
- Then, after a relatively small number of co-training iterations, the confident sets grow to cover nearly all of D+.
15 Main Result, Interpretation
- The assumption on A1, A2 implies that they never generalize incorrectly.
- The question is what needs to be true for them to actually generalize to the whole of D+?
16 Main Result, Proof Idea
- Expansion implies that at each iteration, there is reasonable probability mass on "new, useful" data. (A rough sketch of the growth argument follows below.)
- Algorithms generalize to most of this new region.
- See the paper for the real proof.
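Roughly, in our paraphrase of the growth argument (the paper's proof also handles finite samples and the learners' behavior): let S1^i, S2^i be the confident sets after round i and p_i = Pr over D+ of S1^i ∧ S2^i. If each side correctly absorbs the region where only the other side was confident, one round gives:

```latex
% One-round growth under \varepsilon-expansion (informal sketch).
\[
  p_{i+1} \;\ge\; \Pr\!\big(S_1^i \vee S_2^i\big)
          \;=\; p_i + \Pr\!\big(S_1^i \oplus S_2^i\big)
          \;\ge\; p_i + \varepsilon \cdot
            \min\!\big[\,p_i,\; 1 - \Pr(S_1^i \vee S_2^i)\,\big].
\]
% While coverage is small, p_i grows by roughly a (1+\varepsilon) factor per
% round; once coverage passes one half, the uncovered mass shrinks at a
% comparable geometric rate, so on the order of
% (1/\varepsilon)\log\!\big(1/(\varepsilon_{\mathrm{fin}}\, p_0)\big)
% rounds suffice to reach coverage 1-\varepsilon_{\mathrm{fin}}.
```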
17 What if assumptions are violated?
- What if our algorithms can make incorrect generalizations and/or there is no perfect separability?
18 What if assumptions are violated?
- Expect "leakage" into the negative region.
- If the negative region is expanding too, then incorrect generalizations will grow at an exponential rate.
- Correct generalizations are growing at an exponential rate too, but will slow down first (having started larger, they saturate the positive region sooner).
- Expect overall accuracy to go up, then down.
19 Synthetic Experiments
- Create a 2n-by-2n bipartite graph
  - nodes 1 to n on each side represent positive clusters
  - nodes n+1 to 2n on each side represent negative clusters
- Connect each node on the left to 3 nodes on the right
  - each neighbor is chosen with prob. 1-p to be a random node of the same class, and with prob. p to be a random node of the opposite class
- Begin with an initial confident set and then propagate confidence through rounds of co-training
- Monitor the percentage of the positive class covered, the percentage of the negative class mistakenly covered, and the overall accuracy (a sketch of this simulation is given below)
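A compact simulation along these lines (our own sketch; the parameter names, the single-node initial confident set, and the confidence-propagation rule are our reading of the bullets above, not the authors' code):

```python
# Sketch of the synthetic co-training experiment on a random bipartite graph.
import random

def run(n=5000, d=3, p=0.01, rounds=20, seed=0):
    rng = random.Random(seed)
    # On each side, nodes 0..n-1 are positive clusters, n..2n-1 negative clusters.
    def neighbors(u):
        same = range(0, n) if u < n else range(n, 2 * n)
        diff = range(n, 2 * n) if u < n else range(0, n)
        return [rng.choice(same if rng.random() > p else diff) for _ in range(d)]

    edges = {u: neighbors(u) for u in range(2 * n)}     # left node -> right nodes

    conf_left, conf_right = {0}, set()                  # start confident on one positive cluster
    for r in range(rounds):
        # Propagate confidence across edges in both directions.
        conf_right |= {v for u in conf_left for v in edges[u]}
        conf_left |= {u for u, vs in edges.items() if any(v in conf_right for v in vs)}

        pos_covered = sum(1 for u in conf_left if u < n) / n
        neg_covered = sum(1 for u in conf_left if u >= n) / n
        accuracy = (pos_covered + (1 - neg_covered)) / 2  # classes weighted equally
        print(f"round {r:2d}: pos {pos_covered:.3f}  neg {neg_covered:.3f}  acc {accuracy:.3f}")

run()
```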
20 Synthetic Experiments
[Plots for p = 0.01, n = 5000, d = 3 and for p = 0.001, n = 5000, d = 3]
- solid line indicates overall accuracy
- green curve is accuracy on positives
- red curve is accuracy on negatives
21 Conclusions
- We propose a much weaker expansion assumption on the underlying data distribution.
- It seems to be the right condition on the distribution for co-training to work well.
- It directly motivates the iterative nature of many of the practical co-training-based algorithms.