Effective Multi-Label Active Learning for Text Classification

Bishan Yang¹, Jian-Tao Sun², Tengjiao Wang¹, and Zheng Chen²
¹ Computer Science Department, Peking University
² Microsoft Research Asia

KDD 2009, Paris
Outline
- Motivation
- Related Work
- SVM-Based Active Learning for Multi-Label Text Classification
- Experiments
- Summary
Motivation
- Text classification is everywhere
  - Web search
  - News classification
  - Email classification
- Many text documents are multi-labeled
  [Figure: a single news article tagged with several categories at once, e.g. Business, Politics, Travel, World news, Entertainment, Local news]
Labeling Effort Is Huge
- Supervised learning approach
  - The model is trained on a set of randomly labeled data
  - Requires a sufficient amount of labeled data to ensure the quality of the model
The more categories, the more judging effort for each document, and the more data that needs to be labeled.
[Figure: every document must be judged against each category C1-C5, so the labeling effort multiplies with the number of categories]
Active Learning Reduces Labeling Effort

[Figure: pool-based active learning loop - train the classifier, select an optimal set from the data pool via the selection strategy, query for the true labels, augment the labeled set, and repeat]

With an effective selection strategy, an active learner can obtain accuracy comparable to a supervised learner while using much less labeled data. This is especially important for multi-label text classification. (A minimal sketch of the loop follows.)
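A minimal sketch of this pool-based loop in Python; the function names `train`, `select_batch`, and `query_labels` are placeholders of mine, not from the paper:

```python
# Minimal pool-based active learning loop; an illustrative sketch,
# not the authors' implementation.

def active_learning_loop(labeled, pool, n_iterations, batch_size,
                         train, select_batch, query_labels):
    model = train(labeled)
    for _ in range(n_iterations):
        # Selection strategy: pick the most informative batch from the pool.
        batch = select_batch(model, pool, batch_size)
        # Query an oracle (human annotator) for the true labels.
        labels = query_labels(batch)
        # Augment the labeled set and shrink the unlabeled pool.
        labeled.extend(zip(batch, labels))
        pool = [x for x in pool if x not in batch]
        model = train(labeled)  # retrain on the augmented set
    return model
```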
Challenges for Multi-Label Active Learning
- How to select the most informative multi-labeled data?
- Can we reuse a selection strategy from the single-label case? No. E.g., suppose the per-class probability estimates are:

        C3     C1     C2
  x1    0.8    0.5    0.1
  x2    0.6    0.1    0.1

  Is x2 more informative? Under single-label uncertainty sampling, yes: its top probability (0.6) is lower than x1's (0.8). But what if x1 actually has two labels? Then labeling x1 resolves more ambiguity, making x1 the more informative choice. (A toy computation follows.)
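To make the numbers concrete, a toy computation; the probabilities are copied from the slide, and treating the top-class probability as the single-label uncertainty signal is my assumption for illustration:

```python
# Toy illustration of why single-label uncertainty can mislead.
probs = {"x1": [0.8, 0.5, 0.1],   # over classes C3, C1, C2
         "x2": [0.6, 0.1, 0.1]}

for name, p in probs.items():
    # Single-label uncertainty sampling looks only at the winning class:
    # the lower its probability, the more "uncertain" the point.
    print(name, "top-class probability:", max(p))
# -> x2 looks more uncertain (0.6 < 0.8), so a single-label strategy
#    queries x2. But x1 has two classes near the boundary (0.8, 0.5);
#    if x1 truly has two labels, labeling it is more informative.
```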
Related Work
- Single-label active learning
  - Uncertainty sampling [SIGIR '94, JMLR '05]
    - Aims to label the most uncertain data
  - Expected-error reduction [NIPS '95, ICML '01, ICCV '03]
    - Labels data so as to minimize the expected error
  - Committee-based [COLT '92, JMLR '02]
    - Labels data with the largest disagreement among several committee members (classifiers) drawn from the version space
- Multi-label active learning
  - BinMin [Springer '06]
    - Minimizes the loss on the most uncertain category for each data point
  - MML [ICIP '04]
    - Optimizes the mean of the SVM hinge loss over the predicted classes
  - Two-dimensional active learning [ICCV '08, TPAMI '08]
    - Minimizes the classification error on image-label pairs
Our Approach: SVM-Based Active Learning for Multi-Label Text Classification

- Optimization goal: maximize the reduction of the expected model loss (notation reconstructed):

  $x^* = \arg\max_{x} \; \mathbb{E}_{\mathbf{y}}\big[\, \mathcal{L}(\mathcal{D}_L) - \mathcal{L}(\mathcal{D}_L \cup \{(x, \mathbf{y})\}) \,\big]$

  where $\mathbf{y} = (y_1, \dots, y_k)$ with $y_i = 1$ if $x$ belongs to category $i$, and $y_i = -1$ otherwise.
Sample Selection Strategy with SVM

- Two main issues
  - Loss reduction: how to measure the loss reduction of the multi-label classifier?
  - Probability estimation: how to provide a good estimate of the conditional probability $P(\mathbf{y} \mid x)$?
Estimation of Loss Reduction

- Decompose the multi-label problem into several binary classifiers
- For each binary classifier, the model loss is measured by the size of its version space
- SVM version space [S. Tong '02] (definition reconstructed):

  $V = \{\, w \in \mathcal{W} : \|w\| = 1, \; y_j \,(w \cdot \Phi(x_j)) > 0 \ \ \forall j \,\}$

  where $\mathcal{W}$ is the parameter space; the size of a version space is defined as the surface area it occupies on the unit hypersphere $\|w\| = 1$ in $\mathcal{W}$.
Estimation of Loss Reduction (Cont.)

- With version space duality, the loss reduction rate can be approximated using the SVM output margin
- Maximize the sum of loss reduction rates over all binary classifiers (formula reconstructed):

  $x^* = \arg\max_{x} \sum_{i=1}^{k} \frac{1 - y_i f_i(x)}{2}$

  where $f_i$ is the binary classifier built on $\mathcal{D}_L$ and associated with class $i$; $y_i = 1$ if $x$ belongs to class $i$, and $y_i = -1$ otherwise; and each term approximates the relative reduction in the size of the version space of classifier $i$.
- If $f_i$ correctly predicts $x$, then $y_i f_i(x) > 0$ and the uncertainty (loss reduction) is small; if $f_i$ does not correctly predict $x$, then $y_i f_i(x) < 0$ and the uncertainty is large. (A scoring sketch follows.)
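A sketch of this scoring rule in Python/NumPy; the helper names and array layout are my assumptions, and `y_stars` holds the predicted label vectors described on the next slides:

```python
import numpy as np

def mmc_scores(margins, y_stars):
    """Sum over the k binary SVMs of the approximate loss-reduction
    rate (1 - y_i * f_i(x)) / 2 from the slide's formula.

    margins : (n_pool, k) SVM decision values f_i(x)
    y_stars : (n_pool, k) predicted label vectors, entries +1 / -1
    """
    return ((1.0 - y_stars * margins) / 2.0).sum(axis=1)

def select_batch(margins, y_stars, batch_size):
    """Return indices of the unlabeled points with the largest scores."""
    scores = mmc_scores(margins, y_stars)
    return np.argsort(scores)[-batch_size:]
```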
Probability Estimation

- Directly computing the expected loss is intractable
  - Limited training data
  - A large number of possible label vectors ($2^k$) for each x
- Approximate the expected loss by the loss under the label vector with the largest conditional probability, $\mathbf{y}^* = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid x)$
How to Predict $\mathbf{y}^*$?

- Main ideas
  - First build a classification model to predict the number of labels each data point may have
  - Then determine the label vector based on the prediction result
How to Predict $\mathbf{y}^*$? (Cont.)

- Assign a probability output to each class
- For each x, sort the per-class probabilities in decreasing order and normalize them so they sum to 1; use these as features. For labeled data, the target is the true label number of x
- Train a logistic regression model on these features and targets
- For each unlabeled data point, predict the probabilities of its having different numbers of labels; if the label number with the largest probability is j, take the j highest-probability classes as the positive labels in $\mathbf{y}^*$. (A sketch of this pipeline follows.)
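A sketch of this pipeline using scikit-learn's LogisticRegression; an illustrative reconstruction under my assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def count_features(prob_matrix):
    """Per row: sort class probabilities in decreasing order and
    normalize them to sum to 1 (the slide's feature construction)."""
    p = np.sort(prob_matrix, axis=1)[:, ::-1]
    return p / p.sum(axis=1, keepdims=True)

def fit_label_count_model(prob_train, Y_train):
    """Target = true number of labels per training document
    (Y_train is an (n, k) 0/1 indicator matrix)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(count_features(prob_train), Y_train.sum(axis=1))
    return clf

def predict_label_vector(clf, prob_row):
    """Predict the label count j, then mark the j classes with the
    highest probabilities positive (+1) and the rest negative (-1)."""
    j = int(clf.predict(count_features(prob_row[None, :]))[0])
    y_star = -np.ones_like(prob_row)
    y_star[np.argsort(prob_row)[::-1][:j]] = 1
    return y_star
```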
Experiments

- Data sets
  - RCV1-V2 [D. D. Lewis '04]
    - Reuters newswire stories
  - Yahoo! webpage collection [N. Ueda '02, H. Kazawa '05]
    - Webpages linked from Yahoo!'s top directory
Experiment Setup

- Compared methods
  - MMC (Maximum loss reduction with Maximal Confidence, the proposed method)
  - BinMin
  - MML
  - Random
- SVMlight [T. Joachims '02] is used as the base classifier
- Performance measure
  - Micro-average F1 score (notation reconstructed):

    $\text{Micro-F1} = \frac{2 \sum_{i} |y_i \cap \hat{y}_i|}{\sum_{i} |y_i| + \sum_{i} |\hat{y}_i|}$

    where $y_i$ are the true labels and $\hat{y}_i$ are the predicted labels of document $i$. (A short sketch follows.)
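For reference, micro-average F1 pools true positives, false positives, and false negatives across all classes before computing F1; a short sketch, equivalent to scikit-learn's `f1_score(..., average='micro')` on 0/1 label matrices:

```python
import numpy as np

def micro_f1(Y_true, Y_pred):
    """Micro-average F1 on (n_docs, n_classes) 0/1 matrices: pool
    TP/FP/FN across every (document, class) pair, then compute F1."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1))
    fp = np.sum((Y_true == 0) & (Y_pred == 1))
    fn = np.sum((Y_true == 1) & (Y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```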
Results on RCV1-V2 Data Set

- Compare the label prediction methods
  - The proposed prediction method
  - SCut [D. D. Lewis '04]
    - Tunes a threshold for each class (a tuning sketch follows this list)
  - SCut (threshold = 0)
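A rough sketch of SCut-style per-class threshold tuning: a grid search that maximizes per-class F1 on validation scores. The grid range and the F1 objective here are my assumptions:

```python
import numpy as np

def scut_thresholds(scores, Y_true, grid=np.linspace(-1.0, 1.0, 41)):
    """Per class, pick the threshold on the SVM score that maximizes
    F1 on validation data. scores: (n, k); Y_true: (n, k) 0/1."""
    k = scores.shape[1]
    thresholds = np.zeros(k)
    for c in range(k):
        best_f1 = -1.0
        for t in grid:
            pred = (scores[:, c] > t).astype(int)
            tp = np.sum((Y_true[:, c] == 1) & (pred == 1))
            fp = np.sum((Y_true[:, c] == 0) & (pred == 1))
            fn = np.sum((Y_true[:, c] == 1) & (pred == 0))
            f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
            if f1 > best_f1:
                best_f1, thresholds[c] = f1, t
    return thresholds
```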
Results on RCV1-V2 Data Set (Cont.)

- Initial labeled set: 500 examples
- 50 iterations, S = 20
Results on RCV1-V2 Data Set (Cont.)

- Vary the size of the initial labeled set; 50 iterations, S = 20
Results on RCV1-V2 Data Set (Cont.)

- Vary the sampling size per run; initial labeled set: 500 examples
- Stop after adding 1,000 labeled examples
Results on Yahoo! Data Set

- Initial labeled set: 500 examples
- 50 iterations, S = 50
Summary

- Multi-label active learning for text classification
  - Important for reducing human labeling effort
  - A challenging task
- SVM-based multi-label active learning
  - Optimizes the loss reduction rate based on the SVM version space
  - Effective label prediction method
  - Successfully reduces labeling effort on real-world datasets
- Future work
  - More efficient evaluation over the unlabeled pool
  - More multi-label classification tasks, e.g., image classification
Thank you!