Effective Multi-Label Active Learning for Text Classification

Bishan Yang¹, Jian-Tao Sun², Tengjiao Wang¹, and Zheng Chen²
¹ Computer Science Department, Peking University
² Microsoft Research Asia

KDD 2009, Paris
Outline
- Motivation
- Related Work
- SVM-Based Active Learning for Multi-Label Text Classification
- Experiments
- Summary
Motivation
- Text classification is everywhere
  - Web search
  - News classification
  - Email classification
- Many text documents are multi-labeled
  [Figure: a single news article tagged with several categories at once, e.g. Business, Politics, Travel, World news, Entertainment, Local news]
Labeling Effort Is Huge
- Supervised learning approach
  - The model is trained on a set of randomly labeled data
  - Requires a sufficient amount of labeled data to ensure the quality of the model
The more categories, the more judging effort for each document, and the more data that needs to be labeled.
[Figure: every document must be judged against each category C1-C5, so the labeling effort multiplies with the number of categories]
Active Learning Reduces Labeling Effort

[Figure: pool-based active learning loop - train the classifier, select an optimal set from the data pool via the selection strategy, query for the true labels, augment the labeled set, and repeat]

With an effective selection strategy, an active learner can obtain accuracy comparable to a supervised learner while using much less labeled data. This is especially important for multi-label text classification. (A minimal sketch of the loop follows.)
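A minimal sketch of this pool-based loop in Python; the function names `train`, `select_batch`, and `query_labels` are placeholders of mine, not from the paper:

```python
# Minimal pool-based active learning loop; an illustrative sketch,
# not the authors' implementation.

def active_learning_loop(labeled, pool, n_iterations, batch_size,
                         train, select_batch, query_labels):
    model = train(labeled)
    for _ in range(n_iterations):
        # Selection strategy: pick the most informative batch from the pool.
        batch = select_batch(model, pool, batch_size)
        # Query an oracle (human annotator) for the true labels.
        labels = query_labels(batch)
        # Augment the labeled set and shrink the unlabeled pool.
        labeled.extend(zip(batch, labels))
        pool = [x for x in pool if x not in batch]
        model = train(labeled)  # retrain on the augmented set
    return model
```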
Challenges for Multi-Label Active Learning
- How to select the most informative multi-labeled data?
- Can we reuse a selection strategy from the single-label case? No. E.g., suppose the per-class probability estimates are:

        C3     C1     C2
  x1    0.8    0.5    0.1
  x2    0.6    0.1    0.1

  Is x2 more informative? Under single-label uncertainty sampling, yes: its top probability (0.6) is lower than x1's (0.8). But what if x1 actually has two labels? Then labeling x1 resolves more ambiguity, making x1 the more informative choice. (A toy computation follows.)
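To make the numbers concrete, a toy computation; the probabilities are copied from the slide, and treating the top-class probability as the single-label uncertainty signal is my assumption for illustration:

```python
# Toy illustration of why single-label uncertainty can mislead.
probs = {"x1": [0.8, 0.5, 0.1],   # over classes C3, C1, C2
         "x2": [0.6, 0.1, 0.1]}

for name, p in probs.items():
    # Single-label uncertainty sampling looks only at the winning class:
    # the lower its probability, the more "uncertain" the point.
    print(name, "top-class probability:", max(p))
# -> x2 looks more uncertain (0.6 < 0.8), so a single-label strategy
#    queries x2. But x1 has two classes near the boundary (0.8, 0.5);
#    if x1 truly has two labels, labeling it is more informative.
```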
Related Work
- Single-label active learning
  - Uncertainty sampling [SIGIR '94, JMLR '05]
    - Aims to label the most uncertain data
  - Expected-error reduction [NIPS '95, ICML '01, ICCV '03]
    - Labels data so as to minimize the expected error
  - Committee-based [COLT '92, JMLR '02]
    - Labels data with the largest disagreement among several committee members (classifiers) drawn from the version space
- Multi-label active learning
  - BinMin [Springer '06]
    - Minimizes the loss on the most uncertain category for each data point
  - MML [ICIP '04]
    - Optimizes the mean of the SVM hinge loss over the predicted classes
  - Two-dimensional active learning [ICCV '08, TPAMI '08]
    - Minimizes the classification error on image-label pairs
Our Approach: SVM-Based Active Learning for Multi-Label Text Classification

- Optimization goal: maximize the reduction of the expected model loss (notation reconstructed):

  $x^* = \arg\max_{x} \; \mathbb{E}_{\mathbf{y}}\big[\, \mathcal{L}(\mathcal{D}_L) - \mathcal{L}(\mathcal{D}_L \cup \{(x, \mathbf{y})\}) \,\big]$

  where $\mathbf{y} = (y_1, \dots, y_k)$ with $y_i = 1$ if $x$ belongs to category $i$, and $y_i = -1$ otherwise.
Sample Selection Strategy with SVM

- Two main issues
  - Loss reduction: how to measure the loss reduction of the multi-label classifier?
  - Probability estimation: how to provide a good estimate of the conditional probability $P(\mathbf{y} \mid x)$?
Estimation of Loss Reduction

- Decompose the multi-label problem into several binary classifiers
- For each binary classifier, the model loss is measured by the size of its version space
- SVM version space [S. Tong '02] (definition reconstructed):

  $V = \{\, w \in \mathcal{W} : \|w\| = 1, \; y_j \,(w \cdot \Phi(x_j)) > 0 \ \ \forall j \,\}$

  where $\mathcal{W}$ is the parameter space; the size of a version space is defined as the surface area it occupies on the unit hypersphere $\|w\| = 1$ in $\mathcal{W}$.
Estimation of Loss Reduction (Cont.)

- With version space duality, the loss reduction rate can be approximated using the SVM output margin
- Maximize the sum of loss reduction rates over all binary classifiers (formula reconstructed):

  $x^* = \arg\max_{x} \sum_{i=1}^{k} \frac{1 - y_i f_i(x)}{2}$

  where $f_i$ is the binary classifier built on $\mathcal{D}_L$ and associated with class $i$; $y_i = 1$ if $x$ belongs to class $i$, and $y_i = -1$ otherwise; and each term approximates the relative reduction in the size of the version space of classifier $i$.
- If $f_i$ correctly predicts $x$, then $y_i f_i(x) > 0$ and the uncertainty (loss reduction) is small; if $f_i$ does not correctly predict $x$, then $y_i f_i(x) < 0$ and the uncertainty is large. (A scoring sketch follows.)
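A sketch of this scoring rule in Python/NumPy; the helper names and array layout are my assumptions, and `y_stars` holds the predicted label vectors described on the next slides:

```python
import numpy as np

def mmc_scores(margins, y_stars):
    """Sum over the k binary SVMs of the approximate loss-reduction
    rate (1 - y_i * f_i(x)) / 2 from the slide's formula.

    margins : (n_pool, k) SVM decision values f_i(x)
    y_stars : (n_pool, k) predicted label vectors, entries +1 / -1
    """
    return ((1.0 - y_stars * margins) / 2.0).sum(axis=1)

def select_batch(margins, y_stars, batch_size):
    """Return indices of the unlabeled points with the largest scores."""
    scores = mmc_scores(margins, y_stars)
    return np.argsort(scores)[-batch_size:]
```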
Probability Estimation

- Directly computing the expected loss is intractable
  - Limited training data
  - A large number of possible label vectors ($2^k$) for each x
- Approximate the expected loss by the loss under the label vector with the largest conditional probability, $\mathbf{y}^* = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid x)$
How to Predict $\mathbf{y}^*$?

- Main ideas
  - First build a classification model to predict the number of labels each data point may have
  - Then determine the label vector based on the prediction result
How to Predict $\mathbf{y}^*$? (Cont.)

- Assign a probability output to each class
- For each x, sort the per-class probabilities in decreasing order and normalize them so they sum to 1; use these as features. For labeled data, the target is the true label number of x
- Train a logistic regression model on these features and targets
- For each unlabeled data point, predict the probabilities of its having different numbers of labels; if the label number with the largest probability is j, take the j highest-probability classes as the positive labels in $\mathbf{y}^*$. (A sketch of this pipeline follows.)
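A sketch of this pipeline using scikit-learn's LogisticRegression; an illustrative reconstruction under my assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def count_features(prob_matrix):
    """Per row: sort class probabilities in decreasing order and
    normalize them to sum to 1 (the slide's feature construction)."""
    p = np.sort(prob_matrix, axis=1)[:, ::-1]
    return p / p.sum(axis=1, keepdims=True)

def fit_label_count_model(prob_train, Y_train):
    """Target = true number of labels per training document
    (Y_train is an (n, k) 0/1 indicator matrix)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(count_features(prob_train), Y_train.sum(axis=1))
    return clf

def predict_label_vector(clf, prob_row):
    """Predict the label count j, then mark the j classes with the
    highest probabilities positive (+1) and the rest negative (-1)."""
    j = int(clf.predict(count_features(prob_row[None, :]))[0])
    y_star = -np.ones_like(prob_row)
    y_star[np.argsort(prob_row)[::-1][:j]] = 1
    return y_star
```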
Experiments

- Data sets
  - RCV1-V2 [D. D. Lewis '04]
    - Reuters newswire stories
  - Yahoo! webpage collection [N. Ueda '02, H. Kazawa '05]
    - Webpages linked from Yahoo!'s top directory
Experiment Setup

- Compared methods
  - MMC (Maximum loss reduction with Maximal Confidence, the proposed method)
  - BinMin
  - MML
  - Random
- SVMlight [T. Joachims '02] is used as the base classifier
- Performance measure
  - Micro-average F1 score (notation reconstructed):

    $\text{Micro-F1} = \frac{2 \sum_{i} |y_i \cap \hat{y}_i|}{\sum_{i} |y_i| + \sum_{i} |\hat{y}_i|}$

    where $y_i$ are the true labels and $\hat{y}_i$ are the predicted labels of document $i$. (A short sketch follows.)
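For reference, micro-average F1 pools true positives, false positives, and false negatives across all classes before computing F1; a short sketch, equivalent to scikit-learn's `f1_score(..., average='micro')` on 0/1 label matrices:

```python
import numpy as np

def micro_f1(Y_true, Y_pred):
    """Micro-average F1 on (n_docs, n_classes) 0/1 matrices: pool
    TP/FP/FN across every (document, class) pair, then compute F1."""
    tp = np.sum((Y_true == 1) & (Y_pred == 1))
    fp = np.sum((Y_true == 0) & (Y_pred == 1))
    fn = np.sum((Y_true == 1) & (Y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```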
Results on RCV1-V2 Data Set

- Compare the label prediction methods
  - The proposed prediction method
  - SCut [D. D. Lewis '04]
    - Tunes a threshold for each class (a tuning sketch follows this list)
  - SCut (threshold = 0)
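A rough sketch of SCut-style per-class threshold tuning: a grid search that maximizes per-class F1 on validation scores. The grid range and the F1 objective here are my assumptions:

```python
import numpy as np

def scut_thresholds(scores, Y_true, grid=np.linspace(-1.0, 1.0, 41)):
    """Per class, pick the threshold on the SVM score that maximizes
    F1 on validation data. scores: (n, k); Y_true: (n, k) 0/1."""
    k = scores.shape[1]
    thresholds = np.zeros(k)
    for c in range(k):
        best_f1 = -1.0
        for t in grid:
            pred = (scores[:, c] > t).astype(int)
            tp = np.sum((Y_true[:, c] == 1) & (pred == 1))
            fp = np.sum((Y_true[:, c] == 0) & (pred == 1))
            fn = np.sum((Y_true[:, c] == 1) & (pred == 0))
            f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
            if f1 > best_f1:
                best_f1, thresholds[c] = f1, t
    return thresholds
```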
Results on RCV1-V2 Data Set (Cont.)

- Initial labeled set: 500 examples
- 50 iterations, S = 20
Results on RCV1-V2 Data Set (Cont.)

- Vary the size of the initial labeled set; 50 iterations, S = 20
Results on RCV1-V2 Data Set (Cont.)

- Vary the sampling size per run; initial labeled set: 500 examples
- Stop after adding 1,000 labeled examples
Results on Yahoo! Data Set

- Initial labeled set: 500 examples
- 50 iterations, S = 50
Summary

- Multi-label active learning for text classification
  - Important for reducing human labeling effort
  - A challenging task
- SVM-based multi-label active learning
  - Optimizes the loss reduction rate based on the SVM version space
  - Effective label prediction method
  - Successfully reduces labeling effort on real-world datasets
- Future work
  - More efficient evaluation over the unlabeled pool
  - More multi-label classification tasks, e.g., image classification
Thank you!