Title: Learning with Positive and Unlabeled Examples using Weighted Logistic Regression
1. Learning with Positive and Unlabeled Examples using Weighted Logistic Regression
- Wee Sun Lee
- National University of Singapore
- Bing Liu
- University of Illinois, Chicago
2. Personalized Web Browser
- Learn web pages that are of interest to you!
- Information that is available to the browser when it is installed:
  - Your bookmarks (or cached documents): positive examples
  - All documents on the web: unlabeled examples!
3. Direct Marketing
- Company has a database with details of its customers: positive examples
- Wants to find people who are similar to its own customers
- Buys a database consisting of details of people, some of whom may be potential customers: unlabeled examples
4. Assumptions
- All examples are drawn independently from a fixed underlying distribution
- Negative examples are never labeled
- With fixed probability α, a positive example is independently left unlabeled (see the toy sketch below)
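A toy sketch of this sampling model (the generator, names, and data are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pu_labels(y_true, alpha):
    # Negatives are never labeled; each positive is independently
    # left unlabeled with probability alpha.
    labeled_pos = (y_true == 1) & (rng.random(len(y_true)) >= alpha)
    return labeled_pos.astype(int)  # 1 = labeled positive, 0 = unlabeled

y_true = rng.integers(0, 2, size=1000)    # hidden true labels (toy data)
y_pu = make_pu_labels(y_true, alpha=0.3)  # what the learner actually sees
```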
5. Are Unlabeled Examples Helpful?
- Function is known to be either x1 < 0 or x2 > 0
- Which one is it?
- Not learnable with only positive examples. However, the addition of unlabeled examples makes it learnable.
[Figure: the two candidate regions, x1 < 0 and x2 > 0]
6. Related Work
- Denis (1998) showed that function classes learnable in the statistical query model are learnable from positive and unlabeled examples.
- Muggleton (2001) showed that learning from positive examples is possible if the distribution of inputs is known.
- Liu et al. (2002) give sample complexity bounds and an algorithm based on EM.
- Yu et al. (2002) give an algorithm based on SVM.
7. Approach
- Label all unlabeled examples as negative (Denis 1998):
  - Negative examples are always labeled negative
  - Positive examples are labeled negative with probability α
  - Training with one-sided noise
- Problem: α is not known
- Also, what if there is some noise on the negative examples? Negative examples may occasionally be labeled positive with small probability.
8. Selecting Threshold and Robustness to Noise
- Approach: reweight the examples and learn the conditional probability P(Y=1|X)
- If you weight the examples by:
  - multiplying the negative examples by a weight equal to the number of positive examples, and
  - multiplying the positive examples by a weight equal to the number of negative examples ...
9. Selecting Threshold and Robustness to Noise
- ... then P(Y=1|X) > 0.5 when X is a positive example and P(Y=1|X) < 0.5 when X is a negative example, as long as α + β < 1, where:
  - α is the probability that a positive example is labeled negative
  - β is the probability that a negative example is labeled positive
- Okay even if some of the positive examples are not actually positive (noise). (See the weighting sketch below.)
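A minimal numpy sketch of this weighting scheme, assuming y_pu marks the labeled positives as 1 and everything treated as negative (unlabeled plus negative) as 0; the helper name is ours:

```python
import numpy as np

def balancing_weights(y_pu):
    # Each class is weighted by the size of the other class: positives
    # (label 1) get weight n_neg, "negatives" (label 0) get weight n_pos.
    n_pos = int(np.sum(y_pu == 1))
    n_neg = int(np.sum(y_pu == 0))
    return np.where(y_pu == 1, n_neg, n_pos).astype(float)

print(balancing_weights(np.array([1, 0, 0, 0, 1, 0])))  # [4. 2. 2. 2. 4. 2.]
```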
10. Weighted Logistic Regression
- Practical algorithm: reweight the examples, then do logistic regression with a linear function to learn P(Y=1|X) (see the sketch after this list)
- Compose the linear function with a sigmoid, then do maximum likelihood estimation
- Convex optimization problem
- Will learn the correct conditional probability if it can be represented
- Minimizes an upper bound on the weighted classification error if it cannot be represented, so it still makes sense
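One way this could look in practice; scikit-learn is a stand-in (the slides do not prescribe a library), X is toy data, and balancing_weights and y_pu come from the hypothetical sketches above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(len(y_pu), 5))   # toy feature matrix

# Linear function composed with a sigmoid, fit by (penalized) maximum
# likelihood; sample_weight carries the reweighting from slides 8-9.
clf = LogisticRegression(C=1.0)       # sklearn's C is the inverse of slide 11's c
clf.fit(X, y_pu, sample_weight=balancing_weights(y_pu))

p_hat = clf.predict_proba(X)[:, 1]    # estimated P(Y=1 | X)
y_pred = (p_hat > 0.5).astype(int)    # threshold at 0.5, per slide 9
```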
11. Selecting Regularization Parameter
- Regularization is important when learning with noise
- Add c times the sum of the squared weight values to the cost function as regularization
- How do we choose the value of c?
- When both positive and negative examples are available, a validation set can be used to choose c
- Weighted examples in a validation set could be used to choose c, but does this make sense?
12. Selecting Regularization Parameter
- Performance criterion pr/P(Y=1) can be estimated directly from the validation set as r^2/P(f(X)=1)
  - Recall r = P(f(X)=1 | Y=1)
  - Precision p = P(Y=1 | f(X)=1)
  - (By Bayes' rule, p = r P(Y=1)/P(f(X)=1), so pr/P(Y=1) = r^2/P(f(X)=1); both r and P(f(X)=1) can be estimated without negative labels.)
- Can be used for:
  - tuning the regularization parameter c
  - comparing different algorithms when only positive and unlabeled examples (no negatives) are available (sketch below)
- Behavior similar to the commonly used F-score, F = 2pr/(p+r)
- Reasonable whenever use of the F-score is reasonable
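A sketch of this estimator on a validation set (function and argument names are ours); note that it needs only the labeled positives and the overall rate of positive predictions, never a negative label:

```python
import numpy as np

def pu_score(y_pred, is_labeled_pos):
    # Estimate pr / P(Y=1) as r^2 / P(f(X)=1): r is estimated on the
    # labeled positives (representative, since positives are left
    # unlabeled at random); P(f(X)=1) on the whole validation set.
    r = y_pred[is_labeled_pos].mean()   # recall estimate, P(f(X)=1 | Y=1)
    p_fx1 = y_pred.mean()               # estimate of P(f(X)=1)
    return r ** 2 / p_fx1 if p_fx1 > 0 else 0.0
```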
13. Experimental Setup
- 20 Newsgroups dataset
  - 1 group positive, 19 others negative
- Term frequencies as features, normalized to length 1
- Randomly split:
  - 50% train
  - 20% validation
  - 30% test
- Validation set used to select the regularization parameter from a small discrete set, then retrain on the training + validation set (protocol sketch below)
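Putting the earlier sketches together, one plausible rendering of this protocol; the toy data, the candidate grid for c, and all helper names are assumptions, not the authors' code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
tf_counts = rng.poisson(1.0, size=(1000, 50)).astype(float)  # toy term counts
X = normalize(tf_counts, norm="l2")   # term frequencies, normalized to length 1

X_tr, X_rest, y_tr, y_rest = train_test_split(X, y_pu, train_size=0.5)   # 50% train
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, train_size=0.4)  # 20%/30%

def fit(c, X, y):
    # weighted logistic regression from the earlier sketch
    return LogisticRegression(C=c).fit(X, y, sample_weight=balancing_weights(y))

# Select c from a small discrete set by the pu_score criterion ...
best_c = max([0.01, 0.1, 1.0, 10.0],
             key=lambda c: pu_score(fit(c, X_tr, y_tr).predict(X_val), y_val == 1))

# ... then retrain on training + validation before evaluating on the test set.
final = fit(best_c, np.vstack([X_tr, X_val]), np.concatenate([y_tr, y_val]))
```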
14. Results
F-score averaged over 20 groups:

α     Opt     pr/P(Y=1)   Weighted Error   S-EM    1-Cls SVM
0.3   0.757   0.754       0.646            0.661   0.15
0.7   0.675   0.659       0.619            0.59    0.153
15. Conclusions
- Learning from positive and unlabeled examples by learning P(Y=1|X) after setting all unlabeled examples to negative
- Reweighting the examples allows a threshold at 0.5 and makes the method tolerant to negative examples that are mislabeled as positive
- The performance measure pr/P(Y=1) can be estimated from data
  - Useful when the F-score is reasonable
  - Can be used to select the regularization parameter
- Logistic regression with a linear function, combined with these methods, works well on text classification