Learning with Positive and Unlabeled Examples using Weighted Logistic Regression


1
Learning with Positive and Unlabeled Examples
using Weighted Logistic Regression
  • Wee Sun Lee
  • National University of Singapore
  • Bing Liu
  • University of Illinois, Chicago

2
Personalized Web Browser
  • Learn web pages that are of interest to you!
  • Information that is available to the browser when it
    is installed:
  • Your bookmarks (or cached documents): positive
    examples
  • All documents on the web: unlabeled examples!

3
Direct Marketing
  • Company has a database with details of its customers:
    positive examples
  • Wants to find people who are similar to its own
    customers
  • Buys a database consisting of details of people,
    some of whom may be potential customers:
    unlabeled examples

4
Assumptions
  • All examples are drawn independently from a fixed
    underlying distribution
  • Negative examples are never labeled
  • With fixed probability α, a positive example is
    independently left unlabeled.

5
Are Unlabeled Examples Helpful?
  • Function known to be either x1 < 0 or x2 > 0
  • Which one is it?
  • Not learnable with only positive examples.
  • However, the addition of unlabeled examples makes it
    learnable.
  [Figure: two candidate regions, labeled x1 < 0 and x2 > 0]

6
Related Works
  • Denis (1998) showed that function classes
    learnable in the statistical query model are
    learnable from positive and unlabeled examples.
  • Muggleton (2001) showed that learning from
    positive examples is possible if the distribution
    of inputs is known.
  • Liu et al. (2002) give sample complexity bounds
    and an algorithm based on EM
  • Yu et al. (2002) give an algorithm based on SVM

7
Approach
  • Label all unlabeled examples as negative (Denis
    1998)
  • Negative examples are always labeled negative
  • Positive examples are labeled negative with
    probability α
  • Training with one-sided noise
  • Problem: α is not known
  • Also, what if there is some noise on the negative
    examples? Negative examples are occasionally labeled
    positive with small probability.
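
The labeling process described on this slide can be simulated in a few lines. This is only a sketch under the slide's assumptions: the name `alpha` for the probability that a positive example is left unlabeled, the helper `make_pu_labels`, and the toy data are all illustrative choices, not part of the original talk.

```python
import random

def make_pu_labels(true_labels, alpha, rng):
    """Turn true 0/1 labels into observed labels with one-sided noise.

    Each positive is left unlabeled (recorded as 0) with probability alpha;
    negatives are never labeled, so they are always recorded as 0.
    """
    observed = []
    for y in true_labels:
        if y == 1 and rng.random() >= alpha:
            observed.append(1)   # positive that keeps its label
        else:
            observed.append(0)   # unlabeled: true negative or hidden positive
    return observed

rng = random.Random(0)                      # fixed seed for repeatability
true_labels = [1] * 1000 + [0] * 1000       # toy ground truth (hypothetical)
observed = make_pu_labels(true_labels, alpha=0.3, rng=rng)

# Count positives that ended up in the "negative" (unlabeled) set.
flipped = sum(1 for y, s in zip(true_labels, observed) if y == 1 and s == 0)
```

Roughly a fraction `alpha` of the positives end up hidden among the unlabeled examples, which is the one-sided noise the learner has to cope with.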

8
Selecting Threshold and Robustness to Noise
  • Approach: reweight examples and learn the conditional
    probability P(Y=1|X)
  • If you weight the examples by
  • multiplying the negative examples by a weight
    equal to the number of positive examples, and
  • multiplying the positive examples by a weight
    equal to the number of negative examples

9
Selecting Threshold and Robustness to Noise
  • Then P(Y=1|X) > 0.5 when X is a positive example
    and P(Y=1|X) < 0.5 when X is a negative example,
    as long as
  • α + β < 1, where
  • α is the probability that a positive example is labeled
    negative
  • β is the probability that a negative example is labeled
    positive
  • This holds even if some of the labeled positive examples
    are not actually positive (noise).
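
The threshold claim can be checked numerically. The sketch below assumes a deterministic target, a class prior `prior` for true positives, and the names `alpha` and `beta` for the two label-flip probabilities; all of these names are illustrative. Weighting positives by the number of negatives and negatives by the number of positives is equivalent, up to a constant, to weighting by the observed class fractions, which is what the code uses.

```python
def observed_positive_rate(prior, alpha, beta):
    """Fraction of examples that carry a positive label after the noise."""
    return prior * (1 - alpha) + (1 - prior) * beta

def weighted_posterior(p_label_pos, q):
    """Reweighted P(label = 1 | x) at a point whose label is positive with
    probability p_label_pos, when labeled positives are weighted by (1 - q)
    and labeled negatives by q (q = observed positive rate)."""
    num = p_label_pos * (1 - q)
    return num / (num + (1 - p_label_pos) * q)

def posteriors(prior, alpha, beta):
    """Reweighted posterior at a truly positive and a truly negative point."""
    q = observed_positive_rate(prior, alpha, beta)
    at_positive = weighted_posterior(1 - alpha, q)  # labeled positive w.p. 1 - alpha
    at_negative = weighted_posterior(beta, q)       # labeled positive w.p. beta
    return at_positive, at_negative

# alpha + beta < 1: the 0.5 threshold separates the two kinds of points.
pos_ok, neg_ok = posteriors(prior=0.05, alpha=0.7, beta=0.1)
# alpha + beta > 1: the ordering flips and the guarantee is lost.
pos_bad, neg_bad = posteriors(prior=0.05, alpha=0.7, beta=0.4)
```

Working through the algebra, the posterior at a truly positive point exceeds 0.5 exactly when 1 - alpha > q, which simplifies to alpha + beta < 1 regardless of the prior.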

10
Weighted Logistic Regression
  • Practical algorithm: reweight the examples and
    then do logistic regression with a linear function
    to learn P(Y=1|X).
  • Compose a linear function with a sigmoid, then do
    maximum likelihood estimation
  • Convex optimization problem
  • Will learn the correct conditional probability if
    it can be represented
  • If it cannot be represented, it minimizes an upper
    bound on the weighted classification error, so it
    still makes sense.
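
A minimal sketch of the procedure on this slide, written with plain gradient descent in standard-library Python. The optimizer, learning rate, epoch count, the decision to leave the bias unpenalized, and the toy 1-D data are all assumptions made for illustration; the talk does not say how the optimization was carried out. The optional `c` argument is the squared-weight penalty coefficient.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_logreg(X, y, c=0.0, lr=0.5, epochs=2000):
    """Weighted maximum-likelihood logistic regression with a linear function.

    Labeled positives get weight n_neg and labeled negatives get weight n_pos
    (normalized to sum to 1). Needs at least one example of each label.
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    total = 2.0 * n_pos * n_neg
    weights = [(n_neg if yi == 1 else n_pos) / total for yi in y]
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [2.0 * c * wj for wj in w]      # gradient of c * sum(w_j ** 2)
        grad_b = 0.0                             # bias is not regularized (assumption)
        for xi, yi, si in zip(X, y, weights):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = si * (p - yi)                  # weighted log-loss gradient term
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: true negatives have x < 0, true positives have x > 0,
# but only half of the positives carry a label (the rest are "unlabeled" = 0).
X = [[-2.5], [-2.0], [-1.8], [-1.5], [-1.2], [-1.0],
     [1.0], [1.5], [2.0], [2.5], [1.2], [1.8]]
y = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0]
w, b = fit_weighted_logreg(X, y)
```

On this toy set, `predict_proba(w, b, [1.5])` lands above 0.5 even though the point at x = 1.5 was itself unlabeled, while points on the negative side stay below 0.5.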

11
Selecting Regularization Parameter
  • Regularization is important when learning with noise
  • Add c times the sum of squared values of the weights
    to the cost function as regularization
  • How to choose the value of c?
  • When both positive and negative examples are
    available, can use a validation set to choose c.
  • Can use weighted examples in a validation set to
    choose c, but not sure if this makes sense?

12
Selecting Regularization Parameter
  • Performance criterion pr/P(Y=1) can be estimated
    directly from the validation set as r^2/P(f(X)=1)
  • Recall r = P(f(X)=1 | Y=1)
  • Precision p = P(Y=1 | f(X)=1)
  • Can be used for
  • tuning the regularization parameter c
  • comparing different algorithms when only
    positive and unlabeled examples (no negatives) are
    available
  • Behavior similar to the commonly used F-score F =
    2pr/(p+r)
  • Reasonable when use of the F-score is reasonable
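
The identity behind this criterion follows from Bayes' rule: p = r P(Y=1) / P(f(X)=1), so pr/P(Y=1) = r^2 / P(f(X)=1), and neither factor on the right needs labeled negatives. A rough sketch of the estimate, assuming recall is measured on the labeled positives in the validation set and P(f(X)=1) on the whole validation set; the function name `pu_score` and the toy inputs are hypothetical.

```python
def pu_score(predictions, labeled_positive):
    """Estimate r^2 / P(f(X)=1) from a positive + unlabeled validation set.

    predictions:      0/1 classifier outputs f(x), one per validation example
    labeled_positive: 1 if the example is a labeled positive, 0 if unlabeled
    """
    n = len(predictions)
    n_pos = sum(labeled_positive)
    predicted_pos_rate = sum(predictions) / n          # estimate of P(f(X)=1)
    if n_pos == 0 or predicted_pos_rate == 0:
        return 0.0                                     # nothing to score
    hits = sum(f for f, s in zip(predictions, labeled_positive) if s == 1)
    recall = hits / n_pos                              # estimate of r
    return recall * recall / predicted_pos_rate

# Toy validation set: 4 labeled positives followed by 6 unlabeled examples.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds  = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
score = pu_score(preds, labels)            # recall 3/4, predicted-positive rate 4/10
baseline = pu_score([1] * 10, labels)      # classifier that predicts everything positive
```

A classifier that labels everything positive scores exactly 1 under this estimate, so values above 1 indicate the classifier is doing better than that trivial baseline.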

13
Experimental Setup
  • 20 Newsgroups dataset
  • 1 group positive, 19 others negative
  • Term frequency as features, normalized to length
    1
  • Randomly split
  • 50% train
  • 20% validation
  • 30% test
  • Validation set used to select the regularization
    parameter from a small discrete set, then retrain on
    the training + validation set
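
The random 50/20/30 split could be produced along the following lines. This is a sketch only: the helper name `split_indices`, the seed, and the rounding rule are assumptions, and the slides do not say how the split was actually generated.

```python
import random

def split_indices(n, fractions=(0.5, 0.2, 0.3), seed=0):
    """Shuffle indices 0..n-1 and cut them into train / validation / test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # seeded for repeatability
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]               # remainder goes to the test set
    return train, val, test

train, val, test = split_indices(100)
# After choosing the regularization parameter on `val`,
# the final model is retrained on train + val.
retrain = train + val
```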

14
Results
F-score averaged over 20 groups
α     Opt     pr/P(Y=1)   Weighted Error   S-EM    1-Cls SVM
0.3   0.757   0.754       0.646            0.661   0.15
0.7   0.675   0.659       0.619            0.59    0.153

15
Conclusions
  • Learning from positive and unlabeled examples by
    learning P(Y=1|X) after setting all unlabeled
    examples negative.
  • Reweighting examples allows a threshold at 0.5 and
    makes the method tolerant to negative examples that
    are mislabeled as positive
  • Performance measure pr/P(Y=1) can be estimated
    from data
  • Useful when the F-score is reasonable
  • Can be used to select the regularization parameter
  • Logistic regression using a linear function and
    these methods works well on text classification