PEBL: Web Page Classification without Negative Examples

1
PEBL: Web Page Classification without Negative Examples
  • Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang
  • IEEE Transactions on Knowledge and Data
    Engineering, Vol. 16, No. 1, 2004
  • Presented by Chirayu Wongchokprasitti

2
Introduction
  • Web page classification is one of the main
    techniques for Web mining
  • Constructing a classifier typically requires both
    positive and negative training examples
  • Collecting negative training examples is laborious
    and must be done carefully to avoid bias

3
Typical Learning Framework

4
Positive Example Based Learning (PEBL) Framework
  • Learn from positive data and unlabeled data
  • Unlabeled data are random samples of the
    universal set
  • Apply the Mapping-Convergence (M-C) Algorithm

5
Mapping-Convergence (M-C) Algorithm
  • Consists of two stages
  • Mapping stage
      • Use any classifier that does not generate
        false negatives
      • The authors chose 1-DNF (monotone Disjunctive
        Normal Form)
  • Convergence stage
      • Maximizes the margin
      • The authors chose an SVM (Support Vector
        Machine)

6
Mapping Stage
  • Use a weak classifier to draw an initial
    approximation of the strong negative data
  • First, identify strong positive features by
    comparing each feature's frequency in the
    positive data and in the unlabeled data
  • A feature is a strong positive if it is more
    frequent in the positive data than in the
    unlabeled (universal) data
  • Then filter out any possibly positive example
    (one containing a strong positive feature),
    leaving only strong negatives
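
The mapping stage above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each page is represented as a set of feature strings and that "frequency" means the fraction of documents containing the feature. The function names are hypothetical.

```python
from collections import Counter


def strong_positive_features(positive_docs, unlabeled_docs):
    """Return features that occur in a larger fraction of positive
    documents than of unlabeled documents (assumed frequency measure)."""
    pos_df = Counter(f for doc in positive_docs for f in set(doc))
    unl_df = Counter(f for doc in unlabeled_docs for f in set(doc))
    n_pos, n_unl = len(positive_docs), len(unlabeled_docs)
    return {
        feature
        for feature, count in pos_df.items()
        if count / n_pos > unl_df[feature] / n_unl
    }


def map_strong_negatives(positive_docs, unlabeled_docs):
    """Split the unlabeled data into strong negatives (no strong positive
    feature at all) and the remaining, possibly positive, documents."""
    features = strong_positive_features(positive_docs, unlabeled_docs)
    strong_negatives = [d for d in unlabeled_docs if not (set(d) & features)]
    possibly_positive = [d for d in unlabeled_docs if set(d) & features]
    return features, strong_negatives, possibly_positive
```

Because a document is kept as a strong negative only when it contains none of the strong positive features, the filter errs on the side of treating unlabeled pages as possibly positive, which is the behavior the slide asks of the mapping-stage classifier.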

7
Convergence Stage
  • Use an SVM to narrow down the class boundary
  • Retrain the SVM iteratively, each time extracting
    more negative data from the unlabeled data
  • The boundary converges toward the true boundary
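
The iteration described above can be sketched as a loop around any trainable classifier. This is a hedged sketch, not the authors' code: `converge`, `train`, and `max_iters` are hypothetical names, and to keep the example dependency-free a nearest-centroid classifier on one-dimensional points stands in for the SVM that the slides use.

```python
def converge(positives, strong_negatives, unlabeled, train, max_iters=100):
    """Grow the negative set until the classifier finds no new negatives.

    `train(pos, neg)` must return a function is_positive(x) -> bool.
    """
    negatives = list(strong_negatives)
    remaining = list(unlabeled)
    is_positive = train(positives, negatives)
    for _ in range(max_iters):
        new_negatives = [x for x in remaining if not is_positive(x)]
        if not new_negatives:
            break  # boundary has stopped moving
        negatives.extend(new_negatives)
        remaining = [x for x in remaining if is_positive(x)]
        is_positive = train(positives, negatives)
    return is_positive, negatives, remaining


def train_centroid(pos, neg):
    """Stand-in learner (NOT an SVM): assign x to the nearer class mean."""
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    return lambda x: abs(x - pos_mean) <= abs(x - neg_mean)


classifier, negatives, remaining = converge(
    positives=[9.0, 10.0, 11.0],
    strong_negatives=[0.0, 1.0],
    unlabeled=[2.0, 4.0, 6.0, 9.5, 10.5],
    train=train_centroid,
)
```

Swapping `train_centroid` for a function that fits a margin-maximizing classifier would bring the sketch closer to the convergence stage as presented; the loop structure itself would stay the same.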

8
Support Vector Machines
Visualization of a Support Vector Machine

9
Convergence of SVM

10
Data Flow Diagram

11
Experimental Results
  • Results are reported as the precision-recall
    breakeven point (P-R)
  • Experiment 1: the Internet
      • Uses DMOZ as the universal set
  • Experiment 2: university CS department
      • Uses the WebKB data set
  • Mixture Models
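
For readers unfamiliar with the precision-recall breakeven point, the sketch below shows one common way to compute it. The slides do not say how the value was calculated, so this is an assumption: documents are ranked by classifier score and the cutoff is set to the number of true positives, at which point precision and recall are equal by construction. The function name is hypothetical.

```python
def pr_breakeven(scores, labels):
    """Precision (= recall) when the top-k documents are predicted positive,
    with k equal to the number of truly positive documents.

    `scores` are classifier outputs (higher means more positive);
    `labels` are 1 for positive and 0 for negative.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_pos])
    # precision = TP / k and recall = TP / n_pos coincide because k == n_pos
    return true_positives / n_pos


value = pr_breakeven([0.9, 0.8, 0.7, 0.4, 0.2], [1, 0, 1, 1, 0])
```

In this toy ranking, two of the top three documents are truly positive, so both precision and recall are 2/3 at the cutoff.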

12
Experiment 1

13
Experiment 2

14
Mixture Models

15
Summary and Conclusions
  • The PEBL framework eliminates the need for manually
    collecting negative training examples
  • The Mapping-Convergence (M-C) algorithm achieves
    classification accuracy as high as that of a
    traditional SVM
  • PEBL's training time still needs to be reduced