WWW2005 Poster

The Chinese University of Hong Kong
Web Page Classification with Heterogeneous Data Fusion
Zenglin Xu, Irwin King and Michael R. Lyu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
{zlxu, king, lyu}@cse.cuhk.edu.hk
Motivations
1
  • For web page classification, there are many available data sources, such as the text, the title, the meta data, the anchor text, etc.
  • Simply putting them together would not greatly enhance the classification performance.
  • Data sources of different dimensions and types can be represented in a common format: the kernel matrix.
  • A kernel learning approach is thus proposed to integrate multiple data sources.
Contributions
2
  • A systematic way of integrating multiple data sources.
  • Better classification accuracy.

Architecture Model
3
  • 1. Feature Extraction.
  • 2. Similarity Representation. Each data source
    is represented as a kernel matrix (Ki)
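  • 3. Similarity Combination. The kernel matrices Ki are combined into a single kernel matrix K.
  • 4. Classification.
  • K is substituted into the dual SVM.
  • This leads to a QCQP (quadratically constrained quadratic programming) problem.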
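
Steps 2 and 3 can be sketched in Python as follows. This is only an illustration under assumptions that are not stated on the poster: each source uses a linear kernel, each kernel matrix is cosine-normalized, and the combination weights are fixed by hand instead of being learned through the QCQP. All function names and the toy feature vectors are hypothetical.

```python
import math


def linear_kernel(features):
    """Gram matrix K[i][j] = <x_i, x_j> for one data source."""
    n = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(n)] for i in range(n)]


def normalize_kernel(K):
    """Cosine-normalize K so that sources on different scales are comparable."""
    n = len(K)
    d = [math.sqrt(K[i][i]) for i in range(n)]
    return [[K[i][j] / (d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices with non-negative weights."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Toy data (made up): 3 pages, a 2-dimensional title source and a
# 3-dimensional plain-text source.
title_features = [[1, 0], [1, 0], [0, 1]]
text_features = [[2, 0, 1], [1, 0, 1], [0, 3, 0]]

# Both sources end up as 3 x 3 matrices, whatever their feature dimension.
K_title = normalize_kernel(linear_kernel(title_features))
K_text = normalize_kernel(linear_kernel(text_features))

# Equal weights are an assumption made for this sketch only.
K = combine_kernels([K_title, K_text], [0.5, 0.5])
for row in K:
    print([round(v, 3) for v in row])
```

The combined matrix could then be passed to any SVM solver that accepts a precomputed kernel. Learning the weights themselves is the job of the QCQP and is not attempted here.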

Experiment results
4
  • Dataset: DMOZ
  • AT: Anchor Text
  • LT: Link Text
  • MT: Meta Data
  • TI: Title
  • PT: Plain Text
  • UW: Universally Weighted sources
  • KC: sources by Kernel Combination
  • Mi-F1: Micro-F1
  • Ma-F1: Macro-F1

WWW 2007, May 8-12, 2007, Banff, Alberta, Canada.