Title: The Co-occurrence Retrieval Framework applied to Text Classification
1. The Co-occurrence Retrieval Framework applied to Text Classification
- Jonathon Read
- j.l.read_at_sussex.ac.uk
- http://www.sussex.ac.uk/Users/jlr24
2. Outline
- Introduction
- Feature Retrieval for Text Classification
- Model optimisation
- Some initial results
- Some observations
- Future work
3. Introduction
- Co-occurrence Retrieval Framework
- Weeds 2003
- Weeds and Weir 2003
- Measuring the distributional similarity of words using co-occurrence information
4. Introduction
- Measuring the similarity of two feature vectors, w1 and w2
- By analogy with Information Retrieval:
- w1 contains the retrieved features
- w2 contains the desired features
- Precision: the proportion of retrieved features that are correct
- Recall: the proportion of desired features that have been retrieved
5. Introduction
- A similarity metric for feature vectors
- Documents can also be represented as feature vectors; can Co-occurrence Retrieval be used to measure the similarity of documents?
- Test task: Sentiment Classification
6. Feature Retrieval for Text Classification
- A subset, s, is the union of n texts
- A text, t, is a vector of features, f, each with an associated weight, D(t, f)
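A minimal Python sketch of these representations, assuming unigram features and raw frequency as the weight function D(t, f) (both are illustrative choices, not necessarily those used in the talk):

```python
from collections import Counter

def feature_vector(text):
    """A text t as a vector of features f with weights D(t, f).
    Here: unigram features with raw-frequency weights (an assumption)."""
    return dict(Counter(text.lower().split()))

def subset_vector(texts):
    """A subset s as the union of n texts: feature weights are
    accumulated over all member texts."""
    combined = Counter()
    for text in texts:
        combined.update(feature_vector(text))
    return dict(combined)
```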
7. Feature Retrieval for Text Classification
- Subsets and texts are text units, referred to using a polymorphic term, u
- SF is the set of features that are shared by two units of text
8. Feature Retrieval for Text Classification
- The Precision of u1's retrieval of u2's features is the proportion of u1's features that appear in both units, weighted by their importance in u1
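The defining equation did not survive extraction; following the Co-occurrence Retrieval formulation of Weeds and Weir (2003), Precision is plausibly

\[
P(u_1, u_2) = \frac{\sum_{f \in SF} D(u_1, f)}{\sum_{f \in F(u_1)} D(u_1, f)}
\]

where F(u_1) denotes the features of u_1 and SF the features shared by the two units.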
9. Feature Retrieval for Text Classification
- The Recall of u1's retrieval of u2's features is the proportion of u2's features that appear in both units, weighted by their importance in u2
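Symmetrically, under the same assumptions,

\[
R(u_1, u_2) = \frac{\sum_{f \in SF} D(u_2, f)}{\sum_{f \in F(u_2)} D(u_2, f)}
\]

so that R(u_1, u_2) = P(u_2, u_1).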
10. Feature Retrieval for Text Classification
- The measures of Precision and Recall may be combined by weighting the harmonic and arithmetic means (using constants β and γ)
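Written out as in Weeds and Weir (2003), on which these slides build (the harmonic-mean term is the familiar F-score; β weights the arithmetic mean, γ weights the two means against each other):

\[
\mathrm{sim}(u_1, u_2) = \gamma \left[ \frac{2\,P\,R}{P + R} \right] + (1 - \gamma)\left[ \beta P + (1 - \beta) R \right]
\]

with P = P(u_1, u_2), R = R(u_1, u_2), and β, γ ∈ [0, 1].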
11. Feature Retrieval for Text Classification
- Given a corpus C
- We can say that a problem text, t, is predicted to be a member of the subset, s, that has the highest similarity score
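A sketch of this decision rule in Python, reusing feature_vector and subset_vector from the earlier sketch; the similarity function follows the β/γ combination just given (a simplified additive model, not necessarily the exact configuration used in the experiments):

```python
def precision(u1, u2):
    """P(u1, u2): weight of u1's features shared with u2,
    as a proportion of u1's total feature weight."""
    shared = set(u1) & set(u2)
    total = sum(u1.values())
    return sum(u1[f] for f in shared) / total if total else 0.0

def similarity(u1, u2, beta=0.5, gamma=0.5):
    p = precision(u1, u2)
    r = precision(u2, u1)          # R(u1, u2) = P(u2, u1)
    harmonic = 2 * p * r / (p + r) if p + r else 0.0
    return gamma * harmonic + (1 - gamma) * (beta * p + (1 - beta) * r)

def classify(text, subsets, beta=0.5, gamma=0.5):
    """Predict the label of the subset most similar to the problem
    text. `subsets` maps a class label to its subset vector."""
    t = feature_vector(text)
    return max(subsets, key=lambda s: similarity(t, subsets[s], beta, gamma))
```

For sentiment classification, for example, `classify(review, {"pos": subset_vector(pos_texts), "neg": subset_vector(neg_texts)})` predicts the polarity of a review.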
12. Feature Retrieval for Text Classification
- Additive models make no distinction about the extent of feature occurrence with respect to each unit
- Extent can be measured in terms of the precision and recall of individual features
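One way to make the measure sensitive to extent, by analogy with the difference-weighted models in Weeds and Weir's framework (an assumption about what the slide intends, since the formula did not survive), is to credit each shared feature only up to the smaller of its two weights:

\[
P_{dw}(u_1, u_2) = \frac{\sum_{f \in SF} \min\big(D(u_1, f),\, D(u_2, f)\big)}{\sum_{f \in F(u_1)} D(u_1, f)}
\]

and analogously for Recall, with the denominator taken over u_2.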
13. Feature Retrieval for Text Classification
- Weight functions
- Determine the importance of each feature in a
given unit of text
14. Feature Retrieval for Text Classification
- Extent functions
- Determine the extent to which a feature goes with
a unit of text
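The captions later in the deck name specific choices (D_wmi, D_z, E_t, E_wmi), whose definitions do not appear on these slides. As an illustration only, weighted mutual information, a standard weight function in the distributional-similarity literature, might be

\[
D_{wmi}(u, f) = p(f \mid u)\, \log \frac{p(f \mid u)}{p(f)}
\]

and a simple extent function might be binary occurrence, E(u, f) = 1 if f occurs in u and 0 otherwise; both are plausible instances rather than the talk's actual definitions.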
15. Model Optimisation
- Sentiment datasets
- Polarity 1.0 (Movie Reviews before 2002)
- Polarity 2004 (Movie Reviews after 2002)
- Newswire (Business news articles)
- Choosing the optimal
- Weight function
- Extent function
- β and γ parameters
16. Model Optimisation
[Table: optimal parameters for each dataset]
17. Some initial results
[Table: five-fold cross-validated accuracies of the classifiers on each dataset, in percent, with standard deviations]
18. Some observations
[Figure: Optimising β and γ using Polarity 1.0, D_z E_wmi]
19. Some observations
[Figure: Optimising β and γ using Polarity 2004, D_z E_wmi]
20. Some observations
[Figure: Optimising β and γ using Newswire, D_wmi E_t]
21. Some observations
[Figure: Optimising β and γ using Newswire, D_z E_wmi]
22. Some observations
- Interpreting the Information Retrieval metaphor for Text Classification
- Precision measures the similarity using the features observed in the problem text
- Recall measures the similarity using the features absent from the problem text
23. Some observations
- The optimised β indicates the relative importance of Precision or Recall in a set
24. Some observations
[Figure: Optimising β using Polarity 1.0, D_wmi E_t, γ = 0; Precision and Recall curves, with β = 0.43 marked]
25. Some observations
[Figure: Optimising β using Polarity 2004, D_wmi E_t, γ = 0; Precision and Recall curves, with β = 0.26 marked]
26. Some observations
- Feature Retrieval differs from other models in that it considers both the presence and absence of features
- But this is also a drawback: significantly greater computational expense!
27. Future work
- Careful optimisation
- New weight and extent functions
- Impact of features
- Unigrams, n-grams, grammatical relations, etc.
- Optimal feature selection
- Assess other text classification problems
- Investigate similarities with Naïve Bayes, etc.