Title: Text Categorization: Support Vector Machines
1 Text Categorization: Support Vector Machines
2 (No transcript)
3 Modeling of Text Categorization (1)
- Each text is converted into a vector x_i. A component of x_i describes the frequency of a certain word in this text.
- We take d words into our dictionary. These are all the words we want to consider in our problem. We call them features. Together they span the feature space χ. Thus x_i ∈ χ ⊆ ℝ^d.
- We have a predefined set of categories {Category_1, ..., Category_k}.
- The label y_i is the category of x_i: y_i ∈ {Category_1, ..., Category_k}.
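As a minimal sketch of this text-to-vector step (the dictionary words and the sample text below are invented for illustration; a real system would use a much larger dictionary and a proper tokenizer):

```python
from collections import Counter

# Hypothetical dictionary of d feature words (real dictionaries are much larger).
dictionary = ["euro08", "bush", "beatles", "goal", "election", "guitar"]

def text_to_vector(text):
    """Map a text to x in R^d: component j counts how often
    the j-th dictionary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]

# A short sports text becomes a sparse frequency vector.
x = text_to_vector("euro08 kicks off with a goal a late goal")
print(x)  # [1, 0, 0, 2, 0, 0]
```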
4 Modeling of Text Categorization (2)
- Training data are (x_1,y_1), (x_2,y_2), ..., (x_n,y_n).
- A classifier is a function which maps a text to a category: y = c(x), c: χ → {Category_1, ..., Category_k}.
- The text which we want to classify is x_{n+1}.
- The categorization problem is the following: what is y_{n+1} ∈ {Category_1, ..., Category_k} for x_{n+1} ∈ χ?
5 Example
- [Figure: the feature space χ ⊆ ℝ^d with word axes "Bush", "Beatles", "Euro08"; a new document x_20 = (2,0,0)^T is plotted, and its category y_20 = c(x_20) = ? is asked for among Sport, Politics, Music.]
6 Text Categorization
- High-dimensional feature space χ
- Sparse text vectors x_i
- Few irrelevant words
- Stopwords
7 Support Vector Machine (SVM)
- [Figure: two panels separating Sport from Politics in the (Feature 1, Feature 2) plane, each with a separating hyperplane and margins m1 and m2; the SVM prefers the hyperplane with the larger margin, and the training points lying on the margin are the support vectors.]
8 Nonlinear Dividing Line
- If the classes cannot be separated by a line in the input space, a kernel function maps the data into a higher-dimensional space where a separating hyperplane exists.
- Φ: ℝ² → ℝ³, (F_1, F_2) ↦ (Z_1, Z_2, Z_3) = (F_1², √2·F_1·F_2, F_2²)
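The benefit of a kernel is that the inner product in the mapped space can be computed directly in the input space: for the map above, Φ(x)^T Φ(z) = (x^T z)². A quick numeric check (toy values):

```python
import numpy as np

def phi(f):
    """The feature map Phi: R^2 -> R^3 from this slide."""
    f1, f2 = f
    return np.array([f1**2, np.sqrt(2.0) * f1 * f2, f2**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

print(phi(x) @ phi(z))  # 16.0: inner product after the explicit mapping
print((x @ z) ** 2)     # 16.0: polynomial kernel K(x, z) = (x^T z)^2
```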
9 Soft Margin
- [Figure: a data set that is linearly separable except for one outlier; the soft margin allows such points to violate the margin.]
10 More than two Categories
- [Figure: a new document x = x_{n+1} plotted among three categories; which label y_{n+1} does it get?]
11 Mathematical Formulation
- Training data: (x_1,y_1), (x_2,y_2), ..., (x_n,y_n)
- The separating hyperplane is described by a normal vector w and a translation parameter b, so w^T x + b = 0 holds for each point x on the plane.
- For support vectors (on the dashed lines): w^T x_i + b = ±m
- Labels: y_i = Category_1 (+1) if w^T x_i + b ≥ m; y_i = Category_2 (−1) if w^T x_i + b ≤ −m
- Classifier c: y_{n+1} = c(x_{n+1}) = sgn(w^T x_{n+1} + b)
- [Figure: the separating hyperplane with normal vector w and margin m in the (Feature 1, Feature 2) plane.]
12 Learning Problem
- Find w (normalized: ||w|| = 1) and b,
- such that the margin m is maximized:
- Maximize m
- (m = geometric margin, see figure)
- Subject to: ∀ x_i ∈ χ: y_i(w^T x_i + b) ≥ m
- [Figure: as on slide 11, with geometric margin m.]
13 Alternative Formulation without m!
- Rescaling: w → w/m, b → b/m
- ⇒ m² = 1/||w||² = 1/(2·½w^Tw)
- (without derivation!)
- Minimize ||w|| for a fixed functional margin m = 1
- ⇒ Minimize ½w^Tw
- Subject to: ∀ x_i ∈ χ: y_i(w^T x_i + b) ≥ 1
- ⇒ Generalized Lagrange function:
- L(w, b, α) = ½w^Tw − Σ_i α_i·[y_i(w^T x_i + b) − 1]
- ⇒ Find the saddle point:
- Minimize over w and b
- Maximize over the α_i ≥ 0
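The deck defers the full derivation to the handout; as a one-step bridge to the next slide, setting the derivatives of L to zero at the saddle point gives:

```latex
\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0
  \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0
  \;\Rightarrow\; \sum_i \alpha_i y_i = 0
```

Substituting w = Σ_i α_i y_i x_i into sgn(w^T x + b) yields the dual form of the decision function on the next slide.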
14 Solution
- Solving this optimization problem analytically leads us to the decision function (= classifier c) of our text categorization problem:
- y_{n+1} = c(x_{n+1}) = sgn(w^T x_{n+1} + b) = sgn(Σ_i α_i·y_i·x_i^T x_{n+1} + b)
- (difficult derivation, see handout)
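A minimal numpy sketch of this decision function; the support vectors, labels, multipliers α_i, and offset b below are invented toy values, not the output of an actual solver:

```python
import numpy as np

# Invented toy values; a trained SVM would supply these.
X_sv  = np.array([[1.0, 2.0], [2.0, 0.5]])  # support vectors x_i
y_sv  = np.array([+1.0, -1.0])              # their labels y_i
alpha = np.array([0.4, 0.4])                # Lagrange multipliers alpha_i
b = -0.1

def classify(x_new):
    """c(x) = sgn( sum_i alpha_i * y_i * x_i^T x + b )"""
    return np.sign((alpha * y_sv) @ (X_sv @ x_new) + b)

print(classify(np.array([1.5, 2.5])))  # 1.0, i.e. the positive category
```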
15 Soft Margin
- Introduce a cost function!
- Minimize ½w^Tw + C·Σ_i ξ_i
- Subject to:
- ∀ x_i ∈ χ: y_i(w^T x_i + b) ≥ 1 − ξ_i
- ∀ x_i ∈ χ: ξ_i ≥ 0
- Cost parameter C
- [Figure: two points violate the margin, with slack variables ξ_1 and ξ_2 measuring the violations in the (Feature 1, Feature 2) plane.]
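In practice this cost parameter is exposed directly by SVM libraries; a sketch using scikit-learn (the tiny 2-D data set is invented) showing the usual tendency that a small C buys a wider margin by paying slack for the outlier, while a large C makes slack expensive:

```python
import numpy as np
from sklearn.svm import SVC

# Invented 2-D data: two clusters plus one positive outlier among the negatives.
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3], [0.5, 0.5]])
y = np.array([-1, -1, -1, +1, +1, +1, +1])  # the last point is the outlier

for C in (0.1, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Geometric margin 1/||w||; it typically shrinks as C grows.
    print(f"C={C}: margin = {1.0 / np.linalg.norm(w):.2f}")
```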
16 Quality Measure
- How good is the classifier we trained previously?
- Find a lower bound for the margin m!
17 Text Categorization Example
- We have 3 predefined categories: Music, Politics, Sport.
- Training data: 100 documents per category. Each document consists of exactly 150 words.
- Feature space: we choose 20,000 words for the dictionary, so the feature space χ has dimension 20,000. We assume that every word in the training documents is in the dictionary.
- We use "one against many" (a sketch follows below):
- Sport against ¬Sport
- (¬Sport = Music ∪ Politics)
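A small sketch of this "one against many" relabeling (document names invented):

```python
# Relabel the multi-class training data for the binary task "Sport vs. not-Sport".
docs = [("doc1", "Sport"), ("doc2", "Music"), ("doc3", "Politics"), ("doc4", "Sport")]

# not-Sport = Music u Politics, so both map to the negative class.
binary = [(name, +1 if cat == "Sport" else -1) for name, cat in docs]
print(binary)  # [('doc1', 1), ('doc2', -1), ('doc3', -1), ('doc4', 1)]
```

One such binary problem is set up per category; a new document can then be assigned to the category whose binary classifier gives the largest decision value.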
18 Odds Ratio
- [The formula and the worked examples on this slide were images; the odds ratio of a candidate feature compares how likely it is to occur in a Sport document versus in a ¬Sport document.]
- ⇒ The odds ratio of "Iraq" is …
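Since the slide's formula survived only as an image, here is the common definition of the odds ratio as a feature-selection score, assumed to be the one intended (it matches the interpretation on the next slide); the counts for "Iraq" are invented:

```python
def odds_ratio(df_pos, n_pos, df_neg, n_neg):
    """OR(w) = odds(w | Sport) / odds(w | not-Sport), where df_* counts the
    documents containing w. Assumed standard definition; the deck's exact
    formula was lost in conversion."""
    p = df_pos / n_pos   # P(w | Sport)
    q = df_neg / n_neg   # P(w | not-Sport)
    return (p * (1 - q)) / ((1 - p) * q)

# Invented counts: "Iraq" in 5 of 100 Sport docs and 40 of 200 not-Sport docs.
print(round(odds_ratio(5, 100, 40, 200), 2))  # 0.21 < 1: points away from Sport
```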
19 Sorting Features
- An odds ratio of
- = 1 means that the feature fits Sport as well as ¬Sport. Such a feature carries no information, e.g. stopwords.
- > 1 means that the feature helps to identify the category Sport.
- < 1 means that the feature more likely does not belong to Sport.
20 Example
TCat([p_1:n_1:f_1], ..., [p_s:n_s:f_s])-concept: group i contains f_i features; each positive example contains p_i occurrences from the group, each negative example n_i.
- TCat_Sport( [58:42:105],             stopwords
-   [26:8:96], [11:27:158],            high freq.
-   [14:3:864], [6:27:1602],           medium freq.
-   [4:1:2108], [2:10:6231],           low freq.
-   [29:32:8836]                       irrelevant
- )
⇒ Subsets instead of words! Easier to find a lower bound!
21 Find a Lower Bound (1)
- Define p = (p_1,...,p_s)^T, n = (n_1,...,n_s)^T, F = diag(f_1,...,f_s)
- For SVMs with a hyperplane passing through the origin and without soft margin, the following optimization problem holds (see section 3.2):
- W(w*) = min ½w^Tw, s.t. ∀ x_i ∈ χ: y_i·w^T x_i ≥ 1
- It holds m² = 1/||w*||² = 1/(2·½w*^Tw*) = 1/(2W(w*)) for the solution vector w*.
- Simplification of the optimization problem:
- Let us add the constraint that within each group of f_i features the weights are required to be identical. Then w^Tw = v^TFv, v ∈ ℝ^s.
- By definition, each example contains a certain number of features from each group. This means that all constraints for positive examples are equivalent to p^Tv ≥ 1, and those for negative examples to n^Tv ≤ −1.
- ⇒ V(v*) = min ½v^TFv, s.t. p^Tv ≥ 1, n^Tv ≤ −1
- v* is the solution vector. Since the extra constraint shrinks the feasible set, V(v*) ≥ W(w*) ⇒ m² ≥ 1/(2V(v*)), a lower bound.
22 Find a Lower Bound (2)
- Introducing and solving Lagrange multipliers:
- L(v, α+, α−) = ½v^TFv − α+·(v^Tp − 1) + α−·(v^Tn + 1), α+ ≥ 0, α− ≥ 0
- ⇔ v = F^{−1}·(α+·p − α−·n)
- For ease of notation we write
- v = F^{−1}XYα, with X = (p, n), Y = diag(1, −1), α^T = (α+, α−)
- ⇒ L(α) = 1^Tα − ½·α^T·YX^TF^{−1}XY·α
- Maximize L(α), s.t. α+ ≥ 0, α− ≥ 0
- Since only a lower bound on the margin is needed, it is possible to drop the constraints α+ ≥ 0 and α− ≥ 0, because removing these constraints can only increase the objective function at the solution. So the unconstrained maximum L(α*) is greater than or equal to the constrained one.
- ⇔ α* = (YX^TF^{−1}XY)^{−1}·1
- ⇒ L(α*) = ½·1^T·(YX^TF^{−1}XY)^{−1}·1   (*)
- The special form of (YX^TF^{−1}XY) makes it possible to compute its inverse in closed form.
- Substituting it into (*) yields the bound on the next slide.
23 Lower Bound for the Margin m
- For TCat([p_1:n_1:f_1], ..., [p_s:n_s:f_s])-concepts, there is always a hyperplane passing through the origin that has a margin m bounded by
- m² ≥ (a·c − b²) / (a + 2b + c), with a = Σ_i p_i²/f_i, b = Σ_i p_i·n_i/f_i, c = Σ_i n_i²/f_i
24 Our Example
TCat_Sport( [58:42:105],             stopwords
  [26:8:96], [11:27:158],            high freq.
  [14:3:864], [6:27:1602],           medium freq.
  [4:1:2108], [2:10:6231],           low freq.
  [29:32:8836]                       irrelevant
)
- a = 58²/105 + 26²/96 + 11²/158 + 14²/864 + 6²/1602 + 4²/2108 + 2²/6231 + 29²/8836 ≈ 40.20
- b = 58·42/105 + 26·8/96 + 11·27/158 + 14·3/864 + 6·27/1602 + 4·1/2108 + 2·10/6231 + 29·32/8836 ≈ 27.51
- c = 42²/105 + 8²/96 + 27²/158 + 3²/864 + 27²/1602 + 1²/2108 + 10²/6231 + 32²/8836 ≈ 22.68
- m² ≥ (40.20·22.68 − 27.51²) / (40.20 + 2·27.51 + 22.68) ≈ 1.32
- ⇒ The lower bound is m ≥ 1.15!
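The slide's arithmetic can be reproduced directly from the TCat groups; this sketch recomputes a, b, c and the bound:

```python
# (p_i, n_i, f_i) groups of TCat_Sport from slide 20.
groups = [(58, 42, 105), (26, 8, 96), (11, 27, 158), (14, 3, 864),
          (6, 27, 1602), (4, 1, 2108), (2, 10, 6231), (29, 32, 8836)]

a = sum(p * p / f for p, n, f in groups)  # ~ 40.20
b = sum(p * n / f for p, n, f in groups)  # ~ 27.51
c = sum(n * n / f for p, n, f in groups)  # ~ 22.68

m2 = (a * c - b * b) / (a + 2 * b + c)    # lower bound for m^2
print(round(m2, 2), round(m2 ** 0.5, 2))  # 1.32 1.15
```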
25 Questions & Remarks