Text Categorization: Support Vector Machines

Slides: 26
Provided by: remo9
Transcript and Presenter's Notes

1
Text Categorization: Support Vector Machines
  • Remo Frey 2007

2
(No Transcript)
3
Modeling of Text Categorization (1)
  • Each text is converted into a vector $x_i$. Each
    component of $x_i$ gives the frequency of a
    particular word in that text.
  • We take d words into our dictionary. These are all
    the words we want to consider in our problem.
    We call them features. Together they span the
    feature space $\chi$. Thus $x_i \in \chi \subseteq \mathbb{R}^d$.
  • We have a predefined set of categories
    $\{\mathrm{Category}_1, \dots, \mathrm{Category}_k\}$.
  • The label $y_i$ is the category of $x_i$:
    $y_i \in \{\mathrm{Category}_1, \dots, \mathrm{Category}_k\}$.
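
The conversion from a text to a frequency vector can be sketched in a few lines of Python. This is only an illustration: the three-word dictionary, the sample sentence and the function name `text_to_vector` are assumptions, not part of the slides, and real tokenization would need more care than a whitespace split.

```python
from collections import Counter

def text_to_vector(text, dictionary):
    """Map a text to a word-frequency vector over a fixed dictionary."""
    counts = Counter(text.lower().split())
    # One component per dictionary word; words outside the dictionary are ignored.
    return [counts[word] for word in dictionary]

# Hypothetical dictionary with d = 3 features.
dictionary = ["euro08", "bush", "beatles"]
x = text_to_vector("Euro08 tickets sold out before Euro08 kickoff", dictionary)
print(x)  # [2, 0, 0]
```

With a realistic dictionary most components are zero, which is why text vectors are called sparse.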

4
Modeling of Text Categorization (2)
  • The training data are $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$.
  • A classifier is a function that maps a text to
    a category: $y = c(x)$, $c: \chi \to \{\mathrm{Category}_1, \dots, \mathrm{Category}_k\}$.
  • The text we want to classify is $x_{n+1}$.
  • The categorization problem is the following: What
    is $y_{n+1} \in \{\mathrm{Category}_1, \dots, \mathrm{Category}_k\}$ for $x_{n+1} \in \chi$?

5
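Example
[Figure: feature space $\chi \subseteq \mathbb{R}^d$ with the example words Euro08, Bush and Beatles and the categories Sport, Politics and Music. The document $x_{20} = (2,0,0)^T$ is marked with an x; the question is $y_{20} = c(x_{20}) = ?$]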
6
Text Categorization
  • High-Dimensional Feature Space $\chi$
  • Sparse Text Vector $x_i$
  • Few irrelevant Words
  • Stopwords

7
Support Vector Machine (SVM)
[Figure: two plots with axes Feature 1 and Feature 2 showing the categories Sport and Politics, the separating hyperplane, the support vectors, the margins m1 and m2, and a new point x]
8
Nonlinear Dividing Line
Kernel function
$\Phi: \mathbb{R}^2 \to \mathbb{R}^3$, $(F_1,F_2) \mapsto (Z_1,Z_2,Z_3) = (F_1^2, \sqrt{2}\,F_1F_2, F_2^2)$
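
A short numerical check shows why this particular map is convenient: the inner product of two mapped points equals the squared inner product of the original points, so the map never has to be computed explicitly. The function names and the two sample points below are assumptions chosen for illustration.

```python
import math

def phi(f1, f2):
    """Feature map (F1, F2) -> (F1^2, sqrt(2)*F1*F2, F2^2)."""
    return (f1 * f1, math.sqrt(2) * f1 * f2, f2 * f2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(u, v):
    """Degree-2 polynomial kernel, evaluated in the original 2-D space."""
    return dot(u, v) ** 2

# Hypothetical sample points.
u, v = (1.0, 2.0), (3.0, -1.0)
lhs = dot(phi(*u), phi(*v))   # inner product in the 3-D space
rhs = poly_kernel(u, v)       # kernel value in the 2-D space
print(math.isclose(lhs, rhs))  # True
```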
9
Soft Margin
Outlier
10
More than two Categories
[Figure: to which category $y_{n+1}$ does the new point x belong?]
11
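Mathematical Formulation
  • Training data: $(x_1,y_1),(x_2,y_2),\dots,(x_n,y_n)$
  • The separating hyperplane is described by a normal
    vector w and a translation parameter b, so
    $w^T x + b = 0$ holds for each point x on the plane.
  • For support vectors (on the dashed lines): $w^T x_i + b = \pm m$
  • Label $y_i$: $y_i = \mathrm{Category}_1$ (+1) if $w^T x_i + b \ge m$;
    $y_i = \mathrm{Category}_2$ (−1) if $w^T x_i + b \le -m$
  • Classifier c: $y_{i+1} = c(x_{i+1}) = \mathrm{sgn}(w^T x_{i+1} + b)$

[Figure: axes Feature 1 and Feature 2; separating hyperplane with normal vector w and margin m]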
12
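Learning Problem
  • Find w (w normalized: $\|w\| = 1$),
  • such that the margin m is maximized
  • Maximize m
  • (m: geometric margin, see figure)
  • Subject to $\forall x_i \in \chi:\ y_i(w^T x_i + b) \ge m$

[Figure: axes Feature 1 and Feature 2; separating hyperplane with normal vector w and margin m]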
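
To make the quantity being maximized concrete, the sketch below computes the geometric margin of one given hyperplane on a toy data set, namely the smallest signed distance $y_i(w^T x_i + b)/\|w\|$ over all training points. The four points, their labels and the function name are assumptions for illustration; the learning problem itself would search over all hyperplanes for the largest such value.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance of any training point to the hyperplane."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    return min(
        y * (sum(wj * xj for wj, xj in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical, linearly separable toy data with labels +1 / -1.
points = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
labels = [1, 1, -1, -1]
m = geometric_margin((1.0, 1.0), 0.0, points, labels)
print(round(m, 4))  # 1.4142
```

A negative result would mean that the hyperplane misclassifies at least one training point.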
13
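Alternative Formulation without m!
  • Rescaling: $w \to w/m$, $b \to b/m$
  • ⇒ $m^2 = 1/\|w\|^2 = 1/(2 \cdot \tfrac{1}{2} w^T w)$
  • (without derivation!)
  • Minimize $\|w\|$ for a given margin $m = 1$
  • (m: functional margin)
  • ⇒ Minimize $\tfrac{1}{2} w^T w$
  • Subject to $\forall x_i \in \chi:\ y_i(w^T x_i + b) \ge 1$
  • ⇒ Generalized Lagrange function:
  • $L(w,b,\alpha) = \tfrac{1}{2} w^T w - \sum_i \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$
  • ⇒ Find the saddle point:
  • Minimize over w and b
  • Maximize over the $\alpha_i$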
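
The minimization over w and b can be written out as two stationarity conditions. The sketch below assumes the standard form of the Lagrange function, with multipliers $\alpha_i \ge 0$ and a minus sign in front of the constraint terms; the slides refer to a handout for the full derivation.

```latex
\begin{align*}
L(w,b,\alpha) &= \tfrac{1}{2} w^T w - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right],
  \qquad \alpha_i \ge 0 \\
\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0
  &\;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \\
\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0
  &\;\Rightarrow\; \sum_i \alpha_i y_i = 0
\end{align*}
```

The first condition is what allows w to be replaced by a weighted sum of training vectors in the decision function.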

14
Solution
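  • Solving this optimization problem analytically
    leads us to the decision function (= classifier
    c) of our text classification problem:
  • $y_{i+1} = c(x_{i+1}) = \mathrm{sgn}(w^T x_{i+1} + b) = \mathrm{sgn}\left(\sum_j \alpha_j y_j x_j^T x_{i+1} + b\right)$,
    where the sum runs over the training examples
  • (difficult derivation, see handout)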
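
The two forms of the decision function can be compared on a tiny example. The sketch below assumes a two-point training set for which the hard-margin solution can be verified by hand (both points lie exactly on the margin with $\alpha_1 = \alpha_2 = 0.25$, $b = 0$); the data and function names are illustrative, not taken from the slides.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgn(t):
    return 1 if t >= 0 else -1

# Hypothetical training set: one point per class, both are support vectors.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [1, -1]
alpha = [0.25, 0.25]   # satisfies sum_j alpha_j * y_j = 0
b = 0.0

# Primal weight vector recovered from the multipliers: w = sum_j alpha_j y_j x_j
w = [sum(a * yj * xj[k] for a, yj, xj in zip(alpha, y, X)) for k in range(2)]

def classify_primal(x):
    return sgn(dot(w, x) + b)

def classify_dual(x):
    # Uses only inner products between training vectors and the new vector.
    return sgn(sum(a * yj * dot(xj, x) for a, yj, xj in zip(alpha, y, X)) + b)

print(w, classify_primal((2.0, 0.0)), classify_dual((2.0, 0.0)))
```

Because the dual form touches the data only through inner products, those inner products can be swapped for a kernel function.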

15
Soft Margin
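  • Introduce a cost function!
  • Minimize $\tfrac{1}{2} w^T w + C \sum_i \xi_i$
  • Subject to
  • $\forall x_i \in \chi:\ y_i(w^T x_i + b) \ge 1 - \xi_i$
  • $\forall x_i \in \chi:\ \xi_i \ge 0$
  • Cost parameter C

[Figure: axes Feature 1 and Feature 2; slack variables $\xi_1$ and $\xi_2$]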
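
For a fixed hyperplane, the smallest slack that satisfies both constraints is $\xi_i = \max(0,\ 1 - y_i(w^T x_i + b))$. The sketch below evaluates the soft-margin objective that way; the data set, the hyperplane and the value of C are assumptions chosen so that the arithmetic is easy to follow.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(w, b, C, points, labels):
    """Return (objective value, slack variables) for a fixed hyperplane."""
    # Smallest slack per point that satisfies y_i (w.x_i + b) >= 1 - xi_i, xi_i >= 0.
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(points, labels)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks

# Hypothetical data: the second point lies inside the margin,
# the last point is an outlier on the wrong side of the hyperplane.
points = [(2.0, 0.0), (0.5, 0.0), (-2.0, 0.0), (0.5, 1.0)]
labels = [1, 1, -1, -1]
value, slacks = soft_margin_objective((1.0, 0.0), 0.0, 10.0, points, labels)
print(slacks, value)  # [0.0, 0.5, 0.0, 1.5] 20.5
```

A larger C makes margin violations more expensive; a smaller C tolerates outliers in exchange for a wider margin.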
16
Quality Measure
  • How good is the classifier that we trained
    previously?
  • Find a lower bound for the margin m!

17
Text Categorization Example
  • We have 3 predefined categories: Music,
    Politics, Sport
  • Training data: 100 documents per category. Each
    document consists of exactly 150 words.
  • Feature space: We choose 20,000 words for the
    dictionary, so the feature space $\chi$ has
    dimension 20,000. We assume that every word in the
    training documents is in the dictionary.
  • We use one against many:
  • Sport against ¬Sport
  • (¬Sport = Music ∪ Politics)

18
Odds Ratio
Examples
  • ⇒ Odds ratio of "ball" is (No Transcript)
  • ⇒ Odds ratio of "Iraq" is (No Transcript)
19
Sorting Features
  • An odds ratio of
  • = 1 means that the feature fits Sport as well
    as ¬Sport. Such a feature carries no
    information, e.g. stopwords.
  • > 1 means that the feature helps to identify the
    category Sport.
  • < 1 means that the feature probably does not
    belong to Sport.
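
The transcript does not preserve the formula used on the slides, so the sketch below assumes the usual definition: the odds of a word occurring in a Sport document divided by the odds of it occurring in a ¬Sport document. All document counts are hypothetical; only the class sizes (100 Sport documents, 200 others) follow the example setup.

```python
import math

def odds_ratio(pos_with, pos_total, neg_with, neg_total):
    """Odds of the word in the positive class divided by its odds in the negative class.

    Assumes the word occurs in some but not all documents of each class,
    so that neither probability is 0 or 1.
    """
    p = pos_with / pos_total
    q = neg_with / neg_total
    return (p / (1.0 - p)) / (q / (1.0 - q))

# Hypothetical counts: documents containing the word / documents in the class.
ball = odds_ratio(60, 100, 10, 200)    # frequent in Sport, rare elsewhere
the = odds_ratio(95, 100, 190, 200)    # stopword, equally frequent everywhere
iraq = odds_ratio(2, 100, 60, 200)     # rare in Sport, frequent elsewhere
print(ball > 1, math.isclose(the, 1.0), iraq < 1)  # True True True
```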

20
Example
TCat([p_1:n_1:f_1], ..., [p_s:n_s:f_s])-concept
  • TCat_Sport( [58:42:105], stopwords
  • [26:8:96], [11:27:158], high freq.
  • [14:3:864], [6:27:1602], medium freq.
  • [4:1:2108], [2:10:6231], low freq.
  • [29:32:8836] irrelevant
  • )

⇒ Subsets instead of words! Easier to find a
lower bound!
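
The concept can be checked against the example setup with a few sums. The sketch assumes the usual reading of a triple [p:n:f]: a group of f dictionary words from which every positive (Sport) document contains p occurrences and every negative document contains n occurrences. The variable names are illustrative.

```python
# Groups (p_i, n_i, f_i) of the TCat_Sport concept, in slide order.
groups = [
    (58, 42, 105),                   # stopwords
    (26, 8, 96), (11, 27, 158),      # high frequency
    (14, 3, 864), (6, 27, 1602),     # medium frequency
    (4, 1, 2108), (2, 10, 6231),     # low frequency
    (29, 32, 8836),                  # irrelevant
]

total_features = sum(f for _, _, f in groups)
words_per_positive_doc = sum(p for p, _, _ in groups)
words_per_negative_doc = sum(n for _, n, _ in groups)
print(total_features, words_per_positive_doc, words_per_negative_doc)
# 20000 150 150
```

The totals match the 20,000-word dictionary and the 150 words per document stated in the example.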
21
Find a Lower Bound (1)
  • Define $p = (p_1,\dots,p_s)^T$, $n = (n_1,\dots,n_s)^T$,
    $F = \mathrm{diag}(f_1,\dots,f_s)$
  • For SVMs with a hyperplane passing through the
    origin and without soft margin, the following
    optimization problem holds (see section 3.2):
  • $W(w^*) = \min \tfrac{1}{2} w^T w$, s.t. $\forall x_i \in \chi:\ y_i(w^T x_i) \ge 1$
  • It holds that
  • $m^2 = 1/\|w^*\|^2 = 1/(2 \cdot \tfrac{1}{2} w^{*T} w^*) = 1/(2W(w^*))$ for
    the solution vector $w^*$
  • Simplification of the optimization problem:
  • Let us add the constraint that within each group
    of $f_i$ features the weights are required to be
    identical. Then $w^T w = v^T F v$, $v \in \mathbb{R}^s$.
  • By definition, each example contains a certain
    number of features from each group. This means
    that all constraints for positive examples are
    equivalent to $p^T v \ge 1$ and all constraints for
    negative examples are equivalent to $n^T v \le -1$.
  • ⇒ $V(v^*) = \min \tfrac{1}{2} v^T F v$, s.t. $p^T v \ge 1$, $n^T v \le -1$
  • $v^*$ is the solution vector. Because the added
    constraint can only increase the minimum, we get
    $V(v^*) \ge W(w^*)$ ⇒ $m^2 \ge 1/(2V(v^*))$

22
Find a Lower Bound (2)
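  • Introducing Lagrange multipliers and solving:
  • $L(v, \alpha_+, \alpha_-) = \tfrac{1}{2} v^T F v - \alpha_+ (v^T p - 1) + \alpha_- (v^T n + 1)$,
    $\alpha_+ \ge 0$, $\alpha_- \ge 0$
  • ⇔ $v = F^{-1}(\alpha_+ p - \alpha_- n)$
  • For ease of notation we write
  • $v = F^{-1} X Y \alpha$, with $X = (p, n)$, $Y = \mathrm{diag}(1, -1)$,
    $\alpha^T = (\alpha_+, \alpha_-)$
  • ⇒ $L(\alpha) = 1^T \alpha - \tfrac{1}{2} \alpha^T Y X^T F^{-1} X Y \alpha$
  • Maximize $L(\alpha)$, s.t. $\alpha_+ \ge 0$, $\alpha_- \ge 0$
  • Since only a lower bound on the margin is
    needed, it is possible to drop the constraints
    $\alpha_+ \ge 0$ and $\alpha_- \ge 0$, because removing these
    constraints can only increase the objective
    function at the solution. So the unconstrained
    maximum $L'(\alpha)^*$ is greater than or equal to
    the constrained maximum $L(\alpha)^*$.
  • ⇔ $\alpha = (Y X^T F^{-1} X Y)^{-1} 1$
  • ⇒ $L'(\alpha)^* = \tfrac{1}{2} 1^T (Y X^T F^{-1} X Y)^{-1} 1$ (∗)
  • The special form of $(Y X^T F^{-1} X Y)$ makes it
    possible to compute its inverse in closed form.
  • Substituting it into (∗).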
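
The closed-form inverse can be written out explicitly because the matrix is only 2×2. The sketch below uses the abbreviations a, b, c that appear in the worked example later in the slides; the name M for the matrix is an assumption introduced here for readability, and the result requires $ac - b^2 > 0$, which holds whenever p and n are not parallel.

```latex
\begin{align*}
M := Y X^T F^{-1} X Y
  &= \begin{pmatrix} a & -b \\ -b & c \end{pmatrix},
  \qquad a = p^T F^{-1} p,\quad b = p^T F^{-1} n,\quad c = n^T F^{-1} n \\
M^{-1} &= \frac{1}{ac - b^2} \begin{pmatrix} c & b \\ b & a \end{pmatrix} \\
\mathbf{1}^T M^{-1} \mathbf{1} &= \frac{a + 2b + c}{ac - b^2} \\
m^2 &\ge \frac{1}{2 \cdot \tfrac{1}{2}\, \mathbf{1}^T M^{-1} \mathbf{1}}
     = \frac{ac - b^2}{a + 2b + c}
\end{align*}
```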

23
Lower Bound for the Margin m
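  • For TCat([p_1:n_1:f_1], ..., [p_s:n_s:f_s])-concepts,
    there is always a hyperplane passing through the
    origin that has a margin m bounded by

$m^2 \ge \dfrac{ac - b^2}{a + 2b + c}$, with $a = \sum_i p_i^2/f_i$, $b = \sum_i p_i n_i/f_i$, $c = \sum_i n_i^2/f_i$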
24
Our Example
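TCat_Sport( [58:42:105], stopwords
[26:8:96], [11:27:158], high freq.
[14:3:864], [6:27:1602], medium freq.
[4:1:2108], [2:10:6231], low freq.
[29:32:8836] irrelevant
)
$m^2 \ge \dfrac{ac - b^2}{a + 2b + c}$
  • $a = 58^2/105 + 26^2/96 + 11^2/158 + 14^2/864 + 6^2/1602 + 4^2/2108 + 2^2/6231 + 29^2/8836 = 40.20$
  • $b = 58 \cdot 42/105 + 26 \cdot 8/96 + 11 \cdot 27/158 + 14 \cdot 3/864 + 6 \cdot 27/1602 + 4 \cdot 1/2108 + 2 \cdot 10/6231 + 29 \cdot 32/8836 = 27.51$
  • $c = 42^2/105 + 8^2/96 + 27^2/158 + 3^2/864 + 27^2/1602 + 1^2/2108 + 10^2/6231 + 32^2/8836 = 22.68$
  • $m^2 \ge (40.2 \cdot 22.7 - 27.5^2) / (40.2 + 2 \cdot 27.5 + 22.7) = 1.32$
  • ⇒ The lower bound is $m \ge 1.15$!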
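
The arithmetic of this example is easy to reproduce. The sketch below assumes the triples are read as (p, n, f) per group and that the bound has the form $m^2 \ge (ac - b^2)/(a + 2b + c)$, which is the form the numbers on this slide imply; the function name is illustrative.

```python
import math

# Groups (p_i, n_i, f_i) of the TCat_Sport concept, in slide order.
groups = [
    (58, 42, 105),
    (26, 8, 96), (11, 27, 158),
    (14, 3, 864), (6, 27, 1602),
    (4, 1, 2108), (2, 10, 6231),
    (29, 32, 8836),
]

def margin_lower_bound(groups):
    """Return a, b, c and the lower bound on the margin m."""
    a = sum(p * p / f for p, n, f in groups)
    b = sum(p * n / f for p, n, f in groups)
    c = sum(n * n / f for p, n, f in groups)
    m_squared = (a * c - b * b) / (a + 2 * b + c)
    return a, b, c, math.sqrt(m_squared)

a, b, c, m = margin_lower_bound(groups)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.2f}, m >= {m:.2f}")
# a = 40.20, b = 27.51, c = 22.68, m >= 1.15
```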

25
Questions & Remarks
  • ???

! ! !