Clustering High Dimensional Data Using SVM - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Clustering High Dimensional Data Using SVM


1
Clustering High Dimensional Data Using SVM
  • Tsau Young Lin and Tam Ngo
  • Department of Computer Science
  • San José State University

2
Overview
  • Introduction
  • Support Vector Machine (SVM)
  • What is SVM
  • How SVM Works
  • Data Preparation Using SVD
  • Singular Value Decomposition (SVD)
  • Analysis of SVD
  • The Project
  • Conceptual Exploration
  • Result Analysis
  • Conclusion
  • Future Work

3
Introduction
  • World Wide Web
  • No. 1 place for information
  • contains billions of documents
  • impossible to classify by humans
  • Project Goals
  • Cluster documents
  • Reduce document size
  • Get reasonable results compared to human
    classification

4
Support Vector Machine (SVM)
  • a supervised learning machine
  • outperforms many popular methods for text
    classification
  • used for bioinformatics, signature/handwriting
    recognition, image and text classification,
    pattern recognition, and e-mail spam
    categorization

5
Motivation for SVM
  • How do we separate these points?
  • with a hyperplane

Source: Authors' Research
6
SVM Process Flow
(Figure: SVM process flow, with data mapped from input space to feature space and back to input space)
Source: DTREG
7
Convex Hulls
Source: Bennett, K. P., Campbell, C., 2000
8
Simple SVM Example
Class   X1
  1      0
 -1      1
 -1      2
  1      3
  • How would SVM separate these points?
  • use the kernel trick
  • map X1 → Φ(X1) = (X1, X1²)
  • the data becomes 2-dimensional (see the sketch below)

Source: Authors' Research
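
A minimal sketch in Java of this explicit feature map (illustration only; in practice the kernel trick lets SVM work in the feature space without mapping points explicitly):

```java
// Explicit feature map Phi(x1) = (x1, x1^2) applied to the four 1-D points.
public class FeatureMap {
    public static void main(String[] args) {
        double[] x1    = {0, 1, 2, 3};
        int[]    label = {1, -1, -1, 1};
        for (int i = 0; i < x1.length; i++) {
            double[] phi = {x1[i], x1[i] * x1[i]};   // (x1, x1^2)
            System.out.printf("class %2d -> (%.0f, %.0f)%n", label[i], phi[0], phi[1]);
        }
    }
}
```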
9
Simple Points in Feature Space
Class   X1   X1²
  1      0    0
 -1      1    1
 -1      2    4
  1      3    9
  • All points here are support vectors.

Source: Authors' Research
10
SVM Calculation
  • Positive: ⟨w · x⟩ + b = 1
  • Negative: ⟨w · x⟩ + b = -1
  • Hyperplane: ⟨w · x⟩ + b = 0
  • find the unknowns, w and b
  • Expanding the equations:
  • w1x1 + w2x2 + b = 1
  • w1x1 + w2x2 + b = -1
  • w1x1 + w2x2 + b = 0

11
Use Linear Algebra to Solve w and b
  • w1x1 + w2x2 + b = 1
  • ⇒ w1·0 + w2·0 + b = 1
  • ⇒ w1·3 + w2·9 + b = 1
  • w1x1 + w2x2 + b = -1
  • ⇒ w1·1 + w2·1 + b = -1
  • ⇒ w1·2 + w2·4 + b = -1
  • Solution: w1 = -3, w2 = 1, b = 1
  • The SVM algorithm finds the solution whose
    hyperplane has the largest margin (verification
    sketch below)
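
A small verification sketch in plain Java: plugging the mapped points into ⟨w · x⟩ + b with w = (-3, 1) and b = 1 returns +1 for the positive class and -1 for the negative class, so every point lies exactly on its margin.

```java
// Check that w = (-3, 1), b = 1 puts every mapped point on its margin:
// <w, x> + b should be +1 for class +1 and -1 for class -1.
public class CheckHyperplane {
    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 1}, {2, 4}, {3, 9}};  // (x1, x1^2)
        int[]      label  = {1, -1, -1, 1};
        double w1 = -3, w2 = 1, b = 1;
        for (int i = 0; i < points.length; i++) {
            double f = w1 * points[i][0] + w2 * points[i][1] + b;
            System.out.printf("class %2d : <w, x> + b = %+.1f%n", label[i], f);
        }
    }
}
```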

12
Use Solutions to Draw the Planes
Positive plane: ⟨w · x⟩ + b = 1 ⇒ w1x1 + w2x2 + b = 1 ⇒ -3x1 + 1x2 + 1 = 1 ⇒ x2 = 3x1
Negative plane: ⟨w · x⟩ + b = -1 ⇒ w1x1 + w2x2 + b = -1 ⇒ -3x1 + 1x2 + 1 = -1 ⇒ x2 = -2 + 3x1
Hyperplane: ⟨w · x⟩ + b = 0 ⇒ w1x1 + w2x2 + b = 0 ⇒ -3x1 + 1x2 + 1 = 0 ⇒ x2 = -1 + 3x1

Positive plane (x2 = 3x1):       X1: 0, 1, 2, 3    X2: 0, 3, 6, 9
Negative plane (x2 = -2 + 3x1):  X1: 0, 1, 2, 3    X2: -2, 1, 4, 7
Hyperplane (x2 = -1 + 3x1):      X1: 0, 1, 2, 3    X2: -1, 2, 5, 8

Source: Authors' Research
13
Simple Data Separated by a Hyperplane
Source: Authors' Research
14
LIBSVM and Parameter C
  • LIBSVM: a Java library for SVM
  • When C is very small, SVM cares mostly about
    maximizing the margin, so points may fall on the
    wrong side of the plane.
  • When C is very large, SVM penalizes slack heavily
    and tries to separate all data points in each
    group correctly, even at the cost of a smaller
    margin.

15
Choosing Parameter C
Source: LIBSVM
16
4 Basic Kernel Types
  • LIBSVM has implemented 4 basic kernel types
    linear, polynomial, radial basis function, and
    sigmoid
  • 0 -- linear: u'*v
  • 1 -- polynomial: (gamma*u'*v + coef0)^degree
  • 2 -- radial basis function: exp(-gamma*|u-v|^2)
  • 3 -- sigmoid: tanh(gamma*u'*v + coef0)
  • We use the radial basis function kernel with a
    large parameter C for this project (configuration
    sketch below).
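
A sketch of how this choice could be written with LIBSVM's Java API; the gamma and C values below are illustrative assumptions, not the exact values used in the project.

```java
import libsvm.svm_parameter;

// Sketch: an RBF kernel with a large C, expressed through LIBSVM's Java API.
public class RbfConfig {
    public static svm_parameter rbfParameters() {
        svm_parameter param = new svm_parameter();
        param.svm_type    = svm_parameter.C_SVC;   // classification
        param.kernel_type = svm_parameter.RBF;     // exp(-gamma*|u-v|^2)
        param.gamma       = 0.5;                   // assumed kernel width
        param.C           = 1000;                  // large C: slack is penalized heavily
        param.cache_size  = 100;                   // kernel cache in MB
        param.eps         = 1e-3;                  // stopping tolerance
        return param;
    }
}
```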

17
Data Preparation Using SVD
  • SVM is excellent for text classification, but
    requires labeled documents for training
  • Singular Value Decomposition (SVD)
  • separates a matrix into three parts: left
    eigenvectors, singular values, and right
    eigenvectors
  • can decompose data such as images and text
  • reduces data size
  • We will use SVD to cluster

18
SVD Example of 4 Documents
  • D1 Shipment of gold damaged in a fire
  • D2 Delivery of silver arrived in a silver truck
  • D3 Shipment of gold arrived in a truck
  • D4 Gold Silver Truck

Source: Garcia, E., 2006
19
Matrix A = USVᵀ
D1 D2 D3 D4
a 1 1 1 0
arrived 0 1 1 0
damaged 1 0 0 0
delivery 0 1 0 0
fire 1 0 0 0
gold 1 0 1 1
in 1 1 1 0
of 1 1 1 0
shipment 1 0 1 0
silver 0 2 0 1
truck 0 1 1 1
Given a matrix A, we can factor it into three
parts: U, S, and Vᵀ.
Source: Garcia, E., 2006
20
Using JAMA to Decompose Matrix A
  • U
  • 0.3966 -0.1282 -0.2349 0.0941
  • 0.2860 0.1507 -0.0700 0.5212
  • 0.1106 -0.2790 -0.1649 -0.4271
  • 0.1523 0.2650 -0.2984 -0.0565
  • 0.1106 -0.2790 -0.1649 -0.4271
  • 0.3012 -0.2918 0.6468 -0.2252
  • 0.3966 -0.1282 -0.2349 0.0941
  • 0.3966 -0.1282 -0.2349 0.0941
  • 0.2443 -0.3932 0.0635 0.1507
  • 0.3615 0.6315 -0.0134 -0.4890
  • 0.3428 0.2522 0.5134 0.1453
  • S
  • 4.2055 0.0000 0.0000 0.0000
  • 0.0000 2.4155 0.0000 0.0000
  • 0.0000 0.0000 1.4021 0.0000
  • 0.0000 0.0000 0.0000 1.2302

Source: JAMA (MathWorks and the National
Institute of Standards and Technology (NIST))
21
Using JAMA to Decompose Matrix A
  • V
  • 0.4652 -0.6738 -0.2312 -0.5254
  • 0.6406 0.6401 -0.4184 -0.0696
  • 0.5622 -0.2760 0.3202 0.7108
  • 0.2391 0.2450 0.8179 -0.4624
  • Vᵀ
  • 0.4652 0.6406 0.5622 0.2391
  • -0.6738 0.6401 -0.2760 0.2450
  • -0.2312 -0.4184 0.3202 0.8179
  • -0.5254 -0.0696 0.7108 -0.4624
  • Matrix A can be reconstructed by multiplying
    the matrices: A = USVᵀ

Source: JAMA
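
A sketch of this decomposition step with the JAMA API (the signs of the singular vectors may differ from the slides, since SVD determines them only up to sign):

```java
import Jama.Matrix;
import Jama.SingularValueDecomposition;

// Factor the 11 x 4 term-document matrix A into U, S, V with JAMA and
// confirm that U*S*V^T reproduces A.
public class DecomposeA {
    public static void main(String[] args) {
        double[][] a = {
            {1,1,1,0}, {0,1,1,0}, {1,0,0,0}, {0,1,0,0}, {1,0,0,0}, {1,0,1,1},
            {1,1,1,0}, {1,1,1,0}, {1,0,1,0}, {0,2,0,1}, {0,1,1,1}
        };
        Matrix A = new Matrix(a);
        SingularValueDecomposition svd = A.svd();
        Matrix U = svd.getU();      // 11 x 4 left singular vectors
        Matrix S = svd.getS();      // 4 x 4 diagonal of singular values
        Matrix V = svd.getV();      // 4 x 4 right singular vectors
        Matrix reconstructed = U.times(S).times(V.transpose());
        System.out.println("||A - U*S*V^T||_F = " + A.minus(reconstructed).normF());
    }
}
```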
22
Rank 2 Approximation (Reduced U, S, and V
Matrices)
  • U
  • 0.3966 -0.1282
  • 0.2860 0.1507
  • 0.1106 -0.2790
  • 0.1523 0.2650
  • 0.1106 -0.2790
  • 0.3012 -0.2918
  • 0.3966 -0.1282
  • 0.3966 -0.1282
  • 0.2443 -0.3932
  • 0.3615 0.6315
  • 0.3428 0.2522
  • S
  • 4.2055 0.0000
  • 0.0000 2.4155
  • V
  • 0.4652 -0.6738
  • 0.6406 0.6401
  • 0.5622 -0.2760
  • 0.2391 0.2450
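
A sketch of the rank-k truncation with JAMA's getMatrix, which takes inclusive start and end indices; for the rank-2 approximation above, k = 2.

```java
import Jama.Matrix;

// Keep the first k columns of U and V and the top-left k x k block of S.
public class RankKApproximation {
    public static Matrix[] truncate(Matrix U, Matrix S, Matrix V, int k) {
        Matrix Uk = U.getMatrix(0, U.getRowDimension() - 1, 0, k - 1);  // terms x k
        Matrix Sk = S.getMatrix(0, k - 1, 0, k - 1);                    // k x k
        Matrix Vk = V.getMatrix(0, V.getRowDimension() - 1, 0, k - 1);  // docs x k
        return new Matrix[]{Uk, Sk, Vk};
    }
}
```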

23
Use Matrix V to Calculate Cosine Similarities
  • calculate cosine similarities for each document
    against the others
  • sim(Di, Dj) = (Di · Dj) / (|Di| |Dj|)
  • Example: calculate for D1
  • sim(D1, D2) = (D1 · D2) / (|D1| |D2|)
  • sim(D1, D3) = (D1 · D3) / (|D1| |D3|)
  • sim(D1, D4) = (D1 · D4) / (|D1| |D4|)
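
A sketch of this step in plain Java, treating each document as a row of the rank-2 V matrix; the printed values reproduce (up to rounding) the results on the next slide.

```java
// Cosine similarity between documents: sim(Di, Dj) = (Di . Dj) / (|Di| |Dj|).
public class CosineSim {
    static double cosine(double[] di, double[] dj) {
        double dot = 0, ni = 0, nj = 0;
        for (int k = 0; k < di.length; k++) {
            dot += di[k] * dj[k];
            ni  += di[k] * di[k];
            nj  += dj[k] * dj[k];
        }
        return dot / (Math.sqrt(ni) * Math.sqrt(nj));
    }
    public static void main(String[] args) {
        double[][] v = {            // rows of the rank-2 V matrix (D1..D4)
            {0.4652, -0.6738}, {0.6406, 0.6401},
            {0.5622, -0.2760}, {0.2391, 0.2450}
        };
        for (int j = 1; j < 4; j++)
            System.out.printf("sim(D1, D%d) = %.4f%n", j + 1, cosine(v[0], v[j]));
    }
}
```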

24
Result for Cosine Similarities
  • Example results for D1:
  • sim(D1, D2) = (0.4652·0.6406 + (-0.6738)·0.6401) /
    (√(0.4652² + (-0.6738)²) · √(0.6406² + 0.6401²)) = -0.1797
  • sim(D1, D3) = (0.4652·0.5622 + (-0.6738)·(-0.2760)) /
    (√(0.4652² + (-0.6738)²) · √(0.5622² + (-0.2760)²)) = 0.8727
  • sim(D1, D4) = (0.4652·0.2391 + (-0.6738)·0.2450) /
    (√(0.4652² + (-0.6738)²) · √(0.2391² + 0.2450²)) = -0.1921
  • D3 returns the highest value, so pair D1 with D3
  • Do the same for D2, D3, and D4.

25
Result of Simple Data Set
  • Closest match for each document: D1 → D3, D2 → D4, D3 → D1, D4 → D2
  • label 1: D1, D3
  • label 2: D2, D4
  • label 1:
  • D1: Shipment of gold damaged in a fire
  • D3: Shipment of gold arrived in a truck
  • label 2:
  • D2: Delivery of silver arrived in a silver truck
  • D4: Gold Silver Truck

26
Check Cluster Using SVM
  • Now that we have the labels, we can use them to
    train SVM
  • SVM input format on the original data:
  • 1 1:1.00 2:0.00 3:1.00 4:0.00 5:1.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:0.00
  • 2 1:1.00 2:1.00 3:0.00 4:1.00 5:0.00 6:0.00 7:1.00 8:1.00 9:0.00 10:2.00 11:1.00
  • 1 1:1.00 2:1.00 3:0.00 4:0.00 5:0.00 6:1.00 7:1.00 8:1.00 9:1.00 10:0.00 11:1.00
  • 2 1:0.00 2:0.00 3:0.00 4:0.00 5:0.00 6:1.00 7:0.00 8:0.00 9:0.00 10:1.00 11:1.00
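
A sketch of this check with LIBSVM's Java API: train on three documents labeled by their SVD cluster and predict the held-out one. The gamma and C values are illustrative assumptions.

```java
import libsvm.*;

// Train C-SVC on three documents (11-term vectors, labeled by SVD cluster)
// and predict the fourth.
public class CheckClusters {
    static svm_node[] row(double[] values) {
        svm_node[] nodes = new svm_node[values.length];
        for (int i = 0; i < values.length; i++) {
            nodes[i] = new svm_node();
            nodes[i].index = i + 1;        // LIBSVM feature indices are 1-based
            nodes[i].value = values[i];
        }
        return nodes;
    }
    public static void main(String[] args) {
        double[][] docs = {                 // D1..D4 from the term-document matrix
            {1,0,1,0,1,1,1,1,1,0,0}, {1,1,0,1,0,0,1,1,0,2,1},
            {1,1,0,0,0,1,1,1,1,0,1}, {0,0,0,0,0,1,0,0,0,1,1}
        };
        double[] labels = {1, 2, 1, 2};     // cluster labels from SVD

        svm_problem prob = new svm_problem();
        prob.l = 3;                          // train on D1, D2, D3
        prob.y = new double[]{labels[0], labels[1], labels[2]};
        prob.x = new svm_node[][]{row(docs[0]), row(docs[1]), row(docs[2])};

        svm_parameter param = new svm_parameter();
        param.svm_type    = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        param.gamma       = 1.0 / 11;        // assumed: 1 / number of features
        param.C           = 1000;            // large C
        param.cache_size  = 100;
        param.eps         = 1e-3;

        svm_model model = svm.svm_train(prob, param);
        System.out.println("Predicted label for D4: " + svm.svm_predict(model, row(docs[3])));
    }
}
```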

27
Results from SVM's Prediction
Results from SVM's Prediction on Original Data

Documents used for training   Document to predict   SVM prediction result   SVD cluster result
D1, D2, D3                    D4                    1.0                     2
D1, D2, D4                    D3                    1.0                     1
D1, D3, D4                    D2                    2.0                     2
D2, D3, D4                    D1                    1.0                     1

Source: Authors' Research
28
Using Truncated V Matrix
  • We want to reduce data size, so it is more
    practical to use the truncated V matrix
  • SVM input format (truncated V matrix):
  • 1 1:0.4652 2:-0.6738
  • 2 1:0.6406 2:0.6401
  • 1 1:0.5622 2:-0.2760
  • 2 1:0.2391 2:0.2450

29
SVM Result From Truncated V Matrix
Results from SVM's Prediction on Reduced Data

Documents used for training   Document to predict   SVM prediction result   SVD cluster result
D1, D2, D3                    D4                    2.0                     2
D1, D2, D4                    D3                    1.0                     1
D1, D3, D4                    D2                    2.0                     2
D2, D3, D4                    D1                    1.0                     1

Using the truncated V matrix gives better results.
Source: Authors' Research
30
Vector Documents on a Graph
(Figure: documents D1-D4 plotted as 2-D vectors from the rank-2 V matrix)
Source: Authors' Research
31
Analysis of the Rank Approximation
Cluster Results from Different Rank
Approximations
Rank 1: D1 → 4, D2 → 4, D3 → 4, D4 → 3;  label 1: D1, D4, D2, D3
Rank 2: D1 → 3, D2 → 4, D3 → 1, D4 → 2;  label 1: D1, D3;  label 2: D2, D4
Rank 3: D1 → 3, D2 → 3, D3 → 1, D4 → 3;  label 1: D1, D3, D2, D4
Rank 4: D1 → 2, D2 → 3, D3 → 2, D4 → 2;  label 1: D1, D2, D3, D4
Source: Authors' Research
32
Program Process Flow
  • use the previous methods on larger data sets
  • compare the results with those of human
    classification
  • Program Process Flow

33
Conceptual Exploration
  • Reuters-21578
  • a collection of newswire articles that have been
    human-classified by Carnegie Group, Inc. and
    Reuters, Ltd.
  • the most widely used data set for text categorization

34
Result Analysis
Clustering with SVD vs. Human Classification
First Data Set

First Data Set from Reuters-21578 (200 x 9928)

Rank       # of Naturally Formed Clusters (SVD)   SVD Cluster Accuracy (%)   SVM Prediction Accuracy (%)
Rank 002   80                                     75.0                       65.0
Rank 005   66                                     81.5                       82.0
Rank 010   66                                     60.5                       54.0
Rank 015   64                                     52.0                       51.5
Rank 020   67                                     38.0                       46.5
Rank 030   72                                     60.0                       54.0
Rank 040   72                                     62.5                       58.5
Rank 050   73                                     54.5                       51.5
Rank 100   75                                     45.5                       58.5

Source: Authors' Research
35
Result Analysis
Clustering with SVD vs. Human Classification
Second Data Set

Second Data Set from Reuters-21578 (200 x 9928)

Rank       # of Naturally Formed Clusters (SVD)   SVD Cluster Accuracy (%)   SVM Prediction Accuracy (%)
Rank 002   76                                     67.0                       84.5
Rank 005   73                                     67.0                       84.5
Rank 010   64                                     70.0                       85.5
Rank 015   64                                     63.0                       81.0
Rank 020   67                                     59.5                       50.0
Rank 030   69                                     68.5                       83.5
Rank 040   69                                     59.0                       79.0
Rank 050   76                                     44.5                       25.5
Rank 100   71                                     52.0                       47.0

Source: Authors' Research
36
Result Analysis
  • the highest accuracy for SVD clustering is 81.5%
  • lower rank values seem to give better results
  • SVM prediction accuracy is about the same as the
    SVD cluster accuracy

37
Result Analysis: Why are the results not higher?
  • human classification is more subjective than a
    program's
  • reducing many small clusters to only 2 clusters
    by computing the average may decrease the
    accuracy

38
Conclusion
  • Showed how SVM works
  • Explored the strengths of SVM
  • Showed how SVD can be used for clustering
  • Analyzed simple and complex data
  • the method seems to cluster data reasonably
  • Our method is able to
  • reduce data size (by using the truncated V matrix)
  • cluster data reasonably
  • classify new data efficiently (based on SVM)
  • By combining known methods, we created a form of
    unsupervised SVM

39
Future Work
  • extend SVD to very large data sets that can only
    be stored in secondary storage
  • look for more efficient SVM kernels

40
Thank You!
41
References
  • Bennett, K. P., Campbell, C. (2000). Support
    Vector Machines: Hype or Hallelujah? ACM SIGKDD
    Explorations, Vol. 2, No. 2, 1-13.
  • Chang, C., Lin, C. (2006). LIBSVM: a library for
    support vector machines. Retrieved November 29,
    2006, from http://www.csie.ntu.edu.tw/~cjlin/libsvm
  • Cristianini, N. (2001). Support Vector and Kernel
    Machines. Retrieved November 29, 2005, from
    http://www.support-vector.net/icml-tutorial.pdf
  • Cristianini, N., Shawe-Taylor, J. (2000). An
    Introduction to Support Vector Machines.
    Cambridge, UK: Cambridge University Press.
  • Garcia, E. (2006). SVD and LSI Tutorial 4: Latent
    Semantic Indexing (LSI) How-to Calculations.
    Retrieved November 28, 2006, from
    http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-4-lsi-how-to-calculations.html
  • Guestrin, C. (2006). Machine Learning. Retrieved
    November 8, 2006, from
    http://www.cs.cmu.edu/~guestrin/Class/10701/
  • Hicklin, J., Moler, C., Webb, P. (2005). JAMA:
    A Java Matrix Package. Retrieved November 28,
    2006, from http://math.nist.gov/javanumerics/jama/

42
References
  • Joachims, T. (1998). Text Categorization with
    Support Vector Machines: Learning with Many
    Relevant Features.
    http://www.cs.cornell.edu/People/tj/publications/joachims_98a.pdf
  • Joachims, T. (2004). Support Vector Machines.
    Retrieved November 28, 2006, from
    http://svmlight.joachims.org/
  • Reuters-21578 Text Categorization Test
    Collection. Retrieved November 28, 2006, from
    http://www.daviddlewis.com/resources/testcollections/reuters21578/
  • SVM - Support Vector Machines. DTREG. Retrieved
    November 28, 2006, from
    http://www.dtreg.com/svm.htm
  • Vapnik, V. N. (2000, 1995). The Nature of
    Statistical Learning Theory. New York:
    Springer-Verlag.