Title: Singular Value Decomposition and Data Management
Slide 1: Singular Value Decomposition and Data Management
Slide 2: SVD - Detailed outline
- Motivation
- Definition - properties
- Interpretation
- Complexity
- Case studies
- Additional properties
Slide 3: SVD - Motivation
- problem #1: text - LSI: find 'concepts'
- problem #2: compression / dimensionality reduction
Slide 4: SVD - Motivation
- problem #1: text - LSI: find 'concepts'
Slide 5: SVD - Motivation
- problem #2: compress / reduce dimensionality
Slide 6: Problem - specs
- 10^6 rows, 10^3 columns, no updates
- random access to any cell(s); small error: OK
Slides 7-8: SVD - Motivation
[figures only]
Slide 9: SVD - Definition
- A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T
- A: n x m matrix (e.g., n documents, m terms)
- U: n x r matrix (n documents, r concepts)
- Λ: r x r diagonal matrix (strength of each concept) (r: rank of the matrix)
- V: m x r matrix (m terms, r concepts)
Slide 10: SVD - Properties
- THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where
- U, Λ, V: unique (*)
- U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: identity matrix)
- Λ: singular values, non-negative and sorted in decreasing order
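As a quick illustration of the definition and these properties, here is a minimal numpy sketch (the toy document-term matrix below is made up, not the example from the slides):

```python
import numpy as np

# toy document-term matrix (5 documents x 4 terms); values are made up
A = np.array([[1., 1., 1., 0.],
              [2., 2., 2., 0.],
              [1., 1., 1., 0.],
              [0., 0., 0., 2.],
              [0., 0., 0., 3.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)    # A = U * diag(s) * V^T

print(U.shape, s.shape, Vt.shape)                   # (5, 4) (4,) (4, 4)
print(np.allclose(A, U @ np.diag(s) @ Vt))          # product gives back A
print(np.allclose(U.T @ U, np.eye(U.shape[1])))     # U is column-orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))  # V is column-orthonormal
print(np.all(np.diff(s) <= 0), np.all(s >= -1e-12)) # sorted, non-negative
```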
Slide 11: SVD - Example
[Figure: a term-document matrix A (rows: CS and MD documents; columns: data, inf., retrieval, brain, lung) and its decomposition A = U Λ V^T]
Slide 12: SVD - Example
[Figure: the same decomposition, with the two concepts labeled: a 'CS-concept' and an 'MD-concept']
Slide 13: SVD - Example
- U: the doc-to-concept similarity matrix
[Figure: U highlighted in the decomposition]
Slide 14: SVD - Example
- the corresponding diagonal entry of Λ: strength of the CS-concept
[Figure: that entry highlighted in the decomposition]
Slides 15-16: SVD - Example
- V: the term-to-concept similarity matrix
[Figure: V highlighted in the decomposition]
Slide 17: SVD - Detailed outline
- Motivation
- Definition - properties
- Interpretation
- Complexity
- Case studies
- Additional properties
Slide 18: SVD - Interpretation 1
- 'documents', 'terms' and 'concepts':
- U: document-to-concept similarity matrix
- V: term-to-concept similarity matrix
- Λ: its diagonal elements give the strength of each concept
Slide 19: SVD - Interpretation 2
- best axis to project on ('best' = minimizes the sum of squares of the projection errors)
Slide 20: SVD - Motivation
[figure only]
Slide 21: SVD - Interpretation 2
- SVD gives the best axis to project on
[Figure: the data points with the first axis v1 drawn through them]
Slides 22-23: SVD - Interpretation 2
[figures only]
Slide 24: SVD - Interpretation 2
[Figure: the variance ('spread') of the points along the v1 axis]
Slide 25: SVD - Interpretation 2
- A = U Λ V^T - example:
- U Λ gives the coordinates of the points on the projection axes
[Figure: the example matrix and the projected coordinates]
Slide 26: SVD - Interpretation 2
- More details
- Q: how exactly is dimensionality reduction done?
Slide 27: SVD - Interpretation 2
- More details
- Q: how exactly is dimensionality reduction done?
- A: set the smallest singular values to zero
[Figure: the example decomposition with the smallest singular value set to zero]
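A small numpy sketch of this step, using a made-up document-term matrix as a stand-in for the one in the figures: zero the smallest singular values and rebuild the lower-rank matrix.

```python
import numpy as np

A = np.array([[1., 1., 1., 0., 0.],    # toy document-term matrix (values made up)
              [2., 2., 2., 0., 0.],
              [1., 1., 1., 0., 0.],
              [5., 5., 5., 0., 0.],
              [0., 0., 0., 2., 2.],
              [0., 0., 0., 3., 3.],
              [0., 0., 0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                            # keep the k strongest concepts
s_trunc = s.copy()
s_trunc[k:] = 0.0                # "set the smallest singular values to zero"
A_k = U @ np.diag(s_trunc) @ Vt  # rank-k approximation of A

print(np.round(A_k, 2))
print(np.linalg.norm(A - A_k))   # reconstruction error (small)
```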
Slides 28-31: SVD - Interpretation 2
[Figures: the reconstruction after the smallest singular values are zeroed out]
Slide 32: SVD - Interpretation 2
- Equivalent:
- 'spectral decomposition' of the matrix
Slide 33: SVD - Interpretation 2
- Equivalent:
- 'spectral decomposition' of the matrix:
- A = λ1 u1 v1^T + λ2 u2 v2^T + ...
[Figure: A expressed via the column vectors u1, u2, the row vectors v1^T, v2^T, and the weights λ1, λ2]
Slide 34: SVD - Interpretation 2
- Equivalent:
- 'spectral decomposition' of the matrix
[Figure: the n x m matrix A written as a sum of rank-1 terms]
Slide 35: SVD - Interpretation 2
- 'spectral decomposition' of the matrix:
- A = λ1 u1 v1^T + λ2 u2 v2^T + ... (r terms)
- where each u_i is an n x 1 column vector and each v_i^T is a 1 x m row vector
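A short numpy check of this identity on a made-up matrix: A rebuilt as the sum of rank-1 outer products λ_i u_i v_i^T.

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [2., 2., 0.],
              [0., 0., 3.]])            # toy matrix, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = sum_i lambda_i * u_i * v_i^T  (outer products of columns of U and rows of V^T)
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))        # True
```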
Slide 36: SVD - Interpretation 2
- approximation / dimensionality reduction:
- by keeping the first few terms (Q: how many?)
- to do the mapping into concept space you use V^T: x' = V^T x
- assume λ1 >= λ2 >= ...
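A sketch of that mapping (the toy matrix and query vector are made up; x' is simply the vector expressed in concept space):

```python
import numpy as np

A = np.array([[1., 1., 1., 0., 0.],     # toy document-term matrix (values made up)
              [2., 2., 2., 0., 0.],
              [0., 0., 0., 2., 2.],
              [0., 0., 0., 3., 3.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Vt_k = Vt[:k, :]                         # keep only the first k concepts

x = np.array([1., 0., 1., 0., 0.])       # a new document / query vector over the m terms
x_concepts = Vt_k @ x                    # map it into concept space: x' = V^T x
print(x_concepts)
```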
Slide 37: SVD - Interpretation 2
- A: (heuristic - Fukunaga) keep 80-90% of the 'energy' (= the sum of squares of the λ_i's)
- assume λ1 >= λ2 >= ...
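A sketch of that heuristic (the 90% threshold matches the rule of thumb above; the data matrix is random, purely for illustration):

```python
import numpy as np

def rank_for_energy(singular_values, fraction=0.9):
    """Smallest k such that the first k singular values keep `fraction` of the energy."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, fraction) + 1)

A = np.random.default_rng(0).normal(size=(100, 20))   # made-up data matrix
s = np.linalg.svd(A, compute_uv=False)                # singular values only
print(rank_for_energy(s, 0.9))                        # how many concepts to keep
```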
Slides 38-39: SVD - Interpretation 3
- finds non-zero 'blobs' in a data matrix
[Figures: a data matrix whose non-zero entries form two rectangular blobs]
Slide 40: SVD - Interpretation 3
- Drill: find the SVD, by inspection!
- Q: rank = ??
[Figure: the blob matrix, with the factors U, Λ, V^T left as '??']
Slide 41: SVD - Interpretation 3
- A: rank = 2 (2 linearly independent rows/cols)
[figure]
Slide 42: SVD - Interpretation 3
- A: rank = 2 (2 linearly independent rows/cols)
- are the column vectors orthogonal??
[figure]
Slide 43: SVD - Interpretation 3
- column vectors: orthogonal - but not unit vectors
[Figure: the candidate factor matrices, with zero entries outside the two blobs]
Slide 44: SVD - Interpretation 3
- and the singular values are:
[Figure: the resulting diagonal matrix of singular values]
Slide 45: SVD - Interpretation 3
- A: SVD properties:
- the matrix product should give back matrix A
- matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
- ditto for matrix V
- matrix Λ should be diagonal, with positive values
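A numpy sketch of the drill, using a made-up matrix with two such blobs (not the exact matrix on the slides):

```python
import numpy as np

# two non-zero "blobs": rows 0-2 use only columns 0-1, rows 3-4 only columns 2-3
A = np.array([[1., 2., 0., 0.],
              [2., 4., 0., 0.],
              [3., 6., 0., 0.],
              [0., 0., 5., 5.],
              [0., 0., 1., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-10))          # rank = 2: one concept per blob
print(np.round(U[:, :2], 2))      # each concept "lives" on one group of rows
print(np.round(Vt[:2, :], 2))     # ... and on one group of columns
print(np.round(s[:2], 2))         # the two singular values
```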
Slide 46: SVD - Complexity
- O(n * m * m) or O(n * n * m) (whichever is less)
- less work, if we just want the singular values
- or if we only want the first k left singular vectors
- or if the matrix is sparse [Berry]
- implemented in any linear algebra package (LINPACK, matlab, Splus, mathematica, ...)
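For the sparse / top-k case, SciPy's svds computes only the k strongest singular triplets; a minimal sketch (the matrix here is random and only illustrative):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(10_000, 1_000, density=0.001, random_state=0).tocsr()  # sparse n x m matrix
U, s, Vt = svds(A, k=5)            # only the 5 strongest singular values/vectors
print(s[::-1])                     # svds returns them in ascending order
```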
Slide 47: Optimality of SVD
- Def: the Frobenius norm of an n x m matrix M is ||M||_F = sqrt( Σ_{i,j} m_{i,j}^2 )
- (reminder) the rank of a matrix M is the number of independent rows (or columns) of M
- Let A = U Λ V^T and A_k = U_k Λ_k V_k^T (the k-term SVD approximation of A)
- A_k is an n x m matrix, U_k an n x k, Λ_k a k x k, and V_k an m x k matrix
- Theorem [Eckart and Young]: among all n x m matrices C of rank at most k, A_k minimizes the error, i.e., ||A - A_k||_F <= ||A - C||_F
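A quick numerical illustration of the theorem (random data; the rank-k 'competitor' C is just one arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 30))
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # best rank-k approximation

# an arbitrary competitor of rank at most k
C = rng.normal(size=(50, k)) @ rng.normal(size=(k, 30))

print(np.linalg.norm(A - A_k, 'fro'))   # smaller ...
print(np.linalg.norm(A - C, 'fro'))     # ... than this competitor's error
```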
Slide 48: Kleinberg's Algorithm
- Main idea: in many cases, when you search the web using some terms, the most relevant pages may not contain these terms (or may contain them only a few times)
- Harvard: www.harvard.edu
- Search engines: yahoo, google, altavista
- 'Authorities' and 'hubs'
Slide 49: Kleinberg's algorithm
- Problem definition: given the web and a query,
- find the most 'authoritative' web pages for this query
- Step 0: find all pages containing the query terms ('root set')
- Step 1: expand by one move forward and backward ('base set')
Slide 50: Kleinberg's algorithm
- Step 1: expand by one move forward and backward
Slide 51: Kleinberg's algorithm
- on the resulting graph, give a high score (= 'authorities') to nodes that many important nodes point to
- give a high importance score ('hubs') to nodes that point to good 'authorities'
[Figure: hubs on one side pointing to authorities on the other]
Slide 52: Kleinberg's algorithm
- observations:
- recursive definition!
- each node (say, the i-th node) has both an authoritativeness score a_i and a hubness score h_i
Slide 53: Kleinberg's algorithm
- Let E be the set of edges and A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists
- Let h and a be n x 1 vectors holding the 'hubness' and 'authoritativeness' scores
- Then:
Slide 54: Kleinberg's algorithm
- Then:
- a_i = h_k + h_l + h_m
- that is:
- a_i = Sum (h_j), over all j such that the edge (j, i) exists
- or
- a = A^T h
[Figure: nodes k, l, m pointing to node i]
Slide 55: Kleinberg's algorithm
- symmetrically, for the 'hubness':
- h_i = a_n + a_p + a_q
- that is:
- h_i = Sum (a_j), over all j such that the edge (i, j) exists
- or
- h = A a
[Figure: node i pointing to nodes n, p, q]
Slide 56: Kleinberg's algorithm
- In conclusion, we want vectors h and a such that:
- h = A a
- a = A^T h
- Recall the properties:
- C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1]
- C(3): u1^T A = λ1 v1^T
Slide 57: Kleinberg's algorithm
- In short, the solutions to
- h = A a
- a = A^T h
- are the left- and right- singular vectors of the adjacency matrix A (equivalently, eigenvectors of A A^T and A^T A, respectively)
- Starting from a random a and iterating, we'll eventually converge
- (Q: to which of all the eigenvectors? why?)
Slide 58: Kleinberg's algorithm
- (Q: to which of all the eigenvectors? why?)
- A: to the ones with the strongest eigenvalue, because of property B(5):
- B(5): (A^T A)^k v ~ (constant) v1
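A compact sketch of that iteration on a made-up 4-node graph (the normalization step is my choice to keep the scores bounded; it does not change the directions of the vectors):

```python
import numpy as np

# adjacency matrix of a toy graph: A[i, j] = 1 iff edge i -> j exists
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0]], dtype=float)

a = np.ones(A.shape[0])                 # authority scores
h = np.ones(A.shape[0])                 # hub scores
for _ in range(100):
    a = A.T @ h                         # a = A^T h
    h = A @ a                           # h = A a
    a /= np.linalg.norm(a)              # rescale so the scores don't blow up
    h /= np.linalg.norm(h)

print(np.round(a, 3), np.round(h, 3))
```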
Slide 59: Kleinberg's algorithm - results
- E.g., for the query 'java':
- 0.328 www.gamelan.com
- 0.251 java.sun.com
- 0.190 www.digitalfocus.com (the java developer)
Slide 60: Kleinberg's algorithm - discussion
- the 'authority' score can be used to find pages similar to a page p
- closely related to 'citation analysis', social networks / 'small world' phenomena
Slide 61: google/page-rank algorithm
- closely related: the Web is a directed graph of connected nodes
- imagine a particle randomly moving along the edges (*)
- compute its steady-state probabilities; that gives the PageRank of each page (= the importance of this page)
- (*) with occasional random jumps
Slide 62: PageRank Definition
- Assume a page A and pages T1, T2, ..., Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:
- PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tm)/C(Tm) )
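A minimal sketch of this definition as an iterative computation; the 5-page link structure is made up, and d = 0.85 is the commonly used value rather than something stated on the slide:

```python
import numpy as np

# links[i] = pages that page i points to (a made-up 5-page web)
links = {0: [1, 2], 1: [2], 2: [0, 3], 3: [4], 4: [0]}
n, d = len(links), 0.85

pr = np.ones(n)                          # initial PageRank of every page
for _ in range(100):
    new = np.full(n, 1.0 - d)            # the (1 - d) term
    for i, outs in links.items():        # page i contributes d * PR(i)/C(i) to each page it links to
        for j in outs:
            new[j] += d * pr[i] / len(outs)
    pr = new

print(np.round(pr, 3))                   # PR for each of the 5 pages
```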
Slide 63: google/page-rank algorithm
- Compute the PR of each page ~ identical problem: given a Markov Chain, compute the steady-state probabilities p1 ... p5
[Figure: a 5-node graph, nodes 1-5]
Slide 64: Computing PageRank
- Iterative procedure
- Also: navigate the web by randomly following links, or, with probability p, jump to a random page. Let A be the adjacency matrix (n x n) and d_i the out-degree of page i
- Prob(A_i -> A_j) = p * n^(-1) + (1 - p) * d_i^(-1) * A_ij
- A'[i, j] = Prob(A_i -> A_j)
Slide 65: google/page-rank algorithm
- Let A be the transition matrix (= adjacency matrix, row-normalized: the sum of each row = 1)
[Figure: the 5-node graph and its row-normalized transition matrix]
Slide 66: google/page-rank algorithm
- A^T p = p
[Figure: the transition matrix, the steady-state vector p, and the 5-node graph]
Slide 67: google/page-rank algorithm
- A^T p = p
- thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is row-normalized)
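A short numpy illustration of this fixed point on a made-up 5-node graph (no random-jump term here, just the row-normalized transition matrix):

```python
import numpy as np

# adjacency matrix of a toy 5-node graph (made up), then row-normalize it
adj = np.array([[0, 1, 1, 0, 0],
                [0, 0, 1, 0, 0],
                [1, 0, 0, 1, 0],
                [0, 0, 0, 0, 1],
                [1, 0, 0, 0, 0]], dtype=float)
A = adj / adj.sum(axis=1, keepdims=True)     # transition matrix: each row sums to 1

p = np.full(5, 1 / 5)                        # start from the uniform distribution
for _ in range(1000):
    p = A.T @ p                              # one step of the chain: p <- A^T p

print(np.round(p, 3))                        # steady-state probabilities
print(np.allclose(A.T @ p, p))               # p is a fixed point: A^T p = p (eigenvalue 1)
```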
Slide 68: Kleinberg/google - conclusions
- SVD helps in graph analysis:
- hub/authority scores: the strongest left- and right- singular vectors of the adjacency matrix
- random walk on a graph: steady-state probabilities are given by the strongest eigenvector of the transition matrix
Slide 69: Conclusions so far
- SVD: a valuable tool
- given a document-term matrix, it finds 'concepts' (LSI)
- ... and can reduce dimensionality (KL)
Slide 70: Conclusions cont'd
- ... and can find fixed points or steady-state probabilities (google / Kleinberg / Markov Chains)
- ... and can solve optimally over- and under-constrained linear systems (least squares)