Title: SVD
1. SVD & LSI
- ML Reading Group
- Jan 24, 2006
- Presenter: Zheng Zhao
2. SVD (Singular Value Decomposition)
- Vector norm
- Matrix norm
- Singular value decomposition
- Applications of SVD
3. Vector norm
- A vector norm has the following properties (see the sketch below):
- 1. $\|x\| \ge 0$ (non-negative)
- 2. $\|x\| = 0$ implies that all elements $x_i = 0$
- 3. $\|\alpha x\| = |\alpha| \, \|x\|$
- 4. $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$ (triangle inequality)
- Equivalence of norms
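A minimal numpy sketch (not from the original slides) checking properties 1, 3, and 4 for the Euclidean norm:

```python
import numpy as np

x1 = np.array([3.0, -4.0, 1.0])
x2 = np.array([0.5, 2.0, -1.0])
alpha = -2.5

# Property 1: non-negativity.
assert np.linalg.norm(x1) >= 0
# Property 3: ||alpha * x|| = |alpha| * ||x||.
assert np.isclose(np.linalg.norm(alpha * x1),
                  abs(alpha) * np.linalg.norm(x1))
# Property 4 (triangle inequality): ||x1 + x2|| <= ||x1|| + ||x2||.
assert np.linalg.norm(x1 + x2) <= np.linalg.norm(x1) + np.linalg.norm(x2)
```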
4. Vector norm (cont.)
5. Matrix (operator) norm
- A matrix (operator) norm has the following properties:
- 1. $\|A\| \ge 0$ (non-negative)
- 2. $\|A\| = 0$ implies that all elements $a_{ij} = 0$
- 3. $\|\alpha A\| = |\alpha| \, \|A\|$
- 4. $\|A_1 + A_2\| \le \|A_1\| + \|A_2\|$ (triangle inequality)
- 5. $\|AB\| \le \|A\| \, \|B\|$ (multiplicative property)
An induced norm is defined as $\|A\| = \max_{x \ne 0} \|Ax\| / \|x\|$; for $z = Ax$, it measures how much $A$ stretches $x$.
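A short sketch of the induced 2-norm, which equals the largest singular value (assuming numpy; the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])
x = np.array([1.0, -1.0])

# Induced 2-norm = largest singular value of A.
induced = np.linalg.norm(A, ord=2)
assert np.isclose(induced, np.linalg.svd(A, compute_uv=False)[0])

# Stretching bound: ||Ax|| <= ||A|| * ||x|| for every x.
assert np.linalg.norm(A @ x) <= induced * np.linalg.norm(x)
```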
6. Matrix (operator) norm (cont.)
7. SVD
- SVD: Singular value decomposition
- http://en.wikipedia.org/wiki/Singular_value_decomposition
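As a quick illustration (numpy, random data; not from the slides), the factorization $A = U \Sigma V^T$:

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(4, 3))

# Thin SVD: A = U @ diag(s) @ Vt, with orthonormal columns in U
# and orthonormal rows in Vt.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Singular values are returned in decreasing order.
assert np.all(s[:-1] >= s[1:])
```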
8. Some Properties of SVD
9. Some Properties of SVD (cont.)
- That is, $A_k$ is the optimal approximation, in terms of the approximation error measured by the Frobenius norm, among all matrices of rank $k$.
- This forms the basis of LSI (Latent Semantic Indexing) in information retrieval.
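A sketch of this optimality (the Eckart-Young property) in numpy, with made-up data: the Frobenius error of the rank-$k$ truncation equals $\sqrt{\sum_{i>k} \sigma_i^2}$:

```python
import numpy as np

A = np.random.default_rng(1).normal(size=(6, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-k truncation: keep only the k largest singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error equals the sqrt of the sum of the squared
# discarded singular values.
err = np.linalg.norm(A - A_k, ord='fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```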
10. Application of SVD
- Pseudoinverse (see the sketch after this list)
- Range, null space and rank
- Matrix approximation
- Other examples:
- http://en.wikipedia.org/wiki/Singular_value_decomposition
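A sketch (numpy, random full-rank matrix) of two of these applications: the pseudoinverse $A^+ = V \Sigma^{-1} U^T$ and the rank as the number of nonzero singular values:

```python
import numpy as np

A = np.random.default_rng(2).normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Pseudoinverse: invert the nonzero singular values.
# (A is full rank here, so 1/s is safe.)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# Rank: number of singular values above a tolerance.
rank = int(np.sum(s > 1e-12 * s[0]))
assert rank == np.linalg.matrix_rank(A)
```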
11. LSI (Latent Semantic Indexing)
- Problem Introduction
- Latent Semantic Indexing
- LSI
- Query
- Updating
- An example
- Some comments
12. Problem Introduction
- The traditional term-matching method doesn't work well in information retrieval.
- We want to capture concepts instead of words. Concepts are reflected in the words. However:
- One term may have multiple meanings.
- Different terms may have the same meaning.
13. LSI (Latent Semantic Indexing)
- The LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem.
- The goal is to find effective models to represent the relationship between terms and documents. Hence a set of terms, which is by itself incomplete and unreliable, is replaced by a set of entities that are more reliable indicants.
14. LSI, the Method
- Form the document-term matrix M.
- Decompose M by SVD.
- Approximate M using the truncated SVD.
15. LSI, the Method (cont.)
Each row and column of M gets mapped into the k-dimensional LSI space by the SVD.
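A toy sketch of this mapping (numpy; the 4-term x 4-document count matrix below is made up). With the usual convention of terms as rows and documents as columns, term coordinates come from $U_k$ and document coordinates from $V_k$:

```python
import numpy as np

# Made-up term-document counts: rows = terms, columns = documents.
M = np.array([
    [1, 0, 1, 0],   # term 0
    [1, 1, 0, 0],   # term 1
    [0, 1, 1, 1],   # term 2
    [0, 0, 1, 1],   # term 3
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
terms_k = U[:, :k] * s[:k]      # term coordinates in the LSI space
docs_k = Vt[:k, :].T * s[:k]    # document coordinates in the LSI space
```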
16. Fundamental Comparison Quantities from the SVD Model
- Comparing two terms: the dot product between two row vectors of $A_k$ reflects the extent to which two terms have a similar pattern of occurrence across the set of documents.
- Comparing two documents: the dot product between two column vectors of $A_k$.
- Comparing a term and a document: the value of the corresponding individual entry of $A_k$.
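Continuing the toy example, a sketch of all three quantities (same made-up matrix and rank as above):

```python
import numpy as np

M = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

term_sim = A_k @ A_k.T    # dot products between rows: term vs. term
doc_sim = A_k.T @ A_k     # dot products between columns: doc vs. doc
# Term-document association: the individual entries of A_k itself.
print(term_sim.round(2), doc_sim.round(2), A_k.round(2), sep="\n\n")
```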
17. Query
- A query q is also mapped into this space, by $\hat{q} = \Sigma_k^{-1} U_k^T q$.
- Compare the similarity in the new space.
- Intuition: dimension reduction through LSI brings together related axes in the vector space.
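A sketch of the query step on the toy matrix, assuming the standard fold-in mapping above; note that with $\hat{q} = \Sigma_k^{-1} U_k^T q$, the existing documents sit at the rows of $V_k$:

```python
import numpy as np

M = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
docs_k = Vt[:k, :].T                       # rows of V_k

# Query as a term vector (here it contains terms 0 and 1).
q = np.array([1.0, 1.0, 0.0, 0.0])
q_hat = (1.0 / s[:k]) * (U[:, :k].T @ q)   # Sigma_k^{-1} U_k^T q

# Rank documents by cosine similarity in the LSI space.
cos = (docs_k @ q_hat) / (np.linalg.norm(docs_k, axis=1)
                          * np.linalg.norm(q_hat))
print(np.argsort(-cos))                    # best-matching documents first
```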
18. Updating
- Recomposing: expensive.
- Fold-in method: new terms and documents have no effect on the representation of the preexisting terms and documents.
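A sketch of the fold-in step, assuming a new document is projected with the same formula as a query ($\hat{d} = \Sigma_k^{-1} U_k^T d$); U and $\Sigma$ are left untouched, which is exactly why the preexisting representations do not move:

```python
import numpy as np

M = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
docs_k = Vt[:k, :].T                       # existing document coordinates

# Fold in a new document: project and append; U and s stay fixed.
d_new = np.array([0.0, 1.0, 1.0, 0.0])
d_hat = (1.0 / s[:k]) * (U[:, :k].T @ d_new)
docs_k = np.vstack([docs_k, d_hat])
```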
19. Example
20. Example (cont.)
21. Example (cont. Mapping)
22. Example (cont. Query)
Query: "Application and Theory"
23. Example (cont. Query)
24. Example (cont. fold in)
25. Example (cont. recomposing)
26. Choosing a value for k
- LSI is useful only if k << n.
- If k is too large, it doesn't capture the underlying latent semantic space; if k is too small, too much is lost.
- There is no principled way of determining the best k; one needs to experiment.
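One common heuristic (an assumption, not from the slides) is to pick the smallest k that retains a chosen fraction of the spectral energy:

```python
import numpy as np

M = np.random.default_rng(3).poisson(1.0, size=(50, 30)).astype(float)
s = np.linalg.svd(M, compute_uv=False)

# Cumulative fraction of the squared singular-value mass.
energy = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(energy, 0.9)) + 1   # 90% cutoff, chosen arbitrarily
print(k, energy[k - 1])
```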
27. How well does LSI work?
- Effectiveness of LSI compared to regular term-matching depends on the nature of the documents.
- Typical improvement: 0 to 30% better precision.
- The advantage is greater for texts in which synonymy and ambiguity are more prevalent.
- It works best when recall is high.
- The costs of LSI might outweigh the improvement:
- SVD is computationally expensive, limiting its use for really large document collections.
- An inverted index is not possible.