SVD - PowerPoint PPT Presentation

About This Presentation
Title:

SVD

Description:

Title: SVD & LSI. Author: Zheng Zhao. Last modified by: Zheng Zhao. Created: 1/24/2006 6:06:35 AM. Presentation format: On-screen Show.

Slides: 28
Provided by: Zheng65

Transcript and Presenter's Notes

Title: SVD


1
SVD & LSI
  • ML Reading Group
  • Jan-24-2006
  • Presenter: Zheng Zhao

2
SVD (Singular value decomposition)
  • Vector Norm
  • Matrix Norm
  • Singular value decomposition
  • The application of SVD

3
vector norm
  • A vector norm has the following properties.
  • 1. $\|x\| \ge 0$ (non-negative)
  • 2. $\|x\| = 0$ implies that all elements $x_i = 0$
  • 3. $\|\alpha x\| = |\alpha| \, \|x\|$
  • 4. $\|x_1 + x_2\| \le \|x_1\| + \|x_2\|$
    (triangle inequality)
  • Equivalence of norms
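
The four properties can be checked numerically for the common 1-, 2- and infinity-norms. This is a minimal sketch using NumPy; the vectors `x`, `y` and the scalar `alpha` are made-up illustration values, not taken from the slides. The last assertion illustrates equivalence of norms through the standard chain of bounds between these three norms.

```python
import numpy as np

# Hypothetical example vectors and scalar (not from the slides)
x = np.array([3.0, -4.0, 0.0])
y = np.array([1.0, 2.0, -2.0])
alpha = -2.5

for p in (1, 2, np.inf):
    nx = np.linalg.norm(x, ord=p)
    ny = np.linalg.norm(y, ord=p)
    # 1. non-negativity
    assert nx >= 0
    # 2. the zero vector has norm zero
    assert np.linalg.norm(np.zeros(3), ord=p) == 0
    # 3. scaling: ||alpha x|| = |alpha| ||x||
    assert np.isclose(np.linalg.norm(alpha * x, ord=p), abs(alpha) * nx)
    # 4. triangle inequality
    assert np.linalg.norm(x + y, ord=p) <= nx + ny + 1e-12

norm_1 = np.linalg.norm(x, 1)
norm_2 = np.linalg.norm(x, 2)
norm_inf = np.linalg.norm(x, np.inf)

# Equivalence of norms: each norm bounds the others up to a constant
n = x.size
assert norm_inf <= norm_2 <= norm_1 <= n * norm_inf
print(norm_1, norm_2, norm_inf)
```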

4
vector norm (cont.)
5
matrix (operator) norm
  • A matrix (operator) norm has the following
    properties.
  • 1. $\|A\| \ge 0$ (non-negative)
  • 2. $\|A\| = 0$ implies that all elements $a_{ij} = 0$
  • 3. $\|\alpha A\| = |\alpha| \, \|A\|$
  • 4. $\|A_1 + A_2\| \le \|A_1\| + \|A_2\|$
    (triangle inequality)
  • 5. $\|AB\| \le \|A\| \, \|B\|$ (multiplicative
    property)

An induced norm is defined as follows: for z = Ax,
$\|A\| = \max_{x \ne 0} \|z\| / \|x\| = \max_{x \ne 0} \|Ax\| / \|x\|$,
which measures how much A stretches x.
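
As a small illustration of "how much A stretches x", the sketch below takes the vector 2-norm as the underlying norm (an assumption; the slides do not fix one). In that case the induced norm equals the largest singular value of A, and the maximum stretch is attained at the first right singular vector. The matrix is random illustration data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # hypothetical example matrix

# Induced 2-norm as computed by NumPy, and the largest singular value
induced_2 = np.linalg.norm(A, 2)
sigma_max = np.linalg.svd(A, compute_uv=False)[0]

# Stretch ratios ||Ax|| / ||x|| for many random nonzero x (one x per column)
X = rng.standard_normal((3, 1000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
max_ratio = ratios.max()

# The first right singular vector (unit length) attains the maximum stretch
_, _, Vt = np.linalg.svd(A)
v1 = Vt[0]
stretch_v1 = np.linalg.norm(A @ v1)

print(induced_2, max_ratio, stretch_v1)
```

Random sampling only gives a lower bound on the induced norm; the singular vector gives the exact value.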
6
matrix (operator) norm (cont.)
7
SVD
  • SVD: Singular value decomposition
  • http://en.wikipedia.org/wiki/Singular_value_decomposition
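
For reference, the decomposition writes a matrix as $A = U \Sigma V^T$, with orthonormal columns in U and V and non-negative singular values on the diagonal of $\Sigma$. A minimal sketch with NumPy follows; the matrix entries are arbitrary illustration values.

```python
import numpy as np

# Hypothetical 4x3 example matrix
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

# Thin SVD: U is 4x3, s holds the 3 singular values, Vt is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its factors
A_rebuilt = U @ np.diag(s) @ Vt

print(s)                          # singular values, largest first
print(np.allclose(A, A_rebuilt))  # the factors reproduce A
```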

8
Some Properties of SVD
9
Some Properties of SVD
  • That is, $A_k$ (the SVD truncated to the k largest singular
    values) is the optimal approximation of A, in terms of the
    approximation error measured by the Frobenius norm, among
    all matrices of rank at most k
  • Forms the basis of LSI (Latent Semantic
    Indexing) in information retrieval
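
The optimality claim can be illustrated numerically. The sketch below builds $A_k$ from the k largest singular values, checks that its Frobenius error equals the square root of the sum of the squared discarded singular values, and compares it against randomly generated rank-k matrices. The random comparison is only a sanity check, not a proof, and all data here is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))  # hypothetical example matrix
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation: keep the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err_svd = np.linalg.norm(A - A_k, "fro")
err_expected = np.sqrt(np.sum(s[k:] ** 2))

# Frobenius errors of some other (random) rank-k matrices
other_errs = []
for _ in range(200):
    B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
    other_errs.append(np.linalg.norm(A - B, "fro"))

print(err_svd, err_expected, min(other_errs))
```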

10
Application of SVD
  • Pseudoinverse
  • Range, null space and rank
  • Matrix approximation
  • Other examples
  • http://en.wikipedia.org/wiki/Singular_value_decomposition
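
The first two applications can be sketched directly from the SVD factors. In the example below the matrix is a made-up rank-deficient one and the tolerance `tol` is an arbitrary choice for deciding which singular values count as zero; both are assumptions for illustration.

```python
import numpy as np

# Hypothetical rank-deficient matrix: row 2 is twice row 1
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(A)

# Rank: number of singular values above a tolerance (assumed threshold)
tol = 1e-10 * s[0]
rank = int(np.sum(s > tol))

# Range: spanned by the first `rank` left singular vectors
range_basis = U[:, :rank]

# Null space: spanned by the remaining right singular vectors
null_space = Vt[rank:].T

# Pseudoinverse: invert only the nonzero singular values
s_inv = np.zeros_like(s)
s_inv[:rank] = 1.0 / s[:rank]
A_pinv = Vt.T @ np.diag(s_inv) @ U.T

print(rank)
print(np.allclose(A @ null_space, 0.0))
```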

11
LSI (Latent Semantic Indexing)
  • Problem Introduction
  • Latent Semantic Indexing
  • LSI
  • Query
  • Updating
  • An example
  • Some comments

12
Problem Introduction
  • Traditional term-matching methods don't work
    well in information retrieval
  • We want to capture the concepts instead of words.
    Concepts are reflected in the words. However,
  • One term may have multiple meanings (polysemy)
  • Different terms may have the same meaning (synonymy)

13
LSI (Latent Semantic Indexing)
  • The LSI approach tries to overcome the deficiencies
    of term-matching retrieval by treating the
    unreliability of observed term-document
    association data as a statistical problem.
  • The goal is to find effective models to represent
    the relationship between terms and documents.
    Hence a set of terms, which is by itself
    incomplete and unreliable, will be replaced by
    some set of entities which are more reliable
    indicants.

14
LSI, the Method
  • Build the document-term matrix M
  • Decompose M by SVD
  • Approximate M using the truncated SVD

15
LSI, the Method (cont.)
Each row and column of A is mapped into the
k-dimensional LSI space by the SVD.
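
The mapping can be sketched as follows. The term list and counts are invented for illustration, and the orientation (rows are terms, columns are documents) is an assumption chosen to match the later slides that treat terms as rows. Scaling the coordinates by $\Sigma_k$ is one common convention, not necessarily the one the presenter used.

```python
import numpy as np

# Hypothetical term-document counts: rows are terms, columns are documents
terms = ["matrix", "norm", "svd", "query", "index"]
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
k = 2  # number of LSI dimensions kept

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]         # term directions
S_k = np.diag(s[:k])   # k largest singular values
V_k = Vt[:k, :].T      # document directions

# k-dimensional coordinates (assumed convention: scale by S_k)
term_coords = U_k @ S_k  # one row per term
doc_coords = V_k @ S_k   # one row per document

# Rank-k approximation of A
A_k = U_k @ S_k @ V_k.T

for name, coords in zip(terms, term_coords):
    print(name, coords)
```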
16
Fundamental Comparison Quantities from the SVD
Model
  • Comparing two terms: the dot product between two
    row vectors of $A_k$ reflects the extent to which
    two terms have a similar pattern of occurrence
    across the set of documents.
  • Comparing two documents: the dot product between two
    column vectors of $A_k$
  • Comparing a term and a document: the corresponding
    entry of $A_k$
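
All three comparison quantities can be read off the truncated factors without forming large products from scratch, because $A_k A_k^T = (U_k \Sigma_k)(U_k \Sigma_k)^T$ and $A_k^T A_k = (V_k \Sigma_k)(V_k \Sigma_k)^T$. A minimal sketch follows, using made-up counts with terms as rows and documents as columns (an assumed orientation).

```python
import numpy as np

# Hypothetical term-document counts: rows are terms, columns are documents
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
A_k = U_k @ S_k @ V_k.T

# Term-term: dot products between rows of A_k
term_term = A_k @ A_k.T
# Document-document: dot products between columns of A_k
doc_doc = A_k.T @ A_k
# Term-document: the entries of A_k itself
term_doc = A_k

# The same quantities from the k-dimensional coordinates
term_coords = U_k @ S_k
doc_coords = V_k @ S_k
print(np.allclose(term_term, term_coords @ term_coords.T))
print(np.allclose(doc_doc, doc_coords @ doc_coords.T))
```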

17
Query
  • A query q is also mapped into this space, by
    $\hat{q} = q^T U_k \Sigma_k^{-1}$
  • Compare the similarity in the new space
  • Intuition: dimension reduction through LSI brings
    together related axes in the vector space.
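
A query can be handled as a sketch like the one below. It assumes the commonly used projection $\hat{q} = q^T U_k \Sigma_k^{-1}$ and cosine similarity against the rows of $V_k$; both the similarity measure and the data are assumptions for illustration. As a sanity check, projecting an existing document column this way reproduces that document's row of $V_k$.

```python
import numpy as np

# Hypothetical term-document counts: rows are terms, columns are documents
terms = ["matrix", "norm", "svd", "query", "index"]
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, V_k = U[:, :k], Vt[:k, :].T
S_k_inv = np.diag(1.0 / s[:k])

def project(term_vector):
    """Map a vector of term counts into the k-dimensional space."""
    return term_vector @ U_k @ S_k_inv

# Query containing the terms "matrix" and "svd"
q = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
q_hat = project(q)

# Cosine similarity between the query and every document
sims = (V_k @ q_hat) / (np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_hat))
ranking = np.argsort(-sims)  # document indices, most similar first

# Projecting an existing document column gives back its row of V_k
doc0_hat = project(A[:, 0])
print(sims, ranking)
```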

18
Updating
  • Recomposing
  • Expensive
  • Fold-in method

With fold-in, new terms and documents have no effect on the
representation of the preexisting terms and
documents
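
Folding in can be sketched as projecting the new item with the existing factors and appending the result, leaving everything else untouched. The formulas used here, $\hat{d} = d^T U_k \Sigma_k^{-1}$ for a new document and $\hat{t} = t V_k \Sigma_k^{-1}$ for a new term, are the commonly cited ones and are an assumption about what the slides showed; the data is made up.

```python
import numpy as np

# Hypothetical term-document counts: rows are terms, columns are documents
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, V_k = U[:, :k], Vt[:k, :].T
S_k_inv = np.diag(1.0 / s[:k])

# Fold in a new document: a column of term counts (5 terms)
d_new = np.array([0.0, 1.0, 1.0, 1.0, 0.0])
d_hat = d_new @ U_k @ S_k_inv
V_k_folded = np.vstack([V_k, d_hat])  # existing rows are copied unchanged

# Fold in a new term: a row of counts over the 4 original documents
t_new = np.array([1.0, 0.0, 0.0, 2.0])
t_hat = t_new @ V_k @ S_k_inv
U_k_folded = np.vstack([U_k, t_hat])  # existing rows are copied unchanged

# Sanity check: folding in an existing term row reproduces its row of U_k
t0_hat = A[0, :] @ V_k @ S_k_inv
print(V_k_folded.shape, U_k_folded.shape)
```

The cheapness comes at a price: the appended rows are not part of a true SVD of the enlarged matrix, which is why recomposing remains the accurate but expensive alternative.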
19
Example
20
Example (cont.)
21
Example (cont. Mapping)
22
Example (cont. Query)
Query: "Application and Theory"
23
Example (cont. Query)
24
Example (cont. fold in)
25
Example (cont. recomposing)
26
Choosing a value for k
  • LSI is useful only if $k \ll n$.
  • If k is too large, it doesn't capture the
    underlying latent semantic space; if k is too
    small, too much is lost.
  • No principled way of determining the best k; need
    to experiment.
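
One simple way to start experimenting is to tabulate, for each k, the relative Frobenius error of the rank-k approximation, which follows directly from the singular values. This is only a hedged starting point: a low reconstruction error does not by itself mean good retrieval quality, so retrieval experiments are still needed. The matrix is made-up illustration data.

```python
import numpy as np

# Hypothetical term-document counts: rows are terms, columns are documents
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2]], dtype=float)

s = np.linalg.svd(A, compute_uv=False)
total = np.sum(s ** 2)

# Relative Frobenius error ||A - A_k||_F / ||A||_F for k = 1 .. rank
rel_err = [float(np.sqrt(np.sum(s[k:] ** 2) / total))
           for k in range(1, len(s) + 1)]

for k, err in enumerate(rel_err, start=1):
    print(k, round(err, 3))
```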

27
How well does LSI work?
  • Effectiveness of LSI compared to regular
    term-matching depends on nature of documents.
  • Typical improvement: 0 to 30% better precision.
  • Advantage greater for texts in which synonymy and
    ambiguity are more prevalent.
  • Best when recall is high.
  • Costs of LSI might outweigh improvement.
  • SVD is computationally expensive; limited use for
    really large document collections
  • Inverted index not possible