Latent Semantic Analysis (presentation transcript)

1
Latent Semantic Analysis
  • Dharmendra P. Kanejiya
  • 15 February, 2002

2
Latent Semantic Analysis
  • Semantics
  • Approaches to semantic analysis
  • LSA
  • Building latent semantic space
  • Projection of a text unit in LS space
  • Semantic similarity measure
  • Application areas

3
Semantics
  • Syntax - structure of words, phrases and
    sentences
  • Semantics - meaning of and relationships among
    words in a sentence
  • Extracting the essential meaning from a given text
    document
  • Contextual meaning

4
Approaches to semantic analysis
  • Compositional semantics
    • uses a parse tree to derive a hierarchical structure
    • informational and intentional meaning
    • rule-based
  • Classification
    • Bayesian approach
    • statistics-algebraic approach (LSA)

5
Latent Semantic Analysis
  • LSA is a fully automatic statistics-algebraic
    technique for extracting and inferring relations
    of expected contextual usage of words in
    documents
  • It uses no human-constructed dictionaries, knowledge
    bases, semantic networks, or grammars
  • Takes raw text as input

6
Building latent semantic space
  • Training corpus in the domain of interest
  • Document unit
    • a sentence, paragraph, or chapter
  • Vocabulary size M
    • remove stopwords

7
Word-document co-occurrence
  • Given N documents and a vocabulary of size M
  • Generate the word-document co-occurrence matrix W

[Figure: W is the M x N word-document matrix with rows w1 ... wM (words)
and columns d1 ... dN (documents); entry (i, j) is the number of times
word wi occurs in document dj]
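
A minimal Python sketch of this step, not taken from the slides: the toy
corpus, stopword list, and whitespace tokenizer below are illustrative
assumptions, used only to show how the M x N count matrix W is built.

    import numpy as np

    # Toy corpus and stopword list (assumptions for illustration only).
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "stock markets fell sharply today",
    ]
    stopwords = {"the", "on"}

    # Vocabulary = all non-stopword tokens, in a fixed order.
    tokens = [[t for t in d.split() if t not in stopwords] for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    index = {w: i for i, w in enumerate(vocab)}

    # W[i, j] = count of word i in document j.
    M, N = len(vocab), len(docs)
    W = np.zeros((M, N))
    for j, doc in enumerate(tokens):
        for t in doc:
            W[index[t], j] += 1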
8
Discriminating words
  • Normalized entropy of each word across documents
    • close to 0: word concentrated in a few documents, very important
    • close to 1: word spread evenly over documents, less important
  • Scaling and normalization of the co-occurrence counts
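
The slide does not show the formula; a common choice (used in
Bellegarda-style LSA, and assumed here) is the normalized entropy
e_i = -(1/log N) * sum_j p_ij log p_ij with p_ij = c_ij / sum_j c_ij,
and weighted entries (1 - e_i) log(1 + c_ij). A Python sketch under that
assumption, continuing the toy matrix W built above:

    import numpy as np

    def entropy_weight(W, eps=1e-12):
        """Entropy-based weighting of an M x N count matrix W (assumed scheme)."""
        M, N = W.shape
        totals = W.sum(axis=1, keepdims=True) + eps           # total count of each word
        p = W / totals                                        # p_ij = c_ij / t_i
        ent = -(p * np.log(p + eps)).sum(axis=1) / np.log(N)  # normalized entropy in [0, 1]
        weight = 1.0 - ent                                    # near 1 => discriminative word
        return weight, weight[:, None] * np.log(1.0 + W)      # damped, weighted counts

    weights, W_weighted = entropy_weight(W)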

9
Singular Value Decomposition
[SVD figure: W = U S V^T, where U has rows u1 ... uM (one per word),
V^T has columns v1^T ... vN^T (one per document), and S is the diagonal
matrix of singular values]
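
A sketch of this decomposition with numpy, applied to the (optionally
weighted) matrix W from the sketches above; numpy returns V^T directly:

    import numpy as np

    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # U: M x K, s: K, Vt: K x N
    S = np.diag(s)                                    # diagonal matrix of singular values
    assert np.allclose(W, U @ S @ Vt)                 # W = U S V^T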
10
SVD approximation
  • Dimensionality reduction
  • Best rank-R approximation
  • Optimal energy preservation
  • Captures major structural associations between
    words and documents
  • Removes noisy observations
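
A sketch of the rank-R truncation, continuing the SVD above; R = 2 is an
arbitrary choice for the toy example, and "energy" is taken here to mean
the fraction of squared singular values retained:

    R = 2                                             # illustrative choice of rank
    U_R, S_R, Vt_R = U[:, :R], np.diag(s[:R]), Vt[:R, :]
    W_R = U_R @ S_R @ Vt_R                            # best rank-R approximation of W
    energy = (s[:R] ** 2).sum() / (s ** 2).sum()      # fraction of energy preserved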

11
Words and documents
  • Columns of U: orthonormal basis for representing documents
  • Columns of V: orthonormal basis for representing words
  • Word i is represented by the vector ui S
  • Document j is represented by the vector vj S
  • words close in LS space appear in similar
    documents
  • documents close in LS space convey similar meaning
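
Continuing the truncated SVD above, the word and document vectors of this
slide (ui S and vj S) can be read off as matrix rows; a minimal sketch:

    word_vecs = U_R @ S_R     # M x R; row i is the latent vector ui S of word i
    doc_vecs = Vt_R.T @ S_R   # N x R; row j is the latent vector vj S of document j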

12
LSA as knowledge representation
  • Projecting a new document into LS space
  • Compute the vector d of frequency counts di of the
    vocabulary words in the document
  • Treat d as a new column of W:  d = U S v^T
  • Multiplying by U^T (since U^T U = I):  U^T d = S v^T
  • Thus, v^T = S^-1 U^T d gives the pseudo-document vector in LS space
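
A sketch of this folding-in step, reusing U_R, S_R, M, and index from the
earlier sketches; the new document text is an illustrative assumption:

    import numpy as np

    def project_document(d, U_R, S_R):
        """Project a raw count vector d (length M) into LS space: v^T = S^-1 U^T d."""
        return np.linalg.inv(S_R) @ (U_R.T @ d)

    # Count the known vocabulary words of a new document.
    d_new = np.zeros(M)
    for t in "the cat sat quietly".split():
        if t in index:
            d_new[index[t]] += 1

    v_new = project_document(d_new, U_R, S_R)   # R-dimensional pseudo-document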

13
Semantic Similarity Measure
  • To find the similarity between two documents, project
    both into LS space
  • Then compute the cosine measure between their
    projections
  • With this measure, various problems can be addressed,
    e.g. natural language understanding and cognitive modeling
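
A sketch of the similarity computation, continuing the earlier names
(W, U_R, d_new); documents are compared here in the scaled v S form,
i.e. U_R^T d for a count vector d:

    import numpy as np

    def cosine(a, b, eps=1e-12):
        """Cosine of the angle between two vectors."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

    p1 = U_R.T @ W[:, 0]        # training document 1 projected into LS space
    p2 = U_R.T @ d_new          # the new document folded in above
    similarity = cosine(p1, p2)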

14
Application Areas
  • Natural language understanding
    • automatic evaluation of student answers
  • Cognitive science
    • knowledge representation and acquisition
    • synonym test (TOEFL)
  • Speech recognition and understanding
    • semantic classification
    • semantically large-span language modeling

15
Caveats
  • LSA is a bag-of-words technique
    • blind to word order and syntax in the text
  • Future directions
    • add syntactic information to LSA?
    • integrate local syntax, LSA semantics, and global pragmatics