Latent Semantic Analysis: A Model of Inductive Knowledge Acquisition

Transcript and Presenter's Notes

1
Latent Semantic Analysis: A Model of Inductive
Knowledge Acquisition
  • Paul Fillmore, Stefanie Wong

2
Overview
  • The question of interest
  • The Problem
  • The Proposed Solution: LSA
  • Latent Semantic Analysis
  • What is it?
  • What can it do?
  • How does it do it?
  • Evaluation of the model
  • Additional Considerations
  • Demonstrations of LSA

3
The Problem of Induction
  • Plato's problem: the poverty of the stimulus
  • How do people acquire as much knowledge as they
    do based on the little information they get?
  • Example: Language Acquisition
  • Chomsky (1991): Observing adult language is
    insufficient for children's development of
    grammar or a typical lexicon
  • Pinker (1994): Language learning must be innate,
    a "language instinct"

4
Problem of induction in cognitive terms...
  • Problem of categorization
  • What is the mechanism by which concepts (cheetahs,
    tigers) come to be treated as the same for some
    purpose (predators that will eat me)?
  • Problem of similarity
  • How does experience combine disparate things into
    a featural identity ("wing" is different for a
    bird, an insect, a bat)?

5
Latent Semantic Analysis: What is it?
  • Latent Semantic Analysis (LSA) is a
    mathematical/statistical technique for extracting
    and representing the similarity of meaning of
    words and passages by analysis of large bodies of
    text.
  • More simply, it is a computer model of human
    associative learning through experience
  • Does not embody human knowledge beyond its
    general learning mechanism

6
What can LSA do?
  • Performance on standard vocabulary and subject
    matter tests comparable to humans
  • Demonstrates similar mechanism for word sorting
    and category judgments
  • Processes word-word and passage-word lexical
    priming data
  • It can accurately estimate:
  • Passage coherence
  • Learnability of passages by individual students
  • The quality and quantity of knowledge contained
    in essays
  • Can perform humanlike generalizations based on
    learning that isn't dependent upon primitive
    perceptual relations/representations

7
How does LSA work?
  • Definitions
  • Semantic space
  • Singular value decomposition (SVD)
  • Dimensionality
  • Procedure
  • 1) Matrix Input
  • 2) Cell Transformation
  • 3) Singular Value Decomposition
  • 4) Dimension Reduction

8
Semantic Space
  • A semantic space is a mathematical representation
    of a large body of text (e.g., encyclopedias,
    psychology texts)
  • Each term or combination of terms has its own
    high-dimensional vector representation within the
    semantic space
  • Similarity between vectors for words and contexts
    is measured by the cosine of the angle between
    them (see the sketch below)
  • Note: Terms can only be compared within a
    semantic space, not directly between semantic
    spaces
  • If vectors were projected onto a sphere
    surrounding the semantic space, points close
    together would have closer semantic relations
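
A minimal sketch of this cosine comparison; the example terms, their 3-dimensional size, and the values below are illustrative assumptions, not numbers from an actual semantic space (real LSA spaces use a few hundred dimensions learned from a corpus):

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors in the semantic space;
    values near 1 mean the terms/passages are semantically close."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional vectors standing in for LSA's 100-500
# dimensional term representations (illustrative values only).
doctor = np.array([0.82, 0.11, 0.31])
nurse = np.array([0.74, 0.19, 0.40])
truck = np.array([0.08, 0.91, 0.02])

print(cosine(doctor, nurse))  # high cosine: closely related terms
print(cosine(doctor, truck))  # low cosine: largely unrelated terms
```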

9
Example of similarities within Semantic Space
  • Submitting a term or short text and receiving a
    list of the terms nearest to it in semantic space
  • Matrix comparison of multiple terms

10
Singular Value Decomposition
  • A mathematical matrix decomposition technique (a
    general case of factor analysis) that condenses a
    large matrix of word-by-context data into a
    smaller matrix
  • The smaller matrix typically has a 100-500
    dimensional representation
  • The right number of dimensions is critical for
    optimal simulation

11
Dimensionality
  • Knowing the appropriate dimensionality improves
    estimates
  • Example
  • Three separate houses, A, B, and C, are arranged
    as follows: A is 5 units from both B and C, and B
    and C are separated by 8 units
  • Oh, also, all are on the same straight, flat road
  • Knowing the arrangement is one-dimensional tells
    us the measured distances must contain error, so
    the best estimates adjust them (A must lie between
    B and C, a bit closer to each than the measured 5
    units)
12
Procedure: Matrix Input
  • Rows: individual word types
  • Columns: meaning-bearing passages (i.e.,
    sentences or paragraphs)
  • Cells: frequency with which a word occurs in a
    passage

13
Procedure: Cell Transformation
  • Transformation 1: Approximates standard empirical
    growth functions of simple learning
  • Applied to a word's appearance frequency in each
    cell
  • Transformation 2: Makes the primary association
    better represent the informative relation between
    the entities rather than mere co-occurrence
  • Based on the entropy of a word across passages

[Slide shows the formulas for Transformation 1 and Transformation 2]
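
A hedged sketch of these two transformations; the exact weighting scheme is not spelled out in the transcript, so the log(frequency + 1) and entropy-division steps below are an assumed, commonly used reading of the bullets above:

```python
import numpy as np

def transform_cells(counts):
    """Apply the two cell transformations to a word-by-passage count
    matrix (rows = words, columns = passages). The specific log /
    entropy weighting here is an assumption, not taken verbatim from
    the slide."""
    counts = np.asarray(counts, dtype=float)

    # Transformation 1: compress raw appearance frequency with
    # log(freq + 1), a learning-curve-like growth function.
    log_freq = np.log(counts + 1.0)

    # Transformation 2: compute each word's entropy over passages and
    # divide by it, down-weighting words spread evenly (and thus
    # uninformatively) across many contexts.
    row_totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_totals,
                  out=np.zeros_like(counts), where=row_totals > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=1, keepdims=True)
    entropy[entropy == 0] = 1.0  # words that occur in only one passage
    return log_freq / entropy
```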
14
Procedure: SVD & Dimension Reduction
  • SVD: X[i×j] = W[i×k] S[k×k] C[j×k]'
  • in which W and C have orthonormal columns, S is a
    diagonal matrix of singular values, and k < max(i, j)
  • Dimension reduction: all but the d largest
    singular values are set to zero, where d = the
    number of dimensions to be used
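
A minimal numpy sketch of this decomposition and dimension-reduction step (the function and variable names are mine, not from the slides):

```python
import numpy as np

def reduce_dimensions(X, d):
    """Compute X = W S C' and zero all but the d largest singular
    values, giving the best least-squares rank-d approximation of X."""
    W, s, Ct = np.linalg.svd(X, full_matrices=False)
    s_trunc = np.zeros_like(s)
    s_trunc[:d] = s[:d]                # all but the d largest set to zero
    return W @ np.diag(s_trunc) @ Ct   # reduced-dimension reconstruction
```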

15
Word (w) x Context (c) Matrix (X)
  • m columns of W and m rows of C are linearly
    independent

[Figure: X = W S C', with W and C shown as orthonormal matrices and S as a diagonal matrix]
16
LSA Example
  • c1: Human machine interface for ABC computer
    applications
  • c2: A survey of user opinion of computer system
    response time
  • c3: The EPS user interface management system
  • c4: System and human system engineering testing
    of EPS
  • c5: Relation of user perceived response time to
    error measurement
  • m1: The generation of random, binary, ordered
    trees
  • m2: The intersection graph of paths in trees
  • m3: Graph minors IV: Widths of trees and
    well-quasi ordering
  • m4: Graph minors: A survey
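
A sketch of this classic example end to end, using raw counts and a rank-2 SVD. The choice to index only the terms that occur in more than one title follows the usual treatment of this example and is an assumption here; exact output may differ slightly from the published figures.

```python
import numpy as np

# Titles c1-c5 (human-computer interaction) and m1-m4 (graph theory) above.
docs = [
    "Human machine interface for ABC computer applications",
    "A survey of user opinion of computer system response time",
    "The EPS user interface management system",
    "System and human system engineering testing of EPS",
    "Relation of user perceived response time to error measurement",
    "The generation of random binary ordered trees",
    "The intersection graph of paths in trees",
    "Graph minors IV Widths of trees and well-quasi ordering",
    "Graph minors A survey",
]
# Index only terms appearing in more than one title (assumed convention).
terms = ["human", "interface", "computer", "user", "system",
         "response", "time", "eps", "survey", "trees", "graph", "minors"]

# Word-by-context count matrix (rows = terms, columns = titles).
X = np.array([[doc.lower().split().count(t) for doc in docs] for t in terms],
             dtype=float)

# Rank-2 reconstruction via SVD.
W, s, Ct = np.linalg.svd(X, full_matrices=False)
X2 = W[:, :2] @ np.diag(s[:2]) @ Ct[:2, :]

# Correlation between the "human" and "user" rows in the reduced space;
# the slide after this one reports r(human, user) = 0.94.
human, user = X2[terms.index("human")], X2[terms.index("user")]
print(np.corrcoef(human, user)[0, 1])
```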

17
(No Transcript)
18
r(human, user) = 0.94
19
Evaluating the Model
  • Four Questions to keep in mind
  • Can a simple linear model acquire knowledge of
    humanlike word meaning similarities given
    sufficient input?
  • If successful, is it dependent upon
    dimensionality of representation?
  • Is the rate of acquisition comparable to a human?
  • How much of this knowledge comes from indirect
    inference, i.e., combining information across
    samples?

20
Is It Acquiring Knowledge?
  • Model's knowledge tested with a standard
    multiple-choice synonym test
  • After training on approx. 2,000 pages of English
    text, LSA scored as well as average test-takers
    on the synonym portion of TOEFL
  • Acquired knowledge attributed to indirect
    inference as opposed to direct
    co-occurrence relations

21
Two explanations
  • 1) A substantial portion of the information
    needed to answer common vocabulary questions
    could be inferred from the contextual statistics
    of usage alone
  • 2) The model employs a means of induction
    (dimension matching) that amplifies its learning
    ability, resulting in correct inference of
    similarity relations that are only implicit in the
    temporal correlations of experience

22
Is dimensionality a factor?
  • Varied number of dimensions retained
  • Note: What happens when there is no
    dimensionality reduction at all?
  • Choosing optimal dimensionality approximately
    triples the number of words learned

23
Comparable rate?
  • Learning is comparable to the rate at which
    school-aged children improve their performance on
    similar tests as a result of reading
  • Rate of acquisition for late elementary and high
    school years estimated at 3,000 - 5,400 words per
    year (10-15 per day)

24
Calculating Comparable Rate: Direct & Indirect
Effects
  • LSA simulations consider:
  • Average number of contexts in which test word
    appeared (the parameter)
  • And the total number of other contexts, those
    that contained no words from the synonym test
    items
  • Varied by randomly replacing test words with
    nonsense words and choosing random subsamples of
    total text
  • Joint effects of direct and indirect textual
    experience

25
LSA simulation of total vocabulary gain
  • A model was fit to the data: z = a(log bT)(log cS)
  • T = total number of text samples analyzed
  • S = number of text samples containing the stem word
  • r = .89
  • For every word, estimates were made for:
  • Probability that a word of its frequency appears
    in the next sample
  • Number of times an individual would have
    encountered the word previously
  • Expected increase in z with the addition of a
    passage containing the word
  • Expected increase in z with the addition of a
    passage that doesn't contain it
  • Converted z to probability correct × the
    corresponding frequencies
  • Cumulated gains in number correct over all
    individual words in the language to get the total
    vocabulary gain from reading a single text sample
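
A small sketch of evaluating the fitted model above. The constants a, b, and c are the fitted parameters (their values are not given on the slide), and mapping z to probability correct through the standard normal CDF is my assumption about how the z scale is used:

```python
from math import log
from statistics import NormalDist

def knowledge_z(T, S, a, b, c):
    """Fitted growth model from the slide: z = a(log bT)(log cS),
    where T is the total number of text samples analyzed and S the
    number of samples containing the stem word."""
    return a * log(b * T) * log(c * S)

def prob_correct(z):
    """Assumed conversion of the z score to probability correct on
    the synonym test, treating z as a standard-normal deviate."""
    return NormalDist().cdf(z)
```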

26
Conclusions from Vocabulary Simulations
  • LSA learns the meaning similarities of words from
    text, in an amount equivalent to the test scores
    of moderately competent English readers
  • Three-fourths of LSA's knowledge is a product of
    indirect induction (exposure to text not
    containing the word)
  • Expression of hypothesis that word meanings grow
    continuously and that correct performance is a
    stochastic event governed by individual
    differences in experience
  • i.e. word meanings are constantly in flux

27
Other Considerations
  • Neurocognitive & Psychological Plausibility
  • Neural net models
  • Similarity to biological models
  • Parallels with memory
  • Meaning: Independent of word order?
  • Contextual Disambiguation: In LSA, words have
    only one vector representation, thus only one
    meaning

28
Mathematical Machine
  • Analogy: a three-layered neural net

Layer 1: word type
Layer 2: conceptual representations
Layer 3: text window
29
Neural Net Analogy
  • Network is symmetrical; it can run in either
    direction
  • Different computations made to assess similarity
    between two episodes, event types, or an episode
    and an event type

30
Similarity to Biological Models
  • Interneuronal communication
  • Vector multiplication between axons, dendrites
    and cell bodies
  • Excitation is proportional to the dot product of
    a neuron's output and the sensitivities of
    surrounding neurons
  • Single-cell recordings
  • Population effects described as vector averages
    of individual direction representations

31
Word-versus-context difference: Analogy to
Episodic & Semantic Memories
  • Word representations are semantic, meanings
    abstracted and averaged from many experiences
  • Context representations are episodic, unique
    combinations that occurred only once ever
  • Both words and episodes represented by same
    defining dimensions, and relation to one another
    is still retained

32
Word-versus-context difference: Analogy to
Explicit & Implicit Memories
  • Retrieving a context vector brings past happening
    to mind - explicit memory
  • Retrieving a word vector instantiates abstraction
    of many happenings brought together - implicit
    memory

33
Meaning independent of word order?
  • Text segments treated as bags of words
  • LSA makes no use of word order, syntax or grammar
  • Despite assertions that scrambled sentences
    would be worthless context for vocabulary
    instruction (Durkin, 1983), LSA acquires 100% of
    its knowledge via scrambled sentences and still
    performs relatively well at deciphering meaning
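
A tiny illustration of the bag-of-words point: scrambling a sentence leaves the word counts LSA actually sees unchanged (the sentence is an invented example).

```python
from collections import Counter
import random

sentence = "the cheetah chased the gazelle across the plain".split()
scrambled = random.sample(sentence, k=len(sentence))

# The word-by-passage counts that form LSA's input are identical
# either way, since word order plays no role in the matrix.
assert Counter(sentence) == Counter(scrambled)
```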

34
Expertise
  • The LSA account of knowledge brings a new
    perspective on expertise
  • A simulated expert learns four times more about an
    item per exposure than the simulated novice
  • LSA suggests that great masses of knowledge
    contribute to superior performance through:
  • Direct application of stored knowledge to a
    problem
  • Greater ability to add new knowledge to long-term
    memory
  • Greater ability to infer indirect relations among
    bits of knowledge and to generalize from instances
    and experience

35
Contextual Disambiguation
  • Frequency-weighted average of predicted usages
  • Acceptable for words that generate only one or a
    few closely related meanings (majority of words)
  • Balanced homographs such as "bear" result in an
    LSA vector that doesn't resemble any of their
    major meanings
  • While LSA's single-vector representation can't
    account for multiple word-meaning phenomena at
    this stage, it is not a fatal flaw (local context
    will aid in disambiguation)

36
Text Comprehension An LSA Interpretation of
Construction-Integration Theory
  • Research in which individual word senses aren't
    represented, but the overall meaning of
    phrases/sentences/paragraphs is constructed from
    a linear combination of their words
  • The vector average reflects the overall topic or
    meaning of a passage
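
A minimal sketch of this passage-vector idea: the overall meaning of a passage is taken as the average (a simple linear combination) of its words' LSA vectors. The vectors below are illustrative placeholders, not values from a trained space.

```python
import numpy as np

def passage_vector(word_vectors):
    """Overall topic/meaning of a passage as the average of the LSA
    vectors of its words (an order-free linear combination)."""
    return np.mean(np.stack(list(word_vectors)), axis=0)

# Hypothetical low-dimensional vectors; real LSA vectors have
# hundreds of dimensions learned from a corpus.
words = {
    "doctors":  np.array([0.81, 0.12, 0.30]),
    "treat":    np.array([0.65, 0.20, 0.22]),
    "patients": np.array([0.72, 0.10, 0.41]),
}
topic = passage_vector(words.values())
```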

37
Criticisms/ Further Issues
  • Remember: SVD is just one possible, simple case
    for such a model
  • Assumption: All necessary semantic information is
    gleaned from a word's contexts (e.g., love)
  • Linguistic structures (e.g., syntax), which are of
    obvious importance for the derivation of meaning,
    should be incorporated

38
Educational Applications of LSA
  • Performance on college exams
  • Scoring the content of an essay
  • Selecting most appropriate text for learners with
    different levels of background knowledge
  • Assisting students to summarize material

39
Performance on College Exams
40
Essay Grading
41
Demonstrations: Write to Learn
  • Promotes writing skills and reading comprehension

42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
Demonstrations: Intelligent Essay Assessor (IEA)
  • Assesses and critiques electronically submitted
    essays
  • Provides assessment and feedback

47
(No Transcript)
48
Demonstration: Summary Street
  • Web-based reading comprehension and writing
    instruction tool
  • Compares student summaries to each section of
    text and provides feedback

49
(No Transcript)
50
(No Transcript)
51
(No Transcript)
52
Demonstration: Super Manual
  • Program that allows one to identify, develop, and
    test better ways to organize and present
    information customized to individual maintainers'
    level of expertise

53
Educational Text Selection
  • Predicts how much readers will learn from texts
    based on estimated conceptual knowledge of topic
    and information present in the text they read

54
Demonstration: State the Essence!
  • LSA provides evaluations of student summaries of
    text
  • Guides students toward the content experts have
    identified as most significant
  • A way to measure reading comprehension
  • Summary writing requires constructing mental
    representations that join elements of the text
    with each other and with elements of prior
    knowledge

55
Summary
  • People appear to know significantly more than
    they could have learned from temporally local
    experiences
  • The proposed induction method depends on
    reconstruction of a system of multiple similarity
    relations in a high-dimensional space
  • Implemented dimensionality-optimizing induction
    through SVD matrix decomposition
  • The model scored as well as the mean scores of
    foreign students on TOEFL exams
  • The model learned at a rate similar to
    school-children, and through induction from data
    about other words
  • Because LSA didn't have access to word-similarity
    information based on spoken language, morphology,
    syntax, logic, or perceptual word knowledge, it
    was concluded that the induction method is
    sufficient to account for Plato's paradox, at
    least in the domain of knowledge measured by
    synonym tests

56
(No Transcript)