Title: Latent Semantic Analysis: A Model of Inductive Knowledge Acquisition
1. Latent Semantic Analysis: A Model of Inductive Knowledge Acquisition
- Paul Fillmore, Stefanie Wong
2. Overview
- The question of interest
- The Problem
- The Proposed Solution: LSA
- Latent Semantic Analysis
- What is it?
- What can it do?
- How does it do it?
- Evaluation of the model
- Additional Considerations
- Demonstrations of LSA
3. The Problem of Induction
- Plato's problem: the poverty of the stimulus
- How do people acquire as much knowledge as they do based on the little information they get?
- Example: language acquisition
- Chomsky (1991): observing adult language is insufficient for children's development of grammar or a typical lexicon
- Pinker (1994): language learning must be innate, a "language instinct"
4. The Problem of Induction in Cognitive Terms
- The problem of categorization
- What is the mechanism by which concepts (cheetahs, tigers) come to be treated as the same for some purpose (predators that will eat me)?
- The problem of similarity
- How does experience combine disparate things into a feature identity (a wing is different for a bird, an insect, and a bat)?
5. Latent Semantic Analysis: What is it?
- Latent Semantic Analysis (LSA) is a mathematical/statistical technique for extracting and representing the similarity of meaning of words and passages by analysis of large bodies of text
- More simply, it is a computer model of human associative learning through experience
- It does not embody human knowledge beyond its general learning mechanism
6. What can LSA do?
- Performance on standard vocabulary and subject-matter tests comparable to humans
- Demonstrates a similar mechanism for word sorting and category judgments
- Simulates word-word and passage-word lexical priming data
- It can accurately estimate:
- Passage coherence
- Learnability of passages by individual students
- The quality and quantity of knowledge contained in essays
- Can perform humanlike generalizations based on learning that isn't dependent upon primitive perceptual relations/representations
7. How does LSA work?
- Definitions
- Semantic space
- Singular value decomposition (SVD)
- Dimensionality
- Procedure
- 1) Matrix Input
- 2) Cell Transformation
- 3) Singular Value Decomposition
- 4) Dimension Reduction
8. Semantic Space
- A semantic space is a mathematical representation of a large body of text (e.g. encyclopedias, psychology texts)
- Each term or combination of terms has its own high-dimensional vector representation within the semantic space
- Similarity between the vectors for words and contexts is measured by the cosine of the angle between them (sketched in code below)
- Note: terms can only be compared within a semantic space, not directly between semantic spaces
- If the vectors were projected onto a sphere surrounding the semantic space, points close together would have closer semantic relations
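The cosine measure is easy to make concrete. Below is a minimal sketch in Python/NumPy; the vectors are invented three-dimensional examples (real LSA spaces use a few hundred dimensions), and the function name is ours, not part of any LSA toolkit.

```python
import numpy as np

def cosine(vec_a, vec_b):
    """Cosine of the angle between two term vectors: 1.0 = same direction, near 0.0 = unrelated."""
    return float(np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Made-up vectors for two terms that ought to be semantically close
doctor = np.array([0.2, 0.7, 0.1])
nurse = np.array([0.25, 0.6, 0.05])
print(cosine(doctor, nurse))  # close to 1.0 for nearby points in the space
```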
9. Example of Similarities within a Semantic Space
- Submitting a term or short text and receiving a list of the terms nearest to it in the semantic space
- Matrix comparison of multiple terms
10. Singular Value Decomposition
- A mathematical matrix decomposition technique (the general case of factor analysis) that condenses a large matrix of word-by-context data into a smaller matrix
- The smaller matrix typically has a 100-500 dimensional representation
- The right number of dimensions is critical for optimal simulation
11. Dimensionality
- Knowing the appropriate dimensionality improves estimates
- Example: three separate houses A, B, C are arranged as follows: A is 5 units from both B and C, and B and C are separated by 8 units
- Oh, also, they are all on the same straight, flat road
- With that one-dimensional constraint the three estimates cannot all be exact, so they can be corrected (toward roughly 4, 4, and 8), improving on the raw measurements
- [Diagram: the three houses plotted freely in two dimensions versus constrained to the straight road]
12. Procedure: Matrix Input
- Rows: individual word types
- Columns: meaning-bearing passages (i.e. sentences or paragraphs)
- Cells: the frequency with which a word occurs in a passage
13. Procedure: Cell Transformation
- Transformation 1 approximates the standard empirical growth functions of simple learning: take the log of a word's appearance frequency, log(frequency + 1)
- Transformation 2 makes the primary association better represent the informative relation between the entities rather than mere co-occurrence: divide by the entropy of the word over contexts, -Σ p_ij log p_ij (a sketch of both transformations follows)
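A minimal sketch of this log-entropy cell weighting, assuming `counts` is a small word-by-passage count matrix (rows = words); the function name and the guard against zero entropy are ours, and real implementations differ in detail.

```python
import numpy as np

def log_entropy_transform(counts):
    """Transformation 1: log(frequency + 1); Transformation 2: divide by each word's entropy over contexts."""
    counts = np.asarray(counts, dtype=float)
    log_freq = np.log(counts + 1.0)                                    # dampens raw frequency (Transformation 1)
    p = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)  # distribution of each word over passages
    plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=1, keepdims=True)                        # entropy of the word (Transformation 2)
    return log_freq / np.maximum(entropy, 1e-12)                       # evenly spread (uninformative) words are down-weighted

# Row 0: a word concentrated in two passages; row 1: a word spread over all four
print(log_entropy_transform([[2, 1, 0, 0], [1, 1, 1, 1]]))
```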
14. Procedure: SVD and Dimension Reduction
- SVD: X_{ij} = W_{ik} S_{kk} C_{jk}', in which W_{ik} and C_{jk} have orthonormal columns, S_{kk} is a diagonal matrix of singular values, and k <= max(i, j)
- Dimension reduction: all but the d largest singular values are set to zero, where d is the number of dimensions to be used
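A minimal sketch of this step with NumPy, assuming X is the (already transformed) word-by-context matrix; the helper name `reduce_dimensions` is ours.

```python
import numpy as np

def reduce_dimensions(X, d):
    """Compute the SVD of X and keep only the d largest singular values."""
    W, s, Ct = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)  # X = W diag(s) Ct
    s_trunc = np.zeros_like(s)
    s_trunc[:d] = s[:d]                   # all but the d largest singular values set to zero
    X_hat = W @ np.diag(s_trunc) @ Ct     # rank-d reconstruction of X
    word_vectors = W[:, :d] * s[:d]       # d-dimensional vectors for the words (rows)
    return X_hat, word_vectors
```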
15. Word (w) x Context (c) Matrix (X)
- [Diagram: X = W S C', where W and C' are the orthonormal matrices and S is the diagonal matrix of singular values]
- The m columns of W and the m rows of C' are linearly independent
16. LSA Example
- c1: Human machine interface for ABC computer applications
- c2: A survey of user opinion of computer system response time
- c3: The EPS user interface management system
- c4: System and human system engineering testing of EPS
- c5: Relation of user perceived response time to error measurement
- m1: The generation of random, binary, ordered trees
- m2: The intersection graph of paths in trees
- m3: Graph minors IV: Widths of trees and well-quasi ordering
- m4: Graph minors: A survey
18. r(human, user) = 0.94
19. Evaluating the Model
- Four questions to keep in mind:
- Can a simple linear model acquire knowledge of humanlike word meaning similarities given sufficient input?
- If successful, is it dependent upon the dimensionality of the representation?
- Is the rate of acquisition comparable to a human's?
- What degree of this knowledge comes from indirect inferences that combine information across samples?
20. Is It Acquiring Knowledge?
- The model's knowledge was tested with a standard multiple-choice synonym test
- After training on approx. 2,000 pages of English text, LSA scored as well as average test-takers on the synonym portion of the TOEFL
- The acquired knowledge is attributed to indirect inference as opposed to direct co-occurrence relations
21. Two Explanations
- 1) A substantial portion of the information needed to answer common vocabulary questions could be inferred from the contextual statistics of usage alone
- 2) The model employs a means of induction (dimension matching) that amplifies its learning ability, resulting in correct inference of similarity relations that are only implicit in the temporal correlations of experience
22. Is Dimensionality a Factor?
- The number of dimensions retained was varied
- Note what happens when there is no dimensionality reduction at all
- Choosing the optimal dimensionality approximately triples the number of words learned
23. A Comparable Rate?
- LSA's learning is comparable to the rate at which school-aged children improve their performance on similar tests as a result of reading
- The rate of acquisition for the late elementary and high school years is estimated at 3,000 - 5,400 words per year (10-15 per day)
24. Calculating a Comparable Rate: Direct and Indirect Effects
- LSA simulations consider:
- The average number of contexts in which a test word appeared (the parameter)
- And the total number of other contexts, those that contained no words from the synonym test items
- These were varied by randomly replacing test words with nonsense words and choosing random subsamples of the total text
- This separates the joint effects of direct and indirect textual experience
25. LSA Simulation of Total Vocabulary Gain
- A model was fit to the data: z = a(log bT)(log cS)
- T = total number of text samples analyzed
- S = number of text samples containing the stem word
- Fit: r = .89
- For every word, estimates were made of:
- The probability that a word of its frequency appears in the next sample
- The number of times an individual would have encountered the word previously
- The expected increase in z with the addition of a passage containing the word
- The expected increase in z with the addition of a passage that doesn't contain it
- z was converted to probability correct and weighted by the corresponding frequencies (a toy version is sketched below)
- Gains in number correct were cumulated over all the words in the language to get the total vocabulary gain from reading a single text sample
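A toy sketch of this growth model; the parameter values a, b, c here are invented for illustration (the original values were fit to the simulation data), and the conversion from the normal deviate z to probability correct is assumed to be the standard normal CDF.

```python
from math import log
from statistics import NormalDist

def z_knowledge(T, S, a=0.1, b=1.0, c=10.0):
    """z = a(log bT)(log cS): simulated word knowledge after T total samples, S of which contain the word."""
    return a * log(b * T) * log(c * S)

def prob_correct(z):
    """Convert the normal-deviate score z to a probability of answering a test item correctly."""
    return NormalDist().cdf(z)

# Illustrative only: more total text (T) and more direct encounters (S) both raise the probability
print(prob_correct(z_knowledge(T=5000, S=10)))
print(prob_correct(z_knowledge(T=500, S=1)))
```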
26. Conclusions from the Vocabulary Simulations
- LSA learns meaning similarities of words from text in an amount equivalent to the test scores of moderately competent English readers
- Three-fourths of LSA's knowledge is a product of indirect induction (exposure to text not containing the word)
- This expresses the hypothesis that word meanings grow continuously and that correct performance is a stochastic event governed by individual differences in experience
- i.e. word meanings are constantly in flux
27. Other Considerations
- Neurocognitive and psychological plausibility
- Neural net models
- Similarity to biological models
- Parallels with memory
- Meaning: independent of word order?
- Contextual disambiguation: in LSA, words have only one vector representation, thus only one meaning
28. Mathematical Machine
- Analogy: a three-layered neural net
- Layer 1: word types
- Layer 2: conceptual representations
- Layer 3: text windows
29. Neural Net Analogy
- The network is symmetrical: it can run in either direction
- Different computations are made to assess similarity between two episodes, two event types, or an episode and an event type
30. Similarity to Biological Models
- Interneuronal communication
- Vector multiplication between axons, dendrites, and cell bodies
- Excitation is proportional to the dot product of the output and the sensitivities of surrounding neurons
- Single-cell recordings
- Population effects described as vector averages of individual direction representations
31. Word-versus-Context Difference: Analogy to Episodic and Semantic Memories
- Word representations are semantic: meanings abstracted and averaged from many experiences
- Context representations are episodic: unique combinations that occurred only once ever
- Both words and episodes are represented by the same defining dimensions, and their relations to one another are retained
32. Word-versus-Context Difference: Analogy to Explicit and Implicit Memories
- Retrieving a context vector brings a past happening to mind - explicit memory
- Retrieving a word vector instantiates an abstraction of many happenings brought together - implicit memory
33. Meaning Independent of Word Order?
- Text segments are treated as bags of words
- LSA makes no use of word order, syntax, or grammar
- Despite assertions that scrambled sentences would be worthless context for vocabulary instruction (Durkin, 1983), LSA acquires 100% of its knowledge via scrambled sentences and still performs relatively well at deciphering meaning
34. Expertise
- The LSA account of knowledge brings a new perspective on expertise
- A simulated expert learns four times more about an item per exposure than a simulated novice
- LSA suggests that great masses of knowledge contribute to superior performance through:
- Direct application of stored knowledge to a problem
- A greater ability to add new knowledge to long-term memory
- The ability to infer indirect relations among bits of knowledge and to generalize from instances and experience
35. Contextual Disambiguation
- A word's vector is a frequency-weighted average of its predicted usages (see the sketch below)
- This is acceptable for words that carry only one or a few closely related meanings (the majority of words)
- Balanced homographs such as "bear" result in an LSA vector that doesn't resemble any of their major meanings
- While LSA's single-vector representation can't account for multiple word-meaning phenomena at this stage, it is not a fatal flaw (local context will aid in disambiguation)
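A toy sketch of the frequency-weighted-average point: the two "sense" directions below are invented for illustration, not drawn from a real semantic space.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

animal_sense = np.array([0.9, 0.1, 0.0])   # hypothetical direction for "bear" the animal
carry_sense = np.array([0.0, 0.1, 0.9])    # hypothetical direction for "bear" meaning to carry

# A balanced homograph's vector is a 50/50 frequency-weighted average of its usages
bear_vector = 0.5 * animal_sense + 0.5 * carry_sense

print(cosine(bear_vector, animal_sense))   # around 0.7: the averaged vector closely resembles neither meaning
print(cosine(bear_vector, carry_sense))
```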
36. Text Comprehension: An LSA Interpretation of Construction-Integration Theory
- Research in which individual word senses aren't represented, but the overall meaning of phrases/sentences/paragraphs is constructed from a linear combination of their words
- The vector average reflects the overall topic or meaning of a passage (see the sketch below)
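A minimal sketch of a passage vector as a linear combination (here a plain average) of its word vectors; `word_vectors` is a hypothetical mapping, not a real LSA space.

```python
import numpy as np

def passage_vector(passage, word_vectors):
    """Average the vectors of the passage's known words to get an overall meaning vector."""
    vecs = [word_vectors[w] for w in passage.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

word_vectors = {"nurse": np.array([0.7, 0.2]), "visited": np.array([0.3, 0.4]),
                "hospital": np.array([0.6, 0.3]), "patient": np.array([0.65, 0.25])}
print(passage_vector("The nurse visited the patient in hospital", word_vectors))
```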
37. Criticisms / Further Issues
- Remember: SVD is just one possible, simple case for a model
- Assumption: all necessary semantic information is gleaned from a word's contexts (e.g. "love")
- Linguistic structures (e.g. syntax) that show obvious importance for the derivation of meaning should be incorporated
38. Educational Applications of LSA
- Performance on college exams
- Scoring the content of an essay
- Selecting the most appropriate text for learners with different levels of background knowledge
- Assisting students in summarizing material
39. Performance on College Exams
40. Essay Grading
41. Demonstrations: Write to Learn
- Promotes writing skills and reading comprehension
46. Demonstrations: Intelligent Essay Assessor (IEA)
- Assesses and critiques electronically submitted essays
- Provides assessment and feedback
48. Demonstration: Summary Street
- A web-based reading comprehension and writing instruction tool
- Compares student summaries to each section of the text and provides feedback
52. Demonstration: Super Manual
- A program that allows one to identify, develop, and test better ways to organize and present information, customized to individual maintainers' levels of expertise
53. Educational Text Selection
- Predicts how much readers will learn from texts based on their estimated conceptual knowledge of the topic and the information present in the text they read
54. Demonstration: State the Essence!
- LSA provides evaluations of student summaries of text
- Guides students toward the content that experts have noted as most significant
- A way to measure reading comprehension
- Summary writing requires the construction of mental representations that join elements of text information with each other and with elements of prior knowledge
55. Summary
- People appear to know significantly more than they could have learned from temporally local experiences
- The proposed induction method depends on reconstructing a system of multiple similarity relations in a high-dimensional space
- Dimensionality-optimizing induction was implemented through SVD matrix decomposition
- The model scored as well as the mean scores of foreign students on the TOEFL exam
- The model learned at a rate similar to schoolchildren's, and through induction from data about other words
- Because LSA didn't have access to word-similarity information based on spoken language, morphology, syntax, logic, or perceptual word knowledge, it is concluded that the induction method is sufficient to account for Plato's paradox, at least in the domain of knowledge measured by synonym tests