Title: 1 Introduction
1 Introduction
- ICA has been proposed as a useful technique for finding meaningful directions in multivariate data
- The objective function affects the form of the potential structure discovered
- Here, the problem is the partitioning and analysis of sparse multivariate data
- Prior knowledge is used to derive a computationally inexpensive ICA
2 Introduction, continued
- Two complementary architectures
- Skewness (asymmetry) is the right objective to optimize
- The two tasks will be unified in a single algorithm
- Result: fast convergence; computational cost is linear in the number of training points
[Diagram: the two architectures. One separates observed documents into document prototypes; the other separates observed words into topic-features.]
3 Data Representation
- Vector space representation: a document is a vector (t1, t2, ..., tT)^T
- T = number of words in the dictionary (tens of thousands)
- Elements are binary indicators or frequencies; a sparse representation
- D = term × document matrix (T × N, with N the number of documents); a construction sketch follows below
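As a concrete illustration of this representation (a minimal sketch, not code from the paper; the toy corpus and the use of scikit-learn's CountVectorizer are my own choices):

    # Sketch: build a sparse T x N term-document matrix D (rows = terms, columns = documents).
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the clipper chip escrow encryption debate",   # placeholder documents; a real run
            "government policy on encryption keys",        # would use the newsgroup corpus
            "nasa shuttle launch to orbit",
            "the space station mission",
            "patient treatment and doctor advice",
            "church sermon on the bible"]

    vectorizer = CountVectorizer()                    # frequencies; binary=True gives indicator elements
    D = vectorizer.fit_transform(docs).T              # scipy sparse matrix of shape (T, N)
    print(D.shape, "with", D.nnz, "nonzero entries")  # only a few nonzeros per column: sparse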
4 Preprocessing
- Assumption: the observations are a noisy expansion of some denser group of latent topics
- The number of clusters or topics is set a priori
- The K-dimensional LSA space is used as the topic-concepts subspace
- PCA may lose important data components (with sparsity, infrequent but meaningful correlations), though this is less of a concern here
- Reconstruction: D ≈ D_K = U E V^T (a sketch of this step follows)
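A minimal sketch of this LSA step, using scipy's sparse svds (a Lanczos-type routine) on the matrix D from the previous sketch; K = 4 is just an example value:

    # Sketch: rank-K LSA decomposition of the sparse T x N matrix D, so that D ~ D_K = U E V^T.
    import numpy as np
    from scipy.sparse.linalg import svds

    K = 4                                     # number of topics/clusters, fixed a priori
    U, e, Vt = svds(D.asfptype(), k=K)        # U: T x K, e: K singular values, Vt: K x N
    order = np.argsort(e)[::-1]               # svds returns singular values in ascending order
    U, e, Vt = U[:, order], e[order], Vt[order, :]
    E, E_inv = np.diag(e), np.diag(1.0 / e)

    D_K = U @ E @ Vt                          # rank-K reconstruction D_K = U E V^T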
5 Prototype Documents from a Corpus
- Assumption: documents are a noisy linear mixture of (independent) document prototypes
- Number of prototypes = number of topics; the prototypes reside in the LSA space (K dimensions)
- Data projection onto the right eigenvectors with variance normalization: X(1) = E^-1 V^T D^T = U^T (a K × T matrix; checked in the sketch below)
- Task: find the mixing matrix W(1) and source documents S(1) such that X(1) = W(1)^T S(1) (S(1) is a K × T matrix)
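Because D^T = V E U^T, the normalized projection above collapses to the left singular vectors; a short numerical check (assuming D, U, Vt, E_inv from the sketches above):

    # Sketch: project the transposed data onto the right eigenvectors and normalize the variance.
    # X(1) = E^-1 V^T D^T reduces to U^T, a K x T matrix.
    import numpy as np

    X1 = (D @ Vt.T @ E_inv).T                 # equals E^-1 V^T D^T without densifying D
    print(np.max(np.abs(X1 - U.T)))           # ~0: the projection is just U^T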
6 Prototype Documents from a Corpus, continued
- The basis vectors of the topic space are assumed to be different; to separate the prototypes, find independent components
- Words in documents are distributed in a positively skewed way, so the search is restricted to skewed (asymmetric) distributions
- After LSA the unmixing matrix must be orthogonal (W(1)^-1 = W(1)^T)
- Separating weights on the original data: W(1) E^-1 V^T
7 Prototype Documents from a Corpus, continued
- Objective: a skewness measure (Fisher skewness)
- Prior knowledge: the component means are small and the projection variance is restricted to unity, so the Fisher skewness E[(s - mean)^3] / sigma^3 simplifies to G(s) = E[s^3] (the third-order moment)
- Prevent degenerate solutions: restrict w^T w = 1 at the stationary points
- Solve with gradient methods or iteratively (a gradient sketch follows)
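To make the simplified objective concrete, here is a plain projected-gradient sketch for a single direction w (an illustration under the stated assumptions, not the paper's exact update; the step size and iteration count are arbitrary):

    # Sketch: maximize G(w) = E[(w^T x)^3] over unit-norm w by gradient ascent with renormalization.
    import numpy as np

    def skewness_direction(X, n_iter=100, step=1.0, seed=0):
        """X is K x M variance-normalized data (e.g. X1 above); returns one projection direction."""
        K, M = X.shape
        w = np.random.default_rng(seed).standard_normal(K)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            s = w @ X                               # projections s = w^T x
            w = w + step * 3.0 * (X @ s**2) / M     # gradient of the third moment E[s^3]
            w /= np.linalg.norm(w)                  # enforce w^T w = 1 (prevents degenerate solutions)
        return w

    # w1 = skewness_direction(X1)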
8 Prototype Documents from a Corpus, continued
- The sources are positive, so the skewness is positive (the output sign is relevant!)
- K orthonormal projection directions: a matrix iteration (sketched below)
- Similar to an approximate Newton-Raphson optimization (FastICA-type derivation with a small additional term)
- Computational complexity: O(2K^2 T + KT + 4K^3)
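A sketch of the matrix iteration in the spirit of a FastICA-style fixed point for the third-moment contrast, with symmetric orthonormalization of the K directions and a sign convention that keeps the estimated sources positively skewed (the 'small additional term' of the exact derivation is omitted; this is an assumed reimplementation, not the published rule):

    # Sketch: estimate all K orthonormal skewness-maximizing directions with a matrix iteration.
    import numpy as np

    def skew_ica(X, n_iter=20, seed=0):
        """X is K x M variance-normalized data; returns an orthonormal K x K unmixing matrix W."""
        K, M = X.shape
        W = np.random.default_rng(seed).standard_normal((K, K))
        for _ in range(n_iter):
            S = W @ X                                      # current source estimates, K x M
            W = (S**2 @ X.T) / M                           # fixed-point-style update, row k = E[s_k^2 x^T]
            vals, vecs = np.linalg.eigh(W @ W.T)           # symmetric orthonormalization:
            W = vecs @ np.diag(vals**-0.5) @ vecs.T @ W    # W <- (W W^T)^(-1/2) W
            W *= np.sign(np.mean((W @ X)**3, axis=1, keepdims=True))  # keep sources positively skewed
        return W

    # W1 = skew_ica(X1); S1 = W1 @ X1                      # S1: the K x T document prototypes

The dominant costs per step are the K × M products (matching the stated 2K^2 T term) and the K × K eigendecomposition (the K^3 term).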
9 Topic Features from Word Features
- Assumption: terms are a noisy linear expansion of (independent) concepts (topics)
- Data compression: X(2) = E^-1 U^T D = V^T (a K × N matrix; checked in the sketch below)
- Task: find the unmixing matrix W(2) and topic features S(2) such that X(2) = W(2)^T S(2) (S(2) is a K × N matrix)
- This time, use a clustering criterion
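By the same argument as for X(1), the compressed term-side data reduces to the right singular vectors; a short check (again assuming D, U, E_inv, Vt from the earlier sketches):

    # Sketch: compress D along the K principal directions; X(2) = E^-1 U^T D equals V^T (K x N).
    import numpy as np

    X2 = (D.T @ U @ E_inv).T                  # equals E^-1 U^T D without densifying D
    print(np.max(np.abs(X2 - Vt)))            # ~0: the compression is just V^T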
10 Topic Features from Word Features, continued
Separating weights on the original data D: W(2) E^-1 U^T
- Objective function (z_kn indicates the class of x_n)
- Stochastic minimization: an EM-type algorithm
11 Topic Features from Word Features, continued
- Comparison: the approach is a set of binary classifiers; the algorithm maximizes a skewed, monotonically increasing function of the topic s_k, so a skewed prior is appropriate
- Variance is normalized after LSA; independent topics mean the source components are aligned to orthonormal axes
- Similar to the previous architecture
12 Combining the Tasks
- Joint optimization problem
- Information from the linear outputs and from the weights is complementary:
  - Topic clustering: weight peaks give representative words; projections give clustering information
  - Document prototype search: weight peaks give clustering information; projections give index terms
- Review the separating weights on D: W(2)^T E^-1 U^T
13 Combining the Tasks, continued
- Whitening allows inspection but isn't practical: instead, normalize the variance along the K principal directions! D' = U E^-1 U^T D
- Find the new unmixing matrix W(2') to maximize G(W(2')^T U^T D') = ... = G(W(2')^T X(2)), hence W(2') = W(2)
- Solve the relation W(2)^T U^T S(1) = W(1)^T U^T S(1)
- Rewrite the objective: concatenate the data [U^T, V^T] (sketched below)
- Result: W(1) = W(2) = W
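Since the same orthonormal unmixing matrix serves both tasks, the two projected data sets can simply be concatenated and a single W estimated; a sketch using the illustrative skew_ica routine from above:

    # Sketch: joint ('spatio-temporal'-style) problem on the concatenated K x (T + N) data.
    import numpy as np

    T, N = D.shape
    X = np.hstack([U.T, Vt])                  # X = [U^T, V^T], shape K x (T + N)
    W = skew_ica(X)                           # one unmixing matrix: W(1) = W(2) = W
    S = W @ X
    S1, S2 = S[:, :T], S[:, T:]               # document prototypes (K x T), topic features (K x N)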
14 Combining the Tasks, continued
- Resultant algorithm: O(2K^2(T+N) + K(T+N) + 4K^3)
  Inputs: D, K
  1. Decompose D with the Lanczos algorithm. Retain the K first singular values. Obtain U, E, V.
  2. Let X = [U^T, V^T]
  3. Iterate until convergence
  Outputs: S ∈ R^(K × (T+N)), W ∈ R^(K × K)
- S holds the T document prototypes and the N topic-features; W holds structure information of the identified topics in the corpus (a code sketch follows)
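Putting the listed steps together as one function (an end-to-end sketch under the same assumptions: svds stands in for the Lanczos decomposition, and the illustrative skew_ica above stands in for the convergence loop of step 3):

    # Sketch of the overall algorithm.
    # Inputs:  D (sparse T x N term-document matrix), K (number of topics).
    # Outputs: S in R^(K x (T+N)) (T document prototypes and N topic-features), W in R^(K x K).
    import numpy as np
    from scipy.sparse.linalg import svds

    def ica_topics(D, K):
        U, e, Vt = svds(D.asfptype(), k=K)    # 1. Lanczos-type decomposition, K largest singular values
        order = np.argsort(e)[::-1]
        U, Vt = U[:, order], Vt[order, :]
        X = np.hstack([U.T, Vt])              # 2. X = [U^T, V^T]
        W = skew_ica(X)                       # 3. iterate until convergence (illustrative stand-in)
        return W @ X, W                       # S and W

    # S, W = ica_topics(D, K=4)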
15 Simulations
Simulation 1: Newsgroup data ('sci.crypt', 'sci.med', 'sci.space', 'soc.religion.christian')
[Table: for each of the four topics, the 10 most representative words selected by the algorithm alongside the 10 most frequent words; the algorithm's selection is conformal with the human labeling. The table contains stemmed entries such as kei, encrypt, chip, clipper, escrow, secur, govern, medic, patient, doctor, diseas, physician, space, nasa, orbit, launch, shuttl, mission, god, christian, church, bibl, christ.]
Simulation 2: the 10 most representative words per topic, using 5 topics (I-V) and 2 document classes ('sci.space', 'soc.religion.christian')
[Table: the 10 most representative words for each of the five topics I-V, with stemmed entries such as space, shuttl, nasa, jpl, moon, launch, orbit, venu, station, god, christian, church, jesu, bibl, christ, faith, sex, sexual, homosexu, fornic.]
16 Conclusions
[Diagram: dependency structure of the splitting in simulation 2. sci.space splits into 'space shuttle design' (IV) and 'space shuttle mission' (III); soc.religion.christian splits into 'christian church' (I), 'christian religion' (II), and 'christian morality' (V).]
- Clustering and keyword identification by an ICA variant that maximizes skewness
- Key assumption: an asymmetrical latent prior
- The joint problem (D and D^T) is solved as a 'spatio-temporal' ICA
- The algorithm is linear in the number of documents, O(K^2 N)
- Fast convergence (3 - 8 steps)
- The potential number of topics can be greater than indicated by a human labeler: subtopics are discovered
- Hierarchical partitioning is possible (recursive binary splits)
17 Further Work
[Plot: markers x = 'sci.crypt', o = 'sci.space', plus two further markers for 'sci.med' and 'soc.religion.christian'; panels or axes labelled 1, 2, 3.]
- Study links with other methods to improve flexibility
- Or develop a mechanism to allow a more structured representation, in a mixed or hierarchical manner
- For example, build model estimation into the algorithm
- Relax the equal w_k norm assumption