About This Presentation

Title:

FINA

Slides: 12

Provided by: cseOhi

Tags: fina | calculation

Transcript and Presenter's Notes

Title: FINA

1

2
Problems addressed this week

LSI Program
SVD Processing Time
Memory requirements
Similarity metric calculation between documents
Big numbers for all results
Implement the non-linearized document matrix to compare results with the linearized version

3
Resolved SVD bottleneck

Tried different implementations of SVD:
one in Java that promised sparse matrix operations, but whose SVD used a dense matrix implementation
one in Java that imported a package from Fortran and had little documentation
one in C that had rather difficult syntax
Chose the SVDLIB tool to compute the SVD (C implementation):
very fast
implementation optimized for sparse matrices
the calculation can be limited to a fixed number of desired singular values (in our case, 200)
easy to use

4
Speed and Memory Improvements in LSA program

5
SVD statistics

SVDLIB times and statistics
First matrix (3462 x 1312)
MATRIX DENSITY 3.36%
MAX. NO. OF EIGENPAIRS 200
ELAPSED CPU TIME 9.52 sec.
(9.59 sec. with double format. In S the difference is in only one element, at the 5th decimal. There are also very small differences in the U and V matrices.)
10K matrix (9367 x 10047)
MATRIX DENSITY 1.46%
MAX. NO. OF EIGENPAIRS 200
ELAPSED CPU TIME 38.77 sec.
(computing all the eigenvalues: approx. 15 hours)
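
The density figures also explain the memory side of the problem. The sketch below assumes the reported density is a percentage of non-zero cells and that the sparse layout stores one double and one index per non-zero plus one offset per column; the function names and that layout are assumptions for illustration, not SVDLIB's actual data structures. Under these assumptions the 9367 x 10047 matrix needs about 750 MB as dense doubles, but only on the order of 20 MB in sparse form.

```c
#include <stddef.h>

/* Density as a percentage of non-zero cells. */
double matrix_density_percent(size_t rows, size_t cols, size_t nonzeros)
{
    return 100.0 * (double)nonzeros / ((double)rows * (double)cols);
}

/* Approximate non-zero count implied by a reported density (in percent). */
size_t nonzeros_from_density(size_t rows, size_t cols, double density_percent)
{
    return (size_t)((double)rows * (double)cols * density_percent / 100.0 + 0.5);
}

/* Bytes needed to store every cell as a double. */
size_t dense_bytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(double);
}

/* Bytes for an assumed compressed-column layout: a double and a row index
 * per non-zero, plus cols + 1 column offsets. */
size_t sparse_bytes(size_t cols, size_t nonzeros)
{
    return nonzeros * (sizeof(double) + sizeof(long))
         + (cols + 1) * sizeof(long);
}
```

For example, `sparse_bytes(10047, nonzeros_from_density(9367, 10047, 1.46))` can be compared against `dense_bytes(9367, 10047)` to see the gap for the 10K matrix.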

6
Linearization of the Document Matrix

The DS matrix, which represents the document vectors, initially has values between -1 and 1 (the vectors have norm 1)
We decided to keep short int data types, which are much faster and take less space
Each value DS(i,j) in the matrix has been transformed to
(int)((DS(i,j) * 126) + 128)
New values are between 2 and 254
With this linearization the cosine is no longer just the inner product
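
A minimal sketch of the linearization and of one way to recover the cosine from the stored integers, assuming the transform is (int)(DS(i,j) * 126 + 128), which is consistent with the stated range of 2 to 254. The function names are hypothetical, and the correction formula is an illustration rather than the calculation used in FINA. Because the original vectors have norm 1, the cosine equals the inner product of the original values, so the offset and scale have to be undone: sum((qa - 128)(qb - 128)) / 126^2, expanded so that only integer sums are accumulated in the loop.

```c
#include <stddef.h>

#define LIN_SCALE  126.0
#define LIN_OFFSET 128.0

/* Map a component in [-1, 1] to an integer in [2, 254] (truncating). */
short linearize(double x)
{
    return (short)(int)(x * LIN_SCALE + LIN_OFFSET);
}

/* Approximate cosine of two vectors that had norm 1 before linearization.
 * Uses the expansion
 *   sum((qa-128)(qb-128)) = sum(qa*qb) - 128*(sum(qa) + sum(qb)) + n*128^2
 * and divides by 126^2 to undo the scale. Truncation in linearize()
 * makes the result an approximation of the true cosine. */
double cosine_linearized(const short *qa, const short *qb, size_t n)
{
    long dot = 0, sum_a = 0, sum_b = 0;

    for (size_t i = 0; i < n; i++) {
        dot   += (long)qa[i] * qb[i];
        sum_a += qa[i];
        sum_b += qb[i];
    }
    double centered = (double)dot
                    - LIN_OFFSET * (double)(sum_a + sum_b)
                    + (double)n * LIN_OFFSET * LIN_OFFSET;
    return centered / (LIN_SCALE * LIN_SCALE);
}
```

The per-vector sums do not depend on the query, so they could be precomputed once per document; whether that pays off in practice would need to be measured.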

7
COS calculation

8
Linearized vs. Non-linearized Document Matrix

We implemented both linearized and non-linearized searches and compared the results

9
How the search is executed in FINA

10
Linearized vs. Non-linearized Document Matrix

11
Future plans

Split the document feature vector into 2 tables, each of 100 dimensions, and see whether execution time improves
Attempt to implement relevance feedback
Change the interface and adapt it to the comments and new features
