Title: Numerical Linear Algebra in the Streaming Model
1. Numerical Linear Algebra in the Streaming Model
- Ken Clarkson - IBM
- David Woodruff - IBM
2. The Problems
- Given an n x d matrix A and an n x d matrix B, we want estimators for:
  - The matrix product A^T B
  - The matrix X minimizing ‖AX − B‖, a slightly generalized version of linear regression
  - Given an integer k, the matrix A_k of rank k minimizing ‖A − A_k‖
- We consider the Frobenius matrix norm: ‖C‖ = (Σ_{i,j} C_{i,j}²)^{1/2}, the square root of the sum of squares of the entries
3. General Properties of Our Algorithms
- 1 pass over the matrix entries, given in any order (multiple updates per entry allowed)
- Maintain compressed versions, or sketches, of the matrices
- Do a small amount of work per entry to maintain the sketches
- Output the result using the sketches
- Randomized approximation algorithms
- Since we minimize space complexity, we restrict matrix entries to O(log nc) bits, or O(log nd) bits
4. Matrix Compression Methods
- In a line of similar efforts:
  - Element-wise sampling [AM01, AHK06]
  - Row / column sampling: pick a small random subset of the rows, columns, or both [DK01, DKM04, DMM08]
  - Sketching / random projection: maintain a small number of random linear combinations of rows or columns [S06]
- Usually more than 1 pass
- Here: sketching
5. Outline
- Matrix Product
- Linear Regression
- Low-rank approximation
6. An Optimal Matrix Product Algorithm
- A and B have n rows and a total of c columns, and we want to estimate A^T B so that ‖Est − A^T B‖ ≤ ε‖A‖‖B‖
- Let S be an n x m sign (Rademacher) matrix
  - Each entry is +1 or −1 with probability ½
  - m is small, set to O(ε^{-2} log(1/δ))
  - Entries are O(log(1/δ))-wise independent
- Observation:
  - E[A^T S S^T B/m] = A^T E[S S^T] B/m = A^T B
- Wouldn't it be nice if all the algorithm did was maintain S^T A and S^T B, and output A^T S S^T B/m? (See the sketch below.)
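As a concrete illustration, here is a minimal numpy sketch of this estimator, with made-up dimensions and fully independent signs (the analysis only needs limited independence); it is not the authors' implementation.

```python
# Sign-sketch estimator for A^T B: keep S^T A and S^T B, output (S^T A)^T (S^T B) / m.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, m = 1000, 20, 30, 200     # m would be O(eps^-2 log(1/delta)) in the analysis

A = rng.standard_normal((n, d1))
B = rng.standard_normal((n, d2))

S = rng.choice([-1.0, 1.0], size=(n, m))   # fully independent signs, for simplicity
StA = S.T @ A                              # m x d1 sketch
StB = S.T @ B                              # m x d2 sketch

est = StA.T @ StB / m                      # estimate of A^T B
err = np.linalg.norm(est - A.T @ B)        # Frobenius-norm error
bound = np.linalg.norm(A) * np.linalg.norm(B)
print(err, err / bound)                    # error relative to ||A|| ||B||
```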
7. An Optimal Matrix Product Algorithm
- This does work, and we are able to improve the previous dependence on m
- New tail estimate: for δ, ε > 0, there is m = O(ε^{-2} log(1/δ)) so that
  - Pr[‖A^T S S^T B/m − A^T B‖ > ε‖A‖‖B‖] ≤ δ
  - (again, ‖C‖ = (Σ_{i,j} C_{i,j}²)^{1/2})
- Follows from bounding the O(log(1/δ))-th moment of ‖A^T S S^T B/m − A^T B‖
8. Efficiency
- Easy to maintain the sketches given updates (see the update sketch below)
  - O(m) time per update, O(mc log(nc)) bits of space for S^T A and S^T B
- Improves Sarlos' algorithm by a log c factor
  - Sarlos' algorithm is based on the JL lemma
  - JL preserves all entries of A^T B up to an additive error, whereas we only preserve the overall error
- Can compute A^T S S^T B/m via fast rectangular matrix multiplication
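The per-entry update is where the O(m) cost comes from: an update to A[i, j] touches only row i of S and column j of the sketch. The snippet below uses an assumed update interface (i, j, delta) and stores the signs explicitly for simplicity; a real implementation would derive them from a limited-independence hash.

```python
# Streaming maintenance of S^T A: each update A[i, j] += delta costs O(m) time,
# since (S^T A)[:, j] += delta * S[i, :].
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 500, 10, 100
S = rng.choice([-1.0, 1.0], size=(n, m))   # stand-in for hash-generated signs
StA = np.zeros((m, d))                      # the only state kept for A

def update(i, j, delta):
    """Process one stream update A[i, j] += delta in O(m) time."""
    StA[:, j] += delta * S[i, :]

# Feed a random update stream and compare against the directly computed sketch.
A = np.zeros((n, d))
for _ in range(10000):
    i, j = rng.integers(n), rng.integers(d)
    delta = rng.standard_normal()
    A[i, j] += delta
    update(i, j, delta)

print(np.allclose(StA, S.T @ A))            # True: the maintained sketch equals S^T A
```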
9. Matrix Product Lower Bound
- Our algorithm is space-optimal for constant δ
  - a new lower bound
- Reduction from a communication game
  - Augmented Indexing, with players Alice and Bob
  - Alice has a random x ∈ {0,1}^s
  - Bob has a random i ∈ {1, 2, …, s}, and also x_{i+1}, …, x_s
  - Alice sends Bob one message
  - Bob should output x_i with probability at least 2/3
- Theorem [MNSW]: the message must be Ω(s) bits on average
10. Lower Bound Proof
- Set s = Θ(c ε^{-2} log(cn))
- Alice makes a matrix U
  - Uses x_1, …, x_s
- Bob makes matrices U' and B
  - Uses i and x_{i+1}, …, x_s
- The algorithm's input will be A = U + U' and B
  - A and B are n x c/2
- Alice:
  - Runs the streaming matrix product algorithm Alg on U
  - Sends Alg's state to Bob
- Bob continues Alg with A = U + U' and B
- A^T B determines x_i with probability at least 2/3
  - By the choice of U, U', B
  - Solving Augmented Indexing
- So the space of Alg must be Ω(s) = Ω(c ε^{-2} log(cn)) bits
11. Lower Bound Details
- U consists of the submatrices U(1), U(2), …, U(log(cn)), with the remaining entries 0s
- Each U(k) is an ε^{-2} x c/2 submatrix with entries in {−10^k, 10^k}
  - U(k)_{i,j} = 10^k if the matched entry of x is 0, else U(k)_{i,j} = −10^k
  - Bob's index i corresponds to some entry U(k)_{i,j}
- U' is such that A = U + U' consists of U(1), U(2), …, U(k), with the remaining entries 0s
  - U' is determined from x_{i+1}, …, x_s
- A^T B is the i-th row of U(k)
- A ≈ U(k), since the entries of A are geometrically increasing
- ε²‖A‖²‖B‖², the squared error, is small, so most entries of the approximation to A^T B have the correct sign
12. Outline
- Matrix Product
- Linear Regression
- Low-rank approximation
13. Linear Regression
- The problem: min_X ‖AX − B‖
- The X minimizing this is X* = A^− B, where A^− is the pseudo-inverse of A
- Every matrix A = UΣV^T, using the singular value decomposition (SVD)
  - If A is n x d of rank k, then
    - U is n x k with orthonormal columns
    - Σ is a k x k diagonal matrix with positive diagonal
    - V is d x k with orthonormal columns
  - A^− = VΣ^{-1}U^T
- Normal equations: A^T A X* = A^T B for the optimal X*
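A quick numerical check of these facts, using numpy's SVD-based pseudo-inverse (the dimensions are arbitrary):

```python
# X* = A^- B solves min_X ||AX - B|| and satisfies the normal equations A^T A X* = A^T B.
import numpy as np

rng = np.random.default_rng(2)
n, d, dprime = 200, 8, 3
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dprime))

X_star = np.linalg.pinv(A) @ B                           # A^- B
print(np.allclose(A.T @ A @ X_star, A.T @ B))            # normal equations hold

# Any other X can only have a larger residual than the minimizer X*.
X_other = X_star + 0.1 * rng.standard_normal(X_star.shape)
print(np.linalg.norm(A @ X_star - B) <= np.linalg.norm(A @ X_other - B))
```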
14. Linear Regression
- Let S be an n x m sign matrix, with m = O(d ε^{-1} log(1/δ))
- The algorithm (sketched in code below):
  - Maintain S^T A and S^T B
  - Return the X' solving min_X ‖S^T(AX − B)‖
- Space is O(d² ε^{-1} log(1/δ)) words
  - Improves Sarlos' space by a log c factor
  - Space is optimal via a new lower bound
- Main claim: with probability at least 1 − δ,
  - ‖AX' − B‖ ≤ (1+ε)‖AX* − B‖
  - That is, the relative error for X' is small
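A minimal sketch of this algorithm, assuming fully independent signs and made-up sizes; the sketched problem is solved with an off-the-shelf least-squares routine:

```python
# Sketched regression: keep S^T A and S^T B, then solve the small m x d problem
# min_X ||S^T A X - S^T B||.
import numpy as np

rng = np.random.default_rng(3)
n, d, m = 5000, 10, 400                      # m would be O(d eps^-1 log(1/delta))
A = rng.standard_normal((n, d))
X_true = rng.standard_normal((d, 1))
B = A @ X_true + 0.1 * rng.standard_normal((n, 1))

S = rng.choice([-1.0, 1.0], size=(n, m))
StA, StB = S.T @ A, S.T @ B                  # the only state the algorithm keeps

X_prime = np.linalg.lstsq(StA, StB, rcond=None)[0]   # argmin_X ||S^T(AX - B)||
X_star = np.linalg.lstsq(A, B, rcond=None)[0]        # offline optimum, for comparison

print(np.linalg.norm(A @ X_prime - B) / np.linalg.norm(A @ X_star - B))  # close to 1
```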
15. Regression Analysis
- Why should the X' solving min_X ‖S^T(AX − B)‖ be good?
- S^T approximately preserves ‖AX − B‖ for fixed X
  - If this worked for all X, we'd be done
  - S^T must preserve norms even for X', which is chosen using S
- First reduce to showing that ‖A(X' − X*)‖ is small
  - Use the normal equation A^T A X* = A^T B
  - Implies ‖AX' − B‖² = ‖AX* − B‖² + ‖A(X' − X*)‖²
- Bounding ‖A(X' − X*)‖² is equivalent to bounding ‖U^T A(X' − X*)‖², where A = UΣV^T from the SVD and U is an orthonormal basis of the column space of A
16. Regression Analysis Continued
- Bounding β² := ‖U^T A(X' − X*)‖²
  - ‖U^T A(X' − X*)‖ ≤ ‖(U^T S S^T U/m)^−‖₂ · ‖(U^T S S^T U/m) U^T A(X' − X*)‖
- The normal equations in sketch space imply
  - (S^T A)^T(S^T A)X' = (S^T A)^T(S^T B)
  - (U^T S S^T U) U^T A(X' − X*) = U^T S S^T A(X' − X*)
  - U^T S S^T A(X' − X*) = U^T S S^T(B − AX*)
- Bounding ‖U^T S S^T(B − AX*)/m‖:
  - ‖(U^T S S^T U/m) U^T A(X' − X*)‖ = ‖U^T S S^T(B − AX*)/m‖
  - ≤ (ε/k)^{1/2}‖U‖‖B − AX*‖ (new tail estimate, using U^T(B − AX*) = 0)
  - = ε^{1/2}‖B − AX*‖
17. Regression Analysis Continued
- Hence, β = ‖U^T A(X' − X*)‖
  - ≤ ‖U^T S S^T(B − AX*)/m‖ · ‖(U^T S S^T U/m)^−‖₂
  - ≤ ε^{1/2}‖B − AX*‖ · ‖(U^T S S^T U/m)^−‖₂
- Recall the spectral norm ‖A‖₂ = sup_x ‖Ax‖/‖x‖
  - Implies ‖CD‖ ≤ ‖C‖₂‖D‖
- ‖(U^T S S^T U/m)^−‖₂ ≤ 2 whenever ‖U^T S S^T U/m − I‖₂ ≤ 1/2
- Subspace JL: for m = Ω(k log(1/δ)), S^T approximately preserves the lengths of all vectors in a k-dimensional space
  - So ‖U^T S S^T U/m − I‖₂ ≤ 1/2 (checked empirically below)
- So β ≤ 2ε^{1/2}‖AX* − B‖
- ‖AX' − B‖² = ‖AX* − B‖² + β² ≤ ‖AX* − B‖² + 4ε‖AX* − B‖²
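The subspace condition is easy to check empirically; the following snippet, with an arbitrary k-dimensional subspace and fully independent signs, typically prints a value well below 1/2:

```python
# Empirical check that the scaled sketch nearly preserves a k-dimensional subspace:
# ||U^T S S^T U / m - I||_2 is small for a random sign matrix S.
import numpy as np

rng = np.random.default_rng(4)
n, k, m = 2000, 5, 600
U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal basis of a k-space

S = rng.choice([-1.0, 1.0], size=(n, m))
M = U.T @ S @ S.T @ U / m
print(np.linalg.norm(M - np.eye(k), ord=2))        # spectral norm, typically << 1/2
```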
18. Regression Lower Bound
- A tight Ω(d² ε^{-1} log(nd)) space lower bound
- Again a reduction from Augmented Indexing
  - This time more complicated
- Embed ε^{-1} log(nd) independent regression sub-problems into the hard instance
  - Uses deletions and the geometrically growing property, as in the matrix product lower bound
- Choose the entries of A and b so that the algorithm's output x encodes some entries of A
19. Regression Lower Bound
- A lower bound of Ω(d²) is already tricky because of bit complexity
- Natural approach:
  - Alice has a random d x d sign matrix A^{-1}
  - b is a standard basis vector e_i
  - Alice computes A = (A^{-1})^{-1} and puts it into the stream. The solution x to min_x ‖Ax − b‖ is the i-th column of A^{-1}
  - Bob can isolate entries of A^{-1}, solving Indexing
- Wrong: A has entries that can be exponentially small!
20. Regression Lower Bound
- We design A and b together (Aug. Index):

  A = [ 1  A_{1,2}  A_{1,3}  A_{1,4}  A_{1,5} ]        x = (x_1, x_2, x_3, x_4, x_5)^T
      [ 0     1     A_{2,3}  A_{2,4}  A_{2,5} ]
      [ 0     0        1     A_{3,4}  A_{3,5} ]        b = (0, A_{2,4}, A_{3,4}, 1, 0)^T
      [ 0     0        0        1     A_{4,5} ]
      [ 0     0        0        0        1    ]

- Solving Ax = b by back-substitution: x_5 = 0, x_4 = 1, x_3 = 0, x_2 = 0, x_1 = −A_{1,4}
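A small numerical check of the back-substitution above, filling the unknown A_{i,j} entries with arbitrary values chosen only for illustration:

```python
# Verify that solving Ax = b for the designed upper-triangular A and right-hand side b
# yields x = (-A_{1,4}, 0, 0, 1, 0), regardless of the values of the A_{i,j}.
import numpy as np

rng = np.random.default_rng(5)
Aij = {(i, j): rng.standard_normal() for i in range(1, 5) for j in range(i + 1, 6)}

A = np.eye(5)
for (i, j), v in Aij.items():
    A[i - 1, j - 1] = v
b = np.array([0.0, Aij[(2, 4)], Aij[(3, 4)], 1.0, 0.0])

x = np.linalg.solve(A, b)
print(np.allclose(x, [-Aij[(1, 4)], 0.0, 0.0, 1.0, 0.0]))   # True
```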
21. Outline
- Matrix Product
- Regression
- Low-rank approximation
22. Best Low-Rank Approximation
- For any matrix A and integer k, there is a matrix A_k of rank k that is closest to A among all matrices of rank k
- Since the rank of A_k is k, it is the product CD^T of two k-column matrices C and D
- A_k can be found from the SVD (singular value decomposition) A = UΣV^T, taking C = U_k and D = V_k Σ_k
- This is a good compression of A
  - LSI, PCA, recommendation systems, clustering
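For reference, here is how A_k is obtained offline from the SVD; this is the baseline being approximated, not the streaming algorithm:

```python
# Best rank-k approximation A_k = U_k Sigma_k V_k^T via the SVD.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 40))
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
C = U[:, :k]                        # n x k
D = Vt[:k, :].T * s[:k]             # d x k, i.e. V_k Sigma_k
A_k = C @ D.T                       # best rank-k approximation in Frobenius norm

# The error equals the l2 norm of the discarded singular values (Eckart-Young).
print(np.isclose(np.linalg.norm(A - A_k), np.sqrt(np.sum(s[k:] ** 2))))
```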
23. Best Low-Rank Approximation
- Previously, nothing was known for 1-pass low-rank approximation with relative error
  - Even for k = 1, the best upper bound was O(nd log(nd)) bits
  - Problem 28 of [Mut]: can one get sublinear space?
- We get 1 pass and O(kε^{-2}(n + dε^{-2}) log(nd)) space
- The update time is O(kε^{-4}), so the total work is O(Nkε^{-4}), where N is the number of non-zero entries of A
- A new space lower bound shows this is optimal up to a factor of 1/ε
24. Best Low-Rank Approximation and S^T A
- The sketch S^T A holds information about A
- In particular, there is a rank-k matrix A'_k in the rowspace of S^T A nearly as close to A as the closest rank-k matrix A_k
  - The rowspace of S^T A is the set of linear combinations of its rows
- That is, ‖A − A'_k‖ ≤ (1+ε)‖A − A_k‖
- Why is there such an A'_k?
25. Low-Rank Approximation via Regression
- Apply the regression results with A → A_k and B → A
- The X' minimizing ‖S^T(A_k X − A)‖ has
  - ‖A_k X' − A‖ ≤ (1+ε)‖A_k X* − A‖
- But here X* = I, and X' = (S^T A_k)^− S^T A
- So the matrix A_k X' = A_k(S^T A_k)^− S^T A
  - has rank k
  - is in the rowspace of S^T A
  - is within a factor 1+ε of the smallest distance of any rank-k matrix
26. Low-Rank Approximation in 2 Passes
- Can't use A_k(S^T A_k)^− S^T A without finding A_k
- Instead maintain S^T A
- Can show that if G^T has orthonormal rows, then the best rank-k approximation to A in the rowspace of G^T is (AG)_k G^T, where (AG)_k is the best rank-k approximation to AG
- After the 1st pass, compute an orthonormal basis G^T for the rowspace of S^T A
- In the 2nd pass, maintain AG
- Afterwards, compute (AG)_k and (AG)_k G^T (see the sketch below)
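An illustrative (non-streaming) rendering of these two passes in numpy, with made-up dimensions and sketch sizes; it mirrors the steps above rather than the authors' exact implementation:

```python
# 2-pass low-rank approximation: pass 1 builds S^T A; pass 2 builds AG for an
# orthonormal basis G^T of the rowspace of S^T A; output (AG)_k G^T.
import numpy as np

rng = np.random.default_rng(7)
n, d, k, m = 400, 200, 3, 40
A = (rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
     + 0.01 * rng.standard_normal((n, d)))               # roughly rank-k input

S = rng.choice([-1.0, 1.0], size=(n, m))
StA = S.T @ A                                            # pass 1: sketch

G, _ = np.linalg.qr(StA.T)                               # columns of G: basis of rowspace(S^T A)
AG = A @ G                                               # pass 2: project A

U, s, Vt = np.linalg.svd(AG, full_matrices=False)
AG_k = (U[:, :k] * s[:k]) @ Vt[:k, :]                    # best rank-k approx to AG
A_approx = AG_k @ G.T                                    # output, a rank-k matrix

_, s_full, _ = np.linalg.svd(A, full_matrices=False)
print(np.linalg.norm(A - A_approx), np.sqrt(np.sum(s_full[k:] ** 2)))   # vs best possible
```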
27. Low-Rank Approximation in 2 Passes
- (AG)_k is the best rank-k approximation to AG
- For any rank-k matrix Z,
  - ‖AGG^T − (AG)_k G^T‖ = ‖AG − (AG)_k‖
  - ≤ ‖AG − Z‖
  - = ‖AGG^T − ZG^T‖
- For all Y, (AGG^T − YG^T)(A − AGG^T)^T = 0, so we can apply the Pythagorean theorem twice:
  - ‖A − (AG)_k G^T‖² = ‖A − AGG^T‖² + ‖AGG^T − (AG)_k G^T‖²
  - ≤ ‖A − AGG^T‖² + ‖AGG^T − ZG^T‖²
  - = ‖A − ZG^T‖²
28. 1-Pass Algorithm
- With high probability, (*) ‖AX' − B‖ ≤ (1+ε)‖AX* − B‖, where
  - X* minimizes ‖AX − B‖
  - X' minimizes ‖S^T AX − S^T B‖
- Apply (*) with A → AR and B → A, so X' minimizes ‖S^T(ARX − A)‖
- So X' = (S^T AR)^− S^T A has ‖ARX' − A‖ ≤ (1+ε) min_X ‖ARX − A‖
- The column space of AR contains a (1+ε)-approximation to A_k
- So ‖ARX' − A‖ ≤ (1+ε) min_X ‖ARX − A‖ ≤ (1+ε)² ‖A − A_k‖
- Key idea: ARX' = AR(S^T AR)^− S^T A
  - is easy to compute in 1 pass with small space and fast update time
  - behaves like A (similar to the SVD)
  - can be used instead of A in our 2-pass algorithm!
29. 1-Pass Algorithm
- Algorithm (sketched in code below):
  - Maintain AR and S^T A
  - Compute AR(S^T AR)^− S^T A
  - Let G^T be an orthonormal basis for the rowspace of S^T A, as before
  - Output the best rank-k approximation to AR(S^T AR)^− S^T A in the rowspace of S^T A
- Same as the 2-pass algorithm, except we don't need a second pass to project A onto the rowspace of S^T A
- The analysis is similar to that for regression
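An illustrative rendering of the 1-pass output in numpy; the sketch sizes m_S and m_R are made up, and the code mirrors the steps above rather than the authors' exact implementation:

```python
# 1-pass low-rank approximation: maintain AR and S^T A, form AR (S^T A R)^- S^T A,
# then take its best rank-k approximation within the rowspace of S^T A.
import numpy as np

rng = np.random.default_rng(8)
n, d, k = 400, 200, 3
m_S, m_R = 60, 40                          # illustrative sketch sizes
A = (rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
     + 0.01 * rng.standard_normal((n, d)))

S = rng.choice([-1.0, 1.0], size=(n, m_S))
R = rng.choice([-1.0, 1.0], size=(d, m_R))
AR, StA = A @ R, S.T @ A                   # the only state kept during the single pass

proxy = AR @ np.linalg.pinv(S.T @ AR) @ StA          # AR (S^T A R)^- S^T A, behaves like A
G, _ = np.linalg.qr(StA.T)                           # basis of rowspace(S^T A)
U, s, Vt = np.linalg.svd(proxy @ G, full_matrices=False)
A_approx = (U[:, :k] * s[:k]) @ Vt[:k, :] @ G.T      # rank-k output in rowspace(S^T A)

_, s_full, _ = np.linalg.svd(A, full_matrices=False)
print(np.linalg.norm(A - A_approx), np.sqrt(np.sum(s_full[k:] ** 2)))   # vs best possible
```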
30. A Lower Bound
[Figure: the hard instance. Alice encodes her binary string x into a matrix A whose first k rows are divided into blocks of kε^{-1} columns each, with entry magnitudes growing geometrically from block to block (±10, ±100, ±1000, ±10000, …); the remaining n−k rows are 0s. Bob holds the index i and x_{i+1}, x_{i+2}, ….]
- The error is now dominated by the block of interest
- Bob also inserts a k x k identity submatrix into the block of interest
31. Lower Bound Details
[Figure: the block of interest has kε^{-1} columns and k rows; outside it the entries are 0s, and Bob's scaled identity submatrix P·I_k is placed in the n−k remaining rows beneath the block of interest.]
- Bob inserts a k x k identity submatrix, scaled by a large value P
- Show that any rank-k approximation must err on all of the shaded region
- So a good rank-k approximation likely has the correct sign on Bob's entry
32. Concluding Remarks
- Space bounds are tight for matrix product and regression
  - Sharpen prior upper bounds
  - Prove optimal lower bounds
- Space bounds are off by a factor of ε^{-1} for low-rank approximation
  - First sub-linear (and near-optimal) 1-pass algorithm
  - We have better upper bounds for restricted cases
- Improve the dependence on ε in the update time
- Lower bounds for multi-pass algorithms?