Title: Numerical Linear Algebra in the Streaming Model
1. Numerical Linear Algebra in the Streaming Model
- Ken Clarkson - IBM
- David Woodruff - IBM
2. The Problems
- Given an n x d matrix A and an n x d' matrix B, we want estimators for:
  - The matrix product A^T B
  - The matrix X minimizing ||AX - B||
    - A slightly generalized version of least-squares regression
  - Given an integer k, the matrix A_k of rank k minimizing ||A - A_k||
- We use the Frobenius matrix norm: ||C|| = (Σ_{i,j} C_{i,j}^2)^{1/2}, the root of the sum of squares
3. General Properties of Our Algorithms
- One pass over the matrix entries, given in any order (multiple updates to an entry are allowed)
- Maintain compressed versions, or sketches, of the matrices
- Do small work per entry to maintain the sketches
- Output the result using the sketches
- Randomized approximation algorithms
4. Matrix Compression Methods
- In a line of similar efforts:
  - Element-wise sampling [AM01, AHK06]
  - Sketching / random projection: maintain a small number of random linear combinations of rows or columns [S06]
  - Row / column sampling [DK01, DKM04, DMM08]
- These usually need more than 1 pass
- Here: sketching
5. Outline
- Matrix Product
- Regression
- Low-rank approximation
6. Approximate Matrix Product
- A and B have n rows; we want to estimate A^T B
- Let S be an n x m sign (Rademacher) matrix
  - Each entry is +1 or -1 with probability 1/2
  - m is small, to be specified
  - Entries are O(log 1/δ)-wise independent
- Sketches are S^T A and S^T B
- Our estimate of A^T B is A^T S S^T B / m
- Easy to maintain the sketches given updates
  - O(m) time per update; O(mc log(nc)) bits of space for S^T A and S^T B, where c is the total number of columns of A and B
- Output A^T S S^T B / m using fast rectangular matrix multiplication
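A minimal numpy sketch of this estimator (illustrative sizes; for simplicity S is drawn with fully independent entries rather than the O(log 1/δ)-wise independent entries the algorithm uses to save space):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d2, m = 1000, 20, 30, 200           # illustrative sizes
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d2))

S = rng.choice([-1.0, 1.0], size=(n, m))  # n x m sign (Rademacher) matrix

# Streaming sketches: an update adding v to A[i, j] adds v * S[i, :]
# to column j of S^T A, which is O(m) work per update.
StA = S.T @ A                             # m x d
StB = S.T @ B                             # m x d2

estimate = StA.T @ StB / m                # estimate of A^T B
err = np.linalg.norm(estimate - A.T @ B)  # Frobenius-norm error
print(err, np.linalg.norm(A) * np.linalg.norm(B))
```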
7. Expected Error and a Tail Estimate
- Using linearity of expectation,
  E[A^T S S^T B / m] = A^T E[S S^T] B / m = A^T B,
  since E[S S^T] = m I_n
- Moreover, for δ, ε > 0, there is m = O(ε^{-2} log(1/δ)) so that
  Pr[ ||A^T S S^T B / m - A^T B|| > ε ||A|| ||B|| ] ≤ δ
  (again, ||C|| = (Σ_{i,j} C_{i,j}^2)^{1/2})
- This tail estimate seems to be new
  - Follows from bounding the O(log 1/δ)-th moment of ||A^T S S^T B / m - A^T B||
- Improves the space of Sarlós's 1-pass algorithm by a log c factor
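A quick empirical check of the tail behavior (hypothetical parameters): the relative error shrinks roughly like m^{-1/2}, consistent with m = O(ε^{-2} log(1/δ)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 10
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d))
exact = A.T @ B
scale = np.linalg.norm(A) * np.linalg.norm(B)

for m in (16, 64, 256, 1024):
    errs = []
    for _ in range(50):
        S = rng.choice([-1.0, 1.0], size=(n, m))
        est = (S.T @ A).T @ (S.T @ B) / m
        errs.append(np.linalg.norm(est - exact) / scale)
    # median relative error should drop about 2x per 4x increase in m
    print(m, np.median(errs))
```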
8. Matrix Product Lower Bound
- Our algorithm is space-optimal for constant δ
  - A new lower bound
- Reduction from a communication game
  - Augmented Indexing, with players Alice and Bob
  - Alice has a random x ∈ {0,1}^s
  - Bob has a random i ∈ {1, 2, ..., s}
    - and also x_{i+1}, ..., x_s
  - Alice sends Bob one message
  - Bob should output x_i with probability at least 2/3
- Theorem [MNSW]: the message must be Ω(s) bits on average
9. Lower Bound Proof
- Set s = Θ(c ε^{-2} log(cn))
- Alice makes matrix U
  - Uses x_1, ..., x_s
- Bob makes matrix U' and B
  - Uses i and x_{i+1}, ..., x_s
- The input to Alg will be A = U + U' and B
  - A and B are n x c/2
- Alice:
  - Runs the streaming matrix product Alg on U
  - Sends the state of Alg to Bob
- Bob continues Alg with A = U + U' and B
- A^T B determines x_i with probability at least 2/3
  - By the choice of U, U', and B
  - Solving Augmented Indexing
- So the space of Alg must be Ω(s) = Ω(c ε^{-2} log(cn)) bits
10. Lower Bound Proof
- U = [U(1); U(2); ...; U(log(cn)); 0s], a stack of blocks
- Each U(k) is an ε^{-2} x c/2 submatrix with entries in {-10^k, +10^k}
  - U(k)_{i,j} = +10^k if the matched entry of x is 0, else U(k)_{i,j} = -10^k
- Bob's index i corresponds to an entry U(k)_{i,j}
- U' is such that A = U + U' = [U(1); U(2); ...; U(k); 0s]
  - U' is determined from x_{i+1}, ..., x_s (it cancels the blocks U(k+1), ..., U(log(cn)))
- A^T B is the i-th row of U(k)
- A ≈ U(k), since the entries of A grow geometrically
- ε^2 ||A||^2 ||B||^2, the squared error, is small, so most entries of the approximation to A^T B have the correct sign
11. Linear Regression
- The problem: min_X ||AX - B||
  - The minimizing X* is X* = A^- B, where A^- is the pseudo-inverse of A
- The algorithm:
  - Maintain S^T A and S^T B
  - Return the X̃ solving min_X ||S^T (AX - B)||
- Main claim: if A has rank k, there is an m = O(k ε^{-1} log(1/δ)) so that with probability at least 1 - δ,
  ||A X̃ - B|| ≤ (1 + ε) ||A X* - B||
  - That is, the relative error of X̃ is small
- A new lower bound (omitted here) shows our space is optimal
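A minimal numpy sketch of the sketched-regression algorithm (illustrative sizes; again an i.i.d. sign matrix stands in for the limited-independence one):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 5000, 10, 400                  # illustrative sizes
A = rng.standard_normal((n, d))
B = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Exact least squares: X* = A^- B
x_star, *_ = np.linalg.lstsq(A, B, rcond=None)

# Sketched least squares: solve min_X ||S^T (A X - B)||
S = rng.choice([-1.0, 1.0], size=(n, m))
x_tilde, *_ = np.linalg.lstsq(S.T @ A, S.T @ B, rcond=None)

# Residuals should satisfy the (1 + eps) relative-error guarantee
print(np.linalg.norm(A @ x_star - B), np.linalg.norm(A @ x_tilde - B))
```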
12. Regression Analysis
- Why should X̃ be good?
- First reduce to showing that ||A(X̃ - X*)|| is small
- Use the normal equation A^T A X* = A^T B
- This implies ||A X̃ - B||^2 = ||A X* - B||^2 + ||A(X̃ - X*)||^2 (spelled out below)
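The identity, written out (X* the exact minimizer, X̃ the sketched one); the cross term vanishes because the normal equation gives A^T (A X* - B) = 0:

```latex
\|A\widetilde{X} - B\|_F^2
  = \|(A X^* - B) + A(\widetilde{X} - X^*)\|_F^2
  = \|A X^* - B\|_F^2 + \|A(\widetilde{X} - X^*)\|_F^2
% the cross term 2 <A X^* - B, A(\widetilde{X} - X^*)> is zero,
% since A^T (A X^* - B) = 0 by the normal equation
```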
13. Regression Analysis, Continued
- Bound ||A(X̃ - X*)||^2 using several tools:
  - The sketched normal equation (S^T A)^T (S^T A) X̃ = (S^T A)^T (S^T B)
  - Our tail estimate for matrix product
  - Subspace JL: for m = O(k ε^{-1} log(1/δ)), S^T approximately preserves the lengths of all vectors in a k-dimensional subspace
- Overall, ||A(X̃ - X*)||^2 ≤ O(ε) ||A X* - B||^2
- Combined with ||A X̃ - B||^2 = ||A X* - B||^2 + ||A(X̃ - X*)||^2, this gives the (1 + ε) guarantee
14. Best Low-Rank Approximation
- For any matrix A and integer k, there is a matrix Ak of rank k that is closest to A among all matrices of rank k
  - Applications: LSI, PCA, recommendation systems, clustering
- The sketch S^T A holds information about A:
  there is a rank-k matrix Ãk in the rowspace of S^T A so that ||A - Ãk|| ≤ (1 + ε) ||A - Ak||
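A small numpy illustration of this fact (illustrative sizes; the best rank-k matrix in the rowspace of S^T A can be found by projecting A onto that rowspace and truncating to rank k):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k, m = 500, 100, 5, 60              # illustrative sizes
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.05 * rng.standard_normal((n, d))  # nearly rank-k matrix

S = rng.choice([-1.0, 1.0], size=(n, m))
Q, _ = np.linalg.qr((S.T @ A).T)          # orthonormal basis of rowspace(S^T A)

# Project A onto the rowspace, then truncate to rank k
U, s, Vt = np.linalg.svd(A @ Q, full_matrices=False)
Ak_rowspace = (U[:, :k] * s[:k]) @ Vt[:k] @ Q.T

U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)
Ak_best = (U2[:, :k] * s2[:k]) @ Vt2[:k]  # true best rank-k (truncated SVD)
print(np.linalg.norm(A - Ak_rowspace), np.linalg.norm(A - Ak_best))
```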
15. Low-Rank Approximation via Regression
- Why is there such an Ãk? Apply the regression results with A → Ak and B → A
- The X̃ minimizing ||S^T (Ak X - A)|| has
  ||Ak X̃ - A|| ≤ (1 + ε) min_X ||Ak X - A||
- But here X* = I, and X̃ = (S^T Ak)^- S^T A
- So the matrix Ak X̃ = Ak (S^T Ak)^- S^T A:
  - Has rank k
  - Is in the rowspace of S^T A
  - Is within 1 + ε of the smallest distance to A of any rank-k matrix
- Problem: this seems to require 2 passes
16. 1-Pass Algorithm
- Suppose R is a d x m sign matrix. By the regression results, transposed, the columnspace of AR contains a good rank-k approximation to A:
  the X minimizing ||AR X - A|| has ||AR X - A|| ≤ (1 + ε) ||A - Ak||
- Apply the regression results with A → AR and B → A, and X̃ minimizing ||S^T (AR X - A)||
- So X̃ = (S^T AR)^- S^T A has
  ||AR X̃ - A|| ≤ (1 + ε) ||AR X* - A|| ≤ (1 + ε)^2 ||A - Ak||
- The algorithm maintains AR and S^T A, and computes
  AR X̃ = AR (S^T AR)^- S^T A
- Compute the best rank-k approximation to AR X̃ in the columnspace of AR (AR X̃ has rank O(k ε^{-2}))
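A minimal end-to-end numpy sketch of the 1-pass algorithm (illustrative sketch sizes mR and mS, i.i.d. sign matrices; not the paper's exact constants):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 500, 200, 5
mR, mS = 4 * k, 16 * k                    # illustrative sketch sizes
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.02 * rng.standard_normal((n, d))

R = rng.choice([-1.0, 1.0], size=(d, mR))
S = rng.choice([-1.0, 1.0], size=(n, mS))

# One pass over A: maintain the sketches AR (n x mR) and S^T A (mS x d)
AR = A @ R
StA = S.T @ A

# AR X~ = AR (S^T AR)^- S^T A lies in the columnspace of AR
ARX = AR @ np.linalg.pinv(S.T @ AR) @ StA

# Best rank-k approximation to AR X~ within the columnspace of AR
Q, _ = np.linalg.qr(AR)
U, s, Vt = np.linalg.svd(Q.T @ ARX, full_matrices=False)
Ak_tilde = Q @ (U[:, :k] * s[:k]) @ Vt[:k]

U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)
Ak = (U2[:, :k] * s2[:k]) @ Vt2[:k]       # true best rank-k
print(np.linalg.norm(A - Ak_tilde), np.linalg.norm(A - Ak))
```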
17. Concluding Remarks
- Space bounds are tight for product and regression
- We get 1 pass and O(k ε^{-2} (n + d ε^{-2}) log(nd)) space for low-rank approximation
  - The first sublinear 1-pass streaming algorithm with relative error
  - A new lower bound (omitted) shows our space is almost optimal
- Total work is O(N k ε^{-4}), where N is the number of non-zero entries of A
  - In the next talk, NDT give a faster algorithm for dense matrices
- Improve the dependence on ε
- Look at tight bounds for multiple passes
18. A Lower Bound
[Figure: the hard instance. Alice encodes her binary string x into matrix A: the top k rows consist of blocks, each k ε^{-1} columns wide, whose sign entries grow geometrically in magnitude from block to block (±10, ±100, ±1000, ±10000, ...); the remaining n - k rows are 0s. Bob holds index i and x_{i+1}, x_{i+2}, ...]
- The error is now dominated by the block of interest
- Bob also inserts a k x k identity submatrix into the block of interest
19. Lower Bound Details
[Figure: the block of interest (k rows by k ε^{-1} columns), a scaled identity submatrix P·I_k alongside it, and 0s elsewhere; the remaining n - k rows are 0s.]
- Bob inserts a k x k identity submatrix, scaled by a large value P
- Show that any rank-k approximation must err on all of the shaded region
- So a good rank-k approximation likely has the correct sign on Bob's entry