1
Numerical Linear Algebra in the Streaming Model
  • Ken Clarkson - IBM
  • David Woodruff - IBM

2
The Problems
  • Given an n x d matrix A and an n x d' matrix B, we
    want estimators for
  • The matrix product A^T B
  • The matrix X minimizing ||AX - B||
  • A slightly generalized version of linear
    regression
  • Given an integer k, the matrix A_k of rank k
    minimizing ||A - A_k||
  • We use the Frobenius matrix norm: ||C||_F is the
    square root of the sum of squares of the entries
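
As a quick reference for the norm used throughout, a minimal numpy check (matrix size arbitrary) that the square-root-of-sum-of-squares definition matches the built-in Frobenius norm:

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((4, 3))                      # arbitrary example matrix

frob = np.sqrt((C ** 2).sum())                       # square root of the sum of squared entries
print(np.isclose(frob, np.linalg.norm(C, 'fro')))    # True
```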

3
General Properties of Our Algorithms
  • 1 pass over matrix entries, given in any order
    (allow multiple updates)
  • Maintain compressed versions or sketches of
    matrices
  • Do small work per entry to maintain the sketches
  • Output result using the sketches
  • Randomized approximation algorithms
  • Since we minimize space complexity, we restrict
    matrix entries to O(log(nc)) or O(log(nd)) bits

4
Matrix Compression Methods
  • In a line of similar efforts
  • Element-wise sampling [AM01, AHK06]
  • Row / column sampling: pick a small random subset
    of the rows, columns, or both [DK01, DKM04, DMM08]
  • Sketching / random projection: maintain a small
    number of random linear combinations of rows or
    columns [S06]
  • Usually more than 1 pass
  • Here: sketching

5
Outline
  • Matrix Product
  • Linear Regression
  • Low-rank approximation

6
An Optimal Matrix Product Algorithm
  • A and B have n rows, and a total of c columns, and
    we want to estimate A^T B, so that
    ||Est - A^T B||_F ≤ ε ||A||_F ||B||_F
  • Let S be an n x m sign (Rademacher) matrix
  • Each entry is +1 or -1 with probability 1/2
  • m is small, set to O(ε^-2 log(1/δ))
  • Entries are O(log(1/δ))-wise independent
  • Observation:
  • E[A^T S S^T B/m] = A^T E[S S^T] B/m = A^T B
  • Wouldn't it be nice if all the algorithm did was
    maintain S^T A and S^T B, and output A^T S S^T B/m?
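
A minimal numpy sketch of this estimator, using fully i.i.d. signs rather than the O(log(1/δ))-wise independent generator the algorithm would actually use; all sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d2 = 1000, 20, 30                  # A is n x d, B is n x d' (illustrative sizes)
m = 200                                  # sketch size; theory suggests m = O(eps^-2 log(1/delta))

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d2))
S = rng.choice([-1.0, 1.0], size=(n, m)) # sign (Rademacher) sketching matrix

SA, SB = S.T @ A, S.T @ B                # the sketches the algorithm maintains
est = SA.T @ SB / m                      # estimator of A^T B

err = np.linalg.norm(est - A.T @ B, 'fro')
bound = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
print(err / bound)                       # relative error, roughly on the order of 1/sqrt(m)
```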

7
An Optimal Matrix Product Algorithm
  • This does work, and we are able to improve the
    previous dependence on m
  • New tail estimate: for δ, ε > 0, there is
    m = O(ε^-2 log(1/δ)) so that
  • Pr[ ||A^T S S^T B/m - A^T B||_F > ε ||A||_F ||B||_F ] ≤ δ
  • (again, ||C||_F = (Σ_{i,j} C_{i,j}^2)^{1/2})
  • Follows from bounding the O(log(1/δ))-th moment of
    ||A^T S S^T B/m - A^T B||_F

8
Efficiency
  • Easy to maintain the sketches given updates
  • O(m) time per update, O(mc log(nc)) bits of space
    for S^T A and S^T B
  • Improves Sarlós' algorithm by a log c factor
  • Sarlós' algorithm is based on the JL lemma
  • JL preserves all entries of A^T B up to an additive
    error, whereas we only preserve the overall error
  • Can compute A^T S S^T B/m via fast rectangular
    matrix multiplication
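
A minimal sketch of the per-entry update, assuming the stream delivers updates as (i, j, delta) triples; the class and method names are illustrative, and in practice S would be generated from a short random seed with limited independence rather than stored explicitly:

```python
import numpy as np

class SketchState:
    """Maintain the m x d sketch S^T A of an n x d matrix A under streaming updates."""

    def __init__(self, S, d):
        self.S = S                              # n x m sign matrix
        self.SA = np.zeros((S.shape[1], d))     # the sketch S^T A, initially A = 0

    def update(self, i, j, delta):
        # The update A[i, j] += delta changes only column j of S^T A:
        # (S^T A)[:, j] += delta * S[i, :], i.e. O(m) work per entry.
        self.SA[:, j] += delta * self.S[i, :]

rng = np.random.default_rng(0)
n, d, m = 100, 8, 20
state = SketchState(rng.choice([-1.0, 1.0], size=(n, m)), d)
state.update(3, 5, 2.0)                         # one stream item: A[3, 5] += 2.0
```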

9
Matrix Product Lower Bound
  • Our algorithm is space-optimal for constant δ
  • A new lower bound
  • Reduction from a communication game
  • Augmented Indexing, with players Alice and Bob
  • Alice has a random x ∈ {0,1}^s
  • Bob has a random i ∈ {1, 2, …, s}
  • and also x_{i+1}, …, x_s
  • Alice sends Bob one message
  • Bob should output x_i with probability at least
    2/3
  • Theorem [MNSW]: the message must be Ω(s) bits on
    average

10
Lower Bound Proof
  • Set s = Θ(cε^-2 log(cn))
  • Alice makes matrix U
  • Uses x_1, …, x_s
  • Bob makes matrix U' and B
  • Uses i and x_{i+1}, …, x_s
  • Alg input will be A = U + U' and B
  • A and B are n x c/2
  • Alice:
  • Runs the streaming matrix product Alg on U
  • Sends the Alg state to Bob
  • Bob continues Alg with A = U + U' and B
  • A^T B determines x_i with probability at least 2/3
  • By choice of U, U', B
  • Solving Augmented Indexing
  • So the space of Alg must be Ω(s) = Ω(cε^-2 log(cn))
    bits

11
Lower Bound Details
  • U = [U^(1); U^(2); …; U^(log(cn)); 0s]
  • Each U^(k) is an (ε^-2) x c/2 submatrix with
    entries in {-10^k, 10^k}
  • U^(k)_{i,j} = 10^k if the matched entry of x is 0,
    else U^(k)_{i,j} = -10^k
  • Bob's index i corresponds to an entry U^(k)_{i,j}
  • U' is such that A = U + U' = [U^(1); U^(2); …;
    U^(k); 0s]
  • U' is determined from x_{i+1}, …, x_s
  • A^T B is the i-th row of U^(k)
  • A ≈ U^(k) since the entries of A are geometrically
    increasing
  • ε^2 ||A||^2 ||B||^2, the squared error, is small,
    so most entries of the approximation to A^T B have
    the correct sign

12
Outline
  • Matrix Product
  • Linear Regression
  • Low-rank approximation

13
Linear Regression
  • The problem: min_X ||AX - B||
  • The X minimizing this is X = A^- B, where A^- is
    the pseudo-inverse of A
  • Every matrix A = UΣV^T via the singular value
    decomposition (SVD)
  • If A is n x d of rank k, then
  • U is n x k with orthonormal columns
  • Σ is a k x k diagonal matrix with positive diagonal
  • V is d x k with orthonormal columns
  • A^- = VΣ^-1 U^T
  • Normal equations: A^T A X = A^T B for the optimal X
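
A small numpy illustration of these SVD facts (sizes arbitrary, A assumed to have full column rank): the pseudo-inverse built from the SVD gives the least-squares solution, which satisfies the normal equations:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, d2 = 100, 5, 3
A = rng.standard_normal((n, d))                  # full column rank with probability 1
B = rng.standard_normal((n, d2))

U, s, Vt = np.linalg.svd(A, full_matrices=False) # A = U diag(s) V^T (thin SVD)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T           # A^- = V S^-1 U^T (= np.linalg.pinv(A))

X = A_pinv @ B                                   # minimizer of ||AX - B||_F
print(np.allclose(A.T @ A @ X, A.T @ B))         # normal equations: A^T A X = A^T B
```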

14
Linear Regression
  • Let S be an n x m sign matrix, with
    m = O(dε^-1 log(1/δ))
  • The algorithm is:
  • Maintain S^T A and S^T B
  • Return X' solving min_X ||S^T(AX - B)||
  • Space is O(d^2 ε^-1 log(1/δ)) words
  • Improves Sarlós' space by a log c factor
  • Space is optimal, via a new lower bound
  • Main claim: with probability at least 1 - δ,
  • ||AX' - B|| ≤ (1 + ε) ||AX* - B||, where X*
    minimizes ||AX - B||
  • That is, the relative error of X' is small
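
A minimal numpy sketch of this algorithm, with illustrative sizes and i.i.d. signs; it keeps only S^T A and S^T B and compares the sketched solution's cost to the optimum:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, d2 = 5000, 10, 2
A = rng.standard_normal((n, d))
B = A @ rng.standard_normal((d, d2)) + 0.1 * rng.standard_normal((n, d2))

m = 400                                          # theory: m = O(d eps^-1 log(1/delta))
S = rng.choice([-1.0, 1.0], size=(n, m))

SA, SB = S.T @ A, S.T @ B                        # the only state the algorithm keeps
X_sk = np.linalg.lstsq(SA, SB, rcond=None)[0]    # argmin_X ||S^T (AX - B)||_F
X_opt = np.linalg.lstsq(A, B, rcond=None)[0]     # true least-squares solution

def cost(X):
    return np.linalg.norm(A @ X - B, 'fro')

print(cost(X_sk) / cost(X_opt))                  # should be close to 1
```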

15
Regression Analysis
  • Why should the X' solving min_X ||S^T(AX - B)|| be
    good?
  • S^T approximately preserves ||AX - B|| for a fixed X
  • If this worked for all X, we'd be done
  • S^T must preserve norms even for X', which is
    chosen using S
  • First reduce to showing that β = ||A(X' - X*)|| is
    small
  • Use the normal equations A^T A X* = A^T B
  • They imply ||AX' - B||^2 = ||AX* - B||^2 +
    ||A(X' - X*)||^2
  • Bounding ||A(X' - X*)||^2 is equivalent to bounding
    ||U^T A(X' - X*)||^2, where A = UΣV^T from the SVD,
    and U is an orthonormal basis of the columnspace
    of A

16
Regression Analysis Continued
  • Bounding β^2 = ||U^T A(X' - X*)||^2:
  • β ≤ ||(U^T S S^T U/m)^-||_2 ·
    ||U^T S S^T U/m · U^T A(X' - X*)||
  • The normal equations in sketch space imply
  • (S^T A)^T (S^T A) X' = (S^T A)^T (S^T B)
  • U^T S S^T U/m · U^T A(X' - X*) =
    U^T S S^T A(X' - X*)/m
  • U^T S S^T A(X' - X*) = U^T S S^T (B - AX*)
  • ||U^T S S^T (B - AX*)/m||
  • ≤ (ε/k)^{1/2} ||U||_F ||B - AX*||_F
    (new tail estimate)
  • = ε^{1/2} ||B - AX*||_F

17
Regression Analysis Continued
  • Hence β^2 = ||U^T A(X' - X*)||^2 satisfies
  • β ≤ ||U^T S S^T (B - AX*)/m|| ·
    ||(U^T S S^T U/m)^-||_2
  • ≤ ε^{1/2} ||B - AX*|| · ||(U^T S S^T U/m)^-||_2
  • Recall the spectral norm ||A||_2 = sup_x
    ||Ax||/||x||
  • It implies ||CD|| ≤ ||C||_2 ||D||
  • ||(U^T S S^T U/m)^-||_2 is controlled by
    ||U^T S S^T U/m - I||_2
  • Subspace JL: for m = Ω(k log(1/δ)), S^T
    approximately preserves the lengths of all vectors
    in a k-dimensional space
  • So ||U^T S S^T U/m - I||_2 ≤ 1/2, and
    ||(U^T S S^T U/m)^-||_2 ≤ 2
  • β ≤ 2 ε^{1/2} ||AX* - B||
  • ||AX' - B||^2 = ||AX* - B||^2 + β^2 ≤
    ||AX* - B||^2 + 4ε ||AX* - B||^2
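
The subspace JL step above can be checked numerically: for an orthonormal U spanning a k-dimensional space and a sign matrix S with m >> k columns, U^T S S^T U/m is close to the identity in spectral norm (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 2000, 5, 500
U = np.linalg.qr(rng.standard_normal((n, k)))[0]   # orthonormal basis of a k-space
S = rng.choice([-1.0, 1.0], size=(n, m))

M = U.T @ S @ S.T @ U / m
print(np.linalg.norm(M - np.eye(k), 2))            # spectral norm, typically well below 1/2
```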

18
Regression Lower Bound
  • Tight Ω(d^2 log(nd) ε^-1) space lower bound
  • Again a reduction from Augmented Indexing
  • This time more complicated
  • Embed log(nd)·ε^-1 independent regression
    sub-problems into the hard instance
  • Uses deletions and the geometrically growing
    property, as in the matrix product lower bound
  • Choose the entries of A and b so that the
    algorithm's output x encodes some entries of A

19
Regression Lower Bound
  • A lower bound of Ω(d^2) is already tricky because
    of bit complexity
  • Natural approach:
  • Alice has a random d x d sign matrix A^-1
  • b is a standard basis vector e_i
  • Alice computes A = (A^-1)^-1 and puts it into the
    stream. The solution x to min_x ||Ax - b|| is the
    i-th column of A^-1
  • Bob can isolate entries of A^-1, solving Indexing
  • Wrong: A has entries that can be exponentially
    small!

20
Regression Lower Bound
  • We design A and b together (Aug. Index)

  [ 1  A1,2  A1,3  A1,4  A1,5 ]   [ x1 ]   [  0   ]
  [ 0   1    A2,3  A2,4  A2,5 ]   [ x2 ]   [ A2,4 ]
  [ 0   0     1    A3,4  A3,5 ] · [ x3 ] = [ A3,4 ]
  [ 0   0     0     1    A4,5 ]   [ x4 ]   [  1   ]
  [ 0   0     0     0     1   ]   [ x5 ]   [  0   ]

  x5 = 0, x4 = 1, x3 = 0, x2 = 0, x1 = -A1,4
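
A small numpy check of this construction (the unknown entries are filled with arbitrary signs here): solving the triangular system exposes the entry A_{1,4} in the output:

```python
import numpy as np

rng = np.random.default_rng(4)
# Sign entries Alice encodes (arbitrary placeholder values here).
A12, A13, A14, A15 = rng.choice([-1.0, 1.0], 4)
A23, A24, A25 = rng.choice([-1.0, 1.0], 3)
A34, A35 = rng.choice([-1.0, 1.0], 2)
A45 = rng.choice([-1.0, 1.0])

A = np.array([[1, A12, A13, A14, A15],
              [0,   1, A23, A24, A25],
              [0,   0,   1, A34, A35],
              [0,   0,   0,   1, A45],
              [0,   0,   0,   0,   1]])
b = np.array([0, A24, A34, 1, 0])

x = np.linalg.solve(A, b)
print(x, -A14)                 # x = (-A14, 0, 0, 1, 0): the solution reveals A_{1,4}
```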
21
Outline
  • Matrix Product
  • Regression
  • Low-rank approximation

22
Best Low-Rank Approximation
  • For any matrix A and integer k, there is a matrix
    Ak of rank k that is closest to A among all
    matrices of rank k.
  • Since the rank of A_k is k, it is the product CD^T
    of two k-column matrices C and D
  • A_k can be found from the SVD (singular value
    decomposition): take C = U_k, the top k left
    singular vectors, and D = V_k Σ_k
  • This is a good compression of A
  • LSI, PCA, recommendation systems, clustering
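
A minimal numpy version of this factorization (sizes arbitrary): take the top k singular vectors, with C = U_k and D = V_k Σ_k, so A_k = C D^T:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 30))
k = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
C = U[:, :k]                    # n x k, orthonormal columns
D = Vt[:k].T * s[:k]            # d x k, i.e. V_k Sigma_k
A_k = C @ D.T                   # best rank-k approximation in Frobenius norm

print(np.linalg.norm(A - A_k, 'fro'))
```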

23
Best Low-Rank Approximation
  • Previously, nothing was known for 1-pass low-rank
    approximation with relative error
  • Even for k = 1, the best upper bound was
    O(nd log(nd)) bits
  • Problem 28 of [Mut]: can one get sublinear space?
  • We get 1 pass and O(kε^-2 (n + dε^-2) log(nd))
    space
  • Update time is O(kε^-4), so the total work is
    O(N kε^-4), where N is the number of non-zero
    entries of A
  • A new space lower bound shows this is optimal up
    to a factor of 1/ε

24
Best Low-Rank Approximation and S^T A
  • The sketch S^T A holds information about A
  • In particular, there is a rank-k matrix A'_k in
    the rowspace of S^T A nearly as close to A as the
    closest rank-k matrix A_k
  • The rowspace of S^T A is the set of linear
    combinations of its rows
  • That is, ||A - A'_k|| ≤ (1 + ε) ||A - A_k||
  • Why is there such an A'_k?

25
Low-Rank Approximation via Regression
  • Apply the regression results with A → A_k, B → A
  • The X' minimizing ||S^T(A_k X - A)|| has
  • ||A_k X' - A|| ≤ (1 + ε) ||A_k X* - A||
  • But here X* = I, and X' = (S^T A_k)^- S^T A
  • So the matrix A_k X' = A_k (S^T A_k)^- S^T A
  • Has rank k
  • Is in the rowspace of S^T A
  • Is within a 1 + ε factor of the smallest distance
    of any rank-k matrix to A

26
Low-Rank Approximation in 2 Passes
  • Can't use A_k (S^T A_k)^- S^T A without finding
    A_k
  • Instead maintain S^T A
  • Can show that if G^T has orthonormal rows, then
    the best rank-k approximation to A in the rowspace
    of G^T is [AG]_k G^T, where [AG]_k is the best
    rank-k approximation to AG
  • After the 1st pass, compute an orthonormal basis
    G^T for the rowspace of S^T A
  • In the 2nd pass, maintain AG
  • Afterwards, compute [AG]_k and output [AG]_k G^T
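
A minimal numpy sketch of the 2-pass algorithm under these assumptions (i.i.d. signs, illustrative sizes, an input that is approximately rank k); it compares the output's error to that of the true best rank-k approximation:

```python
import numpy as np

def best_rank_k(M, k):
    # Best rank-k approximation via the SVD.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

rng = np.random.default_rng(6)
n, d, k = 300, 80, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))        # approximately rank-k input

m = 40                                          # sketch size (illustrative)
S = rng.choice([-1.0, 1.0], size=(n, m))

# Pass 1: maintain S^T A; afterwards form an orthonormal basis G^T of its rowspace.
SA = S.T @ A
G = np.linalg.qr(SA.T)[0]                       # d x m; G^T has orthonormal rows

# Pass 2: maintain AG; afterwards take its best rank-k approximation.
AG = A @ G
approx = best_rank_k(AG, k) @ G.T               # [AG]_k G^T

print(np.linalg.norm(A - approx, 'fro'),
      np.linalg.norm(A - best_rank_k(A, k), 'fro'))
```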

27
Low-Rank Approximation in 2 Passes
  • [AG]_k is the best rank-k approximation to AG, so
    for any rank-k matrix Z:
  • ||AGG^T - [AG]_k G^T|| = ||AG - [AG]_k||
  • ≤ ||AG - Z||
  • = ||AGG^T - ZG^T||
  • For all Y, (AGG^T - YG^T)(A - AGG^T)^T = 0, so we
    can apply the Pythagorean theorem twice:
  • ||A - [AG]_k G^T||^2 = ||A - AGG^T||^2 +
    ||AGG^T - [AG]_k G^T||^2
  • ≤ ||A - AGG^T||^2 + ||AGG^T - ZG^T||^2
  • = ||A - ZG^T||^2

28
1 Pass Algorithm
  • With high probability, (*) ||AX' - B|| ≤
    (1 + ε) ||AX* - B||, where
  • X* minimizes ||AX - B||
  • X' minimizes ||S^T A X - S^T B||
  • Apply (*) with A → AR and B → A, where R is a
    second, independent sign matrix applied on the
    right, and X' minimizes ||S^T(ARX - A)||
  • So X' = (S^T AR)^- S^T A has ||ARX' - A|| ≤
    (1 + ε) min_X ||ARX - A||
  • The columnspace of AR contains a
    (1 + ε)-approximation to A_k
  • So ||ARX' - A|| ≤ (1 + ε) min_X ||ARX - A|| ≤
    (1 + ε)^2 ||A - A_k||
  • Key idea: ARX' = AR (S^T AR)^- S^T A is
  • easy to compute in 1 pass with small space and
    fast update time
  • behaves like A (similar to the SVD)
  • use it instead of A in our 2-pass algorithm!

29
1 Pass Algorithm
  • Algorithm:
  • Maintain AR and S^T A
  • Compute AR (S^T AR)^- S^T A
  • Let G^T be an orthonormal basis for the rowspace
    of S^T A, as before
  • Output the best rank-k approximation to
    AR (S^T AR)^- S^T A in the rowspace of S^T A
  • Same as the 2-pass algorithm, except we don't need
    a second pass to project A onto the rowspace of
    S^T A
  • The analysis is similar to that for regression
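
A minimal numpy sketch of the 1-pass algorithm under the same assumptions (i.i.d. sign matrices S and R, illustrative sizes); only AR and S^T A are formed from A, and everything else is post-processing on the sketches:

```python
import numpy as np

def best_rank_k(M, k):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

rng = np.random.default_rng(7)
n, d, k = 300, 120, 5
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))         # approximately rank-k input

mS, mR = 60, 30                                  # sketch sizes (illustrative)
S = rng.choice([-1.0, 1.0], size=(n, mS))        # left sign sketch
R = rng.choice([-1.0, 1.0], size=(d, mR))        # right sign sketch

# State maintained during the single pass over A:
AR = A @ R                                       # n x mR
SA = S.T @ A                                     # mS x d

# Post-processing, with no further passes over A:
SAR = SA @ R                                     # S^T A R, from the sketches alone
A_hat = AR @ np.linalg.pinv(SAR) @ SA            # AR (S^T A R)^- S^T A, a stand-in for A

G = np.linalg.qr(SA.T)[0]                        # orthonormal basis of the rowspace of S^T A
approx = best_rank_k(A_hat @ G, k) @ G.T         # rank-k output in that rowspace

print(np.linalg.norm(A - approx, 'fro'),
      np.linalg.norm(A - best_rank_k(A, k), 'fro'))
```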

30
A Lower Bound
[Slide figure: Alice's binary string x is encoded as a matrix A with
k·ε^-1 columns per block and k nonzero rows; the sign entries of
successive blocks are scaled by geometrically growing powers of 10
(±10, ±100, ±1000, …), and the remaining n-k rows are 0s. Bob holds the
index i and x_{i+1}, x_{i+2}, ….]
The error is now dominated by the block of interest; Bob also inserts a
k x k identity submatrix into the block of interest.
31
Lower Bound Details
[Slide figure: the block of interest spans k·ε^-1 columns and k rows;
the remaining entries, including the other n-k rows, are 0s. Inside the
block of interest sits a k x k identity submatrix scaled by P, i.e.
P·I_k.]
Bob inserts a k x k identity submatrix, scaled by a large value P. Show
that any rank-k approximation must err on all of the shaded region, so
a good rank-k approximation likely has the correct sign on Bob's entry.
32
Concluding Remarks
  • Space bounds are tight for product and regression
  • Sharpen prior upper bounds
  • Prove optimal lower bounds
  • Space bounds are off by a factor of ε^-1 for
    low-rank approximation
  • First sub-linear (and near-optimal) 1-pass
    algorithm
  • We have better upper bounds for restricted cases
  • Improve the dependence on ε in the update time
  • Lower bounds for multi-pass algorithms?