Title: Sparse Matrix Methods
1. Sparse Matrix Methods
- Day 1: Overview
- Day 2: Direct methods
- Day 3: Iterative methods
  - The conjugate gradient algorithm
  - Parallel conjugate gradient and graph partitioning
  - Preconditioning methods and graph coloring
  - Domain decomposition and multigrid
  - Krylov subspace methods for other problems
  - Complexity of iterative and direct methods
2. SuperLU-dist: Iterative refinement to improve the solution
- Iterate:
  - r = b − A x
  - backerr = max_i ( |r|_i / (|A| |x| + |b|)_i )
  - if backerr < ε or backerr > lasterr/2 then stop iterating
  - solve L U dx = r
  - x = x + dx
  - lasterr = backerr
  - repeat
- Usually 0-3 steps are enough (a runnable sketch follows below)
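A minimal MATLAB sketch of this loop, assuming the (approximate) LU factors of A have already been computed; the function name and stopping parameter eps_stop are illustrative, not SuperLU-dist's actual interface:

    % Iterative refinement sketch: L, U are precomputed approximate
    % factors of A; eps_stop is the backward-error target (assumed name).
    function x = refine(A, L, U, b, eps_stop)
      x = U \ (L \ b);                                % initial solve
      lasterr = Inf;
      while true
        r = b - A*x;                                  % residual
        backerr = max(abs(r) ./ (abs(A)*abs(x) + abs(b)));
        if backerr < eps_stop || backerr > lasterr/2
          break                                       % converged or stalled
        end
        dx = U \ (L \ r);                             % solve L U dx = r
        x = x + dx;
        lasterr = backerr;
      end
    end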
3. (Matlab demo)
4. Convergence analysis of iterative refinement
Let C = I − A(LU)^{-1}, so A = (I − C)(LU).
  x_1 = (LU)^{-1} b
  r_1 = b − A x_1 = (I − A(LU)^{-1}) b = C b
  dx_1 = (LU)^{-1} r_1 = (LU)^{-1} C b
  x_2 = x_1 + dx_1 = (LU)^{-1} (I + C) b
  r_2 = b − A x_2 = (I − (I − C)(I + C)) b = C^2 b
  . . .
In general, r_k = b − A x_k = C^k b.
Thus r_k → 0 if the largest eigenvalue of C is < 1 in absolute value.
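A quick numerical check of the identity r_k = C^k b, using a deliberately perturbed factorization to stand in for an inexact LU (the setup is illustrative):

    n = 50;
    A = randn(n); A = A'*A + n*eye(n);        % random SPD test matrix
    [L, U] = lu(A + 1e-3*randn(n));           % approximate factors of A
    C = eye(n) - A / (L*U);                   % C = I - A (LU)^-1
    b = randn(n, 1);
    x = U \ (L \ b);   r1 = b - A*x;          % r1 should equal C b
    dx = U \ (L \ r1); x = x + dx;
    r2 = b - A*x;                             % r2 should equal C^2 b
    disp([norm(r1 - C*b), norm(r2 - C*C*b)])  % both ~ machine precision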
5. The Landscape of Sparse Ax = b Solvers
(figure: chart classifying sparse solvers; not reproduced here)
6. Conjugate gradient iteration
x_0 = 0,  r_0 = b,  p_0 = r_0
for k = 1, 2, 3, . . .
  α_k = (r_{k-1}^T r_{k-1}) / (p_{k-1}^T A p_{k-1})   step length
  x_k = x_{k-1} + α_k p_{k-1}                         approximate solution
  r_k = r_{k-1} − α_k A p_{k-1}                       residual
  β_k = (r_k^T r_k) / (r_{k-1}^T r_{k-1})             improvement
  p_k = r_k + β_k p_{k-1}                             search direction
- One matrix-vector multiplication per iteration
- Two vector dot products per iteration
- Four n-vectors of working storage
- (a runnable MATLAB sketch follows below)
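A minimal MATLAB translation of the iteration above, assuming A is symmetric positive definite; the function name and tolerance handling are illustrative:

    function x = cg(A, b, tol, maxit)
      x = zeros(size(b));
      r = b;  p = r;                    % x_0 = 0, r_0 = b, p_0 = r_0
      rho = r' * r;
      for k = 1:maxit
        Ap = A * p;
        alpha = rho / (p' * Ap);        % step length
        x = x + alpha * p;              % approximate solution
        r = r - alpha * Ap;             % residual
        rho_new = r' * r;
        if sqrt(rho_new) <= tol * norm(b), break; end
        beta = rho_new / rho;           % improvement
        p = r + beta * p;               % search direction
        rho = rho_new;
      end
    end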
7. Conjugate gradient: Krylov subspaces
- Eigenvalues: A u = λ u, with eigenvalues λ_1, λ_2, . . ., λ_n
- Cayley-Hamilton theorem: (A − λ_1 I)(A − λ_2 I) · · · (A − λ_n I) = 0
- Therefore Σ_{0 ≤ i ≤ n} c_i A^i = 0 for some coefficients c_i (with c_0 ≠ 0 when A is nonsingular)
- so A^{-1} = Σ_{1 ≤ i ≤ n} (−c_i / c_0) A^{i-1}
- Therefore if A x = b, then x = A^{-1} b and
  x ∈ span(b, A b, A^2 b, . . ., A^{n-1} b) = K_n(A, b), the Krylov subspace
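To make the point concrete, here is a MATLAB sketch that builds the Krylov basis explicitly and recovers x in it; this is never done in practice, since the explicit basis is often severely ill-conditioned:

    n = 10;
    A = randn(n); A = A'*A + n*eye(n);       % random SPD test matrix
    b = randn(n, 1);
    K = zeros(n, n); K(:,1) = b;
    for i = 2:n, K(:,i) = A * K(:,i-1); end  % columns b, Ab, ..., A^(n-1) b
    y = (A * K) \ b;                         % coordinates of x in the basis
    x = K * y;
    disp(norm(A*x - b) / norm(b))            % small, up to conditioning of K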
8. Conjugate gradient: Orthogonal sequences
- Krylov subspace: K_i(A, b) = span(b, A b, A^2 b, . . ., A^{i-1} b)
- Conjugate gradient algorithm: for i = 1, 2, 3, . . . find x_i ∈ K_i(A, b) such that r_i = (b − A x_i) ⊥ K_i(A, b)
- Notice r_i ∈ K_{i+1}(A, b), so r_i ⊥ r_j for all j < i
- Similarly, the search directions are A-orthogonal: (x_i − x_{i-1})^T A (x_j − x_{j-1}) = 0
- The magic: short recurrences . . . A is symmetric ⇒ can get the next residual and direction from the previous one, without saving them all.
9. Conjugate gradient: Convergence
- In exact arithmetic, CG converges in n steps (completely unrealistic!!)
- Accuracy after k steps of CG is related to:
  - consider polynomials of degree k that are equal to 1 at 0
  - how small can such a polynomial be at all the eigenvalues of A?
- Thus, eigenvalues close together are good (see the experiment below).
- Condition number: κ(A) = ||A||_2 ||A^{-1}||_2 = λ_max(A) / λ_min(A)
- Residual is reduced by a constant factor by O(κ^{1/2}(A)) iterations of CG.
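A small MATLAB experiment illustrating the clustering effect; the diagonal test matrices are illustrative stand-ins for clustered versus spread spectra:

    n = 500;  b = rand(n, 1);
    Aclust  = spdiags(1 + 1e-3*rand(n,1), 0, n, n);    % eigenvalues near 1
    Aspread = spdiags(linspace(1, 1e4, n)', 0, n, n);  % kappa ~ 1e4
    [~, ~, ~, it1] = pcg(Aclust,  b, 1e-10, n);
    [~, ~, ~, it2] = pcg(Aspread, b, 1e-10, n);
    fprintf('clustered: %d iterations, spread: %d iterations\n', it1, it2);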
10. (Matlab demo)
- CG on grid5(15) and bcsstk08
- n steps of CG on bcsstk08
11. Conjugate gradient: Parallel implementation
- Lay out matrix and vectors by rows
- Hard part is the matrix-vector product y = A x
- Algorithm: each processor j
  - Broadcast x(j)
  - Compute y(j) = A(j,:) x
- May send more of x than needed
- Partition / reorder matrix to reduce communication (serial sketch below)
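A serial MATLAB sketch of the block-row matvec; each loop iteration plays the role of one processor, which after the broadcast sees all of x (A and x assumed given):

    p = 4;  n = size(A, 1);
    rows = round(linspace(0, n, p+1));   % row-block boundaries
    y = zeros(n, 1);
    for j = 1:p                          % each iteration = one "processor"
      myrows = rows(j)+1 : rows(j+1);    % rows owned by processor j
      y(myrows) = A(myrows, :) * x;      % y(j) = A(j,:) x
    end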
12. (Matlab demo)
- 2-way partition of eppstein mesh
- 8-way dice of eppstein mesh
13. Preconditioners
- Suppose you had a matrix B such that:
  (1) the condition number κ(B^{-1}A) is small
  (2) B y = z is easy to solve
- Then you could solve (B^{-1}A) x = B^{-1} b instead of A x = b
- B = A is great for (1), not for (2)
- B = I is great for (2), not for (1)
- Domain-specific approximations sometimes work
- B = diagonal of A sometimes works (sketch below)
- Or, bring back the direct methods technology . . .
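A minimal sketch of the diagonal (Jacobi) choice using MATLAB's pcg; solving B y = z is then trivial:

    B = spdiags(diag(A), 0, size(A,1), size(A,2));   % B = diagonal of A
    [x, flag, relres, iters] = pcg(A, b, 1e-8, 500, B);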
14. (Matlab demo)
- bcsstk08 with diagonal precond
15. Incomplete Cholesky factorization (IC, ILU)
- Compute factors of A by Gaussian elimination, but ignore fill
- Preconditioner B = R^T R ≈ A, not formed explicitly
- Compute B^{-1} z by triangular solves (in time nnz(A))
- Total storage is O(nnz(A)), static data structure
- Either symmetric (IC) or nonsymmetric (ILU); see the sketch below
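In MATLAB this is available directly; a sketch for a sparse SPD matrix A (ichol returns a lower-triangular factor, so the preconditioner is B = L L^T here):

    L = ichol(A);                      % zero-fill incomplete Cholesky
    x = pcg(A, b, 1e-8, 500, L, L');   % preconditioner applied by two
                                       % triangular solves per iteration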
16. (Matlab demo)
17. Incomplete Cholesky and ILU: Variants
- Allow one or more levels of fill
  - unpredictable storage requirements
- Allow fill whose magnitude exceeds a drop tolerance
  - may get better approximate factors than levels of fill
  - unpredictable storage requirements
  - choice of tolerance is ad hoc
- Partial pivoting (for nonsymmetric A)
- Modified ILU (MIC): add dropped fill to the diagonal of U or R
  - A and R^T R have the same row sums
  - good in some PDE contexts (see the options sketch below)
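The drop-tolerance and modified variants map onto ichol's options struct; the field values below are illustrative choices:

    opts.type = 'ict';         % threshold-based incomplete Cholesky
    opts.droptol = 1e-3;       % drop fill smaller than this tolerance
    opts.michol = 'on';        % modified IC: add dropped fill to diagonal
    L = ichol(A, opts);
    x = pcg(A, b, 1e-8, 500, L, L');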
18. Incomplete Cholesky and ILU: Issues
- Choice of parameters
  - good: smooth transition from iterative to direct methods
  - bad: very ad hoc, problem-dependent
  - tradeoff: time per iteration (more fill ⇒ more time) vs. number of iterations (more fill ⇒ fewer iterations)
- Effectiveness
  - condition number usually improves (only) by a constant factor (except MIC for some problems from PDEs)
  - still, often good when tuned for a particular class of problems
- Parallelism
  - Triangular solves are not very parallel
  - Reordering for parallel triangular solve by graph coloring
19. (Matlab demo)
20. Sparse approximate inverses
- Compute B^{-1} ≈ A^{-1} explicitly
- Minimize ||B^{-1} A − I||_F (in parallel, by columns)
- Variants: factored form of B^{-1}, more fill, . . .
- Good: very parallel (sketch below)
- Bad: effectiveness varies widely
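A sketch of one common formulation, minimizing ||A M − I||_F column by column over a fixed sparsity pattern (here, the pattern of A itself); each column is an independent small least-squares problem, hence the parallelism:

    n = size(A, 1);
    M = sparse(n, n);              % M will play the role of B^{-1}
    for j = 1:n                    % embarrassingly parallel loop
      J = find(A(:, j));           % allowed nonzero positions in column j
      e = zeros(n, 1); e(j) = 1;
      M(J, j) = full(A(:, J)) \ e; % small least squares: min ||A m - e_j||
    end
    % Apply the preconditioner by a matvec: z = M * r.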
21. Support graph preconditioners: example (Vaidya)
(figure: graphs G(A) and G(B))
- A is symmetric positive definite with negative off-diagonal nonzeros
- B is a maximum-weight spanning tree for A (with diagonal modified to preserve row sums); a construction sketch follows below
- factor B in O(n) space and O(n) time
- applying the preconditioner costs O(n) time per iteration
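A hedged MATLAB sketch of the construction, assuming A is SPD with negative off-diagonals and a connected graph; the weight transform makes minspantree return a maximum-weight spanning tree, since a minimum spanning tree depends only on the ordering of the edge weights:

    n = size(A, 1);
    W = -triu(A, 1);                          % positive off-diagonal weights
    G = graph(W, 'upper');
    G.Edges.Weight = max(G.Edges.Weight) + 1 - G.Edges.Weight;  % reverse order
    T = minspantree(G);                       % = max-weight tree of original W
    e = T.Edges.EndNodes;                     % tree edges (i < j)
    idx = sub2ind([n n], e(:,1), e(:,2));
    Boff = sparse(e(:,1), e(:,2), full(A(idx)), n, n);
    Boff = Boff + Boff';                      % original negative tree entries
    d = full(sum(A, 2) - sum(Boff, 2));       % diagonal preserving row sums
    B = Boff + spdiags(d, 0, n, n);           % B is SPD if A is diag. dominant
    x = pcg(A, b, 1e-8, 1000, B);             % pcg solves with B each iteration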
22. Support graph preconditioners: example (continued)
(figure: graphs G(A) and G(B))
- support each edge of A by a path in B
- dilation(A edge) = length of its supporting path in B
- congestion(B edge) = number of A edges it supports
- p = max congestion, q = max dilation
- condition number κ(B^{-1}A) is bounded by p·q (at most O(n^2))
23. Support graph preconditioners: example (continued)
(figure: graphs G(A) and G(B))
- can improve congestion and dilation by adding a few strategically chosen edges to B
- cost of factor + solve is O(n^{1.75}), or O(n^{1.2}) if A is planar
- in recent experiments [Chen & Toledo], often better than drop-tolerance MIC for 2D problems, but not for 3D
24. Domain decomposition (introduction)

        [ B    0    E ]
  A  =  [ 0    C    F ]
        [ E^T  F^T  G ]

- Partition the problem (e.g. the mesh) into subdomains
- Use solvers for the subdomains, B^{-1} and C^{-1}, to precondition an iterative solver on the interface
- Interface system is the Schur complement: S = G − E^T B^{-1} E − F^T C^{-1} F (sketch below)
- Parallelizes naturally by subdomains
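A MATLAB sketch of forming and using the Schur complement for the block system above; b1, b2, bG are the right-hand-side blocks for the two subdomains and the interface (names assumed), and the explicit solves are for illustration only:

    S = G - E' * (B \ E) - F' * (C \ F);             % Schur complement
    uI = S \ (bG - E' * (B \ b1) - F' * (C \ b2));   % interface unknowns
    u1 = B \ (b1 - E * uI);                          % back-substitute into
    u2 = C \ (b2 - F * uI);                          % each subdomain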
25. (Matlab demo)
- grid and matrix structure for overlapping 2-way
partition of eppstein
26. Multigrid (introduction)
- For a PDE on a fine mesh, precondition using a solution on a coarser mesh
- Use the idea recursively on a hierarchy of meshes
- Solves the model problem (Poisson's equation) in linear time!
- Often useful when a hierarchy of meshes can be built
- Hard to parallelize coarse meshes well
- This is just the intuition; there is lots of theory and technology (a two-grid sketch follows below)
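A minimal two-grid V-cycle sketch for the 1D Poisson model problem, assuming the number of fine grid points n is odd so the coarse grid nests; function and variable names are illustrative:

    function x = twogrid(A, b, x)
      D = diag(diag(A));  w = 2/3;                    % weighted Jacobi smoother
      for s = 1:3, x = x + w * (D \ (b - A*x)); end   % pre-smooth
      P = interp1d(size(A, 1));                       % linear interpolation
      Ac = P' * A * P;                                % Galerkin coarse operator
      x = x + P * (Ac \ (P' * (b - A*x)));            % coarse-grid correction
      for s = 1:3, x = x + w * (D \ (b - A*x)); end   % post-smooth
    end

    function P = interp1d(n)                          % prolongation, n odd
      nc = (n - 1) / 2;
      P = sparse(n, nc);
      for j = 1:nc
        i = 2*j;                                      % fine index of coarse pt j
        P(i-1, j) = 0.5;  P(i, j) = 1;  P(i+1, j) = 0.5;
      end
    end

Real multigrid replaces the direct coarse solve by a recursive call down the mesh hierarchy.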
27. Other Krylov subspace methods
- Nonsymmetric linear systems:
  - GMRES: for i = 1, 2, 3, . . . find x_i ∈ K_i(A, b) such that r_i = (A x_i − b) ⊥ K_i(A, b)
    But, no short recurrence ⇒ save old vectors ⇒ lots more space
  - BiCGStab, QMR, etc.: two spaces K_i(A, b) and K_i(A^T, b) with mutually orthogonal bases
    Short recurrences ⇒ O(n) space, but less robust
  - Convergence and preconditioning more delicate than CG
  - Active area of current research (usage sketch below)
- Eigenvalues: Lanczos (symmetric), Arnoldi (nonsymmetric)
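In MATLAB the nonsymmetric solvers are built in; a usage sketch with zero-fill ILU as the preconditioner:

    [L, U] = ilu(A);                          % zero-fill ILU factors
    x1 = gmres(A, b, 20, 1e-8, 100, L, U);    % restarted GMRES(20)
    x2 = bicgstab(A, b, 1e-8, 500, L, U);     % short-recurrence alternative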
28. The Landscape of Sparse Ax = b Solvers
29. Complexity of direct methods
Time and space to solve any problem on any well-shaped finite element mesh:

                    2D            3D
    Space (fill)    O(n log n)    O(n^{4/3})
    Time (flops)    O(n^{3/2})    O(n^2)
30. Complexity of linear solvers
Time to solve the model problem (Poisson's equation) on a regular mesh:

                            2D            3D
    Sparse Cholesky         O(n^{1.5})    O(n^2)
    CG, exact arithmetic    O(n^2)        O(n^2)
    CG, no precond          O(n^{1.5})    O(n^{1.33})
    CG, modified IC         O(n^{1.25})   O(n^{1.17})
    CG, support trees       O(n^{1.20})   O(n^{1.75})
    Multigrid               O(n)          O(n)