Title: Column Cholesky Factorization: A = R^T R
1. Column Cholesky Factorization: A = R^T R

   for j = 1 : n
       for k = 1 : j-1
           % cmod(j,k)
           for i = j : n
               A(i,j) = A(i,j) - A(i,k)*A(j,k)
           end
       end
       % cdiv(j)
       A(j,j) = sqrt(A(j,j))
       for i = j+1 : n
           A(i,j) = A(i,j) / A(j,j)
       end
   end

   - Column j of A becomes column j of R^T
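The loop above translates directly into runnable MATLAB. A minimal sketch (the function name colchol is ours, not a library routine), with the inner i-loops written as vectorized column operations and the result checked against the built-in chol:

   function R = colchol(A)
   % Column Cholesky: sweep columns left to right, overwriting the
   % lower triangle of A with the columns of R^T.
   n = size(A,1);
   for j = 1:n
       for k = 1:j-1
           A(j:n,j) = A(j:n,j) - A(j:n,k)*A(j,k);   % cmod(j,k)
       end
       A(j,j) = sqrt(A(j,j));                       % cdiv(j)
       A(j+1:n,j) = A(j+1:n,j) / A(j,j);
   end
   R = tril(A)';    % column j of A holds column j of R^T
   end

   % Example: A = gallery('minij', 6);
   %          norm(colchol(A) - chol(A))   % ~1e-16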
2. Data structures

   - Full:
       - 2-dimensional array of real or complex numbers
       - (nrows * ncols) memory
   - Sparse:
       - compressed column storage
       - about (1.5*nzs + .5*ncols) memory
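A quick way to see the storage difference in MATLAB is shown below. (The 1.5/0.5 constants above assume 4-byte integer indices against 8-byte reals; 64-bit MATLAB uses 8-byte indices, so its constants are somewhat larger.)

   n = 1000;
   F = full(speye(n));   % dense: n^2 doubles
   S = speye(n);         % sparse: values + row indices + column pointers
   whos F S              % compare the Bytes column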
3. Sparse Column Cholesky Factorization: A = R^T R

   for j = 1 : n
       for each k < j with A(j,k) nonzero
           % sparse cmod(j,k)
           A(j:n, j) = A(j:n, j) - A(j:n, k) * A(j, k)
       end
       % sparse cdiv(j)
       A(j,j) = sqrt(A(j,j))
       A(j+1:n, j) = A(j+1:n, j) / A(j,j)
   end

   - Column j of A becomes column j of R^T
4. Graphs and Sparse Matrices: Cholesky factorization

   - Fill: new nonzeros in factor
   - Symmetric Gaussian elimination:

       for j = 1 to n
           add edges between j's higher-numbered neighbors

   - [Figure: the graph G(A) and its filled graph G+(A), which is chordal]
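The edge-adding rule above can be run directly on the adjacency structure. A minimal sketch (symbolicfill is our name, not a library routine); barring exact numerical cancellation, the upper triangle of the filled graph matches the nonzero pattern of R:

   function F = symbolicfill(A)
   % Filled graph of A: for each j, pairwise connect j's
   % higher-numbered neighbors (symbolic Gaussian elimination).
   n = size(A,1);
   F = spones(spones(A) + speye(n));    % adjacency plus diagonal
   for j = 1:n
       nbrs = j + find(F(j+1:n, j));    % higher-numbered neighbors of j
       F(nbrs, nbrs) = 1;               % clique among them = fill edges
   end
   end

   % Example: A = delsq(numgrid('S', 8));   % model Poisson matrix
   %          isequal(spones(triu(symbolicfill(A))), spones(chol(A)))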
5. Sparse Cholesky factorization: A = R^T R

   - Preorder
       - Independent of numerics
   - Symbolic Factorization
       - Elimination tree
       - Nonzero counts
       - Supernodes
       - Nonzero structure of R
   - Numeric Factorization
       - Static data structure
       - Supernodes use BLAS3 to reduce memory traffic
   - Triangular Solves
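In MATLAB the whole pipeline looks roughly like this (a hedged sketch; chol runs the symbolic and numeric phases internally):

   A = delsq(numgrid('S', 40));   % model problem
   b = rand(size(A,1), 1);
   p = symamd(A);                 % preorder (fill-reducing)
   R = chol(A(p,p));              % symbolic + numeric factorization
   y = R' \ b(p);                 % forward triangular solve
   x(p,1) = R \ y;                % back solve, then undo the permutation
   norm(A*x - b)                  % residual should be near roundoff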
6. Incomplete Cholesky factorization (IC, ILU)

   - Compute factors of A by Gaussian elimination, but ignore fill
   - Preconditioner B = R^T R ≈ A, not formed explicitly
   - Compute B^-1 z by triangular solves (in time nnz(A))
   - Total storage is O(nnz(A)), static data structure
   - Either symmetric (IC) or nonsymmetric (ILU)
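A minimal sketch using MATLAB's ichol (the successor to the older cholinc) as an IC(0) preconditioner for CG:

   A = delsq(numgrid('S', 60));
   b = ones(size(A,1), 1);
   L = ichol(A);                  % IC(0): keep only A's nonzero pattern
   [x, flag, relres, iters] = pcg(A, b, 1e-8, 200, L, L');
   iters                          % typically far fewer iterations than
                                  % unpreconditioned pcg(A, b, 1e-8, 200)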
7. Incomplete Cholesky and ILU: Variants

   - Allow one or more levels of fill
       - unpredictable storage requirements
   - Allow fill whose magnitude exceeds a drop tolerance
       - may get better approximate factors than levels of fill
       - unpredictable storage requirements
       - choice of tolerance is ad hoc
   - Partial pivoting (for nonsymmetric A)
   - Modified ILU (MIC): add dropped fill to diagonal of U or R
       - A and R^T R have same row sums
       - good in some PDE contexts
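MATLAB's ilu exposes several of these variants through an options struct. A hedged sketch of the drop-tolerance and modified (MIC-style) variants; the convection term below is added only to make the test matrix nonsymmetric:

   A = delsq(numgrid('S', 40));  n = size(A,1);
   A = A + 0.2*spdiags([ones(n,1) -ones(n,1)], [-1 1], n, n);

   opts.type = 'crout';  opts.droptol = 1e-3;   % threshold-based ILUT
   [L, U] = ilu(A, opts);

   opts.milu = 'row';             % modified ILU: dropped fill is added
   [Lm, Um] = ilu(A, opts);       % back to the diagonal, preserving row sums
   [nnz(A), nnz(L) + nnz(U)]      % storage grows, and unpredictably so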
8. Incomplete Cholesky and ILU: Issues

   - Choice of parameters
       - Good: smooth transition from iterative to direct methods
       - Bad: very ad hoc, problem-dependent
       - Tradeoff: time per iteration (more fill => more time)
         vs. number of iterations (more fill => fewer iters)
   - Effectiveness
       - Condition number usually improves (only) by a constant factor
         (except MIC for some problems from PDEs)
       - Still, often good when tuned for a particular class of problems
   - Parallelism
       - Triangular solves are not very parallel
       - Reordering for parallel triangular solve by graph coloring
9. Matrix permutations: Symmetric direct methods

   - Goal is to reduce fill, making Cholesky factorization cheaper.
   - Minimum degree:
       - Eliminate row/col with fewest nzs, add fill, repeat
       - Theory: can be suboptimal even on 2D model problem
       - Practice: often wins for medium-sized problems
   - Nested dissection:
       - Find a separator, number it last, proceed recursively
       - Theory: approx optimal separators => approx optimal fill and flop count
       - Practice: often wins for very large problems
   - Banded orderings (Reverse Cuthill-McKee, Sloan, ...):
       - Try to keep all nonzeros close to the diagonal
       - Theory, practice: often wins for long, thin problems
   - Best orderings for direct methods are ND/MD hybrids.
10. Fill-reducing permutations in Matlab

   - Nonsymmetric approximate minimum degree:
       - p = colamd(A)
       - column permutation: lu(A(:,p)) often sparser than lu(A)
       - also for QR factorization
   - Symmetric approximate minimum degree:
       - p = symamd(A)
       - symmetric permutation: chol(A(p,p)) often sparser than chol(A)
   - Reverse Cuthill-McKee:
       - p = symrcm(A)
       - A(p,p) often has smaller bandwidth than A
       - similar to Sparspak RCM
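A quick experiment on the model problem makes the fill differences concrete:

   A = delsq(numgrid('S', 50));   % 2D model Poisson problem
   p = symamd(A);  r = symrcm(A);
   [nnz(chol(A)), nnz(chol(A(p,p))), nnz(chol(A(r,r)))]
   % nonzeros in R under natural, minimum degree, and RCM orderings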
11. Matrix permutations for iterative methods

   - Symmetric matrix permutations don't change the convergence of unpreconditioned CG
   - Symmetric matrix permutations affect the quality of an incomplete factorization: poorly understood, controversial
   - Often banded (RCM) is best for IC(0) / ILU(0)
   - Often min degree and nested dissection are bad for no-fill incomplete factorization, but good if some fill is allowed
   - Some experiments with orderings that use matrix values
       - e.g. "minimum discarded fill" order
       - sometimes effective but expensive to compute
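The ordering effect on IC(0) is easy to observe. A hedged experiment comparing preconditioned CG iteration counts under natural, RCM, and minimum degree orderings:

   A = delsq(numgrid('S', 60));
   b = ones(size(A,1), 1);
   ords = {1:size(A,1), symrcm(A), symamd(A)};
   for k = 1:3
       p = ords{k};  Ap = A(p,p);
       L = ichol(Ap);                          % IC(0), no fill
       [~, ~, ~, it] = pcg(Ap, b(p), 1e-8, 500, L, L');
       disp(it)                                % RCM is usually competitive
   end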
12. Nonsymmetric matrix permutations for iterative methods

   - Dynamic permutation: ILU with row or column pivoting
       - E.g. ILUTP (Saad), Matlab luinc (now ilu; see the sketch below)
       - More robust but more expensive than ILUT
   - Static permutation: try to increase diagonal dominance
       - E.g. bipartite weighted matching (Duff & Koster)
       - Often effective; no theory for quality of preconditioner
       - Field is not well understood, to say the least
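In current MATLAB, luinc has been superseded by ilu, whose 'ilutp' type implements threshold ILU with the dynamic row pivoting described above. A minimal sketch:

   A = delsq(numgrid('S', 40));  n = size(A,1);
   A = A + 0.2*spdiags([ones(n,1) -ones(n,1)], [-1 1], n, n);  % nonsymmetric
   opts.type = 'ilutp';  opts.droptol = 1e-3;
   opts.thresh = 1;              % pivot threshold: 1 = always pick the max
   [L, U, P] = ilu(A, opts);     % P is the row permutation chosen dynamically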
13. Row permutation for heavy diagonal [Duff, Koster]
   [Figure: weighted bipartite graph pairing rows 1-5 with columns 1-5 of A; the heavy matching is permuted onto the diagonal]
   - Represent A as a weighted, undirected bipartite graph (one node for each row and one node for each column)
   - Find matching (set of independent edges) with maximum product of weights
   - Permute rows to place matching on diagonal
   - Matching algorithm also gives a row and column scaling to make all diag elts = 1 and all off-diag elts < 1
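MATLAB does not ship Duff & Koster's weighted matching (MC64), but dmperm computes a maximum-cardinality matching, which at least permutes a structurally nonzero (though not necessarily heavy) diagonal into place:

   A = sparse([0 3 0; 5 0 1; 0 2 4]);   % some zero diagonal entries
   p = dmperm(A);                        % matching: row p(j) pairs with column j
   full(A(p,:))                          % permuted matrix has a nonzero diagonal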
14. Sparse approximate inverses

   - Compute B^-1 ≈ A^-1 explicitly
   - Minimize ||A*B^-1 - I||_F (in parallel, by columns)
   - Variants: factored form of B^-1, more fill, ...
   - Good: very parallel, seldom breaks down
   - Bad: effectiveness varies widely
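A minimal sketch of the column-by-column least-squares idea (spai0 is our name; taking the pattern of column j of B^-1 to be the pattern of column j of A is the simplest common choice):

   function M = spai0(A)
   % Sparse approximate inverse: minimize ||A*M - I||_F one column at
   % a time; the columns are independent, hence the parallelism.
   n = size(A,1);
   M = sparse(n,n);
   for j = 1:n
       J = find(A(:,j));                   % allowed nonzeros in M(:,j)
       I = find(any(A(:,J), 2));           % rows active in the reduced problem
       m = full(A(I,J)) \ double(I == j);  % small dense least-squares solve
       M(J,j) = m;
   end
   end

   % Example: A = delsq(numgrid('S', 10));
   %          norm(full(A*spai0(A) - speye(size(A))), 'fro')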
15. Complexity of linear solvers

   Time to solve the model problem (Poisson's equation) on a regular mesh
16. Complexity of direct methods

   Time and space to solve any problem on any well-shaped finite element mesh