Title: Column Cholesky Factorization: A = R^T R
1. Column Cholesky Factorization: A = R^T R

   for j = 1 : n
       for k = 1 : j-1
           % cmod(j,k)
           for i = j : n
               A(i,j) = A(i,j) - A(i,k)*A(j,k)
           end
       end
       % cdiv(j)
       A(j,j) = sqrt(A(j,j))
       for i = j+1 : n
           A(i,j) = A(i,j) / A(j,j)
       end
   end

   - Column j of A becomes column j of R^T
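The loop above translates directly into runnable MATLAB. A minimal sketch (the function name colchol is ours, not a library routine), with the inner i-loops written as vectorized column operations and the result checked against the built-in chol:

   function R = colchol(A)
   % Column Cholesky: sweep columns left to right, overwriting the
   % lower triangle of A with the columns of R^T.
   n = size(A,1);
   for j = 1:n
       for k = 1:j-1
           A(j:n,j) = A(j:n,j) - A(j:n,k)*A(j,k);   % cmod(j,k)
       end
       A(j,j) = sqrt(A(j,j));                       % cdiv(j)
       A(j+1:n,j) = A(j+1:n,j) / A(j,j);
   end
   R = tril(A)';    % column j of A holds column j of R^T
   end

   % Example: A = gallery('minij', 6);
   %          norm(colchol(A) - chol(A))   % ~1e-16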
2. Data structures

   - Full:
       - 2-dimensional array of real or complex numbers
       - (nrows * ncols) memory
   - Sparse:
       - compressed column storage
       - about (1.5*nzs + .5*ncols) memory
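A quick way to see the storage difference in MATLAB is shown below. (The 1.5/0.5 constants above assume 4-byte integer indices against 8-byte reals; 64-bit MATLAB uses 8-byte indices, so its constants are somewhat larger.)

   n = 1000;
   F = full(speye(n));   % dense: n^2 doubles
   S = speye(n);         % sparse: values + row indices + column pointers
   whos F S              % compare the Bytes column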
3. Sparse Column Cholesky Factorization: A = R^T R

   for j = 1 : n
       for each k < j with A(j,k) nonzero
           % sparse cmod(j,k)
           A(j:n, j) = A(j:n, j) - A(j:n, k) * A(j, k)
       end
       % sparse cdiv(j)
       A(j,j) = sqrt(A(j,j))
       A(j+1:n, j) = A(j+1:n, j) / A(j,j)
   end

   - Column j of A becomes column j of R^T
4. Graphs and Sparse Matrices: Cholesky factorization

   - Fill: new nonzeros in factor
   - Symmetric Gaussian elimination:

       for j = 1 to n
           add edges between j's higher-numbered neighbors

   - [Figure: the graph G(A) and its filled graph G+(A), which is chordal]
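The edge-adding rule above can be run directly on the adjacency structure. A minimal sketch (symbolicfill is our name, not a library routine); barring exact numerical cancellation, the upper triangle of the filled graph matches the nonzero pattern of R:

   function F = symbolicfill(A)
   % Filled graph of A: for each j, pairwise connect j's
   % higher-numbered neighbors (symbolic Gaussian elimination).
   n = size(A,1);
   F = spones(spones(A) + speye(n));    % adjacency plus diagonal
   for j = 1:n
       nbrs = j + find(F(j+1:n, j));    % higher-numbered neighbors of j
       F(nbrs, nbrs) = 1;               % clique among them = fill edges
   end
   end

   % Example: A = delsq(numgrid('S', 8));   % model Poisson matrix
   %          isequal(spones(triu(symbolicfill(A))), spones(chol(A)))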
5. Sparse Cholesky factorization: A = R^T R

   - Preorder
       - Independent of numerics
   - Symbolic Factorization
       - Elimination tree
       - Nonzero counts
       - Supernodes
       - Nonzero structure of R
   - Numeric Factorization
       - Static data structure
       - Supernodes use BLAS3 to reduce memory traffic
   - Triangular Solves
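In MATLAB the whole pipeline looks roughly like this (a hedged sketch; chol runs the symbolic and numeric phases internally):

   A = delsq(numgrid('S', 40));   % model problem
   b = rand(size(A,1), 1);
   p = symamd(A);                 % preorder (fill-reducing)
   R = chol(A(p,p));              % symbolic + numeric factorization
   y = R' \ b(p);                 % forward triangular solve
   x(p,1) = R \ y;                % back solve, then undo the permutation
   norm(A*x - b)                  % residual should be near roundoff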
6. Incomplete Cholesky factorization (IC, ILU)

   - Compute factors of A by Gaussian elimination, but ignore fill
   - Preconditioner B = R^T R ≈ A, not formed explicitly
   - Compute B^-1 z by triangular solves (in time nnz(A))
   - Total storage is O(nnz(A)), static data structure
   - Either symmetric (IC) or nonsymmetric (ILU)
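A minimal sketch using MATLAB's ichol (the successor to the older cholinc) as an IC(0) preconditioner for CG:

   A = delsq(numgrid('S', 60));
   b = ones(size(A,1), 1);
   L = ichol(A);                  % IC(0): keep only A's nonzero pattern
   [x, flag, relres, iters] = pcg(A, b, 1e-8, 200, L, L');
   iters                          % typically far fewer iterations than
                                  % unpreconditioned pcg(A, b, 1e-8, 200)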
7. Incomplete Cholesky and ILU: Variants

   - Allow one or more levels of fill
       - unpredictable storage requirements
   - Allow fill whose magnitude exceeds a drop tolerance
       - may get better approximate factors than levels of fill
       - unpredictable storage requirements
       - choice of tolerance is ad hoc
   - Partial pivoting (for nonsymmetric A)
   - Modified ILU (MIC): add dropped fill to diagonal of U or R
       - A and R^T R have same row sums
       - good in some PDE contexts
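MATLAB's ilu exposes several of these variants through an options struct. A hedged sketch of the drop-tolerance and modified (MIC-style) variants; the convection term below is added only to make the test matrix nonsymmetric:

   A = delsq(numgrid('S', 40));  n = size(A,1);
   A = A + 0.2*spdiags([ones(n,1) -ones(n,1)], [-1 1], n, n);

   opts.type = 'crout';  opts.droptol = 1e-3;   % threshold-based ILUT
   [L, U] = ilu(A, opts);

   opts.milu = 'row';             % modified ILU: dropped fill is added
   [Lm, Um] = ilu(A, opts);       % back to the diagonal, preserving row sums
   [nnz(A), nnz(L) + nnz(U)]      % storage grows, and unpredictably so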
8. Incomplete Cholesky and ILU: Issues

   - Choice of parameters
       - Good: smooth transition from iterative to direct methods
       - Bad: very ad hoc, problem-dependent
       - Tradeoff: time per iteration (more fill => more time)
         vs. number of iterations (more fill => fewer iters)
   - Effectiveness
       - Condition number usually improves (only) by a constant factor
         (except MIC for some problems from PDEs)
       - Still, often good when tuned for a particular class of problems
   - Parallelism
       - Triangular solves are not very parallel
       - Reordering for parallel triangular solve by graph coloring
9. Matrix permutations: Symmetric direct methods

   - Goal is to reduce fill, making Cholesky factorization cheaper.
   - Minimum degree:
       - Eliminate row/col with fewest nzs, add fill, repeat
       - Theory: can be suboptimal even on 2D model problem
       - Practice: often wins for medium-sized problems
   - Nested dissection:
       - Find a separator, number it last, proceed recursively
       - Theory: approx optimal separators => approx optimal fill and flop count
       - Practice: often wins for very large problems
   - Banded orderings (Reverse Cuthill-McKee, Sloan, ...):
       - Try to keep all nonzeros close to the diagonal
       - Theory, practice: often wins for long, thin problems
   - Best orderings for direct methods are ND/MD hybrids.
10. Fill-reducing permutations in Matlab

   - Nonsymmetric approximate minimum degree:
       - p = colamd(A)
       - column permutation: lu(A(:,p)) often sparser than lu(A)
       - also for QR factorization
   - Symmetric approximate minimum degree:
       - p = symamd(A)
       - symmetric permutation: chol(A(p,p)) often sparser than chol(A)
   - Reverse Cuthill-McKee:
       - p = symrcm(A)
       - A(p,p) often has smaller bandwidth than A
       - similar to Sparspak RCM
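A quick experiment on the model problem makes the fill differences concrete:

   A = delsq(numgrid('S', 50));   % 2D model Poisson problem
   p = symamd(A);  r = symrcm(A);
   [nnz(chol(A)), nnz(chol(A(p,p))), nnz(chol(A(r,r)))]
   % nonzeros in R under natural, minimum degree, and RCM orderings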
11. Matrix permutations for iterative methods

   - Symmetric matrix permutations don't change the convergence of unpreconditioned CG
   - Symmetric matrix permutations affect the quality of an incomplete factorization: poorly understood, controversial
   - Often banded (RCM) is best for IC(0) / ILU(0)
   - Often min degree and nested dissection are bad for no-fill incomplete factorization, but good if some fill is allowed
   - Some experiments with orderings that use matrix values
       - e.g. "minimum discarded fill" order
       - sometimes effective but expensive to compute
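The ordering effect on IC(0) is easy to observe. A hedged experiment comparing preconditioned CG iteration counts under natural, RCM, and minimum degree orderings:

   A = delsq(numgrid('S', 60));
   b = ones(size(A,1), 1);
   ords = {1:size(A,1), symrcm(A), symamd(A)};
   for k = 1:3
       p = ords{k};  Ap = A(p,p);
       L = ichol(Ap);                          % IC(0), no fill
       [~, ~, ~, it] = pcg(Ap, b(p), 1e-8, 500, L, L');
       disp(it)                                % RCM is usually competitive
   end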
12. Nonsymmetric matrix permutations for iterative methods

   - Dynamic permutation: ILU with row or column pivoting
       - E.g. ILUTP (Saad), Matlab luinc (now ilu; see the sketch below)
       - More robust but more expensive than ILUT
   - Static permutation: try to increase diagonal dominance
       - E.g. bipartite weighted matching (Duff & Koster)
       - Often effective; no theory for quality of preconditioner
       - Field is not well understood, to say the least
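In current MATLAB, luinc has been superseded by ilu, whose 'ilutp' type implements threshold ILU with the dynamic row pivoting described above. A minimal sketch:

   A = delsq(numgrid('S', 40));  n = size(A,1);
   A = A + 0.2*spdiags([ones(n,1) -ones(n,1)], [-1 1], n, n);  % nonsymmetric
   opts.type = 'ilutp';  opts.droptol = 1e-3;
   opts.thresh = 1;              % pivot threshold: 1 = always pick the max
   [L, U, P] = ilu(A, opts);     % P is the row permutation chosen dynamically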
13. Row permutation for heavy diagonal [Duff, Koster]
   [Figure: weighted bipartite graph pairing rows 1-5 with columns 1-5 of A; the heavy matching is permuted onto the diagonal]
   - Represent A as a weighted, undirected bipartite graph (one node for each row and one node for each column)
   - Find matching (set of independent edges) with maximum product of weights
   - Permute rows to place matching on diagonal
   - Matching algorithm also gives a row and column scaling to make all diag elts = 1 and all off-diag elts < 1
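MATLAB does not ship Duff & Koster's weighted matching (MC64), but dmperm computes a maximum-cardinality matching, which at least permutes a structurally nonzero (though not necessarily heavy) diagonal into place:

   A = sparse([0 3 0; 5 0 1; 0 2 4]);   % some zero diagonal entries
   p = dmperm(A);                        % matching: row p(j) pairs with column j
   full(A(p,:))                          % permuted matrix has a nonzero diagonal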
14. Sparse approximate inverses

   - Compute B^-1 ≈ A^-1 explicitly
   - Minimize ||A*B^-1 - I||_F (in parallel, by columns)
   - Variants: factored form of B^-1, more fill, ...
   - Good: very parallel, seldom breaks down
   - Bad: effectiveness varies widely
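A minimal sketch of the column-by-column least-squares idea (spai0 is our name; taking the pattern of column j of B^-1 to be the pattern of column j of A is the simplest common choice):

   function M = spai0(A)
   % Sparse approximate inverse: minimize ||A*M - I||_F one column at
   % a time; the columns are independent, hence the parallelism.
   n = size(A,1);
   M = sparse(n,n);
   for j = 1:n
       J = find(A(:,j));                   % allowed nonzeros in M(:,j)
       I = find(any(A(:,J), 2));           % rows active in the reduced problem
       m = full(A(I,J)) \ double(I == j);  % small dense least-squares solve
       M(J,j) = m;
   end
   end

   % Example: A = delsq(numgrid('S', 10));
   %          norm(full(A*spai0(A) - speye(size(A))), 'fro')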
15. Complexity of linear solvers

   Time to solve the model problem (Poisson's equation) on a regular mesh
16. Complexity of direct methods

   Time and space to solve any problem on any well-shaped finite element mesh