Title: Conjugate Gradient
0. History
- Why iterate?
- Direct algorithms require O(n³) work.
- 1950: n = 20
- 1965: n = 200
- 1980: n = 2000
- 1995: n = 20000
- Dimensional increase: 10³; computer hardware improvement: 10⁹
0. History
- If matrix problems could be solved in O(n²) time, matrices could be about 30 times bigger.
- There are direct algorithms that run in about O(n^2.4) time, but their constant factors are too big for practical use.
- For certain matrices, iterative methods have the potential to reduce computation time to O(m²).
1. Introduction
- CG is the most popular method for solving large systems of linear equations Ax = b.
- CG is an iterative method, suited for use with sparse matrices that have certain properties.
- In practice we generally don't encounter huge dense matrices, since huge matrices usually arise from the discretisation of differential or integral equations.
2. Notation
- Matrix A, with components Aᵢⱼ
- Vector: an n × 1 matrix x, with components xᵢ
- Linear equation Ax = b, with components Σⱼ Aᵢⱼ xⱼ = bᵢ
2. Notation
- Transpose of a matrix: (Aᵀ)ᵢⱼ = Aⱼᵢ
- Inner product of two vectors: xᵀy = Σᵢ xᵢyᵢ
- If xᵀy = 0, then x and y are orthogonal.
3. Properties of A
- A has to be an n × n matrix.
- A has to be positive definite: xᵀAx > 0 for every x ≠ 0.
- A has to be symmetric: Aᵀ = A. (A numerical check is sketched below.)
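A minimal sketch of such a check in NumPy; the helper name is_cg_compatible and the example matrix are illustrative assumptions, not part of the slides:

```python
# Sketch: check that a matrix meets the CG requirements
# (square, symmetric, positive definite).
import numpy as np

def is_cg_compatible(A, tol=1e-12):
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False                      # must be n x n
    if not np.allclose(A, A.T, atol=tol):
        return False                      # must be symmetric, A^T = A
    try:
        np.linalg.cholesky(A)             # succeeds iff A is positive definite
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])                # assumed example matrix
print(is_cg_compatible(A))                # True
```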
4. Quadratic Forms
- A quadratic form is a scalar, quadratic function of a vector: f(x) = ½ xᵀAx - bᵀx + c
- Example: for a symmetric positive definite A, the graph of f(x) is a paraboloid with a unique minimum.
4. Quadratic Forms
- The gradient f'(x) points in the direction of greatest increase of f(x). For the quadratic form above, f'(x) = Ax - b, so the minimum of f is the solution of Ax = b (see the sketch below).
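A small sketch of these formulas; the matrix, vector and helper names are illustrative assumptions:

```python
# Sketch: the quadratic form f(x) = 1/2 x^T A x - b^T x + c
# and its gradient f'(x) = A x - b (valid for symmetric A).
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])        # symmetric positive definite (assumed example)
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    return A @ x - b

x_star = np.linalg.solve(A, b)    # minimiser of f, i.e. the solution of Ax = b
print(grad_f(x_star))             # ~ [0, 0]: the gradient vanishes at the minimum
print(f(x_star) <= f(x_star + np.array([0.1, -0.2])))  # True: nearby points lie higher
```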
4. Quadratic Forms
- positive definite: xᵀAx > 0
- negative definite: xᵀAx < 0
- positive indefinite (singular): xᵀAx ≥ 0
- indefinite: xᵀAx takes both positive and negative values
5. Steepest Descent
- Start at an arbitrary point x(0) and slide down to the bottom of the paraboloid.
- Take steps x(1), x(2), ... in the direction of -f'(x(i)).
- Error: e(i) = x(i) - x, where x is the exact solution.
- Residual: r(i) = b - Ax(i)
- r(i) = -Ae(i) and r(i) = -f'(x(i))
5. Steepest Descent
- x(i+1) = x(i) + α r(i), but how big is α?
- α is chosen so that f'(x(i+1)) is orthogonal to r(i), i.e. by an exact line search along the search line x(i) + α r(i). This gives α = rᵀ(i) r(i) / rᵀ(i) A r(i), as used in the sketch below.
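A minimal sketch of the resulting Steepest Descent loop; the function name and example data are assumptions, not from the slides:

```python
# Sketch: Steepest Descent with the exact line search alpha = r^T r / r^T A r.
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = b - A @ x                    # residual r(i) = b - A x(i) = -f'(x(i))
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))  # step length from the orthogonality condition
        x = x + alpha * r                # x(i+1) = x(i) + alpha r(i)
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.zeros(2)))   # ~ [2, -2], the solution of Ax = b
```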
5. Steepest Descent
The algorithm above requires two matrix-vector multiplications per iteration. One of them can be eliminated by premultiplying the update equation by -A and adding b, which gives the recurrence r(i+1) = r(i) - α A r(i); the product A r(i) then needs to be computed only once per iteration.
5. Steepest Descent
This sequence is generated without any feedback from x(i). Therefore, floating-point roundoff errors may accumulate and the sequence could converge to some point near x. This effect can be avoided by periodically recomputing the correct residual from x(i), as in the sketch below.
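A sketch of the optimised loop with this periodic correction; the parameter recompute_every is an assumed knob, not taken from the slides:

```python
# Sketch: Steepest Descent using the residual recurrence
# r(i+1) = r(i) - alpha * A r(i), with periodic recomputation of r from x.
import numpy as np

def steepest_descent_fast(A, b, x0, tol=1e-10, max_iter=10000, recompute_every=50):
    x = x0.astype(float).copy()
    r = b - A @ x
    for i in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r                       # the single matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r
        if (i + 1) % recompute_every == 0:
            r = b - A @ x                # periodic correction using x(i)
        else:
            r = r - alpha * Ar           # cheap residual update
    return x
```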
6. Eigenvectors
- v is an eigenvector of A if there is a scalar λ such that Av = λv; λ is then called an eigenvalue.
- A symmetric n × n matrix always has n independent eigenvectors, which can be chosen mutually orthogonal.
- A positive definite matrix has positive eigenvalues.
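These facts can be observed numerically; a small sketch using NumPy's symmetric eigensolver, with an assumed example matrix:

```python
# Sketch: for a symmetric matrix, numpy.linalg.eigh returns real eigenvalues
# and orthonormal eigenvectors; positive definiteness shows up as all
# eigenvalues being positive.
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
eigenvalues, V = np.linalg.eigh(A)          # columns of V are eigenvectors
print(eigenvalues)                          # all > 0, since A is positive definite
print(np.allclose(V.T @ V, np.eye(2)))      # True: eigenvectors are orthonormal
print(np.allclose(A @ V, V * eigenvalues))  # A v = lambda v for each column
```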
7. Convergence of SD
- Convergence of SD requires the error e(i) to vanish. To measure e(i), we use the A-norm: ‖e‖_A = (eᵀAe)^½.
- Some math now yields an expression for the factor ω by which ‖e(i)‖_A shrinks in each step.
7. Convergence of SD
- Spectral condition number: κ = λmax / λmin, the ratio of the largest to the smallest eigenvalue of A.
- An upper bound for ω is found by taking the worst-case starting error, which gives ω ≤ (κ - 1)/(κ + 1).
- We therefore have instant convergence if all the eigenvalues of A are the same, since then κ = 1 and ω = 0.
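A short sketch computing κ and the resulting bound on ω, with an assumed example matrix:

```python
# Sketch: spectral condition number kappa and the worst-case Steepest Descent
# convergence factor (kappa - 1)/(kappa + 1), measured in the A-norm.
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
eigenvalues = np.linalg.eigvalsh(A)
kappa = eigenvalues.max() / eigenvalues.min()
omega_bound = (kappa - 1.0) / (kappa + 1.0)
print(kappa, omega_bound)   # kappa = 1 would give omega = 0, i.e. instant convergence
```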
7. Convergence of SD
(Figure: convergence of Steepest Descent for the four cases large κ / small µ, large κ / large µ, small κ / large µ, and small κ / small µ.)
8. Conjugate Directions
- Steepest Descent often takes steps in the same direction as earlier steps.
- The solution is to take a set of A-orthogonal search directions d(0), d(1), ..., d(n-1) and take exactly one step of the right length in each direction.
8. Conjugate Directions
- Two vectors d(i) and d(j) are A-orthogonal (conjugate) if dᵀ(i) A d(j) = 0.
(Figure: a pair of A-orthogonal vectors compared with a pair of orthogonal vectors.)
8. Conjugate Directions
- Demanding d(i) to be A-orthogonal to the next error e(i+1), we get α(i) = dᵀ(i) r(i) / dᵀ(i) A d(i).
- The search directions can be generated by Gram-Schmidt conjugation, as sketched below. Problem: this takes O(n³) work.
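A sketch of Gram-Schmidt conjugation; conjugating the coordinate axes is an assumed (though common) choice, and the function name is illustrative:

```python
# Sketch: build A-orthogonal directions d(0), ..., d(n-1) from the coordinate
# axes by Gram-Schmidt conjugation. Keeping every old direction and
# re-orthogonalising against all of them is what makes this O(n^3) in time.
import numpy as np

def gram_schmidt_conjugate(A):
    n = A.shape[0]
    U = np.eye(n)                 # start from the coordinate axes (assumed choice)
    D, AD = [], []                # store each direction d(k) and the product A d(k)
    for i in range(n):
        d = U[:, i].copy()
        for dk, Adk in zip(D, AD):
            beta = -(U[:, i] @ Adk) / (dk @ Adk)
            d = d + beta * dk     # remove the part of u_i not A-orthogonal to d(k)
        D.append(d)
        AD.append(A @ d)
    return np.column_stack(D)

A = np.array([[3.0, 2.0], [2.0, 6.0]])
D = gram_schmidt_conjugate(A)
print(np.round(D.T @ A @ D, 10))  # off-diagonal entries ~ 0: directions are A-orthogonal
```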
8. Conjugate Directions
- CD chooses α(i) so that ‖e(i+1)‖_A is minimised.
- The error term is therefore A-orthogonal to all the old search directions.
9. Conjugate Gradient
- The residual is orthogonal to the previous search directions.
- The search directions are built from the residuals, so the search space is a Krylov subspace: D(i) = span{r(0), Ar(0), A²r(0), ..., A^(i-1) r(0)}.
9. Conjugate Gradient
- r(i+1) is A-orthogonal to D(i).
- Gram-Schmidt conjugation becomes easy, because r(i+1) is already A-orthogonal to all the previous search directions except d(i).
9. Conjugate Gradient
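Putting the previous slides together, the standard CG iteration can be sketched as follows. This listing and the example data are my own illustration, not copied from the slides; it uses the usual formulas α(i) = rᵀ(i)r(i) / dᵀ(i)Ad(i) and β(i+1) = rᵀ(i+1)r(i+1) / rᵀ(i)r(i):

```python
# Sketch: the textbook Conjugate Gradient iteration.
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                       # initial residual
    d = r.copy()                        # first search direction d(0) = r(0)
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)       # step length along d(i)
        x = x + alpha * d
        r = r - alpha * Ad              # residual recurrence
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old          # Gram-Schmidt constant for the new direction
        d = r + beta * d                # next A-orthogonal search direction
        rs_old = rs_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b))         # ~ [2, -2]
```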
11. Preconditioning
- Improving the condition number of the matrix before the calculation, for example by solving M⁻¹Ax = M⁻¹b with a matrix M that approximates A but is easy to invert.
- The attempt is to stretch the quadratic form to make it more spherical.
- Many more sophisticated preconditioners have been developed and are nearly always used.
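As an illustration, a sketch of CG with the simple Jacobi (diagonal) preconditioner M = diag(A); this particular choice and the function name are assumptions on my part, the slides only state that preconditioners are nearly always used:

```python
# Sketch: preconditioned CG with M = diag(A). More sophisticated
# preconditioners plug in the same way through the solve M z = r.
import numpy as np

def preconditioned_cg(A, b, tol=1e-10, max_iter=None):
    n = b.shape[0]
    M_inv = 1.0 / np.diag(A)            # Jacobi preconditioner, applied element-wise
    x = np.zeros(n)
    r = b - A @ x
    z = M_inv * r                       # z = M^{-1} r
    d = z.copy()
    rz_old = r @ z
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rz_old / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = M_inv * r
        rz_new = r @ z
        d = z + (rz_new / rz_old) * d   # new search direction
        rz_old = rz_new
    return x
```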
12. Outlook
- CG can also be used to solve non-linear problems.
- To solve non-linear problems with CG, one has to make changes to the algorithm. There are several possibilities, and the best choice is still under research.
12. Outlook
In non-linear problems there may be several local minima to which CG might converge, and it is also harder to determine the right step size.
12. Outlook
- There are other algorithms in numerical linear algebra closely related to CG.
- They all use Krylov subspaces.