Title: Cache Aware Multigrid on AMR Hierarchies
1Cache Aware Multigrid on AMR Hierarchies
Danny Thorne Computer Science Department Universit
y of Kentucky thorne_at_ccs.uky.edu http//www.ccs.uk
y.edu/thorne
Supported in part by Sandia National Laboratories
and the National Science Foundation.
2Definition of the Problem
Solve elliptic boundary value problems
(BVPs). using adaptive mesh refinement
(AMR) and multigrid (MG) with cache aware (CA)
optimizations.
3Motivation for AMR
- Problems with variations in scale, e.g.
combustion.
CRF Example
4Motivation for Cache Aware
http//www.surriel.com/lectures/hierarchy.gif
5Outline
- Background.
- Tools for AMRMG.
- AMRMG.
- Cache aware smoothers.
- Cache aware AMRMG.
- Numerical Results.
- Future work.
6Discretization of PDEs 101
From PDE to Linear System (in 1D).
7Iterative Methods
For Solving Linear Systems, .
Alternately solve for x1 and x2 as though the
other were correct, and watch the result converge
to u
8Iterative Methods
For Solving Linear Systems, .
For the following special cases,
9Smoothers
- Iterative methods are very slow to converge for
life sized problems. - They damp (smooth) the high frequency components
of the error in the approximation quickly,
though. - Multigrid (next slide) exploits this smoothing
effect to achieve fast convergence. - A smoother is typically a couple (e.g. 2 to 4)
sweeps of an iterative method like Gauss-Seidel.
10Cell Centered Multigrid
Illustration in 1D with four grid levels
Smooth A4u4f4.
Smooth A4u4f4.
Smooth A3u3f3.
Smooth A3u3f3.
Smooth A2u2f2.
Smooth A2u2f2.
Solve A1u1f1 directly.
11Definition of a Grid Hierarchy
12Transfer Operators
13Nesting
We require strict nesting of the grids
This requirement can be written in terms of the
domains as
and
14Internal Patch Boundaries
a.k.a. Coarse/Fine Interfaces
C1 continuity guarantees consistency which is
important because consistency stability
convergence.
Note We use cell-centered grids so that coarse
and fine grids wont share grid points at the
interfaces.
15Stencils for Constant Coefficients
16Stencils for Variable Coefficients
17How to Interpolate Ghost Points
- First, interpolate values at the blue xs using
coarse grid values. - Then interpolate the red points using fine grid
values along with the values at the blue xs. - The red points are the ghost points that will be
used by the discrete operator.
18Flux Matching
1
2
The same approximation for the derivative across
the coarse/fine interface is used for the grid
points on both sides of the interface, so C1
continuity is preserved across the interface.
Notice that these fluxes match the fluxes used in
the stencils for the boundary points on the fine
grid side. Hence, flux matching.
19Flux Matching versus Simple Taylor Series Flux
Approximation
20Flux Matching Formula by Taylors Series
213D Ghost Point Interpolation
Hard to illustrate in 3D
Interpolate the green points using coarse grid
values and then interpolate the blue points using
the green points.
Interpolate the red points using fine grid values
and the blue points. The red points are the
ghost points.
A piece of a coarse grid thats adjacent to a
fine grid
22Flux Matching in 3D
Hard to illustrate in 3D
At the interface, average the fluxes across the
four fine grid cell walls using the interpolated
values denoted by the red points.
233D GP Interp, Step 1
243D GP Interp, Step 2
253D Flux Matching
26(No Transcript)
27(No Transcript)
28(No Transcript)
29(No Transcript)
30(No Transcript)
31(No Transcript)
32(No Transcript)
33(No Transcript)
34(No Transcript)
35(No Transcript)
36Flux Matching in 3D
Image created with Povray (http//www.povray.org)
37AMRMG
An Illustration in 1D
38Preparation For Illustration
- Assume the hierarchy is already constructed by a
calling AMR code. - The following sequence of slides illustrates our
AMR MG algorithm on the smallest possible example
(in 1D) that captures all the aspects of the
algorithm - Three levels,
- One refinement patch per level,
- Four grid points per level.
- Problem
- Residual
- Correction
Fasten your seatbelts, grasp the safety bar,
place head firmly against head rest
39Illustration in 1D
The dashed lines and dots represent the
completion of the composite grid on each level.
The algorithm is patch-based. The solid lines
represent the domains of the patches, and the
dark dots represent the grid points in the
patches.
40Illustration in 1D
Start with an initial guess for the solution on
the composite grid.
Represent the grid points with a 2-by-2 array of
spaces to keep track of the status of the
different variables during the algorithm. See
the key to the left.
Solution
Correction
Residual
Temporary
41Illustration in 1D
Interpolate ghost point(s) so that regular
stencils can be applied at patch boundary points.
Checkerboard pattern denotes values that were
updated in previous steps.
Solution
Correction
Residual
Temp
Temporary
42Illustration in 1D
Compute the residual on the finest grid level
(using the ghost point(s) to complete the
stencil(s) on the patch boundary points).
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
43Illustration in 1D
Smooth to get a correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
44Illustration in 1D
Store temporarily corrected solution, needed for
acquisition of ghost points for the residual
computation on the next level. (The permanently
corrected solution for this level of this V cycle
will be computed on the other side of the V
cycle.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
45Illustration in 1D
Project the residual onto the part of the coarser
grid that is covered by the finer grid.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
46Illustration in 1D
Interpolate ghost point(s) for the temporarily
corrected solution. This is needed for residual
computation on the uncovered part of the next
level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
47Illustration in 1D
Interpolate ghost point(s) so that regular
stencils can be applied at patch boundary points.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
48Illustration in 1D
Compute the residual on the uncovered nodes. Use
fine grid data (including ghost points) to do
flux matching at the interface nodes. Use the
newly interpolated ghost points to apply a
regular stencil at the boundary points.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
49Illustration in 1D
Smooth to get a correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
50Illustration in 1D
Store temporarily corrected solution, needed for
acquisition of ghost points for the residual
computation on the next level. (The permanently
corrected solution for this level of this V cycle
will be computed on the other side of the V
cycle.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
51Illustration in 1D
Project the residual onto the part of the coarser
grid that is covered by the finer grid.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
52Illustration in 1D
Compute ghost point(s) for the temporarily
corrected solution.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
53Illustration in 1D
Compute the residual on the uncovered nodes. Use
fine grid data (including ghost points) to do
flux matching at the interface nodes.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
54Illustration in 1D
Solve for the correction on the coarsest level.
(We use geometric multigrid here.)
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
55Illustration in 1D
Correct the solution on the uncovered part of the
coarsest level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
56Illustration in 1D
Update the correction on the next finer level
with interpolated values from the coarse level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
57Illustration in 1D
Update the residual.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
58Illustration in 1D
Smooth to get a correction for the correction.
We avoid smoothing the current correction because
that would require interpolating ghost points at
the patch boundaries.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
59Illustration in 1D
Update the original correction with the newly
computed correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
60Illustration in 1D
Now update the solution on the uncovered nodes
with the latest correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
61Illustration in 1D
Update the correction on the next finer level
with interpolated values from the coarser level.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
62Illustration in 1D
Update the residual.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
63Illustration in 1D
Smooth to get a correction for the correction.
We avoid smoothing the current correction because
that would require interpolating ghost points at
the patch boundaries.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
64Illustration in 1D
Update the original correction with the newly
computed correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
65Illustration in 1D
Now update the solution on this level with the
latest correction.
Checkerboard denotes values that were updated in
previous steps.
Solution
Correction
Residual
Temp
Temporary
66Illustration in 1D
Final solution on the composite grid after one V
cycle.
Can do more V cycles by starting the procedure
again from the beginning with the new solution as
the initial guess.
67AMR MG Algorithm
Solve on the coarsest grid.
Compute residual on finest grid.
Smooth a correction.
Store temporarily corrected solution.
Project part of the residual.
Compute the other part of the residual.
Recursively apply to coarser levels.
Interpolate and correct.
Update the residual.
Smooth an intermediate correction.
Update the final correction.
Apply the final correction to the solution on
this level.
68Cache Aware Smoothers
Preparation for Illustration
- Want data to pass through cache only once per
application of the smoother. - Application of the smoother means multiple
(e.g. two to four) Gauss-Seidel updates per
point. - Whereas normally one sweeps the entire grid over
and over to apply successive updates, we want to
stagger the updates in such a way that we need to
sweep the grid only once. - Want the answer to be exactly the same as the
normal application of the smoother. (Bit-wise
equivalence.) - The following illustration is of a 2D grid with
12-by-4 grid points and for three updates per
point. Updates occur from left to right and
bottom to top.
69Cache Aware Smoothers
Note that updating the third row again would
require updated data in the fourth row, so we
have to wait since the fourth row hasnt been
updated yet. Similarly for the second row which
depends on a second update of row three.
Before updating new points, apply more updates to
the current points in cache where possible.
Update points in the first partition.
70Cache Aware Smoothers
71Cache Aware Smoothers
72Cache Aware Smoothers
73Cache Aware Smoothers
742D Blocking
75Cache Seams
76Tall Cache Block
To minimize the amount of seam points.
77Seams with Tall Cache Block
783D Cache Block
79Only Post-smoothing
- Apparently, the cache aware smoothing technique
will give better performance with greater numbers
of updates (within a limit imposed by the cache
size, anyway). - When doing both pre- and post- smoothing, it is
common to do two pre-smoothing updates and two
post-smoothing updates. - Q What if we instead do four post-smoothing
updates and no pre-smoothing? - A Better cache effects.
- In addition, doing only post-smoothing
simplifies our AMR MG algorithm, as illustrated
on the next slide, and is more conducive to
making multigrid cache aware.
80Only Post-Smoothing
- No pre-smoothing.
- No need for a temporarily corrected solution.
- Just project/compute the residual.
- Only need to compute ghost points once.
- No residual update.
- Can avoid intermediate correction by
interpolating ghost points (good for
implementation of cache aware multigrid). - Number of smoothing updates here should equal
the number of pre-smoothing updates plus the
number of post-smoothing updates used in the
original implementation.
81Cache Aware Multigrid
82Numerical Results
- The following slides contain graphs of numerical
results. - The results are for the 2D case with a variety
of refinement patterns
- Laplaces equation with solution function
- Variable function
- Dirichlet boundary conditions.
83Numerical Results
For V(0,4)
gt50, gt100
84Numerical Results
(Average Speedup per Vcycle)
gt50, gt100
85Numerical Results
(Speedups for Run Until Convergence)
gt400, gt500
86Future Work
- 3D // Have the 3D code and am working on cache
aware algorithms for it. Much harder to get good
speedups in the 3D AMR environment than in the 2D
AMR environment - Parallel // Rudimentary parallel mechanisms in
the code are waiting to be extended and
generalized. Have incorporated Zoltan for load
balancing. - Clustering Algorithms // To make the code a
standalone app, need to incorporate clustering
algorithms for generating the AMR hierarchies
from scratch instead of relying on hierarchies
provided by calling routines.
87Papers
- Papers
- C. C. Douglas, J. Hu, J. Ray, D. T. Thorne, and
R. S. Tuminaro, Fast, Adaptively Refined
Computational Elements in 3D, in Proceedings of
the International Conference on Computational
Science, Amsterdam, 2002, Springer-Verlag,
Berlin, 2002, 10 pages. (Refereed) - C. C. Douglas and D. T. Thorne, A Note on Cache
Memory Methods for Multigrid in Three Dimensions,
11 pages, to appear, American Mathematics
Society, 2002-2003. - Cache Aware Multigrid for Variable Coefficient
Elliptic Problems on Adaptive Mesh Refinement
Hierarchies, C. C. Douglas, J. Hu, J. Ray, D. T.
Thorne, and R. Tuminaro, to appear in Journal
Numerial Linear Algrebra and Applications, 2004.
88Thank You!