Title: Presentation for Introduction of FISC
1. Description of ScaleME
W.C. Chew, L. Hastriter, and S. Velamparambil
Center for Computational Electromagnetics
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
2. ScaleME
- ScaleME is a parallel implementation of the Multi-Level Fast Multipole Algorithm (MLFMA).
- It has been tested up to 10 million unknowns on the SGI Origin 2000 with 128 processors at NCSA.
- It has been demonstrated to have better parallel efficiency than FISC.
- It is about 7 times faster than FISC at performing a matrix-vector product.
- It is supposed to be portable.
3. Essential Ideas
- A simple way to parallelize MLFMA, which is a tree code, is to split the tree according to the workload at each node.
- However, this gives rise to exorbitant communication cost.
- Hence, a two-pronged approach is used: the bottom part of the tree is split according to the workload at each node, but the top is split according to the length of the messages passed between nodes (see the sketch after this list).
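As a rough illustration of the workload-based split used on the lower levels, here is a minimal C++ sketch. The `Box` structure, its `work` estimate, and the greedy assignment below are assumptions for illustration, not ScaleME's actual data structures or partitioner.

```cpp
// Sketch: workload-balanced split of the lower (distributed) tree levels.
// "Box" and its "work" estimate are illustrative stand-ins, not ScaleME's
// actual data structures.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Box {
    int id;
    double work;  // estimated cost, e.g. proportional to unknowns in the box
};

// Greedy bin-packing: assign each box to the currently least-loaded processor.
std::vector<int> partitionByWorkload(std::vector<Box> boxes, int nprocs) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.work > b.work; });
    std::vector<double> load(nprocs, 0.0);
    std::vector<int> owner(boxes.size());
    for (const Box& b : boxes) {
        int p = std::min_element(load.begin(), load.end()) - load.begin();
        owner[b.id] = p;
        load[p] += b.work;
    }
    return owner;
}

int main() {
    std::vector<Box> leaves = {{0, 4.0}, {1, 1.5}, {2, 3.0}, {3, 2.5}, {4, 1.0}};
    std::vector<int> owner = partitionByWorkload(leaves, 2);
    for (std::size_t i = 0; i < owner.size(); ++i)
        std::printf("box %zu -> proc %d\n", i, owner[i]);
}
```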
4. Essential Ideas - Illustrated
- We call the top levels of the tree the shared levels.
- At the shared levels, the same tree is replicated on every processor.
- Each processor gets half the radiation/receiving patterns of the boxes numbered 1, 2, and 3 (a sketch of this splitting follows the list).
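A minimal sketch of how each processor might own a slice of a shared box's pattern samples. The contiguous block split and the names below are assumptions for illustration only.

```cpp
// Sketch: at a shared level every processor holds the same boxes, but only
// a slice of each box's radiation/receiving pattern samples. The contiguous
// slicing below is an assumption for illustration.
#include <cstdio>

// Half-open sample range [begin, end) owned by processor `rank` out of
// `nprocs`, for a box whose pattern is discretized into `nsamples` directions.
struct Range { int begin, end; };

Range patternSlice(int nsamples, int rank, int nprocs) {
    int base = nsamples / nprocs, rem = nsamples % nprocs;
    int begin = rank * base + (rank < rem ? rank : rem);
    int end = begin + base + (rank < rem ? 1 : 0);
    return {begin, end};
}

int main() {
    // With 2 processors, each gets half the samples, as on the slide.
    for (int rank = 0; rank < 2; ++rank) {
        Range r = patternSlice(1024, rank, 2);
        std::printf("proc %d owns samples [%d, %d)\n", rank, r.begin, r.end);
    }
}
```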
5. Examples
- We will show the scaling properties of ScaleME for an increasing number of processors.
6. Matrix-Vector Products - Sphere, 6λ
- Total number of levels: 5.
- There is an initial improvement in parallel efficiency as the first shared levels are introduced.
- For a small problem size, the use of more shared levels makes the computation less efficient.
7. Matrix-Vector Products - Sphere, 12λ
- Total number of levels: 6.
- For a larger problem, the use of more shared levels enhances parallel efficiency, but eventually parallel efficiency is lost with too many shared levels (a toy model of this trade-off is sketched below).
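The following toy cost model (entirely assumed, not taken from the paper or the code) reproduces this qualitative behavior: sharing a top level removes its inter-processor messages but replicates its work on every processor, so per-iteration time is minimized at an intermediate number of shared levels.

```cpp
// Toy cost model (illustrative assumptions only): sharing a top level removes
// its inter-processor messages but replicates its work on every processor,
// so the per-iteration time is minimized at an intermediate choice.
#include <cstdio>

double iterationTime(int sharedLevels) {
    const int topLevels = 4;    // levels that are candidates for sharing (assumed)
    const double comm0 = 20.0;  // message cost of the topmost level if unshared (assumed)
    const double repl = 4.0;    // replicated work added per shared level (assumed)
    // Message cost is taken to halve each level down from the top (bigger
    // boxes exchange longer patterns), so the topmost levels are shared first.
    double comm = 0.0;
    for (int l = sharedLevels; l < topLevels; ++l)
        comm += comm0 / (1 << l);
    return comm + repl * sharedLevels;
}

int main() {
    // Prints a time that falls, bottoms out, then rises again as more
    // levels are shared, mirroring the trend reported on the slide.
    for (int s = 0; s <= 4; ++s)
        std::printf("shared levels = %d: time/iteration = %.1f\n", s, iterationTime(s));
}
```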
8. Matrix-Vector Products - Pencil at 8 GHz
- Number of levels: 9
- 1.2 million unknowns
- Length: 3.17 meters; radius: 0.1 meters
- f = 8 GHz
- 5 GB of RAM
- 300 s/iteration on 1 processor; 10 s/iteration on 32 processors (implied speedup computed below)
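From the timings on this slide, the implied speedup and parallel efficiency on 32 processors are:

$$
S_{32} = \frac{T_1}{T_{32}} = \frac{300\ \mathrm{s}}{10\ \mathrm{s}} = 30,
\qquad
E_{32} = \frac{S_{32}}{32} = \frac{30}{32} \approx 94\%.
$$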
9. Matrix-Vector Products - Pencil at 4 GHz
- Number of levels: 8
- A carefully chosen shared level results in impressive scaling properties.
10. Matrix-Vector Products - VFY-218
- Full-scale model
- Realistic target
- Has many geometric features that do not allow easy load balancing
- Tested for frequencies from 500 MHz to 8 GHz
11. Matrix-Vector Products - VFY-218 at 500 MHz
- Total number of levels: 6
- For such a complex structure, the communication cost is high.
- When no shared levels are used, parallel efficiency is poor.
12. Matrix-Vector Products - VFY-218 at 1 GHz
- As the problem size gets larger, the use of shared levels can greatly enhance parallel efficiency.
13. Matrix-Vector Products - Scaling with Size (VFY-218)
14. Scaling of RCS Computations
- As a result of the efficient parallel FMM, RCS evaluation also becomes scalable.
15. Parallel RCS Computation Time
- Actual time required for evaluating the bistatic RCS for 1800 angles on the VFY-218.
16. Very Large Scale Problem - Sphere
- N = 10,002,828; number of levels: 9
- Time for matrix-vector products: 34 s on 126 processors
- Total solution time: 2 hrs, 5 mins
17. Very Large Scale Problem - VFY-218
- Frequency = 8 GHz; N = 10,186,446
- Time for matrix-vector products: 119 s on 126 processors
- Total solution time: 7 hrs and 25 mins (2 right-hand sides); a back-of-envelope iteration count follows
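As a back-of-envelope reading of the last two slides (assuming the solve time is dominated by matrix-vector products, which neglects setup and all other costs), the numbers imply roughly:

$$
\frac{2\ \mathrm{h}\ 5\ \mathrm{min}}{34\ \mathrm{s}} = \frac{7500\ \mathrm{s}}{34\ \mathrm{s}} \approx 220
\quad\text{(sphere)},
\qquad
\frac{7\ \mathrm{h}\ 25\ \mathrm{min}}{119\ \mathrm{s}} = \frac{26700\ \mathrm{s}}{119\ \mathrm{s}} \approx 224
\quad\text{(VFY-218, 2 right-hand sides)}.
$$

That is, on the order of 220 matrix-vector-product equivalents per solve, or roughly 110 per right-hand side for the VFY-218; this is an inference from the slide timings, not a figure reported on the slides.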
18. Conclusions
- The objective of the paper was to summarize our efforts at developing a scalable MLFMA-based fast solver for electromagnetic scattering calculations.
- Presented the essential ideas in the parallelization of the dynamic MLFMA, which has an exorbitant communication cost at the coarse levels under a naïve parallelization.
- Demonstrated the performance of the method with several examples.
- Demonstrated the ability of the code to handle extremely large scale problems by solving problems involving more than 10 million unknowns.