Title: Presentation for Introduction of FISC
1. Description of ScaleME
W.C. Chew, L. Hastriter, and S. Velamparambil
Center for Computational Electromagnetics
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
2. ScaleME
- ScaleME is a parallel implementation of the Multi-Level Fast Multipole Algorithm (MLFMA).
- It has been tested up to 10 million unknowns on the SGI Origin 2000 with 128 processors at NCSA.
- It has been demonstrated to have better parallel efficiency than FISC.
- It is about 7 times faster than FISC at performing a matrix-vector product.
- It is supposed to be portable.
3. Essential Ideas
- A simple way to parallelize MLFMA, which is a tree code, is to split the tree according to the workload at each node.
- However, this gives rise to exorbitant communication cost.
- Hence, a two-pronged approach is used: the bottom part of the tree is split according to the workload at each node, but the top is split according to the length of the messages passed between nodes (see the sketch after this list).
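As a rough illustration of the workload-based split used on the lower levels, here is a minimal C++ sketch. The `Box` structure, its `work` estimate, and the greedy assignment below are assumptions for illustration, not ScaleME's actual data structures or partitioner.

```cpp
// Sketch: workload-balanced split of the lower (distributed) tree levels.
// "Box" and its "work" estimate are illustrative stand-ins, not ScaleME's
// actual data structures.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Box {
    int id;
    double work;  // estimated cost, e.g. proportional to unknowns in the box
};

// Greedy bin-packing: assign each box to the currently least-loaded processor.
std::vector<int> partitionByWorkload(std::vector<Box> boxes, int nprocs) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.work > b.work; });
    std::vector<double> load(nprocs, 0.0);
    std::vector<int> owner(boxes.size());
    for (const Box& b : boxes) {
        int p = std::min_element(load.begin(), load.end()) - load.begin();
        owner[b.id] = p;
        load[p] += b.work;
    }
    return owner;
}

int main() {
    std::vector<Box> leaves = {{0, 4.0}, {1, 1.5}, {2, 3.0}, {3, 2.5}, {4, 1.0}};
    std::vector<int> owner = partitionByWorkload(leaves, 2);
    for (std::size_t i = 0; i < owner.size(); ++i)
        std::printf("box %zu -> proc %d\n", i, owner[i]);
}
```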
4. Essential Ideas - Illustrated
- We call the top levels of the tree the shared levels.
- At the shared levels, the same tree is replicated on every processor.
- Each processor gets half the radiation/receiving patterns of the boxes numbered 1, 2, and 3 (a sketch of this splitting follows the list).
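A minimal sketch of how each processor might own a slice of a shared box's pattern samples. The contiguous block split and the names below are assumptions for illustration only.

```cpp
// Sketch: at a shared level every processor holds the same boxes, but only
// a slice of each box's radiation/receiving pattern samples. The contiguous
// slicing below is an assumption for illustration.
#include <cstdio>

// Half-open sample range [begin, end) owned by processor `rank` out of
// `nprocs`, for a box whose pattern is discretized into `nsamples` directions.
struct Range { int begin, end; };

Range patternSlice(int nsamples, int rank, int nprocs) {
    int base = nsamples / nprocs, rem = nsamples % nprocs;
    int begin = rank * base + (rank < rem ? rank : rem);
    int end = begin + base + (rank < rem ? 1 : 0);
    return {begin, end};
}

int main() {
    // With 2 processors, each gets half the samples, as on the slide.
    for (int rank = 0; rank < 2; ++rank) {
        Range r = patternSlice(1024, rank, 2);
        std::printf("proc %d owns samples [%d, %d)\n", rank, r.begin, r.end);
    }
}
```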
5. Examples
- We will show the scaling properties of ScaleME for an increasing number of processors.
6. Matrix-Vector Products - Sphere, 6λ
- Total number of levels: 5.
- There is an initial improvement in parallel efficiency as the first shared levels are introduced.
- For a small problem size, the use of more shared levels makes the computation less efficient.
7. Matrix-Vector Products - Sphere, 12λ
- Total number of levels: 6.
- For a larger problem, the use of more shared levels enhances parallel efficiency, but eventually parallel efficiency is lost with too many shared levels (a toy model of this trade-off is sketched below).
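The following toy cost model (entirely assumed, not taken from the paper or the code) reproduces this qualitative behavior: sharing a top level removes its inter-processor messages but replicates its work on every processor, so per-iteration time is minimized at an intermediate number of shared levels.

```cpp
// Toy cost model (illustrative assumptions only): sharing a top level removes
// its inter-processor messages but replicates its work on every processor,
// so the per-iteration time is minimized at an intermediate choice.
#include <cstdio>

double iterationTime(int sharedLevels) {
    const int topLevels = 4;    // levels that are candidates for sharing (assumed)
    const double comm0 = 20.0;  // message cost of the topmost level if unshared (assumed)
    const double repl = 4.0;    // replicated work added per shared level (assumed)
    // Message cost is taken to halve each level down from the top (bigger
    // boxes exchange longer patterns), so the topmost levels are shared first.
    double comm = 0.0;
    for (int l = sharedLevels; l < topLevels; ++l)
        comm += comm0 / (1 << l);
    return comm + repl * sharedLevels;
}

int main() {
    // Prints a time that falls, bottoms out, then rises again as more
    // levels are shared, mirroring the trend reported on the slide.
    for (int s = 0; s <= 4; ++s)
        std::printf("shared levels = %d: time/iteration = %.1f\n", s, iterationTime(s));
}
```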
8. Matrix-Vector Products - Pencil at 8 GHz
- Number of levels: 9
- 1.2 million unknowns
- Length: 3.17 meters; radius: 0.1 meters
- f = 8 GHz
- 5 GB of RAM
- 300 s/iteration on 1 processor; 10 s/iteration on 32 processors (implied speedup computed below)
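From the timings on this slide, the implied speedup and parallel efficiency on 32 processors are:

$$
S_{32} = \frac{T_1}{T_{32}} = \frac{300\ \mathrm{s}}{10\ \mathrm{s}} = 30,
\qquad
E_{32} = \frac{S_{32}}{32} = \frac{30}{32} \approx 94\%.
$$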
9. Matrix-Vector Products - Pencil at 4 GHz
- Number of levels: 8
- A carefully chosen shared level results in impressive scaling properties.
10. Matrix-Vector Products - VFY-218
- Full-scale model
- Realistic target
- Has many geometric features that do not allow easy load balancing
- Tested for frequencies from 500 MHz to 8 GHz
11. Matrix-Vector Products - VFY-218 at 500 MHz
- Total number of levels: 6
- For such a complex structure, the communication cost is high.
- When no shared levels are used, parallel efficiency is poor.
12. Matrix-Vector Products - VFY-218 at 1 GHz
- As the problem size gets larger, the use of shared levels can greatly enhance parallel efficiency.
13. Matrix-Vector Products - Scaling with Size (VFY-218)
14. Scaling of RCS Computations
- As a result of the efficient parallel FMM, RCS evaluation also becomes scalable.
15. Parallel RCS Computation Time
- Actual time required for evaluating the bistatic RCS for 1800 angles on the VFY-218.
16. Very Large Scale Problem - Sphere
- N = 10,002,828; number of levels: 9
- Time for matrix-vector products: 34 s on 126 processors
- Total solution time: 2 hrs, 5 mins
17. Very Large Scale Problem - VFY-218
- Frequency = 8 GHz; N = 10,186,446
- Time for matrix-vector products: 119 s on 126 processors
- Total solution time: 7 hrs and 25 mins (2 right-hand sides); a back-of-envelope iteration count follows
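As a back-of-envelope reading of the last two slides (assuming the solve time is dominated by matrix-vector products, which neglects setup and all other costs), the numbers imply roughly:

$$
\frac{2\ \mathrm{h}\ 5\ \mathrm{min}}{34\ \mathrm{s}} = \frac{7500\ \mathrm{s}}{34\ \mathrm{s}} \approx 220
\quad\text{(sphere)},
\qquad
\frac{7\ \mathrm{h}\ 25\ \mathrm{min}}{119\ \mathrm{s}} = \frac{26700\ \mathrm{s}}{119\ \mathrm{s}} \approx 224
\quad\text{(VFY-218, 2 right-hand sides)}.
$$

That is, on the order of 220 matrix-vector-product equivalents per solve, or roughly 110 per right-hand side for the VFY-218; this is an inference from the slide timings, not a figure reported on the slides.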
18. Conclusions
- The objective of the paper was to summarize our efforts at developing a scalable MLFMA-based fast solver for electromagnetic scattering calculations.
- Presented the essential ideas in the parallelization of the dynamic MLFMA, which has an exorbitant communication cost at the coarse levels under a naïve parallelization.
- Demonstrated the performance of the method with several examples.
- Demonstrated the ability of the code to handle extremely large scale problems by solving problems involving more than 10 million unknowns.