Title: Parallel CFD Simulation using Systolic Communication-Computation Overlap
1Parallel CFD Simulation using Systolic
Communication-Computation Overlap
- Kurokawa, Motoyoshi School of I.S., JAIST
- Matsuzawa, Teruo Center for I.S., JAIST
- Himeno, Ryutaro Institute of Physical and Chemical Research
- Shigetani, Takayuki Institute of Physical and Chemical Research
2Outline
- Introduction
- Parallel CFD simulation
- Systolic communication-computation overlap
- Flow result and CFD performance result
- Benchmark test
- Benchmark performance result
- Conclusions
3Introduction
- Utility PC clusters are becoming popular
- The mainstream configuration is Intel CPUs with a 100Base or Gigabit network
- Goal: high-performance parallel CFD simulation on a PC cluster with a low-speed network
- High performance is obtainable for large-scale problems
- What about small-scale problems and high degrees of parallelism?
- Approach: overlap communication with computation
- The pipeline method and the systolic communication-computation overlap method
- We show the effectiveness of the approach in a CFD simulation using the MAC method
4Parallel CFD Simulation (1/2)
- The CFD solver uses the MAC method
- The MAC method solves pressure and velocity separately
- Parallelization uses the domain decomposition method
- Most of the computational load is in the Poisson equation for pressure
- The Poisson equation solver is therefore the key to high performance
- We apply the systolic communication-computation overlap to the Poisson equation solver
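The domain decomposition mentioned above can be illustrated with a small Python sketch. It assumes a one-dimensional split of grid indices and an overlap (halo) width of one point; the slides do not state along which axes the grid is divided, and the function names `decompose` and `with_overlap` are hypothetical.

```python
def decompose(n_points, n_ranks):
    """Return (start, stop) index ranges, one per rank, covering range(n_points)."""
    base, extra = divmod(n_points, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional point each
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def with_overlap(ranges, n_points, width=1):
    """Extend each range by `width` overlap points toward its neighbours."""
    return [(max(0, a - width), min(n_points, b + width)) for a, b in ranges]


# Illustrative sizes: 66 grid points along one axis, 8 processes
owned = decompose(66, 8)
local = with_overlap(owned, 66)
```

Each process updates only the points in its `owned` range; the extra points in `local` hold copies of neighbouring data that must be refreshed by communication.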
5Parallel CFD Simulation (2/2)
- Design of the systolic communication-computation overlap in the parallel CFD simulation
- In this simulation, the Poisson equation solver is the Jacobi method
- Data dependency between the interior region and the boundary region
- Pair the data exchange of the boundary region with the computation of the interior region
6Computational Data Dependency in the Poisson Solver
- Data dependency in the Poisson equation solver involves only adjacent grid points
- The data used to compute the interior region does not depend on the data used to compute the boundary region
- In this case, the systolic communication-computation overlap can be used
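The dependency argument above can be checked with a small Python sketch. It assumes a two-dimensional five-point Jacobi sweep on a 6x6 local array whose outermost layer is the overlap region; the simulation in the slides is three-dimensional, and the names `jacobi_sweep` and `make_grid` are hypothetical.

```python
def jacobi_sweep(p, rhs, h):
    """One Jacobi sweep of the 5-point Poisson stencil; the outer layer is left unchanged."""
    n, m = len(p), len(p[0])
    new = [row[:] for row in p]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (p[i - 1][j] + p[i + 1][j]
                                + p[i][j - 1] + p[i][j + 1]
                                - h * h * rhs[i][j])
    return new


def make_grid(n, overlap_value):
    """Local array with arbitrary owned values and a constant overlap layer."""
    g = [[float(i * n + j) for j in range(n)] for i in range(n)]
    for k in range(n):
        g[0][k] = g[n - 1][k] = g[k][0] = g[k][n - 1] = overlap_value
    return g


n = 6
rhs = [[0.0] * n for _ in range(n)]
a = jacobi_sweep(make_grid(n, 0.0), rhs, 1.0)
b = jacobi_sweep(make_grid(n, 99.0), rhs, 1.0)

# Interior points (two or more cells from the edge) ignore the overlap layer
interior_same = all(a[i][j] == b[i][j]
                    for i in range(2, n - 2) for j in range(2, n - 2))
# Boundary-region points (next to the overlap layer) do depend on it
boundary_differs = a[1][1] != b[1][1]
```

Changing the overlap values leaves the interior result untouched, which is why the interior can be computed while the overlap data is still in transit.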
7Systolic communication-computation overlap
- Interior computation and boundary communication proceed concurrently
- The communication time or the computation time is hidden behind the other, so the elapsed time decreases
- This situation arises frequently in CFD simulations
- Especially in those based on a finite difference method
8Procedure of systolic overlap processing
- Compute the boundary region
- Overlap processing
- Asynchronously exchange boundary region data with the adjacent overlap regions
- Compute the interior region
- Wait processing
- Asynchronous data exchange
- Compute interior region
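The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: a one-dimensional Laplace problem is split into two subdomains, and a worker thread stands in for the non-blocking message exchange. The names `exchange` and `overlapped_jacobi` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def jacobi_point(u, i):
    # 1-D Laplace equation: each point becomes the mean of its two neighbours
    return 0.5 * (u[i - 1] + u[i + 1])


def serial_jacobi(u, iters):
    """Reference solver on the undivided grid; the end values are fixed."""
    for _ in range(iters):
        u = [u[0]] + [jacobi_point(u, i) for i in range(1, len(u) - 1)] + [u[-1]]
    return u


def exchange(left_new, right_new):
    # Stand-in for a non-blocking send/receive pair: each overlap cell
    # receives the neighbour's freshly computed boundary value
    left_new[-1] = right_new[1]
    right_new[0] = left_new[-2]


def overlapped_jacobi(left, right, iters):
    """left[-1] and right[0] are overlap cells; left[0] and right[-1] are fixed walls."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        for _ in range(iters):
            ln, rn = left[:], right[:]
            # 1. Compute the boundary region (owned points next to the overlap cell)
            ln[-2] = jacobi_point(left, len(left) - 2)
            rn[1] = jacobi_point(right, 1)
            # 2. Start the asynchronous exchange of boundary data
            pending = pool.submit(exchange, ln, rn)
            # 3. Compute the interior region while the exchange is in flight
            for i in range(1, len(left) - 2):
                ln[i] = jacobi_point(left, i)
            for i in range(2, len(right) - 1):
                rn[i] = jacobi_point(right, i)
            # 4. Wait processing
            pending.result()
            left, right = ln, rn
    return left, right


u0 = [0.0] * 9 + [1.0]
left, right = overlapped_jacobi(u0[0:6], u0[4:10], 50)
stitched = left[:5] + right[1:]
reference = serial_jacobi(u0, 50)
```

The interior loops never read or write the cells touched by `exchange`, so the two activities can run concurrently and `stitched` reproduces the serial Jacobi result.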
9CFD simulation
- The computational model is the three-dimensional lid-driven cavity flow
- Governing equations
- Continuity equation and Navier-Stokes (NS) equation
- Finally
- Poisson equation and NS equation
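The equations on this slide did not survive in the text. As a reference point only, the standard nondimensional forms for incompressible flow, and the pressure Poisson equation usually derived from them in the MAC method, are sketched below. The symbols ($\mathbf{u}$, $p$, $Re$, $D$, $\Delta t$) and the Cartesian vector notation are assumptions; the general-coordinate forms used in the slides may differ.

```latex
\begin{align}
\nabla\cdot\mathbf{u} &= 0 \\
\frac{\partial\mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\mathbf{u}
  &= -\nabla p + \frac{1}{Re}\nabla^{2}\mathbf{u} \\
\nabla^{2} p &= -\nabla\cdot\bigl[(\mathbf{u}\cdot\nabla)\mathbf{u}\bigr]
  + \frac{D^{n}}{\Delta t} + \frac{1}{Re}\nabla^{2} D^{n},
  \qquad D = \nabla\cdot\mathbf{u}
\end{align}
```

The third line follows from taking the divergence of the NS equation and requiring $D^{n+1} = 0$ at the new time step, with a first-order approximation $\partial D/\partial t \approx -D^{n}/\Delta t$.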
10Computational Condition (1/2)
- Discretization
- Spatial difference accuracy
- Advective term: third-order upwind difference
- Other terms: second-order central difference
- Time marching accuracy
- Time term: first-order explicit method
- Poisson equation solver: Jacobi method
- Reynolds number: 100
- Stopping criterion: maximum error (1.0d-5)
- Performance measurement: 100 time steps
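The slides do not say which third-order upwind scheme is used for the advective term. As one common possibility, the sketch below assumes the Kawamura-Kuwahara form on a uniform one-dimensional grid: a fourth-order central difference plus a fourth-difference dissipation term weighted by |u|. The function name `kk_advection` is hypothetical.

```python
def kk_advection(u, f, i, h):
    """Approximate u * df/dx at index i (Kawamura-Kuwahara form, assumed here)."""
    central = (-f[i + 2] + 8.0 * (f[i + 1] - f[i - 1]) + f[i - 2]) / (12.0 * h)
    dissipation = (f[i + 2] - 4.0 * f[i + 1] + 6.0 * f[i]
                   - 4.0 * f[i - 1] + f[i - 2]) / (4.0 * h)
    return u * central + abs(u) * dissipation


# Check against a cubic, for which both terms are exact up to rounding
h = 0.1
xs = [k * h for k in range(9)]
f = [x ** 3 for x in xs]
approx = kk_advection(-2.0, f, 4, h)
exact = -2.0 * 3.0 * xs[4] ** 2
```

The stencil spans two points on each side, so a scheme of this kind needs data from more than the immediately adjacent point when it is applied near a subdomain edge.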
11Computational Condition (2/2)
- Velocity boundary condition
- Top wall (lid): u = 1.0, v = w = 0.0
- Other walls: no-slip (u = v = w = 0.0)
- Pressure boundary condition
- All walls: zero gradient
- The grid system is a general coordinate system
- A Cartesian coordinate system is not used
- A transformed coordinate system is used
- Grid size TypeA TypeB
12Parallel Computer specification
RS/6000 SP
- CPU: PowerPC 604e (332 MHz)
- Nodes: 64 (32 nodes used)
- Memory: 512 MByte
- Network: SP-Switch (Gigabit)
- OS: AIX 4.3
- Compiler: XL Fortran 5.0
- MPI: provided by IBM
PC Cluster
- CPU: P-4 1.5 GHz
- Nodes: 8(1)
- Memory: 512 MByte
- Network: 100Base
- OS: Linux 2.4.0
- Compiler: PGI Compiler
- MPI: included in the PGI Cluster kit
13Flow result
- [Figure: flow result; slide annotation: "matched"]
14Speed-up Ratio
- [Figures: speed-up ratio on the PC Cluster and on the RS/6000 SP]
15Benchmark Test (for the future)
- We obtained good performance with the overlap method
- The convergence performance of the Jacobi method is low
- We considered other solvers with higher convergence performance
- Most of the computational load is in the Poisson equation for pressure
- The cost of the CFD simulation can therefore be evaluated with the Poisson equation solver alone
- We focused on the SOR method
- Benchmark test of SOR solvers
- Multi-color SOR method (8-color SOR method)
- Modified SOR method (parallel SOR (PSOR) method)
- Performance comparison with the pipeline SSOR method
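The 8-color SOR method listed above can be sketched in Python. The sketch assumes the usual parity-based coloring, where the color of a grid point is given by whether each of its indices (i, j, k) is odd or even, and it uses a seven-point Laplace stencil with fixed boundary values for brevity. The function name, relaxation factor and grid size are illustrative assumptions, not values from the slides.

```python
def eight_color_sor(p, omega=1.2, tol=1e-10, max_sweeps=1000):
    """In-place 8-color SOR for the 3-D Laplace equation; returns the sweeps used."""
    n = len(p)
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for color in range(8):
            ci, cj, ck = color & 1, (color >> 1) & 1, (color >> 2) & 1
            # Points of one color share index parity, so no two of them are
            # neighbours (even diagonally) and they can be updated independently
            for i in range(1 + ci, n - 1, 2):
                for j in range(1 + cj, n - 1, 2):
                    for k in range(1 + ck, n - 1, 2):
                        avg = (p[i - 1][j][k] + p[i + 1][j][k]
                               + p[i][j - 1][k] + p[i][j + 1][k]
                               + p[i][j][k - 1] + p[i][j][k + 1]) / 6.0
                        change = omega * (avg - p[i][j][k])
                        p[i][j][k] += change
                        max_change = max(max_change, abs(change))
        if max_change < tol:
            return sweep
    return max_sweeps


# Boundary values follow the harmonic function i + j + k; the interior starts at zero
n = 6
p = [[[float(i + j + k) if 0 in (i, j, k) or n - 1 in (i, j, k) else 0.0
       for k in range(n)] for j in range(n)] for i in range(n)]
sweeps = eight_color_sor(p)
max_error = max(abs(p[i][j][k] - (i + j + k))
                for i in range(n) for j in range(n) for k in range(n))
```

Two colors would be enough for a seven-point stencil; eight colors also decouple diagonal neighbours, which matters when a transformed coordinate system introduces cross-derivative terms.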
16Benchmark test condition(1/2)
- The PSOR and 8-color SOR methods use an algorithm similar to the Jacobi method above
- The difference is that data is received into a temporary buffer
- After wait processing, the data is copied from the temporary buffer to the overlap region
- With the PSOR method, parallelization changes the convergence speed because the computational order changes
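The temporary-buffer variant described above can be sketched in Python. The sketch makes several assumptions: a one-dimensional Laplace problem, two subdomains, a worker thread standing in for the non-blocking receive, and a relaxation factor of 1.0, chosen because convergence of this toy iteration is then guaranteed. The names `fill_buffers`, `psor` and `serial_sor` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sor_point(u, i, omega):
    # In-place SOR update for the 1-D Laplace equation
    u[i] += omega * (0.5 * (u[i - 1] + u[i + 1]) - u[i])


def serial_sor(u, omega, tol=1e-10, max_sweeps=10000):
    u = u[:]
    for sweep in range(1, max_sweeps + 1):
        before = u[:]
        for i in range(1, len(u) - 1):
            sor_point(u, i, omega)
        if max(abs(a - b) for a, b in zip(u, before)) < tol:
            return u, sweep
    return u, max_sweeps


def fill_buffers(tmp_left, tmp_right, left, right):
    # Received data lands in temporary buffers, so the overlap cells
    # stay unchanged while the sweep is in progress
    tmp_left[0] = right[1]
    tmp_right[0] = left[-2]


def psor(left, right, omega, tol=1e-10, max_sweeps=10000):
    left, right = left[:], right[:]
    tmp_left, tmp_right = [0.0], [0.0]
    with ThreadPoolExecutor(max_workers=1) as pool:
        for sweep in range(1, max_sweeps + 1):
            before = left + right
            sor_point(left, len(left) - 2, omega)   # boundary regions first
            sor_point(right, 1, omega)
            pending = pool.submit(fill_buffers, tmp_left, tmp_right, left, right)
            for i in range(1, len(left) - 2):       # interior, overlapped
                sor_point(left, i, omega)
            for i in range(2, len(right) - 1):
                sor_point(right, i, omega)
            pending.result()                        # wait processing
            # Copy from the temporary buffers to the overlap region
            left[-1], right[0] = tmp_left[0], tmp_right[0]
            if max(abs(a - b) for a, b in zip(left + right, before)) < tol:
                return left[:-1] + right[1:], sweep
    return left[:-1] + right[1:], max_sweeps


u0 = [0.0] * 9 + [1.0]
serial_solution, serial_sweeps = serial_sor(u0, 1.0)
parallel_solution, parallel_sweeps = psor(u0[0:6], u0[4:10], 1.0)
```

Each subdomain sweeps with overlap values from the previous sweep and updates its boundary point before its interior, so the update order differs from the serial sweep; comparing `serial_sweeps` with `parallel_sweeps` shows the effect on convergence for this small case.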
17Benchmark test condition(2/2)
- The pipeline SSOR method uses an algorithm similar to NAS Parallel Benchmarks LU
- The benchmark test starts from the initial condition of the CFD simulation above
- Stopping criterion: maximum error (1.0d-5)
- Grid size: 66x66x66
- [Figure: forward and backward sweeps]
18Result of Overlapped SOR
- [Figures: results of overlapped SOR on the PC Cluster and on the RS/6000 SP]
19Conclusions
- The systolic communication-computation overlap is effective for CFD simulation on cluster systems
- It is especially effective for small and medium problem sizes
- The PSOR method is the better method in this study
- PSOR changes the convergence speed, and a solution is not necessarily obtained
- We should consider a method that combines convergence performance with computational performance