Title: Role of spectral turbulence simulations in developing HPC systems
1. Role of spectral turbulence simulations in developing HPC systems
- YOKOKAWA, Mitsuo
- Next-Generation Supercomputer R&D Center
- RIKEN
2. Background
- Experience of developing the Earth Simulator
  - a 40 Tflops vector-type distributed-memory supercomputer system
  - A simulation code for box turbulence flow was used in the final adjustment of the system.
  - A large simulation of box turbulence flow was carried out.
- A peta-flops supercomputer project
3. Contents
- Simulations on the Earth Simulator
- A Japanese peta-scale supercomputer project
- Trends of HPC systems
- Summary
4. Simulations on the Earth Simulator
5. The Earth Simulator
- It was completed in 2002.
- A sustained performance of 35.86 Tflops was achieved in the LINPACK benchmark.
- It was chosen as one of the best inventions of 2002 by TIME.
6. Why I did it
- It is important to evaluate the performance of the Earth Simulator in the final adjustment phase.
- Suitable codes should be chosen
  - to evaluate the performance of the vector processors,
  - to measure the performance of all-to-all communication among compute nodes through the crossbar switch,
  - to make the operation of the Earth Simulator stable.
- Candidates
  - LINPACK benchmark?
  - Atmospheric general circulation model (AGCM)?
  - Any other code?
7. Why I did it (cont'd)
- Spectral turbulence simulation code
  - an intensive computational kernel plus a lot of data communication
  - a simple code
  - significance to computational science
    - one of the grand challenges in computational science and high-performance computing
- A new spectral code for the Earth Simulator (a toy sketch of the numerical ingredients follows this slide)
  - Fourier spectral method for spatial discretization
  - techniques (mode truncation and phase shift) to control aliasing errors in calculating the nonlinear terms
  - fourth-order Runge-Kutta method for time integration
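The following minimal sketch shows the same three ingredients named above (Fourier spectral discretization, mode truncation via the 2/3 rule, and classical fourth-order Runge-Kutta time stepping) applied to the 1D viscous Burgers equation rather than 3D Navier-Stokes, and it omits the phase-shift technique. It is an illustrative toy under my own assumptions (N, nu, dt are arbitrary choices), not the actual Earth Simulator code.

```python
# Toy 1D analogue of the slide's numerical scheme (NOT the Earth Simulator
# code): Fourier spectral discretization, 2/3-rule mode truncation for
# dealiasing, and RK4 time integration, applied to the viscous Burgers
# equation u_t + u*u_x = nu*u_xx.
import numpy as np

N = 256                                   # grid points / Fourier modes
nu = 0.01                                 # viscosity (illustrative value)
x = 2 * np.pi * np.arange(N) / N
k = np.fft.fftfreq(N, d=1.0 / N)          # integer wavenumbers
dealias = np.abs(k) < N / 3.0             # 2/3-rule truncation mask

def rhs(uhat):
    """Spectral RHS: -FFT(u*u_x), dealiased, plus the viscous term."""
    u = np.fft.ifft(uhat).real
    ux = np.fft.ifft(1j * k * uhat).real
    nonlinear = np.fft.fft(u * ux)
    nonlinear[~dealias] = 0.0             # truncate aliased modes
    return -nonlinear - nu * k**2 * uhat

uhat = np.fft.fft(np.sin(x))              # initial condition u(x,0) = sin(x)
dt = 1e-3
for _ in range(1000):                     # fourth-order Runge-Kutta stepping
    k1 = rhs(uhat)
    k2 = rhs(uhat + 0.5 * dt * k1)
    k3 = rhs(uhat + 0.5 * dt * k2)
    k4 = rhs(uhat + dt * k3)
    uhat += dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
u = np.fft.ifft(uhat).real                # back to physical space
```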
8. Points of coding
- Optimization for the Earth Simulator
  - coordinated assignment of calculations to three levels of parallelism (vector processing, micro-tasking, and MPI parallelization)
  - higher-radix FFT (see the B/F sketch after this slide)
  - B/F ratio (data transfer rate between CPU and memory versus arithmetic performance)
  - removal of redundant processes and variables
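As a hedged illustration of why a higher-radix FFT helps the B/F ratio: each radix-r stage streams the whole array through memory once, but a higher radix means fewer stages for the same total of roughly 5*N*log2(N) flops, so each stage does more arithmetic per byte moved. The constants below (16-byte complex values, 5 flops per point per log2 of the radix, a hardware B/F of about 4) are my own back-of-envelope assumptions, not figures from the talk.

```python
# Back-of-envelope estimate of the bytes-per-flop (B/F) demand of one
# radix-r FFT stage. Illustrative assumptions, not the talk's numbers.
import math

def fft_bf(radix, bytes_per_complex=16):
    """Approximate B/F required by one radix-`radix` FFT stage.

    Per point, per stage: one read + one write of a complex value,
    and roughly 5*log2(radix) flops.
    """
    bytes_moved = 2 * bytes_per_complex
    flops = 5 * math.log2(radix)
    return bytes_moved / flops

for r in (2, 4, 8, 16):
    print(f"radix-{r:2d}: ~{fft_bf(r):.2f} B/F required")
# On a machine with a hardware B/F around 4 (roughly the Earth Simulator's
# ratio of memory bandwidth to peak flops), radix-4 or radix-8 stages stay
# compute-bound, while radix-2 (~6.4 B/F) would be memory-bound.
```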
9. Calculation for one time step
[Figure: wall-clock time for one time step versus number of nodes (64, 128, 256, 512), on a log scale from 0.01 to 100 seconds; labeled data points include 30.7 s and 3.21 s.]
10. Performance
[Figure: sustained performance in Tflops versus number of PNs (64, 128, 256, 512); the top data point is 16.4 Tflops, about 50% of peak, counting the single-precision analytical FLOP number.]
11. Achievement of box turbulence flow simulations
Number of grid points, by study and machine:
- Orszag (1969), IBM 360-95: 32^3
- Siggia (1981), Cray-1, NCAR: 64^3
- Kerr (1985), Cray-1S, NCAR: 128^3
- Jimenez et al. (1993), Caltech Delta machine: 512^3
- Yamamoto (1994), Numerical Wind Tunnel: 240^3
- Gotoh & Fukayama (2001), VPP5000/56, NUCC: 1024^3
- K I Y (2002), Earth Simulator: 2048^3, 4096^3
12. A Japanese Peta-Scale Supercomputer Project
13. Next-Generation Supercomputer Project
- Objectives are
  - to develop the world's most advanced, highest-performance supercomputer,
  - to develop and deploy its usage technologies as well as application software,
  - as one of Japan's Key Technologies of National Importance.
- Period and budget: FY2006-FY2012, US$1 billion (expected)
- RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law.
14. Goals of the project
- Development and installation of the most advanced high-performance supercomputer system, with a LINPACK performance of 10 petaflops.
- Development and deployment, in various science and engineering fields, of application software that attains the system's maximum capability.
- Establishment of an Advanced Computational Science and Technology Center (tentative) as one of the Centers of Excellence for research, personnel development, and training, built around the supercomputer.
15. Major applications for the system
- Grand Challenges
16. Configuration of the system
- The Next-Generation Supercomputer will be a hybrid general-purpose supercomputer that provides the optimum computing environment for a wide range of simulations.
- Calculations will be performed in the processing units that suit the particular simulation.
- Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.
17. Roadmap of the project
[Figure: project timeline with a "We are here" marker.]
18. Location of the supercomputer site: Kobe City
- 450 km (280 miles) west of Tokyo
19. Artist's image of the building
20. Photos of the site (under construction)
[Photos from the south side, taken June 10, 2008; July 17, 2008; and Aug. 20, 2008.]
21. Trends of HPC systems
22. Trends of HPC systems
- Systems will have a large number of processors, around 1 million or more.
- Each chip will be a multi-core (8, 16, or 32 cores) or many-core (more than 64 cores) processor, implying
  - low performance for each core,
  - small main memory capacity for each core,
  - fine-grain parallelism.
- Each processor will consume little energy (low-power processors).
- Bandwidth between CPU and main memory will be narrow
  - bottlenecked by the number of signal pins.
- Bisectional bandwidth among compute nodes will be narrow
  - one-to-one connections are very expensive and power-consuming.
23. Impact on spectral simulations
- High performance in the LINPACK benchmark
  - The more processors there are, the higher the LINPACK performance is.
  - LINPACK performance does not necessarily reflect real-world application performance, especially for spectral simulations.
- Small memory capacity for each processor
  - fine-grain decomposition of space,
  - increasing communication cost among parallel compute nodes.
- Narrow memory bandwidth and narrow inter-node bisectional bandwidth (a rough cost model follows this slide)
  - the memory-wall problem and low all-to-all communication performance,
  - the necessity of a low-B/F algorithm in place of the FFT.
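To make the bisection-bandwidth point concrete, here is a rough cost model. In a distributed 3D FFT, each global transpose is an all-to-all in which essentially the whole N^3 field crosses the network bisection. All numbers below (8-byte single-precision complex values, a 1 TB/s bisection) are my own illustrative assumptions, not figures from the talk.

```python
# Rough model of the all-to-all cost of a distributed 3D FFT: each global
# transpose moves essentially the entire N^3 field across the bisection.
# Illustrative assumptions throughout, not the talk's numbers.

def transpose_time(N, bytes_per_point=8.0, bisection_bytes_per_s=1.0e12):
    """Seconds for one global transpose of an N^3 field of single-precision
    complex values, assuming the whole field crosses the bisection."""
    return N**3 * bytes_per_point / bisection_bytes_per_s

for N in (2048, 4096, 8192, 16384):
    t = transpose_time(N)
    print(f"N={N:5d}: ~{t:8.2f} s per transpose "
          f"(a pseudospectral RK4 step needs dozens of transposes)")
```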
24. Impact on spectral simulations (cont'd)
- The trend does not fit 3D FFTs at all well; that is, box turbulence simulations are becoming difficult to perform.
- We will be able to use more and more computational resources in the near future,
- but a finer-resolution simulation by spectral methods will need a very long calculation time because communication among parallel compute nodes is extremely slow, and we might not be able to obtain the final results in a reasonable time.
25. Estimates for simulations beyond 4096^3
- If a sustained simulation performance of 500 Tflops can be used:
- An 8192^3 simulation needs
  - 7 seconds for one time step,
  - 100 TB of total memory,
  - 8 days for 100,000 steps and 1 PB of storage for a complete simulation.
- A 16384^3 simulation needs
  - 1 minute for one time step,
  - 800 TB of total memory,
  - 3 months for 125,000 steps and 10 PB of storage in total for a complete simulation.
- (The time arithmetic is checked in the sketch after this slide.)
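A quick check of the slide's run-time arithmetic under its stated assumption of 500 Tflops sustained. The step times and step counts are the slide's; the flops-per-step figure is only implied by them, and the day/month rounding is my own.

```python
# Checking the slide's run-time estimates. Step times and step counts are
# from the slide; 86400 s/day and 30-day months are my rounding choices.
SUSTAINED = 500e12  # flop/s, the slide's assumed sustained performance

def days(steps, seconds_per_step):
    return steps * seconds_per_step / 86400.0

# 8192^3 case: 7 s per step, 100,000 steps
print(f"8192^3 : {days(100_000, 7.0):.1f} days")            # ~8.1 (slide: 8)
print(f"  implied work per step: {7.0 * SUSTAINED:.2e} flops")

# 16384^3 case: 60 s per step, 125,000 steps
print(f"16384^3: {days(125_000, 60.0) / 30.0:.1f} months")  # ~2.9 (slide: 3)
```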
26. Summary
- Spectral methods are a very useful algorithm for evaluating HPC systems.
- In this sense, the trend in HPC system architecture is getting worse:
  - even if the peak performance of a system is very high,
  - we cannot expect high sustained performance,
  - and it may take a long time to finish a simulation due to very slow data transfer between nodes.
- Can we discard spectral methods and change the algorithm? Or do we have to
  - put strong pressure on the computer architecture community, and
  - think of an international collaboration to develop a supercomputer system that fits turbulence studies?
- I would think of such an HPC system as a particle accelerator, like CERN's.