Title: The Parallel Models of Coronal Polarization Brightness Calculation
The Parallel Models of Coronal Polarization Brightness Calculation
Outline
- Introduction
- pB Calculation Formula
- Serial pB Calculation Process
- Parallel pB Calculation Models
- Conclusion
Part I. Introduction
- Space weather forecasting needs an accurate solar wind model for the solar atmosphere and interplanetary space. The global model of the corona and heliosphere is the basis of numerical space weather forecasting and the basis for interpreting various relevant observational relations.
- Meanwhile, three-dimensional numerical magnetohydrodynamic (MHD) simulation is one of the most common numerical methods for studying the corona and solar wind.
Part I. Introduction
- Besides, converting the computed coronal electron density into coronal polarization brightness (pB) is the key method for comparison with observations, and is important for validating MHD models.
- Because of the massive data volume and the complexity of the pB model, the computation takes too long on a single CPU (or core) to visualize the pB data in near real time.
Part I. Introduction
- According to the characteristics of the CPU/GPU computing environment, we analyze the pB conversion algorithm, implement two parallel models of pB calculation with MPI and CUDA, and compare the efficiency of the two models.
Part II. pB Calculation Formula
- pB is derived from photospheric radiation scattered by coronal electrons. It can be used in the inversion of the coronal electron density and to validate numerical models. Taking limb darkening into account, the pB of a small coronal volume element is given by Eqs. (1)-(3).
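- Eqs. (1)-(3) are not reproduced in this text version. As a reference sketch only, a standard Thomson-scattering formulation of pB with limb darkening (van de Hulst 1950; Billings 1966), which the slide's formulas presumably resemble in form but need not match exactly, is

  \[ pB(\rho) \;\propto\; \int_{\mathrm{LOS}} N_e(r)\,\bigl[(1-u)\,A(r) + u\,B(r)\bigr]\,\frac{\rho^{2}}{r^{2}}\,\mathrm{d}z , \]

  where \(\rho\) is the projected distance of the line of sight from Sun centre, \(r\) the heliocentric distance of the scattering element, \(u\) the limb-darkening coefficient, and, with \(\sin\Omega = R_\odot / r\),

  \[ A(r) = \cos\Omega\,\sin^{2}\Omega , \qquad
     B(r) = -\tfrac{1}{8}\Bigl[\,1 - 3\sin^{2}\Omega - \frac{\cos^{2}\Omega}{\sin\Omega}\,\bigl(1 + 3\sin^{2}\Omega\bigr)\,\ln\frac{1+\sin\Omega}{\cos\Omega}\Bigr]. \]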
Part II. pB Calculation Formula
- The polarization brightness image to be compared with coronagraph observations can be generated by integrating the electron density along the line of sight.
- Density integration process of the pB calculation (figure)
Part III. Serial pB Calculation Process
- The steps of the serial pB calculation model on the CPU with the experimental data are shown below.
- The serial process of pB calculation (figure)
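- The flow chart of these steps is not reproduced in this text version. Purely as an illustration, a serial loop with the structure implied by the surrounding slides (a 321x321 image, 481 integration points per line of sight, an interpolated density times a scattering weight accumulated per step) could look like the following C++ sketch; the density and weight functions are simple stand-ins, not the original G95/Visual Studio implementation.

    #include <cstdio>
    #include <cmath>
    #include <vector>

    // Grid sizes taken from the slides: a 321x321 pB image and 481 points
    // along each line of sight (the z direction here).
    const int NX = 321, NY = 321, NZ = 481;

    // Stand-in for interpolating the (r, theta, phi) electron density onto a
    // Cartesian point; a simple power-law profile replaces the real data.
    double density(double x, double y, double z) {
        double r = std::sqrt(x * x + y * y + z * z) + 1.0;   // avoid r = 0
        return 1.0e8 / (r * r * r);
    }

    // Stand-in for the Thomson-scattering / limb-darkening weight of Eqs. (1)-(3).
    double scatter_weight(double x, double y, double z) {
        double r2 = x * x + y * y + z * z + 1.0;
        return (x * x + y * y) / r2;    // ~ sin^2 of the angle between LOS and radius
    }

    int main() {
        std::vector<double> pB(NX * NY, 0.0);
        const double h = 0.02, dz = 0.02;                    // grid spacing, arbitrary units
        for (int j = 0; j < NY; ++j)
            for (int i = 0; i < NX; ++i) {
                double x = (i - NX / 2) * h, y = (j - NY / 2) * h, sum = 0.0;
                for (int k = 0; k < NZ; ++k) {               // integrate along the line of sight
                    double z = (k - NZ / 2) * dz;
                    sum += density(x, y, z) * scatter_weight(x, y, z) * dz;
                }
                pB[j * NX + i] = sum;
            }
        std::printf("pB at pixel (80, 160): %g\n", pB[160 * NX + 80]);
        return 0;
    }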
Part III. Serial pB Calculation Process
- Following the serial process of pB calculation above, we implement it with G95 on Linux and with Visual Studio 2005 on Windows XP, respectively.
- By measuring the time cost of each step, we find that the most time-consuming part of the whole program is the calculation of the pB values, which accounts for 98.05% and 99.05% of the total time cost, respectively.
Part III. Serial pB Calculation Process
- Therefore, to improve performance and meet the demand for coronal polarization brightness in near real time, we should optimize the calculation of the pB values.
- Since the density integration along the line of sight is independent for each point over the solar limb, parallel computation is very well suited to the pB calculation.
Part IV. Parallel pB Calculation Models
- Currently, parallelized MHD numerical calculation is mainly based on MPI.
- With the development of high-performance computing, using the GPU architecture for computation-intensive problems shows clear advantages.
- Given this situation, implementing parallel MHD numerical calculation on the GPU promises an efficient parallel solution.
- We implement two parallel models, based on MPI and CUDA respectively.
Part IV. Parallel pB Calculation Models
- Experiment Environment
- Experimental Data
- 42×42×82 (r, θ, φ) density data (den)
- 321×321×481 (x, y, z) Cartesian coordinate grid
- 321×321 pB values will be generated.
- Hardware
- Intel(R) Xeon(R) E5405 CPU @ 2.00 GHz (8 cores)
- 1GB memory
- NVIDIA Quadro FX 4600 graphics card, 760 MB GDDR3 SDRAM global memory
- (It uses the G80 core architecture, with 12 MPs and 128 SPs.)
Part IV. Parallel pB Calculation Models
- Experiment Environment
- Compiling Environment
- CUDA-based parallel model
- Visual Studio 2005 on Windows XP
- CUDA 1.1 SDK
- MPI-based parallel model
- G95 on Linux
- MPICH2
Part IV. Parallel pB Calculation Models
- MPI-based Parallelized Implementation
- In the MPI environment, the decomposition of the computing domain into sub-domains used in the experiment is shown below.
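- The decomposition figure is not reproduced in this text version. As an illustration only, one plausible scheme, a 1-D block decomposition of the 321x321 pB image rows across MPI ranks with a gather on rank 0, is sketched below in C++/MPI; the original code is G95 Fortran, and the actual decomposition used in the experiment may differ. All names here are illustrative.

    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    // Hypothetical 1-D decomposition of the 321x321 pB image across MPI ranks:
    // each rank integrates a contiguous block of image rows, rank 0 gathers them.
    const int NX = 321, NY = 321;

    // Stand-in for the per-pixel line-of-sight integration of the serial code.
    double compute_pB_pixel(int i, int j) { return double(i + j); }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        // Block decomposition of image rows; the remainder goes to the low ranks.
        int base = NY / size, rem = NY % size;
        int nrows = base + (rank < rem ? 1 : 0);
        int row0  = rank * base + (rank < rem ? rank : rem);

        std::vector<double> local(nrows * NX);
        for (int j = 0; j < nrows; ++j)
            for (int i = 0; i < NX; ++i)
                local[j * NX + i] = compute_pB_pixel(i, row0 + j);

        // Gather the sub-images on rank 0 (row counts differ, so use Gatherv).
        std::vector<int> counts(size), displs(size);
        for (int r = 0, off = 0; r < size; ++r) {
            int nr = base + (r < rem ? 1 : 0);
            counts[r] = nr * NX;
            displs[r] = off;
            off += nr * NX;
        }
        std::vector<double> image(rank == 0 ? NX * NY : 0);
        MPI_Gatherv(local.data(), nrows * NX, MPI_DOUBLE,
                    image.data(), counts.data(), displs.data(), MPI_DOUBLE,
                    0, MPI_COMM_WORLD);

        if (rank == 0) std::printf("pB image assembled: %d x %d\n", NX, NY);
        MPI_Finalize();
        return 0;
    }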
Part IV. Parallel pB Calculation Models
- MPI-based Parallelized Implementation
Part IV. Parallel pB Calculation Models
- MPI-based Parallelized Implementation
- The final result shows that the MPI-based parallel model reaches a speed-up of 5.8 in total running time. As the experiment runs on a platform with 8 CPU cores, this speed-up ratio is close to the theoretical value.
- Meanwhile, the result shows that the MPI-based parallel solution balances processor utilization against inter-processor communication.
Part IV. Parallel pB Calculation Models
- CUDA-based Parallelized Implementation
- According to the serial pB calculation process and the CUDA architecture, the calculation part should be placed in the kernel function to implement the parallel program.
- Since the density interpolation and the cumulative summation involved in each pB value are independent of those of the other pB values, we can use multiple threads in CUDA to calculate the pB values, with each thread calculating one pB value.
Part IV. Parallel pB Calculation Models
- CUDA-based Parallelized Implementation
- However, the number of pB values to be calculated is much larger than the number of threads available on the GPU, so each thread should calculate multiple pB values. Under the experimental conditions, the thread number is set to 256 per block so as to maximize the use of computing resources.
- The block number depends on the ratio of the number of pB values to the thread number. In addition, since global-memory access latency is high, some independent data can be placed in shared memory to reduce data access time.
Part IV. Parallel pB Calculation Models
- CUDA-based Parallelized Implementation
- The size of the data placed in shared memory is about 7 KB, less than the 16 KB provided by the GPU, so this parallel solution is feasible.
- Moreover, the data-length array is read-only and accessed very frequently, so as a further optimization it is migrated from shared memory into constant memory to improve its access efficiency.
- The CUDA-based parallel calculation process is shown below.
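- The process figure referred to above is not reproduced in this text version. Purely as an illustration of the strategy described on the previous slides (256 threads per block, each thread handling several pB values, a read-only length array in constant memory), a minimal CUDA sketch follows; the per-pixel integration, the pre-interpolated density layout, and all names are placeholders rather than the original implementation.

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    // Image and line-of-sight sizes taken from the slides.
    const int NX = 321, NY = 321, NZ = 481;

    // Hypothetical read-only "data-length" array kept in constant memory, standing
    // in for the array the slides migrate from shared memory to constant memory.
    __constant__ float c_seg_len[NZ];

    // Each thread computes pB values in a grid-stride loop, so a launch with fewer
    // threads than pixels still covers all 321x321 pB values.
    __global__ void pB_kernel(const float *den, float *pB, int npix)
    {
        for (int p = blockIdx.x * blockDim.x + threadIdx.x; p < npix;
             p += gridDim.x * blockDim.x) {
            float sum = 0.0f;
            // Placeholder integration: pre-interpolated density times segment length.
            for (int k = 0; k < NZ; ++k)
                sum += den[(size_t)p * NZ + k] * c_seg_len[k];
            pB[p] = sum;
        }
    }

    int main()
    {
        const int npix = NX * NY;
        std::vector<float> h_den((size_t)npix * NZ, 1.0f), h_len(NZ, 0.01f), h_pB(npix);

        float *d_den = 0, *d_pB = 0;
        cudaMalloc(&d_den, h_den.size() * sizeof(float));
        cudaMalloc(&d_pB, npix * sizeof(float));
        cudaMemcpy(d_den, h_den.data(), h_den.size() * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpyToSymbol(c_seg_len, h_len.data(), NZ * sizeof(float));

        // 256 threads per block as in the slides; a modest block count means each
        // thread loops over several pB values via the grid-stride loop.
        pB_kernel<<<48, 256>>>(d_den, d_pB, npix);
        cudaMemcpy(h_pB.data(), d_pB, npix * sizeof(float), cudaMemcpyDeviceToHost);

        std::printf("pB[0] = %g\n", h_pB[0]);
        cudaFree(d_den);
        cudaFree(d_pB);
        return 0;
    }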
Part IV. Parallel pB Calculation Models
Part IV. Parallel pB Calculation Models
- Experiment results
- The pB calculation time of the two models is shown in Table 1.
- Table 1. The pB calculation time of the serial and parallel models and their speed-up ratios

                                                MPI (G95)   CUDA (Visual Studio 2005)
    pB calculation time of serial models (s)       32.403       48.938
    pB calculation time of parallel models (s)      5.053        1.536
    Speed-up ratio                                   6.41        31.86
Part IV. Parallel pB Calculation Models
- Experiment results
- The total performance of the two models is shown in Table 2.
- Table 2. The total running time of the serial and parallel models, and the ratio of the MPI running time to the CUDA running time

                       MPI (G95) (s)   CUDA (Visual Studio 2005) (s)   Running-time ratio (MPI / CUDA)
    Serial models           33.05             49.406                        0.67
    Parallel models          5.70              2.004                        2.84
Part IV. Parallel pB Calculation Models
- Experiment results
- Finally, we draw the coronal polarization brightness image, shown below, using the calculated data.
Conclusion
- Under the same environment, the pB calculation of the MPI-based parallel model costs 5.053 seconds while its serial model costs 32.403 seconds; the speed-up is 6.41.
- The pB calculation of the CUDA-based parallel model costs 1.536 seconds while its serial model costs 48.938 seconds; the speed-up is 31.86.
- In total running time, the CUDA-based parallel model is 2.84 times faster than the MPI-based parallel model.
Conclusion
- We find that the CUDA-based parallel model is more suitable for the pB calculation, and it provides a better solution for post-processing and visualizing MHD numerical calculation results.