Title: Progress in the Parallelization of the R-matrix I codes
Darío M. Mitnik, N. R. Badnell, D. C. Griffin, M. S. Pindzola
Good News
- We have an (almost) complete parallel version of the R-matrix I package.
Glossary
[Slides 3-9: glossary content not transcribed]
The R-matrix I package
- Inner Region
  - STG1 calculates the orbital basis and all radial integrals.
  - STG2 calculates the LS-coupling matrix elements, solves the N-electron problem, and sets up the (N+1)-electron Hamiltonian.
  - STG3 diagonalizes the (N+1)-electron Hamiltonian in the continuum basis.
- Outer Region
  - STGF solves the external-region coupled equations.
  - STGICF calculates level-to-level collision strengths by performing an intermediate-coupling frame transformation.
Why parallel codes?
Size of the (N+1)-electron Hamiltonian: MXMAT = MZCHF x MZNR2 + MZNC2
Example: 158 x 50 + 100 = 8000, so a single 8000 x 8000 double-precision matrix needs 512 MB.
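The size estimate above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative, and the 8 bytes per element is an assumption (double precision) that happens to reproduce the 512 MB figure on the slide.

```python
def hamiltonian_dimension(mzchf, mznr2, mznc2):
    """Dimension of the (N+1)-electron Hamiltonian: MXMAT = MZCHF*MZNR2 + MZNC2."""
    return mzchf * mznr2 + mznc2


def dense_storage_bytes(n, bytes_per_element=8):
    """Memory for a full n x n matrix (assumed 8-byte reals)."""
    return n * n * bytes_per_element


mxmat = hamiltonian_dimension(158, 50, 100)
size_mb = dense_storage_bytes(mxmat) / 1.0e6
print("MXMAT =", mxmat, " storage (MB) =", size_mb)
```

A single matrix of this size already strains one node's memory, which is the motivation for distributing it over a process grid.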
Four basic steps required to call a ScaLAPACK routine
- Initialize the process grid
- Distribute the matrix on the grid
- Call ScaLAPACK routine
- Release the process grid
Initialization of the process grid
- call BLACS_GRIDINIT(ictxt,'R',nprow,npcol)
- call BLACS_GRIDINFO(ictxt,nprow,npcol,myrow,mycol)
Example: nprow = 2, npcol = 3
[Figure: 2 x 3 process grid, with rows labelled row 0 and row 1]
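With the 'R' (row-major) ordering, processes are numbered along row 0 first and then along row 1. The sketch below reproduces that numbering for the 2 x 3 example; `grid_coords` and `grid_layout` are illustrative helper names, not BLACS routines.

```python
def grid_coords(rank, nprow, npcol):
    """Return (myrow, mycol) of a process in a row-major nprow x npcol grid."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank is outside the process grid")
    return divmod(rank, npcol)


def grid_layout(nprow, npcol):
    """List the process numbers held in each grid row."""
    return [[r * npcol + c for c in range(npcol)] for r in range(nprow)]


layout = grid_layout(2, 3)
for myrow, ranks in enumerate(layout):
    print("row", myrow, ranks)
```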
Matrix Distribution
Examples of data layouts:
- 1-D block column distribution
- 1-D cyclic column distribution
- 1-D block-cyclic column distribution
- 2-D block-cyclic distribution
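The three 1-D column layouts differ only in the rule that assigns a column to a process. The following sketch states each rule for 0-based column indices; the sizes (8 columns, 2 processes, block size 2) are made up for illustration.

```python
def owner_block(j, n, p):
    """1-D block: each process gets one contiguous chunk of ceil(n/p) columns."""
    chunk = -(-n // p)  # ceiling division
    return j // chunk


def owner_cyclic(j, p):
    """1-D cyclic: columns are dealt out one at a time."""
    return j % p


def owner_block_cyclic(j, nb, p):
    """1-D block-cyclic: blocks of nb columns are dealt out in turn."""
    return (j // nb) % p


n, p, nb = 8, 2, 2  # illustrative sizes only
block = [owner_block(j, n, p) for j in range(n)]
cyclic = [owner_cyclic(j, p) for j in range(n)]
block_cyclic = [owner_block_cyclic(j, nb, p) for j in range(n)]
print(block, cyclic, block_cyclic, sep="\n")
```

The 2-D block-cyclic layout applies the block-cyclic rule independently to rows and to columns.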
The 2-D Block-Cyclic Distribution
Example: a 9 x 9 system of linear equations
1) Partition the matrix A into blocks with row and column block sizes MB = NB = 2.
2) Map matrix A onto a 6-process grid: Prow = 2, Pcol = 3.
3) Final shape of the distributed matrix A.
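The three steps can be followed numerically for the 9 x 9 example. This is a sketch with 0-based indices and the first block assigned to process row 0 and process column 0 (an assumption); `local_count` plays the role that the ScaLAPACK tool routine NUMROC plays in Fortran, but it is written here from scratch.

```python
def owner(i, nb, nprocs):
    """Process coordinate that owns global index i for block size nb."""
    return (i // nb) % nprocs


def local_count(n, nb, iproc, nprocs):
    """How many of the n global indices land on process coordinate iproc."""
    return sum(1 for i in range(n) if owner(i, nb, nprocs) == iproc)


N, MB, NB, NPROW, NPCOL = 9, 2, 2, 2, 3

# Step 2: which (process row, process column) holds each element A(i, j).
owner_map = [[(owner(i, MB, NPROW), owner(j, NB, NPCOL)) for j in range(N)]
             for i in range(N)]

# Step 3: shape of the local piece of A on every process.
local_shapes = {
    (pr, pc): (local_count(N, MB, pr, NPROW), local_count(N, NB, pc, NPCOL))
    for pr in range(NPROW)
    for pc in range(NPCOL)
}
for coords in sorted(local_shapes):
    print(coords, local_shapes[coords])
```

The local pieces have different shapes, so each process has to size its own local array before the matrix is read in.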
The LAPACK subroutine ssyev:
  CALL SSYEV(jobz,uplo,N,A,lda,W,work,lwork,info)
The ScaLAPACK subroutine pssyev:
  CALL PSSYEV(jobz,uplo,N,A,ia,ja,desca,W,Z,iz,jz,descz,work,lwork,info)
The array descriptor DESCA:
  CALL DESCINIT(desca,M,N,MB,NB,rsrc,csrc,ictxt,mxllda,info)
is equivalent to
  DESCA(1) = 1
  DESCA(2) = ictxt
  DESCA(3) = M
  DESCA(4) = N
  DESCA(5) = MB
  DESCA(6) = NB
  DESCA(7) = rsrc
  DESCA(8) = csrc
  DESCA(9) = mxllda
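To make the descriptor layout concrete, the sketch below builds the nine entries for the 9 x 9, MB = NB = 2 example as seen from process row 0, which holds 5 local rows. `make_descriptor` is a hypothetical Python stand-in for illustration, not the ScaLAPACK routine, and the context handle 0 is a placeholder value.

```python
def make_descriptor(m, n, mb, nb, rsrc, csrc, ictxt, lld):
    """Return the entries DESCA(1)..DESCA(9) in the order listed above."""
    if lld < 1:
        raise ValueError("the local leading dimension must be at least 1")
    dtype_dense = 1  # DESCA(1) = 1 marks a dense block-cyclic matrix
    return [dtype_dense, ictxt, m, n, mb, nb, rsrc, csrc, lld]


# 9 x 9 matrix, 2 x 2 blocks, first block on process (0, 0), 5 local rows.
desca = make_descriptor(m=9, n=9, mb=2, nb=2, rsrc=0, csrc=0, ictxt=0, lld=5)
print(desca)
```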
Diagonalization Timing
General Features of STG3
- MAIN
- call STG3 ! open files
- call STG3RD ! read dstg3
- call TAPERD ! read STG2H.DAT
- call RSCT ! diagonalization
- if (more) go to 10
General Features of PSTG3 (simple approach)
- MAIN
- call STG3 ! open files
- call STG3RD ! read dstg3
- call TAPERD ! read STG2H.DAT
- call RSCT ! diagonalization
- call mpi_bcast(H)
- call diag_parallel(H)
- call printeigenv
- call readeigenv
- if (more) go to 10
General Features of PSTG3
- MAIN
- call STG3 ! open files
- call STG3RD ! read dstg3
- call initctx
- call Hsize
- allocate H
- call TAPERD ! read STG2H.DAT
- call locH
- call RSCT ! diagonalization
- call diag_parallel(H)
- if (more) go to 10
Surface amplitudes calculation
C  DIAGONALIZE AND LOOP OVER ALL EIGENVECTORS (N0)
      DO 210 N0=1,MNP2
      ...
C  DETERMINE THE SURFACE AMPLITUDES, WMAT(I,J)
      DO 190 J=1,NCHAN
        L = L2P(J) + 1
        SUM = 0.0D0
        DO 180 I=1,NOTERM
          SUM = SUM + X(I+M)*ENDS(I,L)
  180   CONTINUE
        M = M + NOTERM
        WMAT(J,N0) = SUM
  190 CONTINUE
Surface amplitudes calculation (parallel)
c . . . determine the surface amplitudes, wmat(channel,eigenvalue)
      isizemax = noterm*nchan
      do 205 irow=1,irowsize
        do 200 icol=1,icolsize
c . . . identifying irow,icol -> I,J
          call demapping(irow,icol,I,J)
c . . . wmat is constructed only from isizemax terms
          if (I.gt.isizemax) go to 200
c . . . wmat(channel,eigvn) = sum vector(icont,eigvn)*ends(icont,channel)
          jchan = (I-1)/noterm + 1
          icont = mod(I,noterm)
          if (icont.eq.0) icont = noterm
          l = l2p(jchan) + 1
          wmatl(jchan,J) = wmatl(jchan,J) + vector(irow,icol)*ends(icont,l)
  200   continue
  205 continue
c . . . sum the local wmatl to the global wmat
      call mpi_reduce(wmatl,wmat,mzchf*mxmat,mpi_real8,
     &                mpi_sum,0,mpi_comm_world,ierr)
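The index arithmetic of the parallel loop can be checked against the serial loop without MPI. In this sketch every "process" receives an arbitrary subset of the global (I, J) elements, accumulates its local `wmatl`, and the final element-wise sum stands in for `mpi_reduce` with `mpi_sum`. The sizes, the test data, and the ownership rule `(I + J) mod nproc` are made up; the real code uses the block-cyclic layout and indexes the local array `vector(irow,icol)`.

```python
def serial_wmat(vector, ends, l2p, noterm, nchan):
    """Serial loop: wmat[j][n0] = sum_i vector[m+i][n0] * ends[i][l2p[j]]."""
    neig = len(vector[0])
    wmat = [[0.0] * neig for _ in range(nchan)]
    for n0 in range(neig):
        m = 0
        for j in range(nchan):
            l = l2p[j]
            wmat[j][n0] = sum(vector[m + i][n0] * ends[i][l] for i in range(noterm))
            m += noterm
    return wmat


def local_wmat(owned, vector, ends, l2p, noterm, nchan):
    """Partial sums from the 1-based global elements (I, J) one process owns."""
    neig = len(vector[0])
    wmatl = [[0.0] * neig for _ in range(nchan)]
    isizemax = noterm * nchan
    for big_i, big_j in owned:
        if big_i > isizemax:          # bound-state rows do not contribute
            continue
        jchan = (big_i - 1) // noterm + 1
        icont = big_i % noterm
        if icont == 0:
            icont = noterm
        l = l2p[jchan - 1]
        wmatl[jchan - 1][big_j - 1] += vector[big_i - 1][big_j - 1] * ends[icont - 1][l]
    return wmatl


noterm, nchan, nbound, nproc = 3, 2, 1, 4      # illustrative sizes
mxmat = noterm * nchan + nbound
l2p = [0, 1]
# Integer-valued test data, so sums are exact in any order.
vector = [[float((7 * i + j) % 5 - 2) for j in range(mxmat)] for i in range(mxmat)]
ends = [[float(i + 2 * l + 1) for l in range(2)] for i in range(noterm)]

owned = {p: [] for p in range(nproc)}
for big_i in range(1, mxmat + 1):
    for big_j in range(1, mxmat + 1):
        owned[(big_i + big_j) % nproc].append((big_i, big_j))

wmat = [[0.0] * mxmat for _ in range(nchan)]   # result of the "reduce"
for p in range(nproc):
    part = local_wmat(owned[p], vector, ends, l2p, noterm, nchan)
    for j in range(nchan):
        for n in range(mxmat):
            wmat[j][n] += part[j][n]

reference = serial_wmat(vector, ends, l2p, noterm, nchan)
print(wmat == reference)
```

Because each contribution depends only on the global indices (I, J), the result does not depend on how the elements are spread over the processes.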
Pseudoresonance elimination (T. W. Gorczyca et al., PRA 52, 3877 (1995))
Nonphysical pseudoresonances arise from the (N+1)-electron bound states that are routinely kept within the close-coupling expansion. The transformation (and reduction) of the bound portion of the (N+1)-electron basis eliminates this class of pseudoresonances.
H is a reduced (N+1)-electron Hamiltonian.
Hamiltonian size reduction
Example
Input File
Parallelization of STGF
C  START ENERGY LOOP
C
      DO 50 IE=1,MXE
      ..........
   50 CONTINUE
Parallelization of PSTGF
[Figure label: omega (global)]
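The slides do not say how PSTGF assigns energy points to processes. One common choice, shown here purely as an assumption, is to give each process a contiguous block of the `MXE` points and then collect the per-energy results into a global `omega` array. `energy_range` and `placeholder_omega` are hypothetical names; the placeholder formula has no physical meaning.

```python
def energy_range(iam, nproc, mxe):
    """Contiguous block of 0-based energy indices handled by process iam."""
    base, extra = divmod(mxe, nproc)
    start = iam * base + min(iam, extra)
    stop = start + base + (1 if iam < extra else 0)
    return range(start, stop)


def placeholder_omega(ie):
    """Stand-in for the work done inside the energy loop at point ie."""
    return 1.0 / (ie + 1)


mxe, nproc = 10, 4                     # illustrative sizes
blocks = [list(energy_range(iam, nproc, mxe)) for iam in range(nproc)]

omega_global = [None] * mxe            # filled as if gathered from all processes
for iam in range(nproc):
    for ie in blocks[iam]:
        omega_global[ie] = placeholder_omega(ie)

print(blocks)
```

Since the energy points are independent of one another, no communication is needed inside the loop; only the final collection step involves all processes.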
General Features of STGICF
Intermediate Coupling Frame Transformation (ICFT): D. C. Griffin, N. R. Badnell and M. S. Pindzola, J. Phys. B 31, 3713 (1998)
$K = \mathcal{K}_{OO} - \mathcal{K}_{OC}\,[\mathcal{K}_{CC} + \tan(\pi\nu)]^{-1}\,\mathcal{K}_{CO}$
$S = \mathcal{S}_{OO} - \mathcal{S}_{OC}\,[\mathcal{S}_{CC} - \exp(-2\pi i\nu)]^{-1}\,\mathcal{S}_{CO}$
$K$: physical K-matrix
$\mathcal{K}$: unphysical K-matrices, calculated on a coarse energy mesh
O: open channels
C: closed channels
$\nu$: effective quantum number
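For one open and one closed channel, the matrices in the K-matrix formula reduce to numbers, which makes the role of the effective quantum number easy to see. The sketch below evaluates the formula for made-up values of the unphysical K-matrix elements; `physical_k` is an illustrative name.

```python
import math


def physical_k(k_oo, k_oc, k_cc, k_co, nu):
    """Scalar form of K = K_OO - K_OC [K_CC + tan(pi*nu)]^-1 K_CO."""
    return k_oo - k_oc * k_co / (k_cc + math.tan(math.pi * nu))


# Illustrative numbers only: one open and one closed channel.
k_oo, k_oc, k_cc, k_co = 0.5, 0.2, 0.1, 0.2
k_phys = physical_k(k_oo, k_oc, k_cc, k_co, nu=2.25)   # tan(pi*nu) = 1
print(k_phys)
```

The unphysical elements vary slowly with energy, which is why a coarse mesh suffices for them, while the rapid resonance structure enters through tan(pi*nu) on the fine mesh.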
General Features of PSTGICF
[Figure: unphysical K-matrix on the coarse mesh; axis: Energy]
General Features of PSTGICF
c  loop over the first nmod coarse energy mesh points, where nmod
c  is a multiple of the number of processors
      nrmd = mod(numk,nproc)
      nmod = numk - nrmd
c  first calculate the remainder of coarse energy mesh points
      do 600 iek=1,nmod
        modk = mod(iek-1,nproc)
        if (iam.ne.modk) go to 600
        call readkmtx(iek)
        call mpi_send( K2 -> inext)  --  call mpi_recv( K1 <- iprev)
        call kmtrjk(iek,iwj)  --  call kmtric(iek,iwj)
c  loop on fine energy mesh
        do 500 iee=iee0,mxe
          if (emesh(iee) outside [iek,iek+1]) go to 500
c  loop in partial waves
          do 450 iwj=1,npwj
            call interic(iee,iwj,iek)
            call omgic(iee,iwj)
  450     continue
  500   continue
  600 continue
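The bookkeeping at the top of this loop can be sketched as follows: the coarse mesh is split into `nmod` points that divide evenly among the processes and `nrmd` leftover points, and point `iek` goes to process `mod(iek-1, nproc)`. The ring used for `inext` and `iprev` below is an assumption; the slide names the neighbours but does not define them.

```python
def coarse_mesh_plan(numk, nproc):
    """Return (nrmd, nmod, owners), where owners maps 1-based iek to a process."""
    nrmd = numk % nproc
    nmod = numk - nrmd
    owners = {iek: (iek - 1) % nproc for iek in range(1, nmod + 1)}
    return nrmd, nmod, owners


def ring_neighbours(iam, nproc):
    """Assumed ring topology: (inext, iprev) wrap around at the ends."""
    return (iam + 1) % nproc, (iam - 1) % nproc


nrmd, nmod, owners = coarse_mesh_plan(numk=10, nproc=4)
points_of_0 = [iek for iek, p in owners.items() if p == 0]
print(nrmd, nmod, points_of_0, ring_neighbours(0, 4))
```

With this assignment, consecutive coarse points sit on neighbouring processes, so the K-matrix at an adjacent mesh point is one send and one receive away.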
General Features of STGICF
Electron collision cross section
$S = (1 - iK)^{-1}(1 + iK)$
$\Omega_{ij} = \frac{g}{2} \sum |T_{ij}|^2$
$\sigma_{ij} = \frac{\pi a_0^2}{k_i^2\, g_i}\,\Omega_{ij}$
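Two of these relations are easy to exercise numerically. The sketch below uses the single-channel case, where K is a real number and S must have unit modulus, and converts a collision strength into a cross section. It assumes that `k_i^2` is the incident electron energy in Rydbergs and uses a rounded Bohr radius; the function names are illustrative.

```python
import math

A0_CM = 5.29177e-9  # Bohr radius in cm (rounded)


def s_from_k(k):
    """Single-channel S = (1 - iK)^-1 (1 + iK) for a real K."""
    return (1 + 1j * k) / (1 - 1j * k)


def cross_section_cm2(omega, k_sq_ryd, g_i):
    """sigma_ij = pi a0^2 Omega_ij / (k_i^2 g_i), with k_i^2 in Rydbergs (assumed)."""
    return math.pi * A0_CM ** 2 * omega / (k_sq_ryd * g_i)


s = s_from_k(0.7)
print(abs(s))                          # unit modulus for real K
print(cross_section_cm2(omega=1.0, k_sq_ryd=1.0, g_i=1))
```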
Conclusions
- We have a working parallel version of the R-matrix I package.
- PSTG3 uses ScaLAPACK subroutines for the diagonalization of the (N+1)-electron Hamiltonian.
- STGF is parallelized over energy points.
- STGICF is parallelized over (coarse) energy points.
- The codes show good scalability.
- Many other features still need to be implemented (photoionization, non-exchange codes, etc.).
- The (inner-outer) symmetry loops still need to be tested.
- A strategy has to be decided for future machines.
- Other libraries and tools, such as Global Arrays, MPI-2, etc., should be studied.