Progress in the Parallelization of the R-matrix I codes

Description:

However, since single columns are stored, neither Level 2 nor Level 3 BLAS can be used. ... parallelism in any column, and calls the BLAS on local subarrays. ...

Provided by: iafe
1
Progress in the Parallelization of the R-matrix
I codes
Darío M. Mitnik
N. R. Badnell, D. C. Griffin, M. S. Pindzola
2
Good News
  • We have an (almost) complete parallel version of
    the R-matrix I package

3
Glossary
4
Glossary
5
Glossary
  • observed speedup

6
Glossary
7
Glossary
8
Glossary
9
Glossary
10
The R-matrix I package
  • Inner-Region
  • STG1 calculates the orbital basis and all
    radial integrals.
  • STG2 calculates the LS-coupling matrix elements,
    solves the N-electron problem, and
    sets up the (N+1)-electron Hamiltonian.
  • STG3 diagonalizes the (N+1)-electron
    Hamiltonian in the continuum basis.

11
The R-matrix I package
  • Outer-Region
  • STGF solves the external-region coupled
    equations.
  • STGICF calculates level-to-level collision
    strengths by means of an
    intermediate-coupling frame
    transformation.

12
Why parallel codes?
  • Time
  • Memory

13
Why parallel codes?
Size of the (N+1)-Hamiltonian: MXMAT = MZCHF x MZNR2 + MZNC2
158 x 50 + 100 = 8000, i.e. an 8000 x 8000 matrix of 8-byte reals: 512 MB
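The size estimate on this slide can be checked with a few lines of arithmetic. The Python sketch below (Python is used for illustration only; the R-matrix codes are Fortran) reads the slide as MXMAT = MZCHF x MZNR2 + MZNC2 and assumes a full square matrix stored in 8-byte reals, which is what reproduces the quoted 512 MB. The function name `hamiltonian_bytes` is hypothetical.

```python
def hamiltonian_bytes(mzchf, mznr2, mznc2, bytes_per_element=8):
    """Return (MXMAT, storage in bytes) for a full square matrix of order MXMAT.

    Assumption: the whole matrix is stored, not just one triangle.
    """
    mxmat = mzchf * mznr2 + mznc2
    return mxmat, mxmat * mxmat * bytes_per_element


mxmat, nbytes = hamiltonian_bytes(158, 50, 100)
print(mxmat, nbytes / 1e6, "MB")
```

Because the storage grows with the square of MXMAT, doubling the number of channels roughly quadruples the memory, which is the motivation for distributing the matrix over processes.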
14
Four basic steps required to call a ScaLAPACK
routine
  • Initialize the process grid
  • Distribute the matrix on the grid
  • Call ScaLAPACK routine
  • Release the process grid

15
Initialization of the process grid
  • call BLACS_GRIDINIT(ictxt,'R',nprow,npcol)
  • call BLACS_GRIDINFO(ictxt,nprow,npcol,myrow,mycol)

Example: nprow=2, npcol=3
[Figure: 2 x 3 process grid with process rows 0 and 1]
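To make the grid example concrete, here is a small Python sketch of how six processes could be placed on a 2 x 3 grid. It assumes the row-major ordering that the `R` argument of BLACS_GRIDINIT requests; the helper name `grid_coords` is hypothetical and only mimics what BLACS_GRIDINFO would report as `myrow` and `mycol`.

```python
def grid_coords(rank, nprow, npcol):
    """Row-major placement: ranks fill process row 0 first, then row 1, ..."""
    if not 0 <= rank < nprow * npcol:
        raise ValueError("rank outside the process grid")
    return rank // npcol, rank % npcol  # (myrow, mycol)


nprow, npcol = 2, 3
for myrow in range(nprow):
    ranks = [r for r in range(nprow * npcol)
             if grid_coords(r, nprow, npcol)[0] == myrow]
    print("row", myrow, ":", ranks)
```

With these assumptions, ranks 0 to 2 form process row 0 and ranks 3 to 5 form process row 1.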
16
Matrix Distribution
Examples of data layouts
  • 1-D block column distribution
  • 1-D cyclic column distribution
  • 1-D block-cyclic column distribution
  • 2-D block-cyclic distribution
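
The three 1-D column layouts listed above differ only in the rule that maps a column index to its owning process. The following Python sketch spells out those rules with 0-based indices; the function names and the choice of 8 columns, 2 processes and block size 2 are illustrative assumptions, not taken from the slides.

```python
def owner_block(j, n, nprocs):
    """1-D block: each process gets one contiguous chunk of columns."""
    chunk = -(-n // nprocs)  # ceiling division
    return j // chunk


def owner_cyclic(j, nprocs):
    """1-D cyclic: columns are dealt out one at a time."""
    return j % nprocs


def owner_block_cyclic(j, nb, nprocs):
    """1-D block-cyclic: blocks of nb columns are dealt out in turn."""
    return (j // nb) % nprocs


n, nprocs, nb = 8, 2, 2
print("block       ", [owner_block(j, n, nprocs) for j in range(n)])
print("cyclic      ", [owner_cyclic(j, nprocs) for j in range(n)])
print("block-cyclic", [owner_block_cyclic(j, nb, nprocs) for j in range(n)])
```

The block-cyclic rule reduces to the block layout when the block size equals the chunk size and to the cyclic layout when the block size is 1.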

19
The 2-D Block-Cyclic Distribution Example
9x9 system of linear equations

20
1) Partition the matrix A into blocks with row and column
block sizes MB=NB=2
21
2) Mapping of matrix A onto a 6-process
grid, Prow=2, Pcol=3.
22
3) Final shape of distributed matrix A
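The three steps of the 9x9 example can be reproduced numerically. The Python sketch below uses 0-based indices and assumes that the first block lives on process (0,0); `numroc` here is a simplified stand-in for the ScaLAPACK utility of the same name (without the source-process argument), and `owner` is a hypothetical helper.

```python
def numroc(n, nb, iproc, nprocs):
    """Rows (or columns) of an n-long dimension, cut into blocks of nb,
    that end up on process iproc. Assumes the first block is on process 0."""
    nblocks, extra = divmod(n, nb)
    count = (nblocks // nprocs) * nb   # full rounds of blocks
    rem = nblocks % nprocs             # leftover full blocks
    if iproc < rem:
        count += nb
    elif iproc == rem:
        count += extra                 # trailing partial block
    return count


def owner(i, j, mb, nb, nprow, npcol):
    """Process coordinates that hold global element (i, j)."""
    return (i // mb) % nprow, (j // nb) % npcol


m = n = 9
mb = nb = 2
nprow, npcol = 2, 3
for p in range(nprow):
    print([(numroc(m, mb, p, nprow), numroc(n, nb, q, npcol))
           for q in range(npcol)])
```

Under these assumptions process row 0 holds 5 rows and process row 1 holds 4, while the process columns hold 4, 3 and 2 columns, so the six local arrays together contain all 81 elements.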

23
The LAPACK subroutine ssyev:
CALL SSYEV(jobz,uplo,N,A,lda,W,work,lwork,info)
The ScaLAPACK subroutine pssyev:
CALL PSSYEV(jobz,uplo,N,A,ia,ja,desca,W,Z,iz,jz,descz,work,lwork,info)
The array descriptor DESCA:
CALL DESCINIT(desca,M,N,MB,NB,rsrc,csrc,ictxt,mxllda,info)
is equivalent to
DESCA(1)=1      DESCA(4)=N   DESCA(7)=rsrc
DESCA(2)=ictxt  DESCA(5)=MB  DESCA(8)=csrc
DESCA(3)=M      DESCA(6)=NB  DESCA(9)=mxllda
24
Diagonalization Timing
25
General Features of STG3
  • MAIN
  • call STG3 ! open files
  • call STG3RD ! read dstg3
  • call TAPERD ! read STG2H.DAT
  • call RSCT ! diagonalization
  • if (more) go to 10

26
General Features of PSTG3 (simple approach)
  • MAIN
  • call STG3 ! open files
  • call STG3RD ! read dstg3
  • call TAPERD ! read STG2H.DAT
  • call RSCT ! diagonalization
  • call mpi_bcast(H)
  • call diag_parallel(H)
  • call printeigenv
  • call readeigenv
  • if (more) go to 10

27
General Features of PSTG3
  • MAIN
  • call STG3 ! open files
  • call STG3RD ! read dstg3
  • call initctx
  • call Hsize
  • allocate H
  • call TAPERD ! read STG2H.DAT
  • call locH
  • call RSCT ! diagonalization
  • call diag_parallel(H)
  • if (more) go to 10

28
Surface amplitudes calculation
29
Surface amplitudes calculation
  • C DIAGONALIZE AND LOOP OVER ALL EIGENVECTORS (N0)
  • DO 210 N0=1,MNP2
  • ...
  • C DETERMINE THE SURFACE AMPLITUDES, WMAT(I,J)
  • DO 190 J=1,NCHAN
  • L = L2P(J)+1
  • SUM = 0.0D0
  • DO 180 I=1,NOTERM
  • SUM = SUM + X(I+M)*ENDS(I,L)
  • 180 CONTINUE
  • M = M+NOTERM
  • WMAT(J,N0) = SUM
  • 190 CONTINUE
  • 190 CONTINUE
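
For readers who prefer a compact form, the serial loop above amounts to one dot product per channel. The Python sketch below mirrors it with 0-based indices for a single eigenvector. It assumes that the eigenvector stores NOTERM continuum coefficients per channel, channel after channel, and that `ends[i][l]` holds the boundary value of continuum orbital `i` for angular momentum `l`; the function name and the test data are hypothetical.

```python
def surface_amplitudes(x, ends, l2p, noterm):
    """Surface amplitudes w[j] of one eigenvector x.

    x    : eigenvector, channel j occupies x[j*noterm : (j+1)*noterm]
    ends : ends[i][l], boundary value of continuum orbital i for angular
           momentum index l (0-based; the Fortran uses L2P(J)+1)
    l2p  : angular momentum of each channel
    """
    w = []
    m = 0  # offset of the current channel inside x
    for j in range(len(l2p)):
        l = l2p[j]
        w.append(sum(x[m + i] * ends[i][l] for i in range(noterm)))
        m += noterm
    return w


ends = [[1.0, 2.0], [3.0, 4.0]]
print(surface_amplitudes([1.0, 1.0, 1.0, 1.0], ends, [0, 1], noterm=2))
```

In the Fortran loop this calculation is repeated for every eigenvector N0, filling one column of WMAT each time.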

30
Surface amplitudes calculation (parallel)
  • c. . . . determine the surface amplitudes,
    wmat(channel,eigenvalue)
  • isizemax = noterm*nchan
  • do 205 irow=1,irowsize
  • do 200 icol=1,icolsize
  • c. . . . . identifying irow,icol -> I,J
  • call demapping(irow,icol,I,J)
  • c. . . . . wmat is constructed only from isizemax
    terms
  • if (I.gt.isizemax) go to 200
  • c. . . . . wmat(channel,eigvn) = sum
    vector(icont,eigvn)*ends(icont,channel)
  • jchan = (I-1)/noterm + 1
  • icont = mod(I,noterm)
  • if (icont.eq.0) icont=noterm
  • l = l2p(jchan) + 1
  • wmatl(jchan,J) = wmatl(jchan,J) +
    vector(irow,icol)*ends(icont,l)
  • 200 continue
  • 205 continue

c. . . . . sum the local wmatl to the global wmat
call mpi_reduce(wmatl,wmat,mzchf*mxmat,mpi_real8,
mpi_sum,0,mpi_comm_world,ierr)
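The parallel scheme on this slide can be imitated in a single Python process: each simulated process accumulates a local `wmatl` from the eigenvector elements it owns, and an element-wise sum plays the role of `mpi_reduce`. This is only a sketch under stated assumptions: the eigenvector matrix is taken to be 2-D block-cyclic distributed with the first block on process (0,0), indices are 0-based, every simulated process reads from the full matrix for simplicity, and all helper names and test data are hypothetical.

```python
def owned_indices(n, nb, iproc, nprocs):
    """Global indices owned by process iproc in a 1-D block-cyclic layout,
    in local storage order (this is the 'demapping' local -> global)."""
    owned = []
    lidx = 0
    while True:
        g = ((lidx // nb) * nprocs + iproc) * nb + lidx % nb
        if g >= n:
            return owned
        owned.append(g)
        lidx += 1


def local_wmat(vec, ends, l2p, noterm, nb, myrow, mycol, nprow, npcol):
    """Partial sums wmatl[channel][eigenvector] from the elements of vec
    that process (myrow, mycol) owns."""
    n = len(vec)
    nchan = len(l2p)
    isizemax = noterm * nchan          # rows beyond this are bound terms
    wmatl = [[0.0] * n for _ in range(nchan)]
    cols = owned_indices(n, nb, mycol, npcol)
    for i in owned_indices(n, nb, myrow, nprow):
        if i >= isizemax:
            continue
        jchan, icont = divmod(i, noterm)
        for j in cols:
            wmatl[jchan][j] += vec[i][j] * ends[icont][l2p[jchan]]
    return wmatl


def reduce_sum(parts):
    """Element-wise sum of all local arrays (stand-in for mpi_reduce)."""
    nchan, n = len(parts[0]), len(parts[0][0])
    return [[sum(p[c][j] for p in parts) for j in range(n)]
            for c in range(nchan)]


def serial_wmat(vec, ends, l2p, noterm):
    """Reference result computed without any distribution."""
    n = len(vec)
    return [[sum(vec[c * noterm + i][j] * ends[i][l2p[c]]
                 for i in range(noterm)) for j in range(n)]
            for c in range(len(l2p))]


noterm, l2p = 2, [0, 1]
ends = [[1.0, 2.0], [3.0, 4.0]]
n = 5                                   # 4 continuum terms + 1 bound term
vec = [[float(i + 10 * j + 1) for j in range(n)] for i in range(n)]
nprow, npcol, nb = 2, 3, 2
parts = [local_wmat(vec, ends, l2p, noterm, nb, r, c, nprow, npcol)
         for r in range(nprow) for c in range(npcol)]
wmat = reduce_sum(parts)
print(wmat == serial_wmat(vec, ends, l2p, noterm))
```

The point of the design is that no process needs a whole eigenvector: every element contributes to exactly one entry of the local array, and a single reduction assembles the global result.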
31
Pseudoresonance elimination (T. W. Gorczyca et
al., PRA 52, 3877 (1995))
Nonphysical pseudoresonances arise from the
(N+1)-electron bound states that are routinely
kept within the close-coupling expansion. The
transformation (and reduction) of the bound
portion of the (N+1)-electron basis eliminates
this class of pseudoresonances.
32
Pseudoresonance elimination (T. W. Gorczyca et
al., PRA 52, 3877 (1995))
H is a reduced (N+1)-Hamiltonian
33
Pseudoresonance elimination (T. W. Gorczyca et
al., PRA 52, 3877 (1995))
34
Pseudoresonance elimination (T. W. Gorczyca et
al., PRA 52, 3877 (1995))
Hamiltonian size reduction
35
Example
36
Input File
37
Parallelization of STGF
  • C START ENERGY LOOP
  • C
  • DO 50 IE=1,MXE
  • ..........
  • 50 CONTINUE

38
Parallelization of PSTGF
omega (global)
39
Parallelization of PSTGF
40
General Features of STGICF
Intermediate Coupling Frame Transformation
(ICFT) D.C. Griffin, N.R. Badnell and M.S.
Pindzola, J.Phys.B 31, 3713 (1998)
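$\mathcal{K} = K_{OO} - K_{OC} [K_{CC} + \tan(\pi\nu)]^{-1} K_{CO}$
$\mathcal{S} = S_{OO} - S_{OC} [S_{CC} - \exp(-2\pi i\nu)]^{-1} S_{CO}$
$\mathcal{K}$: physical K-matrix
$K$: unphysical K-matrices, calculated on a coarse energy mesh
O: open channels; C: closed channels
$\nu$: effective quantum number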
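A minimal worked case may help with the closed-channel elimination on this slide. The Python sketch below takes the formula as physical K = K_OO - K_OC [K_CC + tan(pi nu)]^-1 K_CO and restricts it to one open and one closed channel, so the matrix inverse becomes a plain division. The restriction to scalars, the function name and the numerical values are all assumptions made for illustration.

```python
import math


def physical_k(k_oo, k_oc, k_co, k_cc, nu):
    """Physical K for one open and one closed channel.

    k_oo, k_oc, k_co, k_cc : elements of the unphysical K-matrix
    nu                     : effective quantum number of the closed channel
    """
    return k_oo - k_oc * k_co / (k_cc + math.tan(math.pi * nu))


# Unphysical K-matrix elements vary slowly with energy, so they are
# held fixed here while nu is scanned.
for nu in (3.10, 3.25, 3.40):
    print(nu, physical_k(0.5, 0.2, 0.2, 0.1, nu))
```

The rapid energy dependence enters only through tan(pi nu), which is why the unphysical K-matrices can be computed on a coarse mesh and the physical ones generated on a fine mesh afterwards.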
41
General Features of STGICF
Intermediate Coupling Frame Transformation
(ICFT) D.C. Griffin, N.R. Badnell and M.S.
Pindzola, J.Phys.B 31, 3713 (1998)
42
General Features of PSTGICF
[Figure: unphysical K-matrix on the coarse mesh, plotted against energy]
43
General Features of PSTGICF
c loop over the first nmod coarse energy mesh points,
c where nmod is a multiple of the number of processors
      nrmd = mod(numk,nproc)
      nmod = numk-nrmd
c first calculate the remainder of coarse energy mesh points
      do 600 iek=1,nmod
        modk = mod(iek-1,nproc)
        if (iam.ne.modk) go to 600
        call readkmtx(iek)
        call mpi_send( K2 -> inext) -- call mpi_recv( K1 <- iprev)
        call kmtrjk(iek,iwj) -- call kmtric(iek,iwj)
c loop on fine energy mesh
        do 500 iee=iee0,mxe
          if (emesh(iee) outside [iek,iek+1]) go to 500
c loop in partial waves
          do 450 iwj=1,npwj
            call interic(iee,iwj,iek)
            call omgic(iee,iwj)
  450     continue
  500   continue
  600 continue
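The work sharing in this loop is a round-robin over coarse energy mesh points: rank `iam` keeps only the points with `mod(iek-1,nproc) = iam`. The Python sketch below reproduces that bookkeeping with the same 1-based `iek` numbering. How the `nrmd` leftover points are assigned is not shown on the slide, so the sketch only lists them; the function names and the choice of 10 points on 4 processes are hypothetical.

```python
def coarse_points_for(iam, numk, nproc):
    """Coarse mesh indices iek (1-based) that rank iam handles in the
    main loop over the first nmod points."""
    nrmd = numk % nproc
    nmod = numk - nrmd
    return [iek for iek in range(1, nmod + 1) if (iek - 1) % nproc == iam]


def remainder_points(numk, nproc):
    """The nrmd points left over after the main loop (assignment not modeled)."""
    nmod = numk - numk % nproc
    return list(range(nmod + 1, numk + 1))


numk, nproc = 10, 4
for iam in range(nproc):
    print("rank", iam, "->", coarse_points_for(iam, numk, nproc))
print("remainder ->", remainder_points(numk, nproc))
```

Because consecutive points land on consecutive ranks, each rank needs the K-matrix of the neighbouring point to interpolate across an interval, which is consistent with the send to `inext` and receive from `iprev` in the loop.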
44
General Features of STGICF
Electron collision cross section
$S = (1 - iK)^{-1} (1 + iK)$
$\Omega_{ij} = \frac{g}{2} \sum |T_{ij}|^2$
$\sigma_{ij} = \frac{\pi a_0^2}{k_i^2 g_i} \Omega_{ij}$
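The last two relations on this slide are easy to evaluate numerically. The Python sketch below assumes the readings S = (1 - iK)^-1 (1 + iK) and sigma_ij = pi a0^2 Omega_ij / (k_i^2 g_i), with k_i^2 given in Rydberg and the result returned in cm^2. It treats a single channel so that S is a complex number rather than a matrix; the function names are hypothetical.

```python
import math

A0_CM = 5.29177210903e-9  # Bohr radius in cm


def s_from_k(k):
    """One-channel S-matrix from a real K; its modulus is 1 (unitarity)."""
    return (1 + 1j * k) / (1 - 1j * k)


def cross_section_cm2(omega, k2_ryd, g_i):
    """Cross section from a collision strength.

    omega  : collision strength Omega_ij (dimensionless)
    k2_ryd : incident electron energy k_i^2 in Rydberg
    g_i    : statistical weight of the initial level
    """
    return math.pi * A0_CM ** 2 * omega / (k2_ryd * g_i)


print(abs(s_from_k(0.7)))
print(cross_section_cm2(omega=1.0, k2_ryd=1.0, g_i=1))
```

The collision strength is symmetric in the initial and final levels, so the same Omega_ij gives the de-excitation cross section when k_i^2 and g_i are replaced by the values for the other level.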
45
Conclusions
  • We have a working version of the R-matrix I
    package
  • PSTG3 uses ScaLAPACK subroutines for the
    diagonalization of the (N+1)-Hamiltonian.
  • STGF is parallelized over energy points.
  • STGICF is parallelized over (coarse) energy points.
  • The codes show good scalability.

46
Conclusions
  • Need implementation of many other features
    (photoionization, non-exchange codes, etc.)
  • Need tests on (inner-outer) symmetry loops
  • Strategy decision for future machines
  • Study of other libraries/tools, like Global
    Arrays, MPI II, etc.