Title: Parallel Multi-Reference Configuration Interaction on JAZZ
1. Parallel Multi-Reference Configuration Interaction on JAZZ
- Ron Shepard (CHM), Mike Minkoff (MCS), Mike Dvorak (MCS)
2. The COLUMBUS Program System
- Molecular Electronic Structure
- Collection of individual programs that communicate through external files:
  1. Atomic-Orbital Integral Generation
  2. Orbital Optimization (MCSCF, SCF)
  3. Integral Transformation
  4. MR-SDCI
  5. CI Density
  6. Properties (energy gradient, geometry optimization)
3. Real Symmetric Eigenvalue Problem
- Use the iterative Davidson Method for the lowest (or lowest few) eigenpairs
- Direct CI: H is not explicitly constructed; the products w = Hv are constructed in operator form
- Matrix dimensions are 10^4 to 10^9
- All floating point calculations are 64-bit
4. Davidson Method
  Generate an initial vector x_1
  MAINLOOP: DO n = 1, NITER
     Compute and save w_n = H x_n
     Compute the nth row and column of G = X^T H X = W^T X
     Compute the subspace Ritz pair: (G - λ1) c = 0
     Compute the residual vector r = W c - λ X c
     Check for convergence using r, c, λ, etc.
     IF (converged) THEN
        EXIT MAINLOOP
     ELSE
        Generate a new expansion vector x_{n+1} from r, λ, v = Xc, etc.
     ENDIF
  ENDDO MAINLOOP
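The sketch below is a minimal dense-matrix version of the loop above, assuming NumPy; the test matrix, tolerance, and diagonal preconditioner are illustrative choices, not taken from the COLUMBUS code, which builds w = H x in operator form and never stores H.

```python
# Minimal dense-matrix sketch of the Davidson loop above (illustrative only).
import numpy as np

def davidson_lowest(A, niter=100, tol=1e-8):
    """Lowest eigenpair of the real symmetric matrix A."""
    n = A.shape[0]
    x = np.zeros(n); x[np.argmin(np.diag(A))] = 1.0   # initial vector x_1
    X = x[:, None]                             # expansion vectors
    W = A @ X                                  # saved products w_n = H x_n
    for _ in range(niter):
        G = X.T @ W                            # subspace matrix G = X^T H X
        lam, C = np.linalg.eigh(G)             # solve (G - lambda*1) c = 0
        lam0, c = lam[0], C[:, 0]              # lowest Ritz pair
        r = W @ c - lam0 * (X @ c)             # residual r = W c - lambda X c
        if np.linalg.norm(r) < tol:            # convergence check
            break
        d = r / (np.diag(A) - lam0 + 1e-12)    # Davidson diagonal preconditioner
        d -= X @ (X.T @ d)                     # orthogonalize against X
        X = np.hstack([X, (d / np.linalg.norm(d))[:, None]])
        W = np.hstack([W, (A @ X[:, -1])[:, None]])
    return lam0, X @ c
```

In PCIUDG the same loop runs with vectors of dimension up to 10^9, segmented and distributed, with w assembled from integrals and coupling coefficients rather than from a stored matrix.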
5. Matrix Elements
- H_mn = <m| H_op |n>
- |n> = φ(r_1)σ_1 φ(r_2)σ_2 ... φ(r_n)σ_n with σ_j = α, β
6. Matrix Elements
- H_mn = Σ_pq h_pq <m|E_pq|n> + (1/2) Σ_pqrs g_pqrs <m|e_pqrs|n>
- h_pq and g_pqrs are computed and stored as arrays (with index symmetry)
- <m|E_pq|n> and <m|e_pqrs|n> are coupling coefficients; these are sparse and are recomputed as needed
7. Matrix-Vector Products
- w = H x
- The challenge is to bring together the different factors in order to compute w efficiently
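As an illustration of the operator-form product, the sketch below accumulates the one-electron part of w = H x from sparse coupling coefficients, assuming they are supplied as (m, n, p, q, value) tuples; the names and data layout are hypothetical, not the COLUMBUS structures.

```python
# Sketch: forming the one-electron contribution to w = H x without building H.
import numpy as np

def sigma_one_electron(coeffs, h, x):
    """w_m += sum_{n,p,q} h_pq <m|E_pq|n> x_n for sparse coefficient tuples."""
    w = np.zeros_like(x)
    for m, n, p, q, val in coeffs:    # val = <m|E_pq|n> (the real code recomputes these as needed)
        w[m] += h[p, q] * val * x[n]  # combine integral, coefficient, and vector element
    return w
```

The two-electron term combines g_pqrs with <m|e_pqrs|n> in the same way; the challenge noted above is organizing these loops so that the integrals, coefficients, and vector segments needed at any moment are brought together efficiently.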
8. Coupling Coefficient Evaluation
- Graphical Unitary Group Approach (GUGA)
- Define a directed graph with nodes and arcs: the Shavitt Graph
- Nodes correspond to spin-coupled states consisting of a subset of the total number of orbitals
- Arcs correspond to the (up to) four allowed spin couplings when an orbital is added to the graph
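To make the graph construction concrete, here is a rough sketch of enumerating Shavitt-graph nodes by Paldus (a, b, c) values using the standard GUGA step increments; the parameters and the pruning rule are simplified assumptions and not the COLUMBUS DRT code.

```python
# Rough sketch: enumerate Shavitt-graph nodes (Paldus (a, b, c) rows) level by
# level from the graph tail, using the four GUGA step increments. Simplified;
# a real DRT construction also prunes nodes that cannot reach the graph head.
STEPS = ((0, 0, 1),   # d = 0: orbital empty
         (0, 1, 0),   # d = 1: singly occupied, b raised
         (1, -1, 1),  # d = 2: singly occupied, b lowered
         (1, 0, 0))   # d = 3: doubly occupied

def shavitt_levels(n_orb, n_elec, two_s):
    head = ((n_elec - two_s) // 2, two_s, n_orb - (n_elec + two_s) // 2)
    levels = [{(0, 0, 0)}]                  # graph tail: no orbitals added yet
    for _ in range(n_orb):
        nxt = set()
        for a, b, c in levels[-1]:
            for da, db, dc in STEPS:
                node = (a + da, b + db, c + dc)
                # keep nodes that can still match the head (a and c never decrease)
                if node[1] >= 0 and node[0] <= head[0] and node[2] <= head[2]:
                    nxt.add(node)
        levels.append(nxt)
    return levels, head

# Example with assumed parameters: 4 orbitals, 4 electrons, singlet
levels, head = shavitt_levels(4, 4, 0)
print(head in levels[-1], [len(s) for s in levels])
```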
9Coupling Coefficient Evaluation
? graph head
Internal orbitals
?w,x,y,z
External orbitals
?graph tail
10. Coupling Coefficient Evaluation
11. Integral Types (by number of external orbital indices)
- 0: g_pqrs
- 1: g_pqra
- 2: g_pqab, g_paqb
- 3: g_pabc
- 4: g_abcd
12. Original Program (1980)
- Need to optimize wave functions for N_csf = 10^5 to 10^6
- Available memory was typically 10^5 words
- Must segment the vectors, v and w, and partition the matrix H into subblocks, then work with one subblock at a time
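A schematic of the subblocking idea: if v and w are cut into segments, each H subblock contributes to exactly one output segment, so only one subblock (and two segments) need to be resident at a time. The segment/block interface below is an illustrative assumption.

```python
# Schematic of the segmented product: w_I = sum_J H_IJ v_J, processing one
# H subblock at a time so only a small piece of the problem is in memory.
# make_block(I, J) stands in for constructing that subblock in operator form.
import numpy as np

def blocked_matvec(v_segments, make_block):
    w_segments = [np.zeros_like(v) for v in v_segments]
    for I in range(len(v_segments)):
        for J in range(len(v_segments)):
            H_IJ = make_block(I, J)               # only this subblock exists now
            w_segments[I] += H_IJ @ v_segments[J]
    return w_segments
```

In the parallel programs described next, each such (I, J) subblock becomes an independent compute task.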
13. First Parallel Program (1990)
- Networked workstations using TCGMSG
- Each matrix subblock corresponds to a compute task
- Different tasks require different resources (pay attention to load balancing)
- Same vector segmentation for all g_pqrs types
- g_pqrs, <m|e_pqrs|n>, w, and v were stored on external shared files (file contention bottlenecks)
14. Current Parallel Program
- Eliminate shared-file I/O by distributing data across the nodes with the GA Library
- Parallel efficiency depends on the vector segmentation and the corresponding H subblocking
- Apply different vector segmentation for different g_pqrs types
- Tasks are timed each Davidson iteration, then sorted into decreasing order and reassigned for the next iteration in order to optimize load balancing
- Manual tuning of the segmentation is required for optimal performance
- Capable of optimizing expansions up to N_csf = 10^9
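One plausible reading of the timing-and-reassignment step is a greedy largest-first heuristic: sort the tasks by the time measured in the previous iteration, then hand each task to the currently least-loaded process. The sketch below illustrates that idea; the actual PCIUDG scheduling may differ.

```python
# Illustrative load-balancing sketch: tasks timed in the previous Davidson
# iteration are sorted into decreasing order and greedily assigned so the most
# expensive tasks are spread across processes (not the actual PCIUDG code).
import heapq

def reassign_tasks(task_times, n_procs):
    """task_times: {task_id: seconds}; returns {task_id: process rank}."""
    loads = [(0.0, p) for p in range(n_procs)]          # (accumulated time, rank)
    heapq.heapify(loads)
    assignment = {}
    for task, t in sorted(task_times.items(), key=lambda kv: kv[1], reverse=True):
        load, p = heapq.heappop(loads)                  # least-loaded process so far
        assignment[task] = p
        heapq.heappush(loads, (load + t, p))
    return assignment
```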
15. COLUMBUS-Petaflops Application
- Mike Dvorak, Mike Minkoff (MCS Division)
- Ron Shepard (Chemistry Division)
- Argonne National Lab
16. Notes on Software Engineering
- PCIUDG parallel code
  - Fortran 77/90
  - Compiled with Intel/Myrinet on Jazz
  - 70k lines in PCIUDG
  - 14 files containing 205 subroutines
- Versioning system
  - Currently distributed in a tar file
  - Created an LCRC CVS repository for personal code mods
17. Notes on Software Engineering (cont.)
- Homegrown preprocessing system
  - Uses mdcif parallel statements to comment/uncomment parts of the code
  - Could/should be replaced with CPP directives
- Global Arrays library
  - Provides a global address space for matrix computation
  - Used mainly for chemistry codes, but applicable to other applications
  - Ran with the most current version --> no performance gain
  - Installed in SoftEnv on Jazz (version 3.2.6)
18. Gprof Output
- 270 subroutines called
- The loopcalc subroutine uses 20% of simulation time
- Added user-defined MPE states to 50 loopcalc calls
  - A challenge due to the large number of subroutines in the file
- The 2 GB file size is a severe limit on the number of procs
  - Broken logging
- Show actual output
19. Jumpshot/MPE Instrumentation
- Live demo of a 20-proc run
20. Using FPMPI
- Relinked the code with FPMPI
- Tells you the total number of MPI calls made
- Output file size is smaller (compared to other tools, e.g. Jumpshot)
- Produces a histogram of message sizes
- Not installed in SoftEnv on Jazz yet
  - riley/fpmpi-2.0
- Problem used for the runs
  - Double-zeta C2H4 without optimizing the load balance
21. Total Number of MPI Calls
22. Max/Avg MPI Complete Time
23. Avg/Max Time in MPI Barrier
24. COLUMBUS Performance Results
25. COLUMBUS Performance Data
- R. Shepard, M. Dvorak, M. Minkoff
26. Timing of Steps (sec)

Basis Set   Integral Time   Orbital Opt. Time   CI Time
QZ          388             11,806              382,221
TZ          26              104                 31,415
DZ          1               34                  3,281
27. Walks vs. Basis Set (Millions)

Basis Set   Z      Y    X     W     Matrix Dim.
cc-pVQZ     0.08   15   536   305   858
cc-pVTZ     0.08   7    120   69    198
cc-pVDZ     0.08   2    13    8     24
28. Timing of CI Iteration
29. Basic Model of Performance
- Time = C1 + C2*N + C3/N
30. Constrained Linear Term
- C2 > 0
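As a worked example of the model on the two slides above, the sketch below fits Time(N) = C1 + C2*N + C3/N to per-iteration timings with the linear term constrained to be nonnegative. Here N is taken to be the process count and the timing data are synthetic, for illustration only.

```python
# Fit the performance model Time(N) = C1 + C2*N + C3/N with C2 constrained >= 0.
# N is assumed to be the number of processes; the timings below are made up.
import numpy as np
from scipy.optimize import lsq_linear

N = np.array([4.0, 8.0, 16.0, 32.0, 64.0])          # process counts (example)
T = np.array([270.0, 140.0, 78.0, 49.0, 38.0])      # seconds per CI iteration (synthetic)

A = np.column_stack([np.ones_like(N), N, 1.0 / N])  # columns for C1, C2, C3
fit = lsq_linear(A, T, bounds=([-np.inf, 0.0, -np.inf], np.inf))
C1, C2, C3 = fit.x
print(f"Time(N) ~ {C1:.1f} + {C2:.3f}*N + {C3:.1f}/N")
```

The C3/N term captures work divided among processes, while a nonnegative linear term represents overhead that grows with N, consistent with the constraint C2 > 0 on the final slide.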