Parallel Multi-Reference Configuration Interaction on JAZZ

Slides: 31
Provided by: ronsh7
Learn more at: https://www.mcs.anl.gov

Transcript and Presenter's Notes


1
Parallel Multi-Reference Configuration
Interaction on JAZZ
  • Ron Shepard (CHM), Mike Minkoff (MCS), Mike Dvorak
    (MCS)

2
The COLUMBUS Program System
  • Molecular Electronic Structure
  • Collection of individual programs that
    communicate through external files
  • 1: Atomic-Orbital Integral Generation
  • 2: Orbital Optimization (MCSCF, SCF)
  • 3: Integral Transformation
  • 4: MR-SDCI
  • 5: CI Density
  • 6: Properties (energy gradient, geometry
    optimization)

3
Real Symmetric Eigenvalue Problem
  • Use the iterative Davidson Method for the lowest
    (or lowest few) eigenpairs
  • Direct CI: H is not explicitly constructed; the
    products $w = H v$ are constructed in operator form
  • Matrix dimensions are $10^4$ to $10^9$
  • All floating point calculations are 64-bit

4
Davidson Method
Generate an initial vector $x_1$
MAINLOOP: DO n = 1, NITER
  Compute and save $w_n = H x_n$
  Compute the nth row and column of $G = X^T H X = W^T X$
  Compute the subspace Ritz pair: $(G - \rho 1) c = 0$
  Compute the residual vector: $r = W c - \rho X c$
  Check for convergence using $r$, $c$, $\rho$, etc.
  IF (converged) THEN
    EXIT MAINLOOP
  ELSE
    Generate a new expansion vector $x_{n+1}$ from $r$, $\rho$, $v = X c$, etc.
  ENDIF
ENDDO MAINLOOP
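The loop above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the COLUMBUS implementation: the names `davidson_lowest`, `jacobi_eigh`, and `dot` are hypothetical, the small subspace problem is solved with a plain Jacobi rotation routine, and the new expansion vector is taken from the usual diagonal (Davidson) preconditioner, which is an assumption because the slide only says "from r, ρ, v = Xc, etc."

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def jacobi_eigh(A, sweeps=30, tol=1e-24):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p, q)
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):  # A <- J^T A (rows p, q)
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V

def davidson_lowest(matvec, diag, x1, niter=50, tol=1e-8):
    """Lowest eigenpair of a real symmetric operator, given only matvec(x) = H x."""
    dim = len(x1)
    nrm = math.sqrt(dot(x1, x1))
    X = [[xi / nrm for xi in x1]]          # expansion vectors x_1 .. x_n
    W, G = [], []                          # products w_n = H x_n; G = W^T X
    for n in range(1, niter + 1):
        W.append(matvec(X[-1]))            # compute and save w_n
        new_row = [dot(W[-1], x) for x in X]
        for i, row in enumerate(G):        # nth column of G
            row.append(new_row[i])
        G.append(new_row)                  # nth row of G
        evals, evecs = jacobi_eigh(G)      # subspace Ritz pair (G - rho 1) c = 0
        k = min(range(n), key=lambda i: evals[i])
        rho, c = evals[k], [evecs[i][k] for i in range(n)]
        v = [sum(c[i] * X[i][j] for i in range(n)) for j in range(dim)]
        r = [sum(c[i] * W[i][j] for i in range(n)) - rho * v[j] for j in range(dim)]
        if math.sqrt(dot(r, r)) < tol:
            return rho, v, n
        # Assumed update: diagonal preconditioner, then orthogonalize against X.
        t = [r[j] / (rho - diag[j]) if abs(rho - diag[j]) > 1e-12 else r[j]
             for j in range(dim)]
        for _ in range(2):
            for x in X:
                overlap = dot(x, t)
                t = [tj - overlap * xj for tj, xj in zip(t, x)]
        tn = math.sqrt(dot(t, t))
        if tn < 1e-14:                     # subspace exhausted
            return rho, v, n
        X.append([tj / tn for tj in t])
    return rho, v, niter

# Toy test matrix (hypothetical): diagonally dominant, applied as an operator.
dim = 12
H = [[float(i + 1) if i == j else 0.01 for j in range(dim)] for i in range(dim)]

def matvec(x):
    return [dot(row, x) for row in H]

rho, v, iters = davidson_lowest(matvec, [H[i][i] for i in range(dim)],
                                [1.0] + [0.0] * (dim - 1))
print(rho, iters)
```

Only `matvec` touches H, which is the point of the direct-CI formulation: the solver never needs the matrix itself, only the product $w = H x$.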
5
Matrix Elements
  • $H_{mn} = \langle m | H_{op} | n \rangle$
  • $| n \rangle = \phi(r_1)\sigma_1 \, \phi(r_2)\sigma_2 \cdots \phi(r_n)\sigma_n$ with
    $\sigma_j = \alpha, \beta$

6
Matrix Elements
  • $h_{pq}$ and $g_{pqrs}$ are computed and stored as arrays
    (with index symmetry)
  • $\langle m | E_{pq} | n \rangle$ and $\langle m | e_{pqrs} | n \rangle$ are coupling
    coefficients; these are sparse and are recomputed
    as needed

7
Matrix-Vector Products
$w = H x$
  • The challenge is to bring together the different
    factors in order to compute w efficiently
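For orientation, the "different factors" can be written out explicitly. The slide's own formula did not survive in the transcript, so the block below uses the standard spin-free second-quantized form of the Hamiltonian, with the stored integrals $h_{pq}$, $g_{pqrs}$ and the coupling coefficients named on the preceding slides; treat it as an assumed reconstruction rather than the presenters' exact expression.

```latex
\[
H_{op} = \sum_{pq} h_{pq}\, E_{pq} + \frac{1}{2} \sum_{pqrs} g_{pqrs}\, e_{pqrs}
\]
\[
w_m = \sum_n H_{mn}\, x_n
    = \sum_n \Big[ \sum_{pq} h_{pq}\, \langle m | E_{pq} | n \rangle
      + \frac{1}{2} \sum_{pqrs} g_{pqrs}\, \langle m | e_{pqrs} | n \rangle \Big] x_n
\]
```

Each contribution to $w_m$ therefore needs three things at the same time: an integral, a sparse coupling coefficient, and an element of $x$.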

8
Coupling Coefficient Evaluation
  • Graphical Unitary Group Approach (GUGA)
  • Define a directed graph with nodes and arcs: the
    Shavitt graph
  • Nodes correspond to spin-coupled states
    consisting of a subset of the total number of
    orbitals
  • Arcs correspond to the (up to) four allowed spin
    couplings when an orbital is added to the graph
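The node-and-arc picture can be made concrete by counting walks through such a graph. The sketch below is an illustration under stated assumptions, not the COLUMBUS graph: it labels nodes by the Paldus pair (a, b) with b = 2S, allows all four step types at every orbital (a full-CI graph, without the internal/external split or reference restrictions of MR-SDCI), and checks the walk count against the Weyl dimension formula. The function names are hypothetical.

```python
from math import comb

# (delta_a, delta_b) for the four arcs leaving a node when one orbital is added:
# step 0: orbital empty, 1: singly occupied (spin raised),
# step 2: singly occupied (spin lowered), 3: doubly occupied.
STEPS = [(0, 0), (0, 1), (1, -1), (1, 0)]

def count_walks(n_orb, n_elec, two_s):
    """Number of walks through the graph ending at the node for (n_elec, 2S)."""
    a_target = (n_elec - two_s) // 2
    level = {(0, 0): 1}                      # node -> number of walks reaching it
    for _ in range(n_orb):
        nxt = {}
        for (a, b), weight in level.items():
            for da, db in STEPS:
                if b + db >= 0:              # total spin cannot go negative
                    key = (a + da, b + db)
                    nxt[key] = nxt.get(key, 0) + weight
        level = nxt
    return level.get((a_target, two_s), 0)

def weyl_dimension(n_orb, n_elec, two_s):
    """Closed-form number of spin-adapted configurations for the same graph."""
    a = (n_elec - two_s) // 2
    return ((two_s + 1) * comb(n_orb + 1, a)
            * comb(n_orb + 1, a + two_s + 1)) // (n_orb + 1)

print(count_walks(4, 4, 0), weyl_dimension(4, 4, 0))
```

Because each walk corresponds to one spin-adapted configuration, the per-node walk counts also give a natural indexing of the expansion vector, which is what makes the graph useful for addressing coupling coefficients.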

9
Coupling Coefficient Evaluation
[Figure: Shavitt graph. Labels, top to bottom: graph head, internal orbitals, w, x, y, z, external orbitals, graph tail]
10
Coupling Coefficient Evaluation
11
Integral Types
  • 0: $g_{pqrs}$
  • 1: $g_{pqra}$
  • 2: $g_{pqab}$, $g_{pa,qb}$
  • 3: $g_{pabc}$
  • 4: $g_{abcd}$

12
Original Program (1980)
  • Need to optimize wave functions for $N_{csf} = 10^5$ to
    $10^6$
  • Available memory was typically $10^5$ words
  • Must segment the vectors, v and w, and partition
    the matrix H into subblocks, then work with one
    subblock at a time.
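The segmentation idea can be illustrated with a blocked matrix-vector product in which only one subblock of H exists in memory at any moment. This is a minimal sketch with hypothetical names (`segments`, `blocked_matvec`, `h_block`) and a toy matrix; it does not exploit symmetry or sparsity as a production code would.

```python
def segments(n, nseg):
    """Split range(n) into nseg nearly equal contiguous (start, stop) pairs."""
    base, extra = divmod(n, nseg)
    out, start = [], 0
    for s in range(nseg):
        stop = start + base + (1 if s < extra else 0)
        out.append((start, stop))
        start = stop
    return out

def blocked_matvec(h_block, v, nseg):
    """w = H v, one subblock at a time; h_block(rows, cols) returns that subblock."""
    n = len(v)
    w = [0.0] * n
    segs = segments(n, nseg)
    for (i0, i1) in segs:                    # segment of w being updated
        for (j0, j1) in segs:                # segment of v being read
            block = h_block((i0, i1), (j0, j1))   # only this subblock is held
            for i in range(i0, i1):
                row = block[i - i0]
                w[i] += sum(row[j - j0] * v[j] for j in range(j0, j1))
    return w

# Toy symmetric matrix (hypothetical), generated on demand per subblock.
def h(i, j):
    return 1.0 / (1 + abs(i - j))

def h_block(rows, cols):
    return [[h(i, j) for j in range(*cols)] for i in range(*rows)]

n = 10
v = [float(i + 1) for i in range(n)]
w = blocked_matvec(h_block, v, nseg=3)
print(w[:3])
```

Each (row segment, column segment) pair is an independent unit of work, which is why the same partitioning later maps naturally onto parallel tasks.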

13
First Parallel Program (1990)
  • Networked workstations using TCGMSG
  • Each matrix subblock corresponds to a compute
    task
  • Different tasks require different resources (pay
    attention to load balancing)
  • Same vector segmentation for all $g_{pqrs}$ types
  • $g_{pqrs}$, $\langle m | e_{pqrs} | n \rangle$, $w$, and $v$ were stored on
    external shared files (file contention
    bottlenecks)

14
Current Parallel Program
  • Eliminate shared file I/O by distributing data
    across the nodes with the GA Library
  • Parallel efficiency depends on the vector
    segmentation and corresponding H subblocking
  • Apply different vector segmentation for different
    $g_{pqrs}$ types
  • Tasks are timed each Davidson iteration, then
    sorted into decreasing order and reassigned for
    the next iteration in order to optimize load
    balancing
  • Manual tuning of the segmentation is required for
    optimal performance
  • Capable of optimizing expansions up to $N_{csf} = 10^9$
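The "time, sort into decreasing order, reassign" step can be sketched as a greedy longest-first assignment. The slides do not say how the sorted tasks are handed out, so the static least-loaded-process rule below is an assumption, and the function name, task labels, and timings are hypothetical.

```python
import heapq

def assign_tasks(task_times, nproc):
    """Give tasks, longest first, to whichever process currently has the least work."""
    heap = [(0.0, p) for p in range(nproc)]          # (accumulated time, process)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(nproc)}
    ordered = sorted(task_times.items(), key=lambda kv: kv[1], reverse=True)
    for task, seconds in ordered:
        load, p = heapq.heappop(heap)                # least-loaded process so far
        assignment[p].append(task)
        heapq.heappush(heap, (load + seconds, p))
    loads = {p: sum(task_times[t] for t in tasks) for p, tasks in assignment.items()}
    return assignment, loads

# Hypothetical timings measured during the previous Davidson iteration.
times = {"t0": 9, "t1": 7, "t2": 6, "t3": 5, "t4": 4, "t5": 3, "t6": 2}
assignment, loads = assign_tasks(times, nproc=3)
print(assignment, loads)
```

Starting with the longest tasks matters because a long task placed late can leave one process running alone while the others sit idle at the end of the iteration.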

15
COLUMBUS-Petaflops Application
  • Mike Dvorak, Mike Minkoff
  • MCS Division
  • Ron Shepard
  • Chemistry Division
  • Argonne National Lab

16
Notes on software engineering
  • PCIUDG parallel code
  • Fortran 77/90
  • Compiled with Intel/Myrinet on Jazz
  • 70k lines in PCIUDG
  • 14 files containing 205 subroutines
  • Versioning system
  • Currently distributed in a tar file
  • Created a LCRC CVS repository for personal code
    mods

17
Notes on Software Engineering (cont)
  • Homegrown preprocessing system
  • Uses mdcif parallel statements to
    comment/uncomment parts of the code
  • Could/should be replaced with CPP directives
  • Global Arrays library
  • Provides global address space for matrix
    computation
  • Used mainly for chemistry codes but applicable
    for other applications
  • Ran with the most current version -> no performance gain
  • Installed on Softenv on Jazz (version 3.2.6)

18
Gprof Output
  • 270 subroutines called
  • loopcalc subroutine using 20% of simulation time
  • Added user-defined MPE states to 50 loopcalc
    calls
  • Challenge due to large number of subroutines in
    file
  • 2 GB file size severe limiter on number of procs
  • Broken logging
  • Show actual output

19
Jumpshot/MPE Instrumentation
  • Live Demo of a 20 proc run

20
Using FPMPI
  • Relinked code with FPMPI
  • Tells you the total number of MPI calls made
  • Output file size is smaller (compared to other tools,
    e.g. Jumpshot)
  • Produces a histogram of message sizes
  • Not installed in Softenv on Jazz yet
  • riley/fpmpi-2.0
  • Problem for runs
  • Double Zeta C2H4 without optimizing the load
    balance

21
Total Number of MPI calls
22
Max/Avg MPI Complete Time
23
Avg/Max Time MPI Barrier
24
COLUMBUS Performance Results
25
COLUMBUS Performance Data
  • R. Shepard, M. Dvorak, M. Minkoff

26
Timing of Steps (Sec.)
Basis Set   Integral Time   Orbital Opt. Time   CI Time
QZ          388             11,806              382,221
TZ          26              104                 31,415
DZ          1               34                  3,281
27
Walks Vs. Basis Set (Millions)
            Walk Type
Basis Set   Z      Y    X     W     Matrix Dim.
cc-pVQZ     0.08   15   536   305   858
cc-pVTZ     0.08   7    120   69    198
cc-pVDZ     0.08   2    13    8     24
28
Timing of CI Iteration
29
Basic Model of Performance
$\mathrm{Time} = C_1 + C_2 N + C_3 / N$
30
Constrained Linear Term
$C_2 > 0$
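A short worked consequence of the three-term timing model, assuming that N is the number of processors (the transcript does not define it) and that both $C_2$ and $C_3$ are positive. This is a sketch of what the model implies, not an analysis taken from the slides.

```latex
\[
T(N) = C_1 + C_2 N + \frac{C_3}{N},
\qquad
\frac{dT}{dN} = C_2 - \frac{C_3}{N^2} = 0
\;\Longrightarrow\;
N^{*} = \sqrt{\frac{C_3}{C_2}},
\qquad
T(N^{*}) = C_1 + 2\sqrt{C_2 C_3}
\]
```

Under that reading, $C_3 / N$ is the work that divides across processors, $C_2 N$ is overhead that grows with them, and a positive $C_2$ guarantees the fitted model has a finite best processor count instead of predicting that adding processors always helps.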