Performance and Productivity: NWChem - PowerPoint PPT Presentation


1
Performance and Productivity: NWChem

Robert J. Harrison
Oak Ridge National Laboratory,
University of Tennessee

NWChem is funded by the U.S. Department of
Energy, Office of Science, Office of Biological
and Environmental Research, under contract
DE-AC06-76RLO 1830 with Battelle Memorial
Institute (Pacific Northwest National Laboratory,
PNNL) as part of the Environmental Molecular
Sciences Laboratory, PNNL.
2
NWChem Citation
T. H. Dunning, Jr., D. A. Dixon, M. F. Guest, R. J. Harrison, J. A. Nichols,
T. P. Straatsma, M. Dupuis, E. J. Bylaska, G. I. Fann, T. L. Windus, E. Apra,
W. de Jong, S. Hirata, M. T. Hackler, J. Anchell, D. Bernholdt, P. Borowski,
T. Clark, D. Clerc, H. Dachsel, M. Deegan, K. Dyall, D. Elwood, H. Fruchtl,
E. Glendening, M. Gutowski, K. Hirao, A. Hess, J. Jaffe, B. Johnson, J. Ju,
R. Kendall, R. Kobayashi, R. Kutteh, Z. Lin, R. Littlefield, X. Long, B. Meng,
T. Nakajima, J. Nieplocha, S. Niu, M. Rosing, G. Sandrone, M. Stave, H. Taylor,
G. Thomas, J. van Lenthe, K. Wolinski, A. Wong, and Z. Zhang,
"NWChem, A Computational Chemistry Package for Parallel Computers, Version 4.1" (2002),
Pacific Northwest National Laboratory, Richland, Washington 99352-0999, USA.

3
NWChem Overview

Provides major new modeling and simulation capability for molecular science
Broad range of molecules, including biomolecules
Electronic structure of molecules (non-relativistic, relativistic, one-/two-component, ECPs, second derivatives)
Increasingly extensive solid state capability
Molecular dynamics, molecular mechanics
Extensible and long-lived
Freely distributed; installed at about 1000 sites worldwide
Performance characteristics designed for MPP
Single node performance comparable to best serial codes
Scalability to 1000s of processors
Portable: runs on a wide range of computers

4
Molecular Science Software Suite (MS3)
http://www.emsl.pnl.gov/pub/docs/ecce/
http://www.emsl.pnl.gov/pub/docs/parsoft/
http://www.emsl.pnl.gov/pub/docs/nwchem/
5
(No Transcript)
6
(No Transcript)
7
NWChem to go ...

Compaq iPAQ
Linux (Intimate)
64 Mbyte RAM, 16 Mbyte flash, 2 Gbyte PCMCIA disk
StrongARM CPU
Sadly, no FPU

8
Higher-level composition

Modular, hierarchical design
Easy to access high-level features
Easy to extend with new high-level features
Standardized interfaces
Reuse of low-level functionality without side effects
Distributed-shared memory parallel programming model
Non-uniform memory access (NUMA) aware algorithms
Python interface
Users and developers can write NWChem programs in Python
Cool stuff now becoming available
Automatic code generation: compose with many-body theory and/or tensor expressions (already in NWChem 4.5)
Multiresolution quantum chemistry: compose with operators and functions

9
NWChem Architecture

Object-oriented design
abstraction, data hiding, APIs
Parallel programming model
non-uniform memory access, global arrays, MPI
Infrastructure
GA, Parallel I/O, RTDB, MA, ...
Program modules
communication only through the database
persistence for easy restart
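To make "communication only through the database" and "persistence for easy restart" concrete, here is a minimal Python sketch. `ToyRTDB` and `optimize_module` are hypothetical names invented for illustration: this is not NWChem's RTDB interface, just a JSON file standing in for a persistent key-value store. Because the module keeps all of its state in the database, a second run resumes where an interrupted one stopped.

```python
import json
import os
import tempfile


class ToyRTDB:
    """Hypothetical minimal runtime database: a persistent key-value store that is
    the only channel between program modules. Not NWChem's actual RTDB API."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):            # restart: reload the earlier state
            with open(path) as fh:
                self.data = json.load(fh)

    def put(self, key, value):
        self.data[key] = value
        with open(self.path, "w") as fh:    # persist every update immediately
            json.dump(self.data, fh)

    def get(self, key, default=None):
        return self.data.get(key, default)


def optimize_module(db, max_steps, stop_after=None):
    """Hypothetical module: gradient descent on x**2 with step 0.25 (halves x each
    step). It reads its input and writes all of its state through the database."""
    step = db.get("opt:step", 0)
    x = db.get("opt:x", db.get("input:x"))
    while step < max_steps:
        x -= 0.25 * 2 * x
        step += 1
        db.put("opt:step", step)
        db.put("opt:x", x)
        if stop_after is not None and step == stop_after:
            return None                     # simulate an interrupted job
    return x


path = os.path.join(tempfile.mkdtemp(), "toy.db")
db = ToyRTDB(path)
db.put("input:x", 8.0)                      # an "input module" stores the input
optimize_module(db, max_steps=5, stop_after=2)
# A fresh process reopens the same file and resumes at step 2.
final_x = optimize_module(ToyRTDB(path), max_steps=5)
print(final_x)                              # 0.25
```

Modules never call each other directly, so any of them can be rerun, replaced, or restarted as long as it agrees on the keys it reads and writes.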

10
Issues in Parallel Computing

Expressing and managing concurrency
The memory hierarchy
Efficient sequential execution

11
The Memory Hierarchy

Non-uniform memory access (NUMA)
Your workstation is NUMA: registers, cache, main memory, virtual memory
Parallel computers just add non-local memory(s)
Unites sequential and parallel computation
Differ only in expression and management of concurrency
Distributed data
Do not limit calculation by resources of one node
Exploit aggregate resources of the whole machine
SCF and DFT can distribute all data > O(N)
MP2 gradients distribute all data > O(N^2)

12
Parallel Programming Model: Global Arrays and MPI

J. Nieplocha
Supported by DOE/ASCR/MICS
Shared-memory-like model
Fast local access
NUMA aware and easy to use
MIMD and data-parallel modes
Inter-operates with MPI,
BLAS and linear algebra interface
Used by most major chemistry codes, also in financial futures forecasting, astrophysics, computer graphics, ...
Ported to major parallel machines
IBM, Cray, SGI, clusters, ...

http://www.emsl.pnl.gov/pub/docs/global
13
Non-uniform memory access model of computation
(Figure: each process reaches a shared object through 1-sided communication: copy to local memory, compute/update in local memory, copy back to the shared object.)
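A single-process Python sketch of the copy / compute / update pattern in the figure. `ToyGlobalArray` and its `get`, `put` and `acc` methods are hypothetical stand-ins that only mimic one-sided access to block-distributed data; they are not the Global Arrays API, and nothing here actually runs in parallel.

```python
import numpy as np


class ToyGlobalArray:
    """Hypothetical stand-in for a 1-D global array. Elements are block-distributed
    over nproc simulated owners; get/put/acc copy patches between the shared object
    and caller-local memory without any action by the owners (one-sided style)."""

    def __init__(self, n, nproc):
        self.bounds = np.linspace(0, n, nproc + 1).astype(int)
        self.blocks = [np.zeros(hi - lo) for lo, hi in zip(self.bounds[:-1], self.bounds[1:])]

    def _pieces(self, lo, hi):
        """Yield (owner, global start, global end, offset in the owner's block)."""
        for p, (blo, bhi) in enumerate(zip(self.bounds[:-1], self.bounds[1:])):
            a, b = max(lo, blo), min(hi, bhi)
            if a < b:
                yield p, a, b, a - blo

    def get(self, lo, hi):
        out = np.empty(hi - lo)
        for p, a, b, off in self._pieces(lo, hi):
            out[a - lo:b - lo] = self.blocks[p][off:off + b - a]
        return out

    def put(self, lo, hi, values):
        for p, a, b, off in self._pieces(lo, hi):
            self.blocks[p][off:off + b - a] = values[a - lo:b - lo]

    def acc(self, lo, hi, values, alpha=1.0):
        for p, a, b, off in self._pieces(lo, hi):
            self.blocks[p][off:off + b - a] += alpha * values[a - lo:b - lo]


n, nproc = 10, 3
x, y = ToyGlobalArray(n, nproc), ToyGlobalArray(n, nproc)
x.put(0, n, np.arange(n, dtype=float))
# Task ranges deliberately straddle the ownership boundaries (0-3, 3-6, 6-10).
for lo, hi in [(0, 4), (4, 7), (7, 10)]:
    local = x.get(lo, hi)      # copy to local memory
    local = local ** 2         # compute/update locally
    y.acc(lo, hi, local)       # copy (accumulate) back to the shared object
result = y.get(0, n)
print(result)
```

The task ranges do not need to match the data layout: each task reads whatever patch it needs, and the owners never post a matching receive.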
14
O(1) programmers, O(1000) nodes, O(100,000) processors, O(10,000,000) threads

Expressing/managing concurrency at the petascale
It is too trite to say that the parallelism is in the physics
Must express and discover parallelism at more levels
Low-level tools (MPI, Co-Array Fortran, UPC, ...) don't discover parallelism, hide complexity, or facilitate abstraction
Management of the memory hierarchy
Sending data from one multiprocessor chip to another will be like us taking a trip to Europe
Memory will be deeper, with less uniformity between vendors
Need tools to automate and manage this, even at runtime

15
Synthesis of High Performance Algorithms for
Electronic Structure Calculations

Sadayappan, Baumgartner, Cociorva, Pitzer (OSU); Ramanujam (LSU); Bernholdt, Dean, White III, Harrison (ORNL); Hirata (PNNL); Nooijen (Waterloo)
Objective
Automate the implementation of optimized parallel
computer programs for many-electron methods
expressed as tensor contractions
Multi-disciplinary, multi-institution project
Collaboration between NSF ITR, DOE SciDAC, and
ORNL LDRD projects

16
CCSD Doubles Equation

$$
\begin{aligned}
\bar{h}_{abij} = {} & \sum_{c} f_{bc} t_{ijac} - \sum_{kc} f_{kc} t_{kb} t_{ijac} + \sum_{c} f_{ac} t_{ijcb} - \sum_{kc} f_{kc} t_{ka} t_{ijcb} \\
& - \sum_{k} f_{kj} t_{ikab} - \sum_{kc} f_{kc} t_{jc} t_{ikab} - \sum_{k} f_{ki} t_{jkba} - \sum_{kc} f_{kc} t_{ic} t_{jkba} \\
& + \sum_{cd} t_{ic} t_{jd} v_{abcd} + \sum_{cd} t_{ijcd} v_{abcd} + \sum_{c} t_{jc} v_{abic} - \sum_{k} t_{kb} v_{akij} \\
& + \sum_{c} t_{ic} v_{bajc} - \sum_{k} t_{ka} v_{bkji} - \sum_{kcd} t_{kd} t_{ijcb} v_{kacd} - \sum_{kcd} t_{ic} t_{jkbd} v_{kacd} \\
& - \sum_{kc} t_{jc} t_{kb} v_{kaci} + 2\sum_{kc} t_{jkbc} v_{kaci} - \sum_{kc} t_{jkcb} v_{kaci} \\
& - \sum_{kcd} t_{ic} t_{jd} t_{kb} v_{kadc} + 2\sum_{kcd} t_{kd} t_{ijcb} v_{kadc} - \sum_{kcd} t_{kb} t_{ijcd} v_{kadc} \\
& - \sum_{kcd} t_{jd} t_{ikcb} v_{kadc} + 2\sum_{kcd} t_{ic} t_{jkbd} v_{kadc} - \sum_{kcd} t_{ic} t_{jkdb} v_{kadc} \\
& - \sum_{kc} t_{jkbc} v_{kaic} - \sum_{kc} t_{ic} t_{kb} v_{kajc} - \sum_{kc} t_{ikcb} v_{kajc} \\
& - \sum_{kcd} t_{ic} t_{jd} t_{ka} v_{kbcd} - \sum_{kcd} t_{kd} t_{ijac} v_{kbcd} - \sum_{kcd} t_{ka} t_{ijcd} v_{kbcd} \\
& + 2\sum_{kcd} t_{jd} t_{ikac} v_{kbcd} - \sum_{kcd} t_{jd} t_{ikca} v_{kbcd} - \sum_{kcd} t_{ic} t_{jkda} v_{kbcd} \\
& - \sum_{kc} t_{ic} t_{ka} v_{kbcj} + 2\sum_{kc} t_{ikac} v_{kbcj} - \sum_{kc} t_{ikca} v_{kbcj} \\
& + 2\sum_{kcd} t_{kd} t_{ijac} v_{kbdc} - \sum_{kcd} t_{jd} t_{ikac} v_{kbdc} \\
& - \sum_{kc} t_{jc} t_{ka} v_{kbic} - \sum_{kc} t_{jkca} v_{kbic} - \sum_{kc} t_{ikac} v_{kbjc} \\
& + \sum_{klcd} t_{ic} t_{jd} t_{ka} t_{lb} v_{klcd} - 2\sum_{klcd} t_{kb} t_{ld} t_{ijac} v_{klcd} - 2\sum_{klcd} t_{ka} t_{ld} t_{ijcb} v_{klcd} \\
& + \sum_{klcd} t_{ka} t_{lb} t_{ijcd} v_{klcd} - 2\sum_{klcd} t_{jc} t_{ld} t_{ikab} v_{klcd} - 2\sum_{klcd} t_{jd} t_{lb} t_{ikac} v_{klcd} \\
& + \sum_{klcd} t_{jd} t_{lb} t_{ikca} v_{klcd} - 2\sum_{klcd} t_{ic} t_{ld} t_{jkba} v_{klcd} + \sum_{klcd} t_{ic} t_{la} t_{jkbd} v_{klcd} \\
& + \sum_{klcd} t_{ic} t_{lb} t_{jkda} v_{klcd} + \sum_{klcd} t_{ikcd} t_{jlba} v_{klcd} + 4\sum_{klcd} t_{ikac} t_{jlbd} v_{klcd} \\
& - 2\sum_{klcd} t_{ikca} t_{jlbd} v_{klcd} - 2\sum_{klcd} t_{ikab} t_{jlcd} v_{klcd} - 2\sum_{klcd} t_{ikac} t_{jldb} v_{klcd} \\
& + \sum_{klcd} t_{ikca} t_{jldb} v_{klcd} + \sum_{klcd} t_{ic} t_{jd} t_{klab} v_{klcd} + \sum_{klcd} t_{ijcd} t_{klab} v_{klcd} \\
& - 2\sum_{klcd} t_{ijcb} t_{klad} v_{klcd} - 2\sum_{klcd} t_{ijac} t_{klbd} v_{klcd} \\
& + \sum_{klc} t_{jc} t_{kb} t_{la} v_{klci} + \sum_{klc} t_{lc} t_{jkba} v_{klci} - 2\sum_{klc} t_{la} t_{jkbc} v_{klci} + \sum_{klc} t_{la} t_{jkcb} v_{klci} \\
& - 2\sum_{klc} t_{kc} t_{jlba} v_{klci} + \sum_{klc} t_{ka} t_{jlbc} v_{klci} + \sum_{klc} t_{kb} t_{jlca} v_{klci} + \sum_{klc} t_{jc} t_{lkab} v_{klci} \\
& + \sum_{klc} t_{ic} t_{ka} t_{lb} v_{klcj} + \sum_{klc} t_{lc} t_{ikab} v_{klcj} - 2\sum_{klc} t_{lb} t_{ikac} v_{klcj} + \sum_{klc} t_{lb} t_{ikca} v_{klcj} \\
& + \sum_{klc} t_{ic} t_{klab} v_{klcj} \\
& + \sum_{klcd} t_{jc} t_{ld} t_{ikab} v_{kldc} + \sum_{klcd} t_{jd} t_{lb} t_{ikac} v_{kldc} + \sum_{klcd} t_{jd} t_{la} t_{ikcb} v_{kldc} \\
& - 2\sum_{klcd} t_{ikcd} t_{jlba} v_{kldc} - 2\sum_{klcd} t_{ikac} t_{jlbd} v_{kldc} + \sum_{klcd} t_{ikca} t_{jlbd} v_{kldc} \\
& + \sum_{klcd} t_{ikab} t_{jlcd} v_{kldc} + \sum_{klcd} t_{ikcb} t_{jlda} v_{kldc} + \sum_{klcd} t_{ikac} t_{jldb} v_{kldc} \\
& + \sum_{kl} t_{ka} t_{lb} v_{klij} + \sum_{kl} t_{klab} v_{klij} \\
& + \sum_{klcd} t_{kb} t_{ld} t_{ijac} v_{lkcd} + \sum_{klcd} t_{ka} t_{ld} t_{ijcb} v_{lkcd} + \sum_{klcd} t_{ic} t_{ld} t_{jkba} v_{lkcd} \\
& - 2\sum_{klcd} t_{ic} t_{la} t_{jkbd} v_{lkcd} + \sum_{klcd} t_{ic} t_{la} t_{jkdb} v_{lkcd} + \sum_{klcd} t_{ijcb} t_{klad} v_{lkcd} \\
& + \sum_{klcd} t_{ijac} t_{klbd} v_{lkcd} - 2\sum_{klc} t_{lc} t_{ikab} v_{lkcj} + \sum_{klc} t_{lb} t_{ikac} v_{lkcj} + \sum_{klc} t_{la} t_{ikcb} v_{lkcj} \\
& + v_{abij}
\end{aligned}
$$

17
TCE Components
Input: tensor expressions (sequences of matrix products, element-wise matrix operations, element-wise function evaluations)

Algebraic Transformations
Minimize operation count
Memory Minimization
Reduce intermediate storage
Space-Time Transformation
Trade storage for recomputation
Storage Management and Data Locality Optimization
Optimize use of storage hierarchy
Data Distribution and Partitioning
Optimize parallel layout

(Flowchart: Tensor Expressions → Algebraic Transformations → Memory Minimization, checked against a System Memory Specification. If no solution fits on disk → Space-Time Trade-Offs; if a solution fits on disk but not in memory → Storage and Data Locality Management; once a solution fits in memory → Data Distribution and Partitioning, guided by a Performance Model → Parallel Code in Fortran/C with OpenMP/MPI/Global Arrays.)
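As a small illustration of the "minimize operation count" step, the sketch below evaluates one four-tensor contraction two ways with NumPy's `einsum`. The contraction and the index extent `N` are chosen for illustration only; the factorized order is the kind of rewrite the algebraic transformations search for, not actual output of the TCE.

```python
import numpy as np

N = 3                                        # hypothetical extent of every index
rng = np.random.default_rng(0)
A, B, C, D = (rng.standard_normal((N,) * 4) for _ in range(4))

# Direct: a single loop nest over all ten indices a,b,i,j,c,d,e,f,k,l (~N**10 terms).
direct = np.einsum("acik,befl,dfjk,cdel->abij", A, B, C, D, optimize=False)

# Operation-minimized: three binary contractions, each spanning six indices (~3 N**6).
T1 = np.einsum("befl,cdel->bcdf", B, D)      # sum over e, l
T2 = np.einsum("bcdf,dfjk->bcjk", T1, C)     # sum over d, f
S = np.einsum("bcjk,acik->abij", T2, A)      # sum over c, k

print(np.allclose(direct, S))
```

Both orders give the same tensor; for N = 3 the saving is negligible, but the cost ratio grows roughly as N^4, which is why choosing the evaluation order automatically matters for expressions like the CCSD equation.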
18
Multiresolution Quantum Chemistry
Robert J. Harrison, George I. Fann, Takeshi Yanai, Zhengting Gan
Oak Ridge National Laboratory and University of Tennessee, Knoxville
and
Gregory Beylkin
University of Colorado
harrisonrj@ornl.gov
19
The funding

This work is funded by the U.S. Department of
Energy, the division of Basic Energy Science,
Office of Science, under contract
DE-AC05-00OR22725 with Oak Ridge National
Laboratory. This research was performed in part
using
the Molecular Science Computing Facility in the
Environmental Molecular Sciences Laboratory at
the Pacific Northwest National Laboratory under
contract DE-AC06-76RLO 1830 with Battelle
Memorial Institute,
resources of the National Energy Research Scientific Computing Center, which is supported by the Office
of Energy Research of the U.S. Department of Energy under contract DE-AC03-76SF0098,
and the Center for Computational Sciences at Oak Ridge National Laboratory under contract
DE-AC05-00OR22725.
ORNL LDRD

20
Outline

Brief introduction to methodology
Practical computation in higher dimensions
Separated form for operators
Analytic derivatives
Initial results
Accuracy, timing and scaling
MP2
Path to basis set limit results?

21
Objectives

Complete elimination of the basis error
One-electron models (e.g., HF, DFT)
Pair models (e.g., MP2, CCSD, ...)
Correct scaling of cost with system size
General approach
Readily accessible by students and researchers
Much smaller computer code than Gaussian-based codes
No two-electron integrals: replaced by fast application of integral operators
Fast algorithms with guaranteed precision

22
References

The (multi)wavelet methods in this work are
primarily based upon
Alpert, Beylkin, Grimes, Vozovoi (J. Comp. Phys., in press).
B. Alpert (SIAM Journal on Mathematical Analysis, 24, 246-262, 1993).
Beylkin, Coifman, Rokhlin (Communications on Pure and Applied Mathematics, 44, 141-183, 1991).
The following are useful further reading:
Daubechies, Ten Lectures on Wavelets
Walnut, An Introduction to Wavelets
Meyer, Wavelets, Algorithms and Applications
Burrus et al., Wavelets and Wavelet Transforms

23
Linear Combination of Atomic Orbitals (LCAO)

Molecules are composed of (weakly) perturbed
atoms
Use finite set of atomic wave functions as the
basis
Hydrogen-like wave functions are exponentials
E.g., hydrogen molecule (H2)
Smooth function of molecular geometry
MOs: cusp at nucleus with exponential decay

24
LCAO

A fantastic success, but
Basis functions have extended support
causes great inefficiency in high accuracy
calculations
origin of non-physical density matrix
Basis set superposition error (BSSE)
incomplete basis on each center leads to
over-binding as atoms are brought together
Linear dependence problems
accurate calculations require balanced approach
to a complete basis on every atom
Must extrapolate to complete basis limit
unsatisfactory and not feasible for large systems

25
Why think multiresolution?

It is everywhere in nature/chemistry/physics
Core/valence, high/low frequency, short/long range, smooth/non-smooth, atomic/nano/micro/macro scale
Common to separate just two scales
E.g., core orbital heavily contracted, valence
flexible
More efficient, compact, and numerically stable
Multiresolution
Recursively separate all length/time scales
Computationally efficient and numerically stable
Coarse-scale models that capture fine-scale detail

26
How to think multiresolution

Consider a ladder of function spaces
E.g., increasing quality atomic basis sets, or
finer resolution grids,
Telescoping series
Instead of using the most accurate
representation, use the difference between
successive approximations
Representation on V_0 small/dense; differences sparse
Computationally efficient; possible insights
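The telescoping idea can be sketched with the simplest ladder of spaces: piecewise-constant functions (k = 1) on 2^n boxes of [0, 1]. The Gaussian test function, the fine sampling grid and the tolerance are assumptions made for this demonstration. V_0 plus the successive differences reproduces the finest level exactly, and at fine levels only boxes near the sharp feature carry a significant difference.

```python
import numpy as np


def haar_projections(f, nmax, nfine=2**12):
    """Piecewise-constant (k = 1) projections f_0, ..., f_nmax of f on [0, 1].
    f_n is the average of f over each of the 2**n boxes on level n, estimated from
    midpoint samples and stored on a common fine grid so levels can be subtracted."""
    x = (np.arange(nfine) + 0.5) / nfine
    fx = f(x)
    return [np.repeat(fx.reshape(2**n, -1).mean(axis=1), nfine // 2**n)
            for n in range(nmax + 1)]


def f(x):
    return np.exp(-500.0 * (x - 0.3) ** 2)   # hypothetical sharply peaked function


nmax = 8
levels = haar_projections(f, nmax)
diffs = [levels[n + 1] - levels[n] for n in range(nmax)]

# Telescoping series: f_nmax = f_0 + (f_1 - f_0) + ... + (f_nmax - f_{nmax-1}).
recon = levels[0] + sum(diffs)

# Sparsity: count level-(n+1) boxes whose difference exceeds a tolerance.
tol = 1e-3
for n, d in enumerate(diffs):
    per_box = d.reshape(2 ** (n + 1), -1)[:, 0]   # difference is constant on each box
    print(n + 1, np.count_nonzero(np.abs(per_box) > tol), "of", per_box.size)
```

Storing the coarse representation plus only the significant differences is what makes the multiresolution form compact.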

27
Scaling Function Basis

Divide domain into 2^n pieces (level n)
Adaptive sub-division (local refinement)
l-th sub-interval [l 2^-n, (l+1) 2^-n], l = 0, ..., 2^n - 1
In each sub-interval define a polynomial basis
First k Legendre polynomials
Orthonormal, disjoint support
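A sketch of this scaling-function basis, assuming the usual normalization phi_i(x) = sqrt(2i+1) P_i(2x - 1) on [0, 1], dilated and translated to box l of level n. The helper names `phi` and `gram` are ours. The overlap (Gram) matrix, computed with Gauss-Legendre quadrature in every box, should be the identity, confirming orthonormality and disjoint support.

```python
import numpy as np
from numpy.polynomial import legendre as L


def phi(i, n, l, x):
    """Scaling function phi^n_{il}(x) = 2^(n/2) sqrt(2i+1) P_i(2(2^n x - l) - 1)
    on box l of level n (support [l 2^-n, (l+1) 2^-n)), zero elsewhere."""
    y = 2.0**n * x - l                           # local coordinate, in [0, 1) on box l
    inside = (y >= 0) & (y < 1)
    Pi = L.legval(2 * y - 1, np.eye(i + 1)[i])   # Legendre polynomial P_i
    return np.where(inside, 2.0 ** (n / 2) * np.sqrt(2 * i + 1) * Pi, 0.0)


def gram(k, n):
    """Overlap matrix of the k * 2^n scaling functions on level n, using k-point
    Gauss-Legendre quadrature per box (exact for polynomial degree <= 2k - 1)."""
    xg, wg = L.leggauss(k)
    h = 2.0**-n
    x = np.concatenate([(l + (xg + 1) / 2) * h for l in range(2**n)])
    w = np.tile(wg * h / 2, 2**n)
    B = np.array([phi(i, n, l, x) for l in range(2**n) for i in range(k)])
    return B @ (w[:, None] * B.T)


G = gram(5, 3)                                   # k = 5, level 3 (8 boxes)
print(np.abs(G - np.eye(G.shape[0])).max())      # rounding-level deviation
```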

28
Scaling Function Basis - II
(Figure: scaling functions for i = 0, 1, 2, 3.)
29
Multiwavelet Basis

Space of polynomials on level n is V_n
Wavelets: an orthonormal basis to span W_n, where V_{n+1} = V_n ⊕ W_n
Currently use Alpert's basis
Vanishing moments
Critically important property
Since W_n is orthogonal to V_n, the first k moments of functions in W_n vanish, i.e., ∫ x^i ψ(x) dx = 0 for i = 0, ..., k-1
Sparse representations of many physically important kernels
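The vanishing-moment property can be checked numerically. This sketch builds an orthonormal basis of W_0 as the orthogonal complement of V_0 inside V_1 (via an SVD), assuming scaled-Legendre scaling functions phi_i(x) = sqrt(2i+1) P_i(2x - 1). The result spans the same space as Alpert's multiwavelets but is not his particular choice of basis; any orthonormal basis of W_0 has vanishing first k moments, because x^m for m < k lies in V_0.

```python
import numpy as np
from numpy.polynomial import legendre as L


def phi(i, n, l, x):
    """Scaled Legendre scaling function phi^n_{il} on box l of level n."""
    y = 2.0**n * x - l
    inside = (y >= 0) & (y < 1)
    Pi = L.legval(2 * y - 1, np.eye(i + 1)[i])
    return np.where(inside, 2.0 ** (n / 2) * np.sqrt(2 * i + 1) * Pi, 0.0)


k = 4
# Gauss-Legendre points on each half of [0, 1], so level-1 pieces integrate exactly.
xg, wg = L.leggauss(2 * k)
x = np.concatenate([(xg + 1) / 4, (xg + 3) / 4])
w = np.concatenate([wg / 4, wg / 4])

V1 = np.array([phi(i, 1, l, x) for l in range(2) for i in range(k)])  # 2k functions
V0 = np.array([phi(i, 0, 0, x) for i in range(k)])                    # k functions
H = V0 @ (w[:, None] * V1.T)      # k x 2k: the V_0 basis expressed in the V_1 basis

# Rows of vt beyond rank(H) = k span the orthogonal complement of V_0 in V_1,
# i.e. an orthonormal basis of W_0.
_, _, vt = np.linalg.svd(H)
G = vt[k:]
psi = G @ V1                      # quadrature-point samples of the k multiwavelets

moments = np.array([[np.sum(w * x**m * p) for m in range(k)] for p in psi])
overlap = psi @ (w[:, None] * psi.T)
print(np.abs(moments).max())      # rounding-level: the first k moments vanish
```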

30
Some Consequences of Vanishing Moments

Compact representation of smooth functions
Consider the Taylor series: the first k terms vanish, and smoothness implies the higher-order terms are small
Compact representation of integral operators
E.g., 1/|r - s|
Consider double Taylor series or multipole expansion
Interaction between wavelets decays as r^(-2k-1)
Derivatives at origin vanish in Fourier space
Diminishes effect of singularities at that point

31
Slice through the grid used to represent the nuclear potential for H2 using k = 7 to a precision of 10^-5.
Automatically adapts: it does not know a priori where the nuclei are.
Nuclei at dyadic points on level 5: refinement stops at level 8.
If the nuclei were at non-dyadic points, refinement would continue (to level ??), but the precision is still guaranteed.
In future will unevenly subdivide boxes to force nuclei to dyadic points.

32
(No Transcript)
33
Integral Formulation

E.g., used by Kalos, 1962
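For reference, the standard integral form of the bound-state Schrödinger equation (our restatement; the slide's own equation did not survive extraction) replaces the differential operator by convolution with the bound-state Helmholtz Green's function:

$$
\left(-\tfrac{1}{2}\nabla^{2} + V\right)\psi = E\psi
\quad\Longrightarrow\quad
\psi = -2\left(-\nabla^{2} - 2E\right)^{-1} V\psi = -2\, G_\mu * (V\psi),
\qquad
G_\mu(r) = \frac{e^{-\mu r}}{4\pi r},\quad \mu = \sqrt{-2E},\ E < 0 .
$$

The kernel e^{-μr}/r is the same family as the exp(-30r)/r kernel fitted with sums of Gaussians later in the talk.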

34
Integral operators in 3D

Non-standard matrix elements easy to evaluate from compressed form of kernel K(x)
Application in 1-D is fairly efficient: O(N_box k^2) operations
In 3-D seems to need O(N_box k^6) operations
Prohibitively expensive
Separated form
Beylkin, Cramer, Mohlenkamp, Monzon
O(N_box k^4) or better in 3-D
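The k^6 versus k^4 difference can be seen in a small NumPy sketch: applying a full k^3 x k^3 matrix to one box's coefficients versus applying its three 1-D factors one dimension at a time. The random matrices X, Y, Z are hypothetical stand-ins for 1-D transition matrices, and only a single separated term is shown; a real separated operator is a sum of many such terms.

```python
import numpy as np

k = 8
rng = np.random.default_rng(1)
# Hypothetical 1-D transition matrices for one term of a separated operator.
X, Y, Z = (rng.standard_normal((k, k)) for _ in range(3))
s = rng.standard_normal((k, k, k))          # k^3 coefficients in one 3-D box

# Unseparated: the full k^3 x k^3 matrix, O(k^6) multiply-adds per box.
full = np.kron(np.kron(X, Y), Z)
r_full = (full @ s.ravel()).reshape(k, k, k)

# Separated: one dimension at a time, three O(k^4) steps.
t = np.einsum("zc,abc->abz", Z, s)          # apply Z along the third index
t = np.einsum("yb,abz->ayz", Y, t)          # apply Y along the second index
r_sep = np.einsum("xa,ayz->xyz", X, t)      # apply X along the first index

print(np.allclose(r_full, r_sep))
```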

35
Low Separation Rank Representation

Many functions/operators have short expansions
Different from low operator rank
E.g., identity has full operator rank, but unit
separation rank.
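Two quick numerical checks of this distinction, as a sketch: the 3-D identity acting on k^3 coefficients has full matrix rank yet is a single Kronecker product, and a smooth function of two variables sampled on a grid has a small numerical separation rank (the number of significant singular values). The grid size, test function and 1e-10 threshold are arbitrary choices for illustration.

```python
import numpy as np

k = 5
I = np.eye(k)
# The 3-D identity on k^3 coefficients: full operator rank ...
I3 = np.kron(np.kron(I, I), I)
print(np.linalg.matrix_rank(I3))            # 125, i.e. k**3
# ... yet it is the single separated term I (x) I (x) I: separation rank 1.

# Numerical separation rank of f(x, y) = exp(-(x - y)^2) on a 200 x 200 grid:
# the number of terms g_m(x) h_m(y) needed, read off from the singular values.
x = np.linspace(0.0, 1.0, 200)
F = np.exp(-(x[:, None] - x[None, :]) ** 2)
sv = np.linalg.svd(F, compute_uv=False)
rank = int(np.sum(sv > 1e-10 * sv[0]))
print(rank)                                 # far fewer than 200 terms
```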

36
Separated form for integral operators

Approach in current prototype code
Represent the kernel over a finite range as a sum
of Gaussians
Only need to compute 1-D transition matrices (X, Y, Z)
SVD the 1-D operators (low rank away from the singularity)
Apply most efficient choice of low/full rank 1-D
operator
Even better algorithms not yet implemented

37
Accurate Quadratures

Trapezoidal quadrature
Geometric precision for periodic functions with
sufficient smoothness.

(Figure: the kernel for x = 1e-4, 1e-3, 1e-2, 1e-1, 1e0. The curve for x = 1e-4 is the rightmost.)
38
Automatically generated representations of exp(-30r)/r accurate to 1e-10, 1e-8, 1e-6, 1e-4, and 1e-2
(measured by the weighted error r(exp(-30r)/r - fit(r))) for r in [1e-8, 1] were formed with
92, 74, 57, 39 and 21 terms, respectively. Note the logarithmic dependence upon precision.
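A sketch of how such sums of Gaussians can be generated with the trapezoidal rule, assuming the standard identity exp(-μr)/r = (2/sqrt(π)) ∫_0^∞ exp(-r²t² - μ²/(4t²)) dt together with the substitution t = e^s, which makes the integrand smooth and rapidly decaying in s. The step h and the range of s are conservative, untuned choices, so this uses many more terms than the optimized fits quoted above (for example 92 terms at 1e-10); it shows the mechanism, not the procedure that produced those numbers.

```python
import numpy as np


def gaussian_expansion(mu=0.0, h=0.1, smin=-23.0, smax=12.0):
    """Exponents alpha_m and weights w_m with exp(-mu r)/r ~= sum_m w_m exp(-alpha_m r^2),
    from the trapezoidal rule (step h) applied to
        exp(-mu r)/r = (2/sqrt(pi)) * int exp(-r^2 e^{2s} - mu^2 e^{-2s}/4 + s) ds."""
    s = np.arange(smin, smax + h / 2, h)
    alpha = np.exp(2 * s)
    w = (2 * h / np.sqrt(np.pi)) * np.exp(s - mu**2 * np.exp(-2 * s) / 4)
    return alpha, w


def gaussian_sum(r, alpha, w):
    """Evaluate sum_m w_m exp(-alpha_m r^2) at each point of r."""
    return np.exp(-np.outer(r**2, alpha)) @ w


r = np.logspace(-4, 0, 200)
alpha, w = gaussian_expansion()                      # Coulomb kernel 1/r
err_coulomb = np.max(np.abs(gaussian_sum(r, alpha, w) * r - 1.0))
alpha30, w30 = gaussian_expansion(mu=30.0)           # exp(-30 r)/r, as on the slide
err_30 = np.max(r * np.abs(gaussian_sum(r, alpha30, w30) - np.exp(-30.0 * r) / r))
print(len(alpha), err_coulomb, err_30)
```

Each Gaussian term factorizes as exp(-αx²) exp(-αy²) exp(-αz²), which is exactly what the separated form of the integral operator needs.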
39
Smoothed Nuclear Potential

u(r/c)/c shifts the error to r < c
ε ≈ 0.00435 Z^5 c^3
<V> accurate
<T> main source of error

40
Translational Invariance

Precision ε   Dyadic          Non-dyadic
1e-3          -75.9139        -75.9139
1e-5          -75.913564      -75.913564
1e-7          -75.91355634    -75.91355635

Uncontracted aug-cc-pVQZ: -75.913002
Solving with ε = 1e-3, 1e-5, 1e-7 (k = 7, 9, 11)
Demonstrates translational invariance and that forcing nuclei to dyadic points is only an optimization and does not change the obtained precision.
Average orbital sizes 1.6 Mb, 8 Mb, 56 Mb

41
Analytic Derivatives

Hellmann-Feynman theorem applies
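A standard statement of the result (our addition, since the slide's equation was lost): because the multiresolution basis is fixed in space rather than attached to the nuclei, there are no basis-set (Pulay) terms, so for a wave function converged to the working precision the gradient with respect to a nuclear coordinate X_A reduces to

$$
\frac{\partial E}{\partial X_A}
= \int \rho(\mathbf{r})\,\frac{\partial V_{\mathrm{nuc}}(\mathbf{r})}{\partial X_A}\,d\mathbf{r}
+ \frac{\partial E_{\mathrm{NN}}}{\partial X_A},
\qquad
V_{\mathrm{nuc}}(\mathbf{r}) = -\sum_{B}\frac{Z_B}{|\mathbf{r}-\mathbf{R}_B|},
$$

where E_NN is the nuclear repulsion energy.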

42
N2 Hartree-Fock, R = 2.0 a.u.

Basis          Grad. Err.   Energy Err.
cc-pVDZ        5e-2         4e-2
aug-cc-pVDZ    5e-2         4e-2
cc-pVTZ        7e-3         1e-2
aug-cc-pVTZ    6e-3         9e-3
cc-pVQZ        8e-4         2e-3
aug-cc-pVQZ    9e-4         2e-3
cc-pV5Z        1e-4         4e-4
aug-cc-pV5Z    2e-5         2e-4
k = 5          6e-3         1e-2
k = 7          4e-5         2e-5
k = 9          3e-7         -2e-7
k = 11         0.0          0.0

Reference values: gradient 0.026839623, energy -108.9964232

43
Sources of error in the gradient

Partially converged orbitals
Same as for conventional methods
Smoothed potential
Numerical errors in the density/potential
Higher-order convergence except where the
functions are not sufficiently smooth
Inadequate refinement (clearly adequate for the
energy, but not necessarily for other properties)
Exacerbated by nuclei at non-dyadic points
Gradient measures loss of spherical symmetry around the nucleus; the large value of the derivative potential amplifies small errors

44
Dependence on potential smoothing parameter (c). Absolute errors of derivatives for diatomics with the nuclei at dyadic points. For energy accuracy of 1e-6: H 0.039, Li 0.0062, B 0.0026, N 0.0015, O 0.0012, F 0.00099
45
Dependence on potential smoothing parameter (c). Absolute errors of derivatives for diatomics with the nuclei at non-dyadic points. For energy accuracy of 1e-6: H 0.039, Li 0.0062, B 0.0026, N 0.0015, O 0.0012, F 0.00099
46
Comparison with NUMOL and aug-cc-pVTZ

H2, Li2, LiH, CO, N2, Be2, HF, BH, F2, P2, BH3,
CH2, CH4, C2H2, C2H4, C2H6, NH3, H2O, CO2, H2CO,
SiH4, SiO, PH3, HCP
NUMOL: Dickson and Becke, JCP 99 (1993) 3898
Dyadic points (0.001 a.u.), Newton correction
Agrees with NUMOL to available precision
LDA (k = 7: 0.002; k = 9: 0.0006)
k = 9 vs. aug-cc-pVTZ RMS error
Hartree-Fock 0.004 a.u. (0.019 SiO)
LDA 0.003 a.u. (0.018 SiO)

47
High-precision Hartree-Fock geometry for water

Pahl and Handy, Mol. Phys. 100 (2002) 3199
Plane waves + polynomials for the core
Finite box (L = 18) requires extrapolation
Estimated error 3 μH, 1e-5 Angstrom
k = 11, conv. tol. = 1e-8, ε = 1e-9, L = 40
Max. gradient 3e-8, RMS step 5e-8
Difference to Pahl: 10 μH, 4e-6 Angstrom, 0.0012 degrees

Basis      OH (Angstrom)   HOH (degrees)   Energy (Hartree)
k = 11     0.939594        106.3375        -76.06818006
Pahl       0.939598        106.3387        -76.068170
cc-pVQZ    0.93980         106.329         -76.066676

48
Energy Timing

Water LDA with energy error of 1e-5
Initial prototype code with lots of Python
overhead
450s on 2.4 GHz Pentium IV processor
Current version (revised tensor class, integral
operators)
96s on 2.4 GHz Pentium IV processor
Predicted future performance
< 30 s with known algorithmic improvements
faster still with better representations of the
separated operators, alternative basis sets,
improved iterative solution

49
Asymptotic Scaling

Current implementation
Based upon canonical orbitals: O(N) to O(N^2) currently dominant (O(N^3) linear algebra)
Density matrix/spectral projector
Well established: O(N_atom log^m(ε)) to any finite precision (Goedecker, Beylkin, ...)
This is not possible with conventional AO Gaussians
Need separated representation for efficiency
Gradient
Each dV/dx requires O(-log(ε) log(vol.)) terms
All gradients evaluated in O(-N_atom log(ε) log(vol.))

50
Water dimer: LDA, aug-cc-pVTZ geometry, kcal/mol.
51
Benzene dimer: LDA, aug-cc-pVDZ geometry, kcal/mol.
52
Benzene dimer timings (sequential Pentium IV, 2.4 GHz)
53
Benzene monomer, dimer and trimer

(aug-cc-pVDZ LDA geometry)
Dimer binding energy -0.96 kcal/mol
Trimer -1.67 kcal/mol
Single-processor times for k = 9 energy (energy accurate to about 1e-6):
Monomer 56 minutes
Dimer 200 minutes (3.6x = 2^1.84)
Trimer 457 minutes (2.3x = (3/2)^2.05)
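The exponents in parentheses follow from the timings: if the time grows as n^p with the number of monomers n, then p = ln(t2/t1) / ln(n2/n1). A minimal Python check (the function name is ours):

```python
import math


def scaling_exponent(t1, t2, n1, n2):
    """Effective exponent p in t ~ n**p from timings t1, t2 at sizes n1, n2."""
    return math.log(t2 / t1) / math.log(n2 / n1)


p_dimer = scaling_exponent(56, 200, 1, 2)     # monomer -> dimer
p_trimer = scaling_exponent(200, 457, 2, 3)   # dimer -> trimer
print(round(p_dimer, 2), round(p_trimer, 2))  # 1.84 2.04
```

The second value comes out as 2.04 rather than 2.05; the last-digit difference presumably reflects rounding in the reported times.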

54
Also working

Takeshi Yanai
Analytic derivatives
Fast (O(N)) Hartree-Fock exchange
TDDFT within Tamm-Dancoff approximation
GGA
Abelian point group symmetry (D2h subgroups)
Thanks also to
So Hirata for guidance with TDDFT
Edo Apra for insights into DFT

55
(No Transcript)
56
Putting it all together: A path to O(N) exact MP2

HF provably O(N) to arbitrary finite precision
Based upon the density matrix
Localized orbitals also possible (Bernholc)
Need an MP2 scheme based upon density matrices

57
The Resolvent has low separation rank

Already known Almlöf Laplace factorization
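A sketch of the Laplace factorization for MP2-type energy denominators: 1/D = ∫_0^∞ exp(-Dt) dt, discretized here with the trapezoidal rule after t = e^s, turns 1/(ε_a + ε_b - ε_i - ε_j) into a short sum in which every term is a product of one-index factors. The orbital energies, step size and integration range are hypothetical choices for illustration.

```python
import numpy as np


def laplace_quadrature(h=0.1, smin=-30.0, smax=5.0):
    """Nodes t_m and weights w_m with 1/D ~= sum_m w_m exp(-D t_m) for D of order one,
    from the trapezoidal rule applied to 1/D = int exp(-D e^s + s) ds (t = e^s)."""
    s = np.arange(smin, smax + h / 2, h)
    t = np.exp(s)
    return t, h * t


# Hypothetical orbital energies in hartree (occupied i, j; virtual a, b).
eps_occ = np.array([-1.2, -0.7, -0.5])
eps_vir = np.array([0.2, 0.4, 0.9, 1.5])

# Direct denominators 1 / (e_a + e_b - e_i - e_j), shape (i, j, a, b).
D = (eps_vir[None, None, :, None] + eps_vir[None, None, None, :]
     - eps_occ[:, None, None, None] - eps_occ[None, :, None, None])
direct = 1.0 / D

# Laplace form: exp(-D t) = e^{e_i t} e^{e_j t} e^{-e_a t} e^{-e_b t}, so each term separates.
t, w = laplace_quadrature()
occ = np.exp(np.outer(t, eps_occ))     # e^{+e_i t}, shape (M, nocc)
vir = np.exp(-np.outer(t, eps_vir))    # e^{-e_a t}, shape (M, nvir)
separated = np.einsum("m,mi,mj,ma,mb->ijab", w, occ, occ, vir, vir)

print(np.abs(separated / direct - 1.0).max())
```

Because each term is a product of one-index factors, the energy denominator no longer couples the four orbital indices, which is what allows a density-matrix formulation of MP2.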

58
Density matrix form of MP2
59
Summary

Multiresolution provides a general framework for
computational chemistry
Accurate and efficient with a very small code
Multiwavelets provide high-order convergence and
accommodate singularities
Familiar orthonormal basis (Legendre polynomials)
Compression and reconstruction (cf. FFT)
Fast integral operators (cf. FMM)
Separated form for operators and functions
Critical for efficient computation in higher
dimension
Expect speed competitive with Gaussians in the near future
Optimal separated forms for kernels, multi-scale
non-linear solver, better implementation
Real impact will be application to many-body
models