Title: Optimizing Matrix Multiply
1The U.S. DOE Advanced CompuTational Software
(ACTS) Collection
Tony Drummond Lawrence Berkeley National
Laboratory LADrummond_at_lbl.gov
2OUTLINE
- Motivation
- Introduction to the DOE ACTS Collection
- Interfaces to the ACTS Collection
- Software Sustainability Requirements
- References
3Where are the applications?
Development of High End Computer Simulations
- Accelerator Science
- Astrophysics
- Biology
- Chemistry
- Earth Sciences
- Materials Science
- Nanoscience
- Plasma Science
- Commonalities
- Major advancements in Science
- Increasing demands for computational power
- Rely on available computational systems, languages, and software tools
4Software Development and Evolution
Minimize time to first solution (prototype)
- Outlive Complexity
- Increasingly sophisticated models
- Model coupling
- Interdisciplinary
- Sustained Performance
- Increasingly complex algorithms
- Increasingly diverse architectures
- Increasingly demanding applications
6THE U.S. DOE ACTS COLLECTION
Goal: The Advanced CompuTational Software Collection (ACTS) makes reliable and efficient software tools more widely used, and more effective in solving the nation's engineering and scientific problems.
- References
- L.A. Drummond, O. Marques. An Overview of the Advanced CompuTational Software (ACTS) Collection. ACM Transactions on Mathematical Software, Vol. 31, pp. 282-301, 2005.
- http://acts.nersc.gov
7The Advanced CompuTational Software Collection
(ACTS)
- Components
- Solid Base: non-commercial and open source tools developed at DOE laboratories and universities.
- Independent Tool Evaluations and Consultation: provided through acts-support_at_nersc.gov
- High Level User Support: problem identification, tool and interface selection, specific tuning parameter configurations, installation, documentation, etc.
- Training and Dissemination: workshops, lectures, active conference participation (acts.nersc.gov).
- Collaborations with HPC centers, computational sciences research centers (national and international level), and software and computer vendors.
9Software Sustainability
Algorithmic Implementations
I/O
Application Data Layout
Control
Tuned and machine Dependent modules
10Software Sustainability
Diagram labels: USER's APPLICATION CODE (Main Control); Compilers, Expert Drivers, Support; Algorithmic Implementations; I/O; Application Data Layout; Tuned and machine-dependent modules. Blocks are tagged AVAILABLE, LIBRARIES, and PACKAGES.
11Critical Path for HPC Software Stack
General Purpose Libraries
Hardware - Middleware - Firmware
13ACTS Numerical Tools Functionality
14ACTS Numerical Tools Functionality
15Structure of PETSc
16Hypre Conceptual Interfaces
17Hypre Conceptual Interfaces to Solvers
List of Solvers and Preconditioners per
Conceptual Interface
18ACTS Numerical Tools Functionality
19ACTS Numerical Tools Functionality
20ACTS Numerical Tools Functionality
21ACTS Numerical Tools Functionality
22ACTS Numerical Tools Functionality
23TAO - Interface with PETSc
24OPT Interfaces
- Four major classes of problems available
- NLF0(ndim, fcn, init_fcn, constraint): basic nonlinear function, no derivative information available
- NLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first derivative information available
- FDNLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first derivative information approximated
- NLF2(ndim, fcn, init_fcn, constraint): nonlinear function, first and second derivative information available
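The practical difference between NLF1 and FDNLF1 is where the gradient comes from. As a minimal sketch of the FDNLF1 idea, written in Python for illustration rather than against the C++ interface of OPT, a first derivative can be approximated with forward differences when no analytic gradient is supplied. The names `fd_gradient`, `rosenbrock`, and `rosenbrock_grad` are hypothetical and are not part of the OPT library.

```python
def fd_gradient(fcn, x, h=1e-6):
    """Approximate the gradient of fcn at x with forward differences."""
    f0 = fcn(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h              # perturb one coordinate at a time
        grad.append((fcn(shifted) - f0) / h)
    return grad


def rosenbrock(x):
    """Classic two-variable test function (illustrative choice)."""
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_grad(x):
    """Analytic gradient, the kind of information an NLF1 problem supplies."""
    return [
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ]
```

The approximation costs one extra function evaluation per variable, which is the trade-off a user accepts when derivative code is not available.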
25ACTS Numerical Tools Functionality
26ACTS Numerical Tools Functionality
27ACTS Tools Functionality
28ACTS Tools Functionality
30
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, INFO )
Language Calls
Command lines
Problem Domain
31Tool to Tool Interoperability
One Side Interoperability
TOOL B
TOOL C
TOOL A
TOOL F
TOOL E
TOOL D
32High-level User Interfaces to the ACTS Collection
Ax = b
View_field(T1)
User
PAWS
OPT
CUMULVS
TAU
Globus
Chombo
AZTEC
Hypre
Global Arrays
PETSc
ScaLAPACK
PVODE
SuperLU
TAO
Overture
33PyACTS
Vicente Galiano, Miguel Hernandez University
Tony Drummond, Lawrence Berkeley National Laboratory
Violeta Migallón and José Penadés, University of Alicante
Goal: Provide a didactic tool for the ACTS Collection and a Python-based interface to it.
- References
- L. A. Drummond, V. Galiano, O. Marques, V. Migallón, J. Penadés. PyACTS: A High-level Framework for Fast Development of High Performance Applications. Lecture Notes in Computer Science, Vol. 4395, pp. 417-425, 2007.
34PyACTS
PyACTS
PyScaLAPACK
PySuperLU
PyACTS Wrappers
ScaLAPACK Wrappers
SuperLU Wrappers
Python World
PyMPI
NumPy
. . .
ScaLAPACK
SuperLU
Python
35PyACTS Basic Services
- Basic Services: creation and modification of different data objects and parallel environment specifications (matrices, data layouts, ctx, ...)
- I/O Services: parallel read/write. Currently supported: ASCII and NetCDF.
- Verification and Validation: predicates and parameter type checking.
- Data Conversion: interoperable objects between libraries.
36PyACTS Motivation
PyClimate (J. Saenz et al., Univ. Basque Country)
- Support for common tasks during the analysis of climate variability data
- Simple I/O operations
- Operations with COARDS-compliant NetCDF files
- Empirical Orthogonal Function (EOF) analysis
- Canonical Correlation Analysis (CCA)
- Singular Value Decomposition (SVD) analysis of coupled datasets
- Some linear digital filters
- Kernel-based probability-density function estimation
- Access to the DCDFLIB.C library from Python
37PyACTS Performance in PyClimate EOF calculations
Empirical Orthogonal Function (Day calc)
38PyScaLAPACK pvgesvd Performance
39PyACTS Performance
>>> from PyACTS import *
>>> import PyACTS.PyPBLAS as PyPBLAS
>>> import time
>>> n = 500
>>> ACTS_lib = 1                              # ScaLAPACK library
>>> PyACTS.gridinit()                         # grid initialization
>>> alpha = Scal2PyACTS(2, ACTS_lib)          # convert scalar to PyACTS scalar
>>> beta = Scal2PyACTS(3, ACTS_lib)
>>> a = Rand2PyACTS(n, n, ACTS_lib)           # generate a random PyACTS array
>>> b = Rand2PyACTS(n, n, ACTS_lib)
>>> c = Rand2PyACTS(n, n, ACTS_lib)
>>> c = PyPBLAS.pvgemm(alpha, a, b, beta, c)  # call level 3 PBLAS routine
>>> PyACTS.gridexit()
41Problem Statement Software Sustainability
- THE GOOD
- Many successful HPC stories have induced major advances in science and engineering
- We have successfully run and scaled applications on 100,000 processors
- THE BAD
- Portability across platforms is still an outstanding issue
- Readiness
- Performance
- Robustness and Correctness
- THE UGLY
- The Multi-Core and Many-Core Era is knocking at the HPC door
44Software Quality Assurance
- Robustness
- Scalability
- Extensibility
- Interoperability
- User Friendliness
- Documentation
- Periodic tests and evaluations (test engines and dependency graphs)
45ScaLAPACK's Software Structure
46BLAS: Basic Linear Algebra Subprograms
BLAS LEVELS
- Level 1 BLAS: vector-vector
- Level 2 BLAS: matrix-vector
- Level 3 BLAS: matrix-matrix
- Design Considerations
- Portability
- Performance: development of blocked algorithms is important for performance!
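To make the point about blocked algorithms concrete, here is a minimal pure-Python sketch of a Level 3 style matrix-matrix product computed tile by tile. It only illustrates the loop structure: a tuned BLAS uses compiled kernels and picks block sizes for the cache hierarchy of each machine. The function names and the default block size `bs=2` are assumptions made for this example.

```python
def matmul_naive(A, B):
    """Reference triple loop: C = A * B for lists of lists."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def matmul_blocked(A, B, bs=2):
    """Same product, computed one bs x bs tile at a time.

    Each tile of A and B is reused across a whole tile of C, which is
    the data reuse that blocked Level 3 BLAS kernels exploit.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, bs):
        for pp in range(0, k, bs):
            for jj in range(0, m, bs):
                for i in range(ii, min(ii + bs, n)):
                    for p in range(pp, min(pp + bs, k)):
                        a_ip = A[i][p]
                        for j in range(jj, min(jj + bs, m)):
                            C[i][j] += a_ip * B[p][j]
    return C
```

Both functions perform the same arithmetic; only the order of memory accesses differs, which is why blocking changes performance without changing the result.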
47ScaLAPACK Data Layouts
- 1D block and column distributions
- 1D block-cyclic column and 2D block-cyclic distributions
- 2D block-cyclic distribution used in ScaLAPACK for dense matrices
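The 2D block-cyclic layout can be sketched as an index mapping. The following Python sketch assumes zero-based global indices and that the first block is stored on process row 0 and process column 0; both are assumptions of this example (ScaLAPACK itself is Fortran with one-based indices and a configurable source process). The function names are hypothetical.

```python
def block_cyclic_owner(g, nb, nprocs):
    """Map global index g to (process coordinate, local index) in 1D.

    Blocks of nb consecutive indices are dealt out to the processes
    in round-robin order.
    """
    block = g // nb                    # which block the index falls in
    proc = block % nprocs              # blocks cycle over the processes
    local = (block // nprocs) * nb + g % nb
    return proc, local


def owner_2d(i, j, mb, nb, nprow, npcol):
    """Apply the 1D mapping independently to rows and columns."""
    prow, li = block_cyclic_owner(i, mb, nprow)
    pcol, lj = block_cyclic_owner(j, nb, npcol)
    return (prow, pcol), (li, lj)
```

For example, with 2 x 2 blocks on a 2 x 3 process grid, `owner_2d(5, 2, 2, 2, 2, 3)` places global entry (5, 2) on process (0, 1) at local position (3, 0).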
48Astrophysics Applications
Cosmic Microwave Background Analysis, BOOMERanG collaboration, MADCAP code (Apr. 27, 2000).
- The statistics of the tiny variations in the CMB (the faint echo of the Big Bang) allow the determination of the fundamental parameters of cosmology to the percent level or better.
- MADCAP (Microwave Anisotropy Dataset Computational Analysis Package)
- Makes maps from observations of the CMB and then calculates their angular power spectra. (See http://crd.lbl.gov/borrill).
- Calculations are dominated by the solution of linear systems of the form $M = A^{-1}B$ for dense $n \times n$ matrices $A$ and $B$, scaling as $O(n^3)$ in flops. MADCAP uses ScaLAPACK for those calculations.
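A product of the form A^{-1}B is usually obtained by solving AX = B rather than by forming the inverse. The sketch below is a serial, pure-Python illustration of that idea using Gaussian elimination with partial pivoting; it is not MADCAP's code and not the ScaLAPACK implementation, and the name `solve_dense` is hypothetical. The three nested loops in the elimination phase are where the O(n^3) flop count comes from.

```python
def solve_dense(A, B):
    """Solve A X = B for X, with A n x n and B n x nrhs (lists of lists)."""
    n, nrhs = len(A), len(B[0])
    # Work on an augmented copy [A | B] so the inputs are left untouched.
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + nrhs):
                M[r][c] -= f * M[k][c]
    # Back substitution, one right-hand side at a time.
    X = [[0.0] * nrhs for _ in range(n)]
    for j in range(nrhs):
        for i in range(n - 1, -1, -1):
            s = M[i][n + j] - sum(M[i][c] * X[c][j] for c in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X
```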
49PETSc
(Image provided by the PETSc Development Team, ANL)
50Basic Conjugate Gradient Algorithm
Scalars ?, ?, y ?
Vectors: x, r, p (search direction), and q
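A minimal sketch of the basic (unpreconditioned) conjugate gradient iteration, using the vector names x, r, p, and q from the slide. The scalar names `alpha`, `beta`, and `rho` follow common textbook usage and are an assumption here, as are the function names and the stopping rule. The sketch assumes A is symmetric positive definite and stored as a dense list of lists.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)                      # residual r = b - A x, with x = 0
    p = list(r)                      # first search direction
    rho = dot(r, r)
    for _ in range(max_iter):
        if rho ** 0.5 <= tol:        # stop when the residual norm is small
            break
        q = matvec(A, p)
        alpha = rho / dot(p, q)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rho_new = dot(r, r)
        beta = rho_new / rho         # weight of the previous direction
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rho = rho_new
    return x
```

Each iteration needs one matrix-vector product and a few vector operations, which is why libraries such as PETSc only require the matrix through its action on a vector.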
51Preconditioning Matrices
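- Gauss-Seidel: $M = D - E$. Uses the lower triangular part of matrix A.
- Jacobi: $M = D$. Uses the diagonal of A.
- SOR: $M = \frac{1}{\omega}(D - \omega E)$. Uses the lower triangular part of A.
- SSOR: $M = \frac{1}{\omega(2 - \omega)}(D - \omega E) D^{-1} (D - \omega F)$. Uses the whole matrix A.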
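The four preconditioning matrices can be built directly from a splitting of A. The sketch below assumes the usual convention A = D - E - F, where D is the diagonal of A, -E its strictly lower triangular part, and -F its strictly upper triangular part, and writes the relaxation parameter as `omega`; this convention is an assumption, since the slide does not state it. The function names are hypothetical, and forming M as a dense matrix is for illustration only: in practice only the action of the inverse of M on a vector is computed.

```python
def split(A):
    """Return D, E, F with A = D - E - F (assumed convention)."""
    n = len(A)
    D = [[A[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    E = [[-A[i][j] if i > j else 0.0 for j in range(n)] for i in range(n)]
    F = [[-A[i][j] if i < j else 0.0 for j in range(n)] for i in range(n)]
    return D, E, F


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def preconditioner(A, kind, omega=1.0):
    """Build the preconditioning matrix M for the named method."""
    D, E, F = split(A)
    n = len(A)
    lower = [[D[i][j] - omega * E[i][j] for j in range(n)] for i in range(n)]
    upper = [[D[i][j] - omega * F[i][j] for j in range(n)] for i in range(n)]
    if kind == "jacobi":
        return D
    if kind == "gauss-seidel":
        return [[D[i][j] - E[i][j] for j in range(n)] for i in range(n)]
    if kind == "sor":
        return [[v / omega for v in row] for row in lower]
    if kind == "ssor":
        d_inv = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)]
                 for i in range(n)]
        prod = matmul(matmul(lower, d_inv), upper)
        scale = 1.0 / (omega * (2.0 - omega))
        return [[scale * v for v in row] for row in prod]
    raise ValueError("unknown preconditioner: " + kind)
```

With `omega = 1`, SOR reduces to Gauss-Seidel, which is a quick sanity check on the formulas.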
52PETSc Matrix Distribution
proc 1: M=8, N=8, m=3, n=k1, rstart=0, rend=3
proc 2: M=8, N=8, m=3, n=k2, rstart=3, rend=6
proc 3: M=8, N=8, m=2, n=k3, rstart=6, rend=8
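The row split on this slide (local row counts m = 3, 3, 2 for M = 8 global rows on three processes) is what one gets by dividing the rows as evenly as possible and giving the leftover rows to the lowest-numbered processes. The sketch below reproduces that rule in Python with half-open ranges [rstart, rend); it is an illustration of the convention, not a call into PETSc, and the function name is hypothetical.

```python
def ownership_ranges(M, nprocs):
    """Split M rows over nprocs processes as evenly as possible.

    Returns a list of (rstart, rend) pairs, one per process, where
    rend is one past the last locally owned row.
    """
    base, extra = divmod(M, nprocs)
    ranges = []
    start = 0
    for rank in range(nprocs):
        m_local = base + (1 if rank < extra else 0)  # first ranks get the remainder
        ranges.append((start, start + m_local))
        start += m_local
    return ranges
```

Calling `ownership_ranges(8, 3)` gives the three contiguous row blocks of sizes 3, 3, and 2.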
53Software Dependency Graph
- Software Dependency Tree
- ScaLAPACK: PBLAS, LAPACK
- LAPACK: BLAS
- PBLAS: BLACS, MPI
- Computational Platform Dependency
- ScaLAPACK: compiles=compiler-list, options=compile-options
- Software Testing
- Python-based scripts
- ScaLAPACK: tests=dir-list
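The dependency tree above is small enough to encode directly. As a hedged sketch of how a Python-based testing script might use it, the example below derives a build order with the standard library's `graphlib` (Python 3.9 or later) and lists which packages need retesting when one dependency changes. The dictionary layout and the function names are assumptions for illustration, not the actual ACTS test scripts.

```python
from graphlib import TopologicalSorter

# Each package maps to the set of packages it depends on (from the slide).
DEPS = {
    "ScaLAPACK": {"PBLAS", "LAPACK"},
    "LAPACK": {"BLAS"},
    "PBLAS": {"BLACS", "MPI"},
}


def build_order(deps):
    """Return the packages ordered so that dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


def affected_by(deps, changed):
    """Return every package that depends, directly or not, on `changed`."""
    hit = set()
    grew = True
    while grew:
        grew = False
        for pkg, needs in deps.items():
            if pkg not in hit and (changed in needs or hit & needs):
                hit.add(pkg)
                grew = True
    return hit
```

For instance, `affected_by(DEPS, "BLAS")` reports that LAPACK and ScaLAPACK should be rebuilt and retested after a BLAS update.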
54Software Sustainability
Flowchart: Software Testing Engines (automatic) and User Reported Problems feed an Errors/Problems check. No: End. Yes: Fix/Report and Document.
55Software Sustainability
Performance and Scalability
- Profiling and Tracing Tools TAU
- Auto-Tuning (OSKI, ATLAS like)
56Software Sustainability Requirement
57ACTS Software Sustainability Center
Sustainable Software Support
58Open Challenges
- Multi-core
- Improve interactions between Tool-Compilers-Hardware
- Software Distribution and Installation
- Automatic Tuning and Profiling (TAU, IPM, etc.)
- Automatic Code Generators (ATLAS-like)
- Debugging tools
- Tools and Language Interoperability
59References
- ACTS Information Center: http://acts.nersc.gov
- Two upcoming journal issues dedicated to ACTS: IJHPCA and ACM TOMS
- Ninth ACTS Collection Workshop, August 19-22, 2008