Title: PETSc Tutorial
1. PETSc Tutorial
- Ehtesham Hayder
- Rice University
2. Tutorial Objective
- We will show how to write a parallel implicit solver using the Portable, Extensible Toolkit for Scientific Computation (PETSc).
3. Topics we will discuss
- Introduction to PETSc
- PETSc components
- Brief survey of numerical methods
- Sample PETSc codes
- Vectors and matrices
- Linear Solvers
- Nonlinear Solvers
4. PETSc philosophy
- Writing hand-parallelized (message-passing or distributed shared memory) application codes from scratch is extremely difficult and time consuming.
- PETSc can ease the development of parallel application codes by providing a general-purpose, parallel numerical PDE library.
- PETSc provides data encapsulation and context variables.
- It provides an identical interface for uniprocessor and parallel usage.
5. Main Routine
[Diagram: the user's main routine handles initialization, function evaluation, Jacobian evaluation, and post-processing; PETSc supplies the nonlinear solver (SNES), the linear solver (SLES, built from KSP and PC), and the Matrix, Vector, and DA objects.]
6. Levels of abstraction
- Application-specific interface: programmer manipulates objects associated with the application
- High-level mathematical interface: programmer manipulates mathematical objects, such as PDEs and boundary conditions
- Algorithmic and discrete mathematics interface: programmer manipulates mathematical objects (sparse matrices, nonlinear equations), algorithmic objects (solvers), and discrete geometry (grids)
- Low-level computational kernels: e.g., BLAS-type operations
7. PETSc numerical components
- Numerical solvers
- Newton based methods
- line search
- trust region
- other
- Time steppers
- Euler, Backward Euler, etc.
- Krylov subspace methods
- GMRES, CG, CGS, Bi-CG-Stab, TFQMR, Richardson, Chebyshev, other
8. PETSc numerical components
- Preconditioners
- Additive Schwarz, Block Jacobi, ILU, LU, other
- Matrices
- Compressed Sparse Row, Blocked Compressed Sparse Row, Block Diagonal, Dense, Sparse
- Vectors
- Index Sets
- Indices, Block Indices, Strides, other
9. PETSc objects
- Data objects
- vectors
- matrices
- grids, stencils
- Context objects
- nonlinear solvers
- Krylov subspace methods
- preconditioners
10. Context objects
- Each category of routines has its own context, which contains
- Local state
- Interface routines
- Option values
- Work space
11. Parallelism in PETSc
- Assumption: no memory is directly shared among processors.
- The details of communication are hidden within parallel objects.
- The user orchestrates communication at a higher level of abstraction than message passing.
- Programs are usually SPMD.
12. Languages
[Figure: supported languages.]
13. How can you get PETSc?
- It is freely available at
- http://www.mcs.anl.gov/petsc/petsc.html
14. Portability
- SGI/Origin
- Cray T3D/T3E
- IBM AIX and SP
- Intel Paragon
- HP Exemplar
- SunOS, Solaris
- Windows NT/95
- and many more
15. History
- Effort began in 1991
- PETSc 1.0 released in 1992
- PETSc 2.0 released in 1995
16. Components of PDE solvers
- Grid generation
- Initial discretization
- Timestepping
- nonlinear solution
- linear solution
- Solution reduction
- Visualization
17. Grid choices
- Structured
- determine neighbor relationships purely from logical I, J, K coordinates
- simpler code, often vectorizable
- higher-order numerical methods
- Unstructured
- do not explicitly use logical I, J, K coordinates
- flexible
- complicated data structures
- Semi-structured
- in well-defined regions, determine neighbor relationships purely from logical I, J, K coordinates
18. Solver choices
- Explicit: field variables are updated using neighbor information (no global solves)
- Implicit: most or all variables are updated in a single global linear solve
- Semi-implicit: some subset of variables is updated with global solves
19. PETSc focus
- PETSc focuses on implicit methods; however, there is no direct support for
- ADI-type methods
- Spectral methods
- Particle-type methods
20. Using the PETSc Solvers
- Solver classes
- linear
- nonlinear
- timestepping
- Usage concepts
- context variables
- solver options
- callback routines
- customization
21. Context variables
- Context variables are the key to solver organization
- A context contains the complete state of an algorithm, including
- parameters (e.g., convergence tolerance)
- functions that run the algorithm (e.g., a convergence monitoring routine)
- information about the current state (e.g., iteration number)
22. Sample application context
      typedef struct {
        double   lidvelocity, prandtl, grashof;
        int      mx, my;          /* discretization in x, y directions */
        int      mc;              /* components in unknown vector */
        Vec      localX, localF;  /* ghosted local vectors */
        DA       da;              /* distributed array data structure (unknowns) */
        int      rank;            /* processor rank */
        int      size;            /* number of processors */
        MPI_Comm comm;            /* MPI communicator */
        int      icycle, ncycles; /* current/total cycles through problem */
        Vec      x_old;           /* old solution vector */
        char     **label;         /* labels for components */
        int      print_grid;      /* flag - 1 indicates printing grid info */
        int      print_vecs;      /* flag - 1 indicates printing vectors */
        int      draw_contours;   /* flag - 1 indicates drawing contours */
      } AppCtx;
23. Creating the SLES context
- FORTRAN version
      call SLESCreate(MPI_COMM_WORLD,sles,ierr)
- C/C++ version
      ierr = SLESCreate(MPI_COMM_WORLD,&sles);
- Provides an identical user interface for all linear solvers
- uniprocessor and parallel
- real and complex numbers
24. Numerical methods paradigm in PETSc
- Encapsulate the latest numerical algorithms in a consistent, application-friendly manner
- Use mathematical and algorithmic objects, not low-level programming language objects
- Application code focuses on the mathematics of the global problem, not on parallel programming
25. Parallel fields and matrices
- Vectors
- primary vector focus: field data arising in nonlinear PDEs
- Matrices
- primary matrix focus: linear operators arising in nonlinear PDEs (i.e., Jacobians)
26. Vectors
- VecCreateMPI(.... )
- MPI_COMM - processors that share the vector
- number of elements local to this processor
- total number of elements
- VecSetValues(... )
- number of entries to insert/add
- indices of entries
- values to add
- mode: INSERT_VALUES or ADD_VALUES
- VecAssemblyBegin(Vec)
- VecAssemblyEnd(Vec)
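To make the sequence above concrete, here is a minimal C sketch of creating, filling, and assembling a parallel vector. It follows the PETSc 2.x-era calling conventions used throughout this tutorial; the global size and the particular entries are hypothetical.

      Vec    x;
      int    idx[2];
      Scalar vals[2];
      /* parallel vector with 100 global entries; PETSc picks the local split */
      VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, 100, &x);
      idx[0]  = 0;   idx[1]  = 99;   /* any processor may set any entry */
      vals[0] = 1.0; vals[1] = 2.0;
      VecSetValues(x, 2, idx, vals, INSERT_VALUES);
      VecAssemblyBegin(x);           /* off-processor entries are communicated here */
      VecAssemblyEnd(x);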
27. Matrix and vector assembly
- Processors may generate any entries in vectors and matrices.
- Entries need not be generated on the processor on which they will ultimately be stored.
- PETSc automatically moves data during the assembly process if necessary.
28. Sparse matrices
- MatCreateMPIAIJ( )
- MPI_COMM - processors that share the matrix
- number of local rows and columns
- number of global rows and columns
- optional pre-allocation information
- MatSetValues( )
- number of rows to insert/add
- indices of rows
- number of columns to insert/add
- indices of columns
- values to add
- mode: INSERT_VALUES or ADD_VALUES
- MatAssemblyBegin(Mat)
- MatAssemblyEnd(Mat)
29. Parallel Matrix Distribution
- Each processor locally owns a submatrix of contiguously numbered global rows
- MatGetOwnershipRange(Mat A, int *rstart, int *rend)
- rstart: first locally owned row of the global matrix
- rend-1: last locally owned row of the global matrix
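Combining the two slides above, here is a minimal C sketch of parallel matrix assembly over the locally owned rows, again assuming the 2.x-era signatures; the global size N and the preallocation estimates are hypothetical.

      Mat    A;
      int    N = 100, rstart, rend, row;
      Scalar v = 1.0;
      /* parallel AIJ matrix; estimate ~5 nonzeros/row in the diagonal block
         and ~2 in the off-diagonal block for preallocation */
      MatCreateMPIAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                      N, N, 5, PETSC_NULL, 2, PETSC_NULL, &A);
      MatGetOwnershipRange(A, &rstart, &rend);
      for (row = rstart; row < rend; row++) {
        /* set the diagonal entry of each locally owned row */
        MatSetValues(A, 1, &row, 1, &row, &v, INSERT_VALUES);
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);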
30. Vec example
      program ex2f
      implicit none
      include "include/FINCLUDE/petsc.h"
      include "include/FINCLUDE/vec.h"
      Vec     x
      integer N, ierr, rank, i
      Scalar  one
      call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
      one = 1.0
      call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
      call VecCreateMPI(PETSC_COMM_WORLD,rank+1,
     &                  PETSC_DECIDE,x,ierr)
      call VecGetSize(x,N,ierr)
      call VecSet(one,x,ierr)
      do 100 i=0, N-rank-1
        call VecSetValues(x,1,i,one,ADD_VALUES,ierr)
 100  continue
      call VecAssemblyBegin(x,ierr)
      call VecAssemblyEnd(x,ierr)
      call VecView(x,VIEWER_STDOUT_WORLD,ierr)
      call VecDestroy(x,ierr)
      call PetscFinalize(ierr)
      end
32. Linear solvers
- Goal: support the solution of linear systems, Ax = b
- particularly for sparse, parallel problems arising within PDE-based models
- User provides code to evaluate A and b
33. Main Routine
[Diagram: the user's main routine handles initialization, evaluation of A and b, and post-processing; PETSc's linear solver (SLES, built from KSP and PC) solves Ax = b.]
34. Linear solvers (SLES)
- Application code interface
- Choosing the solver
- Providing a different preconditioner matrix
- Matrix-free solvers
- Determining and monitoring convergence
35. Linear solvers in PETSc 2.0
- Krylov Methods (KSP)
- Conjugate Gradient
- GMRES
- CG-Squared
- Bi-CG-Stab
- Transpose-free QMR
- Preconditioners (PC)
- Block Jacobi
- Overlapping Additive Schwarz
- ICC, ILU via BlockSolve95
- ILU(k), LU (sequential only)
36. Basic linear solver (FORTRAN)
      SLES    sles
      Mat     A
      Vec     x, b
      integer n, its, ierr
      call MatCreate(MPI_COMM_WORLD,n,n,A,ierr)
      call VecCreate(MPI_COMM_WORLD,n,x,ierr)
      call VecDuplicate(x,b,ierr)
      call SLESCreate(MPI_COMM_WORLD,sles,ierr)
      call SLESSetOperators(sles,A,A,
     &                      DIFFERENT_NONZERO_PATTERN,ierr)
      call SLESSetFromOptions(sles,ierr)
      call SLESSolve(sles,b,x,its,ierr)
      call MatDestroy(A,ierr)
      call VecDestroy(x,ierr)
      call VecDestroy(b,ierr)
      call SLESDestroy(sles,ierr)
37. Basic linear solver (C/C++)
      SLES sles;
      Mat  A;
      Vec  x, b;
      int  n, its, ierr;
      ierr = MatCreate(MPI_COMM_WORLD,n,n,&A);
      ierr = VecCreate(MPI_COMM_WORLD,n,&x);
      ierr = VecDuplicate(x,&b);
      ierr = SLESCreate(MPI_COMM_WORLD,&sles);
      ierr = SLESSetOperators(sles,A,A,DIFFERENT_NONZERO_PATTERN);
      ierr = SLESSetFromOptions(sles);
      ierr = SLESSolve(sles,b,x,&its);
      ierr = MatDestroy(A);
      ierr = VecDestroy(x);
      ierr = VecDestroy(b);
      ierr = SLESDestroy(sles);
38. Customization options
- Procedural interface
- provides a great deal of control on a usage-by-usage basis inside a single code
- gives full flexibility inside an application
- Command-line interface
- applies the same rule to all queries via a database
- enables the user to have complete control at runtime, with no extra coding
39. Setting solver options within code
- SLESGetKSP(SLES sles, KSP *ksp)
- KSPSetType(KSP ksp, KSPType type)
- KSPSetTolerances(KSP ksp, double rtol, double atol, double dtol, int maxits)
- SLESGetPC(SLES sles, PC *pc)
- PCSetType(PC pc, PCType type)
- PCASMSetOverlap(PC pc, int overlap)
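For instance, a short C sketch that uses only the calls listed above to select GMRES with an additive Schwarz preconditioner; the tolerance and overlap values are hypothetical.

      KSP ksp;
      PC  pc;
      SLESGetKSP(sles, &ksp);
      KSPSetType(ksp, KSPGMRES);
      KSPSetTolerances(ksp, 1.e-6, 1.e-50, 1.e5, 500); /* rtol, atol, dtol, maxits */
      SLESGetPC(sles, &pc);
      PCSetType(pc, PCASM);
      PCASMSetOverlap(pc, 2);    /* two levels of subdomain overlap */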
40. Setting solver options at runtime
- -ksp_type cg, gmres, bcgs, ....
- -ksp_max_it <max_its>
- -ksp_gmres_restart <restart>
- -pc_type lu, ilu, jacobi, sor, ..
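For example, a hypothetical invocation combining several of these options:

      mpirun -np 4 ex2 -ksp_type gmres -ksp_gmres_restart 30 -pc_type ilu -ksp_monitor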
41. Monitoring convergence
- -ksp_monitor: prints the preconditioned residual norm
- -ksp_xmonitor: plots the preconditioned residual norm
- -ksp_truemonitor: prints the true residual norm ||b - Ax||
- user-defined monitoring routines via callbacks
42. SLES basic usage
- SLESCreate() - Create SLES context
- SLESSetOperators() - Set linear operators
- SLESSetFromOptions() - Set runtime options
- SLESSolve() - Run linear solver
- SLESView() - View solver options used at runtime
- SLESDestroy() - Destroy solver
43. SLES preconditioner options
- Preconditioner type: PCSetType() or -pc_type ....
- Level of fill for ILU: PCILUSetLevels() or -pc_ilu_levels <levels>
- SOR iterations: PCSORSetIterations() or -pc_sor_its <its>
- SOR parameter: PCSORSetOmega() or -pc_sor_omega <omega>
- Additive Schwarz: PCASMSetType() or -pc_asm_type ....
- And many more
44. Krylov method options
- Set Krylov method
- Set convergence tolerances
- Set monitoring
- Set GMRES restart parameter
- etc.
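As a brief sketch of two of these items (assuming ksp was obtained with SLESGetKSP() as on slide 39, and that the procedural counterpart of -ksp_gmres_restart is KSPGMRESSetRestart()):

      KSPSetTolerances(ksp, 1.e-8, 1.e-50, 1.e5, 1000); /* convergence tolerances */
      KSPGMRESSetRestart(ksp, 30);     /* restart GMRES every 30 iterations */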
45. SLES example
      program main
      implicit none
      include "include/FINCLUDE/petsc.h"
      include "include/FINCLUDE/vec.h"
      include "include/FINCLUDE/mat.h"
      include "include/FINCLUDE/pc.h"
      include "include/FINCLUDE/ksp.h"
      include "include/FINCLUDE/sles.h"
      include "include/FINCLUDE/sys.h"
      double precision norm
      integer i, j, II, JJ, ierr, m, n
      integer rank, size, its, Istart, Iend, flg
      Scalar  v, one, neg_one
      Vec     x, b, u
      Mat         A
      SLES        sles
      KSP         ksp
      PetscRandom rctx
      call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
      m = 3
      n = 3
      one = 1.0
      neg_one = -1.0
      call OptionsGetInt(PETSC_NULL_CHARACTER,'-m',m,flg,ierr)
      call OptionsGetInt(PETSC_NULL_CHARACTER,'-n',n,flg,ierr)
      call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
      call MPI_Comm_size(PETSC_COMM_WORLD,size,ierr)
      call MatCreate(PETSC_COMM_WORLD,m*n,m*n,A,ierr)
      call MatGetOwnershipRange(A,Istart,Iend,ierr)
      do 10, II=Istart,Iend-1
        v = -1.0
        i = II/n
        j = II - i*n
        if ( i.gt.0 ) then
          JJ = II - n
          call MatSetValues(A,1,II,1,JJ,v,ADD_VALUES,ierr)
        endif
        if ( i.lt.m-1 ) then
          JJ = II + n
          call MatSetValues(A,1,II,1,JJ,v,ADD_VALUES,ierr)
        endif
        if ( j.gt.0 ) then
          JJ = II - 1
          call MatSetValues(A,1,II,1,JJ,v,ADD_VALUES,ierr)
        endif
        if ( j.lt.n-1 ) then
          JJ = II + 1
          call MatSetValues(A,1,II,1,JJ,v,ADD_VALUES,ierr)
        endif
        v = 4.0
        call MatSetValues(A,1,II,1,II,v,ADD_VALUES,ierr)
 10   continue
      call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)
      call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)
      call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,u,ierr)
      call VecDuplicate(u,b,ierr)
      call VecDuplicate(b,x,ierr)
      call VecSet(one,u,ierr)
      call MatMult(A,u,b,ierr)
      call SLESCreate(PETSC_COMM_WORLD,sles,ierr)
      call SLESSetOperators(sles,A,A,
     &                      DIFFERENT_NONZERO_PATTERN,ierr)
      call SLESGetKSP(sles,ksp,ierr)
      call SLESSetFromOptions(sles,ierr)
      call SLESSolve(sles,b,x,its,ierr)
      call VecAXPY(neg_one,u,x,ierr)
      call VecNorm(x,NORM_2,norm,ierr)
      if (rank .eq. 0) then
        if (norm .gt. 1.e-12) then
          write(6,100) norm, its
        else
          write(6,110) its
        endif
      endif
 100  format('Norm of error ',e10.4,' iterations ',i5)
 110  format('Norm of error < 1.e-12, iterations ',i5)
      call SLESDestroy(sles,ierr)
      call VecDestroy(u,ierr)
      call VecDestroy(x,ierr)
      call VecDestroy(b,ierr)
      call MatDestroy(A,ierr)
      call PetscFinalize(ierr)
      end
51. Nonlinear solvers
- Goal: for problems arising from PDEs, support the general solution of F(u) = 0
- User provides
- code to evaluate F(u)
- code to evaluate the Jacobian of F(u) (optional)
- OR
- use a sparse finite difference approximation
52. Main Routine
[Diagram: the user's main routine handles initialization, function evaluation, Jacobian evaluation, and post-processing; PETSc's nonlinear solver (SNES) drives the linear solver (SLES, built from KSP and PC).]
53. Nonlinear solver (SNES)
- Newton-based methods, including
- line search strategies
- trust region approaches
- pseudo-transient continuation
- matrix-free variants
- User can customize all phases of the solution
process
54. Solvers based on callbacks
- The user provides routines to perform actions that the library requires. For example,
- SNESSetFunction(SNES,.....)
- uservector - vector to store function values
- userfunction - name of the user's function
- usercontext - pointer to private data for the user's function
- Now, whenever the library needs to evaluate the user's nonlinear function, the solver may call the application code directly with its own local state.
- usercontext serves as an application context object. Data are handled through such opaque objects; the library never sees irrelevant application data. A sketch of such a callback follows.
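To make the callback pattern concrete, here is a minimal C sketch of a user-supplied function routine, assuming the 2.x-era SNESSetFunction() interface above; the names FormFunction, AppData, r, and user are hypothetical.

      /* hypothetical private application data, passed via the usercontext pointer */
      typedef struct { double param; } AppData;

      /* called by SNES whenever it needs F(x); the result goes into f */
      int FormFunction(SNES snes, Vec x, Vec f, void *ptr)
      {
        AppData *user = (AppData *) ptr;  /* recover private data */
        /* ... compute F(x) into f, e.g., via VecGetArray()/VecRestoreArray() ... */
        return 0;
      }

      /* registration: r stores function values; &user is the private context */
      SNESSetFunction(snes, r, FormFunction, (void *) &user);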
55. Basic nonlinear solver (FORTRAN)
      SNES    snes
      Mat     J
      Vec     x, F
      integer n, its, ierr
      call MatCreate(MPI_COMM_WORLD,n,n,J,ierr)
      call VecCreate(MPI_COMM_WORLD,n,x,ierr)
      call VecDuplicate(x,F,ierr)
      call SNESCreate(MPI_COMM_WORLD,
     &                SNES_NONLINEAR_EQUATIONS,snes,ierr)
      call SNESSetFunction(snes,F,EvaluateFunction,PETSC_NULL,ierr)
      call SNESSetJacobian(snes,J,EvaluateJacobian,PETSC_NULL,ierr)
      call SNESSetFromOptions(snes,ierr)
      call SNESSolve(snes,x,its,ierr)
      call SNESDestroy(snes,ierr)
56. SNES basic usage
- SNESCreate() - Create SNES context
- SNESSetFunction() - Set nonlinear function evaluation routine and vector
- SNESSetJacobian() - Set Jacobian evaluation routine and matrix
- SNESSetFromOptions() - Set runtime options
- SNESSolve() - Run nonlinear solver
- SNESView() - View solver options used at runtime
- SNESDestroy() - Destroy solver
57. Timestepping solvers
- Goal: support the time evolution of PDE systems, U_t = F(U, U_x, U_xx, t)
- User provides
- code to evaluate F(U, U_x, U_xx, t)
- code to evaluate the Jacobian of F
- OR
- use a sparse finite difference approximation
58. Distributed Arrays
[Figure: star-type and box-type stencils, with ghost points surrounding each processor's local portion of the grid.]
59. Managing ghost points
- Managing field data layout and the required ghost values is the key to high performance in most PDE-based parallel programs
- Structured grids: DA objects
- Unstructured grids: VecScatter objects
60. Logically regular grids
- DA - distributed array: an object containing all information about vector distribution across the processors
- Form a DA
- Update ghost points
- DAGlobalToLocalBegin(DA, )
- DAGlobalToLocalEnd(DA, )
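A minimal C sketch of this form/update sequence, assuming 2.x-era DA calls (the vector-creation routine names and the 64x64 grid are assumptions):

      DA  da;
      Vec global, local;
      /* 2D distributed array: 64x64 grid, 1 unknown per node, stencil width 1 */
      DACreate2d(PETSC_COMM_WORLD, DA_NONPERIODIC, DA_STENCIL_BOX,
                 64, 64, PETSC_DECIDE, PETSC_DECIDE, 1, 1,
                 PETSC_NULL, PETSC_NULL, &da);
      DACreateGlobalVector(da, &global);
      DACreateLocalVector(da, &local);
      /* refresh ghost points before a local computation */
      DAGlobalToLocalBegin(da, global, INSERT_VALUES, local);
      DAGlobalToLocalEnd(da, global, INSERT_VALUES, local);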
61. Unstructured grids
- AO: application ordering, for mapping between various global numbering schemes
- IS: index set, for indicating a collection of nodes
- VecScatter: for updating ghost points
- ISLocalToGlobalMapping: for mapping from a local (on-processor) numbering of objects to a global (across-processor) numbering
62. Setting up the communication pattern
- Renumber objects so that contiguous entries are adjacent (AO)
- Determine needed neighbor values
- Generate local numbering
- Generate local and global vectors
- Create the communication object (VecScatter), as sketched below
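A rough C sketch of the final step, assuming 2.x-era calls; nghost, the two index arrays, and the previously created vectors gvec and lvec are hypothetical and would come from the renumbering steps above.

      Vec        gvec, lvec;             /* global and ghosted local vectors,
                                            assumed created earlier */
      IS         from, to;
      VecScatter gtol;
      int        nghost = 2;             /* hypothetical ghost-point data */
      int        ghost_global_idx[2] = {5, 17};
      int        ghost_local_idx[2]  = {0, 1};
      /* global indices of needed ghost values, and their slots in the local vector */
      ISCreateGeneral(PETSC_COMM_SELF, nghost, ghost_global_idx, &from);
      ISCreateGeneral(PETSC_COMM_SELF, nghost, ghost_local_idx,  &to);
      VecScatterCreate(gvec, from, lvec, to, &gtol);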
63. Scattering
- Local-to-global scatter
- VecScatterBegin(VecScatter gtol, Vec lvec, Vec gvec, ADD_VALUES, SCATTER_REVERSE)
- VecScatterEnd( )
- Global-to-local scatter
- VecScatterBegin(VecScatter gtol, Vec gvec, Vec lvec, INSERT_VALUES, SCATTER_FORWARD)
- VecScatterEnd( )
64. Communication and local computations

  Grid type            Communication       Local computation
  Structured grids     DAGlobalToLocal()   Loop over I, J, K indices
  Unstructured grids   VecScatter()        Loop over entities
65. Debugging
- Automatic generation of tracebacks
- Tools for detecting memory corruption and leaks
- Optional user-defined error handlers
66. Profiling
- Integrated monitoring of
- time
- floating-point performance
- memory usage
- communication
- via the runtime option -log_summary
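For example, a hypothetical run that prints the performance summary at exit:

      mpirun -np 2 ex2 -log_summary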
67. Summary
- For the implicit solution of PDEs, PETSc provides a user with
- management of data layout and ghost-point communication with DAs and VecScatters
- use of callbacks to set up the problem
- profiling and error handling
- evaluation of parallel functions and Jacobians
68. Acknowledgment
- This tutorial is compiled mainly from training materials on the PETSc web pages and from demonstration codes.
69. Get more information from
- http://www.mcs.anl.gov/petsc/petsc.html