Title: High Performance Computing
1 High Performance Computing
- Kai Wang
- Department of Computer Science
- University of South Dakota
- http://www.usd.edu/Kai.Wang
2 Goal of the Talk
- In the previous lesson, we learned some basic knowledge of high performance computing
- What is high performance computing
- Hardware and software
- A simple parallel code
- In this talk, you will get to know some research interests in high performance computing
- Algorithm design
- Language
3 High Performance Computing II
- Kai Wang
- Department of Computer Science
- University of South Dakota
- http://www.usd.edu/Kai.Wang
4 Table of Contents
- Research in high performance computing
- Parallel algorithm of sparse approximate inverse
- Parallel language design
5 Research in High Performance Computing
- Hardware
- Design high speed CPU, memory
- Build high performance computer
- Software
- Theory
- Mechanism
- Language design
- Resource management
- Network connection, graph theory
- Input/output
- Application
- Given an application, design its algorithm on high performance computers
- Tools, libraries
- General purpose use, ACTS (Advanced CompuTational Software)
6 Research in Algorithm Design
- High performance computing vs. parallel computing
- High performance computing runs on high performance computers
- High performance computing is more concerned with performance
- Code efficiency
- Scalability (speedup and efficiency are defined below)
- Therefore, we focus on parallel algorithm design
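As a reference for the two performance measures named above (standard definitions, not taken from the slides):

\[ S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p} \]

where T(p) is the running time on p processors; an algorithm is considered scalable when the efficiency E(p) stays close to 1 as p grows.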
7 Example of Algorithm Design
- A very popular question
- Ax = b
8 Example of Algorithm Design
- Various scientific modeling and simulation problems are modeled by partial differential equations
- The equations are discretized
- We get Ax = b
- It needs to be solved
9 Why Do High Performance Computing
- The matrix A is usually large: millions of unknowns
- The problem is large-scale
- People are trying to get accurate simulation and modeling results, which requires the discretization of the PDE to be detailed enough
- The solution of x costs a huge amount of CPU time and storage
- Gaussian elimination
- O(n²) memory cost, O(n³) computational cost (a rough estimate follows)
- No one wants to wait months
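A rough, illustrative estimate (my own numbers: a dense matrix, 8-byte entries, and a machine sustaining about 10^12 floating-point operations per second):

\[ n = 10^6:\quad n^2 \times 8\ \text{bytes} = 8\ \text{TB of memory}, \qquad n^3 = 10^{18}\ \text{operations} \approx 10^6\ \text{s} \approx 12\ \text{days} \]

Gaussian elimination actually needs about (2/3)n³ operations, but the order of magnitude is the same.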
10 No Parallelism
- So we need to solve
- Ax = b
- on high performance computers
- The most robust algorithm is Gaussian elimination
- There is no parallelism in the algorithm
- It is difficult to implement on parallel computers
11 Why No Parallelism
12 Why No Parallelism
13 Why No Parallelism
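A minimal C sketch of forward elimination (my illustration, not from the slides) makes the difficulty concrete: the outer loop over elimination steps is inherently sequential, because step k needs the matrix entries updated by step k-1.

    /* Forward elimination in Gaussian elimination (dense, no pivoting). */
    void forward_eliminate(int n, double a[n][n], double b[n])
    {
        for (int k = 0; k < n; k++) {            /* elimination steps must run in order */
            for (int i = k + 1; i < n; i++) {    /* only the row updates inside one step are independent */
                double factor = a[i][k] / a[k][k];
                for (int j = k; j < n; j++)
                    a[i][j] -= factor * a[k][j];
                b[i] -= factor * b[k];
            }
        }
    }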
14 Iterative Methods
- Matrix-vector products (a sketch follows)
- Parallelism is not a problem now
- Jacobi, Gauss-Seidel
- Not robust, not efficient
- Multigrid, Krylov subspace methods (CG, GMRES)
- Not robust enough
- Robustness is a problem
- Preconditioned Krylov methods
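For contrast with Gaussian elimination, here is a minimal C sketch (my illustration; the CSR storage format is my assumption) of the sparse matrix-vector product that these iterative methods are built on. Every row is computed independently, so rows can simply be distributed across processors.

    /* Sparse matrix-vector product y = A*x, with A stored in CSR format. */
    void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {            /* rows are independent: trivially parallel */
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }

In a distributed setting each processor owns a block of rows, and only the entries of x corresponding to off-processor columns need to be communicated.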
15 Preconditioning
- Ax = b
- is difficult to solve by iterative methods because A is ill-conditioned
- Transform it to MAx = Mb
- M is called the preconditioner
- This process is called preconditioning
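In other words (a standard way to state it, not a quote from the slide): M is chosen as a cheap approximation to the inverse of A, so that

\[ M \approx A^{-1} \;\Rightarrow\; MA \approx I, \]

and the preconditioned matrix MA is much better conditioned than A, which lets the iterative method converge in far fewer iterations.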
16 Parallel Preconditioning
- There are at least 50 different ways to compute M
- Not all of them are suitable for high performance computers
17 Localized Parallel Techniques
- Only compute the diagonal part
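One concrete example of such a localized technique is a purely diagonal (Jacobi-type) preconditioner; the C sketch below is my illustration of the idea, not necessarily the exact method on the slide. Every entry of M depends only on the local row, so building and applying it needs no communication.

    /* Build M = diag(A)^(-1) from a CSR matrix. */
    void build_diag_precond(int n, const int *row_ptr, const int *col_idx,
                            const double *val, double *m_inv)
    {
        for (int i = 0; i < n; i++)
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                if (col_idx[k] == i)             /* pick out the diagonal entry a_ii */
                    m_inv[i] = 1.0 / val[k];
    }

    /* Applying M is a pointwise scaling, z = M*r, used inside the Krylov iteration. */
    void apply_diag_precond(int n, const double *m_inv, const double *r, double *z)
    {
        for (int i = 0; i < n; i++)
            z[i] = m_inv[i] * r[i];
    }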
18 Sparse Approximate Inverse
19 Why Have Parallelism
- The problem can be transformed (one common formulation is sketched below)
- The degree of parallelism is n
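One common formulation of the sparse approximate inverse (the formula itself is not in this transcript, so this is the standard Frobenius-norm version): M is computed by minimizing

\[ \min_{M} \| AM - I \|_F^2 \;=\; \sum_{j=1}^{n} \min_{m_j} \| A m_j - e_j \|_2^2, \]

so each column m_j of M is an independent small least-squares problem, and all n columns can be computed in parallel; this is where the degree of parallelism n comes from.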
20 Research in Language Design
- Running a program on a high performance computer needs
- Hardware
- High performance computer
- Available number of processors
- Software
- Parallel environment support
- Resource management
- A language supporting parallel programs
- Parallel programs written in the language
21 Message Passing Interface
- MPI is the most popular parallel programming model
- More than 90% of parallel programs use it
- but it is not the best
- Given P processors
- It requires the programmer to decompose the computation into exactly P pieces
- Each piece is assigned to 1 processor
- mpirun -np P programname
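A minimal MPI example in C (my illustration; the summation is just a placeholder workload) showing the point: the programmer splits the work into exactly as many pieces as there are processes.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int N = 1000000;                   /* total amount of work */
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int chunk = N / size;                    /* decomposition fixed by the number of processes */
        int begin = rank * chunk;
        int end   = (rank == size - 1) ? N : begin + chunk;

        double local = 0.0, total = 0.0;
        for (int i = begin; i < end; i++)        /* each process handles exactly one piece */
            local += 1.0 / (i + 1.0);

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f\n", total);

        MPI_Finalize();
        return 0;
    }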
22 Why Not Good
- Software engineering
- The problem is decomposed not according to its nature, but to the number of physical processors
- Load balancing
- The programmer has to control the decomposition carefully
- Additional coding
- Especially for dynamic applications
23 Processor Virtualization
- Get rid of the limitation of physical processors
- Give the runtime system the power to control
- The decomposition is based on the nature of the problem
- The problem is divided into m pieces
- The runtime system maps these m pieces to p physical processors
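As a toy illustration of the mapping step (my sketch only; a real runtime such as Charm++ remaps pieces dynamically for load balance), a static assignment of m virtual pieces to p physical processors could be as simple as:

    /* Map virtual piece i (0 <= i < m) to one of p physical processors. */
    int home_processor(int piece, int p)
    {
        return piece % p;                        /* round-robin placement; the runtime may migrate pieces later */
    }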
24 Why Is It Good
- Cache performance
- Each piece is smaller and fits into the cache more easily
- This can improve cache performance even when the code is already cache-optimized
25 Why Is It Good
- Overlapping of communication and computation
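In plain MPI this overlap has to be coded by hand with nonblocking calls; with virtualization the runtime gets a similar effect by scheduling another piece while one piece waits for messages. A minimal C sketch of the manual MPI pattern (the neighbors, tag, and buffers are hypothetical):

    #include <mpi.h>

    void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                              int left, int right, MPI_Comm comm)
    {
        MPI_Request reqs[2];

        MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, comm, &reqs[0]);  /* start communication */
        MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

        /* ... compute on interior data that does not need recvbuf ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);                    /* finish communication */
        /* ... compute on the boundary data that needed recvbuf ... */
    }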
27 Implementation
- Charm++
- C++ language
- Each piece of computation is represented by a chare object
- Message-driven communication
28 Implementation
- AMPI
- Same as MPI, but supports virtualization
- Has automatic load balancing support
- Message passing communication
- ampi np p vp m programname
29 NAMD: A Production MD Program
- NAMD
- Fully featured program
- NIH-funded development
- Distributed free of charge (5000 downloads so far)
- Binaries and source code
- Installed at NSF centers
30 NAMD on Lemieux without PME
ATPase: 327,000 atoms, including water
31 Conclusion
- Information and knowledge are growing exponentially
- People need ever-increasing computing power
- High performance computing will find more and more applications and become the next generation computing technique