Title: Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing
1. Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing
- Rahul S. Sampath
- May 9th, 2007
2. Computational Power Today
3. Floating Point Operations Per Second (FLOPS)
- Humans doing long division: milliFLOPS (1/1000th of one FLOP)
- Cray-1 supercomputer, 1976 ($8M): 80 MFLOPS
- Pentium II, 400 MHz: 100 MFLOPS
- Typical high-end PC today: 1 GFLOPS
- Sony PlayStation 3, 2006: 2 TFLOPS
- IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS
- IBM Blue Gene, < 2010 (with 65,536 microprocessors): 360 TFLOPS
4. Why do we need more?
- "DOS addresses only 1 MB of RAM because we cannot
imagine any application needing more." --
Microsoft, 1980. - "640k ought to be enough for anybody"--Bill
Gates, 1981. - Bottom-line Demand for computational power will
continue to increase.
5. Some Computationally Intensive Applications Today
- Computer Aided Surgery
- Medical Imaging
- MD simulations
- FEM simulations with > 10^10 unknowns
- Galaxy formation and evolution
- 17 million particle Cold Dark Matter Cosmology
simulation
6. Any application which can be scaled up should be treated as a computationally intensive application.
7. The Need for Parallel Computing
- Memory (RAM)
- There is a theoretical limit on the RAM that is available on your computer (see the worked arithmetic at the end of this slide).
- 32-bit systems: 4 GB (2^32 bytes)
- 64-bit systems: 16 exabytes (2^64 bytes, i.e. more than 16 million TB)
- Speed
- Upgrading microprocessors can't help you anymore?
- FLOPS is not the bottleneck, memory is
- What we need is more registers
- Think pre-computing, higher-bandwidth memory bus, L2/L3 cache, compiler optimizations, assembly language ... asylum?
- Or
- Think parallel
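As a quick sanity check on the address-space limits quoted above (a worked calculation added here, not part of the original slide):

\[
2^{32}\ \text{bytes} = 4{,}294{,}967{,}296\ \text{bytes} = 4\ \text{GB}, \qquad
2^{64}\ \text{bytes} \approx 1.8 \times 10^{19}\ \text{bytes} = 16\ \text{EB} \approx 1.6 \times 10^{7}\ \text{TB}
\]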
8. Hacks
- If Speed is not an issue
- Is out-of-core implementation an option?
- Parallel programs can be converted into
out-of-core implementations easily.
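A minimal sketch of the out-of-core idea in C, assuming the data is a flat binary file of doubles and that process() stands in for the real per-element computation (file name, chunk size, and process() are hypothetical placeholders); only one chunk is ever resident in RAM.

#include <stdio.h>
#include <stdlib.h>

#define CHUNK 1048576  /* 1M doubles (8 MB) per pass; tune to the available RAM */

/* Hypothetical per-element work; stands in for the real computation. */
static double process(double x) { return 2.0 * x; }

int main(void) {
    FILE *in  = fopen("data.bin", "rb");    /* assumed input: raw doubles on disk */
    FILE *out = fopen("result.bin", "wb");
    if (!in || !out) { perror("fopen"); return 1; }

    double *buf = malloc(CHUNK * sizeof(double));
    size_t n;
    /* Read a chunk, compute on it, write it back out -- the data set never has
     * to fit in memory, only CHUNK doubles at a time. */
    while ((n = fread(buf, sizeof(double), CHUNK, in)) > 0) {
        for (size_t i = 0; i < n; i++)
            buf[i] = process(buf[i]);
        fwrite(buf, sizeof(double), n, out);
    }

    free(buf);
    fclose(in);
    fclose(out);
    return 0;
}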
10. The Key Questions
- Why?
- Memory
- Speed
- Both
- What kind of platform?
- Shared Memory
- Distributed Computing
- Typical size of the application
- Small (< 32 processors)
- Medium (32-256 processors)
- Large (> 256 processors)
- How much time and effort do you want to invest?
- How many times will the component be used in a
single execution of the program?
11. Factors to Consider in any Parallel Algorithm Design
- Give equal work to all processors at all times
- Load Balancing
- Give an equal amount of data to all processors
- Efficient Memory Management
- Processors should work independently as much as possible
- Minimize communication, especially iterative communication
- If communication is necessary, try to do some work in the background as well
- Overlapping communication and computation (sketched after this list)
- Try to keep the sequential part of the parallel algorithm as close to the best sequential algorithm as possible
- Optimal Work Algorithm
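A sketch of overlapping communication and computation with MPI, assuming a 1-D block distribution where each rank needs one ghost value from each neighbour (the function and variable names are illustrative): start the non-blocking exchange, compute the interior points that need no remote data, then wait and finish the points next to the ghost cells.

#include <mpi.h>

/* Hypothetical halo exchange for a 1-D block distribution.
 * u[1..n] is the local data; u[0] and u[n+1] are ghost cells filled from the neighbours. */
void exchange_and_compute(double *u, double *unew, int n, int rank, int size) {
    MPI_Request req[4];
    int nreq = 0;
    int left = rank - 1, right = rank + 1;

    /* 1. Start the non-blocking exchange of boundary values. */
    if (left >= 0) {
        MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[nreq++]);
    }
    if (right < size) {
        MPI_Irecv(&u[n + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[nreq++]);
        MPI_Isend(&u[n],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[nreq++]);
    }

    /* 2. Interior points need no remote data -- compute them while messages are in flight. */
    for (int i = 2; i <= n - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    /* 3. Wait for the ghost cells, then finish the two points that depend on them.
     *    (Physical boundary conditions at the ends of the global domain are omitted.) */
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
    if (left >= 0)    unew[1] = 0.5 * (u[0] + u[2]);
    if (right < size) unew[n] = 0.5 * (u[n - 1] + u[n + 1]);
}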
12. Difference Between Sequential and Parallel Algorithms
- Not all data is accessible at all times
- All computations must be as localized as possible
- Can't have random access
- New dimension to the existing algorithm: division of work
- Which processor does what portion of the work? (see the block-partition helper after this list)
- If communication cannot be avoided
- How will it be initiated?
- What type of communication?
- What are the pre-processing and post-processing operations?
- Order of operations could be very critical for performance
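As one concrete answer to "which processor does what portion of the work", a small helper (illustrative, not from the slides) that block-distributes N items over the available ranks, giving every rank a contiguous, nearly equal range.

/* Compute the half-open index range [lo, hi) owned by `rank` when N items
 * are block-distributed over `size` ranks; the first N % size ranks get one extra item. */
void block_range(long N, int rank, int size, long *lo, long *hi) {
    long base = N / size;          /* minimum items per rank          */
    long rem  = N % size;          /* leftover items to spread around */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}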
13. Parallel Algorithm Approaches
- Data-Parallel Approach (see the sketch after this list)
- Partition the data among the processors
- Each processor will execute the same set of commands
- Control-Parallel Approach
- Partition the tasks to be performed among the processors
- Each processor will execute different commands
- Hybrid Approach
- Switch between the two approaches at different stages of the algorithm
- Most parallel algorithms fall in this category
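A minimal data-parallel sketch in MPI C, with a made-up problem size: every rank runs the same commands on its own slice of the index space, and the partial results are combined with a reduction.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a contiguous block of the (conceptual) global index space 0..N-1. */
    const long N = 1000000;
    long lo = rank * N / size;
    long hi = (rank + 1) * N / size;

    /* Same commands on every rank, different data: sum the local block. */
    double local = 0.0;
    for (long i = lo; i < hi; i++)
        local += (double)i;

    /* Combine the partial sums on rank 0. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", global);

    MPI_Finalize();
    return 0;
}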
14. Performance Metrics
- Speedup
- Overhead
- Scalability
- Fixed Size
- Iso-granular
- Efficiency
- Speedup per processor
- Iso-Efficiency
- Problem size as a function of p in order to keep
efficiency constant
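For reference, the usual definitions behind these metrics, with T_1 the best sequential time and T_p the time on p processors:

\[
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}, \qquad
T_o(p) = p\,T_p - T_1
\]

Iso-efficiency asks how fast the problem size must grow with p so that E(p) stays constant.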
15. The Take Home Message
- A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm.
- What model to use? Problem dependent.
- e.g. a + b + c + d = (a + b) + (c + d)
- Not much choice really.
- It is a big investment, but can really be worth it.
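The regrouping in that example is exactly what exposes the parallelism: because the operation is associative, the two inner sums can be computed at the same time on different processors and then combined.

\[
a + b + c + d \;=\; \underbrace{(a + b)}_{\text{processor } 0} \;+\; \underbrace{(c + d)}_{\text{processor } 1}
\]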
17. How does a parallel program work?
- You request a certain number of processors
- You set up a communicator
- Give a unique id to each processor: its rank
- Every processor executes the same program
- Inside the program
- Query for the rank and use it to decide what to do
- Exchange messages between different processors using their ranks
- In theory, you only need 3 functions: Isend, Irecv, Wait (see the sketch after this list)
- In practice, you can optimize communication depending on the underlying network topology
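A minimal sketch of that model in MPI C: every processor runs this same program, queries its rank, and ranks 0 and 1 exchange one message with the non-blocking Isend/Irecv/Wait trio mentioned above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                 /* join the communicator MPI_COMM_WORLD */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my unique id (rank) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processors requested */

    /* Same program everywhere; the rank decides who does what. */
    if (rank == 0 && size > 1) {
        int msg = 42;
        MPI_Request req;
        MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 0 sent %d to rank 1\n", msg);
    } else if (rank == 1) {
        int msg;
        MPI_Request req;
        MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", msg);
    }

    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with something like mpirun -np 4 a.out, all four copies execute the same binary; only the rank value differs.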
18. Message Passing Standards
- The standards define a set of primitive communication operations.
- The vendors implementing these on any machine are responsible for optimizing these operations for that machine.
- Popular Standards
- Message Passing Interface (MPI)
- OpenMP (Open Multi-Processing; a directive-based standard for shared-memory systems rather than message passing)
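For contrast with the MPI sketches above, a minimal OpenMP example in C (loop bounds made up for illustration): the parallelism is expressed with a compiler directive rather than explicit messages, and it works only because all the threads share one address space.

#include <omp.h>
#include <stdio.h>

int main(void) {
    const long N = 1000000;
    double sum = 0.0;

    /* The directive splits the loop iterations among threads; the reduction
     * clause combines the per-thread partial sums at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += (double)i;

    printf("sum = %.0f with up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}

Compile with an OpenMP-aware compiler, e.g. gcc -fopenmp.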
19. Languages that support MPI
- Fortran 77
- C/C++
- Python
- Matlab
20. MPI Implementations
- MPICH
- ftp://info.mcs.anl.gov/pub/mpi
- LAM
- http://www.mpi.nd.edu/lam/download
- CHIMP
- ftp://ftp.epcc.ed.ac.uk/pub/chimp/release
- WinMPI (Windows)
- ftp://csftp.unomaha.edu/pub/rewini/WinMPI
- W32MPI (Windows)
- http://dsg.dei.uc.pt/wmpi/intro.html
21. Open Source Parallel Software
- PETSc (Linear and Nonlinear Solvers)
- http://www-unix.mcs.anl.gov/petsc/petsc-as/
- ScaLAPACK (Linear Algebra)
- http://www.netlib.org/scalapack/scalapack_home.html
- SPRNG (Random Number Generator)
- http://sprng.cs.fsu.edu/
- ParaView (Visualization)
- http://www.paraview.org/HTML/Index.html
- NAMD (Molecular Dynamics)
- http://www.ks.uiuc.edu/Research/namd/
- Charm++ (Parallel Objects)
- http://charm.cs.uiuc.edu/research/charm/
22. References
- Parallel Programming with MPI, Peter S. Pacheco
- Introduction to Parallel Computing, A. Grama, A. Gupta, G. Karypis, V. Kumar
- MPI: The Complete Reference, William Gropp et al.
- http://www-unix.mcs.anl.gov/mpi/
- http://www.erc.msstate.edu/mpi
- http://www.epm.ornl.gov/walker/mpi
- http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ)
- comp.parallel.mpi (Newsgroup)
- http://www.mpi-forum.org (MPI Forum)