Title: Introduction to Scientific Computing on Linux Clusters
1. Introduction to Scientific Computing on Linux Clusters
Doug Sondak, Linux Clusters and Tiled Display Walls, July 30 - August 1, 2002
2. Outline
- Why Clusters?
- Parallelization
  - example: Game of Life
  - performance metrics
- Ways to Fool the Masses
- Summary
3. Why Clusters?
- Scientific computing has traditionally been performed on fast, specialized machines
- Buzzword: Commodity Computing
  - clustering cheap, off-the-shelf processors
  - can achieve good performance at a low cost if the applications scale well
4. Clusters (2)
- 102 clusters in current Top 500 list
  - http://www.top500.org/list/2001/06/
- Reasonable parallel efficiency is the key
- generally use message passing, even if there are shared-memory CPUs in each box
5. Compilers
- Linux Fortran compilers (F90/95)
  - available from many vendors, e.g., Absoft, Compaq, Intel, Lahey, NAG, Portland Group, Salford
  - g77 is free, but is restricted to Fortran 77 and is relatively slow
6. Compilers (2)
- Intel offers a free, unsupported Fortran compiler for non-commercial purposes
  - full F95
  - OpenMP
  - http://www.intel.com/software/products/compilers/f60l/noncom.htm
7. Compilers (3)
http://www.polyhedron.com/
8. Compilers (4)
- Linux C/C++ compilers
  - gcc/g++ seems to be the standard, usually described as a good compiler
  - also available from vendors, e.g., Compaq, Intel, Portland Group
9. Parallelization of Scientific Codes
10. Domain Decomposition
- Typically perform operations on arrays
  - e.g., setting up and solving a system of equations
- domain decomposition
  - arrays are broken into chunks, and each chunk is handled by a separate processor
  - processors operate simultaneously on their own chunks of the array
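As a sketch of the chunking idea (the helper name is hypothetical, not from the slides), a 1-D decomposition assigns each processor a near-equal slice of the array:

```python
# Sketch of 1-D domain decomposition: split an array of length n
# across p processors as evenly as possible.

def chunk_bounds(n, p, rank):
    """Return the [start, end) index range owned by `rank` of `p` processors."""
    base, extra = divmod(n, p)
    # the first `extra` ranks each get one extra element
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

# Example: 10 elements over 3 processors -> chunk sizes 4, 3, 3
bounds = [chunk_bounds(10, 3, r) for r in range(3)]
```

Each processor then loops only over its own index range, so the chunks are processed simultaneously.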
11. Other Methods
- Parallelization is also possible without domain decomposition
  - less common
  - e.g., process one set of inputs while reading another set of inputs from a file
12. Embarrassingly Parallel
- if operations are completely independent of one another, this is called embarrassingly parallel
  - e.g., initializing an array
  - some Monte Carlo simulations
- not usually the case
13. Game of Life
- Early, simple cellular automaton
- created by John Conway
- 2-D grid of cells
- each cell has one of 2 states (alive or dead)
- cells are initialized with some distribution of alive and dead states
14. Game of Life (2)
- at each time step, states are modified based on the states of adjacent cells (including diagonals)
- Rules of the game:
  - 3 alive neighbors: alive
  - 2 alive neighbors: no change
  - otherwise: dead
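The rules above can be sketched as a serial update function (a minimal illustration; cells outside the grid are assumed dead, which the slides do not specify):

```python
def life_step(grid):
    """One Game of Life update on a 2-D list of 0/1 cells.

    Rules from the slide: 3 live neighbors -> alive,
    2 live neighbors -> no change, otherwise -> dead.
    Cells outside the grid are treated as dead (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # count the 8 neighbors, including diagonals
            alive = sum(grid[a][b]
                        for a in range(max(i - 1, 0), min(i + 2, rows))
                        for b in range(max(j - 1, 0), min(j + 2, cols))
                        if (a, b) != (i, j))
            if alive == 3:
                new[i][j] = 1
            elif alive == 2:
                new[i][j] = grid[i][j]
            else:
                new[i][j] = 0
    return new
```

For example, a horizontal row of three live cells (a "blinker") flips to a vertical row after one step.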
15. Game of Life (3)
[figure]
16. Game of Life (4)
- Parallelize on 2 processors
  - assign a block of columns to each processor
- Problem: what happens at the split?
17. Game of Life (5)
- At each time step, pass the overlap data from processor to processor
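In a cluster code the overlap columns would travel as MPI messages; as a single-process sketch (no MPI, hypothetical names), the exchange amounts to copying each subdomain's edge column into its neighbor's overlap ("ghost") column:

```python
def exchange_overlap(left, right):
    """Copy edge columns between two column-block subdomains.

    `left` and `right` are 2-D lists; left's last column and right's
    first column are ghost (overlap) columns.  In a real cluster code
    these copies would be MPI sends/receives; here they are plain
    assignments, for illustration only.
    """
    for row_l, row_r in zip(left, right):
        row_l[-1] = row_r[1]    # right's first interior column -> left's ghost
        row_r[0] = row_l[-2]    # left's last interior column  -> right's ghost

left = [[1, 2, 0],   # last column is the ghost column
        [3, 4, 0]]
right = [[0, 5, 6],  # first column is the ghost column
         [0, 7, 8]]
exchange_overlap(left, right)
```

After the exchange, each processor can apply the Game of Life rules to its interior cells using only local data.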
18. Message Passing
- The largest bottleneck to good parallel efficiency is usually message passing
  - much slower than number crunching
- set up your algorithm to minimize message passing
  - minimize the surface-to-volume ratio of the subdomains
19Domain Decomp.
For this domain
To run on 2 processors, decompose like this
Not like this
20. How to Pass Msgs.
- MPI is the recommended method
- PVM may also be used
- MPICH
  - most common
  - free download
  - http://www-unix.mcs.anl.gov/mpi/mpich/
- others also available, e.g., LAM
21. How to Pass Msgs. (2)
- some MPI tutorials
  - Boston University: http://scv.bu.edu/Tutorials/MPI/
  - NCSA: http://pacont.ncsa.uiuc.edu:8900/public/MPI/
22. Performance
23. Code Timing
- How well has the code been parallelized?
- CPU time vs. wallclock time
  - both are seen in the literature
  - I prefer wallclock
    - only for dedicated processors
    - CPU time doesn't account for load imbalance
- unix time command
- Fortran system_clock subroutine
- MPI_Wtime
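A wallclock timing sketch (in Python for brevity; a Fortran code would call system_clock, and an MPI code MPI_Wtime, but the pattern is the same):

```python
import time

def timed(func, *args):
    """Return (result, elapsed wallclock seconds) for one call.

    time.perf_counter() measures wallclock rather than CPU time,
    so it also captures load imbalance and communication waits.
    """
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

total, secs = timed(sum, range(1_000_000))
```

On a dedicated processor this elapsed time is the quantity to use in the speedup formulas that follow.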
24. Parallel Speedup
- quantify how well we have parallelized our code

  S_n = T_1 / T_n

  where
  - S_n = parallel speedup
  - n = number of processors
  - T_1 = time on 1 processor
  - T_n = time on n processors
25. Parallel Speedup (2)
[figure]
26. Parallel Efficiency

  η_n = T_1 / (n T_n) = S_n / n

  where
  - η_n = parallel efficiency
  - T_1 = time on 1 processor
  - T_n = time on n processors
  - n = number of processors
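The two definitions translate directly into code (a trivial sketch, with an illustrative example in the comment):

```python
def speedup(t1, tn):
    """Parallel speedup S_n = T_1 / T_n."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Parallel efficiency eta_n = T_1 / (n * T_n) = S_n / n."""
    return t1 / (n * tn)

# e.g., 100 s serial, 30 s on 4 processors:
# S_4 = 100/30 = 3.33..., eta_4 = 3.33/4 = 0.83 (83% efficiency)
```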
27. Parallel Efficiency (2)
[figure]
28. Parallel Efficiency (3)
- What is a reasonable level of parallel efficiency?
- Depends on
  - how much CPU time you have available
  - when the paper is due
- can think of (1 - η) as wasted CPU time
- my personal rule of thumb: 60%
29. Parallel Efficiency (4)
- Superlinear speedup
  - parallel efficiency > 1.0
- sometimes quoted in the literature
- generally attributed to cache effects
  - subdomains fit entirely in cache, while the entire domain does not
- this is very problem dependent
- be suspicious!
30. Amdahl's Law
- There are always some operations which are performed serially
- we want a large fraction of the code to execute in parallel
31. Amdahl's Law (2)
- Let the fraction of code that executes serially be denoted s
- Let the fraction of code that executes in parallel be denoted p
32. Amdahl's Law (3)
- Noting that p = (1 - s), the parallel speedup is

  S_n = 1 / (s + p/n)        (Amdahl's Law)
33. Amdahl's Law (4)
- The parallel efficiency is

  η_n = 1 / (n s + p)        (alternate version of Amdahl's Law)
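Both forms of Amdahl's law can be checked numerically with a short sketch:

```python
def amdahl_speedup(s, n):
    """Amdahl's law: S_n = 1 / (s + p/n), with p = 1 - s."""
    return 1.0 / (s + (1.0 - s) / n)

def amdahl_efficiency(s, n):
    """Alternate form: eta_n = S_n / n = 1 / (n*s + p)."""
    return 1.0 / (n * s + (1.0 - s))

# Even 5% serial code caps the speedup:
# as n grows without bound, S_n approaches 1/s = 20
```

For s = 0, the speedup is the ideal S_n = n; any s > 0 drives the efficiency toward zero as n grows, which is the point of the next slides.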
34. Amdahl's Law (5)
[figure]
35. Amdahl's Law (6)
- Should we despair?
- No!
- bigger machines solve bigger problems
  - smaller value of s
- if you want to run on a large number of processors, try to minimize s
36. Ways to Fool the Masses
- full title: Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers
- Created by David Bailey of NASA Ames in 1991
- the following is a selection of those ways, some paraphrased
37. Ways to Fool (2)
- Scale problem size with number of processors
- Project results linearly
  - 2 proc., 1 hr. → 1800 proc., 1 sec.
- Present the performance of a kernel, and represent it as the performance of the application
38. Ways to Fool (3)
- Compare with old code on an obsolete system
- Quote MFLOPS based on the parallel implementation, not the best serial implementation
  - increase the number of operations rather than decreasing time
39. Ways to Fool (4)
- Quote parallel speedup, making sure the single-processor version is slow
- Mutilate the algorithm used in the parallel implementation to match the architecture
  - explicit vs. implicit PDE solvers
- Measure parallel times on a dedicated system, serial times in a busy environment
40. Ways to Fool (5)
- If all else fails, show pretty pictures and animated videos, and don't talk about performance.
41. Summary
- Clusters are viable platforms for relatively low-cost scientific computing
- parallel considerations are similar to other platforms
- MPI is a free, effective message-passing API
- be careful with performance timings