1
Introduction to Scientific Computing on Linux
Clusters
Doug Sondak
Linux Clusters and Tiled Display Walls
July 30 - August 1, 2002
2
Outline
  • Why Clusters?
  • Parallelization
  • example - Game of Life
  • performance metrics
  • Ways to Fool the Masses
  • summary

3
Why Clusters?
  • Scientific computing has traditionally been
    performed on fast, specialized machines
  • Buzzword - Commodity Computing
  • clustering cheap, off-the-shelf processors
  • can achieve good performance at a low cost if the
    applications scale well

4
Clusters (2)
  • 102 clusters in current Top 500 list
  • http://www.top500.org/list/2001/06/
  • Reasonable parallel efficiency is the key
  • generally use message passing, even if there are
    shared-memory CPUs in each box

5
Compilers
  • Linux Fortran compilers (F90/95)
  • available from many vendors, e.g., Absoft,
    Compaq, Intel, Lahey, NAG, Portland Group,
    Salford
  • g77 is free, but is restricted to Fortran 77 and
    is relatively slow

6
Compilers (2)
  • Intel offers free unsupported Fortran compiler
    for non-commercial purposes
  • full F95
  • OpenMP
  • http://www.intel.com/software/products/compilers/f60l/noncom.htm

7
Compilers (3)
http://www.polyhedron.com/
8
Compilers (4)
  • Linux C/C++ compilers
  • gcc/g++ seems to be the standard, usually
    described as a good compiler
  • also available from vendors, e.g., Compaq, Intel,
    Portland Group

9
Parallelization of Scientific Codes
10
Domain Decomposition
  • Typically perform operations on arrays
  • e.g., setting up and solving system of equations
  • domain decomposition
  • arrays are broken into chunks, and each chunk is
    handled by a separate processor
  • processors operate simultaneously on their own
    chunks of the array (see the sketch below)
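As an illustration, a minimal sketch in C with MPI of a block decomposition of a 1-D array; the array size, its contents, and the name NGLOBAL are made up for the example, and a real code would typically allocate only the local chunk on each processor:

    #include <mpi.h>
    #include <stdio.h>

    #define NGLOBAL 1000                  /* total array size (illustrative) */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        double a[NGLOBAL];                /* full array kept on every rank only
                                             to keep the sketch short */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* block decomposition: each processor owns one contiguous chunk */
        int chunk = NGLOBAL / nprocs;
        int ilo = rank * chunk;
        int ihi = (rank == nprocs - 1) ? NGLOBAL : ilo + chunk;

        for (int i = ilo; i < ihi; i++)   /* work only on this processor's chunk */
            a[i] = 2.0 * i;

        printf("rank %d handled indices %d to %d\n", rank, ilo, ihi - 1);

        MPI_Finalize();
        return 0;
    }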

11
Other Methods
  • Parallelization also possible without domain
    decomposition
  • less common
  • e.g., process one set of inputs while reading
    another set of inputs from a file

12
Embarrassingly Parallel
  • if operations are completely independent of one
    another, this is called embarrassingly parallel
  • e.g., initializing an array
  • some Monte Carlo simulations
  • not usually the case

13
Game of Life
  • An early, simple cellular automaton
  • created by John Conway
  • 2-D grid of cells
  • each has one of 2 states (alive or dead)
  • cells are initialized with some distribution of
    alive and dead states

14
Game of Life (2)
  • at each time step states are modified based on
    states of adjacent cells (including diagonals)
  • Rules of the game (see the code sketch below)
  • exactly 3 live neighbors - cell becomes alive
  • exactly 2 live neighbors - no change
  • otherwise - cell dies
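As a sketch of how these rules translate to code in C; the grid size N and the array names are illustrative, and boundary handling and the time loop are omitted for brevity:

    #define N 100                       /* grid size (illustrative) */

    /* One Game of Life time step on the interior cells: count the live
       neighbors (including diagonals) and apply the rules above. */
    void life_step(int cur[N][N], int next[N][N])
    {
        for (int i = 1; i < N - 1; i++) {
            for (int j = 1; j < N - 1; j++) {
                int alive = 0;
                for (int di = -1; di <= 1; di++)
                    for (int dj = -1; dj <= 1; dj++)
                        if (di != 0 || dj != 0)
                            alive += cur[i + di][j + dj];

                if (alive == 3)
                    next[i][j] = 1;              /* 3 live neighbors: alive */
                else if (alive == 2)
                    next[i][j] = cur[i][j];      /* 2 live neighbors: no change */
                else
                    next[i][j] = 0;              /* otherwise: dead */
            }
        }
    }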

15
Game of Life (3)
16
Game of Life (4)
  • Parallelize on 2 processors
  • assign block of columns to each processor
  • Problem - What happens at split?

17
Game of Life (5)
  • Solution - Overlap cells
  • Each time step, pass overlap data from processor
    to processor (see the MPI sketch below)
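A sketch of the overlap (ghost-column) exchange in C using MPI_Sendrecv; the local array layout, the number of columns, and the use of MPI_PROC_NULL at the ends of the processor row are illustrative choices, not taken from the slides:

    #include <mpi.h>

    #define NROWS 100    /* local grid height (illustrative) */
    #define NCOLS 52     /* 50 owned columns + 2 overlap (ghost) columns */

    /* A column of a row-major C array is non-contiguous, so describe it
       with MPI_Type_vector, then swap overlap columns with both neighbors. */
    void exchange_overlap(int grid[NROWS][NCOLS], int rank, int nprocs)
    {
        MPI_Datatype column;
        MPI_Type_vector(NROWS, 1, NCOLS, MPI_INT, &column);
        MPI_Type_commit(&column);

        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* send last owned column to the right, receive left ghost column */
        MPI_Sendrecv(&grid[0][NCOLS - 2], 1, column, right, 0,
                     &grid[0][0],         1, column, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* send first owned column to the left, receive right ghost column */
        MPI_Sendrecv(&grid[0][1],         1, column, left,  1,
                     &grid[0][NCOLS - 1], 1, column, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Type_free(&column);
    }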

18
Message Passing
  • Largest bottleneck to good parallel efficiency is
    usually message passing
  • much slower than number crunching
  • set up your algorithm to minimize message passing
  • minimize surface-to-volume ratio of subdomains

19
Domain Decomp.
For this domain, to run on 2 processors, decompose like this ... not like this (figures of the two decompositions omitted)
20
How to Pass Msgs.
  • MPI is the recommended method (minimal example
    below)
  • PVM may also be used
  • MPICH
  • most common
  • free download
  • http://www-unix.mcs.anl.gov/mpi/mpich/
  • others also available, e.g., LAM
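For a first look at what message passing involves, a minimal MPI program in C (the message value is arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double msg = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                 /* process 0 sends a value to process 1 */
            msg = 3.14;
            MPI_Send(&msg, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {          /* process 1 receives it */
            MPI_Recv(&msg, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %f\n", msg);
        }

        MPI_Finalize();
        return 0;
    }

With MPICH or a similar implementation this would typically be compiled with mpicc and run on two processes, e.g., mpirun -np 2 ./a.out.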

21
How to Pass Msgs. (2)
  • some MPI tutorials
  • Boston University
  • http://scv.bu.edu/Tutorials/MPI/
  • NCSA
  • http://pacont.ncsa.uiuc.edu:8900/public/MPI/

22
Performance
23
Code Timing
  • How well has code been parallelized?
  • CPU time vs. wallclock time
  • both are seen in literature
  • I prefer wallclock
  • only for dedicated processors
  • CPU time doesn't account for load imbalance
  • Unix time command
  • Fortran system_clock subroutine
  • MPI_Wtime (see the example below)
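A sketch of timing a section of code with MPI_Wtime in C; the work loop is a stand-in for the real computation, and the barriers make the measurement start and stop together on all processes:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double t_start, t_end, sum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);    /* line the processes up before timing */
        t_start = MPI_Wtime();          /* wallclock time in seconds */

        for (long i = 0; i < 100000000L; i++)   /* stand-in for the real work */
            sum += 1.0 / (i + 1.0);

        MPI_Barrier(MPI_COMM_WORLD);
        t_end = MPI_Wtime();

        if (rank == 0)
            printf("wallclock time = %f s (sum = %f)\n", t_end - t_start, sum);

        MPI_Finalize();
        return 0;
    }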

24
Parallel Speedup
  • quantify how well we have parallelized our code
  • Sn = parallel speedup
  • n = number of processors
  • T1 = time on 1 processor
  • Tn = time on n processors
  • Sn = T1 / Tn

25
Parallel Speedup (2)
26
Parallel Efficiency
  • ηn = parallel efficiency
  • T1 = time on 1 processor
  • Tn = time on n processors
  • n = number of processors
  • ηn = Sn / n = T1 / (n Tn)  (worked example below)
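As a worked illustration with made-up numbers: if a run takes T1 = 100 s on 1 processor and T4 = 31.25 s on 4 processors, then S4 = 100 / 31.25 = 3.2 and η4 = 3.2 / 4 = 0.8, i.e., 80% parallel efficiency.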

27
Parallel Efficiency (2)
28
Parallel Efficiency (3)
  • What is a reasonable level of parallel
    efficiency?
  • Depends on
  • how much CPU time you have available
  • when the paper is due
  • can think of (1 - η) as wasted CPU time
  • my personal rule of thumb: 60%

29
Parallel Efficiency (4)
  • Superlinear speedup
  • parallel efficiency > 1.0
  • sometimes quoted in the literature
  • generally attributed to cache issues
  • subdomains fit entirely in cache, entire domain
    does not
  • this is very problem dependent
  • be suspicious!

30
Amdahl's Law
  • Always some operations which are performed
    serially
  • want a large fraction of code to execute in
    parallel

31
Amdahl's Law (2)
  • Let fraction of code that executes serially be
    denoted s
  • Let fraction of code that executes in parallel be
    denoted p

32
Amdahl's Law (3)
  • Noting that p = (1 - s)
  • The parallel speedup is

Amdahl's Law:  Sn = 1 / (s + (1 - s) / n)
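As a worked illustration with a made-up serial fraction, take s = 0.05: S4 = 1 / (0.05 + 0.95/4) ≈ 3.5 and S100 = 1 / (0.05 + 0.95/100) ≈ 16.8, and as n grows the speedup can never exceed 1/s = 20, no matter how many processors are used.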
33
Amdahl's Law (4)
  • The parallel efficiency is

Alternate version of Amdahl's Law:  ηn = Sn / n = 1 / (n s + 1 - s)
34
Amdahl's Law (5)
35
Amdahl's Law (6)
  • Should we despair?
  • No!
  • bigger machines solve bigger problems
  • smaller value of s
  • if you want to run on a large number of
    processors, try to minimize s

36
Ways to Fool the Masses
  • full title: Twelve Ways to Fool the Masses When
    Giving Performance Results on Parallel Computers
  • created by David Bailey of NASA Ames in 1991
  • the following is a selection of the ways, some
    paraphrased

37
Ways to Fool (2)
  • Scale problem size with number of processors
  • Project results linearly
  • e.g., 2 proc., 1 hr. -> 1,800 proc., 1 sec.
  • Present performance of kernel, represent as
    performance of application

38
Ways to Fool (3)
  • Compare with old code on obsolete system
  • Quote MFLOPS based on parallel implementation,
    not best serial implementation
  • i.e., increase the number of operations rather
    than decreasing the time

39
Ways to Fool (4)
  • Quote parallel speedup making sure
    single-processor version is slow
  • Mutilate the algorithm used in the parallel
    implementation to match the architecture
  • explicit vs. implicit PDE solvers
  • Measure parallel times on dedicated system,
    serial times in busy environment

40
Ways to Fool (5)
  • If all else fails, show pretty pictures and
    animated videos, and don't talk about performance.

41
Summary
  • Clusters are viable platforms for relatively
    low-cost scientific computing
  • parallel considerations similar to other
    platforms
  • MPI is a free, effective message passing API
  • careful with performance timings
