Title: Parallel Programming on the SGI Origin2000
1 Parallel Programming on the SGI Origin2000
Taub Computer Center Technion
Anne Weill-Zrahia
With thanks to Moshe Goldberg (TCC) and Igor Zacharov (SGI)
Mar 2005
2 Parallel Programming on the SGI Origin2000
- Parallelization Concepts
- SGI Computer Design
- Efficient Scalar Design
- Parallel Programming - OpenMP
- Parallel Programming - MPI
3 Academic Press, 2001
ISBN 1-55860-671-8
4 (No Transcript)
5 Introduction to Parallel Computing
- Parallel computer: a set of processors that work cooperatively to solve a computational problem
- Distributed computing: a number of processors communicating over a network
- Metacomputing: use of several parallel computers
6 Parallel classification
- Parallel architectures
  - Shared Memory
  - Distributed Memory
- Programming paradigms
  - Data parallel
  - Message passing
7 Why parallel computing?
- Single-processor performance is limited by physics
- Multiple processors break the problem down into simple tasks or domains
- Plus: obtain the same results as the sequential program, faster
- Minus: need to rewrite code
8 Three HPC Architectures
- Shared memory
- Cluster
- Vector processor
9 Shared Memory
- Each processor can access any part of the memory
- Access times are uniform (in principle)
- Easier to program (no explicit message passing)
- Bottleneck when several tasks access the same location (see the sketch below)
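A minimal OpenMP sketch in C of that bottleneck (hypothetical code, not from the slides): every thread updates the same shared location, so the updates are serialized and the speedup collapses.

#include <stdio.h>

int main(void)
{
    long sum = 0;

    #pragma omp parallel for
    for (long i = 0; i < 10000000; i++) {
        /* All threads contend for the same memory location; the
           critical section serializes them and limits speedup. */
        #pragma omp critical
        sum += i;
    }

    /* A reduction avoids the shared hot spot:
       #pragma omp parallel for reduction(+:sum) */
    printf("sum = %ld\n", sum);
    return 0;
}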
10 Symmetric Multiple Processors
[Diagram: four CPUs connected through a memory bus to one shared memory]
Examples: SGI Power Challenge, Cray J90/T90
11 Data-parallel programming
- Single program defining operations
- Single memory
- Loosely synchronous (completion of loop)
- Parallel operations on array elements (see the sketch below)
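A minimal data-parallel sketch in C with OpenMP (the array names and sizes are made up for illustration): one program, one memory, parallel operations on array elements, with the loose synchronization provided by the implicit barrier at the end of the loop.

#include <stdio.h>

#define N 1000000

static double a[N], b[N];

int main(void)
{
    /* Iterations are divided among threads; each updates an
       independent array element, so no locking is needed. The
       implicit barrier at the loop's end is the loose sync point. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[0] = %f\n", a[0]);
    return 0;
}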
12 Distributed Parallel Computing
[Diagram: four CPUs, each with its own local memory]
Examples: SP2, Beowulf clusters
13 Message Passing Programming
- Separate program on each processor
- Local memory
- Control over distribution and transfer of data
- Additional debugging complexity due to communications (see the MPI sketch below)
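A minimal message-passing sketch in C with MPI (illustrative only; the tag and payload are arbitrary): every rank runs the same executable and keeps its data in local memory, and rank 0 explicitly transfers a value to rank 1.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* data lives in rank 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Rank 1 has no access to rank 0's memory; the value
           arrives only through this explicit receive. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}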
14 Distributed Memory
- A processor can only access its local memory
- Access times depend on location
- Processors must communicate via explicit message passing
15 Message Passing or Shared Memory?

Message Passing:
- Takes longer to implement
- More details to worry about
- Increases source lines
- Complex to debug and time
- Increase in total memory used
- Scalability limited by communications overhead and process synchronization
- Parallelism is visible

Shared Memory:
- Easier to implement
- System handles many details
- Little increase in source
- Easier to debug and time
- Efficient memory use
- Scalability limited by serial portion of code and process synchronization
- Compiler-based parallelism
16 Performance issues
- Concurrency: ability to perform actions simultaneously
- Scalability: performance is not impaired by increasing the number of processors (quantified by Amdahl's law below)
- Locality: high ratio of local memory accesses to remote memory accesses (or low communication)
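The scalability limits mentioned here and on slide 15 (serial portion of code, synchronization overhead) are commonly quantified by Amdahl's law; the formula below is standard background rather than material from the slides. If a fraction $f$ of the run time is inherently serial, the speedup on $P$ processors is

S(P) = \frac{1}{f + (1 - f)/P} \le \frac{1}{f}

For example, with f = 0.05 (a 5% serial portion), the speedup can never exceed 20, however many processors are used.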
17 Objectives of HPC in the Technion
- Maintain a leading position in science/engineering
- Production: sophisticated calculations
  - Require high speed
  - Require large memory
- Teach techniques of parallel computing
  - In research projects
  - As part of courses
18 HPC in the Technion
SGI Origin2000:
- 22 CPUs (R10000, 250 MHz), total memory 9 GB
- 32 CPUs (R12000, 300 MHz), total memory 9 GB
PC cluster (Linux Red Hat 9.0):
- 6 CPUs (Pentium II, 866 MHz), memory 500 MB/CPU
PC cluster (Linux Red Hat 9.0):
- 16 CPUs (Pentium III, 800 MHz), memory 500 MB/CPU
19 Origin2000 (SGI), 128 processors
20 Origin2000 (SGI), 22 processors
21 PC clusters (Intel)
- 6 processors
- 16 processors
22 (No Transcript)
23 Data Grids for High Energy Physics
Image courtesy of Harvey Newman, Caltech
24 GRIDS: Globus Toolkit
- Grid Security Infrastructure (GSI)
- Globus Resource Allocation Manager (GRAM)
- Monitoring and Discovery Service (MDS)
- Global Access to Secondary Storage (GASS)
25 November 2004
26 A Recent Example: Matrix Multiply
27 Profile -- original code
28 Profile -- optimized code (a sketch of a typical optimization follows)
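The transcript preserves only the profiles, not the code, so the following C sketch is an assumption about the kind of scalar optimization such an example typically demonstrates: interchanging the loop order so that the innermost loop walks through memory with stride 1, improving cache reuse.

#include <stdio.h>

#define N 512

static double a[N][N], b[N][N], c[N][N];

/* Original (i,j,k) order: the innermost loop strides down a
   column of b, touching a new cache line on every iteration. */
void matmul_ijk(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Interchanged (i,k,j) order: both c[i][j] and b[k][j] are now
   accessed with stride 1 in the innermost loop. */
void matmul_ikj(void)
{
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];
}

int main(void)
{
    matmul_ikj();
    printf("c[0][0] = %f\n", c[0][0]);
    return 0;
}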
29-35 (No Transcript)