Parallel computing on Nanco: an introductory course

1
Parallel computing on Nanco: an introductory course

Anne Weill Zrahia
Technion, Computer Center
July 2007

2
Parallel Programming on the Nanco

1) Parallelization Concepts
2) Nanco Computer Design
3) Orientation on Nanco
4) Parallel Programming - MPI
5) Queuing system - SGE

3
1) Parallelization Concepts

4
Parallel Power for HPC

A closely coupled, scalable set of interconnected
computer systems, sharing common hardware and
software infrastructure, providing a parallel set
of resources to applications for improved
performance.

5
Resources needed for applications arising from
Nanotechnology

Large memory: Tbytes
High floating-point computing speed: Tflops
High data throughput: state of the art

6
Parallel classification

Parallel architectures: Shared Memory / Distributed Memory
Programming paradigms: Data parallel / Message passing

7
Shared Memory

Each processor can access any part of the memory
Access times are uniform (in principle)
Easier to program (no explicit message passing)
Bottleneck when several tasks access the same
location

8
SMP architecture
[Diagram: four processors (P) and one shared Memory block]

9
Distributed Memory

Each processor can access only its local memory
Access times depend on location
Processors must communicate via explicit message
passing

10
Distributed Memory
[Diagram: processor/memory pairs linked by an interconnection network]

11
Message Passing Programming

Separate program on each processor
Local Memory
Control over distribution and transfer of data
Additional complexity of debugging due to
communications

12
Why not a cluster?

Single SMP system easier to purchase/maintain
Ease of programming in SMP systems

13
Why a cluster?

Scalability
Total available physical RAM
Reduced cost
But

14
Performance issues

Concurrency: ability to perform actions
simultaneously
Scalability: performance is not impaired by
increasing the number of processors
Locality: high ratio of local memory accesses to
remote memory accesses (or low communication)

15
SP2 Benchmark

Goal: check the performance of real-world
applications on the SP2
Execution time (seconds): CPU time for
applications
Speedup = (execution time for 1 processor) /
          (execution time for p processors)

17
2) Nanco design
18
Nanco architecture
19
Configuration
[Diagram: compute nodes node1, node2, ... node64, each with processors (P) and memory (M), connected through an Infiniband switch]

20
Configuration

64 dual-processor, dual-core compute nodes
(dual-core Opteron Rev. F)
8 GB RAM per node
2 master nodes for high availability (H/A), also Opterons
Infiniband interconnect: switch and HCAs
NetApp storage

22
AMD Opteron processor
23
Memory bottleneck
24
AMD Hypertransport
26
How does this reflect on performance?

27
Performance

Access to local memory: 1 hop
Access to the 2nd processor's memory: 2 hops
Prefetch can be useful for predictable patterns
Multithreading can be used at node level

28
Infiniband interconnect
29
3) Orientation on nanco
30
Getting started

Security
Logging in
Shell environment
Transferring files

31
System access-security

Secure access
X-tunnelling (for graphics)
Can use ssh -X for tunnelling

32
Working on nanco

For high availability, we have 2 master nodes
(masternode1 and masternode2) as points of entry
to the cluster.
Login: ssh nanco.technion.ac.il
You will be redirected to one of the masters.

33
Login Environment

Paths and environment variables have been set up
(change things with care)
TCSH is the default shell (you can switch to bash
if you like)
User-modifiable environment variables are in
.cshrc in the home directory
Home directory is /u/courseXX

34
Compilers

Options are:
gcc, gcc4, suncc for C
g++, sunCC for C++
g77 (no F90), gfortran, sunf90 for
Fortran 77/Fortran 90

35
Useful commands

ssh-key - a script to allow ssh to all nodes
top - to see your processes. Attention: you must
log in to the actual machine to see your process
ps -u <username> - to see processes

36
Useful commands(cont.)

parps - a script to see running processes on a
set of nodes. Usage:
parps n1 n2 - from node n1 to node n2
parshow - a script to see where a particular
executable is running

37
Flags for compilation

sunf90 -fast -xO5 -xarch=amd64a myprog.f -o myprog
gcc -O3 -march=opteron myprog.c -o myprog

38
Compilation with MPI

Most MPI implementations support C, C++,
Fortran 77 and Fortran 90 bindings.
Compilation scripts are provided: mpif77, mpif90,
mpicc etc.
You can specify generic compiler options

39
4) Parallel programming with MPI
40
WHAT is MPI?

A message-passing library specification
Extended message-passing model
Not specific to implementation or computer

41
BASICS of MPI PROGRAMMING

MPI is a message-passing library
Assumes a distributed memory architecture
Includes routines for performing communication
(exchange of data and synchronization) among the
processors.

42
Message Passing

Message passing: data transfer plus synchronization
Synchronization: the act of bringing one or more
processes to known points in their execution
Distributed memory: memory split up into
segments, each of which may be accessed by only
one process.

43
Message Passing
[Diagram: handshake between two processes: "May I send?", "Yes", then the data is sent]

44
MPI STANDARD

Standard by consensus, designed in an open forum
Introduced by the MPI Forum in May 1994, updated
in June 1995.
MPI-2 (1997) provides extensions to the MPI
standard

45
Why use MPI ?

Standardization
Portability
Performance
Richness
Designed to enable libraries

46
Writing an MPI Program

If there is a serial version, make sure it is
debugged
If not, try to write a serial version first
When debugging in parallel, start with a few
nodes.

47
Format of MPI routines
48
Six useful MPI functions
49
Communication routines
50
End MPI part of program
51
The simplest MPI program
52
Exercise 1: running a simple MPI program

53
Exercise 2: modifying and using send/receive

54
MPI Messages

DATA: data to be sent
ENVELOPE: information to route the data.

55
Description of MPI_Send (MPI_Recv)
56
Description of MPI_Send (MPI_Recv)
57

program hello
  include 'mpif.h'
  integer rank, size, ierror, tag, i
  integer status(MPI_STATUS_SIZE)
  character*12 message

  call MPI_INIT(ierror)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  tag = 100
  if (rank .eq. 0) then
     ! rank 0 sends the message to every other rank
     message = 'Hello, world'
     do i = 1, size-1
        call MPI_SEND(message, 12, MPI_CHARACTER, i, &
                      tag, MPI_COMM_WORLD, ierror)
     enddo
  else
     ! all other ranks receive it from rank 0
     call MPI_RECV(message, 12, MPI_CHARACTER, 0, &
                   tag, MPI_COMM_WORLD, status, ierror)
  endif
  print *, 'node', rank, ':', message
  call MPI_FINALIZE(ierror)
end

58
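#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int tag = 100;
    int rank, size, i;
    MPI_Status status;
    char message[12];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    strcpy(message, "Hello,world");
    if (rank == 0) {
        /* rank 0 sends the message to every other rank */
        for (i = 1; i < size; i++)
            MPI_Send(message, 12, MPI_CHAR, i, tag, MPI_COMM_WORLD);
    } else {
        /* all other ranks receive it from rank 0 */
        MPI_Recv(message, 12, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
    }
    printf("node %d %s\n", rank, message);
    MPI_Finalize();
    return 0;
}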
59
Hellosend
60
Some useful remarks

Source: MPI_ANY_SOURCE means that any source is
acceptable
Tags: the tags specified by sender and receiver
must match, or the receiver can use MPI_ANY_TAG
(any tag is acceptable)
Communicator: must be the same for send and
receive. Usually MPI_COMM_WORLD

61
Computing pi using MPI
62
Computing pi using MPI(2)
63
Computing pi using MPI(3)
64
Computing pi using MPI(4)
65
Broadcast

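Sends data from one node (the root) to all other
nodes in the communicator.
MPI_Bcast(buffer, count, datatype, root, comm, ierr)
(ierr is the Fortran error argument; the C binding
returns the error code instead)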

66
Broadcast
[Diagram: data item A0, initially on P0, is copied to P0, P1, P2 and P3]

67
Performance evaluation

Fortran:
  real*8 t1
  t1 = MPI_WTIME()   ! returns elapsed (wall-clock) time
C:
  double t1;
  t1 = MPI_Wtime();

68
MPI References

The MPI Standard:
www-unix.mcs.anl.gov/mpi/index.html
Parallel Programming with MPI, Peter S. Pacheco,
Morgan Kaufmann, 1997.
Using MPI, W. Gropp, Ewing Lusk, Anthony Skjellum,
The MIT Press, 1999.

69
5) Queuing system: Sun Grid Engine
70
Sun Grid Engine

Open-source batch queuing system similar to PBS
or LSF
Automatically runs jobs on less loaded nodes
Queues jobs for later execution to avoid
overloading the system

71
Queues definition

System job execution policy
Resource allocation
Resource limits
Accounting

72
SGE properties

Can schedule serial or MPI jobs
- serial jobs run in individual host queues
- parallel jobs must include a parallel
environment request

73
Working with SGE jobs

There are commands for querying or modifying the
status of a job running or queued by SGE
- qsub - submit a job
- qstat - query the status of a job
- qdel - delete a job from SGE

74
Submitting a serial job