Basics of Parallel Programming and MPI Introduction

Slides: 53
Provided by: ajtir

1
Lecture 3
  • Basics of Parallel Programming and MPI Introduction

2
Parallel Programming
  • The activity of constructing a parallel program
    from a given algorithm.
  • Approaches
  • Library subroutines: P4, PVM, MPI
  • New language constructs to handle parallelism:
    Fortran 90
  • Compiler directives (formatted comments) added
    to the language to mark the parallel blocks: HPF

3
What is a process?
  • A process is an entity with four components:
    (P, C, D, S).
  • P is a program or a piece of code that a process
    is associated with.
  • C is a control state, defined by a set of control
    variables that indicate where to execute next.
  • D is a data state, defined by a set of data
    variables that hold user data
  • S is the process status: ready, suspended,
    running, etc.
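The four-component definition maps naturally onto a C structure. The sketch below is illustrative only: the type names, the step-function model of P, and the `count_program` example are hypothetical, not part of the lecture.

```c
/* S: possible process states (a hypothetical subset). */
typedef enum { READY, RUNNING, SUSPENDED, TERMINATED } proc_status_t;

struct process;
/* P: the code the process runs, modelled here as a step function. */
typedef void (*program_t)(struct process *self);

typedef struct process {
    program_t program;     /* P: program */
    int pc;                /* C: control state (index of the next step) */
    int data[2];           /* D: data state (user variables) */
    proc_status_t status;  /* S: process status */
} process_t;

/* Example program: increments data[0] until it reaches data[1]. */
void count_program(process_t *self) {
    if (self->data[0] < self->data[1]) {
        self->data[0]++;
        self->pc++;                  /* advance the control state */
    } else {
        self->status = TERMINATED;   /* nothing left to do */
    }
}

/* Give the process one step on the processor if it is runnable;
   suspended or terminated processes are skipped. */
void run_one_step(process_t *p) {
    if (p->status == READY || p->status == RUNNING) {
        p->status = RUNNING;
        p->program(p);
    }
}
```

A suspended process keeps its C and D components untouched until its status changes back to ready.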

4
Process Status
5
Execution Mode
  • A computer system has two execution modes:
    system mode and user mode.
  • Kernel processes (the important programs in the
    OS) always execute in system mode.
  • Processes created by user programs run in user
    mode unless they request services from the
    kernel (I/O functions, exception handling).
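A minimal POSIX sketch of a user-mode process requesting a kernel service: `strlen` runs entirely in user mode, while `write()` is a system call, so the processor switches to system mode to perform the I/O and then returns to user mode. The function name `print_via_kernel` is hypothetical.

```c
#include <string.h>
#include <unistd.h>

/* Returns the number of bytes written, or -1 on error. */
long print_via_kernel(const char *msg) {
    size_t len = strlen(msg);                      /* user-mode computation */
    return (long)write(STDOUT_FILENO, msg, len);   /* traps into the kernel */
}
```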

6
Address Space
A new address space is created when a new process
is created. It has four parts:
  • 1. Holds the code of the process (code segment);
    fixed size and not writable.
  • 2. Holds the states and dynamic data of the
    process (data/heap); can grow and shrink.
  • 3. Holds the activation records of the procedure
    calls made by the process (stack); can grow and
    shrink.
  • 4. Holds the process status and context.

7
Context switching
  • The context of a process is the part of its state
    that is stored in the processor registers.
  • When a context switch occurs, the current
    process's context is saved and the next process's
    context is loaded.
  • Events such as a keyboard interrupt, or a change
    of the running process's status to suspended,
    cause a context switch.
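The save-then-load step can be simulated in a few lines. This is a sketch under simplifying assumptions: the "processor registers" are a hypothetical two-field struct, and the descriptor holds only an id and a saved context; a real kernel does this in architecture-specific code.

```c
/* Simulated register file: the part of the process state the slide
   calls its context. */
typedef struct {
    int pc;    /* program counter */
    int acc;   /* one general-purpose register */
} context_t;

/* Simplified process descriptor holding a saved context. */
typedef struct {
    int id;
    context_t saved;
} descriptor_t;

/* Save the running process's registers into its descriptor, then load
   the next process's saved registers into the CPU. */
void context_switch(context_t *cpu, descriptor_t *from, descriptor_t *to) {
    from->saved = *cpu;   /* save outgoing context */
    *cpu = to->saved;     /* load incoming context */
}
```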

8
Process Control
  • Performed by the kernel using information in
    process descriptors.
  • The kernel checks that a process accesses only
    the resources (processor, memory, I/O) it is
    supposed to.
  • The kernel also acts as a scheduler, assigning
    resources to processes.

9
Resource Sharing
  • Resource sharing, handled by the scheduler, can
    take several forms:
  • Dedicated mode: resources are not shared; a
    process occupies them from start to finish
  • Batch mode: a user process, once scheduled to
    run, runs to completion
  • Time-sharing mode: multiple processes take turns
    on the processor in an interleaved fashion, so
    they appear to run simultaneously
  • Preemption mode: a higher-priority process can
    take the processor away from a lower-priority
    process
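Time-sharing can be illustrated with a round-robin simulation, one common way to interleave processes (the lecture does not name a specific policy, so round robin is an assumption here). Each process runs for at most `quantum` time units before the next one gets the processor; `round_robin` and its parameters are hypothetical names.

```c
/* Simulate round-robin time-sharing. remaining[i] is the work left for
   process i (every entry must be positive, or the loop never ends).
   finish_order receives process indices in completion order.
   Returns the total elapsed time. */
int round_robin(int remaining[], int n, int quantum, int finish_order[]) {
    int time = 0, done = 0;
    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;           /* already finished */
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            remaining[i] -= slice;                     /* run one time slice */
            time += slice;
            if (remaining[i] == 0) finish_order[done++] = i;
        }
    }
    return time;
}
```

With work `{5, 2, 3}` and a quantum of 2, process 1 finishes first, then process 2, then process 0, after 10 time units in total.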

10
Threads
  • Creating a new process can waste hundreds of
    thousands of clock cycles, because a new address
    space has to be created.
  • In some cases, creating a heavyweight process is
    not suitable (e.g., when the process holds a large
    data set).
  • To reduce the overhead, a lightweight process, or
    thread, is used.
  • Multiple threads can exist within a process,
    sharing its address space.
  • Creating a thread is much faster because less
    memory allocation is required.
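A short POSIX threads sketch (pthreads is an assumption; the lecture does not name a thread library) showing that threads share their process's address space: both threads write directly into the same global array, and the creating thread reads the results without any copying. The names `shared`, `worker`, and `run_threads` are hypothetical.

```c
#include <pthread.h>

/* One array in the process's address space, visible to every thread. */
static int shared[2];

/* Each thread writes its own slot directly into the shared memory. */
static void *worker(void *arg) {
    int idx = *(int *)arg;
    shared[idx] = (idx + 1) * 100;
    return NULL;
}

/* Create two threads in the same process, wait for both, and read
   their results straight from memory. Error checking is omitted. */
int run_threads(void) {
    pthread_t t[2];
    int ids[2] = { 0, 1 };
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared[0] + shared[1];
}
```

Compile with `-pthread`.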

11
Single vs. Multiple Programs
  • Single-program: users write one program that
    will be run on all nodes
  • pid = my-process-id()
  • if (pid == 1) A()
  • else if (pid == 2) B()
  • else if (pid == 3) C()
  • Multiple-program: users write 3 executable
    programs A, B and C, which are loaded onto three
    nodes by a shell script
  • run A on node 1
  • run B on node 2
  • run C on node 3
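The single-program pseudocode translates directly into C. In this sketch the process id is passed in as a parameter (an assumption for testability; under MPI it would come from `MPI_Comm_rank`), and `A`, `B`, `C` are hypothetical stand-ins that just report which role ran.

```c
/* Hypothetical work routines for the three nodes. */
char A(void) { return 'A'; }
char B(void) { return 'B'; }
char C(void) { return 'C'; }

/* The one program every node runs; its behaviour depends only on pid. */
char spmd_main(int pid) {
    if (pid == 1) return A();
    else if (pid == 2) return B();
    else if (pid == 3) return C();
    return '?';   /* no role assigned to other ids */
}
```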

12
Static vs. Dynamic Parallelism
  • Static: the number of processes in the program
    can be determined at compile time. This type of
    parallelism has less run-time overhead and
    yields more efficient binary code.
  • Dynamic: processes can be created at run time,
    and the number of processes to be created is
    unknown at the beginning. This type of
    parallelism is more flexible.

13
Fork/Join
  • Process A
  • begin
  • Fork(B)
  • x = foo(z)
  • Join(B)
  • Output(x + y)
  • end
  • Process B
  • begin
  • y = boo(z)
  • end
  • A is the parent process and B is the child
    process.
  • When Fork(B) is executed, A and B execute in
    parallel.
  • Join(B) forces A to wait until B terminates
    before producing the output.
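The Fork/Join pseudocode can be sketched with POSIX threads, which are swapped in here for the generic primitives: `pthread_create` plays the role of Fork(B) and `pthread_join` the role of Join(B). Threads are used rather than `fork()` because B's result y must be visible to A, which requires a shared address space. `foo`, `boo`, `process_a`, and `process_b` are hypothetical.

```c
#include <pthread.h>

/* Hypothetical stand-ins for foo() and boo() on the slide. */
static int foo(int z) { return z + 1; }
static int boo(int z) { return z * 2; }

struct b_args { int z; int y; };

/* Process B: y = boo(z). */
static void *process_b(void *arg) {
    struct b_args *a = arg;
    a->y = boo(a->z);
    return NULL;
}

/* Process A: Fork(B), compute x while B runs, Join(B), output x + y. */
int process_a(int z) {
    pthread_t b;
    struct b_args args = { z, 0 };
    pthread_create(&b, NULL, process_b, &args);  /* Fork(B) */
    int x = foo(z);                              /* runs in parallel with B */
    pthread_join(b, NULL);                       /* Join(B): wait for B */
    return x + args.y;                           /* Output(x + y) */
}
```

Reading `args.y` is safe only after the join; before it, B may not have written y yet.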

14
Process Interaction
  • Communication: passes data among two or more
    processes (via shared variables or message
    passing)
  • Synchronization: causes processes to wait for one
    another.
  • Aggregation: merges partial results computed by
    the component processes of a parallel program.

15
Synchronous vs. Asynchronous Interaction
  • Suppose n processes execute a parallel program
    that contains an interaction code C.
  • If code C cannot be executed until all n
    processes have reached C, the interaction is
    called synchronous.
  • If a process that reaches C can execute C
    without having to wait for the others, the
    interaction is asynchronous.
  • Example: with synchronous send/receive, a process
    does not return from the send/receive function
    until the message has been both sent and received.

16
Blocking vs. Nonblocking Interaction
  • Blocking: the process has to wait until a certain
    event happens.
  • Example: blocking send: a process can move on only
    after the message has been sent out (it has not
    necessarily been received).
  • Non-blocking: no waiting.
  • Example: non-blocking send: a process can move to
    the next operation as soon as it has requested the
    send (the message has not necessarily been sent out).
  • This implies that, when a non-blocking send
    returns, the send buffer cannot yet be safely
    overwritten.

17
Problems in Parallel Programming
  • Lost Update Problem
  • Temporary Update Problem
  • Incorrect Summary Problem

18
Lost Update
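  • Process 1          Process 2
    Read(A)
                       Read(A)
    A = A - m          Read(B)
    Write(A)           A = A + B
                       Write(A)
  • Process 2 computes A + B from the stale value of A
    it read, so its Write(A) overwrites Process 1's
    update, which is lost.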
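The interleaving above can be replayed deterministically in C, with each process's private copy of A held in a local variable. The function name and parameters are hypothetical; the point is that the result differs from either serial order.

```c
/* Replays the Lost Update schedule step by step.
   p1 and p2 are each process's private copy of A. */
int lost_update_replay(int a, int m, int b) {
    int p1 = a;       /* Process 1: Read(A) */
    int p2 = a;       /* Process 2: Read(A) (same old value) */
    p1 = p1 - m;      /* Process 1: A = A - m */
    a = p1;           /* Process 1: Write(A) */
    p2 = p2 + b;      /* Process 2: A = A + B, using the stale A */
    a = p2;           /* Process 2: Write(A) overwrites Process 1's write */
    return a;
}
```

With A = 100, m = 10 and B = 5, either serial order gives 95, but this schedule yields 105: the subtraction is lost.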

19
Temporary Update
  • A process, P1, updates data and then fails for
    some reason; another process, P2, accesses the
    updated data before the value is restored to its
    original state.

20
Incorrect Summary
  • A process is aggregating values over a set of
    data while another process updates some of these
    data. Inconsistency can occur.
  • Process 1                       Process 2
    Data A, B, C in x
    count = number of items in x
                                    Write(new item D in x)
    Read(x)
    sum = Sum(x)   // A + B + C + D
    ave = sum / count
  • count was taken before D was added (3 items), but
    sum includes D, so the average is incorrect.

21
To solve the problems
  • Divide a parallel program into a set of
    transactions.
  • Each transaction must have the following
    properties
  • Consistency
  • Atomicity
  • Durability
  • Isolation
  • Serializability

22
Concurrency Control
  • Locking mechanism
  • Timestamp
  • Optimistic concurrency control (OCC)

23
Locking
  • Data to be accessed is locked so that no other
    process can gain access to that data.
  • When the process that holds the lock has finished
    using the data, the data is unlocked.
  • A lock can be defined by three variables:
    Lock(L, C, I).
  • L is a Boolean variable indicating the
    locked/unlocked state.
  • C is a condition on which a process may wait.
  • I identifies the process that holds the lock.
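One way to realize Lock(L, C, I) is with POSIX threads, mapping L to a Boolean, C to a condition variable, and I to the holder's `pthread_t`; a mutex (not on the slide) protects these fields. This mapping and all names are assumptions for illustration. The usage part fixes the lost-update problem by making each read-modify-write of the counter happen under the lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Lock(L, C, I) built on POSIX threads (an assumed mapping). */
typedef struct {
    bool locked;            /* L: locked (true) or unlocked (false) */
    pthread_cond_t cond;    /* C: condition waiting threads block on */
    pthread_t holder;       /* I: identifier of the current holder */
    pthread_mutex_t guard;  /* protects locked and holder */
} lci_lock_t;

void lci_init(lci_lock_t *lk) {
    lk->locked = false;
    pthread_cond_init(&lk->cond, NULL);
    pthread_mutex_init(&lk->guard, NULL);
}

void lci_acquire(lci_lock_t *lk) {
    pthread_mutex_lock(&lk->guard);
    while (lk->locked)                          /* wait on C while locked */
        pthread_cond_wait(&lk->cond, &lk->guard);
    lk->locked = true;
    lk->holder = pthread_self();                /* record I */
    pthread_mutex_unlock(&lk->guard);
}

void lci_release(lci_lock_t *lk) {
    pthread_mutex_lock(&lk->guard);
    lk->locked = false;
    pthread_cond_signal(&lk->cond);             /* wake one waiter, if any */
    pthread_mutex_unlock(&lk->guard);
}

/* Usage: two threads update a shared counter under the lock. */
static lci_lock_t counter_lock;
static long counter;
static int iters_per_thread;

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < iters_per_thread; i++) {
        lci_acquire(&counter_lock);
        counter = counter + 1;        /* Read, modify, Write: no lost update */
        lci_release(&counter_lock);
    }
    return NULL;
}

long run_locked_counter(int iters) {
    pthread_t t1, t2;
    lci_init(&counter_lock);
    counter = 0;
    iters_per_thread = iters;
    pthread_create(&t1, NULL, bump, NULL);
    pthread_create(&t2, NULL, bump, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

In practice `pthread_mutex_t` alone provides this behaviour; the explicit fields are kept only to mirror the slide's three variables.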

24
Timestamp
  • Each access to shared data is stamped with a time.
  • When a process attempts to access data, its
    timestamp is compared with the timestamps left on
    this data by other processes.
  • If the timestamp of the requesting process is less
    than (older than) the timestamps of the other
    processes, the access is denied.
  • A write request is granted only if the data was
    last read and last written by older processes.
  • A read request is granted only if the data was
    last written by an older process.
  • This scheme cannot cause deadlock, because no
    process ever waits: a conflicting request is
    denied rather than delayed.
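The two grant rules can be written as a pair of check functions over a shared item that records the timestamps of its last reader and writer. This sketch assumes larger numbers mean younger processes and lets a process re-access data it touched itself (equal timestamps are allowed); a denied process would typically be restarted with a new timestamp. All names are hypothetical.

```c
#include <stdbool.h>

/* A shared data item with the timestamps of its last reader and writer. */
typedef struct {
    long read_ts;   /* largest timestamp of a process that read the item */
    long write_ts;  /* timestamp of the process that last wrote the item */
    int value;
} item_t;

/* Read rule: granted only if the item was last written by an older
   (or the same) process. */
bool ts_read(item_t *x, long ts, int *out) {
    if (ts < x->write_ts) return false;   /* written by a younger process */
    *out = x->value;
    if (ts > x->read_ts) x->read_ts = ts;
    return true;
}

/* Write rule: granted only if the item was last read and last written
   by older (or the same) processes. */
bool ts_write(item_t *x, long ts, int v) {
    if (ts < x->read_ts || ts < x->write_ts) return false;
    x->value = v;
    x->write_ts = ts;
    return true;
}
```

Neither function ever blocks: a request is either granted immediately or refused, which is why the scheme is deadlock-free.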

25
OCC
  • All processes can access data concurrently, but
    before any modification is committed, a check is
    made to see whether any conflicting concurrent
    access has taken place. If so, the modification is
    rejected and the data remains in its original state.
  • OCC is based on the assumption that the
    likelihood of two processes accessing the same
    data is low.
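A small OCC sketch using C11 atomics (an implementation choice not taken from the lecture): a process reads the shared value, computes a new one on a private copy, and commits with `atomic_compare_exchange_strong`, which succeeds only if the value is still what was read. A rejected commit leaves the data unchanged and the update is retried. Function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Validate-and-write phase: commit `proposed` only if the shared value
   still equals `seen`; otherwise reject and leave the data unchanged. */
bool occ_commit(atomic_int *shared, int seen, int proposed) {
    return atomic_compare_exchange_strong(shared, &seen, proposed);
}

/* Full optimistic update: read, compute privately, try to commit, and
   retry from a fresh read if a concurrent update was detected. */
void occ_add(atomic_int *shared, int delta) {
    for (;;) {
        int seen = atomic_load(shared);   /* read phase */
        int proposed = seen + delta;      /* compute on a private copy */
        if (occ_commit(shared, seen, proposed))
            return;                       /* committed */
        /* rejected: another process committed first; try again */
    }
}
```

Because the check compares values, a change from A to B and back to A goes undetected; attaching a version counter to the data avoids this. The retry loop is cheap only when conflicts are rare, which is exactly OCC's assumption.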

26
Introduction to MPI
27
Message-passing Model
28
Processes
  • Number is specified at start-up time
  • Remains constant throughout execution of program
  • All execute same program
  • Each has unique ID number
  • Alternately performs computations and communicates

29
Advantages of Message-passing Model
  • Gives the programmer the ability to manage the
    memory hierarchy
  • Portability to many architectures
  • Provides a platform- and language-independent
    standard for message passing.
  • Simplifies debugging

30
The Message Passing Interface
  • Late 1980s: vendors had unique libraries
  • 1989: Parallel Virtual Machine (PVM) developed at
    Oak Ridge National Lab
  • 1992: work on the MPI standard began
  • 1994: version 1.0 of the MPI standard
  • 1997: version 2.0 of the MPI standard
  • Today: MPI is the dominant message-passing
    library standard

31
Include Files
#include <mpi.h>
  • MPI header file

#include <stdio.h>
  • Standard I/O header file
32
Local Variables
int main (int argc, char *argv[]) {
   int i;
   int id;   /* Process rank */
   int p;    /* Number of processes */
  • Include argc and argv: they are needed to
    initialize MPI
  • One copy of every variable for each process
    running this program

33
Initialize MPI
MPI_Init (&argc, &argv);
  • First MPI function called by each process
  • Not necessarily first executable statement
  • Allows system to do any necessary setup

34
Communicators
  • Communicator: opaque object that provides a
    message-passing environment for processes
  • MPI_COMM_WORLD
  • Default communicator
  • Includes all processes
  • Possible to create new communicators
  • Will do this later

35
Communicator
(Figure: the communicator MPI_COMM_WORLD containing six processes with ranks 0 to 5)
36
Determine Number of Processes
MPI_Comm_size (MPI_COMM_WORLD, &p);
  • First argument is the communicator
  • Number of processes returned through second
    argument

37
Determine Process Rank
MPI_Comm_rank (MPI_COMM_WORLD, &id);
  • First argument is the communicator
  • Process rank (in range 0, 1, ..., p-1) returned
    through second argument

38
Replication of Automatic Variables
39
Shutting Down MPI
MPI_Finalize();
  • Call after all other MPI library calls
  • Allows system to free up MPI resources

40
Point-to-point Communication
  • Involves a pair of processes
  • One process sends a message
  • Other process receives the message

41
Send/Receive Not Collective
42
Function MPI_Send
int MPI_Send (
   void         *message,
   int           count,
   MPI_Datatype  datatype,
   int           dest,
   int           tag,
   MPI_Comm      comm )
43
Function MPI_Recv
int MPI_Recv (
   void         *message,
   int           count,
   MPI_Datatype  datatype,
   int           source,
   int           tag,
   MPI_Comm      comm,
   MPI_Status   *status )
44
Return from MPI_Send
  • Function blocks until message buffer free
  • Message buffer is free when
  • Message copied to system buffer, or
  • Message transmitted
  • Typical scenario
  • Message copied to system buffer
  • Transmission overlaps computation

45
Return from MPI_Recv
  • Function blocks until message in buffer
  • If message never arrives, function never returns

46
Deadlock
  • Deadlock: a process waiting for a condition that
    will never become true
  • Easy to write send/receive code that deadlocks
  • Two processes both receive before sending
  • Send tag doesn't match receive tag
  • Process sends message to wrong destination process

47
Compiling MPI Programs
mpicc -O -o foo foo.c
  • mpicc: script to compile and link C+MPI programs
  • Flags same meaning as C compiler
  • -O: optimize
  • -o <file>: where to put the executable

48
Running MPI Programs
  • mpirun -help
  • mpirun -np <p> <exec> <arg1>
  • -np <p>: number of processes
  • <exec>: executable
  • <arg1>: command-line arguments

49
Specifying Host Processors
  • mpirun -p4pg <pgfile> <exec> <arg1>
  • <pgfile> lists host processors in order of their
    use (put it in your home directory)
  • Example of a pgfile's contents:
  • olab1 0 /home/tiranee/myprog
  • olab2 1 /home/tiranee/myprog
  • olab3 2 /home/tiranee/myprog
  • olab1 1 /home/tiranee/myprog

50
Enabling Remote Logins
  • MPI needs to be able to initiate processes on
    other processors without supplying a password
  • Each processor in group must list all other
    processors in its .rhosts file e.g.,

51
Deciphering Output
  • Output order only partially reflects the order of
    output events inside the parallel computer
  • If process A prints two messages, the first
    message will appear before the second
  • If process A calls printf before process B, there
    is no guarantee that process A's message will
    appear before process B's message
  • Try to put fflush(stdout) after every printf()

52
Lab 1
  • Write a Hello World program in which each process
    in the group sends a "Hello world" message to
    process 0, and process 0 prints out all messages.