Transcript and Presenter's Notes

Title: Parallel Computing Overview


1
Parallel Computing Overview
  • CS 524 High-Performance Computing

2
Parallel Computing
  • Multiple processors that are able to work
    cooperatively to solve a computational problem
  • Examples of parallel computing range from
    specially designed parallel computers and
    algorithms to geographically distributed
    networks of workstations cooperating on a task
  • Some problems cannot be solved by present-day
    serial computers, or would take an impractically
    long time to solve
  • Parallel computing exploits concurrency and
    parallelism inherent in the problem domain
  • Task parallelism
  • Data parallelism

3
Development Trends
  • Advances in IC technology and processor design
  • CPU performance has doubled every 18 months for
    the past 20 years (Moore's Law)
  • Clock rates increased from 4.77 MHz for the 8088
    (1979) to 3.6 GHz for the Pentium 4 (2004)
  • FLOPS increased from a handful (1945) to 35.86
    TFLOPS (Earth Simulator by NEC, 2002 to date)
  • Decrease in cost and size
  • Advances in computer networking
  • Bandwidth increased from a few bits per second
    to > 10 Gb/s
  • Decrease in size and cost, and increase in
    reliability
  • Need
  • Solution of larger and more complex problems

4
Issues in Parallel Computing
  • Parallel architectures
  • Design of bottleneck-free hardware components
  • Parallel programming models
  • Parallel view of problem domain for effective
    partitioning and distribution of work among
    processors
  • Parallel algorithms
  • Efficient algorithms that take advantage of
    parallel architectures
  • Parallel programming environments
  • Programming languages, compilers, portable
    libraries, development tools, etc.

5
Two Key Algorithm Design Issues
  • Load balancing
  • Execution time of parallel programs is the time
    elapsed from start of processing by the first
    processor to end of processing by the last
    processor
  • Partitioning of computational load among
    processors (see the sketch after this list)
  • Communication overhead
  • Processors are much faster than communication
    links
  • Partitioning of data among processors
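
To make the load-balancing point concrete, here is a small added
sketch (not from the slides; the routine name block_range and its
arguments are mine) that splits n loop iterations over p processors
into nearly equal blocks:

  ! Hypothetical helper: iteration range [ilo,ihi] owned by
  ! 0-based rank me when n iterations are split over p processors
  subroutine block_range(n, p, me, ilo, ihi)
    integer, intent(in)  :: n, p, me
    integer, intent(out) :: ilo, ihi
    integer :: q, r
    q = n/p              ! base block size
    r = mod(n, p)        ! the first r ranks get one extra iteration
    ilo = me*q + min(me, r) + 1
    ihi = ilo + q - 1
    if (me .lt. r) ihi = ihi + 1
  end subroutine block_range

With n = 10 and p = 3, ranks 0, 1, 2 get the ranges 1-4, 5-7, and
8-10, so no processor carries more than one extra iteration.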

6
Parallel MVM: Row-Block Partition
  do i = 1, N
    do j = 1, N
      y(i) = y(i) + A(i,j)*x(j)
    end do
  end do

[Figure: A partitioned into row blocks owned by P0-P3; x and y block-distributed among P0-P3]
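
As an added sketch of what this partition implies for communication
(assuming MPI, N divisible by P with NdivP = N/P, block-distributed
x and y, and the usual MPI setup already done), each processor
gathers the full x and then computes its own rows of y:

  real :: Aloc(NdivP,N)         ! my NdivP = N/P rows of A
  real :: xloc(NdivP), x(N)     ! my block of x; full copy of x
  real :: yloc(NdivP)           ! my block of y
  integer :: i, j, ierr

  ! Every processor needs all of x: gather the blocks everywhere
  call MPI_Allgather(xloc, NdivP, MPI_REAL, x, NdivP, MPI_REAL, &
                     MPI_COMM_WORLD, ierr)

  ! Local work: my rows of A times the full x
  do i = 1, NdivP
    yloc(i) = 0.0
    do j = 1, N
      yloc(i) = yloc(i) + Aloc(i,j)*x(j)
    end do
  end do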
7
Parallel MVM: Column-Block Partition
  do j = 1, N
    do i = 1, N
      y(i) = y(i) + A(i,j)*x(j)
    end do
  end do

[Figure: A partitioned into column blocks owned by P0-P3; x and y block-distributed among P0-P3]
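
For contrast, an added sketch of the column-block version under the
same assumptions: each processor forms partial contributions to all
of y from its own columns and its block of x, and the P partial
vectors are then summed and redistributed, e.g. with
MPI_Reduce_scatter:

  real :: Aloc(N,NdivP)         ! my NdivP = N/P columns of A
  real :: xloc(NdivP)           ! my block of x
  real :: ypart(N), yloc(NdivP) ! full partial y; my block of y
  integer :: counts(P), i, j, ierr

  ! Local work first: partial contributions to every entry of y
  ypart = 0.0
  do j = 1, NdivP
    do i = 1, N
      ypart(i) = ypart(i) + Aloc(i,j)*xloc(j)
    end do
  end do

  ! Sum the P partial vectors and leave each processor its block
  counts = NdivP
  call MPI_Reduce_scatter(ypart, yloc, counts, MPI_REAL, MPI_SUM, &
                          MPI_COMM_WORLD, ierr)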
8
Parallel MVM: Block Partition
  • Can we do any better?
  • Assume the same distribution of x and y
  • Can A be partitioned to reduce communication?
  • One standard answer (not spelled out on this
    slide): with a 2D checkerboard partition, each
    processor exchanges only the x and y segments
    belonging to its block row and column, so
    per-processor communication volume drops from
    O(N) to O(N/sqrt(P))

[Figure: 2D block (checkerboard) partition of A among P0-P3; x and y block-distributed among P0-P3]
9
Parallel Architecture Models
  • Bus-based shared-memory or symmetric
    multiprocessor, SMP (e.g. suraj, dual/quad
    processor Xeon machines)
  • Network-based distributed-memory (e.g. Cray T3E,
    our Linux cluster)
  • Network-based distributed-shared-memory with a
    hardware-supported global address space (e.g.
    SGI Origin 2000)
  • Network-based distributed memory built from
    shared-memory nodes (e.g. SMP clusters)

10
Bus-Based Shared-Memory (SMP)
[Diagram: processors connected to a shared memory over a common bus]
  • Any processor can access any memory location at
    equal cost (symmetric multiprocessor)
  • Tasks communicate by writing and reading
    commonly accessible locations
  • Easier to program
  • Cannot scale beyond about 30 processors (bus
    bottleneck)
  • Examples: most workstation vendors make SMPs
    (Sun, IBM, Intel-based SMPs); Cray T90, SV1
    (use a crossbar)

11
Network-Connected Distributed-Memory
[Diagram: processor-memory (P-M) nodes connected by an interconnection network]
  • Each processor can only access its own memory
  • Explicit communication by sending and receiving
    messages
  • More tedious to program
  • Can scale to thousands of processors
  • Examples: Cray T3E, clusters

12
Network-Connected Distributed-Shared-Memory
[Diagram: processor-memory nodes on an interconnection network forming one global address space]
  • Each processor can directly access any memory
    location
  • Physically distributed memory
  • Non-uniform memory access costs
  • Example: SGI Origin 2000

13
Network-Connected Distributed Shared-Memory
[Diagram: two bus-based SMP nodes (processors and memory on a bus) connected by an interconnection network]
  • Network of SMPs
  • Each SMP can only access its own memory
  • Explicit communication between SMPs
  • Can take advantage of both shared-memory and
    distributed-memory programming models
  • Can scale to hundreds of processors
  • Examples: SMP clusters

14
Parallel Programming Models
  • Global-address (or shared-address) space model
  • POSIX threads (PThreads)
  • OpenMP
  • Message passing (or distributed-address) model
  • MPI (message passing interface)
  • PVM (parallel virtual machine)
  • Higher-level programming environments
  • High-Performance Fortran (HPF)
  • PETSc (portable extensible toolkit for scientific
    computation)
  • POOMA (parallel object-oriented methods and
    applications)

15
Other Parallel Programming Models
  • Task and channel
  • Similar to message passing
  • Instead of communicating between named tasks (as
    in the message passing model), tasks communicate
    through named channels
  • SPMD (single program multiple data)
  • Each processor executes the same program code
    that operates on different data
  • Most message passing programs are SPMD
  • Data parallel
  • Operations on chunks of data (e.g. arrays) are
    parallelized
  • Grid
  • The problem domain is viewed as parcels, with
    the processing for each parcel allocated to
    different processors

16
Example
  real a(n,n), b(n,n)
  ! Jacobi-style sweep: average the four neighbours of each
  ! interior point, then copy the result back into b
  do k = 1, NumIter
    do i = 2, n-1
      do j = 2, n-1
        a(i,j) = (b(i-1,j) + b(i,j-1) + &
                  b(i+1,j) + b(i,j+1))/4
      end do
    end do
    do i = 2, n-1
      do j = 2, n-1
        b(i,j) = a(i,j)
      end do
    end do
  end do

17
Shared-Address Space Model: OpenMP
  real a(n,n), b(n,n)
!$omp parallel shared(a,b) private(i,j,k)
  ! k must be private: every thread executes the sequential
  ! iteration loop itself, while the omp do directives split
  ! the i loops among threads
  do k = 1, NumIter
!$omp do
    do i = 2, n-1
      do j = 2, n-1
        a(i,j) = (b(i-1,j) + b(i,j-1) + &
                  b(i+1,j) + b(i,j+1))/4
      end do
    end do
!$omp do
    do i = 2, n-1
      do j = 2, n-1
        b(i,j) = a(i,j)
      end do
    end do
  end do
!$omp end parallel
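
A usage note (my addition, not on the slide): the implicit barrier
at the end of each omp do keeps the update sweep and the copy-back
sweep correctly ordered within an iteration, and with a typical
compiler the code is built with OpenMP enabled, e.g. gfortran
-fopenmp.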

18
Message Passing Pseudo-code
  real aLoc(NdivP,n), bLoc(0:NdivP+1,n)
  me = get_my_procnum()
  do k = 1, NumIter
    ! exchange halo rows with the neighbouring processors
    if (me .ne. P-1) send(me+1, bLoc(NdivP, 1:n))
    if (me .ne. 0)   recv(me-1, bLoc(0, 1:n))
    if (me .ne. 0)   send(me-1, bLoc(1, 1:n))
    if (me .ne. P-1) recv(me+1, bLoc(NdivP+1, 1:n))
    if (me .eq. 0) then ibeg = 2 else ibeg = 1 endif
    if (me .eq. P-1) then iend = NdivP-1 else iend = NdivP endif
    do i = ibeg, iend
      do j = 2, n-1
        aLoc(i,j) = (bLoc(i-1,j) + bLoc(i,j-1) + &
                     bLoc(i+1,j) + bLoc(i,j+1))/4
      end do
    end do
    do i = ibeg, iend
      do j = 2, n-1
        bLoc(i,j) = aLoc(i,j)
      end do
    end do
  end do
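
The send/recv calls above are schematic. An added sketch of the same
halo exchange in actual MPI (assuming use mpi and the declarations
shown; rowtype, up, down, status, and ierr are my names, not from
the slides) could look like:

  integer :: rowtype, up, down, ierr
  integer :: status(MPI_STATUS_SIZE)

  ! A row bLoc(i,1:n) is strided in memory (leading dimension
  ! NdivP+2), so describe it once with a vector datatype
  call MPI_Type_vector(n, 1, NdivP+2, MPI_REAL, rowtype, ierr)
  call MPI_Type_commit(rowtype, ierr)

  ! MPI_PROC_NULL turns the sends/recvs at the ends of the
  ! processor line into no-ops, replacing the if-guards above
  up = me + 1
  down = me - 1
  if (me .eq. P-1) up = MPI_PROC_NULL
  if (me .eq. 0)   down = MPI_PROC_NULL

  ! send my last interior row up; receive lower halo from below
  call MPI_Sendrecv(bLoc(NdivP,1), 1, rowtype, up, 0,   &
                    bLoc(0,1),     1, rowtype, down, 0, &
                    MPI_COMM_WORLD, status, ierr)
  ! send my first interior row down; receive upper halo from above
  call MPI_Sendrecv(bLoc(1,1),       1, rowtype, down, 1, &
                    bLoc(NdivP+1,1), 1, rowtype, up, 1,   &
                    MPI_COMM_WORLD, status, ierr)

Using MPI_Sendrecv avoids the deadlock risk of pairing blocking
sends and receives by hand.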