Title: Parallel Computing Overview
1. Parallel Computing Overview
- CS 524 High-Performance Computing
2. Parallel Computing
- Multiple processors work cooperatively to solve a computational problem
- Examples of parallel computing range from specially designed parallel computers and algorithms to geographically distributed networks of workstations cooperating on a task
- There are problems that cannot be solved by present-day serial computers, or that take an impractically long time to solve
- Parallel computing exploits the concurrency and parallelism inherent in the problem domain (a sketch of the two forms follows this list)
  - Task parallelism
  - Data parallelism
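A minimal OpenMP sketch of the two forms (an illustration, not from the slides; array names and values are made up): the loop exhibits data parallelism, since the same operation is applied to different elements of the arrays, while the two sections exhibit task parallelism, since two independent pieces of work run concurrently.

      ! Illustrative sketch only: data parallelism vs. task parallelism with OpenMP
      program parallelism_kinds
        implicit none
        integer, parameter :: n = 100000
        real :: a(n), b(n), c(n)
        integer :: i
        a = 1.0
        b = 2.0

        ! Data parallelism: the same update applied to different chunks of the arrays
      !$omp parallel do
        do i = 1, n
           c(i) = a(i) + b(i)
        end do
      !$omp end parallel do

        ! Task parallelism: two independent tasks executed concurrently
      !$omp parallel sections
      !$omp section
        print *, 'sum of c =', sum(c)
      !$omp section
        print *, 'max of c =', maxval(c)
      !$omp end parallel sections
      end program parallelism_kinds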
3. Development Trends
- Advances in IC technology and processor design
  - CPU performance has doubled roughly every 18 months for the past 20 years (Moore's Law)
  - Clock rates rose from 4.77 MHz for the 8088 (1979) to 3.6 GHz for the Pentium 4 (2004)
  - FLOPS rose from a handful (1945) to 35.86 TFLOPS (NEC Earth Simulator, 2002 to date)
  - Decrease in cost and size
- Advances in computer networking
  - Bandwidth has risen from a few bits per second to > 10 Gb/s
  - Decrease in size and cost, and increase in reliability
- Need
  - Solution of larger and more complex problems
4. Issues in Parallel Computing
- Parallel architectures
  - Design of bottleneck-free hardware components
- Parallel programming models
  - Parallel view of the problem domain for effective partitioning and distribution of work among processors
- Parallel algorithms
  - Efficient algorithms that take advantage of parallel architectures
- Parallel programming environments
  - Programming languages, compilers, portable libraries, development tools, etc.
5. Two Key Algorithm Design Issues
- Load balancing
  - The execution time of a parallel program is the time elapsed from the start of processing by the first processor to the end of processing by the last processor (a simple timing model is sketched after this list)
  - Partitioning of the computational load among processors
- Communication overhead
  - Processors are much faster than communication links
  - Partitioning of data among processors
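As a rough way to see how both issues enter the execution time (an illustrative model, not from the slides), write the parallel time as the slowest processor's computation plus communication:

    T_{\mathrm{par}} \;=\; \max_{0 \le p < P}\left( T^{\mathrm{comp}}_{p} + T^{\mathrm{comm}}_{p} \right),
    \qquad
    \mathrm{speedup} \;=\; \frac{T_{\mathrm{serial}}}{T_{\mathrm{par}}}

Load imbalance raises the max because a single overloaded processor determines when the program finishes, while communication overhead adds terms that a serial run never pays.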
6. Parallel MVM: Row-Block Partition
    do i = 1, N
      do j = 1, N
        y(i) = y(i) + A(i,j)*x(j)
      end do
    end do
[Figure: A partitioned into four row blocks owned by P0-P3; the vectors x and y are distributed in blocks across P0-P3]
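A hypothetical MPI version of the row-block partition (a sketch, not from the slides; it assumes N is divisible by the number of processes and that x and y are distributed in matching blocks): each process stores N/P rows of A, gathers the full x once, and then computes its block of y without further communication.

      ! Illustrative sketch only: row-block matrix-vector multiply with MPI
      program mvm_rowblock
        use mpi
        implicit none
        integer, parameter :: N = 8
        integer :: ierr, rank, P, nloc, i, j
        real, allocatable :: Aloc(:,:), xloc(:), x(:), yloc(:)

        call MPI_Init(ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
        call MPI_Comm_size(MPI_COMM_WORLD, P, ierr)
        nloc = N / P                  ! rows of A (and elements of x, y) owned locally

        allocate(Aloc(nloc,N), xloc(nloc), x(N), yloc(nloc))
        Aloc = 1.0                    ! illustrative values
        xloc = real(rank + 1)
        yloc = 0.0

        ! Communication: every process needs all of x, so gather the distributed blocks
        call MPI_Allgather(xloc, nloc, MPI_REAL, x, nloc, MPI_REAL, &
                           MPI_COMM_WORLD, ierr)

        ! Computation: each process forms its block of y from its own rows of A
        do i = 1, nloc
           do j = 1, N
              yloc(i) = yloc(i) + Aloc(i,j)*x(j)
           end do
        end do

        call MPI_Finalize(ierr)
      end program mvm_rowblock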
7. Parallel MVM: Column-Block Partition
    do j = 1, N
      do i = 1, N
        y(i) = y(i) + A(i,j)*x(j)
      end do
    end do
[Figure: A partitioned into four column blocks owned by P0-P3; the vectors x and y are distributed in blocks across P0-P3]
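A hypothetical MPI version of the column-block partition (a sketch, not from the slides; same divisibility assumption): each process stores N/P columns of A plus the matching block of x, forms a full-length partial y, and the partial vectors are then summed across processes.

      ! Illustrative sketch only: column-block matrix-vector multiply with MPI
      program mvm_colblock
        use mpi
        implicit none
        integer, parameter :: N = 8
        integer :: ierr, rank, P, nloc, i, j
        real, allocatable :: Aloc(:,:), xloc(:), ypart(:), y(:)

        call MPI_Init(ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
        call MPI_Comm_size(MPI_COMM_WORLD, P, ierr)
        nloc = N / P                  ! columns of A (and elements of x) owned locally

        allocate(Aloc(N,nloc), xloc(nloc), ypart(N), y(N))
        Aloc = 1.0                    ! illustrative values
        xloc = real(rank + 1)
        ypart = 0.0

        ! Computation: partial product using only the locally owned columns
        do j = 1, nloc
           do i = 1, N
              ypart(i) = ypart(i) + Aloc(i,j)*xloc(j)
           end do
        end do

        ! Communication: sum the partial vectors so every process ends up with the full y
        call MPI_Allreduce(ypart, y, N, MPI_REAL, MPI_SUM, MPI_COMM_WORLD, ierr)

        call MPI_Finalize(ierr)
      end program mvm_colblock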
8. Parallel MVM: Block Partition
- Can we do any better?
- Assume the same distribution of x and y
- Can A be partitioned to reduce communication? (a rough count follows the figure)
[Figure: A partitioned into two-dimensional blocks among P0-P3, with x and y distributed across the processors as before]
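A rough per-processor communication count (an illustrative estimate, not from the slides), assuming P is a perfect square and A is laid out on a \sqrt{P} \times \sqrt{P} grid of processors:

    \text{1-D row or column blocks:}\quad O(N)\ \text{words moved per processor}
    \text{2-D blocks on a } \sqrt{P}\times\sqrt{P}\ \text{grid:}\quad O\!\left(\frac{N}{\sqrt{P}}\right)\ \text{words moved per processor}

With 2-D blocks a processor only needs the portion of x owned by its block column and only contributes partial sums for the portion of y owned by its block row, so the communication per processor shrinks as P grows.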
9. Parallel Architecture Models
- Bus-based shared-memory or symmetric multiprocessor, SMP (e.g. suraj, dual/quad-processor Xeon machines)
- Network-based distributed-memory (e.g. Cray T3E, our Linux cluster)
- Network-based distributed-shared-memory (e.g. SGI Origin 2000)
- Network-based distributed shared-memory (e.g. SMP clusters)
10. Bus-Based Shared-Memory (SMP)
[Figure: processors connected through a bus to a single shared memory]
- Any processor can access any memory location at equal cost (symmetric multiprocessor)
- Tasks communicate by writing to and reading from commonly accessible locations
- Easier to program
- Cannot scale beyond about 30 processors (bus bottleneck)
- Examples: most workstation vendors make SMPs (Sun, IBM, Intel-based SMPs); Cray T90, SV1 (uses a crossbar)
11. Network-Connected Distributed-Memory
[Figure: processors, each with its own local memory (M), connected by an interconnection network]
- Each processor can access only its own memory
- Explicit communication by sending and receiving messages
- More tedious to program
- Can scale to thousands of processors
- Examples: Cray T3E, clusters
12. Network-Connected Distributed-Shared-Memory
[Figure: processors with physically distributed memories (M) joined by an interconnection network into one shared address space]
- Each processor can directly access any memory location
- Physically distributed memory
- Non-uniform memory access costs
- Example: SGI Origin 2000
13. Network-Connected Distributed Shared-Memory
[Figure: SMP nodes (processors sharing memory over a bus) connected to one another by an interconnection network]
- Network of SMPs
- Each SMP can access only its own memory
- Explicit communication between SMPs
- Can take advantage of both the shared-memory and distributed-memory programming models (a hybrid sketch follows this list)
- Can scale to hundreds of processors
- Examples: SMP clusters
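A hypothetical hybrid sketch (not from the slides): OpenMP threads exploit the shared memory inside each SMP node, while MPI handles the explicit communication between nodes. It would be built with an MPI Fortran wrapper and OpenMP enabled (e.g. mpif90 -fopenmp).

      ! Illustrative sketch only: MPI across SMP nodes, OpenMP threads within a node
      program hybrid_sketch
        use mpi
        implicit none
        integer, parameter :: n = 100000
        integer :: ierr, rank, nprocs, i
        real :: a(n), local_sum, global_sum

        call MPI_Init(ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
        call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

        a = real(rank + 1)
        local_sum = 0.0

        ! Shared-memory part: threads of this MPI process sum the local array together
      !$omp parallel do reduction(+:local_sum)
        do i = 1, n
           local_sum = local_sum + a(i)
        end do
      !$omp end parallel do

        ! Distributed-memory part: explicit message passing combines the node results
        call MPI_Reduce(local_sum, global_sum, 1, MPI_REAL, MPI_SUM, 0, &
                        MPI_COMM_WORLD, ierr)
        if (rank == 0) print *, 'global sum =', global_sum

        call MPI_Finalize(ierr)
      end program hybrid_sketch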
14. Parallel Programming Models
- Global-address (or shared-address) space model
  - POSIX threads (Pthreads)
  - OpenMP
- Message-passing (or distributed-address) model
  - MPI (Message Passing Interface)
  - PVM (Parallel Virtual Machine)
- Higher-level programming environments
  - High Performance Fortran (HPF)
  - PETSc (Portable, Extensible Toolkit for Scientific Computation)
  - POOMA (Parallel Object-Oriented Methods and Applications)
15. Other Parallel Programming Models
- Task and channel
  - Similar to message passing
  - Tasks communicate through named channels instead of directly with named tasks (as in the message-passing model)
- SPMD (single program, multiple data)
  - Each processor executes the same program code, operating on different data (a sketch follows this list)
  - Most message-passing programs are SPMD
- Data parallel
  - Operations on chunks of data (e.g. arrays) are parallelized
- Grid
  - The problem domain is viewed as parcels, with the processing for each parcel allocated to different processors
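A minimal SPMD sketch with MPI (an illustration, not from the slides): every process runs exactly this program, and the rank returned by MPI determines which portion of the data it operates on.

      ! Illustrative sketch only: single program, multiple data
      program spmd_sketch
        use mpi
        implicit none
        integer, parameter :: n = 16      ! total elements (assume divisible by nprocs)
        integer :: ierr, rank, nprocs, nloc, first, i
        real, allocatable :: xloc(:)

        call MPI_Init(ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
        call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

        nloc  = n / nprocs                ! elements owned by this process
        first = rank*nloc + 1             ! global index of the first local element
        allocate(xloc(nloc))

        ! Same code everywhere, different data on each process
        do i = 1, nloc
           xloc(i) = real(first + i - 1)**2
        end do
        print *, 'rank', rank, 'handles elements', first, 'to', first + nloc - 1

        call MPI_Finalize(ierr)
      end program spmd_sketch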
16. Example
    real a(n,n), b(n,n)
    do k = 1, NumIter
      ! replace each interior point of a by the average of its four neighbours in b
      do i = 2, n-1
        do j = 2, n-1
          a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
        end do
      end do
      ! copy the updated values back into b for the next iteration
      do i = 2, n-1
        do j = 2, n-1
          b(i,j) = a(i,j)
        end do
      end do
    end do
17. Shared-Address Space Model: OpenMP
    real a(n,n), b(n,n)
!$omp parallel shared(a,b) private(i,j,k)
    do k = 1, NumIter
!$omp do
      do i = 2, n-1
        do j = 2, n-1
          a(i,j) = (b(i-1,j) + b(i,j-1) + b(i+1,j) + b(i,j+1))/4
        end do
      end do
!$omp do
      do i = 2, n-1
        do j = 2, n-1
          b(i,j) = a(i,j)
        end do
      end do
    end do
!$omp end parallel
18. Message Passing Pseudo-code
    real aLoc(NdivP,n), bLoc(0:NdivP+1,n)
    me = get_my_procnum()
    do k = 1, NumIter
      ! exchange boundary (halo) rows with the neighbouring processors
      if (me .ne. P-1) send(me+1, bLoc(NdivP, 1:n))
      if (me .ne. 0)   recv(me-1, bLoc(0, 1:n))
      if (me .ne. 0)   send(me-1, bLoc(1, 1:n))
      if (me .ne. P-1) recv(me+1, bLoc(NdivP+1, 1:n))
      ! skip the global boundary rows on the first and last processors
      if (me .eq. 0)   then ibeg = 2       else ibeg = 1     endif
      if (me .eq. P-1) then iend = NdivP-1 else iend = NdivP endif
      do i = ibeg, iend
        do j = 2, n-1
          aLoc(i,j) = (bLoc(i-1,j) + bLoc(i,j-1) + bLoc(i+1,j) + bLoc(i,j+1))/4
        end do
      end do
      do i = ibeg, iend
        do j = 2, n-1
          bLoc(i,j) = aLoc(i,j)
        end do