Title: Multiprocessing
1. Multiprocessing
- COMP381
- Tutorial 13
- 2nd-5th Dec, 08
2. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solution
3. The Bottleneck of a Single Processor
Even though various ways of increasing single-processor performance have been introduced, the performance of a single processor still has its bottleneck. The example above demonstrated the limitation of ILP.
4. We Need High-Performance Computing
- Less time for a time-consuming operation
- A higher number of operations in a given period of time
5. Multiprocessing (Parallel Processing)
- It is hard, or costs too much, to further improve the performance of a single processor
- Concurrent execution of tasks (programs) using multiple computing, memory, and interconnection resources
- Provides an alternative to a faster clock for performance
- Uses multiple processors to solve a single problem
6. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solution
- Clusters
7. Performance Potential Using Multiprocessing
- Amdahl's Law
- Using multiple processors to solve the same problem as the single processor
- Pessimistic: limited speedup factor
- Gustafson's View
- Using multiple processors to solve larger or more complex problems
- The parallel portion increases as the problem size increases, so the speedup factor can be increased by deploying more processors
8. Amdahl's Law
- A parallel program has a sequential part and a parallel part; their proportions are σ and (1 - σ).
- Assume the total execution time for a single processor is
  T1 = σT1 + (1 - σ)T1
- The total execution time for p processors would be
  Tp = σT1 + (1 - σ)T1 / p
- Speedup(p) = 1 / (σ + (1 - σ)/p) = p / (σp + 1 - σ) → 1 / σ as p → ∞
9. Amdahl's Law
- Amdahl's Law is pessimistic (in this case)
- Let s be the serial part
- Let p be the part that can be parallelized n ways
- Serial: SSPPPPPP
- 6 processors:
  SSP
    P
    P
    P
    P
    P
- Speedup = 8/3 ≈ 2.67
- T(n) = s + p/n
- As n → ∞, T(n) → s
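The Amdahl speedup formula can be checked with a short script. This is a sketch, not from the slides; the function name `amdahl_speedup` is chosen here for illustration.

```python
def amdahl_speedup(sigma, p):
    """Amdahl's Law: Speedup(p) = 1 / (sigma + (1 - sigma) / p),
    where sigma is the serial fraction and p the processor count."""
    return 1.0 / (sigma + (1.0 - sigma) / p)

# The SSPPPPPP example: 2 of 8 time units are serial, so sigma = 2/8 = 0.25.
print(amdahl_speedup(0.25, 6))     # 8/3 ≈ 2.67, matching the slide

# As p grows, speedup approaches the 1/sigma ceiling (here 4) but never passes it.
print(amdahl_speedup(0.25, 1000))
```

Note how even 1000 processors cannot beat the 1/σ limit, which is exactly the pessimism the slides describe.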
10. Amdahl's Law
[Figure: speedup versus percentage of serial code (0-99%) for 4, 16, and 1000 CPUs. Limited speedup factor!!!]
11. Gustafson's View
- The parallel portion increases as the problem size increases
- Let s be the serial part
- Let p be the part that can be parallelized n ways
- Old Serial: SSPPPPPP
- 6 processors:
  SSPPPPPP
    PPPPPP
    PPPPPP
    PPPPPP
    PPPPPP
    PPPPPP
- Hypothetical Serial:
  SSPPPPPP PPPPPP PPPPPP PPPPPP PPPPPP PPPPPP
- Speedup(6) = (8 + 5×6)/8 = 38/8 = 4.75
- Speedup(N) = N(1 - σ) + σ, so Speedup(N) → ∞ as N → ∞!!!
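Gustafson's scaled speedup can be sketched the same way (again an illustrative helper name, not from the slides):

```python
def gustafson_speedup(sigma, n):
    """Gustafson's Law: Speedup(N) = sigma + N * (1 - sigma),
    where sigma is the serial fraction of the scaled workload."""
    return sigma + n * (1.0 - sigma)

# The example above: sigma = 2/8 = 0.25, 6 processors.
print(gustafson_speedup(0.25, 6))     # (8 + 5*6)/8 = 4.75

# Unlike Amdahl's Law, speedup grows without bound as N increases,
# because the problem (the parallel part) grows with the machine.
print(gustafson_speedup(0.25, 1000))
```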
13. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solutions
- Clusters
14. Flynn's Taxonomy of Computing
- SISD (Single Instruction, Single Data)
- Typical uniprocessor systems that we've studied throughout this course
- Uniprocessor systems can time-share and still be SISD
- SIMD (Single Instruction, Multiple Data)
- Multiple processors simultaneously executing the same instruction on different data
- Specialized applications (e.g., image processing)
- MIMD (Multiple Instruction, Multiple Data)
- Multiple processors autonomously executing different instructions on different data
- Keep in mind that the processors are working together to solve a single problem
15. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solutions
- Clusters
16. Typical SIMD Systems
One control unit; all processors run in lockstep, so every P executes the same instruction or does nothing.
Possible applications: databases, image processing, signal processing, etc.
17. SIMD Operations on a Single Processor
- Sequential pixel operations take a very long time to perform.
- A 512x512 image would require 262,144 iterations through a sequential loop, with each iteration executing 10 instructions. That translates to 2,621,440 clock cycles (if each instruction takes a single cycle), plus loop overhead.
Each pixel is operated on sequentially, one after another.
18. SIMD Operations with Multiprocessing
- On a SIMD system with 512x512 processors (which is not unreasonable on SIMD machines), the same operation would take 10 cycles.
Each processor operates on a single pixel of the 512x512 image in parallel.
Speedup due to parallelism = 2,621,440 / 10 = 262,144 = 512x512 (the number of processors)!
Notice: no loop overhead!
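The sequential-versus-lockstep contrast above can be illustrated in software with NumPy, whose whole-array operations are a data-parallel analogue of SIMD (a sketch only; the brightness-by-50 operation is a made-up example, not from the slides):

```python
import numpy as np

# A 512x512 grayscale image, as in the example above (all zeros for the demo).
image = np.zeros((512, 512), dtype=np.uint8)

# Sequential style: one pixel per loop iteration (262,144 iterations).
bright_seq = image.astype(np.int32)
for i in range(512):
    for j in range(512):
        bright_seq[i, j] = min(bright_seq[i, j] + 50, 255)

# Data-parallel style: one operation applied to every pixel at once,
# the way a 512x512-processor SIMD machine would run it in lockstep.
bright_par = np.minimum(image.astype(np.int32) + 50, 255)

# Both styles compute the same result; only the execution model differs.
assert (bright_seq == bright_par).all()
```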
19. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solutions
- Clusters
20. MIMD Architecture
Multiple processors autonomously execute different instructions on different data. This is a more general form of multiprocessing and can be used in numerous applications.
21. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solutions
22. Programming with Shared Memory
- We use the shared memory model for multiprocessing: all processors load from and store to a single shared address space.
23. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solutions
24. Cache Coherence Problem
- Processor 0 writes X = 17 into its own cache, while memory and the other processors' caches still hold the old value X = 42.
- When processor 3 then reads X, it gets 42: processor 3 does not see the value written by processor 0.
25. Write-Through Does Not Help
- Processor 0 writes X = 17 through to memory, so both its cache and memory now hold 17, but processor 3's cache still holds the stale copy X = 42.
- Processor 3 sees 42 in its cache and does not get the correct value (17) from memory.
26. Outline
- Why do we need multiprocessing?
- Performance Potential Using Multiprocessing
- Flynn's Taxonomy of Computing
- SIMD
- MIMD
- Shared Memory Model
- Cache Coherence Problem Solutions
27. Solutions to the Cache Coherence Problem
- Shared Cache
- Cache placement is identical to a single cache; there is only one copy of any cached block, which results in limited bandwidth
- Snoopy Cache-Coherence Protocols for Distributed Caches
28. Snooping Cache Coherency
Bus snoop; cache-memory transaction.
- The cache controller snoops all transactions on the shared bus and takes action to ensure coherence (invalidate, update, or supply the value)
- A transaction is relevant if it involves a cache block currently contained in this cache
29. Hardware Cache Coherence Demonstration
- Write-invalidate: when one cache writes X, an invalidate transaction is broadcast on the bus; the writer keeps its updated copy (X → X') while every other cached copy is discarded (X → Inv). Memory is updated according to the write policy.
- Write-update (also called distributed write): when one cache writes X, the new value is broadcast on the bus and every other cached copy is updated in place (X → X'), so all caches stay current.
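The write-invalidate behavior can be modeled with a toy simulation: single-block "caches" snoop a shared bus of peers, and a write invalidates every other copy so the next read misses and fetches the fresh value. This is a software sketch of the idea, not real hardware or a full protocol (real snoopy protocols such as MESI track per-block states).

```python
class Cache:
    """A one-block toy cache that snoops a shared 'bus' (a list of peer caches)."""
    def __init__(self, memory):
        self.memory = memory
        self.block = None            # cached value of X, or None if invalid

    def read(self, bus):
        if self.block is None:       # miss: fetch the current value from memory
            self.block = self.memory["X"]
        return self.block

    def write(self, value, bus):
        # Write-invalidate: broadcast an invalidate so every other copy of X
        # is discarded before this cache installs its new value.
        for peer in bus:
            if peer is not self:
                peer.block = None    # X -> Inv in the peer caches
        self.block = value           # X -> X' in the writing cache
        self.memory["X"] = value     # write through to memory for simplicity

memory = {"X": 42}
bus = [Cache(memory) for _ in range(4)]
p0, p3 = bus[0], bus[3]

p3.read(bus)            # P3 caches X = 42
p0.write(17, bus)       # P0 writes 17; P3's stale copy is invalidated
print(p3.read(bus))     # prints 17: the forced miss refetches the fresh value
```

Without the invalidate broadcast, `p3.read` would have returned the stale 42, which is exactly the coherence problem shown on slide 24.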
30. All the best for your upcoming final exam!!