Title: CPE 431/531 Chapter 9 Multiprocessors and Clusters
Slide 1: CPE 431/531 Chapter 9 Multiprocessors and Clusters
- Swathi T. Gurumani
- Modified from slides of Dr. Rhonda Kay Gaede, UAH
Slide 2: 9.1 Introduction - Motivation
- Multiprocessor: parallel processors with a single shared memory
- Parallel processing program: a single program that runs on multiple processors simultaneously
- Why multiprocessors?
  - More effective than building a high-performance uniprocessor with more advanced technology
  - Many scientific applications are too demanding to make progress with a single processor, for example weather prediction and protein folding
Slide 3: 9.1 Introduction - Design Questions
- How do parallel processors share data?
  - Shared memory processors
  - Message passing for communication
- Shared memory processors
  - Shared memory: memory for a parallel processor with a single address space, implying implicit communication with loads and stores
  - Synchronization: the process of coordinating the behavior of two or more processes, which may be running on different processors
  - Lock: a synchronization device that allows access to data by only one processor at a time
  - UMA and NUMA
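As a minimal sketch of the lock idea above, the Python fragment below lets several threads update one shared counter. The threads only stand in for processors sharing a single address space, and `run_counter`, `num_threads`, and `increments` are illustrative names that do not come from the slides. The lock admits one thread at a time to the load, add, store sequence on the shared variable.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Let several threads add to one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # only one thread at a time may hold the lock
                counter += 1  # load, add, store on the shared variable

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()              # wait until every thread has finished
    return counter
```

Because every update happens while the lock is held, `run_counter(4, 10_000)` returns 40000 regardless of how the threads interleave.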
Slide 4: 9.1 Introduction - Design Questions
- Symmetric multiprocessor (SMP), or uniform memory access (UMA): accesses to main memory take the same amount of time no matter which processor requests the access and no matter which word is accessed.
- Nonuniform memory access (NUMA): a single-address-space multiprocessor in which some memory accesses are faster than others, depending on which processor asks for which word
- NUMA machines are more scalable and hence can potentially provide higher performance
Slide 5: 9.1 Introduction - Design Questions
- Message passing: communication between multiple processors by explicitly sending and receiving information
- Needed in machines with private memories
- Cluster: a set of computers connected over a LAN that functions as a single large multiprocessor
- Send message routine: used by a processor in a machine with private memories to pass a message to another processor
- Receive message routine: used by a processor in a machine with private memories to accept a message from another processor
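The send and receive routines can be sketched as follows, assuming that Python threads with one queue per node stand in for processors with private memories joined by a network. The names `send`, `receive`, `mailboxes`, and `message_passing_sum` are hypothetical and chosen only for illustration. Each node touches only its own chunk of data, and partial results move between nodes only through explicit messages.

```python
import queue
import threading


def message_passing_sum(chunks):
    """Sum private chunks; nodes 1..n-1 send their partial sums to node 0."""
    mailboxes = [queue.Queue() for _ in chunks]  # one inbox per node

    def send(dest, message):
        mailboxes[dest].put(message)

    def receive(me):
        return mailboxes[me].get()    # blocks until a message arrives

    def node(me, private_data):
        partial = sum(private_data)   # work on private memory only
        send(0, partial)              # explicit communication to node 0

    workers = [threading.Thread(target=node, args=(i, chunk))
               for i, chunk in enumerate(chunks) if i > 0]
    for w in workers:
        w.start()

    total = sum(chunks[0])            # node 0 sums its own private chunk
    for _ in workers:
        total += receive(0)           # one message per other node
    for w in workers:
        w.join()
    return total
```

For example, `message_passing_sum([[1, 2], [3, 4], [5]])` combines the partial sums 3, 7, and 5 into 15.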
Slide 6: 9.1 Introduction - Design Questions
- Apart from the two main communication styles, multiprocessors are connected in two basic organizations
  - Connected by a single bus
  - Connected by a network
- The number of processors in the multiprocessor has a lot to do with this choice
Slide 7: 9.2 Programming Multiprocessors
- It's difficult to rewrite programs to run on multiprocessors, so it's not done often.
- Furthermore, many applications don't require many processors.
- Problems
  - Must get good performance and efficiency
  - Communication overhead
  - The programmer needs a good knowledge of the hardware
Slide 8: 9.2 Programming Multiprocessors - Speedup Challenge
- Suppose you want to achieve linear speedup with
100 processors. What fraction of the original
computation can be sequential?
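One way to approach this question is Amdahl's law, sketched below in Python. The sketch assumes the non-sequential part parallelizes perfectly with no communication overhead, and the function names are illustrative rather than taken from the slides.

```python
def speedup(sequential_fraction, processors):
    """Amdahl's law: the sequential part does not shrink with more processors."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


def max_sequential_fraction(target_speedup, processors):
    """Solve Amdahl's law for the sequential fraction s.

    1/target = s + (1 - s)/p   =>   s = (p/target - 1) / (p - 1)
    """
    return (processors / target_speedup - 1.0) / (processors - 1.0)
```

Under these assumptions `max_sequential_fraction(100, 100)` is 0: linear speedup with 100 processors leaves no room for sequential code. Even settling for a speedup of 99 allows only about 0.01% of the original computation to be sequential.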
Slide 9: 9.2 Programming Multiprocessors - Speedup Challenge, Bigger Problem
- Suppose you want to perform two sums: one is a sum of two scalar variables and one is a matrix sum of a pair of two-dimensional arrays, size 1000 by 1000. What speedup do you get with 1000 processors?
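A rough worked answer can be sketched in Python. It assumes every addition takes the same time, that the scalar addition runs on one processor, and that the 1,000,000 matrix additions split evenly across the processors with no communication cost. `speedup_two_sums` is an illustrative name.

```python
def speedup_two_sums(n=1000, processors=1000):
    """Speedup for one scalar add plus an n-by-n matrix add."""
    scalar_adds = 1        # sequential: cannot be spread across processors
    matrix_adds = n * n    # parallelizable: one add per matrix element
    time_before = scalar_adds + matrix_adds
    time_after = scalar_adds + matrix_adds / processors
    return time_before / time_after
```

With the defaults this evaluates 1,000,001 / 1,001, which is about 999. The larger problem makes the sequential scalar addition almost negligible, so the speedup comes close to the processor count.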
Slide 10: 9.3 Multiprocessors Connected by a Single Bus
- Each processor is smaller than a multichip processor, so more processors can be placed on a bus
- Caches can lower bus traffic
- Mechanisms for cache coherency
Slide 11: 9.3 Multiprocessors Connected by a Single Bus - Parallel Program
- Suppose we want to sum 100,000 numbers on a single-bus multiprocessor computer. Let's assume we have 100 processors.

    sum[Pn] = 0;
    for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1)
        sum[Pn] = sum[Pn] + A[i];   /* sum the assigned areas */
    half = 100;   /* 100 processors in 1-bus multiprocessor */
    repeat
        synch();   /* wait for partial sum completion */
        if (half%2 != 0 && Pn == 0)
            sum[0] = sum[0] + sum[half-1];
            /* Conditional sum needed when half is odd;
               Processor0 gets missing element */
        half = half/2;   /* dividing line on who sums */
        if (Pn < half) sum[Pn] = sum[Pn] + sum[Pn+half];
    until (half == 1);   /* exit with final sum in sum[0] */
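To check the reduction logic, the following Python sketch simulates the per-processor program above on a single thread. Each pass of the `while` loop stands in for one step between `synch()` barriers. `parallel_sum` and its parameters are illustrative names, and the sketch assumes `len(A)` is a multiple of the processor count.

```python
def parallel_sum(A, processors=100):
    """Sequentially simulate the single-bus parallel sum."""
    chunk = len(A) // processors   # 1000 for 100,000 numbers on 100 processors

    # Phase 1: processor Pn sums its own block A[chunk*Pn : chunk*(Pn+1)].
    sums = [sum(A[chunk * pn: chunk * (pn + 1)]) for pn in range(processors)]

    # Phase 2: tree reduction; each pass halves the number of active processors.
    half = processors
    while half > 1:
        # synch(): all partial sums from the previous step are complete here.
        if half % 2 != 0:
            sums[0] += sums[half - 1]    # processor 0 picks up the odd element
        half //= 2                       # dividing line on who sums
        for pn in range(half):           # processors with Pn < half do one add
            sums[pn] += sums[pn + half]
    return sums[0]                       # final sum ends up in sum[0]
```

With 100 processors, `half` takes the values 50, 25, 12, 6, 3, 1, so the reduction needs six add steps after the local sums. The odd-`half` branch fires when `half` is 25 and again when it is 3.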
Slide 12: 9.3 Multiprocessors Connected by a Single Bus - Multiprocessor Cache Coherence
- Cache coherence: consistency in the value of data between the versions in the caches of several processors.
- Snooping: a method for maintaining cache coherency in which all cache controllers monitor, or snoop on, the bus to determine whether or not they have a copy of the desired block.
- Snooping
  - Write-invalidate: invalidate all other shared copies
  - Write-update: update shared copies with the value being written
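The difference between the two snooping policies can be sketched with a toy model. This is a simplification under stated assumptions: one-word blocks, write-through memory, and a bus on which every write is seen by every cache. The class `SnoopyBus` and its methods are hypothetical names, not part of the slides.

```python
class SnoopyBus:
    """Toy single-bus system with one dict per cache and a shared memory."""

    def __init__(self, num_caches, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}
        self.caches = [{} for _ in range(num_caches)]

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                   # read miss: fetch over the bus
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        self.caches[cpu][addr] = value
        self.memory[addr] = value               # write-through, for simplicity
        for other, cache in enumerate(self.caches):
            if other != cpu and addr in cache:  # snoop hit in another cache
                if self.policy == "invalidate":
                    del cache[addr]             # write-invalidate: drop the copy
                else:
                    cache[addr] = value         # write-update: refresh the copy
```

Under write-invalidate the other cache misses on its next read and refetches the new value. Under write-update its copy is refreshed in place, at the cost of broadcasting the data on every write.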
Slide 13: 9.3 Multiprocessors Connected by a Single Bus - Cache Coherency Protocol
- Transitions in the state of a cache block happen on read misses, write misses, or write hits; read hits do not change cache state.
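The rule above can be sketched as a small transition table for the processor-side events. The slide does not list the states, so the three states used here (`invalid`, `read_only`, `read_write`) are an assumption modeled on a simple write-invalidate protocol. Bus-side events, such as another processor's invalidate, are left out, and `next_state` is an illustrative name.

```python
# (current state, processor event) -> next state of the cache block
TRANSITIONS = {
    ("invalid", "read_miss"): "read_only",
    ("invalid", "write_miss"): "read_write",
    ("read_only", "read_miss"): "read_only",     # replaced by another clean block
    ("read_only", "write_miss"): "read_write",
    ("read_only", "write_hit"): "read_write",    # other copies get invalidated
    ("read_write", "read_miss"): "read_only",    # dirty block written back first
    ("read_write", "write_miss"): "read_write",  # dirty block written back first
    ("read_write", "write_hit"): "read_write",
}


def next_state(state, event):
    """Return the block state after a processor-side event."""
    if event == "read_hit":
        if state == "invalid":
            raise ValueError("a read cannot hit an invalid block")
        return state                             # read hits never change state
    return TRANSITIONS[(state, event)]
```

For instance, `next_state("read_only", "write_hit")` gives `"read_write"`, while `next_state("read_only", "read_hit")` leaves the block in `"read_only"`.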