Title: CSL718 : Multiprocessors
1CSL718 Multiprocessors
- Interconnection Mechanisms
- Performance Models
- 20th April, 2006
2Connecting Processors and Memories
- Shared Buses
- Interconnection Networks
- Static Networks
- Dynamic Networks
3Shared Bus
[Figure: each processor sees this picture, with periods of processing and periods of bus access]
- prob of a processor using the bus = $\rho$
- prob of a processor not using the bus = $1 - \rho$
- prob of none of the n processors using the bus = $(1 - \rho)^n$
- prob of at least one processor using the bus = $1 - (1 - \rho)^n$
- achieved BW on a relative scale = $1 - (1 - \rho)^n$
- required BW = $n\rho$
- available BW = 1
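The bus model above can be evaluated numerically. The following is a minimal sketch, assuming the per-processor request probability is written `rho` and that requests from different processors are independent; the function name `bus_bandwidth` is hypothetical.

```python
def bus_bandwidth(n: int, rho: float) -> dict:
    """Relative bandwidth figures for n processors sharing one bus.

    rho is the probability that a given processor is using the bus
    (assumed independent across processors).
    """
    if n < 1 or not 0.0 <= rho <= 1.0:
        raise ValueError("need n >= 1 and 0 <= rho <= 1")
    # Probability that at least one of the n processors uses the bus.
    achieved = 1.0 - (1.0 - rho) ** n
    return {"required": n * rho, "available": 1.0, "achieved": achieved}


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, bus_bandwidth(n, 0.25))
```

As n grows, the required bandwidth rises linearly while the achieved bandwidth saturates towards the available bandwidth of 1.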
4Effect of re-submitted requests
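[Figure: two-state transition diagram with states A and W]
- A to W with prob $\rho (1 - P_A)$
- A to A with prob $1 - \rho + \rho P_A$
- W to A with prob $P_A$
- W to W with prob $1 - P_A$
- prob of being in state A = $q_A$
- prob of being in state W = $q_W$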
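The state probabilities of this two-state chain can be found by balancing the flow between the states. The following is a minimal sketch, assuming the transition probabilities are A to W with probability rho(1 - PA) and W to A with probability PA (read off the diagram labels), and that PA is given rather than derived; the function name `steady_state` is hypothetical.

```python
def steady_state(rho: float, p_a: float) -> tuple:
    """Return (q_A, q_W) for the two-state chain with states A and W.

    Assumed transitions: A -> W with probability rho * (1 - p_a),
    W -> A with probability p_a. Balance: q_A * P(A->W) = q_W * P(W->A).
    """
    if not 0.0 <= rho <= 1.0 or not 0.0 < p_a <= 1.0:
        raise ValueError("need 0 <= rho <= 1 and 0 < p_a <= 1")
    a_to_w = rho * (1.0 - p_a)
    w_to_a = p_a
    q_w = a_to_w / (a_to_w + w_to_a)
    return 1.0 - q_w, q_w


if __name__ == "__main__":
    print(steady_state(0.5, 0.5))
```

When every request is accepted (PA = 1) the chain never enters W, so qW is 0.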
7Waiting time
8Switched Networks
- BUS
- Shared media
- Lower Cost
- Lower throughput
- Scalability poor
- Switched Network
- Switched paths
- Higher cost
- Higher throughput
- Scalability better
9Interconnection Networks
- Topology: who is connected to whom
- Direct / Indirect: where is switching done
- Static / Dynamic: when is switching done
- Circuit switching / packet switching: how are connections established
- Store-and-forward / wormhole routing: how is the path determined
- Centralized / distributed: how is switching controlled
- Synchronous / asynchronous: mode of operation
10Direct and Indirect Networks
[Figure: DIRECT network, nodes (each containing P, M and S) joined to one another by links; INDIRECT network, nodes joined by links to a separate SWITCH]
11Static and Dynamic Networks
- Static Networks
- fixed point to point connections
- usually direct
- each node pair may not have a direct connection
- routing through nodes
- Dynamic Networks
- connections established as per need
- usually indirect
- path can be established between any pair of nodes
- routing through switches
12Static Network Topologies
Non-uniform connectivity
2D-Mesh
Linear
Tree
Star
13Static Networks Topologies- contd.
Uniform connectivity
Ring
Torus
Fully Connected
14Illiac IV Mesh Network
[Figure: 9-node Illiac IV mesh, nodes numbered 0 to 8, redrawn as a Chordal Ring]
neighbors of node r: $(r \pm 1) \bmod 9$ and $(r \pm 3) \bmod 9$
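The neighbor rule for the Illiac IV mesh can be checked with a few lines of code. The following is a minimal sketch, assuming an n × n mesh with nodes numbered 0 to n² - 1 (n = 3 for the 9-node case on the slide); the function name `illiac_neighbors` is hypothetical.

```python
def illiac_neighbors(r: int, n: int = 3) -> list:
    """Neighbors of node r in an n x n Illiac-style mesh.

    Neighbors are (r +/- 1) mod n*n and (r +/- n) mod n*n,
    which is the chordal-ring view of the mesh.
    """
    size = n * n
    if not 0 <= r < size:
        raise ValueError("node index out of range")
    return sorted({(r + d) % size for d in (1, -1, n, -n)})


if __name__ == "__main__":
    for node in range(9):
        print(node, illiac_neighbors(node))
```

Every node has exactly four neighbors, which is why the network can be drawn as a ring with chords.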
15Fat Tree Network
16Dynamic Networks
- k × k crossbar switch: building block for multi-stage dynamic networks
- simplest crossbar: 2 × 2 switch
- settings of a 2 × 2 switch: straight, exchange, upper broadcast, lower broadcast
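The four settings of a 2 × 2 switch can be modelled as a small lookup table. The following is a minimal sketch, assuming "upper broadcast" means the upper input is copied to both outputs and "lower broadcast" means the lower input is copied to both outputs; the names `SETTINGS` and `switch_2x2` are hypothetical.

```python
# For each setting: which input (0 = upper, 1 = lower) drives
# the upper output and the lower output, in that order.
SETTINGS = {
    "straight": (0, 1),
    "exchange": (1, 0),
    "upper broadcast": (0, 0),  # assumption: upper input to both outputs
    "lower broadcast": (1, 1),  # assumption: lower input to both outputs
}


def switch_2x2(setting: str, upper, lower) -> tuple:
    """Return (upper_output, lower_output) for the given setting."""
    inputs = (upper, lower)
    src_upper, src_lower = SETTINGS[setting]
    return inputs[src_upper], inputs[src_lower]


if __name__ == "__main__":
    for name in SETTINGS:
        print(name, switch_2x2(name, "a", "b"))
```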
17Baseline Network
[Figure: 8 × 8 baseline network, inputs and outputs labelled 000 to 111]
blocking can occur
18Benes Network
non-blocking (rearrangeable)
19Switching Mechanism
- Circuit Switching (connection-oriented communication)
  - A circuit is established between the source and the destination
- Packet Switching (connectionless communication)
  - Information is divided into packets and each packet is sent independently from node to node
20Routing in Networks
[Figure: timing diagrams (node vs. time) of a message, made up of header and payload/data, passing from incoming to outgoing links under store-and-forward routing and under wormhole routing]
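The difference between the two timing diagrams can be made concrete with a first-order latency estimate. The following is a minimal sketch, assuming no contention, no per-hop switching delay, equal bandwidth on every link, and a message that follows its header in a pipeline under wormhole routing; the function names and parameters are hypothetical.

```python
def store_and_forward_latency(hops: int, header: float, payload: float,
                              bandwidth: float) -> float:
    """Each node receives the whole message before forwarding it."""
    return hops * (header + payload) / bandwidth


def wormhole_latency(hops: int, header: float, payload: float,
                     bandwidth: float) -> float:
    """Only the header pays the per-hop cost; the payload is pipelined behind it."""
    return hops * header / bandwidth + payload / bandwidth


if __name__ == "__main__":
    # 4 links, 8-byte header, 248-byte payload, 1 byte per time unit.
    print(store_and_forward_latency(4, 8, 248, 1.0))
    print(wormhole_latency(4, 8, 248, 1.0))
```

Under these assumptions the store-and-forward latency grows with hops × message length, while the wormhole latency grows with hops × header length only.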
21Routing in presence of congestion
- Wormhole routing
  - When message header is blocked, many links get blocked with the message
- Solution: cut-through routing
  - When message header is blocked, tail is allowed to move, compressing the message into a single node
22Routing Options
- Deterministic routing: always same path followed
- Adaptive routing: best path selected to minimize congestion
- Source based routing: message specifies path to destination
- Destination based routing: message specifies only destination address
23Some Performance Parameters
[Figure: timing diagram against time]
- sender: overhead, Tx time = bytes/BW
- time of flight
- receiver: Tx time = bytes/BW, overhead
- transport latency
- total latency
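The quantities in this timing diagram are usually combined as follows. This is a minimal sketch, assuming the common textbook definitions in which transport latency is time of flight plus transmission time, and total latency adds the sender and receiver overheads; the slide itself does not spell these sums out, and the function name `latencies` is hypothetical.

```python
def latencies(nbytes: float, bandwidth: float, time_of_flight: float,
              send_overhead: float, recv_overhead: float) -> tuple:
    """Return (transport_latency, total_latency) for one message.

    Assumed definitions:
      Tx time           = nbytes / bandwidth
      transport latency = time of flight + Tx time
      total latency     = sender overhead + transport latency + receiver overhead
    """
    tx_time = nbytes / bandwidth
    transport = time_of_flight + tx_time
    total = send_overhead + transport + recv_overhead
    return transport, total


if __name__ == "__main__":
    print(latencies(nbytes=100, bandwidth=10, time_of_flight=2,
                    send_overhead=1, recv_overhead=3))
```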
24Other Parameters
- Throughput ? Bandwidth (no credit for header)
- Bisection bandwidth: BW across a bisection
- Node degree
- Network Diameter
- Cost
- Fault Tolerance
25Multidimensional Grid/Mesh
- Size
  - $k \times k \times \dots \times k$ (n times) = $k^n$
- Diameter
  - $(k-1) \times n$ without end-around connections
  - $k \times n / 2$ with end-around connections (k-ary n-cube)
- for (Binary) Hypercube, k = 2
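The size and diameter formulas can be tabulated for a few configurations. The following is a minimal sketch; the function name `grid_size_and_diameter` is hypothetical, and for odd k with end-around connections it uses n × floor(k/2), which agrees with k × n / 2 when k is even.

```python
def grid_size_and_diameter(k: int, n: int, end_around: bool) -> tuple:
    """Return (number of nodes, diameter) of an n-dimensional grid with k nodes per dimension."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 and n >= 1")
    size = k ** n
    if end_around:
        # Wrap-around links halve the worst-case distance in each dimension.
        diameter = n * (k // 2)
    else:
        diameter = n * (k - 1)
    return size, diameter


if __name__ == "__main__":
    print(grid_size_and_diameter(4, 2, end_around=False))  # 4 x 4 mesh
    print(grid_size_and_diameter(4, 2, end_around=True))   # 4 x 4 torus
    print(grid_size_and_diameter(2, 3, end_around=True))   # 3-dimensional hypercube
```

For k = 2 both formulas give a diameter of n, matching the binary hypercube.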
26Grid/Mesh Performance - 1
kd
27Grid/Mesh Performance - 2
28Grid/Mesh Performance - 3
k-ary n-cube
29Switch Performance
k × m crossbar switch
30Switch Performance contd.
31Switch Performance contd.
32Effect of re-submitted requests
33Effect of buffering
- There are two possibilities
  - Buffering before switching (k buffers, one at each input port)
  - Buffering after switching (m buffers, one at each output port)
34Switch with input buffers
- Rate of messages at input and output of each queue is the same in steady state: r per cycle
- Service time includes delays due to conflicts, calculated as earlier. This has an exponential distribution (recall the analysis for a shared bus).
- M/M/1 open queue model can be used to calculate queuing delay. Details are omitted.
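Although the slides omit the details, the standard M/M/1 result for mean time spent waiting in the queue is easy to evaluate. The following is a minimal sketch, assuming an arrival rate of r messages per cycle and a mean service time Ts in cycles (both parameter names, and the function name, are hypothetical); it uses the textbook formula Wq = ρ Ts / (1 - ρ) with ρ = r Ts, not a formula taken from the slides.

```python
def mm1_waiting_time(arrival_rate: float, service_time: float) -> float:
    """Mean time a message waits in an M/M/1 queue before service starts.

    Utilization rho = arrival_rate * service_time must be below 1,
    otherwise the queue has no steady state.
    """
    rho = arrival_rate * service_time
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must satisfy 0 <= rho < 1")
    return rho * service_time / (1.0 - rho)


if __name__ == "__main__":
    for r in (0.1, 0.5, 0.9):
        print(r, mm1_waiting_time(r, 1.0))
```

The delay grows without bound as the utilization approaches 1.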
35Switch with output buffers
- Here we assume that all the messages destined for the same output are queued in the same buffer, in some order. That is, no rejections and no re-submissions.
- For each queue,
  - Messages arriving per service cycle = $\rho$
  - Prob of a request coming from one of the k sources = p
- Apply MB/D/1 model for finding queuing delay $T_w$
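The queuing delay for this model can be sketched numerically. The following is a minimal sketch, assuming "MB" denotes binomial arrivals (each of the k sources sends a message to this output with probability p per service cycle, so the mean arrivals per cycle are kp) and deterministic service of one cycle; under those assumptions the standard discrete-time result is Tw = (kp - p) / (2 (1 - kp)) service cycles. The closed form and the function name are not taken from the slides.

```python
def mbd1_waiting_time(k: int, p: float) -> float:
    """Mean queuing delay, in service cycles, for binomial arrivals and unit service.

    k: number of sources feeding the output buffer.
    p: probability that a given source sends a message in one service cycle.
    """
    rho = k * p  # mean number of messages arriving per service cycle
    if k < 1 or not 0.0 <= p <= 1.0 or rho >= 1.0:
        raise ValueError("need k >= 1, 0 <= p <= 1 and k * p < 1")
    return (rho - p) / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2):
        print(p, mbd1_waiting_time(4, p))
```

With a single source (k = 1) at most one message arrives per cycle, so the delay is zero, which the formula reproduces.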
36References
- D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures: A Design Space Approach", Addison Wesley, 1997.
- K. Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", McGraw Hill, 1993.