Dynamic Interconnect - PowerPoint PPT Presentation



Slides: 21

Provided by: zhaoyu

Learn more at: http://www.wgz.org


Transcript and Presenter's Notes


1
Dynamic Interconnect

Lecture 5

2
Multistage Network--Omega Network

Motivation: emulate a crossbar network, but with fewer links
Components:
N processors to connect, N log(N) links
log(N) stages, each stage connected by a perfect shuffle
Each stage has N/2 2x2 switch boxes
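The component counts above can be checked with a minimal Python sketch. It assumes N is a power of two. The helper names `perfect_shuffle` and `omega_components` are hypothetical. The shuffle wiring between stages is modeled as a one-bit left rotation of the log2(N)-bit address, and the link count uses N links per stage, which gives the slide's N log(N).

```python
def perfect_shuffle(x: int, n_bits: int) -> int:
    """Perfect shuffle: rotate an n_bits-bit address left by one position."""
    mask = (1 << n_bits) - 1
    return ((x << 1) | (x >> (n_bits - 1))) & mask


def omega_components(n: int) -> dict:
    """Component counts for an n x n Omega network (n must be a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1            # log2(n)
    return {
        "stages": stages,
        "switches_per_stage": n // 2,       # 2x2 switch boxes in each stage
        "total_switches": (n // 2) * stages,
        "links": n * stages,                # n links feed each stage
        "shuffle": [perfect_shuffle(i, stages) for i in range(n)],
    }


print(omega_components(8))
```

For N = 8 this gives 3 stages of 4 switch boxes (12 in total) and 24 links. The shuffle sends input i to position [0, 2, 4, 6, 1, 3, 5, 7][i].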

[Figure: Omega network connecting inputs P0 to P7 to outputs P0 to P7]
3
Omega Network -- Routing

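Distributed control (destination-tag routing)
At each stage, the switch checks the destination-address bit for that stage (most significant bit first): if it is 0, connect to the upper output port; otherwise, connect to the lower output port
Not all permutations are possible
Example: what happens if 010 connects to 110, 110 to 100, and 000 to 101 at the same time?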
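The routing rule and the example above can be checked with a short simulation. This sketch assumes the common convention that each stage first applies a perfect shuffle and then a 2x2 switch whose output port is selected by one destination bit, most significant bit first. The helper names are hypothetical. Two requests conflict when they need the same switch output, which appears here as the same link label, at the same stage.

```python
def shuffle(x: int, n_bits: int) -> int:
    """Perfect shuffle: rotate the n_bits-bit address left by one."""
    return ((x << 1) | (x >> (n_bits - 1))) & ((1 << n_bits) - 1)


def omega_path(src: int, dst: int, n_bits: int) -> list:
    """Destination-tag routing: label of the output link used after each stage."""
    x, links = src, []
    for stage in range(n_bits):
        x = shuffle(x, n_bits)
        bit = (dst >> (n_bits - 1 - stage)) & 1   # destination bit for this stage, MSB first
        x = (x & ~1) | bit                         # 0 -> upper port, 1 -> lower port
        links.append(x)
    assert x == dst, "message must arrive at its destination"
    return links


def find_conflicts(pairs, n_bits):
    """Report (stage, link, first_pair, second_pair) for every link claimed twice."""
    owner, conflicts = {}, []
    for src, dst in pairs:
        for stage, link in enumerate(omega_path(src, dst, n_bits)):
            if (stage, link) in owner:
                conflicts.append((stage, link, owner[(stage, link)], (src, dst)))
            else:
                owner[(stage, link)] = (src, dst)
    return conflicts


# The slide's example on an 8 x 8 network: 010 -> 110, 110 -> 100, 000 -> 101.
requests = [(0b010, 0b110), (0b110, 0b100), (0b000, 0b101)]
for stage, link, a, b in find_conflicts(requests, 3):
    print(f"stage {stage}: link {link:03b} wanted by "
          f"{a[0]:03b}->{a[1]:03b} and {b[0]:03b}->{b[1]:03b}")
```

The simulation reports two conflicts. After stage 0, both 010->110 and 110->100 need link 101. After stage 1, both 110->100 and 000->101 need link 010. The three connections therefore cannot be set up at the same time.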

[Figure: Omega network connecting inputs P0 to P7 to outputs P0 to P7]
4
Comparisons between Dynamic Networks

Bus System
Assume n processors on a bus that is w bits wide
Data transfer latency: constant
Bandwidth per processor: O(w/n) to O(w)
Wire complexity: O(w)
Switching complexity: O(n)
Routing capability: only one-to-one, one transfer at a time
Advantage
Cheap to build
Disadvantage
Low bandwidth available to each processor
Prone to failure

5
Comparisons between Dynamic Networks

Crossbar Switch
Assume an n x n crossbar with a line width of w bits
Data transfer latency: constant
Bandwidth per processor: O(w) to O(nw)
Wire complexity: $O(n^2 w)$
Switching complexity: $O(n^2)$
Routing capability: all permutations, one at a time
Advantage
Highest bandwidth
Highest routing capability
Disadvantage
High hardware cost

6
Comparisons between Dynamic Networks

Multistage network
Assume an n x n network with a line width of w bits, built from 2 x 2 switches
Data transfer latency: O(log n)
Bandwidth per processor: O(w) to O(nw)
Wire complexity: O(nw log n)
Switching complexity: O(n log n)
Routing capability: some permutations, and broadcast
Advantage
Scalability with modular construction
Medium cost
Disadvantage
Long latency
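The switching-complexity rows above can be made concrete by counting switching elements for n processors. This sketch assumes that a crossbar needs n^2 crosspoints and that a multistage (Omega) network needs (n/2) log2(n) 2x2 switch boxes, each counted as 4 crosspoints. These constant factors are assumptions; the slides give only the asymptotic orders. The function names are hypothetical.

```python
def crossbar_crosspoints(n: int) -> int:
    """An n x n crossbar has one crosspoint per (input, output) pair: O(n^2)."""
    return n * n


def multistage_crosspoints(n: int) -> int:
    """(n/2) * log2(n) 2x2 boxes, each assumed to hold 4 crosspoints: O(n log n)."""
    stages = n.bit_length() - 1            # log2(n) for n a power of two
    return (n // 2) * stages * 4


for n in (8, 64, 1024):
    print(f"n={n:5d}  crossbar={crossbar_crosspoints(n):8d}  "
          f"multistage={multistage_crosspoints(n):6d}")
```

At n = 8 the two counts are close (64 versus 48). At n = 1024 the crossbar needs 1,048,576 crosspoints against 20,480 for the multistage network. This is the gap the slides summarize as $O(n^2)$ versus O(n log n).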

7
Message Transfer Mechanisms

A message typically consists of:
A header, which contains information about the destination
The data that needs to be transmitted
A trailer, which signals the end of the message
The switching strategy determines how message data is actually transferred across the network links in the chosen message route
There are three components to message transfer cost:
Startup time ($t_s$): the cost of handling the message at the sending processor
Per-hop time ($t_p$): the time taken by the header to traverse a link
Per-word transfer time ($t_w$): the time taken for a word to traverse a link

8
Dynamic Network -- Switching Strategy

Circuit switching
A circuit path is established from the source to the destination
Like the telephone system
Requires setup time and has poor bandwidth, but has short latency
Latency for routing an m-word message over l hops:
$t = t_s + l\,t_p + m\,t_w \approx t_s + m\,t_w$
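The circuit-switching formula can be transcribed directly into code. The parameter values in the example are made up for illustration, and the function name is hypothetical. The example shows why the $l\,t_p$ term is usually dropped: for long messages, the $m\,t_w$ term dominates.

```python
def circuit_switching_latency(ts: float, tp: float, tw: float, m: int, l: int) -> float:
    """t = ts + l*tp + m*tw for an m-word message over l hops."""
    return ts + l * tp + m * tw


# Hypothetical costs in arbitrary time units.
ts, tp, tw = 50, 2, 1
exact = circuit_switching_latency(ts, tp, tw, m=1000, l=4)
approx = ts + 1000 * tw      # the slide's approximation ts + m*tw
print(exact, approx)         # the hop term contributes only l*tp = 8 units
```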

[Figure: circuit switching through the Omega network (P0 to P7)]
9
Dynamic Network -- Switching Strategy

Store-and-forward (packet switching)
The message travels one link at a time, whenever the next link is free
The message is buffered when the link is not free
Like the postal system
No pre-setup time and better bandwidth, but longer latency
Only one link on the path is active at a time
Latency over l hops: $t = l\,(t_s + m\,t_w)$
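The following sketch shows where the factor l comes from. Each node must hold the complete message before forwarding it, so only one link is busy at a time, and the per-hop cost $t_s + m\,t_w$ is paid once per hop. Here l is the hop count, as in the circuit-switching formula. The function names and parameter values are hypothetical.

```python
def store_and_forward_latency(ts: float, tw: float, m: int, l: int) -> float:
    """Each of the l hops receives the whole m-word message before forwarding it."""
    return l * (ts + m * tw)


def hop_finish_times(ts: float, tw: float, m: int, l: int) -> list:
    """Time at which each successive node on the path holds the complete message."""
    t, times = 0, []
    for _ in range(l):
        t += ts + m * tw      # only this one link is busy during the hop
        times.append(t)
    return times


# Hypothetical costs: ts = 50, tw = 1, a 1000-word message over 4 hops.
print(hop_finish_times(50, 1, 1000, 4))
print(store_and_forward_latency(50, 1, 1000, 4))
```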

[Figure: store-and-forward routing through the Omega network (P0 to P7); the whole packet is buffered at an intermediate switch]
10
Dynamic Network -- Switching Strategy

Cut-through
Similar to store-and-forward, but:
The message is broken into parcels
All the links on the path can be active at the same time
Also called wormhole routing
Small setup time
Latency: $t = l\,(t_s + t_p) + m\,t_w \approx l\,t_p + m\,t_w$
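The cut-through formula above can be compared with store-and-forward as the hop count grows. This sketch transcribes both slide formulas. It uses hypothetical costs with a small setup time, as the slide assumes for cut-through, and the function names are hypothetical. Because the parcels are pipelined behind the header, the $m\,t_w$ term is paid once rather than once per hop.

```python
def cut_through_latency(ts: float, tp: float, tw: float, m: int, l: int) -> float:
    """Slide formula: the header pays (ts + tp) per hop; the m words stream behind it."""
    return l * (ts + tp) + m * tw


def store_and_forward_latency(ts: float, tw: float, m: int, l: int) -> float:
    """For comparison: the whole message is re-sent on every hop."""
    return l * (ts + m * tw)


# Hypothetical costs: ts = 1, tp = 2, tw = 1, 1000-word message.
ts, tp, tw, m = 1, 2, 1, 1000
for l in (1, 4, 16):
    print(l, cut_through_latency(ts, tp, tw, m, l), store_and_forward_latency(ts, tw, m, l))
```

With these values, cut-through grows from 1003 to 1048 time units as l goes from 1 to 16, while store-and-forward grows from 1001 to 16016.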

[Figure: cut-through routing through the Omega network (P0 to P7); parcels are buffered at intermediate switches]
11
Static Network Vs Dynamic Network

Static Network
There are point-to-point links between processors
Parallel system expansion is easy
Some processors may be closer than others
Generally used for message-passing machine interconnects
Dynamic Network
Paths are established as needed between processors
System expansion is difficult
Processors are usually equidistant
Usually used for shared-memory machine interconnects

12
One-to-all broadcast

Algorithms often require a processor to send identical data to all other processors or to a subset of processors. This operation is called one-to-all broadcast or single-node broadcast
At the start of a single-node broadcast, the source processor has m words of data that need to be sent. At the end, there are p copies of this data, one on each processor
The dual of a broadcast operation is an all-to-one reduction, or single-node reduction
All-to-one reduction
At the start of a single-node reduction, each processor has m words of data. The reduction combines the data from all processors using an associative operator to produce m words at the receiver
Naive single-node broadcast or reduction takes p-1 steps
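Here is a minimal sketch of the naive p-1 step versions of both operations. In the broadcast, the source sends its m words to every other processor in turn. In the reduction, the receiver combines every other processor's m words element-wise with an associative operator, using addition by default. Processors are modeled as list entries, one send per step, and the function names are hypothetical. The reduction combines contributions in rank order starting from the receiver's own data, so a non-commutative operator would also need a fixed ordering.

```python
from operator import add


def naive_broadcast(data, p, src=0):
    """Source sends its m words to each of the other p-1 processors, one send per step."""
    buffers = [None] * p
    buffers[src] = list(data)
    steps = 0
    for dest in range(p):
        if dest != src:
            buffers[dest] = list(buffers[src])   # one point-to-point send
            steps += 1
    return buffers, steps


def naive_reduce(buffers, op=add, dest=0):
    """Every other processor sends its m words to dest, which combines them element-wise."""
    result = list(buffers[dest])
    steps = 0
    for rank, words in enumerate(buffers):
        if rank != dest:
            result = [op(a, b) for a, b in zip(result, words)]   # one receive-and-combine
            steps += 1
    return result, steps


copies, b_steps = naive_broadcast([1, 2, 3], p=4)
total, r_steps = naive_reduce([[1, 2], [3, 4], [5, 6]])
print(copies, b_steps)
print(total, r_steps)
```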

13
One-to-all Broadcast
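[Figure: processors 0, 1, ..., p-1; one-to-all broadcast copies message M from one processor to all p processors, and all-to-one reduction (accumulation) is the reverse operation]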
14
Store-and-forward Routing on Ring

The source sends the message on both outgoing links in the first two steps
All other processors receive on one link and transmit on the other link
It takes p/2 steps
Cost: $(t_s + m\,t_w)\,p/2$
What if we use circuit-switching routing?
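The p/2 step count above can be reproduced with a short simulation. This sketch assumes one reading of "both outgoing links in the first two steps": the source sends clockwise in step 1 and counter-clockwise in step 2, and each other node forwards on its other link in the next step. The function names are hypothetical.

```python
def ring_sf_arrival(p: int, src: int = 0) -> list:
    """Step at which each node of a p-node ring first receives the message.

    Assumes the source sends clockwise in step 1 and counter-clockwise in
    step 2, and every other node forwards on its other link one step later.
    """
    arrival = [None] * p
    arrival[src] = 0
    for k in range(1, p):
        candidates = (((src + k) % p, k),        # clockwise chain starts in step 1
                      ((src - k) % p, k + 1))    # counter-clockwise chain starts in step 2
        for node, step in candidates:
            if arrival[node] is None or step < arrival[node]:
                arrival[node] = step
    return arrival


def ring_sf_cost(p: int, ts: float, tw: float, m: int) -> float:
    """Every step moves the m-word message one hop, costing ts + m*tw."""
    return max(ring_sf_arrival(p)) * (ts + m * tw)


print(ring_sf_arrival(8))   # the farthest nodes are reached in step p/2 = 4
```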

[Figure: 8-node ring (nodes 0 to 7); each link is labeled with the step in which it carries the message]
15
Store-and-forward Routing on Hypercube

Takes log(p) steps for a p-processor hypercube
In the ith step, all processors that have the message transmit it to the neighboring processor that differs in the ith most significant bit
Cost: $(t_s + m\,t_w)\log p$
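The hypercube algorithm above can be simulated directly. In step i, every current holder sends to the neighbor across the ith most significant address bit. The function names are hypothetical, and d = log2(p) is the hypercube dimension.

```python
def hypercube_broadcast(d: int, src: int = 0) -> dict:
    """Step at which each node of a 2**d-node hypercube receives the message."""
    arrival = {src: 0}
    for i in range(1, d + 1):
        bit = 1 << (d - i)                  # i-th most significant address bit
        for node in list(arrival):          # holders at the start of step i
            partner = node ^ bit
            assert partner not in arrival   # no node receives the message twice
            arrival[partner] = i
    return arrival


def hypercube_cost(d: int, ts: float, tw: float, m: int) -> float:
    """log(p) = d steps, each moving the m-word message one hop."""
    return max(hypercube_broadcast(d).values()) * (ts + m * tw)


print(hypercube_broadcast(3))
```

For d = 3 (p = 8), node 100 receives the message in step 1, nodes 010 and 110 in step 2, and the remaining four nodes in step 3.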

[Figure: 3-dimensional hypercube (nodes 000 to 111); each link is labeled with the step (1, 2, or 3) in which it carries the message]
16
Homework

Due next lecture
Assume a mesh interconnection network with p = N x N nodes that uses store-and-forward routing.
(a) Find the source node for which a one-to-all broadcast has the highest complexity.
(b) Describe your routing algorithm (using pseudocode).
(c) What is the broadcast cost?

17
Cut-through Routing on Ring

The algorithm takes log(p) steps
In step i, the message is sent to the processor at distance $p/2^i$
All messages flow in the same direction
Cost: $\log p\,(t_s + m\,t_w) + t_p\,(p-1)$
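The recursive-doubling scheme above can be simulated to check two claims. First, messages in the same step never share a link when all of them flow clockwise. Second, the per-step distances p/2, p/4, ..., 1 sum to p-1, which produces the $t_p\,(p-1)$ term. This sketch assumes p is a power of two, and the function names are hypothetical.

```python
def ring_ct_broadcast(p: int, src: int = 0):
    """Recursive-doubling broadcast on a p-node ring (p a power of two).

    Returns (arrival step per node, number of steps, sum of per-step distances).
    Raises RuntimeError if two messages in one step need the same directed link.
    """
    arrival = {src: 0}
    dist, step, hop_sum = p // 2, 0, 0
    while dist >= 1:
        step += 1
        busy = set()                          # directed links used in this step
        for node in list(arrival):            # holders at the start of this step
            for h in range(dist):             # every message travels clockwise
                link = ((node + h) % p, (node + h + 1) % p)
                if link in busy:
                    raise RuntimeError(f"link {link} congested in step {step}")
                busy.add(link)
            arrival[(node + dist) % p] = step
        hop_sum += dist                       # this step costs ts + m*tw + dist*tp
        dist //= 2
    return arrival, step, hop_sum


def ring_ct_cost(p: int, ts: float, tp: float, tw: float, m: int) -> float:
    _, steps, hops = ring_ct_broadcast(p)
    return steps * (ts + m * tw) + tp * hops


arrival, steps, hops = ring_ct_broadcast(8)
print(steps, hops)   # log2(8) steps; distances 4 + 2 + 1 = p - 1
```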

[Figure: 8-node ring (nodes 0 to 7); each link is labeled with the step (1, 2, or 3) in which it carries the message]
18
Cut-through Routing on 2D Torus

Apply the ring algorithm to the processor row of the sender
Then apply the ring algorithm to all processor columns
$2\log\sqrt{p} = \log p$ steps
Cost: $(t_s + m\,t_w)\log p + 2\,t_p(\sqrt{p} - 1)$
This algorithm works for a 2D mesh too
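The two-phase torus algorithm above can be checked with a small simulation. It assumes p is a perfect square with $\sqrt{p}$ a power of two, the source is at (row 0, column 0), and all columns start their phase together once the row phase is finished. The function names are hypothetical. The simulation confirms $\log p$ steps in total and a hop-time term of $2(\sqrt{p} - 1)$.

```python
import math


def ring_doubling(q: int):
    """Recursive doubling from position 0 on a q-node ring (q a power of two).

    Returns (arrival step per position, sum of per-step distances).
    """
    arrival, dist, step, hops = {0: 0}, q // 2, 0, 0
    while dist >= 1:
        step += 1
        for pos in list(arrival):
            arrival[pos + dist] = step      # starting from 0, positions stay below q
        hops += dist
        dist //= 2
    return arrival, hops


def torus_broadcast(p: int):
    """Row phase, then column phase, on a sqrt(p) x sqrt(p) torus with source (0, 0)."""
    q = math.isqrt(p)
    if q * q != p or q & (q - 1):
        raise ValueError("p must be a perfect square with sqrt(p) a power of two")
    ring, ring_hops = ring_doubling(q)
    row_steps = max(ring.values())          # log2(sqrt(p)) steps along the source row
    arrival = {}
    for col, t_col in ring.items():         # phase 1: along the source's row
        for row, t_row in ring.items():     # phase 2: down every column
            arrival[(row, col)] = t_col if row == 0 else row_steps + t_row
    return arrival, 2 * row_steps, 2 * ring_hops


arrival, steps, hops = torus_broadcast(16)
print(steps, hops)   # log2(16) = 4 steps; 2 * (sqrt(16) - 1) = 6 hop units
```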

[Figure: 4 x 4 torus (nodes 0 to 15); each link is labeled with the step (1 to 4) in which it carries the message]
19
Cut-through Routing on Hypercube

Takes log(p) steps for a p-processor hypercube
In the ith step, all processors that have the message transmit it to the neighboring processor that differs in the ith most significant bit
Cost: $(t_s + m\,t_w)\log p$
Cut-through routing provides no benefit here, because each transfer uses only a single communication link