Dynamic Interconnect - PowerPoint PPT Presentation

About This Presentation
Title:

Dynamic Interconnect

Slides: 21
Provided by: zhaoyu
Learn more at: http://www.wgz.org

Transcript and Presenter's Notes


1
Dynamic Interconnect
  • Lecture 5

2
Multistage Network--Omega Network
  • Motivation: simulate a crossbar network, but with
    fewer links
  • Components:
  • N processors to connect, N log(N) links
  • log(N) stages, each stage connected to the next by a
    perfect shuffle
  • Each stage has N/2 2x2 switch boxes
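
The component counts above can be checked with a short sketch. It assumes N is a power of two and models the shuffle between stages as a one-bit left rotation of the log(N)-bit address; the function names are illustrative and not from the lecture.

```python
def perfect_shuffle(i: int, n_bits: int) -> int:
    """Rotate the n_bits-bit address i left by one position."""
    mask = (1 << n_bits) - 1
    return ((i << 1) | (i >> (n_bits - 1))) & mask


def omega_size(n_procs: int) -> dict:
    """Stage, switch, and link counts for an n_procs-input Omega network."""
    n_bits = n_procs.bit_length() - 1  # log2(N), N assumed to be a power of two
    return {
        "stages": n_bits,                    # log(N) stages
        "switches_per_stage": n_procs // 2,  # N/2 2x2 switch boxes per stage
        "links": n_procs * n_bits,           # N log(N) links
    }


shuffle_8 = [perfect_shuffle(i, 3) for i in range(8)]
size_8 = omega_size(8)
print(shuffle_8)  # where each of the 8 lines goes after one shuffle
print(size_8)
```

For N = 8 this gives 3 stages of 4 switch boxes each and 24 links.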

[Figure: 8 x 8 Omega network connecting inputs P0-P7 to outputs P0-P7]

3
Omega Network -- Routing
  • Distributed control
  • At each stage, check the corresponding bit of the
    destination address: if it is 0, connect to the
    upper output port; otherwise connect to the lower
    output port
  • Not all permutations are possible
  • What if 010 connects to 110, 110 to 100, and
    000 to 101?
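
The question above can be explored with a small simulation. This is a sketch under assumptions: each stage applies the shuffle first and then the switch, and stage k examines the k-th most significant bit of the destination (a common convention, not confirmed by the slide). The names `omega_path` and `find_conflicts` are made up for this example.

```python
from collections import defaultdict


def omega_path(src: int, dst: int, n_bits: int) -> list:
    """Output port reached after each stage when routing src -> dst."""
    mask = (1 << n_bits) - 1
    pos, path = src, []
    for stage in range(n_bits):
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask  # shuffle (rotate left)
        bit = (dst >> (n_bits - 1 - stage)) & 1            # destination bit, MSB first
        pos = (pos & ~1) | bit                             # 0 = upper port, 1 = lower port
        path.append(pos)
    return path


def find_conflicts(pairs, n_bits):
    """List (stage, port, sources) where several messages need the same output port."""
    users = defaultdict(list)
    for src, dst in pairs:
        for stage, port in enumerate(omega_path(src, dst, n_bits)):
            users[(stage, port)].append(src)
    return [(s, p, srcs) for (s, p), srcs in sorted(users.items()) if len(srcs) > 1]


pairs = [(0b010, 0b110), (0b110, 0b100), (0b000, 0b101)]
conflicts = find_conflicts(pairs, 3)
print(conflicts)
```

Under these assumptions the three requests cannot all be routed at once: two of them need the same switch output in the first stage, and two collide again in the second stage.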

[Figure: Omega network routing example, inputs P0-P7 to outputs P0-P7]

4
Comparisons between Dynamic Networks
  • Bus System
  • Assume n processors on the bus; bus width is w
    bits
  • Data transfer latency: constant
  • Bandwidth per processor: O(w/n) to O(w)
  • Wire complexity: O(w)
  • Switching complexity: O(n)
  • Routing capability: only one-to-one at a time
  • Advantage:
  • Cheap to build
  • Disadvantage:
  • Low bandwidth available to each processor
  • Prone to failure

5
Comparisons between Dynamic Networks
  • Crossbar Switch
  • Assume n x n crossbar with line width of w bits
  • Data transfer latency: constant
  • Bandwidth per processor: O(w) to O(nw)
  • Wire complexity: O(n^2 w)
  • Switching complexity: O(n^2)
  • Routing capability: all permutations, one at a
    time
  • Advantage:
  • Highest bandwidth
  • Highest routing capability
  • Disadvantage:
  • High hardware cost

6
Comparisons between Dynamic Networks
  • Multistage network
  • Assume an n x n network (n processors to connect)
    with line width of w bits, built from 2 x 2 switches
  • Data transfer latency: O(log n)
  • Bandwidth per processor: O(w) to O(nw)
  • Wire complexity: O(nw log n)
  • Switching complexity: O(n log n)
  • Routing capability: some permutations and
    broadcast
  • Advantage:
  • Scalability with modular construction
  • Medium cost
  • Disadvantage:
  • Long latency
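
To make the switching-complexity figures from the three comparison slides concrete, here is a rough sketch that plugs in a processor count. The constant factors are assumptions chosen for illustration (one tap per processor on the bus, one crosspoint per input/output pair, N/2 switch boxes per stage); the O() forms on the slides do not fix them.

```python
def switch_counts(n: int) -> dict:
    """Illustrative switching-element counts for n processors (n a power of two)."""
    stages = n.bit_length() - 1  # log2(n)
    return {
        "bus": n,                         # O(n)
        "crossbar": n * n,                # O(n^2) crosspoints
        "multistage": (n // 2) * stages,  # O(n log n) 2x2 switch boxes
    }


counts_64 = switch_counts(64)
print(counts_64)
```

For 64 processors the multistage network needs far fewer switching elements than the crossbar, which is the "medium cost" trade-off listed above.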

7
Message Transfer Mechanisms
  • A message typically consists of:
  • A header, which contains information about the
    destination
  • The data that needs to be transmitted
  • A trailer, which signals the end of the message
  • The switching strategy determines how message
    data is actually transferred across network links
    in the chosen message route
  • Three components of message transfer cost:
  • Startup time ($t_s$): cost of handling the message at
    the sending processor
  • Per-hop time ($t_p$): time taken by the
    header to traverse a link
  • Per-word transfer time ($t_w$): time taken for a
    word to traverse a link

8
Dynamic Network -- Switching Strategy
  • Circuit switching
  • A circuit path is established from the source to the
    destination
  • Like the telephone system
  • Requires setup time and uses bandwidth poorly, but has
    short latency
  • Latency for routing an m-word message over l hops:
  • $t = t_s + l\,t_p + m\,t_w \approx t_s + m\,t_w$

[Figure: circuit path through the network, inputs P0-P7 to outputs P0-P7]

9
Dynamic Network -- Switching Strategy
  • Store-and-forward (packet switching)
  • The message travels one link at a time, whenever the
    next link is free
  • The message is buffered when the link is not free
  • Like the postal system
  • No pre-setup time and better bandwidth, but
    longer latency
  • Only one link on the path can be active at a time
  • Latency over n links: $n(t_s + m\,t_w)$

[Figure: store-and-forward path, inputs P0-P7 to outputs P0-P7; the whole package is buffered at an intermediate switch]

10
Dynamic Network -- Switching Strategy
  • Cut-through
  • Similar to store-and-forward, but
  • The message is broken into parcels
  • All the links on the path can be active at once
  • Also called wormhole routing
  • Small setup time
  • Latency: $l(t_s + t_p) + m\,t_w \approx l\,t_p + m\,t_w$
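
The three switching strategies can be compared numerically with a small sketch. The formulas are the ones from the switching-strategy slides as read from this transcript (some operators are lost in it, so the exact form is an assumption); the hop count is called `l` throughout, and the parameter values in the example are invented for illustration.

```python
def circuit_switching(ts, tp, tw, m, l):
    """ts + l*tp + m*tw: one setup, then the message streams over the circuit."""
    return ts + l * tp + m * tw


def store_and_forward(ts, tw, m, l):
    """l*(ts + m*tw): the whole message is re-sent on each of the l links."""
    return l * (ts + m * tw)


def cut_through(ts, tp, tw, m, l):
    """l*(ts + tp) + m*tw: only the header pays a per-link cost."""
    return l * (ts + tp) + m * tw


# Hypothetical values: startup 10, per-hop 1, per-word 2, 100 words, 4 hops.
ts, tp, tw, m, l = 10, 1, 2, 100, 4
latencies = {
    "circuit": circuit_switching(ts, tp, tw, m, l),
    "store_and_forward": store_and_forward(ts, tw, m, l),
    "cut_through": cut_through(ts, tp, tw, m, l),
}
print(latencies)
```

With these numbers store-and-forward is several times slower than the other two, because the `m*tw` term is multiplied by the hop count.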

[Figure: cut-through path, inputs P0-P7 to outputs P0-P7; parcels are buffered at an intermediate switch]

11
Static Network Vs Dynamic Network
  • Static Network
  • There are point-to-point links between
    processors
  • Parallel system expansion is easy
  • Some processors may be closer than others
  • Generally used for message passing machine
    interconnects
  • Dynamic Network
  • Paths are established as needed between
    processors
  • System expansion is difficult
  • Processors are usually equidistant
  • Usually used for shared memory machine
    interconnects

12
One-to-all broadcast
  • Algorithms often require a processor to send
    identical data to all other processors or to a
    subset of processors. This operation is called
    one-to-all broadcast or single-node broadcast
  • At the start of a single-node broadcast, the source
    processor has m words of data that need to be
    sent. At the end there are p copies of this data,
    one on each processor
  • The dual of a broadcast operation is an all-to-one
    reduction or single-node reduction
  • All-to-one reduction
  • At the start of a single-node reduction each
    processor has m words of data; the reduction
    combines the data from all processors using an
    associative operator to produce m words at the
    receiver
  • A naive single-node broadcast or reduction takes
    p-1 steps
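
A minimal sketch of the naive versions, assuming the source (or receiver) handles one other processor per step. Processors are modeled as dictionary entries or list elements; nothing here is an actual message-passing API.

```python
import operator


def naive_broadcast(data, p, source=0):
    """The source sends its m words to each other processor, one per step."""
    memory = {source: list(data)}
    steps = 0
    for dest in range(p):
        if dest != source:
            memory[dest] = list(data)  # one point-to-point send
            steps += 1
    return memory, steps


def naive_reduce(values, op=operator.add):
    """The receiver combines one incoming m-word vector per step."""
    receiver, *others = values
    acc, steps = list(receiver), 0
    for vec in others:
        acc = [op(a, b) for a, b in zip(acc, vec)]  # element-wise associative op
        steps += 1
    return acc, steps


memory, bcast_steps = naive_broadcast([1, 2], p=4)
total, reduce_steps = naive_reduce([[1, 2], [3, 4], [5, 6], [7, 8]])
print(bcast_steps, total, reduce_steps)
```

Both operations take p-1 steps here, and the reduction still ends with m words at the receiver.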

13
One-to-all Broadcast
[Figure: one-to-all broadcast of message M to processors 0, 1, ..., p-1, with reduction (accumulation) as the reverse operation]

14
Store-and-forward Routing on Ring
  • The source sends the message on both outgoing links in
    the first two steps
  • All other processors receive on one link and
    transmit on the other link
  • It takes p/2 steps
  • Cost: $(t_s + m\,t_w)\,p/2$
  • What if we use circuit-switched routing?
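
The step count can be reproduced with a short schedule sketch. It assumes each processor sends on one link per step, so the source starts the clockwise chain in step 1 and the counter-clockwise chain in step 2; function names and the example parameter values are illustrative.

```python
def ring_broadcast_steps(p: int, source: int = 0) -> dict:
    """Map each node to the step in which it receives the message."""
    recv = {source: 0}
    clockwise = p // 2           # nodes served by the chain started in step 1
    counter = p - 1 - clockwise  # nodes served by the chain started in step 2
    for k in range(1, clockwise + 1):
        recv[(source + k) % p] = k
    for k in range(1, counter + 1):
        recv[(source - k) % p] = k + 1
    return recv


def ring_broadcast_cost(p, m, ts, tw):
    """Number of steps times the store-and-forward cost of one link transfer."""
    steps = max(ring_broadcast_steps(p).values())
    return steps * (ts + m * tw)


recv_8 = ring_broadcast_steps(8)
cost_8 = ring_broadcast_cost(8, m=100, ts=10, tw=2)
print(recv_8, cost_8)
```

For p = 8 the last nodes receive the message in step 4, matching the p/2 steps stated above.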

[Figure: 8-processor ring (nodes 0-7); each link is labeled with the step (1-4) in which it carries the message]

15
Store-and-forward Routing on Hypercube
  • Takes log(p) steps for a p-processor hypercube
  • In the ith step, all processors that have the
    message transmit it to the neighboring processor
    that differs in the ith most significant bit
  • Cost: $(t_s + m\,t_w)\log p$
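
The stepwise rule above can be sketched as a schedule generator. It assumes node labels are `dim`-bit integers and that step i flips the i-th most significant bit, as the slide describes; the function name is made up for this example.

```python
def hypercube_broadcast(dim: int, source: int = 0) -> list:
    """Per step, the (sender, receiver) pairs of a one-to-all broadcast."""
    have = [source]
    schedule = []
    for i in range(dim):
        mask = 1 << (dim - 1 - i)                        # i-th most significant bit
        sends = [(node, node ^ mask) for node in have]   # every holder sends once
        have += [dst for _, dst in sends]
        schedule.append(sends)
    return schedule


schedule = hypercube_broadcast(3)
for step, sends in enumerate(schedule, start=1):
    print(step, [(format(s, "03b"), format(d, "03b")) for s, d in sends])
```

The number of holders doubles in every step, so a p-processor hypercube is covered in log(p) steps.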

[Figure: 3-dimensional hypercube (nodes 000-111); each link is labeled with the step (1-3) in which it carries the message]

16
Homework
  • Due next lecture
  • Assume there is a mesh interconnect network with
    p = N x N nodes, using store-and-forward
    routing.
  • (a) Find the node which has the highest complexity
    for the one-to-all broadcast operation.
  • (b) Describe your routing algorithm (using
    pseudocode).
  • (c) What is the broadcast cost?

17
Cut-through Routing on Ring
  • The algorithm takes log(p) steps
  • In step i, the message is sent to the processor at
    distance p/2^i
  • All messages flow in the same direction
  • Cost: $(t_s + m\,t_w)\log p + t_p(p-1)$
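
A sketch of the schedule described above, assuming p is a power of two and all transfers go in the increasing-index direction. It also shows where the p-1 in the per-hop term comes from: the hop distances p/2, p/4, ..., 1 add up to p-1. Names and example values are illustrative.

```python
def ring_cut_through_schedule(p: int, source: int = 0) -> list:
    """Per step, the (sender, receiver, hops) triples; p is a power of two."""
    have = [source]
    schedule = []
    dist = p // 2
    while dist >= 1:
        sends = [(node, (node + dist) % p, dist) for node in have]
        have += [dst for _, dst, _ in sends]
        schedule.append(sends)
        dist //= 2  # the distance halves in every step
    return schedule


def ring_cut_through_cost(p, m, ts, tp, tw):
    """log(p) message transfers plus per-hop time over p-1 hops in total."""
    schedule = ring_cut_through_schedule(p)
    steps = len(schedule)
    hops = sum(step[0][2] for step in schedule)  # p/2 + p/4 + ... + 1
    return steps * (ts + m * tw) + tp * hops


schedule_8 = ring_cut_through_schedule(8)
cost_8 = ring_cut_through_cost(8, m=100, ts=10, tp=1, tw=2)
print(schedule_8, cost_8)
```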

[Figure: 8-processor ring (nodes 0-7); transfers are labeled with their step number (1-3)]

18
Cut-through Routing on 2D Torus
  • Apply the ring algorithm to the processor row of the
    sender
  • Then apply the ring algorithm to all processor columns
  • 2 log(√p) = log(p) steps
  • Cost:
  • $(t_s + m\,t_w)\log p + 2t_p(\sqrt{p} - 1)$
  • This algorithm works for a 2D mesh too
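
The torus cost follows from applying the ring cut-through cost twice. The short derivation below assumes a square torus with $\sqrt{p}$ processors per row and per column; $T_{\text{row}}$ and $T_{\text{col}}$ are names introduced here for the two phases.

```latex
\begin{align*}
T_{\text{row}} &= (t_s + m\,t_w)\log\sqrt{p} + t_p(\sqrt{p} - 1) \\
T_{\text{col}} &= (t_s + m\,t_w)\log\sqrt{p} + t_p(\sqrt{p} - 1) \\
T &= T_{\text{row}} + T_{\text{col}}
   = (t_s + m\,t_w)\log p + 2\,t_p(\sqrt{p} - 1)
\end{align*}
```

The last step uses $2\log\sqrt{p} = \log p$.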

[Figure: 4 x 4 torus (nodes 0-15); transfers are labeled with their step number (1-4)]

19
Cut-through Routing on Hypercube
  • Takes log(p) steps for a p-processor hypercube
  • In the ith step, all processors that have the
    message transmit it to the neighboring processor
    that differs in the ith most significant bit
  • Cost: $(t_s + m\,t_w)\log p$
  • Cut-through does not provide benefits here, because
    each transfer uses only a single link

[Figure: 3-dimensional hypercube (nodes 000-111); each link is labeled with the step (1-3) in which it carries the message]

20
Summary
  • Switching strategies
  • Circuit switching
  • Store-and-forward
  • Cut-through (wormhole)
  • One-to-all broadcasting on
  • Ring
  • Using store-and-forward
  • Using cut-through
  • Hypercube
  • Using store-and-forward
  • Using cut-through