MESSAGE ROUTING SCHEMES IN A HYPERCUBE MACHINE

Slides: 49

Provided by: Syed82

1
MESSAGE ROUTING SCHEMES IN A HYPERCUBE MACHINE

S. Raghupathy,
M. R. Leuze, and
S. R. Schach
Presented by Syed Md. Shakir

2
What are Interconnection Networks and why do we
need them?

One way for processors to communicate data is to
use a shared memory and shared variables. However,
this is unrealistic for large numbers of
processors. A more realistic assumption is that
each processor has its own private memory and
data communication takes place using message
passing via an Interconnection Network.
The interconnection network plays a central role
in determining the overall performance of a
multicomputer system. If the network cannot
provide adequate performance, for a particular
application, nodes will frequently be forced to
wait for data to arrive.

3
Parallel Computers

Large-scale parallel computers are potential
candidates for providing very high computational
power.
These systems are usually organized as an
ensemble of nodes, each with its own processor,
local memory, and other supporting devices.
The nodes are interconnected using a variety of
topologies that can be classified into two broad
categories:
Direct
Indirect

4
Direct Networks

In direct networks, each node has a
point-to-point or direct connection to some of
the other nodes, called neighboring nodes.
Examples of direct network topologies include
hypercube, mesh, and tree.

5
Indirect Networks

In indirect networks, the nodes are connected to
other nodes or a shared memory through one or
more switching elements.
Examples of indirect networks include crossbar,
bus, and multistage interconnection networks.

[Figure: multistage interconnection network]

6
Indirect Network

[Figure: crossbar]

7
Communication Latency

The communication latency of direct networks
depends on several factors including switching,
routing, flow control, and topology. Several
switching techniques have been proposed for
direct networks.
Wormhole switching has emerged as a popular
technique and has been used in both commercial
and experimental systems.
Wormhole switching can be employed in both direct
and indirect networks. It is widely used in
contemporary multicomputers because of its low
latency and its need for only small buffers at
the nodes.

8
cont...

The mesh is an asymmetrical topology in which the
node degree depends on its location.
Interprocessor communication performance depends
on the location of source and destination.
The torus and hypercube are symmetrical
topologies in which the degree of a node is the
same irrespective of its location in the network.
Thus, unlike the mesh, all the nodes in tori and
hypercubes are identical in connectivity.

9
Routing in Parallel Computers

Parallel computers are modeled by directed graphs
All interconnections between processors (nodes)
occur in synchronous steps
Each link can carry at most one unit message
(packet) in one step
During a step, a node can send at most one packet
to each of its neighbors
Each node is uniquely identified by a number
between 1 and N

10
Switching Techniques

In most multicomputer systems, a message enters
the network from a source node and is switched or
routed towards its destination through a series
of intermediate nodes.
Four types of switching techniques are usually
used for this purpose:
circuit switching
packet switching
virtual cut-through switching
wormhole switching

11
Circuit Switching

In circuit switching, a dedicated path is
established between the source and the
destination before data transfer initiates.
Once the data transfer is initiated the message
is never blocked.
As the channels creating the path are reserved
exclusively, buffering of data is not required.
On the other hand, establishing the path requires
significant overhead, and during the
data-transmission phase all channels are reserved
for the entire duration of message transfer.
Circuit switching thus degrades performance and
is no longer used in commercial multicomputer
systems.

12
Packet Switching

In packet switching, a message is divided into
packets that are independently routed towards
their destination.
The destination address is encoded in the header
of each packet. The entire packet is stored at
every intermediate node and then forwarded to the
next node in its path.
The main advantage of packet switching is that
the channel resource is occupied only when a
packet is actually transferred.

13
Packet Switching cont...

Each packet contains the routing information and
alternative paths can be selected upon
encountering network congestion or faulty nodes.
The major drawbacks of packet switching:
Since the packet is stored entirely at each
intermediate node, the time to transmit a packet
from source to destination is directly
proportional to the number of hops in the path.
At each intermediate node, buffer space is needed
to hold at least one packet.

14
Virtual Cut Through

In order to reduce the time to store the packets
at each node, Kermani and Kleinrock introduced a
technique called virtual cut-through.
In this technique, a message being routed toward
its destination is stored at an intermediate node
only if the next channel required is occupied by
another packet.
Now, the distance between the source and
destination has little effect on communication
latency.

15
cont...

In an extreme case, when a message encounters
blocking at all the intermediate nodes, the
virtual cut-through technique reduces to packet
switching.
The disadvantage of the virtual cut-through
technique is its implementation cost: each node
must provide sufficient buffer space for all the
messages passing through it, and because multiple
messages may be blocked at any node, a very large
buffer space is required at each node.
This implementation constraint limits the use of
the virtual cut-through technique.

16
Wormhole Switching

Wormhole switching is a variant of the virtual
cut-through technique that avoids the need for
large buffer spaces.
In wormhole switching, a packet is transmitted
between the nodes in units of flits, the smallest
units of a message on which flow control can be
performed.
The header flit(s) of a message contains all the
necessary routing information and all the other
flits contain the data elements.
The flits of the message are transmitted through
the network in a pipelined fashion.

17
cont...

Since only the header flit(s) has the routing
information, all the trailing flits follow the
header flit(s) contiguously.
Flits of two different messages cannot be
interleaved at any intermediate node.
Successive flits in a packet are pipelined
asynchronously in hardware using a handshaking
protocol.
When the header flit is blocked, then all the
trailing flits occupy the buffers at the
intermediate nodes.

18

Wormhole Switching
[Figure: Message format and routing in wormhole
switching. Panel (a): messages are divided into
packets, and packets into flits. Panel (b):
routing. D = data flit, H = header flit.]

19
Advantages of Wormhole Switching

The main advantage of wormhole switching derives
from the pipelined message flow since
transmission latency is insensitive to the
distance between the source and destination.
Moreover, since the message moves flit by flit
across the network, each node needs to store only
one flit.
Some implementations, however, require storage of
multiple flits at each node to improve routing
performance. The reduction of buffer
requirements at each node has a major effect on
the cost and size of multicomputer systems.
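
The contrast between packet switching (latency proportional to hop count) and wormhole switching (latency nearly independent of distance) can be illustrated with a rough, contention-free latency model. This is only a sketch under simplifying assumptions that are not in the slides: no blocking, no per-node routing delay, and every link has the same bandwidth. The function and parameter names are hypothetical.

```python
def store_and_forward_latency(message_bits, hops, bandwidth):
    """Packet switching: the whole packet crosses one link before the
    next link is used, so the transfer time is paid once per hop."""
    return hops * (message_bits / bandwidth)


def wormhole_latency(message_bits, flit_bits, hops, bandwidth):
    """Wormhole switching: only the header flit pays the per-hop cost;
    the remaining flits follow behind it in a pipeline."""
    header_time = hops * (flit_bits / bandwidth)
    pipeline_time = message_bits / bandwidth
    return header_time + pipeline_time


if __name__ == "__main__":
    # Assumed example values: 1024-bit message, 8-bit flits, 1024 bits/s links.
    for hops in (1, 2, 4, 8):
        saf = store_and_forward_latency(1024, hops, 1024)
        worm = wormhole_latency(1024, 8, hops, 1024)
        print(hops, saf, worm)
```

In this model the store-and-forward time grows linearly with the number of hops, while the wormhole time grows only by one flit time per hop, which is small when flits are much shorter than the message.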

20
Disadvantages of Wormhole Switching

The main disadvantage of wormhole switching comes
from the fact that only the header flit has the
routing information.
If the header flit cannot advance in the network
due to resource contention, all the trailing
flits are also blocked along the path and these
blocked messages can block other messages.
This chained blocking can also lead to deadlock
where messages wait for each other in a cycle and
hence no message can advance any further.
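
A deadlock of this kind can be pictured as a cycle in a "waits for" relation between blocked messages. The sketch below is an illustration only, not something taken from the presentation: it assumes each blocked message waits for exactly one other message (the one holding the channel its header needs), and the name `has_deadlock` is hypothetical.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for relation contains a cycle.

    waits_for maps each blocked message to the message that currently
    holds the channel it needs. Messages that are not blocked do not
    appear as keys.
    """
    for start in waits_for:
        seen = set()
        current = start
        # Follow the chain of blocked messages until it ends or repeats.
        while current in waits_for:
            if current in seen:
                return True
            seen.add(current)
            current = waits_for[current]
    return False


if __name__ == "__main__":
    print(has_deadlock({"A": "B", "B": "C", "C": "A"}))  # cyclic wait
    print(has_deadlock({"A": "B", "B": "C"}))            # chain that ends at C
```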

21
cont...

Prevention of deadlock is one of the main issues
in wormhole switching, and is usually
accomplished by a suitable choice of routing
function that selectively prohibits messages from
taking all the available paths, thus preventing
cycles in the network.
Selection of a routing algorithm is thus a major
issue in wormhole-switched networks.

22
Hypercube Network

An n-dimensional hypercube network
Number of nodes: $N = 2^n$
Degree: $n$
The node i with address $(i_1, i_2, \ldots, i_n) \in \{0, 1\}^n$
and the node j with address $(j_1, j_2, \ldots, j_n) \in \{0, 1\}^n$
are connected if the Hamming distance between
$(i_1, i_2, \ldots, i_n)$ and $(j_1, j_2, \ldots, j_n)$ is 1.
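
The connection rule above can be sketched in a few lines. The example assumes node addresses are stored as Python integers whose binary digits are the address bits; the function names are illustrative and not part of the presentation.

```python
def hamming_distance(a, b):
    """Number of bit positions in which addresses a and b differ."""
    return bin(a ^ b).count("1")


def neighbors(node, n):
    """All nodes at Hamming distance 1 from `node` in an n-dimensional
    hypercube, obtained by flipping each of the n address bits in turn."""
    return [node ^ (1 << k) for k in range(n)]


if __name__ == "__main__":
    n = 3
    for node in range(2 ** n):
        print(format(node, "03b"), [format(m, "03b") for m in neighbors(node, n)])
```

Every node gets exactly n neighbors, which matches the degree stated above.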

23
Hypercube Topology
24
4d Hypercube

A k-dimensional hypercube is formed by combining
two (k-1)-dimensional hypercubes and connecting
corresponding nodes, i.e. hypercubes are
recursive. Each node is connected to k other
nodes, i.e. each node has degree k.
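
The recursive construction can be sketched as follows. This is a minimal illustration, assuming integer node addresses and an edge set of `(smaller, larger)` pairs; `hypercube_edges` is a hypothetical helper name.

```python
def hypercube_edges(k):
    """Edge set of a k-dimensional hypercube, built recursively from
    two copies of the (k-1)-dimensional hypercube."""
    if k == 0:
        return set()  # a single node, no links
    lower = hypercube_edges(k - 1)
    high_bit = 1 << (k - 1)
    # Second copy: same edges, with the new top address bit set.
    upper = {(u | high_bit, v | high_bit) for (u, v) in lower}
    # Links between corresponding nodes of the two copies.
    bridges = {(u, u | high_bit) for u in range(high_bit)}
    return lower | upper | bridges


if __name__ == "__main__":
    edges = hypercube_edges(4)
    print(len(edges))  # a 4-d hypercube has 4 * 2**3 = 32 links
```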

25
Static routing in Hypercube

Given a source node Ns
and a destination node Nd,
the addresses of the $2^n$ processors can be
represented using n bits.
If the current node has bit pattern
$(c_{n-1}, \ldots, c_1, c_0)$ and bit i is a position
in which it differs from Nd, then the next node on
the route from Ns to Nd is the node represented by
that bit pattern with bit i flipped; that is to say,
the message is routed in dimension i.
The algorithm continues in this way until
the message arrives at node Nd.

26
Static routing

Algorithm
Given a destination address d(i) and an
intermediate node v(i):
Compare the bits of d(i) with v(i) from left to
right.
Identify the first bit position at which these
two addresses differ.
Route this packet to its neighbor n(i) such that
v(i) and n(i) differ only in this bit position.

27
Static Routing Algorithm

Example
Source (0, 0, 0, 0, 0, 0)
Destination (1, 0, 1, 0, 1, 1)
(0, 0, 0, 0, 0, 0) → (1, 0, 0, 0, 0, 0) →
(1, 0, 1, 0, 0, 0) → (1, 0, 1, 0, 1, 0) →
(1, 0, 1, 0, 1, 1)
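
The left-to-right rule and the example above can be reproduced with a short sketch. It assumes addresses are given as tuples of bits with the leftmost bit first; `static_route` is a hypothetical name, and this is an illustration of the rule on the slide rather than the authors' implementation.

```python
def static_route(source, destination):
    """Return the list of node addresses visited when, at every node,
    the first (leftmost) differing bit is corrected."""
    if len(source) != len(destination):
        raise ValueError("addresses must have the same number of bits")
    current = list(source)
    path = [tuple(current)]
    # Bits to the left of position i already match, so scanning once from
    # left to right always corrects the first differing bit next.
    for i in range(len(current)):
        if current[i] != destination[i]:
            current[i] = destination[i]
            path.append(tuple(current))
    return path


if __name__ == "__main__":
    for node in static_route((0, 0, 0, 0, 0, 0), (1, 0, 1, 0, 1, 1)):
        print(node)
```

The number of hops equals the Hamming distance between the two addresses, four in this example.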

28
Advantages and Disadvantages

Advantage
No overhead for calculating new routes.
The CPU cycles saved can be used for other
computational purposes.
Disadvantage
Blocking is a common consequence.

29

Dynamic routing

It allows every message to select the (locally)
optimal route under the current circumstances.
In dynamic routing, if a link is blocked, an
attempt is made to pass the message through
another link.
This gives better utilization of the network.
It uses local knowledge.

30
Dynamic routing

Allows the message to be routed from Ns
to Nd depending on circumstances.
Allows the optimal route under the current
circumstances.
Overhead of implementing dynamic routing:
at each node, calculations have to be performed to
determine the next node to which the message
should be routed,
and links have to be tested to see which ones
are free.

31
Advantages And Disadvantages

Advantages
Blocking is not a major problem.
Disadvantages
Overhead of implementing dynamic routing:
at each node, calculations have to be performed to
determine the next node to which the message
should be routed, and
links have to be tested to see which ones
are free.
The size of the overhead will vary
from hypercube to hypercube. In some
machines, the additional work can be done in
hardware in parallel with other operations; in
other machines, it must be done in software,
using machine cycles that could otherwise be
used for productive computing.

32
PRIORITIZATION

If a number of messages are waiting to use a
link, one method of choosing which message
to transmit is first in, first out
(FIFO), the method used in commercial
hypercubes.
In the paper, alternative prioritization schemes
are also considered, such as LIFO and giving
priority to the message with the maximum number
of remaining hops.

33
Other Prioritization Schema

Because the processes form a DAG, each process can
be assigned a sequence number such that every
message is sent to a process with a higher
sequence number than the sequence number of the
process that generated the message.
The sequence number of the generating process can
then be used to prioritize messages.
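
One way to obtain such sequence numbers is a topological ordering of the process graph. The sketch below uses Kahn's algorithm; this is a standard technique chosen here for illustration, since the slides do not say how the numbers were assigned in the paper. The names `assign_sequence_numbers` and `sends_to` are hypothetical.

```python
from collections import deque


def assign_sequence_numbers(processes, sends_to):
    """Number the processes so that every message goes from a lower
    sequence number to a higher one.

    sends_to maps a process to the list of processes it sends to.
    """
    indegree = {p: 0 for p in processes}
    for sender in processes:
        for receiver in sends_to.get(sender, []):
            indegree[receiver] += 1
    # Processes that wait for no message can be numbered first.
    ready = deque(p for p in processes if indegree[p] == 0)
    sequence = {}
    while ready:
        p = ready.popleft()
        sequence[p] = len(sequence)
        for receiver in sends_to.get(p, []):
            indegree[receiver] -= 1
            if indegree[receiver] == 0:
                ready.append(receiver)
    if len(sequence) != len(processes):
        raise ValueError("process graph contains a cycle, so it is not a DAG")
    return sequence


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
    print(assign_sequence_numbers(["a", "b", "c", "d"], graph))
```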

34
Message Format
35
The Prioritization Schema
36
The Simulator

The simulator was constructed to investigate
routing strategies.
The message header contains information such as
the source and destination nodes, as well as
information needed when messages are ordered for
transmission by priority, such as sequence number,
time generated, arrival time at the current node,
and number of hops that still have to be traversed.

37
Execution Cycle Of The Simulator

The simulator has three phases
Message generation
Message ordering
Message routing.

38
Message Generation Phase

In this phase each active process is checked to
see if it has received all the messages it
requires.
If so, the messages it is to transmit are
generated, and placed in the message buffer.
The process then terminates.
After all possible messages have been
generated, the simulator enters the message
ordering phase.

39
Message Ordering Phase

After entering the message ordering phase the
messages in each buffer are ordered according to
the prioritization scheme currently being
evaluated.
In the case of equal priorities, ties are broken
randomly.
Finally, the message routing cycle commences.
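
The ordering phase can be sketched as a sort with a random tie-breaker. The header fields below follow the list given on the simulator slide, but the class name, field names, and scheme names are assumptions made for illustration, as is the choice to base FIFO and LIFO on arrival time at the current node.

```python
import random
from dataclasses import dataclass


@dataclass
class Message:
    source: int
    destination: int
    sequence_number: int
    time_generated: int
    arrival_time: int      # arrival time at the current node
    hops_remaining: int


# Smaller key value means higher priority.
PRIORITY_KEYS = {
    "fifo": lambda m: m.arrival_time,
    "lifo": lambda m: -m.arrival_time,
    "most_hops_remaining": lambda m: -m.hops_remaining,
    "fewest_hops_remaining": lambda m: m.hops_remaining,
    "lowest_sequence_number": lambda m: m.sequence_number,
}


def order_buffer(buffer, scheme, rng=random):
    """Return the buffer sorted by the chosen scheme; messages with
    equal priority are ordered randomly."""
    key = PRIORITY_KEYS[scheme]
    return sorted(buffer, key=lambda m: (key(m), rng.random()))


if __name__ == "__main__":
    buffer = [
        Message(0, 5, 3, 0, 10, 2),
        Message(1, 6, 1, 0, 12, 3),
        Message(2, 7, 2, 0, 11, 1),
    ]
    for scheme in PRIORITY_KEYS:
        print(scheme, [m.sequence_number for m in order_buffer(buffer, scheme)])
```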

40
Message Routing Phase

Each message is fetched from the message
buffer and an attempt is made to transmit it to a
neighboring node.
If static routing is being used, and the
predetermined link is in use, then that
particular message is blocked.
When dynamic routing is used, an attempt is made
to transmit the message over the first unused
link that will move it closer to its destination.
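
The difference between the two policies in this phase can be sketched as a single next-hop decision. The example assumes integer node addresses, a set `busy_links` of directed `(from, to)` pairs already claimed in the current step, and that dimensions are tried from the leftmost bit down; all names are hypothetical and the simulator in the paper may differ in detail.

```python
def candidate_dimensions(current, destination, n):
    """Dimensions in which the two addresses differ, leftmost bit first.
    Flipping any of them moves the message one hop closer."""
    diff = current ^ destination
    return [k for k in reversed(range(n)) if diff & (1 << k)]


def next_hop(current, destination, n, busy_links, dynamic):
    """Return the neighbor the message moves to, or None if it is blocked
    (or has already arrived). A chosen link is marked as busy."""
    dims = candidate_dimensions(current, destination, n)
    if not dynamic:
        dims = dims[:1]  # static routing: only the predetermined link
    for k in dims:
        neighbor = current ^ (1 << k)
        if (current, neighbor) not in busy_links:
            busy_links.add((current, neighbor))
            return neighbor
    return None


if __name__ == "__main__":
    # Link 000 -> 100 is already in use in this step.
    print(next_hop(0b000, 0b101, 3, {(0b000, 0b100)}, dynamic=False))
    print(next_hop(0b000, 0b101, 3, {(0b000, 0b100)}, dynamic=True))
```

With the predetermined link in use, the static call reports a blocked message, while the dynamic call falls back to the other dimension that still shortens the path.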

41
Results

Dynamic routing performs better than static
routing, but the improvement factor varies
depending on the prioritization scheme. At best,
the improvement is by a
factor of two.
Best results occur when priority is given to
messages with the lowest sequence number.
Results almost as good are obtained when priority
is given to messages with the fewest
hops, either in the original message or remaining
to be traversed.

42
Results Continued...

Messages of lowest sequence number are
essentially those transmitted earliest in the
computation sequence. Giving priority to such
messages essentially speeds up the rate at which
processes can begin transmitting, and hence
speeds up the computation as a whole.
Traffic congestion in the hypercube is
decreased by giving priority to messages with the
fewest hops, thereby allowing the
longer messages to proceed with less blocking
than would otherwise be the case.
By giving priority to messages with fewer
choices, the
overall amount of blocking is decreased.

43
One Bidirectional Link Between Nodes
44
Two Unidirectional Links Between Nodes
45
Percentage Improvement When Two Unidirectional
Lines Are Used
46
Observations From The Graphs Above

Having two unidirectional links improves
throughput over one bidirectional link.
Note: the improvement depends on the
prioritization scheme.
The percentage improvement is rarely more than
fifteen per cent, and is usually much smaller.
This effect may be caused by the fact that the
problem graph is a DAG, thereby imposing a
directionality on the flow of messages.

47
Conclusions

Throughput of a certain class of problems on a
hypercube can be increased by up to a factor of two
through use of dynamic rather than static routing
algorithms, and also by prioritizing the
messages.
It is likely that different prioritization
schemes would yield
improved throughput for other classes of
problems.

48
Questions ?
