Title: MESSAGE ROUTING SCHEMES IN A HYPERCUBE MACHINE
- S. Raghupathy,
- M. R. Leuze, and
- S. R. Schach
- Presented by Syed Md. Shakir
What are Interconnection Networks and why do we need them?
- One way for processors to communicate data is to use a shared memory and shared variables. However, this is unrealistic for large numbers of processors. A more realistic assumption is that each processor has its own private memory and that data communication takes place by message passing via an interconnection network.
- The interconnection network plays a central role in determining the overall performance of a multicomputer system. If the network cannot provide adequate performance for a particular application, nodes will frequently be forced to wait for data to arrive.
Parallel Computers
- Large-scale parallel computers are potential candidates for providing very high computational power.
- These systems are usually organized as an ensemble of nodes, each with its own processor, local memory, and other supporting devices.
- The nodes are interconnected using a variety of topologies that can be classified into two broad categories:
- Direct
- Indirect
Direct Networks
- In direct networks, each node has a point-to-point, or direct, connection to some of the other nodes, called neighboring nodes.
- Examples of direct network topologies include the hypercube, mesh, and tree.
Indirect Networks
- In indirect networks, the nodes are connected to other nodes or to a shared memory through one or more switching elements.
- Examples of indirect networks include crossbar, bus, and multistage interconnection networks.
Figure: Multistage interconnection network
Indirect Network
Figure: Crossbar
Communication Latency
- The communication latency of direct networks depends on several factors, including switching, routing, flow control, and topology. Several switching techniques have been proposed for direct networks.
- Wormhole switching has emerged as a popular technique and has been used in both commercial and experimental systems.
- Wormhole switching can be employed in both direct and indirect networks. It is widely used in contemporary multicomputers because of its low latency and its small buffer requirements at the nodes.
Communication Latency cont...
- The mesh is an asymmetrical topology in which the degree of a node depends on its location.
- In a mesh, interprocessor communication performance therefore depends on the locations of the source and destination.
- The torus and hypercube are symmetrical topologies in which the degree of a node is the same irrespective of its location in the network. Thus, unlike the mesh, all the nodes in tori and hypercubes are identical in connectivity.
Routing in Parallel Computers
- Parallel computers are modeled by directed graphs.
- All communication between processors (nodes) occurs in synchronous steps.
- Each link can carry at most one unit message (packet) in one step.
- During a step, a node can send at most one packet to each of its neighbors.
- Each node is uniquely identified by a number between 1 and N.
Switching Techniques
- In most multicomputer systems, a message enters the network from a source node and is switched or routed towards its destination through a series of intermediate nodes.
- Four types of switching techniques are usually used for this purpose:
- circuit switching
- packet switching
- virtual cut-through switching
- wormhole switching.
Circuit Switching
- In circuit switching, a dedicated path is established between the source and the destination before data transfer begins.
- Once the data transfer has begun, the message is never blocked.
- As the channels forming the path are reserved exclusively, no buffering of data is required.
- On the other hand, establishing the path incurs significant overhead, and during the data-transmission phase all channels are reserved for the entire duration of the message transfer.
- Circuit switching thus degrades performance and is no longer used in commercial multicomputer systems.
Packet Switching
- In packet switching, a message is divided into packets that are independently routed towards their destination.
- The destination address is encoded in the header of each packet. The entire packet is stored at every intermediate node and then forwarded to the next node in its path.
- The main advantage of packet switching is that a channel is occupied only while a packet is actually being transferred across it.
Packet Switching cont...
- Each packet contains the routing information, so alternative paths can be selected upon encountering network congestion or faulty nodes.
- The major drawbacks of packet switching:
- Since the packet is stored entirely at each intermediate node, the time to transmit a packet from source to destination is directly proportional to the number of hops in the path.
- At each intermediate node, buffer space is needed to hold at least one packet.
Virtual Cut-Through
- To reduce the time spent storing packets at each node, Kermani and Kleinrock introduced a technique called virtual cut-through.
- In virtual cut-through, a message being routed toward its destination is stored at an intermediate node only if the next channel it requires is occupied by another packet.
- As a result, when no blocking occurs, the distance between the source and destination has little effect on communication latency.
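To make the latency contrast between store-and-forward (packet) switching and cut-through concrete, here is a minimal contention-free latency model in Python. It is a simplification assumed for illustration, not taken from the paper: it ignores routing-decision and propagation delays, and the function and parameter names are hypothetical.

```python
def store_and_forward_latency(hops: int, packet_bits: int, bandwidth: float) -> float:
    # The whole packet is received at every node before being forwarded,
    # so its full transmission time is paid once per hop.
    return hops * packet_bits / bandwidth


def cut_through_latency(hops: int, packet_bits: int, header_bits: int,
                        bandwidth: float) -> float:
    # Assumes no blocking: only the header is delayed at each hop, and the rest
    # of the packet streams directly behind it, so the body is paid for once.
    return hops * header_bits / bandwidth + (packet_bits - header_bits) / bandwidth


for hops in (1, 4, 8):
    print(hops,
          store_and_forward_latency(hops, 1024, 1.0),
          cut_through_latency(hops, 1024, 32, 1.0))
# 1 1024.0 1024.0
# 4 4096.0 1120.0
# 8 8192.0 1248.0
```

With a 1024-bit packet, a 32-bit header and unit bandwidth, the store-and-forward latency grows linearly with the hop count while the cut-through latency barely moves. Without blocking, the same pipelined model also describes wormhole switching.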
Virtual Cut-Through cont...
- In the extreme case, when a message is blocked at every intermediate node, the virtual cut-through technique reduces to packet switching.
- The disadvantage of the virtual cut-through technique is its implementation cost: each node must provide sufficient buffer space for all the messages passing through it, and because multiple messages may be blocked at any node, a very large buffer space is required at each node.
- This implementation constraint limits the use of the virtual cut-through technique.
Wormhole Switching
- Wormhole switching is a variant of the virtual cut-through technique that avoids the need for large buffer spaces.
- In wormhole switching, a packet is transmitted between nodes in units of flits, the smallest units of a message on which flow control can be performed.
- The header flit(s) of a message contain all the necessary routing information, and all the other flits contain the data elements.
- The flits of the message are transmitted through the network in a pipelined fashion.
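A minimal sketch of the message format described above: a message is split into packets, and each packet into one header flit followed by data flits. The `Flit` class, the function names and the fixed packet and flit sizes are hypothetical choices for illustration; real routers fix flit widths in hardware.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flit:
    kind: str             # "H" = header flit, "D" = data flit
    dest: Optional[int]   # routing information is carried only by the header flit
    payload: bytes


def to_packets(message: bytes, packet_size: int) -> List[bytes]:
    # Split a message into fixed-size packets; the last one may be shorter.
    return [message[i:i + packet_size] for i in range(0, len(message), packet_size)]


def to_flits(packet: bytes, dest: int, flit_size: int) -> List[Flit]:
    # One header flit with the destination, then data flits with no routing
    # information, which must follow the header contiguously through the network.
    flits = [Flit("H", dest, b"")]
    for i in range(0, len(packet), flit_size):
        flits.append(Flit("D", None, packet[i:i + flit_size]))
    return flits


packets = to_packets(bytes(range(10)), packet_size=4)
print([len(p) for p in packets])                            # [4, 4, 2]
print("".join(f.kind for f in to_flits(packets[0], 5, 2)))  # HDD
```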
Wormhole Switching cont...
- Since only the header flit(s) carry the routing information, all the trailing flits follow the header flit(s) contiguously.
- Flits of two different messages cannot be interleaved at any intermediate node.
- Successive flits in a packet are pipelined asynchronously in hardware using a handshaking protocol.
- When the header flit is blocked, all the trailing flits stay where they are, occupying buffers at the intermediate nodes along the path.
Wormhole Switching
Figure: Message format and routing in wormhole switching. Messages are divided into packets, and packets into flits (H = header flit, D = data flit).
Advantages of Wormhole Switching
- The main advantage of wormhole switching derives from the pipelined message flow: in the absence of blocking, transmission latency is largely insensitive to the distance between the source and destination.
- Moreover, since the message moves flit by flit across the network, each node needs to store only one flit.
- Some implementations, however, require storage of multiple flits at each node to improve routing performance.
- The reduction of buffer requirements at each node has a major effect on the cost and size of multicomputer systems.
Disadvantages of Wormhole Switching
- The main disadvantage of wormhole switching comes from the fact that only the header flit has the routing information.
- If the header flit cannot advance in the network due to resource contention, all the trailing flits are also blocked along the path, and these blocked messages can in turn block other messages.
- This chained blocking can also lead to deadlock, where messages wait for each other in a cycle and hence no message can advance any further.
Disadvantages of Wormhole Switching cont...
- Prevention of deadlock is one of the main issues in wormhole switching. It is usually accomplished by a suitable choice of routing function that prohibits messages from taking some of the available paths, thus preventing cycles in the network.
- Selection of a routing algorithm is thus a major issue in wormhole-switched networks.
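One way to see why restricting the routing function prevents deadlock is to build the channel dependency graph: an edge from channel c1 to channel c2 means a message holding c1 may next request c2. A widely used sufficient condition for deadlock freedom under deterministic wormhole routing is that this graph be acyclic. The Python sketch below (hypothetical names, small hypercubes only) compares left-to-right dimension-order routing with unrestricted minimal routing, which may correct the differing bits in any order.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple

Channel = Tuple[int, int]  # directed link (from_node, to_node)


def path_channels(src: int, order: Sequence[int]) -> List[Channel]:
    # Flip the given dimensions in order, recording each link the message holds.
    channels, node = [], src
    for dim in order:
        nxt = node ^ (1 << dim)
        channels.append((node, nxt))
        node = nxt
    return channels


def dependency_graph(n: int, adaptive: bool) -> Dict[Channel, Set[Channel]]:
    # Edge c1 -> c2: a message holding channel c1 may next request channel c2.
    graph: Dict[Channel, Set[Channel]] = {}
    for src in range(2 ** n):
        for dst in range(2 ** n):
            # Differing dimensions, leftmost (highest) bit first.
            dims = [i for i in reversed(range(n)) if ((src ^ dst) >> i) & 1]
            # Dimension-order routing allows one order; adaptive allows any order.
            orders = permutations(dims) if adaptive else [dims]
            for order in orders:
                chans = path_channels(src, order)
                for a, b in zip(chans, chans[1:]):
                    graph.setdefault(a, set()).add(b)
    return graph


def has_cycle(graph: Dict[Channel, Set[Channel]]) -> bool:
    # Depth-first search with white/grey/black colouring; reaching a grey node is a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[Channel, int] = {}

    def visit(u: Channel) -> bool:
        colour[u] = GREY
        for v in graph.get(u, ()):
            c = colour.get(v, WHITE)
            if c == GREY or (c == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


print(has_cycle(dependency_graph(3, adaptive=False)))  # False
print(has_cycle(dependency_graph(3, adaptive=True)))   # True
```

Dimension-order routing only ever waits from a higher dimension to a lower one, so no cycle can form. Letting messages take every minimal path creates cycles, so this simple acyclicity guarantee no longer holds; adaptive schemes typically rely on additional restrictions or virtual channels to remain deadlock-free.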
Hypercube Network
- An n-dimensional hypercube network:
- Number of nodes: $N = 2^n$
- Degree: $n$
- The node $i$ with address $(i_1, i_2, \ldots, i_n) \in \{0, 1\}^n$ and the node $j$ with address $(j_1, j_2, \ldots, j_n) \in \{0, 1\}^n$ are connected if the Hamming distance between $(i_1, i_2, \ldots, i_n)$ and $(j_1, j_2, \ldots, j_n)$ is 1.
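With node addresses stored as n-bit integers (an assumption of this sketch), the Hamming-distance-1 condition reduces to a single XOR: two nodes are adjacent exactly when `i ^ j` has one bit set. The function names are illustrative.

```python
from typing import List


def is_adjacent(i: int, j: int) -> bool:
    # Linked iff the addresses differ in exactly one bit (Hamming distance 1),
    # i.e. i XOR j is a nonzero power of two.
    x = i ^ j
    return x != 0 and (x & (x - 1)) == 0


def neighbors(i: int, n: int) -> List[int]:
    # Flipping each of the n address bits gives the n neighbours (degree n).
    return [i ^ (1 << k) for k in range(n)]


print(neighbors(0b000, 3))                                   # [1, 2, 4]
print(is_adjacent(0b101, 0b100), is_adjacent(0b101, 0b110))  # True False
```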
Hypercube Topology
4-D Hypercube
- A k-dimensional hypercube is formed by combining two (k-1)-dimensional hypercubes and connecting corresponding nodes; that is, hypercubes are recursive. Each node is connected to k other nodes, i.e., each node has degree k.
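The recursive construction can be written directly: build the (k-1)-cube, make a second copy whose addresses have an extra leading 1 bit, and link corresponding nodes. This Python sketch uses a hypothetical function name and stores undirected edges as `(smaller, larger)` pairs; it reproduces the node count $2^k$ and degree k.

```python
from typing import Set, Tuple

Edge = Tuple[int, int]  # undirected link stored as (smaller, larger)


def build_hypercube(k: int) -> Set[Edge]:
    # A 0-dimensional hypercube is a single node with no links.
    if k == 0:
        return set()
    smaller = build_hypercube(k - 1)
    offset = 1 << (k - 1)  # second copy: same addresses with bit k-1 set
    edges = set(smaller)
    edges |= {(u + offset, v + offset) for u, v in smaller}
    # Connect each node of the first copy to its counterpart in the second copy.
    edges |= {(u, u + offset) for u in range(offset)}
    return edges


print(len(build_hypercube(4)))  # 32
```

The 4-D hypercube therefore has $k \cdot 2^{k-1} = 32$ links, and every link joins two addresses that differ in a single bit.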
Static Routing in Hypercube
- Given a source node $N_s$ and a destination node $N_d$.
- The addresses of the $2^n$ processors can be represented using $n$ bits.
- Let the current node be $(c_{n-1}, \ldots, c_1, c_0)$, and let $i$ be the first (leftmost) bit position in which its address differs from that of $N_d$. Then the next node on the route from $N_s$ to $N_d$ is the node represented by the bit pattern $(c_{n-1}, \ldots, c_1, c_0)$ with bit $i$ flipped; that is to say, the message is routed in dimension $i$.
- The algorithm continues in this way until the message arrives at node $N_d$.
Static Routing
- Algorithm:
- Given a destination address d(i) and the address c(i) of the current (intermediate) node:
- Compare the bits of d(i) and c(i) from left to right.
- Identify the first bit position at which the two addresses differ.
- Route the packet to the neighbor n(i) such that c(i) and n(i) differ only in this bit position.
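A minimal Python sketch of the left-to-right rule, assuming addresses are n-bit integers whose most significant bit corresponds to the leftmost tuple entry. The function names are hypothetical; running it on the six-dimensional example from the next slide reproduces the same route.

```python
from typing import List


def next_hop_static(current: int, dest: int, n: int) -> int:
    # Scan bits from left (bit n-1) to right (bit 0) and flip the first bit in
    # which the current node and the destination differ.
    diff = current ^ dest
    for i in reversed(range(n)):
        if (diff >> i) & 1:
            return current ^ (1 << i)
    return current  # already at the destination


def static_route(src: int, dest: int, n: int) -> List[int]:
    # Apply the single-hop rule repeatedly until the destination is reached.
    if not (0 <= src < 2 ** n and 0 <= dest < 2 ** n):
        raise ValueError("addresses must fit in n bits")
    path = [src]
    while path[-1] != dest:
        path.append(next_hop_static(path[-1], dest, n))
    return path


route = static_route(0b000000, 0b101011, 6)
print([format(v, "06b") for v in route])
# ['000000', '100000', '101000', '101010', '101011']
```

Because the hop is fully determined by the two addresses, no per-node route computation or link testing is needed, which is the advantage listed for static routing; the price is that a busy link cannot be avoided.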
Static Routing Algorithm
- Example:
- Source: (0, 0, 0, 0, 0, 0)
- Destination: (1, 0, 1, 0, 1, 1)
- Route: (0, 0, 0, 0, 0, 0) → (1, 0, 0, 0, 0, 0) → (1, 0, 1, 0, 0, 0) → (1, 0, 1, 0, 1, 0) → (1, 0, 1, 0, 1, 1)
Advantages and Disadvantages
- Advantages:
- No overhead for calculating new routes.
- The CPU cycles saved can be used for other computation.
- Disadvantage:
- Blocking is common: each message has only one predetermined next link, so it must wait whenever that link is busy.
Dynamic Routing
- Dynamic routing allows every message to select the (locally) optimal route under the current circumstances.
- In dynamic routing, if a link is blocked, an attempt is made to pass the message through another link.
- This gives better utilization of the network.
- It uses local knowledge.
Dynamic Routing
- Allows the message to be routed from $N_s$ to $N_d$ depending on circumstances.
- Allows the optimal route to be chosen under the current circumstances.
- Overhead of implementing dynamic routing:
- At each node, calculations have to be performed to determine the next node to which the message should be routed,
- and links have to be tested to see which ones are free.
Advantages and Disadvantages
- Advantages:
- Blocking is not a major problem.
- Disadvantages:
- Overhead of implementing dynamic routing: at each node, calculations have to be performed to determine the next node to which the message should be routed, and links have to be tested to see which ones are free.
- The size of the overhead will vary from hypercube to hypercube. In some machines, the additional work can be done in hardware in parallel with other operations; in other machines, it must be done in software, using machine cycles that could otherwise be used for productive computing.
PRIORITIZATION
- If a number of messages are waiting to use a link, one method of choosing which message to transmit is first-in, first-out (FIFO), the method used in commercial hypercubes.
- The paper also considers alternative prioritization schemes, such as last-in, first-out (LIFO) and giving priority to the message with the maximum number of remaining hops.
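These prioritization schemes can be expressed as sort keys over the fields a waiting message carries. This sketch assumes a hypothetical `Message` record holding the arrival time at the current node and the number of hops still to be traversed (fields the simulator's message header is described as containing). The smallest key wins the link; ties are not handled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    msg_id: int
    arrival_time: int    # step at which it reached the current node
    hops_remaining: int  # hops still to be traversed


# Each scheme maps a message to a sort key; the smallest key is sent first.
SCHEMES: Dict[str, Callable[[Message], int]] = {
    "FIFO": lambda m: m.arrival_time,         # earliest arrival first
    "LIFO": lambda m: -m.arrival_time,        # latest arrival first
    "MAX_HOPS": lambda m: -m.hops_remaining,  # most remaining hops first
}


def choose(waiting: List[Message], scheme: str) -> Message:
    # The message that wins the contested link under the given scheme.
    return min(waiting, key=SCHEMES[scheme])


waiting = [Message(1, 3, 6), Message(2, 5, 1), Message(3, 1, 4)]
print([choose(waiting, s).msg_id for s in ("FIFO", "LIFO", "MAX_HOPS")])  # [3, 2, 1]
```

Adding a new scheme is a matter of adding one key function, which mirrors how the simulator can evaluate different schemes with the same routing machinery.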
Other Prioritization Schemes
- The processes form a DAG, so each process can be assigned a sequence number such that every message is sent to a process with a higher sequence number than that of the process that generated the message.
- The sequence number of the generating process can then be used to prioritize messages.
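Assigning such sequence numbers amounts to a topological ordering of the process DAG. The sketch below uses Kahn's algorithm (a standard technique, not necessarily the one the authors used); `succ` maps each process to the processes it sends messages to, and all names are hypothetical. A message's priority would then be the sequence number of its sender, lower first.

```python
from collections import deque
from typing import Dict, List


def sequence_numbers(succ: Dict[str, List[str]]) -> Dict[str, int]:
    # Kahn's algorithm: number processes in a topological order of the DAG, so
    # every message (sender -> receiver) goes to a higher sequence number.
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    indeg = {v: 0 for v in nodes}
    for vs in succ.values():
        for v in vs:
            indeg[v] += 1
    ready = deque(sorted(v for v in nodes if indeg[v] == 0))
    seq: Dict[str, int] = {}
    while ready:
        u = ready.popleft()
        seq[u] = len(seq)
        for v in succ.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(seq) != len(nodes):
        raise ValueError("process graph contains a cycle, so it is not a DAG")
    return seq


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(sequence_numbers(dag))  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```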
Message Format
The Prioritization Schemes
The Simulator
- The simulator was constructed to investigate routing strategies.
- Each message header contains information such as the source and destination nodes, as well as the information needed when messages are transmitted in order of priority, such as sequence number, time generated, arrival time at the current node, and number of hops still to be traversed.
Execution Cycle of the Simulator
- The simulator has three phases:
- Message generation
- Message ordering
- Message routing
Message Generation Phase
- In this phase, each active process is checked to see whether it has received all the messages it requires.
- If so, the messages it is to transmit are generated and placed in the message buffer.
- The process then terminates.
- After all possible messages have been generated, the simulator enters the message ordering phase.
Message Ordering Phase
- In the message ordering phase, the messages in each buffer are ordered according to the prioritization scheme currently being evaluated.
- Ties between messages of equal priority are broken randomly.
- Finally, the message routing phase begins.
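The ordering step, including random tie-breaking, can be sketched by shuffling a buffer and then applying a stable sort on the priority key. Python's `list.sort` is stable, so messages with equal keys keep the random order produced by the shuffle. The function name and the `(id, key)` message representation are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Msg = Tuple[str, int]  # hypothetical (message id, priority key) pair


def order_buffer(buffer: Sequence[Msg], key: Callable[[Msg], int],
                 rng: random.Random) -> List[Msg]:
    # Shuffle first, then stable-sort by priority: different keys end up in
    # priority order, while equal keys keep the random order from the shuffle.
    items = list(buffer)
    rng.shuffle(items)
    items.sort(key=key)
    return items


buf = [("m1", 2), ("m2", 0), ("m3", 2), ("m4", 1)]
print(order_buffer(buf, key=lambda m: m[1], rng=random.Random()))
```

Messages m2 and m4 always come first and second; m1 and m3 share a key, so their relative order varies from run to run.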
Message Routing Phase
- Each message is fetched from the message buffer, and an attempt is made to transmit it to a neighboring node.
- If static routing is being used and the predetermined link is in use, that message is blocked.
- When dynamic routing is used, an attempt is made to transmit the message over the first unused link that will move it closer to its destination.
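A single routing step can be sketched as follows, assuming the link model from the routing slides (each directed link carries at most one packet per step) and representing each waiting message only by its destination address. Static routing may use only the leftmost differing dimension; dynamic routing takes the first unused link, scanning left to right, among those that reduce the Hamming distance to the destination. Names and data structures are hypothetical, not the authors' simulator.

```python
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[int, int]  # (node, dimension): one directed outgoing link


def pick_link(node: int, dest: int, n: int, used: Set[Link],
              dynamic: bool) -> Optional[int]:
    # Profitable dimensions: bits where node and dest differ, scanned left to
    # right. Each one moves the message one hop closer to its destination.
    profitable = [i for i in reversed(range(n)) if ((node ^ dest) >> i) & 1]
    if not dynamic:
        profitable = profitable[:1]  # static: only the predetermined link
    for dim in profitable:
        if (node, dim) not in used:
            return dim
    return None  # every permitted link is already in use: the message is blocked


def routing_phase(buffers: Dict[int, List[int]], n: int,
                  dynamic: bool) -> Tuple[Dict[int, List[int]], int]:
    # buffers maps node -> destinations of waiting messages, in priority order.
    # Assumes delivered messages have already been removed from the buffers.
    used: Set[Link] = set()  # each directed link carries one packet per step
    nxt: Dict[int, List[int]] = {v: [] for v in range(2 ** n)}
    blocked = 0
    for node in range(2 ** n):
        for dest in buffers.get(node, []):
            dim = pick_link(node, dest, n, used, dynamic)
            if dim is None:
                nxt[node].append(dest)  # stays here until the next step
                blocked += 1
                continue
            used.add((node, dim))
            neighbour = node ^ (1 << dim)
            if neighbour != dest:  # a message reaching its destination is delivered
                nxt[neighbour].append(dest)
    # The ordering phase would re-sort every buffer before the next step.
    return nxt, blocked


print(routing_phase({0: [3, 3]}, 2, dynamic=False)[1])  # 1
print(routing_phase({0: [3, 3]}, 2, dynamic=True)[1])   # 0
```

With two messages at node 0 both bound for node 3 in a 2-cube, static routing blocks the second message because both want the same predetermined link, while dynamic routing sends it over the other profitable link.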
Results
- Dynamic routing performs better than static routing, but the improvement factor varies depending on the prioritization scheme. At best, the improvement is a factor of two.
- The best results occur when priority is given to messages with the lowest sequence number.
- Almost equally good results are obtained when priority is given to messages with fewer hops, counted either over the original route or as the hops remaining to be traversed.
Results Continued...
- Messages with the lowest sequence numbers are essentially those transmitted earliest in the computation sequence. Giving priority to such messages speeds up the rate at which processes can begin transmitting, and hence speeds up the computation as a whole.
- Giving priority to messages with the fewest hops decreases traffic congestion in the hypercube and therefore allows messages with longer routes to proceed with less blocking than would otherwise be the case.
- In a hypercube, the number of remaining hops equals the number of links that move a message closer to its destination. By giving priority to messages with fewer such choices, the overall amount of blocking is decreased.
One Bidirectional Link Between Nodes
Two Unidirectional Links Between Nodes
Percentage Improvement When Two Unidirectional Links Are Used
Observations from the Graphs Above
- Having two unidirectional links improves throughput over one bidirectional link.
- Note: the improvement depends on the prioritization scheme.
- The percentage improvement is rarely more than fifteen percent, and is usually much smaller.
- This effect may be caused by the fact that the problem graph is a DAG, which imposes a directionality on the flow of messages.
Conclusions
- Throughput of a certain class of problems on a hypercube can be increased by up to a factor of two through the use of dynamic rather than static routing algorithms, and also by prioritizing the messages.
- It is likely that different prioritization schemes would yield improved throughput for other classes of problems.
Questions?