Title: Network-on-Chip
1- Network-on-Chip
- (2/2)
- Ben Abdallah, Abderazek
- The University of Aizu
- E-mail: benab@u-aizu.ac.jp
KUST University, March 2011
2Part 3
- Routing
- Routing Algorithms
- Deterministic Routing
- Oblivious Routing
- Adaptive Routing
3Routing Basics
- Once the topology is fixed, the routing algorithm determines the path(s) from source to destination
- The routing algorithm must prevent deadlock, livelock, and starvation
4Routing Deadlock
- Without routing restrictions, a resource cycle
can occur - Leads to deadlock
5Deadlock Definition
- Deadlock: A packet does not reach its destination because it is blocked at some intermediate resource
- Livelock: A packet does not reach its destination because it enters a cyclic path
- Starvation: A packet does not reach its destination because some resource does not grant access (while it grants access to other packets)
6Routing Algorithm Attributes
- Number of destinations
- Unicast, Multicast, Broadcast?
- Adaptivity
- Deterministic, oblivious, or adaptive?
- Implementation (Mechanisms)
- Source or node routing?
- Table or circuit?
7 8Deterministic Routing
- Always chooses the same path between two nodes
- Easy to implement and to make deadlock-free
- Does not exploit path diversity and is therefore poor at load balancing
- Packets arrive in order
9Deterministic Routing - Example Destination-Tag
Routing in Butterfly Networks
- Depends on the destination address only (not on
source)
(Figure: three-stage butterfly network with the route to destination 5 = 101 highlighted)
The destination address, interpreted as radix-k digits, selects the output port at each stage (e.g. for radix 4, 1011(2) = 23(4) selects the route).
Here the destination address is 5, i.e. 101 in binary, so the digits select down, up, down along the route (see the sketch below).
Note: Starting from any source and using the same digit pattern always routes to the destination.
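The following minimal sketch (not from the original slides; the function name and parameters are illustrative) shows how destination-tag routing reads the route directly off the radix-k digits of the destination address in a k-ary, n-stage butterfly.

# Destination-tag routing: the destination address, written in radix-k digits,
# directly gives the output port to take at each stage (most significant digit first).
def destination_tag_route(dest, k, n_stages):
    digits = []
    for _ in range(n_stages):
        digits.append(dest % k)      # extract the next radix-k digit
        dest //= k
    return list(reversed(digits))

# Example: destination 5 (binary 101) in a radix-2, 3-stage butterfly
print(destination_tag_route(5, 2, 3))   # [1, 0, 1] -> down, up, down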
10Deterministic Routing - Dimension-Order Routing
- For n-dimensional hypercubes and meshes, dimension-order routing produces deadlock-free routing algorithms
- It is called XY routing in a 2-D mesh and e-cube routing in hypercubes
11Dimension-Order Routing - XY Routing Algorithm
(Figure: XY routes in a 2-D mesh toward destination D)
12Dimension-Order Routing - XY Routing Algorithm
XY routing algorithm for a 2-D mesh (a sketch follows below)
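A minimal sketch of the XY routing decision, assuming a coordinate convention in which E/W change the x coordinate and N/S change the y coordinate; the port names and function signature are illustrative, not the slide's original pseudocode.

# XY dimension-order routing for a 2-D mesh: route fully in X first, then in Y.
def xy_route(cur, dest):
    cx, cy = cur
    dx, dy = dest
    if dx > cx:
        return 'E'
    if dx < cx:
        return 'W'
    if dy > cy:
        return 'N'
    if dy < cy:
        return 'S'
    return 'LOCAL'   # arrived at the destination

# Example: from (0, 0) to (2, 1) the successive hops are E, E, N.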
13Deterministic Routing - E-cube Routing Algorithm
Dimension-order routing algorithm for hypercubes (a sketch follows below)
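A minimal sketch of e-cube routing, under the usual assumption that address bits are resolved from the lowest dimension upward; the function name is illustrative.

# E-cube (dimension-order) routing in an n-dimensional hypercube:
# the lowest-order bit in which the current and destination addresses
# differ gives the dimension to cross next.
def ecube_next_dimension(cur, dest):
    diff = cur ^ dest                          # bits that still differ
    if diff == 0:
        return None                            # already at the destination
    return (diff & -diff).bit_length() - 1     # index of the lowest set bit

# Example: from node 000 to node 101 the packet crosses dimension 0, then dimension 2.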
14 15Oblivious (unconscious) Routing
- Always chooses a route without considering the state of the network
- Random algorithms that do not consider the network state are oblivious algorithms
- Oblivious routing includes deterministic routing algorithms as a subset
16Minimal Oblivious Routing
- Minimal oblivious routing attempts to achieve the load balance of randomized routing without giving up locality
- This is done by restricting routes to minimal paths
- Routing is again done in two steps
- Route to a random intermediate node
- Route from there to the destination
17Minimal Oblivious Routing - (Torus)
- Idea: For each packet, randomly determine a node x inside the minimal quadrant, such that the packet is routed from source node s to x and then to destination node d (a sketch follows below)
- Assumption: At each node, routing in the x or y direction is allowed
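A minimal sketch of the two-step idea, assuming a simple coordinate view of the minimal quadrant (the wrap-around link choice of a real torus is omitted); the names are illustrative.

import random

# Minimal oblivious routing: pick a random intermediate node x inside the
# minimal quadrant between s and d, then route s -> x -> d, e.g. with
# dimension-order routing on each leg. Every such path is minimal.
def pick_intermediate(s, d):
    sx, sy = s
    dx, dy = d
    x = random.choice(range(min(sx, dx), max(sx, dx) + 1))
    y = random.choice(range(min(sy, dy), max(sy, dy) + 1))
    return (x, y)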
18Minimal Oblivious Routing - (Torus)
- For each node x in the minimal quadrant (00, 10, 20, 01, 11, 21)
- Determine a minimal route via x
- Start with x = 00
- Three possible routes
- (00, 01, 11, 21) (p = 0.33)
- (00, 10, 20, 21) (p = 0.33)
- (00, 10, 11, 21) (p = 0.33)
19Minimal Oblivious Routing - (Torus)
- x = 01
- One possible route
- (00, 01, 11, 21) (p = 1)
20Minimal Oblivious Routing - (Torus)
- x = 10
- Two possible routes
- (00, 10, 20, 21) (p = 0.5)
- (00, 10, 11, 21) (p = 0.5)
21Minimal Oblivious Routing - (Torus)
- x = 11
- Two possible routes
- (00, 10, 11, 21) (p = 0.5)
- (00, 01, 11, 21) (p = 0.5)
22Minimal Oblivious Routing - (Torus)
- x = 20
- One possible route
- (00, 10, 20, 21) (p = 1)
23Minimal Oblivious Routing - (Torus)
- x = 21
- Three possible routes
- (00, 01, 11, 21) (p = 0.33)
- (00, 10, 20, 21) (p = 0.33)
- (00, 10, 11, 21) (p = 0.33)
24Minimal Oblivious Routing - (Torus)
- Adding the probabilities on each channel (each x is chosen with probability 1/6)
- Example: link (00, 01)
- P = 1/3 for x = 00
- P = 1 for x = 01
- P = 0 for x = 10
- P = 1/2 for x = 11
- P = 0 for x = 20
- P = 1/3 for x = 21
- P(00,01) = (1/3 + 1 + 0 + 1/2 + 0 + 1/3) / 6 = 2.17 / 6 ≈ 0.36
25Minimal Oblivious Routing - (Torus)
- Results
- The load is not very balanced
- The path between nodes 10 and 11 is very seldom used
- Good locality performance is achieved at the expense of worst-case performance
26- Adaptive Routing
- (route influenced by traffic along the way)
27Adaptive Routing
- Uses network state to make routing decisions
- Buffer occupancies are often used
- Coupled with the flow control mechanism
- Local information is readily available
- Global information is more costly to obtain
- Network state can change rapidly
- Use of local information can lead to non-optimal choices
- Can be minimal or non-minimal
28Adaptive Routing - Local Information is not enough
- In each cycle
- Node 5 sends packet to node 6
- Node 3 sends packet to node 7
29Adaptive Routing - Local Information is not enough
- Node 3 does not know about the traffic between 5
and 6 before the input buffers between node 3 and
5 are completely filled with packets!
30Adaptive Routing - Local Information is not enough
- Adaptive routing works better with smaller buffers, since small buffers fill faster and congestion is therefore propagated earlier to the sensing node (stiff backpressure)
31Adaptive Routing
- How does the adaptive routing algorithm sense the
state of the network? - It can only sense current local information
- Global information is based on historic local
information - Changes in the traffic flow in the network are
observed much later
32Minimal Adaptive Routing
- Minimal adaptive routing chooses among the
minimal routes from source s to destination d
33Minimal Adaptive Routing
- At each hop a routing function generates a
productive output vector that identifies which
output channels of the current node will move the
packet closer to its destination - Network state is then used to select one of these
channels for the next hop
34Minimal Adaptive Routing
- Good at locally balancing load
- Poor at globally balancing load
- Minimal adaptive routing algorithms are unable to
avoid congestion of source-destination pairs with
no minimal path diversity.
35Fully Adaptive Routing
- Fully-Adaptive Routing does not restrict packets
to take the shortest path - Misrouting is allowed
- This can help to avoid congested areas and
improves load balance
36Fully Adaptive Routing - Livelock
- Fully-Adaptive Routing may result in live-lock!
- Mechanisms must be added to prevent livelock
- Misrouting may only be allowed a fixed number of
times
37Summary of Routing Algorithms
- Deterministic routing is a simple and inexpensive routing algorithm, but it does not utilize path diversity and is thus weak on load balancing
- Oblivious algorithms often give good results, since they allow load balancing and their effects are easy to analyse
- Adaptive algorithms, though in theory superior, suffer from the fact that global information is not available at a local node
38Summary of Routing Algorithms
- Latency is a paramount concern
- Minimal routing is most common for NoC
- Non-minimal routing can avoid congestion and still deliver low latency
- To date, NoC research favors DOR for its simplicity and deadlock freedom
- Only unicast routing was covered here
- Recent work extends on-chip routing to support multicast
39- Part 4
- NoC Routing Mechanisms
40Routing
The term routing mechanics refers to the
mechanism that is used to implement any routing
algorithm
- Two approaches
- Fixed routing tables at the source or at each hop
- Algorithmic routing uses specialized hardware to
compute the route or next hop at run-time
41Table-based Routing
- Two approaches
- Source-table routing implements all-at-once
routing by looking up the entire route at the
source - Node-table routing performs incremental routing
by looking up the hop-by-hop routing relation at
each node along the route - Major advantage
- A routing table can support any routing relation
on any topology
42Table-based Routing
Example routing mechanism for deterministic
source routing NoCs. The NI uses a LUT to store
the route map.
43Source Routing
- All routing decisions are made at the source
terminal
- To route a packet
- the table is indexed using the packet destination
- a route or a set of routes is returned
- one route is selected
- the route is prepended to the packet
- Because of its speed, simplicity and scalability, source routing is very often used for deterministic and oblivious routing
44Source Routing - Example
- The example shows a routing table for a 4x2 torus
network - In this example there are two alternative routes
for each destination - Each node has its own routing table
4x2 torus network
(Note: in this example the order of XY should be the opposite, i.e. 21 → 12.)
Source routing table for node 00 of 4x2 torus
network
Destination Route 0 Route 1
00 X X
10 EX WWWX
20 EEX WWX
30 WX EEEX
01 NX SX
11 NEX ENX
21 NEEX WWNX
31 NWX WNX
Example: Routing from 00 to 21 (see the sketch below)
- The table is indexed with destination 21
- Two routes are returned: NEEX and WWNX
- The source arbitrarily selects NEEX
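A minimal sketch of source-table routing for node 00, using the table contents from this slide; the function and structure names are illustrative.

import random

ROUTE_TABLE_00 = {            # destination -> candidate routes (from the slide)
    '10': ['EX', 'WWWX'],
    '20': ['EEX', 'WWX'],
    '30': ['WX', 'EEEX'],
    '01': ['NX', 'SX'],
    '11': ['NEX', 'ENX'],
    '21': ['NEEX', 'WWNX'],
}

def make_packet(dest, payload):
    route = random.choice(ROUTE_TABLE_00[dest])   # index the table, select one route
    return (route, payload)                       # the route is prepended to the packet

# Example: make_packet('21', b'...') returns ('NEEX', b'...') or ('WWNX', b'...').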
45Arbitrary Length Encoding of Source Routes
- Advantage
- It can be used for arbitrary-sized networks
- The complexity of routing is moved from the
network nodes to the terminal nodes
- But routers must be able to handle arbitrary-length routes
46Arbitrary Length-Encoding
- Router has
- 16-bit phits
- 32-bit flits
- Route has 13 hops: NENNWNNENNWNN
- Extra symbols
- P: phit continuation selector
- F: flit continuation phit
- The table entries in the terminals must be of arbitrary length
47Node-Table Routing
- Table-based routing can also be performed by
placing the routing table in the routing nodes
rather than in the terminals - Node-table routing is appropriate for adaptive
routing algorithms, since it can use state
information at each node
48Node-Table Routing
- A table lookup is required when a packet arrives at a router, which takes additional time compared to source routing
- Scalability is sacrificed, since different nodes need tables of varying size
- It is difficult to give two packets arriving from different nodes different paths through the network without expanding the tables
49Example
- The table shows a set of routing tables
- There are two choices from a source to a destination
Routing table for node 00
Note: ports shown in bold font are misroutes
50Example
Livelock can occur
A packet passing through node 00 is destined for node 11. If the table entry for (00 → 11) is N, the packet goes to node 10; if the entry for (10 → 11) is S, it goes back to node 00, so the packet oscillates 00 ↔ 10 (livelock).
51Algorithmic Routing
- Instead of using a table, an algorithm can be used to compute the next route
- In order to be fast, such algorithms are usually not very complicated and are implemented in hardware
52Algorithmic Routing - Example
- Dimension-Order Routing
- sx and sy indicate the preferred directions
- sx = 0: +x, sx = 1: -x
- sy = 0: +y, sy = 1: -y
- Δx and Δy represent the number of hops remaining in the x and y directions
- The productive direction vector (PDV) is used as an input for the selection of a route
(Figure: the routing logic determines the type of routing; the PDV indicates which channels advance the packet)
53Algorithmic Routing - Example
- Minimal oblivious router: implemented by randomly selecting one of the active bits of the PDV as the output direction
- Minimal adaptive router: achieved by making the selection based on the lengths of the respective output queues
- Fully adaptive router: implemented by also allowing an unproductive direction if the output queue lengths exceed a threshold
(A sketch of these selection policies follows below.)
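A minimal sketch of these three selection policies, assuming a 2-D mesh, a per-direction output queue length, and a hypothetical congestion threshold; none of the names come from the slides.

import random

DIRS = ['E', 'W', 'N', 'S']

def pdv(cur, dest):
    """Productive direction vector: directions that move the packet closer."""
    cx, cy = cur
    dx, dy = dest
    return [d for d, ok in zip(DIRS, [dx > cx, dx < cx, dy > cy, dy < cy]) if ok]

def select_output(cur, dest, queue_len, mode='oblivious', threshold=4):
    productive = pdv(cur, dest)
    if not productive:
        return 'LOCAL'
    if mode == 'oblivious':                      # minimal oblivious: random productive bit
        return random.choice(productive)
    if mode == 'adaptive':                       # minimal adaptive: shortest output queue
        return min(productive, key=lambda d: queue_len[d])
    # fully adaptive: allow a misroute if all productive queues exceed the threshold
    if all(queue_len[d] > threshold for d in productive):
        return min(DIRS, key=lambda d: queue_len[d])
    return min(productive, key=lambda d: queue_len[d])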
54Summary
- Routing Mechanics
- Table based routing
- Source routing
- Node-table routing
- Algorithmic routing
55Exercise
- Compression of source routes. In the source routes, each port selector symbol (N, S, W, E, and X) was encoded with three bits. Suggest an alternative encoding to reduce the average length (in bits) required to represent a source route. Justify your encoding in terms of typical routes that might occur on a torus. Also compare the original three bits per symbol with your encoding on the following routes
- NNNNNEEX
- WNEENWWWWWNX
56- Part 5
- NoC Flow Control
- Resources in a Network Node
- Bufferless Flow Control
- Buffered Flow control
57Flow Control (FC)
FC determines how the resources of a network, such as channel bandwidth and buffer capacity, are allocated to packets traversing the network.
- The goal is to use resources as efficiently as possible to allow high throughput
- Efficient FC is a prerequisite for good network performance
58Flow Control
- FC can be viewed as a problem of
- Resource allocation
- Contention resolution
- Resources in the form of channels, buffers and state must be allocated to each packet
- If two packets compete for the same channel, flow control can assign the channel to only one packet, but must also deal with the other packet
59Flow Control
- Flow Control can be divided into
- Bufferless flow control
- Packets are either dropped or misrouted
- Buffered flow control
- Packets that cannot be routed via the desired
channel are stored in buffers
60Resources in a Network Node
- Control State
- Tracks the resources allocated to the packet in the node and the state of the packet
- Buffer
- A packet is stored in a buffer before it is sent to the next node
- Bandwidth
- To travel to the next node, bandwidth has to be allocated for the packet
61Units of Resource Allocation -Packet or Flits?
- Contradictory requirements on packets
- Packets should be very large in order to reduce
overhead of routing and sequencing - Packets should be very small to allow efficient
and fine-grained resource allocation and minimize
blocking latency - Flits try to eliminate this conflict
- Packets can be large (low overhead)
- Flits can be small (efficient resource allocation)
62Units of Resource Allocation - Size Phit, Flit,
Packet
- There are no fixed rules for the size of phits,
flits and packets - Typical values
- Phits 1 bit to 64 bits
- Flits 16 bits to 512 bits
- Packets 128 bits to 1024 bits
63Bufferless Flow Control
- No buffers → lower implementation cost
- If more than one packet is to be routed to the same output, one of them has to be
- Misrouted, or
- Dropped
- Example: two packets A and B (each consisting of several flits) arrive at a network node
64Bufferless Flow Control
- Packet B is dropped and must be resent
- There must be a protocol that informs the sending node that the packet has been dropped
- Example: resend if no acknowledgement has been received within a given time
65Bufferless Flow Control
- Packet B is misrouted
- No further action is required here, but at the receiving node packets have to be sorted back into their original order
66Circuit Switching
- Circuit switching is a bufferless flow control in which several channels are reserved to form a circuit
- A request (R) propagates from source to destination and is answered by an acknowledgement (A)
- Then the data is sent (here two five-flit packets (D)) and a tail flit (T) is sent to deallocate the channels
67Circuit Switching
- Circuit switching does not suffer from dropping or misrouting packets
- However, there are two weaknesses
- High latency: T = 3 H tr + L/b
- Low throughput, since the channel is used a large fraction of the time for signaling rather than payload delivery
68Circuit Switching Latency
T = 3 H tr + L/b
where 3 H tr is the time required to set up the circuit and deliver the head flit (H hops, router delay tr per hop) and L/b is the serialization latency (packet length L over channel bandwidth b); time of flight and contention time are neglected here.
Note: the factor of 3 on the header latency arises because the path from source to destination must be traversed three times to deliver the packet: once in each direction to set up the circuit, and then again to deliver the first flit.
69Buffered Flow Control
- More efficient flow control can be achieved by adding buffers
- With sufficient buffers, packets do not need to be misrouted or dropped, since packets can wait for the outgoing channel to become free
70Buffered Flow Control
- Two main approaches
- Packet-Buffer Flow Control
- Store-And-Forward
- Cut-Through
- Flit-Buffer Flow Control
- Wormhole Flow Control
- Virtual Channel Flow Control
71Store-and-Forward Flow Control
- Each node along the route waits until a packet has been completely received (stored) and then forwards the packet to the next node
- Two resources are needed
- A packet-sized buffer in the switch
- Exclusive use of the outgoing channel
72Store-and-Forward Flow Control
- Advantage: While waiting to acquire resources, no channels are held idle and only a single packet buffer on the current node is occupied
- Disadvantage: Very high latency
- T = H (tr + L/b)
73Cut-Through Flow Control
- Advantages
- Cut-through reduces the latency
- T = H tr + L/b
- Disadvantages
- Poor utilization of buffers, since they are allocated in units of packets
- Contention latency is increased, since a packet must wait until the whole blocking packet has left the occupied channel
(A sketch comparing the three latency formulas follows below.)
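A minimal sketch comparing the zero-load latency formulas quoted on these slides for circuit switching, store-and-forward, and cut-through; the example numbers are assumptions for illustration only.

# H = hop count, tr = per-hop router delay, L = packet length (bits),
# b = channel bandwidth (bits per cycle).
def t_circuit(H, tr, L, b):
    return 3 * H * tr + L / b          # path traversed three times, then payload

def t_store_and_forward(H, tr, L, b):
    return H * (tr + L / b)            # the full packet is serialized at every hop

def t_cut_through(H, tr, L, b):
    return H * tr + L / b              # the packet is serialized only once

# Example (assumed): H=5, tr=1, L=512, b=32 gives 31, 85 and 21 cycles respectively.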
74Wormhole Flow Control
- Wormhole FC operates like cut-through, but with channels and buffers allocated to flits rather than packets
- When the head flit arrives at a node, it must acquire resources (virtual channel, buffers) before it can be forwarded to the next node
- Tail flits behave like body flits, but also release the channel
75Wormhole (WH) Flow Control
- Virtual channels hold the state needed to coordinate the handling of the flits of a packet over a channel
- Comparison to cut-through
- Wormhole flow control makes far more efficient use of buffer space
- Throughput may be less, since wormhole flow control may block a channel mid-packet
76Example for WH Flow Control
- Input virtual channel is in idle state (I)
- Upper output channel is occupied, allocated to
lower channel (L)
77Example for WH Flow Control
- Input channel enters waiting state (W)
- Head flit is buffered
78Example for WH Flow Control
- Body flit is also buffered
- No more flits can be buffered, thus congestion
arises if more flits want to enter the switch
79Example for WH Flow Control
- Virtual channel enters active state (A)
- Head flit is output on upper channel
- Second body flit is accepted
80Example for WH Flow Control
- First body flit is output
- Tail flit is accepted
81Example for WH Flow Control
- Second body flit is output
82Example for WH Flow Control
- Tail flit is output
- The virtual channel is deallocated and returns to the idle state
83Wormhole Flow Control
- The main advantage of wormhole over cut-through is that buffers in the routers do not need to hold full packets, but only a number of flits
- This allows smaller and faster routers
84- Part 6
- NoC Flow Control (continued)
- Blocking
- Virtual Channel-Flow Control
- Virtual Channel Router
- Credit-Based Flow Control
- On/Off Flow Control
- Flow Control Summary
85Blocking - Cut-Through and Wormhole
Cut-Through (Buffer-Size 1 Packet)
Blocked
Wormhole (Buffer-Size 2 Flits)
Blocked
- If a packet is blocked, the flits of the wormhole
packet are stored in different routers
86Wormhole Flow Control
- There is only one virtual channel for each physical channel
- Packet A is blocked and cannot acquire channel p
- Though channels p and q are idle, packet A cannot use them, since B owns channel p
87Virtual Channel-Flow Control
- In virtual-channel flow control, several virtual channels are associated with a single physical channel
- This allows use of bandwidth that would otherwise be left idle when a packet blocks the channel
- Unlike in wormhole flow control, subsequent flits are not guaranteed bandwidth, since they have to compete for bandwidth with the flits of other packets
88Virtual Channel Flow Control
- There are several virtual channels for each
physical channel - Packet A can use a second virtual channel and
thus proceed over channel p and q
89Virtual Channel Allocation
- Flits must be delivered in order: H, B, B, T
- Only the head flit carries routing information
- VCs are allocated at the packet level, i.e., packet by packet
- The head flit is responsible for allocating VCs along the route
- Body and tail flits must follow the VC path, and the tail flit releases the VCs
- The flits of a packet cannot interleave with those of any other packet
90Virtual Channel Flow Control -Fair Bandwidth
Arbitration
- When VCs interleave their flits, the result is a high average latency
91Virtual Channel Flow Control -Winner-Take-All
Arbitration
- Winner-take-all arbitration reduces the average latency with no throughput penalty
92Virtual Channel Flow Control -Buffer Storage
- Buffer storage is organized in two dimensions
- Number of virtual channels
- Number of flits that can be buffered per channel
93Virtual Channel Flow Control - Buffer Storage
- A virtual channel buffer should be at least as deep as needed to cover the round-trip credit latency
- In general, it is better to add more virtual channels than to increase the buffer depth
94Virtual Channel
A = active, W = waiting, I = idle
95Virtual Channel Router
96Buffer Organization
Single buffer per input
Multiple fixed-length queues per physical channel
97Buffer Management
- In buffered FC there is a need for communication between nodes in order to inform about the availability of buffers
- Backpressure informs upstream nodes that they must stop sending to a downstream node when the buffers of that downstream node are full
(Figure: traffic flows from the upstream node to the downstream node)
98Credit-Based Flow Control
- The upstream router keeps a count of the number of free flit buffers in each virtual channel downstream
- Each time the upstream router forwards a flit, it decrements the corresponding counter
- If a counter reaches zero, the downstream buffer is full and the upstream node cannot send another flit
- When the downstream node forwards a flit, it frees the associated buffer and sends a credit to the upstream router, which increments its counter (see the sketch below)
100Credit-Based Flow Control
- The minimum time between a credit being sent at time t1 and a credit being sent for the same buffer at time t5 is the credit round-trip delay tcrt
All buffers on the downstream node are full
101Credit-Based Flow Control
- If there is only a single flit buffer, each flit must wait for a new credit, and the maximum throughput is limited to one flit per tcrt
- The bit rate would then be Lf / tcrt, where Lf is the length of a flit in bits
102Credit-Based Flow Control
- If there are F flit buffers on the virtual channel, F flits can be sent before waiting for a credit, which gives a throughput of F flits per tcrt and a bit rate of F Lf / tcrt
103Credit-Based Flow Control
- In order for this low-level flow control not to limit throughput, the number of flit buffers should be at least F ≥ tcrt · b / Lf, where b is the bandwidth of the channel
104Credit-Based Flow Control
- For each flit sent downstream, a corresponding credit is sent upstream
- Thus there is a large amount of upstream signaling, which especially for small flits can represent a large overhead!
105On/Off Flow Control
- On/off flow control tries to reduce the amount of upstream signaling
- An off signal is sent to the upstream node if the number of free buffers falls below the threshold Foff
- An on signal is sent to the upstream node if the number of free buffers rises above the threshold Fon
- With carefully dimensioned buffers, on/off flow control can achieve a very low overhead in the form of upstream signaling (see the sketch below)
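A minimal sketch of the downstream side of on/off flow control, using the thresholds Foff and Fon named above; the class name and callback are illustrative assumptions.

class OnOffReceiver:
    def __init__(self, total_buffers, f_off, f_on, send_signal):
        assert f_off < f_on <= total_buffers
        self.free = total_buffers
        self.f_off, self.f_on = f_off, f_on
        self.send_signal = send_signal       # callback that signals the upstream node

    def flit_arrived(self, flit):
        self.free -= 1
        if self.free == self.f_off:          # free buffers dropped to the off threshold
            self.send_signal('off')          # tell upstream to stop sending

    def flit_forwarded(self):
        self.free += 1
        if self.free == self.f_on:           # free buffers recovered to the on threshold
            self.send_signal('on')           # tell upstream to resume sending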
106Ack/Nack Flow Control
- In ack/nack flow control, the upstream node sends packets without knowing whether there are free buffers in the downstream node
107Ack/Nack Flow Control
- If there is no buffer available
- the downstream node sends nack and drops the
flit - the flit must be resent
- flits must be reordered at the downstream node
- If there is a buffer available
- The downstream node sends ack and stores the flit
in a buffer
108Buffer Management
- Because of its buffer and bandwidth inefficiency
ack/nack is rarely used - Credit-based flow control is used in systems with
small numbers of buffers - On/off flow control is used in systems that have
large numbers of flit buffers
109Flow Control Summary
- Bufferless flow control
- Dropping, misroute packets
- Circuit switching
- Buffered flow control
- Packet-Buffer Flow Control SAF vs. Cut Through
- Flit-Buffer Flow Control Wormhole and Virtual
Channel - Switch-to-switch (link level) flow control
- Credit-based, On/Off, Ack/Nack
110Part 7
- Router Architecture
- Virtual-channel Router
- Virtual channel state fields
- The Router Pipeline
- Pipeline Stalls
111Router Microarchitecture -Virtual-channel Router
- Modern routers are pipelined and work at the flit
level - Head flits proceed through buffer stages that
perform routing and virtual channel allocation - All flits pass through switch allocation and
switch traversal stages - Most routers use credits to allocate buffer space
112Typical Virtual Channel Router
- A router's functional blocks can be divided into
- Datapath: handles storage and movement of a packet's payload
- Input buffers
- Switch
- Output buffers
- Control: coordinates the movement of packets through the resources of the datapath
- Route computation
- VC allocator
- Switch allocator
113Typical Virtual Channel Router
- The input unit contains a set of flit buffers
- Maintains the state for each virtual channel
- G: Global state
- R: Route
- O: Output VC
- P: Pointers
- C: Credits
114Virtual Channel State Fields (Input)
115Typical Virtual Channel Router
- During route computation the output port for the
packet is determined - Then the packet requests an output virtual
channel from the virtual-channel allocator
116Typical Virtual Channel Router
- Flits are forwarded via the virtual channel by allocating a time slot on the switch and output channel using the switch allocator
- Flits are forwarded to the appropriate output during this time slot
- The output unit forwards the flits to the next router in the packet's path
117Virtual Channel State Fields (Output)
118Packet Rate and Flit Rate
- The control of the router operates at two
distinct frequencies - Packet Rate (performed once per packet)
- Route computation
- Virtual-channel allocation
- Flit Rate (performed once per flit)
- Switch allocation
- Pointer and credit count update
119The Router Pipeline
- A typical router pipeline includes the following stages
- RC (Routing Computation)
- VA (Virtual Channel Allocation)
- SA (Switch Allocation)
- ST (Switch Traversal)
no pipeline stalls
120The Router Pipeline
- Cycle 0
- Head flit arrives and the packet is directed to a virtual channel of the input port (G = I)
no pipeline stalls
121The Router Pipeline
- Cycle 1
- Routing computation
- Virtual channel state changes to routing (G = R)
- Head flit enters the RC stage
- First body flit arrives at router
no pipeline stalls
122The Router Pipeline
- Cycle 2: Virtual Channel Allocation
- Route field (R) of the virtual channel is updated
- Virtual channel state is set to waiting for an output virtual channel (G = V)
- Head flit enters the VA stage
- First body flit enters the RC stage
- Second body flit arrives at the router
no pipeline stalls
123The Router Pipeline
- Cycle 2: Virtual Channel Allocation
- The result of the routing computation is input to the virtual-channel allocator
- If successful, the allocator assigns a single output virtual channel
- The state of the virtual channel is set to active (G = A)
no pipeline stalls
124The Router Pipeline
- Cycle 3: Switch Allocation
- All further processing is done on a per-flit basis
- Head flit enters the SA stage
- Any active VC (G = A) that contains buffered flits (indicated by P) and has downstream buffers available (C > 0) bids for a single-flit time slot through the switch from its input VC to its output VC
no pipeline stalls
125The Router Pipeline
- Cycle 3: Switch Allocation
- If successful, the pointer field is updated
- The credit field is decremented
no pipeline stalls
126The Router Pipeline
- Cycle 4: Switch Traversal
- Head flit traverses the switch
- Cycle 5
- Head flit starts traversing the channel to the
next router
no pipeline stalls
127The Router Pipeline
- Cycle 7
- Tail flit traverses the switch
- Output VC is set to idle
- Input VC is set to idle (G = I) if its buffer is empty
- Input VC is set to routing (G = R) if another head flit is in the buffer
no pipeline stalls
128The Router Pipeline
- Only the head flits enter the RC and VA stages
- The body and tail flits are stored in the flit buffers until they can enter the SA stage (the state transitions are sketched below)
no pipeline stalls
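A minimal sketch of the per-virtual-channel global state (G) transitions used in this walkthrough: I (idle) -> R (routing) -> V (waiting for an output VC) -> A (active) -> back to I when the tail flit leaves; the event names are illustrative.

TRANSITIONS = {
    ('I', 'head_arrives'):        'R',
    ('R', 'route_computed'):      'V',
    ('V', 'output_vc_allocated'): 'A',
    ('A', 'tail_traverses'):      'I',
}

def next_state(state, event):
    return TRANSITIONS.get((state, event), state)   # other events leave the state unchanged

# Example: a head flit arriving at an idle VC moves it to 'R'; after RC it is 'V';
# after VA it is 'A'; the tail flit traversing the switch returns it to 'I'.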
129Pipeline Stalls
- Pipeline stalls can be divided into
- Packet stalls
- Can occur if the virtual channel cannot advance to its R, V, or A state
- Flit stalls
- Occur if a virtual channel is in the active state and a flit cannot successfully complete switch allocation due to
- Lack of a flit
- Lack of a credit
- Losing arbitration for the switch time slot
130Example for Packet Stall
- Virtual-channel allocation stall
- The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel
131Example for Packet Stall
- Virtual-channel allocation stall
The head flit of A cannot enter the VA stage until the tail flit of packet B completes switch allocation and releases the virtual channel
132Example for Flit Stalls
Switch allocation stall
Second body flit fails to allocate the requested
connection in cycle 5
133Example for Flit Stalls
Buffer empty stall
Body flit 2 is delayed three cycles. However, since it does not have to enter the RC and VA stages, the output is only delayed one cycle!
134Credits
- A buffer is allocated in the SA stage on the upstream (transmitting) node
- To reuse the buffer, a credit is returned over a reverse channel after the same flit departs the SA stage of the downstream (receiving) node
- When the credit reaches the input unit of the upstream node, the buffer is available and can be reused
135Credits
- The credit loop can be viewed as a token that
- Starts at the SA stage of the upstream node
- Travels downstream with the flit
- Reaches the SA stage of the downstream node
- Returns upstream as a credit
136Credit Loop Latency
- The credit loop latency tcrt, expressed in flit times, gives a lower bound on the number of flit buffers needed on the upstream side for the channel to operate at full bandwidth
- tcrt, in flit times, is given by the latency around the credit loop described above
137Credit Loop Latency
- If the number of buffers available per virtual channel is F, the duty factor of the channel will be
- d = min(1, F / tcrt)
- The duty factor will be 100% as long as there are sufficient flit buffers to cover the round-trip latency
138Credit Stall
Virtual Channel Router with 4 flit buffers
139Flit and Credit Encoding
- (A) Flits and credits are sent over separate lines with separate widths
- (B) Flits and credits are transported over the same line. This can be done by
- Including credits in flits
- Multiplexing flits and credits at the phit level
- Option (A) is considered more efficient. For a more detailed discussion, see Section 16.6 of the Dally and Towles book
140Summary
- NoC is a scalable platform for billion-transistor
chips - Several driving forces behind it
- Many open research questions
- May change the way we structure and model VLSI
systems
141References
- OASIS NoC Architecture Design in Verilog HDL, Technical Report TR-062010-OASIS, Adaptive Systems Laboratory, The University of Aizu, June 2010.
- OASIS NoC Project: http://web-ext.u-aizu.ac.jp/benab/research/projects/oasis/
142- Network-on-Chip
- Ben Abdallah, Abderazek
- The University of Aizu
- E-mail: benab@u-aizu.ac.jp
KUST University, March 2011