Chapter 4 Retiming - PowerPoint PPT Presentation

Slides: 25
Provided by: YuHe8

1
Chapter 4 Retiming
2
Definitions
  • Retiming
  • Retiming is a mapping from a given DFG, G, to a
    retimed DFG, Gr, such that the transfer functions
    of G and Gr differ only by a pure delay $z^{-L}$.
  • Purposes
  • To facilitate pipelining to reduce clock cycle
    time
  • To reduce the number of registers needed.

3
Cut-set Retiming
  • Feed-forward cut-set
  • Feed-back cut-set
  • Delay transfer theorem
  • Adding the same non-negative number of delays to
    every edge of a feed-forward cut-set of a DFG will
    not alter its output, except that the output
    timing will be delayed.
  • Transferring the same number of delays from the
    edges in one direction across a feed-back cut-set
    of a DFG to all edges in the opposite direction
    across the same cut-set will not alter the
    output, only its timing.

4
Feed-forward Cut-Set Retiming
  • Consider the FIR digital filter and its DFG
  • $y(n) = b_0 x(n) + b_1 x(n-1)$
  • Critical path length $= T_M + T_A$
  • Select a cut set
  • Insert one delay on each edge in the cut set.
  • Retiming
  • $y_{new}(n) = b_0 x(n-1) + b_1 x(n-2)$
  • $y_{new}(n) = y(n-1)$
  • Critical path $= \max(T_M, T_A)$
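
The relation between the retimed output and y(n-1) can be checked numerically. The following is a minimal sketch, assuming the cut set runs between the two multipliers and the adder (which is what the retimed equation above implies) and zero initial register contents; the function names are hypothetical.

```python
import random

def fir_original(x, b0, b1):
    """y(n) = b0*x(n) + b1*x(n-1), with x(n) = 0 for n < 0."""
    return [b0 * x[n] + b1 * (x[n - 1] if n >= 1 else 0)
            for n in range(len(x))]

def fir_pipelined(x, b0, b1):
    """Same filter with one register on each multiplier-to-adder edge."""
    p0 = p1 = 0      # pipeline registers on the two cut-set edges
    x_prev = 0       # the original delay element, holding x(n-1)
    y = []
    for xn in x:
        y.append(p0 + p1)               # adder reads last cycle's products
        p0, p1 = b0 * xn, b1 * x_prev   # multipliers refill the registers
        x_prev = xn
    return y

random.seed(0)
x = [random.randint(-9, 9) for _ in range(20)]
y = fir_original(x, 3, 5)
y_new = fir_pipelined(x, 3, 5)
print(y_new[1:] == y[:-1])   # output is the original delayed by one sample
```

Because the adder now only sees registered products, a multiplication and an addition never occur in the same clock cycle, which is why the critical path drops from the sum to the maximum of the two unit delays.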

[Figure: DFG of the FIR filter before and after feed-forward cut-set retiming]
5
Feed-back Cut Set Retiming
  • Consider an IIR digital filter
  • $y(n) = a\,y(n-2) + x(n)$
  • loop bound $= (T_M + T_A)/2$
  • clock cycle $= T_M + T_A$
  • Shift 1 delay to the other edge across a
    feed-back cut set
  • Filter remains unchanged.
  • loop bound $= (T_M + T_A)/2$
  • clock cycle $= \max(T_M, T_A)$
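
That the filter remains unchanged can be confirmed by simulation. This is a sketch under stated assumptions: the original loop carries both delays on the adder-to-multiplier edge, the retimed loop carries one delay on each of the two loop edges, and all registers start at zero. The function and variable names are hypothetical.

```python
def iir_original(x, a):
    """y(n) = a*y(n-2) + x(n), both delays on the adder -> multiplier edge."""
    y = []
    for n, xn in enumerate(x):
        y.append(a * (y[n - 2] if n >= 2 else 0) + xn)
    return y

def iir_retimed(x, a):
    """One delay before the multiplier, one between multiplier and adder."""
    reg_in = 0    # holds y(n-1), feeds the multiplier
    reg_out = 0   # holds a*y(n-2), feeds the adder
    y = []
    for xn in x:
        yn = reg_out + xn                  # adder only
        y.append(yn)
        reg_out, reg_in = a * reg_in, yn   # multiplier only, then latch
    return y

x = [1, -2, 3, 0, 5, -1, 4, 2, -3, 6]
print(iir_original(x, 2) == iir_retimed(x, 2))
```

The two delays still lie on the same loop, so the loop bound is the same; only the longest register-to-register path changes.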

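[Figure: DFG of the IIR filter before (delay 2D on one loop edge) and after (one delay D on each loop edge) feed-back cut-set retiming]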
6
Timing Diagram
  • Assume $t_M = t_A = 1$ t.u.
  • Before retiming
  • After retiming

[Figure: timing diagrams showing the MAC schedule before retiming and the separate Add and Mul schedules after retiming]
7
Feed-back Cut Set Retiming
  • Consider an IIR digital filter
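  • $y(n) = a\,y(n-1) + x(n)$
  • loop bound $= T_M + T_A$
  • throughput $= 1/(T_M + T_A)$
  • Input to the slowed-down filter:
    $x(m) = x(k)$ for $m = 2k-1$,
  • $x(m) = 0$ for $m = 2k$
  • Clock period $= T_M + T_A$
  • Throughput $= 1/(2(T_M + T_A))$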

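[Figure: first-order IIR filter with delay D (input x(n), output y(n)) and its slowed-down version with delay 2D (input x(m), output y(m))]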
8
Slowdown Retiming
  • Start with
  • $y(n) = a\,y(n-1) + x(n)$
  • clock cycle $= \max(T_M, T_A)$
  • throughput $= 1/(2\max(T_M, T_A))$
  • Start with
  • $y(n) = a\,y(n-2) + x(n)$
  • loop bound $= (T_M + T_A)/2$
  • clock cycle $= \max(T_M, T_A)$
  • throughput $= 1/\max(T_M, T_A)$

[Figure: the two retimed filters, each with one delay D on each loop edge]
9
Example 3.2.1
  • Node delay = 1 t.u.
  • Before retiming
  • Critical path: a3 → a4 → a5 → a6
  • Clock cycle time = 4
  • 2 delay units
  • After cut-set retiming
  • Critical paths: a3 → a5, a4 → a6
  • Clock cycle time = 2
  • 6 delay units
  • After additional retiming
  • Critical path: none
  • Clock cycle time = 1
  • 11 delay units

[Figure: DFG of Example 3.2.1 with nodes a1 to a6, shown before retiming, after cut-set retiming, and after additional retiming]
10
Slow Down for Cut-Set Retiming
11
Node Retiming
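  • Transfer delays through a node in the DFG
  • $r(v)$: number of delays transferred from the
    outgoing edges to the incoming edges of node v
  • $w(e)$: number of delays on edge e
  • $w_r(e)$: number of delays on edge e after retiming
  • Retiming equation, for an edge e from node u to
    node v: $w_r(e) = w(e) + r(v) - r(u)$,
  • subject to $w_r(e) \ge 0$.
  • Let p be a path from $v_0$ to $v_k$;
  • then $w_r(p) = w(p) + r(v_k) - r(v_0)$.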

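[Figure: edge e from node u to node v; node retiming example with r(v) = 2, showing the edge delays D, 3D, 2D before and after; path p]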
12
Invariant Properties
  • Retiming does NOT change the total number of
    delays for each cycle.
  • Retiming does not change loop bound or iteration
    bound of the DFG
  • If the same constant integer j is added to the
    retiming value of every node v in a DFG G, the
    retimed graph Gr is not affected. That is, the
    weights (numbers of delays) of the retimed graph
    remain the same.
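
These invariants follow directly from the retiming equation $w_r(e) = w(e) + r(v) - r(u)$ and can be checked on a small graph. The sketch below uses the edge weights and the retiming values r(1) = r(3) = r(4) = 0, r(2) = 1 from the retiming example in this chapter; the helper names `retime` and `cycle_delays` are hypothetical.

```python
def retime(edges, r):
    """edges: {(u, v): w(e)}. Returns {(u, v): w(e) + r(v) - r(u)}."""
    retimed = {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}
    if any(w < 0 for w in retimed.values()):
        raise ValueError("infeasible retiming: negative number of delays")
    return retimed

def cycle_delays(edges, cycle):
    """Total delays around a cycle given as a node list, e.g. [1, 3, 2]."""
    k = len(cycle)
    return sum(edges[(cycle[i], cycle[(i + 1) % k])] for i in range(k))

edges = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
r = {1: 0, 2: 1, 3: 0, 4: 0}

gr = retime(edges, r)
gr_shifted = retime(edges, {v: rv + 7 for v, rv in r.items()})

for cycle in ([1, 3, 2], [1, 4, 2]):
    print(cycle, cycle_delays(edges, cycle), cycle_delays(gr, cycle))
print(gr == gr_shifted)   # adding a constant to every r(v) changes nothing
```

Around any cycle the r(v) terms telescope to zero, which is why the per-cycle delay count, and with it the loop bound, cannot change.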

13
Node Retiming Examples
r(2) = 1
14
DFG Illustration of the Example
$T_\infty = \max\{(1+2+1)/2,\ (1+2+1)/3\} = 2$; critical path
delay $= \max\{2, 2, 1+1\} = 2$ t.u.
$T_\infty = \max\{(1+2+1)/2,\ (1+2+1)/3\} = 2$; critical path
delay $= 2+1 = 3$ t.u.
15
Retiming for Minimizing Clock Period
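  • Note that retiming will NOT alter the iteration
    bound $T_\infty$.
  • The iteration bound is the theoretical minimum
    clock period to execute the algorithm.
  • Let edge e connect node u to node v. If the node
    computing times satisfy $t(u) + t(v) > T_\infty$
    and e carries no delay, then the clock period
    $T > T_\infty$. For such an edge, we require that
    $w_r(e) \ge 1$.
  • To generalize, for any path p from $v_0$ to $v_k$
    whose total computing time exceeds $T_\infty$, we
    have $w_r(p) = w(p) + r(v_k) - r(v_0) \ge 1$.
  • In other words, any possible critical path in
    the DFG that is longer than $T_\infty$ must carry
    at least one delay after retiming, $w_r(p) \ge 1$.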
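
The clock period of a DFG is set by its longest path without any delay. Below is a minimal sketch that computes it for the four-node example used in this chapter. It assumes node computing times t(1) = t(2) = 1 and t(3) = t(4) = 2, which are inferred from the sums quoted in the retiming example, and it assumes the zero-delay edges form no cycle. The function name `critical_path` is hypothetical.

```python
from functools import lru_cache

def critical_path(edges, t):
    """Longest total node time along any path that uses only zero-delay edges."""
    succ = {v: [] for v in t}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return t[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in t)

t = {1: 1, 2: 1, 3: 2, 4: 2}
original = {(2, 1): 1, (1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0}
r = {1: 0, 2: 1, 3: 0, 4: 0}
retimed = {(u, v): w + r[v] - r[u] for (u, v), w in original.items()}

cp_before = critical_path(original, t)
cp_after = critical_path(retimed, t)
print(cp_before, cp_after)   # prints: 3 2
```

With these values the retimed graph reaches the iteration bound of 2 t.u., so no further retiming can shorten the clock period.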

16
Retiming Example Revisited
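  • $w_r(e_{21}) \ge 0$, since $t(2) + t(1) = 2 = T_\infty$.
  • $w_r(e_{13}) \ge 1$, since $t(1) + t(3) = 3 > T_\infty$.
  • $w_r(e_{14}) \ge 1$, since $t(1) + t(4) = 3 > T_\infty$.
  • $w_r(e_{32}) \ge 1$, since $t(3) + t(2) = 3 > T_\infty$.
  • $w_r(e_{42}) \ge 1$, since $t(4) + t(2) = 3 > T_\infty$.
  • Using $w_r(e_{uv}) = w(e_{uv}) + r(v) - r(u)$:
  • $w(e_{21}) + r(1) - r(2) = 1 + r(1) - r(2) \ge 0$
  • $w(e_{13}) + r(3) - r(1) = 1 + r(3) - r(1) \ge 1$
  • $w(e_{14}) + r(4) - r(1) = 2 + r(4) - r(1) \ge 1$
  • $w(e_{32}) + r(2) - r(3) = 0 + r(2) - r(3) \ge 1$
  • $w(e_{42}) + r(2) - r(4) = 0 + r(2) - r(4) \ge 1$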

17
Solution continues
  • Since the retimed graph Gr remains the same if the
    same constant is added to all node retiming
    values, we can set $r(1) = 0$.
  • The inequalities become
  • $1 - r(2) \ge 0$, or $r(2) \le 1$
  • $1 + r(3) \ge 1$, or $r(3) \ge 0$
  • $2 + r(4) \ge 1$, or $r(4) \ge -1$
  • $r(2) - r(3) \ge 1$, or $r(3) \le r(2) - 1$
  • $r(2) - r(4) \ge 1$, or $r(2) \ge r(4) + 1$
  • Since $1 \ge r(2) \ge r(3) + 1 \ge 1$,
  • one must have $r(2) = 1$.
  • This implies $r(3) \le 0$. But we also have
    $r(3) \ge 0$. Hence $r(3) = 0$.
  • These leave $-1 \le r(4) \le 0$.
  • Hence the two sets of solutions are
  • $r(1) = r(3) = 0$, $r(2) = 1$, and $r(4) = 0$ or $-1$.
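
The hand derivation can be cross-checked by brute force. This sketch fixes r(1) = 0 and enumerates integer values of r(2), r(3), r(4) in a small window; the window of -5 to 5 is an arbitrary assumption, and the function name `feasible` is hypothetical.

```python
from itertools import product

def feasible(r1, r2, r3, r4):
    """The five retimed-weight constraints of the example."""
    return (1 + r1 - r2 >= 0 and   # w_r(e21) >= 0
            1 + r3 - r1 >= 1 and   # w_r(e13) >= 1
            2 + r4 - r1 >= 1 and   # w_r(e14) >= 1
            r2 - r3 >= 1 and       # w_r(e32) >= 1
            r2 - r4 >= 1)          # w_r(e42) >= 1

solutions = [(0, r2, r3, r4)
             for r2, r3, r4 in product(range(-5, 6), repeat=3)
             if feasible(0, r2, r3, r4)]
print(solutions)   # prints: [(0, 1, 0, -1), (0, 1, 0, 0)]
```

Widening the window does not add solutions, because the inequalities bound every variable from both sides once r(1) is fixed.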

18
Systematic Solutions
  • Given a system of inequalities
  • $r(i) - r(j) \le k$, $1 \le i, j \le N$
  • Construct a constraint graph
  • Map each r(i) to node i. Add a node N+1.
  • For each inequality
  • $r(i) - r(j) \le k$,
  • draw an edge $e_{ji}$ from node j to node i
  • such that $w(e_{ji}) = k$.
  • Draw N edges $e_{N+1,i}$, $i = 1, \ldots, N$, each
    with weight 0.
  • The system of inequalities has a solution if and
    only if the constraint graph contains no negative
    cycles.
  • If a solution exists, one solution is where $r_i$
    is the length of the shortest path from node N+1
    to node i.
  • Shortest path algorithms (Appendix A)
  • Bellman-Ford algorithm
  • Floyd-Warshall algorithm

19
Bellman-Ford Algorithm
  • Find the shortest path from an arbitrarily chosen
    origin node U to each node in a directed graph,
    if no negative cycle exists.
  • Given a directed graph
  • $w(m,n)$: weight on the edge from node m to node n,
    $\infty$ if there is no edge from m to n
  • $r(i,j)$: length of the shortest path from node U
    to node i using at most j edges (j-1 intermediate
    nodes).
  • $r(i,1) = w(U,i)$,
  • $r(i,j+1) = \min_k \{ r(k,j) + w(k,i) \}$, taking
    $w(i,i) = 0$ so that $r(i,j)$ itself is a candidate,
  • $j = 1, 2, \ldots, N-1$
  • If $\max(r(:,N-1) - r(:,N)) > 0$, then there is a
    negative cycle. Else, $r(i,N-1)$ gives the shortest
    path length from U to i.
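
The recurrence above can be sketched in a few lines. This is an illustrative version and not the `spbf.m` script referenced on the slide. The four-node graph is a made-up example, nodes are numbered from 0, and keeping the previous value `prev[i]` inside the minimum plays the role of a zero-weight self edge. Only negative cycles reachable from the origin are detected.

```python
INF = float("inf")

def bellman_ford(w, origin):
    """w[m][n]: weight of edge m -> n, INF if absent. Nodes are 0..N-1.
    Returns (shortest path lengths from origin, negative-cycle flag)."""
    n = len(w)

    def relax(prev):
        # r(i, j+1) = min( r(i, j), min_k [ r(k, j) + w(k, i) ] )
        return [min(prev[i], min(prev[k] + w[k][i] for k in range(n)))
                for i in range(n)]

    r = list(w[origin])          # r(i, 1) = w(U, i)
    r[origin] = 0                # the empty path from U to itself
    for _ in range(n - 2):       # builds r(:, N-1)
        r = relax(r)
    has_negative_cycle = relax(r) != r   # compare r(:, N-1) with r(:, N)
    return r, has_negative_cycle

def example_graph(w31):
    """Hypothetical graph: 0->1 (1), 0->2 (2), 1->2 (-3), 2->3 (1), 3->1 (w31)."""
    w = [[INF] * 4 for _ in range(4)]
    w[0][1], w[0][2], w[1][2], w[2][3], w[3][1] = 1, 2, -3, 1, w31
    return w

dist_ok, neg_ok = bellman_ford(example_graph(2), origin=0)
dist_bad, neg_bad = bellman_ford(example_graph(1), origin=0)
print(dist_ok, neg_ok)   # prints: [0, 1, -2, -1] False
print(neg_bad)           # prints: True
```

With the last edge weight set to 2 the cycle 1 → 2 → 3 → 1 sums to zero and the distances are stable; with weight 1 the cycle sums to -1 and one more relaxation still lowers a distance.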

[Figure: example weighted directed graph with four nodes]
  • Note that $1 > 0$; hence there is at least one
    negative cycle.

spbf.m
20
Floyd-Warshall Algorithm
  • Find the shortest paths between all possible
    pairs of nodes in the graph, provided no negative
    cycle exists.
  • Algorithm
  • Initialization: $R^{(1)} = W$
  • For k = 1 to N
  • $R^{(k+1)}(u,v) = \min\{ R^{(k)}(u,v),\ R^{(k)}(u,k) + R^{(k)}(k,v) \}$
  • If $R^{(k)}(u,u) < 0$ for any k, u, then a negative
    cycle exists. Else, $R^{(N+1)}(u,v)$ is the length
    of the shortest path from u to v.
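
A compact sketch of the update rule follows. It assumes that W holds infinity where there is no edge, including on the diagonal, so that a finite diagonal entry of the result is the length of the shortest cycle through that node. The four-node graph is a made-up example and nodes are numbered from 0.

```python
INF = float("inf")

def floyd_warshall(w):
    """w[u][v]: weight of edge u -> v, INF if absent. Nodes are 0..N-1.
    Returns (all-pairs shortest path matrix, negative-cycle flag)."""
    n = len(w)
    r = [row[:] for row in w]            # R(1) = W
    for k in range(n):                   # allow node k as an intermediate node
        r = [[min(r[u][v], r[u][k] + r[k][v]) for v in range(n)]
             for u in range(n)]
    has_negative_cycle = any(r[u][u] < 0 for u in range(n))
    return r, has_negative_cycle

def example_graph(w31):
    """Hypothetical graph: 0->1 (1), 0->2 (2), 1->2 (-3), 2->3 (1), 3->1 (w31)."""
    w = [[INF] * 4 for _ in range(4)]
    w[0][1], w[0][2], w[1][2], w[2][3], w[3][1] = 1, 2, -3, 1, w31
    return w

paths, neg = floyd_warshall(example_graph(2))
_, neg_bad = floyd_warshall(example_graph(1))
print(paths[0])       # prints: [inf, 1, -2, -1]
print(neg, neg_bad)   # prints: False True
```

Checking the diagonal once at the end is enough for this sketch: a node on a negative cycle ends with a negative entry there.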

[Figure: example weighted directed graph for the Floyd-Warshall algorithm]
21
Retiming Example
  • For retiming example
  • $r(2) - r(1) \le 1$
  • $r(1) - r(3) \le 0$
  • $r(1) - r(4) \le 1$
  • $r(3) - r(2) \le -1$
  • $r(4) - r(2) \le -1$
  • Bellman-Ford Algorithm for Shortest Path

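[Figure: constraint graph for the retiming example, with nodes 1 to 5, edge weights 1, 0, 1, -1, -1, and four zero-weight edges from node 5]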
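
The constraint-graph method can be sketched end to end for these five inequalities. The code below builds the graph as described (an edge from j to i of weight k for each $r(i) - r(j) \le k$, plus zero-weight edges from an extra node N+1) and runs an edge-list form of Bellman-Ford. The function name `solve_difference_constraints` and the tuple encoding of the constraints are hypothetical.

```python
def solve_difference_constraints(n, constraints):
    """constraints: list of (i, j, k) meaning r(i) - r(j) <= k, nodes 1..n.
    Returns {i: r(i)}, or None if the constraint graph has a negative cycle."""
    source = n + 1
    edges = [(j, i, k) for i, j, k in constraints]       # edge j -> i, weight k
    edges += [(source, i, 0) for i in range(1, n + 1)]   # edges N+1 -> i, weight 0
    dist = {v: float("inf") for v in range(1, n + 2)}
    dist[source] = 0
    for _ in range(n):                  # n+1 nodes: at most n relaxation passes
        for u, v, k in edges:
            if dist[u] + k < dist[v]:
                dist[v] = dist[u] + k
    if any(dist[u] + k < dist[v] for u, v, k in edges):
        return None                     # still improving: negative cycle
    return {i: dist[i] for i in range(1, n + 1)}

constraints = [(2, 1, 1), (1, 3, 0), (1, 4, 1), (3, 2, -1), (4, 2, -1)]
r = solve_difference_constraints(4, constraints)
print(r)   # prints: {1: -1, 2: 0, 3: -1, 4: -1}

# Adding the same constant to every r(i) gives the same retimed graph.
shifted = {i: ri - r[1] for i, ri in r.items()}
print(shifted)   # prints: {1: 0, 2: 1, 3: 0, 4: 0}
```

After shifting so that r(1) = 0, the result matches one of the two solutions found by hand, the one with r(4) = 0.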
22
Retiming Example
  • Floyd-Warshall algorithm

23
Retiming to Reduce Registers
  • Register Sharing
  • When a node has multiple fan-out edges with
    different numbers of delays, the registers can be
    shared, so that only as many registers as the
    branch with the maximum number of delays are
    needed.
  • Register reduction through node delay transfer
    from multiple input edges to output edges (e.g.
    $r(v) < 0$ under the convention
    $w_r(e_{uv}) = w(e) + r(v) - r(u)$)
  • Should be done only when clock cycle constraint
    (if any) is not violated.
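
The effect of register sharing on the register count can be illustrated as follows. This sketch counts registers two ways: one register per delay on every edge, and with fan-out sharing, where each node needs only as many registers as its most-delayed outgoing edge. The edge list is the four-node example graph from this chapter, and the function name `register_counts` is hypothetical.

```python
from collections import defaultdict

def register_counts(edges):
    """edges: list of (u, v, w). Returns (without sharing, with fan-out sharing)."""
    unshared = sum(w for _, _, w in edges)
    max_out = defaultdict(int)
    for u, _, w in edges:
        max_out[u] = max(max_out[u], w)   # longest delay chain leaving node u
    shared = sum(max_out.values())
    return unshared, shared

original = [(2, 1, 1), (1, 3, 1), (1, 4, 2), (3, 2, 0), (4, 2, 0)]
retimed = [(2, 1, 0), (1, 3, 1), (1, 4, 2), (3, 2, 1), (4, 2, 1)]

print(register_counts(original))   # prints: (4, 3)
print(register_counts(retimed))    # prints: (5, 4)
```

Node 1 drives one edge with 1 delay and one with 2 delays, so a single chain of 2 registers with a tap after the first serves both branches. In this example the retiming that shortens the clock period costs one extra register.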

24
Time Scaling (Slow Down)
  • Transforming each delay element (register) D into
    ND and reducing the sample frequency N-fold slows
    down the computation N times.
  • During slow-down, the processor clock cycle time
    remains unchanged. Only the sampling period
    increases.
  • Provides opportunities for retiming and
    interleaving.
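
Slow-down can be demonstrated on the first-order filter $y(n) = a\,y(n-1) + x(n)$. The sketch below replaces D by ND and feeds each input sample followed by N-1 zeros, then checks that every N-th output equals the original output. It assumes zero initial state and 0-based sample indices (so the input samples sit at m = 0, N, 2N, ...); the function names are hypothetical.

```python
def first_order_iir(x, a, delay=1):
    """y(m) = a*y(m - delay) + x(m), zero initial state."""
    y = []
    for m, xm in enumerate(x):
        y.append(a * (y[m - delay] if m >= delay else 0) + xm)
    return y

def slow_down_input(x, n_slow):
    """Insert n_slow - 1 zeros after every input sample."""
    out = []
    for xk in x:
        out.extend([xk] + [0] * (n_slow - 1))
    return out

x = [1, 2, 3, 4]
a = 2
y = first_order_iir(x, a)
y_slow = first_order_iir(slow_down_input(x, 2), a, delay=2)
print(y)        # prints: [1, 4, 11, 26]
print(y_slow)   # prints: [1, 0, 4, 0, 11, 0, 26, 0]
```

The zero slots are idle cycles. Interleaving would fill them with samples of a second, independent input stream, and the extra delay in the loop is what retiming can redistribute.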


[Figure: original filter with delay D processing x(1), x(2), x(3); 2-slow filter with delay 2D processing the same samples separated by idle slots]