Title: Low Power Techniques in FIR Filters
- Mohsen Saneei
- DSP Implementation Systems Course Seminar
Spring 83
Outline
- Power Elements
- Block diagram of an FIR filter
- Number Representation techniques for low power
  - Reduced 2SC Representation
  - Mixed Number Representation
- Bus coding
  - Gray Code addressing
  - Bus Invert Coding
  - Bus Bit Reordering
- Parallel Processing and Pipelining
Outline (cont.)
- Low power techniques in FIR filters
  - Coefficient Scaling
  - Reduced Number of Multiplications in Linear Phase Filters
  - Coefficient Optimization
  - Using Differential Coefficients
  - Multi-rate Architectures
  - Coefficient and Data Swapping in Booth Multipliers
  - Selective Coefficient Negation
  - Coefficient Ordering
  - Adder input Bit Swapping
  - Coefficient Segmentation Algorithm
  - Data Block Processing
  - Transposed Direct Form Implementation
  - Use of Multiple Multipliers (2SC or SM)
1) Power Elements
- Sources of power dissipation in CMOS circuits
  - Switching power
  - Short-circuit power
  - Leakage power
- Switching power (dynamic power)
  - $P_{dynamic} = \alpha_T \cdot C_{switch} \cdot V^2 \cdot f_{clk}$, where $\alpha_T$ is the switching (transition) activity
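The dynamic power expression above can be evaluated directly. The sketch below is only an illustration: the function name and the operating point (activity, capacitance, supply, clock) are hypothetical values, not figures from the seminar. It shows that halving the switching activity halves the switching power, which is the lever most of the following techniques use.

```python
def dynamic_power(alpha_t, c_switch, vdd, f_clk):
    """Switching power in watts: alpha_T * C_switch * V^2 * f_clk."""
    return alpha_t * c_switch * vdd ** 2 * f_clk

# Hypothetical operating point, chosen for illustration only.
p_full = dynamic_power(alpha_t=0.5, c_switch=100e-12, vdd=2.5, f_clk=100e6)
p_half_activity = dynamic_power(alpha_t=0.25, c_switch=100e-12, vdd=2.5, f_clk=100e6)
print(p_full, p_half_activity)
```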
2) Block diagram of an FIR filter
2) Block diagram of an FIR filter (cont.)
- AU for the conventional filter using 2SC data and coefficients
- AU for the conventional filter using SM data and coefficients
3-1) Reduced 2SC Representation [1]
- 3 = 00000011, 3 - 4 = 11111111
- $X = x_{m-1} x_{m-2} \ldots x_2 x_1 x_0$
- Correction vector: 1 1 ... 1 1
- Sign-extended $X = x_{m-1} x_{m-1} \ldots x_{m-1} x_{m-1} x_{m-2} \ldots x_2 x_1 x_0$
- -3 = 11111101 = 00000001 + 111111 (correction vector)
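One way to read the -3 example above is as the standard sign-extension identity: invert the sign bit of the short word and add a constant run of ones starting at the sign-bit position. The sketch below assumes that reading; the function name and parameters are hypothetical, and it is not taken from [1].

```python
def sign_extend_via_correction(x, m, width):
    """Sign-extend the m-bit 2SC value x to `width` bits without copying the sign bit."""
    mask = (1 << width) - 1
    word = x & ((1 << m) - 1)                  # m-bit 2SC pattern of x
    word ^= 1 << (m - 1)                       # invert the sign bit
    correction = mask & ~((1 << (m - 1)) - 1)  # ones from bit m-1 up to the MSB
    return (word + correction) & mask

# -3 as a 3-bit word (101) becomes 00000001 + 11111100
print(format(sign_extend_via_correction(-3, 3, 8), "08b"))  # prints 11111101
```

Because the correction vector is a constant, only the short word changes from sample to sample, so the upper bits no longer toggle when the value crosses zero.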
3-1) Low Power Filter With Dynamic Reduced Representation
3-1) Experimental Results
- 0.25 µm CMOS
- 160 taps (8 taps per hybrid section)
- 100 MHz clock speed
- 10-bit coefficients
- 2.5 V power supply
- 6 mm² core size
- Power dissipation
  - 200 mW in dynamic reduced representation mode
  - 295 mW in fixed word-length reduced representation mode
  - Power saving: 32%
3-1) Other examples
- Booth-encoding multiplier

Mult. size | 2SC time (ns) | 2SC power (mW) | Reduced rep. time (ns) | Reduced rep. power (mW) | Power saving (%)
8x8        | 20.15         | 2.2            | 20.48                  | 1.91                    | 13
16x16      | 38.2          | 17.2           | 37.55                  | 15                      | 13

- Transposed form feed-forward equalization filter
  - 2SC: 105.6 mW
  - Reduced rep.: 78.8 mW
  - Power saving: 25%
3-2) Mixed Number Representation [2]
- Multiplier: Booth encoding
- Multiplicand: SM
- Expected Switching Activity (ESA)
  - Negation of a 2SC number: complement all bits and then add 1
  - Negation of an SM number: complement the sign bit
  - So the ESA of an SM number is lower than that of a 2SC number
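The difference between the two negation rules can be seen by counting the bits that change when a value is negated. This is a small illustrative sketch with hypothetical helper names; it does not reproduce the ESA figures from [2], it only shows the per-negation toggle count for one value.

```python
def to_sm(x, nbits):
    """Sign-magnitude pattern: sign in the MSB, magnitude in the remaining bits."""
    sign = 1 if x < 0 else 0
    return (sign << (nbits - 1)) | abs(x)

def hamming(a, b):
    return bin(a ^ b).count("1")

def negation_toggles(x, nbits):
    """Bits that flip when x is replaced by -x, in 2SC and in SM."""
    mask = (1 << nbits) - 1
    t_2sc = hamming(x & mask, -x & mask)
    t_sm = hamming(to_sm(x, nbits), to_sm(-x, nbits))
    return t_2sc, t_sm

print(negation_toggles(3, 8))  # prints (7, 1)
```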
3-2) Average Probability of ESA per bit

Operand length | 8 bits | 16 bits | 32 bits | 64 bits
2SC            | 0.4063 | 0.3906  | 0.3828  | 0.3789
SM             | 0.0508 | 0.0244  | 0.0120  | 0.0059
Reduction (%)  | 87.5   | 93.8    | 96.9    | 98.4
3-2) The Algorithm
- Convert the multiplicand from 2SC into the SM representation.
- Apply the radix-4 Booth's algorithm to the multiplier and generate all the partial products (PPs) in SM notation.
- Convert all the partial products from SM into RB representation.
- Sum up all the PPs through an RB adder tree.
- Convert the final result from RB into 2SC notation.
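The radix-4 Booth step of the algorithm above can be sketched as follows. This is a minimal illustration of the recoding and the partial-product sum only: the SM and RB conversions and the adder tree from [2] are not modelled, and the function names are hypothetical.

```python
def booth_radix4_digits(y, nbits):
    """Recode an nbits-wide 2SC multiplier into radix-4 digits in {-2, -1, 0, 1, 2}."""
    if nbits % 2 or not -(1 << (nbits - 1)) <= y < (1 << (nbits - 1)):
        raise ValueError("nbits must be even and y must fit in nbits")
    u = y & ((1 << nbits) - 1)
    digits, prev = [], 0              # prev is the bit just below the current pair
    for i in range(0, nbits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits

def booth_multiply(x, y, nbits):
    """Sum the partial products d_i * x * 4^i; each d_i * x is a shift and/or negate."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, nbits)))

print(booth_radix4_digits(7, 4))  # prints [-1, 2], since 7 = -1 + 2*4
```

Each digit selects 0, ±x, or ±2x as a partial product. With an SM multiplicand, the negative cases only flip the sign bit of the partial product, which is where the lower switching activity is expected to come from.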
3-2) Multiplier Block Diagram
4-1) Gray Code addressing [3]
- In Gray code, the Hamming distance between consecutive numbers is 1.
- During the FIR filter computation, both the coefficients and the data are accessed sequentially.
- So Gray code is a suitable approach for address bus encoding.
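The effect on a sequentially incremented address bus can be checked with a few lines. The sketch below uses the usual binary-reflected Gray code and a hypothetical 16-word address range.

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def total_toggles(words):
    """Sum of Hamming distances between consecutive bus values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

addresses = list(range(16))                      # sequential access, 4-bit bus
binary_toggles = total_toggles(addresses)
gray_toggles = total_toggles([to_gray(a) for a in addresses])
print(binary_toggles, gray_toggles)              # prints 26 15
```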
4-2) Bus Invert Coding: Encoder and Decoder [4]
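A minimal software sketch of the encoder and decoder, assuming the basic scheme of [4]: if more than half of the data lines would toggle, the inverted word is sent and an extra invert line is asserted. The function names and the all-zero initial bus state are assumptions made for illustration.

```python
def bus_invert_encode(words, width):
    """Return (bus_value, invert_line) pairs; the bus is assumed to start at 0."""
    mask = (1 << width) - 1
    bus, encoded = 0, []
    for w in words:
        if bin((w ^ bus) & mask).count("1") > width // 2:
            bus, inv = ~w & mask, 1    # too many toggles: send the complement
        else:
            bus, inv = w & mask, 0
        encoded.append((bus, inv))
    return encoded

def bus_invert_decode(encoded, width):
    mask = (1 << width) - 1
    return [(~b & mask) if inv else b for b, inv in encoded]

words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words, 8)
print(encoded, bus_invert_decode(encoded, 8) == words)
```

With this rule, no transfer toggles more than half of the data lines, at the cost of one extra bus line.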
4-3) Bus Bit Reordering [3]
Reduction in the number of adjacent signal transitions in opposite directions as a function of the bus-reordering span
5) Parallel Processing and Pipelining [5]

Architecture       | Voltage | Area (normalized) | Power (normalized)
Simple             | 5 V     | 1                 | 1
Parallel           | 2.9 V   | 3.4               | 0.36
Pipelined          | 2.9 V   | 1.3               | 0.39
Pipelined-parallel | 2 V     | 3.7               | 0.2
6-1) Coefficient Scaling [3]
- Scale the coefficients of the filter by a common factor K.
- An optimal scaling factor K can be found such that the total Hamming distance between consecutive coefficient values is minimized.
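The search for K can be sketched as a brute-force scan over candidate factors. Everything specific here is an assumption for illustration: the coefficient values, the 10-bit word length, the candidate range 0.70 to 1.00, and the rounding rule. The procedure in [3] may differ.

```python
def quantize(coeffs, scale, nbits):
    """Scale, round to integers, and return nbits-wide 2SC bit patterns."""
    mask = (1 << nbits) - 1
    full = (1 << (nbits - 1)) - 1
    return [round(c * scale * full) & mask for c in coeffs]

def total_hamming(words):
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def best_scale(coeffs, nbits, candidates):
    """Candidate K with the smallest total Hamming distance between neighbours."""
    return min(candidates, key=lambda k: total_hamming(quantize(coeffs, k, nbits)))

h = [-0.04, 0.11, 0.27, 0.38, 0.27, 0.11, -0.04]   # hypothetical coefficients
candidates = [k / 100 for k in range(70, 101)]     # K from 0.70 to 1.00
k_opt = best_scale(h, 10, candidates)
print(k_opt, total_hamming(quantize(h, k_opt, 10)), total_hamming(quantize(h, 1.0, 10)))
```

Scaling all coefficients by K changes only the filter gain, which this sketch assumes can be tolerated or compensated elsewhere.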
6-2) Reduced Number of Multiplications in Linear Phase Filters [3]
- The coefficient symmetry of linear phase FIR filters can be used to halve the number of multiplications per output.
- N multiplications are reduced to N/2 multiplications.
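The saving comes from adding the two samples that share a coefficient before multiplying. A minimal sketch for an even-length symmetric filter, with hypothetical function names and a plain direct-form reference for comparison:

```python
def fir_direct(h, x):
    """Reference: N multiplications per output."""
    n_taps = len(h)
    padded = [0] * (n_taps - 1) + list(x)
    return [sum(h[k] * padded[n + n_taps - 1 - k] for k in range(n_taps))
            for n in range(len(x))]

def fir_symmetric(h, x):
    """Even-length symmetric h: add mirrored samples first, N/2 multiplications."""
    n_taps = len(h)
    padded = [0] * (n_taps - 1) + list(x)
    out = []
    for n in range(len(x)):
        base = n + n_taps - 1            # index of x[n] in the padded list
        out.append(sum(h[k] * (padded[base - k] + padded[base - (n_taps - 1 - k)])
                       for k in range(n_taps // 2)))
    return out

h = [1, 2, 3, 3, 2, 1]                   # h[k] == h[N-1-k]
x = [1, -2, 3, 4, 0, 5, -1, 2]
print(fir_symmetric(h, x) == fir_direct(h, x))  # prints True
```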
6-3) Coefficient Optimization [3]
- Given an N-tap filter with coefficients hi that satisfy the response in terms of pass-band ripple and stop-band attenuation.
- Find a new set of coefficients ki.hi such that the total Hamming distance between successive coefficients is minimized while still satisfying the desired filter characteristics.
Coefficient Optimization: an optimization algorithm
Hamming distance (HD) and adjacent signal toggles (Ts) after coefficient scaling and optimization

N | Initial HD | Initial Ts | Nonlinear phase, coeff. opt. HD | Nonlinear phase, coeff. opt. Ts | Nonlinear phase, HD red. (%) | Nonlinear phase, Ts red. (%) | Linear phase, coeff. opt. HD | Linear phase, coeff. opt. Ts | Linear phase, HD red. (%) | Linear phase, Ts red. (%)
24 | 180 | 50 | 118 | 12 | 34 | 76 | 118 | 14 | 34 | 72
28 | 214 | 44 | 138 | 6 | 36 | 86 | 140 | 8 | 35 | 82
29 | 220 | 16 | 156 | 12 | 29 | 25 | 154 | 10 | 30 | 37
34 | 258 | 36 | 168 | 14 | 35 | 61 | 178 | 16 | 31 | 56
41 | 292 | 44 | 258 | 25 | 12 | 43 | 264 | 28 | 10 | 36
50 | 372 | 58 | 298 | 19 | 20 | 67 | 302 | 20 | 19 | 66
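6-4) Using Differential Coefficients [6]
- $Y_{n-2} = h_0 x_{n-2} + h_1 x_{n-3} + h_2 x_{n-4} + h_3 x_{n-5}$
- $Y_{n-1} = h_0 x_{n-1} + h_1 x_{n-2} + h_2 x_{n-3} + h_3 x_{n-4}$
- $Y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$
- $h_1 x_{n-1} = h_0 x_{n-1} + (h_1 - h_0) x_{n-1}$
- $h_2 x_{n-2} = h_1 x_{n-2} + (h_2 - h_1) x_{n-2}$
- $h_3 x_{n-3} = h_2 x_{n-3} + (h_3 - h_2) x_{n-3}$
- $h_1 x_{n-2} = h_0 x_{n-2} + (h_1 - h_0) x_{n-2}$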
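Read as a recurrence, the identities above say that each product for the current output is the neighbouring product stored from the previous output plus a correction that multiplies by a coefficient difference. The sketch below is one hypothetical way to organise this in software; the storage scheme in [6] may differ.

```python
def fir_differential(h, x):
    """FIR filter where only h[0] and the differences h[k]-h[k-1] are multiplied."""
    n_taps = len(h)
    diffs = [h[k] - h[k - 1] for k in range(1, n_taps)]
    prev = [0] * n_taps                  # products h[k]*x[n-1-k] from the previous output
    delay = [0] * n_taps                 # delay[k] holds x[n-k]
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        cur = [h[0] * sample]
        for k in range(1, n_taps):
            # h[k]*x[n-k] = h[k-1]*x[n-k] (stored) + (h[k]-h[k-1])*x[n-k]
            cur.append(prev[k - 1] + diffs[k - 1] * delay[k])
        out.append(sum(cur))
        prev = cur
    return out

print(fir_differential([1, 3, 4, 6], [2, -1, 5]))  # prints [2, 5, 10]
```

The benefit depends on the differences being shorter words than the coefficients themselves, at the cost of extra storage for the previous products.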
6-4) Using Differential Coefficients (cont.)
6-5) Multi-rate Architectures [3]
- $X(z) = X_e(z) + z^{-1} X_o(z)$
- $Y(z) = Y_e(z) + z^{-1} Y_o(z)$
- $H(z) = H_e(z) + z^{-1} H_o(z)$
- Results
  - An N-tap direct form architecture requires N multiplications and (N-1) additions per output
  - An N-tap multi-rate architecture requires 3N/4 multiplications and (3N+2)/4 additions per output
  - 30% to 50% power saving
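The 3N/4 figure follows from computing two outputs with three half-length sub-filters: $H_e X_e$, $H_o X_o$ and $(H_e + H_o)(X_e + X_o)$. The sketch below works on whole even-length sequences rather than a sample stream and uses hypothetical function names; it only checks the decomposition against a plain convolution.

```python
def convolve(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def fir_two_phase(h, x):
    """Full convolution of even-length h and x using three half-length sub-filters."""
    he, ho = h[0::2], h[1::2]
    xe, xo = x[0::2], x[1::2]
    a = convolve(he, xe)                                  # He * Xe
    b = convolve(ho, xo)                                  # Ho * Xo
    c = convolve([p + q for p, q in zip(he, ho)],
                 [p + q for p, q in zip(xe, xo)])         # (He+Ho) * (Xe+Xo)
    y_even = [u + v for u, v in zip(a + [0], [0] + b)]    # Ye = He*Xe + delayed Ho*Xo
    y_odd = [w - u - v for u, v, w in zip(a, b, c)]       # Yo = He*Xo + Ho*Xe
    y = []
    for m, ye in enumerate(y_even):                       # interleave even and odd outputs
        y.append(ye)
        if m < len(y_odd):
            y.append(y_odd[m])
    return y

h = [1, 2, 3, 4]
x = [1, 0, -1, 2, 3, 1]
print(fir_two_phase(h, x) == convolve(h, x))  # prints True
```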
6-6) Coefficient and Data Swapping in Booth Multipliers [3]
- Power dissipation in a Booth multiplier depends on the number of 1s in the Booth-encoded input.
- So the coefficient and data inputs to the multiplier can be swapped as appropriate to reduce power dissipation in the multiplier.
6-7) Selective Coefficient Negation [3]
- For each coefficient hi, either hi or -hi is stored in the coefficient memory.
- The adder is replaced with an adder/subtractor.
- Result
  - Reduces the number of 1s in the coefficient input
  - Reduces the Hamming distance between consecutive coefficients
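A simple way to illustrate the idea is a greedy pass that stores whichever of hi or -hi is closer, in bits, to the previously stored word, and records whether the accumulator must add or subtract. The greedy rule, the 8-bit word length and the function names are assumptions for this sketch, not the selection procedure of [3].

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

def select_negations(coeffs, nbits):
    """Return (stored_value, sign) pairs; sign -1 means the product is subtracted."""
    mask = (1 << nbits) - 1
    stored, prev = [], 0
    for h in coeffs:
        plus, minus = h & mask, -h & mask
        if hamming(minus, prev) < hamming(plus, prev):
            stored.append((-h, -1))
            prev = minus
        else:
            stored.append((h, +1))
            prev = plus
    return stored

def mac(stored, samples):
    """Adder/subtractor accumulation: the result equals sum(h_i * x_i)."""
    return sum(sign * c * x for (c, sign), x in zip(stored, samples))

stored = select_negations([3, -2, 5, -7], 8)
print(stored, mac(stored, [1, 2, 3, 4]))
```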
6-8) Coefficient Ordering [3]
- The summation operation is commutative and associative.
- So $Y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3} = h_1 x_{n-1} + h_3 x_{n-3} + h_0 x_n + h_2 x_{n-2}$
- We can exchange the order of the coefficients and data in memory to achieve the minimum Hamming distance.
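Finding the best order is a path-ordering problem, so a heuristic is a natural starting point. The sketch below uses a greedy nearest-neighbour rule, which is an assumption for illustration and not necessarily the method of [3]. It keeps every coefficient paired with its own data sample, so the output is unchanged.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

def order_coefficients(coeffs, nbits):
    """Greedy nearest-neighbour ordering of tap indices by Hamming distance."""
    mask = (1 << nbits) - 1
    remaining = list(range(len(coeffs)))
    order = [remaining.pop(0)]
    while remaining:
        last = coeffs[order[-1]] & mask
        nxt = min(remaining, key=lambda k: hamming(coeffs[k] & mask, last))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def fir_output(coeffs, window, order):
    """window[k] holds x[n-k]; taps are accumulated in the given order."""
    return sum(coeffs[k] * window[k] for k in order)

coeffs, window = [3, -4, 2, -3], [1, 2, 3, 4]
order = order_coefficients(coeffs, 8)
print(order, fir_output(coeffs, window, order))
```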
Hamming distance and adjacent signal toggles after selective coefficient negation, scaling and ordering

N   | Initial H.D. | Opt. scale factor | Opt. H.D. | H.D. red. (%) | Initial toggles | Opt. toggles | Toggles red. (%)
16  | 102          | 0.9761            | 34        | 67            | 8               | 1            | 88
24  | 158          | 0.7087            | 44        | 72            | 20              | 3            | 85
32  | 204          | 0.7685            | 58        | 72            | 22              | 3            | 86
36  | 242          | 0.9263            | 62        | 74            | 28              | 9            | 68
40  | 280          | 0.7321            | 66        | 76            | 32              | 5            | 84
48  | 350          | 0.7000            | 76        | 78            | 50              | 4            | 92
64  | 452          | 0.8217            | 80        | 82            | 54              | 6            | 89
72  | 510          | 0.7580            | 88        | 83            | 52              | 9            | 83
96  | 700          | 0.7182            | 106       | 85            | 64              | 6            | 91
128 | 952          | 0.7764            | 108       | 89            | 84              | 5            | 94
6-9) Adder input Bit Swapping [3]

Bits | Initial H.D. | Final H.D. | H.D. red. (%) | Initial toggles | Final toggles | Toggles red. (%)
8    | 7953         | 5937       | 25.3          | 1836            | 1090          | 40.6
12   | 11979        | 8925       | 25.5          | 2766            | 1791          | 35.2
16   | 15945        | 11865      | 25.6          | 3545            | 2170          | 38.8
6-10) Coefficient Segmentation Algorithm [7]
- Coefficient set: $h_0, h_1, h_2, h_3, \ldots, h_{N-1}$
- For a given coefficient $h_k$, the algorithm divides it such that $h_k = s_k + m_k$, where
  - $s_k$ is the largest power of 2 smaller than $h_k$
  - $m_k = h_k - s_k$ is a positive number
- $h_k \cdot x_k = s_k \cdot x_k + m_k \cdot x_k$
  - $s_k \cdot x_k$: shift
  - $m_k \cdot x_k$: multiply
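The split can be sketched for positive integer coefficients. The sketch assumes "smaller than" is strict (so a coefficient of 8 splits into 4 + 4) and that coefficients are integers of at least 2; handling of signs and fractional formats in [7] is not modelled, and the function names are hypothetical.

```python
def segment(h):
    """Split integer h >= 2 into s + m, with s = 2**shift the largest power of two below h."""
    if h < 2:
        raise ValueError("this sketch only handles integer coefficients >= 2")
    shift = (h - 1).bit_length() - 1
    return shift, h - (1 << shift)

def segmented_product(h, x):
    """h*x computed as a shift (s*x) plus a narrower multiplication (m*x)."""
    shift, m = segment(h)
    return (x << shift) + m * x

print(segment(13), segmented_product(13, -7))  # prints (3, 5) -91
```

The multiplier then sees the shorter word m instead of h, which is where the reduction in switched capacitance is expected to come from.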
Coefficient Segmentation Algorithm (cont.)

Multiplier size | Conventional Swcap/mult (pF) | New Swcap/mult (pF) | Reduction (%)
8-bit           | 14.88                        | 5.57                | 62.56
16-bit          | 113.00                       | 51.52               | 54.41
24-bit          | 413.81                       | 260.08              | 37.15
6-11) Data Block Processing [8]
- $Y_{n-1} = h_0 x_{n-1} + h_1 x_{n-2} + h_2 x_{n-3} + h_3 x_{n-4}$
- $Y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$
Data Block Processing (cont.)

Algorithm        | Power, 2SC (mW) | Power, SM (mW) | Area, 2SC (mm²) | Area, SM (mm²)
Conventional     | 7.61            | 5.49           | 0.71            | 0.74
Block processing | 5.22            | 3.85           | 0.73            | 0.73
6-12) Transposed Direct Form Implementation (TDF) [3, 9]
- In DF, both inputs of the multiplier receive new data for each multiplication.
- In TDF, the data input of the multiplier remains unchanged for a substantial number of multiplication operations, corresponding to the filter length.
- This reduces the switching activity (SA) on the data bus and at the data input of the multiplier.
- Figures: Direct Form, Transposed Direct Form
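The property described above is visible in a software model of the transposed structure: one input sample is multiplied by every coefficient before the next sample arrives. This is a minimal sketch with a hypothetical function name, not the hardware architecture of [3] or [9].

```python
def fir_transposed(h, x):
    """Transposed direct form: each input sample multiplies every coefficient."""
    regs = [0] * (len(h) - 1)                    # partial-sum registers
    out = []
    for sample in x:
        products = [c * sample for c in h]       # data input fixed for all N multiplies
        out.append(products[0] + (regs[0] if regs else 0))
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0)
                for k in range(len(regs))]
    return out

print(fir_transposed([1, 3, 4, 6], [2, -1, 5]))  # prints [2, 5, 10]
```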
6-13) Use of Multiple Multipliers (2SC or SM) [9, 10]
Use of Multiple Multipliers (2SC or SM) (cont.)
- 2SC and DF
- SM representation and TDF
Use of Multiple Multipliers (2SC or SM) (cont.)
Results for a 64-tap BPF (2SC)

Multipliers | DF/norm Swcap | DF/norm red. (%) | DF/min Swcap | DF/min red. (%) | TDF/norm Swcap | TDF/norm red. (%) | TDF/min Swcap | TDF/min red. (%)
1           | 6898          | ----             | 5938         | 13.9            | 4513           | 34.6              | 2298          | 66.7
2           | 6906          | ----             | 5934         | 14.1            | 4542           | 34.2              | 2319          | 66.4
4           | 6884          | ----             | 5953         | 13.5            | 4644           | 32.5              | 2475          | 64.1
8           | 6922          | ----             | 6018         | 13.1            | 4878           | 29.5              | 2788          | 59.7

- DF: Direct Form
- TDF: Transposed Direct Form
- Norm: normal
- Min: minimum Hamming distance
References
[1] Zhan Yu, Meng-Lin Yu, Kamran Azadet and Alan N. Willson Jr., "The use of reduced two's complement representation in low power DSP design", IEEE, 2002
[2] M. Zheng and A. Albicki, "Low power and high speed multiplication design through mixed number representation", IEEE, 1995
[3] M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Low-Power Realization of FIR Filters on Programmable DSPs", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 6, No. 4, December 1998
[4] M. R. Stan and W. P. Burleson, "Bus-Invert Coding for Low Power I/O", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 1, March 1995
[5] A. P. Chandrakasan and R. W. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits", Proceedings of the IEEE, Vol. 83, No. 4, April 1995
[6] N. Sankarayya, Kaushik Roy and Debashis Bhattacharya, "Algorithms for Low Power and High Speed FIR Filter Realization Using Differential Coefficients", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 44, No. 6, June 1997
[7] A. T. Erdogan and T. Arslan, "A Coefficient Segmentation Algorithm for Low Power Implementation of FIR Filters", IEEE, 1999
[8] A. T. Erdogan and T. Arslan, "Low Power Block Based FIR Filtering Cores", ISCAS 2003
[9] A. T. Erdogan and T. Arslan, "High throughput FIR filter design for low power SoC applications", IEEE, 2000
[10] A. T. Erdogan and T. Arslan, "Low power implementation of high throughput FIR filter", IEEE, 2002