Low Power Techniques in FIR Filters


1
Low Power Techniques in FIR Filters

Mohsen Saneei
DSP Implementation Systems Course Seminar

Spring 83
2
Outline

Power Elements
Block diagram of an FIR filter
Number Representation techniques for low power
Reduced 2SC Representation
Mixed Number Representation
Bus coding
Gray Code addressing
Bus Invert Coding
Bus Bit Reordering
Parallel Processing and Pipelining

3

Outline (cont.)

Low power techniques in FIR filters
Coefficient Scaling
Reduced Number of Multiplications in Linear Phase Filters
Coefficient Optimization
Using Differential Coefficients
Multi-rate Architectures
Coefficient and Data Swapping in Booth Multipliers
Selective Coefficient Negation
Coefficient Ordering
Adder Input Bit Swapping
Coefficient Segmentation Algorithm
Data Block Processing
Transposed Direct Form Implementation
Use of Multiple Multipliers (2SC or SM)

4
1) Power Elements

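Sources of power dissipation in CMOS circuits
Switching power
Short-circuit power
Leakage power
Switching power (dynamic power)
$P_{dynamic} = \alpha_T \cdot C_{switch} \cdot V^2 \cdot f_{clk}$
where $\alpha_T$ is the switching activity, $C_{switch}$ the switched capacitance, $V$ the supply voltage and $f_{clk}$ the clock frequency.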
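The Python sketch below evaluates the dynamic-power expression for one operating point and two variations. It shows how each factor enters the result. The 10% activity and 100 pF switched capacitance are hypothetical values chosen only for illustration. The 2.5 V supply and 100 MHz clock echo the experimental setup reported later. The function name is ours.

```python
def dynamic_power(alpha_t: float, c_switch: float, v: float, f_clk: float) -> float:
    """Switching (dynamic) power in watts: alpha_T * C_switch * V^2 * f_clk."""
    return alpha_t * c_switch * v ** 2 * f_clk


# Hypothetical operating point: 10% activity, 100 pF switched, 2.5 V, 100 MHz.
p_nominal = dynamic_power(0.1, 100e-12, 2.5, 100e6)
p_scaled_v = dynamic_power(0.1, 100e-12, 1.25, 100e6)    # halve the supply voltage
p_half_alpha = dynamic_power(0.05, 100e-12, 2.5, 100e6)  # halve the switching activity

print(f"nominal:   {p_nominal * 1e3:.3f} mW")
print(f"V/2:       {p_scaled_v * 1e3:.3f} mW (ratio {p_scaled_v / p_nominal:.2f})")
print(f"alpha_T/2: {p_half_alpha * 1e3:.3f} mW (ratio {p_half_alpha / p_nominal:.2f})")
```

Halving $V$ cuts power to a quarter, while halving $\alpha_T$ only halves it. That is why voltage scaling and switching-activity reduction appear as separate levers in the techniques that follow.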

5
2) Block diagram of an FIR filter
6
2) Block diagram of an FIR filter (cont.)
AU for the conventional filter using two's-complement (2SC) data and coefficients
AU for the conventional filter using sign-magnitude (SM) data and coefficients
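For reference, the arithmetic that the block diagram performs can be modelled as a direct-form loop. In this model, one multiply-accumulate (MAC) unit is time-shared across the taps. This is a behavioural Python sketch with integer samples and zero initial state, not the hardware of the slides. The name fir_direct_form is ours.

```python
def fir_direct_form(h: list[int], x: list[int]) -> list[int]:
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    delay_line = [0] * len(h)                     # delay_line[k] holds x[n-k]
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift the tapped delay line
        acc = 0                                   # accumulator of the single MAC unit
        for coeff, data in zip(h, delay_line):
            acc += coeff * data                   # one multiply-accumulate per tap
        y.append(acc)
    return y


print(fir_direct_form([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0]: the impulse response
```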
7
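3-1) Reduced 2SC Representation [1]

An N-bit 2SC word $X = x_{N-1} \dots x_2 x_1 x_0$ whose value needs only m bits carries N-m redundant copies of its sign bit, e.g. $3 = 00000011$ and $-3 = 11111101$ (N = 8).
Reduced representation: keep the m-bit word $\bar{x}_{m-1} x_{m-2} \dots x_2 x_1 x_0$ (sign bit inverted) and add a constant correction vector of ones in bit positions N-1 down to m-1:
$\underbrace{x_{m-1} \cdots x_{m-1}}_{N-m+1} x_{m-2} \dots x_2 x_1 x_0 = \bar{x}_{m-1} x_{m-2} \dots x_2 x_1 x_0 + \underbrace{1 \cdots 1}_{N-m+1} \underbrace{0 \cdots 0}_{m-1}$
Example (N = 8, m = 3): $-3 = 11111101 = 00000001 + 11111100$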
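This representation can be checked numerically under one reading. In that reading, an m-bit word with its sign bit inverted, plus a constant correction vector of ones in bits N-1 down to m-1, reproduces the sign-extended N-bit 2SC value.

The sketch does three things:
- It encodes a hypothetical sign-alternating sequence of small samples in both forms.
- It verifies that adding the correction restores the 2SC words.
- It counts bus toggles.

The correction vector is constant, so it is added once and does not toggle. The function names and data are illustrative assumptions, and the sketch does not model the dynamic word-length selection reported for this filter.

```python
def to_2sc(value: int, width: int) -> int:
    """Unsigned bit pattern of `value` as a `width`-bit two's-complement word."""
    return value & ((1 << width) - 1)


def to_reduced(value: int, m: int) -> int:
    """m-bit reduced word: low m bits of the 2SC pattern with sign bit x_{m-1} inverted."""
    return to_2sc(value, m) ^ (1 << (m - 1))


def restore(reduced: int, m: int, n: int) -> int:
    """Add the constant correction vector (ones in bits n-1 .. m-1), modulo 2^n."""
    correction = ((1 << n) - 1) & ~((1 << (m - 1)) - 1)
    return (reduced + correction) & ((1 << n) - 1)


def bus_toggles(words: list[int]) -> int:
    """Total number of bit flips between consecutive bus words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


N, M = 8, 3                               # full word length, reduced word length
samples = [3, -3, 2, -1, 1, -2]           # hypothetical small-magnitude data

full = [to_2sc(v, N) for v in samples]
reduced = [to_reduced(v, M) for v in samples]
assert all(restore(r, M, N) == f for r, f in zip(reduced, full))

print("2SC toggles:    ", bus_toggles(full))     # 37
print("reduced toggles:", bus_toggles(reduced))  # 12
```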
8
3-1) Low Power Filter With Dynamic Reduced Representation
9
3-1) Experimental Results

0.25 µm CMOS
160 taps (8 taps per hybrid section)
100 MHz clock speed
10-bit coefficients
2.5 V power supply
6 mm² core size
Power dissipation:
200 mW in dynamic reduced representation mode
295 mW in fixed word-length reduced representation mode
Power saving: 32%

10
3-1) Other Examples

Booth-encoding multiplier
Mult. size | 2SC time (ns) | 2SC power (mW) | Reduced rep. time (ns) | Reduced rep. power (mW) | Power saving (%)
8x8 | 20.15 | 2.2 | 20.48 | 1.91 | 13
16x16 | 38.2 | 17.2 | 37.55 | 15 | 13

Transposed-form feed-forward equalization filter
2SC: 105.6 mW
Reduced rep.: 78.8 mW
Power saving: 25%

11
3-2) Mixed Number Representation [2]

Multiplier: Booth encoded
Multiplicand: SM
Expected Switching Activity (ESA)
Negation of a 2SC number: complement all bits, then add 1
Negation of an SM number: complement the sign bit only
So the ESA of SM numbers is lower than that of 2SC numbers
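The negation argument can be illustrated by counting how many bits change when a value is replaced by its negative in each format. The sketch assumes |value| < 2^(width-1), so both encodings are valid. It covers the negation case only and does not reproduce the ESA model behind the following table. The function names are ours.

```python
def to_2sc(value: int, width: int) -> int:
    """Unsigned bit pattern of a width-bit two's-complement number."""
    return value & ((1 << width) - 1)


def to_sm(value: int, width: int) -> int:
    """Sign-magnitude pattern: MSB is the sign, the remaining bits hold |value|."""
    sign = 1 if value < 0 else 0
    return (sign << (width - 1)) | abs(value)


def negation_flips(value: int, width: int, encode) -> int:
    """Number of bits that change when `value` is replaced by `-value`."""
    return bin(encode(value, width) ^ encode(-value, width)).count("1")


WIDTH = 8
for v in (1, 4, 5, 100):
    print(v, "2SC:", negation_flips(v, WIDTH, to_2sc), "SM:", negation_flips(v, WIDTH, to_sm))
```

In 2SC, negation flips every bit above the lowest set bit. In SM, it always flips exactly one bit.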

12
3-2) Average Probability of ESA per bit
Operand length | 8 bits | 16 bits | 32 bits | 64 bits
2SC | 0.4063 | 0.3906 | 0.3828 | 0.3789
SM | 0.0508 | 0.0244 | 0.0120 | 0.0059
Reduction (%) | 87.5 | 93.8 | 96.9 | 98.4
13
3-2) The Algorithm

Convert the multiplicand from 2SC into SM representation.
Apply the radix-4 Booth algorithm to the multiplier and generate all the partial products (PPs) in SM notation.
Convert all the partial products from SM into redundant binary (RB) representation.
Sum up all the PPs through an RB adder tree.
Convert the final result from RB into 2SC notation.
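The first steps can be sketched behaviourally, assuming an n-bit 2SC multiplier with n even. Radix-4 Booth recoding yields digits in {-2, -1, 0, 1, 2}. Each partial product is formed in SM as a sign bit plus a shifted magnitude, so a negative digit never needs a 2SC negation. The RB conversion, RB adder tree and final conversion are replaced here by an ordinary signed sum. That is a simplification, not the circuit of Zheng and Albicki. The function names are ours.

```python
def booth_radix4_digits(multiplier: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit 2SC multiplier (n even), least significant first."""
    bits = multiplier & ((1 << n) - 1)
    digits, prev = [], 0                      # prev is y_{2i-1}; y_{-1} = 0
    for i in range(0, n, 2):
        y0 = (bits >> i) & 1
        y1 = (bits >> (i + 1)) & 1
        digits.append(-2 * y1 + y0 + prev)    # d_i = -2*y_{2i+1} + y_{2i} + y_{2i-1}
        prev = y1
    return digits


def sm_partial_products(multiplicand: int, multiplier: int, n: int) -> list[tuple[int, int]]:
    """Partial products as (sign, magnitude) pairs; a negative digit only flips the sign."""
    m_sign, m_mag = (1 if multiplicand < 0 else 0), abs(multiplicand)
    pps = []
    for i, d in enumerate(booth_radix4_digits(multiplier, n)):
        sign = m_sign ^ (1 if d < 0 else 0)
        pps.append((sign, (abs(d) * m_mag) << (2 * i)))   # |d| in {0, 1, 2}: shifts only
    return pps


def multiply(multiplicand: int, multiplier: int, n: int) -> int:
    # Stand-in for steps 3-5 (SM -> RB, RB adder tree, RB -> 2SC): a plain signed sum.
    return sum(-mag if sign else mag
               for sign, mag in sm_partial_products(multiplicand, multiplier, n))


print(booth_radix4_digits(-3, 4))   # [1, -1]
print(multiply(-7, 5, 4))           # -35
```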

14
3-2) Multiplier Block Diagram
15
4-1) Gray Code Addressing [3]

In Gray code, consecutive numbers differ in exactly one bit (Hamming distance 1).
During FIR filter computation, both the coefficients and the data are accessed sequentially.
So Gray code is a natural choice for encoding the address bus.
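The saving for a sequential sweep is easy to quantify. The sketch below counts address-bus transitions for 16 consecutive addresses in plain binary and in reflected Gray code (g = b XOR (b >> 1)). The 4-bit address range and the function names are arbitrary choices for illustration.

```python
def binary_to_gray(n: int) -> int:
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def address_bus_toggles(addresses: list[int]) -> int:
    """Total bit transitions on the address bus for a sequence of addresses."""
    return sum(bin(a ^ b).count("1") for a, b in zip(addresses, addresses[1:]))


seq = list(range(16))                                          # sequential fetches
print(address_bus_toggles(seq))                                # 26 with plain binary
print(address_bus_toggles([binary_to_gray(a) for a in seq]))   # 15: one bit per step
```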

16
4-2) Bus Invert Coding [4]: Encoder and Decoder
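Bus-invert coding, as proposed by Stan and Burleson, works as follows. If more than half of the n data lines would toggle, the inverted word is sent instead and an extra invert line is asserted. The receiver re-inverts the word whenever that line is high. The sketch below is a simplified behavioural model: it compares only the data lines, not the invert line, when deciding. The function names are ours.

```python
def popcount(x: int) -> int:
    return bin(x).count("1")


def bus_invert_encode(words: list[int], width: int) -> list[tuple[int, int]]:
    """Return (bus_value, invert_line) pairs; the bus starts at all zeros."""
    mask = (1 << width) - 1
    bus, encoded = 0, []
    for w in words:
        w &= mask
        if popcount(bus ^ w) > width // 2:   # more than half the lines would toggle
            bus, inv = ~w & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
    return encoded


def bus_invert_decode(encoded: list[tuple[int, int]], width: int) -> list[int]:
    """Undo the inversion wherever the invert line is high."""
    mask = (1 << width) - 1
    return [(~b & mask) if inv else b for b, inv in encoded]


def transitions(encoded: list[tuple[int, int]]) -> int:
    """Toggles on the data lines plus the invert line, starting from an all-zero bus."""
    total, prev_bus, prev_inv = 0, 0, 0
    for b, inv in encoded:
        total += popcount(prev_bus ^ b) + (prev_inv ^ inv)
        prev_bus, prev_inv = b, inv
    return total


data = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(data, 8)
assert bus_invert_decode(enc, 8) == data
print(transitions([(w, 0) for w in data]))   # 24 without coding
print(transitions(enc))                      # 3 with bus-invert coding
```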
17
4-3) Bus Bit Reordering [3]
Reduction in the number of adjacent signal transitions in opposite directions as a function of the bus-reordering span
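The metric in this caption can be sketched in two steps. First, for a recorded trace of bus words, count the neighbouring line pairs that switch in opposite directions (0->1 next to 1->0). Second, search for a bit-to-line assignment that minimizes that count. This is an illustrative brute-force search under our own naming. It ignores the limited reordering span studied by Mehendale et al. and is only practical for narrow buses.

```python
from itertools import permutations


def line_value(word: int, order: tuple[int, ...], line: int) -> int:
    """Bit carried by physical bus line `line`; order[line] is the data bit routed there."""
    return (word >> order[line]) & 1


def opposite_adjacent_toggles(words: list[int], order: tuple[int, ...]) -> int:
    """Count neighbouring line pairs that switch in opposite directions."""
    count = 0
    for prev, cur in zip(words, words[1:]):
        deltas = [line_value(cur, order, i) - line_value(prev, order, i)
                  for i in range(len(order))]
        count += sum(1 for a, b in zip(deltas, deltas[1:]) if a * b == -1)
    return count


def best_order(words: list[int], width: int) -> tuple[int, ...]:
    """Exhaustive search over all line permutations (small widths only)."""
    return min(permutations(range(width)),
               key=lambda o: opposite_adjacent_toggles(words, o))


trace = [0b0101, 0b1010]
print(opposite_adjacent_toggles(trace, (0, 1, 2, 3)))          # 3 with the natural order
print(opposite_adjacent_toggles(trace, best_order(trace, 4)))  # 1 after reordering
```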
18
5) Parallel Processing and Pipelining [5]
Architecture | Voltage | Area (normalized) | Power (normalized)
Simple | 5 V | 1 | 1
Parallel | 2.9 V | 3.4 | 0.36
Pipelined | 2.9 V | 1.3 | 0.39
Pipelined-parallel | 2 V | 3.7 | 0.2
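The table can be read through $P_{dynamic} = \alpha_T \cdot C_{switch} \cdot V^2 \cdot f_{clk}$. This reading makes three assumptions. First, as in the two-way parallel architecture of Chandrakasan and Brodersen, each parallel datapath runs at half the clock rate. Second, the pipelined datapath keeps the full rate. Third, $\alpha_T$ is unchanged. Under these assumptions, the normalized power figures imply the following switched-capacitance ratios:

$$\frac{P_{par}}{P_{ref}} = \frac{C_{par}}{C_{ref}} \left(\frac{2.9}{5}\right)^2 \frac{1}{2} = 0.36 \;\Rightarrow\; \frac{C_{par}}{C_{ref}} \approx \frac{0.36}{0.3364 \times 0.5} \approx 2.14$$

$$\frac{P_{pipe}}{P_{ref}} = \frac{C_{pipe}}{C_{ref}} \left(\frac{2.9}{5}\right)^2 = 0.39 \;\Rightarrow\; \frac{C_{pipe}}{C_{ref}} \approx \frac{0.39}{0.3364} \approx 1.16$$

Both designs accept more switched capacitance, from duplicated datapaths or pipeline registers. In exchange, they run at a supply voltage low enough that the quadratic $V$ term dominates.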
19
6-1) Coefficient Scaling [3]

Scale all filter coefficients by a common factor K.
An optimal scaling factor K can be found such that the total Hamming distance between consecutive coefficient values is minimized.
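A minimal grid-search sketch of the idea follows. It assumes coefficients normalized to |h_i| <= 1, 10-bit 2SC storage and a search range of 0.5 <= K <= 1. The coefficient values, the grid and the function names are hypothetical. Scaling every coefficient by K scales the output by K without changing the shape of the frequency response.

```python
def quantize(coeffs: list[float], k: float, bits: int) -> list[int]:
    """Scale by k and round to `bits`-bit 2SC words (assumes |k * c| <= 1)."""
    full_scale = (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    return [round(k * c * full_scale) & mask for c in coeffs]


def total_hamming(words: list[int]) -> int:
    """Sum of Hamming distances between consecutively fetched coefficients."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def best_scale(coeffs: list[float], bits: int, candidates: list[float]) -> float:
    """Grid search for the scale factor K with the smallest total Hamming distance."""
    return min(candidates, key=lambda k: total_hamming(quantize(coeffs, k, bits)))


h = [0.02, -0.05, 0.12, 0.29, 0.38, 0.29, 0.12, -0.05, 0.02]   # hypothetical taps
grid = [0.5 + i / 100 for i in range(51)]                        # K from 0.50 to 1.00
k_opt = best_scale(h, 10, grid)
print(k_opt, total_hamming(quantize(h, 1.0, 10)), total_hamming(quantize(h, k_opt, 10)))
```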

20
6-2) Reduced Number of Multiplications in Linear Phase Filters [3]

The coefficient symmetry of linear phase FIR filters ($h_k = h_{N-1-k}$) can be used to halve the number of multiplications per output.

N multiplications are reduced to N/2 ((N+1)/2 for odd N).
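Folding the delay line makes the saving explicit. When $h_k = h_{N-1-k}$, the two samples that share a coefficient are added first, so only ⌈N/2⌉ multiplications remain per output. The sketch below is a behavioural model with integer data. The function name is ours.

```python
def fir_symmetric(h: list[int], x: list[int]) -> tuple[list[int], int]:
    """Linear-phase FIR using coefficient symmetry; returns outputs and multiplication count."""
    n_taps = len(h)
    assert all(h[k] == h[n_taps - 1 - k] for k in range(n_taps)), "h must be symmetric"
    delay = [0] * n_taps                      # delay[k] holds x[n-k]
    y, mults = [], 0
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0
        for k in range(n_taps // 2):
            acc += h[k] * (delay[k] + delay[n_taps - 1 - k])   # pre-add, then one multiply
            mults += 1
        if n_taps % 2:                        # odd N: the centre tap has no partner
            acc += h[n_taps // 2] * delay[n_taps // 2]
            mults += 1
        y.append(acc)
    return y, mults


y, mults = fir_symmetric([1, 2, 3, 2, 1], [1, 0, 0, 0, 0, 0])
print(y, mults // 6)   # [1, 2, 3, 2, 1, 0] 3: three multiplications per output instead of five
```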
21
6-3) Coefficient Optimization [3]

Given an N-tap filter with coefficients $h_i$ that satisfy the desired response in terms of pass-band ripple and stop-band attenuation,
find a new set of coefficients $k_i \cdot h_i$ such that the total Hamming distance between successive coefficients is minimized while the desired filter characteristics are still satisfied.
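Such an optimization can be sketched greedily under clearly hypothetical choices:
- The coefficients are already quantized to 8-bit integers.
- Each step tries a ±1 LSB change to one coefficient.
- The filter specification is approximated by requiring the sampled magnitude response to stay within a tolerance of the original.

The actual constraints are pass-band ripple and stop-band attenuation. This is an illustration of the search, not the algorithm of Mehendale et al.

```python
import cmath
import math


def magnitude_response(h: list[int], points: int = 64) -> list[float]:
    """|H(e^{jw})| sampled at w = pi * i / points, i = 0..points."""
    return [abs(sum(c * cmath.exp(-1j * math.pi * i / points * k) for k, c in enumerate(h)))
            for i in range(points + 1)]


def total_hamming(words: list[int], bits: int) -> int:
    """Hamming distance between successive coefficients stored as `bits`-bit 2SC words."""
    mask = (1 << bits) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))


def optimize(q: list[int], bits: int, tol: float) -> list[int]:
    """Greedy +/-1 LSB search: accept a change if the Hamming distance drops and the
    magnitude response stays within `tol` of the original (a hypothetical stand-in
    for the pass-band ripple / stop-band attenuation check)."""
    reference = magnitude_response(q)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    best, best_hd = list(q), total_hamming(q, bits)
    improved = True
    while improved:                           # terminates: best_hd strictly decreases
        improved = False
        for i in range(len(best)):
            for delta in (-1, 1):
                trial = best.copy()
                trial[i] += delta
                if not lo <= trial[i] <= hi:
                    continue
                hd = total_hamming(trial, bits)
                if hd >= best_hd:
                    continue
                response = magnitude_response(trial)
                if max(abs(a - b) for a, b in zip(response, reference)) <= tol:
                    best, best_hd, improved = trial, hd, True
    return best


q = [3, -7, 12, 40, 63, 40, 12, -7, 3]        # hypothetical 8-bit quantized taps
opt = optimize(q, bits=8, tol=2.0)
print(total_hamming(q, 8), "->", total_hamming(opt, 8))
```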

22
Coefficient Optimization: an optimization algorithm
23
Hamming distance (HD) and adjacent signal toggles (Ts) after coefficient scaling and optimization
N | HD initial | Ts initial | NL: HD opt. | NL: Ts opt. | NL: HD red. (%) | NL: Ts red. (%) | LP: HD opt. | LP: Ts opt. | LP: HD red. (%) | LP: Ts red. (%)
24 | 180 | 50 | 118 | 12 | 34 | 76 | 118 | 14 | 34 | 72
28 | 214 | 44 | 138 | 6 | 36 | 86 | 140 | 8 | 35 | 82
29 | 220 | 16 | 156 | 12 | 29 | 25 | 154 | 10 | 30 | 37
34 | 258 | 36 | 168 | 14 | 35 | 61 | 178 | 16 | 31 | 56
41 | 292 | 44 | 258 | 25 | 12 | 43 | 264 | 28 | 10 | 36
50 | 372 | 58 | 298 | 19 | 20 | 67 | 302 | 20 | 19 | 66
NL: nonlinear phase, LP: linear phase, opt.: after coefficient optimization, red.: reduction
24
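6-4) Using Differential Coefficients [6]

$y_{n-2} = h_0 x_{n-2} + h_1 x_{n-3} + h_2 x_{n-4} + h_3 x_{n-5}$
$y_{n-1} = h_0 x_{n-1} + h_1 x_{n-2} + h_2 x_{n-3} + h_3 x_{n-4}$
$y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$
$h_1 x_{n-1} = h_0 x_{n-1} + (h_1 - h_0) x_{n-1}$
$h_3 x_{n-3} = h_2 x_{n-3} + (h_3 - h_2) x_{n-3}$
$h_2 x_{n-2} = h_1 x_{n-2} + (h_2 - h_1) x_{n-2}$
$h_1 x_{n-2} = h_0 x_{n-2} + (h_1 - h_0) x_{n-2}$
In each identity, the first product on the right was already computed for the previous output, so only the difference term needs a new multiplication.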
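Each product $h_k x_{n-k}$ needed for $y_n$ equals $h_{k-1} x_{n-k}$, which was computed for $y_{n-1}$, plus $(h_k - h_{k-1}) x_{n-k}$. A first-order sketch of this reuse follows, assuming integer data and zero initial state. Each output needs one full multiplication $h_0 x_n$, plus N-1 multiplications by the differences $\delta_k = h_k - h_{k-1}$. These differences have smaller magnitude when neighbouring coefficients are close. Sankarayya et al. also consider other difference orders; only the simplest case is shown, and the function name is ours.

```python
def fir_differential(h: list[int], x: list[int]) -> list[int]:
    """FIR via first-order differential coefficients delta_k = h_k - h_{k-1}."""
    n_taps = len(h)
    deltas = [h[k] - h[k - 1] for k in range(1, n_taps)]
    history = [0] * n_taps            # history[k] holds x[n-k]
    products = [0] * n_taps           # products[k] holds h_k * x[n-1-k] from the previous output
    y = []
    for sample in x:
        history = [sample] + history[:-1]
        new_products = [h[0] * sample]                # the only full-coefficient multiply
        for k in range(1, n_taps):
            # h_k x_{n-k} = h_{k-1} x_{n-k} (stored last step) + (h_k - h_{k-1}) x_{n-k}
            new_products.append(products[k - 1] + deltas[k - 1] * history[k])
        products = new_products
        y.append(sum(products))
    return y


print(fir_differential([3, 5, 6, 5], [1, 2, -1, 4, 0, 3]))   # [3, 11, 13, 24, 24, 28]
```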

25
6-4) Using Differential Coefficients (cont.)
26
6-5) Multi-rate Architectures [3]
$X(z) = X_e(z^2) + z^{-1} X_o(z^2)$, $Y(z) = Y_e(z^2) + z^{-1} Y_o(z^2)$, $H(z) = H_e(z^2) + z^{-1} H_o(z^2)$

Results
An N-tap direct form architecture requires N multiplications and (N-1) additions per output.
An N-tap multi-rate architecture requires 3N/4 multiplications and (3N+2)/4 additions per output.
30 to 50% power saving
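The even/odd decomposition can be checked behaviourally, assuming even N and an even number of input samples. The three half-length subfilters are $H_e$, $H_o$ and $H_e + H_o$. The two output phases are recovered as follows:
- The even output is $H_e X_e$ plus $H_o X_o$ delayed by one half-rate sample.
- The odd output is $(H_e + H_o)(X_e + X_o) - H_e X_e - H_o X_o$.

Each subfilter has N/2 taps, and the three of them together produce two outputs. This is where 3N/4 multiplications per output comes from. The plain convolution loops stand in for the subfilters, and the function names are ours.

```python
def convolve(a: list[int], b: list[int]) -> list[int]:
    """Full linear convolution of two integer sequences (direct form)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fast_fir_2parallel(h: list[int], x: list[int]) -> list[int]:
    """Return y[0:len(x)] for y = h * x, using three half-length subfilters."""
    assert len(h) % 2 == 0 and len(x) % 2 == 0
    he, ho = h[0::2], h[1::2]
    xe, xo = x[0::2], x[1::2]
    p0 = convolve(he, xe)                                    # H_e X_e
    p1 = convolve(ho, xo)                                    # H_o X_o
    p2 = convolve([a + b for a, b in zip(he, ho)],
                  [a + b for a, b in zip(xe, xo)])           # (H_e + H_o)(X_e + X_o)
    y = []
    for i in range(len(x) // 2):
        y_even = p0[i] + (p1[i - 1] if i >= 1 else 0)        # H_o X_o delayed by one half-rate step
        y_odd = p2[i] - p0[i] - p1[i]                        # cross terms H_e X_o + H_o X_e
        y.extend([y_even, y_odd])
    return y


print(fast_fir_2parallel([1, 2, 3, 4], [1, -1, 2, 0, 3, 5]))   # [1, 1, 3, 5, 5, 19]
```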

27
6-6) Coefficient and Data Swapping in Booth Multipliers [3]

Power dissipation in a Booth multiplier depends on the number of 1s in the Booth-encoded input.
So the coefficient and data inputs to the multiplier can be swapped appropriately to reduce power dissipation in the multiplier.
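Since the product is commutative, either operand can drive the Booth encoder. The sketch below uses the count of non-zero radix-4 Booth digits as a proxy for "the number of 1s in the Booth-encoded input". That proxy, the 8-bit width and the function names are assumptions.

```python
def booth_radix4_digits(value: int, n: int) -> list[int]:
    """Radix-4 Booth digits (each in -2..2) of an n-bit 2SC value, n even, LSB first."""
    bits = value & ((1 << n) - 1)
    digits, prev = [], 0
    for i in range(0, n, 2):
        y0, y1 = (bits >> i) & 1, (bits >> (i + 1)) & 1
        digits.append(-2 * y1 + y0 + prev)
        prev = y1
    return digits


def nonzero_digits(value: int, n: int) -> int:
    """Number of non-zero Booth digits, i.e. partial products that are not zero."""
    return sum(1 for d in booth_radix4_digits(value, n) if d != 0)


def assign_operands(coeff: int, data: int, n: int) -> tuple[int, int]:
    """Return (booth_encoded_input, other_input): Booth-encode whichever operand
    has fewer non-zero digits."""
    if nonzero_digits(data, n) < nonzero_digits(coeff, n):
        return data, coeff
    return coeff, data


print(assign_operands(85, 64, 8))   # (64, 85): 85 = 01010101 has 4 non-zero digits, 64 has 1
```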

28
6-7) Selective Coefficient Negation [3]

For each coefficient $h_i$, either $h_i$ or $-h_i$ is stored in the coefficient memory.
The adder is replaced with an adder/subtractor.
Result:
reduces the number of 1s in the coefficient input
reduces the Hamming distance between consecutive coefficients
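The selection can be sketched greedily, assuming 8-bit 2SC coefficients that exclude -2^7, whose negation overflows. The selection rule is our own: choose the sign that minimizes the Hamming distance to the previously stored coefficient. The boolean flag is what would drive the adder/subtractor. The function names are ours.

```python
def to_2sc(value: int, bits: int) -> int:
    """Unsigned bit pattern of a bits-wide two's-complement number."""
    return value & ((1 << bits) - 1)


def select_signs(coeffs: list[int], bits: int) -> list[tuple[int, bool]]:
    """For each h_i store h_i or -h_i, whichever is closer (in Hamming distance) to the
    previously stored word; the flag tells the adder/subtractor to subtract."""
    stored, prev = [], 0
    for h in coeffs:
        plain, negated = to_2sc(h, bits), to_2sc(-h, bits)
        if bin(prev ^ negated).count("1") < bin(prev ^ plain).count("1"):
            stored.append((negated, True))    # memory holds -h_i; accumulate by subtracting
            prev = negated
        else:
            stored.append((plain, False))
            prev = plain
    return stored


print(select_signs([-1, -3, 2], 8))   # [(1, True), (3, True), (2, False)]
```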

29
6-8) Coefficient Ordering [3]

Summation is commutative and associative.
So $y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3} = h_1 x_{n-1} + h_3 x_{n-3} + h_0 x_n + h_2 x_{n-2}$
The order of the coefficients, together with their data samples, can therefore be changed in memory to minimize the Hamming distance.
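Because the terms can be summed in any order, the coefficient fetch sequence is free as long as each coefficient stays paired with its data sample. For very short filters, an exhaustive search over orderings finds the minimum. Practical filter lengths would need a heuristic, because the problem resembles a travelling-salesman tour. The coefficient values and function names below are hypothetical.

```python
from itertools import permutations


def total_hamming(words: list[int]) -> int:
    """Sum of Hamming distances between consecutively fetched words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def best_fetch_order(coeff_words: list[int]) -> list[int]:
    """Exhaustive search for the coefficient order with the smallest total Hamming
    distance; the same permutation must be applied to the data memory (small N only)."""
    best = min(permutations(range(len(coeff_words))),
               key=lambda order: total_hamming([coeff_words[i] for i in order]))
    return list(best)


words = [0b0111, 0b1000, 0b0110, 0b1001]          # hypothetical 4-bit coefficients
order = best_fetch_order(words)
print(total_hamming(words), "->", total_hamming([words[i] for i in order]))   # 11 -> 5
```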

30
Hamming distance and adjacent signal toggles after selective coefficient negation, scaling and ordering
N | HD initial | Opt. scale factor | HD opt. | HD red. (%) | Toggles initial | Toggles opt. | Toggles red. (%)
16 | 102 | 0.9761 | 34 | 67 | 8 | 1 | 88
24 | 158 | 0.7087 | 44 | 72 | 20 | 3 | 85
32 | 204 | 0.7685 | 58 | 72 | 22 | 3 | 86
36 | 242 | 0.9263 | 62 | 74 | 28 | 9 | 68
40 | 280 | 0.7321 | 66 | 76 | 32 | 5 | 84
48 | 350 | 0.7000 | 76 | 78 | 50 | 4 | 92
64 | 452 | 0.8217 | 80 | 82 | 54 | 6 | 89
72 | 510 | 0.7580 | 88 | 83 | 52 | 9 | 83
96 | 700 | 0.7182 | 106 | 85 | 64 | 6 | 91
128 | 952 | 0.7764 | 108 | 89 | 84 | 5 | 94
31
6-9) Adder Input Bit Swapping [3]
Bits | HD initial | HD final | HD red. (%) | Adjacent toggles initial | Adjacent toggles final | Adjacent toggles red. (%)
8 | 7953 | 5937 | 25.3 | 1836 | 1090 | 40.6
12 | 11979 | 8925 | 25.5 | 2766 | 1791 | 35.2
16 | 15945 | 11865 | 25.6 | 3545 | 2170 | 38.8
32
6-10) Coefficient Segmentation Algorithm [7]

Coefficient set $\{h_0, h_1, h_2, h_3, \dots, h_{N-1}\}$
For a given coefficient $h_k$, the algorithm splits it as $h_k = s_k + m_k$, where
$s_k$ is the largest power of 2 smaller than $h_k$, and
$m_k = h_k - s_k$ is a positive number.
$h_k \cdot x_k = s_k \cdot x_k + m_k \cdot x_k$, where $s_k \cdot x_k$ is a shift and $m_k \cdot x_k$ is a multiplication.
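With integer arithmetic the split is easy to check. The sketch assumes positive integer coefficients of at least 2; the slide does not say how negative or unit coefficients are handled. Since $s_k < h_k \le 2 s_k$, the remainder satisfies $m_k \le s_k$. The true multiplication therefore uses a narrower operand. The function names are ours.

```python
def segment(h: int) -> tuple[int, int]:
    """Split an integer coefficient h >= 2 into (s, m): s is the largest power of two
    strictly smaller than h, and m = h - s > 0."""
    if h < 2:
        raise ValueError("sketch assumes integer coefficients h >= 2")
    s = 1 << ((h - 1).bit_length() - 1)
    return s, h - s


def segmented_multiply(h: int, x: int) -> int:
    """h * x computed as a shift (s * x) plus a smaller multiplication (m * x)."""
    s, m = segment(h)
    shift = s.bit_length() - 1
    return (x << shift) + m * x


print(segment(13), segmented_multiply(13, 7))   # (8, 5) 91
```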

33
Coefficient Segmentation Algorithm (cont.)
Multiplier size | Conventional Swcap/mult (pF) | New Swcap/mult (pF) | Reduction (%)
8-bit | 14.88 | 5.57 | 62.56
16-bit | 113.00 | 51.52 | 54.41
24-bit | 413.81 | 260.08 | 37.15
34
6-11) Data Block Processing [8]

$y_{n-1} = h_0 x_{n-1} + h_1 x_{n-2} + h_2 x_{n-3} + h_3 x_{n-4}$
$y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$
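The two equations use every coefficient twice: $h_k$ multiplies $x_{n-k}$ for $y_n$ and $x_{n-1-k}$ for $y_{n-1}$. A way to exploit this can be sketched by assuming a coefficient-outer loop over blocks of consecutive outputs. Each coefficient is then loaded onto the multiplier once per block rather than once per output. The block size and function name are illustrative, not taken from Erdogan and Arslan.

```python
def fir_block(h: list[int], x: list[int], block: int) -> tuple[list[int], int]:
    """Compute `block` consecutive outputs per pass with a coefficient-outer loop.
    Returns the outputs and how many times a coefficient was loaded onto the multiplier."""
    y = [0] * len(x)
    coeff_loads = 0
    for start in range(0, len(x), block):
        outputs = range(start, min(start + block, len(x)))
        for k, coeff in enumerate(h):
            coeff_loads += 1                  # h[k] now stays fixed for the whole block
            for n in outputs:
                if n - k >= 0:
                    y[n] += coeff * x[n - k]  # only the data operand changes
    return y, coeff_loads


print(fir_block([1, 2, 3], [1, 0, 2, -1, 3, 0], block=2))   # ([1, 2, 5, 3, 7, 3], 9)
print(fir_block([1, 2, 3], [1, 0, 2, -1, 3, 0], block=1))   # ([1, 2, 5, 3, 7, 3], 18)
```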

35
Data Block Processing
Algorithm | Power, 2SC (mW) | Power, SM (mW) | Area, 2SC (mm²) | Area, SM (mm²)
Conventional | 7.61 | 5.49 | 0.71 | 0.74
Block processing | 5.22 | 3.85 | 0.73 | 0.73
36
6-12) Transposed Direct Form Implementation (TDF) [3, 9]

In the direct form (DF), both inputs of the multiplier receive new data for each multiplication.
In TDF, the data input of the multiplier remains unchanged for a number of consecutive multiplications equal to the filter length.
This reduces the switching activity (SA) on the data bus and at the data input of the multiplier.

Direct Form
Transposed Direct Form
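The difference can be made visible with a behavioural sketch, assuming a single time-shared multiplier. In DF, the data operand changes on every multiplication. In TDF, it stays at x[n] for all N products of that sample. The recurrence below is the standard transposed form. The 8-bit width, the toy filter and the function names are illustrative assumptions.

```python
def tdf_fir(h: list[int], x: list[int]) -> list[int]:
    """Transposed direct form: s_k[n] = h_k x[n] + s_{k+1}[n-1], y[n] = s_0[n]."""
    n_taps = len(h)
    state = [0] * n_taps                      # partial sums held in the delay registers
    y = []
    for sample in x:
        new_state = [0] * n_taps
        for k in range(n_taps):
            carried = state[k + 1] if k + 1 < n_taps else 0
            new_state[k] = h[k] * sample + carried   # every product uses the same sample
        state = new_state
        y.append(state[0])
    return y


def data_operand_toggles(operands: list[int], bits: int = 8) -> int:
    """Bit flips at a time-shared multiplier's data input for a given operand sequence."""
    mask = (1 << bits) - 1
    return sum(bin((a ^ b) & mask).count("1") for a, b in zip(operands, operands[1:]))


h, x = [1, 2, 3], [1, 2, 3]
# Data operands seen by one time-shared multiplier under each schedule.
df_ops = [x[n - k] if n - k >= 0 else 0 for n in range(len(x)) for k in range(len(h))]
tdf_ops = [x[n] for n in range(len(x)) for _ in h]
print(tdf_fir(h, x))                                                # [1, 4, 10]
print(data_operand_toggles(df_ops), data_operand_toggles(tdf_ops))  # 10 3
```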
37
6-13) Use of Multiple Multipliers (2SC or SM) [9, 10]
38
Use of Multiple Multipliers (2SC or SM)
2SC representation and DF
SM representation and TDF
39
Use of Multiple Multipliers (2SC or SM)
40
Use of Multiple Multipliers (2SC or SM)
Results for a 64-tap band-pass filter (BPF), 2SC
Multipliers | DF/norm Swcap | DF/norm red. (%) | DF/min Swcap | DF/min red. (%) | TDF/norm Swcap | TDF/norm red. (%) | TDF/min Swcap | TDF/min red. (%)
1 | 6898 | ---- | 5938 | 13.9 | 4513 | 34.6 | 2298 | 66.7
2 | 6906 | ---- | 5934 | 14.1 | 4542 | 34.2 | 2319 | 66.4
4 | 6884 | ---- | 5953 | 13.5 | 4644 | 32.5 | 2475 | 64.1
8 | 6922 | ---- | 6018 | 13.1 | 4878 | 29.5 | 2788 | 59.7
Reductions are relative to DF/norm in the same row.

DF: Direct Form
TDF: Transposed Direct Form
Norm: normal
Min: minimum Hamming distance

41
References

[1] Z. Yu, M.-L. Yu, K. Azadet and A. N. Willson, Jr., "The use of reduced two's-complement representation in low-power DSP design," IEEE, 2002.
[2] M. Zheng and A. Albicki, "Low power and high speed multiplication design through mixed number representation," IEEE, 1995.
[3] M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Low-power realization of FIR filters on programmable DSPs," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 4, Dec. 1998.
[4] M. R. Stan and W. P. Burleson, "Bus-invert coding for low-power I/O," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 1, Mar. 1995.
[5] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proceedings of the IEEE, vol. 83, no. 4, Apr. 1995.

42
References (cont.)

[6] N. Sankarayya, K. Roy and D. Bhattacharya, "Algorithms for low power and high speed FIR filter realization using differential coefficients," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 44, no. 6, June 1997.
[7] A. T. Erdogan and T. Arslan, "A coefficient segmentation algorithm for low power implementation of FIR filters," IEEE, 1999.
[8] A. T. Erdogan and T. Arslan, "Low power block based FIR filtering cores," ISCAS 2003.
[9] A. T. Erdogan and T. Arslan, "High throughput FIR filter design for low power SoC applications," IEEE, 2000.
[10] A. T. Erdogan and T. Arslan, "Low power implementation of high throughput FIR filter," IEEE, 2002.