Low Power Techniques in FIR Filters

1
Low Power Techniques in FIR Filters
  • Mohsen Saneei
  • DSP Implementation Systems Course Seminar

Spring 83
2
Outline
  1. Power Elements
  2. Block diagram of an FIR filter
  3. Number Representation techniques for low power
    • Reduced 2SC Representation
    • Mixed Number Representation
  4. Bus coding
    • Gray Code addressing
    • Bus Invert Coding
    • Bus Bit Reordering
  5. Parallel Processing and Pipelining

3

Outline (cont.)
  6. Low power techniques in FIR filters
    • Coefficient Scaling
    • Reduced Number of Multiplications in Linear Phase Filters
    • Coefficient Optimization
    • Using Differential Coefficients
    • Multi-rate Architectures
    • Coefficient and Data Swapping in Booth Multipliers
    • Selective Coefficient Negation
    • Coefficient Ordering
    • Adder input Bit Swapping
    • Coefficient Segmentation Algorithm
    • Data Block Processing
    • Transposed Direct form Implementation
    • Use of Multiple Multiplier (2SC or SM)

4
1) Power Elements
  • Sources of power dissipation in CMOS circuits
    • Switching power
    • Short-circuit power
    • Leakage power
  • Switching power (dynamic power)
    • $P_{dynamic} = \alpha_T \cdot C_{switch} \cdot V^2 \cdot f_{clk}$
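
The dynamic power formula above can be evaluated directly. The sketch below is only an illustration: the function name and the operating point (activity factor, switched capacitance) are hypothetical values, not figures from the presentation. It shows the quadratic dependence on supply voltage.

```python
def dynamic_power(alpha_t, c_switch, vdd, f_clk):
    """Switching power in watts: alpha_T * C_switch * V^2 * f_clk."""
    return alpha_t * c_switch * vdd ** 2 * f_clk


# Hypothetical operating point, chosen only for illustration.
p_full = dynamic_power(alpha_t=0.2, c_switch=1e-9, vdd=2.5, f_clk=100e6)
p_half = dynamic_power(alpha_t=0.2, c_switch=1e-9, vdd=1.25, f_clk=100e6)

# Halving the supply voltage at a fixed clock quarters the switching power.
print(p_full, p_half, p_half / p_full)
```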

5
2) Block diagram of an FIR filter
6
2) Block diagram of an FIR filter (cont.)
AU for the conventional filter using 2SC data and coefficients
AU for the conventional filter using SM data and coefficients
7
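3-1) Reduced 2SC Representation [1]
  • $X = x_{N-1} \ldots x_2 x_1 x_0$
  • $3 = 00000011$, $3 - 4 = 11111111$
  • Sign-extended word and correction vector:

      x_{m-1} x_{m-2} ... x_2 x_1 x_0
      1 1 . . . . 1 1                          (correction vector)
      ------------------------------------------------------------
      X = x_{m-1} x_{m-1} ... x_{m-1} x_{m-1} x_{m-2} ... x_2 x_1 x_0

  • $-3 = 11111101$, formed from $00000001$ and the correction vector bits $111111$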
8
3-1) Low Power Filter With Dynamic Reduced Representation
9
3-1) Experimental Results
  • 0.25 µm CMOS
  • 160 taps (8 taps per hybrid section)
  • 100 MHz clock speed
  • 10-bit coefficients
  • 2.5 V power supply
  • 6 mm² core size
  • Power dissipation
    • 200 mW in dynamic reduced representation mode
    • 295 mW in fixed word-length reduced representation mode
  • Power saving: 32%

10
3-1) Other examples
  • Booth-encoding multiplier

Mult. size   2SC                      Reduced rep.             Power saving (%)
             Time (ns)   Power (mW)   Time (ns)   Power (mW)
8x8          20.15       2.2          20.48       1.91         13
16x16        38.2        17.2         37.55       15           13

  • Transposed form feed-forward equalization filter
    • 2SC: 105.6 mW
    • Reduced rep.: 78.8 mW
    • Power saving: 25%

11
3-2) Mixed Number Representation [2]
  • Multiplier: Booth encoding
  • Multiplicand: SM
  • Expected Switching Activity (ESA)
  • Negating a 2SC number: complement all bits and then add 1
  • Negating an SM number: complement the sign bit only
  • So the ESA of an SM number is lower than that of a 2SC number
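
The difference between the two negation rules above can be counted bit by bit. This is a minimal sketch, not the ESA model of [2]: the helper names are hypothetical, and it only counts how many bits change when a single value is negated in each representation.

```python
def to_twos_complement(value, bits):
    """Two's-complement (2SC) bit pattern of a signed integer."""
    return value & ((1 << bits) - 1)


def to_sign_magnitude(value, bits):
    """Sign-magnitude (SM) bit pattern: sign bit followed by |value|."""
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def negation_toggles(value, bits=8):
    """Bits that change when value becomes -value, as (2SC, SM)."""
    tc = hamming(to_twos_complement(value, bits),
                 to_twos_complement(-value, bits))
    sm = hamming(to_sign_magnitude(value, bits),
                 to_sign_magnitude(-value, bits))
    return tc, sm


# 3 -> -3 in 8 bits: 00000011 -> 11111101 in 2SC, 00000011 -> 10000011 in SM
print(negation_toggles(3))
```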

12
3-2) Average Probability of ESA per bit
Operand length   8 bits   16 bits   32 bits   64 bits
2SC              0.4063   0.3906    0.3828    0.3789
SM               0.0508   0.0244    0.0120    0.0059
Reduction (%)    87.5     93.8      96.9      98.4
13
3-2) The Algorithm
  1. Convert the multiplicand from 2SC into SM representation.
  2. Apply the radix-4 Booth's algorithm to the multiplier and generate all the partial products (PPs) in SM notation.
  3. Convert all the partial products from SM into redundant binary (RB) representation.
  4. Sum all the PPs through an RB adder tree.
  5. Convert the final result from RB into 2SC notation.

14
3-2) Multiplier Block Diagram
15
4-1) Gray Code addressing [3]
  • In Gray code, the Hamming distance between consecutive numbers is 1.
  • During the FIR filter computation, both the coefficients and the data are accessed sequentially.
  • So Gray code is a suitable encoding for the address bus.
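
The claim above can be checked with a few lines of code. The sketch below uses the standard binary-reflected Gray code and compares address-bus toggles for a sequential sweep; the 16-entry address range and the function names are assumptions made for illustration.

```python
def binary_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def bus_toggles(addresses):
    """Total bit toggles on the bus for a sequence of addresses."""
    return sum(hamming(a, b) for a, b in zip(addresses, addresses[1:]))


addresses = list(range(16))  # sequential access, as in an FIR inner loop
binary_cost = bus_toggles(addresses)
gray_cost = bus_toggles([binary_to_gray(a) for a in addresses])
print(binary_cost, gray_cost)
```

With Gray-coded addresses each sequential step toggles exactly one address line, so the total equals the number of steps.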

16
4-2) Bus Invert Coding: Encoder and Decoder [4]
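
The slide shows the encoder and decoder only as a figure. As a rough sketch of the basic bus-invert idea in [4], the encoder below sends the complemented word together with an invert flag whenever more than half of the data lines would otherwise toggle. The 8-bit width, the all-zero initial bus state, and the function names are assumptions.

```python
def hamming(a, b):
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_flag) pairs for a sequence of data words."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for word in words:
        if hamming(prev, word) > width // 2:
            sent, invert = (~word) & mask, 1  # fewer toggles if inverted
        else:
            sent, invert = word, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(~sent) & mask if invert else sent for sent, invert in encoded]


words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words)
print(encoded, bus_invert_decode(encoded))
```

The invert flag costs one extra bus line, which is the price paid for bounding the data-line toggles per transfer to half the bus width.
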
17
4-3) Bus Bit Reordering [3]
Reduction in the number of adjacent signal transitions in opposite directions as a function of the bus-reordering span
18
5) Parallel Processing and Pipelining [5]
Architecture         Voltage   Area (normalized)   Power (normalized)
Simple               5 V       1                   1
Parallel             2.9 V     3.4                 0.36
Pipelined            2.9 V     1.3                 0.39
Pipelined-parallel   2 V       3.7                 0.2
19
6-1) Coefficient Scaling [3]
  • Scale the coefficients of the filter by a common factor.
  • An optimal scaling factor K can be found such that the total Hamming distance between consecutive coefficient values is minimized.
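
One simple way to look for such a factor K is a brute-force sweep. The sketch below is an assumption-laden illustration, not the search used in [3]: the candidate range 0.7 to 1.0, the step count, the 10-bit quantization, and the sample coefficients are all hypothetical.

```python
def quantize(coeffs, k, bits):
    """Scale by k and round to signed integers of the given word length."""
    full_scale = (1 << (bits - 1)) - 1
    return [int(round(c * k * full_scale)) for c in coeffs]


def total_hamming(words, bits):
    """Sum of Hamming distances between consecutive 2SC words."""
    mask = (1 << bits) - 1
    return sum(bin((a ^ b) & mask).count("1")
               for a, b in zip(words, words[1:]))


def best_scale(coeffs, bits=10, k_min=0.7, k_max=1.0, steps=301):
    """Sweep K over [k_min, k_max] and keep the smallest total distance."""
    best_k, best_hd = None, None
    for i in range(steps):
        k = k_min + (k_max - k_min) * i / (steps - 1)
        hd = total_hamming(quantize(coeffs, k, bits), bits)
        if best_hd is None or hd < best_hd:
            best_k, best_hd = k, hd
    return best_k, best_hd


# Hypothetical symmetric low-pass coefficients, for illustration only.
h = [-0.02, 0.05, 0.12, 0.25, 0.31, 0.25, 0.12, 0.05, -0.02]
k_opt, hd_opt = best_scale(h)
print(k_opt, hd_opt, total_hamming(quantize(h, 1.0, 10), 10))
```

Scaling all coefficients by the same K changes only the overall gain of the filter, which is why the frequency response shape is preserved.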

20
6-2) Reduced Number of Multiplications in Linear Phase Filters [3]
  • The coefficient symmetry of linear phase FIR filters can be used to halve the number of multiplications per output.

N multiplications are reduced to N/2 multiplications
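
The saving comes from adding the two samples that share a coefficient before multiplying. A minimal sketch, assuming a symmetric coefficient set (h[i] equals h[N-1-i]) and a window holding the N most recent samples; the function names and test values are hypothetical.

```python
def fir_direct(h, window):
    """One output of an N-tap FIR filter: N multiplications."""
    return sum(c * x for c, x in zip(h, window))


def fir_symmetric(h, window):
    """Same output using coefficient symmetry; returns (output, multiplications)."""
    n = len(h)
    acc = 0
    mults = 0
    for i in range(n // 2):
        # Pre-add the two samples that share coefficient h[i].
        acc += h[i] * (window[i] + window[n - 1 - i])
        mults += 1
    if n % 2:  # middle tap of an odd-length filter has no partner
        acc += h[n // 2] * window[n // 2]
        mults += 1
    return acc, mults


h = [1, 2, 3, 3, 2, 1]
window = [5, -1, 4, 0, 2, 7]
print(fir_direct(h, window), fir_symmetric(h, window))
```
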
21
6-3) Coefficient Optimization [3]
  • Given an N-tap filter with coefficients hi that satisfy the response in terms of pass-band ripple and stop-band attenuation.
  • Find a new set of coefficients ki.hi such that the total Hamming distance between successive coefficients is minimized while still satisfying the desired filter characteristics.

22
Coefficient Optimization: an optimization algorithm
23
Hamming distance (HD) and adjacent signal toggles (Ts) after coefficient scaling and optimization
N    Initial     Nonlinear phase              Linear phase
                 Coeff. opt.   % red.         Coeff. opt.   % red.
     HD    Ts    HD    Ts      HD    Ts       HD    Ts      HD    Ts
24   180   50    118   12      34    76       118   14      34    72
28   214   44    138   6       36    86       140   8       35    82
29   220   16    156   12      29    25       154   10      30    37
34   258   36    168   14      35    61       178   16      31    56
41   292   44    258   25      12    43       264   28      10    36
50   372   58    298   19      20    67       302   20      19    66
24
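6-4) Using Differential Coefficients [6]
  • $Y_{n-2} = h_0 x_{n-2} + h_1 x_{n-3} + h_2 x_{n-4} + h_3 x_{n-5}$
  • $Y_{n-1} = h_0 x_{n-1} + h_1 x_{n-2} + h_2 x_{n-3} + h_3 x_{n-4}$
  • $Y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$
  • $h_1 x_{n-1} = h_0 x_{n-1} + (h_1 - h_0) x_{n-1}$
    $h_3 x_{n-3} = h_2 x_{n-3} + (h_3 - h_2) x_{n-3}$

  • $h_2 x_{n-2} = h_1 x_{n-2} + (h_2 - h_1) x_{n-2}$

  • $h_1 x_{n-2} = h_0 x_{n-2} + (h_1 - h_0) x_{n-2}$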
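
The identities above say that each product needed now equals a product already computed for the previous output plus a term that uses only a coefficient difference. A minimal first-order sketch follows, assuming zero initial conditions; the function names and test data are hypothetical and this is not the full algorithm of [6].

```python
def fir_direct(h, x):
    """Reference FIR output, y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_differential(h, x):
    """FIR output using stored products and first-order coefficient differences."""
    taps = len(h)
    diffs = [h[k] - h[k - 1] for k in range(1, taps)]  # h[k] - h[k-1]
    prev = [0] * taps  # prev[k] holds h[k] * x[n-1-k] from the last output
    y = []
    for n in range(len(x)):
        products = [0] * taps
        products[0] = h[0] * x[n]  # only tap 0 uses the full coefficient
        for k in range(1, taps):
            sample = x[n - k] if n - k >= 0 else 0
            # h[k]*x[n-k] = h[k-1]*x[n-k] + (h[k]-h[k-1])*x[n-k]
            products[k] = prev[k - 1] + diffs[k - 1] * sample
        y.append(sum(products))
        prev = products
    return y


h = [3, 5, 4, 6]
x = [1, 2, -1, 4, 0, 3]
print(fir_differential(h, x), fir_direct(h, x))
```

The multiplications now involve the differences, which tend to be shorter words than the coefficients when neighbouring coefficients are close in value; the cost is the storage for the previous products.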

25
6-4) Using Differential Coefficients (cont.)
26
6-5) Multi-rate Architectures [3]
$X(z) = X_e(z^2) + z^{-1} X_o(z^2)$
$Y(z) = Y_e(z^2) + z^{-1} Y_o(z^2)$
$H(z) = H_e(z^2) + z^{-1} H_o(z^2)$
  • Results
  • An N-tap direct form architecture requires
  • N multiplications and (N-1) additions per output
  • But an N-tap multi-rate architecture requires
  • 3N/4 multiplications and (3N+2)/4 additions per output
  • 30-50% power saving
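
The 3N/4 figure follows from computing each pair of outputs with three half-length sub-filters instead of four. The sketch below works on whole sequences with a plain convolution helper; it assumes even-length coefficient and input lists, and all names are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def fir_multirate(h, x):
    """Filter x with h using even/odd (polyphase) parts and three sub-filters."""
    he, ho = h[0::2], h[1::2]
    xe, xo = x[0::2], x[1::2]
    a = convolve(he, xe)                          # He * Xe
    b = convolve(ho, xo)                          # Ho * Xo
    c = convolve([p + q for p, q in zip(he, ho)],
                 [p + q for p, q in zip(xe, xo)])  # (He + Ho) * (Xe + Xo)
    m = len(a)
    # Even outputs: He*Xe plus Ho*Xo delayed by one half-rate sample.
    ye = [(a[i] if i < m else 0) + (b[i - 1] if i >= 1 else 0)
          for i in range(m + 1)]
    # Odd outputs: He*Xo + Ho*Xe, recovered without a fourth sub-filter.
    yo = [c[i] - a[i] - b[i] for i in range(m)]
    y = [0] * (2 * m + 1)
    y[0::2] = ye
    y[1::2] = yo
    return y


h = [1, 2, 3, 4]
x = [1, 0, -1, 2, 3, 1]
print(fir_multirate(h, x), convolve(h, x))
```

Each sub-filter has N/2 taps and runs once per pair of outputs, giving 3N/2 multiplications per two outputs, that is 3N/4 per output.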

27
6-6) Coefficient and Data Swapping in Booth Multipliers [3]
  • Power dissipation in a Booth multiplier depends on the number of 1s in the Booth-encoded input.
  • So the coefficient and data inputs to the multiplier can be swapped appropriately to reduce power dissipation in the multiplier.
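
A sketch of the swapping decision, under the assumption that "number of 1s in the Booth-encoded input" can be approximated by the number of nonzero radix-4 Booth digits. The 8-bit word length and the function names are hypothetical; the recoding itself is the standard radix-4 rule d_i = b_{2i} + b_{2i-1} - 2 b_{2i+1}.

```python
def booth_radix4_digits(value, bits=8):
    """Radix-4 Booth digits (least significant first) of a signed integer."""
    u = value & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # each digit is in {-2,...,2}
        prev = b1
    return digits


def nonzero_digits(value, bits=8):
    """Number of partial products that are actually generated."""
    return sum(1 for d in booth_radix4_digits(value, bits) if d != 0)


def order_operands(coeff, data, bits=8):
    """Return (multiplier, multiplicand) with the sparser Booth encoding first."""
    if nonzero_digits(coeff, bits) <= nonzero_digits(data, bits):
        return coeff, data
    return data, coeff


print(booth_radix4_digits(7), booth_radix4_digits(16), order_operands(7, 16))
```

Because multiplication is commutative, swapping the two inputs leaves the product unchanged and only alters which operand is Booth-encoded.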

28
6-7) Selective Coefficient Negation [3]
  • For each coefficient $h_i$, either $h_i$ or $-h_i$ is stored in the coefficient memory.
  • The adder is replaced with an adder/subtractor.
  • Result
  • Reduces the number of 1s in the coefficient input
  • Reduces the Hamming distance between consecutive coefficients
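
A minimal sketch of the idea, assuming the selection criterion is simply "store whichever of h_i and -h_i has fewer 1s in 8-bit 2SC"; the criterion, word length, and names are assumptions, and the Hamming distance between consecutive coefficients is not optimized here.

```python
def ones(value, bits=8):
    """Number of 1 bits in the two's-complement pattern of value."""
    return bin(value & ((1 << bits) - 1)).count("1")


def selective_negate(coeffs, bits=8):
    """Return (stored_value, subtract_flag) for each coefficient."""
    stored = []
    for h in coeffs:
        if ones(-h, bits) < ones(h, bits):
            stored.append((-h, True))   # store -h, accumulate by subtracting
        else:
            stored.append((h, False))   # store h, accumulate by adding
    return stored


def fir_output(stored, window):
    """One filter output using an adder/subtractor in the accumulation."""
    acc = 0
    for (c, subtract), x in zip(stored, window):
        acc = acc - c * x if subtract else acc + c * x
    return acc


coeffs = [-1, 3, -2, 5]
window = [4, 1, 7, 2]
stored = selective_negate(coeffs)
print(stored, fir_output(stored, window))
```
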

29
6-8) Coefficient Ordering [3]
  • The summation operation is commutative and associative
  • So $Y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$
  • $= h_1 x_{n-1} + h_3 x_{n-3} + h_0 x_n + h_2 x_{n-2}$
  • We can change the order of the coefficients and data in memory to achieve the minimum Hamming distance.
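
Finding the best order is a search problem; the sketch below uses a greedy nearest-neighbour heuristic, which is an assumption made for illustration and not necessarily the method of [3]. It returns an index order so that the data samples can be reordered the same way.

```python
def hamming(a, b, bits=8):
    """Hamming distance between the two's-complement patterns of a and b."""
    return bin((a ^ b) & ((1 << bits) - 1)).count("1")


def total_distance(coeffs, order, bits=8):
    """Total Hamming distance when coefficients are fetched in this order."""
    return sum(hamming(coeffs[i], coeffs[j], bits)
               for i, j in zip(order, order[1:]))


def greedy_order(coeffs, bits=8):
    """Start at tap 0, then always fetch the closest remaining coefficient."""
    remaining = list(range(1, len(coeffs)))
    order = [0]
    while remaining:
        last = coeffs[order[-1]]
        nxt = min(remaining, key=lambda i: hamming(last, coeffs[i], bits))
        remaining.remove(nxt)
        order.append(nxt)
    return order


coeffs = [0b0001, 0b1110, 0b0011, 0b1100]
window = [2, 5, -3, 4]  # window[k] holds x[n-k]
order = greedy_order(coeffs)
y_reordered = sum(coeffs[k] * window[k] for k in order)
print(order, total_distance(coeffs, [0, 1, 2, 3]), total_distance(coeffs, order))
```

A greedy pass is not guaranteed to find the minimum, but it is cheap and the output is unchanged because only the order of the summation differs.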

30
Hamming distance and adjacent signal toggles after selective coefficient negation, scaling and ordering
N     H.D. initial   Opt. scale factor   H.D. opt.   % red.   Togs initial   Togs opt.   % red.
16    102            0.9761              34          67       8              1           88
24    158            0.7087              44          72       20             3           85
32    204            0.7685              58          72       22             3           86
36    242            0.9263              62          74       28             9           68
40    280            0.7321              66          76       32             5           84
48    350            0.7000              76          78       50             4           92
64    452            0.8217              80          82       54             6           89
72    510            0.7580              88          83       52             9           83
96    700            0.7182              106         85       64             6           91
128   952            0.7764              108         89       84             5           94
31
6-9) Adder input Bit Swapping [3]
Bits   Hamming distance             Adjacent signal toggles
       Initial   Final   % red.     Initial   Final   % red.
8      7953      5937    25.3       1836      1090    40.6
12     11979     8925    25.5       2766      1791    35.2
16     15945     11865   25.6       3545      2170    38.8
32
6-10) Coefficient Segmentation Algorithm [7]
  • Coefficient set $\{h_0, h_1, h_2, h_3, \ldots, h_{N-1}\}$
  • For a given coefficient $h_k$, the algorithm targets dividing it such that $h_k = s_k + m_k$, where
  • $s_k$ is the largest power of 2 smaller than $h_k$.
  • $m_k = h_k - s_k$ is a positive number.
  • $h_k \cdot x_k = s_k \cdot x_k + m_k \cdot x_k$
  • The $s_k \cdot x_k$ term is a shift; the $m_k \cdot x_k$ term is a multiply.
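
The split above can be sketched for integer coefficients. This illustration assumes positive integer coefficients of at least 2 and follows the wording of the bullets (a power of 2 strictly smaller than the coefficient); the function names are hypothetical and the sketch is not the full algorithm of [7].

```python
def segment(h):
    """Split h into (s, m): s a power of 2 smaller than h, m = h - s > 0."""
    if h < 2:
        raise ValueError("this sketch assumes an integer coefficient >= 2")
    s = 1 << (h.bit_length() - 1)  # largest power of 2 not exceeding h
    if s == h:                     # keep s strictly smaller than h
        s >>= 1
    return s, h - s


def segmented_product(h, x):
    """Compute h * x as a shift by log2(s) plus a narrower multiply by m."""
    s, m = segment(h)
    shift = s.bit_length() - 1
    return (x << shift) + m * x


print(segment(13), segment(8), segmented_product(13, 7))
```

The multiplier now sees m_k, which has fewer significant bits than h_k, while the s_k term needs only a shifter.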

33
Coefficient Segmentation Algorithm (cont.)
Multiplier size   Algorithm      Swcap/mult (pF)   Reduction (%)
8-bit             Conventional   14.88
8-bit             New            5.57              62.56
16-bit            Conventional   113.00
16-bit            New            51.52             54.41
24-bit            Conventional   413.81
24-bit            New            260.08            37.15
34
6-11) Data Block Processing [8]
  • $Y_{n-1} = h_0 x_{n-1} + h_1 x_{n-2} + h_2 x_{n-3} + h_3 x_{n-4}$
  • $Y_n = h_0 x_n + h_1 x_{n-1} + h_2 x_{n-2} + h_3 x_{n-3}$

35
Data Block Processing
Algorithm          Power (mW)      Area (mm²)
                   2SC     SM      2SC     SM
Conventional       7.61    5.49    0.71    0.74
Block processing   5.22    3.85    0.73    0.73
36
6-12) Transposed Direct Form implementation (TDF) [3, 9]
  • In DF, both inputs of the multiplier receive new data for each multiplication.
  • In TDF, the data input of the multiplier remains unchanged for a substantial number of multiplication operations, corresponding to the filter length.
  • This reduces the switching activity (SA) on the data bus and at the data input of the multiplier.

Direct Form
Transposed Direct Form
37
6-13) Use of Multiple Multiplier (2SC or SM) [9, 10]
38
Use of Multiple Multiplier (2SC or SM)
2SC and DF
SM representation and TDF
39
Use of Multiple Multiplier (2SC or SM)
40
Use of Multiple Multiplier (2SC or SM)
Results for a 64-tap BPF (2SC)
Mult.   DF/norm           DF/min            TDF/norm          TDF/min
        Swcap   % red.    Swcap   % red.    Swcap   % red.    Swcap   % red.
1       6898    ----      5938    13.9      4513    34.6      2298    66.7
2       6906    ----      5934    14.1      4542    34.2      2319    66.4
4       6884    ----      5953    13.5      4644    32.5      2475    64.1
8       6922    ----      6018    13.1      4878    29.5      2788    59.7
  • DF: Direct Form
  • TDF: Transposed Direct Form
  • Norm: normal
  • Min: minimum Hamming distance

41
References
  1. Zhan Yu, Meng-Lin Yu, Kamran Azadet and Alan N. Willson Jr., "The use of reduced two's complement representation in low power DSP design", IEEE 2002
  2. M. Zheng and A. Albicki, "Low power and high speed multiplication design through mixed number representation", IEEE 1995
  3. M. Mehendale, S. D. Sherlekar and G. Venkatesh, "Low-Power Realization of FIR Filters on Programmable DSPs", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 6, No. 4, December 1998
  4. M. R. Stan and W. P. Burleson, "Bus-Invert Coding for Low Power I/O", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 3, No. 1, March 1995
  5. A. P. Chandrakasan and R. W. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits", Proceedings of the IEEE, Vol. 83, No. 4, April 1995

42
References (cont.)
  6. N. Sankarayya, Kaushik Roy and Debashis Bhattacharya, "Algorithms for Low Power and High Speed FIR Filter Realization Using Differential Coefficients", IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 44, No. 6, June 1997
  7. A. T. Erdogan and T. Arslan, "A Coefficient Segmentation Algorithm for Low Power Implementation of FIR Filters", IEEE 1999
  8. A. T. Erdogan and T. Arslan, "Low Power Block Based FIR Filtering Cores", ISCAS 2003
  9. A. T. Erdogan and T. Arslan, "High throughput FIR filter design for low power SoC applications", IEEE 2000
  10. A. T. Erdogan and T. Arslan, "Low power implementation of high throughput FIR filter", IEEE 2002