Title: Approaches to Low-Power Implementations of DSP Systems
1 Approaches to Low-Power Implementations of DSP Systems
- Class Advisor: Dr. Fakhraie
- Presenter: Nariman Moezi
- DSP Design Implementation Course Seminar
- Spring 2004
2 Outline
- Reduced two's complement representation
- Low-power scheduling techniques for embedded DSP software
- Low-power multiplier
- - Mitchell-based logarithmic multiplier
- - Power-aware pipelined multiplier
3 Reduced two's complement representation
- Two's complement representation is widely used in
the implementation of arithmetic operations.
- If X has a small magnitude and switches between a
positive and a negative value, its sign extension
changes between strings of zeros and ones.
- If X has magnitude less than $2^{m-1}$ ($m < N$), we can
represent this number by the sum of an m-bit
vector and a constant vector
having a string of ones from bit N-1 to bit m-1
at the MSB side.
(Zhan Yu et al., 2002)
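The decomposition described above can be sketched in a few lines of Python. This is only an illustration, assuming an N-bit word and taking the constant vector to be the N-bit pattern of $-2^{m-1}$ (ones from bit N-1 down to bit m-1). The function names `to_reduced` and `from_reduced` are hypothetical and do not come from the cited paper.

```python
def to_reduced(x, n_bits, m):
    """Split x (|x| < 2**(m-1)) into an m-bit vector and a constant vector."""
    if not 0 < m < n_bits:
        raise ValueError("need 0 < m < n_bits")
    half = 1 << (m - 1)
    if not -half < x < half:
        raise ValueError("magnitude of x must be below 2**(m-1)")
    word_mask = (1 << n_bits) - 1
    vector = x + half                # unsigned value that fits in m bits
    constant = (-half) & word_mask   # ones from bit n_bits-1 down to bit m-1
    return vector, constant


def from_reduced(vector, constant, n_bits):
    """Recombine the two parts into the N-bit two's complement pattern of x."""
    return (vector + constant) & ((1 << n_bits) - 1)


# Show both parts for a small positive and a small negative value.
for x in (3, -3):
    v, c = to_reduced(x, n_bits=8, m=4)
    print(x, format(v, "04b"), format(c, "08b"))
```

In this sketch the constant vector is the same for +3 and -3, so only the m-bit vector toggles when the sign of X changes.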
4 Application: Low-power FIR filter using reduced
two's complement representation
- Consider a hybrid-form adaptive FIR filter,
where the inputs are 5-level
data symbols and take values in {-2, -1, 0, 1, 2}.
- Assume the coefficients are N-bit two's complement
numbers.
- Such multiplications are simply shift and
complement operations.
- Assume that we detect that the maximum magnitude
of a coefficient H is less than $2^{m-2}$. We then know
that the corresponding partial product P has a
magnitude less than $2^{m-1}$.
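A short Python sketch can make the shift-and-complement claim and the magnitude bound concrete. It assumes integer coefficients and models the hardware complement step with Python's unary minus. The names `partial_product` and `bound_holds` are hypothetical.

```python
SYMBOLS = (-2, -1, 0, 1, 2)


def partial_product(h, symbol):
    """Multiply coefficient h by a 5-level symbol using only shift and negate."""
    if symbol not in SYMBOLS:
        raise ValueError("symbol must be one of -2, -1, 0, 1, 2")
    if symbol == 0:
        return 0
    magnitude = h << 1 if abs(symbol) == 2 else h   # shift left for a factor of 2
    return -magnitude if symbol < 0 else magnitude  # negate for negative symbols


def bound_holds(m):
    """Check that |H| < 2**(m-2) implies |P| < 2**(m-1) for every symbol."""
    limit_h = 1 << (m - 2)
    limit_p = 1 << (m - 1)
    return all(abs(partial_product(h, s)) < limit_p
               for h in range(-limit_h + 1, limit_h)
               for s in SYMBOLS)


print(partial_product(5, -2), bound_holds(6))
```

The bound follows because the largest symbol magnitude is 2, which costs exactly one extra bit.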
5 Coefficient maximum magnitude detection (an
example with two taps and 6-bit coefficients)
- Partial-product generation using reduced two's
complement representation
6 As the adaptive filter updates the coefficients,
the word length of the reduced representation
changes, and so does the error introduced by
using the reduced representation. We can build a
compensation-vector correction path that imitates
the error propagation in the accumulation path.
- A test chip was implemented in 0.25 um CMOS
technology. It used a 160-tap hybrid-form filter
with 8 taps per hybrid section and a coefficient
word length of 10 bits. When operating at 2.5 V
with a 100 MHz clock, a 32% power saving was
measured, as summarized in this table.
7 Low-Power Scheduling Techniques for Embedded DSP
Software
- This section describes an instruction-level
power model for a processor (Fujitsu), and
techniques to reduce the power of this processor.
- The DSP processor has a special architecture that
allows instructions to be packed into pairs.
- The Booth multiplier on this processor is a major
source of energy consumption for DSP programs.
- So a micro-architectural power model for the
on-chip Booth multiplier is developed and analyzed
for further power minimization.
- Based on this model, an effective technique of
local code modification by operand swapping is
used to further reduce power consumption.
(S. Malik et al., IEEE Trans. VLSI Syst., 1997)
8 An example of a sequence of four instructions where
the overhead cost between instructions 1 and 3
cannot be ignored
- The sum of the measured current for the four
instructions is 204 mA.
- The sum of the base costs (37.2 + 14.4 + 36.6 + 14.4)
and the overhead costs of adjacent instructions
(18.4 + 18.4 + 18.4 + 18.4) is only 176.2 mA, which
underestimates the actual cost by 13.6%.
- The difference of 27.8 mA between the measurement
and the estimate comes from the circuit-state
overhead between the non-adjacent instructions 1 and 3.
- This is due to a special design at the inputs of
the multiplier: there is a latch between each
operand and the multiplier to retain the old
values until the next multiply instruction is
executed.
- This overhead depends on the previous and
current values of the input latches for each multiply
operation.
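The arithmetic on this slide can be checked with a few lines of Python. The numbers are the ones quoted above; the variable names are only illustrative.

```python
# Currents in mA, as quoted on the slide.
base_costs = [37.2, 14.4, 36.6, 14.4]
adjacent_overheads = [18.4, 18.4, 18.4, 18.4]
measured_total = 204.0

# Estimate from the instruction-level model (base + adjacent overhead).
estimated_total = sum(base_costs) + sum(adjacent_overheads)

# Part of the measurement the model does not explain.
shortfall = measured_total - estimated_total
shortfall_percent = 100.0 * shortfall / measured_total

print(f"estimated: {estimated_total:.1f} mA")   # 176.2 mA
print(f"shortfall: {shortfall:.1f} mA")         # 27.8 mA
print(f"shortfall: {shortfall_percent:.1f} %")  # 13.6 %
```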
9 Instruction packing for low power
- A special feature of the target DSP
processor is the capability of packing an
ALU-type instruction and a data transfer
instruction into one codeword for simultaneous
execution.
- The average current for packed instructions is
only slightly more than the average current for a
sequence of the two unpacked instructions.
Comparison of energy consumed by packed and
unpacked instructions
10 - As to the overhead cost of MAC instructions, when
MAC is packed with a data transfer instruction,
especially LAB, which changes the data values in
registers A and B used by MAC as inputs, a
significantly wide variation of overhead cost is
observed (from 1.4 mA to 33.0 mA).
- Such wide variation is mainly due to the complex
Booth multiplier implemented in the MAC unit.
- The fundamental idea behind the Booth multiplier is
to recode B by the "skipping over 1s" technique.
- For example, a 7-digit B value 0011110, which would
need four additions of shifted A, can be recoded
to a new value that requires one addition and one
subtraction.
- Weight 4 before recoding, weight 2 after recoding
Micro-architectural model for the Booth multiplier
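The recoding example can be reproduced with a small Python sketch. It assumes plain radix-2 Booth recoding, where digit i equals bit i-1 minus bit i. The slides do not say which Booth variant the processor uses, so treat this as an illustration of the "skipping over 1s" idea rather than a model of the actual hardware. The function names are hypothetical.

```python
def booth_digits(b, n_bits):
    """Radix-2 Booth digits (-1, 0, +1) of an n-bit two's complement value, LSB first."""
    digits = []
    prev = 0                        # implicit 0 below the least significant bit
    for i in range(n_bits):
        bit = (b >> i) & 1
        digits.append(prev - bit)   # -1 where a run of 1s starts, +1 just after it ends
        prev = bit
    return digits


def booth_weight(b, n_bits):
    """Number of additions/subtractions needed after recoding."""
    return sum(1 for d in booth_digits(b, n_bits) if d != 0)


b = 0b0011110
print(bin(b).count("1"))    # additions without recoding: 4
print(booth_digits(b, 7))   # [0, -1, 0, 0, 0, 1, 0], i.e. +2**5 - 2**1
print(booth_weight(b, 7))   # 2
```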
11 - We can reduce the number of additions and
subtractions by just swapping the operands in
registers A and B, which can result in current
reduction. The table gives three experiments
where swapping the operands changes the measured
current.
Variation of measured current by swapping
operands op1 and op2 in registers A and B for
MACLAB instructions.
- Another factor that determines the power consumption
of the multiplier is switching activity.
- For the Booth multiplier, the relevant characteristic
of A is its switching activity, and for B, its weight
factor and switching activity.
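A minimal sketch of the weight-based swap decision follows. It assumes radix-2 Booth recoding, so the weight of an operand equals the number of positions where adjacent bits differ (with an implicit 0 below the LSB). The helper names `booth_weight` and `choose_operands` are hypothetical, and switching activity is ignored here.

```python
def booth_weight(b, n_bits):
    """Count nonzero radix-2 Booth digits of an n-bit value."""
    mask = (1 << n_bits) - 1
    # A digit is nonzero exactly where bit i differs from bit i-1.
    return bin((b ^ (b << 1)) & mask).count("1")


def choose_operands(op1, op2, n_bits):
    """Return (A, B) so that the operand with the lower Booth weight lands in B."""
    if booth_weight(op2, n_bits) <= booth_weight(op1, n_bits):
        return op1, op2
    return op2, op1


op1, op2 = 0b0011110, 0b0101010     # weights 2 and 6
a, b = choose_operands(op1, op2, 7)
print(format(a, "07b"), format(b, "07b"))
```

Here the operands are swapped, because recoding 0101010 needs six add/subtract steps while 0011110 needs only two.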
12 - Average current drawn by MACLAB for different
characteristics of consecutive values in A and B.
- For a typical DSP application, MACLAB
instructions are usually applied to a sequence of
data for filter operations such as a sum of
products of coefficients C and data X.
- As only C is known and there is no information
about X, we consider C as the value in B. If the
switching activity or weight factor of C is
high, we can swap the operands.
Comparison of power consumption for 5 DSP
programs by different scheduling techniques
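Because the coefficients C are known at compile time, their switching activity can be estimated offline. The sketch below uses the average Hamming distance between consecutive values as a proxy. That measure is an assumption made for illustration, since the slides do not define how switching activity was quantified. The coefficient values are also made up.

```python
def switching_activity(values, n_bits):
    """Average number of bit toggles between consecutive n-bit values."""
    mask = (1 << n_bits) - 1
    toggles = [bin((a ^ b) & mask).count("1")
               for a, b in zip(values, values[1:])]
    return sum(toggles) / len(toggles) if toggles else 0.0


# Hypothetical coefficient sequences stored as 8-bit two's complement words.
alternating_sign = [3, -3, 2, -2]
same_sign = [3, 2, 3, 2]
print(switching_activity(alternating_sign, 8))
print(switching_activity(same_sign, 8))
```

Small coefficients that alternate in sign toggle almost every bit because of sign extension, which is the case where a swap is most likely to pay off.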
13 Improved Mitchell-Based Logarithmic Multiplier
for Low-Power DSP Applications
- The technique of multiplying two numbers using
logarithms is simple: take the logarithms of the two
multiplicands, add the logarithms together, and
then take the antilogarithm of the resulting
sum.
- Mitchell method of calculating logarithms
- Assume $N = 25_{10} = 11001_2$.
- The MSB is bit 4, which gives a characteristic of
$100_2$, and the remaining bits ($1001_2$) give the
fraction. This gives a value for the logarithm of
$100.1001_2$ ($4.5625_{10}$).
- The correct value of $\log_2(25)$ is 4.6439.
(Duncan J. McLaren et al., IEEE 2003)
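The worked example for N = 25 can be reproduced as follows. This is a minimal sketch of Mitchell's approximation for positive integers; the function name `mitchell_log2` is illustrative.

```python
import math


def mitchell_log2(n):
    """Mitchell's approximation: MSB position plus the remaining bits as a fraction."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1                 # characteristic (position of the MSB)
    fraction = (n - (1 << k)) / (1 << k)   # bits below the MSB, read as 0.xxxx
    return k + fraction


print(mitchell_log2(25))          # 4.5625
print(round(math.log2(25), 4))    # 4.6439
```

The approximation is exact for powers of two and otherwise falls below the true logarithm.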
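14 - A binary number N can be written as
$N = 2^k (1 + x)$
where k represents the characteristic and x the binary
fraction, with x in the range $0 \le x < 1$.
- The true logarithm and the approximation
using the Mitchell method are
$\log_2 N = k + \log_2(1 + x)$ and $\log_2 N \approx k + x$
- The logarithm of a product is equal to the
sum of the logarithms of the multiplicands:
$\log_2(N_1 N_2) = \log_2 N_1 + \log_2 N_2 \approx k_1 + k_2 + x_1 + x_2$
- The antilogarithms of the true and the approximate sum are
$N_1 N_2 = 2^{k_1 + k_2} (1 + x_1 + x_2 + x_1 x_2)$ and $N_1 N_2 \approx 2^{k_1 + k_2} (1 + x_1 + x_2)$ (for $x_1 + x_2 < 1$)
- To correct the error, the term missing from the
approximation, $x_1 x_2$, has to be accounted for.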
15 - This shows that to provide the correct answer, an
error correction factor should be added to the
summation before the antilogarithm is calculated.
- However, computing the exact factor would be
impractical. The approach is to average the value of
the correction factor over a range of x values, and
add this to the summation. This results in a
multiplier of improved accuracy.
- The two fractional parts are split into 8 ranges,
from 0 to 1 in steps of 0.125. This means that the 3
most significant bits of x can be used to
determine the error correction factor (which is
precalculated).
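The idea of a precalculated correction indexed by the top 3 bits of each fraction can be sketched in Python. Several choices here are assumptions rather than details taken from the cited paper: the correction is defined as the amount the log sum must grow for Mitchell's linear antilogarithm to return the exact product, the table holds the value at the midpoint of each of the 8 x 8 cells instead of a true average, and floating point stands in for fixed-point hardware. All names are hypothetical.

```python
STEPS = 8  # 8 ranges per fraction, selected by its 3 most significant bits


def split(n):
    """Return (k, x) with n = 2**k * (1 + x) and 0 <= x < 1."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1
    return k, (n - (1 << k)) / (1 << k)


def mitchell_antilog(s):
    """Mitchell's linear antilogarithm: 2**k * (1 + f) for s = k + f."""
    k = int(s)                        # s is non-negative here, so int() floors
    return (1 << k) * (1.0 + (s - k))


def correction(x1, x2):
    """Amount to add to x1 + x2 so the linear antilog gives the exact product."""
    t = (1.0 + x1) * (1.0 + x2)       # exact mantissa product, 1 <= t < 4
    target = t - 1.0 if t < 2.0 else t / 2.0
    return target - (x1 + x2)


# Precalculated table, evaluated at the midpoint of each cell (an assumption).
TABLE = [[correction((i + 0.5) / STEPS, (j + 0.5) / STEPS)
          for j in range(STEPS)] for i in range(STEPS)]


def mitchell_multiply(a, b):
    """Unmodified Mitchell multiplication."""
    (k1, x1), (k2, x2) = split(a), split(b)
    return mitchell_antilog(k1 + k2 + x1 + x2)


def improved_multiply(a, b):
    """Mitchell multiplication with a table-based correction added to the sum."""
    (k1, x1), (k2, x2) = split(a), split(b)
    c = TABLE[int(x1 * STEPS)][int(x2 * STEPS)]
    return mitchell_antilog(k1 + k2 + x1 + x2 + c)


print(mitchell_multiply(25, 25), improved_multiply(25, 25), 25 * 25)
```

For 25 x 25 the unmodified method returns 576 while the corrected one returns 625. This case is exact only because the fraction 0.5625 happens to sit on a cell midpoint; other inputs keep a small residual error.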
16 - To test the multiplier further, it was used as
part of a real application, in this case a Finite
Impulse Response (FIR) Filter. The filter was an
11-tap low-pass FIR, with a normalized cut-off
frequency of 0.25. The filter was implemented in
Verilog using the standard multiplier, the
un-modified Mitchell multipliers and the Improved
Mitchell multipliers. The input was 16-bit and
the output was 32-bit. The figure below shows the
magnitude response from each of the three
implementations.
17 Power-Aware Pipelined Multiplier Design Based
on 2-Dimensional Pipeline Gating
- Although Boolean multipliers have natural power
awareness to changes in input precision,
deeply pipelined designs do not have this
benefit.
- In Boolean unpipelined multipliers, a low input
precision calculation (like 0001 × 0001) dissipates
much less power than a high input precision
calculation (like 1111 × 1111). So Boolean
unpipelined multipliers are naturally power aware
to changes in input precision.
- In deeply pipelined designs, the number
of registers is much larger than that of
other elements, so these designs do not have
the natural power awareness to
changes in input precision.
(Jia Di, J. S. Yuan et al., GLSVLSI 2003)
18 - To solve this problem and improve the power
awareness of deeply pipelined multipliers, a
novel technique, 2-dimensional pipeline gating, is
proposed. This technique gates the clock to
the registers in both the vertical and the horizontal
direction.
19 - In a 4 × 4 multiplier, when the input precision is
4, for example, calculating 1111 × 1111, S is
generated based on all inner partial products. If
the input precision is 2, for example,
calculating 0011 × 0011, the partial products
containing X2 or Y2 (the ones enclosed by a
rectangle) can also be disabled.
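The effect of gating on the partial-product array can be illustrated with a behavioural Python sketch. It only models which partial products stay enabled for a given input precision; it does not model clock gating of pipeline registers. Detecting the precision of X and Y separately from their bit lengths is an assumption, and the function names are hypothetical.

```python
def active_partial_products(x, y, width=4):
    """Index pairs (i, j) of the partial products X_i * Y_j that stay enabled."""
    if not (0 <= x < (1 << width) and 0 <= y < (1 << width)):
        raise ValueError("operands must be unsigned and fit in `width` bits")
    px, py = x.bit_length(), y.bit_length()   # detected input precision
    return [(i, j) for i in range(px) for j in range(py)]


def gated_multiply(x, y, width=4):
    """Sum only the enabled partial products; the gated ones are all zero."""
    total = 0
    for i, j in active_partial_products(x, y, width):
        total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
    return total


print(len(active_partial_products(0b1111, 0b1111)))   # 16
print(len(active_partial_products(0b0011, 0b0011)))   # 4
print(gated_multiply(0b0011, 0b0011))                 # 9
```

Gating never changes the result, because every disabled partial product contains a zero input bit.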
22 References
- M. T. Lee, V. Tiwari, S. Malik, and M. Fujita,
"Power analysis and minimization techniques for
embedded DSP software," IEEE Trans. VLSI Syst.,
vol. 5, pp. 123-135, Mar. 1997.
- Jia Di, J. S. Yuan et al., "Power-aware Pipelined
Multiplier Design Based On 2-Dimensional Pipeline
Gating," GLSVLSI'03, April 28-29, 2003.
- Zhan Yu et al., "A Low Power Adaptive Filter Using
Dynamic Reduced 2SC Representation," IEEE Custom
Integrated Circuits Conference, 2002.
- Duncan J. McLaren et al., "Improved Mitchell-Based
Logarithmic Multiplier for Low Power DSP
Applications," IEEE, 2003.