Lecture 7: Behavioral Synthesis for Low-Power

Provided by: pagr

1
Lecture 7: Behavioral Synthesis for Low-Power
  • Behavioral Level Transforms
  • Potential for large power reduction
  • Summary
  • Michael L. Bushnell
  • CAIP Center and WINLAB
  • ECE Dept., Rutgers U., Piscataway, NJ

2
Motivation
  • Conventional automated layout synthesis method
  • Describe design at RTL or higher level
  • Generate technology-independent realization
  • Map logic-level circuit to technology library
  • Optimization goal shifting from low-area to
    low-power and higher performance
  • Need accurate signal probability/activity
    estimates
  • Consider low-power needs at all design levels

3
Behavioral-Level Transformations
4
Algorithm-Level Power Reductions vs. Other Levels
5
Differential Coefficients for Finite Impulse
Response (FIR) Filters
  • Discrete-time linear time-invariant FIR system:
    $y(n) = \sum_{i=0}^{N-1} C_i\, x(n-i)$
  • $C_i$ are the filter coefficients
  • N is the number of taps (filter length)
  • Differential Coefficients Method (DCM) reduces
    computations to save power
  • Uses differences between coefficients rather than
    direct-form computation
  • Uses various orders of differences
  • Requires more storage devices and storage
    accesses
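For reference, a minimal Python sketch of direct-form computation (zero initial state; `fir_direct` is a hypothetical name) that also counts operations, since those counts are what DCM sets out to reduce: N full-width multiplications and N - 1 additions per output.

```python
def fir_direct(c, x):
    """Direct-form FIR: y(n) = sum_i c[i] * x(n - i), zero initial state.

    Returns the output list plus the number of multiplications and
    additions performed (samples before x[0] are treated as zero).
    """
    n_taps = len(c)
    y, mults, adds = [], 0, 0
    for n in range(len(x)):
        acc = c[0] * x[n]          # first product starts the sum
        mults += 1
        for i in range(1, n_taps):
            sample = x[n - i] if n - i >= 0 else 0
            acc += c[i] * sample   # one full-width multiply and one add per tap
            mults += 1
            adds += 1
        y.append(acc)
    return y, mults, adds
```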

6
First-Order Differences
  • Consider three consecutive outputs y
  • Rewrite the product terms
  • Except for C0, each coefficient can be expressed as
    the sum of the preceding coefficient and the difference
    between it and the preceding coefficient
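A worked form of this rewrite, assuming the direct form $y(n) = \sum_{k=0}^{N-1} C_k\,x(n-k)$ and writing $\delta^1_{k-1/k} = C_k - C_{k-1}$:

```latex
\begin{aligned}
C_k &= C_{k-1} + \delta^1_{k-1/k}, \qquad k = 1,\dots,N-1,\\
C_k\,x(n-k) &= \underbrace{C_{k-1}\,x\big((n-1)-(k-1)\big)}_{\text{tap } k-1 \text{ term of } y(n-1)} + \delta^1_{k-1/k}\,x(n-k).
\end{aligned}
```

The bracketed product is exactly tap $k-1$'s term of the previous output, so storing it turns each full multiplication by $C_k$ into a short multiplication by $\delta^1_{k-1/k}$ plus one addition.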

7
First-Order Differences (contd)
  • Store product terms and reuse them in the next
    output time period
  • Needs only 1 extra addition per product term and 1
    storage element
  • Store C0 and the differences δ
  • Trades a long multiplier for a short one, at the cost
    of storage overhead
  • $\delta^1_{k-1/k} = C_k - C_{k-1}$ is the first-order difference
    between $C_k$ and $C_{k-1}$
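A minimal Python sketch of the first-order method, assuming integer coefficients, zero initial state, and hypothetical function names. Each output period it performs one full multiplication (by C0), N - 1 short multiplications by the differences, and one extra addition per product term, and it keeps the N products for reuse in the next period. A direct-form reference is included only to check the result.

```python
def fir_dcm1(c, x):
    """FIR output via first-order differential coefficients (sketch).

    prev[k] holds the product C_k * x(n-1-k) saved from the previous
    output period; only C_0 and the differences d[k] are stored.
    """
    n_taps = len(c)
    # d[k] = C_k - C_{k-1} for k >= 1 (d[0] unused)
    d = [0] + [c[k] - c[k - 1] for k in range(1, n_taps)]
    prev = [0] * n_taps
    y = []
    for n in range(len(x)):
        cur = [0] * n_taps
        cur[0] = c[0] * x[n]  # the only full-coefficient multiplication
        for k in range(1, n_taps):
            sample = x[n - k] if n - k >= 0 else 0
            # C_{k-1} * x(n-k) was formed last period as prev[k-1];
            # add a short multiply by the difference (one extra addition)
            cur[k] = prev[k - 1] + d[k] * sample
        y.append(sum(cur))
        prev = cur  # N storage elements carry products to the next period
    return y


def fir_reference(c, x):
    """Direct-form FIR used only to check the DCM result."""
    return [sum(c[k] * x[n - k] for k in range(len(c)) if n - k >= 0)
            for n in range(len(x))]
```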

8
Orders of Differences
9
Second-Order Differences
  • Coefficient
    expressions
  • Needs just 2 extra storage variables and 2 extra
    additions per product term to compute the FIR output
    with 2nd-order differences, compared with direct-form
    computation
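One way to write the second-order rewrite, assuming $y(n) = \sum_{k} C_k\,x(n-k)$ and the notation $\delta^2_{k-2/k}$ (hypothetical, not from the slide) for the difference of consecutive first-order differences:

```latex
\begin{aligned}
\delta^2_{k-2/k} &= \delta^1_{k-1/k} - \delta^1_{k-2/k-1}, \qquad k = 2,\dots,N-1,\\
C_k\,x(n-k) &= C_{k-1}\,x(n-k) + \delta^1_{k-1/k}\,x(n-k),\\
\delta^1_{k-1/k}\,x(n-k) &= \delta^1_{k-2/k-1}\,x(n-k) + \delta^2_{k-2/k}\,x(n-k).
\end{aligned}
```

Tap $k-1$ formed both $C_{k-1}\,x(n-k)$ and $\delta^1_{k-2/k-1}\,x(n-k)$ while computing $y(n-1)$, because its data sample then was $x\big((n-1)-(k-1)\big) = x(n-k)$. Reusing those two stored values costs two additions per product term, and the only remaining multiplication is by the (typically short) $\delta^2_{k-2/k}$, matching the counts on the slide.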

10
Generalized mth-Order and Negative Differences
  • mth-order differences require storing m
    intermediate results for each of the N product
    terms, so they need mN storage variables and m
    additions per product term compared with direct-form
    computation
  • Differences can be positive or negative
  • Possible to get absolute value of partial product
    with negative differences
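Generalizing to mth order, here is a minimal Python sketch, assuming integer coefficients and zero initial state (names hypothetical). Each tap keeps up to m saved partial products from the previous period, so storage grows to roughly mN values, and the only multiplication per tap is by its highest-order difference, which may be positive or negative.

```python
def difference_table(c, m):
    """delta[j][k] is the j-th order difference ending at tap k.

    delta[0] is the coefficient list itself; delta[j][k] is only defined
    for k >= j, so entries with k < j are left as None.
    """
    n_taps = len(c)
    delta = [list(c)]
    for j in range(1, m + 1):
        row = [None] * n_taps
        for k in range(j, n_taps):
            row[k] = delta[j - 1][k] - delta[j - 1][k - 1]
        delta.append(row)
    return delta


def fir_dcm(c, x, m):
    """FIR output using up to m-th order differences (zero initial state).

    Tap k uses order r = min(m, k): one short multiply by delta[r][k], then
    r additions that fold in partial products saved by tap k - 1 during the
    previous output period (delta[j][k] = delta[j][k-1] + delta[j+1][k]).
    """
    n_taps = len(c)
    delta = difference_table(c, m)
    prev = [[0] * (m + 1) for _ in range(n_taps)]  # saved partial products
    y = []
    for n in range(len(x)):
        cur = [[0] * (m + 1) for _ in range(n_taps)]
        for k in range(n_taps):
            sample = x[n - k] if n - k >= 0 else 0
            r = min(m, k)
            cur[k][r] = delta[r][k] * sample  # highest-order difference product
            for j in range(r - 1, -1, -1):
                # prev[k-1][j] = delta[j][k-1] * x(n-k), formed last period
                cur[k][j] = prev[k - 1][j] + cur[k][j + 1]
        y.append(sum(row[0] for row in cur))  # row[0] = C_k * x(n-k)
        prev = cur
    return y
```

With the smooth coefficients `[1, 3, 6, 8, 8, 6, 3, 1]`, the second-order differences stay within ±2 while the coefficients reach 8; that word-length reduction is what pays for the extra storage.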

11
Sorted Recursive Differences (SRD)
  • DCM is only applicable to systems where the envelope
    generated by the coefficient sequence (or its
    differences) is a smoothly varying continuous
    function
  • Mainly suited to low-pass FIR filters
  • SRD recursively sorts the coefficients and uses various
    orders of differences to reduce computation
  • Uses the transposed direct form of FIR output
    computation
  • No restriction on the applicable coefficient sequence
  • Word-length reduction is not the same for each
    coefficient
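A simplified Python sketch of why sorting helps, assuming integer coefficients. After sorting, the first-order differences are never negative and, in this made-up example, need one bit fewer than the coefficients themselves even though the sequence is far from smooth. Because transposed direct form multiplies one data sample by every coefficient, the products can be accumulated in sorted order. This shows a single sorting level only, not the full recursive SRD procedure, and all names are hypothetical.

```python
def bits_needed(value):
    """Magnitude bits needed for an integer (sign handled separately)."""
    return abs(value).bit_length()


def sorted_first_differences(c):
    """Sort coefficients by value; return (order, differences).

    order[i] is the original tap index of the i-th smallest coefficient;
    differences[0] is the smallest coefficient itself, and differences[i]
    (i >= 1) is sorted[i] - sorted[i-1], which is never negative.
    """
    order = sorted(range(len(c)), key=lambda k: c[k])
    s = [c[k] for k in order]
    diffs = [s[0]] + [s[i] - s[i - 1] for i in range(1, len(s))]
    return order, diffs


def products_for_sample(c, sample):
    """All N products c[k] * sample for one input sample, built in sorted
    order: each new product is the previous one plus a short multiply."""
    order, diffs = sorted_first_differences(c)
    products = [0] * len(c)
    running = 0
    for idx, dif in zip(order, diffs):
        running += dif * sample  # short multiply plus one addition
        products[idx] = running
    return products
```

For `c = [40, -3, 17, 25, -9, 41]`, the coefficients need 6 magnitude bits, while the sorted differences `[6, 20, 8, 15, 1]` need at most 5.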

12
Transposed Direct-Form (TDF) Computation
  • Compute all N product terms for a particular data
    sample before computing the terms for the next sample
  • Same throughput as direct-form computation
  • Signal-flow graph
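A minimal Python sketch of transposed direct-form computation, assuming zero initial state: each incoming sample is multiplied by all N coefficients at once, and partial sums shift through N - 1 delay registers toward the output. `fir_transposed` is a hypothetical name.

```python
def fir_transposed(c, x):
    """Transposed direct-form FIR (sketch, zero initial state).

    y(n) = c[0]*x(n) + regs[0]; regs[i] <- c[i+1]*x(n) + regs[i+1],
    with the last register taking only the last product.
    """
    n_taps = len(c)
    regs = [0] * (n_taps - 1)  # z^-1 delay registers between adders
    y = []
    for sample in x:
        products = [ci * sample for ci in c]  # all N products for this sample
        y.append(products[0] + (regs[0] if regs else 0))
        # right-hand side reads the old register values before the update
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < n_taps - 1 else 0)
                for i in range(n_taps - 1)]
    return y
```

The outputs match direct-form computation sample for sample, which is why the two forms have the same throughput.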

13
Signal Flow Graph for TDF Realization
14
Maximum Savings in Adds Using SRD
15
Frequency Response of SRD Low-Pass Filter and
Hamming Window
16
Savings in Adds for Low-Pass Filter
  • Black: N = 201; grey: N = 101; white: N = 51

17
Savings in Shifts Using SRD
18
Least-Squares Coefficient Optimization for Filters
  • Find the coefficient closest to the desired coefficient,
    but with the fewest 1 bits; the number of 1s is called
    the code class
  • Goal is to reduce the number of additions
  • Use sign-magnitude coefficient representation
  • k is the maximum code class allowed per coefficient
  • Use a branch-and-bound method to solve the integer
    programming problem of selecting coefficient
    approximations
  • Shown to reduce addition computations in
    low-power filters by more than 40%
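A simplified Python sketch, assuming "code class" means the number of 1 bits in a coefficient's magnitude. It rounds one scaled integer coefficient to the nearest value of code class at most k by exhaustive search; it is not the lecture's least-squares branch-and-bound formulation, which chooses all coefficients jointly against the desired response. Function names are hypothetical.

```python
from itertools import combinations


def nearest_in_code_class(target, k, width):
    """Closest sign-magnitude integer to target whose magnitude (below
    2**width) has at most k one-bits. Exhaustive search; on a tie the
    candidate examined first is kept."""
    sign = -1 if target < 0 else 1
    mag = abs(target)
    best = 0
    for ones in range(1, k + 1):
        for positions in combinations(range(width), ones):
            cand = sum(1 << p for p in positions)
            if abs(cand - mag) < abs(best - mag):
                best = cand
    return sign * best


def additions_for(value):
    """Additions needed to multiply by value using shifts and adds."""
    return max(bin(abs(value)).count("1") - 1, 0)
```

For example, 23 (binary 10111) needs 3 additions as a shift-and-add multiplier; limiting the code class to 2 gives 24 (binary 11000), which needs 1.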

19
Activity-Driven Architectural Transformations
  • Basic idea: power consumption in digital filters
    depends on the order of the addition operators
  • Restructure the addition tree to move adders with
    higher-coefficient multiplications toward the
    output
  • Higher-activity circuitry is moved closest to the
    root of the addition tree
  • Definition of average signal activity over N
    consecutive time frames

20
Data Flow Graph of IIR Filter
21
Perfectly Balanced Addition Tree
22
Filter Implementations
  • Can use bit-serial or word-parallel arithmetic
  • In word-parallel arithmetic, W bits are fed in
    parallel to adders and multipliers
  • At time t + 1, z of the W bits change from their
    time-t values
  • Activity $b(t) = z/W$
  • $b(t)$ is a random variable; the stochastic process
    is strict-sense stationary
  • Average power dissipation is proportional to the
    average activity
  • In bit-serial implementations, intra-word bit
    differences, and not inter-word bit differences,
    cause node activity
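A minimal sketch of the activity measure, assuming each word is a W-bit two's-complement integer: b(t) is the fraction of bit positions that toggle between consecutive words. The second function contrasts the bit-serial case, where the transitions that count are between adjacent bits of the same word; both names are hypothetical.

```python
def word_parallel_activity(words, width):
    """b(t) = z / W for each consecutive pair of W-bit words, where z is
    the number of bit positions that change from time t to time t + 1."""
    mask = (1 << width) - 1  # keeps two's-complement values to W bits
    return [bin((a ^ b) & mask).count("1") / width
            for a, b in zip(words, words[1:])]


def bit_serial_transitions(word, width):
    """Transitions on a bit-serial line while one W-bit word is shifted
    out LSB first. Only adjacent bits of the same word are compared
    (intra-word activity); the boundary between words is ignored."""
    bits = [(word >> i) & 1 for i in range(width)]
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)
```

For the word sequence 0000, 1111, 1110 with W = 4, the activities are 1.0 and 0.25, so the average over these frames is 0.625.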

23
Architectural Transforms on DSP Filters
  • I inputs to word-parallel computation tree
  • $I = 2^{l-1}$, with l levels in the tree
  • $y = \sum_{j=1}^{I} a_j b_j$
  • Obtain minimum average value of $q_i$ over all
    balanced adder nodes when
  • $a_1 \le a_2 \le \cdots \le a_I$ or $a_1 \ge a_2 \ge \cdots \ge a_I$
  • Minimum average value in a linear array of adders
    when
  • $a_1 \le a_2 \le \cdots \le a_I$

24
Linear Array of Adders
  • Assume mutually independent
    inputs, but method works even
    when signals correlate due to
    reconvergent fanouts

25
Power Optimization Algorithm
  • Simulate the circuit at the functional level
  • Use random, mutually independent input values
  • Note signal activities at all adder inputs
  • Restructure adder trees using the 2 ordering
    hypotheses above
  • Move additions with high activity closer to the root
    of the computation tree
  • Recompute average activities
  • Iterate until no additional power is saved
  • Method shown to save up to 23% of power
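The simulate, measure, restructure, recompute loop can be sketched in Python for a linear array of adders. Assumptions: inputs are random, mutually independent integers; activity at a node is the fraction of toggling bits in a 16-bit two's-complement word; and the restructuring step is replaced by an exhaustive search over addition orders, which is only practical for a handful of inputs. All names are hypothetical, and the sketch makes no claim about which order wins for a given coefficient set.

```python
import itertools
import random


def toggles(a, b, width):
    """Number of bits that differ between two W-bit two's-complement values."""
    mask = (1 << width) - 1
    return bin((a ^ b) & mask).count("1")


def chain_activity(coeffs, order, frames, width):
    """Average activity, summed over the adder outputs of a linear chain
    that accumulates coeffs[j] * x_j in the given order."""
    total = 0.0
    prev_sums = None
    for frame in frames:
        acc = coeffs[order[0]] * frame[order[0]]
        sums = []
        for j in order[1:]:
            acc += coeffs[j] * frame[j]  # output of one adder node
            sums.append(acc)
        if prev_sums is not None:
            total += sum(toggles(a, b, width)
                         for a, b in zip(prev_sums, sums)) / width
        prev_sums = sums
    return total / (len(frames) - 1)


def best_chain_order(coeffs, frames, width):
    """Exhaustive search over addition orders; returns (cost, order)."""
    return min((chain_activity(coeffs, order, frames, width), order)
               for order in itertools.permutations(range(len(coeffs))))


rng = random.Random(1)  # random, mutually independent input values
frames = [[rng.randint(-128, 127) for _ in range(4)] for _ in range(500)]
```

Calling `best_chain_order([1, 9, 3, 27], frames, 16)` returns the lowest-activity order found; a real flow would instead apply the sorting hypotheses and iterate, as the slide describes.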

26
Architecture-Driven Voltage Scaling
  • Scaling down VDD saves power but increases
    circuit delay
  • Reduce delay by scaling down device sizes (less
    C)
  • But then interconnect C, not device C, becomes dominant
  • Need architectural transformation to introduce
    more parallelism to compensate for increased
    delay
  • Introduce parallel or pipelined architecture
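A back-of-the-envelope sketch of this trade-off, using the dynamic-power relation $P = C\,V_{DD}^2\,f$. The scaling factors are assumptions chosen for illustration (of the size often quoted for a two-way parallel or two-stage pipelined datapath), not results from this lecture.

```python
def relative_power(cap_ratio, vdd_ratio, freq_ratio):
    """Dynamic power P = C * VDD**2 * f, relative to the original design."""
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


# Assumed factors, for illustration only.
# Parallel: two copies, each clocked at f/2, so VDD can drop until gate
# delay roughly doubles; the extra copy plus muxing raises capacitance.
parallel = relative_power(cap_ratio=2.15, vdd_ratio=0.58, freq_ratio=0.5)

# Pipelined: shorter logic depth per stage at the same clock rate;
# pipeline registers add some capacitance.
pipelined = relative_power(cap_ratio=1.15, vdd_ratio=0.58, freq_ratio=1.0)
```

Under these assumptions both versions dissipate well under half the original power (about 0.36 and 0.39), and the parallel version pays for it with more than twice the area.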

27
Example: Original Data-Path Operator
28
Redesigned Parallel Implementation: > 2X Area
Increase
29
Redesigned Pipelined Implementation
30
Operation Reduction Methods
  • Reduce the number of operators in the data flow graph
  • Example: computing $X^3 + AX^2 + BX + C$
  • Reduces switched capacitance C, but may slow down
    critical paths
  • Reduction maintaining throughput
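A small sketch of operation reduction on this polynomial, assuming Horner's rule as the restructuring (one common choice; the slide's data flow graphs are not recoverable). It cuts four multiplications to two, but every operation now depends on the previous one, which is the critical-path cost mentioned above. Function names are hypothetical.

```python
def poly_direct(x, a, b, c):
    """X^3 + A X^2 + B X + C with 4 multiplications and 3 additions."""
    x2 = x * x              # multiplication 1
    x3 = x2 * x             # multiplication 2
    return x3 + a * x2 + b * x + c  # multiplications 3-4, additions 1-3


def poly_horner(x, a, b, c):
    """Same polynomial as ((X + A) X + B) X + C: 2 multiplications,
    3 additions, but a fully serial chain of 5 operations."""
    return ((x + a) * x + b) * x + c
```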

31
Example
  • Reduction with less throughput

32
Operation Substitution Methods
  • Multiplication uses more energy than addition
  • Replace multipliers with adders in high-level
    synthesis

33
Method and Results
  • Transformations
  • Common sub-expression utilization
  • Apply distributive law
  • Replace multiplication with repeated shifting and
    adding
  • On an 11-tap FIR filter
  • Saved 62% of dissipated power
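A minimal sketch of replacing a constant multiplication with shifting and adding, assuming a non-negative integer constant. The product is built from shifted copies of x, one per 1 bit of the constant, so the number of additions is one less than the number of 1 bits. `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(x, constant):
    """Multiply x by a non-negative integer constant using only shifts
    and additions; returns (product, number_of_additions)."""
    result = None
    adds = 0
    for bit in range(constant.bit_length()):
        if (constant >> bit) & 1:
            term = x << bit      # a shift replaces one partial-product row
            if result is None:
                result = term
            else:
                result += term   # one adder per additional 1 bit
                adds += 1
    return (0 if result is None else result), adds
```

For example, multiplying by 10 (binary 1010) takes one addition: `(x << 3) + (x << 1)`.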

34
Precomputation-Based Optimization
  • Basic idea: precompute (with low-overhead
    hardware) circuit output logic values 1 cycle
    before they are needed
  • Use the precomputed information in the next clock
    cycle to disable unneeded hardware, which reduces
    switching activity
  • Must be careful: precomputation hardware can add
    to area and lengthen the clock period

35
Precomputation Architecture
36
Explanation
  • When both predictor functions are 0, no prediction
    of the output value is possible
  • When a prediction is made (the output is definitely
    1 or 0), R2 is turned off
  • Reduces activity in the combinational logic block
  • R1 is still active, so Comb still computes the
    correct output
  • More effective if $P(f_1 + f_2)$, the probability that
    at least one predictor is 1, is large
  • For the comparator, saves 50% of the power
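A minimal Python sketch of the comparator case, assuming an unsigned comparator C > D whose predictor functions look only at the most significant bits: f1 = c_msb AND NOT d_msb (output certainly 1) and f2 = NOT c_msb AND d_msb (output certainly 0). The function name and the `r2_loaded` flag are illustrative.

```python
def precomputed_compare(c, d, width):
    """Sketch of precomputation for the comparator C > D.

    Returns (output, r2_loaded). When f1 or f2 is 1, R2 (the register
    holding the lower bits) is not loaded, so the logic behind it sees
    no new inputs this cycle.
    """
    c_msb = (c >> (width - 1)) & 1
    d_msb = (d >> (width - 1)) & 1
    f1 = c_msb & (1 - d_msb)  # C > D is certain
    f2 = (1 - c_msb) & d_msb  # C > D is certainly false
    if f1 or f2:
        return bool(f1), False
    low_mask = (1 << (width - 1)) - 1
    # MSBs are equal: the full logic compares the remaining bits
    return (c & low_mask) > (d & low_mask), True
```

Over all 4-bit input pairs the output always equals `c > d`, and R2 is left unloaded in exactly half the cycles; that idle half is where the comparator's savings come from.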

37
Specific Comparator Example
38
Can Precompute Outputs Needed 2 or More Clocks
Later
  • Can reduce switching activity by 12.5%

39
Example: Adder-Comparator
40
Precomputation with Shannon's Expansion Theorem
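Shannon's expansion $f = x\,f_x + \bar{x}\,f_{\bar{x}}$ splits a function into two cofactor blocks. A minimal Python sketch, assuming the value of x is available early enough to enable only the input register of the cofactor block it selects; the majority function and all names are hypothetical examples, not from the lecture.

```python
from itertools import product


def shannon_eval(x, rest, cofactor0, cofactor1):
    """f = x * f1(rest) + x' * f0(rest). Only the selected cofactor block
    needs new inputs this cycle, so the other block can be held idle."""
    return cofactor1(rest) if x else cofactor0(rest)


def majority(x, a, b):
    """Example function (assumed for illustration): 3-input majority."""
    return (x & a) | (x & b) | (a & b)


def maj_x0(rest):
    """Cofactor of majority with x = 0: a AND b."""
    a, b = rest
    return a & b


def maj_x1(rest):
    """Cofactor of majority with x = 1: a OR b."""
    a, b = rest
    return a | b
```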
41
Summary
  • Behavioral or architectural level synthesis
  • Resynthesize state variable equations to save
    power
  • Scale down supply voltage, and introduce
    parallelism and pipelining to make up for
    slow-down of hardware

42
Future Research Directions
  • Formal methods at the behavioral or data-flow level
    to explore the power-reduction design space
  • Behavioral-level power estimation algorithms are
    needed
  • Synthesis scheduling and data path allocation
    algorithms should incorporate power tradeoffs