Towards Technology Aware Design at the Architecture Level - PowerPoint PPT Presentation

Provided by: Marc230
1
Towards Technology Aware Design at the
Architecture Level
  • Paul Marchal

2
Convergence of communication, computing and
consumer
3
Convergence is enabled by complex digital
real-time SoCs
20 radios, >200 MHz CPU, >200 MHz DSP, >64 MB Flash,
>32 MB RAM, 600 components, 20 cm² PCB area
  • Cost is all that matters
  • price erosion is high (2x/decade) for high-end
    products
  • But power efficiency is a must
  • new features and limited battery capacity require
    power-efficient architectures (between 10 and 200
    GOPS/Watt)

4
Scaling is main engine for cost reduction...
Relative Cost Per Gate (log scale)
5
...but it is turning into a hell of nano-scale
physics
  • Leakage power starts to dominate
  • gate delay is larger than scaling predicts for
    performance
  • Voltage headroom shrinks
  • makes analog/RF (A/RF) designers deeply worried
    (sub-1 V circuits)
  • Interconnect takes the leading role
  • challenging timing, power, synchronism, signal
    integrity
  • Uncertainties are increasing
  • jeopardizes predictability and yield, affects
    the design process

6
Uncertainties are omni-present, thereby...
Dopant Fluctuations
Manufacturing uncertainties
electrical uncertainties
Line Edge Roughness
NBTI
7
...causing functional errors
Chang, Symposium VLSI Tech. 2005
  • Circuits with functionally correct operation
    under the expected amount of variations have to
    be built
  • statistically aware SRAM design techniques
  • redundancy
  • See Wim's presentation

8
...causing parametric uncertainties on
delay/energy of digital blocks
[Figure: energy per access versus access delay for 200 random instances of an 8 kB memory]
  • Random shift of the performance/power consumption
    of components

9
Complicating timing closure in synchronous designs
[Figure: logic between latches l1 and l2, with signals l1 out, l2 in and l2 out]
  • The synchronous design paradigm provides a static
    HW interface to system architects that is easy
    to reason about
  • The delay of all critical paths is shorter than
    the fixed clock cycle

[Timing diagram: clk, l1 out, l2 in and l2 out carrying data 1 and data 2; cases marked "ok" and "setup time violation"]
10
Complicating timing closure in synchronous designs
[Figure: logic between latches l1 and l2, with signals l1 out, l2 in and l2 out]
  • The synchronous design paradigm provides a static
    HW interface to system architects that is easy
    to reason about
  • The delay of all critical paths is shorter than
    the fixed clock cycle under all conditions!

[Timing diagram: clk, l1 out, l2 in and l2 out carrying data 1 and data 2; a slow path delays data 1 at l2 in, causing a setup time violation]
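The timing condition on this slide can be sketched as a simple check. This is only an illustration: the function names, the setup-time parameter and all numbers are assumptions, not values from the presentation.

```python
def setup_slack(path_delay_ns, clock_period_ns, setup_time_ns=0.0):
    """Time left between data becoming stable and the latest allowed moment."""
    return clock_period_ns - setup_time_ns - path_delay_ns


def meets_setup(path_delay_ns, clock_period_ns, setup_time_ns=0.0):
    """True if the receiving latch sees stable data early enough."""
    return setup_slack(path_delay_ns, clock_period_ns, setup_time_ns) >= 0.0


# Hypothetical numbers: a 4.2 ns critical path in a 5 ns clock cycle.
nominal_ok = meets_setup(4.2, 5.0, setup_time_ns=0.3)
# The same path on a chip where variations make the logic 20% slower.
slow_ok = meets_setup(4.2 * 1.2, 5.0, setup_time_ns=0.3)
```

Here the nominal path passes and the slowed-down path fails, which is the setup time violation shown in the timing diagram.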
11
Guarantee timing closure under all
uncertainties
  • Worst-case design
  • Design margins for all design corners (worst,
    typical, best)
  • Accumulated margins lead to over-design -> more
    area/more power
  • The farther apart the corners, the higher the
    penalty
  • Post-fabrication testing
  • To filter outliers, keep corners closer, limit
    over-design

12
Scaling increases uncertainties
  • Prevailing worst-case design
  • Worst-case corner of a new node is worse than
    that of the previous one
  • Reduced Vdd increases sensitivity to variations
  • The more sources of uncertainty, the higher the
    penalty
  • Tight circuit parametric constraints limit the
    amount of tolerable low-level uncertainties
  • Testing to keep corners closer
  • Distributions are getting wider -> more yield loss
  • Only works for static uncertainties
  • Cannot be extended to degradation-induced
    uncertainties
  • Unable to benefit from scaling
  • Especially in the sub-45nm regime

13
How can we better handle these variations?
1. Push foundry for less variations
2. Extend the technology-design interface
3. Compensate for variations at run-time
15
Extending the technology/design interface
  • Models extending the design interface to expose
    the impact of manufacturing on the system
  • reliability models, post-OPC extraction,
    variability information
  • Models allow for better-than-worst-case designs
  • Only modify the hot-spots of the design rather
    than everything
  • Models allow for a reduction of the accumulation
    of design margins
  • Case-studies
  • litho-simulation for better gate sizing
  • statistical static timing analysis
  • system-level yield prediction

16
Case-study 1: Using litho-simulations for better
gate sizing
J. Yang et al., "Advanced timing analysis based
on post-OPC extraction of critical dimensions,"
Proc. DAC 2005
19
Case-study 1: Using litho-simulations for better
gate sizing
  • Manufacturing introduces line-width variations
    (LWV)
  • major contributor to timing variation
  • yield loss, as we signed off on an incorrect layout
  • However, 50% of LWV are systematic

J. Yang et al., "Advanced timing analysis based
on post-OPC extraction of critical dimensions,"
Proc. DAC 2005
20
Case-study 1: Using litho-simulations for better
gate sizing
  • Systematic variations can be modeled from the
    physical layout through an aerial simulation
  • Extraction of the layout and transistor-level
    timing analysis allow the true critical paths
    after manufacturing to be identified
  • Design corrections for true critical paths (e.g.,
    gate length trimming vs. resizing)
  • Critical path information can be used for tuning
    manufacturing (e.g., reducing data set for mask
    production)

21
Avoiding the accumulation of design margins
  • In corner point design, design margins are
    accumulated to ensure that the design operates
    under all worst-case conditions
  • Highest temperature, lowest voltage and worst
    process conditions for all gates
  • The likelihood of these corners is extremely low
  • Optimize the design such that the probability
    that it meets the time/power constraints is
    sufficiently high
  • Case-studies
  • statistical static timing analysis
  • system-level yield prediction

22
Case-study 2: Statistical Static Timing Analysis
[Figure: two flows compared.
Static Timing Analysis: technology (worst-case design rules) -> library design (gate lib characterized for worst-case) -> design (gate netlist) -> Static Timing Analysis -> delay.
Statistical Static Timing Analysis: technology (design rules + parameterized manufacturing-induced variations) -> library design (gate lib with statistical delay distribution) -> design (gate netlist) -> Statistical Static Timing Analysis -> delay distribution]
  • Tools exist: MAGMA, Synopsys, ExtremeDA, etc.
  • The solution is only as good as the input models
    on variability
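The difference between accumulating corner margins and analysing the delay distribution can be illustrated with a small Monte Carlo sketch. It assumes independent Gaussian gate delays on a single path, which is a strong simplification (commercial SSTA tools handle correlations and use analytical propagation); the gate count, sigma and function names are made up for illustration.

```python
import random


def path_delay_samples(nominal_delays, sigma_frac, n_samples, seed=0):
    """Sample the total delay of one path with independent Gaussian gates."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(d, sigma_frac * d) for d in nominal_delays)
        for _ in range(n_samples)
    ]


def quantile(samples, q):
    """Empirical q-quantile (0 <= q < 1) of a list of samples."""
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


gates = [1.0] * 20          # 20 gates, nominal delay 1.0 (arbitrary unit)
sigma = 0.05                # assumed 5% sigma per gate

# Corner-style bound: every gate simultaneously at +3 sigma.
corner_delay = sum(d * (1 + 3 * sigma) for d in gates)

# Statistical bound: delay met by 99.9% of the sampled paths.
samples = path_delay_samples(gates, sigma, n_samples=10000)
statistical_delay = quantile(samples, 0.999)
```

Under these assumptions the 99.9% statistical bound is noticeably tighter than the corner bound, because it is very unlikely that all gates are slow at the same time.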

23
Case-study 3: System-level Yield Prediction
Yield-aware Architecture Exploration
  • Can the system be made yielding?
  • What are the yield-critical blocks?
  • What-if analysis?

[Figure: energy versus delay per component. Correlated energy/delay information per component (e.g., obtained via statistical analysis or simulation techniques)]
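A minimal sketch of how such per-component statistics could feed a system-level yield estimate is given below. It is not the method of the presentation: it models delay only, treats components as independent Gaussians, and all names and numbers are hypothetical.

```python
import random


def predict_yield(components, clock_period, n_chips=20000, seed=1):
    """Estimate the fraction of chips whose components all meet the clock.

    components: list of (mean_delay, sigma_delay) per component.
    Returns (yield_estimate, failures_per_component).
    """
    rng = random.Random(seed)
    passing = 0
    failures = [0] * len(components)
    for _ in range(n_chips):
        chip_ok = True
        for i, (mean, sigma) in enumerate(components):
            if rng.gauss(mean, sigma) > clock_period:
                failures[i] += 1
                chip_ok = False
        if chip_ok:
            passing += 1
    return passing / n_chips, failures


# Three hypothetical memories, delays normalized to the clock period.
memories = [(0.80, 0.03), (0.90, 0.05), (0.70, 0.02)]
base_yield, base_failures = predict_yield(memories, clock_period=1.0)
critical_block = base_failures.index(max(base_failures))

# What-if analysis: relax the clock period by 5%.
relaxed_yield, _ = predict_yield(memories, clock_period=1.05)
```

The failure counts point to the yield-critical block, and re-running the estimate with a different clock or a redesigned component is one way to perform the what-if analysis.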
24
Case-study 3: Yield-aware Architecture Exploration
[Figure: average energy for the memory organization (relative) of the reference versus the redesigned architecture, for image processing, wireless receiver and audio decoder; annotations: "less and faster memories", "99.9% manufactured systems meeting clock frequency"]
25
Summarizing benefits and challenges of extending
the technology/design interface
  • Model-based better-than-worst-case (BTWC) design:
    refining the interface between
    technology/processing data and design
  • Pass limits of the manufacturing flow to the
    designers
  • Pass functional intent to the manufacturing flow
  • Fast analysis tools are required at all
    abstraction layers
  • Incorporate the most important effects
  • Manual analysis of the yield on a 1gCell netlist
    is impossible
  • Research on limiting run-times (e.g., STA after
    litho simulation)
  • Silicon foundries should provide processing
    information
  • Many sources of processing variations exist
    (e.g., lithography, reliability,...)
  • factory-floor IT systems need to be able to
    archive, retrieve and analyze all types of
    cross-sections
  • They are reluctant to do so for other reasons
  • Analysis results should be translated into design
    methods
  • Strain/stress simulation tools are available, but
    these tools cannot be used for design (e.g., no
    layout is available during synthesis)
  • Techniques are tedious; design margins remain
    required

26
How can we better handle these variations?
1. Push foundry for less variations
2. Extend the technology-design interface
3. Compensate for variations at run-time
27
Compensating for delay variations at run-time...
  • Ensure that the system is fast enough at run-time
  • Rather than taking design margins, speed up the
    limiting circuits at run-time
  • Typical-case optimization
  • only burn energy when necessary
  • The only solution for extreme conditions
  • large variations
  • sensitive designs (analog blocks, memories,
    ultra-low power logic)
  • Complementary to other design techniques
  • Can be made independent of the origin of the
    uncertainties
  • may raze design margins for ALL uncertainties in
    a single shot

[Figure: energy versus delay operating points, with the deadline marked]
28
...requires a self-adjusting system
  • guarantee functional correctness
  • circuits that remain functionally correct under
    variations
  • guarantee parametric correctness
    (performance/energy)
  • energy/delay/robustness monitors
  • speed knobs
  • A method for integrating knobs/monitors into the
    system at low cost

[Figure: conceptual view. A run-time controller (RTOS/HW) finds optimal knob settings from application knowledge (deadlines, workload, ...) and hardware status (distributed energy/delay measurement), and sets the system knobs]
29
Case-study 4: Building a delay-variation tolerant
memory hierarchy
  • Memories are most vulnerable to random variations
  • Many minimal-sized transistors: Pelgrom's law!
  • Many critical paths
  • Small memories near the functional units are the
    best candidates
  • these are the most power consuming
  • Functional failures can be eliminated using
    circuit design techniques

[Figure: tiles and L2 memories connected by an inter-tile communication network; inside a tile, PEs and L1 memories connected by an intra-tile communication network]
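Pelgrom's law states that the standard deviation of the threshold-voltage mismatch scales with the inverse square root of the transistor area, which is why minimal-sized memory transistors suffer most. The sketch below evaluates the law; the matching coefficient and the device sizes are assumed values chosen for illustration, not data from the presentation.

```python
import math


def sigma_delta_vt_mv(a_vt_mv_um, width_um, length_um):
    """Pelgrom's law: sigma(delta Vt) = A_vt / sqrt(W * L), in mV."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)


A_VT = 3.0  # assumed matching coefficient in mV*um (hypothetical)

# Near-minimal device, as found in an SRAM cell (hypothetical sizes).
sigma_small = sigma_delta_vt_mv(A_VT, width_um=0.1, length_um=0.1)
# The same device made four times wider.
sigma_wide = sigma_delta_vt_mv(A_VT, width_um=0.4, length_um=0.1)
```

With these numbers, quadrupling the area only halves the mismatch, so upsizing alone is an expensive way to fight random variations.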
31
Guaranteeing functional correctness under
variations
  • Delay variations cause functional errors in
    synchronous designs
  • Can we build circuits that avoid latching the
    wrong data regardless of how much variation
    occurs?
  • self-timed logic
  • double-latched
  • adaptive synchronous

[Figure: clocked logic block]
32
Self-timed logic based on handshaking (1)
  • Data hand-off between any pair of registers is by
    handshake
  • Delivers actual performance
  • operand values determine the delay
  • Each register (or its associated combinational
    logic) needs completion detection circuitry
  • Cost-effective robust completion detectors for
    data-path FUs need to be explored
  • The matched delay line approach still requires
    design margins (cf. Wim)

Jens Sparsø et al., Principles of Asynchronous
Circuit Design: A Systems Perspective, Kluwer
Academic Publishers, Jan. 2002
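The statement that operand values determine the delay can be made concrete with a ripple-carry adder, used here purely as a hypothetical functional unit (the presentation does not name one). In a self-timed adder the completion time follows the longest carry chain that actually occurs, whereas a synchronous design must budget for the full width.

```python
def longest_carry_chain(a, b, width=32):
    """Longest run of bit positions a carry travels when adding a and b."""
    longest = run = carry = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x & y:                    # generate: a new carry starts here
            run, carry = 1, 1
        elif (x ^ y) and carry:      # propagate: incoming carry ripples on
            run += 1
        else:                        # no carry leaves this bit position
            run, carry = 0, 0
        longest = max(longest, run)
    return longest


typical = longest_carry_chain(0x0F, 0x10, width=8)     # no carry at all
short = longest_carry_chain(0x01, 0x01, width=8)       # one-bit chain
worst = longest_carry_chain(0xFF, 0x01, width=8)       # ripples through all bits
```

Only the last operand pair exercises the worst-case path; the other two would complete much earlier in a self-timed implementation.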
33
Self-timed logic based on handshaking (2)
  • Self-timed logic is attractive for knobbed
    components
  • no combined control of both Vdd and frequency
    required
  • Difficult to integrate in current design flows
  • dedicated cells, difficult to characterize with
    current tools
  • logic and physical synthesis
  • actual-case timing
  • avoiding hazards
  • robustness
  • testing (no clock)
  • Async design is reviving
  • Handshake Solutions: seminar next Friday
  • ARM developed an async version of the ARM9

Shmoo plot of a self-timed circuit (ASPIDA
processor). The chip operates correctly over a
large Vdd range. (From Cortadella et al.,
"De-synchronization: synthesis of asynchronous
circuits from synchronous specifications," TCAD,
Oct. 2005, Vol. 25, Issue 10, pp. 1904-)
34
Double Latching
  • Circuit delay speculation, error detection,
    correction
  • Exploits the typical case amidst dynamic
    uncertainties below uArch-level
  • Demonstrated good energy savings
  • Worst-case (with all uncompensated uncertainties)
    is limited and must be guaranteed by design-time
    analysis
  • Cannot handle huge uncertainties, does not fully
    exploit the actual case
  • Severe short-path constraints
  • Delay padding overhead
  • Extra bypass path, shadow latch overhead,
    meta-stability issues

D. Ernst et al., "Razor: A Low-Power Pipeline
Based on Circuit-Level Timing Speculation,"
Micro, 2003
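The behaviour of a double-latched stage can be sketched as follows. This is a simplified behavioural model written for illustration, not the Razor implementation: the main flip-flop samples at the clock edge, a shadow latch samples a fixed window later, and a mismatch triggers correction. Function names and numbers are assumptions.

```python
def classify_cycle(data_arrival, clock_period, shadow_window):
    """Outcome of one cycle of a double-latched pipeline stage."""
    if data_arrival <= clock_period:
        return "ok"                     # main flip-flop captured valid data
    if data_arrival <= clock_period + shadow_window:
        return "error_corrected"        # shadow latch caught the late data
    return "undetected_failure"         # beyond what the scheme can cover


def short_path_ok(min_path_delay, shadow_window):
    """Short paths must not overwrite the shadow latch inside its window."""
    return min_path_delay > shadow_window


outcomes = [classify_cycle(t, clock_period=1.0, shadow_window=0.3)
            for t in (0.9, 1.2, 1.4)]
needs_padding = not short_path_ok(min_path_delay=0.2, shadow_window=0.3)
```

The third outcome is the case that design-time analysis must rule out, and the short-path check is where the delay-padding overhead listed above comes from.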
35
Adaptive Synchronous
  • Run-time determined clock based on HW status
  • Using hardware monitoring (test vectors, delay
    measurement)
  • PLL/DLL is directed to deliver the required clock
  • Only pre-determined clocks can be generated by
    the PLL/DLL
  • It takes a while (1000 cycles) to complete the
    transition

[Figure: BIST applies test patterns to the IP blocks and steers the clock generator, which delivers the clks; over time, test phases alternate with operate phases]
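Since only pre-determined clocks can be generated, the controller has to pick the fastest available clock that is still safe for the measured delay. A minimal sketch, with a hypothetical set of clock periods and an optional guard band that is an assumption of this example:

```python
def select_clock_period(measured_delay_ns, available_periods_ns,
                        guard_band_ns=0.0):
    """Shortest pre-determined clock period that covers the measured delay."""
    feasible = [p for p in available_periods_ns
                if p >= measured_delay_ns + guard_band_ns]
    if not feasible:
        raise ValueError("no pre-determined clock is slow enough")
    return min(feasible)


pll_periods = [3.0, 4.0, 5.0, 6.0]   # hypothetical PLL settings in ns

fast_chip = select_clock_period(3.0, pll_periods)
slow_chip = select_clock_period(4.3, pll_periods)
guarded = select_clock_period(3.9, pll_periods, guard_band_ns=0.2)
```

The coarse set of periods means a chip measured at 4.3 ns still runs at 5 ns, which is part of the margin this scheme cannot remove.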
36
Case-study 4a: Self-timed memory
Orchestration of events is critical for
functional correctness of memory
[Figure: memory block diagram with address latches 1 and 2, interface, xDec, WL buff, matrix and sense amplifiers (SA), supplied by Vdd_matrix, Vdd_IO and Vdd_decoder, with Data_In and Data_Out; timing diagram of CLK, DEC_START, PRE/PREb, WL EN and SA ACT]
37
Case-study 4a: Self-timed memory
Decoder is too slow due to process variations
Wrong cell is read -> incorrect output at
sense-amp
[Figure: same memory block diagram (address latches 1 and 2, interface, xDec, WL buff, matrix, sense amplifiers; Vdd_matrix, Vdd_IO, Vdd_decoder); timing diagram of CLK, DEC_START, DEC_OUT (empty, then valid), PREb, WL EN and SA ACT]
38
Case-study 4a: Self-timed address decoder
[Figure: self-timed address decoder; annotation "stop pre-charging"]
40
Knobs for controlling performance
  • uArch-level components with run-time configurable
    parametric aspects
  • without affecting functionality
  • The right configuration is decided at run-time
  • based on HW status and delay requirements
  • Knobs or combinations of knobs should
  • be fast enough to compensate for the worst
    variations, yet give the highest possible energy
    savings under more relaxed conditions
  • have low overhead, so as not to upset the
    original performance/energy
  • offer fine control, i.e., only speed up the
    failing path
  • be fast: low re-configuration time
  • Translation of fine-grain performance variability
    into energy savings
  • By switching to energy-efficient configurations
    when longer operation latencies can be tolerated
  • Typical knobs
  • power supply/back gating
  • (redundant logic)
  • re-configurable HW
41
1. Back-biasing/supply-based knobs
  • Excellent proven dynamic range of combined Vdd/Vt
    knobs
  • sufficient for 90nm
  • Widely researched and applied in designs
  • Intel's Enhanced SpeedStep technology
  • AMD's PowerNow! technology
  • Transmeta's LongRun2 power management (incl. ABB)
  • The Vt knob is losing its efficiency
  • area overhead
  • back-gate becoming less effective
  • gate leakage
  • multi-gate devices (e.g., FinFETs)?
  • Usually very coarse granularity: a single knob
    for the entire chip

Courtesy of M. Meijer, NXP
42
Towards knobs with a finer granularity: multiple
Vdd islands
  • Multiple Vdd/Vt domains enable finer-grain
    control, limiting worst-casing
  • Multiple Vdd/Vt knobs are challenging
  • different islands operate at different speeds ->
    GALS-like communication fabric
  • multiple off-chip DC-DC converters incur overhead
    (too many converters, too many pins, too many
    extra components and complex power distribution)
  • on-chip linear power regulators are not portable
    to new technologies, incur noise and contain
    biasing currents
43
A low-overhead on-chip voltage controller
  • Vdd control through header and footer transistors
  • Linear resistors (active mode)
  • Power switch (standby)
  • Fine-grain programmability: digitally controlled
    resistance using a segmented transistor
  • Fast settling times (order of 100 ns)
  • Overhead remains high
  • extra switches
  • level converters/clamp cells
  • Sensitivity to noise
  • Limited energy savings
  • C·Vdd·Vswing rather than C·Vdd²

M. Meijer et al., "On-chip Digital Power Supply
Control for System-on-chip Applications," Proc.
ISLPED 05
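The last bullet can be illustrated numerically. With a resistive header, the charge C·Vswing is still drawn from the full supply Vdd, so the energy only scales linearly with the reduced swing, whereas an ideal supply lowered to the same voltage would give a quadratic saving. The capacitance and voltages below are assumed values.

```python
def energy_full_swing(c_farad, vdd):
    """Switching energy per transition at the full supply: C * Vdd^2."""
    return c_farad * vdd ** 2


def energy_header_controlled(c_farad, vdd, v_swing):
    """Energy when a reduced swing is still charged from Vdd: C * Vdd * Vswing."""
    return c_farad * vdd * v_swing


C_LOAD = 1e-12   # 1 pF, hypothetical
VDD = 1.0        # volts, hypothetical
V_SWING = 0.7    # reduced internal swing, hypothetical

baseline = energy_full_swing(C_LOAD, VDD)
with_header = energy_header_controlled(C_LOAD, VDD, V_SWING)
ideal_supply = energy_full_swing(C_LOAD, V_SWING)

header_ratio = with_header / baseline    # linear saving
ideal_ratio = ideal_supply / baseline    # quadratic saving
```

For a 30% lower swing the header-based controller uses 70% of the baseline energy, against 49% for an ideal converter delivering the same voltage.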
44
3. Re-configurable hardware knobs
  • Knob consists of multiple units of hardware
  • a fast one to satisfy worst-case constraints
  • a slow one to save energy
  • Exploring the best combination of
    low-power/high-performance knobs
  • HW of the unused configuration should be isolated
    from input changes and/or Vdd
  • Mux/demux operand isolation circuits define the
    optimal granularity and the max. number of
    combinations

E.g., variable-size buffers, carry chain
variants, etc.
45
Case-study 4b: Configurable HW to vary memory
performance
  • Buffers inside the row decoder and wordline
    drivers are interesting circuits for building
    knobs
  • Inside the critical path of the memory
  • Important contribution to both power and delay,
    as these circuits drive large capacitive loads
  • Limited impact on area

Particularly for small memories (smaller than
128 kB; cf. Amrutur et al., "Speed and power
scaling of SRAMs," IEEE J. Solid-State Circuits,
vol. 35, no. 2, pp. 175-185, Feb. 2000)
46
Case-study 4b: Configurable drivers implemented
with redundant logic
[Figure: a fast and an energy-efficient buffer in parallel between in and out, selected by ctrl_fast, driving Cwordline/Cdecoder/...]
  • Maximizing the range of configurable buffers
  • sizing of buffers
  • number of stages
  • Overhead for combining circuits can be limited
47
Case-study 4b: Sizing of a Pareto-optimal Buffer
[Figure: buffer chains with stage sizes 1, f2 (load 16Cmin) and 1, f2, f3 (load 38Cmin)]
48
Case-study 4b: Determining the optimal number of
stages
  • The energy-optimal number of stages depends on
    the performance targets
  • The optimal number of stages for speed can be
    determined using classical tapered buffer design
  • More stages only increase power consumption, as
    they do not further decrease delay
  • A configurable buffer is built by combining
    Pareto-optimal buffers

Cload = 32Cmin
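The stage-count trade-off can be sketched with the textbook tapered-buffer model. The assumptions are explicit: equal fan-out per stage, delay counted in units of a minimum-inverter time constant, self-loading and signal polarity ignored, and switched capacitance used as a stand-in for energy. Only the load of 32 Cmin is taken from the slide.

```python
def buffer_delay(n_stages, total_fanout):
    """Delay of an n-stage tapered buffer with equal fan-out per stage."""
    stage_fanout = total_fanout ** (1.0 / n_stages)
    return n_stages * stage_fanout


def buffer_switched_cap(n_stages, total_fanout):
    """Sum of the stage input capacitances, in units of Cmin."""
    stage_fanout = total_fanout ** (1.0 / n_stages)
    return sum(stage_fanout ** i for i in range(n_stages))


F = 32                      # Cload / Cmin
stage_counts = range(1, 8)

fastest = min(stage_counts, key=lambda n: buffer_delay(n, F))
delays = {n: buffer_delay(n, F) for n in stage_counts}
caps = {n: buffer_switched_cap(n, F) for n in stage_counts}
```

In this model the speed-optimal buffer has four stages; adding stages beyond that increases both delay and switched capacitance, while a two-stage buffer is slower but switches far less capacitance, which makes it a candidate for the energy-efficient setting.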
49
Case-study 4b: Design issues in building
configurable drivers
  • Tri-state buffers for selecting the buffer
    configuration
  • Output sharing affects the performance of the
    low-power buffer, not of the high-speed one
  • Area savings through
  • the use of a normal inverter in the intermediate
    stage of the high-power buffer
  • no tri-state buffers for the initial stages of
    the low-power buffer

E.g., a configurable buffer with a fast and a slow
option.
Hua Wang et al., "Variable tapered Pareto buffer
design and implementation allowing run-time
configuration for low-power embedded SRAMs,"
IEEE Trans. VLSI Systems, 13(10): 1127-1135 (2005)
50
Case-study 4b: Integration of the configurable
buffer inside a memory
  • Configurable buffers integrated in the pre- and
    post-decoder
  • 1kB SRAM in 65nm BPTM
  • 10% variations assumed in both Vt and Beta
  • Area overhead is limited compared to the array
    (estimated at less than 5% for this small memory)
  • Three-option configurable buffer
  • Spice simulations at 65nm
  • Sizing indicated on the figure

52
Run-time Hardware Monitoring
  • Requirements on monitor circuits depend on the
    choice of circuit style
  • Adaptive synchronous
  • Functional and parametric testing
  • Excited with expected typical input vectors
    generated by compact configurable BIST HW
  • Possible by exploiting the structural regularity
    of memories and data-paths and using symbolic
    cellular-automata methods -> research
  • Circuits which are always functionally correct
    (e.g., self-timed)
  • Parametric testing only
  • Measurement can be less tedious (e.g., delay-line)

[Figure: energy versus delay. What is the actual energy/performance of the circuit?]
53
Some performance monitoring circuits...
  • Delay-line for on-chip performance monitoring
  • calibration with accurate reference
  • Counter-based performance monitor for self-timed
    logic

[Figure: counter-based monitor. The clk/completion signal of the logic is compared with an external reference (divided by 50) and the difference drives an integral controller]
55
System integration challenges
How to create a self-adaptive system at lowest
cost?
  • System characteristics
  • application dynamism
  • performance constraints
  • energy budget
  • cost target
  • reliability constraints
  • ...
  • Architecture
  • sensitivity to variations
  • Technology
  • intra-die variations
  • slow (random, reliability)
  • fast (IR drop, crosstalk, ...)
  • global variations
  • slow (die-to-die, D2D)
  • fast (power noise)
  • range of variations

56
Case study 4: An adaptive synchronous integration
(1)
  • Memories are self-timed and tuneable
  • The system is assumed to operate synchronously
  • Easy to integrate in existing systems
  • Variations define which component is slowest and
    thus the max. clock speed
  • uncertainties -> access delay variations
  • the slowest word determines the access delay of a
    memory
  • Monitoring circuits to determine fmax
    (energy/access -> optional)
  • BIST generates test vectors
  • Increasing fmax by moving the slowest memory to
    high speed
  • at the cost of extra power
  • A tuneable clock is required to set the operating
    frequency
  • determined by application/system requirements
  • configuration time is relatively high
  • Assumption
  • logic is not BTWC designed

57
Case study 4: An adaptive synchronous integration
(2)
  • Run-time controller identifies energy-optimal
    knob positions of each component to achieve the
    desired fmax at lowest energy
  • in case of multiple knobs per component, energy
    monitoring is needed

[Figure: energy versus delay of the memory configurations (points labelled nn, ss, 100Na, 120EU) and the resulting schedule over time against the deadline]
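A minimal sketch of such a controller is shown below. It assumes that each component exposes a small table of measured (delay, energy) pairs, one per knob setting, and that all components share a single clock; because the only coupling is the common period, the cheapest feasible setting can be chosen per component. The data structure and the numbers are hypothetical.

```python
def choose_knobs(components, target_period):
    """Pick, per component, the lowest-energy knob setting meeting the period.

    components: {name: [(delay, energy), ...]} as reported by the monitors.
    Returns {name: (delay, energy)} for the selected settings.
    """
    chosen = {}
    for name, options in components.items():
        feasible = [(energy, delay) for delay, energy in options
                    if delay <= target_period]
        if not feasible:
            raise ValueError(f"{name} cannot meet the target period")
        energy, delay = min(feasible)
        chosen[name] = (delay, energy)
    return chosen


# Hypothetical measurements: (normalized delay, normalized energy).
measured = {
    "mem1": [(1.25, 0.6), (0.75, 1.0)],   # slow chip instance: needs fast mode
    "mem2": [(0.90, 0.6), (0.60, 1.1)],   # fast enough in low-energy mode
}
settings = choose_knobs(measured, target_period=1.0)
total_energy = sum(energy for _, energy in settings.values())
```

Only the memory that would miss the period is moved to its fast, more power-hungry setting, which is how the per-chip margin is turned into an energy saving.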
59
Case study 4: An adaptive synchronous integration
(2)
  • Run-time controller identifies energy-optimal
    knob positions of each component to achieve the
    desired fmax at lowest energy
  • in case of multiple knobs per component, energy
    monitoring is needed

[Figure: energy versus delay of the memory configurations (points labelled nn, ss, sf, 100Na, 120EU, 150EU) and the schedule of DP, Mem1 and Mem2 over time against the deadline]
60
Experimental results for a DAB receiver
[Figure: (energy - energy_nom) / energy_nom, on a scale from 0 to 80, versus the normalized deadline constraint (0.754, 0.814, 0.91, 1) for the DAB receiver]
DAB receiver consists of 3 FUs connected to 7
configurable memories
61
Case study 4: An adaptive synchronous integration
(2)
  • Run-time controller identifies energy-optimal
    knob positions of each component to achieve the
    desired fmax at lowest energy
  • in case of multiple knobs per component, energy
    monitoring is needed
  • Energy benefit varies from chip to chip
    (depending on variations)
  • averages out if there are many components on chip

[Figure: energy versus delay of the memory configurations (points labelled nn, ss, sf, 100Na, 120EU, 150EU) and the resulting schedule over time against the deadline]
62
Involving task-level information in the feedback
control loop
  • A more complex control algorithm re-configures
    the memories' performance depending on the memory
    usage (application load) and the current hardware
    status
  • A single solution that can deal with ALL sources
    of variations
  • from manufacturing-induced ones to
    application-level ones
  • Similar to DVS-like solutions once calibrated
  • e.g., TCM, VDD/VT-hopping (see the Ph.D. of Peng
    Yang for an extended overview)

[Figure: energy versus delay of the memory configurations (points labelled nn, ss, sf, 100Na, 150EU) and the resulting schedule over time against the deadline]
63
Experimental results for a DAB receiver
[Figure: (energy - energy_nom) / energy_nom, on a scale from 0 to 80, versus the normalized deadline constraint (0.754, 0.814, 0.91, 1) for the DAB receiver]
A. Papanikolaou et al., "A system-level
methodology for fully compensating process
variability impact of memory organizations in
periodic applications," CODES+ISSS, 2005,
pp. 117-122
64
More than compensating random process variations
  • The system can adapt itself to slowly changing
    environmental parameters
  • Temperature, degradation, aging, etc.
  • Requires re-calibration of the circuit depending
    on environmental conditions
  • Worst-case margins remain necessary
  • Re-configuration of the clock involves
    considerable delay
  • No accurate tracking of fast-changing,
    environment- or value-dependent conditions

[Figure: results for a 64 kB, 0.18 µm CMOS, 250 MHz self-timed memory proposed in E. Karl et al.]
65
Summarizing the benefits and challenges of
run-time compensation techniques
  • Feedback control saves energy by razing design
    margins while still allowing for real-time
    operation
  • The required components are available
  • Delay-variation resilient (DVR) circuits
  • self-timed is feasible inside memories with
    limited overhead
  • the debate is still open on what the best DVR
    is for logic: minimize overhead while maximizing
    the razed margins
  • From coarse-grain Vdd knobs to fine-grain
    re-configurable HW
  • More work is required on monitor/test circuits
  • A possible integration of these circuits into a
    working system has been presented
  • removes manufacturing variations and can deal
    with slowly changing dynamic variations
  • which feedback system is best in a given context
    is still largely unknown

66
Acknowledgements
  • Satyakiran Munaga
  • Miguel Miranda
  • Hua Wang
  • Antonis Papanikolaou
  • Francky Catthoor
  • Wim Dehaene
  • Hugo De Man

67
Thank you!