Design Automation for Asynchronous Circuits - PowerPoint PPT Presentation

Slides: 68
Provided by: Comp910
Learn more at: https://www.cs.upc.edu
1
Design Automation for Asynchronous Circuits
  • Alex Kondratyev
  • Cadence Berkeley Labs, Berkeley, CA, USA

In collaboration with Jordi Cortadella, Luciano
Lavagno, Kelvin Lwin and Christos Sotiriou
2
Outline
  • What do we optimize?
  • End of deterministic design
  • Technical and business implications
  • Asynchronous design with commercial tools
  • Desynchronization
  • Delay-insensitive datapath
  • Fine-grain pipelining

3
Optimization metrics
  • Late 70s (proxies for area and speed)
    - Literals
    - Nodes of a Boolean network
    - Levels of a Boolean network
  • Nowadays (proxies for area and speed)
    - Literals
    - Nodes of a Boolean network
    - Levels of a Boolean network
    - Wire length

Tools are optimizing for area and speed!
4
Universal metrics
[Figure: power and capacitance C]
5
Universal metrics
Power: small C
$P_{dyn} = a \cdot f_{clk} \cdot C \cdot V_{dd}^2$
[Diagram linking delay, supply voltage and power]
Speed can be taken as a universal metric
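A minimal sketch of why speed can stand in for power, assuming the slide's formula reads P_dyn = a · f_clk · C · V_dd². The function name and the operating-point values are hypothetical and chosen only for illustration. Spare speed can be spent on a lower supply voltage, and power then falls with the square of V_dd.

```python
def dynamic_power(activity, f_clk, capacitance, v_dd):
    """Dynamic power a * f_clk * C * V_dd^2 (watts when inputs are in SI units)."""
    return activity * f_clk * capacitance * v_dd ** 2


# Hypothetical operating point; none of these numbers come from the slides.
baseline = dynamic_power(activity=0.1, f_clk=500e6, capacitance=1e-9, v_dd=1.2)
scaled = dynamic_power(activity=0.1, f_clk=500e6, capacitance=1e-9, v_dd=1.0)

# Same activity, frequency and capacitance: only the supply voltage changed.
ratio = scaled / baseline
print(f"power ratio after lowering V_dd from 1.2 V to 1.0 V: {ratio:.3f}")
```

The ratio depends only on the two voltages, so the other parameters cancel out of the comparison.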
6
Outline
  • What do we optimize?
  • End of deterministic design
  • Technical and business implications
  • Asynchronous design with commercial tools
  • Desynchronization
  • Delay-insensitive datapath
  • Fine-grain pipelining

7
Timing margins
  • Algorithms/tools (approximations)
  • Modeling (e.g. process corners)
  • Architecture (unbalanced computation)

8
Algorithms/tools
False paths (< 5%)
Common path pessimism removal
Hierarchy hurts!!!
10-35% gain from floorplan flattening
(Reshape)
Bad news: we do not know how far we are from the
optimum
Good news: the optimum is not possible to find
9
Modeling
Why panic?
New BIG players: signal integrity and process
variability
10
Variability sources
  • Environment (T, Vdd), signal integrity
    - Within-die only
  • Process variations
    (gate length L, wire width W, threshold voltage Vt)
    - Die-to-die (design independent)
    - Within-die (design dependent)

11
Environment SI
Temperature: -40°C to 125°C
Supply voltage: 10%
VDD
IR drop: decrease in the current from Vdd
Bad news
Good news
7
6
Field solvers can handle 10 variables
10 gates x 8metal layers
Abstraction, model reduction, IP reuse
help further
~10^9 RC elements in VDD grid
Tools make IR drop sign-off at 5% Vdd (still ~10%
delay penalty)
12
Environment SI
Crosstalk
Conservative analysis: up to 20% delay penalty
(post-layout fixes)
13
Process variations
  • Within-die
    - design dependent
    - systematic and random!!
  • Die-to-die
    - design independent
    - well modeled via worst-case files

[Figure: variation of Lgate, Wwire, Tt (Nassif01)]
14
Measuring variability
Microprocessor: at-speed functional testing
[Figure: number of chips versus frequency, split into Bin1, Bin2, Bin3]
ASIC: no delay testing, no binning
Strategically placed oscillators
Problem: up to 15% delay variation in RO
(Nassif03): vertical/horizontal (4%), poly-Si
spacing (7%), distance (5%)
15
Modeling variability
Model for gate delay (linear w.r.t. variability
sources)
Independence of sources (within a group: model
reduction by PCA or SVD)
For a single variability source L:
$L_{var} = L_{spatial} + L_{random}$
(modeled by normally distributed random
variables $N(0, \sigma)$)
Variation of path delay: $D_{var} = \sum d_{var}(L_{var})$
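A minimal sketch of how a linear delay model turns source variability into path-delay variability. It assumes each gate delay varies as a sensitivity times L_var, with L_var = L_spatial + L_random. Two further assumptions are not stated on the slide: the spatial term is shared by every gate on the path, and the random term is independent per gate. The function and parameter names are hypothetical.

```python
import math


def path_delay_sigma(sensitivities, sigma_spatial, sigma_random):
    """Standard deviation of the path delay variation under a linear model.

    Assumption: the spatial component is common to all gates (fully
    correlated), the random component is independent for every gate.
    """
    # Correlated part: the sensitivities add up before scaling by sigma.
    shared = sum(sensitivities) * sigma_spatial
    # Independent part: the variances add up gate by gate.
    independent_var = sum((s * sigma_random) ** 2 for s in sensitivities)
    return math.sqrt(shared ** 2 + independent_var)


# Two gates with hypothetical sensitivities (delay units per unit of L).
sigma_path = path_delay_sigma([1.0, 2.0], sigma_spatial=1.0, sigma_random=2.0)
print(f"path delay sigma: {sigma_path:.3f}")
```

The correlated term grows with the square of the summed sensitivities, which is why a shared (spatial) source dominates on long paths under these assumptions.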
16
Statistical timing analysis
Reconvergence needs some care
  • Numerical computation of a distribution
  • Approximate convolution (5% accuracy)
  • Use upper and lower bounds (10% diff., Blaauw03)

Algorithms have linear complexity!
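A minimal sketch of block-based statistical timing on discrete delay distributions, offered as an illustration rather than as any tool's algorithm. Delays in series are combined by convolution and converging paths by a statistical max. Both helpers assume the operands are independent, which is exactly what breaks at reconvergent fanout. All names and numbers are hypothetical.

```python
from collections import defaultdict


def add_delays(p, q):
    """Distribution of the sum of two independent delays (discrete convolution)."""
    out = defaultdict(float)
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] += pa * pb
    return dict(out)


def max_delays(p, q):
    """Distribution of the max of two independent arrival times."""
    out = defaultdict(float)
    for a, pa in p.items():
        for b, pb in q.items():
            out[max(a, b)] += pa * pb
    return dict(out)


def timing_yield(arrival, period):
    """Probability that the arrival time fits into the given period."""
    return sum(prob for delay, prob in arrival.items() if delay <= period)


gate = {1: 0.5, 2: 0.5}            # hypothetical gate delay distribution
path1 = add_delays(gate, gate)      # two gates in series
path2 = {3: 1.0}                    # a fixed-delay side path
arrival = max_delays(path1, path2)  # both paths converge on one node

print(arrival)                      # {3: 0.75, 4: 0.25}
print(timing_yield(arrival, 3))     # 0.75
```

Each node is visited once and combined with its fan-in, which is the sense in which such propagation stays linear in the size of the netlist.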
17
What it buys?
Trading yield
Statistical timing analysis helps to quantify risk (reduce
margin and be structure specific)
Statistical timing analysis might help to trade off
confidence margin and yield (testing???)
  • Open issues
    - why normal?
    - how to derive σ?
    - how to derive sensitivity coefficients?

18
Outline
  • What do we optimize?
  • End of deterministic design
  • Technical and business implications
  • Asynchronous design with commercial tools
  • Desynchronization
  • Delay-insensitive datapath
  • Fine-grain pipelining

19
Summing this up
[Figure: cycle time broken down into clock overhead, real computation time, worst-vs-average case and variability; the figure shows the values 25, 30 and 45]
Some designs work twice as fast as the spec requires!
Everything boils down to:
Synchronous design is turning out to be a
costly proposition
20
Is asynchronous an option?
It is about time, but an asynchronous CAD tool must
meet these requirements:
  • Competitive
    - added value with minimal (or no) penalty
    - scalable (capable of handling large designs)
  • Simple
    - minimal knowledge of asynchronous design
    - RTL input
  • Risk-free
    - does not change sign-off (STA)
    - complete solution in verification and
      testing
    - backup options (synchronous implementation)

21
Outline
  • What do we optimize?
  • End of deterministic design
  • Technical and business implications
  • Asynchronous design with commercial tools
  • Desynchronization
  • Delay-insensitive datapath
  • Fine-grain pipelining

22
Sliding the trade-off curve
Automation efforts
  • QDI fine-grain pipelining: template-based gate-level pipelining
  • QDI datapath: NCL, phased logic
  • Bundled data: desynchronization
Penalties?
[Figure: trade-off curve. Axis labels: EMI, skew penalty; variability; average speed. Granularity ranges from gates to blocks.]
23
Desynchronization flow
  • Think synchronous
  • Design synchronous: one clock and edge-triggered
    flip-flops
  • De-synchronize (automatically)
  • Run it asynchronously

Asynchronous for dummies
24
Synchronous circuit
[Figure: latches (L) labeled 0 and 1, driven by CLK]
25
De-synchronization
[Figure: latches (L) labeled 0 and 1, no CLK]
26
De-synchronization
Distributed controllers substitute the clock
network
[Figure: latches with local controllers (C)]
The data path remains intact!
27
[Figure: signals A, B, C, D]
28
[Figure: timing diagram of signals A, B, C, D with transitions A-, B-, C-, D-]
Overlapping is also acceptable
29
Concurrent model
30
For any netlist
31
Synchronization layer
32
Synchronization layer
33
Synchronization layer
This is a circuit marked graph (CMG)
34
Properties of CMGs
  • Any CMG is live and safe
    - Safeness: no data overwriting
    - Liveness: no deadlock

[Figure: marked graph over transitions A, B, C and A-, B-, C-]
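A minimal sketch of how the two properties can be checked on a small marked graph by exhaustive token-game exploration. The four-transition ring below is a hypothetical example, not the controller model from the slides. "Safe" is read as at most one token per place in every reachable marking, and "no deadlock" as every reachable marking enabling at least one transition.

```python
from collections import deque


def check_marked_graph(arcs, initial_marking, limit=10_000):
    """Explore all reachable markings; return (safe, deadlock_free).

    arcs: list of (source_transition, target_transition), one per place.
    initial_marking: tuple with the token count of each place, in arc order.
    """
    transitions = sorted({t for arc in arcs for t in arc})
    seen = {initial_marking}
    queue = deque([initial_marking])
    safe = deadlock_free = True
    while queue:
        marking = queue.popleft()
        if any(tokens > 1 for tokens in marking):
            safe = False
        # A transition is enabled when all of its input places hold a token.
        enabled = [
            t for t in transitions
            if all(marking[i] > 0 for i, (_, dst) in enumerate(arcs) if dst == t)
        ]
        if not enabled:
            deadlock_free = False
        for t in enabled:
            nxt = list(marking)
            for i, (src, dst) in enumerate(arcs):
                if dst == t:
                    nxt[i] -= 1   # consume a token from each input place
                if src == t:
                    nxt[i] += 1   # produce a token on each output place
            nxt = tuple(nxt)
            if nxt not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("state space too large (unbounded net?)")
                seen.add(nxt)
                queue.append(nxt)
    return safe, deadlock_free


# Hypothetical ring A+ -> B+ -> A- -> B- -> A+ with one token on the last arc.
ring = [("A+", "B+"), ("B+", "A-"), ("A-", "B-"), ("B-", "A+")]
print(check_marked_graph(ring, (0, 0, 0, 1)))  # (True, True)
print(check_marked_graph(ring, (0, 0, 0, 0)))  # (True, False): no token, deadlock
```

Exhaustive exploration is only practical for small graphs; it is used here because it follows the definitions directly.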
36
Flow equivalence (Guernic, Talpin, Lann, 2003)
[Figure: blocks A and B]
37
Flow equivalence
Synchronous behavior (values change on CLK):
A: 1 3 0 2 1 5 3 1 6 0
B: 5 1 2 3 1 4 2 4 3 1
De-synchronized behavior (same value sequences, different timing):
A: 1 3 0 2 1 5 3 1 6 0
B: 5 1 2 3 1 4 2 4 3 1
38
Flow equivalence
Theorem: the de-synchronization model
preserves flow equivalence
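A minimal sketch of what flow equivalence means operationally: two behaviors are flow-equivalent when every signal carries the same sequence of values, regardless of when each value appears. The value sequences for A and B are the ones on the slide; the timestamps and function names are hypothetical.

```python
def value_flows(trace):
    """Project a timed trace [(time, signal, value), ...] onto per-signal value lists."""
    flows = {}
    for _, signal, value in sorted(trace, key=lambda event: event[0]):
        flows.setdefault(signal, []).append(value)
    return flows


def flow_equivalent(trace_a, trace_b):
    """True when both traces carry the same value sequence on every signal."""
    return value_flows(trace_a) == value_flows(trace_b)


A = [1, 3, 0, 2, 1, 5, 3, 1, 6, 0]
B = [5, 1, 2, 3, 1, 4, 2, 4, 3, 1]

# Synchronous: both latches change on the same (hypothetical) clock ticks.
sync = [(10 * k, "A", a) for k, a in enumerate(A)]
sync += [(10 * k, "B", b) for k, b in enumerate(B)]

# De-synchronized: each latch follows its own (hypothetical) local timing.
desync = [(7 * k + k % 3, "A", a) for k, a in enumerate(A)]
desync += [(9 * k + 2, "B", b) for k, b in enumerate(B)]

print(flow_equivalent(sync, desync))  # True
```

The check discards timestamps after ordering the events, so it says nothing about performance, only about the data each latch observes.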
39
Timing equivalence
[Figure: latches A, B, C, D connected through delays del_a, del_b, del_c]
del_b = del_a = del_c = del_d
[Timing diagram: events A, B-, C, D- and A-, B, C-, D separated by del_a, del_b, del_c]
Synchronous-like behavior
40
Timing equivalence
[Figure: latches A, B, C, D connected through delays del_a, del_b, del_c]
del_b > del_a = del_c = del_d
[Timing diagram: events A, B-, C, D- and A-, B, C-, D separated by del_a, del_b, del_c]
B keeps the same period and settles the rest
41
Compatibility
Synchronous: $T_{sync} \ge T_{CQ} + T_{comb} + T_{setup} + T_{skew}$
Desynchronized: $T_{desync} \ge T_{CQ} + T_{comb} + T_{controller}$
Statement: a desynchronized design is behavior- and
timing-compatible with its synchronous counterpart
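A minimal sketch comparing the two cycle-time bounds. It assumes the slide's relations read as T_sync ≥ T_CQ + T_comb + T_setup + T_skew and T_desync ≥ T_CQ + T_comb + T_controller. All delay values are hypothetical picosecond figures chosen for illustration, not measurements.

```python
def min_sync_period(t_cq, t_comb, t_setup, t_skew):
    """Smallest clock period that satisfies the synchronous bound."""
    return t_cq + t_comb + t_setup + t_skew


def min_desync_period(t_cq, t_comb, t_controller):
    """Smallest cycle time that satisfies the desynchronized bound."""
    return t_cq + t_comb + t_controller


# Hypothetical delays in picoseconds.
t_sync = min_sync_period(t_cq=100, t_comb=800, t_setup=50, t_skew=60)
t_desync = min_desync_period(t_cq=100, t_comb=800, t_controller=120)

print(t_sync, t_desync)  # 1010 1020
```

Under this reading the two bounds share the T_CQ and T_comb terms, so the comparison reduces to setup plus skew against controller overhead.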
42
Synchronous environment
[Figure: blocks A, B, C with clock Clk; timing arcs between A-, B-, C- and Clk-]
43
Implementation of a controller
  • Only local handshakes with adjacent controllers
    are necessary
  • Synthesis by using intuition, common sense,
    and petrify

44
Implementation of a controller
45
Delay matching
Combinational logic
d
46
Post-layout delay matching
Combinational logic
47
Post-layout delay matching
Combinational logic
48
Desynchronization. Gaining Trust
Synchronous RTL

49
Async DLX block diagram
50
Desynchronization. Gaining Trust
Synchronous RTL

                 Cycle     Power     Area
Synchronous      4.4 ns    70.9 mW   372,656 µm²
Desynchronized   4.45 ns   71.2 mW   378,058 µm²
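The relative cost of desynchronization follows directly from the two result rows. The short sketch below does only that arithmetic on the reported cycle, power and area figures; the dictionary keys are hypothetical names.

```python
# Figures as reported on the slide for the synchronous and desynchronized DLX.
sync = {"cycle_ns": 4.4, "power_mw": 70.9, "area": 372_656}
desync = {"cycle_ns": 4.45, "power_mw": 71.2, "area": 378_058}

# Relative overhead of the desynchronized design, in percent.
overhead = {key: 100.0 * (desync[key] - sync[key]) / sync[key] for key in sync}

for key, pct in overhead.items():
    print(f"{key}: +{pct:.2f}%")
```

Each overhead comes out below 1.5%, which is the basis for calling the two implementations comparable.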
51
DLX lessons. Positive
  • Asynchronous design with no area, power, delay
    penalties
  • 30% less EMI
  • Partial tolerance of variability
    (matched delays scale with the rest of the
    gates)
  • Binning!!!

[Figure: request signal req; Treq > Tclk → Error]
52
DLX lessons. Negative
  • Asynchronous design with no area, power, delay
    advantage
  • Clock power is saved but latched designs have
    higher loads
  • P&R constraints of de-sync design are non-trivial
  • Matched delay variability might hurt

Hard work to come out even with synchronous
53
Can we do better?
  • Clustering
  • Timing optimization
  • Retiming of M-latches

[Figure: latch sequence S, M, M, S]

54
Sliding the trade-off curve
Automation efforts
  • QDI fine-grain pipelining: template-based gate-level pipelining
  • QDI datapath: NCL, phased logic
  • Bundled data: desynchronization
[Figure: trade-off curve. Axis labels: EMI, skew penalty; variability; average speed. Granularity ranges from gates to blocks.]
55
Introduction to NCL
2-phase operation: evaluate (DATA), then precharge
(NULL)
Self-timed register interaction (acknowledgement
of phases)
[Figure: two registers (Reg.) around combinational logic, with completion detection (CD) and NULL]
Micropipeline with delay-insensitive (DI) datapath
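A minimal sketch of the DATA/NULL discipline, assuming a dual-rail code in which each bit uses two wires: (0, 0) is NULL, (1, 0) is DATA 0 and (0, 1) is DATA 1. The rail order, the helper names and the behavioral threshold gate (a two-input gate with hysteresis, as used in NCL-style logic) are illustrative choices, not taken from the slides.

```python
NULL = (0, 0)


def encode_bit(bit):
    """Dual-rail code word (rail0, rail1) for one data bit."""
    return (0, 1) if bit else (1, 0)


def completion(word):
    """Completion detection: 'DATA' when every bit is valid, 'NULL' when
    every bit is empty, None while the word is still in transition."""
    if all(rail0 != rail1 for rail0, rail1 in word):
        return "DATA"
    if all(pair == NULL for pair in word):
        return "NULL"
    return None


class TH22:
    """Two-input threshold gate with hysteresis: sets when both inputs
    are 1, resets when both are 0, holds its output otherwise."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == 1 and b == 1:
            self.out = 1
        elif a == 0 and b == 0:
            self.out = 0
        return self.out


word = [encode_bit(1), encode_bit(0)]
print(completion(word))                # DATA
print(completion([NULL, NULL]))        # NULL
print(completion([encode_bit(1), NULL]))  # None
```

The hysteresis is what lets a register acknowledge a full DATA wavefront before the NULL wavefront is allowed to pass.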
56
NCL Design Flow
57
First Attempt. Pattern Matching
(delay-insensitive 2-rail implementation)
Huge area penalty!!!
58
From 2 to 3-rail Scheme
Not DI scheme!!!
59
From 2 to 3-rail Scheme
Rationale behind delay-insensitivity of 3-rail
scheme
  • 2-rail circuit is hazard-free under monotonic
    input changes
  • All input changes are observable at outputs

60
NCLX flow (MUX )
61
NCL lessons. Positive
  • Very low EMI
  • High security of computation
  • Automatic stand-by mode
  • Tolerance to variability

62
NCL lessons. Negative
  • Big area overhead: 2.7-3.0x
  • No performance advantage
    (average case performance is swallowed by the
    penalty from NULL)
  • Completion introduces further penalties (power
    and delay)

63
Can we do better?
  • Timing optimization of completion network
    (may recover about 25% of area and power)
  • Partial recovery of single-rail nodes in
    datapath
  • Fast NULL
  • 4-rail data communication to save power

64
Phased Logic
(Linden94)
Two-bit signal encoding: MSB is the timing bit (t), LSB is the value bit (v)

Phase   Value 0   Value 1
Even    00        11
Odd     10        01

[State diagram over even0, even1, odd0, odd1]
A signal changes phase or value (only one bit
changes)
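A minimal sketch of the two-bit code described above: even phase is 00 and 11, odd phase is 01 and 10, with the timing bit t as MSB and the value bit v as LSB. The function names are hypothetical. The loop checks that moving to the opposite phase changes exactly one wire, whatever the next value is.

```python
EVEN, ODD = "even", "odd"


def decode(t, v):
    """Return (phase, value) for timing bit t and value bit v."""
    return (EVEN if t == v else ODD), v


def encode(phase, value):
    """Return (t, v): the bits are equal in the even phase, different in the odd."""
    t = value if phase == EVEN else 1 - value
    return t, value


def next_code(t, v, new_value):
    """Code word for new_value in the opposite phase."""
    phase, _ = decode(t, v)
    return encode(ODD if phase == EVEN else EVEN, new_value)


# Every phase change flips exactly one of the two wires.
for t in (0, 1):
    for v in (0, 1):
        for new_value in (0, 1):
            nt, nv = next_code(t, v, new_value)
            assert (t != nt) + (v != nv) == 1

print(encode(EVEN, 1), encode(ODD, 1))  # (1, 1) (0, 1)
```

Keeping the same value flips only t, and changing the value flips only v, so no two-wire race can occur on a phase change.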
65
Phased logic gate
A PL gate has an internal state: Even or Odd.
A PL gate fires when all inputs match the gate
phase.
[Figure: with GatePhase E and inputs E, O the gate is not ready to fire; with inputs E, E it is ready to fire; after firing, GatePhase becomes O]
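A minimal behavioral sketch of the firing rule: a gate fires only when every input is in the gate's own phase. Two details are assumptions of this sketch rather than statements from the slides: the output is emitted in the phase that just fired, and the gate then toggles its internal phase. The class and helper names are hypothetical.

```python
EVEN, ODD = "even", "odd"


def decode(t, v):
    """(phase, value) of a two-bit code: equal bits mean even phase."""
    return (EVEN if t == v else ODD), v


def encode(phase, value):
    """(t, v) code word for a value in the given phase."""
    return (value if phase == EVEN else 1 - value), value


class PLGate:
    def __init__(self, func, phase=EVEN):
        self.func = func      # Boolean function over the input value bits
        self.phase = phase    # internal gate phase

    def try_fire(self, inputs):
        """Fire if all inputs match the gate phase; return the output code
        word (t, v), or None when the gate is not ready."""
        decoded = [decode(t, v) for t, v in inputs]
        if any(phase != self.phase for phase, _ in decoded):
            return None
        value = self.func(*[val for _, val in decoded])
        out = encode(self.phase, value)                   # assumed output phase
        self.phase = ODD if self.phase == EVEN else EVEN  # assumed toggle
        return out


and_gate = PLGate(lambda a, b: a & b)
first = and_gate.try_fire([encode(EVEN, 1), encode(EVEN, 0)])
second = and_gate.try_fire([encode(EVEN, 1), encode(EVEN, 0)])
print(first, second)  # (0, 0) None
```

The second call returns None because the inputs are still in the even phase while the gate has moved on to the odd phase.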
66
LUT-4 based implementation
[Schematic with a LUT4 and two D-latches. Signals: a_v, b_v, c_v, d_v, new_v, v, a_t, b_t, c_t, d_t, new_t, t, t_b, gate_phase, out_phase, fi, fo, fo_b, reset, v_rbit, t_rbit. Blocks: input completion detection, delay, G1, G2, G3, C.]

  • Functionality: v(a_v, b_v, c_v, d_v); Phase:
    a_t, b_t, c_t, d_t, t

Area penalty!
67
DI-datapath summary
  • NCL and PL show a way to tolerate variability
  • Both have significant penalties
  • May be good for niche applications (smart cards,
    mixed signals)
  • Average case speed is masked by DI-coordination
    overhead

New optimization approaches
Fine-grain pipelining
68
Sliding the trade-off curve
Automation efforts
  • QDI fine-grain pipelining: template-based gate-level pipelining
  • QDI datapath: NCL, phased logic
  • Bundled data: desynchronization
[Figure: trade-off curve. Axis labels: EMI, skew penalty; variability; average speed. Granularity ranges from gates to blocks.]