Title: Logic design of asynchronous circuits
1Logic design of asynchronous circuits
- Part III
- Advanced topics on synthesis
2Outline
- Logic decomposition
- Hazard-free decomposition
- Signal insertion
- Technology mapping
- Optimization based on timing information
- Relative timing
- Timing assumptions and constraints
- Other synthesis paradigms
- HDLs, CSP, burst-mode, ...
3Design flow
Specification (STG)
Reachability analysis
State Graph
State encoding
SG with CSC
Boolean minimization
Next-state functions
Logic decomposition
Decomposed functions
Technology mapping
Gate netlist
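The "Reachability analysis" step of the flow can be sketched as a token game on a safe Petri net. This is a minimal illustration only: the four-transition handshake net, the place names p0..p3 and the signal names req/ack are hypothetical and do not come from the slides.

```python
from collections import deque

# Hypothetical 4-phase handshake STG written as a safe Petri net.
# Each transition maps to (input places, output places).
TRANSITIONS = {
    "req+": ({"p0"}, {"p1"}),
    "ack+": ({"p1"}, {"p2"}),
    "req-": ({"p2"}, {"p3"}),
    "ack-": ({"p3"}, {"p0"}),
}
INITIAL_MARKING = frozenset({"p0"})


def enabled(marking, transitions):
    """Return the transitions whose input places are all marked."""
    return [t for t, (pre, _) in transitions.items() if pre <= marking]


def fire(marking, transition, transitions):
    """Fire a transition in a safe net: remove input tokens, add output tokens."""
    pre, post = transitions[transition]
    return frozenset((marking - pre) | post)


def reachability_graph(initial, transitions):
    """Breadth-first exploration of all markings reachable from `initial`."""
    arcs = []
    seen = {initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for t in enabled(marking, transitions):
            nxt = fire(marking, t, transitions)
            arcs.append((marking, t, nxt))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, arcs


states, arcs = reachability_graph(INITIAL_MARKING, TRANSITIONS)
print(len(states), "states,", len(arcs), "arcs")  # 4 states, 4 arcs
```

Each reachable marking becomes one state of the State Graph; labelling the states with signal values is a separate step that is not shown here.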
4No Hazards
5Decomposition May Lead to Hazards
[Figure: example annotated with the codes 1000, 1100, 1100, 0100, 0110]
6Decomposition
- Acknowledgement
- Global acknowledgement
- Generating candidates
- Hazard-free signal insertion
- Event insertion
- Signal insertion
7Global acknowledgement
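8How about 2-input gates?
9How about 2-input gates?
[Figure: circuit with signals a, b, c, d, y, z]
10How about 2-input gates?
[Figure: the same circuit with two nodes annotated 0]
11How about 2-input gates?
[Figure: circuit with signals a, b, c, d, y, z]
12How about 2-input gates?
[Figure: reduced view showing only c, d, y, z]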
13Strategy for logic decomposition
- Each decomposition defines a new internal signal
- Method: insert new internal signals such that
- After resynthesis, some large gates are decomposed
- The new specification is hazard-free
- Generate candidates for decomposition using standard logic factorization techniques
- Algebraic factorization
- Boolean factorization (Boolean relations)
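Candidate generation by algebraic factorization can be illustrated with weak (algebraic) division of a sum-of-products cover. This is a sketch under stated assumptions: the function name `algebraic_divide`, the cube representation and the example cover are hypothetical, and the code says nothing about whether the resulting divisor can be inserted hazard-free.

```python
def algebraic_divide(f, d):
    """Weak (algebraic) division of cover f by cover d.

    A cover is a set of cubes; a cube is a frozenset of literal names.
    Returns (quotient, remainder) with f = quotient * d + remainder.
    """
    quotient = None
    for d_cube in d:
        # Cubes of f that contain d_cube, with d_cube's literals removed.
        partial = {c - d_cube for c in f if d_cube <= c}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    # Rebuild quotient * d to find which cubes of f are covered by it.
    product = {q | dc for q in quotient for dc in d}
    remainder = set(f) - product
    return quotient, remainder


# Hypothetical example: F = ac + ad + bc + bd + e, divisor D = a + b
F = {frozenset("ac"), frozenset("ad"), frozenset("bc"), frozenset("bd"), frozenset("e")}
D = {frozenset("a"), frozenset("b")}

quotient, remainder = algebraic_divide(F, D)
print(sorted("".join(sorted(c)) for c in quotient))   # ['c', 'd']
print(sorted("".join(sorted(c)) for c in remainder))  # ['e']
```

Here F factors as (a + b)(c + d) + e, so a + b or c + d would be a candidate for a new internal signal, subject to the hazard-freeness check.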
14Decomposition example
15Decomposition example
[Figure: state graph of the example over signals w, x, y, z, with 4-bit state codes and transitions such as w-, x-, y-, z-]
16Decomposition example
[Figure: state graph after inserting a new signal s (transition s-), with regions labelled s=1 and s=0]
17Decomposition example
[Figure: the same state graph with signal s; transitions y-, s-, z-, w- shown again]
18Decomposition example
[Figure: original state graph with regions labelled yz=1 and yz=0]
19Decomposition example
[Figure: state graph with inserted signal s, regions s=1 and s=0]
z- is delayed by the new transition s-!
20Decomposition example
[Figure: state graph with inserted signal s, regions s=1 and s=0, and signal y highlighted]
21Decomposition (Algebraic, Boolean relations)
F
22Decomposition (Algebraic, Boolean relations)
F
until no more progress
Hazard-free ? (Event insertion)
23Signal insertion for function F
Insertion by input borders
State Graph
24Event insertion
25Event insertion
[Figure: state graph with region SR(x), event b and the inserted transitions of signal x]
26Properties to preserve
a is persistent
27Boolean decomposition
f = F(x1, ..., xn)
f = G(H(x1, ..., xn))
Our problem: given F and G, find H
28
[Figure: block with signals h1, h2 and f]
This is a Boolean Relation
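Why "given F and G, find H" yields a relation rather than a function can be seen by brute force: for each input vector, several values of (h1, h2) may give the required value of f. The sketch below is an illustration only; the target function F, the choice of an AND gate for G and all helper names are hypothetical.

```python
from itertools import product


def boolean_relation(f, g, n_inputs, n_h=2):
    """For every input vector x, list all vectors h with g(*h) == f(*x)."""
    relation = {}
    for x in product((0, 1), repeat=n_inputs):
        relation[x] = [h for h in product((0, 1), repeat=n_h) if g(*h) == f(*x)]
    return relation


def is_compatible(relation, h_funcs):
    """True if the functions h_funcs pick an allowed vector for every input."""
    return all(
        tuple(h(*x) for h in h_funcs) in allowed
        for x, allowed in relation.items()
    )


F = lambda a, c, d: a & (c | d)  # hypothetical target function
G = lambda h1, h2: h1 & h2       # hypothetical 2-input AND gate

rel = boolean_relation(F, G, 3)
print(rel[(1, 1, 0)])  # [(1, 1)]                  F = 1: only one choice
print(rel[(0, 0, 0)])  # [(0, 0), (0, 1), (1, 0)]  F = 0: three choices

# One compatible solution: h1 = a, h2 = c + d
print(is_compatible(rel, [lambda a, c, d: a, lambda a, c, d: c | d]))  # True
```

The freedom on the vectors where F = 0 is what a Boolean-relation solver can exploit when choosing H.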
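29
[Figure: block F with signals a, c, d, y]
30
[Figure: continuation, signals a, c, d, y]
31
[Figure: continuation, signals a, c, d, y]
32
[Figure: continuation, signals a, c, d, y]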
33Technology mapping
- Merging small gates into larger gates introduces no new hazards
- Standard synchronous techniques can be applied, e.g. BDD-based Boolean matching
- Handles sequential gates and combinational feedbacks
- Due to hazards there is no guarantee of finding a correct mapping (some gates cannot be decomposed)
- Timing-aware decomposition can be applied in these rare cases
34Design flow
Specification (STG)
Reachability analysis
State Graph
State encoding
SG with CSC
Boolean minimization
Next-state functions
Logic decomposition
Decomposed functions
Technology mapping
Gate netlist
35Timing assumptions in design flow
- Speed-independent: wire delays after a fork are smaller than fan-out gate delays
- Burst-mode: circuit stabilizes between two changes at the inputs
- Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design)
36Relative Timing Circuits
- Assumptions: "a before b"
- for concurrent events: reduces reachable state space
- for ordered events: permits early enabling
- both increase the don't-care space for logic synthesis => simplify logic (better area and timing)
- "Assume, if useful, guarantee" approach: assumptions are used by the tool to derive a circuit and the required timing constraints that must be met in the physical design flow
- Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)
37Relative Timing Asynchronous Circuits
Speed-independent C-element
[Figure: C-element with signals a, b, c]
38State Graph (Read cycle)
[Figure: state graph of the read cycle over signals DSr, DTACK, LDS, LDTACK, D]
39Lazy Transition Systems
[Figure: state graph with enabling regions ER(LDS), ER(LDS-) and firing region FR(LDS-)]
Event LDS- is lazy: firing is a subset of enabling
40Timing assumptions
- (a before b) for concurrent events: concurrency reduction for firing and enabling
- (a before b) for ordered events: early enabling
- (a simultaneous to b wrt c) for triples of events: combination of the above
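The effect of an "a before b" assumption on concurrent events can be seen on a tiny acyclic model, where a state is the set of events fired so far. This is a sketch under stated assumptions: the events s, a, b, t and their causal order are hypothetical, and a real state graph is cyclic and carries signal values.

```python
def reachable_states(events, precedes):
    """Enumerate sets of fired events that respect the order `precedes`.

    `precedes` is a set of (earlier, later) pairs.
    """
    start = frozenset()
    seen = {start}
    stack = [start]
    while stack:
        fired = stack.pop()
        for e in events:
            if e in fired:
                continue
            # e may fire once all of its predecessors have fired.
            if all(p in fired for (p, q) in precedes if q == e):
                nxt = fired | {e}
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return seen


EVENTS = ["s", "a", "b", "t"]
# s enables a and b concurrently; t waits for both.
CAUSAL = {("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")}

untimed = reachable_states(EVENTS, CAUSAL)
timed = reachable_states(EVENTS, CAUSAL | {("a", "b")})  # assumption: a before b

print(len(untimed), len(timed))               # 6 5
print([sorted(s) for s in untimed - timed])   # [['b', 's']]
```

The state in which b has fired but a has not becomes unreachable, so its code can be used as a don't care during logic minimization.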
41Speed-independent Netlist
[Figure: STG of the read cycle (DSr, DTACK, LDS, LDTACK, D) and its speed-independent netlist, including the gates map and csc]
42Adding timing assumptions (I)
[Figure: the same STG and netlist, with a timing assumption added]
43Adding timing assumptions (I)
[Figure: the same STG and netlist, with a timing assumption added]
44State space domain
[Figure: state graph; labels DSr, LDTACK-]
45State space domain
[Figure: state graph; labels DSr, LDTACK-]
46State space domain
[Figure: state graph; labels DSr, LDTACK-]
Two more unreachable states
47Boolean domain
[Figure: Karnaugh map for LDS, split into LDS = 1 and LDS = 0 halves, with entries 0, 1 and - (don't care); one entry is marked 0/1?]
48Boolean domain
[Figure: the same Karnaugh map; one 0 entry becomes a don't care (-) and the 0/1? entry becomes 1]
One more DC vector for all signals
One state conflict is removed
49Netlist with one constraint
[Figure: STG of the read cycle and netlist, including the gates map and csc]
50Netlist with one constraint
[Figure: STG of the read cycle over DSr, DTACK, LDS, LDTACK, D]
51Timing assumptions
- (a before b) for concurrent events: concurrency reduction for firing and enabling
- (a before b) for ordered events: early enabling
- (a simultaneous to b wrt c) for triples of events: combination of the above
52Ordered events early enabling
[Figure: early enabling for ordered events a, b, c, with functions F and G]
53Adding timing assumptions (II)
[Figure: STG of the read cycle and netlist over DSr, DTACK, LDS, LDTACK, D]
54State space domain
[Figure: state graph; labels LDS-, D-, DSr-]
Reachable space is unchanged
For LDS-, enabling can be changed in one state
55Boolean domain
[Figure: Karnaugh map for LDS, split into LDS = 1 and LDS = 0 halves]
56Boolean domain
[Figure: the same Karnaugh map; one 1 entry becomes a don't care (-)]
One more DC vector for one signal: LDS
If used: LDS = DSr, otherwise: LDS = DSr + D
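How one extra don't-care vector lets a simpler function replace a larger one can be checked by comparing the two candidates on the care set only. The sketch rests on assumptions: the two alternatives are read as LDS = DSr and LDS = DSr OR D, the function is restricted to the two variables DSr and D, and the don't-care vector is taken to be DSr = 0, D = 1 (the state between DSr- and D- where LDS- may be enabled early).

```python
from itertools import product


def valid_on_care_set(candidate, reference, dont_cares, n_vars):
    """True if candidate equals reference on every vector outside dont_cares."""
    return all(
        candidate(*v) == reference(*v)
        for v in product((0, 1), repeat=n_vars)
        if v not in dont_cares
    )


reference = lambda dsr, d: dsr | d  # assumed reading: LDS = DSr + D
candidate = lambda dsr, d: dsr      # assumed reading: LDS = DSr

# Without the extra don't care, the simpler function is not a valid cover.
print(valid_on_care_set(candidate, reference, set(), 2))      # False
# With (DSr, D) = (0, 1) as a don't care for LDS, it is.
print(valid_on_care_set(candidate, reference, {(0, 1)}, 2))   # True
```

Using the don't care is what turns the timing assumption into a constraint that the physical design must then satisfy.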
57Before early enabling
[Figure: STG of the read cycle and netlist before early enabling]
58Netlist with two constraints
[Figure: STG of the read cycle and the resulting netlist over DSr, DTACK, LDS, LDTACK, D]
Both timing assumptions are used for optimization and become constraints
59Value of Relative Timing
- RT circuits provide up to 2-3x (1.3-2x) delay-area reduction with respect to SI circuits synthesized without (with) concurrency reduction
- Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual designs
- Back-annotation of timing constraints => minimal required timing information for the back-end tools
- Timing-aware state encoding allows significant area/performance optimization
60Design Flow with Timing
Specification (STG + user assumptions)
Reachability analysis
Lazy State Graph
Timing-aware state encoding
Automatic Timing Assumptions
Lazy SG with CSC
Boolean minimization
Next-state functions
Logic decomposition
Decomposed functions
Technology mapping
Required Timing Constraints
Gate netlist
61FIFO example
[Figure: FIFO block with ports li, lo, ro, ri]
62Speed-Independent Implementation
Without concurrency reduction, 3 state signals are required
63SI implementation with concurrency reduction
[Figure: STG over li, lo, ro, ri with state signal x, and a netlist built from generalized C-elements (gC)]
64RT implementation
[Figure: RT netlist with signals li, lo, ro, ri, x]
65RT implementation
[Figure: STG over li, lo, ro, ri with state signal x]
Timing constraints to satisfy: Delay(x-) < Delay(ri) and Delay(lo) + Delay(x-) < Delay(ro) + Delay(ri)
All constraints are either satisfied by default or easy to satisfy by sizing
66Other synthesis paradigms outline
- Synthesis from HDL (Verilog) [Lavagno et al., Async00]
- Subset for asynchronous specification
- Data-path/control partitioning
- Circuit architecture. Control generation
- Synthesis from asynchronous HDL (CSP, Tangram)
- CSP for control generation [A. Martin et al., Caltech]
- Tangram for silicon compilation [K. van Berkel et al., Philips]
- Control synthesis using FSMs [K. Yun, S. Nowick]
- Burst-mode machines
- Comparison with STGs
67Motivation
- Language-based design: key enabler of synchronous logic success
- Use HDL as single language for
- specification
- logic simulation and debugging
- synthesis
- post-layout simulation
- HDL must support multiple levels of abstraction
68Control-data partitioning
- Splitting of asynchronous control and synchronous data path
- Automated insertion of bundling delays
[Figure: CONTROL UNIT and DATA PATH connected by request and acknowledge signals, with a delay element]
69Design flow
HDL specification
Synthesizable HDL (data)
Control/data splitting
STG (control)
Synthesis (Synopsys)
Logic delays
Synthesis (petrify)
Timing analysis (Synopsys)
HDL implementation
Logic implementation
Delay insertion
70Asynchronous Verilog subset by example
always begin
  wait(start)
  R SMP 3
  RES SMP 4 R
  if(RES7 1) RES 0
  else begin
    if(RES6 1) RES 1
  end
  done 1
  wait(!start)
  done 0
end
[Figure: data path with SMP, R and RES, and control unit (C.U.) with signals start and done]
- begin-end for sequencing, fork-join for concurrency, if-else for input choice
- Only structured mix of sequencing, concurrency and choice can be specified
71Synthesis from asynchronous HDL
- CSP based languages
- CSP: communicating sequential processes [Hoare]
- Two synthesis techniques
- based on program transformations [Caltech]
- based on direct compilation [Philips]
- Tools are more mature than for asynchronous synthesis from standard HDL
- Complete shift in design methodology is required
72Using CSP for control generation
- After li goes high do full handshake at the
right, then complete handshake at the left and
iterate.
[Figure: Q element with ports li, lo, ro, ri]
STG: [Figure: cycle of transitions li+, ro+, ri+, ro-, ri-, lo+, li-, lo-]
CSP: *[ [li]; ro+; [ri]; ro-; [not ri]; lo+; [not li]; lo- ]
- ; is the sequencing operator
- ro+: ro goes high; ro-: ro goes low
- [li]: wait until li is high; [not li]: wait until li is low
CSP: *[ [li]; ro+; [ri]; ro-; [not ri]; lo+; [not li]; lo- ]
[Figure: circuit for ro with inputs li and ri; one element is marked weak]
Production rules:
li -> ro+
ri -> ro-
not ri -> lo+
not li -> lo-
- Conflict: ro+ and ro- are not mutually exclusive (since ri and li are not)
- Eliminate conflict by state signal insertion (CSC)
74Conflict elimination
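CSP: *[ [li]; ro+; [ri]; x+; [x]; ro-; [not ri]; lo+; [not li]; x-; [not x]; lo- ]
Production rules:
not x and li -> ro+
x or not li -> ro-
x and not ri -> lo+
not x or ri -> lo-
ri -> x+
not li -> x-
[Figure: netlist with signals li, ri, ro, lo, the state signal x held in a flip-flop (FF), and its complement not x]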
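The conflict on ro and its removal by the state signal x can be checked by brute force over the guard variables. This is a sketch, not the synthesis procedure: the helper name `conflicting_assignments` is hypothetical, and the guards are written out under the assumed reading "li -> ro+, ri -> ro-" before insertion and "not x and li -> ro+, x or not li -> ro-, x and not ri -> lo+, not x or ri -> lo-" after insertion.

```python
from itertools import product


def conflicting_assignments(up_guard, down_guard, variables):
    """Return all variable assignments for which both guards are true."""
    conflicts = []
    for values in product((False, True), repeat=len(variables)):
        env = dict(zip(variables, values))
        if up_guard(env) and down_guard(env):
            conflicts.append(env)
    return conflicts


# Before state signal insertion: li -> ro+ and ri -> ro-
before_ro = conflicting_assignments(
    lambda e: e["li"], lambda e: e["ri"], ["li", "ri"]
)
print(before_ro)  # [{'li': True, 'ri': True}]

# After inserting x: guards of ro and of lo
VARS = ["li", "ri", "x"]
after_ro = conflicting_assignments(
    lambda e: (not e["x"]) and e["li"],
    lambda e: e["x"] or (not e["li"]),
    VARS,
)
after_lo = conflicting_assignments(
    lambda e: e["x"] and (not e["ri"]),
    lambda e: (not e["x"]) or e["ri"],
    VARS,
)
print(after_ro, after_lo)  # [] []
```

The check only compares guards pairwise over all assignments; it does not take into account which assignments are actually reachable in the handshake.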
75Buffer example in Tangram
(a?byte & b!byte) · begin x0: var byte | forever do a?x0 ; b!x0 od end
[Figure: Buffer with ports a and b (a passive port and an active port), compiled into a Q element for control and a data path]
Each circle mapped to a netlist
76Summary
- Tangram program is partitioned into data path and control
- Data path is implemented as dual or single rail
- Control is mapped to composition of standard elements ( etc)
- Each standard element is mapped to a circuit
- Post-optimization is done
- Composing islands of control elements and re-synthesis with STG can give more aggressive optimization
- Philips made a few chips using Tangram, including a product: 8051 micro-controller in low-power pager Muna (25 wks battery life from one AAA battery)
- Similar approach used in Balsa (Manchester Univ.)
77Burst mode FSM
- Close to synchronous FSMs with binary encoded I/O
- Work in bursts
- Input transitions fire
- Output transitions fire
- State signals change
- Mostly limited to fundamental mode: next input burst cannot arrive before stabilization at the outputs
[Figure: burst-mode state diagram with states s1 to s4 and arcs labelled input burst / output burst: b-/x-, ab/y, a-/xy-, c-/y, c/y-]
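The "work in bursts" behaviour can be sketched as a small simulator: the machine collects input edges in any order, and only when a complete input burst has arrived does it fire the output burst and change state. Everything specific here is an assumption made for illustration: the class name, the three-state specification and the edge polarities are hypothetical and are not a transcription of the state diagram.

```python
class BurstModeMachine:
    """Minimal burst-mode FSM simulator (fundamental mode assumed)."""

    def __init__(self, spec, state, inputs, outputs):
        self.spec = spec              # state -> [(input burst, output burst, next state)]
        self.state = state
        self.inputs = dict(inputs)    # current input values
        self.outputs = dict(outputs)  # current output values
        self.pending = set()          # input edges seen since the last state change

    def input_edge(self, edge):
        """Apply one input edge such as 'a+' or 'b-'.

        Returns the output burst if an input burst completed, else None.
        """
        name, sign = edge[:-1], edge[-1]
        self.inputs[name] = 1 if sign == "+" else 0
        self.pending.add(edge)
        for in_burst, out_burst, nxt in self.spec[self.state]:
            if self.pending == in_burst:
                for out in out_burst:
                    self.outputs[out[:-1]] = 1 if out[-1] == "+" else 0
                self.state = nxt
                self.pending = set()
                return out_burst
        return None


# Hypothetical specification with assumed edge polarities.
SPEC = {
    "s1": [({"a+", "b+"}, {"y+"}, "s2")],
    "s2": [({"a-"}, {"x+", "y-"}, "s3")],
    "s3": [({"b-"}, {"x-"}, "s1")],
}

m = BurstModeMachine(SPEC, "s1", {"a": 0, "b": 0}, {"x": 0, "y": 0})
print(m.input_edge("a+"))  # None: the burst {a+, b+} is not complete yet
print(m.input_edge("b+"))  # {'y+'}
print(m.state, m.outputs)  # s2 {'x': 0, 'y': 1}
```

The simulator does not model gate delays, so the fundamental-mode requirement (no new input burst before the outputs stabilize) is simply assumed to hold.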
78Extended Burst mode
- Directed don't cares (b*): some concurrency is allowed for input transitions that do not influence an output burst
- Conditional guards (<b>): if b=1 then
[Figure: extended burst-mode state diagram with states s1 to s4 and arcs b-/x-, ab/y, <b>a-/xy-, c-/y, <b>c/y-]
79Synthesis of XBM
- Next state and output functions free of functional and logic hazards
- Sequential feedbacks should not introduce new hazards
- State assignment
- one state of the BM spec to one layer of Karnaugh map
- compatible layers are merged
- layers are compatible if merging does not introduce CSC violations or hazards
- Layers are encoded using race-free encoding
80XBM and STG
[Figure: the extended burst-mode machine (states s1 to s4) next to a corresponding STG over signals a, b, c, x, y, with a dummy transition eps]