Title: Introduction to asynchronous circuit design: specification and synthesis
- Jordi Cortadella, Universitat Politècnica de Catalunya, Spain
- Michael Kishinevsky, Intel Corporation, USA
- Alex Kondratyev, Theseus Logic, USA
- Luciano Lavagno, Università di Udine, Italy
Outline
- I. Introduction to basic concepts on asynchronous design
- II. Synthesis of control circuits from STGs
- III. Advanced topics on synthesis of control circuits from STGs
- IV. Synthesis from HDL and other synthesis paradigms
Note: no references are included in the tutorial.
Introduction to asynchronous circuit design: specification and synthesis
Part I: Introduction to basic concepts on asynchronous circuit design
Outline
- What is an asynchronous circuit?
- Asynchronous communication
- Asynchronous logic blocks
- Micropipelines
- Control specification and implementation
- Delay models
- Why asynchronous circuits?
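Synchronous circuit
[Figure: registers (R) separated by combinational logic blocks (CL), all driven by the global clock CLK]
Implicit synchronization
Asynchronous circuit
[Figure: registers (R) separated by combinational logic blocks (CL); adjacent stages synchronize through Req/Ack signals]
Explicit synchronization: Req/Ack handshakes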
Synchronous communication
[Figure: data waveform sampled at clock edges]
- Clock edges determine the time instants where data must be sampled
- Data wires may glitch between clock edges (set-up/hold times must be satisfied)
- Data are transmitted at a fixed rate (clock frequency)
Dual rail
[Figure: dual-rail waveforms]
- Two wires per bit
- 00 = spacer, 01 = 0, 10 = 1
- n-bit data communication requires 2n wires
- Each bit is self-timed
- Other delay-insensitive codes exist
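To make the dual-rail code concrete, here is a minimal Python sketch of the encoding listed above (00 = spacer, 01 = 0, 10 = 1), together with the completion detection it enables: a word is complete when every bit holds a valid code word. Representing each wire pair as a two-character string and the helper names `encode`, `decode`, `is_complete` and `is_spacer` are illustrative assumptions, not part of the tutorial.

```python
# Dual-rail code from the slide: 00 = spacer, 01 = logic 0, 10 = logic 1.
# Each bit travels on two wires, written here as a 2-character string.
SPACER = "00"
CODE = {0: "01", 1: "10"}
DECODE = {pair: bit for bit, pair in CODE.items()}


def encode(bits):
    """Encode n bits as n dual-rail pairs (2n wires in total)."""
    return [CODE[b] for b in bits]


def decode(pairs):
    """Decode valid pairs back to bits (KeyError on a spacer or on 11)."""
    return [DECODE[p] for p in pairs]


def is_complete(pairs):
    """Completion detection: every bit carries a valid code word."""
    return all(p in DECODE for p in pairs)


def is_spacer(pairs):
    """Return-to-zero finished: every bit is back to the spacer."""
    return all(p == SPACER for p in pairs)


word = encode([1, 0, 1])
print(word)               # ['10', '01', '10']
print(is_complete(word))  # True
```

Because each bit signals its own validity, no separate timing reference is needed: the receiver simply waits until `is_complete` holds, then until `is_spacer` holds before the next word.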
Bundled data
[Figure: data waveforms with a validity signal]
- Validity signal
  - Similar to an aperiodic local clock
- n-bit data communication requires n+1 wires
- Data wires may glitch when not valid
- Signaling protocols
  - level sensitive (latch)
  - transition sensitive (register): 2-phase / 4-phase
Example: memory read cycle
[Figure: Address waveform with its "Valid address" window and strobe A; Data waveform with its "Valid data" window and strobe D]
- Transition signaling, 4-phase
Example: memory read cycle
[Figure: the same Address/A and Data/D waveforms]
- Transition signaling, 2-phase
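The difference between the two protocols can be sketched by listing the strobe transitions needed per transfer. This assumes, for illustration only, that A acts as the request (address valid) and D as the acknowledge (data valid); the function names are hypothetical.

```python
def four_phase(n_transfers):
    """Return-to-zero signalling: A+ D+ A- D- for every transfer."""
    events = []
    for _ in range(n_transfers):
        events += ["A+", "D+", "A-", "D-"]
    return events


def two_phase(n_transfers):
    """Non-return-to-zero signalling: every edge of A or D is an event."""
    events = []
    a = d = 0
    for _ in range(n_transfers):
        a ^= 1                       # request: toggle A
        events.append("A+" if a else "A-")
        d ^= 1                       # acknowledge: toggle D
        events.append("D+" if d else "D-")
    return events


print(four_phase(1))  # ['A+', 'D+', 'A-', 'D-']
print(two_phase(2))   # ['A+', 'D+', 'A-', 'D-']
```

The 4-phase protocol spends half of its transitions returning to zero, while the 2-phase protocol completes two transfers with the same four edges at the cost of logic that must react to both rising and falling edges.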
Asynchronous modules
[Figure: module with a DATA PATH (Data IN, Data OUT, start, done) and a CONTROL block with an input handshake (req in, ack in) and an output handshake (req out, ack out)]
- Signaling protocol:
  reqin+ → start+ → [computation] → done+ → reqout+ → ackout+ → ackin+ → reqin- → start- → [reset] → done- → reqout- → ackout- → ackin-
  (more concurrency is also possible, e.g. by overlapping the return-to-zero phase of step i-1 with the evaluation phase of step i)
Completion detection
Asynchronous latches: C element
[Figure: C element symbol (inputs A, B; output Z), its truth table, and a transistor-level implementation between Vdd and Gnd]
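A behavioural model of the C element is a useful reference for the following slides. This Python sketch captures only the logic function (the output follows the inputs when they agree and holds otherwise), not the transistor-level circuit; the class and function names are our own.

```python
class CElement:
    """Muller C element: copy the inputs when they agree, otherwise hold."""

    def __init__(self, z=0):
        self.z = z  # current output value

    def step(self, a, b):
        if a == b:
            self.z = a
        return self.z


def run(pairs, z=0):
    """Apply a sequence of (A, B) input pairs and collect the outputs."""
    c = CElement(z)
    return [c.step(a, b) for a, b in pairs]


print(run([(1, 0), (1, 1), (0, 1), (0, 0)]))  # [0, 1, 1, 0]
```

The holding behaviour is what makes the C element act as a rendezvous: the output rises only after both inputs have risen and falls only after both have fallen.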
Dual-rail logic
Dual-rail AND gate
Valid behavior for monotonic environment
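One common dual-rail AND (assumed here, since the gate schematic did not survive) computes Z.t = A.t·B.t and Z.f = A.f + B.f. The sketch below uses (t, f) pairs with the same code as before: (0, 0) spacer, (0, 1) = 0, (1, 0) = 1; `rail` and `dr_and` are hypothetical names.

```python
SPACER = (0, 0)


def rail(bit):
    """Dual-rail code word (t, f) for a valid bit."""
    return (1, 0) if bit else (0, 1)


def dr_and(a, b):
    """Dual-rail AND: true rail is AND of true rails, false rail is OR."""
    at, af = a
    bt, bf = b
    return (at & bt, af | bf)


print(dr_and(rail(1), rail(1)))  # (1, 0)
print(dr_and(SPACER, SPACER))    # (0, 0)
```

Note that a valid 0 on A alone already drives Z.f high before B arrives, so the output never becomes the illegal (1, 1) but may become valid early: the gate behaves correctly only if inputs move monotonically from spacer to valid and back, which is the "monotonic environment" condition above.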
Differential cascode voltage switch logic
[Figure: DCVSL 3-input AND/NAND gate with dual-rail inputs A.t/A.f, B.t/B.f, C.t/C.f, dual-rail outputs Z.t/Z.f, control input start and completion output done]
Bundled-data logic blocks
[Figure: conventional logic block between start and done, with a delay element in parallel]
Conventional logic + matched delay
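Micropipelines (Sutherland 89)
[Figure: pipeline of latches (L) with logic between stages; a chain of C elements (C) controls the latches, with Rin/Ain at the input and Rout/Aout at the output]
Data-path / Control
[Figure: the same pipeline split into the data path (latches L and logic) and a CONTROL block handling Rin, Ain, Rout, Aout]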
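The control of a micropipeline can be modelled as a chain of C elements in which stage i combines the request from stage i-1 with the inverted state of stage i+1 (often called a Muller pipeline). The sketch below assumes 4-phase signalling, ignores the data latches and all delays, and lets every C element update until the chain is stable; the names `c_next` and `settle` are hypothetical.

```python
def c_next(a, b, z):
    """C element: follow the inputs when they agree, otherwise hold z."""
    return a if a == b else z


def settle(stages, rin, aout):
    """Update the C elements of the chain until none changes.

    Stage i sees the request of stage i-1 (Rin for the first stage) and
    the inverted state of stage i+1 (Aout for the last stage)."""
    c = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            left = rin if i == 0 else c[i - 1]
            right = aout if i == len(c) - 1 else c[i + 1]
            new = c_next(left, 1 - right, c[i])
            if new != c[i]:
                c[i] = new
                changed = True
    return c


state = settle([0, 0, 0, 0], rin=1, aout=0)  # first request ripples through
print(state, "Ain =", state[0])              # [1, 1, 1, 1] Ain = 1
state = settle(state, rin=0, aout=0)         # environment lowers Rin
print(state)                                 # [0, 0, 0, 1]
```

Starting empty with Rin = 1, the request ripples to the last stage; after the environment lowers Rin, the spacer follows but stops behind the last stage, which still waits for Aout. A second request then stops one stage earlier, because neighbouring stages must differ to hold separate tokens.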
Control specification
[Figure: waveforms and STG over transitions A+, A-, B+, B-; A is an input, B an output]
[Figures: further STG examples over A, B and a third signal C]
A simple filter specification
[Figure: filter block with input channel IN (Rin, Ain) and output channel OUT (Rout, Aout)]
    y := 0;
    loop
      x := READ (IN);
      WRITE (OUT, (x+y)/2);
      y := x;
    end loop
A simple filter: block diagram
- x and y are level-sensitive latches (transparent when R = 1)
- + is a bundled-data adder (matched delay between Ra and Aa)
- Rin indicates the validity of IN
- After Ain+ the environment is allowed to change IN
- (Rout, Aout) control a level-sensitive latch at the output
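As a reference for the control specification that follows, here is a software model of the filter loop in Python. It is a sketch of the data behaviour only (no handshakes or latches), assuming plain division by 2 for the `/2` in the specification; `filter_stream` is a hypothetical name.

```python
def filter_stream(inputs):
    """y := 0; loop x := READ(IN); WRITE(OUT, (x + y) / 2); y := x."""
    y = 0
    for x in inputs:          # each iteration corresponds to one READ(IN)
        yield (x + y) / 2     # WRITE(OUT, ...)
        y = x                 # latch y takes the previous sample


print(list(filter_stream([2, 4, 6])))  # [1.0, 3.0, 5.0]
```

The asynchronous controller has to sequence exactly these steps: capture IN into x, let the adder compute (x + y), hand the result to OUT, and only then copy x into y.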
A simple filter: control specification
A simple filter: control implementation
Control observable behavior
[Figure: STG of the observable control behavior over Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout and the internal signal z]
Taking delays into account
- Delay assumptions
  - Environment: 3 time units
  - Gates: 1 time unit
[Figure: circuit with signals x, y, z and its event trace at times 3, 4, 5, 6, 7, 9, 10, 12, 13, 14]
Taking delays into account
- Delay assumptions: unbounded delays (z very slow)
[Figure: event trace of x, y, z at times 3, 4, 5, 6, 9, 10, 11, ending in a failure]
Gate vs wire delay models
- Gate delay model: delays in gates, no delays in wires
- Wire delay model: delays in gates and wires
Delay models for async. circuits
- Bounded delays (BD): realistic for gates and wires
  - Technology mapping is easy, verification is difficult
- Speed independent (SI): unbounded (pessimistic) delays for gates and negligible (optimistic) delays for wires
  - Technology mapping is more difficult, verification is easy
- Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires
  - The DI class (built out of basic gates) is almost empty
- Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks)
  - Formally, it is the same as speed independent
  - In practice, different synthesis strategies are used
[Figure: classes of delay models: BD, SI and QDI]
Motivation (designer's view)
- Modularity
  - Plug-and-play interconnectivity
  - Reusability
  - IPs with abstract timing behaviors
- High performance
  - Average-case performance (no worst-case delay synchronization)
  - No clock skew (local timing assumptions instead)
- Many interfaces are asynchronous
  - Buses, networks, ...
Motivation (technology aspects)
- Low power
  - Automatic clock gating
- Electromagnetic compatibility
  - No peak currents around clock edges
- Robustness
  - High immunity to technology and environment variations (in-die variations, temperature, power supply, ...)
Problems
- Concurrent models for specification
- CSP, Petri nets, ...
- Difficult to design
- Hazards, synchronization
- Complex timing analysis
- Difficult to estimate performance
- Difficult to test
- No way to stop the clock
But we have some success stories...
- Philips
- AMULET microprocessors
- Sharp
- Intel (RAPPID)
- IBM (interlocked pipeline)
- Start-up companies
- Theseus Logic, Cogency, ADD
- ...
Introduction to asynchronous circuit design: specification and synthesis
Part II: Synthesis of control circuits from STGs
Outline
- Overview of the synthesis flow
- Specification
- State graph and next-state functions
- State encoding
- Implementability conditions
- Speed-independent circuit
- Complex gates
- C-element architecture
Design flow
Signal Transition Graph (STG)
[Figure: waveforms of x, y, z and the corresponding STG with transitions x+, x-, y+, y-, z+, z-]
Next-state functions
[Figure: state graph over signals x, y, z and the derived next-state functions]
Design flow
Specification (STG) → reachability analysis → State Graph → state encoding → SG with CSC → Boolean minimization → next-state functions → logic decomposition → decomposed functions → technology mapping → gate netlist
VME bus
STG for the READ cycle
[Figure: VME bus controller with signals DSr, DTACK, LDS, LDTACK, D, and the STG of the READ cycle over the rising and falling transitions of these signals]
Choice: Read and Write cycles
Circuit synthesis
- Goal
  - Derive a hazard-free circuit under a given delay model and mode of operation
Binary encoding of signals
[Figure: state graph of the READ cycle; each state is labelled with its code (DSr, DTACK, LDTACK, LDS, D), e.g. 10000, 10010, 10110, 01110; the code 10110 appears in two different states]
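Excitation / Quiescent Regions
Next-state function
- 0 → 1: the signal is 0 and excited (excitation region ER(z+)); next-state value 1
- 0 → 0: the signal is 0 and stable (quiescent region QR(z-)); next-state value 0
- 1 → 1: the signal is 1 and stable (quiescent region QR(z+)); next-state value 1
- 1 → 0: the signal is 1 and excited (excitation region ER(z-)); next-state value 0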
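The binary encoding, the next-state values and the state-coding conflict of the READ cycle can be reproduced with a small reachability analysis. The sketch below assumes the READ-cycle STG as we read it from the VME example: DSr+ → LDS+ → LDTACK+ → D+ → DTACK+ → DSr- → D-, after which DTACK- and LDS- → LDTACK- proceed concurrently, DTACK- enables the next DSr+ and LDTACK- enables the next LDS+; DSr and LDTACK are inputs. Places are represented as arcs between transitions, which only works for this kind of marked graph, and all names are our own.

```python
from collections import deque

SIGNALS = ["DSr", "DTACK", "LDTACK", "LDS", "D"]  # bit order of the codes
OUTPUTS = {"DTACK", "LDS", "D"}                   # DSr, LDTACK are inputs

# Assumed READ-cycle STG; every arc (a, b) is a place between transitions.
ARCS = [("DSr+", "LDS+"), ("LDS+", "LDTACK+"), ("LDTACK+", "D+"),
        ("D+", "DTACK+"), ("DTACK+", "DSr-"), ("DSr-", "D-"),
        ("D-", "DTACK-"), ("D-", "LDS-"), ("LDS-", "LDTACK-"),
        ("LDTACK-", "LDS+"), ("DTACK-", "DSr+")]
INITIAL = frozenset({("DTACK-", "DSr+"), ("LDTACK-", "LDS+")})
EVENTS = {a for a, _ in ARCS} | {b for _, b in ARCS}


def enabled(marking):
    """Transitions whose input places are all marked."""
    return sorted(t for t in EVENTS
                  if all(p in marking for p in ARCS if p[1] == t))


def fire(marking, code, t):
    """Fire t: move the tokens and update the bit of the switching signal."""
    pre = {p for p in ARCS if p[1] == t}
    post = {p for p in ARCS if p[0] == t}
    bits = list(code)
    bits[SIGNALS.index(t[:-1])] = 1 if t.endswith("+") else 0
    return frozenset((marking - pre) | post), tuple(bits)


def state_graph():
    """Breadth-first reachability analysis from the all-zero code."""
    start = (INITIAL, (0, 0, 0, 0, 0))
    seen, queue = {start}, deque([start])
    while queue:
        marking, code = queue.popleft()
        for t in enabled(marking):
            nxt = fire(marking, code, t)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def next_value(state, signal):
    """Next-state value: flip the signal if one of its transitions is enabled."""
    marking, code = state
    value = code[SIGNALS.index(signal)]
    excited = any(t[:-1] == signal for t in enabled(marking))
    return 1 - value if excited else value


def csc_conflicts(signal="LDS"):
    """Codes shared by states that enable different output transitions."""
    by_code = {}
    for state in state_graph():
        by_code.setdefault(state[1], []).append(state)
    conflicts = {}
    for code, states in by_code.items():
        outs = {frozenset(t for t in enabled(m) if t[:-1] in OUTPUTS)
                for m, _ in states}
        if len(outs) > 1:
            key = "".join(map(str, code))
            conflicts[key] = sorted({next_value(s, signal) for s in states})
    return conflicts


print(len(state_graph()))  # 14
print(csc_conflicts())     # {'10110': [0, 1]}
```

The result shows the code 10110 reached once before D+ (LDS stays 1) and once before LDS- (LDS goes to 0), which is the 0/1 conflict that appears in the Karnaugh map for LDS.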
Karnaugh map for LDS
[Figure: Karnaugh map for LDS with cells where LDS = 1, cells where LDS = 0, don't cares (-), and one cell with a conflicting value (0/1?)]
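Concurrency reduction
[Figure: state-graph fragment with the two states coded 10110 and the transitions of LDS]
Concurrency reduction
[Figure: READ-cycle STG over DSr, DTACK, LDS, LDTACK, D with reduced concurrency]
State encoding conflicts
[Figure: the two states coded 10110, around the transitions of LDS and LDTACK]
Signal insertion
[Figure: after inserting a new state signal, the two conflicting states are coded 101101 and 101100]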
Complex-gate implementation
- Under what conditions does a hazard-free implementation exist?
Implementability conditions
- Consistency
  - Rising and falling transitions of each signal alternate in any trace
- Complete state coding (CSC)
  - Next-state functions correctly defined
- Persistency
  - No event can be disabled by another event (unless they are both inputs)
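The consistency condition can be checked on a single trace by tracking the current value of each signal. This is a minimal sketch, assuming events are written as a signal name followed by `+` or `-`; the function name is hypothetical, and a full check would have to cover every trace of the state graph rather than one.

```python
def is_consistent(trace, initial):
    """Check that the rising (+) and falling (-) transitions of every
    signal alternate along the trace, starting from the initial values."""
    value = dict(initial)
    for event in trace:
        signal, direction = event[:-1], event[-1]
        required = 0 if direction == "+" else 1  # value before the event
        if value[signal] != required:
            return False
        value[signal] = 1 - required
    return True


read_cycle = ["DSr+", "LDS+", "LDTACK+", "D+", "DTACK+",
              "DSr-", "D-", "DTACK-", "LDS-", "LDTACK-"]
zeros = dict.fromkeys(["DSr", "DTACK", "LDTACK", "LDS", "D"], 0)
print(is_consistent(read_cycle, zeros))       # True
print(is_consistent(["a+", "a+"], {"a": 0}))  # False
```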
Implementability conditions
- Consistency + CSC + persistency
  - There exists a speed-independent circuit that implements the behavior of the STG (under the assumption that any Boolean function can be implemented with one complex gate)
Persistency
[Figure: signals a, b, c: is this a pulse?]
Speed independence: glitch-free output behavior under any delay
Speed-independent implementations
- How can the implementability conditions (consistency, complete state coding, persistency) be satisfied?
- Standard circuit architectures:
  - Complex (hazard-free) gates
  - C elements with monotonic covers
  - Standard gates and latches
Excitation regions ER(d+) and ER(d-)
[Figure: ER(d+) and ER(d-) in the state graph]
Complex gate
[Figure: Karnaugh map of d over ab and cd, implemented as one complex gate]
Implementation with C elements
Cyclic sequence: S+ → z+ → S- → R+ → z- → R- (then S+ again)
- S (set) and R (reset) must be mutually exclusive
- S must cover ER(z+) and must not intersect ER(z-) ∪ QR(z-)
- R must cover ER(z-) and must not intersect ER(z+) ∪ QR(z+)
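Why S and R must be mutually exclusive can be seen by comparing a C element fed by S and by the inverse of R with the intended set/reset behaviour. This sketch assumes that arrangement (R enters the C element through an inverter), which is the usual one but is not spelled out on the slide; the function names are hypothetical.

```python
from itertools import product


def c_element(a, b, z):
    """Muller C element: follow the inputs when equal, otherwise hold z."""
    return a if a == b else z


def set_reset(s, r, z):
    """Intended behaviour of z: set by S, reset by R, otherwise hold."""
    if s:
        return 1
    if r:
        return 0
    return z


def c_impl(s, r, z):
    """Assumed implementation: C element with inputs S and NOT R."""
    return c_element(s, 1 - r, z)


def agrees_when_exclusive():
    """Compare both models on every (S, R, z) with S and R not both 1."""
    return all(c_impl(s, r, z) == set_reset(s, r, z)
               for s, r, z in product((0, 1), repeat=3) if not (s and r))


print(agrees_when_exclusive())  # True
```

When S = R = 1 the C element inputs disagree and the output simply holds, so the circuit neither sets nor resets; this is why the set and reset covers must never overlap in reachable states.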
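[Figure: Karnaugh map of d with set (S) and reset (R) covers feeding a C element whose output is d]
But ...
[Figure: the same C-element implementation]
Starting from state 0000 (R = 1 and S = 0):
a+ → R- → b+ → a- → c+ → S+ → d+
[Figure: the C-element implementation during this sequence]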
Monotonic covers
[Figure: Karnaugh map of d over ab and cd with monotonic set and reset covers]
C-based implementations
Synthesis exercise
[Figure: state graph; its states include 1011, 0011 and 0111]
Derive circuits for signals x and z (complex gates and monotonic covers)
[Figure: Karnaugh map for signal x over variables wx and yz]
[Figure: Karnaugh map for signal z over variables wx and yz]
Introduction to asynchronous circuit design: specification and synthesis
Part III: Advanced topics on synthesis of control circuits from STGs
Outline
- Logic decomposition
- Hazard-free decomposition
- Signal insertion
- Technology mapping
- Optimization based on timing information
- Relative timing
- Timing assumptions and constraints
- Automatic generation of timing assumptions
No hazards
Decomposition may lead to hazards
[Figure: decomposed circuit traversing the states 1000, 1100, 0100, 0110]
Decomposition
- Acknowledgement
- Generating candidates
- Hazard-free signal insertion
- Event insertion
- Signal insertion
Global acknowledgement
How about 2-input gates?
[Figure: a gate network with inputs a, b, c, d, intermediate signal y and output z, progressively decomposed into 2-input gates]
Strategy for logic decomposition
- Each decomposition defines a new internal signal
- Method: insert new internal signals such that
  - after resynthesis, some large gates are decomposed
  - the new specification is hazard-free
- Generate candidates for decomposition using standard logic factorization techniques:
  - Algebraic factorization
  - Boolean factorization (Boolean relations)
Decomposition example
[Figure: state graph of a specification over signals w, x, y, z with 4-bit state codes]
[Figure: the same state graph after inserting a new signal s, split into the regions s = 1 and s = 0]
[Figure: states partitioned into the regions yz = 1 and yz = 0]
z- is delayed by the new transition s-!
Decomposition (Algebraic, Boolean relations)
[Figure: iterative decomposition of a function F: candidates are checked with "Hazard-free? (event insertion)", repeated until no more progress]
Signal insertion for function F
[Figure: insertion by input borders in the State Graph]
Event insertion
[Figure: event x inserted into a state graph, showing the region SR(x) and event b]
Properties to preserve
a is persistent
Boolean decomposition
f = F(x1, ..., xn)
f = G(H(x1, ..., xn))
Our problem: given F and G, find H
[Figure: f computed by G from the intermediate functions h1 and h2]
This is a Boolean relation
[Figures: example decomposition of F over inputs a, c, d with an intermediate signal y]
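The Boolean relation behind "given F and G, find H" can be made explicit by enumerating, for every input vector x, the values of H(x) that G maps to F(x). The sketch below brute-forces a tiny, hypothetical example (F = x1·x2 + x3, G = 2-input OR); it ignores the hazard-freedom check that the decomposition flow adds through event insertion.

```python
from itertools import product


def boolean_relation(F, G, n, k=2):
    """For every input vector x, the set of k-bit vectors h with G(h) == F(x)."""
    relation = {}
    for x in product((0, 1), repeat=n):
        relation[x] = {h for h in product((0, 1), repeat=k) if G(*h) == F(*x)}
    return relation


def is_solution(relation, H):
    """H is a valid decomposition if H(x) is allowed for every x."""
    return all(H(*x) in allowed for x, allowed in relation.items())


def F(x1, x2, x3):  # hypothetical target function
    return (x1 & x2) | x3


def G(h1, h2):      # hypothetical 2-input gate
    return h1 | h2


rel = boolean_relation(F, G, 3)
print(sorted(rel[(0, 0, 1)]))                             # [(0, 1), (1, 0), (1, 1)]
print(is_solution(rel, lambda a, b, c: (a & b, c)))       # True
print(is_solution(rel, lambda a, b, c: (a, c)))           # False
```

Several H satisfy the same relation, e.g. (x1·x2, x3) or (F, 0); this freedom is what a Boolean-relation solver exploits when choosing the new internal signals.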
Technology mapping
- Merging small gates into larger gates introduces no new hazards
- Standard synchronous techniques can be applied, e.g. BDD-based Boolean matching
- Handles sequential gates and combinational feedbacks
- Due to hazards, there is no guarantee to find a correct mapping (some gates cannot be decomposed)
- Timing-aware decomposition can be applied in these rare cases
Timing assumptions in design flow
- Speed-independent: wire delays after a fork are smaller than fan-out gate delays
- Burst-mode: the circuit stabilizes between two changes at the inputs
- Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design)
Relative Timing Circuits
- Assumptions: a before b
  - for concurrent events: reduces the reachable state space
  - for ordered events: permits early enabling
  - both increase the don't-care space for logic synthesis => simplified logic (better area and timing)
- "Assume - if useful - guarantee" approach: assumptions are used by the tool to derive a circuit and the required timing constraints that must be met in the physical design flow
- Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)
Relative Timing Asynchronous Circuits
[Figure: speed-independent C element with signals a, b, c]
State Graph (Read cycle)
[Figure: state graph of the VME READ cycle over DSr, DTACK, LDS, LDTACK, D]
Lazy Transition Systems
[Figure: regions ER(LDS+), ER(LDS-) and FR(LDS-) in the READ-cycle state graph]
Event LDS- is lazy: its firing region is a subset of its enabling region
Timing assumptions
- (a before b) for concurrent events: concurrency reduction for firing and enabling
- (a before b) for ordered events: early enabling
- (a simultaneous to b w.r.t. c) for triples of events: combination of the above
Speed-independent Netlist
[Figure: READ-cycle STG and its speed-independent netlist over DSr, LDTACK, D, DTACK, LDS, with internal signals map and csc]
Adding timing assumptions (I)
[Figure: READ-cycle STG with the first timing assumption, next to the speed-independent netlist]
State space domain
[Figure: state-graph fragment around DSr+ and LDTACK-]
Two more unreachable states
Boolean domain
[Figure: Karnaugh map for LDS before and after the timing assumption]
One more DC vector for all signals
One state conflict is removed
Netlist with one constraint
[Figure: READ-cycle STG and the netlist obtained with one timing constraint]
Ordered events: early enabling
[Figure: ordered events a, b, c implemented by gates F and G]
Adding timing assumptions (II)
[Figure: READ-cycle STG with the second timing assumption, next to the netlist]
State space domain
[Figure: state-graph fragment around LDS-, D- and DSr-]
Reachable space is unchanged
For LDS-, enabling can be changed in one state
Boolean domain
[Figure: Karnaugh map for LDS before and after early enabling of LDS-]
One more DC vector for one signal (LDS)
If used: LDS = DSr; otherwise: LDS = DSr + D
Before early enabling
[Figure: READ-cycle STG and netlist before early enabling]
Netlist with two constraints
[Figure: READ-cycle STG and the netlist obtained with both timing assumptions]
Both timing assumptions are used for optimization and become constraints
Deriving automatic timing assumptions
- Rule I (out of 6): a, b are non-input events
  - Untimed ordering: a || b (concurrent), and a is enabled before b, but not vice versa
  - Derived assumption: a fires before b
  - Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c) + del(b)
[Figure: event a concurrent with the chain c → b]
- Effect I: a state becomes DC for all signals
- Effect II: another state becomes local DC for the signal of event b
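Effect I can be illustrated on fully concurrent events: without timing, every interleaving is reachable; the assumption "a fires before b" removes the states in which b has fired but a has not, so those states become don't cares. The sketch below enumerates states as sets of fired events over all interleavings; `reachable` is a hypothetical helper, and only fully concurrent events are modelled.

```python
from itertools import permutations


def reachable(events, before=()):
    """States (sets of fired events) over all interleavings of concurrent
    events, keeping only orders where x fires before y for each (x, y)."""
    states = set()
    for order in permutations(events):
        if any(order.index(x) > order.index(y) for x, y in before):
            continue
        for k in range(len(order) + 1):
            states.add(frozenset(order[:k]))
    return states


untimed = reachable(["a", "b"])
timed = reachable(["a", "b"], [("a", "b")])
print(len(untimed), len(timed))  # 4 3
print(untimed - timed)           # {frozenset({'b'})}
```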
Backannotation of Timing Constraints
- Timed circuits require post-verification
- Can synthesis tools help?
  - Report the least stringent set of timing constraints required for the correctness of the circuit
  - Not all initial timing assumptions may be required
- Petrify reports a set of constraints for the order of firing that guarantees the correctness of the circuit
Timing constraints generation
[Figure: state graph over events a, b, c, d, e]
Assumptions: d before b, c before e, and a before d
[Figure: incorrect behavior reached when the assumptions are not enforced]
Covering incorrect behavior
[Figure: incorrect behaviors numbered 1 to 5; the constraint c before e covers behaviors 2 and 4]
Other possible constraints remove states from the assumption domain => invalid
Constraints for the minimal-cost solution: d before c and c before e
Timing-aware state encoding
- Solve only state conflicts reachable in the RT assumptions domain
- Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic
- State variables inserted concurrently with I/O events => latency and cycle time reduction
Value of Relative Timing
- RT circuits provide up to 2-3x (1.3-2x) delay×area reduction with respect to SI circuits synthesized without (with) concurrency reduction
- Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual design
- Back-annotation of timing constraints => minimal required timing information for the back-end tools
- Timing-aware state encoding allows significant area/performance optimization
Design Flow with Timing
Specification (STG + user assumptions) → reachability analysis → Lazy State Graph → timing-aware state encoding and automatic timing assumptions → Lazy SG with CSC → Boolean minimization → next-state functions → logic decomposition → decomposed functions → technology mapping → gate netlist + required timing constraints
FIFO example
[Figure: FIFO controller with left handshake (li, lo) and right handshake (ro, ri)]
Speed-Independent Implementation
- Without concurrency reduction, 3 state signals are required
SI implementation with concurrency reduction
[Figure: STG over li, lo, ro, ri with one state signal x, and its implementation with generalized C elements (gC)]
RT implementation
[Figure: RT implementation of the FIFO controller with signals li, lo, ro, ri and x]
RT implementation
[Figure: STG over li, lo, ro, ri and x]
Constraints to satisfy: Delay(x-) < Delay(ri+) and Delay(lo+) + Delay(x-) < Delay(ro+) + Delay(ri+)
All constraints are either satisfied by default or easy to satisfy by sizing
Introduction to asynchronous circuit design: specification and synthesis
Part IV
- Synthesis from HDL
- Other synthesis paradigms
Outline
- Synthesis from standard HDL (Verilog) (L. Lavagno et al., Async00)
  - Subset for asynchronous specification
  - Data-path/control partitioning
  - Circuit architecture, control generation
- Synthesis from asynchronous HDL (CSP, Tangram)
  - CSP for control generation (A. Martin et al., Caltech)
  - Tangram for silicon compilation (K. van Berkel et al., Philips)
- Control synthesis using FSMs (K. Yun, S. Nowick)
  - Burst-mode machines
  - Comparison with STGs
- Disclaimer: this is NOT a comprehensive review
Motivation
- Language-based design: key enabler of the success of synchronous logic
- Use HDL as a single language for
  - specification
  - logic simulation and debugging
  - synthesis
  - post-layout simulation
- HDL must support multiple levels of abstraction
Control-data partitioning
- Splitting of asynchronous control and synchronous data path
- Automated insertion of bundling delays
[Figure: CONTROL UNIT and DATA PATH linked by request/acknowledge through a bundling delay]
Design flow
[Figure: HDL specification → control/data splitting into synthesizable HDL (data) and an STG (control); the data part is synthesized with Synopsys and the control part with petrify; timing analysis (Synopsys) of the logic delays drives delay insertion; results: HDL implementation and logic implementation]
Asynchronous Verilog subset by example
    always begin
      wait(start);
      R = SMP 3;
      RES = SMP 4 R;
      if (RES[7] == 1) RES = 0;
      else begin
        if (RES[6] == 1) RES = 1;
      end
      done = 1;
      wait(!start);
      done = 0;
    end
[Figure: data path with SMP, R and RES, and a control unit (C.U.) with handshake signals start and done]
- begin-end for sequencing, fork-join for concurrency, if-else for input choice
- Only a structured mix of sequencing, concurrency and choice can be specified
Controller design flow
[Figure: HDL → (syntax-directed translation) → Petri net and trace expressions, refined by transformations and reductions → synthesis → circuit]
Trace expressions example
Reduction example
[Figure: trace expression over events a to h before and after reduction]
Transformation: concurrency reduction
- Concurrency in trace expressions (TE)
  - b and f have a common parallel father
[Figure: trace expression over a, b, c, d, f before and after reducing the concurrency between b and f]
Synthesis
- Place-based encoding (based on a David-cell approach)
- Transformations to improve area and performance
- Structural methods to derive a circuit (Pastor et al., Transactions on CAD, Nov. 98)
Place-based encoding
[Figure: places p1 to p4 and transitions t1, t2; marking codes 1100, 0010, 0001; ER(t1) = 111-, ER(t2) = --11]
Synthesis example: VME bus
[Figure: READ-cycle Petri net with places p1 to p11 and its place encoding, over the signals dsr, dtack, lds, ldtack, d]
VME bus spec after transforms
[Figure: the VME bus Petri net after reductions and transformations]
Deriving the next-state function
Next-state function of signal y?
[Figure: derivation of the next-state function of y from signals x and z]
Conclusion
- Initial prototype of an automated flow without state explosion for ASIC design
  - From HDLs (control / data splitting)
  - Existing tools for data-path synthesis
- Direct synthesis guarantees implementation (HDL → Petri net, Petri-net-based encoding)
- Synthesis of large controllers by efficient specification models (free-choice Petri nets, trace expressions)
- Exploration of the design space (optimization) by property-preserving transformations
- Logic synthesis by structural methods
  - Quality of design often acceptable
  - Timing post-optimization can be applied
Synthesis from asynchronous HDL
- CSP-based languages
  - CSP: communicating sequential processes (T. Hoare)
- Two synthesis techniques
  - based on program transformations (Caltech)
  - based on direct compilation (Philips)
- Tools are more mature than for asynchronous synthesis from standard HDL
- A complete shift in design methodology is required
Using CSP for control generation
- After li goes high, do a full handshake at the right, then complete the handshake at the left, and iterate.
[Figure: Q element with left handshake (li, lo) and right handshake (ro, ri)]
STG: li+ → ro+ → ri+ → ro- → ri- → lo+ → li- → lo-
CSP: [li]; ro+; [ri]; ro-; [not ri]; lo+; [not li]; lo-
- ";": sequencing operator
- ro+: ro goes high; ro-: ro goes low
- [li]: wait until li is high; [not li]: wait until li is low
Using CSP for control generation
CSP: [li]; ro+; [ri]; ro-; [not ri]; lo+; [not li]; lo-
Production rules: li -> ro+; ri -> ro-; not ri -> lo+; not li -> lo-
[Figure: gate-level implementation of ro and lo from li and ri; one gate is marked weak]
- Conflict: ro+ and ro- are not mutually exclusive (since ri and li are not)
- Eliminate the conflict by state signal insertion (CSC)
Conflict elimination
CSP: [li]; ro+; [ri]; x+; [x]; ro-; [not ri]; lo+; [not li]; x-; [not x]; lo-
Production rules:
- not x and li -> ro+
- x or not li -> ro-
- x and not ri -> lo+
- not x or ri -> lo-
- ri -> x+
- not li -> x-
[Figure: implementation of ro, lo and the state signal x (FF) from li and ri]
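The production rules after conflict elimination can be exercised with a small simulator. This sketch assumes a 4-phase environment on both sides (li+ when lo = 0, li- when lo = 1, and ri following ro), fires the first enabled rule at each step, and then checks that no visited state enables both the up-rule and the down-rule of the same circuit signal; all names are our own.

```python
# Production rules after inserting x: (signal, target value, guard).
CIRCUIT = [
    ("ro", 1, lambda s: not s["x"] and s["li"]),
    ("ro", 0, lambda s: s["x"] or not s["li"]),
    ("lo", 1, lambda s: s["x"] and not s["ri"]),
    ("lo", 0, lambda s: not s["x"] or s["ri"]),
    ("x", 1, lambda s: s["ri"]),
    ("x", 0, lambda s: not s["li"]),
]
# Assumed 4-phase environment on the left (li, lo) and right (ro, ri).
ENVIRONMENT = [
    ("li", 1, lambda s: not s["lo"]),
    ("li", 0, lambda s: s["lo"]),
    ("ri", 1, lambda s: s["ro"]),
    ("ri", 0, lambda s: not s["ro"]),
]


def run(steps):
    """Fire the first enabled rule at each step; return trace and states."""
    state = {"li": 0, "lo": 0, "ri": 0, "ro": 0, "x": 0}
    trace, visited = [], [dict(state)]
    for _ in range(steps):
        options = [(sig, val) for sig, val, guard in CIRCUIT + ENVIRONMENT
                   if guard(state) and state[sig] != val]
        if not options:
            break
        sig, val = options[0]
        state[sig] = val
        trace.append(sig + ("+" if val else "-"))
        visited.append(dict(state))
    return trace, visited


def mutually_exclusive(states):
    """True if no visited state enables both rules of a circuit signal."""
    for s in states:
        for signal in ("ro", "lo", "x"):
            up = any(g(s) for n, v, g in CIRCUIT if n == signal and v == 1)
            down = any(g(s) for n, v, g in CIRCUIT if n == signal and v == 0)
            if up and down:
                return False
    return True


trace, visited = run(10)
print(trace)  # ['li+', 'ro+', 'ri+', 'x+', 'ro-', 'ri-', 'lo+', 'li-', 'x-', 'lo-']
print(mutually_exclusive(visited))  # True
```

The check is restricted to visited states on purpose: the guards of x+ (ri) and x- (not li) would both hold if ri = 1 and li = 0, but that state is never reached with this environment.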
Conclusions
- Generating circuits from a CSP control program is similar to STG synthesis
  - One can be reduced to the other
- The particular technique may vary: direct CSP program transformations can be (and were) used instead of methods based on state-space generation
- See reference list for more details
Buffer example in Tangram
    (a?byte & b!byte) .
    begin x0 : var byte
    | forever do a?x0 ; b!x0 od
    end
[Figure: handshake circuit of the Buffer with passive port a and active port b; each circle is mapped to a netlist; Q element; data path]
Summary
- A Tangram program is partitioned into data path and control
- The data path is implemented as dual or single rail
- Control is mapped to a composition of standard handshake elements
- Each standard element is mapped to a circuit
- Post-optimization is done
  - Composing islands of control elements and re-synthesis with STGs can give more aggressive optimization
- Philips made a few chips using Tangram, including a product: an 8051 micro-controller in the low-power pager Muna (25 weeks of battery life from one AAA battery)
- A similar approach is used in Balsa (Manchester Univ., public domain)
Burst mode FSM
- Close to synchronous FSMs with binary-encoded I/O
- Work in bursts:
  - Input transitions fire
  - Output transitions fire
  - State signals change
- Mostly limited to fundamental mode: the next input burst cannot arrive before the outputs have stabilized
[Figure: burst-mode machine with states s1 to s4 and bursts a+b+/y+, b-/x-, a-/x+y-, c+/y-, c-/y+]
Extended Burst mode
- Directed don't cares (b*): some concurrency is allowed for input transitions that do not influence an output burst
- Conditional guards, e.g. <b>: the guarded burst can only be taken if b = 1
[Figure: the burst-mode machine of the previous slide with the guard <b> on the bursts a-/x+y- and c+/y-]
Synthesis of XBM
- Next-state and output functions free of functional and logic hazards
- Sequential feedbacks should not introduce new hazards
- State assignment
  - one state of the BM specification is mapped to one layer of the Karnaugh map
  - compatible layers are merged
  - layers are compatible if merging does not introduce CSC violations or hazards
- Layers are encoded using a race-free encoding
XBM and STG
[Figure: the XBM machine with states s1 to s4 and the equivalent STG over a, b, c, x, y, including an empty (eps) transition]
Summary
- Specification: XBM is a subclass of STGs
- Synthesis: techniques are extensions of synchronous state assignment and logic minimization
- Timing
  - the environment is limited to fundamental mode (difficult for pipelined and highly concurrent systems)
  - internals are delay insensitive
- See reference list for details
Summary
- Specification: Signal Transition Graph (formalized timing diagram)
- Synthesis
  - state encoding
  - Boolean function derivation
  - algebraic and Boolean sequential decomposition
  - technology mapping
- Timing
  - the delay model implies timing constraints
  - exploiting timing assumptions leads to minimization and generates further assumptions
- Future work
  - integrated flow
  - testing