Title: Is the die cast for the token game?
1Is the die cast for the token game?
- Alex Yakovlev, Frank Burns, Alex Bystrov, Delong Shang, Danil Sokolov
- University of Newcastle upon Tyne
- ICATPN02 Adelaide
2Casting dice for old and new token games
3What is this talk about?
- Firstly, about the role of Petri nets in the modern hardware design process (design flow), which is a gamble of its own
- Secondly, about searching for the right way of deriving logic circuits (computational structures) from Petri nets (behavioural specifications)
- However, I won't talk here about the use of Petri nets for circuit verification
4Int. Technology Roadmap for Semiconductors says
- 2010 will bring a system-on-a-chip with 4 billion 50-nanometer transistors, running at 10 GHz
- Moore's law: steady growth at 60% in the number of transistors per chip per year, as the functionality of a chip doubles every 1.5-2 years
- Technology troubles: process parameter variation, power dissipation (IBM S/390 chip operation PICA video), clock distribution etc. present new challenges for Design and Test
- But the biggest threat of all is design cost
5Design productivity gap
From ITRS99
A design team of 1000 working for 3 years on an MPU chip would cost some $1B (25% of time spent on verification, 45% on redesign after first silicon)
6Design costs and time to market
How to reduce them?
New design approaches to facilitate design
component re-use (IP cores), but there is a
problem of timing closure
New CAD methods to minimise costs of
verification, testing and re-design
7Timing problems
[Plot: clock frequency (GHz) against year, from 2000 onwards, with curves for local clock and global clock]
Global clock cannot cope with:
- Fewer gate delays per clock cycle
- Greater clock skew
8Timing problems
[Plot: clock frequency (GHz) against year, from 2000 onwards, with curves for local clock and global clock]
Global clock cannot cope with:
- Fewer gate delays per clock cycle
- Greater clock skew
Clocks have to be localised. The number of Time Zones increases to 1000s and more.
9Self-timed Systems
- Get rid of global clocking and build systems based on handshaking
- Globally asynchronous locally synchronous (GALS)
- Design the whole system in a self-timed way
- Whichever way is followed, new CAD tools for self-timed design are needed
10The Timing Mode Spectrum
Fully delay-insensitive
Speed-independent
Asynchronous (self-timed)
With relative timing and i/o mode
Burst-mode and fundamental mode
Globally asynchronous locally synchronous (GALS)
Multiple clock domains
Clock gating and distribution
Synchronous (globally/locally clocked)
Single clock
11GALS module with stoppable clock
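[Figure: async-to-sync wrapper between the Asynchronous World and a Clocked Domain. The clocked domain contains registers (R), combinational logic (CL) and a Local CLK; the wrapper carries the handshake pairs Req1/Ack1, Req2/Ack2, Req3/Ack3 and Req4/Ack4]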
12GALS an Example
[Figure: Sync Unit 1 (ports In1, Out1, EnIn1, EnOut1, clock clk1) and Sync Unit 2 (ports In2, Out2, EnIn2, EnOut2, clock clk2), each with its own clock generator (signals RCIn, ACIn, RCOut, ACOut, suffixed 1 or 2), linked by an Async Interface with handshake signals R1, A1, R2, A2]
13GALS Petri net model
[Figure: Petri net model of the GALS example. Labels in the figure: clk1, clk1-, Clk10, MutexIn1, MutexOut1, RCIn1, ACIn1, RCOut1, ACOut1, R1, A1, R2, A2, RCOut2, ACOut2, ACIn2, and dummy transitions t]
14Main talk outline
- Motivation: design flow problems
- Backend language: Petri nets?
- New design flow: two-level control
- Direct mapping of PNs: event-based and level-based
- Direct mapping of STGs
- Case studies
- Conclusion
15Motivation
- Complex self-timed controllers still cannot be designed fully automatically and provably correct (cf. work at Philips, Theseus Logic, Fulcrum, Self-Timed Solutions)
- It is important to interface to HL hardware description languages, e.g. VHDL, Verilog (standard for digital design) and/or Tangram, Balsa (CSP-based)
- Success (90s) of behavioural synthesis for sync design
- Parts of architectural synthesis (CDFG extraction, scheduling and allocation) are similar to sync. design
- Synthesis of RTL control/sequencer and its implementation should be completely new for asynchronous circuits
- Need for a good intermediate (back-end) language
16Motivation (cont'd)
- Existing logic synthesis tools (cf. Petrify and Minimalist) can only cope with small-scale low-level designs (state-space explosion, limited optimisation heuristics)
- Logic synthesis produces circuits whose structure does not correspond to the structure of their behaviour (bad for analysis and testing)
- Syntax-directed translation techniques may be a way forward, but applied at what level?
17Motivation for use of Petri nets
- Implications for new research targets on:
  - Translation between HDLs and Petri nets, particularly formal underpinning of semantic links between front-end and back-end formats
  - New composition and decomposition techniques (incl. various forms of refinement and transformation) applied to labelled PNs
  - New circuit mapping and optimisation techniques for different types of models (under various delay-dependence or relative time assumptions and different signalling schemes)
  - Combination of direct mapping with logic synthesis (e.g. circuits with peep-hole optimisation)
18Main talk outline
- Motivation: design flow problems
- Backend language: Petri nets?
- New design flow: two-level control
- Direct mapping of PNs: event-based and level-based
- Direct mapping of STGs
- Case studies
- Conclusion
19Intermediate language
- What is the most adequate formal language for the intermediate (still behavioural) level?
- You don't need one at all: directly map syntax into circuit structure (Design flow 1)
- Petri nets, at the level of Signal Transition Graph (STG), and then use logic synthesis (Design flow 2)
20Design Flow 1 (e.g. Tangram or Balsa (currently))
HDL
Syntax-directed compilation
Handshake circuit netlist
Direct mapping with Burst Mode FSM peephole
optimisation
QDI circuit netlist
21HDL syntax directed mapping
do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od
Control flow is transferred between HDL syntax
constructs rather than between operations
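A minimal sketch of what syntax-directed mapping means for the fragment above, assuming a hypothetical AST with node types `Loop`, `If`, `Par`, `Seq` and `Op` (none of these names come from Tangram or Balsa). The compiler emits exactly one control component per syntax node, so the netlist grows linearly with the program and activation passes from each construct to its children rather than from operation to operation.

```python
from dataclasses import dataclass

# Hypothetical AST node types for the slide's pseudocode (illustrative only).
@dataclass
class Op:
    name: str

@dataclass
class Seq:
    body: list

@dataclass
class Par:
    body: list

@dataclass
class If:
    cond: str
    then: object
    orelse: object

@dataclass
class Loop:
    body: object

def compile_node(node, netlist, parent="env"):
    """Emit one control component per syntax node and return its name."""
    kind = type(node).__name__.lower()
    name = f"{kind}{len(netlist)}"
    # Each entry records: component name, component type, activating parent.
    netlist.append((name, kind, parent))
    if isinstance(node, (Seq, Par)):
        children = node.body
    elif isinstance(node, If):
        children = [node.then, node.orelse]
    elif isinstance(node, Loop):
        children = [node.body]
    else:
        children = []
    for child in children:
        compile_node(child, netlist, name)
    return name

# The do/if/par/seq fragment from the slide.
program = Loop(If("X=A",
                  Par([Op("OP1"), Op("OP2")]),
                  Seq([Op("OP3"), Op("OP4")])))
netlist = []
compile_node(program, netlist)
for component in netlist:
    print(component)
```

The resulting component tree mirrors the parse tree, which is the property the talk later lists as a drawback of this flow.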
22Pros and cons of Flow 1
- Pros
  - Simple linear-size translation, guarantees high productivity
  - Allows local optimisation and re-synthesis of parts
  - Testing can be programmed at high level
- Cons
  - Lack of global optimisation
  - Circuit structure follows the parsing tree of the specification, which leads to low performance
23Design Flow 2 (STG logic synthesis)
STG specification
Analysis and optimisation (consistency, CSC, relative timing)
Extras (e.g. refining to FC subclass for structural methods)
Synthesisable STG
Logic synthesis (via full state space or structural methods)
QDI circuit netlist
24Logic synthesis (STGs Petrify)
State graph
STG spec
States with state coding problem
Total no. of states is 24 but only 16 binary codes
25Logic synthesis (STGs Petrify)
EQN file for model decoupled-latch
Estimated area: 16.00
Rout Aout' Rout csc2 Ain csc0 csc0 Aout' csc1' csc2' Rin csc0 csc1 csc1 (csc0 Rin') Rout csc2 csc1' (csc2 csc0)
Set/reset pins: reset(Rout) set(csc1)
Output from Petrify
csc0, csc1, csc2: state encoding signals
26Logic synthesis (STGs Petrify)
Resulting state graph (with csc signals) has 59 states and no coding conflicts (coding space is 2^7 = 128)
27Logic synthesis (STGs Petrify)
What if the system gets bigger?
28Logic synthesis (STGs Petrify)
EQN file for model decoupled-latch.1-2
Estimated area: 34.00
R1out A1out' R1out csc2 csc3' R2out R2out A2out' csc4 Ain csc0 csc0 A1out' A2out' csc1 csc2 csc3 csc4' csc0 (csc1 csc4' csc3 Rin) csc1 R2out' (csc0' Rin csc1) csc2 R1out' csc2 csc3 csc3 csc0' (R1out' Rin csc2' csc3) csc4 csc1 (csc4 csc0)
Set/reset pins: reset(R1out) reset(R2out) reset(csc1) reset(csc2) reset(csc3)
Logic is asymmetric
Delay grows out of proportion
29Pros and cons of Flow 2
- Pros
  - Guarantees global optimality
  - Allows HDL to STG translation for a more pragmatic front-end (e.g. Blunno and Lavagno's Verilog to STG translation) and allows model-checking together with synthesis (so makes design provably correct)
- Cons
  - State space size is a problem
  - Solving state-coding in a good way is a problem
30Main talk outline
- Motivation: design flow problems
- Backend language: Petri nets?
- New design flow: two-level control
- Direct mapping of PNs: event-based and level-based
- Direct mapping of STGs
- Case studies
- Conclusion
31Towards new design flow
- How to combine advantages of both approaches?
- Use them at different levels
- Introduce an intermediate behavioural level: labelled Petri nets (LPNs)
- Perform semantic (based on execution order) translation of HDLs to LPNs
- Use direct mapping for large LPNs
- Decompose control and use STGs and logic synthesis at the low level (apply structural methods, e.g. Pastor et al.)
32New Design Flow
HDL: standard (VHDL, Verilog) or async-design-specific (Balsa)
CDFG and LPN compilation (semantic)
LPNs
Verification (coherence etc.)
Optimisation (scheduling, dummies, fanin/fanout)
Synthesisable LPN
Direct mapping
DC netlist
33New Design Flow possible sources of useful
translation techniques
- HDL to PN translation
- VHDL to Extended Timed PNs (Linkoping)
- VHDL to Control Data Flow Graphs (Lyngby)
- Verilog to PN/STGs (Torino)
- B(PN2) to M-net translation (PEP tool)
- But none of them caters for the good PN structure needed for direct mapping from PNs to circuits (they are mostly meant to work via state space exploration, esp. in model-checking)
34Design flow
HDL specification
Control/data splitting
Hierarchical control spec
Datapath spec
STG
LPN
STG to circuit synthesis (Petrify + direct mapping)
LPN to circuit synthesis (direct mapping)
Data logic synthesis
Data logic
Hierarchical control logic
Our present focus
Control/data interfacing
HDL implementation
35HDL syntax directed mapping
do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od
Control flow is transferred between HDL syntax
constructs rather than between operations
36HDL-to-LPN (high-level control)
do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od
High level control Labelled Petri net (LPN)
37Labelled PNs and Datapath
- An LPN is defined as (PN, OP, L): an underlying PN = (P, T, F, M0), an operation alphabet OP and a labelling function L: T -> OP
- Operations in OP (typically assignments, comparisons, calls to macros such as arbiters) are defined as signatures on the elements of the datapath (e.g. lists of input/output registers R and operation units U involved in the operation), e.g. op(i) = <R, U>
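The definition can be written down as a small data structure. The sketch below is illustrative only: the class names `Operation` and `LPN`, the field names and the example net are assumptions, not taken from the authors' tools. It stores (P, T, F, M0), the alphabet OP with each operation as a signature <R, U>, and the labelling L: T -> OP, and checks that the labelling is total and the flow relation bipartite.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

@dataclass(frozen=True)
class Operation:
    """Signature <R, U>: registers and operation units touched by the operation."""
    registers: FrozenSet[str]
    units: FrozenSet[str]

@dataclass
class LPN:
    places: Set[str]                 # P
    transitions: Set[str]            # T
    flow: Set[Tuple[str, str]]       # F, arcs place->transition or transition->place
    marking: Dict[str, int]          # M0
    alphabet: Dict[str, Operation]   # OP, keyed by operation name
    label: Dict[str, str]            # L: T -> OP (operation names)

    def is_well_formed(self) -> bool:
        bipartite = all(
            (a in self.places and b in self.transitions)
            or (a in self.transitions and b in self.places)
            for a, b in self.flow
        )
        total = set(self.label) == self.transitions
        in_alphabet = all(op in self.alphabet for op in self.label.values())
        return not (self.places & self.transitions) and bipartite and total and in_alphabet

# Hypothetical two-transition cycle: one comparison and one dummy.
lpn = LPN(
    places={"p0", "p1"},
    transitions={"t0", "t1"},
    flow={("p0", "t0"), ("t0", "p1"), ("p1", "t1"), ("t1", "p0")},
    marking={"p0": 1, "p1": 0},
    alphabet={
        "cmp": Operation(frozenset({"X", "A"}), frozenset({"comparator"})),
        "dum": Operation(frozenset(), frozenset()),
    },
    label={"t0": "cmp", "t1": "dum"},
)
print(lpn.is_well_formed())
```

Modelling a dummy as an operation with an empty signature is one possible choice; a separate "silent" label would work equally well.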
38Labelled PNs and Datapath
- Operations in OP are associated with (req, ack) handshakes (two-way for assignments, multi-way for comparisons and arbitration), hence the opr(i) and opa(i) signals
- The interface with the actual req and ack signals associated with registers in R and op-units in U is either synthesized via Petrify (low-level control) or hardwired using MUXes and DEMUXes
39Low-level control
Low level control Signal Transition Graphs (STG)
40 Direct mapping of LPN to David cells
[Figure: high-level control logic directly mapped from the LPN. David cells DC1 to DC5 control operations OP1 to OP4, with branches on (X=A) and (X<>A) and dummy (dum) transitions; an inset shows the basic David cell (DC)]
41Direct mapping cell library
42Main talk outline
- Motivation: design flow problems
- Backend language: Petri nets?
- New design flow: two-level control
- Direct mapping of PNs: event-based and level-based
- Direct mapping of STGs
- Case studies
- Conclusion
43Direct mapping vs logic synthesis conceptual
difference
- Logic synthesis uses a Petri net (STG) as a generator of an encoded state-space. The circuit structure is not directly related to the net structure (though some correspondence exists and is exploited in structural logic synthesis methods, Pastor et al.)
- Direct mapping considers a PN literally, as a prototype of the circuit structure (cf. Varshavsky's use of the term "modelling circuit")
44Direct mapping vs logic synthesis
- Direct mapping has linear computational complexity but can be area inefficient (inherent one-hot encoding)
- Logic synthesis has problems with state space explosion, and with recognition of repetitive and regular structures (log-based encoding approach)
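The area side of this trade-off can be illustrated with simple counting. The sketch below assumes one state-holding cell per place for direct mapping (one-hot) and uses the information-theoretic lower bound on binary state signals for a log-based encoding; the function names are hypothetical, and real logic synthesis may need more signals than the bound (for example, extra csc signals).

```python
def one_hot_cells(num_places: int) -> int:
    """Direct mapping: one state-holding cell per place."""
    return num_places

def min_binary_signals(num_states: int) -> int:
    """Lower bound on signals needed to give every state a distinct binary code."""
    if num_states < 1:
        raise ValueError("need at least one state")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (num_states - 1).bit_length()

# A ring of k places with a single token has k places and k reachable markings.
for k in (4, 16, 64):
    print(k, one_hot_cells(k), min_binary_signals(k))

# State counts quoted earlier in the talk for the decoupled-latch example.
print(min_binary_signals(24), min_binary_signals(59))
```

For the sequential ring the one-hot cost grows linearly while the bound grows logarithmically; with concurrency the number of reachable states can grow much faster than the number of places, which is where state space explosion hits logic synthesis.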
45Direct Translation of Petri Nets
- Previous work dates back to the 70s
- Synthesis into event-based (two-phase) circuits
  - S. Patil, F. Furtek (MIT)
- Synthesis into level-based (four-phase) circuits
  - R. David (69, translation of FSM graphs to CUSA cells)
  - L. Hollaar (82, translation from parallel flowcharts)
  - V. Varshavsky et al. (90, 96, translation from PN into an interconnection of David Cells)
- See various examples of synthesis in both styles in Yakovlev & Koelmans (Petri net lectures, LNCS, 1998)
46Patil's set of modules
Petri net fragment          Circuit equivalent
place                       wire
marked place                inverter
join                        C-element (C)
merge                       XOR
fork                        fan-out
shared (conflict) place     effectively an RGD arbiter
switch                      switch element (S)
47Example
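[Figure: one-place buffer Buf(1) with a passive handshake on one side and an active handshake on the other; ports P(ut) with signals pr/pa and G(et) with signals gr/ga. Two-phase implementation using Patil's elements (including a C-element), with waveforms of pr, pa, gr, ga under the two-phase (NRZ) protocol]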
48Example
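[Figure: one-place buffer Buf(1) with a passive handshake on one side and an active handshake on the other; ports P(ut) with signals pr/pa and G(et) with signals gr/ga. Two-phase implementation using Patil's elements (including a C-element), with waveforms of pr, pa, gr, ga under the two-phase (NRZ) protocol]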
49Example
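[Figure: one-place buffer Buf(1) with a passive handshake on one side and an active handshake on the other; ports P(ut) with signals pr/pa and G(et) with signals gr/ga. Two-phase implementation using Patil's elements, with waveforms of pr, pa, gr, ga under the two-phase (NRZ) protocol]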
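For readers unfamiliar with the two signalling styles, here is a minimal sketch contrasting the two-phase (NRZ) protocol used in this example with the four-phase (return-to-zero) protocol used by the level-based circuits later in the talk. It uses a generic req/ack pair rather than the slide's pr/pa and gr/ga signals, and the function names are illustrative assumptions.

```python
def two_phase(transfers):
    """NRZ signalling: every transition on req or ack is an event."""
    req = ack = 0
    trace = []
    for _ in range(transfers):
        req ^= 1                      # request: toggle req
        trace.append(("req", req))
        ack ^= 1                      # acknowledge: toggle ack
        trace.append(("ack", ack))
    return trace

def four_phase(transfers):
    """Return-to-zero signalling: req+, ack+, req-, ack- for every transfer."""
    trace = []
    for _ in range(transfers):
        trace += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return trace

print(two_phase(2))
print(len(two_phase(3)), len(four_phase(3)))
```

Two-phase signalling needs two signal transitions per transfer against four for four-phase, at the price of logic that must react to both edges.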
50Other useful elements
Select
Call
Toggle
51Direct synthesis example (modulo-k Up-Down counter)
Mod-k counter LPN
Environment LPN
52Direct synthesis example (modulo-k Up-Down counter)
Decomposition (structural view)
53Direct synthesis example (modulo-k Up-Down counter)
structure
LPN
54Direct synthesis example (modulo-k Up-Down counter)
structure
LPN
55Direct synthesis example (modulo-k Up-Down counter)
56Synthesis into level-based circuits
- David's method for asynchronous Finite State Machines
- Hollaar's extensions to parallel flow charts
- Varshavsky's method for 1-safe persistent Petri nets, based on associating places with latches: the method works for closed (autonomous) circuits with no input choice or arbitration, and inputs can only be part of handshakes activated by control logic
57David's original approach
[Figure: fragment of a State Machine flow graph (states a, b, c, d with input conditions on x1 and x2) and the CUSA element for storing state b, with signals ya, yb, yc]
58Hollaar's approach
[Figure: fragment of a flow-chart (allows parallelism) and the corresponding one-hot circuit cell; labels K, L, M, N, A, B, annotated with the signal values of the initial animation step]
59Hollaar's approach
[Figure: fragment of flow-chart and one-hot circuit cell; labels K, L, M, N, A, B, annotated with the signal values of the second animation step]
60Hollaar's approach
[Figure: fragment of flow-chart and one-hot circuit cell; labels K, L, M, N, A, B, annotated with the signal values of the third animation step]
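61Varshavsky's Approach
[Figure: places p1 and p2 of a Petri net fragment with a controlled operation between them, and the corresponding latches p1 and p2 with an output "To Operation"; initial signal values shown]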
62Varshavsky's Approach
[Figure: the same latches p1 and p2, annotated with the signal changes 0->1 and 1->0 as the token starts to move; output "To Operation"]
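63Varshavsky's Approach
[Figure: the same latches p1 and p2, annotated with the signal changes 1->0 and 0->1 that complete the token move, including a 1->0->1 pulse; output "To Operation"]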
64Varshavsky's Approach
- This method associates places with latches (flip-flops), so the state memory (marking) of the PN is directly mimicked in the circuit's state memory
- Transitions are associated with controlled actions (e.g. activations of data path units or lower level control blocks by using handshake protocols)
- Modelling discrepancy (be careful!):
  - in Petri nets, removal of a token from pre-places and adding tokens to post-places is instantaneous (i.e. no intermediate states)
  - in circuits, the move of a token has a duration and there is an intermediate state
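The discrepancy can be made concrete with a toy model. The sketch below assumes, for illustration, that the circuit sets the latches of the post-places before resetting those of the pre-places; that ordering and both function names are assumptions, not a description of a particular David cell implementation.

```python
def fire_atomic(marking, pre, post):
    """Petri net semantics: tokens are consumed and produced in one step."""
    assert all(marking[p] == 1 for p in pre), "transition not enabled"
    after = dict(marking)
    for p in pre:
        after[p] = 0
    for p in post:
        after[p] = 1
    return [after]                    # a single resulting state

def fire_circuit(marking, pre, post):
    """Circuit view: set successor latches first, then reset predecessors."""
    assert all(marking[p] == 1 for p in pre), "transition not enabled"
    middle = dict(marking)
    for p in post:
        middle[p] = 1                 # intermediate state: both sides marked
    after = dict(middle)
    for p in pre:
        after[p] = 0
    return [middle, after]            # two observable states

m0 = {"p1": 1, "p2": 0}
print(fire_atomic(m0, pre=["p1"], post=["p2"]))
print(fire_circuit(m0, pre=["p1"], post=["p2"]))
```

The intermediate state with both p1 and p2 marked has no counterpart in the net's reachable markings, so any logic that observes the latches must tolerate it.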
65Direct mapping of LPNs and STGs
66Fast David cell
Fast DC Timing assumptions GasP section
The same with negative gates
67Implementability condition for LPNs
- Autonomous control interpretation: each transition is associated with a handshake to the controlled part (datapath) or is a dummy
- Implementability: any 1-safe labelled PN with autonomous control semantics of transitions and with no loops of fewer than three transitions can be directly mapped into a speed-independent control circuit whose behaviour is equivalent (bisimilar) to the PN
- Consistency of labelling: transitions labelled by reference to the same datapath blocks must be consistent with the local semantics of those blocks (e.g. must not be mutually concurrent)
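The structural part of the implementability condition (no loops of fewer than three transitions) is easy to check on the flow relation. The sketch below is a hedged illustration with hypothetical function names; it reports loops through one or two transitions and does not check 1-safeness or labelling consistency.

```python
def transition_successors(flow, transitions):
    """Map each transition to the transitions reachable through one place."""
    places_after = {t: set() for t in transitions}
    transitions_after = {}
    for src, dst in flow:
        if src in transitions:
            places_after[src].add(dst)                        # arc t -> p
        else:
            transitions_after.setdefault(src, set()).add(dst)  # arc p -> t
    return {
        t: {u for p in places_after[t] for u in transitions_after.get(p, ())}
        for t in transitions
    }

def short_loops(flow, transitions):
    """Return loops that pass through only one or two transitions."""
    succ = transition_successors(flow, transitions)
    loops = []
    for t in sorted(transitions):
        if t in succ[t]:
            loops.append((t,))                   # self-loop t -> p -> t
        for u in sorted(succ[t]):
            if t < u and t in succ[u]:
                loops.append((t, u))             # t -> p -> u -> q -> t
    return loops

ring2 = {("p0", "t0"), ("t0", "p1"), ("p1", "t1"), ("t1", "p0")}
ring3 = ring2 - {("t1", "p0")} | {("t1", "p2"), ("p2", "t2"), ("t2", "p0")}
print(short_loops(ring2, {"t0", "t1"}))
print(short_loops(ring3, {"t0", "t1", "t2"}))
```

A net that fails this check can usually be repaired by inserting dummy transitions, which is what "dummies inserted for direct DC mapping" refers to in the case study.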
68Main talk outline
- Motivation: design flow problems
- Backend language: Petri nets?
- New design flow: two-level control
- Direct mapping of PNs: event-based and level-based
- Direct mapping of STGs
- Case studies
- Conclusion
69Direct mapping of STGs
STG specification
Mapped circuit
Rout
Aout
Here all signal transitions are associated with
handshakes and handshake compression must be done
before mapping
70What about direct mapping of arbitrary STGs?
[Figure: STG fragment with input transitions on inp1 and inp2 and output transitions on out1 and out2]
- Associate with each output transition a latch (one per signal x), with each input some sampling logic, and a set (for x+) or reset (for x-) handshake: pull for inputs, and push for outputs.
[Figure: output latch for out1, set and reset by push handshakes that follow the inp1 transitions]
71What about direct mapping of arbitrary STGs?
[Figure: the same STG fragment mapped with an input sample block for inp2 (pull handshake), mux/demux logic, and an output latch for out1 (push handshake)]
Problem: long delay between input event and output response
72What about direct mapping of arbitrary STGs?
- Another problem for direct mapping: STGs may contain self-loops (or read arcs) for testing level-oriented inputs and outputs
[Figure: STG fragment with labels x, x-, x1 and y illustrating a read arc]
73Low latency approach
- Can we connect inputs directly to the control
structure to minimise the i/o latency?
out1
inp2
74The problem of mapping STGs
- Given: a 1-safe STG
- Target: netlist of David cells, input wires and output flip-flops
- Procedure: use direct mapping of elements of the underlying PN into elements of the netlist
- Problem: need for an intermediate form of STG, where I/O is connected to control by read arcs only
75Device environment interface
76Device environment interface
Input wire
Output latch
tracker
To derive circuit implementation we only use
tracker and i/o subnets
77Direct mapping
78Optimisation
Removing places from the tracker. Latency
reduction effect if the place between an input
and the following output is removed. Coding
conflicts are possible. Places perform state
separation.
Tracker
Tracker
79Optimisation coding conflicts
Input signal a changes twice between p1 and
p5. Keeping p3 solves the conflict and preserves
low latency.
80Irreducible input coding conflicts
- Certain input labellings cannot be implemented in a speed-independent way without timing assumptions (e.g. input changes are slower than David cell operation) or without changing the I/O interface (introducing new output responses to the environment)
[Figure: tracker fragment with two inp0 transitions and inseparable states (for the tracker)]
81Implementability of STGs
- Sufficient condition:
  - an STG with a 1-safe underlying PN, with consistent signal transition labelling (transitions of the same signal are in precedence and +/- alternate) and with monotonic input bursts (for each connected input-labelled subgraph each signal changes only once)
- A necessary and sufficient (NS) condition is an open problem!
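Both labelling conditions can be illustrated on a single firing sequence. The sketch below is a simplification under stated assumptions: events are strings such as "a+" and "a-", the helper names are hypothetical, and only one trace is examined, whereas the actual condition must hold for every firing sequence of the STG.

```python
def is_consistent(trace, initial):
    """Check that + and - transitions of each signal alternate along one trace."""
    value = dict(initial)
    for event in trace:
        signal, sign = event[:-1], event[-1]
        target = 1 if sign == "+" else 0
        if value[signal] == target:
            return False              # e.g. a+ fired while a is already 1
        value[signal] = target
    return True

def is_monotonic_burst(burst):
    """Check that each signal changes at most once within one input burst."""
    signals = [event[:-1] for event in burst]
    return len(signals) == len(set(signals))

print(is_consistent(["a+", "b+", "a-", "b-"], {"a": 0, "b": 0}))
print(is_consistent(["a+", "a+"], {"a": 0}))
print(is_monotonic_burst(["a+", "b+"]))
print(is_monotonic_burst(["a+", "a-"]))
```

The last call mirrors the coding-conflict example earlier in the talk, where an input changing twice within a burst leaves the tracker unable to tell the states apart.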
82Main talk outline
- Motivation: design flow problems
- Backend language: Petri nets?
- New design flow: two-level control
- Direct mapping of PNs: event-based and level-based
- Direct mapping of STGs
- Case studies
- Conclusion
83Communication channel example
- A duplex delay-insensitive channel for low power and pin-efficiency, proposed by Steve Furber (AINT2002)
- Relatively simple data path (with handshake access via push and pull protocols)
- Sophisticated control (involves arbitration, choice and concurrency)
- Natural two-level control decomposition
- Requires low latency (existing STG and BM solutions produce too heavy logic)
84Channel Structure
[Figure: Master and Slave connected by a channel labelled with N-of-M codes]
N-of-M codes: dual-rail, 3-of-6, 2-of-7
Key protocol symbols (e.g. in dual rail): Start (01), Ack (10), Slave-Ack (11), Data (01 or 10)
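As a reminder of what an N-of-M code is, the sketch below enumerates the codewords that have exactly N of M wires high; the function name is an illustrative assumption. Dual-rail is the 1-of-2 case, while 3-of-6 and 2-of-7 provide C(6,3) and C(7,2) codewords respectively.

```python
from itertools import combinations
from math import comb

def n_of_m_codewords(n, m):
    """List all m-bit words with exactly n ones, as strings."""
    words = []
    for ones in combinations(range(m), n):
        words.append("".join("1" if i in ones else "0" for i in range(m)))
    return words

print(n_of_m_codewords(1, 2))                      # dual-rail data codewords
print(len(n_of_m_codewords(3, 6)), comb(6, 3))
print(len(n_of_m_codewords(2, 7)), comb(7, 2))
```

Note that the dual-rail symbol list above also uses 11 (Slave-Ack), which lies outside the 1-of-2 data codewords.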
85Protocol Specification
Protocol Automaton
Master
Slave
The protocol can be defined on an imaginary
Protocol Automaton receiving symbols from both
sides (it will hide all activity internal to
Master and Slave)
86Protocol Specification
Protocol Automaton
Master
Slave
87Controller Overview
[Figure: high-level control connected to the data path and low-level control through push and pull handshakes]
88 Low-level logic
Tx controller
Sending interface
89LPN model for high level control (master)
[Figure: LPN model with calls to local arbiters, a Slave-Ack pull, pulls, pushes, three-way pushes and three-way pulls; dummies inserted for direct DC mapping]
90High level control (master) mapped directly from
LPN
[Figure: David cell netlist with push and pull handshakes, dummies, arbiter1 and arbiter2]
91Towards synthesis for higher performance
push
dummy
pull
pull
Is the dummy in the right place? It is on the cycle of (output) push and (input) pull: pull -> dummy -> push -> pull -> dummy -> push -> ...
92Towards synthesis for higher performance
Critical path
push
Non-critical path
dummy
Synthesis rule: don't insert dummies on critical paths
pull
93Synthesis for lower I/O latency LPN level
[Figure: high-level control (internal actions) connected by pull, push and pull handshakes to pull logic and push logic, which face the inputs and output of the environment (channel); a low latency shortcut is marked]
94Channel Cycle Time
Controller implementation     Simplex mode   Duplex mode
Direct mapping from LPN       7.6 ns         8.3 ns
Logic synthesis from STG      12.7 ns        16.5 ns
- These results were obtained for 0.6 micron CMOS
- Further improvement can be achieved by more use of low latency techniques (at the gate level) and by introducing aggressive relative timing in David cells and low level logic
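The cycle times above translate into speed-up ratios as follows; this is plain arithmetic on the reported figures, with a hypothetical helper name.

```python
# Cycle times in nanoseconds, as reported on the slide.
cycle_ns = {
    "direct mapping from LPN": {"simplex": 7.6, "duplex": 8.3},
    "logic synthesis from STG": {"simplex": 12.7, "duplex": 16.5},
}

def speedup(mode):
    """Ratio of logic-synthesis cycle time to direct-mapping cycle time."""
    slow = cycle_ns["logic synthesis from STG"][mode]
    fast = cycle_ns["direct mapping from LPN"][mode]
    return slow / fast

for mode in ("simplex", "duplex"):
    print(mode, round(speedup(mode), 2))
```

So the directly mapped controller cycles roughly 1.7 times faster in simplex mode and about twice as fast in duplex mode.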
95Case study VME bus controller
96Case study VME bus controller
97Case study VME bus controller
98Case study VME bus controller
- Circuit generated by logic synthesis (Petrify)
- Smaller, though comparable in size
- Transistor stacks are larger
99Case study VME bus controller
Latency comparison between our method and the Petrify solution.
Transition              Petrify   Fast DC
ldtack -> d             0.35ns    0.29ns
ldtack -> d-            0.20ns    0.16ns
d -> dtack              0.27ns    0.27ns
dsw- -> dtack-          0.42ns    0.44ns
ldtack- -> lds (rd)     0.38ns    0.21ns
ldtack -> lds (wr)      0.38ns    0.29ns
dsw- -> lds-            0.33ns    0.26ns
Number of transistors   32        56
100Conclusion
- Hierarchical (e.g. protocol) controller synthesis can go via back-end LPN/STG models
- Direct mapping from LPNs/STGs yields fast circuits that are easy to analyse and test
- Translation from PNs to David cell netlists is implemented in the tool pn2dc
- Translation from VHDL specs to LPNs and STGs is implemented in the tools fsm2lpn and fsm2stg
- Further work is needed on:
  - Formal link between HDLs and PNs (semantics and equivalence), leading to better synthesis of PNs from HDLs
  - Optimisation techniques at LPN/STG and circuit levels
- See our papers in Async02 and ISCAS02
101Open problems
- Formally characterise properties of PNs that make them good for circuit design, like optimality wrt I/O response time, worst/average case cycle time, positions of silent (dummy) events
- Separate control (place/transition nets) and datapath versus use of high-level nets for both
- Testing via Petri net specification (faults in PNs: stuck tokens, transitions, ...)