Is the die cast for the token game?

About This Presentation


Slides: 102

Provided by: alexy156


Transcript and Presenter's Notes

1
Is the die cast for the token game?

Alex Yakovlev, Frank Burns, Alex Bystrov, Delong Shang, Danil Sokolov
University of Newcastle upon Tyne
ICATPN'02, Adelaide

2
Casting dice for old and new token games
3
What is this talk about?

Firstly, about the role of Petri nets in modern
hardware design process (design flow), which is a
gamble of its own
Secondly, about searching for the right way of
deriving logic circuits (computational
structures) from Petri nets (behavioural
specifications)
However, I won't talk here about the use of Petri
nets for circuit verification

4
Int. Technology Roadmap for Semiconductors says:

2010 will bring a system-on-a-chip with
4 billion 50-nanometer transistors, running at 10 GHz
Moore's law: steady growth of 60% per year in the number
of transistors per chip, as the
functionality of a chip doubles every 1.5-2
years
Technology troubles (process parameter variation,
power dissipation (IBM S/390 chip operation PICA
video), clock distribution, etc.) present new
challenges for design and test
But the biggest threat of all is design cost

5
Design productivity gap
From ITRS99
A design team of 1000 working for 3 years on an
MPU chip would cost some $1B (25% of the time spent on
verification, 45% on redesign after first
silicon)
6
Design costs and time to market
How to reduce them?
New design approaches to facilitate design
component re-use (IP cores), but there is a
problem of timing closure
New CAD methods to minimise costs of
verification, testing and re-design
7
Timing problems
[Figure: clock frequency (GHz) versus year (around 2000) for a global clock and a local clock]
The global clock cannot cope with fewer gate delays per clock cycle and greater clock skew
8
Timing problems
[Figure: the same plot of clock frequency (GHz) versus year, global clock versus local clock]
The global clock cannot cope with fewer gate delays per clock cycle and greater clock skew
Clocks have to be localised: the number of time zones increases to 1000s and more
9
Self-timed Systems

Get rid of global clocking and build systems
based on handshaking
Globally asynchronous locally synchronous (GALS)
Design the whole system in a self-timed way
Whichever way is followed, new CAD tools for
self-timed design are needed

10
The Timing Mode Spectrum
Fully delay-insensitive
Speed-independent
Asynchronous (self-timed)
With relative timing and i/o mode
Burst-mode and fundamental mode
Globally asynchronous locally synchronous (GALS)
Multiple clock domains
Clock gating and distribution
Synchronous (globally/locally clocked)
Single clock
11
GALS module with stoppable clock
[Figure: a clocked domain with a local CLK, connected to the asynchronous world through an async-to-sync wrapper by handshakes Req1/Ack1 to Req4/Ack4]
12
GALS: an Example
[Figure: Sync Unit 1 (In1, Out1, EnIn1, EnOut1, clk1) and Sync Unit 2 (In2, Out2, EnIn2, EnOut2, clk2), each with its own clock generator (signals RCIn, ACIn, RCOut, ACOut), communicating through an async interface with handshakes R1/A1 and R2/A2]
13
GALS Petri net model
[Figure: Petri net model of the GALS example, with places MutexIn1 and MutexOut1, transitions on clk1, and handshake transitions RCIn1, ACIn1, RCOut1, ACOut1, R1, A1, R2, A2, RCOut2, ACOut2, ACIn2]
14
Main talk outline

Motivation: design flow problems
Backend language: Petri nets?
New design flow: two-level control
Direct mapping of PNs: event-based and
level-based
Direct mapping of STGs
Case studies
Conclusion

15
Motivation

Complex self-timed controllers still cannot be
designed fully automatically and provably correct
(cf. work at Philips, Theseus Logic, Fulcrum,
Self-Timed Solutions)
It is important to interface to high-level hardware
description languages, e.g. VHDL, Verilog
(standard for digital design) and/or Tangram,
Balsa (CSP-based)
Success (90s) of behavioural synthesis for sync
design
Parts of architectural synthesis (CDFG
extraction, scheduling and allocation) are
similar to sync. design
Synthesis of RTL control/sequencer and its
implementation should be completely new for
asynchronous circuits
Need for a good intermediate (back-end) language

16
Motivation (cont'd)

Existing logic synthesis tools (cf. Petrify and
Minimalist) can only cope with small-scale low
level designs (state-space explosion, limited
optimisation heuristics)
Logic synthesis produces circuits whose structure
does not correspond to their behavioural structure
(bad for analysis and testing)
Syntax-directed translation techniques may be a way
forward, but applied at what level?

17
Motivation for use of Petri nets

Implications for new research targets:
Translation between HDLs and Petri nets,
particularly formal underpinning of semantical
links between front-end and back-end formats
New composition and decomposition techniques
(incl. various forms of refinement and
transformation) applied to labelled PNs
New circuit mapping and optimisation techniques
for different types of models (under various
delay-dependence or relative time assumptions and
different signalling schemes)
Combination of direct mapping with logic
synthesis (e.g. circuits with peep-hole
optimisation)

18
Main talk outline

Motivation: design flow problems
Backend language: Petri nets?
New design flow: two-level control
Direct mapping of PNs: event-based and
level-based
Direct mapping of STGs
Case studies
Conclusion

19
Intermediate language

What is the most adequate formal language for the
intermediate (still behavioural) level?
You don't need one at all: directly map syntax
into circuit structure (Design flow 1)
Petri nets, at the level of Signal Transition
Graph (STG), and then use logic synthesis (Design
flow 2)

20
Design Flow 1 (e.g. Tangram or Balsa (currently))
HDL
Syntax-directed compilation
Handshake circuit netlist
Direct mapping with Burst Mode FSM peephole
optimisation
QDI circuit netlist
21
HDL syntax directed mapping

do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od

Control flow is transferred between HDL syntax
constructs rather than between operations
22
Pros and cons of Flow 1

Pros
Simple linear-size translation, guarantees high
productivity
Allows local optimisation and re-synthesis of
parts
Testing can be programmed at high-level
Cons
Lack of global optimisation
Circuit structure follows the parse tree of the
specification, which leads to low performance

23
Design Flow 2 (STG logic synthesis)
STG specification
Analysis and optimisation (consistency, CSC,
relative timing) Extras (e.g. refining to FC
subclass for structural methods)
Synthesisable STG
Logic synthesis (via full State Space or
structural methods)
QDI circuit netlist
24
Logic synthesis (STGs Petrify)
[Figure: STG specification and its state graph, marking the states that have a state coding problem]
Total number of states is 24, but there are only 16 binary codes
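The state coding problem can be illustrated with a small check. A counting argument already shows that 24 states over 4 signals must share codes (unique state coding is impossible); whether a shared code is a real complete state coding (CSC) conflict depends on whether the two states enable different output transitions. The sketch below assumes a state graph given as a dictionary; the signals `a`, `b` and the five states are hypothetical and are not the decoupled-latch example.

```python
from collections import defaultdict


def csc_conflicts(states):
    """Return pairs of states that share a binary code but enable
    different output transitions (a CSC conflict).

    `states` maps a state name to (code, enabled_outputs): code is a tuple
    of 0/1 signal values, enabled_outputs a frozenset of labels like "b+".
    """
    by_code = defaultdict(list)
    for name, (code, outs) in states.items():
        by_code[code].append((name, outs))
    conflicts = []
    for group in by_code.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                (n1, o1), (n2, o2) = group[i], group[j]
                if o1 != o2:
                    conflicts.append((n1, n2))
    return conflicts


def codes_available(num_signals):
    """Size of the binary coding space for the given number of signals."""
    return 2 ** num_signals


# Hypothetical state graph over (a, b): a is an input, b an output.
example = {
    "s0": ((0, 0), frozenset()),        # waiting for input a+
    "s1": ((1, 0), frozenset({"b+"})),  # after a+, output b+ is enabled
    "s2": ((1, 1), frozenset()),        # waiting for input a-
    "s3": ((0, 1), frozenset({"b-"})),  # after a-, output b- is enabled
    "s4": ((1, 0), frozenset()),        # same code as s1, but b+ not enabled
}

print(csc_conflicts(example))   # [('s1', 's4')]
print(24 > codes_available(4))  # True: 24 states cannot all get distinct codes
```

A logic synthesis tool resolves such conflicts by inserting extra state signals (the csc signals in the Petrify output), which enlarges the coding space.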
25
Logic synthesis (STGs Petrify)
EQN file for model decoupled-latch Estimated
area 16.00 Rout Aout' Rout csc2 Ain
csc0 csc0 Aout' csc1' csc2' Rin csc0
csc1 csc1 (csc0 Rin') Rout csc2
csc1' (csc2 csc0) Set/reset pins
reset(Rout) set(csc1)
Output from Petrify
csc0, csc1, csc2 state encoding signals
26
Logic synthesis (STGs Petrify)
Resulting state graph (with csc signals) has 59
states and no coding conflicts (coding space is
2^7 = 128)
27
Logic synthesis (STGs Petrify)
What if the system gets bigger?
28
Logic synthesis (STGs Petrify)
EQN file for model decoupled-latch.1-2
Estimated area 34.00 R1out A1out' R1out
csc2 csc3' R2out R2out A2out' csc4
Ain csc0 csc0 A1out' A2out' csc1 csc2
csc3 csc4' csc0 (csc1 csc4' csc3 Rin)
csc1 R2out' (csc0' Rin csc1) csc2
R1out' csc2 csc3 csc3 csc0' (R1out' Rin
csc2' csc3) csc4 csc1 (csc4 csc0)
Set/reset pins reset(R1out) reset(R2out)
reset(csc1) reset(csc2) reset(csc3)
Logic is asymmetric
Delay grows out of proportion
29
Pros and cons of Flow 2

Pros
Guarantees global optimality
Allows HDL to STG translation for a more
pragmatic front-end (e.g. Blunno-Lavagno
Verilog to STG translation) and allows
model checking together with synthesis (so makes
the design provably correct)
Cons
State space size is a problem
Solving state-coding in a good way is a problem

30
Main talk outline

Motivation: design flow problems
Backend language: Petri nets?
New design flow: two-level control
Direct mapping of PNs: event-based and
level-based
Direct mapping of STGs
Case studies
Conclusion

31
Towards new design flow

How to combine advantages of both approaches?
Use them at different levels
Introduce intermediate behavioural level -
labelled Petri nets (LPNs)
Perform semantical (based on execution order)
translation of HDLs to LPNs
Use direct mapping for large LPNs
Decompose control and use STGs and logic
synthesis at the low level (apply structural
methods, e.g. Pastor et al.)

32
New Design Flow
HDL Standard (VHDL, Verilog) or Async design
specific (Balsa)
CDFG and LPN Compilation (semantic)
LPNs
Verification (coherence etc.) Optimisation
(scheduling, dummies, fanin/fanout)
Synthesisable LPN
Direct mapping
DC netlist
33
New Design Flow: possible sources of useful
translation techniques

HDL to PN translation
VHDL to Extended Timed PNs (Linkoping)
VHDL to Control Data Flow Graphs (Lyngby)
Verilog to PN/STGs (Torino)
B(PN²) to M-net translation (PEP tool)
But none of them caters for a good PN structure
needed for direct mapping from PNs to circuits
(most of them work via state space exploration,
especially in model checking)

34
Design flow
HDL specification
Control/data splitting
Hierarchical control spec
Datapath spec
STG
LPN
STG to circuit synthesis (Petrify direct
mapping)
LPN to circuit synthesis (direct mapping)
Data logic synthesis
Data logic
Hierarchical control logic
Our present focus
Control/data interfacing
HDL implementation
35
HDL syntax directed mapping

do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od

Control flow is transferred between HDL syntax
constructs rather than between operations
36
HDL-to-LPN (high-level control)

do
  if (X=A) then
    par
      OP1
      OP2
    rap
  else
    seq
      OP3
      OP4
    qes
  fi
od

High-level control: labelled Petri net (LPN)
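A syntax-directed translation in the spirit of this slide can be sketched as a recursive function over the parse tree: each construct adds a constant number of places and transitions between an entry place and an exit place, so the net grows linearly with the program. The tuple constructors (`"op"`, `"seq"`, `"par"`, `"if"`, `"do"`), the `dum` labels for fork/join dummies and the `Net` class are hypothetical; the `do ... od` loop is modelled by making the body's exit the loop head again. This is an illustrative sketch, not the authors' fsm2lpn tool.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Net:
    """Places and labelled transitions built by the translation."""
    places: list = field(default_factory=list)
    transitions: dict = field(default_factory=dict)  # name -> (preset, postset, label)
    _ids: count = field(default_factory=count)

    def place(self):
        p = f"p{next(self._ids)}"
        self.places.append(p)
        return p

    def trans(self, label, pre, post):
        t = f"t{next(self._ids)}"
        self.transitions[t] = (tuple(pre), tuple(post), label)
        return t


def translate(node, entry, exit, net):
    """Add the sub-net for `node` between places `entry` and `exit`."""
    kind = node[0]
    if kind == "op":                       # one labelled transition
        net.trans(node[1], [entry], [exit])
    elif kind == "seq":                    # chain through fresh places
        parts, src = node[1:], entry
        for i, part in enumerate(parts):
            dst = exit if i == len(parts) - 1 else net.place()
            translate(part, src, dst, net)
            src = dst
    elif kind == "par":                    # fork dummy, branches, join dummy
        starts = [net.place() for _ in node[1:]]
        ends = [net.place() for _ in node[1:]]
        net.trans("dum", [entry], starts)
        for part, s, e in zip(node[1:], starts, ends):
            translate(part, s, e, net)
        net.trans("dum", ends, [exit])
    elif kind == "if":                     # free choice on the entry place
        cond, then_part, else_part = node[1:]
        p_then, p_else = net.place(), net.place()
        net.trans(f"({cond})", [entry], [p_then])
        net.trans(f"(not {cond})", [entry], [p_else])
        translate(then_part, p_then, exit, net)
        translate(else_part, p_else, exit, net)
    elif kind == "do":                     # endless loop: body returns to entry
        translate(node[1], entry, entry, net)
    else:
        raise ValueError(f"unknown construct {kind!r}")


program = ("do", ("if", "X=A",
                  ("par", ("op", "OP1"), ("op", "OP2")),
                  ("seq", ("op", "OP3"), ("op", "OP4"))))
net = Net()
head = net.place()  # loop head, the initially marked place
translate(program, head, head, net)
print(len(net.places), len(net.transitions))  # 8 8
```

The two `dum` transitions produced for `par` correspond to the dummies that appear in the directly mapped David cell netlist.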
37
Labelled PNs and Datapath

An LPN is defined as a triple $(PN, OP, L)$: an
underlying PN $= (P, T, F, M_0)$, an operation
alphabet $OP$ and a labelling function $L: T \to OP$
Operations in OP (typically assignments, comparisons,
calls to macros such as arbiters) are
defined as signatures on the elements of the datapath
(e.g. lists of the input/output registers R and the
operation units U involved in the operation),
e.g. $op(i)\langle R, U\rangle$
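The definition above can be made concrete as a small data structure: the underlying net (P, T, F, M0), a labelling L from transitions to operations, and the usual firing rule of the token game. This Python sketch assumes an ordinary 1-safe net (all arc weights one) and represents markings as sets of marked places; the two-transition net and the register/unit names (`R1`, `R2`, `R3`, `ALU`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    registers: tuple  # R: input/output registers used by the operation
    units: tuple      # U: operation units involved


@dataclass
class LPN:
    places: set
    transitions: set
    flow: set         # F: arcs as (source, target) pairs
    m0: frozenset     # M0: set of initially marked places (1-safe)
    label: dict       # L: transition -> Operation

    def pre(self, t):
        return {p for (p, x) in self.flow if x == t}

    def post(self, t):
        return {p for (x, p) in self.flow if x == t}

    def enabled(self, marking):
        return {t for t in self.transitions if self.pre(t) <= marking}

    def fire(self, marking, t):
        """Token game step; with set markings, a token added to an already
        marked place is merged, so 1-safety is assumed, not checked."""
        if t not in self.enabled(marking):
            raise ValueError(f"{t} is not enabled")
        return frozenset((marking - self.pre(t)) | self.post(t))


add = Operation("add", ("R1", "R2"), ("ALU",))
mov = Operation("mov", ("R2", "R3"), ())
net = LPN(
    places={"p1", "p2"},
    transitions={"t1", "t2"},
    flow={("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "p1")},
    m0=frozenset({"p1"}),
    label={"t1": add, "t2": mov},
)

m = net.m0
for _ in range(3):
    (t,) = net.enabled(m)  # exactly one transition is enabled in this cycle
    print(t, net.label[t].name)  # prints t1 add, t2 mov, t1 add
    m = net.fire(m, t)
```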

38
Labelled PNs and Datapath

Operations in OP are associated with req/ack
handshakes (two-way for assignments, or multi-way for
comparisons and arbitration), hence the
opr(i) and opa(i) signals
Interface with actual req and ack signals
associated with registers in R and op-units in U
is either synthesized via Petrify (low-level
control) or hardwired using MUXes and DEMUXes

39
Low-level control
Low-level control: Signal Transition Graphs (STGs)
40
Direct mapping of LPN to David cells
[Figure: high-level control logic directly mapped from the LPN: David cells DC1-DC5, condition branches (X=A) and (X<>A), operations OP1-OP4 and dummy transitions (dum); inset: basic David cell (DC)]
41
Direct mapping cell library
42
Main talk outline

Motivation: design flow problems
Backend language: Petri nets?
New design flow: two-level control
Direct mapping of PNs: event-based and
level-based
Direct mapping of STGs
Case studies
Conclusion

43
Direct mapping vs logic synthesis: conceptual
difference

Logic synthesis uses a Petri net (STG) as a
generator of an encoded state-space. The circuit
structure is not directly related to the net
structure (though some correspondence exists and
is exploited in structural logic synthesis
methods, Pastor et al.)
Direct mapping considers a PN literally, as a
prototype of the circuit structure (cf.
Varshavsky's use of the term "modelling circuit")

44
Direct mapping vs logic synthesis

Direct mapping has linear computational
complexity but can be area inefficient (inherent
one-hot encoding)
Logic synthesis has problems with state space
explosion, and with recognition of repetitive and
regular structures (log-based encoding approach)
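The trade-off can be quantified on a toy family of nets: n independent two-place cycles running concurrently. Direct mapping allocates one latch (David cell) per place, so it needs 2n latches, whereas the reachable state space that logic synthesis explores has 2^n markings (a log-based encoding of those states needs n bits, but the exploration itself is exponential). The breadth-first reachability function is a generic sketch; the net family is chosen only to illustrate the growth.

```python
from collections import deque


def parallel_cycles(n):
    """n concurrent cycles a_i -> b_i -> a_i, each starting with a token on a_i.
    Transitions are (preset, postset) pairs of place-name sets."""
    transitions = []
    for i in range(n):
        transitions.append(({f"a{i}"}, {f"b{i}"}))
        transitions.append(({f"b{i}"}, {f"a{i}"}))
    places = {p for pre, post in transitions for p in pre | post}
    m0 = frozenset(f"a{i}" for i in range(n))
    return places, transitions, m0


def reachable(transitions, m0):
    """Breadth-first exploration of the markings of a 1-safe net."""
    seen, queue = {m0}, deque([m0])
    while queue:
        m = queue.popleft()
        for pre, post in transitions:
            if pre <= m:
                m2 = frozenset((m - pre) | post)
                if m2 not in seen:
                    seen.add(m2)
                    queue.append(m2)
    return seen


for n in (2, 4, 8):
    places, transitions, m0 = parallel_cycles(n)
    states = len(reachable(transitions, m0))
    print(f"n={n}: {len(places)} latches (one-hot), {states} reachable states")
```

The latch count grows linearly with the net, while the number of states grows as 2^n.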

45
Direct Translation of Petri Nets

Previous work dates back to the 70s
Synthesis into event-based (two-phase) circuits
S.Patil, F.Furtek (MIT)
Synthesis into level-based (four-phase) circuits
R. David (69, translation of FSM graphs to CUSA
cells)
L. Hollaar (82, translation from parallel
flowcharts)
V. Varshavsky et al. (90,96, translation from
PN into an interconnection of David Cells)
See various examples of synthesis in both styles
in Yakovlev and Koelmans (Petri net lectures, LNCS,
1998)

46
Patil's set of modules
Petri net fragment: circuit equivalent
place: wire
marked place: inverter
join: C-element (C)
merge: XOR
fork: fan-out
shared (conflict) place: effectively an RGD arbiter
switch: switch element
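In the event-based (two-phase) reading of this table, a token arriving at a place corresponds to a transition (either edge) on a wire. Two key correspondences can be checked with a small behavioural model: a Muller C-element implements a join because its output changes only once both inputs have changed, and an XOR implements a merge because its output changes whenever either input changes. This is a simplified zero-delay sketch, not a gate-level model.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, out=0):
        self.out = out

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


def xor_merge(a, b):
    """XOR: the output toggles on every input transition (merge of events)."""
    return a ^ b


# Join: an event on each input is needed before the output produces one.
c = CElement()
a = b = 0
a ^= 1                      # event on input a only
assert c.update(a, b) == 0  # join not yet complete: no output event
b ^= 1                      # event on input b
assert c.update(a, b) == 1  # both events have arrived: output event

# Merge: every input event produces an output event.
x = y = 0
outputs = []
for wire in ("x", "y", "x"):
    if wire == "x":
        x ^= 1
    else:
        y ^= 1
    outputs.append(xor_merge(x, y))
print(outputs)  # [1, 0, 1]: the output toggles on every input event
```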
47
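Example
[Figure: one-place buffer Buf(1) with a passive handshake P(ut) (pr, pa) and an active handshake G(et) (gr, ga); its two-phase implementation using Patil's elements, including a C-element; waveforms of pr, gr, pa, ga under the two-phase (NRZ) protocol]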
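The difference between the two-phase (NRZ, non-return-to-zero) protocol used here and the four-phase (RZ, return-to-zero) protocol of level-based circuits shows up clearly when generating the wire transitions for a few handshakes: two-phase needs two transitions per communication, four-phase needs four because req and ack must return to zero. The names `req` and `ack` stand for a generic pair such as pr/pa; the functions are illustrative.

```python
def two_phase(cycles):
    """NRZ: every transition of req or ack, rising or falling, is an event."""
    req = ack = 0
    trace = []
    for _ in range(cycles):
        req ^= 1
        trace.append(("req", req))
        ack ^= 1
        trace.append(("ack", ack))
    return trace


def four_phase(cycles):
    """RZ: req+, ack+, req-, ack- for every communication."""
    trace = []
    for _ in range(cycles):
        trace += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return trace


print(two_phase(2))  # [('req', 1), ('ack', 1), ('req', 0), ('ack', 0)]
print(len(two_phase(3)), len(four_phase(3)))  # 6 12
```

Two complete two-phase communications produce the same waveform as a single four-phase cycle, which is why event-based circuits need elements (such as the XOR and the C-element) that react to both edges.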
48
Example
[Figure: animation frame of the same Buf(1) two-phase example]
49
Example
[Figure: further animation frame of the same Buf(1) two-phase example]
50
Other useful elements
Select
Call
Toggle
51
Direct synthesis example (modulo-k up-down counter)
Mod-k counter LPN
Environment LPN
52
Direct synthesis example (modulo-k up-down counter)
Decomposition (structural view)
53
Direct synthesis example (modulo-k up-down counter)
structure
LPN
54
Direct synthesis example (modulo-k up-down counter)
structure
LPN
55
Direct synthesis example (modulo-k up-down counter)
56
Synthesis into level-based circuits

David's method for asynchronous finite state
machines
Hollaar's extensions to parallel flow charts
Varshavsky's method for 1-safe persistent Petri
nets, based on associating places with latches;
the method works for closed (autonomous) circuits
with no input choice or arbitration, and inputs can
only be part of handshakes activated by the control
logic

57
David's original approach
[Figure: fragment of a state machine flow graph (states a, b, c, d; inputs x1, x2; outputs ya, yb, yc) and the CUSA element for storing state b]
58
Hollaar's approach
[Figure: fragment of a flow-chart (allows parallelism) with signals K, L, M, N, A, B annotated with logic values, and the corresponding one-hot circuit cell]
59
Hollaar's approach
[Figure: the same flow-chart fragment and one-hot circuit cell at the next step, with updated signal values]
60
Hollaar's approach
[Figure: the same flow-chart fragment and one-hot circuit cell at the following step, with updated signal values]
61
Varshavsky's Approach
[Figure: places p1 and p2 mapped to latches, with signal values (0)/(1), driving a controlled operation]
62
Varshavsky's Approach
[Figure: the same circuit as the token moves from p1 to p2, with signal changes 0→1 and 1→0 annotated]
63
Varshavsky's Approach
[Figure: continuation of the token move; the signal to the operation goes 1→0→1]
64
Varshavsky's Approach

This method associates places with latches
(flip-flops), so the state memory (marking) of the
PN is directly mimicked in the circuit's state
memory
Transitions are associated with controlled
actions (e.g. activations of data path units or
lower-level control blocks by using handshake
protocols)
Modelling discrepancy (be careful!):
in Petri nets, the removal of a token from pre-places
and the adding of tokens to post-places is instantaneous
(i.e. there are no intermediate states)
in circuits, the move of a token has a duration
and there is an intermediate state
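The discrepancy can be made visible with a simplified level-based model of two adjacent places p1 and p2, each held in a latch. The order assumed here (set the post-place latch first, then reset the pre-place latch) follows the David-cell style of token passing; the intermediate state in which both latches hold 1 has no counterpart in the Petri net, where the token moves atomically. This is a behavioural sketch, not a transistor-level David cell.

```python
def move_token(latches, src, dst):
    """Move a token between two place latches (mutates `latches`) and
    return every circuit state visited, including the intermediate one."""
    states = [dict(latches)]
    latches[dst] = 1  # post-place latch captures the token (0 -> 1)
    states.append(dict(latches))
    latches[src] = 0  # pre-place latch is then released (1 -> 0)
    states.append(dict(latches))
    return states


def petri_net_move(marking, src, dst):
    """In the net, firing is instantaneous: no intermediate marking."""
    return [set(marking), (set(marking) - {src}) | {dst}]


circuit = move_token({"p1": 1, "p2": 0}, "p1", "p2")
net = petri_net_move({"p1"}, "p1", "p2")
print(circuit)  # [{'p1': 1, 'p2': 0}, {'p1': 1, 'p2': 1}, {'p1': 0, 'p2': 1}]
print(net)      # [{'p1'}, {'p2'}]
```

The middle circuit state is harmless only if no other part of the circuit interprets "both places marked" as a legal marking, which is one reason the mapping rules restrict the net structure.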

65
Direct mapping of LPNs and STGs
66
Fast David cell
[Figure: fast DC with timing assumptions (cf. a GasP section), and the same with negative gates]
67
Implementability condition for LPNs

Autonomous control interpretation: each
transition is associated with a handshake to the
controlled part (datapath) or is a dummy
Implementability: any 1-safe labelled PN with
autonomous control semantics of transitions and with
no loops of fewer than three transitions can be
directly mapped into a speed-independent control
circuit whose behaviour is equivalent (bisimilar)
to the PN
Consistency of labelling: transitions labelled by
reference to the same datapath blocks must be
consistent with the local semantics of those
blocks (e.g. must not be mutually concurrent)
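Two of the structural conditions above can be checked mechanically on the underlying net: 1-safeness (by exploring the reachable markings) and the absence of loops of fewer than three transitions. The sketch assumes that a "loop" means a cycle in the transition graph, where t leads to u if a post-place of t is a pre-place of u, and it only looks for cycles of length one and two; it checks neither labelling consistency nor bisimilarity. The example nets are invented.

```python
from collections import Counter, deque


def is_one_safe(transitions, m0, limit=100_000):
    """Explore markings as Counters; fail as soon as a place holds 2+ tokens.
    `transitions` maps a name to (preset, postset) tuples of place names."""
    start = Counter(m0)
    seen, queue = {frozenset(start.items())}, deque([start])
    while queue:
        m = queue.popleft()
        if any(n > 1 for n in m.values()):
            return False
        for pre, post in transitions.values():
            if all(m[p] >= 1 for p in pre):
                m2 = m.copy()
                m2.subtract(pre)
                m2.update(post)
                m2 = +m2  # unary plus drops places with zero tokens
                key = frozenset(m2.items())
                if key not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("state space too large for this sketch")
                    seen.add(key)
                    queue.append(m2)
    return True


def short_loops(transitions):
    """Return transition loops of length 1 or 2, sorted for stable output."""
    feeds = {t: {u for u, (pre_u, _) in transitions.items()
                 if set(post_t) & set(pre_u)}
             for t, (_, post_t) in transitions.items()}
    loops = []
    for t in transitions:
        if t in feeds[t]:
            loops.append((t,))
        for u in feeds[t]:
            if u != t and t < u and t in feeds[u]:
                loops.append((t, u))
    return sorted(loops)


two_ring = {"t1": (("p1",), ("p2",)), "t2": (("p2",), ("p1",))}
three_ring = {"t1": (("p1",), ("p2",)), "t2": (("p2",), ("p3",)),
              "t3": (("p3",), ("p1",))}
print(is_one_safe(two_ring, ["p1"]), short_loops(two_ring))      # True [('t1', 't2')]
print(is_one_safe(three_ring, ["p1"]), short_loops(three_ring))  # True []
```

The two-transition ring is 1-safe but violates the loop condition, while the three-transition ring passes both checks.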

68
Main talk outline

Motivation: design flow problems
Backend language: Petri nets?
New design flow: two-level control
Direct mapping of PNs: event-based and
level-based
Direct mapping of STGs
Case studies
Conclusion

69
Direct mapping of STGs
[Figure: STG specification and the mapped circuit, with signals Rout and Aout]
Here all signal transitions are associated with
handshakes, and handshake compression must be done
before mapping
70
What about direct mapping of arbitrary STGs?
[Figure: example STG with inputs inp1, inp2 and outputs out1, out2]

Associate with each output transition a latch
(one per signal x) and a set (for x+) or reset (for x-)
handshake, and with each input some sampling logic;
handshakes are pull for inputs and push for outputs.

[Figure: mapped fragment: transitions of inp1 and a push handshake to the out1 output latch]
71
What about direct mapping of arbitrary STGs?
[Figure: the same STG mapped with an input sample of inp2 (pull), mux/demux logic, and a push handshake to the out1 output latch]
Problem: long delay between input event and
output response
72
What about direct mapping of arbitrary STGs?

Another problem for direct mapping: STGs may
contain self-loops (or read arcs) for testing
level-oriented inputs and outputs

[Figure: STG fragment with transitions of x and y and a self-loop testing the level of x]
73
Low latency approach

Can we connect inputs directly to the control
structure to minimise the i/o latency?

[Figure: input inp2 connected directly to the control structure that drives out1]
74
The problem of mapping STGs

Given: a 1-safe STG
Target: a netlist of David cells, input wires and
output flip-flops
Procedure: use direct mapping of the elements of the
underlying PN into elements of the netlist
Problem: need for an intermediate form of the STG, where
I/O is connected to control by read arcs only

75
Device-environment interface
76
Device-environment interface
[Figure: device-environment interface with input wires, output latches and the tracker]
To derive the circuit implementation we only use the
tracker and the I/O subnets
77
Direct mapping
78
Optimisation
Removing places from the tracker: latency is
reduced if the place between an input and the
following output is removed, but coding
conflicts become possible, because places perform
state separation.
[Figure: the tracker before and after place removal]
79
Optimisation: coding conflicts
Input signal a changes twice between p1 and
p5. Keeping p3 solves the conflict and preserves
low latency.
80
Irreducible input coding conflicts

Certain input labelling cannot be implemented in
a speed-independent way without timing
assumptions (e.g. input changes are slower than
David cell operation) or without changing the I/O
interface (introducing new outputs as responses to the
environment)

[Figure: transitions of inp0 leading to inseparable states (for the tracker)]
81
Implementability of STGs

Sufficient condition:
an STG with a 1-safe underlying PN, with
consistent signal transition labelling
(transitions of the same signal are in precedence
and +/- alternate) and with monotonic input bursts
(for each connected input-labelled subgraph, each
signal changes only once)
A necessary and sufficient condition is an open problem!
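On a single trace, the two labelling conditions can be checked directly: consistency means that the transitions of each signal alternate between + and - starting from the signal's initial value, and a monotonic input burst means that within one maximal run of consecutive input transitions no signal changes more than once. This sketch works on a linear trace rather than on the net, so it is only a necessary check for the structural condition above; the label format (`"inp1+"`, `"out1-"`) and the signal names are assumptions.

```python
def parse(label):
    """Split a transition label such as 'inp1+' into ('inp1', '+')."""
    return label[:-1], label[-1]


def is_consistent(trace, initial):
    """Each signal must alternate: '+' only when low, '-' only when high."""
    value = dict(initial)
    for label in trace:
        sig, sign = parse(label)
        if sign == "+" and value[sig] == 0:
            value[sig] = 1
        elif sign == "-" and value[sig] == 1:
            value[sig] = 0
        else:
            return False
    return True


def bursts_monotonic(trace, inputs):
    """Within each maximal run of input transitions, each signal changes once."""
    seen = set()
    for label in trace:
        sig, _ = parse(label)
        if sig in inputs:
            if sig in seen:
                return False
            seen.add(sig)
        else:
            seen.clear()  # an output transition ends the current input burst
    return True


inputs = {"inp1", "inp2"}
init = {"inp1": 0, "inp2": 0, "out1": 0}
good = ["inp1+", "inp2+", "out1+", "inp1-", "inp2-", "out1-"]
bad = ["inp1+", "inp1-", "out1+"]  # inp1 changes twice in one burst
print(is_consistent(good, init), bursts_monotonic(good, inputs))  # True True
print(is_consistent(bad, init), bursts_monotonic(bad, inputs))    # True False
```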

82
Main talk outline

Motivation: design flow problems
Backend language: Petri nets?
New design flow: two-level control
Direct mapping of PNs: event-based and
level-based
Direct mapping of STGs
Case studies
Conclusion

83
Communication channel example

A duplex delay-insensitive channel for low power
and pin-efficiency proposed by Steve Furber
(AINT2002)
Relatively simple data path (with handshake
access via push and pull protocols)
Sophisticated control (involves arbitration,
choice and concurrency)
Natural two-level control decomposition
Requires low latency (existing STG and BM
solutions produce logic that is too heavy)

84
Channel Structure
[Figure: Master and Slave connected by N-of-M coded links]
N-of-M codes: dual-rail, 3-of-6, 2-of-7
Key protocol symbols (e.g. in dual-rail): Start
(01), Ack (10), Slave-Ack (11), Data (01 or 10)
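The capacity of the N-of-M codes listed above follows from a binomial count: an N-of-M code has C(M, N) codewords, a data symbol carrying k bits uses 2^k of them, and the rest are spare. The helper name below is made up for illustration. Note that in the dual-rail case Start (01), Ack (10) and Data share the two valid 1-of-2 codewords and must be told apart by the protocol state, whereas Slave-Ack (11) is not a 1-of-2 codeword at all.

```python
from math import comb


def n_of_m_capacity(n, m):
    """Return (codewords, whole data bits per symbol, spare codewords)."""
    words = comb(m, n)
    bits = words.bit_length() - 1  # largest k with 2**k <= words
    return words, bits, words - 2 ** bits


codes = {"dual-rail (1-of-2)": (1, 2), "3-of-6": (3, 6), "2-of-7": (2, 7)}
for name, (n, m) in codes.items():
    words, bits, spare = n_of_m_capacity(n, m)
    print(f"{name}: {words} codewords, {bits} data bits, {spare} spare")
```

Both 3-of-6 (20 codewords) and 2-of-7 (21 codewords) carry 4 data bits per symbol and leave a few spare codewords.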
85
Protocol Specification
[Figure: Protocol Automaton between Master and Slave]
The protocol can be defined on an imaginary
Protocol Automaton receiving symbols from both
sides (it will hide all activity internal to
Master and Slave)
86
Protocol Specification
[Figure: Protocol Automaton between Master and Slave (continued)]
87
Controller Overview
[Figure: high-level control connected to the data path and low-level control through push and pull handshakes]
88
Low-level logic
[Figure: Tx controller and its sending interface]
89
LPN model for high-level control (master)
[Figure: LPN with calls to local arbiters, a Slave-Ack pull, pulls and pushes, three-way pushes and three-way pulls, and dummies inserted for direct DC mapping]
90
High-level control (master) mapped directly from
LPN
[Figure: David cell netlist with dummies, push and pull handshakes, and two arbiters (arbiter1, arbiter2)]
91
Towards synthesis for higher performance
[Figure: LPN fragment with a push, a dummy and two pulls]
Is the dummy in the right place? It is on the
cycle of (output) push and (input)
pull: pull → dummy → push → pull → dummy → push → ...
92
Towards synthesis for higher performance
[Figure: the dummy moved off the critical path (the push/pull cycle) onto a non-critical path]
Synthesis rule: don't insert dummies on critical
paths
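One way to see why the rule matters is the standard cycle-time bound for a timed marked graph: the cycle time is the maximum, over all simple cycles, of the total delay on the cycle divided by the number of tokens on it. A dummy implemented by a David cell adds delay to the cycles through it, so placing it on the critical cycle raises the cycle time, while placing it on a cycle with enough slack leaves it unchanged. The delays (in ns), the cycle names and the assumption that the dummy lies on exactly one cycle are invented for illustration, not measurements from the channel controller.

```python
def cycle_time(cycles):
    """cycles: name -> (total delay on the cycle, tokens on the cycle)."""
    return max(delay / tokens for delay, tokens in cycles.values())


def with_dummy(cycles, on_cycle, dummy_delay):
    """Return a copy of `cycles` with a dummy's delay added to one cycle."""
    updated = dict(cycles)
    delay, tokens = updated[on_cycle]
    updated[on_cycle] = (delay + dummy_delay, tokens)
    return updated


# Hypothetical delays (ns): the push/pull I/O cycle is the critical one.
cycles = {"push-pull (critical)": (6.0, 1), "internal (non-critical)": (3.0, 1)}
print(cycle_time(cycles))                                              # 6.0
print(cycle_time(with_dummy(cycles, "push-pull (critical)", 1.0)))     # 7.0
print(cycle_time(with_dummy(cycles, "internal (non-critical)", 1.0)))  # 6.0
```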
93
Synthesis for lower I/O latency: LPN level
[Figure: high-level control with internal actions, pull logic for inputs and push logic for outputs connected to the environment (channel), plus a low-latency shortcut]
94
Channel Cycle Time
Controller implementation     Simplex mode   Duplex mode
Direct mapping from LPN       7.6 ns         8.3 ns
Logic synthesis from STG      12.7 ns        16.5 ns

These results were obtained for 0.6 micron CMOS
Further improvement can be achieved by making more use
of low-latency techniques (at the gate level) and by
introducing aggressive relative timing in David
cells and low-level logic

95
Case study: VME bus controller
96
Case study: VME bus controller
97
Case study: VME bus controller
98
Case study: VME bus controller

Circuit generated by logic synthesis (Petrify)
Smaller, though comparable in size
Transistor stacks are larger

99
Case study: VME bus controller
Latency comparison between our method and the Petrify
solution:
Transition                 Petrify   Fast DC
ldtack → d                 0.35 ns   0.29 ns
ldtack → d-                0.20 ns   0.16 ns
d → dtack                  0.27 ns   0.27 ns
dsw- → dtack-              0.42 ns   0.44 ns
ldtack- → lds (rd)         0.38 ns   0.21 ns
ldtack → lds (wr)          0.38 ns   0.29 ns
dsw- → lds-                0.33 ns   0.26 ns
Number of transistors      32        56
100
Conclusion

Hierarchical (e.g. protocol) controller synthesis
can go via back-end LPN/STG models
Direct mapping from LPNs/STGs yields fast
circuits that are easy to analyse and test
Translation from PNs to David cell netlists
implemented in tool pn2dc
Translation from VHDL specs to LPNs and STGs
implemented in tools fsm2lpn and fsm2stg
Further work needed on
Formal link between HDLs and PNs (semantics and
equivalence), leading to better synthesis of PNs
from HDLs
Optimisation techniques at LPN/STG and circuit
levels
See our papers in Async'02 and ISCAS'02

101
Open problems

Formally characterise properties of PNs that
make them good for circuit design, like
optimality w.r.t. I/O response time, worst/average
case cycle time, positions of silent (dummy)
events
Control (place/transition nets) and datapath modelled
separately versus use of high-level nets for both
Testing via Petri net specifications (faults in
PNs: stuck tokens, transitions)
