Title: Advanced Tutorial on Hardware Design and Petri nets
1Advanced Tutorial on Hardware Design and Petri nets
- Jordi Cortadella Univ. Politècnica de Catalunya
- Luciano Lavagno Università di Udine
- Alex Yakovlev Univ. Newcastle upon Tyne
2Tutorial Outline
- Introduction
- Modeling Hardware with PNs
- Synthesis of Circuits from PN specifications
- Circuit verification with PNs
- Performance analysis using PNs
3Introduction.Outline
- Role of Hardware in modern systems
- Role of Hardware design tools
- Role of a modeling language
- Why Petri nets are good for Hardware Design
- History of relationship between Hardware Design and Petri nets
- Asynchronous Circuit Design
4Role of Hardware in modern systems
- Technology will soon allow 1 billion transistors on a chip
- Systems on chip are a reality: 1 billion operations per second
- Hardware and software designs are no longer separate
- Hardware becomes distributed, asynchronous and concurrent
5Role of Hardware design tools
- Design productivity is a problem due to chip complexity and time-to-market demands
- Need for well-integrated CAD with simulation, synthesis, verification and testing tools
- Modelling of system behaviour at all levels of abstraction, with feedback to the designer
- Design re-use is a must, but with maximum technology independence
6Role of Modelling Language
- Design methods and tools require good modelling and specification techniques
- These must be formal and rigorous, yet easy to comprehend (cf. timing diagrams and waveforms, traditionally used by logic designers)
- Today's hardware description languages allow a high level of abstraction
- Models must allow for equivalence-preserving refinements
- They must allow for non-functional qualities such as speed, size and power
7Why Petri nets are good
- The Finite State Machine is still the main formal tool in hardware design, but it may be inadequate for distributed, concurrent and asynchronous hardware
- Petri nets offer:
  - simple and easy to understand graphical capture
  - modelling power adjustable to various types of behaviour at different abstraction levels
  - formal operational semantics and verification of correctness (safety and liveness) properties
  - possibility of mechanical synthesis of circuits from net models
8A bit of history of their marriage
- 1950s and 60s: Foundations (Muller & Bartky, Petri, Karp & Miller, ...)
- 1970s: Toward Parallel Computations (MIT, Toulouse, St. Petersburg, Manchester, ...)
- 1980s: First progress in VLSI and CAD, Concurrency theory, Signal Transition Graphs (STGs)
- 1990s: First asynchronous design (verification and synthesis) tools: SIS, Forcage, Petrify
- 2000s: Powerful asynchronous design flow
9Introduction to Asynchronous Circuits
- What is an asynchronous circuit?
- Physical (analogue) level
- Logical level
- Speed-independent and delay-insensitive circuits
- Why go asynchronous?
- Why control logic?
- Role of Petri nets
- Asynchronous circuit design based on Petri nets
10What is an asynchronous circuit
- No global clock: circuits are self-timed or self-clocked
- Can be viewed as hardwired versions of parallel and distributed programs: statements are activated when their guards are true
- No special run-time mechanism: the program statements are physical components (logic gates, memory latches, or hierarchical modules)
- Interconnections are also physical components: wires, busses
11Synchronous Design
[Figure: sender register and receiver register separated by logic, both driven by a common Clock; timing diagram of Clock and Data input with Tsetup and Thold marked]
Timing constraint: input data must stay unchanged within a setup/hold window around the clock event. Otherwise, the latch may fail (e.g. metastability).
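The setup/hold constraint can be sketched as a simple check. This is a minimal illustration only: the function name, the time units and the choice of an open window (clock edge minus Tsetup, clock edge plus Thold) are assumptions made for the example, not part of the tutorial.

```python
def violates_setup_hold(data_change_times, clock_edge, t_setup, t_hold):
    """Return True if any data transition falls inside the forbidden
    window (clock_edge - t_setup, clock_edge + t_hold)."""
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    return any(window_start < t < window_end for t in data_change_times)


# Hypothetical numbers, arbitrary time units.
print(violates_setup_hold([2.0, 9.7], clock_edge=10.0, t_setup=0.5, t_hold=0.3))  # True
print(violates_setup_hold([2.0, 9.0], clock_edge=10.0, t_setup=0.5, t_hold=0.3))  # False
```

A data change at 9.7 lies inside the window around the edge at 10.0, so the latch may fail; a change at 9.0 is safely outside it.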
12Asynchronous Design
[Figure: sender register and receiver register separated by logic, linked by Req(uest) and Ack(nowledge) signals; timing diagram of Req, Ack and Data input]
Req/Ack (local) signal handshake protocol instead of a global clock. The relationship between events is causal. Handshake signals are implemented with completion detection in the data path.
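The causal ordering of a Req/Ack handshake can be sketched as an event sequence. The slide does not fix a protocol, so this sketch assumes a four-phase (return-to-zero) handshake; the function name and the event labels are hypothetical.

```python
def four_phase_transfer(items):
    """Yield (event, payload) pairs for a return-to-zero Req/Ack handshake.

    Each transfer is the causal chain req+ -> ack+ -> req- -> ack-.
    """
    req = ack = 0
    for item in items:
        assert req == 0 and ack == 0  # both wires idle before a transfer
        req = 1
        yield ("req+", item)  # sender: data is valid
        ack = 1
        yield ("ack+", item)  # receiver: data has been taken
        req = 0
        yield ("req-", None)  # sender: return to zero
        ack = 0
        yield ("ack-", None)  # receiver: ready for the next transfer


for event in four_phase_transfer(["d0", "d1"]):
    print(event)
```

No clock appears anywhere: each event is enabled only by the event before it, which is the causal relationship the slide refers to.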
13Physical (Analogue) level
- Strict view: an asynchronous circuit is an (analogue) dynamical system, e.g. to be described by differential equations
- In most cases it can be safely approximated by a logic-level (0-to-1 and 1-to-0 transitions) abstraction; even hazards can be captured
- For some anomalous effects, such as metastability and oscillations, analogue models are absolutely needed
- Analogue aspects are not considered in this tutorial (cf. reference list)
14Logical Level
- Circuit behaviour is described by sequences of up (0-to-1) and down (1-to-0) transitions on inputs and outputs
- The order of transitions is defined by causal relationship, not by clock (a causes b, directly or transitively)
- The order is partial if concurrency is present
- A class of async timed (yet not clocked!) circuits allows special timing order relations (a occurs before b, due to delay assumptions)
15Simple circuit example
[Figure: circuit block with data inputs x, y, a, b and output out, labelled out(xy)(ab); the handshake pairs req1/ack1, req2/ack2 and req3/ack3 are combined by a C-element]
[Figure: data flow graph over x, y, a, b and out]
[Figure: control flow graph, drawn as a Petri net, over req1, ack1, req2, ack2, req3 and ack3]
18Muller C-element
Key component in asynchronous circuit design; behaves like a Petri net transition.
y = x1·x2 + (x1 + x2)·y (the terms are labelled Set-part and Reset-part)
[Figure: C-element symbol with inputs x1, x2 and output y]
Animation steps:
- x1 = 0, x2 = 0, y = 0
- x1: 0->1, x2 = 0, y = 0
- x1: 0->1, x2: 0->1, y = 0 (excited)
- x1 = 1, x2 = 1, y = 0 (excited)
- x1 = 1, x2 = 1, y = 1 (stable, new value)
- x1: 1->0, x2 = 1, y = 1
- x1: 1->0, x2: 1->0, y = 1 (excited)
- x1 = 0, x2 = 0, y = 0 (stable, new value)
It acts symmetrically for pairs of 0-1 and 1-0 transitions: it waits for both input events to occur.
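The behaviour of the C-element can be sketched in a few lines, reading its next-state function as y = x1·x2 + (x1 + x2)·y, the standard form for this gate. The function names and the input sequence are made up for illustration, and the model is zero-delay: the output is updated immediately after each input change.

```python
def c_element(x1, x2, y):
    """Next value of y for y = x1*x2 + (x1 + x2)*y (all values 0 or 1)."""
    return (x1 & x2) | ((x1 | x2) & y)


def run(inputs, y=0):
    """Apply a sequence of (x1, x2) pairs and record y after each step."""
    trace = []
    for x1, x2 in inputs:
        y = c_element(x1, x2, y)
        trace.append(y)
    return trace


# y rises only after both inputs are 1 and falls only after both are 0.
print(run([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))  # [0, 0, 1, 1, 0]
```

When only one input has changed, the output holds its previous value, which is what makes the gate wait for both input events, like a Petri net transition waiting for tokens on all its input places.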
31Muller C-element
NMOS circuit implementation
[Figure: transistor-level schematic between Power and Ground, with transistors gated by x1 and x2 and output y]
34Why asynchronous is good
- Performance (work on actual, not max delays)
- Robustness (operationally scalable; no clock distribution; important when gate-to-wire delay ratio changes)
- Low Power (change-based computing: fewer signal transitions)
- Low Electromagnetic Emission (more even power/frequency spectrum)
- Modularity and re-use (parts designed independently; well-defined interfaces)
- Testability (inherent self-checking via ack signals)
35Obstacles to Async Design
- Design tool support: commercial design tools are aimed at clocked systems
- Difficulty of production testing: production testing is heavily committed to use of the clock
- Aversion of the majority of designers, trained with the clock: the biggest obstacle
- Overbalancing effect of periodic (every 10 years) asynchronous euphoria
36Why control logic
- It is customary in hardware design to separate control logic from datapath logic, due to different design techniques
- Control logic implements the control flow of a (possibly concurrent) algorithm
- Datapath logic deals with the operational part of the algorithm
- Datapath operations may have their own (lower level) control flow elements, so the distinction is relative
- Examples of control-dominated logic are a bus interface adapter, an arbiter, or a modulo-N counter
- Their behaviour is a combination of partial orders of signal events
- Examples of data-dominated logic are a register bank or an arithmetic-logic unit (ALU)
37Role of Petri Nets
- We concentrate here on control logic
- Control logic is behaviourally more diverse than data path
- Petri nets capture causality and concurrency between signalling events, and deterministic and non-deterministic choice in the circuit and its environment
- They allow:
  - composition of labelled PNs (transition or place synchronisation)
  - refinement of event annotation (from abstract operations down to signal transitions)
  - use of observational equivalence (lambda-events)
  - clear link with state-transition models in both directions
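How a Petri net captures causality and concurrency can be sketched with a small token game. The class below is a minimal illustration with unit arc weights; the class name, the place and transition names and the fork/join net are all hypothetical and are not taken from the tutorial.

```python
class PetriNet:
    """Minimal place/transition net with unit arc weights (illustrative only)."""

    def __init__(self, transitions, marking):
        # transitions: name -> (list of input places, list of output places)
        self.transitions = transitions
        self.marking = dict(marking)  # place -> token count

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) > 0 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


# Hypothetical fork/join net: after 'fork', events 'a' and 'b' are concurrent.
net = PetriNet(
    {
        "fork": (["p0"], ["p1", "p2"]),
        "a": (["p1"], ["p3"]),
        "b": (["p2"], ["p4"]),
        "join": (["p3", "p4"], ["p0"]),
    },
    {"p0": 1},
)
net.fire("fork")
print(net.enabled("a"), net.enabled("b"))  # True True: a and b are concurrent
net.fire("a")
print(net.enabled("join"))  # False: join is caused by both a and b
net.fire("b")
net.fire("join")
print(net.marking["p0"])  # 1: back to the initial marking
```

The join transition plays the same role as a C-element: it fires only once both of its causes have occurred.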
38Design flow with Petri nets
[Figure: design flow diagram with the following elements]
- Abstract behaviour synthesis
- Abstract behavioural model: Labelled Petri nets (LPNs)
- Signalling refinement
- Timing diagrams
- Verification and Performance analysis
- Logic behavioural model: Signal Transition Graphs (STGs)
- STG-based logic synthesis (deriving boolean functions)
- Syntax-directed translation (deriving circuit structure)
- Decomposition and gate mapping
- Circuit netlist
- Library cells
40Modelling.Outline
- High level modelling and abstract refinement: processor example
- Low level modelling and logic synthesis: interface controller example
- Modelling of logic circuits: event-driven and level-driven parts
- Properties analysed
41High-level modelling: Processor Example
[Figure: Petri net model of a processor. The first slide shows two events, Instruction Fetch and Instruction Execution; the following slides step through the refined model over the events listed below]
- Instruction Fetch
- Instruction Execution
- One-word Instruction Decode
- One-word Instruction Execute
- Memory Read
- Two-word Instruction Execute
- Program Counter Update
- Memory Address Register Load
- Two-word Instruction Decode
- Instruction Register Load
During the sequence, Instruction Execution is annotated "(not exactly yet!)" at one step and "(now it is!)" two steps later.
54High-level modelling: Processor Example
- The details of further refinement, circuit implementation (by direct translation) and performance estimation (using UltraSan) are in:
  - A. Semenov, A.M. Koelmans, L. Lloyd and A. Yakovlev. Designing an asynchronous processor using Petri Nets, IEEE Micro, 17(2):54-64, March 1997
- For use of Coloured Petri net models and use of Design/CPN in processor modeling:
  - F. Burns, A.M. Koelmans and A. Yakovlev. Analysing superscalar processor architectures with coloured Petri nets, Int. Journal on Software Tools for Technology Transfer, vol. 2, no. 2, Dec. 1998, pp. 182-191.
55Low-level Modelling: Interface Example
- Insert VME bus figure 1: timing diagrams
56Low-level Modelling: Interface Example
- Insert VME bus figure 2: STG
57Low-level Modelling: Interface Example
- Details of how to model interfaces and design controllers are in:
  - A. Yakovlev and A. Petrov, [complete the reference]
58Low-level Modelling: Interface Example
- Insert VME bus figure 3: circuit diagram
59Logic Circuit Modelling
Event-driven elements and their Petri net equivalents
[Figure: Muller C-element and Toggle, each shown with its Petri net equivalent]
60Logic Circuit Modelling
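Level-driven elements and their Petri net equivalents
[Figure: NOT gate (input x, output y) and NAND gate (inputs x and y, output z), each shown with a Petri net equivalent whose elements are labelled with signal values such as x=0, x=1, y=0, y=1, z=0 and z=1]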
61Event-driven circuit example
- Insert the eps file for fast fwd pipeline cell control
62Level-driven circuit example
- Insert the eps file for the example with two inverters and OR gate
63Properties analysed
- Functional correctness (need to model environment)
- Deadlocks
- Hazards
- Timing constraints
  - Absolute (need for Time(d) Petri nets)
  - Relative (compose with a PN model of order conditions)
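Deadlock analysis can be sketched as a search over reachable markings. The sketch below assumes a 1-safe net, so a marking is just the set of marked places; the function name and the three-transition example net are hypothetical, and real tools use far more compact state representations than this explicit enumeration.

```python
from collections import deque


def reachable_deadlocks(transitions, initial):
    """Explore a 1-safe net breadth-first.

    transitions: name -> (set of input places, set of output places).
    Returns the set of reachable markings in which no transition is enabled.
    """
    start = frozenset(initial)
    seen = {start}
    queue = deque([start])
    deadlocks = set()
    while queue:
        marking = queue.popleft()
        successors = [
            (marking - pre) | post
            for pre, post in transitions.values()
            if pre <= marking  # enabled: every input place is marked
        ]
        if not successors:
            deadlocks.add(marking)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return deadlocks


# Hypothetical controller: 'halt' leads to a marking with nothing enabled.
controller = {
    "go": ({"idle"}, {"busy"}),
    "done": ({"busy"}, {"idle"}),
    "halt": ({"idle"}, {"stopped"}),
}
print(reachable_deadlocks(controller, {"idle"}))  # {frozenset({'stopped'})}
```

Whether such a marking is an error depends on the specification, which is why the environment has to be part of the model.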
64Adequacy of PN modelling
- Petri nets have events with atomic action semantics
- Asynchronous circuits may exhibit behaviour that does not fit within this domain, due to inertia
[Figure: state graph over the codes 00, 10, 01 and 11, with transitions labelled a and b]
65Other modelling examples
- Examples with mixed event and level based signalling:
  - Lazy token ring arbiter spec
  - RGD arbiter with mutex
66Lazy ring adaptor
[Figure: ring adaptor block with ports Lr, La, Rr, Ra, R, G and D, and its Petri net model with dummy (dum) transitions and token places t=0 and t=1]
Cases shown:
- t=0 (token isn't initially here)
- t=0->1->0 (token must be taken from the right and passed to the left)
- t=1 (token is already here)
- t=0->1 (token must be taken from the right)
- t=1 (token is here)
72Synthesis.Outline
- Abstract synthesis of LPNs from transition systems and characteristic trace specifications
- Handshake and signal refinement (LPN-to-STG)
- Direct translation of LPNs and STGs to circuits
- Examples
- Logic synthesis from STGs
- Examples
73Synthesis from trace specs
- Modelling behaviour in terms of characteristic predicates on traces (produce LPN snippets)
- Construction of LPNs as compositions of snippets
- Examples: n-place buffer, 2-way merge
74Synthesis from transition systems
- Modelling behaviour in terms of a sequential capture: a transition system (TS)
- Synthesis of LPN (distributed and concurrent object) from TS (using theory of regions)
- Examples: one-place buffer, counterflow pp
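The basic notion behind region-based synthesis can be sketched as follows. A set of states r is a region if every event crosses the border of r uniformly: all transitions labelled with that event enter r, or all exit r, or none cross. Regions then become candidate places of the net. The function name, the state names and the two small transition systems below are illustrative assumptions.

```python
def is_region(state_subset, transitions):
    """Check the region condition for a set of states.

    transitions: iterable of (source, event, target) triples.
    """
    r = set(state_subset)
    kinds = {}  # event -> set of crossing kinds seen for that event
    for source, event, target in transitions:
        if source not in r and target in r:
            kind = "enter"
        elif source in r and target not in r:
            kind = "exit"
        else:
            kind = "no-cross"
        kinds.setdefault(event, set()).add(kind)
    return all(len(seen) == 1 for seen in kinds.values())


# One-place buffer: 'in' fills it, 'out' empties it.
buffer_ts = [("empty", "in", "full"), ("full", "out", "empty")]
print(is_region({"full"}, buffer_ts))  # True: 'in' enters, 'out' exits

# Two concurrent events a and b form a diamond of four states.
diamond = [("s0", "a", "s1"), ("s0", "b", "s2"), ("s1", "b", "s3"), ("s2", "a", "s3")]
print(is_region({"s1", "s3"}, diamond))  # True: the states reached after a
print(is_region({"s1"}, diamond))        # False: a enters once, stays outside once
```

In the buffer example the region {full} corresponds to a place that is marked exactly when the buffer holds an item.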
75Synthesis from process-based languages
- Modelling behaviour in terms of process(-algebraic) specifications (CSP, ...)
- Synthesis of LPN (concurrent object with explicit causality) from a process-based model (concurrency is explicit but causality implicit)
- Examples: modulo-N counter
76Refinement at the LPN level
- Examples of refinements, and introduction of silent events
- Handshake refinement
- Signalling protocol refinement (return-to-zero versus NRZ)
- Arbitration refinement
- Brief comment on what is implemented in Petrify and what isn't yet
77Translation of LPNs to circuits
- Examples of refinements, and introduction of silent events
78Why direct translation?
- Logic synthesis has problems with state space explosion, repetitive and regular structures (log-based encoding approach)
- Direct translation has linear complexity but can be area inefficient (inherent one-hot encoding)
- What about performance?
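The area trade-off between the two encodings can be illustrated with simple arithmetic. The sketch assumes a plain FSM with n states, where one-hot encoding uses one state bit per state and a log-based (binary) encoding uses about log2(n) bits; the function name is hypothetical and the count ignores the next-state logic, which is where log-based encodings pay their price.

```python
def state_bits(num_states):
    """Return (one_hot_bits, binary_bits) for an FSM with num_states states."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    one_hot = num_states
    binary = max(1, (num_states - 1).bit_length())  # ceil(log2(n)), at least 1
    return one_hot, binary


for n in (6, 8, 9):
    print(n, state_bits(n))
# 6 (6, 3)
# 8 (8, 3)
# 9 (9, 4)
```

The storage cost of one-hot encoding grows linearly with the specification, which matches the linear complexity of direct translation.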
79Direct Translation of Petri Nets
- Previous work dates back to the 70s
- Synthesis into event-based (2-phase) circuits (similar to micropipeline control)
  - S. Patil, F. Furtek (MIT)
- Synthesis into level-based (4-phase) circuits (similar to synthesis from one-hot encoded FSMs)
  - R. David ('69, translation of FSM graphs to CUSA cells)
  - L. Hollaar ('82, translation from parallel flowcharts)
  - V. Varshavsky et al. ('90, '96, translation from PN into an interconnection of David Cells)
80Synthesis into event-based circuits
- Patil's translation method for simple PNs
- Furtek's extension to 1-safe nets
- Pragmatic extensions to Patil's set (for non-simple PNs)
- Examples: modulo-N counter, Lazy ring adapter
81Synthesis into level-based circuits
- David's method for FSMs
- Hollaar's extensions to parallel flow charts
- Varshavsky's method for 1-safe Petri nets
- Examples: counter, VME bus, butterfly circuit
82David's original approach
[Figure: fragment of a flow graph with states a, b, c and d and conditions x1 and x2, alongside the CUSA cell for storing state b, with signals ya, yb and yc]
83Hollaar's approach
[Figure: fragment of a flow-chart with nodes K, L, M and N and operations A and B, alongside the corresponding one-hot circuit cell; three animation steps show the signal values changing]
86Varshavsky's Approach
[Figure: places p1 and p2 around a controlled operation, with the corresponding circuit cells and a connection labelled "To Operation"; three animation steps show signal values switching (0->1, 1->0, 1->0->1)]
89Translation in brief
This method has been used for designing the control of a token ring adaptor [Yakovlev et al., Async. Design Methods, 1995]. The size of the control was about 80 David Cells with 50 controlled handshakes.
90Direct translation examples
- In this work we tried direct translation:
  - From an STG-refined specification (VME bus controller)
    - Worse than logic synthesis
  - From a largish abstract specification with a high degree of repetition (mod-6 counter)
    - Considerable gain over logic synthesis
  - From a small concurrent specification with dense coding space (butterfly circuit)
    - Similar to or better than logic synthesis
91Example 1: VME bus controller
Result of direct translation (DC unoptimised)
92VME bus controller
After DC-optimisation (in the style of Varshavsky et al., WODES'96)
93David Cell library
94Data path control logic
Example of interface with a handshake control (DTACK, DSR/DSW)
95Example 2: Flat mod-6 Counter
- TE-like Specification:
  - ((p? q!)^5 p? c!)
- Petri net (5-safe)
[Figure: Petri net with transitions p?, q! and c!; the value 5 annotates two of its elements]
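The counter specification can be read as: answer each of the first five input events p? with q!, and the sixth with c!. The sketch below models that reading and additionally assumes that the pattern repeats cyclically, which the extracted expression does not state explicitly; the function name is hypothetical.

```python
def mod6_counter(num_inputs):
    """Return the output event produced for each of num_inputs p? events.

    Assumes the pattern (p? q!) x 5 followed by p? c! repeats forever.
    """
    outputs = []
    for i in range(num_inputs):
        outputs.append("c!" if i % 6 == 5 else "q!")
    return outputs


print(mod6_counter(7))  # ['q!', 'q!', 'q!', 'q!', 'q!', 'c!', 'q!']
```

A flat Petri net for this behaviour has to remember how many p? events have been seen, which is why the specification net is 5-safe rather than 1-safe.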
96Flat mod-6 Counter
Refined (by hand) and optimised (by Petrify) Petri net
97Flat mod-6 counter
Result of direct translation (optimised by hand)
98David Cells and Timed circuits
(a) Speed-independent
(b) With Relative Timing
99Flat mod-6 counter
(a) speed-independent
(b) with relative timing
100Butterfly circuit
[Figure: Initial Specification and STG after CSC resolution, with transitions on signals a, b, x, y and z]
101Butterfly circuit
Speed-independent logic synthesis solution
102Butterfly circuit
Speed-independent DC-circuit
103Butterfly circuit
DC-circuit with aggressive relative timing