Title: Esterel tutorial
1. Esterel tutorial
- Mike Kishinevsky (Intel)
- Gérard Berry (Esterel Technologies)
- Satnam Singh (Microsoft)
UPC, July 5, 2005
2. Outline
- Esterel basics
- Hardware and software compilation
- Verification
- Late design changes (ECO)
3. Synchronous languages approach
- Time advances in lock step with one or more clocks
  - Abstraction of synchronous hardware and discrete control software
- Deterministic concurrency
  - Concurrent processes always end up in a unique fixpoint state
  - Explicit, well-controlled non-determinism is allowed for modeling needs
- Reactive
  - No input changes within a cycle => no internal and output changes
  - unless receivers look into the past or emitters emit to the future
- Safety: correct-by-construction implementation that can be checked
  - Convince customers, designers, certification authorities of safety
- Solid mathematical foundation
  - Support formal reasoning, verification
- Reviews: Proceedings of the IEEE, Sept. 1991 and Jan. 2003
4. Behavior of a synchronous system
- Cycle based: read inputs, compute reaction, produce outputs
- Synchronous: within the same cycle, propagate control and propagate signals
5. Delay models
- Synchronous languages: zero delay
  - Esterel, Lustre, Argos, SyncCharts, Signal, PBS, etc.
  - Behavioral determinism
  - Choose the right order for dependent actions
  - Nice algebra => useful idealization
Will not discuss today
- Asynchronous languages: arbitrary delay
  - Petri Nets, CSP, Occam, Internet, etc.
  - Behavioral non-determinism
  - Determinism for sub-classes (e.g. delay-insensitive, speed-independent)
  - More complex than synchrony
- Real computing and communication: some delay
  - Any implementation has some inertia and cost
  - Internal non-determinism is unavoidable
  - but it does not imply external non-determinism (e.g. RTL logic)
6. Zero delay example: Newtonian mechanics
Concurrency, determinism; calculations are feasible
7. Predictable delay examples: sound, light, waves
- Wait long enough: same result as 0-delay!
- Zero delay and predictable delay are fully compatible
- Constructive semantics is the unification
  - A theory of causality for reactive systems
  - Clocked digital circuits paradigm
8. Synchronous Reactive Systems
[Diagram labels: control (signals in, signals out), data (values in, values out); Esterel v7]
9. Signals
- Two possible states of a signal during a clock cycle
  - present: emitted by somebody (encoded by 1)
  - absent: otherwise (encoded by 0)
- Signal format: <present_bit> <value_of_type>
  - present_bit is reactive (does not keep its value during the next cycle)
  - value is persistent (carries over to the next cycle)
- Signal types
  - Pure (no value): control
  - Value only (no present bit): data
  - Valued: control bit (like a valid bit) and data
- Signal location: input, output, input-output, local
  - Full support for scoping of local signals
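The difference between the reactive present bit and the persistent value can be sketched in a few lines of Python. This is only an illustration of the model described on the slide; the class name `Signal` and its methods are hypothetical and are not part of any Esterel tool.

```python
class Signal:
    """A valued signal: a reactive present bit plus a persistent value."""

    def __init__(self, initial=None):
        self.present = False   # reactive: cleared at the start of every cycle
        self.value = initial   # persistent: kept until the next emission

    def emit(self, value=None):
        self.present = True
        if value is not None:
            self.value = value

    def new_cycle(self):
        # Only the present bit is cleared; the value carries over.
        self.present = False


addr = Signal()
addr.emit(0x40)                       # cycle 1: Addr is emitted with a value
cycle1 = (addr.present, addr.value)
addr.new_cycle()                      # cycle 2: nobody emits Addr
cycle2 = (addr.present, addr.value)
print(cycle1, cycle2)                 # (True, 64) (False, 64)
```

In the second cycle the signal is absent, yet reading its value still returns the last emitted data.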
10. Some Esterel statements
- Combinational
  - emit S
  - if S then ... else ... end
  - loop
- Sequential
  - pause
  - await S
  - sustain S
- Control flow
  - Sequence (;)
  - Concurrency (||)
  - abort
  - if
  - loop
- Data flow expressions
  - ?A < 0
  - ?B ?C
  - call P()
  - ?D f()
11-15. Sequencing
emit A; emit B; pause; emit C
- First cycle: A and B are emitted
- pause: wait for a cycle
- Second cycle: C is emitted
16-20. Looping
loop emit A; emit B; pause; emit C end loop
- Loop back in the same cycle
- Non-instantaneous body
- Loop invariant: cannot reenter if the body still executes
[Timing diagram: A and B in the first cycle; C, A and B in every later cycle]
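The sequencing and looping traces can be reproduced with a small Python sketch in which each `yield` of a generator plays the role of `pause`. This is an informal model under that assumption, not the Esterel semantics itself; the helper names `run`, `sequence` and `looping` are made up for the illustration.

```python
def run(program, cycles):
    """Run a generator-based process; each `yield` models Esterel's `pause`."""
    emitted = set()
    proc = program(emitted)
    trace = []
    for _ in range(cycles):
        emitted.clear()
        try:
            next(proc)          # execute until the next pause
        except StopIteration:
            pass                # the process has terminated
        trace.append(sorted(emitted))
    return trace


def sequence(emitted):
    # emit A; emit B; pause; emit C
    emitted.update({"A", "B"})
    yield
    emitted.add("C")


def looping(emitted):
    # loop emit A; emit B; pause; emit C end loop
    while True:
        emitted.update({"A", "B"})
        yield
        emitted.add("C")        # then loops back within the same cycle


seq_trace = run(sequence, 3)
loop_trace = run(looping, 3)
print(seq_trace)   # [['A', 'B'], ['C'], []]
print(loop_trace)  # [['A', 'B'], ['A', 'B', 'C'], ['A', 'B', 'C']]
```

The loop trace shows C, A and B sharing a cycle from the second cycle onward, because the loop restarts its body in the same cycle in which the body ends.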
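21. Decision
emit A; emit B; pause;
loop
  if C then emit D else emit Q end if;
  if E then emit F end if;
  pause
end loop
[Timing diagram: A and B in the first cycle; in each later cycle D if C is present and Q otherwise, plus F if E is present]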
22. Concurrency
[ await A; emit C || await B; emit D ]; emit E
- Start parallel statements in the same cycle
- Terminate parallel block once all branches terminated
[Timing diagrams: A then B gives C, then D and E; B then A gives D, then C and E; A and B together give C, D and E]
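The three timing diagrams can be checked with a small Python sketch, again modeling `pause` by `yield`. It is a simplification that assumes the branches only read inputs and never each other's emissions, so no fixpoint computation is needed; `await_`, `branch`, `par` and `simulate` are hypothetical helper names.

```python
def await_(sig, present):
    """Non-immediate await: ignore the starting cycle, then wait for sig."""
    yield
    while sig not in present:
        yield


def branch(sig, out, present):
    yield from await_(sig, present)
    present.add(out)            # emit in the cycle where sig arrived


def par(*branches):
    """Run all branches in every cycle; finish when the last one finishes."""
    live = list(branches)
    while True:
        still_running = []
        for b in live:
            try:
                next(b)
                still_running.append(b)
            except StopIteration:
                pass
        live = still_running
        if not live:
            return
        yield                   # wait for the next cycle


def program(present):
    # [ await A; emit C || await B; emit D ]; emit E
    yield from par(branch("A", "C", present), branch("B", "D", present))
    present.add("E")


def simulate(inputs_per_cycle):
    present = set()
    proc = program(present)
    trace = []
    for inputs in inputs_per_cycle:
        present.clear()
        present.update(inputs)
        try:
            next(proc)
        except StopIteration:
            pass
        trace.append(sorted(present - set(inputs)))
    return trace


print(simulate([[], ["A"], ["B"]]))   # [[], ['C'], ['D', 'E']]
print(simulate([[], ["B"], ["A"]]))   # [[], ['D'], ['C', 'E']]
print(simulate([[], ["A", "B"]]))     # [[], ['C', 'D', 'E']]
```

E is emitted in the cycle in which the slower branch finishes, whichever branch that is.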
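23. Preemption
abort pause; pause; emit A when B; emit C
- Normal termination: B never occurs, A and C are emitted in the third cycle
- Aborted termination: B occurs in the second cycle, C is emitted in that cycle
- Aborted termination, emit A preempted: B occurs in the third cycle, only C is emitted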
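24. When to react?
await A; emit B
await immediate A; emit B
[Timing diagrams: with await A, an A in the starting cycle is ignored and B follows the next A; with await immediate A, B is emitted in the starting cycle if A is present]
- The non-immediate (default) form does not react to signals that arrive during the initial instant (before the first tick)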
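The two forms differ only in whether the starting cycle is tested. A minimal Python sketch, with `yield` standing for `pause` and the helper names `await_then_emit` and `simulate` chosen for the illustration:

```python
def await_then_emit(sig, out, present, immediate=False):
    if not immediate:
        yield                   # default form: skip the starting cycle
    while sig not in present:
        yield
    present.add(out)


def simulate(inputs_per_cycle, immediate):
    present = set()
    proc = await_then_emit("A", "B", present, immediate)
    trace = []
    for inputs in inputs_per_cycle:
        present.clear()
        present.update(inputs)
        try:
            next(proc)
        except StopIteration:
            pass
        trace.append("B" in present)
    return trace


a_in_first_and_third = [["A"], [], ["A"]]
print(simulate(a_in_first_and_third, immediate=False))  # [False, False, True]
print(simulate(a_in_first_and_third, immediate=True))   # [True, False, False]
```

With A present in the first and third cycles, the default form emits B in the third cycle, while the immediate form emits it right away.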
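25. When to kill?
abort pause; emit A; pause; emit B when C; emit D
weak abort pause; emit A; pause; emit B when C; emit D
[Timing diagrams: with abort, C in the second cycle gives only D, and C in the third cycle gives A and then only D; with weak abort, C in the second cycle gives A and D, and C in the third cycle gives A and then B and D]
- Strong abort (default) kills all emissions during the abort cycle
- Weak abort lets the body perform its emissions in the abort cycle (its "last will")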
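The difference between the two abort flavors is whether the body still runs in the cycle where the abort condition holds. The sketch below models that in Python with generators; it assumes C is an input that the body does not emit itself, and `body`, `abort`, `program` and `simulate` are hypothetical names.

```python
def body(present):
    # pause; emit A; pause; emit B
    yield
    present.add("A")
    yield
    present.add("B")


def abort(proc, sig, present, weak=False):
    """Delayed abort: sig is not tested in the starting cycle."""
    started = False
    while True:
        triggered = started and sig in present
        if triggered and not weak:
            return              # strong: the body does not run this cycle
        try:
            next(proc)
        except StopIteration:
            return              # the body terminated normally
        if triggered:
            return              # weak: the body ran one last time
        started = True
        yield


def program(present, weak):
    yield from abort(body(present), "C", present, weak)
    present.add("D")


def simulate(inputs_per_cycle, weak):
    present = set()
    proc = program(present, weak)
    trace = []
    for inputs in inputs_per_cycle:
        present.clear()
        present.update(inputs)
        try:
            next(proc)
        except StopIteration:
            pass
        trace.append(sorted(present - set(inputs)))
    return trace


c_in_second_cycle = [[], ["C"], []]
print(simulate(c_in_second_cycle, weak=False))  # [[], ['D'], []]
print(simulate(c_in_second_cycle, weak=True))   # [[], ['A', 'D'], []]
```

In both cases D is emitted in the abort cycle; only the weak form lets A through first.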
26. Four (react, kill) possibilities
                        Kill P: next                     Kill P: now
React to A: next        weak abort P when A              abort P when A
React to A: now         weak abort P when immediate A    abort P when immediate A
27. Esterel more concise than Verilog
loop
  await
    case icu_miss do
      if (not cacheable) then
        await (normal_ack or error_ack)
      else
        abort
          await 4 normal_ack
        when error_ack end
      end
    case (pcsu_powerdown and not jmp_e and not valid_diag_window) do
      await (pcsu_powerdown and not jmp_e)
  end end;
  pause
end loop
Example from S. Edwards
28. Esterel more concise than Verilog
Write to memory as soon as Addr and Data have
arrived. Wait for memory Latency before
iterating. Restart behavior each Replay.
29. Esterel more concise than Verilog
Write to memory as soon as Addr and Data have arrived.
Verilog: explicit FSM
Esterel: write things once
[ await Addr || await Data ];
emit Write(funcW(?Addr, ?Data))
[FSM diagram: transitions labeled A, D, "A, D / W()", "A / W()", "D / W()"]
30. Esterel more concise than Verilog
Write to memory as soon as Addr and Data have arrived. Wait for memory Latency before iterating.
Esterel: write things once
Verilog: explicit FSM
loop
  [ await Addr || await Data ];
  emit Write(funcW(?Addr, ?Data));
  await Latency tick
end loop
[FSM diagram: transitions labeled A, D, "A, D / W()", "A / W()", "D / W()", plus a latency counter with labels "L = 0", "X = L-1", "X = 0", "X > 0 / X = X-1"]
31. Esterel more concise than Verilog
Write to memory as soon as Addr and Data have arrived. Wait for memory Latency before iterating. Restart behavior each Replay.
Verilog: explicit FSM
Esterel: write things once
loop
  abort
    [ await Addr || await Data ];
    emit Write(funcW(?Addr, ?Data));
    await Latency tick
  when Replay
end loop
[FSM diagram: as on the previous slide, with an additional R (Replay) transition out of every state and labels "L = 0 or R", "X = L-1", "X = 0 or R", "X > 0 / X = X-1"]
32. SyncCharts: graphical Esterel
SyncChart (C. André)
Esterel code:
loop
  [ await A || await B ];
  emit O
each R
Implemented in Esterel Studio
33. Extensions in Esterel v7 language
Goal: remove the limitations of Esterel v5; much more expressive, but very same semantics
- Mix of Esterel imperative and Lustre equational styles
- Better modularity, (mild) object orientation
  - data, interface, and module units; data and interface inheritance
- Structured ports, arrays, more signal kinds
  - value, temp, registered, etc.
- Static code replication (for ... dopar)
- Support for Moore machines
- Numerical encodings
  - binary, onehot, Gray, etc.
- Multi-clock, clock-gating
- 100% synthesizable to RTL/C/SystemC, modular optimization
36. [Block diagram labels: 405 PPC with I-Cache PLB and D-Cache PLB; CoreConnect Processor Local Bus (PLB) Arbiter; ZBT SSRAM, SDRAM and DDR SDRAM controllers with their memories; External Bus Controller; ROM; High-Speed Peripheral; OPB Bridges; CoreConnect OPB (On-Chip Peripheral Bus); On-Chip Peripherals]
38. Esterel Studio
39. Code generation
Esterel design
- VHDL, Verilog -> hardware implementation
- C -> software implementation
uart.c: void uart_device_driver () .....
40. Serial ATA
- New standard for inside-the-box storage connection with cable length < 1 m
- 100% SW compatible drop-in replacement for ATA with additional capabilities (hot plug)
- Fast low-voltage differential signaling with 8b/10b encoding
  - 1.5 Gbps -> 3.0 Gbps -> 6.0 Gbps
- Star topology (point-to-point, no hubs)
- Cost competitive with parallel ATA
- Long-term scalable solution
41. Serial ATA Architecture
SATA Host Controller: SW Interface, Task File, Transport, Link, PHY
- Transport: translates taskfile accesses to sequences of interface operations.
- Link: manages interface operations including transmission/reception of frames.
- PHY: transmits/receives serial signal and converts to/from digital.
42. Esterel: hierarchical managing of complexity
43. How Esterel is different from RTL
[Diagram: state machine with states PMACK and PMWAIT, output pmack, connected to a JK-flop and a mod-7 reset counter through the signals set, clear, adv and suff; transition labeled "adv suff ! phy_ready / clear"]
- Explicit communication via three signals
- Hard to ensure proper, timely reset of the data path
  - forgotten if not phy_ready at the last counting cycle
44. The same spec in Esterel is correct-by-construction
/* PMACK state */
abort
  sustain pmack
when
  case not phy_ready
  case 7 adv
end abort
/* PMWAIT state */
[SyncChart: state PMACK (/pmack) with prioritized exits <1> "not phy_ready" and <2> "7 adv"; state PMWAIT]
- No explicit communication
  - Compiler does the job, not designer
- Behaviors and signals have local scope
  - If (not phy_ready) then automatic correct reset of counting data path
- Sequential events and actions can be embedded into control (e.g. await 7 adv)
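45. Verification by simulation
46. Verification with Observers
[Diagram: Inputs feed the observed system (system model), which produces Outputs; an Observer watches it and may emit BUG; a Verifier reports one of the following]
- BUG is never emitted
- BUG is possibly emitted
- BUG is always emitted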
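The observer idea can be illustrated with a tiny explicit-state reachability check in Python. This is only a sketch: the proof engines mentioned on the next slide use BDD and SAT techniques, not this enumeration, and the toy `system`, `observer` and `broken_system` functions are invented for the example.

```python
def system(state, req):
    """Toy model: acknowledges a request one cycle later."""
    ack = state            # state remembers whether req was seen last cycle
    return req, ack        # (next state, output)


def observer(obs_state, req, ack):
    """Emits BUG if a request is not acknowledged in the following cycle."""
    bug = obs_state and not ack
    return req, bug        # (next observer state, BUG)


def bug_reachable(system, observer):
    """Explore all reachable (system, observer) states for all inputs."""
    start = (False, False)
    seen, frontier = {start}, [start]
    while frontier:
        sys_state, obs_state = frontier.pop()
        for req in (False, True):
            next_sys, ack = system(sys_state, req)
            next_obs, bug = observer(obs_state, req, ack)
            if bug:
                return True
            nxt = (next_sys, next_obs)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def broken_system(state, req):
    return req, False      # never acknowledges


print(bug_reachable(system, observer))         # False
print(bug_reachable(broken_system, observer))  # True
```

A result of `False` corresponds to "BUG is never emitted"; `True` corresponds to "BUG is possibly emitted".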
47. Verification engines
- 2 proof engines available inside Esterel Studio
  - Built-in verifier TiGeR
    - BDD technique
  - Prover Plug-in
    - SAT and numerical techniques
    - Handles control and data
48. Formal verification
Of the OPB slave interface: proving that it won't cause bus timeouts
Proven in less than 2 seconds
49. Small kernel
nothing                  - empty statement
pause                    - pause for one clock (flop)
emit S                   - emit signal S
if S then p else q end   - branch on presence of S
suspend p when S         - stall p for a cycle if S present
p ; q                    - do q after p without delay
loop p end               - iterate p forever
p || q                   - do p and q in parallel
trap T in p end          - set a trap in p
exit T                   - exit from a trap
signal S in p end        - local signal declaration
50. Bootstrapping of other statements
weak abort P when S
is equivalent to:
trap T in
  P; exit T              // terminate when P does
||
  loop
    pause;               // ignore S at first instant
    present S then exit T end
  end loop
end trap
Synthesis algorithms may choose to map statements to circuits directly (for optimization)
51. Three methods
- Esterel => FSM => encode => netlist
  - does not scale
- Esterel => netlist (syntax-directed) => optimize
  - Main method (v5 compiler)
  - Optimization: both combinational and sequential
  - Modular compilation to scale (v6, v7)
- Esterel => program graph => encode locally => netlist => optimize
  - Might give better trade-offs and still scale well
  - Columbia U. compiler (Edwards, )
52. Syntax-directed translation scheme
- Each statement p structurally corresponds to a box
  - GO: start p (first cycle)
  - RES: continue p at the next cycle from the previous state
  - SUSP: freeze for a cycle (registers keep values)
  - KILL: reset registers
  - E and E': signals received and emitted
  - SEL: at least one register set (statement alive)
  - 1-hot coded completion code
    - K0: normal termination
    - K1: pause for a cycle
    - K2, K3, ...: exit enclosing traps
[Box diagram for p: inputs E, GO, RES, SUSP, KILL; outputs E', SEL, K0, K1, K2, ...]
Exclusive relation: GO, RES, SUSP
53. Example: signal emission
emit S
[Circuit: S and K0 are driven by GO; K1 = 0; SEL = 0; RES, SUSP and KILL are unused]
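The box interface can be exercised with a small Python sketch that wires `emit S; pause; emit T` from two kinds of boxes. The `emit` box follows the slide (K1 and SEL tied to 0). The `pause` box, with one register set by GO and read by RES, is an assumption based on the commonly published form of this translation and is not shown on the slides; the function names are made up.

```python
def emit_box(go):
    """emit S: the signal and K0 follow GO; never pauses, holds no state."""
    return {"S": go, "K0": go, "K1": False, "SEL": False}


def pause_box(reg, go, res, susp, kill):
    """pause: one register. Returns (outputs, next register value)."""
    outputs = {"K0": reg and res, "K1": go, "SEL": reg}
    next_reg = (go or (susp and reg)) and not kill
    return outputs, next_reg


def step(reg, go):
    """One clock cycle of `emit S; pause; emit T` (sequence wires K0 to GO)."""
    first = emit_box(go)
    mid, next_reg = pause_box(reg, first["K0"], res=reg, susp=False, kill=False)
    last = emit_box(mid["K0"])
    return {"S": first["S"], "T": last["S"], "K0": last["K0"]}, next_reg


reg = False
out1, reg = step(reg, go=True)    # first cycle: start the program
out2, reg = step(reg, go=False)   # second cycle: resume from the pause
print(out1)  # {'S': True, 'T': False, 'K0': False}
print(out2)  # {'S': False, 'T': True, 'K0': True}
```

The only state is the pause register, which matches the "one register per explicit delay" view used later in the deck.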
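54. Example: the parallel
[Circuit: GO, RES, SUSP, KILL and E are distributed to both branches; the branch completion codes K are combined by a max block; the branch SEL outputs are combined into SEL; E' collects the emitted signals]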
55-59. Syntax-directed translation by example
loop
  abort
    [ await Addr || await Data ];
    call Write(?Addr, ?Data);
    await Latency tick
  when Replay
end loop
[Circuit built up step by step: the await Addr box (GO, RES, SEL, K0); the await Data box in parallel; the Write() call; the latency counter labeled "C = Latency" and "DSZ C"; finally the Replay abort and the loop closing the circuit]
60. Sequential optimization scheme
- Build the netlist
  - good start, but too fat
- Remove redundant registers
  - not too many, just the fat
  - syntactic don't care (group-hot)
  - reachable states
- Optimize the logic
  - depth for hardware
  - area for software
Implementation: SIS 1.3, TiGeR
61. Automata to circuits, bad solution 1: one-hot encoding
Lots of registers, lots of gates...
62. Automata to circuits, bad solution 2: dense encoding
- for n states, log(n) registers are enough
- but combinational logic needs to encode / decode states
  - 2^log(n) = n combinational gates worst case
  - very sensitive to the actual encoding of states
  - only n! permutations to check... no good heuristics.
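The register counts of the two encodings, and the size of the search space for a good dense encoding, are easy to tabulate. A short Python sketch (function names are illustrative only):

```python
from math import factorial


def one_hot_registers(n_states):
    return n_states                         # one register per state


def dense_registers(n_states):
    # ceil(log2(n)) computed exactly on integers
    return max(1, (n_states - 1).bit_length())


def candidate_encodings(n_states):
    return factorial(n_states)              # state-to-code permutations


table = [(n, one_hot_registers(n), dense_registers(n)) for n in (4, 16, 1000)]
print(table)                    # [(4, 4, 2), (16, 16, 4), (1000, 1000, 10)]
print(candidate_encodings(16))
```

Even for 16 states the number of permutations is far too large to try exhaustively, which is why the slide calls for heuristics.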
63. [Diagram: sequential circuit with inputs I, outputs O, combinational logic and state registers R]
64. Group-hot state encoding
Group-hot structural encoding: one register per explicit delay. Concurrent threads: independent codes.
1-hot: 4 bits; log: 2 bits; structural: 3 bits (scales best).
loop
  [ await A || await B ];
  emit O
each R
[State diagram: states labeled with the 3-bit codes 110, 010, 100 and 001]
65. Sequential optimization algorithms
- Detect registers that are always equal or opposite
- Detect registers that are functions of other registers for all reachable states => logic
- Multiplex registers that are exclusive over time
On large hierarchical FSMs: very good register / logic ratio
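The first bullet can be sketched directly once the reachable states are known. The Python below scans an explicit list of reachable register valuations; a real optimizer would work on a symbolic (for example BDD) representation instead, and the example state set is invented.

```python
from itertools import combinations


def redundant_pairs(reachable_states):
    """Find register pairs that are always equal or always opposite.

    reachable_states: iterable of equal-length tuples of 0/1 register values.
    """
    states = list(reachable_states)
    width = len(states[0])
    found = []
    for i, j in combinations(range(width), 2):
        if all(s[i] == s[j] for s in states):
            found.append((i, j, "equal"))
        elif all(s[i] != s[j] for s in states):
            found.append((i, j, "opposite"))
    return found


reachable = [(0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1)]
pairs = redundant_pairs(reachable)
print(pairs)  # [(0, 1, 'equal'), (0, 2, 'opposite'), (1, 2, 'opposite')]
```

Here registers 1 and 2 can both be replaced by register 0 or its negation, leaving two registers instead of four.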
66. Hierarchical encoding translation
Columbia University compiler
67. Software compilers
- Automata based
  - Esterel program specifies a finite state machine
  - Code this FSM in C
  - Esterel V3 compiler (INRIA/CMA, 1992)
  - The fastest code, but does not scale
- Netlist based
  - Esterel program can be mapped to a logic netlist
  - Sort the netlist and print it as C code: one cycle of computation requires one pass through the netlist
  - Esterel v5 compiler
  - Scales well (linear in the size of the program), but relatively slow since it computes all equations even if not needed in a cycle
  - V7 extends this to arrays, with mapping to for loops
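The netlist-based scheme amounts to topologically sorting the equations once and then evaluating all of them in that order every cycle. A Python sketch of the idea follows (the real compilers emit C); the three-equation netlist and its net names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each net maps to (nets it depends on, function of the current values).
NETLIST = {
    "go_write": (["addr_seen", "data_seen"],
                 lambda v: v["addr_seen"] and v["data_seen"]),
    "addr_seen": (["Addr", "addr_reg"], lambda v: v["Addr"] or v["addr_reg"]),
    "data_seen": (["Data", "data_reg"], lambda v: v["Data"] or v["data_reg"]),
}


def schedule(netlist):
    """Order the equations so that every net is computed after its inputs."""
    graph = {net: [d for d in deps if d in netlist]
             for net, (deps, _) in netlist.items()}
    return list(TopologicalSorter(graph).static_order())


def react(netlist, order, inputs_and_registers):
    """One cycle = one pass through the sorted equations."""
    values = dict(inputs_and_registers)
    for net in order:
        values[net] = netlist[net][1](values)
    return values


ORDER = schedule(NETLIST)
result = react(NETLIST, ORDER,
               {"Addr": True, "Data": False, "addr_reg": False, "data_reg": True})
print(result["go_write"])  # True
```

Every equation is evaluated in every cycle, which is the source of both the linear scaling and the relative slowness noted above.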
68. Software compilers
- Discrete-event based
  - Partition Esterel program into basic blocks
  - Dispatch them by a fixed scheduler
  - Much faster than netlist based, since it runs only those blocks that contribute to a cycle computation
  - SAXO-RT [Weil et al. 2000]
- Program dependency graph based
  - Translate Esterel program to a concurrent control-flow graph
  - Analyze static data dependencies and schedule
  - Generate sequential CFG based on the schedule and translate to C
  - Somewhat faster than discrete-event, but could not handle false combinational cycles [Edwards, 2000, Synopsys]
  - [Potop-Butucaru, 2003]: optimized, based on a different internal representation and static analysis
69. Challenges of implementing v7
- Array / replication handling: 2 modes
  - compile-time expansion to element level (bit, int, etc.)
    - expensive, but can be deeply optimized
  - no expansion, generates arrays and loops in C/HDL
    - much more tricky, but keeps object code size linear
- Numbers / bitvectors handling (ongoing)
  - all arithmetic operations fully exact, no bits dropped
  - arbitrary precision, full control on implementation size
  - multiple numbering systems (binary, onehot, Gray, user-defined), semantics encoding-independent
- Completely identical behavior in C and HDL
70. Cyclic circuit analysis
1. Characterization of delay-independent stabilization
   - Fixpoints in 3-valued logic (based on Seger .)
2. Algorithms to check for stabilization
   - BDD-based (can be expensive)
   - can yield an equivalent acyclic circuit, which can be exponentially bigger (worst case)
   - implemented in Esterel v5
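The 3-valued fixpoint can be shown on a two-gate cyclic circuit. In the Python sketch below `None` stands for "not yet determined"; a net stabilizes when it reaches True or False. The sketch checks one input valuation at a time by iteration, whereas the BDD-based algorithm on the slide handles all inputs symbolically; the example circuits are invented.

```python
def t_not(a):
    return None if a is None else not a


def t_and(a, b):
    if a is False or b is False:
        return False            # a 0 input decides the output
    if a is None or b is None:
        return None             # still unknown
    return True


def stabilize(equations, inputs):
    """Least fixpoint in 3-valued logic; None means 'not yet determined'."""
    values = dict(inputs)
    values.update({net: None for net in equations})
    changed = True
    while changed:
        changed = False
        for net, fn in equations.items():
            new = fn(values)
            if new is not None and values[net] is None:
                values[net] = new
                changed = True
    return values


# A cyclic but well-behaved circuit: x = a and y, y = (not a) and x
cyclic = {
    "x": lambda v: t_and(v["a"], v["y"]),
    "y": lambda v: t_and(t_not(v["a"]), v["x"]),
}
# A genuinely unstable loop: x = not x
unstable = {"x": lambda v: t_not(v["x"])}

for a in (False, True):
    print(a, stabilize(cyclic, {"a": a}))
print(stabilize(unstable, {}))   # {'x': None}
```

The first circuit stabilizes for both values of `a` even though it contains a loop; the second never leaves the undetermined value and would be rejected.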
71. Industrial applications
Esterel:
- Dassault Aviation (avionics)
- Cadence (codesign)
- Synopsys (circuit synthesis)
- Motorola, Marelli (automobile)
- Thomson (protocols)
- AT&T (switching software)
- Texas Instruments (chip design)
Lustre, SCADE, Signal:
- Aérospatiale, SAAB (avionics)
- Schneider Electric (nuclear plants)
- Volvo, EDF, Thomson, Snecma, ...
72. ECO: late changes
- Late: after RTL signoff
- Reason: last-minute bug fixes or spec changes
- Changes must be done at the gate-level netlist
- High-level formalisms and tools
  - fewer bugs => reduce the need for ECO
  - cannot completely suppress the need for ECO => netlist changes are harder since RTL is machine generated
73. Holy Grail: ECO compiler
- Change the spec
- Compile incrementally to a gate netlist
- Guarantee (close to) minimal changes
- HARD
  - must go across long tool chain
  - Esterel compiler, seq and comb logic optimization, technology mapping
74. Real life: ECO assistant
- Help in patching the generated netlist
  - Where to patch?
  - How to patch?
  - Is patch equivalent to appropriate source change?
- Solution
  - Traceability
  - Reversible sequential optimization
  - Sequential equivalence
75. The ECO Flow
76. Traceable Circuit Generation
77.
0 0 Root (4)
1 0 Present B (7, 2)
2 0 Resume <6>
3 0 Present A (5, 4)
4 0 Pause (3) <2>
5 0 Emit X (7)
6 0 Watch 1 <0>
7 0 Emit Y (8)
8 0 Pause (7) <0>
78. HDL names vs. Esterel names
await_at_state1 A
79. ECO 2: add a transition
- Next(state) = Stay(state) or Enter(state)
- Reduce stay condition for source state
- Expand enter condition for target state
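The stay/enter decomposition makes the patch for a new transition mechanical. The Python sketch below models one-hot next-state bits for a source and a target state; all names (`in_src`, `leave_src`, `eco_cond` and so on) are hypothetical, and the pre-existing enter conditions are left out to keep the example small.

```python
def next_state_bit(stay, enter):
    # Next(state) = Stay(state) or Enter(state)
    return stay or enter


def original_next(in_src, in_dst, leave_src):
    """One-hot bits for a source and a target state before the ECO."""
    next_src = next_state_bit(stay=in_src and not leave_src, enter=False)
    next_dst = next_state_bit(stay=in_dst, enter=False)
    return next_src, next_dst


def patched_next(in_src, in_dst, leave_src, eco_cond):
    """Add a transition src -> dst taken when eco_cond holds."""
    eco_exit = in_src and eco_cond
    # Reduce the stay condition of the source state ...
    next_src = next_state_bit(
        stay=in_src and not leave_src and not eco_exit, enter=False)
    # ... and expand the enter condition of the target state.
    next_dst = next_state_bit(stay=in_dst, enter=eco_exit)
    return next_src, next_dst


print(original_next(True, False, leave_src=False))                 # (True, False)
print(patched_next(True, False, leave_src=False, eco_cond=True))   # (False, True)
print(patched_next(True, False, leave_src=False, eco_cond=False))  # (True, False)
```

When the new condition does not hold, the patched logic behaves exactly like the original.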
80. ECO 2: add a transition (reduce stay condition)
81. ECO 2: add a transition (expand enter condition)
82. ECO 2: patched netlist
83. Reversible optimization?
[Diagram: inputs I, outputs O, combinational logic, state registers R]
84. Reversible optimization: observe state
[Diagram: as above, with a second bank of state registers]
85. Reversible optimization: observe state
[Diagram: inputs I, outputs O, combinational logic, new state registers R]
86-88. Optimized circuit: discard old registers and re-optimize combinationally
[Diagram: inputs I, outputs O, combinational logic, new state registers R]
89-90. Register correspondence for ECO assistance
[Diagram: combinational logic, new state registers R]
91. Register Correspondence Table
92. ECO 2 for optimized design: exiting BAD1
- reconstruct missing reg BAD1
- reduce stay condition for all encoding bits of BAD1
variable in_BAD1 : std_logic;
variable eco_exit_BAD1 : std_logic;
in_BAD1 := eco_1 and eco_23 and eco_27;
eco_exit_BAD1 := in_BAD1 and tx_adv and prim_pmreq_det;
New_NS_eco_1 <= Old_NS_eco_1 and not eco_exit_BAD1;
New_NS_eco_23 <= Old_NS_eco_23 and not eco_exit_BAD1;
New_NS_eco_27 <= Old_NS_eco_27 and not eco_exit_BAD1;
93. ECO 2 for optimized design: entering BAD3
- reconstruct missing reg BAD3
- expand enter condition for encoding bits of BAD3
- reused bits start from previous modifications
variable BAD1_to_BAD2 : std_logic;
variable BAD1_to_IDLE : std_logic;
variable exit_BAD1 : std_logic;
-- old exit BAD1 condition
BAD1_to_BAD2 := in_BAD1 and movePrim;
BAD1_to_IDLE := in_BAD1 and phy_ready_deasserted;
exit_BAD1 := BAD1_to_BAD2 or BAD1_to_IDLE;
New_NS_eco_1 <= (Old_NS_eco_1 and not eco_exit_BAD1)
                or (eco_exit_BAD1 and not exit_BAD1);
New_NS_eco_13 <= Old_NS_eco_13
                 or (eco_exit_BAD1 and not exit_BAD1);
94. Quality of reversible optimization
[Figure: steps "seq. optimize", "discard old regs", "comb. optimize"; labels C, D, E, F]
95. ECO verification
- For control, capacity of sequential verification matches capacity of (reachability-based) sequential optimization
- For reversible optimization: sequential equivalence reduced to combinational equivalence
96. Conclusion
- ECO is one of the road-blocks in deploying HLD
- ECO assistant is implemented in Esterel Studio 5.0
  - traceability
  - reversible sequential optimization
    - little to no degradation in quality
    - sequential ECO problem becomes combinational
  - sequential equivalence
    - ensures ECO and sequential optimization correctness
- Request to academia: work on expanding capacity limits
- Core ideas applicable to other HLD flows, e.g. Bluespec, Lava for sequential ex., behavioral synthesis
- Available to universities (free academic program)