Title: Higher-Level Hardware Synthesis
1 Higher-Level Hardware Synthesis
- Richard Sharp
- Intel Research @ Cambridge
- 12th March 2003
2 talk overview
- High-Level Synthesis (HLS)
- Limitations of current HLS
- The FLaSH synthesis system
- The SAFL Hardware Description Language
- Compiling SAFL to Hardware
- Soft Scheduling
- Program Transformation Hw/Sw Co-Design
- Combining behaviour and structure
- Case Study
- Conclusions and further work
3 classical high-level synthesis (1/2)
High-level specification
while (c != 5) x x 6 c x x c
x if c<x then c 12 else c cx
High-level synthesis
module m3(in1,out,clk) always @(posedge clk)
begin t <= in1t end conv
i1_conv(t,out) endmodule
Structural hardware description
Logic synthesis, place and route, timing analysis, technology mapping etc.
4 classical high-level synthesis (2/2)
Diagram: the operations f(x), f(y), g(z) (1) pass through allocation of functional units f and g (2), binding of each operation to a unit (3), and static scheduling onto time steps t0 to t4 (4).
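The allocation, binding and static-scheduling steps can be illustrated with a small Python sketch. It is a toy model, not the algorithm of any particular HLS tool: the one-cycle latency, the assumption that all operations are independent, and the names `schedule`, `ops` and `units` are all made up for illustration.

```python
def schedule(ops, units, latency=1):
    """Statically schedule independent operations onto allocated units.

    ops:   list of (name, operator) pairs, e.g. ("f(x)", "f")
    units: allocation, mapping operator -> number of unit instances
    Returns {name: ((operator, instance), start_step)}.
    """
    # Next free time step for every allocated unit instance.
    free_at = {(op, i): 0 for op, n in units.items() for i in range(n)}
    result = {}
    for name, op in ops:
        # Binding: choose the instance of this operator that is free earliest.
        candidates = [u for u in free_at if u[0] == op]
        unit = min(candidates, key=lambda u: (free_at[u], u[1]))
        # Scheduling: the start step is fixed now, at compile time.
        start = free_at[unit]
        free_at[unit] = start + latency
        result[name] = (unit, start)
    return result


ops = [("f(x)", "f"), ("f(y)", "f"), ("g(z)", "g")]
one_f = schedule(ops, {"f": 1, "g": 1})  # f(x) and f(y) share one unit
two_f = schedule(ops, {"f": 2, "g": 1})  # a second f unit is allocated
print(one_f)
print(two_f)
```

With a single f unit the two calls to f are serialised onto consecutive time steps; allocating a second unit lets them start together, at the cost of more hardware.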
5 limitations of HLS
- Low-level structuring primitives
  - e.g. Behavioural-Verilog still has modules
- All scheduling performed statically
  - Whereas real hardware often schedules dynamically
- Black-box approach
  - HLS tools are not as smart as engineers yet
- Artificial separation of control/data-flow
- C is not a good language for hardware description
6 limitations of HLS: low-level structuring
module m1 (p1, p2, ..) <body> endmodule
module m2 (p1, p2, ..) <body> endmodule
Diagram: blocks m1 and m2.
- Inter-module interfaces coded explicitly
- Low-level details scattered through code
- Global analysis and optimisation difficult
- Global transformation difficult
- Low-level design decisions become fixed in specification. Not High-Level!
7 limitations of HLS: static scheduling (1/2)
- Operator execution times must be statically bounded
- Fine for simple arithmetic functions etc...
- ... but cannot schedule a bus transaction of data-dependent length statically
8 limitations of HLS: static scheduling (2/2)
- Contention resolved by statically serialising access to shared functional units
- Inefficient if contention is unlikely
- What if neither Task1 nor Task2 terminates?
Diagram: Task1 and Task2 both use the shared unit f; along the time axis their accesses are serialised.
Static scheduling can unnecessarily inhibit opportunities for parallelism.
9 limitations of HLS: black box approach
- HLS tools are not as good as engineers
- But tool removes all control from designer
- Design not quite right => useless
- Until technology matures, human involvement is essential
Diagram: Spec and Constraints enter a black box (?), which produces the Structural Hardware Design.
10 limitations of HLS: separating control/data
- Syntactic stratification of statements and expressions
- All data-flow realised by writing/reading variables
- Good model for software (true for machine code)
- Inappropriate model for hardware
Diagram labels: Statements, Expressions.
Compilers can remove some unnecessary dependencies, but they don't do very well in practice!
11 talk overview
- High-Level Synthesis (HLS)
- Limitations of current HLS
- The FLaSH synthesis system
- Overview
- The SAFL Hardware Description Language
- Compiling SAFL to Hardware
- Soft Scheduling
- Program Transformation Hw/Sw Co-design
- Combining behaviour and structure
- Case Study
- Conclusions and further work
12 the FLaSH synthesis system (overview)
Diagram of the FLaSH flow, with these labels:
- SAFL and Magma (Programming Language Design)
- Program Transformation (SAFL to SAFL)
- Global Analysis (architecture neutral)
- Compiler Design and Synthesis, targeting synchronous hardware or other design styles
- Architecture-specific optimisation (one stage per design style)
- Standard tools for mapping RTL-Verilog to silicon
14 the SAFL language
SAFL: Statically Allocated Functional Language
Hardware-specific properties
- Concurrent
- Statically Allocated
- Resource Aware
General properties
- Functional
- Call-by-Value
- First-order
The FLaSH Silicon Compiler translates SAFL into RTL-Verilog.
15 SAFL syntax
- e ::= i (integer constant)
- | x (variable)
- | f(e, ..., e) (user-defined function application)
- | a(e, ..., e) (primitive function application)
- | if e then e else e
- | let val x_1 = e ... val x_n = e in e end
- d ::= fun f(x_1, ..., x_n) = e
- p ::= d | p d
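The grammar above can be mirrored as a small abstract syntax tree with a reference evaluator. The following Python sketch is only an illustration of the language's shape (first-order, call-by-value); it says nothing about how FLaSH compiles to hardware. The class names, the `PRIMS` table and the choice to evaluate all `let` bindings in the enclosing environment are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Const:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Call:   # f(e, ..., e): user-defined function application
    fn: str
    args: list


@dataclass
class Prim:   # a(e, ..., e): primitive function application
    op: str
    args: list


@dataclass
class If:
    cond: object
    then: object
    other: object


@dataclass
class Let:    # let val x_1 = e ... val x_n = e in e end
    bindings: list
    body: object


@dataclass
class Fun:    # fun f(x_1, ..., x_n) = e
    name: str
    params: list
    body: object


# Hypothetical primitive set, chosen only for this example.
PRIMS = {"add": lambda a, b: a + b, "eq": lambda a, b: int(a == b)}


def evaluate(e, env, program):
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Prim):
        return PRIMS[e.op](*(evaluate(a, env, program) for a in e.args))
    if isinstance(e, If):
        branch = e.then if evaluate(e.cond, env, program) != 0 else e.other
        return evaluate(branch, env, program)
    if isinstance(e, Let):
        new_env = dict(env)
        for name, rhs in e.bindings:
            # Assumption: bindings do not see each other.
            new_env[name] = evaluate(rhs, env, program)
        return evaluate(e.body, new_env, program)
    if isinstance(e, Call):
        fn = program[e.fn]
        # Call-by-value: arguments are evaluated before the body runs.
        args = [evaluate(a, env, program) for a in e.args]
        return evaluate(fn.body, dict(zip(fn.params, args)), program)
    raise TypeError(f"not a SAFL expression: {e!r}")


program = {
    "double": Fun("double", ["x"], Prim("add", [Var("x"), Var("x")])),
    "main": Fun("main", ["x"],
                If(Prim("eq", [Var("x"), Const(0)]),
                   Const(1),
                   Let([("y", Call("double", [Var("x")]))], Var("y")))),
}
result = evaluate(Call("main", [Const(21)]), {}, program)
print(result)  # 42
```

Because the language is first-order, a function environment keyed by name is enough; there are no closures to represent.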
16 a SAFL example
fun mult(x, y, acc) =
  if (x=0 | y=0) then acc
  else mult(x<<1, y>>1, if y.bit0 then acc+x else acc)
fun cube(x) = mult(x, mult(x, x, 0), 0)
fun f(x) = mult(x,5,0) (if x>100 then mult(x,21,0) else )
Circuit structure (diagram): blocks mult, cube and f.
Points to note
- One fun => one hardware block
- SAFL captures allocation, binding and system-level structure
- Program transformation to explore resource tradeoffs
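As a software cross-check of the mult and cube definitions on this slide, here is a Python transliteration. It reads mult as a shift-and-add multiplier (test x = 0 or y = 0, add x to acc when the low bit of y is set); that reading, the use of unbounded non-negative Python integers in place of fixed-width hardware words, and the default `acc=0` are assumptions.

```python
def mult(x, y, acc=0):
    # The SAFL definition is tail-recursive, so it maps onto a loop:
    # each iteration corresponds to one recursive call.
    while not (x == 0 or y == 0):
        if y & 1:          # y.bit0
            acc += x       # uses x before it is shifted
        x, y = x << 1, y >> 1
    return acc


def cube(x):
    # Two calls to the same mult block, one nested inside the other.
    return mult(x, mult(x, x, 0), 0)


print(mult(6, 7))  # 42
print(cube(3))     # 27
```

In hardware both calls in cube would go to the single mult block, since one fun corresponds to one hardware block.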
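17 compiling SAFL to hardware (1/2)
fun f(x) = ...
fun g(x) = ... f(x) ...
fun h(x) = ... f(x) ...
Diagram: blocks g and h both connect to the shared block f, with separate Data and Control wires.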
18 compiling SAFL to hardware (2/2)
A Block Diagram of a Single fun definition
Diagram labels: Arb, Call control, Control in, Control out, Reg, fun Body, Data in, Data out, (tail) recursive calls.
19 soft scheduling (1/3): inserting arbitration
- Resolve contention dynamically
Diagram: Task 1 and Task 2 reach the Shared Resource through scheduling logic.
- Alleviates problems with static scheduling
- But inserting scheduling logic on every shared resource leads to inefficient designs
- Normally do this anyway, but in an ad-hoc manner!
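The difference between resolving contention dynamically and serialising tasks statically can be seen in a toy cycle-level model. This Python sketch is not the scheduling logic FLaSH generates: the fixed-priority arbiter, the one-cycle steps and the names `run_dynamic` and `run_serialised` are assumptions chosen to keep the example small.

```python
def run_dynamic(tasks):
    """Run tasks in parallel; an arbiter grants the shared unit to one
    requester per cycle. Each task is a list of steps: "f" uses the
    shared resource for one cycle, "-" is one cycle of local work."""
    pc = [0] * len(tasks)
    cycles = 0
    while any(pc[i] < len(t) for i, t in enumerate(tasks)):
        granted = False
        for i, t in enumerate(tasks):      # lower index = higher priority
            if pc[i] >= len(t):
                continue                   # task already finished
            if t[pc[i]] == "f":
                if granted:
                    continue               # lost arbitration: stall a cycle
                granted = True
            pc[i] += 1
        cycles += 1
    return cycles


def run_serialised(tasks):
    """Static alternative: run the tasks one after another."""
    return sum(len(t) for t in tasks)


task1 = ["-", "f", "-", "-"]
task2 = ["-", "-", "-", "f"]
print(run_dynamic([task1, task2]))     # 4: the accesses never collide
print(run_serialised([task1, task2]))  # 8
print(run_dynamic([["f"], ["f"]]))     # 2: a real conflict costs one stall
```

A task only pays for arbitration when a conflict actually occurs, which is the case static serialisation handles poorly when contention is unlikely.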
20 soft scheduling (2/3): removing arbitration
- Schedule dynamically
- Use static analysis to remove redundant scheduling logic
Diagram: Task 1, Task 2 and Task 3 around a Shared Resource.
Inspired by soft typing
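One way to picture "use static analysis to remove redundant scheduling logic" is a conservative check for which shared resources can ever be requested by two parallel branches at once. The Python sketch below is a simplified illustration and not the analysis used in FLaSH; the tuple encoding of programs (`"call"`, `"seq"`, `"par"`) and the function name `analyse` are hypothetical.

```python
def analyse(expr):
    """Return (used, contended) for a program fragment.

    expr is ("call", name), ("seq", left, right) or ("par", left, right).
    used:      resources the fragment may access
    contended: resources that two parallel branches may access together,
               and which therefore still need arbitration
    """
    if expr[0] == "call":
        return {expr[1]}, set()
    kind, left, right = expr
    left_used, left_cont = analyse(left)
    right_used, right_cont = analyse(right)
    contended = left_cont | right_cont
    if kind == "par":
        # Only branches that run in parallel can collide on a resource.
        contended |= left_used & right_used
    return left_used | right_used, contended


# f is called twice, but strictly one call after the other:
sequential = ("seq", ("call", "f"), ("call", "f"))
# f may be requested by both parallel branches; g has one caller:
parallel = ("par", ("seq", ("call", "f"), ("call", "g")), ("call", "f"))

print(analyse(sequential)[1])  # set(): no arbiter needed for f
print(analyse(parallel)[1])    # {'f'}: keep the arbiter on f only
```

Every resource outside the contended set can be accessed without scheduling logic, so arbiters are kept only where the analysis cannot rule out a conflict.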
21 soft scheduling (3/3): program transformation
fun large_block() = ((x*y) (z*x))*y (z*f(x)) p

fun mult(x,y) = x*y
fun large_block() =
  let t1 = mult(x,y)
  in let t2 = mult(z,x)
     in mult(t1 t2, y) end
  end (mult(z,f(x))) p
- Access to large_block scheduled dynamically
- Access to mult scheduled statically
- Opens up synthesis process to the designer: no longer a black box
Soft scheduling provides a single framework in which we can deal with both small- and large-scale resources.
22 talk overview
- High-Level Synthesis (HLS)
- Limitations of current HLS
- The FLaSH synthesis system
- Overview
- The SAFL Hardware Description Language
- Compiling SAFL to Hardware
- Soft Scheduling
- Program Transformation Hw/Sw Co-Design
- Combining behaviour and structure
- Case Study
- Conclusions and further work
23 Hardware/software co-design (1/2)
- SAFL can describe processors
- SAFL can describe memories
Diagram: a design made of fun f(), fun g(), fun h(), fun i(), fun j() is transformed into one made of fun f(), fun g(), fun h(), fun proc(), fun mem().
24 Hardware/software co-design (2/2)
- Specialise processors depending on code
- Synthesise a network of communicating heterogeneous processors for s/w part
- Co-design just one of a library of transformations
25 talk overview
- High-Level Synthesis (HLS)
- Limitations of current HLS
- The FLaSH synthesis system
- Overview
- The SAFL Hardware Description Language
- Compiling SAFL to Hardware
- Soft Scheduling
- Program Transformation Hw/Sw Co-Design
- Combining behaviour and structure
- Case Study
- Conclusions and further work
26 magma: a structural HDL
- Embedded in pure-functional ML
- Similar syntax and semantics (CBV) to SAFL
- Supports synthesis/simulation
- Uses ML functors to parameterise over different basis functions
- Synthesis = Static Expansion
- Only describes acyclic, combinatorial hardware
- no observable sharing problems
27 a magma example (1/3)
functor RippleAdder (B:BASIS):RP_ADD = struct
  type bit = B.bit
  fun adder (x,y,c_in) =
    (B.xorb(c_in, B.xorb(x,y)),
     B.orb( B.orb( B.andb (x,y),
                   B.andb(x,c_in)),
            B.andb(y,c_in)))
Diagram: an Adder block with inputs x, y and c_in and outputs s_out and c_out.
28 a magma example (2/3)
  fun carry_chain f _ ([],[]) = []
    | carry_chain f c_in (x::xs,y::ys) =
        let val (res_bit, c_out) = f (x,y,c_in)
        in res_bit::(carry_chain f c_out (xs,ys))
        end
  val ripple_add = carry_chain adder B.b0
Diagram: a chain of five Adder blocks with inputs x1..x5 and y1..y5, initial carry b0, and outputs s_out1..s_out5.
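The adder and carry chain above are written once against a basis B, so the same description can be run under a simulation interpretation or a synthesis interpretation. The Python sketch below imitates that idea by passing a basis object instead of applying an ML functor. The two basis classes, the wire naming, the netlist strings and the `new_bus` helper are assumptions for illustration and do not reproduce Magma's actual output format.

```python
class SimulationBasis:
    """Gates compute on bits (0 or 1)."""
    b0 = 0

    def andb(self, a, b): return a & b
    def orb(self, a, b): return a | b
    def xorb(self, a, b): return a ^ b


class SynthesisBasis:
    """Gates emit netlist entries and return the name of a fresh wire."""
    b0 = "b0"

    def __init__(self):
        self.netlist = []

    def _gate(self, kind, a, b):
        wire = f"w_{len(self.netlist) + 1}"
        self.netlist.append(f"{kind}({wire},{a},{b})")
        return wire

    def andb(self, a, b): return self._gate("and", a, b)
    def orb(self, a, b): return self._gate("or", a, b)
    def xorb(self, a, b): return self._gate("xor", a, b)

    def new_bus(self, name, width):
        return [f"{name}_{i}" for i in range(width)]


def ripple_adder(B):
    """Build ripple_add over basis B (plays the role of the functor)."""
    def adder(x, y, c_in):
        return (B.xorb(c_in, B.xorb(x, y)),
                B.orb(B.orb(B.andb(x, y), B.andb(x, c_in)),
                      B.andb(y, c_in)))

    def carry_chain(f, c_in, xs, ys):
        out = []
        for x, y in zip(xs, ys):
            res_bit, c_in = f(x, y, c_in)   # carry ripples to the next bit
            out.append(res_bit)
        return out

    return lambda xs, ys: carry_chain(adder, B.b0, xs, ys)


simulate = ripple_adder(SimulationBasis())
sim_result = simulate([1, 0, 0, 1, 1, 1], [0, 1, 1, 0, 1, 1])
print(sim_result)  # [1, 1, 1, 1, 0, 1], least significant bit first

basis = SynthesisBasis()
synthesise = ripple_adder(basis)
sum_wires = synthesise(basis.new_bus("x", 5), basis.new_bus("y", 5))
print(len(basis.netlist), sum_wires)
```

Running the description under the synthesis basis expands it statically into a flat list of gates, which is the sense in which synthesis is static expansion.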
29 a magma example (3/3)
Support for simulation and synthesis:
- structure SimulateAdder = RippleAdder (SimulationBasis)
- SimulateAdder.ripple_add ([b1,b0,b0,b1,b1,b1],[b0,b1,b1,b0,b1,b1])
val it = [b1,b1,b1,b1,b0,b1] : SimulateAdder.bit list
- structure SynthesiseAdder = RippleAdder (SynthesisBasis)
- SynthesiseAdder.ripple_add (Magma.new_bus 5, Magma.new_bus 5)
and(w_1,w_45,w_46) and(w_2,w_1,w_44) ... and(w_149,w_55,w_103)
val it = ["w_149","w_150","w_151","w_152","w_153"]
30 integrating SAFL and magma (1/2)
< (* Magma code: Library Block ---------------------------------- *)
signature RP_ADD = ...
functor Magma_Code (B:BASIS):RP_ADD = struct
  fun adder (x,y,c_in) =
    (B.xorb(c_in, B.xorb(x,y)),
     B.orb( B.orb( B.andb (x,y),
                   B.andb(x,c_in)),
            B.andb(y,c_in)))
end >
(* SAFL code --------------------------------------------------- *)
fun mult(x, y, acc) =
  if (x=0 | y=0) then acc
  else mult(x<<1, y>>1,
            if y[0] then < carry_chain adder B.b0 >(acc,x)
            else acc)
31 integrating SAFL and magma (2/2)
Diagram (along a time axis): Process 2, the SAFL Compiler, encounters a Magma fragment < m >(e_1, ..., e_k) and sends the Magma to Process 1, an ML Session, which executes it under the Synthesis Interpretation and returns Verilog.
32 case study (1/3): DES
- SAFL Describes DES Algorithm
- Magma Describes Wiring Permutations
- Version in FMCAD paper not pipelined
- Subsequently made a 4-stage pipelined version by program transformation (see thesis!)
- Throughput of 15.8 Mb/sec
- On Altera APEX 200K FPGA with 33MHz clock
- Theoretical max clock speed of design => 40MHz
- 2 DES blocks + test harness => 17% of FPGA
33 case study (2/3): DES Dev Board
Status LEDs
APEX E20K200E
34 case study (3/3): SAFL <-> VGA
Dual Ported RAM
SAFL Design
Verilog VGA Interface
Monitor
SAFL fn call interface
35 conclusions
- Conventional HLS has a number of serious limitations
- We believe some of these limitations are due to language design issues
- Our research attempts to address these limitations through
  - Designing high-level languages specifically for h/w
  - Developing new techniques for analysing and synthesising these languages
36 other/future work
Done
- SAFL+
  - SAFL with synchronous channels, pi-calculus style channel passing, assignment + a few other bits
  - Semantics + compiler implementation done
To Do
- Other design styles
  - We proposed translations for various design styles but did not get as far as building any real examples.
  - Would love to try various flavours of asynchronous design
- Tool for SAFL Transformation