Title: High-Level Synthesis: Creating Custom Circuits from High-Level Code
1. High-Level Synthesis: Creating Custom Circuits from High-Level Code
- Greg Stitt
- ECE Department
- University of Florida
2. Existing FPGA Tool Flow
- Register-transfer (RT) synthesis
  - Specify RT structure (muxes, registers, etc.)
  - Pro: allows precise specification
  - Con: time consuming, difficult, error prone
[Figure: tool flow. HDL -> RT Synthesis -> Netlist -> Physical Design (Technology Mapping, Placement, Routing) -> Bitfile]
3. Future FPGA Tool Flow?
[Figure: tool flow. C/C++, Java, etc. -> High-level Synthesis -> HDL -> RT Synthesis -> Netlist -> Physical Design (Technology Mapping, Placement, Routing) -> Bitfile]
4. High-level Synthesis
- Wouldn't it be nice to write high-level code?
  - Ratio of C to VHDL developers (10000:1?)
  - Pro: easier to specify
  - Pro: separates function from architecture
  - Pro: more portable
  - Con: hardware potentially slower
- Similar to assembly code era
  - Programmers could always beat compiler
  - But, no longer the case
- Hopefully, high-level synthesis will catch up to manual effort
5. High-level Synthesis
- More challenging than compilation
  - Compilation maps behavior into assembly instructions
    - Architecture is known to compiler
  - High-level synthesis creates a custom architecture to execute behavior
    - Huge hardware exploration space
    - Best solution may include microprocessors
    - Should handle any high-level code
      - Not all code appropriate for hardware
6. High-level Synthesis
- First, consider how to manually convert high-level code into circuit
- Steps
  - 1) Build FSM for controller
  - 2) Build datapath based on FSM

    acc = 0;
    for (i = 0; i < 128; i++)
        acc += a[i];
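7. Manual Example
- Build a FSM (controller)
  - Decompose code into states
[Figure: FSM for the loop "acc = 0; for (i = 0; i < 128; i++) acc += a[i];". States: "acc = 0, i = 0" -> "if (i < 128)" -> "load a[i]" -> "acc += a[i]" -> "i++" -> back to "if (i < 128)"; the test exits to "Done" when it fails]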
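The state decomposition above can be sketched as a small software model of the controller. This is only an illustration: the state names (INIT, TEST, LOAD, ADD, INC, DONE) and the function `run_fsm` are hypothetical, and one state is assumed to take one cycle.

```python
def run_fsm(a, n=128):
    """Step through the controller states for:
    acc = 0; for (i = 0; i < n; i++) acc += a[i];
    Returns the final accumulator and the number of cycles spent."""
    state = "INIT"
    acc = 0
    i = 0
    loaded = 0      # register holding the value read from memory
    cycles = 0
    while state != "DONE":
        cycles += 1
        if state == "INIT":        # acc = 0, i = 0
            acc, i = 0, 0
            state = "TEST"
        elif state == "TEST":      # if (i < n)
            state = "LOAD" if i < n else "DONE"
        elif state == "LOAD":      # load a[i]
            loaded = a[i]
            state = "ADD"
        elif state == "ADD":       # acc += a[i]
            acc += loaded
            state = "INC"
        elif state == "INC":       # i++
            i += 1
            state = "TEST"
    return acc, cycles


total, cycles = run_fsm(list(range(128)))
```

With one state per cycle, each loop iteration costs four cycles (test, load, add, increment), which is the kind of overhead that scheduling and resource sharing later try to reduce.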
8. Manual Example
- Build a datapath
  - Allocate resources for each state
[Figure: the FSM beside the datapath resources it needs: registers a[i], acc, addr, and i; adders for addr + 1, i + 1, and acc + a[i]; a comparator for i < 128]
9. Manual Example
- Build a datapath
  - Determine register inputs
[Figure: the same datapath with register inputs added: a[i] is loaded from memory ("In from memory"); 2x1 muxes on addr, acc, and i select between the initial values (a, 0, 0) and the adder outputs]
10. Manual Example
- Build a datapath
  - Add outputs
[Figure: the same datapath with outputs added: acc and Memory address]
11. Manual Example
- Build a datapath
  - Add control signals
[Figure: the same datapath with control signals added to the muxes and registers; outputs are acc and Memory address]
12. Manual Example
- Combine controller + datapath
[Figure: the Controller block drives the datapath (2x1 muxes, registers a[i], acc, addr, i, adders, and the i < 128 comparator). Inputs: In from memory. Outputs: Done, Memory Read, acc, Memory address]
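13. Manual Example
- Alternatives
  - Use one adder (plus muxes)
[Figure: the datapath rebuilt with a single adder whose two operands are selected by muxes; registers a[i], acc, addr, and i, the 2x1 input muxes, and the i < 128 comparator are unchanged. Outputs: acc, Memory address]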
14. Manual Example
- Comparison with high-level synthesis
  - Determining when to perform each operation
    - => Scheduling
  - Allocating resources for each operation
    - => Resource allocation
  - Mapping operations onto resources
    - => Binding
15. Another Example

    x = 0;
    for (i = 0; i < 100; i++) {
        if (a[i] > 0) x++;
        else x--;
        a[i] = x;
    }
    // output x

- Steps
  - 1) Build FSM (do not perform if conversion)
  - 2) Build datapath based on FSM
16. High-Level Synthesis
[Figure: High-level Code -> High-Level Synthesis -> Custom Circuit]
- High-level code: could be C, C++, Java, Perl, Python, SystemC, ImpulseC, etc.
- Custom circuit: usually an RT VHDL description, but could be as low level as a bitfile
17. High-Level Synthesis
[Figure: the loop "acc = 0; for (i = 0; i < 128; i++) acc += a[i];" is fed into High-Level Synthesis]
18. Main Steps
[Figure: flow from High-level Code to Controller + Datapath]
- Front-end
  - Syntactic Analysis: converts code to intermediate representation, which allows all following steps to use a common format
  - Intermediate Representation
- Optimization
- Back-end
  - Scheduling: determines when each operation will execute
  - Resource Allocation: determines physical resources to be used
  - Binding: maps operations onto physical resources
19. Syntactic Analysis
- Definition: analysis of code to verify syntactic correctness
  - Converts code into intermediate representation
- 2 steps
  - 1) Lexical analysis (Lexing)
  - 2) Parsing
[Figure: High-level Code -> Syntactic Analysis (Lexical Analysis -> Parsing) -> Intermediate Representation]
20. Lexical Analysis
- Lexical analysis (lexing) breaks code into a series of defined tokens
  - Token: defined language construct

    x = 0; if (y < z) x = 1;

    -> Lexical Analysis ->

    ID(x), ASSIGN, INT(0), SEMICOLON, IF, LPAREN, ID(y), LT, ID(z), RPAREN, ID(x), ASSIGN, INT(1), SEMICOLON
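A minimal regex-based lexer that reproduces the token stream on this slide is sketched below. The token names come from the slide; the table `TOKEN_SPEC` and the function `lex` are illustrative names, and only the constructs used in the example are covered.

```python
import re

# (token name, regular expression); order matters: IF must be tried before ID.
TOKEN_SPEC = [
    ("IF",        r"if\b"),
    ("ID",        r"[A-Za-z_]\w*"),
    ("INT",       r"\d+"),
    ("ASSIGN",    r"="),
    ("LT",        r"<"),
    ("LPAREN",    r"\("),
    ("RPAREN",    r"\)"),
    ("SEMICOLON", r";"),
    ("SKIP",      r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(code):
    """Break code into a list of token strings such as 'ID(x)' or 'ASSIGN'."""
    tokens = []
    pos = 0
    while pos < len(code):
        m = MASTER.match(code, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {code[pos]!r} at {pos}")
        kind = m.lastgroup              # name of the alternative that matched
        if kind in ("ID", "INT"):
            tokens.append(f"{kind}({m.group()})")
        elif kind != "SKIP":            # whitespace produces no token
            tokens.append(kind)
        pos = m.end()
    return tokens


print(", ".join(lex("x = 0; if (y < z) x = 1;")))
```

Tools such as lex generate the same kind of matcher from a table of regular expressions, so the table is the only part that changes per language.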
21. Lexing Tools
- Define tokens using regular expressions; the tool outputs C code that lexes input
  - Common tool is lex

    /* braces and parentheses */
    "{"     { YYPRINT; return LBRACE; }
    "}"     { YYPRINT; return RBRACE; }
    ","     { YYPRINT; return COMMA; }
    ";"     { YYPRINT; return SEMICOLON; }
    "!"     { YYPRINT; return EXCLAMATION; }
    "["     { YYPRINT; return LBRACKET; }
    "]"     { YYPRINT; return RBRACKET; }
    "-"     { YYPRINT; return MINUS; }

    /* integers */
    [0-9]+  { yylval.intVal = atoi( yytext ); return INT; }
22. Parsing
- Analysis of token sequence to determine correct grammatical structure
  - Languages defined by context-free grammar

Grammar

    Program = Exp
    Exp     = Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
    Cond    = ID Comp ID
    Stmt    = ID ASSIGN INT
    Comp    = LT | NE

Correct Programs

    x = 0; y = 1;
    x = 0;
    if (a < b) x = 10;
    if (var1 != var2) x = 10;
    x = 0; if (y < z) x = 1;
    x = 0; if (y < z) x = 1; y = 5; t = 1;
23. Parsing

Grammar

    Program = Exp
    Exp     = S SEMICOLON | IF LPAREN Cond RPAREN Exp | Exp Exp
    Cond    = ID Comp ID
    S       = ID ASSIGN INT
    Comp    = LT | NE

Incorrect Programs

    x = 3 + 5;
    x = 5
    x = 5;;
    if (x+5 > y) x = 2;
    x = y;
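The grammar on these slides is small enough to check with a hand-written recursive-descent parser. The sketch below is an illustration only: `tokenize`, `Parser`, and `is_valid` are hypothetical names, the left-recursive rule "Exp = Exp Exp" is handled as a loop over consecutive expressions, and the body of an if is assumed to be a single Exp (as in C).

```python
import re

# One capture group per token; any other character becomes OTHER.
TOKEN_RE = re.compile(
    r"\s*(?:(if\b)|([A-Za-z_]\w*)|(\d+)|(!=)|(=)|(<)|(\()|(\))|(;)|(.))")
NAMES = ["IF", "ID", "INT", "NE", "ASSIGN", "LT",
         "LPAREN", "RPAREN", "SEMICOLON", "OTHER"]


def tokenize(code):
    """Return token names only; characters outside the grammar map to OTHER."""
    return [NAMES[m.lastindex - 1] for m in TOKEN_RE.finditer(code.strip())]


class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, name):
        if self.peek() != name:
            raise SyntaxError(f"expected {name}, got {self.peek()}")
        self.pos += 1

    def program(self):      # Program = Exp, where Exp may repeat (Exp Exp)
        self.exp()
        while self.peek() is not None:
            self.exp()

    def exp(self):          # Exp = Stmt SEMICOLON | IF LPAREN Cond RPAREN Exp
        if self.peek() == "IF":
            self.expect("IF")
            self.expect("LPAREN")
            self.cond()
            self.expect("RPAREN")
            self.exp()
        else:
            self.stmt()
            self.expect("SEMICOLON")

    def cond(self):         # Cond = ID Comp ID, Comp = LT | NE
        self.expect("ID")
        if self.peek() not in ("LT", "NE"):
            raise SyntaxError(f"expected comparison, got {self.peek()}")
        self.pos += 1
        self.expect("ID")

    def stmt(self):         # Stmt = ID ASSIGN INT
        self.expect("ID")
        self.expect("ASSIGN")
        self.expect("INT")


def is_valid(code):
    """True if the code can be derived from the grammar."""
    try:
        Parser(tokenize(code)).program()
        return True
    except SyntaxError:
        return False
```

For example, `is_valid("x = 0; if (y < z) x = 1;")` is accepted, while `is_valid("x = y;")` is rejected because Stmt requires an INT on the right-hand side.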
24. Parsing Tools
- Define grammar in special language
  - Automatically creates parser based on grammar
- Popular tool is yacc (yet-another-compiler-compiler)

    program   : functions                   { $$ = $1; }
              ;
    functions : function                    { $$ = $1; }
              | functions function          { $$ = $1; }
              ;
    function  : HEXNUMBER LABEL COLON code  { $$ = $2; }
              ;
25. Intermediate Representation
- Parser converts tokens to intermediate representation
  - Usually, an abstract syntax tree

    x = 0; if (y < z) x = 1; d = 6;

[Figure: abstract syntax tree for the code above, with three subtrees: assign(x, 0); if(cond(lt, y, z), assign(x, 1)); assign(d, 6)]
26. Intermediate Representation
- Why use intermediate representation?
  - Easier to analyze/optimize than source code
  - Theoretically can be used for all languages
    - Makes synthesis back end language independent
[Figure: C Code, Java, and Perl each pass through their own Syntactic Analysis into a shared Intermediate Representation, which feeds the Back End. Scheduling, resource allocation, and binding are independent of the source language; sometimes optimizations too]
27. Intermediate Representation
- Different Types
  - Abstract Syntax Tree
  - Control/Data Flow Graph (CDFG)
  - Sequencing Graph
  - Etc.
- We will focus on CDFG
  - Combines control flow graph (CFG) and data flow graph (DFG)
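28. Control flow graphs
- CFG
  - Represents control flow dependencies of basic blocks
  - Basic block is a section of code that always executes from beginning to end
    - I.e., no jumps into or out of block

    acc = 0;
    for (i = 0; i < 128; i++)
        acc += a[i];

[Figure: CFG for the loop. Blocks: "acc = 0, i = 0" -> "if (i < 128)" -> "acc += a[i]; i++" -> back to "if (i < 128)"; the test exits to "Done" when it fails]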
29. Control flow graphs
- Your turn
  - Create a CFG for this code

    i = 0;
    while (j < 10) {
        if (x < 5) y = 2;
        else if (z < 10) y = 6;
    }
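30. Data Flow Graphs
- DFG
  - Represents data dependencies between operations

    x = a + b;
    y = c * d;
    z = x - y;

[Figure: DFG for the code above. Inputs a and b feed the operation producing x; inputs c and d feed the operation producing y; x and y feed the subtraction producing z]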
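A DFG like the one on this slide can be held as a table from each result to its operator and operands. The sketch below is illustrative: the operators for x and y ("+" and "*") are assumptions, and `dfg_edges` and `evaluate` are hypothetical helper names.

```python
import operator

# result name -> (operator, operand names); a, b, c, d are primary inputs
OPS = {
    "x": ("+", ("a", "b")),
    "y": ("*", ("c", "d")),
    "z": ("-", ("x", "y")),
}
FUNCS = {"+": operator.add, "*": operator.mul, "-": operator.sub}


def dfg_edges(ops):
    """Data-dependency edges (producer, consumer) between operations."""
    return sorted((src, dst)
                  for dst, (_, args) in ops.items()
                  for src in args if src in ops)


def evaluate(ops, inputs):
    """Evaluate every operation once all of its operands are available."""
    values = dict(inputs)
    pending = dict(ops)
    while pending:
        ready = [n for n, (_, args) in pending.items()
                 if all(a in values for a in args)]
        if not ready:
            raise ValueError("cyclic graph or missing input")
        for n in ready:     # operations in 'ready' are mutually independent
            op, args = pending.pop(n)
            values[n] = FUNCS[op](*(values[a] for a in args))
    return values


print(dfg_edges(OPS))
print(evaluate(OPS, {"a": 1, "b": 2, "c": 3, "d": 4})["z"])
```

The operations collected in `ready` on each pass have no dependencies on each other, which is exactly the parallelism a scheduler can exploit.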
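31. Control/Data Flow Graph
- Combines CFG and DFG
  - Maintains DFG for each node of CFG

    acc = 0;
    for (i = 0; i < 128; i++)
        acc += a[i];

[Figure: CDFG for the loop. Each CFG node holds a DFG: "acc = 0; i = 0" (constants 0 and 0 feed acc and i); "if (i < 128)"; "acc += a[i]; i++" (acc and a[i] feed acc, i and 1 feed i); "Done"]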
32. High-Level Synthesis: Optimization
33. Synthesis Optimizations
- After creating CDFG, high-level synthesis optimizes graph
- Goals
  - Reduce area
  - Improve latency
  - Increase parallelism
  - Reduce power/energy
- 2 types
  - Data flow optimizations
  - Control flow optimizations
34. Data Flow Optimizations
- Tree-height reduction
  - Generally made possible from commutativity, associativity, and distributivity
[Figure: expression trees over the operands a, b, c, and d, shown before and after tree-height reduction]
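As a small illustration of tree-height reduction (the slide's own trees did not survive extraction, so the expression a + b + c + d is an assumed example), the sketch below compares a chain of additions with a balanced tree. The names `height`, `evaluate`, and `balance` are hypothetical.

```python
def height(expr):
    """Height of an expression tree: a leaf name, or (op, left, right)."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + max(height(left), height(right))


def evaluate(expr, env):
    """Evaluate a tree of '+' and '*' nodes against a dict of inputs."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


def balance(op, leaves):
    """Build a balanced tree over the leaves; valid only for an associative op."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (op, balance(op, leaves[:mid]), balance(op, leaves[mid:]))


chain = ("+", ("+", ("+", "a", "b"), "c"), "d")     # ((a + b) + c) + d
balanced = balance("+", ["a", "b", "c", "d"])       # (a + b) + (c + d)

env = {"a": 1, "b": 2, "c": 3, "d": 4}
print(height(chain), height(balanced))
print(evaluate(chain, env) == evaluate(balanced, env))
```

The balanced form needs the same three adders but only two levels of logic, so the two inner additions can run in the same cycle.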
35. Data Flow Optimizations
- Operator Strength Reduction
  - Replacing an expensive (strong) operation with a faster one
  - Common example: replacing multiply/divide with shift

    b[i] = a[i] * 8;      (1 multiplication)
    becomes
    b[i] = a[i] << 3;     (0 multiplications)

    a = b * 5;
    becomes
    c = b << 2; a = b + c;

    a = b * 13;
    becomes
    c = b << 2; d = b << 3; a = c + d + b;
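The shift-and-add rewrites on this slide can be checked directly. In the sketch below, `times5` and `times13` mirror the slide's examples, while `shift_add_terms` is a hypothetical helper that derives the shift amounts for any positive constant from its binary expansion.

```python
def times5(b):
    c = b << 2              # 4 * b
    return b + c            # 5 * b


def times13(b):
    c = b << 2              # 4 * b
    d = b << 3              # 8 * b
    return c + d + b        # 13 * b


def shift_add_terms(k):
    """Shift amounts s such that the sum of (b << s) over all s equals k * b."""
    return [s for s in range(k.bit_length()) if (k >> s) & 1]


def times_const(b, k):
    """Multiply by a positive constant k using only shifts and adds."""
    return sum(b << s for s in shift_add_terms(k))


print(shift_add_terms(13))      # bit positions set in binary 1101
```

Each set bit of the constant costs one shift (free wiring in hardware) and one adder input, so constants with few set bits benefit the most.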
36. Data Flow Optimizations
- Constant propagation
  - Statically evaluate expressions with constants

    x = 0; y = x * 15; z = y + 10;
    becomes
    x = 0; y = 0; z = 10;
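A minimal sketch of constant propagation over straight-line code follows. The program encoding (a list of `(dest, expr)` pairs with tuple expressions) and the names `fold` and `propagate` are assumptions made for this illustration; branches and loops are not handled.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr, known):
    """Replace known variables by their constants and evaluate constant subtrees.
    expr is an int, a variable name, or a tuple (op, lhs, rhs)."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return known.get(expr, expr)
    op, lhs, rhs = expr
    left, right = fold(lhs, known), fold(rhs, known)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


def propagate(program):
    """program is a list of (dest, expr) assignments executed in order."""
    known, out = {}, []
    for dest, expr in program:
        value = fold(expr, known)
        if isinstance(value, int):
            known[dest] = value
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
        out.append((dest, value))
    return out


# x = 0; y = x * 15; z = y + 10;
program = [("x", 0), ("y", ("*", "x", 15)), ("z", ("+", "y", 10))]
print(propagate(program))
```

When an operand is not a known constant (for example `("+", "w", 1)` with no earlier assignment to `w`), the expression is left symbolic, so the pass never changes program behavior.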
37. Data Flow Optimizations
- Function Specialization
  - Create specialized code for common inputs
    - Treat common inputs as constants
  - If inputs not known statically, must include if statement for each call to specialized function

    int f (int x) { y = x * 15; return y + 10; }
    for (I = 0; I < 1000; I++) f(0);

    Treat frequent input as a constant:

    int f (int x) { y = x * 15; return y + 10; }
    int f_opt () { return 10; }
    for (I = 0; I < 1000; I++) f_opt();
38. Data Flow Optimizations
- Common sub-expression elimination
  - If expression appears more than once, repetitions can be replaced

    a = x + y;
    . . .
    b = c * 25 + x + y;

    becomes (x + y already determined)

    a = x + y;
    . . .
    b = c * 25 + a;
39. Data Flow Optimizations
- Dead code elimination
  - Remove code that is never executed
    - May seem like stupid code, but often comes from constant propagation or function specialization

    int f (int x) { if (x > 0) a = b * 15; else a = b / 4; return a; }
    becomes
    int f_opt () { a = b * 15; return a; }

  - Specialized version for x > 0 does not need else branch: dead code
40. Data Flow Optimizations
- Code motion (hoisting/sinking)
  - Avoid repeated computation

    for (i = 0; i < 100; i++) { z = x + y; b[i] = a[i] + z; }
    becomes
    z = x + y;
    for (i = 0; i < 100; i++) { b[i] = a[i] + z; }
41. Control Flow Optimizations
- Loop Unrolling
  - Replicate body of loop
    - May increase parallelism

    for (i = 0; i < 128; i++) a[i] = b[i] + c[i];
    becomes
    for (i = 0; i < 128; i += 2) { a[i] = b[i] + c[i]; a[i+1] = b[i+1] + c[i+1]; }
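The unrolled loop on this slide can be compared against the original in a short sketch. The function names are hypothetical, and the leftover-iteration handling for odd lengths is an addition that the slide's 128-element example does not need.

```python
def add_rolled(b, c):
    """a[i] = b[i] + c[i], one element per iteration."""
    a = [0] * len(b)
    for i in range(len(b)):
        a[i] = b[i] + c[i]
    return a


def add_unrolled_by_2(b, c):
    """Same computation with the loop body replicated twice per iteration."""
    n = len(b)
    a = [0] * n
    for i in range(0, n - 1, 2):
        a[i] = b[i] + c[i]              # these two statements are independent,
        a[i + 1] = b[i + 1] + c[i + 1]  # so hardware can execute them in parallel
    if n % 2:                           # leftover iteration when n is odd
        a[n - 1] = b[n - 1] + c[n - 1]
    return a


b = list(range(128))
c = list(range(128, 256))
print(add_rolled(b, c) == add_unrolled_by_2(b, c))
```

In hardware the payoff comes from allocating two adders (and enough memory bandwidth) so both statements finish in one iteration, halving the number of controller passes through the loop.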
42. Control Flow Optimizations
- Function Inlining
  - Replace function call with body of function
    - Common for both SW and HW
    - SW: eliminates instructions/branches
    - HW: eliminates unnecessary control states

    for (i = 0; i < 128; i++) a[i] = f( b[i], c[i] );
    . . .
    int f (int a, int b) { return a + b * 15; }

    becomes

    for (i = 0; i < 128; i++) a[i] = b[i] + c[i] * 15;
43. Control Flow Optimizations
- Conditional Expansion
  - Replace if with logic expression
  - Execute if/else bodies in parallel

    y = ab; if (a) x = b + d; else x = bd;
    becomes
    y = ab; x = a(b + d) + a'bd;      [DeMicheli]
    Can be further optimized to:
    y = ab; x = y + d(a + b);
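The three forms on this slide can be checked against each other exhaustively. The sketch below assumes the slide uses Boolean notation (+ for OR, juxtaposition for AND, and a complemented a in the else term); the function names are hypothetical.

```python
from itertools import product


def with_branch(a, b, d):
    """Original code: the if selects between two expressions."""
    y = a and b
    if a:
        x = b or d
    else:
        x = b and d
    return y, x


def expanded(a, b, d):
    """Conditional expansion: x = a(b + d) + a'bd, no control flow."""
    y = a and b
    x = (a and (b or d)) or ((not a) and b and d)
    return y, x


def optimized(a, b, d):
    """Further optimized form: x = y + d(a + b), reusing y."""
    y = a and b
    x = y or (d and (a or b))
    return y, x


same = all(with_branch(a, b, d) == expanded(a, b, d) == optimized(a, b, d)
           for a, b, d in product([False, True], repeat=3))
print(same)
```

Because the expanded forms have no branch, the controller needs no extra states for the if/else, at the cost of computing both bodies every time.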
44. Example

    x = 0;
    y = a + b;
    if (x < 15) z = a + b - c;
    else z = x + 12;
    output = z * 12;
45. High-Level Synthesis: Scheduling
46. Scheduling
- Scheduling assigns a start time to each operation
  - Start times must not violate dependencies in DFG
  - Start times must meet performance constraints
    - Alternatively, resource constraints
47. Examples
[Figure: three example schedules of an expression over the operands a, b, c, and d: two that span Cycle1 to Cycle3 and one that spans Cycle1 to Cycle2]
48. Scheduling Problems
- Several types of scheduling problems
  - Usually some combination of performance and resource constraints
- Problems
  - Unconstrained
    - Not very useful, every schedule is valid
  - Minimum latency
  - Latency constrained
  - Minimum-latency, resource constrained
    - I.e., find the schedule with the shortest latency that uses less than a specified number of resources
    - NP-Complete
  - Minimum-resource, latency constrained
    - I.e., find the schedule that meets the latency constraint (which may be anything) and uses the minimum number of resources
    - NP-Complete
49. Minimum Latency Scheduling
- ASAP (as soon as possible) algorithm
  - Find a node whose predecessors are all scheduled
  - Schedule node one cycle later than the max cycle of its predecessors
  - Repeat until all nodes scheduled
[Figure: ASAP schedule of a DFG with inputs a through h over Cycle1 to Cycle4. Minimum possible latency: 4 cycles]
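The ASAP steps above can be sketched in a few lines. The graph below is a hypothetical six-operation DFG (the slide's own graph is not recoverable from the text), every operation is assumed to take one cycle, and `asap` is an illustrative name.

```python
def asap(preds):
    """preds maps each operation to the operations it depends on.
    Returns the earliest start cycle (1-based) of every operation."""
    start = {}
    while len(start) < len(preds):
        progressed = False
        for node, ps in preds.items():
            # A node is ready once all of its predecessors are scheduled.
            if node not in start and all(p in start for p in ps):
                start[node] = 1 + max((start[p] for p in ps), default=0)
                progressed = True
        if not progressed:
            raise ValueError("dependency cycle in DFG")
    return start


# Hypothetical DFG: m1, m2, a1 read only primary inputs.
dfg = {
    "m1": [], "m2": [], "a1": [],
    "m3": ["m1", "m2"],
    "s1": ["m3", "a1"],
    "c1": ["s1"],
}
schedule = asap(dfg)
latency = max(schedule.values())
print(schedule, latency)
```

The resulting latency equals the length of the longest dependency chain (m1 -> m3 -> s1 -> c1 here), which is why ASAP yields the minimum latency when resources are unconstrained.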
50. Minimum Latency Scheduling
- ALAP (as late as possible) algorithm
  - Run ASAP, get minimum latency L
  - Find a node whose successors are all scheduled
  - Schedule node one cycle before the min cycle of its successors
    - Nodes with no successors scheduled to cycle L
  - Repeat until all nodes scheduled
[Figure: ALAP schedule of the same DFG with inputs a through h over Cycle1 to Cycle4]
51. Minimum Latency Scheduling
- ALAP
  - Has to run ASAP first, seems pointless
  - But, many heuristics need the mobility/slack of each operation
    - ASAP gives the earliest possible time for an operation
    - ALAP gives the latest possible time for an operation
  - Slack = difference between earliest and latest possible schedule
    - Slack = 0 implies operation has to be done in the current scheduled cycle
    - The larger the slack, the more options a heuristic has to schedule the operation
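Slack can be computed by running both schedulers and subtracting. The sketch below is illustrative: the DFG is hypothetical, every operation is assumed to take one cycle, the graph is assumed to be acyclic, and `asap`, `alap`, and `slack` are names chosen for this example.

```python
def asap(preds):
    """Earliest start cycle of each operation (graph assumed acyclic)."""
    start = {}

    def visit(n):
        if n not in start:
            start[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return start[n]

    for n in preds:
        visit(n)
    return start


def alap(preds, latency):
    """Latest start cycle of each operation for a given latency bound."""
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    start = {}

    def visit(n):
        if n not in start:
            # No successors: default makes the node land in cycle 'latency'.
            start[n] = min((visit(s) for s in succs[n]), default=latency + 1) - 1
        return start[n]

    for n in preds:
        visit(n)
    return start


def slack(preds):
    """Mobility of each operation: ALAP cycle minus ASAP cycle."""
    early = asap(preds)
    late = alap(preds, max(early.values()))
    return {n: late[n] - early[n] for n in preds}


# Hypothetical DFG: m1, m2, a1 read only primary inputs.
dfg = {
    "m1": [], "m2": [], "a1": [],
    "m3": ["m1", "m2"],
    "s1": ["m3", "a1"],
    "c1": ["s1"],
}
print(slack(dfg))
```

Here only a1 has nonzero slack: it can run in cycle 1 or 2, so a resource-constrained heuristic is free to move it, while the other operations lie on the critical path. Calling `alap(dfg, L)` with a bound L larger than the minimum latency schedules against that bound directly, without running ASAP first.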
52. Latency-Constrained Scheduling
- Instead of finding the minimum latency, find latency less than L
- Solutions
  - Use ASAP, verify that minimum latency is less than L
  - Use ALAP starting with cycle L instead of minimum latency (don't need ASAP)