
1
Automatic Synthesis and Optimization of Floating
Point Hardware

Ho Chun Hok
Department of Computer Science and Engineering
The Chinese University of Hong Kong

18JUL2003
2
Overview

Introduction
Fly - Modifiable Compiler
Float - Floating Point Library
Function Generator
Results
Conclusion

3
Introduction

Hardware Description Language (HDL) based design
has shortcomings
Hardware designs are parallel, but people think in
sequential von Neumann patterns
It is complex to decompose a hardware design into
datapath and control signals
Errors may be introduced during the translation
Debugging hardware is harder than debugging
software
A hardware interface for the FPGA board must be
developed
A designer must have a strong background in
hardware design

4
Introduction

Elementary functions are not supported
No floating point arithmetic
No standard mathematical library like <math.h>
No log, sin, cos, 1/x, ...
The size of an FPGA is limited
Area is an essential factor in a design

5
Motivations

Is it possible to use a single description for both
software and hardware design?
Can we optimize floating point arithmetic in
hardware to save resources?
In hardware design, can we provide a mathematical
library just as software does?

6
Objectives

Main goal: use the smallest effort to develop
hardware on FPGA
No need to be familiar with hardware design
The compilation from description to hardware is
transparent to the designer
Floating point arithmetic is supported
An elementary mathematical library is provided,
as in software programming

7
Contributions

A framework with three modules was developed
Fly - Modifiable Hardware Compiler
Translates a description into a datapath
Float - Floating Point Arithmetic Library
Provides parameterized floating point operators and
an optimization engine
Function Generator
Generates any differentiable function; can be
regarded as a mathematical library

8
Contributions

Applications developed using this framework
Greatest Common Divisor Coprocessor
Digital Sine-Cosine Generator (DSCG)
Ordinary Differential Equation Solver (ODE)
N-Body Problem Simulator
Ranging from fixed point to floating point
designs

9
Contribution - Traditional Design Flow
10
Contribution - Revised Design Flow
The hardware process is transparent to the designer
11
Fly Hardware Compiler
12
Introduction

Fly is easily extensible
Source code can be easily understood and modified
Supports common programming constructs
The Fly language supports
Register assignment
Parallel statements
If else branches
While loops
Built-in functions
Comments

13
Fly Programming Language
Main elements
14
Fly Programming Language

Compilation Technique
Uses Page's compilation technique
Each statement has associated start and end
signals
Fly constructs a one-hot state machine (i.e. the
control part of the hardware design) from the
program by cascading these signals
The Fly compiler implementation is simple and concise
due to the use of Perl as the development language
One-pass compilation
Outputs VHDL code
Can support different FPGA and ASIC design tools
Gives synthesis tools the opportunity to perform
further logic optimization
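The start/end cascading of Page's technique can be imitated in software to see why the controller is one-hot. The Python model below is a hypothetical illustration, not the Fly compiler's Perl source: each assignment owns one flip-flop, `if` and `while` only steer the single hot bit combinationally (costing no cycles), and a parallel block such as the GCD swap of Application I is modelled as one state that loads both registers at once. The class name `OneHotController` and the tuple-based statement format are assumptions made for this sketch.

```python
from typing import Callable, Dict, List, Optional

Env = Dict[str, int]                     # register name -> value
Router = Callable[[Env], Optional[int]]  # picks the next hot flip-flop (None = done)


class OneHotController:
    """Hypothetical model of Page-style compilation (not Fly's Perl source)."""

    def __init__(self, program: tuple) -> None:
        self.actions: List[Callable[[Env], None]] = []  # datapath enabled by each flip-flop
        self.routes: List[Router] = []                  # where each flip-flop's end signal goes
        self.start: Router = self._compile(program, lambda env: None)

    def _compile(self, node: tuple, cont: Router) -> Router:
        kind = node[0]
        if kind == "assign":             # one flip-flop, one clock cycle
            idx = len(self.actions)
            self.actions.append(node[1])
            self.routes.append(cont)
            return lambda env: idx
        if kind == "seq":                # end of statement k drives start of k + 1
            start = cont
            for stmt in reversed(node[1]):
                start = self._compile(stmt, start)
            return start
        if kind == "if":                 # condition steers the token, zero cycles
            cond = node[1]
            then_start = self._compile(node[2], cont)
            else_start = self._compile(node[3], cont)
            return lambda env: then_start(env) if cond(env) else else_start(env)
        if kind == "while":              # body's end signal loops back to the test
            cond = node[1]

            def test(env: Env) -> Optional[int]:
                return body_start(env) if cond(env) else cont(env)

            body_start = self._compile(node[2], test)
            return test
        raise ValueError(f"unknown statement {kind!r}")

    def run(self, env: Env, max_cycles: int = 100_000) -> int:
        """Pass the single hot bit around until the program's end signal fires."""
        hot = self.start(env)
        cycles = 0
        while hot is not None:
            self.actions[hot](env)       # registers enabled by this state load
            hot = self.routes[hot](env)  # end signal becomes the next start signal
            cycles += 1
            if cycles > max_cycles:
                raise RuntimeError("controller did not terminate")
        return cycles


# The GCD program of Application I; the parallel swap is a single state.
GCD = ("seq", [
    ("assign", lambda e: e.update(s=e["din1"])),               # $s = $din[1];
    ("assign", lambda e: e.update(l=e["din2"])),               # $l = $din[2];
    ("while", lambda e: e["s"] != e["l"], ("seq", [
        ("assign", lambda e: e.update(a=e["l"] - e["s"])),      # $a = $l - $s;
        ("if", lambda e: e["a"] > 0,
         ("assign", lambda e: e.update(l=e["a"])),             # $l = $a;
         ("assign", lambda e: e.update(s=e["l"], l=e["s"]))),  # swap in parallel
    ])),
    ("assign", lambda e: e.update(dout1=e["l"])),              # $dout[1] = $l;
])

env = {"din1": 12, "din2": 18}
cycles = OneHotController(GCD).run(env)
print(env["dout1"], cycles)  # -> 6 9
```

Each clock cycle has exactly one hot state, so the cycle count equals the number of assignments executed; the loop and branch tests add no cycles in this model.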

15
Application I - GCD Example

$s = $din[1]; $l = $din[2];
while ($s != $l) {
  $a = $l - $s;
  if ($a > 0) {
    $l = $a;
  } else {
    [ {$s = $l;} {$l = $s;} ]  # swap
  }
}
$dout[1] = $l;

16
Resultant Datapath

$s = $din[1]; $l = $din[2];
else [ {$s = $l;} {$l = $s;} ]

17
Resultant Datapath

while ($s != $l)

18
Resultant Datapath

if ($a > 0) { $l = $a; } else

19
Host Interface (register)

$s = $din[1];
$dout[1] = $l;

20
Summary

Input: Perl-like description of a floating point
design
Output: synthesizable VHDL code for
implementation
Datapath
One-hot state machine (control signals)
A host interface is introduced
The datapath is correct because it is constructed
automatically
Errors are eliminated when translating a software
algorithm into datapath and control signals
Bitstream generation is transparent to the user
A GCD coprocessor was given as an example

21
Float - Floating Point Design Environment
22
Introduction

Many applications involve floating point
operations
Graphical transformation
Scientific simulation
Floating point arithmetic is seldom implemented
on FPGA systems
Implementing floating point arithmetic on an FPGA
is possible
Larger area
Higher speed
Arbitrary-sized floating point formats are
possible on an FPGA
Allows more flexible designs

23
Introduction

Float Class
Optimizes the floating point algorithm during
simulation
An instance of the Float class represents a floating
point variable when simulating the algorithm
VHDL Floating Point Generator
Generates arbitrary-sized floating point
adders/multipliers
Integrated into the Fly environment

24
Float Design Environment

Float Class
Encapsulates the floating point data structure
Arbitrary exponent and fraction sizes
Implemented in Perl
Supports several methods for floating point
operations

25
Float Design Environment

Float Class Attributes
Sign, exponent, fraction
Sizes of the exponent and fraction
Maximum magnitude
Used to determine the minimum exponent size
required
Circuit size required for the floating point
operation
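How the maximum magnitude fixes the exponent size can be shown with a small helper. This is a minimal sketch, assuming an IEEE-style biased exponent (bias 2^(e-1) - 1, so the largest normal exponent equals the bias) and ignoring the smallest magnitude and any rounding that carries into the next binade; `min_exponent_bits` is a hypothetical helper, not a Float class method.

```python
import math


def min_exponent_bits(max_magnitude: float) -> int:
    """Smallest exponent width e whose largest normal exponent (assumed to be
    the IEEE-style bias 2**(e - 1) - 1) still covers max_magnitude."""
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    _, exp = math.frexp(max_magnitude)  # max_magnitude = m * 2**exp, 0.5 <= m < 1
    needed = exp - 1                    # unbiased exponent of the leading 1 bit
    e = 2
    while 2 ** (e - 1) - 1 < needed:
        e += 1
    return e


print(min_exponent_bits(1000.0), min_exponent_bits(3.4e38))  # -> 5 8
```

A variable that never exceeds 1000 therefore fits a 5-bit exponent, while values up to about 3.4e38 need the usual single precision width of 8 bits.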

26
Float Design Environment

Float Class Methods Supported
add()
multiply()
setExponetSize()
setFractionSize()
setValue()
getValue()
getCircuitSize()
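The behaviour of the Float class can be approximated in a few lines of Python. This is a hypothetical sketch, not the Perl implementation: method names mirror the list above in snake_case (getCircuitSize is omitted because no area model is given here), the exponent range is not modelled, rounding is round-to-nearest-even on the fraction, and each result is computed in double precision and then rounded, which is a close but not guaranteed bit-exact model of a hardware operator.

```python
import math


class Float:
    """Hypothetical Python analogue of the slides' Perl Float class."""

    def __init__(self, fraction_size: int, exponent_size: int, value: float = 0.0) -> None:
        self.fraction_size = fraction_size
        self.exponent_size = exponent_size  # recorded only; range is not checked
        self.max_magnitude = 0.0            # largest |value| seen by set_value
        self.set_value(value)

    def _round(self, x: float) -> float:
        """Round to a hidden 1 plus fraction_size bits, ties to even."""
        if x == 0.0 or not math.isfinite(x):
            return x
        m, e = math.frexp(x)                # x = m * 2**e, 0.5 <= |m| < 1
        bits = self.fraction_size + 1
        return math.ldexp(round(math.ldexp(m, bits)), e - bits)

    def set_value(self, x: float) -> None:
        self.value = self._round(x)
        self.max_magnitude = max(self.max_magnitude, abs(self.value))

    def get_value(self) -> float:
        return self.value

    def add(self, other: "Float") -> "Float":
        return Float(self.fraction_size, self.exponent_size, self.value + other.value)

    def multiply(self, other: "Float") -> "Float":
        return Float(self.fraction_size, self.exponent_size, self.value * other.value)


x = Float(2, 8, 1.5)
print(x.multiply(x).get_value())  # -> 2.0 (2.25 is a tie, rounded to even)
```

With only 2 fraction bits, 1.5 × 1.5 = 2.25 is not representable and rounds to 2.0; exposing this kind of effect before any hardware is built is the purpose of simulating with the Float class.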

27
Float Design Environment

Optimization
Input: accuracy and resource constraints
Output: size of each floating point operator
The Nelder-Mead method is used to minimize the cost function
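Nelder-Mead needs only cost evaluations, no derivatives, which suits a cost measured by simulation. The plain-Python version below uses the standard simplex moves (reflection 1, expansion 2, contraction 0.5, shrink 0.5); it is a generic sketch, not the authors' optimizer, and treating operator sizes as continuous values that are rounded afterwards is an assumption.

```python
from typing import Callable, List, Sequence


def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                step: float = 1.0, tol: float = 1e-12, max_iter: int = 5000) -> List[float]:
    """Minimise f with the standard Nelder-Mead simplex moves."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest cost) to worst.
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(t: float) -> List[float]:
            # Point on the line from the worst vertex through the centroid.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)
        if fr < values[0]:                # new best: try expanding further
            xe = along(2.0)
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:             # better than second worst: accept
            simplex[-1], values[-1] = xr, fr
        else:                             # contract outside or inside
            xc = along(0.5) if fr < values[-1] else along(-0.5)
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:                         # shrink everything towards the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    return simplex[min(range(n + 1), key=values.__getitem__)]


best = nelder_mead(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2, [0.0, 0.0])
print([round(v, 3) for v in best])
```

For operator sizing, the objective would take candidate fraction sizes, round them to integers and return the design cost; rounding makes the surface piecewise constant, so a larger initial `step` helps the simplex move between integer sizes.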

28
Float Design Environment

Cost Function
Adder size
Multiplier size
Quantization Error (dB)
Cost Function
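The slide lists the three ingredients of the cost, but the formula itself did not survive. One plausible form, purely hypothetical here, adds the adder and multiplier areas (in slices) and penalizes every dB by which the quantization error is worse than an accuracy target; the weights, the target and the error definition (error power relative to signal power) are all assumptions.

```python
import math
from typing import Sequence


def quantization_error_db(reference: Sequence[float], approx: Sequence[float]) -> float:
    """Error power relative to signal power, in dB (more negative = more accurate)."""
    noise = sum((a - r) ** 2 for r, a in zip(reference, approx))
    signal = sum(r * r for r in reference)
    return 10.0 * math.log10(noise / signal) if noise > 0 else float("-inf")


def cost(adder_slices: float, multiplier_slices: float, error_db: float,
         target_db: float = -100.0, penalty_per_db: float = 50.0) -> float:
    """Hypothetical cost: operator area plus a penalty for each dB by which
    the quantization error is worse than the accuracy target."""
    area = adder_slices + multiplier_slices
    return area + penalty_per_db * max(0.0, error_db - target_db)


print(cost(100, 300, -90.0), cost(100, 300, -120.0))  # -> 900.0 400.0
```

With a -100 dB target, a 400-slice design whose error is -90 dB costs 400 + 50 × 10 = 900, while the same area at -120 dB costs only its area.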

29
Float Design Environment

VHDL Floating Point Generator
Generates parameterized adders and multipliers with
arbitrary exponent and fraction sizes
Fully pipelined design
Multiplication latency: 8 cycles
Addition latency: 4 cycles
Throughput: one result per clock cycle
The generator module is written in Perl as the
library interface
Compatible with the Fly compiler through start and
end signals

30
Integration into the Fly compiler

$C = $A + $B;
The datapath for integer addition needs one clock
cycle to complete
$C = $A .+ $B;
The datapath for floating point operations needs more
cycles to complete, so extra flip-flops are added to
delay the control signal
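Delaying the control signal to match a pipelined operator can be modelled cycle by cycle. In this hypothetical Python sketch the result is computed immediately and then carried through a shift register of length `latency` together with the control bit, so only timing is modelled; `PipelinedOperator` is an illustrative name and `latency` must be at least 1.

```python
from collections import deque
from typing import Any, Callable, Optional, Tuple


class PipelinedOperator:
    """Hypothetical cycle-level model: the 'end' signal is the 'start'
    signal delayed by `latency` flip-flops, alongside the data."""

    def __init__(self, fn: Callable[[Any, Any], Any], latency: int) -> None:
        self.fn = fn
        # Each stage holds (control bit, datapath value); all start empty.
        self.stages: deque = deque([(False, None)] * latency, maxlen=latency)

    def clock(self, start: bool, a: Any = None, b: Any = None) -> Tuple[bool, Optional[Any]]:
        end, result = self.stages[0]  # oldest stage leaves the pipeline
        self.stages.append((start, self.fn(a, b) if start else None))
        return end, result


adder = PipelinedOperator(lambda a, b: a + b, latency=4)  # 4-cycle adder latency
outputs = [adder.clock(start=i < 3, a=i, b=10) for i in range(8)]
print([(cycle, r) for cycle, (end, r) in enumerate(outputs) if end])
# -> [(4, 10), (5, 11), (6, 12)]
```

Three back-to-back starts produce end pulses on cycles 4, 5 and 6, i.e. one result per cycle once the pipeline is full; a controller that waits for each end signal before issuing the next start sees the full latency on every operation.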

31
Application II - Digital Sine Cosine Generator

Let sin be the signal at time n
If

32
Application II - Digital Sine Cosine Generator

$cos_theta = new Float(23, 8, 0.9);
$cos_theta_p1 = new Float(23, 8, 1.9);
$cos_theta_m1 = new Float(23, 8, -0.1);
$s1[0] = new Float(23, 8, 0);
$s2[0] = new Float(23, 8, 1);
for ($i = 0; $i < 50; $i++) {
  $s1[$i+1] = $s1[$i] * $cos_theta + $s2[$i] * $cos_theta_p1;
  $s2[$i+1] = $s1[$i] * $cos_theta_m1 + $s2[$i] * $cos_theta;
}
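The recurrence simulated with the Float class can be checked in ordinary software. Its matrix [[cos θ, cos θ + 1], [cos θ - 1, cos θ]] has determinant 1 and eigenvalues cos θ ± j sin θ, so each step rotates the state by θ, and (1 - cos θ)s1² + (1 + cos θ)s2² is conserved in exact arithmetic (here 0.1 s1² + 1.9 s2² = 1.9). The sketch rounds every constant, product and sum to a chosen fraction width as a stand-in for the Float class; the helpers `quantize`, `dscg` and `error_db` are hypothetical and the exponent range is not modelled.

```python
import math
from typing import List, Optional, Tuple


def quantize(x: float, fraction_bits: int) -> float:
    """Round x to a hidden 1 plus fraction_bits bits (ties to even)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e, 0.5 <= |m| < 1
    return math.ldexp(round(math.ldexp(m, fraction_bits + 1)), e - fraction_bits - 1)


def dscg(steps: int, fraction_bits: Optional[int] = None) -> Tuple[List[float], List[float]]:
    """Iterate the slide's recurrence with cos(theta) = 0.9 (None = plain doubles)."""
    q = (lambda v: v) if fraction_bits is None else (lambda v: quantize(v, fraction_bits))
    c, c_p1, c_m1 = q(0.9), q(1.9), q(-0.1)
    s1, s2 = [q(0.0)], [q(1.0)]
    for i in range(steps):
        s1.append(q(q(s1[i] * c) + q(s2[i] * c_p1)))
        s2.append(q(q(s1[i] * c_m1) + q(s2[i] * c)))
    return s1, s2


def error_db(reference: List[float], approx: List[float]) -> float:
    """Error power relative to signal power, in dB."""
    noise = sum((a - r) ** 2 for r, a in zip(reference, approx))
    signal = sum(r * r for r in reference)
    return 10.0 * math.log10(noise / signal) if noise > 0 else float("-inf")


ref1, ref2 = dscg(50)
for bits in (10, 16, 23):
    q1, q2 = dscg(50, bits)
    print(bits, round(error_db(ref1 + ref2, q1 + q2), 1))
```

More fraction bits give a lower (more negative) error; this error-versus-size trade-off is what the optimization slides feed into the cost function.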

33
Application III - Ordinary Differential Equation
Solver

Uses a modified Fly compiler to solve an ordinary
differential equation
Uses Euler's method, y(n+1) = y(n) + h f(t(n), y(n)),
where h is the step size
The example involves floating point addition,
subtraction and multiplication

34
Application III - Ordinary Differential Equation
Solver

$h = read_host(1);
$t = 0.0; $y = 1.0; $dy = 0.0;
$onehalf = 0.5; $index = 0;
while ($t < 3.0) {
  $t1 = $h .* $onehalf; $t2 = $t .- $y;
  $dy = $t1 .* $t2; $t = $t .+ $h;
  $y = $y .+ $dy; $index = $index + 1;
  $void = write_host($y, $index);
}
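The update dy = h · 0.5 · (t - y) in this program is an Euler step for dy/dt = (t - y)/2 with y(0) = 1. The same program in Python gives a software reference for checking the hardware output; `write_host` is modelled by appending to a list, and the function name `euler_ode` is hypothetical.

```python
from typing import List, Tuple


def euler_ode(h: float, t_end: float = 3.0) -> List[Tuple[float, int]]:
    """Software mirror of the Fly ODE program (Euler's method)."""
    t, y, dy = 0.0, 1.0, 0.0
    onehalf, index = 0.5, 0
    written: List[Tuple[float, int]] = []  # stands in for write_host(y, index)
    while t < t_end:
        t1 = h * onehalf
        t2 = t - y
        dy = t1 * t2                       # dy = h * (t - y) / 2
        t = t + h
        y = y + dy
        index = index + 1
        written.append((y, index))
    return written


out = euler_ode(1 / 16)
print(len(out), out[-1][0])
```

Because the equation is linear, the Euler iterates have the closed form y(n) = t(n) - 2 + 3(1 - h/2)^n; with h = 1/16 the loop runs 48 times and ends near y = 1.654, against the exact y(3) = 1 + 3e^(-1.5) ≈ 1.669.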

35
Summary

The Float environment was introduced
The Float class allows the size of floating point
operators to be determined while maintaining a
certain level of accuracy
Area can be reduced through optimization, so
more logic can be implemented on the FPGA
Module generation allows the Fly compiler to support
arbitrary-sized floating point arithmetic
Floating point algorithms can be implemented on
FPGAs with ease
Translation from floating point to fixed point is
no longer required
DSCG and ODE applications were given

36
Function Generator
37
Introduction

In software systems, standard mathematical library
functions are available
In hardware design, mathematical library functions
must be implemented by the designer
A general method that allows any differentiable
function to be generated is desirable
The STAM (Symmetric Table Addition Method) approach
was adopted
Integrated into the Fly compiler

38
STAM datapath
Symmetry properties were removed from the
implementation for simplicity
39
Implementation using VHDL

A Perl program automates the generation of VHDL
code using the STAM algorithm
The program preprocesses the VHDL design; the STAM
specification is placed inside VHDL comments
BlockRAMs store the table entries
The generated design can be used directly in VHDL

40
VHDL Preprocessor
41
Floating Point extension

The original STAM applies to fixed point
arithmetic
A minor add-on lets STAM handle floating
point arithmetic
Floating point x^(-3/2) is implemented using STAM
and the floating point library
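The table-plus-addition idea behind STAM can be illustrated with the two-table (bipartite) case, which the STAM family generalizes to more tables; as on the STAM datapath slide, symmetry is not exploited. In this sketch, which is not the authors' generator, x^(-3/2) for x > 0 is split into an exact exponent part and a mantissa part m^(-3/2), m in [1, 2), approximated as A(x0, x1) + B(x0, x2) from the top, middle and bottom 4 bits of a 12-bit fraction. The field widths, the mantissa truncation, the handling of odd exponents with the constant 2^(-3/2), and the names `mantissa_power_m15` and `power_m15` are assumptions; table entries stay as Python floats instead of being rounded to an output width.

```python
import math

N0, N1, N2 = 4, 4, 4  # hypothetical split of a 12-bit mantissa fraction
N = N0 + N1 + N2


def f(m: float) -> float:
    return m ** -1.5


def df(m: float) -> float:
    return -1.5 * m ** -2.5


C1 = (2**N1 - 1) / 2 / 2**(N0 + N1)  # centre of the range covered by field x1
C2 = (2**N2 - 1) / 2 / 2**N          # centre of the range covered by field x2

# Table A(x0, x1): f at the centre of each x2 sub-interval.
TABLE_A = [[f(1 + x0 / 2**N0 + x1 / 2**(N0 + N1) + C2) for x1 in range(2**N1)]
           for x0 in range(2**N0)]
# Table B(x0, x2): slope at the centre of each x0 interval times the x2 offset.
TABLE_B = [[df(1 + x0 / 2**N0 + C1 + C2) * (x2 / 2**N - C2) for x2 in range(2**N2)]
           for x0 in range(2**N0)]


def mantissa_power_m15(x: int) -> float:
    """Approximate (1 + x / 2**N) ** -1.5 with two lookups and one addition."""
    x0 = x >> (N1 + N2)
    x1 = (x >> N2) & (2**N1 - 1)
    x2 = x & (2**N2 - 1)
    return TABLE_A[x0][x1] + TABLE_B[x0][x2]


def power_m15(value: float) -> float:
    """value ** -1.5 for value > 0: exponent handled exactly, mantissa by table."""
    m, e = math.frexp(value)       # value = m * 2**e, 0.5 <= m < 1
    m, e = 2.0 * m, e - 1          # renormalise to 1 <= m < 2
    r = mantissa_power_m15(int((m - 1.0) * 2**N))  # truncate to N fraction bits
    if e % 2:                      # odd exponent: fold 2 ** -1.5 in as a constant
        r *= 2.0 ** -1.5
        e -= 1
    return math.ldexp(r, -3 * e // 2)  # e is even, so -3e/2 is an integer


print(power_m15(4.0), 4.0 ** -1.5)
```

The two tables hold 2 × 256 entries instead of the 4096 a direct lookup on 12 bits would need, at the cost of one adder; for these widths the combined relative error stays below 0.1%.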

42
Floating Point extension
43
Fly integration

Start and end signals are attached to the power15
entity
A built-in function _power15() is introduced
into the Fly compiler with slight modification

44
Application IV - N-Body Problem Simulation

Calculates the acceleration of each particle
by iteration
Uses Fly, Float and the function generator in this
application

45
N-Body problem - Fly implementation

# initialization, fetch xi, yi, zi
while ($j < $n) {
  # fetch xj, yj, zj from memory
  $xj = read_host($index);
  $index = $index + 1;
  $yj = read_host($index);
  $index = $index + 1;
  $zj = read_host($index);
  $index = $index + 2;
  $diffx = $xj .- $xi;
  $diffy = $yj .- $yi;
  $diffz = $zj .- $zi;
  $x = $diffx .* $diffx;
  $y = $diffy .* $diffy;
  $z = $diffz .* $diffz;
  $r1 = $x .+ $y;
  $r2 = $z .+ $epsilon;
  # calculate rij
  $rij = $r1 .+ $r2;
  # call built-in function: power -1.5
  $tmp2 = _power15($rij);
  $tmpx = $tmp2 .* $diffx;
  $tmpy = $tmp2 .* $diffy;
  $tmpz = $tmp2 .* $diffz;
  # accumulate a
  $ax = $ax .+ $tmpx;
  $ay = $ay .+ $tmpy;
  $az = $az .+ $tmpz;
  $j = $j + 1;
}
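A software version of this inner loop is useful as a golden reference for the hardware. The sketch mirrors the listing, assuming j runs over all n particles: it sums (r_j - r_i)(|r_j - r_i|² + ε)^(-3/2), where ε keeps the j = i term (zero difference vector) finite, so ε must be positive. As in the listing, no masses or gravitational constant appear; the function name `accelerations` is hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]


def accelerations(pos: List[Vec], epsilon: float) -> List[Vec]:
    """Mirror of the Fly inner loop, run for every particle i."""
    result: List[Vec] = []
    for xi, yi, zi in pos:
        ax = ay = az = 0.0
        for xj, yj, zj in pos:      # j over all particles, j == i included
            diffx, diffy, diffz = xj - xi, yj - yi, zj - zi
            r1 = diffx * diffx + diffy * diffy
            r2 = diffz * diffz + epsilon
            rij = r1 + r2
            tmp2 = rij ** -1.5      # the _power15() call
            ax += tmp2 * diffx      # accumulate a
            ay += tmp2 * diffy
            az += tmp2 * diffz
        result.append((ax, ay, az))
    return result


print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 1e-12))
```

Two particles one unit apart pull on each other with acceleration of magnitude almost exactly 1 in opposite directions; the self term contributes zero because its difference vector is zero.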
46
Summary

The STAM approach enhances the flexibility of the Fly
compiler
Arbitrary mathematical functions are now supported
through table lookup
The mechanism is similar to software programming
The N-body problem simulation shows that a real-world
problem can be solved with this framework

47
Results
48
Experimental Environment

The framework was integrated into the Pilchard
FPGA platform
Pilchard uses a DIMM memory bus interface instead
of the PCI bus (lower latency and higher bandwidth
than PCI)
The compilation and implementation process is
transparent to the user

49
Result - Application I - GCD

A GCD coprocessor was implemented using the Fly
system
Implemented on Pilchard (Xilinx XCV300E-8)
16-bit fixed point integers
Max. frequency: 126 MHz
Slices used: 135 out of 3072 slices
Computes a GCD every 1.63ms (including all
interface overheads)

50
Result - Floating Point Generator

Floating point operators were implemented
Implemented on Pilchard (Xilinx XCV1000E-6)
Different fraction sizes were measured with an
exponent size of 8
Max. frequency (multiplier): 103 MHz
Max. frequency (adder): 58 MHz
The results were used to model the area relationship

51
Result - Floating Point Generator
52
Result - Digital Sine Cosine Generator

Used the Float class in simulation to optimize the
size required for floating point operations
Used the Fly compiler to produce the bitstream for
implementation
Implemented on Pilchard (Xilinx XCV1000E-6)
Max. frequency: 52.38 MHz
Area used: 3470 out of 12288 slices

53
Result Reference Output
54
Result Quantization Error
55
Result Optimization
56
Result - Ordinary Differential Equation

Used the Fly compiler to generate the bitstream
The floating point library was used to handle
floating point arithmetic
Implemented on Pilchard (Xilinx XCV1000E-6)
Max. frequency: 64.5 MHz
Slices used: 2,349 out of 3,072 slices (for
single precision arithmetic)
For h = 1/16, one execution takes 28.7 µs
(including all interface overheads)

57
Result - N-Body Problem Simulation

Used the Fly compiler to generate the bitstream
The floating point library was used to handle
floating point arithmetic
N = 10
Implemented on Pilchard (Xilinx XCV1000E-6)
Max. frequency: 44.79 MHz
Area: 5475 out of 12288 slices

58
Summary
59
Conclusion
60
Conclusion

A framework consisting of hardware compilation,
module generators and floating point arithmetic was
introduced
It allows any designer to use a programming language
to implement a design
A hardware design background is no longer required

61
Conclusion

Single description for both software and hardware
Saves time in
Software debugging
Hardware interfacing
Retraining and learning hardware design knowledge
Reduces errors when
Translating a software design into a hardware
datapath and control signals
Increases productivity

62
Conclusion

Floating Point Arithmetic
No longer need to convert floating point algorithms
to fixed point, so time and errors are reduced
A floating point library allows a wide range of
floating point designs to be implemented on
FPGAs
Parameterized floating point operators can save
resources or enhance accuracy to suit
different design constraints

63
Conclusion

Elementary Arithmetic
It is not necessary to implement a mathematical
library for each design
Automatic generation saves design time
It was demonstrated that the combination of Fly,
Float and the function generator greatly reduces the
design effort required for the development of
complex floating point applications such as the
N-body problem

64
Conclusion

Future Directions
Allow different state machine generation
mechanisms
Enhance the efficiency of certain implementations
Generate fully pipelined designs
Fully utilize the datapath
Detect parallelism automatically
Speed up designs in the hardware environment
Generate arbitrary functions for floating point
arithmetic

65
Publication

C.H. Ho, M.P. Leong, P.H.W. Leong, J. Becker,
M. Glesner, "Rapid Prototyping of FPGA based
Floating Point DSP Systems", in Proceedings of
IEEE International Workshop on Rapid System
Prototyping, July 2002
C.H. Ho, P.H.W. Leong, K.H. Tsoi, R. Ludewig, P.
Zipf, A.G. Ortiz, M. Glesner, "Fly - A Modifiable
Hardware Compiler", in Proceedings of
International Conference on Field Programmable
Logic and Applications, September 2002.
C.H. Ho, K.H. Tsoi, H.C. Yeung, Y.M. Lam, K.H.
Lee, P.H.W. Leong, R. Ludewig, P. Zipf, A.G.
Ortiz, M. Glesner, "Arbitrary Function
Approximation in HDLs", submitted to Proceedings
of IEEE International Conference on
Field-Programmable Technology, December 2003.