Automatic Synthesis and Optimization of Floating Point Hardware

1
Automatic Synthesis and Optimization of Floating
Point Hardware
  • Ho Chun Hok
  • Department of Computer Science and Engineering
  • The Chinese University of Hong Kong

18 July 2003
2
Overview
  • Introduction
  • Fly: Modifiable Compiler
  • Float: Floating Point Library
  • Function Generator
  • Results
  • Conclusion

3
Introduction
  • Hardware Description Language (HDL) based design
    has shortcomings
  • Hardware designs are parallel, but people think
    in von Neumann patterns
  • Decomposing a hardware design into datapath and
    control signals is complex
  • Errors may be introduced during the translation
  • Debugging hardware is harder than debugging
    software
  • A hardware interface for the FPGA board must be
    developed
  • The designer must have a strong background in
    hardware design

4
Introduction
  • Elementary Functions are not supported
  • No floating point arithmetic
  • No standard mathematical library like <math.h>
  • No log, sin, cos, 1/x, ...
  • The size of FPGA is limited.
  • Area is an essential factor of a design

5
Motivations
  • Is it possible to use a single description for
    both software and hardware design?
  • Can we optimize floating point arithmetic in
    hardware to save resources?
  • In hardware design, can we introduce a
    mathematical library just as software does?

6
Objectives
  • Main goal: use the smallest effort to develop
    hardware on FPGA
  • No need to be familiar with hardware design
  • The compilation from description to hardware is
    transparent to the designer
  • Floating point arithmetic is supported
  • An elementary mathematical library is provided,
    as in software programming

7
Contributions
  • A framework with 3 modules was developed
  • Fly: Modifiable Hardware Compiler
  • Translates a description into a datapath
  • Float: Floating Point Arithmetic Library
  • Provides parameterized floating point operators
    and an optimization engine
  • Function Generator
  • Generates any differentiable function; can be
    regarded as a mathematical library

8
Contributions
  • Applications Developed using this framework
  • Greatest Common Divisor Coprocessor
  • Digital Sine-Cosine Generator (DSCG)
  • Ordinary Differential Equation Solver (ODE)
  • N-Body Problem Simulator
  • Ranging from fixed point designs to floating
    point ones

9
Contribution: Traditional Design Flow
10
Contribution: Revised Design Flow
The hardware process is transparent to the designer
11
Fly Hardware Compiler
12
Introduction
  • Fly is easily extensible
  • Source code can be easily understood and modified
  • Support common programming constructs
  • Fly language supports
  • Register assignment
  • Parallel statements
  • If else branches
  • While loops
  • Built-in functions
  • Comments

13
Fly Programming Language
Main elements
14
Fly Programming Language
  • Compilation Technique
  • Uses Page's compilation technique
  • Each statement has associated start and end
    signals
  • Fly constructs a one-hot state machine (i.e. the
    control part of the hardware design) from the
    program by cascading the signals
  • The Fly compiler implementation is simple and
    concise due to the use of Perl as the
    development language
  • One pass compilation
  • Outputs VHDL code
  • Can support different FPGA and ASIC design tools
  • Gives opportunity for synthesis tools to perform
    further logic optimization
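
As a rough illustration of the start/end signal cascade described above, the following Python sketch simulates a one-hot controller in software. It is not the Fly compiler's actual output: the `run` function, the `(action, successor)` encoding and the small summation program are all hypothetical, and parallel statements (which would activate several flip-flops at once) are left out.

```python
def run(program, regs, max_cycles=10_000):
    """Simulate a one-hot controller.

    program: list of (action, successor) pairs, one per statement.
    action(regs) performs the statement when its start signal is high.
    successor(regs) names the statement driven by this statement's end
    signal, or returns None when the program has finished.
    """
    state = [False] * len(program)   # one flip-flop per statement
    state[0] = True                  # start signal of the first statement
    cycles = 0
    while any(state):
        assert sum(state) == 1       # one-hot invariant (no parallel blocks)
        if cycles >= max_cycles:
            raise RuntimeError("program did not terminate")
        i = state.index(True)
        action, successor = program[i]
        action(regs)
        nxt = successor(regs)
        state = [False] * len(program)
        if nxt is not None:
            state[nxt] = True        # end signal becomes the next start signal
        cycles += 1
    return cycles


# Hypothetical program: acc = 0; while (n != 0) { acc = acc + n; n = n - 1; }
program = [
    (lambda r: r.update(acc=0),                 lambda r: 1),
    (lambda r: None,                            lambda r: 2 if r["n"] != 0 else None),
    (lambda r: r.update(acc=r["acc"] + r["n"]), lambda r: 3),
    (lambda r: r.update(n=r["n"] - 1),          lambda r: 1),
]
regs = {"n": 3, "acc": 0}
cycles = run(program, regs)
print(regs["acc"], cycles)
```

The cycle count printed here belongs to this toy model only and says nothing about the timing of the generated hardware.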

15
Application I - GCD Example
    s = din1; l = din2;
    while (s != l) {
        a = l - s;
        if (a > 0) {
            l = a;
        } else {
            s = l; l = s;    # swap (the two assignments run in parallel)
        }
    }
    dout1 = l;
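
For readers who want to check the algorithm in software, here is a minimal Python sketch of the same subtract-and-swap loop. The function name `gcd_subtract` is an assumption made for illustration; the variable names follow the slide. Both inputs are assumed to be positive, since a zero input would never terminate.

```python
import math


def gcd_subtract(din1, din2):
    """Subtraction-based GCD mirroring the slide's loop (inputs must be > 0)."""
    s, l = din1, din2
    while s != l:
        a = l - s
        if a > 0:
            l = a            # l was larger: replace it with the difference
        else:
            s, l = l, s      # s was larger: swap so the next pass subtracts
    return l


print(gcd_subtract(12, 18), math.gcd(12, 18))
```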

16
Resultant Datapath
  • s = din1; l = din2;
  • else s = l; l = s;

17
Resultant Datapath
  • while (s != l)

18
Resultant Datapath
  • if (a > 0) l = a; else

19
Host Interface (register)
  • s = din1
  • dout1 = l

20
Summary
  • Input: Perl-like description of a floating point
    design
  • Output: synthesizable VHDL code for
    implementation
  • Datapath
  • One-hot state machine (control signals)
  • A host interface was introduced
  • The datapath is correct because it is
    constructed automatically
  • Errors in translating a software algorithm into
    datapath and control signals are eliminated
  • Bitstream generation is transparent to the user
  • A GCD coprocessor was given as an example

21
Float: Floating Point Design Environment
22
Introduction
  • Many applications involve floating point
    operations
  • Graphical transformation
  • Scientific simulation
  • Floating point arithmetic is seldom implemented
    on FPGA systems
  • Implementing floating point arithmetic on FPGA
    is possible
  • Larger area
  • Higher speed
  • Arbitrarily sized floating point formats are
    possible on FPGA
  • Allows more flexible designs

23
Introduction
  • Float Class
  • Optimizes the floating point algorithm during
    simulation
  • An instance of the Float class represents a
    floating point variable when simulating the
    algorithm
  • VHDL Floating Point Generator
  • Generates arbitrarily sized floating point
    adders and multipliers
  • Integrated into the Fly environment

24
Float Design Environment
  • Float Class
  • Encapsulates the floating point data structure
  • Arbitrary exponent and fraction size
  • Implemented in Perl
  • Supports several methods for floating point
    operations

25
Float Design Environment
  • Float Class Attributes
  • Sign, exponent, fraction
  • Size of exponent and fraction
  • Maximum magnitude
  • Used to determine the minimum exponent size
    required
  • Circuit size required for the floating point
    operation

26
Float Design Environment
  • Float Class Method Support
  • add()
  • multiply()
  • setExponetSize()
  • setFractionSize()
  • setValue()
  • getValue()
  • getCircuitSize()
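
To make the idea of a parameterized floating point value concrete, here is a small Python model. It is only a sketch: the original class is written in Perl and its internals are not shown in the slides, so the rounding mode (round to nearest, ties to even), the flush-to-zero behaviour and the method names below are assumptions. The constructor order (fraction size, exponent size, value) follows the `new Float(23, 8, 0.9)` calls used later in the deck.

```python
import math


class Float:
    """Toy model of a floating point value with configurable field widths."""

    def __init__(self, fraction_size, exponent_size, value=0.0):
        self.fraction_size = fraction_size
        self.exponent_size = exponent_size
        self.set_value(value)

    def set_value(self, value):
        self.value = self._quantize(value)

    def get_value(self):
        return self.value

    def _quantize(self, x):
        if x == 0.0:
            return 0.0
        m, e = math.frexp(x)                    # x = m * 2**e, 0.5 <= |m| < 1
        scale = 1 << (self.fraction_size + 1)   # hidden bit plus fraction bits
        m = round(m * scale) / scale            # nearest, ties to even
        bias = (1 << (self.exponent_size - 1)) - 1
        if e - 1 > bias:
            raise OverflowError("exponent field too small for this value")
        if e - 1 < 1 - bias:
            return 0.0                          # assumption: flush to zero
        return math.ldexp(m, e)

    def add(self, other):
        return Float(self.fraction_size, self.exponent_size,
                     self.value + other.value)

    def multiply(self, other):
        return Float(self.fraction_size, self.exponent_size,
                     self.value * other.value)


# With 4 fraction bits, 1 + 1/32 cannot be represented and rounds back to 1.
narrow = Float(4, 8, 1.0).add(Float(4, 8, 1 / 32))
wide = Float(5, 8, 1.0).add(Float(5, 8, 1 / 32))
print(narrow.get_value(), wide.get_value())
```

Running an algorithm with such objects at different widths is one way to observe how quantization error grows as the fraction shrinks.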

27
Float Design Environment
  • Optimization
  • Input: accuracy, resource constraint
  • Output: size of each floating point operator
  • The Nelder-Mead method is used to minimize the
    cost function

28
Float Design Environment
  • Cost Function
  • Adder size
  • Multiplier size
  • Quantization Error (dB)
  • Cost Function
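
The cost function itself is not recoverable from the transcript, so the sketch below only illustrates the general approach: a simplified Nelder-Mead search applied to a made-up cost that trades operator area against quantization error in dB. The area model, the error model, the target of -60 dB and the penalty weight are all assumptions chosen for illustration, not values from the presentation.

```python
import math


def nelder_mead(f, x0, step=1.0, iterations=300):
    """Simplified Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iterations):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [2.0 * c - w for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [3.0 * c - 2.0 * w for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = [0.5 * (c + w) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [[0.5 * (b + x) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
    return min(simplex, key=f)


def area(widths):
    """Hypothetical area model: adder linear, multiplier quadratic in width."""
    add_bits, mul_bits = widths
    return 10.0 * add_bits + mul_bits ** 2


def error_db(widths):
    """Hypothetical error model: rounding noise power of about 2**(-2w) per operator."""
    return 10.0 * math.log10(sum(4.0 ** -w for w in widths))


def cost(widths, target_db=-60.0, penalty=100.0):
    excess = max(0.0, error_db(widths) - target_db)   # dB above the target
    return area(widths) + penalty * excess ** 2


start = [24.0, 24.0]
best = nelder_mead(cost, start)
print([math.ceil(w) for w in best], round(error_db(best), 2))
```

The continuous result is rounded up to whole bit widths at the end, since a real operator cannot have a fractional number of fraction bits.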

29
Float Design Environment
  • VHDL Floating Point Generator
  • Generates parameterized adders and multipliers
    with arbitrary exponent and fraction sizes
  • Fully pipelined design
  • Latency of multiplication: 8 cycles
  • Latency of addition: 4 cycles
  • Throughput: one result per clock cycle
  • A module written in Perl serves as the interface
    to the library
  • Compatible with the Fly compiler through start
    and end signals

30
Integration into fly compiler
  • C = A + B
  • The datapath for integer addition needs one
    clock cycle to complete
  • C = A .+ B
  • The datapath for a floating point operation
    needs more cycles to complete, so more
    flip-flops are added to delay the control signal
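
The delayed control signal can be pictured as a chain of flip-flops whose length equals the operator latency. The Python sketch below models such a chain; the class name `DelayedEnd` is hypothetical and the latency of 4 is taken from the adder figure quoted earlier in the deck.

```python
from collections import deque


class DelayedEnd:
    """Shift register that turns a start pulse into an end pulse `latency` cycles later."""

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        # One flip-flop per pipeline stage, all cleared at reset.
        self.pipe = deque([False] * latency, maxlen=latency)

    def clock(self, start):
        end = self.pipe[0]        # value leaving the last flip-flop
        self.pipe.append(start)   # maxlen drops the oldest entry automatically
        return end


adder_end = DelayedEnd(latency=4)
trace = [adder_end.clock(cycle == 0) for cycle in range(6)]
print(trace)
```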

31
Application II - Digital Sine Cosine Generator
  • Let sin be the signal at time n
  • If

32
Application II - Digital Sine Cosine Generator
    $cos_theta    = new Float(23, 8, 0.9);
    $cos_theta_p1 = new Float(23, 8, 1.9);
    $cos_theta_m1 = new Float(23, 8, -0.1);
    $s1[0] = new Float(23, 8, 0);
    $s2[0] = new Float(23, 8, 1);
    for ($i = 0; $i < 50; $i++) {
        $s1[$i+1] = $s1[$i] * $cos_theta
                  + $s2[$i] * $cos_theta_p1;
        $s2[$i+1] = $s1[$i] * $cos_theta_m1
                  + $s2[$i] * $cos_theta;
    }
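
The recurrence in the listing above can be checked in ordinary double precision. The Python sketch below is an assumption-laden restatement (the function name `dscg` is invented, and the closed forms in the comments are derived here, not taken from the slides): the update matrix has determinant 1, so the two sequences trace a sinusoid with s2[n] = cos(n·θ) and s1[n] = sin(n·θ)·(1 + cos θ)/sin θ.

```python
import math


def dscg(cos_theta, steps):
    """Iterate the two-state sine-cosine recurrence from s1 = 0, s2 = 1."""
    s1, s2 = [0.0], [1.0]
    for i in range(steps):
        s1.append(s1[i] * cos_theta + s2[i] * (cos_theta + 1.0))
        s2.append(s1[i] * (cos_theta - 1.0) + s2[i] * cos_theta)
    return s1, s2


theta = math.acos(0.9)
s1, s2 = dscg(0.9, 50)
worst = max(abs(s2[n] - math.cos(n * theta)) for n in range(51))
print(worst)
```

Comparing this double precision reference with a reduced-width run is one way to measure the quantization error that the optimization step trades against area.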

33
Application III - Ordinary Differential Equation
Solver
  • Uses the modified Fly compiler to solve an
    ordinary differential equation
  • Uses Euler's method; h is the step size
  • The example involves floating point addition,
    subtraction and multiplication

34
Application III - Ordinary Differential Equation
Solver
    h = read_host(1);
    t = 0.0; y = 1.0; dy = 0.0;
    onehalf = 0.5; index = 0;
    while (t < 3.0) {
        t1 = h .* onehalf; t2 = t .- y;
        dy = t1 .* t2; t = t .+ h;
        y = y .+ dy; index = index + 1;
        void = write_host(y, index);
    }
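
Read literally, the loop above advances dy = (h/2)(t - y), which is Euler's method for y' = (t - y)/2 with y(0) = 1 on the interval [0, 3]. The Python sketch below repeats that loop in software; the function names and the closed-form solution used as a check are additions for illustration and do not appear in the slides.

```python
import math


def euler_ode(h, t_end=3.0):
    """Euler's method for y' = (t - y) / 2, y(0) = 1, following the slide's loop."""
    t, y = 0.0, 1.0
    history = []
    while t < t_end:
        dy = (h * 0.5) * (t - y)   # t1 * t2 in the listing
        t = t + h
        y = y + dy
        history.append((t, y))
    return history


def exact(t):
    """Closed-form solution, used only to check the numerical result."""
    return 3.0 * math.exp(-t / 2.0) - 2.0 + t


steps = euler_ode(1 / 16)
t_final, y_final = steps[-1]
print(len(steps), t_final, y_final, exact(t_final))
```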

35
Summary
  • The Float environment was introduced
  • The Float class makes it possible to determine
    the size of each floating point operation while
    maintaining a given level of accuracy
  • Area can be reduced through optimization
  • As a result, more logic can be implemented on
    the FPGA
  • Module generation allows the Fly compiler to
    support arbitrarily sized floating point
    arithmetic
  • Floating point algorithms can be implemented on
    FPGA with ease
  • Translation from floating point to fixed point
    is no longer required
  • DSCG and ODE applications were given

36
Function Generator
37
Introduction
  • In software systems, standard mathematical
    library functions are available
  • In hardware design, the mathematical library
    must be implemented by the designer
  • A general method that allows generation of
    arbitrary differentiable functions is desirable
  • The Symmetric Table Addition Method (STAM) was
    adopted
  • Integrated into the Fly compiler

38
STAM datapath
The symmetry properties were left out of the
implementation for simplicity
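
The datapath figure is not part of the transcript, so the following Python sketch shows the underlying idea in its simplest form: a two-table (bipartite) addition scheme without the symmetry optimization. The input on [1, 2) is split into three bit fields; one table holds function values and the other holds a first-order correction, and the result is the sum of two lookups. The field widths, the function names and the use of unrounded table entries are assumptions made for illustration.

```python
def build_tables(f, df, n0=4, n1=4, n2=4):
    """Build the two lookup tables for x = 1 + x0 + x1 + x2 on [1, 2)."""
    u0 = 2.0 ** -n0                 # weight of one step of the x0 field
    u1 = 2.0 ** -(n0 + n1)          # weight of one step of the x1 field
    u2 = 2.0 ** -(n0 + n1 + n2)     # weight of one step of the x2 field
    d1, d2 = u0 / 2, u1 / 2         # midpoints of the x1 and x2 ranges
    a0 = [[f(1 + i0 * u0 + i1 * u1 + d2) for i1 in range(1 << n1)]
          for i0 in range(1 << n0)]
    a1 = [[df(1 + i0 * u0 + d1 + d2) * (i2 * u2 - d2) for i2 in range(1 << n2)]
          for i0 in range(1 << n0)]
    return a0, a1


def lookup(a0, a1, index, n1=4, n2=4):
    """Evaluate the approximation for the fixed point input 1 + index / 2**(n0+n1+n2)."""
    i2 = index & ((1 << n2) - 1)
    i1 = (index >> n2) & ((1 << n1) - 1)
    i0 = index >> (n1 + n2)
    return a0[i0][i1] + a1[i0][i2]


def f(x):
    return x ** -1.5


def df(x):
    return -1.5 * x ** -2.5


a0, a1 = build_tables(f, df)
max_error = max(abs(lookup(a0, a1, i) - f(1 + i / 4096)) for i in range(4096))
print(max_error)
```

With these widths the two tables hold 256 entries each, compared with 4096 entries for a single direct lookup table over the same 12-bit input.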
39
Implementation using VHDL
  • A Perl program automates the generation of VHDL
    code using the STAM algorithm
  • The program preprocesses the VHDL design; the
    STAM specification is placed inside a comment
  • Block RAM stores the table entries
  • The generated design can be used directly in
    VHDL

40
VHDL Preprocessor
41
Floating Point extension
  • The original STAM applies to fixed point
    arithmetic
  • A minor add-on lets STAM handle floating point
    arithmetic
  • Floating point evaluation of v^(-3/2) is
    implemented using STAM and the floating point
    library
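
The slides do not show what the add-on looks like, so the sketch below is only one plausible way to extend a fixed-range approximation to floating point inputs: split v into mantissa and exponent, fold an odd exponent into the mantissa, evaluate the function on the reduced range [1, 4), and rescale by a power of two. The function `mantissa_pow15` stands in for the table lookup and simply calls the exact power; both function names are hypothetical.

```python
import math


def mantissa_pow15(m):
    """Stand-in for the table-based approximation of m**-1.5 on 1 <= m < 4."""
    return m ** -1.5


def power15(v):
    """Compute v**-1.5 for v > 0 by range reduction on the exponent."""
    if v <= 0.0:
        raise ValueError("v must be positive")
    m, e = math.frexp(v)          # v = m * 2**e with 0.5 <= m < 1
    m, e = m * 2.0, e - 1         # renormalize to 1 <= m < 2
    if e % 2:                     # odd exponent: move one factor of 2 into m
        m, e = m * 2.0, e - 1     # now 1 <= m < 4 and e is even
    # v**-1.5 = m**-1.5 * 2**(-1.5 * e), and -1.5 * e is an integer for even e
    return math.ldexp(mantissa_pow15(m), -3 * (e // 2))


for v in (0.3, 1.0, 2.0, 7.5, 1.0e6):
    print(v, power15(v), v ** -1.5)
```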

42
Floating Point extension
43
Fly integration
  • Start and end signals are attached to the
    power15 entity
  • A built-in function _power15() is introduced
    into the Fly compiler with a slight modification

44
Application IV - N-Body Problem Simulation
  • Calculates the acceleration (force) on each
    particle by iteration
  • Uses Fly, Float, and the function generator in
    this application

45
N-Body problem - Fly implementation
    # initialization, fetch xi, yi, zi
    while (j < n) {
        # fetch xj, yj, zj from memory
        xj = read_host(index);
        index = index + 1;
        yj = read_host(index);
        index = index + 1;
        zj = read_host(index);
        index = index + 2;
        diffx = xj .- xi;
        diffy = yj .- yi;
        diffz = zj .- zi;
        x = diffx .* diffx;
        y = diffy .* diffy;
        z = diffz .* diffz;
        r1 = x .+ y; r2 = z .+ epsilon;
        # calculate rij
        rij = r1 .+ r2;
        # call built-in function power -1.5
        tmp2 = _power15(rij);
        tmpx = tmp2 .* diffx;
        tmpy = tmp2 .* diffy;
        tmpz = tmp2 .* diffz;
        # accumulate a
        ax = ax .+ tmpx;
        ay = ay .+ tmpy;
        az = az .+ tmpz;
        j = j + 1;
    }
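
The inner loop above can be mirrored in a few lines of Python for checking results on the host. This is a sketch under assumptions: the function name `acceleration` and its argument layout are invented, masses are taken as 1 because the listing carries none, and `epsilon` is the softening term added to the squared distance.

```python
import math


def acceleration(pi, bodies, epsilon):
    """Sum diff * (|diff|^2 + epsilon)^(-3/2) over all other bodies."""
    xi, yi, zi = pi
    ax = ay = az = 0.0
    for xj, yj, zj in bodies:
        diffx, diffy, diffz = xj - xi, yj - yi, zj - zi
        rij = diffx * diffx + diffy * diffy + diffz * diffz + epsilon
        tmp2 = rij ** -1.5            # the role played by _power15 in the listing
        ax += tmp2 * diffx
        ay += tmp2 * diffy
        az += tmp2 * diffz
    return ax, ay, az


a = acceleration((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)], 0.0)
print(a)
```

With a positive `epsilon` the particle itself can stay in the list, since its zero offset contributes nothing to the sum.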
46
Summary
  • The STAM approach enhances the flexibility of
    the Fly compiler
  • Arbitrary mathematical functions are now
    supported through table lookup
  • The mechanism is similar to software programming
  • The N-body problem simulation shows that a
    real-world problem can be solved with this
    framework

47
Results
48
Experiment Environment
  • The framework was integrated into the Pilchard
    FPGA platform
  • Pilchard uses DIMM memory bus interface instead
    of PCI bus (lower latency and higher bandwidth
    than PCI)
  • Compilation and implementation process is
    transparent to the user

49
Result: Application I - GCD
  • A GCD coprocessor was implemented using the Fly
    system
  • Implemented on Pilchard (Xilinx XCV300E-8)
  • Fixed point, 16-bit integers
  • Max. frequency: 126 MHz
  • Slices used: 135 out of 3072
  • Computes a GCD every 1.63ms (including all
    interface overheads)

50
Result: Floating Point Generator
  • Floating point operators were implemented
  • Implemented on Pilchard (Xilinx XCV1000E-6)
  • Different fraction sizes were measured; the
    exponent size is 8
  • Max. frequency (multiplier): 103 MHz
  • Max. frequency (adder): 58 MHz
  • The results are used to model the area
    relationship

51
Result: Floating Point Generator
52
Result: Digital Sine Cosine Generator
  • The Float class is used in simulation to
    optimize the size required for each floating
    point operation
  • The Fly compiler produces the bitstream for
    implementation
  • Implemented on Pilchard (Xilinx XCV1000E-6)
  • Max. frequency: 52.38 MHz
  • Area used: 3470 out of 12288 slices

53
Result: Reference Output
54
Result: Quantization Error
55
Result: Optimization
56
Result: Ordinary Differential Equation
  • The Fly compiler is used to generate the
    bitstream
  • The floating point library handles the floating
    point arithmetic
  • Implemented on Pilchard (Xilinx XCV1000E-6)
  • Max. frequency: 64.5 MHz
  • Slices used: 2,349 out of 3,072 (for single
    precision arithmetic)
  • For h = 1/16, one execution takes 28.7 µs
    (including all interface overheads)

57
Result: N-Body Problem Simulation
  • The Fly compiler is used to generate the
    bitstream
  • The floating point library handles the floating
    point arithmetic
  • N = 10
  • Implemented on Pilchard (Xilinx XCV1000E-6)
  • Max. frequency: 44.79 MHz
  • Area: 5475 out of 12288 slices

58
Summary
59
Conclusion
60
Conclusion
  • A framework consisting of hardware compilation,
    module generators and floating point arithmetic
    was introduced
  • Allows any designer to use a programming
    language to implement a design
  • A hardware design background is no longer
    required

61
Conclusion
  • Single description for both software and hardware
  • Saves time in
  • Software debugging
  • Hardware interfacing
  • Retraining and learning hardware design knowledge
  • Reduces errors when
  • Translating a software design into hardware
    datapath and control signals
  • Productivity increases

62
Conclusion
  • Floating Point Arithmetic
  • No longer necessary to convert a floating point
    algorithm to fixed point, so time and errors are
    reduced
  • A floating point library allows a wide range of
    floating point designs to be implemented on
    FPGA
  • Parameterized floating point operations can save
    resources or enhance accuracy to suit different
    design constraints

63
Conclusion
  • Elementary Arithmetic
  • Not necessary to implement a mathematical
    library for each design
  • Automatic generation saves design time
  • It was demonstrated that the combination of Fly,
    Float and the function generator greatly reduces
    the design effort required to develop complex
    floating point applications such as the N-body
    problem

64
Conclusion
  • Future Directions
  • Allow different state machine generation
    mechanisms
  • Enhance the efficiency of certain
    implementations
  • Generate fully pipelined designs
  • Fully utilize the datapath
  • Detect parallelism automatically
  • Speed up the design in the hardware environment
  • Generate arbitrary functions for floating point
    arithmetic

65
Publication
  • C.H. Ho, M.P. Leong, P.H.W. Leong, J. Becker,
    M.Glesner, "Rapid Prototyping of FPGA based
    Floating Point DSP Systems", in Proceedings of
    IEEE International Workshop on Rapid System
    Prototyping, July 2002
  • C.H. Ho, P.H.W. Leong, K.H. Tsoi, R. Ludewig, P.
    Zipf, A.G. Ortiz, M.Glesner, "Fly - A Modifiable
    Hardware Compiler", in Proceedings of
    International Conference on Field Programmable
    Logic and Applications, September 2002.
  • C.H. Ho, K.H. Tsoi, H.C. Yeung, Y.M. Lam, K.H.
    Lee, P.H.W. Leong, R. Ludewig, P. Zipf, A.G.
    Ortiz, M. Glesner, "Arbitrary Function
    Approximation in HDLs", submitted to Proceedings
    of IEEE International Conference on
    Field-Programmable Technology, December 2003.

66
Thank You
  • Q and A