Design Techniques for Million Gate, High Speed FPGAs

1
Design Techniques for Million Gate, High Speed
FPGAs
Michael A. Bohm Chief Scientist Technical Fellow
Mentor Graphics
2
Agenda
  • The Problem
  • State-of-the-Art Technology
  • Design Issues
  • Performance Oriented Design

3
The Problem
How do we move mainstream designs from ASICs to
high performance FPGAs ??
4
State-of-the-Art 2000
  • Technology
  • Gate Count
  • Frequency
  • Clock Domains
  • Computer Hardware
  • Design Software
  • RTL Language
  • Design

5
"Those who cannot remember the past are
condemned to repeat it."
From The Life of Reason, by George Santayana,
1906
Technology is changing rapidly. It took 21 years
to get to a 1 GHz processor. It will take 1 year
to get to a 2 GHz processor.
6
State-of-the-Art Technology
Process Geometries
7
State-of-the-Art Gate Count
Gate Count (excluding memory)
8
State-of-the-Art Frequency
System Frequency
9
State-of-the-Art Clock Domains
10
State-of-the-Art Computer Design Hardware
APEX device   Virtex device   RAM     Virtual Swap
EP20K160E     XCV300          128MB   256MB
EP20K400E     XCV600          256MB   400MB
EP20K600E     XCV1000         512MB   800MB
EP20K1000E    XCV2000         1GB     1GB
EP20K1500E    XCV3200         1.5GB   2GB
11
State-of-the-Art RTL Language
System
Algorithm
RTL
Logic
Gate
  • Abstract Data Types
  • Design reusability
  • Compiled concepts
  • Design Management
  • Structure replication
  • Abstract Data Types
  • Design reusability
  • Compiled concepts
  • Design Management
  • Structure replication
  • Fixed Data Types
  • Easier to learn
  • Interpreted concepts
  • Gate Level Sign-off

12
State-of-the-Art Design
Text
  • Co-simulation within HDL simulator
  • Mix of HDL and user-defined C/C++
  • Behavioral Synthesis
  • Tight physical correlation.

Flow Chart
13
State-of-the-Art Failures
Failures
Logical 55%
Slow Path 13%
Clocking 10%
Power 6%
Race Condition 4%
Yield 4%
Misc 3%
IR drops 2%
Mixed signal interface 1%
FPGAs make a failure recoverable.
14
State-of-the-Art FPGA
  • Gate count by year (chart): 1997 through 2001, with
    levels of 100K, 500K, 1 Million, 2 Million, 3 Million
    and 10 Million gates
  • APEX and Virtex at 3 Million Gates
  • Maximum Operating Frequency is 200 MHz (pushing
    300 MHz)
  • Large blocks of memory
  • Embedded Processors (PowerPC, ARM, MIPS)
  • Copper interconnect

15
The Development Gap
Figure: Design Size plotted for Ability to Fabricate,
Ability to Design and Ability to Verify, with the Design
Gap and the Verification Gap marked between the curves.
16
System / SOC Design Methodology
Embedded Software Development
Hardware / Software Coverification
Pre-existing Hardware
Hardware Development
Pre-existing Software
Manufacturing
17
Adjusting to a New Methodology
  • Team Design
  • IP Logic
  • More software content
  • Heavy with memory
  • Less synthesis / more chip level assembly

Figure: design evolution
  • '97 - ASIC, 50-150K gates
  • '99 - SOC, 1M gates
  • '02 - SOC, 10M gates
Block labels: CPU, IP, Memory, Block1, BlockA, BlockB
18
Effects of the Design Flow
Figure: ratios of 20:1, 10:1, 5:1, 3:1 and 2:1, with
design formats VHDL, Verilog, C, Java; VHDL, Verilog, C;
and VHDL, Verilog, EDIF.
Higher abstraction provides more design choices!!
19
ASIC versus FPGA design
M per re-spin!!
FPGA Design
Fab Chip
Physical Design
FPGA Synthesis
Logic Verif.
Logic Design
System Verification with fewer iterations
RTL Prototype
Software Dev. and Debug
20
A Designer's Life
15
Design Specification
Beh / RTL Description
8
Functional Verification
15
7
Synthesis
Place & Route
15
20
Timing Validation
20
System Verification
21
How to make a better designer
  • Provide proper training
  • Designers went to college to learn digital logic
    design, but most have fewer than 10 hours of RTL
    training.
  • Provide a proven Design Methodology
  • Enforce Design for Quality techniques
  • Quality circuits are always easier to
    manufacture and are the most profitable.
  • Functionality is only a minor part of the design
    process. Using Performance Oriented Design
    techniques is the key to successful product
    development.

22
Performance Oriented Design Techniques
The Keys to Success
  • RTL Coding Styles
  • Design Architecture trade-offs
  • Design Structure
  • Timing Optimization
  • Physical Optimization

23
Coding style impact
  • Coding style does impact performance
  • It affects FPGAs more than ASICs
  • Different levels of RTL
  • Different descriptions give different results
  • Tools are also part of the equation
  • Different tools give different results
  • Get to know your tool!!!

24
The Keys to Language Synthesis
  • Data Types
  • Packages
  • Ports
  • Hierarchy
  • Combinational Logic
  • Relational Operators
  • Arithmetic Operators
  • Sequential Logic
  • Memory
  • IOs

25
Structuring A Design
  • A design should read like a book.
  • Table of contents: An explanation of the design
    structure.
  • Logical flow from beginning to end.
  • Chapters: Logical breaks in a design.
  • Commentary: Comments on complex structures in the
    design.

99% of all designs are unintelligible to another
designer!
26
Source Code Control
Revision Comparison
The main difference between hardware and software
is the control!
27
Hierarchy
Partitioning between logical and virtual
hierarchy is key!
28
Understand what the RTL does!!
Every time you use an if-then-else, a 2:1 mux
is built.
29
Serial / Priority Structure
The 1st branch of the if is the critical
signal. On some FPGAs, this structure is faster
than a case statement.
30
Parallel Structure
All logic branches are Equal.
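As a rough illustration of the serial and parallel structures described on these slides, the Python sketch below models an if/elsif chain as a cascade of 2:1 muxes and counts the mux levels each branch passes through, next to a balanced selection among the same inputs. All function names are invented for this sketch, and a real synthesis tool may map either structure differently.

```python
def mux2(sel, a, b):
    """2:1 mux: a when sel is true, otherwise b."""
    return a if sel else b


def priority_select(conds, data, default):
    """Model of if/elsif/.../else; conds[0] is the first branch."""
    out = default
    # Wire the chain from the last branch toward the first, so the
    # first branch controls the mux closest to the output.
    for sel, value in zip(reversed(conds), reversed(data)):
        out = mux2(sel, value, out)
    return out


def priority_levels(branch):
    """2:1 muxes between a branch's data (0 = first) and the output."""
    return branch + 1


def parallel_levels(n_inputs):
    """Levels in a balanced tree of 2:1 muxes choosing among n_inputs."""
    return (n_inputs - 1).bit_length()
```

With eight branches, the first branch of the chain sees one mux level and the last sees eight, while the balanced structure gives every input three levels. That is one way to read the advice that the first branch should carry the critical signal.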
31
Tri-State
Internal tri-state buses are slow on most
FPGAs. Tri-states belong on the top level of the
design.
32
Bi-directional Buffer
Bi-directional buses cause timing loops. False
paths need to be marked.
33
Relational Operators
Large relational operators (> 4 bits) are built
out of high speed carry chains on the FPGA.
34
Addition Operators
  • Adders are the #1 used operator in a design.
  • Use constants wisely
  • A + 2 = A + 1 with cin
  • A - 2 = A - 1 with cin
  • A + 8 = (A(high downto 3) + 1) & A(2 downto 0)
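The constant tricks on this slide can be checked with a small behavioral model. The Python sketch below assumes an 8-bit bus (the width is an assumption, not from the slides) and compares a full-width add of 8 against an incrementer on the upper bits with the low three bits passed straight through. It also shows adding 2 as an add of 1 with the carry-in set.

```python
WIDTH = 8                      # assumed bus width for this sketch
MASK = (1 << WIDTH) - 1


def add_with_cin(a, b, cin, width_mask=MASK):
    """Adder with carry-in, wrapping at the bus width."""
    return (a + b + cin) & width_mask


def add8_full(a):
    """Reference: full-width adder computing a + 8."""
    return (a + 8) & MASK


def add8_split(a):
    """(A(high downto 3) + 1) concatenated with A(2 downto 0)."""
    upper = ((a >> 3) + 1) & ((1 << (WIDTH - 3)) - 1)  # narrower incrementer
    lower = a & 0b111                                   # low bits unchanged
    return (upper << 3) | lower
```

The split form needs only a (WIDTH - 3)-bit incrementer, which is where the saving comes from.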

35
Resource Sharing (when it really hurts)
if (B > C) then
  sig <= A + B;
else
  sig <= A + C;
end if;
Resource Sharing OFF: Total LUTs 64, Clock Freq
133.3 MHz (52% faster!!!)
Resource Sharing ON: Total LUTs 32, Clock Freq
87.7 MHz
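One way to see why sharing the adder costs speed is to model both structures. In the Python sketch below, the two functions compute the same result, and `critical_path` adds up delays along the longest path of each structure. The delay values passed to it are hypothetical units chosen for illustration, not measurements from any device.

```python
def shared(a, b, c):
    """Resource sharing ON: comparator, then mux, then a single adder."""
    operand = b if b > c else c        # the adder must wait for this choice
    return a + operand


def unshared(a, b, c):
    """Resource sharing OFF: two adders work in parallel with the comparator."""
    sum_b, sum_c = a + b, a + c        # both sums start immediately
    return sum_b if b > c else sum_c   # mux picks a finished result


def critical_path(t_cmp, t_add, t_mux, sharing):
    """Longest path delay in assumed units for each structure."""
    if sharing:
        return t_cmp + t_mux + t_add   # comparator and adder are in series
    return max(t_cmp, t_add) + t_mux   # comparator and adders overlap
```

With assumed delays of 2 for the comparator, 3 for an adder and 1 for the mux, the shared version has a path of 6 units and the unshared version 4, at the cost of a second adder.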
36
Multiplication Operator
  • Most expensive operator
  • Slowest operator, unless built into the FPGA.
  • When multiplying by a constant, use a CSD
    multiplier.
  • Use constants wisely
  • A2 A sra 1
  • A3 (A sra 1) A

37
Pipelined Multipliers
  • Improve timing by introducing parallelism
  • Registers introduced by pipelining may have a
    modest area impact
  • Requirements
  • Certain constructs in the input RTL source code
    description
  • Output of the multiplier must be registered.
  • Optimal pipeline stages = log2(input data bus
    width)
  • A 16-bit data bus => optimal pipeline value of 4
  • A 32-bit data bus => optimal pipeline value of 5
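The rule of thumb above, and the latency that pipelining introduces, can be sketched in Python. The class below is a behavioral model only: it computes the product at once and then delays it through a chain of registers, which is an assumption made to keep the example short rather than how the partial products would actually be split across stages.

```python
from collections import deque


def optimal_stages(width):
    """log2 of the input data bus width, rounded up."""
    if width < 2:
        raise ValueError("width must be at least 2")
    return (width - 1).bit_length()


class PipelinedMultiplier:
    """Registered multiplier model: one new result per clock after the latency."""

    def __init__(self, width):
        self.stages = optimal_stages(width)
        self.regs = deque([None] * self.stages, maxlen=self.stages)

    def clock(self, a, b):
        """One clock edge: shift the pipeline and return the last register."""
        self.regs.appendleft(a * b)   # oldest entry falls off the far end
        return self.regs[-1]
```

For a 16-bit bus the first product appears on the fourth clock edge, and a new product follows on every edge after that.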

38
A Little Algebra Goes a Long Way
Original Code                 Modified Code                  Area Reduction
A - B = 0                     A = B                          80%
A * 9                         (A SHL 3) + A                  40%
A < 0                         A(A'high)                      90%
A + 1 when en = 1 else A      A + en                         60%
A when A > 0 else -A          not A + 1 when A(31) else A    30%
A * 2                         A SHL 1                        100%
  • Minimize all arithmetic equations to eliminate
    operators.
  • Frequency increased dramatically.
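The rewrites in this table can be sanity-checked with a behavioral model. The Python sketch below assumes 32-bit two's complement signals (suggested by the A(31) sign test, but still an assumption) and checks that each original expression and its replacement agree for given inputs. It says nothing about area, which depends on the device and the tool.

```python
WIDTH = 32                       # assumed signal width
MASK = (1 << WIDTH) - 1


def to_signed(bits):
    """Interpret a WIDTH-bit pattern as a two's complement integer."""
    bits &= MASK
    return bits - (1 << WIDTH) if bits >> (WIDTH - 1) else bits


def identities_hold(a, b, en):
    """a, b: signed WIDTH-bit integers; en: 0 or 1."""
    ua = a & MASK                 # bit pattern of a
    sign = ua >> (WIDTH - 1)      # A(A'high)
    return all([
        ((a - b) == 0) == (a == b),
        (a * 9) & MASK == ((ua << 3) + ua) & MASK,
        (a < 0) == (sign == 1),
        (a + 1 if en == 1 else a) == a + en,
        to_signed(a if a > 0 else -a)
        == to_signed(((~ua + 1) & MASK) if sign else ua),
        (a * 2) & MASK == (ua << 1) & MASK,
    ])
```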

39
D Flip-flop
Most FPGAs only have a DFF with an async set or an
async reset. A DFF with both will be translated to
a sync set and an async reset for FPGAs.
40
Complex Clock Enables
  • Higher Frequency
  • Denser Logic

Clock enables will only be found within 4-6 levels
of logic. Use clock enables instead of a gated
clock.
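A clock enable can be pictured as a 2:1 mux in front of the D input that either loads new data or recirculates the current output, while the clock itself is left untouched. The Python class below is a minimal behavioral sketch of that idea; the class and method names are invented for illustration.

```python
class EnabledRegister:
    """D flip-flop with clock enable; the clock is never gated."""

    def __init__(self, reset_value=0):
        self.q = reset_value

    def clock(self, d, en):
        """One rising clock edge: load d when en is set, otherwise hold q."""
        self.q = d if en else self.q   # 2:1 mux feeding the D input
        return self.q
```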
41
Latches
A latch is a 2 to 1 mux with the output fed back
to an input. This can put combinational loops in
your circuit depending on the FPGA Vendor.
42
Counter
Counters should either be built as a macro, or you
should make sure the synthesis tool has counter
recognition.
43
State Machine
  • Tools have made progress with FSM compilers
  • Reachability analysis, highly optimal results
  • Extended encoding techniques
  • Without an FSM compiler, one-hot is often the best choice
  • Deflates the next state decoding logic cloud
  • FSM compiler without Safe State
  • Implements the functionality, however the state
    machine may not be totally bullet proof
  • The Safe option
  • default switch in the case may be ignored
  • Recovery logic is implemented to go back to the
    reset state
  • The Exact implementation
  • You want a better match with simulation
  • Performance is not an obstacle
  • Your design works in a harsh environment
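To make the one-hot and safe-state ideas concrete, here is a small Python sketch. The state names and transitions are hypothetical. Each state gets exactly one bit, and the safe version of the next-state function sends any illegal code (no bit set, or more than one) back to the reset state, which is the recovery behavior the Safe option is described as adding.

```python
STATES = ["IDLE", "LOAD", "RUN", "DONE"]   # hypothetical state names
NEXT = {"IDLE": "LOAD", "LOAD": "RUN", "RUN": "DONE", "DONE": "IDLE"}


def one_hot_codes(names):
    """Assign one flip-flop (one bit) per state."""
    return {name: 1 << i for i, name in enumerate(names)}


CODES = one_hot_codes(STATES)


def is_legal(code):
    """A legal one-hot code has exactly one bit set, within range."""
    in_range = 0 < code < (1 << len(STATES))
    return in_range and (code & (code - 1)) == 0


def safe_next(code):
    """Next-state logic with recovery to the reset state."""
    if not is_legal(code):
        return CODES["IDLE"]
    current = STATES[code.bit_length() - 1]
    return CODES[NEXT[current]]
```

Without the legality check, a code such as 0b0110 would have no defined successor, which is the sense in which an unsafe state machine may not be bulletproof.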

44
State Machine
45
Read Only Memory (ROM)
  • ROMs provide a method for setting don't cares
  • Different algorithms are used on ROM logic.
  • A ROM is just a RAM with initial programming.
  • Indexing into a constant array is very efficient
    for simulation and synthesis

46
Single Port Rams
47
Dual Port Rams
48
Content Addressable Memory (CAM)
  • Use a CAM when address translation is needed.
  • Use CAMs for sparsely used addresses.
  • CAMs replace large priority encoders.
  • 60% area reduction
  • 80% timing reduction
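A CAM is searched by content and answers with the address where that content is stored. The Python sketch below is a behavioral model only: the loop stands in for comparators that would all fire in parallel in hardware, and the class name and the example tag value are invented for illustration.

```python
class Cam:
    """Behavioral CAM: look up a stored tag, get back its address."""

    def __init__(self, depth):
        self.entries = [None] * depth   # None marks an empty entry

    def write(self, address, tag):
        self.entries[address] = tag

    def lookup(self, tag):
        """Return the lowest address holding tag, or None on a miss."""
        for address, stored in enumerate(self.entries):
            if stored is not None and stored == tag:
                return address
        return None
```

For address translation, a wide and sparsely used tag (say 0xBEEF) can be written at entry 2, and a later lookup of that tag returns 2 directly.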

49
Checklist for performance
  • Pipeline for high performance
  • Make hardware work in parallel
  • Optimize late-arriving signals
  • Control arithmetic circuits
  • Use IP and hard-macros

50
Parallel Gates
Parallel Gates are removed during the
pre-optimize stage !!
51
Attributes
  • Attributes can be passed thru HDL code
  • Homogeneous syntax in VHDL for attributes
  • No syntax checks, just passed through !
  • Synthesis attributes helpful for...
  • Improved usability
  • Name preservation
  • Replication
  • Resource sharing
  • Speed / area control
  • FSM encoding
  • Attributes enable...
  • Mapping control
  • DLLs setup
  • IOB flop control
  • Ram initialization
  • Soft macros for speed

52
Physical Optimization
  • Floor Plan your FPGA.
  • Produces a faster circuit
  • Circuit is more predictable and repeatable.
  • Timing convergence occurs quickly.
  • Back Annotate real timing data.
  • Allows a 2nd pass of synthesis to work on real
    critical paths.

53
FPGA High-Level Floorplanner
  • Tight links to Exemplar's synthesis tool.
  • Position blocks into regions of device
  • Generates area constraints
  • Required for new Incremental design flow
  • Useful for Design Planning

54
TimeCloser Flow
Optimization
  • Allocation of clock resources
  • Allocation of some routing resources (low skew)
Timing Optimization
  • Critical path optimization
  • Logic and register replication
  • Clustering of critical path objects
  • Allocation of routing resources for high-fanout nets
Manual Floor Planning
Place & Route
Incremental P&R
Back Annotation of P&R delays
Critical Path optimization (based upon real delay values)
55
Incremental Optimization using Incremental Files
Leonardo Spectrum
P&R Software
EDIF Netlist and constraints
Synthesize 1st pass
Critical Path Optimization
Perform Initial Place and Route
Incremental data
Save Design in Incremental files (XDB format)
Critical Path Timing Optimization
Delay File
Perform Timing Analysis
Restore original Netlist
Top-Level EDIF Netlist
ECO or Incremental Flow
Reoptimize only changed sub block
Perform incremental place and route with guide files
Unique incremental flow to Leonardo Spectrum
56
Constraint Based Clustering
  1. Uses place and route timing data to improve
    device performance
  2. Reduces levels of logic on true critical paths
  3. Reduces route delay effects by using a timing
    driven clustering algorithm

57
Logic Replication
  • Reduces route delay effects using logic
    replication and route optimization
  • Useful to duplicate flip-flops and control fanout
  • However you cannot prevent automatic replication
    from the tools
  • Helps to manually control the fanout
  • Keep the name of the nets in the netlist
  • Very useful for simulation

58
Critical Path Restructuring
  1. Uses place and route timing data to improve
    device performance
  2. Reduces levels of logic on true critical paths
  3. Moves late-arriving signals up the logic tree
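The effect of moving a late signal toward the output can be shown with arrival times. The Python sketch below assumes a tree of 2-input gates with a unit gate delay (both assumptions made for illustration) and compares a fixed balanced tree against a tree built by always combining the two earliest-arriving signals first, which is one simple greedy way to restructure; it is not a description of the tool's actual algorithm.

```python
import heapq


def balanced_tree_arrival(arrivals, gate_delay=1):
    """Output arrival when inputs are paired left to right, level by level."""
    level = list(arrivals)
    while len(level) > 1:
        nxt = [max(level[i], level[i + 1]) + gate_delay
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])      # odd input passes to the next level
        level = nxt
    return level[0]


def restructured_arrival(arrivals, gate_delay=1):
    """Combine the two earliest signals first; late ones enter near the output."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        heapq.heappush(heap, max(a, b) + gate_delay)
    return heap[0]
```

With three inputs arriving at time 0 and one at time 5, the balanced tree produces its output at time 7, while the restructured tree produces it at time 6 because the late signal passes through only one gate.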

59
User Applied Physical Constraints
  • Preserve signals
  • Assign nets to secondary routing resources
  • Specify fanout on net by net basis

60
Design Techniques for Million Gate, High Speed
FPGAs
Michael A. Bohm Chief Scientist Technical Fellow
Mentor Graphics