Title: Field Programmable Gate Arrays


1
Field Programmable Gate Arrays
  • MAS863
  • How To Make (almost) Anything
  • Andrew "bunnie" Huang, E. Rehmi Post

2
Agenda
  • Lecture
  • Motivation and Application
  • Theory and Architecture
  • System Integration
  • Design Demo
  • How to use the tools and features
  • In-class Project
  • VGA display of moving ball

3
Introduction
  • Field Programmable Gate Array
  • Field as in field operations -- programmable in
    the field, as opposed to in the factory
  • Gate array
  • array of logic gates and storage elements
  • When and why would you use such a device?

4
Motivation
  • Computational Scenarios

[Figure: computational scenarios compared by cost, complexity, and response time (latency). Regions: simple gates, PIC, embedded processors, PCs and workstations, FPGAs, and ASIC (full-custom IC). Annotations: "raw speed and interrupt latency", "cost / volume", and an "oops region".]

5
Motivation
  • FPGAs span the middle ground
  • Fast design cycles
  • IP cores
  • reconfigurability
  • late binding decisions--hardware is no longer
    cast in concrete
  • High Performance
  • excellent in latency-limited situations (network
    routers, real-time systems, timing generators),
    i.e. situations that require fine time resolution
    combined with a fair degree of complexity
  • Can be cost effective
  • Especially in low-volume scenarios vs. ASIC

6
Applications
  • Fast-turn, low volume ASIC (uninteresting)
  • Reconfigurable Hardware Processors
  • One-connector I/O solutions
  • Rapid prototyping

7
Applications: RHP (Reconfigurable Hardware Processors)
  • Direct implementation of algorithms in hardware
  • circumvents instruction fetch, decode, issue
    overhead
  • unrestricted parallelism
  • disadvantage: little hardware abstraction,
    difficult to use
  • RISC framework with reconfigurable instruction
    set
  • user-defined instructions depending on process
    context
  • prevents the "MMX disease" (special-purpose
    instructions permanently bolted onto a fixed ISA)
  • easier to use, more hardware abstraction, but
    lower performance

8
Applications: RHP
  • Optimal ISA
  • compiler analyzes code and chooses an ISA optimal
    for the problem, and bundles the hardware
    description for the ISA with the code object
  • Configurable memory management and caching
  • useful for implementing special OS features
  • VM paging schemes directly in hardware
  • Ultimate RHP: one processor, any ISA
  • In the future - possibly adaptive processors
    which automatically optimize their architecture
    per application

9
Applications: Direct Hardware
  • Ideal for implementing simple, repetitive
    operations (overhead operations)
  • time synchronization on Novell networks
  • CAM lookup tables for IP routing and neural nets
  • encryption/decryption
  • FEA (finite element analysis)
  • Relaxation networks
  • database searching
  • higher performance with special architectures
    (embedded RAM)
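CAM lookup for IP routing is essentially a longest-prefix match. The sketch below models a ternary CAM as value/mask entries kept in priority order (longest prefix first); a real CAM compares every entry at once in hardware, while this loop only reproduces the result. The routing table and port names are hypothetical.

```python
import ipaddress


def build_tcam(routes):
    """Turn {prefix: next_hop} into (value, mask, next_hop) entries.

    Entries are sorted longest prefix first, mimicking how a ternary CAM
    resolves several simultaneous matches by entry priority."""
    entries = []
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        entries.append((net.prefixlen, int(net.network_address),
                        int(net.netmask), next_hop))
    entries.sort(key=lambda e: e[0], reverse=True)
    return [(value, mask, hop) for _, value, mask, hop in entries]


def tcam_lookup(tcam, address):
    """Return the next hop of the highest-priority matching entry.

    A hardware CAM compares the key against every entry in parallel in a
    single cycle; this sequential loop models only the result."""
    key = int(ipaddress.ip_address(address))
    for value, mask, hop in tcam:
        if (key & mask) == value:
            return hop
    return None


# Hypothetical routing table, for illustration only.
table = build_tcam({"0.0.0.0/0": "default",
                    "10.0.0.0/8": "port-A",
                    "10.1.0.0/16": "port-B"})
print(tcam_lookup(table, "10.1.2.3"))     # port-B (longest match wins)
print(tcam_lookup(table, "10.200.0.1"))   # port-A
print(tcam_lookup(table, "192.168.1.1"))  # default
```

Sorting by prefix length once, at "configuration" time, is what lets a lookup stop at the first hit; in a CAM the same ordering is fixed by entry position.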

10
Applications: I/O solutions
  • One-connector I/O solutions
  • use a single connector with any protocol desired
  • e.g. a DB-25 that can carry SCSI, IEEE 1284 parallel,
    or serial
  • ideal for space-limited applications
  • Object oriented hardware
  • devise a system such that a device plugged into
    the I/O port uploads the hardware configuration
    necessary to implement the communications
    protocol
  • protocol upgrades are a cinch
  • limited by electrical signalling compatibility
    issues
  • drawback: can be confusing to users and potentially
    damaging to hardware

11
Applications: EA
  • Evolutionary algorithms
  • some research done on FPGAs already
  • tone recognition application
  • possibly requires intimate knowledge of FPGA
    hardware
  • vendor licensing issues
  • EA apps do not map well into current FPGA
    architectures
  • however, with the right FPGA EA could yield very
    interesting results

12
Applications: Rapid Prototyping
  • FPGAs are a handy thing to have on the lab bench
  • simple digital circuits no longer require wiring
    or parts ordering
  • modification and duplication of existing designs
    is relatively straightforward
  • with the right design tools, hardware design
    re-use is an additional benefit

13
General Architecture
[Figure: general FPGA architecture as an arrangement of "compute", "remember", and "connect" elements, with "CONFIGURE" labeling the configuration.]
Terminology: Granularity, Configuration, and Routing

14
Architecture: Varieties
  • Primary classifications for FPGAs
  • configuration method
  • granularity
  • routing architecture
  • Other practical considerations
  • density
  • speed
  • cost
  • design tools
  • vertical migration

15
Architecture: Varieties
  • EPAC
  • Electrically Programmable Analog Circuit
  • Contains programmable gain amplifiers,
    comparators, multiplexers, DACs, track-and-hold,
    filtering components
  • Made by iMP

16
Architecture: Configuration
  • Configuration method
  • In-circuit programmable methods
  • SRAM based (Xilinx 2K/3K/4K, Altera 8K/10K,
    Lucent Orca)
  • volatile, but fast configuration times
  • must reprogram on every power-up
  • some architectures offer partial reconfiguration
    (Atmel)
  • most expensive in terms of area and timing costs
  • standard CMOS process
  • EEPROM based (Altera 7K/9K)
  • nonvolatile, slow configuration; sometimes requires
    extra voltages for programming and erasing
  • special silicon processing required

17
Architecture: Configuration
  • Configuration method (contd)
  • Pre-assembly programmable methods
  • Antifuse based (Actel, Quicklink FPGAs)
  • nonvolatile, very fast links
  • permanent configuration (OTP)
  • smallest link size (lower cost)
  • special silicon processing technology required
  • (E)EPROM based (Altera 5K, 7K, Xilinx 7200, 7300)
  • nonvolatile, moderate performance
  • reprogrammable after special erase cycle
  • medium-sized link
  • special silicon processing technology required

18
Architecture: Granularity
  • Granularity
  • Defined as the ratio of logic per cell to routing
  • Very fine-grained architectures
  • Partial set of n-input boolean functions per cell
  • Roughly 6-1 ratio of logic inputs to registers
    per cell
  • Atmel, Actel
  • Fine-grained architectures
  • Full set of n-input boolean functions per cell
  • Sometimes multiple n-input boolean functions per
    cell
  • Roughly 8-1 ratio of logic inputs to registers
    per cell
  • Well-suited for state machines, simple
    arithmetic, pipelined applications
  • Xilinx 3K/4K, Altera 8K/10K

19
Architecture: Granularity
  • Granularity (contd)
  • Coarse-grained architectures
  • PLD-style product term arrays
  • Roughly 32-1 ratio of logic inputs to registers
    per cell
  • Well-suited for address decoding, complicated
    arithmetic operations, datapath operators,
    complex state machines
  • Poorly suited for pipelined applications and
    simple operations
  • Altera 5K/7K, Xilinx 7K
  • Dual-grained architectures or hierarchical
    architectures
  • Combines coarse and fine-grained features
  • Often exhibit separate local and global routing
    resources
  • Lucent Orca, Altera 9K

20
Architecture: Routing
  • Routing method
  • Fine-grained
  • Short hops (1 to 8 logic cells spanned per track)
  • Path-dependent timing
  • Exhibits high density
  • Flexible switch matrices
  • Fewer logic placement constraints
  • Coarse-grained
  • Tracks span entire chip
  • Fixed timing regardless of logic placement
  • Lower density
  • Logic placement constrained by routing
    availability
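The timing difference between the two routing styles can be made concrete with a toy delay model. This is a sketch only: the per-hop and longline delays are hypothetical placeholder numbers, not data for any real device, and a route is assumed to follow a Manhattan path built from equal-length hops.

```python
import math


def hops_needed(dx, dy, hop_len):
    """Track segments for a Manhattan route when each segment spans hop_len cells."""
    return math.ceil(abs(dx) / hop_len) + math.ceil(abs(dy) / hop_len)


def route_delay_ns(dx, dy, style, hop_len=1, per_hop_ns=0.5, longline_ns=3.0):
    """Toy routing-delay model; all delay values are hypothetical placeholders.

    dx, dy: distance between source and destination, in logic cells."""
    if style == "fine":
        # Path-dependent: every hop adds a segment plus a switch-matrix crossing.
        return hops_needed(dx, dy, hop_len) * per_hop_ns
    if style == "coarse":
        # Tracks span the whole chip, so delay ignores placement.
        return longline_ns
    raise ValueError(f"unknown routing style: {style!r}")


for dist in (1, 4, 10):
    print(dist, route_delay_ns(dist, dist, "fine"),
          route_delay_ns(dist, dist, "coarse"))
```

With these made-up numbers, neighbouring cells connect faster over short hops (1.0 ns vs 3.0 ns), while distant cells (10.0 ns) would be better served by a chip-spanning track, which is why placement matters so much on fine-grained routing.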

21
Architecture: Routing
  • Routing method (contd)
  • Hierarchical routing
  • Local, fine-grained routing between cells
  • Global, coarse-grained routing between groups of
    cells
  • Usually path-dependent timing
  • Best of both worlds, but can be difficult to
    utilize efficiently

22
Architecture: Other Practical Considerations
  • Density, speed and vertical migration
  • Altera FLEX 8K is targeted at density-driven
    apps
  • Altera MAX 7K is targeted at performance-driven
    apps
  • Xilinx 4K series targets both density and
    performance, with good vertical migration from 3K
    gates to 250K gates (Altera 10K is the Xilinx 4K
    competitor)
  • Xilinx 6200 series targets reconfigurable
    hardware applications

23
Architecture: Design Tools
  • Design tools - the other half of the equation
  • FPGA is useless without good design tools
  • Design tools slowly progressing to acceptable
    levels
  • Entry methods include HDL, schematic
  • Compilers are improving! Xilinx's most recent
    compiler can place and route reasonably tough
    designs in about fifteen minutes; very tough
    designs will finish in a half hour or not at all.
  • Xilinx Foundation Series / M1 technology
  • Altera MAX+PLUS

24
Architecture: Cost
  • Cost formulas for FPGAs are complex
  • OTP FPGAs tend to be cheaper
  • Established lines are cheaper than new lines
  • Cost increases exponentially with performance and
    density
  • Some lines are targeted at cost-sensitive
    applications (Altera 7K)
  • Not all speed grade-density combos available from
    manufacturers

25
Detailed Architecture: Xilinx 4000E
  • Fine-grained logic, SRAM based, with fine-grained
    routing
  • Array of CLBs embedded in single-length,
    double-length, quad-length, and longline routing
    resources, interconnected through PSMs
  • CLB: Configurable Logic Block
  • Two 4-input LUTs (LookUp Tables) and one 3-input
    LUT
  • Two D-type flip-flops with set/reset (SR)
  • Bypass paths and carry/cascade logic
  • PSM: Programmable Switch Matrix
  • 10 interconnect points per matrix
  • Each interconnect contains six pass transistors
    for full connectivity between four directions
  • Located at intersections of single and double
    length lines
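A LUT is just a small memory: its configuration bits are a truth table and its inputs form the address. The sketch below models that, then shows how two 4-input LUTs (F, G) and the 3-input LUT (H) can combine into any 5-input function by Shannon expansion. It assumes H is fed by the F and G outputs plus one extra input acting as a select; the function names are illustrative, not vendor terminology.

```python
def make_lut(func, k):
    """Return the 2**k-bit truth table that configures a k-input LUT.

    Bit i of the table holds func evaluated on the bits of i
    (input 0 is the least significant address bit)."""
    table = 0
    for i in range(2 ** k):
        bits = [(i >> n) & 1 for n in range(k)]
        if func(*bits):
            table |= 1 << i
    return table


def eval_lut(table, inputs):
    """Evaluate a LUT: the input bits simply address one stored bit."""
    index = sum(bit << n for n, bit in enumerate(inputs))
    return (table >> index) & 1


def clb_five_input(func, x):
    """Any 5-input function from two 4-input LUTs and one 3-input LUT.

    F and G hold the two cofactors of func (x[4] = 0 and x[4] = 1);
    H is configured as a 2:1 mux that picks between them using x[4].
    In hardware the tables are loaded once at configuration time;
    they are recomputed on every call here only for brevity."""
    f_table = make_lut(lambda a, b, c, d: func(a, b, c, d, 0), 4)
    g_table = make_lut(lambda a, b, c, d: func(a, b, c, d, 1), 4)
    h_table = make_lut(lambda f, g, sel: g if sel else f, 3)
    f_out = eval_lut(f_table, x[:4])
    g_out = eval_lut(g_table, x[:4])
    return eval_lut(h_table, [f_out, g_out, x[4]])


def parity5(*bits):
    """Example 5-input function: odd parity."""
    return sum(bits) % 2


def bits_of(i):
    return [(i >> n) & 1 for n in range(5)]


# Check the F/G/H decomposition against parity5 for all 32 input patterns.
print(all(clb_five_input(parity5, bits_of(i)) == parity5(*bits_of(i))
          for i in range(32)))  # True
```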

26
Detailed Architecture: Xilinx 4000E
27
Detailed Architecture: Xilinx 4000E
28
Detailed Architecture: Xilinx 4000E
29
Detailed Architecture: Xilinx 4000E
30
Detailed Architecture: Xilinx 4000E
31
Detailed Architecture: Xilinx 4000E
  • Configuration
  • total (device) reconfiguration (no partial
    reconfig)
  • several configuration modes available
  • parallel and serial modes
  • master and slave modes
  • daisy chain ability
  • device bitstreams between 50 Kbits and 400 Kbits
  • configuration rate around 10 Mbit/s
  • fastest full reconfiguration takes a few tens of
    milliseconds
  • typical reconfiguration takes a couple of seconds
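The "few tens of milliseconds" figure follows directly from the bitstream sizes and the configuration rate above. A quick check, assuming 1 Kbit = 1000 bits and that the full ~10 Mbit/s rate is sustained:

```python
def config_time_ms(bitstream_bits, rate_bits_per_s):
    """Time to shift a complete bitstream into the device, in milliseconds."""
    return 1000.0 * bitstream_bits / rate_bits_per_s


RATE = 10e6  # ~10 Mbit/s configuration rate quoted above
for kbits in (50, 400):
    print(f"{kbits} Kbit bitstream: {config_time_ms(kbits * 1000, RATE):.0f} ms")
# 50 Kbit bitstream: 5 ms
# 400 Kbit bitstream: 40 ms
```

Since even the largest bitstream needs only about 40 ms at full rate, a typical reconfiguration of a couple of seconds presumably reflects a configuration source (host software, serial link) far slower than the device itself.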

32
Detailed Architecture: Xilinx 4000E
  • Other features
  • distributed RAM
  • CLB LUTs can function as a 32x1, 16x1, or 16x2
    RAM
  • synchronous RAM options available
  • internal tri-state buffers
  • global routing resources
  • JTAG boundary scan
  • configuration readback
  • programmable slew rate and logic levels in IOBs
  • common per-package pinout for all devices
  • allows for easy vertical migration
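Distributed RAM reuses a LUT's 16 configuration bits as writable storage, with the four LUT inputs acting as the address. A minimal model, assuming synchronous (clock-edge) writes gated by a write enable and combinational reads; the class name is hypothetical:

```python
class LutRam16x1:
    """A 4-input LUT reused as 16x1 RAM.

    The 16 bits that normally hold a truth table become writable storage,
    and the four LUT inputs become the address. Assumed behaviour:
    writes happen on a clock edge when write_enable is high; reads are
    combinational, just like evaluating the LUT."""

    def __init__(self):
        self.bits = 0  # 16 storage bits, all initially 0

    def read(self, addr):
        return (self.bits >> (addr & 0xF)) & 1

    def clock(self, addr, data, write_enable):
        """Model one rising clock edge."""
        if write_enable:
            a = addr & 0xF
            self.bits = (self.bits & ~(1 << a)) | ((data & 1) << a)


# A 16x2 RAM modelled as two such LUTs sharing the same address.
ram_bit0, ram_bit1 = LutRam16x1(), LutRam16x1()
for ram, bit in ((ram_bit0, 1), (ram_bit1, 0)):
    ram.clock(addr=5, data=bit, write_enable=True)
word = (ram_bit1.read(5) << 1) | ram_bit0.read(5)
print(word)  # 1
```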

33
System Integration
  • FPGAs offer flexible I/O solutions
  • laying out a board around an FPGA is convenient, since
    signals can be assigned to whichever pins route best
  • newest FPGAs, esp. Virtex, have multi-standard I/O
    support
  • Requires a source of configuration data
  • Host computer, parallel or serial
  • Serial ROM
  • fewest wires: CCLK, DIN, INIT, PROG, sometimes DOUT
  • FLASH ROM controlled by dedicated config
    circuitry
  • Combination of both
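When a host computer supplies the configuration data serially, it drives the same few wires listed above. The sketch below shows one plausible host-side sequence: pulse PROG, wait for INIT, then clock bits into DIN on CCLK. PinLog is a hypothetical stand-in for real GPIO; the exact sequencing, timing, and MSB-first bit order are assumptions to check against the device data sheet.

```python
class PinLog:
    """Hypothetical stand-in for host GPIO lines wired to the FPGA's PROG,
    DIN, CCLK and INIT pins. It records the DIN value seen on each rising
    CCLK edge and reports INIT as already high (device ready)."""

    def __init__(self):
        self.levels = {"PROG": 1, "DIN": 0, "CCLK": 0}
        self.sampled = []

    def set(self, pin, level):
        rising = pin == "CCLK" and level == 1 and self.levels["CCLK"] == 0
        self.levels[pin] = level
        if rising:
            self.sampled.append(self.levels["DIN"])

    def get(self, pin):
        return 1 if pin == "INIT" else 0


def slave_serial_configure(pins, bitstream):
    """Host-driven serial configuration (assumed sequence)."""
    # Pulse PROG low to start clearing the configuration memory.
    pins.set("PROG", 0)
    pins.set("PROG", 1)
    # Wait for the device to raise INIT, meaning it is ready for data.
    while not pins.get("INIT"):
        pass
    # Shift every byte out MSB first: set DIN, then pulse CCLK.
    for byte in bitstream:
        for n in range(7, -1, -1):
            pins.set("DIN", (byte >> n) & 1)
            pins.set("CCLK", 1)
            pins.set("CCLK", 0)


pins = PinLog()
slave_serial_configure(pins, bytes([0xA5]))
print(pins.sampled)  # [1, 0, 1, 0, 0, 1, 0, 1]
```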

34
Serial Programming
Slave and Master modes
35
Programming From a ROM
36
That's Nice. How Do I Use It?!
  • Present basic design flow
  • Work through a demo implementation

37
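Design Tools: Process
[Figure: design flow. Design description (HDL, schematic), drawing on libraries and IP cores, then technology mapping, place, route, timing analysis, bitstream, and finally the FPGA; an "Errors" branch marks where the flow can fail and iterate.]
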
38
Design Tools: Design Entry
  • HDL
  • Verilog, VHDL or proprietary language (AHDL,
    etc.)
  • Verilog is like C with multithreading (and, like C,
    fairly loose typing)
  • VHDL stands for VHSIC HDL; it was commissioned by the
    military, is intended for detailed simulations, and is
    very complex
  • Ideal for large designs because of well-defined
    scoping and instantiation rules (top-down design)
  • Also ideal for state machines, decoders/encoders,
    and odd or awkward busses
  • Hardware mapping is difficult
  • very easy to make inefficient designs; subtle
    semantic choices can lead to drastic performance
    variations
  • hard to specify hardware-specific features such
    as carry chains
  • hard to specify placement and routing info

39
Design Tools: Design Entry
  • Schematic entry
  • more intuitive, easier to observe design flow
  • helpful when trying to optimize designs for speed
    or area
  • difficult when implementing large amounts of
    miscellaneous logic (state machines)
  • hierarchical schematic tools help make large
    designs more manageable
  • global changes difficult (hard to change global
    mistakes)
  • hardware mapping is much easier
  • schematic primitives for special hardware
    features
  • schematic attributes for routing info
  • WYSIWYG design entry

40
Design Tools: Hardware Mapping
  • Many options for HDL to hardware mapping
  • vendor-specific options
  • third party tools
  • EDIF is the most common intermediate language
  • When using HDLs, good hardware mapping tools are
    critical for performance and device utilization
  • Deep understanding of HDL is also useful
  • Schematic hardware mapping is much easier - very
    close to WYSIWYG editing
  • Hardware mappers often perform aggressive logic
    optimizations - watch your assumptions! (hazards)
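A classic case of the hazard warning: in a 2:1 multiplexer f = a·s + b·s', the consensus term a·b is logically redundant, so an optimizing mapper may drop it, yet it is exactly the term that holds f at 1 when a = b = 1 and s switches. A minimal unit-delay sketch; the delay model (only the inverter has delay) is a simplifying assumption:

```python
def mux_output_trace(a, b, s_before, s_after, keep_consensus, steps=3):
    """Unit-delay simulation of f = a&s | b&~s, optionally | a&b.

    Simplifying assumption: only the inverter producing ~s has delay
    (one time step); the AND and OR gates are treated as instantaneous.
    Returns f at each step after s switches at t = 0."""
    s = s_after
    s_n = 1 - s_before  # inverter output still reflects the old s
    trace = []
    for _ in range(steps):
        f = (a & s) | (b & s_n)
        if keep_consensus:
            f |= a & b  # redundant term that holds f high during the switch
        trace.append(f)
        s_n = 1 - s  # inverter output catches up one step later
    return trace


print(mux_output_trace(1, 1, 1, 0, keep_consensus=True))   # [1, 1, 1]
print(mux_output_trace(1, 1, 1, 0, keep_consensus=False))  # [0, 1, 1]  (glitch)
```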

41
Design Tools: Place and Route
  • Place and route tools are always vendor-specific
  • Much progress remains in place and route tools
  • typical P&R times for a reasonably complex design
    are around 30 minutes to an hour
  • device utilization and performance still well
    below that of hand-placed and routed designs
  • Many vendors offer hand-placement or tweaking
    tools for speed and area critical applications
  • Partial compilation of macros in the works

42
Design Tools: Timing Analysis
  • Especially important for path-dependent delay
    devices
  • Designs often iterate at this point: the critical
    path is extracted and optimized
  • Timing analysis tools also have a ways to go
  • difficult to analyze designs with multiple clock
    domains
  • impossible to analyze designs with combinational
    loops
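At its core, static timing analysis finds the longest-delay path through combinational logic, and the point about combinational loops follows directly: a cycle has no longest path. A minimal sketch, assuming a netlist given as per-node delays (hypothetical numbers, in ns) plus driver/load edges; real analyzers also handle setup/hold checks, clock skew, and multiple clock domains.

```python
from collections import defaultdict, deque


def critical_path(delays, edges):
    """Longest-delay path through a combinational netlist.

    delays: {node: delay in ns}, assumed non-negative
    edges:  iterable of (driver, load) pairs
    Returns (total_delay, path). Raises ValueError on a combinational loop,
    because a cycle has no well-defined longest path."""
    fanout = defaultdict(list)
    indegree = {n: 0 for n in delays}
    for u, v in edges:
        fanout[u].append(v)
        indegree[v] += 1

    arrival = dict(delays)              # latest arrival at each node's output
    pred = {n: None for n in delays}    # predecessor on the slowest path
    ready = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while ready:                        # Kahn's topological ordering
        u = ready.popleft()
        visited += 1
        for v in fanout[u]:
            t = arrival[u] + delays[v]
            if pred[v] is None or t > arrival[v]:
                arrival[v], pred[v] = t, u
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if visited != len(delays):
        raise ValueError("combinational loop: longest path is undefined")

    sinks = [n for n in delays if not fanout[n]]
    end = max(sinks, key=arrival.get)
    path = []
    while end is not None:
        path.append(end)
        end = pred[end]
    return arrival[path[0]], path[::-1]


# Hypothetical netlist: two inputs, three gates, one output.
delays = {"in_a": 0, "in_b": 0, "g1": 1.0, "g2": 2.0, "g3": 1.5, "out": 0}
edges = [("in_a", "g1"), ("in_b", "g2"), ("g1", "g3"),
         ("g2", "g3"), ("g3", "out")]
print(critical_path(delays, edges))  # (3.5, ['in_b', 'g2', 'g3', 'out'])
```

The topological sort doubles as the loop detector: if some nodes never reach zero in-degree, they sit on a cycle and no timing number can be assigned.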

43
Design Tools: Bitstream Management
  • Bitstreams can be merged
  • daisy-chained devices

44
Configuration
  • Applies to ISP devices only
  • Many options
  • serial ROM
  • master mode with standard ROM
  • slave of intelligent host or another FPGA in a
    daisy chain
  • serial ROM
  • very popular in ASIC-style applications
  • low pin and parts count, but sometimes slower

45
Configuration
  • Master mode with parallel ROM device
  • FPGA drives a ROM's address bits and reads data
    from ROM
  • expensive in terms of pins, but pins can be
    reused in some designs
  • sometimes faster than serial methods
  • Slave modes
  • intelligent host configures FPGA
  • host can be a PC or another FPGA in master mode
  • most flexible method
  • many FPGA architectures allow daisy chaining

46
Other Considerations
  • Design for the future
  • vertical migration
  • largest FPGA for your budget
  • Be wary of logic interface levels and new low
    voltage devices
  • Pin-locking
  • some FPGA architectures perform very poorly under
    pin-locking (Altera 8K in particular)
  • all architectures experience some performance
    loss under pin-locking
  • I/O count
  • many designs are I/O limited, not logic limited
  • System performance, not just logic performance
  • includes I/O times, routing times, and clock skew
  • compare with the flip-flop toggle rates often quoted by vendors
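Why system performance lags the quoted flip-flop toggle rates: the clock period for a real register-to-register path must cover clock-to-out, logic, routing, setup, and skew, not just the flip-flop itself. A sketch with hypothetical delays (not from any data sheet), treating skew as worst-case added delay and using a bare flip-flop path as a rough stand-in for a toggle-rate figure; an I/O path would add pad delays in the same way.

```python
def fmax_mhz(t_co, t_logic, t_route, t_setup, t_skew=0.0):
    """Highest clock rate (MHz) for one register-to-register path.

    All delays in ns. The clock period must cover the source flip-flop's
    clock-to-out, the logic and routing delay, the destination's setup
    time, and (worst case) any clock skew between the two registers."""
    period_ns = t_co + t_logic + t_route + t_setup + t_skew
    return 1000.0 / period_ns


# Hypothetical delays, for illustration only.
system = fmax_mhz(t_co=2.0, t_logic=3.0, t_route=4.0, t_setup=1.0)
toggle = fmax_mhz(t_co=2.0, t_logic=0.0, t_route=0.0, t_setup=1.0)
print(f"system path: {system:.0f} MHz, bare flip-flop: {toggle:.0f} MHz")
# system path: 100 MHz, bare flip-flop: 333 MHz
```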