Title: Introduction to FPGA Technology, Devices and Tools
1. Introduction to FPGA Technology, Devices and Tools
2. FPGA Devices Technology
3. World of Integrated Circuits
- Full-Custom ASICs
- Semi-Custom ASICs
- User Programmable
  - PLD
  - FPGA
4. FPGA: Field Programmable Gate Array
- Small development overhead
- No NRE (non-recurring engineering) costs
- Quick time to market
- No minimum order quantity
- Reprogrammable
ASIC: Application Specific Integrated Circuit
- Designed all the way from behavioral description to physical layout
- Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
5. How can we make programmable logic?
- One-time programmable
  - Fuses (destroy internal links with current)
  - Anti-fuses (grow internal links)
  - PROM
- Reprogrammable
  - EPROM
  - EEPROM
  - Flash
  - SRAM (volatile)
6. What is an FPGA?
Configurable Logic Blocks
I/O Blocks
Block RAMs
7. Which Way to Go?
ASICs
- High performance
- Low power
- Low cost in high volumes
FPGAs (off-the-shelf)
- Low development cost
- Short time to market
- Reconfigurability
8. Other FPGA Advantages
- The manufacturing cycle for an ASIC is very costly and lengthy, and engages a lot of manpower
- Mistakes not detected at design time have a large impact on development time and cost
- FPGAs are perfect for rapid prototyping of digital circuits
- Easy upgrades, as in the case of software
- Unique applications
  - Reconfigurable computing
9. Major FPGA Vendors
- SRAM-based FPGAs
  - Xilinx, Inc.
  - Altera Corp.
  - Atmel
  - Lattice Semiconductor
- Flash & antifuse FPGAs
  - Actel Corp.
  - QuickLogic Corp.
Share over 60% of the market
11. Xilinx
- Primary products: FPGAs and the associated CAD software
- Main headquarters in San Jose, CA
- Fabless semiconductor and software company
  - UMC (Taiwan): Xilinx acquired an equity stake in UMC in 1996
  - Seiko Epson (Japan)
  - TSMC (Taiwan)
ISE Alliance and Foundation Series Design Software
12. Xilinx FPGA Families
- Old families
  - XC3000, XC4000, XC5200
  - Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
- High-performance families
  - Virtex (0.22µm)
  - Virtex-E, Virtex-EM (0.18µm)
  - Virtex-II, Virtex-II PRO (0.13µm)
- Low-cost family
  - Spartan/XL: derived from XC4000
  - Spartan-II: derived from Virtex
  - Spartan-IIE: derived from Virtex-E
  - Spartan-3
13. Basic Spartan-II FPGA Block Diagram
14. CLB Structure
- Each slice has 2 LUT-FF pairs with associated carry logic
- Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs
15. CLB Slice Structure
- Each slice contains two sets of the following:
  - Four-input LUT
    - Any 4-input logic function,
    - or 16-bit x 1 sync RAM
    - or 16-bit shift register
  - Carry & control
    - Fast arithmetic logic
    - Multiplier logic
    - Multiplexer logic
  - Storage element
    - Latch or flip-flop
    - Set and reset
    - True or inverted inputs
    - Sync. or async. control
16. LUT (Look-Up Table) Functionality
- Look-up tables are the primary elements for logic implementation
- Each LUT can implement any function of 4 inputs
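To make the "any function of 4 inputs" claim concrete, here is a minimal behavioral sketch in Python. It is not vendor code: the names `make_lut4` and `lut4_read` are hypothetical, and the bit ordering of the address is an arbitrary choice made for the illustration. The idea is only that a LUT stores 16 bits and the four inputs select one of them.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table of a 4-input Boolean function."""
    table = []
    for addr in range(16):
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs form the address of one stored bit."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Example: a 4-input parity (XOR) function stored as 16 configuration bits.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(parity)
print(lut4_read(parity, 1, 0, 1, 1))
```

Changing the function only changes the 16 stored bits, not the structure, which is why one LUT can realize any 4-input function.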
17. 5-Input Functions Implemented Using Two LUTs
- One CLB slice can implement any function of 5 inputs
- Logic function is partitioned between two LUTs
- F5 multiplexer selects LUT
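The partitioning can be sketched as a Shannon expansion on the fifth input: one LUT holds the function for e = 0, the other for e = 1, and the F5 multiplexer picks between them. The Python below is a behavioral illustration under that assumption; all function names are hypothetical, and which physical input drives the multiplexer select is not stated on the slide.

```python
def make_lut4(func):
    """16-entry truth table of a 4-input Boolean function."""
    return [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
            for i in range(16)]


def lut4_read(table, a, b, c, d):
    """The four inputs address one stored bit."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def make_func5(func5):
    """Split a 5-input function into two 4-input LUTs (for e = 0 and e = 1)."""
    lut_e0 = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))
    lut_e1 = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))
    return lut_e0, lut_e1


def func5_read(luts, a, b, c, d, e):
    """F5 multiplexer: the fifth input selects which LUT drives the output."""
    lut_e0, lut_e1 = luts
    return lut4_read(lut_e1 if e else lut_e0, a, b, c, d)


def majority(a, b, c, d, e):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


luts = make_func5(majority)
print(func5_read(luts, 1, 1, 0, 0, 1))
```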
18. 5-Input Functions Implemented Using Two LUTs
[Figure: two LUTs combined by the F5 multiplexer; output labeled OUT]
19. Dedicated Expansion Multiplexers
- MUXF5 combines 2 LUTs to create
  - Any 5-input function (LUT5)
  - Or selected functions up to 9 inputs
  - Or 4x1 multiplexer
- MUXF6 combines 2 slices to form
  - Any 6-input function (LUT6)
  - Or selected functions up to 19 inputs
  - Or 8x1 multiplexer
- Dedicated muxes are faster and more space efficient
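One way to picture the 4x1 multiplexer case: each LUT acts as a 2:1 multiplexer on its own pair of data inputs, and MUXF5 chooses between the two LUT outputs. The sketch below is a behavioral Python model with hypothetical names, not a description of how any tool actually maps the logic. It uses 4 data inputs plus 2 selects, which fits within the 9 inputs (4 + 4 + 1) that two LUTs and MUXF5 can see.

```python
def mux2_lut(d0, d1, sel):
    """One LUT configured as a 2:1 multiplexer (uses 3 of its 4 inputs)."""
    return d1 if sel else d0


def muxf5(lut_f_out, lut_g_out, sel):
    """Dedicated multiplexer choosing between the two LUT outputs."""
    return lut_g_out if sel else lut_f_out


def mux4(data, s1, s0):
    """4:1 multiplexer built from two LUTs and MUXF5; data is [d0, d1, d2, d3]."""
    low = mux2_lut(data[0], data[1], s0)    # first LUT handles d0/d1
    high = mux2_lut(data[2], data[3], s0)   # second LUT handles d2/d3
    return muxf5(low, high, s1)


print([mux4([0, 1, 1, 0], s >> 1, s & 1) for s in range(4)])
```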
20. Distributed RAM
- CLB LUT configurable as distributed RAM
  - A LUT equals 16x1 RAM
  - Implements single and dual ports
  - Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/asynchronous read
  - Accompanying flip-flops used for synchronous read
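The write and read behavior listed above can be sketched with a small Python model. It is a simplification with hypothetical names: `clock()` stands for one rising clock edge, `read()` models the asynchronous (combinational) read, and the optional output register models the accompanying flip-flop used for a synchronous read.

```python
class DistributedRam16x1:
    """Behavioral model of one LUT used as a 16x1 RAM (illustrative only)."""

    def __init__(self):
        self.mem = [0] * 16      # the LUT's 16 configuration bits
        self.registered_out = 0  # accompanying flip-flop (synchronous read)

    def read(self, addr):
        """Asynchronous read: output follows the address immediately."""
        return self.mem[addr & 0xF]

    def clock(self, addr, data_in=0, write_enable=0):
        """One rising clock edge: capture the read value, then write if enabled."""
        self.registered_out = self.mem[addr & 0xF]
        if write_enable:
            self.mem[addr & 0xF] = data_in & 1


ram = DistributedRam16x1()
ram.clock(addr=5, data_in=1, write_enable=1)  # synchronous write
print(ram.read(5))            # asynchronous read sees the new value
ram.clock(addr=5)             # next edge loads the output flip-flop
print(ram.registered_out)     # synchronous read, one cycle later
```

In this model the registered output captures the value stored before the edge, so a write becomes visible on the synchronous output one cycle later; that ordering is an assumption of the sketch.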
21. Shift Register
- Each LUT can be configured as a shift register
  - Serial in, serial out
- Dynamically addressable delay up to 16 cycles
- For programmable pipeline
- Cascade for greater cycle delays
- Use CLB flip-flops to add depth
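The "dynamically addressable delay" can be illustrated with a short Python model of a 16-bit LUT shift register. The class and method names are hypothetical, and the convention that address 0 gives a one-cycle delay is an assumption of this sketch.

```python
class ShiftRegisterLut16:
    """Behavioral model of a LUT configured as a 16-bit shift register."""

    def __init__(self):
        self.bits = [0] * 16  # bits[0] holds the most recently shifted-in bit

    def clock(self, data_in, enable=1):
        """One rising clock edge: shift in a new bit, drop the oldest one."""
        if enable:
            self.bits = [data_in & 1] + self.bits[:15]

    def read(self, addr):
        """The 4-bit address selects the tap, giving a delay of addr + 1 cycles."""
        return self.bits[addr & 0xF]


srl = ShiftRegisterLut16()
srl.clock(1)              # shift in a single 1 ...
for _ in range(4):
    srl.clock(0)          # ... followed by four 0s
print(srl.read(4))        # the 1 is now visible at tap 4 (5 cycles of delay)
print(srl.read(3))        # tap 3 already shows a later 0
```

Changing `addr` at run time changes the delay without reloading the register, which is what makes the structure useful for programmable pipelines.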
22. Shift Register
- Register-rich FPGA
  - Allows for addition of pipeline stages to increase throughput
- Data paths must be balanced to keep desired functionality
23. Carry & Control Logic
[Figure: slice schematic. Two look-up tables (inputs G4-G1 and F4-F1), each followed by carry and control logic and a storage element (D, Q, CK, EC, S, R). Slice signals: COUT, CIN, YB, Y, XB, X, F5IN, BY, SR, CLK, CE.]
24. Fast Carry Logic
- Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  - Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
- Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing, running from LSB to MSB]
25. Accessing Carry Logic
- All major synthesis tools can infer carry logic for arithmetic functions
  - Addition (SUM <= A + B)
  - Subtraction (DIFF <= A - B)
  - Comparators (if A < B then)
  - Counters (count <= count + 1)
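As a rough picture of what an inferred adder looks like on a carry chain, the Python sketch below models one common arrangement: per bit, a LUT computes the propagate signal (a XOR b), a dedicated XOR forms the sum, and a dedicated multiplexer forwards the carry. This structure and the function name `carry_chain_add` are assumptions for illustration; the slides do not spell out the bit-level circuit.

```python
def carry_chain_add(a, b, width=8, carry_in=0):
    """Add two unsigned integers bit by bit, LSB first, along a carry chain."""
    carry = carry_in
    total = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        propagate = a_i ^ b_i                 # computed in the LUT
        sum_i = propagate ^ carry             # dedicated XOR gate
        carry = carry if propagate else a_i   # dedicated carry multiplexer
        total |= sum_i << i
    return total, carry


print(carry_chain_add(200, 100))  # 8-bit sum and carry-out
```

When the propagate bit is 0, both operand bits are equal, so either of them is the correct carry-out; that is why the multiplexer can simply forward `a_i`.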
26. Block RAM
- Most efficient memory implementation
- Dedicated blocks of memory
- Ideal for most memory requirements
- 4 to 14 memory blocks
- 4096 bits per block
- Use multiple blocks for larger memories
- Builds both single and true dual-port RAMs
27. Dual Port Block RAM
28. Dual-Port Bus Flexibility
RAMB4_S4_S16
- Port A: 4-bit width, 1K depth
  - Signals: WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]
- Port B: 16-bit width, 256 depth
  - Signals: WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0]
- Each port can be configured with a different data bus width
- Provides easy data width conversion without any additional logic
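The width conversion can be illustrated with a Python model in which both ports view the same 4096 bits at different widths. The class name is hypothetical, and the mapping used here (a word at address n of width w occupies bits n*w to n*w + w - 1, so the lowest-addressed narrow word lands in the low bits of the wide word) is an assumption of this sketch; check the device data sheet for the actual ordering.

```python
class DualWidthBlockRam:
    """4096-bit memory accessed through ports of different widths."""

    SIZE_BITS = 4096

    def __init__(self):
        self.bits = [0] * self.SIZE_BITS

    def write(self, addr, value, width):
        """Write one word of the given width at a word address."""
        base = addr * width  # assumed mapping: word n starts at bit n * width
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, addr, width):
        """Read one word of the given width at a word address."""
        base = addr * width
        return sum(self.bits[base + i] << i for i in range(width))


ram = DualWidthBlockRam()
# Port A: 4 bits wide, 1K deep. Write four consecutive nibbles.
for addr, nibble in enumerate([0x4, 0x3, 0x2, 0x1]):
    ram.write(addr, nibble, width=4)
# Port B: 16 bits wide, 256 deep. Read them back as a single word.
print(hex(ram.read(0, width=16)))
```

No extra logic sits between the two views: the conversion comes purely from addressing the same bits with different word sizes.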
29. Two Independent Single-Port RAMs
RAMB4_S1_S1
- Port A: 1-bit width, 2K depth; address = VCC, ADDR[10:0]
- Port B: 1-bit width, 2K depth; address = GND, ADDR[10:0]
- To access the lower RAM
  - Tie the MSB address bit to logic low
- To access the upper RAM
  - Tie the MSB address bit to logic high
- Added advantage of true dual-port
  - No wasted RAM bits
  - Can split a dual-port 4K RAM into two single-port 2K RAMs
  - Simultaneous independent access to each RAM
30. I/O Banking
31. Basic I/O Block Structure
[Figure: IOB with three registers, one each for the three-state control, the output path and the input path. Each register has D, Q, EC (FF enable) and SR (set/reset) and shares the clock. The input path provides both a direct input and a registered input.]
32. IOB Functionality
- IOB provides the interface between the package pins and CLBs
- Each IOB can work as a uni- or bi-directional I/O
- Outputs can be forced into high impedance
- Inputs and outputs can be registered
  - Advised for high-performance I/O
- Inputs can be delayed
33. Routing Resources
34. Clock Distribution
35. FPGA Nomenclature
37. Device Families & Tools
38. Logic Element (FLEX10K)
39. Logic Array Block (FLEX10K)
40. FLEX10K Architecture
41. Stratix Architecture
42. Stratix Device Family
Feature | EP1S10 | EP1S20 | EP1S25 | EP1S30 | EP1S40 | EP1S60 | EP1S80 | EP1S120
Logic Elements (LEs) | 10,570 | 18,460 | 25,660 | 32,470 | 41,250 | 57,120 | 79,040 | 114,140
M512 RAM Blocks (512 bits + parity) | 94 | 194 | 224 | 295 | 384 | 574 | 767 | 1,118
M4K RAM Blocks (4 Kbits + parity) | 60 | 82 | 138 | 171 | 183 | 292 | 364 | 520
M-RAM Blocks (512 Kbits + parity) | 1 | 2 | 2 | 4 | 4 | 6 | 9 | 12
Total RAM bits | 920,448 | 1,669,248 | 1,944,576 | 3,317,184 | 3,423,744 | 5,215,104 | 7,427,520 | 10,118,016
DSP Blocks | 6 | 10 | 10 | 12 | 14 | 18 | 22 | 28
Embedded Multipliers | 48 | 80 | 80 | 96 | 112 | 144 | 176 | 224
PLLs | 6 | 6 | 6 | 10 | 12 | 12 | 12 | 12
Maximum User I/O Pins | 426 | 586 | 706 | 726 | 822 | 1,022 | 1,238 | 1,314
Engineering Sample Availability | Now | Use Production | Use Production | N/A | Now | N/A | Now | 2003
Production Device Availability | March 2003 | Now | Now | Now | March 2003 | April 2003 | January 2003 | 2003
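The "Total RAM bits" row can be cross-checked from the three block-count rows. The Python sketch below assumes one parity bit per eight data bits (so 576, 4,608 and 589,824 bits per block); that ratio is an assumption of this example, chosen because it reproduces the totals in the table. The dictionary and function names are hypothetical.

```python
# Bits per block, assuming 1 parity bit per 8 data bits (9/8 of the data size).
BITS_PER_BLOCK = {
    "small": 512 * 9 // 8,          # 512-bit blocks    -> 576 bits
    "medium": 4096 * 9 // 8,        # 4-Kbit blocks     -> 4,608 bits
    "large": 512 * 1024 * 9 // 8,   # 512-Kbit blocks   -> 589,824 bits
}

# (512-bit blocks, 4-Kbit blocks, 512-Kbit blocks) per device, from the table.
DEVICES = {
    "EP1S10": (94, 60, 1),
    "EP1S20": (194, 82, 2),
    "EP1S25": (224, 138, 2),
    "EP1S30": (295, 171, 4),
    "EP1S40": (384, 183, 4),
    "EP1S60": (574, 292, 6),
    "EP1S80": (767, 364, 9),
    "EP1S120": (1118, 520, 12),
}


def total_ram_bits(small, medium, large):
    """Sum the capacity of all three block types for one device."""
    return (small * BITS_PER_BLOCK["small"]
            + medium * BITS_PER_BLOCK["medium"]
            + large * BITS_PER_BLOCK["large"])


for name, counts in DEVICES.items():
    print(name, f"{total_ram_bits(*counts):,}")
```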
43. FPGA Technology Roadmap
Year | 1995 | 1996 | 1997 | 2000 | 2003 | 2004?
Technology | 0.6 µm | 0.35 µm | 0.25 µm | 0.18 µm | 0.13 µm | 0.07 µm
Gate count | 25K | 100K | 250K | 1M | 100K LC, 8 Mb RAM, 400 18x18 multipliers (see note) |
Transistor count | 3.5M | 12M | 23M | 75M | 430M | 1B
Note: Xilinx Virtex-II Pro XC2VP100 (9/16/2003)
44. Advanced architecture of modern FPGAs
45. More guts
- Additional components
- RAM blocks
- Dedicated multipliers
- Tri-state buffers
- Transceivers
- Processor cores
- DSP blocks
46. Dedicated Arithmetic Blocks
QuickLogic
Altera
Xilinx
47. Processor Cores
48. PowerPC on Virtex-II Pro
- Embedded 300 MHz Harvard Architecture Core
- Low Power Consumption: 0.9 mW/MHz
- Five-Stage Data Path Pipeline
- Hardware Multiply/Divide Unit
- Thirty-Two 32-bit General Purpose Registers
- 16 KB Two-Way Set-Associative Instruction Cache
- 16 KB Two-Way Set-Associative Data Cache
- Memory Management Unit (MMU)
  - 64-entry unified Translation Look-aside Buffers (TLB)
  - Variable page sizes (1 KB to 16 MB)
- Dedicated On-Chip Memory (OCM) Interface
- Supports IBM CoreConnect Bus Architecture
- Debug and Trace Support
- Timer Facilities
49. ARM in Excalibur
- Industry-standard ARM922T 32-bit RISC processor core operating at up to 200 MHz
- ARMv4T instruction set with Thumb extensions
- Memory management unit (MMU) included for real-time operating systems (RTOS) support
- Harvard cache architecture with 64-way set-associative separate 8-Kbyte instruction and 8-Kbyte data caches
- Embedded programmable on-chip peripherals
- ETM9 embedded trace module to assist software debugging
- Flexible interrupt controller
- Universal asynchronous receiver/transmitter (UART)
- General-purpose timer
- Watchdog timer
50. FPGA Tools
51. Design process (1)
Specification (Lab Experiments)
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in Experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds.
VHDL description (Your Source Files)
    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port(
        clock, reset, encr_decr : in  std_logic;
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic
      );
    end RC5_core;
Functional simulation
Synthesis
Post-synthesis simulation
52. Design process (2)
Implementation
Timing simulation
Configuration
On chip testing
53. Active-HDL
54. Simulation Tools
Synthesis Tools
55. Logic Synthesis
VHDL description
    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;
Circuit netlist
56. Features of synthesis tools
- Interpret RTL code
- Produce synthesized circuit netlist in a standard EDIF format
- Give preliminary performance estimates
- Some can display circuit schematics corresponding to the EDIF netlist
57. Implementation
- After synthesis, the entire implementation process is performed by FPGA vendor tools
  - Xilinx ISE Foundation 6.2i
  - Altera Quartus II 4.0
  - 3rd-party tools for the Alliance version
58. Circuit Compilation
1. Technology Mapping
2. Placement
Assign a logical LUT to a physical location.
3. Routing
Select wire segments and switches for interconnection.
59. Routing Example
FPGA
Programmable Connections
60. Static Timing Analyzer
- Performs static analysis of the circuit performance
- Reports critical paths with all sources of delays
- Determines maximum clock frequency
61. Static Timing Analysis
- Critical Path: the longest path from outputs of registers to inputs of registers
- Min. Clock Period = length of the critical path
- Max. Clock Frequency = 1 / Min. Clock Period
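These three definitions can be tried on a toy example. The Python sketch below finds the longest register-to-register path in a small, made-up netlist and derives the clock limits from it. The node names and delays are hypothetical, and register clock-to-output and setup times are assumed to be folded into the edge delays; a real timing analyzer accounts for them separately.

```python
from functools import lru_cache


def longest_path_ns(graph, start, end):
    """Longest total delay from start to end in an acyclic delay graph.

    graph maps a node to a list of (next_node, delay_ns) pairs.
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        if node == end:
            return 0.0
        best = float("-inf")  # stays -inf if end is unreachable from node
        for nxt, delay in graph.get(node, []):
            best = max(best, delay + longest_from(nxt))
        return best

    return longest_from(start)


# Made-up netlist: a register output feeding three LUTs, then a register input.
graph = {
    "FF1.Q": [("LUT_A", 1.0), ("LUT_B", 1.5)],
    "LUT_A": [("LUT_C", 2.0)],
    "LUT_B": [("LUT_C", 2.5)],
    "LUT_C": [("FF2.D", 1.0)],
}

min_period_ns = longest_path_ns(graph, "FF1.Q", "FF2.D")  # critical path length
max_freq_mhz = 1000.0 / min_period_ns                     # 1 / period, ns -> MHz
print(min_period_ns, max_freq_mhz)
```

Here the path through LUT_B is the critical one, so speeding up LUT_A alone would not raise the maximum clock frequency.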
62. Configuration
- Once a design is implemented, you must create a file that the FPGA can understand
  - This file is called a bitstream: a BIT file (.bit extension)
- The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information