Title: Section I Introduction to Programmable Logic Devices
Programmable Logic Device Families
Source: Dataquest
[Figure: Programmable Logic Devices (PLDs), Gate Arrays, Cell-Based ICs, and Full Custom ICs; the PLD branch shows SPLDs (PALs) and FPGAs]
Acronyms
- SPLD: Simple Programmable Logic Device
- PAL: Programmable Array Logic
- CPLD: Complex PLD
- FPGA: Field-Programmable Gate Array
Common Resources
- Configurable Logic Blocks (CLB)
  - Memory Look-Up Table
  - AND-OR planes
  - Simple gates
- Input / Output Blocks (IOB)
  - Bidirectional, latches, inverters, pullups/pulldowns
- Interconnect or Routing
  - Local, internal feedback, and global
CPLDs and FPGAs
CPLD = Complex Programmable Logic Device; FPGA = Field-Programmable Gate Array
- Architecture: CPLD is PAL/22V10-like, more combinational; FPGA is gate-array-like, more registers + RAM
- Density: CPLD is low-to-medium, 0.5-10K logic gates; FPGA is medium-to-high, 1K to 500K system gates
- Performance: CPLD has predictable timing, up to 200 MHz today; FPGA is application dependent, up to 135 MHz today
- Interconnect: CPLD uses a crossbar; FPGA uses incremental routing
Not shown: Simple PLD (SPLD) architecture
PLD Industry Growth
Programmable Logic vs. Semi-Custom ASIC Market
- 1996 (total market $9.5B): mask-programmed gate arrays $5.6B (59%), standard logic $2.0B (21%), programmable logic $1.9B (20%)
- 2001 (total market $15.8B): mask-programmed gate arrays $7.4B (47%), standard logic $2.6B (16%), programmable logic $5.8B (37%)
Source: Dataquest, May 1997
Who is Xilinx?
- World's leading innovator of complete programmable logic solutions
- Inventor of the Field Programmable Gate Array
- $600M annual revenues, 35% annual growth
- Fabless semiconductor and software company
  - UMC (Taiwan): Xilinx acquired an equity stake in UMC in 1996
  - Yamaha (Japan)
  - Seiko Epson (Japan)
- Foundation and Alliance Series design software
Xilinx vs. Competitors: 1997 Calendar Year Revenues
[Figure: 1997 revenues by vendor, in millions of dollars]
Source: Company reports, In-Stat. Includes SPLD, CPLD, and FPGA revenues.
FPGA Market Share, Q4 1997
Source: In-Stat Research, March 1998. The Altera number includes both the 8K and 10K families.
Process Density Leadership
[Figure: transistor count (millions) by device, with 3Q98 and 4Q98 milestones]
- Virtex: 1 million gates, 0.25 µm process
- XC40250XV: 500K gates
- XC40150XV
- XC40125XV: industry's first 0.25 µm PLD; 250K gates, five-layer metal (5LM)
Xilinx Integrated Circuit Products
- XC9500: Flash-based in-system programmable CPLDs
  - Lowest price, best pin locking, 600 - 7K gates
- XC4000: Industry's largest, fastest FPGAs
  - XC4000E: 0.5 µm, 5 V, 5K - 40K gates
  - XC4000EX: 0.5 µm, 5 V, 45K - 60K gates
  - XC4000XL: 0.35 µm, 3.3 V devices, 5 V compatible I/O, 3K - 180K gates
  - XC4000XV: 0.25 µm, 2.5 V / 3.3 V, 5 V compatible I/O, 250K - 500K gates
- Spartan: 0.5 µm, 5 V, low cost, 10K - 40K gates
- Virtex: new FPGA architecture in 1998
  - 0.25 µm, 5LM, 250K - 1M gates, Select Block-RAM
- XC6200: Reconfigurable Processing Unit
  - Dynamically and partially reconfigurable
- Low-cost solutions (industry)
  - XC3000 (no RAM), XC5200 (no RAM), HardWire
Gates are in terms of system-level gates.
XC9500 CPLDs
- 5 volt in-system programmable (ISP) CPLDs
- 5 ns pin-to-pin
- 36 to 288 macrocells (6400 gates)
- Industry's best pin-locking architecture
- 10,000 program/erase cycles
- Complete IEEE 1149.1 JTAG capability
Xilinx XC4000 Architecture
- High density: > 1M system gates
- SRAM-based LUTs for synchronous dual-port RAM or logic
- ASIC-like array structure
- Built-in tri-states
- Unlimited reconfigurations, downloaded from PC or workstation in 1 second
[Figure: Configurable Logic Blocks (CLBs), I/O Blocks (IOBs), and Programmable Interconnect]
XC6200 Reconfigurable Processing Unit
[Figure: CPU, memory, I/O, and XC6200 RPU]
- 1000x improvement in reconfiguration time from external memory
- FastMAP™ assures high-speed direct access to all internal registers
- Microprocessor interface built in: the XC6200 is memory-mapped to look like SRAM to a host processor
- All registers accessed via built-in, low-skew FastMAP™ busses
- High-capacity distributed memory permits allocation of chip resources to logic or memory: 256 kbits in the XC6264
- Ultrafast partial reconfiguration (40 ns to 100s of µs)
- Up to 100,000 gates
Exponential Growth in Density
- Nov. 1997: shipping the world's largest FPGA, XC40125XV (10,982 logic cells, 250K system gates)
- 1 logic cell = one 4-input LUT + one flip-flop
- 175,000 logic cells = 2.0M logic gates in 2001
Design Flow
Design entry in schematic, ABEL, VHDL, and/or Verilog. Vendors include Synopsys, Aldec (Xilinx Foundation), Mentor, Cadence, Viewlogic, and 35 others.
Implementation includes placement, routing, and bitstream generation using Xilinx's M1 technology. Also, analyze timing, view layout, and more.
Download directly to the Xilinx hardware device(s) with unlimited reconfigurations!*
*XC9500 has 10,000 write/erase cycles.
Foundation Series Delivers Value and Ease of Use
- Complete, ready-to-use software solution
- Simple, easy-to-use design environment
- Easy-to-learn schematic, state-diagram, ABEL, VHDL, and Verilog design
- Synopsys FPGA Express integration
The Xilinx Student Edition
- Prentice Hall's most requested new engineering product in Q1 98!
- Complete, affordable, and practical digital design course environment for all students
- Predeveloped and tested lab-based course
- Includes
  - Foundation Series 1.3 for students' computers
  - Practical Xilinx Designer lab tutorial book
  - Coupon for XS40-005XL and XS95-108 boards ($129)
- Sold through bookstores by Prentice Hall and www.Amazon.com, listed at $79 (ISBN 0136716296)
- Integrated tutorial projects cover TTL, Boolean logic, state machines, memories, flip-flops, timing, and 4-bit and 8-bit processors
- Upgradeable for free to F1.4 Express with VHDL and Verilog, 40K gates; VHDL labs on the web
Section II Basic PLD Architecture
Section II Agenda
- Basic PLD Architecture
- XC9500 and XC4000 Hardware Architectures
- Foundation and Alliance Series Software
Section II Basic PLD Architecture: XC9500 and XC4000 Hardware Architectures
XC9500 Architectural Features
- Uniform, all pins fast, PAL-like architecture
- FastCONNECT switch matrix provides 100% routing with 100% utilization
- Flexible function block
  - 36 inputs with 18 outputs
  - Expandable to 90 product terms per macrocell
  - Product term and global three-state enables
  - Product term and global clocks
  - Product term and global set/reset signals
- 3.3 V / 5 V I/O operation
- Complete IEEE 1149.1 JTAG interface
XC9500 Function Block
Each function block is like a 36V18!
XC9500 Product Family
                 9536    9572    95108   95144   95216   95288
Macrocells       36      72      108     144     216     288
Usable Gates     800     1600    2400    3200    4800    6400
tPD (ns)         5       7.5     7.5     7.5     10      10
Registers        36      72      108     144     216     288
Max I/O          34      72      108     133     166     192
Packages
- 9536: VQ44, PC44
- 9572: PC44, PC84, TQ100, PQ100
- 95108: PC84, TQ100, PQ100, PQ160
- 95144: PQ100, PQ160
- 95216: PQ160, HQ208, BG352
- 95288: HQ208, BG352
XC4000 Architecture
[Figure: Configurable Logic Blocks (CLBs), I/O Blocks (IOBs), and Programmable Interconnect]
XC4000E/X Configurable Logic Blocks
- 2 four-input function generators (look-up tables)
  - 16x1 RAM, or
  - Logic function
- 2 registers
  - Each can be configured as a flip-flop or latch
  - Independent clock polarity
  - Synchronous and asynchronous set/reset
Look Up Tables
- Combinatorial logic is stored in 16x1 SRAM look-up tables (LUTs) in a CLB
- Example: a 4-input LUT is read with a 4-bit address, so it can hold any of 2^(2^4) = 64K functions of its inputs
- Capacity is limited by the number of inputs, not complexity
- Choose to use each function generator as 4-input logic (LUT) or as high-speed synchronous dual-port RAM
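To make the capacity argument concrete, here is a minimal Python model (not a Xilinx tool) of a 4-input function generator used as a 16x1 memory: the four inputs form the read address, so any function of those inputs costs the same 16 stored bits. The helper names `build_lut`, `read_lut`, and `majority_xor` are hypothetical and chosen only for illustration.

```python
def build_lut(func):
    """Return the 16 stored bits of a 4-input LUT for a Boolean function.

    Entry i holds func(a3, a2, a1, a0), where a3..a0 are the bits of the
    4-bit address i (a0 is the least significant bit).
    """
    table = []
    for address in range(16):
        a3, a2, a1, a0 = [(address >> k) & 1 for k in (3, 2, 1, 0)]
        table.append(func(a3, a2, a1, a0) & 1)
    return table


def read_lut(table, a3, a2, a1, a0):
    """Evaluate the LUT: the four logic inputs are simply the read address."""
    return table[(a3 << 3) | (a2 << 2) | (a1 << 1) | a0]


# An arbitrary 4-input function: majority of three inputs, XORed with a fourth.
def majority_xor(a, b, c, d):
    return ((a & b) | (b & c) | (a & c)) ^ d


lut_bits = build_lut(majority_xor)

# Each distinct 16-bit table is a distinct function: 2**(2**4) = 65,536 ("64K").
NUM_FUNCTIONS = 2 ** (2 ** 4)
```

Swapping in a different function changes only the 16 stored bits; in hardware the read path, and therefore the delay, stays the same, which is the sense in which capacity depends on the input count rather than on complexity.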
XC4000X I/O Block Diagram
Shaded areas are not included in the XC4000E family.
Xilinx FPGA Routing
1) Fast direct interconnect: CLB to CLB
2) General-purpose interconnect: uses switch matrix
3) Long lines
  - Segmented across chip
  - Global clocks, lowest skew
  - 2 tri-states per CLB for busses
- Other routing types in CPLDs and XC6200
Other FPGA Resources
- Tri-state buffers for busses (BUFTs)
- Global clock high-speed buffers (BUFGs)
- Wide decoders (DECODEx)
- Internal oscillator (OSC4)
- Global reset to all flip-flops and latches (STARTUP)
- CLB special resources
  - Fast carry logic built into CLBs
  - Synchronous dual-port RAM
- Boundary scan
What's Really in That Chip?
[Figure: die view showing switch matrix, direct interconnect (green), CLB (red), and long lines (purple)]
XC4000XL Family
Device                             4005XL  4010XL  4013XL  4020XL  4028XL
Logic cells                        466     950     1,368   1,862   2,432
Typ. gate range (logic + RAM)      3-9K    7-20K   10-30K  13-40K  18-50K
Max. RAM bits (no logic)           6K      13K     18K     25K     33K
I/O                                112     160     192     224     256
Initial packages (as listed): PC84, PC84, PQ100, PQ100, PQ160, PQ160, PQ160, PQ160, PQ208, PQ208, PQ208, PQ208, HQ208, PQ240, PQ240, HQ240, BG256, BG256, BG256, BG352, BG352

Device                             4036XL  4044XL  4052XL  4062XL  4085XL  40125XV
Logic cells                        3,078   3,800   4,598   5,472   7,448   10,982
Typ. gate range (logic + RAM)      22-65K  27-80K  33-100K 40-130K 55-180K 78-250K
Max. RAM bits (no logic)           42K     51K     62K     74K     100K    158K
I/O                                288     320     352     384     448     544
Initial packages (as listed): HQ208, HQ240, HQ240, HQ240, HQ240, BG352, BG432, BG432, BG432, BG432, PG411, PG411, PG411, PG475, PG559, PG559, BG560, BG560, BG560, BG560

Typical gate ranges assume 20-25% of CLBs used as RAM (first table) and 25-30% of CLBs used as RAM (second table).
HardWire™
- Unique, no-risk, 100% compatible mask-programmed cost reduction of Xilinx FPGAs
- Cost-effective for volume applications
  - Savings of 40% to 70%
- Architecture-equivalent mask-programmed version of any FPGA
- Requires virtually no customer engineering resources, test vectors, or simulation
- ALL FPGA features (e.g., configuration, power-on reset, JTAG, etc.) are fully supported
HardWire Methodology vs. Gate Array Conversion
Cost Reduction and Density Increases
[Figure: cost vs. density from 1996 to 1998 for XC5200, HardWire, XC4000E, XC4000EX, XC4000XL, XC4000XV, and the Virtex Series; labeled devices include XC4036EX, XC4085XL, XC40250XV (500K system-level gates), and a 1M-gate point; the density axes span 5,000 to 250,000 logic gates and 0.4K to 20K logic cells]
Starting with Virtex, the Xilinx numbering scheme reflects approximate logic + RAM gates rather than logic gates only.
CPLD or FPGA?
- CPLD
  - Non-volatile
  - JTAG testing
  - Wide fan-in
  - Fast counters, state machines
  - Combinational logic
  - Small student projects, lower-level courses
- FPGA
  - SRAM reconfiguration
  - Excellent for computer architecture, DSP, and registered designs
  - ASIC-like design flow
  - Great for first-year to graduate work
  - More common in schools
  - PROM required for non-volatile operation
Section II Basic PLD Architecture: Foundation and Alliance Series Software
Xilinx M1-Based Software
- Libraries and interfaces for leading EDA vendors
- Core implementation software: map, place, route, bitstream generation, and analysis
- Complete, ready-to-use: includes schematic, simulation, and VHDL and Verilog synthesis
- Graphical user interface is very similar to XACTstep v6.0
Design Tools
- Standard CAE entry and verification tools
- Xilinx Implementation software implements the design
  - The design is optimized for best performance and minimal size
  - Graphical user interface and command-line interface
  - Easy access to other Xilinx programs
  - Manages and tracks design revisions
[Figure: Foundation or Alliance design entry (schematic, state machine, HDL code, LogiBLOX, CORE Generator) and functional simulation (simulator, back annotation); verification (static timing analysis, in-circuit testing); Xilinx M1 Design Manager for design implementation]
Multi-Source Integration: Mixed-Level Flows
- Enables multiple sources and multiple EDA vendors in the same flow
- Allows team development
- Reduces design source translations
- Design the way you are used to
- Enables rapid, accurate iterations
- Works well within existing ASIC flows
- Facilitates design reuse
[Figure: HDL, schematics, existing designs, and cores feed standards-based design source integration (EDIF, VHDL, Verilog, SDF), followed by checkpoint verification and knowledge-driven implementation]
Third-Party Support Libraries
- Xilinx third-party design entry and simulation support
  - Synopsys, Cadence, Mentor Graphics, Aldec (Foundation)
  - Viewlogic, Synplicity, OrCAD, Model Technology, Synario, Exemplar, and others supply libraries and interfaces
- Industry-standard file formats
  - VHDL, Verilog, and EDIF netlist formats
  - SDF (Standard Delay Format) files
  - VITAL library support
- Xilinx libraries
  - Optimized components for use in any Xilinx FPGA or CPLD
  - Wide range of functions
    - Comparators, arithmetic functions, memory
    - DSP and PCI interfaces
  - Easy to use with ABEL, VHDL, Verilog, and schematic entry
Libraries, Macros, and Attributes
- Libraries are common design sets for all design entry tools (e.g., text, schematic, Foundation, Synopsys, Viewlogic, etc.)
- Library interfaces are specific to each front end
- Attributes are library element properties
- Online Libraries Guide has full listings and descriptions
- Unified Libraries
  - Boolean functions, TTL, flip-flops, adders, RAM, small functions
- LogiBLOX Libraries
  - Variable-size blocks of adders, registers, RAM, ROM, etc.
  - Properties defined as attributes
Core Design Technology
- Optimal core creation
- Flexible core delivery
- Data sheets
- Parameterizable cores
- Web mechanism to download new cores
- Third-party system tools directly linked with the CORE Generator
Foundation Series Express Overview
- Easy to use, yet powerful
- Based on industry standards, not proprietary languages
- Features
  - Schematic (partnership with Aldec)
  - IEEE VHDL, Verilog, ABEL
  - State diagram editor
  - Interactive simulation
- Exclusive partnership with Synopsys, the synthesis leader
Foundation Project Manager
- Integrates all tools into one environment
Schematic Entry
ABEL and VHDL Text Entry
- From the schematic menu (or via the HDL Editor), select Hierarchy > New Symbol Wizard to create a symbol
- Select HDL Editor Language Assistant to learn by example, then define the block
- Synthesize to EDIF
State Machine Graphical Editor
- Graphical editor synthesizes into ABEL or VHDL code
Simulation: Easy to Use and Learn
- Generate stimulus easily and quickly
  - Keyboard toggling
  - Simple clock stimulus
  - Custom formulas
- Easy debugging
  - Waveform viewer
  - Signals easily added and removed
  - Simulator access from schematic
  - Color-coded values on schematic
- Script Editor
Foundation Express 1.4 Features
- Express Technology
  - Optimizes the design for Xilinx architectures
  - Optimized arithmetic functions
  - Automatic global signal mapping
  - Automatic I/O pad mapping
  - Resource sharing
  - Hierarchy control
- Source code compatible with Synopsys Design Compiler and FPGA Compiler
- Verilog (IEEE 1364) and VHDL (IEEE 1076-1987) support
- Easy, graphical constraint entry
- F1.4 is stand-alone
- F1.5 (Sept/Oct 98)
  - Integrated into Foundation Project Manager
  - Replaces Metamor
Xilinx-Express Design Flow
Express Input and Output
- Input files may be in VHDL or Verilog format
- Timing specifications are not used during synthesis
- Timing specifications can be included in the output netlist
- Mixed Verilog/VHDL modules are accepted
- Schematics may also be used, but should not be input into Express
  - Schematic files in XNF or EDIF format will be merged into the design in the Xilinx Design Manager
- Output netlists are in XNF format
- Timing specifications may be specified in Express
[Figure: VHDL/Verilog sources and timing requirements go into Express, which produces reports and an .XNF netlist]
Express Design Process
1. Analyze: syntax check
2. Implement (Elaborate): create a generic logic design
3. Enter constraints and options
4. Synthesize: optimize the design for the specific device
5. Export the XNF netlist
6. Implement layout with the Xilinx Design Manager
Implementation: M1 Design Manager
- Manages design data
- Access reports
- Supports CPLDs and FPGAs
- Toolbox: Flow Engine, Timing Analyzer, PROM File Formatter, Hardware Debugger, EPIC Design Editor
Terminology
- Project
  - Source file has a defined working directory and family
- Version
  - A Xilinx netlist translation of the schematic
  - Multiple versions result from iterative schematic changes
- Revision
  - An implementation of a Xilinx netlist
  - Multiple revisions typically result from different options
- Part type
  - Specified at translation; can be changed in a new revision
Toolbox Programs
- Flow Engine
  - Controls start/stop points and custom options
- Timing Analyzer
  - Reports on net and path delays
- PROM File Formatter
  - Creates a file for programming the configuration data into a PROM
- Hardware Debugger
  - Downloads the configuration file with the XChecker, serial, or JTAG cable
- EPIC Design Editor
  - Device-level view of routing
Flow Engine
- View status of tools
- Control tool options
- Implements the design through to the bitstream
Section III Advanced Hardware Design Techniques
Section III Agenda
- Advanced Hardware Design Techniques
  - General Hardware Information
  - Combinational Logic Design (Look-Up Tables and Other Resources)
  - Synchronous Logic (Flip-Flops and Latches)
  - Memory Design (RAM and ROM)
  - Input / Output Design
Section III Advanced Hardware Design Techniques: General Hardware Information
Resource Estimation
- Find comparable functions in the macro library and XAPP application notes
- Or, use other designs to estimate device utilization
- Or, quickly implement a design and view the MAP report file
  - Select Utilities > Report Browser > Map Report
  - IOBs, CLBs, global buffers, and other components are listed separately
- For unfinished designs
  - Use save flags on unconnected nets, or
  - Deselect Trim Unconnected Logic in Implementation Options
Performance Estimation
- Use block delays as an estimate of net delays
- Use the desired clock frequency to determine the allowed CLB depth
- Compare to functional requirements and modify the design to meet performance needs
- Example for a 50 MHz clock frequency in XC4000XL-3
  - Clock period: 20 ns
  - One level: 8 ns (tCO + tNET + tSU)
  - Delay allowance: 12 ns
  - Each added level: 6 ns (tPD + tNET)
  - Added levels of logic allowed: 2 CLBs
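The same budget can be written as a small Python helper, sketched here under the slide's assumptions: one register-to-register level costs tCO + tNET + tSU, and each extra level costs tPD + tNET. The delay values are the slide's XC4000XL-3 estimates; for another device or speed grade you would substitute figures from its data sheet. `added_levels_allowed` is a hypothetical name.

```python
def added_levels_allowed(clock_mhz, first_level_ns, per_level_ns):
    """Estimate how many extra CLB levels fit between registers.

    first_level_ns: one register-to-register level (tCO + tNET + tSU)
    per_level_ns:   each additional level of logic (tPD + tNET)
    """
    period_ns = 1000.0 / clock_mhz
    allowance_ns = period_ns - first_level_ns
    if allowance_ns < 0:
        raise ValueError("clock period is shorter than a single level")
    # Only whole levels count, so round the quotient down.
    return int(allowance_ns // per_level_ns)


# Slide example: 20 ns period, 12 ns allowance, 6 ns per added level.
levels_at_50mhz = added_levels_allowed(50, first_level_ns=8, per_level_ns=6)
```

At 100 MHz the same numbers leave only a 2 ns allowance, so no extra level fits; that is the point at which the design itself has to change.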
Power Consumption
- Xilinx FPGAs have flexible routing
  - Power consumption can be half that of FPGAs with less flexible routing channels
- Power = kCV²F
  - How many nodes change state (hard to estimate)
  - Capacitive loading on CLB and IOB outputs (known)
- Power consumption is not a concern in regular course labs
- Power estimation methods
  - See application notes under http://www.xilinx.com/apps/3volt.htm
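A minimal Python sketch of the formula follows. It assumes k is the fraction of nodes that change state each clock (the hard-to-estimate term) and C is the switched capacitance. The numeric inputs are hypothetical and only show how the terms scale. `dynamic_power_w` is an illustrative name, not a Xilinx utility.

```python
def dynamic_power_w(k, capacitance_f, voltage_v, frequency_hz):
    """P = k * C * V**2 * F.

    k is read here as the fraction of nodes that change state per clock,
    and C as the capacitance being switched, in farads.
    """
    return k * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical inputs for illustration only: 20% activity, 1 nF, 50 MHz.
p_at_5v = dynamic_power_w(0.2, 1e-9, 5.0, 50e6)    # 0.25 W
p_at_3v3 = dynamic_power_w(0.2, 1e-9, 3.3, 50e6)   # about 0.109 W
```

Because V enters squared, lowering the supply from 5 V to 3.3 V alone cuts this term to about 44% of its 5 V value, with every other factor held equal.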
XC4000XL: 3.3 V, 0.35 µm, 5 Volt Compatible
[Figure: XC4000XL FPGA (0.35 µm, 3.3 V logic, 3.3 V I/O) connected to any 5 V device: 5 V tolerant inputs; outputs meet TTL levels]
- Accepts 5 V inputs
- Drives standard TTL levels
- Totally compatible in a 5 V environment
- The 0.25 µm XV family is also 5 V TTL compatible when used with a 3.3 V I/O supply and a 2.5 V core supply
XC4000XV and Virtex: 2.5 V, 0.25 µm, 5 Volt Compatible
- Devices with 5 V, 3.3 V, and 2.5 V power supplies can be interfaced
Section III Advanced Hardware Design Techniques: Combinational Logic Design (Look-Up Tables and Other Resources)
XC4000X Configurable Logic Blocks
- G, F, H function generators
- 2 flip-flops
  - Individual clock polarity
  - Synchronous and asynchronous set/reset
- Delay from F1 to Y in the XC4000X-1 is 1 ns
16-bit Adder Examples
- Many choices for implementing an adder
- Speed vs. density trade-off controlled by user and PLD features

Family       Type          CLBs   Levels (or delay)   AppLINX
XC3000A      Bit-Serial    16     16                  XAPP 022
XC3000A      Parallel      24     8                   XAPP 022
XC3000A      Lookahead     30     6                   XAPP 022
XC3000A      Conditional   41     3                   XAPP 022
XC4000E-3    Carry         8      10.1 ns             XAPP 018
XC5200-5     Carry         8      20 ns               5200 Data Sheet
Arithmetic Functions
- Arithmetic macros are optimized for density and speed with dedicated carry logic in CLBs
  - Example: each CLB can form a two-bit full adder
- Carry logic components have vertical orientation
  - Needed for speed and utilization
  - Known as an RPM, or Relationally Placed Macro
- Examples
  - ADDx adders
  - ADSUx adder/subtractors
  - CCx counters
  - COMPMCx magnitude comparators
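As a behavioral sketch (it models function, not timing), the Python below treats each call to `two_bit_slice` as one CLB's two-bit full adder, matching the example above, and chains eight of them with the carry rippling from LSB to MSB. That gives 8 slices for 16 bits, in line with the 8-CLB carry-logic entries in the adder table. The helper names are hypothetical, and the dedicated carry path that makes this fast in silicon is not modeled.

```python
def two_bit_slice(a, b, carry_in):
    """One CLB's worth of addition: two bits of a ripple-carry adder.

    a, b: 2-bit operand slices (0..3); returns (2-bit sum, carry_out).
    """
    total = a + b + carry_in
    return total & 0b11, total >> 2


def add_16(a, b, carry_in=0):
    """16-bit adder built from 8 two-bit slices, carry rippling LSB to MSB."""
    result, carry = 0, carry_in
    for i in range(8):  # 8 slices x 2 bits = 16 bits
        a_slice = (a >> (2 * i)) & 0b11
        b_slice = (b >> (2 * i)) & 0b11
        s, carry = two_bit_slice(a_slice, b_slice, carry)
        result |= s << (2 * i)
    return result, carry
```

For example, `add_16(0xFFFF, 1)` wraps to a zero sum with a carry out of 1, just as a 16-bit hardware adder would.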
Three-State Buffers
- Each CLB is associated with two three-state buffers (BUFT)
- BUFTs are used independently of LUTs and flip-flops
- Three-state library components
  - Three-state buffers: BUFT, BUFT4, BUFT8, BUFT16
  - Wired AND (open drain): WAND1, WAND4, WAND8, WAND16
  - Two-input OR driving wired AND: WOR2AND
- Delay varies per family
  - 3.7 ns in the XC4005XL (-1)
  - 13.6 ns in the XC4085XL (-1)
Use BUFT for Buses
- Use BUFTs to multiplex signals onto long routing lines, which then serve as buses
BUFTs for Multiplexers
- BUFTs can be used to build large MUXes
  - Large MUXes composed of LUTs need multiple levels of logic
  - Large MUXes composed of BUFTs have only one level of logic
  - CLB resources are not used
  - Use of BUFTs constrains placement
- Multiplexer macros use look-up tables
  - Example: M4_1E
- Create BUFT macros from three-state buffer components
  - BUFT, BUFT4, BUFT8, BUFT16
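The one-level BUFT multiplexer can be sketched behaviorally in Python: the select value is decoded into one-hot enables, and the shared line carries whichever input's buffer is enabled. `HIGH_Z`, `buft_bus`, and `buft_mux` are hypothetical names. The real BUFT enable polarity and the electrical behavior of an undriven line are not modeled; the model simply flags contention as an error.

```python
HIGH_Z = None  # marker for an undriven (high-impedance) line in this model


def buft_bus(drivers):
    """Resolve a long line driven by several BUFTs.

    drivers: list of (enable, data) pairs, one per BUFT.
    At most one enable may be active; more than one would be bus contention.
    """
    active = [data for enable, data in drivers if enable]
    if len(active) > 1:
        raise ValueError("bus contention: more than one BUFT enabled")
    return active[0] if active else HIGH_Z


def buft_mux(inputs, select):
    """N:1 MUX with one BUFT per input; select is decoded to one-hot enables."""
    return buft_bus([(i == select, value) for i, value in enumerate(inputs)])
```

However many inputs there are, the data passes through exactly one buffer; only the enable decode grows. That is the single-level property listed on the slide.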
Wide Decoders
- The wide decoder is a dedicated wired-AND
  - Useful for address decoding
- IOBs or CLBs can drive the wide decoder
- Located along the periphery of the die
  - All IOB drivers must be on the same edge as the decoder
  - Four decoder lines per edge
- Use DECODE macro
  - DECODE4/8/16/24
  - Must use a PULLUP primitive
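A behavioral Python sketch of wired-AND address decoding follows. It assumes each driver releases the line when its address bit matches the target value and pulls it low otherwise, so the pull-up keeps the line high only on an exact match. This is not the DECODE macro's actual netlist, and the function names are hypothetical.

```python
def wired_and(driver_outputs):
    """Wired-AND line with a PULLUP: any driver pulling low wins."""
    # With no driver pulling low, the pull-up holds the line high.
    return 0 if any(bit == 0 for bit in driver_outputs) else 1


def wide_decode(address, match_value, width):
    """Assert (1) only when every address bit equals the matching bit."""
    outputs = []
    for k in range(width):
        addr_bit = (address >> k) & 1
        wanted = (match_value >> k) & 1
        # This driver releases the line (1) when its bit matches, else pulls low.
        outputs.append(1 if addr_bit == wanted else 0)
    return wired_and(outputs)
```

Without the pull-up nothing would drive the line high, which is why the slide says the PULLUP primitive is required.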
CLB Mapping Control in Schematic
- Allows the user to force mapping of logic from the schematic into a single CLB
- XC3000
  - CLBMap can specify an entire CLB
- XC4000/XC5000
  - FMap specifies a function generator in a CLB
  - HMap specifies an XC4000 H function generator in a CLB
[Figure: FMAP symbol with inputs I1-I4 and output O, attached to nets A0, B0, C0, A2, and B2]
Section III Advanced Hardware Design Techniques: Synchronous Logic (Flip-Flops and Latches)
CLB Registers
- Each register can be configured as a flip-flop or latch
- Independent clock polarity
- Asynchronous set or reset
- Clock enable
- Direct input from CLB input (connections bypass LUTs)
Library Offerings
- Unified library contains many standard functions
  - Pre-defined size and functionality
- LogiBLOX templates are available
  - Can be customized for bus size and function
- Types of LogiBLOX register functions
  - Shift registers: left/right, arithmetic, logical, circular
  - Clock dividers: output duty cycle
  - Counters: LFSR, binary, one-hot, carry logic
  - Accumulators
- Xilinx CORE Generator recommended for very complex functions (DSP, FFT, UARTs, multipliers...)
Naming Conventions
Counters
- Libraries support a wide variety of fast and efficient counters
- Counters offer trade-offs between speed, density, and complexity
- Example LogiBLOX counter styles
  - Binary: predictable outputs, uses carry logic
  - Johnson: fastest practical counter, but uses more flip-flops; glitch-free decoding
  - LFSR: fast and dense, but pseudo-random outputs
  - One-Hot: useful for generating a series of enables
  - Carry Chain: high speed and density
- The LogiBLOX synthesizer will automatically pick the best implementation based on your design, or you can force an implementation with the STYLE parameter (schematic).
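To compare three of these styles, the Python sketch below lists the state sequences they step through. It shows why a Johnson counter gives twice as many states as one-hot from the same flip-flops, and why its outputs decode without glitches: adjacent states differ in exactly one bit. The function names are hypothetical, and the sketch generates sequences only. It is not LogiBLOX output.

```python
def binary_sequence(n_bits):
    """Plain binary count: 2**n states from n flip-flops."""
    return [format(i, f"0{n_bits}b") for i in range(2 ** n_bits)]


def one_hot_sequence(n_flops):
    """One-hot: a single 1 moves along, giving n states from n flip-flops."""
    return ["".join("1" if j == i else "0" for j in range(n_flops))
            for i in range(n_flops)]


def johnson_sequence(n_flops):
    """Johnson (twisted-ring) counter: the inverted last stage feeds the
    first stage, giving 2n states from n flip-flops."""
    state = [0] * n_flops
    states = []
    for _ in range(2 * n_flops):
        states.append("".join(map(str, state)))
        # Shift one position; the inverted last stage enters the first stage.
        state = [1 - state[-1]] + state[:-1]
    return states
```

With four flip-flops, `johnson_sequence(4)` runs through 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001 and then wraps: eight states, each one bit away from the next.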
16-Bit Counter Examples
- The following are implemented in XC4000XL-3

Macro          CLBs      Clock
CB16CLE/D      18 - 20   23 - 24 ns
CC16CLED       19        19 ns
CC16CLE        9         16 ns
X-BLOX LFSR    9         7 ns

- Simpler functions are faster and smaller
- Carry logic counters are generally faster (depends on size)
Global Clock Buffers
- Clock buffers are low-skew, high-drive buffers
  - Also known as global buffers
  - Drive low-skew, high-speed long-line resources
  - Drive all flip-flops and latches in the FPGA
  - Can also be used for high-fanout signals
  - Additional clocks and high-fanout signals can be routed on long lines
- Instantiation: if the BUFG component is instantiated, the software will select one of these buffers based on the design
- Synthesis: clocks are identified by different means depending on the vendor
  - Example: Synopsys FPGA Compiler connects clock buffers to all fan-in of clock pins
  - Control clock buffer insertion with separate commands
  - Consult the synthesis interface guide or the vendor
Global Buffer Types
- BUFGLS is used by default in the Xilinx software if a BUFG component is specified in the design
Generating Clock On-Chip
- Internal configuration clock available after configuration
  - Use OSC4 primitive
  - Nominal values (approximately): 8 MHz, 500 kHz, 16 kHz, 490 Hz, 15 Hz
  - Very limited accuracy (±50%)
Global Reset
- All flip-flops are initialized during power-up via the Global Set/Reset (GSR) network
- You can access the Global Set/Reset network by instantiating the STARTUP primitive
  - Assert GSR for global set or reset
  - GSR is automatically connected to all CLB flip-flops using dedicated routing resources
  - Saves general-use routing resources for your design
  - DO NOT CONNECT GSR to set/reset inputs on flip-flops
  - Any signal can source the global set/reset, but the source must be defined in the design
- Use Global Reset as much as possible
  - Limit the number of flip-flops with an asynchronous reset
  - Extra routing resources are used
Avoid Gated-Clock or Asynch. Reset
- Move gating from the clock pin to prevent a glitch from affecting logic
Poor design
- TC and Q may glitch during the transition of Q<0:2> from 011 to 100
Improved designs
- TC will not glitch during the transition of Q<0:2> from 011 to 100
- Or use MUXed data when using only 1-2 logic inputs
Shift Registers Are Fast and Dense
- The CLB can handle two bits of a shift register
  - Fast and dense, independent of size
  - Fast connections between adjacent look-up tables
Prescale Non-Loadable Counters
- Counter speed is determined by the carry delay from LSB to MSB
- Non-loadable counters can use prescaling
- Prescaling restricts load timing
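A behavioral Python sketch of prescaling follows. It assumes a small prescaler holds the LSBs and that its terminal count acts as the clock enable of a wider upper counter. The count sequence is the same as a plain binary counter, but the upper counter's value changes only once every 2**prescale_bits clocks, so its long carry chain is no longer exercised on every cycle. One reading of the load restriction is that an arbitrary mid-sequence load would break that once-per-wrap assumption. `PrescaledCounter` and `count_sequence` are hypothetical names, and timing is not modeled.

```python
class PrescaledCounter:
    """Non-loadable counter: fast prescaler (LSBs) + enabled upper counter."""

    def __init__(self, prescale_bits, upper_bits):
        self.prescale_bits = prescale_bits
        self.low_mask = (1 << prescale_bits) - 1
        self.high_mask = (1 << upper_bits) - 1
        self.low = 0   # small, fast prescaler holding the LSBs
        self.high = 0  # wide upper counter holding the MSBs

    def clock(self):
        """Advance one clock edge."""
        terminal_count = self.low == self.low_mask  # prescaler about to wrap
        self.low = (self.low + 1) & self.low_mask
        if terminal_count:  # acts as the upper counter's clock enable
            self.high = (self.high + 1) & self.high_mask

    @property
    def value(self):
        return (self.high << self.prescale_bits) | self.low


def count_sequence(counter, cycles):
    """Record the counter value before each of `cycles` clock edges."""
    values = []
    for _ in range(cycles):
        values.append(counter.value)
        counter.clock()
    return values
```

An 8-bit counter built as `PrescaledCounter(2, 6)` produces 0, 1, 2, ..., 255 and then wraps to 0, exactly like an ordinary 8-bit binary counter.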
Use One-Hot Encoding for State Machines
- Shift register is always fast and dense
- One-hot uses one flip-flop for each count
- Useful for state machine encoding in FPGAs
- Another alternative is a Johnson counter
  - Inverted output of the last stage drives the input of the first stage
  - Doubles the number of states versus one-hot
- Binary encoding is best for CPLDs
State Machine Design Tips
- Split complex states
- Need to minimize the number of inputs, not the number of flip-flops, in FPGAs
- Use one-hot encoding for medium-size state machines (8-16 states)
- Complex states may be improved by breaking them up into additional, simpler states
[Figure: a complex state is split into States A1 and A2, each leading to State B on cond1]
Use Binary Sequence Only If Necessary
- CLB can generate any sequence desired at the same speed
- Use pre-scaling on non-loadable counters to increase speed
  - LSBs toggle quickly
  - See application notes XAPP001 and XAPP014
- Use Gray code counters if decoding outputs
  - One bit changes per transition
- Consider a linear feedback shift register (LFSR) for speed when
  - terminal count is all that is needed, or
  - any regular sequence is acceptable (e.g., FIFO)
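Both alternatives are easy to sketch in Python. The Gray code helper uses the standard binary-reflected conversion i ^ (i >> 1). The LFSR is a 4-bit Fibonacci register with XOR feedback from bits 3 and 2 (polynomial x^4 + x^3 + 1), which visits all 15 non-zero states. The tap choice and XOR feedback are assumptions for illustration and are not taken from a Xilinx macro; with XOR feedback the all-zero state locks up, so the seed must be non-zero.

```python
def gray(i):
    """Binary-reflected Gray code: adjacent counts differ in exactly one bit."""
    return i ^ (i >> 1)


def lfsr4_states(seed=0b0001):
    """States of a 4-bit Fibonacci LFSR, feedback = bit3 XOR bit2.

    Visits all 15 non-zero states in a pseudo-random order, then repeats.
    """
    if seed == 0:
        raise ValueError("seed must be non-zero (all-zero is a lock-up state)")
    state, states = seed, []
    while True:
        states.append(state)
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        state = ((state << 1) | feedback) & 0xF  # shift left, insert feedback
        if state == seed:
            return states
```

The LFSR needs only one XOR gate of feedback regardless of length, which is where its speed comes from. The price is the pseudo-random order (1, 2, 4, 9, 3, ... from seed 1), acceptable when only the terminal count or some fixed sequence matters.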
Pipeline for Speed
- Register-rich FPGAs encourage pipelining
- Pipelining improves speed
  - Consider wherever latency is not an issue
  - Use for terminal counts, carry lookahead, etc.
- How to estimate the clock period
  - 2 x (number of combinatorial levels) x (speed grade)
  - XC4000XL-3: 3 levels x 2 x 3 ns = 18 ns clock period
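The rule of thumb turns into a short Python sketch. The second helper encodes a standard pipelining observation: once registers split a path into stages, the clock period is set by the deepest stage. The 2 + 1 level split is a hypothetical example rather than a measured result, and the helper names are illustrative.

```python
def estimated_clock_period_ns(levels, speed_grade_ns):
    """Rule of thumb: 2 x (combinatorial levels) x (speed grade)."""
    return 2 * levels * speed_grade_ns


def pipelined_period_ns(stage_levels, speed_grade_ns):
    """After pipelining, the deepest stage sets the clock period."""
    return max(estimated_clock_period_ns(n, speed_grade_ns) for n in stage_levels)


unpipelined = estimated_clock_period_ns(3, 3)  # 18 ns, as on the slide
pipelined = pipelined_period_ns([2, 1], 3)     # hypothetical 2 + 1 split: 12 ns
```

The gain costs one extra clock of latency, which is why pipelining suits paths where latency is not an issue.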
Section III Advanced Hardware Design Techniques: Memory Design (RAM and ROM)
ROM Is Equivalent to Logic
- Using a ROM simply defines logic functions in look-up table format
  - Memory might be an easier way to define logic
  - Xilinx provides ROM library cells
- FPGA look-up tables are essentially blocks of RAM
  - Data is written during configuration
  - Data is read after configuration
  - They effectively operate as a ROM
RAM Provides 16X the Storage of Flip-Flops
- 32 bits versus 2 bits of storage
  - Two 16x1 RAMs or one 32x1 single-port RAM fit in one CLB
  - One 16x1 dual-port RAM fits in one CLB
- 32x8 shift register with RAM: 11 CLBs
  - Using flip-flops, it takes 128 CLBs for the data alone
  - Address decoders not included
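The storage arithmetic above can be checked with a few lines of Python. This counts storage bits only, using the slide's per-CLB figures of 2 flip-flops and 32 RAM bits. The slide's 11-CLB total for the RAM-based shift register covers the complete function, so it is larger than the storage-only count here. The function names are hypothetical.

```python
FLIP_FLOPS_PER_CLB = 2
RAM_BITS_PER_CLB = 32  # two 16x1 RAMs or one 32x1 single-port RAM


def ceil_div(a, b):
    return -(-a // b)


def clbs_for_flip_flop_storage(words, width):
    """CLBs needed to hold words x width bits in flip-flops alone."""
    return ceil_div(words * width, FLIP_FLOPS_PER_CLB)


def clbs_for_ram_storage(words, width):
    """CLBs needed for the RAM bits alone (no addressing logic)."""
    return ceil_div(words * width, RAM_BITS_PER_CLB)
```

For the 32x8 case, that is 256 bits: 128 CLBs as flip-flops versus 8 CLBs of RAM storage. The 32/2 ratio is the 16X in the slide title.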
RAM Types
- Synchronous RAM (SYNC_RAM)
  - Synchronous write operation
- Synchronous dual-port RAM (DP_RAM)
  - Can read and write to different addresses simultaneously
RAM Guidelines
- Less than 32 words is best
  - 32x1 or 16x2 per RAM requires only one CLB
  - Delays are short (one level of logic)
- Data and output MUXes are required to expand depth
  - Less than 256 words recommended per RAM
  - Use external memory for 256 words or more
- Width is easily expanded
  - Connect the address lines to multiple blocks
- Recommendation: use less than 1/2 of max memory resources
  - Maximum memory uses all logic resources of CLBs
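Depth expansion can be sketched behaviorally in Python. All 16x1 blocks share the low four address bits. The upper bits are decoded into a single active write enable, and an output MUX picks which block is read. The clocked nature of real synchronous writes is not modeled, and `Ram16x1` and `DeepRam` are hypothetical names. Width expansion would instead place blocks side by side on the same address.

```python
class Ram16x1:
    """One 16x1 LUT RAM: write when enabled, read by address."""

    def __init__(self):
        self.bits = [0] * 16

    def write(self, addr, data, we):
        if we:
            self.bits[addr & 0xF] = int(data) & 1

    def read(self, addr):
        return self.bits[addr & 0xF]


class DeepRam:
    """Depth expansion from 16x1 blocks: decoded write enables + output MUX."""

    def __init__(self, depth):
        if depth % 16:
            raise ValueError("depth must be a multiple of 16")
        self.blocks = [Ram16x1() for _ in range(depth // 16)]

    def write(self, addr, data):
        select = addr >> 4  # upper address bits feed the decoder
        for i, block in enumerate(self.blocks):
            block.write(addr, data, we=(i == select))  # one write enable active

    def read(self, addr):
        return self.blocks[addr >> 4].read(addr)  # output MUX on upper bits


ram = DeepRam(64)
for address in range(64):
    ram.write(address, address % 3 == 0)
```

Each doubling of depth adds blocks, widens the decoder, and deepens the output MUX, which is why the slide steers large memories (256 words or more) off-chip.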
Memory Use
- Most synthesis tools can synthesize ROM from behavioral HDL code, but RAMs must be instantiated
- Use library primitives and macros for standard-size memory
  - RAM/ROM16X1S to 32X8S
  - Use S suffix for synchronous RAM
  - Use D suffix for dual-port RAM
- Use LogiBLOX to generate arbitrary-size memories
How to Generate Memory
- Use the LogiBLOX utility to create arbitrary-size RAM or ROM
  - Select type: ROM, synchronous, asynchronous, or dual-port RAM
  - Specify depth: the number of words must be a multiple of 16, ranging from 16 to 256 words
  - Specify width: word size ranges from 1 to 64 bits
  - Specify initialization values with an attribute file
- LogiBLOX also creates the RAM interface
  - Entity and component declaration: cut and paste into the design (VHDL designs)
  - Module declaration (Verilog designs)
  - Symbol graphic (schematic entry designs)
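Before opening the generator, it can help to check a request against the limits on this slide. The Python below is a hypothetical pre-check; it is not part of LogiBLOX. The type strings simply mirror the slide's wording. The rough block count assumes each output bit needs depth/16 RAM16X1 blocks and ignores the MUX and decode logic that deeper memories also require.

```python
MEMORY_TYPES = {"ROM", "Synchronous", "Asynchronous", "Dual Port RAM"}


def check_memory_params(mem_type, depth, width):
    """Return a list of problems with a request (empty if within limits)."""
    problems = []
    if mem_type not in MEMORY_TYPES:
        problems.append(f"unknown type {mem_type!r}")
    if depth % 16 != 0 or not 16 <= depth <= 256:
        problems.append("depth must be a multiple of 16 from 16 to 256")
    if not 1 <= width <= 64:
        problems.append("width must be 1 to 64 bits")
    return problems


def lut_ram_blocks(depth, width):
    """Rough count of 16x1 RAM blocks: depth/16 per bit, times width."""
    return (depth // 16) * width
```

A 64x8 request, for instance, passes the check and needs roughly 32 16x1 blocks, comparable to the RAM in 16 CLBs at two 16x1 RAMs per CLB.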
Memory Generator Dialog
Specify memory type, size, name, and function in the LogiBLOX GUI.
[Figure: LogiBLOX dialog with fields for Instance Name, LogiBLOX function, Memory Function, and Data file for initialization]
Section III Advanced Hardware Design Techniques: Input / Output Design
XC4000X IOB Block Diagram
Shaded areas are not included in the XC4000E family.
How to Specify I/O Blocks: Schematic
- User explicitly defines which resources in the IOB are to be used
- I/Os are defined with
  - 1 pad primitive
  - At least 1 function primitive
    - Buffer, F/F, or latch
    - 1 input element, 1 output element, or both
  - Inverters may also be pulled into IOBs
- IOBs are named by the net between the pad and function primitives
Primary and Secondary Global Buffers
- Eight global buffers per FPGA
  - Four primary (BUFGP), four secondary (BUFGS)
- Primary buffers must be driven by a semi-dedicated IOB
- Secondary buffers can be driven by a semi-dedicated IOB or internal logic and have more routing flexibility
  - Use BUFGS if an extra 1-2 ns of delay is acceptable
- Use the generic BUFG primitive in your design
  - Allows software to choose the best type of buffer
  - Allows easy migration across families
I/O Logic
- 4000E families have no Boolean logic other than inverters in the IOBs
- XC4000EX adds optional output logic
  - Can be used as a generic two-input function generator or MUX
  - One input can be driven by the IOB output clock signal
  - Driving from the FastCLK buffer provides less than 6 ns pin-to-pin delay
  - Requires library components beginning with O
Use Pull-ups/Pull-downs to Prevent Floating
- Unused IOBs
  - Outputs of unused IOBs are automatically disabled
  - Pull-ups are automatically connected on unused IOBs
- Used IOBs
  - A PULLUP or PULLDOWN primitive can be connected to used IOBs
  - Inputs should not be left floating
  - Add a pull-up to design inputs that may be left floating to reduce power and noise
Output Three-State Control
- Output enable may be inverted
  - Use OBUFE macro for active-high enable
  - Use OBUFT primitive for active-low enable
- Three-state control also via a dedicated global net
  - Controlled by the same STARTUP primitive
  - All I/O disabled during configuration
Fast Capture Latch
- Additional latch on the input, driven by the output clock signal
  - Allows capture of the input by a very fast clock
  - Followed by the standard I/O storage element for synchronization to internal logic
  - Very fast setup (6.8 ns for 4000EX-3), 0 ns hold
  - Available on the 4000X family, not the 4000E family
- Example
  - ILDFFDX macro includes the Fast Capture Latch and IFDX
  - Connect BUFGE to the fast capture latch
  - The opposite edge of the same clock, via BUFGLS, drives IFDX
Decrease Setup Time with NODELAY
- NODELAY attribute
  - Removes the delay element in front of the IFD or ILD
  - Decreases setup time, but creates a hold time requirement
  - Available on IFD/ILD macros in the XC5200 and XC4000E/X families
110 Output MUX
- OMUX2
- Fast output signal (from the output clock pin) MUXes the IOB output or clock enable pins to the pad
- Effectively doubles the number of device outputs without requiring a larger, more expensive package
- Pin-to-pin delay is less than 6 ns
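The "doubles the number of outputs" claim rests on time-multiplexing: a 2:1 MUX whose select is a clock puts one signal on the pad during each clock phase, and the receiver separates them again using the same clock. The slide does not say which input appears in which phase, so this Python sketch assumes `o` is shown while the clock is high and `ec` while it is low; the function names are hypothetical.

```python
from typing import List, Tuple


def omux_pad(samples: List[Tuple[int, int, int]]) -> List[int]:
    """Pad value for each (clk, o, ec) sample of a clock-selected 2:1 MUX.

    Assumption: the pad shows `o` while clk is high and `ec` while clk is low.
    """
    return [o if clk else ec for clk, o, ec in samples]


def demux(pad: List[int], clk: List[int]) -> Tuple[List[int], List[int]]:
    """Receiver side: split the pad stream back into two signals by clock phase."""
    high = [v for v, c in zip(pad, clk) if c]
    low = [v for v, c in zip(pad, clk) if not c]
    return high, low
```

Two logical outputs share one physical pin, at the cost of each being valid for only half of every clock period.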
111 Slew Rate Control
- Slew rate controls output speed
- Two slew rates:
- Default slow slew rate reduces noise
- Use the fast slew rate wherever speed is important
- FAST slew rates are approximately 2x faster than SLOW slew rates
- Slew rate specification:
- Per instance in the user constraint file
- INST 1I87/obuf SLOW
- Synthesis vendor dependent
- Output drive varies by family
- 4KEX/XL families have 12 mA drive
112 Choose TTL or CMOS Thresholds
- Threshold is selected during configuration
- Default is TTL
- Global selection on inputs or outputs
- Change to CMOS in the Configuration Template
- 3V devices need the TTL threshold when interfacing to 5V devices
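One way to see why the last point holds is a worst-case level check: an interface works only if the driver's guaranteed high output is at least the receiver's minimum high input, and its guaranteed low output is at most the receiver's maximum low input. The voltages below are assumed typical textbook values (TTL inputs 2.0 V / 0.8 V, 5 V CMOS inputs at 70% / 30% of supply, 3.3 V LVTTL outputs 2.4 V / 0.4 V), not Xilinx data-sheet numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputLevels:
    vih: float  # minimum voltage recognized as logic 1
    vil: float  # maximum voltage recognized as logic 0


@dataclass(frozen=True)
class OutputLevels:
    voh: float  # guaranteed minimum high output voltage
    vol: float  # guaranteed maximum low output voltage


# Assumed typical values for illustration; check the actual device data sheets.
TTL_IN = InputLevels(vih=2.0, vil=0.8)
CMOS_5V_IN = InputLevels(vih=3.5, vil=1.5)  # 70% / 30% of a 5 V supply
LVTTL_3V_OUT = OutputLevels(voh=2.4, vol=0.4)


def compatible(drv: OutputLevels, rcv: InputLevels) -> bool:
    """True if the driver's worst-case levels satisfy the receiver's thresholds."""
    return drv.voh >= rcv.vih and drv.vol <= rcv.vil
```

With these numbers, a 3V output meets a TTL-threshold input (2.4 V >= 2.0 V) but not a 5 V CMOS-threshold input (2.4 V < 3.5 V).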
113 Section IV Advanced Software Design with Xilinx M1-Based Software
114 Section IV Agenda
- Design Entry Tips
- Library Types
- FPGA Express for VHDL and Verilog
- M1-Based Software Flow
- Implementation Options
- Design Verification
- PLD Configuration Settings
- Design Constraints
115 Section IV Advanced Software Design with Xilinx M1-Based Software: Design Entry Tips
116 Design Entry Tip - Label Nets
- Label as many nets as possible
- Net names are passed to report files
- Eases debugging
- Names may change due to hierarchy or optimization
- An IOB is named by the net between the pad and the I/O function primitives
- A CLB is named by the net on its output
- Flip-flops always drive CLB outputs
117 Use Legal and Readable Names
- Allowable characters:
- Alphanumeric: A - Z, a - z, 0 - 9
- Underline (_), dash (-)
- Reserved characters:
- Angle brackets (< >) for buses
- Slash (/) for hierarchy
- Dollar sign ($) for reference designators
- Names must contain at least one non-digit
- Avoid using names that correspond to device resources:
- CLB row/column locations: AA, AB, etc.
- IOB pin locations: P1, P2, etc.
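The naming rules above can be checked mechanically. This Python sketch validates a single base name (no hierarchy slashes or bus indices); `name_problems` is a hypothetical helper, and the two device-resource patterns (two letters such as AA, or P followed by digits) are approximations of the examples on the slide, not an exhaustive list from the Xilinx tools.

```python
from __future__ import annotations

import re

LEGAL_CHARS = re.compile(r"[A-Za-z0-9_-]+")
# Approximate patterns for device resource names (assumptions for illustration).
CLB_LOCATION = re.compile(r"[A-Z]{2}", re.IGNORECASE)  # e.g. AA, AB
IOB_PIN = re.compile(r"P[0-9]+", re.IGNORECASE)        # e.g. P1, P2


def name_problems(name: str) -> list[str]:
    """Return reasons why `name` is a poor net or instance name (empty if OK)."""
    problems = []
    if not LEGAL_CHARS.fullmatch(name):
        problems.append("contains characters other than A-Z, a-z, 0-9, _ and -")
    if name.isdigit():
        problems.append("must contain at least one non-digit")
    if CLB_LOCATION.fullmatch(name) or IOB_PIN.fullmatch(name):
        problems.append("looks like a device resource name")
    return problems
```

For example, `DATA_IN` passes, `123` fails the non-digit rule, and `P12` is flagged because it could be mistaken for a pin location.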
118 Component Naming Conventions
- Common component names, pin names and functions for all families
- Basic format is <function><width><control_inputs>
- CB4CLE: Counter, Binary, 4 bits, Clear, Load, Enable
- FD16RE: Flip-flops, D-type, 16 bits, Reset, Enable
- Control inputs are referenced by a single letter
- C: asynchronous Clear; R: synchronous Reset
- Listed in order of precedence
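Because the format is regular, a name can be split into its parts with one pattern. This Python sketch parses names such as CB4CLE; it only knows the control letters that appear on this slide (C, R, L, E) and reports any other letter as unknown. The helper names are hypothetical, and the control tuple keeps the name's letter order, which the slide says is the order of precedence.

```python
import re
from typing import NamedTuple, Tuple

# Control-input letters taken from this slide; other letters are not covered.
CONTROL_MEANING = {
    "C": "asynchronous Clear",
    "R": "synchronous Reset",
    "L": "Load",
    "E": "clock Enable",
}

NAME_FORMAT = re.compile(r"([A-Z]+)(\d+)([A-Z]*)")


class ComponentName(NamedTuple):
    function: str
    width: int
    controls: Tuple[str, ...]  # highest precedence first


def parse_component(name: str) -> ComponentName:
    """Split a name such as CB4CLE into function, width and control inputs."""
    m = NAME_FORMAT.fullmatch(name)
    if m is None:
        raise ValueError(f"{name!r} does not match <function><width><control_inputs>")
    func, width, ctrl = m.groups()
    controls = tuple(CONTROL_MEANING.get(c, f"unknown ({c})") for c in ctrl)
    return ComponentName(func, int(width), controls)
```

`parse_component("FD16RE")` yields function `FD`, width 16, and controls synchronous Reset followed by clock Enable.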
119 Use Hierarchy in Design
- Adds structure to the design
- Eases debug
- Users can build libraries of common functions
- Allows each design portion to be entered by the most efficient method
- Facilitates incremental design and floorplanning
- Supports team design
120 Notes
121 Section IV Advanced Software Design with Xilinx M1-Based Software: Library Types
122 Xilinx Libraries Overview
- Libraries contain descriptions of each component with pin names, functionality, timing, etc.
- There are two libraries:
- The Unified Library contains ready-made components with fixed function and size
- The LogiBLOX Library contains templates which can be customized for function and size
- Both libraries allow easy design migration across Xilinx devices and families
123 LogiBLOX Templates and GUI
- LogiBLOX is composed of two parts:
- LogiBLOX Library, containing templates of VARIABLE SIZE
- Templates are expanded or customized (counters, adders, registers, RAM, ROM)
- Templates have many implementations (e.g. Binary, Johnson, LFSR counters)
- LogiBLOX GUI and Synthesizer, to create:
- A design file for implementation
- A symbol for the schematic capture tool
- HDL code for instantiation in your design
- A functional simulation model
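The three counter implementations named above differ in the state sequence they step through. This Python sketch generates each sequence: a binary counter visits all 2^n values, a Johnson (twisted-ring) counter shifts in the inverted MSB and has 2n states, and a Fibonacci LFSR shifts in the XOR of tapped bits. The LFSR taps (bits 3 and 2, i.e. x^4 + x^3 + 1, a maximal-length choice for 4 bits) are an assumption; the taps LogiBLOX actually uses are not given here.

```python
from typing import List, Tuple


def binary_sequence(width: int) -> List[int]:
    """Binary counter: 0, 1, 2, ..., 2**width - 1."""
    return list(range(2 ** width))


def johnson_sequence(width: int) -> List[int]:
    """Johnson counter: shift left, feeding back the inverted MSB; 2*width states."""
    mask = (1 << width) - 1
    state, states = 0, []
    for _ in range(2 * width):
        states.append(state)
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) | (msb ^ 1)) & mask
    return states


def lfsr_sequence(width: int = 4, taps: Tuple[int, ...] = (3, 2), seed: int = 1) -> List[int]:
    """Fibonacci LFSR: shift left, new LSB = XOR of the tapped bits.

    The default taps (x^4 + x^3 + 1, an assumed choice) give 15 states, never 0.
    """
    mask = (1 << width) - 1
    start = state = seed & mask
    states = []
    for _ in range(1 << width):
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == start:
            return states
    raise ValueError("sequence did not return to the seed")
```

The trade-off the templates expose is visible here: the Johnson counter needs twice as many flip-flops per state as binary but decodes glitch-free, while the LFSR uses only XOR feedback but counts in a scrambled order.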
124 Generic LogiBLOX Functions
- One generic model per function type (e.g. counter)
- Attributes can be specified, e.g. bus width, load, clock enable, etc.
- Arithmetic: COUNTER, ADDER, SUBTRACTOR, ACCUMULATOR
- Storage: SHIFT, DATA_REG, PROM, SRAM, DRAM
- Logic: ANDBUS, ORBUS, MUXBUS, DECODE, TRISTATE, COMPARATOR
- I/O: INPUTS, OUTPUTS, BIDIR_IO
- DSP and other complex functions are also available through CORE Generator
125 LogiBLOX Module Selector
- Simple Combinatorial Logic
- Bus size from 2 to 32 bits
- Supports AND, Invert, NAND, NOR, OR, XNOR, XOR
- Any of the inputs or the output can be inverted independently
- Use Decode or MASK function
- Three-State Drivers
- Bus size from 2 to 32 bits
- Optional pull-up resistors
- Constants
- Allows signals to be tied high or low
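The "any input or output can be inverted independently" option means one base gate plus inversion masks covers the whole family. This Python sketch is a generic reference model under that reading, not the LogiBLOX implementation: `bus_gate` is a hypothetical name, NAND/NOR/XNOR are modeled as AND/OR/XOR with the output inverted, and the single-input Invert function is not modeled.

```python
import operator
from functools import reduce
from typing import Sequence

BASE_OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}
# NAND, NOR and XNOR are modeled as AND, OR, XOR with the output inverted.
INVERTED_OPS = {"NAND": "AND", "NOR": "OR", "XNOR": "XOR"}


def bus_gate(op: str, inputs: Sequence[int], invert_mask: int = 0,
             invert_output: bool = False) -> int:
    """Combine 2..32 single-bit inputs; bit i of invert_mask inverts inputs[i]."""
    if not 2 <= len(inputs) <= 32:
        raise ValueError("bus size must be 2 to 32 bits")
    if op in INVERTED_OPS:
        op, invert_output = INVERTED_OPS[op], not invert_output
    # XOR each input with its mask bit, keeping only bit 0.
    bits = [(b ^ (invert_mask >> i)) & 1 for i, b in enumerate(inputs)]
    result = reduce(BASE_OPS[op], bits)
    return result ^ 1 if invert_output else result
```

For example, `bus_gate("AND", [1, 0, 1], invert_mask=0b010)` returns 1 because the middle input is inverted before the AND.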
126 How to Use LogiBLOX in HDL Code
- If a LogiBLOX function is inferred, there is nothing more to do!
- Check with the synthesis vendor. Most synthesis tools infer simple LogiBLOX components automatically
- Example: synthesis tools will infer an adder for
- X <= A + B
- To instantiate a LogiBLOX function, or if the synthesis tool does not infer LogiBLOX automatically:
- Use the LogiBLOX GUI from the command line in stand-alone mode: lbgui -vendor
- Creates a LogiBLOX module for simulation
- Creates an entity or module declaration
127 Section IV Advanced Software Design with Xilinx M1-Based Software: FPGA Express for VHDL and Verilog Design
128 Section Agenda
- Overview
- Design Flow
- Instantiation Guidelines
- Coding Style Guidelines
129 Overview
- Xilinx leads in FPGAs: 55% market share
- Synopsys leads in VHDL/Verilog synthesis: 80% market share
- One result of a long-term technology partnership is FPGA Express
- Xilinx is the only silicon supplier with the right to distribute FPGA Express technology
- Integration into Foundation Series
130 Foundation Express 1.4 Features
- Express Technology
- Optimizes the design for Xilinx architectures
- Optimized arithmetic functions
- Automatic Global Signal Mapping
- Automatic I/O Pad Mapping
- Resource Sharing
- Hierarchy Control
- Source code compatible with Synopsys Design Compiler and FPGA Compiler
- Verilog (IEEE 1364) and VHDL (IEEE 1076-1987) support
- Easy, graphical constraint entry
- F1.4 is stand-alone
- F1.5 Sept / Oct 98
- Integrated into Foundation Project Manager
- Replaces Metamor
131 Xilinx-Express Design Flow
132 Express Input and Output
- Input files may be in VHDL or Verilog format
- Mixed Verilog/VHDL modules are accepted
- Schematics may also be used, but should not be input into Express
- Schematic files in XNF or EDIF format are merged into the design in Xilinx Design Manager
- Output netlists are in XNF format
- Timing specifications may be entered in Express
- Timing specifications are not used during synthesis
- Timing specifications can be included in the output netlist
Figure: Express inputs (VHDL, Verilog, timing requirements) and outputs (reports, .XNF netlist)
133 Analyze the Design (1)
- Analyze checks the HDL code for syntax errors
- Also creates internal files
- Files are automatically analyzed when selected for a project
- Do not select XNF or EDIF files
- They will be merged into the design by Design Manager
134 Analyze the Design (2)
- As the design blocks are analyzed, their status is displayed
- In this example, all blocks were analyzed successfully
Status icons: No Errors or Warnings, Out of Date, Warnings, Errors
135 Implement the Design
- Express implementation maps the HDL code to standard logic, creating a generic netlist
- At this stage, the design has not been optimized
- To implement a design, select only the top-level block, then select the Implement icon
136 Check for Errors and Warnings
- After implementation is complete, the chip symbol and its status are displayed
- View errors, warnings, and messages
- Right-click inside the window to save the information to a text file
137 Constraint Entry
- Constraints are NOT applied to synthesis
- Constraints are written to the output netlist (XNF) file for use by Design Manager (Xilinx Implementation Tools)
- Timing constraints control path delay
- Specify paths with timing groups, or groups of I/O or sequential elements
- The INPUT group includes all input ports at the top level of the design
- The OUTPUT group includes all output ports at the top level of the design
- All flip-flops clocked by the same edge of a common clock belong to a group
- To define constraints, select Synthesis -> Edit Constraints
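The grouping rule can be stated as a small function: top-level inputs form one group, top-level outputs another, and flip-flops are grouped by (clock net, active edge). This Python sketch is illustrative only; the `FlipFlop` record and the group keys such as `"CLK:rise"` are hypothetical names, not what Express displays.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FlipFlop:
    name: str
    clock: str
    rising: bool  # True if clocked on the rising edge


def timing_groups(inputs: List[str], outputs: List[str],
                  ffs: List[FlipFlop]) -> Dict[str, List[str]]:
    """INPUT and OUTPUT groups plus one group per (clock, edge) pair."""
    groups: Dict[str, List[str]] = {"INPUT": list(inputs), "OUTPUT": list(outputs)}
    for ff in ffs:
        edge = "rise" if ff.rising else "fall"
        groups.setdefault(f"{ff.clock}:{edge}", []).append(ff.name)
    return groups
```

Two flip-flops on the same clock but opposite edges land in different groups, so a path between them can be given its own constraint.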
138 Define Clock Period
- Enter Period, Rise, and Fall Time
- Select Clock entry -> Define
Synthesis -> Edit Constraints -> Clocks
Synthesis -> Edit Constraints -> Clocks -> Define
139 Define Global Synchronous Delays
- The clock period creates 3 types of global constraints with the same default value:
- (1) All input ports to sequential elements
- Setup of the flip-flop or latch is included
- (2) Sequential elements to all output ports
- Flip-flop clock-to-Q delay is included
- (3) Sequential element to sequential element
Synthesis -> Edit Constraints -> Paths form
140 Define Individual Synchronous Delays
- The default delay from the clock specification is used in the Paths form
- Individual, or path-specific, delays can be defined on the Ports form
- Port delays overwrite the global delays from the Paths form
- The input delay shown here arrives 20 ns before the rising edge of the clock.
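The precedence between the two forms can be summarized as a lookup: every path type starts with the clock period as its limit, and a port-specific entry, when present, replaces that default. This Python sketch models only that override rule; the path-type labels and function names are hypothetical, and it does not model how Express interprets input-delay values.

```python
from typing import Dict, Optional

# (1) inputs -> sequential elements, (2) sequential -> outputs, (3) sequential -> sequential
PATH_TYPES = ("input->seq", "seq->output", "seq->seq")


def global_constraints(period_ns: float) -> Dict[str, float]:
    """Each of the three global path types defaults to the clock period."""
    return {path: period_ns for path in PATH_TYPES}


def path_budget(period_ns: float, path: str, port: Optional[str] = None,
                port_delays: Optional[Dict[str, float]] = None) -> float:
    """Maximum delay for a path; a port-specific value overrides the global default."""
    if port is not None and port_delays and port in port_delays:
        return port_delays[port]
    return global_constraints(period_ns)[path]
```

For a 40 MHz clock the period is 1000 / 40 = 25 ns, so every path defaults to 25 ns unless a port such as a hypothetical `DIN` is given its own value on the Ports form.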
141 Define Key Port Features
- Global Buffer defines the type of clock distribution network
- Use BUFG for most applications (default)
- Resistance specifies the use of a pull-up or pull-down resistor on unused pads
- Reduces power consumption and noise
- Use IO Reg allows the use of sequential elements within IO blocks to minimize input or output delay (default)
- Dependent on device type
- Pad Location is used to specify the pin number of the IO pad
Synthesis -> Edit Constraints -> Ports
142 Control the Hierarchy
- Eliminate (default) or save hierarchical boundaries
- Flat designs yield the best results because more merging and sharing of Boolean logic occurs
- However, small blocks are easier to debug
- Easier to match the source HDL code to the synthesized design
- Synthesis goals (Speed or Area) and effort level can be defined for each module
143 Optimize the Design
- Optimization minimizes the design for speed or area
- Select the implementation, then select the Optimize icon
- After optimization, check for errors and warnings again
Main Window
144 View Results
- Select File -> Project Report to generate a report
- The report file contains:
- Files and libraries used
- Settings for synthesis
- Chip type and speed grade
- Estimated timing
- Warning: circuit timing estimates tend to be optimistic. Run timing analysis after routing for the most accurate timing.
Report.txt file
145 Verify Results (1)
- After optimization, open Synthesis -> Edit Constraints to verify that the correct constraints were specified
- Results are based on estimated routing delays
Synthesis -> Edit Constraints -> Paths (for an optimized design)
146 Verify Results (2)
- Review the size of the design
- Resource use is displayed for each hierarchical block
- Black box instantiations cannot be analyzed by Express
147 Export Netlist
- Create the output netlist for use with the Xilinx Design Manager (Xilinx Implementation Tools)
- Output file format is XNF
- Select the optimized design, then select Synthesis -> Export Netlist to create the file
- Enable Export Timing Specifications to include constraints in the output netlist
Synthesis -> Export Netlist
148 Simulation
- Not covered in this workshop
- Free VHDL / Verilog simulators:
- See http://www.xilinx.com/xup/express/express1.htm
- Active VHDL Simulator by Aldec (most recommended)
- VHDL Tools from RASSP
- Accolade Design Automation demo VHDL Simulator
- SimuCAD Silos III (recommended for Verilog)
- Wellspring Verilog Simulator
- Model Technology Inc. (MTI) and major CAD vendors sell other HDL simulators
149 Instantiation Guidelines
150 Instantiation and Hierarchy
- Hierarchy is created when one design is instantiated into another design
- All components in the Unified and LogiBLOX libraries may be instantiated
- Unified library components are described in the Libraries Guide
- LogiBLOX components are described in the LogiBLOX Reference/User Guide
- Cells that must be instantiated with Express synthesis:
- RAM/ROM, Readback, OSC
- BSCAN, WOR, WAND
- OAND (all IOB combinatorial logic)
151 Black Box Instantiation
- What is a black box? Any element not analyzed by Express. Examples:
- Existing design modules or elements (XNF, EDIF, .ngo)
- LogiBLOX components
- Pre-optimized netlists (PCI cores or LogiCOREs)
- Procedure for using a black box:
- Create a placeholder in the HDL code
- Synthesize the design without the XNF, EDIF, or NGO files
- The Xilinx Implementation Tools will resolve (link in) all black box references
- Limitations:
- Express cannot check timing constraints through a black box
- Express cannot include black box resources in its reports
- GSR nets are not automatically inferred within black boxes
- Instantiate STARTUP and explicitly connect GSR ports in HDL
152 LogiBLOX and CORE Generator Functions
- For HDL designs, LogiBLOX and CORE Generator generate:
- Behavioral VHDL or Verilog model, for simulation only
- VHDL/Verilog template, for component instantiation
- NGO file, for Xilinx implementation
- Most LogiBLOX functions can be inferred. Exceptions include READBACK and RAM blocks.
- Instantiation may provide better control of design implementation
153 How to Use LogiBLOX
1. Invoke LogiBLOX from Foundation
2. Select Setup
a. Specify VHDL or Verilog Template in the
Log