Title: EEL 5722 FPGA Design Fall 2003 Logic Cell Architectures
1EEL 5722 FPGA Design Fall 2003 Logic Cell Architectures
2General Structure of FPLDs
A typical FPLD consists of a number of logic
cells that are arranged as a matrix and used for
implementation of logic functions. Interconnect
resources connect logic cell outputs and inputs.
3General Structure of FPLDs
An FPLD logic cell can be as simple as a
transistor or as complex as a microprocessor.
Typically it is capable of implementing
combinational and sequential logic of different
complexity. Current commercial FPLDs employ
logic cells that are based on one or more of the
following:
Transistor pairs
Basic small gates such as two-input NAND and XOR gates
Multiplexers
Look-up tables (LUTs)
Wide-fanin AND-OR structures
4Logic Cell Granularity
Logic cells can have
Fine granularity: the cell consists of a few transistors and can implement only simple gates.
Coarse granularity: the cell consists of a large number of transistors, sufficient to implement large functions.
5Transistor Pairs
The FPGA from Crosspoint Solutions uses a single
transistor pair in the logic cell.
Since the transistors are connected together in
rows, gates can be isolated by turning off the
pairs of transistors between gates.
6Transistor Pairs
7The Plessey Cell
Plessey offers an FPGA in which the basic cell
consists of a two-input NAND gate and a latch. If the latch
is not needed, the configuration is set to make
the latch transparent.
8The Plessey Cell
For example, $f = ab + \overline{c}$ can be implemented with
two-input NAND gates.
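A minimal sketch of a NAND-only implementation, assuming the example function reads f = ab + c' (the complement on c is an assumption taken from the mux-based derivation of the same example). Gates are modeled as Python functions on 0/1 values; the names `nand`, `f_nand`, and `f_reference` are illustrative only.

```python
from itertools import product


def nand(x: int, y: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 1 - (x & y)


def f_nand(a: int, b: int, c: int) -> int:
    """f = ab + c' built only from two-input NAND gates (assumed example)."""
    n1 = nand(a, b)     # (ab)'
    return nand(n1, c)  # ((ab)' c)' = ab + c' by De Morgan


def f_reference(a: int, b: int, c: int) -> int:
    """Direct sum-of-products form of the assumed function."""
    return (a & b) | (1 - c)


# Exhaustively compare the NAND network with the reference over all 8 inputs.
nand_matches = all(
    f_nand(a, b, c) == f_reference(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(nand_matches)
```

Under this reading, two NAND cells are enough: one forms (ab)' and the second combines it with c.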
9Other Fine-Grain Cells
Algotronix uses a cell in which a two-input
function block can perform any function of two
inputs. This is implemented using a configurable
set of multiplexers. Concurrent Logic uses a
cell containing a two-input AND gate and a
two-input XOR gate. Toshiba offers an FPGA where
the cell contains two-input NAND gates.
10Fine-Grain Cells
The main advantage of using fine-grain cells is
that the useable cells are fully utilized. It is
easier to use small logic cells efficiently. The
main disadvantage is that they require a
relatively large number of wire segments and
programmable switches. Such routing resources can
be costly in area and delay.
11Mux-Based Cells
Actel's Act-1 FPGA family uses a cell containing
three 2-to-1 multiplexers, one OR gate, 8 inputs,
and one output.
The cell can implement all combinational
functions of two inputs, all functions of three
inputs with at least one positive input, many
functions of four inputs, and some ranging up to
eight inputs.
In total, 702 logic functions can be realized by
the Act-1 cell.
12Mux-Based Cells
This block implements the function
$f = \overline{(s_3 + s_4)}(\overline{s_1} w + s_1 x) + (s_3 + s_4)(\overline{s_2} y + s_2 z)$
13Mux-Based Cells
For example, the function $f = ab + \overline{c}$ can be
implemented as follows
14Mux-Based Cells
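$$
\begin{aligned}
f &= \overline{(s_3 + s_4)}(\overline{s_1} w + s_1 x) + (s_3 + s_4)(\overline{s_2} y + s_2 z) \\
  &= \overline{(c + 0)}(\overline{0} \cdot 1 + 0 \cdot 1) + (c + 0)(\overline{b} \cdot 0 + b \cdot a) \\
  &= \overline{c}(1 + 0) + c(0 + ba) \\
  &= \overline{c} + cab \\
  &= (\overline{c} + c)(\overline{c} + ab) \\
  &= 1 \cdot (\overline{c} + ab) \\
  &= ab + \overline{c}
\end{aligned}
$$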
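The substitution above can be checked with a small behavioral model. This is a sketch under stated assumptions: the cell is modeled as two 2-to-1 multiplexers feeding a third whose select is s3 OR s4, and the input assignment (w = 1, x = 1, s1 = 0, y = 0, z = a, s2 = b, s3 = c, s4 = 0) is read off the derivation. The function names are illustrative and are not Actel's.

```python
from itertools import product


def mux2(sel: int, d0: int, d1: int) -> int:
    """2-to-1 multiplexer: returns d0 when sel is 0, d1 when sel is 1."""
    return d1 if sel else d0


def act1_cell(w, x, y, z, s1, s2, s3, s4):
    """Behavioral model of the mux-based cell (three muxes and one OR gate)."""
    top = mux2(s1, w, x)               # s1' w + s1 x
    bottom = mux2(s2, y, z)            # s2' y + s2 z
    return mux2(s3 | s4, top, bottom)  # the OR gate drives the final select


def f_on_act1(a: int, b: int, c: int) -> int:
    """Input assignment taken from the derivation (assumed reading)."""
    return act1_cell(w=1, x=1, y=0, z=a, s1=0, s2=b, s3=c, s4=0)


# Compare against f = ab + c' for every input combination.
act1_matches = all(
    f_on_act1(a, b, c) == ((a & b) | (1 - c))
    for a, b, c in product((0, 1), repeat=3)
)
print(act1_matches)
```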
15Mux Cells
The logic cell from QuickLogic FPGAs is
similar to the Actel logic cell in that it uses
a 4-to-1 multiplexer.
16Look-Up Table (LUT) Cells
Xilinx cells are based on the use of SRAM as a
look-up table. The truth table for a K-input
logic function is stored in a $2^K \times 1$ SRAM.
The address lines of the SRAM function as inputs
and the output (data) line of the SRAM provides
the value of the logic function.
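A minimal sketch of the idea, modeling the SRAM as a Python list of 2^K one-bit entries. The helper names `make_lut` and `lut_read`, the bit ordering of the address (input i drives address bit i), and the majority function used as the example are all assumptions made for illustration.

```python
def make_lut(k, func):
    """Fill a 2**k x 1 memory with the truth table of a k-input function."""
    return [
        func(*(((addr >> i) & 1) for i in range(k))) & 1
        for addr in range(2 ** k)
    ]


def lut_read(sram, *inputs):
    """Drive the address lines with the inputs and return the stored bit."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return sram[addr]


# Example: a 3-input majority function stored in an 8 x 1 memory.
majority_sram = make_lut(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
print(majority_sram)
print(lut_read(majority_sram, 1, 0, 1))
```

Changing the logic function only changes the stored bits; the read path stays the same.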
17XC2000 CLB
18XC3000 CLB
19XC4000 CLB
20Altera Flex 10KE
21LUT Cells
22Logic Block Granularity and FPGA Density
For example, the function $f = abd + bcd + abc$
can be implemented with LUTs as follows
[Figure: 2-LUT, 3-LUT, and 4-LUT implementations]
23Logic Block Granularity and Configuration Bits
Since each K-LUT requires $2^K$ configuration bits:
The 2-LUT implementation requires $2^2 \times 7 = 28$ bits.
The 3-LUT implementation needs $2^3 \times 3 = 24$ bits.
The 4-LUT implementation needs just $2^4 \times 1 = 16$ bits.
Using configuration bits as the area measure (area cost),
the 4-LUT implementation achieves minimum logic
area.
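The arithmetic above can be reproduced in a few lines. This is a sketch: the LUT counts (7, 3, and 1) come from the slide, while the function and variable names are illustrative.

```python
def config_bits(k: int, num_luts: int) -> int:
    """Each K-LUT stores 2**K bits, so the total is 2**K times the LUT count."""
    return (2 ** k) * num_luts


# LUT size -> number of LUTs needed for the example function (from the slide).
implementations = {2: 7, 3: 3, 4: 1}

bits = {k: config_bits(k, n) for k, n in implementations.items()}
best_k = min(bits, key=bits.get)  # LUT size with the fewest configuration bits

print(bits)
print(best_k)
```

This counts logic configuration bits only; routing area is treated separately in the slides that follow.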
24Logic Block Granularity and Number of Logic Blocks
25Logic Block Granularity and Routing Area Per Block
26Logic Block Granularity and Average Normalized Area
By multiplying the block area curve and the
routing area per block curve, we obtain the
average normalized area.
27Logic Block Granularity and Delay
For example, the function $f = abd + abc + acd$
can be implemented using only two-input NAND
gates.
The longest path requires 4 logic
levels. Assuming a 1.2 µm CMOS process, a 2-input
NAND gate has a delay of 0.7 ns. The critical
path has a delay of $4 \times 0.7 = 2.8$ ns.
28Logic Block Granularity and Delay
The same function $f = abd + abc + acd$ can also be
implemented using 3-LUTs.
The longest path requires 2 logic
levels. Assuming a 1.2 µm CMOS process, a 3-LUT
has a delay of 1.4 ns. The critical path has a
delay of $2 \times 1.4 = 2.8$ ns.
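The two critical-path figures can be compared with a short calculation. This is a sketch that counts logic delay only (levels times delay per level) and ignores routing; delays are kept in picoseconds as integers to avoid floating-point rounding, and the names are illustrative.

```python
def critical_path_ps(levels: int, delay_per_level_ps: int) -> int:
    """Logic-only critical path: number of levels times delay per level."""
    return levels * delay_per_level_ps


# Values from the slides: 0.7 ns per 2-input NAND, 1.4 ns per 3-LUT.
nand_path_ps = critical_path_ps(levels=4, delay_per_level_ps=700)
lut3_path_ps = critical_path_ps(levels=2, delay_per_level_ps=1400)

print(nand_path_ps, lut3_path_ps)
```

For this example, the halved level count of the 3-LUT mapping exactly offsets its doubled per-level delay.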
29Logic Block Granularity and Logic Levels
30Logic Block Granularity and Net Delay
Average fanout increases
Number of switches loading each wire increases
Wires increase in length
31Logic Block Granularity and Critical Path Delay
$T_s = RC$
32Logic Block Granularity and Switch Delay
$T_s = RC$
33Random Logic in FPGAs
Consider the mapping of an instance of random
logic on 3-input and 5-input LUTs.
[Figure: the logic mapped onto 3-input LUTs (LUT 1 to LUT 4) and onto 5-input LUTs (LUT 1 and LUT 2)]
By increasing the LUT size (increasing the number
of inputs), the number of LUTs used can be
reduced. LUTs are highly suitable for the
realization of random logic.
34Arithmetic Operations in FPGAs
However, consider the mapping of a 2-bit carry
adder on 3-input LUTs instead.
[Figure: the adder mapped onto 3-input LUTs (LUT 1 to LUT 4)]
Increasing the number of inputs in the LUTs will
not reduce the number of LUTs needed to realize
the adder. Without a corresponding change in
the number of outputs, the added circuitry caused
by the increase in LUT size is useless.
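A minimal sketch of one such mapping, assuming a ripple-carry structure in which each bit position uses one 3-input LUT for the sum and one for the carry. The assignment of LUT numbers to positions and all names are assumptions for illustration.

```python
from itertools import product


def make_lut3(func):
    """Store the truth table of a 3-input function in an 8-entry list."""
    return [func(addr & 1, (addr >> 1) & 1, (addr >> 2) & 1) for addr in range(8)]


def read_lut3(lut, a, b, c):
    """Single-output 3-LUT read: the inputs form the address."""
    return lut[a | (b << 1) | (c << 2)]


SUM_LUT = make_lut3(lambda a, b, cin: a ^ b ^ cin)
CARRY_LUT = make_lut3(lambda a, b, cin: (a & b) | (a & cin) | (b & cin))


def add2(a1, a0, b1, b0, cin=0):
    """2-bit adder built from four single-output 3-LUTs."""
    s0 = read_lut3(SUM_LUT, a0, b0, cin)    # LUT 1 (assumed numbering)
    c0 = read_lut3(CARRY_LUT, a0, b0, cin)  # LUT 2
    s1 = read_lut3(SUM_LUT, a1, b1, c0)     # LUT 3
    c1 = read_lut3(CARRY_LUT, a1, b1, c0)   # LUT 4
    return c1, s1, s0


# Check every input combination against ordinary integer addition.
adder_ok = all(
    (c1 << 2) | (s1 << 1) | s0 == (2 * a1 + a0) + (2 * b1 + b0) + cin
    for a1, a0, b1, b0, cin in product((0, 1), repeat=5)
    for c1, s1, s0 in [add2(a1, a0, b1, b0, cin)]
)
print(adder_ok)
```

In this mapping each bit position already needs two single-output LUTs, one for the sum and one for the carry.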