Title: VLSI Arithmetic Adders
1VLSI Arithmetic: Adders and Multipliers
- Prof. Vojin G. Oklobdzija
- University of California
- http://www.ece.ucdavis.edu/acsel
2Introduction
- Digital Computer Arithmetic belongs to Computer Architecture; however, it is also an aspect of logic design.
- The objective of Computer Arithmetic is to develop appropriate algorithms that use the available hardware in the most efficient way.
- Ultimately, speed, power and chip area are the most often used measures, making a strong link between the algorithms and the technology of implementation.
3Basic Operations
- Addition
- Multiplication
- Multiply-Add
- Division
- Evaluation of Functions
- Multi-Media
4Addition of Binary Numbers
5Addition of Binary Numbers
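Full Adder. The full adder is the fundamental building block of most arithmetic circuits.
The sum and carry outputs are described as:
$s_i = a_i \oplus b_i \oplus C_{in}$
$C_{out} = a_i b_i + a_i C_{in} + b_i C_{in}$
(Figure: Full Adder block with inputs $a_i$, $b_i$, $C_{in}$ and outputs $s_i$, $C_{out}$.)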
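A minimal sketch of the one-bit full adder in Python, assuming the standard sum (three-input XOR) and carry (majority) functions. The function name `full_adder` is illustrative and does not come from the slides.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one-bit inputs a, b and cin."""
    s = a ^ b ^ cin                          # sum is the XOR of all three inputs
    cout = (a & b) | (a & cin) | (b & cin)   # carry is the majority function
    return s, cout


# Exhaustive check against ordinary integer addition.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin
```

The exhaustive loop covers all eight input combinations, so it doubles as a truth-table check.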
6Addition of Binary Numbers
(Figure: carry cases annotated Propagate and Generate.)
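7Full-Adder Implementation
- Full-adder operation is defined by the equations:
$s_i = p_i \oplus C_{in}$, $C_{out} = g_i + p_i C_{in}$
with Carry-Propagate $p_i = a_i \oplus b_i$ and Carry-Generate $g_i = a_i b_i$.
A one-bit adder could be implemented as shown.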
8High-Speed Addition
A one-bit adder can be implemented more efficiently with a multiplexer, because the MUX is faster.
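One way to read the MUX-based one-bit adder is sketched below, assuming propagate is defined as the XOR of the inputs: when the bit propagates, the carry-out is the carry-in, otherwise it equals the generate signal. The name `full_adder_mux` is hypothetical.

```python
def full_adder_mux(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit adder whose carry-out is selected by a 2:1 multiplexer."""
    p = a ^ b                  # propagate: exactly one input is 1
    g = a & b                  # generate: both inputs are 1
    s = p ^ cin                # sum bit
    cout = cin if p else g     # MUX: pass the incoming carry when p = 1, else g
    return s, cout


# The MUX form must agree with integer addition for all inputs.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder_mux(a, b, cin)
            assert 2 * cout + s == a + b + cin
```

When p = 0 both inputs are equal, so the generate signal alone decides the carry and the incoming carry is never needed on that path.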
9The Ripple-Carry Adder
10The Ripple-Carry Adder
From Rabaey
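As a behavioural sketch of the ripple-carry scheme named above, the following Python function chains one-bit adders so that each stage waits for the carry of the previous one. The function name and the integer-based interface are assumptions made for illustration.

```python
def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit integers bit by bit; return (n-bit sum, carry_out)."""
    carry = cin
    total = 0
    for i in range(n):                 # bit 0 is the least significant bit
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        si = ai ^ bi ^ carry
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry ripples to the next stage
        total |= si << i
    return total, carry
```

The loop makes the serial dependency explicit: the carry into bit i is known only after bits 0 to i-1 have been processed, which is why the delay grows linearly with n.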
11Inversion Property
From Rabaey
12Minimize Critical Path by Reducing Inverting
Stages
From Rabaey
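The inversion property states that inverting all inputs of a full adder inverts both of its outputs, which is what allows inverters to be dropped from the carry path between stages. The check below is a small sketch of that property; the helper names are illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) of a one-bit full adder."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)


def inversion_property_holds() -> bool:
    """Check S(~a,~b,~c) == ~S(a,b,c) and Cout(~a,~b,~c) == ~Cout(a,b,c)."""
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cout = full_adder(a, b, c)
                s_inv, cout_inv = full_adder(a ^ 1, b ^ 1, c ^ 1)
                if s_inv != s ^ 1 or cout_inv != cout ^ 1:
                    return False
    return True
```

Both outputs are self-dual functions of the inputs, so alternate stages can operate on inverted signals without extra inverting stages.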
13Ripple Carry Adder
- Carry chain of an RCA implemented using a multiplexer from the standard-cell library
Critical Path
Oklobdzija, ISCAS 88
14Manchester Carry-Chain Realization of the Carry
Path
- Simple and very popular scheme for implementation
of carry signal path
15Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast 'Carry' Circuit", Proceedings of the IEE, Vol. 106, pt. B, p. 464, September 1959.
16Manchester Carry Chain (CMOS)
- Implement P with pass-transistors
- Implement G with pull-up, kill (delete) with pull-down
- Use dynamic logic to reduce the complexity and speed up
Kilburn, et al, IEE Proc, 1959.
17Pass-Transistor Realization in DPL
18Carry-Skip Adder
MacSorley, Proc. IRE, 1/61
Lehman, Burla, IRE Trans. on Comp., 12/61
19Carry-Skip Adder
Bypass
From Rabaey
20Carry-Skip Adder: N bits, k bits/group, r = N/k groups
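The grouping above can be illustrated with a behavioural sketch of a carry-skip adder with fixed groups of k bits. The code is an assumption-laden model rather than a circuit description: the function name is hypothetical, and the bypass multiplexer changes only timing in hardware, so here it simply selects between two logically equal values.

```python
def carry_skip_add(a: int, b: int, n: int, k: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit integers using r = n/k groups of k bits each."""
    assert n % k == 0, "n must be a multiple of the group size k"
    carry = cin
    total = 0
    for base in range(0, n, k):
        group_cin = carry
        group_p = 1                    # AND of all propagate bits in the group
        c = group_cin
        for i in range(base, base + k):
            ai = (a >> i) & 1
            bi = (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            total |= (p ^ c) << i
            c = g | (p & c)            # carry ripples inside the group
            group_p &= p
        # Bypass: if every bit propagates, the group carry-in skips the group.
        carry = group_cin if group_p else c
    return total, carry
```

For example, `carry_skip_add(0xFFFF, 1, 16, 4)` exercises the bypass path in the three upper groups, since all of their bits propagate.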
21Carry-Skip Adder
22Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
23Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
24Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
(Figure: carry chain with block sizes 6, 5, 5, 4, 4, 3, 3, 1, 1 bits, 32 bits in total; D = 9.)
Any-point-to-any-point delay is 9 D, as compared to 12 D for the CSKA.
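The 9 D versus 12 D comparison can be reproduced with a simplified delay model. Everything in this sketch is an assumption rather than the authors' model: each ripple step and each block skip costs one unit D, the blocks are ordered 1, 3, 4, 5, 6, 5, 4, 3, 1 (the slide preserves only the sizes, not their order), and the constant-size carry-skip adder uses eight groups of four bits.

```python
def worst_case_carry_delay(blocks: list[int]) -> int:
    """Longest carry path, in unit delays, under the assumed model.

    A carry generated at the lowest bit of block i ripples through the
    rest of that block, skips every block in between, and ripples to the
    top bit of block j.
    """
    worst = 0
    m = len(blocks)
    for i in range(m):
        worst = max(worst, blocks[i] - 1)          # path confined to one block
        for j in range(i + 1, m):
            ripple_out = blocks[i] - 1             # leave block i
            skips = j - i - 1                      # one unit per skipped block
            ripple_in = blocks[j] - 1              # reach the top of block j
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst


vba_blocks = [1, 3, 4, 5, 6, 5, 4, 3, 1]   # assumed ordering of the block sizes
cska_blocks = [4] * 8                       # assumed constant-size carry-skip adder

assert sum(vba_blocks) == 32 and sum(cska_blocks) == 32
print(worst_case_carry_delay(vba_blocks), worst_case_carry_delay(cska_blocks))
```

Under these assumptions the small blocks at both ends shorten the ripple segments of the longest paths, while the large blocks in the middle are only ever entered after few skips.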
25Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
26Delay Calculation for Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Delay model
27Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith 85
28Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Variable Block Lengths
- No closed-form solution for delay
- It is a dynamic programming problem
29Delay Comparison: Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
30Delay Comparison: Variable Block Adder
(Plot legend: VBA, CLA, VBA multi-level.)
31VLSI Arithmetic: Lecture 4
32Review
41Delay Comparison: Variable Block Adder
(Plot annotations: VBA, square-root dependency; CLA, log dependency; VBA multi-level.)
42Circuit Issues
- Adder speed cannot be estimated based on:
  - logic gates in the critical path
  - number of transistors in the path
  - logic levels in the path
- Estimating adder speed is much more complex, and many of the "fast" schemes may be misleading.
43Fan-Out Dependency
44Fan-In Dependency
This looks like Logical Effort (1985)
45Delay Comparison: Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
47Carry-Lookahead Adder (Weinberger and Smith, 1958)
ARITH-13: presenting the Achievement Award to Arnold Weinberger of IBM (who invented the CLA adder in 1958)
Ref.: A. Weinberger and J. L. Smith, "A Logic for High-Speed Addition", National Bureau of Standards, Circ. 591, p.3-12, 1958.
48CLA Definitions: One-bit Adder
49CLA Definitions: 4-bit Adder
50Carry-Lookahead Adder: 4 bits
(Figure labels: $G_j$, $P_j$.)
51Carry-Lookahead Adder
One gate delay D to calculate p, g
One D to calculate P and two for G
Three gate delays to calculate $C_{4(j+1)}$
Compare that to 8 D in RCA!
52Carry-Lookahead Adder (Weinberger and Smith)
Additional two gate delays
C16 will take a total of 5 D vs. 32 D for RCA!
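The two-level lookahead that produces C16 can be sketched behaviourally as below. This is a hedged illustration, not the original circuit: the function names are hypothetical, propagate is taken as the XOR of the inputs, and gate delays are not modelled. The first level forms group G and P for each 4-bit block; the second level applies the same lookahead equations to the group signals to obtain C4, C8, C12 and C16.

```python
def lookahead_carries(g: list[int], p: list[int], c0: int) -> list[int]:
    """Carries c1..c4 from four (g, p) pairs and carry-in c0, as flat two-level logic."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def group_pg(g: list[int], p: list[int]) -> tuple[int, int]:
    """Group generate G and group propagate P of one 4-bit block (index 0 = LSB)."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def cla16(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """16-bit carry-lookahead addition with 4-bit groups; returns (sum, C16)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    groups = [group_pg(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]) for j in range(4)]
    G = [gp[0] for gp in groups]
    P = [gp[1] for gp in groups]
    block_c = [cin] + lookahead_carries(G, P, cin)      # cin, C4, C8, C12, C16
    total = 0
    for j in range(4):
        # Carries inside block j, computed from the block's own carry-in.
        local = [block_c[j]] + lookahead_carries(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4], block_c[j])
        for i in range(4):
            total |= (p[4 * j + i] ^ local[i]) << (4 * j + i)
    return total, block_c[4]
```

Reusing `lookahead_carries` at both levels mirrors the observation that the group signals G and P obey the same equations as the bit-level g and p.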
53 32-bit Carry-Lookahead Adder
54Carry-Lookahead Adder (Weinberger and Smith, original derivation, 1958)
55Carry-Lookahead Adder (Weinberger and Smith, original derivation)
56Carry-Lookahead Adder (Weinberger and Smith): please notice the similarity with Parallel-Prefix Adders!
57Carry-Lookahead Adder (Weinberger and Smith): please notice the similarity with Parallel-Prefix Adders!
58Motorola CLA Implementation Example
- A. Naini, D. Bearden and W. Anderson, "A 4.5 ns 96-b CMOS Adder Design", Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
59Critical path in Motorola's 64-bit CLA
(Figure: critical path with timing annotations 4.8 ns, 1.05 ns, 1.7 ns, 3.75 ns, 2.7 ns, 2.0 ns, 2.35 ns.)
60Motorola's 64-bit CLA: Conventional PG Block
No better situation here!
Carry ripples locally: 5 transistors in the path.
Basically, this is MCC performance with Carry-Skip. One should not expect any better results than VBA.
61Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals $P_{i:0}$ are generated to speed up C3.
The critical path still resembles MCC.
62Motorola's 64-bit CLA
64Delay Optimized CLA
- B. Lee, V. G. Oklobdzija
- Journal of VLSI Signal Processing, Vol.3, No.4,
October 1991
65Delay Optimized CLA: Lee-Oklobdzija 91
(a.) Fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
66Two-Levels of Logic Implementation of the Carry
Block
67Two-Levels of Logic Implementation of the
Carry-Lookahead Block
68Three-Levels of Logic Implementation of the Carry
Block (restricted fan-in)
69Three-Levels of Logic Implementation of the Carry
Lookahead (restricted fan-in)
70Delay Optimized CLA: Lee-Oklobdzija 91
Delay: Three-level BCLA
Delay: Two-level BCLA
71Delay Optimized CLA: Lee-Oklobdzija 91
(a.) 2-level BCLA: D = 8.5 ns
(b.) 3-level BCLA: D = 8.9 ns
72Ling's Adder
- Huey Ling, "High-Speed Binary Adder", IBM Journal of Research and Development, Vol. 25, No. 3, 1981.
- Used in IBM 3033, IBM S370/168, Amdahl V6, HP etc.
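73Ling's Derivations
Define $p_i = a_i \oplus b_i$, $g_i = a_i b_i$, $t_i = a_i + b_i$ and $H_{i+1} = C_{i+1} + C_i$.
$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$.
ai  bi  pi  gi  ti
0   0   0   0   0
0   1   1   0   1
1   0   1   0   1
1   1   0   1   1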
74Ling's Derivations
(Derivation labels: "from", "and", "because", "fundamental expansion"; the equations they annotated are not preserved.)
Now we need to derive the Sum equation.
75Ling Adder
Ling's equations
Variation of CLA
Ling, IBM J. Res. Dev, 5/81
76Ling Adder
Ling's equation
Variation of CLA
Ling uses a different transfer function. Four such functions have the desired properties (Ling's is one of them);
see Doran, IEEE Trans. on Comp., Vol. 37, No. 9, Sept. 1988.
77Ling Adder
Conventional: fan-in of 5
Ling: fan-in of 4
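A bit-serial sketch of Ling's pseudo-carry may help make the recurrence concrete. It assumes the usual formulation, with pseudo-carry H(i+1) = C(i+1) + C(i), which gives H(i+1) = g(i) + t(i-1) H(i) and C(i) = t(i-1) H(i). The boundary values t(-1) = 1 and H(0) = cin, and the function name, are assumptions made for this illustration.

```python
def ling_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit integers using Ling pseudo-carries; return (sum, carry_out)."""
    t_prev = 1        # assumed t_{-1} = 1 so that c_0 = H_0 = cin
    h = cin           # H_0
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p, g, t = ai ^ bi, ai & bi, ai | bi
        c = t_prev & h            # true carry into bit i: c_i = t_{i-1} H_i
        total |= (p ^ c) << i     # sum bit s_i = p_i xor c_i
        h = g | c                 # H_{i+1} = g_i + t_{i-1} H_i
        t_prev = t
    return total, t_prev & h      # carry-out c_n = t_{n-1} H_n
```

The pseudo-carry drops one factor t from every product term of the conventional carry, which is the source of the reduced fan-in; the missing factor is restored when the true carry is formed for the sum.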
78Advantages of Ling's Adder
- Uniform loading in fan-in and fan-out
- H16 contains 8 terms as compared to G16, which contains 15.
- H16 can be implemented with one level of logic (in ECL), while G16 cannot.
- (Ling's adder takes full advantage of wired-OR, of special importance when ECL technology is used)
79VLSI Arithmetic: Lecture 5
80Review
87Advantages of Ling's Adder
- Uniform loading in fan-in and fan-out
- H16 contains 8 terms as compared to G16, which contains 15.
- H16 can be implemented with one level of logic (in ECL), while G16 cannot (with 8-way wire-OR).
- (Ling's adder takes full advantage of wired-OR, of special importance when ECL technology is used; his IBM limitation was a fan-in of 4 and a wire-OR of 8)
88Ling Weinberger Notes
89Ling Weinberger Notes
90Ling Weinberger Notes
91Advantage of Ling's Adder
- 32-bit adder used in IBM 3033, IBM S370/Model 168, Amdahl V6.
- Implements 32-bit addition in 3 levels of logic
- Implements 32-bit AGEN (B + Index + Disp) in 4 levels of logic (rather than 6)
- 5 levels of logic for 64-bit adder used in HP processor
92Implementation of Ling's Adder in CMOS (S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC 96)
93S. Naffziger, ISSCC 96
104Ling Adder Critical Path
105Ling Adder Circuits
106LCS4 Critical G Path
107LCS4 Logical Effort Delay
108Results
- 0.5 µm technology
- Speed: 0.930 ns
- Nominal process, 80 °C, V = 3.3 V
See S. Naffziger, "A Subnanosecond 64-b Adder", ISSCC 96
109Prefix Adders and Parallel Prefix Adders
110from Ercegovac-Lang
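111Prefix Adders
The following recurrence operation is defined:
$(g, p) \circ (g', p') = (g + p\,g',\; p\,p')$
such that
$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$
$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$
With a carry-in, $(g_{-1}, p_{-1}) = (c_{in}, c_{in})$, so that $c_1 = g_0 + p_0 c_{in}$.
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and adjacent).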
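The prefix operator can be written directly as a small function and its algebraic properties checked by brute force. This is a sketch under stated assumptions: the names `op` and `prefix_carries` are illustrative, and the left operand is taken to cover the more significant bits.

```python
from itertools import product


def op(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """(g, p) o (g', p') = (g + p g', p p'); `left` spans the more significant bits."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def prefix_carries(a: int, b: int, n: int, cin: int = 0) -> list[int]:
    """Serial evaluation of the recurrence; returns [c_1, ..., c_n]."""
    acc = (cin, cin)                       # (g_-1, p_-1) = (cin, cin)
    carries = []
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        acc = op((ai & bi, ai ^ bi), acc)  # (G_i, P_i) = (g_i, p_i) o (G_{i-1}, P_{i-1})
        carries.append(acc[0])             # c_{i+1} = G_i
    return carries


pairs = list(product((0, 1), repeat=2))
# Associative: grouping does not matter, which is what permits tree evaluation.
assert all(op(op(x, y), z) == op(x, op(y, z)) for x in pairs for y in pairs for z in pairs)
# Not commutative: operand order matters.
assert op((0, 0), (1, 1)) != op((1, 1), (0, 0))
# Idempotent: combining a span with itself changes nothing.
assert all(op(x, x) == x for x in pairs)
```

Associativity is the property that lets the serial loop in `prefix_carries` be replaced by any balanced tree of `op` nodes, which is the starting point for the parallel-prefix structures.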
112from Ercegovac-Lang
113Parallel Prefix Adders: variety of possibilities
from Ercegovac-Lang
114Pyramid Adder
M. Lehman, "A Comparative Study of Propagation Speed-up Circuits in Binary Arithmetic Units", IFIP Congress, Munich, Germany, 1962.
115Parallel Prefix Adders: variety of possibilities
from Ercegovac-Lang
116Parallel Prefix Adders: variety of possibilities
from Ercegovac-Lang
117Hybrid BK-KS Adder
118Parallel Prefix Adders: S. Knowles 1999
operation is associative: h > i ≥ j ≥ k
operation is idempotent: h > i ≥ j ≥ k
produces carry (cin = 0)
119Parallel Prefix Adders: Ladner-Fischer
Exploits associativity, but not idempotency. Produces minimal logical depth.
120Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1)
Two wires at each level. Uniform fan-in of two. Large fan-out (of 16 = n/2). Large capacitive loading combined with long wires (in the last stages).
121Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1. Dramatic increase in wires. The wire span remains the same as in Ladner-Fischer. Buffers are needed in both cases (K-S and L-F).
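A behavioural sketch of the Kogge-Stone prefix network is given below. It is an illustration under assumptions, not a description of any specific circuit: the function name is hypothetical, the carry-in is folded into the generate signal of bit 0, and wiring, buffering and fan-out are not modelled. At each level every node combines its own (G, P) with that of the node d positions below, and d doubles from level to level.

```python
def kogge_stone_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit integers with a Kogge-Stone prefix network; return (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)          # fold the carry-in into bit 0
    d = 1
    while d < n:                         # ceil(log2(n)) prefix levels
        new_G, new_P = G[:], P[:]
        for i in range(d, n):            # each node reads exactly one lower node
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2
    carries = [cin] + G                  # carries[i] is the carry into bit i
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, G[n - 1]
```

Every level touches almost all n bit positions, which reflects the wiring cost noted above, while each node's result is consumed by a bounded number of nodes at the next level.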
122Kogge-Stone Adder
123Parallel Prefix Adders: Brent-Kung
- Set the fan-out to one
- Avoids explosion of wires (as in K-S)
- Makes no sense in CMOS
- fan-out = 1 limit is arbitrary and extreme
- much of the capacitive load is due to wire (anyway)
- It is more efficient to insert buffers in L-F than to use B-K scheme
124Brent-Kung Adder
125Parallel Prefix Adders: Han-Carlson
- Is a hybrid synthesis of L-F and K-S
- Trades increase in logic depth for a reduction in fan-out
- Effectively a higher-radix variant of K-S.
- Others do it similarly by serializing the prefix computation at the higher fan-out nodes.
- Others similarly trade the logical depth for reduction of fan-out and wire.
126Parallel Prefix Adders: variety of possibilities
from Knowles
bounded by L-F and K-S at ends
127Parallel Prefix Adders: variety of possibilities (Knowles 1999)
- The following rules are used:
- Lateral wires at the jth level span $2^j$ bits
- Lateral fan-out at the jth level is a power of 2 up to $2^j$
- Lateral fan-out at the jth level cannot exceed that at the (j+1)th level.
128Parallel Prefix Adders: variety of possibilities (Knowles 1999)
- The number of minimal-depth graphs of this type is given in a table on the original slide
- At 4 bits there are only K-S and L-F; afterwards there are several new possibilities.
129Parallel Prefix Adders: variety of possibilities (Knowles 1999)
- Example of a new 32-bit adder [4,4,2,2,1]
130Parallel Prefix Adders: variety of possibilities (Knowles 1999)
- Example of a new 32-bit adder [4,4,2,2,1]
131Parallel Prefix Adders: variety of possibilities (Knowles 1999)
- Delay is given in terms of FO4 inverter delay, worst case
- (nominal case is 40-50% faster)
- K-S is the fastest
- K-S adders are wire limited (requiring 80% more area)
- The difference is less than 15% between examined schemes
132Parallel Prefix Adders: variety of possibilities (Knowles 1999)
- Conclusion
- Irregular, hybrid schemes are possible
- The speed-up of 15% is achieved at the cost of large wiring, hence area and power
- Circuits close in speed to K-S are available at significantly lower wiring cost
133VLSI Arithmetic: Lecture 6
134Review
151Two Parallel Prefix Adder Structures
Kogge-Stone
- log2(bits) carry stages
- Extra wiring
Han-Carlson
- log2(bits) + 1 carry stages
- Reduced wiring and gates
160Possibilities for Further Research
- The logical depth is important (Knowles was right)
- The fan-out is less important than fan-in (Knowles was wrong)
- It is possible to examine a variety of topologies with restricted and varied fan-in.
- Driving strength and Logical Effort rules were overlooked, or at least neglected
- It is possible to create a number of topologies taking LE rules into account.
- It is further possible to combine the rules with compound domino implementation, taking advantage of the two different rules governing dynamic and static logic.
- It is still possible to produce a better adder!
161Other Types of Adders
162Conditional Sum Adder
- J. Sklansky, "Conditional-Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.
163Conditional Sum Adder
from Ercegovac-Lang
164Conditional Sum Adder
165Conditional Sum Adder
from Ercegovac-Lang
166Conditional Sum Adder
from Ercegovac-Lang
167Conditional Sum Adder
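The conditional-sum idea can be sketched recursively: every segment computes its sum and carry-out for both possible carry-ins, and each merge step selects the upper half's result using the lower half's carry. The code below is an illustrative model only; the function names and the list-of-bits representation (least significant bit first) are assumptions.

```python
def conditional_sum(a_bits: list[int], b_bits: list[int]):
    """Return ((sum_bits, cout) for cin = 0, (sum_bits, cout) for cin = 1)."""
    n = len(a_bits)
    if n == 1:
        a, b = a_bits[0], b_bits[0]
        return ([a ^ b], a & b), ([a ^ b ^ 1], a | b)
    half = n // 2
    lo0, lo1 = conditional_sum(a_bits[:half], b_bits[:half])
    hi0, hi1 = conditional_sum(a_bits[half:], b_bits[half:])

    def combine(lo):
        lo_sum, lo_carry = lo
        hi_sum, hi_carry = hi1 if lo_carry else hi0   # selection by the lower carry
        return lo_sum + hi_sum, hi_carry

    return combine(lo0), combine(lo1)


def conditional_sum_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit integers; return (n-bit sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    res0, res1 = conditional_sum(a_bits, b_bits)
    bits, cout = res1 if cin else res0
    return sum(bit << i for i, bit in enumerate(bits)), cout
```

The recursion depth is logarithmic in n because the two halves are independent until the final selection, at the cost of computing every segment twice.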
168Carry-Select Adder
- O. J. Bedrij, "Carry-Select Adder", IRE Transactions on Electronic Computers, June 1962, p.340-34
169Carry-Select Sum Adder
from Ercegovac-Lang
170Carry-Select Adder
- Addition under the assumption of Cin = 0 and Cin = 1.
171Carry-Select Adder: combining two 32-b VBAs in select mode
Delay = $D_{VBA32} + D_{MUX}$
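The select mode can be sketched behaviourally as follows. This is a hedged illustration: the 64-bit width split into two 32-bit halves, the function names, and the use of Python integer addition as a stand-in for each 32-bit block adder are all assumptions. The upper half is added twice, once for each possible carry-in, and the carry out of the lower half drives the selecting multiplexer.

```python
def add_block(a: int, b: int, width: int, cin: int) -> tuple[int, int]:
    """Stand-in for one block adder; returns (width-bit sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, n: int = 64, half: int = 32, cin: int = 0) -> tuple[int, int]:
    """n-bit carry-select addition with the upper part precomputed for both carries."""
    mask = (1 << half) - 1
    lo_sum, lo_cout = add_block(a & mask, b & mask, half, cin)
    a_hi, b_hi = a >> half, b >> half
    hi_if_0 = add_block(a_hi, b_hi, n - half, 0)    # assumes carry-in 0
    hi_if_1 = add_block(a_hi, b_hi, n - half, 1)    # assumes carry-in 1
    hi_sum, cout = hi_if_1 if lo_cout else hi_if_0  # final MUX
    return (hi_sum << half) | lo_sum, cout
```

Because both upper results are ready by the time the lower carry arrives, the total delay is that of one block adder plus the multiplexer, in line with the delay expression above.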
172Carry-Select Adder
O.J. Bedrij, IBM Poughkeepsie, 1962