VLSI Arithmetic Adders

About This Presentation

Title:

VLSI Arithmetic Adders

Description:

VLSI Arithmetic Adders & Multipliers Prof. Vojin G. Oklobdzija University of California http://www.ece.ucdavis.edu/acsel Introduction Digital Computer Arithmetic ... – PowerPoint PPT presentation

Slides: 173

Provided by: Prof875

Learn more at: https://www.ece.ucdavis.edu

Transcript and Presenter's Notes

Title: VLSI Arithmetic Adders

1
VLSI Arithmetic: Adders & Multipliers

Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

2
Introduction

Digital Computer Arithmetic belongs to Computer
Architecture; however, it is also an aspect of
logic design.
The objective of Computer Arithmetic is to
develop algorithms that use the available
hardware in the most efficient way.
Speed, power and chip area are the most often
used measures, which creates a strong link
between the algorithms and the technology of
implementation.

3
Basic Operations

Addition
Multiplication
Multiply-Add
Division
Evaluation of Functions
Multi-Media

4
Addition of Binary Numbers
5
Addition of Binary Numbers
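Full Adder. The full adder is the fundamental
building block of most arithmetic circuits.
The sum and carry outputs are described as
$s_i = a_i \oplus b_i \oplus C_{in}$
$C_{out} = a_i b_i + a_i C_{in} + b_i C_{in}$
[Figure: Full Adder block with inputs ai, bi, Cin and outputs si, Cout]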
6
Addition of Binary Numbers
[Figure labels: Propagate, Generate]
7
Full-Adder Implementation

Full-adder operation is defined by the equations
$s_i = p_i \oplus c_i$, $c_{i+1} = g_i + p_i c_i$
Carry-Propagate $p_i = a_i \oplus b_i$ and Carry-Generate $g_i = a_i b_i$
A one-bit adder could be implemented as shown
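A minimal sketch of a one-bit adder written with propagate and generate signals, assuming the usual definitions p = a XOR b and g = a AND b. The function names are illustrative and do not come from the slides; the check simply compares every input combination with ordinary integer addition.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """One-bit full adder written with propagate/generate signals."""
    p = a ^ b                 # carry-propagate: exactly one input is 1
    g = a & b                 # carry-generate: both inputs are 1
    s = p ^ c_in              # sum bit
    c_out = g | (p & c_in)    # carry out
    return s, c_out


def matches_integer_addition() -> bool:
    """Compare all 8 input combinations with a + b + c_in."""
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                s, c_out = full_adder(a, b, c_in)
                if 2 * c_out + s != a + b + c_in:
                    return False
    return True


print(matches_integer_addition())  # prints True
```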
8
High-Speed Addition
A one-bit adder could be implemented more
efficiently with a multiplexer, because a MUX is faster
9
The Ripple-Carry Adder
10
The Ripple-Carry Adder
From Rabaey
11
Inversion Property
From Rabaey
12
Minimize Critical Path by Reducing Inverting
Stages
From Rabaey
13
Ripple Carry Adder

Carry-Chain of an RCA implemented using
multiplexer from the standard cell library

Critical Path
Oklobdzija, ISCAS '88
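The multiplexer-based carry chain can be sketched behaviorally as follows. This is an illustration under stated assumptions, not the standard-cell circuit from the slide: `mux` and `ripple_carry_add` are hypothetical names, and the mux selects the incoming carry when the bit propagates and one of the operand bits otherwise (when p = 0 both operand bits are equal, so either one is the carry out).

```python
def mux(select: int, when_one: int, when_zero: int) -> int:
    """2-to-1 multiplexer."""
    return when_one if select else when_zero


def ripple_carry_add(a: int, b: int, n: int = 8, c_in: int = 0) -> tuple[int, int]:
    """n-bit ripple-carry adder whose carry chain is a row of multiplexers."""
    carry = c_in
    total = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        p = ai ^ bi
        total |= (p ^ carry) << i
        # p = 1: pass the incoming carry; p = 0: ai == bi, so the carry out is ai
        carry = mux(p, carry, ai)
    return total, carry


print(ripple_carry_add(200, 100))  # prints (44, 1), i.e. 300 = 256 + 44
```

The critical path is the chain of n multiplexers, one per bit, which is why the delay grows linearly with the word length.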
14
Manchester Carry-Chain Realization of the Carry
Path

Simple and very popular scheme for implementation
of carry signal path

15
Original Design
T. Kilburn, D. B. G. Edwards, D. Aspinall,
"Parallel Addition in Digital Computers: A New
Fast 'Carry' Circuit", Proceedings of IEE, Vol.
106, pt. B, p. 464, September 1959.
16
Manchester Carry Chain (CMOS)

Implement P with pass-transistors
Implement G with a pull-up and kill (delete) with
a pull-down
Use dynamic logic to reduce complexity and
increase speed

Kilburn, et al, IEE Proc, 1959.
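The generate / kill / propagate behavior of the chain can be modeled in a few lines. This is a purely behavioral sketch (no precharge, no transistor sizing, no signal polarity), and `manchester_carries` is an illustrative name rather than anything from the original design.

```python
def manchester_carries(a: int, b: int, n: int, c_in: int = 0) -> list[int]:
    """Behavioral model: return [c_0, c_1, ..., c_n] along the carry chain."""
    carries = [c_in]
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:            # generate: the node is forced to 1
            nxt = 1
        elif not (ai | bi):    # kill (delete): the node is forced to 0
            nxt = 0
        else:                  # propagate: the pass transistor connects c_i to c_{i+1}
            nxt = carries[-1]
        carries.append(nxt)
    return carries


print(manchester_carries(0b0111, 0b0001, 4))  # prints [0, 1, 1, 1, 0]
```

In the example, bit 0 generates a carry, bits 1 and 2 pass it along, and bit 3 kills it.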
17
Pass-Transistor Realization in DPL
18
Carry-Skip Adder
MacSorley, Proc. IRE, 1/61; Lehman, Burla, IRE
Trans. on Comp., 12/61
19
Carry-Skip Adder
Bypass
From Rabaey
20
Carry-Skip Adder: N bits, k bits/group, r = N/k
groups
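A functional sketch of the grouping, assuming fixed-size groups and a bypass that forwards the group carry-in whenever every bit in the group propagates. `carry_skip_add` and its third return value (the number of groups that were bypassed) are illustrative choices, not part of the slides. The bypass does not change the result; in hardware it only shortens the path the carry has to travel.

```python
def carry_skip_add(a: int, b: int, n: int = 16, k: int = 4, c_in: int = 0):
    """N-bit carry-skip adder built from r = N/k groups of k bits each."""
    if n % k:
        raise ValueError("n must be a multiple of k")
    total = 0
    carry = c_in
    skipped_groups = 0
    for base in range(0, n, k):
        group_in = carry
        group_p = 1
        c = group_in
        for i in range(base, base + k):
            ai = (a >> i) & 1
            bi = (b >> i) & 1
            p = ai ^ bi
            total |= (p ^ c) << i
            c = (ai & bi) | (p & c)    # ripple inside the group
            group_p &= p
        if group_p:                    # every bit propagates: bypass the group
            carry = group_in
            skipped_groups += 1
        else:
            carry = c
    return total, carry, skipped_groups


print(carry_skip_add(0x0FFF, 0x0001))  # prints (4096, 0, 2)
```

In the example the carry is generated in the lowest group, bypasses the two middle groups, and is absorbed in the top group.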
21
Carry-Skip Adder
k
22
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
23
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
24
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
[Figure: carry chain with block annotations 6, 5, 5, 4, 4, 3, 3, 1, 1 and D = 9]
Any-point-to-any-point delay is 9 D, as compared
to 12 D for the CSKA
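The 9 D versus 12 D comparison can be reproduced with a very simple delay model. Everything in this sketch is an assumption made for illustration: one unit delay D per bit rippled inside a block, one unit delay per block skipped, the block sizes 1, 3, 4, 5, 6, 5, 4, 3, 1 (one reading of the figure annotations, which sum to 32), and 4-bit groups for the fixed-size carry-skip adder. It is not the delay model of the original paper.

```python
def worst_case_carry_delay(blocks: list[int]) -> int:
    """Longest carry path, in unit delays, under the simple model above."""
    worst = 0
    for i, size_i in enumerate(blocks):
        # carry generated in the lowest bit of block i and absorbed in its top bit
        worst = max(worst, size_i - 1)
        for j in range(i + 1, len(blocks)):
            ripple_out = size_i - 1       # ripple to the end of the source block
            skips = j - i - 1             # blocks bypassed in between
            ripple_in = blocks[j] - 1     # ripple to the top bit of the target block
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst


uniform = [4] * 8                           # assumed fixed-group CSKA
variable = [1, 3, 4, 5, 6, 5, 4, 3, 1]      # assumed variable block sizes
assert sum(uniform) == sum(variable) == 32

print(worst_case_carry_delay(uniform))   # prints 12
print(worst_case_carry_delay(variable))  # prints 9
```

Small blocks at both ends keep the ripple portions short for the paths that skip the most blocks, while the large blocks in the middle are only reached by paths with few skips.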
25
Carry-chain block size determination for a 32-bit
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
26
Delay Calculation for Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
Delay model
27
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith '85
28
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
Variable Block Lengths

No closed-form solution for the delay
It is a dynamic programming problem

29
Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
30
Delay Comparison: Variable Block Adder
[Figure: delay curves labeled VBA, CLA, and VBA Multi-Level]
31
VLSI Arithmetic: Lecture 4

Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

32
Review

Lecture 3

33
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
34
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
35
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
[Figure: carry chain with block annotations 6, 5, 5, 4, 4, 3, 3, 1, 1 and D = 9]
Any-point-to-any-point delay is 9 D, as compared
to 12 D for the CSKA
36
Carry-chain block size determination for a 32-bit
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
37
Delay Calculation for Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
Delay model
38
Variable Block Adder (Oklobdzija, Barnes, IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith '85
39
Carry-chain of a 32-bit Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
Variable Block Lengths

No closed-form solution for the delay
It is a dynamic programming problem

40
Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
41
Delay Comparison: Variable Block Adder
[Figure: delay curves labeled Square Root Dependency (VBA), Log Dependency (CLA), and VBA Multi-Level]
42
Circuit Issues

Adder speed cannot be estimated from
the number of logic gates in the critical path
the number of transistors in the path
the number of logic levels in the path
Estimating adder speed is much more complex, and
many of the supposedly fast schemes can be misleading.

43
Fan-Out Dependency
44
Fan-In Dependency
This looks like Logical Effort (1985)
45
Delay Comparison: Variable Block Adder
(Oklobdzija, Barnes, IBM 1985)
46
(No Transcript)
47
Carry-Lookahead Adder (Weinberger and Smith, 1958)
ARITH-13: Presenting Achievement Award to Arnold
Weinberger of IBM (who invented the CLA adder in 1958)
Ref: A. Weinberger and J. L. Smith, "A Logic for
High-Speed Addition", National Bureau of
Standards, Circ. 591, pp. 3-12, 1958.
48
CLA Definitions: One-bit adder

49
CLA Definitions: 4-bit Adder
50
Carry-Lookahead Adder: 4 bits
[Figure: 4-bit CLA group producing Gj and Pj]
51
Carry-Lookahead Adder
One gate delay D to calculate p, g
One D to calculate P and two D to calculate G
Three gate delays to calculate $C_{4(j+1)}$
Compare that to 8 D in the RCA!
52
Carry-Lookahead Adder (Weinberger and Smith)

Additional two gate delays
C16 will take a total of 5 D vs. 32 D for the RCA!
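A two-level lookahead for 16 bits can be sketched as below. The helper names `lookahead` and `cla16` are hypothetical, and the sketch only models the logic: each carry is formed as a flat AND-OR expression of the g, p signals and the incoming carry, first inside each 4-bit group and then across the four groups using the group signals G_j and P_j. Gate delays are not modeled.

```python
def lookahead(g: list[int], p: list[int], c0: int) -> list[int]:
    """Return [c0, c1, ..., ck]; every carry is a flat AND-OR of g, p and c0."""
    carries = [c0]
    for k in range(1, len(g) + 1):
        terms = []
        for j in range(k):               # term that starts at generate bit j
            term = g[j]
            for m in range(j + 1, k):
                term &= p[m]
            terms.append(term)
        cin_term = c0                    # term that carries c0 through all k bits
        for m in range(k):
            cin_term &= p[m]
        carries.append(int(any(terms) or cin_term))
    return carries


def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit adder: 4-bit lookahead groups plus a second-level lookahead."""
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    G, P = [], []
    for j in range(4):
        gj, pj = g[4 * j:4 * j + 4], p[4 * j:4 * j + 4]
        G.append(lookahead(gj, pj, 0)[4])        # group generate G_j
        P.append(pj[0] & pj[1] & pj[2] & pj[3])  # group propagate P_j
    C = lookahead(G, P, c0)                      # C[j] is the carry into group j
    total = 0
    for j in range(4):
        local = lookahead(g[4 * j:4 * j + 4], p[4 * j:4 * j + 4], C[j])
        for i in range(4):
            total |= (p[4 * j + i] ^ local[i]) << (4 * j + i)
    return total, C[4]


print(cla16(0xFFFF, 0x0001))  # prints (0, 1)
```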
53
32-bit Carry Lookahead Adder
54
Carry-Lookahead Adder (Weinberger and Smith,
original derivation, 1958)
55
Carry-Lookahead Adder (Weinberger and Smith,
original derivation)
56
Carry-Lookahead Adder (Weinberger and Smith):
please notice the similarity with
Parallel-Prefix Adders!
57
Carry-Lookahead Adder (Weinberger and Smith):
please notice the similarity with
Parallel-Prefix Adders!
58
Motorola CLA Implementation Example

A. Naini, D. Bearden and W. Anderson, "A 4.5 ns
96b CMOS Adder Design",
Proceedings of the IEEE Custom Integrated
Circuits Conference, May 3-6, 1992.

59
Critical path in Motorola's 64-bit CLA
[Figure: critical path annotated with delays 4.8 ns, 1.05 ns, 1.7 ns, 3.75 ns, 2.7 ns, 2.0 ns, 2.35 ns]
60
Motorola's 64-bit CLA: conventional PG Block
The situation is no better here!
The carry ripples locally, with 5 transistors in the path
Basically, this is MCC performance with
Carry-Skip. One should not expect any better
results than with the VBA.
61
Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals Pi:0 are
generated to speed up C3
The critical path still resembles the MCC
62
Motorola's 64-bit CLA
63
(No Transcript)
64
Delay Optimized CLA

B. Lee, V. G. Oklobdzija
Journal of VLSI Signal Processing, Vol.3, No.4,
October 1991

65
Delay Optimized CLA: Lee-Oklobdzija '91
(a.) Fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
66
Two-Levels of Logic Implementation of the Carry
Block
67
Two-Levels of Logic Implementation of the
Carry-Lookahead Block
68
Three-Levels of Logic Implementation of the Carry
Block (restricted fan-in)
69
Three-Levels of Logic Implementation of the Carry
Lookahead (restricted fan-in)
70
Delay Optimized CLA: Lee-Oklobdzija '91
Delay: Three-level BCLA
Delay: Two-level BCLA
71
Delay Optimized CLA: Lee-Oklobdzija '91
(a.) 2-level BCLA: D = 8.5 ns
(b.) 3-level BCLA: D = 8.9 ns
72
Ling's Adder

Huey Ling, "High-Speed Binary Adder",
IBM Journal of Research and Development, Vol. 25,
No. 3, 1981.
Used in IBM 3033, IBM 168, Amdahl V6, HP etc.

73
Ling's Derivations
define
$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$
ai  bi  pi  gi  ti
0   0   0   0   0
0   1   1   0   1
1   0   1   0   1
1   1   0   1   1
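The relations behind the derivation can be checked exhaustively on a small word. This sketch assumes the commonly used definition of Ling's pseudo-carry, H_{i+1} = C_{i+1} + C_i (the defining equation did not survive in the transcript), together with g = a AND b and t = a OR b as in the truth table. The function names are illustrative.

```python
def ling_signals(a: int, b: int, n: int, c_in: int = 0):
    """Return g, t, the true carries C and the pseudo-carries H for n bits."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    C = [c_in]
    for i in range(n):
        C.append(g[i] | (t[i] & C[i]))
    # assumed definition: H[i+1] = C[i+1] OR C[i]; H[0] is not used
    H = [None] + [C[i + 1] | C[i] for i in range(n)]
    return g, t, C, H


def ling_identities_hold(n: int = 4) -> bool:
    """Check the identities for every pair of n-bit operands and both carry-ins."""
    for a in range(1 << n):
        for b in range(1 << n):
            for c_in in (0, 1):
                g, t, C, H = ling_signals(a, b, n, c_in)
                for i in range(n):
                    if g[i] != (g[i] & H[i + 1]):        # g_i = g_i H_{i+1}
                        return False
                    if C[i + 1] != (t[i] & H[i + 1]):    # C_{i+1} = t_i H_{i+1}
                        return False
                    if i >= 1 and H[i + 1] != (g[i] | (t[i - 1] & H[i])):
                        return False                     # H_{i+1} = g_i + t_{i-1} H_i
    return True


print(ling_identities_hold())  # prints True
```

The last identity is the point of the scheme: the recurrence for H has the propagate term shifted down by one bit, so its expansion is simpler than that of the true carry, and the true carry is recovered with a single AND with t_i.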
74
Ling's Derivations
From
and
because
fundamental expansion
Now we need to derive the sum equation
75
Ling Adder
Ling's equations
Variation of CLA
Ling, IBM J. Res. Dev, 5/81
76
Ling Adder
Ling's equation
Variation of CLA
Ling uses a different transfer function. Four of
those functions have the desired properties
(Ling's is one of them);
see Doran, IEEE Trans. on Comp., Vol. 37, No. 9,
Sept. 1988.
77
Ling Adder
Conventional: fan-in of 5
Ling: fan-in of 4
78
Advantages of Ling's Adder

Uniform loading in fan-in and fan-out
H16 contains 8 terms, as compared to G16, which
contains 15.
H16 can be implemented with one level of logic
(in ECL), while G16 cannot.
(Ling's adder takes full advantage of wired-OR,
which is of special importance when ECL
technology is used)

79
VLSI Arithmetic: Lecture 5

Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

80
Review

Lecture 4

81
Ling's Adder

Huey Ling, "High-Speed Binary Adder",
IBM Journal of Research and Development, Vol. 25,
No. 3, 1981.
Used in IBM 3033, IBM S370/168, Amdahl V6, HP
etc.

82
Ling's Derivations
define
$g_i$ implies $C_{i+1}$, which implies $H_{i+1}$; thus $g_i = g_i H_{i+1}$
ai  bi  pi  gi  ti
0   0   0   0   0
0   1   1   0   1
1   0   1   0   1
1   1   0   1   1
83
Ling's Derivations
From
and
because
fundamental expansion
Now we need to derive the sum equation
84
Ling Adder
Ling's equations
Variation of CLA
Ling, IBM J. Res. Dev, 5/81
85
Ling Adder
Ling's equation
Variation of CLA
Ling uses a different transfer function. Four of
those functions have the desired properties
(Ling's is one of them);
see Doran, IEEE Trans. on Comp., Vol. 37, No. 9,
Sept. 1988.
86
Ling Adder
Conventional: fan-in of 5
Ling: fan-in of 4
87
Advantages of Ling's Adder

Uniform loading in fan-in and fan-out
H16 contains 8 terms, as compared to G16, which
contains 15.
H16 can be implemented with one level of logic
(in ECL), while G16 cannot (with 8-way wire-OR).
(Ling's adder takes full advantage of wired-OR,
which is of special importance when ECL
technology is used; his IBM limitation was a
fan-in of 4 and a wire-OR of 8)

88
Ling Weinberger Notes
89
Ling Weinberger Notes
90
Ling Weinberger Notes
91
Advantage of Ling's Adder

32-bit adder used in IBM 3033, IBM S370/
Model 168, Amdahl V6.
Implements 32-bit addition in 3 levels of logic
Implements 32-bit AGEN (B + Index + Disp) in 4
levels of logic (rather than 6)
5 levels of logic for the 64-bit adder used in an HP
processor

92
Implementation of Ling's Adder in CMOS (S.
Naffziger, "A Subnanosecond 64-b Adder", ISSCC
'96)
93
S. Naffziger, ISSCC '96
94
S. Naffziger, ISSCC '96
95
S. Naffziger, ISSCC '96
96
S. Naffziger, ISSCC '96
97
S. Naffziger, ISSCC '96
98
S. Naffziger, ISSCC '96
99
S. Naffziger, ISSCC '96
100
S. Naffziger, ISSCC '96
101
S. Naffziger, ISSCC '96
102
S. Naffziger, ISSCC '96
103
S. Naffziger, ISSCC '96
104
Ling Adder Critical Path
105
Ling Adder Circuits
106
LCS4 Critical G Path
107
LCS4 Logical Effort Delay
108
Results

0.5 µm technology
Speed: 0.930 ns
Nominal process, 80 °C, V = 3.3 V

See S. Naffziger, "A Subnanosecond 64-b Adder",
ISSCC '96
109
Prefix Adders and Parallel Prefix Adders
110
from Ercegovac-Lang
111
Prefix Adders
The following recurrence operation is defined:
$(g, p) \circ (g', p') = (g + p g', p p')$
such that
$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$
$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$
$(g_{-1}, p_{-1}) = (c_{in}, c_{in})$
$c_1 = g_0 + p_0 c_{in}$
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and
adjacent).
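The properties of the prefix operator can be verified directly. In this sketch `prefix_op`, `bit_signals` and `span` are illustrative names; the operator takes the more significant (g, p) pair on the left, and g and p are assumed to be the AND and XOR of the operand bits. The last check combines two adjacent ranges and two overlapping ranges and confirms that both give the (G, P) pair of the full 4-bit span.

```python
from functools import reduce
from itertools import product


def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); left covers the more significant bits."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def bit_signals(a: int, b: int, n: int):
    """Per-bit (g_i, p_i) pairs, least significant bit first."""
    return [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]


def span(gp, hi: int, lo: int):
    """(G, P) of bits hi..lo, folding from the most significant bit down."""
    return reduce(prefix_op, (gp[i] for i in range(hi, lo - 1, -1)))


PAIRS = list(product((0, 1), repeat=2))
associative = all(
    prefix_op(prefix_op(x, y), z) == prefix_op(x, prefix_op(y, z))
    for x, y, z in product(PAIRS, repeat=3)
)
commutative = all(prefix_op(x, y) == prefix_op(y, x) for x, y in product(PAIRS, repeat=2))

gp = bit_signals(0b1011, 0b0110, 4)
adjacent = prefix_op(span(gp, 3, 2), span(gp, 1, 0))
overlapping = prefix_op(span(gp, 3, 1), span(gp, 2, 0))

print(associative, commutative)                   # prints True False
print(adjacent == overlapping == span(gp, 3, 0))  # prints True
```

Associativity is what allows the prefix computation to be regrouped into a tree; the fact that overlapping ranges still combine correctly is what lets some networks trade extra operators for lower fan-out.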
112
from Ercegovac-Lang
113
Parallel Prefix Adders variety of possibilities
from Ercegovac-Lang
114
Pyramid Adder: M. Lehman, "A Comparative Study of
Propagation Speed-up Circuits in Binary
Arithmetic Units", IFIP Congress, Munich,
Germany, 1962.
115
Parallel Prefix Adders variety of possibilities
from Ercegovac-Lang
116
Parallel Prefix Adders variety of possibilities
from Ercegovac-Lang
117
Hybrid BK-KS Adder
118
Parallel Prefix Adders: S. Knowles 1999
operation is associative: h > i j k
operation is idempotent: h > i j k
produces carry: cin = 0
119
Parallel Prefix Adders: Ladner-Fischer
Exploits associativity, but not idempotency.
Produces minimal logical depth
120
Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1)
Two wires at each level. Uniform fan-in of
two. Large fan-out (of 16 = n/2). Large capacitive
loading combined with the long wires (in the last
stages)
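The fan-out pattern (16,8,4,2,1) can be reproduced with a divide-and-conquer prefix network. Treat this as a sketch under an assumption: the structure below (every bit in the upper half of a block takes the prefix from the top bit of the lower half) is one minimum-depth network with that lateral fan-out, and it may differ in detail from the graph drawn on the slide. All names are illustrative.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); left covers the more significant bits."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def divide_and_conquer_prefix(gp):
    """Minimum-depth prefix network; returns (prefixes, lateral fan-out per level)."""
    n = len(gp)                  # assumed to be a power of two
    cur = list(gp)
    fanouts = []
    level = 0
    while (1 << level) < n:
        nxt = list(cur)
        uses = {}
        for i in range(n):
            if (i >> level) & 1:                     # upper half of a 2^(level+1) block
                src = ((i >> level) << level) - 1    # top bit of the lower half
                nxt[i] = prefix_op(cur[i], cur[src])
                uses[src] = uses.get(src, 0) + 1
        fanouts.append(max(uses.values()))
        cur = nxt
        level += 1
    return cur, fanouts


a, b = 0x12345678, 0x0FEDCBA9
gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(32)]
prefixes, fanouts = divide_and_conquer_prefix(gp)
print(fanouts[::-1])  # prints [16, 8, 4, 2, 1]
```

The list is printed last level first to match the notation on the slide; the final level drives 16 = n/2 nodes from a single source.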
121
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1.
Dramatic increase in wires. The wire span
remains the same as in Ladner-Fischer. Buffers are
needed in both cases (K-S and L-F)
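A Kogge-Stone network can be sketched in a few lines. The names are illustrative, the carry-in is assumed to be 0, and the operator count is used here only as a rough proxy for the amount of wiring; it says nothing about actual layout.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); left covers the more significant bits."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def kogge_stone_prefix(gp):
    """Return (prefixes, number of levels, number of prefix operators)."""
    n = len(gp)
    cur = list(gp)
    levels = ops = 0
    d = 1
    while d < n:
        nxt = list(cur)
        for i in range(d, n):              # every bit combines with the bit d below it
            nxt[i] = prefix_op(cur[i], cur[i - d])
            ops += 1
        cur = nxt
        d *= 2
        levels += 1
    return cur, levels, ops


def add_with_kogge_stone(a: int, b: int, n: int = 16):
    """n-bit addition with carry-in 0; returns (sum, carry out, levels, operators)."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    prefixes, levels, ops = kogge_stone_prefix(gp)
    carries = [0] + [G for G, _ in prefixes]     # c_{i+1} = G_i
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i
    return total, carries[n], levels, ops


print(add_with_kogge_stone(40000, 30000))  # prints (4464, 1, 4, 49)
```

For 16 bits the network needs 4 levels and 49 operators, against 32 operators for a divide-and-conquer network of the same depth.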
122
Kogge-Stone Adder
123
Parallel Prefix Adders: Brent-Kung

Sets the fan-out to one
Avoids the explosion of wires (as in K-S)
Makes no sense in CMOS:
the fan-out = 1 limit is arbitrary and extreme
much of the capacitive load is due to wire
(anyway)
It is more efficient to insert buffers in L-F
than to use the B-K scheme
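For comparison, a Brent-Kung network can be sketched as an up-sweep followed by a down-sweep. The function name is illustrative and the word length is assumed to be a power of two. The sketch reports the number of levels and operators so the trade-off against a minimum-depth network is visible.

```python
def prefix_op(left, right):
    """(g, p) o (g', p') = (g + p g', p p'); left covers the more significant bits."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)


def brent_kung_prefix(gp):
    """Return (prefixes, number of levels, number of prefix operators)."""
    n = len(gp)                  # assumed to be a power of two
    cur = list(gp)
    levels = ops = 0
    d = 1
    while d < n:                 # up-sweep: build prefixes of aligned blocks
        for i in range(2 * d - 1, n, 2 * d):
            cur[i] = prefix_op(cur[i], cur[i - d])
            ops += 1
        d *= 2
        levels += 1
    d = n // 2
    while d > 1:                 # down-sweep: fill in the remaining positions
        for i in range(d + d // 2 - 1, n, d):
            cur[i] = prefix_op(cur[i], cur[i - d // 2])
            ops += 1
        d //= 2
        levels += 1
    return cur, levels, ops


a, b = 0xBEEF, 0x1234
gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(16)]
prefixes, levels, ops = brent_kung_prefix(gp)
print(levels, ops)  # prints 7 26
```

For 16 bits this gives 7 levels and 26 operators: fewer operators and wires than a 4-level network, paid for with almost twice the logic depth.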

124
Brent-Kung Adder
125
Parallel Prefix Adders: Han-Carlson

Is a hybrid synthesis of L-F and K-S
Trades an increase in logic depth for a reduction
in fan-out
Effectively a higher-radix variant of K-S
Others do it similarly, by serializing the prefix
computation at the higher fan-out nodes
Others similarly trade logic depth for a
reduction of fan-out and wire

126
Parallel Prefix Adders variety of possibilities
from Knowles
bounded by L-F and K-S at ends
127
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

The following rules are used:
Lateral wires at the jth level span $2^j$ bits
Lateral fan-out at the jth level is a power of 2 up to
$2^j$
Lateral fan-out at the jth level cannot exceed
that at the (j+1)th level.
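The fan-out rules can be turned into a small enumeration. This sketch applies only the rules as stated on the slide (fan-out at level j is a power of two no larger than 2^j, and it may not exceed the fan-out of the level above); `knowles_families` is an illustrative name, and any further constraints from the original paper are not modeled. Families are written top level first, in the same style as (16,8,4,2,1).

```python
from itertools import product


def knowles_families(n_bits: int):
    """Enumerate per-level lateral fan-outs allowed by the stated rules."""
    levels = n_bits.bit_length() - 1     # log2(n_bits) for a power of two
    # level j may use any power of two from 1 up to 2^j
    choices = [[1 << e for e in range(j + 1)] for j in range(levels)]
    families = []
    for fanouts in product(*choices):
        # fan-out at level j may not exceed the fan-out at level j + 1
        if all(fanouts[j] <= fanouts[j + 1] for j in range(levels - 1)):
            families.append(tuple(reversed(fanouts)))
    return families


print(knowles_families(4))                        # prints [(1, 1), (2, 1)]
print((4, 4, 2, 2, 1) in knowles_families(32))    # prints True
```

At 4 bits only the two extremes exist, (1,1) and (2,1); for 32 bits the enumeration also contains (1,1,1,1,1), (16,8,4,2,1) and intermediate families such as (4,4,2,2,1).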

128
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

The number of minimal depth graphs of this type
is given in
At 4 bits there are only K-S and L-F; beyond that
there are several new possibilities.

129
Parallel Prefix Adders variety of possibilities
Knowles 1999

example of a new 32-bit adder 4,4,2,2,1

130
Parallel Prefix Adders variety of possibilities
Knowles 1999

Example of a new 32-bit adder 4,4,2,2,1

131
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

Delay is given in terms of worst-case (w.c.) FO4
inverter delay
(the nominal case is 40-50% faster)
K-S is the fastest
K-S adders are wire limited (requiring 80% more
area)
The difference between the examined schemes is
less than 15%

132
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

Conclusion
Irregular, hybrid schemes are possible
The speed-up of 15% is achieved at the cost of
large wiring, hence area and power
Circuits close in speed to K-S are available at
significantly lower wiring cost

133
VLSI Arithmetic: Lecture 6

Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel

134
Review

Lecture 5

135
Prefix Adders and Parallel Prefix Adders
136
from Ercegovac-Lang
137
Prefix Adders
The following recurrence operation is defined:
$(g, p) \circ (g', p') = (g + p g', p p')$
such that
$(G_i, P_i) = (g_0, p_0)$ for $i = 0$
$(G_i, P_i) = (g_i, p_i) \circ (G_{i-1}, P_{i-1})$ for $1 \le i \le n$
$c_{i+1} = G_i$ for $i = 0, 1, \ldots, n$
$(g_{-1}, p_{-1}) = (c_{in}, c_{in})$
$c_1 = g_0 + p_0 c_{in}$
This operation is associative, but not commutative.
It can also span a range of bits (overlapping and
adjacent).
138
Parallel Prefix Adders: S. Knowles 1999
operation is associative: h > i j k
operation is idempotent: h > i j k
produces carry: cin = 0
139
from Ercegovac-Lang
140
Parallel Prefix Adders variety of possibilities
from Ercegovac-Lang
141
Parallel Prefix Adders variety of possibilities
from Ercegovac-Lang
142
Parallel Prefix Adders variety of possibilities
from Ercegovac-Lang
143
Kogge-Stone Adder
144
Brent-Kung Adder
145
Hybrid BK-KS Adder
146
Pyramid Adder: M. Lehman, "A Comparative Study of
Propagation Speed-up Circuits in Binary
Arithmetic Units", IFIP Congress, Munich,
Germany, 1962.
147
Parallel Prefix Adders: Ladner-Fischer
Exploits associativity, but not idempotency.
Produces minimal logical depth
148
Parallel Prefix Adders: Ladner-Fischer (16,8,4,2,1)
Two wires at each level. Uniform fan-in of
two. Large fan-out (of 16 = n/2). Large capacitive
loading combined with the long wires (in the last
stages)
149
Parallel Prefix Adders: Kogge-Stone
Exploits idempotency to limit the fan-out to 1.
Dramatic increase in wires. The wire span
remains the same as in Ladner-Fischer. Buffers are
needed in both cases (K-S and L-F)
150
Parallel Prefix Adders: Brent-Kung

Sets the fan-out to one
Avoids the explosion of wires (as in K-S)
Makes no sense in CMOS:
the fan-out = 1 limit is arbitrary and extreme
much of the capacitive load is due to wire
(anyway)
It is more efficient to insert buffers in L-F
than to use the B-K scheme

151
Two Parallel Prefix Adder Structures

Kogge-Stone:
log(bits) carry stages
Extra wiring

Han-Carlson:
log(bits) + 1 carry stages
Reduced wiring and gates

152
Parallel Prefix Adders: Han-Carlson

Is a hybrid synthesis of L-F and K-S
Trades an increase in logic depth for a reduction
in fan-out
Effectively a higher-radix variant of K-S
Others do it similarly, by serializing the prefix
computation at the higher fan-out nodes
Others similarly trade logic depth for a
reduction of fan-out and wire

153
Parallel Prefix Adders variety of possibilities
from Knowles
bounded by L-F and K-S at ends
154
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

The following rules are used:
Lateral wires at the jth level span $2^j$ bits
Lateral fan-out at the jth level is a power of 2 up to
$2^j$
Lateral fan-out at the jth level cannot exceed
that at the (j+1)th level.

155
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

The number of minimal depth graphs of this type
is given in
At 4 bits there are only K-S and L-F; beyond that
there are several new possibilities.

156
Parallel Prefix Adders variety of possibilities
Knowles 1999

example of a new 32-bit adder 4,4,2,2,1

157
Parallel Prefix Adders variety of possibilities
Knowles 1999

Example of a new 32-bit adder 4,4,2,2,1

158
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

Delay is given in terms of worst-case (w.c.) FO4
inverter delay
(the nominal case is 40-50% faster)
K-S is the fastest
K-S adders are wire limited (requiring 80% more
area)
The difference between the examined schemes is
less than 15%

159
Parallel Prefix Adders: variety of
possibilities (Knowles 1999)

Conclusion
Irregular, hybrid schemes are possible
The speed-up of 15% is achieved at the cost of
large wiring, hence area and power
Circuits close in speed to K-S are available at
significantly lower wiring cost

160
Possibilities for Further Research

The logical depth is important (Knowles was
right)
The fan-out is less important than the fan-in
(Knowles was wrong)
It is possible to examine a variety of topologies
with restricted and varied fan-in.
Driving strength and Logical Effort rules were
overlooked, or at least neglected
It is possible to create a number of topologies
taking LE rules into account.
It is further possible to combine the rules with
a compound domino implementation, taking advantage
of the two different rules governing dynamic and
static logic.
It is still possible to produce a better adder!

161
Other Types of Adders
162
Conditional Sum Adder

J. Sklansky, "Conditional-Sum Addition Logic",
IRE Transactions on Electronic
Computers, EC-9, pp. 226-231, 1960.

163
Conditional Sum Adder
from Ercegovac-Lang
164
Conditional Sum Adder
165
Conditional Sum Adder
from Ercegovac-Lang
166
Conditional Sum Adder
from Ercegovac-Lang
167
Conditional Sum Adder
168
Carry-Select Adder