Title: VLSI Arithmetic Adders
1VLSI Arithmetic: Adders and Multipliers
- Prof. Vojin G. Oklobdzija
- University of California
- http://www.ece.ucdavis.edu/acsel
2Introduction
- Digital computer arithmetic belongs to computer
architecture; however, it is also an aspect of
logic design.
- The objective of computer arithmetic is to
develop appropriate algorithms that use the
available hardware in the most efficient way.
- Ultimately, speed, power, and chip area are the
most often used measures, making a strong link
between the algorithms and the technology of
implementation.
3Basic Operations
- Addition
- Multiplication
- Multiply-Add
- Division
- Evaluation of Functions
4Addition of Binary Numbers
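Full Adder. The full adder is the fundamental
building block of most arithmetic circuits.
The sum and carry outputs are described
as
$s_i = a_i \oplus b_i \oplus c_{in}$
$c_{out} = a_i b_i + a_i c_{in} + b_i c_{in}$
[Figure: Full Adder block with inputs ai, bi, Cin and outputs si, Cout]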
5Addition of Binary Numbers
[Figure labels: Propagate, Generate]
6Full-Adder Implementation
- Full-adder operation is defined by equations in terms of
carry-propagate pi and carry-generate gi
A one-bit adder could be implemented as shown
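The slide's own equations did not survive in this transcript. One common textbook form is sketched below, assuming the XOR definition of propagate (some texts use OR instead), with $c_i$ the carry into bit $i$ (Cin) and $c_{i+1}$ the carry out (Cout):

```latex
\begin{align}
p_i &= a_i \oplus b_i, & g_i &= a_i b_i, \\
s_i &= p_i \oplus c_i, & c_{i+1} &= g_i + p_i c_i .
\end{align}
```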
7High-Speed Addition
A one-bit adder could be implemented more
efficiently with a multiplexer, because a MUX is faster
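A minimal behavioural sketch of the multiplexer idea, in Python. It assumes the XOR form of propagate; the function names `mux` and `full_adder_mux` are illustrative and not taken from the slides. When p = 1 the carry-in is passed through, and when p = 0 both inputs are equal, so either one gives the carry-out.

```python
def mux(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def full_adder_mux(a, b, cin):
    """One-bit adder whose carry output is selected by a multiplexer.

    Assumption: propagate is defined as p = a XOR b.
    """
    p = a ^ b                # propagate signal
    s = p ^ cin              # sum bit
    cout = mux(p, a, cin)    # p = 1: pass cin; p = 0: a == b, so cout = a
    return s, cout
```

The carry path sees only the select-to-output delay of one multiplexer per bit, which is the property the slide appeals to.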
8The Ripple-Carry Adder
9The Ripple-Carry Adder
From Rabaey
10Inversion Property
From Rabaey
11Minimize Critical Path by Reducing Inverting
Stages
From Rabaey
12Manchester Carry-Chain Realization of the Carry
Path
- Simple and very popular scheme for implementation
of carry signal path
13Manchester Carry Chain
- Implement P with pass-transistors
- Implement G with pull-up, kill (delete) with
pull-down
- Use dynamic logic to reduce the complexity and
speed up
Kilburn, et al., IEE Proc., 1959.
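The generate / propagate / kill behaviour of the chain can be sketched at the logic level as follows. This is only a functional model under simplifying assumptions: it ignores precharge, signal polarity, and transistor-level timing, and the function name is illustrative.

```python
def manchester_carry_chain(a_bits, b_bits, cin=0):
    """Logic-level model of a Manchester carry chain.

    a_bits, b_bits: lists of 0/1, least significant bit first.
    Returns the list of carries [c0, c1, ..., cn], with c0 = cin.
    """
    carries = [cin]
    for a, b in zip(a_bits, b_bits):
        if a and b:
            c = 1                # generate: the carry node is driven to 1
        elif not a and not b:
            c = 0                # kill: the carry node is driven to 0
        else:
            c = carries[-1]      # propagate: pass transistor forwards the carry
        carries.append(c)
    return carries
```

For example, adding 3 and 5 (bits given LSB first) produces the carries 0, 1, 1, 1, 0: bit 0 generates, bits 1 and 2 propagate, and bit 3 kills.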
14Ripple Carry Adder
- Carry chain of an RCA implemented using a
multiplexer from the standard cell library
Critical Path
Oklobdzija, ISCAS '88
15Pass-Transistor Realization in DPL
16Carry-Skip Adder
MacSorley, Proc. IRE, 1/61
Lehman, Burla, IRE Trans. on Comp., 12/61
17Carry-Skip Adder
Bypass
From Rabaey
18Carry-Skip Adder: N bits, k bits/group, r = N/k
groups
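A functional sketch of the grouping, assuming fixed groups of k bits, each with a bypass multiplexer controlled by the group propagate. The function name and default parameters are illustrative. The bypass does not change the result (when every bit in a group propagates, the rippled carry equals the group's carry-in); it only shortens the path a carry must travel.

```python
def carry_skip_add(a, b, n=16, k=4, cin=0):
    """Add two n-bit integers with k-bit groups and a per-group bypass."""
    carry, total = cin, 0
    for lo in range(0, n, k):
        group_in = carry
        group_p = 1                      # AND of all propagate bits in the group
        for i in range(lo, min(lo + k, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p, g = ai ^ bi, ai & bi
            total |= (p ^ carry) << i    # sum bit
            carry = g | (p & carry)      # ripple inside the group
            group_p &= p
        # Bypass multiplexer: if the whole group propagates, forward the
        # group's carry-in directly instead of waiting for the ripple.
        carry = group_in if group_p else carry
    return total | (carry << n)
```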
19Carry-Skip Adder
20Variable Block Adder (Oklobdzija, Barnes, IBM
1985)
21Carry-chain of a 32-bit Variable Block
Adder (Oklobdzija, Barnes, IBM 1985)
22Carry-chain of a 32-bit Variable Block
Adder (Oklobdzija, Barnes, IBM 1985)
[Figure: carry chain with labels 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1; D = 9]
Any-point-to-any-point delay: 9 D, as compared
to 12 D for the CSKA
23Carry-chain block size determination for a 32-bit
Variable Block Adder (Oklobdzija, Barnes, IBM
1985)
24Delay Calculation for Variable Block
Adder (Oklobdzija, Barnes, IBM 1985)
Delay model
25Variable Block Adder (Oklobdzija, Barnes, IBM
1985)
Variable Group Length
Oklobdzija, Barnes, ARITH '85
26Carry-chain of a 32-bit Variable Block
Adder (Oklobdzija, Barnes, IBM 1985)
Variable Block Lengths
- There is no closed-form solution for the delay
- Choosing the block sizes is a dynamic programming problem
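To make the optimisation target concrete, here is a small sketch that only evaluates a candidate block arrangement; it does not perform the optimisation. The delay model is an assumption made for illustration, not the model from the original slides: one unit D per ripple step inside a block and one unit D per skipped block. The tapered block sizes are hypothetical, chosen only so that they sum to 32.

```python
def worst_case_carry_delay(blocks):
    """Worst any-point-to-any-point carry delay, in units of D.

    Assumed model (illustrative only): a carry generated at the lowest bit
    of block i needs (size_i - 1) ripple steps to leave that block, one step
    per skipped block, and (size_j - 1) ripple steps to reach the top bit
    of the destination block j.
    """
    worst = max(b - 1 for b in blocks)          # generated and absorbed in one block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            d = (blocks[i] - 1) + (j - i - 1) + (blocks[j] - 1)
            worst = max(worst, d)
    return worst


uniform = [4] * 8                               # eight fixed 4-bit groups
tapered = [1, 2, 3, 4, 5, 6, 5, 3, 2, 1]        # hypothetical variable-size blocks

print(worst_case_carry_delay(uniform))          # 12
print(worst_case_carry_delay(tapered))          # 9
```

Under this assumed model, small blocks at both ends and large blocks in the middle shorten the worst path, because the longest carries no longer have to ripple through full-size blocks at both the start and the end.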
27Delay Comparison: Variable Block
Adder (Oklobdzija, Barnes, IBM 1985)
28Delay Comparison: Variable Block Adder
[Plot legend: VBA, CLA, VBA multi-level]
29Fan-Out Dependency
30Fan-In Dependency
31Delay Comparison: Variable Block
Adder (Oklobdzija, Barnes, IBM 1985)
33Carry-Lookahead Adder (Weinberger and Smith)
A. Weinberger and J. L. Smith, "A Logic for
High-Speed Addition", National Bureau of
Standards, Circ. 591, pp. 3-12, 1958.
34Carry-Lookahead Adder (Weinberger and Smith)
35Carry-Lookahead Adder
One gate delay D to calculate p, g
One D to calculate P and two for G
Three gate delays to calculate C4(j+1)
Compare that to 8 D in the RCA!
36Carry-Lookahead Adder (Weinberger and Smith)
Additional two gate delays
C16 will take a total of 5 D vs. 32 D for the RCA!
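The gate-delay counts above follow from computing every carry of a group directly from p, g, and the group carry-in, rather than from the previous carry. A minimal sketch of one 4-bit group is given below, assuming p = a XOR b and g = a AND b; the function name `cla4` is illustrative.

```python
def cla4(a, b, cin):
    """4-bit carry-lookahead group.

    a, b: lists of four bits, least significant first.
    Returns (sum_bits, c4, P, G) where P and G are the group signals.
    """
    p = [x ^ y for x, y in zip(a, b)]   # bit propagate
    g = [x & y for x, y in zip(a, b)]   # bit generate
    # Each carry is a two-level AND-OR function of p, g, and cin.
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    # Group propagate and generate, used by the next lookahead level.
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    c4 = G | (P & cin)
    s = [p[0] ^ cin, p[1] ^ c1, p[2] ^ c2, p[3] ^ c3]
    return s, c4, P, G
```

A second lookahead level would combine the P and G outputs of four such groups in the same way to form C16.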
37 32-bit Carry-Lookahead Adder
38Carry-Lookahead Adder (Weinberger and Smith,
original derivation)
39Carry-Lookahead Adder (Weinberger and Smith,
original derivation)
40Carry-Lookahead Adder (Weinberger and
Smith): please notice the similarity with
Parallel-Prefix Adders!
41Carry-Lookahead Adder (Weinberger and
Smith): please notice the similarity with
Parallel-Prefix Adders!
42Delay Optimized CLA
- B. Lee, V. G. Oklobdzija
- Journal of VLSI Signal Processing, Vol.3, No.4,
October 1991
43Delay Optimized CLA: Lee-Oklobdzija '91
(a) fixed groups and levels
(b) variable-sized groups, fixed levels
(c) variable-sized groups and fixed levels
(d) variable-sized groups and levels
44Two-Levels of Logic Implementation of the Carry
Block
45Two-Levels of Logic Implementation of the
Carry-Lookahead Block
46Three-Levels of Logic Implementation of the Carry
Block (restricted fan-in)
47Three-Levels of Logic Implementation of the Carry
Lookahead (restricted fan-in)
48Delay Optimized CLA: Lee-Oklobdzija '91
[Plot labels: Delay, three-level BCLA; Delay, two-level BCLA]
49Delay Optimized CLA: Lee-Oklobdzija '91
(a) 2-level BCLA, D = 8.5 ns
(b) 3-level BCLA, D = 8.9 ns
50Motorola CLA Implementation Example
- A. Naini, D. Bearden and W. Anderson, "A 4.5 ns
96b CMOS Adder Design",
- Proceedings of the IEEE Custom Integrated
Circuits Conference, May 3-6, 1992.
51Critical path in Motorola's 64-bit CLA
52Motorola's 64-bit CLA: Conventional PG Block
53Motorola's 64-bit CLA: Modified PG Block
Intermediate propagate signals Pi:0 are
generated to speed up C3
54Ling's Adder
- Huey Ling, "High-Speed Binary Adder"
- IBM Journal of Research and Development, Vol. 25,
No. 3, 1981.
55Ling Adder
Ling's equations
Variation of CLA
Ling, IBM J. Res. Dev., 5/81
56Ling Adder
Ling's equation
Propagates information on two bits
Doran, IEEE Trans. on Comp., 9/88
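The slide's equations are not preserved in this transcript. The sketch below uses a commonly cited form of Ling's recurrence, with t_i = a_i OR b_i, g_i = a_i AND b_i, pseudo-carry H_{i+1} = g_i + t_{i-1} H_i, and true carry c_i = t_{i-1} H_i. The indexing convention and the boundary values (H_0 = cin, t_{-1} = 1) are assumptions chosen for this example, and the function name is illustrative.

```python
def ling_add(a, b, n, cin=0):
    """Add two n-bit integers using Ling pseudo-carries (bit-serial model)."""
    h = cin        # pseudo-carry H_0 (assumed boundary value)
    t_prev = 1     # t_{-1} taken as 1 so that c_0 = t_{-1} H_0 = cin
    total = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t, p = ai & bi, ai | bi, ai ^ bi
        c = t_prev & h               # true carry into bit i: c_i = t_{i-1} H_i
        total |= (p ^ c) << i        # sum bit s_i = p_i XOR c_i
        h = g | c                    # H_{i+1} = g_i + t_{i-1} H_i
        t_prev = t
    return total | ((t_prev & h) << n)   # carry-out c_n = t_{n-1} H_n
```

The pseudo-carry is simpler than the true carry because the t_i factor is moved out of the recurrence and applied once when the sum is formed.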
57Ling Adder
Conventional
Ling
58S. Naffziger, ISSCC96
69Results: S. Naffziger, "A Subnanosecond 64-b
Adder", ISSCC '96
- 0.5 µm technology
- Speed: 0.930 ns
- Nominal process, 80 °C, V = 3.3 V
70Conditional-Sum Adder
- J. Sklansky, "Conditional-Sum Addition Logic",
IRE Transactions on Electronic Computers,
EC-9, pp. 226-231, 1960.
71Conditional-Sum Adder
72Conditional-Sum Adder
73Carry-Select Adder
- O. J. Bedrij, "Carry-Select Adder", IRE
Transactions on Electronic Computers, June
1962, p.340-34
74Carry-Select Adder
- Addition is performed under both assumptions, Cin = 0 and Cin = 1.
75Carry-Select Adder: combining two 32-b VBAs in
select mode
Delay = D(VBA32) + D(MUX)
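A behavioural sketch of carry-select addition, assuming a 64-bit adder split into two 32-bit halves. Python integer addition stands in for the 32-bit block adders, so this shows only the selection scheme, not the internals of the blocks; the function name and parameters are illustrative.

```python
def carry_select_add(a, b, n=64, split=32, cin=0):
    """Carry-select addition of two n-bit integers, split at bit `split`."""
    mask_lo = (1 << split) - 1
    mask_hi = (1 << (n - split)) - 1
    # Lower half: an ordinary block adder.
    lo = (a & mask_lo) + (b & mask_lo) + cin
    carry_lo = lo >> split
    # Upper half: both results are computed in parallel with the lower half.
    a_hi, b_hi = (a >> split) & mask_hi, (b >> split) & mask_hi
    hi0 = a_hi + b_hi          # assuming Cin = 0
    hi1 = a_hi + b_hi + 1      # assuming Cin = 1
    # The real carry from the lower half selects one of them (the MUX).
    hi = hi1 if carry_lo else hi0
    return (lo & mask_lo) | (hi << split)
```

Because both upper results are ready when the lower carry arrives, the total delay is that of one block adder plus one multiplexer.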
76Addition Under Non-equal Signal Arrival Profile
Assumption
- P. Stelling, V. G. Oklobdzija, "Design
Strategies for Optimal Hybrid Final Adders in a
Parallel Multiplier", special issue on VLSI
Arithmetic, Journal of VLSI Signal Processing,
Kluwer Academic Publishers, Vol.14, No.3,
December 1996
77Signal Arrival Profile from the Parallel
Multiplier Partial-Product Reduction Tree
78Oklobdzija, Villeger, IEEE Transactions on VLSI
Systems, June, 1995
79Oklobdzija and Villeger, IEEE Transactions on
VLSI Systems, June, 1995
88Performing Multiply-Add Operation in the Multiply
Time
- P. Stelling, V. G. Oklobdzija, "Achieving
Multiply-Accumulate Operation in the Multiply
Time", Thirteenth International Symposium on
Computer Arithmetic, Pacific Grove, California,
July 5-9, 1997.
90Final Adder Implementation
91Final Adder Implementation
92Final Adder Implementation
93Final Adder Implementation
94Recurrence Solver Based Adders
- Kogge and Stone, IEEE Trans. on Computers, August
1973
- Bilgory and Gajski, 18th DAC, 1981
- Brent and Kung, IEEE Trans. on Computers, March
1982
95Recurrence Solver Based Adders
- 1973: Kogge and Stone published a general
recurrence scheme for parallel computation
- 1979: Brent and Kung published a technical report on
regular layout for parallel adders
- 1980: Guibas and Vuillemin developed a layout
scheme based on the recurrence equation for addition
- 1980: Ladner and Fischer published parallel
prefix computation, Journal of the ACM
- 1981: Bilgory and Gajski published a paper on
recurrence structures for automatic cell
generation
96Recurrence Solver Based Adders
- They are based on the recurrence equations for P and G
- (What is new there since Weinberger?)
- Or and
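The recurrence on (G, P) pairs is usually written with an associative operator, (g, p) o (g', p') = (g + p g', p p'), which is what allows the carries to be computed by a parallel-prefix network. The sketch below uses a Kogge-Stone style arrangement as one example of such a network; it assumes a carry-in of 0 and p = a XOR b, and the function names are illustrative.

```python
def prefix_op(hi, lo):
    """Combine (G, P) of a more significant span with a less significant one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def kogge_stone_add(a, b, n):
    """Add two n-bit integers (carry-in assumed 0) with a Kogge-Stone prefix."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    gp = list(zip(g, p))
    d = 1
    while d < n:
        # Each level combines position i with position i - d, doubling the
        # span covered, so about log2(n) levels are needed.
        gp = [prefix_op(gp[i], gp[i - d]) if i >= d else gp[i] for i in range(n)]
        d *= 2
    carries = [0] + [G for G, _ in gp]     # c_{i+1} = G over bits i..0
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total | (carries[n] << n)
```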
97Recurrence Solver Based Adders
98Carry-Lookahead Adder (Weinberger and Smith): just
to remind you! Please notice the similarity with
Parallel-Prefix Adders!
99Multiplexer Based Adder
- Farooqui and Oklobdzija
- 1999 Intl Sym. on VLSI Technology, Taipei,
Taiwan, June 8-10, 1999
100Multiplexer Based Adder
- Based on the realization that a MUX circuit is
faster than a logic gate due to its transmission
gate implementation
- Based on the Carry-Lookahead method (W-S), or
recurrence solver.
101Multiplexer Based Adder: A. A. Farooqui, V. G.
Oklobdzija, F. Chechrazi, 1999 Intl Sym. on
VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
102Multiplexer Based Adder: A. A. Farooqui, V. G.
Oklobdzija, F. Chechrazi, 1999 Intl Sym. on
VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
103Multiplexer Based Adder: A. A. Farooqui, V. G.
Oklobdzija, F. Chechrazi, 1999 Intl Sym. on
VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
104Multiplexer Based Adder: A. A. Farooqui, V. G.
Oklobdzija, F. Chechrazi, 1999 Intl Sym. on
VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
- Results in a very fast structure
- 7 MUX delays for a 64-b adder
- Delay using standard cells: 0.25 µm, 2.5 V, 25 °C
Adder Size (bits)    Delay (ps)
8                    625
16                   665
32                   710
64                   903
105DEC "Alpha" 21064 Adder
- Combination of:
- 8-bit tapered pre-discharged Manchester Carry
Chains, with Cin = 0 and Cin = 1
- 32-bit LSB Carry-Lookahead Adder
- 32-bit MSB Conditional-Sum Adder
- Carry-Select on the most significant 32 bits
- Latches in the middle: pipelined addition
106DEC "Alpha" 21064 Adder
107DEC "Alpha" 21064 Adder Results
- The first 200 MHz processor
- Built using 0.75 µm technology
- V = 3.3 V, 30 W
- Pipelined (two latches), allowing 5 ns throughput
and 10 ns latency
108Conclusion
- VLSI Implementation of Addition
109Conclusion: VLSI Implementation of Addition
- Currently, implementation parameters are not
reflected in the algorithms used for development
- Layout and wire-delay effects are largely
neglected, and this is becoming intolerable in the
next generation of technology
- Transistor sizing has a large effect, which can
outweigh the algorithm
- There is a great disconnect between algorithm and
implementation
- New rules and measures of goodness are needed
110Multiplication
- Parallel Multiplier Implementation
111Multiplication
Initially p(0) = 0
The recurrence is applied for j = 0, ..., n-1
p(n) = XY after n steps
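The recurrence itself is not preserved in this transcript. A standard shift-and-add form that satisfies the stated boundary condition is p(j+1) = (p(j) + x_j Y 2^n) / 2, where x_j is bit j of the multiplier X. The sketch below assumes this form, unsigned operands, and p(0) = 0; the function name is illustrative.

```python
def shift_add_multiply(x, y, n):
    """Multiply unsigned n-bit x by y with the add-and-shift-right recurrence."""
    p = 0                                    # p(0) = 0
    for j in range(n):
        xj = (x >> j) & 1                    # multiplier bit examined in step j
        # Add x_j * Y aligned at weight 2^n, then shift right by one.
        # The division is exact because p(j) is always a multiple of 2.
        p = (p + xj * y * (1 << n)) >> 1
    return p                                 # p(n) = X * Y
```

Each bit of X contributes Y with weight 2^n and is then shifted right n - j times in total, which leaves it at weight 2^j, as required.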
112Parallel Multipliers
113 4:2 Compressor
114Re-designed 4:2 Compressor with 3 XOR Delays
115Three-Dimensional Optimization Method,
TDM (Oklobdzija, Villeger, Liu, 1996)
116Generation of the Partial Product Reduction Tree
in TDM multiplier
117Speed of Partial Product Reduction for Various
Schemes
118Booth Recoding Algorithm
x(i+2) x(i+1) x(i)    Add to partial product
000                   +0Y
001                   +1Y
010                   +1Y
011                   +2Y
100                   -2Y
101                   -1Y
110                   -1Y
111                   -0Y
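The table can be exercised with a short sketch of radix-4 Booth recoding. It assumes an n-bit two's-complement multiplier with n even and an implicit 0 below the least significant bit. The indices here are offset by one from the table header (each triplet is written as x_{i+1}, x_i, x_{i-1}), and the names are illustrative.

```python
# Recoded digit for each 3-bit window, matching the table rows.
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_digits(x, n):
    """Radix-4 Booth digits of an n-bit two's-complement x, LS digit first."""
    ext = [0] + [(x >> i) & 1 for i in range(n)]   # ext[0] is the implicit x_{-1}
    digits = []
    for i in range(0, n, 2):
        window = (ext[i + 2] << 2) | (ext[i + 1] << 1) | ext[i]
        digits.append(BOOTH_DIGIT[window])
    return digits


def booth_multiply(x, y, n):
    """Sum of recoded partial products; each digit has weight 4**k."""
    return sum(d * y * 4 ** k for k, d in enumerate(booth_digits(x, n)))
```

Recoding halves the number of partial products, and every multiple of Y that is needed (0, ±Y, ±2Y) can be formed by shifting and negation alone.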
119Organization of Hitachi's DPL multiplier
120Hitachi's 4:2 compressor structure
121DPL multiplexer circuit
122Conclusion
- References
- E. Swartzlander, "Computer Arithmetic", Vol. 1 & 2,
IEEE Computer Society Press, 1990.
- K. Hwang, "Computer Arithmetic: Principles,
Architecture and Design", John Wiley and Sons,
1979.
- M. Ercegovac, "Digital Systems and
Hardware/Firmware Algorithms", Chapter 12:
Arithmetic Algorithms and Processors, John Wiley
& Sons, 1985.
- A. Chandrakasan, W. Bowhill, F. Fox, Editors,
"Design of High Performance Microprocessors
Circuits", IEEE Press, July 2000.
- V. G. Oklobdzija, "High-Performance System
Design: Circuits and Logic", IEEE Press, July
1999.
- Also http://www.ece.ucdavis.edu/acsel/Publications.html