Title: CSE 246: Computer Arithmetic Algorithms and Hardware Design
1CSE 246 Computer Arithmetic Algorithms and
Hardware Design
Lecture 4 Adders
- Instructor
- Prof. Chung-Kuan Cheng
2Topics
- Adders
- AND/OR gate vs. circuit
- Logic Design
- Graph Design (Prefix Adder)
3Chapter 2 ADDERS
- Half Adders
- Half adders can add two 1-bit binary numbers when there is no carry in.
- If the inputs are $x_i$ and $y_i$, the sum and carry-out are given by
- $s_i = x_i \oplus y_i$
- $c_{i+1} = x_i \cdot y_i$
- We use the following notation throughout the slides:
- $\cdot$ means logical AND
- $+$ means logical OR
- $\oplus$ means logical XOR
- $\overline{x}$ means complementation
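The half-adder equations can be checked with a few lines of Python. This is a minimal sketch; the function name `half_adder` is an assumption made for illustration and does not come from the slides.

```python
def half_adder(x, y):
    """Return (sum, carry_out) for two 1-bit inputs with no carry in."""
    s = x ^ y        # s_i = x_i XOR y_i
    c_out = x & y    # c_{i+1} = x_i AND y_i
    return s, c_out


# Print the full truth table.
for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        print(f"x={x} y={y} -> sum={s} carry={c}")
```

In every row, `2 * carry + sum` equals `x + y`, which is the defining property of a half adder.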
4Full Adder
- The inputs are $x_i$, $y_i$ (operand bits) and $c_i$ (carry in).
- The outputs are $s_i$ (result bit) and $c_{i+1}$ (carry out).
- Inputs and outputs are related by
- $s_i = x_i \oplus y_i \oplus c_i$
- $c_{i+1} = x_i \cdot y_i + c_i \cdot (x_i + y_i)$
- $c_{i+1} = x_i \cdot y_i + c_i \cdot (x_i \oplus y_i)$
5Full Adder
- If the carry-in bit is zero, the full adder becomes a half adder.
- If the carry-in bit is one, then
- $s_i = \overline{x_i \oplus y_i}$
- $c_{i+1} = x_i + y_i$
- To add two n-bit numbers, we can chain n full adders to build a ripple-carry adder.
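Chaining full adders can be sketched as follows. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) and the LSB-first bit-list convention are assumptions made for this illustration.

```python
def full_adder(x, y, c_in):
    """One full-adder stage: returns (s_i, c_{i+1})."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c_in=0):
    """Add two equal-length bit lists (LSB first).

    Returns (sum_bits, carries) where carries[i] is c_i, so carries[0]
    is the carry in and carries[-1] is the carry out.
    """
    carries = [c_in]
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        s, c = full_adder(x, y, carries[-1])
        sum_bits.append(s)
        carries.append(c)
    return sum_bits, carries


def to_bits(value, n):
    """n-bit little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


bits, carries = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(bits), carries[-1])
```

Each stage waits for the carry of the previous stage, which is why the worst-case delay grows linearly with n.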
6Ripple Carry Adder
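[Figure: chain of n full adders. Stage i takes x_i, y_i and the carry c_i (with c_0 = c_in) and produces s_i and c_{i+1}; the carry out of the last stage is c_out.]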
Overflow happens when the operands have the same sign and the result has a different sign. If we use 2's complement to represent negative numbers, overflow occurs when $c_{out} \oplus c_{n-1}$ is 1, where $c_{n-1}$ is the carry into the most significant stage.
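The overflow rule can be illustrated with a small bit-level simulation. This is a sketch under stated assumptions: the function name `add_with_overflow` and the choice to return a signed Python integer are illustrative and not from the slides.

```python
def add_with_overflow(a, b, n):
    """Add two n-bit two's-complement integers bit by bit.

    Returns (signed_result, overflow), where overflow is computed as
    c_out XOR c_{n-1} (carry out of the MSB XOR carry into the MSB).
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(n):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if i == n - 1:
            carry_into_msb = carry
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    overflow = bool(carry ^ carry_into_msb)
    if result >> (n - 1):          # reinterpret the bit pattern as signed
        result -= 1 << n
    return result, overflow


print(add_with_overflow(100, 100, 8))   # two positives whose sum exceeds 127
print(add_with_overflow(5, -3, 8))      # mixed signs never overflow
```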
7Ripple Carry Adder
- For the sake of brevity, we use the following notation:
- $g_i = x_i \cdot y_i$
- $p_i = x_i \oplus y_i$
- In terms of this notation, we can rewrite the carry equations as
- $c_1 = g_0 + p_0 \cdot c_0$
- $c_2 = g_1 + p_1 \cdot c_1$
- and so on.
- We shall use this notation afterwards while discussing the design of other kinds of adders.
- It has been observed that the expected length of a carry chain is 2, while the expected maximal length of a carry chain is $\lg n$. Hence ripple-carry adders are fast on average, even though the worst case is a chain of length n.
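The claim about carry-chain lengths can be explored empirically. The sketch below assumes uniformly random operands and $c_0 = 0$, and it counts a chain as the stage that generates a carry plus the consecutive stages that propagate it; this counting convention, the function names, and the fixed seed are assumptions of the example rather than definitions from the slides.

```python
import random


def longest_carry_chain(a, b, n):
    """Length of the longest carry chain when adding two n-bit numbers.

    A chain starts at a stage with g_i = 1 and is extended by each
    following stage with p_i = 1.
    """
    longest = 0
    run = 0
    for i in range(n):
        x = (a >> i) & 1
        y = (b >> i) & 1
        g, p = x & y, x ^ y
        if g:
            run = 1              # a new carry is generated here
        elif p and run:
            run += 1             # the carry in flight moves one stage further
        else:
            run = 0              # carry is killed, or there is nothing to propagate
        longest = max(longest, run)
    return longest


def average_longest_chain(n, trials=2000, seed=0):
    """Monte Carlo estimate of the expected longest chain for n-bit operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials


for n in (8, 16, 32, 64):
    print(n, average_longest_chain(n))
```

The printed averages grow slowly with n, in line with the logarithmic behaviour stated above, while the worst case remains n.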
8Ripple Carry Adder
- How do we know that an adder has completed its operation?
- Worst-case scenario: wait for the longest chain in the carry propagation network.
- We might inspect $c_{i+1}$ and its complement $b_{i+1}$ to determine the status of the adder.
9Improvement to Ripple Carry Adder Manchester
Adders
- By intelligently using our device properties, we can reduce the complexity of the circuit used to compute carries in a ripple-carry adder.
- Define $a_i = \overline{x_i} \cdot \overline{y_i}$.
- Next we observe that $c_{i+1}$ is 1 in exactly these scenarios:
- $g_i$ is 1, i.e. both $x_i$ and $y_i$ are 1
- $c_i$ is 1 and it is propagated because $p_i$ is 1
- $c_{i+1}$ is pulled down to logic 0, irrespective of the value of $c_i$, when $a_i$ is 1, i.e. both $x_i$ and $y_i$ are 0.
- From these conditions, and keeping in mind the general characteristics of transistor devices, we can design simplified circuits for computing carries, as shown in the next slide.
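At the logic level, the three conditions above say that exactly one of $g_i$, $p_i$, $a_i$ is active for each bit position, and that the active one decides the carry. A minimal behavioural sketch follows; it models only the Boolean behaviour, not the transistor circuit, and the function name is an assumption.

```python
def manchester_carry(x, y, c_in):
    """Carry out of one stage, selected by the generate/propagate/kill signals."""
    g = x & y                  # generate: carry out is forced to 1
    p = x ^ y                  # propagate: carry out follows carry in
    a = (1 - x) & (1 - y)      # kill: carry out is pulled down to 0
    assert g + p + a == 1      # the three cases are mutually exclusive
    if g:
        return 1
    if a:
        return 0
    return c_in


for x in (0, 1):
    for y in (0, 1):
        for c in (0, 1):
            print(x, y, c, "->", manchester_carry(x, y, c))
```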
10Improvement to Ripple Carry Adder Manchester
Adders
11Implementation of Manchester Adder using MOS
transistors
This is essentially the same circuit for
computing carry, but implemented with MOS devices
12Manchester Adder Alternate design
- We divide the computation cycle into two distinct half-cycles: precharge and evaluate. In the precharge half-cycle, $g_i$ and $c_{i+1}$ are assigned a tentative value of logic 1. This is evaluated in the next half-cycle with the actual value of $a_i$.
- The actual circuit for computing carries is shown in the next slide.
13Manchester Adder Alternate design
[Figure: timing diagram of the clock signal Q over time, alternating between precharge and evaluation half-cycles.]
14Carry Look-ahead Adder
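- The full adders of a ripple-carry adder are grouped together m at a time (m is usually equal to 4). Once the carry-in to the group is known, all the internal carries and the output carry are calculated simultaneously.
- We can use some algebraic manipulation to minimize hardware complexity.
- Consider the carry out of the group:
- $c_i = g_{i-1} + p_{i-1} \cdot c_{i-1}$
- Substituting the value of $c_{i-1}$, we can rewrite this as
- $c_i = g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot c_{i-2}$
- Proceeding in this manner we get
- $c_i = g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot g_{i-3} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot g_{i-4} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot c_{i-4}$
- To further simplify the equation, we note that $g_{i-1} = g_{i-1} \cdot p_{i-1}$ holds when the propagate signal is taken as the OR $x_{i-1} + y_{i-1}$ (the carry equation is unchanged by this choice), so $p_{i-1}$ can be factored out.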
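The expanded look-ahead equations for a 4-bit group can be compared against the rippled recurrence. In this sketch the function names are assumptions, and the group is indexed from bit 0 so that the group carry-in is `c_in` and the group carry-out is $c_4$.

```python
def lookahead_carries(g, p, c_in):
    """Carries c_1..c_4 of a 4-bit group, each written as a flat two-level expression."""
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c_in))
    return [c1, c2, c3, c4]


def ripple_carries(g, p, c_in):
    """The same carries computed one after another with c_{i+1} = g_i + p_i c_i."""
    carries = []
    c = c_in
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return carries


x, y = 0b1011, 0b0110
g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(4)]
p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
print(lookahead_carries(g, p, 0), ripple_carries(g, p, 0))
```

Both functions return the same carries whether `p` is built with XOR or with OR, which is the property used when the propagate term is factored out.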
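15Ling's Adder
- $c_i = g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot g_{i-3} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot g_{i-4} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot c_{i-4}$
- We replace $p_i = x_i \oplus y_i$ with $t_i = x_i + y_i$.
- Because $g_i = g_i \cdot t_i$, we have
- $c_i = g_{i-1} \cdot t_{i-1} + t_{i-1} \cdot g_{i-2} + t_{i-1} \cdot t_{i-2} \cdot g_{i-3} + t_{i-1} \cdot t_{i-2} \cdot t_{i-3} \cdot g_{i-4} + t_{i-1} \cdot t_{i-2} \cdot t_{i-3} \cdot t_{i-4} \cdot c_{i-4}$
- Let
- $h_i = g_{i-1} + g_{i-2} + t_{i-2} \cdot g_{i-3} + t_{i-2} \cdot t_{i-3} \cdot g_{i-4} + t_{i-2} \cdot t_{i-3} \cdot t_{i-4} \cdot t_{i-5} \cdot h_{i-4}$
- $c_i = h_i \cdot t_{i-1}$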
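16Ling's Adder
- $h_0 = c_0$
- $h_3 = g_2 + g_1 + t_1 \cdot g_0 + t_1 \cdot t_0 \cdot h_0$
- $s_3 = p_3 \oplus c_3 = p_3 \oplus (h_3 \cdot t_2)$
- $s_3 = \overline{p_3} \cdot (h_3 \cdot t_2) + p_3 \cdot \overline{(h_3 \cdot t_2)}$
- $s_3 = \overline{h_3} \cdot p_3 + h_3 \cdot (p_3 \oplus t_2)$
- $h_6 = g_5 + g_4 + t_4 \cdot g_3 + t_4 \cdot t_3 \cdot t_2 \cdot h_3$
- $s_6 = \overline{h_6} \cdot p_6 + h_6 \cdot (p_6 \oplus t_5)$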
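The Ling recurrences for $h_3$ and $h_6$ and the multiplexer form of the sum bits can be checked exhaustively against ordinary integer addition. This is a sketch for 7-bit operands; the function name and the packing of the results into a tuple are assumptions of the example.

```python
def ling_signals(x, y, c0):
    """Return (c3, c6, s3, s6) for 7-bit operands, computed through Ling's h signals."""
    xb = [(x >> i) & 1 for i in range(7)]
    yb = [(y >> i) & 1 for i in range(7)]
    g = [a & b for a, b in zip(xb, yb)]   # g_i = x_i AND y_i
    t = [a | b for a, b in zip(xb, yb)]   # t_i = x_i OR y_i
    p = [a ^ b for a, b in zip(xb, yb)]   # p_i = x_i XOR y_i

    h0 = c0
    h3 = g[2] | g[1] | (t[1] & g[0]) | (t[1] & t[0] & h0)
    h6 = g[5] | g[4] | (t[4] & g[3]) | (t[4] & t[3] & t[2] & h3)

    c3 = h3 & t[2]                        # c_i = h_i t_{i-1}
    c6 = h6 & t[5]

    # Sum bits in multiplexer form: h selects between p and p XOR t.
    s3 = ((1 - h3) & p[3]) | (h3 & (p[3] ^ t[2]))
    s6 = ((1 - h6) & p[6]) | (h6 & (p[6] ^ t[5]))
    return c3, c6, s3, s6


print(ling_signals(0b0101101, 0b0011011, 1))
```

The $h$ expressions have one fewer literal in their leading terms than the corresponding carry expressions, and the missing factor $t_{i-1}$ is applied only where the carry or sum is actually needed.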
17Generalized Design for Adders Prefix Adder
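- Prefix computation
- Given n inputs $x_1, x_2, x_3, \ldots, x_n$ and an associative operator $\circ$, we want to compute
- $y_i = x_i \circ x_{i-1} \circ x_{i-2} \circ \cdots \circ x_2 \circ x_1$ for all $i$, $1 \le i \le n$
- $x$ can be a scalar, vector or matrix.
- For the design of adders, we define the operator in the following manner:
- $(g, p) = (g', p') \circ (g'', p'')$
- $g = g' + p' \cdot g''$
- $p = p' \cdot p''$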
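The operator and its associativity can be checked directly. In this sketch the left operand is the more significant group, the carry-in is assumed to be 0, and the names `op` and `prefix_carries` are illustrative.

```python
from itertools import product


def op(left, right):
    """(g, p) = (g', p') o (g'', p''), with `left` the more significant group."""
    g1, p1 = left
    g2, p2 = right
    return (g1 | (p1 & g2), p1 & p2)


def prefix_carries(x_bits, y_bits):
    """Carries c_1..c_n (LSB-first inputs, c_0 = 0) from a running prefix."""
    carries = []
    acc = None
    for x, y in zip(x_bits, y_bits):
        gp = (x & y, x ^ y)
        acc = gp if acc is None else op(gp, acc)
        carries.append(acc[0])     # group generate of bits i..0 is c_{i+1}
    return carries


# Associativity holds for all 64 combinations of three (g, p) pairs.
pairs = list(product((0, 1), repeat=2))
print(all(op(op(a, b), c) == op(a, op(b, c)) for a in pairs for b in pairs for c in pairs))

# 11 + 6 with bits listed LSB first.
print(prefix_carries([1, 1, 0, 1], [0, 1, 1, 0]))
```

Because the operator is associative, the prefixes may be grouped in any order, which is what allows the tree-shaped networks discussed later.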
18Alternate modeling of Prefix Computer Finite
State Machine
- A finite state machine has a set of states, and it moves from one state to another according to its input. Mathematically,
- $s_k = f(s_{k-1}, a_{k-1})$
- The problem is to determine the final state $s_n$ in $O(\lg n)$ parallel steps, given the initial state $s_0$ and the sequence of inputs $(a_0, a_1, \ldots, a_{n-1})$.
- This problem can be formulated in terms of prefix computation.
19Alternate modeling of Prefix Computer Finite
State Machine
- We assume that the number of states is small and finite.
- Let $s_k = f_{a_{k-1}}(s_{k-1})$; $f_{a_{k-1}}$ can be represented by a matrix $M_{a_{k-1}}$.
- Now we are ready to represent our problem in terms of prefix computation.
20Alternate Modeling of Prefix Computer Finite
State Machine
- The algorithm
- Compute the $M_{a_i}$ in parallel
- Compute
- $N_1 = M_{a_1}$
- $N_2 = M_{a_2} \cdot M_{a_1}$
- $\ldots$
- $N_n = M_{a_n} \cdot M_{a_{n-1}} \cdots M_{a_1}$
- Compute $S_{i+1} = N_i(S_0)$
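The algorithm can be sketched with transition functions stored as tuples instead of matrices; composing two tuples plays the role of the matrix product. Everything specific here is an assumption for illustration: the three-state machine in `M`, the recursive-doubling scan used to obtain the prefixes in about $\lg n$ rounds, and the 0-based indexing of the results.

```python
def compose(later, earlier):
    """Transition map that applies `earlier` first and then `later`."""
    return tuple(later[s] for s in earlier)


def prefix_scan(items, op):
    """Inclusive scan by recursive doubling.

    result[i] = items[i] op items[i-1] op ... op items[0], obtained in
    ceil(lg n) rounds; all updates within a round are independent.
    """
    result = list(items)
    step = 1
    while step < len(result):
        result = [op(result[i], result[i - step]) if i >= step else result[i]
                  for i in range(len(result))]
        step *= 2
    return result


# Hypothetical machine with states A=0, B=1, C=2: M[a][s] is the next state.
M = {0: (0, 2, 0), 1: (1, 1, 2)}
inputs = [0, 0, 1, 1, 1, 0, 1, 0, 1]

N = prefix_scan([M[a] for a in inputs], compose)
states = [n[0] for n in N]        # apply every prefix to the initial state A
print(states)
```

The scan only relies on `compose` being associative, so the same routine works when the transition functions are represented as Boolean matrices.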
21Prefix Computation
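- FSM example
- Given
- initial state $S_0 = A$
- a sequence of inputs (0 0 1 1 1 0 1 0 1)
- Derive the sequence of outputs
Compute the $N_i$ for the input sequence 0 0 1 1:
- $N_1 = M_0$
- $N_2 = M_0 \cdot M_0$
- $N_3 = M_1 \cdot M_0 \cdot M_0$
- $N_4 = M_1 \cdot M_1 \cdot M_0 \cdot M_0$
[Figure: state table]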
22Graph Based Approach
- Consider the (g p) chain
- break the long paths
[Figure: serial chain of the (g1, p1), (g2, p2) and (g3, p3) stages between C1 and C4.]
23Graph Based Approach
[Figure: the chain between C1 and C4, with (g3, p3) and (g2, p2) merged into the group signals (g32, p32).]
24Graph Based Approach
[Figure: the chain up to C4, with (g1, p1) and cin merged into the group signals (g10, p10).]
25Graph Based Approach
[Figure: (g32, p32) and (g10, p10) merged into the group signals (g30, p30).]
26Boolean Approach
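- $g_4 + p_4 (g_3 + p_3 (g_2 + p_2 (g_1 + p_1 (g_0 + p_0 c_{in}))))$
- $(g_4, p_4) \circ (g_3, p_3) \circ (g_2, p_2) \circ (g_1, p_1) \circ (g_0, p_0 c_{in})$
- $(g_4 + p_4 g_3,\ p_4 p_3) \circ (g_2 + p_2 g_1,\ p_2 p_1) \circ (g_0, p_0 c_{in})$
- $(g_4 + p_4 g_3 + p_4 p_3 (g_2 + p_2 g_1),\ p_4 p_3 p_2 p_1) \circ (g_0, p_0 c_{in})$
- $(g_4 + p_4 g_3 + p_4 p_3 (g_2 + p_2 g_1) + (p_4 p_3 p_2 p_1) g_0,\ (p_4 p_3 p_2 p_1) p_0 c_{in})$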
27Prefix Adder
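- Given
- n inputs $(g_i, p_i)$
- An operation $\circ$
- Compute
- $y_i = (g_i, p_i) \circ \cdots \circ (g_1, p_1)$ $(1 \le i \le n)$
- Associativity
- $(A \circ B) \circ C = A \circ (B \circ C)$
- $g_i = a$ if $i = 1$, and $g_i = a_i b_i$ otherwise
- $p_i = 1$ if $i = 1$, and $p_i = a_i \oplus b_i$ otherwise
- $(g', p') \circ (g'', p'') = (g, p)$
- $g = g' + p' g''$
- $p = p' p''$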
28Prefix Adder Graph Representation
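[Figure: an input node computes (gi, pi) from ai and bi; an operator node takes inputs x and y and outputs x o y on both of its outgoing edges.]
- Example
- Ripple Carry Adder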
29Prefix Adders Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1.]
30Prefix Adders Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1.]
- Alphabetical tree
- Binary tree
- Edges do not cross
- For output $y_i$, there is an alphabetical tree covering inputs $(x_i, x_{i-1}, \ldots, x_1)$
31Prefix Adders Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1.]
- The nodes in this tree can be reduced to
- $(g, p) \circ c = g + p \cdot c$
- From input $x_1$, there is a tree covering all outputs $(y_i, y_{i-1}, \ldots, y_1)$
32Prefix Adders size and depth
- Objective
- Minimize the number of nodes, $s_c(n)$
- Minimize the depth, $d_c(n)$
- Ripple Carry Adder
- $s_c(8) = 7$
- $d_c(8) = 7$
- total = 14
- Conditional Sum Adder
- $s_c(8) = 12$
- $d_c(8) = 3$
- total = 15
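The node counts and depths above can be reproduced by building the two prefix graphs explicitly. The representation is an assumption of this sketch: a network is a list of levels, and each node `(i, j)` replaces the value in column i with (column i) o (column j). The divide-and-conquer network built by `sklansky` is used here as a stand-in for the conditional-sum structure, since it gives the same counts as the slide.

```python
def ripple(n):
    """Serial prefix network: column i is combined with column i-1, one node per level."""
    return [[(i, i - 1)] for i in range(1, n)]


def sklansky(n):
    """Divide-and-conquer prefix network for n a power of two."""
    levels = []
    k = 0
    while (1 << k) < n:
        levels.append([(i, ((i >> k) << k) - 1) for i in range(n) if (i >> k) & 1])
        k += 1
    return levels


def size_and_depth(levels, n):
    """Count operator nodes and the longest input-to-output path.

    Also checks that the network is a valid prefix network: column i
    must end up covering inputs i down to 0.
    """
    lo = list(range(n))          # column i currently covers inputs i..lo[i]
    depth = [0] * n
    size = 0
    for level in levels:
        old_lo, old_depth = lo[:], depth[:]
        for i, j in level:
            assert old_lo[i] == j + 1, "operands must cover adjacent ranges"
            lo[i] = old_lo[j]
            depth[i] = max(old_depth[i], old_depth[j]) + 1
            size += 1
    assert all(v == 0 for v in lo), "every column must cover inputs i..0"
    return size, max(depth)


print("ripple:", size_and_depth(ripple(8), 8))
print("divide-and-conquer:", size_and_depth(sklansky(8), 8))
```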
33Prefix Adder Well-known and Well-developed?
- Classic prefix networks Sklansky, Kogge-Stone,
Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles
etc.
34Prefix Adders Brent Kung Adder
[Figure: Brent-Kung prefix graph over columns 15 down to 0.]
- $s_c(16) = 26$
- $d_c(16) = 6$
- total = 32
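The Brent-Kung counts can be reproduced in the same way. This sketch assumes n is a power of two, represents a node `(i, j)` as "column i becomes (column i) o (column j)", and measures depth as the longest chain of operator nodes on any input-to-output path; the function names are illustrative.

```python
def brent_kung(n):
    """Brent-Kung prefix network: a binary up-sweep followed by a down-sweep."""
    levels = []
    k = 0
    while (1 << (k + 1)) <= n:                 # up-sweep: build power-of-two groups
        step = 1 << (k + 1)
        levels.append([(i, i - (1 << k)) for i in range(step - 1, n, step)])
        k += 1
    for m in range(k - 2, -1, -1):             # down-sweep: fill in the remaining columns
        step = 1 << (m + 1)
        start = step + (1 << m) - 1
        levels.append([(i, i - (1 << m)) for i in range(start, n, step)])
    return levels


def size_and_depth(levels, n):
    """Count operator nodes and the longest path; verify all prefixes are produced."""
    lo = list(range(n))          # column i currently covers inputs i..lo[i]
    depth = [0] * n
    size = 0
    for level in levels:
        old_lo, old_depth = lo[:], depth[:]
        for i, j in level:
            assert old_lo[i] == j + 1, "operands must cover adjacent ranges"
            lo[i] = old_lo[j]
            depth[i] = max(old_depth[i], old_depth[j]) + 1
            size += 1
    assert all(v == 0 for v in lo), "every column must cover inputs i..0"
    return size, max(depth)


print(size_and_depth(brent_kung(16), 16))
```

Compared with the divide-and-conquer network, Brent-Kung trades extra depth for fewer nodes and a fan-out of at most two per node.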
35Prefix Adder New Respects, New Method
- Realistic design considerations: timing, power and area.
- Integer Linear Programming for prefix adders
- Logical effort timing model (gate cap. + wire cap.)
- Activity-statistic power model
- Non-uniform signal arrival/required times
[Figure labels: Logic Levels, Timing, Power, Area, Max Fanouts, Max Wire Tracks.]
36Prefix Adder Optimum Prefix adders
- Uniform signal arrival/required times
[Figure panels: Sklansky adder; Kogge-Stone adder; fastest depth-3 optimal prefix adder; fastest depth-4 optimal prefix adder.]
37Prefix Adder Optimum Prefix adders
- Uniform signal arrival/required times
38The Big Picture
What is the minimum depth of zero-deficiency
circuits for a given width?
39Proof for Snir's Theorem
Given an arbitrary prefix graph of width n, we have depth + size $\ge 2n - 2$.
- Proof
- Consider the alphabetical tree rooted at the MSB output with all the input nodes being its leaves.
- The size of this tree is $n - 1$ while its depth is $d_M$.
- At most $d_M$ prefix outputs can be generated from this tree.
- At least one extra node is needed for each of the columns where the prefix results are not ready. Consequently
- size $\ge (n - 1) + (n - (d_M + 1)) = 2n - 2 - d_M$
- which, since depth $\ge d_M$, gives
- size + depth $\ge 2n - 2$
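The bound can be checked against the size and depth figures quoted on the earlier slides. The sketch below defines the deficiency of a circuit as the amount by which size + depth exceeds $2n - 2$, so a zero-deficiency circuit meets the bound with equality; the function name and the dictionary of examples are assumptions of this illustration.

```python
def deficiency(n, size, depth):
    """size + depth - (2n - 2); Snir's theorem says this is never negative."""
    return size + depth - (2 * n - 2)


# (width, size, depth) figures taken from the earlier slides.
examples = {
    "ripple carry, n=8": (8, 7, 7),
    "conditional sum, n=8": (8, 12, 3),
    "Brent-Kung, n=16": (16, 26, 6),
}

for name, (n, size, depth) in examples.items():
    print(name, "deficiency =", deficiency(n, size, depth))
```

Among these three, only the ripple-carry structure meets the bound exactly, but it does so at the maximum possible depth; the later slides ask how small the depth can be while the deficiency stays zero.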
40Definitions
- For a prefix circuit, define
- Backbone
- the binary alphabetical tree generating the MSB prefix output
- Affiliated tree
- rooted at the LSB input, with all the prefix outputs (except the MSB output) as its tree nodes
- Ridge
- the path from the LSB input to the MSB output
41How to ?
- Look from the MSB output
- Since the circuit is of zero-deficiency, the ridge has exactly d nodes (excluding the first input node), one node per level.
- The idea: try to stretch the ridge as long as possible while maintaining zero-deficiency.
42T-tree
43T-tree example T3(5)
44A-tree
45A-tree example A3(5)
46Compound of A tree and T-tree
47Example
48Proposed Prefix Circuit
49An Example: Z(d), d = 8
Width = 88
50The width of Z(d) Circuit
- The width of the Z(d) circuit is
- $N_Z(d) = F(d+3) - 1 \quad (d \ge 1)$
- where $F(i)$ are the Fibonacci numbers
- Numerical Comparison
LYD: design by S. Lakshmivarahan, C.M. Yang and S.K. Dhall, 1987
LS: design by Lin and Shih, 1999
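The width formula is easy to tabulate. The sketch assumes the usual convention $F(1) = F(2) = 1$, which reproduces the width of 88 quoted for d = 8; the function names are illustrative.

```python
def fib(i):
    """Fibonacci number F(i) with F(1) = F(2) = 1."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a


def z_width(d):
    """Width N_Z(d) = F(d + 3) - 1 of the Z(d) circuit, for d >= 1."""
    return fib(d + 3) - 1


def min_depth_for_width(n):
    """Smallest d whose Z(d) circuit is at least n bits wide."""
    d = 1
    while z_width(d) < n:
        d += 1
    return d


for d in range(1, 10):
    print(d, z_width(d))
print("depth needed for 64 bits:", min_depth_for_width(64))
```

Because the width grows like the Fibonacci numbers, the depth needed for a given width grows only logarithmically.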
51Comparison
- 64-bit case
- Based on the logical effort method, to include fan-out effect and interconnect capacitance
- Five adders
- Z64: a 64-bit Z(d) circuit derived from Z(d), d = 8
- BK: Brent-Kung adder
- Sklansky
- KS: Kogge-Stone adder
- HC: Han-Carlson adder
52Results
- w is the weight for lateral interconnect capacitance; KS and HC have a large w value to compensate for the coupling effect.
- The Z64 and BK adders have similar delay and area, but Z64 could be more power efficient because it has fewer logic levels.
53Carry Skip Adder
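[Figure: 12-bit carry-skip adder built from three 4-bit adder blocks A0, A1 and A2 with operand inputs (a3,0, b3,0), (a7,4, b7,4) and (a11,8, b11,8). Each block is followed by a 2-to-1 MUX controlled by the block propagate signal p3,0, p7,4 or p11,8; the MUX outputs are the carries c4, c8 and c12, and x labels the output of the first MUX.]
- If $p_{3,0} = p_3 p_2 p_1 p_0 = 1$, then $x = c_{in}$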
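The behaviour of the skip multiplexers can be sketched at the bit level. The 12-bit width and 4-bit blocks follow the figure; the function name, its parameters, and the software loop standing in for the hardware blocks are assumptions of this example.

```python
def carry_skip_add(a, b, c_in=0, n=12, block=4):
    """Add two n-bit numbers with a carry-skip structure.

    Returns (sum, carry_out). Each block ripples internally; a MUX then
    forwards the block's carry-in when every bit of the block propagates.
    """
    result = 0
    carry = c_in
    for base in range(0, n, block):
        block_in = carry
        c = block_in
        group_p = 1
        for i in range(base, base + block):
            x = (a >> i) & 1
            y = (b >> i) & 1
            p = x ^ y
            result |= (p ^ c) << i       # sum bit
            c = (x & y) | (p & c)        # ripple inside the block
            group_p &= p                 # block propagate, e.g. p3,0 = p3 p2 p1 p0
        # MUX: select the skipped carry when the whole block propagates.
        carry = block_in if group_p else c
    return result, carry


print(carry_skip_add(0b000000001111, 0b000000000000, c_in=1))
print(carry_skip_add(1234, 2345))
```

The MUX never changes the logical value of the carry; when the block propagates, the rippled carry equals the block carry-in anyway. Its purpose is to let that carry bypass the block instead of rippling through it.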
54Carry Propagation Paths
- A2 <- MUX <- MUX <- cin
- A2 <- MUX <- A1
- A2 <- MUX <- MUX <- A0
- c12 <- MUX <- A2
- c12 <- MUX <- MUX <- A1
- c12 <- MUX <- MUX <- MUX <- A0
- c12 <- MUX <- MUX <- MUX <- MUX <- cin
55False Path
- A1 <- MUX <- A0 <- cin is a false path
- If the carry is from cin, then the block must have $p_3 p_2 p_1 p_0 = 1$
- Since $p_{3,0} = 1$, $g_{3,0}$ must be 0
- The carry is not generated from A0
- The carry need not propagate via A0; it will go through the MUX
56Label Algorithm
- Problem
- Given a digraph and a set of false paths
- Derive the longest path of the graph that does not contain a false path
- Algorithm
- Color the edges on each false path with a label
- The lengths of the walk over consecutive edges with the same label are accumulated
- Otherwise, change to no label
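One possible reading of the label idea is a longest-path search in which each partial walk carries a label counting how many consecutive edges of the false path it has just traversed; the label is reset when the walk leaves the false path, and a walk is discarded if it would complete the false path. The sketch below is an interpretation under stated assumptions, not the slide's exact procedure: it handles a single false path whose nodes are distinct, it assumes an acyclic graph, and the graph, delays, and names are hypothetical.

```python
from functools import lru_cache


def longest_true_path(edges, source, sink, false_path):
    """Longest source-to-sink delay over walks that do not contain false_path.

    edges maps a node to a list of (next_node, delay) pairs.
    """
    m = len(false_path) - 1               # number of edges on the false path

    def advance(label, u, v):
        """New label after walking edge u -> v with `label` matched edges so far."""
        if label < m and false_path[label] == u and false_path[label + 1] == v:
            return label + 1              # same label: accumulate
        if false_path[0] == u and false_path[1] == v:
            return 1                      # a fresh match starts here
        return 0                          # otherwise: no label

    @lru_cache(maxsize=None)
    def best(u, label):
        if u == sink:
            return 0
        result = float("-inf")
        for v, delay in edges.get(u, []):
            nxt = advance(label, u, v)
            if nxt == m:                  # this edge would complete the false path
                continue
            result = max(result, delay + best(v, nxt))
        return result

    return best(source, 0)


# Hypothetical two-block carry-skip graph; delays are illustrative only.
edges = {
    "cin": [("A0", 4), ("MUX0", 1)],
    "A0": [("MUX0", 1)],
    "MUX0": [("A1", 4), ("MUX1", 1)],
    "A1": [("MUX1", 1)],
}
false_path = ("cin", "A0", "MUX0", "A1")

print(longest_true_path(edges, "cin", "MUX1", false_path))
print(longest_true_path(edges, "cin", "MUX1", ("none", "none2")))  # no false path applies
```

With the false path excluded, the longest delay from cin drops from the topological longest path to the length of the longest walk that either skips A0 or skips A1.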