CSE 246: Computer Arithmetic Algorithms and Hardware Design

Slides: 57

Provided by: haiku

Learn more at: https://cseweb.ucsd.edu

1
CSE 246: Computer Arithmetic Algorithms and
Hardware Design
Lecture 4: Adders

Instructor: Prof. Chung-Kuan Cheng

2
Topics

Adders
AND/OR gate vs. Circuit
Logic Design
Graph Design (Prefix Adder)

3
Chapter 2 ADDERS

Half Adders
Half adders add two 1-bit binary numbers when
there is no carry-in.
If the inputs are $x_i$ and $y_i$, the sum and
carry-out are given by
$s_i = x_i \oplus y_i$
$c_{i+1} = x_i \cdot y_i$
We use the following notation throughout the
slides:
$\cdot$ means logical AND
$+$ means logical OR
$\oplus$ means logical XOR
$'$ means complementation

4
Full Adder

The inputs are $x_i$, $y_i$ (operand bits) and $c_i$
(carry-in).
The outputs are $s_i$ (result bit) and $c_{i+1}$
(carry-out).
Inputs and outputs are related by
$s_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = x_i \cdot y_i + c_i \cdot (x_i + y_i)$
$c_{i+1} = x_i \cdot y_i + c_i \cdot (x_i \oplus y_i)$
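The half adder and full adder relations can be checked exhaustively over their truth tables. The sketch below is only an illustration: the function names are made up for this example, and bits are modelled as Python integers 0 and 1. It also confirms that the OR form and the XOR form of the carry-out agree on every input.

```python
from itertools import product

def half_adder(x, y):
    """Return (sum, carry_out) for two 1-bit inputs with no carry-in."""
    return x ^ y, x & y

def full_adder(x, y, c):
    """Return (sum, carry_out) for operand bits x, y and carry-in c."""
    s = x ^ y ^ c
    c_out = (x & y) | (c & (x | y))
    return s, c_out

def carry_xor_form(x, y, c):
    """Carry-out written with XOR instead of OR in the second term."""
    return (x & y) | (c & (x ^ y))

# Check every row of the truth table.
for x, y, c in product((0, 1), repeat=3):
    s, c_out = full_adder(x, y, c)
    assert 2 * c_out + s == x + y + c          # the pair encodes the arithmetic sum
    assert c_out == carry_xor_form(x, y, c)    # both carry expressions agree
```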

5
Full Adder

If the carry-in bit is zero, the full adder becomes
a half adder.
If the carry-in bit is one, then
$s_i = (x_i \oplus y_i)'$
$c_{i+1} = x_i + y_i$
To add two n-bit numbers, we can chain n full
adders to build a ripple carry adder.

6
Ripple Carry Adder
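[Figure: n-bit ripple carry adder. Stage i takes $x_i$, $y_i$ and $c_i$ and produces $s_i$; the carry-in is $c_{in} = c_0$, the internal carries are $c_1, c_2, \ldots, c_{n-1}$, and the last stage produces $c_{out}$.]
Overflow happens when the operands have the same sign
and the result has a different sign. If we use
2's complement to represent negative numbers,
overflow occurs when $c_{out} \oplus c_{n-1}$ is 1.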
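A bit-level model makes the overflow rule concrete. This is a minimal sketch, assuming operands are given as unsigned n-bit integers that are interpreted as 2's complement values; the function name and return layout are chosen for illustration only.

```python
def ripple_carry_add(x, y, n, c_in=0):
    """Add two n-bit words one full adder at a time.

    Returns (sum, c_out, overflow), where overflow is the XOR of the
    carry out of the most significant stage and the carry into it.
    """
    carry = c_in
    total = 0
    carry_into_msb = 0
    for i in range(n):
        xi = (x >> i) & 1
        yi = (y >> i) & 1
        if i == n - 1:
            carry_into_msb = carry                 # this is c_{n-1}
        total |= (xi ^ yi ^ carry) << i            # sum bit s_i
        carry = (xi & yi) | (carry & (xi | yi))    # carry c_{i+1}
    overflow = carry ^ carry_into_msb
    return total, carry, overflow
```

For example, with n = 4 the call `ripple_carry_add(0b0111, 0b0001, 4)` adds 7 and 1, which does not fit in the signed range -8..7, so the overflow flag is set even though the carry-out is 0.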
7
Ripple Carry Adder

For the sake of brevity, we use the following
notation:
$g_i = x_i \cdot y_i$
$p_i = x_i \oplus y_i$
In terms of this notation, we can rewrite the carry
equations as
$c_1 = g_0 + p_0 \cdot c_0$
$c_2 = g_1 + p_1 \cdot c_1$
and so on.
We shall use this notation later when
discussing the design of other kinds of adders.
It has been observed that the expected length of a
carry chain is 2, while the expected maximal length
of a carry chain is lg n. Hence, ripple carry
adders are fast in the average case, even though
the worst case must wait for all n stages.
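The carry recurrence and the notion of a carry chain can be explored with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, and the longest run of consecutive positions with $p_i = 1$ is used as a simple proxy for the longest carry chain, which may differ from the exact definition behind the figures quoted above.

```python
def carries(x, y, n, c0=0):
    """Return [c_0, ..., c_n] using c_{i+1} = g_i + p_i.c_i."""
    c = [c0]
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g, p = xi & yi, xi ^ yi
        c.append(g | (p & c[i]))
    return c

def longest_propagate_run(x, y, n):
    """Length of the longest run of consecutive positions with p_i = 1."""
    best = run = 0
    for i in range(n):
        p = ((x >> i) ^ (y >> i)) & 1
        run = run + 1 if p else 0
        best = max(best, run)
    return best

def average_longest_run(n):
    """Exhaustive average of the longest propagate run over all n-bit pairs."""
    total = sum(longest_propagate_run(x, y, n)
                for x in range(1 << n) for y in range(1 << n))
    return total / (1 << (2 * n))
```

Calling `average_longest_run(n)` for a few small values of n gives a feel for how slowly the typical chain length grows compared with n itself.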

8
Ripple Carry Adder

How do we know that an adder has completed the
operation?
Worst-case scenario: wait for the longest chain
in the carry propagation network.
We might inspect $c_{i+1}$ and its complement $b_{i+1}$
to determine the status of the adder.

9
Improvement to Ripple Carry Adder: Manchester
Adders

By intelligently using our device properties, we
can reduce the complexity of the circuit used to
compute carries in a ripple carry adder.
Define $a_i = x_i' \cdot y_i'$
Next we observe that $c_{i+1}$ is 1 in exactly these
scenarios:
$g_i$ is 1, i.e. both $x_i$ and $y_i$ are 1
$c_i$ is 1 and it is propagated because $p_i$ is 1
$c_{i+1}$ is pulled down to logic 0, irrespective
of the value of $c_i$, when $a_i$ is 1, i.e. both
$x_i$ and $y_i$ are 0.
From these conditions, and keeping in mind the
general characteristics of transistor devices, we
can design simplified circuits for computing
carries, as shown in the next slide.
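The three conditions can be read as a three-way switch: generate drives the carry node to 1, annihilate pulls it to 0, and propagate passes the incoming carry. The sketch below models only that logical behaviour, not the transistor circuit, and assumes $a_i$ is 1 exactly when both operand bits are 0. The function name is illustrative.

```python
from itertools import product

def manchester_carry(x_bit, y_bit, c_in):
    """Carry-out of one stage, modelled as an idealised three-way switch."""
    a = (1 - x_bit) & (1 - y_bit)   # a_i: both inputs 0, carry node pulled to 0
    g = x_bit & y_bit               # g_i: both inputs 1, carry node driven to 1
    p = x_bit ^ y_bit               # p_i: incoming carry is passed through
    assert a + g + p == 1           # exactly one of the three conditions holds
    if g:
        return 1
    if a:
        return 0
    return c_in

# The switch model agrees with the full adder carry on every input.
for x_bit, y_bit, c_in in product((0, 1), repeat=3):
    full_adder_carry = (x_bit & y_bit) | (c_in & (x_bit | y_bit))
    assert manchester_carry(x_bit, y_bit, c_in) == full_adder_carry
```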

10
Improvement to Ripple Carry Adder: Manchester
Adders
11
Implementation of Manchester Adder using MOS
transistors
This is essentially the same circuit for
computing carry, but implemented with MOS devices
12
Manchester Adder: Alternate design

We divide the computation cycle into two distinct
half-cycles: precharge and evaluate. In the
precharge half-cycle, $g_i$ and $c_{i+1}$ are
assigned a tentative value of logic 1. This is
evaluated in the next half-cycle with the actual
value of $a_i$.
The actual circuit for computing carries is shown
in the next slide.

13
Manchester Adder: Alternate design
[Figure: timing diagram of signal Q versus time, showing the precharge and evaluation half-cycles]
14
Carry Look-ahead Adder

In a carry look-ahead adder, m full adders are grouped
together (m is usually equal to 4). Once the
carry-in to the group is known, all the internal
carries and the output carry are calculated
simultaneously.
We can use some algebraic manipulations to
minimize hardware complexity.
Consider the carry-out of the group:
$c_i = g_{i-1} + p_{i-1} \cdot c_{i-1}$
Substituting the value of $c_{i-1}$, we can rewrite this as
$c_i = g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot c_{i-2}$
Proceeding in this manner we get
$c_i = g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot g_{i-3} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot g_{i-4} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot c_{i-4}$
To further simplify the equation, we note that when
the propagate signal is taken as the OR $x_{i-1} + y_{i-1}$,
$g_{i-1} = g_{i-1} \cdot p_{i-1}$, and $p_{i-1}$ can be
factored out.
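The expanded carry equation can be evaluated directly, one sum-of-products per carry, without waiting for the previous carry. The following sketch does this for a group of m bits and compares the result against ordinary addition. The helper names are assumptions made for this example, and the loops stand in for what hardware would compute in parallel.

```python
def gp_bits(x, y, m):
    """Return the lists [g_0..g_{m-1}] and [p_0..p_{m-1}] for m-bit operands."""
    g = [(x >> i) & (y >> i) & 1 for i in range(m)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(m)]
    return g, p

def lookahead_carries(g, p, c0):
    """Return [c_1, ..., c_m], each from its fully expanded sum of products."""
    m = len(g)
    out = []
    for k in range(1, m + 1):
        c = 0
        for j in range(k):               # term p_{k-1} ... p_{j+1} g_j
            term = g[j]
            for t in range(j + 1, k):
                term &= p[t]
            c |= term
        term = c0                        # term p_{k-1} ... p_0 c_0
        for t in range(k):
            term &= p[t]
        out.append(c | term)
    return out
```

For a 4-bit group, `lookahead_carries(*gp_bits(x, y, 4), c0)` returns the three internal carries followed by the group carry-out.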

15
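Ling's Adder

$c_i = g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot g_{i-3} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot g_{i-4} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot c_{i-4}$
We replace $p_i = x_i \oplus y_i$ with $t_i = x_i + y_i$.
Because $g_i = g_i \cdot t_i$, we have
$c_i = g_{i-1} \cdot t_{i-1} + t_{i-1} \cdot g_{i-2} + t_{i-1} \cdot t_{i-2} \cdot g_{i-3} + t_{i-1} \cdot t_{i-2} \cdot t_{i-3} \cdot g_{i-4} + t_{i-1} \cdot t_{i-2} \cdot t_{i-3} \cdot t_{i-4} \cdot c_{i-4}$
Let
$h_i = g_{i-1} + g_{i-2} + t_{i-2} \cdot g_{i-3} + t_{i-2} \cdot t_{i-3} \cdot g_{i-4} + t_{i-2} \cdot t_{i-3} \cdot t_{i-4} \cdot t_{i-5} \cdot h_{i-4}$
$c_i = h_i \cdot t_{i-1}$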
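The factoring behind Ling's adder can be verified numerically for one 4-bit group. This sketch assumes the boundary conditions $h_0 = c_0$ and $t_{-1} = 1$, so that the last term of $h_4$ reduces to $t_2 \cdot t_1 \cdot t_0 \cdot c_0$. The function names are illustrative.

```python
from itertools import product

def ling_h4(g, t, c0):
    """Ling pseudo-carry h_4 of a 4-bit group, assuming h_0 = c_0 and t_{-1} = 1."""
    return (g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
            | (t[2] & t[1] & t[0] & c0))

def ling_c4(x, y, c0):
    """Group carry-out recovered from the pseudo-carry as c_4 = h_4 . t_3."""
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]
    t = [((x >> i) | (y >> i)) & 1 for i in range(4)]
    return ling_h4(g, t, c0) & t[3]

# c_4 = h_4 . t_3 matches the true carry-out for every operand pair and carry-in.
for x, y, c0 in product(range(16), range(16), (0, 1)):
    assert ling_c4(x, y, c0) == (x + y + c0) >> 4
```

The point of the transformation is that $h_4$ has one fewer literal per product term than $c_4$, while the missing factor $t_3$ is applied once at the end.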

16
Ling's Adder

$h_0 = c_0$
$h_3 = g_2 + g_1 + t_1 \cdot g_0 + t_1 \cdot t_0 \cdot h_0$
$s_3 = p_3 \oplus c_3 = p_3 \oplus (h_3 \cdot t_2)$
t3h3t2t3(h3t2)
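$s_3 = h_3' \cdot p_3 + h_3 \cdot (p_3 \oplus t_2)$
$h_6 = g_5 + g_4 + t_4 \cdot g_3 + t_4 \cdot t_3 \cdot t_2 \cdot h_3$
$s_6 = h_6' \cdot p_6 + h_6 \cdot (p_6 \oplus t_5)$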

17
Generalized Design for Adders: Prefix Adder

Prefix computation
Given n inputs $x_1, x_2, x_3, \ldots, x_n$ and an associative
operator $\circ$, we want to compute
$y_i = x_i \circ x_{i-1} \circ x_{i-2} \circ \cdots \circ x_2 \circ x_1$ for all i, $1 \le i \le n$
x can be a scalar/vector/matrix
For the design of adders, we define the operator in
the following manner:
$(g, p) \circ (g', p') = (g'', p'')$
$g'' = g + p \cdot g'$
$p'' = p \cdot p'$
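The operator on (g, p) pairs is small enough to check for associativity by brute force, which is the property that lets the prefix computation be regrouped freely. The sketch below does that and then uses a plain serial scan to recover the carries. The names `op` and `prefix_scan` are chosen for this example only.

```python
from itertools import product

def op(left, right):
    """(g, p) o (g', p') = (g + p.g', p.p'); `left` is the more significant pair."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)

# Associativity holds for all 4 * 4 * 4 combinations of bit pairs.
pairs = list(product((0, 1), repeat=2))
for a, b, c in product(pairs, repeat=3):
    assert op(op(a, b), c) == op(a, op(b, c))

def prefix_scan(gp):
    """Serial prefix: y[i] = gp[i] o gp[i-1] o ... o gp[0] (index 0 is the LSB)."""
    y = [gp[0]]
    for pair in gp[1:]:
        y.append(op(pair, y[-1]))
    return y
```

With a carry-in of 0, the g component of `prefix_scan(gp)[i]` is the carry out of bit position i.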

18
Alternate modeling of Prefix Computer Finite
State Machine

A finite state machine has a set of states, and
it moves from one state to another according to
its input. Mathematically,
$s_k = f(s_{k-1}, a_{k-1})$
The problem is to determine the final state $s_n$ in
O(lg n) parallel steps, given the initial state $s_0$ and
the sequence of inputs $(a_0, a_1, \ldots, a_{n-1})$.
This problem can be formulated in terms of prefix
computation.

19
Alternate modeling of Prefix Computer Finite
State Machine

We assume that the number of states is small and
finite.
Let $s_k = f_{a_{k-1}}(s_{k-1})$; $f_{a_{k-1}}$ can be represented by a
matrix $M_{a_{k-1}}$.
Now we are ready to represent our problem in
terms of prefix computation.

20
Alternate Modeling of Prefix Computer Finite
State Machine

The algorithm
Compute $M_{a_i}$ in parallel
Compute
$N_1 = M_{a_1}$
$N_2 = M_{a_2} \cdot M_{a_1}$
$\ldots$
$N_n = M_{a_n} \cdot M_{a_{n-1}} \cdots M_{a_1}$
Compute $S_{i+1} = N_i(S_0)$
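The algorithm can be sketched with transition maps stored as tuples instead of matrices: entry s of a tuple is the next state from state s, and composing two tuples plays the role of the matrix product. The machine below is hypothetical (two states, A = 0 and B = 1, where input 0 holds the state and input 1 toggles it) and is not the state table from the lecture. The scan uses recursive doubling, so it needs about lg n rounds, each of which could run in parallel.

```python
def compose(later, earlier):
    """Transition map for applying `earlier` first and `later` second."""
    return tuple(later[s] for s in earlier)

def prefix_compose(maps):
    """Inclusive prefix composition by recursive doubling (about lg n rounds)."""
    n = len(maps)
    result = list(maps)
    step = 1
    while step < n:
        # Every position combines with the partial result `step` places earlier.
        result = [compose(result[i], result[i - step]) if i >= step else result[i]
                  for i in range(n)]
        step *= 2
    return result

# Hypothetical machine: input 0 holds the state, input 1 toggles it.
TOGGLE_TABLE = {0: (0, 1), 1: (1, 0)}

def run_fsm(inputs, table=TOGGLE_TABLE, s0=0):
    """Return the state reached after each input, computed from the prefix maps."""
    prefix_maps = prefix_compose([table[a] for a in inputs])
    return [n_map[s0] for n_map in prefix_maps]
```

Because function composition is associative but not commutative in general, `compose` keeps the later input on the left, matching the order $M_{a_2} \cdot M_{a_1}$ in the slide.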

21
Prefix Computation

FSM example
Given
initial state $S_0 = A$
a sequence of inputs (0 0 1 1 1 0 1 0 1)
Derive the sequence of outputs

Compute the $N_i$: $N_1 = M_0$, $N_2 = M_0 \cdot M_0$, $N_3 = M_1 \cdot M_0 \cdot M_0$, $N_4 = M_1 \cdot M_1 \cdot M_0 \cdot M_0$
Input sequence: 0 0 1 1
State table
22
Graph Based Approach

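Consider the (g, p) chain:
break the long paths.

[Figure: carry chain through the stages $(g_3, p_3)$, $(g_2, p_2)$, $(g_1, p_1)$, with end points labelled $C_4$ and $C_1$]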
23
Graph Based Approach

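Generating $g_{32}$ and $p_{32}$

[Figure: the chain from $C_1$ to $C_4$ through $(g_3, p_3)$, $(g_2, p_2)$, $(g_1, p_1)$, with the pairs $(g_3, p_3)$ and $(g_2, p_2)$ combined into $(g_{32}, p_{32})$]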
24
Graph Based Approach

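Generating $g_{10}$ and $p_{10}$

[Figure: the chain to $C_4$ through $(g_3, p_3)$, $(g_2, p_2)$, $(g_1, p_1)$, with the pair $(g_1, p_1)$ and the carry-in $c_{in}$ combined into $(g_{10}, p_{10})$]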
25
Graph Based Approach

Generating $g_{30}$ and $p_{30}$

[Figure: the pairs $(g_{32}, p_{32})$ and $(g_{10}, p_{10})$ combined into $(g_{30}, p_{30})$]
26
Boolean Approach

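$g_4 + p_4 (g_3 + p_3 (g_2 + p_2 (g_1 + p_1 (g_0 + p_0 c_{in}))))$
$= (g_4, p_4) \circ (g_3, p_3) \circ (g_2, p_2) \circ (g_1, p_1) \circ (g_0, p_0) \circ c_{in}$
$= (g_4 + p_4 g_3, \; p_4 p_3) \circ (g_2 + p_2 g_1, \; p_2 p_1) \circ (g_0, p_0) \circ c_{in}$
$= (g_4 + p_4 g_3 + p_4 p_3 (g_2 + p_2 g_1), \; p_4 p_3 p_2 p_1) \circ (g_0, p_0) \circ c_{in}$
$= (g_4 + p_4 g_3 + p_4 p_3 (g_2 + p_2 g_1) + (p_4 p_3 p_2 p_1) g_0, \; (p_4 p_3 p_2 p_1) p_0) \circ c_{in}$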
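The regrouping can be checked by evaluating the nested form and the pairwise-grouped form on every input combination. This is a sketch that assumes a (g, p) pair is applied to a carry c as g + p.c; the function names are made up for the example.

```python
from itertools import product

def op(left, right):
    """(g, p) o (g', p') = (g + p.g', p.p')."""
    g, p = left
    g2, p2 = right
    return (g | (p & g2), p & p2)

def apply_carry(pair, c):
    """Apply a (g, p) pair to an incoming carry: g + p.c."""
    g, p = pair
    return g | (p & c)

def nested(g, p, cin):
    """Evaluate g4 + p4(g3 + p3(g2 + p2(g1 + p1(g0 + p0.cin)))) from the inside out."""
    c = cin
    for gi, pi in zip(g, p):          # index 0 first, index 4 last
        c = gi | (pi & c)
    return c

def grouped(g, p, cin):
    """Evaluate the same carry by combining pairs (4,3) and (2,1) first."""
    high = op((g[4], p[4]), (g[3], p[3]))
    middle = op((g[2], p[2]), (g[1], p[1]))
    return apply_carry(op(op(high, middle), (g[0], p[0])), cin)

# Both groupings agree on all 2**11 assignments of g0..g4, p0..p4 and cin.
for bits in product((0, 1), repeat=11):
    g, p, cin = bits[0:5], bits[5:10], bits[10]
    assert nested(g, p, cin) == grouped(g, p, cin)
```

The nested form has a dependency chain of five steps, while the grouped form computes `high` and `middle` independently, which is what shortens the critical path.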

27
Prefix Adder

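Given
n inputs $(g_i, p_i)$
An operation $\circ$
Compute
$y_i = (g_i, p_i) \circ \cdots \circ (g_1, p_1)$, $1 \le i \le n$

Associativity
$(A \circ B) \circ C = A \circ (B \circ C)$

$g_i = a$ if $i = 1$, $a_i b_i$ otherwise
$p_i = 1$ if $i = 1$, $a_i \oplus b_i$ otherwise

$(g, p) \circ (g', p') = (g'', p'')$
$g'' = g + p \cdot g'$
$p'' = p \cdot p'$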

28
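Prefix Adder: Graph Representation

Example
Ripple Carry Adder

[Figure: prefix graph of a ripple carry adder. Inputs $a_i$, $b_i$ form $(g_i, p_i)$; a node with inputs x and y outputs $x \circ y$.]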
29
Prefix Adders: Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1]
30
Prefix Adders: Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1]

Alphabetical tree
Binary tree
Edges do not cross

For output $y_i$, there is an alphabetical tree
covering inputs $(x_i, x_{i-1}, \ldots, x_1)$

31
Prefix Adders: Conditional Sum Adder
[Figure: prefix graph over columns 8 7 6 5 4 3 2 1]

The nodes in this tree can be reduced to
$(g, p) \circ c = g + p \cdot c$

From input $x_1$, there is a tree covering all
outputs $(y_i, y_{i-1}, \ldots, y_1)$

32
Prefix Adders: size and depth

Objective
Minimize the number of nodes, $s_c(n)$
Minimize the depth, $d_c(n)$
Ripple Carry Adder
$s_c(8) = 7$
$d_c(8) = 7$
total = 14
Conditional Sum Adder
$s_c(8) = 12$
$d_c(8) = 3$
total = 15
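These counts can be reproduced by building the two prefix graphs and tallying nodes and levels. The sketch below assumes the conditional sum adder corresponds to the divide-and-conquer (Sklansky-style) prefix graph, which matches the 8-bit figures above; columns are numbered from 0 at the LSB, and the function names are illustrative.

```python
def serial_prefix(n):
    """Return (size, depth) of the serial (ripple) prefix graph on n inputs."""
    depth = [0] * n
    size = 0
    for i in range(1, n):
        # Column i combines its own input with the finished prefix of column i-1.
        depth[i] = max(depth[i], depth[i - 1]) + 1
        size += 1
    return size, max(depth)

def divide_and_conquer_prefix(n):
    """Return (size, depth) of the divide-and-conquer prefix graph; n a power of two."""
    depth = [0] * n
    size = 0
    span = 1
    while span < n:
        for block in range(0, n, 2 * span):
            src = block + span - 1              # top column of the lower half
            for i in range(block + span, block + 2 * span):
                depth[i] = max(depth[i], depth[src]) + 1
                size += 1
        span *= 2
    return size, max(depth)

print(serial_prefix(8))              # (7, 7)
print(divide_and_conquer_prefix(8))  # (12, 3)
```

The serial graph minimizes size at the cost of depth, while the divide-and-conquer graph reaches the minimum depth lg n by spending more nodes.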

33
Prefix Adder: Well-known and Well-developed?

Classic prefix networks: Sklansky, Kogge-Stone,
Brent-Kung, Ladner-Fischer, Han-Carlson, Knowles,
etc.

34
Prefix Adders: Brent-Kung Adder
[Figure: Brent-Kung prefix graph over columns 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0]

$s_c(16) = 26$
$d_c(16) = 6$
total = 32
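The size and depth of the Brent-Kung graph can be reproduced by constructing it as an up-sweep followed by a down-sweep. This is one common way to describe the network, given here as a sketch rather than as the construction used in the lecture. The function also tracks which inputs each column covers, so it checks that every column ends up holding a complete prefix.

```python
def brent_kung(n):
    """Return (size, depth) of a Brent-Kung prefix graph; n must be a power of two."""
    depth = [0] * n
    low = list(range(n))          # column i currently covers inputs low[i]..i
    size = 0

    def combine(i, j):
        """Merge the partial result of column j into column i."""
        nonlocal size
        assert low[i] == j + 1    # the two spans must be adjacent
        low[i] = low[j]
        depth[i] = max(depth[i], depth[j]) + 1
        size += 1

    span = 1
    while span < n:               # up-sweep: binary tree towards the MSB
        for i in range(2 * span - 1, n, 2 * span):
            combine(i, i - span)
        span *= 2
    span //= 4
    while span >= 1:              # down-sweep: fill in the remaining columns
        for i in range(3 * span - 1, n, 2 * span):
            combine(i, i - span)
        span //= 2
    assert all(start == 0 for start in low)   # every column is a full prefix
    return size, max(depth)

print(brent_kung(16))  # (26, 6)
```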

35
Prefix Adder: New Respects, New Method

Realistic design considerations: timing, power
and area.
Integer Linear Programming for prefix adders
Logical effort timing model (gate cap. + wire cap.)
Activity-statistic power model
Non-uniform signal arrival/required times

[Figure labels: Logic Levels, Timing, Power, Area, Max Fanouts, Max Wire Tracks]
36
Prefix Adder Optimum Prefix adders

Uniform signal arrival/required times

Sklansky Adder
Kogge-Stone Adder
Fastest depth-3 optimal prefix adder
Fastest depth-4 optimal prefix adder
37
Prefix Adder Optimum Prefix adders

Uniform signal arrival/required times

38
The Big Picture
What is the minimum depth of zero-deficiency
circuits for a given width?
39
Proof for Snir's Theorem
Given an arbitrary prefix graph of width n, we
have depth + size $\ge 2n - 2$

Proof
Consider the alphabetical tree rooted at the MSB
output with all the input nodes being its leaves.
The size of this tree is n-1, while its depth is
$d_M$.
At most $d_M$ prefix outputs can be generated from
this tree (the LSB column needs no node, since its
prefix is the input itself).
At least one extra node is needed for each of the
columns where the prefix results are not ready.
Consequently,
size $\ge (n-1) + (n - (d_M + 1)) = 2n - 2 - d_M$
which, since depth $\ge d_M$, gives
size + depth $\ge 2n - 2$

40
Definitions

For a prefix circuit, define
Backbone:
the binary alphabetical tree generating the MSB
prefix output
Affiliated tree:
rooted at the LSB input, with all the prefix
outputs (except the MSB output) as its tree nodes
Ridge:
the path from the LSB input to the MSB output

41
How to ?

Look from the MSB output
Since the circuit is of zero-deficiency, the
ridge has exactly d nodes (excluding the first
input node), one node per level.
The idea: try to stretch the ridge as long as
possible while maintaining zero-deficiency.

42
T-tree

Definition of Tk(k) tree

43
T-tree example T3(5)
44
A-tree

Definition of Ak(t) tree

45
A-tree example A3(5)
46
Compound of A tree and T-tree
47
Example
48
Proposed Prefix Circuit
49
An Example: Z(d) with d = 8
Width: 88
50
The width of Z(d) Circuit

The width of the Z(d) circuit is
$N_z(d) = F(d+3) - 1$ $(d \ge 1)$
where F(i) are the Fibonacci numbers.
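As a quick numerical check, the width can be tabulated from the Fibonacci numbers. This sketch assumes the formula reads N_z(d) = F(d+3) - 1 with F(1) = F(2) = 1, a reading that reproduces the width of 88 quoted for the d = 8 example. The function names are illustrative.

```python
def fib(i):
    """Fibonacci numbers with F(1) = F(2) = 1."""
    a, b = 1, 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a

def z_width(d):
    """Width of the Z(d) circuit, assuming N_z(d) = F(d+3) - 1."""
    return fib(d + 3) - 1

print(z_width(8))  # 88
```

Under this reading the width grows exponentially with the depth d, so a 64-bit adder fits within the d = 8 circuit.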
Numerical Comparison

LYD: design by S. Lakshmivarahan, C.M. Yang and
S.K. Dhall, 1987
LS: design by Lin and Shih, 1999
51
Comparison

64-bit case
Based on the logical effort method, extended to include
the fan-out effect and interconnect capacitance
Five adders
Z64: a 64-bit Z(d) circuit derived from Z(d) with d = 8
BK: Brent-Kung adder
Sklansky
KS: Kogge-Stone adder
HC: Han-Carlson adder

52
Results

w is the weight for lateral interconnect
capacitance. KS and HC have a large w value to
compensate for the coupling effect.
Z64 and the BK adder have similar delay and area, but
Z64 could be more power efficient because it has
fewer logic levels.