Title: Chapter Four Arithmetic and Logic Unit
1Chapter Four Arithmetic and Logic Unit
[Figure: a 32-bit ALU with 32-bit inputs a and b, an Operation control input, and a 32-bit Result output]
2Numbers
- Bits are just bits (no inherent meaning)
  - conventions define the relationship between bits and numbers
- Binary numbers (base 2)
  - 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ...
  - decimal: 0 ... 2^n - 1
- How do we represent negative numbers? I.e., which bit patterns will represent which numbers?

    Sign Magnitude     Two's Complement
    000 = +0           000 =  0
    001 = +1           001 = +1
    010 = +2           010 = +2
    011 = +3           011 = +3
    100 = -0           100 = -4
    101 = -1           101 = -3
    110 = -2           110 = -2
    111 = -3           111 = -1

- Which one is best? Why?
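The two conventions in the table can be compared with a small sketch. The function names `sign_magnitude` and `twos_complement` are illustrative only and not part of the slides. Note that Python integers have no negative zero, so the pattern 100 decodes to plain 0 under sign magnitude.

```python
def sign_magnitude(bits: str) -> int:
    """Decode a bit string whose first bit is a sign and the rest a magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def twos_complement(bits: str) -> int:
    """Decode a bit string as a two's complement number of the same width."""
    value = int(bits, 2)
    if bits[0] == "1":
        # The leading bit carries weight -2^(n-1), so subtract 2^n.
        value -= 1 << len(bits)
    return value


for i in range(8):
    pattern = format(i, "03b")
    print(pattern, sign_magnitude(pattern), twos_complement(pattern))
```

Sign magnitude spends two patterns on zero, while two's complement uses that pattern for one extra negative value (-4 in three bits).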
3MIPS
- 32-bit signed numbers

    0000 0000 0000 0000 0000 0000 0000 0000two =              0ten
    0000 0000 0000 0000 0000 0000 0000 0001two =            + 1ten
    0000 0000 0000 0000 0000 0000 0000 0010two =            + 2ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0000two = - 2,147,483,648ten
    1000 0000 0000 0000 0000 0000 0000 0001two = - 2,147,483,647ten
    1000 0000 0000 0000 0000 0000 0000 0010two = - 2,147,483,646ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1101two =            - 3ten
    1111 1111 1111 1111 1111 1111 1111 1110two =            - 2ten
    1111 1111 1111 1111 1111 1111 1111 1111two =            - 1ten

- Converting n-bit numbers into numbers with more than n bits
  - MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  - copy the most significant bit (the sign bit) into the other bits
      0010 -> 0000 0010
      1010 -> 1111 1010
  - "sign extension" (lbu vs. lb)
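Sign extension as described above can be sketched as follows. The helper `sign_extend` and its parameter names are assumptions made for illustration; the values are treated as raw bit patterns held in non-negative Python integers.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen a from_bits-wide pattern to to_bits by copying its sign bit."""
    value &= (1 << from_bits) - 1           # keep only the original bits
    sign_bit = 1 << (from_bits - 1)
    if value & sign_bit:
        # Set every bit between from_bits and to_bits to 1.
        upper_ones = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= upper_ones
    return value


print(format(sign_extend(0b0010, 4, 8), "08b"))
print(format(sign_extend(0b1010, 4, 8), "08b"))
```

The same helper covers the MIPS case by calling it with `from_bits=16` and `to_bits=32`.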
4Overflow
    Decimal   Binary   |   2's Complement   Decimal
       0       0000    |        0000            0
       1       0001    |        1111           -1
       2       0010    |        1110           -2
       3       0011    |        1101           -3
       4       0100    |        1100           -4
       5       0101    |        1011           -5
       6       0110    |        1010           -6
       7       0111    |        1001           -7
                       |        1000           -8

- Examples: 7 + 3 = 10, but ...
-           -4 - 5 = -9, but ...

      0111     7            1100    -4
    + 0011     3          + 1011    -5
    ------                ------
      1010    -6            0111     7
5Detecting Overflow
- Overflow (result too large for finite computer word)
  - e.g., adding two n-bit numbers does not yield an n-bit number
- Note that the term overflow is somewhat misleading; it does not mean a carry "overflowed"
- No overflow when adding a positive and a negative number
- No overflow when signs are the same for subtraction
- Overflow occurs when the value affects the sign:
  - overflow when adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtract a negative from a positive and get a negative
  - or, subtract a positive from a negative and get a positive
- In MIPS, add, addi, and sub cause an exception (interrupt) on overflow
  - details based on software system
- Don't always want to detect overflow
  - new MIPS instructions: addu, addiu, subu
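The sign rules above can be checked with a short sketch. The function `add_with_overflow` is a hypothetical helper, not a MIPS facility; it assumes both operands already fit in the chosen word size.

```python
def add_with_overflow(a: int, b: int, bits: int = 32):
    """Add two signed values in a bits-wide word; return (result, overflow)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    raw = (a + b) & mask                    # what the hardware word would hold
    # Overflow: both operands disagree with the result in the sign bit,
    # i.e. the operands share a sign and the result has the other one.
    overflow = ((a ^ raw) & (b ^ raw) & sign) != 0
    result = raw - (1 << bits) if raw & sign else raw
    return result, overflow


print(add_with_overflow(7, 3, bits=4))
print(add_with_overflow(-4, -5, bits=4))
print(add_with_overflow(7, -3, bits=4))
```

Adding operands of opposite sign never sets the flag, which matches the rule that a positive plus a negative cannot overflow.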
6Building a 32 bit ALU
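Let's look at a 1-bit ALU for addition. How could we build a 1-bit ALU for add, and, and or?

    Sum  = a xor b xor cin
    Cout = a·b + (a xor b)·cin
    Cout = a·b + a·cin + b·cin

[Figure: 1-bit full adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
[Figure: 1-bit ALU in which a multiplexor controlled by Operation selects Result from input 0 (and), input 1 (or), or input 2 (adder sum); the adder also produces CarryOut]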
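A minimal sketch of the 1-bit cell follows, assuming the multiplexor encoding 0 = and, 1 = or, 2 = add. The names `full_adder` and `one_bit_alu` are illustrative.

```python
def full_adder(a: int, b: int, cin: int):
    """Return (sum, carry_out) for three 1-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout


def one_bit_alu(a: int, b: int, cin: int, operation: int):
    """Compute and, or, and add in parallel; the multiplexor picks one."""
    s, cout = full_adder(a, b, cin)
    result = (a & b, a | b, s)[operation]
    return result, cout


for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, full_adder(a, b, cin))
```

Both carry-out forms on the slide agree for every input combination, which the loop makes easy to confirm by inspection.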
7What about subtraction (a - b)?
- Two's complement approach
- a - b = a + (not b) + 1
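The identity can be sketched in a fixed word size as follows. The function `subtract` is a hypothetical helper; the "+ 1" plays the role of a carry-in of 1 to the least significant bit.

```python
def subtract(a: int, b: int, bits: int = 32) -> int:
    """Compute a - b as a + (not b) + 1, returning the raw bits-wide pattern."""
    mask = (1 << bits) - 1
    inverted_b = ~b & mask                  # bitwise complement within the word
    return (a + inverted_b + 1) & mask


print(format(subtract(7, 3, bits=4), "04b"))
print(format(subtract(3, 7, bits=4), "04b"))
```

The second call yields the pattern for -4 in 4-bit two's complement, so no separate subtractor is needed.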
8Overflow Detection Logic
- Overflow = Carry into MSB XOR Carry out of MSB
- For an N-bit ALU: Overflow = CarryIn[N-1] XOR CarryOut[N-1]

    X   Y   X XOR Y
    0   0      0
    0   1      1
    1   0      1
    1   1      0

[Figure: ripple-carry ALU built from 1-bit ALUs (a0/b0 through a3/b3, Result0 through Result3); Overflow is the XOR of CarryIn3 and CarryOut3]
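The rule can be tried on a bit-level ripple-carry adder. This is a sketch under the assumption that operands are given as lists of bits, least significant bit first; `ripple_add` is an illustrative name.

```python
def ripple_add(a_bits, b_bits, carry_in=0):
    """Ripple-carry add of two equal-length bit lists (LSB first).

    Returns (result_bits, carry_out, overflow), where overflow is the
    XOR of the carry into and the carry out of the most significant bit.
    """
    carry = carry_in
    carry_into_msb = carry_in
    result = []
    for a, b in zip(a_bits, b_bits):
        carry_into_msb = carry              # ends up holding the MSB's carry-in
        result.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    overflow = carry_into_msb ^ carry
    return result, carry, overflow


# 0111 + 0011 (7 + 3), written LSB first
print(ripple_add([1, 1, 1, 0], [1, 1, 0, 0]))
```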
9Supporting slt
- Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b, so use the sign bit of the result
10
[Figure: 32-bit ALU built from 1-bit ALUs ALU0 through ALU31, each with inputs a, b, CarryIn, and Less and outputs Result and CarryOut, sharing the Binvert, CarryIn, and Operation control lines. The Set output (sign) of ALU31 feeds the Less input of ALU0; the Less inputs of the other ALUs are 0. ALU31 also produces Overflow.]
11Test for equality
- Notice control lines:
    000 = and
    001 = or
    010 = add
    110 = subtract
    111 = slt
[Figure: the 32-bit ALU (ALU0 through ALU31) with a single Bnegate control line alongside Operation, the Set output of ALU31 feeding Less of ALU0, an Overflow output, and a Zero output that is 1 when every Result bit is 0]
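The control-line encoding can be modelled behaviourally. This is a sketch, not the gate-level design: the function `alu` and its return tuple are assumptions, the top control bit is treated as Bnegate and the low two bits as the multiplexor select, and slt uses only the sign bit of a - b as on the slides (so it ignores overflow).

```python
def alu(control: int, a: int, b: int, bits: int = 32):
    """Return (result, zero, overflow) for a 3-bit control value."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    if bnegate:
        b = ~b & mask                       # invert b; carry-in of 1 added below
    total = (a + b + bnegate) & mask
    # Overflow: adder inputs share a sign that differs from the sum's sign.
    overflow = (~(a ^ b) & (a ^ total) & sign) != 0
    if operation == 0:
        result = a & b
    elif operation == 1:
        result = a | b
    elif operation == 2:
        result = total
    else:
        result = 1 if total & sign else 0   # slt: sign bit of a - b
    return result, result == 0, overflow


print(alu(0b010, 5, 7))
print(alu(0b110, 7, 7))
print(alu(0b111, 5, 7))
```

Subtracting equal operands raises the Zero output, which is how the ALU supports a test for equality.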
12Conclusion
- We can build an ALU to support the MIPS instruction set
  - key idea: use a multiplexor to select the output we want
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
- Important points about hardware
  - the speed of a gate is affected by the number of inputs to the gate
  - the speed of a circuit is affected by the number of gates in series (on the "critical path" or the "deepest level of logic")
- Our primary focus: comprehension; however,
  - clever changes to organization can improve performance (similar to using better algorithms in software)
  - we'll look at two examples, for addition and multiplication
13Problem: ripple carry adder is slow
- Is a 32-bit ALU as fast as a 1-bit ALU?
- Is there more than one way to do addition?
  - two extremes: ripple carry and sum-of-products
- Can you see the ripple? How could you get rid of it?
    c1 = b0·c0 + a0·c0 + a0·b0
    c2 = b1·c1 + a1·c1 + a1·b1        c2 =
    c3 = b2·c2 + a2·c2 + a2·b2        c3 =
    c4 = b3·c3 + a3·c3 + a3·b3        c4 =
- Not feasible! Why?
14Carry-lookahead adder
- An approach in-between our two extremes
- c1 = b0·c0 + a0·c0 + a0·b0 = (b0 + a0)·c0 + a0·b0
- If we didn't know the value of carry-in, what could we do?
  - When would we always generate a carry?    gi = ai·bi
  - When would we propagate the carry?        pi = ai + bi
- Did we get rid of the ripple?
    c1 = g0 + p0·c0
    c2 = g1 + p1·c1        c2 =
    c3 = g2 + p2·c2        c3 =
    c4 = g3 + p3·c3        c4 =
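15Carry Look Ahead (Design trick: peek)
    g = a and b        p = a or b
    c1 = g0 + p0·c0
    c2 = g1 + p1·g0 + p1·p0·c0
    c3 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
    C4 = . . .
[Figure: four adder cells (a0/b0 through a3/b3) with carry-in cin, each producing a sum bit S0 through S3 and per-bit g and p signals; the block also produces group signals G and P]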
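The expanded carry equations can be exercised with a 4-bit sketch. The function `cla4` is an illustrative name; the expression for c4 follows the same pattern as c1 through c3 and is written out here as an assumption, since the slide leaves it elided.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry-lookahead add; returns (sum, c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]    # propagate
    # Every carry depends only on g, p, and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1
        total |= bit << i
    return total, c4


print(cla4(0b0111, 0b0011))
```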
16Plumbing as Carry Lookahead Analogy
17To build bigger adders
- Can't build a 16-bit adder this way... (too big)
- Could use ripple carry of 4-bit CLA adders
- Better: use the CLA principle again
18Cascaded Carry Look-ahead (16-bit)
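    C1 = G0 + P0·C0
    C2 = G1 + P1·G0 + P1·P0·C0
    C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
    C4 = . . .
[Figure: 16-bit adder built from four 4-bit carry-lookahead blocks; each block supplies group signals Gi and Pi to a second-level carry-lookahead unit, which takes C0 and produces C1 through C4 along with overall G and P]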
192nd level Carry, Propagate as Plumbing
20Carry Lookahead Example
- Example: Determine the gi, pi, Pi, and Gi values of the following two 16-bit numbers. What is Cout15 (C16)?
    a:  0001 1010 0011 0011
    b:  1110 0101 1110 1011
    pi = ai + bi:
    gi = ai·bi:
    ci:
- Repeat using Pi and Gi
    P0 =      P1 =      P2 =      P3 =
    G0 =
    G1 =
    G2 =
    G3 =
    C4 =
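One way to check a hand solution to this exercise is the sketch below. It assumes group 0 is the least significant four bits and that the carry into bit 0 is 0; neither is stated on the slide. The helper names are illustrative.

```python
A = 0b0001_1010_0011_0011
B = 0b1110_0101_1110_1011


def bit(x: int, i: int) -> int:
    return (x >> i) & 1


g = [bit(A, i) & bit(B, i) for i in range(16)]   # per-bit generate
p = [bit(A, i) | bit(B, i) for i in range(16)]   # per-bit propagate


def group_signals(j: int):
    """Return (P, G) for the 4-bit group j, where group 0 is bits 3..0."""
    g0, g1, g2, g3 = g[4 * j: 4 * j + 4]
    p0, p1, p2, p3 = p[4 * j: 4 * j + 4]
    big_p = p0 & p1 & p2 & p3
    big_g = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    return big_p, big_g


(P0, G0), (P1, G1), (P2, G2), (P3, G3) = [group_signals(j) for j in range(4)]
c0 = 0
C4 = (G3 | (P3 & G2) | (P3 & P2 & G1) | (P3 & P2 & P1 & G0)
      | (P3 & P2 & P1 & P0 & c0))

print("P:", P0, P1, P2, P3)
print("G:", G0, G1, G2, G3)
print("C4:", C4)
```

The second-level carry out can be compared against ordinary integer addition of the two values as a sanity check.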
21Speed of Ripple Carry Versus Carry Lookahead
- One simple way to model time for logic is to assume each AND and OR gate takes the same time for a signal to pass through it. Time is estimated by simply counting the number of gates along the longest path through a piece of logic.
- Compare the number of gate delays for the critical paths of two 16-bit adders, one using ripple carry and one using two-level carry lookahead.
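22Other Design Tricks: Guess
[Figure: carry-select adder built from n-bit adders; the upper sum is computed twice in parallel, once assuming carry-in 0 and once assuming carry-in 1, and the real carry selects between the two results and produces Cout]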
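The "guess" idea can be sketched with two halves. The function `carry_select_add` is a hypothetical helper; it assumes an even word size and unsigned operands that fit in it.

```python
def carry_select_add(a: int, b: int, bits: int = 8):
    """Add two bits-wide unsigned values; returns (sum, carry_out)."""
    half = bits // 2
    low_mask = (1 << half) - 1
    low = (a & low_mask) + (b & low_mask)
    low_carry = low >> half
    # Both guesses for the upper half are computed without waiting for low_carry.
    high_if_0 = (a >> half) + (b >> half)
    high_if_1 = (a >> half) + (b >> half) + 1
    high = high_if_1 if low_carry else high_if_0   # the multiplexor
    total = (high << half) | (low & low_mask)
    return total & ((1 << bits) - 1), total >> bits


print(carry_select_add(0b10011111, 0b00000001))
```

The cost is a duplicated upper adder; the gain is that the upper half no longer waits for the lower carry to ripple through.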
23Multiplication
- Let's look at 3 versions based on the grade school algorithm

          0010   (multiplicand)
      x   1011   (multiplier)
      --------
          0010
         0010
        0000
       0010
      --------
       0010110

- Negative numbers: convert and multiply
  - there are better techniques (e.g., Booth's Algorithm); we won't look at them
- m bits x n bits = m+n bit product
- Binary makes it easy:
  - 0 => place 0 (0 x multiplicand)
  - 1 => place a copy (1 x multiplicand)
24Multiplication (version 1)
- 64-bit Multiplicand register, 64-bit ALU, 64-bit Product register, 32-bit Multiplier register
[Figure: multiplier = datapath + control: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right), 64-bit ALU, 64-bit Product register (write), and Control]
25Multiplication Algorithm Version 1
Start
1. Test Multiplier0
   - Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
   - Multiplier0 = 0: go to step 2
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done

    Product      Multiplier   Multiplicand
    0000 0000    0011         0000 0010
    0000 0010    0001         0000 0100
    0000 0110    0000         0000 1000
    0000 0110
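The flowchart steps can be sketched as a loop. The function `multiply_v1` is an illustrative name; `bits` is the multiplier width, and the Multiplicand and Product registers are modelled as twice that width.

```python
def multiply_v1(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned shift-and-add multiply with a left-shifting multiplicand."""
    wide_mask = (1 << (2 * bits)) - 1
    product = 0
    for _ in range(bits):
        if multiplier & 1:                                # test Multiplier0
            product = (product + multiplicand) & wide_mask   # step 1a
        multiplicand = (multiplicand << 1) & wide_mask    # step 2
        multiplier >>= 1                                  # step 3
    return product


print(format(multiply_v1(0b0010, 0b0011, bits=4), "08b"))
```

With `bits=4` the call reproduces the trace in the table, ending with 0000 0110 in the Product register.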
26Observations on Multiplication Version 1
- 1 clock per cycle => about 100 clocks per multiply
  - ratio of multiply to add: 5:1 to 100:1
- 1/2 of the bits in the multiplicand are always 0 => 64-bit adder is wasted
- 0s inserted in left of multiplicand as shifted => least significant bits of product never changed once formed
- Instead of shifting multiplicand to left, shift product to right?
27Multiplication Version 2
- 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register, 32-bit Multiplier register
[Figure: 32-bit Multiplicand register, 32-bit Multiplier register (shift right), 32-bit ALU, 64-bit Product register (shift right, write), and Control]
28Multiplication Algorithm Version 2
Start
1. Test Multiplier0
   - Multiplier0 = 1: 1a. Add multiplicand to the left half of product and place the result in the left half of Product register
   - Multiplier0 = 0: go to step 2
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done

    Product      Multiplier   Multiplicand
    0000 0000    0011         0010
29Multiplication Algorithm Version 2
Start
1. Test Multiplier0
   - Multiplier0 = 1: 1a. Add multiplicand to the left half of product and place the result in the left half of Product register
   - Multiplier0 = 0: go to step 2
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done

    Product      Multiplier   Multiplicand
    0000 0000    0011         0010
    0010 0000
    0001 0000    0001         0010
    0011 0000    0001         0010
    0001 1000    0000         0010
    0000 1100    0000         0010
    0000 0110    0000         0010
30Observations on Multiplication Version 2
- Product register wastes space that exactly matches size of multiplier
  => combine Multiplier register and Product register
31Multiplication Version 3
- 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register, (0-bit Multiplier register)
[Figure: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register whose right half initially holds the Multiplier (shift right, write), and Control]
32Multiplication Algorithm Version 3
Start
1. Test Product0
   - Product0 = 1: 1a. Add multiplicand to the left half of product and place the result in the left half of Product register
   - Product0 = 0: go to step 2
2. Shift the Product register right 1 bit.
32nd repetition?
   - No (< 32 repetitions): go back to step 1
   - Yes (32 repetitions): Done

    Multiplicand   Product
    0010           0000 0011
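The combined-register version can be sketched as below. The function `multiply_v3` is an illustrative name. One modelling assumption: Python integers are unbounded, so the carry out of the left-half addition is simply kept as an extra top bit until the shift brings it back into range; real hardware would need to capture that carry.

```python
def multiply_v3(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Unsigned multiply with the multiplier stored in the product's right half."""
    mask = (1 << bits) - 1
    product = multiplier & mask             # left half 0, right half = multiplier
    for _ in range(bits):
        if product & 1:                     # test Product0
            product += (multiplicand & mask) << bits   # add to the left half
        product >>= 1                       # shift the Product register right
    return product


print(format(multiply_v3(0b0010, 0b0011, bits=4), "08b"))
```

Each iteration consumes one multiplier bit from the right and frees one bit position for the growing product on the left.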
33Observations on Final Version
- 2 steps per bit because Multiplier and Product are combined
- How can you make it faster?
- What about signed multiplication?
  - Booth's Algorithm
34Unsigned Combinational Multiplier
- Stage i accumulates A * 2^i if Bi == 1
- Q: How much hardware for a 32-bit multiplier? Critical path?
35Floating Point (a brief look)
- We need a way to represent
  - numbers with fractions, e.g., 3.1416
  - very small numbers, e.g., .000000001
  - very large numbers, e.g., 3.15576 x 10^9
- Representation:
  - sign, exponent, significand: (-1)^sign x significand x 2^exponent
  - more bits for significand gives more accuracy
  - more bits for exponent increases range
- IEEE 754 floating point standard:
  - single precision: sign bit, 8-bit exponent, 23-bit significand
  - double precision: sign bit, 11-bit exponent, 52-bit significand
36IEEE 754 floating-point standard
- Leading "1" bit of significand is implicit
- Exponent is "biased" to make sorting easier
  - all 0s is smallest exponent, all 1s is largest
  - bias of 127 for single precision and 1023 for double precision
  - summary: (-1)^sign x (1 + significand) x 2^(exponent - bias)
- Example:
  - decimal: -.75 = -3/4 = -3/2^2
  - binary: -.11 = -1.1 x 2^-1
  - floating point: exponent = 126 = 01111110
  - single precision: sign bit, 8-bit exponent, 23-bit significand
  - IEEE single precision: 1 01111110 10000000000000000000000
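The worked example can be checked against a real single-precision encoding using Python's standard `struct` module. The helper `float_fields` is an illustrative name, not a library function.

```python
import struct


def float_fields(x: float):
    """Split the IEEE 754 single-precision encoding of x into its fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF           # biased exponent
    fraction = raw & 0x7FFFFF               # stored 23 bits, implicit 1 omitted
    return sign, exponent, fraction


sign, exponent, fraction = float_fields(-0.75)
print(sign, format(exponent, "08b"), format(fraction, "023b"))

# Rebuild the value from the summary formula with bias 127.
value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2.0 ** (exponent - 127)
print(value)
```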
37Floating-Point Arithmetic
- Addition example (four-digit significand): 9.999 x 10^1 + 1.610 x 10^-1
  - Step 1: Align => 9.999 x 10^1 + 0.01610 x 10^1
  - Step 2: Add significands => 9.999 + 0.016 = 10.015
  - Step 3: Normalize => 1.0015 x 10^2
  - Step 4: Round => 1.002 x 10^2
- Multiplication example: (1.110 x 10^10) x (9.200 x 10^-5)
  - Step 1: Add exponents: (10 + 127) + (-5 + 127) - 127 = (5 + 127)
  - Step 2: Multiply significands: 1.110 x 9.200 = 10.212000
  - Step 3: Normalize: 10.212000 x 10^5 = 1.021 x 10^6
  - Step 4: Sign of product: +1.021 x 10^6
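The two decimal examples can be reproduced with Python's standard `decimal` module set to four significant digits. This is only a cross-check of the final answers: `decimal` rounds the exact result once rather than following the align, add, normalize, round steps, and the nearest-even rounding mode is an assumption. For these operands the two approaches agree.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Four significant digits, matching the significands used in the examples.
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

total = ctx.add(Decimal("9.999E1"), Decimal("1.610E-1"))
product = ctx.multiply(Decimal("1.110E10"), Decimal("9.200E-5"))

print(total)
print(product)
```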
38Floating Point Complexities
- Operations are somewhat more complicated
- In addition to overflow we can have "underflow"
- Accuracy can be a big problem
  - IEEE 754 keeps two extra bits, guard and round
  - four rounding modes: round up, round down, truncate, nearest even
  - positive divided by zero yields "infinity" (see page 300)
  - zero divided by zero yields "not a number" (see page 300)
  - other complexities
39Chapter Four Summary
- Computer arithmetic is constrained by limited precision
- Bit patterns have no inherent meaning, but standards do exist
  - two's complement
  - IEEE 754 floating point
- Computer instructions determine the "meaning" of the bit patterns
- Performance and accuracy are important, so there are many complexities in real machines (i.e., algorithms and implementation).
40Barrel Shifter
- Technology-dependent solutions: transistor per switch
41Motivation for Booth's Algorithm
- Example: 2 x 6 = 0010 x 0110

          0010
      x   0110
      --------
    +     0000   shift (0 in multiplier)
    +    0010    add (1 in multiplier)
    +   0010     add (1 in multiplier)
    +  0000      shift (0 in multiplier)
      --------
      00001100

- ALU with add or subtract gets the same result in more than one way:
    6 = -2 + 8
    0110 = -00010 + 01000 = 11110 + 01000
- For example:

          0010
      x   0110
      --------
          0000   shift (0 in multiplier)
    -    0010    sub (first 1 in multiplier)
        0000     shift (mid string of 1s)
    +  0010      add (prior step had last 1)
      --------
      00001100
42Booth's Algorithm

    Current Bit   Bit to the Right   Explanation           Example      Op
    1             0                  Begins run of 1s      0001111000   sub
    1             1                  Middle of run of 1s   0001111000   none
    0             1                  End of run of 1s      0001111000   add
    0             0                  Middle of run of 0s   0001111000   none

- Originally for speed (when shift was faster than add)
- Replace a string of 1s in the multiplier with an initial subtract when we first see a one, and then later add for the bit after the last one
43Booth's Example (2 x 7)

    Operation          Multiplicand   Product          next?
    0. initial value   0010             0000 0111 0    10 -> sub
    1a. P = P - m      1110           + 1110
                                        1110 0111 0    shift P (sign ext)
    1b.                0010             1111 0011 1    11 -> nop, shift
    2.                 0010             1111 1001 1    11 -> nop, shift
    3.                 0010             1111 1100 1    01 -> add
    4a.                0010           + 0010
                                        0001 1100 1    shift
    4b.                0010             0000 1110 0    done
44Booth's Example (2 x -3)

    Operation          Multiplicand   Product          next?
    0. initial value   0010             0000 1101 0    10 -> sub
    1a. P = P - m      1110           + 1110
                                        1110 1101 0    shift P (sign ext)
    1b.                0010             1111 0110 1    01 -> add
    2a.                               + 0010
                                        0001 0110 1    shift P
    2b.                0010             0000 1011 0    10 -> sub
    3a.                0010           + 1110
                                        1110 1011 0    shift
    3b.                0010             1111 0101 1    11 -> nop
    4a.                                 1111 0101 1    shift
    4b.                0010             1111 1010 1    done
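The two traces can be reproduced with a small sketch of Booth's algorithm. The function `booth_multiply` is an illustrative name. The register layout (left half, multiplier, one extra bit on the right) follows the tables above; the sketch assumes the multiplicand is not the most negative value for the chosen width, since its negation does not fit in `bits` bits.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 4) -> int:
    """Signed multiply of two bits-wide two's complement values."""
    mask = (1 << bits) - 1
    reg_mask = (1 << (2 * bits + 1)) - 1    # product register plus extra bit
    top_bit = 1 << (2 * bits)               # sign bit of that register
    m = multiplicand & mask
    neg_m = (-multiplicand) & mask
    p = (multiplier & mask) << 1            # extra bit to the right starts at 0
    for _ in range(bits):
        pair = p & 0b11                     # current bit and the bit to its right
        if pair == 0b10:                    # begins a run of 1s: subtract
            p = (p + (neg_m << (bits + 1))) & reg_mask
        elif pair == 0b01:                  # end of a run of 1s: add
            p = (p + (m << (bits + 1))) & reg_mask
        p = (p >> 1) | (p & top_bit)        # arithmetic shift right (sign ext)
    result = p >> 1                         # drop the extra bit
    if result & (1 << (2 * bits - 1)):      # reinterpret as a signed value
        result -= 1 << (2 * bits)
    return result


print(booth_multiply(2, 7))
print(booth_multiply(2, -3))
```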
45How does it work?
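[Figure: array of adder stages combining multiplier bits B0 through B3 with the shifted multiplicand A (0s fed in at the edges) to produce product bits P0 through P7]
- at each stage, shift A left (x 2)
- use the next bit of B to determine whether to add in the shifted multiplicand
- accumulate the 2n-bit partial product at each stage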