Title: Chapter 3: Arithmetic for Computers
1 Chapter 3: Arithmetic for Computers
2 Taxonomy of Computer Information
- Information
  - Instructions
  - Addresses
  - Data
    - Numeric
      - Fixed-point
        - Unsigned (ordinal)
        - Signed
          - Sign-magnitude
          - 2's complement
      - Floating-point
        - Single-precision
        - Double-precision
    - Non-numeric
      - Character (ASCII)
      - Boolean
      - Other
3 Number Format Considerations
- Type of numbers (integer, fraction, real, complex)
- Range of values
  - between smallest and largest values
  - wider in floating-point formats
- Precision of values (max. accuracy)
  - usually related to the number of bits allocated
  - n bits can represent 2^n values/levels
- Value/weight of the least-significant bit
- Cost of hardware to store and process numbers (some formats are more difficult to add, multiply/divide, etc.)
4 Unsigned Integers
- Positional number system
  - A = a_{n-1} a_{n-2} ... a_2 a_1 a_0
  - value = a_{n-1}·2^(n-1) + a_{n-2}·2^(n-2) + ... + a_2·2^2 + a_1·2^1 + a_0·2^0
  - Range: 0 to 2^n - 1
  - Carry out of the MSB has weight 2^n
- Fixed-point fraction
  - 0.a_{-1} a_{-2} ... a_{-n} = a_{-1}·2^-1 + a_{-2}·2^-2 + ... + a_{-n}·2^-n
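A minimal C sketch of the positional evaluation above; the 8-bit width and the sample pattern are illustrative choices, not something fixed by the slides.

    /* Positional value of an unsigned bit pattern: sum of a_i * 2^i.            */
    /* The 8-bit pattern 1011 0110, stored LSB-first in the array, is an example. */
    #include <stdio.h>

    int main(void) {
        int a[8] = {0, 1, 1, 0, 1, 1, 0, 1};      /* a[i] holds bit a_i      */
        unsigned value = 0;
        for (int i = 0; i < 8; i++)
            value += (unsigned)a[i] << i;         /* weight of bit i is 2^i  */
        printf("%u\n", value);                    /* prints 182              */
        return 0;
    }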
5 Signed Integers
- Sign-magnitude format (n-bit values)
  - A = S a_{n-2} ... a_2 a_1 a_0  (S = sign bit)
  - Range: -(2^(n-1) - 1) to +(2^(n-1) - 1)
  - Addition/subtraction difficult (multiply easy)
  - Redundant representations of 0
- 2's complement format (n-bit values)
  - -A represented by 2^n - A
  - Range: -2^(n-1) to +(2^(n-1) - 1)
  - Addition/subtraction easier (multiply harder)
  - Single representation of 0
6 Computing the 2's Complement
- To compute the 2's complement of A
  - Let A = a_{n-1} a_{n-2} ... a_2 a_1 a_0
  - 2^n - A = (2^n - 1) + 1 - A = [(2^n - 1) - A] + 1
  - (2^n - 1) is the all-ones pattern, and subtracting A from it complements each bit:
        1       1      ... 1    1    1
      - a_{n-1} a_{n-2} ... a_2 a_1 a_0
      = ~a_{n-1} ~a_{n-2} ... ~a_2 ~a_1 ~a_0
  - So 2^n - A = (one's complement of A) + 1
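A small C sketch of this identity, using n = 8 as an illustrative width: the two's complement of A is the one's complement plus 1, i.e. 2^8 - A.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t a = 44;                          /* any 8-bit value            */
        uint8_t neg = (uint8_t)(~a + 1);         /* one's complement, plus 1   */
        printf("2^8 - %u = %u\n", (unsigned)a, (unsigned)neg);      /* 256 - 44 = 212 */
        printf("a + (-a) mod 2^8 = %u\n", (unsigned)(uint8_t)(a + neg));  /* 0  */
        return 0;
    }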
7 2's Complement Arithmetic
- Let (2^(n-1) - 1) >= A >= 0 and (2^(n-1) - 1) >= B >= 0
- Case 1: A + B
  - (2^n - 2) >= (A + B) >= 0
  - Since the result is < 2^n, there is no carry out of the MSB
  - Valid result if (A + B) < 2^(n-1)
    - MSB (sign bit) = 0
  - Overflow if (A + B) >= 2^(n-1)
    - MSB (sign bit) = 1 since the result is >= 2^(n-1)
    - Carry into the MSB
8 2's Complement Arithmetic
- Case 2: A - B
  - Compute by adding A + (-B)
  - Using the 2's complement of B: A + (2^n - B)
  - -2^(n-1) < result < 2^(n-1)  (no overflow possible)
  - If A >= B: 2^n + (A - B) >= 2^n
    - Weight of the adder carry output is 2^n
    - Discard the carry (2^n), keeping (A - B), which is >= 0
  - If A < B: 2^n + (A - B) < 2^n
    - Adder carry output = 0
    - Result is 2^n - (B - A)
    - 2's complement representation of -(B - A)
9 2's Complement Arithmetic
- Case 3: -A - B
  - Compute by adding (-A) + (-B)
  - In 2's complement: (2^n - A) + (2^n - B) = 2^n + 2^n - (A + B)
  - Discard the carry (2^n), making the result 2^n - (A + B)
    - 2's complement representation of -(A + B)
  - 0 >= result > -2^n
  - Overflow if -(A + B) < -2^(n-1)
    - MSB (sign bit) = 0 since 2^n - (A + B) < 2^(n-1)
    - No carry into the MSB
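A C sketch of the rule the three cases add up to: signed overflow can occur only when both operands have the same sign and the sum's sign differs. The 32-bit width is an assumption for illustration.

    #include <stdint.h>
    #include <stdio.h>

    /* Returns 1 when a + b overflows 32-bit two's complement. */
    static int add_overflows(int32_t a, int32_t b) {
        int32_t sum = (int32_t)((uint32_t)a + (uint32_t)b);    /* wraparound add */
        return ((a < 0) == (b < 0)) && ((sum < 0) != (a < 0));
    }

    int main(void) {
        printf("%d\n", add_overflows(2000000000, 2000000000)); /* 1: Case 1 overflow        */
        printf("%d\n", add_overflows(5, -7));                  /* 0: Case 2 never overflows */
        printf("%d\n", add_overflows(INT32_MIN, -1));          /* 1: Case 3 overflow        */
        return 0;
    }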
10 Relational Operators
- Compute A - B, then test the ALU flags to compare A vs. B
  - ZF = result is zero        OF = 2's complement overflow
  - SF = sign bit of result    CF = adder carry output

  Relation   Signed test               Unsigned test
  A = B      ZF = 1                    ZF = 1
  A != B     ZF = 0                    ZF = 0
  A >= B     (SF XOR OF) = 0           CF = 1 (no borrow)
  A > B      (SF XOR OF) + ZF = 0      CF = 1 and ZF = 0
  A <= B     (SF XOR OF) + ZF = 1      CF = 0 or ZF = 1
  A < B      (SF XOR OF) = 1           CF = 0 (borrow)
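A C sketch that computes the four flags from A - B and derives the signed and unsigned "less than" tests from the table; the 32-bit width and the sample operands are assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t a = 3, b = 5;
        uint32_t diff = a - b;
        int ZF = (diff == 0);
        int SF = (diff >> 31) & 1;
        int CF = (a >= b);                            /* carry out = no borrow        */
        int OF = (((a ^ b) & (a ^ diff)) >> 31) & 1;  /* signed overflow of a - b     */
        int signed_lt   = SF ^ OF;                    /* A < B, signed                */
        int unsigned_lt = !CF;                        /* A < B, unsigned              */
        printf("ZF=%d SF=%d CF=%d OF=%d  signed<: %d  unsigned<: %d\n",
               ZF, SF, CF, OF, signed_lt, unsigned_lt);
        return 0;
    }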
11 MIPS Overflow Detection
- An exception (interrupt) occurs when overflow is detected for add, addi, sub
  - Control jumps to a predefined address for the exception
  - The interrupted address is saved for possible resumption
- Details depend on the software system / language
  - example: flight control vs. homework assignment
- We don't always want to detect overflow: new MIPS instructions addu, addiu, subu
  - note: addiu still sign-extends!
  - note: sltu, sltiu for unsigned comparisons
12 Designing the Arithmetic Logic Unit (ALU)
- Provide arithmetic and logical functions as needed by the instruction set
- Consider tradeoffs of area vs. performance
- (Material from Appendix B)
13 Different Implementations
- Not easy to decide the best way to build something
  - Don't want too many inputs to a single gate (fan-in)
  - Don't want to have to go through too many gates (delay)
  - For our purposes, ease of comprehension is important
- Let's look at a 1-bit ALU for addition
  - cout = a·b + a·cin + b·cin
  - sum = a XOR b XOR cin
- How could we build a 1-bit ALU for add, and, and or?
- How could we build a 32-bit ALU?
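One answer to the last question, sketched in C rather than in gates: replicate the 1-bit sum/carry equations above 32 times, chaining each carry out into the next carry in. This is a behavioral model of the ripple connection, not the hardware itself.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t ripple_add(uint32_t a, uint32_t b, int cin, int *cout) {
        uint32_t sum = 0;
        int c = cin;
        for (int i = 0; i < 32; i++) {
            int ai = (a >> i) & 1, bi = (b >> i) & 1;
            sum |= (uint32_t)(ai ^ bi ^ c) << i;       /* sum_i = a XOR b XOR cin     */
            c = (ai & bi) | (ai & c) | (bi & c);       /* carry into the next bit     */
        }
        *cout = c;
        return sum;
    }

    int main(void) {
        int cout;
        uint32_t s = ripple_add(0xFFFFFFFFu, 1, 0, &cout);
        printf("sum=%u cout=%d\n", (unsigned)s, cout);  /* sum=0 cout=1 */
        return 0;
    }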
14 Building a 32-bit ALU
15 What about subtraction (a - b)?
- Two's complement approach: just negate b and add.
16 Adding a NOR Function
- Can also choose to invert a. How do we get a NOR b?
[Figure: 1-bit ALU with inputs a, b, CarryIn, Ainvert, Binvert, and a multi-bit Operation select; outputs Result and CarryOut.]
17 Tailoring the ALU to the MIPS
- Need to support the set-on-less-than instruction (slt)
  - remember: slt is an arithmetic instruction
  - produces a 1 if rs < rt and 0 otherwise
  - use subtraction: (a - b) < 0 implies a < b
- Need to support the test for equality (beq t5, t6, t7)
  - use subtraction: (a - b) = 0 implies a = b
18 Supporting slt
[Figure: 1-bit ALU for the most significant bit, with Ainvert, Binvert, operation select, a Set output, and overflow detection; all other bits use the ordinary 1-bit ALU. Use this ALU for the most significant bit.]
19 Supporting slt
[Figure: 32-bit ALU built from 1-bit ALUs producing Result0 ... Result31; the Set output of the bit-31 ALU feeds back to the Less input of bit 0, and Overflow is detected at bit 31.]
20 Test for Equality
- Notice the control lines:
    0000 = and
    0001 = or
    0010 = add
    0110 = subtract
    0111 = slt
    1100 = NOR
- Note: Zero is a 1 when the result is zero!
21 Conclusion
- We can build an ALU to support the MIPS instruction set
  - key idea: use a multiplexor to select the output we want
  - we can efficiently perform subtraction using two's complement
  - we can replicate a 1-bit ALU to produce a 32-bit ALU
- Important points about hardware
  - all of the gates are always working
  - the speed of a gate is affected by the number of inputs to the gate
  - the speed of a circuit is affected by the number of gates in series (on the critical path, or the deepest level of logic)
- Our primary focus is comprehension; however,
  - clever changes to organization can improve performance (similar to using better algorithms in software)
  - we saw this in multiplication; let's look at addition now
22 Problem: the ripple carry adder is slow
- Is a 32-bit ALU as fast as a 1-bit ALU?
- Is there more than one way to do addition?
  - two extremes: ripple carry and sum-of-products
- Can you see the ripple? How could you get rid of it?
  - c1 = b0·c0 + a0·c0 + a0·b0
  - c2 = b1·c1 + a1·c1 + a1·b1   (substitute the expression for c1 ...)
  - c3 = b2·c2 + a2·c2 + a2·b2   (substitute the expression for c2 ...)
  - c4 = b3·c3 + a3·c3 + a3·b3   (substitute the expression for c3 ...)  Not feasible! Why?
23 One-bit Full-Adder Circuit
[Figure: full adder FAi built from two XOR gates producing sum_i = a_i XOR b_i XOR c_i, plus two AND gates and an OR gate producing c_{i+1}.]
24 32-bit Ripple-Carry Adder
[Figure: full adders FA0 ... FA31 chained so that each carry out feeds the next stage's carry in; inputs a0/b0 ... a31/b31, initial carry c0, outputs sum0 ... sum31.]
25 How Fast is the Ripple-Carry Adder?
- The longest delay path (critical path) runs from cin to sum31.
- Suppose the delay of a full adder is 100 ps.
  - Critical path delay = 32 x 100 ps = 3,200 ps
  - Clock rate cannot be higher than 10^12 / 3,200 = 312 MHz.
- Must use more efficient ways to handle the carry.
26 Fast Adders
- In general, any output of a 32-bit adder can be evaluated as a logic expression in terms of all 65 inputs.
- Levels of logic in the circuit can be reduced to log2(N) for an N-bit adder; ripple-carry has N levels.
- More gates are needed, about log2(N) times that of the ripple-carry design.
- The fastest design is known as the carry-lookahead adder.
27 N-bit Adder Design Options
  Type of adder     Time complexity (delay)   Space complexity (size)
  Ripple-carry      O(N)                      O(N)
  Carry-lookahead   O(log2 N)                 O(N log2 N)
  Carry-skip        O(sqrt(N))                O(N)
  Carry-select      O(sqrt(N))                O(N)
Reference: J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, San Francisco, California, 1990.
28 Carry-Lookahead Adder
- An approach in between our two extremes
- Motivation
  - If we didn't know the value of carry-in, what could we do?
  - When would we always generate a carry?  g_i = a_i·b_i
  - When would we propagate the carry?      p_i = a_i + b_i
- Did we get rid of the ripple?
  - c1 = g0 + p0·c0
  - c2 = g1 + p1·c1 = g1 + p1·g0 + p1·p0·c0
  - c3 = g2 + p2·c2 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
  - c4 = g3 + p3·c3 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0 + p3·p2·p1·p0·c0
- Feasible! Why?
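A C sketch of the 4-bit lookahead equations above, with g_i = a_i·b_i and p_i = a_i + b_i; the sample operands are arbitrary choices for illustration.

    #include <stdio.h>

    int main(void) {
        int a[4] = {1, 0, 1, 1}, b[4] = {1, 1, 0, 1};   /* a = 1101, b = 1011 (LSB first) */
        int g[4], p[4], c[5];
        c[0] = 0;
        for (int i = 0; i < 4; i++) { g[i] = a[i] & b[i]; p[i] = a[i] | b[i]; }
        /* every carry is a two-level expression of the inputs: */
        c[1] = g[0] | (p[0] & c[0]);
        c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
        c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
        c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c[0]);
        for (int i = 0; i < 4; i++)
            printf("sum%d = %d\n", i, a[i] ^ b[i] ^ c[i]);  /* 13 + 11 = 24 = 1 1000 */
        printf("carry out = %d\n", c[4]);
        return 0;
    }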
29 Use the Principle to Build Bigger Adders
- Can't build a 16-bit adder this way... (too big)
- Could use ripple carry between 4-bit CLA adders
- Better: use the CLA principle again!
30 Carry-Select Adder
[Figure: the low half produces sum0-sum15 and a carry; the high half (a16-a31, b16-b31) is computed twice by two 16-bit ripple carry adders, one with carry-in 0 and one with carry-in 1, and a multiplexer driven by the low half's carry selects sum16-sum31. This is known as a carry-select adder.]
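The same idea in a short C sketch: compute the upper 16 bits twice, once per possible carry-in, and let the lower half's carry act as the multiplexer select. The operand values are chosen only to force a carry across the halves.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint32_t a = 0x0001FFFFu, b = 0x00000001u;
        uint32_t lo  = (a & 0xFFFF) + (b & 0xFFFF);    /* 17-bit lower result        */
        int carry    = (lo >> 16) & 1;                 /* carry into bit 16          */
        uint32_t hi0 = (a >> 16) + (b >> 16);          /* upper half assuming cin = 0 */
        uint32_t hi1 = hi0 + 1;                        /* upper half assuming cin = 1 */
        uint32_t sum = ((carry ? hi1 : hi0) << 16) | (lo & 0xFFFF);   /* the mux     */
        printf("0x%08X\n", (unsigned)sum);             /* prints 0x00020000          */
        return 0;
    }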
31 ALU Summary
- We can build an ALU to support MIPS addition
- Our focus is on comprehension, not performance
- Real processors use more sophisticated techniques for arithmetic
- Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
32 Multiplication
- More complicated than addition
  - accomplished via shifting and addition
  - More time and more area
- Let's look at 3 versions based on a grade-school algorithm:
      0010  (multiplicand)
    x 1011  (multiplier)
- Negative numbers: convert and multiply
  - there are better techniques; we won't look at them
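A C sketch of the grade-school algorithm above for unsigned operands: examine one multiplier bit per step and add the (shifted) multiplicand whenever that bit is 1. The 32-bit operand width is an assumption.

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier) {
        uint64_t product = 0;
        uint64_t mcand = multiplicand;              /* will be shifted left         */
        for (int i = 0; i < 32; i++) {
            if (multiplier & 1)                     /* low multiplier bit set?      */
                product += mcand;                   /* add the shifted multiplicand */
            mcand <<= 1;                            /* shift multiplicand left      */
            multiplier >>= 1;                       /* move to the next bit         */
        }
        return product;
    }

    int main(void) {
        /* 0010 x 1011 from the slide: 2 x 11 = 22 */
        printf("%llu\n", (unsigned long long)shift_add_multiply(2, 11));
        return 0;
    }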
33 Multiplication Implementation
[Figure: multiplication hardware, showing the datapath and its control.]
34 Final Version
- Multiplier starts in the right half of the product register
[Figure: refined multiply hardware, annotated "What goes here?"]
35 Multiplying Signed Numbers with Booth's Algorithm
- Consider A x B where A and B are signed integers (2's complement format)
- Decompose B into the sum B1 + B2 + ... + Bn
  - A x B = A x (B1 + B2 + ... + Bn)
          = (A x B1) + (A x B2) + ... + (A x Bn)
- Let each Bi be a single string of 1s embedded in 0s
  - e.g., 00111100
- Example
    0110010011100 = 0110000000000
                  + 0000010000000
                  + 0000000011100
36 Booth's Algorithm
- Scanning from right to left, bit number u is the first 1 bit of the string and bit v is the first 0 to the left of the string
        v   u
  Bi =  0 0 1 1 0 0
     =  0 0 1 1 1 1   (2^v - 1)
      - 0 0 0 0 1 1   (2^u - 1)
- So Bi = (2^v - 1) - (2^u - 1) = 2^v - 2^u
37 Booth's Algorithm
- Decomposing B:
  - A x B = A x (B1 + B2 + ...)
          = A x ((2^v1 - 2^u1) + (2^v2 - 2^u2) + ...)
          = (A x 2^v1) - (A x 2^u1) + (A x 2^v2) - (A x 2^u2) + ...
- A x B can be computed by adding and subtracting shifted values of A
- Scan the bits right to left, shifting A once per bit
  - When the bit string changes from 0 to 1, subtract the shifted A from the current product: P - (A x 2^u)
  - When the bit string changes from 1 to 0, add the shifted A to the current product: P + (A x 2^v)
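A C sketch of the recoding just described, on 8-bit two's-complement operands (the width and the test values are illustrative): a 0-to-1 transition subtracts A x 2^u, a 1-to-0 transition adds A x 2^v.

    #include <stdint.h>
    #include <stdio.h>

    static int booth_multiply(int8_t a, int8_t b) {
        int product = 0;
        int prev = 0;                              /* imaginary bit to the right of bit 0 */
        for (int i = 0; i < 8; i++) {
            int cur = (b >> i) & 1;
            if (cur == 1 && prev == 0)             /* 0 -> 1: start of a run of 1s */
                product -= (int)a * (1 << i);      /* subtract A x 2^u             */
            if (cur == 0 && prev == 1)             /* 1 -> 0: end of a run of 1s   */
                product += (int)a * (1 << i);      /* add A x 2^v                  */
            prev = cur;
        }
        /* a run that reaches the sign bit is left open, which is exactly right
           for two's-complement multipliers                                      */
        return product;
    }

    int main(void) {
        printf("%d\n", booth_multiply(-6, 60));    /* 60 = 00111100, one run: -360 */
        printf("%d\n", booth_multiply(7, -3));     /* -21                          */
        return 0;
    }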
38 Floating Point Numbers
- We need a way to represent a wide range of numbers
  - numbers with fractions, e.g., 3.1416
  - large numbers
    - 976,000,000,000,000 = 9.76 x 10^14
  - small numbers
    - 0.0000000000000976 = 9.76 x 10^-14
- Representation
  - sign, exponent, significand
  - (-1)^sign x significand x 2^exponent
  - more bits for the significand gives more accuracy
  - more bits for the exponent increases the range
39 Scientific Notation
- Scientific notation
  - 0.525 x 10^5 = 5.25 x 10^4 = 52.5 x 10^3
  - 5.25 x 10^4 is in normalized scientific notation
    - position of the decimal point is fixed
    - leading digit is non-zero
- Binary numbers
  - 5.25 = 101.01_2 = 1.0101_2 x 2^2
- Binary point
  - multiplication by 2 moves the point to the left
  - division by 2 moves the point to the right
- Known as floating-point format.
40 Binary to Decimal Conversion
Binary:  (-1)^S x (1.b1 b2 b3 b4) x 2^E
Decimal: (-1)^S x (1 + b1·2^-1 + b2·2^-2 + b3·2^-3 + b4·2^-4) x 2^E

Example: -1.1100 x 2^-2 (binary)
  = -(1 + 2^-1 + 2^-2) x 2^-2
  = -(1 + 0.5 + 0.25) / 4
  = -1.75 / 4
  = -0.4375 (decimal)
41 IEEE Std. 754 Floating-Point Format
Single precision (one 32-bit word):
  bit 31      S (sign)
  bits 23-30  E (8-bit exponent)
  bits 0-22   F (23-bit fraction)
Double precision (two 32-bit words):
  bit 31      S (sign)
  bits 20-30  E (11-bit exponent)
  bits 0-19   F (upper 20 bits of the 52-bit fraction)
  second word, bits 0-31: continuation of the 52-bit fraction
42 IEEE 754 Floating-Point Standard
- Represented value = (-1)^sign x (1 + F) x 2^(exponent - bias)
- Exponent is biased (excess-K format) to make sorting easier
  - bias of 127 for single precision and 1023 for double precision
  - stored E values range over 1 .. 254 (0 and 255 are reserved)
  - Range: 2^-126 to 2^127 (about 10^-38 to 10^38)
- Significand is in sign-magnitude, normalized form
  - Significand = (1 + F) = 1.b_{-1} b_{-2} ... b_{-23}
  - Storage of the leading 1 is suppressed
- Overflow: exponent too large to fit in 8 bits. The number can be positive or negative.
- Underflow: exponent too small (too negative) to fit in 8 bits. The number can be positive or negative.
43 IEEE 754 Floating-Point Standard
- Example
  - Decimal: -5.75 = -(4 + 1 + 1/2 + 1/4)
  - Binary: -101.11 = -1.0111 x 2^2
  - Floating-point exponent = 127 + 2 = 129 = 1000 0001
  - IEEE single precision: 1 10000001 01110000000000000000000
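A C sketch that pulls the example apart again: extract S, E, and F from the 32-bit pattern and rebuild (-1)^S x (1 + F) x 2^(E - 127). Special encodings (zero, infinity, NaN, denormals) are ignored here.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        float x = -5.75f;
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);            /* view the 32-bit pattern    */
        uint32_t S = bits >> 31;
        uint32_t E = (bits >> 23) & 0xFF;          /* biased exponent            */
        uint32_t F = bits & 0x7FFFFF;              /* 23-bit fraction            */
        double value = (S ? -1.0 : 1.0) *
                       ldexp(1.0 + F / 8388608.0,  /* significand = 1 + F / 2^23 */
                             (int)E - 127);        /* remove the bias            */
        printf("S=%u E=%u F=0x%06X -> %g\n",
               (unsigned)S, (unsigned)E, (unsigned)F, value);  /* S=1 E=129 ... -5.75 */
        return 0;
    }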
44 Examples
Biased exponent (0-255); the bias of 127 (0111 1111) is subtracted.

   1.1010001 x 2^10100   ->  0 10010011 10100010000000000000000  =  1.6328125 x 2^20
  -1.1010001 x 2^10100   ->  1 10010011 10100010000000000000000  = -1.6328125 x 2^20
   1.1010001 x 2^-10100  ->  0 01101011 10100010000000000000000  =  1.6328125 x 2^-20
  -1.1010001 x 2^-10100  ->  1 01101011 10100010000000000000000  = -1.6328125 x 2^-20

Fraction value: 0.5 + 0.125 + 0.0078125 = 0.6328125
45 Numbers in 32-bit Formats
- Two's complement integers: expressible numbers run from -2^31 to 2^31 - 1, with 0 in between.
- Floating point numbers: expressible negative numbers run from -(2 - 2^-23) x 2^127 to -2^-127, expressible positive numbers from 2^-127 to (2 - 2^-23) x 2^127; magnitudes beyond these are negative/positive overflow, and nonzero magnitudes below 2^-127 are negative/positive underflow.
- Ref: W. Stallings, Computer Organization and Architecture, Sixth Edition, Upper Saddle River, NJ: Prentice-Hall.
46 IEEE 754 Special Codes
Zero
  S 00000000 00000000000000000000000
  - Would otherwise be 1.0 x 2^-127, the smallest positive number in the single-precision format
  - Interpreted as positive/negative zero
  - An exponent less than -127 is positive underflow (regard as zero)
Infinity
  S 11111111 00000000000000000000000
  - Would otherwise be 1.0 x 2^128, the largest positive number in the single-precision format
  - Interpreted as positive/negative infinity
  - If the true exponent is 128 and the fraction is nonzero, the pattern would be "greater than infinity"; it is called Not a Number, or NaN (not interpreted as infinity).
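A C sketch that recognizes the special encodings above purely from the exponent and fraction fields (denormals are lumped in with ordinary numbers here).

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    static const char *classify(float x) {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        uint32_t E = (bits >> 23) & 0xFF, F = bits & 0x7FFFFF;
        if (E == 0 && F == 0)   return "zero";       /* E all 0s, F all 0s  */
        if (E == 255 && F == 0) return "infinity";   /* E all 1s, F all 0s  */
        if (E == 255)           return "NaN";        /* E all 1s, F nonzero */
        return "ordinary number";
    }

    int main(void) {
        float zero = 0.0f;
        printf("%s %s %s\n",
               classify(-zero), classify(1.0f / zero), classify(zero / zero));
        return 0;                                    /* zero infinity NaN   */
    }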
47 Addition and Subtraction
- Addition/subtraction of two floating-point numbers
- Example (align the mantissas first):
      2 x 10^3        0.2 x 10^4
    + 3 x 10^4   ->  + 3   x 10^4
                     ------------
                       3.2 x 10^4
- General case:
    m1 x 2^e1 + m2 x 2^e2 = (m1 + m2 x 2^(e2-e1)) x 2^e1   for e1 > e2
                          = (m1 x 2^(e1-e2) + m2) x 2^e2   for e2 > e1
- Shift the smaller mantissa right by |e1 - e2| bits to align the mantissas.
48 Addition/Subtraction Algorithm
- 0. Zero check
  - Change the sign of the subtrahend
  - If either operand is 0, the other is the result
- 1. Significand alignment: right-shift the smaller significand until the two exponents are identical
- 2. Addition: add the significands and report an exception if overflow occurs
- 3. Normalization
  - Shift significand bits to normalize
  - Report overflow or underflow if the exponent goes out of range
- 4. Rounding
49 FP Add/Subtract (P-H text, Figs. 3.16/3.17)
50 Example
- Subtraction: 0.5_ten - 0.4375_ten
- Step 0: floating-point numbers to be added
  - 1.000_two x 2^-1 and -1.110_two x 2^-2
- Step 1: the significand with the lesser exponent is shifted right until the exponents match
  - -1.110_two x 2^-2  ->  -0.111_two x 2^-1
- Step 2: add significands, 1.000_two + (-0.111_two)
  - Result is 0.001_two x 2^-1
51 Example (Continued)
- Step 3: normalize: 1.000_two x 2^-4
  - No overflow/underflow since 127 >= exponent >= -126
- Step 4: rounding: no change, since the sum fits in 4 bits
- 1.000_two x 2^-4 = 1/16 = 0.0625_ten
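A C sketch that replays the four steps on the example above, keeping each number as a small integer significand (in units of 2^-3) plus an exponent; guard/round bits and general rounding are omitted.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        int m1 = 8,   e1 = -1;    /*  1.000_2 x 2^-1 = 0.5 (8 units of 2^-3) */
        int m2 = -14, e2 = -2;    /* -1.110_2 x 2^-2 = -0.4375               */

        /* Step 1: align - halve the significand with the smaller exponent  */
        while (e2 < e1) { m2 /= 2; e2++; }      /* -14 -> -7: -0.111_2 x 2^-1 */
        while (e1 < e2) { m1 /= 2; e1++; }

        /* Step 2: add significands                                          */
        int m = m1 + m2, e = e1;                /* 8 + (-7) = 1: 0.001_2 x 2^-1 */

        /* Step 3: normalize so the leading 1 sits in the 2^3 place of m     */
        while (m != 0 && -8 < m && m < 8) { m *= 2; e--; }   /* 1.000_2 x 2^-4 */

        /* Step 4: rounding - nothing to do, the result fits in 4 bits       */
        printf("%g\n", ldexp(m / 8.0, e));      /* prints 0.0625              */
        return 0;
    }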
52 FP Multiplication: Basic Idea
- (m1 x 2^e1) x (m2 x 2^e2) = (m1 x m2) x 2^(e1+e2)
- Separate signs
- Add exponents
- Multiply significands
- Normalize, round, check overflow
- Replace sign
53 FP Multiplication Algorithm
- P-H Figure 3.18
54 FP Mult. Illustration
- Multiply 0.5_ten and -0.4375_ten (answer: -0.21875_ten), i.e.,
- Multiply 1.000_two x 2^-1 and -1.110_two x 2^-2
- Step 1: add exponents
  - -1 + (-2) = -3
- Step 2: multiply significands
        1.000
      x 1.110
      -------
         0000
        1000
       1000
      1000
      -------
      1110000      Product is 1.110000
55 FP Mult. Illustration (Cont.)
- Step 3
  - Normalization: if necessary, shift the significand right and increment the exponent
  - Normalized product is 1.110000 x 2^-3
  - Check overflow/underflow: 127 >= exponent >= -126
- Step 4: rounding: 1.110 x 2^-3
- Step 5: sign: the operands have opposite signs, so
  - the product is -1.110 x 2^-3
  - Decimal value: -(1 + 0.5 + 0.25)/8 = -0.21875_ten
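The multiplication steps above in the same toy C representation (4-bit significands held as integers in units of 2^-3); rounding is trivial here because the product already fits.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        int s1 = 0, m1 = 8,  e1 = -1;   /*  1.000_2 x 2^-1                      */
        int s2 = 1, m2 = 14, e2 = -2;   /* -1.110_2 x 2^-2 (sign kept separate) */

        int e = e1 + e2;                /* Step 1: add exponents: -3            */
        int m = m1 * m2;                /* Step 2: 8 x 14 = 112 = 1.110000_2 in units of 2^-6 */
        while (m >= 128) { m /= 2; e++; }   /* Step 3: normalize (not needed here) */
        int s = s1 ^ s2;                /* Step 5: sign of the product          */

        printf("%g\n", (s ? -1.0 : 1.0) * ldexp(m / 64.0, e));   /* -0.21875    */
        return 0;
    }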
56 FP Division: Basic Idea
- Separate sign
- Check for zeros and infinity
- Subtract exponents
- Divide significands
- Normalize, check overflow/underflow
- Round
- Replace sign
57 MIPS Floating Point
- 32 floating-point registers: $f0, ..., $f31
- FP registers are used in pairs for double precision: $f0 denotes the double-precision contents of $f0,$f1
- Data transfer instructions
  - lwc1 $f1, 100($s2)    # $f1 <- Mem[$s2 + 100]
  - swc1 $f1, 100($s2)    # Mem[$s2 + 100] <- $f1
- Arithmetic instructions (xxx = add, sub, mul, div)
  - xxx.s  single precision
  - xxx.d  double precision
58 Floating Point Complexities
- Operations are somewhat more complicated (see text)
- In addition to overflow we can have underflow
- Accuracy can be a big problem
  - IEEE 754 keeps two extra bits, guard and round
  - four rounding modes
  - positive divided by zero yields infinity
  - zero divided by zero yields "not a number"
  - other complexities
- Implementing the standard can be tricky
- Not using the standard can be even worse
  - see the text for a description of the 80x86 and the Pentium bug!
59 Chapter Three Summary
- Computer arithmetic is constrained by limited precision
- Bit patterns have no inherent meaning, but standards do exist
  - two's complement
  - IEEE 754 floating point
- Computer instructions determine the meaning of the bit patterns
- Performance and accuracy are important, so there are many complexities in real machines
- Algorithm choice is important and may lead to hardware optimizations for both space and time (e.g., multiplication)
- You may want to look back (Section 3.10 is great reading!)