Title: Chapter 4: Arithmetic for Computers (Part 1)
1. Chapter 4: Arithmetic for Computers (Part 1)
2. Notes on Project 1
- There are two different ways the following two words can be stored in computer memory:
  - word1: .byte 0, 1, 2, 3
  - word2: .half 0, 1
- One way is big-endian, where the word is stored in memory in its original order:
  - word1: 00 01 02 03
  - word2: 0000 0001
- Another way is little-endian, where the word is stored in memory in reverse byte order:
  - word1: 03 02 01 00
  - word2: 0001 0000
- Of course, this affects the way in which the lw instruction works (a small host-side C illustration follows)
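A minimal host-side C sketch (not SPIM/MIPS code; the value 0x00010203 and the check are illustrative) showing how the same word's bytes land in memory on a little-endian vs. big-endian machine:

```c
#include <stdio.h>
#include <stdint.h>

/* Store the word whose bytes are 0,1,2,3 and look at how they actually sit
 * in memory on this machine. On a little-endian host (e.g. x86) the byte at
 * the lowest address is 0x03; on a big-endian host it is 0x00. */
int main(void) {
    uint32_t word1 = 0x00010203;                  /* conceptually .byte 0,1,2,3 */
    const uint8_t *p = (const uint8_t *)&word1;

    for (int i = 0; i < 4; i++)
        printf("byte at address + %d: %02x\n", i, p[i]);

    printf("this host is %s-endian\n", p[0] == 0x03 ? "little" : "big");
    return 0;
}
```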
3. Notes on Project 1
- MIPS uses the endian style of the architecture underneath it
  - Intel uses little-endian, so we need to deal with that
- This affects assignment 1 because the input data is stored as a series of bytes
- If you use lw on your data set, the values will be loaded into your destination register in reverse byte order
- Hint: Try the lb/sb instructions
  - These instructions load/store a byte at an unaligned address and perform the translation for you
4. Notes on Project 1
- Hint: Use SPIM's breakpoint and single-step features to help debug your program
  - Also, make sure you use the register and memory/stack displays
- Hint: You may want to temporarily store your input set into a word array for sorting
- Make sure you check Appendix A for additional useful instructions that I didn't cover in class
- Make sure you comment your code!
5. Goals of Chapter 4
- Data representation
- Hardware mechanisms for performing arithmetic on data
- Hardware implications on instruction set design
6. Review of Binary Representation
- Binary/Hex -> Decimal conversion
- Decimal -> Binary/Hex conversion
- Least/Most significant bits
- Highest representable number / maximum number of unique representable symbols
- Two's complement representation
- One's complement
- Finding signed number ranges (-2^(n-1) to 2^(n-1) - 1)
- Doing arithmetic with two's complement
- Sign extension with load half/byte
- Unsigned loads
- Signed/unsigned comparison (both sign extension and signed/unsigned comparison are illustrated in the short example below)
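A minimal host-side C sketch (not MIPS code; the values are illustrative) of the last two items, sign extension and signed vs. unsigned comparison:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t  half = -100;                        /* 16-bit pattern 0xFF9C     */
    int32_t  signed_load   = half;               /* like lh: sign-extends     */
    uint32_t unsigned_load = (uint16_t)half;     /* like lhu: zero-extends    */

    printf("sign-extended: %d (0x%08x)\n", signed_load, (uint32_t)signed_load);
    printf("zero-extended: %u (0x%08x)\n", unsigned_load, unsigned_load);

    int32_t  a = -1;                             /* bit pattern 0xFFFFFFFF    */
    uint32_t b = 1;
    printf("signed   -1 < 1: %d\n", a < 1);                 /* 1 (like slt)  */
    printf("unsigned -1 < 1: %d\n", (uint32_t)a < b);       /* 0 (like sltu) */
    return 0;
}
```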
7. Binary Addition/Subtraction
- Binary subtraction works exactly like addition, except the second operand is converted to two's complement
- Overflow in signed arithmetic occurs under the following conditions (a C check for these cases is sketched below):

  Operation | Operand A | Operand B | Result
  A + B     | Positive  | Positive  | Negative
  A + B     | Negative  | Negative  | Positive
  A - B     | Positive  | Negative  | Negative
  A - B     | Negative  | Positive  | Positive
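A minimal C sketch of the overflow conditions in the table, assuming 32-bit two's complement operands (function names are illustrative, not from the slides):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Signed overflow is detected purely from the sign bits of the operands
 * and the result, exactly as in the table above. */
static bool add_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);   /* wrap-around add  */
    return ((a ^ r) & (b ^ r)) < 0;   /* same-sign operands, different-sign result */
}

static bool sub_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);   /* wrap-around sub  */
    return ((a ^ b) & (a ^ r)) < 0;   /* operands differ in sign, result differs from A */
}

int main(void) {
    printf("%d\n", add_overflows(0x7FFFFFFF, 1));   /* 1: pos + pos -> neg */
    printf("%d\n", sub_overflows(-5, 3));           /* 0: no overflow      */
    return 0;
}
```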
8. What Happens When Overflow Occurs?
- MIPS detects overflow with an exception/interrupt
- When an interrupt occurs, a branch occurs to code in the kernel at address 0x80000080, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt
- SPIM has a simple built-in interrupt handler that deals with interrupts
- We may come back to interrupts later
9. Review of Shift and Logical Operations
- MIPS has operations for SLL, SRL, and SRA
  - We covered these in the last chapter
- MIPS implements bitwise AND, OR, and XOR logical operations
  - These operations perform a bit-by-bit parallel logical operation on two registers
- In C, use << and >> for shifts, and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively (see the example below)
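A minimal C sketch of these operators next to their MIPS counterparts (values are illustrative; the arithmetic right shift relies on the usual compiler behavior for signed >>):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t x = 0xF0000003;

    printf("x << 4  = 0x%08x\n", x << 4);                      /* sll            */
    printf("x >> 4  = 0x%08x\n", x >> 4);                      /* srl (unsigned) */
    printf("x sra 4 = 0x%08x\n", (uint32_t)((int32_t)x >> 4)); /* sra (sign bits copy in) */

    uint32_t a = 0x0000FF00, b = 0x00F0F0F0;
    printf("a & b = 0x%08x\n", a & b);                         /* and */
    printf("a | b = 0x%08x\n", a | b);                         /* or  */
    printf("a ^ b = 0x%08x\n", a ^ b);                         /* xor */
    printf("~a    = 0x%08x\n", ~a);                            /* NOT (MIPS: nor with $zero) */
    return 0;
}
```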
10. Review of Logic Operations
- The three main parts of a CPU:
  - ALU (Arithmetic and Logic Unit)
    - Performs all logical, arithmetic, and shift operations
  - CU (Control Unit)
    - Controls the CPU; performs load/store, branch, and instruction fetch
  - Registers
    - Physical storage locations for data
11. Review of Logic Operations
- In this chapter, our goal is to learn how the ALU is implemented
- The ALU is constructed entirely from boolean functions used as hardware building blocks
- The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT
- These functions can be directly implemented using electric circuits (wires and transistors)
12. Review of Logic Operations
- These combinational logic devices can be assembled to create a much more complex digital logic system

  A B | A AND B
  0 0 |    0
  0 1 |    0
  1 0 |    0
  1 1 |    1

  A B | A OR B
  0 0 |   0
  0 1 |   1
  1 0 |   1
  1 1 |   1

  A | NOT A
  0 |   1
  1 |   0
13. Review of Logic Operations
- We need another device to build an ALU
- This is called a multiplexor; it implements an if-then-else in hardware (D selects whether input A or input B appears on the output C)

  A B D | C (out)
  0 0 0 | 0 (A)
  0 0 1 | 0 (B)
  0 1 0 | 0 (A)
  0 1 1 | 1 (B)
  1 0 0 | 1 (A)
  1 0 1 | 0 (B)
  1 1 0 | 1 (A)
  1 1 1 | 1 (B)
14. A 1-bit ALU
- Perform the logic operations in parallel and mux the output
- Next, we want to include addition, so let's build a single-bit adder
  - Called a full adder
15. Full Adder
- From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder (a C sketch follows the table)
- We can also add this input to our ALU
- How do we give subtraction ability to our adder?
- How do we detect overflow and zero results?

  A B CarryIn | CarryOut Sum | Comment
  0 0 0       |    0      0  | 0 + 0 + 0 = 00
  0 0 1       |    0      1  | 0 + 0 + 1 = 01
  0 1 0       |    0      1  | 0 + 1 + 0 = 01
  0 1 1       |    1      0  | 0 + 1 + 1 = 10
  1 0 0       |    0      1  | 1 + 0 + 0 = 01
  1 0 1       |    1      0  | 1 + 0 + 1 = 10
  1 1 0       |    1      0  | 1 + 1 + 0 = 10
  1 1 1       |    1      1  | 1 + 1 + 1 = 11
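A minimal C sketch of the full-adder equations implied by this table, chained into a small ripple-carry adder (widths and names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Full adder: Sum is 1 when an odd number of inputs is 1; CarryOut is 1
 * when two or more inputs are 1 (matches the table above). */
static void full_adder(int a, int b, int carry_in, int *sum, int *carry_out) {
    *sum       = a ^ b ^ carry_in;
    *carry_out = (a & b) | (a & carry_in) | (b & carry_in);
}

int main(void) {
    unsigned a = 0xB, b = 0x6;          /* 1011 + 0110                    */
    unsigned result = 0;
    int carry = 0;

    for (int i = 0; i < 4; i++) {       /* ripple the carry from bit 0 up */
        int sum;
        full_adder((a >> i) & 1, (b >> i) & 1, carry, &sum, &carry);
        result |= (unsigned)sum << i;
    }
    printf("0x%X + 0x%X = 0x%X, carry out = %d\n", a, b, result, carry);
    return 0;
}
```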
16. Chapter 4: Arithmetic for Computers (Part 2)
17. Logic/Arithmetic
- From the truth table for the mux, we can use sum-of-products to derive the logic equation
- With sum-of-products, for each row where the output is 1, we AND together all the inputs (inverting the inputs that are 0), then OR together all the row products
- To make it simpler, let's add "don't cares" to the table
18. Logic/Arithmetic

  A B D | C (out)
  0 X 0 | 0 (A)
  X 0 1 | 0 (B)
  1 X 0 | 1 (A)
  X 1 1 | 1 (B)

- This gives us the following equation (sketched in C below):
  - C = (A AND (NOT D)) OR (B AND D)
- We don't need the inputs marked as don't cares in our product terms
- This is one way to simplify our logic equation
  - Other ways include propositional calculus, Karnaugh maps, and the Quine-McCluskey algorithm
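A minimal C sketch of the simplified mux equation, checked against all input combinations (function name is illustrative):

```c
#include <stdio.h>

/* The simplified sum-of-products equation for the 2-to-1 mux, evaluated
 * on single-bit values (0 or 1): C = (A AND NOT D) OR (B AND D). */
static int mux2(int a, int b, int d) {
    return (a & !d) | (b & d);
}

int main(void) {
    /* D = 0 selects A, D = 1 selects B. */
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int d = 0; d <= 1; d++)
                printf("A=%d B=%d D=%d -> C=%d\n", a, b, d, mux2(a, b, d));
    return 0;
}
```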
19. Logic/Arithmetic
- Here is a (crude) digital logic design for the 2-to-1 mux
- Note that multiple muxes can be assembled in stages to implement multiple-input muxes
20. Logic/Arithmetic
- For the adder, let's minimize the logic using a Karnaugh map
- For CarryOut, we need 2^3 entries
- We can minimize this to:
  - CarryOut = A·B + A·CarryIn + B·CarryIn

  CarryIn \ AB | 00  01  11  10
       0       |  0   0   1   0
       1       |  0   1   1   1
21. Logic/Arithmetic
- There's no way to minimize this equation, so we need the full sum of products:
  - Sum = (NOT A)·(NOT B)·CarryIn + A·B·CarryIn + (NOT A)·B·(NOT CarryIn) + A·(NOT B)·(NOT CarryIn)

  CarryIn \ AB | 00  01  11  10
       0       |  0   1   0   1
       1       |  1   0   1   0
22. Logic/Arithmetic
- In order to implement subtraction, we can invert the B input to the adder and set CarryIn to 1
  - This can be implemented with a mux: select B or NOT B (call this select input Binvert)
- Now we can build a 1-bit ALU with AND, OR, addition, and subtraction operations
- We can perform the AND, OR, and ADD in parallel and switch the results with a 4-input mux (Operation will be our D input)
- To make the adder a subtractor, we'll need to set both Binvert and CarryIn to 1 (a C sketch of this 1-bit ALU follows)
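A minimal C sketch of this 1-bit ALU (behavioral, not gate-level; the Less input of the full design is omitted, and names are illustrative):

```c
#include <stdio.h>

/* AND, OR, and a full adder computed in parallel, with a mux ("op")
 * selecting the result. Binvert = 1 together with carry_in = 1 turns
 * the adder into a subtractor. */
static int alu1(int a, int b, int binvert, int carry_in, int op, int *carry_out) {
    int b_in  = binvert ? !b : b;             /* 2-to-1 mux on the B input  */
    int and_r = a & b_in;
    int or_r  = a | b_in;
    int sum   = a ^ b_in ^ carry_in;          /* full-adder Sum             */
    *carry_out = (a & b_in) | (a & carry_in) | (b_in & carry_in);

    switch (op) {                             /* result mux                 */
    case 0:  return and_r;                    /* 00: AND                    */
    case 1:  return or_r;                     /* 01: OR                     */
    default: return sum;                      /* 10: ADD (SUB with Binvert) */
    }
}

int main(void) {
    int cout;
    int r = alu1(1, 1, 1, 1, 2, &cout);       /* compute 1 - 1 on one bit   */
    printf("result=%d carry_out=%d\n", r, cout);   /* result=0 carry_out=1  */
    return 0;
}
```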
23. Lecture 4: Arithmetic for Computers (Part 3)
24. Chapter 4 Review
- So far, we've covered the following topics for this chapter:
  - Binary representation of signed integers
  - 16- to 32-bit signed conversion
  - Binary addition/subtraction
  - Overflow detection / overflow exception handling
  - Shift and logical operations
  - Parts of the CPU
  - AND, OR, XOR, and inverter gates
  - Multiplexor (mux) and full adder
  - Sum-of-products logic equations (truth tables)
  - Logic minimization techniques
    - Don't cares and Karnaugh maps
25. 1-bit ALU Design
- A 1-bit ALU can be constructed
- Components
  - AND, OR, and adder
  - 4-to-1 mux
  - Binverter (inverter and 2-to-1 mux)
- Interface
  - Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less
  - Outputs: CarryOut and Result
- Digital functions are performed in parallel and the outputs are routed into a mux
- The mux will also accept a Less input, which we'll accept from outside the 1-bit ALU
- The select lines of the mux make up the Operation input to the ALU
26. 32-bit ALU
- In order to create a multi-bit ALU, array 32 1-bit ALUs
- Connect the CarryOut of each bit to the CarryIn of the next bit
- A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B
- The Result outputs of each 1-bit ALU will form the 32-bit result
- We need to add an SLT unit and connect its output to the least significant 1-bit ALU's Less input
  - Hardwire the other Less inputs to 0
- We need to add an Overflow unit
- We need to add a Zero detection unit
27. SLT Unit
- To compute SLT, we need to make sure that when the 1-bit ALUs' Operation is set to 11, a subtract operation is also being computed
- With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and the Result (a small sketch follows)

  Asign Bsign Rsign | Less
    0     0     0   |  0
    0     0     1   |  1
    0     1     X   |  0
    1     0     X   |  1
    1     1     0   |  0
    1     1     1   |  1
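A minimal C sketch of the Less logic in this table (names are illustrative): when the signs of A and B differ, the negative operand is smaller; when they match, A - B cannot overflow, so the sign of the result decides.

```c
#include <stdio.h>

static int slt_less(int a_sign, int b_sign, int r_sign) {
    /* Signs differ: A negative -> 1, B negative -> 0.
     * Signs match: use the sign of A - B. */
    return (a_sign != b_sign) ? a_sign : r_sign;
}

int main(void) {
    /* A = -2, B = 3: signs differ and A is negative -> Less = 1. */
    printf("%d\n", slt_less(1, 0, 1));   /* prints 1 */
    return 0;
}
```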
28. Overflow Unit
- When doing signed arithmetic, we need to follow this table, as we covered previously
- How do we implement this in hardware?

  Operation | Operand A | Operand B | Result
  A + B     | Positive  | Positive  | Negative
  A + B     | Negative  | Negative  | Positive
  A - B     | Positive  | Negative  | Negative
  A - B     | Negative  | Positive  | Positive
29. Overflow Unit
- We need a truth table
- Since we'll be computing the logic equation with SOP, we only need the rows where the output is 1 (a C sketch follows)

  Operation  | A(31) | B(31) | R(31) | Overflow
  010 (add)  |   0   |   0   |   1   |    1
  010 (add)  |   1   |   1   |   0   |    1
  110 (sub)  |   0   |   1   |   1   |    1
  110 (sub)  |   1   |   0   |   0   |    1
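A minimal C sketch of this truth table as a sum of products over the sign bits ('sub' distinguishes the subtract/slt operations from add; names are illustrative):

```c
#include <stdio.h>
#include <stdbool.h>

static bool overflow_unit(bool sub, bool a31, bool b31, bool r31) {
    bool add_ovf = !sub && ((!a31 && !b31 &&  r31) ||    /* pos + pos -> neg */
                            ( a31 &&  b31 && !r31));     /* neg + neg -> pos */
    bool sub_ovf =  sub && ((!a31 &&  b31 &&  r31) ||    /* pos - neg -> neg */
                            ( a31 && !b31 && !r31));     /* neg - pos -> pos */
    return add_ovf || sub_ovf;
}

int main(void) {
    /* 0x7FFFFFFF + 1: sign bits A=0, B=0, Result=1 -> overflow */
    printf("%d\n", overflow_unit(false, 0, 0, 1));       /* prints 1 */
    return 0;
}
```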
30. Zero Detection Unit
- NOR together all the 1-bit ALU Result outputs (OR them, then invert); the result is the Zero output of the ALU
31. 32-bit ALU Operation
- We need a 3-bit ALU Operation input into our 32-bit ALU
- The two least significant bits can be routed into all the 1-bit ALUs internally
- The most significant bit can be routed into the least significant 1-bit ALU's CarryIn and to Binvert of all the 1-bit ALUs
32. 32-bit ALU Operation
- Here's the final ALU Operation table (a behavioral C sketch follows):

  ALU Operation | Function
  000           | and
  001           | or
  010           | add
  110           | subtract
  111           | set on less than
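A minimal word-level C sketch of the 32-bit ALU behavior in this table (behavioral, not a gate-level model; names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t result;
    bool zero;
    bool overflow;
} alu_out_t;

static alu_out_t alu32(uint32_t a, uint32_t b, unsigned op) {
    alu_out_t out = {0, false, false};
    switch (op) {
    case 0x0: out.result = a & b; break;                 /* 000: and */
    case 0x1: out.result = a | b; break;                 /* 001: or  */
    case 0x2:                                            /* 010: add */
        out.result   = a + b;
        out.overflow = ((a ^ out.result) & (b ^ out.result)) >> 31;
        break;
    case 0x6:                                            /* 110: subtract */
        out.result   = a - b;
        out.overflow = ((a ^ b) & (a ^ out.result)) >> 31;
        break;
    case 0x7:                                            /* 111: set on less than */
        out.result = (int32_t)a < (int32_t)b;
        break;
    }
    out.zero = (out.result == 0);
    return out;
}

int main(void) {
    alu_out_t r = alu32(7, 7, 0x6);                      /* 7 - 7 */
    printf("result=%u zero=%d overflow=%d\n", r.result, r.zero, r.overflow);
    return 0;
}
```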
33. 32-bit ALU
- In the end, our ALU will have the following interface:
- Inputs
  - A and B (32 bits each)
  - ALU Operation (3 bits)
- Outputs
  - CarryOut (1 bit)
  - Zero (1 bit)
  - Result (32 bits)
  - Overflow (1 bit)
34. Carry Lookahead
- The adder architecture we previously looked at requires 2n gate delays to compute its result (worst case)
  - The longest path that a digital signal must propagate through is called the critical path
  - This is WAAAYYYY too slow!
- There are other ways to build an adder that require about lg n gate delays
- Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic)
- Obviously, in the case of a 64-input system, the resulting design will be too big and too complex
35. Carry Lookahead
- For example, we can easily see that the CarryIn for bit 1 is computed as:
  - c1 = a0·b0 + a0·c0 + b0·c0
  - c2 = a1·b1 + a1·c1 + b1·c1
- Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays:
  - c2 = a1·b1 + a1·a0·b0 + a1·a0·c0 + a1·b0·c0 + b1·a0·b0 + b1·a0·c0 + b1·b0·c0
  - I used the distributive law of boolean logic to compute this
- As you can see, the CarryIn logic gets bigger and bigger for consecutive bits
36. Carry Lookahead
- Carry lookahead adders are faster than ripple-carry adders
- Recall:
  - ci+1 = ai·bi + ai·ci + bi·ci
- ci can be factored out:
  - ci+1 = ai·bi + (ai + bi)·ci
- So:
  - c2 = a1·b1 + (a1 + b1)·(a0·b0 + (a0 + b0)·c0)
37. Carry Lookahead
- Note the repeated appearance of (ai·bi) and (ai + bi)
- They are called generate (gi) and propagate (pi):
  - gi = ai·bi, pi = ai + bi
  - ci+1 = gi + pi·ci
- This means if gi = 1, a CarryOut is generated
- If pi = 1, a CarryOut is propagated from CarryIn
38. Carry Lookahead
- c1 = g0 + p0·c0
- c2 = g1 + p1·g0 + p1·p0·c0
- c3 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
- c4 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0 + p3·p2·p1·p0·c0
- This system will give us an adder with 5 gate delays, but it is still too complex (a C sketch of these equations follows)
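A minimal C sketch of 4-bit carry lookahead using the g/p equations above (values are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    unsigned a = 0xB, b = 0x6, c0 = 0;        /* 1011 + 0110, no carry in */
    unsigned g[4], p[4], c[5];

    for (int i = 0; i < 4; i++) {
        g[i] = (a >> i) & (b >> i) & 1;       /* generate:  gi = ai AND bi */
        p[i] = ((a >> i) | (b >> i)) & 1;     /* propagate: pi = ai OR  bi */
    }
    /* All carries computed directly from c0 instead of rippling. */
    c[0] = c0;
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
               | (p[3] & p[2] & p[1] & p[0] & c[0]);

    unsigned sum = 0;
    for (int i = 0; i < 4; i++)               /* sum bit i = ai XOR bi XOR ci */
        sum |= ((((a >> i) ^ (b >> i)) & 1) ^ c[i]) << i;

    printf("0x%X + 0x%X = 0x%X, carry out = %u\n", a, b, sum | (c[4] << 4), c[4]);
    return 0;
}
```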
39. Carry Lookahead
- To solve this, we'll build our adder using 4-bit adders with carry lookahead, and connect them using super-propagate and super-generate logic
- The super-propagate is only true if all the bits propagate a carry:
  - P0 = p0·p1·p2·p3
  - P1 = p4·p5·p6·p7
  - P2 = p8·p9·p10·p11
  - P3 = p12·p13·p14·p15
40. Carry Lookahead
- The super-generate follows a similar equation:
  - G0 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0
  - G1 = g7 + p7·g6 + p7·p6·g5 + p7·p6·p5·g4
  - G2 = g11 + p11·g10 + p11·p10·g9 + p11·p10·p9·g8
  - G3 = g15 + p15·g14 + p15·p14·g13 + p15·p14·p13·g12
- The super-generate and super-propagate logic for the four 4-bit carry lookahead adders is contained in a Carry Lookahead Unit
- This yields a worst-case delay of 7 gate delays
  - Reason?
41. Carry Lookahead
- We've covered all ALU functions except for the shifter
- We'll talk about the shifter later
42. Lecture 4: Arithmetic for Computers (Part 4)
43. Binary Multiplication
- In multiplication, the first operand is called the multiplicand, and the second is called the multiplier
- The result is called the product
- Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we'll get an (n + m)-bit product
44. Binary Multiplication
- Binary multiplication works exactly like decimal multiplication
- In fact, multiply 100101 by 111001 and pretend you're using decimal numbers
45. First Hardware Design for Multiplier
- Note that the multiplier is not routed into the ALU
46. Second Hardware Design for Multiplier
- Architects realized that, at the least, half of the bits in the multiplicand register were 0
- Reduce the ALU to 32 bits and shift the product right instead of shifting the multiplicand left
- In this case, only 32 bits of the product go through the ALU at a time
47. Second Hardware Design for Multiplier
48. Final Hardware Design for Multiplier
- Let's combine the product register with the multiplier register
- Put the multiplier in the right half of the product register and initialize the left half with zeros; when we're done, the product will be in the right half
49. Final Hardware Design for Multiplier
50. Final Hardware Design for Multiplier
- For the first two designs, the multiplicand and the multiplier must be converted to positive values
  - The signs would need to be remembered so the product can be converted to whatever sign it needs to be
- The third design will deal with signed numbers, as long as the sign bit is extended in the product register (a C sketch of the shift-and-add algorithm follows)
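A minimal C sketch of the shift-and-add multiplier, unsigned only for brevity (the add is done after the shift here, which is equivalent and sidesteps the carry-out of the 32-bit add; names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* A 64-bit product register whose right half starts out holding the
 * multiplier. Each of the 32 steps adds the multiplicand into the upper
 * bits when the low bit is 1, and shifts the register right. */
static uint64_t multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = multiplier;            /* right half = multiplier    */

    for (int step = 0; step < 32; step++) {
        int add_now = product & 1;            /* test the low bit           */
        product >>= 1;                        /* shift product right        */
        if (add_now)
            product += (uint64_t)multiplicand << 31;   /* add into left half */
    }
    return product;
}

int main(void) {
    printf("%llu\n", (unsigned long long)multiply(100000, 300000));  /* 30000000000 */
    return 0;
}
```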
51. Booth's Algorithm
- Booth's Algorithm starts with the observation that if we have the ability to both add and subtract, there are multiple ways to compute a product
- For every 0 in the multiplier, we shift the multiplicand
- For every 1 in the multiplier, we add the multiplicand to the product, then shift the multiplicand
52. Booth's Algorithm
- Instead, when a 1 is seen in the multiplier, subtract instead of add
- Shift for all 1s after this, until the first 0 is seen, then add
- The method was developed because in Booth's era, shifters were faster than adders
53. Booth's Algorithm
- Example: 0010 (2) × 0110 (6), scanning the multiplier from least significant bit to most significant:
  - bit 0 = 0: shift (contributes 0)
  - bit 1 = 1 (first 1): subtract the multiplicand shifted left by 1: -(2 × 2^1) = -4
  - bit 2 = 1 (second 1): shift only
  - bit 3 = 0 (first 0): add the multiplicand shifted left by 3: +(2 × 2^3) = +16
  - -4 + 16 = 12 = 2 × 6
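A minimal C sketch of Booth's algorithm on 32-bit signed operands (names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Scan the multiplier from LSB to MSB, looking at each bit together with
 * the bit to its right (initially 0). A 0->1 transition starts a run of 1s
 * (subtract); a 1->0 transition ends it (add); otherwise just shift. */
static int64_t booth_multiply(int32_t multiplicand, int32_t multiplier) {
    int64_t product = 0;
    int prev_bit = 0;

    for (int i = 0; i < 32; i++) {
        int bit = (multiplier >> i) & 1;
        if (bit == 1 && prev_bit == 0)               /* first 1 of a run    */
            product -= multiplicand * ((int64_t)1 << i);
        else if (bit == 0 && prev_bit == 1)          /* first 0 after a run */
            product += multiplicand * ((int64_t)1 << i);
        prev_bit = bit;
    }
    return product;
}

int main(void) {
    printf("%lld\n", (long long)booth_multiply(2, 6));    /* 12  */
    printf("%lld\n", (long long)booth_multiply(-7, 13));  /* -91 */
    return 0;
}
```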
54. Lecture 4: Arithmetic for Computers (Part 5)
55. Binary Division
- Like last lecture, we'll start with some basic terminology
- Again, let's assume our numbers are base 10, but let's only use 0s and 1s
56. Binary Division
- Recall:
  - Dividend = Quotient × Divisor + Remainder
- Let's assume that both the dividend and divisor are positive, and hence the quotient and the remainder are nonnegative
- The division operands and both results are 32-bit values, and we will ignore the sign for now
57. First Hardware Design for Divider
- Initialize the Quotient register to 0, initialize the left half of the Divisor register with the divisor, and initialize the Remainder register with the dividend (right-aligned)
58. Second Hardware Design for Divider
- Much like with the multiplier, the divisor and ALU can be reduced to 32 bits if we shift the remainder left instead of shifting the divisor right
- Also, the algorithm must be changed so the remainder is shifted left before the subtraction takes place
59. Third Hardware Design for Divider
- Shift the bits of the quotient into the remainder register
- Also, the last step of the algorithm is to shift the left half of the remainder right by 1 bit (a C sketch of this restoring-division scheme follows)
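A minimal C sketch of this combined remainder/quotient scheme, unsigned only and assuming a nonzero divisor (names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* A 64-bit register starts with the dividend in its right half. Each of
 * the 32 steps shifts the register left, tries to subtract the divisor
 * from the left half, and shifts in a quotient bit (1 if the subtraction
 * succeeded, 0 after restoring). */
static void divide(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint64_t r = dividend;                       /* right half = dividend   */

    for (int step = 0; step < 32; step++) {
        r <<= 1;                                 /* shift remainder/quotient left */
        uint32_t upper = (uint32_t)(r >> 32);    /* left half               */
        if (upper >= divisor) {                  /* subtraction would not go negative */
            upper -= divisor;
            r = ((uint64_t)upper << 32) | (uint32_t)r | 1;  /* quotient bit = 1 */
        }                                        /* else: restore, bit stays 0 */
    }
    *quotient  = (uint32_t)r;                    /* right half: quotient    */
    *remainder = (uint32_t)(r >> 32);            /* left half: remainder    */
}

int main(void) {
    uint32_t q, rem;
    divide(100, 7, &q, &rem);
    printf("100 / 7 = %u remainder %u\n", q, rem);    /* 14 remainder 2 */
    return 0;
}
```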
60. Signed Division
- Simplest solution: remember the signs of the divisor and the dividend, and then negate the quotient if the signs disagree
- The dividend and the remainder must have the same sign
61. Considerations
- The same hardware can be used for both multiply and divide
- Requirement: a 64-bit register that can shift left or right, and a 32-bit ALU that can add or subtract
62. Floating Point
- Floating point (also called real) numbers are used to represent values that are fractional or that are too big to fit in a 32-bit integer
- Floating point numbers are expressed in scientific notation (base 2) and are normalized (no leading 0s)
  - 1.xxxx × 2^yyyy (in base 2)
- In this case, xxxx is the significand and yyyy is the exponent
63. Floating Point
- In MIPS, a floating point number is represented in the following manner (IEEE 754 standard):
  - bit 31: sign of significand
  - bits 30..23 (8 bits): exponent (in biased notation; see below)
  - bits 22..0 (23 bits): significand
- Note that the sizes of the exponent and significand must be traded off: accuracy vs. range
- This gives us a representation for signed numbers roughly as small as 2×10^-38 and as large as 2×10^38
- Overflow and underflow must be detected
- Double-precision floating point numbers are 2 words: the significand is extended to 52 bits and the exponent to 11 bits
- Also, the first bit of the significand is implicit (only the fractional part is specified)
- In order to represent 0 in a float, put 0 in the exponent field
- So here's the equation we use (a C sketch of unpacking these fields follows):
  - (-1)^S × (1 + Significand) × 2^E
  - Or: (-1)^S × (1 + s1×2^-1 + s2×2^-2 + s3×2^-3 + s4×2^-4 + ...) × 2^E
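A minimal C sketch of unpacking a single-precision value into these fields and rebuilding it with the formula, for normal (non-zero, non-special) values only (the bias of 127 anticipates the next slide; names and the sample value are illustrative):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    float f = -6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret the 32 bits   */

    uint32_t sign     = bits >> 31;            /* bit 31                    */
    uint32_t exponent = (bits >> 23) & 0xFF;   /* bits 30..23               */
    uint32_t fraction = bits & 0x7FFFFF;       /* bits 22..0                */

    /* (-1)^S x (1 + fraction/2^23) x 2^(exponent - 127) */
    double mantissa = 1.0 + fraction / 8388608.0;
    double value = (sign ? -1.0 : 1.0) * ldexp(mantissa, (int)exponent - 127);

    printf("sign=%u exponent=%u fraction=0x%06x -> %g\n",
           sign, exponent, fraction, value);   /* prints ... -> -6.25       */
    return 0;
}
```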
64. Considerations
- IEEE 754 sought to make floating-point numbers easier to sort
  - The sign is the first bit
  - The exponent comes before the significand
- But we want an all-0 exponent to represent the most negative exponent and an all-1 exponent to be the most positive
- This is called biased notation, so we'll use the following equation:
  - (-1)^S × (1 + Significand) × 2^(Exponent - Bias)
- Bias is 127 for single precision and 1023 for double precision
65. Lecture 4: Arithmetic for Computers (Part 6)
66. Converting Decimal Floating Point to Binary
- Use the method I showed last lecture...
- Significand:
  - Use the iterative method to convert the fractional part to binary
  - Convert the integer part to binary using the old-fashioned method
  - Shift the binary point to the left until the number is normalized
  - Drop the leading 1, and set the exponent to be the number of positions you shifted the binary point
  - Adjust the exponent for the bias (127/1023); a short worked example follows
67. Floating Point Addition
- Let's add two decimal floating point numbers...
- Let's try 9.999 × 10^1 + 1.610 × 10^-1
- Assume we can only store 4 digits of the significand and two digits of the exponent
68. Floating Point Addition
- Match the exponents of both operands by un-normalizing one of them
  - Match to the exponent of the larger number
- Add the significands
- Normalize the result
- Round the significand (the example from the previous slide is worked through these steps below)
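Working 9.999 × 10^1 + 1.610 × 10^-1 through these steps with a 4-digit significand (the arithmetic here is computed for these notes):
- Un-normalize the smaller operand: 1.610 × 10^-1 = 0.01610 × 10^1, which becomes 0.016 × 10^1 with only 4 significand digits
- Add the significands: 9.999 + 0.016 = 10.015, giving 10.015 × 10^1
- Normalize: 1.0015 × 10^2
- Round the significand to 4 digits: 1.002 × 10^2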
69. Binary Floating Point Addition
70. Floating Point Multiplication
- Example: 1.110 × 10^10 × 9.200 × 10^-5
- Assume 4 digits for the significand and 2 digits for the exponent
- Calculate the exponent of the product by simply adding the exponents of the operands
  - 10 + (-5) = 5
- Bias the exponents
  - 137 + 122 = 259
- Something's wrong! We added the biases along with the exponents...
  - The correct biased exponent is 5 + 127 = 132
71. Floating Point Multiplication
- Multiply the significands...
  - 1.110 × 9.200 = 10.212000
- Normalize and add 1 to the exponent
  - 1.0212 × 10^6
- Round the significand to four digits
  - 1.021
- Set the sign based on the signs of the operands
  - +1.021 × 10^6
72. Floating Point Multiplication
73. Accurate Arithmetic
- Integers can represent every value between the largest and smallest possible values
- This is not the case with floating point
  - Only 2^53 unique significand values can be represented in double-precision fp
- IEEE 754 always keeps 2 extra bits on the right of the significand during intermediate calculations, called guard and round, to minimize rounding errors
74. Accurate Arithmetic
- Since the worst case for rounding would be when the actual number is halfway between two floating point representations, accuracy is measured as the number of least-significant error bits
  - This is called units in the last place (ulp)
- IEEE 754 guarantees that the computer result is within 0.5 ulp (using guard and round)