Title: Chapter 4: Arithmetic for Computers (Part 1)
1. Chapter 4: Arithmetic for Computers (Part 1)
2. Notes on Project 1
- There are two different ways the following two words can be stored in computer memory:
  - word1: .byte 0, 1, 2, 3
  - word2: .half 0, 1
- One way is big-endian, where the word is stored in memory in its original order:
  - word1: 00 01 02 03
  - word2: 0000 0001
- Another way is little-endian, where the word is stored in memory in reverse byte order:
  - word1: 03 02 01 00
  - word2: 0001 0000
- Of course, this affects the way in which the lw instruction works (a small host-side C illustration follows)
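A minimal host-side C sketch (not SPIM/MIPS code; the value 0x00010203 and the check are illustrative) showing how the same word's bytes land in memory on a little-endian vs. big-endian machine:

```c
#include <stdio.h>
#include <stdint.h>

/* Store the word whose bytes are 0,1,2,3 and look at how they actually sit
 * in memory on this machine. On a little-endian host (e.g. x86) the byte at
 * the lowest address is 0x03; on a big-endian host it is 0x00. */
int main(void) {
    uint32_t word1 = 0x00010203;                  /* conceptually .byte 0,1,2,3 */
    const uint8_t *p = (const uint8_t *)&word1;

    for (int i = 0; i < 4; i++)
        printf("byte at address + %d: %02x\n", i, p[i]);

    printf("this host is %s-endian\n", p[0] == 0x03 ? "little" : "big");
    return 0;
}
```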
3. Notes on Project 1
- MIPS uses the endian style of the architecture underneath it
  - Intel uses little-endian, so we need to deal with that
- This affects assignment 1 because the input data is stored as a series of bytes
- If you use lw on your data set, the values will be loaded into your destination register in reverse byte order
- Hint: Try the lb/sb instructions
  - These instructions load/store a byte at an unaligned address and perform the translation for you
4. Notes on Project 1
- Hint: Use SPIM's breakpoint and single-step features to help debug your program
  - Also, make sure you use the register and memory/stack displays
- Hint: You may want to temporarily store your input set into a word array for sorting
- Make sure you check Appendix A for additional useful instructions that I didn't cover in class
- Make sure you comment your code!
5. Goals of Chapter 4
- Data representation
- Hardware mechanisms for performing arithmetic on data
- Hardware implications on instruction set design
6. Review of Binary Representation
- Binary/Hex -> Decimal conversion
- Decimal -> Binary/Hex conversion
- Least/Most significant bits
- Highest representable number / maximum number of unique representable symbols
- Two's complement representation
- One's complement
- Finding signed number ranges (-2^(n-1) to 2^(n-1) - 1)
- Doing arithmetic with two's complement
- Sign extension with load half/byte
- Unsigned loads
- Signed/unsigned comparison (both sign extension and signed/unsigned comparison are illustrated in the short example below)
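A minimal host-side C sketch (not MIPS code; the values are illustrative) of the last two items, sign extension and signed vs. unsigned comparison:

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t  half = -100;                        /* 16-bit pattern 0xFF9C     */
    int32_t  signed_load   = half;               /* like lh: sign-extends     */
    uint32_t unsigned_load = (uint16_t)half;     /* like lhu: zero-extends    */

    printf("sign-extended: %d (0x%08x)\n", signed_load, (uint32_t)signed_load);
    printf("zero-extended: %u (0x%08x)\n", unsigned_load, unsigned_load);

    int32_t  a = -1;                             /* bit pattern 0xFFFFFFFF    */
    uint32_t b = 1;
    printf("signed   -1 < 1: %d\n", a < 1);                 /* 1 (like slt)  */
    printf("unsigned -1 < 1: %d\n", (uint32_t)a < b);       /* 0 (like sltu) */
    return 0;
}
```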
7. Binary Addition/Subtraction
- Binary subtraction works exactly like addition, except the second operand is converted to two's complement
- Overflow in signed arithmetic occurs under the following conditions (a C check for these cases is sketched below):

  Operation | Operand A | Operand B | Result
  A + B     | Positive  | Positive  | Negative
  A + B     | Negative  | Negative  | Positive
  A - B     | Positive  | Negative  | Negative
  A - B     | Negative  | Positive  | Positive
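A minimal C sketch of the overflow conditions in the table, assuming 32-bit two's complement operands (function names are illustrative, not from the slides):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Signed overflow is detected purely from the sign bits of the operands
 * and the result, exactly as in the table above. */
static bool add_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);   /* wrap-around add  */
    return ((a ^ r) & (b ^ r)) < 0;   /* same-sign operands, different-sign result */
}

static bool sub_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);   /* wrap-around sub  */
    return ((a ^ b) & (a ^ r)) < 0;   /* operands differ in sign, result differs from A */
}

int main(void) {
    printf("%d\n", add_overflows(0x7FFFFFFF, 1));   /* 1: pos + pos -> neg */
    printf("%d\n", sub_overflows(-5, 3));           /* 0: no overflow      */
    return 0;
}
```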
8. What Happens When Overflow Occurs?
- MIPS detects overflow with an exception/interrupt
- When an interrupt occurs, a branch occurs to code in the kernel at address 0x80000080, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt
- SPIM has a simple built-in interrupt handler that deals with interrupts
- We may come back to interrupts later
9. Review of Shift and Logical Operations
- MIPS has operations for SLL, SRL, and SRA
  - We covered these in the last chapter
- MIPS implements bitwise AND, OR, and XOR logical operations
  - These operations perform a bit-by-bit parallel logical operation on two registers
- In C, use << and >> for shifts, and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively (see the example below)
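A minimal C sketch of these operators next to their MIPS counterparts (values are illustrative; the arithmetic right shift relies on the usual compiler behavior for signed >>):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t x = 0xF0000003;

    printf("x << 4  = 0x%08x\n", x << 4);                      /* sll            */
    printf("x >> 4  = 0x%08x\n", x >> 4);                      /* srl (unsigned) */
    printf("x sra 4 = 0x%08x\n", (uint32_t)((int32_t)x >> 4)); /* sra (sign bits copy in) */

    uint32_t a = 0x0000FF00, b = 0x00F0F0F0;
    printf("a & b = 0x%08x\n", a & b);                         /* and */
    printf("a | b = 0x%08x\n", a | b);                         /* or  */
    printf("a ^ b = 0x%08x\n", a ^ b);                         /* xor */
    printf("~a    = 0x%08x\n", ~a);                            /* NOT (MIPS: nor with $zero) */
    return 0;
}
```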
10. Review of Logic Operations
- The three main parts of a CPU:
  - ALU (Arithmetic and Logic Unit)
    - Performs all logical, arithmetic, and shift operations
  - CU (Control Unit)
    - Controls the CPU; performs load/store, branch, and instruction fetch
  - Registers
    - Physical storage locations for data
11. Review of Logic Operations
- In this chapter, our goal is to learn how the ALU is implemented
- The ALU is constructed entirely from boolean functions used as hardware building blocks
- The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT
- These functions can be directly implemented using electric circuits (wires and transistors)
12. Review of Logic Operations
- These combinational logic devices can be assembled to create a much more complex digital logic system

  A B | A AND B
  0 0 |    0
  0 1 |    0
  1 0 |    0
  1 1 |    1

  A B | A OR B
  0 0 |   0
  0 1 |   1
  1 0 |   1
  1 1 |   1

  A | NOT A
  0 |   1
  1 |   0
13. Review of Logic Operations
- We need another device to build an ALU
- This is called a multiplexor; it implements an if-then-else in hardware (D selects whether input A or input B appears on the output C)

  A B D | C (out)
  0 0 0 | 0 (A)
  0 0 1 | 0 (B)
  0 1 0 | 0 (A)
  0 1 1 | 1 (B)
  1 0 0 | 1 (A)
  1 0 1 | 0 (B)
  1 1 0 | 1 (A)
  1 1 1 | 1 (B)
14. A 1-bit ALU
- Perform the logic operations in parallel and mux the output
- Next, we want to include addition, so let's build a single-bit adder
  - Called a full adder
15. Full Adder
- From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder (a C sketch follows the table)
- We can also add this input to our ALU
- How do we give subtraction ability to our adder?
- How do we detect overflow and zero results?

  A B CarryIn | CarryOut Sum | Comment
  0 0 0       |    0      0  | 0 + 0 + 0 = 00
  0 0 1       |    0      1  | 0 + 0 + 1 = 01
  0 1 0       |    0      1  | 0 + 1 + 0 = 01
  0 1 1       |    1      0  | 0 + 1 + 1 = 10
  1 0 0       |    0      1  | 1 + 0 + 0 = 01
  1 0 1       |    1      0  | 1 + 0 + 1 = 10
  1 1 0       |    1      0  | 1 + 1 + 0 = 10
  1 1 1       |    1      1  | 1 + 1 + 1 = 11
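A minimal C sketch of the full-adder equations implied by this table, chained into a small ripple-carry adder (widths and names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Full adder: Sum is 1 when an odd number of inputs is 1; CarryOut is 1
 * when two or more inputs are 1 (matches the table above). */
static void full_adder(int a, int b, int carry_in, int *sum, int *carry_out) {
    *sum       = a ^ b ^ carry_in;
    *carry_out = (a & b) | (a & carry_in) | (b & carry_in);
}

int main(void) {
    unsigned a = 0xB, b = 0x6;          /* 1011 + 0110                    */
    unsigned result = 0;
    int carry = 0;

    for (int i = 0; i < 4; i++) {       /* ripple the carry from bit 0 up */
        int sum;
        full_adder((a >> i) & 1, (b >> i) & 1, carry, &sum, &carry);
        result |= (unsigned)sum << i;
    }
    printf("0x%X + 0x%X = 0x%X, carry out = %d\n", a, b, result, carry);
    return 0;
}
```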
16. Chapter 4: Arithmetic for Computers (Part 2)
17. Logic/Arithmetic
- From the truth table for the mux, we can use sum-of-products to derive the logic equation
- With sum-of-products, for each row where the output is 1, we AND together all the inputs (inverting the inputs that are 0), then OR together all the row products
- To make it simpler, let's add "don't cares" to the table
18. Logic/Arithmetic

  A B D | C (out)
  0 X 0 | 0 (A)
  X 0 1 | 0 (B)
  1 X 0 | 1 (A)
  X 1 1 | 1 (B)

- This gives us the following equation (sketched in C below):
  - C = (A AND (NOT D)) OR (B AND D)
- We don't need the inputs marked as don't cares in our product terms
- This is one way to simplify our logic equation
  - Other ways include propositional calculus, Karnaugh maps, and the Quine-McCluskey algorithm
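A minimal C sketch of the simplified mux equation, checked against all input combinations (function name is illustrative):

```c
#include <stdio.h>

/* The simplified sum-of-products equation for the 2-to-1 mux, evaluated
 * on single-bit values (0 or 1): C = (A AND NOT D) OR (B AND D). */
static int mux2(int a, int b, int d) {
    return (a & !d) | (b & d);
}

int main(void) {
    /* D = 0 selects A, D = 1 selects B. */
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int d = 0; d <= 1; d++)
                printf("A=%d B=%d D=%d -> C=%d\n", a, b, d, mux2(a, b, d));
    return 0;
}
```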
19. Logic/Arithmetic
- Here is a (crude) digital logic design for the 2-to-1 mux
- Note that multiple muxes can be assembled in stages to implement multiple-input muxes
20. Logic/Arithmetic
- For the adder, let's minimize the logic using a Karnaugh map
- For CarryOut, we need 2^3 entries
- We can minimize this to:
  - CarryOut = A·B + A·CarryIn + B·CarryIn

  CarryIn \ AB | 00  01  11  10
       0       |  0   0   1   0
       1       |  0   1   1   1
21. Logic/Arithmetic
- There's no way to minimize this equation, so we need the full sum of products:
  - Sum = (NOT A)·(NOT B)·CarryIn + A·B·CarryIn + (NOT A)·B·(NOT CarryIn) + A·(NOT B)·(NOT CarryIn)

  CarryIn \ AB | 00  01  11  10
       0       |  0   1   0   1
       1       |  1   0   1   0
22. Logic/Arithmetic
- In order to implement subtraction, we can invert the B input to the adder and set CarryIn to 1
  - This can be implemented with a mux: select B or NOT B (call this select input Binvert)
- Now we can build a 1-bit ALU with AND, OR, addition, and subtraction operations
- We can perform the AND, OR, and ADD in parallel and switch the results with a 4-input mux (Operation will be our D input)
- To make the adder a subtractor, we'll need to set both Binvert and CarryIn to 1 (a C sketch of this 1-bit ALU follows)
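A minimal C sketch of this 1-bit ALU (behavioral, not gate-level; the Less input of the full design is omitted, and names are illustrative):

```c
#include <stdio.h>

/* AND, OR, and a full adder computed in parallel, with a mux ("op")
 * selecting the result. Binvert = 1 together with carry_in = 1 turns
 * the adder into a subtractor. */
static int alu1(int a, int b, int binvert, int carry_in, int op, int *carry_out) {
    int b_in  = binvert ? !b : b;             /* 2-to-1 mux on the B input  */
    int and_r = a & b_in;
    int or_r  = a | b_in;
    int sum   = a ^ b_in ^ carry_in;          /* full-adder Sum             */
    *carry_out = (a & b_in) | (a & carry_in) | (b_in & carry_in);

    switch (op) {                             /* result mux                 */
    case 0:  return and_r;                    /* 00: AND                    */
    case 1:  return or_r;                     /* 01: OR                     */
    default: return sum;                      /* 10: ADD (SUB with Binvert) */
    }
}

int main(void) {
    int cout;
    int r = alu1(1, 1, 1, 1, 2, &cout);       /* compute 1 - 1 on one bit   */
    printf("result=%d carry_out=%d\n", r, cout);   /* result=0 carry_out=1  */
    return 0;
}
```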
23. Lecture 4: Arithmetic for Computers (Part 3)
24. Chapter 4 Review
- So far, we've covered the following topics for this chapter:
  - Binary representation of signed integers
  - 16- to 32-bit signed conversion
  - Binary addition/subtraction
  - Overflow detection / overflow exception handling
  - Shift and logical operations
  - Parts of the CPU
  - AND, OR, XOR, and inverter gates
  - Multiplexor (mux) and full adder
  - Sum-of-products logic equations (truth tables)
  - Logic minimization techniques
    - Don't cares and Karnaugh maps
25. 1-bit ALU Design
- A 1-bit ALU can be constructed
- Components
  - AND, OR, and adder
  - 4-to-1 mux
  - Binverter (inverter and 2-to-1 mux)
- Interface
  - Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less
  - Outputs: CarryOut and Result
- Digital functions are performed in parallel and the outputs are routed into a mux
- The mux will also accept a Less input, which we'll accept from outside the 1-bit ALU
- The select lines of the mux make up the Operation input to the ALU
26. 32-bit ALU
- In order to create a multi-bit ALU, array 32 1-bit ALUs
- Connect the CarryOut of each bit to the CarryIn of the next bit
- A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B
- The Result outputs of each 1-bit ALU will form the 32-bit result
- We need to add an SLT unit and connect its output to the least significant 1-bit ALU's Less input
  - Hardwire the other Less inputs to 0
- We need to add an Overflow unit
- We need to add a Zero detection unit
27. SLT Unit
- To compute SLT, we need to make sure that when the 1-bit ALUs' Operation is set to 11, a subtract operation is also being computed
- With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and the Result (a small sketch follows)

  Asign Bsign Rsign | Less
    0     0     0   |  0
    0     0     1   |  1
    0     1     X   |  0
    1     0     X   |  1
    1     1     0   |  0
    1     1     1   |  1
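A minimal C sketch of the Less logic in this table (names are illustrative): when the signs of A and B differ, the negative operand is smaller; when they match, A - B cannot overflow, so the sign of the result decides.

```c
#include <stdio.h>

static int slt_less(int a_sign, int b_sign, int r_sign) {
    /* Signs differ: A negative -> 1, B negative -> 0.
     * Signs match: use the sign of A - B. */
    return (a_sign != b_sign) ? a_sign : r_sign;
}

int main(void) {
    /* A = -2, B = 3: signs differ and A is negative -> Less = 1. */
    printf("%d\n", slt_less(1, 0, 1));   /* prints 1 */
    return 0;
}
```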
28. Overflow Unit
- When doing signed arithmetic, we need to follow this table, as we covered previously
- How do we implement this in hardware?

  Operation | Operand A | Operand B | Result
  A + B     | Positive  | Positive  | Negative
  A + B     | Negative  | Negative  | Positive
  A - B     | Positive  | Negative  | Negative
  A - B     | Negative  | Positive  | Positive
29. Overflow Unit
- We need a truth table
- Since we'll be computing the logic equation with SOP, we only need the rows where the output is 1 (a C sketch follows)

  Operation  | A(31) | B(31) | R(31) | Overflow
  010 (add)  |   0   |   0   |   1   |    1
  010 (add)  |   1   |   1   |   0   |    1
  110 (sub)  |   0   |   1   |   1   |    1
  110 (sub)  |   1   |   0   |   0   |    1
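A minimal C sketch of this truth table as a sum of products over the sign bits ('sub' distinguishes the subtract/slt operations from add; names are illustrative):

```c
#include <stdio.h>
#include <stdbool.h>

static bool overflow_unit(bool sub, bool a31, bool b31, bool r31) {
    bool add_ovf = !sub && ((!a31 && !b31 &&  r31) ||    /* pos + pos -> neg */
                            ( a31 &&  b31 && !r31));     /* neg + neg -> pos */
    bool sub_ovf =  sub && ((!a31 &&  b31 &&  r31) ||    /* pos - neg -> neg */
                            ( a31 && !b31 && !r31));     /* neg - pos -> pos */
    return add_ovf || sub_ovf;
}

int main(void) {
    /* 0x7FFFFFFF + 1: sign bits A=0, B=0, Result=1 -> overflow */
    printf("%d\n", overflow_unit(false, 0, 0, 1));       /* prints 1 */
    return 0;
}
```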
30. Zero Detection Unit
- NOR together all the 1-bit ALU Result outputs (OR them, then invert); the result is the Zero output of the ALU
31. 32-bit ALU Operation
- We need a 3-bit ALU Operation input into our 32-bit ALU
- The two least significant bits can be routed into all the 1-bit ALUs internally
- The most significant bit can be routed into the least significant 1-bit ALU's CarryIn and to Binvert of all the 1-bit ALUs
32. 32-bit ALU Operation
- Here's the final ALU Operation table (a behavioral C sketch follows):

  ALU Operation | Function
  000           | and
  001           | or
  010           | add
  110           | subtract
  111           | set on less than
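A minimal word-level C sketch of the 32-bit ALU behavior in this table (behavioral, not a gate-level model; names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t result;
    bool zero;
    bool overflow;
} alu_out_t;

static alu_out_t alu32(uint32_t a, uint32_t b, unsigned op) {
    alu_out_t out = {0, false, false};
    switch (op) {
    case 0x0: out.result = a & b; break;                 /* 000: and */
    case 0x1: out.result = a | b; break;                 /* 001: or  */
    case 0x2:                                            /* 010: add */
        out.result   = a + b;
        out.overflow = ((a ^ out.result) & (b ^ out.result)) >> 31;
        break;
    case 0x6:                                            /* 110: subtract */
        out.result   = a - b;
        out.overflow = ((a ^ b) & (a ^ out.result)) >> 31;
        break;
    case 0x7:                                            /* 111: set on less than */
        out.result = (int32_t)a < (int32_t)b;
        break;
    }
    out.zero = (out.result == 0);
    return out;
}

int main(void) {
    alu_out_t r = alu32(7, 7, 0x6);                      /* 7 - 7 */
    printf("result=%u zero=%d overflow=%d\n", r.result, r.zero, r.overflow);
    return 0;
}
```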
33. 32-bit ALU
- In the end, our ALU will have the following interface:
- Inputs
  - A and B (32 bits each)
  - ALU Operation (3 bits)
- Outputs
  - CarryOut (1 bit)
  - Zero (1 bit)
  - Result (32 bits)
  - Overflow (1 bit)
34. Carry Lookahead
- The adder architecture we previously looked at requires 2n gate delays to compute its result (worst case)
  - The longest path that a digital signal must propagate through is called the critical path
  - This is WAAAYYYY too slow!
- There are other ways to build an adder that require about lg n gate delays
- Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic)
- Obviously, in the case of a 64-input system, the resulting design will be too big and too complex
35. Carry Lookahead
- For example, we can easily see that the CarryIn for bit 1 is computed as:
  - c1 = a0·b0 + a0·c0 + b0·c0
  - c2 = a1·b1 + a1·c1 + b1·c1
- Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays:
  - c2 = a1·b1 + a1·a0·b0 + a1·a0·c0 + a1·b0·c0 + b1·a0·b0 + b1·a0·c0 + b1·b0·c0
  - I used the distributive law of boolean logic to compute this
- As you can see, the CarryIn logic gets bigger and bigger for consecutive bits
36. Carry Lookahead
- Carry lookahead adders are faster than ripple-carry adders
- Recall:
  - ci+1 = ai·bi + ai·ci + bi·ci
- ci can be factored out:
  - ci+1 = ai·bi + (ai + bi)·ci
- So:
  - c2 = a1·b1 + (a1 + b1)·(a0·b0 + (a0 + b0)·c0)
37. Carry Lookahead
- Note the repeated appearance of (ai·bi) and (ai + bi)
- They are called generate (gi) and propagate (pi):
  - gi = ai·bi, pi = ai + bi
  - ci+1 = gi + pi·ci
- This means if gi = 1, a CarryOut is generated
- If pi = 1, a CarryOut is propagated from CarryIn
38. Carry Lookahead
- c1 = g0 + p0·c0
- c2 = g1 + p1·g0 + p1·p0·c0
- c3 = g2 + p2·g1 + p2·p1·g0 + p2·p1·p0·c0
- c4 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0 + p3·p2·p1·p0·c0
- This system will give us an adder with 5 gate delays, but it is still too complex (a C sketch of these equations follows)
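A minimal C sketch of 4-bit carry lookahead using the g/p equations above (values are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    unsigned a = 0xB, b = 0x6, c0 = 0;        /* 1011 + 0110, no carry in */
    unsigned g[4], p[4], c[5];

    for (int i = 0; i < 4; i++) {
        g[i] = (a >> i) & (b >> i) & 1;       /* generate:  gi = ai AND bi */
        p[i] = ((a >> i) | (b >> i)) & 1;     /* propagate: pi = ai OR  bi */
    }
    /* All carries computed directly from c0 instead of rippling. */
    c[0] = c0;
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c[0]);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
               | (p[3] & p[2] & p[1] & p[0] & c[0]);

    unsigned sum = 0;
    for (int i = 0; i < 4; i++)               /* sum bit i = ai XOR bi XOR ci */
        sum |= ((((a >> i) ^ (b >> i)) & 1) ^ c[i]) << i;

    printf("0x%X + 0x%X = 0x%X, carry out = %u\n", a, b, sum | (c[4] << 4), c[4]);
    return 0;
}
```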
39. Carry Lookahead
- To solve this, we'll build our adder using 4-bit adders with carry lookahead, and connect them using super-propagate and super-generate logic
- The super-propagate is only true if all the bits propagate a carry:
  - P0 = p0·p1·p2·p3
  - P1 = p4·p5·p6·p7
  - P2 = p8·p9·p10·p11
  - P3 = p12·p13·p14·p15
40. Carry Lookahead
- The super-generate follows a similar equation:
  - G0 = g3 + p3·g2 + p3·p2·g1 + p3·p2·p1·g0
  - G1 = g7 + p7·g6 + p7·p6·g5 + p7·p6·p5·g4
  - G2 = g11 + p11·g10 + p11·p10·g9 + p11·p10·p9·g8
  - G3 = g15 + p15·g14 + p15·p14·g13 + p15·p14·p13·g12
- The super-generate and super-propagate logic for the four 4-bit carry lookahead adders is contained in a Carry Lookahead Unit
- This yields a worst-case delay of 7 gate delays
  - Reason?
41. Carry Lookahead
- We've covered all ALU functions except for the shifter
- We'll talk about the shifter later
42. Lecture 4: Arithmetic for Computers (Part 4)
43. Binary Multiplication
- In multiplication, the first operand is called the multiplicand, and the second is called the multiplier
- The result is called the product
- Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we'll get an (n + m)-bit product
44. Binary Multiplication
- Binary multiplication works exactly like decimal multiplication
- In fact, multiply 100101 by 111001 and pretend you're using decimal numbers
45. First Hardware Design for Multiplier
- Note that the multiplier is not routed into the ALU
46. Second Hardware Design for Multiplier
- Architects realized that, at the least, half of the bits in the multiplicand register were 0
- Reduce the ALU to 32 bits and shift the product right instead of shifting the multiplicand left
- In this case, only 32 bits of the product go through the ALU at a time
47. Second Hardware Design for Multiplier
48. Final Hardware Design for Multiplier
- Let's combine the product register with the multiplier register
- Put the multiplier in the right half of the product register and initialize the left half with zeros; when we're done, the product will be in the right half
49. Final Hardware Design for Multiplier
50. Final Hardware Design for Multiplier
- For the first two designs, the multiplicand and the multiplier must be converted to positive values
  - The signs would need to be remembered so the product can be converted to whatever sign it needs to be
- The third design will deal with signed numbers, as long as the sign bit is extended in the product register (a C sketch of the shift-and-add algorithm follows)
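A minimal C sketch of the shift-and-add multiplier, unsigned only for brevity (the add is done after the shift here, which is equivalent and sidesteps the carry-out of the 32-bit add; names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* A 64-bit product register whose right half starts out holding the
 * multiplier. Each of the 32 steps adds the multiplicand into the upper
 * bits when the low bit is 1, and shifts the register right. */
static uint64_t multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = multiplier;            /* right half = multiplier    */

    for (int step = 0; step < 32; step++) {
        int add_now = product & 1;            /* test the low bit           */
        product >>= 1;                        /* shift product right        */
        if (add_now)
            product += (uint64_t)multiplicand << 31;   /* add into left half */
    }
    return product;
}

int main(void) {
    printf("%llu\n", (unsigned long long)multiply(100000, 300000));  /* 30000000000 */
    return 0;
}
```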
51. Booth's Algorithm
- Booth's Algorithm starts with the observation that if we have the ability to both add and subtract, there are multiple ways to compute a product
- For every 0 in the multiplier, we shift the multiplicand
- For every 1 in the multiplier, we add the multiplicand to the product, then shift the multiplicand
52. Booth's Algorithm
- Instead, when a 1 is seen in the multiplier, subtract instead of add
- Shift for all 1s after this, until the first 0 is seen, then add
- The method was developed because in Booth's era, shifters were faster than adders
53. Booth's Algorithm
- Example: 0010 (2) × 0110 (6), scanning the multiplier from least significant bit to most significant:
  - bit 0 = 0: shift (contributes 0)
  - bit 1 = 1 (first 1): subtract the multiplicand shifted left by 1: -(2 × 2^1) = -4
  - bit 2 = 1 (second 1): shift only
  - bit 3 = 0 (first 0): add the multiplicand shifted left by 3: +(2 × 2^3) = +16
  - -4 + 16 = 12 = 2 × 6
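A minimal C sketch of Booth's algorithm on 32-bit signed operands (names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* Scan the multiplier from LSB to MSB, looking at each bit together with
 * the bit to its right (initially 0). A 0->1 transition starts a run of 1s
 * (subtract); a 1->0 transition ends it (add); otherwise just shift. */
static int64_t booth_multiply(int32_t multiplicand, int32_t multiplier) {
    int64_t product = 0;
    int prev_bit = 0;

    for (int i = 0; i < 32; i++) {
        int bit = (multiplier >> i) & 1;
        if (bit == 1 && prev_bit == 0)               /* first 1 of a run    */
            product -= multiplicand * ((int64_t)1 << i);
        else if (bit == 0 && prev_bit == 1)          /* first 0 after a run */
            product += multiplicand * ((int64_t)1 << i);
        prev_bit = bit;
    }
    return product;
}

int main(void) {
    printf("%lld\n", (long long)booth_multiply(2, 6));    /* 12  */
    printf("%lld\n", (long long)booth_multiply(-7, 13));  /* -91 */
    return 0;
}
```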
54. Lecture 4: Arithmetic for Computers (Part 5)
55. Binary Division
- Like last lecture, we'll start with some basic terminology
- Again, let's assume our numbers are base 10, but let's only use 0s and 1s
56. Binary Division
- Recall:
  - Dividend = Quotient × Divisor + Remainder
- Let's assume that both the dividend and divisor are positive, and hence the quotient and the remainder are nonnegative
- The division operands and both results are 32-bit values, and we will ignore the sign for now
57. First Hardware Design for Divider
- Initialize the Quotient register to 0, initialize the left half of the Divisor register with the divisor, and initialize the Remainder register with the dividend (right-aligned)
58. Second Hardware Design for Divider
- Much like with the multiplier, the divisor and ALU can be reduced to 32 bits if we shift the remainder left instead of shifting the divisor right
- Also, the algorithm must be changed so the remainder is shifted left before the subtraction takes place
59. Third Hardware Design for Divider
- Shift the bits of the quotient into the remainder register
- Also, the last step of the algorithm is to shift the left half of the remainder right by 1 bit (a C sketch of this restoring-division scheme follows)
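A minimal C sketch of this combined remainder/quotient scheme, unsigned only and assuming a nonzero divisor (names are illustrative):

```c
#include <stdio.h>
#include <stdint.h>

/* A 64-bit register starts with the dividend in its right half. Each of
 * the 32 steps shifts the register left, tries to subtract the divisor
 * from the left half, and shifts in a quotient bit (1 if the subtraction
 * succeeded, 0 after restoring). */
static void divide(uint32_t dividend, uint32_t divisor,
                   uint32_t *quotient, uint32_t *remainder) {
    uint64_t r = dividend;                       /* right half = dividend   */

    for (int step = 0; step < 32; step++) {
        r <<= 1;                                 /* shift remainder/quotient left */
        uint32_t upper = (uint32_t)(r >> 32);    /* left half               */
        if (upper >= divisor) {                  /* subtraction would not go negative */
            upper -= divisor;
            r = ((uint64_t)upper << 32) | (uint32_t)r | 1;  /* quotient bit = 1 */
        }                                        /* else: restore, bit stays 0 */
    }
    *quotient  = (uint32_t)r;                    /* right half: quotient    */
    *remainder = (uint32_t)(r >> 32);            /* left half: remainder    */
}

int main(void) {
    uint32_t q, rem;
    divide(100, 7, &q, &rem);
    printf("100 / 7 = %u remainder %u\n", q, rem);    /* 14 remainder 2 */
    return 0;
}
```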
60. Signed Division
- Simplest solution: remember the signs of the divisor and the dividend, and then negate the quotient if the signs disagree
- The dividend and the remainder must have the same sign
61. Considerations
- The same hardware can be used for both multiply and divide
- Requirement: a 64-bit register that can shift left or right, and a 32-bit ALU that can add or subtract
62. Floating Point
- Floating point (also called real) numbers are used to represent values that are fractional or that are too big to fit in a 32-bit integer
- Floating point numbers are expressed in scientific notation (base 2) and are normalized (no leading 0s)
  - 1.xxxx × 2^yyyy (in base 2)
- In this case, xxxx is the significand and yyyy is the exponent
63. Floating Point
- In MIPS, a floating point number is represented in the following manner (IEEE 754 standard):
  - bit 31: sign of significand
  - bits 30..23 (8 bits): exponent (in biased notation; see below)
  - bits 22..0 (23 bits): significand
- Note that the sizes of the exponent and significand must be traded off: accuracy vs. range
- This gives us a representation for signed numbers roughly as small as 2×10^-38 and as large as 2×10^38
- Overflow and underflow must be detected
- Double-precision floating point numbers are 2 words: the significand is extended to 52 bits and the exponent to 11 bits
- Also, the first bit of the significand is implicit (only the fractional part is specified)
- In order to represent 0 in a float, put 0 in the exponent field
- So here's the equation we use (a C sketch of unpacking these fields follows):
  - (-1)^S × (1 + Significand) × 2^E
  - Or: (-1)^S × (1 + s1×2^-1 + s2×2^-2 + s3×2^-3 + s4×2^-4 + ...) × 2^E
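A minimal C sketch of unpacking a single-precision value into these fields and rebuilding it with the formula, for normal (non-zero, non-special) values only (the bias of 127 anticipates the next slide; names and the sample value are illustrative):

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void) {
    float f = -6.25f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* reinterpret the 32 bits   */

    uint32_t sign     = bits >> 31;            /* bit 31                    */
    uint32_t exponent = (bits >> 23) & 0xFF;   /* bits 30..23               */
    uint32_t fraction = bits & 0x7FFFFF;       /* bits 22..0                */

    /* (-1)^S x (1 + fraction/2^23) x 2^(exponent - 127) */
    double mantissa = 1.0 + fraction / 8388608.0;
    double value = (sign ? -1.0 : 1.0) * ldexp(mantissa, (int)exponent - 127);

    printf("sign=%u exponent=%u fraction=0x%06x -> %g\n",
           sign, exponent, fraction, value);   /* prints ... -> -6.25       */
    return 0;
}
```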
64. Considerations
- IEEE 754 sought to make floating-point numbers easier to sort
  - The sign is the first bit
  - The exponent comes before the significand
- But we want an all-0 exponent to represent the most negative exponent and an all-1 exponent to be the most positive
- This is called biased notation, so we'll use the following equation:
  - (-1)^S × (1 + Significand) × 2^(Exponent - Bias)
- Bias is 127 for single precision and 1023 for double precision
65. Lecture 4: Arithmetic for Computers (Part 6)
66. Converting Decimal Floating Point to Binary
- Use the method I showed last lecture...
- Significand:
  - Use the iterative method to convert the fractional part to binary
  - Convert the integer part to binary using the old-fashioned method
  - Shift the binary point to the left until the number is normalized
  - Drop the leading 1, and set the exponent to be the number of positions you shifted the binary point
  - Adjust the exponent for the bias (127/1023); a short worked example follows
67. Floating Point Addition
- Let's add two decimal floating point numbers...
- Let's try 9.999 × 10^1 + 1.610 × 10^-1
- Assume we can only store 4 digits of the significand and two digits of the exponent
68. Floating Point Addition
- Match the exponents of both operands by un-normalizing one of them
  - Match to the exponent of the larger number
- Add the significands
- Normalize the result
- Round the significand (the example from the previous slide is worked through these steps below)
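Working 9.999 × 10^1 + 1.610 × 10^-1 through these steps with a 4-digit significand (the arithmetic here is computed for these notes):
- Un-normalize the smaller operand: 1.610 × 10^-1 = 0.01610 × 10^1, which becomes 0.016 × 10^1 with only 4 significand digits
- Add the significands: 9.999 + 0.016 = 10.015, giving 10.015 × 10^1
- Normalize: 1.0015 × 10^2
- Round the significand to 4 digits: 1.002 × 10^2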
69. Binary Floating Point Addition
70. Floating Point Multiplication
- Example: 1.110 × 10^10 × 9.200 × 10^-5
- Assume 4 digits for the significand and 2 digits for the exponent
- Calculate the exponent of the product by simply adding the exponents of the operands
  - 10 + (-5) = 5
- Bias the exponents
  - 137 + 122 = 259
- Something's wrong! We added the biases along with the exponents...
  - The correct biased exponent is 5 + 127 = 132
71. Floating Point Multiplication
- Multiply the significands...
  - 1.110 × 9.200 = 10.212000
- Normalize and add 1 to the exponent
  - 1.0212 × 10^6
- Round the significand to four digits
  - 1.021
- Set the sign based on the signs of the operands
  - +1.021 × 10^6
72. Floating Point Multiplication
73. Accurate Arithmetic
- Integers can represent every value between the largest and smallest possible values
- This is not the case with floating point
  - Only 2^53 unique significand values can be represented in double-precision fp
- IEEE 754 always keeps 2 extra bits on the right of the significand during intermediate calculations, called guard and round, to minimize rounding errors
74. Accurate Arithmetic
- Since the worst case for rounding would be when the actual number is halfway between two floating point representations, accuracy is measured as the number of least-significant error bits
  - This is called units in the last place (ulp)
- IEEE 754 guarantees that the computer result is within 0.5 ulp (using guard and round)