Title: More ALUs and floating point numbers
1. More ALUs and floating point numbers
- Today: the rest of Chapter 4
- Multiplication, division, and floating point numbers
2. The story so far
- Instruction set architectures
- Performance issues
- 2's complement, addition, subtraction
Basically, the ISA and some ALU material.
3. CPU: The big picture
[Figure: instruction cycle with stages labeled Fetch, Decode, Fetch, Execute, Store, Next]
Execute an entire instruction.
Design hardware for each of these steps!
4. CPU clocking
[Figure: clock waveform (Clk) with Setup and Hold windows around each clock edge; signals are "don't care" outside those windows]
- All storage elements are clocked by the same clock edge
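5. CPU big picture: control and data path
[Figure: Inst Memory (input Adr) supplies Instruction<31:0>. Instruction fields shown: Op, Fun, Rs, Rt, Rd, Imm16, with bit ranges <21:25>, <21:25>, <16:20>, <11:15>, <0:15>. The Control block is connected to the DATA PATH through the signals nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, and Equal.]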
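6. CPU: The abstract version
[Figure: Control receives the Instruction (fields Rd, Rs, Rt, 5 bits each) from an Ideal Instruction Memory addressed by Instruction Address, and exchanges Control Signals and Conditions with the Datapath. The Datapath contains Next Address logic, a file of 32 32-bit registers (ports Rw, Ra, Rb), 32-bit buses A and B, and an Ideal Data Memory (Data Address, Data In, Data Out, all 32 bits). Storage elements are clocked by Clk.]
- Logical vs. physical structure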
7. Computer Performance
Multiplication and Division
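8. The 32-bit ALU: limited edition
- Bit-slice plus extra on the two ends
- Overflow means the number is too large for the representation
- Carry-lookahead and other adder tricks
[Figure: 32-bit ALU built from bit slices ALU0 through ALU31. Inputs A and B (32 bits each, bits a0..a31 and b0..b31), a 4-bit control input M, carries cin/co chained between slices, result bits s0..s31 forming the 32-bit output S, and an Ovflw output. Annotations: "signed-arith and cin xor co" and "C/L to produce select, comp, c-in".]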
9. The design process
- Divide and Conquer (e.g., ALU)
  - Formulate a solution in terms of simpler components.
  - Design each of the components (subproblems).
- Generate and Test (e.g., ALU)
  - Given a collection of building blocks, look for ways of putting them together that meet the requirement.
- Successive Refinement (e.g., multiplier, divider)
  - Solve "most" of the problem (i.e., ignore some constraints or special cases), then examine and correct shortcomings.
- Formulate High-Level Alternatives (e.g., shifter)
  - Articulate many strategies to "keep in mind" while pursuing any one approach.
- Work on the Things You Know How to Do
  - The unknown will become obvious as you make progress.
- Optimization criteria
  - Delay: logic levels, fan-in/fan-out
  - Area: gate count, package count, pin-out
  - Cost, power, design time
10. The 32-bit ALU: limited edition
- Supported operations: 000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt
- Performance tuned by using carry-lookahead adders.
- What about other instructions?
  - multiply: mult $2,$3 sets Hi, Lo = $2 x $3 (64-bit signed product)
  - multiply unsigned: multu $2,$3 sets Hi, Lo = $2 x $3 (64-bit unsigned product)
  - divide: div $2,$3 sets Lo = $2 / $3 (quotient) and Hi = $2 mod $3 (remainder)
  - divide unsigned: divu $2,$3 sets Lo = $2 / $3 and Hi = $2 mod $3 (unsigned quotient and remainder)
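As a rough illustration of how the Hi and Lo registers receive these results, here is a minimal C sketch. The `HiLo` struct and the `model_*` function names are hypothetical, introduced only for this example; they model the arithmetic, not the hardware.

```c
#include <stdint.h>

/* Hypothetical model of the Hi/Lo register pair (names are illustrative). */
typedef struct { uint32_t hi, lo; } HiLo;

/* multu: Hi,Lo = 64-bit unsigned product. */
static HiLo model_multu(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)a * b;
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* mult: Hi,Lo = 64-bit signed product, stored as a two's complement bit pattern. */
static HiLo model_mult(int32_t a, int32_t b) {
    uint64_t p = (uint64_t)((int64_t)a * b);
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* divu: Lo = quotient, Hi = remainder. The caller must ensure b != 0. */
static HiLo model_divu(uint32_t a, uint32_t b) {
    HiLo r = { a % b, a / b };
    return r;
}
```

For instance, `model_divu(7, 2)` leaves 3 in `lo` and 1 in `hi`.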
11. Grade school
- Paper and pencil example:

      Multiplicand         1000
      Multiplier         x 1001
                           ----
                           1000
                          0000
                         0000
                        1000
                        -------
      Product           1001000

- m bits x n bits = m+n bit product
- Binary makes it easy:
  - 0 => place 0 (0 x multiplicand)
  - 1 => place multiplicand (1 x multiplicand)
- We'll look at a couple of versions of multiplication hardware
12. Unsigned basic multiplier
- Stage i accumulates A x 2^i if B_i = 1
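13. Unsigned basic multiplier
[Figure: 4-bit array multiplier. Multiplier bits B0..B3 gate shifted copies of A, with 0s filling the unused positions; the adder stages produce product bits P0..P7.]
- At each stage, shift A left (x 2)
- Use the next bit of B to determine whether to add in the shifted multiplicand
- Accumulate the 2n-bit partial product at each stage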
14. Unsigned basic multiplier
The algorithm:

    for (i = 0; i < 32; i++) {
        if (multiplier[0] == 1)       // we could test multiplier[i] and skip the shift
            product += multiplicand;  // product is a 64-bit register; adder is 64-bit
        multiplicand <<= 1;           // shift multiplicand to prepare for the next add;
                                      // multiplicand is in a 64-bit register
        multiplier >>= 1;             // position the ith bit at the lsb for the test
    }
15. Unsigned basic multiplier
- 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
- Example trace (4-bit operands, 2 x 3):

      Product      Multiplier   Multiplicand
      0000 0000    0011         0000 0010
      0000 0010    0001         0000 0100
      0000 0110    0000         0000 1000
      0000 0110

[Figure: multiplier datapath and control]
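The first version of the algorithm can be sketched in runnable C as follows. This is a software model under the register sizes stated on the slide; the function name `multiply_v1` is an assumption made for illustration.

```c
#include <stdint.h>

/* Version 1: 64-bit multiplicand shifts left, 32-bit multiplier shifts right,
 * and a 64-bit adder accumulates into a 64-bit product. */
static uint64_t multiply_v1(uint32_t multiplicand32, uint32_t multiplier) {
    uint64_t multiplicand = multiplicand32;  /* held in a 64-bit register */
    uint64_t product = 0;
    for (int i = 0; i < 32; i++) {
        if (multiplier & 1u)                 /* test the current lsb */
            product += multiplicand;
        multiplicand <<= 1;                  /* prepare for the next add */
        multiplier >>= 1;                    /* bring the next bit to the lsb */
    }
    return product;
}
```

Calling `multiply_v1(2, 3)` follows the same steps as the 4-bit trace, only over 32 iterations.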
16. Some observations
- Speed?
- Power/efficiency of the adder?
- Pattern of the result in the product register?
- 1 clock per cycle => about 100 clocks per multiply
  - Ratio of multiply to add: 5:1 to 100:1
- Half of the bits in the multiplicand are always 0 => a 64-bit adder is wasted
- 0s are inserted at the right of the multiplicand as it is shifted left => the least significant bits of the product never change once formed
- Instead of shifting the multiplicand to the left, shift the product to the right?
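17. Multiplier 2.0
- 32-bit Multiplicand reg, 32-bit ALU, 64-bit Product reg, 32-bit Multiplier reg
[Figure: Multiplier 2.0 datapath. The 32-bit Multiplicand register feeds a 32-bit ALU; the 64-bit Product register has Shift Right and Write controls; the 32-bit Multiplier register shifts right; a Control block drives them.]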
18. Multiplier 2.0

    for (i = 0; i < 32; i++) {
        if (multiplier[0] == 1)
            product[63:32] += multiplicand;  // product is a 64-bit register; adder is 32-bit
        product >>= 1;      // shift product right, saving product[i:0] for the final result
        multiplier >>= 1;   // position the ith bit at the lsb for the test
    }
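19. Multiplier 2.0
- Example trace (4-bit operands, 2 x 3). NextProduct is shifted right to give the Product of the following row:

      Product      Multiplier   Multiplicand   Upper half + addend   NextProduct
      0000 0000    0011         0010           0000 + 0010           0010 0000
      0001 0000    0001         0010           0001 + 0010           0011 0000
      0001 1000    0000         0010           0001 + 0000           0001 1000
      0000 1100    0000         0010           0000 + 0000           0000 1100
      0000 0110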
20. Multiplier 3.0
- The Product register wastes space that exactly matches the size of the multiplier => combine the Multiplier register and the Product register
21. Multiplier 3.0

    for (i = 0; i < 32; i++) {
        if (product[0] == 1)
            product[63:32] += multiplicand;  // product is a 64-bit register; adder is 32-bit
        product >>= 1;      // shift product right, saving product[i:0] for the final result
    }
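A runnable C sketch of the combined-register version follows. It assumes the multiplier is loaded into the low half of the product register before the loop, and it keeps the carry out of the 32-bit add so it can be shifted back in; the name `multiply_v3` is hypothetical.

```c
#include <stdint.h>

/* Version 3: the multiplier starts in the low half of the 64-bit product
 * register, and a 32-bit add updates only the high half. */
static uint64_t multiply_v3(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = multiplier;             /* low half = multiplier */
    for (int i = 0; i < 32; i++) {
        uint64_t carry = 0;
        if (product & 1u) {
            uint64_t hi = (product >> 32) + multiplicand;  /* up to 33 bits */
            carry = hi >> 32;                  /* carry out of the 32-bit adder */
            product = (hi << 32) | (product & 0xFFFFFFFFu);
        }
        /* Shift right; the carry becomes the new top bit. */
        product = (product >> 1) | (carry << 63);
    }
    return product;
}
```

After 32 iterations every multiplier bit has been shifted out, and the register holds the full 64-bit product.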
22. More observations
- 2 steps per bit because the Multiplier and Product registers are combined
- MIPS registers Hi and Lo are the left and right halves of Product
- Gives us the MIPS instruction multu
- How can you make it faster?
- What about signed multiplication?
  - The easiest solution is to make both operands positive and remember whether to complement the product when done (leave out the sign bit, run for 31 steps)
  - Apply the definition of 2's complement
    - Need to sign-extend partial products and subtract at the end
  - Booth's algorithm is an elegant way to multiply signed numbers using the same hardware as before and save cycles
    - Can handle multiple bits at a time
23. Booth's algorithm
- Example: 2 x 6 = 0010 x 0110 (partial products shown at their shifted positions)

          0010
        x 0110
        --------
      + 00000000   shift (0 in multiplier)
      + 00000100   add (1 in multiplier)
      + 00001000   add (1 in multiplier)
      + 00000000   shift (0 in multiplier)
        --------
        00001100

- An ALU with add or subtract gets the same result in more than one way:
      6 = -2 + 8
      0110 = -00010 + 01000 = 11110 + 01000
- For example:

          0010
        x 0110
        --------
      + 00000000   shift (0 in multiplier)
      - 00000100   sub (first 1 in multiplier)
      + 00000000   shift (middle of string of 1s)
      + 00010000   add (prior step had last 1)
        --------
        00001100
24. Booth's algorithm

      Current bit   Bit to the right   Explanation           Example      Op
      1             0                  Begins run of 1s      0001111000   sub
      1             1                  Middle of run of 1s   0001111000   none
      0             1                  End of run of 1s      0001111000   add
      0             0                  Middle of run of 0s   0001111000   none

- Originally for speed (when shift was faster than add)
- Replace a string of 1s in the multiplier with an initial subtract when we first see a one, and then later an add for the bit after the last one:
      -1 + 10000 = 01111
25. Booth's algorithm
Booth's example (2 x 7)

      Operation           Multiplicand   Product                   next?
      0.  initial value   0010           0000 0111 0               10 -> sub
      1a. P = P - m       1110           + 1110 -> 1110 0111 0     shift P (sign ext)
      1b.                 0010           1111 0011 1               11 -> nop, shift
      2.                  0010           1111 1001 1               11 -> nop, shift
      3.                  0010           1111 1100 1               01 -> add
      4a.                 0010           + 0010 -> 0001 1100 1     shift
      4b.                 0010           0000 1110 0               done
26. Booth's algorithm
Booth's example (2 x -3)

      Operation           Multiplicand   Product                   next?
      0.  initial value   0010           0000 1101 0               10 -> sub
      1a. P = P - m       1110           + 1110 -> 1110 1101 0     shift P (sign ext)
      1b.                 0010           1111 0110 1               01 -> add
      2a.                                + 0010 -> 0001 0110 1     shift P
      2b.                 0010           0000 1011 0               10 -> sub
      3a.                 0010           + 1110 -> 1110 1011 0     shift
      3b.                 0010           1111 0101 1               11 -> nop
      4a.                                1111 0101 1               shift
      4b.                 0010           1111 1010 1               done
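The recoding rule from the table (subtract at the start of a run of 1s, add just past its end) can be sketched in C as below. This is an arithmetic model rather than the shift-register datapath traced above: it adds or subtracts the multiplicand shifted by i instead of shifting the product. The name `booth_multiply` is an assumption, and the final cast back to a signed value relies on the usual two's complement conversion.

```c
#include <stdint.h>

/* Booth recoding: examine each multiplier bit together with the bit to its
 * right and add or subtract the shifted multiplicand accordingly. */
static int64_t booth_multiply(int32_t multiplicand, int32_t multiplier) {
    uint64_t m = (uint64_t)(int64_t)multiplicand;  /* sign-extended bit pattern */
    uint64_t product = 0;                          /* arithmetic is modulo 2^64 */
    uint32_t bits = (uint32_t)multiplier;
    unsigned right = 0;                  /* bit to the right of bit 0 is 0 */
    for (int i = 0; i < 32; i++) {
        unsigned current = (bits >> i) & 1u;
        if (current == 1u && right == 0u)
            product -= m << i;           /* 10: begins run of 1s -> sub */
        else if (current == 0u && right == 1u)
            product += m << i;           /* 01: end of run of 1s -> add */
        /* 11 and 00: middle of a run -> no operation */
        right = current;
    }
    return (int64_t)product;
}
```

Both worked examples can be checked with `booth_multiply(2, 7)` and `booth_multiply(2, -3)`.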
27Division
1001 Quotient Divisor 1000 1001010
Dividend 1000 10 101
1010 1000 10
Remainder (or Modulo result) See how big a
number can be subtracted, creating quotient bit
on each step Binary gt 1 divisor or 0
divisor Dividend Quotient x Divisor
Remaindergt sizeof( Dividend ) sizeof(
Quotient ) sizeof( Divisor ) 3 versions of
divide, successive refinement
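28. Division 1.0
- 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
[Figure: Division 1.0 datapath. The 64-bit Divisor register shifts right and feeds a 64-bit ALU; the 64-bit Remainder register has a Write control; the 32-bit Quotient register shifts left; a Control block drives them.]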
29. Division 1.0
- Takes n+1 steps for an n-bit Quotient and Remainder
- Example initial values (4-bit operands, 7 / 2):

      Quotient   Divisor      Remainder
      0000       0010 0000    0000 0111
30. Division 2.0
- Half of the bits in the divisor are always 0 => half of the 64-bit adder is wasted => half of the divisor is wasted
- Instead of shifting the divisor to the right, shift the remainder to the left?
- The 1st step cannot produce a 1 in the quotient bit (otherwise the quotient would be too big) => switch the order to shift first and then subtract, which can save 1 iteration
- 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg
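31. Division 2.0
[Figure: Division 2.0 flowchart. Test Remainder, with branches for Remainder >= 0 and Remainder < 0; then a loop test on the repetition count (No: < n repetitions; Yes: n repetitions).]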
32. Division 3.0
- Eliminate the Quotient register by combining it with the Remainder register as it is shifted left
  - Start by shifting the Remainder left as before.
  - Thereafter the loop contains only two steps, because shifting the Remainder register shifts both the remainder in the left half and the quotient in the right half.
  - The consequence of combining the two registers and the new order of operations in the loop is that the remainder will be shifted left one time too many.
  - Thus the final correction step must shift back only the remainder in the left half of the register.
- 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, (0-bit Quotient reg)
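33. Division 3.0
- Example initial values: Remainder = 0000 0111, Divisor = 0010
[Figure: Division 3.0 flowchart. Test Remainder, with branches for Remainder < 0 and Remainder >= 0; then a loop test on the repetition count (No: < n repetitions; Yes: n repetitions, n = 4 here).]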
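A minimal C sketch of shift-then-subtract restoring division is given below. It follows the idea of Division 2.0 and 3.0 (the dividend bits leave at the top of one register while quotient bits enter at the bottom), but it keeps the remainder in a separate variable, so no final correction shift is needed. The names `DivResult` and `divide_restoring` are hypothetical, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

typedef struct { uint32_t quotient, remainder; } DivResult;

/* Unsigned restoring division, one quotient bit per iteration.
 * Assumes divisor != 0. */
static DivResult divide_restoring(uint32_t dividend, uint32_t divisor) {
    uint64_t rem = 0;          /* running remainder (left half on the slides) */
    uint32_t quo = dividend;   /* dividend shifts out, quotient shifts in */
    for (int i = 0; i < 32; i++) {
        /* Shift the (remainder, quotient) pair left by one bit. */
        rem = (rem << 1) | (quo >> 31);
        quo <<= 1;
        if (rem >= divisor) {  /* trial subtraction would not go negative */
            rem -= divisor;
            quo |= 1u;         /* quotient bit = 1 */
        }                      /* otherwise quotient bit stays 0 */
    }
    DivResult r = { quo, (uint32_t)rem };
    return r;
}
```

With the example values, `divide_restoring(7, 2)` yields quotient 3 and remainder 1.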
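34. Division: some signed details
- Sign of the remainder?
  - 7/4 = (Q=1, R=3)
  - 7/4 = (Q=2, R=-1)
  - Which do you prefer?
- Convention
  - a/b = (Q, R), where a = Qb + R
  - Sign(R) = Sign(a)
- Thus
  - 7/4 = (Q=1, R=3)
  - -7/4 = (Q=-1, R=-3)
[Figure: number line around 0 showing Qb and R adding up to a on one side and to -a on the other.]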
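One way to apply this convention is to divide the magnitudes and fix the signs afterwards, as in the sketch below. The names `SignedDivResult` and `divide_signed` are hypothetical; the code assumes b != 0 and ignores the overflow cases involving the most negative integer.

```c
typedef struct { int quotient; int remainder; } SignedDivResult;

/* Divide magnitudes, then apply the sign convention:
 * the quotient is negative when the operand signs differ, and
 * the remainder takes the sign of the dividend a. */
static SignedDivResult divide_signed(int a, int b) {
    unsigned ua = (a < 0) ? 0u - (unsigned)a : (unsigned)a;
    unsigned ub = (b < 0) ? 0u - (unsigned)b : (unsigned)b;
    unsigned q = ua / ub;
    unsigned r = ua % ub;
    SignedDivResult res;
    res.quotient  = ((a < 0) != (b < 0)) ? -(int)q : (int)q;
    res.remainder = (a < 0) ? -(int)r : (int)r;
    return res;
}
```

C's own `/` and `%` operators (C99 and later) follow the same convention, so `-7 / 4` is -1 and `-7 % 4` is -3.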
35. Floating point
- What can be represented in N bits?
  - Unsigned: 0 to 2^N - 1
  - 2's complement: -2^(N-1) to 2^(N-1) - 1
  - 1's complement: -2^(N-1) + 1 to 2^(N-1) - 1
- But what about:
  - very large numbers? 9,349,398,989,787,762,244,859,087,678
  - very small numbers? 0.0000000000000000000000045691
  - rationals: 2/3
  - irrationals: sqrt(2)
  - transcendentals: e
36. Floating point
[Figure: scientific notation examples 6.02 x 10^23 and 1.673 x 10^-24 with the parts labeled: mantissa, decimal point, radix (base), exponent.]
IEEE F.P.: 1.M x 2^(e - 127)
Issues:
- Arithmetic (+, -, x, /)
- Representation, normal form
- Range and precision
- Rounding
- Exceptions (e.g., divide by zero, overflow, underflow)
- Errors
- Properties (negation, inversion, if A != B then A - B != 0)
37. Floating point
Binary fractions
      1011 (base 2) = 1x2^3 + 0x2^2 + 1x2^1 + 1x2^0
so...
      101.011 (base 2) = 1x2^2 + 0x2^1 + 1x2^0 + 0x2^-1 + 1x2^-2 + 1x2^-3
e.g., .75 = 3/4 = 3/2^2 = 1/2 + 1/4 = .11 (base 2)
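The positional weights can be checked with a small C helper that evaluates a binary string such as "101.011". The function name `binary_to_double` is hypothetical, and the input is assumed to contain only '0', '1', and at most one '.'.

```c
/* Evaluate a binary numeral with an optional fractional part.
 * Integer digits double the running value; fractional digits carry
 * weights 2^-1, 2^-2, ... */
static double binary_to_double(const char *s) {
    double value = 0.0;
    double weight = 0.5;          /* weight of the first fractional digit */
    int seen_point = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.') {
            seen_point = 1;
            continue;
        }
        int bit = (*s == '1');
        if (!seen_point) {
            value = value * 2.0 + bit;
        } else {
            value += bit * weight;
            weight /= 2.0;
        }
    }
    return value;
}
```

For example, `binary_to_double("101.011")` evaluates 4 + 1 + 1/4 + 1/8.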
38. Floating point
Representation of floating point numbers in the IEEE 754 standard, single precision:

      | S (1 bit) | E (8 bits) | M (23 bits) |

- S: sign
- E: exponent, an excess-127 binary integer; the actual exponent is e = E - 127, with 0 < E < 255
- M: mantissa; sign + magnitude, normalized binary significand with hidden integer bit: 1.M

      N = (-1)^S x 2^(E-127) x (1.M)

Examples:
      0    = 0 00000000 0 . . . 0
      -1.5 = 1 01111111 10 . . . 0

The magnitude of numbers that can be represented is in the range
      2^-126 x (1.0)   to   2^127 x (2 - 2^-23)
which is approximately
      1.18 x 10^-38   to   3.40 x 10^38

Integer comparison is valid on IEEE floating point numbers of the same sign!
39. Floating point
- Leading 1 bit of the significand is implicit
- Exponent is biased to make sorting easier
  - all 0s is the smallest exponent, all 1s is the largest
  - bias of 127 for single precision and 1023 for double precision
  - summary: (-1)^sign x (1 + significand) x 2^(exponent - bias)
- Example:
  - decimal: -.75 = -3/4 = -3/2^2
  - binary: -.11 = -1.1 x 2^-1
  - floating point: exponent = 126 = 01111110
  - IEEE single precision: 1 01111110 10000000000000000000000 (Sign, Exponent, Significand)
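The -.75 example can be reproduced by looking at the bits of a C `float`. The sketch below assumes that `float` is an IEEE 754 single-precision value on the target machine (true on most current platforms, but not guaranteed by the C standard); `FloatFields`, `float_to_bits`, and `decode_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 8 bits, biased by 127 */
    uint32_t fraction;   /* 23 bits, hidden leading 1 not stored */
} FloatFields;

/* Copy the raw bits of a float into an integer (assumes a 32-bit float). */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Split the bit pattern into the S, E, and M fields. */
static FloatFields decode_float(float f) {
    uint32_t u = float_to_bits(f);
    FloatFields out;
    out.sign = u >> 31;
    out.exponent = (u >> 23) & 0xFFu;
    out.fraction = u & 0x7FFFFFu;
    return out;
}
```

Under that assumption, `decode_float(-0.75f)` gives sign 1, exponent 126, and a fraction whose top bit is set, matching the bit pattern on the slide.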
40. Floating point
Floating point addition
- How do you add in scientific notation?
  - 9.962 x 10^4 + 5.231 x 10^2
- Basic algorithm
  - 1. Align
  - 2. Add
  - 3. Normalize
  - 4. Round
- Approximate algorithm:

      while (Exp(A) > Exp(B)) {
          shift Mantissa(B) right
          Exp(B)++
      }
      Mantissa(Result) = Mantissa(A) + Mantissa(B)
      Exp(Result) = Exp(A)      // or Exp(B)
      while (Mantissa(Result) msb != 1) {
          shift Mantissa(Result) left
          Exp(Result)--
      }
      Round(Mantissa)
      Round(Exponent)
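The align, add, normalize, round sequence can be sketched on a toy format. Everything here is an assumption made for illustration: `ToyFloat` stores a 24-bit significand with its leading 1 at bit 23 and an unbiased exponent, only positive operands are handled, and rounding is simple truncation.

```c
#include <stdint.h>

/* Toy positive float: value = sig * 2^(exp - 23), with 2^23 <= sig < 2^24. */
typedef struct { uint32_t sig; int exp; } ToyFloat;

static ToyFloat toy_add(ToyFloat a, ToyFloat b) {
    /* 1. Align: make a the operand with the larger exponent, then shift
     *    the smaller operand's significand right. */
    if (a.exp < b.exp) { ToyFloat t = a; a = b; b = t; }
    int shift = a.exp - b.exp;
    b.sig = (shift < 32) ? (b.sig >> shift) : 0u;

    /* 2. Add the aligned significands. */
    ToyFloat r = { a.sig + b.sig, a.exp };

    /* 3. Normalize: a carry out of bit 23 means shift right once. */
    if (r.sig >> 24) {
        r.sig >>= 1;
        r.exp++;
    }

    /* 4. Round: this sketch truncates the bits shifted out. */
    return r;
}
```

For example, adding 1.5 (sig 0xC00000, exp 0) and 0.75 (sig 0xC00000, exp -1) produces sig 0x900000 with exp 1, which is 1.001 (base 2) x 2^1 = 2.25.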
41. Floating point
42. Floating point addition
43. Floating point
Floating point multiplication
- How do you multiply in scientific notation?
  - (9.9 x 10^4)(5.2 x 10^2) = 5.148 x 10^7
- Basic algorithm
  - 1. Add exponents
    - 1a. Correct for bias in the exponent representation (Exp - 127)
  - 2. Multiply
  - 3. Normalize
  - 4. Round
  - 5. Set sign
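The five steps can be sketched directly on single-precision bit patterns. This is a simplified model, not a complete implementation: it assumes `float` is IEEE 754 single precision, that both inputs and the result are normalized finite numbers, and it truncates instead of rounding. The names `bits_of`, `float_of`, and `toy_multiply` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Multiply two normalized floats field by field (no zero, infinity, NaN,
 * overflow, or underflow handling; truncation instead of rounding). */
static float toy_multiply(float a, float b) {
    uint32_t x = bits_of(a), y = bits_of(b);

    /* 1 and 1a. Add the biased exponents and subtract one bias. */
    int32_t e = (int32_t)((x >> 23) & 0xFFu) + (int32_t)((y >> 23) & 0xFFu) - 127;

    /* 2. Multiply the 24-bit significands (hidden 1 restored). */
    uint64_t sx = (x & 0x7FFFFFu) | 0x800000u;
    uint64_t sy = (y & 0x7FFFFFu) | 0x800000u;
    uint64_t p = sx * sy;                    /* 2^46 <= p < 2^48 */

    /* 3. Normalize so the leading 1 sits at bit 46. */
    if (p & (1ULL << 47)) {
        p >>= 1;
        e++;
    }

    /* 4. Round: keep the top 23 fraction bits, drop the rest. */
    uint32_t frac = (uint32_t)(p >> 23) & 0x7FFFFFu;

    /* 5. Set sign: XOR of the operand signs. */
    uint32_t sign = (x ^ y) & 0x80000000u;

    return float_of(sign | ((uint32_t)e << 23) | frac);
}
```

For operands whose product is exactly representable, such as 1.5 and -0.75, the result matches the hardware product.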
44. Floating point accuracy issues
FP accuracy
- Extremely important in scientific calculations
- Very tiny errors can accumulate over time
- The IEEE 754 FP standard has four rounding modes
  - always round up
  - always round down
  - truncate
  - round to nearest
    - => in case of a tie, round to nearest even
- Requires extra bits in intermediate representations
45. Floating point accuracy issues
How many extra bits? IEEE spec: as if the result were computed exactly and then rounded.
- Guard bits: bits to the right of the least significant bit of the significand, computed for use in normalization (they could become significant at that point) and rounding.
- IEEE 754 has three extra bits and calls them guard, round, and sticky.
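Round to nearest even with three extra bits can be sketched as follows. The layout is an assumption for this example: the input holds the significand followed by the guard, round, and sticky bits in its three lowest positions. The name `round_nearest_even` is hypothetical, and a round-up that overflows the significand (which would need renormalization) is not handled.

```c
#include <stdint.h>

/* `extended` = significand bits followed by G, R, S in the low three bits. */
static uint32_t round_nearest_even(uint32_t extended) {
    uint32_t kept = extended >> 3;      /* the significand to keep */
    uint32_t extra = extended & 7u;     /* G R S as a 3-bit value */
    if (extra > 4u) {
        kept++;                         /* more than half an ulp: round up */
    } else if (extra == 4u && (kept & 1u)) {
        kept++;                         /* exact tie: round to the even neighbor */
    }
    return kept;                        /* less than half: truncate */
}
```

A tie after an odd significand (101 followed by 100) rounds up to 110, while a tie after an even one (100 followed by 100) stays at 100.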
46. Floating point overflows
Infinity and NaNs
- Infinity: the result of an operation overflows, i.e., is larger than the largest number that can be represented
  - Overflow is not the same as divide by zero (raises a different exception)
  - +/- infinity: S 1 . . . 1 0 . . . 0
  - It may make sense to do further computations with infinity, e.g., X/0 > Y may be a valid comparison
- NaN: not a number, but not infinity (e.g., sqrt(-4))
  - Invalid operation exception (unless the operation is = or !=)
  - NaN: S 1 . . . 1 non-zero (HW decides what goes in the non-zero field)
  - NaNs propagate: f(NaN) = NaN
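The two encodings can be told apart by inspecting the exponent and mantissa fields of a single-precision bit pattern, as in this sketch. The enum and function names are hypothetical, and the `KIND_OTHER` case simply covers E = 0, which these slides only show for the value zero.

```c
#include <stdint.h>

typedef enum { KIND_OTHER, KIND_NORMAL, KIND_INFINITY, KIND_NAN } FloatKind;

/* Classify a 32-bit single-precision pattern by its E and M fields. */
static FloatKind classify_bits(uint32_t bits) {
    uint32_t e = (bits >> 23) & 0xFFu;   /* 8-bit exponent field */
    uint32_t m = bits & 0x7FFFFFu;       /* 23-bit mantissa field */
    if (e == 0xFFu)
        return (m == 0u) ? KIND_INFINITY : KIND_NAN;  /* all-1s exponent */
    if (e == 0u)
        return KIND_OTHER;               /* E = 0 (includes zero) */
    return KIND_NORMAL;                  /* 0 < E < 255 */
}
```

The sign bit does not affect the classification, so both 0x7F800000 and 0xFF800000 are reported as infinity.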
47. Summary
- Multiplication and division take much longer than addition, requiring multiple addition steps.
- Floating point extends the range of numbers that can be represented, at the expense of precision (accuracy).
- FP operations are very similar to integer operations, but with pre- and post-processing.
- Rounding implementation is critical to accuracy over time.