More ALUs and floating point numbers

Slides: 48
Provided by: tar115
Learn more at: https://cseweb.ucsd.edu

Transcript and Presenter's Notes

1
More ALUs and floating point numbers
  • Today: the rest of Chapter 4
  • Multiplication, division, and floating point
    numbers

2
The Story so far
  • Instruction Set Architectures
  • Performance issues
  • 2's complement, addition, subtraction

Basically ISA and some ALU stuff
3
CPU: The big picture
[Figure: the instruction cycle, with stages Fetch, Decode, Fetch, Execute, Store, Next]
Execute an entire instruction
Design hardware for each of these steps!
4
CPU Clocking
[Figure: clock timing diagram with labels Clk, Setup, Hold, and Don't Care]
  • All storage elements are clocked by the same
    clock edge

5
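CPU Big Picture: Control and Data Path
[Figure: Inst Memory (addressed by Adr) supplies Instruction<31:0>, which is split into the fields Op, Fun, Rs, Rt, Rd, and Imm16 (bit ranges labelled <21:25>, <16:20>, <11:15>, <0:15>). Control signals into the DATA PATH: nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Condition signal: Equal.]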
6
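CPU: The abstract version
[Figure: Control reads the Instruction from an Ideal Instruction Memory (addressed by Instruction Address) and exchanges Control Signals and Conditions with the Datapath. The Datapath contains 32 32-bit registers with 5-bit ports Rw, Ra, Rb (fields Rd, Rs, Rt), 32-bit buses A and B, Next Address logic, and an Ideal Data Memory with Data Address, Data In, and Data Out. Storage elements are clocked by Clk.]
  • Logical vs. Physical Structure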

7
Computer Performance
Multiplication and Division
8
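The 32-bit ALU: limited edition
  • Bit-slice plus extra on the two ends
  • Overflow means number too large for the
    representation
  • Carry-lookahead and other adder tricks

[Figure: 32 one-bit slices ALU0 through ALU31 take a0/b0 through a31/b31 from the 32-bit inputs A and B and produce s0 through s31 of the 32-bit result S, with cin/co on each slice. The 4-bit input M drives C/L to produce select, comp, and c-in. Ovflw is signed-arith and (cin xor co).]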
9
The Design Process
  • Divide and Conquer (e.g., ALU)
  • Formulate a solution in terms of simpler
    components.
  • Design each of the components (subproblems)
  • Generate and Test (e.g., ALU)
  • Given a collection of building blocks, look for
    ways of putting them together that meet the
    requirements
  • Successive Refinement (e.g., multiplier, divider)
  • Solve "most" of the problem (i.e., ignore some
    constraints or special cases), examine and
    correct shortcomings.
  • Formulate High-Level Alternatives (e.g., shifter)
  • Articulate many strategies to "keep in mind"
    while pursuing any one approach.
  • Work on the Things you Know How to Do
  • The unknown will become obvious as you make
    progress.
  • Optimization Criteria
  • Delay: logic levels, fan-in/out
  • Area: gate count, package count, pinout
  • Cost, power, design time

10
The 32-bit ALU: limited edition
  • Supported operations: 000 = and, 001 = or, 010 =
    add, 110 = subtract, 111 = slt
  • Tuned performance by using carry-lookahead
    adders.
  • What about other instructions?
  • multiply: mult $2,$3; Hi, Lo = $2 x $3 (64-bit
    signed product)
  • multiply unsigned: multu $2,$3; Hi, Lo = $2 x $3
    (64-bit unsigned product)
  • divide: div $2,$3; Lo = $2 / $3 (quotient),
    Hi = $2 mod $3 (remainder)
  • divide unsigned: divu $2,$3; Lo = $2 / $3, Hi =
    $2 mod $3 (unsigned quotient and remainder)

11
Grade school
  • Paper and pencil example

      Multiplicand         1000
      Multiplier         x 1001
                           1000
                          0000
                         0000
                        1000
      Product           1001000

  • m bits x n bits = m+n bit product
  • Binary makes it easy:
  • 0 => place 0 (0 x multiplicand)
  • 1 => place multiplicand (1 x multiplicand)
  • We'll look at a couple of versions of
    multiplication hardware

12
Unsigned basic multiplier
  • Stage i accumulates A x 2^i if B_i = 1

13
Unsigned basic multiplier
[Figure: array multiplier with constant 0 inputs, multiplier bits B0 through B3, and product bits P0 through P7]
  • at each stage shift A left (x 2)
  • use next bit of B to determine whether to add in
    shifted multiplicand
  • accumulate 2n-bit partial product at each stage

14
Unsigned basic multiplier
The algorithm
for (i = 0; i < 32; i++) {
    if (multiplier[0] == 1)        // we could test multiplier[i] and skip the shift
        product += multiplicand;   // product is a 64-bit register; adder is 64-bit
    multiplicand <<= 1;            // shift multiplicand to prepare for next add;
                                   // multiplicand is in a 64-bit register
    multiplier >>= 1;              // position the ith bit on the lsb for the test
}
15
Unsigned basic multiplier
  • 64-bit Multiplicand reg, 64-bit ALU, 64-bit
    Product reg, 32-bit Multiplier reg
  • Example (4-bit operands, 2 x 3):

      Product     Multiplier   Multiplicand
      0000 0000   0011         0000 0010
      0000 0010   0001         0000 0100
      0000 0110   0000         0000 1000
      0000 0110

Multiplier datapath control
16
Some observations
  • Speed?
  • Power/efficiency of the adder?
  • Pattern of result in product register?
  • 1 clock per cycle => about 100 clocks per multiply
  • Ratio of multiply to add: 5:1 to 100:1
  • 1/2 the bits in multiplicand are always 0 => 64-bit
    adder is wasted
  • 0s are inserted at the right of the multiplicand as
    it is shifted left => least significant bits of
    product never change once formed
  • Instead of shifting multiplicand to left, shift
    product to right?

17
Multiplier 2.0
  • 32-bit Multiplicand reg, 32-bit ALU, 64-bit
    Product reg, 32-bit Multiplier reg

[Figure: datapath with a 32-bit Multiplicand register feeding a 32-bit ALU, a 64-bit Product register (Shift Right, Write), a 32-bit Multiplier register (Shift Right), and Control]
18
Multiplier 2.0
for (i = 0; i < 32; i++) {
    if (multiplier[0] == 1)
        product[63:32] += multiplicand;   // product is a 64-bit register; adder is 32-bit
    product >>= 1;       // shift product right,
                         // saving product[i:0] for the final result
    multiplier >>= 1;    // position the ith bit on the lsb for the test
}
19
Multiplier 2.0
  • Example (4-bit operands, 2 x 3):

      Product     Multiplier   Multiplicand   Next Product
      0000 0000   0011         0010           0000 + 0010 = 0010 0000
      0001 0000   0001         0010           0001 + 0010 = 0011 0000
      0001 1000   0000         0010           0001 + 0000 = 0001 1000
      0000 1100   0000         0010           0000 + 0000 = 0000 1100
      0000 0110

20
Multiplier 3.0
  • Product register wastes space that exactly
    matches size of multiplier => combine Multiplier
    register and Product register

21
Multiplier 3.0
for (i = 0; i < 32; i++) {
    if (product[0] == 1)
        product[63:32] += multiplicand;   // product is a 64-bit register; adder is 32-bit
    product >>= 1;       // shift product right,
                         // saving product[i:0] for the final result
}
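
The combined-register loop leaves one detail implicit: the 32-bit add into the upper half can carry out, and that carry has to be shifted back in. A minimal C sketch of this, assuming unsigned 32-bit operands; the function name `mul_v3` is illustrative and not from the slides:

```c
#include <stdint.h>

/* Multiplier 3.0 sketch: the multiplier starts in the low half of the
   64-bit product register and is shifted out as the product shifts in. */
uint64_t mul_v3(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = multiplier;   /* high half = 0, low half = multiplier */

    for (int i = 0; i < 32; i++) {
        uint64_t carry = 0;
        if (product & 1u) {
            /* 32-bit add into the high half; bit 32 of the sum is the carry-out */
            uint64_t sum = (product >> 32) + multiplicand;
            carry = sum >> 32;
            product = (sum << 32) | (product & 0xFFFFFFFFu);
        }
        /* shift the whole register right, feeding the carry-out into bit 63 */
        product = (product >> 1) | (carry << 63);
    }
    return product;                  /* Hi = upper 32 bits, Lo = lower 32 bits */
}
```

For example, `mul_v3(2, 3)` returns 6, matching the 4-bit trace on the Multiplier 2.0 slide.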
22
More observations ?
  • 2 steps per bit because Multiplier and Product are
    combined
  • MIPS registers Hi and Lo are left and right half
    of Product
  • Gives us MIPS instruction multu
  • How can you make it faster?
  • What about signed multiplication?
  • easiest solution is to make both positive and
    remember whether to complement product when done
    (leave out the sign bit, run for 31 steps)
  • apply definition of 2's complement
  • need to sign-extend partial products and subtract
    at the end
  • Booth's algorithm is an elegant way to multiply
    signed numbers using the same hardware as before
    and save cycles
  • can handle multiple bits at a time

23
Booth's algorithm
  • Example: 2 x 6 = 0010 x 0110

      0010 x 0110:
        0000       shift (0 in multiplier)
        0010       add (1 in multiplier)
        0100       add (1 in multiplier)
        0000       shift (0 in multiplier)
        00001100

  • ALU with add or subtract gets same result in more
    than one way: 6 = -2 + 8, i.e.,
    0110 = -00010 + 01000 = 11110 + 01000
  • For example

      0010 x 0110:
        0000       shift (0 in multiplier)
        0010       sub (first 1 in multiplier)
        0000       shift (middle of string of 1s)
        0010       add (prior step had last 1)
        00001100

24
Booth's algorithm

      Current Bit   Bit to the Right   Explanation           Example      Op
      1             0                  Begins run of 1s      0001111000   sub
      1             1                  Middle of run of 1s   0001111000   none
      0             1                  End of run of 1s      0001111000   add
      0             0                  Middle of run of 0s   0001111000   none

  • Originally for speed (when shift was faster than
    add)
  • Replace a string of 1s in multiplier with an
    initial subtract when we first see a one and then
    later add for the bit after the last one:
    -1 + 10000 = 01111
25
Booth's algorithm
Booth's Example (2 x 7)

      Operation          Multiplicand   Product        next?
      0. initial value   0010           0000 0111 0    10 -> sub
      1a. P = P - m      1110           + 1110
                                        1110 0111 0    shift P (sign ext)
      1b.                0010           1111 0011 1    11 -> nop, shift
      2.                 0010           1111 1001 1    11 -> nop, shift
      3.                 0010           1111 1100 1    01 -> add
      4a.                0010           + 0010
                                        0001 1100 1    shift
      4b.                0010           0000 1110 0    done
26
Booth's algorithm
Booth's Example (2 x -3)

      Operation          Multiplicand   Product        next?
      0. initial value   0010           0000 1101 0    10 -> sub
      1a. P = P - m      1110           + 1110
                                        1110 1101 0    shift P (sign ext)
      1b.                0010           1111 0110 1    01 -> add
                                        + 0010
      2a.                               0001 0110 1    shift P
      2b.                0010           0000 1011 0    10 -> sub
                                        + 1110
      3a.                0010           1110 1011 0    shift
      3b.                0010           1111 0101 1    11 -> nop
      4a.                               1111 0101 1    shift
      4b.                0010           1111 1010 1    done
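
The recoding rule from the Booth table can be sketched in C. This version is an assumption-laden simplification: instead of modelling the shifting product register from the worked examples, it accumulates the multiplicand weighted by 2^i directly, which applies the same sub/add/none decisions per bit pair. The name `booth_mul` is illustrative only.

```c
#include <stdint.h>

/* Booth recoding sketch: scan multiplier bits from the lsb, looking at
   (current bit, bit to the right) to decide between sub, add, or nothing. */
int64_t booth_mul(int32_t multiplicand, int32_t multiplier)
{
    uint32_t bits = (uint32_t)multiplier;  /* two's-complement bit pattern */
    int64_t weighted = multiplicand;       /* multiplicand x 2^i */
    int64_t product = 0;
    unsigned right = 0;                    /* bit to the right; 0 before bit 0 */

    for (int i = 0; i < 32; i++) {
        unsigned cur = (bits >> i) & 1u;
        if (cur == 1 && right == 0)
            product -= weighted;           /* 10: begins a run of 1s -> sub */
        else if (cur == 0 && right == 1)
            product += weighted;           /* 01: end of a run of 1s -> add */
        /* 11 and 00: middle of a run -> no arithmetic */
        right = cur;
        weighted *= 2;                     /* the next bit has twice the weight */
    }
    return product;
}
```

With the operands from the two worked examples, `booth_mul(2, 7)` gives 14 and `booth_mul(2, -3)` gives -6, and no separate sign handling is needed.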
27
Division

                        1001     Quotient
      Divisor 1000 | 1001010     Dividend
                    -1000
                        10
                        101
                        1010
                       -1000
                          10     Remainder (or Modulo result)

See how big a number can be subtracted, creating a quotient bit on each step.
Binary => 1 x divisor or 0 x divisor
Dividend = Quotient x Divisor + Remainder
=> sizeof(Dividend) = sizeof(Quotient) + sizeof(Divisor)
3 versions of divide, successive refinement
28
Division 1.0
  • 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder
    reg, 32-bit Quotient reg

[Figure: datapath with a 64-bit Divisor register (Shift Right) feeding a 64-bit ALU, a 64-bit Remainder register (Write), a 32-bit Quotient register (Shift Left), and Control]
29
Division 1.0
  • Takes n+1 steps for n-bit Quotient and Remainder
  • Initial values: Quotient = 0000, Divisor = 0010 0000,
    Remainder = 0000 0111

30
Division 2.0
  • 1/2 the bits in divisor are always 0 => 1/2 of 64-bit
    adder is wasted => 1/2 of divisor is wasted
  • Instead of shifting divisor to right, shift
    remainder to left?
  • 1st step cannot produce a 1 in quotient bit
    (otherwise too big) => switch order to shift
    first and then subtract, can save 1 iteration
  • 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder
    reg, 32-bit Quotient reg

31
Division 2.0
[Flowchart: Test Remainder, with branches Remainder >= 0 and Remainder < 0; then check the repetition count (No: < n repetitions, loop again; Yes: n repetitions, done)]
32
Division 3.0
  • Eliminate Quotient register by combining with
    Remainder as shifted left
  • Start by shifting the Remainder left as before.
  • Thereafter loop contains only two steps because
    the shifting of the Remainder register shifts
    both the remainder in the left half and the
    quotient in the right half
  • The consequence of combining the two registers
    together and the new order of the operations in
    the loop is that the remainder will be shifted
    left one time too many.
  • Thus the final correction step must shift back
    only the remainder in the left half of the
    register
  • 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder
    reg, (0-bit Quotient reg)

33
Division 3.0
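  • Initial values: Remainder = 0000 0111, Divisor = 0010
[Flowchart: Test Remainder, with branches Remainder >= 0 and Remainder < 0; then check the repetition count (No: < n repetitions, loop again; Yes: n repetitions, done; n = 4 here)]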
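
The Division 3.0 scheme can be sketched in C as follows. Several choices here are assumptions rather than the slides' design: the two halves of the combined register are kept in separate variables `hi` and `lo`, `hi` is given 64 bits of headroom so the extra left shift cannot lose a bit, and the subtract/test/restore sequence is written as a compare followed by a subtract. The names `div_result` and `div_v3` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t quotient, remainder; } div_result;

/* Division 3.0 sketch: remainder in the left half, dividend (and later the
   quotient) in the right half of one shifting register. */
div_result div_v3(uint32_t dividend, uint32_t divisor)
{
    assert(divisor != 0);               /* division by zero is not handled */

    uint64_t hi = dividend >> 31;       /* initial shift left by one ...   */
    uint32_t lo = dividend << 1;        /* ... of the combined register    */

    for (int i = 0; i < 32; i++) {
        uint32_t qbit = 0;
        if (hi >= divisor) {            /* subtraction would not go negative */
            hi -= divisor;
            qbit = 1;
        }
        /* shift hi:lo left by one, shifting the new quotient bit in */
        hi = (hi << 1) | (lo >> 31);
        lo = (lo << 1) | qbit;
    }
    hi >>= 1;                           /* undo the one shift too many */

    div_result r = { lo, (uint32_t)hi };
    return r;
}
```

With the slide's operands, `div_v3(7, 2)` yields quotient 3 and remainder 1.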
34
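Division: some signed details
  • Sign of remainder?
  • 7/4 = (Q = 1, R = 3)
  • 7/4 = (Q = 2, R = -1)
  • Which do you prefer?
  • Convention
  • a/b = (Q, R)
  • Sign(R) = Sign(a): the remainder takes the sign of
    the dividend
  • Thus
  • 7/4 = (Q = 1, R = 3)
  • -7/4 = (Q = -1, R = -3)

a = Qb + R
[Figure: number line with marks -a, Qb, 0, Qb, a and the remainder R]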
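
A small C sketch of the sign convention, assuming the "divide the magnitudes, then fix the signs" approach; the names `sdiv_result` and `signed_divide` are illustrative, and 64-bit fields are used only so that the most negative 32-bit dividend cannot overflow. C99 and later define `/` and `%` the same way (truncation toward zero), so the built-in operators agree with this result.

```c
#include <stdint.h>

typedef struct { int64_t q, r; } sdiv_result;

/* Signed division with Sign(R) = Sign(a); b must be non-zero. */
sdiv_result signed_divide(int32_t a, int32_t b)
{
    int64_t ma = a < 0 ? -(int64_t)a : a;   /* magnitude of dividend */
    int64_t mb = b < 0 ? -(int64_t)b : b;   /* magnitude of divisor  */
    int64_t q = ma / mb;
    int64_t r = ma % mb;

    if ((a < 0) != (b < 0))
        q = -q;                 /* quotient is negative when signs differ */
    if (a < 0)
        r = -r;                 /* remainder takes the sign of the dividend */

    sdiv_result res = { q, r };
    return res;
}
```

In every case the identity a = Qb + R still holds, e.g. -7 = (-1)(4) + (-3).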
35
Floating Point
  • What can be represented in N bits?
  • Unsigned: 0 to 2^N - 1
  • 2's complement: -2^(N-1) to 2^(N-1) - 1
  • 1's complement: -2^(N-1) + 1 to 2^(N-1) - 1
  • But, what about?
  • very large numbers? 9,349,398,989,787,762,244,859,087,678
  • very small numbers? 0.0000000000000000000000045691
  • rationals: 2/3
  • irrationals: sqrt(2)
  • transcendentals: e
36
Floating Point
Scientific notation: 6.02 x 10^23 and 1.673 x 10^-24, where 6.02 is the mantissa (with its decimal point), 10 is the radix (base), and 23 is the exponent.
IEEE F.P.: 1.M x 2^(e - 127)
Issues:
  • Arithmetic (+, -, x, /)
  • Representation, normal form
  • Range and precision
  • Rounding
  • Exceptions (e.g., divide by zero, overflow, underflow)
  • Errors
  • Properties (negation, inversion, if A != B then
    A - B != 0)
37
Floating Point
Binary Fractions
1011_2 = 1x2^3 + 0x2^2 + 1x2^1 + 1x2^0
so... 101.011_2 = 1x2^2 + 0x2^1 + 1x2^0 + 0x2^-1 + 1x2^-2 + 1x2^-3
e.g., .75 = 3/4 = 3/2^2 = 1/2 + 1/4 = .11_2
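
The positional expansion can be checked with a short C helper. This is only a sketch: the name `binary_to_double` is made up for illustration, and the input is assumed to be a string of 0s and 1s with at most one binary point.

```c
/* Evaluate a binary string such as "101.011" by positional weights. */
double binary_to_double(const char *s)
{
    double value = 0.0;

    /* integer part: each new digit doubles the running value */
    while (*s == '0' || *s == '1') {
        value = value * 2.0 + (*s - '0');
        s++;
    }
    if (*s == '.') {
        double weight = 0.5;            /* 2^-1 for the first fraction digit */
        s++;
        while (*s == '0' || *s == '1') {
            value += weight * (*s - '0');
            weight /= 2.0;              /* 2^-2, 2^-3, ... */
            s++;
        }
    }
    return value;
}
```

For the strings on this slide, "1011" evaluates to 11, "101.011" to 5.375, and ".11" to 0.75.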
38
Floating Point
Representation of floating point numbers in IEEE
754 standard single precision

      Field   Bits   Meaning
      S       1      sign
      E       8      exponent: excess 127 binary integer;
                     actual exponent is e = E - 127, 0 < E < 255
      M       23     mantissa: sign + magnitude, normalized binary
                     significand w/ hidden integer bit: 1.M

N = (-1)^S x 2^(E-127) x (1.M)
0 = 0 00000000 0 . . . 0
-1.5 = 1 01111111 10 . . . 0
Magnitude of numbers that can be represented is
in the range 2^-126 x (1.0) to 2^127 x (2 - 2^-23),
which is approximately 1.18 x 10^-38 to 3.40 x 10^38
integer comparison valid on IEEE Fl.Pt. numbers
of same sign!
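
The S, E, and M fields can be pulled out of a C `float` as sketched below, assuming the platform's `float` is an IEEE 754 single (true on mainstream hardware, but not guaranteed by the C standard). The names `ieee_fields` and `split_float` are illustrative.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t sign, exponent, mantissa; } ieee_fields;

/* Split a single-precision value into its 1-bit S, 8-bit E, 23-bit M. */
ieee_fields split_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);     /* copy the raw 32-bit pattern */

    ieee_fields f;
    f.sign     = bits >> 31;            /* S: bit 31       */
    f.exponent = (bits >> 23) & 0xFFu;  /* E: bits 30..23  */
    f.mantissa = bits & 0x7FFFFFu;      /* M: bits 22..0   */
    return f;
}
```

For -1.5 this gives S = 1, E = 127 (01111111), and M = 0x400000 (10...0), as in the example above.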
39
Floating Point
  • Leading 1 bit of significand is implicit
  • Exponent is biased to make sorting easier
  • all 0s is smallest exponent, all 1s is largest
  • bias of 127 for single precision and 1023 for
    double precision
  • summary: (-1)^sign x (1 + significand) x
    2^(exponent - bias)
  • Example
  • decimal: -.75 = -3/4 = -3/2^2
  • binary: -.11 = -1.1 x 2^-1
  • floating point: exponent = 126 = 01111110
  • IEEE single precision:
    1 01111110 10000000000000000000000
    (Sign, Exponent, Significand)
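
Going the other way, the three fields of the -0.75 example can be packed into a 32-bit pattern and reinterpreted as a `float`. This sketch again assumes an IEEE 754 single-precision `float`; `pack_float` is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Assemble sign (1 bit), biased exponent (8 bits), and stored
   significand (23 bits) into a single-precision value. */
float pack_float(uint32_t sign, uint32_t exponent, uint32_t significand)
{
    uint32_t bits = ((sign & 1u) << 31)
                  | ((exponent & 0xFFu) << 23)
                  | (significand & 0x7FFFFFu);
    float x;
    memcpy(&x, &bits, sizeof x);        /* reinterpret the bit pattern */
    return x;
}
```

Calling `pack_float(1, 126, 0x400000)` builds the pattern 1 01111110 100...0, which compares equal to -0.75f.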
40
Floating Point
Floating Point Addition
  • How do you add in scientific notation?
  • 9.962 x 10^4 + 5.231 x 10^2
  • Basic Algorithm
  • 1. Align
  • 2. Add
  • 3. Normalize
  • 4. Round
  • Approximate algorithm:
  • While (Exp(A) > Exp(B))
  • shift Mantissa(B) right
  • Exp(B)++
  • Mantissa(Result) = Mantissa(A) + Mantissa(B)
  • Exp(Result) = Exp(A) // or Exp(B)
  • While (Mantissa(Result) msb != 1)
  • shift Mantissa(Result) left
  • Exp(Result)--
  • Round(Mantissa)
  • Round(Exponent)
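
The align, add, normalize steps can be sketched on a toy binary format. Everything about the format is an assumption made for illustration: `toy_fp` holds a positive, normalized 24-bit mantissa with its leading 1 at bit 23 and an unbiased exponent, bits shifted out during alignment are simply truncated, and rounding is omitted. Because both operands are positive, normalization only ever needs one right shift rather than the left-shift loop of the general algorithm.

```c
#include <stdint.h>

/* Toy positive value: mant x 2^(exp - 23), with bit 23 of mant set. */
typedef struct { uint32_t mant; int exp; } toy_fp;

toy_fp toy_add(toy_fp a, toy_fp b)
{
    /* 1. Align: make a the operand with the larger exponent, then shift
          the other mantissa right by the exponent difference. */
    if (a.exp < b.exp) {
        toy_fp t = a;
        a = b;
        b = t;
    }
    int shift = a.exp - b.exp;
    b.mant = (shift < 32) ? (b.mant >> shift) : 0;  /* truncates low bits */

    /* 2. Add the aligned mantissas. */
    toy_fp r = { a.mant + b.mant, a.exp };

    /* 3. Normalize: a carry into bit 24 means the sum is >= 2.0. */
    if (r.mant >> 24) {
        r.mant >>= 1;
        r.exp++;
    }

    /* 4. Round: omitted in this sketch (truncation only). */
    return r;
}
```

For instance, 1.5 (mant 0xC00000, exp 0) plus 0.5 (mant 0x800000, exp -1) yields mant 0x800000 with exp 1, i.e. 2.0.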

41
Floating Point
42
Floating Point Addition
43
Floating Point
Floating Point Multiplication
  • How do you multiply in scientific notation?
  • (9.9 x 10^4)(5.2 x 10^2) = 5.148 x 10^7
  • Basic Algorithm
  • 1. Add exponents
  • 1a. Correct for bias in exponent representation
    (Exp - 127)
  • 2. Multiply
  • 3. Normalize
  • 4. Round
  • 5. Set Sign
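
The five steps can be sketched directly on the raw IEEE single-precision fields. The sketch makes simplifying assumptions: both inputs are normalized (0 < E < 255), the result exponent stays in range, and step 4 truncates instead of rounding. The names `fp_fields` and `fp_mul_fields` are illustrative.

```c
#include <stdint.h>

/* Raw single-precision fields: 1-bit sign, biased 8-bit exponent,
   23-bit stored mantissa (hidden leading 1 not included). */
typedef struct { uint32_t sign, exponent, mantissa; } fp_fields;

fp_fields fp_mul_fields(fp_fields a, fp_fields b)
{
    fp_fields r;

    /* 1. Add exponents; 1a. both are biased, so remove one bias. */
    int32_t exp = (int32_t)a.exponent + (int32_t)b.exponent - 127;

    /* 2. Multiply the 24-bit significands 1.M (hidden bit restored). */
    uint64_t sa = 0x800000u | a.mantissa;
    uint64_t sb = 0x800000u | b.mantissa;
    uint64_t prod = sa * sb;            /* leading 1 lands in bit 46 or 47 */

    /* 3. Normalize: a product >= 2.0 has bit 47 set. */
    if (prod >> 47) {
        prod >>= 1;
        exp++;
    }

    /* 4. Round: this sketch truncates the low 23 bits. */
    r.mantissa = (uint32_t)(prod >> 23) & 0x7FFFFFu;
    r.exponent = (uint32_t)exp;         /* assumes 0 < exp < 255 */

    /* 5. Set sign: negative when exactly one operand is negative. */
    r.sign = a.sign ^ b.sign;
    return r;
}
```

As a check, 1.5 (E = 127, M = 0x400000) times 2.5 (E = 128, M = 0x200000) produces E = 128 and M = 0x700000, which is 1.875 x 2 = 3.75.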

44
Floating Point Accuracy Issues
FP Accuracy
  • Extremely important in scientific calculations
  • Very tiny errors can accumulate over time
  • IEEE 754 FP standard has four rounding modes
  • always round up
  • always round down
  • truncate
  • round to nearest
  • => in case of a tie, round to nearest even
  • Requires extra bits in intermediate
    representations

45
Floating Point Accuracy Issues
How many extra bits? IEEE spec: as if the result were
computed exactly and then rounded.
  • Guard bits -- bits to the right of the least
    significant bit of the significand computed for
    use in normalization (could become significant at
    that point) and rounding.
  • IEEE 754 has three extra bits and calls them
    guard, round, and sticky.
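
Round to nearest even can be sketched on an integer significand that carries some extra low-order bits. In this sketch, which is an illustration rather than the hardware's exact bit layout, the first dropped bit plays the role of the guard bit and every bit below it is OR-ed into a single sticky flag (so the slide's round and sticky bits are folded together). The name `round_nearest_even` is hypothetical, and `extra` is assumed to be less than 64.

```c
#include <stdint.h>

/* Drop the low `extra` bits of `value`, rounding to nearest, ties to even. */
uint64_t round_nearest_even(uint64_t value, unsigned extra)
{
    if (extra == 0)
        return value;                                   /* nothing to drop */

    uint64_t kept  = value >> extra;                    /* bits that survive */
    uint64_t guard = (value >> (extra - 1)) & 1u;       /* first dropped bit */
    uint64_t below = value & (((uint64_t)1 << (extra - 1)) - 1);
    int sticky = (below != 0);                          /* any lower 1 bit   */

    /* Above half: round up. Exactly half: round up only if kept is odd. */
    if (guard && (sticky || (kept & 1u)))
        kept++;
    return kept;
}
```

With three extra bits, 10.100 (a tie) stays at 10 because 10 is even, while 11.100 (also a tie) rounds up to 100.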

46
Floating Point Overflows
Infinity and NaNs
Infinity: the result of an operation overflows, i.e., is
larger than the largest number that can be represented.
Overflow is not the same as divide by zero (which raises
a different exception).
+/- infinity: S 1 . . . 1 0 . . . 0
It may make sense to do further computations with
infinity, e.g., X/0 > Y may be a valid comparison.
NaN: not a number, but not infinity (e.g., sqrt(-4));
raises an invalid operation exception (unless the
operation is = or !=).
NaN: S 1 . . . 1 non-zero (HW decides what goes in the
non-zero field)
NaNs propagate: f(NaN) = NaN
47
Summary
  • Multiplication and division take much longer than
    addition, requiring multiple addition steps.
  • Floating Point extends the range of numbers that
    can be represented, at the expense of precision
    (accuracy).
  • FP operations are very similar to integer, but
    with pre- and post-processing.
  • Rounding implementation is critical to accuracy
    over time.