Number Representation Fixed and Floating Point - PowerPoint PPT Presentation

Slides: 38
Provided by: Mitc55

Transcript and Presenter's Notes


1
Number Representation: Fixed and Floating Point
  • No Method is Capable of Representing ALL Real
    Numbers Using Finite Register Lengths
  • Must Use Approximations to Represent Values
  • Concentrate on Two Forms
      • Fixed Point
      • Floating Point
  • Others are
      • Rational Number Systems: use ratios of integers
      • Logarithmic Number Systems: use signs and
        logarithms of values

2
Fixed Versus Floating Point
  • Fixed Point Values: Any Two Consecutive Values
    Differ by 1 unit in the last place (ulp)
      • Equal Spacing Between Numbers
  • Floating Point Values Use Two Multi-Bit Words
      • Mantissa
      • Exponent
  • Both Forms Must be Capable of Representing Signed
    Quantities
  • Fixed Point Values CAN be Used to Represent
    Fractional Quantities

3
Floating Point Characteristics
  • Total Number of Representations = Total Number of
    Bit Strings
      • For an n-bit Register we have 2^n
  • Range of Values is Larger than Fixed Point
  • Precision of Values is Smaller
  • Distance Between Two Consecutive Values Increases
    with Magnitude
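
The spacing contrast between the two forms can be checked numerically. The sketch below is only illustrative: the 8-bit fraction width of the fixed-point format (`FRACTION_BITS`) and the helper names are assumptions, and `math.ulp` needs Python 3.9 or later.

```python
import math

# Assumed fixed-point format for illustration: 8 fraction bits.
FRACTION_BITS = 8

def fixed_to_real(raw):
    """Interpret an integer register value as a fixed-point number."""
    return raw / 2 ** FRACTION_BITS

# Fixed point: the gap between neighbours is the same everywhere.
gap_near_zero = fixed_to_real(2) - fixed_to_real(1)
gap_far_out = fixed_to_real(30001) - fixed_to_real(30000)

# Floating point (IEEE double): the gap grows with the magnitude.
float_gap_at_1 = math.ulp(1.0)
float_gap_at_1024 = math.ulp(1024.0)

print(gap_near_zero, gap_far_out)
print(float_gap_at_1, float_gap_at_1024)
```

Both fixed-point gaps equal 2^-8, while the floating-point gap at 1024 is 2^10 times the gap at 1.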

4
Floating Point
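Format: | s | e | m |
  • s: Sign Bit (signed magnitude)
  • e: Exponent (stored in biased form; true exponent
    is e - BIAS)
  • m: Mantissa (significand or fraction), m ∈ [0, 1),
    m_MAX = 1 - ulp
  • Hidden Bit: the leading 1 of a normalized
    significand is not stored
  • float: BIAS = 127 (32 bits: 23 for m and 8 for e)
  • double: BIAS = 1023 (64 bits: 52 for m and 11
    for e)
  • With a Bias of 2^(k-1) for a k-bit Exponent, the
    Sign of the Exponent is the Complement of its MSb
  • Thus, adding/subtracting that bias is just
    complementation of the MSb of the 2's complement
    exponent; the IEEE biases 2^(k-1) - 1 differ from
    this by 1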
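
As a rough illustration of the s/e/m layout for the 32-bit float, the following sketch splits a value into its three fields and rebuilds it. The function names are made up for this example, and the reconstruction is only valid for normalized numbers (0 < e < 255).

```python
import struct

def decode_float32(x):
    """Split x, rounded to IEEE single precision, into (s, e, m) bit fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31              # 1 sign bit
    e = (bits >> 23) & 0xFF     # 8 exponent bits, biased by 127
    m = bits & 0x7FFFFF         # 23 stored fraction bits
    return s, e, m

def value_from_fields(s, e, m, bias=127, frac_bits=23):
    """Rebuild a normalized value as (-1)^s * (1 + m / 2^frac_bits) * 2^(e - bias)."""
    significand = 1 + m / 2 ** frac_bits   # restore the hidden bit
    return (-1) ** s * significand * 2.0 ** (e - bias)

s, e, m = decode_float32(6.5)              # 6.5 = 1.625 * 2^2
print(s, e, hex(m), value_from_fields(s, e, m))
```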
5
Floating Point Example
double: 00000000 bfe80000 (MSW has the Higher Address,
i.e. little-endian word order)
Format: | s | e | m |
MSW: 1 | 011 1111 1110 | 1000 0000 0000 0000 0000
LSW: 0000 0000 0000 0000 0000 0000 0000 0000
s = 1, e = 1022, m = 0.5
Value = $(-1)^{1} \times 1.5 \times 2^{(1022-1023)}$
Value = -(1.5)(0.5) = -0.75
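
The worked example can be reproduced in a few lines. This is a sketch that assumes the two 32-bit words from the slide; the variable names are illustrative, and the final `struct` call is only a cross-check of the hand decoding.

```python
import struct

MSW = 0xBFE80000   # most significant word from the example
LSW = 0x00000000   # least significant word

bits = (MSW << 32) | LSW
s = bits >> 63                      # sign bit
e = (bits >> 52) & 0x7FF            # 11-bit biased exponent
f = bits & ((1 << 52) - 1)          # 52-bit stored fraction
m = f / 2 ** 52                     # fraction as a real number in [0, 1)

value = (-1) ** s * (1 + m) * 2.0 ** (e - 1023)

# Cross-check: let struct decode the same 64 bits as a big-endian double.
(check,) = struct.unpack(">d", bits.to_bytes(8, "big"))
print(s, e, m, value, check)
```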
6
Floating Point Normalization
  • Redundant Representations are Possible!
  • Hidden Bit Helps
  • Out of All Possible Representations, Choose One
    With Fewest Leading Zeros in Significand
  • This is Normalization
  • After Performing Arithmetic, Renormalization May
    Need to be Accomplished
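
A small sketch of redundancy and normalization follows. It assumes a fractional significand without a hidden bit, normalized to [0.5, 1), which is the convention of Python's `math.frexp` rather than the IEEE 1.f form; the function names are invented for the example.

```python
import math

def representations(value, count=4):
    """List (significand, exponent) pairs that all satisfy sig * 2**exp == value."""
    m, e = math.frexp(value)               # normalized: 0.5 <= |m| < 1
    return [(m / 2 ** k, e + k) for k in range(count)]

def normalize(m, e):
    """Shift the significand until it has no leading zero (0.5 <= |m| < 1)."""
    if m == 0:
        return 0.0, 0
    while abs(m) < 0.5:                    # leading zero: shift left
        m *= 2
        e -= 1
    while abs(m) >= 1:                     # overflowed: shift right
        m /= 2
        e += 1
    return m, e

for sig, exp in representations(0.75):
    print(sig, exp, normalize(sig, exp))
```

Every pair printed denotes 0.75, and `normalize` maps each of them back to the single form with the fewest leading zeros.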

7
Floating Point Special Numbers
Value v when exponent e and fraction f take
special values (IEEE standard)
Note: NaN = Not a Number
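
The table for this slide did not survive in the transcript. As a sketch based on the IEEE 754 double format (not on the slide itself), the special cases can be told apart from the exponent and fraction fields alone; `classify_double` is a hypothetical helper name.

```python
import math
import struct

def classify_double(x):
    """Classify a double by its exponent field e and fraction field f."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    if e == 0:                      # all-zero exponent
        return "zero" if f == 0 else "denormal"
    if e == 0x7FF:                  # all-one exponent
        return "infinity" if f == 0 else "NaN"
    return "normal"

for sample in (0.0, -0.0, 5e-324, 1.0, math.inf, math.nan):
    print(sample, classify_double(sample))
```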
8
IEEE/ANSI 754/854 Standard
9
Denormalized Numbers
  • Allow for Gradual Underflow: Precision Degrades
    Gradually Instead of Dropping Straight to Zero
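
Gradual underflow can be observed by repeatedly halving the smallest normalized double. The sketch assumes Python floats are IEEE doubles with the default round-to-nearest-even mode, which holds on common platforms.

```python
import math
import sys

smallest_normal = sys.float_info.min        # 2**-1022
smallest_denormal = math.ldexp(1.0, -1074)  # 2**-1074, last value above zero

# Halve until the value underflows to zero, counting the steps.
x = smallest_normal
steps = 0
while x > 0:
    x /= 2
    steps += 1

print(smallest_normal, smallest_denormal, steps)
```

The loop passes through 52 denormalized values before reaching zero, so the result loses one bit of precision per step instead of vanishing at once.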

10
Denormals
11
Operations Internal Precision
12
Floating Point Addition/Subtraction
13
Floating Point Multiplication/Division
14
Conversions and Roundings
15
Exceptions
16
Rounding Schemes
Signed Magnitude
Two's Complement
17
Round to Nearest (Signed Magnitude)
18
Rounding Comments
19
Round to Nearest Even/Odd
Round to Nearest Even
Round to Nearest Odd (R)
20
Jamming/von Neumann Rounding
21
ROM Rounding
22
Rounding
23
Rounding Examples
Round Towards
Downward Directed Rounding
24
Floating Point Operations
25
Adders/Subtractors
26
Operand Packing/Unpacking
27
Other Key Parts of FP Add/Sub Unit
28
Pre-Shifting
29
Four-stage Combinational Shifter
Pre-shifts Operand by 0 to 15 Bits
30
Leading Zeros/Ones Counting vs. Prediction
31
Leading Zeros Prediction
32
Guard Digits
  • What is the smallest number of extra digits
    needed for rounding? For post-normalization?
  • Multiplication: Double-Length Result
  • Add/Sub with differing exponents: Can have
    Double-Length Result
  • FP Unit Provides Single-Length Result

33
Significand Ranges
  • Assume Significand M ∈ (0, 1-ulp]
  • Then Normalized M ranges as
  • Multiplication: prod = M1 × M2
  • For postnormalization need at most one shift left
    to get

34
Significand Ranges (cont)
  • Division: quot = M1 ÷ M2
  • Need at most one shift right to get
  • Conclusion
      • 1 Extra Digit Needed for Postnormalization
      • 1 Extra Digit Needed for Round-to-Nearest
      • 2 Extra Digits Needed
          • G - guard
          • R - round
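
The range formulas for these two slides are missing from the transcript. One way to recover the argument, assuming radix 2 and a normalized fractional significand (both assumptions, since the slides do not state the radix), is:

```latex
% Assumption: radix 2, normalized fractions
\tfrac{1}{2} \le M_1, M_2 \le 1 - \mathrm{ulp} < 1

% Multiplication: the product can fall below 1/2 but never below 1/4,
% so at most one left shift restores the normalized range
\tfrac{1}{4} \le M_1 \times M_2 < 1
\quad\Longrightarrow\quad
\tfrac{1}{2} \le 2^{k}\,(M_1 \times M_2) < 1, \qquad k \in \{0, 1\}

% Division: the quotient can reach 1 or more but stays below 2,
% so at most one right shift restores the normalized range
\tfrac{1}{2} < M_1 \div M_2 < 2
\quad\Longrightarrow\quad
\tfrac{1}{2} \le 2^{-k}\,(M_1 \div M_2) < 1, \qquad k \in \{0, 1\}
```

The single possible left shift is what the guard digit G covers, and the digit after it, R, then decides round-to-nearest.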

35
Sticky Bit in IEEE Std 754
  • Round-to-Nearest-Even Requires 1 Extra Bit
      • The sticky bit, S
  • S Turns out to be the Logical-OR of the Other
    Additional Bits (those beyond G and R)
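
A minimal sketch of round-to-nearest-even on an integer significand, showing where the sticky bit enters. It assumes the result is already post-normalized, so the first dropped bit decides the rounding and everything below it is ORed into S; the function name and bit widths are illustrative only.

```python
def round_nearest_even(significand, extra_bits):
    """Drop the low `extra_bits` bits of an integer significand, rounding to nearest even."""
    if extra_bits == 0:
        return significand
    kept = significand >> extra_bits
    dropped = significand & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)
    round_bit = 1 if dropped & half else 0        # first dropped bit
    sticky = 1 if dropped & (half - 1) else 0     # logical OR of all lower bits
    lsb = kept & 1
    # Round up when above the halfway point, or exactly halfway with an odd LSB.
    if round_bit and (sticky or lsb):
        kept += 1
    return kept

print(round_nearest_even(0b1010_011, 3))  # below half: truncate
print(round_nearest_even(0b1010_100, 3))  # tie, LSB even: stay
print(round_nearest_even(0b1011_100, 3))  # tie, LSB odd: round up
print(round_nearest_even(0b1010_101, 3))  # above half: round up
```

Without S, the second and fourth cases would be indistinguishable, which is why a tie cannot be detected from the round bit alone.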

36
Floating Point Multiplier
37
Floating Point Divider