Number Representation Fixed and Floating Point

Slides: 38

Provided by: Mitc55

Transcript and Presenter's Notes

1
Number Representation: Fixed and Floating Point

No Method Capable of Representing ALL Real
Numbers Using Finite Register Lengths
Must Use Approximations to Represent Values
Concentrate on Two Forms
Fixed Point
Floating Point
Others are
Rational Number Systems: use ratios of integers
Logarithmic Number Systems: use signs and
logarithms of values

2
Fixed Versus Floating Point

Fixed Point Values Represent Values where Any Two
Consecutive Values Differ by 1 unit in the last place (ulp)
Equal Spacing Between Numbers
Floating Point Values Use Two Multi-Bit Words
Mantissa
Exponent
Both Forms Must be Capable of Representing Signed
Quantities
Fixed Point Values CAN be Used to Represent
Fractional Quantities
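
The fixed-point bullets above can be illustrated with a minimal sketch. It assumes a format with 8 fractional bits; `FRAC_BITS`, `to_fixed`, and `from_fixed` are illustrative names that do not come from the slides.

```python
FRAC_BITS = 8             # assumed number of fractional bits (illustrative)
ULP = 2.0 ** -FRAC_BITS   # constant spacing between neighbouring values

def to_fixed(x: float) -> int:
    """Quantize x to the nearest multiple of ULP, stored as a signed integer."""
    return round(x * (1 << FRAC_BITS))

def from_fixed(q: int) -> float:
    """Recover the real value represented by the stored integer q."""
    return q / (1 << FRAC_BITS)

a = to_fixed(0.75)    # fractional quantity stored as the integer 192
b = to_fixed(-1.5)    # signed quantity stored as the integer -384

# Neighbouring stored integers always differ by exactly one ulp in value.
step = from_fixed(a + 1) - from_fixed(a)
print(a, b, step == ULP)
```

The stored word is an ordinary integer; only the implied position of the binary point changes its interpretation.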

3
Floating Point Characteristics

Total Number of Representations = Total Number of Bit
Strings
For an n-bit Register we have $2^n$
Range of Value is Larger than Fixed Point
Precision of Value is Smaller
Distance Between Two Consecutive Values Increases with Magnitude
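
The growing distance between consecutive values can be observed directly. The sketch below assumes the host uses IEEE 64-bit doubles for Python floats; `spacing` is a hypothetical helper that steps to the next larger bit pattern of a positive finite value.

```python
import struct

def spacing(x: float) -> float:
    """Gap between positive finite x and the next larger double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    nxt = struct.unpack(">d", struct.pack(">Q", bits + 1))[0]
    return nxt - x

# The gap grows with the magnitude of the value.
for x in (1.0, 1024.0, 2.0 ** 52):
    print(x, spacing(x))
```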

4
Floating Point
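Field Layout: s | e | m
s: Sign Bit (signed magnitude)
e: Exponent (2's complement value plus BIAS, i.e. stored in biased form)
m: Mantissa (significand or fraction), $m \in [0, 1)$, $m_{MAX} = 1 - ulp$
Hidden bit: the leading 1 of a normalized significand is implied, not stored
float: BIAS = 127 (32 bits: 1 for s, 8 for e, 23 for m)
double: BIAS = 1023 (64 bits: 1 for s, 11 for e, 52 for m)
Sign of Exponent is Complement of its MSb
Thus, for a bias of $2^{k-1}$ on a k-bit exponent, adding/subtracting the bias is just
complementation of the MSb (the IEEE biases 127 and 1023 equal $2^{k-1} - 1$, one less than that)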
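
The s, e, m fields of the 32-bit float format can be pulled apart with a few shifts and masks. This is a minimal sketch for normalized values only; `decode_float32` and `rebuild` are illustrative names, not part of the slides.

```python
import struct

BIAS32 = 127  # bias of the 8-bit exponent field in the 32-bit format

def decode_float32(x: float):
    """Return (s, e, f): sign bit, biased exponent, 23-bit fraction field."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    return s, e, f

def rebuild(s: int, e: int, f: int) -> float:
    """Value of a normalized number: hidden 1 plus fraction, scaled by 2^(e - bias)."""
    return (-1) ** s * (1 + f / 2 ** 23) * 2.0 ** (e - BIAS32)

s, e, f = decode_float32(-0.75)
print(s, e, hex(f), rebuild(s, e, f))
```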
5
Floating Point Example
double 00000000 bfe80000 (Little-Endian Word Order: MSW has
Higher Address)
Field Layout: s | e | m
1 | 011 1111 1110 | 1000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
s = 1, e = 1022, m = 0.5
Value = $(-1)^1 \times 1.5 \times 2^{(1022-1023)}$
Value = $-(1.5)(0.5) = -0.75$
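
The worked example can be checked mechanically. The sketch below assumes the two 32-bit words are stored low word first, so the word bfe80000 is the most significant one; the variable names are illustrative.

```python
import struct

low_word, high_word = 0x00000000, 0xBFE80000
bits = (high_word << 32) | low_word

s = bits >> 63                              # sign bit
e = (bits >> 52) & 0x7FF                    # 11-bit biased exponent
m = (bits & ((1 << 52) - 1)) / 2.0 ** 52    # fraction, without the hidden bit

# Apply the formula from the slide: (-1)^s * (1 + m) * 2^(e - 1023)
value = (-1) ** s * (1 + m) * 2.0 ** (e - 1023)

# Cross-check by letting struct reinterpret the same eight bytes as a double.
native = struct.unpack("<d", struct.pack("<II", low_word, high_word))[0]
print(s, e, m, value, native)
```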
6
Floating Point Normalization

Redundant Representations are Possible!

Hidden Bit Helps
Out of All Possible Representations, Choose One
With Fewest Leading Zeros in Significand
This is Normalization
After Performing Arithmetic, Renormalization May
Need to be Accomplished
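
Normalization by removing leading zeros can be sketched as follows. The sketch assumes an 8-bit fractional significand with no hidden bit, so the represented value is `sig / 2**width * 2**exp`; `normalize` and its zero convention are illustrative choices, not taken from the slides.

```python
def normalize(sig: int, exp: int, width: int = 8):
    """Shift sig left until its top bit (of `width` bits) is 1, adjusting exp.

    Assumes 0 <= sig < 2**width. The represented value is unchanged.
    """
    if sig == 0:
        return 0, 0  # zero has no normalized form; convention chosen here
    while sig < (1 << (width - 1)):
        sig <<= 1    # drop one leading zero ...
        exp -= 1     # ... and compensate in the exponent
    return sig, exp

# 0.0001_0100 * 2^5 and 0.1010_0000 * 2^2 both represent 2.5.
print(normalize(0b00010100, 5))
```

The same routine serves as a renormalization step after an operation leaves leading zeros in the result.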

7
Floating Point Special Numbers
Value v when exponent e and fraction f take
special values (IEEE standard)
Note: NaN = Not a Number
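
As a sketch of how the special encodings are recognized in the 64-bit format, the hypothetical helper `classify` below inspects the exponent field e and fraction field f: an all-zeros exponent marks zero or a denormal, and an all-ones exponent marks infinity or NaN.

```python
import struct

def classify(x: float) -> str:
    """Name the IEEE double class of x from its exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    if e == 0:                       # all-zeros exponent
        return "zero" if f == 0 else "denormal"
    if e == 0x7FF:                   # all-ones exponent
        return "infinity" if f == 0 else "NaN"
    return "normal"

for x in (0.0, 5e-324, 1.0, float("inf"), float("nan")):
    print(x, classify(x))
```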
8
IEEE/ANSI 754/854 Standard
9
Denormalized Numbers

Allows for Gradual Degradation for Underflow
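
Gradual underflow can be observed with ordinary Python floats, assuming the host implements IEEE doubles with denormal support (the usual case). The names below are illustrative.

```python
import math
import sys

smallest_normal = sys.float_info.min        # 2^-1022
smallest_denormal = math.ldexp(1.0, -1074)  # 2^-1074, last value above zero

# Below the smallest normal number, values lose precision bit by bit
# instead of dropping straight to zero.
half = smallest_normal / 2
print(half > 0.0, half == math.ldexp(1.0, -1023))

# Only past the smallest denormal does the result finally reach zero.
print(smallest_denormal / 2 == 0.0)
```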

10
Denormals
11
Operations Internal Precision
12
Floating Point Addition/Subtraction
13
Floating Point Multiplication/Division
14
Conversions and Roundings
15
Exceptions
16
Rounding Schemes
Signed Magnitude
Two's Complement
17
Round to Nearest (Signed Magnitude)
18
Rounding Comments
19
Round to Nearest Even/Odd
Round to Nearest Even
Round to Nearest Odd (R)
20
Jamming/von Neumann Rounding
21
ROM Rounding
22
Rounding
23
Rounding Examples
Round Towards
Downward Directed Rounding
24
Floating Point Operations
25
Adders/Subtractors
26
Operand Packing/Unpacking
27
Other Key Parts of FP Add/Sub Unit
28
Pre-Shifting
29
Four-stage Combinational Shifter
Pre-shifts Operand by 0 to 15 Bits
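
A four-stage shifter covering 0 to 15 bits can be modelled in software as below. This sketch assumes the stages shift right by 1, 2, 4, and 8 positions, each enabled by one bit of the shift amount; the slides do not state the stage sizes or direction, and `preshift_right` is an illustrative name.

```python
def preshift_right(x: int, amount: int) -> int:
    """Shift x right by `amount` (0..15) using four fixed-distance stages."""
    if not 0 <= amount <= 15:
        raise ValueError("amount must be in 0..15")
    for stage in range(4):
        # Stage k shifts by 2^k positions when bit k of the amount is set.
        if (amount >> stage) & 1:
            x >>= 1 << stage
    return x

print(bin(preshift_right(0b1011_0000_0000_0000, 11)))
```

Because every amount from 0 to 15 is a sum of distinct powers of two, four stages are enough to cover the whole range.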
30
Leading Zeros/Ones Counting vs. Prediction
31
Leading Zeros Prediction
32
Guard Digits

What is the smallest number of extra digits
needed for rounding? For post-normalization?
Multiplication: Double-Length Result
Add/Sub with differing exponents: Can have Double-
Length Result
FP Unit Provides a Single-Length Result

33
Significand Ranges

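Assume Significand $M \in [0, 1 - ulp]$
Then Normalized M ranges as (assuming radix 2)
$1/2 \le M \le 1 - ulp$

Multiplication: $prod = M_1 \times M_2$, so $1/4 \le prod < 1$

For postnormalization need at most one shift left
to get $1/2 \le prod < 1$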

34
Significand Ranges (cont)

Division: $quot = M_1 \div M_2$, so $1/2 < quot < 2$ for normalized radix-2 operands

Need at most one shift right to get $1/2 \le quot < 1$

Conclusion
1 Extra Digit Needed for Postnormalization
1 Extra Digit Needed for Round-to-Nearest
2 Extra Digits Needed
G - guard
R - round
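
The one-shift bounds behind this conclusion can be checked exhaustively for a small word length. The sketch assumes radix 2, fractional significands normalized to [1/2, 1), and a 6-bit width chosen only to keep the search small.

```python
from fractions import Fraction

N = 6  # assumed significand width in bits (illustrative)

# All normalized significands k / 2^N with the leading fraction bit set.
sigs = [Fraction(k, 2 ** N) for k in range(2 ** (N - 1), 2 ** N)]

products = [a * b for a in sigs for b in sigs]
quotients = [a / b for a in sigs for b in sigs]

# Products stay in [1/4, 1): at most one left shift renormalizes them.
print(min(products), max(products))
# Quotients stay in (1/2, 2): at most one right shift renormalizes them.
print(min(quotients), max(quotients))
```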

35
Sticky Bit in IEEE Std 754

Round-to-Nearest-Even Requires 1 Extra Bit
The sticky bit, S
Turns out to be Logical-OR of Other Additional
Bits
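
A minimal sketch of round-to-nearest-even using a sticky bit follows. It assumes an integer significand whose low `extra` bits are to be discarded; the first discarded bit decides the rounding direction and the sticky bit is the OR of all remaining discarded bits. `round_nearest_even` is an illustrative name.

```python
def round_nearest_even(sig: int, extra: int) -> int:
    """Drop the low `extra` bits of sig, rounding to nearest, ties to even."""
    if extra == 0:
        return sig
    kept = sig >> extra
    round_bit = (sig >> (extra - 1)) & 1                 # first discarded bit
    sticky = int(sig & ((1 << (extra - 1)) - 1) != 0)    # OR of the rest
    # Round up when above the halfway point, or exactly halfway and kept is odd.
    if round_bit and (sticky or (kept & 1)):
        kept += 1
    return kept

print(round_nearest_even(0b1011_100, 3))  # tie, kept part odd: rounds up
print(round_nearest_even(0b1010_100, 3))  # tie, kept part even: stays
print(round_nearest_even(0b1010_101, 3))  # above halfway: rounds up
```

Rounding up can carry out of the kept field, in which case the result needs one more renormalization step.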

36
Floating Point Multiplier
37
Floating Point Divider
