Title: Floating Point Numbers
1Lecture 5
ITEC 1000 Introduction to Information Technology
Prof. Peter Khaiter
2Lecture Template
- Floating Point Numbers
- Exponential Notation
- Excess-50 Notation
- Overflow and Underflow
- Floating Point Calculations
- Normalization in Floating Point
- IEEE 754 Standard
- Packed Decimal Format
- Programming Considerations
3Floating Point Numbers
- Real numbers
- Used in computer when the number
- is outside the integer range of the computer (too large or too small)
- contains a decimal fraction
- the range in PCs
- r
- or more
4Exponential Notation
- The following are equivalent representations of 1,234
123,400.0 x 10^-2
12,340.0 x 10^-1
1,234.0 x 10^0
123.4 x 10^1
12.34 x 10^2
1.234 x 10^3
0.1234 x 10^4
- The representations differ in that the decimal place (the "point") "floats" to the left or right (with the appropriate adjustment in the exponent).
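The floating of the point can be checked mechanically. The sketch below is only an illustration (the function name `equivalent_forms` is made up here): it uses Python's `decimal` module to move the point and compensate in the exponent.

```python
from decimal import Decimal

def equivalent_forms(value, exponents):
    """Return (mantissa, exponent) pairs that all denote the same value."""
    number = Decimal(value)
    # scaleb(-e) moves the decimal point e places to compensate for 10^e
    return [(number.scaleb(-e), e) for e in exponents]

for mantissa, e in equivalent_forms("1234", range(-2, 5)):
    print(f"{mantissa:f} x 10^{e}")
```

Every pair multiplies back to 1234, which is the sense in which the representations are equivalent.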
5Exponential Notation
- Also called scientific notation
- 4 specifications required for a number
- Sign of the number (shown in the example)
- Magnitude or mantissa (12345)
- Sign of the exponent (+ in 10^5)
- Magnitude of the exponent (5)
- Plus
- Base of the exponent (10)
- Location of decimal point (or other base) radix point
6Parts of a Floating Point Number
-0.9876 x 10^-3
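As a rough illustration of the parts listed for this number, the sketch below splits a decimal literal into sign, mantissa digits, exponent sign and exponent magnitude. The helper name `parts` and the dictionary keys are hypothetical; the radix point is assumed to sit in front of the first mantissa digit.

```python
from decimal import Decimal

def parts(text):
    """Split a decimal literal into the parts of exponential notation."""
    sign, digits, exp = Decimal(text).as_tuple()
    # Place the radix point in front of the first digit: 0.dddd x 10^e
    exponent = exp + len(digits)
    return {
        "sign": "-" if sign else "+",
        "mantissa": "".join(str(d) for d in digits),
        "exponent_sign": "-" if exponent < 0 else "+",
        "exponent_magnitude": abs(exponent),
        "base": 10,
    }

print(parts("-0.9876E-3"))
```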
7Floating Point Format Specification
- Integer format (8-digit word)
- 7 decimal digits and a sign
- Range -9,999,999 <= I <= 9,999,999
- Floating point format (8-digit word): sign, 2-digit exponent, 5-digit mantissa
8Format
- Mantissa stored in sign-magnitude format
- Assume decimal point located at the beginning of mantissa
- Exponent stored in Excess-N notation
- Complementary notation
- Pick middle value as offset, where N is the middle value of 0..99, e.g., excess-50
9Excess-50 notation
- Excess-N representation: R = N + EE
- Example 1: N = 50, EE = 38, R = 88
- Example 2: N = 50, EE = -38, R = 12
- Excess-50 magnitude range: 0.00001 x 10^-50 to 0.99999 x 10^49
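The excess-N rule is a single addition. A minimal sketch, assuming a two-digit stored exponent (0..99) and hypothetical function names:

```python
def to_excess(exponent, n=50):
    """Stored representation R = N + EE for an actual exponent EE."""
    stored = exponent + n
    if not 0 <= stored <= 99:
        raise ValueError("exponent does not fit in two decimal digits")
    return stored

def from_excess(stored, n=50):
    """Recover the actual exponent EE = R - N."""
    return stored - n

print(to_excess(38), to_excess(-38))
```

With N = 50 this gives 88 and 12 for the two examples in the list.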
10Overflow and Underflow
- Possible for the number to be too large or too small for representation
- Smallest representable magnitude: 0.00001 x 10^-50 = 10^-55
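A small sketch of the overflow and underflow test, assuming the 5-digit mantissa and excess-50 exponent used in this lecture. The limits and the name `classify` are illustrative only.

```python
from decimal import Decimal

LARGEST = Decimal("0.99999E+49")    # biggest mantissa, biggest exponent
SMALLEST = Decimal("0.00001E-50")   # smallest nonzero mantissa, smallest exponent

def classify(value):
    """Say whether a magnitude fits the assumed excess-50 format."""
    magnitude = abs(Decimal(value))
    if magnitude == 0:
        return "zero"
    if magnitude > LARGEST:
        return "overflow"
    if magnitude < SMALLEST:
        return "underflow"
    return "representable"

print(classify("1E50"), classify("1E-56"), classify("246.8035"))
```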
11Floating Point Format Excess-50
- First digit represents the sign of mantissa
- 0 is used as a + sign
- 5 is used as a - sign (arbitrarily)
- Two next digits represent exponent in excess-50
- Five last digits represent mantissa
- fixed decimal point located at the beginning
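Reading such a word back is a matter of slicing the eight digits. The decoder below is a sketch under the layout just described (1 sign digit, 2 exponent digits, 5 mantissa digits); the name `decode` is an assumption of this example.

```python
from decimal import Decimal

def decode(word):
    """Decode an 8-digit sign/excess-50/mantissa word into a Decimal."""
    if len(word) != 8 or not word.isdigit():
        raise ValueError("expected exactly 8 decimal digits")
    sign_digit = word[0]
    if sign_digit not in ("0", "5"):
        raise ValueError("sign digit must be 0 (+) or 5 (-)")
    exponent = int(word[1:3]) - 50          # remove the excess
    mantissa = Decimal("0." + word[3:])     # point at the beginning
    value = mantissa.scaleb(exponent)
    return -value if sign_digit == "5" else value

print(decode("05324680"))
```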
12Examples
13Normalization
- Shift numbers left by decreasing the exponent until leading zeros eliminated
- Converting decimal number into standard format
- Provide number with exponent (0 if not yet specified)
- Increase/decrease exponent to shift decimal point to proper position
- Decrease exponent to eliminate leading zeros on mantissa
- Correct precision by adding 0s or discarding/rounding least significant digits
14Example 1: 246.8035
Sign: 0
Excess-50 exponent: 53
Mantissa: 24680
15Example 2: 1255 x 10^-3
16Example 3: -0.00000075
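The three examples can be run through one normalizing encoder. This is a sketch, not the lecture's own procedure: it assumes round-half-up when cutting the mantissa to 5 digits, and `encode` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

def encode(text):
    """Normalize a decimal literal and pack it as sign/excess-50/mantissa."""
    number = Decimal(text)
    if number == 0:
        raise ValueError("zero has no normalized form in this sketch")
    sign_digit = "5" if number < 0 else "0"
    magnitude = abs(number)
    # adjusted() is the exponent for d.ddd form; add 1 for 0.ddddd form
    exponent = magnitude.adjusted() + 1
    mantissa = magnitude.scaleb(-exponent)
    mantissa = mantissa.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)
    if mantissa == 1:                      # rounding carried into a new digit
        mantissa = Decimal("0.10000")
        exponent += 1
    stored = exponent + 50
    if not 0 <= stored <= 99:
        raise OverflowError("exponent outside the excess-50 range")
    return f"{sign_digit}{stored:02d}{str(mantissa)[2:]}"

for example in ("246.8035", "1255E-3", "-0.00000075"):
    print(example, "->", encode(example))
```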
17Floating Point Calculations
- Addition and subtraction
- Exponent and mantissa treated separately
- Exponents of numbers must agree
- Align decimal points
- Least significant digits may be lost
- Mantissa overflow requires the mantissa to be shifted right again and the exponent increased
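The addition steps can be sketched on the 8-digit excess-50 words used earlier. The operands below are illustrative values chosen for this sketch, digits shifted out are simply truncated (an assumption), and the exponent range is not re-checked.

```python
def unpack(word):
    """Return (sign, stored exponent, 5-digit mantissa as an integer)."""
    sign = -1 if word[0] == "5" else 1
    return sign, int(word[1:3]), int(word[3:])

def add(a, b):
    sign_a, exp_a, man_a = unpack(a)
    sign_b, exp_b, man_b = unpack(b)
    # Align decimal points: shift the smaller number right (digits are lost)
    exp = max(exp_a, exp_b)
    man_a //= 10 ** (exp - exp_a)
    man_b //= 10 ** (exp - exp_b)
    total = sign_a * man_a + sign_b * man_b
    sign_digit = "5" if total < 0 else "0"
    total = abs(total)
    if total >= 100000:            # mantissa overflow: shift right, bump exponent
        total //= 10
        exp += 1
    while 0 < total < 10000:       # leading zeros after a subtraction
        total *= 10
        exp -= 1
    return f"{sign_digit}{exp:02d}{total:05d}"

print(add("05199520", "04967850"))
```

Here 9.9520 + 0.067850 is exactly 10.01985, but the stored result is 0.10019 x 10^2 because the low-order digits were dropped during alignment and the final shift.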
18Example
Precision lost
19Multiplication and Division
- Mantissas multiplied or divided
- Exponents added or subtracted
- Normalization necessary to
- Restore location of decimal point
- Maintain precision of the result
- Adjust excess value since added twice
- Example: 2 numbers with stored exponent 53 (actual exponent 3) in excess-50 notation
- 53 + 53 = 106
- Since 50 added twice, subtract: 106 - 50 = 56
- Maintaining precision
- Normalizing and rounding multiplication
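A sketch of multiplication on the same 8-digit words, showing the excess adjustment, normalization and rounding. The function name and the round-half-up choice are assumptions of this example, and the exponent range is not re-checked.

```python
def multiply(a, b):
    sign_a, exp_a, man_a = a[0], int(a[1:3]), int(a[3:])
    sign_b, exp_b, man_b = b[0], int(b[1:3]), int(b[3:])
    sign_digit = "0" if sign_a == sign_b else "5"
    exp = exp_a + exp_b - 50       # the excess was added twice; remove one copy
    product = man_a * man_b        # 10 fractional digits after the radix point
    if product == 0:
        return "05000000"
    while product < 10 ** 9:       # eliminate leading zeros
        product *= 10
        exp -= 1
    mantissa = (product + 50000) // 100000   # round to 5 digits
    if mantissa == 100000:         # rounding carried into a sixth digit
        mantissa = 10000
        exp += 1
    return f"{sign_digit}{exp:02d}{mantissa:05d}"

print(multiply("05350000", "05350000"))
```

For two operands with stored exponent 53 the exponent field starts at 53 + 53 - 50 = 56, as in the example above; normalization may then lower it.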
20Example
21Floating Point in the Computer
- Replace digits with 0 and 1 bits
- Typical floating point format
- 32 bits provide range 10^-38 to 10^38
- 8-bit exponent: 256 levels
- Excess-128 notation
- 23 bits of mantissa: approximately 7 decimal digits of precision
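These figures follow from the bit counts. A quick check, assuming the largest power of two is 2^127 (the variable names are illustrative):

```python
import math

EXPONENT_BITS = 8
MANTISSA_BITS = 23

levels = 2 ** EXPONENT_BITS                      # distinct exponent codes
largest_power = 2.0 ** 127                       # on the order of 10^38
decimal_digits = MANTISSA_BITS * math.log10(2)   # bits converted to decimal digits

print(levels, largest_power, round(decimal_digits, 2))
```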
22IEEE 754 Standard
- Most common standard for representing floating point numbers
- Single precision: 32 bits, consisting of
- Sign bit (1 bit)
- Exponent (8 bits)
- Mantissa (23 bits)
- Double precision: 64 bits, consisting of
- Sign bit (1 bit)
- Exponent (11 bits)
- Mantissa (52 bits)
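The field widths can be seen by reinterpreting a float's bytes with Python's standard `struct` module. The helper names `fields32` and `fields64` are made up for this sketch.

```python
import struct

def fields32(x):
    """Return (sign, exponent, mantissa) bit strings of x as single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return s[0], s[1:9], s[9:]

def fields64(x):
    """Return (sign, exponent, mantissa) bit strings of x as double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = f"{bits:064b}"
    return s[0], s[1:12], s[12:]

print(fields32(14.0))
print(fields64(1.0))
```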
23Single Precision Format
32 bits
24Double Precision Format
64 bits
25IEEE 754 Standard
26IEEE 754 Standard
- 32-bit Floating Point Value Definition
27Normalization in Floating Point
- Mantissa
- Must always start with 1
- Leading bit is not stored
- Implied that it is located to the left of the binary point
- Normalized Form 1.MMMMMMM
- E.g.
- Mantissa: 10100000000000000000000
- Actual value: 1.101 (binary) = 1.625 (decimal)
- Exponent
- Formatted using Excess-127 notation
- Base 2 is implied
- Range 2^-126 to 2^127
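The implied leading 1 can be made explicit in a few lines. This sketch uses exact fractions; the name `significand` is an assumption of the example.

```python
from fractions import Fraction

def significand(stored_bits):
    """Value of 1.MMMM... given only the stored mantissa bits."""
    value = Fraction(1)                      # the implied leading 1
    for position, bit in enumerate(stored_bits, start=1):
        if bit == "1":
            value += Fraction(1, 2 ** position)
    return value

print(float(significand("10100000000000000000000")))
```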
28Excess Notation Example
Represent exponent of 14 (decimal) in excess-127 form
127 (decimal) = 01111111 (binary)
+ 14 (decimal) = 00001110 (binary)
Representation = 10001101 (binary) = 141 (decimal)
29Excess Notation Example
Represent exponent of -8 (decimal) in excess-127 form
127 (decimal) = 01111111 (binary)
- 8 (decimal) = - 00001000 (binary)
Representation = 01110111 (binary) = 119 (decimal)
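Both excess-127 examples reduce to adding 127 and writing the result in 8 bits. A minimal sketch with a hypothetical function name; stored codes 0 and 255 are rejected because IEEE 754 reserves them for special values.

```python
def excess127(exponent):
    """Return the 8-bit excess-127 code for an actual exponent."""
    stored = exponent + 127
    if not 1 <= stored <= 254:
        raise ValueError("exponent outside the normalized range -126..127")
    return f"{stored:08b}"

print(excess127(14), excess127(-8))
```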
30Single Precision Example
0 10000010 11000000000000000000000
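One way to check a bit pattern like this is to let Python reinterpret the 32 bits as a float through the standard `struct` module. The name `decode32` is an assumption of this sketch.

```python
import struct

def decode32(sign, exponent, mantissa):
    """Interpret sign/exponent/mantissa bit strings as a single-precision value."""
    bits = int(sign + exponent + mantissa, 2)
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

print(decode32("0", "10000010", "11000000000000000000000"))
```

The pattern decodes to 1.75 x 2^3 = 14.0: the exponent field is 130 - 127 = 3 and the mantissa with its implied 1 is 1.11 in binary.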
31Single Precision Exercise
- What decimal value is represented by the following 32-bit floating point number?
1 10000010 11110110000000000000000
32Single Precision Exercise
- What decimal value is represented by the following 32-bit floating point number?
1 10000010 11110110000000000000000
- Answer: -15.6875
33Step by Step Solution
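1 10000010 11110110000000000000000
To decimal form:
- Exponent: 10000010 = 130, and 130 - 127 = 3
- Mantissa with implied leading 1: 1.11110110000000000000000
- 1 + .5 + .25 + .125 + .0625 + 0 + .015625 + .0078125 = 1.9609375
- 1.9609375 x 2^3 = 15.6875
- Sign bit is 1 (negative): -15.6875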
34Step by Step Solution Alternative Method
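1 10000010 11110110000000000000000
To decimal form:
- Exponent: 130 - 127 = 3
- Mantissa with implied leading 1: 1.11110110000000000000000
- Shift point 3 places right: 1111.10110000000000000000
- 1111.1011 (binary) = 15.6875
- Sign bit is 1 (negative): -15.6875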
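Both methods can be written out as arithmetic. The sketch below handles normalized numbers only, and the function names are hypothetical.

```python
def decode_by_sum(bits):
    """Method 1: significand 1.MMM... times 2^exponent."""
    sign_bit, exponent_bits, mantissa_bits = bits.split()
    exponent = int(exponent_bits, 2) - 127
    significand = 1.0 + int(mantissa_bits, 2) / 2 ** 23
    value = significand * 2.0 ** exponent
    return -value if sign_bit == "1" else value

def decode_by_shift(bits):
    """Method 2: restore the leading 1, then shift the binary point."""
    sign_bit, exponent_bits, mantissa_bits = bits.split()
    exponent = int(exponent_bits, 2) - 127
    whole = int("1" + mantissa_bits, 2)      # 24-bit integer, point 23 places in
    value = whole / 2.0 ** (23 - exponent)
    return -value if sign_bit == "1" else value

word = "1 10000010 11110110000000000000000"
print(decode_by_sum(word), decode_by_shift(word))
```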
35IBM floating point formats
36Alpha floating point formats
37Exercise Floating Point Conversion
- Express 3.14 as a 32-bit floating point number
- (Note: only use 10 significant bits for the mantissa)
38Exercise Floating Point Conversion
- Express 3.14 as a 32-bit floating point number
- (Note: only use 10 significant bits for the mantissa)
- Answer: 0 10000000 10010001111000000000000
39Detail Solution 3.14 to IEEE single precision
- 3.14 to binary (approx): 11.001000111101
- Normalize and delete implied left-most 1: 1.1001000111101 x 2^1 gives mantissa 1001000111101
- Prove!
- Exponent = 127 + 1 (position point moved when normalized) = 128 = 10000000
- Value is positive: sign bit = 0
- Result: 0 10000000 10010001111010000000000
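The conversion can be reproduced by repeatedly doubling the fractional part. This sketch assumes a positive value of at least 1 and truncates (rather than rounds) after the requested number of fraction bits; both function names are made up here.

```python
from fractions import Fraction

def to_binary(value, fraction_bits):
    """Binary expansion of a decimal literal, truncated after fraction_bits."""
    whole, frac = divmod(Fraction(value), 1)
    bits = []
    for _ in range(fraction_bits):
        bit, frac = divmod(frac * 2, 1)      # doubling exposes the next bit
        bits.append(str(bit))
    return f"{whole:b}." + "".join(bits)

def single_precision_fields(value, fraction_bits):
    """Sign, excess-127 exponent and 23-bit mantissa for a value >= 1."""
    whole, frac = to_binary(value, fraction_bits).split(".")
    exponent = len(whole) - 1                # places the point moves left
    mantissa = (whole[1:] + frac).ljust(23, "0")[:23]   # drop the implied 1
    return f"0 {exponent + 127:08b} {mantissa}"

print(to_binary("3.14", 12))
print(single_precision_fields("3.14", 12))
```

With 12 fraction bits this reproduces the pattern worked out above. A hardware conversion keeps all 23 mantissa bits and rounds, so its low-order bits differ from this truncated result.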
40Packed Decimal Format
- Limited use, e.g. where precision is particularly important, as in accounting and business functions
- Similar to BCD, e.g. four-bit representation, as in BCD
- -> Stores two digits per byte
- Supported by business-oriented languages like COBOL
- Implemented in IBM System 370/390 and Compaq Alpha
41Packed Decimal Format
- Each decimal digit is stored in BCD
- Two digits in a byte
- The most significant digit stored first, in the high-order bits of the first byte
- Can store up to 31 digits in 16 bytes
- The sign is stored in the low-order bits of the last byte
- Binary 1100 represents +
- Binary 1101 represents -
- Binary 1111 represents unsigned number
- Decimal point not stored; must be maintained by application software
42Packed Decimal Format Example 1
Decimal value: 1 0 3 5 7, unsigned
Packed decimal: 0001 0000 | 0011 0101 | 0111 1111
                 Byte 1   |  Byte 2   |  Byte 3
43Packed Decimal Format Example 2
Decimal value: - 9 0 4 1 3
Packed decimal: 1001 0000 | 0100 0001 | 0011 1101
                 Byte 1   |  Byte 2   |  Byte 3
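Both packed decimal examples can be generated by placing one digit per 4-bit nibble and appending the sign nibble. The function below is a sketch with a hypothetical name; padding with a leading zero nibble when the count is odd is an assumption of this example.

```python
SIGN_NIBBLES = {"+": 0b1100, "-": 0b1101, None: 0b1111}   # None means unsigned

def pack_decimal(digits, sign=None):
    """Pack decimal digits two per byte, sign nibble last."""
    nibbles = [int(d) for d in digits]
    nibbles.append(SIGN_NIBBLES[sign])
    if len(nibbles) % 2:
        nibbles.insert(0, 0)       # pad so the nibbles fill whole bytes
    pairs = zip(nibbles[::2], nibbles[1::2])
    return bytes(high << 4 | low for high, low in pairs)

for packed in (pack_decimal("10357"), pack_decimal("90413", "-")):
    print(" ".join(f"{byte:08b}" for byte in packed))
```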
44Integer vs. Floating Point Programming
Considerations
- Integer advantages
- Easier for computer to perform
- Potential for higher precision
- Faster to execute
- Fewer storage locations to save time and space
- Most high-level languages provide 2 or more different integer word sizes/formats
- Short integer (16 bits)
- Long integer (64 bits)
45Integer vs. Floating Point Programming
Considerations
- Use real numbers if
- Variable or constant has fractional part
- Numbers take on very large or very small values outside integer range
- Program should use least precision sufficient for the task
- Higher precision formats require more storage
- Packed decimal attractive alternative for business applications
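A short illustration of these trade-offs in Python. Python's `decimal` module is used here only as a stand-in for a decimal format; it is not packed decimal itself, and the variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly
binary_sum_exact = (0.1 + 0.2 == 0.3)

# A decimal format keeps such amounts exact, as business code expects
decimal_sum_exact = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))

# A 64-bit float has a 53-bit significand, so large integers lose precision
float_keeps_big_int = (float(2 ** 53 + 1) == 2 ** 53 + 1)

print(binary_sum_exact, decimal_sum_exact, float_keeps_big_int)
```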
46Thank you!
Reading: Lecture slides and notes, Chapter 5