IEEE Floating Point

Slides: 60

Provided by: randa59

Learn more at: http://people.cs.uchicago.edu

Transcript and Presenter's Notes

Title: IEEE Floating Point

1
IEEE Floating Point

IEEE Standard 754
Established in 1985 as uniform standard for
floating point arithmetic
Before that, many idiosyncratic formats
Supported by all major CPUs
Driven by Numerical Concerns
Nice standards for rounding, overflow, underflow
Hard to make go fast
Numerical analysts predominated over hardware
types in defining standard

2
Fractional Binary Numbers
[Figure: bit weights 2^i, 2^(i-1), ..., 4, 2, 1 to the left of the binary point; 1/2, 1/4, 1/8, ..., 2^(-j) to the right]

Representation
Bits to right of binary point represent
fractional powers of 2
Represents rational number: the sum over k = -j ... i of b_k × 2^k, where b_k is the bit with weight 2^k

3
Frac. Binary Number Examples

Value      Representation
5 3/4      101.11_2
2 7/8      10.111_2
63/64      0.111111_2
Observations
Divide by 2 by shifting right
Multiply by 2 by shifting left
Numbers of form 0.111111..._2 are just below 1.0
1/2 + 1/4 + 1/8 + ... + 1/2^i + ... → 1.0
Use notation 1.0 - ε

4
Representable Numbers

Limitation
Can only exactly represent numbers of the form x/2^k
Other numbers have repeating bit representations
Value      Representation
1/3        0.0101010101[01]..._2
1/5        0.001100110011[0011]..._2
1/10       0.0001100110011[0011]..._2

5
Floating Point Representation

Numerical Form
(-1)^s × M × 2^E
Sign bit s determines whether number is negative or positive
Significand M normally a fractional value in range [1.0, 2.0)
Exponent E weights value by power of two
Encoding
MSB is sign bit
exp field encodes E
frac field encodes M

[Figure: field layout, most significant bit first: s | exp | frac]
6
Floating Point Precisions

Encoding
MSB is sign bit
exp field encodes E
frac field encodes M
Sizes
Single precision: 8 exp bits, 23 frac bits
32 bits total
Double precision: 11 exp bits, 52 frac bits
64 bits total
Extended precision: 15 exp bits, 63 frac bits
Only found in Intel-compatible machines
Stored in 80 bits
1 bit wasted

7
Normalized Numeric Values

Condition
exp ≠ 000...0 and exp ≠ 111...1
Exponent coded as biased value
E = Exp - Bias
Exp: unsigned value denoted by exp
Bias: Bias value
Single precision: 127 (Exp: 1...254, E: -126...127)
Double precision: 1023 (Exp: 1...2046, E: -1022...1023)
In general: Bias = 2^(e-1) - 1, where e is number of exponent bits
Significand coded with implied leading 1
M = 1.xxx...x_2
xxx...x: bits of frac
Minimum when 000...0 (M = 1.0)
Maximum when 111...1 (M = 2.0 - ε)
Get extra leading bit for free

8
Normalized Encoding Example

Value
float F = 15213.0;
15213_10 = 11101101101101_2 = 1.1101101101101_2 × 2^13
Significand
M = 1.1101101101101_2
frac = 11011011011010000000000_2
Exponent
E = 13
Bias = 127
Exp = 140 = 10001100_2

Floating Point Representation (Class 02)
Hex:     4    6    6    D    B    4    0    0
Binary:  0100 0110 0110 1101 1011 0100 0000 0000
140:      100 0110 0
15213:            1110 1101 1011 01
9
Denormalized Values

Condition
exp = 000...0
Value
Exponent value E = -Bias + 1
Significand value M = 0.xxx...x_2
xxx...x: bits of frac
Cases
exp = 000...0, frac = 000...0
Represents value 0
Note that have distinct values +0 and -0
exp = 000...0, frac ≠ 000...0
Numbers very close to 0.0
Lose precision as get smaller
Gradual underflow

10
Special Values

Condition
exp = 111...1
Cases
exp = 111...1, frac = 000...0
Represents value ∞ (infinity)
Operation that overflows
Both positive and negative
E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞
exp = 111...1, frac ≠ 000...0
Not-a-Number (NaN)
Represents case when no numeric value can be determined
E.g., sqrt(-1), ∞ - ∞

11
Tiny Floating Point Example

8-bit Floating Point Representation
the sign bit is in the most significant bit.
the next four bits are the exponent, with a bias
of 7.
the last three bits are the frac
Same General Form as IEEE Format
normalized, denormalized
representation of 0, NaN, infinity

[Figure: bit layout: s in bit 7, exp in bits 6-3, frac in bits 2-0]
12
Values Related to the Exponent
Exp   exp    E     2^E
0     0000   -6    1/64    (denorms)
1     0001   -6    1/64
2     0010   -5    1/32
3     0011   -4    1/16
4     0100   -3    1/8
5     0101   -2    1/4
6     0110   -1    1/2
7     0111    0    1
8     1000    1    2
9     1001    2    4
10    1010    3    8
11    1011    4    16
12    1100    5    32
13    1101    6    64
14    1110    7    128
15    1111   n/a   (inf, NaN)
13
Dynamic Range
s  exp   frac   E     Value
Denormalized numbers
0  0000  000    -6    0
0  0000  001    -6    1/8 × 1/64 = 1/512      closest to zero
0  0000  010    -6    2/8 × 1/64 = 2/512
...
0  0000  110    -6    6/8 × 1/64 = 6/512
0  0000  111    -6    7/8 × 1/64 = 7/512      largest denorm
Normalized numbers
0  0001  000    -6    8/8 × 1/64 = 8/512      smallest norm
0  0001  001    -6    9/8 × 1/64 = 9/512
...
0  0110  110    -1    14/8 × 1/2 = 14/16
0  0110  111    -1    15/8 × 1/2 = 15/16      closest to 1 below
0  0111  000     0    8/8 × 1 = 1
0  0111  001     0    9/8 × 1 = 9/8           closest to 1 above
0  0111  010     0    10/8 × 1 = 10/8
...
0  1110  110     7    14/8 × 128 = 224
0  1110  111     7    15/8 × 128 = 240        largest norm
0  1111  000    n/a   inf
14
Floating Point Operations

Conceptual View
First compute exact result
Make it fit into desired precision
Possibly overflow if exponent too large
Possibly round to fit into frac
Rounding Modes (illustrate with $ rounding)
                          $1.40   $1.60   $1.50   $2.50   -$1.50
Zero                      $1      $1      $1      $2      -$1
Round down (-∞)           $1      $1      $1      $2      -$2
Round up (+∞)             $2      $2      $2      $3      -$1
Nearest Even (default)    $1      $2      $2      $2      -$2

Note: 1. Round down: rounded result is close to but no greater than true result. 2. Round up: rounded result is close to but no less than true result.
15
Closer Look at Round-To-Even

Default Rounding Mode
Hard to get any other kind without dropping into
assembly
All others are statistically biased
Sum of set of positive numbers will consistently
be over- or under- estimated
Applying to Other Decimal Places / Bit Positions
When exactly halfway between two possible values
Round so that least significant digit is even
E.g., round to nearest hundredth
1.2349999  →  1.23  (Less than half way)
1.2350001  →  1.24  (Greater than half way)
1.2350000  →  1.24  (Half way: round up)
1.2450000  →  1.24  (Half way: round down)

16
Rounding Binary Numbers

Binary Fractional Numbers
"Even" when least significant bit is 0
Half way when bits to right of rounding position = 100..._2
Examples
Round to nearest 1/4 (2 bits right of binary point)
Value     Binary        Rounded    Action          Rounded Value
2 3/32    10.00011_2    10.00_2    (< 1/2: down)   2
2 3/16    10.00110_2    10.01_2    (> 1/2: up)     2 1/4
2 7/8     10.11100_2    11.00_2    (1/2: up)       3
2 5/8     10.10100_2    10.10_2    (1/2: down)     2 1/2

17
FP Multiplication

Operands
(-1)^s1 × M1 × 2^E1  *  (-1)^s2 × M2 × 2^E2
Exact Result
(-1)^s × M × 2^E
Sign s: s1 ^ s2
Significand M: M1 * M2
Exponent E: E1 + E2
Fixing
If M ≥ 2, shift M right, increment E
If E out of range, overflow
Round M to fit frac precision
Implementation
Biggest chore is multiplying significands

18
FP Addition

Operands
(-1)^s1 × M1 × 2^E1
(-1)^s2 × M2 × 2^E2
Assume E1 > E2
Exact Result
(-1)^s × M × 2^E
Sign s, significand M:
Result of signed align & add
Exponent E: E1
Fixing
If M ≥ 2, shift M right, increment E
If M < 1, shift M left k positions, decrement E by k
Overflow if E out of range
Round M to fit frac precision

19
Floating Point in C

C Guarantees Two Levels
float single precision
double double precision
Conversions
Casting between int, float, and double changes
numeric values
Double or float to int
Truncates fractional part
Like rounding toward zero
Not defined when out of range
Generally saturates to TMin or TMax
int to double
Exact conversion, as long as int has ≤ 53 bit word size
int to float
Will round according to rounding mode

20
Assembly Programmers View
[Figure: CPU (registers, EIP, condition codes) connected to memory (object code, program data, OS data, stack) by address, data, and instruction paths]

Programmer-Visible State
EIP Program Counter
Address of next instruction
Register File
Heavily used program data
Condition Codes
Store status information about most recent
arithmetic operation
Used for conditional branching

Memory
Byte addressable array
Code, user data, (some) OS data
Includes stack used to support procedures

21
Turning C into Object Code

Code in files p1.c p2.c
Compile with command gcc -O p1.c p2.c -o p
Use optimizations (-O)
Put resulting binary in file p

C program (p1.c p2.c)            text
    Compiler (gcc -S)
Asm program (p1.s p2.s)          text
    Assembler (gcc or as)
Object program (p1.o p2.o)       binary      Static libraries (.a)
    Linker (gcc or ld)
Executable program (p)           binary
22
Compiling Into Assembly

C Code

    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }

Obtain with command: gcc -O -S code.c
Produces file code.s
Generated Assembly

    pushl %ebp
    movl %esp,%ebp
    movl 12(%ebp),%eax
    addl 8(%ebp),%eax
    movl %eax, -4(%ebp)
    movl -4(%ebp),%eax
    leave
    ret
23
Assembly Characteristics

Minimal Data Types
Integer data of 1, 2, or 4 bytes
Data values
Addresses (untyped pointers)
Floating point data of 4, 8, or 10 bytes
No aggregate types such as arrays or structures
Just contiguously allocated bytes in memory
Primitive Operations
Perform arithmetic function on register or memory
data
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Transfer control
Unconditional jumps to/from procedures
Conditional branches

24
Machine Instruction Example

C Code
Add two signed integers
Assembly
Add 2 4-byte integers
Same instruction whether signed or unsigned
Operands
x: Register %eax
y: Memory M[%ebp+8]
t: Register %eax
Return function value in %eax
Object Code
3-byte instruction
Stored at address 0x401046

    int t = x+y;
    addl 8(%ebp),%eax       Similar to expression x += y
    0x401046: 03 45 08
25
Moving Data

Moving Data
movl Source,Dest
Move 4-byte (long) word
Lots of these in typical code
Operand Types
Immediate: Constant integer data
Like C constant, but prefixed with $
E.g., $0x400, $-533
Encoded with 1, 2, or 4 bytes
Register: One of 8 integer registers
But %esp and %ebp reserved for special use
Others have special uses for particular
instructions
Memory 4 consecutive bytes of memory
Various address modes

26
movl Operand Combinations
Source   Destination   Example                C Analog
Imm      Reg           movl $0x4,%eax         temp = 0x4;
Imm      Mem           movl $-147,(%eax)      *p = -147;
Reg      Reg           movl %eax,%edx         temp2 = temp1;
Reg      Mem           movl %eax,(%edx)       *p = temp;
Mem      Reg           movl (%eax),%edx       temp = *p;

Cannot do memory-memory transfers with single
instruction

27
Simple Addressing Modes

Normal          (R)     Mem[Reg[R]]
Register R specifies memory address
    movl (%ecx),%eax
Displacement    D(R)    Mem[Reg[R]+D]
Register R specifies start of memory region
Constant displacement D specifies offset
    movl 8(%ebp),%edx

28
Look at sum again
    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }

    pushl %ebp
    movl %esp,%ebp          // handle stack
    movl 12(%ebp),%eax      // eax = y
    addl 8(%ebp),%eax       // eax = y+x
    movl %eax, -4(%ebp)     // t = eax
    movl -4(%ebp),%eax      // return t
    leave
    ret
29
Using Simple Addressing Modes
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

swap:
Set Up
    pushl %ebp
    movl %esp,%ebp
    pushl %ebx
Body
    movl 12(%ebp),%ecx
    movl 8(%ebp),%edx
    movl (%ecx),%eax
    movl (%edx),%ebx
    movl %eax,(%edx)
    movl %ebx,(%ecx)
Finish
    movl -4(%ebp),%ebx      // return
    ret
30
Understanding Swap
    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

Register   Variable
%ecx       yp
%edx       xp
%eax       t1
%ebx       t0

    movl 12(%ebp),%ecx      // ecx = yp
    movl 8(%ebp),%edx       // edx = xp
    movl (%ecx),%eax        // eax = *yp (t1)
    movl (%edx),%ebx        // ebx = *xp (t0)
    movl %eax,(%edx)        // *xp = eax
    movl %ebx,(%ecx)        // *yp = ebx
31
Understanding Swap
Address   Value      Offset from %ebp
0x124     123
0x120     456
0x11c
0x118
0x114
0x110     0x120      12   (yp)
0x10c     0x124       8   (xp)
0x108     Rtn adr     4
0x104                 0   <- %ebp
0x100                -4

    movl 12(%ebp),%ecx      // ecx = yp
    movl 8(%ebp),%edx       // edx = xp
    movl (%ecx),%eax        // eax = *yp (t1)
    movl (%edx),%ebx        // ebx = *xp (t0)
    movl %eax,(%edx)        // *xp = eax
    movl %ebx,(%ecx)        // *yp = ebx
36
Understanding Swap
Address   Value      Offset from %ebp
0x124     456
0x120     456
0x11c
0x118
0x114
0x110     0x120      12   (yp)
0x10c     0x124       8   (xp)
0x108     Rtn adr     4
0x104                 0   <- %ebp
0x100                -4

    movl 12(%ebp),%ecx      // ecx = yp
    movl 8(%ebp),%edx       // edx = xp
    movl (%ecx),%eax        // eax = *yp (t1)
    movl (%edx),%ebx        // ebx = *xp (t0)
    movl %eax,(%edx)        // *xp = eax
    movl %ebx,(%ecx)        // *yp = ebx
37
Understanding Swap
Address   Value      Offset from %ebp
0x124     456
0x120     123
0x11c
0x118
0x114
0x110     0x120      12   (yp)
0x10c     0x124       8   (xp)
0x108     Rtn adr     4
0x104                 0   <- %ebp
0x100                -4

    movl 12(%ebp),%ecx      // ecx = yp
    movl 8(%ebp),%edx       // edx = xp
    movl (%ecx),%eax        // eax = *yp (t1)
    movl (%edx),%ebx        // ebx = *xp (t0)
    movl %eax,(%edx)        // *xp = eax
    movl %ebx,(%ecx)        // *yp = ebx
38
Indexed Addressing Modes

Most General Form
D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]
D: Constant displacement 1, 2, or 4 bytes
Rb: Base register: Any of 8 integer registers
Ri: Index register: Any, except for %esp
Unlikely you'd use %ebp, either
S: Scale: 1, 2, 4, or 8
Special Cases
(Rb,Ri)       Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S)     Mem[Reg[Rb]+S*Reg[Ri]]

39
Address Computation Examples
%edx = 0xf000
%ecx = 0x100
Expression        Computation          Address
0x8(%edx)         0xf000 + 0x8         0xf008
(%edx,%ecx)       0xf000 + 0x100       0xf100
(%edx,%ecx,4)     0xf000 + 4*0x100     0xf400
0x80(,%edx,2)     2*0xf000 + 0x80      0x1e080
40
Address Computation Instruction

leal Src,Dest
Src is address mode expression
Set Dest to address denoted by expression
Uses
Computing address without doing memory reference
E.g., translation of p = &x[i];
Computing arithmetic expressions of the form x + k*y
k = 1, 2, 4, or 8.

41
Some Arithmetic Operations

Format             Computation
Two Operand Instructions
addl Src,Dest      Dest = Dest + Src
subl Src,Dest      Dest = Dest - Src
imull Src,Dest     Dest = Dest * Src
sall Src,Dest      Dest = Dest << Src     Also called shll
sarl Src,Dest      Dest = Dest >> Src     Arithmetic
shrl Src,Dest      Dest = Dest >> Src     Logical
xorl Src,Dest      Dest = Dest ^ Src
andl Src,Dest      Dest = Dest & Src
orl Src,Dest       Dest = Dest | Src

42
Some Arithmetic Operations

Format             Computation
One Operand Instructions
incl Dest          Dest = Dest + 1
decl Dest          Dest = Dest - 1
negl Dest          Dest = -Dest
notl Dest          Dest = ~Dest

43
Using leal for Arithmetic Expressions
    int arith(int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

arith:
Set Up
    pushl %ebp
    movl %esp,%ebp
Body
    movl 8(%ebp),%eax
    movl 12(%ebp),%edx
    leal (%edx,%eax),%ecx
    leal (%edx,%edx,2),%edx
    sall $4,%edx
    addl 16(%ebp),%ecx
    leal 4(%edx,%eax),%eax
    imull %ecx,%eax
Finish
    leave
    ret
44
Understanding arith
    int arith(int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    movl 8(%ebp),%eax           // eax = x
    movl 12(%ebp),%edx          // edx = y
    leal (%edx,%eax),%ecx       // ecx = x+y (t1)
    leal (%edx,%edx,2),%edx     // edx = 3*y
    sall $4,%edx                // edx = 48*y (t4)
    addl 16(%ebp),%ecx          // ecx = z+t1 (t2)
    leal 4(%edx,%eax),%eax      // eax = 4+t4+x (t5)
    imull %ecx,%eax             // eax = t5*t2 (rval)
45
Understanding arith
    // eax = x
    movl 8(%ebp),%eax
    // edx = y
    movl 12(%ebp),%edx
    // ecx = x+y (t1)
    leal (%edx,%eax),%ecx
    // edx = 3*y
    leal (%edx,%edx,2),%edx
    // edx = 48*y (t4)
    sall $4,%edx
    // ecx = z+t1 (t2)
    addl 16(%ebp),%ecx
    // eax = 4+t4+x (t5)
    leal 4(%edx,%eax),%eax
    // eax = t5*t2 (rval)
    imull %ecx,%eax

    int arith(int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }
And now some live action!
46
Condition Codes

Single Bit Registers
CF  Carry Flag       SF  Sign Flag
ZF  Zero Flag        OF  Overflow Flag
Implicitly Set By Arithmetic Operations
addl Src,Dest
C analog: t = a + b
CF set if carry out from most significant bit
Used to detect unsigned overflow
ZF set if t == 0
SF set if t < 0
OF set if two's complement overflow
(a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
Not Set by leal instruction

47
Setting Condition Codes (cont.)

Explicit Setting by Compare Instruction
cmpl Src2,Src1
cmpl b,a like computing a-b without setting
destination
CF set if carry out from most significant bit
Used for unsigned comparisons
ZF set if a == b
SF set if (a-b) < 0
OF set if two's complement overflow
(a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)

48
Setting Condition Codes (cont.)

Explicit Setting by Test instruction
testl Src2,Src1
Sets condition codes based on value of Src1 & Src2
Useful to have one of the operands be a mask
testl b,a like computing a&b without setting destination
ZF set when a&b == 0
SF set when a&b < 0

49
Reading Condition Codes

SetX Instructions
Set single byte based on combinations of
condition codes

50
Reading Condition Codes (Cont.)

SetX Instructions
Set single byte based on combinations of
condition codes
One of 8 addressable byte registers
Embedded within first 4 integer registers
Does not alter remaining 3 bytes
Typically use movzbl to finish job

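[Figure: byte registers embedded in the first four integer registers]
%eax: %ah, %al
%edx: %dh, %dl
%ecx: %ch, %cl
%ebx: %bh, %bl
%esi, %edi, %esp, %ebp: no byte registers

    int gt(int x, int y)
    {
        return x > y;
    }

Body
    movl 12(%ebp),%eax      // eax = y
    cmpl %eax,8(%ebp)       // Compare x : y
    setg %al                // al = x > y
    movzbl %al,%eax         // Zero rest of %eax
Note inverted ordering!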
51
Jumping

jX Instructions
Jump to different part of code depending on
condition codes

52
Conditional Branch Example
    int max(int x, int y)
    {
        if (x > y)
            return x;
        else
            return y;
    }

_max:
Set Up
    pushl %ebp
    movl %esp,%ebp
Body
    movl 8(%ebp),%edx
    movl 12(%ebp),%eax
    cmpl %eax,%edx
    jle L9
    movl %edx,%eax
L9:
Finish
    leave
    ret
53
Conditional Branch Example (Cont.)
    int goto_max(int x, int y)
    {
        int rval = y;
        int ok = (x <= y);
        if (ok)
            goto done;
        rval = x;
    done:
        return rval;
    }

C allows goto as means of transferring control
Closer to machine-level programming style
Generally considered bad coding style

    movl 8(%ebp),%edx       // edx = x
    movl 12(%ebp),%eax      // eax = y
    cmpl %eax,%edx          // x : y
    jle L9                  // if <= goto L9
    movl %edx,%eax          // eax = x (Skipped when x ≤ y)
L9:                         // Done:
54
Do-While Loop Example
C Code
    int fact_do(int x)
    {
        int result = 1;
        do {
            result *= x;
            x = x-1;
        } while (x > 1);
        return result;
    }

Goto Version
    int fact_goto(int x)
    {
        int result = 1;
    loop:
        result *= x;
        x = x-1;
        if (x > 1)
            goto loop;
        return result;
    }

Use backward branch to continue looping
Only take branch when while condition holds

55
Do-While Loop Compilation
Goto Version
    int fact_goto(int x)
    {
        int result = 1;
    loop:
        result *= x;
        x = x-1;
        if (x > 1)
            goto loop;
        return result;
    }

Assembly
_fact_goto:
    pushl %ebp              // Setup
    movl %esp,%ebp          // Setup
    movl $1,%eax            // eax = 1
    movl 8(%ebp),%edx       // edx = x
L11:
    imull %edx,%eax         // result *= x
    decl %edx               // x--
    cmpl $1,%edx            // Compare x : 1
    jg L11                  // if > goto loop
    movl %ebp,%esp          // Finish
    popl %ebp               // Finish
    ret                     // Finish

Registers
%edx    x
%eax    result

56
While Loop Example 1
C Code
    int fact_while(int x)
    {
        int result = 1;
        while (x > 1) {
            result *= x;
            x = x-1;
        }
        return result;
    }

First Goto Version
    int fact_while_goto(int x)
    {
        int result = 1;
    loop:
        if (!(x > 1))
            goto done;
        result *= x;
        x = x-1;
        goto loop;
    done:
        return result;
    }

Is this code equivalent to the do-while version?
Must jump out of loop if test fails

57
Actual While Loop Translation
C Code
    int fact_while(int x)
    {
        int result = 1;
        while (x > 1) {
            result *= x;
            x = x-1;
        }
        return result;
    }

Second Goto Version
    int fact_while_goto2(int x)
    {
        int result = 1;
        if (!(x > 1))
            goto done;
    loop:
        result *= x;
        x = x-1;
        if (x > 1)
            goto loop;
    done:
        return result;
    }

Uses same inner loop as do-while version
Guards loop entry with extra test

58
General While Translation
C Code
    while (Test)
        Body

Do-While Version
    if (!Test)
        goto done;
    do
        Body
    while (Test);
done:

Goto Version
    if (!Test)
        goto done;
loop:
    Body
    if (Test)
        goto loop;
done:
59
Switch Statement Example
    movl 8(%ebp),%eax       // eax = op
    movl %eax, -4(%ebp)     //
    movl 8(%ebp),%eax       //
    movl %eax, -8(%ebp)     //
    cmpl $2,-8(%ebp)        // Compare op
    je .L4
    cmpl $2, -8(%ebp)
    jg .L7
    cmpl $1, -8(%ebp)
    je .L3
    jmp .L6
.L7:
    cmpl $3, -8(%ebp)
    je .L5
    jmp .L6
.L3:
    movl $3, -4(%ebp)
    jmp .L2
.L4:
    movl $5, -4(%ebp)
    jmp .L2
.L5:
    movl $9, -4(%ebp)
.L6:
    movl $7, -4(%ebp)
.L2:
    leave

C code
    int swich_ex(int a)
    {
        int b = a;
        switch (a) {
        case 1:
            b = 3;
            break;
        case 2:
            b = 5;
            break;
        case 3:
            b = 9;
        default:
            b = 7;
        }
    }
Setup
