Title: IEEE Floating Point
1IEEE Floating Point
- IEEE Standard 754
- Established in 1985 as uniform standard for floating point arithmetic
- Before that, many idiosyncratic formats
- Supported by all major CPUs
- Driven by Numerical Concerns
- Nice standards for rounding, overflow, underflow
- Hard to make go fast
- Numerical analysts predominated over hardware types in defining standard
2Fractional Binary Numbers
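[Figure: bit positions around the binary point, with weights 2^i, 2^(i-1), ..., 4, 2, 1 to the left and 1/2, 1/4, 1/8, ..., 2^(-j) to the right]
- Representation
- Bits to right of "binary point" represent fractional powers of 2
- Represents rational number: $\sum_{k=-j}^{i} b_k \times 2^k$, where b_k is the bit with weight 2^k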
3Frac. Binary Number Examples
- Value      Representation
- 5 3/4      101.11_2
- 2 7/8      10.111_2
- 63/64      0.111111_2
- Observations
- Divide by 2 by shifting right
- Multiply by 2 by shifting left
- Numbers of form 0.111111..._2 just below 1.0
- 1/2 + 1/4 + 1/8 + ... + 1/2^i + ... → 1.0
- Use notation 1.0 - ε
4Representable Numbers
- Limitation
- Can only exactly represent numbers of the form x/2^k
- Other numbers have repeating bit representations
- Value      Representation
- 1/3        0.010101010101..._2
- 1/5        0.0011001100110011..._2
- 1/10       0.00011001100110011..._2
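The limitation above can be observed directly in C. The following is a minimal sketch, assuming `double` is IEEE double precision; the function names are illustrative and do not come from the slides. It compares a sum whose operands all have the form x/2^k against a sum built from 1/10.

```c
#include <stdbool.h>

/* 1/2 + 1/4 == 3/4: every operand has the form x/2^k, so nothing is rounded. */
bool dyadic_sum_is_exact(void)
{
    volatile double a = 0.5, b = 0.25;   /* volatile: force run-time arithmetic */
    return a + b == 0.75;
}

/* 1/10 + 2/10 == 3/10 fails: 1/10 has a repeating binary expansion, so each
   constant is stored as a nearby approximation and the rounded sum differs
   from the stored approximation of 0.3. */
bool decimal_sum_is_exact(void)
{
    volatile double a = 0.1, b = 0.2;
    return a + b == 0.3;
}
```

On a machine matching the assumption, the first function returns true and the second returns false, which is why equality tests on decimal fractions are usually avoided.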
5Floating Point Representation
- Numerical Form
- $(-1)^s \, M \, 2^E$
- Sign bit s determines whether number is negative or positive
- Significand M normally a fractional value in range [1.0, 2.0)
- Exponent E weights value by power of two
- Encoding
- MSB is sign bit
- exp field encodes E
- frac field encodes M
[Figure: bit layout, most significant to least significant: s | exp | frac]
6Floating Point Precisions
- Encoding
- MSB is sign bit
- exp field encodes E
- frac field encodes M
- Sizes
- Single precision: 8 exp bits, 23 frac bits
- 32 bits total
- Double precision: 11 exp bits, 52 frac bits
- 64 bits total
- Extended precision: 15 exp bits, 63 frac bits
- Only found in Intel-compatible machines
- Stored in 80 bits
- 1 bit wasted
7Normalized Numeric Values
- Condition
- exp ≠ 000...0 and exp ≠ 111...1
- Exponent coded as biased value
- E = Exp - Bias
- Exp: unsigned value denoted by exp
- Bias: Bias value
- Single precision: 127 (Exp: 1...254, E: -126...127)
- Double precision: 1023 (Exp: 1...2046, E: -1022...1023)
- In general: Bias = $2^{e-1} - 1$, where e is number of exponent bits
- Significand coded with implied leading 1
- M = 1.xxx...x_2
- xxx...x: bits of frac
- Minimum when 000...0 (M = 1.0)
- Maximum when 111...1 (M = 2.0 - ε)
- Get extra leading bit for "free"
8Normalized Encoding Example
- Value
- float F = 15213.0;
- 15213_10 = 11101101101101_2 = 1.1101101101101_2 × 2^13
- Significand
- M = 1.1101101101101_2
- frac = 11011011011010000000000_2
- Exponent
- E = 13
- Bias = 127
- Exp = 140 = 10001100_2
Floating Point Representation (Class 02):
Hex:       4    6    6    D    B    4    0    0
Binary:    0100 0110 0110 1101 1011 0100 0000 0000
140:        100 0110 0
15213:               1110 1101 1011 01
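The encoding worked out above can be checked by reading back the bits of a `float`. This is a small sketch, assuming `float` is a 32-bit IEEE single; the helper names `float_bits`, `sign_field`, `exp_field`, and `frac_field` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (assumes 32-bit IEEE single). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}

/* Split a 32-bit pattern into the three fields: 1 + 8 + 23 bits. */
unsigned sign_field(uint32_t u) { return u >> 31; }
unsigned exp_field(uint32_t u)  { return (u >> 23) & 0xFF; }
uint32_t frac_field(uint32_t u) { return u & 0x7FFFFF; }
```

For 15213.0f, `float_bits` should give 0x466DB400, with an exp field of 140 and a frac field of 0x6DB400, matching the hex and binary rows on the slide.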
9Denormalized Values
- Condition
- exp = 000...0
- Value
- Exponent value E = -Bias + 1
- Significand value M = 0.xxx...x_2
- xxx...x: bits of frac
- Cases
- exp = 000...0, frac = 000...0
- Represents value 0
- Note that have distinct values +0 and -0
- exp = 000...0, frac ≠ 000...0
- Numbers very close to 0.0
- Lose precision as get smaller
- Gradual underflow
10Special Values
- Condition
- exp = 111...1
- Cases
- exp = 111...1, frac = 000...0
- Represents value ∞ (infinity)
- Operation that overflows
- Both positive and negative
- E.g., 1.0/0.0 = -1.0/-0.0 = +∞, 1.0/-0.0 = -∞
- exp = 111...1, frac ≠ 000...0
- Not-a-Number (NaN)
- Represents case when no numeric value can be determined
- E.g., sqrt(-1), ∞ - ∞
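The special values can be produced and recognized from C. The sketch below assumes an IEEE-conforming implementation where floating point division by zero does not trap; `divide`, `is_pos_inf`, and `is_neg_inf` are illustrative helper names. The NaN case uses ∞ - ∞ and 0.0/0.0 rather than sqrt(-1) only to avoid needing the math library at link time.

```c
#include <math.h>      /* isinf, isnan */
#include <stdbool.h>

/* Divide at run time; volatile keeps the compiler from folding the division. */
double divide(double num, double den)
{
    volatile double n = num, d = den;
    return n / d;
}

/* Infinity carries a sign, so the two cases are tested separately. */
bool is_pos_inf(double x) { return isinf(x) && x > 0; }
bool is_neg_inf(double x) { return isinf(x) && x < 0; }
```

Under those assumptions, `divide(1.0, 0.0)` and `divide(-1.0, -0.0)` are +∞, `divide(1.0, -0.0)` is -∞, and both `divide(0.0, 0.0)` and ∞ - ∞ satisfy `isnan`.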
11Tiny Floating Point Example
- 8-bit Floating Point Representation
- the sign bit is in the most significant bit.
- the next four bits are the exponent, with a bias of 7.
- the last three bits are the frac
- Same General Form as IEEE Format
- normalized, denormalized
- representation of 0, NaN, infinity
[Figure: bit layout: bit 7 = s, bits 6-3 = exp, bits 2-0 = frac]
12Values Related to the Exponent
Exp   exp    E     2^E
0     0000   -6    1/64    (denorms)
1     0001   -6    1/64
2     0010   -5    1/32
3     0011   -4    1/16
4     0100   -3    1/8
5     0101   -2    1/4
6     0110   -1    1/2
7     0111    0    1
8     1000    1    2
9     1001    2    4
10    1010    3    8
11    1011    4    16
12    1100    5    32
13    1101    6    64
14    1110    7    128
15    1111   n/a           (inf, NaN)
13Dynamic Range
s exp  frac   E    Value
Denormalized numbers
0 0000 000   -6    0
0 0000 001   -6    1/8 * 1/64 = 1/512      closest to zero
0 0000 010   -6    2/8 * 1/64 = 2/512
...
0 0000 110   -6    6/8 * 1/64 = 6/512
0 0000 111   -6    7/8 * 1/64 = 7/512      largest denorm
Normalized numbers
0 0001 000   -6    8/8 * 1/64 = 8/512      smallest norm
0 0001 001   -6    9/8 * 1/64 = 9/512
...
0 0110 110   -1    14/8 * 1/2 = 14/16
0 0110 111   -1    15/8 * 1/2 = 15/16      closest to 1 below
0 0111 000    0    8/8 * 1 = 1
0 0111 001    0    9/8 * 1 = 9/8           closest to 1 above
0 0111 010    0    10/8 * 1 = 10/8
...
0 1110 110    7    14/8 * 128 = 224
0 1110 111    7    15/8 * 128 = 240        largest norm
0 1111 000   n/a   inf
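The rows of the table can be reproduced with a small decoder for the 8-bit format. This is a sketch only; the function name `tiny_decode` is hypothetical, and the power of two is built by repeated doubling or halving so the example does not depend on the math library.

```c
#include <math.h>     /* INFINITY and NAN macros */
#include <stdint.h>

/* Decode the 8-bit format: 1 sign bit, 4 exp bits (bias 7), 3 frac bits. */
double tiny_decode(uint8_t bits)
{
    int s    = (bits >> 7) & 0x1;
    int exp  = (bits >> 3) & 0xF;
    int frac = bits & 0x7;
    double sign = s ? -1.0 : 1.0;

    if (exp == 0xF)                        /* special values: infinity or NaN */
        return frac == 0 ? sign * INFINITY : NAN;

    int    E;
    double M;
    if (exp == 0) {                        /* denormalized: E = 1 - Bias, M = 0.xxx */
        E = 1 - 7;
        M = frac / 8.0;
    } else {                               /* normalized: E = exp - Bias, M = 1.xxx */
        E = exp - 7;
        M = 1.0 + frac / 8.0;
    }

    double scale = 1.0;                    /* scale = 2^E */
    for (int i = 0; i < E; i++) scale *= 2.0;
    for (int i = 0; i > E; i--) scale /= 2.0;
    return sign * M * scale;
}
```

For example, the pattern 0 0000 111 (0x07) decodes to 7/512, 0 0111 000 (0x38) to 1, and 0 1110 111 (0x77) to 240, in agreement with the table.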
14Floating Point Operations
- Conceptual View
- First compute exact result
- Make it fit into desired precision
- Possibly overflow if exponent too large
- Possibly round to fit into frac
- Rounding Modes (illustrate with $ rounding)
                            $1.40   $1.60   $1.50   $2.50   -$1.50
- Zero                      $1      $1      $1      $2      -$1
- Round down (-∞)           $1      $1      $1      $2      -$2
- Round up (+∞)             $2      $2      $2      $3      -$1
- Nearest Even (default)    $1      $2      $2      $2      -$2
Note:
1. Round down: rounded result is close to but no greater than true result.
2. Round up: rounded result is close to but no less than true result.
15Closer Look at Round-To-Even
- Default Rounding Mode
- Hard to get any other kind without dropping into assembly
- All others are statistically biased
- Sum of set of positive numbers will consistently be over- or under-estimated
- Applying to Other Decimal Places / Bit Positions
- When exactly halfway between two possible values
- Round so that least significant digit is even
- E.g., round to nearest hundredth
- 1.2349999 → 1.23 (Less than half way)
- 1.2350001 → 1.24 (Greater than half way)
- 1.2350000 → 1.24 (Half way: round up)
- 1.2450000 → 1.24 (Half way: round down)
16Rounding Binary Numbers
- Binary Fractional Numbers
- "Even" when least significant bit is 0
- Half way when bits to right of rounding position = 100..._2
- Examples
- Round to nearest 1/4 (2 bits right of binary point)
  Value     Binary        Rounded     Action          Rounded Value
  2 3/32    10.00011_2    10.00_2     (< 1/2: down)   2
  2 3/16    10.00110_2    10.01_2     (> 1/2: up)     2 1/4
  2 7/8     10.11100_2    11.00_2     (= 1/2: up)     3
  2 5/8     10.10100_2    10.10_2     (= 1/2: down)   2 1/2
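The binary rounding rule can be sketched on fixed-point integers. In the example below, a value is held as an integer count of 1/32 (five bits right of the binary point) and rounded to a count of 1/4 by discarding k = 3 bits; the function name `round_to_even` and this fixed-point framing are assumptions made for illustration, not how FP hardware is organized.

```c
#include <stdint.h>

/* Round x / 2^k to the nearest integer, ties to even.
   The low k bits of x lie to the right of the rounding position.
   Assumes 0 < k < 32. */
uint32_t round_to_even(uint32_t x, unsigned k)
{
    uint32_t half      = 1u << (k - 1);         /* bit pattern 100...0 */
    uint32_t remainder = x & ((1u << k) - 1);   /* bits being discarded */
    uint32_t q         = x >> k;                /* truncated result */

    if (remainder > half)                       /* more than half way: up */
        q += 1;
    else if (remainder == half && (q & 1))      /* half way: make LSB even */
        q += 1;
    return q;                                   /* less than half way: down */
}
```

With this encoding, 2 7/8 is 92/32 and rounds to 12/4 = 3, while 2 5/8 is 84/32 and rounds to 10/4 = 2 1/2, as in the last two rows of the table.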
17FP Multiplication
- Operands
- $(-1)^{s_1} M_1 \, 2^{E_1} \times (-1)^{s_2} M_2 \, 2^{E_2}$
- Exact Result
- $(-1)^s \, M \, 2^E$
- Sign s: s1 ^ s2
- Significand M: M1 * M2
- Exponent E: E1 + E2
- Fixing
- If M ≥ 2, shift M right, increment E
- If E out of range, overflow
- Round M to fit frac precision
- Implementation
- Biggest chore is multiplying significands
18FP Addition
- Operands
- $(-1)^{s_1} M_1 \, 2^{E_1}$
- $(-1)^{s_2} M_2 \, 2^{E_2}$
- Assume E1 > E2
- Exact Result
- $(-1)^s \, M \, 2^E$
- Sign s, significand M:
- Result of signed align & add
- Exponent E: E1
- Fixing
- If M ≥ 2, shift M right, increment E
- If M < 1, shift M left k positions, decrement E by k
- Overflow if E out of range
- Round M to fit frac precision
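One visible consequence of the align-then-round steps is that a small operand can vanish entirely. The sketch below assumes `float` is IEEE single precision with the default round-to-even mode; `is_absorbed` is an illustrative name.

```c
#include <stdbool.h>

/* True if adding `small` to `big` leaves `big` unchanged after rounding. */
bool is_absorbed(float big, float small)
{
    volatile float sum = big + small;   /* volatile: store as a true float */
    return sum == big;
}
```

With a 24-bit significand, 2^24 + 1 lies exactly half way between two representable values and rounds back to 2^24, so `is_absorbed(16777216.0f, 1.0f)` is expected to be true, while adding 2.0f is exact and is not absorbed.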
19Floating Point in C
- C Guarantees Two Levels
- float: single precision
- double: double precision
- Conversions
- Casting between int, float, and double changes numeric values
- Double or float to int
- Truncates fractional part
- Like rounding toward zero
- Not defined when out of range
- Generally saturates to TMin or TMax
- int to double
- Exact conversion, as long as int has ≤ 53 bit word size
- int to float
- Will round according to rounding mode
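The conversion rules can be exercised with a few helper functions. This is a sketch assuming a 32-bit `int`, IEEE `float` and `double`, and the default rounding mode; the function names are invented for the example.

```c
#include <stdbool.h>

/* double -> int discards the fractional part (rounds toward zero).
   The caller must keep d within the range of int. */
int truncate_to_int(double d)
{
    return (int)d;
}

/* True if x is unchanged by int -> float. A float has only 24 significand
   bits, so large ints may be rounded. The comparison is done in double,
   which holds both values exactly. */
bool survives_float(int x)
{
    volatile float f = (float)x;   /* volatile: force a real single-precision value */
    return (double)f == (double)x;
}

/* True if x is unchanged by int -> double (always, for a 32-bit int). */
bool survives_double(int x)
{
    volatile double d = (double)x;
    return (long long)d == x;
}
```

Under these assumptions, 2.9 and -2.9 truncate to 2 and -2, 16777217 (2^24 + 1) does not survive the trip through float, and every int survives the trip through double.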
20Assembly Programmers View
[Figure: CPU (Registers, EIP, Condition Codes) connected to Memory (Object Code, Program Data, OS Data, Stack); Addresses flow from CPU to Memory, Data flows both ways, Instructions flow from Memory to CPU]
- Programmer-Visible State
- EIP Program Counter
- Address of next instruction
- Register File
- Heavily used program data
- Condition Codes
- Store status information about most recent arithmetic operation
- Used for conditional branching
- Memory
- Byte addressable array
- Code, user data, (some) OS data
- Includes stack used to support procedures
21Turning C into Object Code
- Code in files p1.c p2.c
- Compile with command gcc -O p1.c p2.c -o p
- Use optimizations (-O)
- Put resulting binary in file p
[Figure: compilation pipeline]
C program (p1.c p2.c)           text
  -> Compiler (gcc -S)
Asm program (p1.s p2.s)         text
  -> Assembler (gcc or as)
Object program (p1.o p2.o)      binary   (plus Static libraries (.a))
  -> Linker (gcc or ld)
Executable program (p)          binary
22Compiling Into Assembly
int sum(int x, int y)
{
  int t = x+y;
  return t;
}
Obtain with command: gcc -O -S code.c
Produces file code.s
Generated Assembly
  pushl %ebp
  movl %esp,%ebp
  movl 12(%ebp),%eax
  addl 8(%ebp),%eax
  movl %eax,-4(%ebp)
  movl -4(%ebp),%eax
  leave
  ret
23Assembly Characteristics
- Minimal Data Types
- Integer data of 1, 2, or 4 bytes
- Data values
- Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- No aggregate types such as arrays or structures
- Just contiguously allocated bytes in memory
- Primitive Operations
- Perform arithmetic function on register or memory data
- Transfer data between memory and register
- Load data from memory into register
- Store register data into memory
- Transfer control
- Unconditional jumps to/from procedures
- Conditional branches
24Machine Instruction Example
- C Code
- Add two signed integers
- Assembly
- Add 2 4-byte integers
- Same instruction whether signed or unsigned
- Operands
- x: Register %eax
- y: Memory M[%ebp+8]
- t: Register %eax
- Return function value in %eax
- Object Code
- 3-byte instruction
- Stored at address 0x401046
int t = x+y;
addl 8(%ebp),%eax
Similar to expression x += y
0x401046: 03 45 08
25Moving Data
- Moving Data
- movl Source,Dest
- Move 4-byte (long) word
- Lots of these in typical code
- Operand Types
- Immediate: Constant integer data
- Like C constant, but prefixed with '$'
- E.g., $0x400, $-533
- Encoded with 1, 2, or 4 bytes
- Register: One of 8 integer registers
- But %esp and %ebp reserved for special use
- Others have special uses for particular instructions
- Memory: 4 consecutive bytes of memory
- Various address modes
26movl Operand Combinations
Source   Destination   Example               C Analog
Imm      Reg           movl $0x4,%eax        temp = 0x4;
Imm      Mem           movl $-147,(%eax)     *p = -147;
Reg      Reg           movl %eax,%edx        temp2 = temp1;
Reg      Mem           movl %eax,(%edx)      *p = temp;
Mem      Reg           movl (%eax),%edx      temp = *p;
- Cannot do memory-memory transfers with single instruction
27Simple Addressing Modes
- Normal: (R)  Mem[Reg[R]]
- Register R specifies memory address
- movl (%ecx),%eax
- Displacement: D(R)  Mem[Reg[R]+D]
- Register R specifies start of memory region
- Constant displacement D specifies offset
- movl 8(%ebp),%edx
28Look at sum again
int sum(int x, int y)
{
  int t = x+y;
  return t;
}
  pushl %ebp
  movl %esp,%ebp          // handle stack
  movl 12(%ebp),%eax      // eax = y
  addl 8(%ebp),%eax       // eax = y+x
  movl %eax,-4(%ebp)      // t = eax
  movl -4(%ebp),%eax      // return t
  leave
  ret
29Using Simple Addressing Modes
void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}
swap:
  pushl %ebp              # Set Up
  movl %esp,%ebp
  pushl %ebx
  movl 12(%ebp),%ecx      # Body
  movl 8(%ebp),%edx
  movl (%ecx),%eax
  movl (%edx),%ebx
  movl %eax,(%edx)
  movl %ebx,(%ecx)
  movl -4(%ebp),%ebx      # Finish
  leave
  ret
30Understanding Swap
void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}
Register   Variable
%ecx       yp
%edx       xp
%eax       t1
%ebx       t0
  movl 12(%ebp),%ecx      # ecx = yp
  movl 8(%ebp),%edx       # edx = xp
  movl (%ecx),%eax        # eax = *yp (t1)
  movl (%edx),%ebx        # ebx = *xp (t0)
  movl %eax,(%edx)        # *xp = eax
  movl %ebx,(%ecx)        # *yp = ebx
31Understanding Swap
[Figure: stack contents before the body of swap runs]
Address   Value      Offset   Name
0x124     123
0x120     456
0x11c
0x118
0x114
0x110     0x120      12       yp
0x10c     0x124      8        xp
0x108     Rtn adr    4
0x104                0        <- %ebp
0x100                -4
  movl 12(%ebp),%ecx      # ecx = yp
  movl 8(%ebp),%edx       # edx = xp
  movl (%ecx),%eax        # eax = *yp (t1)
  movl (%edx),%ebx        # ebx = *xp (t0)
  movl %eax,(%edx)        # *xp = eax
  movl %ebx,(%ecx)        # *yp = ebx
36Understanding Swap
[Figure: stack contents after movl %eax,(%edx), i.e. *xp = 456]
Address   Value      Offset   Name
0x124     456
0x120     456
0x11c
0x118
0x114
0x110     0x120      12       yp
0x10c     0x124      8        xp
0x108     Rtn adr    4
0x104                0        <- %ebp
0x100                -4
  movl 12(%ebp),%ecx      # ecx = yp
  movl 8(%ebp),%edx       # edx = xp
  movl (%ecx),%eax        # eax = *yp (t1)
  movl (%edx),%ebx        # ebx = *xp (t0)
  movl %eax,(%edx)        # *xp = eax
  movl %ebx,(%ecx)        # *yp = ebx
37Understanding Swap
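[Figure: stack contents after movl %ebx,(%ecx), i.e. *yp = 123; the swap is complete]
Address   Value      Offset   Name
0x124     456
0x120     123
0x11c
0x118
0x114
0x110     0x120      12       yp
0x10c     0x124      8        xp
0x108     Rtn adr    4
0x104                0        <- %ebp
0x100                -4
  movl 12(%ebp),%ecx      # ecx = yp
  movl 8(%ebp),%edx       # edx = xp
  movl (%ecx),%eax        # eax = *yp (t1)
  movl (%edx),%ebx        # ebx = *xp (t0)
  movl %eax,(%edx)        # *xp = eax
  movl %ebx,(%ecx)        # *yp = ebx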
38Indexed Addressing Modes
- Most General Form
- D(Rb,Ri,S)  Mem[Reg[Rb]+S*Reg[Ri]+D]
- D: Constant "displacement" 1, 2, or 4 bytes
- Rb: Base register: Any of 8 integer registers
- Ri: Index register: Any, except for %esp
- Unlikely you'd use %ebp, either
- S: Scale: 1, 2, 4, or 8
- Special Cases
- (Rb,Ri)  Mem[Reg[Rb]+Reg[Ri]]
- D(Rb,Ri)  Mem[Reg[Rb]+Reg[Ri]+D]
- (Rb,Ri,S)  Mem[Reg[Rb]+S*Reg[Ri]]
39Address Computation Examples
%edx = 0xf000
%ecx = 0x100
Expression        Computation          Address
0x8(%edx)         0xf000 + 0x8         0xf008
(%edx,%ecx)       0xf000 + 0x100       0xf100
(%edx,%ecx,4)     0xf000 + 4*0x100     0xf400
0x80(,%edx,2)     2*0xf000 + 0x80      0x1e080
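The address arithmetic in the table can be written as a one-line C function. This is only a model of the formula D(Rb,Ri,S) = Reg[Rb] + S*Reg[Ri] + D; the name `effective_address` and the convention of passing 0 for an omitted register are assumptions of the sketch.

```c
#include <stdint.h>

/* Effective address for D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
   Pass 0 for an omitted base or index register and 1 for an omitted scale. */
uint32_t effective_address(uint32_t d, uint32_t rb, uint32_t ri, uint32_t s)
{
    return rb + s * ri + d;   /* unsigned arithmetic wraps modulo 2^32 */
}
```

With %edx = 0xf000 and %ecx = 0x100, the call `effective_address(0x80, 0, 0xf000, 2)` models 0x80(,%edx,2) and yields 0x1e080, the last row of the table.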
40Address Computation Instruction
- leal Src,Dest
- Src is address mode expression
- Set Dest to address denoted by expression
- Uses
- Computing address without doing memory reference
- E.g., translation of p = &x[i];
- Computing arithmetic expressions of the form x + k*y
- k = 1, 2, 4, or 8.
41Some Arithmetic Operations
- Format             Computation
- Two Operand Instructions
- addl Src,Dest      Dest = Dest + Src
- subl Src,Dest      Dest = Dest - Src
- imull Src,Dest     Dest = Dest * Src
- sall Src,Dest      Dest = Dest << Src    Also called shll
- sarl Src,Dest      Dest = Dest >> Src    Arithmetic
- shrl Src,Dest      Dest = Dest >> Src    Logical
- xorl Src,Dest      Dest = Dest ^ Src
- andl Src,Dest      Dest = Dest & Src
- orl Src,Dest       Dest = Dest | Src
42Some Arithmetic Operations
- Format             Computation
- One Operand Instructions
- incl Dest          Dest = Dest + 1
- decl Dest          Dest = Dest - 1
- negl Dest          Dest = -Dest
- notl Dest          Dest = ~Dest
43Using leal for Arithmetic Expressions
int arith (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}
arith:
  pushl %ebp                 # Set Up
  movl %esp,%ebp
  movl 8(%ebp),%eax          # Body
  movl 12(%ebp),%edx
  leal (%edx,%eax),%ecx
  leal (%edx,%edx,2),%edx
  sall $4,%edx
  addl 16(%ebp),%ecx
  leal 4(%edx,%eax),%eax
  imull %ecx,%eax
  leave                      # Finish
  ret
44Understanding arith
int arith (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}
  movl 8(%ebp),%eax          # eax = x
  movl 12(%ebp),%edx         # edx = y
  leal (%edx,%eax),%ecx      # ecx = x+y (t1)
  leal (%edx,%edx,2),%edx    # edx = 3*y
  sall $4,%edx               # edx = 48*y (t4)
  addl 16(%ebp),%ecx         # ecx = z+t1 (t2)
  leal 4(%edx,%eax),%eax     # eax = 4+t4+x (t5)
  imull %ecx,%eax            # eax = t5*t2 (rval)
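The way the compiled code forms 48*y without a multiply instruction can be mirrored in C. The sketch below uses unsigned arithmetic so that the shift and the wraparound are well defined; `times48` is a name chosen for the example.

```c
#include <stdint.h>

/* Mirror the leal/sall pair: y + 2*y gives 3*y, then shifting left by 4
   multiplies by 16, for 48*y in total (modulo 2^32). */
uint32_t times48(uint32_t y)
{
    uint32_t t = y + 2 * y;   /* leal (%edx,%edx,2),%edx */
    return t << 4;            /* sall $4,%edx */
}
```

Because two's complement multiplication has the same low 32 bits as unsigned multiplication, the same instruction pair serves the signed `int` in arith.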
45Understanding arith
  # eax = x
  movl 8(%ebp),%eax
  # edx = y
  movl 12(%ebp),%edx
  # ecx = x+y (t1)
  leal (%edx,%eax),%ecx
  # edx = 3*y
  leal (%edx,%edx,2),%edx
  # edx = 48*y (t4)
  sall $4,%edx
  # ecx = z+t1 (t2)
  addl 16(%ebp),%ecx
  # eax = 4+t4+x (t5)
  leal 4(%edx,%eax),%eax
  # eax = t5*t2 (rval)
  imull %ecx,%eax
int arith (int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}
And now some live action!
46Condition Codes
- Single Bit Registers
- CF: Carry Flag      SF: Sign Flag
- ZF: Zero Flag       OF: Overflow Flag
- Implicitly Set By Arithmetic Operations
- addl Src,Dest
- C analog: t = a + b
- CF set if carry out from most significant bit
- Used to detect unsigned overflow
- ZF set if t == 0
- SF set if t < 0
- OF set if two's complement overflow
- (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
- Not Set by leal instruction
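The four flag rules for addl can be modeled in C. This is an illustrative sketch, not a description of the hardware: the `struct flags` type and `add_flags` function are invented here, and the overflow test uses the equivalent bitwise form "operands have the same sign and the result's sign differs".

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags that addl would produce for t = a + b on 32-bit operands. */
struct flags add_flags(uint32_t a, uint32_t b)
{
    uint32_t t = a + b;                        /* wraps modulo 2^32 */
    struct flags f;
    f.cf = t < a;                              /* carry out of the top bit */
    f.zf = t == 0;
    f.sf = (t >> 31) & 1;                      /* sign bit of the result */
    /* signed overflow: a and b share a sign, and t has the other sign */
    f.of = ((~(a ^ b) & (a ^ t)) >> 31) & 1;
    return f;
}
```

For instance, 0x7fffffff + 1 sets SF and OF but not CF, while 0xffffffff + 1 sets CF and ZF but not OF, which shows why unsigned and signed overflow need separate flags.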
47Setting Condition Codes (cont.)
- Explicit Setting by Compare Instruction
- cmpl Src2,Src1
- cmpl b,a like computing a-b without setting destination
- CF set if carry out from most significant bit
- Used for unsigned comparisons
- ZF set if a == b
- SF set if (a-b) < 0
- OF set if two's complement overflow
- (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
48Setting Condition Codes (cont.)
- Explicit Setting by Test instruction
- testl Src2,Src1
- Sets condition codes based on value of Src1 & Src2
- Useful to have one of the operands be a mask
- testl b,a like computing a&b without setting destination
- ZF set when a&b == 0
- SF set when a&b < 0
49Reading Condition Codes
- SetX Instructions
- Set single byte based on combinations of condition codes
50Reading Condition Codes (Cont.)
- SetX Instructions
- Set single byte based on combinations of condition codes
- One of 8 addressable byte registers
- Embedded within first 4 integer registers
- Does not alter remaining 3 bytes
- Typically use movzbl to finish job
[Figure: integer registers and their addressable bytes]
%eax: %ah %al
%edx: %dh %dl
%ecx: %ch %cl
%ebx: %bh %bl
%esi
%edi
%esp
%ebp
int gt (int x, int y)
{
  return x > y;
}
Body
  movl 12(%ebp),%eax      # eax = y
  cmpl %eax,8(%ebp)       # Compare x : y
  setg %al                # al = x > y
  movzbl %al,%eax         # Zero rest of %eax
Note inverted ordering!
51Jumping
- jX Instructions
- Jump to different part of code depending on condition codes
52Conditional Branch Example
int max(int x, int y)
{
  if (x > y)
    return x;
  else
    return y;
}
_max:
  pushl %ebp              # Set Up
  movl %esp,%ebp
  movl 8(%ebp),%edx       # Body
  movl 12(%ebp),%eax
  cmpl %eax,%edx
  jle L9
  movl %edx,%eax
L9:
  leave                   # Finish
  ret
53Conditional Branch Example (Cont.)
int goto_max(int x, int y)
{
  int rval = y;
  int ok = (x <= y);
  if (ok)
    goto done;
  rval = x;
done:
  return rval;
}
- C allows "goto" as means of transferring control
- Closer to machine-level programming style
- Generally considered bad coding style
  movl 8(%ebp),%edx       # edx = x
  movl 12(%ebp),%eax      # eax = y
  cmpl %eax,%edx          # x : y
  jle L9                  # if <= goto L9
  movl %edx,%eax          # eax = x (skipped when x ≤ y)
L9:                       # Done:
54Do-While Loop Example
C Code
int fact_do (int x)
{
  int result = 1;
  do {
    result *= x;
    x = x-1;
  } while (x > 1);
  return result;
}
Goto Version
int fact_goto(int x)
{
  int result = 1;
loop:
  result *= x;
  x = x-1;
  if (x > 1)
    goto loop;
  return result;
}
- Use backward branch to continue looping
- Only take branch when while condition holds
55Do-While Loop Compilation
Goto Version
int fact_goto (int x)
{
  int result = 1;
loop:
  result *= x;
  x = x-1;
  if (x > 1)
    goto loop;
  return result;
}
Assembly
_fact_goto:
  pushl %ebp              # Setup
  movl %esp,%ebp          # Setup
  movl $1,%eax            # eax = 1
  movl 8(%ebp),%edx       # edx = x
L11:
  imull %edx,%eax         # result *= x
  decl %edx               # x--
  cmpl $1,%edx            # Compare x : 1
  jg L11                  # if > goto loop
  movl %ebp,%esp          # Finish
  popl %ebp               # Finish
  ret                     # Finish
- Registers
- %edx: x
- %eax: result
56While Loop Example 1
C Code
int fact_while (int x)
{
  int result = 1;
  while (x > 1) {
    result *= x;
    x = x-1;
  }
  return result;
}
First Goto Version
int fact_while_goto (int x)
{
  int result = 1;
loop:
  if (!(x > 1))
    goto done;
  result *= x;
  x = x-1;
  goto loop;
done:
  return result;
}
- Is this code equivalent to the do-while version?
- Must jump out of loop if test fails
57Actual While Loop Translation
C Code
int fact_while(int x)
{
  int result = 1;
  while (x > 1) {
    result *= x;
    x = x-1;
  }
  return result;
}
Second Goto Version
int fact_while_goto2 (int x)
{
  int result = 1;
  if (!(x > 1))
    goto done;
loop:
  result *= x;
  x = x-1;
  if (x > 1)
    goto loop;
done:
  return result;
}
- Uses same inner loop as do-while version
- Guards loop entry with extra test
58General While Translation
C Code
  while (Test)
    Body
Goto Version
  if (!Test)
    goto done;
loop:
  Body
  if (Test)
    goto loop;
done:
Do-While Version
  if (!Test)
    goto done;
  do
    Body
  while (Test);
done:
59Switch Statement Example
  movl 8(%ebp),%eax       // eax = op
  movl %eax,-4(%ebp)      //
  movl 8(%ebp),%eax       //
  movl %eax,-8(%ebp)      //
  cmpl $2,-8(%ebp)        # Compare op
  je .L4
  cmpl $2,-8(%ebp)
  jg .L7
  cmpl $1,-8(%ebp)
  je .L3
  jmp .L6
.L7:
  cmpl $3,-8(%ebp)
  je .L5
  jmp .L6
.L3:
  movl $3,-4(%ebp)
  jmp .L2
.L4:
  movl $5,-4(%ebp)
  jmp .L2
.L5:
  movl $9,-4(%ebp)
.L6:
  movl $7,-4(%ebp)
.L2:
  leave
C code
int swich_ex(int a)
{
  int b = a;
  switch(a) {
  case 1:
    b = 3;
    break;
  case 2:
    b = 5;
    break;
  case 3:
    b = 9;
  default:
    b = 7;
  }
}
Setup