Title: Arithmetic Operations
1. Lecture 11
- Floating Point Operations
- Arithmetic Operations in Software
- Operations in GF(2^n)
2. Floating Point Operations
4. Fig. 17.3 The ANSI/IEEE standard floating-point number representation formats.
5. Table 17.1 Some features of the ANSI/IEEE standard floating-point number representation formats.
6. Fig. 17.4 Denormals in the IEEE single-precision format.
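The field layout of the single-precision format and the denormal range can be explored with a short Python sketch. It assumes the standard single-precision layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits); the helper name `decode_single` is illustrative and does not come from the lecture.

```python
import struct

def decode_single(x):
    """Split a float (rounded to IEEE single precision) into its fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF      # biased exponent, bias = 127
    fraction = bits & 0x7FFFFF          # 23 explicit fraction bits
    if exponent == 0:
        # Zero or denormal: no hidden 1, fixed scale factor 2**-126
        kind = "zero" if fraction == 0 else "denormal"
        value = (fraction / 2**23) * 2.0**-126
    elif exponent == 0xFF:
        kind = "infinity" if fraction == 0 else "NaN"
        value = float("inf") if fraction == 0 else float("nan")
    else:
        # Normal number: hidden 1 in front of the fraction
        kind = "normal"
        value = (1 + fraction / 2**23) * 2.0**(exponent - 127)
    return sign, exponent, fraction, kind, -value if sign else value
```

Calling `decode_single(2.0**-149)` returns the smallest positive denormal, whose biased exponent field is 0 and whose fraction field is 1.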
7. Fig. 18.1 Block diagram of a floating-point adder/subtractor.
8. Fig. 18.2 One bit-slice of a single-stage pre-shifter.
9. Fig. 18.3 Four-stage combinational shifter for preshifting an operand by 0 to 15 bits.
11. Fig. 18.4 Leading zeros/ones counting versus prediction.
12. Fig. 18.5 Block diagram of a floating-point multiplier.
13. Fig. 18.6 Block diagram of a floating-point divider.
14. Arithmetic Operations in Software
15. Little-Endian vs. Big-Endian Representation
Number: A0 B1 C2 D3 E4 F5 67 89 (base 16), MSB = A0, LSB = 89

Address   Little-Endian   Big-Endian
0         89 (LSB)        A0 (MSB)
          67              B1
          F5              C2
          E4              D3
          D3              E4
          C2              F5
          B1              67
MAX       A0 (MSB)        89 (LSB)
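The two memory layouts above can be reproduced with Python's built-in `int.to_bytes`. This is only a sketch; the helper name `dump` is an illustrative assumption.

```python
value = 0xA0B1C2D3E4F56789            # the 8-byte number from the slide

little = value.to_bytes(8, "little")  # byte at address 0 is the LSB
big = value.to_bytes(8, "big")        # byte at address 0 is the MSB

def dump(layout):
    """Return the bytes in increasing address order as hex strings."""
    return [f"{byte:02X}" for byte in layout]
```

`dump(little)` lists the bytes starting with 89 at address 0, while `dump(big)` starts with A0.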
16. Little-Endian vs. Big-Endian Camps
Little-Endian (LSB at address 0, MSB at address MAX)
- Intel
- DEC VAX
- RS 232
Big-Endian (MSB at address 0, LSB at address MAX)
- Motorola 68xx, 680x0
- IBM
- Hewlett-Packard
- Sun SuperSPARC
- Internet TCP/IP
Bi-Endian
- Motorola Power PC
- Silicon Graphics MIPS
17. Little-Endian vs. Big-Endian
Origin of the terms
Jonathan Swift, Gulliver's Travels
- A law requiring all citizens of Lilliput to break their soft eggs at the little ends only
- A civil war breaking out between the Little-Endians and the Big-Endians, resulting in the Big-Endians taking refuge on a nearby island, the kingdom of Blefuscu
- Satire on the holy wars between the Protestant Church of England and the Catholic Church of France
18. Little-Endian vs. Big-Endian
Advantages and Disadvantages
Big-Endian
- easier to determine a sign of the number
- easier to compare two numbers
- easier to divide two numbers
- easier to print
Little-Endian
- easier to load and store multibyte numbers
- easier to write multiple precision routines, especially addition and multiplication
19. Pointers (1)
int *iptr;

Address   Byte
0         89    <- iptr
          67
          F5    <- iptr+1
          E4
          D3
          C2
          B1
MAX       A0

Big-Endian:    *(iptr) = 8967
Little-Endian: *(iptr) = 6789
20. Pointers (2)
long int *lptr;

Address   Byte
0         89    <- lptr
          67
          F5
          E4
          D3    <- lptr+1
          C2
          B1
MAX       A0

Big-Endian:    *(lptr) = 8967F5E4
Little-Endian: *(lptr) = E4F56789
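The pointer dereferences on the two slides above can be imitated in Python with the `struct` module. The sketch assumes a 2-byte int and a 4-byte long int, as the slide values imply; `memory` and `read` are illustrative names.

```python
import struct

# Memory contents from the slides, addresses 0..7
memory = bytes([0x89, 0x67, 0xF5, 0xE4, 0xD3, 0xC2, 0xB1, 0xA0])

def read(fmt, address):
    """Read one integer at a byte address; fmt is a struct format such as '>H'."""
    return struct.unpack_from(fmt, memory, address)[0]

# 16-bit reads at address 0, the analogue of *(iptr)
int_big = read(">H", 0)          # big-endian interpretation
int_little = read("<H", 0)       # little-endian interpretation

# 32-bit reads at address 0, the analogue of *(lptr)
long_big = read(">I", 0)
long_little = read("<I", 0)

# iptr+1 advances by one 2-byte element, so it reads from address 2
next_int_big = read(">H", 2)
```

The same eight bytes therefore yield different integer values depending only on the byte order assumed by the reader.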
21. SOFTWARE MULTIPLICATION
1 word = l bytes = λ bits
Each operand, for example B = (B_{n-1}, B_{n-2}, ..., B_1, B_0): N bytes = n words
Product: 2N bytes = 2n words
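22. Paper-and-Pencil Algorithm of Multiplication
1 word = l bytes = λ bits
$A = (A_{n-1}, A_{n-2}, \ldots, A_1, A_0)$
$B = (B_{n-1}, B_{n-2}, \ldots, B_1, B_0)$
Assertion: $\lg_2 n \le \lambda$
$D_0 = A_0 B_0$ (2 words)
$D_1 = A_0 B_1 + A_1 B_0$ (3 words)
$D_2 = A_0 B_2 + A_1 B_1 + A_2 B_0$ (3 words)
. . . . .
$D_{2n-4} = A_{n-3} B_{n-1} + A_{n-2} B_{n-2} + A_{n-1} B_{n-3}$
$D_{2n-3} = A_{n-2} B_{n-1} + A_{n-1} B_{n-2}$
$D_{2n-2} = A_{n-1} B_{n-1}$ (2 words)
$C = A \times B = (C_{2n-1}, C_{2n-2}, \ldots, C_{n+1}, C_n, C_{n-1}, C_{n-2}, \ldots, C_1, C_0)$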
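The column sums D_k and their reduction to the product words C_k can be sketched in Python as follows. The word length of 16 bits and the names `to_words`, `from_words` and `multiply` are assumptions chosen for illustration; words are stored least significant first.

```python
WORD_BITS = 16                  # assumed word length in bits, for illustration
BASE = 1 << WORD_BITS

def to_words(value, n):
    """Split a non-negative integer into n words, least significant first."""
    return [(value >> (WORD_BITS * i)) & (BASE - 1) for i in range(n)]

def from_words(words):
    """Inverse of to_words."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def multiply(a, b):
    """Column-wise paper-and-pencil product of two n-word operands."""
    n = len(a)
    c = [0] * (2 * n)
    carry = 0                   # upper part of the previous column
    for k in range(2 * n - 1):
        # D_k = sum of A_i * B_(k-i) over all valid i
        d_k = sum(a[i] * b[k - i]
                  for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
        total = d_k + carry
        c[k] = total % BASE     # low word becomes C_k
        carry = total // BASE   # remaining words move to column k+1
    c[2 * n - 1] = carry
    return c
```

Python integers are unbounded, so the three-word accumulator of the slide is implicit here; in a fixed-width language `d_k` and `carry` would need three words.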
23. Paper-and-Pencil Algorithm of Squaring
1 word = l bytes = λ bits
$A = (A_{n-1}, A_{n-2}, \ldots, A_1, A_0)$
Assertion: $\lg_2 n \le \lambda$
$D_0 = A_0^2$ (2 words)
$D_1 = 2 A_0 A_1$ (3 words)
$D_2 = 2 A_0 A_2 + A_1^2$ (3 words)
. . . . .
$D_{2n-4} = 2 A_{n-3} A_{n-1} + A_{n-2}^2$
$D_{2n-3} = 2 A_{n-2} A_{n-1}$
$D_{2n-2} = A_{n-1}^2$ (2 words)
$C = A \times A = (C_{2n-1}, C_{2n-2}, \ldots, C_{n+1}, C_n, C_{n-1}, C_{n-2}, \ldots, C_1, C_0)$
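A minimal Python sketch of the squaring columns, in which every cross product A_i A_j with i < j is computed once and doubled. The 16-bit word length and the function name `square` are illustrative assumptions; words are stored least significant first.

```python
WORD_BITS = 16                  # assumed word length in bits, for illustration
BASE = 1 << WORD_BITS

def square(a):
    """Column-wise paper-and-pencil square of an n-word operand."""
    n = len(a)
    c = [0] * (2 * n)
    carry = 0
    for k in range(2 * n - 1):
        lo = max(0, k - n + 1)
        # Cross products A_i * A_(k-i) with i < k-i, each counted twice
        d_k = 2 * sum(a[i] * a[k - i] for i in range(lo, (k + 1) // 2))
        if k % 2 == 0:
            d_k += a[k // 2] ** 2   # the single square term of an even column
        total = d_k + carry
        c[k] = total % BASE
        carry = total // BASE
    c[2 * n - 1] = carry
    return c
```

This version performs n(n+1)/2 word multiplications instead of n^2, which is the source of the roughly twofold saving for large n.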
24. Paper-and-Pencil Algorithm of Multiplication
Run Time Assuming Purely Sequential Execution of Instructions
$t_{MUL}^{\text{paper-and-pencil}}(N) = \frac{N^2}{l^2}\, t_M \left(1 + 4\,\frac{t_A}{t_M}\left(1 - \frac{l}{N}\right)\right)$
N - operand length in bytes
tM - time of a single word multiplication
tA - time of a single word addition
l - word length in bytes
$t_{MUL}^{\text{paper-and-pencil}} = \Theta(N^2)$
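25. Paper-and-Pencil Algorithm of Squaring
Run Time Assuming Purely Sequential Execution of Instructions
$\frac{1}{2} < \frac{t_{SQR}^{\text{paper-and-pencil}}}{t_{MUL}^{\text{paper-and-pencil}}} < 1$
[The closed-form expression for this ratio in terms of n and ρ is not recoverable from the transcript.]
$\rho = \frac{t_M}{t_A} = \frac{\text{time of a single word multiplication}}{\text{time of a single word addition}}$
For large n: $t_{SQR}^{\text{paper-and-pencil}} \approx \frac{1}{2}\, t_{MUL}^{\text{paper-and-pencil}}$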
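26. Karatsuba Algorithm of Multiplication
Basic Recursive Step (1)
Operands A and B: n words = N bytes each, split into halves of n/2 words = κ bits
$A = (A_1, A_0)$
$B = (B_1, B_0)$
$D_1 = A_1 B_1$
$D_0 = A_0 B_0$
$D_2 = (A_1 - A_0)(B_0 - B_1)$
$C = A \times B = (C_3, C_2, C_1, C_0)$, with block boundaries at weights $2^{\kappa}, 2^{2\kappa}, 2^{3\kappa}, 2^{4\kappa}$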
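27. Karatsuba Algorithm of Multiplication
Basic Recursive Step (2)
$C = A \cdot B = (A_1 2^{\kappa} + A_0)(B_1 2^{\kappa} + B_0)$
$\;\; = A_1 B_1\, 2^{2\kappa} + \left((A_1 - A_0)(B_0 - B_1) + A_0 B_0 + A_1 B_1\right) 2^{\kappa} + A_0 B_0$
$\;\; = D_1\, 2^{2\kappa} + (D_2 + D_0 + D_1)\, 2^{\kappa} + D_0$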
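The recursive step above maps directly onto a short Python function. This is a sketch under stated assumptions: operands are Python integers, `bits` is a power of two, and the 32-bit cut-off below which the built-in product is used is an arbitrary illustrative choice.

```python
def karatsuba(a, b, bits):
    """Multiply non-negative integers a, b < 2**bits (bits a power of two)."""
    if bits <= 32:                  # assumed cut-off: use a direct product
        return a * b
    half = bits // 2                # kappa: number of bits in each half
    mask = (1 << half) - 1
    a1, a0 = a >> half, a & mask    # A = A1 * 2**kappa + A0
    b1, b0 = b >> half, b & mask    # B = B1 * 2**kappa + B0
    d0 = karatsuba(a0, b0, half)    # D0 = A0 * B0
    d1 = karatsuba(a1, b1, half)    # D1 = A1 * B1
    # D2 = (A1 - A0)(B0 - B1); the sign is tracked separately so that the
    # recursive call only sees non-negative operands
    da, db = a1 - a0, b0 - b1
    d2 = karatsuba(abs(da), abs(db), half)
    if (da < 0) != (db < 0):
        d2 = -d2
    return (d1 << (2 * half)) + ((d2 + d0 + d1) << half) + d0
```

Each call issues three half-size multiplications instead of four, which is where the exponent lg2 3 in the run time comes from.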
28. Karatsuba Algorithm of Multiplication
Tree of Recursive Calls
Level 0: 1 call on operands of n = 2^k words
Level 1: 3 calls on operands of 2^(k-1) words
Level 2: 9 calls on operands of 2^(k-2) words
. . . . .
29. Karatsuba Algorithm of Multiplication
Run Time Assuming Purely Sequential Execution of Instructions
$t_{MUL}^{\text{Karatsuba}}(N) = \left(\frac{N}{l}\right)^{\lg_2 3} t_M \left(1 + \frac{1}{2}\,\frac{t_C}{t_M} + \frac{t_A}{t_M}\left(10 - \frac{8}{(N/l)^{\lg_2 3 - 1}}\right)\right)$
tM - time of a single word multiplication
N - operand length in bytes
tA - time of a single word addition
l - word length in bytes
tC - time of stack operations in every recurrent call of the function
$t_{MUL}^{\text{Karatsuba}} = \Theta(N^{\lg_2 3})$
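30. Schönhage-Strassen Algorithm of Multiplication
F: Discrete Fourier Transform in the finite field GF(p)
$C = A \cdot B = F^{-1}(F(A) \cdot F(B))$
$A = (0, \ldots, 0, A_{n-1}, \ldots, A_0)$
$B = (0, \ldots, 0, B_{n-1}, \ldots, B_0)$
$F: \; \alpha_i = \sum_{k=0}^{n-1} A_k\, \omega_{2n}^{ik}, \qquad \beta_i = \sum_{k=0}^{n-1} B_k\, \omega_{2n}^{ik}$
$F(A) = (\alpha_{2n-1}, \ldots, \alpha_0)$
$F(B) = (\beta_{2n-1}, \ldots, \beta_0)$
$F(A) \cdot F(B) = (\alpha_{2n-1}\beta_{2n-1}, \ldots, \alpha_0\beta_0)$
$F(C) = (\gamma_{2n-1}, \ldots, \gamma_0)$
$F^{-1}: \; C_i = \frac{1}{2n} \sum_{k=0}^{2n-1} \gamma_k\, \omega_{2n}^{-ik}$
$C = (C_{2n-1}, \ldots, C_0)$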
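The identity C = F^{-1}(F(A) F(B)) can be checked with a small Python sketch. It is not the Schönhage-Strassen algorithm itself: the transform below is a naive quadratic-time DFT, whereas a fast transform is needed to reach the stated run time. The prime P = 998244353 with primitive root 3, the 8-bit digits and the function names are assumptions made for the illustration.

```python
P = 998244353        # prime with P - 1 = 119 * 2**23; 3 is a primitive root mod P
G = 3

def dft(values, omega):
    """Naive DFT over GF(P): out[i] = sum over k of values[k] * omega**(i*k)."""
    m = len(values)
    return [sum(v * pow(omega, i * k, P) for k, v in enumerate(values)) % P
            for i in range(m)]

def multiply(a_digits, b_digits, digit_bits=8):
    """Multiply two n-digit numbers (least significant digit first).
    n must be a power of two so that 2n divides P - 1."""
    n = len(a_digits)
    m = 2 * n                                   # transform length 2n
    omega = pow(G, (P - 1) // m, P)             # primitive 2n-th root of unity
    fa = dft(a_digits + [0] * n, omega)         # F(A), A padded with n zeros
    fb = dft(b_digits + [0] * n, omega)         # F(B)
    fc = [x * y % P for x, y in zip(fa, fb)]    # F(C): pointwise products
    inv_m = pow(m, P - 2, P)                    # 1/(2n) in GF(P)
    conv = [c * inv_m % P for c in dft(fc, pow(omega, P - 2, P))]
    # conv[i] equals the column sum D_i exactly as long as D_i < P
    return sum(c << (digit_bits * i) for i, c in enumerate(conv))
```

With 8-bit digits each column sum is at most n * 255^2, so the result is exact for any operand length this sketch can reasonably handle.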
31. Schönhage-Strassen Algorithm of Multiplication
Run Time Assuming Purely Sequential Execution of Instructions
$t_{MUL}^{\text{Schönhage-Strassen}} = \Theta(N \lg_2 N)$
Optimization for Squaring
$C = A^2 = F^{-1}(F(A) \cdot F(A))$
$t_{SQR}^{\text{Schönhage-Strassen}} \approx \frac{2}{3}\, t_{MUL}^{\text{Schönhage-Strassen}}$
32. Comparison of Software Multiplication Algorithms
Name | Complexity | Optimizations for Squaring | Limitations
Paper-and-pencil (classical) | $\Theta(n^2)$ | $t_{SQR} / t_{MUL} \approx 1/2$ | none
Karatsuba (Karatsuba-Ofman) | $\Theta(n^{\log_2 3})$ | $t_{SQR} / t_{MUL} \approx 1$ | $n = 2^k$
Schönhage-Strassen | $\Theta(n \ln n)$ | $t_{SQR} / t_{MUL} \approx 2/3$ | n of the special form
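33. Modular Reduction Algorithms
Notation
$y = x \bmod m$
$x = (x_{2n-1}, x_{2n-2}, x_{2n-3}, \ldots, x_{n+1}, x_n, x_{n-1}, x_{n-2}, x_{n-3}, \ldots, x_1, x_0)$: 2n words of λ bits each
$m = (m_{n-1}, m_{n-2}, m_{n-3}, \ldots, m_1, m_0)$: n words = k bits
$b = 2^{\lambda}$
$m = \sum_{i=0}^{n-1} m_i b^i$
$x = \sum_{i=0}^{2n-1} x_i b^i$
k - length of m in bits
n - length of m in λ-bit words
λ - word length in bits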
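34. Classical Algorithm (1)
Step 1: estimate the leading quotient word from the two leading words of x and the leading word of m, then subtract $q_{n-1} m$ aligned with the leading words of x:
$\hat{q}_{n-1} = \left\lfloor \frac{x_{2n-1} b + x_{2n-2}}{m_{n-1}} \right\rfloor, \qquad q_{n-1} = \hat{q}_{n-1} - \varepsilon, \qquad \varepsilon \in \{0, 1, 2\}$
Step 2: repeat on the remaining words $x_{2n-2}, x_{2n-3}, \ldots, x_{n-1}, \ldots, x_1, x_0$:
$\hat{q}_{n-2} = \left\lfloor \frac{x_{2n-2} b + x_{2n-3}}{m_{n-1}} \right\rfloor, \qquad q_{n-2} = \hat{q}_{n-2} - \varepsilon, \qquad \varepsilon \in \{0, 1, 2\}$
. . . . . . .
Final step: only the words $x_{n-1}, \ldots, x_1, x_0$ remain.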
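35. Classical Algorithm (2)
$x / m = q \text{ rem } r, \qquad q = (q_{n-1}, q_{n-2}, \ldots, q_0)$
Normalization
Shift m left until the most significant bit of $m_{n-1}$ is 1, for example m = 00101.... becomes m = 101....00, so that
$m_{n-1} \ge \frac{b}{2} = 2^{\lambda - 1}$ (λ = word length in bits)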
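The quotient-word estimate and its correction by ε can be sketched in Python as follows. The sketch keeps the partial remainder in a single Python integer rather than in word arrays, assumes an 8-bit word for readability, and requires x < m b^n so that the quotient fits in n words; the name `classical_divmod` is illustrative.

```python
WORD_BITS = 8                     # assumed word length in bits, small for readability
B = 1 << WORD_BITS                # b = 2**WORD_BITS

def classical_divmod(x, m, n):
    """Divide x by an n-word normalized modulus m, with 0 <= x < m * b**n.
    Returns (q, r, worst), where worst is the largest correction needed."""
    m_top = m >> (WORD_BITS * (n - 1))          # leading word m_(n-1)
    assert m_top >= B // 2, "m must be normalized"
    assert 0 <= x < m << (WORD_BITS * n)
    q, r, worst = 0, x, 0
    for i in range(n - 1, -1, -1):
        # Estimate q_i from the two leading words of the partial remainder
        top_two = r >> (WORD_BITS * (n - 1 + i))
        q_hat = min(top_two // m_top, B - 1)
        r -= (q_hat * m) << (WORD_BITS * i)
        eps = 0
        while r < 0:                            # estimate too large: add m back
            r += m << (WORD_BITS * i)
            eps += 1
        worst = max(worst, eps)
        q = (q << WORD_BITS) | (q_hat - eps)
    return q, r, worst
```

With a normalized modulus the value returned in `worst` never exceeds 2, which is the bound on ε stated on the slide.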
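36. Montgomery Modular Multiplication (1)
$C = A \cdot B \bmod M$
A, B, M: k-bit numbers
Integer domain -> Montgomery domain
$A \;\rightarrow\; A' = A \cdot 2^k \bmod M$
$B \;\rightarrow\; B' = B \cdot 2^k \bmod M$
$C' = MP(A', B', M) = A' \cdot B' \cdot 2^{-k} \bmod M$
$\;\; = (A \cdot 2^k) \cdot (B \cdot 2^k) \cdot 2^{-k} \bmod M$
$\;\; = A \cdot B \cdot 2^k \bmod M$
$C = A \cdot B \;\rightarrow\; C' = C \cdot 2^k \bmod M$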
37. Montgomery Modular Multiplication (2)
$A \;\rightarrow\; A'$: $A' = MP(A, 2^{2k} \bmod M, M)$
$C' \;\rightarrow\; C$: $C = MP(C', 1, M)$
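38. Montgomery Modular Multiplication (3)
X = AB (2k bits): $(x_{2n-1}, x_{2n-2}, x_{2n-3}, \ldots, x_n, \ldots, x_1, x_0)$
Add $q_0 M$: the least significant word becomes 0, giving $(x_{2n-1}, x_{2n-2}, x_{2n-3}, \ldots, x_n, \ldots, x_1, 0)$
Add $q_1 M b$: the next word becomes 0, giving $(x_{2n-1}, x_{2n-2}, x_{2n-3}, \ldots, x_2, 0, 0)$
. . . . . .
After n steps the n least significant words are 0 and the upper k bits hold C:
$C \cdot 2^k = X + zM \;\Rightarrow\; C \cdot 2^k \equiv X = AB \;\Rightarrow\; C \equiv AB \cdot 2^{-k} \pmod{M}$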
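The three Montgomery slides above can be tied together in a short Python sketch: `mp` clears one low word of X per step by adding a multiple of M, and `modmul` performs the conversions into and out of the Montgomery domain. The 8-bit word length and both function names are illustrative assumptions; M must be odd and A, B < M.

```python
WORD_BITS = 8                         # assumed word length in bits
B = 1 << WORD_BITS                    # b = 2**WORD_BITS

def mp(a, b, m, n):
    """Montgomery product a * b * 2**(-k) mod m, with k = n * WORD_BITS, m odd."""
    m_prime = (-pow(m, -1, B)) % B    # -m_0^(-1) mod b (pow with -1 needs Python 3.8+)
    x = a * b                         # 2k-bit product X
    for i in range(n):
        x_i = (x >> (WORD_BITS * i)) & (B - 1)
        q_i = (x_i * m_prime) % B     # makes word i of X + q_i * M * b**i zero
        x += (q_i * m) << (WORD_BITS * i)
    c = x >> (WORD_BITS * n)          # the n low words are zero: exact division
    return c - m if c >= m else c

def modmul(a, b, m, n):
    """A * B mod M computed through the Montgomery domain."""
    k = n * WORD_BITS
    r2 = pow(2, 2 * k, m)             # 2**(2k) mod M, precomputed once per modulus
    a_m = mp(a, r2, m, n)             # A' = A * 2**k mod M
    b_m = mp(b, r2, m, n)             # B' = B * 2**k mod M
    c_m = mp(a_m, b_m, m, n)          # C' = A * B * 2**k mod M
    return mp(c_m, 1, m, n)           # C = C' * 2**(-k) mod M
```

A single `modmul` costs four Montgomery products, which is why the method pays off only when many multiplications share the same modulus and the operands stay in the Montgomery domain.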
39. Comparison of Modular Reduction Algorithms (1)
Name: General Features
- classical: Can be used as a general purpose division algorithm (giving both quotient and remainder).
- Barrett: Assumes that the same modulus m is used for many reductions. Restrictions on the size of the dividend x.
- Selby-Mitchell: Assumes that the same modulus m is used for many reductions. No restrictions on the size of the dividend x. Large amount of memory.
- Montgomery: Efficient only for a sequence of modular reductions with the same modulus, e.g., for modular exponentiation. Inefficient for a single modular reduction.
40. Comparison of Modular Reduction Algorithms (2)
Name | # of s-p MULs | # of s-p DIVs | Time | Complexity
classical | $n(n + 2.5)$ | n | $t_{MUL-PP} < t_{MOD} < 2\, t_{MUL-PP}$ | $\Theta(n^2)$
Barrett | Depends on the multiplication algorithm in use | 0 | $\approx t_{MUL-PP}$ for a paper-and-pencil multiplication | The same as multiplication algorithm in use
Selby-Mitchell | 0 | 0 | $\frac{l}{1 + \rho}\, t_{MUL-PP}$ | $\Theta(n^2)$
Montgomery | $n(n + 1)$ | 0 | $\approx t_{MUL-PP}$ | $\Theta(n^2)$
$t_{MUL-PP}$ - time of multiplication using paper-and-pencil algorithm
l - word length in bytes
$\rho = \frac{\text{time of s-p MUL}}{\text{time of s-p ADD}}$
41. Comparison of Modular Reduction Algorithms (3)
Name | Pre calculations | Post calculations | Memory Requirements | Restrictions
classical | normalization | unnormalization | None | None
Barrett (1986) | $\lfloor b^{2n} / m \rfloor = \lfloor 2^{2k} / m \rfloor$ | None | n words | $0 \le x < b^{2n}$, i.e. $0 \le x < 2^{2k}$
Selby-Mitchell (1989) | Calculating the look-up table dependent on the modulus m | None | $2^{w+1}(n+1)$ words | None
Montgomery (1985) | $-m_0^{-1} \bmod b$; $b^{2n} \bmod m = 2^{2k} \bmod m$; transformation to MM domain | transformation from MM domain | n+1 words | $0 \le x < m\, b^n$, i.e. $0 \le x < m\, 2^k$
42. Operations in GF(2^n)
43. Multiply Architecture
- Input: A(x), B(x) ∈ GF(2^m)
- Output: C <- AB mod P
- 1. C <- 0
- 2. for i = m-1 to 0 do
- 3.     C <- Cx + A b_i
- 4.     C <- C + c_m P
- 5. end for
- 6. return C
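The steps above can be sketched in Python with polynomials over GF(2) stored as integers, bit j holding the coefficient of x^j. The function name `gf2m_multiply` and the convention that `p` includes its leading x^m term are assumptions of this sketch.

```python
def gf2m_multiply(a, b, p, m):
    """MSB-first shift-and-add product A*B mod P in GF(2**m).
    a and b have degree below m; p is the degree-m field polynomial."""
    c = 0
    for i in range(m - 1, -1, -1):
        c <<= 1                       # C <- C * x
        if (b >> i) & 1:
            c ^= a                    # C <- C + A * b_i (addition is XOR)
        if (c >> m) & 1:
            c ^= p                    # C <- C + c_m * P, clearing the x**m term
    return c
```

For example, with the field polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), `gf2m_multiply(0x57, 0x83, 0x11B, 8)` reproduces the worked example from the AES specification.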
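44. Shift-and-Add Multiplier
[Block diagram. Recoverable labels: inputs X and Y; digit size d; n x d partial-product multiplier; accumulator Z of 2n bits with enable en; reduction blocks red 163, red 193, red 233; output P of m bits (reduced) or 2n bits (non-reduced).]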
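45. Least Significant Digit First Multiplier
[Block diagram. Recoverable labels: inputs X and Y; digit size d; n x d partial-product multiplier; accumulator Z; reduction blocks red 163, red 193, red 233 on both the operand path and the result path; output P of m bits.]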