Title: An Expandable Montgomery Modular Multiplication Processor
1An Expandable Montgomery Modular Multiplication Processor
- Adnan Abdul-Aziz Gutub & Alaaeldin A. M. Amin
- Computer Engineering Department
- King Fahd University of Petroleum & Minerals
- Dhahran, SAUDI ARABIA
2Presentation Outline
- Introduction (RSA cryptographic system)
- The Systolic Multiplier
- The Basic Cell
- Montgomery Product (MP) Algorithm
- Expandability of the Parallel Design
- The Expandable MP Hardware
- Conclusion
3RSA Public Key Cryptosystem
- Developed in 1978 by Rivest, Shamir & Adleman
- Its security is based on the integer factoring problem
- The most popular method:
  - simple to understand & implement
  - same algorithm for encryption & decryption
  - can also be used for digital signatures
4Concept
5Concept
6Concept
7RSA Algorithm
M is the message, (E,N) is the encryption key, C is the cipher text, and (D,N) is the decryption key.
For Encryption: $C = M^E \bmod N$
For Decryption: $M = C^D \bmod N$
Encryption key (E,N): public
Decryption key (D,N): private
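The two formulas can be checked with a toy round trip. This is a minimal sketch: the primes 61 and 53, the exponent 17 and the message 65 are small textbook-style values assumed here for illustration and are not taken from the slides; real keys are far larger.

```python
# Toy RSA round trip (illustrative values only, far too small to be secure).
P, Q = 61, 53                 # assumed small primes
N = P * Q                     # modulus shared by both keys
PHI = (P - 1) * (Q - 1)
E = 17                        # public exponent, coprime with PHI
D = pow(E, -1, PHI)           # private exponent (Python 3.8+ modular inverse)

def encrypt(m):
    """C = M^E mod N, using the public key (E, N)."""
    return pow(m, E, N)

def decrypt(c):
    """M = C^D mod N, using the private key (D, N)."""
    return pow(c, D, N)

M = 65
C = encrypt(M)
assert decrypt(C) == M        # decryption recovers the message
```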
8RSA Security
- Security depends on the key size: a larger key size gives a more secure system.
9RSA Implementations
- software: slow speed
- hardware
- Modular Multiplication
  - multiply/divide
  - add/subtract
  - logarithmic speed
  - Montgomery
- Modular Exponentiation: repeated squaring
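Modular exponentiation by repeated squaring reduces to a sequence of modular multiplications, which is why the multiplier is the critical block. The sketch below is one common variant (right-to-left binary); the function name `mod_exp` is hypothetical, and the slides do not say which variant the authors use.

```python
def mod_exp(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring.

    Scans the exponent bits from least to most significant; every step
    is a modular multiplication, the operation the hardware accelerates.
    """
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                    # current bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus      # square for the next bit
        exponent >>= 1
    return result
```

For a k-bit exponent this takes k squarings and at most k extra multiplications.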
11Montgomery's Method
- Introduced by P. Montgomery in 1985
- Modular multiplication without trial division
- Can be implemented in VLSI
- Requires some pre-computations.
- Suitable for large number multiplication.
12Montgomery Modular Multiplication
OBJECTIVE: to compute $Z = XY \bmod N$
1. Pre-computation: $R$, $R^{-1}$, $N'$
2. Map X & Y to the Montgomery domain: $x = XR \bmod N$, $y = YR \bmod N$
3. Montgomery Product: $z = MP(x,y) = xyR^{-1} \bmod N$
4. Map z from the Montgomery domain back to normal: $Z = MP(1,z)$
13Montgomery's Algorithm: To compute XY mod N
- Pre-computations
  - choose $R = 2^k$ (k = number of bits of E), with $R > N$ and $\gcd(R,N) = 1$
  - compute $R^{-1}$ such that $R^{-1}R \bmod N = 1$, $0 < R^{-1} < N$
  - compute $N'$ such that $N' = -N^{-1} \bmod R$, $0 < N' < R$
  - compute $x = X \cdot R \bmod N$
  - compute $y = Y \cdot R \bmod N$
- Mapping to Montgomery's domain is performed by software
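The pre-computations can be sketched in software as follows. The function names are hypothetical, and k is taken here from the bit length of N so that R > N is guaranteed; that choice is an assumption made for this example.

```python
from math import gcd

def precompute(N):
    """Return (R, R_inv, N_prime) for an odd modulus N."""
    k = N.bit_length()               # assumption: k chosen so that R = 2^k > N
    R = 1 << k
    assert gcd(R, N) == 1            # holds exactly when N is odd
    R_inv = pow(R, -1, N)            # R_inv * R mod N == 1, 0 < R_inv < N
    N_prime = (-pow(N, -1, R)) % R   # N_prime == -N^{-1} mod R, 0 < N_prime < R
    return R, R_inv, N_prime

def to_montgomery(X, R, N):
    """Map X to the Montgomery domain: x = X * R mod N."""
    return (X * R) % N
```

`pow(a, -1, m)` requires Python 3.8 or later.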
14Montgomery's Algorithm: $MP(x,y) = xyR^{-1} \bmod N$
- Montgomery's Modular Multiplication MP(x,y)
  - $P = x \cdot y$
  - $U = P + N \cdot (P \cdot N' \bmod R)$
  - $S = U/R$
  - $MP = S$ if $S < N$, else $MP = S - N$
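[Figure: with $R = 2^k$, a number A is split into two k-bit halves, A1 and A2; $A \bmod R$ is the low half and $A/R$ is the high half.]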
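The four MP steps and the mapping in and out of the Montgomery domain can be put together in a short sketch. The names `montgomery_product` and `mod_mul` are hypothetical, and R is derived from the bit length of N as an assumption for this example.

```python
def montgomery_product(x, y, N, R, N_prime):
    """MP(x, y) = x*y*R^{-1} mod N, with no division by N."""
    P = x * y
    U = P + N * ((P * N_prime) % R)  # the low k bits of U are all zero
    S = U // R                       # exact; in hardware, a k-bit right shift
    return S if S < N else S - N

def mod_mul(X, Y, N):
    """Compute X*Y mod N by way of the Montgomery domain (N must be odd)."""
    R = 1 << N.bit_length()          # assumed choice of R = 2^k > N
    N_prime = (-pow(N, -1, R)) % R
    x = (X * R) % N                  # map X into the Montgomery domain
    y = (Y * R) % N                  # map Y into the Montgomery domain
    z = montgomery_product(x, y, N, R, N_prime)
    return montgomery_product(1, z, N, R, N_prime)  # map back: Z = MP(1, z)
```

Because R is a power of two, `% R` and `// R` correspond to taking the low bits and shifting, which is what makes the method attractive in VLSI.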
15Number Representation
[Figure: A drawn as a row of words $A_{l-1}, A_{l-2}, \ldots, A_2, A_1, A_0$]
A: k bits = l words
16Numbers Representation
[Figure: A drawn as a row of words $A_{l-1}, A_{l-2}, \ldots, A_2, A_1, A_0$, each word b bits wide]
A: k bits = l words
A: k bits = l·b bits
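17Numbers Representation
[Figure: A drawn as a row of words $A_{l-1}, A_{l-2}, \ldots, A_2, A_1, A_0$, each word b bits wide]
$A = A_0 + A_1 2^{b} + A_2 2^{2b} + \cdots + A_{l-2} 2^{(l-2)b} + A_{l-1} 2^{(l-1)b}$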
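The word representation amounts to writing A in base $2^b$. A minimal sketch, with hypothetical helper names, that splits an integer into l words of b bits and rebuilds it:

```python
def to_words(A, b, l):
    """Split A into l words of b bits each, least significant word first."""
    mask = (1 << b) - 1
    return [(A >> (b * i)) & mask for i in range(l)]

def from_words(words, b):
    """Rebuild A = A_0 + A_1*2^b + ... + A_{l-1}*2^((l-1)b)."""
    return sum(w << (b * i) for i, w in enumerate(words))
```

For example, `to_words(0xABCD, 4, 4)` gives `[0xD, 0xC, 0xB, 0xA]`, and `from_words` of that list with `b = 4` gives back `0xABCD`.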
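18The Systolic Multiplier
[Figure: the Systolic Multiplier block, driven by a clock, computing $p = x \cdot y + q$ on streams of digits]
- z (control input): 0,...,0,1
- x: 0,...,0, $x_{l-1}, x_{l-2}, \ldots, x_1, x_0$
- y: 0,...,0, $y_{l-1}, y_{l-2}, \ldots, y_1, y_0$
- q: 0, $q_{2l-1}, q_{2l-2}, \ldots, q_1, q_0$
- p (output): $p_0, p_1, \ldots, p_{2l-1}, p_{2l}$, where $p_0$ is the first product digit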
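The input/output behaviour of the multiplier, $p = x \cdot y + q$ with the least significant digit produced first, can be modelled functionally. This sketch is not cycle-accurate and does not reproduce the cell structure; the function name and the column-by-column schedule are assumptions made for illustration.

```python
def multiply_accumulate_digits(x_digits, y_digits, q_digits, b):
    """Return the 2l+1 base-2^b digits of x*y + q, least significant first.

    x_digits and y_digits hold l digits each, q_digits up to 2l digits,
    all least significant digit first.
    """
    mask = (1 << b) - 1
    l = len(x_digits)
    out, carry = [], 0
    for t in range(2 * l + 1):              # one output digit per step
        column = carry
        if t < len(q_digits):
            column += q_digits[t]           # accumulate digit t of q
        for i in range(l):                  # partial products of weight 2^(b*t)
            j = t - i
            if 0 <= j < len(y_digits):
                column += x_digits[i] * y_digits[j]
        out.append(column & mask)           # emit digit p_t
        carry = column >> b                 # carry into the next column
    return out
```

The output has 2l+1 digits, matching the stream $p_0, \ldots, p_{2l}$ on the slide.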
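19Building the Systolic Multiplier
[Figure: a chain of cells, cell 1 through cell $l/2+1$, driven by a common clock, with inputs zin, xin, yin, qin and output pout]
- z: 0,...,0,1
- x: 0,...,0, $x_{l-1}, \ldots, x_1, x_0$
- y: 0,...,0, $y_{l-1}, \ldots, y_1, y_0$
- q: 0, $q_{2l-1}, \ldots, q_1, q_0$
- p: $p_0, p_1, \ldots, p_{2l-1}, p_{2l}$
- $(l/2 + 1)$ cells are required for l-digit multiplication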
20Expandable Systolic Multiplier
[Figure: a multiplier for 2l digits built from two multipliers for l digits, each a chain of cells (cell 1 through cell $l/2+1$) on a shared clock. The corresponding out and in ports of the two multipliers (zout/zin, xout/xin, yout/yin, qout/qin, pout/pin) are connected.]
21Systolic Montgomery Reduction (J. Sauerbrey 1992)
VHDL
- $N_0' = -N^{-1} \bmod 2^b$
- $p = x \cdot y$
- for $i = 0$ to $l-1$
  - $v_i = p_i \cdot N_0' \bmod 2^b$
  - $p = p + v_i N 2^{bi}$
- end for
- return $p/R$
Note that $x, y < N < R$, where $R = 2^{lb}$ and $\gcd(R,N) = 1$
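[Figure: the Systolic Multiplier ($p = x \cdot y + q$), driven by a clock, used for the reduction]
- z: 0,...,0,1
- x: 0,...,0, $N_{l-1}, \ldots, N_0$
- y: 0,...,0, $N_0', \ldots, N_0'$ (l-times)
- q: 0, $p_{2l-1}, p_{2l-2}, \ldots, p_1, p_0$
- p: 0,...,0, $t_0, t_1, \ldots, t_{l-1}$ (l-times)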
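The word-level reduction loop can be sketched in software. The function name `montgomery_reduce` is hypothetical, and the code models only the arithmetic, not the systolic timing. Each iteration clears one more low digit of p, so the final division by R is exact.

```python
def montgomery_reduce(p, N, l, b):
    """Word-level Montgomery reduction of p = x*y, with x, y < N < R = 2^(l*b).

    Returns a value congruent to p * R^{-1} mod N and smaller than 2N;
    a final conditional subtraction of N brings it below N.
    """
    base = 1 << b
    n0_prime = (-pow(N, -1, base)) % base     # N_0' = -N^{-1} mod 2^b
    for i in range(l):
        p_i = (p >> (b * i)) & (base - 1)     # digit i of the current p
        v_i = (p_i * n0_prime) % base
        p += (v_i * N) << (b * i)             # p = p + v_i * N * 2^(b*i)
    assert p & ((1 << (l * b)) - 1) == 0      # the low l digits are now zero
    return p >> (l * b)                       # p / R
```

Only the lowest b bits of $N^{-1}$ are needed, so the per-iteration multiplier is a single word wide.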
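22Implementation of the Systolic Montgomery Reduction for l = 4
[Figure: array of T and 2T delay elements and Systolic Multiplier blocks for l = 4, with inputs N, $N_0'$ (zero-padded) and $p^{(0)}$, and output $p^{(4)}$]
- Systolic Multiplier: inputs x, y, q; outputs $x \cdot y + q$ and $x \cdot y \bmod 2^b$
- 2T: delay of 2 clock cycles
- $2^b$: base of numbers x & y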
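23Clarification for l = 4
- $N_0' = -N^{-1} \bmod 2^b$
- $p^{(0)} = x \cdot y$
- for $i = 0$ to $l-1$
  - $v_i = p_i^{(i)} \cdot N_0' \bmod 2^b$
  - $p^{(i+1)} = p^{(i)} + v_i N 2^{bi}$
- end for
- return $p^{(l)}/R$
$p^{(0)}$ and $N_0'$ are precomputed
[Figure: array of T and 2T delay elements for l = 4 with inputs N and $N_0'$ (zero-padded); the values $v_0, v_1, v_2, v_3$ and $p^{(0)}, p^{(1)}, p^{(2)}, p^{(3)}$ are marked along the path from $p^{(0)}$ to $p^{(4)}$]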
24Expandability of the Parallel Implementation
basic design for l-digits
expanded design for 2l-digits
expanded design for 3l-digits
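25Projection
[Figure: array of T and 2T delay elements and Systolic Multiplier blocks for l = 4, with inputs N, $N_0'$ (zero-padded) and $p^{(0)}$, and output $p^{(4)}$]
- Systolic Multiplier: inputs x, y, q; outputs $x \cdot y + q$ and $x \cdot y \bmod 2^b$
- 2T: delay of 2 clock cycles
- $2^b$: base of numbers x & y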
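26The Serial MP Design
- LOOP i = 0 to l-1
- $p^{(0)}$ is precomputed
[Figure: a single Systolic Multiplier ($p = xy + q$) used in every loop iteration. Labeled signals: $z^{(i)}$ and $z^{(i+1)}$, $N^{(i)}$ and $N^{(i+1)}$, $p^{(i)}$ and $p^{(i+1)}$, $v^{(i)}$; a multiplier fed by $N_0'$; a Mux with a 0 input; a 2T delay; widths marked 2l and 2l+1]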
27For Expandability
- Allow input data to have more digits
- Allow systolic multiplier to be expandable
- Allow registers to be expandable
- Multiplexing
28The Expandable MP system
[Figure: a basic chip for l digits receives the input data and delivers the results; chips for additional l digits are attached to it, each taking additional l digits of input, giving designs for 2l, 3l and 4l digits]
29VHDL Modeling
- All three designs were modeled in VHDL
- Structural level: similar to real hardware
- Designs are fully parametrized in terms of:
  - l: number of words
  - b: number of bits in each word
  - t: time delay for each gate
30Conclusion
- An expandable Montgomery modular multiplication
processor was designed, modeled in VHDL, and
analyzed.
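32Systolic Montgomery Reduction: signal flow graph for l = 4
- $N_0' = -N^{-1} \bmod 2^b$
- $p^{(0)} = x \cdot y$
- for $i = 0$ to $l-1$
  - $v_i = p_i^{(i)} \cdot N_0' \bmod 2^b$
  - $p^{(i+1)} = p^{(i)} + v_i N 2^{bi}$
- end for
- return $p^{(l)}/R$
[Figure: signal flow graph of Systolic Multiplier blocks (inputs x, y, q; outputs $x \cdot y + q$ and $x \cdot y \bmod 2^b$), where $2^b$ is the base of numbers x & y. Input streams are shown against time steps 0 to 6:]
- ..., 0, 0, $N_3, N_2, N_1, N_0$
- ..., 0, 0, 0, 0, $N_0'$
- ..., $p_1^{(0)}, p_0^{(0)}$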
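33Montgomery's Algorithm: $MP(x,y) = xyR^{-1} \bmod N$
- $N_0' = -N^{-1} \bmod 2^b$
- $p^{(0)} = x \cdot y$
- for $i = 0$ to $l-1$
  - $v_i = p_i^{(i)} \cdot N_0' \bmod 2^b$
  - $p^{(i+1)} = p^{(i)} + v_i N 2^{bi}$
- end for
- return $p^{(l)}/R$
- Loop i = 0
  - $v_0 = p_0^{(0)} \cdot N_0' \bmod 2^b$
  - $p^{(1)} = p^{(0)} + v_0 N 2^{0}$
- Loop i = 1
  - $v_1 = p_1^{(1)} \cdot N_0' \bmod 2^b$
  - $p^{(2)} = p^{(1)} + v_1 N 2^{b}$
- Loop i = 2
  - $v_2 = p_2^{(2)} \cdot N_0' \bmod 2^b$
  - $p^{(3)} = p^{(2)} + v_2 N 2^{2b}$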
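As a small numeric check of the unrolled loop, the worked example below uses values chosen here for illustration (they are not from the slides): $b = 2$ and $l = 2$, so $R = 2^{lb} = 16$, with $N = 13$, $x = 7$, $y = 9$. Digits are written in base 4, most significant first.

```latex
\begin{align*}
N_0'      &= -N^{-1} \bmod 2^b = -13^{-1} \bmod 4 = 3 \\
p^{(0)}   &= x \cdot y = 63 = (3,3,3)_4 \\
v_0       &= p_0^{(0)} \cdot N_0' \bmod 4 = 3 \cdot 3 \bmod 4 = 1 \\
p^{(1)}   &= p^{(0)} + v_0 N 2^{0} = 63 + 13 = 76 = (1,0,3,0)_4 \\
v_1       &= p_1^{(1)} \cdot N_0' \bmod 4 = 3 \cdot 3 \bmod 4 = 1 \\
p^{(2)}   &= p^{(1)} + v_1 N 2^{b} = 76 + 52 = 128 = (2,0,0,0)_4 \\
p^{(2)}/R &= 128 / 16 = 8
\end{align*}
```

Each iteration zeroes one more low digit, so the division by R is exact. The result agrees with $xyR^{-1} \bmod N = 63 \cdot 9 \bmod 13 = 8$, since $16^{-1} \bmod 13 = 9$.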
34suitable for expandability logical start