Title: CSE 246: Computer Arithmetic Algorithms and Hardware Design
1. CSE 246: Computer Arithmetic Algorithms and Hardware Design
Lecture 6.1: Multiplication Arithmetic
- Instructor: Prof. Chung-Kuan Cheng
2. Topics
- Karatsuba's Method (1962)
- Toom's Method (1963)
- Modular Method
- FFT
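3. Karatsuba's Method
- $U = 2^n U_1 + U_0$, $V = 2^n V_1 + V_0$
- $UV = 2^{2n} U_1 V_1 + 2^n (U_1 V_0 + U_0 V_1) + U_0 V_0$
- $= (2^{2n} + 2^n) U_1 V_1 + 2^n (U_1 - U_0)(V_0 - V_1) + (2^n + 1) U_0 V_0$
$T(2n) \le 3T(n) + cn$
$T(2^k) \le c(3^k - 2^k)$
$T(n) \le T(2^{\lceil \lg n \rceil}) \le c(3^{\lceil \lg n \rceil} - 2^{\lceil \lg n \rceil}) < 3c\,n^{\lg 3}$
$\lg 3 \approx 1.585$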
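The three-product identity above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `karatsuba` and the `cutoff_bits` threshold are assumptions, and small operands simply fall back to built-in multiplication.

```python
def karatsuba(u: int, v: int, cutoff_bits: int = 32) -> int:
    """Multiply nonnegative integers using three recursive half-size products."""
    if u.bit_length() <= cutoff_bits or v.bit_length() <= cutoff_bits:
        return u * v  # base case: built-in multiplication (assumed cheap here)
    n = max(u.bit_length(), v.bit_length()) // 2
    mask = (1 << n) - 1
    u1, u0 = u >> n, u & mask  # U = 2^n U1 + U0
    v1, v0 = v >> n, v & mask  # V = 2^n V1 + V0
    hi = karatsuba(u1, v1, cutoff_bits)  # U1 V1
    lo = karatsuba(u0, v0, cutoff_bits)  # U0 V0
    # (U1 - U0)(V0 - V1) may be negative: multiply magnitudes, then restore the sign
    a, b = u1 - u0, v0 - v1
    sign = -1 if (a < 0) != (b < 0) else 1
    mid = sign * karatsuba(abs(a), abs(b), cutoff_bits)
    # (2^2n + 2^n) U1V1 + 2^n (U1 - U0)(V0 - V1) + (2^n + 1) U0V0
    return (hi << (2 * n)) + (hi << n) + (mid << n) + (lo << n) + lo
```

For example, `karatsuba(3**200, 7**150)` agrees with `3**200 * 7**150`. Each call makes three recursive calls on operands of about half the length, which is where the recurrence $T(2n) \le 3T(n) + cn$ comes from.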
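4. Toom's Method
- $U = 2^{rn} U_r + \cdots + 2^n U_1 + U_0$
- $V = 2^{rn} V_r + \cdots + 2^n V_1 + V_0$
- $U(x) = x^r U_r + \cdots + x U_1 + U_0$
- $V(x) = x^r V_r + \cdots + x V_1 + V_0$
- $U(x)V(x) = W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
- Set up $2r + 1$ equations:
- $W(0) = U(0)V(0)$
- $W(1) = U(1)V(1)$
- $\ldots$
- $W(2r) = U(2r)V(2r)$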
5. Toom's Method
- $T((r+1)n) \le (2r+1)T(n) + cn$
- $T(n) \le c\,n^{\log_{r+1}(2r+1)} < c\,n^{1 + \log_{r+1} 2}$
Theorem: Given $\epsilon > 0$, there exists a
multiplication algorithm such that the number of
elementary operations $T(n)$ needed to multiply two
$n$-bit numbers satisfies, for some constant $c(\epsilon)$
independent of $n$,
$T(n) < c(\epsilon)\,n^{1+\epsilon}$
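To see why the exponent can be pushed arbitrarily close to 1, the short Python sketch below tabulates $\log_{r+1}(2r+1)$ for a few values of $r$. The helper name `toom_exponent` is hypothetical; the case $r = 1$ reproduces Karatsuba's $\lg 3$.

```python
import math

def toom_exponent(r: int) -> float:
    """Exponent log_{r+1}(2r+1) implied by T((r+1)n) <= (2r+1)T(n) + cn."""
    return math.log(2 * r + 1, r + 1)

for r in (1, 2, 4, 8, 16, 64):
    bound = 1 + math.log(2, r + 1)  # the looser bound 1 + log_{r+1} 2
    print(f"r={r:3d}  exponent={toom_exponent(r):.4f}  upper bound={bound:.4f}")
```

The exponent decreases as $r$ grows, although the constant hidden in $c$ grows with $r$, which this sketch does not model.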
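6. Toom's Method
- $U = (4, 13, 2)_{16}$, $V = (9, 2, 5)_{16}$
- $U(x) = 4x^2 + 13x + 2$, $V(x) = 9x^2 + 2x + 5$
- $W(x) = U(x)V(x)$
- $W(0) = 10$, $W(1) = 304$, $W(2) = 1980$
- $W(3) = 7084$, $W(4) = 18526$
- $W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$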
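7. Toom's Method
- $W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
- Rewrite
- $W(x) = a_{2r} x^{\underline{2r}} + \cdots + a_1 x^{\underline{1}} + a_0$
- where $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$
- $W(x+1) - W(x)$
- $= 2r\,a_{2r} x^{\underline{2r-1}} + (2r-1)\,a_{2r-1} x^{\underline{2r-2}} + \cdots + a_1$
- $(W(x+2) - W(x+1)) - (W(x+1) - W(x))$
- $= 2r(2r-1)\,a_{2r} x^{\underline{2r-2}} + (2r-1)(2r-2)\,a_{2r-1} x^{\underline{2r-3}} + \cdots + 2a_2$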
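8. Toom's Method
- $W(\cdot)$: 10, 304, 1980, 7084, 18526
- $\Delta W(\cdot)$: 294, 1676, 5104, 11442
- $\Delta^2 W(\cdot)$: 1382, 3428, 6338
- $\Delta^2 W(\cdot)/2$: 691, 1714, 3169
- $\Delta^3 W(\cdot)/2$: 1023, 1455
- $\Delta^3 W(\cdot)/6$: 341, 485
- $\Delta^4 W(\cdot)/6$: 144
- $\Delta^4 W(\cdot)/24$: 36
- $W(x) = 36x^{\underline{4}} + 341x^{\underline{3}} + 691x^{\underline{2}} + 294x^{\underline{1}} + 10$
- $= (((36(x-3) + 341)(x-2) + 691)(x-1) + 294)x + 10$
- $= 36x^4 + 125x^3 + 64x^2 + 69x + 10$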
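The difference table above can be reproduced with a short Python sketch. The helper names (`evaluate`, `newton_coefficients`, `falling_to_power_basis`) are hypothetical and chosen for illustration; the digits and evaluation points are the ones from the worked example.

```python
U_DIGITS = (4, 13, 2)  # U = (4,13,2)_16, most significant digit first
V_DIGITS = (9, 2, 5)   # V = (9,2,5)_16

def evaluate(digits, x):
    """Horner evaluation of the digit polynomial at x."""
    result = 0
    for d in digits:
        result = result * x + d
    return result

def newton_coefficients(values):
    """Return a_0..a_m with W(x) = sum of a_j * (falling power x^j), from W(0..m)."""
    diffs = list(values)
    coeffs = [diffs[0]]
    for j in range(1, len(values)):
        # forward difference divided by j, so the head equals Delta^j W(0) / j!
        diffs = [(b - a) // j for a, b in zip(diffs, diffs[1:])]
        coeffs.append(diffs[0])
    return coeffs

def falling_to_power_basis(a):
    """Expand (((a_m (x-(m-1)) + a_{m-1})(x-(m-2)) + ...) x + a_0, lowest degree first."""
    poly = [a[-1]]
    for j in range(len(a) - 2, -1, -1):
        shifted = [0] + poly                    # poly * x
        scaled = [-j * c for c in poly] + [0]   # poly * (-j)
        poly = [s + t for s, t in zip(shifted, scaled)]
        poly[0] += a[j]
    return poly

w_values = [evaluate(U_DIGITS, x) * evaluate(V_DIGITS, x) for x in range(5)]
a = newton_coefficients(w_values)     # falling-power coefficients a_0..a_4
w = falling_to_power_basis(a)         # W_0..W_4
product = sum(c * 16**i for i, c in enumerate(w))  # W(16) = U * V
```

Here `a` comes out as `[10, 294, 691, 341, 36]` and `w` as `[10, 69, 64, 125, 36]`, matching the slide, and `product` equals $1234 \times 2341$, the decimal values of $U$ and $V$.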
9. Toom's Method
10. Toom and Cook's Method
- Theorem: There is a constant $c$ such that the
execution time of Toom and Cook's method is less
than $c\,n\,2^{3.5\sqrt{\lg n}}$ cycles.
11. Modular Method (Schönhage)
- Recursive formula: $q_0 = 1$, $q_{k+1} = 3q_k - 1$
- Thus, we have $q_k = \frac{1}{2}(3^k + 1)$
- Relatively prime $p_i$:
- $6q_k - 1$, $6q_k + 1$, $6q_k + 2$, $6q_k + 3$, $6q_k + 5$, $6q_k + 7$
- Set six moduli
- $m_i = 2^{p_i} - 1$
12. Modular Method
- Given $U$ and $V$, find $W = U \times V$
- Compute $u_i = U \bmod m_i$, $v_i = V \bmod m_i$
- Compute $w_i = u_i \times v_i \bmod m_i$
- Recover $W$
- $T(n) = O(n^{\log_3 6}) = O(n^{1.631})$
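The residue-multiply-recover steps can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the recovery step uses the textbook Chinese remainder formula rather than any specialized conversion procedure, and the residue products use built-in multiplication instead of recursing.

```python
from math import prod

def schonhage_moduli(k: int):
    """Six moduli 2^p - 1 whose exponents are built from q_k = (3^k + 1) / 2."""
    q = (3**k + 1) // 2
    exponents = [6*q - 1, 6*q + 1, 6*q + 2, 6*q + 3, 6*q + 5, 6*q + 7]
    return [(1 << p) - 1 for p in exponents]

def modular_multiply(u: int, v: int, moduli) -> int:
    """Multiply via residues; the result is valid only if u * v < product of moduli."""
    big_m = prod(moduli)
    w = 0
    for m in moduli:
        w_i = (u % m) * (v % m) % m          # w_i = u_i x v_i mod m_i
        rest = big_m // m                    # product of the other moduli
        w += w_i * rest * pow(rest, -1, m)   # CRT term (modular inverse, Python 3.8+)
    return w % big_m
```

With `k = 1` the exponents are 11, 13, 14, 15, 17, 19, so the product of the moduli is just under $2^{89}$ and operands of roughly 44 bits each can be multiplied this way.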
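13. FFT
Given $U(t) = (u_0, u_1, \ldots, u_{K-1})$, $V(t) = (v_0, v_1, \ldots, v_{K-1})$, find
$P(t) = (p_0, p_1, \ldots, p_{K-1})$, where $p_t = \sum_{i+j \equiv t \pmod K} u_i v_j$
- Set $\omega = \exp(2\pi i/K)$, i.e. $\omega^K = 1$
- $\hat{u}_s = \sum_{0 \le t < K} \omega^{st} u_t$
- $\hat{v}_s = \sum_{0 \le t < K} \omega^{st} v_t$
- $\hat{U}(s)\hat{V}(s) = (\hat{u}_0 \hat{v}_0, \hat{u}_1 \hat{v}_1, \ldots, \hat{u}_{K-1} \hat{v}_{K-1})$
- $\hat{P}(s) = \hat{U}(s)\hat{V}(s)$, $\hat{p}_s = \hat{u}_s \hat{v}_s$
- $\hat{p}_s = \sum_{0 \le t < K} \omega^{st} p_t$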
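14. FFT
- $K \ge 2n - 1$, $u_n = u_{n+1} = \cdots = u_{K-1} = 0$
- $v_n = v_{n+1} = \cdots = v_{K-1} = 0$
- $p_t = \sum_{i+j \equiv t \pmod K} u_i v_j$
- $= u_t v_0 + u_{t-1} v_1 + \cdots + u_0 v_t$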
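The transform, pointwise product, and zero padding can be sketched in Python as below. To keep the definitions visible, the sketch uses a naive $O(K^2)$ transform rather than a fast one, and the function names `dft` and `multiply_digits` are hypothetical. The inverse step relies on the standard fact that transforming with $\omega^{-1}$ and dividing by $K$ undoes the transform.

```python
import cmath

def dft(values, sign=1):
    """Naive transform: out[s] = sum over t of w^(sign*s*t) * values[t]."""
    K = len(values)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (sign * s * t) * values[t] for t in range(K))
            for s in range(K)]

def multiply_digits(u, v):
    """Coefficients p_t = sum_{i+j=t} u_i v_j, least significant digit first."""
    n = max(len(u), len(v))
    K = 2 * n - 1                           # K >= 2n - 1, so nothing wraps around
    u_pad = list(u) + [0] * (K - len(u))    # u_n = ... = u_{K-1} = 0
    v_pad = list(v) + [0] * (K - len(v))
    p_hat = [a * b for a, b in zip(dft(u_pad), dft(v_pad))]
    # inverse transform: use w^-1 and divide by K, then round off float error
    return [round((x / K).real) for x in dft(p_hat, sign=-1)]
```

Calling `multiply_digits([2, 13, 4], [5, 2, 9])` on the digits of the earlier Toom example (low digit first) yields the coefficients of $36x^4 + 125x^3 + 64x^2 + 69x + 10$. Rounding is safe here only because the inputs are small; a production version would need an explicit precision bound.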
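15. FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)_2$)
- Set $A_0(t_{k-1}, \ldots, t_0) = u_t$, i.e. $A_0(t) = u_t$
- Set $A_1(s_{k-1}, t_{k-2}, \ldots, t_0)$
- $= A_0(0, t_{k-2}, \ldots, t_0) + \omega^{2^{k-1} s_{k-1}} A_0(1, t_{k-2}, \ldots, t_0)$
- Set $A_2(s_{k-1}, s_{k-2}, t_{k-3}, \ldots, t_0)$
- $= A_1(s_{k-1}, 0, t_{k-3}, \ldots, t_0) + \omega^{2^{k-2} (s_{k-2} s_{k-1})_2} A_1(s_{k-1}, 1, t_{k-3}, \ldots, t_0)$
- Set $A_k(s_{k-1}, s_{k-2}, s_{k-3}, \ldots, s_0)$
- $= A_{k-1}(s_{k-1}, \ldots, s_1, 0) + \omega^{(s_0 s_1 \ldots s_{k-1})_2} A_{k-1}(s_{k-1}, \ldots, s_1, 1)$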
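16. FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)_2$)
- Replace $t_{k-1}$ with $s_{k-1}$
- $s_{k-1}$ determines $\omega^{2^{k-1} s_{k-1}}$
- Replace $t_{k-2}$ with $s_{k-2}$
- $s_{k-1}, s_{k-2}$ determine $\omega^{2^{k-2} (s_{k-2} s_{k-1})_2}$
- Replace $t_0$ with $s_0$
- $s_{k-1}, s_{k-2}, \ldots, s_0$ determine $\omega^{(s_0 s_1 \ldots s_{k-1})_2}$
- Binary $s = (s_0 s_1 \ldots s_{k-1})_2$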
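17. FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)_2$)
- By induction, we have
- $A_j(s_{k-1}, \ldots, s_{k-j}, t_{k-j-1}, \ldots, t_0)$
- $= \sum_{t_{k-1}, \ldots, t_{k-j}} \omega^{2^{k-j} (s_{k-j} \ldots s_{k-1})_2 (t_{k-1} \ldots t_{k-j})_2} u_t$
- $A_k(s_{k-1}, \ldots, s_0)$
- $= \sum_{t_{k-1}, \ldots, t_0} \omega^{(s_0 \ldots s_{k-1})_2 (t_{k-1} \ldots t_0)_2} u_t$
- $= \hat{u}_s$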
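The bit-by-bit passes can be sketched in Python as follows. This is an illustrative reading of the recurrence, not an optimized FFT: the function name `fft_bitwise` is hypothetical, each pass allocates a fresh array, and the bit reversals go through string formatting for clarity rather than speed.

```python
import cmath

def fft_bitwise(u):
    """Transform u (length K = 2^k) in k passes, replacing one index bit per pass."""
    K = len(u)
    k = K.bit_length() - 1
    assert 1 << k == K, "length must be a power of two"
    w = cmath.exp(2j * cmath.pi / K)
    A = list(u)                                  # A_0(t) = u_t
    for j in range(1, k + 1):
        pos = k - j                              # bit t_{k-j} becomes s_{k-j}
        new = [0j] * K
        for idx in range(K):
            top = idx >> pos                     # bits s_{k-1} .. s_{k-j}
            # (s_{k-j} ... s_{k-1})_2 is the reversal of those j bits
            rev = int(format(top, f"0{j}b")[::-1], 2)
            lo = idx & ~(1 << pos)               # same index with that bit set to 0
            hi = idx | (1 << pos)                # same index with that bit set to 1
            new[idx] = A[lo] + w ** ((1 << pos) * rev) * A[hi]
        A = new
    # A_k(s_{k-1},...,s_0) holds the value for s = (s_0 ... s_{k-1})_2,
    # so undo the bit reversal to return results in natural order
    out = [0j] * K
    for idx in range(K):
        out[int(format(idx, f"0{k}b")[::-1], 2)] = A[idx]
    return out
```

Each of the $k$ passes touches all $K$ entries once, giving $O(K \log K)$ arithmetic operations, and the final loop reflects the slide's remark that the index of $A_k$ is the bit reversal of $s$.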
18-21. FFT, $k = 2$
22-25. FFT, $k = 3$
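26. FFT
- $u(s) = u_0 + u_1 s + u_2 s^2 + \cdots + u_{2^k-1} s^{2^k-1}$
- $u(s) = (u_0 + u_2 s^2 + \cdots + u_{2^k-2} s^{2^k-2})$
- $+ (u_1 s + u_3 s^3 + \cdots + u_{2^k-1} s^{2^k-1})$
- $u(s) = F_e(s^2) + s F_d(s^2)$
- $F_e(s^2) = u_0 + u_2 s^2 + \cdots + u_{2^k-2} s^{2^k-2}$
- $F_d(s^2) = u_1 + u_3 s^2 + \cdots + u_{2^k-1} s^{2^k-2}$
- $u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s\,[F_{de}(s^4) + s^2 F_{dd}(s^4)]$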
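27. FFT
- $u(s) = u_0 + u_1 s + u_2 s^2 + \cdots + u_{2^k-1} s^{2^k-1}$
- $u(s) = F_{ee}(s^4) + s^2 F_{ed}(s^4) + s\,[F_{de}(s^4) + s^2 F_{dd}(s^4)]$
- $u(s) = F_{eee}(s^8) + s^4 F_{eed}(s^8) + s^2 [F_{ede}(s^8) + s^4 F_{edd}(s^8)]$
$+ s\,\{F_{dee}(s^8) + s^4 F_{ded}(s^8) + s^2 [F_{dde}(s^8) + s^4 F_{ddd}(s^8)]\}$
- At level $j$: $F_{x \cdots x}(s^{2^{j-1}}) = F_{x \cdots xe}(s^{2^j}) + s^{2^{j-1}} F_{x \cdots xd}(s^{2^j})$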
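28. FFT
- $u(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3 + u_4 s^4 + u_5 s^5 + u_6 s^6 + u_7 s^7$
- $u(s) = F_e(s^2) + s F_d(s^2)$
- $F_e(s^2) = u_0 + u_2 s^2 + u_4 s^4 + u_6 s^6$
- $F_d(s^2) = u_1 + u_3 s^2 + u_5 s^4 + u_7 s^6$
- $F_e(s^2) = F_{ee}(s^4) + s^2 F_{ed}(s^4)$
- $F_{ee}(s^4) = u_0 + u_4 s^4$, $F_{ed}(s^4) = u_2 + u_6 s^4$
- $F_d(s^2) = F_{de}(s^4) + s^2 F_{dd}(s^4)$
- $F_{de}(s^4) = u_1 + u_5 s^4$, $F_{dd}(s^4) = u_3 + u_7 s^4$
- $F_x|_{s=\omega^0} = F_x|_{s=\omega^4}$, $F_x|_{s=\omega^2} = F_x|_{s=\omega^6}$, $F_x|_{s=\omega} = F_x|_{s=\omega^5}$, $F_x|_{s=\omega^3} = F_x|_{s=\omega^7}$
- $x \in \{e, d\}$; $(s_0, s_1, s_2) = (-,0,0), (-,0,1), (-,1,0), (-,1,1)$
- $F_{xx}|_{s=\omega^0} = F_{xx}|_{s=\omega^2} = F_{xx}|_{s=\omega^4} = F_{xx}|_{s=\omega^6}$, $F_{xx}|_{s=\omega} = F_{xx}|_{s=\omega^3} = F_{xx}|_{s=\omega^5} = F_{xx}|_{s=\omega^7}$
- $xx \in \{ee, ed, de, dd\}$; $(s_0, s_1, s_2) = (-,-,0), (-,-,1)$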
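The even/odd split translates directly into a recursive routine. The Python sketch below is a minimal version under the assumption that the input length is a power of two; the name `fft_recursive` is hypothetical, and `f_e` and `f_d` play the roles of $F_e$ and $F_d$.

```python
import cmath

def fft_recursive(u):
    """Evaluate u(s) = sum u_t s^t at s = w^0 .. w^(K-1), where K = len(u) = 2^k."""
    K = len(u)
    if K == 1:
        return [complex(u[0])]
    f_e = fft_recursive(u[0::2])  # F_e at the K/2 distinct values of s^2
    f_d = fft_recursive(u[1::2])  # F_d at the same points
    w = cmath.exp(2j * cmath.pi / K)
    out = [0j] * K
    for j in range(K // 2):
        twiddle = w ** j * f_d[j]
        out[j] = f_e[j] + twiddle           # s = w^j
        out[j + K // 2] = f_e[j] - twiddle  # s = w^(j + K/2) = -w^j, same s^2
    return out
```

Because $s = \omega^j$ and $s = \omega^{j+K/2}$ share the same $s^2$, each half-size result is reused twice, which is the sharing the slide expresses with the "don't care" bit positions.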
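29. FFT (Inversion)
- $\hat{\hat{u}}_r = \sum_{0 \le s < K} \omega^{rs} \hat{u}_s$
- $= \sum_{0 \le s,t < K} \omega^{rs} \omega^{st} u_t$
- $= \sum_{0 \le t < K} u_t \sum_{0 \le s < K} \omega^{s(t+r)}$
- $= K u_{(-r) \bmod K}$
- since $\sum_{0 \le s < K} \omega^{sj} = K$ if $j \bmod K = 0$,
- and $0$ otherwise.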
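The inversion identity can be checked numerically. The Python sketch below applies the same forward transform twice and reads off $u$ from $K u_{(-r) \bmod K}$; the names `transform` and `inverse_transform` are hypothetical, and the transform is the naive $O(K^2)$ sum so that it matches the definition term by term.

```python
import cmath

def transform(values):
    """Forward transform: out[s] = sum over t of w^(s*t) * values[t]."""
    K = len(values)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * values[t] for t in range(K)) for s in range(K)]

def inverse_transform(u_hat):
    """Recover u from u_hat using: transform(u_hat)[r] = K * u[(-r) mod K]."""
    K = len(u_hat)
    twice = transform(u_hat)
    return [twice[(-t) % K] / K for t in range(K)]
```

So no separate inverse routine is needed: one more forward transform, an index reversal modulo $K$, and a division by $K$ recover the original sequence up to floating-point error.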
30. FFT
- $2n \le 2^k g < 4n$, $K = 2^k$
- Precision $m = 6k$
- Let $M$ = time of an $m$-bit multiplication
- Total time to multiply $n$-bit numbers:
- $O(n) + O(Mnk/g)$