CSE 246: Computer Arithmetic Algorithms and Hardware Design

Slides: 31
Provided by: Ale8219
Learn more at: https://cseweb.ucsd.edu

1
CSE 246: Computer Arithmetic Algorithms and Hardware Design
Lecture 6.1: Multiplication Arithmetic
  • Instructor: Prof. Chung-Kuan Cheng

2
Topics
  • Karatsuba's Method (1962)
  • Toom's Method (1963)
  • Modular Method
  • FFT

3
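Karatsuba's Method
  • $U = 2^n U_1 + U_0$, $V = 2^n V_1 + V_0$
  • $UV = 2^{2n} U_1 V_1 + 2^n (U_1 V_0 + U_0 V_1) + U_0 V_0$
  • $= (2^{2n} + 2^n) U_1 V_1 + 2^n (U_1 - U_0)(V_0 - V_1) + (2^n + 1) U_0 V_0$

$T(2n) \le 3T(n) + cn$, so $T(2^k) \le c(3^k - 2^k)$ and
$T(n) \le T(2^{\lceil \lg n \rceil}) \le c(3^{\lceil \lg n \rceil} - 2^{\lceil \lg n \rceil}) < 3c \cdot 3^{\lg n} = 3c\,n^{\lg 3}$,
where $\lg 3 \approx 1.585$.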
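
The three-product identity above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `karatsuba` and the `cutoff_bits` parameter are assumptions, and small operands simply fall back to the built-in multiplication.

```python
def karatsuba(u: int, v: int, cutoff_bits: int = 32) -> int:
    """Multiply non-negative integers using three half-size products.

    cutoff_bits (hypothetical parameter, must be >= 1) is the size below
    which the built-in multiplication is used as the base case.
    """
    if u.bit_length() <= cutoff_bits or v.bit_length() <= cutoff_bits:
        return u * v
    n = max(u.bit_length(), v.bit_length()) // 2
    mask = (1 << n) - 1
    u1, u0 = u >> n, u & mask          # U = 2^n * U1 + U0
    v1, v0 = v >> n, v & mask          # V = 2^n * V1 + V0
    hi = karatsuba(u1, v1, cutoff_bits)            # U1 * V1
    lo = karatsuba(u0, v0, cutoff_bits)            # U0 * V0
    # (U1 - U0)(V0 - V1) may be negative: multiply magnitudes, restore the sign.
    a, b = u1 - u0, v0 - v1
    sign = -1 if (a < 0) != (b < 0) else 1
    mid = sign * karatsuba(abs(a), abs(b), cutoff_bits)
    # (2^{2n} + 2^n) * hi + 2^n * mid + (2^n + 1) * lo
    return (hi << (2 * n)) + ((hi + mid + lo) << n) + lo
```

Each call makes three recursive calls on operands of roughly half the length, which is where the recurrence $T(2n) \le 3T(n) + cn$ comes from.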
4
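Toom's Method
  • $U = 2^{rn} U_r + \cdots + 2^n U_1 + U_0$
  • $V = 2^{rn} V_r + \cdots + 2^n V_1 + V_0$
  • $U(x) = x^r U_r + \cdots + x U_1 + U_0$
  • $V(x) = x^r V_r + \cdots + x V_1 + V_0$
  • $U(x)V(x) = W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
  • Set up $2r+1$ equations:
  • $W(0) = U(0)V(0)$
  • $W(1) = U(1)V(1)$
  • ...
  • $W(2r) = U(2r)V(2r)$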

5
Toom's Method
  • $T((r+1)n) \le (2r+1)T(n) + cn$
  • $T(n) \le c\,n^{\log_{r+1}(2r+1)} < c\,n^{1 + \log_{r+1} 2}$

Theorem: Given $\epsilon > 0$, there exists a
multiplication algorithm such that the number of
elementary operations $T(n)$ needed to multiply two
$n$-bit numbers satisfies, for some constant $c(\epsilon)$
independent of $n$,
$T(n) < c(\epsilon)\,n^{1+\epsilon}$
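
To see how the exponent $\log_{r+1}(2r+1)$ approaches 1 as $r$ grows, the short sketch below tabulates it next to the bound $1 + \log_{r+1} 2$. The function names are illustrative assumptions, not part of the lecture.

```python
import math

def toom_exponent(r: int) -> float:
    """Exponent log_{r+1}(2r+1) obtained when each operand is split into r+1 pieces."""
    return math.log(2 * r + 1) / math.log(r + 1)

def exponent_bound(r: int) -> float:
    """The upper bound 1 + log_{r+1}(2) on the exponent."""
    return 1 + math.log(2) / math.log(r + 1)

if __name__ == "__main__":
    for r in (1, 2, 3, 4, 8, 16, 64):
        print(r, round(toom_exponent(r), 4), round(exponent_bound(r), 4))
```

With $r = 1$ the exponent is $\lg 3$, the Karatsuba case.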
6
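Toom's Method
  • $U = (4, 13, 2)_{16}$, $V = (9, 2, 5)_{16}$
  • $U(x) = 4x^2 + 13x + 2$, $V(x) = 9x^2 + 2x + 5$
  • $W(x) = U(x)V(x)$
  • $W(0) = 10$, $W(1) = 304$, $W(2) = 1980$
  • $W(3) = 7084$, $W(4) = 18526$
  • $W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$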
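
The evaluation step of this example can be checked with a few lines of Python. This is only a sketch: the helper name `horner` and the variable names are assumptions made for illustration.

```python
def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed from highest to lowest degree."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

U = [4, 13, 2]   # U(x) = 4x^2 + 13x + 2
V = [9, 2, 5]    # V(x) = 9x^2 + 2x + 5
r = 2            # each operand has r + 1 = 3 digits in base 16

# W(x) = U(x) * V(x) at the 2r + 1 points x = 0, 1, ..., 2r
W_values = [horner(U, x) * horner(V, x) for x in range(2 * r + 1)]
print(W_values)  # [10, 304, 1980, 7084, 18526]
```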

7
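Toom's Method
  • $W(x) = x^{2r} W_{2r} + \cdots + x W_1 + W_0$
  • Rewrite
  • $W(x) = a_{2r} x^{\underline{2r}} + \cdots + a_1 x^{\underline{1}} + a_0$
  • where $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$
  • $W(x+1) - W(x)$
  • $= 2r\,a_{2r} x^{\underline{2r-1}} + (2r-1)\,a_{2r-1} x^{\underline{2r-2}} + \cdots + a_1$
  • $(W(x+2) - W(x+1)) - (W(x+1) - W(x))$
  • $= 2r(2r-1)\,a_{2r} x^{\underline{2r-2}} + (2r-1)(2r-2)\,a_{2r-1} x^{\underline{2r-3}} + \cdots + 2a_2$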

8
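Toom's Method
  • $W$: 10, 304, 1980, 7084, 18526
  • $\Delta W$: 294, 1676, 5104, 11442
  • $\Delta^2 W$: 1382, 3428, 6338
  • $\Delta^2 W / 2$: 691, 1714, 3169
  • $\Delta^3 W / 2$: 1023, 1455
  • $\Delta^3 W / 6$: 341, 485
  • $\Delta^4 W / 6$: 144
  • $\Delta^4 W / 24$: 36
  • $W(x) = 36 x^{\underline{4}} + 341 x^{\underline{3}} + 691 x^{\underline{2}} + 294 x^{\underline{1}} + 10$
  • $= (((36(x-3) + 341)(x-2) + 691)(x-1) + 294)x + 10$
  • $= 36x^4 + 125x^3 + 64x^2 + 69x + 10$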
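
The difference table and the nested expansion can be reproduced with the sketch below. The function names are hypothetical; the code assumes the values are $W(0), W(1), \ldots, W(m)$ of a polynomial with integer coefficients, so every division is exact.

```python
def falling_factorial_coefficients(values):
    """Return [a_0, ..., a_m] with W(x) = sum of a_j * x(x-1)...(x-j+1)."""
    diffs = list(values)
    coeffs = [diffs[0]]
    for j in range(1, len(values)):
        # Take forward differences and divide by j, so that after j rounds
        # the row holds (j-th difference of W) / j!.
        diffs = [(b - a) // j for a, b in zip(diffs, diffs[1:])]
        coeffs.append(diffs[0])
    return coeffs

def to_power_basis(a):
    """Expand (...((a_m (x-(m-1)) + a_{m-1})(x-(m-2)) + ...) x + a_0, lowest degree first."""
    poly = [a[-1]]
    for j in range(len(a) - 2, -1, -1):
        shifted = [0] + poly                   # poly * x
        scaled = [-j * c for c in poly] + [0]  # poly * (-j)
        poly = [s + t for s, t in zip(shifted, scaled)]
        poly[0] += a[j]
    return poly

a = falling_factorial_coefficients([10, 304, 1980, 7084, 18526])
print(a)                  # [10, 294, 691, 341, 36]
print(to_power_basis(a))  # [10, 69, 64, 125, 36]
```

Evaluating the recovered polynomial at $x = 16$ gives the product of the two base-16 operands from the example.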

9
Toom's Method

10
Toom and Cook's Method
  • Theorem: There is a constant $c$ such that the
    execution time of Toom and Cook's method is less
    than
  • $c\,n\,2^{3.5\sqrt{\lg n}}$ cycles

11
Modular Method (Schönhage)
  • Recursive formula: $q_0 = 1$, $q_{k+1} = 3q_k - 1$
  • Thus, we have $q_k = \frac{1}{2}(3^k + 1)$
  • Relatively prime $p_i$:
  • $6q_k - 1$, $6q_k + 1$, $6q_k + 2$, $6q_k + 3$, $6q_k + 5$, $6q_k + 7$
  • Set six moduli
  • $m_i = 2^{p_i} - 1$
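
As a sanity check on these definitions, the sketch below builds $q_k$, the six exponents, and the six moduli, and lets one confirm that they are pairwise relatively prime for small $k$. The function names are assumptions introduced for illustration.

```python
from itertools import combinations
from math import gcd

def q(k: int) -> int:
    """q_0 = 1, q_{k+1} = 3 * q_k - 1."""
    value = 1
    for _ in range(k):
        value = 3 * value - 1
    return value

def exponents(k: int) -> list:
    """The six exponents 6*q_k + d for d in (-1, 1, 2, 3, 5, 7)."""
    base = 6 * q(k)
    return [base + d for d in (-1, 1, 2, 3, 5, 7)]

def moduli(k: int) -> list:
    """The six moduli m_i = 2^{p_i} - 1."""
    return [2**p - 1 for p in exponents(k)]

def pairwise_coprime(numbers) -> bool:
    return all(gcd(a, b) == 1 for a, b in combinations(numbers, 2))
```

For example, `exponents(0)` is `[5, 7, 8, 9, 11, 13]`, and `pairwise_coprime(moduli(0))` holds because $\gcd(2^a - 1, 2^b - 1) = 2^{\gcd(a,b)} - 1$.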

12
Modular Method
  • Given $U$ and $V$, find $W = U \times V$
  • Compute $u_i = U \bmod m_i$, $v_i = V \bmod m_i$
  • Compute $w_i = u_i \times v_i \bmod m_i$
  • Recover $W$
  • $T(n) = O(n^{\log_3 6}) = O(n^{1.631})$
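
The four steps can be illustrated with the sketch below. It is a simplified stand-in, not Schönhage's recursive algorithm: the residue products use the built-in multiplication, recovery uses a textbook Chinese remainder reconstruction (Python 3.8+ for `pow(x, -1, m)` and `math.prod`), and the names `crt`, `modular_multiply`, and `MODULI` are assumptions. The result is correct only when $UV$ is smaller than the product of the moduli.

```python
from math import prod

def crt(residues, moduli):
    """Recover x modulo prod(moduli) from its residues; moduli must be pairwise coprime."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse of partial mod m
    return x % total

def modular_multiply(u, v, moduli):
    """Compute u * v from the residue products w_i = u_i * v_i mod m_i."""
    ws = [((u % m) * (v % m)) % m for m in moduli]
    return crt(ws, moduli)

# Moduli 2^p - 1 for the exponents 6*q_0 + d, d in (-1, 1, 2, 3, 5, 7), with q_0 = 1.
MODULI = [2**p - 1 for p in (5, 7, 8, 9, 11, 13)]
```

For instance, `modular_multiply(12345678, 23456789, MODULI)` reproduces the ordinary product, since that product is below `prod(MODULI)`.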

13
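FFT
Given $U(t) = (u_0, u_1, \ldots, u_{K-1})$, $V(t) = (v_0, v_1, \ldots, v_{K-1})$, find
$P(t) = (p_0, p_1, \ldots, p_{K-1})$, where $p_t = \sum_{i+j \equiv t \pmod{K}} u_i v_j$
  • Set $\omega = \exp(2\pi i/K)$, i.e. $\omega^K = 1$
  • $\hat{u}_s = \sum_{0 \le t < K} \omega^{st} u_t$
  • $\hat{v}_s = \sum_{0 \le t < K} \omega^{st} v_t$
  • $\hat{U}(s)\hat{V}(s) = (\hat{u}_0 \hat{v}_0, \hat{u}_1 \hat{v}_1, \ldots, \hat{u}_{K-1} \hat{v}_{K-1})$
  • $\hat{P}(s) = \hat{U}(s)\hat{V}(s)$, $\hat{p}_s = \hat{u}_s \hat{v}_s$
  • $\hat{p}_s = \sum_{0 \le t < K} \omega^{st} p_t$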

14
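FFT
  • $K \ge 2n - 1$, $u_n = u_{n+1} = \cdots = u_{K-1} = 0$
  • $v_n = v_{n+1} = \cdots = v_{K-1} = 0$
  • $p_t = \sum_{i+j \equiv t \pmod{K}} u_i v_j$
  • $= u_t v_0 + u_{t-1} v_1 + \cdots + u_0 v_t$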
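
The sketch below illustrates why zero padding to $K \ge 2n - 1$ turns the cyclic convolution into the ordinary coefficient product. For brevity it uses a naive $O(K^2)$ transform instead of a fast one, works in floating point, and rounds at the end; the function names are assumptions.

```python
import cmath

def dft(values, sign=1):
    """Naive transform: out[s] = sum over t of w^(sign*s*t) * values[t], w = exp(2*pi*i/K)."""
    K = len(values)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (sign * s * t) * values[t] for t in range(K)) for s in range(K)]

def convolve(u, v):
    """Coefficients of the product of two integer sequences (lowest index first)."""
    n = max(len(u), len(v))
    K = 2 * n - 1
    up = list(u) + [0] * (K - len(u))   # u_n = ... = u_{K-1} = 0
    vp = list(v) + [0] * (K - len(v))   # v_n = ... = v_{K-1} = 0
    p_hat = [a * b for a, b in zip(dft(up), dft(vp))]   # pointwise products
    # Transforming with w^(-1) and dividing by K undoes the transform.
    return [round((x / K).real) for x in dft(p_hat, sign=-1)]

print(convolve([2, 13, 4], [5, 2, 9]))  # [10, 69, 64, 125, 36]
```

The printed list matches the coefficients of $(4x^2 + 13x + 2)(9x^2 + 2x + 5)$, lowest degree first.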

15
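FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)_2$)
  • Set $A_0(t_{k-1}, \ldots, t_0) = u_t$, i.e. $A_0(t) = u_t$
  • Set $A_1(s_{k-1}, t_{k-2}, \ldots, t_0) = A_0(0, t_{k-2}, \ldots, t_0) + \omega^{2^{k-1} s_{k-1}} A_0(1, t_{k-2}, \ldots, t_0)$
  • Set $A_2(s_{k-1}, s_{k-2}, t_{k-3}, \ldots, t_0) = A_1(s_{k-1}, 0, t_{k-3}, \ldots, t_0) + \omega^{2^{k-2} (s_{k-2} s_{k-1})_2} A_1(s_{k-1}, 1, t_{k-3}, \ldots, t_0)$
  • ...
  • Set $A_k(s_{k-1}, s_{k-2}, s_{k-3}, \ldots, s_0) = A_{k-1}(s_{k-1}, \ldots, s_1, 0) + \omega^{(s_0 s_1 \ldots s_{k-1})_2} A_{k-1}(s_{k-1}, \ldots, s_1, 1)$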

16
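FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)_2$)
  • Replace $t_{k-1}$ with $s_{k-1}$
  • $s_{k-1}$ determines $\omega^{2^{k-1} s_{k-1}}$
  • Replace $t_{k-2}$ with $s_{k-2}$
  • $s_{k-1}, s_{k-2}$ determine $\omega^{2^{k-2} (s_{k-2} s_{k-1})_2}$
  • ...
  • Replace $t_0$ with $s_0$
  • $s_{k-1}, s_{k-2}, \ldots, s_0$ determine $\omega^{(s_0 s_1 \ldots s_{k-1})_2}$
  • Binary: $s = (s_0, s_1, \ldots, s_{k-1})_2$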

17
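FFT ($K = 2^k$, $t = (t_{k-1}, \ldots, t_0)_2$)
  • By induction, we have
  • $A_j(s_{k-1}, \ldots, s_{k-j}, t_{k-j-1}, \ldots, t_0) = \sum_{0 \le t_{k-1}, \ldots, t_{k-j} \le 1} \omega^{2^{k-j} (s_{k-j} \ldots s_{k-1})_2 (t_{k-1} \ldots t_{k-j})_2} u_t$
  • $A_k(s_{k-1}, \ldots, s_0) = \sum_{0 \le t_{k-1}, \ldots, t_0 \le 1} \omega^{(s_0 \ldots s_{k-1})_2 (t_{k-1} \ldots t_0)_2} u_t$
  • $= \hat{u}_s$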
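
The $k$ passes $A_0 \to A_1 \to \cdots \to A_k$ can be sketched as below. The array layout is an assumption made for illustration: entry `idx` stores the value whose argument bits, read from most to least significant, are the bits of `idx`, so pass $j$ replaces bit position $k - j$. The result of the last pass is in bit-reversed order, matching $s = (s_0, s_1, \ldots, s_{k-1})_2$.

```python
import cmath

def bit_reverse(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

def fft_passes(u):
    """Compute u_hat[s] = sum over t of w^(s*t) * u[t] using k passes; len(u) must be 2^k."""
    K = len(u)
    if K < 1 or K & (K - 1):
        raise ValueError("length must be a power of two")
    k = K.bit_length() - 1
    w = cmath.exp(2j * cmath.pi / K)
    A = list(u)                                   # A_0(t) = u_t
    for j in range(1, k + 1):
        pos = k - j                               # bit position replaced in this pass
        bit = 1 << pos
        nxt = [0] * K
        for idx in range(K):
            top = idx >> pos                      # bits s_{k-1} ... s_{k-j}
            e = bit_reverse(top, j)               # (s_{k-j} ... s_{k-1})_2
            nxt[idx] = A[idx & ~bit] + w ** (bit * e) * A[idx | bit]
        A = nxt
    # A[(s_{k-1} ... s_0)_2] holds u_hat at s = (s_0 ... s_{k-1})_2.
    return [A[bit_reverse(s, k)] for s in range(K)]
```

Each pass touches all $K$ entries once, so the whole transform uses on the order of $Kk$ operations.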

18
FFT ($k = 2$)

19
FFT ($k = 2$)

20
FFT ($k = 2$)

21
FFT ($k = 2$)

22
FFT ($k = 3$)

23
FFT ($k = 3$)

24
FFT ($k = 3$)

25
FFT ($k = 3$)

26
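FFT
  • $u(s) = u_0 + u_1 s + u_2 s^2 + \cdots + u_{2^k-1} s^{2^k-1}$
  • $u(s) = (u_0 + u_2 s^2 + \cdots + u_{2^k-2} s^{2^k-2})$
  • $+ (u_1 s + u_3 s^3 + \cdots + u_{2^k-1} s^{2^k-1})$
  • $u(s) = F_e(s^2) + s F_d(s^2)$
  • $F_e(s^2) = u_0 + u_2 s^2 + \cdots + u_{2^k-2} s^{2^k-2}$
  • $F_d(s^2) = u_1 + u_3 s^2 + \cdots + u_{2^k-1} s^{2^k-2}$
  • $u(s) = [F_{ee}(s^4) + s^2 F_{ed}(s^4)] + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$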

27
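FFT
  • $u(s) = u_0 + u_1 s + u_2 s^2 + \cdots + u_{2^k-1} s^{2^k-1}$
  • $u(s) = [F_{ee}(s^4) + s^2 F_{ed}(s^4)] + s[F_{de}(s^4) + s^2 F_{dd}(s^4)]$
  • $u(s) = [F_{eee}(s^8) + s^4 F_{eed}(s^8)] + s^2 [F_{ede}(s^8) + s^4 F_{edd}(s^8)]$
    $+ s\{[F_{dee}(s^8) + s^4 F_{ded}(s^8)] + s^2 [F_{dde}(s^8) + s^4 F_{ddd}(s^8)]\}$
  • $F_{x \ldots x}(s^{2^{k-1}}) = F_{x \ldots xe}(s^{2^k}) + s^{2^{k-1}} F_{x \ldots xd}(s^{2^k})$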

28
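FFT
  • $u(s) = u_0 + u_1 s + u_2 s^2 + u_3 s^3 + u_4 s^4 + u_5 s^5 + u_6 s^6 + u_7 s^7$
  • $u(s) = F_e(s^2) + s F_d(s^2)$
  • $F_e(s^2) = u_0 + u_2 s^2 + u_4 s^4 + u_6 s^6$
  • $F_d(s^2) = u_1 + u_3 s^2 + u_5 s^4 + u_7 s^6$
  • $F_e(s^2) = F_{ee}(s^4) + s^2 F_{ed}(s^4)$
  • $F_{ee}(s^4) = u_0 + u_4 s^4$, $F_{ed}(s^4) = u_2 + u_6 s^4$
  • $F_d(s^2) = F_{de}(s^4) + s^2 F_{dd}(s^4)$
  • $F_{de}(s^4) = u_1 + u_5 s^4$, $F_{dd}(s^4) = u_3 + u_7 s^4$
  • $F_x(s = \omega^0) = F_x(s = \omega^4)$, $F_x(s = \omega^2) = F_x(s = \omega^6)$,
    $F_x(s = \omega) = F_x(s = \omega^5)$, $F_x(s = \omega^3) = F_x(s = \omega^7)$
  • $x \in \{e, d\}$; $(s_0, s_1, s_2) = (-, 0, 0), (-, 0, 1), (-, 1, 0), (-, 1, 1)$
  • $F_{xx}(s = \omega^0) = F_{xx}(s = \omega^2) = F_{xx}(s = \omega^4) = F_{xx}(s = \omega^6)$,
    $F_{xx}(s = \omega) = F_{xx}(s = \omega^3) = F_{xx}(s = \omega^5) = F_{xx}(s = \omega^7)$
  • $xx \in \{ee, ed, de, dd\}$; $(s_0, s_1, s_2) = (-, -, 0), (-, -, 1)$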
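
The even/odd splitting leads directly to a recursive transform. The sketch below is one common way to write it, with an assumed function name. It relies on the sharing noted above: $s = \omega^j$ and $s = \omega^{j + K/2}$ have the same square, so each half-size result is used twice.

```python
import cmath

def fft_recursive(u):
    """Return [u(w^0), u(w^1), ..., u(w^(K-1))] for w = exp(2*pi*i/K); len(u) must be 2^k."""
    K = len(u)
    if K < 1 or K & (K - 1):
        raise ValueError("length must be a power of two")
    if K == 1:
        return list(u)
    even = fft_recursive(u[0::2])   # F_e at the K/2 distinct values of s^2
    odd = fft_recursive(u[1::2])    # F_d at the same points
    w = cmath.exp(2j * cmath.pi / K)
    out = [0] * K
    for j in range(K // 2):
        twiddle = w ** j * odd[j]            # s * F_d(s^2) with s = w^j
        out[j] = even[j] + twiddle           # u(w^j)
        out[j + K // 2] = even[j] - twiddle  # u(w^(j + K/2)), since w^(K/2) = -1
    return out
```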

29
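FFT (Inversion)
  • $\hat{\hat{u}}_r = \sum_{0 \le s < K} \omega^{rs} \hat{u}_s$
  • $= \sum_{0 \le s, t < K} \omega^{rs} \omega^{st} u_t$
  • $= \sum_{0 \le t < K} u_t \sum_{0 \le s < K} \omega^{s(t+r)}$
  • $= K u_{(-r) \bmod K}$
  • $\sum_{0 \le s < K} \omega^{sj} = K$ if $j \bmod K = 0$,
  • $0$ otherwise.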
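
The inversion identity says that applying the same transform twice returns $K$ times the input with indices negated modulo $K$. A small numerical check, using a naive transform and assumed function names:

```python
import cmath

def transform(values):
    """out[s] = sum over t of w^(s*t) * values[t], with w = exp(2*pi*i/K)."""
    K = len(values)
    w = cmath.exp(2j * cmath.pi / K)
    return [sum(w ** (s * t) * values[t] for t in range(K)) for s in range(K)]

def inverse(values_hat):
    """Recover u from u_hat: transform again, then read index (-r) mod K and divide by K."""
    K = len(values_hat)
    twice = transform(values_hat)
    return [twice[(-r) % K] / K for r in range(K)]
```

Up to floating-point error, `inverse(transform(u))` returns `u`.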

30
FFT
  • $2n \le 2^k g < 4n$, $K = 2^k$
  • Precision $m = 6k$
  • Let $M$ = time of an $m$-bit multiplication
  • Total time to multiply $n$-bit numbers:
  • $O(n) + O(Mnk/g)$