Distributed Arithmetic - PowerPoint PPT Presentation

Author: Xilinx. Last modified by: GEC. Created: 1/3/1999 11:00:45 PM.

Slides: 14
Provided by: Xil2
Learn more at: https://www.egr.msu.edu


1
Distributed Arithmetic
  • Dr Sumam David S.
  • Dept. of EC, NITK Surathkal
  • Slides courtesy of Xilinx Professors Workshop
    Resources

2
Objective
  • Distributed arithmetic
  • What?
  • Where?
  • How?

3
What is DA?
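  • Multiplication using look-up tables (LUTs): sums
    of products with constant coefficients are read
    from precomputed tables instead of being computed
    with multipliers
  • Used to implement multipliers in LUT-rich FPGAs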
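The idea can be written as an equation. The notation is introduced here for illustration and does not come from the slides, apart from C for coefficients and X for data as on the 2-tap slide. Assume K constant coefficients C_k and N-bit two's complement samples X_k, where b_{k,n} is bit n of X_k. The inner product then regroups so that each inner sum depends only on one bit from each sample. That inner sum can therefore be read from a LUT with 2^K entries:

```latex
\begin{aligned}
y &= \sum_{k=0}^{K-1} C_k X_k,
\qquad X_k = -b_{k,N-1}\,2^{N-1} + \sum_{n=0}^{N-2} b_{k,n}\,2^{n} \\
y &= -2^{N-1}\,\mathrm{LUT}\!\left(b_{0,N-1},\dots,b_{K-1,N-1}\right)
     + \sum_{n=0}^{N-2} 2^{n}\,\mathrm{LUT}\!\left(b_{0,n},\dots,b_{K-1,n}\right),
\qquad \mathrm{LUT}(b_0,\dots,b_{K-1}) = \sum_{k=0}^{K-1} C_k\, b_k
\end{aligned}
```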

4
Two's Complement Multiplication
  • One bit at a time
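Here is a small worked example. The values are chosen for illustration and do not appear on the slide. To multiply the coefficient 6 by the 4-bit two's complement sample 1001 = -7 one bit at a time, each data bit contributes either 0 or the coefficient shifted by that bit's weight. The term for the sign bit is subtracted:

```latex
6 \times (-7) = 6 \times \left(-1 \cdot 2^{3} + 0 \cdot 2^{2} + 0 \cdot 2^{1} + 1 \cdot 2^{0}\right)
             = -48 + 0 + 0 + 6 = -42
```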

5
SDA 1-Tap FIR Filter
[Figure: SDA 1-tap FIR filter: N-bit-wide sample data X0, parallel-to-serial converter, partial product ROM (input A0), +/- stage, scaling accumulator]
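Below is a minimal behavioural sketch of the 1-tap datapath in Python. It rests on assumptions the slide does not state. Bits are taken to leave the parallel-to-serial converter MSB first, so the scaling accumulator can double its previous contents each cycle (Horner form). The function names are hypothetical. The two-entry ROM holds 0 and the coefficient, and the term selected by the sign bit is subtracted, which plays the role of the +/- block.

```python
def serial_bits_msb_first(x: int, n_bits: int) -> list[int]:
    """Hypothetical parallel-to-serial converter: bits of an n-bit two's complement word, MSB first."""
    # Python's >> on negative ints is arithmetic, so the sign bit comes out correctly
    return [(x >> n) & 1 for n in range(n_bits - 1, -1, -1)]


def sda_1tap(a0: int, x0: int, n_bits: int) -> int:
    """One-tap serial DA: returns a0 * x0, consuming one input bit per clock cycle."""
    rom = [0, a0]  # partial product ROM: address 0 -> 0, address 1 -> a0
    acc = 0        # scaling accumulator
    for cycle, bit in enumerate(serial_bits_msb_first(x0, n_bits)):
        partial = rom[bit]
        if cycle == 0:
            partial = -partial   # sign bit has weight -2^(N-1): subtract (+/- block)
        acc = 2 * acc + partial  # scale the previous sum by 2, then add
    return acc
```

For instance, `sda_1tap(6, -7, 4)` takes four cycles and gives -42, the same value as the hand calculation for slide 4.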
6
Distributed Arithmetic for a 2-Tap Filter
  • Partial products of equal weight are added
    together before being added to the partial
    products of the next higher weight
  • Create a look-up table of the summed partial products

Bit weights: -2^3 2^2 2^1 2^0 (4-bit two's complement)
  C0 = 1001 (-7)     X0 = 0111 (7)
  C1 = 0110 ( 6)     X1 = 0101 (5)
Conventional multiply: C0 × X0 = -49, C1 × X1 = 30, sum = -19
Serial-Data / Tap-Parallel Multiply (partial products are sign-extended before adding):
  bit 0 (X0 = 1, X1 = 1): (C0 + C1) × 2^0 = -1
  bit 1 (X0 = 1, X1 = 0): C0 × 2^1 = -14
  bit 2 (X0 = 1, X1 = 1): (C0 + C1) × 2^2 = -4
  bit 3 (X0 = 0, X1 = 0): 0 × (-2^3) = 0
  sum = (-1) + (-14) + (-4) + (0) = -19
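The 2-tap arithmetic can be replayed in a short Python sketch. The function names are hypothetical and do not come from the slides. The sketch builds the 4-entry table of summed coefficients. For each bit position, taken LSB first, it forms one address from the bits of X0 and X1. It then weights the sign-bit term by -2^3. With C0 = -7, C1 = 6, X0 = 7 and X1 = 5 it yields the partial sums -1, -14, -4 and 0, and the total -19.

```python
def build_lut(coeffs: list[int]) -> list[int]:
    """Entry at address addr is the sum of coeffs[k] for every set bit k of addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs: list[int], data: list[int], n_bits: int) -> tuple[int, list[int]]:
    """Return sum(c * x) and the weighted per-bit partial sums (LSB first)."""
    lut = build_lut(coeffs)
    terms = []
    for n in range(n_bits):
        # bit n of every data word, i.e. bits of equal weight, forms the LUT address
        addr = 0
        for k, x in enumerate(data):
            addr |= ((x >> n) & 1) << k
        weight = -(1 << n) if n == n_bits - 1 else (1 << n)  # sign bit weighs -2^(N-1)
        terms.append(weight * lut[addr])
    return sum(terms), terms


total, terms = da_inner_product([-7, 6], [7, 5], 4)
print(terms, total)
```

The table holds only 2^K words for K taps. That is why the technique suits small groups of taps.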
7
SDA 2-Tap FIR Filter
[Figure: SDA 2-tap FIR filter: N-bit-wide sample data X0 and X1, partial product ROM (inputs A0 and A1), +/- stage, scaling accumulator]
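This is a streaming sketch of how the 2-tap filter could run over a sequence of samples. It makes four assumptions. X0 is the current sample x[n], and X1 is the previous sample x[n-1], held in a one-stage delay line. The coefficients have the hypothetical names c0 and c1. Samples are serialized MSB first. Under these assumptions, each output costs N clock cycles and needs one 4-entry ROM, whatever the coefficient values.

```python
def da_fir_2tap(c0: int, c1: int, samples: list[int], n_bits: int) -> list[int]:
    """y[n] = c0*x[n] + c1*x[n-1], each output computed bit-serially from a 4-entry ROM."""
    rom = [0, c0, c1, c0 + c1]  # address = (bit of x[n-1]) << 1 | (bit of x[n])
    x_prev = 0                  # tap delay line, initially cleared
    out = []
    for x in samples:
        acc = 0                                   # scaling accumulator
        for n in range(n_bits - 1, -1, -1):       # MSB first
            addr = (((x_prev >> n) & 1) << 1) | ((x >> n) & 1)
            partial = -rom[addr] if n == n_bits - 1 else rom[addr]  # subtract for sign bit
            acc = 2 * acc + partial
        out.append(acc)
        x_prev = x                                # shift the delay line
    return out
```

For example, `da_fir_2tap(3, -2, [1, 2, -3, 7], 4)` returns `[3, 4, -13, 27]`. That matches a direct evaluation of 3·x[n] - 2·x[n-1].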
8
SDA 4-Tap FIR Filter
[Figure: SDA 4-tap FIR filter: N-bit-wide sample data X0 to X3, partial product ROM with inputs A0 to A3, coefficient labels C0 to C3]
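With four taps the ROM has 2^4 = 16 entries. The sketch below lists those entries symbolically. It assumes that address bit A_k comes from sample X_k and selects coefficient C_k. Both that mapping and the helper name `rom_table` are assumptions and are not read from the figure.

```python
def rom_table(n_taps: int = 4) -> list[tuple[str, str]]:
    """Symbolic partial product ROM: (address bits A3..A0, stored sum of coefficients)."""
    rows = []
    for addr in range(1 << n_taps):
        bits = format(addr, f"0{n_taps}b")  # most significant address bit printed first
        terms = [f"C{k}" for k in range(n_taps) if (addr >> k) & 1]
        rows.append((bits, " + ".join(terms) if terms else "0"))
    return rows


for bits, contents in rom_table():
    print(bits, contents)
```

The printed table starts at `0000 0` and ends at `1111 C0 + C1 + C2 + C3`. Each added tap doubles the number of rows.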
9
SDA 8-Tap FIR Filter
[Figure: SDA 8-tap FIR filter: N-bit-wide sample data X0 to X7; X0 to X3 feed one partial product ROM and X4 to X7 a second one (inputs A0 to A3 on each), with a pre-adder, a +/- stage and a scaling accumulator]
A 4-input LUT contains all possible sums of the partial products.
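Here is a sketch of the 8-tap arrangement. It assumes the two 4-input ROMs serve taps 0 to 3 and taps 4 to 7, and that their outputs are summed before the scaling accumulator. Treating the adder labelled Pre-Adder as that summing stage is also an assumption, and the function names are hypothetical. Splitting the taps this way needs 2 × 16 = 32 ROM words, compared with 2^8 = 256 words for a single 8-input ROM.

```python
def build_rom(coeffs: list[int]) -> list[int]:
    """Partial product ROM: entry addr = sum of coeffs[k] for each set bit k of addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_8tap(coeffs: list[int], samples: list[int], n_bits: int) -> int:
    """8-tap DA inner product using two 16-entry ROMs instead of one 256-entry ROM."""
    if len(coeffs) != 8 or len(samples) != 8:
        raise ValueError("expected 8 coefficients and 8 samples")
    rom_lo = build_rom(coeffs[:4])  # taps 0..3, addressed by bits of X0..X3
    rom_hi = build_rom(coeffs[4:])  # taps 4..7, addressed by bits of X4..X7
    acc = 0
    for n in range(n_bits - 1, -1, -1):  # MSB first
        addr_lo = sum(((samples[k] >> n) & 1) << k for k in range(4))
        addr_hi = sum(((samples[4 + k] >> n) & 1) << k for k in range(4))
        partial = rom_lo[addr_lo] + rom_hi[addr_hi]  # add the two ROM outputs
        if n == n_bits - 1:
            partial = -partial  # sign-bit weight is negative
        acc = 2 * acc + partial  # scaling accumulator
    return acc
```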
10
Xilinx DA FIR Performance
[Chart: performance (MMACs/s, 0 to 6000) versus filter length (taps, 0 to 250) for Dual MAC, DA FIR B8, DA FIR B12, DA FIR B16 and Serial FPGA FIR]
fclk = 200 MHz for both processor and FPGA; B = data sample precision for FPGA
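The following is a back-of-the-envelope model, not the data behind the chart. It assumes that a bit-serial DA filter with all taps in parallel delivers one output every B clock cycles, regardless of its length. It also assumes that a dual-MAC processor performs at most two MACs per clock. The function names are hypothetical, and the model ignores FPGA resource limits. It only shows why DA throughput can grow with filter length while a fixed number of MAC units caps a processor.

```python
def da_fir_mmacs(f_clk_mhz: float, b_bits: int, taps: int) -> float:
    """Equivalent MMAC/s of a bit-serial DA FIR: one output per B clocks, `taps` MACs per output."""
    return taps * f_clk_mhz / b_bits


def dual_mac_mmacs(f_clk_mhz: float) -> float:
    """Processor with two MAC units: at most 2 MACs per clock, independent of filter length."""
    return 2 * f_clk_mhz


print(da_fir_mmacs(200, 8, 100), da_fir_mmacs(200, 16, 100), dual_mac_mmacs(200))
```

At fclk = 200 MHz and 100 taps, this model gives 2500 MMAC/s for B = 8 and 1250 MMAC/s for B = 16. The dual-MAC processor gives 400 MMAC/s.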
11
Trade Clock Cycles for Logic Area
  • Multiple bits per clock cycle
  • Serial DA, hardware over-sampling 8: the 8-bit
    sample (b7 to b0) is serialized and processed 1 bit
    per clock cycle, so 8 clock cycles are required to
    process the whole sample (20 Ms/s)
  • Hardware over-sampling 4: the sample is serialized
    and processed 2 bits per clock cycle, so 4 clock
    cycles are required to process the whole sample
  • Hardware over-sampling 2: the sample is serialized
    and processed 4 bits per clock cycle, so 2 clock
    cycles are required to process the whole sample
  • Parallel DA, hardware over-sampling 1: the sample
    is processed in parallel, 8 bits per clock cycle
    (160 Ms/s)
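This is a sketch of the trade-off in Python. It assumes that `bits_per_cycle` bit planes are looked up in parallel copies of the same LUT and that their weighted outputs are added within one clock cycle; the names are hypothetical. The result does not change. Only the cycle count changes, and with it the sample rate at a fixed clock.

```python
def da_multibit(coeffs: list[int], data: list[int], n_bits: int,
                bits_per_cycle: int) -> tuple[int, int]:
    """DA inner product handling `bits_per_cycle` bit planes per clock; returns (result, cycles)."""
    if n_bits % bits_per_cycle:
        raise ValueError("n_bits must be a multiple of bits_per_cycle")
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << len(coeffs))]

    def plane(n: int) -> int:
        # weighted LUT output for bit plane n (one LUT copy per plane in hardware)
        addr = sum(((x >> n) & 1) << k for k, x in enumerate(data))
        weight = -(1 << n) if n == n_bits - 1 else 1 << n
        return weight * lut[addr]

    acc, cycles = 0, 0
    for start in range(0, n_bits, bits_per_cycle):
        # all planes in this group are handled in the same clock cycle
        acc += sum(plane(n) for n in range(start, start + bits_per_cycle))
        cycles += 1
    return acc, cycles
```

The slide's 20 Ms/s and 160 Ms/s figures for 8-bit samples imply a 160 MHz clock; that clock rate is an inference, not stated on the slide. At that rate, 1, 2, 4 and 8 bits per cycle give 8, 4, 2 and 1 cycles per sample, that is 20, 40, 80 and 160 Ms/s. In exchange, the design needs 1, 2, 4 and 8 LUT copies.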
12
Conclusion
  • Efficiency of computation
  • Slow, as it is bit-serial
  • Memory requirements: a K-input LUT needs 2^K words

13
References
  • The Role of Distributed Arithmetic in FPGA-Based
    Signal Processing, www.xilinx.com