Distributed Arithmetic

About This Presentation

Description:

Author: Xilinx. Last modified by: GEC. Created: 1/3/1999 11:00:45 PM. Document presentation format: PowerPoint (PPT) presentation.

Slides: 14

Provided by: Xil2

Learn more at: https://www.egr.msu.edu

Transcript and Presenter's Notes

1
Distributed Arithmetic

Dr Sumam David S.
Dept. of EC, NITK Surathkal
Slides courtesy of Xilinx Professors Workshop Resources

2
Objective

Distributed arithmetic
What?
Where?
How?

3
What is DA?

Multiplication using look-up tables (LUTs)
Used to implement multipliers in LUT-rich FPGAs

4
Two's Complement Multiplication

One bit at a time
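
The one-bit-at-a-time idea can be sketched in Python. This is an illustrative model, not code from the slides: the function names `to_twos_complement_bits` and `bit_serial_multiply` are hypothetical, and the least-significant-bit-first order is an assumption. Each sample bit selects either the coefficient or zero, the result is weighted by the bit position, and the sign bit carries a negative weight.

```python
def to_twos_complement_bits(value, width):
    """Return the two's complement bits of `value`, least significant bit first."""
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")
    return [(value >> i) & 1 for i in range(width)]


def bit_serial_multiply(coefficient, sample, width):
    """Multiply by scanning the sample one bit per step (assumed LSB first)."""
    acc = 0
    for i, bit in enumerate(to_twos_complement_bits(sample, width)):
        partial = coefficient if bit else 0  # a 1-bit multiply is just a select
        if i == width - 1:
            acc -= partial << i  # the sign bit has weight -2^(width-1)
        else:
            acc += partial << i
    return acc
```

With 4-bit operands, `bit_serial_multiply(-7, 7, 4)` takes four steps, one per bit of the sample.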

5
SDA 1-Tap FIR Filter
[Figure: block diagram. Labels: N-bit-wide sample data; input X0; parallel-to-serial converter; partial-product ROM with address input A0; scaling accumulator (+/-).]

6
Distributed Arithmetic for a 2-Tap Filter

Partial products of equal weight are added together before being summed to the next-higher partial-product weight
Create a look-up table of the summed partial products

Operands (4-bit two's complement):

          -2^3  2^2  2^1  2^0
    C0      1    0    0    1    (-7)
    C1      0    1    1    0    ( 6)
    X0      0    1    1    1    ( 7)
    X1      0    1    0    1    ( 5)

Conventional multiply:

    C0 x X0 = (-49) = 1100 1111
    C1 x X1 = ( 30) = 0001 1110
    Sum     = (-19) = 1110 1101

Serial-data / tap-parallel multiply (partial products are sign-extended):

    Weight   X0 bit  X1 bit  Summed partial product  Weighted value
     2^0       1       1     C0 + C1 = (-1)          (-1)
     2^1       1       0     C0      = (-7)          (-14)
     2^2       1       1     C0 + C1 = (-1)          (-4)
    -2^3       0       0     0                       (0)
    Total                                            (-19)
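
The 2-tap example can be checked with a short Python sketch. The function name `da_two_tap` and the dictionary-based table are illustrative choices, not taken from the slides. At each bit position, the bits of X0 and X1 together address a table of summed partial products.

```python
def da_two_tap(c0, c1, x0, x1, width=4):
    """Return the weighted partial sums and the total of c0*x0 + c1*x1."""
    # Look-up table keyed by (bit of X1, bit of X0): the summed partial products.
    lut = {(0, 0): 0, (0, 1): c0, (1, 0): c1, (1, 1): c0 + c1}
    weighted = []
    for i in range(width):
        b0 = (x0 >> i) & 1
        b1 = (x1 >> i) & 1
        # The most significant (sign) bit position has a negative weight.
        weight = -(1 << i) if i == width - 1 else (1 << i)
        weighted.append(lut[(b1, b0)] * weight)
    return weighted, sum(weighted)


steps, total = da_two_tap(-7, 6, 7, 5)
print(steps, total)  # [-1, -14, -4, 0] -19
```

The printed partial sums match the per-weight values on the slide, and the total equals (-49) + (30).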
7
SDA 2-Tap FIR Filter
[Figure: block diagram. Labels: N-bit-wide sample data; inputs X0 and X1; partial-product ROM with address inputs A0 and A1; scaling accumulator (+/-).]

8
SDA 4-Tap FIR Filter
[Figure: block diagram. Labels: N-bit-wide sample data; inputs X0 to X3; partial-product ROM with address inputs A0 to A3; coefficients C0 to C3, each shown alongside 0000...0 and a 1.]

9
SDA 8-Tap FIR Filter
[Figure: block diagram. Labels: N-bit-wide sample data; inputs X0 to X7; two partial-product ROMs, each with address inputs A0 to A3; pre-adder; scaling accumulator (+/-).]

Each 4-input LUT contains all possible sums of the partial products.

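
A minimal Python sketch of this structure follows, assuming the 8 taps are split into two groups of four, each served by its own 16-entry table, with the two table outputs added before accumulation. The names `build_lut`, `address` and `da_fir8` are hypothetical, and the sketch models the arithmetic only, not the hardware timing.

```python
def build_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] for every set bit k of `addr`."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def address(samples, bit_index):
    """Form a table address from bit `bit_index` of each sample."""
    addr = 0
    for k, x in enumerate(samples):
        addr |= ((x >> bit_index) & 1) << k
    return addr


def da_fir8(coeffs, samples, width):
    """8-tap inner product using two 4-input tables and a scaling accumulator."""
    lut_a = build_lut(coeffs[:4])
    lut_b = build_lut(coeffs[4:])
    acc = 0
    for i in range(width):
        # Add the two table outputs first, then weight and accumulate.
        pre_sum = lut_a[address(samples[:4], i)] + lut_b[address(samples[4:], i)]
        term = pre_sum << i
        acc += -term if i == width - 1 else term  # sign bit weight is negative
    return acc
```

Splitting the taps keeps each table at 16 entries; a single table addressed by all 8 taps would need 256 entries.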
10
Xilinx DA FIR Performance
[Figure: plot of performance (MMACs/s, 0 to 6000) against filter length (taps, 0 to 250). Series: Dual MAC, DA FIR B8, DA FIR B12, DA FIR B16, Serial FPGA FIR.]
fclk = 200 MHz for both processor and FPGA
B = data sample precision for FPGA

11
Trade Clock Cycles for Logic Area
Multiple bits per clock cycle

Hardware over-sampling 8 (Serial-DA, 20 Ms/s): the sample is serialized and processed 1 bit per clock cycle. 8 clock cycles are thus required to process the whole sample.
Hardware over-sampling 4: the sample is serialized and processed 2 bits per clock cycle. 4 clock cycles are thus required to process the whole sample.
Hardware over-sampling 2: the sample is serialized and processed 4 bits per clock cycle.
Hardware over-sampling 1 (Parallel-DA, 160 Ms/s): the sample is processed in parallel, 8 bits per clock cycle.

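
The trade-off can be put into numbers with a small Python sketch. The 160 MHz clock is an assumption inferred from the 20 Ms/s and 160 Ms/s figures on the slide, not a value the slide states, and the function names are hypothetical.

```python
import math


def cycles_per_sample(sample_bits, bits_per_clock):
    """Clock cycles needed to consume one sample."""
    return math.ceil(sample_bits / bits_per_clock)


def sample_rate_msps(clock_mhz, sample_bits, bits_per_clock):
    """Sample rate in Ms/s for a given clock and degree of parallelism."""
    return clock_mhz / cycles_per_sample(sample_bits, bits_per_clock)


ASSUMED_CLOCK_MHZ = 160.0  # assumption: inferred from the slide, not stated on it

for bits_per_clock in (1, 2, 4, 8):
    cycles = cycles_per_sample(8, bits_per_clock)
    rate = sample_rate_msps(ASSUMED_CLOCK_MHZ, 8, bits_per_clock)
    print(f"{bits_per_clock} bit(s)/clock: {cycles} cycle(s)/sample, {rate} Ms/s")
```

Each doubling of the bits processed per clock halves the cycles per sample, at the cost of replicating the look-up logic.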
12
Conclusion