Title: Is F Better than D
1. Is F Better than D
- David Hansen and James Michelussi
2. Introduction
- Discrete Fourier Transform (DFT)
- Fast Fourier Transform (FFT)
- FFT Algorithm: Applying the Mathematics
- Implementations of DFT and FFT
- Hardware Benchmarks
- Conclusion
3. DFT
- Introduced by Jean Baptiste Joseph Fourier in 1807.
- Allows a sampled (discrete), periodic signal to be transformed from the time domain to the frequency domain.
- Computed as a correlation between the time-domain signal and N cosine and N sine waves.
- Symbols: $X(k)$ = DFT frequency signal, $N$ = number of sample points, $x(n)$ = time-domain signal, $W_N$ = twiddle factor.
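For reference, the symbols listed on this slide fit the standard DFT definition, which can be written as below. The slide's own equation did not survive extraction, so this is the textbook form rather than a copy of it, and the lowercase $x(n)$ for the time-domain signal is a notational assumption made here to keep it distinct from $X(k)$.

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{kn}, \qquad k = 0, 1, \dots, N-1, \qquad W_N = e^{-j 2\pi / N}
```

Expanding the twiddle factor with Euler's formula gives $W_N^{kn} = \cos(2\pi kn/N) - j\sin(2\pi kn/N)$, which is where the correlation against cosine and sine waves comes from.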
4. DFT (Walking Speed)
- Why is this important? Where is this used?
- Allows machines to calculate the frequency-domain representation of a signal
- Allows signals to be convolved by simply multiplying their spectra together in the frequency domain
- Used in digital spectral analysis for speech, imaging and pattern recognition, as well as signal manipulation using filters
- But the DFT requires $N^2$ multiplications!
5. FFT (Jet Speed)
- J. W. Cooley and J. W. Tukey are credited with bringing the FFT to the world in the 1960s
- Simply an algorithm for calculating the DFT more efficiently
- Takes advantage of symmetry and periodicity in the twiddle factors, and uses a divide-and-conquer method
- Symmetry: $W_N^{r + N/2} = -W_N^{r}$
- Periodicity: $W_N^{r + N} = W_N^{r}$
- Requires only $(N/2)\log_2(N)$ multiplications!
- Faster computation times
- More precise results due to less round-off error
6. FFT Algorithm
- Several different types of FFT algorithms exist (Radix-2, Radix-4, DIT and DIF)
- Focus on Radix-2 using the Decimation in Time (DIT) method
- Breaks down the DFT calculation into a number of 2-point DFTs
- Each 2-point DFT uses an operation called the butterfly
- These groups are then recombined with another group of two, and so on, for $\log_2(N)$ stages
- Using the DIT method, the input time-domain points must be reordered using bit reversal
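The divide-and-conquer step can be made concrete with the standard radix-2 DIT derivation, sketched here under the usual definition $W_N = e^{-j 2\pi / N}$. The names $E(k)$ and $O(k)$ for the $N/2$-point DFTs of the even- and odd-indexed samples are labels chosen for this example, not notation from the slides.

```latex
\begin{aligned}
X(k) &= \sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{km}
      + W_N^{k} \sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{km}
      = E(k) + W_N^{k}\, O(k), \\
X(k + N/2) &= E(k) - W_N^{k}\, O(k), \qquad k = 0, 1, \dots, N/2 - 1.
\end{aligned}
```

The first line uses $W_N^{2km} = W_{N/2}^{km}$; the second follows from the periodicity of $E$ and $O$ together with the symmetry $W_N^{k + N/2} = -W_N^{k}$. Applying the same split recursively until only 2-point DFTs remain gives the $\log_2(N)$ stages.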
7. Butterfly Operation
8. Bit Reversal
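Since the figure for this slide is missing from the text, here is a small sketch of bit-reversed reordering. The helper `reverse_bits` is a hypothetical stand-in for the `ReverseBits` routine named in the C FFT outline, whose body is not shown in the slides; `bit_reverse_permute` is likewise an illustrative name.

```c
#include <assert.h>

/* Reverse the lowest num_bits bits of index. */
unsigned reverse_bits(unsigned index, unsigned num_bits)
{
    assert(num_bits <= 8 * sizeof(unsigned));
    unsigned rev = 0;
    for (unsigned i = 0; i < num_bits; i++) {
        rev = (rev << 1) | (index & 1u);
        index >>= 1;
    }
    return rev;
}

/* Permute n = 2^num_bits real samples into bit-reversed order, in place. */
void bit_reverse_permute(float *data, unsigned num_bits)
{
    assert(num_bits < 8 * sizeof(unsigned));
    unsigned n = 1u << num_bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = reverse_bits(i, num_bits);
        if (j > i) {                /* swap each pair only once */
            float tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

For an 8-point transform (3 bits), input indices 0 to 7 end up in the order 0, 4, 2, 6, 1, 5, 3, 7.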
9. 8-Point Radix-2 FFT Example
10. 8-Point Radix-2 FFT Example
11. Implementations of DFT and FFT
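12. DFT Implementation
    /* Fragment: data_in, samples, PI, r and k are declared elsewhere;
       cos and sin come from math.h. */
    for (r = 0; r < samples/2; r++) {
        float re = 0.0f, im = 0.0f;
        float part = (float)r * -2.0f * PI / (float)samples;
        for (k = 0; k < samples; k++) {
            float theta = part * (float)k;
            re += data_in[k] * cos(theta);
            im += data_in[k] * sin(theta);
        }
        /* re and im now hold the real and imaginary parts of bin r */
    }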
- Nested for loop: (N/2) × N iterations, $O(N^2)$
- 63027.41 cycles/sample (123 cycles per inner-loop iteration)
- Obvious inefficiencies: the cos and sin math.h functions
- Efficient assembly coding could reduce the inner loop to 3 cycles per iteration (1,536 cycles/sample)
13. C FFT Implementation
    /* Outline only: local declarations and loop bodies are elided on the slide. */
    void fft_float(unsigned NumSamples, float *RealIn, float *ImagIn,
                   float *RealOut, float *ImagOut)
    {
        for (i = 0; i < NumSamples; i++) {
            // Iterate over the samples and perform the bit-reversal
            j = ReverseBits(i, NumBits);
        }
        BlockEnd = 1;
        // Following loop iterates Log2(NumSamples)
        for (BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1) {
            // Perform Angle Calculations (Using math.h sin/cos)
            // Following 2 loops iterate over NumSamples/2
            for (i = 0; i < NumSamples; i += BlockSize) {
                for (j = i, n = 0; n < BlockEnd; j++, n++) {
                    // Perform butterfly calculations
                }
            }
            BlockEnd = BlockSize;
        }
    }
14. C FFT Implementation
- Bit-reverse for loop: N iterations
- Nested for loops
- First outer loop: Log2(N) iterations
- Made use of sin/cos math.h functions
- Second outer loop: N / BlockSize iterations
- Inner loop: BlockSize/2 iterations
- O(N + Log2(N) × (N/BlockSize) × (BlockSize/2))
- O(N + N Log2(N)), which simplifies to O(N Log2(N))
- 193.84 cycles/sample
15. Assembly FFT Implementation
- Bit-reverse address generation
- Hides the bit-reverse operation inside the first and second FFT stages
- Sin and cos values stored in a look-up table (LUT)
- 256 Kbyte LUT added to Data1
- Needed to grow the Data1 memory space using the LDF file
- Interleaved real and imaginary arrays
- Quad reads load 2 complex points per cycle
- Supports the real FFT for input signals with no imaginary component
- 40% algorithm-based savings
16. Assembly FFT Implementation
- Special butterfly instruction
- Can perform addition/subtraction in parallel in one compute block
- Speeds up the innermost loop
- VLIW and SIMD operations
- Performs simultaneous operations in both compute blocks
- Loop unrolling and instruction scheduling keep the entire processor busy with instructions
- 11.35 cycles/sample
17. Assembly FFT Implementation
18. DC FFT Test
19. Audio FFT Test
20. 1024-Point DFT / FFT Comparison
21. 1024-Point Radix-2 FFT Hardware Comparison
22. Conclusion
- The FFT algorithm is very useful when computing the frequency domain on a DSP.
- The FFT is much faster than a regular DFT algorithm.
- The FFT is more precise because fewer operations produce less round-off error.
- The timed coding examples further support this claim and demonstrate how to code the algorithm.
- The Radix-2 FFT isn't the fastest, but it uses a less complex addressing and twiddle-factor routine.
- In this case (unlike in school), F is better than D.