Title: An FPGA Based Adaptive Viterbi Decoder
1 An FPGA Based Adaptive Viterbi Decoder
Sriram Swaminathan, Russell Tessier
Department of ECE, University of Massachusetts Amherst
2 Overview
- Introduction
- Objectives
- Background
- Adaptive Viterbi Algorithm
- Architecture and Implementation Issues
- Results
- Related Work
- Summary and Future Work
3 Introduction
- A Digital Data Communication System
[Block diagram: Source -> Source Encoder -> Channel Encoder
(convolutional encoder) -> Modulator -> channel with noise ->
Demodulator -> Channel Decoder (Viterbi) -> Source Decoder -> Sink.
The source encoder produces a bitstream; the channel encoder produces
a bitstream with redundancy information.]
4 Goals
- Implement the Adaptive Viterbi Algorithm in hardware
- Constraints
- Data rate (or throughput) - 20 Kbps
- Probability of error, or Bit Error Rate (BER), < 10^-5
- BER = number of errors / length of sequence
- Minimize
- Design-time area
5 Convolutional Encoder
- Accepts information bits as a continuous stream
- Operates on the current b-bit input, where b
ranges from 1 to 6, and some number of immediately
preceding b-bit inputs to produce V output bits,
V > b
[Figure: example encoder with b = 1, V = 2, built from two flip-flops (FF)]
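The encoder on this slide can be sketched in Python for the b = 1, V = 2 case. The generator taps (111 and 101, the common K = 3 code) are an assumption, since the figure itself did not survive; the name `conv_encode` is illustrative only. With these taps the input 0 1 0 encodes to 00 11 10, which agrees with the trellis example later in the deck.

```python
def conv_encode(bits):
    """Rate-1/2, K = 3 convolutional encoder (b = 1, V = 2).

    Assumed generator taps: 111 and 101 (not stated on the slide).
    """
    s1 = s2 = 0  # two flip-flops: s1 holds the previous input, s2 the one before
    out = []
    for u in bits:
        out.append(u ^ s1 ^ s2)  # first output bit, taps 111
        out.append(u ^ s2)       # second output bit, taps 101
        s1, s2 = u, s1           # shift the register
    return out


print(conv_encode([0, 1, 0]))  # [0, 0, 1, 1, 1, 0]
```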
6 Definitions
- Constraint Length
- Number of successive b-bit groups of information
bits for each encoding operation
- Denoted by K
- Code Rate (or Rate)
- b/V
- Typical values
- K = 7
- Rate = 1/2, 1/3
7 The Viterbi Algorithm
- Finds the bit sequence in the set of all possible
transmitted bit sequences that most closely
resembles the received data.
- Maximum likelihood algorithm
- Each bit received by the decoder is associated with a
measure of correctness.
- Practical for short constraint length
convolutional codes
8 State diagram
- State
- Encoder memory
- Branch
- k/ij, where i and j represent the output bits
associated with input bit k
[State diagram with states 00, 10, 01 and 11 and branch labels
0/00, 1/11, 0/11, 1/00, 0/10, 1/01, 0/01, 1/10]
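The k/ij branch labels can be regenerated from the encoder equations. The sketch below assumes generator taps 111 and 101 and assumes that a state name such as "10" lists the most recent input bit first; neither is stated on the slide, but together they reproduce exactly the eight labels that appear in the diagram. The helper name `next_state_and_label` is hypothetical.

```python
def next_state_and_label(state, k):
    """Return (next_state, 'k/ij') for a state such as '10' and input bit k.

    Assumptions: taps 111 and 101; the first character of the state name
    is the most recent input bit.
    """
    s1, s2 = int(state[0]), int(state[1])
    i = k ^ s1 ^ s2          # first output bit
    j = k ^ s2               # second output bit
    return f"{k}{s1}", f"{k}/{i}{j}"


for state in ("00", "10", "01", "11"):
    for k in (0, 1):
        nxt, label = next_state_and_label(state, k)
        print(f"{state} -> {nxt}  {label}")
```

Under these assumptions state 00 carries 0/00 and 1/11, state 10 carries 0/10 and 1/01, state 01 carries 0/11 and 1/00, and state 11 carries 0/01 and 1/10.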
9 Trellis Diagram
K = 3, Rate 1/2
Total number of states = 2^(K-1)
[Trellis diagram over time steps T0 to T3 for states 00, 01, 10 and 11,
annotated with branch outputs and accumulated metrics; the accumulated
metric of state 00 is 0, 0, 2, 3 at T0 to T3]
ENC IN: 0 1 0
ENC OUT: 00 11 10
RECEIVED: 00 11 11
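The trellis example can be replayed with a small hard-decision Viterbi decoder. This is a sketch, not the authors' hardware: it assumes generator taps 111 and 101, a Hamming-distance branch metric, and a start in state 00. The names `branch` and `viterbi_decode` are illustrative.

```python
INF = float("inf")


def branch(state, u):
    """Return (next_state, output_bits) for input bit u.

    state = 2*s1 + s2, where s1 is the most recent input bit.
    Assumed generator taps: 111 and 101.
    """
    s1, s2 = state >> 1, state & 1
    return (u << 1) | s1, (u ^ s1 ^ s2, u ^ s2)


def viterbi_decode(received):
    """Hard-decision Viterbi decoding; returns (decoded_bits, final_metrics)."""
    metrics = [0, INF, INF, INF]          # decoding starts in state 00
    paths = [[], [], [], []]
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if metrics[s] == INF:
                continue                   # state not reachable yet
            for u in (0, 1):
                nxt, out = branch(s, u)
                # add: Hamming distance between branch output and received pair
                m = metrics[s] + (out[0] != r[0]) + (out[1] != r[1])
                # compare-select: keep the cheaper path into nxt
                if m < new_metrics[nxt]:
                    new_metrics[nxt], new_paths[nxt] = m, paths[s] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best], metrics


decoded, metrics = viterbi_decode([0, 0, 1, 1, 1, 1])  # RECEIVED 00 11 11
print(decoded)   # [0, 1, 0]
print(metrics)   # [3, 1, 2, 1] for states 00, 01, 10, 11
```

After three steps states 01 and 11 tie at metric 1. `min` returns the first of them, which gives back the encoder input 0 1 0; a longer received sequence would resolve the tie on its own.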
10 Adaptive Viterbi Algorithm
- Motivation
- Extremely large memory and logic for the Viterbi
Algorithm
- Fewer paths retained
- Reduced memory and computation
- Definitions
- Path: Bit sequence
- Path metric or cost: Accumulated error metric of
a path
- Survivor: Path which is retained for the
subsequent time step
11 Adaptive Viterbi Algorithm
- Criterion for path survival
- A threshold T is introduced such that a path is
retained if and only if its current path metric is
less than dm + T, where dm is the minimum cost
among all survivors of the previous time step.
- The total number of survivors per time step is
limited to a critical number, Nmax, selected
by the user.
- Only the best Nmax paths have to be retained at any
time.
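The survival criterion can be written as a small selection step. This is a minimal sketch that uses an explicit sort to pick the best Nmax paths; the function name `select_survivors` and the (metric, state) tuple layout are assumptions made for illustration.

```python
def select_survivors(candidates, d_m, T, n_max):
    """Apply the threshold test and the Nmax limit to one time step.

    candidates: list of (path_metric, state) pairs for the current step.
    d_m:        minimum metric among the survivors of the previous step.
    """
    passed = [c for c in candidates if c[0] < d_m + T]   # metric < dm + T
    passed.sort(key=lambda c: c[0])                      # best (lowest) metric first
    return passed[:n_max]                                # keep at most Nmax paths


# Candidate metrics taken from a K = 3 trellis step, with dm = 0.
cands = [(3, "00"), (1, "01"), (2, "10"), (1, "11")]
print(select_survivors(cands, d_m=0, T=3, n_max=2))  # [(1, '01'), (1, '11')]
```

With T = 3 the path with metric 3 fails the threshold test, and the Nmax = 2 limit then drops the path with metric 2.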
12 Trellis Diagram for AVA
13 Parameters in the algorithm
- Constraint length, K
- Truncation length, TL
- Rate, R
- Threshold, T
- Maximum number of paths per time step, Nmax
14 Influence of Threshold T and Nmax
- Threshold T
- Smaller T: lower average number of survivors, increased
BER
- Larger T: higher average number of survivors, reduced
BER
- Nmax
- Smaller Nmax
- Possibility of discarding the best path -> high
BER
- Smaller area
- Larger Nmax
- Reduced BER
- Larger area
- Selection of Nmax and T is crucial
15 Variation of BER with T and Nmax for K = 9, 14
[Plots; annotated points: T = 18, Nmax = 9 and T = 24, Nmax = 41]
16 Optimal values of Nmax, T and TL for different values of K
17 Simplified View of Adaptive Viterbi Decoder
[Block diagram; output labeled "Decoded output"]
18 Survivor Memory
- Stores all possible bit sequences (paths) before
making a decision
- Size of memory for Viterbi
- Rows: Nmax
- Columns: Truncation Length, (3-5) x K
- Two schemes
- Traceback
- Large latency, small area, low power
- Register Exchange
- Fast, large area, high power
[Figure: survivor memory array with Nmax rows and truncation-length columns]
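The two survivor-memory schemes differ only in what is stored per time step. The sketch below runs both on the same K = 3 decoder: register exchange copies a whole bit register per state at every step, while traceback stores one predecessor pointer per state and reads the bits out at the end. The taps (111 and 101), the function names, and the absence of a truncation length are simplifying assumptions.

```python
INF = float("inf")


def branch(state, u):
    """(next_state, output_bits); state = 2*s1 + s2, assumed taps 111 and 101."""
    s1, s2 = state >> 1, state & 1
    return (u << 1) | s1, (u ^ s1 ^ s2, u ^ s2)


def decode(received, scheme):
    """scheme is 'register_exchange' or 'traceback'."""
    metrics = [0, INF, INF, INF]
    registers = [[] for _ in range(4)]   # register exchange: one bit register per state
    decisions = []                        # traceback: predecessor pointers per time step
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics, pred = [INF] * 4, [None] * 4
        for s in range(4):
            if metrics[s] == INF:
                continue
            for u in (0, 1):
                nxt, out = branch(s, u)
                m = metrics[s] + (out[0] != r[0]) + (out[1] != r[1])
                if m < new_metrics[nxt]:
                    new_metrics[nxt], pred[nxt] = m, s
        # Register exchange: copy the winner's register and append the new bit.
        # The input bit that leads into state s is its high bit, s >> 1.
        registers = [registers[p] + [s >> 1] if p is not None else []
                     for s, p in enumerate(pred)]
        decisions.append(pred)
        metrics = new_metrics
    best = min(range(4), key=lambda s: metrics[s])
    if scheme == "register_exchange":
        return registers[best]            # bits are available immediately
    bits, state = [], best                # traceback: walk the pointers backwards
    for pred in reversed(decisions):
        bits.append(state >> 1)
        state = pred[state]
    return bits[::-1]


rx = [0, 0, 1, 1, 1, 1]
print(decode(rx, "register_exchange"), decode(rx, "traceback"))
```

Both calls return the same bits. The register copy at every step is what costs area and power in hardware, and the backward walk is what adds latency to traceback.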
19 Practical Considerations
- Serial Implementation
- Same add-compare-select (ACS) unit used repeatedly for all states
- Small area, inexpensive
- Slow, low throughput (data rate)
- Parallel Implementation
- Each state has its own ACS unit (2^(K-1) ACS units)
- Fast, high throughput (data rate)
- Large area, bottleneck for large K values
20 Architecture
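21 Architecture (contd.)
- Elimination of sorting
[Flowchart: branch metrics b1 and b2 are added to give sum1 and sum2;
each sum di is tested against di < dm + T and the passing paths are
counted; if Count < Nmax, the memory is updated; otherwise T = T - 2
and the test is repeated]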
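The flowchart replaces sorting with a repeated threshold test. A minimal sketch of that loop follows; it assumes the count is compared as "at most Nmax" (matching the statement that survivors are limited to Nmax) and that T is lowered only for the current time step. The function name and the (metric, state) layout are hypothetical.

```python
def select_without_sorting(candidates, d_m, T, n_max, step=2):
    """Lower T until at most n_max candidates pass the threshold test.

    candidates: list of (path_metric, state) pairs for the current step.
    Returns (survivors, final_T). No sort is needed, only a count.
    """
    while True:
        survivors = [c for c in candidates if c[0] < d_m + T]
        if len(survivors) <= n_max:
            return survivors, T
        T -= step                     # too many paths: tighten the threshold


cands = [(3, "00"), (1, "01"), (2, "10"), (1, "11")]
print(select_without_sorting(cands, d_m=0, T=4, n_max=2))
# ([(1, '01'), (1, '11')], 2)
```

The loop always terminates, because a low enough T rejects every candidate. When several candidates tie, that can leave fewer than Nmax survivors, or none at all in this sketch; how the hardware handles that case is not shown on the slide.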
22 System Model
Test-bench
23 FPGA Implementation
- FPGA can exploit the parallelism
- Dynamic reconfiguration for performance
enhancement
- Implementation platform
- WildOne-XL FPGA board from Annapolis Microsystems
Inc.
- 2 XC4036 FPGAs, one for user application
- Simulation on Virtex XCV1000
24 Hardware implementation
- RTL description in VHDL
- HDL simulation: Cadence Affirma tools
- Synthesis: Synplicity Synplify Pro
- FPGA mapping, place and route: Xilinx Foundation 2.1i
- Target FPGA: XC4036XL-08
25 XC4036XL FPGA Resource utilization
K   CLBs   4-input LUTs   3-input LUTs   FFs
4   553    978            196            278
5   1194   2046           340            540
6   1206   2081           482            724
7   1215   2087           537            756
8   1284   2119           654            788
9   1296   2213           615            820
26 Decoding rate on XC4036 FPGA
- Overheads
- 32-bit, 33 MHz PCI bus
- Execution of WildOne API using VC
- Slowdown
- 1.5-2 times
FPGA freq. (MHz): 40.455, 20.089, 19.857, 19.674, 17.576, 17.316
27 Issues in Reconfiguration
- Reconfigurable Units
- Number of ACS units (depends on number of
survivors)
- Run-time survivor memory
- Reconfiguration types
- Fine-grained - infeasible
- Coarse-grained - feasible
- Motivation
- Performance improvement
- Tradeoff
- Small SNR (noisy channel): large K, slow decoding
- Large SNR (less noisy channel): small K, fast
decoding
- Maintain approx. same BER
28 Coarse-timescale reconfiguration
- 20.9% performance improvement over static
[Plot with regions labeled "Less noisy channel" and "Noisy channel"]
29 Coarse-timescale reconfiguration: Experimental Approach
- Vary channel noise during transmission
- Noise changes every 250,000 bits, or 1.5 to 2.5
seconds
- If noise change is detected
- Download new decoder configuration to the
FPGA on the WildOne board
- Reconfiguration overhead: 40 ms
- Includes PCI bus transfer, noise change detection
and bitstream download
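The figures on this slide allow a quick estimate of how much the 40 ms reconfiguration costs relative to the time between noise changes. The sketch assumes that 250,000 bits correspond to the quoted 1.5 to 2.5 seconds, so the decoding rate it prints is an implied value and not a measurement from the deck.

```python
def overhead_fraction(interval_s, overhead_s=0.040):
    """Reconfiguration overhead as a fraction of the time between noise changes."""
    return overhead_s / interval_s


for interval_s in (1.5, 2.5):
    rate_kbps = 250_000 / interval_s / 1000   # implied decoding rate (assumption)
    print(f"{interval_s} s interval: overhead {overhead_fraction(interval_s):.1%}, "
          f"implied rate about {rate_kbps:.0f} kbit/s")
```

Under these assumptions the overhead is roughly 1.6% to 2.7% of each interval.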
30 Comparison with microprocessor
- Intel Celeron 366 MHz, 128 MB RAM
- Speed-up
- Up to 7.5X for XC4036 (incl. overheads)
31 Conclusions and future work
- A new adaptive Viterbi decoder
- Dynamically reconfigurable
- 21% improvement over static
- Scales linearly
- Speed-up of up to 7.5X over a microprocessor
- Future Research
- Extend present concept to power-aware dynamic
reconfiguration