Design of a High-Throughput Low-Power IS95 Viterbi Decoder

About This Presentation

Slides: 35

Provided by: xun2

Transcript and Presenter's Notes

1
Design of a High-Throughput Low-Power IS95 Viterbi Decoder

Xun Liu, Marios C. Papaefthymiou
Advanced Computer Architecture Laboratory
Electrical Engineering and Computer Science Department
University of Michigan

4
IS95 Convolutional Encoding

Used in the reverse link of the IS95 CDMA system
256 states (8 state registers)
Rate 1/3
Maximum Free Distance coding
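
The encoder described above can be sketched as an 8-bit shift register with three parity outputs per input bit. The generator polynomials (557, 663, 711 octal) are an assumption here: they are the values commonly quoted for the IS95 reverse link, but they do not appear on the slide and should be checked against the standard. The names `GENERATORS`, `parity`, and `encode` are illustrative.

```python
# Rate-1/3, 256-state convolutional encoder (sketch).
# ASSUMPTION: generator polynomials are not given on the slide.
GENERATORS = (0o557, 0o663, 0o711)
K = 9  # constraint length: 8 state registers plus the current input bit


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def encode(bits):
    """Encode a list of 0/1 bits; emits 3 output bits per input bit."""
    state = 0  # contents of the 8 state registers
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state  # newest bit sits in the MSB
        out.extend(parity(reg & g) for g in GENERATORS)
        state = reg >> 1  # shift: the oldest bit falls off
    return out


NUM_STATES = 1 << (K - 1)  # 2**8 = 256
```

Every input bit produces three coded bits, so a message of n bits yields 3n coded bits, and the 8 registers give the 256 trellis states the decoder must track.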

5
Viterbi Decoding (VD)

VD is optimal for convolutional codes.
Maximum likelihood decoding scheme.
Minimum error for the additive white Gaussian noise channel.
VD procedure:
Construction of a complex graph called the trellis.
Computation of the shortest path through the trellis.
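
The two-step procedure (build the trellis, find the shortest path) can be illustrated on a much smaller code. The sketch below uses a 4-state, rate-1/2 code with generators 7 and 5 (octal) and hard-decision Hamming distances as edge weights. Both choices are assumptions made to keep the example short; the presentation's decoder has 256 states and rate 1/3.

```python
def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def encode(bits, gens=(0o7, 0o5), K=3):
    """Small convolutional encoder; returns one tuple of coded bits per input bit."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.append(tuple(parity(reg & g) for g in gens))
        state = reg >> 1
    return out


def viterbi(received, gens=(0o7, 0o5), K=3):
    """Shortest-path search through the trellis, one layer per received symbol."""
    n_states = 1 << (K - 1)
    INF = float("inf")
    metric = [0] + [INF] * (n_states - 1)  # the encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for sym in received:
        new_metric = [INF] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if metric[s] == INF:
                continue  # state not reachable yet
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                expected = tuple(parity(reg & g) for g in gens)
                weight = sum(e != r for e, r in zip(expected, sym))  # edge weight
                nxt = reg >> 1
                cand = metric[s] + weight  # add
                if cand < new_metric[nxt]:  # compare
                    new_metric[nxt] = cand  # select
                    new_paths[nxt] = paths[s] + [b]
        metric, paths = new_metric, new_paths
    best = min(range(n_states), key=lambda s: metric[s])
    return paths[best]
```

Storing a full path per state keeps the sketch readable; hardware decoders typically store per-state decisions and trace back instead.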

7
Challenge of Large-State VD Designs

High computational complexity.
VDs with hundreds of states require multi-Gops throughput once symbol transfer rates reach the Mbps range.
Parallel processing.
High interconnect power dissipation.
Complex routing among the processors.

For large-state VDs, global data transfer and interconnect issues must be considered carefully.
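
A back-of-the-envelope check of the throughput claim, using the per-symbol counts from the VD summary slide (2N edge-weight computations and N ACS operations for N states). Counting each of these as a single operation is an assumption; a real ACS involves several primitive operations, so the figure below is a lower bound. The function name is illustrative.

```python
def ops_per_second(n_states, symbol_rate_hz, ops_per_edge=1, ops_per_acs=1):
    """Operations per second for one trellis layer per decoded symbol.

    ASSUMPTION: each edge-weight computation and each ACS counts as one op
    unless the caller supplies different weights.
    """
    per_layer = 2 * n_states * ops_per_edge + n_states * ops_per_acs
    return per_layer * symbol_rate_hz


# 256 states at 1 Msymbol/s: 768 ops per layer, 768 Mops per second.
one_mbps = ops_per_second(256, 1_000_000)
# The same decoder at 4 Msymbol/s already exceeds 3 Gops.
four_mbps = ops_per_second(256, 4_000_000)
```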
8
Viterbi Decoder Designs
9
Presentation Outline

Viterbi decoding overview
Our contributions
Data transfer oriented hierarchical
inter-processor optimization
Intra-processor power optimization
Chip data

10
Encoding Example
11
Viterbi Decoding

14
VD Summary

Each decoded symbol requires a layer of similar computations:
2N edge-weight computations (N = number of states).
N add-compare-select (ACS) operations.
Operations within each layer are independent.
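
One trellis layer can be sketched as follows. The state numbering (next state = input bit in the MSB, old state shifted right by one) is an assumed convention, and `edge_weight` is a hypothetical callback standing in for the branch-metric unit. Each of the N iterations reads only the previous layer's metrics, which is why the operations within a layer are independent.

```python
def acs_layer(metrics, edge_weight):
    """Compute one layer: N ACS operations, 2N edge-weight evaluations.

    metrics:     path metrics of the previous layer (length N, N a power of 2)
    edge_weight: hypothetical callback (prev_state, input_bit) -> weight
    Returns the new metrics and, per state, the surviving predecessor.
    """
    n = len(metrics)
    half = n // 2
    new_metrics = [0] * n
    survivors = [0] * n
    for nxt in range(n):  # iterations do not depend on each other
        bit = nxt // half  # input bit that leads into this state
        p0 = (nxt % half) * 2  # the two predecessor states
        p1 = p0 + 1
        c0 = metrics[p0] + edge_weight(p0, bit)  # add
        c1 = metrics[p1] + edge_weight(p1, bit)  # add
        if c0 <= c1:  # compare
            new_metrics[nxt], survivors[nxt] = c0, p0  # select
        else:
            new_metrics[nxt], survivors[nxt] = c1, p1
    return new_metrics, survivors
```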

15
Viterbi Decoder Architectures
Design space: number of processors used
16
Viterbi Decoder Architectures
Design space: number of processors used
Intermediate solutions
17
Key Issues

How many ACS processors?
Which ACS operations are executed in each
processor?
Which ACS operations can be executed
concurrently?
In what order are the operations executed?
Can processors be pipelined?

18
Q: Which operations are executed in each ACS processor?
A: Operation partitioning for global data transfer reduction
19
Operation Partitioning Example
20
Operation Partitioning Results

Obtain solution by iterative bi-partitioning (Kernighan-Lin, KL).
For 64 partitions, >50% of data transfers are global.
Largest absolute reduction: 4 to 32 partitions.
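
The quantity being minimized can be sketched as a cut count: a path-metric transfer is global when the predecessor state and the receiving state are assigned to different ACS processors. The code below only evaluates that cost for a given assignment; it does not implement KL, and the contiguous-block assignment is a naive baseline chosen for illustration, not the presentation's solution. The state numbering is an assumed convention.

```python
def predecessors(state, n_states):
    """The two states whose path metrics feed the ACS operation for `state`."""
    half = n_states // 2
    p0 = (state % half) * 2
    return p0, p0 + 1


def count_global_transfers(assignment):
    """Count transfers that cross a partition boundary.

    assignment[s] is the partition (processor) that executes the ACS for state s.
    There are 2N transfers per layer in total.
    """
    n = len(assignment)
    return sum(
        1
        for s in range(n)
        for p in predecessors(s, n)
        if assignment[p] != assignment[s]
    )


def block_assignment(n_states, n_parts):
    """Naive baseline: contiguous, equal-size blocks of states."""
    size = n_states // n_parts
    return [s // size for s in range(n_states)]
```

A partitioner such as KL would repeatedly swap states between two halves to lower `count_global_transfers`, then recurse on each half.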

21
Q: Which operations are executed simultaneously?
A: Operation packing for global bus minimization
22
Operation Packing Example
23
Operation Packing

Packing procedure for global bus minimization:
One operation from each partition in each slice.
Global data transfers within a slice are done simultaneously.
Bus cost = the number of ACS units connected.
Our heuristic:
Distribute global transfers evenly across all slices.
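
A rough sketch of the slice structure, under stated assumptions: a slice takes the k-th operation from every partition, and a transfer is global when it crosses partitions. The code only builds slices in list order and reports how many global transfers fall into each one, which is the quantity the heuristic tries to balance. It does not reproduce the presentation's packing algorithm or its bus cost model, and the function names are illustrative.

```python
def make_slices(assignment):
    """Slice k holds the k-th operation of every partition (one op per processor)."""
    parts = {}
    for state, part in enumerate(assignment):
        parts.setdefault(part, []).append(state)
    depth = max(len(ops) for ops in parts.values())
    return [
        [ops[k] for ops in parts.values() if k < len(ops)]
        for k in range(depth)
    ]


def global_transfers_per_slice(assignment, slices):
    """For each slice, count predecessor metrics that come from another partition."""
    n = len(assignment)
    half = n // 2
    counts = []
    for ops in slices:
        count = 0
        for s in ops:
            p0 = (s % half) * 2  # assumed state numbering
            for p in (p0, p0 + 1):
                if assignment[p] != assignment[s]:
                    count += 1
        counts.append(count)
    return counts


# 16 states in 4 contiguous partitions of 4 states each.
example_assignment = [s // 4 for s in range(16)]
example_slices = make_slices(example_assignment)
example_counts = global_transfers_per_slice(example_assignment, example_slices)
```

A packing heuristic would reorder the operations inside each partition so that these per-slice counts stay as even as possible.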

24
Operation Packing Results

Comparison solution: one bus between any two ACS processors.
Global bus reduction: 31% on average.
Most effective range: 8 to 32 partitions.

25
Q: In what order should operations be executed?
Q: Can ACS units be pipelined?
A: Non-forwarding scheduling
26
Non-forwarding Scheduling
27
Non-forwarding Scheduling Results

Greedy heuristic:
Pick the slice with the fewest dependencies first.
Iteratively pick the next slice such that the upper bound on the non-forwarding pipeline depth derived from the chosen slices is maximized.