Implementing the Viterbi algorithm on programmable processors

1
Implementing the Viterbi algorithm on
programmable processors

Sridhar Rajagopal
Elec 696
sridhar_at_rice.edu

2
Motivation

Viterbi decoding: one of the major bottlenecks
in baseband (PHY) processing
Need for flexibility in the algorithm parameters
due to different protocols (read: programmable)
No architecture developed yet to meet real-time
requirements of 3G systems
2 - 8 Mbps range for wideband CDMA
100 Mbps range for wireless LAN

3
Today

Background
Advanced DSP architectures -- TI C6x 15
Viterbi algorithm basics 10
Viterbi on TI DSPs 10
A programmable processor specifically designed
for Viterbi 15

4
TI C6x architecture

VLIW: Very Long Instruction Word architecture
Similar to a vector processor, but
multiple instructions -> multiple functional units
FUs are not all the same
32-bit architecture
8 functional units

6
8 VelociTI principles

Parallel fetch, decode and execute
Pipelined enough to make ADD critical path
Instructions based on RISC
Load - Store architecture
Orthogonal - Instruction Set and Reg. File
Determinism
Conditional Instructions
Instruction Packing

7
2 x 4 = 8 Functional Units

.M Multiplication unit
16 bit x 16 bit signed/ packed/
.L Arithmetic Logic unit
Comparisons and logic operations
Saturation arithmetic and absolute value
.S Shifter unit
Bit manipulation (set, get, shift, rotate)
Branching, addition and packed addition
.D Data unit
Load/store to memory
Addition and pointer arithmetic

8
How powerful am I?

8 instructions per cycle
Max
6 adds per cycle
2 multiplies per cycle
2 load/stores per cycle
2 branches per cycle
The idea is to issue instructions in these
ratios to get full FU utilization.
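
As a rough illustration of these ratios, the sketch below turns the per-cycle maxima on this slide into an optimistic lower bound on the cycles needed for a given instruction mix. The function name and the example mixes are hypothetical. The bound ignores that adds, loads and branches share some of the same units, so real schedules can be worse.

```python
from math import ceil

# Per-cycle maxima taken from the slide.
PEAK = {"add": 6, "mul": 2, "mem": 2, "branch": 2}
ISSUE_WIDTH = 8  # instructions per cycle

def min_cycles(mix):
    """Optimistic lower bound on cycles for an instruction mix (dict of counts)."""
    # Each instruction class is limited by its own per-cycle maximum.
    per_class = max(ceil(mix.get(kind, 0) / PEAK[kind]) for kind in PEAK)
    # The whole mix is limited by the 8-wide issue.
    total = ceil(sum(mix.values()) / ISSUE_WIDTH)
    return max(per_class, total, 1)

balanced = min_cycles({"add": 6, "mul": 2})      # fits one cycle
memory_bound = min_cycles({"add": 4, "mem": 6})  # limited by 2 load/stores per cycle
```

A memory-heavy loop like the second mix is bound by the two load/store slots, not by the issue width.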

9
C6x DSP Core
10
C6x Datapath
11
C6x Resource Constraints

Instructions using the same FU
1 inst. / FU
Cross Paths
only 1 operand from other reg. file to (L,S,M)
Loads and stores
2 loads and stores from 2 different reg. files
Reads and writes
max 4 reads from the same register
No 2 writes to the same register

12
Instruction Packing

Fetch Packet
Execute Packet
Avoid NOPs in the instruction code
Multi-cycle NOPs if absolutely necessary
LSB: p-bit of the instruction, used for packing

A || B || C, D || E, F, G || H: 8 instructions
instead of 32
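
A minimal sketch of how a fetch packet could be split into execute packets from the p-bits. It assumes the usual C6x convention that p = 1 means the next instruction executes in parallel with the current one. The function name and the tuple representation are hypothetical.

```python
def split_execute_packets(fetch_packet):
    """fetch_packet: list of (name, p_bit) in program order.

    p_bit == 1: the NEXT instruction runs in parallel with this one.
    p_bit == 0: this instruction closes the current execute packet.
    """
    packets, current = [], []
    for name, p_bit in fetch_packet:
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:  # tolerate a packet whose last p-bit was left set
        packets.append(current)
    return packets

# The 8-instruction example from the slide.
fetch = [("A", 1), ("B", 1), ("C", 0), ("D", 1),
         ("E", 0), ("F", 0), ("G", 1), ("H", 0)]
packets = split_execute_packets(fetch)
```

Without packing, each of the 4 execute packets would be padded with NOPs to 8 slots, which is where the 32 comes from.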
13
Conditional Instructions

All instructions can be conditioned based on the
value in registers A1,A2,B0,B1,B2
Avoids branch latencies
If condition not met by end of first phase of
execution, results not written back to reg. file
Conditional loads/stores squashed before data
phase

14
C6x Pipeline

Fetch (if necessary) - 4 phases
Address Generate
Address Send
Access Ready Wait
Fetch Packet Receive
Decode - 2 phases
Instruction dispatch (if necessary)
Instruction decode
Execute - up to 10 phases
Most instructions need only 1 phase

15
Some interesting instructions

Saturation
Bit-counting -- Image coding
Integer-comparison
Bit-manipulation
Seed generation for reciprocal instructions

16
Other details

64 KB internal program and data
DMA - peripherals to memory
Intrinsics in code for better programming
similar to using VIS in UltraSPARC
Software pipelining of loops
PERFORMANCE
5-10X
higher clock -- higher pipeline (2-4X)
Additional ALUs

17
Additional features in C64x

SIMD support
Communication-specific instructions
interleaving, Galois field multiply
Bit count and rotate hardware
64 32-bit registers
Lower resource constraints
No more NOPs needed, ever: no fetch packet boundaries

18
C64x DSP Core
19
Today

Background
Advanced DSP architectures -- TI C6x 15
Viterbi algorithm basics 10
Viterbi on TI DSPs 10
A programmable processor specifically designed
for Viterbi 15

20
Viterbi Decoding
[Figure: Encoder maps k input bits to n > k coded bits; Decoder maps the n received bits back to k bits]
Rate k/n = 1/2 Convolutional Encoder
21
Error Protection

States = 2^(FFs) = 2^(Constraint Length - 1)
Cannot go from any state to any state
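
A minimal sketch of a rate 1/2 convolutional encoder and its state count. The constraint length K = 3 and the generator polynomials 7 and 5 (octal) are assumptions chosen for illustration; the slides do not name a specific code.

```python
def conv_encode(bits, K=3, polys=(0b111, 0b101)):
    """Rate 1/len(polys) convolutional encoder (illustrative parameters)."""
    state = 0  # contents of the K-1 flip-flops
    out = []
    for b in bits:
        reg = (b << (K - 1)) | state  # newest bit enters at the MSB
        for g in polys:
            out.append(bin(reg & g).count("1") & 1)  # parity of the tapped bits
        state = reg >> 1  # shift: the oldest bit falls out
    return out

K = 3
num_states = 2 ** (K - 1)  # 2^(FFs)
coded = conv_encode([1, 0, 1, 1])
```

Because the register shifts by one bit per input, each state has only two possible successors, which is why the trellis does not connect every state to every other state.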

22
Trellis for decoding
23
Trellis for an input sequence
24
Error detection

Branch metric: distance between the received
symbol pair and the possible symbol pairs
Path metric: accumulated error metric
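
The two usual branch metrics can be sketched as below. The function names are hypothetical; symbol pairs are packed into small integers for the hard-decision case and given as tuples of samples for the soft-decision case.

```python
def hamming_metric(received, expected):
    """Hard decision: XOR the symbol pairs and count the 1s."""
    return bin(received ^ expected).count("1")

def squared_euclidean_metric(received, expected):
    """Soft decision: sum of squared differences between samples."""
    return sum((r - e) ** 2 for r, e in zip(received, expected))

# Received pair 10 compared against all four possible pairs 00, 01, 10, 11.
hard = [hamming_metric(0b10, s) for s in range(4)]
soft = squared_euclidean_metric((3, -1), (1, -1))
```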

25
Error-correction
26
Stages in Viterbi Decoding

Calculate Branch metrics for all states every
stage
Update Path metrics for all states every stage
At the end, Traceback the trellis to get the
decoded bits
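
The three stages can be put together in a short hard-decision decoder. This is a sketch under stated assumptions, not the implementation from the talk: K = 3, polynomials 7 and 5 (octal), Hamming branch metrics, an encoder that starts in state 0, and traceback only once at the end. All function names are hypothetical.

```python
def step(state, bit, K, polys):
    """One encoder transition: returns (next_state, output symbol)."""
    reg = (bit << (K - 1)) | state
    sym = 0
    for g in polys:
        sym = (sym << 1) | (bin(reg & g).count("1") & 1)
    return reg >> 1, sym

def encode(bits, K=3, polys=(0b111, 0b101)):
    state, out = 0, []
    for b in bits:
        state, sym = step(state, b, K, polys)
        out.append(sym)
    return out

def viterbi_decode(symbols, K=3, polys=(0b111, 0b101)):
    n_states = 1 << (K - 1)
    inf = float("inf")
    pm = [0] + [inf] * (n_states - 1)  # path metrics; start in state 0
    history = []                       # per stage: (previous state, bit) per state
    for rx in symbols:
        new_pm = [inf] * n_states
        back = [None] * n_states
        for s in range(n_states):
            if pm[s] == inf:
                continue
            for bit in (0, 1):
                nxt, sym = step(s, bit, K, polys)
                bm = bin(rx ^ sym).count("1")  # 1. branch metric
                cand = pm[s] + bm              # 2. add
                if cand < new_pm[nxt]:         #    compare and select
                    new_pm[nxt] = cand
                    back[nxt] = (s, bit)
        pm = new_pm
        history.append(back)
    # 3. traceback from the state with the smallest path metric
    state = min(range(n_states), key=lambda s: pm[s])
    bits = []
    for back in reversed(history):
        state, bit = back[state]
        bits.append(bit)
    return bits[::-1]

message = [1, 0, 1, 1, 0, 0]  # last K-1 zeros flush the encoder
noisy = encode(message)
noisy[2] ^= 0b01              # inject a single bit error
recovered = viterbi_decode(noisy)
```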

27
Computations

Branch metrics
Hamming distance (XOR) and Count 1s
Euclidean distance: squared distance
Path metrics
Add Branch metrics to existing path metrics
Compare for minimum and Select minimum
Survivor Traceback
Linked list /Pointer chasing
Memory Intensive / Sequential Operations

28
Today

Background
Advanced DSP architectures -- TI C6x 15
Viterbi algorithm basics 10
Viterbi on TI DSPs 10
A programmable processor specifically designed
for Viterbi 15

29
Viterbi support in different processors

C54x
Special hardware accelerator
ACS unit with 2 ACC and split ALU
Viterbi butterfly (2 ACS) in 4 cycles
C62x
nothing special
C6416
Viterbi coprocessor
K = 5-9, Rate = 1/2, 1/3, 1/4
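
For reference, a Viterbi butterfly (2 ACS operations on one pair of old states) can be sketched as follows. This is a generic software sketch, not C54x code. It assumes a rate 1/n code whose generator polynomials all have their first and last taps set, so the four branch symbols of a butterfly are c, ~c, ~c, c and only one Hamming branch metric needs to be computed. Function names are hypothetical.

```python
def acs(pm0, bm0, pm1, bm1):
    """Add-compare-select: returns (surviving path metric, decision bit)."""
    a, b = pm0 + bm0, pm1 + bm1
    return (a, 0) if a <= b else (b, 1)

def butterfly(pm_even, pm_odd, bm, n=2):
    """Two ACS operations sharing one branch metric.

    pm_even, pm_odd: path metrics of old states 2j and 2j+1.
    bm: Hamming metric of the branch 2j -> j; the complementary
        branches cost n - bm.
    Returns the results for new states j and j + N/2.
    """
    comp = n - bm
    low = acs(pm_even, bm, pm_odd, comp)   # new state j (input bit 0)
    high = acs(pm_even, comp, pm_odd, bm)  # new state j + N/2 (input bit 1)
    return low, high

result = butterfly(3, 1, 0)
```

The decision bits are what a traceback unit later reads to follow the survivor path.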

30
Viterbi Coprocessor in C6416
31
Viterbi Coprocessor in C6416

SM (state metric), SD (soft decision) and HD (hard decision) memories are not accessible to the DSP

32
Today

Background
Advanced DSP architectures -- TI C6x 15
Viterbi algorithm basics 10
Viterbi on TI DSPs 10
A programmable processor specifically designed
for Viterbi 15

33
Need for VSP architecture

Large amount of memory access
Traceback decoding
Not efficient on a GPP
The number of program instructions on a GPP is of
a higher order than the complexity of the algorithm

34
VSP architecture
35
Branch Metric Calculation
36
Path Metric Calculation
37
Traceback Unit
38
Traceback with survivor updates
Start Filling the Trellis
5 x Constraint Length
Start Traceback
Update Survivor Path for most recent symbol
Symbol Decoded
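
The flow above can be sketched as a streaming decoder that fills the trellis for 5 x constraint length stages, then traces back once per new symbol and emits the oldest bit. This is a software sketch under assumptions, not the VSP hardware: K = 3, polynomials 7 and 5 (octal), Hamming metrics, and a deque standing in for the circular survivor memory. All names are hypothetical.

```python
from collections import deque

def step(state, bit, K, polys):
    """One encoder transition: returns (next_state, output symbol)."""
    reg = (bit << (K - 1)) | state
    sym = 0
    for g in polys:
        sym = (sym << 1) | (bin(reg & g).count("1") & 1)
    return reg >> 1, sym

def encode(bits, K=3, polys=(0b111, 0b101)):
    state, out = 0, []
    for b in bits:
        state, sym = step(state, b, K, polys)
        out.append(sym)
    return out

def stream_decode(symbols, K=3, polys=(0b111, 0b101)):
    depth = 5 * K                      # traceback depth
    n_states = 1 << (K - 1)
    inf = float("inf")
    pm = [0] + [inf] * (n_states - 1)
    window = deque()                   # survivor decisions, oldest first
    decoded = []

    def traceback():
        """Follow the survivor path from the current best state."""
        state = min(range(n_states), key=lambda s: pm[s])
        bits = []
        for back in reversed(window):
            state, bit = back[state]
            bits.append(bit)
        return bits[::-1]

    for rx in symbols:
        new_pm, back = [inf] * n_states, [None] * n_states
        for s in range(n_states):
            if pm[s] == inf:
                continue
            for bit in (0, 1):
                nxt, sym = step(s, bit, K, polys)
                cand = pm[s] + bin(rx ^ sym).count("1")
                if cand < new_pm[nxt]:
                    new_pm[nxt], back[nxt] = cand, (s, bit)
        pm = new_pm
        window.append(back)            # survivor update for the newest symbol
        if len(window) > depth:        # trellis filled: decode one symbol
            decoded.append(traceback()[0])
            window.popleft()           # oldest slot is reused
    decoded.extend(traceback())        # flush the remaining stages
    return decoded

message = [(i * 5 // 3) % 2 for i in range(40)] + [0, 0]
noisy = encode(message)
noisy[20] ^= 0b10                      # single bit error mid-stream
```

Tracing back the full window for every symbol is what makes this step memory intensive and sequential.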
39
Survivor Path Updates
40
Circular updates
41
Software Programming

Small but specialized instruction set
LOAD, ACS
Shorter execution time
All 3 subprocessors programmed independently
10 ns cycle time (100 MHz) in 1990 to get 1.5 Mbps
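
As a quick check of what these numbers imply, the arithmetic below derives the clock cycles spent per decoded bit. The 100 Mbps extrapolation assumes the same cycles per bit, which is an assumption made here for illustration and not a claim from the talk.

```python
clock_hz = 100e6        # 10 ns cycle time
throughput_bps = 1.5e6  # decoded bits per second

cycles_per_bit = clock_hz / throughput_bps  # about 66.7 cycles per bit

# Clock needed for 100 Mbps if cycles per bit stayed the same (assumption).
target_bps = 100e6
required_clock_hz = cycles_per_bit * target_bps  # about 6.7 GHz
```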

42
Conclusions

The Viterbi algorithm is important to implement in
a programmable communication receiver
Approaches so far: co-processor support for
DSPs, or specialized processors.
We are yet to design programmable processors that
meet real-time requirements for 100 Mbps
applications.
