Implementing the Viterbi algorithm on programmable processors

1
Implementing the Viterbi algorithm on
programmable processors
  • Sridhar Rajagopal
  • Elec 696
  • sridhar_at_rice.edu

2
Motivation
  • Viterbi decoding is one of the major bottlenecks
    in baseband (PHY) processing
  • Need for flexibility in the algorithm parameters
    due to different protocols (read: programmable)
  • No architecture developed yet to meet real-time
    requirements of 3G systems.
  • 2 - 8 Mbps range for wideband CDMA
  • 100 Mbps range for wireless LAN

3
Today
  • Background
  • Advanced DSP architectures -- TI C6x 15
  • Viterbi algorithm basics 10
  • Viterbi on TI DSPs 10
  • A programmable processor specifically designed
    for Viterbi 15

4
TI C6x architecture
  • VLIW: Very Long Instruction Word architecture
  • Similar to a vector processor, but
    • multiple instructions -> multiple functional units
    • FUs are not all the same
  • 32-bit architecture
  • 8 functional units

6
8 VelociTI principles
  • Parallel fetch, decode and execute
  • Pipelined enough to make ADD critical path
  • Instructions based on RISC
  • Load - Store architecture
  • Orthogonal - Instruction Set and Reg. File
  • Determinism
  • Conditional Instructions
  • Instruction Packing

7
2 x 4 = 8 Functional Units
  • .M: Multiplication unit
    • 16 bit x 16 bit signed/ packed/
  • .L: Arithmetic Logic unit
    • Comparisons and logic operations
    • Saturation arithmetic and absolute value
  • .S: Shifter unit
    • Bit manipulation (set, get, shift, rotate)
    • Branching, addition and packed addition
  • .D: Data unit
    • Load/store to memory
    • Addition and pointer arithmetic

8
How powerful am I?
  • 8 instructions per cycle
  • Max per cycle:
    • 6 adds
    • 2 multiplies
    • 2 loads/stores
    • 2 branches
  • The idea is to issue instructions in these
    ratios to get full FU utilization.
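
The peak rates above give a quick lower bound on how many cycles a loop body needs. The following is a rough sketch in Python, not C6x code: the function name and the operation classes are hypothetical, and it ignores that adds and loads/stores compete for the .D units, so the real schedule can be longer.

```python
import math

# Peak per-cycle rates quoted on the slide.
PEAK_PER_CYCLE = {"add": 6, "mul": 2, "mem": 2, "branch": 2}
ISSUE_WIDTH = 8  # instructions per cycle


def min_cycles(mix):
    """Lower bound on cycles for a loop body, given op counts per class.

    `mix` maps an operation class ("add", "mul", "mem", "branch") to
    the number of such instructions in the loop body.
    """
    total = sum(mix.values())
    # Each class is limited by the number of units that can execute it.
    per_class = max(
        (math.ceil(n / PEAK_PER_CYCLE[op]) for op, n in mix.items()),
        default=0,
    )
    # All classes together are limited by the issue width.
    return max(per_class, math.ceil(total / ISSUE_WIDTH))
```

A body with 6 adds and 2 multiplies fits the peak ratios and is bounded by one cycle; a body dominated by loads and stores is bounded by the two memory slots instead.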

9
C6x DSP Core
10
C6x Datapath
11
C6x Resource Constraints
  • Instructions using the same FU
    • 1 instruction per FU
  • Cross paths
    • only 1 operand from the other register file to (.L, .S, .M)
  • Loads and stores
    • 2 loads and stores from 2 different register files
  • Reads and writes
    • max 4 reads from the same register
    • no 2 writes to the same register

12
Instruction Packing
  • Fetch Packet
  • Execute Packet
  • Avoid NOPs in the instruction code
  • Multi-cycle NOPs if absolutely necessary
  • LSB (p-bit) of each instruction controls packing

A || B || C, D || E, F, G || H: 8 instructions
instead of 32
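
The packing rule can be modeled in a few lines. This is a sketch in Python rather than real C6x tooling: the function name is hypothetical, and the p-bit values in the usage note are chosen to reproduce the grouping on the slide. It assumes the C6x convention that a p-bit of 1 means the next instruction executes in parallel with the current one.

```python
def split_execute_packets(fetch_packet):
    """Group a fetch packet into execute packets using the p-bits.

    `fetch_packet` is a list of (name, p_bit) pairs in program order.
    p_bit == 1: the NEXT instruction runs in parallel with this one.
    p_bit == 0: this instruction ends the current execute packet.
    """
    packets, current = [], []
    for name, p_bit in fetch_packet:
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:  # tolerate a packet whose last p-bit is not 0
        packets.append(current)
    return packets


# Illustrative p-bits (assumed) that yield the grouping on the slide.
SLIDE_EXAMPLE = [("A", 1), ("B", 1), ("C", 0), ("D", 1),
                 ("E", 0), ("F", 0), ("G", 1), ("H", 0)]
```

Calling `split_execute_packets(SLIDE_EXAMPLE)` yields four execute packets. A VLIW encoding that pads each of them to 8 slots would need 32 instruction words for the same code.
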
13
Conditional Instructions
  • All instructions can be conditioned based on the
    value in registers A1,A2,B0,B1,B2
  • Avoids branch latencies
  • If condition not met by end of first phase of
    execution, results not written back to reg. file
  • Conditional loads/stores squashed before data
    phase

14
C6x Pipeline
  • Fetch (if necessary) - 4 phases
    • Address Generate
    • Address Send
    • Access Ready Wait
    • Fetch Packet Receive
  • Decode - 2 phases
    • Instruction dispatch (if necessary)
    • Instruction decode
  • Execute - up to 10 phases
    • Most instructions need only 1 phase

15
Some interesting instructions
  • Saturation
  • Bit-counting -- Image coding
  • Integer-comparison
  • Bit-manipulation
  • Seed generation for reciprocal instructions
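
Two of these operations matter directly for Viterbi decoding: saturation keeps path metrics from wrapping around, and bit counting gives a Hamming distance after an XOR. Below is a small Python model of both, assuming 32-bit signed operands; the function names are hypothetical and do not correspond to specific C6x mnemonics.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def saturating_add32(a, b):
    """Add two 32-bit signed values, clamping instead of wrapping."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


def bit_count32(x):
    """Count the 1 bits in the low 32 bits of x."""
    return bin(x & 0xFFFFFFFF).count("1")


def hamming_distance32(a, b):
    """Hamming distance between two 32-bit words: XOR, then count 1s."""
    return bit_count32(a ^ b)
```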

16
Other details
  • 64 KB internal program and data memory
  • DMA - peripherals to memory
  • Intrinsics in code for better programming
    • similar to using VIS in UltraSPARC
  • Software pipelining of loops
  • PERFORMANCE
    • 5-10X
    • higher clock -- deeper pipeline (2-4X)
    • Additional ALUs

17
Additional features in C64x
  • SIMD support
  • Communication-specific instructions
    • interleaving, Galois field multiply
  • Bit count and rotate hardware
  • 64 32-bit registers
  • Fewer resource constraints
  • No more NOPs needed, ever: no boundaries

18
C64x DSP Core
19
Today
  • Background
  • Advanced DSP architectures -- TI C6x 15
  • Viterbi algorithm basics 10
  • Viterbi on TI DSPs 10
  • A programmable processor specifically designed
    for Viterbi 15

20
Viterbi Decoding
[Figure: the Encoder takes k input bits and produces n > k coded bits; the Decoder takes n coded bits and recovers k bits]
Rate k/n = 1/2 Convolutional Encoder
21
Error Protection
  • States = 2^(FFs) = 2^(Constraint Length - 1)
  • Cannot go from any state to any state
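
Both points can be checked on a small encoder. The sketch below assumes a constraint length of 3 and the generator polynomials 7 and 5 (octal), a common textbook choice that the slides do not specify; the state numbering (most recent input bit in the MSB) is also an assumption.

```python
K = 3                          # constraint length (assumed)
GENERATORS = (0b111, 0b101)    # rate-1/2 generators 7, 5 octal (assumed)
NUM_STATES = 1 << (K - 1)      # 2^(K - 1) states


def parity(x):
    """Return 1 if x has an odd number of 1 bits, else 0."""
    return bin(x).count("1") & 1


def step(state, bit):
    """Shift one input bit in; return (next_state, output_symbols)."""
    reg = (bit << (K - 1)) | state
    return reg >> 1, [parity(reg & g) for g in GENERATORS]


def encode(bits):
    """Encode a list of bits, starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        state, symbols = step(state, b)
        out.extend(symbols)
    return out


def successors(state):
    """States reachable from `state` in one step (one per input bit)."""
    return sorted(step(state, b)[0] for b in (0, 1))
```

With these parameters there are 4 states, and every state has exactly two successors, which is why the trellis is sparse rather than fully connected.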

22
Trellis for decoding
23
Trellis for an input sequence
24
Error detection
  • Branch metric: distance between the received
    symbol pair and the possible symbol pairs
  • Path metric: accumulated error metric

25
Error-correction
26
Stages in Viterbi Decoding
  • Calculate Branch metrics for all states every
    stage
  • Update Path metrics for all states every stage
  • At the end, Traceback the trellis to get the
    decoded bits
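
The three stages can be sketched as a compact hard-decision decoder. This is a minimal illustration, not the implementation discussed in the talk: it assumes K = 3, generators 7 and 5 (octal), an encoder that starts in state 0, and a single traceback at the end of the block.

```python
K = 3                          # constraint length (assumed)
GENERATORS = (0b111, 0b101)    # rate-1/2 generators (assumed)
NUM_STATES = 1 << (K - 1)
INF = float("inf")


def parity(x):
    return bin(x).count("1") & 1


def branch(state, bit):
    """Return (next_state, expected_symbol_pair) for one trellis branch."""
    reg = (bit << (K - 1)) | state
    return reg >> 1, tuple(parity(reg & g) for g in GENERATORS)


def viterbi_decode(received):
    """Decode a flat list of hard bits, two per trellis stage."""
    path_metric = [0] + [INF] * (NUM_STATES - 1)  # start in state 0
    survivors = []  # per stage: surviving predecessor of each state
    for i in range(0, len(received), 2):
        pair = tuple(received[i:i + 2])
        new_metric = [INF] * NUM_STATES
        prev = [0] * NUM_STATES
        for state in range(NUM_STATES):
            if path_metric[state] == INF:
                continue  # state not reachable yet
            for bit in (0, 1):
                nxt, expected = branch(state, bit)
                # Stage 1: branch metric (Hamming distance).
                bm = sum(r != e for r, e in zip(pair, expected))
                # Stage 2: add, compare, select.
                candidate = path_metric[state] + bm
                if candidate < new_metric[nxt]:
                    new_metric[nxt] = candidate
                    prev[nxt] = state
        path_metric = new_metric
        survivors.append(prev)
    # Stage 3: trace back from the best final state.
    state = min(range(NUM_STATES), key=path_metric.__getitem__)
    bits = []
    for prev in reversed(survivors):
        bits.append(state >> (K - 2))  # MSB of a state is the bit that led to it
        state = prev[state]
    return bits[::-1]
```

Under these assumptions the input bits 1, 0, 1, 1 encode to 11 10 00 01, and the decoder recovers them even when one of the eight received bits is flipped.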

27
Computations
  • Branch metrics
    • Hamming distance: XOR and count 1s
    • Euclidean distance: squared distance
  • Path metrics
    • Add branch metrics to existing path metrics
    • Compare for minimum and select minimum
  • Survivor traceback
    • Linked list / pointer chasing
    • Memory-intensive / sequential operations
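
For soft decisions, the branch metric is the squared Euclidean distance between the received samples and the ideal symbol values. A minimal sketch follows, assuming a BPSK-style mapping of bit 0 to +1.0 and bit 1 to -1.0; the mapping and the function name are assumptions, not taken from the slides.

```python
def squared_euclidean_metric(received, expected_bits):
    """Squared distance between soft samples and an expected bit pair.

    Assumed mapping: bit 0 -> +1.0, bit 1 -> -1.0.
    """
    return sum(
        (sample - (1.0 - 2.0 * bit)) ** 2
        for sample, bit in zip(received, expected_bits)
    )


def all_branch_metrics(received):
    """Metric of one received pair against all four possible pairs."""
    candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return {c: squared_euclidean_metric(received, c) for c in candidates}
```

The candidate pair with the smallest metric is the most likely one for that stage; the path metric update then proceeds exactly as with Hamming distances.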

28
Today
  • Background
  • Advanced DSP architectures -- TI C6x 15
  • Viterbi algorithm basics 10
  • Viterbi on TI DSPs 10
  • A programmable processor specifically designed
    for Viterbi 15

29
Viterbi support in different processors
  • C54x
    • Special hardware accelerator
    • ACS unit with 2 accumulators and split ALU
    • Viterbi butterfly (2 ACS) in 4 cycles
  • C62x
    • nothing special
  • C6416
    • Viterbi coprocessor
    • K = 5-9, Rate 1/2, 1/3, 1/4
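
A butterfly is two add-compare-select (ACS) operations that share the same pair of old states. The sketch below models it in Python, not as DSP code. It assumes one common state numbering in which old states 2j and 2j+1 feed new states j and j + N/2; the function names and argument order are hypothetical.

```python
def compare_select(candidate_even, candidate_odd):
    """Pick the smaller candidate; the decision bit records which one won.

    Decision 0: the even predecessor (2j) survived.
    Decision 1: the odd predecessor (2j + 1) survived.
    """
    if candidate_even <= candidate_odd:
        return candidate_even, 0
    return candidate_odd, 1


def butterfly(pm_even, pm_odd, bm_even_lo, bm_odd_lo, bm_even_hi, bm_odd_hi):
    """One Viterbi butterfly: update new states j ("lo") and j + N/2 ("hi").

    pm_even, pm_odd: path metrics of old states 2j and 2j + 1.
    bm_*: branch metrics for the four branches of the butterfly.
    Returns ((new_pm_lo, decision_lo), (new_pm_hi, decision_hi)).
    """
    lo = compare_select(pm_even + bm_even_lo, pm_odd + bm_odd_lo)
    hi = compare_select(pm_even + bm_even_hi, pm_odd + bm_odd_hi)
    return lo, hi
```

Each butterfly costs four additions and two compare-selects, which is the work a dedicated ACS unit or coprocessor is designed to speed up.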

30
Viterbi Coprocessor in C6416
31
Viterbi Coprocessor in C6416
  • SM, SD and HD memory not accessible to DSP

32
Today
  • Background
  • Advanced DSP architectures -- TI C6x 15
  • Viterbi algorithm basics 10
  • Viterbi on TI DSPs 10
  • A programmable processor specifically designed
    for Viterbi 15

33
Need for VSP architecture
  • Large amount of memory access
  • Traceback decoding
  • Not efficient on a GPP
  • The number of program instructions on a GPP is of a
    higher order than the complexity of the algorithm

34
VSP architecture
35
Branch Metric Calculation
36
Path Metric Calculation
37
Traceback Unit
38
Traceback with survivor updates
  • Start filling the trellis
  • After 5 x Constraint Length stages, start traceback
  • Update survivor path for most recent symbol
  • Symbol decoded
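
A sliding-window traceback along these lines can be sketched as follows. The storage format is an assumption made for illustration: each stage stores one decision bit per state, equal to the LSB of the surviving predecessor, and the most recent input bit sits in the MSB of the state. The class and function names are hypothetical.

```python
from collections import deque


def trace_back(decisions, start_state, constraint_length):
    """Walk the stored decisions backwards; return bits, oldest first."""
    num_states = 1 << (constraint_length - 1)
    state, bits = start_state, []
    for stage in reversed(decisions):
        bits.append(state >> (constraint_length - 2))  # bit that led here
        # Undo the shift and restore the bit that was shifted out.
        state = ((state << 1) & (num_states - 1)) | stage[state]
    return bits[::-1]


class SlidingTraceback:
    """Keep the last 5 x K stages and decode one bit per new stage."""

    def __init__(self, constraint_length):
        self.k = constraint_length
        self.window = deque(maxlen=5 * constraint_length)

    def push(self, stage_decisions, best_state):
        """Add one stage; return the oldest decoded bit, or None while filling."""
        self.window.append(stage_decisions)
        if len(self.window) < self.window.maxlen:
            return None  # still filling the trellis
        return trace_back(self.window, best_state, self.k)[0]
```

Every call to `push` after the window is full repeats a full-depth traceback to release a single bit, which illustrates why traceback is memory-intensive and sequential.
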
39
Survivor Path Updates
40
Circular updates
41
Software Programming
  • Small but specialized instruction set
    • LOAD, ACS
  • Shorter execution time
  • All 3 subprocessors programmed independently
  • 10 ns cycle (100 MHz) in 1990 to get 1.5 Mbps
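
The last figure implies a cycle budget per decoded bit. The arithmetic below is a back-of-the-envelope sketch: the function names are hypothetical, and scaling the clock linearly with throughput assumes the cycles needed per bit stay the same, which the slides do not claim.

```python
def cycles_per_bit(clock_hz, throughput_bps):
    """Clock cycles available for each decoded bit."""
    return clock_hz / throughput_bps


def clock_needed(target_bps, cycles_each_bit):
    """Clock rate needed for a target throughput at a fixed cycles-per-bit."""
    return target_bps * cycles_each_bit


# Figures from the slide: 100 MHz clock, 1.5 Mbps decoded throughput.
BUDGET = cycles_per_bit(100e6, 1.5e6)   # roughly 67 cycles per bit
```

At the same cycles per bit, `clock_needed(100e6, BUDGET)` comes to several GHz, which gives a sense of the gap to 100 Mbps applications.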

42
Conclusions
  • The Viterbi algorithm is important to implement in
    a programmable communication receiver
  • Approaches so far have been co-processor support for
    DSPs, or specialized processors.
  • We are yet to design programmable processors that
    meet real-time requirements for 100 Mbps
    applications.