TMS320C54x DSP processor - PowerPoint PPT Presentation

About This Presentation

Title:

TMS320C54x DSP processor

Description:

All data are copy rights of their respective authors as listed in the references ... poly uses temporary register t for multiplicand x ; ... – PowerPoint PPT presentation

Slides: 32

Provided by: eceU3

Transcript and Presenter's Notes

1
TMS320C54x DSP processor
Shahab adin Rahmanian
2
Outline

Introduction
Architecture
Applications
Features
Instruction Set and addressing
FIR Filtering
Accelerating Polynomial Evaluation
Numerical Issues
Write code in C
Conclusion

3
Introduction
4
TMS320C54x

A fixed-point digital signal processor (DSP) in the TMS320 family
Low-power DSP: 0.54 mW/MIPS
Acceleration for FIR and LMS filtering, code book search, polynomial evaluation, Viterbi decoding, and fast Fourier transforms

5
Some Typical Applications

General-Purpose
Adaptive filtering
Digital filtering
Fast Fourier transforms
Control
Disk drive control
Laser printer control
Robotics control
Military
Missile guidance
Radar processing
Secure communication
Telecommunications
1200- to 19200-bps modems
Adaptive equalizers
Cellular telephones
Echo cancellation
Video conferencing

6
Software Applications

Circular Buffers
Single-Instruction Repeat (RPT) Loops
Extended-Precision Arithmetic
Addition and Subtraction
Multiplication
Division
Square Root
Floating-Point Arithmetic
Application-Oriented Operations
Symmetric FIR Filters
Adaptive Filtering
Viterbi Algorithm for Channel Decoding
Fast Fourier Transforms

7
Some key features

CPU
Advanced multibus architecture with three separate 16-bit data buses and one program bus
40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators
17-bit × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for non-pipelined single-cycle multiply/accumulate (MAC) operation
Memory
192K words × 16-bit maximum addressable memory space (64K words program, 64K words data, and 64K words I/O)
28K words × 16-bit single-access on-chip ROM with 8K words configurable as program or data memory (C541 only)

8
Some key features

On-chip peripherals
On-chip phase-locked loop (PLL) clock generator
with internal oscillator or external clock source
Two full-duplex serial ports to support 8- and 16-bit transfers (C541 only)
Time-division multiplexed (TDM) serial port (C542/C543 only)
One 16-bit timer
Speed: 25/20-ns execution time for a single-cycle fixed-point instruction (40 MIPS/50 MIPS) with 5-V power supply

9
C54x Addressing Modes

Immediate
Operand is part of the instruction
Absolute
Address of operand is part of the instruction
Register
Operand is specified in a register

ADD #0FFh
LD *(LABEL), A
READA DATA (data read from address in accumulator A)
10
C54x Addressing Modes

Direct
Address of operand is part of the instruction
(added to implied memory page)
Indirect
Address of operand is stored in a register
Offset addressing
Register offset (AR1 + AR0)
Autoincrement/decrement
Bit reversed addressing
Circular addressing

ADD 010h, A
ADD *AR1
ADD *AR1(10)
ADD *AR1+0
ADD *AR1+
ADD *AR1+B
ADD *AR1+0B
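As a rough illustration of the last two indirect modes, the Python sketch below models how an address moves under circular and bit-reversed addressing. The function names and the explicit `base` and `size` parameters are assumptions made for illustration; on the C54x the buffer size comes from the BK register and the address arithmetic is done in hardware, not in software like this.

```python
def circular_step(addr, step, base, size):
    """Advance addr by step, wrapping inside the buffer [base, base + size)."""
    return base + (addr - base + step) % size


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversed_order(bits):
    """Index order in which a 2**bits-point FFT buffer is visited."""
    return [bit_reverse(i, bits) for i in range(1 << bits)]


if __name__ == "__main__":
    print(bit_reversed_order(3))
    print(hex(circular_step(0x103, 1, 0x100, 4)))
```

Circular stepping is what lets a delay line be reused without copying samples, and the bit-reversed order is the one needed to unscramble radix-2 FFT data.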
11
C54x Instruction Set by Category
Logical: AND, BIT, BITF, CMPL, CMPM, OR, ROL, ROR, SFTA, SFTC, SFTL, XOR
Arithmetic: ADD, MAC, MAS, MPY, NEG, SUB, ZERO
Program Control: B, BC, CALL, CC, IDLE, INTR, NOP, RC, RET, RPT, RPTB, RPTZ, TRAP, XC
Application Specific: ABS, ABDST, DELAY, EXP, FIRS, LMS, MAX, MIN, NORM, POLY, RND, SAT, SQDST, SQUR, SQURA, SQURS
Data Management: LD, MAR, MV(D,K,M,P), ST
Notes: CMPL = complement; MAR = modify address register; CMPM = compare memory; MAS = multiply and subtract
12
Block FIR Filtering

$y[n] = h_0 x[n] + h_1 x[n-1] + \dots + h_{N-1} x[n-(N-1)]$
h stored as linear array of N elements (in program memory)
x stored as circular array of N elements (in data memory)

; Addresses: a4 h, a5 N samples of x, a6 input buffer, a7 output buffer
; Modulo addressing prevents need to reinitialize regs each sample
; Moving filter coefficients from program to data memory is not shown
firtask: ld    #firDP,dp          ; initialize data page pointer
         stm   #frameSize-1,brc   ; compute 256 outputs
         rptbd firloop-1
         stm   #N,bk              ; FIR circular buffer size
         ld    *ar6+,a            ; load input value to accumulator a
         stl   a,*ar4+%           ; replace oldest sample with newest
         rptz  a,#(N-1)           ; zero accumulator a, do N taps
         mac   *ar4+0%,*ar5+0%,a  ; one tap, accumulate in a
         sth   a,*ar7+            ; store y[n]
firloop: ret
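The structure of the block FIR loop can be sketched in Python as follows. This is a minimal model, not a translation of the assembly: the function name `fir_block` and the list-based delay line are assumptions, and plain Python integers stand in for the accumulator.

```python
def fir_block(h, samples):
    """Filter `samples` with taps `h`, keeping x in a circular buffer."""
    n_taps = len(h)
    x = [0] * n_taps            # circular delay line (size plays the role of BK)
    newest = 0                  # slot where the next input sample is written
    out = []
    for s in samples:
        x[newest] = s           # replace oldest sample with newest
        acc = 0                 # zero the accumulator before the tap loop
        idx = newest
        for k in range(n_taps):         # one multiply-accumulate per tap
            acc += h[k] * x[idx]        # h[k] * x[n-k]
            idx = (idx - 1) % n_taps    # modulo addressing, no pointer reset
        out.append(acc)
        newest = (newest + 1) % n_taps
    return out


if __name__ == "__main__":
    # An impulse at the input reproduces the coefficients at the output.
    print(fir_block([1, 2, 3], [1, 0, 0, 0]))
```

Because the index wraps modulo the buffer length, nothing has to be shifted or reinitialized between output samples, which is the point made in the listing's comments.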
13
Accelerating Symmetric FIR Filtering

Coefficients in linear phase filters are either
symmetric or anti-symmetric
Symmetric coefficients: using 2 multiplies and 3 adds
$y[n] = h_0 x[n] + h_1 x[n-1] + h_1 x[n-2] + h_0 x[n-3]$
$y[n] = h_0 (x[n] + x[n-3]) + h_1 (x[n-1] + x[n-2])$
Accelerated by FIRS (FIR Symmetric) instruction

x in two circular buffers
h in program memory
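The folding trick can be checked numerically with a short Python sketch. The names `fir_direct` and `fir_symmetric` are hypothetical, and `x_hist` is assumed to hold x[n], x[n-1], ..., x[n-N+1] with N even.

```python
def fir_direct(h, x_hist):
    """One output sample using all N coefficients."""
    return sum(hk * xk for hk, xk in zip(h, x_hist))


def fir_symmetric(h_half, x_hist):
    """One output sample using only the first N/2 coefficients."""
    n = len(x_hist)
    acc = 0
    for i, hi in enumerate(h_half):
        # Add the two samples that share a coefficient, then multiply once.
        acc += hi * (x_hist[i] + x_hist[n - 1 - i])
    return acc


if __name__ == "__main__":
    x_hist = [1, 2, 3, 4]
    print(fir_direct([2, 5, 5, 2], x_hist), fir_symmetric([2, 5], x_hist))
```

Both calls return the same value, but the symmetric form performs half as many multiplications, which is the work a FIRS-style add-then-multiply step does per tap pair.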
14
Accelerating Symmetric FIR Filtering

; Addresses: a6 input buffer, a7 output buffer
; a4 array with x[n-4], x[n-3], x[n-2], x[n-1] for N = 8
; a5 array with x[n-5], x[n-6], x[n-7], x[n-8] for N = 8
; Modulo addressing prevents need to reinitialize regs each sample
firtask: ld    #firDP,dp              ; initialize data page pointer
         stm   #frameSize-1,brc       ; compute 256 outputs
         rptbd firloop-1
         stm   #N/2,bk                ; FIR circular buffer size
         ld    *ar6+,b                ; load input value to accumulator b
         mvdd  *ar4,*ar5+0%           ; move old x[n-N/2] to new x[n-N/2-1]
         stl   b,*ar4                 ; replace oldest sample with newest
         add   *ar4+0%,*ar5+0%,a      ; a = x[n] + x[n-N/2-1]
         rptz  b,#(N/2-1)             ; zero accumulator b, do N/2-1 taps
         firs  *ar4+0%,*ar5+0%,coeffs ; b += a * h[i], do next a
         mar   *+ar4(2)%              ; to load the next newest sample
         mar   *ar5+%                 ; position for x[n-N/2] sample
         sth   b,*ar7+
firloop: ret

15
Architecture - FIRS
16
Accelerating Polynomial Evaluation

Function approximation and spline interpolation
Fast polynomial evaluation (N coefficients)
$y(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3$ (expanded form)
$y(x) = c_0 + x (c_1 + x (c_2 + x c_3))$ (Horner's form)
POLY reduces 2N cycles using MAC + ADD to N cycles

; ar2 contains address of array [c3 c2 c1 c0]
; poly uses temporary register t for multiplicand x
; first two times poly instruction executes gives
;   1. a = c(3) + x * 0 = c(3);   b = c2
;   2. a = c(2) + x * c(3);       b = c1
         ld    *ar2+,16,b   ; b = c3 << 16
         ld    *ar3,t       ; t = x (ar3 contains addr of x)
         rptz  a,#3         ; a = 0, repeat next inst. 4 times
         poly  *ar2+        ; a = b + x*a || b = c(i-1) << 16
         sth   a,*ar4       ; store result (ar4 is addr of y)
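The recurrence behind the repeated instruction is Horner's rule, which a short Python sketch makes concrete. The function names are illustrative assumptions, and coefficients are passed highest order first, matching the array layout c3 c2 c1 c0.

```python
def horner(coeffs_high_to_low, x):
    """Evaluate a polynomial with one multiply and one add per coefficient."""
    a = 0
    for c in coeffs_high_to_low:
        a = c + x * a        # same update as a = b + x*a with b = next coefficient
    return a


def expanded(coeffs_high_to_low, x):
    """Reference evaluation using explicit powers of x."""
    degree = len(coeffs_high_to_low) - 1
    return sum(c * x ** (degree - i) for i, c in enumerate(coeffs_high_to_low))


if __name__ == "__main__":
    c = [3, 2, 1, 5]         # c3, c2, c1, c0
    print(horner(c, 2), expanded(c, 2))
```

With c3 = 3, c2 = 2, c1 = 1, c0 = 5 and x = 2, the intermediate values of `a` are 3, 8, 17 and 39, one per loop pass.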
17
Integer Multiplication

Integer multiplication yields products larger
than the inputs, as can be seen in the example
below, using single digit decimal values as
inputs

Does the user store the lower (1) or upper (8) result?
Both must be kept, resulting in additional resources (two cycles, words of code, and RAM locations) to complete the store.
Worse, how can the double-sized result be used recursively as an input in later calculations, given that the multiplier inputs are single-width?

18
Fractional Multiplication

Multiplication of fractions yields products that
never exceed the range of a fraction, as can be
seen in the example below, using single digit
decimal fractions as inputs

Don't we still have a double-sized result to store?
In this case, we can store just the upper result
(.8)
This allows storage of result with fewer
resources
Results may be used recursively
Has accuracy been lost by dropping the lower
accumulator value?

19
Accuracy vs. Precision

Often the programmer wants to retain the fullest
accuracy of a calculation, thus dropping the 16
LSBs of the result in the previous example seems
a bad choice.
Note the inputs, though: how much accuracy do they offer?
The product offers double precision but its
accuracy is based on the single-width inputs.
Thus, storing a single precision result is not
only an efficient solution, but represents the
limit of the accuracy of the result.
The accumulator is double-sized for two reasons:
To allow for integer operations, which would
possibly require the LSBs for the result.
So that sum-of-product operations will generate
accumulative noise at the 32nd vs. the 16th bit.
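The second reason can be illustrated with a small Python experiment. It is only a sketch under stated assumptions: Q15 integers are modeled as plain Python ints, truncation stands in for whatever rounding a real system would use, and the function names are hypothetical.

```python
def sum_products_wide(pairs):
    """Accumulate full-width products, then reduce to Q15 once at the end."""
    acc = 0
    for a, b in pairs:
        acc += a * b             # Q30 products summed at full width
    return acc >> 15             # single truncation after the last product


def sum_products_narrow(pairs):
    """Reduce every product to Q15 before adding it."""
    acc = 0
    for a, b in pairs:
        acc += (a * b) >> 15     # truncation error added on every term
    return acc


if __name__ == "__main__":
    pairs = [(181, 181)] * 100   # each product is 32761, just under one Q15 LSB
    print(sum_products_wide(pairs), sum_products_narrow(pairs))
```

In this deliberately extreme case every individual product truncates to zero, so the narrow sum is 0 while the wide sum is 99.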

20
Redundant Sign Bit
Multiplication of two signed numbers yields a product with two sign bits
Extra sign bit causes problems if stored to memory as result
Wastes space
Creates off-size Q
Solution: fractional mode bit! When FRCT (mode bit in ST1) is set, the multiplier output is left-shifted by one
For 16-bit C54x: Q15 × Q15 = Q15
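The effect of the one-bit shift can be seen in a small Python model. The names `multiply_q15` and `high_word` are assumptions, and the `frct` argument merely imitates the mode bit; the real multiplier does this in hardware.

```python
def multiply_q15(a, b, frct):
    """Product of two Q15 integers, left-shifted by one when frct is set."""
    product = a * b          # Q30: the top two bits of 32 both carry the sign
    if frct:
        product <<= 1        # Q31: a single sign bit remains
    return product


def high_word(acc):
    """Upper 16 bits of a 32-bit value, the part a high-word store keeps."""
    return acc >> 16


if __name__ == "__main__":
    half = 1 << 14           # 0.5 in Q15
    print(high_word(multiply_q15(half, half, False)))  # 0.25 in Q14
    print(high_word(multiply_q15(half, half, True)))   # 0.25 in Q15
```

Without the shift the stored word is in Q14 and would be misread as 0.125 by Q15 code. The one input pair that does not fit after shifting is -1 × -1, since +1 is not representable in Q15.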
21
Accumulation

With fractions, we were able to guarantee that no multiplicative overflow could occur, i.e. F × F < F.
For addition, this rule does not apply, i.e. F + F > F.
Therefore, we need additional measures to manage the possibility of overflow for accumulation. Two general methods apply:
Guard bits: the C54x offers an 8-bit extension above the high accumulator to allow valid representation of the result of up to 256 summations.
Non-gain systems: offer additional criteria that allow a simple solution for unlimited-length summations.

22
Guard Bits and saturation

Guard bits: the C54x offers an 8-bit extension above the high accumulator to allow valid representation of the result of up to 256 summations.

Saturation (SAT)
SAT instruction saturates a value exceeding the 32-bit range in the selected accumulator

SAT A
SAT B
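The 256-summation figure and the effect of saturation can be checked with a Python model of a 40-bit accumulator. This is a sketch only: `wrap40` and `sat32` are hypothetical helper names, and Python integers are masked by hand to imitate fixed register widths.

```python
ACC_BITS = 40


def wrap40(value):
    """Keep value in the signed 40-bit range, wrapping on overflow."""
    value &= (1 << ACC_BITS) - 1
    if value >= 1 << (ACC_BITS - 1):
        value -= 1 << ACC_BITS
    return value


def sat32(acc):
    """Clamp an accumulator value to the signed 32-bit range."""
    return max(-(1 << 31), min((1 << 31) - 1, acc))


if __name__ == "__main__":
    biggest = (1 << 31) - 1              # largest positive 32-bit value
    acc = 0
    for _ in range(256):
        acc = wrap40(acc + biggest)      # guard bits absorb the growth
    print(acc == 256 * biggest, hex(sat32(acc)))
```

After 256 worst-case additions the sum is still exact in 40 bits; one more addition would wrap, which is where saturating the final result to 32 bits becomes the safer outcome.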
23
Non-gain Systems

Many systems can be modeled to have no DC gain:
Filters with low Q
Any system scaled by its maximum gain value
Input values from A/D converters are automatically fractions, if the limits of the A/D are presumed to be +/-1
Coefficient values can similarly be bounded by making the largest value the scaling factor for all other values.
For these systems, it is known that the final value of the process is less than or equal to the input values.
The accumulator can therefore be allowed to overflow temporarily, since the final result is known to be bounded by +/-1.
Allows maximum usage of selected A/D and D/A converters
D/A bits for gain are more expensive than using analog components
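Why a temporary overflow is harmless can be shown with two's-complement wraparound in Python. This sketch assumes wrapping (not saturating) arithmetic and uses a 16-bit width purely to keep the numbers small; `wrap` and `wrapping_sum` are illustrative names.

```python
def wrap(value, bits):
    """Reduce value to a signed two's-complement number of the given width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def wrapping_sum(terms, bits):
    """Add terms one by one, wrapping after every addition."""
    acc = 0
    for t in terms:
        acc = wrap(acc + t, bits)
    return acc


if __name__ == "__main__":
    # 30000 + 30000 overflows 16 bits, but the final sum 20000 fits.
    print(wrapping_sum([30000, 30000, -40000], 16))
```

The intermediate value wraps to -5536, yet the final result is the correct 20000, because modular arithmetic only cares whether the end result is in range.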

24
Division

The C54x does not have a single-cycle 16-bit divide instruction
Divide is a rare function in DSP
Division hardware is expensive
The C54x does have a single-cycle 1-bit divide instruction: conditional subtract, or SUBC
Preceded by RPT #15, a 16-bit divide is performed
Is much faster than without SUBC
The SUBC process operates only on unsigned operands, thus software must:
Compare the signs of the input operands
If they are alike, plan a positive quotient
If they differ, plan to negate (NEG) the quotient
Strip the signs of the inputs
Perform the unsigned division
Attach the proper sign based on the comparison of the inputs

25
Division Routine

B = num × den (tells sign)
Strip sign of numerator
Strip sign of denominator
16 iterations
1-bit divide
If result needs to be negative
Invert sign
Store negative result
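The routine's steps can be modeled in Python as below. The conditional-subtract step is written from the usual description of this kind of instruction (subtract the divisor shifted left by 15; if the result is non-negative, shift it left and set the low bit, otherwise just shift the accumulator); treat that as an assumption to verify against the TI instruction reference. All function names are hypothetical.

```python
def subc_step(acc, divisor):
    """One conditional-subtract step, producing one quotient bit."""
    diff = acc - (divisor << 15)
    if diff >= 0:
        return (diff << 1) + 1   # subtraction succeeded: quotient bit 1
    return acc << 1              # subtraction failed: quotient bit 0


def divide_unsigned16(num, den):
    """16 steps of subc_step; returns (quotient, remainder)."""
    acc = num
    for _ in range(16):          # one step per quotient bit
        acc = subc_step(acc, den)
    return acc & 0xFFFF, acc >> 16


def divide_signed(num, den):
    """Strip signs, divide unsigned, then attach the planned sign."""
    negative = (num < 0) != (den < 0)      # signs differ: negate the quotient
    quotient, _ = divide_unsigned16(abs(num), abs(den))
    return -quotient if negative else quotient


if __name__ == "__main__":
    print(divide_unsigned16(100, 7), divide_signed(-100, 7))
```

The quotient ends up in the low 16 bits and the remainder in the bits above them. A zero divisor is not handled here and is left to the caller.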

26
Rounding

Result of multiplication can be rounded for MPY, MAC, and MAS operations. This is specified by appending an R suffix to the instruction.
Example: MAC with rounding is MACR. Rounding consists of adding 2^15 to the result and then clearing the low accumulator.
In a long sum-of-products, only the last MAC operation should specify rounding.
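The rounding rule is simple enough to state as a one-line Python function. This is a sketch with a hypothetical name, using an unbounded Python integer in place of the accumulator.

```python
def round_accumulator(acc):
    """Add 2**15, then clear the low 16 bits."""
    return (acc + (1 << 15)) & ~0xFFFF


if __name__ == "__main__":
    print(hex(round_accumulator(0x00018000)))   # low word at one half: rounds up
    print(hex(round_accumulator(0x00017FFF)))   # just below one half: rounds down
```

Because the low word is cleared, rounding an intermediate sum would throw away bits that later products still need, which is why only the last operation in a sum-of-products should round.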