TMS320C54x DSP processor - PowerPoint PPT Presentation

1
TMS320C54x DSP processor
Shahab adin Rahmanian
2
Outline
  • Introduction
  • Architecture
  • Applications
  • Features
  • Instruction Set and addressing
  • FIR Filtering
  • Accelerating Polynomial Evaluation
  • Numerical Issues
  • Write code in C
  • Conclusion

3
Introduction
2
4
TMS320C54x
  • A fixed-point digital signal processor (DSP) in
    the TMS320 family
  • Low-power DSP: 0.54 mW/MIPS
  • Acceleration for FIR and LMS filtering, code book
    search, polynomial evaluation, Viterbi decoding,
    and the fast Fourier transform

4
5
Some Typical Applications
  • General-purpose
      • Adaptive filtering
      • Digital filtering
      • Fast Fourier transforms
  • Control
      • Disk drive control
      • Laser printer control
      • Robotics control
  • Military
      • Missile guidance
      • Radar processing
      • Secure communication
  • Telecommunications
      • 1200- to 19200-bps modems
      • Adaptive equalizers
      • Cellular telephones
      • Echo cancellation
      • Video conferencing

6
Software Applications
  • Circular Buffers
  • Single-Instruction Repeat (RPT) Loops
  • Extended-Precision Arithmetic
  • Addition and Subtraction
  • Multiplication
  • Division
  • Square Root
  • Floating-Point Arithmetic
  • Application-Oriented Operations
  • Symmetric FIR Filters
  • Adaptive Filtering
  • Viterbi Algorithm for Channel Decoding
  • Fast Fourier Transforms

7
Some key features
  • CPU
      • Advanced multibus architecture with three
        separate 16-bit data buses and one program bus
      • 40-bit arithmetic logic unit (ALU), including a
        40-bit barrel shifter and two independent
        40-bit accumulators
      • 17-bit × 17-bit parallel multiplier coupled to a
        40-bit dedicated adder for non-pipelined
        single-cycle multiply/accumulate (MAC) operation
  • Memory
      • 192K words × 16-bit maximum addressable memory
        space (64K words program, 64K words data, and 64K
        words I/O)
      • 28K words × 16-bit single-access on-chip ROM
        with 8K words configurable as program or data
        memory (C541 only)

8
Some key features
  • On-chip peripherals
      • On-chip phase-locked loop (PLL) clock generator
        with internal oscillator or external clock source
      • Two full-duplex serial ports to support 8- and
        16-bit transfers (C541 only)
      • Time-division multiplexed (TDM) serial port
        (C542/C543 only)
      • One 16-bit timer
  • Speed: 25/20-ns execution time for a single-cycle
    fixed-point instruction (40 MIPS/50 MIPS) with
    5-V power supply

9
C54x Addressing Modes
  • Immediate
      • Operand is part of the instruction
      • Example: ADD #0FFh
  • Absolute
      • Address of operand is part of the instruction
      • Example: LD *(LABEL), A
  • Register
      • Operand is specified in a register
      • Example: READA DATA (data read from address in
        accumulator A)

10
C54x Addressing Modes
  • Direct
      • Address of operand is part of the instruction
        (added to implied memory page)
      • Example: ADD 010h,A
  • Indirect
      • Address of operand is stored in a register
      • Example: ADD *AR1
      • Offset addressing: ADD *AR1(10)
      • Register offset (ar1+ar0): ADD *AR1+0
      • Autoincrement/decrement: ADD *AR1+
      • Bit-reversed addressing: ADD *AR1+0B
      • Circular addressing: ADD *AR1+%

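
Bit-reversed addressing exists mainly to walk FFT data in the scrambled order a radix-2 transform produces. The following is a minimal C sketch of that access order only; it does not model the address-generation hardware, and the function name bit_reverse is an illustrative assumption.

```c
/* Reverse the lowest `bits` bits of `index`.
 * For an 8-point FFT (bits = 3), index 1 (001b) maps to 4 (100b). */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    unsigned i;

    for (i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u); /* append lowest bit */
        index >>= 1;                               /* move to next bit  */
    }
    return reversed;
}
```

Reading an array at bit_reverse(0, 3), bit_reverse(1, 3), ... visits elements 0, 4, 2, 6, 1, 5, 3, 7, which is the order a bit-reversed pointer update would produce for eight points.
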
11
C54X Instructions Set by Category
  • Logical: AND, BIT, BITF, CMPL, CMPM, OR, ROL, ROR,
    SFTA, SFTC, SFTL, XOR
  • Arithmetic: ADD, MAC, MAS, MPY, NEG, SUB, ZERO
  • Program control: B, BC, CALL, CC, IDLE, INTR, NOP,
    RC, RET, RPT, RPTB, RPTZ, TRAP, XC
  • Application-specific: ABS, ABDST, DELAY, EXP, FIRS,
    LMS, MAX, MIN, NORM, POLY, RND, SAT, SQDST, SQUR,
    SQURA, SQURS
  • Data management: LD, MAR, MV(D,K,M,P), ST
  • Notes: CMPL = complement; CMPM = compare memory;
    MAR = modify address register; MAS = multiply and
    subtract
12
Block FIR Filtering
  • y[n] = h[0] x[n] + h[1] x[n-1] + ... + h[N-1]
    x[n-(N-1)]
  • h stored as linear array of N elements (in prog.
    mem.)
  • x stored as circular array of N elements (in data
    mem.)

; Addresses: ar4 = N samples of x, ar5 = h, ar6 = input buffer,
;            ar7 = output buffer
; Modulo addressing prevents need to reinitialize regs each sample
; Moving filter coefficients from program to data memory is not shown
firtask: ld    #firDP,dp           ; initialize data page pointer
         stm   #frameSize-1,brc    ; compute 256 outputs
         rptbd firloop-1
         stm   #N,bk               ; FIR circular buffer size
         ld    *ar6+,a             ; load input value to accumulator a
         stl   a,*ar4+%            ; replace oldest sample with newest
         rptz  a,#(N-1)            ; zero accumulator a, do N taps
         mac   *ar4+0%,*ar5+0%,a   ; one tap, accumulate in a
         sth   a,*ar7+             ; store y[n]
firloop: ret
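
The same per-sample loop can be sketched in portable C to show what the circular buffer and the repeated MAC accomplish. This is an illustration under stated assumptions, not the slide author's code: FIR_TAPS, fir_state, and fir_step are hypothetical names, samples and coefficients are assumed to be Q15, and a 64-bit integer stands in for the 40-bit accumulator.

```c
#include <stdint.h>

#define FIR_TAPS 4  /* hypothetical filter length N */

/* Filter state: the N most recent samples, stored circularly. */
typedef struct {
    int16_t x[FIR_TAPS]; /* circular delay line            */
    int newest;          /* index of the most recent sample */
} fir_state;

/* Consume one Q15 input sample and return one Q15 output sample. */
int16_t fir_step(fir_state *s, const int16_t h[FIR_TAPS], int16_t in)
{
    int64_t acc = 0; /* stands in for accumulator A */
    int idx, k;

    /* Overwrite the oldest sample with the newest (modulo addressing). */
    s->newest = (s->newest + 1) % FIR_TAPS;
    s->x[s->newest] = in;

    /* y[n] = sum over k of h[k] * x[n-k] */
    idx = s->newest;
    for (k = 0; k < FIR_TAPS; k++) {
        acc += (int32_t)h[k] * s->x[idx];
        idx = (idx + FIR_TAPS - 1) % FIR_TAPS; /* step back one sample */
    }

    /* Q15 * Q15 = Q30; shift right by 15 to return to Q15.
     * Assumes >> on a negative value is an arithmetic shift. */
    return (int16_t)(acc >> 15);
}
```

With h = {0.5, 0.25, 0, 0} in Q15 and an input of 0.5 followed by zeros, the outputs are 0.25, 0.125, and then 0, which is the impulse response scaled by the input.
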
13
Accelerating Symmetric FIR Filtering
  • Coefficients in linear phase filters are either
    symmetric or anti-symmetric
  • Symmetric coefficients: using 2 mults and 3 adds
  • y[n] = h[0] x[n] + h[1] x[n-1] + h[1] x[n-2] + h[0]
    x[n-3]
    y[n] = h[0] (x[n] + x[n-3]) + h[1] (x[n-1] +
    x[n-2])
  • Accelerated by FIRS (FIR Symmetric) instruction

x in two circular buffers
h in program memory
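
The saving from symmetry can be checked with a short C sketch that computes the same output both ways. The function names are illustrative assumptions; x[0] is taken to be the newest sample x[n], and the tap count is assumed even.

```c
#include <stdint.h>

/* Direct form: one multiply per tap. */
int64_t fir_direct(const int16_t *h, const int16_t *x, int n_taps)
{
    int64_t acc = 0;
    int k;

    for (k = 0; k < n_taps; k++)
        acc += (int32_t)h[k] * x[k];
    return acc;
}

/* Folded form for symmetric h (h[k] == h[n_taps-1-k], n_taps even):
 * add the two samples that share a coefficient, then multiply once.
 * Only the first half of the coefficients is needed. */
int64_t fir_symmetric(const int16_t *h_half, const int16_t *x, int n_taps)
{
    int64_t acc = 0;
    int k;

    for (k = 0; k < n_taps / 2; k++) {
        int32_t pair = (int32_t)x[k] + x[n_taps - 1 - k];
        acc += (int64_t)h_half[k] * pair;
    }
    return acc;
}
```

For h = {3, 5, 5, 3} and x = {1, 2, 3, 4}, both functions return 40, but the folded form uses two multiplies instead of four. The add-then-multiply-accumulate step in the loop body is the work that a FIRS-style instruction is meant to fuse.
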
14
Accelerating Symmetric FIR Filtering
; Addresses: ar6 = input buffer, ar7 = output buffer
;   ar4 = array with x[n-4], x[n-3], x[n-2], x[n-1] for N = 8
;   ar5 = array with x[n-5], x[n-6], x[n-7], x[n-8] for N = 8
; Modulo addressing prevents need to reinitialize regs each sample
firtask: ld    #firDP,dp              ; initialize data page pointer
         stm   #frameSize-1,brc       ; compute 256 outputs
         rptbd firloop-1
         stm   #N/2,bk                ; FIR circular buffer size
         ld    *ar6+,b                ; load input value to accumulator b
         mvdd  *ar4,*ar5+0%           ; move old x[n-N/2] to new x[n-N/2-1]
         stl   b,*ar4%                ; replace oldest sample with newest
         add   *ar4+0%,*ar5+0%,a      ; a = x[n] + x[n-N/2-1]
         rptz  b,#(N/2-1)             ; zero accumulator b, do N/2 taps
         firs  *ar4+0%,*ar5+0%,coeffs ; b += a * h[i], do next a
         mar   *+ar4(2)%              ; to load the next newest sample
         mar   *ar5+%                 ; position for x[n-N/2] sample
         sth   b,*ar7+
firloop: ret

15
Architecture - FIRS
16
Accelerating Polynomial Evaluation
  • Function approximation and spline interpolation
  • Fast polynomial evaluation (N coefficients)
  • y(x) = c0 + c1 x + c2 x^2 + c3 x^3 (expanded form)
  • y(x) = c0 + x (c1 + x (c2 + x c3)) (Horner's form)
  • POLY reduces 2N cycles using MAC+ADD to N cycles

; ar2 contains address of array [c3 c2 c1 c0]
; poly uses temporary register t for multiplicand x
; first two times poly instruction executes gives
;   1. a = c(3) + x * 0 = c(3);  b = c2
;   2. a = c(2) + x * c(3);      b = c1
    ld   *ar2+,16,b   ; b = c3 << 16
    ld   *ar3,t       ; t = x (ar3 contains addr of x)
    rptz a,#3         ; a = 0, repeat next inst. 4 times
    poly *ar2+        ; a = b + x*a || b = c(i-1) << 16
    sth  a,*ar4       ; store result (ar4 is addr of y)
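
Horner's form maps onto a loop with one multiply and one add per coefficient, which is the step the POLY instruction repeats. The C sketch below uses plain integers rather than the Q15 fractions and rounding of the DSP, and poly_horner is an illustrative name.

```c
#include <stdint.h>

/* Evaluate a polynomial by Horner's rule.
 * c[] holds the coefficients highest order first (c3, c2, c1, c0),
 * matching the array layout described for ar2. */
int32_t poly_horner(const int32_t *c, int n_coeffs, int32_t x)
{
    int32_t a = 0; /* running result, like accumulator a */
    int i;

    for (i = 0; i < n_coeffs; i++)
        a = c[i] + x * a; /* one step: a = b + x*a with b = next coefficient */
    return a;
}
```

For c = {2, -3, 0, 5}, that is y(x) = 2x^3 - 3x^2 + 5, the call poly_horner(c, 4, 2) walks through a = 2, 1, 2, 9 and returns 9.
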
17
Integer Multiplication
  • Integer multiplication yields products larger
    than the inputs, as can be seen with single-digit
    decimal inputs: 9 × 9 = 81
  • Does the user store the lower (1) or upper (8)
    result?
  • Both must be kept, resulting in additional
    resources (two cycles, words of code, and RAM
    locations) to complete the store.
  • Worse, how can the double-sized result be used
    recursively as an input in later calculations,
    given that the multiplier inputs are single-width?

18
Fractional Multiplication
  • Multiplication of fractions yields products that
    never exceed the range of a fraction, as can be
    seen with single-digit decimal fractions as
    inputs: .9 × .9 = .81
  • Don't we still have a double-sized result to
    store?
  • In this case, we can store just the upper result
    (.8)
  • This allows storage of the result with fewer
    resources
  • Results may be used recursively
  • Has accuracy been lost by dropping the lower
    accumulator value?

19
Accuracy vs. Precision
  • Often the programmer wants to retain the fullest
    accuracy of a calculation, so dropping the 16
    LSBs of the result in the previous example seems
    a bad choice.
  • Note the inputs, though: how much accuracy do
    they offer?
  • The product offers double precision, but its
    accuracy is based on the single-width inputs.
  • Thus, storing a single-precision result is not
    only an efficient solution, but also represents
    the limit of the accuracy of the result.
  • The accumulator is double-sized for two reasons:
      • To allow for integer operations, which may
        require the LSBs for the result.
      • So that sum-of-product operations accumulate
        noise at the 32nd bit rather than the 16th bit.
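
The second reason can be made concrete with a small C comparison. This is a sketch under assumptions: inputs are treated as Q15, a 64-bit integer stands in for the wide accumulator, and both function names are illustrative.

```c
#include <stdint.h>

/* Truncate every Q30 product to Q15 before adding:
 * each term loses its low bits, so error builds at the 16th bit. */
int16_t dot_truncate_each(const int16_t *a, const int16_t *b, int n)
{
    int32_t sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += ((int32_t)a[i] * b[i]) >> 15;
    return (int16_t)sum;
}

/* Accumulate full-width products and truncate once at the end. */
int16_t dot_accumulate_wide(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    int i;

    for (i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return (int16_t)(acc >> 15);
}
```

With four products of 100 × 100, each product (10000) truncates to 0 on its own, so the first function returns 0. The wide accumulator keeps 40000 and returns 1 after the single final shift.
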

20
Redundant Sign Bit
  • Multiplication of two signed numbers yields a
    product with two sign bits
  • Extra sign bit causes problems if stored to
    memory as result
      • Wastes space
      • Creates off-size Q
  • Solution: fractional mode bit! When FRCT (mode
    bit in ST1) is set, the multiplier output is
    left-shifted by one
  • For 16-bit C54x: Q15 × Q15 = Q15
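
The effect of the one-bit left shift can be sketched in C. This is an illustration of the arithmetic only, with the assumed name q15_mul; it does not handle the single overflow case of (-1) × (-1), which needs saturation.

```c
#include <stdint.h>

/* Multiply two Q15 fractions and return a Q15 result.
 * The raw 32-bit product is Q30 and carries two sign bits.
 * Doubling it (a left shift by one, as with FRCT set) gives Q31,
 * whose upper 16 bits are a properly aligned Q15 value. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t q30 = (int32_t)a * b; /* two sign bits           */
    int64_t q31 = (int64_t)q30 * 2; /* drop redundant sign bit */

    /* Keep the high word. Assumes >> on a negative value is an
     * arithmetic shift, as on common compilers. */
    return (int16_t)(q31 >> 16);
}
```

For example, q15_mul(16384, 16384) multiplies 0.5 by 0.5 and returns 8192, which is 0.25 in Q15. Without the doubling, the stored high word would be 4096, an off-size Q14 value.
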
21
Accumulation
  • With fractions, we were able to guarantee that no
    multiplicative overflow could occur, i.e., F × F < F.
  • For addition, this rule does not apply, i.e.,
    F + F > F.
  • Therefore, we need additional measures to manage
    the possibility of overflow for accumulation. Two
    general methods apply:
      • Guard bits: the C54x offers an 8-bit extension
        above the high accumulator to allow valid
        representation of the result of up to 256
        summations.
      • Non-gain systems: offer additional criteria that
        allow a simple solution for unlimited-length
        summations.

22
Guard Bits and saturation
  • Guard bits: the C54x offers an 8-bit extension
    above the high accumulator to allow valid
    representation of the result of up to 256
    summations.
  • Saturation (SAT)
      • SAT instruction saturates a value exceeding the
        32-bit range in the selected accumulator
      • Examples: SAT A, SAT B
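
Guard bits and saturation can be modeled together in a few lines of C. In this sketch a 64-bit integer stands in for the 40-bit accumulator (an assumption that gives more headroom than the real 8 guard bits), and sat32 and sum_with_guard_bits are illustrative names.

```c
#include <stdint.h>

/* Clamp a wide accumulator value to the signed 32-bit range,
 * which is the effect the slide describes for SAT. */
int32_t sat32(int64_t acc)
{
    if (acc > INT32_MAX)
        return INT32_MAX;
    if (acc < INT32_MIN)
        return INT32_MIN;
    return (int32_t)acc;
}

/* Sum 32-bit values in a wider accumulator so that intermediate
 * results may exceed 32 bits, then saturate once at the end. */
int32_t sum_with_guard_bits(const int32_t *v, int n)
{
    int64_t acc = 0; /* extra upper bits act as guard bits */
    int i;

    for (i = 0; i < n; i++)
        acc += v[i];
    return sat32(acc);
}
```

Summing INT32_MAX, INT32_MAX, and INT32_MIN passes through an intermediate value that does not fit in 32 bits, yet the final result (INT32_MAX - 1) is exact because the guard bits held the overflow. Summing only the first two values saturates to INT32_MAX.
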
23
Non-gain Systems
  • Many systems can be modeled to have no DC gain
      • Filters with low Q.
      • Any system scaled by its maximum gain value.
  • Input values from A/D converters are
    automatically fractions, if the limits of the A/D
    are presumed to be ±1
  • Coefficient values can similarly be bounded by
    making the largest value the scaling factor for
    all other values.
  • For these systems, it is known that the final
    value of the process is less than or equal to the
    input values.
  • The accumulator can therefore be allowed to
    overflow temporarily, since the final result is
    known to be bounded by ±1.
  • Allows maximum usage of selected A/D and D/A
    converters
  • D/A bits for gain are more expensive than using
    analog components
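
Temporary overflow is harmless here because two's complement addition wraps around: if the true final sum fits in the word, the wrapped sum equals it. A minimal 16-bit C sketch, with the assumed name wrap_add16:

```c
#include <stdint.h>

/* Add two 16-bit values with two's complement wraparound
 * (no saturation), done in unsigned arithmetic so the wrap
 * is well defined in C. */
int16_t wrap_add16(int16_t a, int16_t b)
{
    uint16_t u = (uint16_t)((uint16_t)a + (uint16_t)b);

    /* Map the unsigned bit pattern back to a signed value. */
    if (u >= 0x8000u)
        return (int16_t)((int32_t)u - 65536);
    return (int16_t)u;
}
```

Adding 30000 + 30000 wraps to -5536, which is wrong as an intermediate value. Adding -30000 to that wrapped value gives 30000, the correct final sum, because the overflow and the later subtraction cancel modulo 2^16. Saturating the intermediate value would have broken this property.
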

24
Division
  • The C54x does not have a single-cycle 16-bit
    divide instruction
      • Divide is a rare function in DSP
      • Division hardware is expensive
  • The C54x does have a single-cycle 1-bit divide
    instruction: conditional subtract, or SUBC
      • Preceded by RPT #15, a 16-bit divide is performed
      • Is much faster than without SUBC
  • The SUBC process operates only on unsigned
    operands, thus software must
      • Compare the signs of the input operands
          • If they are alike, plan a positive quotient
          • If they differ, plan to negate (NEG) the
            quotient
      • Strip the signs of the inputs
      • Perform the unsigned division
      • Attach the proper sign based on the comparison
        of the inputs

25
Division Routine
  • B = num × den (tells sign)
  • Strip sign of numerator
  • Strip sign of denominator
  • 16 iterations
  • 1-bit divide
  • If result needs to be negative
  • Invert sign
  • Store negative result
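
The routine outlined above can be sketched in C as a shift-and-conditionally-subtract loop. This models the arithmetic of a 1-bit divide step as commonly described for conditional subtract, not the exact instruction semantics; subc_step and div16 are illustrative names, and the caller is assumed to pass a nonzero denominator and to avoid -32768 / -1.

```c
#include <stdint.h>

/* One conditional-subtract step: try to subtract the denominator
 * aligned 15 bits up; on success shift in a 1, otherwise a 0. */
static uint32_t subc_step(uint32_t acc, uint16_t den)
{
    uint32_t shifted = (uint32_t)den << 15;

    if (acc >= shifted)
        return ((acc - shifted) << 1) + 1u;
    return acc << 1;
}

/* Signed 16-bit divide built from 16 unsigned 1-bit steps. */
int16_t div16(int16_t num, int16_t den)
{
    int negative = (num < 0) != (den < 0); /* compare the signs */
    /* Strip the signs of the inputs. */
    uint16_t unum = (uint16_t)(num < 0 ? -(int32_t)num : (int32_t)num);
    uint16_t uden = (uint16_t)(den < 0 ? -(int32_t)den : (int32_t)den);
    uint32_t acc = unum;
    int32_t q;
    int i;

    for (i = 0; i < 16; i++) /* 16 iterations of the 1-bit divide */
        acc = subc_step(acc, uden);

    q = (int32_t)(acc & 0xFFFFu); /* quotient ends up in the low word */
    return (int16_t)(negative ? -q : q); /* attach the proper sign */
}
```

After the loop, the low 16 bits hold the quotient and the high bits hold the remainder. For 7 / 2 the accumulator ends as 0x10003: quotient 3, remainder 1.
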

26
Rounding
  • Result of multiplication can be rounded for MPY,
    MAC, and MAS operations. This is specified by
    appending an R suffix to the instruction.
  • Example: MAC with rounding is MACR. Rounding
    consists of adding 2^15 to the result and then
    clearing the low accumulator.
  • In a long sum-of-products, only the last MAC
    operation should specify rounding
  • Rounding can also be achieved with a load
    operation
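
The rounding rule (add 2^15, then clear the low 16 bits) is short enough to show directly. In this C sketch a 64-bit integer stands in for the accumulator, and round_acc is an illustrative name.

```c
#include <stdint.h>

/* Round an accumulator value to the nearest multiple of 2^16:
 * add 2^15, then clear the low 16 bits (the low accumulator word). */
int64_t round_acc(int64_t acc)
{
    acc += 0x8000;           /* add 2^15         */
    acc &= ~(int64_t)0xFFFF; /* clear low 16 bits */
    return acc;
}
```

A value of 0x00017FFF rounds down to 0x00010000, while 0x00018000 rounds up to 0x00020000. Storing the high word afterwards yields the rounded 16-bit result.
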

27
Sign Extension (SXM)
28
Write code in C
  • Inline assembly
      • Allows direct access to assembly language from C
      • Useful for operating on components not used by C
      • Note: first column after leading quote is label
        field
  • Long operations should be written in ASM and
    called from C
      • Main C file retains portability
      • Yields more easily maintained structures
      • Eliminates risk of interfering with registers in
        use by C


29
Accessing MMRs from C
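  • Using pointers to access memory-mapped registers
      • Create a pointer and set its value to the
        assigned memory address
      • Read and write to the register as with any other
        pointer
  • Accessing I/O ports from C
      • 1. Create the port
      • 2. Access the port

/* Pointer to the memory-mapped register at address 0x0022 */
volatile unsigned int *SPC_REG = (volatile unsigned int *) 0x0022;
*SPC_REG = 0xC8;            /* write to the register */

ioport unsigned port8000;   /* 1. create the port */
x = port8000;               /* 2. read from the port */
port8000 = y;               /*    write to the port */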
30
Summary and Conclusion
  • C54x is a conventional digital signal processor
      • Separate data/program buses (3 reads and 1
        write/cycle)
      • Extended-precision accumulators
      • Single-cycle multiply-accumulate
      • Saturation and wraparound arithmetic
      • Bit-reversed and circular addressing modes
  • C54x has instructions to accelerate algorithms
      • Communications: FIR and LMS filtering, Viterbi
        decoding
      • Speech coding: vector distances for code book
        search
      • Interpolation: polynomial evaluation

31
References
  • [1] Texas Instruments, TMS320C54x DSP Design
    Workshop, May 1997
  • [2] TMS320C54x User's Guide
  • [3] www.ti.com
  • [4] Prof. Brian L. Evans, Signal and Image
    Processing on the TMS320C54x DSP
  • [5] TMS320C54x Assembly Language Tools