Outline DSP Processors and Hardware

Slides: 27
Provided by: sec199

1
Outline DSP Processors and Hardware
  • Week 1
  • Overview of DSP Processors
  • First, second and third generation
  • Week 2
  • Implementation FIR and IIR filters
  • Case study using 56000 series
  • Week 3
  • Finite word length effects

2
Resources
  • http://www.tech.plym.ac.uk/spmc/elec327/home.html
  • Examples of code
  • Instructions
  • Student Portal
  • Coming soon!

3
Why use a DSP - economics
YES
  • Low power requirements
  • Low real estate (licensed core)
  • Very fast and repetitive arithmetic (up to
    8000 MIPS)
  • High volume, low cost
  • Dedicated DSP instruction set
  • MAC(D), Circ. Buff., bit-rev
  • Real-time interrupt driven software
  • Instructions dedicated to specific applications
  • Audio, Digital Comms, Filtering, FFT
NO
  • Large memory requirements
  • Rapid application development, low TTM
  • Prototype design
  • General purpose computing
  • GUI, database, gaming
  • Cooling, large PSU available
  • Require RTOS facilities
  • Networking, queues, pipes, semaphores etc.

4
Floating point (full IEEE floating point)
PROS
  • Shorter development time
  • Easy to perform complex operations
  • Translates to high level language more easily
  • Dynamic range is very high
  • Design ambition is much higher
CONS
  • Higher power
  • Higher cost
  • Larger silicon real-estate
  • Potentially lower precision arithmetic

Fixed point (INTEGER arithmetic only)
PROS
  • Low power, high speed
  • Lower cost, suits high volume
  • Lower silicon real-estate
  • High precision arithmetic (with careful design)
  • Good for specific apps.
CONS
  • Much longer development time
  • Dynamic range problems
  • Some operations are very inefficient (divide)
  • Difficult to perform non-standard DSP
  • Design ambition is reduced

5
Motorola 56000 series
  • Key features
  • Generation 2 DSP
  • Dual Harvard Architecture
  • Separate Program and Data Memory
  • Data and coefficients fetched in 1 clock cycle
  • Separate X and Y data Memory
  • Custom DSP instructions
  • Supports circular addressing
  • Zero overhead DO loops / Repeats
  • MAC with shift
  • 24 bit word length; 56 bit accumulator
  • A favourite with audio

6
Example of a new generation DSP: TMS320C64x
fixed point DSP
  • 1 ns instruction time (1 GHz)
  • SIMD / VLIW
  • 8 x 32bit instructions / cycle
  • 8 x independent function units
  • Six ALUs
  • Single 32 / Dual 16 / Quad 8
  • Two multipliers
  • Quad 16x16
  • 8 of 8x8
  • Up to 8000 MIPS (peak)

7
Revision: the FIR filter
  • Filter length = N
  • h(k), k = 0..N-1, are the filter coefficients
  • x(n) are the input data samples
  • y(n) are the output data samples

acc = 0.0;                          // Set accumulator to zero
x(0) = new_sample;                  // New sample into buffer
for (unsigned k = 0; k < N; k++)    // MAC
    acc = acc + x(k) * h(k);
for (unsigned k = N-1; k > 0; k--)  // Shift data
    x(k) = x(k-1);

Challenge: can you write a more efficient
version with just one for-loop?
8
Illustrative Problem 1/2
  • n = 1: taps = 0.25 0 0,
    y(n) = 0.125
  • n = 2: taps = 0.5 0.25 0,
    y(n) = 0.4375
  • n = 3: taps = 0.25 0.5 0.25,
    y(n) = 0.5625
  • n = 4: taps = -0.25 0.25 0.5,
    y(n) = 0.1875
  • n = 5: taps = 0.75 -0.25 0.25,
    y(n) = 0.25
  • See MATLAB code handout fir1.m
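
The illustrative problem can be checked with a small C version of the two-loop FIR routine from the revision slide. This is only a sketch: the function name fir_step is made up here, and the coefficients h = 0.5, 0.75, 0.25 are not stated on this slide. They are inferred from the tap and output values listed above (they also match the coefficients stored in Y memory later in the deck).

```c
#include <stddef.h>

/* One FIR output sample, following the slide's steps in order.
 * x[] is the tap (delay line) buffer with x[0] the newest sample,
 * h[] holds the n coefficients. Names are illustrative. */
double fir_step(double x[], const double h[], size_t n, double new_sample)
{
    double acc = 0.0;                    /* set accumulator to zero */

    if (n == 0) {
        return acc;                      /* nothing to do for an empty filter */
    }

    x[0] = new_sample;                   /* new sample into buffer */

    for (size_t k = 0; k < n; k++) {     /* MAC: acc = acc + x(k)*h(k) */
        acc += x[k] * h[k];
    }

    for (size_t k = n - 1; k > 0; k--) { /* shift data: x(k) = x(k-1) */
        x[k] = x[k - 1];
    }

    return acc;
}
```

Feeding the inputs 0.25, 0.5, 0.25, -0.25, 0.75 into a zeroed three-tap buffer should give the five y(n) values listed in the problem. All of these numbers are exact binary fractions, so a floating point version reproduces them without rounding.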

9
Illustrative Problem 2/2: FIR filter output
10
Implementation on DSP Processors
  • Special instructions
  • Multiply Accumulate
  • MAC: multiply accumulate (with shift)
  • MACD: MAC + move data
  • MACR: MAC + round result
  • Zero-overhead Repeat
  • REP
  • Modulo Arithmetic
  • Circular Addressing

11
56000 FIR Code example (See notes)
  • MOVE #XDATA,R0 ; Address register R0 = address of
    data samples
  • MOVE #COEFF,R4 ; Address register R4 = address of
    coefficients
  • MOVE #N-1,M0 ; Address modifier register M0 =
    buffer/modulo size
  • MOVEP X:INPUT,X:(R0)
  • Move (Peripheral) data into X memory at address
    pointed to by R0
  • CLR A X:(R0)+,X0 Y:(R4)+,Y0 ; Accumulator A = 0,
    set up X0 and Y0 registers for first use
  • REP #N-1 ; Repeat next instruction N-1 times
  • MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+,Y0
  • R4 = address of coefficients
  • MACR X0,Y0,A (R0)-
  • R0 is decremented to position for next run, R4
    is automatically correct. We could now jump to
    the MOVEP instruction if we so wished.
  • There is an error in the notes for 2004

12
Quantise coefficients
  • 2's complement arithmetic
  • For 16 bit coefficients we call this 1.15 fixed
    point arithmetic
  • 0..65535 (0..FFFFh)
  • Total range is 0..(2^16)-1
  • 1..(2^15)-1 are positive values
  • 2^15..(2^16)-1 are the negative values (msb is
    the sign)
  • Example
  • 0.123 => (2^15) * 0.123 = 4030
  • Task: Convert this back to fractional arithmetic
  • Given that 65536 = 0 (modulo 2^16),
    -0.123 => 65536 - 4030 = 61506
  • Task: calculate 0.5 - 0.123 using 16 and 24 bit
    fixed point arithmetic and compare
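
A minimal C sketch of the conversion above, for checking hand calculations. The function names (to_fixed, twos_pattern, from_fixed) are illustrative, and rounding to nearest is an assumption; the slide does not say whether the scaled value is rounded or truncated (both give 4030 for 0.123).

```c
#include <stdint.h>

/* Quantise x, assumed to lie in [-1, 1], to 1.(bits-1) format.
 * Valid for 2 <= bits <= 24. Returns the signed integer value. */
int32_t to_fixed(double x, unsigned bits)
{
    const int32_t one = (int32_t)1 << (bits - 1);   /* 2^(bits-1) */
    double scaled = x * (double)one;
    /* round to nearest (assumption: the slide does not specify) */
    int32_t v = (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
    if (v > one - 1) v = one - 1;                   /* +1.0 is not representable */
    if (v < -one) v = -one;
    return v;
}

/* The same value as an unsigned 2's complement bit pattern of width bits. */
uint32_t twos_pattern(int32_t v, unsigned bits)
{
    return (uint32_t)v & (((uint32_t)1 << bits) - 1u);
}

/* Convert a signed 1.(bits-1) value back to a fraction. */
double from_fixed(int32_t v, unsigned bits)
{
    return (double)v / (double)((int32_t)1 << (bits - 1));
}
```

With these helpers, to_fixed(0.123, 16) gives 4030 and twos_pattern(to_fixed(-0.123, 16), 16) gives 61506, matching the slide. Calling to_fixed with bits set to 16 and then 24 is one way to compare the two word lengths in the final task.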

13
Store in Y memory
  • org y:0 ; Start address
  • COEFF dc 0.5 ; 24-bit filter
    coefficients
  • dc 0.75
  • dc 0.25

Challenge: Convert the above coefficients to
24-bit fixed point values
14
Multiply and Accumulate
  • MAC <24-bit reg>, <24-bit reg>, dest (A or B)
  • Can be X with X, X with Y or Y with Y
  • Example: A = A + 0.123 * 0.456
  • Start with A = 0
  • Convert 0.123 and 0.456 to 1.23 format
  • Multiply: the result => 2.46 format
  • Shift left => 1.47 format
  • Add to result
  • If finished, round off result and store.
  • Convert back to fractional arithmetic to check
    result
  • TASK
  • Repeat this twice; check result.
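
The MAC steps listed above can be traced in C. This is a sketch under assumptions: a 64-bit integer stands in for the 56-bit accumulator, the function names are invented, and the final rounding is plain round-half-up, which may differ from the rounding mode the DSP actually uses.

```c
#include <stdint.h>

/* acc = acc + a*b, where a and b are signed 1.23 values.
 * int64_t stands in for the 56-bit accumulator (assumption). */
int64_t mac_q23(int64_t acc, int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* 2.46 format */
    product *= 2;                                /* shift left one bit: 1.47 format */
    return acc + product;                        /* add to result */
}

/* Round the 1.47 accumulator back to a signed 1.23 value. */
int32_t round_to_q23(int64_t acc)
{
    const int64_t lsb = (int64_t)1 << 24;        /* weight of one 1.23 LSB in 1.47 */
    int64_t biased = acc + lsb / 2;              /* add half an LSB */
    int64_t q = biased / lsb;
    if (biased % lsb < 0) {
        q--;                                     /* floor division for negative values */
    }
    if (q > 8388607) q = 8388607;                /* saturate to the 24-bit range */
    if (q < -8388608) q = -8388608;
    return (int32_t)q;
}
```

For the example, 0.123 and 0.456 quantise to 1031799 and 3825205 in 1.23 format. One MAC from A = 0 rounds to 470500, which is about 0.056088 after dividing by 2^23. Repeating the MAC a second time rounds to 941001, slightly more than twice the single result because the accumulator keeps the low-order bits until the final rounding.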

15
56K Repeat Instruction
  • REP N
  • Repeat the next instruction N times
  • N is a 16 bit value that is copied into the loop
    counter register (LC)
  • This cannot be interrupted
  • Fetch of next instruction is performed once
  • Cannot repeat itself or any type of jump
    instruction.

16
Modulo Arithmetic
  • Comes from the remainder of integer division
  • 0/4 = 0, 0 % 4 = 0
  • 1/4 = 0, 1 % 4 = 1
  • 2/4 = 0, 2 % 4 = 2
  • 3/4 = 0, 3 % 4 = 3
  • 4/4 = 1, 4 % 4 = 0
  • 5/4 = 1, 5 % 4 = 1
  • 6/4 = 1, 6 % 4 = 2
  • 7/4 = 1, 7 % 4 = 3
  • 8/4 = 2, 8 % 4 = 0
  • Similar logic is used for the Address Registers
    Rn so they wrap around to the start address
  • TASK: What is 50 % 8?
  • 50 / 8 = 6.25
  • 6 * 8 = 48, difference = 2 = modulus
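
The same wrap-around can be written as index arithmetic in C. The helper names below are hypothetical, and an array index stands in for an address register Rn; this only illustrates the arithmetic, not how the DSP hardware implements it.

```c
#include <stddef.h>

/* Step a buffer index forward by one, wrapping to 0 at the end.
 * buffer_len must be greater than zero. */
size_t circ_advance(size_t index, size_t buffer_len)
{
    return (index + 1) % buffer_len;
}

/* Step a buffer index back by one, wrapping to the last element.
 * Adding buffer_len first avoids going below zero with unsigned types. */
size_t circ_retreat(size_t index, size_t buffer_len)
{
    return (index + buffer_len - 1) % buffer_len;
}
```

For a buffer of length 4, advancing from index 3 gives 0 and retreating from index 0 gives 3, which is the behaviour described for the address registers. The TASK value follows the same rule: 50 % 8 evaluates to 2 in C.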

17
IIR Filters
  • Advantages
  • Efficiency
  • Delay
  • Disadvantages
  • Requires high precision arithmetic
  • Round-off Error sensitivity
  • Phase distortion
  • More complex to implement
  • Overflows

18
IIR Filters fixed point
  • Structure is important
  • Noise
  • Stability
  • Efficiency
  • Cascaded solutions are most common
  • Sources of noise
  • Summations => round-off error
  • Error feedback
  • Higher precision arithmetic
  • Not always effective
  • Complexity increases
  • See handout for self learning tutorial

19
IIR Structures
  • Each IIR should be of no more than 2nd order!
  • Cascaded 2nd order sections
  • Ordering is important to reduce round-off error
  • Parallel 2nd order sections
  • Partial fraction expansion; ordering not an
    issue
  • More storage and computation
  • (care is needed with repeated poles)
  • Canonic 2nd order
  • Less memory required, simple to implement
  • More noise sources
  • Direct 2nd order
  • More difficult to implement
  • Fewer noise sources; generally a better choice
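
As an illustration of cascaded 2nd order sections, here is a floating point sketch of a direct-form section and a cascade of them. The struct layout, the names and the sign convention (denominator 1 + a1 z^-1 + a2 z^-2) are assumptions made for this example; the slides do not give a transfer function, and a fixed point version would add the quantisation and overflow handling discussed elsewhere in the deck.

```c
#include <stddef.h>

/* One direct-form 2nd order section:
 * y(n) = b0 x(n) + b1 x(n-1) + b2 x(n-2) - a1 y(n-1) - a2 y(n-2) */
typedef struct {
    double b0, b1, b2;   /* numerator coefficients */
    double a1, a2;       /* denominator coefficients (a0 assumed to be 1) */
    double x1, x2;       /* previous two inputs */
    double y1, y2;       /* previous two outputs */
} biquad;

double biquad_step(biquad *s, double x)
{
    double y = s->b0 * x + s->b1 * s->x1 + s->b2 * s->x2
             - s->a1 * s->y1 - s->a2 * s->y2;
    s->x2 = s->x1;       /* shift input history */
    s->x1 = x;
    s->y2 = s->y1;       /* shift output history */
    s->y1 = y;
    return y;
}

/* Cascade: the output of each section feeds the next one. */
double cascade_step(biquad sections[], size_t count, double x)
{
    for (size_t i = 0; i < count; i++) {
        x = biquad_step(&sections[i], x);
    }
    return x;
}
```

The order of the entries in the sections array is the ordering the slide refers to: in fixed point it changes how round-off noise accumulates, even though the ideal transfer function is the same for any order.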

20
Hardware constraints
  • Memory
  • Typically between 16Kb and 128Kb internal memory
  • Word-length
  • Precision of arithmetic
  • Overheads for extended precision
  • Speed
  • Number of clock cycles to execute
  • E.g. a simple FIR filter program takes 12 + N - 1
    cycles to complete, where N is the filter length
    (N = 139). The clock speed is 10 MHz.
  • What is the maximum sampling rate?
  • If the sampling rate is 100 kHz, what is the
    maximum filter length N?
  • Delay in actual filter
  • Remember! Delay of a signal is not just due to
    clock cycles; there is inherent delay in the FIR
    / IIR filter itself, (N-1)/2. What will be the
    total delay in the example above?

21
Finite word length effects 1
  • Coefficient Quantization
  • Coefficients will be quantised to N bits,
    Q1.(N-1) format
  • This will effectively move the poles and zeros to
    preferred positions
  • Could go unstable!
  • Deviates from desired response
  • Coefficients > 1 must be scaled

22
Finite word length effects 2
  • Over-flow error
  • Result of summations over-flowing
  • FIR and IIR can suffer from this
  • IIR must never overflow as it will possibly go
    unstable!
  • FIR can overflow if it then underflows; also,
    SAT instructions exist
  • Controlled with normalization (scaling) or with
    large accumulators
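
One way to read the FIR remark above is the wrap-around property of 2's complement addition: an intermediate sum may overflow, and the final result is still correct if a later term brings it back into range. The C sketch below contrasts that with saturating addition in 1.15 format. The function names are invented, and this reading of the slide is an assumption.

```c
#include <stdint.h>

/* 16-bit 2's complement addition that wraps around on overflow. */
int16_t add_wrap_q15(int16_t a, int16_t b)
{
    uint32_t sum = ((uint32_t)(uint16_t)a + (uint32_t)(uint16_t)b) & 0xFFFFu;
    /* map the unsigned bit pattern back to a signed value */
    return (sum >= 0x8000u) ? (int16_t)((int32_t)sum - 65536) : (int16_t)sum;
}

/* 16-bit addition that clips at the most positive or negative value. */
int16_t add_sat_q15(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > 32767) sum = 32767;
    if (sum < -32768) sum = -32768;
    return (int16_t)sum;
}
```

Take 0.75 + 0.5 - 0.5 in 1.15 format, that is 24576 + 16384 - 16384. With wrapping, the intermediate sum overflows to -24576, and subtracting 16384 wraps back to 24576, the correct answer. With saturation the intermediate sum clips to 32767 and the final result is 16383, so the error is bounded but permanent.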

23
Finite word length effects 3
  • Round off error
  • IIR only
  • Introduced with each SUM
  • Seriously affects performance of IIR
  • Tackled with either
  • High precision arithmetic
  • Error feedback (ESS)

24
Error feedback - ESS
  • Critical to the success of fixed point IIR
    filters
  • (Although a bit beyond the scope of the course!)
  • Round-off error is fed back into the filter
  • Dramatically improves performance
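
To make the idea concrete, here is a sketch of first-order error feedback on a one-pole filter y(n) = x(n) + a y(n-1) in 1.15 fixed point. It shows one common form of the technique and is not necessarily the scheme taught in the course. The names are hypothetical, the output is truncated rather than rounded, and overflow is not handled.

```c
#include <stdint.h>

typedef struct {
    int32_t y1;    /* previous output, 1.15 format */
    int32_t err;   /* low-order bits discarded at the previous step */
} ess_state;

/* One step of y(n) = x(n) + a*y(n-1), with x, a and y in 1.15 format.
 * If use_feedback is non-zero, the bits truncated at the previous
 * step are added back into the sum before truncating again. */
int32_t iir1_step(ess_state *s, int32_t x, int32_t a, int use_feedback)
{
    const int64_t scale = 32768;                 /* 2^15 */
    int64_t acc = (int64_t)x * scale + (int64_t)a * s->y1;
    if (use_feedback) {
        acc += s->err;                           /* feed the round-off error back */
    }
    int64_t y = acc / scale;                     /* truncate to 1.15 ... */
    if (acc % scale < 0) {
        y--;                                     /* ... using floor for negatives */
    }
    s->err = (int32_t)(acc - y * scale);         /* bits lost by the truncation */
    s->y1 = (int32_t)y;
    return (int32_t)y;
}
```

With a = 0.5 (16384) and a constant input of one LSB, the ideal steady-state output is two LSBs. Without feedback the truncated filter settles at 1; with feedback it reaches 2 after three steps, because the discarded half LSB is carried into the next sum instead of being lost.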

25
Drills
  • DSP Overhead (delay and cycles)
  • Fixed point arithmetic
  • Coefficient quantisation
  • FIR (MAC and shift)
  • IIR
  • Round off errors

26
Drill 1: Overhead calculation
  • MOVE #XDATA,R0 ; Address register R0 = address of
    data samples
  • MOVE #COEFF,R4 ; Address register R4 = address of
    coefficients
  • MOVE #N-1,M0 ; Address modifier register M0 =
    buffer/modulo size
  • MOVEP X:INPUT,X:(R0)
  • Move (Peripheral) data into X memory at address
    pointed to by R0
  • CLR A X:(R0)+,X0 Y:(R4)+,Y0 ; Accumulator A = 0,
    set up X0 and Y0 registers for first use
  • REP #N-1 ; Repeat next instruction N-1 times
  • MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+,Y0
  • R4 = address of coefficients
  • MACR X0,Y0,A (R0)-
  • This code is then repeated. There is some
    additional overhead for servicing interrupt
    routines, storing and writing results, serial
    ports etc. (not shown), so assume this code takes
    a total of 45 + (N-1) instructions to complete.
  • Sketch a diagram and describe how the circular
    buffer works
  • If the clock frequency Fclk = 20 MHz, and N = 129,
  • what is the maximum sampling rate?
  • What is the real-time delay through the system?
  • Draw a diagram to illustrate your answer
  • What are the possible sources of error? Can this
    go unstable?