Outline DSP Processors and Hardware

Slides: 27

Provided by: sec199

Transcript and Presenter's Notes

1
Outline DSP Processors and Hardware

Week 1
Overview of DSP Processors
First, second and third generation
Week 2
Implementation of FIR and IIR filters
Case study using 56000 series
Week 3
Finite word length effects

2
Resources

http://www.tech.plym.ac.uk/spmc/elec327/home.html
Examples of code
Instructions
Student Portal
Coming soon!

3
Why use a DSP - economics
YES

Low power requirements
Low real estate (licensed core)
Very fast and repetitive arithmetic (up to 8000 MIPS)
High volume, low cost
Dedicated DSP instruction set: MAC(D), circular buffers, bit-reversed addressing
Real-time interrupt-driven software
Instructions dedicated to specific applications: audio, digital comms, filtering, FFT

NO

Large memory requirements
Rapid application development, low TTM
Prototype design
General purpose computing: GUI, database, gaming
Cooling and a large PSU available
Require RTOS facilities: networking, queues, pipes, semaphores, etc.

4
Floating point (full IEEE floating point)

PROS
Shorter development time
Easy to perform complex operations
Translates to high level language more easily
Dynamic range is very high
Design ambition is much higher

CONS
Higher power
Higher cost
Larger silicon real estate
Potentially lower precision arithmetic

Fixed point (integer arithmetic only)

PROS
Low power, high speed
Lower cost, suits high volume
Lower silicon real estate
High precision arithmetic (with careful design)
Good for specific apps

CONS
Much longer development time
Dynamic range problems
Some operations are very inefficient (divide)
Difficult to perform non-standard DSP
Design ambition is reduced

5
Motorola 56000 series

Key features
Generation 2 DSP
Dual Harvard Architecture
Separate Program and Data Memory
Data and coefficients fetched in 1 clock cycle
Separate X and Y data Memory
Custom DSP instructions
Supports circular addressing
Zero overhead DO loops / Repeats
MAC with shift
24-bit word length; 56-bit accumulator
A favourite with audio

6
Example of a new generation DSP: TMS320C64x fixed point DSP

1 ns instruction time (1 GHz)
SIMD / VLIW
8 x 32bit instructions / cycle
8 x independent function units
Six ALUs
Single 32 / Dual 16 / Quad 8
Two multipliers
Quad 16x16
8 of 8x8
Up to 8000 MIPS (peak)

7
Revision: the FIR filter

Filter length = N
h(k), k = 0..N-1, are the filter coefficients
x(n) are the input data samples
y(n) are the output data samples

acc = 0.0;                               // Set accumulator to zero
x[0] = new_sample;                       // New sample into buffer
for (unsigned k = 0; k < N; k++)         // MAC
    acc = acc + x[k] * h[k];
for (unsigned k = N - 1; k > 0; k--)     // Shift data
    x[k] = x[k - 1];

Challenge: can you write a more efficient version with just one for-loop?
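
A minimal Python sketch of the loop on this slide, assuming the usual FIR sum y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(N-1)x(n-N+1), with x[0] holding the newest sample. fir_two_loops follows the slide; fir_one_loop is one possible answer to the challenge, not necessarily the intended one. Both function names are illustrative.

```python
def fir_two_loops(x, h, new_sample):
    """Slide version: a MAC loop followed by a separate shift loop.

    x is the delay line (x[0] is the newest sample) and is updated in place.
    """
    n = len(h)
    x[0] = new_sample
    acc = 0.0
    for k in range(n):                 # MAC
        acc += x[k] * h[k]
    for k in range(n - 1, 0, -1):      # shift data towards the oldest slot
        x[k] = x[k - 1]
    return acc


def fir_one_loop(x, h, new_sample):
    """Single-loop variant: walk from the oldest tap to the newest,
    using each sample in the MAC and then overwriting it with its neighbour."""
    n = len(h)
    x[0] = new_sample
    acc = 0.0
    for k in range(n - 1, 0, -1):
        acc += x[k] * h[k]             # use the old value first
        x[k] = x[k - 1]                # then shift
    acc += x[0] * h[0]                 # newest sample
    return acc
```

Both versions leave the delay line in the same state, so they can be swapped freely; the single loop simply avoids a second pass over the buffer.
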
8
Illustrative Problem 1/2

n = 1: taps = 0.25, 0, 0         y(n) = 0.125
n = 2: taps = 0.5, 0.25, 0       y(n) = 0.4375
n = 3: taps = 0.25, 0.5, 0.25    y(n) = 0.5625
n = 4: taps = -0.25, 0.25, 0.5   y(n) = 0.1875
n = 5: taps = 0.75, -0.25, 0.25  y(n) = 0.25
See MATLAB code handout fir1.m
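
The table can be reproduced with a few lines of Python. This is only a sketch and is not the fir1.m handout. It assumes the coefficients are h = 0.5, 0.75, 0.25: they are not stated on this slide, but they are the values that reproduce every row. "taps" is taken to mean the contents of the delay line, newest sample first.

```python
def fir_table(samples, h):
    """Return one (taps, y) row per input sample for a direct-form FIR filter."""
    taps = [0.0] * len(h)
    rows = []
    for s in samples:
        taps = [s] + taps[:-1]                       # newest sample enters on the left
        y = sum(t * c for t, c in zip(taps, h))      # y(n) = sum of h(k) * x(n-k)
        rows.append((list(taps), y))
    return rows


if __name__ == "__main__":
    h = [0.5, 0.75, 0.25]                            # assumed coefficients
    x = [0.25, 0.5, 0.25, -0.25, 0.75]               # inputs read off the first tap of each row
    for n, (taps, y) in enumerate(fir_table(x, h), start=1):
        print(f"n = {n}: taps = {taps}  y(n) = {y}")
```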

9
Illustrative Problem 2/2: FIR filter output
10
Implementation on DSP Processors

Special instructions
Multiply Accumulate
MAC: multiply accumulate (with shift)
MACD: MAC and move data
MACR: MAC and round result
Zero-overhead Repeat
REP
Modulo Arithmetic
Circular Addressing

11
56000 FIR Code example (See notes)

        MOVE    #XDATA,R0                       ;Address register R0 = address of data samples
        MOVE    #COEFF,R4                       ;Address register R4 = address of coefficients
        MOVE    #N-1,M0                         ;Address modifier register M0 = buffer/modulo size
        MOVEP   X:INPUT,X:(R0)                  ;Move (peripheral) data into X memory at address pointed to by R0
        CLR     A       X:(R0)+,X0  Y:(R4)+,Y0  ;Accumulator A=0, set up X0 and Y0 registers for first use
        REP     #N-1                            ;Repeat next instruction N-1 times
        MAC     X0,Y0,A X:(R0)+,X0  Y:(R4)+,Y0  ;R4 = address of coefficients
        MACR    X0,Y0,A (R0)-
R0 is decremented to position for the next run; R4 is automatically correct. We could now jump to the MOVEP instruction if we so wished.
There is an error in the notes for 2004.
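
To see why R0 ends up in the right place, the listing can be mimicked in Python. This is a behavioural sketch only: it assumes post-increment addressing on both pointers and modulo-N wrap-around on R0 and R4 (the slide sets up M0 only, so the wrap on R4 is an assumption). It uses floats in place of 24-bit words and leaves out the rounding done by MACR. The class name CircularFir is illustrative.

```python
class CircularFir:
    """Python stand-in for the 56000 FIR kernel with circular (modulo) addressing."""

    def __init__(self, h):
        self.h = list(h)               # Y memory: coefficients
        self.n = len(h)
        self.x = [0.0] * self.n        # X memory: circular data buffer
        self.r0 = 0                    # data pointer, persists between samples
        self.r4 = 0                    # coefficient pointer

    def step(self, sample):
        n = self.n
        self.x[self.r0] = sample                           # MOVEP: overwrite the oldest sample
        acc = 0.0                                          # CLR A, with parallel loads
        x0, y0 = self.x[self.r0], self.h[self.r4]
        self.r0, self.r4 = (self.r0 + 1) % n, (self.r4 + 1) % n
        for _ in range(n - 1):                             # REP #N-1
            acc += x0 * y0                                 # MAC, with parallel loads
            x0, y0 = self.x[self.r0], self.h[self.r4]
            self.r0, self.r4 = (self.r0 + 1) % n, (self.r4 + 1) % n
        acc += x0 * y0                                     # MACR (rounding omitted)
        self.r0 = (self.r0 - 1) % n                        # (R0)-
        return acc
```

After N post-increments both pointers have wrapped back to where they started. The final decrement then leaves R0 on the oldest sample, which is exactly the slot the next input should overwrite, so no data is ever shifted.
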

12
Quantise coefficients

2's complement arithmetic
For 16-bit coefficients we call this 1.15 fixed point arithmetic
0..65535 (0..FFFFh)
Total range is 0..(2^16)-1
1..(2^15)-1 are the positive values
2^15..(2^16)-1 are the negative values (msb is the sign)
Example
0.123 => (2^15) * 0.123 = 4030 (rounded)
Task: Convert this back to fractional arithmetic
Given that 65536 = 0 (the 16-bit word wraps around), -0.123 => 65536 (0) - 4030 = 61506
Task: calculate 0.5 - 0.123 using 16-bit and 24-bit fixed point arithmetic and compare
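
The conversions on this slide can be checked with a short Python sketch. The helper names (to_fixed, from_fixed, fixed_sub) are illustrative, and round-to-nearest is assumed for the quantisation step.

```python
def to_fixed(value, bits):
    """Quantise a fraction in [-1, 1) to a 2's complement word of the given width,
    i.e. Q1.(bits-1) format. The result is returned as an unsigned integer."""
    scaled = round(value * (1 << (bits - 1)))
    return scaled & ((1 << bits) - 1)            # negatives wrap into the upper half


def from_fixed(word, bits):
    """Convert an unsigned 2's complement word back to a fraction."""
    if word >= 1 << (bits - 1):                  # msb set: negative value
        word -= 1 << bits
    return word / (1 << (bits - 1))


def fixed_sub(a, b, bits):
    """Compute a - b on quantised operands, wrapping to the word width."""
    return (to_fixed(a, bits) - to_fixed(b, bits)) & ((1 << bits) - 1)


if __name__ == "__main__":
    print(to_fixed(0.123, 16), to_fixed(-0.123, 16))
    for bits in (16, 24):
        result = from_fixed(fixed_sub(0.5, 0.123, bits), bits)
        print(bits, result, abs(result - 0.377))
```

Running it prints the 16-bit words for 0.123 and -0.123, followed by the result and error of 0.5 - 0.123 at each word length.
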

13
Store in Y memory

        org     y:0             ;Start address
COEFF   dc      0.5             ;24-bit filter coefficients
        dc      0.75
        dc      0.25

Challenge: Convert the above coefficients to 24-bit fixed point values
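
One way to check an answer to the challenge, sketched in Python. It assumes 1.23 format with round-to-nearest and prints the words in Motorola-style $ hexadecimal; q23_hex is an illustrative name.

```python
def q23_hex(value):
    """Return the 24-bit 2's complement word for a fraction in [-1, 1) as $XXXXXX."""
    word = round(value * (1 << 23)) & 0xFFFFFF
    return f"${word:06X}"


if __name__ == "__main__":
    for c in (0.5, 0.75, 0.25):
        print(c, q23_hex(c))
```
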
14
Multiply and Accumulate

MAC <24-bit reg>,<24-bit reg>,<dest: A or B>
Can be X with X, X with Y or Y with Y
Example: A = A + 0.123 * 0.456
Start with A = 0
Convert 0.123 and 0.456 to 1.23 format
Multiply: the result is in 2.46 format
Shift left: the result is in 1.47 format
Add to result
If finished, round off result and store.
Convert back to fractional arithmetic to check result
TASK
Repeat this twice and check the result.
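
The steps on this slide can be followed with plain Python integers. This is a sketch under two assumptions: the accumulator is modelled as an unbounded integer rather than a 56-bit register, and the final rounding is simple round-half-up (the real device's rounding mode may differ). Function names are illustrative.

```python
def to_q23(value):
    """Convert a fraction in [-1, 1) to a signed 1.23 integer."""
    return round(value * (1 << 23))


def mac(acc, a, b):
    """acc + a*b, with a and b in 1.23 format and acc in 1.47 format."""
    product = a * b            # 1.23 x 1.23 gives 2.46 format
    product <<= 1              # shift left once to get 1.47 format
    return acc + product


def round_and_store(acc):
    """Round a 1.47 accumulator value to a 1.23 word (keep the upper 24 bits)."""
    return (acc + (1 << 23)) >> 24


def q23_to_float(word):
    return word / (1 << 23)


if __name__ == "__main__":
    a, b = to_q23(0.123), to_q23(0.456)
    acc = mac(0, a, b)                       # start with A = 0
    print(q23_to_float(round_and_store(acc)), 0.123 * 0.456)
    acc = mac(acc, a, b)                     # the TASK: repeat a second time
    print(q23_to_float(round_and_store(acc)), 2 * 0.123 * 0.456)
```
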

15
56K Repeat Instruction

REP N
Repeat the next instruction N times
N is a 16-bit value that is copied into the loop counter register (LC)
This cannot be interrupted
Fetch of next instruction is performed once
Cannot repeat itself or any type of jump instruction.

16
Modulo Arithmetic

Comes from the remainder of integer division
0/4 = 0, 0%4 = 0
1/4 = 0, 1%4 = 1
2/4 = 0, 2%4 = 2
3/4 = 0, 3%4 = 3
4/4 = 1, 4%4 = 0
5/4 = 1, 5%4 = 1
6/4 = 1, 6%4 = 2
7/4 = 1, 7%4 = 3
8/4 = 2, 8%4 = 0
Similar logic is used for the Address Registers
Rn so they wrap around to the start address
TASK: What is 50 % 8?
50 / 8 = 6.25
6 * 8 = 48, difference = 2 = modulus
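
The same arithmetic in Python, together with a sketch of how an address register might wrap inside a circular buffer. The function next_address and the example addresses are hypothetical, and the model ignores any alignment rules the real hardware places on the buffer base.

```python
def int_div_and_mod(a, m):
    """Return (integer quotient, remainder) of a divided by m."""
    return a // m, a % m


def next_address(r, base, modulus, step=1):
    """Advance address r by step inside the circular buffer [base, base + modulus)."""
    return base + (r - base + step) % modulus


if __name__ == "__main__":
    for k in range(9):
        print(k, int_div_and_mod(k, 4))
    print(int_div_and_mod(50, 8))
    r = 0x100
    for _ in range(6):                       # walk a 4-word buffer starting at 0x100
        print(hex(r))
        r = next_address(r, 0x100, 4)
```
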

17
IIR Filters

Advantages
Efficiency
Delay
Disadvantages
Requires high precision arithmetic
Round-off Error sensitivity
Phase distortion
More complex to implement
Overflows

18
IIR Filters fixed point

Structure is important
Noise
Stability
Efficiency
Cascaded solutions are most common
Sources of noise
Summations => round-off error
Error feedback
Higher precision arithmetic
Not always effective
Complexity increases
See handout for self learning tutorial

19
IIR Structures

Each IIR should be of no more than 2nd order!
Cascaded 2nd order sections
Ordering is important to reduce round-off error
Parallel 2nd order sections
Partial fraction expansion; ordering not an issue
More storage and computation
(care is needed with repeated poles)
Canonic 2nd order
Less memory required, simple to implement
More noise sources
Direct 2nd order
More difficult to implement
Fewer noise sources; generally a better choice
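
A minimal floating-point sketch of cascaded 2nd order sections in Python. Each section here is written in direct form (one summation per output sample); this is an illustrative choice and not taken from the handout. The class and function names, and the coefficients in the usage lines, are hypothetical.

```python
class Biquad:
    """Direct-form 2nd order section:
    y(n) = b0*x(n) + b1*x(n-1) + b2*x(n-2) - a1*y(n-1) - a2*y(n-2)."""

    def __init__(self, b, a):
        self.b0, self.b1, self.b2 = b
        self.a1, self.a2 = a
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x):
        y = (self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
             - self.a1 * self.y1 - self.a2 * self.y2)
        self.x2, self.x1 = self.x1, x          # update input history
        self.y2, self.y1 = self.y1, y          # update output history
        return y


def cascade(sections, samples):
    """Pass each sample through every section in order."""
    out = []
    for x in samples:
        for section in sections:
            x = section.step(x)
        out.append(x)
    return out


if __name__ == "__main__":
    sections = [Biquad((1.0, 0.0, 0.0), (-0.5, 0.0)),
                Biquad((1.0, 0.0, 0.0), (-0.5, 0.0))]
    print(cascade(sections, [1.0, 0.0, 0.0, 0.0]))   # impulse response
```

In fixed point, the order of the list passed to cascade is where the ordering of sections would matter, since each section's round-off error is filtered by all the sections after it.
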

20
Hardware constraints

Memory
Typically between 16Kb and 128Kb internal memory
Word-length
Precision of arithmetic
Overheads for extended precision
Speed
Number of clock cycles to execute
E.g. a simple FIR filter program takes 12 + N - 1 cycles to complete, where N is the filter length, N = 139. The clock speed is 10 MHz.
What is the maximum sampling rate?
If the sampling rate is 100 kHz, what is the maximum filter length N?
Delay in actual filter
Remember! Delay of a signal is not just due to clock cycles; there is inherent delay in the FIR / IIR filter itself, (N-1)/2 samples. What will be the total delay in the example above?
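
A small Python sketch for checking answers to the questions above. It assumes the cycle count reads as 12 + N - 1, that one cycle is one clock period, and that the total delay is the filter delay of (N-1)/2 sample periods plus the time to compute one output. All function names are illustrative.

```python
def cycles_per_sample(n, overhead=12):
    """Cycles needed to produce one output for filter length n."""
    return overhead + n - 1


def max_sample_rate(f_clk_hz, n, overhead=12):
    """Highest sampling rate (Hz) at which one output fits in one sample period."""
    return f_clk_hz / cycles_per_sample(n, overhead)


def max_filter_length(f_clk_hz, f_s_hz, overhead=12):
    """Longest filter whose cycle count fits in one sample period (integer Hz inputs)."""
    budget = f_clk_hz // f_s_hz            # whole cycles available per sample
    return budget - overhead + 1


def total_delay_s(f_clk_hz, f_s_hz, n, overhead=12):
    """Filter delay of (n-1)/2 samples plus the compute time for one output."""
    filter_delay = (n - 1) / 2 / f_s_hz
    compute_delay = cycles_per_sample(n, overhead) / f_clk_hz
    return filter_delay + compute_delay


if __name__ == "__main__":
    fs = max_sample_rate(10_000_000, 139)
    print(cycles_per_sample(139), fs)
    print(max_filter_length(10_000_000, 100_000))
    print(total_delay_s(10_000_000, fs, 139))
```
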

21
Finite word length effects 1

Coefficient Quantization
Coefficients will be quantised to N bits, Q1.(N-1)
This will effectively move the poles and zeros to
preferred positions
Could go unstable!
Deviates from desired response
Coefficients > 1 must be scaled
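
A Python sketch of how quantisation moves the poles of a 2nd order section with denominator 1 + a1 z^-1 + a2 z^-2. The coefficient values and the 8-bit word length are made up for illustration. Because |a1| > 1, it is stored as a1/2 and doubled when used, which is one common way of scaling and is assumed here.

```python
import cmath


def quantise(c, bits):
    """Round c to Q1.(bits-1) and saturate to the representable range."""
    scale = 1 << (bits - 1)
    q = max(-scale, min(scale - 1, round(c * scale)))
    return q / scale


def poles(a1, a2):
    """Roots of z^2 + a1*z + a2 = 0."""
    d = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + d) / 2, (-a1 - d) / 2


if __name__ == "__main__":
    a1, a2 = -1.8, 0.9                       # hypothetical design values
    a1_q = 2 * quantise(a1 / 2, 8)           # a1 stored as a1/2 because |a1| > 1
    a2_q = quantise(a2, 8)
    for label, p in (("ideal", poles(a1, a2)), ("8-bit", poles(a1_q, a2_q))):
        print(label, p[0], abs(p[0]))        # pole position and radius
```

A pole radius that reaches 1 or more after quantisation would mean the filter has gone unstable.
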

22
Finite word length effects 2

Over-flow error
Result of summations over-flowing
FIR and IIR can suffer from this
IIR must never overflow as it will possibly go
unstable!
FIR can tolerate an intermediate overflow if a later underflow brings the sum back into range
SAT instructions exist
Controlled with normalization (scaling) or with
large accumulators
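
The remark about FIR overflow relies on 2's complement wrap-around, which the Python sketch below illustrates with 16-bit words. The numbers are chosen for illustration only. Saturating the intermediate sum is shown alongside for comparison: it limits the size of an error but does not give the same cancellation.

```python
def wrap16(v):
    """Wrap an integer into the signed 16-bit range, as 2's complement hardware does."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def sat16(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, v))


def accumulate(terms, limiter):
    acc = 0
    for t in terms:
        acc = limiter(acc + t)
    return acc


if __name__ == "__main__":
    # 0.75 + 0.5 - 0.5 in Q1.15: the running sum passes through 1.25, which overflows
    terms = [24576, 16384, -16384]
    print(accumulate(terms, wrap16) / 32768)     # wrap-around: overflow cancels
    print(accumulate(terms, sat16) / 32768)      # saturation: final value is wrong
```
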

23
Finite word length effects 3

Round off error
IIR only
Introduced with each SUM
Seriously affects performance of IIR
Tackled with either
High precision arithmetic
Error feedback (ESS)

24
Error feedback - ESS

Critical to the success of fixed point IIR
filters
(Although a bit beyond the scope of the course!)
Round-off error is fed back into the filter
Dramatically improves performance
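
A toy Python illustration of the idea, using a first-order IIR y(n) = a*y(n-1) + x(n) whose state is truncated to a coarse word. The 8-bit step, the coefficient and the input are made up, and feeding back only the previous error is the simplest possible scheme; it is a sketch of the principle and not the scheme from the course handout.

```python
import math

STEP = 1 / 128                      # LSB of an assumed 8-bit Q1.7 state


def truncate(v):
    """Quantise by truncation to a multiple of STEP."""
    return math.floor(v / STEP) * STEP


def run(a, x, n, error_feedback):
    """Run the quantised filter for n samples of constant input x."""
    y = e = 0.0
    out = []
    for _ in range(n):
        v = a * y + x + (e if error_feedback else 0.0)   # add back the last round-off error
        y = truncate(v)
        e = v - y                                        # error made by this quantisation
        out.append(y)
    return out


def tail_mean(values, count=1000):
    tail = values[-count:]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    ideal = 0.01 / (1 - 0.95)                            # steady-state output, 0.2
    print(ideal)
    print(tail_mean(run(0.95, 0.01, 2000, error_feedback=False)))
    print(tail_mean(run(0.95, 0.01, 2000, error_feedback=True)))
```

Without feedback the output sticks well below the ideal value, because each truncation discards part of the small increment. With the error fed back, the average output settles close to the ideal.
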

25
Drills

DSP Overhead (delay and cycles)
Fixed point arithmetic
Coefficient quantisation
FIR (MAC and shift)
IIR
Round off errors

26
Drill 1 Overhead calculation

        MOVE    #XDATA,R0                       ;Address register R0 = address of data samples
        MOVE    #COEFF,R4                       ;Address register R4 = address of coefficients
        MOVE    #N-1,M0                         ;Address modifier register M0 = buffer/modulo size
        MOVEP   X:INPUT,X:(R0)                  ;Move (peripheral) data into X memory at address pointed to by R0
        CLR     A       X:(R0)+,X0  Y:(R4)+,Y0  ;Accumulator A=0, set up X0 and Y0 registers for first use
        REP     #N-1                            ;Repeat next instruction N-1 times
        MAC     X0,Y0,A X:(R0)+,X0  Y:(R4)+,Y0  ;R4 = address of coefficients
        MACR    X0,Y0,A (R0)-
This code is then repeated. There is some additional overhead for servicing interrupt routines, storing and writing results, serial ports etc. (not shown), so assume this code takes a total of 45 + (N-1) instructions to complete.
Sketch a diagram and describe how the circular buffer works.
If the clock frequency Fclk = 20 MHz and N = 129, what is the maximum sampling rate?
What is the real-time delay through the system? Draw a diagram to illustrate your answer.
What are the possible sources of error? Can this go unstable?