Title: Blackfin Compute Unit
A comparison of DSP Architectures: Blackfin
ADSP-BFXXX Compute Unit
- Based on an ENEL619.23 white paper prepared by Darrell Anklovitch
Overview
- Architecture Overview
- Register Map
- ALU features and sample instructions
- Multiplier features and sample instructions
- Shifter features and sample instructions
References
- ADSP-BF535 Blackfin Processor Hardware Reference, Rev 2, April 2004, Analog Devices. Section 2.
- Blackfin Processor Instruction Set Reference, Rev 2, May 2003, Analog Devices. Sections 8, 10, 14 and 15.
- A number of the figures in this presentation are based on figures found in the ADSP-BF535 Blackfin Processor Hardware Reference.
ADSP-2106x Core Architecture
Register File and COMPUTE Units
- Key issues
- 5 data paths FROM COMPUTE units
- 5 data paths TO COMPUTE units
- Highly parallel operations UNDER THE RIGHT CONDITIONS
BF533 Memory Accesses
Under the right conditions, all of the following can happen at the same time:
- 4 memory accesses: a 64 bit instruction fetch, 2 x 32 bit data loads and a 32 bit data store
- PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations
- PLUS background DMA activity
Compute Unit Architecture
- 2 Multipliers
- Register File
- 1 set of Video ALUs
- 1 Shifter
- 2 ALUs
Register File
- DATA REGISTER SYNTAX
- R0, R1, etc. refer to 32 bit registers
- R0.L refers to the low 16 bits of the 32 bit register R0
- R0.H refers to the high 16 bits of the 32 bit register R0
- ACCUMULATOR SYNTAX
- A0.L -> low 16 bits
- A0.H -> next 16 bits
- A0.W -> least significant 32 bit word
- A0.X -> most significant 8 bit extension
SHARC: 16 32-bit data registers, integer and float. There is a pair of SHARC accumulator registers too.
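The accumulator fields above can be pictured as bit slices of one 40 bit value. The C sketch below is illustrative only: it assumes the accumulator is held in the low 40 bits of a `uint64_t`, and the type `accum40` and the `acc_*` helpers are hypothetical names, not part of any Analog Devices API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 40 bit accumulator (A0 or A1):
   only bits 39..0 of 'bits' are meaningful. */
typedef struct {
    uint64_t bits;
} accum40;

/* A0.L: low 16 bits (bits 15..0) */
static uint16_t acc_l(accum40 a) { return (uint16_t)(a.bits & 0xFFFFu); }

/* A0.H: next 16 bits (bits 31..16) */
static uint16_t acc_h(accum40 a) { return (uint16_t)((a.bits >> 16) & 0xFFFFu); }

/* A0.W: least significant 32 bit word (bits 31..0) */
static uint32_t acc_w(accum40 a) { return (uint32_t)(a.bits & 0xFFFFFFFFu); }

/* A0.X: most significant 8 bit extension (bits 39..32) */
static uint8_t acc_x(accum40 a) { return (uint8_t)((a.bits >> 32) & 0xFFu); }
```

For example, an accumulator holding 0x12345678AB gives X = 0x12, W = 0x345678AB, H = 0x3456 and L = 0x78AB.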
ALU Data Flow
- 2 x 32 bit paths to dual Multiplier/ALU units
- 2 x 32 bit paths back to register file
Sample instructions
Blackfin:
- R0 = R1 + R2;
- R0.L = R1.L + R2.H;
- R0 = R1 +|+ R2; means R0.L = R1.L + R2.L in parallel with R0.H = R1.H + R2.H
SHARC:
- R0 = R1 + R2;
- Closest equivalent to the dual 16 bit add: R0 = R1 + R2, R4 = R1 - R2;
68K:
- 32 bit add:
  MOVE.L R2, R0
  ADD.L R1, R0
- 16 bit add:
  MOVE.W R2, R0
  ADD.W R1, R0
- Dual 16 bit add:
  MOVE.L R2, R0
  ASR.L #16, R0
  MOVE.L R1, R3
  ASR.L #16, R3
  ADD.W R3, R0
  ASL.L #16, R0
  MOVE.W R2, R0
  ADD.W R1, R0
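The difference between the 32 bit add and the dual 16 bit add is where the carry goes. A minimal C sketch, assuming wraparound (non-saturating) arithmetic in each half; `add32` and `add_dual16` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* R0 = R1 + R2: one 32 bit add; a carry out of bit 15 reaches the high half. */
static uint32_t add32(uint32_t r1, uint32_t r2)
{
    return r1 + r2;
}

/* R0 = R1 +|+ R2: two independent 16 bit adds. The low-half carry is
   discarded, so the halves never interact (wraparound assumed). */
static uint32_t add_dual16(uint32_t r1, uint32_t r2)
{
    uint16_t lo = (uint16_t)((r1 & 0xFFFFu) + (r2 & 0xFFFFu));
    uint16_t hi = (uint16_t)((r1 >> 16) + (r2 >> 16));
    return ((uint32_t)hi << 16) | lo;
}
```

With R1 = 0x0001FFFF and R2 = 0x00010001, `add32` gives 0x00030000 while `add_dual16` gives 0x00020000: the low-half carry is lost instead of incrementing the high half.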
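ALU Features
[Figure: ALU operand and result paths (source registers Rm and Rp, destination Rn, bits 31..0) for single 16 bit ops, dual 16 bit ops with the cross option, and single 32 bit ops]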
ALU Sample Instructions
[Figure: example instructions for single 16 bit, dual 16 bit and single 32 bit ALU ops, with the callouts "Does not work in parallel" and "Must have this option"]
- Operator order is important: + must come before -
- The A and B registers must stay on the same side of the operator in both instructions (e.g. R3 = R1 + R2, R4 = R1 - R2)
- For dual and quad 16 bit operations the (CO) option causes the two 16 bit results to cross, i.e. swap halves, in the destination register
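The (CO) option can be sketched for the dual add/subtract form Rn = Rm +|- Rp. This C model assumes that the first operator applies to the high halves and the second to the low halves, that arithmetic wraps around, and that (CO) simply swaps the two 16 bit results in the destination; `addsub_dual16` and its `cross` flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of Rn = Rm +|- Rp, optionally with (CO).
   High halves are added, low halves subtracted (wraparound assumed);
   with cross set, the two 16 bit results swap places in Rn. */
static uint32_t addsub_dual16(uint32_t rm, uint32_t rp, int cross)
{
    uint16_t hi = (uint16_t)((rm >> 16) + (rp >> 16));         /* Rm.H + Rp.H */
    uint16_t lo = (uint16_t)((rm & 0xFFFFu) - (rp & 0xFFFFu)); /* Rm.L - Rp.L */
    if (cross) {
        uint16_t tmp = hi;
        hi = lo;
        lo = tmp;
    }
    return ((uint32_t)hi << 16) | lo;
}
```

With Rm = 0x00050005 and Rp = 0x00030003 the plain form yields 0x00080002, and the crossed form yields 0x00020008.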
Multiply Data Flow
- 2 x 32 bit paths to dual Multiplier/ALU units
- The multipliers share the same operand/result buses as the ALUs
- 2 x 40 bit accumulators
- 2 x 32 bit paths back to register file
Multiply Features
- Multiplies are signed fractional by default
- The signed fractional multiply result is automatically left-shifted 1 bit
- A signed fractional multiply ≠ a signed integer multiply
- Rounding is available on fractional multiplies and, as a special option, on integer multiplies
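Why a signed fractional multiply is not the same as a signed integer multiply can be shown with 1.15 operands. In this C sketch (hypothetical function names) the fractional product is the integer product shifted left by 1, giving a 1.31 result; the 0x8000 x 0x8000 corner case, whose shifted product does not fit in 32 bits, is deliberately left out rather than guessing how the hardware handles it.

```c
#include <assert.h>
#include <stdint.h>

/* Signed integer multiply: plain 16 x 16 -> 32 bit product. */
static int32_t mult_int(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Signed fractional (1.15 x 1.15) multiply: the product is shifted
   left 1 bit so the result is in 1.31 format. */
static int32_t mult_frac(int16_t a, int16_t b)
{
    /* (-1.0) * (-1.0) would overflow 1.31; not modeled here. */
    assert(!(a == INT16_MIN && b == INT16_MIN));
    return mult_int(a, b) * 2;
}
```

0.5 x 0.5 in 1.15 is 0x4000 x 0x4000: `mult_int` returns 0x10000000, while `mult_frac` returns 0x20000000, which is 0.25 in 1.31 format.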
Rounding
- 2 cases: rounding adds 0x8000 to either the 32 bit multiplier result or the accumulator value before a 16 bit value is extracted to the destination register
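The effect of adding 0x8000 before taking the high 16 bits is round-to-nearest instead of truncation. A C sketch of that extraction step; the function name is hypothetical, and clamping to the int16_t range on overflow is an assumption of this sketch, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Takes the high 16 bits of a 32 bit value, optionally rounding first
   by adding 0x8000. Saturation to the int16_t range is assumed. */
static int16_t extract_high16(int32_t v, int round)
{
    int64_t t = (int64_t)v + (round ? 0x8000 : 0);
    /* Floor division by 2^16 without relying on signed right shift:
       subtract the low 16 bits (two's complement), then divide exactly. */
    t = (t - (t & 0xFFFF)) / 65536;
    if (t > INT16_MAX) t = INT16_MAX;
    if (t < INT16_MIN) t = INT16_MIN;
    return (int16_t)t;
}
```

0x12348000 truncates to 0x1234 but rounds to 0x1235, while 0x12347FFF still rounds to 0x1234. Doing the addition in 64 bits keeps 0x7FFF8000 + 0x8000 from overflowing before the clamp.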
Fractional Multiply
Fractional Multiply ≠ Integer Multiply
- When extracting a 16 bit fractional value from an accumulator, the high 16 bits are taken
- Where in the destination register it goes depends on which accumulator is being extracted from: A1 goes to the high half, A0 to the low half
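A sketch of the fractional extraction rule, as in R3.H = (A1 = ...), R3.L = (A0 = ...). It assumes the accumulator values are already in range (no rounding or saturation) and ignores the X extension bits; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional extraction (no rounding, no saturation): take bits 31..16
   of each accumulator's W word; A1 goes to the high half of the
   destination, A0 to the low half. */
static uint32_t extract_frac_pair(uint32_t a1_w, uint32_t a0_w)
{
    uint16_t hi = (uint16_t)(a1_w >> 16); /* A1.H -> Rn.H */
    uint16_t lo = (uint16_t)(a0_w >> 16); /* A0.H -> Rn.L */
    return ((uint32_t)hi << 16) | lo;
}
```

With A1.W = 0x12345678 and A0.W = 0x9ABCDEF0 the destination receives 0x12349ABC; the low 16 bits of both accumulators are discarded.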
Integer Multiply
- When extracting a 16 bit integer value from an accumulator, the low 16 bits are taken
- Where in the destination register the 16 bit value goes depends on which accumulator is being extracted from
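For integer multiplies the product sits in the low bits of the accumulator, so the low 16 bits are the ones worth keeping. A sketch under the same simplifying assumptions as a fractional extraction model (no saturation, X bits ignored), additionally assuming A1 still maps to the high half and A0 to the low half; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Integer extraction (no saturation): take bits 15..0 of each
   accumulator's W word; A1 goes to the high half of the destination,
   A0 to the low half. */
static uint32_t extract_int_pair(uint32_t a1_w, uint32_t a0_w)
{
    uint16_t hi = (uint16_t)(a1_w & 0xFFFFu); /* A1.L -> Rn.H */
    uint16_t lo = (uint16_t)(a0_w & 0xFFFFu); /* A0.L -> Rn.L */
    return ((uint32_t)hi << 16) | lo;
}
```

With integer products 3 x 4 = 12 in A1 and 5 x 6 = 30 in A0, the high 16 bits of both accumulators are zero, so taking the high half (as for fractional values) would lose the results; taking the low half gives 0x000C001E.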
Multiply Sample Instructions
Multi-issue MAC instruction examples:
- A1 = R1.H * R2.L, A0 = R1.L * R2.L;
- 16 bit extraction from ACC 1 and ACC 0: R3.H = (A1 = R1.H * R2.L), R3.L = (A0 = R1.L * R2.L);
- Any combination of .H and .L in the 2 operands is allowed
- 32 bit extraction: R3 = (A1 = R1.H * R2.L), R2 = (A0 = R1.L * R2.L); where the destination registers must be paired as follows: R1:0, R3:2, R5:4 and R7:6
- 16 bit extraction from ACC 1 only: R3.H = (A1 = R1.H * R2.L), A0 = R1.L * R2.L;
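The 32 bit extraction form Rodd = (A1 = R1.H * R2.L), Reven = (A0 = R1.L * R2.L) can be modeled on an array of eight data registers. This sketch assumes the default signed fractional mode (product shifted left 1, the 0x8000 x 0x8000 case not modeled) and that both products are formed from the source values before either destination is written; `valid_mac_pair`, `dual_mac32` and the `R` array are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pairing rule for 32 bit dual extraction: A1 -> odd register, A0 ->
   the even register just below it (R1:0, R3:2, R5:4, R7:6). */
static int valid_mac_pair(int a1_dst, int a0_dst)
{
    return a0_dst >= 0 && a0_dst <= 6 && a0_dst % 2 == 0
        && a1_dst == a0_dst + 1;
}

/* Reinterpret a 16 bit half as signed (two's complement). */
static int16_t as_s16(uint32_t half)
{
    return (int16_t)(half >= 0x8000u ? (int32_t)half - 0x10000 : (int32_t)half);
}

/* R[a1_dst] = (A1 = R1.H * R2.L), R[a0_dst] = (A0 = R1.L * R2.L),
   fractional mode. */
static void dual_mac32(uint32_t R[8], int a1_dst, int a0_dst)
{
    assert(valid_mac_pair(a1_dst, a0_dst));

    /* Sources are read before any destination is written, so R2 can be
       both an operand and a destination. */
    int16_t r1h = as_s16(R[1] >> 16);
    int16_t r1l = as_s16(R[1] & 0xFFFFu);
    int16_t r2l = as_s16(R[2] & 0xFFFFu);

    int32_t a1 = (int32_t)r1h * r2l * 2; /* 1.31 product */
    int32_t a0 = (int32_t)r1l * r2l * 2; /* 1.31 product */

    R[a1_dst] = (uint32_t)a1;
    R[a0_dst] = (uint32_t)a0;
}
```

With R1 = 0x40002000 (0.5 and 0.25) and R2.L = 0x4000 (0.5), R3 = (A1 ...), R2 = (A0 ...) leaves R3 = 0x20000000 (0.25) and R2 = 0x10000000 (0.125), even though R2 was also an operand.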
Shifter Sample Instructions
Parallel Instruction Examples
- In general there are 16 and 32 bit versions of the arithmetic instructions
- Most of the 32 bit instructions can be executed in parallel with 2 x 16 bit memory/index operations
- Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
- || means parallel
- Examples
- A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2] || [I3] = R3;
- R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0];
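The second example pairs a dual add with a dual subtract on the same sources, a butterfly. A C sketch of R2 = R2 +|+ R4, R4 = R2 -|- R4, assuming wraparound arithmetic in each 16 bit half; it illustrates that both results are computed from the original R2 and R4 even though R2 is also a destination. The function name is hypothetical, and the two parallel memory/index operations are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of R2 = R2 +|+ R4, R4 = R2 -|- R4. */
static void butterfly16(uint32_t *r2, uint32_t *r4)
{
    uint32_t a = *r2, b = *r4; /* both results use the original values */
    uint16_t sum_hi = (uint16_t)((a >> 16) + (b >> 16));
    uint16_t sum_lo = (uint16_t)((a & 0xFFFFu) + (b & 0xFFFFu));
    uint16_t dif_hi = (uint16_t)((a >> 16) - (b >> 16));
    uint16_t dif_lo = (uint16_t)((a & 0xFFFFu) - (b & 0xFFFFu));
    *r2 = ((uint32_t)sum_hi << 16) | sum_lo;
    *r4 = ((uint32_t)dif_hi << 16) | dif_lo;
}
```

Starting from R2 = 0x00050007 and R4 = 0x00020003, the sketch leaves R2 = 0x0007000A and R4 = 0x00030004; writing R2 first and then reusing it for the subtraction would give the wrong R4.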