Blackfin Compute Unit


1
A comparison of DSP Architectures: Blackfin
ADSP-BFXXX Compute Unit

Based on an ENEL619.23 white paper prepared by
Darrell Anklovitch

2
Overview

Architecture Overview
Register Map
ALU features and sample instructions
Multiplier features and sample instructions
Shifter features and sample instructions

3
References

ADSP-BF535 Blackfin Processor Hardware Reference,
Rev 2, April 2004, Analog Devices. Section 2
Blackfin Processor Instruction Set Reference, Rev
2, May 2003, Analog Devices. Sections 8 10,
14 15
A number of the figures in this presentation are
based on figures found in the ADSP-BF535 Blackfin
Processor Hardware Reference.

4
ADSP-2106x Core Architecture
5
Register File and COMPUTE Units

Key issues
5 data paths FROM COMPUTE units
5 data paths TO COMPUTE units
Highly parallel operations UNDER THE RIGHT
CONDITIONS

6
BF533 Memory Accesses
Under the right conditions, 4 memory accesses at the same time
64 bit Instruction Fetch
2 x 32 bit Data Loads
32 bit Data Store
PLUS up to 2 ALU (32 bit) and 2 MAC (16 bit) operations at the same time
PLUS background DMA activity
7
Compute Unit Architecture
2 Multipliers
Register File
1 set of Video ALUs
1 Shifter
2 ALUs
8
Register File

DATA REGISTER SYNTAX
R0, R1 etc refer to 32 bit registers
R0.L refers to the low 16 bits of the 32 bit R0 register
R0.H refers to the high 16 bits of the R0 register
ACCUMULATOR SYNTAX
A0.L -> low 16 bits
A0.H -> next 16 bits
A0.W -> least significant 32 bit word
A0.X -> most significant 8 bit extension
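
The register and accumulator fields above can be pictured as bit slices. The following is a minimal Python sketch, not Blackfin code; the function names `data_halves` and `accumulator_fields` are hypothetical, and the accumulator is assumed to be 40 bits wide (32 bit word plus 8 bit extension).

```python
def data_halves(r):
    """Split a 32 bit data register value into its .L and .H halves."""
    r &= 0xFFFFFFFF
    return {"L": r & 0xFFFF, "H": r >> 16}


def accumulator_fields(a):
    """Split a 40 bit accumulator value into .L, .H, .W and .X."""
    a &= (1 << 40) - 1           # keep only the 40 accumulator bits
    return {
        "L": a & 0xFFFF,          # low 16 bits
        "H": (a >> 16) & 0xFFFF,  # next 16 bits
        "W": a & 0xFFFFFFFF,      # least significant 32 bit word
        "X": a >> 32,             # most significant 8 bit extension
    }


print(data_halves(0x12345678))
print(accumulator_fields(0xAB12345678))
```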

SHARC: 16 32-bit data registers, integer and
float. There is a pair of SHARC accumulator
registers too.
9
ALU Data Flow
2 x 32 bit paths to dual Multiplier/ALU units
2 x 32 bit paths back to register file
10
Sample instructions
Blackfin
R0 = R1 + R2
R0.L = R1.L + R2.H
R0 = R1 +|- R2
Means R0.L = R1.L - R2.L in parallel with R0.H = R1.H + R2.H

SHARC
R0 = R1 + R2
Closest: R0 = R1 + R2, R4 = R1 - R2

68K
MOVE.L R2, R0
ADD.L R1, R0

MOVE.W R2, R0
ADD.W R1, R0

MOVE.L R2, R0
ASR.L #16, R0
MOVE.L R1, R3
ASR.L #16, R3
ADD.W R3, R0
ASL.L #16, R0
MOVE.W R2, R0
ADD.W R1, R0
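
A dual 16 bit operation works on both halves of the 32 bit registers at once. The following Python sketch models one such operation, assuming the high halves are added and the low halves are subtracted, and assuming plain wraparound with no saturation. The name `add_sub_16` is hypothetical.

```python
MASK16 = 0xFFFF


def add_sub_16(r1, r2):
    """Model a packed operation: high halves added, low halves subtracted."""
    hi = (((r1 >> 16) & MASK16) + ((r2 >> 16) & MASK16)) & MASK16
    lo = ((r1 & MASK16) - (r2 & MASK16)) & MASK16
    return (hi << 16) | lo


# High halves: 0x0005 + 0x0003, low halves: 0x0009 - 0x0004
print(hex(add_sub_16(0x00050009, 0x00030004)))
```

Both 16 bit results come out of a single call, which is the point of the comparison with the much longer 68K sequence.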
11
ALU Features
Figure panels: Single 16 bit ops, Dual 16 bit ops with cross, Single 32 bit ops (registers Rm, Rp, Rn)
12
ALU Sample Instructions
Single 16 bit ops
Dual 16 bit ops
Single 32 bit ops
Does not work in parallel
Must have this option
Operator order is important: + must come before -

The A and B registers must stay on the same side of the
operator for both instructions
For dual and quad 16 bit operations the (CO) option causes the
destination registers to cross
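
The effect of the cross option on a dual 16 bit operation can be sketched in Python. This is an illustration under assumptions, not a statement of the exact hardware behaviour: it assumes (CO) simply swaps the two 16 bit results inside the destination register and that the additions wrap without saturation. The function `dual_add_16` and its `cross` parameter are hypothetical names.

```python
MASK16 = 0xFFFF


def dual_add_16(rs, rt, cross=False):
    """Add high halves and low halves separately; optionally cross the results."""
    hi = (((rs >> 16) & MASK16) + ((rt >> 16) & MASK16)) & MASK16
    lo = ((rs & MASK16) + (rt & MASK16)) & MASK16
    if cross:
        hi, lo = lo, hi  # assumed meaning of (CO): results swap halves
    return (hi << 16) | lo


print(hex(dual_add_16(0x00010002, 0x00100020)))
print(hex(dual_add_16(0x00010002, 0x00100020, cross=True)))
```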

13
Multiply Data Flow
2 x 32 bit paths to dual Multiplier/ALU units
The multipliers share the same operand/result buses as
the ALUs
2 x 40 bit accumulators
2 x 32 bit paths back to register file
14
Multiply Features

Multiplies are signed fractional by default
Signed fractional multiply result is automatically left shifted 1 bit
Signed fractional multiply != signed integer multiply
Rounding is available on fractional multiplies and, as a
special option, on integer multiplies
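
The difference between the two multiply types comes down to the 1 bit left shift. A minimal Python sketch, assuming 16 bit operands in two's complement and treating fractional operands as 1.15 values with a 1.31 result. The helper names are hypothetical, and saturation (for example of 0x8000 times 0x8000 in fractional mode) is not modelled.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mult_integer(a, b):
    """Signed integer multiply: plain product of the two 16 bit values."""
    return to_signed16(a) * to_signed16(b)


def mult_fractional(a, b):
    """Signed fractional multiply: same product, left shifted 1 bit."""
    return (to_signed16(a) * to_signed16(b)) << 1


# 0x4000 is 0.5 in 1.15 format; 0.5 * 0.5 = 0.25, which is 0x20000000 in 1.31
print(hex(mult_integer(0x4000, 0x4000)))
print(hex(mult_fractional(0x4000, 0x4000)))
```

The same bit patterns give 0x10000000 as an integer product and 0x20000000 as a fractional product, so the two results are not interchangeable.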

15
Rounding
2 cases
Rounding adds 0x8000 to the 32 bit multiplier
result or accumulator value before extracting a
16 bit value to the destination register
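
The rounding step described above can be sketched in Python as follows. The function names are hypothetical, the input is assumed to be a non-negative 32 bit value small enough that adding 0x8000 does not overflow, and saturation is not modelled.

```python
def extract_truncated(value32):
    """Take the high 16 bits of a 32 bit value without rounding."""
    return (value32 >> 16) & 0xFFFF


def extract_rounded(value32):
    """Add 0x8000, then take the high 16 bits."""
    return ((value32 + 0x8000) >> 16) & 0xFFFF


print(hex(extract_truncated(0x12348000)))
print(hex(extract_rounded(0x12348000)))
```

A low half of 0x8000 or more pushes the extracted value up by one; anything below leaves it unchanged.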
16
Fractional Multiply
Fractional Multiply != Integer Multiply

When extracting a 16 bit fractional value from an accumulator
the high 16 bits are taken
Where in the destination register it goes depends on which
accumulator is being extracted from

17
Integer Multiply
Fractional Multiply != Integer Multiply

When extracting a 16 bit integer value from an accumulator
the low 16 bits are taken
Where in the destination register the 16 bit value goes depends
on which accumulator is being extracted from
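
The two extraction rules can be put side by side in a short Python sketch. The function names are hypothetical. The placement rule is an assumption made for illustration: a value extracted from A1 is written to the high half of the destination and a value from A0 to the low half. Rounding and saturation are left out.

```python
def extract16(acc, fractional):
    """Fractional extraction takes the high 16 bits of the 32 bit word,
    integer extraction takes the low 16 bits."""
    word = acc & 0xFFFFFFFF
    return (word >> 16) & 0xFFFF if fractional else word & 0xFFFF


def write_half(dest, value16, acc_index):
    """Place a 16 bit value in the destination half tied to the accumulator
    (assumed: A1 -> high half, A0 -> low half)."""
    if acc_index == 1:
        return (dest & 0x0000FFFF) | (value16 << 16)
    return (dest & 0xFFFF0000) | value16


acc = 0x0012345678
print(hex(extract16(acc, fractional=True)))
print(hex(extract16(acc, fractional=False)))
print(hex(write_half(0xAAAABBBB, extract16(acc, True), acc_index=1)))
```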

18
Multiply Sample Instructions
16 bit extraction from ACC 0
16 bit extraction from ACC 1
Multi-issue MAC Instruction Examples
32 bit extraction
A1 = R1.H * R2.L, A0 = R1.L * R2.L
R3.H = (A1 = R1.H * R2.L), R3.L = (A0 = R1.L * R2.L)
Any combination of .H and .L in the 2 operands is allowed
R3 = (A1 = R1.H * R2.L), R2 = (A0 = R1.L * R2.L)
Where destination registers must be paired as follows: (R1, R0), (R3, R2), (R5, R4) and (R7, R6)
R3.H = (A1 = R1.H * R2.L), A0 = R1.L * R2.L
19
Shifter Sample Instructions
20
Parallel Instruction Examples

In general there are 16 and 32 bit versions of the arithmetic instructions
Most of the 32 bit instructions can be executed in parallel with
2 x 16 bit memory/index operations
Exceptions are DIVS, DIVQ and MULTIPLY with 32 bit operands
|| means parallel
Examples
A1 = R2.L * R1.L, A0 = R2.H * R1.H || R2.H = W[I2] || [I3] = R3
R2 = R2 +|+ R4, R4 = R2 -|- R4 || I0 += M0 || R1 = [I0]