Survey of Digital Signal Processors

1
Survey of Digital Signal Processors
  • Michael Warner
  • ECD: VLSI Communication Systems

2
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

3
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

4
Moore's Law Drives Processor Development
But what if energy-delay had to be reduced every
generation by an order of magnitude?
Doubling the number of transistors every 18-24 months
at the same price point drives significant product
opportunities, especially if you have little
regard for power.
5
Gene's Law Drives DSP Development
Gene's Law will be challenged to hold the
line!
6
What's Driving Gene's Law?
7
DSP Design Constraints
DEVICE CAPABILITIES
8
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

9
What Makes a DSP a DSP?
Hard Real-Time:
  • Single-Cycle MAC
  • Multiple Execution Units
  • Custom Data Path
  • High Bandwidth (Flat) Memory Sub-Systems
  • Dual Access Memory
  • Efficient Zero-Overhead Looping
  • Short Pipeline
  • High Bandwidth I/O
  • Specialized Instruction Sets
  • Low Latency Interrupts
  • Sophisticated DMA
  • No Speculation
  • RTOS
Soft Real-Time (Application Processor):
  • Single-Cycle MAC
  • Multiple Execution Units
  • Custom Data Path
  • L1D, L1I, L2 with MMU
  • Speculative Fetching and Branching
  • Virtual Memory
  • Protected Memory
  • Virtual Machines
  • Semaphores
  • Context Save and Restore
  • Threading: SMT, IMT
  • Efficient Zero-Overhead Looping
  • Short Pipeline
  • High Bandwidth I/O
  • Specialized Instruction Sets
  • Low Latency Interrupts
  • Sophisticated DMA
  • O/S

10
Single Cycle MAC
  • MACs Typically Determine DSP Performance and
    Pipeline Length (EX)
  • Most DSPs Have 2-8 MAC Units
  • MACs Typically Operate in Both a Scalar and
    Vector Mode
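
The MAC bullets above can be made concrete with a dot product, the inner loop of an FIR filter. This is a minimal sketch in Python, not code for any particular DSP: the function names `mac_scalar` and `mac_dual` are hypothetical, and `mac_dual` only models two MAC units working side by side in each iteration.

```python
def mac_scalar(coeffs, samples):
    """One multiply-accumulate per loop iteration (scalar mode)."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # the MAC: multiply, then add into the accumulator
    return acc


def mac_dual(coeffs, samples):
    """Two MACs per iteration, modelling two MAC units working side by side.

    Assumes coeffs and samples have the same length.
    """
    acc0 = acc1 = 0
    n = len(coeffs) - len(coeffs) % 2
    for i in range(0, n, 2):
        acc0 += coeffs[i] * samples[i]          # MAC unit 0
        acc1 += coeffs[i + 1] * samples[i + 1]  # MAC unit 1
    if len(coeffs) % 2:
        acc0 += coeffs[-1] * samples[-1]  # leftover tap when the length is odd
    return acc0 + acc1
```

Both functions return the same sum. The dual version needs roughly half as many loop iterations, which is why the number of MAC units tends to set the peak throughput of the kernel.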

11
Multiple Instruction Units
  • VLIW Architectures Driving ILP
  • Typical Instruction Units
  • M-Unit - MAC
  • S-Unit - Shift
  • L-Unit - ALU
  • D-Unit - Load/Store
  • Industry Has Converged on an ILP of 8
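
As a rough illustration of how eight functional units bound instruction-level parallelism, the sketch below packs a stream of instructions, identified only by the unit type they need, into groups that could issue together. It assumes two units of each type (M, S, L, D) and ignores data dependencies entirely; `UNITS_PER_CYCLE` and `pack` are hypothetical names, not part of any toolchain.

```python
UNITS_PER_CYCLE = {"M": 2, "S": 2, "L": 2, "D": 2}  # assumed: two of each unit


def pack(instruction_units):
    """Greedily group instructions (given by unit type) into issue packets.

    Dependencies are ignored; a new packet starts only when the unit type
    an instruction needs is already fully used in the current packet.
    """
    packets, current, used = [], [], {}
    for unit in instruction_units:
        if used.get(unit, 0) == UNITS_PER_CYCLE[unit]:
            packets.append(current)
            current, used = [], {}
        current.append(unit)
        used[unit] = used.get(unit, 0) + 1
    if current:
        packets.append(current)
    return packets
```

With a perfectly balanced mix such as `pack(list("MMSSLLDD"))`, all eight instructions land in one packet. A MAC-heavy stream such as `pack(["M", "M", "M"])` spills into a second packet because only two M-units are assumed.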

[Figure: data path diagram. Register files A0 - A15 and B0 - B15; functional units L1, S1, M1, D1 and L2, S2, M2, D2 with their source and destination ports; cross paths 1X and 2X; load data buses DDATA_I1 and DDATA_I2]
12
High Bandwidth Memory Sub-Systems
  • Multiple Load-Store Units Required to Feed Data
    Path
  • Tightly Coupled Memory is Typically Dual Ported
  • Harvard Architecture is Heavily Banked
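
One way to picture a banked memory is word interleaving, where the low-order part of the address selects the bank. The sketch below assumes four single-ported banks purely for illustration; the bank count and the names `bank_of` and `conflicts` are assumptions, not taken from any specific device.

```python
NUM_BANKS = 4  # assumed bank count, chosen only for illustration


def bank_of(word_address, num_banks=NUM_BANKS):
    """Word-interleaved banking: consecutive words fall in consecutive banks."""
    return word_address % num_banks


def conflicts(addr_a, addr_b, num_banks=NUM_BANKS):
    """True when two same-cycle accesses hit the same single-ported bank."""
    return bank_of(addr_a, num_banks) == bank_of(addr_b, num_banks)
```

Under these assumptions two load-store units reading adjacent words proceed in parallel, while accesses that are a multiple of four words apart collide and one of them would have to wait.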

[Figure: block diagram. PC, CNTL and ARs; buses P, D, C and E with muxes; internal and external memory; Central Arithmetic Logic Unit with MAC, ALU, shifter, and A and B]
13
Specialized Instruction Sets
  • Base RISC ISA Plus CISC ISA Driven by End
    Application
  • MAC
  • SAD
  • LMS
  • FIRS
  • Viterbi
  • Support For Both Scalar and Vector Instructions
  • Support For 8, 16 and 32-Bit Instructions
  • Instructions are Highly Orthogonal
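
Two of the application-driven operations named above, SAD and LMS, are shown here as plain reference computations. This is only a sketch of what such instructions compute, written in Python; it says nothing about how a given DSP encodes or pipelines them, and the names `sad` and `lms_step` are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-length pixel blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def lms_step(weights, inputs, desired, mu):
    """One LMS update: filter, measure the error, nudge every weight."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = desired - output
    new_weights = [w + mu * error * x for w, x in zip(weights, inputs)]
    return new_weights, error
```

SAD is the core of block matching in video encoding, and the LMS step is the core of adaptive filters such as echo cancellers. Each is a short loop of multiply, subtract and accumulate operations, which is what makes them candidates for a single specialized instruction.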

14
Scalar (55x) vs VLIW (64x)
  • Scalar DSPs Tend to be More CISC Like
  • Hurts Compiler Performance
  • Improves Energy-Delay
  • Improves Code Density
  • Limits Top End Performance
  • VLIW DSPs Tend to be More RISC Like
  • RISC GP Regs Orthogonality Makes For a Good C
    Compiler
  • Assembler Code Is Challenging
  • RISC ISA Allows for Higher Frequencies
  • Load-Store Hurts Energy-Delay

15
TMS320C54x
16
TMS320C54x Protected Pipeline
[Figure: fully loaded pipeline, stages P1 through X6 across cycles]
  • Prefetch: Calculate address of instruction
  • Fetch: Collect instruction
  • Decode: Interpret instruction
  • Access: Collect address of operand
  • Read: Collect operand
  • Execute: Perform operation
Note: Protected Pipeline Limits
Micro-Architectural Flexibility and Performance
17
TMS320C6xx
[Figure: C6xx CPU core block diagram. Program Fetch, Instruction Dispatch, Instruction Decode; Data Path 1 (A Register File; units L1, S1, M1, D1) and Data Path 2 (B Register File; units L2, S2, M2, D2); Control Registers, Control Logic, Test, Emulation, Interrupts. Legend: Arithmetic Logic Unit, Auxiliary Logic Unit, Multiplier Unit]
18
TMS320C6xx Exposed Pipeline
Fetch: PG PS PW PR   Decode: DP DC   Execute: E1 E2 E3 E4 E5
  • Fetch
  • PG: Program Address Generate
  • PS: Program Address Send
  • PW: Program Access Ready Wait
  • PR: Program Fetch Packet Receive
  • Decode
  • DP: Instruction Dispatch
  • DC: Instruction Decode
  • Execute
  • E1 - E5: Execute 1 through Execute 5

Clock cycle       1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
Execute Packet 1  PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Execute Packet 2      PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Execute Packet 3          PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Execute Packet 4              PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Execute Packet 5                  PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Execute Packet 6                      PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Execute Packet 7                          PG  PS  PW  PR  DP  DC  E1  E2  E3  E4  E5
Note: Exposed Pipeline Adds Risk to Programming
Model
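
The overlap of execute packets in the pipeline can be reproduced with a few lines of Python. This sketch assumes one execute packet enters the pipeline per cycle with no stalls; the stage names come from the slide, while `stage_at` and `diagram` are hypothetical helper names.

```python
STAGES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1", "E2", "E3", "E4", "E5"]


def stage_at(packet, cycle):
    """Stage occupied by a 1-based packet at a 1-based cycle, or None."""
    index = cycle - packet  # packet k enters PG at cycle k
    return STAGES[index] if 0 <= index < len(STAGES) else None


def diagram(num_packets):
    """One text row per packet, staggered by one cycle per packet."""
    total_cycles = num_packets + len(STAGES) - 1
    rows = []
    for packet in range(1, num_packets + 1):
        cells = [stage_at(packet, c) or "" for c in range(1, total_cycles + 1)]
        rows.append(" ".join(f"{cell:2}" for cell in cells).rstrip())
    return rows
```

Calling `print("\n".join(diagram(7)))` prints seven staggered rows. Reading a column shows which stage every packet occupies in that cycle, which is the bookkeeping an exposed pipeline leaves to the programmer or compiler.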
19
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

20
Micro-Architectural Challenges
  • Accessing (Flat) On Chip Memory At Speed Within
    2-3 cycles
  • Feeding Multiple Functional Units From a Single
    Register File
  • Running at 600 MHz with a 7-9 Stage Pipeline
  • Linking Multiple Functional Units with Result
    Forwarding
  • Implementing CISC Data-path to Meet Area and
    Performance Goals
  • Achieving ARM-Like Code Density
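
The timing budgets implied by these figures follow from simple arithmetic, sketched below. The helper names `cycle_time_ns` and `latency_ns` are hypothetical; the inputs (600 MHz, 2-3 cycles, 7-9 stages) are the ones quoted in the list above.

```python
def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def latency_ns(freq_mhz, cycles):
    """Wall-clock time for a given number of cycles at that frequency."""
    return cycles * cycle_time_ns(freq_mhz)
```

At 600 MHz one cycle is about 1.67 ns, so a 2-3 cycle on-chip memory access has to complete in roughly 3.3 to 5 ns, and an instruction spends roughly 11.7 to 15 ns traversing a 7-9 stage pipeline.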

21
Agenda
  • Industry Trends
  • DSP Architecture
  • DSP Micro-Architecture
  • DSP Systems

22
DSP Systems
23
VoIP Platform
  • TNETV3010 Features
  • 6 C55x DSPs @ 300 MHz
  • Shared Instruction Memory
  • Broadcast DMA
  • 24 Mbits of On-Chip SRAM

24
OMAP Platform
  • OMAP2420 Features
  • ARM1136 @ 330 MHz, VFP (Vector Floating Point),
    32K/32K I/D cache
  • DSP @ 220 MHz
  • 2D/3D graphics accelerator
  • IVA supports still images to >4 Mpixels, 30 fps
    VGA video decode
  • Output to TV for gaming and video playback
  • Encryption hardware for DRM and security

[Figure: OMAP2420 block diagram. Imaging Video Accelerator (IVA), 2D/3D Graphics Accelerator, ARM11 + VFP, TMS320C55x DSP, L3 Interconnect, LCD I/F and Video Out, Camera I/F, Memory Controller, Internal SRAM, Peripherals, L4 Interconnect, Security]
25
IBM Cell Architecture
  • Design Features
  • Multi-Core Architecture
  • Based on the Power Architecture
  • Code compatibility
  • Coherent and cooperative off-load processing
  • Enhanced SIMD architecture
  • Power efficiency improved
  • Absolute timers allow "hard real-time" data
    processing
  • Good estimation of execution time is possible
  • Big-endian memory
  • Support Apple, but not Intel
  • Isolation mechanism for secure code execution

26
FlexIO
27
DSP Architecture
  • SPE (Synergistic Processing Element)
  • Dual issue, 128-bit 4-way SIMD
  • Vector Processing
  • 4 Integer Units, 4 FP Units
  • 8-, 16-, 32-bit Integer; 32-, 64-bit FP
  • 128 x 128-bit Registers
  • 256KB Local-Store Memory (specially designed)
  • Caches are not used
  • Data and Instructions in LS
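
The "128-bit 4-way SIMD" and "128 x 128-bit Registers" figures can be unpacked with a small model. The sketch below treats a 128-bit register as four 32-bit lanes and adds them lane by lane with wraparound; it is an illustration in Python, not SPE code, and `simd_add` and `register_file_bytes` are hypothetical names.

```python
LANES = 4           # 4-way SIMD
LANE_BITS = 32      # 4 x 32 bits = one 128-bit register
MASK = (1 << LANE_BITS) - 1


def simd_add(reg_a, reg_b):
    """Lane-wise 32-bit add with wraparound, one result per lane."""
    assert len(reg_a) == LANES and len(reg_b) == LANES
    return [(a + b) & MASK for a, b in zip(reg_a, reg_b)]


def register_file_bytes(num_regs=128, reg_bits=128):
    """Size of a register file: 128 registers x 128 bits by default."""
    return num_regs * reg_bits // 8
```

One `simd_add` stands for a single instruction doing four additions at once, and `register_file_bytes()` gives 2048 bytes, that is 2 KB of register state per SPE next to the 256KB local store.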