A Programmable Coprocessor Architecture for Wireless Applications

Slides: 14

Provided by: lin66

Learn more at: https://cccp.eecs.umich.edu

Transcript and Presenter's Notes

1
A Programmable Coprocessor Architecture for Wireless Applications

Yuan Lin, Nadav Baron, Hyunseok Lee,
Scott Mahlke, Trevor Mudge
Advanced Computer Architecture Lab
University of Michigan
Sept. 2004

2
Introduction

Growing need to support multiple wireless protocols
Software defined radio: implementing DSP algorithms in software rather than hardware
ASIC: high performance, low flexibility
Processor: high flexibility, low performance
Objective: achieve real-time performance with processor flexibility and programmability

3
Performance Requirements
UWB: 200 Mbps
Hiperlan2: 36 Mbps
802.11b: 11 Mbps
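One rough way to read these data rates is as a cycle budget per received bit. The sketch below does that arithmetic in Python. The 400 MHz clock is a hypothetical figure chosen only for illustration; it does not come from the slides.

```python
# Data rates from the slide, in bits per second.
DATA_RATES_BPS = {
    "UWB": 200e6,
    "Hiperlan2": 36e6,
    "802.11b": 11e6,
}

# Hypothetical clock frequency (an assumption, not from the presentation).
ASSUMED_CLOCK_HZ = 400e6


def cycles_per_bit(rate_bps, clock_hz=ASSUMED_CLOCK_HZ):
    """Return how many clock cycles are available per bit at the given rate."""
    return clock_hz / rate_bps


if __name__ == "__main__":
    for name, rate in DATA_RATES_BPS.items():
        print(f"{name}: {cycles_per_bit(rate):.1f} cycles per bit")
```

Under that assumed clock, the fastest protocol leaves only a couple of cycles per bit, which is the kind of pressure that motivates a parallel datapath.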
4
DSP Algorithms Characteristics

Streaming data
  Short variable liveness
  High data throughput
High data level parallelism
Low control flow overhead
  Counted loops
  Few data-dependent branches

5
Proposed Coprocessor Architecture: MAPP

Stream Data
  Macro pipeline architecture
  No cache structure
High Data Level Parallelism
  Vector architecture
Low Control Flow Overhead
  No branch predictors
Programmability to support multiple protocols

6
MAPP Architectural Diagram
[Diagram labels: ARM Core, Instruction Cache, Data Cache, VPP Controller, Vector Processing Pipeline (three PPUs)]
7
PPU Architectural Diagram
[Diagram labels: Pipeline Processing Unit (PPU), Data In, In Queue, Vector Register File, Vector ALU, Out Queue, Data Out, Internal Instruction Buffer, VPP Controller]
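The macro pipeline idea, with each PPU reading from an in queue and writing to an out queue, can be sketched in Python as follows. This is a toy model under stated assumptions: the `PPU` class, the `run_pipeline` helper, and the per-stage operations are hypothetical names invented for illustration, and the slides do not describe scheduling or timing.

```python
from collections import deque


class PPU:
    """Toy pipeline processing unit: in queue -> vector op -> out queue."""

    def __init__(self, vector_op):
        self.vector_op = vector_op  # stands in for the unit's program
        self.in_queue = deque()
        self.out_queue = deque()

    def step(self):
        """Process one vector from the in queue, if any is waiting."""
        if self.in_queue:
            vec = self.in_queue.popleft()
            self.out_queue.append([self.vector_op(x) for x in vec])


def run_pipeline(ppus, vectors):
    """Stream vectors through the chained PPUs and collect the results."""
    pending = deque(vectors)
    expected = len(pending)
    results = []
    while len(results) < expected:
        if pending:
            ppus[0].in_queue.append(pending.popleft())
        # Step later stages first so each vector advances one stage per round.
        for i in reversed(range(len(ppus))):
            ppus[i].step()
            while ppus[i].out_queue:
                vec = ppus[i].out_queue.popleft()
                if i + 1 < len(ppus):
                    ppus[i + 1].in_queue.append(vec)
                else:
                    results.append(vec)
    return results


if __name__ == "__main__":
    stages = [PPU(lambda x: x + 1), PPU(lambda x: x * 2), PPU(lambda x: x - 3)]
    print(run_pipeline(stages, [[1, 2], [3, 4]]))
```

Data only ever moves forward from one queue to the next, so nothing in this model needs a cache between stages.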
8
Mapping DSP Algorithms: Viterbi ACS
[Diagram: add-compare-select data flow. Labels: s0, s1, bm0, bm1, mux, s, S0, S1, S2]

Example values:
  v0   = 0 4 8 8 2 8 0 4
  v1   = 4 8 4 8 2 4 8 2
  mask = l l g e e g l g
  s    = 0 4 4 8 2 4 0 2

Vector code:
  vadd v0, s0, bm0
  vadd v1, s1, bm1
  cmp v0, v1
  movele s, v0
  moveg s, v1
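The add-compare-select step above can be sketched in plain Python to make the per-element behavior explicit. This is an illustrative model, not the MAPP instruction semantics: the helper names are hypothetical, and it assumes the select keeps the smaller of the two sums, which is what the example values on the slide suggest.

```python
def vadd(a, b):
    """Element-wise vector add."""
    return [x + y for x, y in zip(a, b)]


def vcmp(a, b):
    """Element-wise compare: 'l' (less), 'e' (equal), or 'g' (greater)."""
    return ["l" if x < y else "e" if x == y else "g" for x, y in zip(a, b)]


def vselect_min(mask, a, b):
    """Keep a where the mask is 'l' or 'e', and b where it is 'g'."""
    return [y if m == "g" else x for m, x, y in zip(mask, a, b)]


def acs(s0, bm0, s1, bm1):
    """Add-compare-select over whole vectors (assumed min-select)."""
    v0 = vadd(s0, bm0)
    v1 = vadd(s1, bm1)
    mask = vcmp(v0, v1)
    return v0, v1, mask, vselect_min(mask, v0, v1)


if __name__ == "__main__":
    # v0 and v1 as they appear on the slide.
    v0 = [0, 4, 8, 8, 2, 8, 0, 4]
    v1 = [4, 8, 4, 8, 2, 4, 8, 2]
    mask = vcmp(v0, v1)
    print(" ".join(mask))
    print(vselect_min(mask, v0, v1))
```

Because every element is handled independently, the conditional select replaces a data-dependent branch with a masked move.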
9
Increase Area/Power Efficiency

Data slice architecture
  Most DSP algorithms do not need 32-bit precision
  Viterbi decoding operates on 8-bit data
  Filters may need 16-bit precision
Partial processor execution
  Statically determined code
  Turn off architecture units not used
  Energy saving, no area saving
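To illustrate why narrower precision helps, the sketch below treats one 32-bit word as independent lanes of 8, 16, or 32 bits and adds them lane by lane. It is only a software analogy for the data slice idea: the `sliced_add` function is a hypothetical name, and the 32-bit word width and wrap-around overflow behavior are assumptions, not details from the slides.

```python
def sliced_add(a, b, lane_bits):
    """Add two 32-bit words as independent lanes of lane_bits each.

    Carries do not cross lane boundaries; each lane wraps on overflow
    (an assumption made for this sketch).
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane_bits must be 8, 16, or 32")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        result |= ((lane_a + lane_b) & lane_mask) << shift
    return result


if __name__ == "__main__":
    a, b = 0x00FFFFFF, 0x00000001
    for bits in (8, 16, 32):
        print(f"{bits:2d}-bit lanes: 0x{sliced_add(a, b, bits):08X}")
```

With 8-bit lanes the same word carries four values per operation, which is the case the slide highlights for Viterbi decoding.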

10
Vector Cluster Diagram (4x8-bit data slice)
[Diagram: four data slices, each with a Register File, In Queue, Out Queue, and ALU, connected by a 4x4 Local Interconnect Network]
11
Performance Results
12
Simplistic Power Analysis