A Programmable Coprocessor Architecture for Wireless Applications

1
A Programmable Coprocessor Architecture for
Wireless Applications
  • Yuan Lin, Nadav Baron, Hyunseok Lee,
  • Scott Mahlke, Trevor Mudge
  • Advanced Computer Architecture Lab
  • University of Michigan
  • Sept. 2004

2
Introduction
  • Growing need to support multiple wireless
    protocols
  • Software defined radio: implementing DSP
    algorithms in software rather than hardware
  • ASIC: high performance, low flexibility
  • Processor: high flexibility, low performance
  • Objective: achieve real-time performance with
    processor flexibility and programmability

3
Performance Requirements
  • UWB: 200 Mbps
  • Hiperlan2: 36 Mbps
  • 802.11b: 11 Mbps
4
DSP Algorithm Characteristics
  • Streaming data
      • Short variable liveness
      • High data throughput
  • High data-level parallelism
  • Low control flow overhead
      • Counted loops
      • Few data-dependent branches
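
The characteristics listed above can be seen in a small FIR filter kernel. This is a minimal sketch and is not taken from the slides: the function name `fir_stream` and the tap values are hypothetical. Each sample is consumed once (streaming), the inner loop has a fixed trip count (a counted loop), and there is no branch that depends on the data values.

```python
from collections import deque

def fir_stream(samples, taps):
    """Yield one filtered output per input sample (hypothetical streaming FIR)."""
    # Sliding window of the most recent len(taps) samples, zero-initialised.
    window = deque([0] * len(taps), maxlen=len(taps))
    for x in samples:                 # streaming: each sample is consumed once
        window.appendleft(x)          # newest sample first, oldest drops off
        acc = 0
        for k in range(len(taps)):    # counted loop: trip count known up front
            acc += taps[k] * window[k]
        yield acc                     # no data-dependent branch in the kernel

print(list(fir_stream([1, 2, 3, 4], taps=[1, 1])))  # [1, 3, 5, 7]
```

The multiply-accumulate in the inner loop is independent across taps, which is the kind of data-level parallelism a vector unit can exploit.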

5
Proposed Coprocessor Architecture: MAPP
  • Stream data
      • Macro pipeline architecture
      • No cache structure
  • High data-level parallelism
      • Vector architecture
  • Low control flow overhead
      • No branch predictors
  • Programmability to support multiple protocols

6
MAPP Architectural Diagram
[Figure: MAPP block diagram. Blocks shown: ARM core, instruction cache, data cache, VPP controller, and a vector processing pipeline made up of PPUs (three shown).]
7
PPU Architectural Diagram
[Figure: PPU (Pipeline Processing Unit) block diagram. Blocks shown: Data In, in queue, vector ALU, vector register file, out queue, Data Out, internal instruction buffer, and the connection to the VPP controller.]
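
The queue-based structure named in the PPU diagram can be modelled in a few lines. This is a rough sketch under stated assumptions, not the authors' design: the slides only name the blocks, so the class `PPU`, the function `run_pipeline`, the one-vector-per-step timing, and the rule that one PPU's out queue feeds the next PPU's in queue are all hypothetical.

```python
from collections import deque

class PPU:
    """Hypothetical model of one pipeline processing unit."""
    def __init__(self, kernel):
        self.kernel = kernel        # function applied to each input vector
        self.in_queue = deque()
        self.out_queue = deque()

    def step(self):
        # Process at most one vector per step, if input is available.
        if self.in_queue:
            self.out_queue.append(self.kernel(self.in_queue.popleft()))

def run_pipeline(ppus, vectors):
    """Push a list of vectors through a chain of PPUs and collect the results."""
    ppus[0].in_queue.extend(vectors)
    results = []
    # Each vector needs one step per stage, so this many steps drains the chain.
    for _ in range(len(vectors) + len(ppus)):
        # Step stages last-to-first so a vector advances one stage per step.
        for i in reversed(range(len(ppus))):
            ppus[i].step()
            if ppus[i].out_queue:
                item = ppus[i].out_queue.popleft()
                if i + 1 < len(ppus):
                    ppus[i + 1].in_queue.append(item)
                else:
                    results.append(item)
    return results

stages = [PPU(lambda v: [x + 1 for x in v]), PPU(lambda v: [2 * x for x in v])]
print(run_pipeline(stages, [[0, 1], [2, 3]]))  # [[2, 4], [6, 8]]
```

Because every stage works on a different vector in the same step, data moves from queue to queue without going through a cache, which is the point of the macro pipeline arrangement.
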
8
Mapping DSP Algorithms: Viterbi ACS
[Figure: Viterbi add-compare-select (ACS) dataflow. Labels shown: inputs s0, s1, bm0, bm1; a mux; S0, S1, S2.]
Example values:
v0    0 4 8 8 2 8 0 4
v1    4 8 4 8 2 4 8 2
mask  l l g e e g l g
s     0 4 4 8 2 4 0 2
Instruction sequence:
vadd v0, s0, bm0
vadd v1, s1, bm1
cmp v0, v1
movele s, v1
moveg s, v2
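
The add-compare-select step on this slide can be sketched as element-wise vector operations. This is an illustration only, not the MAPP instruction set: the helper names `vadd`, `vcmp`, `compare_select`, and `acs` are hypothetical, and keeping the smaller candidate in each lane is an assumption that matches the slide's example values for v0, v1, and the mask. The register operands of the slide's move instructions are not reproduced here.

```python
def vadd(a, b):
    """Element-wise add of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]

def vcmp(a, b):
    """Per-lane flag: 'l' if a < b, 'e' if a == b, 'g' if a > b."""
    return ['l' if x < y else 'e' if x == y else 'g' for x, y in zip(a, b)]

def compare_select(v0, v1):
    """Keep the smaller candidate metric in every lane (assumed convention)."""
    mask = vcmp(v0, v1)
    s = [x if m in 'le' else y for x, y, m in zip(v0, v1, mask)]
    return s, mask

def acs(s0, bm0, s1, bm1):
    """Add branch metrics to path metrics, then compare and select."""
    return compare_select(vadd(s0, bm0), vadd(s1, bm1))

# Candidate metrics v0 and v1 taken from the slide's example.
s, mask = compare_select([0, 4, 8, 8, 2, 8, 0, 4], [4, 8, 4, 8, 2, 4, 8, 2])
print(" ".join(mask))  # l l g e e g l g
print(s)               # [0, 4, 4, 8, 2, 4, 0, 2]
```

All eight lanes are handled by the same four operations with no per-lane branch, since the selection is expressed as a mask rather than as control flow.
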
9
Increase Area/Power Efficiency
  • Data slice architecture
      • Most DSP algorithms do not need 32-bit precision
      • Viterbi decoding operates on 8-bit data
      • Filters may need 16-bit precision
  • Partial processor execution
      • Statically determined code
      • Turn off architecture units that are not used
      • Energy saving, no area saving
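
One way to picture the data slice idea is to treat a 32-bit word as four independent 8-bit lanes. The sketch below is an assumption-laden illustration, not a description of the MAPP hardware: the names `pack`, `unpack`, and `lane_add` are hypothetical, and wrap-around (modulo 256) arithmetic per lane is chosen only for simplicity.

```python
LANES = 4                        # four slices per 32-bit word (assumed)
LANE_BITS = 8                    # each slice is 8 bits wide
LANE_MASK = (1 << LANE_BITS) - 1

def pack(lanes):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

def lane_add(a, b):
    """Add two packed words lane by lane; carries never cross a lane boundary."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])

a = pack([1, 2, 3, 255])
b = pack([1, 1, 1, 1])
print(unpack(lane_add(a, b)))  # [2, 3, 4, 0]
```

With 8-bit data such as Viterbi metrics, all four lanes do useful work; a kernel that needs 16-bit precision would pair lanes instead, and lanes that a kernel does not need are candidates for being switched off.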

10
Vector Cluster Diagram (4x8-bit data slice)
[Figure: four identical slices, each with a register file, in queue, out queue, and ALU, joined by a 4x4 local interconnect network.]
11
Performance Results
12
Simplistic Power Analysis
  • Based on ARM9 data in 0.13 µm
  • Viterbi decoder (K=7): 0.75 W to 1 W
      • 64x4 8-bit ALU: 240 mW
      • 12 KB memory: 310 mW
      • Clock: 200 mW
      • Others: 250 mW
  • ASIC implementations: 7.65 mW to 0.7 W (with
    different throughputs)
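
The four component figures on this slide add up to 1 W, the upper figure quoted for the Viterbi decoder. The short calculation below checks that sum and derives each component's share; reading the components as a breakdown of the 1 W figure is an assumption, and the dictionary name `components_mw` is hypothetical.

```python
# Component power figures from the slide, in milliwatts.
components_mw = {
    "64x4 8-bit ALU": 240,
    "12 KB memory": 310,
    "Clock": 200,
    "Others": 250,
}

total_mw = sum(components_mw.values())
print(total_mw / 1000, "W")  # 1.0 W

# Share of the total for each component, as a percentage.
for name, mw in components_mw.items():
    print(f"{name}: {100 * mw / total_mw:.0f}%")
```

On these numbers the memory is the largest single contributor at 31%, ahead of the ALUs at 24%.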

13
Conclusion and Future Work
  • Programmable coprocessor architecture
      • Can support multiple protocols
      • Achieves real-time computational requirements
      • Reasonable power consumption
  • Future work
      • Realistic power model simulation
      • Implement complete protocols
      • Algorithm behavior studies
      • Shrink processor area