Title: Architectures for Wideband CDMA Software Radios
1Architectures for Wideband CDMA Software Radios
- CS 252 Final Project
- Rhett Davis Vandana Prabhu
- December 9, 1998
2Motivation for Software Radios
- General purpose architecture can support many standards
- Can we build them?
- multi-user detection
- adaptive interference cancellation
- low power / portable
Mitola, IEEE Comm. 95
3Project Goals
TI 'C54x implementation of the LMS algorithm would require 56 processors in parallel (35 with 'C6x)
Zhang et al., Allerton 98
- Are these numbers believable?
- How would a believable mapping of the algorithm look on IRAM and Pleiades?
- Will they meet the performance spec of 1.6M LMS loop iterations/sec?
4Wireless Receiver System Overview
[Figure: receiver front end with RF input, RF Filter, LNA, and mixing by cos 2π(1.96 GHz)t and sin 2π(1.96 GHz)t, giving I and Q outputs at 50 MS/s each]
Teuscher et al., ISCAS 98
5Digital Baseband Receiver
[Figure: baseband receiver blocks. Data In, a bank of Adaptive Pilot Correlators, Acquisition Timing Recovery, Signal Update Block, Adaptive Data Correlator, Data Out]
6Adaptive Pilot Correlator (LMS Algorithm)
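[Figure: LMS correlator datapath built from registers (Reg), summation nodes (Σ), a right shift (>>), and signals labeled u, c, and μ]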
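As a rough illustration of the loop this slide refers to, here is a minimal sketch of one textbook complex LMS iteration. It is not taken from the slides: the power-of-two step size (suggested by the ">>" in the figure), the default of 15 taps, and the names `lms_step` and `demo` are all assumptions made for the example.

```python
import random


def lms_step(weights, window, desired, shift=4):
    """Run one complex LMS iteration and return (new_weights, error)."""
    # Correlator output: inner product of conjugated weights with the input window.
    output = sum(w.conjugate() * x for w, x in zip(weights, window))
    error = desired - output
    # Assumed power-of-two step size, standing in for a hardware right shift.
    mu = 2.0 ** -shift
    # Textbook update: w <- w + mu * x * conj(e).
    new_weights = [w + mu * x * error.conjugate() for w, x in zip(weights, window)]
    return new_weights, error


def demo(taps=15, samples=4000, seed=0):
    """Identify a hypothetical random target filter from noise-free input."""
    rng = random.Random(seed)
    target = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(taps)]
    scale = 2 ** -0.5  # gives unit-magnitude symbols
    stream = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) * scale
              for _ in range(samples)]
    weights = [0j] * taps
    error = 0j
    for n in range(taps, samples):
        window = stream[n - taps:n]
        # Desired signal produced by the hypothetical target filter.
        desired = sum(t.conjugate() * x for t, x in zip(target, window))
        weights, error = lms_step(weights, window, desired)
    return weights, target, error
```

Each call does one vector inner product, one scalar error computation, and one vector update, which is the vector/scalar/vector pattern a mapping onto vector or dataflow hardware has to respect.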
7The Search for Parallelism: Data Dependency Graph
[Figure: data dependency graph from the nth iteration to the (n+1)th iteration, with stages labeled vector ops., scalar ops., and vector ops.]
8VIRAM Simulation of Critical Path
- Assumptions
- all other operations for this loop have completed
- vectors of length 15
- input data already stored in memory
- suggested clock speed is 200 MHz
- max CPU speed is 300 MHz
- max Memory speed is 250 MHz
[Figure: Critical Path]
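To put the clock speeds above in context, the following sketch converts the 1.6M iterations/sec target into a per-iteration cycle budget. The function name `cycles_per_iteration` is hypothetical; the clock rates and the iteration rate are the figures quoted in this deck.

```python
def cycles_per_iteration(clock_hz, iterations_per_sec=1.6e6):
    """Clock cycles available for one LMS loop iteration at a given clock."""
    return clock_hz / iterations_per_sec


# Clock rates quoted on the VIRAM slide.
clocks_hz = {
    "suggested clock": 200e6,
    "max memory": 250e6,
    "max CPU": 300e6,
}

for label, clock_hz in clocks_hz.items():
    print(f"{label}: {cycles_per_iteration(clock_hz):.2f} cycles per iteration")
```

At the suggested 200 MHz clock this gives 125 cycles for the whole loop, so the simulated critical path has to fit inside that budget together with everything else in the iteration.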
9Next Step: Examine Outer Loop
10A Reconfigurable Solution?
- Architecture Overview
- Data intensive computation executed on a dataflow architecture
- ARM8 for overall control flow and configuration of satellites (operation and connectivity)
- Satellites are AGP, SRAM, MACP, ALUP, I/O ports, and reconfig. network
- Approach for mapping
- Extract the dataflow graph for the algorithm
- Parallelise the mapping to reduce latency and meet throughput requirements
- Estimate timing based on configuration overhead and kernel execution
[Figure: Pleiades block diagram with an ARM8 core and AGP, SRAM, MACP, and ALUP satellites]
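The mapping approach listed above (extract the dataflow graph, parallelise, estimate timing) can be sketched as an as-soon-as-possible schedule over a dependency graph. This is a generic illustration, not the Pleiades tool flow: the node names and latencies are made up, and `asap_schedule` and `critical_path_length` are hypothetical helpers.

```python
from graphlib import TopologicalSorter


def asap_schedule(preds, latency):
    """Earliest start time of each node, given its predecessors and latencies."""
    start = {}
    for node in TopologicalSorter(preds).static_order():
        # A node may start once every predecessor has finished.
        start[node] = max(
            (start[p] + latency[p] for p in preds.get(node, ())), default=0
        )
    return start


def critical_path_length(preds, latency):
    """Total latency of the longest dependency chain in the graph."""
    start = asap_schedule(preds, latency)
    return max(start[n] + latency[n] for n in start)


# Hypothetical toy kernel: two loads feed a multiply, then accumulate and store.
toy_preds = {
    "mul": {"load_x", "load_y"},
    "acc": {"mul"},
    "store": {"acc"},
}
toy_latency = {"load_x": 1, "load_y": 1, "mul": 2, "acc": 1, "store": 1}
```

Nodes that share a start time in the schedule are candidates for separate satellites running in parallel, while the critical path length bounds the kernel latency regardless of how many satellites are available.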
11Kernel mapping for Pleiades
- Key aspects
- Broadcast
- Duplicating data streams to decrease latency
- Writeback mechanism
- for the X_ri loop
- 12 MACPs
- 14 ALUPs
- 10 MEMs
- 10 AGPs
[Figure: kernel dataflow mapping onto AGP/SRAM, ALUP, and MACP satellites, with a CONSTANT input. Signal labels: X_ri, X_ii, Y_ri, Y_ii, Ys_ri, Ys_ii, Yx_ri, Yx_ii, Z_r, Z_i, Zx_r, Zx_i, Zx_r1, Zx_r2, Zx_i1, Zx_i2, Zmf_r, Zmf_i, C_r, C_i, E_r, E_i, U_r, U_i, Si]
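The _r and _i suffixes on the signal names suggest that each complex quantity is carried as separate real and imaginary streams; that reading is an assumption, not something the slide states. Under it, a complex multiply-accumulate breaks into four real multiplies, as in this sketch (`complex_mac` is a hypothetical name):

```python
def complex_mac(acc_r, acc_i, a_r, a_i, b_r, b_i):
    """Accumulate the complex product a * b using only real arithmetic."""
    # Real part: a_r*b_r - a_i*b_i (two real multiplies).
    new_r = acc_r + a_r * b_r - a_i * b_i
    # Imaginary part: a_r*b_i + a_i*b_r (two more real multiplies).
    new_i = acc_i + a_r * b_i + a_i * b_r
    return new_r, new_i
```

If the kernel is organised this way, it is one plausible reason a mapping would call for many MACP satellites and duplicated data streams, since the same operands feed several real products at once.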
12Performance on Pleiades
- Architectural Assumptions
- Memory is updated with input data (yi) at 16 MHz
- Reconfiguration time is insignificant compared to kernel execution time and occurs very rarely
- Timing Estimations
- for 0.25 µm process
- Throughput is dominated by the MACP satellite; transport time on the network is bounded by throughput
- Formulated based on the data dependencies in the kernel mapping
13Performance Analysis
- Critical path is the update of X_r, limited by MAC throughput
- Available computation time for Zi is 625 ns
- Meets the 1.6 MHz spec at 2.5 V
- At 1.2 V computation times are
- 422 + 245 = 667 ns for Zi, which exceeds the 625 ns available
- 245 ns to update xi
- Hence increase Vdd to 2.5 V to increase MACP throughput
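The 1.2 V check above amounts to comparing summed stage times against the iteration period. A small sketch of that arithmetic follows, using only the numbers quoted on this slide; the helper name `slack_ns` is hypothetical.

```python
def slack_ns(stage_times_ns, rate_hz):
    """Time left over per iteration; negative means the rate is missed."""
    period_ns = 1e9 / rate_hz
    return period_ns - sum(stage_times_ns)


# Stage times for Zi at 1.2 V, as quoted on the slide.
zi_stages_at_1v2 = [422, 245]
print(slack_ns(zi_stages_at_1v2, 1.6e6))  # negative: misses the 1.6 MHz spec
```

A negative result means the computation overruns the iteration period at that supply voltage, which is the motivation for raising Vdd.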
14Conclusions and Future Work
- Seems likely that LMS algorithm can be implemented on a single VIRAM processor
- Can first half of the algorithm be implemented on VIRAM in less than 28 cycles/iteration?
- Worthwhile to explore MIMD
- Pleiades architecture exploits the dataflow fully
- Architecture of satellites can be better tuned for LMS algorithm to reduce hardware
- Complete mapping/consequences of the rest of the algorithm