Title: Implementing Multiuser Channel Estimation and Detection for W-CDMA
1Implementing Multiuser Channel Estimation and Detection for W-CDMA
- Sridhar Rajagopal, Srikrishna Bhashyam,
- Joseph R. Cavallaro and Behnaam Aazhang
- Rice University
- sridhar,skrishna,cavallar,aaz_at_rice.edu
This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF
2Organization
- Joint Estimation and Detection
- An Implementation-Friendly Scheme
- Simulations
- Architectural Features
- Task Partitioning
- Area-Time Tradeoffs
- Conclusions
- Future Work
3Base-Station with MUD
4Joint Estimation and Detection
- Jointly estimate the channel response and detect all the users' bits.
- Shown to have better performance as well as reduced computational complexity.
- Maximum Likelihood Based Channel Estimation
- C. Sengupta et al., PIMRC 1998, WCNC 1999
- Differencing Multistage Detection based on Parallel Interference Cancellation
- G. Xu et al., SPIE 1999
5Computations Involved
- Model
[Equation: received-signal model, annotated "Received bits of spreading length N for K users", "Bits of K async. users aligned at times i and i-1", and "delay"]
- Compute Correlation Matrices
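The correlation step can be illustrated with a minimal sketch. The slide's equations did not survive, so the definitions used here are assumptions: each bit vector b_i stacks the K users' bits at times i and i-1 (length 2K, entries +1/-1), each received vector r_i has N complex chip samples, and the matrices are window averages Rbb = (1/L) sum b_i b_i^T and Rbr = (1/L) sum b_i r_i^H. The function name is hypothetical.

```python
def correlation_matrices(bits, received):
    """Average b b^T and b r^H over a window of L bit times.

    bits:     list of L vectors of length 2K with entries +1 or -1
    received: list of L vectors of N complex chip samples
    """
    window = len(bits)
    two_k = len(bits[0])
    n_chips = len(received[0])
    rbb = [[0.0] * two_k for _ in range(two_k)]
    rbr = [[0j] * n_chips for _ in range(two_k)]
    for b, r in zip(bits, received):
        for j in range(two_k):
            for m in range(two_k):
                rbb[j][m] += b[j] * b[m] / window
            for c in range(n_chips):
                # b is real, so only the received sample is conjugated
                rbr[j][c] += b[j] * r[c].conjugate() / window
    return rbb, rbr


# Made-up toy data: window L = 2, bit vectors of length 2, N = 1 chip.
rbb, rbr = correlation_matrices([[1, -1], [1, 1]], [[1 + 1j], [2 - 1j]])
```

Since the bits are +1/-1, every product in Rbb is itself +1 or -1, which is what later allows the hardware blocks to avoid multipliers for this step.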
6Multishot Detection
Solve for the channel estimate, Ai
Multishot Detection
7Differencing Multistage Detection
- Stage 0 Matched Filter Detector
- Stage 1 to build differencing vector
- Successive Stages
S = diag(A^H A); y: soft decision; d: detected bits (hard decision)
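The three stages can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the correlation matrix H = A^H A is taken as real-valued for brevity, y holds the matched filter outputs, and each stage uses the standard parallel interference cancellation form y - (H - S) d, rewritten so that later stages only process the bits that changed. Function names are hypothetical.

```python
def sign(value):
    return 1 if value >= 0 else -1


def matched_filter(y):
    """Stage 0: hard decisions straight from the matched filter outputs."""
    return [sign(v) for v in y]


def differencing_multistage(h, y, stages):
    """Parallel interference cancellation driven by differences of decisions."""
    k = len(y)
    d_prev = matched_filter(y)
    # Stage 1: full cancellation, soft = y - (H - S) d, skipping the diagonal S.
    soft = [y[i] - sum(h[i][j] * d_prev[j] for j in range(k) if j != i)
            for i in range(k)]
    d = [sign(v) for v in soft]
    for _ in range(stages - 1):
        # Differencing vector: entries are -2, 0 or +2, mostly 0 near convergence.
        x = [d[j] - d_prev[j] for j in range(k)]
        if not any(x):
            break  # no decision changed, later stages would repeat the result
        soft = [soft[i] - sum(h[i][j] * x[j] for j in range(k)
                              if j != i and x[j] != 0)
                for i in range(k)]
        d_prev, d = d, [sign(v) for v in soft]
    return d


# Made-up near-far example: user 2 is much stronger than user 1.
h = [[1.0, 1.5], [1.5, 9.0]]
y = [-0.5, -7.5]              # equals H applied to the true bits [1, -1]
detected = differencing_multistage(h, y, stages=3)
```

In this toy case the matched filter alone gets user 1 wrong, and the first cancellation stage corrects it. The differencing form gives the same decisions as recomputing y - (H - S) d at every stage, while skipping the work for unchanged bits.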
8Structure of A^H A
- Not difficult to compute A^H A
- Block bi-diagonal matrix
- Use the structure
9Drawbacks
- Matrix Inversion/ Decomposition Needed
- Result not available till end of computation
- Delay before Detection
- Difficult for Tracking
- Higher Precision Needed
- Floating Point Units
- Larger Memory Requirements
- Storage of elements to compute inverse
- Float 32 bits / Input accuracy 12-14 bits
- SLOW!
- Difficult to meet real-time requirements
- S.Rajagopal et al. TI DSPFest1999
10Proposed Base-Station
No Multiuser Detection
TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm)
11New Scheme
- Iterative Method to find the Channel Estimates
- S.Bhashyam et al. WCNC2000 (submitted)
- Can be easily adapted to Tracking for Fading Channels
- Fixed Point Implementation
- Estimates ready for detection Immediately
- Simpler Hardware and Software.
- Computation Savings only Per Bit
12Iterative Scheme
- Tracking
- Slow Fading: Large Window L
- Fast Fading: Smaller Window L
- Method of Steepest Descent
- Stable convergence behavior
- µ fixed; bit-by-bit update
- Closely matches the scheme with inversions
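A steepest descent update with a fixed step size µ can be sketched as below. The slide's update equation did not survive, so the form used here is an assumption: the estimate is moved against the gradient of a least-squares cost, est <- est - µ (Rbb est - Rbr), which converges to the solution of Rbb est = Rbr without a matrix inversion. The function name and the numbers are made up for illustration.

```python
def steepest_descent_step(est, rbb, rbr, mu):
    """One update est <- est - mu * (rbb @ est - rbr), written with plain lists."""
    rows, cols = len(est), len(est[0])
    updated = []
    for j in range(rows):
        new_row = []
        for c in range(cols):
            gradient = sum(rbb[j][m] * est[m][c] for m in range(rows)) - rbr[j][c]
            new_row.append(est[j][c] - mu * gradient)
        updated.append(new_row)
    return updated


# Toy problem: 2 unknown rows, 1 column, made-up correlation values.
rbb = [[1.0, 0.2], [0.2, 1.0]]
rbr = [[1 + 0.6j], [0.2 - 1.8j]]    # equals rbb applied to [[1+1j], [-2j]]
estimate = [[0j], [0j]]
for _ in range(100):                 # in tracking, one step per received bit
    estimate = steepest_descent_step(estimate, rbb, rbr, mu=0.5)
```

With these values the estimate approaches [[1+1j], [-2j]], the same answer a direct inversion of rbb would give. In a tracking setting Rbb and Rbr would change with every bit as the window slides, and a single step per bit keeps the estimate following them.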
13Simulations - AWGN Channel
Detection Window: 12, SINR: 0, Paths: 3, Preamble: 150, 10000 bits/user
MF: Matched Filter, ML: Maximum Likelihood, ACT: using inversion
14Fading Channel with Tracking
Doppler 10 Hz, 1000 Bits, 15 users, 3 Paths
15DSP Implementation
- C6201 Texas Instruments
- Fixed Point Processor
- 200 MHz
- 32-bit VLIW Architecture
- 8 Functional Units
- 2 Multipliers
- 4 Adders
- 2 Load/Store
- TI C Compiler
16Simulation
- Work in Progress!
- Why better?
- Fixed Point Implementation - Faster on DSPs
- Higher Clock Speeds / Faster Multiplications
- More SIMD Parallelism due to smaller wordlength.
- Software Code Simpler to write
- Smaller Program Size
- Problems
- Input Bit Precision Analysis
- Overflows
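The two problems listed above can be made concrete with a small sketch. Everything here is illustrative: the helper names are hypothetical, inputs are assumed to be scaled into [-1, 1), and the word lengths are chosen only to compare a short word against a longer one.

```python
def quantize(x, word_bits):
    """Round x in [-1, 1) to a signed fixed-point integer, clamping at the ends."""
    scale = 1 << (word_bits - 1)
    q = int(round(x * scale))
    return max(-scale, min(scale - 1, q))


def wrap(value, word_bits):
    """Two's-complement wrap-around, as an unguarded adder behaves on overflow."""
    span = 1 << word_bits
    return (value + span // 2) % span - span // 2


def quantization_error(x, word_bits):
    scale = 1 << (word_bits - 1)
    return abs(quantize(x, word_bits) / scale - x)


error_8_bit = quantization_error(0.3, 8)     # coarse input precision
error_12_bit = quantization_error(0.3, 12)   # finer input precision
overflowed_sum = wrap(100 + 100, 8)          # 200 does not fit in 8 signed bits
```

The 8-bit error is noticeably larger than the 12-bit one, and the wrapped sum comes out negative even though both operands are positive, which is why a precision analysis and overflow handling are needed before committing to a word length.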
17Task - Partitioning the Algorithm
18Task Decomposition
S. Das et al., Asilomar 1999
[Figure: task decomposition over TIME into Blocks I to IV and Tasks A and B. Pilot and Data inputs are selected through MUXes; b and d label the bit and detected-bit signals.]
Channel Estimation
- Correlation Matrices (Per Bit): Rbb O(K^2), Rbr (real) O(KN), Rbr (imaginary) O(KN)
- Iterate: A (real) O(K^2 N), A (imaginary) O(K^2 N)
Matrix Products
- A0^H A0 O(K^2 N), A0^H A1 O(K^2 N), A1^H A1 O(K^2 N), A^H r O(KND)
Multistage Detection (Per Window)
- O(D K^2 M)
19Channel Estimation Architecture
- Detection Architecture
- One version already ready
- G.Xu - Masters Thesis 1999
- Advantages over DSP Implementation
- Optimal Memory Utilization
- Custom Blocks for exploiting available pipelining and parallelism
- Parts could be mapped to FPGA / Reconfigurable logic
- Shows theoretical bounds for maximum achievable Data Rates
- Shows how tasks could be split among different processors
20Block Diagram
Each block shows the number of operations in it.
[Figure: block diagram with REAL and IMAG paths; data widths labeled "bit" and "8-bit"]
21Channel Estimation
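Each block shows the number of operations in it.
[Figure: channel estimation data path (REAL part) over a sliding Window. Signals: b, b0, rR, r0, Atmp R.]
Blocks and operation counts:
- b b (2K^2)
- b0 b0 (2K^2)
- Inverter (2K^2)
- MUX (2K^2)
- Rbb (2K^2)
- b0 (2K)
- Inverter (2K)
- MUX (2K)
- r0 (N)
- MUX (N)
- Rbr R (KN)
- >> (4K^2)
- Multiplier (2K^2 N)
- A R (KN)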
22Auto-correlation Structure
- b,b0 are 1-bit
- Subtraction by using inverter
- Rbb using a Counter
- Fully Parallel: 2K^2 elements, O(1) time
- Pipelined with LOAD: 2K elements, O(K) time
- Serial with LOAD: 1 element, O(2K^2) time
[Figure: Rbb block (2K^2)]
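Because the bits are 1-bit values, each product in the auto-correlation is +1 or -1, so a window sum can be kept in an up/down counter. The sketch below illustrates that idea under assumptions of its own: bits are stored as 0/1 (1 meaning +1, 0 meaning -1), a product is an XNOR of the two bits, and sliding the window adds the incoming outer product and removes the outgoing one. Function names are hypothetical.

```python
def xnor_product(bit_a, bit_b):
    """Product of two +/-1 symbols stored as bits (1 -> +1, 0 -> -1)."""
    return 1 if bit_a == bit_b else -1


def slide_rbb(counters, b_new, b_old):
    """Update the window sums in place: add b_new b_new^T, remove b_old b_old^T."""
    size = len(b_new)
    for j in range(size):
        for m in range(size):
            counters[j][m] += xnor_product(b_new[j], b_new[m])
            counters[j][m] -= xnor_product(b_old[j], b_old[m])


def direct_rbb(window):
    """Reference: recompute the sums over the whole window."""
    size = len(window[0])
    return [[sum(xnor_product(b[j], b[m]) for b in window) for m in range(size)]
            for j in range(size)]


bits = [[1, 0], [1, 1], [0, 1], [0, 0]]   # made-up bit vectors, oldest first
counters = direct_rbb(bits[0:3])          # window of L = 3
slide_rbb(counters, b_new=bits[3], b_old=bits[0])
```

After the slide, the counters equal the sums recomputed directly over the three most recent vectors, so the per-bit cost stays independent of the window length L.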
23Cross-Correlation Structure
- r is 8-bit, b is 1-bit
- Rbr using 8-bit Adders
- Based on sign of b
- Fully Parallel: KN elements, O(1) time
- Pipelined: N elements, O(K) time
- Serial: 1 element, O(KN) time
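With a 1-bit b, multiplying by b reduces to adding or subtracting the received sample according to the sign of the bit. A minimal sketch, assuming bits stored as 0/1 (1 meaning +1), integer samples for the real part only (the imaginary part would be handled the same way), and a sliding window in which the oldest term is removed as the newest is added. The function name and sample values are made up.

```python
def slide_rbr(acc, b_new, r_new, b_old, r_old):
    """acc[j][c] accumulates b[j] * r[c] over the window, using only add/subtract."""
    for j in range(len(b_new)):
        for c in range(len(r_new)):
            acc[j][c] += r_new[c] if b_new[j] else -r_new[c]   # incoming bit time
            acc[j][c] -= r_old[c] if b_old[j] else -r_old[c]   # outgoing bit time


acc = [[0, 0], [0, 0]]   # 2 bit positions by N = 2 chips, arbitrary starting sums
slide_rbr(acc, b_new=[1, 0], r_new=[5, -3], b_old=[0, 0], r_old=[2, 4])
```

No multiplier appears anywhere in the update, which matches the slide's use of 8-bit adders for this block.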
24Iterative Update Structure
- 8-bit Multipliers
- 16-bit Adders for Multiplier
- 8-bit Adders for A
- Parallel: KN elements, O(K) time
- Pipelined: N elements, O(K^2) time
- Serial: 1 element, O(K^2 N) time
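The word lengths above can be illustrated with a fixed-point multiply-accumulate sketch. It assumes signed 8-bit operands, whose product always fits in 16 bits, a 16-bit accumulator that saturates instead of wrapping, and a right shift to bring the result back to 8 bits. The saturation policy, the shift amounts, and the function names are assumptions for illustration only.

```python
def saturate(value, bits):
    """Clamp value to the range of a signed integer with the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def mac_row(a_row, r_col, shift):
    """Multiply-accumulate 8-bit inputs in 16 bits, then rescale to 8 bits."""
    acc = 0
    for a, r in zip(a_row, r_col):
        product = a * r                    # two signed 8-bit inputs fit in 16 bits
        acc = saturate(acc + product, 16)  # 16-bit adder behind the multiplier
    return saturate(acc >> shift, 8)       # arithmetic shift back to 8 bits


small = mac_row([10, -20], [3, 4], shift=4)
clipped = mac_row([127, 127, 127], [127, 127, 127], shift=8)
```

The second call exceeds the 16-bit range on its third product and is clipped rather than wrapped, so the rescaled output stays at the largest positive 8-bit value.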
25Elements in each block
Example: N = 32, L = 100, K = 32
- Fully Parallel Solution: 4K Multipliers, 12K Adders, O(32) Time
- Pipelined Solution: 100 Multipliers, 300 Adders, O(1K) Time
26Conclusions
- Iterative Scheme for Joint Estimation and Detection
- No loss in algorithm performance
- Suitable for Hardware Implementation
- On DSPs, FPGAs and ASICs
- Supports Tracking for Fading Channels
- Fixed Point Implementation Feasible
- ASIC architecture
- To exploit available pipelining and parallelism
- Multiuser Channel Estimation and Detection algorithms are POSSIBLE to IMPLEMENT for W-CDMA.
27Future Work
- MS
- Extend Architecture to Long Codes
- Task-partition the algorithm on the Sundance Multi-DSP/FPGA board to achieve real-time operation
- Post-MS
- Downlink
- Architectures to minimize power consumption / area
- Implement coding/decoding blocks and integrate
- RENE
28EXTRA SLIDES
29Data Rates Achieved
Assuming Channel Estimation Real-Time
30Fading Channel
- SNR 10 dB, Doppler 10 Hz, 1000 Bits