Title: Implementing Multiuser Channel Estimation and Detection for W-CDMA
1 Implementing Multiuser Channel Estimation and Detection for W-CDMA
- Sridhar Rajagopal, Srikrishna Bhashyam,
- Joseph R. Cavallaro and Behnaam Aazhang
- Rice University
- sridhar,skrishna,cavallar,aaz_at_rice.edu
This work is supported by Nokia, Texas Instruments, Texas Advanced Technology Program and NSF
- Joint Estimation Detection
- An Implementation-Friendly Scheme
- Simulations
- Architectural Features
- Task Partitioning
- Area-Time Tradeoffs
- Conclusions
- Future Work
3 Base-Station with MUD
4 Joint Estimation Detection
- Jointly estimate the channel response and detect all the users' bits.
- Shown to have better performance as well as reduced computational complexity.
- Maximum Likelihood Based Channel Estimation
- C. Sengupta et al., PIMRC 1998, WCNC 1999
- Differencing Multistage Detection based on Parallel Interference Cancellation
- G. Xu et al., SPIE 1999
5 Computations Involved
- Model
- Compute Correlation Matrices
Bits of K asynchronous users aligned at times i and i-1
Received bits of spreading length N for K users
6 Multishot Detection
Solve for the channel estimate, A_i
Multishot Detection
7 Differencing Multistage Detection
- Stage 0: Matched Filter Detector
- Stage 1: build the differencing vector
- Successive Stages
S = diag(A^H A); y: soft decision; d: detected bits (hard decision)
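The three stages can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes real-valued signals, a single detection window, and the usual parallel interference cancellation recursion y(l) = y(l-1) - (A^T A - S)(d(l-1) - d(l-2)). The function names (correlate, matched_filter_bits, differencing_pic) and the example matrix are hypothetical.

```python
def sign(v):
    return 1 if v >= 0 else -1

def correlate(A, r):
    """Matched-filter soft outputs y = A^T r (A is N x K, r has N samples)."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def matched_filter_bits(A, r):
    """Stage 0: hard decisions taken directly on the matched-filter outputs."""
    return [sign(v) for v in correlate(A, r)]

def differencing_pic(A, r, stages):
    """Parallel interference cancellation with differencing (real-valued sketch).

    `stages` counts the cancellation stages run after stage 0 (at least 1).
    """
    n, k = len(A), len(A[0])
    y = correlate(A, r)
    # Off-diagonal part of A^T A, i.e. A^T A - S with S = diag(A^T A).
    off = [[0.0 if p == q else sum(A[i][p] * A[i][q] for i in range(n))
            for q in range(k)] for p in range(k)]
    d_prev = [sign(v) for v in y]                        # stage 0
    soft = [y[p] - sum(off[p][q] * d_prev[q] for q in range(k))
            for p in range(k)]                           # stage 1: full cancellation
    d = [sign(v) for v in soft]
    for _ in range(stages - 1):                          # successive stages
        x = [new - old for new, old in zip(d, d_prev)]   # differencing vector (0 or +/-2)
        changed = [q for q in range(k) if x[q] != 0]
        if not changed:                                  # decisions have converged
            break
        # Only the bits that flipped contribute to the correction.
        soft = [soft[p] - sum(off[p][q] * x[q] for q in changed) for p in range(k)]
        d_prev, d = d, [sign(v) for v in soft]
    return d

# Hypothetical two-user example: a strong user 0 masks a weak user 1.
A = [[2.0, 0.5],
     [0.0, 0.4]]
r = [1.5, -0.4]            # noiseless r = A b for transmitted bits b = [1, -1]
mf_bits = matched_filter_bits(A, r)
pic_bits = differencing_pic(A, r, stages=3)
```

In this example the matched filter decides the weak user's bit wrongly, and the first cancellation stage corrects it. Later stages only touch entries of the differencing vector that are nonzero, which is where the computational saving comes from.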
8 Structure of A^H A
Not difficult to compute A^H A
Block Bi-Diagonal Matrix
Use Structure
- Matrix Inversion/ Decomposition Needed
- Result not available till end of computation
- Delay before Detection
- Difficult for Tracking
- Higher Precision Needed
- Floating Point Units
- Larger Memory Requirements
- Storage of elements to compute inverse
- Float 32 bits / Input accuracy 12-14 bits
- SLOW!
- Difficult to meet Real-Time
- S.Rajagopal et al. TI DSPFest1999
10 Proposed Base-Station
No Multiuser Detection
TI's Wireless Basestation (http//www.ti.com/sc/do
11 New Scheme
- Iterative Method to find the Channel Estimates
- S.Bhashyam et al. WCNC2000 (submitted)
- Can be easily adapted to Tracking for Fading Channels
- Fixed Point Implementation
- Estimates ready for detection Immediately
- Simpler Hardware and Software.
- Computation Savings only Per Bit
12 Iterative Scheme
- Tracking
- Slow Fading: Large Window L
- Fast Fading: Smaller Window L
- Method of Steepest Descent
- Stable convergence behavior
- µ fixed, Bit-by-Bit update
- Matches Closely to the Scheme with Inversions
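A bit-by-bit steepest-descent update with a sliding window of L bits might look like the sketch below. The slides do not show the equations, so the update A^H <- A^H - µ (Rbb A^H - Rbr), with Rbb = sum of b b^T and Rbr = sum of b r^T over the window, is an assumption here, as are the real-valued noiseless signals and every name (track_channel, mat_add, and so on).

```python
from collections import deque

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def mat_add(X, Y, scale=1.0):
    """Return X + scale * Y for equally sized matrices."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_mul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def track_channel(rx_vectors, bit_vectors, window, mu):
    """One gradient step per received bit; correlations cover the last `window` bits."""
    m, n = len(bit_vectors[0]), len(rx_vectors[0])
    AH = [[0.0] * n for _ in range(m)]       # estimate of A^T (A^H for real signals)
    Rbb = [[0.0] * m for _ in range(m)]
    Rbr = [[0.0] * n for _ in range(m)]
    history = deque()
    for r, b in zip(rx_vectors, bit_vectors):
        Rbb = mat_add(Rbb, outer(b, b))      # add the newest bit vector
        Rbr = mat_add(Rbr, outer(b, r))
        history.append((r, b))
        if len(history) > window:            # drop the oldest one (tracking)
            r_old, b_old = history.popleft()
            Rbb = mat_add(Rbb, outer(b_old, b_old), -1.0)
            Rbr = mat_add(Rbr, outer(b_old, r_old), -1.0)
        grad = mat_add(mat_mul(Rbb, AH), Rbr, -1.0)
        AH = mat_add(AH, grad, -mu)          # steepest-descent step, fixed mu
    return AH

# Hypothetical single-user example: stacked bits [b(i-1), b(i)], N = 3 chips.
A_true = [[0.5, -0.25], [1.0, 0.75], [-0.5, 0.25]]
stream = [1, 1, -1, -1] * 20
bit_vectors = [[stream[i - 1], stream[i]] for i in range(1, len(stream))]
rx_vectors = [[sum(a * x for a, x in zip(row, b)) for row in A_true]
              for b in bit_vectors]
AH_est = track_channel(rx_vectors, bit_vectors, window=4, mu=0.1)
```

No matrix inversion is involved, and an estimate is available after every bit. The step size must stay below 2 divided by the largest eigenvalue of Rbb for the iteration to remain stable; because Rbb here is an unnormalized sum, that bound shrinks as the window L grows.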
13 Simulations - AWGN Channel
Detection Window: 12, SINR: 0, Paths: 3, Preamble: 150, 10000 bits/user
MF: Matched Filter, ML: Maximum Likelihood, ACT: using inversion
14 Fading Channel with Tracking
Doppler 10 Hz, 1000 Bits, 15 users, 3 Paths
15 DSP Implementation
- C6201 Texas Instruments
- Fixed Point Processor
- 200 MHz
- 32-bit VLIW Architecture
- 8 Functional Units
- 2 Multipliers
- 4 Adders
- 2 Load/Store
- TI C Compiler
- Work in Progress!
- Why better?
- Fixed Point Implementation - Faster on DSPs
- Higher Clock Speeds / Faster Multiplications
- More SIMD Parallelism due to smaller wordlength.
- Software Code Simpler to write
- Smaller Program Size
- Problems
- Input Bit Precision Analysis
- Overflows
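The precision and overflow problems can be illustrated with a small sketch. The Q-format, the word length, and the helper names (to_fixed, wrap_add, sat_add) are assumptions chosen for illustration; they are not taken from the C6201 code.

```python
def to_fixed(x, frac_bits, word_bits):
    """Quantize a real value to a signed fixed-point integer, saturating at the word limits."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))

def wrap_add(a, b, word_bits):
    """Two's-complement addition that silently wraps around on overflow."""
    s = (a + b) & ((1 << word_bits) - 1)
    return s - (1 << word_bits) if s >> (word_bits - 1) else s

def sat_add(a, b, word_bits):
    """Addition that clips to the largest or smallest representable value instead."""
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, a + b))

half = to_fixed(0.5, frac_bits=7, word_bits=8)      # 64 in Q7
wrapped = wrap_add(100, 100, word_bits=8)           # overflow flips the sign: -56
clipped = sat_add(100, 100, word_bits=8)            # saturates at 127
```

A wrapped sum changes sign, which would corrupt a correlation accumulator, while a saturated sum only loses magnitude. Choosing the input bit precision fixes how many accumulations fit in a word before either happens.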
17 Task - Partitioning the Algorithm
18 Task Decomposition
S. Das et al., Asilomar 1999
[Diagram: the algorithm divided into Task A and Task B across Blocks I, II, III and IV, covering Channel Estimation, Matrix Products and Multistage Detection]
- Correlation Matrices (Per Bit): Rbb O(K^2), RbrR O(KN), RbrI O(KN)
- Matrix Products: A0^H A0 O(K^2 N), A0^H A1 O(K^2 N), A1^H A1 O(K^2 N)
- Multistage Detection (Per Window)
19 Channel Estimation Architecture
- Detection Architecture
- One version already ready
- G. Xu - Masters Thesis 1999
- Advantages over DSP Implementation
- Optimal Memory Utilization
- Custom Blocks for exploiting available pipelining and parallelism
- Parts could be mapped to FPGA / Reconfigurable logic
- Shows theoretical bounds for maximum achievable Data Rates
- Shows how tasks could be split among different
20 Block Diagram
Each block shows no. of operations in it.
21 Channel Estimation
Each block shows no. of operations in it.
[Block diagram; blocks and operation counts as labelled:]
b0b0 (2K^2)
Inverter (2K^2)
A R (KN)
b0 (2K)
Rbb (2K^2)
Multiplier (2K^2 N)
MUX (2K^2)
bb (2K^2)
Inverter (2K)
MUX (2K)
Rbr R (KN)
r0 (N)
Atmp R
>> (4K^2)
22 Auto-correlation Structure
- b, b0 are 1-bit
- Subtraction by using inverter
- Rbb using a Counter
- Fully Parallel
- 2K^2 elements, O(1) Time
- Pipelined with LOAD
- 2K elements, O(K) Time
- Serial with LOAD
- 1 element, O(2K^2) Time
Rbb (2K^2)
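Because the bits are 1-bit values, each Rbb element needs no multiplier. The sketch below assumes bits are stored as 0/1 standing for -1/+1, so that a product is an XNOR and each matrix element is an up/down counter. It also assumes that b0 is the bit vector leaving the window, whose contribution is subtracted by inverting the XNOR output. The function names are hypothetical.

```python
def xnor(a, b):
    """1 when two stored bits agree, i.e. when the product of the +/-1 values is +1."""
    return 1 - (a ^ b)

def update_rbb(counters, new_bits, old_bits=None):
    """Count up/down for the newest bit vector; optionally remove the oldest one."""
    for p in range(len(new_bits)):
        for q in range(len(new_bits)):
            counters[p][q] += 1 if xnor(new_bits[p], new_bits[q]) else -1
            if old_bits is not None:
                # Subtraction: the inverted XNOR output drives the counter.
                counters[p][q] += -1 if xnor(old_bits[p], old_bits[q]) else 1
    return counters

rbb = [[0, 0], [0, 0]]
update_rbb(rbb, [1, 0])                      # window: [1, 0]
update_rbb(rbb, [1, 1])                      # window: [1, 0], [1, 1]
update_rbb(rbb, [0, 0], old_bits=[1, 0])     # window: [1, 1], [0, 0]
```

The double loop here corresponds to the serial option on the slide; the fully parallel option would update all counters in the same cycle.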
23 Cross-Correlation Structure
- r is 8-bit, b is 1-bit
- Rbr using 8-bit Adders
- Based on sign of b
- Fully Parallel: KN, O(1)
- Pipelined: N, O(K)
- Serial: 1, O(KN)
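The cross-correlation update reduces to adding or subtracting the received sample according to the sign of the bit. A minimal sketch, assuming bits stored as 0/1 for -1/+1 and integer received samples (update_rbr is a hypothetical name; Python integers do not model the adder word length):

```python
def update_rbr(acc, bits, r):
    """Accumulate b * r^T with adders only: add r when b is +1, subtract when b is -1."""
    for p, b in enumerate(bits):
        for n, sample in enumerate(r):
            acc[p][n] += sample if b else -sample
    return acc

rbr = [[0, 0, 0], [0, 0, 0]]
update_rbr(rbr, [1, 0], [5, -3, 127])
update_rbr(rbr, [1, 1], [1, 1, 1])
```

After the second update one accumulator holds 128, which no longer fits in a signed 8-bit word, so the accumulator width or a scaling step has to be settled by the precision analysis.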
24 Iterative Update Structure
- 8-bit Multipliers
- 16-bit Adders for Multiplier
- 8-bit Adders for A
- Parallel: KN, O(K)
- Pipelined: N, O(K^2)
- Serial: 1, O(K^2 N)
25 Elements in each block
Example: N = 32, L = 100, K = 32
Fully Parallel Solution: 4K Multipliers, 12K Adders, O(32) Time
Pipelined Solution: 100 Multipliers, 300 Adders, O(1K) Time
- Iterative Scheme for Joint Estimation Detection
- No loss in algorithm performance
- Suitable for Hardware Implementation
- On DSPs, FPGAs and ASICs
- Supports Tracking for Fading Channels
- Fixed Point Implementation Feasible
- ASIC architecture
- To exploit available pipelining and parallelism
- Multiuser Channel Estimation and Detection
algorithms POSSIBLE to IMPLEMENT for W-CDMA.
27 Future Work
- MS
- Extend Architecture to Long Codes
- Task Partition the algorithm on the Sundance Multi-DSP/FPGA board to achieve real-time
- Post-MS
- Downlink
- Architectures to Minimize Power Consumption / Area
- Implementing Coding/Decoding Blocks and integrate
29 Data Rates Achieved
Assuming Channel Estimation Real-Time
30 Fading Channel
- SNR 10 dB, Doppler 10 Hz, 1000 Bits