Title: Software Defined Radio
Software Defined Radio: A High Performance Embedded Challenge
- Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and Krisztian Flautner¹
- University of Michigan; ¹ARM Ltd
Contents
- Software defined radio
- Categories of wireless networks
- Core technologies for future networks
- Case study: W-CDMA network
  - Major algorithms
  - Workload characterization
  - Architectural implications
Software Defined Radio
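Wireless Communication System
<figure: protocol stack of a wireless communication system. Application bits pass as packets through the upper protocol layers (Transport: TCP/UDP; Network: IP; Link: PPP), then the MAC and the physical layer (PHY: baseband processing and analog front-end) before going over the air>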
Anatomy of a Cellular Phone
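Protocols on a Wireless Platform
<figure: protocol layers on a wireless platform. Application processor: GPP (software) and DSP/accelerator, running source coding (audio: AMR/QCELP; video: MPEG) and transport. Baseband processor: GPP (software) running the upper layers (network, link, MAC), and an ASIC (hardware) implementing the physical layer (PHY)>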
Software Defined Radio (SDR)
- Uses software routines instead of ASICs for the physical layer operations of a wireless communication system
<figure: ASICs (PHY) replaced by software routines running on programmable hardware>
- Both the analog front-end and the digital baseband are within the scope of SDR
Levels of SDR
Tier | Name | Description
Tier 0 | Hardware Radio (HR) | Implemented using hardware components; cannot be modified
Tier 1 | Software Controlled Radio (SCR) | Only control functions are implemented in software: inter-connects, power levels, etc.
Tier 2 | Software Defined Radio (SDR) | Software control of a variety of modulation techniques, wide-band or narrow-band operation, security functions, etc.
Tier 3 | Ideal Software Radio (ISR) | Programmability extends to the entire system, with analog conversion only at the antenna
Tier 4 | Ultimate Software Radio (USR) | Defined for comparison purposes only
<source: http://www.sdrforum.org>
Why Do We Need SDR?
- Seamless wireless connection (end user)
  - Widely different wireless protocols
    - TDMA: GSM, AMPS
    - CDMA: IS-95, cdma2000, W-CDMA, IEEE 802.11b
    - OFDM: IEEE 802.11a/g/n, WiMAX
  - Needs a terminal that can support multiple wireless protocols
- Easy infrastructure upgrade (service provider)
  - Wireless protocols evolve continuously
  - E.g. W-CDMA → W-CDMA HSDPA
- Time to market (manufacturer)
  - Reduce hardware development time and cost
Where Can We Use SDR?
- Base stations
  - Loose constraints on power and area
  - Support several hundred subscribers
  - Will be commercialized first
- Wireless terminals
  - Tight constraints on power and area
  - Will be commercialized next
Why Is SDR Challenging?
- Analog front-end
  - Must be tunable across a range of carrier frequencies and bandwidths
- Digital baseband
  - Supercomputer-level computation power
    - > 50 Gops per subscriber
  - Tight power budget
    - 200 to 300 mW (at the terminal)
  - High level of programmability
    - Combination of heterogeneous signal processing algorithms
Our Strategy
- Performance
  - Exploit the parallelism in signal processing and forward error correction (FEC) algorithms
- Power
  - Limit programmability to minimize power consumption
  - Minimize both active-mode and idle-mode power consumption
  - There is a trade-off between power efficiency and programmability
Categories of Wireless Networks
<source: Wireless communication technology landscape, DELL>
WWAN (Wireless Wide Area Network)
WLAN / WMAN
- WLAN: Wireless Local Area Network
  - High data rate
  - Poor mobility support
- WMAN: Wireless Metropolitan Area Network
  - Addresses the last-mile problem
  - IEEE 802.16d: fixed WiMAX
  - IEEE 802.16e: mobile WiMAX
WPAN (Wireless Personal Area Network)
- Interconnects personal devices
Core technologies of future networks
OFDM (Orthogonal Frequency Division Multiplexing)
- Transmits the signal over several sub-carriers
- The frequency spectra of the sub-carriers overlap (high spectral efficiency)
- Highly susceptible to frequency error in the receiver
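A minimal sketch of the OFDM principle, assuming one N = 64 symbol (the IEEE 802.11a size) with QPSK values on every sub-carrier and no cyclic prefix, channel, or noise. The transmitter maps sub-carrier symbols to time samples with an inverse DFT and the receiver maps them back with a forward DFT. The helper `apply_cfo` and its 0.1 sub-carrier-spacing offset are illustrative assumptions used to show how a frequency error breaks sub-carrier orthogonality. A naive O(N²) DFT is used for readability; real modems use an FFT.

```python
import cmath

def idft(X):
    """Inverse DFT: map N sub-carrier symbols to N time-domain samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def dft(x):
    """Forward DFT: recover the sub-carrier symbols from the time samples."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def apply_cfo(x, eps):
    """Hypothetical helper: rotate samples by a carrier frequency offset of
    eps sub-carrier spacings, as a mistuned receiver oscillator would."""
    N = len(x)
    return [s * cmath.exp(2j * cmath.pi * eps * n / N) for n, s in enumerate(x)]

def max_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))

N = 64
tx_symbols = [complex((-1) ** k, (-1) ** (k // 2)) for k in range(N)]  # QPSK points
time_samples = idft(tx_symbols)

ideal = dft(time_samples)                    # perfectly tuned receiver
offset = dft(apply_cfo(time_samples, 0.1))   # receiver off by 10% of a spacing

print(f"max symbol error, no offset:  {max_error(ideal, tx_symbols):.2e}")
print(f"max symbol error, 0.1 offset: {max_error(offset, tx_symbols):.2f}")
```

With no offset the symbols come back to within rounding error; with the offset every sub-carrier is rotated and picks up interference from its neighbours.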
Major Computation in an OFDM System
- FFT / IFFT
  - N = 64: IEEE 802.11a
  - N = 256 to 2048: IEEE 802.16 (WiMAX)
  - Data precision: 12 to 16 bits
- Amount of computation for OFDM operation
  - 10^8 complex multiplications/sec
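A rough way to sanity-check the 10^8 figure is to count the twiddle-factor multiplications of a radix-2 FFT, which is (N/2)·log2 N per transform. The sketch below counts them inside a recursive decimation-in-time FFT and, assuming one 64-point transform per 4 µs OFDM symbol (the IEEE 802.11a symbol period), arrives at 4.8 × 10^7 complex multiplications per second for the FFT alone, within about a factor of two of the estimate above. The count includes trivial twiddles (×1) that a tuned implementation would skip.

```python
import cmath

def fft_count(x):
    """Recursive radix-2 decimation-in-time FFT.

    Returns (spectrum, number of complex twiddle multiplications).
    len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x), 0
    even, m_even = fft_count(x[0::2])
    odd, m_odd = fft_count(x[1::2])
    out = [0j] * n
    mults = m_even + m_odd
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # one complex multiply
        mults += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out, mults

_, mults_per_fft = fft_count([1 + 0j] * 64)   # N = 64 (IEEE 802.11a)
symbols_per_sec = 250_000                     # assumed: one OFDM symbol every 4 us
print(mults_per_fft, mults_per_fft * symbols_per_sec)
```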
MIMO (Multiple Input Multiple Output)
- Uses multiple antennas for signal transmission and reception
- In the ideal case, channel capacity increases linearly with the number of antennas
- Can effectively compensate for the multipath fading effect
- Significantly increases receiver complexity
<Single Input Single Output (SISO)> Channel capacity: $C = W \log_2(1 + \mathrm{SNR})$
<Multiple Input Multiple Output (MIMO)> Channel capacity: $C = \min(n, m)\, W \log_2(1 + \mathrm{SNR})$
where $W$ is the channel bandwidth and $n$, $m$ are the numbers of transmit and receive antennas
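Plugging numbers into the two capacity formulas makes the "linear increase" concrete. The sketch assumes a 20 MHz channel and a 20 dB SNR (both illustrative); `mimo_capacity_ideal` implements the idealized min(n, m) scaling, which only holds in the ideal case the slide mentions.

```python
import math

def siso_capacity(bandwidth_hz, snr_linear):
    """Shannon capacity C = W * log2(1 + SNR), in bit/s."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def mimo_capacity_ideal(bandwidth_hz, snr_linear, n_tx, n_rx):
    """Idealized MIMO capacity C = min(n, m) * W * log2(1 + SNR)."""
    return min(n_tx, n_rx) * siso_capacity(bandwidth_hz, snr_linear)

W = 20e6                 # assumed 20 MHz channel
snr = 10 ** (20 / 10)    # assumed 20 dB SNR, i.e. 100 in linear terms
print(f"SISO:             {siso_capacity(W, snr) / 1e6:.1f} Mbit/s")
print(f"4x4 MIMO (ideal): {mimo_capacity_ideal(W, snr, 4, 4) / 1e6:.1f} Mbit/s")
```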
Computation in a MIMO Receiver
- The amount of computation in a MIMO receiver depends on
  - M: number of Tx/Rx antennas
  - LT: length of preamble
  - LP: length of payload
- Example: 4 Tx/Rx antennas, 100 Mbps, 64-QAM, 1/2 coding rate
  - 6 × 10^8 computations/sec
<source: B. Hassibi, An Efficient Square-Root Algorithm for BLAST>
LDPC code
- Low Density Parity Check (LDPC) code
  - Turbo-code-like coding gain with lower implementation cost
- Encoding
  - Matrix multiplication, $c = xG$
  - G (generator matrix) is a large matrix (e.g. a 4K × 4K matrix)
- Decoding
  - Equivalent to finding the most probable codeword $c$ such that $Hc^T = 0 \pmod 2$
  - H (parity check matrix) is a large sparse matrix
- Implementation
  - There is a trade-off between coding gain and implementation complexity
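The encoding and parity-check relations can be sketched with a toy code, assuming a systematic (7,4) Hamming code in place of a real LDPC code (whose G and H are thousands of columns wide, with H sparse). Encoding is c = xG (mod 2); a word satisfies every parity check exactly when its syndrome Hc^T (mod 2) is zero, and a single bit error produces a non-zero syndrome. The search for the most probable valid codeword is not shown; practical LDPC decoders perform it iteratively on the sparse H (e.g. with belief propagation).

```python
# Toy systematic (7,4) Hamming code standing in for a much larger LDPC code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def encode(x, G):
    """c = xG (mod 2): each code bit is the parity of selected message bits."""
    return [sum(x[i] * G[i][j] for i in range(len(x))) % 2 for j in range(len(G[0]))]

def syndrome(c, H):
    """H c^T (mod 2): all zeros iff c satisfies every parity check."""
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in H]

x = [1, 0, 1, 1]
c = encode(x, G)
r = c.copy()
r[2] ^= 1                 # single bit error on the channel
print(c, syndrome(c, H), syndrome(r, H))
```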
Hybrid ARQ
- Reuses erroneous frames when decoding the retransmitted frame
- Requires huge buffer space
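A minimal sketch of one hybrid ARQ flavour, Chase combining, in which the retransmission carries the same coded bits and the receiver adds the soft values (log-likelihood ratios) of every copy. The LLR sign convention (positive means bit 0) and all values are illustrative assumptions. Keeping the soft values of every erroneous frame until its retransmission arrives is what drives the buffer requirement.

```python
def chase_combine(llr_buffer, new_llrs):
    """Add the per-bit LLRs of a retransmitted copy to the stored ones.

    A buffer of None means no earlier copy has been stored yet.
    """
    if llr_buffer is None:
        return list(new_llrs)
    return [a + b for a, b in zip(llr_buffer, new_llrs)]

def hard_decision(llrs):
    """Assumed convention: LLR >= 0 -> bit 0, LLR < 0 -> bit 1."""
    return [0 if v >= 0 else 1 for v in llrs]

# Transmitted bits are [0, 1, 1, 0]; bit 2 of the first copy is badly faded.
first_copy = [2.0, -1.5, 0.4, 1.0]
second_copy = [1.5, -2.0, -1.2, 0.8]

buffer = chase_combine(None, first_copy)
print(hard_decision(buffer))           # first copy alone decodes bit 2 wrongly
buffer = chase_combine(buffer, second_copy)
print(hard_decision(buffer))           # combined soft values recover the frame
```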
Case Study: W-CDMA System
Major Algorithms
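Physical Layer of W-CDMA
<figure: block diagram of the W-CDMA physical layer, annotated with the roles of its blocks: error correction; suppressing the signal terms outside the passband; overcoming severe errors within a short time interval; assigning the signal waveform optimal for data transmission>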
Channel Encoder/Decoder
- Encoder
  - Adds systematic redundancy to the source data
- Decoder
  - Fixes errors in the received data using the systematic redundancy generated by the encoder
- The W-CDMA system uses
  - Convolutional codes (for short voice and control messages)
  - Turbo codes (for video streams and high-speed packet data)
Channel Encoder
- Consists of flip-flops and exclusive-OR gates
- Has negligible impact on the workload
<figure: convolutional encoder of the W-CDMA system>
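A minimal sketch of such an encoder, assuming the rate-1/2, constraint-length-9 code with generator polynomials 561 and 753 (octal) that 3GPP specifies for W-CDMA convolutional coding, and assuming the convention that the most significant generator bit taps the newest input bit. The shift register (the flip-flops) is a Python integer and each XOR tree is a parity computation, which is why the encoder costs almost nothing.

```python
G0 = 0o561   # generator 0 (octal), MSB taps the newest input bit (assumed)
G1 = 0o753   # generator 1 (octal)
K = 9        # constraint length: 8 flip-flops plus the current input

def parity(v):
    """XOR of all bits of v (one exclusive-OR tree)."""
    return bin(v).count("1") & 1

def conv_encode(bits, gens=(G0, G1), k=K):
    """Shift each bit into a k-bit register and emit one parity per generator.

    Appends k-1 zero tail bits so the encoder returns to the all-zero state.
    Output is interleaved: g0, g1, g0, g1, ...
    """
    state = 0                              # newest bit at position k-1
    out = []
    for b in list(bits) + [0] * (k - 1):
        state = (b << (k - 1)) | (state >> 1)
        out.extend(parity(state & g) for g in gens)
    return out

encoded = conv_encode([1, 0, 1, 1])
print(len(encoded), encoded[:8])
```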
Channel Decoder
- Determines the most probable code sequence from the received sequence
  - Selects the codeword C_i with the minimum distance to the received sequence r
- One of the dominant workloads
<figure: the received signal r is compared with each codeword C1, C2, ..., CN of the code set {c_i}, giving distances d1, d2, ..., dN>
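The minimum-distance rule can be written directly as an exhaustive search, sketched here with a hypothetical four-codeword set and hard-decision (Hamming) distances. With k information bits there are 2^k codewords to compare, which is why practical decoders such as the Viterbi algorithm avoid enumerating them.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance_decode(r, code_set):
    """Exhaustive decoding: pick the codeword C_i with the smallest d_i to r."""
    distances = [hamming_distance(r, c) for c in code_set]
    best = min(range(len(code_set)), key=lambda i: distances[i])
    return code_set[best], distances

# Hypothetical code set of four length-6 codewords.
CODE_SET = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
decoded, d = min_distance_decode([1, 1, 0, 0, 0, 0], CODE_SET)
print(decoded, d)
```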
Channel Decoder: Viterbi Algorithm
- The most popular decoding algorithm for convolutional codes
- Consists of three steps
  - Branch metric calculation (BMC)
    - abs(a - b), parallelizable
  - Add compare select (ACS)
    - min(a + b, c + d), parallelizable
  - Trace back (TB)
    - Recursive pointer tracing, sequential
- Amount of operation in W-CDMA
  - 16 Kbps voice: 2 Gops
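A compact hard-decision sketch of the three steps, assuming a textbook rate-1/2, K = 3 code with generators 7 and 5 (octal) instead of the 256-state W-CDMA code, and a trellis terminated with zero tail bits. BMC and ACS are the loops over states and branches (the parallelizable part); TB is the sequential pointer walk at the end.

```python
# Toy rate-1/2, K = 3 code with generators 7 and 5 (octal); a textbook
# stand-in for the K = 9 W-CDMA code (4 trellis states instead of 256).
K = 3
GENS = (0b111, 0b101)
N_STATES = 1 << (K - 1)

def branch_output(state, bit):
    """Encoder output when `bit` enters with memory `state` (newest bit at MSB)."""
    reg = (bit << (K - 1)) | state
    return [bin(reg & g).count("1") & 1 for g in GENS]

def next_state(state, bit):
    return ((bit << (K - 1)) | state) >> 1

def encode(bits):
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):   # zero tail terminates the trellis
        out += branch_output(state, b)
        state = next_state(state, b)
    return out

def viterbi_decode(received, n_bits):
    INF = float("inf")
    metrics = [0] + [INF] * (N_STATES - 1)   # encoder starts in state 0
    history = []
    for t in range(n_bits + K - 1):
        r = received[2 * t: 2 * t + 2]
        new_metrics = [INF] * N_STATES
        survivors = [None] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == INF:
                continue
            for b in (0, 1):
                # BMC: Hamming distance between received pair and branch output
                bm = sum(x != y for x, y in zip(r, branch_output(s, b)))
                ns = next_state(s, b)
                # ACS: add, compare with the best so far, select the smaller
                if metrics[s] + bm < new_metrics[ns]:
                    new_metrics[ns] = metrics[s] + bm
                    survivors[ns] = (s, b)
        metrics = new_metrics
        history.append(survivors)
    # TB: walk the survivor pointers back from state 0 (terminated trellis)
    state, bits = 0, []
    for survivors in reversed(history):
        state, b = survivors[state]
        bits.append(b)
    bits.reverse()
    return bits[:n_bits]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
noisy = encode(msg)
noisy[5] ^= 1                                # one channel bit error
print(viterbi_decode(noisy, len(msg)) == msg)
```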
Channel Decoder: Turbo Decoder
- Two algorithms are widely used
  - SOVA (Soft Output Viterbi Algorithm)
    - Less computation intensive
    - Lower error correction performance
  - Max-LogMap algorithm
    - More computation required
    - Higher error correction performance
- Amount of operation in W-CDMA
  - 128 Kbps streaming data: 18 Gops
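The Log-MAP family and its Max-Log-MAP simplification differ in a single kernel operation over path metrics: computing ln(e^a + e^b). A minimal sketch (metric values are arbitrary) shows the exact "max*" form, which adds a correction term, and the max-only approximation; the approximation error is largest when the two metrics are close. This is only the kernel, not a full turbo decoder.

```python
import math

def max_star(a, b):
    """Jacobian logarithm used by Log-MAP: ln(e^a + e^b), computed stably."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    """Max-Log-MAP approximation: keep only the max, drop the correction term."""
    return max(a, b)

for a, b in [(2.0, 1.9), (2.0, -3.0)]:
    exact = math.log(math.exp(a) + math.exp(b))
    print(f"a={a}, b={b}: exact={exact:.4f}, "
          f"max*={max_star(a, b):.4f}, max={max_log(a, b):.4f}")
```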
Turbo Decoder
- Based on multiple iterations of SOVA / Max-LogMap blocks
- More iterations give better error correction performance
<figure: high-level block diagram of the turbo decoder>
Block Interleaver/Deinterleaver
- Overcomes severe signal attenuation within a short time interval, which frequently occurs on wireless channels
- Interleaver (at the transmitter)
  - Randomizes the order of the source data
- Deinterleaver (at the receiver)
  - Recovers the original order by reordering
- Amount of operation: < 10 Mops
<figure: example of signal strength variation>
Interleaving: 123456789 → 147258369
Deinterleaving: 147258369 → 123456789
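A sketch of the write-row/read-column block interleaver that reproduces the 3 × 3 example above. The optional `col_perm` argument is a hypothetical stand-in for the predefined column permutation pattern mentioned in the backup slides; the actual W-CDMA pattern is not reproduced here.

```python
def interleave(seq, rows, cols, col_perm=None):
    """Write row by row into a rows x cols matrix, read column by column.

    col_perm (hypothetical) is the order in which columns are read;
    None means natural order 0..cols-1.
    """
    assert len(seq) == rows * cols
    order = list(col_perm) if col_perm is not None else list(range(cols))
    return [seq[r * cols + c] for c in order for r in range(rows)]

def deinterleave(seq, rows, cols, col_perm=None):
    """Inverse: write column by column (same column order), read row by row."""
    assert len(seq) == rows * cols
    order = list(col_perm) if col_perm is not None else list(range(cols))
    out = [None] * (rows * cols)
    i = 0
    for c in order:
        for r in range(rows):
            out[r * cols + c] = seq[i]
            i += 1
    return out

sent = interleave(list("123456789"), 3, 3)
print("".join(sent), "".join(deinterleave(sent, 3, 3)))
```

A burst that wipes out three consecutive transmitted symbols, say 1, 4 and 7, lands in three different rows after deinterleaving, so the channel decoder sees isolated errors instead of a burst.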
Spreader/Despreader
- Allows several signals to be transmitted at the same time (x_n and y_n in the diagram)
- Based on the orthogonality between spreading codes
<figure: orthogonality between codes>
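A sketch of spreading and despreading, assuming OVSF (orthogonal variable spreading factor) codes, the channelization codes used in W-CDMA, with spreading factor 4 and two illustrative data streams x and y standing in for x_n and y_n. Because the two codes are orthogonal, the despreader recovers each stream exactly from their chip-wise sum.

```python
def ovsf_codes(sf):
    """All OVSF codes of spreading factor sf (a power of two) as +1/-1 lists.

    Each code c of length n spawns two children of length 2n: c+c and c+(-c).
    """
    codes = [[1]]
    while len(codes[0]) < sf:
        codes = [child for c in codes for child in (c + c, c + [-v for v in c])]
    return codes

def spread(symbols, code):
    """Replace each symbol by symbol * code (len(code) chips per symbol)."""
    return [s * chip for s in symbols for chip in code]

def despread(chips, code):
    """Correlate each block of len(code) chips with the code and normalize."""
    sf = len(code)
    return [sum(ch * c for ch, c in zip(chips[i:i + sf], code)) / sf
            for i in range(0, len(chips), sf)]

codes = ovsf_codes(4)
x, y = [1, -1, 1], [-1, -1, 1]                 # two data streams (illustrative)
channel = [a + b for a, b in zip(spread(x, codes[1]), spread(y, codes[2]))]
print(despread(channel, codes[1]), despread(channel, codes[2]))
```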
Spreader/Despreader
- The spreader/despreader also suppresses noise
- Amount of operation: 4 Gops
Scrambler/Descrambler
- Randomizes the output signal by multiplying it by a pseudo-random sequence, the so-called scrambling code
- Allows multiple terminals to communicate at the same time
- Amount of operation: 3 Gops
<figure: Terminal 1 with scrambling code n; Terminal 2 with scrambling code m>
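A minimal scrambling sketch, assuming a short ±1 m-sequence from a 7-bit LFSR as a stand-in for the real W-CDMA scrambling codes; the generator, its seeds, and the setup of one symbol held over 127 chips are illustrative assumptions. Two phases of the same m-sequence play the roles of scrambling codes n and m: over a full period their cross-correlation is only -1 against an autocorrelation of 127, so descrambling with the right code recovers each terminal's symbol while the other terminal leaves only a small residual.

```python
def pn_sequence(seed, length):
    """±1 pseudo-random sequence from a 7-bit Fibonacci LFSR.

    Recurrence a[n+7] = a[n+1] XOR a[n] (polynomial x^7 + x + 1, which is
    primitive, so the period is 127). Illustrative only: the real W-CDMA
    scrambling codes are longer, complex-valued sequences.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(length):
        out.append(1 - 2 * (state & 1))          # bit 0 -> +1, bit 1 -> -1
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 6)
    return out

def scramble(chips, code):
    """Chip-by-chip multiplication; applying it twice is the identity."""
    return [c * s for c, s in zip(chips, code)]

L = 127
code_n = pn_sequence(0b0000001, L)   # terminal 1, scrambling code n
code_m = pn_sequence(0b1010101, L)   # terminal 2, scrambling code m

# Terminal 1 sends +1 and terminal 2 sends -1, each held over L chips.
rx = [a + b for a, b in zip(scramble([+1] * L, code_n), scramble([-1] * L, code_m))]

estimate_1 = sum(scramble(rx, code_n)) / L   # descramble with code n, average
estimate_2 = sum(scramble(rx, code_m)) / L   # descramble with code m, average
print(estimate_1, estimate_2)
```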
Low Pass Filter
- Suppresses the signal terms outside the passband (i.e. in the stop band)
<figure: input and output signals of the filter. Time domain: an impulse input becomes a sinc-shaped output. Frequency domain: a band-unlimited input becomes a band-limited output>
Low Pass Filter
- Uses a conventional FIR filter
- Number of filter taps (N): 32 to 64
- Amount of operation: 12 Gops
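A direct-form sketch of a conventional FIR low-pass filter; the three coefficients are a hypothetical smoothing kernel, not a designed W-CDMA filter, which would use N = 32 to 64 taps. The inner sum shows where the cost comes from: N multiply-accumulates for every output sample.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], zero initial state.

    Every output sample costs len(taps) multiply-accumulate operations.
    """
    delay_line = [0] * len(taps)
    out = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]      # shift in the newest sample
        out.append(sum(h * d for h, d in zip(taps, delay_line)))
    return out

taps = [0.25, 0.5, 0.25]                        # hypothetical 3-tap low-pass
print(fir_filter([1, 0, 0, 0], taps))           # impulse response
print(fir_filter([1, -1, 1, -1, 1, -1], taps))  # highest-frequency input
```

The impulse response equals the taps, a constant input passes with unit gain, and an input alternating at the highest representable frequency is removed completely once the delay line has filled.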
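Rake Receiver: Multipath Fading
- The rake receiver mitigates the multipath fading effect
- Multipath fading is a major cause of unreliable wireless channel characteristics
<figure: the transmitted signal x(t) arriving over one, two, and three propagation paths>
One path: $y(t) = a_0 x(t)$
Two paths: $y(t) = a_0 x(t) + a_1 x(t - d_1)$
Three paths: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$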
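Rake Receiver: Functions
- Ideally, the function of the rake receiver is to aggregate the signal terms with proper delay compensation
  - Received signal: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$
  - Rake receiver output: $r(t) = a_0 x(t - t_{delay}) + a_1 x(t - d_1 - d_{est,1}) + a_2 x(t - d_2 - d_{est,2}) = (a_0 + a_1 + a_2)\, x(t - t_{delay})$, where each compensating delay satisfies $d_i + d_{est,i} = t_{delay}$
- We need to know the delay spread of the received signal, which varies randomly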
Rake Receiver: Detecting the Delay Spread
- Scan the received signal in the frame buffer while computing its correlation with the scrambling code sequence
<figure: a correlation window slides over the received signal; the correlation result shows peaks a0, a1, a2 at delays 0, d1, d2>
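A sketch of the delay-spread search, assuming a continuously repeated pilot so the correlation can be taken cyclically over one code period, a 127-chip LFSR m-sequence in place of the real scrambling code, and made-up path gains and delays. The correlation profile peaks at each path delay with a height of roughly a_i × 127, matching the a0/a1/a2 peaks in the figure; the 0.2 detection threshold is also an assumption. The real searcher uses the much larger window and buffer sizes listed on the next slide.

```python
def pn_sequence(seed, length):
    """±1 m-sequence from a 7-bit LFSR (x^7 + x + 1, period 127); illustrative."""
    state, out = seed & 0x7F, []
    for _ in range(length):
        out.append(1 - 2 * (state & 1))
        state = (state >> 1) | (((state ^ (state >> 1)) & 1) << 6)
    return out

L = 127
code = pn_sequence(1, L)

# Assumed multipath channel: gains a_i at chip delays d_i.
gains, delays = [1.0, 0.6, 0.3], [0, 5, 12]

# Repeated pilot: path i contributes a_i * code[(n - d_i) mod L].
received = [sum(a * code[(n - d) % L] for a, d in zip(gains, delays))
            for n in range(L)]

def correlation_profile(rx, code):
    """Correlate rx with every cyclic shift tau of the scrambling code."""
    n = len(code)
    return [sum(rx[k] * code[(k - tau) % n] for k in range(n)) for tau in range(n)]

profile = correlation_profile(received, code)
threshold = 0.2 * max(profile)            # assumed detection threshold
found = [tau for tau, p in enumerate(profile) if p > threshold]
print(found, [round(profile[t], 1) for t in found])
```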
Computation of the Rake Receiver
- Correlation computation: L_W × L_B × F
  - L_W: correlation window, 320
  - L_B: frame buffer size, 5120
  - F: operation frequency, 50
  - 320 × 5120 × 50 ≈ 80 mega multiplications/sec
- The multiplications can be converted into additions and subtractions, since multiplying by a ±1 code chip only changes the sign
- Amount of operation in W-CDMA: 25 Gops
  - The most dominant workload
Rake Receiver: Overall Architecture
<figure: rake receiver blocks that detect the delay spread, compensate the propagation delay, and recombine the signal terms without delay>
Power Control
- The receiver controls the transmission power of the transmitter in order to minimize interference to other users
- Required computation is negligible
<figure: the basestation transmits a pilot signal; the terminal compares its strength with a reference level and returns up (u) / down (d) power control commands>
- Strength of pilot signal is below the reference level: the terminal sends an UP command
- Strength of pilot signal is above the reference level: the terminal sends a DOWN command
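A sketch of the closed loop described above, assuming a fixed 1 dB step, a constant path loss, and dB-valued pilot measurements (all illustrative). Once the pilot reaches the reference level, the commands alternate between UP and DOWN and the received pilot stays within one step of the reference.

```python
def power_control_command(pilot_db, reference_db):
    """Terminal: compare the measured pilot strength with the reference level."""
    return "UP" if pilot_db < reference_db else "DOWN"

def apply_command(tx_power_db, command, step_db=1.0):
    """Basestation: raise or lower transmit power by a fixed step (assumed 1 dB)."""
    return tx_power_db + step_db if command == "UP" else tx_power_db - step_db

tx_power, path_loss, reference = 10.0, 20.0, -5.0   # illustrative values in dB
history = []
for _ in range(20):
    pilot = tx_power - path_loss                     # strength seen by the terminal
    command = power_control_command(pilot, reference)
    tx_power = apply_command(tx_power, command)
    history.append(pilot)
print(history)
```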
H/W Operation States
- Idle
  - For long idle periods between sessions
  - Periodic wake-up for control message reception
  - Minimum workload, but dominates terminal standby time
- Control Hold
  - For short idle periods between packet bursts
  - Holds a narrow control channel for fast transition to Active
  - Intermediate workload
- Active
  - For packet burst transmission periods
  - Uses high-speed packet channels of up to 2 Mbps
  - Most heavily loaded state
<figure: radio resource control states defined in the W-CDMA specification, mapped to the operation states defined according to H/W activity>
Workload Characterization
Workload Profile
- One operation is equivalent to one RISC instruction
- The searcher, turbo decoder, and LPF are the dominant workloads
- The workload profile varies according to the operation state
Processing Time Requirement
- A mixture of algorithms with various processing time requirements
- Classified into two categories
  - Heavy workload with long processing time (turbo decoder, searcher)
  - Light workload with short processing time (scrambler, spreader, LPF, power control)
Parallelism
- Most heavy-workload algorithms have significant vector parallelism
- The data width of most operations is 8 bits
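A software illustration of why 8-bit data suits wide datapaths: with SWAR ("SIMD within a register"), one 32-bit addition processes four 8-bit lanes at once. This is an analogy under stated assumptions (wrap-around lane arithmetic, lane 0 in the low byte), not the SIMD hardware proposed in the architectural implications slides.

```python
MASK_LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
MASK_HIGH = 0x80808080   # top bit of every 8-bit lane

def pack4(lanes):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the low byte)."""
    return sum((v & 0xFF) << (8 * i) for i, v in enumerate(lanes))

def unpack4(word):
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def add4x8(a, b):
    """Add four 8-bit lanes with one integer add, wrapping modulo 256 per lane.

    Adding only the low 7 bits of each lane means no carry can leak into the
    neighbouring lane; the top bit of each lane is then fixed up with XOR.
    """
    low = (a & MASK_LOW7) + (b & MASK_LOW7)
    return low ^ ((a ^ b) & MASK_HIGH)

a = pack4([200, 1, 127, 255])
b = pack4([100, 2, 1, 1])
print(unpack4(add4x8(a, b)))
```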
Memory Access Pattern
- A huge memory is not required
- Traffic between algorithms is not dominant
- The access rate of the scratch pad memory is very high
Instruction Breakdown
- ADD/SUB are the dominant instructions
- Multiplication is not dominant in the heavy workloads
Frequent Computations
- Most multiplications are simplified into cheaper operations
- Multiplication in LPF-Rx cannot be simplified because both operands are 16-bit integers
Architectural Implications
- SIMD, because
  - We can exploit the vector parallelism in W-CDMA algorithms
  - High power efficiency can be achieved by sharing control logic between datapath elements
- Chip multiprocessor, because
  - There is substantial algorithm-level parallelism
  - There are many tiny sequential algorithms
- Multiple SIMD + scalar
<figure: several SIMD processors and a scalar processor connected by an interconnection network>
Architectural Implications
- Memory structure
  - Cache-free
    - The memory access pattern exhibits very dense spatial locality
  - Small data memory (< 64K)
  - Small instruction memory (< 4K)
- Simple interconnection network
  - Inter-processor communication can be kept low by mapping tasks onto each PE at the algorithm level
Architectural Implications
- Power management
  - Large workload variation according to the operation state and radio channel conditions
  - Various power management schemes can be applied
    - DVS, DFS, clock gating
  - Idle-mode power must be minimized because it dominates terminal standby time
W-CDMA benchmark suite
- C-based implementation of the W-CDMA physical layer operations
- Used for the workload characterization done in this paper
- Available at
  - www.eecs.umich.edu/sdrg
Conclusion
- We discussed
  - what SDR is and why it is a challenging topic for embedded systems
  - the evolution of wireless protocols and the core technologies of emerging protocols
- We analyzed
  - the workload characteristics of the W-CDMA protocol and their architectural implications
Backup Slides
Viterbi Algorithm: Trellis Diagram
- The Viterbi algorithm is based on the trellis diagram
- A trellis diagram represents all possible state transitions of the encoder
<figure: example of a trellis diagram and the corresponding convolutional encoder>
Viterbi Algorithm: BMC
- The BMC (branch metric calculation) operation computes the difference between the received sequence r and the outputs of the trellis diagram:
  $BMC_{i,j} = \mathrm{distance}(r_{ij}, o_{ij}) = |r_{ij} - o_{ij}|$ (summed over the bits of the branch),
  where $o_{ij}$ is the output of the state transition from $i$ to $j$ and $r_{ij}$ is the corresponding received sequence
  - Example: the distance between r = (0 1) and C_n = (1 0) is 1 + 1 = 2
- All BMC operations in a trellis diagram can be done in parallel
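Viterbi Algorithm: ACS
- The ACS (add compare select) operation consists of
  - Add: add each branch metric to the accumulated metric of the state the branch leaves
  - Compare, select: at each state, keep the smallest sum as the new accumulated metric
- This procedure is equivalent to finding a locally optimal code sequence
  - If C1 has the smallest ACS value at state i, then the ACS values of C2 and C3 are always greater than that of C1 for every continuation from state i, so C2 and C3 can be discarded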
Viterbi Algorithm: TB
- Traces back the code sequence that is closest to the received sequence
- Sequential algorithm
Block Interleaver/Deinterleaver
- Interleaver
  - Write row by row sequentially
  - Read column by column according to the predefined permutation pattern
- Deinterleaver
  - Write column by column according to the predefined permutation pattern
  - Read row by row sequentially
<figure: interleaving procedure>