Title: LDPC FEC for IEEE 802.11n Applications
1LDPC FEC for IEEE 802.11n Applications
- Eric Jacobsen
- Intel Labs
- Communications Technology Laboratory
- November 10, 2003
2Agenda
- Background: why LDPCs?
- Fitting LDPCs to WLAN
- Details of candidate code
- Performance and use of candidate code
- Complexity analysis
- Summary
3Candidate Iterative FECs
- Turbo Codes (PCCC or SCCC)
  - High complexity
  - Poor performance with short blocks
  - IP issues
- Turbo Product Codes
  - Medium complexity
  - Best performance at R 0.8
  - Poor performance with short blocks
  - Possible IP issues
- Low Density Parity Check Codes (LDPCs)
  - Invented in 1962: no basic IP!
  - Potential for low complexity: constituent codes are parity check relationships
  - Extremely good performance with long blocks (within 0.0045 dB of capacity!)
  - Very good performance with short blocks (Lin)
  - Eliminate channel interleaver
4LDPC Codes solve several problems
- Close the large gap between current and theoretical performance
- Only known solution for good performance with small block sizes
- Enable Adaptive Bit Loading by eliminating the channel bit interleaver
  - LDPCs incorporate the required randomization into the code. These are the only known codes that do this!
  - This also provides a significant complexity reduction
    - Offsets complexity of code
- Decoupling the FEC and modulation increases flexibility
5Low Density Parity Check FEC
- Iterative decoding of simple parity check codes
- Published examples of good performance with short blocks
  - Kou, Lin, Fossorier, Trans IT, Nov. 2001
- Near-capacity performance with long blocks
  - Very near! Chung, et al, "On the design of low-density parity-check codes within 0.0045dB of the Shannon limit," IEEE Comm. Lett., Feb. 2001
- Complexity fears, especially in encoder
- Implementation challenges
  - Many options with respect to decoding algorithms, architectures, techniques
6LDPC Bipartite (Tanner) Graph
[Figure: bipartite (Tanner) graph. Labels: check nodes, edges, variable nodes (codeword bits).]
This is an example bipartite graph for an irregular LDPC code.
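As a minimal sketch of how such a graph follows from a parity check matrix: each row of H is a check node, each column is a variable node, and each 1 in H is an edge. The small matrix below is a made-up toy example, not the candidate code; the function and variable names are hypothetical.

```python
# Toy parity check matrix (hypothetical, for illustration only).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

def tanner_graph(H):
    """Return (check_adj, var_adj): neighbor lists for check and variable nodes."""
    # Check node i connects to every column j where H[i][j] == 1.
    check_adj = [[j for j, bit in enumerate(row) if bit] for row in H]
    # Variable node j connects to every row i where H[i][j] == 1.
    var_adj = [[] for _ in H[0]]
    for i, cols in enumerate(check_adj):
        for j in cols:
            var_adj[j].append(i)
    return check_adj, var_adj

check_adj, var_adj = tanner_graph(H)
check_degrees = [len(c) for c in check_adj]   # edges per check node
var_degrees = [len(v) for v in var_adj]       # edges per codeword bit
num_edges = sum(check_degrees)

print(check_degrees, var_degrees, num_edges)
```

The variable-node degrees differ from bit to bit in this toy matrix, which is what makes the graph irregular.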
7BICM System with LDPC
The nature of the LDPC calls into question
whether the deinterleaver produces any benefit or
just defines a different LDPC code.
[Figure: receiver block diagram. Blocks: FFT, Slicer, De-Interleaver. Signals: demodulated constellation symbols, detected coded bits, de-interleaved coded bits, corrected bits.]
8Direct Coding with LDPC
Since the interleaver merely permutes the
order of the rows of the parity check matrix, it
can be deleted and its effects taken into account
in the code design.
A system with LDPC FEC should provide superior performance with reasonable simplicity. Since the interleaver can be excluded, the complexity drops further.
[Figure: receiver block diagram. Blocks: FFT, Slicer. Signals: demodulated constellation symbols, detected coded bits, corrected bits.]
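The idea that an interleaver only relabels the code can be checked on a toy example. The sketch below assumes the usual convention that the columns of H index codeword bits, so a bit permutation acts on the columns of H (a transposed convention would make these rows). The matrix, codeword, and permutation are all made up for illustration.

```python
# Toy parity check matrix and a codeword that satisfies it (hypothetical).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
codeword = [1, 0, 1, 1, 1, 0]

def is_codeword(H, v):
    """True if every parity check (row of H) sums to zero modulo 2."""
    return all(sum(h * b for h, b in zip(row, v)) % 2 == 0 for row in H)

# Hypothetical interleaver: output position i carries input bit perm[i].
perm = [3, 0, 5, 1, 4, 2]
interleaved = [codeword[p] for p in perm]

# Apply the same permutation to the columns of H.
H_perm = [[row[p] for p in perm] for row in H]

print(is_codeword(H, codeword), is_codeword(H_perm, interleaved))
```

The interleaved word is a valid codeword of the permuted matrix, so the interleaved system is equivalent to using a different (permuted) LDPC code with no interleaver.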
9 191-bit block results, Kou
Capacity: 1.2 dB for R = 0.69
10Large Block LDPCs in Fading
For large block sizes, in this case 10^5 and 10^6 bits, LDPCs perform extremely close to capacity. For a code with R = 1/2 in AWGN, C = 1.2 dB Eb/No (BICO).
11Candidate LDPC Code
- (2000, 1600) code, R = 0.8
  - Long enough for good performance, short enough to implement
  - BER in AWGN is <1.5 dB from capacity at Pe = 10^-5
- Column weights are controlled by the code design
  - Four edges per information bit, two per parity bit
  - Last parity bit has one edge
  - 18 edges per check node (regular in H1)
  - Total of 7199 edges
- Simplified encoder
- BCJR or Min-Sum decoding algorithm
  - Min-Sum costs 0.3 dB in performance, cuts gate count
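The edge counts on this slide can be cross-checked with a few lines of arithmetic. This is only a consistency check of the stated figures; the variable names are hypothetical, and the split of "18 edges per check" into an H1 part and a parity part is one reading of the slide.

```python
N, K = 2000, 1600          # codeword length and information bits
M = N - K                  # 400 parity bits, hence 400 parity checks

rate = K / N                         # code rate

info_edges = 4 * K                   # four edges per information bit
parity_edges = 2 * (M - 1) + 1       # two per parity bit, one for the last
total_edges = info_edges + parity_edges

# Edges per check contributed by the information part (H1) alone.
h1_edges_per_check = info_edges // M

# Average check degree over the whole matrix; just under 18 because
# the last parity bit has a single edge.
avg_check_degree = total_edges / M

print(rate, total_edges, h1_edges_per_check, avg_check_degree)
```

Under this reading, each check sees 16 edges from H1 plus up to 2 from the parity part, which matches the quoted 18 edges per check node and the total of 7199.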
12Performance in AWGN
Capacity for R = 0.8 is 2.044 dB, shown with a vertical dashed red line. At Pe = 10^-5 the LDPC code is <1.5 dB from capacity.
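A rough way to reproduce a capacity limit of this kind is sketched below. It assumes the figure refers to binary-input (BPSK) AWGN capacity, which the slide does not state explicitly; the function names, integration grid, and bisection bounds are all illustrative choices.

```python
import math

def bpsk_awgn_capacity(sigma, steps=4000, span=8.0):
    """Capacity in bits per channel use of +/-1 signaling in real AWGN (std sigma)."""
    dn = 2.0 * span / steps
    loss = 0.0
    for k in range(steps + 1):
        n = -span + k * dn                        # unit-variance noise sample
        llr = 2.0 * (1.0 + sigma * n) / sigma ** 2
        # Numerically stable ln(1 + exp(-llr)).
        if llr >= 0:
            term = math.log1p(math.exp(-llr))
        else:
            term = -llr + math.log1p(math.exp(llr))
        pdf = math.exp(-0.5 * n * n) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid rule
        loss += weight * pdf * term * dn
    return 1.0 - loss / math.log(2.0)

def shannon_limit_db(rate):
    """Smallest Eb/No (dB) at which BPSK-AWGN capacity equals the code rate."""
    lo, hi = 0.1, 3.0                # capacity falls as sigma grows
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if bpsk_awgn_capacity(mid) > rate:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    ebno = 1.0 / (2.0 * rate * sigma ** 2)   # Es = 1, Eb = Es / rate, No = 2 sigma^2
    return 10.0 * math.log10(ebno)

limit_db = shannon_limit_db(0.8)
print(round(limit_db, 3))
```

Under these assumptions the result should land close to the capacity figure quoted for R = 0.8.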
13LDPC, ABL in fading
These results are for Channel Model D with a 50 ns delay spread. The Viterbi-UBL results are essentially an 802.11a reference system. The LDPC-UBL results use a fixed code rate of R = 0.8.
14LDPC, ABL in fading
These results are for Channel Model D with a 50 ns delay spread. The Viterbi-ABL results use puncturing and modulations BPSK, QPSK, 16-QAM, and 64-QAM, with variable code rate. The LDPC-ABL results use puncturing, QPSK, 16-QAM, and 64-QAM, with a fixed code rate of R = 0.8. The LDPC-ABL throughput curve drops off at low SNR because BPSK is not part of its adaptation menu.
15LDPC, ABL in fading
16Selected LDPC Code Use
- Long packets are encoded by concatenating codewords
  - A 1500-byte packet plus overhead is 8 codewords
- Short packets are accommodated with code shortening
  - Parity stays constant, information field is shortened
  - Short packets consume the minority of airtime, so code rate reduction carries little penalty
  - Increase in reliability for short packets comes at low cost
17Dartmouth Usage Statistics
1500 byte packets are the driving long packet
type.
18Packet size accommodation
Long packets use concatenated codewords.
[Figure: 2000-bit codeword layout. Fields: N-bit data field, (1600 - N)-bit zero pad, 400-bit parity.]
Short blocks use shortened codewords. The zero pad is not transmitted.
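The packet accommodation rules can be sketched as simple arithmetic. The helper names below are hypothetical, and the sketch only covers what the slides state: 1600 information bits and 400 parity bits per codeword, with the zero pad left untransmitted.

```python
import math
from fractions import Fraction

K = 1600        # information bits per full codeword
PARITY = 400    # parity bits, constant even when the codeword is shortened

def codewords_needed(payload_bits):
    """Number of concatenated codewords needed to carry a payload."""
    return math.ceil(payload_bits / K)

def shortened_rate(n_info):
    """Effective code rate when only n_info data bits are sent with full parity."""
    if not 0 < n_info <= K:
        raise ValueError("n_info must be between 1 and K")
    # The (K - n_info)-bit zero pad is not transmitted, so it does not count.
    return Fraction(n_info, n_info + PARITY)

print(codewords_needed(1500 * 8))        # 12000 bits of payload
print(shortened_rate(1600), shortened_rate(800), shortened_rate(400))
```

A 1500-byte payload is 12000 bits, or 7.5 codewords of data, so it rounds up to 8 codewords. Shortening to 800 and 400 information bits gives rates of 2/3 and 1/2.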
19Comparative Performance (AWGN)
LDPC (2000, 1600), r = 4/5 vs. K = 7 convolutional code, r = 3/4.
20LDPC Shortened Packet Performance vs Eb/No
Shown are the effects of shortening the code from 1600 information bits to 800 and 400 bits (code rates of R = 2/3 and R = 1/2, respectively). Performance for both 50 and 8 iterations is shown to verify performance for the shortened codes. Allowing the code rate to drop with packet size maintains power efficiency for short packets.
21LDPC Shortened Packet Performance vs SNR
Shortened-code performance is shown vs. SNR. The gain from shortening the codes can be used to increase range if also applied to longer packets by concatenation.
22Iteration Management
- LDPCs are iteratively decoded
  - The number of iterations affects the code performance
  - The number of iterations also affects the complexity
23Mother Code Iteration Study
[Figure: performance curves labeled 4 through 12 and 50 (iteration counts), with reference curves for Viterbi, R = 3/4 and Viterbi, R = 0.8 (estimated).]
1600-bit packets for all cases.
24Complexity Tradeoffs
- Gate and memory complexity decrease with increasing clock rate
  - Serialization of processing allows gate and memory reuse
- Gate complexity increases with number of iterations
  - Memory stays constant
- BCJR has more than 2x the gate complexity of the Min-Sum kernel
  - 0.3 dB performance improvement
  - If memory complexity drives, then BCJR is a good option
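To illustrate where the gate-count difference comes from, the sketch below compares two check-node update rules. It assumes the "BCJR" kernel corresponds to the exact (sum-product) check-node update, which is an interpretation rather than something the slides spell out. The function names and input values are hypothetical.

```python
import math

def check_update_sum_product(llrs):
    """Exact check-node update (tanh rule): one outgoing LLR per edge."""
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for j, llr in enumerate(llrs):
            if j != i:                       # exclude the edge being updated
                prod *= math.tanh(llr / 2.0)
        # Clamp to keep atanh finite for very reliable inputs.
        prod = max(min(prod, 1.0 - 1e-12), -1.0 + 1e-12)
        out.append(2.0 * math.atanh(prod))
    return out

def check_update_min_sum(llrs):
    """Min-Sum approximation: product of signs times the smallest magnitude."""
    out = []
    for i in range(len(llrs)):
        others = [llr for j, llr in enumerate(llrs) if j != i]
        negatives = sum(1 for llr in others if llr < 0)
        sign = -1.0 if negatives % 2 else 1.0
        out.append(sign * min(abs(llr) for llr in others))
    return out

incoming = [1.5, -0.8, 2.0, 0.3]             # example LLRs on one check node
sp = check_update_sum_product(incoming)
ms = check_update_min_sum(incoming)
print(sp)
print(ms)
```

Min-Sum needs only comparisons and sign logic, while the exact rule needs nonlinear functions per edge. Min-Sum also overestimates the message magnitudes, which is consistent with a small performance loss.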
25Latency Drives
- For any block code in 802.11, the MAC latency requirements will drive the design
  - 1600 bits at 240 Mbps takes about 6.7 us to receive
  - The SIFS budget drives, so for the worst case we assume a 1 us budget allocated to the FEC block
26Analysis Assumptions
- 240 Mbps target
  - Should encompass most modes
- Eight iterations
- Two processing clocks per information bit
  - Keep duty cycle low, reduces power consumption?
- BCJR algorithm
27Complexity Estimates
- Gates
  - 1 us = 240 cycles at 240 MHz
  - Computation gates, BCJR: 124k gates
  - Additional control, sums, etc.: 40k gates
  - Estimated BCJR total gate count: 164k gates
  - Estimated Min-Sum total gate count: 98k gates
- Memory
  - Scratchpad, computation, buffering: 120k bits
  - Code address ROM: 93.6k bits
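The budget figures can be sanity-checked with simple arithmetic. The ROM line is an assumption: the slides do not say how the 93.6k bits are derived, but the number is consistent with storing one address per edge, wide enough to index all 7199 edges. Variable names are hypothetical.

```python
clock_hz = 240e6
budget_s = 1e-6
cycles_in_budget = round(clock_hz * budget_s)      # decode cycles available

rx_time_us = 1600 / 240e6 * 1e6                    # time to receive one data block

bcjr_compute_kgates = 124
control_kgates = 40
bcjr_total_kgates = bcjr_compute_kgates + control_kgates

# Assumed ROM layout: one address per edge of the 7199-edge graph.
edges = 7199
addr_bits = (edges - 1).bit_length()               # bits needed to index every edge
rom_bits = edges * addr_bits

print(cycles_in_budget, round(rx_time_us, 2), bcjr_total_kgates, addr_bits, rom_bits)
```

The totals match the slide: 240 cycles, 164k gates for BCJR, and a ROM size that rounds to 93.6k bits under the one-address-per-edge assumption.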
28LDPC Decoder Area vs Latency
[Figure: normalized die area vs. decoding latency. Annotation: BCJR reference case.]
Shown is the estimated normalized die area, relative to a target reference, as a function of decoding latency. This takes into account only the reduction in gates by allowing the reuse of the maxx() hardware, and does not consider that the scratchpad memory size could also be reduced.
29Encoder Complexity
The generic block encoder definition. A typical LDPC generator matrix, G, is high density even though the parity check matrix, H, is low density.
By carefully partitioning G, the low-density H matrix may be used and separated into two portions, H1 and H2, where H2 takes the low-density form shown. The inverse transpose of H2 can then be implemented as a differential encoder.
30Encoder Implementation
The final encoder structure is as shown above. The data vector, u, is the systematic portion of the codeword, v. The parity bits, p, are generated from the low-density matrix H1 and the differential encoder 1/(1+D).
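A minimal sketch of this encoder structure follows. It assumes H = [H1 | H2], with H2 having ones on its main diagonal and first subdiagonal. That matches "two edges per parity bit, last parity bit has one edge", but the exact form of H2 is not recoverable from the text. The toy H1, the data vector, and the function names are hypothetical.

```python
def encode(u, h1_rows):
    """Systematic encoding: parity = accumulate(H1 * u) over GF(2).

    h1_rows[i] lists the information-bit indices checked by row i of H1.
    """
    parity = []
    acc = 0
    for cols in h1_rows:
        s = 0
        for c in cols:
            s ^= u[c]          # sparse row of H1 times u
        acc ^= s               # 1/(1+D): p[i] = s[i] XOR p[i-1]
        parity.append(acc)
    return list(u) + parity

def syndrome(v, h1_rows, k):
    """Evaluate every parity check of H = [H1 | H2] on codeword v."""
    out = []
    for i, cols in enumerate(h1_rows):
        s = 0
        for c in cols:
            s ^= v[c]
        s ^= v[k + i]          # diagonal of H2
        if i > 0:
            s ^= v[k + i - 1]  # subdiagonal of H2
        out.append(s)
    return out

# Toy H1 with 3 checks over 6 information bits (made up for illustration).
h1_rows = [[0, 1, 3], [1, 2, 4], [0, 4, 5]]
u = [1, 0, 1, 1, 0, 1]
v = encode(u, h1_rows)
print(v, syndrome(v, h1_rows, len(u)))
```

Only the sparse H1 and a one-bit accumulator are needed, so the dense generator matrix G never has to be stored or multiplied.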
31Complexity Summary
- 164k gates computation and control with BCJR
- 98k gates computation and control with Min-Sum
- This is to achieve 1 us decode time. Gate counts drop dramatically as latency is allowed to increase.
- Memory estimate is 120k bits of RAM and 93.6k bits of control ROM
- This is a conservative budgetary estimate. Other decoding algorithms or trick implementations may yield different results.
32Summary
- This LDPC code by itself provides 2-3 dB of gain
- Implementation is practical: much flexibility in approach
- Less than 1.5 dB from AWGN capacity at Pe = 10^-5 with a 1600-bit data block and R = 0.8
- Flexible in code rate and data block size
  - Shortening schemes allow no restrictions on data block size
  - Observing OFDM symbol boundaries is not required
- Eliminates channel interleaver
  - Decouples FEC from modulation, MIMO/SISO, higher-order modulation, etc.
33Backup
34Partial Reference List
- TCM
  - G. Ungerboeck, "Channel Coding with Multilevel/Phase Signals," IEEE Trans. IT, Vol. IT-28, No. 1, January 1982
- BICM
  - G. Caire, G. Taricco, and E. Biglieri, "Bit-Interleaved Coded Modulation," IEEE Trans. on IT, May 1998
- LDPC
  - W. Ryan, "An Introduction to Low Density Parity Check Codes," UCLA Short Course Notes, April 2001
  - Kou, Lin, Fossorier, "Low Density Parity Check Codes Based on Finite Geometries: A Rediscovery and New Results," IEEE Transactions on Information Theory, Vol. 47, No. 7, November 2001
  - R. Gallager, "Low-density parity-check codes," IRE Trans. IT, Jan. 1962
  - Chung, et al, "On the design of low-density parity-check codes within 0.0045dB of the Shannon limit," IEEE Comm. Lett., Feb. 2001
  - J. Hou, P. Siegel, and L. Milstein, "Performance Analysis and Code Optimisation for Low Density Parity-Check Codes on Rayleigh Fading Channels," IEEE JSAC, Vol. 19, No. 5, May 2001
  - L. Van der Perre, S. Thoen, P. Vandenameele, B. Gyselinckx, and M. Engels, "Adaptive loading strategy for a high speed OFDM-based WLAN," Globecom 98
  - Numerous articles on recent developments in LDPCs, IEEE Trans. on IT, Feb. 2001
35Performance comparison around 1.5 bit/s/Hz
[Figure: efficiency vs. C/N. Curves: Hughes NS (LDPC), SpaceBridge (PCCC), DVBS, DVBS30.]