ON THE ARCHITECTURE OF THE CDMA2000
VARIABLE-RATE MULTIMODE WIDEBAND (VMR-WB) SPEECH
CODING STANDARD
Milan Jelinek, Redwan Salami, Sassan Ahmadi,
Bruno Bessette, Philippe Gournay and Claude Laflamme
University of Sherbrooke, Canada - VoiceAge Corp., Canada - Nokia Inc., USA
[Figure: Encoder flow chart]
- VMR-WB
- - Variable-Rate Multi-Mode Wideband speech codec
- - New 3GPP2 wideband (WB) speech coding standard for 3G applications
- Main Features
- - Near face-to-face communication speech quality
- - Source- and channel-controlled operation (4 modes)
- - Directly interoperable with 3GPP/ITU-T AMR-WB in Mode 3
- - Mode-dependent average bit rates (ABR)
- - Compliant with CDMA2000 Rate Set 2
VMR-WB Coding Techniques
- Source-Controlled Operation
- - Hierarchical signal classification
- - Operating at the frame level
1. Voice Activity Detection (VAD)
2. Unvoiced Frame Decision
Based on the following parameters:
Coding type             Frame type     Rate  Bit rate (kbit/s)  Description
Inactive speech coding  CNG            ER     1.0               Noise-excited LP filter; smoothed over time
Inactive speech coding  CNG            QR     2.7               As previous, but interoperable with AMR-WB CNG
Unvoiced coding         Unvoiced       HR     6.2               13-bit Gaussian codebook (4x/frame)
Unvoiced coding         Unvoiced       QR     2.7               As previous, but with randomly chosen vectors
Voiced coding           Voiced         HR     6.2               Frame-level signal modification; 12-bit ACELP codebook (4x/frame)
Generic coding          Interoperable  FR    13.3               Similar to AMR-WB @ 12.65 kbit/s
Generic coding          Generic        FR    13.3               As previous, plus FER protection
Generic coding          Interoperable  HR     6.2               As Interoperable FR, but with random algebraic codebook indices
Generic coding          Signaling      HR     6.2               As previous, plus FER protection
Generic coding          Generic        HR     6.2               Pitch coded 2x/frame; 12-bit ACELP codebook (4x/frame)
(FR, HR, QR, ER: Full, Half, Quarter and Eighth Rate)
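As a quick sanity check on the table, each source bit rate can be turned into a frame size. The sketch below assumes the 20 ms frame length that VMR-WB shares with AMR-WB; the names `SOURCE_RATES_BPS`, `FRAME_MS` and `bits_per_frame` are illustrative only, not taken from the standard.

```python
# Source bit rates of the four frame rates from the coding-type table, in bit/s.
# Names are illustrative; the 20 ms frame length is an assumption (as in AMR-WB).
SOURCE_RATES_BPS = {"FR": 13_300, "HR": 6_200, "QR": 2_700, "ER": 1_000}
FRAME_MS = 20


def bits_per_frame(rate: str) -> int:
    """Bits produced in one frame at the given rate (FR, HR, QR or ER)."""
    # Integer arithmetic keeps the result exact.
    return SOURCE_RATES_BPS[rate] * FRAME_MS // 1000


for r in ("FR", "HR", "QR", "ER"):
    print(f"{r}: {bits_per_frame(r)} bits/frame")
```

Under the 20 ms assumption this gives 266, 124, 54 and 20 bits per frame for FR, HR, QR and ER.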
T: open-loop pitch period estimate
x(i): perceptually weighted input signal
E_h: average energy of the last 2 critical bands
E_l: average energy of the pitch-synchronous bins in the first 10 critical bands
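The poster lists these parameters without the formulas that combine them. A minimal sketch, assuming the usual normalized cross-correlation of x(i) at lag T as the voicing measure and a plain low-to-high energy ratio E_l / E_h as the spectral tilt; both forms are assumptions, not the exact VMR-WB definitions.

```python
import math
from typing import Sequence


def normalized_correlation(x: Sequence[float], T: int) -> float:
    """Normalized correlation of x(i) with x(i - T) over their overlap.

    Near 1: strongly periodic at lag T (voiced-like).
    Near 0: weakly periodic (unvoiced- or noise-like).
    """
    if not 0 < T < len(x):
        raise ValueError("lag T must satisfy 0 < T < len(x)")
    num = sum(x[i] * x[i - T] for i in range(T, len(x)))
    e0 = sum(x[i] ** 2 for i in range(T, len(x)))
    e1 = sum(x[i - T] ** 2 for i in range(T, len(x)))
    if e0 == 0.0 or e1 == 0.0:
        return 0.0
    return num / math.sqrt(e0 * e1)


def spectral_tilt(E_l: float, E_h: float) -> float:
    """Assumed tilt measure: low-band energy over high-band energy.

    Unvoiced frames, with energy concentrated at high frequencies, give small values.
    """
    return E_l / E_h if E_h > 0 else float("inf")


# A sine with a 40-sample period correlates perfectly at T = 40.
x = [math.sin(2 * math.pi * i / 40) for i in range(256)]
print(round(normalized_correlation(x, 40), 6))
```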
Mode    ABR, active speech (kbit/s)   ABR at 40% speech activity (kbit/s)
Mode 3  13.3                          6.1
Mode 0  12.8                          5.7
Mode 1  10.5                          4.8
Mode 2   8.1                          3.8
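The 40% column follows from a weighted average of active and inactive frames. A minimal sketch, assuming every inactive frame is sent as CNG ER at 1.0 kbit/s; the function name and that assumption are illustrative, not taken from the standard.

```python
def average_bit_rate(active_kbps: float, activity: float, inactive_kbps: float = 1.0) -> float:
    """Average bit rate (kbit/s) for a given speech activity factor.

    Assumes (illustratively) that all inactive frames are coded as CNG ER.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be in [0, 1]")
    return activity * active_kbps + (1.0 - activity) * inactive_kbps


ACTIVE_ABR = {0: 12.8, 1: 10.5, 2: 8.1}  # kbit/s, active-speech column of the table

for mode, rate in ACTIVE_ABR.items():
    print(f"Mode {mode}: {average_bit_rate(rate, 0.4):.1f} kbit/s")
```

This reproduces 5.7, 4.8 and 3.8 kbit/s for Modes 0 to 2. For Mode 3 the same assumption gives about 5.9 kbit/s rather than 6.1, which suggests that inactive frames in the AMR-WB-interoperable mode are presumably coded at a higher average rate (for example, with some CNG QR frames).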
- Noise Estimation Update Decision
- - Based on parameters with low sensitivity to the noise level
- - - Pitch period varying
- - - AND normalized correlation at the pitch period low
- - - AND low estimated order of the AR model
- - - AND signal energy stationary
- - INDEPENDENT of the VAD decision!
- - - Robust to noise level variations
- - - Conservative approach: the noise estimate is updated only when the frame is almost certainly inactive
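The update rule is a conjunction of four conditions that never reads the VAD flag. A minimal sketch, with the feature container and both thresholds (`CORR_LOW`, `AR_ORDER_LOW`) as hypothetical placeholders rather than values from the standard:

```python
from dataclasses import dataclass


@dataclass
class FrameFeatures:
    pitch_varying: bool       # open-loop pitch period not stable over recent frames
    norm_corr: float          # normalized correlation at the pitch period
    ar_order: int             # estimated order of the AR model
    energy_stationary: bool   # frame energy roughly constant over time


# Hypothetical thresholds, for illustration only.
CORR_LOW = 0.5
AR_ORDER_LOW = 4


def update_noise_estimate(f: FrameFeatures) -> bool:
    """Conservative noise-update decision, deliberately independent of VAD.

    The noise estimate is updated only when all conditions indicate an inactive frame.
    """
    return (
        f.pitch_varying
        and f.norm_corr < CORR_LOW
        and f.ar_order <= AR_ORDER_LOW
        and f.energy_stationary
    )
```

Requiring every condition at once trades update speed for robustness: a noise estimate that is rarely updated is preferable to one that absorbs speech.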
E32(j): maximum energy in a block of 32 samples
- Relative frame energy (E_rel) decision
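A minimal sketch of E32(j), assuming it is the largest squared sample in the j-th block of 32 samples. A 256-sample frame (20 ms at the 12.8 kHz internal rate of the AMR-WB family) then yields 8 values. The `max_energy_jump` helper is a hypothetical energy-variation measure added only for illustration.

```python
from typing import List, Sequence

BLOCK = 32  # samples per block


def e32(x: Sequence[float], block: int = BLOCK) -> List[float]:
    """E32(j): largest squared sample in block j (assumed definition)."""
    return [
        max(v * v for v in x[j : j + block])
        for j in range(0, len(x) - block + 1, block)
    ]


def max_energy_jump(e: Sequence[float], eps: float = 1e-12) -> float:
    """Largest ratio between consecutive block maxima (hypothetical measure).

    Large values flag sudden energy rises such as onsets.
    """
    if len(e) < 2:
        return 1.0
    return max((e[j] + eps) / (e[j - 1] + eps) for j in range(1, len(e)))


# Quiet first half, loud second half: one 20 dB jump between blocks 3 and 4.
frame = [0.1] * 128 + [1.0] * 128
print(len(e32(frame)), round(max_energy_jump(e32(frame)), 3))
```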
3. Voiced Frame Decision / Signal Modification
4. Low Energy Decision
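The four decisions are applied in order, and the first one that fires selects the coding type. A minimal sketch, where each boolean input is a hypothetical stand-in for the full decision logic. The mapping of a low-energy frame to Generic HR follows the low-energy example in this poster. Mode-specific rate choices (for example, Unvoiced HR versus QR) are ignored.

```python
from enum import Enum


class CodingType(Enum):
    CNG = "Inactive speech coding (CNG)"
    UNVOICED = "Unvoiced coding"
    VOICED = "Voiced coding"
    GENERIC_HR = "Generic HR (low-energy frame)"
    GENERIC_FR = "Generic FR"


def classify(vad_active: bool, is_unvoiced: bool,
             modification_ok: bool, low_energy: bool) -> CodingType:
    """Hierarchical frame classification (illustrative)."""
    if not vad_active:        # 1. Voice activity detection
        return CodingType.CNG
    if is_unvoiced:           # 2. Unvoiced frame decision
        return CodingType.UNVOICED
    if modification_ok:       # 3. Voiced decision: signal modification succeeded
        return CodingType.VOICED
    if low_energy:            # 4. Low energy decision
        return CodingType.GENERIC_HR
    return CodingType.GENERIC_FR  # unclassified frames default to Full Rate
```

Ordering the tests this way means that only frames which fail every cheaper class reach the Full Rate generic coder.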
- Channel-Controlled Operation
- - 4 operational modes controlled by channel conditions
- - Transparent, memory-less mode switching
- - Per-frame bit rate control capability
[Figure: Relative usage of coding types in active speech]
[Figure: Mode switching performance]
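Per-frame bit rate control can be pictured as a cap applied to each frame independently of the one before. A minimal sketch, assuming a Full Rate frame capped to Half Rate falls back to Interoperable HR in Mode 3 (to keep AMR-WB interoperability) and to Generic HR otherwise. The function name and this fallback mapping are assumptions. Because the function keeps no state, the mode may change on every frame, in the spirit of memory-less switching.

```python
from typing import Tuple

RATE_ORDER = ("ER", "QR", "HR", "FR")  # increasing source bit rate


def limit_frame(frame_type: str, rate: str, max_rate: str, mode: int) -> Tuple[str, str]:
    """Apply a per-frame maximum rate; return the (frame type, rate) actually used."""
    if RATE_ORDER.index(rate) <= RATE_ORDER.index(max_rate):
        return frame_type, rate  # request already within the limit
    if rate == "FR" and max_rate == "HR":
        # Assumed fallback mapping: stay AMR-WB-interoperable in Mode 3.
        return ("Interoperable" if mode == 3 else "Generic"), "HR"
    raise ValueError(f"no fallback sketched for {rate} limited to {max_rate}")
```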
- Enhancements at the Decoder
- - Low-frequency post-processing
- - - Enhancement of the periodicity in the low-frequency region
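Periodicity enhancement can be illustrated with a simple comb filter around the pitch period T, y[n] = (1 - alpha) x[n] + (alpha/2)(x[n-T] + x[n+T]). Its gain is 1 at the pitch harmonics and 1 - 2*alpha midway between them. This is a generic illustration, not the VMR-WB post-processor. In a decoder it would be applied only to the low-frequency band, a step this sketch omits. The value of `alpha` is a hypothetical placeholder.

```python
from typing import List, Sequence


def enhance_periodicity(x: Sequence[float], T: int, alpha: float = 0.25) -> List[float]:
    """Attenuate inter-harmonic components of x around pitch period T.

    Components periodic in T pass unchanged; components halfway between
    harmonics are scaled by (1 - 2 * alpha). Samples outside x count as zero.
    """
    n_samples = len(x)

    def at(i: int) -> float:
        return x[i] if 0 <= i < n_samples else 0.0

    return [
        (1.0 - alpha) * x[n] + 0.5 * alpha * (at(n - T) + at(n + T))
        for n in range(n_samples)
    ]


# Periodic with T = 5: unchanged away from the edges.
periodic = [float(n % 5) for n in range(60)]
# Sign flips every 5 samples (period 10): halved with alpha = 0.25.
anti = [1.0 if (n // 5) % 2 == 0 else -1.0 for n in range(60)]
```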
[Figure: Performance (MOS scores from selection test), CDMA-specific modes (Modes 0, 1, 2), WB input]
[Figure: Performance (MOS scores from characterization test)]
- The voiced decision is an inherent part of the original signal modification algorithm
- A frame is coded as voiced if all constraints of the modification are satisfied
- Signal modification is done pitch-synchronously
- The pitch period evolution is piecewise linear (constant at the frame end) to avoid pitch period oscillations
- The modified input is synchronous with the original input at the frame end
- The modification is transparent for at least up to 30% of active speech frames (in the example below, no coding is used and 30% of the active clean speech frames are modified)
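The pitch constraint can be pictured as a per-sample pitch contour. The contour moves linearly from the previous frame's end value to the new value and then holds it until the frame end. Consecutive frames therefore join at the same pitch without oscillating. A minimal sketch in which the function name, the 256-sample frame and the ramp length are assumptions:

```python
from typing import List


def pitch_contour(T_prev: float, T_end: float, frame_len: int, ramp_len: int) -> List[float]:
    """Per-sample pitch period: linear from T_prev to T_end over ramp_len
    samples, then constant at T_end up to the end of the frame."""
    if not 0 < ramp_len <= frame_len:
        raise ValueError("need 0 < ramp_len <= frame_len")
    contour = []
    for n in range(frame_len):
        if n < ramp_len:
            contour.append(T_prev + (T_end - T_prev) * (n + 1) / ramp_len)
        else:
            contour.append(T_end)  # constant at the frame end
    return contour


c = pitch_contour(40.0, 48.0, frame_len=256, ramp_len=128)
```

Because the contour ends flat at T_end, the next frame can start its own ramp from a well-defined value.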
- NB input test
- - Modes 0, 1, 2, 3
- - Clean speech, nominal level
- Test on interworking with AMR-WB @ 12.65 kbit/s
- - WB input, clean speech conditions
Purpose of the low energy decision: to avoid encoding unclassified frames of low perceptual importance at Full Rate
Condition  Codec
Ref 0      AMR-WB @ 14.25 kbit/s
Ref 1      AMR-WB @ 12.65 kbit/s
Ref 2      AMR-WB @ 8.85 kbit/s
Test 0     VMR-WB Mode 0
Test 1     VMR-WB Mode 1
Test 2     VMR-WB Mode 2
E_t: sum of the critical band energies for the current frame, in dB
E_f: long-term mean of E_t over active speech
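With E_t and E_f both in dB, a natural reading of the relative frame energy is E_rel = E_t - E_f. The low energy decision then compares E_rel with a threshold. A minimal sketch under that assumption. The initial level, the smoothing factor `beta` and `LOW_ENERGY_THRESHOLD_DB` are hypothetical values chosen for illustration.

```python
# Hypothetical threshold in dB, for illustration only.
LOW_ENERGY_THRESHOLD_DB = -15.0


def relative_energy(E_t: float, E_f: float) -> float:
    """E_rel = E_t - E_f, both energies in dB (assumed form)."""
    return E_t - E_f


class LongTermEnergy:
    """Tracks E_f, a running mean of E_t over active-speech frames."""

    def __init__(self, initial_db: float = 45.0, beta: float = 0.99) -> None:
        self.E_f = initial_db  # hypothetical starting level
        self.beta = beta       # hypothetical smoothing factor

    def update(self, E_t: float, vad_active: bool) -> float:
        # Only active speech contributes to the long-term mean.
        if vad_active:
            self.E_f = self.beta * self.E_f + (1.0 - self.beta) * E_t
        return self.E_f


def is_low_energy(E_t: float, E_f: float) -> bool:
    """True when an unclassified frame is far enough below the long-term
    level to be coded at Half Rate instead of Full Rate."""
    return relative_energy(E_t, E_f) < LOW_ENERGY_THRESHOLD_DB
```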
[Figure: Clean speech conditions]
[Figure: Example of a typical low-energy frame encoded with Generic HR in Mode 2]
- Frame Error Concealment
- - Lost frame concealment
- - - Excitation energy and spectral envelope converge to the estimated noise
- - - Excitation periodicity converges to 0
- - - Convergence rate depends on the signal class of the last good frame
- - Recovery after erasure
- - - Careful energy control of the synthesized speech
- - - Artificial onset reconstruction in case of a lost voiced onset
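The lost-frame behaviour can be sketched as a per-frame interpolation toward the noise estimate. The interpolation rate depends on the class of the last good frame. A minimal sketch in which the `RETENTION` values are hypothetical, and `lsf` stands for any spectral-envelope vector (for example, LSF- or ISF-style parameters; the representation is an assumption):

```python
from typing import List, Tuple

# Hypothetical per-frame retention factors: how much of the last good
# frame's parameters survive each lost frame. Voiced speech is extrapolated
# longer (slower convergence); unvoiced frames converge to noise quickly.
RETENTION = {"unvoiced": 0.2, "transition": 0.4, "onset": 0.6, "voiced": 0.8}


def conceal_lost_frames(
    gain: float,
    pitch_gain: float,
    lsf: List[float],
    noise_gain: float,
    noise_lsf: List[float],
    last_good_class: str,
    n_lost: int,
) -> Tuple[float, float, List[float]]:
    """Move excitation gain and spectral envelope toward the noise estimate
    and fade excitation periodicity (pitch gain) toward 0, once per lost frame."""
    a = RETENTION[last_good_class]
    for _ in range(n_lost):
        gain = a * gain + (1.0 - a) * noise_gain
        pitch_gain = a * pitch_gain
        lsf = [a * p + (1.0 - a) * q for p, q in zip(lsf, noise_lsf)]
    return gain, pitch_gain, lsf
```

During long erasures the output thus fades into comfort-noise-like excitation instead of repeating a buzzy pitch pulse train.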
[Figure: Channel error conditions]
[Figure: Background noise conditions]