Other Features

Title: Voice over Packet - Part 3
Subject: compression
Author: Y(J)S
Keywords: telephony, packet networks, speech compression
Last modified by: Yaakov Stein

Slides: 47

Provided by: YJS5

1
Other Features
2
Echo Cancellation
3
Acoustic Echo
Ecan
4
Line echo
Ecan
[Figure: Telephone 1 and Telephone 2, each connected through a hybrid]
5
Subjective reaction to echo
Ecan
Required suppression (dB)   Round-trip delay (ms)
 1.4                          0
11.1                         20
17.7                         40
22.7                         60
27.2                         80
30.9                        100
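
The table above can be turned into a small lookup for intermediate delays. The sketch below is only an illustration: the function name is hypothetical, and linear interpolation between table rows (with clamping outside 0-100 ms) is an assumption, not something the slide states.

```python
from bisect import bisect_right

# (round-trip delay in ms, required suppression in dB), copied from the table
ECHO_TABLE = [(0, 1.4), (20, 11.1), (40, 17.7), (60, 22.7), (80, 27.2), (100, 30.9)]

def required_suppression_db(delay_ms):
    """Linearly interpolate the table; clamp outside the tabulated range."""
    delays = [d for d, _ in ECHO_TABLE]
    if delay_ms <= delays[0]:
        return ECHO_TABLE[0][1]
    if delay_ms >= delays[-1]:
        return ECHO_TABLE[-1][1]
    i = bisect_right(delays, delay_ms)   # first row whose delay is > delay_ms
    (d0, s0), (d1, s1) = ECHO_TABLE[i - 1], ECHO_TABLE[i]
    return s0 + (s1 - s0) * (delay_ms - d0) / (d1 - d0)
```

For example, a 50 ms round trip falls halfway between the 40 ms and 60 ms rows, so the sketch returns about 20.2 dB.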
6
Ecan
7
Subjective effect of 15 dB echo return loss
Ecan
Percent difficulty   Decrease in MOS   Round-trip delay (ms)
 0                   0                    0
30                   1.3                300
60                   2.0                600
60                   2.0               1200
8
Echo suppressor
Ecan

In practice, more is needed: VOX, override, reset, etc.
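
A bare-bones echo suppressor can be sketched as a switch that mutes the return path while the far end is active. This is a minimal illustration, not the design on the slide: the function name, the fixed threshold, and the sample-count hangover are all assumptions, and a real suppressor would use speech-level detection rather than single-sample magnitudes.

```python
def suppress_echo(far_end, near_end, threshold=0.05, hangover=3):
    """Mute the return (near-end) path while the far end is talking.

    After the last active far-end sample, muting continues so that
    `hangover` samples in total are muted (the active one included).
    """
    out = []
    hold = 0
    for x, y in zip(far_end, near_end):
        if abs(x) > threshold:
            hold = hangover          # far end active: (re)start the mute timer
        if hold > 0:
            out.append(0.0)          # return path blocked
            hold -= 1
        else:
            out.append(y)            # return path open
    return out
```

Note that the near-end talker is also muted whenever the far end is active, which is exactly the half-duplex behaviour of a suppressor.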
9
Why not echo suppression?
Ecan

Echo suppression makes the conversation half duplex
Waste of full-duplex infrastructure
Conversation unnatural
Hard to break in
Dead-sounding line
It would be better to cancel the echo:
subtract the echo signal, allowing the desired signal through,
but that requires DSP.

10
Echo cancellation?
Ecan

Unfortunately, it's not so easy
Outgoing signal is delayed, attenuated, distorted
Two echo canceller architectures
MODEM TYPE
LINE ECHO CANCELLER (LEC)

[Figure: the two architectures, each labeled with near end, far end, echo path, subtractor, and clean signal]
11
LEC architecture
Ecan

[Figure: LEC block diagram between near end and far end. Blocks: hybrid, A/D, D/A, filter H, subtractor, NLP, doubletalk detector, adapt. Signals: X, Y]
12
Adaptive Algorithms
Ecan

How do we
find the echo cancelling filter?
keep it correct even if the echo path parameters change?
Need an algorithm that continually changes the filter parameters
All adaptive algorithms are based on the same ideas
(lack of correlation between desired signal and interference)
Let's start with a simpler case - adaptive noise cancellation

13
Noise cancellation
Ecan
[Figure: noise cancellation block diagram with signals x, n, y, noise gain h, correction gain e, and a subtractor]
14
Noise cancellation - cont.
Ecan

Assume that noise is distorted only by unknown gain $h$
We correct by transmitting $e\,n$ so that the audience hears
$y = x + h\,n - e\,n = x + (h - e)\,n$
the energy of this signal is
$E_y = \langle y^2 \rangle = \langle x^2 \rangle + (h-e)^2 \langle n^2 \rangle + 2\,(h-e)\,\langle x\,n \rangle$
Assume that $C_{xn} = \langle x\,n \rangle = 0$
We need only set $e$ to minimize $E_y$! (turn knob until minimal)
Even if the distortion is a complete filter $h$
we set the ANC filter $e$ to minimize $E_y$
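
The "turn the knob" idea can be tried numerically. In this sketch the signals, the gain value 0.7, and the function names are all illustrative assumptions. x and n are sinusoids at different frequencies so that their correlation over the block is zero, as the slide assumes.

```python
import math

def output_energy(x, n, h, e):
    """Mean energy of y = x + (h - e) * n."""
    return sum((xi + (h - e) * ni) ** 2 for xi, ni in zip(x, n)) / len(x)

def best_gain(x, n, h, candidates):
    """'Turn the knob': return the candidate e giving minimal output energy.

    h is used only to simulate what the audience hears; the search itself
    looks at nothing but the measured energy.
    """
    return min(candidates, key=lambda e: output_energy(x, n, h, e))

N = 200
x = [math.sin(2 * math.pi * 3 * k / N) for k in range(N)]   # desired signal
n = [math.sin(2 * math.pi * 7 * k / N) for k in range(N)]   # noise reference
h = 0.7                                                     # unknown gain (simulated)
knob = [i / 100 for i in range(0, 151)]                     # e from 0.00 to 1.50
e_best = best_gain(x, n, h, knob)
```

Because the cross term vanishes, the energy is smallest where (h - e) is zero, so the search lands on e = 0.7 without ever being told h.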

15
The LMS algorithm
Ecan

Gradient descent on energy
correction to H is proportional to error d times input X

$H \leftarrow H + \lambda\, d\, X$
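
The LMS update can be sketched in a few lines. This is an illustrative toy, not the presenter's implementation: the 4-tap echo path, the step size, and the function name are assumptions, and there is no near-end speech or noise in the simulated signal.

```python
import random

def lms_echo_canceller(x, y, taps=4, step=0.05):
    """Adapt filter H so that H applied to far-end x predicts the echo in y.

    Returns the final coefficients and the error (echo-cancelled) signal.
    """
    H = [0.0] * taps
    buf = [0.0] * taps              # most recent far-end samples, newest first
    errors = []
    for xk, yk in zip(x, y):
        buf = [xk] + buf[:-1]
        echo_estimate = sum(hi * bi for hi, bi in zip(H, buf))
        d = yk - echo_estimate      # error after subtracting the estimated echo
        # LMS update: correction proportional to error d times input X
        H = [hi + step * d * bi for hi, bi in zip(H, buf)]
        errors.append(d)
    return H, errors

random.seed(0)
echo_path = [0.5, -0.3, 0.1, 0.0]                   # hypothetical echo path
x = [random.uniform(-1, 1) for _ in range(5000)]    # far-end signal
y = [sum(echo_path[j] * x[k - j] for j in range(4) if k - j >= 0)
     for k in range(len(x))]                        # pure echo of the far end
H, errors = lms_echo_canceller(x, y)
```

In this noiseless setting H converges to the simulated echo path and the error signal decays toward zero. With near-end speech present, adaptation would have to be frozen.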
16
Nonlinear processing
Ecan

Because of finite numeric precision
the LEC (linear) filtering cannot completely remove echo
Standard LEC adds center clipping to remove residual echo
Clipping threshold needs to be properly set by adaptation
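
A center clipper is simple to sketch. The version below zeroes small samples and passes larger ones unchanged. Other variants exist (for example, ones that also subtract the threshold from the samples they pass), and the fixed threshold here is an assumption, whereas the slide calls for an adapted one.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is below the threshold; pass the rest."""
    return [0.0 if abs(s) < threshold else s for s in samples]

residual = [0.004, -0.002, 0.3, -0.25, 0.001]   # small residual echo plus speech
cleaned = center_clip(residual, threshold=0.01)
# cleaned == [0.0, 0.0, 0.3, -0.25, 0.0]
```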

17
Doubletalk detection
Ecan

Adaptation of H should take place only when far end speaks
So we freeze adaptation when no far end or double-talk,
that is whenever near end speaks
Geigel algorithm compares absolute value of near-end speech
to half the maximum absolute value in X buffer
If near-end exceeds far-end, can assume only near-end is speaking
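
The Geigel test as described above fits in one comparison. The function name and the sample values are illustrative. The factor of one half comes from the slide.

```python
def geigel_near_end_active(y_sample, x_buffer, ratio=0.5):
    """Declare near-end speech when |y| exceeds ratio * max|x| over the buffer."""
    return abs(y_sample) > ratio * max(abs(x) for x in x_buffer)

x_buffer = [0.1, -0.8, 0.4, 0.2]                      # recent far-end samples
echo_only = geigel_near_end_active(0.3, x_buffer)     # 0.3 <= 0.4, so False
double_talk = geigel_near_end_active(0.6, x_buffer)   # 0.6 >  0.4, so True
```

When the function returns True, adaptation of H would be frozen.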

18
Data Relays
19
The need for relays
Relays

Voice is a relatively forgiving signal (or rather, the ear is)
Compression techniques are designed to pass voice
but may hopelessly distort other signals
Even simple tones (or DTMF) may not be passed by coders
We could go back to 64 Kbps G.711 for non-voice signals
But isn't that silly?
Using 64 Kbps for 64 bps or even 9.6 Kbps data?
The solution is to use a relay

20
Open Channel

Reasons to use 64 Kbps G.711 (open channel)
(32 Kbps ADPCM may work as well)
Inexpensive
Simple design
Robust
Even open channel is not trivial!
Need dynamic BW mechanism
Need to detect the event (fax/modem tone, DTMF, MF, CPT, etc.)
Need to return to compressed voice (end of session, time-out)

21
Tone / Fax / Modem Relay
Relays
[Figure: relay block diagram; each side has Analog, A/D D/A, 64 Kbps, and Demodulate/Remodulate stages]

Problems
need highly accurate detectors
need low false alarm rate
need appropriate protocol
need accurate timing
need expensive DSP processing
delay may be too large
may need spoofing
can the two sides operate with different parameters?

22
VoP DSP Architecture
Relays
[Figure: Voice Packet Module block diagram. Blocks: Tone Detector, Tone Generator, PCM Interface, VAD, CNG, DISC., LEC, Packet Voice Protocol, Multi Channel Codec, Speech Coders, Serial Port, Playout Unit, Real Time Operating System, Control]
23
VoP System Implementation
Relays
[Figure: system block diagram. Labels: Signaling, Network Management Module, NM info, Telephony Signaling Module, Microprocessor, PSTN, ATM / FR / IP Network, Voice Packet Module, Packet Protocol Module, Voice, Voice Signaling Packets, DSP]
24
Quality of Service
25
The meaning of QoS
QoS

For general purpose data
Every little bit counts
only lossless compression
best effort delivery
Real-time not essential
dynamic routing and packet reordering allowed
For speech
Only subjective quality counts
Can use lossy compression
Can drop segments with little effect
Real-time essential
predetermined route preferable (traffic engineering)

26
PSTN QoS
QoS

Virtually all calls (>95%) completed
Once connected virtually no disconnects or faults
Toll quality voice
Low delay (except satellite calls)
Full switching, optimized routing
Call Management
Fax/Modem functions
Wireline and wireless services

27
Paying for QoS
QoS

Law of Photonics
Price of transmitting a bit drops by half every 9 months
Free Internet telephony
Several firms offering free long distance service over Internet
Strong compression, significant delay and jitter
We no longer need to pay for service
but we are willing to pay for quality of service

28
Paying for QoS
QoS
[Figure: toll, wire service, mobile service]
29
Speech Quality Measurement
30
Why does it sound the way it sounds?
SQM

PSTN
BW = 0.2-3.8 kHz, SNR > 30 dB
PCM, ADPCM (BER 10^-3)
five nines reliability
line echo cancellation
Voice over packet network
speech compression
delay, delay variation, jitter
packet loss/corruption/priority
echo cancellation

31
Subjective Voice Quality
SQM

Old Measures
5/9
DRT
DAM
The modern scale
MOS
DMOS

meet neat seat feet Pete beat heat
32
MOS according to ITU
SQM

P.800 Subjective Determination of Transmission Quality
Annex B: Absolute Category Rating (ACR)

Score   Listening Quality   Listening Effort
5       excellent           relaxed
4       good                attention needed
3       fair                moderate effort
2       poor                considerable effort
1       bad                 no meaning with feasible effort
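
A MOS is the arithmetic mean of the category ratings given by a listening panel. The sketch below shows the computation. The panel data and the function name are made up for illustration.

```python
from statistics import mean

def mean_opinion_score(ratings):
    """Average ACR ratings (integers 1..5) from a listening panel."""
    if not all(r in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return mean(ratings)

panel = [4, 4, 3, 5, 4, 3, 4, 4]      # hypothetical ratings from 8 listeners
mos = mean_opinion_score(panel)       # 31 / 8 = 3.875
```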

33
MOS according to ITU (cont)
SQM

Annex D: Degradation Category Rating (DCR)
Annex E: Comparison Category Rating (CCR)
ACR not good at high quality speech

Score   DCR                 CCR
 5      inaudible
 4      not annoying
 3      slightly annoying   much better
 2      annoying            better
 1      very annoying       slightly better
 0                          the same
-1                          slightly worse
-2                          worse
-3                          much worse
34
Some MOS numbers
SQM

Effect of Speech Compression
(from ITU-T Study Group 15)

Quiet room, 48 kHz 16 bit linear sampling   5.0
PCM (A-law/μ-law) 64 Kb/s                   4.1
G.723.1 @ 6.3 Kb/s                          3.9
G.729 @ 8 Kb/s                              3.9
ADPCM G.726 32 Kb/s                         3.8
toll quality
GSM @ 13 Kb/s                               3.6
VSELP IS54 @ 8 Kb/s                         3.4

35
The Problem(s) with MOS
SQM

Accurate MOS tests are the only reliable benchmark
BUT
MOS tests are off-line
MOS tests are slow
MOS tests are expensive
Different labs give consistently different results
Most MOS tests only check one aspect of system

36
The Problem(s) with SNR
SQM

Naive question: isn't CCR the same as SNR?
SNR does not correlate well with subjective criteria
Squared difference is not an accurate comparator
Gain
Delay
Phase
Nonlinear processing
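
A small numerical experiment shows why a sample-by-sample squared difference is misleading for delay and gain. The test tone, the 4-sample shift, and the function name are illustrative assumptions. The point is only that two changes which leave the waveform shape intact produce very different SNR values.

```python
import math

def snr_db(reference, test):
    """SNR in dB, treating the sample-by-sample difference as noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)

N = 800
ref = [math.sin(2 * math.pi * 100 * k / N) for k in range(N)]   # period = 8 samples
delayed = ref[-4:] + ref[:-4]        # circular shift by half a period
scaled = [0.5 * r for r in ref]      # same waveform, 6 dB quieter

snr_delay = snr_db(ref, delayed)     # about -6 dB
snr_gain = snr_db(ref, scaled)       # about +6 dB
```

The delayed tone is the same tone shifted by four samples, yet its SNR is negative because the shifted waveform is the inverse of the reference.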

37
Speech distance measures
SQM

Many objective measures have been proposed
Segmental SNR
Itakura-Saito distance
Euclidean distance in cepstrum space
Bark spectral distortion
Coherence function
None correlate well with MOS
ITU target: find a quality measure that does correlate well
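
As an example of one of these measures, segmental SNR averages the per-frame SNR in dB instead of computing one ratio over the whole signal. The sketch below makes several assumptions not found on the slide: a 160-sample frame (20 ms at 8 kHz), clamping each frame to the range -10 to 35 dB, and skipping all-zero frames.

```python
import math

def segmental_snr_db(reference, test, frame_len=160, floor_db=-10.0, ceil_db=35.0):
    """Average the clamped per-frame SNR (in dB) over consecutive frames."""
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        tst = test[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - t) ** 2 for r, t in zip(ref, tst))
        if signal == 0:
            continue                          # skip silent frames
        snr = ceil_db if noise == 0 else 10 * math.log10(signal / noise)
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    if not frame_snrs:
        raise ValueError("no non-silent full frames to measure")
    return sum(frame_snrs) / len(frame_snrs)

ref = [math.sin(0.1 * k) for k in range(320)]
test = [0.9 * r for r in ref]                 # 10% amplitude error in every frame
seg_snr = segmental_snr_db(ref, test)         # about 20 dB
```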

38
Return to Biology
SQM

Standard speech model (LPC)
(used by most speech processing/compression/recognition systems)
is a model of speech production
Unfortunately, speech production and perception systems
are not matched
Speech quality measurement idea
use a model of the human auditory system (perception)
ITU-T P.861 Perceptual Speech Quality Measurement (PSQM)
ITU-T P.862 Perceptual Evaluation of Speech Quality (PESQ)
ITU-R BS.1387 Objective Measurements of Perceived Audio Quality

39
Some objective methods
SQM

Perceptual Speech Quality Measurement (PSQM): ITU-T P.861
Perceptual Analysis Measurement System (PAMS): BT proprietary technique
Perceptual Evaluation of Speech Quality (PESQ): ITU-T P.862
Objective Measurement of Perceived Audio Quality (PAQM): ITU-R BS.1387
E-model: ITU-T G.107, G.108, ETSI ETR-250

40
Objective Quality Strategy
SQM
speech
41
PSQM philosophy (from P.861)
SQM
[Figure: PSQM block diagram. Labels: Perceptual model (x2), Internal Representation (x2), Audible Difference, Cognitive Model]
42
PSQM philosophy (cont)
SQM

Perceptual Modelling (Internal representation)
Short time Fourier transform
Frequency warping (telephone-band filtering, Hoth noise)
Intensity warping
Cognitive Modelling
Loudness scaling
Internal cognitive noise
Asymmetry
Silent interval processing
PSQM Values
0 (no degradation) to 6.5 (maximum degradation)
Conversion to MOS
PSQM to MOS calibration using known references
Equivalent Q values

43
Problems with PSQM
SQM

Designed for telephony grade speech codecs
Doesn't take network effects into account
filtering
variable time delay
localized distortions
Draft standard P.862 adds
transfer function equalization
time alignment, delay skipping
distortion averaging

44
PESQ philosophy (from P.862)
SQM
[Figure: PESQ block diagram. Labels: Time Alignment, Perceptual model (x2), Internal Representation (x2), Audible Difference, Cognitive Model]
45
E-model
SQM