Title: Fundamentals of Digital Audio
1Fundamentals of Digital Audio
2The Central Problem
- Sound waves consist of air pressure changes
- This is what we see in an oscilloscope view:
changes in air pressure over time
3The Central Problem
- Waves in nature, including sound waves, are
continuous.
Between any two points on the curve,
no matter how close together they are, there are
an infinite number of points.
4The Central Problem
- Analog audio (vinyl, tape, analog synths, etc.)
involves the creation or imitation of a
continuous wave.
- Computers cannot represent continuity (or
infinity).
- Computers can only deal with discrete values.
- Digital technology is based on converting
continuous values to discrete values.
5Digital Conversion
- The instantaneous amplitude of a continuous wave
is measured (sampled) regularly. The measurement
values, samples, may be stored in a digital
system.
7Digital Conversion
- The amplitude of a continuous wave is measured
(sampled) regularly. The measurement values,
samples, may be stored in a digital system.
0.9925, 0.9945, 0.9961, 0.9975, 0.9986, 0.9993,
0.9998, 1.0, 0.9998, 0.9993, 0.9986, 0.9975,
0.9961, 0.9945, 0.9925
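The measuring step described above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_sine` and its parameters are hypothetical, and the wave being measured is assumed to be a pure sine tone.

```python
import math

def sample_sine(freq_hz, sample_rate, num_samples, amplitude=1.0):
    """Measure a sine wave's instantaneous amplitude at regular intervals."""
    samples = []
    for n in range(num_samples):
        t = n / sample_rate  # time of the n-th sample, in seconds
        samples.append(amplitude * math.sin(2 * math.pi * freq_hz * t))
    return samples

# Eight samples of a 1 Hz sine taken 8 times per second (one full cycle).
for value in sample_sine(freq_hz=1.0, sample_rate=8, num_samples=8):
    print(round(value, 4))
```

The returned list is the kind of series of numbers shown on the slide: discrete values that a digital system can store.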
8Digital Audio
- Digital representation of audio is analogous to
cinema's representation of motion.
- We know that moving pictures are not really
moving; cinema is simply a series of pictures of
motion, sampled and projected fast enough that
the effect is that of apparent motion.
- With digital audio, if a sound is sampled often
enough, the effect is apparent continuity when
the samples are played back.
9Digital Audio
- Con
  - It is, at best, only an approximation of the
  wave
- Pros
  - Significantly lower background noise levels
  - Sounds are more reliably stored and duplicated
  - Sounds are easier to manipulate. Rather than
  worry about how to change the shape of a wave,
  engineers need only perform appropriate numerical
  operations. For example, changing the volume level
  of a digital audio file is simply a matter of
  multiplication: each sample value is multiplied
  by a value that raises or lowers it by a certain
  percentage.
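As a minimal sketch of the volume example, assuming samples are held as a plain list of floating-point values (the helper names `apply_gain` and `change_by_percent` are made up for illustration):

```python
def apply_gain(samples, gain):
    """Multiply every sample by the same factor."""
    return [s * gain for s in samples]

def change_by_percent(samples, percent):
    """Raise (positive percent) or lower (negative percent) the level."""
    return apply_gain(samples, 1.0 + percent / 100.0)

quieter = change_by_percent([0.5, -0.25, 0.125], -50)  # halve the level
print(quieter)
```

No reshaping of a waveform is involved; the whole operation is one multiplication per sample.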
10Digital Audio
- The theory behind digital representation has
existed since the 1920s.
- It wasn't until the 1950s that technology caught
up to the theory, and it was possible to
implement digital audio.
11Digital Audio
- Bell Labs produced the first digital audio
synthesis in the 1950s.
- For computer synthesis, a series of samples was
calculated and stored in a wavetable.
- The wavetable described, in connect-the-dots
fashion, the shape of a wave (i.e., its timbre).
- Reading through the wavetable at different rates
(skipping ahead n samples at each step, where n is
the sampling increment) allowed different pitches
to be created.
- Audio was produced by feeding the samples that
were to be audified through a digital to analog
converter (DAC).
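The wavetable idea can be sketched as follows. This is a simplified illustration, not a reconstruction of the Bell Labs software: the table length, the sine-shaped table, the truncating lookup (no interpolation), and all names are assumptions.

```python
import math

TABLE_SIZE = 512  # hypothetical table length

# One cycle of a sine wave, stored as a table of samples.
WAVETABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def read_wavetable(table, increment, num_samples):
    """Step through the table by `increment`, wrapping around at the end."""
    output = []
    position = 0.0
    for _ in range(num_samples):
        output.append(table[int(position)])  # truncating lookup, no interpolation
        position = (position + increment) % len(table)
    return output

def increment_for_frequency(freq_hz, sample_rate, table_size):
    """Sampling increment that plays a one-cycle table at freq_hz."""
    return freq_hz * table_size / sample_rate

inc = increment_for_frequency(440.0, 44_100, TABLE_SIZE)
samples = read_wavetable(WAVETABLE, inc, 100)
```

A larger increment moves through the table faster, so the stored cycle repeats more often per second and the pitch rises.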
12Digital Audio
- Contemporary computer sound cards often contain a
set of wavetable sounds.
- The function is the same: a library of samples
describing different waveforms.
- They are triggered by MIDI commands. (These will
be covered fully in a few weeks.) For example, a
given note number will translate to the table
being read at a certain sampling increment to
produce the desired pitch.
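One way the note-number-to-increment translation might work is sketched below. It assumes twelve-tone equal temperament with note 69 tuned to 440 Hz and a table holding exactly one cycle of the waveform; actual sound cards may differ, and the function names are hypothetical.

```python
def midi_note_to_frequency(note_number, a4_hz=440.0):
    """Equal-tempered frequency, assuming note 69 is A at a4_hz."""
    return a4_hz * 2.0 ** ((note_number - 69) / 12.0)

def sampling_increment(note_number, table_size, sample_rate):
    """Table step per output sample for a one-cycle wavetable."""
    return midi_note_to_frequency(note_number) * table_size / sample_rate

for note in (57, 69, 81):  # three A's, each an octave apart
    print(note, midi_note_to_frequency(note), sampling_increment(note, 512, 44_100))
```

Each octave doubles the frequency, so it also doubles the sampling increment.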
13Digital Audio
- Digital recording became possible in the 1970s.
- Voltage input from a microphone is fed to an
analog to digital converter (ADC), which stores
the signal as a series of samples.
- The samples can then be sent through a DAC for
playback.
14Digital Audio
- Thus, the ADC produces a dehydrated version of
the audio.
- The DAC then rehydrates the audio for playback.
- (Gareth Loy, Musimathics v. 2)
15Characteristics of Digital Audio
- With digital audio, we are concerned with two
measurements:
  - Sampling rate
  - Quantization
- With these measurements, we can describe how well
a digitized audio file represents the analog
original.
18Sampling Rate
- This number tells us how often an audio signal is
sampled: the number of samples per second.
- The more often an audio signal is sampled, the
better it is represented in discrete form.
Of course, the staircase-shaped wave that results
from sampling needs to be smoothed. This process
will be covered during the discussion on filtering.
19Sampling Rate
- So we want to sample an audio wave every so
often. The question is: how often is often
enough?
- Harry Nyquist of Bell Labs addressed this
question in a 1925 paper concerning telegraph
signals.
20Sampling Rate
- Given that a wave will be smoothed by a
subsequent filtering process, it is sufficient to
sample both its peak and its trough, that is, to
take two samples per cycle.
21Sampling Rate
- Thus, we have the sampling theorem (also called
the Nyquist theorem):
To represent digitally a signal containing
frequency components up to X Hz, it is necessary
to use a sampling rate of at least 2X samples per
second.
- Conversely, the maximum frequency contained in a
signal sampled at a rate of SR is SR/2 Hz.
- The frequency SR/2 is also termed the Nyquist
frequency.
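The two statements of the theorem translate directly into arithmetic. The helper names below are hypothetical and serve only to restate the rule in code:

```python
def nyquist_frequency(sample_rate):
    """Highest frequency a given sampling rate can represent (SR/2)."""
    return sample_rate / 2

def minimum_sample_rate(max_freq_hz):
    """Lowest sampling rate for a signal with components up to max_freq_hz (2X)."""
    return 2 * max_freq_hz

def can_represent(freq_hz, sample_rate):
    """True if freq_hz does not exceed the Nyquist frequency."""
    return freq_hz <= nyquist_frequency(sample_rate)

print(minimum_sample_rate(20_000))    # rate needed for a 20 kHz component
print(nyquist_frequency(44_100))      # Nyquist frequency at 44.1 kHz
print(can_represent(30_000, 40_000))  # a 30 kHz tone at a 40 kHz rate
```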
22Sampling Rate
- In theory, since the maximum audible frequency is
20 kHz, a sampling rate of 40 kHz would be
sufficient to re-create a signal containing all
audible frequencies.
23Sampling Rate
- For most frequencies, we will oversample (the
audio frequency is below the Nyquist frequency)
25Sampling Rate
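- If an audio frequency lies precisely at the
Nyquist frequency, this critically sampled signal
runs the risk of missing peaks and troughs: the
two samples per cycle may land on the peaks and
troughs, or they may land elsewhere on the wave.
- This problem is also addressed by filtering.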
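The risk of critical sampling can be seen numerically. In this sketch (names and phase values are illustrative assumptions), a sine at exactly half the sampling rate is sampled twice per cycle, and the starting phase decides what the samples capture:

```python
import math

def sample_at_nyquist(sample_rate, phase, num_samples):
    """Sample a sine whose frequency is exactly sample_rate / 2."""
    freq_hz = sample_rate / 2
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
        for n in range(num_samples)
    ]

# Samples land on the peaks and troughs: the wave is captured.
on_peaks = sample_at_nyquist(40_000, math.pi / 2, 8)

# Samples land on the zero crossings: the wave is missed entirely.
on_zeros = sample_at_nyquist(40_000, 0.0, 8)

print([round(v, 6) for v in on_peaks])
print([round(v, 6) for v in on_zeros])
```

With the second phase, every sample is (to within rounding error) zero, so the stored data contains no trace of the tone.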
27Sampling Rate
- More serious is the problem of undersampling a
frequency greater than the Nyquist frequency.
Example: an audio signal at 30 kHz, sampled at
40 kHz.
Result: the frequency is misrepresented at 10 kHz,
in reverse phase.
Misrepresented frequencies are termed aliases.
28Sampling Rate
- In general, if a frequency, F, sampled at a
sampling rate of SR, exceeds the Nyquist
frequency, that frequency will alias to a
frequency of
-(SR - F)
The minus sign indicates that the frequency is in
opposite phase.
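The aliasing rule can be checked numerically. The sketch below assumes a pure sine tone and a frequency between the Nyquist frequency and the sampling rate; the names `sample_sine` and `alias_of` are illustrative only.

```python
import math

def sample_sine(freq_hz, sample_rate, num_samples):
    """Sample a unit-amplitude sine; a negative frequency inverts the phase."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(num_samples)
    ]

def alias_of(freq_hz, sample_rate):
    """Alias per the rule above; assumes Nyquist < freq_hz < sample_rate."""
    return -(sample_rate - freq_hz)

SR = 40_000
original = sample_sine(30_000, SR, 16)
alias = sample_sine(alias_of(30_000, SR), SR, 16)

# Sample for sample, the 30 kHz tone and its alias are indistinguishable.
identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, alias))
print(alias_of(30_000, SR), identical)
```

Because the two sample lists agree to within rounding error, nothing in the stored data can tell the 30 kHz tone apart from an inverted 10 kHz tone.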
29Sampling Rate
- It is useful to illustrate sampled frequencies on
a polar diagram, with 0 Hz at 3:00 and the
Nyquist frequency at 9:00.
[Figure: polar diagram with 0 Hz and Nyquist marked
on the circle; upper half labeled f, lower half
labeled -f]
The upper half of the circle represents
frequencies from 0 Hz to the Nyquist frequency.
The lower half of the circle represents negative
frequencies from 0 Hz to the Nyquist frequency
(there is no distinction in a digital audio
system between +NF and -NF, where NF is the
Nyquist frequency).
Any audio frequency above the Nyquist frequency
will alias to a frequency shown on the bottom
half of the circle, a negative frequency between
0 Hz and the Nyquist frequency.
Frequencies above the Nyquist frequency do not
exist in a digital audio system.
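The circle can be expressed as a small calculation. This sketch assumes 0 Hz sits at an angle of 0 degrees and the Nyquist frequency at 180 degrees, with positive angles on the upper half. Wrapping frequencies beyond the sampling rate around the circle again is an extension not stated in the slides, and the function names are hypothetical.

```python
def folded_frequency(freq_hz, sample_rate):
    """Frequency as the digital system sees it, in (-SR/2, SR/2]."""
    f = freq_hz % sample_rate      # wrap into [0, SR)
    if f > sample_rate / 2:        # past Nyquist: lands on the lower half
        f -= sample_rate
    return f

def polar_angle_degrees(freq_hz, sample_rate):
    """Angle on the diagram: 0 Hz at 0 degrees, Nyquist at 180 degrees."""
    nyquist = sample_rate / 2
    return 180.0 * folded_frequency(freq_hz, sample_rate) / nyquist

for f in (0, 10_000, 20_000, 30_000):
    print(f, folded_frequency(f, 40_000), polar_angle_degrees(f, 40_000))
```

At a 40 kHz sampling rate, 10 kHz sits at the top of the circle and 30 kHz at the bottom, which is the position of -10 kHz.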
30Sampling Rate
- In the recording process, filters are used to
remove all frequencies above the Nyquist
frequency before the audio signal is sampled.
- This step is critical since aliases cannot be
removed later.
- Provided these frequencies are not in the sampled
signal, the signal may be sampled and later
reconverted to audio with no loss of frequency
information.
31Sampling Rate
- The sampling rate for audio CDs is 44.1 kHz.
- The origin of this rate lies in video formats.
- When digital audio recording began, audio tape
was not capable of handling the density of
digital signals.
- The first digital masters were stored on video as
a pseudo-video signal, in which binary values of
1 and 0 were stored as video levels of black and
white.
32Sampling Rate
Video is drawn left to right, starting from the
top of the screen and moving down.
First the odd numbered lines are drawn, then the
even numbered lines.
Each video frame has two fields: the odd field
and the even field. The fields are adjacent to
each other on the video tape.
[Figure: fields laid out along the video tape:
frame n odd, frame n even, frame n+1 odd,
frame n+1 even, frame n+2 odd]
33Sampling Rate
- There are two video formats:
  - 525 lines, 30 frames per second (USA).
  Minus 35 blank lines, leaving 490 lines per frame.
  60 fields per second, 245 lines per field.
  - 625 lines, 25 frames per second (European).
  Minus 37 blank lines, leaving 588 lines per frame.
  50 fields per second, 294 lines per field.
- Three samples could be stored on each line,
allowing
60 x 245 x 3 = 44,100 samples per second
or
50 x 294 x 3 = 44,100 samples per second
- 44.1 kHz remains the standard sampling rate for
CD audio.
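The arithmetic behind the 44.1 kHz figure can be verified directly. The function names are illustrative; the numbers are the ones given above.

```python
def usable_lines_per_field(total_lines, blank_lines):
    """Lines left per field after removing blank lines (two fields per frame)."""
    return (total_lines - blank_lines) // 2

def samples_per_second(fields_per_second, lines_per_field, samples_per_line=3):
    """Audio samples that fit on the video signal each second."""
    return fields_per_second * lines_per_field * samples_per_line

usa = samples_per_second(60, usable_lines_per_field(525, 35))
european = samples_per_second(50, usable_lines_per_field(625, 37))
print(usa, european)
```

Both formats yield the same rate, which is why a single sampling rate could serve masters made on either video system.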
34Quantization
- This has a few names:
  - Sample size
  - Bit depth
  - Word size
- The term quantization takes its origin from
quantum physics:
  - Electrons orbit an atom's nucleus in one of a
  number of well-defined layers.
  - An electron may be knocked from one layer to
  another, but it can never stay between two of the
  layers.
35Quantization
- In the discussion of sampling rate, we only
considered how often the amplitude of the wave
was measured.
- We did not discuss how accurate these
measurements were.
- The effectiveness of any measurement depends on
the precision of our ruler. (Measuring the
thickness of something with many small
indentations, using a ruler marked only in feet,
will probably not give a very accurate
measurement: we have to estimate many of the
readings.)
- Just as there are limits to how often we can
sample, there are limits to the resolution of our
ruler.
36Quantization
- Like all numbers stored in computers, the
amplitude values are stored as binary numbers.
- The value that gets stored is the closest
available binary number, akin to the nearest
marking on a ruler.
- The accuracy of our measurement depends on how
many bits we have to represent these values.
- Clearly, the more bits we have, the finer the
resolution of our ruler.
[Figure: a wave quantized with 2 bits. Each change
of bit represents a change in voltage level.]
37Quantization
[Figure: the wave quantized with 3 bits. Each
change of bit represents a change in voltage
level.]
38Quantization
[Figure: the wave quantized with 4 bits. Each
change of bit represents a change in voltage
level.]
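The ruler analogy can be made concrete with a small quantizer. This is a sketch under stated assumptions: amplitudes lie in the range -1 to 1, the levels are evenly spaced across that range, and the name `quantize` is hypothetical. Real converters typically use signed integer codes rather than this layout.

```python
def quantize(value, bits):
    """Snap a value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # spacing between ruler markings
    index = round((value + 1.0) / step)  # nearest marking
    index = max(0, min(levels - 1, index))
    return -1.0 + index * step

for bits in (2, 3, 4, 16):
    stored = quantize(0.3, bits)
    print(bits, stored, abs(stored - 0.3))
```

Each added bit doubles the number of levels, so the gap between the true amplitude and the stored value tends to shrink as the bit count grows.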
39Quantization
- CD audio uses 16-bit quantization.
40Quantization
- While aliasing is eliminated if our signal
contains no frequencies above the Nyquist
frequency, quantization error can never be
completely eliminated.
- Every sample is within a margin of error that is
half the quantization level (the voltage change
represented by the least significant bit).
41Quantization
- For a sine wave signal represented with n bits,
the signal to error ratio is
S/E (dB) = 6.02n + 1.76
- The problem is that low-level signals do not use
all available bits, and therefore the error level
is greater.
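A short calculation shows what the formula implies. The function name is hypothetical, and treating a low-level signal as one that exercises only some of the available bits is a simplification of the point made above.

```python
def signal_to_error_db(bits):
    """Signal-to-error ratio for a full-scale sine wave stored with `bits` bits."""
    return 6.02 * bits + 1.76

full_scale = signal_to_error_db(16)  # sine using all 16 bits
low_level = signal_to_error_db(4)    # sine small enough to use only 4 bits
print(full_scale, low_level)
```

Each bit contributes about 6 dB, so a quiet passage that uses only a few bits has a much poorer ratio than a full-scale one.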
42Quantization
- While quantization error may be masked at high
audio levels, it can become audible at low levels.
Worst case: a sine wave fluctuating within one
quantization increment is stored as a square wave.
Thus, unlike the constant hissing noise of analog
recordings, quantization error is correlated with
the signal, and is thus a type of distortion,
rather than noise.
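The worst case can be reproduced in a few lines. In this sketch, amplitudes are measured in units of one quantization increment, and the sine's amplitude and offset are arbitrary values chosen so that it straddles a rounding threshold; all names are illustrative.

```python
import math

def quantize_to_integer(value):
    """Round to the nearest whole quantization level (halves round up)."""
    return math.floor(value + 0.5)

def quantized_low_level_sine(num_samples, amplitude=0.4, offset=0.5):
    """Quantize one cycle of a sine that swings less than one increment."""
    stored = []
    for n in range(num_samples):
        x = offset + amplitude * math.sin(2 * math.pi * n / num_samples)
        stored.append(quantize_to_integer(x))
    return stored

print(quantized_low_level_sine(16))
```

Only two levels ever appear in the output, so the smooth curve is stored as an abrupt alternation between them: a square wave.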
43Quantization
- The problem of quantization distortion is
addressed by dither.
- Dither is low-level noise added to the audio
signal before it is sampled.
[Figure: low-level audio signal with dither added]
44Quantization
- Dither adds random errors to the signal, so that
quantization results in added noise, rather than
distortion.
- The noise is a constant factor, not correlated
with the signal like quantization distortion.
- The result is a noisy signal, rather than a
signal broken up by distortion.
45Quantization
- The auditory system averages the signal at all
times. We do not hear individual samples.
- With dither, this averaging allows the musical
signal to co-exist with the noise, rather than be
temporarily eliminated due to distortion.
46Quantization
- Dither allows resolution below the least
significant quantization bit.
- Without dither, digital recordings would be far
less satisfactory than analog recordings. A
plucked guitar string, for example, fades into
something close to a sine tone. Without dither, a
guitar sound would gradually turn into the sound
of a square wave.
- With dither, there is significantly less noise in
digital recordings than in analog recordings.
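The claim that dither preserves detail below the least significant bit can be illustrated with a constant input of 0.3 quantization increments. The sketch assumes uniformly distributed dither one increment wide, which is only one of several possible dither types, and a fixed random seed so that the run is repeatable; all names are hypothetical.

```python
import math
import random

def quantize_to_integer(value):
    """Round to the nearest whole quantization level (halves round up)."""
    return math.floor(value + 0.5)

def average_quantized(level, num_trials, dither, seed=0):
    """Quantize the same level many times and average the stored values."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_trials):
        noise = rng.uniform(-0.5, 0.5) if dither else 0.0
        total += quantize_to_integer(level + noise)
    return total / num_trials

print(average_quantized(0.3, 20_000, dither=False))  # the level is lost
print(average_quantized(0.3, 20_000, dither=True))   # the average recovers it
```

Without dither, every stored value is 0. With dither, individual values are noisy, but their average settles near 0.3, much as the averaging of the auditory system lets a low-level signal survive alongside the noise.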
47Quantization and Sampling Rate
- The sampling rate determines the signal's
frequency content.
- The number of quantization bits determines the
amount of quantization error.
48Size of Audio Files
44,100 samples per second
x 2 bytes per sample (16 bits)
x 2 channels (for stereo audio)
x 60 seconds per minute
= 10,584,000 bytes, or roughly 10 MB/minute
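The same calculation in code, as a sketch for uncompressed audio with no file header counted. The function name and default values are illustrative; the defaults are the CD figures used above.

```python
def audio_bytes_per_minute(sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Storage needed for one minute of uncompressed audio."""
    return sample_rate * bytes_per_sample * channels * 60

size = audio_bytes_per_minute()
print(size, round(size / 1_000_000, 2))  # bytes, then megabytes
```

Changing any one factor scales the result proportionally: mono audio needs half the space, and a higher sampling rate or bit depth needs correspondingly more.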