MPEG 4 Structured Audio: Algorithmic Sound for the Internet and Beyond

1
MPEG 4 Structured Audio

Algorithmic Sound
for the Internet and Beyond

John Lazzaro, John Wawrzynek
Sep 1, 1999
CS Division, University of California at Berkeley
www.cs.berkeley.edu/johnw
2
MPEG 4 Structured Audio

Outline
Motivation for structured audio
Introduction to MP4-SA
Example encoding
C translator
Physical Instrument Modeling
Hardware Architectures
Future directions

3
Digital Audio Basics

Uncompressed CD-quality audio (16-bit samples at 44.1 kHz), mono: 705.6 kbps
Cell-phone network: 5-10 kbps
Dialup modems: 50 kbps
xDSL: 128 to 1000 kbps

How well does compression work?
True lossless: 2.5X reduction
Shorten, T. Robinson (Cambridge University)
Perceptually lossless: 10X-20X reduction
MP3, Dolby AC3
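The rates above are simple arithmetic: samples per second times bits per sample times channels, divided by a coder's reduction ratio. A minimal C sketch, assuming 16-bit PCM at 44.1 kHz (the CD format, which gives the 705.6 kbps mono figure); the helper names pcm_kbps and compressed_kbps are hypothetical.

```c
#include <assert.h>

/* Uncompressed PCM bit rate in kbps: samples/s * bits/sample * channels. */
static double pcm_kbps(double rate_hz, int bits, int channels)
{
    assert(rate_hz > 0 && bits > 0 && channels > 0);
    return rate_hz * bits * channels / 1000.0;
}

/* Bit rate after a coder with the given reduction ratio (2.5 = 2.5X). */
static double compressed_kbps(double kbps, double ratio)
{
    assert(ratio > 0);
    return kbps / ratio;
}
```

For example, pcm_kbps(44100, 16, 1) is 705.6; a 2.5X lossless coder still needs 282.24 kbps, far above a 50 kbps modem, while a 20X perceptual coder brings mono audio down to 35.28 kbps.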

4
The Kolmogorov alternative

Write a computer program that generates the
desired audio stream.
Transmit the computer program.
To decode, execute the program.

Similar to PostScript!

MPEG-4 Structured Audio (MP4-SA) uses this
approach.
Final draft standard: Nov 15, 1998.
Eric Scheirer, Editor (MIT Media Lab).
http://sound.media.mit.edu/eds/mpeg4/
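The Kolmogorov idea in miniature: transmit a few parameters, and let the decoder run code that expands them into samples. This C sketch uses a hypothetical ToneProgram struct describing a square wave as a stand-in for a real MP4-SA program. It illustrates the principle only, not the MP4-SA bitstream.

```c
#include <assert.h>

/* Hypothetical "program": three numbers that fully describe a square wave. */
typedef struct {
    int   period;     /* samples per cycle (even) */
    short amplitude;  /* peak 16-bit sample value */
    long  length;     /* number of samples to generate */
} ToneProgram;

/* Decoding = executing the program. Returns the number of samples written. */
static long decode_tone(const ToneProgram *p, short *out, long max)
{
    assert(p->period > 1 && p->period % 2 == 0);
    long n = p->length < max ? p->length : max;
    for (long i = 0; i < n; i++)
        out[i] = (i % p->period) < p->period / 2 ? p->amplitude
                                                  : (short)-p->amplitude;
    return n;
}
```

With period 100, amplitude 8000, and length 44100, one second of 16-bit audio (88,200 bytes) is regenerated from a struct of a dozen or so bytes.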

5
MP4-SA Encoding

Encoding may be a creative act: writing a program
directly (emacs), or
indirectly (GUI, webpage).
In this case, MP4-SA is a lossless compressor.
Encoding may be automatic: given a sound, an encoder
writes a program that generates the sound.
Automatic encoding is a hard problem in the
general case.

6
Key Application: Music Production

Modern music production is computer-based.
Musicians enter performances into computers as
control information, not audio waveforms.
Digital synthesizers, effects, and mixes create
the final audio, under engineer/producer control.

7
Key Application: Music Production

Ideal format for collaborative productions,
remixes, ...
8
MPEG 4 Structured Audio

A binary file format that encodes:
Programs in the SAOL language (pronounced "sail").
Scores in the musical score language SASL.
Legacy support for MIDI.
Audio sample data.
The result is normative: an MP4-SA file will sound
identical on all compliant decoders.
This is different from MIDI files, whose sound depends on the playback synthesizer.

9
MPEG 4 Standard
Diagram: MPEG 4 comprises audio, system, and video. Audio is split into Natural coding (AAC, T/F, CELP, Parametric) and Synthetic coding (SA, TTS).
Structured Audio: one component of the MPEG-4
audio standard.
ISO/IEC 14496-3, Section 5
10
MPEG 4 Standard
Advanced Audio Coding (AAC): successor to MP3; delivers
the highest-quality audio, at the highest bit rate.
11
MPEG 4 Standard
Time-Frequency (T/F) coding: meant for a moderate
bit rate range, with moderate quality.
12
MPEG 4 Standard
Code Excited Linear Prediction (CELP): low-bit-rate
coder; works best as a speech coder.
13
MPEG 4 Standard
Parametric coders: very-low-bit-rate coders; work
best as speech coders.
14
MPEG 4 Standard
Text-to-Speech (TTS): takes phonetic and prosodic
control information and produces synthesized
speech.
15
MPEG 4 Standard
The system level includes mechanisms for composing
and synchronizing audio (and video) components.
16
Why SAOL and MP4-SA? Why not Java?

Musical performances have temporal structure that
changes over several timescales.

Writing sound generation code in a conventional
language results in code dominated by time-scale
management.
Hard to maintain, hard to optimize.

17
Time management is built into SAOL.

A SAOL program executes by moving a simulated
clock forward in time, performing calculations
along the way in a synchronous fashion.
Work is scheduled to happen
at the a-rate (the audio sample rate)
at the k-rate (envelope control rate)
at the i-rate (rate for new notes)
Language variables are typed as a/k/i-rate.
A language statement is scheduled based on the
rate of the variables it contains.
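The scheduling rule can be read as "a statement runs at the fastest rate of any variable it mentions." A minimal C sketch of that rule, assuming a hypothetical representation in which each statement is just the list of its variables' rates; a real SAOL compiler works on a parse tree and has further rules not modeled here.

```c
#include <assert.h>

/* Ordered so that a larger value means a faster rate. */
typedef enum { RATE_I = 0, RATE_K = 1, RATE_A = 2 } Rate;

/* A statement runs at the fastest rate among the variables it mentions;
   a statement with no variables (constants only) defaults to i-rate. */
static Rate statement_rate(const Rate *vars, int count)
{
    assert(count >= 0);
    Rate r = RATE_I;
    for (int i = 0; i < count; i++)
        if (vars[i] > r)
            r = vars[i];
    return r;
}
```

Applied to the tone instrument: a = 2*sin(...) mentions only i-rate names, so it runs once per note; env = kline(...) assigns the ksig env, so it runs once per k-pass; x = x - a*y mentions asig variables, so it runs once per a-pass.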

18
SAOL, SASL, and Scheduling

Sound creation in MP4-SA can be compared to a
musician playing notes on an instrument.
A SAOL subprogram (called an instr or instrument)
serves as the instrument.
SASL commands (called score lines) act to play
notes on SAOL instruments.
Many instances of a SAOL instr can be active at
one time, making sounds corresponding to notes
launched by different score lines in a SASL file.
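Each score line gets its own copy of the instrument's state, so overlapping notes on the same instr are independent. A C sketch of that bookkeeping with a fixed pool of slots; the Instance and Orchestra types, the 16-slot limit, and the retire-on-query policy are all assumptions for illustration.

```c
#include <assert.h>

#define MAX_INSTANCES 16

/* Hypothetical per-note state: one copy per score line. */
typedef struct {
    int    active;
    double start, dur;   /* seconds, taken from the score line */
    double note;         /* instrument parameter from the score line */
} Instance;

typedef struct { Instance slot[MAX_INSTANCES]; } Orchestra;

/* Launch an instance for a score line; returns its slot, or -1 if full. */
static int launch(Orchestra *o, double start, double dur, double note)
{
    assert(dur > 0.0);
    for (int i = 0; i < MAX_INSTANCES; i++)
        if (!o->slot[i].active) {
            o->slot[i] = (Instance){1, start, dur, note};
            return i;
        }
    return -1;
}

/* Retire notes that have ended by time t; count the ones sounding at t. */
static int active_at(Orchestra *o, double t)
{
    int n = 0;
    for (int i = 0; i < MAX_INSTANCES; i++) {
        Instance *s = &o->slot[i];
        if (!s->active)
            continue;
        if (t >= s->start + s->dur)
            s->active = 0;       /* the note is over: free the slot */
        else if (t >= s->start)
            n++;                 /* launched and still sounding */
    }
    return n;
}
```

Two score lines that overlap in time produce two live instances; once the first note ends, its slot can be reused by a later score line.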

19
Single Note Execution Trace

SAOL instruments contain all the instructions for playing a note:
-- Code that runs at note launch (once per i-pass).
-- Code that models timbre evolution at the k-rate (once per k-pass).
-- Code to generate audio samples at the a-rate (once per a-pass).

Executing a note (k-rate 4 kHz, a-rate 40 kHz):
time (µs)  pass
0          i-pass
0          k-pass
0          a-pass
25         a-pass
50         a-pass
...
225        a-pass
250        k-pass
250        a-pass
275        a-pass
300        a-pass
...
475        a-pass
500        k-pass
500        a-pass
525        a-pass
...
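The trace above follows mechanically from the two rates: one i-pass at launch, an a-pass every 1/(40 kHz) = 25 µs, and a k-pass every 1/(4 kHz) = 250 µs (every tenth a-pass), with the k-pass ahead of the a-pass at shared instants. A C sketch that regenerates the trace; the function and type names are hypothetical, and it assumes the audio rate divides 1 MHz so that periods are whole microseconds.

```c
#include <assert.h>

typedef enum { PASS_I, PASS_K, PASS_A } Pass;
typedef struct { long time_us; Pass pass; } Event;

/* Fill ev[] with the passes for one note launched at t = 0 and running
   until end_us; returns the number of events written. */
static int schedule(long k_rate_hz, long a_rate_hz, long end_us, Event *ev, int max)
{
    assert(1000000 % a_rate_hz == 0 && a_rate_hz % k_rate_hz == 0);
    long a_period = 1000000 / a_rate_hz;   /* 25 us at 40 kHz */
    long per_k = a_rate_hz / k_rate_hz;    /* a-passes per k-pass: 10 here */
    int n = 0;
    if (n < max) ev[n++] = (Event){0, PASS_I};          /* note launch */
    for (long i = 0; i * a_period < end_us; i++) {
        long t = i * a_period;
        if (i % per_k == 0 && n < max) ev[n++] = (Event){t, PASS_K};
        if (n < max) ev[n++] = (Event){t, PASS_A};
    }
    return n;
}
```

Calling schedule(4000, 40000, 550, ev, 64) yields 26 events: the i-pass, k-passes at 0, 250, and 500 µs, and a-passes every 25 µs from 0 to 525 µs.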
20
An example

The SAOL instrument tone plays a gated sine
wave (SAOL code on the next slide).

21
SAOL code for tone
instr tone (note, loudness) {
  ivar a;            // sets osc f
  ksig env;          // env output
  asig x, y;         // osc state
  asig init;

  a = 2*sin(3.1415927*cpsmidi(note)/s_rate);
  env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0);
  if (init == 0) {   // first a-pass only
    x = loudness;
    init = 1;
  }
  x = x - a*y;       // the FLOPS happen in
  y = y + a*x;       // these 3 statements
  output(y*env);     // creates audio output
}                    // end of instr tone
22
SAOL code for tone
i-rate statement: a = 2*sin(3.1415927*cpsmidi(note)/s_rate); runs once per note, during the i-pass.
23
SAOL code for tone
k-rate statement: env = kline(0, 0.1, 0.5, dur-0.2, 0.5, 0.1, 0); runs once per k-pass.
24
SAOL code for tone
a-rate statements: the if (init == 0) block and the statements x = x - a*y, y = y + a*x, and output(y*env); they run once per a-pass.
25
SAOL Unique Features

Rate semantics
i/k/a-rate execution
Vector arithmetic
e.g., A = B + C is equivalent to: for i = 1..n, A[i] = B[i] + C[i]
All floating-point arithmetic.
Extensive built-in audio function library:
signal generators, table operators, pitch
converters, filters, fft, sample rate conversion,
effects, ...
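A vector statement is shorthand for an element-wise loop. A C sketch of the loop a translator could emit for A = B + C; the name vec_add is hypothetical, and single-precision float reflects the all-floating-point rule.

```c
#include <assert.h>
#include <stddef.h>

/* SAOL-style vector statement A = B + C, written as the loop it stands for. */
static void vec_add(float *A, const float *B, const float *C, size_t n)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        A[i] = B[i] + C[i];
}
```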

26
SAOL Unique Features

Instrument communication through bus structures
Dynamic instrument creation and control.
Scheduler and language support for MIDI and SASL
scores.
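Bus communication lets several instruments sum into a shared signal that an effects instrument then processes. A simplified C sketch with a hypothetical Bus type and a gain "effect"; it does not reproduce SAOL's actual bus-routing statements, only the data flow.

```c
#include <assert.h>
#include <string.h>

#define BLOCK 4   /* samples per illustrative block */

/* Hypothetical mixing bus shared by several instruments. */
typedef struct { float buf[BLOCK]; } Bus;

static void bus_clear(Bus *b) { memset(b->buf, 0, sizeof b->buf); }

/* An instrument adds its output into the bus (mixing). */
static void bus_write(Bus *b, const float *src)
{
    for (int i = 0; i < BLOCK; i++)
        b->buf[i] += src[i];
}

/* A trivial effects instrument: reads the summed bus and applies a gain. */
static void effect_gain(const Bus *b, float gain, float *out)
{
    assert(out != NULL);
    for (int i = 0; i < BLOCK; i++)
        out[i] = gain * b->buf[i];
}
```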

27
Sfront - a SAOL-to-C translator

Converts MP4-SA files to a C program that, when
executed, produces audio.

Runs on UNIX, Win98/NT.
Licensed under the GNU General Public License (GPL).
www.cs.berkeley.edu/lazzaro/sa

28
Sfront Benchmarks
Sfront version 0.36; machine: 450 MHz Pentium III, 128 MB;
gcc version egcs-2.91.66, -O3 optimizer; audio sample rate
44.1 kHz for all examples; MP3 compression ratio 11.
29
Sfront Performance Summary

Rendering (file decoding):
Current performance: a benchmark suite of
moderately complex MP4-SA streams computes in a
time equivalent to the duration of the audio it generates, on a
400 MHz UltraSPARC or a 450 MHz Pentium.
Real-time interaction:
with a MIDI keyboard at acceptable latency (20
ms), and with microphone input.
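Both performance claims reduce to small calculations: a 20 ms latency budget at 44.1 kHz is 882 samples of total buffering, and rendering keeps up when compute time does not exceed audio duration (a real-time factor of at most 1.0). A C sketch with hypothetical helper names, assuming the 44.1 kHz rate used in the benchmarks.

```c
#include <assert.h>

/* Samples of buffering that fit in a latency budget: rate * latency. */
static long latency_samples(double rate_hz, double latency_ms)
{
    assert(rate_hz > 0 && latency_ms >= 0);
    return (long)(rate_hz * latency_ms / 1000.0);
}

/* Real-time factor: compute time / audio duration; <= 1.0 keeps up. */
static double realtime_factor(double compute_s, double audio_s)
{
    assert(audio_s > 0);
    return compute_s / audio_s;
}
```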

30
Interesting Issues

MP4-SA puts emphasis on sound synthesis methods
that can be described in a small amount of space.
Physical modeling: good
Sampling natural instruments: bad
If models are chosen carefully, compression
ratios of 100 to 10,000 are possible.
Physical Modeling is relatively immature, but
holds much promise.

31
Struck/Plucked Instrument Model
Examples: struck bars, bells, drums, plucked
strings
Parameters: striker characteristics, resonator
constants
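The struck/plucked structure (an excitation feeding a resonator) can be illustrated with the textbook Karplus-Strong plucked string, swapped in here because the slide's own model diagram did not survive. In this C sketch the noise burst stands in for the striker, and the delay length and loss factor are the resonator constants; the names and the LCG noise source are assumptions.

```c
#include <assert.h>

#define MAX_DELAY 1024

/* Hypothetical plucked-string state: a delay line acting as the resonator. */
typedef struct {
    float line[MAX_DELAY];
    int   len, pos;               /* resonator length sets the pitch */
    float decay;                  /* extra loss per trip around the string */
} PluckedString;

/* Excite the string: fill the delay line with a noise burst (the "striker").
   A fixed LCG keeps the output deterministic. */
static void pluck(PluckedString *s, int len, float decay, unsigned seed)
{
    assert(len > 1 && len <= MAX_DELAY && decay > 0.0f && decay < 1.0f);
    s->len = len;
    s->pos = 0;
    s->decay = decay;
    for (int i = 0; i < len; i++) {
        seed = seed * 1664525u + 1013904223u;
        s->line[i] = (float)(seed >> 8) / 8388608.0f - 1.0f;   /* [-1, 1) */
    }
}

/* One output sample; the two-point average is a lowpass loss filter. */
static float string_tick(PluckedString *s)
{
    int next = (s->pos + 1) % s->len;
    float out = s->line[s->pos];
    s->line[s->pos] = s->decay * 0.5f * (out + s->line[next]);
    s->pos = next;
    return out;
}
```

Because each pass through the loop averages and scales by decay, the string's amplitude can only shrink, which gives the characteristic plucked decay.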
32
Blown Instrument Model
Examples: pipes, flutes, etc.
Parameters: shape of non-linear function,
resonator constants
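A blown instrument replaces the one-shot excitation with a sustained one passed through a non-linear function. A C sketch in the style of the classic reed-and-bore waveguide clarinet (not the model on the slide): reed_offset and reed_slope set the shape of the clipped non-linear reed function, while the bore length and loss are the resonator constants. All names and values are assumptions.

```c
#include <assert.h>

#define BORE_MAX 512

/* Hypothetical reed-and-bore state. */
typedef struct {
    float bore[BORE_MAX];           /* delay line: round trip down the bore */
    int   len, pos;                 /* resonator constant: length sets pitch */
    float lp;                       /* lowpass state (bell losses) */
    float loss;                     /* resonator constant: reflection loss */
    float reed_offset, reed_slope;  /* shape of the non-linear function */
} Blown;

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One sample: reflected wave -> lowpass -> reed non-linearity -> bore. */
static float blown_tick(Blown *b, float breath)
{
    assert(b->len > 0 && b->len <= BORE_MAX);
    float back = b->bore[b->pos];                 /* wave returning to the reed */
    b->lp = 0.5f * (back + b->lp);                /* simple averaging lowpass */
    float dp = -b->loss * b->lp - breath;         /* pressure across the reed */
    float r = clampf(b->reed_offset + b->reed_slope * dp, -1.0f, 1.0f);
    b->bore[b->pos] = breath + dp * r;            /* wave sent back down the bore */
    b->pos = (b->pos + 1) % b->len;
    return back;
}
```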
33
Physical Modeling Summary

Models the instrument, not the sound.
Advantages over traditional synthesis techniques
(FM, sample-based):
Compact descriptions.
Physical parameterization leads to:
more intuitive control
lower control bandwidth
State-accurate simulation leads to:
efficiency in re-excitation
emulation of otherwise missing effects
Ultimately, more realistic sounds.

34
Physical Modeling Summary (cont.)

Disadvantages
potential for high computational complexity
Approaches
PDE (partial differential equation) approach:
would be nice, but probably not practical.
ODE (ordinary differential equation, lumped
circuit models): practical and very general.
Capture essential physics.
Waveguide filters provide a more efficient
alternative in some cases.
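A lumped ODE model describes each vibrating mode as a mass-spring-damper, x'' = -(k/m) x - (c/m) x', integrated one step per sample. A C sketch of a single mode using semi-implicit (symplectic) Euler; the Mode type and parameter values are hypothetical.

```c
#include <assert.h>

/* Hypothetical single mode: x'' = -w2 * x - damp * x'
   (w2 = k/m, damp = c/m), one integration step per audio sample. */
typedef struct {
    double x, v;       /* displacement and velocity (the model state) */
    double w2, damp;   /* physical parameters */
    double dt;         /* time step = 1 / sample rate */
} Mode;

/* Semi-implicit Euler: update velocity first, then position with the new
   velocity; stable for light damping while sqrt(w2) * dt < 2. */
static double mode_tick(Mode *m)
{
    assert(m->dt > 0.0);
    m->v += m->dt * (-m->w2 * m->x - m->damp * m->v);
    m->x += m->dt * m->v;
    return m->x;
}
```

With w2 = (2π·440)², damp = 20 per second, and dt = 1/44100, the mode rings at roughly 440 Hz and its amplitude falls as e^(-10 t). Because the state (x, v) is kept, re-exciting the mode mid-note is just a change to v.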

35
Interesting Issues (cont.)

MP4-SA specifies that a decoder produce audio
that sounds identical to an accurate computation
of the program.
A new role for psychophysics:
Instead of using psychophysics to squeeze bits
out of a sound representation, MP4-SA decoders
will use psychophysics to squeeze FLOPS out of
sound computations.
Leverage spectral and temporal masking.
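One way a decoder might trade psychophysics for FLOPS is to skip the a-rate work of voices that would be inaudible next to louder ones. The C sketch below uses a fixed level ratio as a crude stand-in for a real spectral or temporal masking model; the function name and threshold are hypothetical, and the decision is assumed to be made once per k-pass.

```c
#include <assert.h>

/* Mark which voices get their a-rate code run this k-pass: a voice is
   skipped when its envelope is below threshold_ratio times the loudest.
   Returns how many voices still need computing. */
static int voices_to_compute(const float *env, int count, float threshold_ratio,
                             int *compute)
{
    assert(threshold_ratio >= 0.0f && threshold_ratio < 1.0f);
    float loudest = 0.0f;
    for (int i = 0; i < count; i++)
        if (env[i] > loudest) loudest = env[i];
    int n = 0;
    for (int i = 0; i < count; i++) {
        compute[i] = env[i] > threshold_ratio * loudest;   /* audible enough */
        n += compute[i];
    }
    return n;
}
```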

36
Interesting Issues (cont.)

MP4-SA can be used in a way similar to
traditional compression, except that the
compression method can be ad hoc:
Framework for experimentation in encoding.
Hope for automatic encoding, if done in a
voice-specific way:
vocals
guitar
sax
and other hard-to-synthesize sounds.
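Automatic encoding becomes tractable when the encoder knows the voice: it only has to estimate that voice's parameters. A toy C sketch for a "voice" whose program needs only a period, estimated as the lag that maximizes the autocorrelation; real instruments such as vocals or sax need far richer models, and the function name and search range are hypothetical.

```c
#include <assert.h>

/* Estimate the period of x[0..n) as the lag in [min_lag, max_lag]
   with the largest (unnormalized) autocorrelation. */
static int estimate_period(const short *x, int n, int min_lag, int max_lag)
{
    assert(min_lag > 0 && min_lag <= max_lag && max_lag < n);
    int best = min_lag;
    long long best_r = 0;
    for (int lag = min_lag; lag <= max_lag; lag++) {
        long long r = 0;
        for (int i = 0; i + lag < n; i++)
            r += (long long)x[i] * x[i + lag];
        if (lag == min_lag || r > best_r) {
            best_r = r;
            best = lag;
        }
    }
    return best;
}
```

The recovered period is the whole "program" for this voice; the decoder regenerates the tone from it.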

37
Running SAOL on Conventional Architectures

Lessons learned from SAOL development:
Temporal typing of variables has the nice side
effect of marking the inner loops.
Typically, the a-rate is 10X to 100X the k-rate.
A-rate code optimization: moving subexpressions
into k-rate or i-rate code.
SAOL semantics support a static heap.
No recursion, all variables single-precision floats, no
pointers: this simplifies optimization.
Other researchers (Giorgio Zoia, ETH) are focusing
on blocking all a-passes for an instance,
reducing overhead.
Processors with SIMD FP support (Intel SSE, AMD
3DNow!) will be a good match.
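Moving a subexpression from a-rate to k-rate is loop-invariant code motion across the rate boundary. A C sketch of the before-and-after, assuming 10 a-passes per k-pass and hypothetical function names; both versions compute the same output, but the optimized one does the k-rate multiply once per k-pass instead of once per sample.

```c
#include <assert.h>

#define A_PER_K 10   /* assumed a-passes per k-pass */

/* Before: the k-rate product gain * env[k] is recomputed every a-pass. */
static void scale_naive(const float *in, float *out, int nk,
                        const float *env, float gain)
{
    assert(nk >= 0);
    for (int k = 0; k < nk; k++)
        for (int j = 0; j < A_PER_K; j++)
            out[k * A_PER_K + j] = in[k * A_PER_K + j] * (gain * env[k]);
}

/* After: the product depends only on k-rate values, so it is hoisted
   out of the a-rate loop and computed once per k-pass. */
static void scale_hoisted(const float *in, float *out, int nk,
                          const float *env, float gain)
{
    assert(nk >= 0);
    for (int k = 0; k < nk; k++) {
        const float g = gain * env[k];                       /* k-rate work */
        for (int j = 0; j < A_PER_K; j++)
            out[k * A_PER_K + j] = in[k * A_PER_K + j] * g;  /* a-rate work */
    }
}
```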

38
Fixed-Function Hardware for SAOL Accelerators

Unlike MPEG-2 chips, DVD chips, etc., it's not
clear how MP4-SA can be accelerated by rolling an
ASIC.
Since every MP4-SA file is a new algorithm.
Common opcodes can be hardwired and the general
characteristics of typical MP4-SA files could be
leveraged to specialize a conventional processor
design.
But the language is only six months old:
execution frequencies are not yet known.
Reconfigurable computing architectures might hold
promise (however, MP4-SA is all floating point).

39
Directions / Research Opportunities

Compiler optimizations for
SAOL and other languages with rate semantics
high-performance SIMD architectures
runtime code specialization
Runtime scheduling under limited compute
resources.
SAOL programming environments.
Physical modeling.
Automatic encoding.
