Title: MPEG-4 Structured Audio
1. MPEG-4 Structured Audio
- Eric D. Scheirer, eds_at_media.mit.edu
- Machine Listening Group, MIT Media Laboratory
- Editor, ISO 14496-3 (MPEG-4 Audio)
Project Bar-B-Q 1999, Guadalupe River Ranch, 15 Oct 1999
2. MPEG-4 Structured Audio, A New Standard for Interactive Sound, in the Creation of Which Tom White did not Run the Whole Show, but Only Played a Small (Though Valuable) Part
3. What's this all about?
- MPEG-4 is not just about compression
- MPEG-4 shows one way for the IA world to move beyond wavetable synthesis
4. Overview
- What is MPEG?
- What is MPEG-4 Structured Audio?
- Why was it created?
- How does it work?
- How can it be used in IA applications?
- What is its current status?
- A brief note on MPEG-4 AudioBIFS
5. Intellectual property in MPEG-4
- Structured Audio and AudioBIFS are free
- All patentable IP has been released to public domain
- No licensing or other costs to build tools and players
- (Standard itself costs 300 for printing/bureaucracy)
- SA and AudioBIFS are open standards
- Companies competing through cooperation
- Interoperability makes the whole pie bigger
- MPEG processes for improving/correcting standard
- MIT has no veto over the future of the standard
6. What is MPEG?
- MPEG is ISO/IEC JTC1 SC29 WG11
- A subcommittee of the Int'l Standards Organization
- The Moving Pictures Experts Group
- MPEG-1: 1993 (ISO 11172)
- Digital audio/video coding (MP3)
- MPEG-2: 1994-7 (ISO 13818)
- Digital coding for broadcast
- MPEG-4: 1998 (ISO 14496)
- Object based, synthetic/natural, interactive coding
7. MPEG Marketplace Model
[Diagram labels: MPEG Committee; MPEG Standard; Server-side tools makers; Client-side tools makers; Authoring tools; Playback tools; MPEG Content; Content developers; Content consumers]
8. MPEG Marketplace Model
[Same diagram as slide 7, with the added annotation "This talk"]
9. MPEG Marketplace Model
[Same diagram as slide 7, with the added annotation "The business opportunities"]
10. MPEG-4 Audio
- High-quality sound
- Based on MPEG-AAC algorithm: twice as good as MP3
- Low-bitrate sound
- For WWW and cellular: speech/music as low as 4 kbps
- Synthetic sound
- Interface to Text-to-Speech synthesizers
- High-quality audio synthesis with Structured Audio
- AudioBIFS
- Mix and postproduce multi-track sound streams
11. MPEG-4 Structured Audio
- Transmit structured description of sound
- Use real-time synthesis to play sound
- "PostScript for audio"
- Based on new (to MPEG) technology
- SAOL: New music synthesis language
- SASL: New music control format
- A lot of related technology in academia
- Csound, Music-11, SynthScript, Nyquist, CLM, ...
12. Standardization goals
- Provide synthetic sound in MPEG-4
- Bring algorithmic synthesis to wider community
- Standardize academic state-of-the-art: don't innovate
- Get new companies to work on synthesis
- Implementation required for full MPEG-4 system
- Set a higher bar for PC sound architecture
- Drive forward the world of sound on PCs!
[Slide labels: Stated goals; Secret goals]
13. MPEG-4 SA decoding process
[Diagram labels: SAOL Decoder; Bitstream header; Reconfigurable Synthesis Engine; Samples; SASL/MIDI Decoder; Bitstream; Control parameters; Multichannel high-quality audio]
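One possible reading of the decoding diagram, as a toy Python sketch: a header configures a synthesis engine once, and a stream of control events then drives it to produce samples. Everything here is an illustrative assumption rather than the standard's decoder: the names `NoteEvent`, `SynthesisEngine` and `sine_instrument` are hypothetical, the "header" is a plain dictionary of Python callables standing in for compiled SAOL instruments, and the output is mono rather than multichannel.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 8000  # Hz; illustrative value, kept low so the example runs fast


@dataclass
class NoteEvent:
    """Stand-in for one decoded SASL/MIDI control event (hypothetical)."""
    start: float      # seconds
    dur: float        # seconds
    instrument: str   # name of an instrument defined in the header
    pitch_hz: float
    amp: float


def sine_instrument(t, event):
    """Stand-in for an instrument that a SAOL decoder would build from the header."""
    return event.amp * math.sin(2 * math.pi * event.pitch_hz * t)


class SynthesisEngine:
    """Toy 'reconfigurable synthesis engine': configured once, then driven by events."""

    def __init__(self, header):
        # The header carries the instrument definitions (here: Python callables).
        self.instruments = dict(header)

    def render(self, events, total_seconds):
        n = int(total_seconds * SAMPLE_RATE)
        out = [0.0] * n
        for ev in events:
            instr = self.instruments[ev.instrument]
            first = int(ev.start * SAMPLE_RATE)
            last = min(n, int((ev.start + ev.dur) * SAMPLE_RATE))
            for i in range(first, last):
                t = (i - first) / SAMPLE_RATE   # time since note onset
                out[i] += instr(t, ev)          # overlapping notes are summed
        return out


header = {"sine": sine_instrument}
score = [NoteEvent(0.0, 0.5, "sine", 440.0, 0.3),
         NoteEvent(0.25, 0.5, "sine", 660.0, 0.3)]
engine = SynthesisEngine(header)
samples = engine.render(score, total_seconds=1.0)
```

The point of the split is that the header is sent once, while the event stream stays small because it only names instruments and parameters.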
14. What SAOL looks like
- A C-like language
- Based on the Music-N model
- Variables hold audio signals
- Unit generators do basic functions
- Instruments controlled by score or MIDI
instr beep(mp, vol) {
  asig wave;
  ksig env;
  table sig(harm, 2048, 1, 1);

  wave = oscil(sig, cpsmidi(mp));
  env = kline(0, dur*0.05, vol, dur*0.6, vol, dur*0.35, 0);
  output(wave * env);
}
SAOL: Structured Audio Orchestra Language
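To make the beep instrument concrete, here is a rough Python rendering of what it appears to compute: a wavetable holding two equal-amplitude harmonics, read at the note's frequency and shaped by a three-segment linear envelope (5% attack, 60% hold, 35% release). This is a sketch under assumptions, not SAOL semantics: the sample rate is made up, the table is not normalized, the lookup truncates instead of interpolating, and the envelope is evaluated per sample rather than at a separate control rate. The Python helpers only borrow the names `cpsmidi` and `kline`.

```python
import math

SAMPLE_RATE = 8000   # Hz; illustrative value, not taken from the slides
TABLE_SIZE = 2048


def cpsmidi(note):
    """MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def harm_table(size, *amps):
    """One cycle of a sum of harmonics; amps[k] scales harmonic k + 1."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * i / size)
                for k, a in enumerate(amps))
            for i in range(size)]


def kline(t, start, *segments):
    """Piecewise-linear envelope; segments is a flat run of (duration, target) pairs."""
    value = start
    for seg_dur, target in zip(segments[0::2], segments[1::2]):
        if t < seg_dur:
            return value + (target - value) * t / seg_dur
        t -= seg_dur
        value = target
    return value  # hold the final value once the segments run out


def beep(mp, vol, dur):
    table = harm_table(TABLE_SIZE, 1, 1)
    freq = cpsmidi(mp)
    out = []
    phase = 0.0
    for n in range(int(dur * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        wave = table[int(phase) % TABLE_SIZE]   # truncating table lookup
        env = kline(t, 0, dur * 0.05, vol, dur * 0.6, vol, dur * 0.35, 0)
        out.append(wave * env)
        phase += freq * TABLE_SIZE / SAMPLE_RATE
    return out


samples = beep(mp=69, vol=0.5, dur=1.0)
```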
15. SAOL capabilities
- Many nice features built in
- Wavetable manipulation
- FFT/IFFT
- Multitap delay lines
- Arrays of signals
- FIR and IIR filters
- Effects routing
- Granular synthesis
- 3-D audio interface
- Dynamic layering and triggering
- SAOL is extensible-from-within
- (Allows encapsulation and structured programming)
- Any kind of synthesis can be used in SAOL
16. Example
- Xanadu (Joseph Kung)
- 60 seconds long, 44 kHz stereo (10.5 MB as WAVE)
- 2.2 KB in header
- 4.2 KB in bitstream (about 0.07 KB/s, i.e. roughly 0.56 kbit/s)
- No samples anywhere, only algorithmic synthesis
- More than 1200:1 compression, no loss of quality
- Could be controlled/restructured interactively
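The figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes 16-bit PCM, a 44.1 kHz sample rate for the quoted "44 KHz", and 1 KB = 1000 bytes; none of these are stated on the slide.

```python
SECONDS = 60
SAMPLE_RATE = 44_100      # assumed: "44 KHz" read as 44.1 kHz
CHANNELS = 2
BYTES_PER_SAMPLE = 2      # assumed: 16-bit PCM

HEADER_KB = 2.2
BITSTREAM_KB = 4.2

# Size of the uncompressed stereo recording.
wave_bytes = SECONDS * SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE

# Size of the Structured Audio representation (header plus bitstream).
sa_bytes = (HEADER_KB + BITSTREAM_KB) * 1000   # assumed: 1 KB = 1000 bytes

# Streaming rate of the bitstream alone, in kilobytes and kilobits per second.
stream_kbytes_per_s = BITSTREAM_KB / SECONDS
stream_kbits_per_s = stream_kbytes_per_s * 8

ratio = wave_bytes / sa_bytes

print(f"WAVE size: {wave_bytes} bytes")
print(f"Bitstream rate: {stream_kbytes_per_s:.2f} KB/s = {stream_kbits_per_s:.2f} kbit/s")
print(f"Compression ratio: about {ratio:.0f}:1")
```

Under these assumptions the WAVE file is 10,584,000 bytes, the bitstream averages 0.07 kilobytes per second (0.56 kilobits per second), and the overall ratio is about 1650:1, consistent with the slide's "more than 1200:1".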
17. MPEG-MMA relationship
- MIDI can control MPEG-4 SA synth
- SASL more flexible, more tightly coupled
- DLS-2 synthesis embedded in SA synth
- Do wavetable in series or parallel with other techniques
- Wavetable-only profile of MPEG-4
- MIDI, DLS-2, compressed audio, video (no SAOL)
- Logical path of progression from today to tomorrow
- Lots of help from MMA - appreciated!
- MPEG is ready to help in the other direction (MIDI-DLA?)
18. Applications ideas
- MPEG-4 is not an application!
- It's a tool - enables functionality and interoperability
- Implementations could be hardware, software, both
- Authoring tools also very important
- Use MPEG-4 SA like Staccato Synthcore
- Use MPEG-4 SA like Beatnik
- Use MPEG-4 SA like Koan
- Use MPEG-4 SA for new music applications
19. Application example: Gaming
[Diagram labels: MPEG-4 algorithm and sample editors; MPEG-4 algorithm marketplace; MPEG-4 synthesis/effects algorithms; Startup; MPEG-4 enabled sound card; Host program (game); Runtime; MPEG-4 MIDI controls; Multichannel, 3-D, post-processed sound]
- Not just music -- parametric sound effects as well
- All audio programming and asset development in SAOL
- No host-language audio programming needed
- Host APIs (e.g. DirectMusic) can generate controls
- Embedded MPEG-4 side can do this too, if useful
20. Current status
- Standard and reference software finished
- Many implementation projects starting
- Creative Tech Center Compression Interactive Audio
- Studer EPFL ThreeDSpace project
- Hobbyist projects (Java API, ActiveX plugin)
- Others: Be Inc., Sseyo, King's College, UC Berkeley, Catholic U. Leuven, Q-Team DE, Nokia, ...
- 3 complete implementations already!
- A few authoring tools projects
- Active mailing list for developers
21. A brief note on AudioBIFS
- BIFS is scene-description part of MPEG-4
- Binary Format for Scenes
- Based on VRML, but with many new features
- AudioBIFS is the audio mixing part
- Stream audio in multitrack format
- Deliver mixdown instructions in AudioBIFS
- Mixing, spatialization, effects in SAOL, multichannel
- Terminal-adaptive capability
- Candidate for PC DSP architecture?
22. AudioBIFS - scene graph model
[Diagram, top to bottom:]
- Sound: attach sound to main scene (spatially position if desired)
- AudioBIFS manipulation: create sound object with AudioBIFS (mixing, filtering, reverb, etc.)
- AudioSource, AudioSource: inject sound into scene graph
- NaturalDecoder, SyntheticDecoder: decode into raw audio samples
- Streaming compressed audio, synthesis controls
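The layering in this scene graph can be sketched in Python as a small tree of objects, each pulling samples from the node below it. This is only an illustration of the structure shown on the slide: the classes are hypothetical stand-ins, not the BIFS node definitions, and the `Mix` node with per-child gains is an assumed example of an "AudioBIFS manipulation" step.

```python
class Decoder:
    """Stand-in for a natural or synthetic decoder that yields raw samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def decode(self):
        return self.samples


class AudioSource:
    """Injects one decoded stream into the audio part of the scene graph."""

    def __init__(self, decoder):
        self.decoder = decoder

    def render(self):
        return self.decoder.decode()


class Mix:
    """Hypothetical mixing node: weighted sum of its children."""

    def __init__(self, children, gains):
        self.children = children
        self.gains = gains

    def render(self):
        rendered = [child.render() for child in self.children]
        out = [0.0] * max(len(r) for r in rendered)
        for samples, gain in zip(rendered, self.gains):
            for i, s in enumerate(samples):
                out[i] += gain * s
        return out


class Sound:
    """Attaches the finished sound object to the main scene at a position."""

    def __init__(self, source, position=(0.0, 0.0, 0.0)):
        self.source = source
        self.position = position   # stored only; no spatialization is done here

    def render(self):
        return self.source.render()


natural = AudioSource(Decoder([0.5, 0.5, 0.5, 0.5]))
synthetic = AudioSource(Decoder([1.0, -1.0, 1.0]))
scene_sound = Sound(Mix([natural, synthetic], gains=[1.0, 0.5]),
                    position=(1.0, 0.0, -2.0))
mixed = scene_sound.render()
```

Here the natural and synthetic streams are decoded independently and only combined at the mixing node, which is what lets the mixdown instructions travel separately from the audio streams.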
23. Summary
- MPEG-4 Structured Audio
- The international standard for algorithmic sound synthesis
- MPEG-4 AudioBIFS
- The international standard for audio postproduction
- New market opportunities for
- Hardware/software MPEG-4 players (embedded or not)
- Authoring tools (editors, sequencers)
- Advanced interactive audio content
24. What was this all about?
- MPEG-4 is not just about compression
- MPEG-4 shows one way for the IA world to move beyond wavetable synthesis
25. For more information
- MPEG home page
- http://www.cselt.it/mpeg
- Requirements, future of MPEG
- MPEG-4 SA home page
- http://sound.media.mit.edu/mpeg4
- Draft standard, code, mailing lists, matchmaking
- Contact
- eds_at_media.mit.edu
- Slides, technical papers, discussion available