Title: Methods for Automatic Music Structure Discovery
1Methods for Automatic Music Structure Discovery
- Robert Turetsky
- rob_at_ee.columbia.edu
- IEEE NNJ SMC Society
- December 19, 2002
2Can computers compose music?
- A 'program' which could produce brilliant music
would have to wander around the world on its own,
fighting its way through the maze of life and
feeling every moment of it. It would have to
understand the joy and loneliness of a chilly
night wind, the longing for a cherished hand, the
inaccessibility of a distant town, the heartbreak
and regeneration after a human death.
-- Douglas Hofstadter, Gödel, Escher, Bach
3Talk Organization
- Music structure discovery: Motivation
- Tools of the Trade
- Discovery Techniques
- Liquid Music
4Technology enables liquid music
Production
Consumption
Distribution
5Structure Discovery: Motivation
- Recommendation engines and artist discovery
- Query by segment/prototype (without metadata)
- Machine feedback/collaboration in composition
- Custom-tailored playlists / Auto DJ
- Improved audio feature extraction algorithm performance (e.g. pitch extraction)
- Machine Listener: Predict changes in music just like we do!
6Levels of Abstraction: Musical Structure
- Musical structure guides our expectations based
on previous experience
7Structure: Why is it so tough to find? Char/Word/Phrase Boundaries
Text
Video
Audio?
8Structure: Why is it so tough to find? Signals to Semantics
9Automatic Methods for Music Structure Discovery
- Music structure discovery: Motivation
- Tools of the Trade
- Discovery Techniques
- Future Directions
10Tools of the Trade: MIDI
- MIDI: Musical Instrument Digital Interface
- Developed in 1983 as a protocol for synthesizers to communicate with and control each other
- MIDI files: information about the song (tempo, patch names) and notes (timing, velocity)
- Does not contain audio, just instructions to a synthesizer (real or virtual)
- MIDI transcriptions of many songs are available online (hobby, karaoke)
11Tools of the Trade: Similarity Matrix
- Pioneered by Foote, 2001
- Measure self-similarity of every window in a song with every other window
- Theory: Windows of the same section will have similar features. Windows of different sections will have dissimilar features.
- Off-diagonal lines correspond to repeated sections
- Novelty Score: measure of newness, computed by correlation with a checkerboard matrix
- Section breaks are peaks in the Novelty Score.
[Figure: similarity matrix with axes i and j and entries cos(i, j); Novelty Score plot]
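The similarity-matrix idea above can be sketched in a few lines. This is a minimal illustration, not Foote's implementation: it assumes each window is already summarized by a small feature vector, uses plain cosine similarity, and uses an unweighted checkerboard kernel. The names `cosine`, `self_similarity`, `novelty`, and `half_width` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def self_similarity(frames):
    """S[i][j] = cosine similarity of window i and window j."""
    return [[cosine(a, b) for b in frames] for a in frames]

def novelty(S, half_width):
    """Slide a checkerboard kernel along the main diagonal of S."""
    n = len(S)
    scores = []
    for t in range(n):
        total = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                i, j = t + di, t + dj
                if 0 <= i < n and 0 <= j < n:
                    # +1 where both windows lie on the same side of t,
                    # -1 where they straddle t (the cross quadrants).
                    sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                    total += sign * S[i][j]
        scores.append(total)
    return scores

# Toy "song" (made-up features): four windows of section A, then four of B.
frames = [[1.0, 0.1]] * 4 + [[0.1, 1.0]] * 4
S = self_similarity(frames)
scores = novelty(S, half_width=2)
boundary = max(range(len(scores)), key=scores.__getitem__)
print(boundary)  # index of the strongest section break
```

On this toy input the novelty peak falls at window 4, where section B begins. The kernel is simply truncated at the edges of the matrix, so scores near the start and end of a song should be read with care.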
12Tools: Dynamic Programming
- Search for lowest cost path
- Algorithm is O(n^2) instead of O(2^n)
- Useful for string matching problems, e.g. gene sequence matching, query-by-humming
Forward
Backwards
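As a rough illustration of the forward and backward passes, here is a sketch of edit distance between two strings, a standard dynamic programming problem. It is an example of the general technique, not code from the talk; the query-by-humming strings (U = up, D = down, R = repeat) are invented for the demo.

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, O(len(a) * len(b))."""
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i          # delete everything in a[:i]
    for j in range(cols):
        cost[0][j] = j          # insert everything in b[:j]

    # Forward pass: fill in the cheapest cost of reaching every cell.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j] + 1,        # deletion
                             cost[i][j - 1] + 1,        # insertion
                             cost[i - 1][j - 1] + sub)  # match/substitution

    # Backward pass: trace one lowest-cost path back to the origin.
    i, j = rows - 1, cols - 1
    path = [(i, j)]
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    path.reverse()
    return cost[-1][-1], path

# Hypothetical pitch-contour strings: a hummed query vs. a stored melody.
distance, path = edit_distance("UDUUR", "UDDUUR")
print(distance, path)
```

The table has (n+1) x (m+1) cells and each is filled in constant time, which is where the O(n^2) bound comes from; enumerating every possible alignment instead would grow exponentially.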
13Tools of the Trade: HMMs
14Tools of the Trade: Humdrum
- Given a musical score, what can we answer about a song or style?
- Humdrum: Music notation and programmer toolkit for musicologists
- Identify French 6th chords, double 7th scale degrees
- Are German drinking songs likely to be in triple meter?
- Identify pieces that end in a tierce de Picardie
- Compare pitch-class sets at beginning/end of slurs
- Compare frequency of dynamic swells vs. dynamic dips
- Adapted from http://www.music-cog.ohio-state.edu/Humdrum/sample.problems.html
15Raw Audio vs. Transcriptions
- Current research and systems use either:
- Raw audio data of performance (CD, .mp3)
- Transcription/Score: Symbolic representation of music (MIDI, Humdrum, pitch contour)
- Examples, Raw Audio (ISMIR 2002):
- Yang: MACSIS: Acoustic Indexing Framework for Music Retrieval
- Rauber et al.: Using Psycho-Acoustic Models and SOMs to Create a Hierarchical Structuring of Music
- Dannenberg: Pattern Discovery Techniques for Music
- Examples, Symbolic:
- Query-by-humming
- Humdrum
16Raw Audio vs. Transcriptions
Raw Audio
Score
17Automatic Methods for Music Structure Discovery
- Music structure discovery: Motivation
- Tools of the Trade
- Discovery Techniques
- Liquid Music
18Structure: Levels of Abstraction
- Phrase Segmentation
- Sequences Approach: SS-Matrix
- States Approach: Models
- Hybrid Technique
- Song
- Note
- Score
19Phrases: Mining the SS-Matrix
- Off-diagonals -> repeated segments
- Bartsch and Wakefield (2001)
- Assume: most repeated = most important
- Shift and blur SS-matrix, look for vertical lines
- Dannenberg (2002): Find best path with DP along promising off-diagonals
[Figure: Segmentation: Cure - Lovesong]
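To make the off-diagonal idea concrete, the sketch below scans each diagonal of a self-similarity matrix (one diagonal per time lag) and averages along it over a fixed segment length, a crude form of "blur". It is a simplified stand-in, not the Bartsch and Wakefield or Dannenberg algorithm; `find_repeat`, `seg_len`, and `min_lag` are hypothetical names, and the toy matrix is built from made-up section labels.

```python
def find_repeat(S, seg_len, min_lag):
    """Return (mean similarity, start, lag) of the strongest repeated segment.

    Windows start .. start+seg_len-1 are compared with the windows that
    occurred `lag` steps earlier, i.e. along one off-diagonal of S.
    """
    n = len(S)
    best = (float("-inf"), None, None)
    for lag in range(min_lag, n - seg_len + 1):
        for start in range(lag, n - seg_len + 1):
            mean = sum(S[t][t - lag] for t in range(start, start + seg_len)) / seg_len
            if mean > best[0]:
                best = (mean, start, lag)
    return best

# Toy song: the two-window phrase "AB" at windows 0-1 returns at windows 4-5.
labels = "ABCDABXY"
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
best = find_repeat(S, seg_len=2, min_lag=2)
print(best)  # (mean similarity, start window, lag)
```

Here the strongest off-diagonal run starts at window 4 with a lag of 4, meaning windows 4-5 repeat windows 0-1. A real system would use feature-based similarities and a more careful path search than this exhaustive scan.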
20Phrases: Model Based
- Logan, Chu (Compaq -> HP, 2000)
- Method 1: Clustering
- Join closest (KL) clusters until fixed threshold reached
- Method 2: HMM
- State transition matrix w/ unsupervised Baum-Welch
- Label each frame using Viterbi
- Gish Distance (1991), from speaker segmentation
- Feature: MFCC (instrumentation transitions)
- Two models: one speaker, two speakers
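The "label each frame using Viterbi" step can be illustrated with a small discrete HMM. This is only a sketch under strong assumptions: the two states, the quantized observation symbols ("q" for quiet, "l" for loud), and all probabilities are invented by hand, whereas the method above learns its parameters with Baum-Welch from MFCC features.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state sequence for obs under a discrete HMM (log domain)."""
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best previous state to arrive from when entering state s.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hand-set, hypothetical parameters (not learned from data).
states = ["verse", "chorus"]
log_start = logs({"verse": 0.5, "chorus": 0.5})
log_trans = {"verse": logs({"verse": 0.9, "chorus": 0.1}),
             "chorus": logs({"verse": 0.1, "chorus": 0.9})}
log_emit = {"verse": logs({"q": 0.8, "l": 0.2}),
            "chorus": logs({"q": 0.2, "l": 0.8})}

labels = viterbi("qqlqqllll", states, log_start, log_trans, log_emit)
print(labels)
```

With these numbers the isolated loud frame at index 2 stays labeled "verse", because switching states twice costs more than one unlikely emission; the label changes to "chorus" only at index 5, where the loud frames persist.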
21Phrase Detection: Hybrid Method
- Peeters et al. (IRCAM, ISMIR 2002)
- 2-stage approach
- 1st pass: Detect distinct variations within song
- 2nd pass: Group states (K-means) and choose appropriate model with HMM
22Structure: Levels of Abstraction
- Phrase Segmentation
- Song
- Genre Detection: Playola
- Artist Similarity: Community Metadata
- Note
- Score
23Playola: Music Similarity Browsing
- Ellis, Berenzweig (LabROSA), Lawrence (NEC), Whitman (MIT). Patent pending.
- Main Idea: Anchor Models
- Each genre has a model in feature-space
- Each song is characterized by its closeness to each genre's model
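A toy version of the anchor idea: describe a song by one number per genre, its closeness to that genre's model. The sketch assumes, purely for illustration, that each genre model is a single centroid in a 2-D feature space and that closeness is mean Euclidean distance; the slides do not say what model Playola actually uses. All names and values here are hypothetical.

```python
import math

def anchor_vector(song_frames, genre_models):
    """Mean distance from the song's frames to each genre centroid."""
    return {
        genre: sum(math.dist(frame, centroid) for frame in song_frames) / len(song_frames)
        for genre, centroid in genre_models.items()
    }

# Made-up genre centroids and song features.
genre_models = {"folk": (0.0, 0.0), "metal": (10.0, 10.0)}
song = [(1.0, 0.0), (0.0, 1.0)]

vec = anchor_vector(song, genre_models)
closest = min(vec, key=vec.get)
print(vec, closest)
```

Two songs can then be compared by the distance between their anchor vectors rather than between their raw features, which is what makes browsing by similarity possible.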
24Playola Web Interface
Artists similar to
Ani DiFranco
Rage Against the Machine
25Artist Similarity: Community Metadata
- Create correlation between human-entered semantic descriptions and signal
- Berenzweig, Whitman (MIT). Mine allmusicguide.com, reviews, weblogs
- Create models of descriptions
26Structure: Levels of Abstraction
- Phrase Segmentation
- Song
- Note
- Graphical Models
- MIDI Alignment
- Score
27Multi-Pitch Extraction: Modus Ponens
- Untrained listeners can recognize single pitches -> Design single-pitch recognizers based on the human auditory system
- Only experts can transcribe polyphonic audio; experts recognize patterns -> Design multi-pitch extractor in a pattern classification framework
28Pitch Extraction: Graphical Models
- Model of polyphonic frame
- Fit model to data
- Search model-space using MCMC
29Graphical Models: Problems
- Model Simplicity
- Manually keyed in by expert or statistician
- No human can enumerate all rules of music theory
- Mega-model
- One model for all styles of music
- Can take advantage of different features of
different styles
30Bridging the Gap: MIDI Alignments
- Hypothesis: By aligning MIDI transcriptions with the raw data of a performance, it is possible to create music models based on both transcription and raw audio data.
- Dataset:
- 35 MIDI transcriptions downloaded from various internet sites, esp. http://www.musicrobot.com
- Corresponding .mp3 files downloaded from the KaZaA filesharing network
- Different genres represented: New Wave, Rock, Classical, Dance, Pop
- All analysis done at 22.05 kHz, mono.
31MIDI Alignments: Methodology
Note Extraction
Timing: Ticks to Samples
Alignment: DTW
MIDI
Synthesis
Raw
Feature Calc
Estimated Transcription of real audio
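Two steps of this pipeline lend themselves to a short sketch: converting MIDI ticks to sample positions, and aligning two feature sequences with dynamic time warping (DTW). The code below is illustrative only. It assumes a single constant tempo (real files may contain tempo changes), a 22.05 kHz sample rate as stated in the dataset slide, and one scalar feature per frame; the function names and toy sequences are hypothetical.

```python
def ticks_to_samples(ticks, ticks_per_beat, tempo_us_per_beat=500_000, sample_rate=22_050):
    """Convert a MIDI tick count to a sample index, assuming constant tempo.

    tempo_us_per_beat is microseconds per quarter note (500000 = 120 BPM).
    """
    seconds = ticks * tempo_us_per_beat / (ticks_per_beat * 1_000_000)
    return round(seconds * sample_rate)

def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping: total cost and warping path between x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    # Forward pass: cheapest cumulative cost of aligning x[:i] with y[:j].
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(x[i - 1], y[j - 1]) + min(D[i - 1][j - 1],
                                                     D[i - 1][j],
                                                     D[i][j - 1])
    # Backward pass: follow the cheapest predecessors back to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path

# One quarter note at 480 ticks per beat and 120 BPM lasts half a second.
onset = ticks_to_samples(480, ticks_per_beat=480)

# Toy features: synthesized MIDI vs. a slower "performance" of the same shape.
midi_feat = [0, 0, 1, 1, 0]
audio_feat = [0, 0, 0, 1, 1, 1, 0, 0]
cost, path = dtw(midi_feat, audio_feat)
print(onset, cost, path)
```

Each pair (i, j) in the path says that frame i of the synthesized MIDI corresponds to frame j of the recording, which is how note timings from the transcription can be carried over to the raw audio.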
32Alignment Example: New Wave
33Note Mapping: MIDI to Raw
34Applications: How to build a better pitch extraction algorithm - Locally
- Idea: Exploit large corpus of labeled raw audio, i.e. let the data do the talking!
- Train classifier (e.g. neural net) with extracted notes of real-world audio mixtures
- Have decent estimate of algorithm performance (N.E.R. - Note Error Rate)
- Use MMI/relative entropy to reduce feature vector dimensionality; one classifier per note
- Operates under the assumption that there are vastly more good extractions than bad ones.
35Applications: How to build a better pitch extraction algorithm - Globally
- Graphical models for MPE: Godsill et al., 1999.
- Incorporate knowledge of musical structure at different levels to improve likelihood estimation
- Their model is currently very simple: frequencies don't change much within a short-time window.
- Our idea: Learn the graphical model based on both pitch information and spectral features.
- Different models for different genres to exploit unique characteristics: Dance -> repetitive at phrase level
- Use inherent musical structure for more information in ambiguous situations (can we recognize inversion?)
- Tune extractor for instruments present in genre (piano)
36Structure: Levels of Abstraction
- Phrase Segmentation
- Song
- Note
- Score
- Piano Roll -> Score Clustering
- Main Melody Extraction
37Automatic Methods for Music Structure Discovery
- Music structure discovery: Motivation
- Tools of the Trade
- Discovery Techniques
- Liquid Music
38Liquid Music: The future
- Music will accompany every aspect of our lives
- Any song ever made, any time, anywhere
- Playlists will be constantly refreshed
- Adaptivity: playlists/format seamlessly adapt to mood/location (transcoding)
- Kelly: Music will be a verb, not a noun.
- Fans can easily remix/interpret songs
- Linn: Stars will be born on the Internet
39You can't beat free, or can you? How to ensure sales of little plastic discs in the face of filesharing
- Offer things that cannot be downloaded
- High quality digital audio format
- Adaptive music, optimized for room/car/phones
- Password-protected access to special content (e.g. online communities)
- Large format artwork (e.g. posters)
- Licensed (official) remixes
- Individual tracks for remixing
40If you can't beat 'em, join 'em: How an industry of middlemen can make money through filesharing
- Targeted Advertisements
- Direct access to every individual's personal taste and preferences
- Create mapping between songs and demographic
- Personalized Radio model
- Labels as music finding scouts
- Filter what's hot from the garbage
- Subscription services integrate with playlists on each of the user's audio devices
41Special Advertising Section
42Conclusion (Yay!)
- Structure Discovery is just the beginning
- We're at the beginning of the beginning
- To be continued