Title: Music Structure Discovery and Transcription of Real-World Music
1. Music Structure Discovery and Transcription of Real-World Music
- Robert Turetsky
- rob_at_ee.columbia.edu
- Admitted Students Open House
- April 26, 2003
2. Technology enables liquid music
Production
Consumption
Distribution
3. Levels of Abstraction: Musical Structure
- Musical structure guides our expectations based
on previous experience
4. Structure Discovery: Motivation
- Recommendation engines and artist discovery
- Query by segment/prototype (without metadata)
- Machine feedback/collaboration in composition
- Custom-tailored playlists / Auto DJ
- Improved audio feature extraction algorithm performance (e.g., pitch extraction)
- Machine Listener: predict changes in music just like we do!
5. Structure: Why is it so tough to find? Char/Word/Phrase Boundaries
Text
Video
Audio?
6. Structure: Why is it so tough to find? Signals to Semantics
7. Tool of the Trade: Similarity Matrix
- Pioneered by Foote, 2001
- Measure the self-similarity of every window in a song with every other window
- Theory: Windows of the same section will have similar features. Windows of different sections will have different features.
- Off-diagonal lines correspond to repeated sections
- Novelty Score: a measure of newness, computed by correlation with a checkerboard matrix.
- Section breaks are peaks in the Novelty Score.
[Figure: similarity matrix with axes i and j and entries cos(i, j), shown with its Novelty Score]
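The similarity matrix and Novelty Score described above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: the function names, the `half_width` parameter, and the zero padding at the edges are assumptions made for the example, and any per-window feature vectors (for instance spectral frames) could be passed in as `features`.

```python
import numpy as np

def self_similarity(features):
    """Cosine similarity between every pair of windows.

    features: array of shape (n_windows, n_dims), one row per window.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero windows
    return unit @ unit.T

def checkerboard_kernel(half_width):
    """Square kernel: +1 on the two diagonal quadrants, -1 on the other two."""
    signs = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(signs, signs)

def novelty_score(ssm, half_width):
    """Correlate the checkerboard kernel along the main diagonal of the matrix.

    score[t] compares the half_width windows before t with the half_width
    windows starting at t. Edges are zero-padded (an assumption of this sketch).
    """
    n = ssm.shape[0]
    kernel = checkerboard_kernel(half_width)
    padded = np.pad(ssm, half_width, mode="constant")
    score = np.zeros(n)
    for t in range(n):
        block = padded[t:t + 2 * half_width, t:t + 2 * half_width]
        score[t] = np.sum(block * kernel)
    return score
```

For a song made of two internally homogeneous sections, the score is near zero inside each section and peaks at the window where the second section starts.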
8. Phrases: Mining the SS-Matrix
- Off-diagonals → repeated segments
- Bartsch and Wakefield (2001)
- Assume: most repeated is most important
- Shift and blur the SS-matrix, look for vertical lines
- Dannenberg (2002): find the best path with DP along promising off-diagonals
Segmentation: The Cure, "Lovesong"
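The "shift and blur" idea can be illustrated with a time-lag view of the SS-matrix: shifting each row so that every diagonal becomes a column turns repeated segments into vertical lines, and averaging along time emphasizes long repeats. The sketch below is a simplified reading of that idea, not Bartsch and Wakefield's exact procedure; the function names and the `length` and `min_lag` parameters are assumptions.

```python
import numpy as np

def time_lag_matrix(ssm):
    """lag[t, l] = ssm[t, t - l] for l <= t, else 0.

    Rows are time, columns are lag, so each diagonal of the SS-matrix
    becomes a vertical line.
    """
    n = ssm.shape[0]
    lag = np.zeros((n, n), dtype=float)
    for t in range(n):
        for l in range(t + 1):
            lag[t, l] = ssm[t, t - l]
    return lag

def blur_along_time(lag, length):
    """Moving average of each lag column over `length` consecutive windows."""
    kernel = np.ones(length) / length
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, lag
    )

def strongest_repetition(ssm, length, min_lag):
    """Return (time, lag, strength) of the strongest blurred vertical line."""
    blurred = blur_along_time(time_lag_matrix(ssm), length)
    blurred[:, :min_lag] = 0.0  # ignore the main diagonal and near-self matches
    t, l = np.unravel_index(np.argmax(blurred), blurred.shape)
    return int(t), int(l), float(blurred[t, l])
```

With a song laid out as A B A, ten windows per section, the strongest line appears at a lag of twenty windows, which is the distance between the two A sections.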
9. Raw Audio vs. Transcriptions
Raw Audio
Score
10. Multi-Pitch Extraction: Modus Ponens
- Untrained listeners can recognize single pitches → design single-pitch recognizers based on the human auditory system
- Only experts can transcribe polyphonic audio, and experts recognize patterns → design a multi-pitch extractor in a pattern classification framework
11. Pitch Extraction: Graphical Models
- Model of polyphonic frame
- Fit model to data
- Search model-space using MCMC
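As a toy illustration of searching a model space with MCMC, the sketch below fits a set of active notes to one magnitude-spectrum frame using a Metropolis sampler. Everything in it is an assumption made for the example: the harmonic template (Gaussian bumps with 1/h amplitudes), the squared-error energy, the toggle proposal, and the temperature. It is not the graphical model from the slides.

```python
import numpy as np

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def harmonic_template(f0, freqs, n_harmonics=5, width_hz=5.0):
    """Toy note spectrum: Gaussian bumps at the harmonics of f0, amplitude 1/h."""
    spectrum = np.zeros_like(freqs, dtype=float)
    for h in range(1, n_harmonics + 1):
        spectrum += (1.0 / h) * np.exp(-0.5 * ((freqs - h * f0) / width_hz) ** 2)
    return spectrum

def mcmc_pitch_search(observed, freqs, candidates, n_iter=3000,
                      temperature=0.05, seed=0):
    """Metropolis search over subsets of candidate notes.

    Energy is the squared error between the observed frame and the sum of
    the templates of the active notes. Returns (best_notes, best_energy).
    """
    rng = np.random.default_rng(seed)
    templates = np.array([harmonic_template(midi_to_hz(n), freqs)
                          for n in candidates])

    def energy(active):
        model = templates[active].sum(axis=0)
        return float(np.sum((observed - model) ** 2))

    active = np.zeros(len(candidates), dtype=bool)
    current = energy(active)
    best, best_energy = active.copy(), current
    for _ in range(n_iter):
        proposal = active.copy()
        k = rng.integers(len(candidates))
        proposal[k] = not proposal[k]  # symmetric proposal: toggle one note
        proposed = energy(proposal)
        accept = proposed <= current or \
            rng.random() < np.exp((current - proposed) / temperature)
        if accept:
            active, current = proposal, proposed
            if current < best_energy:
                best, best_energy = active.copy(), current
    return [c for c, on in zip(candidates, best) if on], best_energy
```

On a synthetic frame built from the same templates, the sampler should settle on the notes that generated it; real audio would need a far richer frame model than this one.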
12. Applications: How to build a better pitch extraction algorithm
- Locally
- Idea: Exploit a large corpus of labeled raw audio. I.e., let the data do the talking!
- Train a classifier (e.g. neural net) with extracted notes of real-world audio mixtures
- Have a decent estimate of algorithm performance (N.E.R.: Note Error Rate)
- Use MMI/relative entropy to reduce feature vector dimensionality; one classifier per note
- Operates under the assumption that there are vastly more good extractions than bad ones.
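One plausible reading of the mutual-information step is to rank each feature dimension by how much it tells a per-note classifier about whether that note is present, and keep only the top few. The sketch below uses a simple histogram estimate; the function names, the bin count, and the binary present/absent labels are assumptions, not details taken from the slides.

```python
import numpy as np

def mutual_information(feature, labels, n_bins=8):
    """Histogram estimate of I(feature; label) in bits."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])  # bin indices 0 .. n_bins - 1
    classes = np.unique(labels)
    joint = np.zeros((n_bins, len(classes)))
    for c_idx, c in enumerate(classes):
        joint[:, c_idx] = np.bincount(binned[labels == c], minlength=n_bins)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over feature bins
    py = joint.sum(axis=0, keepdims=True)   # marginal over labels
    nonzero = joint > 0
    ratio = joint[nonzero] / (px @ py)[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))

def rank_features(X, labels, k):
    """Indices of the k feature columns with the highest estimated MI."""
    scores = np.array([mutual_information(X[:, d], labels)
                       for d in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores
```

Running `rank_features` once per note, with that note's labels, would give each per-note classifier its own reduced feature vector.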
13. MIDI Alignments: Methodology
[Diagram labels: Note Extraction; Timing (Ticks to Samples); Alignment (DTW); MIDI; Synthesis; Raw; Feature Calc; Estimated Transcription of real audio]
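Two steps named in the diagram lend themselves to a short sketch: converting MIDI ticks to audio sample positions, and aligning two feature sequences with dynamic time warping (DTW). The code assumes a constant tempo, Euclidean frame distance, and the basic three-step DTW recursion; the function names and these choices are illustrative assumptions rather than the method used in the slides.

```python
import numpy as np

def ticks_to_samples(ticks, ticks_per_beat, tempo_us_per_beat, sample_rate):
    """Convert a MIDI tick count to a sample index, assuming constant tempo."""
    seconds = ticks * (tempo_us_per_beat / 1e6) / ticks_per_beat
    return int(round(seconds * sample_rate))

def dtw_path(a, b):
    """Align feature sequences a (n, d) and b (m, d).

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    running from (0, 0) to (n - 1, m - 1).
    """
    n, m = len(a), len(b)
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end, always following the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        move = int(np.argmin((acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path
```

In this setting, `a` would hold features computed from the synthesized MIDI and `b` features from the raw recording; the path then maps each MIDI frame, and so each note, to a position in the real audio.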
14. Note Mapping: MIDI to Raw
15. Conclusion (Yay!)
- Structure Discovery is just the beginning
- We're at the beginning of the beginning
- To be continued