Methods for Automatic Music Structure Discovery

1
Methods for Automatic Music Structure Discovery
  • Robert Turetsky
  • rob@ee.columbia.edu
  • IEEE NNJ SMC Society
  • December 19, 2002

2
Can computers compose music?
  • A 'program' which could produce brilliant music
    would have to wander around the world on its own,
    fighting its way through the maze of life and
    feeling every moment of it. It would have to
    understand the joy and loneliness of a chilly
    night wind, the longing for a cherished hand, the
    inaccessibility of a distant town, the heartbreak
    and regeneration after a human death.
  • - Douglas Hofstadter, Gödel, Escher, Bach

3
Talk Organization
  • Music structure discovery: Motivation
  • Tools of the Trade
  • Discovery Techniques
  • Liquid Music

4
Technology enables liquid music
  • Production
  • Consumption
  • Distribution

5
Structure Discovery: Motivation
  • Recommendation engines and artist discovery
  • Query by segment/prototype (without metadata)
  • Machine feedback/collaboration in composition
  • Custom-tailored playlists / Auto DJ
  • Improved audio feature extraction algorithm
    performance (e.g. pitch extraction)
  • Machine Listener: Predict changes in music just
    like we do!

6
Levels of Abstraction: Musical Structure
  • Musical structure guides our expectations based
    on previous experience

7
Structure: Why is it so tough to find?
Char/Word/Phrase Boundaries
  • Text
  • Video
  • Audio?

8
Structure: Why is it so tough to find?
Signals to Semantics
9
Automatic Methods for Music Structure Discovery
  • Music structure discovery: Motivation
  • Tools of the Trade
  • Discovery Techniques
  • Future Directions

10
Tools of the Trade: MIDI
  • MIDI: Musical Instrument Digital Interface
  • Developed in 1983 as protocol for synthesizers to
    communicate/control each other
  • MIDI files: information about song (tempo, patch
    names) and notes (timing, velocity)
  • Does not contain audio, just instructions to
    synthesizer (real or virtual)
  • MIDI transcriptions of many songs available
    online (hobby, karaoke)
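
To illustrate how the timing information in a MIDI file relates to audio, the sketch below converts a tick count into a sample index. It is only a minimal example under stated assumptions: a single constant tempo, and a hypothetical function name `ticks_to_samples`. The tempo in microseconds per quarter note and the resolution in ticks per quarter note are the standard MIDI file fields; the default values are illustrative.

```python
def ticks_to_samples(ticks, tempo_us_per_quarter=500_000,
                     ticks_per_quarter=480, sample_rate=22050):
    """Convert a MIDI tick count to an audio sample index.

    Assumes one constant tempo for the whole file (an assumption of this
    sketch; real files may contain tempo changes).
    """
    # One quarter note lasts tempo_us_per_quarter microseconds and spans
    # ticks_per_quarter ticks, so each tick has a fixed duration.
    seconds = ticks * tempo_us_per_quarter / (ticks_per_quarter * 1_000_000)
    return int(round(seconds * sample_rate))
```

With the defaults, one quarter note (480 ticks at 120 beats per minute) lasts half a second.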

11
Tools of the Trade: Similarity Matrix
  • Pioneered by Foote, 2001
  • Measure self similarity of every window in a song
    with every other window
  • Theory: Windows of same section will have similar
    features. Windows of different sections will
    have different features.
  • Off-diagonal lines correspond to repeated
    sections
  • Novelty Score: measure of newness, computed by
    correlation with a checkerboard matrix.
  • Section breaks are peaks in the Novelty Score.

[Figure: similarity matrix with axes i and j, entries cos(i, j), and the Novelty Score]
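
The similarity matrix and novelty score described on this slide can be sketched as follows. This is a minimal illustration in plain Python, not Foote's implementation: the function names, the kernel half-width `half`, and the choice to treat entries outside the matrix as zero are all assumptions of the sketch.

```python
import math

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def self_similarity(frames):
    """S[i][j] = cos(i, j) for every pair of windows in the song."""
    return [[cosine(u, v) for v in frames] for u in frames]

def novelty_score(S, half=4):
    """Slide a checkerboard kernel along the main diagonal of S.

    The kernel is +1 where both windows lie on the same side of frame i
    and -1 where they lie on opposite sides; entries outside S count as 0.
    """
    n = len(S)
    score = []
    for i in range(n):
        total = 0.0
        for a in range(i - half, i + half):
            for b in range(i - half, i + half):
                if 0 <= a < n and 0 <= b < n:
                    same_side = (a < i) == (b < i)
                    total += S[a][b] if same_side else -S[a][b]
        score.append(total)
    return score
```

For a toy song made of ten identical frames followed by ten different identical frames, the score is highest at the first frame of the second section, which is where a section break would be reported.
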
12
Tools of the Trade: Dynamic Programming
  • Search for lowest cost path
  • Algorithm is O(n^2) instead of O(2^n)
  • Useful for string matching problems, e.g. gene
    sequence matching, query-by-humming

[Figure labels: Forward, Backwards]
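
As a small example of dynamic programming applied to string matching, the sketch below computes the edit distance between two sequences by filling a table of (len(a) + 1) x (len(b) + 1) cells, rather than enumerating every possible alignment. The unit costs for insertion, deletion and substitution are an assumption of this sketch, not something stated on the slide.

```python
def edit_distance(a, b):
    """Lowest-cost alignment of sequences a and b with unit edit costs."""
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[0] * cols for _ in range(rows)]
    # Aligning a prefix with the empty sequence costs one edit per element.
    for i in range(rows):
        cost[i][0] = i
    for j in range(cols):
        cost[0][j] = j
    # Each cell reuses the best costs of its three neighbouring subproblems.
    for i in range(1, rows):
        for j in range(1, cols):
            substitute = cost[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            delete = cost[i - 1][j] + 1
            insert = cost[i][j - 1] + 1
            cost[i][j] = min(substitute, delete, insert)
    return cost[-1][-1]
```

The same table-filling idea works on any sequences of comparable symbols, such as pitch contours in a query-by-humming setting.
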
13
Tools of the Trade: HMMs
14
Tools of the Trade: Humdrum
  • Given a musical score, what can we answer about a
    song or style?
  • Humdrum: Music notation and programmer toolkit
    for musicologists
  • Identify French 6th chords, double 7th scale
    degrees
  • Are German drinking songs likely to be in
    triple-meter?
  • Identify pieces that end in tierce de Picardie
  • Compare pitch-class sets at beginning/end of
    slurs
  • Compare frequency of dynamic swells vs. dynamic
    dips
  • Adapted from
    http://www.music-cog.ohio-state.edu/Humdrum/sample.problems.html

15
Raw Audio vs. Transcriptions
  • Current research and systems use either:
  • Raw audio data of performance (CD, .mp3)
  • Transcription/Score: Symbolic representation of
    music (MIDI, Humdrum, pitch contour)
  • Examples: Raw Audio (ISMIR 2002)
  • Yang: MACSIS Acoustic Indexing Framework for
    Music Retrieval
  • Rauber et al.: Using Psycho-Acoustic Models and
    SOMs to Create a Hierarchical Structuring of
    Music
  • Dannenberg: Pattern Discovery Techniques for
    Music
  • Examples: Symbolic
  • Query-by-humming
  • Humdrum

16
Raw Audio vs. Transcriptions
  • Raw Audio
  • Score

17
Automatic Methods for Music Structure Discovery
  • Music structure discovery: Motivation
  • Tools of the Trade
  • Discovery Techniques
  • Liquid Music

18
Structure: Levels of Abstraction
  • Phrase Segmentation
  • Sequences Approach: SS-Matrix
  • States Approach: Models
  • Hybrid Technique
  • Song
  • Note
  • Score

19
Phrases: Mining the SS-Matrix
  • Off-diagonals -> repeated segments
  • Bartsch and Wakefield (2001)
  • Assume: Most repeated = most important
  • Shift and blur SS-matrix, look for vertical lines
  • Dannenberg (2002): Find best path with DP along
    promising off-diagonals

[Figure: Segmentation of The Cure, "Lovesong"]
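
The link between off-diagonals and repeated segments can be illustrated with a deliberately simplified sketch: average the self-similarity matrix along each off-diagonal and pick the lag with the strongest stripe. This is a stand-in for illustration only, not the shift-and-blur method of Bartsch and Wakefield or Dannenberg's path search; the function names are hypothetical.

```python
def diagonal_strength(S, min_lag=1):
    """Mean similarity along each off-diagonal (lag) of a square matrix S."""
    n = len(S)
    strength = {}
    for lag in range(min_lag, n):
        # Entries S[i][i + lag] compare each window with the one lag steps later.
        values = [S[i][i + lag] for i in range(n - lag)]
        strength[lag] = sum(values) / len(values)
    return strength

def best_repeat_lag(S, min_lag=1):
    """Lag whose off-diagonal has the highest mean similarity."""
    strength = diagonal_strength(S, min_lag)
    return max(strength, key=strength.get)
```

Long lags are averaged over very few entries, so in practice a cap on the lag or some smoothing would probably be needed.
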
20
Phrases: Model Based
  • Logan, Chu (Compaq -> HP, 2000)
  • Method 1: Clustering
  • Join closest (KL) clusters until fixed
    threshold reached
  • Method 2: HMM
  • State transition matrix w/unsupervised Baum-Welch
  • Label each frame using Viterbi
  • Gish Distance (1991), from speaker segmentation
  • Feature: MFCC (instrumentation transitions)
  • Two models: one speaker, two speakers
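
The "label each frame using Viterbi" step can be sketched as below. This is a generic Viterbi decoder over log-probabilities, not the authors' code; it assumes the start, transition and per-frame emission log-probabilities have already been estimated (for example by Baum-Welch), and the argument names are hypothetical.

```python
def viterbi(log_start, log_trans, log_emit):
    """Most likely state sequence for one sequence of frames.

    log_start[s]    : log P(first state = s)
    log_trans[r][s] : log P(next state = s | current state = r)
    log_emit[t][s]  : log P(frame t | state = s)
    """
    n_states = len(log_start)
    best = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t - 1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        prev, best, pointers = best, [], []
        for s in range(n_states):
            r = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(r)
            best.append(prev[r] + log_trans[r][s] + log_emit[t][s])
        back.append(pointers)
    # Trace the best final state back to the first frame.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Because transitions between states carry a cost, a single frame that weakly favours another state does not necessarily cause a label change, which is what makes the labels usable as section boundaries.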

21
Phrase Detection: Hybrid Method
  • Peeters et al. (IRCAM, ISMIR 2002)
  • Two-stage approach
  • 1st pass: Detect distinct variations within song
  • 2nd pass: Group states (K-means) and choose
    appropriate model with HMM

22
Structure: Levels of Abstraction
  • Phrase Segmentation
  • Song
  • Genre Detection: Playola
  • Artist Similarity: Community Metadata
  • Note
  • Score

23
Playola: Music Similarity Browsing
  • Ellis, Berenzweig (LabROSA), Lawrence (NEC),
    Whitman (MIT). Patent pending.
  • Main Idea: Anchor Models
  • Each genre has a model in feature-space
  • Each song is characterized by its closeness to
    each genre's model
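
The anchor-model idea can be illustrated with a toy sketch in which every song is re-described by how close it is to each genre model, and songs are then compared in that new space. The modelling choices here are assumptions made only for illustration: each genre model is a single centroid, closeness is negative Euclidean distance, and the function names are hypothetical. The slide does not say how Playola's genre models or closeness measure are actually defined.

```python
import math

def anchor_vector(song_features, genre_models):
    """Describe a song by its closeness to each genre's model.

    Assumption: each genre model is one centroid in feature space, and
    closeness is the negative Euclidean distance to that centroid.
    """
    return [-math.dist(song_features, genre_models[g])
            for g in sorted(genre_models)]

def most_similar(query, catalog, genre_models):
    """Name of the catalog song closest to `query` in anchor space."""
    q = anchor_vector(catalog[query], genre_models)
    others = [name for name in catalog if name != query]
    return min(others, key=lambda name: math.dist(
        q, anchor_vector(catalog[name], genre_models)))
```

Each song ends up with one number per genre, regardless of how many raw features it started with.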

24
Playola: Web Interface
Artists similar to
Ani DiFranco
Rage Against the Machine
25
Artist Similarity: Community Metadata
  • Create correlation between human-entered semantic
    descriptions and signal
  • Berenzweig, Whitman (MIT). Mine
    allmusicguide.com, reviews, weblogs
  • Create models of descriptions

26
Structure: Levels of Abstraction
  • Phrase Segmentation
  • Song
  • Note
  • Graphical Models
  • MIDI Alignment
  • Score

27
Multi-Pitch Extraction: Modus Ponens
  • Untrained listeners can recognize single
    pitches -> Design single-pitch recognizers based
    on the Human Auditory System
  • Only experts can transcribe polyphonic audio;
    experts recognize patterns -> Design multi-pitch
    extractor in a pattern classification framework

28
Pitch Extraction: Graphical Models
  • Model of polyphonic frame
  • Fit model to data
  • Search model-space using MCMC

29
Graphical Models: Problems
  • Model Simplicity
  • Manually keyed in by expert or statistician
  • No human can enumerate all rules of music theory
  • Mega-model
  • One model for all styles of music
  • Can take advantage of different features of
    different styles

30
Bridging the Gap: MIDI Alignments
  • Hypothesis: By aligning MIDI transcriptions with
    the raw data of performance, it is possible to
    create music models based on both transcription
    and raw audio data.
  • Dataset:
  • 35 MIDI transcriptions downloaded from various
    internet sites, esp. http://www.musicrobot.com
  • Corresponding .mp3 files downloaded from KaZaA
    filesharing network
  • Different genres represented: New Wave, Rock,
    Classical, Dance, Pop
  • All analysis done at 22.05 kHz, mono.

31
MIDI Alignments: Methodology
[Flowchart; box labels: MIDI, Synthesis, Note Extraction, Timing: Ticks to Samples, Raw, Feature Calc, Alignment: DTW, Estimated transcription of real audio]
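
The DTW alignment step named in the methodology can be sketched as follows. This is a textbook dynamic time warping routine, not the author's implementation: it assumes both inputs are already sequences of comparable per-frame features (one from the synthesized MIDI, one from the raw audio), and the default absolute-difference distance on scalars is only a placeholder for whatever feature distance is actually used.

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Align sequences x and y.

    Returns (total cost, path), where path is a list of (i, j) pairs
    meaning frame i of x is matched to frame j of y.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Forward pass: cheapest way to align each pair of prefixes.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = dist(x[i - 1], y[j - 1]) + step
    # Backward pass: follow the cheapest predecessors from the end.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves)
    return cost[n][m], path[::-1]
```

Given such a path, a note that starts at frame i of the synthesized MIDI can be mapped to the matched frame j of the recording, which is one way to obtain an estimated transcription of the real audio.
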
32
Alignment Example: New Wave
33
Note Mapping: MIDI to Raw
34
Applications: How to build a better pitch
extraction algorithm - Locally
  • Idea: Exploit large corpus of labeled raw audio,
    i.e. let the data do the talking!
  • Train classifier (e.g. neural net) with extracted
    notes of real-world audio mixtures
  • Have decent estimate of algorithm performance
    (N.E.R. - Note Error Rate)
  • Use MMI/relative entropy to reduce feature vector
    dimensionality; one classifier per note
  • Operates under the assumption that there are
    vastly more good extractions than bad ones.

35
Applications: How to build a better pitch
extraction algorithm - Globally
  • Graphical models for MPE: Godsill et al., 1999.
  • Incorporate knowledge of musical structure at
    different levels to improve likelihood estimation
  • Their model is currently very simple: Frequencies
    don't change much within a short-time window.
  • Our idea: Learn the graphical model based on both
    pitch information and spectral features.
  • Different models for different genres to exploit
    unique characteristics: Dance -> repetitive at
    phrase level
  • Use inherent musical structure for more
    information in ambiguous situations (can we
    recognize inversion?)
  • Tune extractor for instruments present in genre
    (piano)

36
Structure: Levels of Abstraction
  • Phrase Segmentation
  • Song
  • Note
  • Score
  • Piano Roll -> Score: Clustering
  • Main Melody Extraction

37
Automatic Methods for Music Structure Discovery
  • Music structure discovery: Motivation
  • Tools of the Trade
  • Discovery Techniques
  • Liquid Music

38
Liquid Music: The future
  • Music will accompany every aspect of our lives
  • Any song ever made, any time, anywhere
  • Playlists will be constantly refreshed
  • Adaptivity: playlists/format seamlessly adapt to
    mood/location (transcoding)
  • Kelly: Music will be a verb, not a noun.
  • Fans can easily remix/interpret songs
  • Linn: Stars will be born on the Internet

39
You can't beat free... or can you? How to ensure
sales of little plastic discs in the face of
filesharing
  • Offer things that cannot be downloaded
  • High quality digital audio format
  • Adaptive music, optimized for room/car/phones
  • Password-protected access to special content
    (e.g. online communities)
  • Large format artwork (e.g. posters)
  • Licensed (official) remixes
  • Individual tracks for remixing

40
If you can't beat 'em, join 'em: How an industry
of middlemen can make money through filesharing
  • Targeted Advertisements
  • Direct access to every individual's personal
    taste and preferences
  • Create mapping between songs and demographic
  • Personalized Radio model
  • Labels as music finding scouts
  • Filter what's hot from the garbage
  • Subscription service integrates with playlists
    on each of the user's audio devices

41
Special Advertising Section
42
Conclusion (Yay!)
  • Structure Discovery is just the beginning
  • We're @ the beginning of the beginning
  • To be continued