Timbre Similarity Work by Aucouturier - PowerPoint PPT Presentation

Slides: 22
Provided by: rebeccaf7

Transcript and Presenter's Notes

Title: Timbre Similarity Work by Aucouturier


1
Timbre Similarity Work by Aucouturier & Pachet
  • Rebecca Fiebrink
  • MUMT 611
  • 3 March 2005

2
Presentation Overview
  • Pachet & Aucouturier: why timbre similarity?
  • Basic approach to quantifying timbre and timbre
    similarity
  • Finding songs that sound the same, 2002
  • The CUIDADO project
  • P & A's work in context
  • Practical and theoretical improvements, 2004
  • Remaining problems and future work

3
Who are they?
  • Sony Computer Science Laboratory (CSL), Paris
  • François Pachet: music access and interaction,
    interestingness
  • Jean-Julien Aucouturier: PhD student
  • A host of papers on music browsing, genre,
    metadata, segmentation, ...

4
Why timbre similarity?
  • Electronic Music Distribution (EMD) systems
  • Move from mass-market to individualized
    distribution
  • Collaborative filtering isn't sufficient
  • High-level, perceptually relevant descriptors
    play a complementary / competing role and allow
    for more interesting music browsing
  • Makes more sense than melodic similarity
  • Tied to genre, but not too tightly

5
How to quantify timbre?
  • High-level descriptor for an entire song or piece
  • Mel Frequency Cepstral Coefficients (MFCCs) are
    building blocks
  • Related to spectral envelope
  • First few coefficients account for timbre
    envelope; later ones describe pitch
  • Derive a compact representation of a piece's MFCC
    space and a way to compare representations for
    two pieces
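
A minimal sketch of how the MFCCs of one audio frame might be computed, using only the Python standard library. The mel formula is the common 2595 * log10(1 + f/700) convention. The Hamming window, the 20-filter bank, the naive DFT and all function names are assumptions of this sketch, not details taken from the slides.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [hz * n_fft / sample_rate for hz in edges_hz]  # fractional bins
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=8):
    """First n_coeffs cepstral coefficients of one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log energy in each mel band (small offset avoids log(0)).
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    # DCT-II of the log mel energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)]
```

Calling mfcc(frame, sample_rate) on successive frames of a song yields one 8-dimensional vector per frame; the set of these vectors is the "MFCC space" that the next step summarizes.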

6
A & P's implementation (2002)
  • Find first 8 MFCCs every 50 ms
  • Model song as mixture of 3 Gaussian densities
    over all possible MFCCs of length 8 (GMM:
    Gaussian mixture model)
  • Calculate distance between GMMs by sampling
  • Sample from one GMM, compute likelihood of the
    samples given the other GMM
  • Force symmetry and normalize
  • Use 1000 samples
  • Store GMM information for each song and calculate
    similarity matrix
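
The sampling-based comparison above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each GMM is a list of (weight, means, variances) with diagonal covariance, fitting the GMM (e.g. by EM) is not shown, and the symmetrized, self-normalized form used in timbre_distance is one common way to "force symmetry and normalize".

```python
import math
import random

# Assumed representation: gmm = [(weight, means, variances), ...]
# with diagonal covariance (an assumption of this sketch).

def sample(gmm, n, rng):
    """Draw n points from the mixture."""
    weights = [w for w, _, _ in gmm]
    out = []
    for _ in range(n):
        _, mu, var = rng.choices(gmm, weights=weights)[0]
        out.append([rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, var)])
    return out

def log_likelihood(x, gmm):
    """log p(x | gmm), using log-sum-exp for numerical stability."""
    terms = []
    for w, mu, var in gmm:
        lp = math.log(w)
        for xi, m, v in zip(x, mu, var):
            lp += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        terms.append(lp)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def mean_ll(samples, gmm):
    return sum(log_likelihood(x, gmm) for x in samples) / len(samples)

def timbre_distance(gmm_a, gmm_b, n_samples=1000, seed=0):
    """Symmetric Monte Carlo distance between two song models."""
    rng = random.Random(seed)
    sa = sample(gmm_a, n_samples, rng)
    sb = sample(gmm_b, n_samples, rng)
    # Self-likelihood terms normalize; cross terms in both
    # directions make the measure symmetric.
    return (mean_ll(sa, gmm_a) + mean_ll(sb, gmm_b)
            - mean_ll(sa, gmm_b) - mean_ll(sb, gmm_a))
```

Because only the GMM parameters are needed, the per-song models can be stored once and the similarity matrix filled by calling timbre_distance on each pair.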

7
Results of 2002 version
  • Same artist
  • Harpsichord pieces: Bach - Wohltemperierte
    Clavier - Fuga II in C minor and Bach -
    Wohltemperierte Clavier - Praeludium IV in C
    sharp minor
  • Trip Hop: Portishead - Mysterons (live) and
    Portishead - Sour Times
  • Different artists, same genre
  • Harpsichord pieces: Bach - Das Wohltemperierte
    Clavier - Praeludium IV in C sharp minor BWV849
    and Couperin - Gavotte
  • "Woman Rock Singer": Leah Andreone - It's OK and
    Meredith Brooks - Bitch
  • Interesting results
  • "Classical" and "Pop": Beethoven - Romanze für
    Violine und Orchester Nr. 2 F-dur op.50 and
    Beatles - Eleanor Rigby
  • "Trip Hop" and "Celtic Folk": Portishead -
    Mysterons and Alan Stivell - Arvor You (same
    kind of harpy theremin-like ambiance)

8
Evaluating results
  • No ground truth exists
  • Similarity is subjective
  • People don't hear timbre alone
  • Survey of 10 people: "Is A more like B or C?"
  • Algorithm matches people 80% of the time
  • One view: divergence from expectation makes it
    useful
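
A small sketch of how agreement with an "Is A more like B or C?" survey could be scored. The judgment format and the function name are hypothetical; distance stands for any timbre distance between two songs.

```python
def triplet_agreement(judgments, distance):
    """Fraction of listener judgments the distance measure reproduces.

    judgments: list of (a, b, c, choice), where choice is "b" or "c"
    (the song the listener found more similar to a).
    """
    hits = 0
    for a, b, c, choice in judgments:
        predicted = "b" if distance(a, b) < distance(a, c) else "c"
        hits += predicted == choice
    return hits / len(judgments)
```

A score of 0.8 would correspond to the algorithm matching listeners 80% of the time.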

9
Generating aha!
  • Produce interesting matches when genre and
    timbre are not correlated
  • Allow user control over size of Aha!
    exploration
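
One way the user-controlled exploration could work, as a hedged sketch: a level setting widens the pool of candidates before ranking by timbre distance. The three levels, the dictionary keys and the function name are assumptions made for illustration; the slides do not specify how the control is implemented.

```python
def aha_neighbors(query, songs, distance, level, k=5):
    """Return the k timbre-nearest songs from a pool set by level.

    level 0: same artist only
    level 1: same genre only
    level 2: any song (cross-genre hits are the "Aha!" candidates)
    """
    def allowed(song):
        if song is query:
            return False
        if level == 0:
            return song["artist"] == query["artist"]
        if level == 1:
            return song["genre"] == query["genre"]
        return True

    pool = [s for s in songs if allowed(s)]
    return sorted(pool, key=lambda s: distance(query, s))[:k]
```

At the widest level, a result whose genre differs from the query's is a match that genre metadata alone would not have produced.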

10
Using the measure: CUIDADO
  • Content-based Unified Interfaces and
    Descriptors for Audio and Music Databases
    available Online
  • 2001-2003 European research project
  • aims at developing a new chain of applications
    through the use of audio/music content
    descriptors, in the spirit of the MPEG-7
    standard
  • design of appropriate description structures
  • development of extractors for deriving high-level
    information from audio signals
  • design and implementation of two applications:
    the Sound Palette and the Music Browser
  • (From the CUIDADO website)

11
CUIDADO Music Browser
  • Client/server architecture for music browsing
  • Target audience: casual music lover
  • 17,075 popular music titles with metadata

Picture from The CUIDADO project
12
Music Browser Query Panel
Picture from Popular music access
13
Using Timbre in the Music Browser
  • Nearest-neighbor search
  • "Find me something that sounds like this song"
  • Allow user control over size of exploration:
    "Aha" slider
  • Slider range: same artist, same genre,
    interesting
  • Playlist generation
  • Example
  • 1- Timbre continuity throughout the sequence
  • 2- Genre Cardinality: 30% Rock, 30% Folk, 30% Pop
  • 3- Genre Distribution: the titles of the same
    genre should be as separated as possible
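
The three example constraints can be read as terms of a cost to minimize over candidate sequences. The sketch below is a simplification under stated assumptions: the weights, the dictionary keys and the adjacency penalty (a crude stand-in for "as separated as possible") are hypothetical, and the search over sequences is not shown.

```python
from collections import Counter

def playlist_cost(playlist, distance, targets, w_card=1.0, w_sep=1.0):
    """Lower is better. Assumes a non-empty playlist.

    targets: desired share per genre, e.g. {"Rock": 0.3, ...}
    """
    n = len(playlist)
    pairs = list(zip(playlist, playlist[1:]))
    # 1- Timbre continuity: consecutive titles should sound alike.
    continuity = sum(distance(a, b) for a, b in pairs)
    # 2- Genre cardinality: deviation from the target share per genre.
    counts = Counter(s["genre"] for s in playlist)
    cardinality = sum(abs(counts[g] / n - share)
                      for g, share in targets.items())
    # 3- Genre distribution: penalize adjacent same-genre titles.
    adjacency = sum(a["genre"] == b["genre"] for a, b in pairs)
    return continuity + w_card * cardinality + w_sep * adjacency
```

A generator could score many candidate orderings with this function and keep the cheapest one.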

14
Sample playlist
  • Arlo Guthrie - City Of New Orleans (Folk/Rock)
  • Belle & Sebastian - The boy done wrong again
    (Rock/Alternative)
  • Ben Harper - Pleasure & Pain (Pop/Blues)
  • Joni Mitchell - Borderline (Folk/Pop)
  • Badly Drawn Boy - Camping Next to Water
    (Rock/Alternative)
  • Rolling Stones - You Can't always get what you
    want (Pop/Blues)
  • Nick Drake - One of these things first (Folk/Pop)
  • Radiohead - Motion Picture Soundtrack (Rock/Brit)
  • The Beatles - Mother Nature's Son (Pop/Brit)
  • Tracy Chapman - Talkin' about a Revolution
    (Rock/Folk)

15
Work in Context
  • Several other researchers also use MFCCs with
    reasonable results: Baumann 2003, Berenzweig et
    al. 2002, Foote 1997, Kulesh 2003, Logan and
    Salomon 2001, Pampalk, Dixon, and Widmer 2003
  • P & A's work is relatively accurate
  • Implementation is relatively slow
  • Incorporating use of 1st MFCC integrates average
    dynamic level into results
  • Hard to compare one group's work with another's
  • Hard to propose future research directions beyond
    parameter tweaking

16
Practical & Theoretical Improvements, 2004
  • A & P conducted extensive tests varying
    algorithms and parameters of 2002 system
  • Can optimal parameter settings be found?
  • What is the limit on improvement?
  • Evaluate in the context of CUIDADO Music Browser

17
Optimal parameter values
  • Signal sample rate: higher is better
  • Distance sample rate (used to compare GMMs):
    higher is better, but little improvement over
    1000
  • Sampling can perform as well as Earth Mover's
    Distance (EMD)
  • The number of MFCCs and the number of components
    in the GMM jointly affect the outcome
  • 50 components and 20 MFCCs is optimal
  • Number of components can be reduced without
    hurting performance much
  • 30 ms is optimum window size
  • Adhering to above guidelines leads to absolute
    improvement of 16% in precision
  • Precision is underestimated: considers same-genre
    matches only
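
A sketch of a genre-based precision measure of the kind the last bullet describes: a neighbor counts as correct only if it shares the query's genre, so a good cross-genre timbre match is scored as an error. The cutoff k, the dictionary keys and the function name are assumptions of this sketch.

```python
def precision_at_k(query, songs, distance, k):
    """Share of the k timbre-nearest songs with the query's genre."""
    others = [s for s in songs if s is not query]
    nearest = sorted(others, key=lambda s: distance(query, s))[:k]
    return sum(s["genre"] == query["genre"] for s in nearest) / k
```

Averaging this value over all queries gives a single figure for comparing parameter settings.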

18
Alternative algorithms
  • Several speech-processing algorithms were tried
  • Mixed results
  • No drastic improvements: 2% additional precision
    at most
  • HMM instead of GMM offers no improvement

19
Conclusions of 2004 study
  • Ceiling of 65% precision (conservative
    estimate)
  • False positives remain a problem
  • Jimi Hendrix ≠ Joni Mitchell
  • Due to hubs in nearest-neighbor space
  • Problems are inherent in approach itself?
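
Hubs can be made visible by counting how often each song appears in other songs' nearest-neighbor lists. The sketch below is illustrative only; the function name and the choice of k are assumptions.

```python
from collections import Counter

def k_occurrences(songs, distance, k):
    """For each song index, count appearances in others' top-k lists."""
    counts = Counter({i: 0 for i in range(len(songs))})
    for i, query in enumerate(songs):
        ranked = sorted((j for j in range(len(songs)) if j != i),
                        key=lambda j: distance(query, songs[j]))
        counts.update(ranked[:k])
    return counts
```

A song whose count is far above k is a hub: it is returned as "similar" to many queries and so accounts for a disproportionate share of false positives.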

20
Proposals for future work
  • Address perception of timbre
  • Some frames are more important than others
  • Some timbres more salient than others
  • People assess similarity by choosing "This sounds
    like X" or "This doesn't sound like X"

21
Conclusions
  • High-level, perceptually based similarity has a
    place in electronic music distribution
  • Current systems for timbre similarity have some
    use
  • There is still room for new, innovative, and
    cross-disciplinary work