Musical Genre Classification

1
Musical Genre Classification
  • Prepared by Elliot Sinyor
  • for MUMT 611
  • March 3, 2005

2
Table of Contents
  • What is Genre?
  • Approaches to Genre Classification
  • Manual
  • Automatic
  • Related Work
  • Soltau et al. 1998 (prescriptive approach)
  • Tzanetakis and Cook 2002 (prescriptive approach)
  • Pachet et al. 2001 (emergent approach)
  • Conclusion

3
What is Genre?
  • A way of describing what an item shares with
    other items as well as what differentiates it
    from other items
  • From Aucouturier and Pachet:
  • "The genesis of genre is therefore to be found in
    our natural and irrepressible tendency to
    classify"

4
What is Genre?
  • Aucouturier and Pachet (AP) separate genre
    concepts into two broad categories
  • Intentional vs. Extensional

5
What is Genre? - Intentional
  • More subjective
  • Relies on collective cultural knowledge
  • Social/Historical context
  • E.g. 60s, hippies, Brit-pop

6
Problems with Genre
  • What do the names mean?
  • Rock? Pop?
  • No fixed semantics
  • Amazon.com genres are organized by:
  • Period (60s pop)
  • Topic (love song)
  • Country of origin (Japanese music)
  • Genre is based on extrinsic habits rather than
    intrinsic properties
  • To a French person, C. Aznavour is "Variety"
  • To an English person, C. Aznavour is "French"

7
What is Genre? - Extensional
  • Analysis-based
  • Describes the music itself
  • Tempo, timbre, pitch, language, etc.
  • (sometimes) easier for automatic genre
    classification systems
  • E.g. fast rock, mellow classical

8
Problems with Genre
  • What granularity to use?
  • By Artist?
  • Please Please Me vs. Sgt. Pepper
  • By Album?
  • Revolution 9 vs. Helter Skelter vs. Mother
    Nature's Son
  • Does work for broad categories
  • Rock vs. Classical

9
Problems with Genre
  • Does anyone agree?
  • Allmusic.com: 531 genres
  • Amazon.com: 719 genres
  • Mp3.com: 430 genres
  • Only 70 words common to the three taxonomies
    (Pachet and Cazaly 2000)

10
Approaches to Genre Classification
  • Manual
  • Musicologists and Elbow Grease
  • Automatic
  • Prescriptive
  • Signal Analysis based
  • Emergent
  • Uses existing human-entered meta-data to group
    things together

11
Manual Classification
  • Dannenberg et al. 2001
  • To build a taxonomy for MSN Music Search Engine
  • Few hundred thousand songs
  • Hired full-time musicologists
  • Took 30 human years
  • The details of the taxonomy and the design
    methodology are, however, not available

12
Manual Classification
  • Pachet and Cazaly 2001 (CUIDADO)
  • Separated descriptors: country, instrumentation,
    artist type, etc.
  • _____ Rock
  • Too sensitive to musical evolution, difficult to
    build, difficult to maintain
  • Changed focus to artists instead of titles.
  • In any case, insufficient for millions of titles

13
Prescriptive History
  • Originated from Speech Recognition work
  • Mostly classified audio from TV into
    music/speech/environmental sound

14
Prescriptive - Various Approaches
  • Saunders 1996
  • Thresholding/ZCR techniques
  • Scheirer and Slaney 1997
  • Multiple features and statistical pattern
    recognition
  • Kimber and Wilcox 1996
  • MFCCs and HMM to classify into music, speech,
    laughter and nonspeech
  • Zhang and Kuo 2001
  • Rule-based system for classifying audio from
    movies and TV into two groups
  • Non-music: pure speech, non-harmonic
    environmental sound
  • Music: harmonic environmental sound, pure music,
    song, speech with music, environmental sound
    with music

15
Prescriptive
  • Soltau et al. 1998, "Recognition of Music Types"
  • New approach: Explicit Time Modelling with
    Neural Network (ETM-NN)

16
Prescriptive Soltau et al. 1998
  • In a nutshell
  • Transform acoustic signal into sequence of
    abstract sonic events
  • Look at statistical patterns derived from the
    sequences, then combine them into vectors that
    represent temporal structure
  • 3-layer feed-forward network
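A rough sketch of the "statistical patterns from sequences" step. It assumes the signal has already been reduced to a sequence of discrete event labels. The labels `e0`, `e1`, `e2` and the choice of unigram and bigram frequencies are illustrative assumptions, not details given on the slides.

```python
from collections import Counter
from itertools import product

def sequence_statistics(events, alphabet):
    """Return unigram and bigram relative frequencies as one flat vector."""
    unigrams = Counter(events)
    bigrams = Counter(zip(events, events[1:]))
    n_uni = max(len(events), 1)
    n_bi = max(len(events) - 1, 1)
    # One entry per event, then one entry per ordered pair of events.
    vector = [unigrams[a] / n_uni for a in alphabet]
    vector += [bigrams[(a, b)] / n_bi for a, b in product(alphabet, repeat=2)]
    return vector

# Hypothetical event labels standing in for abstract sonic events.
alphabet = ["e0", "e1", "e2"]
events = ["e0", "e1", "e1", "e2", "e0", "e1"]
features = sequence_statistics(events, alphabet)
```

A fixed-length vector like `features` is the kind of input a feed-forward network could take, whatever the length of the original event sequence.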

17
Prescriptive Soltau et al. 1998
  • Experimental Results
  • 3 hours of data (360 samples, 30 sec each)
  • Rock, Pop, Techno, Classical
  • 67% training, 13% cross-validation, 20%
    evaluation
  • Compare ETM-NN vs. HMM, using cepstral
    coefficients
  • ETM-NN: 86.1%, HMM: 79.2%

18
Musical Genre Classification of Audio Signals
Tzanetakis and Cook, 2002
  • Timbral Texture Features
  • Spectral Centroid, Rolloff, Flux, ZCR, MFCC (5
    coefficients)
  • Analysis window: short enough that features
    are stable (23 ms)
  • Texture window: minimum amount of time needed to
    identify a "texture" (43 analysis windows, about
    1 sec)
  • Memory of the past
  • Statistics (means, variances) of features over
    the texture window
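A minimal sketch of four of the timbral features listed above, computed for one analysis window. It uses a naive DFT from the standard library to stay self-contained, reports centroid and rolloff in bin indices rather than Hz, and assumes an 85% rolloff threshold. MFCCs are omitted. The function names are illustrative.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the first half of a naive DFT (fine for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2)]

def spectral_centroid(mag):
    """Magnitude-weighted mean bin index (the 'centre of gravity')."""
    total = sum(mag)
    return sum(k * m for k, m in enumerate(mag)) / total if total else 0.0

def spectral_rolloff(mag, fraction=0.85):
    """Lowest bin below which `fraction` of the total magnitude lies."""
    threshold = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= threshold:
            return k
    return len(mag) - 1

def spectral_flux(mag, prev_mag):
    """Squared difference between successive normalized spectra."""
    def normalize(v):
        s = sum(v)
        return [x / s for x in v] if s else list(v)
    return sum((x - y) ** 2 for x, y in zip(normalize(mag), normalize(prev_mag)))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = zip(frame, frame[1:])
    return sum(1 for a, b in pairs if (a >= 0) != (b >= 0)) / (len(frame) - 1)

# Two synthetic 64-sample frames: a low and a high sinusoid.
frame_low = [math.sin(2 * math.pi * 2 * t / 64 + 0.3) for t in range(64)]
frame_high = [math.sin(2 * math.pi * 10 * t / 64 + 0.3) for t in range(64)]
mag_low = magnitude_spectrum(frame_low)
mag_high = magnitude_spectrum(frame_high)
```

The higher sinusoid gives a higher centroid, rolloff and zero-crossing rate, which is the sense in which these features track the "brightness" and noisiness of a sound.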

20
Timbral Texture Feature Vector
  • Statistics (means, variances) of features over
    the texture window
  • 19 dimensions
  • (mean, variance) of SC, SF, SR, ZCR and 5 MFCCs:
    2 x 9 = 18 values
  • Plus the low-energy feature: fraction of analysis
    windows in the texture window that have less
    than average RMS energy
  • E.g. vocal music will have more silences
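A small sketch of how the 19 values could be assembled for one texture window. It assumes the nine per-window feature tracks and the RMS values have already been computed. The function names and the use of population variance are assumptions made for illustration.

```python
from statistics import mean, pvariance

def low_energy(rms_values):
    """Fraction of analysis windows whose RMS is below the texture-window average."""
    average = mean(rms_values)
    return sum(1 for r in rms_values if r < average) / len(rms_values)

def texture_vector(feature_tracks, rms_values):
    """feature_tracks: 9 lists (SC, SF, SR, ZCR, 5 MFCCs), one value per analysis window."""
    vector = []
    for track in feature_tracks:
        vector.append(mean(track))       # mean over the texture window
        vector.append(pvariance(track))  # variance over the texture window
    vector.append(low_energy(rms_values))
    return vector

# Placeholder data: 9 tracks of 43 analysis windows each.
tracks = [[float(i + j) for j in range(43)] for i in range(9)]
rms = [0.0] * 10 + [1.0] * 33
vector = texture_vector(tracks, rms)
```

With 9 tracks this yields 2 x 9 + 1 = 19 values, matching the dimension count on the slide.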

21
Rhythmic Content Beat Histogram
  • Pitch detection with larger periods
  • Use DWT to divide signal into frequency bands

22
Rhythmic Content Beat Histogram

23
Features taken from BH
  • A0, A1: relative amplitude (divided by the sum of
    amplitudes) of the first and second histogram
    peaks
  • RA: ratio of the amplitude of the second peak
    to the amplitude of the first peak
  • P1, P2: period of the first and second peaks, in
    bpm
  • SUM: overall sum of the histogram (indication of
    beat strength)
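The six features can be read off a beat histogram with a few lines of code. This sketch assumes the histogram is a list with one bin per bpm starting at a hypothetical `bpm_start` of 40, and uses a simple local-maximum rule for peak picking. Both are assumptions, and the histogram must contain at least two peaks.

```python
def beat_histogram_features(histogram, bpm_start=40):
    """histogram[i] is the beat strength at tempo bpm_start + i."""
    total = sum(histogram)
    # Local maxima: bins higher than the left neighbour and not lower than the right.
    peaks = [i for i in range(1, len(histogram) - 1)
             if histogram[i - 1] < histogram[i] >= histogram[i + 1]]
    peaks.sort(key=lambda i: histogram[i], reverse=True)
    first, second = peaks[0], peaks[1]
    return {
        "A0": histogram[first] / total,
        "A1": histogram[second] / total,
        "RA": histogram[second] / histogram[first],
        "P1": bpm_start + first,
        "P2": bpm_start + second,
        "SUM": total,
    }

# Toy histogram: flat floor, strong peak at 120 bpm, weaker peak at 60 bpm.
histogram = [1.0] * 161
histogram[80] = 10.0
histogram[20] = 5.0
features = beat_histogram_features(histogram)
```

Here the main beat is at 120 bpm and the second peak sits at half that tempo, so `RA` is 0.5.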

24
Pitch Content Features
  • Used enhanced Autocorrelation function to create
    folded (1 octave) and unfolded (all notes) pitch
    histograms
  • Mapped to MIDI note numbers
  • Folded: common pitch classes
  • Unfolded: pitch range
  • Higher for jazz, classical
  • FA0, UP0, UP1, IPO1 (interval between 2 highest
    peaks), SUM
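A sketch of the folding step, assuming pitch detection has already produced a list of frequencies in Hz. The conversion uses the standard MIDI convention (A4 = 440 Hz = note 69), and folding is taken here as the note number modulo 12. The function names are illustrative.

```python
import math

def midi_note(frequency_hz):
    """Nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(12 * math.log2(frequency_hz / 440.0) + 69)

def pitch_histograms(frequencies_hz):
    """Return (folded, unfolded) histograms of detected pitches."""
    unfolded = [0] * 128   # one bin per MIDI note: shows pitch range
    folded = [0] * 12      # one bin per pitch class: octaves collapsed
    for f in frequencies_hz:
        n = midi_note(f)
        if 0 <= n < 128:
            unfolded[n] += 1
            folded[n % 12] += 1
    return folded, unfolded

# Three A notes in different octaves plus middle C.
folded, unfolded = pitch_histograms([220.0, 440.0, 880.0, 261.63])
```

The three A notes land in separate unfolded bins but share folded bin 9, which is why the folded histogram describes pitch classes and the unfolded one describes range.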

25
Experimental Results
  • Used GMM classifiers with diagonal covariance
    matrices
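To illustrate what a diagonal covariance matrix buys, here is a sketch that fits a single Gaussian per genre, which is the one-component special case of a GMM. A full mixture would need EM training, which is not shown. The toy 2-D vectors and genre labels are made up for illustration.

```python
import math
from statistics import mean, pvariance

def fit_diagonal_gaussian(vectors, floor=1e-6):
    """Per-dimension mean and variance (the diagonal of the covariance matrix)."""
    dims = list(zip(*vectors))
    means = [mean(d) for d in dims]
    variances = [max(pvariance(d), floor) for d in dims]  # floor avoids division by zero
    return means, variances

def log_likelihood(x, model):
    """Log density of x; with a diagonal covariance the dimensions simply add up."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))

def classify(x, models):
    """Pick the genre whose Gaussian gives x the highest log-likelihood."""
    return max(models, key=lambda genre: log_likelihood(x, models[genre]))

# Toy 2-D feature vectors; real inputs would be the full feature vectors.
training = {
    "classical": [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25]],
    "rock": [[0.8, 0.9], [0.9, 0.7], [0.85, 0.8]],
}
models = {genre: fit_diagonal_gaussian(v) for genre, v in training.items()}
```

Because only per-dimension variances are stored, each model needs 2 parameters per feature rather than a full covariance matrix, which matters when training data per genre is limited.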

26
Experimental Results

27
Prescriptive Some Results (from AP)
  • Gaussian and Gaussian Mixture Models: 48% of
    successful classification in Ermolinskiy et
    al. (2001), using 100 songs for each class in the
    training phase. This result has to be taken with
    care, since the system uses only pitch
    information.
  • Tzanetakis et al. (2001) achieves a rather
    disappointing 57%, but also reports 75% in
    Tzanetakis and Cook (2000a) using 50 songs per
    class.
  • 90% in Lambrou and Sandler (1998) and 75% in
    Deshpande et al. (2001) on a very small training
    and test set, which may not be representative.
  • Pye (2000) reports 90% on a total set of 175
    songs.
  • Soltau (1998) reports 80% with HMM, 86% with NN,
    with a database of 360 songs.

28
Emergent
  • Unlike Prescriptive, it is unsupervised
  • Based on cultural similarity from text
    documents
  • Possible to extract similarities that are not
    possible to extract from the audio signal

29
Emergent Collaborative Filtering
  • Shardanand and Maes 1995, Pestoni et al. 2001
  • There are patterns in tastes
  • Have users rate their music, match like-tasted
    users, recommend unknown items to users
  • Problems
  • Good for naïve profiles, bad for broad, eclectic
    tastes
  • Favors middle-of-the-road items liked by a large
    proportion of users
  • Only works some time after release of new music

30
Emergent - Co-occurrence Analysis
  • Pachet et al. 2001
  • Looks at online text sources for co-occurrences
    of songs (aka data mining)
  • If 2 items appear in the same context (or share a
    common neighbour), this is evidence of some sort
    of similarity

31
Co-occurrence
  • Pachet et al. 2001, "Musical Data Mining for
    Electronic Music Distribution"
  • Sources used
  • Track listing databases (CDDB)
  • Mostly look at compilations of similar artists
  • Radio Show playlists
  • Specialty programs better than daily commercial
    radio
  • Lists made by experts

32
Co-occurrence
  • Build a matrix where
  • Value of entry (i, j) corresponds to number of
    times title i co-occurs with title j
  • What about indirect co-occurrence?
  • E.g. Eleanor Rigby/Good Vibrations and Good
    Vibrations/God Only Knows suggest Eleanor
    Rigby/God Only Knows
  • Correlation measure, using covariance matrices
    of each title
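A sketch of the matrix and of how a correlation measure can pick up indirect co-occurrence. Playlists stand in for the text sources, and Pearson correlation between matrix rows is used as the correlation measure. Both choices are assumptions made for illustration.

```python
import math
from itertools import combinations

def cooccurrence_matrix(playlists):
    """Entry (i, j) counts how often title i and title j share a playlist."""
    titles = sorted({t for p in playlists for t in p})
    index = {t: i for i, t in enumerate(titles)}
    matrix = [[0] * len(titles) for _ in titles]
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return titles, matrix

def correlation(row_a, row_b):
    """Pearson correlation between two titles' co-occurrence profiles."""
    n = len(row_a)
    mean_a, mean_b = sum(row_a) / n, sum(row_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(row_a, row_b))
    var_a = sum((x - mean_a) ** 2 for x in row_a)
    var_b = sum((y - mean_b) ** 2 for y in row_b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0

playlists = [
    ["Eleanor Rigby", "Good Vibrations"],
    ["Good Vibrations", "God Only Knows"],
    ["Helter Skelter", "Revolution 9"],
]
titles, matrix = cooccurrence_matrix(playlists)
er, gok, hs = (titles.index(t) for t in
               ("Eleanor Rigby", "God Only Knows", "Helter Skelter"))
indirect = correlation(matrix[er], matrix[gok])
unrelated = correlation(matrix[er], matrix[hs])
```

Eleanor Rigby and God Only Knows never appear together, so their direct count is zero, yet their rows are identical because both co-occur only with Good Vibrations, and the correlation is high.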

33
Experimental Results
  • Using distance functions, use Ascendant
    Hierarchical Clustering
  • Used CDDB database, compared co-occurrence vs
    correlation
  • Manually examined results
  • 70% of clusters had interesting similarities
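Ascendant (agglomerative) hierarchical clustering starts with one cluster per title and repeatedly merges the two closest clusters. The sketch below assumes average linkage, a precomputed distance matrix and `n_clusters` of at least 1. The distances are invented.

```python
def agglomerative_clustering(distance, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(distance))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(distance[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Hypothetical distances between four titles (0 = identical, 1 = unrelated).
distance = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
clusters = agglomerative_clustering(distance, 2)
```

Stopping at different values of `n_clusters` gives coarser or finer groupings from the same merge order, which is what makes the result a hierarchy rather than a single partition.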

34
Experimental Results

35
Challenges
  • Name format is not strictly enforced
  • E.g. "The Beatles", "Beatles, The", "Beatles"
  • Difficult to characterize the nature of the
    similarities
  • Cover songs can sound nothing alike

36
Conclusions and Future directions
  • "It seems that samples of Techno and Classical
    are easy to discriminate. Rock and Pop seems to
    be more difficult" (Soltau et al. 1998)
  • Manual classification not feasible
  • Why not combine prescriptive/emergent techniques?