Title: Musical Genre Classification
1. Musical Genre Classification
- Prepared by Elliot Sinyor
- for MUMT 611
- March 3, 2005
2. Table of Contents
- What is Genre?
- Approaches to Genre Classification
- Manual
- Automatic
- Related Work
- Soltau 1998
- Tzanetakis and Cook 2002
- prescriptive approach
- Pachet et al. 2001
- emergent approach
- Conclusion
3. What is Genre?
- "A way of describing what an item shares with
other items as well as what differentiates it
from other items" (from Aucouturier and Pachet)
- "The genesis of genre is therefore to be found in
our natural and irrepressible tendency to
classify"
4. What is Genre?
- Aucouturier and Pachet (AP) separate genre
descriptions into two broad categories
- Intentional vs. Extensional
5. What is Genre? - Intentional
- More subjective
- Relies on collective cultural knowledge
- Social/Historical context
- E.g. the 60s, hippies, Brit-pop
6. Problems with Genre
- What do the names mean?
- Rock? Pop?
- No fixed semantics
- Amazon.com organizes genres by:
- Period (60s pop)
- Topic (love song)
- Country of Origin (Japanese music)
- Genre is based on extrinsic habits rather than
intrinsic properties
- To a French person, C. Aznavour is "Variety"
- To an English person, C. Aznavour is "French"
7. What is Genre? - Extensional
- Analysis-based
- Describes the music itself
- Tempo, timbre, pitch, language, etc.
- (Sometimes) easier for automatic genre
classification systems
- E.g. fast rock, mellow classical
8. Problems with Genre
- What granularity to use?
- By Artist?
- Please Please Me vs. Sgt. Pepper
- By Album?
- Revolution 9 vs. Helter Skelter vs. Mother
Nature's Son
- Does work for broad categories
- Rock vs. Classical
9. Problems with Genre
- Does anyone agree?
- Allmusic.com: 531 genres
- Amazon.com: 719 genres
- Mp3.com: 430 genres
- Only 70 words common to the three taxonomies
(Pachet and Cazaly 2000)
10. Approaches to Genre Classification
- Manual
- Musicologists and Elbow Grease
- Automatic
- Prescriptive
- Signal Analysis based
- Emergent
- Uses existing human-entered meta-data to group
things together
11. Manual Classification
- Dannenberg et al. 2001
- To build a taxonomy for MSN Music Search Engine
- Few hundred thousand songs
- Hired full-time musicologists
- Took 30 human years
- The details of the taxonomy and the design
methodology are, however, not available
12. Manual Classification
- Pachet and Cazaly 2001 (CUIDADO)
- Separated descriptors: country, instrumentation,
artist type, etc.
- _____ Rock
- Too sensitive to musical evolution, difficult to
build, difficult to maintain
- Changed focus to artists instead of titles
- In any case, insufficient for millions of titles
13. Prescriptive - History
- Originated from speech recognition work
- Most early systems classified audio from TV into
music/speech/environmental categories
14. Prescriptive - Various Approaches
- Saunders 1996
- Thresholding/ZCR techniques (see the sketch after
this list)
- Scheirer and Slaney 1997
- Multiple features and statistical pattern
recognition
- Kimber and Wilcox 1996
- MFCCs and HMMs to classify into music, speech,
laughter, and non-speech
- Zhang and Kuo 2001
- Rule-based system for classifying audio from
movies and TV into:
- Non-music
- Pure speech, non-harmonic environmental sound
- Music
- Harmonic environmental sound, pure music, song,
speech with music, environmental sound with music
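As an illustration of the ZCR idea, here is a minimal Python sketch of a Saunders-style music/speech discriminator. The frame size and skew threshold are invented for illustration, not taken from the 1996 paper.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of sign changes between consecutive samples."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def classify_speech_music(signal, sr, frame_ms=20, skew_threshold=0.9):
    frame_len = int(sr * frame_ms / 1000)
    n = len(signal) // frame_len
    zcrs = np.array([zero_crossing_rate(signal[i * frame_len:(i + 1) * frame_len])
                     for i in range(n)])
    # Speech tends to produce a strongly skewed ZCR distribution
    # (voiced/unvoiced alternation); music is usually more symmetric.
    mean, std = zcrs.mean(), zcrs.std()
    skew = np.mean(((zcrs - mean) / (std + 1e-12)) ** 3)
    return "speech" if abs(skew) > skew_threshold else "music"
```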
15. Prescriptive
- Soltau et al. 1998: "Recognition of Music Types"
- New approach: Explicit Time Modelling with a
Neural Network (ETM-NN)
16. Prescriptive - Soltau et al. 1998
- In a nutshell:
- Transform acoustic signal into a sequence of
abstract sonic events
- Look at statistical patterns derived from the
sequences → combine into vectors that represent
temporal structure
- 3-layer feed-forward network (see the sketch
below)
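A minimal sketch of the classification stage, assuming the event statistics have already been computed. Random vectors stand in for real features, and the vector size and hidden-layer width are invented here, not taken from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
N_EVENTS = 57                       # hypothetical event-statistics vector size
X = rng.random((360, N_EVENTS))     # one vector per 30-sec sample
y = rng.integers(0, 4, 360)         # rock, pop, techno, classical

# One hidden layer gives a 3-layer network (input, hidden, output).
net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)
net.fit(X, y)
print(net.predict(X[:5]))
```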
17. Prescriptive - Soltau et al. 1998
- Experimental Results
- 3 hours of data (360 samples, 30 sec each)
- Rock, Pop, Techno, Classical
- 67% training, 13% cross-validation, 20%
evaluation
- Compared ETM-NN vs. HMM, using cepstral
coefficients
- ETM-NN: 86.1%; HMM: 79.2%
18. Musical Genre Classification of Audio Signals
Tzanetakis and Cook, 2002
- Timbral Texture Features
- Spectral Centroid, Rolloff, Flux, ZCR, MFCC (5
coefficients)
- Analysis window: features should be stable (~23
ms)
- Texture window: minimum amount of time needed to
identify a "texture" (~43 analysis windows, ~1 sec)
- Memory of the past
- Statistics (means, variances) of features over
the texture window
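A minimal sketch of these texture-window statistics, assuming the librosa library. The file name is hypothetical, and spectral flux is computed by hand since it is not a built-in librosa feature.

```python
import numpy as np
import librosa

y, sr = librosa.load("song.wav", sr=22050)   # hypothetical input file
n_fft, hop = 512, 512                        # 512 samples ~ 23 ms at 22050 Hz

centroid = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=n_fft, hop_length=hop)
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=n_fft, hop_length=hop)
zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=5, n_fft=n_fft, hop_length=hop)

# Spectral flux: squared difference between successive magnitude spectra.
S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
flux = np.concatenate([[0.0], np.sum(np.diff(S, axis=1) ** 2, axis=0)])[None, :]

frames = np.vstack([centroid, rolloff, flux, zcr, mfcc])  # 9 features per window

# Texture window: means and variances over ~43 analysis windows (~1 sec).
T = 43
n = frames.shape[1] // T
tex = frames[:, :n * T].reshape(frames.shape[0], n, T)
texture_vectors = np.concatenate([tex.mean(axis=2), tex.var(axis=2)]).T  # n x 18
```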
20. Timbral Texture Feature Vector
- Statistics (means, variances) of features over
the texture window
- 19 dimensions
- (m, v) of SC, SF, SR, ZCR, 5 MFCCs
- Low-energy feature: fraction of analysis windows
over the texture window that have less than average
RMS energy
- E.g. vocal music will have more silences
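A minimal sketch of the low-energy feature on raw audio samples, assuming the same 512-sample analysis window as above.

```python
import numpy as np

def low_energy_fraction(signal, frame_len=512):
    # Fraction of analysis windows whose RMS energy falls below the
    # average RMS over the whole texture window.
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.mean(rms < rms.mean())
```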
21. Rhythmic Content - Beat Histogram
- Like pitch detection, but with larger periods
- Use the DWT to divide the signal into frequency
bands (see the sketch below)
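A rough sketch of the beat-histogram construction, assuming the PyWavelets package. Envelope extraction, downsampling, and peak handling are heavily simplified relative to the actual pipeline.

```python
import numpy as np
import pywt

def beat_histogram(y, sr, levels=4, factor=64):
    # Wavelet decomposition splits the signal into frequency bands.
    bands = pywt.wavedec(y, "db4", level=levels)
    env = np.zeros(len(y))
    for b in bands:
        e = np.abs(b) - np.mean(np.abs(b))                   # rectify, remove mean
        env += np.repeat(e, len(y) // len(e) + 1)[:len(y)]   # crude upsampling
    # Downsample the summed envelope, then autocorrelate: peaks appear
    # at lags corresponding to the dominant beat periods.
    env = env[: len(env) // factor * factor].reshape(-1, factor).mean(axis=1)
    sr_env = sr / factor
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    hist = np.zeros(200)                                     # bins for 40..239 BPM
    for lag in range(1, len(ac)):
        bpm = int(60 * sr_env / lag)
        if 40 <= bpm < 240:
            hist[bpm - 40] += ac[lag]
    return hist
```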
22. Rhythmic Content - Beat Histogram (figure)
23. Features taken from BH
- A0, A1: relative amplitudes (divided by the sum
of amplitudes) of the first and second histogram
peaks
- RA: ratio of the second peak's amplitude to the
first peak's amplitude
- P1, P2: periods of the first and second peaks in
BPM
- SUM: overall sum of the histogram (indication of
beat strength)
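A minimal sketch of reading these six features off a beat histogram, assuming the bin layout of the sketch above (bin i corresponds to i + 40 BPM). Taking the two largest bins is a simplification of real peak picking.

```python
import numpy as np

def bh_features(hist):
    order = np.argsort(hist)[::-1]      # bins sorted by amplitude
    i1, i2 = order[0], order[1]         # two highest bins
    total = hist.sum()
    return {
        "A0": hist[i1] / total,         # relative amplitude, 1st peak
        "A1": hist[i2] / total,         # relative amplitude, 2nd peak
        "RA": hist[i2] / hist[i1],      # 2nd-to-1st amplitude ratio
        "P1": i1 + 40,                  # period of 1st peak in BPM
        "P2": i2 + 40,                  # period of 2nd peak in BPM
        "SUM": total,                   # overall beat strength
    }
```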
24. Pitch Content Features
- Used an enhanced autocorrelation function to
create folded (1 octave) and unfolded (all notes)
pitch histograms
- Mapped to MIDI note numbers
- Folded: common pitch classes
- Unfolded: pitch range
- Higher for jazz, classical
- FA0, UP0, UP1, IPO1 (interval between the 2
highest peaks), SUM
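A minimal sketch of the two histograms and some derived features, assuming pitch detection has already produced MIDI note numbers. The feature definitions here are approximations of the paper's, not exact.

```python
import numpy as np

def pitch_features(midi_notes):
    notes = np.asarray(midi_notes)
    unfolded = np.bincount(notes, minlength=128)    # all notes: pitch range
    folded = np.bincount(notes % 12, minlength=12)  # one octave: pitch classes

    fa0 = folded.max() / folded.sum()     # relative amplitude of max folded peak
    up0 = int(unfolded.argmax())          # pitch of the max unfolded peak
    p1, p2 = np.argsort(folded)[::-1][:2] # two highest folded peaks
    ipo1 = abs(int(p1) - int(p2))         # interval between them
    return fa0, up0, ipo1, folded.sum()
```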
25. Experimental Results
- Used GMM classifiers with diagonal covariance
matrices
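A minimal sketch of this kind of classifier with scikit-learn: one diagonal-covariance GMM per genre and a maximum-likelihood decision. The random features, genre labels, and 3-component mixtures are illustrative stand-ins.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X_train = {g: rng.random((100, 30)) for g in ["rock", "classical"]}

models = {g: GaussianMixture(n_components=3, covariance_type="diag",
                             random_state=0).fit(X)
          for g, X in X_train.items()}

def classify(x):
    # Pick the genre whose GMM assigns the highest log-likelihood.
    return max(models, key=lambda g: models[g].score(x.reshape(1, -1)))

print(classify(rng.random(30)))
```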
26. Experimental Results (figure)
27. Prescriptive - Some Results (from AP)
- Gaussian and Gaussian Mixture Models yield 48%
successful classification in Ermolinskiy et al.
(2001), using 100 songs for each class in the
training phase. This result has to be taken with
care since the system uses only pitch information.
- Tzanetakis et al. (2001) achieves a rather
disappointing 57%, but also reports 75% in
Tzanetakis and Cook (2000a) using 50 songs per
class.
- 90% in Lambrou and Sandler (1998) and 75% in
Deshpande et al. (2001), on very small training
and test sets, which may not be representative.
- Pye (2000) reports 90% on a total set of 175
songs.
- Soltau (1998) reports 80% with HMM, 86% with NN,
with a database of 360 songs.
28. Emergent
- Unlike Prescriptive, it is unsupervised
- Based on cultural similarity derived from text
documents
- Possible to extract similarities that cannot be
extracted from the audio signal
29. Emergent - Collaborative Filtering
- Shardanand and Maes 1995, Pestoni et al. 2001
- There are patterns in tastes
- Have users rate their music, match like-tasted
users, recommend unknown items to users (see the
sketch below)
- Problems:
- Good for naïve profiles, bad for broad, eclectic
tastes
- Favors middle-of-the-road items liked by a large
proportion
- Only works some time after release of new music
30. Emergent - Co-occurrence Analysis
- Pachet et al. 2001
- Looks at online text sources for co-occurrences
of songs (aka data mining)
- If 2 items appear in the same context (or share a
common neighbour), this is evidence of some sort
of similarity
31. Co-occurrence
- Pachet et al. 2001: "Musical Data Mining for
Electronic Music Distribution"
- Sources used:
- Track listing databases (CDDB)
- Mostly look at compilations of similar artists
- Radio show playlists
- Specialty programs better than daily commercial
radio
- Lists made by experts
32. Co-occurrence
- Build a matrix where:
- Value of entry (i, j) corresponds to the number
of times title i co-occurs with title j
- What about indirect co-occurrence?
- E.g. Eleanor Rigby/Good Vibrations, Good
Vibrations/God Only Knows → Eleanor Rigby/God
Only Knows
- Correlation measure, using co-variance matrices
of each title (see the sketch below)
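A minimal sketch of building a title co-occurrence matrix from track lists (e.g. compilations or playlists); the playlists here are invented toy data. Correlating rows of the matrix is one simple way to capture indirect similarity through shared neighbours.

```python
import numpy as np
from itertools import combinations

playlists = [["Eleanor Rigby", "Good Vibrations"],
             ["Good Vibrations", "God Only Knows"]]   # toy sources

titles = sorted({t for p in playlists for t in p})
idx = {t: i for i, t in enumerate(titles)}
C = np.zeros((len(titles), len(titles)))

for p in playlists:
    for a, b in combinations(set(p), 2):
        C[idx[a], idx[b]] += 1
        C[idx[b], idx[a]] += 1

# Titles that never co-occur directly can still correlate strongly if
# they share neighbours (Eleanor Rigby and God Only Knows via Good
# Vibrations).
corr = np.corrcoef(C)
print(corr[idx["Eleanor Rigby"], idx["God Only Knows"]])
```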
33. Experimental Results
- Using distance functions, apply Ascendant
Hierarchical Clustering (see the sketch below)
- Used the CDDB database, compared co-occurrence
vs. correlation
- Manually examined results
- 70% of clusters had interesting similarities
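A minimal sketch of the clustering step with SciPy, on an invented distance matrix standing in for the co-occurrence-derived distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
D = rng.random((10, 10))          # stand-in pairwise distance matrix
D = (D + D.T) / 2                 # symmetrize
np.fill_diagonal(D, 0.0)

# Agglomerative ("ascendant") clustering merges titles bottom-up.
Z = linkage(squareform(D), method="average")
clusters = fcluster(Z, t=0.5, criterion="distance")
print(clusters)
```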
34. Experimental Results (figure)
35. Challenges
- Name format is not strictly enforced
- "The Beatles" vs. "Beatles, The" vs. "Beatles"
- Difficult to characterize the nature of the
similarities
- Cover songs can sound nothing alike
36. Conclusions and Future Directions
- It seems that samples of Techno and Classical
are easy to discriminate; Rock and Pop seem to be
more difficult (Soltau et al. 1998)
- Manual classification is not feasible
- Why not combine prescriptive/emergent techniques?