11-751 Term Project Fall 2004 Emotion Detection in Music - PowerPoint PPT Presentation

About This Presentation

Title:

11-751 Term Project Fall 2004 Emotion Detection in Music

Description:

Music Information Retrieval Conferences (ISMIR) - http://www.ismir.net ... Nina Simone, Nine Inch Nails, Nirvana, Norah Jones, Sticky Rice, Olodum, Opeth, ... – PowerPoint PPT presentation

Number of Views:94

Avg rating:3.0/5.0

Slides: 22

Provided by: cch8

Learn more at: http://www.cs.cmu.edu

Category:

more less

Transcript and Presenter's Notes

Title: 11-751 Term Project Fall 2004 Emotion Detection in Music

1
11-751 Term ProjectFall 2004Emotion Detection
in Music

Vitor R. Carvalho Chih-yu Chao

2
Problem Tackled

Using machine learning techniques to
automatically detect emotion in music
Define a good set of emotion categories
Select the feature set
Classification problem

3
Related Work

Music Information Retrieval Conferences (ISMIR) -
http//www.ismir.net/
Li Mitsunori - ISMIR 2003
Liu, Lu Zhang - ISMIR 2003
Feng Zhuang ISMIR 2003 and IEEE/WIC-03

4
Taxonomy of emotion classification

5-point Likert scale
1 stands for very sad
2 stands for sad
3 neutral, not happy and not sad
4 stands for happy
5 stands for very happy
Easy, simple, and many practical applications
(search, personalization, etc)

5
Data and Labeling Process

Music dataset
201 popular songs from Brazil, Taiwan, Japan,
Africa, and the United States
Two people manually labeling the data
Human voice expresses emotion, but the lyrics
were not considered (no semantics)
One emotion per song (no segmentation)
Inter-annotator agreement

6
Songs Authors List

Aerosmith, African, Agalloch, Alanis Morissette,
A-mei, Anathema, Angelique Kidjo, Beth Carvalho,
Billy Gilman, Blossom Dearie, Bluem of Youth,
Boyz II Men, Caetano Veloso, Cai Chun Jia,
Cesaria Evora, Chen Guan Qian, Chen Yi Xun, Chico
Buarque, Ciacia, Comadre Florzinha, Dave Matthews
Band, David Huang, Djavan, Dogs Eye View, Dream
Theater, Dreams Come True, Dsound, Ed Motta, Edu
Lobo, Elegy, For Real, Gal Costa, George Michael,
Gilberto Gil, Goo Goo Dolls, Green Carnation,
Hanson, Ian Moore, Ivan Lins, Jackopierce, Jamie
Cullum, Jason Maraz, Jeff Buckley, Jiang Mei Qi,
João Donato, John Mayer, John Pizzarelli, JS,
Landy Wen, Lisa, Lisa Ono, Lizz Wright, Luna Sea,
Maria Bethania, Marisa Monte, Matchbox 20, Matsu
Takako, Mexericos, Misia, Natalie Imbruglia, Nina
Simone, Nine Inch Nails, Nirvana, Norah Jones,
Sticky Rice, Olodum, Opeth, Pink Floyd, Porcupine
Tree, Radiohead, REM, Rick Price, Rosa Passos,
Salif Keita, Sarah McLachlan, Shawn Colvin, Shawn
Stockman, Shino, The Smiths, Staind, Sting,
Yanzi, Tanya Chua, Terry Lin, The Badlees,
Timbalada, Tom Jobim , Elis Regina, Toni
Braxton, Train, Tribalistas, Tyrese, Faye Wang,
Xiao Yuan You Hui, Yo-yo Ma Rosa Passos, Zeca
Baleiro, Zelia Duncan

7
Inter-annotator agreement

Pearson's correlation (r)
-1 (total disagreement) to 1 (total agreement)
r0.643
Both average ratings are 3.23 (3 neutral)
happier bias

8
Feature Extraction Attempts

Tool for extracting useful features from music
data?
ESPS ? - speech only, not music
Praat ? - speech only
MARSYAS-0.1 - good features, but not stable
MARSYAS-0.2 !!!

9
Feature Sets in Marsyas

MARSYAS written mostly by George Tzanetakis
(marsyas.sourceforge.net/ )
In Marsyas-0.2, there are 4 sets of features
STFT-based, centroid, rolloff, flux,
zeroCrossing, etc
Spectral Flatness Measure (SFM) features
Spectral Crest Factor (SCF) features
Mel-Frequency Cepstral Coefficients (MFCC)
At every 20ms, all features are calculated. The
final features are their means and standard
deviations, obtained over a window of 1 second,
or 50 time-frames.

10
Final Feature Representation

EleanorRigby.wav sad f10.2, f2, f3 ,
EleanorRigby.wav sad f10.24, f2, f3,
EleanorRigby.wav sad f10.79, f2, f3,
girlFromIpanema.wav happy f10.21, f2, f3 ,
girlFromIpanema.wav happy f10.64, f2, f3,
girlFromIpanema.wav happy f10.99, f2, f3,
girlFromIpanema.wav happy f10.49, f2, f3,
girlFromIpanema.wav happy f10.93, f2, f3,
NeMeQuittePas.wav verySad f10.82, f2, f3 ,
NeMeQuittePas.wav verySad f10.14, f2, f3,
NeMeQuittePas.wav verySad f10.999, f2, f3,

5 seconds
11
Still on the Final Representation

The entire collection had to be turned into the
WAV format, with the following specifications
22050 Hz PCM sampling, 16-bit, mono.
Final feature files were huge, reaching 81 MB of
text only (52000 lines)

12
Experiments

2 Types
Binary Classification Happy versus Sad
Multi-Class problem 5-label classification
5-fold (or 2-fold) cross-validation
Majority vote to decide the final label
Minorthird classification package CMU
(minorthird.sourceforge.net/ )

13
Results Happy versus Sad
The StackLearner makes the final decision in two
steps. In the second step, the examples are
augmented with decisions of the previous step
classifier.
14
Results Happy versus Sad

Whats the most informative feature set?

(Decision Tree Classifier, 5-fold
cross-validation)

15
Results 5-label classification
16
Results 5-label classification

Whats the most informative feature set?
(Maximum Entropy Classifier, 2-fold CV)

17
Confusion Matrix
18
Lessons Learned

There are many sw packages for voice processing,
but only a few for music processing.
Using Marsyas was more complicated than expected
(poor documentation, limited number of input
formats, etc).

19
Conclusion