Singer similarity / identification - PowerPoint PPT Presentation

Slides: 19
Provided by: thomas319

Transcript and Presenter's Notes

Title: Singer similarity / identification


1
Singer similarity / identification
  • Francois Thibault
  • MUMT 614B
  • McGill University

2
Introduction
  • Relatively easy for humans to identify singing
    voice in various contexts
  • Difficult to find time- and environment-invariant
    features for robust automatic identification
  • Growing demand for such systems as networked
    databases keep expanding

3
Background (1)
  • Significant research in speaker identification,
    but those systems perform poorly on singing voice
    (inadequate training)
  • Singer identification research can draw much from
    automatic instrument recognition systems
  • Artist / singer identification much harder than
    song identification (due to necessity of
    context-invariant features)

4
Background (2)
  • Often builds on speech / music discrimination
    systems
  • Acoustical features heavily used to create an
    N-dimensional Euclidean space: loudness, pitch,
    brightness, bandwidth, harmonicity
  • Often uses the same tools as style identification
    because each singer corresponds to a micro-style

5
Kim and Whitman overview
  • Segmentation of vocal regions prior to singer
    identification algorithm
  • Assumes singing regions display strong harmonic
    energy in voice frequency range
  • Band-pass filter (200-2000 Hz)
  • Inverse comb filter bank to detect harmonicity
  • Identification classifier uses features based on
    LPC
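
A rough sketch of the harmonicity test described above, assuming a mono signal `x` at sample rate `sr`. The FFT-mask band-pass, the delay search range, and the scoring rule are illustrative assumptions, not the filter design used by Kim and Whitman.

```python
import numpy as np

def bandpass_fft(x, sr, lo=200.0, hi=2000.0):
    """Crude band-pass: zero every FFT bin outside [lo, hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def harmonicity(x, sr, f0_min=200.0, f0_max=1000.0):
    """Score near 1 when a single comb delay cancels most of the energy."""
    total = np.sum(x ** 2) + 1e-12
    best_ratio = 1.0
    # Bank of inverse comb filters, one per candidate pitch period (assumed range).
    for delay in range(int(sr / f0_max), int(sr / f0_min) + 1):
        residual = x[delay:] - x[:-delay]  # nulls at multiples of sr / delay
        # An uncorrelated signal leaves roughly twice the input energy.
        ratio = np.sum(residual ** 2) / (2.0 * total)
        best_ratio = min(best_ratio, ratio)
    return 1.0 - best_ratio
```

A frame could then be marked as vocal when `harmonicity(bandpass_fft(frame, sr), sr)` exceeds some tuned threshold; the threshold value is not given in the slides.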

6
K & W feature extraction
  • Determines formant location and amplitude with a
    12-pole linear predictor using the
    autocorrelation method
  • Increases low-frequency resolution without
    increasing model order by warping the frequency
    representation with a function approximating the
    Bark scale
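
The autocorrelation method can be sketched with the Levinson-Durbin recursion. This is plain (unwarped) LPC: the Bark-like frequency warping mentioned above is not included, and the Hamming window and the helper name `formant_frequencies` are assumptions made for illustration.

```python
import numpy as np

def lpc_autocorr(frame, order=12):
    """Return (a, err): error-filter coefficients with a[0] = 1, and residual energy."""
    windowed = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(windowed[: len(windowed) - k], windowed[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a, err

def formant_frequencies(a, sr):
    """Pole angles of 1/A(z), in Hz, as rough formant location estimates."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one pole of each conjugate pair
    return np.sort(np.angle(roots) * sr / (2 * np.pi))
```

With `order=12` the filter has at most six conjugate pole pairs, so at most six resonance candidates are returned per frame.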

7
K & W classification
  • Uses Gaussian mixture model (GMM) to capture
    behavior of a class
  • Parameters of Gaussians determined by Expectation
    Maximization (EM)
  • Runs PCA prior to EM (normalizes the data
    variance, good for EM)
  • SVMs compute the optimal hyperplane that can
    linearly separate classes
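
A minimal sketch of the PCA-then-EM pipeline, assuming a feature matrix `X` with one row per frame. The diagonal covariances, the random initialization, and the fixed iteration count are simplifying assumptions; the slides do not state the model settings actually used.

```python
import numpy as np

def pca_whiten(X):
    """Rotate onto the principal axes and scale each axis to unit variance."""
    centered = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    return centered @ eigvecs / np.sqrt(eigvals + 1e-12)

def fit_gmm_diag(X, n_components, n_iter=50, seed=0):
    """EM for a diagonal-covariance GMM; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities from per-component log densities.
        log_p = -0.5 * (((X[:, None, :] - means) ** 2) / variances
                        + np.log(2 * np.pi * variances)).sum(axis=2)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2 + 1e-6
    return weights, means, variances
```

One plausible use is to fit one such model per singer and assign a test segment to the singer whose model gives the highest likelihood.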

8
K & W results
  • Testbed contained more than 200 songs by 17 solo
    singers
  • Half for training, half for testing
  • Vocal segmentation inaccurate (55)
  • Experiments with GMM and SVM on complete songs and
    on vocal parts only
  • Overall results well short of human performance

9
K & W experimental results

10
Liu and Huang overview
  • Singer classification of MP3 files
  • First segment audio into phonemes
  • Calculate feature vector and store phoneme
    feature vector with associated singer for
    training set
  • Above feature vectors are used as discriminators
    for classification of unknown MP3 music objects

11
L & H system architecture

12
L & H segmentation features
  • Phoneme segmentation is derived from polyphase
    filter coefficients by obtaining a frame energy
    measurement

13
L & H phoneme database
  • Phonemes are separated by a minimum in frame
    energy (FE)
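
Segmenting at frame-energy minima can be illustrated as follows. This sketch computes energy from time-domain samples, whereas the slides derive it from the MP3 polyphase filter coefficients; the frame length and the strict local-minimum rule are assumptions.

```python
import numpy as np

def frame_energy(x, frame_len=256):
    """Sum of squared samples in consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    frames = np.asarray(x[: n_frames * frame_len], dtype=float)
    return np.sum(frames.reshape(n_frames, frame_len) ** 2, axis=1)

def segment_at_minima(fe):
    """Indices of frames whose energy is a strict local minimum."""
    fe = np.asarray(fe, dtype=float)
    inner = (fe[1:-1] < fe[:-2]) & (fe[1:-1] < fe[2:])
    return np.flatnonzero(inner) + 1
```

Each returned index marks a candidate boundary between two phonemes.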

14
L & H phoneme features
  • The phoneme features are obtained directly from
    the MDCT coefficients

15
L & H classification (1)
  • Compares phoneme features with those in the
    phoneme database
  • Discriminating radius (Euclidean distance)
    determines uniqueness of a phoneme
  • Number of neighbors by same singer within the
    discriminating radius is called frequency (w)
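
The radius and frequency can be sketched as below. The slides do not say how the discriminating radius is chosen, so this example assumes it is the distance to the nearest database phoneme sung by a different singer; the function name is hypothetical.

```python
import numpy as np

def radius_and_frequency(features, singers):
    """Return (radius, w) for every phoneme in the database."""
    features = np.asarray(features, dtype=float)
    singers = np.asarray(singers)
    # Pairwise Euclidean distances between all phoneme feature vectors.
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    same = singers[:, None] == singers[None, :]
    # Assumed definition: distance to the nearest phoneme of another singer.
    radius = np.where(same, np.inf, dist).min(axis=1)
    # Frequency w: same-singer neighbors strictly inside the radius, self excluded.
    inside = same & (dist < radius[:, None])
    np.fill_diagonal(inside, False)
    return radius, inside.sum(axis=1)
```

Under this assumption a phoneme with a large radius and a high w is surrounded mostly by phonemes of its own singer.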

16
L & H classification (2)
  • kNN classifier used to guess artist in unknown
    MP3 songs
  • For efficiency, only uses the first N phonemes in
    unknown MP3
  • Find the k closest neighbors in database and
    allow them to vote if distance is within a
    threshold
  • For each neighbor, give a weighted vote dependent
    on frequency and distance

where w is frequency and
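
A sketch of the voting step. The vote-weight formula is not recoverable from this transcript, so the weight used here (frequency divided by distance) is a stand-in assumption, and `classify_song`, `threshold`, and `n_first` are hypothetical names.

```python
import numpy as np
from collections import defaultdict

def classify_song(query_phonemes, db_features, db_singers, db_freq,
                  k=3, threshold=1.0, n_first=10):
    """Guess the singer from the first n_first phonemes of an unknown song."""
    db_features = np.asarray(db_features, dtype=float)
    votes = defaultdict(float)
    for q in np.asarray(query_phonemes, dtype=float)[:n_first]:
        dist = np.linalg.norm(db_features - q, axis=1)
        for idx in np.argsort(dist)[:k]:
            if dist[idx] <= threshold:
                # Assumed weight: higher frequency and smaller distance count more.
                votes[db_singers[idx]] += db_freq[idx] / (dist[idx] + 1e-9)
    # None when no neighbor was close enough to vote.
    return max(votes, key=votes.get) if votes else None
```

Returning `None` when nothing falls within the threshold leaves the rejection policy to the caller.
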
17
L & H results
  • Three influencing factors:
  • Number of neighbors (N)
  • Threshold for vote decision
  • Number of singers in database

18
Other works
  • Minnowmatch: MIR engine including artist
    classification using NN and SVM (Whitman, Flake,
    Lawrence (NEC))
  • Quest for ground truth in musical artist
    similarity: determining an accurate measure of
    similarity given the subjective nature of artist
    classification (Ellis, Whitman, Berenzweig,
    Lawrence)