Automatic Detection of Vocal Segments In Popular Songs

Provided by: dblabCs

1
Automatic Detection of Vocal Segments In Popular
Songs
  • Tin Lay Nwe
  • Ye Wang

2
Introduction
  • The voice signal tends to have a higher rate of
    change than instrumental music
  • A straightforward way to detect vocals is to
    measure the energy within the frequency range
    occupied by the singing voice
  • Features that can measure the harmonic content of
    the music signal are important for detecting
    vocals in a song

3
Abstract
  • Technique for the automatic classification of
    vocal and non-vocal regions
  • Improves on the conventional Hidden Markov Model
    (HMM) used for classification
  • Experimental evaluations conducted on a database
    of 20 popular songs

4
Acoustic Features
  • If vocals begin while the instrumental
    accompaniment is playing, a sudden increase in
    the energy level of the audio signal is observed
  • We extract feature parameters based on the
    distribution of energy across frequency bands in
    the range from 130 Hz to 16 kHz

5
Acoustic Features (Cont.)
  • First, the test song is divided into 200 ms
    analysis frames
  • LFPC features are calculated from 20 ms subframes
    that overlap by 13 ms
  • Each frame is multiplied by a Hamming window to
    minimize signal discontinuities at the frame
    boundaries, and then its Fast Fourier Transform
    (FFT) is computed
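
The framing, windowing and FFT steps above can be sketched in a few lines. This is a minimal illustration and not the authors' code: the 44.1 kHz sample rate, the 1024-point FFT, the 1 kHz test tone and every function name are assumptions made for the example, and the window is applied to each 20 ms subframe (the slide does not say whether "frame" means the 200 ms frame or the subframe).

```python
import cmath
import math

SAMPLE_RATE = 44100                    # assumed; the slides give no sample rate
FRAME = SAMPLE_RATE * 200 // 1000      # 200 ms analysis frame, in samples
SUBFRAME = SAMPLE_RATE * 20 // 1000    # 20 ms subframe, in samples
HOP = SAMPLE_RATE * 7 // 1000          # 20 ms minus 13 ms overlap = 7 ms step
N_FFT = 1024                           # assumed FFT size (next power of two)


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_frames(signal, size, hop):
    """Cut the signal into blocks of `size` samples, stepping `hop` samples."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(block, n_fft):
    """Window a block, zero-pad it to n_fft, return |X(k)|^2 for k = 0..n_fft/2."""
    window = hamming(len(block))
    padded = [s * w for s, w in zip(block, window)]
    padded += [0.0] * (n_fft - len(block))
    return [abs(c) ** 2 for c in fft(padded)[: n_fft // 2 + 1]]


# One 200 ms frame of a synthetic 1 kHz tone stands in for real audio.
frame = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(FRAME)]
subframes = split_frames(frame, SUBFRAME, HOP)
spectrum = power_spectrum(subframes[0], N_FFT)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N_FFT
```

The step of 7 ms follows from the stated 20 ms subframe length and 13 ms overlap. A production system would normally use an optimized FFT library rather than the small recursive routine shown here.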

6
Log Frequency Power Coefficients (LFPC)

7
Log Frequency Power Coefficients (Cont.)
  • LFPC parameters, which indicate the energy
    distribution among subbands, are calculated as
    follows (the formula itself is not reproduced in
    this transcript)
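
Since the formula is missing from the slide, the following is only one plausible way to compute log-frequency power coefficients, not a reproduction of the authors' definition. It assumes 12 subbands spaced evenly on a log-frequency axis between 130 Hz and 16 kHz, sums the spectral power in each subband, divides by the number of bins in the band, and takes 10·log10. The band count, the normalization and all names are assumptions.

```python
import math


def log_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 edges spaced evenly on a log-frequency axis."""
    ratio = (f_high / f_low) ** (1.0 / n_bands)
    return [f_low * ratio ** i for i in range(n_bands + 1)]


def lfpc(power, sample_rate, n_fft, f_low=130.0, f_high=16000.0, n_bands=12):
    """Log power per subband in dB.

    power[k] is |X(k)|^2 for FFT bins k = 0 .. n_fft/2.
    n_bands = 12 and the per-bin normalization are assumptions.
    """
    edges = log_band_edges(f_low, f_high, n_bands)
    hz_per_bin = sample_rate / n_fft
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        k_lo = math.ceil(lo / hz_per_bin)
        # Half-open bin range [k_lo, k_hi); keep at least one bin per band.
        k_hi = max(math.ceil(hi / hz_per_bin), k_lo + 1)
        band = power[k_lo:k_hi]
        mean_power = sum(band) / len(band)
        # The small constant avoids log10(0) for silent bands.
        coeffs.append(10.0 * math.log10(mean_power + 1e-12))
    return coeffs


# A flat spectrum puts the same mean power in every subband.
flat = lfpc([1.0] * 513, sample_rate=44100, n_fft=1024)
```

Because the subbands widen with frequency, low bands cover only a bin or two at this FFT size while high bands cover hundreds. The sample rate must be at least 32 kHz for the 16 kHz upper edge to fall inside the spectrum.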

8
Classifier Formulation
  • The classifier takes song structure information
    into account when modeling a song
  • For example, signal strengths in different
    sections (intro, verse, chorus, bridge and outro)
    usually differ
  • Tempo and loudness are important attributes

9
Classifier Formulation (Cont.)

10
About HMM
  • The classifier is a Multi-Model HMM (MM-HMM)
  • What is an HMM?
  • Hidden Markov Model (HMM)
  • A pattern recognition method: a model is first
    built from training data and is then applied to
    testing data
  • Three basic problems:
  • 1. The Evaluation Problem
  • 2. The Decoding Problem
  • 3. The Learning Problem
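
The first two problems can be illustrated with a toy discrete HMM. The sketch below is a textbook forward algorithm (evaluation) and Viterbi search (decoding), not the MM-HMM from the talk. The two states, the two observation symbols and all probabilities are hypothetical values chosen for the example. The learning problem (typically solved with Baum-Welch re-estimation) is left out.

```python
def forward(obs, start, trans, emit):
    """Evaluation problem: probability of the observations given the model."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def viterbi(obs, start, trans, emit):
    """Decoding problem: most likely hidden state sequence."""
    n = len(start)
    score = [start[s] * emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        # Best predecessor for each state at this time step.
        prev = [max(range(n), key=lambda p: score[p] * trans[p][s]) for s in range(n)]
        score = [score[prev[s]] * trans[prev[s]][s] * emit[s][o] for s in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: state 0 = non-vocal, state 1 = vocal;
# observation 0 = low band energy, observation 1 = high band energy.
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.8, 0.2], [0.2, 0.8]]

observations = [0, 0, 1, 1, 1]
likelihood = forward(observations, START, TRANS, EMIT)
states = viterbi(observations, START, TRANS, EMIT)
```

For long sequences the products underflow, so practical implementations work with log probabilities or scaled values. Plain probabilities are used here to keep the recursion easy to read.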

11
Classifier Formulation (Cont.)
  1. Humans annotate the vocal and non-vocal segments
    of every test song, and the annotations are used
    to train the models
  2. Segment songs into vocal and non-vocal segments
    using the MM-HMM classifier
  3. Then use this segmentation as bootstrapped
    samples to build song-specific vocal and
    non-vocal models of the test song
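
Steps 2 and 3 amount to a two-pass scheme: a generic classifier labels the song, and those labels are then used to fit models specific to that song. The sketch below shows only this control flow. It swaps the HMMs for one-dimensional Gaussian models over a single made-up feature, so the models, the feature values and all function names are assumptions for illustration.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(values):
    """Return (mean, std); the std is floored to avoid a zero variance."""
    return mean(values), max(pstdev(values), 1e-3)


def log_likelihood(x, model):
    """Gaussian log-likelihood up to an additive constant."""
    mu, sigma = model
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma)


def classify(features, vocal_model, nonvocal_model):
    """Label each frame True (vocal) or False (non-vocal)."""
    return [
        log_likelihood(x, vocal_model) > log_likelihood(x, nonvocal_model)
        for x in features
    ]


def bootstrap(features, generic_vocal, generic_nonvocal):
    # Step 2: initial segmentation with the generic, pre-trained models.
    initial = classify(features, generic_vocal, generic_nonvocal)
    vocal = [x for x, is_vocal in zip(features, initial) if is_vocal]
    nonvocal = [x for x, is_vocal in zip(features, initial) if not is_vocal]
    if len(vocal) < 2 or len(nonvocal) < 2:
        return initial  # too few samples to fit song-specific models
    # Step 3: fit song-specific models on the bootstrapped samples,
    # then classify the same song again with them.
    return classify(features, fit_gaussian(vocal), fit_gaussian(nonvocal))


# Hypothetical per-frame feature values for one song.
song = [0.9, 1.0, 1.1, 1.2, 2.9, 3.0, 3.1, 2.4]
labels = bootstrap(song, generic_vocal=(5.0, 1.0), generic_nonvocal=(0.0, 1.0))
```

The second pass costs an extra round of model fitting per song, which is the computational overhead the talk points out in its conclusion.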

12
Experiments (About Frame Size)

13
Experiments (Database)

14
Experiments (Result)

15
Experiments (Probability)

16
Improvement
  • Enable a semi-automatic system
  • Instead of choosing bootstrapping samples
    randomly, we could allow the user to check and
    select the bootstrapped samples (vocal and
    non-vocal segments) manually from the initial
    segmentation performed by the MM-HMM

17
Conclusion
  • On a test dataset comprising 14 popular songs,
    our approach achieved an accuracy of 84.3% in
    distinguishing vocal segments from non-vocal ones
  • But there is still one drawback
  • It is computationally expensive, since it entails
    two training steps: training the MM-HMM
    classifier and training the bootstrapped
    classifier