Locating Singing Voice Segments Within Music Signals

1
Locating Singing Voice Segments Within Music
Signals
  • Adam Berenzweig and Daniel P.W. Ellis
  • LabROSA, Columbia University
  • alb63_at_columbia.edu, dpwe_at_ee.columbia.edu

2
LabROSA
  • What
  • Where
  • Who
  • Why you love us

3
The Future as We Hear It
  • Online Digital Music Libraries
  • The Coming Age of Streaming Music Services
  • Information Retrieval: How do we find what we
    want?
  • Recommendation: How do we know what we want to
    find?
  • Collaborative Filtering vs. Content-Based
  • What is Quality?

4
Motivation
  • Lyrics Recognition: Baby Steps
  • Segmentation
  • Forced Alignment
  • A Corpus
  • Song structure through singing structure?
  • Fingerprinting
  • Retrieval
  • Feature for similarity measures

5
Lyrics Recognition: Can YOU do it?
  • Notoriously hard, even for humans.
  • amIright.com, kissThisGuy.com
  • Why so hard?
  • Noise, music, whatever.
  • Singing is not speech: voice transformations
  • Strange word sequences (poetry)
  • Need a corpus

6
History of the Problem
  • Segmentation for Speech Recognition: Music/Speech
  • Scheirer & Slaney
  • Forced Alignment - Karaoke
  • Cano et al. REF NEEDED
  • Acoustic feature design: Custom job or Kitchen
    Sink?
  • Idea! Use a speech recognizer: PPF (Posterior
    Probability Features)
  • Williams & Ellis
  • Ultimately: Source separation, CASA

7
A Peek at the End
8
Architecture Overview
  • Entropy H
  • H/h
  • Dynamism D
  • P(h)

[Block diagram. Labels: Audio, PLP, cepstra, Speech Recognizer (Neural Net), posteriogram, Feature Calculation, Time-averaging, Gaussian Model (x2), Segmentation (HMM)]

9
Architecture Overview
[Block diagram. Labels: Audio, PLP, cepstra, Speech Recognizer (Neural Net), posteriogram, Neural Net (x2), Segmentation (HMM)]

10
So how's that working out for you, being clever?
  • Entropy
  • Entropy excluding background
  • Dynamism
  • Background probability
  • Distribution Match: Likelihoods under single
    Gaussian model
  • Cepstra
  • PPF
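
The posterior-based features listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each frame of the posteriogram is a list of class posteriors summing to 1, that the background class sits at a hypothetical index `bg_index`, that "entropy excluding background" means renormalizing after dropping that class, and that dynamism is the mean squared change between adjacent frames. The function names are invented for the example.

```python
import math

def entropy(posteriors):
    """Shannon entropy (in nats) of one frame of class posteriors."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)

def entropy_excluding_background(posteriors, bg_index=0):
    """Entropy after dropping the background class and renormalizing.
    (Assumed reading of "entropy excluding background".)"""
    rest = [p for i, p in enumerate(posteriors) if i != bg_index]
    total = sum(rest)
    if total <= 0.0:
        return 0.0
    return entropy([p / total for p in rest])

def dynamism(prev_posteriors, posteriors):
    """Mean squared change in the posteriors between adjacent frames."""
    diffs = [(b - a) ** 2 for a, b in zip(prev_posteriors, posteriors)]
    return sum(diffs) / len(diffs)

def frame_features(posteriogram, bg_index=0):
    """One (entropy, entropy excl. background, dynamism, background prob.)
    tuple per frame. The first frame is compared with itself, so its
    dynamism is 0."""
    feats = []
    prev = posteriogram[0]
    for frame in posteriogram:
        feats.append((entropy(frame),
                      entropy_excluding_background(frame, bg_index),
                      dynamism(prev, frame),
                      frame[bg_index]))
        prev = frame
    return feats

# Toy posteriogram: 3 frames over 4 classes; class 0 plays the background.
toy = [[0.7, 0.1, 0.1, 0.1],
       [0.1, 0.7, 0.1, 0.1],
       [0.25, 0.25, 0.25, 0.25]]
features = frame_features(toy)
```

A flat frame (the last one in `toy`) gives the maximum entropy, while a frame dominated by one class gives low entropy; a jump in the winning class between frames shows up as high dynamism.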

11
Recovering context with the HMM
  • Transition probabilities
  • Inverse average segment duration
  • Emission probabilities
  • Gaussian fit to time-averaged distribution
  • Segmentation: the Viterbi path
  • Evaluation
  • Frame error rate (no boundary consideration)
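
The HMM step above can be sketched as follows. This is a toy illustration under stated assumptions, not the system described in the talk: it uses two states (0 = no singing, 1 = singing), a single scalar feature per frame, one Gaussian per state for the emissions, a uniform initial prior, and made-up means, variances, and durations. The probability of leaving a state is set to the inverse of that state's average segment duration in frames.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def transition_matrix(avg_durations):
    """Two-state transitions: P(leave state) = 1 / average duration (frames)."""
    leave = [1.0 / d for d in avg_durations]
    return [[1.0 - leave[0], leave[0]],
            [leave[1], 1.0 - leave[1]]]

def viterbi(obs, trans, means, variances, prior=(0.5, 0.5)):
    """Most likely state sequence for a Gaussian-emission HMM."""
    n = len(means)
    score = [math.log(prior[s]) + gaussian_logpdf(obs[0], means[s], variances[s])
             for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: score[r] + math.log(trans[r][s]))
            ptr.append(best)
            new_score.append(score[best] + math.log(trans[best][s])
                             + gaussian_logpdf(x, means[s], variances[s]))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def frame_error_rate(predicted, reference):
    """Fraction of frames labeled wrongly; boundaries get no special weight."""
    wrong = sum(1 for p, r in zip(predicted, reference) if p != r)
    return wrong / len(reference)

# Toy data: frames 3 and 9 are outliers that a frame-by-frame decision
# would mislabel; the transition penalty keeps the path from flipping.
obs = [0.0, 0.1, 0.0, 0.6, 0.0, 0.1, 1.0, 0.9, 1.0, 0.4, 1.0, 0.9]
reference = [0] * 6 + [1] * 6
trans = transition_matrix([50.0, 50.0])  # hypothetical average durations
path = viterbi(obs, trans, means=[0.0, 1.0], variances=[0.1, 0.1])
error = frame_error_rate(path, reference)
```

Long average durations make state changes expensive, which is how the HMM recovers context that independent per-frame decisions lack.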

12
Results
  • Table, figures
  • Listen!
  • Good, bad
  • trigger & stick
  • genre effects?

13
Results
14
  • E = .075
  • P(h) in effect

15
  • E = .68
  • P(h) gone bad

16
(Figure labels: m, n; uw; ey)
  • E = .61
  • Strong phones trigger, but can't hold it
  • Production quality effect?

17
(Figure label: s)
  • E = .25
  • Trigger and Stick

18
(Figure labels: bcl, dcl, b, d; l, r)
  • E = .54
  • False phones

19
  • E = .20
  • Genre effect?

20
Discussion
  • The Moral of the Story: Just give it the data
  • PPF is better than cepstra. Speech Recognizer is
    pretty powerful.
  • Why does the extra Gaussian model help PPF but
    not cepstra?
  • Time averaging helps PPF: proves that it's using
    the overall distribution, not short-time detail
    (at least, when modelled by single Gaussians)
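
The time averaging mentioned above can be pictured as a moving average over per-frame feature values. The sketch below is only an illustration: the slides do not give the window length or how the edges are handled, so the centered window and the truncation at the ends are assumptions, and `time_average` is a name made up for the example.

```python
def time_average(values, window):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    out = []
    for t in range(len(values)):
        lo = max(0, t - half)
        hi = min(len(values), t + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# A single spike is spread over its neighbors, hiding short-time detail.
smoothed = time_average([0.0, 0.0, 3.0, 0.0, 0.0], window=3)
```

Averaging discards frame-to-frame detail and keeps only the local distribution of the feature, which is the property the last bullet points to.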