72x36 Poster Template - PowerPoint PPT Presentation

About This Presentation
Title:

72x36 Poster Template

Description:

A Maximum Likelihood Approach to Multiple Fundamental Frequency Estimation From the Amplitude Spectrum Peaks – PowerPoint PPT presentation

Slides: 2
Provided by: A246

Transcript and Presenter's Notes

A Maximum Likelihood Approach to Multiple Fundamental Frequency Estimation From the Amplitude Spectrum Peaks
Zhiyao Duan, Changshui Zhang
Department of Automation, Tsinghua University, Beijing 100084, China.
Summary
  • A maximum likelihood approach in the frequency domain
  • Only the frequencies and amplitudes of the peaks in the amplitude spectrum are used, rather than the whole complex spectrum
  • Accounts for potential errors of the peak detection algorithm by modeling true peaks and false peaks separately
  • The parameters of the likelihood function are learned from monophonic training samples
  • A Bayesian Information Criterion (BIC) is used to estimate the number of concurrent sounds (polyphony).
Experiment
  • Acoustic materials: 1500 notes from the Iowa music database
  • 18 wind and arco-string instruments
  • Pitch range C2 (65 Hz) to B6 (1976 Hz), dynamics mf, ff
  • Training data: 500 notes
  • Testing data: generated using the other 1000 notes
  • Mixed with equal mean square level
  • 1000 mixtures each for polyphony 1, 2, 3 and 4
Modeling
The likelihood function

[Figure panels: 2-d marginal densities p(A, f), p(A, h), p(f, h)]
[Figure: F0 estimation results. White bar: predominant F0. Grey bar: multiple F0. Black bar: multiple F0 without counting octave errors. Upper figure: our results. Lower figure: results using a Gaussian distribution to model the frequency deviation of the true peaks.]
b) Frequency part
The frequency part is modeled through the frequency deviation of peak i from the nearest harmonic position of the given F0.
Assum. 5: there is always a true peak detected within the semitone range around any harmonic position of an F0.
Assum. 6: the frequency deviation is independent of the F0.
[Figure panels: 45 < f0 < 55, 55 < f0 < 65]
The deviation distribution (right figures) is symmetric, long-tailed, and not spiky. It is estimated using a GMM (4 kernels).
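The frequency deviation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of semitones as the deviation unit, the rounding rule for the nearest harmonic, and all mixture parameters are assumptions made for illustration only.

```python
import math

def harmonic_deviation(peak_hz, f0_hz):
    """Return (harmonic number, deviation in semitones) of a peak
    relative to the nearest harmonic of f0_hz.

    Assumption: "nearest" is decided by rounding the frequency ratio,
    and the deviation is measured on a semitone (log-frequency) scale.
    """
    h = max(1, round(peak_hz / f0_hz))          # harmonic number, at least 1
    dev = 12.0 * math.log2(peak_hz / (h * f0_hz))
    return h, dev

def gmm_pdf(x, weights, means, stds):
    """Evaluate a 1-D Gaussian mixture density at x.

    The weights, means and stds are hypothetical parameters; the poster
    only states that 4 kernels were fitted to the training deviations.
    """
    total = 0.0
    for w, m, s in zip(weights, means, stds):
        z = (x - m) / s
        total += w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return total

# A peak at 443 Hz against F0 = 220 Hz is closest to the 2nd harmonic.
h, dev = harmonic_deviation(443.0, 220.0)
```

A fitted mixture would then score `dev` with `gmm_pdf`; a peak further than half a semitone from every harmonic would fall outside the range named in Assum. 5.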
Formulation
  • Viewpoint: view multiple F0 estimation as a parameter estimation problem from observations in the frequency domain.
  • Parameters to be estimated:
  • Polyphony (number of F0s)
  • F0s
  • Observations: the complex spectrum

[Figure panels: 65 < f0 < 75, 75 < f0 < 85]
The likelihood of each peak has a false peak part and a true peak part, governed by a variable indicating whether the peak is true (1) or false (0).
True peak: generated by the F0s and their harmonics.
False peak: caused by peak detection errors.
Assum. 2: peaks are conditionally independent of each other.
Assum. 3: whether a peak is true or false is independent of the F0s.
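The poster's equation did not survive in this transcript, so the following is a hedged reconstruction from the assumptions above, not the authors' exact formula. All symbols are assumptions: O is the peak observation, θ the set of F0s, K the number of peaks, (f_i, A_i) the logarithmic frequency and amplitude of peak i, and s_i the true/false indicator. The first line uses Assum. 2; the second applies the law of total probability over s_i together with Assum. 3, and additionally assumes that the density of a false peak does not depend on the F0s.

```latex
\begin{align}
p(O \mid \theta) &= \prod_{i=1}^{K} p(f_i, A_i \mid \theta) \\
&= \prod_{i=1}^{K} \Big[
   \underbrace{p(f_i, A_i \mid s_i = 0)\, P(s_i = 0)}_{\text{false peak part}}
 + \underbrace{p(f_i, A_i \mid s_i = 1, \theta)\, P(s_i = 1)}_{\text{true peak part}}
 \Big]
\end{align}
```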
  • The predominant-F0 accuracy remains almost the same as polyphony increases, so the greedy search strategy is feasible.
  • Octave errors account for almost half of all multiple-F0 errors. This is an inherent limitation of our algorithm, but these errors are tolerable in some scenarios, e.g. chord recognition.
  • The results in the upper figure are better than those in the lower one: the statistical information about the peaks in the monophonic training data is more helpful than the commonly used non-informative Gaussian model.

[Figure panel: no limitation on f0]
1) True peak part likelihood
2) False peak part likelihood (right figure): estimated using a Gaussian (mean and covariance)
A Maximum Likelihood method
  • The maximum likelihood formulation involves the following quantities:
  • the N logarithmic fundamental frequencies
  • the possible frequency range of F0s
  • the complex spectrum
  • the K logarithmic frequencies of the peaks
  • the logarithmic amplitudes of the peaks.
  • Assum. 1: The observation can be reduced to the frequencies and amplitudes of the peaks in the amplitude spectrum.
  • Keeping only the peaks of the amplitude spectrum causes little distortion to auditory perception
  • Peaks contain important information for F0 estimation, since they appear at the harmonic positions of the F0s
  • The dimension of the observation is reduced dramatically.
  • Learning the model:
  • Learned from the monophonic training data
  • F0s and peaks are easy to detect accurately in monophonic data
  • Statistics of these peaks are used to learn the parameters of the likelihood function.
  • Polyphony estimation:
  • The weighted BIC is still not an adequate method.
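Assum. 1 and the learning step above both rely on reducing an amplitude spectrum to its peaks. A minimal sketch of that reduction follows. The peak detector itself is not specified in the poster, so the local-maximum rule, the function name `pick_peaks`, the `min_amplitude` threshold, and the choice of logarithm bases are all assumptions.

```python
import math

def pick_peaks(amplitude, bin_hz, min_amplitude=0.0):
    """Return (log2 frequency, natural-log amplitude) for each local maximum.

    amplitude     -- amplitude spectrum, one value per frequency bin
    bin_hz        -- frequency spacing between bins, in Hz
    min_amplitude -- hypothetical threshold; must be >= 0 so the log is defined
    """
    peaks = []
    # Skip the first and last bins, which have only one neighbour.
    for k in range(1, len(amplitude) - 1):
        a = amplitude[k]
        if a > min_amplitude and a > amplitude[k - 1] and a >= amplitude[k + 1]:
            peaks.append((math.log2(k * bin_hz), math.log(a)))
    return peaks

# Toy spectrum with maxima in bins 1 and 4 (10 Hz and 40 Hz).
toy_peaks = pick_peaks([0.0, 1.0, 0.0, 0.5, 2.0, 0.5, 0.0], bin_hz=10.0)
```

The reduction turns a spectrum of many bins into a short list of (frequency, amplitude) pairs, which is the dimensionality reduction the bullets refer to.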

[Figure: histogram of the polyphony estimates]
The true peak part has an amplitude part and a frequency part, each defined with respect to the F0 that generates peak i.
Assum. 4: each true peak is generated by only one F0.
  • Estimate the polyphony
  • The likelihood increases with the number of F0s, so maximizing it alone would overestimate the polyphony
  • Addressed by a weighted Bayesian Information Criterion (BIC)
  • Find the F0s and polyphony that maximize the BIC
  • The weight is tuned manually and found suitable for polyphony 1 to 4
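The exact form of the weighted criterion is not preserved in this transcript. One plausible reading, based on the standard BIC with a weight applied to the penalty term, is sketched here. Every symbol is an assumption: O is the peak observation, θ_N a set of N F0s, M_N the number of free parameters for polyphony N, K the number of peaks (taken as the sample size), and w the manually tuned weight.

```latex
\begin{align}
\mathrm{BIC}_w(N) &= \ln p\big(O \mid \hat{\theta}_N\big) \;-\; w \cdot \tfrac{1}{2}\, M_N \ln K \\
\big(\hat{N}, \hat{\theta}\big) &= \arg\max_{N,\; \theta_N} \; \mathrm{BIC}_w(N)
\end{align}
```

With w = 1 this reduces to the usual BIC in log-likelihood form; a larger w penalizes additional F0s more strongly.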

a) The amplitude part: change the condition from the F0 to the harmonic number of peak i, since the correlation between Ai and F0 is much smaller than that between Ai and hi.
Discussions
  • How to bootstrap the modeling of the peaks from the testing data themselves? Iteratively learn the statistics and discriminate between the true and false peaks in the testing data.
  • Extend to quasi-harmonic sounds, e.g. piano sounds.
  • How to deal with the inherent tendency to estimate half of the true F0? One option is to revise the likelihood function, e.g. by including the spectral amplitudes at the harmonic positions of the F0s in the observation.
  • Integrate sound source separation into the algorithm and consider time-dependent information.

[Equation labels for the weighted BIC: log likelihood, BIC penalty, weight]
  • A greedy search strategy
  • Searching over all combinations of F0s jointly leads to a combinatorial explosion
  • Estimate F0s one by one
  • Stop when the BIC begins to decrease
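The greedy strategy in the bullets above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `greedy_f0_search`, the callback signatures, the `max_polyphony` cap of 4 (borrowed from the polyphony range in the experiment), and the toy likelihood and penalty are all hypothetical.

```python
def greedy_f0_search(candidates, log_likelihood, penalty, max_polyphony=4):
    """Add F0s one at a time while the penalized score keeps increasing.

    candidates     -- candidate F0 values
    log_likelihood -- callable: list of F0s -> log likelihood (hypothetical)
    penalty        -- callable: number of F0s -> penalty term (hypothetical)
    """
    chosen = []
    best_score = float("-inf")
    while len(chosen) < max_polyphony:
        # Score each remaining candidate when added to the current set.
        scored = [
            (log_likelihood(chosen + [c]) - penalty(len(chosen) + 1), c)
            for c in candidates
            if c not in chosen
        ]
        if not scored:
            break
        score, best_candidate = max(scored)
        if score <= best_score:
            break  # the criterion stopped increasing: stop adding F0s
        chosen.append(best_candidate)
        best_score = score
    return chosen

# Toy setup: two "true" F0s contribute strongly, the others weakly.
TRUE_F0S = {220, 330}

def toy_log_likelihood(f0s):
    return sum(10.0 if f in TRUE_F0S else 1.0 for f in f0s)

def toy_penalty(n):
    return 3.0 * n

estimate = greedy_f0_search([110, 220, 330, 440], toy_log_likelihood, toy_penalty)
```

Each round evaluates at most one set per remaining candidate, so the cost grows linearly with the number of candidates per round instead of with the number of F0 combinations.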

The 3-d joint probability density is estimated using a Parzen window (11×11×5), as illustrated by the three 2-d marginal densities in the following figures.
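A Parzen window estimate of a joint density such as the one described above can be sketched in a few lines. The poster does not state the kernel shape, so the product Gaussian kernel, the function name `parzen_density`, and the bandwidth values are assumptions.

```python
import math

def parzen_density(x, samples, bandwidths):
    """Product-Gaussian Parzen estimate of a d-dimensional density at x.

    x          -- query point, e.g. (amplitude, frequency, harmonic number)
    samples    -- training points of the same dimension
    bandwidths -- one kernel width per dimension (hypothetical values)
    """
    total = 0.0
    for s in samples:
        kernel = 1.0
        for xi, si, h in zip(x, s, bandwidths):
            z = (xi - si) / h
            kernel *= math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))
        total += kernel
    return total / len(samples)

# With one sample at the origin and unit bandwidths, the estimate at the
# origin equals the peak of a standard 3-d Gaussian, (2*pi)**-1.5.
value = parzen_density((0.0, 0.0, 0.0), [(0.0, 0.0, 0.0)], (1.0, 1.0, 1.0))
```

A 2-d marginal of such an estimate is obtained by dropping one dimension from both the samples and the bandwidths, since a product kernel integrates to one along each axis.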