Piano Music Transcription - PowerPoint PPT Presentation

Slides: 20
Provided by: mact1

1
Piano Music Transcription
  • Wes Crusher Hatch
  • MUMT-614
  • Thurs., Feb. 13

2
Introduction
  • Polyphonic pitch extraction
  • Want to realize computational scene analysis
    (Klapuri)
  • Problem is comparable to speech recognition

3
Current State of affairs
  • Many different approaches
  • Nothing is 100% reliable, or even 90% or 80%
  • Drawback: with no single agreed heuristic, no one
    is building on, or learning from, previous work
    and experience

4
Parameters to extract
  • Pitch
  • Amplitude
  • Onset and duration
  • Do NOT require:
  • Spatial location
  • Timbre

5
Benefits of knowing timbre
  • Can assume a piano sound for input
  • Simplifies things down the road:
  • Don't need to calculate a sound source model of
    an instrument (Marolt)
  • Can make assumptions about strengths of various
    partials (Martin)
  • Makes other techniques possible (e.g. differential
    spectrum analysis, Hawley)

6
Recent developments
  • A few techniques are gaining prominence
  • Blackboard systems (Bello, Monti & Sandler,
    Martin)
  • Neural networks
  • Pitch perception models based on human audition
    (gammatone filterbank front-end)
  • To a lesser extent:
  • Hidden Markov models

7
Benefits of Blackboards
  • Can incorporate all previous approaches and
    methodologies
  • Top-down or bottom-up
  • Easily expandable
  • Can be easily updated to accommodate new
    technology

8
A very general heuristic
  • Front-end
  • Analysis, representation, pitch hypothesis
  • Top-down processes (which in turn affect
    front-end analysis and pitch guesses)
  • Transcribed notes out (Guido, MIDI, etc.)
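
The flow above, combined with the blackboard idea from the previous slide, can be sketched in a few lines. This is a toy illustration only, not the design of any cited system: the class `Blackboard`, the two knowledge sources and the octave-pruning rule are all hypothetical, and the rule is far too aggressive for real music, where octaves are played on purpose. The front-end is assumed to have already produced a list of spectral peak frequencies in Hz.

```python
import math

def hz_to_midi(freq_hz):
    """Nearest equal-tempered MIDI note number (A4 = 69 = 440 Hz)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))

class Blackboard:
    """Shared workspace that every knowledge source reads and writes."""
    def __init__(self, peaks_hz):
        self.pending_peaks = list(peaks_hz)  # low level: front-end output
        self.hypotheses = set()              # mid level: candidate MIDI notes
        self.rejected = set()                # candidates ruled out so far

def propose_notes(bb):
    """Bottom-up source: turn one pending peak into a note hypothesis."""
    if not bb.pending_peaks:
        return False
    bb.hypotheses.add(hz_to_midi(bb.pending_peaks.pop(0)))
    return True

def prune_octave_partials(bb):
    """Top-down source (toy rule): a candidate one octave above another
    candidate is assumed to be a partial of it and is removed."""
    known = bb.hypotheses | bb.rejected
    doubtful = {m for m in bb.hypotheses if m - 12 in known}
    if not doubtful:
        return False
    bb.hypotheses -= doubtful
    bb.rejected |= doubtful
    return True

def run(bb, knowledge_sources):
    """Keep invoking every source until none of them changes the board."""
    changed = True
    while changed:
        changed = False
        for source in knowledge_sources:
            if source(bb):
                changed = True
    return sorted(bb.hypotheses)  # high level: transcribed MIDI notes

if __name__ == "__main__":
    board = Blackboard([220.0, 440.0, 880.0])
    print(run(board, [propose_notes, prune_octave_partials]))
```

Adding a new technique means adding one more function to the list passed to `run`, which is the expandability the blackboard slide refers to.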

9
Commonalities between systems
  • Transform data into freq. representation
  • STFT, tracking phase vocoder (Dixon)
  • Sinusoid tracks (Martin)
  • Gammatone filterbank (Marolt, Martin)
  • Top-down organization
  • System has the ability to learn
  • Neural nets (Marolt, Bello)
  • HMM (Raphael)
  • Timbre adaptation (Dixon, soon)

10
Top-down is super
  • Bottom-up: analysis --> note hypothesis
  • Unidirectional
  • Doesn't know about past analysis; only concern
    is hierarchical flow of data
  • Inflexible
  • Top-down: high --> low level
  • Different levels of the system are determined by
    predictive models and previous knowledge
  • Implemented by neural nets, blackboard system

11
Happy schematic
  • Low level --> mid-level --> high level

12
Front-end techniques
  • Sinusoidal
  • STFT
  • Constant frequency spacing means better pitch
    resolution at high frequencies, poorer resolution
    in low freq. range
  • Tracking phase vocoder
  • Sinusoid track
  • Track continuous regions of local energy maxima
    in time-frequency domain (e.g. Dixon)
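
The STFT resolution point can be checked with a little arithmetic. A minimal sketch, assuming a 44.1 kHz sample rate and a 2048-point FFT (both illustrative values, not taken from the slides): it compares the fixed spacing between STFT bins with the gap in Hz between neighbouring piano semitones.

```python
SAMPLE_RATE = 44100  # assumed for illustration, not from the slides
FFT_SIZE = 2048      # assumed for illustration, not from the slides

def midi_to_hz(midi_note):
    """Equal-tempered frequency of a MIDI note (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def bin_spacing_hz(sample_rate=SAMPLE_RATE, fft_size=FFT_SIZE):
    """Distance in Hz between neighbouring STFT bins (same everywhere)."""
    return sample_rate / fft_size

def semitone_gap_hz(midi_note):
    """Distance in Hz from a note to the semitone directly above it."""
    return midi_to_hz(midi_note + 1) - midi_to_hz(midi_note)

def bins_per_semitone(midi_note):
    """How many STFT bins fit between a note and the next semitone."""
    return semitone_gap_hz(midi_note) / bin_spacing_hz()

if __name__ == "__main__":
    for name, midi_note in [("A1", 33), ("A4", 69), ("A7", 105)]:
        print(name, round(midi_to_hz(midi_note), 1), "Hz:",
              round(bins_per_semitone(midi_note), 2), "bins per semitone")
```

With these assumed settings a low note such as A1 gets well under one bin per semitone, so neighbouring bass notes fall into the same bin, while high notes are spread across many bins.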

13
Front-end techniques, cont.
  • Correlation
  • Try to model human audition
  • Constant Q mimics log. resolution of human ear
  • Gammatone filterbank
  • Output of each filter then processed by a model
    of inner hair cell dynamics
  • Further analysis by short-time autocorrelation
  • Variable filter widths; filters generally
    implemented across 70-6000 Hz
  • Same problems as found in scene analysis
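
A minimal sketch of the short-time autocorrelation step on its own. It skips the gammatone filterbank and the hair cell model entirely and works on a single frame of one signal; the function names, the sample rate and the search range are assumptions made for the example.

```python
import math

def autocorrelation(frame, lag):
    """Unnormalised autocorrelation of one frame at a lag given in samples."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def estimate_period(frame, min_lag, max_lag):
    """Lag in [min_lag, max_lag] where the frame best matches itself."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(frame, lag))

def estimate_pitch_hz(frame, sample_rate, fmin, fmax):
    """Single-pitch estimate: sample rate divided by the best-matching lag."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    return sample_rate / estimate_period(frame, min_lag, max_lag)

if __name__ == "__main__":
    sample_rate = 8000  # assumed for illustration
    # Synthetic 200 Hz sine standing in for one filterbank channel.
    frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(512)]
    print(estimate_pitch_hz(frame, sample_rate, fmin=80, fmax=400))
```

This finds one period per frame, which is why it is applied per filterbank channel in polyphonic systems: a single autocorrelation over a full piano chord would mix the periods of all sounding notes.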

14
Onset detection
  • Neural nets
  • Differences between 6 ms and 18 ms amplitude
    envelopes (Marolt)
  • Change in high frequency content (Bello)
  • Zero-lag correlation for each filterbank channel
  • Running estimate of energy (Martin)
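
A minimal sketch of onset detection from a change in high-frequency content. It uses one common formulation, energy per bin weighted by bin index, which is an assumption here and not necessarily the exact measure used by Bello. The frame size, hop and threshold ratio are illustrative, and the naive DFT is only practical for short frames.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the lower half of a naive DFT of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def high_frequency_content(frame):
    """Bin energy weighted by bin index, so broadband attacks score high."""
    return sum(k * mag * mag for k, mag in enumerate(dft_magnitudes(frame)))

def hfc_curve(signal, frame_size=64, hop=32):
    """High-frequency content of each overlapping frame of the signal."""
    return [high_frequency_content(signal[start:start + frame_size])
            for start in range(0, len(signal) - frame_size + 1, hop)]

def first_onset_frame(curve, ratio=2.0, floor=1e-9):
    """Index of the first frame whose value exceeds `ratio` times the
    previous frame's value, or None if there is no such jump."""
    for i in range(1, len(curve)):
        if curve[i] > ratio * max(curve[i - 1], floor):
            return i
    return None

if __name__ == "__main__":
    # 256 samples of silence followed by 256 samples of a steady tone.
    tone = [math.sin(2 * math.pi * 0.125 * n) for n in range(256)]
    signal = [0.0] * 256 + tone
    frame_index = first_onset_frame(hfc_curve(signal))
    print("onset near sample", frame_index * 32)
```

The detected position is only accurate to about one frame, since the first frame that overlaps the attack is the one that triggers.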

15
Analysis & pitch hypothesis
  • Blackboards
  • Contain a variety of knowledge sources (KS)
  • Neural nets
  • Fuzzy logic
  • May contain front-end processing, or may be fed
    results thereof
  • Can be used for entire process (front-end, data
    representation, pitch hypothesis) or just to
    tabulate pitch guesses at the end

16
Analysis & pitch hypothesis, cont.
  • Peak-picking together with phase spectrum (helps
    to resolve low freq. uncertainties)
  • Atoms of energy localized in time and frequency
    (Dixon)
  • HMM
  • Neural nets (note, chord recognizers)
  • Trained to look for one given note (e.g. C4)
  • Can also be a KS in blackboard system

17
Pitfalls
  • Octave errors: most common error source
  • Some solutions:
  • Feedback to provide inhibition from the output
    of the note recognition stage to its input
    (Marolt)
  • Instrumental models (have knowledge about
    strengths of various partials--spectral shape)
  • Apply general musical knowledge (voice leading
    rules, harmony, counterpoint, etc.) (Kashino)
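
Why octave errors are so common can be shown with a toy example: every partial of a note is also a partial of the note one octave below, and the even partials of a note are exactly the partials of the note one octave above. The sketch below scores a candidate fundamental by how many of its expected partials are present and how many observed peaks it explains. The scoring rule and the function names are hypothetical, not the method of any cited system, and partials are assumed to be exactly harmonic, which real piano strings are not.

```python
def matches(a_hz, b_hz, tolerance=0.03):
    """True if two frequencies agree within a relative tolerance."""
    return abs(a_hz - b_hz) <= tolerance * b_hz

def candidate_score(f0_hz, peaks_hz, tolerance=0.03):
    """Product of two fractions: expected partials of f0 that were
    observed, and observed peaks that f0 accounts for."""
    top = max(peaks_hz) * (1 + tolerance)
    expected = [f0_hz * k for k in range(1, int(top / f0_hz) + 1)]
    if not expected:
        return 0.0
    found = sum(1 for e in expected
                if any(matches(p, e, tolerance) for p in peaks_hz))
    explained = sum(1 for p in peaks_hz
                    if any(matches(p, e, tolerance) for e in expected))
    return (found / len(expected)) * (explained / len(peaks_hz))

def best_candidate(candidates_hz, peaks_hz):
    """Candidate fundamental with the highest score."""
    return max(candidates_hz, key=lambda f0: candidate_score(f0, peaks_hz))

if __name__ == "__main__":
    c4, c5 = 261.625, 523.25
    peaks_of_c5 = [c5 * k for k in range(1, 7)]  # idealised C5 spectrum
    peaks_of_c4 = [c4 * k for k in range(1, 7)]  # idealised C4 spectrum
    print(best_candidate([c4, c5], peaks_of_c5) == c5)
    print(best_candidate([c4, c5], peaks_of_c4) == c4)
```

Checking only one of the two fractions would leave the octave ambiguous in one direction, which is where knowledge of expected partial strengths from an instrument model helps.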

18
Different systems' results
  • Dixon: 70-80% correct
  • SONIC (Marolt): 80-95% correct (13-25%
    extra notes)
  • Monti & Sandler: 74% correct
  • Raphael: 39% wrong/missed
  • Bello, Martin: no data available
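
Figures like these depend on how "correct", "missed" and "extra" are counted, and the systems listed may not all use the same definitions. A minimal sketch of one possible scoring scheme, assuming notes are compared as exact (onset, MIDI pitch) pairs; the definitions and the toy data are assumptions for illustration, not those of any system above.

```python
def score_transcription(reference, transcribed):
    """Compare two collections of (onset, midi_pitch) pairs.

    percent_correct and percent_missed are relative to the reference;
    percent_extra is relative to the transcribed notes.
    """
    ref, out = set(reference), set(transcribed)
    if not ref or not out:
        raise ValueError("both note lists must be non-empty")
    return {
        "percent_correct": 100.0 * len(ref & out) / len(ref),
        "percent_missed": 100.0 * len(ref - out) / len(ref),
        "percent_extra": 100.0 * len(out - ref) / len(out),
    }

if __name__ == "__main__":
    # Toy data: onsets in beats, pitches as MIDI note numbers.
    reference = [(0, 60), (0, 64), (1, 67), (2, 72)]
    transcribed = [(0, 60), (0, 64), (1, 67), (2, 84)]  # last note an octave too high
    print(score_transcription(reference, transcribed))
```

Under this scheme a single octave error counts twice, once as a missed note and once as an extra note, so "39% wrong/missed" and "74% correct" are not directly comparable without knowing each source's definitions.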

19
Conclusions
  • Exponentially more difficult than monophonic
    transcriptions
  • Are slowly approaching very good, robust systems
  • Compare to Moore, 1975
  • Very few restrictions in the input data
  • Top-level organizations are key
  • Blackboards, neural networks