Title: Piano Music Transcription Systems
- Presented by Greg Eustace
Overview
- Introduction to polyphonic transcription systems, including pioneering work, basic architecture, piano transcription systems and current problems.
- Discussion of Marolt's paper "A Connectionist Approach to Automatic Transcription of Piano Music", including adaptive oscillator networks, neural networks and the SONIC system for piano transcription.
Polyphonic transcription
- Transcription: the extraction of symbolic (notational) information from music, including the pitches, dynamic levels, onset times and durations of notes.
- For automatic transcription systems, the input is an audio file and the output could be a MIDI or score file.
- Polyphonic transcription systems have been developed since the 1970s, beginning with the work of Moorer, whose system assumed only two voices that were separated in frequency range, had different timbres and had restricted intervallic relationships.
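To make the output side concrete, here is a minimal Python sketch (an illustration, not taken from any system discussed here) of a note-event record holding the quantities listed above, together with the standard equal-tempered mapping from a fundamental frequency to a MIDI note number (A4 = 440 Hz = MIDI 69). The `NoteEvent` class and the use of MIDI velocity for the dynamic level are assumptions.

```python
import math
from dataclasses import dataclass

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass
class NoteEvent:
    """One transcribed note (hypothetical record type for illustration)."""
    midi_pitch: int    # MIDI note number, A4 = 69
    onset_s: float     # onset time in seconds
    duration_s: float  # duration in seconds
    velocity: int      # dynamic level, assumed to be mapped to MIDI velocity 1..127


def freq_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Nearest equal-tempered MIDI note number for a fundamental frequency."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))


def midi_to_name(midi: int) -> str:
    """Scientific pitch name, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# A detected fundamental of about 261.63 Hz is written out as middle C.
event = NoteEvent(freq_to_midi(261.63), onset_s=0.5, duration_s=0.25, velocity=80)
print(event, midi_to_name(event.midi_pitch))
# NoteEvent(midi_pitch=60, onset_s=0.5, duration_s=0.25, velocity=80) C4
```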
Basic architecture of transcription systems: front-end processing
- The audio signal is converted to a time-frequency representation, such as that provided by the short-time Fourier transform (STFT).
- The partials present in each frame are identified, and their frequency and amplitude are extracted. Peak picking is commonly achieved by setting an amplitude threshold.
- Partial tracks are formed by connecting partials across frames based on their amplitude and frequency relationships.
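The three front-end steps can be sketched in dependency-free Python as follows. This is an illustration under stated assumptions: the naive DFT stands in for an FFT library, the Hann window, hop size and threshold are arbitrary choices, and the track-linking rule uses frequency proximity only (nearest bin within `max_jump` bins), whereas real systems also consider amplitude.

```python
import math


def stft_magnitudes(signal, frame_len, hop):
    """Magnitude spectra of Hann-windowed frames (naive DFT, for clarity only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            re = sum(x[n] * math.cos(2 * math.pi * k * n / frame_len) for n in range(frame_len))
            im = -sum(x[n] * math.sin(2 * math.pi * k * n / frame_len) for n in range(frame_len))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames


def pick_peaks(mags, threshold):
    """Bins that are local maxima and exceed an amplitude threshold (assumed value)."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > threshold and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]


def link_tracks(peaks_per_frame, max_jump=1):
    """Connect peaks across frames when their bin index changes by at most max_jump."""
    tracks, active = [], []          # each track is a list of (frame, bin) pairs
    for t, peaks in enumerate(peaks_per_frame):
        unused = list(peaks)
        next_active = []
        for track in active:         # try to continue every track that is still alive
            last_bin = track[-1][1]
            match = min(unused, key=lambda b: abs(b - last_bin), default=None)
            if match is not None and abs(match - last_bin) <= max_jump:
                track.append((t, match))
                unused.remove(match)
                next_active.append(track)
        for b in unused:             # peaks that continue no track start a new one
            track = [(t, b)]
            tracks.append(track)
            next_active.append(track)
        active = next_active         # unmatched tracks end here
    return tracks
```

For a 1000 Hz sinusoid sampled at 8000 Hz with 256-sample frames, the peak falls in bin 1000 * 256 / 8000 = 32; in general bin k corresponds to k * fs / frame_len Hz.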
Basic architecture: blackboard systems for note identification
- So-called blackboard systems use various criteria to group partials together in order to identify notes.
- Each criterion is represented by a knowledge source. Knowledge sources may draw on physics, psychoacoustics or music theory.
- Blackboard systems often unify top-down and bottom-up approaches.
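The blackboard idea can be illustrated with a toy Python sketch: a shared data structure holds the partials found by the front end, and knowledge sources read from it and post or revise note hypotheses. Both knowledge sources below are hypothetical simplifications. One applies a physical criterion (partials near integer multiples of a candidate fundamental); the other works on existing hypotheses, discarding an upper octave whose support is largely explained by the note below. The tolerance, minimum support and halving rule are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Shared data area read and written by all knowledge sources."""
    partials: list                              # partial frequencies (Hz) from the front end
    notes: dict = field(default_factory=dict)   # candidate f0 (Hz) -> number of supporting partials


def harmonicity_ks(bb, candidates, tol=0.03, min_support=2):
    """Physics-based knowledge source: count partials close to integer multiples of each f0."""
    for f0 in candidates:
        support = 0
        for p in bb.partials:
            k = round(p / f0)                  # nearest harmonic number
            if k >= 1 and abs(p / f0 - k) < tol:
                support += 1
        if support >= min_support:
            bb.notes[f0] = support


def octave_ks(bb):
    """Hypothesis-level knowledge source: drop an upper octave mostly explained by the note below."""
    for f0 in sorted(bb.notes):
        if f0 not in bb.notes:                 # already removed as someone's upper octave
            continue
        upper = 2 * f0
        if upper in bb.notes and bb.notes[upper] <= bb.notes[f0] / 2:
            del bb.notes[upper]


def run_blackboard(bb, candidates):
    """Minimal control loop: apply the knowledge sources to the blackboard in a fixed order."""
    harmonicity_ks(bb, candidates)
    octave_ks(bb)
    return bb.notes
```

With only the partials of a 220 Hz note on the blackboard, the 440 Hz hypothesis is posted by the first knowledge source and then withdrawn by the second; when extra partials of a real 440 Hz note are present, both hypotheses survive.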
Basic architecture: machine learning for note identification
- Other systems use machine learning methods for pattern recognition, such as hidden Markov models and neural networks. These necessarily involve a training stage in which input-output pairs from a data set are presented to the model.
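The training stage can be illustrated with the simplest learned classifier, a single perceptron, swapped in here for the HMMs and multi-layer networks mentioned above. In this hypothetical example each input vector holds the strengths of the partials at f0, 2f0 and 3f0 of one target note, and the target output is 1 when that note is present; the features, data and learning rate are invented for illustration.

```python
def predict(w, b, x):
    """Output 1 if the weighted sum of inputs plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(pairs, n_inputs, lr=0.1, epochs=20):
    """Perceptron rule: nudge weights toward the target after every wrong prediction."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in pairs:
            err = target - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


# Hypothetical training pairs: (partial strengths at f0, 2f0, 3f0) -> note present?
pairs = [
    ([1.0, 0.9, 0.8], 1),
    ([0.9, 1.0, 0.7], 1),
    ([0.0, 0.9, 0.0], 0),   # energy only at 2f0, e.g. the note an octave above
    ([0.1, 0.0, 0.9], 0),
    ([0.0, 0.0, 0.0], 0),
]
w, b = train_perceptron(pairs, n_inputs=3)
```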
Assumptions of transcription systems
- Many transcription systems assume a specific instrument as input. Problems that do not arise for that instrument therefore need not be handled by the system.
- Piano transcription systems don't need to deal with notes that modulate in frequency. Piano notes also have pronounced attacks, which makes onset detection easier.
Current problems
- Types of error: missed and spurious notes
- Octave errors (due to ambiguity between partials)
- High polyphony
- Very high and very low notes
- Repeated notes
- Short note durations
- Masking of low-amplitude notes
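The octave error listed above has a simple arithmetic cause: the k-th harmonic of a note at 2f0 is the 2k-th harmonic of the note at f0, so an upper octave adds no partials that the lower note does not already have. The sketch below counts coincident harmonics, assuming ideal integer harmonics (real piano strings are slightly inharmonic) and using a just-intonation fifth (330 Hz over 220 Hz) for comparison; the 1 % tolerance is an assumption.

```python
def harmonics(f0, n=10):
    """First n ideal harmonics of a fundamental f0 (inharmonicity ignored)."""
    return [k * f0 for k in range(1, n + 1)]


def shared_fraction(f_low, f_high, n=10, tol=0.01):
    """Fraction of f_high's first n harmonics that coincide with some harmonic of f_low."""
    hits = 0
    for h in harmonics(f_high, n):
        ratio = h / f_low   # harmonic number of h relative to the lower note
        if abs(ratio - round(ratio)) < tol:
            hits += 1
    return hits / n


print(shared_fraction(110.0, 220.0))  # octave: 1.0
print(shared_fraction(220.0, 330.0))  # just fifth: 0.5
```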
A Connectionist Approach to Automatic Transcription of Piano Music
Auditory model: time-frequency analysis
- The input audio is passed through a bank of logarithmically spaced gammatone filters with center frequencies between 70 and 6000 Hz.
- The output of each filter is processed using Meddis' model of hair cell transduction, which involves half-wave rectification, saturation and reduction. The result is a quasi-periodic impulsive signal that represents the firing patterns of hair cells. The model also performs dynamic compression, which makes low-amplitude partials more detectable.
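The two processing stages can be sketched in Python as follows. This is only a rough illustration: the number of channels (24), the 25 ms impulse-response length and the peak normalization are assumptions, the bandwidths use the Glasberg and Moore ERB formula commonly paired with gammatone filters, and `crude_hair_cell` is a hypothetical stand-in (half-wave rectification followed by a saturating compression) rather than Meddis' model.

```python
import math


def log_spaced_centers(f_min=70.0, f_max=6000.0, n=24):
    """Center frequencies evenly spaced on a log scale (channel count is an assumption)."""
    ratio = (f_max / f_min) ** (1 / (n - 1))
    return [f_min * ratio ** i for i in range(n)]


def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Sampled 4th-order gammatone impulse response with ERB-based bandwidth, peak-normalized."""
    erb = 24.7 * (4.37 * fc / 1000 + 1)   # Glasberg and Moore equivalent rectangular bandwidth
    b = 1.019 * erb
    ir = [(n / fs) ** (order - 1) * math.exp(-2 * math.pi * b * n / fs)
          * math.cos(2 * math.pi * fc * n / fs) for n in range(int(duration * fs))]
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def filter_fir(x, h):
    """Direct-form convolution of x with h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def crude_hair_cell(y, sat=0.5):
    """Hypothetical stand-in, NOT Meddis' model: half-wave rectify, then compress x / (x + sat)."""
    return [max(v, 0.0) / (max(v, 0.0) + sat) for v in y]
```

The compression maps every rectified value into [0, 1), so small inputs are boosted relative to large ones, which is the sense in which compression makes weak partials easier to see.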
Auditory model: time-frequency analysis
- Figure 1: Analysis of three partials of piano note F3 with the auditory model (Marolt, 2004).
Partial tracking using adaptive oscillators
- Partial tracking is achieved using Large-Kolen adaptive oscillators, which synchronize to the frequency and phase of the driving signal (i.e. the output of the auditory model). A partial is identified when an oscillator synchronizes.
- Synchronization follows a modified gradient descent rule that minimizes an error function describing the difference between input events and the beginnings of oscillation cycles.
- The initial frequency of each oscillator is set to the center frequency of the corresponding filter.
- The oscillator attempts synchronization at the beginning of every cycle, so oscillators at lower frequencies are slower to synchronize.
- The author has shown that adaptive oscillators can successfully track frequency-modulated partials.
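A simplified, event-driven version of this synchronization can be sketched in Python. It is not the exact Large-Kolen formulation (which adapts continuously and uses an output-sharpening parameter); it only illustrates the idea stated above: at each input event the oscillator measures the normalized timing error to its nearest cycle start and takes a small gradient step on the squared error, adjusting both phase and period. The learning rates `eta_phase` and `eta_period`, and the use of the first event to set the initial phase, are assumptions.

```python
def track_events(event_times, init_period, eta_phase=0.5, eta_period=0.2):
    """Synchronize a simplified adaptive oscillator (assumption: not the exact
    Large-Kolen equations) to a train of input events.

    Returns the period estimate after each event.
    """
    period = init_period
    cycle_start = event_times[0]        # assumed: the first event fixes the initial phase
    history = []
    for t in event_times[1:]:
        # Nearest cycle start the oscillator predicts for this event.
        n = round((t - cycle_start) / period)
        expected = cycle_start + n * period
        err = (t - expected) / period   # normalized timing error, within [-0.5, 0.5]
        # Gradient steps on err**2: shift the phase and stretch/shrink the period.
        cycle_start = expected + eta_phase * err * period
        period *= 1 + eta_period * err / max(n, 1)
        history.append(period)
    return history


# Input events every 1.1 time units; the oscillator starts with period 1.0.
periods = track_events([k * 1.1 for k in range(41)], init_period=1.0)
```

Because corrections happen once per cycle, an oscillator tracking a 100 Hz partial receives ten times fewer corrections per second than one tracking a 1000 Hz partial, which is why lower frequencies take longer to synchronize.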
Partial tracking using adaptive oscillators
- Figure 2: Partial tracking with adaptive oscillators (Marolt, 2004).
Partial tracking: adaptive oscillator networks
- Groups of partials are tracked using 88 networks of adaptive oscillators, each containing up to ten oscillators. Because partial frequencies are bounded above at 6000 Hz, networks for higher notes contain fewer oscillators. The frequency of each oscillator in a network is initially set to an integer multiple of the note's fundamental frequency.
- An excitatory relationship between the oscillators in a network allows synchronized oscillators to change the frequencies of non-synchronized oscillators (based on their harmonic relationships), which speeds up synchronization.
- The output of a network is a weighted sum of the outputs of its oscillators. Each oscillator is weighted by how close it is to its ideal frequency, so an oscillator that deviates strongly from the ideal contributes less to the network's output.
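The network sizing rule and the weighted output can be sketched as below. The cap of ten oscillators and the 6000 Hz bound come from the slide; the Gaussian fall-off of the weights and its width (`width`, 3 % relative deviation) are assumptions, since the slide only says that oscillators far from their ideal frequencies contribute less.

```python
import math

MAX_OSCILLATORS = 10
F_UPPER_HZ = 6000.0


def network_size(f0):
    """Oscillators at integer multiples of f0 up to 6000 Hz, capped at ten."""
    return min(MAX_OSCILLATORS, int(F_UPPER_HZ // f0))


def network_output(f0, osc_freqs, osc_outputs, width=0.03):
    """Weighted sum of oscillator outputs; weights fall off with deviation from k * f0."""
    total = 0.0
    for k, (f, out) in enumerate(zip(osc_freqs, osc_outputs), start=1):
        deviation = abs(f - k * f0) / (k * f0)         # relative deviation from the ideal
        weight = math.exp(-(deviation / width) ** 2)    # assumed Gaussian fall-off
        total += weight * out
    return total
```

For example, A4 (440 Hz) gets the full ten oscillators because 10 * 440 = 4400 Hz is below 6000 Hz, while C7 (about 2093 Hz) gets only two.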
Artificial neural networks
- Artificial neural network (ANN): a computational model loosely based on the neural structure of the brain. In simple terms, an ANN receives information from various sources through input neurons, combines or transforms it in hidden layers of neurons, and outputs the result through the activation of output neurons.
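The input, hidden and output stages described above correspond to a forward pass through a small fully connected network. In this minimal sketch the sigmoid activation, the layer sizes and the hand-picked weights are arbitrary illustrative assumptions; no training is involved.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Fully connected layer: each neuron sums weighted inputs plus a bias, then squashes."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(inputs, hidden_w, hidden_b)   # hidden neurons transform the inputs
    return layer(hidden, out_w, out_b)           # output neurons produce the result


# Three inputs, two hidden neurons, one output neuron, with hand-picked weights.
y = forward([0.9, 0.1, 0.8],
            hidden_w=[[2.0, -1.0, 1.0], [-1.0, 2.0, -1.0]], hidden_b=[-1.0, 0.0],
            out_w=[[3.0, -2.0]], out_b=[-0.5])
```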
Note detection using neural networks
- The system uses 76 neural networks.
- Each network is trained to recognize a particular note (A1 to C8).
- The input to each network comes from the partial tracking module.
- Each network has a single output neuron; a high output value indicates the presence of the target note.
- The training set consisted of synthesized piano pieces and piano chords, providing 300,000 input-output patterns in total to present to the networks.
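The count of 76 networks follows from the note range: A1 is MIDI note 33 and C8 is MIDI note 108, and 108 - 33 + 1 = 76. The sketch below enumerates that range and turns each network's single output value into a note decision; the trained networks themselves are not shown, and the 0.5 threshold and dictionary interface are assumptions.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(midi):
    """Scientific pitch name, e.g. 33 -> 'A1'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


A1, C8 = 33, 108                        # MIDI numbers of the lowest and highest target notes
TARGET_NOTES = list(range(A1, C8 + 1))  # one network per note


def detect_notes(outputs, threshold=0.5):
    """outputs maps MIDI note -> output neuron value of that note's network (assumed interface)."""
    return [midi_to_name(m) for m in TARGET_NOTES if outputs.get(m, 0.0) > threshold]


print(len(TARGET_NOTES))  # 76
```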
Note detection using neural networks
- Several types of neural network were tested; time-delay neural networks (TDNNs) gave the best results.
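What distinguishes a time-delay network is that each decision sees a short window of consecutive input frames rather than a single frame. The sketch below shows only that input arrangement, with a hypothetical helper and an assumed window of two past frames; a full TDNN also shares weights across time shifts, which is not shown.

```python
def time_delay_inputs(frames, delays=2):
    """Concatenate each frame with its `delays` predecessors.

    frames: list of feature vectors, one per time step. Returns one vector per
    time step t >= delays, built from frames[t - delays] ... frames[t].
    """
    windows = []
    for t in range(delays, len(frames)):
        window = []
        for frame in frames[t - delays:t + 1]:
            window.extend(frame)
        windows.append(window)
    return windows


print(time_delay_inputs([[0, 1], [2, 3], [4, 5], [6, 7]]))
# [[0, 1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7]]
```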
Note detection using neural networks
- The author compared the results of using his own partial tracking method as input to the time-delay neural networks with those of using a time-frequency transform (similar to the constant-Q transform). The partial tracking method performed better.
- Table 2: Average performance of systems with and without partial tracking (Marolt, 2004).
SONIC: a system for transcription of piano music
- The partial tracking and note detection systems were incorporated into the SONIC system for piano transcription. SONIC can detect note onsets, lengths and loudness, as well as a particular piano's tuning and the presence of repeated notes.
- SONIC is available for download at http://lgm.fri.uni-lj.si/SONIC.html
SONIC: a system for transcription of piano music
- Figure 4: Structure of SONIC (Marolt, 2004).
SONIC: onset detection
- The onset detector splits the audio into 22 frequency bands. Each band's output is filtered to give a positive value when the signal rises and a negative value otherwise. The filter outputs control the activation of neurons that emit impulses indicating onsets. Multilayer perceptrons (MLPs) then determine whether each impulse represents an onset or some other type of amplitude disturbance.
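The band-wise rise detection described above can be sketched as follows. SONIC uses 22 bands and neuron-based impulse generation followed by MLP classification; this hypothetical simplification uses a first difference as the filter (positive while a band envelope rises, negative otherwise), sums only the rising parts over bands, and reports frames where the sum first crosses a threshold. The MLP stage is omitted, and the threshold is an assumption.

```python
def onset_candidates(band_envelopes, threshold):
    """band_envelopes[b][t]: amplitude envelope of band b at frame t.

    Returns frames where the summed rise first exceeds the threshold
    (in SONIC, an MLP would then classify each candidate).
    """
    n_frames = len(band_envelopes[0])
    rise = [0.0] * n_frames
    for env in band_envelopes:
        for t in range(1, n_frames):
            # First difference: positive while rising; keep only the positive part.
            rise[t] += max(env[t] - env[t - 1], 0.0)
    return [t for t in range(1, n_frames)
            if rise[t] > threshold and rise[t - 1] <= threshold]


# Two bands (instead of 22) with attacks at frames 2 and 7.
bands = [
    [0.0, 0.0, 1.0, 0.9, 0.8, 0.7, 0.6, 1.5, 1.4],
    [0.0, 0.0, 0.5, 0.45, 0.4, 0.35, 0.3, 0.8, 0.7],
]
print(onset_candidates(bands, threshold=0.5))  # [2, 7]
```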
SONIC: repeated notes
- Repeated notes are hard to detect when a chord containing a repeated note also contains other notes that share its partials. SONIC uses MLPs to detect repeated notes.
SONIC: tuning, note lengths and dynamics
- The piano's tuning must be detected before transcription. Adaptive oscillators are used to detect partials, and the tuning is then calculated as a weighted sum of the deviations of the partials from their ideal frequencies.
- Note terminations (and therefore note lengths) are detected when the activation of the note networks falls below a threshold.
- Dynamics are calculated from the amplitude envelope of each note's first harmonic.
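The three steps on this slide can be sketched as small Python functions. The sketch normalizes the weighted sum of deviations into a weighted mean in cents so that it reads as a tuning offset; the weights (for example partial amplitudes), the activation threshold and the use of the envelope peak in dB as the dynamic level are assumptions rather than SONIC's documented choices.

```python
import math


def cents(f, f_ideal):
    """Deviation of f from f_ideal in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(f / f_ideal)


def estimate_tuning(partials):
    """partials: (measured_hz, ideal_hz, weight) triples -> weighted mean deviation in cents."""
    total_w = sum(w for _, _, w in partials)
    return sum(w * cents(f, ideal) for f, ideal, w in partials) / total_w


def note_end(activation, threshold):
    """First frame at which a note's activation drops below the (assumed) threshold, or None."""
    for t, a in enumerate(activation):
        if a < threshold:
            return t
    return None


def peak_level_db(envelope):
    """Assumed dynamic measure: peak of the first-harmonic amplitude envelope in dB."""
    return 20.0 * math.log10(max(envelope))
```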
Performance statistics
- Table 3: Performance statistics of transcriptions of 3 synthesized and 3 real piano recordings (Marolt, 2004).
- Transcription results are available at http://lgm.fri.uni-lj.si/SONIC.html
Error discussion
- Most errors were octave errors and errors on repeated notes. Additional sources of error include fast passages (such as arpeggios or trills), masking of low-amplitude notes, missed onsets, high polyphony and very low-pitched notes.