Piano Music Transcription Systems - PowerPoint PPT Presentation

About This Presentation

Slides: 26

Provided by: Gre9198

Transcript and Presenter's Notes

1
Piano Music Transcription Systems

Presented by Greg Eustace

2
Overview

Introduction to polyphonic transcription systems,
including pioneering work, basic architecture,
piano transcription systems and current problems.
Discussion of Marolt's paper "A Connectionist
Approach to Automatic Transcription of Piano
Music", including adaptive oscillator networks,
neural networks and the SONIC system for piano
transcription.

3
Polyphonic transcription

Transcription: the extraction of symbolic
(notational) information from music, including
the pitches, dynamic levels, onset times and
durations of notes.
For automatic transcription systems, the input is
an audio file and the output could be a MIDI or
score file.
Polyphonic transcription systems have been
developed since the mid-1970s, beginning with the
work of Moorer, whose system assumed only two
voices, separated in frequency range, with
different timbres and restricted intervallic
relationships.
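To make the definition concrete, the symbolic output can be pictured as a list of note events. The sketch below is a minimal, hypothetical representation (the `NoteEvent` type and `freq_to_midi` helper are illustrative names, not part of any system discussed here); it maps a fundamental frequency to a MIDI note number using the standard equal-tempered relation with A4 = 440 Hz.

```python
import math
from dataclasses import dataclass

@dataclass
class NoteEvent:
    """One transcribed note: the symbolic information a transcription system extracts."""
    pitch: int        # MIDI note number (60 = middle C)
    onset: float      # onset time in seconds
    duration: float   # duration in seconds
    velocity: int     # dynamic level on the MIDI scale 1..127

def freq_to_midi(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered MIDI note number for a fundamental frequency."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

# A hypothetical transcription of a C4 followed by an A4.
events = [
    NoteEvent(pitch=freq_to_midi(261.63), onset=0.0, duration=0.5, velocity=80),
    NoteEvent(pitch=freq_to_midi(440.0), onset=0.5, duration=1.0, velocity=64),
]
```

Writing such a list to a MIDI file is then a matter of serializing each event as a note-on/note-off pair.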

4
Basic architecture of transcription systems
Front-end processing

The audio signal is converted to a time-frequency
representation, such as that provided by the
short-time Fourier transform (STFT).
Partials present in each frame are identified and
their frequency and amplitude information is
extracted. Peak picking is commonly achieved by
setting an amplitude threshold.
Partial tracks are formed by connecting
partials across frames based on their amplitude
and frequency relationships.
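The three front-end steps can be sketched in plain Python as follows. This is a minimal illustration, not a production front end: it uses a naive DFT on non-overlapping Hann-windowed frames instead of an FFT-based STFT, picks peaks as local maxima above a fixed amplitude threshold, and links peaks between consecutive frames by frequency proximity only (a real tracker would also use amplitude). All function names and parameters are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT of one Hann-windowed frame; returns |X[k]| for bins k = 0..N/2."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    return [abs(sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]

def pick_peaks(mags, threshold):
    """Bins that are local maxima and whose magnitude exceeds an amplitude threshold."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > threshold and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]

def track_partials(peaks_per_frame, max_jump=1):
    """Greedily extend each active track with the nearest peak (in bins) of the next frame;
    unmatched peaks start new tracks. Returns tracks as lists of (frame, bin) pairs."""
    tracks, active = [], []
    for t, peaks in enumerate(peaks_per_frame):
        unused = list(peaks)
        still_active = []
        for track in active:
            last_bin = track[-1][1]
            match = min(unused, key=lambda b: abs(b - last_bin), default=None)
            if match is not None and abs(match - last_bin) <= max_jump:
                track.append((t, match))
                unused.remove(match)
                still_active.append(track)
        for b in unused:
            new_track = [(t, b)]
            tracks.append(new_track)
            still_active.append(new_track)
        active = still_active
    return tracks
```

Bin k corresponds to frequency k * sr / N, so with sr = 8000 Hz and N = 256, a steady 500 Hz + 1000 Hz tone yields two tracks at bins 16 and 32.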

5
Basic architecture: blackboard systems for note
identification

So-called blackboard systems use various criteria
to group partials together in order to identify
notes.
Each criterion is represented by a knowledge
source. These could include information from
physics, psychoacoustics or music theory.
Blackboard systems often unify top-down and
bottom-up approaches.
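A blackboard system can be reduced to a shared data structure (the blackboard) plus independent knowledge sources that read and update it. The toy sketch below assumes three hypothetical knowledge sources: a bottom-up one that proposes every partial as a candidate fundamental, a physics-based one that scores candidates by harmonicity, and an instrument-range one that discards candidates outside the piano's range. A real blackboard controller schedules sources opportunistically; here they simply run in a fixed order.

```python
PIANO_LOWEST_HZ = 27.5      # A0
PIANO_HIGHEST_HZ = 4186.01  # C8

def propose_candidates(board):
    """Bottom-up source: every observed partial could be a fundamental."""
    board["hypotheses"] = {freq: 0.0 for freq, _ in board["partials"]}

def harmonicity(board, tolerance=0.03):
    """Physics source: score each candidate by the amplitude of partials near its harmonics."""
    for f0 in board["hypotheses"]:
        for freq, amp in board["partials"]:
            harmonic = round(freq / f0)
            if harmonic >= 1 and abs(freq / f0 - harmonic) < tolerance:
                board["hypotheses"][f0] += amp

def piano_range(board):
    """Instrument source: discard candidates outside the piano's range."""
    board["hypotheses"] = {f0: score for f0, score in board["hypotheses"].items()
                           if PIANO_LOWEST_HZ <= f0 <= PIANO_HIGHEST_HZ}

def run_blackboard(partials, sources):
    """Run each knowledge source on the shared blackboard, then return the best hypothesis."""
    board = {"partials": partials, "hypotheses": {}}
    for source in sources:
        source(board)
    return max(board["hypotheses"], key=board["hypotheses"].get)

# (frequency in Hz, amplitude) pairs from a hypothetical front end.
partials = [(220.0, 1.0), (440.0, 0.6), (660.0, 0.4), (880.0, 0.3), (5000.0, 0.1)]
best = run_blackboard(partials, [propose_candidates, harmonicity, piano_range])
```

Note that a candidate at 110 Hz would explain these partials just as well; only the bottom-up source keeps it off the blackboard. This is the kind of ambiguity behind the octave errors listed under current problems.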

6
Basic architecture: machine learning for note
identification

Other systems use machine learning for pattern
recognition, such as hidden Markov models and
neural networks. This necessarily involves a
training stage in which input-output pairs from a
data set are presented to the model.
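The training stage can be illustrated with the simplest possible learner, a single perceptron trained with the classic perceptron learning rule. The toy data set is hypothetical: each input holds two features (say, whether energy is present at a note's fundamental and at its second harmonic), and the target output is 1 only when both are present. Real transcription systems use far richer features and models.

```python
def predict(weights, bias, x):
    """Threshold unit: 1 if the weighted sum exceeds zero, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(pairs, epochs=20, learning_rate=1.0):
    """Present each (input, target) pair repeatedly, nudging the weights after every error."""
    weights = [0.0] * len(pairs[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in pairs:
            error = target - predict(weights, bias, x)
            if error:
                errors += 1
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if errors == 0:   # every training pair is now classified correctly
            break
    return weights, bias

# Hypothetical pairs: (energy at fundamental, energy at 2nd harmonic) -> note present?
training_pairs = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
weights, bias = train_perceptron(training_pairs)
```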

7
Assumptions of transcription systems

Many transcription systems assume a specific
instrument as input. Thus, transcription problems
that do not arise with that instrument need not
be accounted for in the system.
Piano transcription systems don't need to deal
with notes that modulate in frequency. Piano
notes also have pronounced attacks, making onset
detection easier.

8
Current problems

Types of error: missed and spurious notes
Octave errors (due to ambiguity between partials)
High polyphony
Very high and very low notes
Repeated notes
Short note durations
Masking of low-amplitude notes

9
A Connectionist Approach to Automatic
Transcription of Piano Music

Matija Marolt

10
Auditory model: time-frequency analysis

The input audio is passed through a bank of
logarithmically spaced gammatone filters, with
center frequencies between 70 and 6000 Hz.
The output of each filter is processed using
Meddis' model of hair cell transduction, which
involves half-wave rectification, saturation and
reduction. This results in a quasi-periodic
impulsive signal that represents the firing
patterns of hair cells. The model also performs
dynamic range compression, which makes
low-amplitude partials easier to detect.
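The front end described above can be approximated with the sketch below, which assumes standard choices that the slide does not specify: fourth-order gammatone filters with ERB-based bandwidths (Glasberg and Moore approximation). The `crude_hair_cell` function is only a stand-in for Meddis' model: it keeps half-wave rectification and saturation but omits the model's transmitter-reservoir dynamics, so it shows the compressive effect without modeling firing patterns. All names are hypothetical.

```python
import math

def log_spaced_centers(n_channels, f_low=70.0, f_high=6000.0):
    """Center frequencies spaced evenly on a logarithmic axis between f_low and f_high."""
    ratio = (f_high / f_low) ** (1.0 / (n_channels - 1))
    return [f_low * ratio ** i for i in range(n_channels)]

def gammatone_ir(fc, sr, duration=0.06, order=4):
    """Peak-normalized gammatone impulse response t^(n-1) exp(-2 pi b t) cos(2 pi fc t)."""
    erb = 24.7 + 0.108 * fc      # equivalent rectangular bandwidth in Hz (approximation)
    b = 1.019 * erb              # bandwidth parameter commonly used with 4th-order gammatones
    ir = [(i / sr) ** (order - 1) * math.exp(-2 * math.pi * b * i / sr)
          * math.cos(2 * math.pi * fc * i / sr) for i in range(int(duration * sr))]
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

def fir_filter(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] x[n-k]."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h)))) for n in range(len(x))]

def crude_hair_cell(y, gain=5.0):
    """Crude stand-in for hair-cell transduction (NOT Meddis' model): half-wave
    rectification followed by tanh saturation, which compresses large values."""
    return [math.tanh(gain * max(v, 0.0)) for v in y]
```

Because tanh flattens large inputs, a partial 40 dB weaker than another ends up much less than 40 dB weaker after `crude_hair_cell`, which is the sense in which compression makes low-amplitude partials more detectable.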

11
Auditory model: time-frequency analysis

Figure 1: Analysis of three partials of piano
note F3 with the auditory model (Marolt, 2004).

12
Partial tracking using adaptive oscillators

Partial tracking is achieved using Large-Kolen
adaptive oscillators, which synchronize to the
frequency and phase of the driving signal (i.e.
the output of the auditory model). A partial is
identified when an oscillator synchronizes to it.
Synchronization follows a modified gradient
descent rule that minimizes an error function
describing the difference between input events
and the beginnings of oscillation cycles.
The initial frequency of each oscillator is set
to the center frequency of the corresponding
filter.
The oscillator attempts to synchronize at the
beginning of every cycle, so oscillators at lower
frequencies (with longer cycles) are slower to
synchronize.
The author has shown that adaptive oscillators
can successfully track frequency-modulated
partials.
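The idea of synchronizing an oscillator to a train of input events can be sketched with a simplified, discrete-time oscillator. This is not the Large-Kolen model itself (which uses a continuous output function and its own update equations); it only illustrates the principle described above. Once per cycle, the phase error between an input event and the predicted cycle start is measured, and phase and period are nudged to reduce the squared error. The learning rates are arbitrary assumptions.

```python
def synchronize(events, f_init, eta_phase=0.5, eta_period=0.3):
    """Adapt an oscillator's phase and period to input event times (in seconds).
    Returns the oscillator frequency (Hz) after each event."""
    period = 1.0 / f_init          # start at the filter's center frequency
    cycle_start = events[0]        # predicted start of the current cycle
    freqs = []
    for t in events:
        # Move to the predicted cycle start nearest to this event.
        cycle_start += round((t - cycle_start) / period) * period
        # Phase error in cycles, in [-0.5, 0.5]; positive means the event came late.
        err = (t - cycle_start) / period
        # Corrections that shrink err**2: shift the phase and stretch the period toward the event.
        cycle_start += eta_phase * err * period
        period += eta_period * err * period
        cycle_start += period      # predicted start of the next cycle
        freqs.append(1.0 / period)
    return freqs

# A 100 Hz partial driving an oscillator whose filter is centered at 90 Hz.
freqs = synchronize([k / 100.0 for k in range(60)], f_init=90.0)
```

Because corrections happen only once per cycle, an oscillator driven at 50 Hz needs twice as much time as one driven at 100 Hz to make the same number of corrections, which is why lower frequencies synchronize more slowly.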

13
Partial tracking using adaptive oscillators

Figure 2: Partial tracking with adaptive
oscillators (Marolt, 2004).

14
Partial tracking: adaptive oscillator networks

Groups of partials are tracked using 88 networks
of adaptive oscillators, each containing up to
ten oscillators. Networks for higher notes contain
fewer oscillators, because partial frequencies
are bounded above at 6000 Hz. The frequency of
each oscillator in a network is initially set to
an integer multiple of the note's fundamental.
Excitatory connections between the oscillators in
a network allow synchronized oscillators to change
the frequencies of non-synchronized oscillators
(based on harmonic relationships), which speeds up
synchronization.
The output of a network is a weighted sum of the
outputs of its oscillators. Each oscillator is
weighted according to how close its frequency is
to the ideal (harmonic) frequency, so an
oscillator that deviates strongly from the ideal
contributes less to the output of the network.
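The bookkeeping in such a network can be sketched as follows. The 6000 Hz bound and the cap of ten oscillators come from the slide; the coupling rule (re-tuning non-synchronized oscillators to harmonics of the fundamental implied by the synchronized ones) and the linear weighting function are hypothetical simplifications, not the equations used in the paper.

```python
def network_size(f0, f_max=6000.0, max_oscillators=10):
    """Number of oscillators: one per harmonic below f_max, capped at max_oscillators."""
    return max(1, min(max_oscillators, int(f_max // f0)))

def initial_frequencies(f0):
    """Oscillator h starts at the h-th harmonic of the note's fundamental."""
    return [h * f0 for h in range(1, network_size(f0) + 1)]

def excite(freqs, synced):
    """Hypothetical excitatory coupling: re-tune non-synchronized oscillators to the
    harmonics implied by the synchronized ones."""
    estimates = [f / h for h, (f, s) in enumerate(zip(freqs, synced), start=1) if s]
    if not estimates:
        return list(freqs)
    f0 = sum(estimates) / len(estimates)
    return [f if s else h * f0 for h, (f, s) in enumerate(zip(freqs, synced), start=1)]

def harmonic_weight(freq, ideal, tolerance=0.03):
    """Hypothetical weight: 1 at the ideal frequency, falling linearly to 0 at a
    relative deviation of `tolerance`."""
    return max(0.0, 1.0 - abs(freq - ideal) / ideal / tolerance)

def network_output(f0, freqs, outputs):
    """Weighted sum of oscillator outputs; oscillators far from their ideal harmonic count less."""
    return sum(harmonic_weight(f, h * f0) * out
               for h, (f, out) in enumerate(zip(freqs, outputs), start=1))
```

For A0 (27.5 Hz) the network gets the full ten oscillators, for a 1000 Hz note only six, and near the top of the keyboard a single oscillator remains.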

15
Artificial Neural Networks

Artificial Neural Network (ANN): a model based on
the neural structure of the brain. In simple
terms, an ANN receives information from various
sources through input neurons, combines or
transforms it (a task handled by neurons in
hidden layers) and outputs the result via the
firing of output neurons.
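A forward pass through a tiny network makes the input/hidden/output description concrete. The hand-picked weights below are purely illustrative: with two sigmoid hidden neurons (roughly an OR detector and an AND detector) and one output neuron, the network computes XOR, which a single neuron cannot.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Each neuron outputs sigmoid(weighted sum of its inputs + bias)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked weights (illustrative only): hidden neuron 1 ~ OR, hidden neuron 2 ~ AND.
HIDDEN_W = [[20.0, 20.0], [20.0, 20.0]]
HIDDEN_B = [-10.0, -30.0]
OUTPUT_W = [[20.0, -20.0]]   # output fires when OR is on but AND is off
OUTPUT_B = [-10.0]

def forward(x):
    """Input neurons -> hidden layer -> single output neuron."""
    hidden = layer(x, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]
```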

16
Note detection using Neural Networks

The system uses 76 neural networks.
Each network is trained to recognize a particular
note (A1 to C8).
Each network takes its input from the partial
tracking module.
Each network has a single output neuron; a high
output value indicates the presence of the
target note.
The data set consisted of synthesized piano
pieces and piano chords, providing 300,000
input-output patterns in total for training the
networks.
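A quick check of the note range: in MIDI numbering A1 is note 33 and C8 is note 108, which gives exactly 76 notes, one per network. The sketch below (hypothetical names; the 0.5 decision threshold is an assumption) maps each network's single output to a note name and reports the notes whose output is high.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(midi):
    """Scientific pitch name, e.g. 60 -> 'C4', 21 -> 'A0'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

# One note-detection network per note from A1 (MIDI 33) to C8 (MIDI 108).
NETWORK_NOTES = list(range(33, 109))

def detected_notes(network_outputs, threshold=0.5):
    """Names of the notes whose network's output neuron exceeds the threshold."""
    return [midi_to_name(m) for m, out in zip(NETWORK_NOTES, network_outputs) if out > threshold]
```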

17
Note detection using Neural Networks

Several different types of neural networks were
tested; time-delay neural networks (TDNNs) gave
the best results.
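What distinguishes a time-delay neural network is that each neuron sees not just the current input frame but a short window of past frames, so it can respond to how partial amplitudes evolve over time. The single-unit sketch below illustrates that sliding window; the window length, weights and sigmoid activation are illustrative assumptions, not the configuration used in SONIC.

```python
import math

def tdnn_unit(frames, weights, bias, activation=lambda z: 1.0 / (1.0 + math.exp(-z))):
    """One time-delay neuron. weights[d] is applied to the frame d steps in the past,
    so each output sees a sliding window of len(weights) consecutive frames."""
    delays = len(weights)
    outputs = []
    for t in range(delays - 1, len(frames)):
        z = bias
        for d, w_d in enumerate(weights):
            z += sum(w * x for w, x in zip(w_d, frames[t - d]))
        outputs.append(activation(z))
    return outputs
```

With a three-frame window, an input of four frames yields two outputs, one for each position where the full window fits.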

18
Note detection using neural networks

The author compared the results of using the
partial tracking method as input to the
time-delay NNs with those of using a
time-frequency transform (similar to the
constant-Q transform) as input. The partial
tracking method performed better.
Table 2: Average performance of systems with and
without partial tracking (Marolt, 2004).

19
SONIC: A system for transcription of piano music

The partial tracking and note detection systems
were incorporated into the SONIC system for
piano transcription. SONIC can detect note
onsets, lengths and loudness, as well as a
particular piano's tuning and the presence of
repeated notes.
SONIC is available for download at
http://lgm.fri.uni-lj.si/SONIC.html

20
SONIC: A system for transcription of piano music

Figure 4: Structure of SONIC (Marolt, 2004).

21
SONIC: onset detection

The onset detector splits the audio into 22
frequency bands. The band outputs are filtered so
that they are positive when the signal rises and
negative otherwise. The filter outputs control
the activation of neurons that emit impulses
indicating onsets. Multilayer perceptrons (MLPs)
then determine whether each impulse represents an
onset or some other type of amplitude
disturbance.
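The rise-detection and impulse stages can be sketched for a single band as follows. The first-difference filter and the leaky integrate-and-fire neuron are simple assumptions standing in for the filters and neurons described above; the threshold and leak values are arbitrary, and the final MLP classification step is omitted.

```python
def rising_part(envelope):
    """First difference of a band's amplitude envelope: positive where it rises,
    negative where it falls."""
    return [0.0] + [b - a for a, b in zip(envelope, envelope[1:])]

def integrate_and_fire(drive, threshold=0.5, leak=0.9):
    """Leaky integrate-and-fire neuron; returns the frame indices where it emits an impulse."""
    v = 0.0
    spikes = []
    for i, x in enumerate(drive):
        v = max(0.0, leak * v + x)   # a falling envelope (negative input) discharges the neuron
        if v >= threshold:
            spikes.append(i)
            v = 0.0                  # reset after firing
    return spikes

# Hypothetical band envelope: silence, a decaying note, then a new attack.
envelope = [0.0] * 5 + [0.9 ** k for k in range(15)] + [0.9 ** k for k in range(10)]
onset_frames = integrate_and_fire(rising_part(envelope))
```

In a full detector, one such neuron would run per frequency band, and each impulse would then be passed to a classifier.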

22
SONIC: repeated notes

Repeated notes are difficult to detect when the
chord containing the repeated note also contains
other notes that share its partials. SONIC uses
MLPs to detect repeated notes.

23
SONIC: tuning, note lengths and dynamics

The piano's tuning needs to be detected prior to
transcription. Adaptive oscillators are used to
detect partials, and the tuning is then
calculated as a weighted sum of the deviations of
the partials from their ideal frequencies.
Note terminations (and therefore note lengths)
are detected when the output of the note
activation networks falls below a threshold.
Dynamics are calculated from the amplitude
envelope of each note's first harmonic.
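The tuning and note-length calculations can be sketched as follows. Measuring deviations in cents makes partials at different frequencies comparable; weighting each partial by its amplitude, and normalizing by the total weight, are assumptions (the slide only says the sum is weighted). For note lengths, a note is taken to end at the first frame where its activation drops to or below a threshold; the 0.5 value is arbitrary.

```python
import math

def cents(freq, reference):
    """Deviation of freq from reference in cents (100 cents = 1 equal-tempered semitone)."""
    return 1200.0 * math.log2(freq / reference)

def estimate_tuning(partials):
    """Amplitude-weighted mean deviation (in cents) of detected partials from their
    ideal frequencies. partials: list of (measured_hz, amplitude, ideal_hz) tuples."""
    total = sum(amp for _, amp, _ in partials)
    return sum(amp * cents(f, ideal) for f, amp, ideal in partials) / total

def note_segments(activation, threshold=0.5):
    """(start, end) frame pairs, end exclusive, where a note-activation output
    stays above the threshold."""
    segments, start = [], None
    for i, a in enumerate(activation):
        if a > threshold and start is None:
            start = i
        elif a <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(activation)))
    return segments
```

A piano tuned 10 cents sharp would give an estimate near +10, and the ideal frequencies used for transcription can then be scaled by 2 ** (10 / 1200).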

24
Performance Statistics

Table 3: Performance statistics of transcriptions
of 3 synthesized and 3 real piano recordings
(Marolt, 2004).
Transcription results are available at
http://lgm.fri.uni-lj.si/SONIC.html

25
Error Discussion

Most errors were octave errors and errors on
repeated notes. Additional sources of error
included fast passages (such as arpeggios or
trills), masking of low-amplitude notes, missed
onsets, high polyphony and very low-pitched
notes.
