Title: CUIDADO UPF status report
1. CUIDADO UPF status report
- M2 meeting
- March 2002
- Perfecto Herrera
- Universitat Pompeu Fabra
2. Outline
- Music Description Schemes (Deliverable 2.2.1)
- Modules Developed (Deliverable 2.2.2)
- Melody description extraction module
- Rhythm description extraction module
- Interesting segments extraction module
- Timescaling module
- Other contributions
- What's next?
3. Music Description Schemes (Deliverable 2.2.1)
- Segment description schemes
- Melody description schemes
- Rhythm description schemes
- Instrument description schemes
- Some useful DSs currently in MPEG-7
4. Music Description Schemes: Goals
- Develop music description schemes that can be used by the different CUIDADO prototypes and partners
- Keep MPEG-7 XML-schema compatibility
- Address different musical layers
- Address different abstraction levels (from lower to higher)
5. Segment description
- An AudioSegment is the basic unit of audio description.
- Segments can be decomposed into other segments.
- Descriptors and other audio DSs can be included in segment descriptions.
6. Segment description
Some segment boundaries are clearer than others; we therefore need a low-level descriptor that indicates their reliability.
7. Melody description
8. Melody description: pending issues
- Mid-level descriptors derived from the pitch contour: scale, tonality, interval distributions, tessitura... (a sketch of two of these follows this slide)
- Mid-level descriptors derived from deviations from a score or grid
- Higher-level subjective terms (energetic, solemn, etc.)
Input from other partners is needed
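As an illustration of the first pending item, a minimal sketch of two such mid-level descriptors, assuming the melody extractor provides note pitches as MIDI numbers (function names are hypothetical):

from collections import Counter

def interval_distribution(midi_pitches):
    # Histogram of successive intervals (in semitones),
    # normalized to relative frequencies.
    intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    counts = Counter(intervals)
    total = sum(counts.values()) or 1
    return {iv: n / total for iv, n in counts.items()}

def tessitura(midi_pitches):
    # Pitch range (lowest, highest) of the phrase.
    return min(midi_pitches), max(midi_pitches)

# Example: an ascending and descending C major arpeggio
print(interval_distribution([60, 64, 67, 72, 67, 64, 60]))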
9. Rhythm description
10. Rhythm description: pending issues
- Tempo variation
- Meter extraction
- Deviations from structure
- Connections with instrument labelling
- Rhythm pattern taxonomies
- Higher-level subjective terms
Input from other partners is needed
11. Instrument description
The ClassificationScheme DS can be used to build conceptual taxonomies (e.g. genre, subjective ratings, etc.). The SoundCategory DS can be used specifically for instrument taxonomies.
12. Instrument description
The current SoundModel DS assumes a SpectrumBasis decomposition; other possible modeling strategies should be included.
13. Auxiliary DSs (no new proposals here)
- Creation and Production DSs are the right structures for describing information about the creation and production of the multimedia content (examples: who the producer was, where it was recorded, how it is subjectively rated).
- Media DSs are the structures for describing the media-specific information of the multimedia data (examples: physical format, profiles of coding parameters).
- Usage DSs are intended for describing the possible ways of using, or having used, the content (examples: owners of the rights, usage permissions (where, when, how, by whom), cost of the creation, user settings...).
14. Music Description Extractors and Transformers (Deliverable 2.2.2)
- Interesting segments extractor
- Melody description extractor
- Rhythm description extractor
- Timescaling module
15. Interesting segments extractor
- Implementation of Foote's (2000) algorithm (low self-similarity means novelty; see the sketch after this list)
- The self-similarity scale is user-controlled
- "Interestingness" is defined by choosing a set of low-level descriptors
- Best run in supervised mode (the user gets graphs, then selects better parameter values)
- Output: segment boundary points and a novelty measure
- Functional dependency on low-level descriptors (D2.1.1); integration not yet solved
- Enhanced functionalities when combined with some post-processing (e.g. solo location)
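A minimal sketch of the novelty computation, after Foote (2000), assuming NumPy and a frame-wise matrix of the chosen low-level descriptors (parameter names are illustrative; kernel_size plays the role of the user-controlled self-similarity scale):

import numpy as np

def novelty_curve(features, kernel_size=32):
    # features: (n_frames, n_descriptors) matrix of low-level descriptors.
    # Peaks of the returned curve are candidate segment boundaries.
    norms = np.linalg.norm(features, axis=1, keepdims=True) + 1e-12
    unit = features / norms
    S = unit @ unit.T  # cosine self-similarity matrix

    # Checkerboard kernel: +1 on within-segment blocks, -1 on cross-segment blocks
    k = kernel_size
    sign = np.ones((2 * k, 2 * k))
    sign[:k, k:] = -1
    sign[k:, :k] = -1

    n = S.shape[0]
    novelty = np.zeros(n)
    for i in range(k, n - k):
        # Correlate the kernel with the region around the diagonal
        novelty[i] = np.sum(S[i - k:i + k, i - k:i + k] * sign)
    return novelty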
16. Interesting segments extractor
17. Interesting segments extractor
- Note segmentation using energy, kurtosis, and F0
18. Interesting segments extractor
- Section segmentation using energy, centroid and
kurtosis
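Slides 17 and 18 run the extractor over different descriptor sets; a NumPy sketch of the frame descriptors they name (how the original modules defined kurtosis is not stated; this sketch uses the fourth standardized moment of the magnitude spectrum):

import numpy as np

def frame_descriptors(frame, sr):
    # Energy, spectral centroid, and spectral kurtosis of one audio frame.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    p = spectrum / (np.sum(spectrum) + 1e-12)  # spectrum as a distribution
    energy = float(np.sum(frame ** 2))
    centroid = float(np.sum(freqs * p))
    spread = float(np.sqrt(np.sum((freqs - centroid) ** 2 * p)))
    kurt = float(np.sum((freqs - centroid) ** 4 * p) / (spread ** 4 + 1e-12))
    return energy, centroid, kurt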
19. Melody description extractor
- Currently intended only for monophonic phrases
- Limitations:
- Note segmentation needs improvement (better transient detection)
20. Melody description extractor
21. Rhythm description extraction module
- Extracts timing and rhythmic data from drum loops
- Consists of several sub-modules:
- Onset detector (a sketch follows this list)
- Pulse (i.e. rhythmic level) detectors:
- Tick
- Tempo
- Instrument labeler (yet to be implemented)
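A minimal sketch of what the onset-detector sub-module could look like; half-wave rectified spectral flux with simple peak picking is an assumption, not necessarily the method implemented in the module:

import numpy as np

def onset_times(audio, sr, frame=1024, hop=512, threshold=1.5):
    # Returns onset times in seconds; the pulse detectors (tick, tempo)
    # would then look for periodicities among these times.
    window = np.hanning(frame)
    n_frames = 1 + (len(audio) - frame) // hop
    prev, flux = None, []
    for i in range(n_frames):
        spec = np.abs(np.fft.rfft(audio[i * hop:i * hop + frame] * window))
        if prev is not None:
            flux.append(np.sum(np.maximum(spec - prev, 0.0)))  # rectified flux
        prev = spec
    flux = np.asarray(flux)
    mean = flux.mean() + 1e-12
    # Keep local maxima that exceed a multiple of the mean flux
    return [(i + 1) * hop / sr
            for i in range(1, len(flux) - 1)
            if flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]
            and flux[i] > threshold * mean]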
22. Rhythm description extraction module
23. Instrument description: instrument labelling
- Towards automatic labelling of percussive slices
- First step: automatic labelling of isolated percussive sounds
- Next: proceed with mixtures of 2 and 3 sounds
24. Rhythm description: instrument labelling (submitted to ICMC 2002)
Taxonomy levels considered:
- Kick-Snare-OpenHH-ClosedHH-HiTom-MidTom-LoTom-Crash-Ride
- Membranes-Plates
- Kick-Snare-Hihat-Tom-Cymbals
Examples of induced rules (rendered as code at the end of this slide):
- IF SKEWNESS > 4.619122 AND B40HZ70HZ > 7.784892 AND MFCC3 < 1.213368 THEN Kick (105.0/0.0)
- IF KURTOSIS > 26.140138 AND TEMPORALCE < 0.361035 AND ATTZCR > 1.478743 THEN Tom (103.0/0.0)
- IF B710KHZ < 0.948147 AND KURTOSIS < 26.140138 AND ATTZCR < 22.661397 THEN Snare (133.0/0.0)
- IF SPECCENTROID > 11.491498 AND B1015KHZ > 0.791702 THEN HH (100.0/2.0)
- IF SKEWNESS < 4.485531 AND B160HZ190HZ < 5.446338 AND MFCC3VAR > 0.212043 AND MFCC4 > -0.435871 THEN Cymbal (110.0/3.0)
- Database with > 600 rock drum sounds
- Initial set of 50 descriptors reduced to 20-30 after feature selection
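The induced rules above translate directly into code; a sketch (the descriptor dictionary keys follow the slide's names; sounds matched by no rule are left unclassified):

def classify_drum(d):
    # d: dict of descriptor values for one isolated percussive sound.
    if d["SKEWNESS"] > 4.619122 and d["B40HZ70HZ"] > 7.784892 and d["MFCC3"] < 1.213368:
        return "Kick"
    if d["KURTOSIS"] > 26.140138 and d["TEMPORALCE"] < 0.361035 and d["ATTZCR"] > 1.478743:
        return "Tom"
    if d["B710KHZ"] < 0.948147 and d["KURTOSIS"] < 26.140138 and d["ATTZCR"] < 22.661397:
        return "Snare"
    if d["SPECCENTROID"] > 11.491498 and d["B1015KHZ"] > 0.791702:
        return "HH"
    if (d["SKEWNESS"] < 4.485531 and d["B160HZ190HZ"] < 5.446338
            and d["MFCC3VAR"] > 0.212043 and d["MFCC4"] > -0.435871):
        return "Cymbal"
    return None  # no rule fired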
25. Timescaling module
- Time compression and expansion of audio content (a toy sketch follows this list)
- Very high quality: no timbre or pitch alteration; transient and stereo-image preservation
- Usable for content-based transformations (given content descriptions), not only as-is
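For orientation only, a bare-bones overlap-add time-scaling sketch in NumPy; unlike the CUIDADO module, plain OLA preserves neither transients nor the stereo image and introduces phasing artifacts:

import numpy as np

def ola_timescale(audio, factor, frame=2048, hop=512):
    # factor > 1 stretches, factor < 1 compresses (mono signal assumed).
    window = np.hanning(frame)
    out = np.zeros(int(len(audio) * factor) + frame)
    norm = np.zeros_like(out)
    t_out = 0
    while True:
        t_in = int(t_out / factor)  # read position advances slower or faster
        if t_in + frame > len(audio):
            break
        out[t_out:t_out + frame] += audio[t_in:t_in + frame] * window
        norm[t_out:t_out + frame] += window
        t_out += hop
    return out[:t_out] / np.maximum(norm[:t_out], 1e-8)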
26. Timescaling module
- Some examples
- Vocal
- Orchestral
- Jazz
- Funk
27. Other contributions: F0 estimation
- Monophonic F0 detector based on Two-Way Mismatch (Maher & Beauchamp, 1993; a simplified sketch follows this list)
- Integration into the 2.1.1 modules will use single-frame estimations; this may not be optimal, as context (previous F0, next F0, instrument, etc.) is not considered
- Polyphonic F0 detector (Klapuri, 2000) using bandwise processing
- Intended mainly for polyphonic-monotimbral instruments or small ensembles, not for dense mixtures of sounds
- Estimation of candidates is performed for each analysis frame; several candidates are obtained
- Tracking of candidates is not yet implemented
- Our main interest is in deriving a predominant F0 only
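A deliberately simplified sketch of the Two-Way Mismatch idea (after Maher & Beauchamp, 1993), assuming NumPy arrays of spectral peak frequencies and magnitudes; the published magnitude-weighting constants q and r are omitted for brevity:

import numpy as np

def twm_error(f0, peak_freqs, peak_mags, n_harm=10, p=0.5, rho=0.33):
    # Lower error means the candidate f0 better explains the measured peaks.
    harmonics = f0 * np.arange(1, n_harm + 1)
    a_max = peak_mags.max() + 1e-12
    # Predicted-to-measured: each harmonic vs. its closest peak
    err_pm = sum(np.min(np.abs(peak_freqs - h)) * h ** -p for h in harmonics)
    # Measured-to-predicted: each peak vs. its closest harmonic, weighted by salience
    err_mp = sum((a / a_max) * np.min(np.abs(harmonics - f)) * f ** -p
                 for f, a in zip(peak_freqs, peak_mags))
    return err_pm / n_harm + rho * err_mp / len(peak_freqs)

def estimate_f0(peak_freqs, peak_mags, candidates):
    # Single-frame estimate: the candidate with the minimum TWM error.
    return min(candidates, key=lambda f0: twm_error(f0, peak_freqs, peak_mags))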
28. Other contributions: polyphonic F0 detection
29. UPF CUIDADO team
- Xavier Amatriain
- Lars Fabig
- Emilia Gómez
- Günter Geiger
- Fabien Gouyon
- Gilles Peterschmitt
- Julien Ricard
- Perfecto Herrera
30. What's next? (beyond retrieval)
- Examples of achievable functionalities:
- Music structure visualization
- Melody description visualization and manipulation (content-based timescaling)
- Rhythm loop processing
- Matching songs in playlists by tempo
31. What's next? Music structure visualization
32. What's next? Melody description visualization and manipulation (content-based timescaling)
33. What's next? Visualizing, navigating, and editing with rhythm marks
- Pulse marks visualization
- Pulse-based editing (duplicating parts, snap to pulse mark) and navigation (skip to next tick); a sketch follows
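A sketch of the snap and skip operations, assuming sorted lists of pulse/tick times in seconds from the rhythm extractor:

import bisect

def snap_to_pulse(t, pulse_marks):
    # Snap a time position (seconds) to the nearest pulse mark.
    i = bisect.bisect_left(pulse_marks, t)
    candidates = pulse_marks[max(0, i - 1):i + 1]
    return min(candidates, key=lambda m: abs(m - t))

def skip_to_next_tick(t, tick_marks):
    # Navigation: jump to the first tick strictly after t.
    i = bisect.bisect_right(tick_marks, t)
    return tick_marks[i] if i < len(tick_marks) else None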
34. What's next? Combining rhythm and instrument descriptions
- Navigation by instrument occurrence (skip to the next snare)
- Muting an instrument
- Processing (applying an effect to) an instrument
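A sketch of the first two functionalities, assuming the labeler outputs (time, label) pairs and audio is a NumPy array of a sliced drum loop (zeroing a slice region only works when slices do not overlap; inside a dense mix this would require source separation):

def skip_to_next(events, t, instrument="Snare"):
    # First occurrence of the given instrument after time t, or None.
    times = sorted(time for time, label in events if label == instrument and time > t)
    return times[0] if times else None

def mute_instrument(audio, sr, events, slice_dur, instrument):
    # Crude muting: zero out the slices carrying the given label.
    out = audio.copy()
    for time, label in events:
        if label == instrument:
            start = int(time * sr)
            out[start:start + int(slice_dur * sr)] = 0.0
    return out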
35. What's next? Combining rhythm and instrument descriptions
- Building MIDI maps, reconstructing the loop, and generating timbral variations
(Flow diagram: original audio file → tick extraction and labeling → MIDI map → MIDI file; an instrument-category and/or timbre-similarity query against a sound database drives the reconstruction of a new audio file.)
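A hypothetical sketch of the MIDI-map step: labeled ticks become note events that a renderer can re-synthesize from database samples (the drum-map note numbers follow the General MIDI percussion map; the mapping itself is an assumption):

# General MIDI percussion note numbers for the coarse taxonomy
DRUM_MAP = {"Kick": 36, "Snare": 38, "HH": 42, "Tom": 45, "Cymbal": 49}

def build_midi_map(labeled_ticks, tempo_bpm):
    # labeled_ticks: (time_seconds, label) pairs from tick extraction + labeling.
    # Returns (midi_note, beat_position) events; rendering them with samples
    # retrieved by category or timbre similarity reconstructs or varies the loop.
    beat = 60.0 / tempo_bpm
    return [(DRUM_MAP[label], t / beat)
            for t, label in labeled_ticks if label in DRUM_MAP]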
36. What's next? Combining rhythm descriptions and timescaling
Transition from one pattern to another that does not match in tempo
37. What's next? Matching songs by tempo for playlist generation
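For the last two slides, a sketch assuming per-song tempi from the rhythm description extractor (function names are hypothetical):

def stretch_factor(tempo_from, tempo_to):
    # Duration scaling needed to play a pattern at a new tempo;
    # this is the input to the timescaling module when matching two patterns.
    return tempo_from / tempo_to

def order_by_tempo(songs, start_tempo):
    # Greedy playlist ordering that minimizes tempo jumps between songs.
    # songs: list of (title, tempo_bpm) pairs.
    remaining, playlist, tempo = list(songs), [], start_tempo
    while remaining:
        nxt = min(remaining, key=lambda s: abs(s[1] - tempo))
        playlist.append(nxt)
        tempo = nxt[1]
        remaining.remove(nxt)
    return playlist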